If you have ever inherited a fine-tuned checkpoint with no clean way to confirm what it was tuned from, Cisco's new Model Provenance Kit is aimed at exactly that gap. Released as an open-source Python toolkit and CLI, MPK fingerprints models at the weight level — examining architecture metadata, tokenizer structure, and the learned weights themselves to determine whether two transformers share a common origin.
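MPK's actual fingerprint format isn't spelled out here, but the general idea of weight-level fingerprinting is easy to sketch: hash the structural facts (layer names and shapes, tokenizer vocab size) exactly, and reduce the learned weights to per-layer statistics that survive light fine-tuning far better than exact hashes do. A minimal illustration, with all names and the fingerprint layout hypothetical:

```python
import hashlib
import math

def layer_stats(weights):
    """Summary statistics for one weight tensor (flattened to a list)."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    return (mean, math.sqrt(var))

def fingerprint(model):
    """Build a crude provenance fingerprint from a toy model dict.

    `model` maps layer names to flat weight lists plus metadata.
    Structural facts go into an exact hash; learned weights are
    reduced to per-layer (mean, std) pairs, which drift only
    slightly under fine-tuning while diverging across unrelated
    base models.
    """
    structure = hashlib.sha256()
    for name in sorted(model["layers"]):
        structure.update(f"{name}:{len(model['layers'][name])}".encode())
    structure.update(str(model["vocab_size"]).encode())
    stats = [layer_stats(model["layers"][name])
             for name in sorted(model["layers"])]
    return {"structure": structure.hexdigest(), "stats": stats}

# A fine-tune perturbs weights but keeps architecture and tokenizer:
base = {"layers": {"attn.0": [1.0, 2.0, 3.0]}, "vocab_size": 32000}
tuned = {"layers": {"attn.0": [1.01, 2.01, 3.01]}, "vocab_size": 32000}
fp_base, fp_tuned = fingerprint(base), fingerprint(tuned)
```

In this toy setup the structural hashes match exactly and the weight statistics land within a hair of each other, which is the signal a lineage check needs; real tools work on millions of parameters per layer, where the statistics become far more discriminating.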
The kit ships with two modes. Compare produces a detailed similarity breakdown between any two models. Scan matches a single model against an initial fingerprint database covering roughly 150 base models across 45 families and 20 publishers, with parameter counts spanning 135M to over 70B. This is a different posture than sigstore-style attestation projects like sigstore/model-transparency, which sign artifacts at publish time. Cisco's approach assumes the artifact is already in your hands and you need to recover lineage from the weights themselves — useful when upstream signing was never performed or when a model arrives with no paperwork.
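The two modes map onto two simple operations over fingerprints: Compare scores one pair, Scan ranks one query against every entry in the database. A sketch of that shape, assuming fingerprints have already been reduced to flat numeric vectors (the representation and similarity metric here are illustrative, not MPK's):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def compare(fp_a, fp_b):
    """Compare-mode analog: similarity score for one pair of models."""
    return cosine(fp_a, fp_b)

def scan(query, database):
    """Scan-mode analog: rank the query against every known base model
    and return the best match with its score."""
    scored = [(name, compare(query, fp)) for name, fp in database.items()]
    return max(scored, key=lambda pair: pair[1])

# Toy database standing in for the ~150-entry base-model index:
database = {
    "base-a": [1.0, 0.0, 0.5],
    "base-b": [0.0, 1.0, 0.2],
}
best_name, score = scan([0.9, 0.1, 0.45], database)
```

The asymmetry matters operationally: Compare answers "are these two related?", while Scan answers "which known base is this most likely derived from?", which is the question you actually have when a checkpoint arrives with no paperwork.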
Weight-level fingerprinting fills a gap that signing alone does not. Every fine-tune, every LoRA merge, every uncredited fork pulled from Hugging Face is a place where lineage gets lost. Regulators leaning on the EU AI Act's provenance requirements, security teams scanning for poisoned base models, incident responders tracing a CVE in an upstream model — all need a way to ask "what is this model, really?" without trusting a manifest. This is the kind of infrastructure the wrapper economy has been quietly missing: not a new model, but a way to know what you're standing on.
If you ship anything that ingests third-party models — internal AI platforms, model marketplaces, fine-tune services — clone the repo, run Scan against your inventory, and see what surfaces. The fingerprint database is the first 150; the value compounds as contributors add more. If you publish base models, contributing fingerprints is how the ecosystem gets honest about lineage.
