Understanding the PURL Specification (Package URL)

73 points · submitted by todsacerdoti · 64 comments · 6/5/2025, 4:02:54 PM · fossa.com

Comments (64)

alcroito · 1d ago
I wish PURL proposed something sensible, or at least usable, for tracking C/C++ native libraries that are NOT hosted on a registry like conan.io or one of the Linux distro registries, but are still (self-)hosted somewhere online.

For libraries that are hosted on `github`, there's at least the github type.

But there is no official `gitlab` or `git` type, and I've read comments that even the `github` type is considered a mistake.

One example of such a library could be a Qt or KDE / Plasma library.

They are hosted on their own forges, https://code.qt.io/ and https://invent.kde.org respectively.

So to the more knowledgeable people out there, what is the PURL way of identifying a C++ library like that?

Is `generic` type + vcs_url qualifier really the only way?
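For reference, a `generic` PURL carrying a `vcs_url` qualifier might be assembled as in the minimal Python sketch below. The `build_purl` helper is hypothetical and its escaping is simplified (it is not a spec-conformant encoder; a maintained library such as packageurl-python is the safer choice). The Qt version and repository URL are illustrative only.

```python
from urllib.parse import quote


def build_purl(ptype, name, version=None, namespace=None, qualifiers=None):
    """Assemble a PURL string. Simplified sketch, not a spec-conformant encoder."""
    purl = f"pkg:{ptype.lower()}/"
    if namespace:
        # Each namespace segment is encoded on its own; "/" stays a separator.
        purl += "/".join(quote(seg, safe="") for seg in namespace.split("/")) + "/"
    # The name is a single segment, so any "/" inside it would become %2F.
    purl += quote(name, safe="")
    if version:
        purl += "@" + quote(version, safe="")
    if qualifiers:
        # Qualifiers are emitted sorted by key; keeping ":" and "/" readable
        # in values is an assumption of this sketch about the escaping rules.
        purl += "?" + "&".join(
            f"{key}={quote(value, safe=':/')}" for key, value in sorted(qualifiers.items())
        )
    return purl


print(build_purl(
    "generic", "qtbase", "6.5.0",
    qualifiers={"vcs_url": "git+https://code.qt.io/qt/qtbase.git"},
))
# pkg:generic/qtbase@6.5.0?vcs_url=git%2Bhttps://code.qt.io/qt/qtbase.git
```

Note that this sketch escapes the `+` in `git+https`; small escaping choices like this are where different tools' output can drift apart.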

Right now it seems impossible to track vulnerabilities for such libraries with OSS / open tools, because none of the open tools or databases support a custom type or registry or ecosystem.

For example none of services here support some custom C++ ecosystem (putting aside conan):

https://docs.dependencytrack.org/analysis-types/known-vulner...

Same for https://docs.dependencytrack.org/datasources/osv/

pombreda · 21h ago
You wrote:

> So to the more knowledgeable people out there, what is the PURL way of identifying a C++ library like that?

That's a blind spot. This is a real problem for everyone, as you rightfully explained.

So I have been thinking a lot about how to track C/C++ native libraries, and I have been working on a plan to deal with this.

You can read a summary here (which I just posted to support this discussion!): https://github.com/aboutcode-org/www.aboutcode.org/issues/30

And this comment links to a more detailed work-in-progress planning doc: https://github.com/aboutcode-org/www.aboutcode.org/issues/30...

If you want to chip in and help, this would be awesome.

And IMHO, aligned with your thinking, this should not be tied to a build system, a for-profit operation like conan.io, a Linux distro, or for that matter any specific build tool or approach, as there are so many. It should be self-hosted, easy to sync, and simple to store in a git repo.

alcroito · 20h ago
Thanks for the links! I hope the proposal works out. I skimmed through the doc, and one thing I'd suggest is to consider using the CPS format rather than the ABOUT one for the metadata. The format is driven by Kitware, the developers of CMake, so if it's contributed to them, a big chunk of the C++ ecosystem would buy in just through the inertia of using CMake and getting it for free with the tool.

https://cps-org.github.io/cps/overview.html

I'm not sure how I can help, but I'm open to discussion, because the company I work for is also interested in how to handle this well for our products.

pombreda · 6h ago
Let's chat. There are really a lot of folks interested because of the suffering! ABOUT is just a suggestion, and TIL about CPS: it looks awesome! Reach me at pombredanne@aboutcode.org, or leave a comment on the issue or doc linked above.
pombreda · 22h ago
Note that there should be a gitlab type, as one is planned: https://github.com/package-url/purl-spec/blob/a90ee02679afc3...

GitLab and GitHub do provide package-like discoverability. Do you have a pointer that says the github type is a mistake?

alcroito · 10h ago
I believe I was thinking of the comments at https://github.com/package-url/purl-spec/issues/59, but I see you've already replied there.
donenext · 1d ago
Completely agree here. A `git` type using the namespace of your choice would be plenty to enable tools to find these packages. Even though it's not "officially" supported in the spec, this is what we do internally.
pombreda · 21h ago
IMHO, a bare git reference would be a git URL as specified in pip and SPDX, not a PURL... I would be interested to know more about your use case. Feel free to drop a note at pombredanne@aboutcode.org
giantrobot · 19h ago
Something else PURLs don't capture well for native libraries is any sort of build configuration. I don't know of any clear way in a PURL to describe, say, a Debian package built from a source package with a custom set of compiler options.

For Java and interpreted language packages the "build" configuration is less important or non-existent. For compiled packages the build environment is important.

It seems the only way is to use a custom namespace and abuse the qualifiers but then you've got a non-canonical PURL and its utility in things like SBOMs is limited.

pombreda · 6h ago
Good point, but that may not be in scope either... since this is not even something you can get from Debian easily, at least not by just looking at a Debian pool or diving into a package's control files, AFAIK?

Say I rebuild a Debian package with some new build options.

Is this the same package or a new one? I'd say a new one.

Is this the same name? I'd say a new one.

Is this distributed by Debian? Nope, so this comes from another repo and pool, right?

The idea with PURL is to have simple and short PURLs for the common case, and make it possible to handle less common cases. Rebuilding a package and sharing it on another repo would be a less common case to me? WDYT?

giantrobot · 2h ago
I've worked with ingesting and generating SBOMs a bit, which is where my experience with PURLs comes from. I loved the idea because it gets about 80% of the way to usefully identifying software components. So, just to be clear, I don't dislike them, and I think you've done good work.

I don't necessarily agree that a site-built package is a different package. It's just that a single line of text might not be enough to encode build configurations.

A binary package built by Debian's build fleet is a unique artifact signed by the project's keys. It's a thing with a canonical identifier. A deb-src, Gentoo package, or FreeBSD port might have a canonical identifier for the original source, but that identifier isn't canonical once it's built on a machine. In many cases the difference is immaterial, but there are a lot of `#ifdef`s in a lot of code, plus whatever autoconf generates for any given system.

The canonical source distribution is useful information but then so is the build information. I'm not sure this can be captured via qualifiers, at least I can't think of a way to do it.

Maybe just a source package is enough. For reporting a bug or CVE knowing something came from a particular source package is a start to triaging an issue. But you'd want a distinct namespace for source packages. A source package namespace at least tells you "in summary this package contains all the diffs Debian uses" versus the PURL for the upstream source package (from GitHub etc).

emddudley · 1d ago
Not related to PURLs (Persistent URLs) administered by the Internet Archive.

https://purl.archive.org/

CaliforniaKarl · 1d ago
Or PURLs in general, the concept for which was developed in 1995, per https://en.m.wikipedia.org/wiki/Persistent_uniform_resource_...
layer8 · 1d ago
Nor to the Purl programming language: https://esolangs.org/wiki/Purl

I wonder if Yarn will support PURLs. ;)

ttepasse · 1d ago
I remember when purl.org namespace URIs were the thing for RSS 1.0 modules. 25 years ago.
pombreda · 1d ago
Not at all related. Just nicknamed the same.
01HNNWZ0MV43FF · 1d ago
Where can I read more about this?
pombreda · 1d ago
We maintain the spec at https://github.com/package-url/purl-spec

And the new thing, working towards making it a real standard with Ecma https://tc54.org/purl/ ... :)

kdeldycke · 23h ago
I have a project called Meta Package Manager that supports pURLs, so you can:

$ mpm install pkg:npm/left-pad@1.2.3

Other commands allow you to export the SBOM of all packages installed on your machine. More info at: https://github.com/kdeldycke/meta-package-manager

pombreda · 21h ago
This is awesomely nice!
zzo38computer · 21h ago
In my opinion, there are some problems with this, such as:

- The cryptographic hash is not included. (They do mention security; a hash and/or public keys would help there. It would also help with identification when names are reused for unrelated things.)

- There is not a distinction between interfaces and implementations (which in some cases you might care about, although not always).

- They do not mention examples of what qualifiers are possible for some package types.

pombreda · 21h ago
Can you tell a bit more? What is this? The OP article?
cryptonector · 13h ago
u/zzo38computer wants:

- an optional(?) hash parameter

- a way to say you depend on a thing for which there are multiple implementations and not specify which implementation

pombreda · 6h ago
For "generic" interface-based dependencies, that's tougher.

This is a problem with a few ecosystems, for example RPMs, debs, and Java OSGi... and maybe a few more. We need to survey these to find out whether we can solve this, and whether it is a PURL problem at all.

Can I rope you in and interest you in filing an issue in the spec so we can move the discussion there? :P This would be great.

https://github.com/package-url/purl-spec/issues/

cryptonector · 1h ago
Well, for one thing a dependence on an interface could not have a hash to bind the provider(s), but one could have a dependence on an interface and also associated dependencies on one-of-N providers of the interface, then the latter could have hashes.

Basically you need a way to indicate "this package is an interface and requires providers of it" and also you need a way to indicate which packages are the associated providers (either as attributes of the interface PURLs, as attributes of the provider PURLs, or both).
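The interface/provider idea above can be pictured as a small data model. PURL has no such construct today: the `Provider` and `InterfaceDependency` classes, the `interface` field, and the placeholder checksums below are all hypothetical. `mail-transport-agent` is a Debian virtual package, used here only as a familiar example of one interface with several providers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provider:
    purl: str      # concrete package that implements the interface
    checksum: str  # concrete artifacts can be pinned, e.g. "sha256:<hex>"


@dataclass
class InterfaceDependency:
    interface: str  # hypothetical "interface" identifier: nothing to hash here
    providers: list = field(default_factory=list)  # acceptable one-of-N providers

    def satisfied_by(self, purl: str, checksum: str) -> bool:
        """True if the installed (purl, checksum) pair is one of the declared providers."""
        return any(p.purl == purl and p.checksum == checksum for p in self.providers)


mta = InterfaceDependency(
    interface="pkg:deb/debian/mail-transport-agent",
    providers=[
        Provider("pkg:deb/debian/postfix", "sha256:1111"),  # placeholder checksum
        Provider("pkg:deb/debian/exim4", "sha256:2222"),    # placeholder checksum
    ],
)
print(mta.satisfied_by("pkg:deb/debian/exim4", "sha256:2222"))  # True
print(mta.satisfied_by("pkg:deb/debian/exim4", "sha256:9999"))  # False
```

The interface itself carries no hash, while each provider can, which matches the split described above.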

pombreda · 6h ago
We have a standard checksum "qualifier" at https://github.com/package-url/purl-spec/blob/main/PURL-SPEC... ... that would be the "hash" ... would this work?
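As a rough illustration, a checksum qualifier could be attached to an existing PURL as below. This sketch assumes the qualifier value takes the `lowercase_algorithm:hex_value` form as we read the spec; the `add_checksum` helper is hypothetical, keeps existing qualifiers as already-encoded `key=value` pairs, and ignores any `#subpath`.

```python
import hashlib


def add_checksum(purl: str, data: bytes, algorithm: str = "sha256") -> str:
    """Return `purl` with a checksum qualifier for `data`, keeping qualifiers sorted by key."""
    base, _, query = purl.partition("?")
    # Existing qualifiers are kept as-is (already-encoded key=value pairs).
    qualifiers = dict(pair.split("=", 1) for pair in query.split("&") if pair)
    digest = hashlib.new(algorithm, data).hexdigest()
    qualifiers["checksum"] = f"{algorithm}:{digest}"
    return base + "?" + "&".join(f"{k}={v}" for k, v in sorted(qualifiers.items()))


print(add_checksum("pkg:generic/openssl@1.1.1g", b""))
# pkg:generic/openssl@1.1.1g?checksum=sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```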
dedicate · 1d ago
Okay, so PURL is basically the thing that actually makes SBOMs usable for open source, not just a list of 'best guesses' with CPEs?
pombreda · 1d ago
That's actually the best explanation I have seen in a long time!

- In most cases, no guesses are needed.

- You can use it in CycloneDX, SPDX, and CSAF and still talk about the same package even if the format varies.

- CVE.org is considering it as an addition on the same footing as CPE.

- There is a good bunch of databases that "speak" PURL, like Google OSV, Sonatype OSS Index, Deps.dev, and AboutCode's PurlDB and VulnerableCode (disclosure: I am a lead maintainer for AboutCode FOSS projects).

- Most scanners speak PURL too.

Note that some scanners and tools speak not exactly PURL but some "PURLish" dialect, and we have a project to help streamline that and lift up the whole ecosystem of PURL users: https://nlnet.nl/project/purlvalidator/

donenext · 1d ago
Yes, 1000x yes
quibono · 1d ago
For all the expressiveness of the CPE format, I find PURLs much easier to work with, especially for software that doesn't fall neatly into the classic vendor/product split that CPE envisions.
pombreda · 21h ago
Yeah, the CPE idea of a vendor for an open source package does not compute too well!

FWIW, PURL came about because I could NOT get my head around CPEs when I was scanning for packages and deps with scancode: I could not find any easy way to go from that to looking up a vulnerability/CVE in the NVD, as it was all guesswork and manual.

So instead we started to put the vuln data in our own db, keyed by something that would be easy to relate from the scans. This eventually became PURL.

This is all tracked in these places:

- The original issue: https://github.com/aboutcode-org/scancode-toolkit/issues/805

- The initial pull request with many comments: https://github.com/package-url/purl-spec/pull/1

RS-232 · 1d ago
I love PURLs, but the namespace attribute smells. It’s way too arbitrary.

What’s the point of com.something.other? Why are we using dot notation when everything else is kebab case?

pombreda · 21h ago
Not sure I parse... would you mind elaborating?
pombreda · 21h ago
Is this about the Maven "groupId" mapped to a namespace? "com.foo.bar" is Maven's own invention and notation... in most cases we are just trying to adopt the ecosystem's convention to minimize friction.
account42 · 9h ago
Shouldn't it be called PURI, because it only identifies a package but doesn't locate it?
pombreda · 6h ago
Actually, this is a locator alright. You can resolve a PURL to an actual package in an actual location.
rahkiin · 1d ago
How does PURL work for Docker images that are not hosted on Docker Hub? Or for custom npm registries?
LawnGnome · 1d ago
The standard supports a repository_url "qualifier" (query parameter)[0], which can be used to override whatever the default registry is (which, for Docker, is hub.docker.com[1]).

[0]: https://github.com/package-url/purl-spec/blob/main/PURL-SPEC...

[1]: https://github.com/package-url/purl-spec/blob/main/PURL-TYPE...
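A minimal sketch of that override: the hypothetical `purl_with_registry` helper below attaches a `repository_url` qualifier, with simplified escaping. The image, package, and registry names (`acme`, `api-server`, `widgets`, `registry.example.com`, `npm.example.com`) are made up for illustration.

```python
from urllib.parse import quote


def purl_with_registry(ptype, name, version, repository_url, namespace=None):
    """Build a PURL whose repository_url qualifier points at a non-default registry."""
    segments = namespace.split("/") if namespace else []
    segments.append(name)
    # Encode each path segment separately (so an npm scope "@" becomes %40).
    path = "/".join(quote(seg, safe="") for seg in segments)
    return (f"pkg:{ptype}/{path}@{quote(version, safe='')}"
            f"?repository_url={quote(repository_url, safe=':/')}")


print(purl_with_registry("docker", "api-server", "1.4.2", "registry.example.com", namespace="acme"))
# pkg:docker/acme/api-server@1.4.2?repository_url=registry.example.com
print(purl_with_registry("npm", "widgets", "2.0.0", "https://npm.example.com", namespace="@acme"))
# pkg:npm/%40acme/widgets@2.0.0?repository_url=https://npm.example.com
```

Without the qualifier, a consumer would fall back to the type's default registry.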

nonethewiser · 1d ago
Maybe they fall under this?

>There's even a generic type as a catch-all for things that don't fit an existing ecosystem (for example, a proprietary or legacy component) or for ecosystems that build custom distributions, such as yocto or buildroot. We should note, however, that SBOM and software composition analysis tools vary widely in their ability to understand generic PURLs, so we do recommend you talk to your current (or prospective) vendor if this is an important feature for you.

pombreda · 1d ago
You want to avoid the "generic" type... and for Docker containers and OCI images it's not needed.
m4r71n · 1d ago
You can use the `oci` package type for non-Docker images (or any OCI artifacts for that matter).
heavenlyhash · 1d ago
soo..... what's the guidance for when package names include a slash?

such as approximately everything in golang, which very often matches e.g. "github.com/*" as a package name?

Would PURL suggest that "github.com/foobar/go-whatnot" be parsed as namespace="github.com" (odd) and package name "foobar/go-whatnot" (since there aren't any more slashes in the blessed separators)?

layer8 · 1d ago
The canonical answer would be percent-encoding, so pkg:golang/github.com%2Ffoobar/go-whatnot.

https://en.wikipedia.org/wiki/Percent-encoding#:~:text=accor...
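The percent-encoding suggestion can be reproduced with Python's standard library. This sketch only illustrates the encoding mechanics of that proposal (the `golang_purl` helper is hypothetical); how a Go path should actually be split into namespace and name is what the spec clarification discussed below is meant to settle.

```python
from urllib.parse import quote, unquote


def golang_purl(namespace: str, name: str) -> str:
    # Encode each component as a single segment, so its inner "/" becomes %2F.
    return f"pkg:golang/{quote(namespace, safe='')}/{quote(name, safe='')}"


purl = golang_purl("github.com/foobar", "go-whatnot")
print(purl)  # pkg:golang/github.com%2Ffoobar/go-whatnot

# Decoding the namespace segment recovers the original module path prefix.
namespace_segment = purl[len("pkg:golang/"):].split("/")[0]
print(unquote(namespace_segment))  # github.com/foobar
```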

pombreda · 1d ago
Encode the slash as explained in the clarified spec https://github.com/package-url/purl-spec/pull/453 :)

We are working on further clarifying Golang, which is a bit problematic: there is really no name or namespace in Go, just a path, and it is not possible at scale to tell where a Go module ends and a Go package starts just by looking at the path... this is going to be clarified after PR 453 is merged.

pombreda · 1d ago
Disclosure: I created that spec and we are working hard to clarify it and remove grey areas!
conradludgate · 1d ago
I don't know, but I imagine those are actually the namespace. E.g., I would imagine pkg:golang/github.com/foo/bar@1.0.0 to be package bar in the github.com/foo namespace.

The distinction between namespace and name doesn't really seem to matter much, in all honesty.

pombreda · 1d ago
Agreed. In hindsight, I always wonder if it was a good idea to have this split. At least the namespace is optional and required only for certain package types.
Joker_vD · 1d ago
What's the guidance when URI paths include a slash?

    pkg:github.com%2Ffoobar/go-whatnot
pombreda · 1d ago
This is not a valid PURL as it is missing a type, assuming you wanted golang here.

It could be instead:

    pkg:golang/github.com%2Ffoobar%2Fgo-whatnot
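To see how such strings decompose, here is a simplified Python parser (the `parse_purl` helper is hypothetical, not a conformant implementation). It strips the subpath and qualifiers, checks the type against roughly the spec's character rules, splits the version off from the right on the last `@`, and treats the last path segment as the name. The fully encoded Go form yields no namespace, while an input with no valid type is rejected.

```python
import re
from urllib.parse import unquote

# Roughly per the spec: a type starts with a letter and uses letters, digits, ".", "+", "-".
TYPE_RE = re.compile(r"[A-Za-z][A-Za-z0-9.+-]*")


def parse_purl(purl: str):
    """Split a PURL into (type, namespace, name, version). Simplified sketch."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError("missing 'pkg:' scheme")
    # Drop the #subpath and ?qualifiers before looking at the path.
    rest = rest.split("#", 1)[0].split("?", 1)[0].lstrip("/")
    ptype, _, path = rest.partition("/")
    if not TYPE_RE.fullmatch(ptype) or not path:
        raise ValueError(f"invalid or missing type in {purl!r}")
    version = None
    if "@" in path:
        # The version is split off from the right, on the last "@".
        path, version = path.rsplit("@", 1)
        version = unquote(version)
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"missing name in {purl!r}")
    name = unquote(segments[-1])
    namespace = "/".join(unquote(s) for s in segments[:-1]) or None
    return ptype.lower(), namespace, name, version


print(parse_purl("pkg:golang/github.com%2Ffoobar%2Fgo-whatnot"))
# ('golang', None, 'github.com/foobar/go-whatnot', None)
print(parse_purl("pkg:golang/github.com/foo/bar@1.0.0"))
# ('golang', 'github.com/foo', 'bar', '1.0.0')
```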
donenext · 22h ago
Hot take, `generic` as a type is a crutch most tooling uses out of laziness and has significantly reduced the usefulness of PURL spec. How do we improve this?
pombreda · 21h ago
Yeah, I added generic as an escape hatch, but it should only be used as an exception, i.e., a crutch. An abused crutch.

Eventually, let's fix this first for C/C++:

https://github.com/aboutcode-org/www.aboutcode.org/issues/30

And based on that approach we can either:

1. create new, sensible types as needed, and/or

2. maintain a last-resort open registry of generic types, so that we at least get some sanity in the process.

jessoteric · 22h ago
isn't the issue that sometimes a given scanner can't know from where the package is sourced?

like if I'm scanning an arbitrary linux system, and I see `libssl.so.1` but I don't see it in the local package manager, I don't really have an option other than to call it generic.

I do agree that "generic" seems to be WAY overused though. Maybe tools that report on SBOMs, like FOSSA or whatever, should emit warnings to users about "generic" PURLs.

pombreda · 21h ago
> isn't the issue that sometimes a given scanner can't know from where the package is sourced?

That's the problem: there is no metadata with or in libssl.so.1 that I can reliably use to tell what this is.

Eventually I can see a solution made of

1. create the metadata, say a simple YAML or deb822 key-value pair file that can then be included upstream or as an overlay;

2. define a simple spec for binary formats to include a PURL (say in an ELF section or a WinPE string of sorts, where many of these are already stored);

3. create content-based tools like we have in PurlDB to match code, but maybe more like a bunch of generated YARA rules that would match symbols and strings from source to binaries and can recognize that libssl.so.1 is from OpenSSL 1.1.1g.
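Since no binary-embedding format exists yet for the second idea, the sketch below assumes the simplest possible convention: a plain ASCII PURL string somewhere in the file. It behaves like `strings | grep` rather than reading a specific ELF section or WinPE resource, and the `find_embedded_purls` helper and `fake_elf` bytes are hypothetical.

```python
import re

# Hypothetical convention: a build step embedded a plain ASCII PURL string in the binary.
PURL_BYTES = re.compile(rb"pkg:[A-Za-z][A-Za-z0-9.+-]*/[A-Za-z0-9._~%@/?=&:+-]+")


def find_embedded_purls(blob: bytes) -> list:
    """Return the distinct PURL-looking ASCII strings found in raw binary data."""
    return sorted({m.group().decode("ascii") for m in PURL_BYTES.finditer(blob)})


fake_elf = b"\x7fELF\x02\x01\x01\x00" + b"pkg:generic/openssl@1.1.1g\x00" + b"\x00other bytes"
print(find_embedded_purls(fake_elf))  # ['pkg:generic/openssl@1.1.1g']
```

Against a real file, the same function could be fed `open("libssl.so.1", "rb").read()`. A real spec would more likely pin the string to a named section so scanners don't have to search the whole binary.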

donenext · 22h ago
That's fair. It just seems silly that a spec intended to "uniquely ID a package" supports a type that is the complete opposite of "unique". I guess another way to frame my take is: should `generic` be considered a valid PURL? Keep it as a fallback, sure, but distinguish between "fully qualified" PURLs and "partial" PURLs.

This then gives tooling a path to prompt users to provide the missing context needed to fully qualify the PURL.
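One way a tool might implement that prompt is sketched below. The rule for what counts as "fully qualified" is a hypothetical choice made for illustration, not anything the spec defines: a `generic` type asks for a real ecosystem type plus at least one locating or verifying qualifier, and any PURL without a version asks for one.

```python
def missing_context(purl: str) -> list:
    """List what a hypothetical 'fully qualified' check would ask the user to supply."""
    path, _, query = purl.split("#", 1)[0].partition("?")
    ptype = path[len("pkg:"):].lstrip("/").split("/", 1)[0].lower()
    qualifier_keys = {pair.split("=", 1)[0] for pair in query.split("&") if pair}
    hints = []
    if ptype == "generic":
        hints.append("an ecosystem-specific type")
        if not qualifier_keys & {"download_url", "vcs_url", "checksum"}:
            hints.append("a download_url, vcs_url or checksum qualifier")
    if "@" not in path:
        hints.append("a version")
    return hints


print(missing_context("pkg:generic/libssl"))
# ['an ecosystem-specific type', 'a download_url, vcs_url or checksum qualifier', 'a version']
print(missing_context("pkg:npm/left-pad@1.2.3"))  # []
```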

pombreda · 21h ago
> distinguish between "fully qualified" PURLs and "partial" PURLs.

Can you tell a bit more? Not sure I get what you meant

jessoteric · 21h ago
That seems like a good idea... hmm.
donenext · 22h ago
Can we completely eliminate generic as a type to remove this crutch?
pombreda · 21h ago
All abstractions leak eventually, so we need that escape hatch IMHO. Otherwise you end up with the other issue which is that there are stuff you cannot track with PURL?
CodingKing · 18h ago
PURL is messed up. URLs have location. URNs have identity. This should have just been a registered URN.
pombreda · 6h ago
Yeah, this is messed up! But tell me when you cannot locate a correct PURL?
90s_dev · 1d ago
xkcd 927 is shown in the first link. It seems xkcd is now as official a part of the everlasting software community as markdown is.
pombreda · 1d ago
Actually, I also used it when I first presented PURL at FOSDEM in 2018 https://archive.fosdem.org/2018/schedule/event/purl/ .... scroll the video at 9 minutes :] We need moooaaar standards, do we?
specialist · 6h ago
Ya, technical allegory. Nicely spotted.

"Shaka, when the walls fell."

https://en.wikipedia.org/wiki/Darmok

pombreda · 5h ago
:D