Full disclaimer - I can't follow all the words in this new manuscript. It is very heavy on computer science terms (?). Honestly, if I hadn't found on page 35 that this code is available, it wouldn't have made it on the blog, but from the proteomics data I can follow, it looks really promising.
If you're a computational nerd person, I think this is what you want (GitHub).
From what I can gather, at a very reasonable FDR, InstaNovo is identifying as much as 50% of human peptides that are known - with no database at all. None. Sure, results look better with a database when you have one, but this opens up a tremendous number of things that we don't have sequences for at all. They pressure test this with less commonly used enzymes (GluC), some HLA/MHC peptides, and some mixed proteomic samples (metaproteomics).