Wednesday, May 24, 2017

Is de novo sequencing already a viable alternative to database searches?


If you look you might see some ravings on this blog regarding the DeNovoGUI, another incredible free resource out of the CompOmics group.

If you're interested, the original paper is here.

Since that paper came out the DeNovoGUI has expanded and incorporated more algorithms. When I booted my copy today it told me there were new updates as well! (Downloading now)

We all know de novo searching algorithms are out there. I know more and more labs that are using PEAKS as their primary software -- meaning PEAKS has come a long way! I think the consensus maybe 5-6 years ago was -- yeah, it was a nice tool, but it was your fallback plan if you didn't find what you were looking for with database tools.

As a sign -- and a thorough measurement -- of this possible shift, check out this new paper from Thilo Muth and Bernhard Renard (the latter is a fun name to say! Try it 3 times fast!)


The question they set out to answer -- are the de novo algorithms, right now, a good alternative to the database tools?

To test it they took 4 publicly deposited datasets. All were generated on Orbitraps. 3 are high/low (Orbitrap for MS1 and ion trap for MS/MS) and one is high/high. Yeast, human, mouse, and some weird thing -- oh, it's an extremophile! Cool! Pyrococcus furiosus. My last Latin class was {redacted} years ago, but I'm pretty sure its name means something like -- "we found this tube shaped thing growing in an active volcano." I may need to check these RAW data out later!

For comparison they use PEAKS, Novor and PepNovo -- 2 of which can just be run in the DeNovoGUI (but they may have run them some other way, I didn't check).

To establish their working base, all the data were searched with MS-GF+ and X!Tandem. I'm a little fuzzy on the details (honestly, I skimmed a little...big day ahead!), but I think they took the peptide-spectrum matches (PSMs) that both engines agreed upon.
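If I'm reading that right, the "both engines agree" idea can be sketched in a few lines of Python. This is only my guess at the logic, not the authors' actual pipeline: the function name, the scan IDs, and the toy peptides are all made up for illustration, and a real workflow would also filter on score or FDR first.

```python
def consensus_psms(engine_a, engine_b):
    """Keep only spectra where both engines report the same peptide.

    Both inputs are hypothetical dicts of spectrum ID -> peptide sequence.
    """
    shared_ids = engine_a.keys() & engine_b.keys()
    return {sid: engine_a[sid] for sid in shared_ids
            if engine_a[sid] == engine_b[sid]}


# Toy results standing in for MS-GF+ and X!Tandem output (invented data).
msgf = {"scan_1": "PEPTIDEK", "scan_2": "LVNELTEFAK", "scan_3": "AAAK"}
xtandem = {"scan_1": "PEPTIDEK", "scan_2": "LVNEITEFAK", "scan_4": "GGGR"}

consensus = consensus_psms(msgf, xtandem)
print(consensus)
```

Here only scan_1 survives: scan_2 has two different sequences, and scan_3 and scan_4 were each identified by just one engine.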

There is a TON to be learned from this paper -- including some really interesting info on what peptide sequences modern de novo engines have the most trouble with, which ones scale the best (more processors meaning much more performance), etc.

But check this out. Oxford Bioinformatics -- I love your journal and only recently discovered what a treasure trove it is (thanks @PastelBio!). If it is a problem that I used one of the images, please email me (orsburn@vt.edu) to receive my apology and instant removal, but this is an awesome chart!


Again, I'm not 100% on the metrics here -- but this looks pretty darned good, right? I found PepNovo really surprising. I've used it a lot over the years and it was the main reason I started using the DeNovoGUI (cause I did just have an old PC that only ran this program!), but I use PepNovo+ and I don't think that these authors did....

Ignoring this, PEAKS and Novor did REALLY well! Even SequestHT and Mascot disagree on correct matches by 10%-15% or so (crude numbers from long ago when I still had access to both -- don't hold me to them). 60-70% sounds pretty darned good -- given NO database!
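To make a number like "60-70%" a bit more concrete, here is one crude way you could score a de novo engine against database identifications. Big caveat: scoring exact whole-peptide matches is my assumption, not necessarily the metric in the paper, and the names and toy data below are hypothetical. The one thing I'm sure of is that leucine and isoleucine have identical masses, so I treat I and L as the same residue.

```python
def normalize(peptide):
    """Collapse I into L, since the two residues are isobaric."""
    return peptide.upper().replace("I", "L")


def fraction_correct(de_novo, reference):
    """Fraction of reference spectra whose de novo sequence matches exactly.

    Both inputs are hypothetical dicts of spectrum ID -> peptide sequence.
    Spectra with no de novo call count as misses.
    """
    if not reference:
        return 0.0
    hits = sum(
        1 for sid, ref_pep in reference.items()
        if sid in de_novo and normalize(de_novo[sid]) == normalize(ref_pep)
    )
    return hits / len(reference)


# Invented data: 4 reference PSMs, 3 de novo calls.
reference = {"s1": "PEPTIDEK", "s2": "LVNELTEFAK", "s3": "AAAK", "s4": "GGGR"}
de_novo = {"s1": "PEPTLDEK", "s2": "LVNELTEFAK", "s3": "AAKA"}

score = fraction_correct(de_novo, reference)
print(f"{score:.0%} of reference PSMs reproduced")
```

In this toy case s1 counts as a match (only an I/L swap), s2 matches outright, s3 is wrong, and s4 has no de novo call at all, so 2 of 4 are reproduced.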

The best data -- which I won't show here -- is the crazy volcano creature. In an example where we don't have a good database to use the classical engines with, and our only choice is to use de novo tools, the comparison between engines is really interesting -- and maybe the most pertinent.

(I am imagining them trying to kill this creature to get its DNA out. After years of failure by every international team, President Michelle reveals the truth she's known all along -- that a partially completed deathray had been abandoned in a secret facility in Siberia at the end of the Cold War. An international treaty is established and 2 scientists from each country are selected to work on the team and to complete work on the project. By diverting all the electrical consumption of Europe for 2 weeks (almost 4 minutes' worth for NYC or Vegas) they can accumulate enough power to fire the completed deathray one time. In doing so this will also destroy the device and all the resources required to ever build another, but they know it is the only chance they will ever have -- and fire the ray, finally breaking the Pyrococcus cell wall. Their hopes are shattered, however, when they find the process shreds the DNA completely, leaving only the proteins(??) intact...leaving us right where we started...)

In summary -- maybe the current generation de novo algorithms aren't 100% ready to replace our current database-driven tools, but WOW have they ever gotten good!
