Wednesday, November 11, 2020

INFERNYS! First (second) impression!

 


DISCLAIMERS NECESSARY! 

1) This may look like a violation of the long-standing blog rule "if you can't blog anything nice, get a less weird hobby, you extremely strange person who types too fast".

2) I literally may have no idea what I'm doing. I'm just an aging ape in cool old shoes who types really fast. 

3) I could 100% be doing this wrong.

4) I think XCORR, and SEQUEST in general, is kinda dumb.

5) The dataset that I'm currently interested in and using this to test might be suboptimal for comparisons (more below).

Subtitle for this post: 

INFERNYS: A node made specifically for making some old guys in proteomics mad? (For real, y'all are gonna get some phone calls.)

Oh yeah. INFERNYS is a new node in Proteome Discoverer(TM/R) that came out in the new release. It's the first addition of deep learning to this software package that I probably literally use every day of my life. I rambled about some of the new stuff a few days ago here

Let's start with the great stuff first!!! 

1) I've run several comparisons of the exact same workflow with and without INFERNYS. We're currently working on deepening our understanding of the human liver (it's super complex and surprisingly understudied) so that's my focus right now.

2) In EVERY comparison when I've added INFERNYS so far, I've gotten more of everything. I've gotten more Peptide Spectral Matches (PSMs) and that has translated to more peptide groups, and to a higher percentage coverage of the proteins in the liver. This has also translated to a real, but much more modest, increase in the unique protein and protein group identifications. More PSMs is good!

The stuff I think will make some people mad. 

In my hands (again, I could be legit dumb, yo) the PSMs have had terrible XCORRs. 

What's an XCORR? It is a historically important metric for spectral match quality. Dr. Will Fondrie is a smart guy who does proteomic informatics stuff, and he did a remarkable job of capturing the idea on his blog here
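If you'd rather see the idea as code, here's a minimal sketch of an XCorr-style score. Big caveats: this is my simplified reading of the general approach (cross-correlation at zero offset minus the average cross-correlation at nearby offsets), not what SEQUEST or Proteome Discoverer actually runs. It assumes you've already binned both spectra into equal-length intensity vectors, it skips all of the intensity preprocessing and scaling, and the function name and the +/-75 bin window are assumptions for illustration.

```python
import numpy as np

def xcorr_sketch(observed, theoretical, max_offset=75):
    """Simplified XCorr-style score for two equal-length binned spectra.

    Illustrative only: no intensity preprocessing, no scaling.
    """
    observed = np.asarray(observed, dtype=float)
    theoretical = np.asarray(theoretical, dtype=float)
    n = len(observed)

    def shifted_dot(tau):
        # Dot product after sliding the theoretical spectrum by tau bins.
        if abs(tau) >= n:
            return 0.0
        if tau >= 0:
            return float(np.dot(observed[tau:], theoretical[:n - tau]))
        return float(np.dot(observed[:n + tau], theoretical[-tau:]))

    at_zero = shifted_dot(0)
    # Background: average similarity when the spectra are deliberately misaligned.
    offsets = [t for t in range(-max_offset, max_offset + 1) if t != 0]
    background = sum(shifted_dot(t) for t in offsets) / len(offsets)
    return at_zero - background

# Toy spectra (made up): three peaks that line up vs. three that are one bin off.
obs = np.zeros(200)
obs[[10, 50, 120]] = 1.0
good = np.zeros(200)
good[[10, 50, 120]] = 1.0
bad = np.zeros(200)
bad[[11, 51, 121]] = 1.0
print(xcorr_sketch(obs, good), xcorr_sketch(obs, bad))
```

The point of subtracting the background is that a spectrum full of peaks will overlap with almost anything by chance, so the score only rewards overlap that happens at the correct alignment.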


Let's talk about my dataset first. I'm working off the Adult Human Liver samples from Chan-Hyun Na et al., 2014 (Ramble here)

The ones I like best are the 24 high pH reverse phase fractions, run out at around 100 minutes on an Orbitrap Elite in "high/high" mode with HCD. (Given the speed of the Orbitrap Elite and today's technology, imagine this is a QE HF with maybe 1/3 the sensitivity and a good bit more overhead between scans, and you use around 3 times more electricity per unit time: three 220V 3-phase lines for the instrument! P.S. I still love those huge old monsters.)

Don't quote me, but I think the MS1 scans are 120,000 resolution and the MS/MS are 30,000 resolution. It's really really nice data, but at this scan speed, there just aren't all that many spectra compared to today's stuff. There are only 388,000 MS/MS scans. Given the fact that most of today's fractionated human stuff I download is in the millions of MS/MS scans, this is why I'm concerned that what I'm seeing might be specific to this dataset.
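To put that scan count in perspective, here's the back-of-the-envelope math. This is a rough sketch that assumes every one of the 24 fractions got the full ~100 minutes and that MS/MS scans were spread evenly over the run (neither of which I've verified); the variable names are just mine.

```python
fractions = 24            # high pH reverse phase fractions
gradient_minutes = 100    # "around 100 minutes", assumed per fraction
msms_scans = 388_000      # total MS/MS scans in the dataset

scans_per_fraction = msms_scans / fractions
scans_per_second = msms_scans / (fractions * gradient_minutes * 60)

print(f"~{scans_per_fraction:,.0f} MS/MS scans per fraction")
print(f"~{scans_per_second:.1f} MS/MS scans per second, averaged over the run")
```

Under those assumptions it works out to a bit under 3 MS/MS scans per second on average.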

As you can see in the histograms above -- You get an increase in PSMs, but the most obvious change at first glance is the increase in PSMs with an XCorr <1 and....you get a decent increase in PSMs with an XCorr <0.5.
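If you want to make the same comparison on your own data, a minimal sketch of the binning looks like this. It assumes you've pulled the XCorr column of the PSM table from each run into a plain list of numbers; the function name, the bin edges, and the example values are all made up for illustration and are NOT numbers from my liver dataset.

```python
from bisect import bisect_right

def xcorr_bin_counts(xcorrs, edges=(0.5, 1.0, 2.0)):
    """Count PSMs per XCorr bin: <0.5, 0.5 to <1, 1 to <2, and >=2 by default."""
    counts = [0] * (len(edges) + 1)
    for x in xcorrs:
        # bisect_right puts a value equal to an edge into the higher bin.
        counts[bisect_right(edges, x)] += 1
    return counts

# Toy values only, to show the shape of the comparison.
without_rescoring = [0.4, 1.2, 2.5, 3.1]
with_rescoring = [0.3, 0.4, 0.8, 1.2, 2.5, 3.1]

print(xcorr_bin_counts(without_rescoring))
print(xcorr_bin_counts(with_rescoring))
```

Comparing the two lists bin by bin tells you where the extra PSMs are landing, which is the question the histograms are answering.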

What does a PSM with an XCorr of <0.5 look like? It's probably one or two matching fragment ions.
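Counting matched fragments for a PSM is easy enough to do yourself. Here's a rough sketch under some loud assumptions: unmodified peptides only (so no carbamidomethyl on C), singly charged b and y ions only, and a ppm tolerance I picked arbitrarily. The function names, the peptide, and the peak list are all hypothetical.

```python
# Monoisotopic residue masses in Da (unmodified residues).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def fragment_mzs(peptide):
    """Singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion with i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions

def count_matched_fragments(peptide, peak_mzs, tol_ppm=20.0):
    """How many theoretical b/y ions have an observed peak within tol_ppm."""
    matched = 0
    for mz in fragment_mzs(peptide):
        tol = mz * tol_ppm / 1e6
        if any(abs(mz - peak) <= tol for peak in peak_mzs):
            matched += 1
    return matched

# Made-up peak list: two peaks near y1 and b2 of PEPTIDE, plus one stray peak.
print(count_matched_fragments("PEPTIDE", [148.0604, 227.1026, 500.0]))
```

A real search engine also considers higher charge states, neutral losses, and modifications, so treat a count from something like this as a sanity check and not a verdict.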


This is BIG DATA. It's entirely unfair to pick a couple ions to take a look at and poke fun at them. There isn't a single dataset out there where you can't find a couple peptides like this that snuck through.

And....we should 100% consider this. There is a lot of routine data analysis out there in the world on really important things, like "how much pesticide is on that celery", where the answer is determined with one fragment ion. Here you've got a high resolution mass and 2 fragment ions? Heck, there's a good chance that MS/MS spectrum above is a good match! Imagine that you built a PRM for this peptide and you picked those two high mass fragment ions? You'd quan off them and move on. (Hopefully they'd be more than 2,000 counts, but I hope what I'm implying is clear).

However, if you work with someone who is going to take a look at the XCorr and think you're a dumbass for letting some fancy semi-supervised thing (Percolator) and some deep learning mumbo jumbo (Inferys) make decisions for you over a tried-and-true statistical model defined in the good ol' days when mass was something we crudely and slowly estimated by painfully ejecting ions synchronously out of a little box using an estimated stability matrix defined by fist-sized transistors, I hope that I've given you a fair warning.  😇

What I really mean is that ALL these things are shortcuts. They're necessary because I couldn't look at every PSM in this dataset in the next year unless that's all I did (and I'd get really distracted after the 4th one). Keep that in mind. Never stake anything important on it until you've checked it out manually! 

And -- this is the best little guide for that ever (this blog post has links to the paper)!!
