Tuesday, May 17, 2022

Benchmarking DIA for patient samples! -- and a rant about study motivation and new informatics tools.

 


I hope the resolution of this image turns out a little better than it looks on my screen right this second, because the figures in this new study are fantastic.

The amount of work here is just staggering. We just sent out a paper where I used 5 DDA search engines in an effort to help reviewers feel more comfortable about some peptide IDs I've made, since they've been a little controversial. I also spent most of a week working on an HTML interface so people could see the original spectra, match data, stats, decoy matches and all that stuff. I want everyone who sees these identifications to see the original spectra from which I made them. If I'm wrong, I want (need) to know about it because I'm about to embark on a couple years of work based on these identifications. It will totally suck to find out that I was wrong in 2027. While this is largely just me bragging about how cool the last 3 weeks have been in solitary confinement in my office, there is a secondary point here.

New software is showing up that is dramatically increasing the number of peptides and proteins that we are identifying -- and that has happened before, yo, and it ended up not working out great for everybody when no one could validate those new protein IDs. It sank some companies and some really high profile projects. I'm not saying that history is repeating itself, by any means, but it is definitely making me nervous. (I'm not the only one. I got to hang out with some core lab directors last year and it was a major topic.)

I want more identifications in less time just like everyone else, but I need some sort of confidence boost to go along with those identifications as well. Actually, geez, I'm hoping that I'm just continuing to head into this cantankerous old academic stage of my career, but I actually want to see evidence in mass spectra that identifications are real. Not all of them, that's impossible now, but I'd at least like to be able to spot-check IDs whenever I feel like it -- and absolutely when they are something important.
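A minimal sketch of what that kind of spot-checking could look like in practice, assuming the search results have already been exported as a list of dictionaries. The function name `pick_spot_checks` and the field names `scan`, `peptide` and `protein` are hypothetical and not tied to any particular search engine. The idea is just this: always pull the IDs that matter, then add a random handful of everything else so the spectra can be opened and looked at by eye.

```python
import random


def pick_spot_checks(psms, n_random=10, must_check=(), seed=None):
    """Pick identifications to inspect manually.

    psms       -- list of dicts with (hypothetical) keys "scan", "peptide", "protein"
    n_random   -- how many additional IDs to draw at random
    must_check -- protein accessions that always get inspected
    seed       -- optional seed so the same sample can be drawn again
    """
    flagged = set(must_check)

    # Everything tied to an important protein is always checked.
    important = [p for p in psms if p["protein"] in flagged]

    # The remainder is sampled without replacement.
    rest = [p for p in psms if p["protein"] not in flagged]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_random, len(rest)))

    return important + sampled


if __name__ == "__main__":
    # Made-up example data, purely for illustration.
    demo = [
        {"scan": i, "peptide": f"PEPTIDE{i}", "protein": f"P{i % 5}"}
        for i in range(50)
    ]
    for psm in pick_spot_checks(demo, n_random=3, must_check={"P0"}, seed=42):
        print(psm["scan"], psm["peptide"], psm["protein"])
```

Fixing the seed means someone else can pull up exactly the same scans, which helps when the point is to let other people check the work.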

I don't want to detract from this awesome study that was a huge amount of work and should definitely be published in this prestigious journal and did a lot of really impressive stuff -- okay, but I have no idea what any of these words are.

However, I do know what these words are, and this encapsulates things pretty well for me.


Everyone has different goals and motivation going into a study. I only know how to do one thing with mass spectrometry data so here are my goals and motivations, basically always: I use proteomics and metabolomics of disease states to find markers. Then I try to make a targeted assay for that/those markers OR I give that list of identifications to someone who has only a basic working knowledge of how I made those IDs, but trusts me that every one I gave them was correct (or, at the very least, will be really mean to me if they find a wrong one later that they put work into).

There is no point in my workflow where anything outweighs the cost of a low-quality identification. The MD who brought me this stuff will never say "well, I guess it was worth it to waste two years of my time and these patients who suffered physical pain to provide these samples because those plots were pretty."

Now, that being said, there are other people and other goals and, to be perfectly fair, we've identified most of the easy diagnostic markers. Most of what's left is combinatorial and needs fancy statistics to uncover. GWAS has helped people despite the fact that the data for each patient, in isolation, is difficult to interpret at best, and largely inaccurate at worst. For people trying to use large patient n and machine learning and so on to identify patterns in patient data to make identifications, the negative effects of having low-quality identifications WILL be outweighed by having more data.

In addition, and I have to add this (see the cantankerous statements above), the other groups who will find that these benefits outweigh the downsides are the manufacturers, creators and vendors of instruments and software.

I'll end this rant now, with this statement. THE IMPORTANT PART HERE is knowing which group you are in. If you are in a core lab or collaborative center environment and you need to stand by every identification that goes out your door for your and your team's livelihoods, I bet I don't need to tell you to be a little skeptical about new software that boosts your IDs by 20% with no easy way to see if those IDs are backed by spectral evidence. If you are in the other groups I mentioned above, you are probably okay!
