Friday, December 31, 2021

My favorite proteomic papers of 2021!

What a fucking year, y'all. This might be just one part of the wrapup that no one ever asked for! Despite the fact that this was the least active the blog has been since 2012, I still feel the need to close it out strong. I've probably read more papers this year than in any year of my life, but I've had to focus on things like learning basic biology and figuring out how the hell a cell sorter works so I can better understand my data. 

Enough rambling (not really) -- here are my favorite papers in proteomics of 2021, in no particular order. 

1) 38,000 runs and going strong. The mounting evidence that we're overusing the weakest link in proteomics -- the nanoflow HPLC. Do you have nanograms of protein or picograms of peptides? NanoLC is still critical. But if you've got micrograms of protein, the improvements in mass spectrometers over the last decade have largely made the sensitivity gains you get from nanoflow HPLC redundant. 

2) RAWBeans -- Rapid, near universal, deep insight into your instrument files and performance from a simple and handy little tool. You can get great insight into metabolomics files using the tool as well. 

3) Multiplexed DIA is real. Maybe the name is confusing, since it could also mean multiplexing your DIA windows. In this case I'm referring to multiplexing your samples with tags and running DIA so you get data from multiple samples simultaneously. You can use it with SILAC! Or with two cool new methods that use 3-plex tags: this one in ACS earlier in the year and this more recent preprint. And -- TMT?!? Why not?!?

4) Cue the groans, but I have used AlphaFold2 a couple of times in December. Does it sometimes output some wacky gibberish? Sure! But with color coding to indicate structural confidence it's pretty easy to rule out, and it beats having no structure at all! 

5) MONTE -- a method to get all the materials you could probably want from your cells. Some biological samples are literally priceless. This is the cleanest procedure I've ever seen to make sure that very little goes to waste.

6) GlycoRNAs -- I mostly like this paper because it shows just how much more there is to learn about biology with yet another class of critical new molecules. 

7) Q-MRM might be a bit polarizing, but I think we haven't scratched the surface of the potential this represents for updating 60 year old colorimetric assays used in the clinic today (or...ugh...radioactive ELISAs...) with inexpensive single quads. Hey! I just remembered that I was interviewed about this and I've never seen the article. I'll have to look for it. 

8) I was trying to keep this somewhat vendor neutral, but I do really like these two studies that are definitely not neutral, so I'll give them the same number:

8a) SureQuant-IsoMHC -- stupid levels of sensitivity, selectivity and accuracy in quan for MHC peptides. 

8b) AlphaTIMS -- makes digging through TIMSTOF data intuitive and nearly instantaneous. Data export comes off kind of whacky and I keep meaning to write the authors. Maybe I'll do that now! 

8c) Okay...well...three...this new mass spectrometer has so much that's novel about it that I'm going to feature it here. I'm also supremely impressed by how well this secret was kept. 

9) Inactivating coronaviruses

This is largely me just picking things in the little spare time I have while these files transfer. It was another big year for proteomics, and this is just some of the great stuff y'all have done this year. Looking forward to reading a lot more of your great stuff in 2022! 

Wednesday, December 29, 2021

Multiplexed proteomics of low input samples from a rare eye disease!

This new study might be a textbook example of how to study limited samples and work your way to a biological conclusion. 

How sample limited were they? They got little leftover pieces of eyeball material from patient volunteers following cataract surgery. And they didn't get all of that for proteomics -- they had to share some of it for the microscopy work shown in the really nice picture above (check out the paper, I just snipped it so the resolution dropped).

Samples were S-Trapped, TMTPro labeled, and analyzed by 2D LCMS on an Orbitrap Fusion (2? I forget now, I think it was a 2) using SPS MS3. Interestingly, Comet was used for the data analysis with a "1.25 Da monoisotopic peptide mass tolerance, and a 1.0005 Da monoisotopic fragment ion tolerance." I suspect the wide precursor tolerance is there to allow for incorrect monoisotopic assignments made by the instrument on the fly. 
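A quick sketch of why that wide 1.25 Da window makes sense (my own illustration, not from the paper): when the instrument picks the M+1 isotope peak as the monoisotopic precursor, the reported mass shifts by roughly the 13C-12C difference, which a 1.25 Da tolerance forgives but a tight ppm tolerance would not.

```python
# Illustrative only: the mass error from an off-by-one monoisotopic pick.

C13_C12_DELTA = 1.00335  # approximate mass difference between 13C and 12C (Da)

def isotope_error_da(n_offset: int) -> float:
    """Mass error if the instrument reports the M+n isotope peak
    as the monoisotopic precursor."""
    return n_offset * C13_C12_DELTA

# An M+1 pick shifts the precursor by ~1.003 Da -- inside a 1.25 Da window,
# but hopelessly outside a typical 10 ppm tolerance.
tolerance_da = 1.25
for n in (0, 1):
    err = isotope_error_da(n)
    print(f"M+{n} pick: error = {err:.5f} Da, inside window: {abs(err) <= tolerance_da}")
```

Note that an M+2 pick (~2.007 Da) would fall outside the 1.25 Da window, so this setting only rescues the most common off-by-one case.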

Sunday, December 26, 2021

Mapping the Melanoma Plasma Proteome with MS1 Transfer!


How far can we push MS1 transfer (more commonly called Match Between Runs) when we want to dig deep into the plasma proteome? These authors decided to find out with plasma from patients with melanoma. (I really like the plot above that shows what each plasma depletion technique starts to let you see!)

The base strategy is pretty straightforward. They used something very similar to match between runs (in Proteome Discoverer 2.4) across multiple plasma depletion strategies, plus some work with a cancer cell line digest, to get a bunch of IDs, and then they transferred those IDs at the MS1 level to the undepleted plasma run on 2 hour gradients on an HF-X. FDR appears to be applied to the MS1 matches, but I'm a little fuzzy on that and on the role DIA played in the study. 

Regardless, they come up with a whole pile of previously established markers for melanoma, and further digging brings up several more that look extremely promising. 
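The MS1-transfer idea itself can be sketched in a few lines: take a feature identified in a deep run and claim it in another run when m/z and retention time agree within tolerance. This is a toy illustration only -- the function name, tolerances, and feature tuples are all mine, and real implementations (Proteome Discoverer, MaxQuant) also align retention times between runs and apply FDR control to the matches.

```python
# Toy sketch of MS1 transfer / match-between-runs. Purely illustrative.

def match_between_runs(identified, unidentified, ppm_tol=10.0, rt_tol_min=1.0):
    """identified: list of (sequence, mz, rt) from the deep run.
    unidentified: list of (mz, rt) MS1 features from the other run.
    Returns transferred (sequence, mz, rt) matches."""
    transferred = []
    for seq, id_mz, id_rt in identified:
        for mz, rt in unidentified:
            ppm_err = abs(mz - id_mz) / id_mz * 1e6
            if ppm_err <= ppm_tol and abs(rt - id_rt) <= rt_tol_min:
                transferred.append((seq, mz, rt))
    return transferred

ids = [("LVNEVTEFAK", 575.3111, 42.5)]            # ID from a depleted-plasma run
features = [(575.3112, 42.8), (620.9001, 42.8)]   # MS1 features in neat plasma
print(match_between_runs(ids, features))
```

The first feature matches (sub-ppm mass error, 0.3 min RT difference); the second is rejected on mass alone.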

The files haven't been unlocked as of the time I'm writing this, but I'm sure they will be soon, and they're at ProteomeXchange here

Saturday, December 25, 2021

Sample size comparable spectral libraries enhance DIA data!


Have you ever digested a single "purified" human protein from a commercial source and compared it against something big like the Human RefSeq database? If so you are probably familiar with how global tools for peptide spectral match and protein scoring tend to not scale very well for tiny inputs. 

BioPharma Finder is specifically designed for much smaller inputs for this reason. I've been looking for a good free tool that can do the stuff that BPF can and I haven't had much luck yet, but there are still nearly 1,000 proteomic software tools to check! 

This new study at ACS digs into the gap between small sample input and database size specifically for DIA. By optimizing the database to reflect the sample, everything gets better! 

Friday, December 24, 2021

Comparing 4 phospho-enrichment strategies -- and a rant about whether we always need them!


Before I type about this great new paper comparing 4 different phosphopeptide enrichment strategies, I'd like to ramble about the fact that I've done maybe 4-6 phosphoproteomics studies in the last year and I think I've chemically enriched twice. 

Of course, you can clearly get more phosphopeptides by chemically enriching up front, but I've largely found it to be a waste of both my and my collaborators' time. (My opinion may change completely here if we're talking about tyrosine phosphopeptides, in which case I'll want to break out an antibody for pY enrichment -- or studies where the peptides of interest are super low abundance.)

As best I can tell, basically only mass spectrometrists are impressed with lists of 15,000 phosphopeptides, since no tools exist to perform automated analysis of crappy semi-quantitative piles of phosphosites (please correct me if this has been remedied by someone). If someone is fishing for phosphopeptides, I may just run a lower flow rate on my TIMSTOF using the same gradient. My S/N goes up, and I process my data while looking for the phosphorylation sites appropriate for their organism. If they have a hypothesis, I use the PhosphoPedia to build PRMs for their phosphorylation sites of interest -- and we're off to the Q Exactive to do some targeting.

This isn't just because I'm not very good at sample prep. In a recent study for a collaborator with 3 x 3 biological replicates, I quantified around 5,000 proteins from her mouse organ and sent her a report with ~2,500 quantified phosphopeptides. 2 hr gradient/IonOpticks Aurora/200 nL/min. And this isn't TIMSTOF specific -- we quantified several thousand phosphopeptides from fractionated Fusion Lumos data on tumors and showed they correlated with extensively enriched phosphopeptides from the same samples. 

An increasing body of work is showing that phosphopeptides are in your global data (paper 1, paper 2) -- you just have to look for them, and maybe read your audience. Is your collaborator looking for the biggest list of things possible, or are they looking for an explanation for some biological thing they're seeing? If they're the latter, they're probably looking for a somewhat large change, and maybe a couple hundred or thousand good phosphosites will do it. If they're the former, do a big enrichment and, for good measure, run the samples on an ion trap -- BOOM! 7 million phosphopeptides from 60,000 MS2 spectra. 

(Kidding about a decent amount of that last paragraph.)

Okay, but for real, sometimes you are going to need phosphopeptide enrichment, and there are a crap ton of different strategies. What is the best use of your time? 

We've come full circle! 

Thursday, December 23, 2021

Skyline Batch! Ditch the scripts and get a report!


Skyline can do a lot of different great things. I have probably used it at least once a week for somewhere around a decade to do all sorts of different things with different instruments. Every time I have to use it, I start with a new folder that always ends up full of increasingly desperate attempts as I try over and over again to get the software to create a quantitative report comparing different sets of files. Eventually (sometimes) I'll get the seemingly random set of magical conditions all aligned and I'll end up with a table! 

Okay, so what if someone super talented decided to actually fix this rather than wasting hundreds of hours over a decade screaming and occasionally breaking keyboards? 

Meet Ali Marsh and Skyline Batch! 

Skyline and Intuitive?!?!?

The paper is just a couple of pages. What you probably actually want is these webinars and tutorials! 

Wednesday, December 22, 2021

A nice clear single cell proteomics protocol!


Having trouble keeping your mPOPs straight from your nPOPs? Or maybe you get your ID-DARTs mixed up with the legitimately smart ideas that have come from the 14 big "single cell proteomics" papers you saw this year -- the majority of which, you can't help but notice, didn't do any proteomics on actual single cells?

(Don't worry, it's just like diluting a commercial tryptic digest you can get from any instrument vendor.)

Tuesday, December 21, 2021

Comparison of digestion methods for aquaculture species!


Aquaculture is one of the fastest growing industries in the world, and we'll soon hit the point where the majority of aquatic products consumed will come from these farms. There is an interesting 20 year summary here

With any move of new species to mass production, new and smarter ways to monitor what is going on are going to be absolutely critical. What a great place for streamlined high-throughput proteomics! Sequence that mussel or oyster's DNA with 4,000x coverage if you want (I'm sure someone will) -- it's not going to give you insight into how that organism is responding to stresses. 

This group took a look at some aquaculture species when proteomic samples were prepared with FASP, S-Trap and SP3

The samples were analyzed with a Q Exactive HF to see which method they'd use for monitoring aquaculture stresses. I should note that this appears to be more exploratory than what would make a good "is that oyster farm stressed out?" assay, due to the use of 80 minute gradients and 120k/30k resolution scans. It should also be pointed out, as the authors themselves highlight, that these results might be affected by the not-entirely-100%-comprehensive protein FASTAs for the organisms they studied. 

In general, though, all 3 methods perform well and they all have peptides that are not caught by the others. For characterizing what the authors wanted to, FASP seemed to have a narrow lead over SP3, with S-Trap just behind, but S-Trap resulted in the highest number of unique proteins identified. 

A good summary is probably that people use all 3 methods for good reason and what really matters is consistency. A big part of my first year back in academia has been collecting as many digestion protocols as I can find that people have inherited from grad students long gone and 

(not that, but I've definitely thought about it when I've seen some that were clearly written before my time.... ) 

trading them for the S-Trap quick card protocol. Maybe there are peptides that we're missing that SP3 could catch, but I absolutely prefer the thought that the 2 S-Trap 96-well plates I'm doing digestions with today will look just like the digestions from the S-Trap spin columns that a collaborator used 6 months ago. 

Tuesday, December 14, 2021

Another big resource -- libraries for human renal carcinomas!


Wooo! Big practical instrument-agnostic resources! 

This well executed study (FASP is still out there!) takes a serious look at kidney cancer.

I highly recommend you read this short study if you can deal with the weird web viewer this publisher loves so much (seriously, Wiley, stop spending money on this -- do a survey, I bet 99.9999% of respondents hate this thing and anything like it). 

But what you might really want is all these resources up on ProteomeXchange. This is an early release, and I'm sure it will unlock very soon! Until then you can use these credentials (not sure why my blogging window just freaked out when I tried to paste them)




Monday, December 13, 2021

Omicron (B.1.1.529) protein sequence FASTA and PROSIT spectral libraries!

Want a FASTA for Omicron? Download the NCBI data packet here

Want a PROSIT generated spectral library? You can download one made off of the NCBI data packet here. Overview of settings and how I made it are below. 

Permanent link to zip file on my Google drive in case PROSIT online hosting isn't forever. 

This may not be perfect, but it's more than you'll find in less than an hour on Google. 

Disclaimers are over there. -->

I feel some guilt for what I initially thought about the person who contacted me asking if I knew where to get the Protein FASTA for the SARS-CoV-2 Omicron (B.1.1.529) variant in the news. It's been two years and there are 4,000 funded programs to make these sequences available. Obviously it is super easy to find these new sequences. Right???  

Then what I thought would take me from 3:30 - 3:35 AM (mostly finding that gif that shows you how to type things into Google) took a whole lot longer. 

Big shoutout, in particular, to this team who made a protein FASTA and in their rush to have the first paper on biorxiv didn't have time to make anything useful downloadable. 

After about an hour of looking at pretty and flashy new sites completely dedicated to SARS-CoV-2 that are mostly news bites about how much they've contributed to the pandemic, and thinking for the 7 millionth time that I need to get my swearing back under control before my kid's first word is an f-bomb, I went back to exactly where I got my first SARS-CoV-2 FASTA 2 years ago: the trusty ol' NCBI. The format of the site hasn't changed since I first used it for a class project in the 1990s, and the search bar hasn't improved in its ability to find what you're looking for, but -- as always -- what you're looking for is there. 

I made the PROSIT peptide.CSV input with EncyclopeDIA using these settings 

Then I did deep learning magic using the PROSIT online server using these settings. (More thorough instructions for using both of these tools here). 

You can download the PROSIT spectral libraries for Omicron here. 
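For a sense of what that peptide.CSV step produces, here's a minimal in silico tryptic digest sketch. This is my own illustration, not EncyclopeDIA's actual logic; the header row follows the CSV input format the PROSIT web server documents (modified_sequence, collision_energy, precursor_charge), and the charges and NCE here are illustrative defaults rather than the settings I actually used above.

```python
# Illustrative: in silico tryptic digest of a protein, written out as a
# PROSIT-style peptide CSV. The test sequence is the SARS-CoV-2 spike
# N-terminus; parameters are generic defaults, not my actual settings.
import re

def tryptic_digest(protein: str, min_len=7, max_len=30, missed=1):
    """Cleave after K/R (not before P), allowing up to `missed` missed cleavages."""
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + missed + 1, len(fragments))):
            pep = "".join(fragments[i:j + 1])
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return sorted(peptides)

def prosit_rows(protein, charges=(2, 3), nce=27):
    """Emit CSV rows in the PROSIT input layout, one row per peptide/charge."""
    rows = ["modified_sequence,collision_energy,precursor_charge"]
    for pep in tryptic_digest(protein):
        for z in charges:
            rows.append(f"{pep},{nce},{z}")
    return rows

for line in prosit_rows("MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTR")[:5]:
    print(line)
```

Real tools also handle modifications (e.g., carbamidomethyl on C) in the sequence string, which this sketch skips.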

Sunday, December 12, 2021

TIMSCONVERT -- SIZE CONSCIOUS Universal Formats for TIMSTOF files!


I have a single imaging file from our TIMSTOF Flex that contains 100,000,000 spectra. 100 MILLION spectra. Try doing a lot of stuff with that! THEN consider that it's just one of the control tissue slices. You've got 4 more of those + 5 drug treated. It makes my complaints about LCMS data file sizes seem a little stupid.

TIMSCONVERT to the rescue??  

Paper here.

Github here

Wednesday, December 8, 2021

Great free proteomics course through YouTube!


Woo! Easy blog post! I've been meaning to add these videos to the resources for newbies page over there --> somewhere for a while, and they just came up in an app on my phone somehow.

This guy has been putting up free lectures on proteomics for years on YouTube! 

If you can get past his weird accent 😉 it's an absolute gold mine for both people getting started or for catching up on things maybe you haven't thought about in a while, or for finally getting an infant to go to sleep. 

Direct link to these here

Tuesday, December 7, 2021

Conspicuous in absence -- where is the easy and obvious SOMASCAN experiment?


If you're in a big biology or medical institution you probably heard the sigh of relief when this paper dropped.

"Finally", they say, "FI-NALLY we're free of the tyranny of those cantankerous vitamin D deprived mass spectrometrists! Proteomics has been wrest out of their pale and shaky hands and we can do real science with it!" 

Honestly, you can't blame them. I certainly can't. I submitted to a core proteomics journal recently after several years away, and I had forgotten the culture of snide comments and borderline trolling that is permitted in the two "big" proteomics journals -- something that is somewhat unique to this scientific community. That's on me, and that's a different story entirely. 

The story of the moment is the easy experiment and why it hasn't been done yet. SOMASCAN is an aptamer based proteomics technology. Aptamers are single stranded nucleotide chains that can be used to detect proteins or pollutants, or just about anything, really. 

This study is the first massive use of the technology on an extremely well-characterized population. If you aren't familiar, deCODE was a huge thing. Iceland is pretty isolated and there is a really interesting method for keeping track of heredity in the absence of records -- the unique use of last names. The suffixes "son" and "dottir" can be used pretty much without fail to track family lineages back through time with or without solid record keeping. 

deCODE started back in the 90s when some dude from Boston decided to just go and sequence a ton of people, kicking off a massive genome sequencing project on this population. This group did amazing things to bring in the genomics revolution. I'm a big fan, and we all should be. The point of the background is that when these people do stuff, the rest of the world listens.

All of a sudden, SOMASCAN isn't just some isolated weird company that spends more money on buying Google Ad space than on testing that their panels are quantitatively accurate. They're now a huge weird company -- one the world knows about largely because they spend more money buying Google Ad space than testing that their panels have any sort of quantitative accuracy -- that has just pulled off a 5,000 proteome project with one of the biggest names in medicine in the world. Now, I have a bunch of US government COI stuff, so I can't even invest in Thermo, despite their recent diversification into the highest priced percolating coffee makers on earth.  

So feel free to ignore me, as always. I couldn't invest in the exciting new wave of proteomics technology like this one even if I wanted to, but I love to talk about proteomics so much that I'll even answer the phone when people representing big capital management firms call to ask me about stuff when I'm driving. SOMASCAN has been the topic of a lot of calls like this and while I certainly wouldn't tell anyone how to spend their money, this is my reservation. 

Here is the thing. This isn't new tech. It's been around for years now, and the experiment that SOMASCAN needs to do to make me and every other naysayer out there shut up isn't hard or all that expensive. It's easy, relatively inexpensive, and now, at least 5 years in, the fact that it hasn't been done should give you Elizabeth-Holmes-knows-about-that-USB-drive-you-buried-off-the-Coyote-Trail-in-San-Jose level chills. Maybe Katie Holmes? Probably also scary. 

Here is the experiment that hasn't been done. 

Run samples with SOMASCAN. Run the same samples with LCMS. Compare the results. Fans of this technology will cite work that has been done. Sure, there is stuff, but LCMS based proteomics comes in many different flavors. There is the truly quantitative stuff and there is the kind-of quantitative stuff. For most global proteomics out there, particularly as you travel back through time, it's the kind-of quantitative variety. It wasn't that it couldn't be quantitative. It was more that proteomics was used to cast a wide net to look for things to then go after quantitatively. Good, accurate, quantitative mass spectrometry is BOOOOOOOORING. You need standards and controls and you have to think about %CVs and LODs and LOQs, and I don't want to do that on 8,000 protein targets. I want to do something kinda quantitative on 8,000 proteins, then I want to look at the evidence for the cool ones for a month, and then pick 1 or 10 to do the boring stuff on. 

That's just me. There are nerds out there who basically ONLY do the real quantitative stuff. A bunch of them have congregated in Seattle for some reason, but I worked with one in Pittsburgh recently, so they are spreading. Here is the experiment: have someone well established in translational or clinical mass spectrometry run those same samples using a quantitative pipeline. Compare the results. This isn't a "gotcha!" There are clinical mass spectrometry assays that have been blessed by the American Society for Clinical Pathology (ASCP) and CLIA and are used to diagnose patients. The error bars and CVs are well controlled and well understood. 
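Those "boring" metrics are at least cheap to compute. Here's a minimal sketch of percent CV across replicate measurements; the 20% cutoff and the peak areas are illustrative numbers of mine, not a regulatory threshold, and real assay validation involves a lot more (LOD/LOQ, calibration curves, QC samples).

```python
# Minimal sketch: percent CV across replicate measurements.
from statistics import mean, stdev

def percent_cv(values):
    """Coefficient of variation as a percentage (sample standard deviation)."""
    return stdev(values) / mean(values) * 100

replicates = [10200.0, 9800.0, 10100.0]  # hypothetical peak areas
cv = percent_cv(replicates)
print(f"%CV = {cv:.1f}, passes illustrative 20% cutoff: {cv < 20}")
```

Nothing fancy, but it's exactly the kind of number a quantitative comparison between the two platforms would have to report for every analyte.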

Again, yes, there have been LCMS to SOMASCAN comparisons. Have they been designed well to be quantitative? If so, I'd LOVE to see one, because I haven't yet. That isn't sarcasm. If SOMASCAN works, I would literally use it. Mass specs are an expensive pain in the ass, and if we're just quantifying proteins I do not care whether I use a mass spec. It's the alternative proteoforms and PTMs and splicing events that we need LCMS for. 

This many years in, though, and without that experiment, it's gone way beyond suspicious.

All that being said, I think this effort by the deCODE group is pretty cool. 5,000 proteomes? Even if the data is lousy, this is a group that makes sense out of GWAS data -- pretty much the crappiest "-omics" data you can get out of anything except a $510 drip coffee maker -- and they've done great stuff with it over the years. (GWAS, not the coffee maker, though maybe both.) And science is largely about resources. Mass spectrometers require a LOT of electricity, and Iceland has a well documented lack of access to stable and affordable power, which has something to do with all the volcanoes and geothermal stuff around. I think it knocks the power out all the time or something, or makes it a lot more expensive. I might have part of that mixed up. Either way, they got these beautiful plots like the one at the top of this post, which I expect was met with either rage or projectile vomiting in certain facilities in Seattle. 

Just because protein quantitation doesn't meet the criteria of classically trained analytical chemists doesn't mean that the biology isn't real. (Something is very wrong with that sentence; I'm not sure what.) My problem with this approach is that these results could very easily be validated or supported with classical analytical techniques, like the ones that are so precise the FDA lets medical technologists in hospitals use them to determine what is up with a patient. 

It would be super easy. Please don't interpret this as me saying someone is hiding something. I ain't saying that someone is hiding something. But it's pretty weird.

As a final statement that requires more time and exploration: this isn't the only "next gen" platform aiming for proteomics. There is another one coming and, this is important, the two do not appear to agree..... Which one is right? Are either? I don't know, but it sure wouldn't be all that hard to figure out.

Saturday, December 4, 2021

ProMetIS -- an R package to combine proteomic, metabolomic and clinical data!


Need a way to make sense of overlapping proteomic and metabolomic data in a phenotypic context? Me too! 

ProMetIS is a well-thought-out solution for R mechanics. 

If you don't have your 10,000 hours of R under your belt, this still seems doable because of all the data this team has made available. I feel like I could (eventually) take my data, work backward through their steps, and get to useful results. And with their data sitting here beside me, I can figure out the source of my problems.

You can read about ProMetIS and check out this cool dataset here

Friday, December 3, 2021

SpectroNaut vs DIA-NN vs MSFragger for diaPASEF -- a user perspective!


Are you using fancy TIMS based DIA proteomics and want to see how some of the tools compare? Look no further than this great new preprint! (They do some DDA as well.)

Thursday, December 2, 2021

New software takes glycan structure to the next level!


I'll just leave this here so I can remember to check both of these out later! 

For context, if you've got some glycobiology people around, sometimes just saying "there is something with 4 HexNAc and 1 NeuAc here" isn't enough to answer their question. The order of those sugars and which isomer each one is can be critical information for them. Maybe these new tools can get to that information? 

Wednesday, December 1, 2021

The growing need for controlled access models for proteomics (and metabolomics)...


EEEEEEEK. Last year I helped trick like 30 different people into helping us reanalyze something like 1,200 CPTAC files to look for PTMs and mutations. (If you're curious, it is open access here.) Partway through, we decided it would be great to get the genomic sequences from those tumors, because the NCI had the exomes and transcriptomes from these patients. You can't just download that stuff. 

You apply for access and..... 

.... you............wait..........................and maybe you get it.....maybe you don't................but you wait either way............. 

Genomics people are used to this! This is how it goes. We're used to going to ProteomeXchange and getting every file that we want as fast as our internet connection will allow. 

An increasing body of evidence is building that we probably shouldn't be allowed to do things the way that we do now, and this short story in Nasty Comms is another reinforcement. For now, I think most people in the scientific community mostly forget that proteomics exists. With some increased recent interest, I think we've gone from being forgotten for months or years at a time to maybe just days or weeks at a time. It's easy to blame those of you that have nice labs with windows, rather than the classic subterranean dungeons where giant boxes go in every few years and we largely stay out of sight and mind, but it's probably the science. 

If you can identify a person and their traits with technology A and technology B and doing it with A is both illegal and unethical, you really have to stop and think about how you handle technology B. 

Metabolomics might surprise you a little, because it isn't quite as straightforward to identify a person (if you can, I don't know how), but you can get SO much information about them. The way that I try to impress people into letting me do metabolomics is running some of their precious clinical samples that they have all the information on and then sending back a focused report of drugs and drug metabolites. If you go in blind and show some MDs what drugs their patient was on at that blood draw, you get instant credibility. The flip side is that you can easily find that information in basically any untargeted metabolomics study. A fun one we just published came from a collaboration with a group that has done lots of work associating schizophrenia with cannabis use. Global metabolomics on their cohort strongly suggests that their method of screening the control group (giving them a questionnaire about their drug use) has the weakness that people sometimes lie about using drugs. Even worse, sometimes LOTS of people lie about using illegal drugs, so there is a bunch of data on a huge historic cohort that may need a revisit or 12. 

It will be interesting to see how data access continues to develop for us going forward, but I predict that 10 years from now it won't be nearly as easy as it is today. I also suspect a lot of us will learn how HIPAA secure data storage systems work....