News in Proteomics Research: July 2022

Sunday, July 31, 2022

Ion-mobility fractionation?

Different types of gas phase fractionation methods pop up here every couple of years, but this is the first one I've seen that breaks up the ion density using TIMS.

The results really illustrate how dense the peptide signal is. Remember when we were trying to get to a point where we could fragment 100,000 peptides and that seemed way off in the future?

These authors use extremely narrow ion mobility ranges for multiple injections on a TIMSTOF Pro and crank their protein IDs way up. To demonstrate further utility they use the sum of these runs to generate a spectral library for diaPASEF and improve those results relative to other methods of library generation. Clever idea with solid results worth thinking about.

Saturday, July 30, 2022

FlashIDA -- Real time top down deconvolution and targeting (on Fusion 3 or 4)!

More smart data acquisition things, and this one is for top down!

You can read about FlashIDA here!

It works on the Fusion 3 (Eclipse). I'm going to keep talking about the Fusion 4 until I get over my disappointment of not seeing one released this year. (No inside information, I was just expecting a refresh soon.)

If you're doing top down you should definitely check this out!

Friday, July 29, 2022

Party is over -- guidelines for single cell proteomics!

What does an exciting new field that is rapidly evolving need? Probably a bunch of old people making rules about how and where it should go next! Check this out!

This one is tarnished a little by the fact that they've slipped in some people on this who are younger than I am. I've got a USB drive full of Golden Girl gifs (GGgs) that I thought I was going to get to break out here.

(Edit: If you don't know me and the GG reference doesn't tip you off, and you just see a profile pic of me that hasn't been updated in....a while....this is one of the jokes, fellow kids. I'm well above the median of this list and I may wear shoes to work today that are likely older than some of these authors).

What I'm going to break out instead is a metaphor (or analogy, I forget which is what).

Imagine some prominent aptamer research center somewhere that has been just tearing that field up for the last 20 years. They can make DNA oligomers that can bind to just about anything. In the aptamer field this group is right up there with the 5 best groups in the world. All the grants coming their way allow them to pick up their first mass spectrometer so they can check that the masses of their products make sense without sending them out, and they go with a nice ESI ion trap system. After they get used to running this thing, someone brings in an aptamer binding reaction that doesn't make any sense with their traditional assays and they decide to infuse it on the ion trap. What they see is more than one compound and they can't really tell what is going on, it's too complicated, so someone gets the crazy idea of coupling an HPLC to that ion trap to reduce the complexity. It turns out that DNA oligomer made in that new organism (I dunno) has a big mass discrepancy, so these researchers dial in their chromatography and fragment that ion. Get this: What they see is a bunch of ions that correspond to amino acid masses, and if they were in a perfectly linear chain, the masses only line up if the exact same bond (!!) was broken at every amino acid. Shine up that Nobel Prize, because what they find is that a whole lot of their old aptamer things that were discarded actually have these long linear amino acid chains stuck to them. Unbelievably, they can work out EVERY one of them now because these amino acid chains all fragment the exact same way.

Now we'll step out of this scenario and back to you, strange person who has read this far:

Imagine how excited you are going to be when Science Magazine's Aptamer Research Journal (SMARJ) and other leading journals in the field publish twenty-five papers (18 of them are reviews from the guy who supervised the guy who first ran the ion trap) that drop in a 2 year period. Linear amino acid chain identification by vacuum chamber accelerated breakage coupled to reversed phase liquid based separation of reacted aptamer products (LAACIbVCABcRpLbS_RAP) is even making mainstream news. Oh crap. They've even got software that can semi-automatically figure out the order of those amino acids now and it's in one-word Nature! Boom. Paper. Boom. Paper. It is 2022 and Ion Trap sales are off the charts. Every aptamer researcher in the world needs their own because LAACIbVCABcRpLbS_RAP is the future of medicine and environmental research.

And then all those top aptamer researchers in the world get together and start releasing community guidelines. They even indicate in those guidelines that they know that proteomics exists. Heck, they're even thinking about some of the proteomics data (while, of course, focusing almost entirely on how much better LAACIbVCABcRpLbS_RAP is than the Excel sheets of proteomics data they pulled from the supplemental info of a couple of PLOS papers that are making the rounds in their circles).

Now, assuming that this rambling metaphor has anything at all to do with whatever I started talking about, I should note that this group is still taking suggestions! You can add your community guidelines and recommendations to LAACIbVCABcRpLbS_RAP, wait, I mean, single cell proteomics, at this cool site here: https://single-cell.net/guidelines

Of course, there is absolutely nothing stopping someone who has completed one of the thousands of successful single cell sequencing studies from contributing to these guidelines. Heck, experts from a reasonably mature, standardized, well-accepted, and incredibly impactful field of science might even have some interesting input from their last 20 years of successes into a field that literally didn't exist 5 years ago. And I really have to wonder if all the noise we're making makes us seem like we should type a little less and listen a little more. (The irony of who typed that sentence is, by far, my second favorite part of this entire post).

Wednesday, July 27, 2022

Dysregulated proteins and RNA in Parkinson's lymphoblasts!

I keep going back to this recent study in Proteomes for a couple of reasons.

Obviously, if you're going to study Parkinson's you need access to some cerebrospinal fluid or postmortem brain sections, right? Those are so easy to get that there is a guy I know who will provide you with the latter material only under the upfront agreement that he is the senior author on the paper and he gets to pick the author order. You can do a proteomics or a metabolomics study, make every figure, and you might be 7th author. That's the deal. Go get 30 brain sections somewhere else if you don't like it.

What drew me to this at first was the use of RNASeq and proteomics. Always a draw, and then once I googled what a lymphoblast was (I'd heard of it, give me a break, I just couldn't point it out on a map) then it made me think this group was a little crazy. What do we care about white blood cells for in a neurological disease?

They find a bunch of differences at both the proteomic and transcript levels. And they talk about how this disease does end up affecting other systems you wouldn't necessarily have thought of.

The proteomics was on a QE HF and that data was processed by MaxQuant (I'm pretty sure, details have faded a little). More proteins were found altered between healthy and diseased patients than transcripts at the cutoffs they used, which is pretty cool. Chances are only 1 out of 3 of the changing transcripts actually make it to alter a protein anyway, so the fewer of those to think about the better.

The final reason I kept thinking about this paper was the fact that the data aren't publicly available. Based on the extensive ethics declarations for the study, I had a hunch and I confirmed with the senior author that they couldn't get a release from the medical ethics people to make the files available. As much as we have to be proud of as a field for making our data available....

-- you know, except the few big labs that have decided making a single 1TB zip folder and uploading that still counts as making the data available -- which, it doesn't. If that's you...

PRIDE is amazing but a LOT can happen during an 8 day download. And who has space for a 1,000 GB zip folder that is EVEN bigger after it's unzipped?

What was I....oh yeah!

...human samples have a ton of rules and sometimes someone behind a desk somewhere says that .RAW file isn't leaving the lab, and what do you do about that?

Tuesday, July 26, 2022

Questions about Alzheimer's data being blown up by the media....

If you needed another reason to hate western blots, you should check out this somewhat overstated piece in Science. It's like following Elisabeth Bik's Twitter where she takes apart images.

The media is running wild with this piece, as if Amyloid plaques and large neural insolubility problems in neurological models don't exist, and that is not true at all. I'm definitely guilty of reading a quick news blurb on it and repeating it to some people before thinking about it.

No joke, if you get these insoluble chunks from neurological diseases it is crazy how hard they are to break up so you can digest them. Someone I won't name here underestimated this a couple of years ago and may have set a new world record for EasySpray column consumption. This Explainer piece from I fucking love science puts it in context:

Rather than being a big shakeup that questions a whole ton of things, the best I can tell this should just help clarify existing data. Toss out the assumption that AB-56 oligomer is important and that should help, right? Good thing we've been collecting data on all the peptides that are present, rather than using rabbit blood extracts bonded to the stuff from firefly butts to "measure" one thing at a time!

Monday, July 25, 2022

Protein adsorption loss (one of) the bottlenecks of single cell proteomics!

This new study probably isn't what you think it is. It is still important, but I jumped to some conclusions about it when I saw the title.

What this is:

A nice review of the status of single cell proteomics.

A reminder that we don't have good QC for it. Come on, y'all, we just sorta started doing QC for bulk proteomics!

A review of some things I'd never heard of that might have some sort of application to help us with QC'ing SCP samples

What this is not:

A study that quantifies adsorption loss or provides tips for how to avoid it.

At least it's not another deceptively written "single cell levels" study that will confuse potential collaborators and grant reviewers and lead to messed up expectations of what we're capable of because it completely ignores that protein adsorption loss is a thing.

Thursday, July 21, 2022

SimPLIT -- Streamlined workflow for TMT labeling in 96 well plates!

Need a step by step protocol to break out the microchannels and 96-well plates for TMT proteomics? Maybe you should check out SimPLIT.

What is cool here is how this group is prepping tons of cell culture proteomics in 96-well plates with labeling, offline fractionation (by HPLC) and repooling and running TMT (on a Fusion).

This feels like a study that came together out of a lot of replication and thinking about how to streamline sample prep to the maximum possible efficiency.

One cool thing here is how they're getting their cells lysed with an 8-horn sonicator thing. I've never seen one of these that could sonicate more than one sample at a time and it took me a bit to find it.

If I have a criticism of this study, it is that it kind of feels like overkill on the LCMS side. 12x pooled fractions at 150min gradients (30 hours) is starting to seem like a lot to me. 30 hours per plex means that an instrument operating 24-7 will complete 292 18-plexes per calendar year. For showing off how great your method is from a number of proteins quantified perspective, this is a great setup, and I'm sure that shortening the gradient length or cutting out the 2D fractionation would work great for a lot of projects!
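For anyone who wants to redo that math for their own setup, here is a minimal sketch of the arithmetic. The 12 fractions, 150 min gradients, and 18-plex come from the paragraph above. The function name, the overhead parameter, and the 60 min comparison are my own assumptions for illustration and are not from the paper.

```python
# Back-of-the-envelope instrument throughput for a fractionated TMT workflow.
# Assumes the instrument runs 24-7 with no downtime, which is optimistic.
HOURS_PER_YEAR = 365 * 24  # 8760


def plex_throughput(fractions, gradient_min, overhead_min=0.0):
    """Return (hours per plex, plexes per year).

    overhead_min is a hypothetical per-injection overhead (loading, washing,
    equilibration); it defaults to zero to match the estimate in the text.
    """
    hours_per_plex = fractions * (gradient_min + overhead_min) / 60
    return hours_per_plex, HOURS_PER_YEAR / hours_per_plex


# Numbers from the post: 12 pooled fractions, 150 min gradients, 18-plex TMT.
hours, plexes = plex_throughput(fractions=12, gradient_min=150)
samples = plexes * 18
print(f"{hours:.0f} h per plex, {plexes:.0f} plexes/year, {samples:.0f} samples/year")

# Hypothetical comparison (not from the paper): same fractions, 60 min gradients.
short_hours, short_plexes = plex_throughput(fractions=12, gradient_min=60)
print(f"{short_hours:.0f} h per plex, {short_plexes:.0f} plexes/year")
```

With zero overhead this reproduces the 30 hours and 292 plexes per year above. Any real per-injection overhead only pushes that number down.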

For real, though, great study, great data, cool new acronym!

Wednesday, July 20, 2022

MaxQuant summer school BARCELONA -- September 5th! Register by 7/31!!!

A couple of years ago I planned to finally go to MaxQuant summer school and then a virus disrupted the coolest conference schedule that anyone had ever written on paper. I plan to complain about this until I die.

I can't go to Barcelona, but I'll be watching it on Youtube after it posts.

If you can go you should go, but you need to register here in the next 11 days!!

Tuesday, July 19, 2022

Even more protein products identified from "noncoding" regions!

Every day I feel like we're getting closer to the answer of "WTF are all of these unmatched spectra?!?!" The answers may not be the most fun in the world, and here is a great new example!

I passed by this paper a couple of times because the title just didn't catch my attention, but then it popped up in a different web interface with the abstract graphic. In that graphic you see they did N-terminal enrichment and RiboSeq (!?!) and then proteomics.

RiboSeq is another complicated way for genomics people to get to protein level data without dealing with frustrating mass spectrometrists. Basically you stop ribosomes in their tracks then degrade all of the transcripts in a cell EXCEPT the ones that are stuck inside the frozen ribosomes. Then you get a picture of what was actively being made into an original proteoform. There are actually smart workflows out there for combining proteomics and RiboSeq data and that's what this team used.

To simplify their overall matrix, they used N-terminal protein enrichment and they justify their reasoning better than I could summarize here.

What they find is a ton of weird stuff that can only be meaningfully attributed to regions of the genome that are annotated as "noncoding". Since no one likes a peptide match that comes from the vast majority of the genome (reminder -- supposedly only 1% of the human genome encodes for proteins, because that totally makes sense. Why wouldn't billions of years of selective evolutionary pressure result in 99% stuff that organisms won't use, though -- strangely like 1/2 is complete duplicates and 80% is considered regulatory) they go above and beyond to rule out that other things might be better matches.

What they come up with is a great big pile of things that definitely seem like they are misannotated as noncoding under this context. Now....fair to mention that this is an old cancer cell line and these are strange things, but this still points at some fundamental flaws in the upstream processes that result in those nice and concise FASTA databases that we use.

Sunday, July 17, 2022

Prepper -- Reduce peptide sequence intensity bias!

We know with 100% certainty that some peptides "fly" better than others. We exploit this fact when we're picking the best peptide targets for our best possible targeted assays.

Does this impart some sort of a systematic bias? Sure it does! But what are you going to do about it?

Thanks to this new study you can now learn up a machine with Prepper!

These people are really good at math and write about half the paper in Greek with funny variations in the font sizes used (example):

...which makes the study seem a little on the daunting side. However, there is a Github up here, and my role here might be to be enthusiastic about what a great idea this is so that someone integrates it into tools that I use!

They run through a ton of previously published work and apply Prepper, and results that didn't make quite as much sense now make more sense than they did.

We all bash the transcript abundance to protein abundance correlation stuff, but -- ultimately there are classes of proteins that are going to be regulated primarily by transcription/translation. You'd think more would match up than they do. Some of this might very well be us! (Of course, many proteins are regulated by degradation, in which case transcription levels are largely meaningless in terms of protein abundance.)

Friday, July 15, 2022

Spectral quality overrides peptide scores!

I've got some great people around who are new to proteomics, so I get to forward and think about this fantastic tutorial again.

I don't know when scientists got so obsessed with tiny numbers far to the right of decimal places. I think the BLAST score is probably to blame in biology. This tutorial is fantastic because it reminds you that the mass spectrometer doesn't actually generate p/q/r-values, it generates m/z ratios and intensity values (I mean...that's what you see on the PC attached to it most of the time..). It might be more fantastic because of how approachable it is.

Thursday, July 14, 2022

IsobaricQuant -- Powerful new toolkit for multiplexed proteomics!

Wow. We've been doing multiplexed proteomics for 15 or 20 years. We also have more than 1,000 different pieces of software. Despite this, there has been a shortage of open tools to dig into multiplexed data. IsobaricQuant to the rescue!

1) Freely available at this Github.

2) Has a documentation page up with installation instructions for multiple operating systems.

3) Has smart and powerful QC filters

4) Runs MokaPot!

5) Gets you back to the original data so you can check, by eye, if there is real support for your observations!

Tuesday, July 12, 2022

A panoramic review of human phosphorylation!

I've got to move fast here, but this big thinking study..

...is trying to make some sense of the huge amount of data that we've acquired, as a field, for the location and conditions for observing human phosphorylation.

I've dreamed for years about a system that could replace a guy at the US National Cancer Institute that I got to work with once who knows phosphocascades inside and out (there is only one of him and he's expensive and busy). You go "hey Mike, I've got upregulation of JAK, STAT2, and these three other things." and Mike says "sounds like you've activated Integrin alpha 4, you should look at dimerization with ITGA4 and ITGB3."

Which has always made me think that there are patterns there, right? But we've never taken the 20+ years of studying phosphopatterns in a large sense the way a high operating human brain can do it. Maybe this is the kind of approach we need to get there?

They do observe patterns here, which makes me wish that all the phosphodata on PRIDE was all the same level of quality, but hey I'll take it!

Monday, July 11, 2022

Exploring the journey of wheat stress proteomics!

My few forays into plant proteomics have impressed upon me how challenging that whole branch of life is to work with. For people who haven't tried it, imagine this scenario: Your model organism, most commonly known as ScumBag Arabidopsis, which almost everything is based on, was chosen as a model because it has a genome juuuuust large enough to allow it to stay alive. Plant gene products are also relatively tiny, so it isn't uncommon to have genomes in plants that are 2 or 3 times larger than mammals to sort through, and your results get to be compared back to the barely relevant duck weed plant and its stupid tiny genome.

But -- holy cow, we kind of need plants, and we should probably try to understand things like drought resistance so we can make it the few decades we have while this planet is still capable of supporting life at all. What could possibly tie these words I'm typing together?

This great new review!

The historical perspective timeline might be my favorite part of the review. But the models for inducing stress response through increasing salinity, etc., make it seem like crop researchers are really thinking about the challenges the world will be facing in the very near future. I'll take every little bit of cautious optimism for the future that I can get my hands on.

Sunday, July 10, 2022

Ridiculously complex proteomic effects of the herbicide glyphosate...

(From this original article in the Guardian)

....okay...well, glyphosate is a ridiculous molecule to analyze. I've never gotten it to retain on any chromatography at all. The only way I've ever gotten anything close to reasonable signal on it was by analyzing it in the flow through and using a really great catch all mixed-phase chromatography to retain just about everything else.

What are the odds that the group who did this analysis are just quantifying the wrong molecule?

Here is the report. Hopefully they don't know what they're doing!

....well....scratch that....ion chromatography would....work..... Glyphosate will absolutely retain on IC, and I didn't even know you could do 2D IC, and they used stable isotope dilutions. What a pain in the neck that must be to set up....

Actually the Schutze et al., paper is kind of brilliant. They retain the compounds with a KOH buffer but you don't want to be TurboSpraying that into your 5500, so the plumbing needs to be pretty complex to get rid of it. Hence the second dimension not just being really smart, it is probably essential. Check out this plumbing diagram!

Okay, so someone really really really good at detection of glyphosate just found that basically everyone in the US is pumped full of it. Well....what does it do (besides, you know, kill dandelions and cut the price of soybean production by 70%?).

It is worth noting that the carcinogenic aspects are still somewhat debated, but there has been a flurry of successful civil suits in the last few years from farm workers who have been exposed to a lot and ended up with very similar cancers.

At a functional level, this paper from a couple years ago provides a ton of insight.

They also broke out stable isotope glyphosate from Cambridge Isotopes and injected it into mice. They tracked the small molecules and they did a lot of proteomics using a variant of MuDPiT on a QE Plus system and did both lipidomics and proteomics on top of targeted analysis (it is a really nice study all around).

What they find is a pretty clear mechanism of action....

(that craziness on the far right is stuck to cysteines....)

And these effects, while they can't be ruled out elsewhere, appear to be the most pronounced in the liver and in a set of important detoxifying proteins, which would probably work better without some craziness stuck to all of their cysteines....

On the positive side, if you are doing human proteomics on Americans, you know a new PTM that you can be looking for, particularly in liver samples, though reduction/alkylation might strip that mod off? I'm not sure and it's already 5am and I've spent way too much time on this rabbithole.

Saturday, July 9, 2022

Find UnderRepresented PTMs with urPTMdb and TeaProt!

What's an under represented PTM? Trust me, you probably don't want to know, but...ugh...there are a lot of PTMs out there....

This is actually a cool chart that is the number of publications about some of these under-considered PTMs per year:

Just because we aren't searching for them doesn't mean they aren't there. But how would you get to them? Enter TeaProt and the urPTMdb!

First off, urPTMdb and TeaProt are very different things housed at the same location (https://tea.coffeeprot.com/)

TeaProt is a really clever Shiny App for downstream analysis of quantitative proteomics data.

urPTMdb is a set of databases of proteins with these weird PTMs.

The more I mess with TeaProt (and work out the acronyms) the more I like how clever the output is. There are a lot of downstream proteomic analysis Shiny applications and probably a lot more on the way. What sets this one apart?

The big differentiator here is probably the number of databases you can compare your protein lists against. And if you go all the way to the bottom to the functional gene set enrichment analysis -- you can pull up the urPTMdb and compare your data to that as well.

Imagine this scenario:

What if you've done everything right and you've completed a fantastic proteomic analysis of some disease conditions and you get to the end and everything is the same? I've got a couple of these on backup hard drives that haunt me years later. I didn't skip a step. I had the luxury of time and the right access to resources to do it all right -- and I couldn't find anything to explain those phenotypes.

What if you take that output report and dump it in here.

What if TeaProt and urPTMdb say -- hey, a bunch of those proteins have an F-U-mylation (actually an option on the list). Did you ever think of an F-U-mylation? What if a shift in F-U-mylation actually drives this disease state?

Name another way to get to that data. I'll wait.

In the meantime, the rest of the app is well written and crisp and the graphs are solid and smart.

Friday, July 8, 2022

How to (not) set up diaPASEF!

diaPASEF has been all the rage recently based on really amazing levels of coverage that we've been seeing on both long and short gradients at every meeting (and an increasing number of new papers).

As an operator of a TIMSTOF who might want to start exploring these relatively new superpowers, here is my advice from a focused holiday weekend of getting this all up and running.

Let's get this out of the way first. Newer versions of TIMSControl (released at Halloween ASMS 2021) have the ability to create windows based off of your DDA data.

NO. MATTER. WHAT. DO. NOT. USE. THIS. FEATURE.

For real.

If you do use that feature, you'll get the stupidest DIA windows that ever happened and they use up huge amounts of your cycle time. Real examples!

Now...if you are interested in the peptides that have an m/z of 100-150 and the extremely selective ion mobility isolation range of 0.6-0.95, please ignore what I just typed. I do think that it is possible to see doubly charged dipeptides and some smaller tripeptides in this range. Don't quote me, but I do think that this window would allow you to see such interesting targets such as RK and RAK and RGK, and if that's your thing, ignore me and let TIMSControl build your windows.
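Since I said not to quote me, here is a minimal sketch for checking that range yourself. It uses the standard monoisotopic residue masses for the four amino acids involved. The function name and the extra KK example are my own additions for illustration.

```python
# Monoisotopic m/z for small peptides, to sanity check the 100-150 m/z window.
# Residue masses are standard monoisotopic values in Da.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "K": 128.09496,
    "R": 156.10111,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # mass added per positive charge


def peptide_mz(sequence, charge):
    """Return the m/z of an unmodified peptide at the given positive charge."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


for peptide in ("KK", "RK", "RGK", "RAK"):
    value = peptide_mz(peptide, charge=2)
    in_window = 100 <= value <= 150
    print(f"{peptide}: [M+2H]2+ = {value:.2f}, inside 100-150: {in_window}")
```

By this arithmetic, doubly charged RK lands at about m/z 152.11, just above the window, and RGK and RAK sit near 180.62 and 187.63. Something like KK (about 138.11) would fit. The point about the windows being useless for normal tryptic peptides stands either way.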

What is also important here is that if you did import your data from DDA, you cannot, under any circumstances whatsoever, delete or alter those windows. They are stuck in that method forever until the end of time. Trying to alter or delete those windows will put you into a perpetual loop where TIMSControl will keep reloading that DDA file (which will take several minutes per cycle) and it will happily do this forever. Your best bet is to....if you have a TIMMY...you already know what I'm going to say...

It's time to fire up:

-the most stable

-best written

and by far, the

- most utilized executable in the history of this vendor's mass spectrometry instrumentation...

The STOP ABSOLUTELY EVERYTHING button. (Sometimes people think this is funny that this exists, and considering the absolutely amazing number of steps needed to switch from trapping to nontrapping on a NEO attached to a Fusion 2...maybe other instruments would also benefit from this button).

Now that we've got what to never ever do out of the way. What should you do? Well, you should manually type in your windows and you should use big windows for short gradients and small windows for longer gradients. It looks like it's best to just manually type everything from someone else's paper and just use their gradient conditions exactly.
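If you'd rather generate the numbers before typing them in, here is a minimal sketch that tiles an m/z range with fixed-width windows. Everything in it is an assumption for illustration: the function name, the 400-1200 m/z range, and the 50 and 25 m/z widths are mine, not from any paper or from TIMSControl. It also ignores the ion mobility dimension entirely, which a real diaPASEF method needs for every window.

```python
# Tile a precursor m/z range with fixed-width isolation windows.
# Hypothetical helper for planning only; it does not write any vendor format.


def fixed_width_windows(mz_start, mz_end, width, overlap=0.0):
    """Return a list of (lower, upper) m/z windows covering [mz_start, mz_end]."""
    if width <= overlap:
        raise ValueError("width must be larger than overlap")
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)  # clip the last window to the range
        windows.append((lower, upper))
        if upper >= mz_end:
            break
        lower = upper - overlap  # step back by the overlap, if any
    return windows


# Illustrative values only: wider windows for a short gradient,
# narrower windows for a long one.
short_gradient = fixed_width_windows(400.0, 1200.0, width=50.0)
long_gradient = fixed_width_windows(400.0, 1200.0, width=25.0)

print(len(short_gradient), "windows, first:", short_gradient[0])
print(len(long_gradient), "windows, first:", long_gradient[0])
```

Halving the window width doubles the number of windows per cycle, which is why the narrow windows only make sense when the gradient (and so the peak width) gives you the cycle time to spend.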

Want a step by step guide? This one is great!

Another caution here. It is possible to pull the instrument method out of the .d data files using Data Analysis. I haven't had much luck with these, particularly from studies out of some place in Munich. My guess is that the software that they work off of is a version or ten ahead of the commercial releases. Also, my Flex doesn't respond well to Pro files all the time, so that might be a Flex specific issue.

Aha. There it is! I've had this open on tab 84 on my iPhone SE that is perpetually hot for some reason. This is brand new and from the title you wouldn't guess that it is a thorough diaPASEF optimization study (for plasma).

Their optimization shows that they get much better data tweaking things specifically for plasma because it isn't as convenient as HeLa.

This is obviously a very new toolset and things are developing very quickly and I may have a bug and might just be a jerk. The data you get when using diaPASEF correctly does appear to be great. Honestly, even when I wasted cycle time with stupid windows, my diaPASEF runs still outperformed a QE Classic running clever variable DIA windows from a new preprint out of Slavov lab (n=84, processed with DIA-NN, allowing it to make the library and pick the fragment ion tolerances.)

However, you probably didn't remove ceiling tiles to get slightly better data than a ten year old instrument. Since I threw out singly charged molecules in DIA-NN this was probably unfair to the TIMSTOF and all of the dipeptides it spent time analyzing over the holiday weekend before I put an SOS out to the totally awesome Dr. Florian Meier who helped get me back on track.

Thursday, July 7, 2022

Protein contaminants matter! New libraries and tools for DIA!

What a great run going on at JPR right now! Every time I go to finish my post on TeaCup, I end up distracted by something new that has just dropped that I could really use right now!

This group rightfully points out that while we've got some great FASTA databases out there for contaminants, we don't really use FASTAs for DIA. They also put in a load of work comparing the MaxQuant contaminant database to cRAP to their in-house developed contaminants.

They go the extra mile in using a lot of software in their pipeline and in the tools that they've made publicly available in just about every way you can think of.

There is a nice instructional Github here.

All the original RAW files for the contaminant library generation are here.

The data was generated using an HF-X which is probably worth keeping in mind if you're using other hardware. For example, I've found that while tools like Prosit do an amazing job of matching fragment intensities for Orbitrap spectra, they don't match quite as well for TIMMY devices, particularly for ions of higher m/z where fragmentation and isolation efficiency of the latter isn't all that great due to the lack of a robust CE normalization algorithm and quadrupole limitations, respectively.

There are some really cool examples in this paper of matches that look really good until you compare the data to the new contaminant spectral libraries they've generated and then -- whoops -- that's definitely a peptide from LysC!

Turns out that if the only thing you think of at G.W. is the scammy old super computer thing that skims hundreds of millions of dollars off the federal government every year despite not one published study in about a decade, think again, there is a forward thinking proteomics/multiomics group there worth keeping an eye on!