News in Proteomics Research: October 2024

Thursday, October 31, 2024

Can Stellar hit "next gen" numbers of protein targets?

I've got a pile of huge O-link population level proteomics studies on my desktop to rant about at some point. However, at some point a blog post has consumed so much of my time that I think "maybe I should float this by an editor to see if I can get real world credit for those hours" and I made that mistake yet again with that one.

I love proteomic discovery work. I don't think we've found anywhere near all the protein post-translational modifications that matter in humans, and that's how we'll find them. Heck, we don't even know how many f'ing proteins there even are. The biologists, however, don't seem to have the patience for these primary discoveries. They want large n targeted studies and they want them yesterday.

You can PRM a decent number of targets at high resolution. A stellar student in my group did 300 targets on a ZenoTOF but that's WAY more than I've personally set up. I don't know if I've done more than 40 on a Q-Orbitrap ever.

Sure QQQs are fast, but in real biological matrices for MRM/SRMs? You need a crapload of transitions and you need a synthetic peptide for every one. There's just too much noise.

An O-link bigtime assay (maybe that's what they call it? something like that) that uses a super high throughput (and increasingly unreliable - some of that new Illumina super multiplex stuff has a load of garbage artifacts in it) sequencing readout can target (and, please remember, target doesn't necessarily mean detect) thousands, and that really hasn't been in reach for LCMS.

Until now? Can the handy dandy super fast little unit resolution ion trap target on that same scale? These people seem to think so.

At first this doesn't seem all that impressive, right? I did nearly all of my PhD work on the new-at-the-time 3200 QQQTrap system. Triple quad with signal boosting trap on the end. Yawn. Same stuff.

My QQQTrap couldn't do >100 scans/second. And it sure as heck wasn't more sensitive than any QQQ on the market today. The trick, though, is pairing really sophisticated control software with quantitative software everyone in the field except for me is really good at using.

The authors take the Biognosys 600-ish targets for plasma reagent kit and demonstrate really good quan even when running 100 samples/day. What's that? 14 minutes run to run? Something like that. But with the number of concurrent targets that can be measured per cycle, it's clear they could add on a crapload more and have the cycle time to pull them off. It'll probably be a question for the community soon: how many points of confidence do we need for unit resolution PRMs? Given the full blown acceptance of tools that predict peptide fragmentation patterns, retention times and ion mobility when available in wide window experiments....I dunno, it's hard to see too many problems with targeted quan without a heavy internal standard for every peptide. I'd betcha two of my kid's newly acquired KitKat bars that the quan is better than the O-linky and Aptamer-ma-bobs.
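For anyone who wants to poke at the cycle time math, here is a minimal sketch. The 100 samples/day and 100 scans/second come from the post; the 6 second peak width and 8 points per peak are purely my assumptions for illustration, not numbers from the paper, and the function names are made up.

```python
import math

MINUTES_PER_DAY = 24 * 60


def minutes_per_run(samples_per_day):
    """Injection-to-injection time implied by a samples/day throughput."""
    return MINUTES_PER_DAY / samples_per_day


def concurrent_targets(scan_rate_hz, peak_width_s, points_per_peak):
    """Targets that fit in one cycle if every target gets one scan per cycle.

    The cycle time is whatever still leaves `points_per_peak` measurements
    across a chromatographic peak of width `peak_width_s` (both assumed values).
    """
    cycle_time_s = peak_width_s / points_per_peak
    return math.floor(scan_rate_hz * cycle_time_s)


print(minutes_per_run(100))             # 14.4 minutes run to run
print(concurrent_targets(100, 6.0, 8))  # 75 targets eluting at the same time
```

That second number is concurrent targets, not total targets. With retention time scheduling the total across a whole run can be far higher, which is the whole point of the argument above.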

Wednesday, October 30, 2024

msqrob2PTM - Normalize PTM data against global proteomics to actually find the important PTMs!

If you take this great new paper and print out any figure you could approach any scientist on earth from a long distance holding it up. As soon as they could see it at all, they'd know - everything in this paper is written in R. Which is totally cool, I'm not making fun. It's just the most R'y proteomics paper I've opened in a while.

What is it? It's a statistically valid way of taking your PTM level peptide data and normalizing it against your whole protein data, so you can get stuff like relative phospho-site occupancy per molecule. Great for those biologists who expect this to be the standard output in a PTM study.
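The core idea can be sketched in a few lines. To be very clear, this is NOT the msqrob2PTM implementation (that is an R package with proper statistical modeling); it is a bare-bones Python illustration of subtracting parent protein log2 abundance from PTM peptide log2 abundance, with made-up sample names and made-up numbers.

```python
def adjust_ptm_for_protein(ptm_log2, protein_log2):
    """Per sample: PTM peptide log2 abundance minus parent protein log2 abundance."""
    return {s: ptm_log2[s] - protein_log2[s] for s in ptm_log2 if s in protein_log2}


def log2_fold_change(values, treated, control):
    """Difference of mean log2 values between two groups of samples."""
    def mean(keys):
        return sum(values[k] for k in keys) / len(keys)
    return mean(treated) - mean(control)


# Hypothetical phosphopeptide and its parent protein (log2 intensities).
ptm = {"ctrl_1": 20.0, "ctrl_2": 20.0, "drug_1": 22.0, "drug_2": 22.0}
protein = {"ctrl_1": 25.0, "ctrl_2": 25.0, "drug_1": 26.5, "drug_2": 26.5}

treated, control = ["drug_1", "drug_2"], ["ctrl_1", "ctrl_2"]
raw_fc = log2_fold_change(ptm, treated, control)
adjusted_fc = log2_fold_change(adjust_ptm_for_protein(ptm, protein), treated, control)

print(raw_fc)       # 2.0 -> looks like a big phospho change
print(adjusted_fc)  # 0.5 -> most of it was just more protein
```

In this toy example three quarters of the apparent phospho change disappears once the protein level change is accounted for, which is exactly the kind of thing the biologists want to know.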

Tuesday, October 29, 2024

Shine up that crappy figure with public domain illustrations from NIAID BioArt!

What an awesome new resource from the great scientific illustration team at the US NIAID!

Check them out here at: https://bioart.niaid.nih.gov/

Shine up your ugly figures with these great illustrations. If you are concerned about what and how you can reuse them, just filter by the ones that are 100% public domain! Which, at least on the protein tab, (which is all that really matters, right?) appears to be every single one of them.

A Don Hunt story (John's Version)?

Several years ago I got to introduce Don Hunt at a packed meeting in Bethesda. Trying to easily summarize the list of accomplishments and pivot points in protein biochemistry where he has played a role? Challenging. Trying to do it so he has time to show new data? Impossible.

If you've got 5 minutes, this perspective by John something or the other can really put some of this into ...pers...pective.... hmmm.... article classifications aren't always so neatly packed into the category with the exact literal meaning. This one is.

Bonus - the tale of the twists and turns in getting Sequest published should inspire anyone out there who has really believed in an idea that hasn't gotten the warmest reviews from peers! Totally fun read.

Also, I really enjoyed reference 31 (I think, read it yesterday).

https://www.science.org/doi/10.1126/science.246.4926.33

Monday, October 28, 2024

Gain of cysteine missense mutations in both disease and healthy(?) human tissues?!?

In shotgun proteomics we generally do our best to ignore cysteines - and especially the super important PTMs that they tend to carry. We reduce the cysteines (losing the PTMs and anything else, like drugs that might bind to them) and then we put in harsh binding chemicals to make sure those cysteines never do anything remotely biological ever again. We assume those 2 reactions occurred with 100.0000000000% efficiency, and we move on.

Chemoproteomics people, however, tend to be very interested in the drugs that bind to active things like cysteines so they have to use other approaches. And....maybe the coolest discoveries of this awesome new paper weren't a surprise while doing some chemoproteomics, but I felt like there was an air of increasing surprise (there was for me!) as I read through it!

The paper isn't a short read because this team did a lot. On the mass spec front, cysteine pull-downs and whole proteomes (which did employ reduction/alkylation) were analyzed on an Orbitrap Eclipse with a nanoLC system. TMT was also used at times.

On the genomics front, cell lines were sequenced and the variant call files were integrated into the database search using a 2 step process with MSFragger through command line. I'm not sure if this was just their typical way of doing things or whether integrating the normal FASTA with the processed peptide variants and controlling FDR the way they did required some fine tuning that is easier to set up outside of the GUI.

These cancer cell lines largely suffer from problems repairing mismatch errors in their DNA. (Deficient in MisMatch Repair, or dMMR). Makes sense, right? Cancer is often a DNA disease. Errors propagate until you've created renegade cells that do whatever they want. Missense mutations typically end up changing one amino acid to another. Why would new cysteines be the most likely outcome?

From a pure codon perspective, it doesn't seem like the most likely outcome! If you are randomly altering bases in DNA you'd think Leucine or Arginine (6 codons each) would be the most likely random occurrence, right? Cysteines are only 2. (Stolen from Wikipedia)
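If you don't feel like counting boxes in a codon table, here is a small sketch that does it with the standard genetic code. The helper name `single_base_routes_to` is just something I made up, and it counts possible single-base substitutions with equal weight, which real mutational processes absolutely do not do.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codon_counts = Counter(CODON_TABLE.values())
print(codon_counts["L"], codon_counts["R"], codon_counts["C"])  # 6 6 2


def single_base_routes_to(target_aa):
    """Count single-base substitutions that turn a sense codon for a
    different amino acid into a codon for `target_aa`."""
    routes = 0
    for codon, aa in CODON_TABLE.items():
        if aa == target_aa or aa == "*":
            continue  # skip synonymous changes and stop codons as sources
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == target_aa:
                    routes += 1
    return routes


print(single_base_routes_to("C"))  # 14 ways to gain a cysteine
print(single_base_routes_to("L"))  # compare against leucine
```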

...but we're talking about the selective pressure of cancer cells....does having more cysteines confer some sort of an advantage? Beyond me to think about, but it sure is weird.

Where it gets weirder is that it looks like missense mutations are also found in the healthy human data they evaluated as well.....again....beyond this blogger to really think about - but it should go as yet another of this huge pile of reasons to question our current ability to target every human protein.

As an aside, I found myself reading between the lines of this one more than I should have, but I could imagine someone doing chemoproteomics of cysteine binding drugs (maybe because I spent a lot of time on sotorasib the last few years) and then finding 800 peptides it bound to that had no presence in any human FASTA database. It sure would justify the time they put into this great thought-provoking study!

Sunday, October 27, 2024

De novo analysis of poly(!!) clonal antibodies from human blood!

I can't seem to get this picture in higher resolution, but the paper is open access here!

I have a list of things in my head that I don't think you can do with a mass spectrometer. Or should do. Or, maybe, if you do it, it's definitely going to suck.

Mixtures of polyclonal antibodies? Definitely on that list. Mixtures taken from human serum? Yeah, good luck with that!

There are a LOT of steps here from the best antibody characterization group I personally know of, but solving the absurd mixture of proteoforms that are present in human serum following a viral infection?

It's hard to quantify KRAS when you know you've got a copy of the WT and one copy of one of the convenient mutant proteoforms on the other chromosome. That's 2 proteoforms on a little (though annoying) protein. mABs class switch and glycosylate and crosslink in weird places if you look at them funny. So you end up with these 10x larger proteins with a big conserved region from all the different variants and then a mixture of craziness from the variable regions way down in the low abundance region!

To do it required all those enzymes above with both bottom-up and middle-down proteomics. An Exploris 240 and Orbitrap Eclipse were both used. I'm a little unclear from the methods, but I'd assume the middle down definitely went on the Eclipse for ETHcD, though it may have been used for most things. Also, this is the first time I've seen an EvoSep used for mAB mapping and I'm totally psyched that it works well for it. (You know, it's kinda tuned for global proteomics and in peptide mapping you want the little tiny peptides and the big ones as well).

And - this is a paper from a company, right, and - GASP - all the data is up on MASSIVE if you want to check it out for yourself. No joke, the 3 papers I had in front of this one to read were a 0/3 for publicly available data, so you won't hear about them here!

Friday, October 25, 2024

Parallelizing the most challenging steps in proteomic analysis on the cloud!

I got this preprint sent to me after my brainstorming on core hour usage for proteomics. I was largely doing that to figure out whether it was worth it to me to spend the time on slurming around for 50,000 cpu core hours I just got access to. What I didn't get into in that post was where the Harvard team found their HPC spending the most time - it was, by far, on match between runs.

In this preprint, this team demonstrates some early results in parallelizing that pain point on the Cloud.

The best figure in the paper is probably the panel at the top. Go to 1,000 files and - yeah - you use a lot of cores but you cut 6 days of processing time to a few hours. Since Clouds (which are just someone else's HPC) tend to do a really good job in charging you for what resources you actually use (because it's a highly competitive commercial environment and if they didn't do it right you'd give your money to someone else) the costs end up working out to just about the same. Same cost, but you get your results back almost a week sooner? Everyone is taking that deal.

Again, very preliminary, but you should be excited because you know someone who would like to talk to you about their 5,000 FFPE blocks for proteomics and you can only avoid them for so long. Pretty cool to know that someone is thinking about a bottleneck you haven't gotten to yet!

Thursday, October 24, 2024

iHUPO release of the upgraded ZenoTOF (+)

I only have marketing stuff to go off of, but the ZenoTOF platform got an upgrade release at iHUPO in the ZenoTOF+. I guess the other one is now the ZenoTOF Classic or the ZenoTOF Lite. You can read the marketing stuff here.

The system keeps EAD and Zeno pulsing, but appears to add a fast sliding(?) quad ramp function.

By rapidly parallelizing the pulse/eject in the zenotrap it looks like the efficiency goes way way up and allows some ridiculously fast acquisition times for high loads as well as impressive coverage of low load samples (top panel). To get to the numbers in that panel they did drop the microflow and moved over to the IonOpticks Aurora 15cm and 150 nL/min.

You know, when you need a QQQ it's made sense for the whole time I've been doing mass spec to talk to each vendor and either visit a demo lab or two or send out samples. How cool is it that it makes sense to do the same thing for global proteomics today? ~~Hopefully this all translates to competitive pricing!~~

Wednesday, October 23, 2024

Get 10-fold higher mass spec imaging resolution by inflating your sample!

As cool as mass spectrometry imaging is, microscope people are quick to point out that 5um resolution (which takes a long time to do and is often the upper limit for even the nicest mass specs) is crappy for a microscope.

There is obviously lots of work in this area, but wouldn't it be easier to just make the sample itself a lot bigger?

No, you don't need some Pym particles or whatever

(by far, my favorite Ant Man scene in any movie)

This might sound like a joke, but sample expansion is absolutely used in microscopy. It was a pretty big deal when it was first done about a decade ago and people still do it. You uniformly add polymer to your sample and that sample uniformly expands, then you look at that. Or something. Obviously not my expertise.

So why didn't we just do this before? Well, polymers and mass spectrometers aren't always the best of friends. You need to find something that causes expansion AND is mass spec compatible and this team seems to have done that here. Super cool backward approach to getting absurd mass spec resolution!

Tuesday, October 22, 2024

HUPO 2024 big hardware surprise? The fastest and most efficient high res mass spec ever?

I couldn't possibly describe my jealousy fully that I'm not one of the 1,800 (!!) delegates at HUPO in Dresden which started yesterday. I can, however, continue to monitor the social networking things and the preprint servers to get a feel for what the big developments are.

One of these dropped yesterday -

- and has already drawn some headlines like this one!

It's really no secret right now that only a tiny fraction of the peptides ionized are making it to the detector. The whole reason the TIMSTOFs are so good is that they are parallelizing their accumulation so that more of the ions being generated are being detected. There are other reasons, but the magic of a TIMSTOF is absolutely the front end stuff. Turn it off and...it's a TOF.

What MOBILion proposes is utilizing a dramatically higher percentage of the generated ions through their SLIM technology with parallel accumulation. Super cool stuff that I'm confident everyone will hear more about soon.

According to blog rules I totally made up but I've stuck to for years and are rambled about over there somewhere --> I am a paid consultant for MOBILion, although no one has ever paid me for writing a post on this blog.

I'm not sure what more I can share yet, but I don't think anyone would be mad if I typed something like - if you're thinking this is a singular prototype system spread across a room with wires and oscilloscopes everywhere - you'd be wrong on 2 counts.

Monday, October 21, 2024

New DIA-NN release! 1.9.2!

Obviously, Vadim isn't going to drop a new DIA-NN build at international HUPO without it being better and faster in a bunch of ways. That probably goes without saying.

For the end user this is the big one for me -- you can look at your visualized evidence on more than just one thing at a time!

You can get the 1.9.2 upgrade here! As always, pay attention to the legalese and stuff!

Sunday, October 20, 2024

Streamlined proteome stability - find those drug on- /off- targets on 30 samples/day!

What a week for proteomic applications!

This came up in a great study I rambled about earlier this week, but drugs almost always target or affect proteins. Hopefully just the one you care about, but off-target effects can and do happen where that drug binds some other proteins.

PISA has been discussed on this blog before and you can probably find it in the search bar, but this and related applications expose a proteome to a drug and look at the proteome effects after treating the proteome at different temperatures prior to digestion.

If your drug is binding to a protein chances are that protein is going to fold/unfold at different temperatures and alter the downstream peptides you quantify. Super cool stuff, right?!?

The problem with just about every earlier study is that it takes freaking forever. Remember that to get a decent coverage of the proteome even 5 years ago could take 1 day/sample. This new study walks through optimization of a bunch of steps and gets to a really solid and approachable method...

...with great throughput! At 30SPD using DIA on a solid and pretty affordable workhorse of an instrument, these authors characterize the on- and off- target effects of 22 different drugs in record time.

The data on drugs we know about matches up really well with older data and adds a ton of credibility to the uncharacterized drugs. If you're just interested in the drug output data, it appears to all be up there on Zenodo here.

Saturday, October 19, 2024

Spatial proteomics identifies an effective treatment for a lethal skin disease!

Holy shit, y'all.... if you're going to read 1 proteomics paper in 2024, this should be that paper.

I'm going to change the formatting of the blog to add a "success stories for proteomics technology" or something over there -->

I haven't before because .....it's not a very long list..... THIS IS ONE.

1) Let's start with one of the most horrific diseases you've never heard of. TEN or Toxic epidermal necrolysis. It's as bad as that last word suggests. Patients are on other treatments - it looks like typically from chemotherapies, but possibly also from other treatments - AND 30% or more of their skin dies and falls off. Mortality rates are high. Skin is an important thing for humans to have all of.

2) Screw the method and approach and the fact that a TIMSTOF SCP and an Orbitrap Astral were used in combination with multiple really good spatial techniques. All cool stuff -

THEY IDENTIFIED A MECHANISM - and it was something we already have approved drugs in the clinic for -

AND THEY CURED PEOPLE! 10 patients?!? I was excitedly reading this on my phone while my kid was digging a hole outside and I'm only getting to type this while he's in the bath.

All the stuff in the middle is important. They did the deep visual proteomics workflow with the TIMSTOF SCP. They derived cells for deep proteomics/phosphoproteomics from limited material with the Astral. What they found in both the FFPE tissues they were analyzing and multiple relevant models was that the disease messes with JAK/STAT. JAK inhibitors cured mice - and then - they worked on people!

Such an inspirational, exciting and beautiful study.....

Friday, October 18, 2024

More evidence the blood brain barrier is a drug metabolizing system!

I've never had a pharmacology class. I started with a book called something like "pharmacology made very very simple for people who are a little slower than average." In that book it is pretty clear that drug metabolism occurs in the liver. You can find similar things by googling "where does drug metabolism occur" like this nice picture from the European Patients' Academy.

So when (now Dr.) Abigail Wheeler hypothesized that toxic effects of HIV antiretroviral drugs were due to drugs being metabolized by cells at the brain and not cells at the liver, she had several tough years to build evidence for that case. She had to quantify metabolism products and use painful targeted quantification to make the case that drug metabolizing enzymes were really present in a lot of places outside the liver.

Fast forward some technology improvements and a couple years of hard work by another young scientist and some helpers and - here is how and where that drug metabolism (and transport of those drugs and drug metabolites) happens at the blood brain barrier!

Again - this is some controversial stuff - so there are pages and pages of validation including western blots and FACS and efflux assays and other words I don't know.

For the proteomics stuff, diaPASEF on a TIMSTOF Flex (later model, so Pro2 cartridge) was used to characterize the cells that make up the blood brain barrier. The files were processed in Spectronaut and the proteomic ruler technique was adapted to generate solid copy number and nM concentration estimates for 8,000 or so proteins. Those numbers are summarized on a nice Shiny web portal which can be directly accessed here.
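For anyone who hasn't run into the proteomic ruler before, here is a rough sketch of the general idea as I understand it: the summed histone signal is assumed to scale with the DNA mass per cell, which gives you a ruler for converting every other protein's signal into mass and then copies. This is not necessarily how the authors adapted it. The 6.5 pg of DNA per cell, the 2 pL cell volume, and every intensity below are assumptions I picked for illustration only.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(protein_intensity, total_histone_intensity,
                    mol_weight_g_per_mol, dna_mass_per_cell_g=6.5e-12):
    """Histone 'ruler' estimate of protein copies per cell.

    Assumes total histone signal is proportional to DNA mass per cell
    (the default is an assumed value for a diploid human cell).
    """
    protein_mass_g = (protein_intensity / total_histone_intensity) * dna_mass_per_cell_g
    return protein_mass_g / mol_weight_g_per_mol * AVOGADRO


def concentration_nM(copies, cell_volume_L):
    """Convert copies per cell to nanomolar given an assumed cell volume."""
    return copies / AVOGADRO / cell_volume_L * 1e9


# Hypothetical 50 kDa protein with 1/1000th of the total histone signal.
copies = copies_per_cell(1e6, 1e9, 50_000)
print(round(copies))                              # roughly 78,000 copies per cell
print(round(concentration_nM(copies, 2e-12), 1))  # about 65 nM in a 2 pL cell
```

The nice part is that nothing here needs a spike-in standard or a cell count, which is a big part of why the approach is attractive for hard-to-get cell types.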

Oh yeah, and I didn't do any of this study, I taught author 1 how to do really good proteomics and author 3 how to write stuff in R, kept the service plans paid and tried (unsuccessfully) to keep all the instruments from being destroyed by floods. Boom - Hannah wrote a really nice story that helps illuminate some serious questions we have about drug toxicity and I have a great new resource bookmarked at the top of my browser.

Thursday, October 17, 2024

Three different retinal degeneration mutations result in the same (treatable?) phenotype!

Need to read something super positive and optimistic today? I strongly recommend this new study in press at MCP that totally made my day!

It's really easy to look at the broad range of different genetic mutations that can lead to a single disease and think.....

Retinal degeneration diseases ABSOLUTELY fall in this category. Check out this associated paper on progressive vision loss in dogs.

Mutations on 17 different stupid genes are known to lead to just progressive retinal atrophy - which is just one of many diseases that cause dogs to go blind later in life.

If you are in drug development in either primary research or for applied for-profit stuff what do the odds of success sound like for a disease caused by at least 17 different things? Can you convince someone to help fund you while you chase targets that may only help a small percentage of those afflicted?

Almost always? No. That's a bad elevator pitch and a worse grant application. In pharma? Start sending out CVs before you ask.

Why this paper is so very very cool is that they took some of the mouse models for progressive retinal degeneration (mutations on different genes!) and looked at the proteins that actually change vs controls. They're the same!

Unnecessary reminder for most people here (good for outsiders, who still can't seem to get this stuff straight)

Genome is genotype, that's what the DNA says, but that isn't what is physically happening

Proteome is often the phenotype (what is physically happening!) (or at least very close and involved in the phenotype)

AND - Nearly all drugs target proteins!

These authors don't miss the point here either. Who cares what the gene is that caused the protein change if you know the protein causing the problem? Not me, not these authors, and certainly not patients. Cause now you've got something to develop a drug against!

Wednesday, October 16, 2024

Plasma or serum - and which one??

Oldy but goodie that I'm putting here so that I can find it when I inevitably need it again.

Let's take this extreme example. I sold my plasma all through undergrad. I'd go in 2 times/week unless I was sick or something and they'd put a big needle in my arm, pull out blood, separate out the plasma and put my blood cells and saline back into my body.

I'd watch cable, try not to listen to the people around me talking, because a lot of poor Americans are really really dumb - you might see this reflected in our politics right now, and I'd leave with $20-$40 based on whatever promotion they were running.

When you buy bulk human plasma from a company - anything greater than about 3 mL - that is probably where it came from.

When you're doing clinical diagnostics that IS NOT how you get the plasma or serum.

You're getting those from not-at-all confusing things like these.

How much would it suck to find a really good diagnostic marker for a disease in plasma harvested from volunteers by a live action plasma separation machine that could not be detected at all when samples are generated from centrifuge tubes? A lot? Yeah, a lot.

Tuesday, October 15, 2024

Revisiting the Harvard FragPipe on an HPC technical note in terms of total time/costs!

I read and posted on this great technical note from the Steen groups a while back and I've had an excuse to revisit it today.

Quick summary - they ran EvoSep 60SPD proteomics on a TIMSTOF Pro2 on the plasma of 3,300 patients. They looked at their run time on their desktop and estimated processing it the way they wanted to would take about 3 months. Ouch.

What they did instead was set the whole thing up on their local high performance cluster and they walk you through just about every step.

It took them just about 9 days to process the data using a node with 96 cores and 180GB of RAM. They do note that they never appeared to use even 50% of the available resources, so they could have scaled back in different ways.

Where I was interested was - if I was paying for HPC access, how many core hours would I be set back for doing it this way? 9 days x 24 hours = 216 hours, and 216 hours x 96 cores = 20,736 core hours, so call it 20,000, right? I know some HPCs track how much you actually use in real time based on the load you're putting on their resources, but others don't. So it's probably at the very most about 20,000 core hours. Which is the estimate that I was looking for when I went looking for this paper.

Not counting blanks/QCs/maintenance - 2 months of run time for a 3,300 patient study. 9 days to process. It's such an exciting time to be doing proteomics for people who care about the biology. And - I'll totally point this out - 60 SPD isn't even all that fast right now! It's a 6 week end to end study at 100SPD!
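Here is the back-of-envelope math from this post as a tiny sketch, so I can rerun it with different numbers next time. All inputs come from the post itself (3,300 samples, 60 or 100 SPD, 9 days on a 96 core node); the function names are made up, and it ignores blanks, QCs and maintenance just like I did above.

```python
def acquisition_days(n_samples, samples_per_day):
    """Instrument time needed, ignoring blanks, QCs and downtime."""
    return n_samples / samples_per_day


def core_hours(processing_days, cores):
    """Upper bound on billed core hours if the whole node is charged."""
    return processing_days * 24 * cores


n_samples = 3300
processing_days = 9

print(acquisition_days(n_samples, 60))   # 55.0 days, call it 2 months
print(acquisition_days(n_samples, 100))  # 33.0 days
print(core_hours(processing_days, 96))   # 20736 core hours

# End to end at 100 SPD, assuming processing still takes the same 9 days.
weeks_at_100spd = (acquisition_days(n_samples, 100) + processing_days) / 7
print(weeks_at_100spd)                   # 6.0 weeks
```

Since the authors note they never used even 50% of the node, the real core hour bill on a system that charges by actual load could be a good bit lower than that ceiling.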

Thursday, October 10, 2024

Use a carrier channel - to reduce(!?!) your boring background!

This smart new technical note does something that I think many people have thought about, but both pulls it off AND methodically dissects it so it's now a completely valid tool to put in our utility belts.

Problem: There are 10,000 proteins here and I don't care about any of them. I care about the stuff after those first 10k.

Traditional solution: Fractionate and fractionate some more and cross your fingers.

New idea - Isobaric tag (TMT is one solution) all your peptides. Then tag (with a different channel) a higher abundance amount of the peptides that you care about.

Perfect application? Infected cells! Even if you've got a super duper bad bacterial infection pretty close to 100% of the protein around is going to be human. But if you label bacterial proteins and spike those in at a higher level you've biased your stochastic sampling toward the bacterial proteins and effectively reduced the host background!

Where this shines is the pressure testing. Smart standards are made and tested and tested. Instruments that can reduce coisolation with tricks like MS3 seem to be the best. Ion mobility (here FAIMS) coupled MS2 comes in second and MS2 alone has a lot of background, but still works.

The proof is divided between a bunch of public repositories. Easier to copy paste than link them here.

Wednesday, October 9, 2024

How much do sample specific libraries help in DIA low input/single cell proteomics?

At first this new study is a bit of a head scratcher, but once you get past the unnecessary nomenclature, it's worth the time to read.

Ignore the DIA-ME thing altogether. I should remove it from the title. Wait - I have a car analogy - just about every review of the Ford Mustang Mach-E is something like "this is a really nice EV, we were just confused about the whole Mustang thing."

DIA-ME is just a name for how literally everyone processes single cell DIA data. We know library free isn't as good as library. And we know that it really doesn't make sense to look for transcription factors in global single cell data. Not even the marketing releases at ASMS have claimed to get to proteins at 10 copies/cell and - oh boy - there are some slide decks from ASMS 2022 that no one has published yet...and not just because I'm reviewing every other SCP paper and limping around punching things while typing anonymous snarky things (I'd rather write snarky things where everyone knows who I am and why). So you run 100 or 200 of your cells on your super sensitive new instrument and you make a library out of that data. Maybe you do that 10 times. Then you analyze your single cells against that library. Works great. Walkthrough here for 2 popular programs.

However - we're all largely doing that because you've got to get 1,000 proteins/cell to get your paper published in a Nature family journal. How much does using these sample specific libraries affect our results and the biological findings?

That's the gold in the method of this paper. These authors painstakingly dissect it with spike ins and different library loads and it's all very telling. They use 5 cell and 20 cell and 100 cell libraries and on and on.

If you're interested you can read it. I'm adding it to my reference folder for later.

THEN - the paper gets cool. Forget the mass spec stuff - this group takes some U-2 OS cells which are one of the best studied cell lines for understanding circadian rhythm (smart! stealing this idea for some targeted stuff coming up) and they hit the cells with Interferon gamma. I don't know how to make the funny greek letter thing.

And - no real surprise to anyone who has seen a control/dose response thing in single cells - they identify 2 very different populations of cells. In fact, the two populations appear to be almost entirely opposite in their response! There isn't as much on this as you might hope from the biology side, but it's still cool. Would we want every single one of our cells to go into a pro-inflammatory response? Probably not! Most adult humans I know are doing everything they possibly can to reduce inflammation whenever possible because that stuff is gross and toxic.

It drives home how important it is for eukaryotic cells that not every cell goes into a full out inflammation cascade when even messed up cells derived from a cancer patient and grown in plastic since 1964(!!!) still exhibit a bimodal response. I was snarky at the beginning of this post, but I think it's both an important and very interesting study, as well as both visually pretty and well organized.