News in Proteomics Research: December 2022

Thursday, December 29, 2022

Aptamers, capillary electrophoresis, or MS/MS?

You CAN have it all, as proven in this study that is so new it is published in 2023!

What this group shows is that with well designed aptamers (DNA things that are designed to enrich other things -- legitimately you can enrich everything from environmental pollutants through proteins) they can enrich and almost directly analyze what they enriched!

Why?!?!

You've got me. Actually, I think that most aptamer assays use fluorescence, so putting MS/MS on the back end would allow you to QC that aptamer to see if it really is grabbing what you want it to. In the example they show, it looks like their aptamer grabs a couple of things, and by MS/MS signal they can work out the ratios of those things. Having an integrated system for QC'ing aptamers or doing quantification on enriched molecules from multi-purpose aptamers sounds a lot easier than digesting each aptamer individually, so this seems like a smart/weird application.

The MS instrument used was a TripleTOF 4600.

Tuesday, December 27, 2022

MS-DAP -- Another powerful data interface for proteomics!

Well....you know....sometimes science is so awesome that if you have to pull a couple all-nighters in the lab during the holidays it is SO SO WORTH IT. And sometimes it just makes you wonder if it would be more fun making a lot more money running a backhoe. My kid would definitely think I was cooler if I ran a backhoe -- when in doubt, get him another toy asphalt roller -- and I did get to operate one and it was sort of awesome.

HEY! I haven't read a paper in like 2 weeks, except for the ones I'm reviewing and I can't talk about those yet. And this one has been in the queue!

Wow. That title is super long. At first glance I assumed I was a coauthor on this. Still worth it.

Look, there are a bunch of ways now to QC your proteomics data and to process it.

For anyone dreaming of a free interface for DIA-NN data (!!!!!) you're in business!

Now, MS-DAP isn't as easy as something like LFQ-Analyst (which dropped a new version at HUPO that accepts FragPipe data! which, I guess if you run DIA-NN through FragPipe, you're probably good to go!) but the instructions are super clear and you can set this all up locally after you follow all the friendly instructions!

Monday, December 19, 2022

The ProteomicsShow holiday special pilot episode, featuring Dr. Lindsay Pino!

We planned a surprise that dropped yesterday, but I was at a family party and forgot to announce it.

Background: Ben and I recorded some stuff before tricking US HUPO into backing THE Proteomics Show, so we released the very first thing we recorded -- an hour long chat with Dr. Lindsay Pino.

The downside of talking with Lindsay for an hour is how uncool you feel afterward.

You can share this feeling of uncoolness by finding the episode wherever you get podcast things!

Sunday, December 18, 2022

DirectMS1 is back and compares favorably as a chemoproteomics tool!

Metabolomics people get away with working out networks and stuff often by using MS1 based quan. Sometimes at unit resolution. They have smart networks for it like the Momichaug or mummichaug or mamachug or whatever it is called (let's go with MC)

If you don't know about it, you should, it's amazing. Here is my dummy's version.

What if you're doing metabolomics and you find 3 really cool molecules that show up at much higher levels after your drug treatment (increased relative abundance)? You're confident on the mass and the retention time and that there is more of them, but your identifications are sort of wobbly.

Ion A could be molecule 1, 2 or 3 in your library

And Ion B could be molecules 4 or 5

And ion C you're pretty darned sure is molecule 6, but maybe 7

What MC does is take the pathway information into account. If one of the canonical pathways in your organism involves direct links between molecules 1, 4 and 6, then if you have more of #1 it would be weird to NOT have more of 4 and 6. So it says -- "yo, your molecules are 1, 4, and 6, duh"
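Here is a toy sketch of that idea in Python. To be clear, this is NOT the real MC algorithm (which does proper statistics on pathway enrichment). It just brute-forces every combination of candidate IDs from my dummy example and keeps the one where the most molecules land in the same pathway. The pathway memberships are made up for illustration.

```python
from itertools import product

# Candidate library matches for each ion (from the dummy example above).
candidates = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7]}

# Hypothetical pathways: sets of molecule IDs that are directly linked.
pathways = [{1, 4, 6}, {2, 9}, {5, 7, 10, 11}]

def best_assignment(candidates, pathways):
    """Pick one candidate per ion so that the most picks share a pathway."""
    ions = sorted(candidates)
    best, best_score = None, -1
    for combo in product(*(candidates[ion] for ion in ions)):
        # Score = largest number of chosen molecules found in a single pathway.
        score = max(len(set(combo) & pathway) for pathway in pathways)
        if score > best_score:
            best, best_score = dict(zip(ions, combo)), score
    return best, best_score

assignment, score = best_assignment(candidates, pathways)
print(assignment, score)
```

With these made-up pathways the winner is A=1, B=4, C=6, because all three sit in one pathway together.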

That wasn't what I was talking about. I was talking about DIRECTMS1 again!

(from this great team that I truly hope is doing okay)

Thanks to this new paper!

What's DirectMS1 again? Oh, that's where you identify your peptides from high resolution MS1, reassemble those peptides into proteins, take into consideration the quan of the peptides in the confidence of the protein quan and do proteomics really really fast (5 minutes in this Q Exactive HF(X? I forget) example).

See the parallels that made me think of a well established Metabolomics tool?

What this team did this time is compare MS1 results of drug treated cells to chemoproteomics data acquired in the more typical and dramatically slower MS1 + MS2 acquisition over hours of time.

Could it possibly compare?

(I mean...I wouldn't be up at 4:30am typing poorly while waiting for THE SLOWEST ESPRESSO MACHINE IN THE WORLD if it didn't work...probably....)

It does really impressively well. The paper is open, you should check it out. Yes, there are proteins/pathways that using 100x more LCMS time did find that DirectMS1 did not, but you don't see real disagreement here. The paper is short and does get a little confusing for me toward the end (see the lack of caffeine/time thing) but what you also get is some optimization to speed up/improve the sample prep time as well.

Yo, question everything.

Saturday, December 17, 2022

Derivatization of small peptides allows MALDI peptide sequencing!

The lasers have gotten better, but otherwise MALDI is sort of in the same place it has been for a long time, particularly if you're interested in peptides and proteins.

This is a neat trick to derivatize small peptides so that they're out of that low mass range where all the junk and matrix hangs out. If the tech can't improve further, maybe what we need is smart things like this on the sample preparation side to get us past the expensive pretty picture stage!

Friday, December 16, 2022

Surpass the high mass limits of proteoform analysis with integrative top-down proteomics!

If you are a subscriber of the Analytical Scientist you might have gotten an early holiday present when you saw one of our own has a big time article in this big subscription popsci magazine! While no photo could capture the true charisma of Dr. Neil Kelleher, it's still a pretty great covershot. If you have 7 minutes (which is how long my phone thinks it takes, which is an...interesting...metric for this new MacIntosh phone update), I strongly recommend taking a look at the ambitious goals laid out in the article.

While I could fill this blog (and largely have) with Dr. Kelleher's successes toward his futurist take of where proteomics could and hopefully will be, today we're all a little intact mass limited.

I don't have any of the coolest top down tech today -- at least the ones that people have developed applications on -- so I'm a little out of the loop of what can be done. However, when I look at the best proteoform resources, the proteins still seem to predominantly be below the average mass of those in human cells.

What if there was a technique that could shatter these mass limits for proteoform analysis RIGHT NOW? Forget charge detection and its 5x mass improvement. I'm talking 20x proteoform size distinction on a benchtop for just about everyone?

Check out integrative top down proteomics!

In this approach proteoforms can be separated in two dimensions allowing rapid selection of quantitatively interesting proteoforms that can be identified on virtually any instrument.

Here is a visualization of this 2-dimensional separation technique

See those red arrows (click to expand). That's 250kDa! Proteoforms are first separated by their electrophoretic mobility (dimension 1), then by their intact MW!

Then you can cut the ones out that are differential, digest and analyze them!

Thursday, December 15, 2022

Need more speed? Experiments with parallel accumulation on Exploris!

I've been cruising along with blazing fast scan acquisition rates thanks to some TOFs here with ion accumulation technology stuffed on or in them. What if you could push an Exploris up to similar speeds?

Check out this poster making the rounds on Mastodon!

Orbitraps have 2 traps. C-trap and....Orbitrap...and overhead from both due to ion gating and transfer. What they did here was do some parallel accumulation to concentrate ion signal and get the efficiency way way up.

What's WAY WAY? The image above shows 100 Hz. 100 scans/second ON AN ORBITRAP? There are some obvious consequences in this proof of concept, like less than 2,000 resolution at that speed. At the more stable 75 Hz, the resolution doubles to about 3,750 at 200 m/z.

Tuesday, December 13, 2022

Is using formic acid in your buffers stupid? Get 2x more signal with acetic acid!

I'm not going to name any names, but it sure wasn't MY idea to use formic acid. I'd never even heard of the stuff before I got hands on my first LCMS system. I have some vague impression that you can use it to kill ants. This group probably made a mistake and accidentally tried a different acid that doesn't break down in light at room temperature and -- BOOM --

2.5x more TIC signal???

Check with your vendor to make sure your system and pump seals and stuff are all compatible with this. You don't want to be on stage talking about a big buffer optimization study for endogenous peptides you got funded and the first "question" from the audience is a vendor rep informing you that most of those buffer additives will void the warranty on your system. About 120 people saw this happen once and it was really funny. If they aren't, definitely make sure you keep buffers around to swap out whenever someone comes on-site. The buffer containing DMSO should NEVER say DMSO on it. It should say "He sparged" or some other made-up phrase.

Monday, December 12, 2022

Tutorial: Adding common contaminants to your DIA workflows!

Yikes! Okay, y'all. I'm seeing way way too many DIA experiments without any kind of contaminant libraries being used. Not throwing out any names, but this goes all the way to spectral libraries included in something commercial I paid a lot for.

Recent work really went into how very important this is even if the title was a little tone deaf.

This isn't hard to do, at all, and I know one increasingly grumpy old guy who might be a complete jerk during peer review if you aren't doing it.

If you are doing things like "library free" this is super easy. If you aren't it is also easy.

Library free for PROSIT or DIA-NN or whatever?

Just append contaminants to the FASTA database you are using for prediction.

If you don't have NotePad++ on your PC, you should. I'm pretty sure that Bruker instruments come with it preinstalled on the instrument PCs now. It is free, it doesn't append silly things to the name of your file and it appears to be able to open documents of almost limitless size.

Open the FASTA you were going to predict or whatever and open your favorite contaminants library. Cut/Paste them together. You can go with the classic: cRAP (https://www.thegpm.org/crap/) or you can download the MaxQuant contaminant library.

Very related: Charlotte Dawson has this great discussion on contaminant libraries as part of the camprotR package they developed (Charlotte and Tom Smith @ Cambridge?) as well as direct links to download various contaminant FASTAs. Totally worth skimming through and I'm definitely checking out that R tool.

Looking at the contaminant library I have above, I don't think that these annotations are going to look perfect in everything. I'm appending in 12 different KRAS mutants from SwissProt and they don't have colons or semicolons. They use |, so I'm going to use some quick Ctrl+H, Replace All, starting with : for |, so they all look the same. Once it looks like they'll pass, you can always proofread your FASTA with the free tool in the PD viewer or in MaxQuant (or probably 75 other tools) to verify you don't have a bunch of mistakes.
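If you'd rather proofread with a script, here is a minimal sketch of a FASTA sanity check in Python. It is not what the PD viewer or MaxQuant do internally. It only flags the mistakes I tend to make when pasting files together: sequence lines before any header, empty headers, duplicate header IDs, and characters that aren't amino acid letters. The set of allowed letters is my own assumption, so adjust it if your database uses something else.

```python
from collections import Counter

# Assumption: standard residues plus common ambiguity codes and a stop symbol.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZUO*")

def check_fasta(text):
    """Return a list of human-readable problems found in FASTA text."""
    problems = []
    seen = Counter()
    current = None
    for n, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            if not current:
                problems.append(f"line {n}: empty header")
            seen[current] += 1
        elif current is None:
            problems.append(f"line {n}: sequence before any header")
        else:
            bad = set(line.upper()) - VALID_RESIDUES
            if bad:
                problems.append(f"line {n}: unexpected characters {sorted(bad)}")
    problems += [f"duplicate header id: {h}" for h, c in seen.items() if c > 1]
    return problems
```

Call it with something like `check_fasta(open("my_merged.fasta").read())` (hypothetical file name). An empty list back means nothing obvious is broken.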

5 steps for MaxQuant below. Chances are if MaxQuant can parse your FASTA properly now, you're good to go.

Merge your FASTA. I'm just going to go to the shorter one, right click on the screen, Select All, Cut, and paste it at the bottom. Then I'm going to save mine as HumanUniProt04162010_12KRAS121322_contaminants.fasta (part of that is a joke).
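If cut/paste on a giant file makes you nervous, the same merge can be sketched in a few lines of Python. This is just one way to do it under my own assumptions: the function name and the file names in the usage note are made up. The only clever bit is making sure each file ends with a newline so the first contaminant header doesn't get glued onto the end of your last protein sequence.

```python
from pathlib import Path

def merge_fasta(paths, out_path):
    """Concatenate FASTA files into out_path and return the number of entries."""
    n_entries = 0
    with open(out_path, "w", newline="\n") as out:
        for path in paths:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # avoid gluing a header onto the previous sequence
            n_entries += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
    return n_entries

# Example usage (hypothetical file names):
# merge_fasta(["human.fasta", "contaminants.fasta"], "human_plus_contaminants.fasta")
```

The returned entry count is a quick check: it should equal the number of proteins in your database plus the number in your contaminant file.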

This is on here somewhere, but Windows has this default in most cases that you may need to turn off --

You can search for Folder options or file explorer options. If this is enabled when you save a file as .fasta it will save it as .fasta.txt just to mess with you.
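If you suspect Windows already got you, a tiny Python sketch like this will tell you what the file name should have been. The helper name is made up, and it only computes the corrected name. It doesn't touch the file on disk.

```python
from pathlib import Path

def fix_double_extension(path):
    """Return the path with a trailing '.txt' dropped from names like 'x.fasta.txt'."""
    p = Path(path)
    if [s.lower() for s in p.suffixes[-2:]] == [".fasta", ".txt"]:
        return p.with_suffix("")  # strips only the final '.txt'
    return p

print(fix_double_extension("HumanUniProt_contaminants.fasta.txt").name)
```

To actually rename the file you could follow up with `Path(old).rename(fix_double_extension(old))`, but look at the output first.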

Now you've got a FASTA! If I'm going to use PROSIT, I use EncyclopeDIA to make the input file. Here is the walkthrough for that.

Something I only learned recently was that you don't actually have to process data in DIA-NN. You can just use it to make your spectral libraries. It will also generate your Prosit input if you have DIA-NN but you don't have EncyclopeDIA (which I highly recommend you have, it's amazing).

Here I've just had it take my FASTA -- no input files -- and generate my spectral library and Prosit input. You can also do something funny with DIA-NN where you give it one spectral library format and it will give you its favorite, but I should go to work soon.

Either way -- here you should now have a predicted spectral library with contaminants in it.

Want to generate a library with one? Chances are (I hope!) you already have! If you are searching your DDA data with a good contaminant library to make your input -- don't filter them out before making your spectral library from your data. That's it. I know a lot of tools or templates autofilter out the ++ contaminants or whatever. Remove those filters before building your library. I build my libraries using Skyline and then convert them to whatever format I need with EncyclopeDIA.

Hopefully you don't need any of this information and you're like "geez, Ben, great way to waste 38 minutes of your life (I actually type sort of slow)".

Sunday, December 11, 2022

The MSFragger PD-nodes paper! Time to repost a popular tutorial!

Two people contacted me while I was at HUPO looking for info on setting up Proteome Discoverer with MSFragger! That is a little more than average, and then I realized this paper is finally out!

I've been using MSFragger as a standard part of my workflows in Proteome Discoverer for the last 2 years, primarily for rapidly hunting PTMs and for open searching with visualization of the PSMs.

I originally posted this in 2020 and I thought I'd do some updates today, but it's all about the same. MSFragger works just fine in everything up to PD 2.5 and I'm about 11% sure I could just drop the .dlls into the 3.0 and it would work fine, but I haven't done that test yet.

I'm on my desktop in my home office and I've got a PD 2.4 viewer that I use for MSAmanda and MSFragger searches. I do my LFQ quan with ~~PeakJuggler~~ ap.Quant and that feeds into LIMMA. Most of my workflows on my commercial license in the lab will look something like this (example for TMT quan shown below) if I'm looking for PTMs. I really like having the ability to rapidly screen PTMs by ones identified by more than one search engine before I start flipping through the PSMs manually. I also have some other really cool nodes and you can find notes on those over there somewhere -->

OH. This reminds me about something I should talk about later! I'm actually prescreening my spectra up front these days to remove peptides from collagen first. It is crazy how many of our really intense unmatched spectra are collagen with 4 oxidations on them or something. There is some guy in the field who has been going on about this for years and everyone ignores him 😅. We shouldn't be. I'll come back to that at a later date. It just took me a while to figure out how to work filters into my workflows. I actually had to see how Dr. Amol Prakash was doing prefiltering and then that inspired me to make a workflow!

Saturday, December 10, 2022

COMPLEX-Down Proteomics (top down protein complexes) off the shelf!

This was sitting open on my desktop for a while and I kept skipping over it. You'd guess from the paper title that you would be looking at a secret magic instrument and a bunch of python doodads to process the data, right?

You aren't. This is a commercial instrument (Exactive UHMR) and commercial software (BioPharma Finder 3.something) used in a really clever way to work out an intact protein complex.

Where this really shines is how the authors work out the native monomer fragmentation patterns and then apply what they know to the tetramer. Check this out --

RIGHT?!? Makes sense, but I wouldn't have thought to check! Normally when you see really nice topdown work they used something that is 8kDa. This tetramer is ~150kDa! So here you are with the ability to study the intact monomer easily, the native monomer, get a solid high accuracy mass of the intact thing and see it's a tetramer. That's all normal stuff a lot of people do. But if you do the native complex fragmentation you can start to figure out how the darned thing is configured. What's internal in the complex? What is solvent facing? You can model this stuff with tools, chances are if you use 3 you'll get 4 different answers -- and here you get direct evidence of which model is correct!

Friday, December 9, 2022

HUPO 2022 more big takeaways - Question everything?

Whenever I try to take a picture from a room where a lecture is happening it always looks like no one is there. There were a ton of people in this room (where we were split into 4 or 5 separate overlapping sessions). I know I'm allowed to show this photo during Dr. Claire Eyers's talk because an AV issue kept her from having any slides for about half her talk, and she still did the presentation. "If slide #4 were here you'd see.."

I'll come back to the protein folding plenary later. John Yates tipped me off to a rabbit hole you could fall down for a while.

1) The title of this blog post comes from the talk above, where I'll paraphrase from my kid's favorite person to stare at without blinking while he is covered from chin to the top of his head in pink yogurt --

"People hand down protocols and we need to take a step back sometime and see if they really make sense."

You know what the half life is of a histidine phosphorylation site at a pH of 2-3?

15 minutes!!!

Half your phosphopeptide signal is gone! WTF, right? So Eyers lab has been working on phospho preps that aren't acidic and finding huge increases in recovery. Another thing that she mentioned that I planned to evaluate when I got back was whether we were seeing increased recovery of phosphopeptides when they were recovered from samples that had not been reduced prior to digestion. I often don't reduce and alkylate proteomics samples because I care about cysteine PTMs that are irreversibly lost when you heat a sample with DTT. I've got a pile of spreadsheets to dig through.
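To put that 15 minute number in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes simple first-order (exponential) decay, which is my assumption and not something from the talk, and it uses the 15 minute half-life quoted above.

```python
def fraction_remaining(minutes, half_life_min=15.0):
    """Fraction of signal left after `minutes`, assuming first-order decay."""
    return 0.5 ** (minutes / half_life_min)

# How much histidine phosphorylation survives a typical acidic step?
for t in (15, 30, 60):
    print(f"{t:>3} min at pH 2-3: {fraction_remaining(t):.1%} remaining")
```

Under that assumption an hour sitting in acid leaves you with roughly one sixteenth of what you started with, which is why a non-acidic prep is worth the trouble.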

2) Monash is generating actionable HLA data from clinically relevant amounts of material -- with TOFs

I had an 8am rant about how much I hate nanoflow and am doing some reasonable single cell without it (preprint is #3 on the list currently behind all of these darned proofs) and Anthony Purcell (who I didn't quite get to meet) followed me showing an IonOpticks setup on a ZenoTOF (didn't know that was possible -- it does look sort of funny) and clinically actionable HLA peptide recovery from a freaking TOF. EAD looks super valuable for endogenous peptides.

3) I did get to meet Hugo Gagnon at Phenoswitch and he helped me with a problem I've been stuck on and he showed how proteomics can help do QC on two of the hottest things in medicine -- PROTAC and CRISPR/CAS9 off target effects. I've mentioned this on the blog before, but if you've fallen for the "CRISPR/CAS9 is perfect" thing, you obviously haven't used the technology yourself. It is really really well marketed. Sure, it's amazing, but it'll miss and make alterations in places you aren't expecting them. It'll also make mistakes sometimes even when it does hit the right part of the DNA. The argument I've always heard is that since the protein coding region is so small, it's unlikely to make alterations in places where you'll see it at the protein level.

4) Important to me technology --- EvoSEP showed off automatic tip loading on my very favorite robot!

You bet we're going to get this set up ASAP! (OpenTrons prices have gone way up, but you're still talking about a $10k robot!) https://www.evosep.com/applications/automation/

For EvoSep users, Dorte Bekker-Jensen herself showed data from the new update that your LC told you about on Friday that adds a new faster Whisper method AND a new 500SPD method. 500 injections per day??? 200ng K562 hitting 2,500 proteins!??! WTaF? Vendor talk, grain of salt, but WTaF, right? (ZenoSWATH) And all 50 FDA biomarkers recovered in plasma.

5) Anne Gingras (sp?) showed all the BioID work that they are doing which has led to the ever growing www.HumanCellMap.org (they already baited 244 BioID targets and a crapload of new baits are coming).

6) Ruth Huttenbrug (sp?) showed how her group is using APEX (like bioID and I forget the difference, it's on the blog somewhere) to figure out how ALL THE GPCRs work! You remember those from class, right? They have a very conserved X number of transmembrane domains and they're responsible for loads of things like pain response/repression and, yes, that's all I remember. We have 800 different ones! So they're apexing them to figure out what each one does. She highlighted a super weird one.

7) Sonja Kabatnik talked about clinically applying the Deep Visual Proteomics thing Mann lab is doing to improve histology. Super cool stuff using old samples every hospital has tons of. I haven't seen this preprinted so I'm not going to go into it. It's easy to think "okay, that's cool you had $5M in equipment but what are you going to do with it other than fill big journals up so there is no space for anyone else?" Then you see an actual application and you feel bad for thinking this was the very first thing you did if you found a picture of a very nice group of hard working scientists --

--sorry, but this is still totally cracking me up.

It's 9am!! I gotta go. Probably will ramble more later!

Thursday, December 8, 2022

HUPO 2022 (Quintana Roo!) part 1(?) My quick takeaways!

Didn't get to go to HUPO 2022 because someone mistakenly thought you said "Cancun" instead of "Quintana Roo" when requesting permission to go?

This was my first HUPO since Vancouver, which was awesome, so I had some high hopes despite the location.

My time is very limited and my email thingies are very very full, so a couple of big things first:

1) Holy crap. Half of these "next gen" proteomics things are protein microarrays.

I'm not even joking. Look, no stress, you can get good data out of protein microarrays, but I definitely thought that a lot of these new and exciting companies were....I dunno...newer? excitinger...? something that hasn't been ar

2) The big "next gens" are coming for us with complementary solutions and loads and loads of marketing money.

I met the SomaLogic marketing team --

(These aren't really them. Similar strategy, though.)

I did spend some time asking a really patient scientist with Olink a lot of really dumb questions and I'm not the only person that did that (though I bet the other people asked smarter questions). But now I understand how that one works. It sure ain't cheap. You need to add about $300k worth of things to an Illumina NovaSeq (list about $1.5M, but Illumina is a kit revenue based company, you sign up for enough kits they'll figure out a way to get you one). Then you need the reagents and flowcells from Illumina AND the Olink kits per sample. Once you get to a stupid high n -- it makes sense. If you don't have 40,000 samples lying around ready to go, you probably want to line those up first before asking to get set up. They do have a little PCR based protein analyzer for massive targeted validation and I think they sold more than a few this week and for sure at least one of the big NovaSeq based ones.

3) DIA is THE method for LCMS based proteomics. I'll thank Dr. Naomi Diaz for prioritizing this one on my take-away list. I think she said "did you see any DDA in this entire day's poster session"? And I checked my notes and I'm not sure I did. The paradigm has shifted. People are clearly still doing DDA proteomics, but it is increasingly rare. When someone sorts out the PTM prediction and localization thing for DIA, it might be over. When you do find someone doing DDA, they're probably doing glyco or phospho or chemoproteomics or something.

4) Everyone is AlphaFolding or MetaFolding. Monday's plenary was someone from EBI or EMBL (it'll be in my daily recap if I get to it) which was awesome. Dude totally just broke down the different scores for AlphaFold stuff and where people are most likely to misinterpret the quality of the data. Just about every poster that focused on a specific protein included supporting(?) evidence for the 3D structures.

5) This may be a weird takeaway but for me it was one of the most optimistic and important (particularly for patients) takeaways from the final day of poster sessions. I went through an entire aisle of posters where the focus was finding clinically relevant secondary inhibitors for tumors that had developed new resistances -- and they were all just about perfect. Institution independent, instrument independent, country independent -- when it comes down to one of the things where proteomics can make an impact in someone's life right now -- everyone is doing it right. PB 03.94 - 99 are all underlined and say the words "really really good resistance mechanism studies". I'm generally really critical of these studies, so I don't write a lot about them here on my portal of proteomics positivity, but what a great thing to see.

More later -- I've got like 15 pages of notes to sort through!

Friday, December 2, 2022

The incongruity (not incredulity) of "validating" mass spectrometry data with western blots!

I know some authors who just earned a citation in every paper that I clumsily write for whatever is left of my "career"!

This is short. It is well-written. And they didn't mess around with references.

They put one of the most important studies in our field's history as #1.

Finally -- the ZenoSWATH paper!

ZenoSWATH is an upgrade for the SCIEX 7600 that rolled out at ASMS 2022, and it looks like somebody got it enabled and took it for a spin!

The ZenoTrapping/ZenoPulsing thing is a neat trick inline after the quad and prior to the TOF. Rather than just running the ions into the accelerator thingy, they can accumulate (pulse?) prior to firing them into the TOF. My understanding is that it doesn't do it with full scan, just MS2, so the data looks sort of weird where your MS1 signal doesn't really go up, but your MS2s do. If you are doing data dependent you can probably imagine some drawbacks here, right? Maybe you don't have enough sample intensity to trigger what you want to, etc., but we're still finding it really useful for DDA. Where it absolutely shines is the PRM (mrmHR) functionality.

However -- if you're DIAing/SWATH....buckling...?

Who cares about your MS1s anyway? (I mean, I do like to have MS1s, but they are clearly less important here).

You get to a point where SWATH can't see anything, but ZenoSWATH keeps on going. I'm less impressed with the high end numbers (yo, just load more peptide) but the low levels are impressive. The copy number distributions shift decidedly in the correct direction.

When you look at the numbers in this paper, keep in mind this isn't nanoflow! This is microflow (5uL/min) AND analytical flow (2.1mm column!!!!!!)

Seriously cool trick and another big step in the right direction for a competitive landscape in LCMS proteomics technology.

Thursday, December 1, 2022

Open up the plasma proteome with ultracentrifugation of EVs!

Every single time, plasma proteomics is disappointing. Haven't done it in a while? Have an instrument that is fast, more sensitive, and just better in every single way than the last one you tried it with? Fire it up .....

(drumroll)

....the same 400 proteins you were able to quantify in 2012 on that instrument you can get on Ebay for a pack of Big Red and paying for freight....

You can deplete your top 3, 10, 14, 31.23 most abundant proteins and see more, but now you're introducing a whole new can of worms (which is a weird phrase. who has a can of worms? gross).

What if you could just centrifuge the heck out of the plasma and concentrate the EVs? Extracellular vesicles are something that has been leveraged before (there is some really cool work from Geiger lab; rings a bell, Promise-Quan?) but how does it compare on today's fast DIA methods?

That's what this group went after in this fun new study!

And it seems to work really really well. 1,400 proteins using DIA in relatively short gradients, and these aren't just random proteins; they directly correspond to what we currently know of the human plasma proteome.