Thursday, April 11, 2024

How fast is a Mac M2 chip for proteomic / scientific data processing?


I recently had a Windows PC perform an unauthorized system update in the middle of a big batch of files, so I gave up and bought my first ever Macintosh/Apple computer. Possibly because I grew up in somewhat extreme poverty and possibly because I don't particularly like the color silver, these things always seemed like they weren't for me. But they are advertised as very very fast. 

It's easy to be skeptical, though, because now they are making their own chips. They can use whatever benchmarks they want. The benchmarks I care about are processing proteomics data! 

TESTING TIME! My Apple M2 Pro with 16GB of RAM and 1TB hard drive vs my closest looking Windows PC - same number of cores, but Passmark thinks it is a little slower. 

First experiment, performed between when my elderly dog needed to pee at 3am and when my toddler woke up at 5am. Details are fuzzy....

Tools - SearchGUI with MSAmanda 3.0 and SAGE

Random HeLa digest file from an Exploris 480 with FAIMS. 200ng separated over 120 minutes on an EasySpray 25cm x 75um with a 3cm PepMap trap. Converted to MGF and having around 110,000 MS/MS scans. I might have generated it or downloaded it. Not sure. It was on my hard drive from this review I did a while back. If you want the file the MASSIVE download link is in it. 

Search tolerance was set at 10ppm MS1 and 10ppm MS2. I used the Uniprot Swissprot for human (9606) that I downloaded this morning. 20k entries and I had SearchGUI add decoys. M+oxidation and static carbamidomethylations (these are the SearchGUI defaults).
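If you've never turned a ppm tolerance into an actual m/z window, here is a minimal sketch of the arithmetic. The function names and the example m/z of 500 are my own illustration, not anything taken from SearchGUI, MSAmanda or SAGE.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    # Hypothetical precursor at m/z 500 with the 10 ppm tolerance used here.
    low, high = ppm_window(500.0, 10.0)
    print(f"10 ppm at m/z 500 spans {low:.4f} to {high:.4f}")
    print(f"Observed 500.0020 vs 500.0000: {ppm_error(500.0020, 500.0):.1f} ppm")
```

The point is only that 10 ppm is a relative window, so it gets wider in absolute terms as m/z goes up.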

Windows PC - pretty freaking fast at 3 min and 25 seconds! 

M2Pro -- 

2 minutes and 5 seconds! WHAT? Not the way I expected that to go. 

Okay - I should also point out here that SAGE is nuts. But the discrepancy is even more insane on the much newer Macintosh chip. 13.4 seconds to search a file vs 2.9 seconds to search a file?? 
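For anyone who wants those timings as ratios, here is a quick sketch. I'm assuming the first pair of times is the SearchGUI/MSAmanda run, and that the 13.4 second SAGE search was the Windows PC and the 2.9 second one was the M2 Pro. The post doesn't spell out that ordering, so treat it as my reading.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes + seconds wall-clock time to seconds."""
    return minutes * 60 + seconds


def speedup(slow_seconds, fast_seconds):
    """How many times faster the quicker run was."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    windows_first = to_seconds(3, 25)  # 3 min 25 s on the Windows PC
    mac_first = to_seconds(2, 5)       # 2 min 5 s on the M2 Pro
    print(f"First search: {speedup(windows_first, mac_first):.2f}x")

    # Assumed ordering: 13.4 s on Windows, 2.9 s on the M2 Pro.
    print(f"SAGE: {speedup(13.4, 2.9):.2f}x")
```

This is one file run once (twice on the Mac), so these are ballpark numbers, not a benchmark.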

Then it occurred to me that it's probably not fair to run a file on a MacBook on a battery. I plugged it into the wall, deleted the first output file directory and reran the file.

Maybe a tiny bit faster, but within the margin of error here? 

It is fair to note that the Windows PC I'm currently typing this on doesn't do much data processing these days. It's certainly perfectly suitable and I have zero reasons to think I'm not going to be using it as my primary office PC for several more years. However, for our main PC "server" which has supported as many as 6 users with SpectroNaut, Proteome Discoverer and Compound Discoverer we've been using a Ryzen 9 (I think this is the chip on that box below) - which isn't even all that fast now! The big EPYCs and Threadrippers are pushing 3x this benchmark - but you've got to be ready to drop as much as $10k just on your processor. Good time to be a consumer if you need PC power! 

That was a lot of words, for - hot dog - these Mac chips (and SAGE! WTF?) are FAST! 

Monday, April 8, 2024

Is a TOF killing technology on the horizon? TMT32-plex looks official!

We saw a leak to potential investors of one of our field's oldest and most ethical companies stating that a 32-plex was coming. Not a "I SILAC'ed it and TMT'ed it" technology, a real life reagent that would allow you to multiplex 32 samples simultaneously! 

While the expectation is that we'll see real details of these reagents at some thing in Anaheim in June, one core facility is already advertising that they've got it. You might be able to guess which one, but I think it's fair to say that it is really truly real now. 

Short of completely redesigning the tags there are only a couple of ways that this would be able to work, right? And I can't think of a way that you do this without requiring an increase in the relative mass resolution necessary to use the tags. 

For some people this might not be a big deal, right? There will probably be some drawbacks, like the whole "now I have 32x more albumin to deal with instead of 18x more albumin" but for some of us this reagent could be completely transformative. 

This is how we multiplex single cells in our lab (from Eberhard and Orsburn, which should be out any day now) using 2 x 96 well plates for 2 conditions. That leaves us with 16 unused wells per plate, which is what we use for LFQ single cell method development. Anything you see out of our group, like the acetic acid preprint or our upcoming miserable work with 20um ID columns, is done using the 16 wells left over from the hundreds of plates we have gone through over the last 3.5 years. 

32-plex will allow us to switch to using 2 x 384 well plates! Know what is crazy? This actually almost makes it easier for me to automate. I just have to fabricate a new sample deck that will allow me to stack my plates sideways! 

Sunday, April 7, 2024

Confuse everyone by converting an EvoSep One into a fraction collector!

If you've got a spare EvoSep One sitting around not doing anything you could sell it to me for less than a new one costs OR could get a 3D printer and convert it into a (??) ... robust and sensitive offline fractionator.

The 3D printer converted to a fraction collector is seriously cool.  You can get a shockingly high resolution 3D printer for <$500 these days and you won't need super precise movements to put your tube above the right well in a plate. Steal these plans to build your next fraction collector, for sure! 

The...use of an HPLC system that costs the same? more? than an HPLC with an integrated fraction collector...likely makes this more of a niche idea. The sensitivity is really good, though, with good recovery of high pH fractions of 5ng of tryptic digest. 

Saturday, April 6, 2024

Moms in Proteomics pushes for change!


If I knew about this, I forgot about it somehow, but it came up on a cool podcast recording and now I have the link. You can check it out here!

...or this article! 

Or this original article! I've clearly been living under a rock or something.... What super cool ideas all around! We all know that proteomics will inevitably become the most important -omics. Making sure we have support systems for all of our scientists now is the way to make sure it happens sooner and is sustainable. 

Friday, April 5, 2024

New THE Proteomics Show podcast with - Dr. Henry Rodriguez!

B-sides is the new "season" of THE Proteomics Show where we just have guests that we've really wanted to talk to. Henry Rodriguez has been on my list for a long time, even before he held a position in the WHITE HOUSE! 

There probably hasn't been a stronger advocate for proteomics - as a science - in the US government and it was super cool to get his side of the stories - like how did CPTAC get started?? 

Thursday, April 4, 2024

I'm walking into spiderwebs - making sense of protein-protein interaction data!


For maybe the most creative abstract/TOC graphic you'll see today - 

I legitimately laughed out loud when I saw this new paper ASAP. When Ska went mainstream in the late 90s, I was NOT a fan, but it was absolutely everywhere. You couldn't avoid it at parties and I WAS a fan of parties in the 90s. 

More importantly, this paper is really super useful! Who doesn't have a protein protein interaction network on their desktop they are saving for a day they have nothing else to do and feel really really smart? There might be 10 on mine that I'm not entirely sure what to do with, but an answer might actually be in there somewhere. We have some compelling and very confusing FANCM enrichment data that I've looked at off and on for months. There is something in there, I have no idea what. 

And in case you aren't in your 40s, this is what this TOC graphic for the paper might be inspired by. Maybe. The single from that album was called something something spider webs. 

Wednesday, April 3, 2024

Reinvigorate that older hardware with second party ion funnels!


I'll leave this here without much comment, but pulling off the front end of an Orbitrap Velos/Elite system and putting in a dual stage ion funnel sure seems like a smart way to keep that depreciated instrument contributing. I feel pretty confident that I can guess an application this might be applied to by the corresponding author's exceptionally innovative group. I'll be on the lookout for it! 

Tuesday, April 2, 2024

SpectiCal - Use those low mass fragment ions for something!


You know those low mass fragment ions your search engine is probably set to ignore - because - let's face it - how useful is it to know there is a protonated lysine fragment there? 

SpectiCal will take mzML files that have these things in them (they all do!) and will use them to recalibrate your MS/MS spectra! 

We recalibrate all our TOF TMT spectra using the reporter ion exact masses using a lousy program I pulled together, but it is very manual and - not useful for something that isn't TMT.
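To make the idea concrete, here is a minimal sketch of recalibrating a spectrum from known low mass ions. This is NOT SpectiCal's actual algorithm (I haven't read its code). It is just the simplest version I can think of: find a couple of reference ions, take the median ppm error, and remove that constant offset from every peak. The choice of reference ions, the 50 ppm search window and all of the function names are my assumptions.

```python
from statistics import median

# Theoretical m/z of two singly protonated amino acids (the y1 ions of
# tryptic peptides). Using these two as references is an assumption.
REFERENCE_MZ = {
    "protonated lysine": 147.11280,
    "protonated arginine": 175.11895,
}


def estimate_ppm_offset(peaks_mz, references=REFERENCE_MZ, search_ppm=50.0):
    """Median ppm error of the peaks that match a reference ion.

    Returns None if no reference ion is found within search_ppm.
    """
    errors = []
    for ref in references.values():
        window = ref * search_ppm / 1e6
        candidates = [mz for mz in peaks_mz if abs(mz - ref) <= window]
        if candidates:
            closest = min(candidates, key=lambda mz: abs(mz - ref))
            errors.append((closest - ref) / ref * 1e6)
    return median(errors) if errors else None


def recalibrate(peaks_mz, ppm_offset):
    """Remove a constant ppm offset from every peak in the spectrum."""
    return [mz / (1 + ppm_offset / 1e6) for mz in peaks_mz]


if __name__ == "__main__":
    # Made-up spectrum in which every peak reads 20 ppm too high.
    true_peaks = [147.11280, 175.11895, 500.25, 800.40]
    measured = [mz * (1 + 20e-6) for mz in true_peaks]
    offset = estimate_ppm_offset(measured)
    print(f"Estimated offset: {offset:.2f} ppm")
    print([round(mz, 5) for mz in recalibrate(measured, offset)])
```

A real tool would presumably handle errors that change across the m/z range rather than one constant shift, but this shows why those "useless" low mass ions are handy: you always know exactly where they should be.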

This is dynamic and - while it has the dreaded "pip" in this Github, that's something ChatGPT can help you with if you're not sure where to get started. 

I presume for Orbitrap spectra you could start talking about ppB mass accuracy after running it through something like this! For me, I just want Orbitrap level accuracy at (affordable) TOF speeds. 

Monday, April 1, 2024

Two sources appear to leak details on new ASMS hardware! Booking a flight to Anaheim now!


Honestly, I don't think this is anything that will come as a surprise to anyone at all. Particularly if you have taken a look and realized how very very old the Trapped Ion Mobility patent is (2008!)

The first came to us out of Australia, by way of 

Full text here! Honestly, my first thought was "why not Ultra" but Flex obviously makes sense, since Thermo doesn't have MALDI capabilities. And since Bruker can't get pasef to work with MALDI, Asstral backend makes an awful lot of sense! 

Just a few minutes later we had this release suggesting the obvious - that Veridian Dynamics -  is involved in this collaboration. 

While we haven't heard much from VD recently, they were the first to jump on the expiration of the Orbitrap patent and begin commercial construction of their own hybrid devices! 

Given word choice on the Veridian release, I think it's safe to say that one of our leading labs is involved in this! 

Guess I do have to work out travel to Anaheim after all....biggest year for $,$$$,$$$ level mass spec labs ever! 

Wednesday, March 27, 2024

Aftermarket high resolution FAIMS allows separation of MEGADALTON protein complexes!


FAIMS gets a bad rap because most of the commercial systems have a resolving power of something between 5 and 20. They're great systems if you just don't want to fragment (or see) +1 ions and you want your mass spec to only see +2 or +3 ions. Cleans up your spectra so your mass spec doesn't have to work as hard, and everyone is happy at the end. 😁 Is that a limitation of FAIMS technology itself, or is that what is mass produced for the general market? Sure sounds like it's the latter. 

In this new study an aftermarket/custom high resolution FAIMS system was coupled to a UHMR (which has an upper mass limit of 80,000 m/z? Is that right? That's huge) and oligomers of antibodies (so a monomer is around 150,000 Da!) were separated. 

The study is maths heavy and there are a lot of formulas, so I found it hard to get to the effective IMS resolution. However, this 2019 study indicates that the FAIMS is >100 resolution, and I think the two devices are similar.

And - get this - you can get this FAIMS system for the front of just about any instrument and they can be custom tuned for small molecules, peptides, or intact proteins. And they're a lot less expensive than the 10 resolution units that you can buy for only certain instruments. 

Tuesday, March 26, 2024

5-plex your data INdependent analysis methods by modifying dimethyl tags!

This has sat with a bookmark on it for quite a while. I hoped that I'd remember to ask someone intelligent questions about it. 

Multiplexing DIA sounds like either the best or worst of both worlds. More samples/day but you've increased the complexity of your background so those magical neural network thingies have to think a lot harder. 

There is very little chance I'd consider 2-plexing my DIA. That isn't worth it to me in any way at all. 3-plex? That is enough that I bought reagents so I could eventually try it, but I haven't been anywhere near excited enough to actually do the try part. 

5-plex? That's worth thinking about. 5-plex without fancy expensive labeling kits? That's worth bookmarking.

Disclaimer: I've never dimethyl-labeled. I feel like there is some drawback to it, like the tags shift the retention times just a tiny bit? I forget. Again, meant to do some background research - and didn't.

You can read about it here!

Sunday, March 24, 2024

Multi-study meta-analysis of dog samples with oral diseases!


I'm wrapping up a meta-analysis right now that I've bugged just about every proteomic informatics person I know about in one way or another. The insanely beautiful data that I'm working on reanalyzing is from a patient study where they said "you thought CPTAC was thorough? hold my espresso". No joke 33 offline fractions at 2 hrs each on a QE HF on these priceless human samples. The data is perfect for what I'm doing, but during the uploading of data for almost 70 patients, they missed a few here and there. The PI is now retired and the team is dissolved, so I ain't getting patient 36 fraction 32 or patient 51 fraction 7. Whether to keep these patients or not is what I've been bugging people about. 

BTW, this data just provided a spreadsheet of which patient was whom and I find it just fine to work with. 😉. I didn't need a short wave radio or to learn Morse code or anything. 

Crap, that's a joke like 15 people might get, again, right? I need to get out of my house. 

The great people at the Proteomics Standards Initiative came up with a meta-data standard for data in repositories called SDRF and it's a great idea for anyone wanting to automatically pull loads of proteomics data for reanalysis. 

Since my poor sense of direction would never allow me to find the utility closet at HUPO where the PSI is allowed to have their annual meeting, I missed it. When I was told I was supposed to be using this format, Google helpfully told me that it has something to do with condensed short wave radio signals. Now we're around to the topic of this post....

These researchers successfully drew conclusions from an extremely wide array of proteomic studies deposited over time for the under-studied disease that they are interested in. 

Including ELISA data - old spot cut out MALDI-TOF peptide mapping stuff and some more modern iTRAQ and TMT work.

However, I do think this is a good example of studies where this kind of metadata annotation hasn't been done and no one will go back and do it later, but where some interesting findings were still made. 

SEER Proteograph prepped plasma on TIMSTOF Pro2 and HT!


There has been lots of excitement (and some scary good data) out of the SEER proteograph system for plasma proteomics.

Here is some more! 

While the authors clearly intended this to be more of a comparison of two very nice recently released instruments in their lab, as you can probably see from the figure at the top, the proteograph steals the show.

Clearly the 14-bit digitizer and higher capacity TIMS improve identifications, but the plasma precursors go up 3x - 4x when moving the prep to a kit that I have absolutely no idea at all how any of you can afford to use. 

The peptide loading plots are also really cool. I've never considered running over 400ng on the TIMSTOF Flex, and we only use that much when we use the EvoSep which runs closer to microflow than nanoflow levels. 

Friday, March 22, 2024

Registration is open for Cold Spring Harbor Proteomics Course 2024!


Do you want to buckle down and spend 2 straight weeks learning proteomics? 

There is probably no better way to make that happen than getting accepted at the Cold Spring Harbor proteomics course. You can apply here

I'm lucky enough to have been able to participate as a guest instructor twice in the past at this amazing workshop. I did have to take it off of my CV, however, because I'm not listed on the website as an instructor. 

The way I've described it to people is "Sunday night it's a bunch of people in a room with a slide deck that says "What is proteomics". Later that week there are 10 people crowded around an instrument monitor at 1:30 am watching phosphopeptides someone taking the course prepped themselves finally start eluting and fragmenting off an instrument someone taking the course is running themself. Then everyone celebrates!" 

It might have changed. I was there a long time ago. 

If you do get to go, pay very very close attention to the train station map. There are ZERO sidewalks and a windy back road with New York drivers, so you may have to rapidly dive off the road with your roller bag if you walk in from the wrong stop. (Story I heard from someone who is very bad at maps). 

Thursday, March 21, 2024

Deep learning - of glycopeptides!?!?!?


Okay.....on the surface it was surprising at first that the first PTMs we'd see deep learning successfully applied to en masse were going to be glycopeptides. Then I thought....well...the problem is the stupid sugars all have the exact same masses. 

Here is an illustration from an unrelated study for fragmenting the fragments of the fragments to figure out what a glycan chain actually is because the fragments of the fragments of these important glycan chains still have the EXACT SAME MASSES. 
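In case "the exact same masses" sounds like an exaggeration, here is a small sketch that computes monoisotopic masses for a few common monosaccharides from their elemental formulas. The formulas and atomic masses are standard chemistry. The particular list of sugars and the helper names are just my choices for illustration.

```python
from collections import defaultdict

# Monoisotopic atomic masses in Da.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Elemental formulas of the free sugars.
SUGARS = {
    "glucose": {"C": 6, "H": 12, "N": 0, "O": 6},
    "galactose": {"C": 6, "H": 12, "N": 0, "O": 6},
    "mannose": {"C": 6, "H": 12, "N": 0, "O": 6},
    "GlcNAc": {"C": 8, "H": 15, "N": 1, "O": 6},
    "GalNAc": {"C": 8, "H": 15, "N": 1, "O": 6},
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of every atom in the formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def group_by_mass(sugars):
    """Group sugar names that share the same mass (rounded to 5 decimals)."""
    groups = defaultdict(list)
    for name, formula in sugars.items():
        groups[round(monoisotopic_mass(formula), 5)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for mass, names in group_by_mass(SUGARS).items():
        print(f"{mass:.5f} Da: {', '.join(names)}")
```

Five sugars, two masses. Isomers differ in how the atoms are arranged, not in which atoms are there, so no amount of mass accuracy will separate them by mass alone.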

So maybe this isn't the biggest stretch in the world ever (link to this paper and topic of this post). 

Unlike the topic of yesterday's post, this tool is still in the - you need some skill with a computer to actually use it - category, but if you have such skills you can get DeepGlyco here. It does require installation on a GPU for the deep learning magic. 

Wednesday, March 20, 2024

AI assisted proteomics - for everyone -with Ms2ReScore 3.0!


MS2ReScore is a great idea! Let's use Artificial Intelligence things to reanalyze our proteomics data based on various features of confidence to create bigger and higher confidence lists. It's free, too! Then you go look and see what you're dreading you'll see. "DOCKER"?? "git"?!?  

Great...another tool for bioinformaticians written for bioinformaticians that I could spend the next 3 weeks learning how to use if I didn't have 11 mass spectrometers that are running suboptimally, a grad student who still hasn't returned from Dubai and 3 grant deadlines. (These are only examples, but you get the point, time is limited). Could I just have a Windows installer for some of this magic? 

You can now! 

Not to be missed here is MSAmanda 3.0 which can be installed in PD 3.1 or newer and has stand alone versions for PC, Mac and Linux and can be run through PeptideShaker. MSAmanda 3.0 by default outputs the features that Ms2ReScore uses to reassign peptide confidence.

When using MS2ReScore you also get cool HTML QC plots along with a bunch more peptides. When compared to MSAmanda + Percolator, it looks like a solid 20% increase on low abundance peptides. The authors use a nice SCP dataset that I'm going to assume came from Erwin Schoof's group and then they do a bunch of analysis that makes it look like these peptides are real and make sense in relation to the other peptides. 

I'm all for more data analysis tools, and I'm even more for them when I can just download them and run them! 

Tuesday, March 19, 2024

PRECISE readout of MEK phosphorylation cascades by top down proteomics!

I won't lie. I'm stunned. I didn't think we were here yet.... and, to be fair, maybe we aren't, but this group is! 

Bad background by Ben: A whole lot of the central regulatory pathways controlling tons of things in cells are based on some key central phosphorylation cascades. MAP kinase and MTOR are famous ones. They modify proteins by phosphorylating them because the modification can be fast and it is reversible. If you go into any oncology centric place there is probably some really really really skilled pathway scientist (or 4, if they can afford them) who can dissect these cascades, probably through western blots and FACS. These people can tell you that if MEK1 S2998 is phosphorylated but T2992 is not - that means something critically important. 

When we do shotgun proteomics, we cut this region into a small piece and peptides phosphorylated once on the same region of that fragment coelute. It's often very hard, if not impossible, to tell which site is phosphorylated - or both. 

Obviously we should do this without digesting them, right? The problem with that is that top down proteomics only really works on small proteins. If MEK1 was 16kDa, it would still be tough, but you could do it. MEK1 is almost 50kDa, though! 

Enter "Individual Ion Mass Spectrometry" IIMS? (older post explaining what that is here) and - what?? - ETD IIMS? 

I was going to cut in parts of the materials and methods section here, but I don't want to intimidate anyone into thinking this is clearly just a proof of concept. However, I will say that the process to get to these data makes label free single cell proteomics look like a nice fun day (which it is NOT). 

However, we have to start somewhere and being able to confidently localize 4 separate phosphorylation sites on a critically important protein this big - with anything - is a step in the right direction! 

Monday, March 18, 2024

US HUPO 2024 special live episode of THE Proteomics Show is out!


No joke at all, this is, by far, my favorite episode of the podcast so far. RASR has a great radio voice, asks super smart questions and Dr. Olga Vitek has such a cool perspective and story to share! Such great content that Ben couldn't even ruin it. 100% recommended!  

Sunday, March 17, 2024

Asparagine to iso-aspartate conversion in norovirus infection!


Holy cow, y'all, this one was definitely not a fun puzzle for these authors to sort out! 

The punchline here is that there is a spontaneous post-translational modification - get this one - it is a deamidation of asparagine - which makes it the exact same mass as aspartate - that changes both the protein 3D structure AND alters binding partners.

How do you go about even suggesting that is what is happening? It seems like they started with trying to model the norovirus capsid protein by NMR and it was fuzzy and suggested two forms. They ended up doing a lot of NMR and hydrogen deuterium exchange mass spectrometry. 

Deamidations are very easy to mistake for naturally occurring isotopes, so they used inline automated pepsin digestion and used 120,000 resolution on the MS/MS spectra (!!!) to help properly resolve the fragments and this story. Seriously, just a monumental effort to show how a tiny and extremely simple virus can use clever biochemistry (that I'd never heard of at all until now!) to cause complex things to happen. 
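Why so much resolution? A deamidation adds about 0.984 Da and a 13C isotope adds about 1.003 Da, so the deamidated peak sits right next to the first isotope peak of the unmodified form. Here is a back-of-the-envelope sketch of the resolving power you'd need to pull them apart. The example fragment at m/z 1000 is made up, and the simple m/Δm estimate is a rule of thumb, not the authors' calculation.

```python
DEAMIDATION_SHIFT = 0.984016  # Da, Asn -> Asp/isoAsp (NH2 replaced by OH)
C13_SPACING = 1.003355        # Da, mass difference between 13C and 12C


def resolving_power_needed(mz, charge):
    """Approximate m/delta-m needed to separate a deamidated peak from the
    first 13C isotope peak of the unmodified ion at the same charge."""
    delta_mz = (C13_SPACING - DEAMIDATION_SHIFT) / charge
    return mz / delta_mz


if __name__ == "__main__":
    gap_mda = (C13_SPACING - DEAMIDATION_SHIFT) * 1000
    print(f"Gap between the two peaks: {gap_mda:.1f} mDa")
    for charge in (1, 2):
        needed = resolving_power_needed(1000.0, charge)
        print(f"m/z 1000, {charge}+: roughly {needed:,.0f} resolving power")
```

The gap is only about 19 mDa. Orbitrap resolution settings are usually quoted at m/z 200 and fall off at higher m/z, which is presumably part of why a setting as high as 120,000 is useful here.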

Saturday, March 16, 2024

Improve intact protein analysis with outlier rejection borrowed from astronomy!

Apparently mass spectrometers aren't the only things that generate "spectra"! And we aren't the only people who have to worry about averaging lots of measurements because we have limited signal.

Would it be worth borrowing some tools from one of these other fields for -- a 45% increase in intact protein identifications?!?!? 

Considering how many people in our field will drop $1M for a 20% increase in IDs, I'd say this is worth taking a look at. And - you don't even have to - it looks like you can just upgrade MetaMorpheus and run some topdown proteomics through it (good benchmark dataset here) and just use the new "averaging with rejection" feature.
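If "averaging with rejection" is a new idea to you, here is a minimal sketch of one common astronomy-style version, sigma clipping. I'm assuming each scan has already been binned onto the same m/z grid. I have not checked which rejection schemes MetaMorpheus actually implements or how, so treat this as an illustration of the concept only. The function names and the toy data are mine.

```python
from statistics import mean, median


def sigma_clipped_mean(values, n_sigma=3.0, max_iter=5):
    """Average values after iteratively rejecting outliers.

    Points farther than n_sigma robust standard deviations from the median
    are dropped. The robust standard deviation is estimated as 1.4826 times
    the median absolute deviation (MAD).
    """
    kept = list(values)
    for _ in range(max_iter):
        center = median(kept)
        mad = median(abs(v - center) for v in kept)
        if mad == 0:
            break  # no spread left to judge outliers against
        limit = n_sigma * 1.4826 * mad
        survivors = [v for v in kept if abs(v - center) <= limit]
        if len(survivors) == len(kept):
            break  # nothing rejected, so we are done
        kept = survivors
    return mean(kept)


def average_spectra(scans, n_sigma=3.0):
    """Average scans (intensity lists on a shared m/z grid) bin by bin."""
    return [sigma_clipped_mean(bin_values, n_sigma) for bin_values in zip(*scans)]


if __name__ == "__main__":
    # Five toy scans with two m/z bins each. The last scan has a noise
    # spike of 500 in the second bin.
    scans = [[100, 10], [102, 11], [98, 9], [101, 10], [99, 500]]
    print("Plain mean of bin 2:", mean(scan[1] for scan in scans))
    print("Clipped average:", average_spectra(scans))
```

One spike in one scan drags the plain mean of that bin way up, while the clipped average throws it out and keeps the rest of the measurements.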