Wednesday, December 11, 2024

Proteoform level analysis of "purified" albumin reveals shocking levels of complexity!

I'm again going to put off blog posts on the 50,000 human proteome cohorts using "next gen" spot-based targeted proteomics.

I get it - I love the idea that we can use alternative technologies and get to these kinds of population-scale protein studies. But - just to remind you - we don't know the answer to this question


What we do know - without any possible doubt whatsoever - is that evolution is super ridiculously stingy when it comes to making new stuff. Sure, there are excessive things that don't negatively impact the overall survival of a population - but those are exceedingly rare.

If a cell makes an alternative form of a protein you can almost guarantee that there is a good fucking reason for it. 

Case in point - what if you treated one of the single best characterized proteins on the planet - not as a protein - but as a proteome?

This team took a look at multiple "purified" forms of trusty old bovine serum albumin and treated it like a population of proteoforms - and 

1) It definitely is

2) "Purifying" a protein is....umm... something you'd think was 100% super well defined. Let's go with "could use further definition and characterization". 

We've seen things like these in the past - here is an old post where a modified Exactive (I think it later became the Exactive EMR, one of my all time favorite little boxes) pulled out 59 different forms of ovalbumin.

That's hard to look at and really encapsulate. Woodland et al. went full classic protein biochemistry and - this isn't hard to understand. This is a "purified albumin" separated by isoelectric focusing in dimension 1 and by SDS-PAGE in dimension 2.


This is a purified protein??? Some of that stuff probably was just tagging along, but a whole ton of that is albumin proteoforms. And - again - there is probably a very good reason for why an organism would expend energy to develop alternative forms of all these proteins, right? 

This is a super cool and thought provoking study that pokes some holes in more than a couple of our normal assumptions. 

Tuesday, December 10, 2024

ProHap - Search your proteomics data against population variants! Critically important new community resource!

 


STOP. IGNORE THE FLOWCHART ABOVE. These are bioinformatics people, they think this stuff is mandatory. I assume their conferences all have contests where the winner makes the flowchart most likely to make someone in another field throw up.

Again - don't look at it - 'cause this is legitimately important. 

You know how the genomics people have been doing things for years with illustrious sounding titles like "The 1,000 Human Genome Project?" Particularly when a lot of those things kicked off and the technology was more expensive, these things absorbed HUGE amounts of research dollars. The goals were to understand how human genomes vary across us - as a species.

And they did these things and they kept the results 100% secret from everyone forever. 

I guess that's not true, but - to me, as a human proteomics researcher - they have been less than useless. Yay, you did a bunch of stuff. Who does that help? Not me or anyone I know. Even researchers I know who focus on health disparities can't get usable data out of these things.

UNTIL NOW. 

What these awesome, though flowchart-loving, people did was dig into these top secret genomic databases and they assessed -

-you won't believe it -

Protein level changes across human populations! This is where it gets important. 

How many peptide level variants could there possibly be in 1,000 genomes? 12? 15? 

Try 54,679! Don't believe me? Here is a completely not illegally taken screenshot. Don't sue me!


Almost FIFTY-FIVE THOUSAND PEPTIDE VARIANTS?!?

How many are you looking for in your data? One? Yeah, me too. I mean, unless we're doing deep cancer genomics and then we search for 2 million. Why not normal variants?!? 

Okay - are you thinking - "big deal, I probably need to spend the next 10 days downloading kludgey python scripts written by proteomics people and finding out that my Docker thing is from 2017? How on earth does this help me?"

And this is where this is super legit. 

Go here. https://zenodo.org/records/12671302

Download this - 


Use 7-zip or something to unzip it twice. (I don't know, it's right there with the flowchart competition - bioinformatics people have contests to see who can zip things the most times. Bonus - as in here - instead of naming each zip .zip you can name them weird things.) The first thing you unzip is .gz, then it will make a .tar, and you also unzip that - and you'll get the whole reason I've written this entire thing -


You get a FASTA FILE that represents common peptide level variants that appear in human beings across our population! 


Yeah, it's pretty big. 104MB and 157k entries. But you're encapsulating a much larger percentage of normal human genetics now!
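If you'd rather script the double-unzip and sanity check that entry count yourself, Python's standard library does both - tarfile eats the .gz and the .tar in a single pass, and FASTA entries are just lines starting with ">". A minimal sketch (the filenames are placeholders for whatever the Zenodo record actually hands you):

    import tarfile

    # Placeholder names - substitute the actual Zenodo download / extracted file.
    archive = "prohap_fasta.tar.gz"

    # tarfile decompresses the .gz and unpacks the .tar in one call,
    # so the "unzip it twice" step happens in one go.
    with tarfile.open(archive, mode="r:gz") as tar:
        tar.extractall(path="prohap_extracted")

    # Count the entries ('>' header lines) in the extracted FASTA.
    n_entries = 0
    with open("prohap_extracted/prohap_variants.fasta") as fasta:
        for line in fasta:
            if line.startswith(">"):
                n_entries += 1
    print(f"{n_entries:,} entries")  # should land somewhere around 157k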

100% check out the paper. They did other smart stuff and there are other (possibly superior) files, depending on your application.

If you're using FragPipe (you should be!) check out this advice from Alexey! 


And check out this additional resource from his team here!

Monday, December 9, 2024

Top down proteoform analysis of kinase inhibitors in an approachable method!

 


Wow. This new study of kinase inhibitor treatment of cancer cells - using top down (intact protein/ no digestion) proteomics is 

1) Super legit

2) Seems really approachable

3) Kind of resets the bar in my head for what we can do right now with today's off-the-shelf technology.

And I might have a surprise for you. While Neil Kelleher's name is here because it is part of a special issue in his honor - this isn't a Kelleher lab study! 


Generally when we see a super impressive top down study I flip through it and then think - cool - maybe I'll be able to replicate it in 10 years? There are often modified instruments or things where you think - if I was able to keep my core scientific team together in a group for a decade we could pull off something this hard.

Not to say there isn't some legit technical firepower on this study (Kevin Gao is a pro's pro mass spectrometrist), but you can read through this protocol and think - wait - could I totally do this? 

Instrumentation is an Exploris 240! (Approachable, affordable, clean it yourself hardware!) 

The HPLC is a custom Accela....okay, well...I don't have one of those, but it is running at 400nL/min with an interesting combination of buffers. I assume any U3000 RSLC, Eksigent or whatever could match those same performance metrics.

Custom nanoLC source. Details in references, but you can make a nanoLC source from Legos. Probably not that tough to reproduce (or necessary). There are funny little bars that are necessary for the Exploris systems when you make your own source and those can set you back several hundred $$

They used TopPic suite for the data analysis, which you can get for free here, as long as you sign stuff saying you won't be a jerk. For some of the focused proteoform-specific work (is the phosphorylation site at this position or that one?) they (interestingly) used BioPharma Finder. I've never loaded more than 5 proteins in that at a time and it's super slow with that many. I assume they put in one sequence and a narrow time window in order to really lock down that one target they're trying to localize.

The results are well displayed - really pretty and clear - and, again, really might just change your mind about doing top down proteomics. Bravo to this team, I legitimately loved reading this paper from beginning to end. 

Wait - found something to complain about! Whew, I was worried. They haven't unlocked the PRIDE repository, so I can't look at the files. It was just accepted (JPR ASAP).

Sunday, December 8, 2024

Is this the year I finally win the US HUPO conference T-shirt design contest?!?

 


I think it is, though I also thought that in 2022....

and...maybe I did win the Chicago one...? No, it looks like I tried to print my own shirt and the company thought I was playing a joke on them? Weird. 

Well, if you think you can beat my entry, go ahead and try! Mwhahahahahahaaaa. You can waste your time submitting one here

Saturday, December 7, 2024

THE (real) single cell proteomics technique scSeq people love - NanoSplits - is out!

 


Check out one of my favorite techniques of the last few years - the NanoSplits paper here! 


The first preprint of this study is somewhere on the blog, but the work evolved considerably since we initially saw it.

If you aren't familiar, what this does is label-free preparation of REAL NORMAL SIZED SINGLE CELLS - ONE (1, uno, um, eins, jeden, yski, en siffra, een, ichi) at a time - on glass slides using precision robotics.

THEN the lysed cell is split into 2 fractions with most of the protein going one way and more of the little transcripts going the other way. You do single cell proteomics on the fraction with more protein and you can amplify the transcripts in the other fraction for transcriptomics. 

BOOM! You get everything! Now, there are obviously some drawbacks here, including that it is really hard to do. You need the precision robotics. This team features some people with serious instrumentation backgrounds but also people with a history of simplifying methods so mortals can eventually do them. We've written 2 grant applications where the technique has been prominently featured. The scSeq people are a whole lot more comfortable with this measuring protein thing if they can get evidence that you aren't just making stuff up! 

What's super cool here is that while multiple groups have shown complementary data by doing stuff like single cell proteomics and single cell seq on the same or very similar populations of cells (in a recent study, my group dosed the same cell line from the same source with the same drug) - here you get a real pairing - Cell A proteomics and transcriptomics fill in a specific pattern. Cell B the same.

The authors are quick to point out that NanoSplits could be a bridge technique to unify findings between more traditional studies where you either do SCP or scSeq or both on the same population. A small number of cells split could explain discrepancies between these 2 data types, or help you truly link 2 populations together. 

Seriously - a phenomenal, clever technique with top notch data collection and informatics and when I resubmit a grant in a couple of months I'm sure my reviewers will be excited to see a prominently published paper rather than a link to a preprint.   

Friday, December 6, 2024

Nature's Method of the Year 2024 is Spatial Proteomics!

 


WOOOOOOOOOHOOOOOO! 

Editorial here! 

Last year it was long read sequencing or something (they forgot to include it in the 2014 issue, I'm pretty sure).

Check out this special virtual issue (click on the references!) highlighting a bunch of cool people in our field and their work!

Thursday, December 5, 2024

Improve the false discovery rate of your match between runs with PIP-ECHO!

For an old and probably inaccurate description of match between runs (MBR) you can check out this old post. 

Also, you probably shouldn't go past Fengchao and Sarah's paper here. Link might be the preprint.

Quick breakdown, though -

Imagine you run 50 LCMS runs on different patient samples.

In 35 of those runs you fragment and successfully identify PEPTIIIIDEK; it's pretty much 100% +2 charged at m/z 634.3608 and comes off at 15.6 minutes.

In the other 15 runs you see a +2 peptide at 15.6 minutes but you don't fragment it or don't get good enough sequence quality for a positive ID. 

Match Between Runs (MBR) to the rescue! It donates that identification to the runs where it was not identified. 
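Side note - that 634.3608 isn't a made up number. It's just the summed monoisotopic residue masses plus water, plus one proton per charge. A quick sketch in Python if you want to check it:

    # Monoisotopic residue masses for the amino acids in PEPTIIIIDEK.
    MONO = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
            "I": 113.08406, "D": 115.02694, "K": 128.09496}
    WATER, PROTON = 18.010565, 1.007276

    def mz(sequence, charge):
        # Residue masses + water for the neutral peptide,
        # then add a proton per charge and divide by the charge.
        return (sum(MONO[aa] for aa in sequence) + WATER + charge * PROTON) / charge

    print(round(mz("PEPTIIIIDEK", 2), 4))  # -> 634.3608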

Perfect idea, right? What's the problem? There are a crapload of peptides in any tryptic digest and they coelute a lot. And as the dynamic range of our instruments keeps going up we see lower abundance peptides that we might not have before.

Compound this with shorter LC gradients

And the fact that every mass analyzer has a +/- mass error

And the fact that retention time on nanoLC (which everyone is pretty much using for some reason no one can justify) drifts - so, more accurately, in those 35 runs that peptide is coming off somewhere between 14.5 and 16.5 minutes - and now you might be quantifying the wrong peptide.
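To see how you end up quantifying the wrong thing, here's a toy sketch of the matching step - my own oversimplification, not IonQuant's, PIP-ECHO's, or anyone else's actual algorithm, and the second "identified" peptide is invented for the example. Both candidates fall inside perfectly reasonable tolerances:

    PPM_TOL = 10.0  # mass tolerance, parts per million
    RT_TOL = 1.0    # retention time tolerance, minutes

    def ppm_error(observed_mz, reference_mz):
        return abs(observed_mz - reference_mz) / reference_mz * 1e6

    def donate_ids(feature, identified):
        # Return every identified peptide this feature could be. More than
        # one hit means you're rolling dice on which one you just quantified.
        return [pep for pep in identified
                if pep["z"] == feature["z"]
                and ppm_error(feature["mz"], pep["mz"]) <= PPM_TOL
                and abs(feature["rt"] - pep["rt"]) <= RT_TOL]

    identified = [
        {"seq": "PEPTIIIIDEK", "mz": 634.3608, "z": 2, "rt": 15.6},
        # An invented coeluting near-isobar, for illustration only:
        {"seq": "NOTTHESAMEPEPK", "mz": 634.3581, "z": 2, "rt": 15.9},
    ]
    orphan = {"mz": 634.3590, "z": 2, "rt": 15.7}  # +2 feature, never fragmented

    print([p["seq"] for p in donate_ids(orphan, identified)])
    # -> ['PEPTIIIIDEK', 'NOTTHESAMEPEPK'] - both fit the window.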

The top link above is to IonQuant which works in FragPipe. 

Could you take that idea and build something even better? Maybe! Just do this! 




The comparisons look good, though! And there is some serious nerd power on the preprint


I checked the updates and it doesn't look like it's live in MetaMorpheus yet, though something that was driving me absolutely crazy a while back is! (Thank you! I thought my brain had broken - the diagnostic ion is wrong in a lot of places, including in software from my group.)

Knowing these people, we'll see PIP-ECHO in one of the upcoming builds. Fingers crossed it will work outside of the Orbi domain!




Wednesday, December 4, 2024

BlueSky is tracked by Altmetric. Twitter is finally dead to science!



Well...that was a run! I was on Twitter for 11(?) years and Tweeted over 10,000 times. 

Mandatory, obviously. 


It's done, though. Biorxiv won't link your Tweets and I don't see a tab on Altmetric. That increasingly bizarre drug addict killed what was at one time the best device for rapidly disseminating scientific advances I'd ever seen. 

BLUESKY is going to be better, I think. The expertise density is legit and some people are starting to figure out some of the cool features that I haven't yet. 

And!  BLUESKY IS TRACKED BY ALTMETRIC, JUST LIKE THIS WEIRD BLOG!  Wait. What? Sure is! And has been for almost a decade. I don't know why, I just type things here with my nice ergonomic keyboard and my broken brain filters. 

Tuesday, December 3, 2024

SCPro - Not single cell proteomics, at all, in any way, but still pretty cool!

 


Here, I fixed it for ya! Even though I can clearly see why I also wouldn't have recommended Ben Orsburn as a reviewer for this one, I do actually really like this new study


I will, however, complain first. Last week I had a great meeting set up with a potential funder for my program and we got to an impasse that was something like the most important person on the call saying

"-of course we understand that single cell proteomics is not actually a single cell" 

And that was not at all unexpected, because this was not a dumb group of people. They're up to date and read a lot and obviously they realize that

THE VAST MAJORITY OF "SINGLE CELL PROTEOMICS" PAPERS 

DO NOT

DO PROTEOMICS ON ONE (1) CELL. 

They don't do one cell because it is still hard to do. Believe me, I don't care what hardware you have - you really have to be on your A-game with everything planned out and have some luck (no lab floods helps a lot), and a 384 well plate that is slightly mismanufactured so it sits silly in your CellenOne can break some expensive glass and you walk away with nothing at the end of a 14 hour day in the lab when you didn't get to stop for lunch (true story).

So people do things like flow sort 1,000 cells as in the study above, or they stain and cut out 10 or 200 cell regions based on cell type specificity markers (as they also did in this study), and to really boost the impact of their paper they put "single cell proteomics" in the title. Or, if they're super on their marketing game, they'll name their not-single-cell-proteomics method something like SCPro. Deliberately confusing?

Again, in this study - which I do seriously like - they do both. The microscopy is nice, the flow sorting looks good. The front end prep on a tip with SCX and C-18 is - in my somewhat professional opinion - probably a whole lot of extra work for very little actual gain over doing something with fewer steps and fewer places for lower concentration peptides to bind. But the library generation and diaPASEF analysis on a TIMSTOF Pro results in a solid number of IDs (50 um custom columns, with low flow rates on a nanoElute). When you get down to what looks to my eye like probably 5-ish cells sliced out (probably 1ng-ish of protein/peptide) they're getting 500 proteins, which is a solid achievement. At 10-30 cells (maybe 2-6 ng of protein/peptide) they're getting above 2k. Again, nice numbers.

The downstream analysis is well integrated and the LCMS files are publicly available (I haven't checked - my office PC has software on it that doesn't like the iPROX access portal sometimes).

Again, it's a nice study - working with a grand total of 100 nanograms of peptides from 1,000 flow sorted cells does still require some finesse. It is, however, frustrating to spend weeks optimizing the isolation of one (1) single cell at a time and analyzing them one at a time and then hit these new perception hurdles. Like - well, this other lab is doing single cell proteomics (they aren't) - or that no one can actually analyze one single cell, when we have whole conferences where the admittedly small number of researchers actually doing one cell at a time speak about it.

These perception hurdles have always existed, though. I have little scratches in the surface of my relatively new tablet where I've broken the expensive little tips off of the pens that can write on my relatively new tablet when people have said "well, mass spec isn't quantitative." 4 scratches in the last 2 years, for sure. 

This would have been a super positive review (aside from the SCX/C-18 tip) without the title and misleading name for the technique. 

Monday, December 2, 2024

Proteomics of ...spontaneous Achilles rupture....

 

I would like to thank these authors and the prestigious Journal of Proteome Research for something new to have nightmares about

SPONTANEOUS ACHILLES EXPLOSIONS! 


Proteomics to the rescue! (By the way, there is this whole series of bizarre children's books where there will be some silly problem and it's all COWS TO THE RESCUE or something. It's funny by the 11th page and continues through the 6th book somehow.) 

Obviously, this group wants to understand why sometimes people's Achilles tendons up and explode just for fun, and they are able to get samples from patients who end up getting corrective surgeries! Obviously, this is yet another place where genomics/transcriptomics of the tissue will probably tell you nothing - so it's proteomics time!

Interestingly, the group breaks out iTRAQ 8-plex and does pooling, offline fractionation (by SCX, I think, but I forget now - did a singing daycare drop between reading it and now) and then analysis of the fractions on a Q Exactive (Classic, I think). All the files are up on PRIDE where they should be. I have no issue with iTRAQ 8-plex here, by the way. They turned up the collision energy and fragmented each target 2x before putting it on the dynamic exclusion list. The 8-plex allows them to run the QE at maximum (vendor permitted) speed with 17,500 resolution at m/z 200.

What I do have an issue with is the surprising pictures of the operation itself!! I was expecting a calibration or volcano plot and - blech. 

Seriously, though, it is all pretty interesting. There are structural differences in the ruptured tissue that are clearly visible and they go into depth with IHC. They find a panel of targets that might be indicative of potential rupture candidates? It's a super compelling study all around - on something I now get to think about.

Interestingly...I think this is a dataset that would be a solid candidate for a reanalysis because it looks like this group didn't consider common collagen PTMs. I'm assuming when they considered dynamic oxidations they exclusively meant methionines. Collagens are hyper-modified. In fact, in the BOLT cloud search engine, Amol wrote a whole crapload of common collagen PTMs into the first pass search because they're just that common. I think he got that idea from the cRAP database guy, whatever his name is. 😇

Wednesday, November 27, 2024

New Hats!

 


It'll take me a while to update everything on all the internet things but we can finally wear these new hats openly.

We'll get used to the colors, though orange and maroon will likely always be my favorite combo.

We're moving so I can get started as Assistant Professor in the Department of Pharmacology and Chemical Biology (through the new institute I don't know if I can talk about yet - soon!). I'll also be helping out Stacy Gelhaus as an Associate Director in the phenomenal Health Sciences Mass Spectrometry center.

I JUST ORDERED SUPER EXPENSIVE BIG HEAVY THINGS and saw pictures of crates with MY name on them. Not someone else's. My name. Weird. Normally "care of" would be expected. Huge shoutout to my friends at Bruker Daltonics for helping an Assistant Professor's money stretch out enough to cover what I need to start.


Monday, November 25, 2024

Find what proteins are being made RIGHT now with DADPS!

 

This new paper is an improvement over a super cool already existing technology that I had NO IDEA EXISTED AT ALL


Did you know that you could dump in something that cells would mistake for normal methionine that you could selectively pull down, so that you knew what proteins were being made at that point in time? I did not. 

I bet the nerds pulling down ribosomes and cutting the attached mRNA with nucleases, then busting up the ribosomes and sequencing what didn't get digested (RiboSeq, definitely not convoluted at all) didn't either. Sure, RiboSeq is cool - and programs like ProteoFormer have been around for a while to combine RiboSeq and proteomics data output.

But I don't know how to do RiboSeq. I'd have to go back to a grant application that wasn't funded a while back to figure out who on our team was the expert on that part to even know who to start asking dumb questions about the technology. OR I could just do this? And have an output I understand that says "your drug is causing the cell to start making proteins A/B/C right now?" 

Again, this is an optimization, but the authors use the original long acronym thing you don't need to commit to memory (because theirs is better) and demonstrate that it can also be used with TMT - it seems to work best with SPS/MS3 based TMT quan.


Friday, November 22, 2024

Cognitive impairment at high altitude? Proteomics (and metabolomics) to the rescue!

 


I don't know about you, but I'm waaaaaay dumber than usual when I'm at really high elevations. Not only dumb, but also lazy and tired and 2 glasses of wine and I might just fall right off of a ski lift. 

I went to a wine convention thing in Colorado at a Ski resort years ago and found out all of these things. 

This study tried to get to the bottom of this by collecting proteomic and metabolomic samples from people who went to work at high elevations for 6 months. Holy crap. Some of this work was at 4,800 meters in elevation. There isn't anything in Colorado that tall. The highest lift I fell off of wasn't even 4,000 meters. I'd be useless up there.


These authors compared serum proteomics (TIMSTOF Pro with....ummm....I'm not sure I understand what else. SDS-PAGE fractionation? Online? At high pH? An Ultimate 3000 was used on the TIMSTOF, but no details seem to be here for the column, flow rate, or gradient length) and the data was processed by....umm....magic....? No software was mentioned that made the PSM/peptide or protein assignments. I suspect the person doing the actual proteomics was not consulted by the authors on that section. Or they're author 7 and they're like "f' you guys, figure it out yourself," which I've heard sometimes happens.

The metabolomics was done on a Q Exactive and there are details. Waters BEH 2.1 mm x 10 cm column at 350 uL/min for a completely secret gradient length. It literally says "over time," and if you want to know how they ran the Q Exactive - you did not come to the right authors. No joke, check this out.

We optimized it for best performance and we will never ever tell you what that was and it's weird you'd want to know. The reviewers were not on top of their game for this one. Weird to see something like this in JPR. Nature something or other? Sure. Not JPR or MCP. Meh. 

The data does appear to be publicly available if you're all nosy and want to know or...you know...if you thought this was cool and you wanted to reproduce this work.

The plots are really nice, though. MSStats was used for the super secret proteomics data and an R wizard was onboard who tells you what packages he/she/they used. Hmm...no version info - but AHA, there is a GitHub and it is populated. The biology looks cool and I really dig some of the graphs, so Imma post it. The authors could put the complete mass spec methods in the GitHub later, since it doesn't appear to be published to Zenodo or something that will lock it from alterations.

Wednesday, November 20, 2024

The single cell proteomics/mass spec meeting schedule for 2025!

 

Image from my kid's very favorite thing to fall asleep looking at - NIH Bioart! 

When Single Cell Mass Spectrometry is announced, I'll add it here. Please ping me if there are others. I have no idea how I missed that Asilomar thing a couple of years ago, but I still feel dumb about not even knowing that I should be there. 

Here are the first two! 

Single Cell Proteomics (US/Boston/Northeastern) May 27-May 28

European Single Cell Proteomics (I love this one and it crushed me to miss 2024 in one of my all time favorite cities) Vienna! August 26-27! 


Tuesday, November 19, 2024

SomaScan - 7k vs 11k - seem to largely agree with one another...

 


I'm leaving this here largely for me to check if I can find plasma proteomics by LCMS that I can confidently link back to this same cohort. There is a lot of LCMS proteomics data from these authors, though all I have actively worked with has been muscle biopsies.


However - people are going to want to use the 11k SomaScan assay that is now out there, and it is nice to see that these authors find - after proper normalization and a lot of batch effect analysis - that data from these two technologies are aligned.

It should be noted that the precision of SomaScan has been shown to be solid. In 8 years of watching for it, I still haven't seen any evidence that the system produces results that are an accurate reflection of the amount of protein present. 


With the growth of this technology - including the use by some of the best groups in the world, such as this one - I hope hope hope hope that it is actually accurate and one day we'll see proof of it. 

Monday, November 18, 2024

HUPO 2025 - worth it for US people for döner kebabs alone!


Planning your conference schedule for 2025 and trying to decide which amazing US east coast city is going to get your money? 

US HUPO in Philadelphia? (February 22-26!) 

ASMS in the greatest city in the world? (June 1-5!) 

International HUPO in Toronto? (November 9-13!) 

If you're already in North America - Toronto is your one chance to get the amazing artery clogging amazingness that every European takes completely for granted. Toronto has döner kebab places EVERYWHERE! Even in NYC - which has everything - you need to really look, and probably hop on one of their famously clean subways to pop over a couple stops to find one. Toronto? All over the place! 

Saturday, November 16, 2024

Get ultralow flow rate nanoLC with ONE pump??

 

I did my PhD with an amazing chromatographer as a co-mentor. As such, it was just easier for him to handle all the hard stuff and I'd just run the vacuum chamber things. I like the simplest HPLC possible. Less to break and even better if I don't have to EVER look at a diagram like whatever the picture above is trying to explain to me. There are definitely switching valves involved. 

However - I could sure think of 4 things I could use ultralow flow rates for right this second. What if I could get that with ONE loading pump by preloading plugs of solvent that naturally form gradients by diffusion? Sounds like magic, but I'm absolutely interested! 

Check this craziness out here! 


Friday, November 15, 2024

Astral vs TIMSTOF Ultra on real single cells? Just about evenly unreal numbers!

Maybe one of the coolest things about the EvoSep is how it really can allow us to minimize variables from one system to the next. You generally have the same column (or, at most, maybe 3 columns, which do make a serious difference - more on that some day), the same flow rates, etc.

Which can allow some head to head instrument comparisons that are hard to get otherwise! 

Without further ado - ultra low level samples - 

I don't have either, but I do have ProteomeXchange and 


and 


People who actually make their results publicly available! 

As another front end bit of usefulness, both studies used this amazing single cell sample prep front end


Which generates AMAZING SINGLE CELL PROTEOMICS DATA. No question. Amazing. Is it the most expensive way you can prep a single cell today? Also yes. But before we update a preprint with a new clear winner for absolutely most expensive single cell proteomics study ever performed (you can probably guess what mass spec was used...) we'll pull down a crapload of files from these studies and process them with the same workflow.

Thank you DIA-NN 1.9.1! (Also, after talking to Vadim I realized I should make separate library-free predicted libraries for each study - it does take into account whether you're using .d or .RAW when it makes the library. However, every other setting was left the same. Predicted off the same UniProt human FASTA with all the same settings - and allowing DIA-NN to work out the appropriate windows for mass accuracy, etc., trying to be unbiased.) Select file, select correct spectral library. Run. Wait. Again - boom - standardized data processing?
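If you'd rather script that than click through the GUI, the same library-free setup looks roughly like this (wrapped in Python for this post). This is a sketch from memory of the DIA-NN 1.9.x command line flags - double check them against your own build's documentation, and every file path here is a placeholder:

    import subprocess

    # Library-free DIA-NN run: predict a library from the FASTA, then search.
    # Flags per my reading of the 1.9.x docs - verify against your build.
    subprocess.run([
        "diann",
        "--f", "single_cell_01.d",       # repeat --f once per raw file
        "--lib", "",                      # empty library = library-free mode
        "--fasta", "uniprot_human.fasta",
        "--fasta-search",                 # digest the FASTA in silico
        "--predictor",                    # deep learning spectra/RT/IM prediction
        "--gen-spec-lib",                 # keep the predicted library for reuse
        "--out", "report.tsv",
        "--qvalue", "0.01",
        "--threads", "16",
    ], check=True)

Remember to build the predicted library separately for the .d files and the .raw files, since DIA-NN accounts for the input type when it makes the library.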

As an aside - wow - Ultra files take a lot longer to run in DIA-NN. The mass accuracy isn't quite as good, which takes longer, and then it's got to do the IMS comparisons. I thought the Astral files were straight up crashing because they were done so much faster without that 4th dimension to think about at all.

40SPD Whisper (nonZoom) for both. 

IonOpticks Aurora 15cm x 75um (which is Ultimate? Worth noting that they've updated their naming protocols for columns recently.) 

You're talking absolutely neck and neck here. Edge goes to the Astral by a tiny amount, maybe? I'm getting 3,200 protein groups on the Ultra(1) and about 3,400 on the Astral per cell. HOWEVER, the paper using the Astral is using HeLa cells, which have a higher total protein content than the cells the Broad used in this powerful demonstration of the budgetary power these two groups have. I think I processed 6 random cells from the Bruker and maybe 12 from the Astral, largely because of the time constraints mentioned above.

Both groups go to 80SPD in the study. Edge appears to go to the Astral at 1,200 protein groups/cell vs 850/cell on the Ultra. Again, that might be the larger cell, but it is very difficult to tell. HeLa has this really useful smooth protein distribution, particularly at the high end, compared to most other cells, which is why it's such a good cell type for demonstrating a proteomics method.

Wait. No plots and error bars? Yo, this is a blog post I'm writing while waiting for espresso to do magic stuff to my brain. If I felt smart enough to fire up GraphPad, I'd be doing actual work. I actually did a decent job on this comparison at the time because I thought I was going to buy one of these big heavy things this year.

Ultimately, these are always flawed comparisons, so this is where I'm going to stop. You can go to ProteomeXchange and pull these down and process them yourselves. Both groups get much higher numbers than I do because they both generated actual libraries from their own data. We know that helps A LOT on the TOFs and generally less so with the Orbitrap data, because we know what these predictors are based on - but the Astral isn't an Orbitrap and I don't think we've got a good comparison today on what that difference is.

At the end, though, it's really cool to see that we - as consumer scientists - have viable options for instrumentation. We can pick hardware because of how comfortable we are with the software interfaces, or for price or space considerations, and then forget about the silly hardware arms race and start doing biology with these things.

ALSO - HOW CRAZY IS THIS??? The QC we had at the core I worked at before I went back to fail in academia was 200 NANOGRAMS OF HeLa and 2 HOUR GRADIENTS. I wasn't getting 3,500 proteins on my hardware with that! The fact you can process an actual sample that is 1,000 times lower in concentration (200 ng / 1,000 = 200 pg, which is right around one HeLa cell's worth of protein) and get 3,500 proteins in less than 1 hour is absurdly amazing. I think about this all the time. Where else can you say, yeah, we got 1,000 times more sensitive and faster in like 5 years? Absurd. And all signs seem to indicate we aren't out of this exponential increase in hardware capabilities quite yet.

Tuesday, November 12, 2024

What DIA algorithms perform best for single amino acid variants?!?

 


Yes! I have also wondered this exact same question. DIA is great for protein level quan because you get lots more peptides per protein, so it works out even if they aren't always as high quality as in DDA proteomics.

Protein post-translational modifications still don't work as well in my hands with DIA vs DDA. My last check of human samples run on a TIMSTOF with DDA vs DIA was like 8x more PTMs with DDA. Come on, IRB paperwork! I'd love to write this paper someday.

What about those annoying single amino acid variants that every human being has? Except for the completely normal in every way Craig Venter who loaned so much early DNA that he's probably just the UniProt human database (there are many inaccuracies in this last sentence). 

Don't do it yourself - read this cool open access paper! 


Probably worth mentioning that this is Orbi-Orbi DDA and DIA. An Eclipse and Exploris were both employed at different points. You could probably assume some variation when you go to faster instruments with lower relative resolution/mass accuracy. So...maybe doing this analysis with a TOF might make sense? 

A minor criticism is that the authors did more work than strictly necessary. You could ignore COSMIC entirely and just go download the XMAn fasta libraries that an amazing professor (who mentored an enthusiastic weirdo a few years ago - crap, maybe it was more than a few years ago) updates every few years. Original paper here. If you wanted single amino acid variants added since the 2020 (?) update, I can see doing the extra work yourself, I guess.
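If you do grab one of those variant FASTAs, bolting it onto your canonical database for a combined search is a two minute job (plain Python; both filenames are placeholders):

    # Concatenate the canonical proteome and a SAAV FASTA (XMAn-style)
    # into a single search database. Filenames are placeholders.
    with open("combined_search.fasta", "w") as out:
        for path in ("uniprot_human.fasta", "xman_saav.fasta"):
            with open(path) as fasta:
                out.write(fasta.read())
                out.write("\n")  # guard against a missing trailing newline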

As an aside (surprise - enthusiasm! - maybe it's back for good?) we struggled a lot at first with my postdoc's CKB knockout mice when doing DIA proteomics. There are lots of CKs (creatine kinases) and they are very similar. We'd see CKM(uscle) in places in mice where there is no muscle, and the knockout mice always looked down-regulated, not knocked out, because one peptide would be attributed to the wrong CK. That seemed to get a lot better every time we'd get a new DIA-NN or Spectronaut update.

What tool would you use for a fasta with a bunch of SAAVs in it? The paper is open access - check it out yourself. Worth noting, DIA-NN's new update specifically has words about improving proteoform level quan in the release notes.