Do they cite the paper that gave the S-Trap method to the world? No, and that thing would be at about a zillion citations if everyone did. Otherwise, it's a nice comparative analysis.
Okay - so this one has bugged me (and a lot of other people) for a long time - we can now do a pretty great job of predicting peptide fragmentation (as long as you aren't dealing with PTMs, the vast majority of which the predictors can't handle). Supposedly we can also do a solid job of predicting peptide elution patterns (though exclusively from C-18 reversed phase chromatography).
What has been missing is predicting what peptides from each protein will actually ionize (or fly).
This has been tried before, btw -
I'm a little sad to say this, but when I did my normal round of sending out a paper that I just found yesterday and was reading at lunch, the responses were universally ...skeptical at best.... but maybe this is finally it!
Introducing pFLY! (I read it at lunch yesterday and it's faded in my mind a little, but I'm just about 99.0% sure that the p stands for Pug.)
I've got a lot to do today, but this new study is a jaw-dropper.
Sometimes this blog is just what I learn as I'm going through learning something for myself - and this is clearly one of those posts.
One thing that was not emphasized nearly as well as it could have been during my interviews at Pitt was the absolutely amazing, world class High Performance Computing (HPC) cluster framework that we have.
It took a little work and me bugging colleagues with dumb questions, but I've got some workflows going that need a lot of firepower!
Namely things like FragPipe open search - and R packages that almost inevitably require ludicrous amounts of RAM.
Things I've learned so far:

1) The time to get my data to the HPC can be a bottleneck worth considering. My TT Ultra2 is generating around 160GB of data/day right now - around 1.5GB per single cell and closer to 4GB for libraries and QC samples, which seems to average out pretty close to 160GB. Transferring 1 day of files to the HPC seems to take around 1-2 hours. Not a big deal, but something to consider if you're the person prepping samples, running the instruments, writing the grants and papers, writing blogposts and picking your kid up on time from daycare every day. Worth planning those transfers out.
2) NOT ALL PROGRAMS YOU USE WORK IN LINUX. FragPipe, SearchGUI/PeptideShaker and MaxQuant are all very, very pretty in Linux - honestly, they look nicer and probably run better than in Windows. DIA-NN will run in Linux, but you do lose the GUI and have to go command line. What you can do, though, is set up your runs in the GUI and then export the command from DIA-NN (there's a rough sketch of what that looks like right after this list). Maybe I'll show the full thing later.
3) You may need to have good estimates of your time usage. In my case I currently get a 50,000 core hour allotment. If I am just doing 80 FragPipe runs, I need to think about the cores I need x the number of hours I need those cores. I can't request more than 128 cores simultaneously right now (for some reason, yesterday I could only request 64 with FragPipe - I should check). But if I need 128 cores for 10 hours, that's 1,280 core hours I will blow through (there's a back-of-the-envelope sketch of this math right after the list).
Since MSFragger itself is ultra-fast, but match-between-runs and MS1 ion extraction are slower and cap out at fewer cores per file, there isn't much difference for a small dataset once you're at something like 32 cores - your bottlenecks aren't the steps that keep scaling as you add cores.
4) Things that are RAM dependent may be WAY WAY FASTER. I think we scale to 8GB of RAM/core on our base clusters here, so 32 cores gives me 256GB of RAM! If your program can read/write fast enough to offset a lack of RAM, or can put every bit of RAM around to maximum effect, those things can be much, much faster.
5) Processes that depend on per-core speed may be slower. For a test, I gave FragPipe 22 the same 2 LFQ files with 14 cores on a desktop in my lab and 14 cores on the HPC - and, unsurprisingly, the desktop came out ahead. You can really crank up the GHz on desktop PCs, whereas it makes sense to run lower overall core speeds when you have 10,000 cores sitting around.
6) You probably need help with all installation and upgrades. Most of us are used to that by now, though. I can upgrade my lab PCs to FragPipe 23 today. I need to put in a service request to have someone upgrade me on the HPC.
7) You may have to wait in line. I tried to set up some FragPipe runs before bed and requested the HPC allotments, then dozed off in my chair waiting my turn. When I woke up the clock had already started ticking - I wasn't using my cores, but I had blocked them so no one else could use them, so they did count against me.
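Since item 2 might sound scarier than it is, here's a minimal sketch of what kicking off a headless DIA-NN run from Python on a Linux node could look like. Every executable name, file path and value below is a placeholder of mine, not something from a real run - the safest route is to build your run in the GUI and paste in the exact command it exports for you.

```python
# Minimal sketch of launching DIA-NN headless on a Linux node.
# Everything here is a placeholder - the executable name, file paths and
# settings should come from the command the DIA-NN GUI exports for your run.
import subprocess

diann_cmd = [
    "diann",                        # DIA-NN executable (name/path depends on your install)
    "--f", "single_cell_01.d",      # placeholder raw file
    "--lib", "my_library.tsv",      # placeholder spectral library
    "--out", "report.tsv",          # output report
    "--qvalue", "0.01",             # 1% FDR
    "--threads", "32",              # match the cores you requested from the scheduler
]

subprocess.run(diann_cmd, check=True)
```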
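And here's the back-of-the-envelope math from items 1, 3 and 4 that I now do before requesting anything. Nothing fancy - every number is just the example from this post, so swap in your own.

```python
# Back-of-the-envelope HPC budgeting with the example numbers from this post.
# Swap in your own values - none of this is universal.

daily_data_gb   = 160      # roughly what the instrument generates per day
transfer_gb_hr  = 100      # rough observed transfer rate (~160 GB in 1.5-2 hours)
cores_requested = 128      # the most I can request simultaneously right now
hours_needed    = 10       # wall-time estimate for the batch
ram_per_core_gb = 8        # RAM scaling on our base cluster nodes
allotment       = 50_000   # total core-hour allocation

transfer_hours = daily_data_gb / transfer_gb_hr            # item 1
core_hours     = cores_requested * hours_needed            # item 3
total_ram_gb   = cores_requested * ram_per_core_gb         # item 4

print(f"Transfer time for one day of data: ~{transfer_hours:.1f} hours")
print(f"Core hours this batch will burn:    {core_hours:,}")
print(f"Fraction of allotment used:         {core_hours / allotment:.1%}")
print(f"RAM available at this request:      {total_ram_gb} GB")
```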
I'll probably add to this later -but I highly recommend this recent study out of Harvard which has been my go-to guide.
One thing that I find really surprising here is that - unlike previous studies - this group tried the reduced volume of 384 well plates and found autosampler vials more reproducible. I'm stumped on this one. This is contrary to everything I've seen, contrary to what Matzinger et al. found, and frankly just counterintuitive across the board.
The surface area of an autosampler vial is huge compared to the bottom of a 384 well plate. I do find it a complete pain in the neck to calibrate some autosamplers to accurately pick up out of 384 well plates, but I don't know how much that plays in here. Also, some glass binds fewer peptides than some plastics. Insert shrug.
That aside, the authors put one oocyte into things with the CellenOne and then add digest. Incubate and inject. 60 min run-to-run on a 50 µm x 20 cm column, running diaPASEF with a 166 ms ramp time.
Data analysis was in SpectroNaut.
Okay, and the reason this is escaping the drafts folder is because the biology is really cool. They look at both artificial (handling) and natural (aging-linked) conditions and how those affect single oocytes. There are a lot of people out there who care about how those things (probably not in mice, but maybe?) change throughout the aging process!
I wonder if this was inspired by some of the same things that I was just complaining about?
Okay, so rather than just complain about it, I also went crowdsourcing to find resources - and here is a 4 minute video showing you how to make your data publicly available on PRIDE!
Transcript abundance tells you what a cell wants to do.
Peptide/protein abundance tells you what the cell is actually doing.
You can get measurements of the transcripts of tens of thousands of cells with a few hours of effort - pass it off, and the reports come back in a few days.
Each single cell proteome is a lot slower and a lot more expensive, but worth it for the whole... biological relevance... thing....
I've been on the fence about diagonalPASEF, but I guess when my SpectroNaut license goes live it's probably time to try it.
I legitimately don't know who came up with diagonalPASEF - there were too many cool methods coming out too fast for me to even try them. But it almost looks like 3 groups (all European...of course....) all had very similar ideas. But on my new instrument it's just a button, so Imma just push it and see what happens.
The bummer is that I do have to take my source off and calibrate the instrument with the ESI source, which I haven't done since it was installed. (You can do good mass and TIMS calibration now without the ESI source, but you do need to sensitivity tune and/or quad tune for diagonalPASEF with the source.)
But this is legitimately smart looking!
Y'all, this ASMS is going to be sooooo crazy. Despite the lack of Europeans and the fact none of us in the US have any money to do science.....
The biologists want to detect a protein and they want to be able to say that in condition 1 vs condition 2 one of those conditions might possibly maybe have more protein. They don't care at all how much more protein. And - again - I'm the one here who is probably wrong. Hannah did this phenomenal thesis project in my lab where she worked out the nanomolar concentrations of 7k or 8k proteins at the blood brain barrier. We were operating under the assumption that absolute concentrations have value. Like - if you are doing medical imaging you know that proteins below xxnM just can't be visualized with any of today's technology. Don't try. And maybe that's just one outlier where we absolutely have to know the protein concentration.
Maybe the other clinical assays, like CRP and troponin and ALT/AST ratios are also outliers. Sure - whether you're going to get a wire jabbed into a blood vessel might be determined by the absolute amount of troponin in your blood right now as compared to 30 minutes ago. But it really appears that for the vast majority of new people in proteomics they want to know - is there probably less protein here and more protein there?
So if you really just want to detect proteins and you truly do not care how much of the protein is around - and you've got a lot of money - do I have a technology to show you!
SOMASCAN COUPLED TO NEXT NEXT GEN SEQUENCING! It's called Illumina Protein Prep!
For real, it's a real thing.
First of all - let's look at what aptamers are and what they do. I'll back way up because some people looked at me like I was out of my mind when I talked about the proteomics assay with the lowest quantitative dynamic range.
Let's start with this review from ancient history (2010). Don't worry, this is a physical limitation of protein oligonucleotide interactions. Not much has changed but there are more modern references below.
The binding, however, is governed by the dissociation constant (or its inverse, the association constant), and it only tracks protein concentration in a linear way over an extremely narrow dynamic range.
Imagine you have patient A and patient B, and one has 5nM of IgE in their blood. Well....that's probably about where the blank is, so you get a zero. What happens if you have 1,000nM of IgE? Well...you probably register at about 150nM, maybe a little bit more? Again, maybe you do not actually care in any way whether you have 150nM or 10,000nM? Maybe you're just weird for wanting to know.
What's important here, though, is that each aptamer is like this. It is designed for a very specific protein, and each one has its own binding and dissociation constants. It's also important to know that in a complex solution you're dumping in (in the case of Illumina Protein Prep) about 10,000 of these different aptamers! It is very, very likely that the roughly 1-order linear quantitative dynamic range represented in this figure for an isolated 1 vs 1 system is perturbed and not quite as good as the above.
Edit - 5/3/2025 - because I'm self-conscious about the crazy number of views this 20 minutes of typing has gotten in a single day.
This is roughly how a pile of aptamer measurements works:

True concentration of protein X - 0 nM - Aptamer readout - not zero
True concentration of protein X - 5 nM - Aptamer readout - same as blank
True concentration of protein X - 10 nM - Aptamer readout - 2x blank
True concentration of protein X - 20 nM - Aptamer readout - 2x the 10 nM readout - This is good! You're in your dynamic range!
True concentration of protein X - 50 nM - Aptamer readout - 3x the 10 nM readout - It's still higher, but you've already left that little window where your aptamer binding corresponds linearly to the amount of protein (your linear quantitative dynamic range).
True concentration of protein X - 100 nM - Aptamer readout - 4x the 10 nM readout - It's still higher, but you now need fancy math to estimate how much protein is there based on the aptamer binding response.
True concentration of protein X - 1,000 nM - Aptamer readout - about 5x the 10 nM readout - You've maxed out, and all you know is that you've got more than 100 nM.
True concentration of protein X - 10,000 nM - Aptamer readout - about 5x the 10 nM readout - same as above.
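If you'd rather see that saturation as numbers, here's a toy one-site (Langmuir-style) binding model. The Kd, the background and the concentrations are all invented, so it won't line up exactly with my made-up list above, but the shape of the problem is the same: the readout flattens out while the true concentration keeps climbing.

```python
# Toy one-site binding model to illustrate why an aptamer readout saturates.
# Kd, background and concentrations are invented for illustration only.

kd_nM = 50.0         # hypothetical dissociation constant for one aptamer
background = 0.05    # non-specific signal: you see something even at 0 nM
true_conc_nM = [0, 5, 10, 20, 50, 100, 1_000, 10_000]

for conc in true_conc_nM:
    # fraction of max signal: constant background plus simple saturable binding
    signal = background + (1 - background) * conc / (kd_nM + conc)
    print(f"true concentration {conc:>6} nM -> relative readout {signal:4.2f}")
```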
This is important because, as you'll see in the very last panel, it's pretty common in mass spectrometry to get a linear concentration/signal increase across this ENTIRE range.
So - in aptamer measurements -
A) You almost always see a signal whether or not there is any protein there at all. So....when someone tells you they can detect 1,000 or 10,000 or 100,000 proteins in a sample, you need to keep in mind that that is simply how many aptamers they put into the mixture. It doesn't mean they actually detected that number of proteins. They love to mix those terms up. And maybe you do see a measurement for each protein's aptamer - that does not necessarily mean protein detection.
B) You can trust that signal corresponds to how much protein is present in only a very narrow concentration range.
C) Above that 10x concentration range the value you see has no relationship AT ALL to the amount of protein present. You've simply maxed out.
End 5/3/2025 edits.
Again - the figures and review above are old - what can we do in 1 vs 1 relationships in 2025? Here is what I'd consider the high water mark today.
How did they do? Pretty darned good! About 1 order!
So....imagine my disappointment when - knowing that I couldn't talk about what I knew regarding an Illumina - SomaLogic partnership (I just assume I'm under NDA with every proteomics company in the world now and I just don't share anything until I can Google search it) - I discover that this does not appear to be what they did?
They appear to simply throw in the requirement to own a NovaSeq 6000 or NovaSeq X system to generate - get this - data on up to 384 samples per WEEK, which is 1/3 the speed of O-link? And even slower than mass spectrometry?
And if you're new here and aren't familiar with the quantitative dynamic range of mass spectrometry - here is the first thing I found searching my desktop. It's a Sciex app note, but this isn't extraordinary data; I can show you real data like this all day. It's actually surprising, because normally you'd think vendor app notes are going to be crazy unachievable data, and this is just very normal.