Thursday, September 19, 2024

Thermo and SpectroNaut are friends again!

 


So....I think there were some ...concerns... in some corners about how cozy one very popular and polished commercial software package was getting with one company. 

Boom! At least 3 vendors seem to be fully partnered up with one of my personal favorite tools! You can read this new announcement here

Wednesday, September 18, 2024

Analyzing the complexosome(?!) of malaria infected red blood cells!

 


Okay - so - y'all ready for a cool sideways approach to find protein-protein interacting pairs? 

Is your first thought? 

Ummm....don't we have 1 million of those? Like immunoprecipitation and affinity enrichments? 

Sure - if you have an antibody to every protein in your organism. Do you have an antibody to every protein from a malaria parasite? We don't even have a good FASTA database for it. 

Ummm...okay well we've totally got APEX and BioID! 

Sure - you just need to convince someone to fund the development of hundreds of mutant strains of a parasite that pretty much only kills very very poor people. Again, we don't even have a very good FASTA database for this organism. 

What about native CE complexosomics with a gaussian interaction profiler? WTF is that? It's a technique that can fill in the blanks I mentioned above! 


Now - it doesn't look like a ton of fun - the other things are easier - but you basically lyse your cells under friendly enough conditions that you don't bust up the protein complexes and interactors. Then you take fractions (in this case by capillary electrophoresis), digest everything in those fractions, and analyze it like regular old shotgun proteomics. You need to use the Gaussian thing to backtrack your way to the interactors. It appears to be, in this case, totally compatible with MaxQuant label free output. 

This is where I probably sorta get what is happening - but I think this is a lot like when we try to hunt down a natural product with an enzymatic reaction. If I have 30 fractions of all the small molecules from a weird mushroom/algae/bacteria I've never heard of, and fraction 6 has a little activity, 7 has a little more, 8 has a ton, and 9 has an almost detectable amount - we start eliminating molecules based on whether they do or do not follow that trend. Ideally the one molecule with activity will perfectly track to that, right? As a disclaimer I send every request for natural product discovery to ANYONE else and if they strike out there then I'll do it. Has worked twice - in 2 decades. 
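Here is a minimal sketch of that "does the molecule track the activity?" idea. To be loud about it: this is NOT the Gaussian interaction profiler from the paper, just a plain Pearson correlation across fractions, and every number and molecule name below is made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length fraction profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat profile carries no trend information, so score it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_candidates(activity, candidates):
    """Sort candidate molecules by how well they follow the activity trend."""
    scores = {name: pearson(activity, profile) for name, profile in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical activity measured in fractions 5 through 10.
activity = [0.0, 1.0, 3.0, 10.0, 0.5, 0.0]

# Hypothetical intensities for three molecules in the same fractions.
candidates = {
    "molecule_A": [0.0, 2.0, 6.0, 20.0, 1.0, 0.0],  # rises and falls with activity
    "molecule_B": [9.0, 8.0, 1.0, 0.5, 0.0, 0.0],   # elutes too early
    "molecule_C": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # everywhere, tracks nothing
}

ranked = rank_candidates(activity, candidates)
for name, score in ranked:
    print(f"{name}: r = {score:.2f}")
```

Anything that scores near 1 stays on the suspect list and everything else gets eliminated. Real data is noisier than this, which is presumably why you'd want a proper statistical model instead of my toy correlation.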

Similar here - these gaussian models help backtrack to the proteins that, statistically, most likely belong together in the clusters. Sounds smart, right?!? And if there is anything else that can help you really backtrack to native protein-protein complex interactions in understudied organisms (and in this case parasite-human interactions) I can't think of it. What a superb new tool for our utility belts! 

Thursday, September 12, 2024

FragPipeAnalyst - Save yourself a couple of button clicks to BOOM - Analyst data!

 


A little while ago FragPipe overtook the always-amazing MaxQuant in terms of number of global users. There are probably some error bars on that count, such as maybe the old server I have offline because it doesn't have a CMOS(?) battery in its motherboard, so it thinks it is 2018 and exactly zero of my annual licenses on anything have expired (it runs MaxQuant 1.6.17, which was one of my very favorites). No FragPipe on that, it has 32 bit Java. 

That was a joke. AND 1) That probably wouldn't work and 2) No sane person would admit to that if it did work. 

However, FragPipe is a powerhouse today and Analyst has rapidly become an easy push button go-to for mid-stream proteomic data analysis. Here I'm going with mid-stream being "getting to a list of plausible targets of interest". And then I'm going to use downstream as "getting to the targets that likely explain your phenotype". I'm blasting FDA Omics Days on the other computer in the converted old garage where I'm definitely not housing some computers that don't know what year it is. 

What if you're having one of these days... 


..and you just can't find the energy to upload your FragPipe data and your burdensome experimental design into Analyst? You can still do it! 

AND if you need some middle road rather than pushing the "open FragPipe Analyst" button, starting somewhat recently (I had a scan header thing for an in house tool so we stuck to a pre-20 version for quite a while) now the experimental design is already filled out for Analyst. This is assuming you filled in the box in FragPipe. Super cool, right? 

Wednesday, September 11, 2024

The human genome and proteome are larger than we thought - with some caveats!

 A whole international consortium got together in 2022 and found something like 10% more human proteins! 

Does that mean that you now have a FASTA you can reprocess your data with and get like 10% more IDs? 

...not exactly...at least not yet....but it's super cool! Here is the preprint!

Wow. That's a lot of names, including some of the wettest blankets in all of proteomics - "false discovery" this, "analytical metrics of precision" that, "standard pipelines and data storage types" on and on. Names you may not recognize are even worse - they're RiboSeq people.... (I wrote up some stuff on what RiboSeq is a few years ago here, if you're interested)

Please read the paragraph above in this voice, if you didn't already. 


With the important stuff out of the way, what is all of this? Well, it puts into question how we build those nice protein level FASTA files everyone in mass spec based proteomics takes for granted today - until you don't have one. 

In a nutshell, they threw out some of the assumptions and looked at a few billion human MS/MS sequences on ProteomeXchange that are from tryptic datasets. Billion with a B. And they looked at a few hundred million MS/MS sequences from HLA immunopeptidomics experiments. Honestly, I was pretty surprised there was that much HLA data publicly available. Y'all have been busy! There wasn't very much (good) stuff out there when I un-retired from science in 2018. 

Have you ever 6 frame translated your own genomic data in MaxQuant? There is a little tool for it. And it defaults to something like 50 amino acids. What if the genomics people have also been doing something like that all along? Would you care? What use is a 31 amino acid protein? At 110 Da each that's only 3,410 Da. Cut it with trypsin once or twice and it is probably too small to detect. And you won't get more than 1 peptide for it. 
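If you've never seen what a 6 frame translation with a length cutoff actually does, here is a rough sketch. Big assumptions: this is NOT how the MaxQuant tool is implemented, it keeps every stop-to-stop stretch rather than requiring a start codon, and the function names and the 110 Da average are just my illustration of the arithmetic above.

```python
# Standard genetic code, written in the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_segments(dna, min_length=50):
    """Collect stop-to-stop segments of at least min_length residues.

    Returns (strand, frame_offset, protein_sequence) tuples.
    """
    dna = dna.upper()
    segments = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            for piece in translate(seq[offset:]).split("*"):
                if len(piece) >= min_length:
                    segments.append((strand, offset, piece))
    return segments


def rough_mass_da(n_residues, average_residue_da=110):
    """Back-of-the-envelope protein mass from an average residue mass."""
    return n_residues * average_residue_da


# Toy sequence: a start codon, 35 alanine codons, then a stop (36 residues).
toy_dna = "ATG" + "GCT" * 35 + "TAA"
print(len(six_frame_segments(toy_dna, min_length=50)))  # cutoff of 50 drops it
print(len(six_frame_segments(toy_dna, min_length=30)))  # a lower cutoff keeps it
print(rough_mass_da(31))
```

The point is just that the cutoff is a choice. Anything shorter than the minimum length never makes it into your search space, so you can't identify it no matter how pretty the spectrum is.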

Here is where it gets cool, though. For about 4 years people have been confidently finding surface peptides (MHCs or HLAs or NeoAntigens, whatever you want to call them) on the cell surface that map to genetic information that isn't in our FASTAs. There was a flurry of this in 2020-2022. In the study I know the best out of these, Amol Prakash found over 700 that he was super confident about. And that was one of maybe 5 papers that dropped over this period of time where everyone was like ...ummm....WTF...? 

And - get this - the RiboSeq nerds have been seeing the same thing. There are mRNA transcripts going to the ribosome - presumably to be ribosomed into chains of amino acids - and they come from regions of the DNA that are annotated as noncoding. 

So these two groups worked on it for like 2 years and this is what they found - overlapping data supported by both MS based proteomics data in repositories and whatever stuff the RiboSeq thingamabobs produce. 

And what did they find? I'm just going to take screenshots of the coolest stuff. I started this on my phone earlier today. 



100 codons! Wait. That's 100 amino acids, right? That's not as small as my example above! 

Boom! Add these to my FASTA! Let's gooooooo!

But we're not there yet. This is a cautious group. They first built some cool new resources

Then they remind us (me? you?) that there are long established rules about calling something a protein that were agreed upon by the Chromosome Centric Human Proteome Project (you're using the SpongeBob voice now, right?) And that there is validation and other stuff. BUT - this is all still really cool. 

If you find a section of the DNA turned into mRNA and hanging out inside of a ribosome AND you find (probably gross looking, immunopeptides are no fun) MS1 and MS2 fragmentation spectra showing that same sequence occupying one of the HLA things you pulled down - that's probably around somewhere doing stuff, right? AND what if you're being all nosy and looking in other people's proteomics data AND you see those peptides there? 

Evolution is pretty stingy. It doesn't generally go out of its way to make new mRNA and then put it in the thing that translates it and then leave it floating around for some nerds to detect, right? Accidents happen where there isn't sufficient evolutionary pressure to lead to the removal of things, but they are the exception rather than the rule. 

Super exciting stuff, right? 

Man, while I was looking for the preprint on my PC and this was half written, I found a much better breakdown of the study and results in the form of a Tweetorial. You can check it out here

Did I forget to make fun of the fact they used the Trans Proteomic Pipeline thing? That is what they processed the data with. And the fun people at the Broad probably sequenced the peptides by hand with a ruler. 

I'll leave you with one last screenshot from this super cool likely text book altering study. 


(BTW, they're calling these cool things they found "ncORFs". They leave a lot of questions open to the community for how these should be categorized and dealt with, etc., but you'll have to go to the paper for those.) 

If you are new here I should probably clarify that me taking the time to poke fun at a study wherever I can is the highest form of compliment I generally can come up with. This study may contribute to answering so many riddles like - what are these other spectra? Why is our coverage of the immunopeptidome so abysmal? It also shows why we can't just target the proteome for every study - what percentage of the proteome do we even understand now?!? If these data were all from targeted experiments, we'd never know that the genome/proteome may be 10% larger than we thought. What other stuff is hiding? 

I can't recommend this (51 pages???? WTaF?) enough. 

Tuesday, September 10, 2024

MonoMS1 - Can you just identify peptides from predicted precursor/RT and ion mobility alone?


Didn't I just post an MS1 only based prediction paper? I sure did, but here is another! 

Have you ever just thought we're doing a little too much stuff? How many degrees of certainty do you need that the peptide you're interested in is the one you're looking at?

MonoMS1 takes a step back and asks this question: If I have

Solid chromatography

High resolution ion mobility

And a high resolution precursor

Do I need everything else? If I can just predict where my peptide is going to elute, how many charges it will pick up, and what the isotopic envelope looks like, and - here is the twist - predict a solid 1/k0 value, can I do proteomics?
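To make that question concrete, here is a toy sketch of matching predicted peptide coordinates against observed MS1 features. This is my own illustration and NOT the MonoMS1 algorithm: the tolerances, the feature values, and the peptide names are all made up, and the isotopic envelope and charge checks are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float       # precursor m/z
    rt_min: float   # retention time in minutes
    inv_k0: float   # ion mobility as 1/k0


def ppm_error(observed_mz, predicted_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - predicted_mz) / predicted_mz * 1e6


def match_features(predicted, observed, ppm_tol=10.0, rt_tol_min=0.5, inv_k0_tol=0.02):
    """Assign a peptide only when exactly one observed feature fits all three windows."""
    matches = {}
    for peptide, pred in predicted.items():
        hits = [
            obs for obs in observed
            if abs(ppm_error(obs.mz, pred.mz)) <= ppm_tol
            and abs(obs.rt_min - pred.rt_min) <= rt_tol_min
            and abs(obs.inv_k0 - pred.inv_k0) <= inv_k0_tol
        ]
        if len(hits) == 1:  # skip ambiguous or missing matches
            matches[peptide] = hits[0]
    return matches


# Hypothetical predictions (m/z, RT, 1/k0) for two peptides.
predicted = {
    "peptide_1": Feature(464.7500, 22.4, 0.85),
    "peptide_2": Feature(402.2000, 15.0, 0.78),
}

# Hypothetical observed MS1 features.
observed = [
    Feature(464.7512, 22.6, 0.86),  # fits peptide_1 in all three dimensions
    Feature(464.7505, 35.0, 0.85),  # right mass, wrong retention time
    Feature(402.2300, 15.1, 0.78),  # right time, mass is way off for peptide_2
]

matches = match_features(predicted, observed)
print(sorted(matches))
```

Each extra dimension throws out candidates that the mass alone would have let through, which is the whole bet here: the better your predictions, the tighter the windows you can afford.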

What paper? Oh yeah - this one - 


They model on HeLa peptides on a TIMSTOF Pro, then they move up from an E. coli digest to increasingly complex samples - next serum - and then to single cells (by a reanalysis of this TIMSTOF SCP based paper). Someone has a big hard drive! It's a zipped terabyte of data! 

In the final case the improvements are really interesting (from figure 4C). The MS2 identifications and MS1 predictions don't always agree, but the information they pull out appears complementary in downstream analysis.

For fields looking at a whole lot more precursors than fragmentation events (still!) deep learning precursors might be a solid avenue to figuring out what some of this stuff actually is. 
 

Monday, September 9, 2024

ABRF 2025 Session "Single Cell Proteomics in the Core Lab - Are we there yet??"

 


It's on, y'all! Can we do single cell proteomics (SCP) in the core where cost recovery is always hanging around in the background getting on your nerves?

Let's find out at ABRF 2025 March 23-26 in Las Vegas! 

Justin Walley just committed as my first invited speaker (I'm session chair! What?!?) 

And Justin's team's paper on Arabidopsis root biology by single cell proteomics is out now! You can check out the published paper here. 



Thursday, September 5, 2024

Happy Proteomics Day!


30 years ago today some guy said the word "Proteomics" at a conference. It was THE first term used to encapsulate all of the versions of a specific class of molecules in an organism. Transcriptomics followed. Then Metabolomics. Etc., etc.

We sat down with the man himself, Dr. Marc Wilkins to talk about it. This is the kick off for the next season of THE Proteomics Show "What is a proteomics?" 

https://open.spotify.com/show/3R8aGNVMKwfovkWWRVnu4E

or https://podcasts.apple.com/us/podcast/the-proteomics-show/id1655412251

or IMDB (weird) https://www.imdb.com/title/tt27034412/

or amazon music or 10 other things or type "podcast" into your phone. Surprised the hell out of me when that worked. 

Check it out, and happy birthday proteomics! 


I generated both of the figures above using the Dall-E program in OpenAI. You can totally use them, unless I'm reading something wrong. I'm about to subscribe. 

I'm a paid subscriber of the DreamStudio AI and I know for sure you can use the prompt generated figures I generate because that's how I made the cover of the world's best proteomics journal a couple months ago. ACS checks the legality of these images. This is what DreamStudio came up with and I do like it. Balloons inside the 3D protein structure? Again, feel free to use however you'd like that spreads the power and excitement of proteomics. 


Sunday, September 1, 2024

You can do TMT on the TIMSTOFs!?!?

 


This revolutionary new application note is making the rounds on LinkedIn and I'm largely just floored by the fact that you can multiplex on the TIMSTOFs! 

Apparently the trick is that you use every other TMTPro channel, since you can't get full baseline separation of every N/C TMT reporter ion pair with the TIMSTOFs and it would be patently irresponsible to try and do quantification without that separation. 
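For a feel of why that separation is hard, here is a quick back-of-the-envelope sketch. An N/C reporter pair differs by swapping one 15N for one 13C, so the spacing falls straight out of the isotope masses. The m/z of 127.13 is just a nominal spot in the reporter region that I picked, and m/Δm is a rough lower bound, not a statement about what any particular instrument delivers.

```python
# Isotope masses in Da.
MASS_12C = 12.0
MASS_13C = 13.0033548
MASS_14N = 14.0030740
MASS_15N = 15.0001089


def n_c_reporter_spacing_da():
    """Mass gap between an N and a C reporter ion of the same nominal mass."""
    return (MASS_13C - MASS_12C) - (MASS_15N - MASS_14N)


def resolving_power_needed(mz, delta_mz):
    """Nominal m / delta-m needed to tell two peaks apart at this m/z."""
    return mz / delta_mz


spacing = n_c_reporter_spacing_da()
power = resolving_power_needed(127.13, spacing)
print(f"N/C spacing: {spacing * 1000:.2f} mDa")
print(f"Nominal resolving power needed near m/z 127: {power:,.0f}")
```

That comes out to roughly 6.3 mDa between neighbors, and real baseline separation needs comfortably more resolving power than the nominal number. Skip every other channel and your nearest neighbor is about a full Da away instead.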

They even go to "single cell equivalents" which means they diluted HeLa / K562 and some other cell line digests down to a 100% theoretical yield of a digested single cell, labeled those and - HeLa/K562 and the other cell line separate by PCA. They do it pretty fast, using the Whisper 120 method and TMT 9-plex. If they didn't have any blanks or QCs this workflow could do more than 1000 "cells"/day. And who needs blanks and QCs anyway? 
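The throughput math, sketched out. I'm assuming the "120" in the method name means 120 injections per day, and the 20 blank/QC injections in the second example is a number I made up.

```python
def cells_per_day(injections_per_day, plex, non_sample_injections=0):
    """Labeled "cells" measured per day, after setting aside blank/QC injections."""
    if not 0 <= non_sample_injections <= injections_per_day:
        raise ValueError("non_sample_injections must fit within the daily injections")
    return (injections_per_day - non_sample_injections) * plex


# Assumption: 120 injections per day, 9 channels per injection.
no_controls = cells_per_day(120, 9)
# Hypothetical: give up 20 of those injections to blanks and QCs.
with_controls = cells_per_day(120, 9, non_sample_injections=20)
print(no_controls, with_controls)
```

So the "more than 1000" only holds if every single injection is a sample. Each blank or QC you run costs you a whole plex worth of cells.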

If you are going to attempt this groundbreaking application, you probably want to not use just the "n" channels. If you can pick any channels you want you're better off alternating N/C because at this resolution, using the N channels alone will give you the maximum amount of cross talk between your isotopic impurities. 

Saturday, August 31, 2024

Lupine - Impute TMT missing values across thousands of samples!

 


For some reason I thought lupine had something to do with werewolves, but if that connection exists, I couldn't sort through Google's ads to get there. You can find some pretty colored flower things, though. 

Lupine is the topic of this new preprint where a deep learning model is applied to 1,000 or so of the samples in the CPTAC project.

CPTAC has been going on for a long time and that puts a lot of confounding variables in play - that's beside even the fact that these proteomes are from diverse cell types and cancer types. So...Figure 5 is pretty darned impressive....



Wednesday, August 28, 2024

TesorAI Search - Cloud based spectral libraries - no percolating required!

 


Is there finally a small weird splintery group of proteomics people out there who are starting to think about "The Cloud Computing?" Probably not, but this new preprint is really cool! 


As in any bioinformatics paper there are very boring flow charts everywhere BUT they did us all a favor and kept their equations to themselves. No one likes you showing off how many Greek letters you know. 

Why does it belong on this awful blog, though? There are so....many....tools.... Well, I took a dataset that I know exceptionally well and I ran those files in it, and it looks legit.

AND it took me like 2 minutes to figure out the software. 

Go here. https://console.tesorai.com/

Register a fake email address (don't enough bioinformatics people know how to reach you already... no? use your real email, whatever........you do you) 

You'll get 20 credits/tokens and each one of those is worth 1 Thermo .RAW file.

I used 3 single HeLa cells prepped with NanoPots and ran with a ridiculously low flow rate on an Orbitrap Exploris 480 system that some nice people at BYU put up on ProteomeXchange.

The one part that isn't intuitive is that you should load your FASTA file from the file upload then go to the FASTA tab. It looks like you can load it through workflow, but you have to back out.

It only accepts Thermo .RAW files right now, and only the C+57 and M +/- oxidation modifications, so there are no buttons to push, really. Just load that stuff and tell it to run. 

My 3 .RAW files (about 800MB each) took 38 minutes from run to report.

I'm looking at a protein group report with about 2,800 proteins and that seems very reasonable based on my previous analyses of these cells. I'll have to check, but this is probably more than any combination of search tools has ever gotten me on these files, and it's certainly in the right range. The PSM report seems reasonable and there are all the normal things - intensity values, funny new score metrics. 

Don't take my word for it, dig out that old Hotmail account you use at the grocery store and try it out for yourself! 

Big shoutout to Matt Labenski for tipping me off to this great use of my lunch break!