Thursday, June 30, 2022

Optimal strategies for labeled multiplexed PHOSPHOproteomics!!

 

The image quality above is not this blog's fault. This is actually what it looks like through the publisher's online browser that they keep forcing us to use for some reason. 

HOWEVER. This paper is 100% recommended for absolutely everyone. It has the answers to the big questions that I call friends who work in core labs to ask about.

Multiplexed phosphoproteomics -- when do I label vs. when do I enrich? You can enrich, then label, then mix, or you can label and mix, then enrich. I've done it both ways out of necessity, but what should you actually do? 

TAAADAAAAA! This new permanently-on-my-desktop-forever paper asks the hard questions and answers them. The authors obviously knew how important these questions were, and had no question whatsoever that this was going to get published, so they drew the figures in crayon just for fun. 


(I realize it is the conversion process into the publisher's super annoying interface, but this IS the version that I will keep a PDF of, and not only so I don't have to see the coffee cup filling icon on my screen. I happen to like the crayon effect.) 

Wednesday, June 29, 2022

Using transcriptomics to REDUCE databases in proteogenomics!

 

This new study in Genome Biology is, on the surface, probably counter-intuitive. Our smallest databases are the nice reviewed ones from UniProt/SwissProt. When we start looking at databases that are a little (lot) more biologically relevant because of things like genetic variation, it is easy for the input database size to blow up to astronomical proportions. We frequently use an input database with millions of entries, which requires special class-based FDR and a lot of computational power (2 million of those are known cancer-linked mutations, so they're just little snippets of sequence). When we start to toss in databases derived from next-gen DNA or RNA sequencing of those same samples, things tend to blow up. BTW, neither is nearly as clean as you'd guess given that we're on "3rd gen" sequencing technology with 1 TB of data coming off per sample on these new sequencers. There are fundamental questions being asked right now like -- wait -- is the genome way, way, way more complex than we ever thought, or is Illumina generating less and less relevant data (and more literal garbage) with each generation and hoping to cash out before someone stops to consider that the latter is the simpler explanation? Today's data density might give them another 10 years, because it will take that long to process the data from 4 patients. 

This wasn't supposed to be a genomics rant, but while I'm going -- long-read sequencing is the way to go for us, y'all. Illumina, and whatever the thing is that Thermo sells that no one uses, generate really short read sequences. Six-frame translating those little things (dividing their length by 3 to get amino acids) gives a lot of tiny annoying things to search against. PacBio and NanoPore both produce much longer outputs, and that is transformative for us, both by reducing a ton of redundancy and by giving us more sequence to match against. 
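To make the short-read pain concrete, here is a minimal six-frame translation sketch. This is illustrative only, not any published pipeline; the function names are mine.

```python
# Minimal six-frame translation sketch (illustrative, not any published
# pipeline). A 150 bp Illumina read yields six ~50-residue frames; split
# those at stop codons and you're left with tiny peptides to search.
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def revcomp(seq):
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[b] for b in reversed(seq))

def translate(seq):
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def six_frame(read):
    """Translate all three offsets of both strands: six short frames."""
    return [translate(strand[off:])
            for strand in (read, revcomp(read))
            for off in range(3)]
```

Run `six_frame` on a 150 bp read and you get six strings of about 50 residues each, and every stop codon chops them down further -- exactly the pile of tiny search targets complained about above.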

All of the words starting with the 2nd sentence were meant to impart the fact that, unless I've been doing it totally wrong for 10 years, which is completely possible, proteogenomics databases shouldn't get smaller. They just keep getting bigger. It would be awesome if there was some way, any way, to reduce them.

There is a lot here, and the paper tackles two different concepts. The first is a recently proposed strategy for database reduction that I won't go into because the authors don't like it. The second big concept is using transcriptomics (RNA) to reduce databases. The logic is that if no transcripts are expressed for a gene, it seems silly to go looking for that protein. Using this approach they find a "more sensitive" peptide detection rate (you'll see their terminology, and it makes sense shortly into the paper) even using standard target-decoy based approaches.
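As a cartoon of that logic (my own sketch, not the paper's actual workflow; the `GN=` header convention and the TPM cutoff are assumptions for illustration), transcript-guided database reduction can be as simple as:

```python
# Cartoon of transcript-guided database reduction (my sketch, not the
# paper's workflow). `expression` maps gene name -> TPM from RNA-seq;
# the GN= header field and the 1 TPM cutoff are illustrative assumptions.

def filter_by_expression(entries, expression, min_tpm=1.0):
    """entries: list of (header, sequence) from a protein FASTA."""
    kept = []
    for header, seq in entries:
        gene = None
        for field in header.split():
            if field.startswith("GN="):
                gene = field[3:]
        # Entries we can't map to a gene are kept, to stay conservative.
        if gene is None or expression.get(gene, 0.0) >= min_tpm:
            kept.append((header, seq))
    return kept
```

Fewer entries means a smaller search space, which is where a "more sensitive" detection rate at a fixed FDR would come from.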

Big caveat here, of course: if you are perturbing a system by, for example, irradiating cells in a way that would induce a rapid response and shut off transcription, you definitely shouldn't do this. Proteins with long half-lives relative to their oligo counterparts would still be hanging around, and then you wouldn't have entries for them. This is just the first example off the top of my head of what a bummer thinking about these systems in a biological context probably is. 

Also, while I am not sure what the figure I chose for this blogpost is displaying, I really liked their choice of colors. 

Tuesday, June 28, 2022

HUPO2022 -- Quintana Roo deadline next week!

 


Next week is the abstract deadline for HUPO 2022, which is being held in the Mexican state of Quintana Roo. This remote and obscure location was chosen as it would not flag the attention of managers and administrators who might question your underlying motivation for where you disseminate your newest research in Human Proteomics. We can't go to all of the meetings, and we know that we only think about which meetings best aid in the advancement of our professional goals and our field before considering any other factors. While we all know this, these simple facts may be difficult for others who don't understand the level of devotion we have to our craft. They might, for example, question our choices between two very similar sounding conferences with similar sounding goals and names if we chose the one in a famous beach town in December over the one in Chicago in February. 

Therefore, when the time came around for Human Proteomics to be in the Americas again, the powers that be saved us any questions by choosing a town in the largely unknown state of Quintana Roo in Mexico to make sure our motivations are not questioned. We can now drop right into a quiet and secluded location with a surprisingly convenient airport where there will be no distractions and just 100% focused Human Proteomics. Bravo, HUPO board, bravo! 

Deadline is next week! (7/8)!!

Monday, June 27, 2022

ACS Special Issue Deadlines coming up!

Probably just leaving this here for me so I don't forget these are coming up: 

Software Tools and Resources (round 3!) has a deadline of September 30, 2022! 


And methods for omics research, round 2, is December 1, 2022. 



Sunday, June 26, 2022

Multiplex 29 proteomics samples by using every reagent!

 

Around my ever increasing horror watching my country, the one with the most nuclear weapons, fall every day to the demands of a minority of people who live their lives by literal interpretations of very small excerpts of an ancient and poorly translated book that they clearly have not read, it is hard to focus on science, but for my sanity I'm going to try anyway. 

When I saw this new title I was SUPER EXCITED....

It is tough to create new multiplexing reagents. While more reagents than the ones we commonly talk about probably exist, the ones we do talk about are protected by some extremely observant lawyers. As I might have mentioned before, I've experienced the keen observation skills of said lawyers first-hand, and what I type in these boxes has to, at some level, be guided by considerations of what might end up making them get all jumpy and litigationy again. 

The number "29" probably should have tipped me off prior to the abstract that something fishy might be going on here. We've got 11-plex and 18-plex reagents now. Which to use? 


At first you might say, as I did, f******************k thaaaaaaaaaaat. How would you process the data? Why would you ever double your reporter background interference from coisolated peptides? 126-131N literally just doubled. 

Okay, but somehow the data looks good here. Maybe this is worth reading more of? They probably used real-time-search-based MS3 or super tight ion mobility, right? Nope, they used a Q Exactive HF with 1.0 Da isolation windows and the JUMP masstag software.

The authors used a two-proteome digest to pressure test their quan, with E. coli labeled with one multiplex reagent and human brain labeled with the other, and -- somehow -- the quan doesn't look bad. I don't think that I can replicate this with my software of choice. I'd need to have my 4 tags that are usually static set as dynamic, which would do wonders for my search space and FDR, and it would take a lot of post-processing to get this sorted out. I might be wrong, though! And -- if you're in a pinch, maybe this is something to keep in the back of your mind? 
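To see where the doubled interference comes from, here is a toy comparison of nominal reporter channels. This uses channel names only, not exact reporter masses, so treat it strictly as a sketch of the overlap:

```python
# Toy channel-overlap check using nominal channel names only (not exact
# reporter m/z values) to show why combining both reagent families
# doubles the background in the shared reporter region.
TMT11 = {"126", "127N", "127C", "128N", "128C", "129N",
         "129C", "130N", "130C", "131N", "131C"}
EXTRA_PRO = {"132N", "132C", "133N", "133C", "134N", "134C", "135N"}
TMTPRO18 = TMT11 | EXTRA_PRO  # the 18-plex reuses all 11 nominal channels

shared = sorted(TMT11 & TMTPRO18)
# Every coisolated peptide from the other reagent family now contributes
# interference in all of these shared channels at once.
```

All eleven 11-plex channels land on an 18-plex channel of the same nominal mass, which is the "126-131N literally just doubled" problem in set form.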

Monday, June 20, 2022

Patterns in proximity labeling (and other experiments) improve when overlaying on LOPIT data!

 

Yo. This paper is super dense and it took me a little while, but I think this is a positive concept, particularly for y'all out there APEXing and BioID'ing stuff. 


I might be too old to use nonlinear dimensionality reduction in my daily life, but these smart young people with all their neuroplasticity are getting the hang of it. I think I've posted this before, but here is a good video introduction to these concepts. (All of this is just an aside, but it is really cool seeing t-SNEs in proteomics that make sense. :)

Check this out, though. You know all the data people have been acquiring using LOPIT (localization of proteins by their organelles, or something, I'm at least close)? These data aren't just a neat trick like I thought. This group reuses LOPIT data to better understand other data! Think about where most of our annotations come from that say "this protein is in the cytoplasm, this protein is in the nucleus, etc." I'd bet you $4 that they didn't come from proteomics data. It is more likely that in 1954 someone tagged a yeast protein with uranium and, before they died at their microscope, they were pretty sure that protein was at the yeast membrane. Then someone in the 90s went all crazy with BLAST on their NetScape browser and found a 10% sequence match and a 1e-74 whatever crazy metric-less score on BLAST that makes everything sound more significant than any metric ever, and now that human protein is annotated as membrane forever. We should probably think about updating some of that stuff with LOPIT data....

And when you do? These authors demonstrate multiple examples where evaluating quantitative data in this subcellular context further strengthens the conclusions of earlier studies. Even more interesting? There is a strong example where the conclusions you can draw from the data look very different when thinking of the proteins in an organellar context rather than as a list of ups and downs. 

The trick here is how to apply these maps without having to bother with updating my R packages which I'm sure are out of sync between PCs by now....

Sunday, June 19, 2022

Find the genomic variants that make it to peptides -- with MaxQuant!

 

Step by step instructions for how to do something -- in MaxQuant?!? That's great! 

Step by step instructions on how to do something really really hard in MaxQuant?!? That is somehow even better! 

Genomic data has just as many stupid formats as we do, but their files are often bigger. I've built a couple of proteomics studies from crappy sequencing data on repositories, and I can't remember the difference between a .fasq and a .bum without looking them up. Chances are I'll just download the wrong one and realize it is wrong when I can't convert it. 

It's all here! 


 

Saturday, June 18, 2022

Perspectives on de novo antibody sequencing!

 


A lot of antibodies and antibody drug conjugates and whatever they call those other things coming your way? Thinking about how best to analyze these frustratingly impure things? It is crazy how a fine-tuned evolutionary mechanism, optimized over billions of years to make variable proteins to combat a near infinite number of biological obstacles, can keep making varied proteins despite our 40 years of effort to make it make one thing at a time. 

However, utilizing these things for our own designs is awesome, and if you start accepting antibody characterization projects it can open the floodgates to crazy proteins. 

Where are we at on de novo analysis of endogenous antibodies? Here is a great perspective from a team that knows a thing or two about it! 



Friday, June 17, 2022

High pH fractionation of 6 micrograms of peptides on 96 well plates!

 


I've been looking absolutely everywhere for something like this and I think I've got everything I need already sitting here to pull this off!  Paper link!


On closer look this isn't exactly what I was looking for, but I think that I can use it as inspiration, because their goal is different from mine and they demonstrate how remarkably flexible C-18 and HPH fractionation can be. They essentially elute their peptides off of immobilized C-18 and pull the supernatant as the fractions. Super cool, right?!? Sounds a little complex, but they boost their peptide coverage of interest from low sample volumes, so there are applications for this kind of fractionation for sure. 

Thursday, June 16, 2022

Mouse proteome draft map!

 


Whoa! That's a lot of samples! And they did phosphopeptide enrichment of all of them?

Okay, so if mouse proteomics isn't your thing but cancer is, there is a second resource as well. Check out this web deployment of Pacific.



Wednesday, June 15, 2022

Is fancy trypsin the best science scam of all time?

 


When I was in high school I worked at a grocery store at night with a bunch of other poor redneck kids, and we did typical stupid redneck kid things before and after work. My coworkers had the biggest subwoofers that would fit in their Ford Festivas or Chevy Cavaliers, and we'd test whether you could hear the bass from these in places like the walk-in freezers inside the grocery store. Being a nonconformist, I spent all my money on gasoline and new rear tires for my idiotic '79 Mustang, which had way too large an engine for the 4-speed transmission intended for the stock 2.3L shared with the Ford Pinto. Why did that have a shared bolt pattern with the pre-emissions small blocks? Who knows?

My friends spent a lot of their money on things like gold plated amp terminals. Or gold plated wire. I always assumed those were real things until James Randi went on a crusade about it and proved humans can't discern the differences (if they even exist). Big shoutout to Dave for tipping me off to this years ago.

Why would I ramble about this scam at the end of my blog hiatus? Because there might be a parallel. As some in proteomics shed the (largely unnecessary?) burdens of nanoLC, there is a downside: you need to load more peptides and may need tons more of that expensive trypsin. Even if you aren't scaling up sample sizes, we're being pushed to higher n, and trypsin prices aren't going down. Last year at ASMS2021 I got to see some great analytical flow data at a poster by Matt Foster, and he tipped me off to the fact that he's using "lower grade" trypsin and his data looks sick.

I got some cool waste material from our group and set to work with trypsin that cost me $40 a gram. And, you know what? I can't tell a difference AT ALL. 

This is probably my worst run so far. I think it is from heart (the tissue samples were randomized) but, given the proteins, that would be my guess just looking at it. 

Are we all being scammed? At ASMS2022 they were giving out some free new super duper trypsin samples. Brett, one of the few mass spectrometrists to not get COVID at ASMS because he got it right before, did this math. 

That's kind of a lot of money. 
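I don't have Brett's actual numbers, but a back-of-envelope version of that math looks something like this. Every price and ratio below is a placeholder assumption for illustration, not a vendor quote:

```python
# Back-of-envelope enzyme cost per digest. All numbers are placeholder
# assumptions for illustration, not vendor quotes or Brett's actual math.

def trypsin_cost_per_sample(protein_ug, enzyme_ratio, usd_per_mg):
    """Cost of trypsin for one digest at a given enzyme:protein ratio."""
    trypsin_mg = protein_ug * enzyme_ratio / 1000.0
    return trypsin_mg * usd_per_mg

# 50 ug of protein digested at a 1:50 enzyme:protein ratio.
fancy = trypsin_cost_per_sample(50, 1 / 50, usd_per_mg=2000.0)  # "fancy" grade
crude = trypsin_cost_per_sample(50, 1 / 50, usd_per_mg=0.04)    # $40/gram trypsin
# fancy lands on the order of dollars per sample; crude is a fraction of a
# cent, and the gap only grows when you scale up loads for analytical flow.
```

Multiply the difference by a few hundred samples a year and you get to "kind of a lot of money" quickly.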

Sunday, June 12, 2022

Trap column intact protein analysis!

 


Okay y'all, get this. Those trap columns that you just use for making sure your collaborators' samples don't ruin your nice instrument are actually just normal columns that are smaller! Not kidding. You probably wouldn't have realized this from the misleading things they print on them, like what resin is inside and how long they are. 

This Royal Society of Chemistry Advances paper exploits this fact.


In this advance, the team skips the desalting steps for intact protein analysis and -- get this -- they run their protein over a column while allowing the first part of the liquid that goes through the column TO GO TO WASTE! The salts from their intact protein prep don't go into the instrument. They go to some waste thing. Then they elute off of that trap column as if it were a short little column, and the proteins come off of it and into the mass spec, saving untold amounts of time. 

I largely like this paper because I think you could potentially read my impression of it in two different ways. 1) If you didn't know that you could do this, legit, bravo to these researchers and this nice data. Not kidding. Not being sarcastic. I will be citing this paper all the time going forward. 

2) If you've been doing intact protein analysis like this for the last 10 or 20 years and are thinking -- obviously?? OMG, WTactualF?!?!? -- this should impress upon you how poor we are as a field at passing our information on in a structured and universal manner. 

Mass spectrometry is still more like an art than a science in a lot of ways and that isn't a good thing at all. Maybe there is a paper that shows this would totally work out there? I can't think of one and I couldn't find a paper with a Scholar search this morning. 

I'm absolutely going to cite the shit out of this paper. 

Thursday, June 9, 2022

Agilent ASMS2022 hardware releases -- new QQQ and GCMS!

 

GCMS is something that I think most people in proteomics are probably surprised people still do, but it absolutely has its places in chemistry. Wondering about that llama oil that kind of smells like acetone, which you bought from the old man with pigtails at the farmer's market largely so he would stop standing so close to you? Easy to figure out that -- YUP. That's a lot of acetone! Try doing that with LCMS! 


Next question: WTF do they feed that llama? Aaaaand....which government agency do you even contact to explain this situation to without sounding crazier than the close-talking pigtail grandpa? 

Despite the clear importance of GCMS to things like llama protection and the petroleum industry (or CBD extraction from hemp, which is sometimes performed by people who make the llama oil people seem well grounded in reality and scientific capability), there hasn't been a lot of development in this space outside the ultra high end market (GC-WTOFs and GC-Orbis). It's cool to see that Agilent is pushing forward into this space with the release of multiple new GCMS systems. 

In addition, a new small but high-end performance QQQ dropped at ASMS. There is more info on these releases here, if this is your thing.

Wednesday, June 8, 2022

ASMS Big hardware release #3 -- ZenoTrapping SWATH

 


My coverage/guest blogger coverage of the COVID superspreader event otherwise known as ASMS2022 was impacted by some craziness in my schedule and the fact that no one I know has felt very well recently for some reason. I'll do some backtracking as I find time. 

On the hardware front, one move that will likely only surprise you because you probably assumed a ZenoTrap on a SCIEX QTOF would already be able to do ZenoTrapping + SWATH: SCIEX rolled out ZenoSWATH. The one in our lab (proof it is here: the only mass spec lab in the world with magic pink carpeting [magic because it keeps the asbestos in the floor and out of our lungs!]) can't yet ZenoSWATH, so we are puuuuuuuuuuuumped for that upgrade!

The ZenoTrap sits inline after the quad and the collision cell, right before the TOF. The physics of the ZenoTrap is really cool because it essentially slows the ions into accumulation rather than pushing them to a dead stop with a hard gate. This is super important for my stuff because it compensates for the small time-of-flight differences between fragment ions of different sizes, allowing the ZenoTrap to eject both high and low mass fragment ions into the TOF in one go. Somewhat less important for most SWATH experiments, but a feature that helps me out a lot. 

From the online info, it looks like you do have to lose some speed when using ZenoSWATH vs. regular SWATH, but 133 Hz isn't all that bad for a 10x boost in low abundance signal. 
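For a rough feel of what 133 Hz buys you, here is some duty-cycle arithmetic. The window count, survey scan time, and peak width below are my own assumptions for illustration, not SCIEX specs:

```python
# Rough duty-cycle arithmetic for DIA/SWATH. The window count, survey
# scan time, and peak width here are my own assumptions, not SCIEX specs.

def cycle_time_s(n_windows, ms2_rate_hz, survey_s=0.05):
    """One cycle = one survey scan plus one MS2 per isolation window."""
    return survey_s + n_windows / ms2_rate_hz

def points_per_peak(peak_width_s, cycle_s):
    return peak_width_s / cycle_s

cycle = cycle_time_s(n_windows=80, ms2_rate_hz=133)  # ~0.65 s per cycle
pts = points_per_peak(6.0, cycle)                    # ~9 points across a 6 s peak
```

Nine-ish points across a chromatographic peak is still plenty for quan, which is why trading a little scan speed for a sensitivity boost can be a fine deal.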

This is an aside because I just got data from this. I'll be honest, while EAD and the E-I-E-I-O fragmentation stuff sounded like a neat trick (giant magnets force a charge onto your peptide and you get ETD/ECD fragmentation -- you can actually use your phone to detect the magnets, they're that powerful) it wasn't why I wanted a 7600. I need higher intrascan linear dynamic range. I had some open time on Sunday night and reran some leftover samples with EAD and....it's probably the best looking ETD-type spectra I've ever generated. 

This is after filtering out non-fragment ions and exclusively plotting c/z ions. I'm not picking and choosing; they ALL look like this. I don't have any good PTM data, I just used it for regular old peptide ID. I'll drop some comparisons that make sense later! 



Tuesday, June 7, 2022

Guest blogger on the ground at ASMS2022 -- Software updates!

Guest blog report, ASMS2022 (@SpecInformatics, otherwise known as Conor Jenkins) part 2

So the first full day of ASMS is over


and there have been some awesome talks! 

First off, let’s start on the software side.

FragPipe 18.0 just dropped from Alexey, bringing a host of new features with it. There is now integrated spectrum visualization of results, a "Headless Mode" that enables FragPipe to be run without the GUI from the command line (so now you can spin it up on your server), and something called diagnostic feature discovery, apparently giving you a boost in IDs. The new version can be found here (https://github.com/Nesvilab/FragPipe/releases).

Ben interrupting: There is currently a search for mnemonic devices to assist with the spelling of Alexey's last name. Word is he doesn't even know how many "i"s are in it. (As someone with a silent "r" in my last name, I'm allowed to say things like this, the rest of you should try being less insensitive). 
Proteome Discoverer 3.0 is finally coming out. I don’t know about you, but ever since I updated PD last, the multithreading capabilities have been less than desirable. Well, apparently they have fixed these issues, and processing data, especially on large sample sets, is noticeably faster! This isn’t the only happy boost. The CHIMERYS machine learning node can now be brought into PD. Looking at a vendor poster today (just for disclosure), they are reporting that this node improves your label-free quantification data with a 19% boost in peptides and a 7% boost in protein IDs.

Ben interrupting: For my impression of CPU-based machine learning rescoring with INFERYS in PD 2.5, please see this post. Y'all, more peptide numbers is cool and all, but not if we can't back them up. I am pumped for PD 3.0, but more numbers without evidence is just going to burn this whole proteomics party down right after we got credibility again. Sorry to be a jerk again. I should really check it out again.
Now for the new stuff!
You may remember DeepRescore from a couple years ago: https://doi.org/10.1002/pmic.201900334. Well, that team is back with DeepRescore2 and some pretty amazing results. They are posting a 40% improvement in the number of PSMs and a 10% improvement in phospho localization and identification. I don’t think that they have put the code online yet, because a simple Google search isn’t giving me anything, but they are putting DeepRescore2 in a Nextflow workflow to make it easily adopted.
The Kuster lab is putting out something that may change the way that we do TMT and label-free analyses....Instead of doing a match-between-runs approach that depends heavily on chromatography, they developed a spectral clustering algorithm called SIMSI-Transfer. From my understanding, the spectral scans are grouped into clusters by their similarity. If a spectrum doesn’t have a match from a search, the identity of that spectrum is assigned based on which cluster it belongs to. They showed a >35% improvement in PSMs, >15% improvement in peptide IDs, and >5% in protein IDs over match-between-runs! This is available right now on GitHub: https://github.com/kusterlab/simsi-transfer !
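As a cartoon of the transfer idea, here is a minimal similarity-based ID hop. This is my own sketch, definitely not SIMSI-Transfer's actual algorithm (which clusters spectra rather than taking pairwise best hits); the spectrum format is an assumption:

```python
import math

# Cartoon of similarity-based ID transfer (my sketch, not SIMSI-Transfer's
# actual algorithm). Spectra are dicts of m/z bin -> intensity; IDs hop
# from searched spectra to unidentified spectra that closely match them.

def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def transfer_ids(spectra, ids, cutoff=0.8):
    """spectra: {scan: spectrum}; ids: {scan: peptide} from the search.
    Unidentified scans inherit the ID of their most similar ID'd scan."""
    out = dict(ids)
    for scan, spec in spectra.items():
        if scan in out:
            continue
        best_pep, best_sim = None, cutoff
        for ref, pep in ids.items():
            sim = cosine(spec, spectra[ref])
            if sim > best_sim:
                best_pep, best_sim = pep, sim
        if best_pep is not None:
            out[scan] = best_pep
    return out
```

The appeal over match-between-runs is visible even in the cartoon: the hop is driven by spectral similarity, not by retention time alignment.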

Finally, I had no idea that MetaMorpheus could do this, but did you know you can search raw files that were collected with a top-down method and a bottom-up method at the same time, and it kicks ass at identifying more proteoforms than a search of the top-down run alone???? Basically, you add your bottom-up and top-down data to a search but change the protease to top-down for your top-down file and then let it fly! They are reporting that a top-down search alone identified about 121 different proteoforms, but when you added the bottom-up, BAM! Over 8,000 proteoforms identified!

Cool advancements on the software side!