Wednesday, December 11, 2024

Proteoform level analysis of "purified" albumin reveals shocking levels of complexity!

I'm again going to put off blog posts on the 50,000 human proteome cohorts using "next gen" spot-based targeted proteomics.

I get it - I love the idea that we can use alternative technologies and get to these kinds of population level protein studies. But - just to remind you - we don't know the answer to this question


What we do know - without any possible doubt whatsoever - is that evolution is super ridiculously stingy when it comes to making new stuff. Sure, there are excessive things that don't negatively impact the overall survival of a population - but those are exceedingly rare.

If a cell makes an alternative form of a protein you can almost guarantee that there is a good fucking reason for it. 

Case in point - what if you treated one of the single best characterized proteins on the planet not as a protein - but as a proteome?

This team took a look at multiple "purified" forms of trusty old bovine serum albumin and treated it like a population of proteoforms - and 

1) It definitely is

2) "Purifying" a protein is....umm... something you'd think was 100% super well defined. Let's go with "could use further definition and characterization". 

We've seen things like this in the past - here is an old post where a modified Exactive (I think it was what later became the Exactive EMR, one of my all time favorite little boxes) pulled out 59 different forms of ovalbumin.

That's hard to look at and really encapsulate. Woodland et al. went full classic protein biochemistry instead - and this isn't hard to understand. This is a "purified albumin" separated by isoelectric focusing in dimension 1 and by SDS-PAGE in dimension 2.


This is a purified protein??? Some of that stuff probably was just tagging along, but a whole ton of that is albumin proteoforms. And - again - there is probably a very good reason for why an organism would expend energy to develop alternative forms of all these proteins, right? 

This is a super cool and thought provoking study that pokes some holes in more than a couple of our normal assumptions. 

Tuesday, December 10, 2024

ProHap - Search your proteomics data against population variants! Critically important new community resource!

 


STOP. IGNORE THE FLOWCHART ABOVE. These are bioinformatics people, they think this stuff is mandatory. I assume their conferences all have contests where the winner makes the flowchart most likely to make someone in another field throw up.

Again - don't look at it - 'cause this is legitimately important. 

You know how the genomics people have been doing things for years with illustrious-sounding titles like "The 1,000 Human Genome Project"? Particularly when a lot of those things kicked off and the technology was more expensive, these things absorbed HUGE amounts of research dollars. The goals were to understand how human genomes vary across us - as a species.

And they did these things and they kept the results 100% secret from everyone forever. 

I guess that's not true, but - to me, as a human proteomics researcher - they have been less than useless. Yay, you did a bunch of stuff. Who does that help? Not me or anyone I know. Even researchers I know who focus on health disparities can't get usable data out of these things.

UNTIL NOW. 

What these awesome, though flowchart-loving, people did was dig into these top secret genomic databases and they assessed -

-you won't believe it -

Protein level changes across human populations! This is where it gets important. 

How many peptide level variants could there possibly be in 1,000 genomes? 12? 15? 

Try 54,679! Don't believe me? Here is a completely not illegally taken screenshot. Don't sue me!


Almost FIFTY-FIVE THOUSAND PEPTIDE VARIANTS?!?

How many are you looking for in your data? One? Yeah, me too. I mean, unless we're doing deep cancer genomics and then we search for 2 million. Why not normal variants?!? 

Okay - are you thinking - "big deal, I probably need to spend the next 10 days downloading kludgey Python scripts written by proteomics people and finding out that my Docker thing is from 2017 - how on earth does this help me?"

And this is where this is super legit. 

Go here. https://zenodo.org/records/12671302

Download this - 


Use 7-Zip or something to unzip it twice. (I don't know, it's right there with the flowchart competition - bioinformatics people have contests to see who can zip things the most times. Bonus points - as in here - if instead of naming each zip .zip you name them weird things.) The first thing you unzip is a .gz, which will give you a .tar; unzip that too - and you'll get the whole reason I've written this entire thing -


You get a FASTA FILE that represents common peptide level variants that appear in human beings across our population! 


Yeah, it's pretty big. 104MB and 157k entries. But you're encapsulating a much larger percentage of normal human genetic variation now!
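If you want to check what you unpacked, here's a minimal Python sketch - it does the two-step gunzip/untar in one shot and counts the FASTA entries. The file names below are placeholders from my own download, so swap in whatever the Zenodo record actually gives you.

```python
# Minimal sketch - unpack the ProHap download and count the FASTA entries.
# File names are placeholders; use whatever the Zenodo record gives you.
import os
import tarfile

# tarfile handles the .gz AND the .tar in one step (sorry, 7-Zip)
with tarfile.open("prohap_download.tar.gz", mode="r:gz") as archive:
    archive.extractall(path="prohap")

fasta = "prohap/prohap_variants.fasta"  # hypothetical name - check your extract
n_entries = sum(1 for line in open(fasta) if line.startswith(">"))
print(f"{n_entries:,} entries, {os.path.getsize(fasta) / 1e6:.0f} MB")
# should land around 157k entries / 104 MB if you grabbed the same file I did
```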

100% check out the paper. They did other smart stuff and there are other (possibly superior) files depending on your application.

If you're using FragPipe (you should be!) check out this advice from Alexey! 


And check out this additional resource from his team here!

Monday, December 9, 2024

Top down proteoform analysis of kinase inhibitors in an approachable method!

 


Wow. This new study of kinase inhibitor treatment of cancer cells - using top down (intact protein/no digestion) proteomics - is

1) Super legit

2) Seems really approachable

3) Kind of resets the bar in my head for what we can do right now with today's off-the-shelf technology.

And I might have a surprise for you. While Neil Kelleher's name is here because it is part of a special issue in his honor - this isn't a Kelleher lab study! 


Generally when we see a super impressive top down study I flip through it and then think - cool - maybe I'll be able to replicate it in 10 years? There are often modified instruments, or things where you think - if I could keep my core scientific team together in a group for a decade, we could pull off something this hard.

Not to say there isn't some legit technical firepower on this study (Kevin Gao is a pro's pro mass spectrometrist), but you can read through this protocol and think - wait - could I totally do this? 

Instrumentation is an Exploris 240! (Approachable, affordable, clean it yourself hardware!) 

The HPLC is a custom Accela....okay, well...I don't have one of those, but it is running at 400 nL/min with an interesting combination of buffers. I assume any U3000 RSLC, Eksigent, or whatever could match those same performance metrics.

Custom nanoLC source. Details are in the references, but you can make a nanoLC source from Legos. Probably not that tough to reproduce (or necessary). There are funny little bars that are necessary for the Exploris systems when you make your own source, and those can set you back several hundred $$.

They used the TopPIC suite for the data analysis, which you can get for free here, as long as you sign stuff saying you won't be a jerk. For some of the focused proteoform-specific questions (is the phosphorylation at this site or that site?) they (interestingly) used BioPharma Finder. I've never loaded more than 5 proteins into that at a time, and it's super slow with that many. I assume they put in one sequence and a narrow time window in order to really lock down the one target they're trying to localize.

The results are well displayed - really pretty and clear - and, again, really might just change your mind about doing top down proteomics. Bravo to this team, I legitimately loved reading this paper from beginning to end. 

Wait - found something to complain about! Whew, I was worried. They haven't unlocked the PRIDE repository yet, so I can't look at the files. It was just accepted (JPR ASAP).

Sunday, December 8, 2024

Is this the year I finally win the US HUPO conference T-shirt design contest?!?

 


I think it is, though I also thought that in 2022....

and...maybe I did win the Chicago one...? No, it looks like I tried to print my own shirt and the company thought I was playing a joke on them? Weird. 

Well, if you think you can beat my entry, go ahead and try! Mwhahahahahahaaaa. You can waste your time submitting one here

Saturday, December 7, 2024

THE (real) single cell proteomics technique scSeq people love - NanoSplits - is out!

 


Check out one of my favorite techniques of the last few years - the NanoSplits paper here! 


The first preprint of this study is somewhere on the blog, but the work evolved considerably since we initially saw it.

If you aren't familiar, what this does is label free preparation of REAL NORMAL SIZED SINGLE CELLS, ONE (1, uno, um, eins, jeden, yksi, en, een, ichi) at a time, on glass slides using precision robotics.

THEN the lysed cell is split into 2 fractions, with most of the protein going one way and more of the little transcripts going the other. You do single cell proteomics on the protein-heavy fraction and you amplify the transcripts in the other fraction for transcriptomics.

BOOM! You get everything! Now, there are obviously some drawbacks here, including that it is really hard to do. You need the precision robotics. This team features some people with serious instrumentation backgrounds but also people with a history of simplifying methods so mortals can eventually do them. We've written 2 grant applications where the technique has been prominently featured. The scSeq people are a whole lot more comfortable with this measuring protein thing if they can get evidence that you aren't just making stuff up! 

What's super cool here is that, while multiple groups have shown complementary data by doing stuff like single cell proteomics and single cell seq on the same or very similar populations of cells (my group did this in a recent study - dosing the same cell line from the same source with the same drug), here you get a real pairing - Cell A proteomics and transcriptomics fill in a specific pattern. Cell B the same.

The authors are quick to point out that NanoSplits could be a bridge technique to unify findings between more traditional studies where you do either SCP or scSeq, or both, on the same population. A small number of split cells could explain discrepancies between these 2 data types, or help you truly link 2 populations together.

Seriously - a phenomenal, clever technique with top notch data collection and informatics and when I resubmit a grant in a couple of months I'm sure my reviewers will be excited to see a prominently published paper rather than a link to a preprint.   

Friday, December 6, 2024

Nature's Method of the Year 2024 is Spatial Proteomics!

 


WOOOOOOOOOHOOOOOO! 

Editorial here! 

Last year it was long read sequencing or something (they forgot to include it in the 2014 issue, I'm pretty sure).

Check out this special virtual issue (click on the references!) highlighting a bunch of cool people in our field and their work!

Thursday, December 5, 2024

Improve the false discovery rate of your match between runs with PIP-ECHO!

For an old and probably inaccurate description of match between runs (MBR) you can check out this old post. 

Also, you probably shouldn't go past Fengchao and Sarah's paper here. Link might be the preprint.

Quick breakdown, though -

Imagine you run 50 LCMS runs on different patient samples.

In 35 of those runs you fragment and successfully identify PEPTIIIIDEK; it's pretty much 100% +2 charged at m/z 634.3608 and comes off at 15.6 minutes.
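(Side note - that m/z isn't a random number. If you want to sanity check it, here's a quick back-of-the-envelope sketch using standard monoisotopic residue masses. Nothing here comes from the PIP-ECHO paper; it's just arithmetic.)

```python
# Quick check on that m/z: sum the monoisotopic residue masses, add water,
# add protons, divide by charge. Values are the standard ones, from memory.
RESIDUE = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
           "I": 113.08406, "D": 115.02694, "K": 128.09496}
WATER, PROTON = 18.010565, 1.007276

def mz(peptide: str, z: int) -> float:
    neutral = sum(RESIDUE[aa] for aa in peptide) + WATER
    return (neutral + z * PROTON) / z

print(f"{mz('PEPTIIIIDEK', 2):.4f}")  # -> 634.3608
```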

In the other 15 runs you see a +2 peptide at 15.6 minutes but you don't fragment it or don't get good enough sequence quality for a positive ID. 

Match Between Runs (MBR) to the rescue! It donates that identification to the runs where it was not identified. 

Perfect idea, right? What's the problem? There are a crapload of peptides in any tryptic digest and they coelute a lot. And as the dynamic range of our instruments keeps going up, we see lower abundance peptides that we might not have seen before.

Compound this with shorter LC gradients

And the fact that every mass analyzer has a +/- mass error

And the fact that retention time on nanoLC (which everyone is pretty much using for some reason no one can justify) drifts - more realistically, in those 35 runs that peptide is coming off somewhere between 14.5 and 16.5 minutes - and now you might be quantifying the wrong peptide.
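To make that concrete, here's a minimal sketch of the core match-by-mass-and-time logic. To be clear, this is NOT PIP-ECHO's algorithm (or IonQuant's) - the feature values and tolerances are invented for illustration - but it shows exactly how wide windows start donating IDs to the wrong peaks.

```python
# Toy MBR matching: transfer an ID if an unidentified feature falls inside
# the m/z (ppm) and retention time windows of a confidently identified one.
def matches(feature_mz, feature_rt, lib_mz, lib_rt,
            ppm_tol=10.0, rt_tol=1.0):
    """Would this unidentified feature inherit the library ID?"""
    ppm_error = abs(feature_mz - lib_mz) / lib_mz * 1e6
    return ppm_error <= ppm_tol and abs(feature_rt - lib_rt) <= rt_tol

# The consensus from the 35 runs where PEPTIIIIDEK was actually fragmented:
lib_mz, lib_rt = 634.3608, 15.6

# A +2 feature in one of the 15 other runs - close in mass AND time:
print(matches(634.3621, 15.2, lib_mz, lib_rt))  # True - ID transferred

# A coeluting near-isobar 12.6 ppm away, eluting late in the drift window:
print(matches(634.3688, 16.4, lib_mz, lib_rt))  # False at 10 ppm...
print(matches(634.3688, 16.4, lib_mz, lib_rt, ppm_tol=20.0))  # ...True if you loosen up
```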

The top link above is to IonQuant which works in FragPipe. 

Could you take that idea and build something even better? Maybe! Just do this! 




The comparisons look good, though! And there is some serious nerd power on the preprint.


I checked the updates and it doesn't look like it's live in MetaMorpheus yet, though something that was driving me absolutely crazy a while back is! (Thank you! I thought my brain had broken - the diagnostic ion is wrong in a lot of places, including in software from my group.)

Knowing these people, we'll see PIP-ECHO in one of the upcoming builds. Fingers crossed it will work outside of the Orbi domain!




Wednesday, December 4, 2024

BlueSky is tracked by Altmetric. Twitter is finally dead to science!



Well...that was a run! I was on Twitter for 11(?) years and Tweeted over 10,000 times. 

Mandatory, obviously. 


It's done, though. Biorxiv won't link your Tweets and I don't see a tab on Altmetric. That increasingly bizarre drug addict killed what was at one time the best device for rapidly disseminating scientific advances I'd ever seen. 

BLUESKY is going to be better, I think. The expertise density is legit and some people are starting to figure out some of the cool features that I haven't yet. 

And!  BLUESKY IS TRACKED BY ALTMETRIC, JUST LIKE THIS WEIRD BLOG!  Wait. What? Sure is! And has been for almost a decade. I don't know why, I just type things here with my nice ergonomic keyboard and my broken brain filters. 

Tuesday, December 3, 2024

SCPro - Not single cell proteomics, at all, in any way, but still pretty cool!

 


Here, I fixed it for ya! Even though I can clearly see why I also wouldn't have recommended Ben Orsburn as a reviewer for this one, I do actually really like this new study.


I will, however, complain first. Last week I had a great meeting set up with a potential funder for my program and we got to an impasse that was something like the most important person on the call saying

"-of course we understand that single cell proteomics is not actually a single cell" 

And that was not at all unexpected, because this was not a dumb group of people. They're up to date, they read a lot, and obviously they've realized that

THE VAST MAJORITY OF "SINGLE CELL PROTEOMICS" PAPERS 

DO NOT

DO PROTEOMICS ON ONE (1) CELL. 

They don't do one cell because it is still hard to do. Believe me, I don't care what hardware you have - you really have to be on your A-game with everything planned out, and have some luck (no lab floods helps a lot). A 384-well plate that is slightly mismanufactured so it sits silly in your CellenOne can break some expensive glass, and you walk away with nothing at the end of a 14-hour day in the lab when you didn't get to stop for lunch (true story).

So people do things like flow sort 1,000 cells, as in the study above, or they stain and cut out 10 or 200 cell regions based on cell type specificity markers (as they also did in this study) - and to really boost the impact of their paper they put "single cell proteomics" in the title. Or, if they're super on their marketing game, they'll name their not-single-cell-proteomics method something like SCPro. Deliberately confusing?

Again, in this study - which I do seriously like - they do both. The microscopy is nice, the flow sorting looks good. The front end prep on a tip with SCX and C-18 is - in my somewhat professional opinion - probably a whole lot of extra work for very little actual gain over doing something with fewer steps and fewer places for lower concentration peptides to bind. But the library generation and diaPASEF analysis on a TIMSTOF Pro results in a solid number of IDs (50 µm custom columns, with low flow rates on a nanoElute). When you get down to what looks to my eye like probably 5-ish cells sliced out (probably 1 ng-ish of protein/peptide), they're getting 500 proteins, which is a solid achievement. At 10-30 cells (maybe 2-6 ng of protein/peptide) they're getting above 2k. Again, nice numbers.
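For what it's worth, here's the arithmetic behind my input estimates above. It assumes a generic mammalian cell carries roughly 0.2 ng of total protein - my ballpark assumption, not a number from this paper.

```python
# Per-cell input arithmetic (assuming ~0.2 ng total protein per cell -
# a generic mammalian ballpark, not a value reported in this study):
NG_PER_CELL = 0.2
for cells in (5, 10, 30):
    print(f"{cells:>2} cells ~ {cells * NG_PER_CELL:.0f}-ish ng protein/peptide")
# 5 cells ~ 1 ng; 10 cells ~ 2 ng; 30 cells ~ 6 ng - matching the estimates above
```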

The downstream analysis is well integrated and the files are publicly available, including the LCMS files (I haven't checked; my office PC has software on it that doesn't like the iPROX access portal sometimes).

Again, it's a nice study - working with a grand total of 100 nanograms of peptides from 1,000 flow sorted cells does still require some finesse. It is, however, frustrating to spend weeks optimizing the isolation of one (1) single cell at a time, analyzing them one at a time, and then hit these new perception hurdles. Like - well, this other lab is doing single cell proteomics (they aren't) - or - no one can actually analyze one single cell, when we have whole conferences where the admittedly small number of researchers actually doing one cell at a time speak about it.

These perception hurdles have always existed, though. I have little scratches in the surface of my relatively new tablet where I've broken the expensive little tips off of the pens that write on it when people have said "well, mass spec isn't quantitative." 4 scratches in the last 2 years, for sure.

This would have been a super positive review (aside from the SCX/C-18 tip) without the title and misleading name for the technique. 

Monday, December 2, 2024

Proteomics of ...spontaneous Achilles rupture....

 

I would like to thank these authors and the prestigious Journal of Proteome Research for something new to have nightmares about

SPONTANEOUS ACHILLES EXPLOSIONS! 


Proteomics to the rescue! (By the way, there is this whole series of bizarre children's books where there will be some silly problem and it's all COWS TO THE RESCUE or something. It's funny by the 11th page and continues through the 6th book somehow.) 

Obviously, this group wants to understand why sometimes people's Achilles tendons up and explode just for fun, and they were able to get samples from patients who end up getting corrective surgeries! Obviously, this is yet another place where genomics/transcriptomics of the tissue will probably tell you nothing - so it's proteomics time!

Interestingly, the group breaks out iTRAQ 8-plex and does pooling, offline fractionation (by SCX, I think, but I forget now - did a singing daycare drop between reading it and now) and then analysis of the fractions on a Q Exactive (Classic, I think). All the files are up on PRIDE, where they should be. I have no issue with iTRAQ 8-plex here, by the way. They turned up the collision energy and fragmented each target 2x before putting it on the dynamic exclusion list. The 8-plex allows them to run the QE at maximum (vendor permitted) speed with 17,500 resolution at m/z 200.
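That resolution choice makes sense with a little m/Δm arithmetic: adjacent iTRAQ 8-plex reporters sit about 1 Da apart, while TMT's N/C isotopologue pairs are only ~6.3 mDa apart. The rough numbers below are my own back-of-the-envelope, not anything from this paper.

```python
# Resolving power needed is roughly R = m / delta-m in the reporter region.
for label, mz_val, delta in [("iTRAQ 8-plex neighbors", 114.1, 1.0),
                             ("TMT 127N/127C pair",     127.1, 0.00632)]:
    print(f"{label}: need R of roughly {mz_val / delta:,.0f}")
# iTRAQ: R ~ 114, trivial at 17,500 - so run the QE at max speed.
# TMT N/C: R ~ 20,000+, which is why those methods run 50k scans.
```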

What I do have an issue with is the surprising pictures of the operation itself!! I was expecting a calibration or volcano plot and - blech. 

Seriously, though, it is all pretty interesting. There are structural differences in the ruptured tissue that are clearly visible, and they go into depth with IHC. They find a panel of targets that might be indicative of potential rupture candidates? It's a super compelling study all around - on something I now get to think about.

Interestingly...I think this is a dataset that would be a solid candidate for reanalysis, because it looks like this group didn't consider common collagen PTMs. I'm assuming when they say dynamic oxidations they exclusively mean methionines. Collagens are hyper-modified. In fact, in the BOLT cloud search engine, Amol wrote a whole crapload of common collagen PTMs into the first pass search because they're just that common. I think he got that idea from the cRAP database guy, whatever his name is. 😇
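If anyone does pick this dataset up, here's roughly the variable mod list I'd start from - the usual collagen suspects on top of plain Met oxidation. The deltas are the Unimod monoisotopic values as I remember them, so double-check against unimod.org before you actually search with these.

```python
# Hypothetical starting point for a collagen-aware variable mod list.
# Deltas are monoisotopic Da (from memory of Unimod - verify before use!).
COLLAGEN_VARIABLE_MODS = {
    "Hydroxylation (P)":       15.994915,  # hydroxyproline - everywhere in collagen
    "Hydroxylation (K)":       15.994915,  # hydroxylysine
    "Galactosyl (K)":         178.047738,  # galactosyl-hydroxylysine
    "Glucosylgalactosyl (K)": 340.100562,  # glucosylgalactosyl-hydroxylysine
    "Oxidation (M)":           15.994915,  # the one everyone already searches
}
```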