Tuesday, March 31, 2020
UniProt just launched this great new page, linking all their COVID-19 resources. I expect a lot more with UniProt2020_02 in just a few weeks.
Monday, March 30, 2020
I think my expectations are typical of what we all want from the good people of the world making free software and proteomics tools.
I just want:
1) Completely new ideas that are way better than the old ones.
2) Ultra powerful algorithms that use resources I couldn't possibly get or use elsewhere.
3) It all bundled in a way that will only take me like 45 seconds to install.
4) It to be intuitive enough that I don't have to read anything to use all this power.
All the perfectly reasonable expectations we all have for our bioinformagicians out there.
We know that chemically modifying a peptide with a PTM shifts its retention time. How? That depends on the modification. Phosphopeptides generally come out earlier (you probably lost a significant number of them if you used a PepMap trap column), but what about the other ones? That's one of the problems you need DeepLC for. Loads of applications here, but I've gotta move fast today.
You can get the program from this GitHub!
Super easy installation and it only has 8 settings! Tons of new power for me in exchange for exactly zero effort on my part?
I'm taking 15 minutes to stop looking at a FAIMS mystery to just, honestly, not look at it for a second and -- BOOM -- someone put mice on turntables and did proteomics of their brains?
I'm also going to grab breakfast/lunch
...and a figure from the abstract to prove I'm not joking!
...apparently it is about learning about how mice learn. I'm going to still do the joke I planned.
If you weren't aware this is a thing, it's down to the finals and team mass spec needs your votes.
You can vote here!
Yes....um....CRISPR and PCR were beaten in the earlier rounds...because...hmmm....
Okay, but mass spec is DEFINITELY cooler than Cryo-EM. If you don't use a UHMR or EMR to speed up the workflows, even with an inexpensive Cryo-EM ($2M) doing all of the QC work and optimization for your main Cryo-EM (maybe $8M), you can only solve 4 or 6 protein structures per year with one.
Sunday, March 29, 2020
Okay...so...I'm honestly just floored by how ridiculously accurate these Prosit spectra are... Everyone should be using this tool. (Click should expand it.)
First off -- This tool doesn't require you to be a master bioinformagician or anything to use it.
Here is my simple walkthrough on how I generate Prosit spectral libraries.
At the top is a mirror plot generated in Proteome Discoverer 2.4. The top is the experimental peptide from a COVID-19 / SARS-CoV-2 preprint that came out on Monday 3/23/20. The bottom is the prediction made by Prosit on 1/27/20.
Worth noting: In earlier versions of Proteome Discoverer the MSPepSearch may not accept the Prosit spectral libraries (whateveryoucalledit.MSP). I pretty much jumped from 2.2 to 2.4 since I was doing mostly small molecule stuff for a year. If you are on an earlier version, like PD 2.2, there is a solution -- MSAna (which is compatible with PD 2.1-2.4 -- all versions -- including the free versions).
You can get MSAna and the installation instructions from www.pd-nodes.org -- now you can use spectral libraries. MSAna also has several options for decoy library generation. It is worth checking out on its own.
Back to the library vs theoretical, though!
This is the Prosit predicted fragmentation pattern for this peptide. This is a screenshot from the ridiculously handy and free tool PDV (Proteomics Data Viewer) that I use basically every day now for one reason or another.
Yo, where did y14 and y15 go? And it is kind of a neat characteristic that it only predicts that you'll see a central series of b-ions...
An experimental PSM says...?
....pretty darned close!
Okay -- that's not bad, right? However, you should go back to the top and look at the relative intensities of these fragments.....because Prosit predicts those too! Actually, here is a zoomed in clip!
A deep learning tool predicted the bottom....and the top is the real spectrum from a peptide that had never been experimentally observed in unlabeled form until last Monday!! Crazy, right?
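If you ever want to sanity-check a b/y ion series like this yourself, the singly charged fragment m/z values are just running sums of monoisotopic residue masses. Here's a minimal sketch (my own toy code, nothing to do with Prosit or PDV, and the peptide is just a stand-in example):

```python
# Monoisotopic amino acid residue masses (Da) for a minimal b/y ion calculator.
MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
PROTON, WATER = 1.00728, 18.01056

def fragments(seq):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    b = [sum(MASS[aa] for aa in seq[:i]) + PROTON for i in range(1, len(seq))]
    y = [sum(MASS[aa] for aa in seq[-i:]) + WATER + PROTON
         for i in range(1, len(seq))]
    return b, y

b_ions, y_ions = fragments("PEPTIDE")
print([round(m, 3) for m in b_ions])
print([round(m, 3) for m in y_ions])
```

Compare those numbers against what PDV or the mirror plot shows you and you'll know right away if a predicted peak is where it should be.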
I have been pretty hard on a lot of the "artificial learning, machine intelligence, deep intelligence" stuff and I still think this is legit funny --
-- and I'm still going to be skeptical of anyone saying those terms (as we should be, of course) but I'm flipping through these spectra from this tool and this is one deep learning thingamabob that looks like it's doing exactly what it's supposed to.
EDIT -- If you use spectral libraries in Proteome Discoverer -- go to the PSM level and double click anywhere on the PSM to open the normal menu you are used to. Now you'll find that this button is not greyed out. Click that and it will open your experimental vs library spectrum(a)(es)
Saturday, March 28, 2020
Avoid my Saturday rambling about this cool new paper...and...other stuff, and just go read it yourself here!
One of the cool things about genomics being a decade or so ahead of us in many regards is that we
can learn from the mistakes they made (...well...we could..in theory...) hmmm.... I'm...hmm...okay...well... start over
can steal a lot of their cool ideas and programs!
If you're trying to figure out what peptides or proteins are significantly different between your conditions, you're probably using a tool that was designed for RNA microarrays, like LIMMA.
LIMMA works great. We have loads of proof, but there is a huge difference between RNA microarrays and shotgun proteomics.
...besides the fact that RNA doesn't correlate with protein levels....
You always get the same number of measurements for each target! The old Affy arrays I used would have something like 46,000 RNA things stuck to them. Each sample would hybridize (or whatever) to those 46,000, so, in theory, you're always getting back 46,000 measurements per sample.
Shotgun proteomics isn't like that at all! Some proteins will only get 1 or 2 PSMs, even when you go all out. Even in high abundance proteins you'll have stochastic effects; you almost never see even technical replicates where you get exactly 84 PSMs for the protein in each one.
What if you adjusted for that in some way? Like you purposely adjust your model so that it expects a situation where the PSM levels are realistically variable from run to run?
I've rambled enough and I can't pretend I can follow the math, but this group validates the crap out of this approach using a ton of different types of publicly deposited proteomics data and, across the board, it looks fantastic in every one of them. Here are the conclusions that I am the most excited about:
Friday, March 27, 2020
If you aren't curious about how this group in Frankfurt somehow got this much high quality proteomics out the door on SARS-CoV-2 so impossibly fast, you're weird. (If you missed it, here is my recap of it with links to their data. This is the TMT calibrator study and the first proteomics study out on this stupid virus)
The great people of the London Proteomics Discussion Group are moving to a full webinar format and Christian Münch will talk about this amazing endeavor.
You can register for it here! (I also attempted to make the register button work, it probably didn't)
Huge shoutout to Dr. Harvey Johnston for tipping me off to this.
I'm just going to leave this here.
As someone who believes that NanoLC is one of the 3 main reasons why proteomics isn't taken seriously, wow, am I ever excited to think about throwing a 2 micron internal diameter column into my workflows
75 picograms of peptides. 1,000 proteins ID'ed.
I presume a 1 uL bubble takes 11 years to work its way out of a system like this, but it is interesting nonetheless.
Thursday, March 26, 2020
Who knew proteomics was so fast?!?! COVID-19 study number 3?!?
I was thrown off at first because I didn't recognize the species in their FASTA files. The SARS-CoV-2 virus was used to infect monkey cells.
Nanopore sequencing was used here as well as "high-low" shotgun proteomics on a Fusion Lumos system.
(High resolution MS1 with ultra-fast, low resolution MS/MS in the ion trap. I'm pretty sure they used HCD fragmentation; I don't have time to go back through it today.)
What I did think was interesting enough to screenshot was the number of phosphorylation sites that they detected. I know I rambled about the ModPred program on here somewhere recently (Edit - found it!) and, check this out!
I can't use pictures from the preprint due to the copyright terms, but the phosphorylation sites that they find line up surprisingly well with the ModPred predictions!
There is a neat point where they find multiple phosphorylation sites in one area where ModPred is convinced there is only one likely location....and given that it is ion trap data, it wouldn't be strange to think that the localization was difficult. It would be interesting to take a deeper look into which one was right in that scenario.
The RAW files have been publicly uploaded, but I can't seem to pull them down. They are through a service I haven't heard of before.
Wednesday, March 25, 2020
I'm not qualified to give an opinion on this new study. Fortunately, that hasn't stopped me in at least a decade....
This is how I explain it when I lecture, though.
Proteomics is really good at peptide-spectral-matching.
As a field...it's fair to say that we're better at drinking beer than we are at assembling those peptides back into proteins. (Come on, that's at least part of the reason we're so bummed out about all the conferences we're missing right now. Nothing facilitates a discussion about ion fragmentation like being in a bar with 100 of the world's best experts on the topic, unless it's going to the next bar and finding another 90 or so).
Hopefully that protein is 100% unique -- no stretch of amino acids lines up in order with any other protein in all of the universe. If that is the case, we're set! If it isn't....well.....new code that is designed to be easily integrated into our existing pipelines? That's a total win, even if I don't understand it at all.
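For anyone who hasn't hit this wall yet, here's a tiny made-up example (fake peptide sequences and protein names, my own illustration) of why shared peptides make protein inference hard:

```python
# Made-up peptides mapped to the proteins they could have come from.
peptide_hits = {
    "PEPTIDEK":   {"PROT_A"},            # unique -> solid evidence for PROT_A
    "ANOTHERK":   {"PROT_A"},
    "SHAREDPEPK": {"PROT_A", "PROT_B"},  # shared -> can't tell A from B apart
}

# Only proteins backed by at least one unique peptide are safe calls.
unique_evidence = {
    prot
    for prots in peptide_hits.values() if len(prots) == 1
    for prot in prots
}
print(unique_evidence)  # PROT_B never gets unique evidence of its own
```

Real inference engines do much fancier parsimony and grouping than this, but the shared-peptide ambiguity is the core problem they're all wrestling with.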
Want a good overview of the two COVID-19 proteomics studies? This article does a great job of covering the giant Münch lab TMT study and the impossibly large protein-protein interaction dataset that was posted on bioRxiv yesterday.
I think you have to be signed up to read it, but I'm pretty sure it's free to read.
Moral of the story? This isn't the proteomics of 2004 when SARS was tearing things up and we couldn't help. We've grown up a lot and have things to offer.
Tuesday, March 24, 2020
It is truly freaking amazing how fast some of these studies for COVID-19 / SARS-CoV-2 / 2019-nCoV are coming together.
As I'm waiting patiently for even the RAW files to download, you have to think -- how the heck did they get this study done this fast!?!??
Okay -- so 75 people working on it? That could be helpful!
Each protein from the virus was cloned and used as a bait -- whoa...they even cloned the predicted cleavage sites separately....and they tested all these drugs....?...wow.....
A KingFisher was used to help automate the pull-downs/IPs/AEs, whatever you like to call them.
The digestions were performed on-bead and 2 LCMS systems were used. One was a QE Plus system running 75 minute gradients. I'm a little unclear on whether 2 identical systems were used or if the second system was something different.
Downstream analysis used SAINT and MIST and MaxQuant and protein complexes were cross-referenced with CORUM. And the bioinformaticians on this weren't sitting around. The number of programs used in the downstream is staggering....
What you want is the ProteomeXchange data and it's all here.
Not a proteomics wizard? Ummm..you might have come to the wrong weird blog. But if you just want the protein-protein interaction data -- it was uploaded to NDEx!
Monday, March 23, 2020
For today's break from watching people who think this COVID-19 thing is a hoax, I present the easiest way to optimize your peptide loading that I've ever seen!!
I get this question all the time. You get this question all the time. Sure, we did the BCA, so there is all the protein, but how much peptide do we put on the instrument?
The actual value might surprise you! ...nevermind...I guess it is right there in the image....
We can argue about loading amounts vs how dirty your quadrupole gets vs whatever later. I strongly suggest this paper, if only to see the easiest way to truly measure and optimize your peptide load that you've ever tried.
I don't know what is going on in Ghent, but they seem to have this notion that they can just fix all these things in proteomics that have been broken forever.....
Sunday, March 22, 2020
Wait. Maybe this is the paper that kicked off all the twitter conversations before I woke up this morning and my brain didn't make the connections till after the espresso kicked in, I was in the methods, and half the files had downloaded. This paper is here, open, and fantastic.
First thing to clear up: these are all bacterial pathogens. Haemophilus influenzae is a bacterium and is not at all related to influenza viruses. I bet the names just go back to the symptoms. I'm going to put in a picture so the post has some color!
What did they do?
They grew up the pathogens in culture and did discovery proteomics.
They used this data to find the very best targets
They developed PRMs (Q Exactive HF)
Then they used those PRMs to see if they could find those peptides in clinical samples taken from swabs!!
Now, here is the kicker. They used NanoLC and 50 minute gradients.
In my home state of West Virginia, where people are being told it will be a minimum of 3 days before they'll find out if they test positive for COVID-19 (once they prove, without a shadow of a doubt, that they have symptoms, were in contact with someone who tested positive AND are willing to go to the media in frustration to try and get testing), a 50 minute assay probably seems pretty damned fast for a pathogen diagnostic.
Again -- COVID-19 is a virus. These are bacteria, I'm just making a comparison here.
Okay -- I don't have all the files, but this is interesting. They appear to have utilized what I'd call an "enriched DDA" approach as well. Someone had a clever name for it a few years back, and maybe that's what it was and I stole it. I forget.
In this method you have a targeted list that you put in and you use regular ol' DDA. If the instrument sees the peptides on your inclusion list in the MS1 scan within the limits that you give it, it preferentially selects them for fragmentation.
If it doesn't see them, it keeps on doing normal DDA.
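In code, that selection logic is dead simple. A toy sketch (my own, not anyone's vendor firmware, and the m/z and intensity numbers are invented):

```python
def pick_precursor(ms1_peaks, inclusion_list, tol_ppm=10.0):
    """Enriched-DDA style precursor selection: prefer any MS1 peak
    matching the inclusion list within tolerance; otherwise fall
    back to plain top-intensity DDA. Peaks are (m/z, intensity)."""
    def matches(mz):
        return any(abs(mz - t) / t * 1e6 <= tol_ppm for t in inclusion_list)

    targeted = [p for p in ms1_peaks if matches(p[0])]
    pool = targeted or ms1_peaks          # no target seen -> normal DDA
    return max(pool, key=lambda p: p[1])  # highest intensity wins

peaks = [(445.12, 8e6), (622.33, 2e5), (733.87, 5e6)]
# The weak 622.33 peak gets picked anyway because it's on the list:
print(pick_precursor(peaks, inclusion_list=[622.3301]))
```

Note the low-intensity target beats the big background peaks, which is the whole point of putting the list in.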
This can be super powerful, but when I've seen this not work as planned, the typical culprit is that an in silico selection of targets gives you the monoisotopic mass.
For bigger peptides, that ~1.1% of carbons being C-13 adds up. Even here at ~2,500 Da, the monoisotopic isn't the best peak to pick. As you increase in size, the monoisotopic will become a less attractive target for high res triggering.
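To put a number on it: with carbon-13 at roughly 1.1% natural abundance, the fraction of peptide molecules that are all carbon-12 (i.e., sitting in the monoisotopic peak) drops fast with size. The carbon counts below are my own rough guesses for peptides of those masses:

```python
# Fraction of molecules with zero C-13 atoms: (1 - abundance) ** n_carbons.
C13_ABUNDANCE = 0.0107  # natural abundance of carbon-13, ~1.1%

for n_carbons in (50, 110, 220):  # very roughly 1, 2.5, and 5 kDa peptides
    mono_fraction = (1 - C13_ABUNDANCE) ** n_carbons
    print(f"{n_carbons:>3} carbons -> monoisotopic ~{mono_fraction:.0%} of molecules")
```

Around 2,500 Da the monoisotopic peak already holds under a third of the signal, so triggering on it alone throws sensitivity away.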
By using the experimental data to pick their targets, they avoid this common in-silico hurdle.
Great study. Topical (which I learned recently means more than "don't eat")!
Saturday, March 21, 2020
The jury is out on this great and pertinent survey Dr. Pino put out on social media (most people are voting for leaving the vacuum and electronics on which probably is the best bet, but not everyone can do that).
If you are going the shutdown route, I worked for the US government for several years in my career. I've got loads of experience with shutdowns!
Disclaimer: Always talk to your vendor and local FSE. Do not take the advice of some weirdo on the internet about your instrument that was hundreds of thousands or millions of dollars. Duh.
First of all -- please dig that instrument manual out. Wait. Scratch that. Use the newest instrument manual! I bet your vendor's website has the newest one. If you're a Q Exactive user, it'll look like this.
Way better front page than the one you've probably got a hard copy of. Manuals seriously do evolve. I've been told to do things that I knew were not in the manual and found out I just had an old one -- and the way the old one said to do it was wrong. True stories!
Follow the manual. It's boring, but super important.
Big thing I always forget: It is extremely common for the gas line solenoids to default to open rather than closed when you shut off the main power. If you're on gas cylinders or liquid dewars, you won't be the next day....
If you're on house N2 and it's limitless, you can keep venting it forever, I guess.
There are big words on every single page of this PDF that warn me not to share it without the explicit permission of the authors, so here it is!
That really cool slide deck (neither of us knows where you got it from) is great despite being a few years old. Not everything that is old is bad. Just...almost...everything. If you go to the end, there is lots about doing a PM, and the oil part is super critical if you're being powered down.
According to this surprisingly interesting Leybold (they make some of the vacuum pumps you're familiar with) course material on "Fundamentals of Vacuum Technology"...
When you're constantly pulling a vacuum and the pumps are hot, you are venting the moisture that might come into your instrument. It's not much, and that's why the oil lasts for 6 months to a year depending on pump and oil. When it is sitting, it can pick up some moisture.
This may be completely wrong, but this is what we'd do after a prolonged shutdown for Congress and whoever to agree to whatever.
1) Visually examine the oil for signs of any partitioning or oil/water layers. If you see anything, dump that oil the fuck out and replace with the vendor recommended oil.
2) Pump down the instrument and bake out.
3) In the very near future -- in Baltimore summers, 1 week max, rest of the time, maybe 2 weeks, shut it all down again and change out the pump oil. Water in the pump reduces the roughing pump capabilities. And you know where that extra pumping power has to come from -- the turbos.
Change the oil in my roughing pumps more often than strictly necessary, or stress my turbo more?
Disclaimer Repeat: Always talk to your vendor and local FSE. Do not take the advice of some weirdo on the internet about anything, in particular, your instrument that was hundreds of thousands or millions of dollars. Duh.
Friday, March 20, 2020
This new study in FASEB is interesting for several reasons beyond the obvious ones like: How do you talk someone into exercising and then getting a muscle biopsy?
Doesn't ubiquitination (ubiquitylation?) just mediate protein destruction? I guess not. Geez...could any of these biological systems just be not-complicated?
For this study, Parker et al. use TMT 10-plex and 2 hour gradients on a QE Plus and a Fusion system. The MS2 resolution used is 35,000, which I know some groups consider too low, but on the D30 systems like the Orbi Velos or Q Exactive Classic/Plus it can make a lot more sense than getting half the number of scans because you went overkill with 70,000. Depending on the experiment -- single shot, ultra-complex mixture? -- it can make a lot of sense.
The RAW files are all up on ProteomeXchange here.
This is a good representative image of the closest reporter ions from one of these files (keep in mind that resolution is specified at 400 m/z in the Orbitrap Velos and at 200 m/z in the Q Exactives, so the resolution at the tag region is much higher. Time of Flight systems, however, have roughly uniform resolution across the m/z range; you'd really need a 50,000 resolution TOF to get data this good.)
Would baseline be a little better? Sure, but at the cost of 50% scans? A decision worth thinking about depending on your experiment.
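The back-of-the-envelope I use for that tradeoff (the transient lengths are my own ballpark numbers for a Q Exactive class instrument, not vendor specs):

```python
# Approximate Orbitrap transient length per scan at each resolution
# setting (resolution quoted at m/z 200). Double the resolution,
# double the transient, roughly halve the maximum scan rate.
transient_ms = {17_500: 64, 35_000: 128, 70_000: 256, 140_000: 512}

for res, t in transient_ms.items():
    print(f"{res:>7,} resolution -> ~{1000 / t:.1f} scans/sec max")
```

So going from 35,000 to 70,000 roughly halves the number of MS2 scans you can collect across a peak, which is exactly the "half the scans" cost mentioned above.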
This got a little off topic, but this is a cool study and a great example of doing quantitative ubiquitin signaling studies with multiplex tags!
Thursday, March 19, 2020
(Nope. I don't know what that is, but I'm borrowing it because it's super cool looking and I don't see anywhere that says I can't use it.)
Last year at ASMS it was PROSIT, Prosit, PROSIT!!, Prosit, Prosit. Deep learning for spectral libraries. And you know what I feel bad about? The fact that there are 3 or 4 other things out there for deep learning spectral libraries that are also AMAZING and I can't name them off the top of my head. I'm going to work on that. PROSIT is the most recent and it draws off of ProteomeTools which is synthesizing EVERY HUMAN PEPTIDE. So...that's a pretty big advantage. Plus it is so easy to use that I can do it.
Prove it? Here is a walkthrough that I made. Bonus Shia LaBeouf (caution, music)
How far can you push Prosit spectral libraries? Here is a brand new pressure test for Data Independent Acquisition (DIA):
Why would you read it? There...is....so...much...DIA...optimization....out...there....?
Single shot human with E.coli peptides spiked in
Q Exactive (the regular ol' D30 Q Exactive -- okay, it's a Plus, but the quad is a little better and it gates smarter.)
>8,000 proteins quantified
How do they get there?
SPEEEEEEEED digestion (me rambling about it here)
uPAC columns (great, albeit kind of long chromatography -- shortest gradient is 160 minutes)
Optimization of targets based on Data Point Per Peak (DPPP...which...isn't an acronym I can imagine using again)
What's that do?
It reduces the size of an in silico digested peptide library from over 3e6 precursors to around 2e5 precursors.
Don't believe me? Check out the data yourself. It is all up on ProteomeXchange as PXD017639
And -- they didn't use SpectroNaut, by the way! They used DIA-NN (Neural Networks) which you can get from Github here!
Wednesday, March 18, 2020
Did you know that there is only one (1!) mammalian organ that regenerates annually? I did not.
That is kind of important, right? And -- why haven't we figured out how we could use this to regenerate other mammalian organs? Maybe it's time we did! And what better way to start than some comprehensive proteomics?
The work was done on an LC-Impact II QTOF system and with a load of negative controls. The goal was to distinguish the proteins that are clearly involved in regeneration from the normal deer proteins. This is a great approach in the less well understood genomes/proteomes (which is basically all of them).
You know what they end up with? Around 150! 150 proteins that are probably controlled through amazing levels of tightly orchestrated regulation and probably a lot of PTMs that will need to be worked out later, but still -- 150 proteins -- to REGENERATE A MAMMALIAN ORGAN! Makes it seem like it wouldn't be all that hard, right? And definitely sounds like something that is worth pursuing further!
What other mysteries might the majestic deer of the forest hold the answer to?