Wednesday, April 29, 2020
Not tired of SARS-CoV-2 yet?
Maybe you need to hear someone with a squeaky nasal voice talk about it in a West Virginian accent.
If so, you're in luck! Somehow I've been added to a lineup this Friday for the London Proteomics Discussion Group.
Two qualified scientists will also be speaking!
You can register for it here.
Tuesday, April 28, 2020
JPT is one of those companies that I think everyone forgets about, because they've been around long enough and made enough great standards that everyone has heard of them at some point. I, for one, can never remember the name and it helps if I start thinking about BASS CANNONS. (Changed my mind -- an Excision link does belong here. Used to tour as the loudest act on earth. Not kidding.)
If you don't know about JPT -- they make my favorite retention time standard for proteomics, PROCAL. Not only does it have super sensitive chromatography metrics thanks to the really clever way that the peptides were designed -- it can also be used to match your collision energy from instrument to instrument!
You can tell the labs that use it because they'll have multiple instruments in their method files and the Tribrid is at 32.5 HCD and the benchtop is at 26, because they used science and a standard to match them!
Now -- I'm not just beaming about JPT solutions because I'm pretty sure I keep forgetting to pay them for HLA peptide standards I bought this spring (which they've never once reminded me about -- and are the only place I ever order them from, because they are suuuuuper clean).
They've been killing it on the dumb virus thing we keep talking about. You can basically buy peptides covering the entire virus -- and now you can buy convenient sets!
Monday, April 27, 2020
Glycoproteomics is rapidly massively evolving right now, everyone. It seems like every week some reason that makes glycoproteomics an awful and terrible thing to do with your life is solved -- or, at least, mitigated...
Next addition? Selective labeling of O-GalNAc!
It is until you remember the best part about glycans in mass spectrometry! All the stupid sugars have the same stupid masses!
Is it a serine GlcNAc? Or a serine GalNAc? I dunno. It has the same exact stupid mass. If you really try with optimizing the fragmentation, the distribution between the fragment ions will look a little different. Fortunately, which one it is doesn't matter to biological systems!
Living systems can be annoyingly particular about what sugar isomer is where! And with this system, which I don't totally understand, you can selectively label O-GalNAcs and tell them apart, not only with mass spectrometry, but also with microscopy!
I don't have to understand the labeling methodology to get that this is a potential game changer that can light up big questions in biology! I'm so confident of that fact that I only skimmed that part of the paper.
Sunday, April 26, 2020
To start -- SimpliFi is a commercial product and is the property of Protifi. I can't make trademark signs, but both those words have those.
Next, however, this is critically important. Ingenuity Pathway Analysis (also gets a TM) is crazy expensive, and was never ever meant for proteomics data. It shows, and I think it always will. But what else do you do? Holy cow, maybe you SimpliFi!
I saw a demo of this at ASMS last year, but it was an outline. If you caught the day 1 virtual US HUPO talks (which you still can here!) John Wilson gave a talk on this program -- and he undersold it. I've been messing around with a demo of it, and it's a game changer for 2 main reasons
1) It makes you think about experimental design. (Something proteomics, as a field, isn't real good at)
2) It puts a lot of power at your fingertips in an intuitive way (except for the experimental design part)
3) The output is proteomics centric (who makes a list with 2 things?)
You should just ignore me and try out SimpliFi here!
I registered and got a demo code to try it out and the data import is fantastic.
You upload your CSV results from whatever you want. SimpliFi then makes solid guesses about that data you gave it. If it is wrong, simply highlight the columns or rows that are correct.
Have you uploaded data into ShinyApps or into Ingenuity? How much time do you spend going back and reformatting your stupid CSV file so that it will identify your data correctly? 99% of the time you've blocked off for your data interpretation? Oh -- is there an invisible space in your CSV? And is that why the browser window locked up for 3 hours while you thought it was processing?
Not a problem here!
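For the tools that are less forgiving, a quick pre-clean of the CSV can save some of that pain. Here is a minimal Python sketch, assuming the trouble is invisible characters (a byte order mark, zero-width spaces, non-breaking spaces) hiding in headers and cells. This is not how SimpliFi does its import, which I know nothing about. The function names and the list of characters are just my assumptions for illustration.

```python
import csv
import io

# Characters that commonly sneak into exported CSVs without being visible.
# This list is an assumption, not exhaustive.
BOM = "\ufeff"
ZERO_WIDTH_SPACE = "\u200b"
NON_BREAKING_SPACE = "\u00a0"


def clean_cell(cell: str) -> str:
    """Remove invisible characters and surrounding whitespace from one cell."""
    cell = cell.replace(BOM, "").replace(ZERO_WIDTH_SPACE, "")
    cell = cell.replace(NON_BREAKING_SPACE, " ")
    return cell.strip()


def clean_csv(text: str) -> list[list[str]]:
    """Parse CSV text and return rows with every cell cleaned."""
    reader = csv.reader(io.StringIO(text))
    return [[clean_cell(cell) for cell in row] for row in reader]


# Hypothetical two-line export with a BOM, a trailing space,
# a non-breaking space, and a zero-width space hiding in it.
messy = "\ufeffProtein ,Abundance\u00a0\nP12345, 1.5\u200b\n"
for row in clean_csv(messy):
    print(row)
```

Run something like this before the upload and at least the invisible space won't be the reason the browser window locks up.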
Also -- and I'm scared to hope -- for real -- but SimpliFi allows you to put your data in with the distinct batches your files were in. We're thinking about BATCHES -- IN PROTEOMICS?
(Not the game?)
More on that later when I dig up some data where I can test it, right now I'm messing around with some cool cancer demo data -- and look how beautiful this output is!
Okay -- yes -- if you're an expert with R and/or Perseus, maybe you won't find this as impressive as I do, but if you're not an expert in this stuff and you're constantly flabbergasted by requests to interpret the beautiful data you just generated -- this may be what you're looking for.
I think it's still in demo/development phase, and we might find out it costs as much as Ingenuity to use when it comes out, but in just one day of messing around with it with classic NBA games playing in the background here and there (Iverson is underappreciated, btw, Hall of Fame or not.) -- I'd write a justification to trade the two out right now.
Friday, April 24, 2020
This is big!
In our preprints we proposed what it would take to make LCMS competitive with the other SARS-CoV-2 diagnostic assays out there. We know we probably can't beat RT-PCR, but we need to hit a magic number the ImmunoSwab assays and ELISAs and Protein Arrays can hit, because they've been validated to work!
That magic number is around a detection of 20 picograms viral protein/mL biological fluid.
And these people just showed you can hit that with PRM!
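For a sense of scale on that 20 pg/mL number, here is a back-of-the-envelope conversion to molar units in Python. The 46 kDa molecular weight is my assumption (roughly the size of the SARS-CoV-2 nucleocapsid protein), and the 10 uL volume is purely illustrative. Neither number comes from the paper.

```python
def pg_per_ml_to_molar(pg_per_ml: float, mw_da: float) -> float:
    """Convert a mass concentration in pg/mL to mol/L."""
    grams_per_liter = pg_per_ml * 1e-12 * 1e3  # pg -> g, then per mL -> per L
    return grams_per_liter / mw_da


def amol_in_volume(pg_per_ml: float, mw_da: float, volume_ul: float) -> float:
    """Attomoles of protein contained in volume_ul microliters of sample."""
    moles = pg_per_ml_to_molar(pg_per_ml, mw_da) * volume_ul * 1e-6
    return moles * 1e18


ASSUMED_MW_DA = 46_000.0  # assumption: roughly nucleocapsid-sized protein
ASSUMED_VOLUME_UL = 10.0  # assumption: illustrative volume, not from the paper

conc = pg_per_ml_to_molar(20.0, ASSUMED_MW_DA)
print(f"{conc * 1e15:.0f} fM")  # about 435 fM
print(f"{amol_in_volume(20.0, ASSUMED_MW_DA, ASSUMED_VOLUME_UL):.1f} amol")  # about 4.3 amol
```

Under those assumptions you're hunting for a few attomoles of protein in whatever volume you actually get to work with, before any losses in sample prep.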
They used an Orbitrap Eclipse.
They used 60 or 90 minute gradients.
They used NanoLC.
You could argue none of these are ideal for clinical diagnostics (unless you're the manufacturer of the Orbitrap Eclipse, who would probably argue that's a great idea at a little north of $1M USD per box), but as a proof of concept?!?! This is fantastic.
The files are at ProteomeXchange/PRIDE here.
Thursday, April 23, 2020
Wednesday, April 22, 2020
When I started grad school I had to shift gears a lot and learn a lot of molecular biology. I didn't retain any of it, but I do remember a lot about yeast 2 hybrid assays and how methodical and painful those sounded. Express one protein as a bait? Locate its interactors one at a time, in one system? It's something like that. I think I maybe thought that mass spectrometry and big interactome studies completely negated those technologies.
Okay -- but what if someone started doing these around the turn of the century and then just kept going? Obviously, the systems work, but the only way to get all the data is to do ALL OF THE PROTEINS. Then you'd end up with a binary map of all of them. And that's what we've got here.
What would you need to do this, outside of a decade or two?
175 people helping you? That's a decent start.
I bet the reviewers at Nature were just like ".......whoa.....okay.....you deserve to be here because I am SO glad that I didn't have to do this....."
Now -- I get a little fuzzy on the numbers here. And probably on the details. To be honest, I'm paywalled but I can still explore the output of this research. It's called HuRI and you can access the portal here.
The numbers get fuzzy because it appears that around 9,000 proteins were done by the Yeast 2 Hybrid thingy, but HuRI also curates "high quality interactions" from the literature. Some mass spectrometry is also pulled in, because there is a small set of data from ProteomeXchange that is referenced (PXD012321), but this looks a bit like a validation set.
As much as it seems like I'm poking fun at the method, this is obviously valuable. When you're trying to work out the biological implications of the proteins you're seeing that are messed up, where do you go from there? Personally, I send people right to BioPlex. However, BioPlex is an ongoing project that only just now deposited data from its second cell line. If your protein isn't there -- there is HuRI. Also -- I'm sure looking at the data in BioPlex (which is high quality AE-MS [IP-MS]), which might encompass a lot of secondary, and possibly tertiary interactions, might be aided by looking at binary interaction data, at least as a filter. (Not a biologist, but hey -- I'm adding it to my list for when people say "thanks for the list -- WTF do I do now?")
Tuesday, April 21, 2020
Huge potential (probably) because if you could gargle a solution and then SMART digest some of it and then find SARS-CoV-2 peptides, that's awesome. That is even easier than a throat swab!
SMART digests are fast, and you don't get easier. In my hands, S-Traps are far far cleaner, but "put protein in this tube and heat it up." If you've got the separation or instrument resolution, that's a great solution. They toss in some PNGase to deglycosylate and it looks like that's the only variation.
Slightly mixed feelings:
1) A 3 hour total run time on a ~$1M Tribrid system running a complex sample method (both high resolution MS/MS and simultaneous low resolution MS/MS acquisition) -- which, don't get me wrong, on its own is cool. In the context of a starting point for developing a diagnostic assay? Seems like
2) No data deposited?!? AAAAAAAAAAAAAAAAHHHHHHH!!!
(Feel free to explore the hypocrisy from the guy who has been in industry almost his whole career and has deposited very little data because no one would let me do it. That happens, but in this pandemic, we've gotten spoiled. I've downloaded some of the COVID-19 proteomics data from other groups before the preprint even posted.)
Again -- huge promise here. And my assumption is that the team is so busy designing faster assays that they are still waiting for their files to upload.
The gargle avenue could be a huge advance for the ease of diagnostic angles, but if we've barely picked up one ion trap MS/MS spectrum on a 3 hour gradient on the world's most sensitive high resolution instrument -- that says a lot about whether this Quantum Discovery over there is gonna be able to pick up a peptide from the same solution.
Monday, April 20, 2020
Okay -- I'm finally back after an almost week long struggle with a super sophisticated malware thing that blocked the install of all the anti-malware updates that could remove it. All sorts of fun, I promise.
AND -- Now I have proteomic AND metabolomic data from 40+ COVID-19 infected patients and 50+ controls to dig through?!?
Wait -- has this been out for 2 weeks? Okay -- well -- thanks Google Scholar alerts, you're winning it in the pandemic.....
The proteomics is:
TMT Pro labeled (16-plex)
Fractionated into 120 fractions
Concatenated into 40
Ran with microflow (not nano) in 35 minute gradients
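If the 120-into-40 step isn't obvious, here is a small Python sketch of one common way to concatenate fractions, where each pool combines fractions spaced evenly across the separation. The post's source doesn't tell me which pooling scheme the authors actually used, so the interleaved scheme here is an assumption, and the function name is made up for illustration.

```python
def concatenate_fractions(n_fractions: int, n_pools: int) -> list[list[int]]:
    """Assign 1-based fraction numbers to pools by interleaving.

    Pool k receives fractions k, k + n_pools, k + 2 * n_pools, ...
    so each pool mixes early, middle, and late eluting material.
    """
    if n_pools < 1 or n_fractions % n_pools != 0:
        raise ValueError("n_fractions must be a positive multiple of n_pools")
    return [
        list(range(pool, n_fractions + 1, n_pools))
        for pool in range(1, n_pools + 1)
    ]


pools = concatenate_fractions(120, 40)
print(pools[0])   # [1, 41, 81]
print(pools[-1])  # [40, 80, 120]

# Gradient time only (no loading or equilibration) for one set of 40 pools
gradient_minutes = 35
print(f"{len(pools) * gradient_minutes / 60:.1f} hours")  # 23.3 hours
```

So even at 35 minutes a run, one fractionated 16-plex set is close to a full day of gradient time on the instrument.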
The Metabolomics is:
Separated into 4 batches
QE HF -- I'm unclear as to the data acquisition strategy. If you read the methods it suggests MS1 only at 35,000 resolution, which is not only inaccurate (because that setting doesn't exist on the device) but also a suboptimal way to run metabolomics. Then data dependent acquisition is implied to have happened. (The classical metabolomics world seems to think that lower resolution and mass accuracy are okay -- I disagree. I think low resolution metabolomics is fantastic if you've got a pathway you want to find and whether it's actually there or not is of secondary concern.)
The stats are on-point. I dig the downstream analysis here.
All the data is available at ProteomeXchange via IPX Project ID: IPX0002106000
Sunday, April 19, 2020
BOOM! Brand new, clear, and with pretty and thorough figures (open ahead of print!)
Friday, April 17, 2020
If you've ever published anything you probably have a special folder from "Journals" of an often increasing degree of sophistication offering you publication spots, or Editorial positions, or -- and this is the best one -- inviting you to speak at conferences that they just made up.
There are multiple databases trying to keep up with these "predatory" organizations, but -- again -- some of them are getting better. I seriously almost fell for going to Europe for a predatory conference. I love talking about proteomics and I'm probably going to fall for it eventually.
The best things are when people deliberately mess with these journals -- and this might be one of the best ones yet -- I humbly present my favorite thing I've read today--
Thursday, April 16, 2020
When you get to go back to lab, you cancer researchers, you'll have some great new ready made resources courtesy of
You can check them out here!
Okay...to be honest, I thought this post was going to be poking fun at this -- like -- hey, what is wrong with Picky and Phosphopedia -- they do this stuff and have been around for years, but the more I jump around, the more this looks like well-spent tax dollars.
LOD/LLOQ on peptide assays? Legit SOPs that you can download? Yeah, you'll have to do more of the heavy lifting than with the two resources I mentioned, but for modeling? This is legit.
Wednesday, April 15, 2020
Want the inside track on how this huge protein protein interaction study was assembled so rapidly?
Figure out what time it is in London and check out this great talk on Friday!
ALSO -- have you heard about the COVID-19 Mass Spectrometry Coalition?
Sign up and let's kill this stupid virus thing!
Tuesday, April 14, 2020
Mystery with a short answer (I think) that took me a while to figure out. However, since it isn't detailed precisely in the study above, I feel a little less unsmart. (1,2,3 negatives in that sentence? Meh.)
When you fire up your FAIMS Pro, you're going to be IMpressed. The background noise is great, the nitrogen consumption....less great..... and you're going to see WAY more protein IDs!
However, there is a cost to this. You're going to get less coverage of those proteins. The authors made this pretty green chart, so I don't have to (borrowed without any permission whatsoever)
I'll even go one better with another stolen green plot -- the proteins you'll find will be lower abundance! (Deep HeLa is fractionated).
If you spend less time on albumin, titin and keratin, we all win (unless you're a keratin researcher, and you have your own challenges).
I scratched my head about this and plotted stuff a bunch of different ways. Surprisingly the MS1 isolation interference doesn't seem all that different (but -- keep in mind that this is essentially a normalized measurement)
<---No FAIMS left
FAIMS -75 right -->
However, if you take out all the z=1 peptides and your Signal (S) goes up, you're also raising low abundance peptides that are now contributing to your Noise (N), so probably this is all good stuff.
Okay -- so what is actually different between the files?
The stupid charge state distributions!
No FAIMS left -- FAIMS -75 right. All of a sudden you've got a bunch more +3 peptides (relatively)
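If you want to make the same comparison on your own files, the calculation is just a tally of precursor charge states turned into fractions. A minimal Python sketch is below. It assumes you've already exported the precursor charge of each PSM into a list, and the two example lists are toy numbers I made up to show the call, not values from these files.

```python
from collections import Counter


def charge_distribution(charges: list[int]) -> dict[int, float]:
    """Return the fraction of identifications at each precursor charge state."""
    if not charges:
        raise ValueError("need at least one charge value")
    counts = Counter(charges)
    total = sum(counts.values())
    return {z: counts[z] / total for z in sorted(counts)}


# Toy, made-up charge lists purely to demonstrate the function
no_faims_charges = [2, 2, 2, 2, 2, 2, 3, 3, 4, 2]
faims_charges = [2, 2, 2, 3, 3, 3, 3, 4, 2, 3]

for label, charges in [("No FAIMS", no_faims_charges), ("FAIMS", faims_charges)]:
    dist = charge_distribution(charges)
    summary = ", ".join(f"+{z}: {frac:.0%}" for z, frac in dist.items())
    print(f"{label}: {summary}")
```

Comparing fractions rather than raw counts matters here, since the two runs won't have the same number of identifications.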
Okay -- this gets better, I think, because you know what a lot of search engines assume? They assume that your MS/MS fragments will be +1 charged.
Sure, they'll try to look at more, but as awesome as a 1980s TransAm looks and sounds, it's only got 205 horsepower. With age, it's probably closer to 170 at the wheels without a full rebuild by someone good. You can get faster used hybrids on Craigslist.
Is that a long and unnecessary metaphor/analogy/something or other? Absolutely.
Nearly all of the newer search engines that I'm always going on about start by deconvoluting the MS/MS spectra prior to searching.
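The arithmetic behind "make every fragment +1" is simple once you know the charge of the fragment. Here is a minimal Python sketch of just that conversion. Real deconvolution tools also have to work out the charge from the isotope spacing and merge the isotope peaks, which this skips completely, and I'm not claiming this is how any particular search engine or Proteome Discoverer node does it.

```python
PROTON_MASS = 1.007276466  # Da, mass of a proton


def to_singly_charged(mz: float, z: int) -> float:
    """Convert an observed m/z at charge z to the equivalent +1 m/z.

    Neutral mass M = z * (mz - proton), so [M+H]+ = M + proton
    = z * mz - (z - 1) * proton.
    """
    if z < 1:
        raise ValueError("charge must be a positive integer")
    return mz * z - (z - 1) * PROTON_MASS


# A +1 fragment is unchanged; a +2 fragment lands at roughly double its m/z
print(to_singly_charged(500.0, 1))
print(round(to_singly_charged(500.5, 2), 4))
```

The point for the search engine is that a +2 fragment sitting at half its expected +1 m/z either gets matched by luck or not at all, unless something moves it first.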
As a more controlled experiment (n=1! I'm winning science today) -- let's take the same file and run it through Proteome Discoverer with and without first deconvoluting the MS/MS spectra so all the fragments are +1 (this was in PD 2.1, due to the fact I'm using an older PC today thanks to some weird malware issues that I think are Zoom related)
It's safe to assume in a tryptic digest that you've always got your single basic residue at the terminus. There's one. In a +2 you've got one "mobile proton" so probably +1 fragments make sense, but in a +3?
I'm too bored with this post to dig up some MS/MS spectra. (I went down a rabbit hole reading about Trans Am specs to make sure I was right. My Craigslist hybrid is faster 0-60 than a 1985 Trans Am when the two were brand new. The Trans Am looks way cooler, though.)
CV -75 file number 1. No deconvolution/SeQuest+ Percolator
Same file. Changed nothing except added deconvolution of the MS/MS spectra
FAIMS Pro results may not be exactly what everyone wants. If you're looking for the highest coverage, maybe you want to take it off and run without it. If what you want is the highest number of protein IDs because you are willing to sacrifice some coverage of the higher abundance ones to see some of the lower abundance ones....
----particularly if you're willing to tweak your workflow toward this newish kind of data!
Monday, April 13, 2020
To the amazing people who have signed up for the largest proteomic informatic challenge in the history of the universe -- THANK YOU!
Also -- just a reminder that we were targeting this week for data submissions.
However -- if you've found your life a little bit offset by some virus thing, we think it's fair to move back the due date.
We'll have a portal or something set up soon(?) to start submitting data this week, but we'll continue to accept submissions for the next 2 weeks.
If you're just now hearing about the biggest proteomics informatics challenge in history and want to join in -- there's no better time than right now.
You can find the official site here.
I have something permanently stuck to the front page of this blog over there --> somewhere.
And I have an informal description of what we're doing and why this matters (beyond showing off how good your program is or what a wizard you are at processing data) here!
Sunday, April 12, 2020
2) FAIMS Pro
3) Exploris 480
Even conservatively on the 20-ish minutes DDA 500ng HeLa runs I'm getting over 3,800 human proteins on the best runs.
Worth noting, on the DDA runs single FAIMS compensation voltages (CV) are utilized. The authors are very clear that the CV voltages of 70/75 that appear to max out the number of identifications as seen here may be very specific to the instrument being at absolute peak performance (i.e., clean and very lightly used).
I haven't processed the DIA data yet, but it appears to produce better results in their hands.
Huge question here that should be addressed based on TIC alone between some runs I have from EasyNLC vs the signal that I'm seeing here.....do you just naturally get more signal with the EvoSep due to the fact that you're loading directly to your separation column, rather than to a trap as is typical in NanoLC (since you need that column to live a little longer?)
If anyone has that kind of data, I'd love to see it.
Oh yeah! Here is the paper.