Monday, November 30, 2020

NRPro -- A powerful new approach for antibiotic drug discovery!


Even if you aren't interested in the fact that the pipeline for new antibiotics pretty much ran dry decades ago, you should check out this new paper at ACS.

You can eventually drive a nail into a wall with the handle of a screwdriver. Similarly, you can eventually get a search engine designed for tryptic peptides to help you find some oddly conformed endogenous peptides. However, if you want to build a house by nailing peptidic natural products together you probably want to try a different tool and come up with your analogies after coffee. 

Most metabolomic tools are terrible for peptidic natural products. These molecules are too complex and too large, and they often carry multiple charges. Likewise, your peptide tools are looking for a minimum of 7 amino acids or something, and they sure aren't prepped for the mass changes when these little things cyclize! These new tools bridge that gap! 
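To make the "mass changes when these little things cyclize" concrete, here is a minimal sketch. It assumes the simplest case, a head-to-tail cyclization that loses exactly one water relative to the linear peptide, and only standard residues. Real natural products are full of weird building blocks, so treat the residue table and function names as illustrative, not as anything from the paper.

```python
# Monoisotopic residue masses (Da) for a handful of standard amino acids.
# This table is deliberately incomplete; extend it for your own sequences.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "F": 147.06841,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton, used for m/z


def linear_mass(seq):
    """Neutral monoisotopic mass of a linear peptide (residues + one water)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def cyclic_mass(seq):
    """Neutral mass after head-to-tail cyclization (assumed: one water lost)."""
    return linear_mass(seq) - WATER


def mz(mass, charge):
    """m/z of the protonated ion at the given charge state."""
    return (mass + charge * PROTON) / charge


seq = "GALVFK"  # made-up example sequence
print(f"linear: {linear_mass(seq):.4f} Da, cyclic: {cyclic_mass(seq):.4f} Da")
print(f"cyclic [M+2H]2+: {mz(cyclic_mass(seq), 2):.4f}")
```

A search engine that only knows the linear mass will be off by about 18.01 Da on the precursor, which is more than enough to miss the match entirely.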

Sunday, November 29, 2020

Modeling the peptide universe collisional cross sections?

This preprint was not exactly what I was looking for. I was looking for something like PROSIT for collisional cross section predictions, and this is kind of like that.

It feels more like a thought experiment, and the results are really interesting. There are clear collisional cross sectional patterns (like the crazy image above). It is an interesting read and makes me hopeful that what I was originally looking for would make sense. 

Saturday, November 28, 2020

Tutorial slides -- Proteome Discoverer Free Version with MS-Fragger Install!

Would you like to have a fully functioning version of the Proteome Discoverer environment on your PC at home (or on multiple PCs throughout your lab, which is a much more normal thing to do)? You obviously can't have the commercial nodes that the manufacturer has to pay royalties on, but you can have lots of tools including:

!!!MS-FRAGGER!!! operating in PD on any Windows PC. 

The SugarQB glycoproteomics workflow (which...I'd argue is as good as ANYTHING available for glycoproteomics for any price right now, with the caveat that you have to have your glycan mod in your database) 


Here is a link to a 20-ish slide walkthrough for setting up PD with a bunch of cool free nodes. I tried to include every relevant link, so your browser or security settings might be mad about all the extensions in the slide deck. 

This might not be completely accurate (duh), but I tried and I hope it helps. 

I put up the Version info in the image above, because maybe I'll update it going forward. 

As an aside, having PD installed at home helps me morally justify having a badass OMICSPCs system in my basement. It's only coincidental that it can run Crysis. 

Friday, November 27, 2020

The first TIMSTOF post. First week impressions!

I'm a die hard Orbitrap fan and I always will be. However, there are finally some other technologies out there that can compete in some ways with Dr. Makarov's incredible invention. I don't know if this has made the blog or not, it's been a busy year, but I've finally found a long-term position at JOHNS HOPKINS. I'm junior faculty in Namandje Bumpus's lab and we've got an amazing group of smart, fast moving young people and a bunch of crazy ideas. Point of evidence? That ceiling disrupting monster in the picture above. We did a lot of research and compared a lot of data over the last year. And a lab with all Thermo instruments filled most of a room with an instrument made by the NMR company? Sure did. That 2,000 pound 8'7" monster is the TIMSTOF Flex + MALDI 2. 

There will probably be a lot of TIMSTOF posts as I struggle through learning this thing and making it do what we want it to do (much of which it doesn't seem to want to do) because writing about it will help me think about it. Here are some very early impressions and some information I would have liked to have had going into the last week. 

First impressions. Installation: 

It's frickin' huge. 

The truck that came to deliver it was TOO BIG for our loading docks. The 5 crates that arrived (2 of which were over 5 feet tall) do not include an HPLC (which, if you get it from Bruker, would be box #6). If you have any space limitation of any kind, you need to really plan this out. Boxes need to be opened in a specific order, because some crates are just the equipment necessary to move and install the rest. 

You'll also need to be prepared to remove lots of doors -- and to probably get on your neighbor's nerves for a day or two. Recommendation: Have at least 2 people around in addition to the FSE, and prepare for a solid 12 hours the first day to get moved and powered up. 

In general, though, the installation was pretty clean once the behemoth was in place. There is, of course, a federal law against any mass spectrometer manufacturer providing you fully accurate and complete pre-installation documents. I believe the death penalty is a consideration in some states if you receive the correct NEMA plug type schematic ahead of time. Of course, we installed exactly what the instructions stated, and of course, they were wrong. This isn't a knock on Bruker. It's the law. 

First impressions: Ease of use. How do I write this without coming off as arrogant or stupid or both? This instrument isn't anywhere near as finished as hardware from other manufacturers. Ease of use was not at the top of the design list. If this is your first mass spectrometer, I think it's going to be rough. 

If you are a biologist and the mass spec is just one of 15 tools you'll use during the week and you don't have a dedicated mass spectrometrist? 

Let's start with hardware first. Swapping from your ESI to your Nano source requires tools. You don't pop two hinges and swap them out. You'll need a tiny hex wrench (1.25mm, I think), and there are multiple small parts that will be devastating to lose or break, including tiny gold o-rings and a spring. It is very very easy to break your nano column while both installing and removing the nanospray source. It is reasonably easy to install your nano emitter incorrectly, which you'll only find out after you've fully installed your nano source, requiring you to take it all apart to reseat the seals on your source. You test that seal by blocking a filter on the completed source with a (GLOVED; there's EVIL STUFF IN THE FILTER) finger tip and monitoring the drop in vacuum pressure. 

Fortunately, you will only be swapping the sources every single day, so your chances of making mistakes will be rare. This can be minimized by putting evil fluorinated compounds into a filter so you have a spread of lock mass compounds going into your nanospray at all times. I'm putting up a little sign to remind everyone. Gloves if touching the nanoESI. Huge shoutout to Gabriela and Brett at the UC Davis core for providing this secret protocol to me before I broke every NanoLC column in the entire mid-Atlantic taking the source off every day. 

Edit 7/21/21: You don't need to use the evil compounds for your lockmasses. Bruker has suggestions for lockmass compounds for proteomics that won't kill everyone. 

First impressions: Vendor software

Impressively, the TIMSTOF is compatible in an almost plug-n-play format with every vendor HPLC. If you can find the drivers for the instrument they load up well. Without question, the EasyNLC works better on the TIMSTOF than on any Thermo instrument. They communicate digitally and you have better control over parameters. Weird. 

Interestingly, the hardest thing to find in any of the vendor software packages is a mass spectrum. Chromatograms and mobilograms make the front page of most of the software that must be open while you're running the instrument. Finding a mass spectrum requires around 10 button clicks. I think someone forgot to tell the developers what this thing is. You can, however, eventually find one, but the software "Data Analysis" will be very annoyed that you figured it out and will ask you if you want to save changes when you close the software. 

Edit 7/21/21: The <Ctrl> + Click button is your friend. Navigating around in Data Analysis isn't the most intuitive thing in the world. Fortunately, Mann lab seems to have noticed this and fixed it with this fast and intuitive free software! AlphaTims to the rescue! 

First impressions: Performance

THIS THING IS FAST. Ricky Bobby drafting off a slingshot fast. Sloth escaping after burning down a hospital fast. FAST.  120 MS/MS scans per second at 40,000 resolution FAST. 

And the sensitivity is there. You wouldn't think that it would be, but it is. TOFS aren't sensitive. That's most of the problem with them, right? Holy cow. When you extract 500,000 MS/MS spectra out of a 60 min run with around 50ng of peptides on a column, it's easy to think "why was I cursing so loud about how hard it was to get this thing to show me a single MS/MS spectrum?" 

There are some really cool features hidden in the instrument methods. Best I can tell, mostly undocumented. There is a cool preprint from those Max Planck people where the authors state "functions of the instrument are largely unexplored". I think I have a 1/4 finished post about that somewhere. 

First impressions: Compatibility with tools

This is getting better all the time, but if you find the fact that there are over 1,000 proteomics software tools in the world a little daunting, this might be an instrument for you! Very very few of the tools are compatible with the data from the device. Frustratingly, some of them will look like they're processing the data, hang out a couple days heating your office, and output gibberish. 

Things that don't work:
ProteoWizard (ouch. yeah. that one hurt. I figured I could just convert it to anything I wanted and reopen all my tools) 
Edit: 7/21/2021: ProteoWizard does work! You just need to update your versions. There is even a PasefMGF drop down for conversions. 
Morpheus - weirded out by this one. Even a Bruker generated mzML crashes for me, but it might just be me. 
Actually, it could just be a poor mzML formatting thing, so I won't list any others till I have a chance to put more time in. 

Things that do work: 
FragPipe/MS-Fragger (in some functions; no TMT, etc.,) 
MaxQuant (also appears to work, but not for TMT)
Skyline (very recent addition/update). Is it ever slow, though! 
Several commercial software packages; Bolt/Pinnacle, PEAKS, SpectroNaut, Byonic, what is Robin Park's software called again? IP2? It might be called something else now, and it can work in real time!  I meant to check that out, but busy busy busy.  

Again, this is just a first week with the instrument and I'm sure my impressions will change and evolve. There are a lot of pluses here. This monster of an instrument is enormously capable. It's fast, sensitive, and there is loads of opportunity to build new and exciting experiments, but it doesn't feel like a fully finished product. I think that early adopters are going to largely feel like beta testers. And if that's what you're down for, hell yeah! Me too!  

But if spending lots of time doing method development and maybe fabricating useful parts or testing less dangerous chemicals for calibrating your instrument, or writing patches to make your favorite tools work with your new million dollar instrument isn't what you consider a good use of the minutes you have left on this planet, this might not be what you're looking for yet. It'll be interesting to see what direction this new tech goes in, though! 

Thursday, November 26, 2020

Process TMTPro(16-plex) reagents in MaxQuant!


Huge thank you to Ed Emmott for posting these resources so I could get this workflow going this morning. 

Even if you just downloaded MaxQuant yesterday (1.6.17; i.e. "v.Mannly Wabbit"; [if you didn't know the newest version of R is called "Bunnie-Wunnies Freak Out", I think we could keep this trend going!]) you'll see that you don't have an option for TMTPro yet. That's no problem, but there are a couple of steps! 

You need: 

1) MaxQuant

2) To know where your MaxQuant folders are. 

3) This DropBox link from the Emmott Lab; you'll need both files (link here!)

4) This text file I made. Even though the title of the file is "probably wrong" it seems to work! Woo! 

Step 1: 

1) Make sure MaxQuant is closed. Check the version number you are currently using. 

2) Find the folder that MaxQuant is actually operating out of. (You probably have a shortcut on your desktop and the actual folders are in your Downloads folder somewhere)


This is probably where you want to be. You want to now swap out the modifications.XML that is in your folder for the one from the Emmott lab dropbox. If you're a paranoid weirdo, just move the original XML file, but I've heard paranoia is rare in mass spectrometrists, so you'll probably just copy right over that old one while shouting "YOLO" or "Parkour" or something equally hip.
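For the paranoid weirdos, the swap-with-a-backup step can be scripted. This is just a sketch: the function name and the backup filename are made up, and the folder you pass in is whatever folder on your machine actually holds modifications.xml (often a conf folder inside the MaxQuant bin folder, but check your own install rather than trusting me).

```python
import shutil
from pathlib import Path


def swap_modifications(conf_dir, new_xml):
    """Back up the existing modifications.xml, then copy the new one over it.

    conf_dir: folder that contains MaxQuant's modifications.xml (you supply it).
    new_xml:  path to the replacement file, e.g. the one from the Emmott lab.
    Returns the path of the backup copy.
    """
    conf_dir = Path(conf_dir)
    target = conf_dir / "modifications.xml"
    backup = conf_dir / "modifications.original.xml"  # hypothetical backup name

    if not target.exists():
        raise FileNotFoundError(f"No modifications.xml found in {conf_dir}")

    # Only back up once, so a second run never clobbers the true original.
    if not backup.exists():
        shutil.copy2(target, backup)

    shutil.copy2(new_xml, target)
    return backup


# Example call (paths are placeholders, not real locations):
# swap_modifications("C:/path/to/MaxQuant/bin/conf", "C:/Downloads/modifications.xml")
```

Run it with MaxQuant closed, same as the manual version. To undo, copy the backup file back over modifications.xml.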

Now start MaxQuant. If you go to your Configuration tab, you should see at the bottom that the TMTPro reagent has been added! 

Now that you've got the reagents you can build your new Quan table or import the "TMT16_Ben_Probably wrong" Text file. 

You can also use the second folder in the Emmott Lab dropbox to file --> load parameters. 

And that's it. If you have the correction factors from the kit you purchased and you're a person who uses those you should punch those in. Otherwise you should be set. Looks like it works to me! 

Tuesday, November 24, 2020

Optimal Fragmentation of N- and O- Glycopeptides!

If you're like me and were just kinda hoping you could go the rest of your career without actually knowing the difference between an N-glycan and an O-glycan, I've got some very bad news for you.

You're going to have to go to Wikipedia and get them straightened out in your head before you set up that instrument, particularly if you're thinking of using fancy fragmentation methods to tell them apart.

The good news is that these authors pretty much fine tuned this all out for you in this great paper. 
The one thing that I would mention is that there are differences from one instrument to the next in terms of the fragmentation energies required. (That's why things like PROCAL exist; see the blog post for the paper, and the blog post with exact masses and a link to JPT, who sells it now.)

The instrument specific fragmentation energies are somewhat easy to forget about when you're running peptides. They handily fragment well right at the peptide bond, but these authors show what appears to be a huge difference in N-glycopeptide identification with an HCD change from 30 to 35 (!yikes!). You might want to verify that your instrument(s) are really doing what they say they're doing if you're thinking about intact glycopeptide work! 

If you have multiple instruments, it's always good to know what settings match up. It would suck to optimize on your Fusion 1 and then move that to another instrument and waste all those glycopeptide runs! 

Monday, November 23, 2020

PASS-DIA -- Ultradeep (Discovery?) DIA Experiments!

One of the limitations of DIA has been "you have to know what is there first". Sensitive discovery of new things? That's tough if you've got 40 bazillion ions all coisolated at once! 

But what if you dropped your DIA windows to minuscule levels -- as small as what you do with DDA, and you just scanned across and fragmented all the things?  Don't pass the turkey on Thursday (because turkey is gross, but not as gross as the COVID your dumb cousin with the red hat is going to give you) PASS-DIA! 

One of the things you don't have in your DIA data is a way to link your precursor up to your fragment ions. You'll need this awesome thing hosted at PNNL. It's called mPE-MMR. (Pronounced "muppet murder")

Next you need to run the sample multiple times. It takes a looooong time to acquire all the 2Da windows across a mass range. For PASS-DIA, you make multiple passes. About 150 Da is examined per experiment, and then you reinject for the next pass, moving your scan window to the next 150 Da. That way you cover absolutely everything in 2Da windows. 
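The bookkeeping is easy to sketch. The snippet below just carves a mass range into passes of about 150 Da, each tiled with 2 Da isolation windows, as described above. The 400 to 1000 m/z range and the function name are my own assumptions for illustration; the authors' actual ranges and any window overlaps may differ.

```python
def pass_dia_windows(mz_start, mz_end, pass_width=150.0, window_width=2.0):
    """Return a list of passes; each pass is a list of (low, high) m/z windows.

    One pass corresponds to one injection covering `pass_width` Da of
    precursor range in back-to-back windows of `window_width` Da.
    """
    passes = []
    pass_low = mz_start
    while pass_low < mz_end:
        pass_high = min(pass_low + pass_width, mz_end)
        windows = []
        low = pass_low
        while low < pass_high:
            high = min(low + window_width, pass_high)
            windows.append((low, high))
            low = high
        passes.append(windows)
        pass_low = pass_high
    return passes


# Assumed precursor range, purely for illustration.
scheme = pass_dia_windows(400.0, 1000.0)
for number, windows in enumerate(scheme, start=1):
    print(f"pass {number}: {len(windows)} windows, "
          f"{windows[0][0]:.0f}-{windows[-1][1]:.0f} m/z")
```

With those assumed numbers you end up with 4 injections of 75 windows each, which is why this takes a looooong time.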

The authors show application in a broad range of tissue types and experiments, including glyco and phospho- peptide experiments. 

Monday, November 16, 2020

Two-thirds of proteins of the host and parasite are modified!


Experimental setup?
Red blood cells
Red blood cells + malaria (Plasmodium falciparum; the ultra-deadly on the first infection type)
Enrich for a ton of different PTMs, including: 

Phosphorylation, acetylation, crotonylation, 2- hydroxyisobutyrylation, N-glycosylation, and ubiquitination!!
Q Exactive LCMS

Stunning downstream analysis.

INCORRECT LINK IN THE PAPER. Don't go to the WolfPSORT link in the paper! 

This is the correct link for it: (this alone is really cool, you should check it out, it's a subcellular localization predictor)

What did they get? 

Over 2/3 of the proteins are modified! Which explains a lot of things! 

Sunday, November 15, 2020

The HUPO High-Stringency Inventory -- An editorial update!

This is a quick open one/two page editorial that makes a really nice read on where we are on the Human Proteome.

I almost didn't link to it after seeing Donald Rumsfeld quoted in it, but after careful deliberation, assumed that the author considered quoting Lloyd Christmas and figured a real life person of similar intelligence would be better received by ACS. 

Saturday, November 14, 2020

Argonaut: A Webportal for MultiOmics Collaboration!

Coon Lab has been busy this year! Item 1: A reeeeeaaaaaly interesting patent application.

What's this do? Exactly what the title says! Legit multiOmics integration stuff through a web based platform? I'm only messing around with the example data, but check this out. 

What if you had lipidomics and proteomics and metabolomics on a system? You can link it all together! (You should be able to click on the image to expand). 

And within these separate experiments you can do some really cool comparisons like graphically setting up correlations between the observations and the various conditions. 

You can directly go to your outliers (or your datapoints of extreme interest) by rapidly and creepily fastily kicking out the reports of exactly what you're looking at or just direct info on that datapoint, what that protein or lipid (gross) is, as well as the evidence that supports it. (In this case you can see how many bon-bons will fit in a Ferrari in scientific notation). 

Obviously, this is their example data on their hosted site, but if the IPSA is any indication, I'd guess that this thing works just as well as described. Some people in Madison have some mad programming skills. 100% recommend you check this out! 

Oh, and surprisingly, this exists as an entity independent of the Coon Lab collaboration on COVID severity symptoms. (Preprint link in my raving about it here.) Direct link to the resource here! I assumed that this was going to be the resource announcement for the system that was used to build that amazing tool, but it does appear to be distinct.

Like I said, busy! 

Thursday, November 12, 2020

RAW mass spec data is too pretty for you? Look at it in R!

I'm not joking. Images like the one above could BE YOURS! 

Impress your friends with a selection of: UNIVERSALLY HIDEOUS R FONTS! 

Like these? 

Yes! Exactly like these! 

Make sure your dog can't critique your manuscript figures with: 16 COLOR GRAPHICS! (Available in other R packages)

Make sure no one can see that monoisotopic mass is off just a little by: OVERLAPPING LABELS FROM YOUR SODIUM ADDUCTS! 

I'm just being a jerk, it's clear from anyone reading anything in science how critical R is to science. It's used so universally now that it's weird to see downstream analysis without it. 

And you know what we've never really had? A way to go straight from RAW files to stats, and that's what this now allows!  Downstream to stats? We've got amazing tools like MSStats and SProCoP and loads of cool tools from people like Gatto and Wilmarth that have been out there for years for us to use.  But what if you've ever thought "wow, I'd really like to extract any spectra that have a delta of X"?  OR (better question for the power built innately into R) "how often does this delta occur in my RAW data?" 

Get your RAW data into R and all of a sudden that becomes possible! 
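Here is roughly what the "how often does this delta occur" question looks like once the spectra are out of the RAW file. To be clear about the assumptions: this is Python rather than R, it is not the package from the paper, and it starts from spectra that have already been extracted into plain lists of m/z values. The function names and the 0.01 Da tolerance are mine.

```python
from bisect import bisect_left


def has_delta(mzs, delta, tol=0.01):
    """True if any two peaks in one spectrum are `delta` apart (within `tol`).

    Assumes delta is positive and larger than tol, so a peak cannot
    match itself.
    """
    mzs = sorted(mzs)
    for mz in mzs:
        target = mz + delta
        # First peak that is not below the lower edge of the tolerance window.
        i = bisect_left(mzs, target - tol)
        if i < len(mzs) and mzs[i] <= target + tol:
            return True
    return False


def count_spectra_with_delta(spectra, delta, tol=0.01):
    """Number of spectra containing at least one peak pair with that delta."""
    return sum(1 for mzs in spectra if has_delta(mzs, delta, tol))


# Toy spectra (m/z values only); the first has a pair 203.0794 apart,
# which is the HexNAc residue mass.
spectra = [
    [204.0867, 407.1661, 500.0],
    [100.0, 150.0, 300.0],
]
print(count_spectra_with_delta(spectra, 203.0794))
```

The sort plus binary search keeps this quick even on big spectra. The slow part in real life is getting the peaks out of the RAW file, which is exactly what this paper's tool handles.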

THEN you could totally make things like this! 

Wednesday, November 11, 2020

INFERNYS! First (second) impression!



1) This may look like a violation of the long standing blog rule "if you can't blog anything nice, get a less weird hobby, you extremely strange person who types too fast".

2) I literally may have no idea what I'm doing. I'm just an aging ape in cool old shoes who types really fast. 

3) I could 100% be doing this wrong

4) I think XCORR, and SeQuest in general, is kinda dumb. 

5) The dataset that I'm currently interested in and using this to test might be suboptimal for comparisons (more below) 

Subtitle for this post: 

INFERNYS: A node made specifically for making some old guys in proteomics mad?  (For real, y'all are gonna get some phone calls.)

Oh yeah. INFERNYS is a new node in Proteome Discoverer(TM/R) that came out in the new release. It's the first addition of deep learning to this software package that I probably literally use every day of my life. I rambled about some of the new stuff a few days ago here.

Let's start with the great stuff first!!! 

1) I've run several comparisons of the exact same workflow with and without INFERNYS. We're currently working on deepening our understanding of the human liver (it's super complex and surprisingly understudied) so that's my focus right now.

2) In EVERY comparison when I've added INFERNYS so far, I've gotten more of everything. I've gotten more Peptide Spectral Matches (PSMs) and that has translated to more peptide groups, and to a higher percentage coverage of the proteins in the liver.  This has also translated to an increase, but a much more modest one, in the unique protein and protein group identifications. More PSMs is good! 

The stuff I think will make some people mad. 

In my hands (again, I could be legit dumb, yo) the PSMs have had terrible XCORRs. 

What's an XCORR? It is a historically important metric for spectral match quality. Dr. Will Fondrie is a smart guy who does proteomic informatics stuff, and he did a remarkable job of capturing the idea on his blog here

Let's talk about my dataset first. I'm working off the Adult Human Liver samples from Chan-Hyun Na et al., 2014 (Ramble here)

The ones I like best are 24 High pH reverse phase fractions run out at around 100 minutes on an Orbitrap Elite in "high/high" mode with HCD. (Given the speed of the Orbitrap Elite and today's technology, imagine this is a QE HF with maybe 1/3 the sensitivity and a good bit more overhead between scans, and you use around 3 times more electricity per unit time: 3 220V 3 phase lines for the instrument! P.S. I still love those huge old monsters.) 

Don't quote me, but I think the MS1 scans are 120,000 resolution and the MS/MS are 30,000 resolution. It's really really nice data, but at this scan speed, there just aren't all that many spectra compared to today's stuff. There are only 388,000 MS/MS scans. Given the fact that most of today's human stuff I download that is fractionated is in the millions of MS/MS scans, this is why I'm concerned that it might be this dataset.

As you can see in the histograms above -- You get an increase in PSMs, but the most obvious change at first glance is the increase in PSMs with an XCorr <1. You also get a decent increase in PSMs with an XCorr <0.5. 

What's a PSM with an XCorr of <0.5 look like? It's probably one or two matching fragment ions. 
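If you want a feel for why one or two matching fragments scores so low, here is a stripped-down sketch of the cross-correlation idea: the dot product between the binned observed and theoretical spectra at zero offset, minus the average dot product when you slide one past the other. This is NOT SEQUEST's actual implementation (no intensity preprocessing, no scaling, and skipping offset 0 in the background is my choice), so the numbers are not on the same scale as the XCorr values in the histograms. All names, bin widths, and toy peaks are my assumptions.

```python
def bin_spectrum(peaks, bin_width=1.0005, max_mz=2000.0):
    """Put (m/z, intensity) peaks into unit-ish bins, keeping the max per bin."""
    n_bins = int(max_mz / bin_width) + 1
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:
            binned[index] = max(binned[index], intensity)
    return binned


def dot_at_offset(observed, theoretical, offset):
    """Dot product with the theoretical spectrum shifted by `offset` bins."""
    total = 0.0
    for i, value in enumerate(theoretical):
        if value:
            j = i + offset
            if 0 <= j < len(observed):
                total += observed[j] * value
    return total


def simple_xcorr(observed_peaks, theoretical_mzs, max_offset=75):
    """Zero-offset match minus the mean match over offsets -75..+75 (no 0)."""
    observed = bin_spectrum(observed_peaks)
    theoretical = bin_spectrum([(mz, 1.0) for mz in theoretical_mzs])
    at_zero = dot_at_offset(observed, theoretical, 0)
    background = sum(
        dot_at_offset(observed, theoretical, off)
        for off in range(-max_offset, max_offset + 1) if off != 0
    ) / (2 * max_offset)
    return at_zero - background


# Toy data: four observed peaks, all intensity 1.0.
observed = [(175.1, 1.0), (288.2, 1.0), (401.3, 1.0), (600.0, 1.0)]
three_matches = [175.1, 288.2, 401.3, 514.4]
one_match = [175.1, 300.0, 450.0, 514.4]
print(simple_xcorr(observed, three_matches), simple_xcorr(observed, one_match))
```

The point is just the shape of the math: every fragment that lines up adds to the score, and random near-misses get subtracted as background. A peptide that lines up on only a couple of fragments doesn't have much to stand on.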

This is BIG DATA. It's entirely unfair to pick a couple ions to take a look at and poke fun at them. There isn't a single dataset out there where you can't find a couple peptides like this that snuck through. 

And....we should 100% consider this. There is a lot of routine data analysis out there in the world on really important things, like "how much pesticide is on that celery", that is determined with one fragment ion. Here you've got a high resolution mass and 2 fragment ions? Heck, there's a good chance that the MS/MS spectrum above is a good match! Imagine that you build a PRM for this peptide and you picked those two high mass fragment ions? You'd quan off them and move on. (Hopefully they'd be more than 2,000 counts, but I hope what I'm implying is clear). 

However, if you work with someone who is going to take a look at the XCorr and think you're a dumbass for letting some fancy semi-supervised thing (Percolator) and some deep learning mumbojumbo (Inferys) make decisions for you over a tried-and-true statistical model defined in the good ol' days when mass was something we crudely and slowly estimated by painfully ejecting ions synchronously out of a little box using an estimated stability matrix defined by fist sized transistors, I hope that I've given you a fair warning.  😇

What I really mean is that ALL these things are shortcuts. They're necessary because I couldn't look at every PSM in this dataset in the next year unless that's all I did (and I'd get really distracted after the 4th one). Keep that in mind. Never stake anything important on it until you've checked it out manually! 

And -- this is the best little guide for that ever (this blog post has links to the paper)!!

Tuesday, November 10, 2020

Neuropeptidomics of the....umm...cockroach....?

 I honestly can't tell if I should have had breakfast or if I'm glad I didn't....I legitimately don't know. However, now that I have an animated understanding of the central nervous system of the American Cockroach, I'm just pumped to be able to share that image with you. It's from this interesting new study! 

How'd they do neuropeptides on this amazingly interesting organism? 

They dissected it (them?) and took the organs and froze them in liquid nitrogen, followed by grinding them in a mortar and pestle. Next, they separated the peptides from the proteins using a 10kDa molecular weight cutoff filter and kept what didn't get stuck in the filter. Next they did high pH reverse phase fractionation (number of fractions, etc., isn't clear from the methods section, but maybe they discuss it later in the paper, I was afraid there was going to be biological interpretation and I didn't want to accidentally find out what an "MAG" does in one of these things.) 

The fractions were analyzed using data dependent acquisition (they call it "IDA") on a SCIEX TripleTOF (or PentaTOF, I forget now. HexaTOF?) system and the peptides were analyzed with PEAKS. 

Sunday, November 8, 2020

Stressed out about finding NanoLC columns in the fall of 2020?!? Here are some vendors!


Need something new to be concerned about? My Inbox is a random-ish sampling of what is going on in the world, and why is everyone worried about NanoLC columns?!?!

First of all, it turns out that New Objective has been making a lot of things and it seems like a lot of companies just buy their stuff, rebadge it and crank the price up. 

New Objective got hit hard by COVID-19. However, the best I can tell, they are still operational, but they are at a reduced capacity. Now -- it does seem like someone hacked their front page. Don't panic. Somewhere you've got their phone number. Give them a ring, but don't get too stressed unless you need something weird. GOOD LUCK GETTING OPERATIONAL (and write me if I can help at all, for real, I owe NO bigtime for helping me over the years. Unfortunately, I'm not worth much more than a hype man right now, we're doing big stuff in Baltimore! but if you're just looking for some Magnitude, I got you). 

There are other options out there. 

One, I guess if you've got full field vision because you didn't jab a screwdriver through your eyeball this summer, you could pack and pull your own columns. 

The amazing UWPR site has you covered for instructions. 

CHECK THIS OUT. (Direct link to the PDF)

If you can't see well enough to do this, please keep in mind that there are other vendors out there. You could pay a 2.6x markup at Fitcher Scientific, but check this out. 

Ever heard of Me neither! But their prices are on point!

John Nouveau (sp?) appears to have called it quits at Harvard....

( the wrong gif...but it's too funny to delete now....)

and is more than just background ion reduction! They've got loads of NanoLC and CapLC solutions (and cool column bombs if both your eyes work.)

In this hunt, I was hoping to run into a group in SoCal that I'd gotten super long (100cm-200cm) nanoLC columns from in the past for very little, but I can't seem to find them anywhere. If you go back through the history of this blog, you'll see that I was a big fan of ProteomicsPlus here in the DC area, but they closed up shop a while back.

We'll get through this! If you know of some other vendors, Tweet me (or something) and I'll add it to the list! 

Friday, November 6, 2020

LFQmbrFDR! (FDR for Match Between Runs!)

That's the level of happy that this new preprint should make you! Keep your eye out for the black pug and then try saying

LFQ-MBR-FDR 3 times! I think I broke my chair! 

This is the preprint I'm about to ramble about! 

Why am I pug puppy chair breaking level excited? Well, let's take this fantastic example of how we typically do Match Between Runs (MBR). 

This is from this great paper by Schonke et al. (the o in the author's name isn't really an o; I believe it is supposed to be an astonished emoji 😲). 

If you don't use match between runs, you only identify the peptides in your individual runs that you obtained MS/MS fragments for that you can identify. This is obviously a subsection of your total peptides present because of a lot of things, like: your instrument is too slow to fragment all of them, or in sample Ob2 you have some other peptide coisolating and lowering your peptide score below your cutoff, etc. etc., 

MBR allows you to extrapolate from run to run what that peptide is when you clearly see an MS1 signal for it but just didn't identify it.

In my always and forever humble opinion, the way Sch😲nke et al., did it is the best way, because for the peptides identified without matching we have a second level of confidence in the peptide ID. We have retention time and MS1 and probably isotopes and we've got a Peptide Spectral Match. For the matched ones we just have the first 2. 

Everyone wants more data, right, but you should denote somehow where you got it from because there is no confidence metric for your match between runs. UNTIL NOW! 
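A bare-bones sketch of what MBR is doing, and of the "denote where you got it from" idea: take the peptides identified by MS/MS in one run, look for unidentified MS1 features in another run that agree in m/z and retention time, and tag anything transferred as MBR evidence. This is not the preprint's method and there is no FDR here at all. The tolerances, field names, and the assumption that retention times are already aligned between runs are all mine.

```python
def match_between_runs(identified, features, ppm_tol=10.0, rt_tol=0.5):
    """Transfer peptide IDs onto unidentified MS1 features from another run.

    identified: dicts with 'peptide', 'mz', 'rt' (IDs backed by a PSM).
    features:   dicts with 'mz', 'rt', 'intensity' (no MS/MS identification).
    rt_tol is in minutes and assumes the runs are already RT-aligned.
    """
    transferred = []
    for feature in features:
        best = None  # (identification, rt difference)
        for ident in identified:
            ppm_error = abs(feature["mz"] - ident["mz"]) / ident["mz"] * 1e6
            rt_error = abs(feature["rt"] - ident["rt"])
            if ppm_error <= ppm_tol and rt_error <= rt_tol:
                if best is None or rt_error < best[1]:
                    best = (ident, rt_error)
        if best is not None:
            transferred.append({
                "peptide": best[0]["peptide"],
                "mz": feature["mz"],
                "rt": feature["rt"],
                "intensity": feature["intensity"],
                "evidence": "MBR",  # flag it: no PSM backs this one up
            })
    return transferred


run_a_ids = [{"peptide": "PEPTIDEK", "mz": 500.2500, "rt": 30.0}]
run_b_features = [
    {"mz": 500.2510, "rt": 30.2, "intensity": 1.0e6},  # about 2 ppm, 0.2 min off
    {"mz": 600.0000, "rt": 30.0, "intensity": 1.0e5},  # nothing to match
]
for hit in match_between_runs(run_a_ids, run_b_features):
    print(hit["peptide"], hit["evidence"], hit["intensity"])
```

Notice that nothing in there says how likely the transfer is to be wrong. That missing number is exactly the gap this preprint is going after.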

I do think it is worth noting that some guys at Harvard did some really serious work last year to try and estimate how often MBR makes errors. You can read what some weird guy wrote about that here. 

Back to the new preprint:

This group shows off a way of estimating the quality of the MBR data and its application in their currently existing software! I'm going to check today to see if it is already live there, but I think it is. 

They pressure test it against a bunch of OrbitalTrap and TIMMYTOFFY data. 

Where it should really really be applied? Datasets without a lot of signal. Single cell, in particular! 

Thursday, November 5, 2020

DIA Software Deathmatch!


So many ways to process DIA data! Which one is best? This new preprint tries to answer the question that we've also been working on recently at Hopkins! 

While I strongly recommend you check this out (and I will put some time into reading this) here are my ramblings about what we've tried here. 

DIA-Umpire has been integrated into FragPipe. If you've got an older instance of DIA-Umpire, I strongly suggest you try the updated FragPipe. It's a lot easier to use this way.

We're demo'ing SpectroNaut now and I really like it. At $6k/year for an academic license (my understanding is that it costs more for cores) it ain't cheap, but it's by far the most intuitive to set up processing. The output is really pretty with all these QC graphs and things up front, but finding something like "these are my replicates and these are their relative ratios" requires more digging than you'd guess.

DIA-NN is crazy powerful, but the data comes out in such an odd and sideways way that your job isn't done when the software is. There is an R package for working with the data, though. DIA-NN will also build your libraries directly from FASTA like DIA-Umpire/FragPipe and SpectroNaut. 

Skyline is, of course, Skyline. If you know how to use it, it's probably the best thing out there. If you don't know how to use it, it seems like a confusing maze of "where is anything". As it adds power and hides more features in places that might make sense to people who have been using it for the last 10 years, it is becoming my least favorite software to show to students unless we can get them off to a workshop immediately. As with MaxQuant, I don't know how this is avoidable, so this isn't really a criticism, but I think we're all accepting that you can't really master these software packages unless you fly to a workshop. 

I haven't tried OpenSwath in years and I'm sure it's come a long way, so that should be on the list for me to give a whirl. At a shortish stop in my career recently we had ScaffoldDIA and I think that if you like Scaffold, you probably want to get a demo of that one. 

Pinnacle didn't make this list. I'm a big fan, but I'm extremely biased, because my friend and long time coworker wrote it. It ain't free either, but since you buy only the Pinnacle modules you want (or pay an annual fee for them) it can be an affordable solution for specific applications. This year I needed small molecule targeted, metabolomics, proteomics, and large molecule and I think that set someone back $6k for my annual licenses. I'll probably only go for the proteomics module next year. Out of what I've tried, Pinnacle has the most intuitive visualizations, particularly if you've got a big ass monitor. (Is anyone else using TVs instead of monitors? $200 for a 27 inch monitor? Or $150 for a 47 inch TV? I'll never buy a monitor again!) You get thumbnails for your DIA transitions for each peptide on your main screen so you know up front if you've got a lousy hit for something you're interested in. You can also get a permanent license, but it's made sense for me to shuffle my licenses due to different projects. 

There are more DIA options out there, and more coming all the time. I'm glad these authors did a thorough pressure test of multiple datasets and you should 100% check it out! 

Monday, November 2, 2020

"Single cell" HLA peptidomics by cloning a population?

I'm going to run this by some biologists and MDs today to see if they think this is biologically relevant, or if this is cheating to some extent. 

"Single cell" anything is huge right now, and we've got to be skeptical.  This group just published some decent CyTOF data and got away with calling it single cell somehow and scored Nature Biotech. If you aren't out there trying to find whatever the largest "single cell" in the world is (there is an algae that is over a foot long!) so you can do some damage to this newly emerging field, I'm not sure what side you're on, probably not your own! 

However -- I hope this isn't cheating because, it is reeeeeeeeeeeeeaaaaaaaaaaaallly cool. 

HLA peptides suck. They typically don't have K/R on the ends so they often don't doubly charge (or at least don't localize charge to the termini so we have full b/y ion spreads). However, we could absolutely overcome this with high resolution MS/MS IF we have enough signal.

How to get over the signal hurdle? Get a single cell from the tumor and grow a ton of them. Now....I'm going to consult people who know biology, but I'm worried that whatever the presentation is in the tumor vs whatever it is while you're growing out a clonal population on a plate might be very different things. I assume the people who reviewed this considered that, but....ummm....wait. Same journal as the CyTOF IHC "single cell" stuff. So...grain of salt all around. 

Disclaimer: Maybe I just don't understand the CyTOF paper. I'm obviously no expert, but I was very disappointed because there are groups out there doing what I'd consider real single cell metabolism/ metabolomics.


Check out this recent work from the Bamba lab

They go ultra-low flow with fancy cell loading and build a targeted list for 35 metabolites and curves and calculate LOD off good ol' linear regression. 
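For anyone who hasn't done the "LOD off linear regression" thing in a while, here is a minimal sketch of one common way to do it. It assumes the usual 3.3 x (residual standard deviation) / slope convention, which may or may not be exactly what the Bamba lab used, and the calibration points are made up for illustration.

```python
from math import sqrt

def lod_from_calibration(conc, signal, k=3.3):
    """Estimate a limit of detection as k * residual SD / slope.

    k=3.3 is a common convention and an assumption here,
    not something taken from the paper.
    """
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    # Ordinary least-squares slope and intercept
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard deviation with n - 2 degrees of freedom
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(conc, signal))
    sigma = sqrt(ss_res / (n - 2))
    return k * sigma / slope

# Hypothetical calibration curve for one metabolite (arbitrary units)
conc = [1, 2, 5, 10, 20, 50]
signal = [105, 198, 510, 995, 2010, 4990]
lod = lod_from_calibration(conc, signal)
print(f"Estimated LOD: {lod:.3f} (same units as conc)")
```

The noisier your curve is relative to its slope, the higher the LOD comes out, which is the whole point.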

There is also a great new resource that just came out: 

If you had time to read a whole book you'd probably have time to finish that review you've been working on since 2016. I'm not going to read the whole thing either, but I did get it and the bookmark is at the very back. 

Weird words
Weird words (...did that say microarrays....what the actual f...?..o..urier...?...moving on!) 
Weird words

And then the good stuff! 

Sunday, November 1, 2020

What's new in Proteome Discoverer 2.5?


Proteome Discoverer 2.5 just launched this week. You can get it from the ThermoFlexeraFlexibleNet thing here. I think there was a lot about it at the fall user group thing, but I still haven't caught up on all the talks from ASMS that are on my list, the single cell proteomics thing at NorthEastern, and HUPO was last week. 

If you're going to use it please keep in mind that it will default to 1 processing and 1 consensus workflows. If you queue up 60 things on your 72 thread PC and go out for a bit, you might be mad when you get back. Install it, go to Administration, parallel workflow settings, set it to something reasonable, then close and reopen the software. 

I just downloaded it and am messing around and it looks a lot like version 2.4 with one big new focus: 


You can run what I think, from the manual, is basically PROSIT, directly in Proteome Discoverer! 

You go to the Maintain Spectrum Libraries thing that has always been there that you've justifiably never used: 

And now there is a button to "Predict a Library!"

What do you do with your fancy new library? You can use that to search things instead of dumb ol' SeQuest because now your fancy new library can go right into an improved MSPepSearch that can run multithreaded! 

I suspect you can take your library out and use it in other programs like the other Prosit libraries. Unfortunately, I queued up like 60 files without fixing my Admin settings.....I should put a note at the top! 

Interestingly, predicting a library isn't set up the way local PROSIT is. If you set up PROSIT, you need an Nvidia GPU (otherwise known as a video game card...the thing you need to put in your PC so you can run Crysis) and it runs locally there. PD runs on the CPU and doesn't hold back.... NOW the office is warmer! DIA-NN does the same thing when it runs, so that's cool, and everything. 

What would be fun would be a deathmatch! DIA-NN Library prediction vs Prosit vs PD 2.5 library prediction! Someone should do that....

That's not all, though! There is a new toy called: 

According to a press release from ASMS this thing drops the false discovery rate of peptide IDs down to 100 incorrect out of 100,000 PSMs!  (That implies we were getting more than that before? Uh oh! Just kidding. Yes, we make more mistakes than that typically. FDR 0.01 doesn't mean 1% mistakes, but it's a decent estimate.) 
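Just to put numbers on that, here is a back-of-the-envelope sketch. It treats FDR as a plain "fraction of PSMs that are wrong," which (as noted above) is only a decent estimate, and the function names are mine, not anything from the software.

```python
def implied_error_rate(incorrect, total_psms):
    """Fraction of PSMs that are incorrect."""
    return incorrect / total_psms

def expected_incorrect(total_psms, rate):
    """Rough count of incorrect PSMs if `rate` were the true error fraction."""
    return total_psms * rate

total = 100_000
claimed = implied_error_rate(100, total)      # the press release number
typical = expected_incorrect(total, 0.01)     # a standard 1% cutoff

print(f"Press release: {claimed:.2%} incorrect")
print(f"At FDR 0.01:   about {typical:.0f} incorrect out of {total:,}")
```

So the claim works out to roughly a tenth of the errors you'd loosely expect at the usual 1% cutoff.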

Worth noting from the manual: INFERYS works for HCD MS/MS spectra of IAA-alkylated cysteines. You can use whatever enzyme you want, but this version does have those limitations.

What else is new? I'm just interpreting from the manual now, my PC is all locked up making a library.

1) You can import your study factors from an Excel table! For a complex study, that could be a huge advantage. 

2) TMT Minimum % occupancy cutoff! When you are doing a TMT study, particularly one where you're using more than one Plex set, saying "unless I see every quan channel toss it" cuts out a lot of data. What if you could say "toss it if I have less than 81.24% of my quan channels?" That's helpful for sure! 

3) Enrichment charts for downstream analysis sounds great! I messed it up somehow on my first try. I'll double back. 

4) WHOA. Diagnostic fragment ions AND neutral loss assignments? Yes, I know MaxQuant has done this since the 1970s or whatever. But this is great for people who use this software!! 

Okay, I'll double back to this one. You can filter things like this if you use/misuse the SugarQB nodes on older versions of the software. 
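If the TMT occupancy cutoff in item 2 sounds abstract, here is a minimal sketch of the idea. I'm assuming "occupancy" means the percent of quan channels with a value for a given peptide; I haven't confirmed how PD 2.5 actually computes it, and the peptide names and intensities below are made up.

```python
def passes_occupancy(channels, min_percent):
    """True if at least min_percent of quan channels have a value.

    A channel counts as missing when its value is None
    (an assumption for this sketch).
    """
    observed = sum(1 for value in channels if value is not None)
    return 100.0 * observed / len(channels) >= min_percent

# Hypothetical 10-plex quan rows; None marks a missing channel
rows = {
    "PEPTIDEA": [1.0] * 10,               # 100% occupancy
    "PEPTIDEB": [1.0] * 8 + [None] * 2,   # 80% occupancy
    "PEPTIDEC": [1.0] * 9 + [None],       # 90% occupancy
}

strict = [name for name, ch in rows.items() if passes_occupancy(ch, 100)]
relaxed = [name for name, ch in rows.items() if passes_occupancy(ch, 81.24)]

print("Every channel required:", strict)
print("81.24% cutoff:", relaxed)
```

With the all-or-nothing rule only one of the three peptides survives; with the relaxed cutoff you get to keep the one that is missing a single channel.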

There are a few more things I've noticed that didn't quite make the "New in this version" section of the manual. I'll double back after I've had more time to mess around with it!