Wednesday, October 31, 2018

Advanced Precursor Determination -- Bad for TMT? Part 2.

Honestly -- even after reading the second paper -- I'm not sure I get it....

CPTAC-3 has some heavy hitters again in this newest project, and a few months back they published the results of some serious TMT 11-plex optimization.

Their results?

Don't use APD
Don't use MS3 based TMT.

I rambled about that here.

The second one I get. For global proteomics I also go with MS2-based TMT quan. I'd rather get 15 peptides per protein with more isolation interference than 9 peptides per protein with less interference. The first one -- cheeeeeeeese -- I can't wrap my head around....

A second study on this topic replicates and elaborates on these findings and is brand new here.

Okay -- honestly -- maybe I get it....and maybe it's just denial....cause I've got some TMT 10-plex data on the PC behind me that is some of the best I've ever seen and it came from this study from the Olsen lab.  

The authors report 16,700 TMT10-plex labeled phosphopeptides from offline fractionation and short, short gradients (6 hours total run time or something ridiculously short). I'm pretty sure I got around 10k when I reprocessed it myself (and I'm a picky jerk about PTMs).

And maybe it's the offline fractionation that reduces the coisolation? And maybe phosphopeptides are just simpler? Because on the HF-X -- at least at launch -- APD was always on....

Lots to think about -- later! 

Monday, October 29, 2018

Security vulnerability in Xcalibur Foundation. Download and install this patch!

This is a serious post -- though the picture above is funny.

There is a vulnerability in Foundation (the program underneath Xcalibur), and it goes way back: every version after 2.0.7.

If your PC is online and has Xcalibur or even just Foundation on it -- any of these versions, every single one of them -- your computer may be at risk. This affects both instrument PCs and PCs that just have Xcalibur on them for looking at data.

You can download the newest Foundation and Xcalibur (Foundation 3.1 SP5 and Xcalibur 4.2) -- or you can install the patch.

I only have a direct link to the patch and clicking on this will start the download.

If you're worried that I'm just making this up, call tech support or your favorite FSE and ask. They'll be going around doing these patches soon. For reference, this is Factory Communication 2018.020.

Sunday, October 28, 2018

XINA! Multiplex proteome kinetics in R

(This is the face I'm going to make if I'm asked to do proteome kinetics....)

But now there is a great new R tool -- called XINA (no relation) -- that takes loads of work out of this horrible-sounding idea! You can read about XINA in press at JPR here.

If there is another package that can do this, I don't know about it. I especially don't know of anything else that can export the data directly into StringDB and KEGG (also through R...sorry...).

You can download it directly through Bioconductor or pull the whole thing down from GitHub at this link!

Friday, October 26, 2018


(This image was floating around unacknowledged on Google Images. It is copyright Steve Graepel and originally appeared here. It's used without permission, but better that I hunted down the guy who created it, right? As always, let me know if this is a problem and I'll take it down!)

Okay -- so those degraded peptides?? THOSE ARE A SUPER BIG DEAL!  What if a big team decided to do something crazy and profile those???

BOOM. Here ya' go!

I'll be honest, I'm not 100% sure how they did this. I believe the proteasomes were purified and then the degraded peptides were knocked loose from them somehow. Then MaxQuant was used for an enzyme non-specific search against the entire proteome. Multiple rounds of digested proteomes were used for comparison to make sure they were on the right track.
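Why is an enzyme non-specific search such a big deal computationally? Here's a rough sketch, assuming a single protein, a 7-30 residue peptide length window, and the classic trypsin rule (cut after K/R, not before P, no missed cleavages). The function names are hypothetical and this is not MaxQuant's code; it just counts how many more candidates a non-specific search has to consider.

```python
import re

def tryptic_peptides(protein, min_len=7, max_len=30):
    """Classic trypsin rule: cut after K or R unless the next residue is P.
    No missed cleavages are allowed in this simplified sketch."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if min_len <= len(p) <= max_len]

def nonspecific_count(protein, min_len=7, max_len=30):
    """Number of substring positions a non-specific search must consider:
    every start position for every allowed length (duplicates not removed)."""
    n = len(protein)
    return sum(max(0, n - length + 1) for length in range(min_len, max_len + 1))

seq = "MKWVTFISLLLLFSSAYSRGVFRRDTHK"
print(tryptic_peptides(seq))    # ['WVTFISLLLLFSSAYSR']
print(nonspecific_count(seq))   # 253
```

One tryptic candidate versus 253 non-specific ones for a 28-residue stretch, and that gap scales with every protein in the FASTA.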

And -- I can't even wrap my head around all the potential here, but I'm going to try.

1) The "dark proteome" stuff -- which might have a different definition now than the one I normally put with it. I consider it all the stuff that passes MIPS (monoisotopic precursor selection, or Peptide Match) -- so it looks isotopically like a peptide and elutes off C18 when a peptide should, but we don't know what the Albert heck it is.

The proteasomes are, presumably, active ALL THE TIME. So a lot of those background peptides may have just been profiled in this paper!

2) How these differ between disease states could open up a whole new field in diagnostics! The proteasomes are tightly regulated by a series of complex processes (typically modulated by ubiquitin, as far as we can tell, right?). Some proteins are labeled for degradation just because they're old (there is an N-terminal instability thing that marks old proteins), and others are degraded as part of specific, complex, and poorly understood mechanisms.

What if we didn't need to learn the degradation patterns themselves and could just monitor the degraded peptides coming out of the system???  These authors do this here and show the potential this may have -- there are big differences in different diseases!

I'm super psyched to discuss this paper with people who understand the biology behind this and congrats to this team for ---

The first Tweet is my perception of this great paper. The second Tweet -- well -- that's pretty funny...

Thursday, October 25, 2018

Boost your crosslinked peptide IDs by fixing your monoisotopic assignment!

Virtually all proteomics data processing these days requires a proper monoisotopic assignment to make a match. It's also probably no surprise that today's instruments are tuned on perfect tryptic digests.

What if I told you that there is a huge spreadsheet showing that monoisotopic assignments of big peptides (like crosslinked peptide species) are messed up a large percentage of the time?!?

Don't worry! There's a fix and it's in press at JPR here!

The spreadsheet is in the supplemental -- and it's from a very modern instrument!
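Why would big peptides get the monoisotopic peak wrong so often? A back-of-the-envelope sketch, assuming the averagine model and counting only 13C (ignoring 2H, 15N, 17O/18O and 34S): as mass goes up, the monoisotopic peak stops being the tallest peak in the envelope, so anything that leans on the tallest peak lands on M+1 or M+2. The names are hypothetical, and this is an illustration of the problem, not the fix in the paper.

```python
import math

# Averagine model: average residue mass and average carbon count per residue
AVERAGINE_MASS = 111.1254   # Da
AVERAGINE_C = 4.9384        # carbons per averagine residue
C13_ABUNDANCE = 0.0107      # natural abundance of 13C

def isotope_fractions(mass, n_peaks=4):
    """Approximate relative abundances of M, M+1, M+2, ... using a
    Poisson model that only counts 13C (a deliberate simplification)."""
    n_carbon = mass / AVERAGINE_MASS * AVERAGINE_C
    lam = n_carbon * C13_ABUNDANCE
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_peaks)]

def mono_is_tallest(mass):
    """True if the monoisotopic peak is the most abundant isotope peak."""
    fractions = isotope_fractions(mass)
    return fractions[0] == max(fractions)

for mass in (1000, 2000, 3000, 4000):
    print(mass, [round(f, 3) for f in isotope_fractions(mass)], mono_is_tallest(mass))
```

Under these assumptions M and M+1 are equally tall at roughly 2.1 kDa (where the expected 13C count hits 1). Including the other elements' heavy isotopes pushes that crossover somewhat lower, and crosslinked peptide pairs can easily land above it.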

Wednesday, October 24, 2018

apQuant -- Powerful FDR-controlled label-free quan for everyone!

The apQuant paper is finally out!

What's this about? It's FDR-controlled (by Percolator? what? I know!!) label-free quan software that you can use in the free version of Proteome Discoverer (IMP-PD 2.1), PD 2.2, and PD 2.3.

Even better? It's fully compatible with MS2GO!! 

You can check out the paper here.

Tuesday, October 23, 2018

Cross-ID Beta is now available!

I may do several days of posts on chemical crosslinking. It's something we're doing a lot of -- with some big successes so far and some big not-quite-so successes.

Good time to be getting into it because the field is blowing up, though!

Another great new tool (one I've been hitting the "refresh" button on their website for a few weeks waiting for) is CROSS-ID from the Heck lab.

Today a new button appeared (can't swear I looked yesterday) with a BETA DOWNLOAD. You can get it here! 

Thursday, October 18, 2018



MaxQuant.Live is here. Go there. Check your compatibility. And get it, yo!

Yo, Google, it just came out today! Of course it hasn't been commonly downloaded. Nuts. I'll probably have to go to work and download it -- but that's okay -- that's where the instruments are!!

Wednesday, October 17, 2018

Kinda last minute -- but there is a free XCMSOnline metabolomics class tomorrow!

EDIT: This blog exists in a wobbly space in time and space. This half-day course is on October 17, 2018!

I know this is the proteomics blog, but if you're also dabbling in the dark side with small mass ions, you won't find anything as powerful and easy to use as XCMSOnline.

There is a half day course that starts at 8:30AM (California time -- if it was East Coast I would neither attend nor would I tell you about it. I'm going to assume no one attends in person because that is when people are supposed to be sleeping NOT sitting in meetings....) but they are livecasting it!

I've got to give a talk that I should probably...start...writing....umm....soon... but then I'll log in and try not to ask questions that are too dumb....

You can RSVP here! 

Tuesday, October 16, 2018

INSTANTCLUE!! Easy, ultrapowerful statistics for proteomics!

What a time to be in this field!! As soon as I think I've found the most powerful downstream analysis software package I've ever seen, someone shows me something with even more powers!!

First off, a big thank you to Dr. Ilian A for the link to this awesome paper (open access here!)

A lot of the scientists we get samples from are getting comfortable with statistics. If I sit down with a biologist younger than myself, it's pretty much a given that I won't have to explain what a PCA plot is, because the computers in their stats class were powerful enough to run one. My stats class in the 90s didn't have a PC element to it. And -- if I had started clustering or building a PCA plot on 20 samples on a 486 computer, I bet you it STILL wouldn't be done. (However -- it WOULD have Minesweeper built into it. The future isn't always pressing forward in every regard.)

Honestly -- most of them know stats way, way better than I do, and I'm starting to seem like the unprepared one. Reminder of this sadly titled paper....

What does all this rambling have to do with anything other than Ben's love of espresso and rapidly striking ergonomic keyboards???


Look -- if you've mastered Perseus -- good for you. You're awesome. And you probably don't need this. If you're an R superstar and it's easier for you to do everything in R (did you know some people make their slide decks in R rather than PowerPoint? (@AlexisLNorris)) -- then you probably don't need this either.

But if you need
1) A GUI that works in Windows, Macintosh, and Linux
2) With amazing flexibility in the data you can upload into it
3) With short, premade, well-scripted tutorial videos in case you get stuck
4) With every stat you ever heard of and a bunch maybe you weren't sure if you really heard or if it was someone who started to say a real word and then accidentally burped a little, threw in a muffled apology, and then finished what they were saying without inhaling. (Tell me that's not what "latent semantic analysis" sounds like.)
5) With a way to rapidly export the cool stuff you find
6) That you can get started with even if your number one goal for the day is to not read anything smart at all ---

You should download Instant Clue here!

Sunday, October 14, 2018

Get those hydrophobic membrane peptides/proteins!

Membrane proteins are hard to get to. They've got super hydrophobic regions for stuffing inside membranes, they've often got multiple glycosylation sites, and they can have annoying 3D structures that are just clumpy (best term I've got this morning). Even the most comprehensive global proteomics studies we've ever seen appear to under-represent membrane proteins. (I wrote a post on that last topic last year.)

This group essentially enriches for hydrophobic peptides by throwing in a high organic separation that results in a downstream loss of the most hydrophilic peptides. EDIT: Loss isn't the correct word. Let's go with "enrichment of hydrophobic peptides in relation to the general peptide population."

The RAW files are up at PeptideAtlas here and it's striking how much signal they get in the high organic section of their chromatogram through this process.

It's fair, I think, to mention that this approach isn't entirely new....

...but the changes are definitely novel enough to warrant checking out this iteration if you're looking at membrane proteins!

Saturday, October 13, 2018

The terrifying FDR Averaging study is live on bioRxiv!!!

....Just a little early for Halloween...!! The scariest study of the year just went up on bioRxiv here!

It seems less bad if you start with the fact that this team has a (computationally expensive) solution and I think it's already live on Crux.

Look -- we all know that all our FDR shortcut things (target decoy, Percolator, Elutator, and so on) are imperfect. And -- we know they need appropriate datasets to work right. This study starts out by pointing out what happens if the dataset that hits the FDR calculator IS NOT right. Fluctuations by as much as 20% in your peptide IDs, just by reshuffling your decoy sequences and searching the same data again???  Ummm.....

....yeah....fortunately for those of us who use...well...BASICALLY EVERY PIECE OF SOFTWARE I USE....when you make your decoy sequence, you end up using that one pretty much forever.

Let's see....when was my UniProt human decoy FASTA generated.....

Oh. The week I installed software on my new computer?

The reason this is so disturbing: the only reason I don't see this myself is that my software doesn't reshuffle my decoy FASTA. If it did, my results could be very different each time I pressed the <RUN> button. Okay -- honestly, from a reproducibility standpoint, making one decoy and sticking with it is a good thing. It keeps people from asking questions like "wait. are you running my results through a random number generator?!?" and I'm grateful I don't have to answer that question.
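For anyone who hasn't looked under the hood, here's a toy sketch of classic target-decoy counting with a seeded decoy shuffle. Assumptions: decoys are made by shuffling each target peptide while keeping its C-terminal residue in place (one common convention), scores already came out of some search engine, and `make_decoys` / `targets_at_fdr` are hypothetical names. Same seed, same decoys, same answer every run; change the seed and the decoy scores (and so the count) can move. The preprint's averaging fix is not reproduced here.

```python
import random

def make_decoys(peptides, seed):
    """Shuffle every residue except the C-terminal one; a fixed seed
    gives the same decoy set on every run."""
    rng = random.Random(seed)
    decoys = []
    for pep in peptides:
        body = list(pep[:-1])
        rng.shuffle(body)
        decoys.append("".join(body) + pep[-1])
    return decoys

def targets_at_fdr(target_scores, decoy_scores, fdr=0.01):
    """Largest number of target PSMs at any score cutoff where the simple
    decoy/target ratio (the estimated FDR) is <= fdr. Ties are ignored."""
    psms = [(s, True) for s in target_scores] + [(s, False) for s in decoy_scores]
    psms.sort(key=lambda p: p[0], reverse=True)
    n_target = n_decoy = best = 0
    for _, is_target in psms:
        if is_target:
            n_target += 1
        else:
            n_decoy += 1
        if n_target and n_decoy / n_target <= fdr:
            best = n_target
    return best

print(make_decoys(["PEPTIDEK", "SAMPLER"], seed=42))
print(targets_at_fdr([10, 9, 8, 7, 6, 5], [7.5, 4], fdr=0.01))  # 3
print(targets_at_fdr([10, 9, 8, 7, 6, 5], [7.5, 4], fdr=0.2))   # 6
```

A single decoy that happens to score 7.5 decides whether 3 or 6 targets "pass", which is the kind of sensitivity the study is poking at.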

Okay -- but -- at the end of the day I want to give people the list that is the absolute closest representation of what the proteins that I can detect in the cells they gave me are doing. And if my current FDR methods are simply masking issues with the data that can be as extreme as described here -- I think upgrading the way I generate my lists and tell true from false needs to be put at the top of my priority list.

Friday, October 12, 2018

(Re?) evaluating MS1 quan algorithms!


Check out this new paper from the Smith lab (Montana)!


1) Here is the equation for how MaxQuant extracts peaks for quan
2) Here is the equation for how OpenMS extracts peaks for quan

Here is how they perform on a relatively simple standard that someone spent 10,000 hours(???) profiling in its exhaustive entirety....

Now we have a new way to test any label free algorithm!!
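I won't try to reproduce MaxQuant's or OpenMS's actual equations here. As a generic illustration of what "extracting peaks for quan" means, this sketch pulls an extracted ion chromatogram (XIC) within a ppm window from a list of MS1 scans and integrates it with the trapezoid rule. The scan format, the function names and the 10 ppm default are all assumptions for illustration.

```python
def extract_xic(scans, target_mz, ppm=10.0):
    """scans: list of (retention_time, [(mz, intensity), ...]).
    Sums intensity within +/- ppm of target_mz in each scan."""
    tol = target_mz * ppm / 1e6
    xic = []
    for rt, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, intensity))
    return xic

def peak_area(xic):
    """Trapezoidal integration of intensity over retention time."""
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(xic, xic[1:]):
        area += (rt1 - rt0) * (i0 + i1) / 2
    return area

scans = [
    (10.0, [(500.0, 0.0)]),
    (10.5, [(500.001, 100.0), (501.0, 999.0)]),  # 501.0 is outside the window
    (11.0, [(499.999, 50.0)]),
]
print(peak_area(extract_xic(scans, 500.0)))  # 62.5
```

Real tools differ exactly in the parts skipped here (peak boundaries, isotope handling, smoothing, retention time alignment), which is why a well-characterized standard is so useful for comparing them.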

Thursday, October 11, 2018

pFind -- another "next gen" proteomics tool to check out!

Our spectral matching tools are going into some unprecedented territory right now in terms of the ridiculous power that they have. I doubt we'll ever get to a point where we disregard SEQUEST and what it has continued to evolve into, but -- holy cow -- there is some amazing stuff out there right now.

A new entry in this amazing category is Open-pFind. Here is the bioRxiv link, but it's now in Nature something or other (can't find the link yet), and the published version seems to have advanced beyond the preprint.

pFind isn't new, but this is pFind 3.1. As far as I can tell, it's a totally free GUI (you just have to go through a licensing procedure so they can keep track!) that you can get here.

What's it do? Well -- like the other entries in the category (the ones I use the most right now are Fragger/FragPipe and MetaMorpheus, but there are obviously others!) pFind doesn't care what modifications you're looking for. It blows up the search space to a huge level and then starts pulling out the modifications. You can find what you never thought to look for, in large scale.

It looks like it uses a different mechanism for making matches than the others.
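The "blow up the search space and then pull out the modifications" idea is easiest to picture as a delta-mass histogram: search with a huge precursor tolerance, then count how often each mass offset shows up. A minimal sketch, assuming you already have (observed precursor mass, matched unmodified peptide mass) pairs from a wide-tolerance search; the bin width and function names are hypothetical, and this is not how pFind itself scores matches.

```python
from collections import Counter

def delta_mass_histogram(psms, bin_width=0.01):
    """Count PSMs per delta-mass bin. Keys are integer bin indices so
    floating-point bin edges don't split identical offsets."""
    counts = Counter()
    for observed, theoretical in psms:
        counts[round((observed - theoretical) / bin_width)] += 1
    return counts

def top_offsets(psms, n=3, bin_width=0.01):
    """Most common mass offsets as (bin centre in Da, PSM count)."""
    hist = delta_mass_histogram(psms, bin_width)
    return [(round(idx * bin_width, 4), count) for idx, count in hist.most_common(n)]

psms = [
    (1079.96633, 1000.0),   # +79.966 Da, looks like phosphorylation
    (2079.96630, 2000.0),
    (1315.99490, 1300.0),   # +15.995 Da, looks like oxidation
    (1500.00000, 1500.0),   # unmodified
]
print(top_offsets(psms, n=1))  # [(79.97, 2)]
```

On real data the big peaks in this histogram are the "what you never thought to look for" part.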

You know what this blog needs? A DEATHMATCH. There hasn't been a software DEATHMATCH in forever. Time to get a great dataset, these 3 software packages and pit them against one another in a vaguely scientific and moderately unbiased manner. Gotta come up with some rules, though....

Don't be distracted by my rambling -- there is some seriously important stuff to learn in this paper. Their tests are extensive and show both the power and the weaknesses of other programs out there.

There are also some surprising insights (to me, at least) into the "Dark Proteome" stuff. And....well....even about trypsin. It only cuts K/R, right? Right??

This is a great paper that deserves some serious attention.

Shoutout to @Karl_Mechtler for tipping me off to this great resource!

Wednesday, October 10, 2018

Did this team just use proteomics and machine learning to identify tissues and cells?!?!

Okay -- like everyone else -- I'm obsessed with all these buzzwords like "Machine intelligence", "Artificial Learning" and "Neurosis Networks".

I've been erring on the side of caution because it seems really easy to memorize these funny words and use them to get jobs where you just continue to repeat them. But mounting proof is coming that this stuff has real power. (The image above is from MarIO, a program you can download, connect to a video game emulator and watch the 2D character die and die and die until it can run a level perfectly -- cool, right?)

Obviously, we've had Percolator and other programs for a long time -- but outside of those, these methods haven't impacted us all that much. BOOM! NEW ENTRY!!

I'm on the wrong computer so I can't read this yet -- but -- if this is real -- this is a seriously big deal. These people train a machine learning program to learn the profiles of different tissues and cells. I'm super motivated to get to a computer where I can read this --- I just need a huge data transfer to finish first!!

EDIT: Okay...we all knew we could probably do this, right? The best part about this might be the fact that this group actually did it.

EVERYTHING is open access on this (except the paper) -- you can get all the code and notes on GitHub here.

Tuesday, October 9, 2018

Proteomics of healthy aging in humans!

One of the gems of Baltimore is the National Institute on Aging. They do all sorts of cool stuff over there, but the one that I always think of first is the Baltimore Longitudinal Study of Aging (BLSA), which has been running since the 1950s! The goal is to understand what healthy aging is and what it is not....

I expect this new study is just the beginning of the cool stuff since they have started pushing proteomics over there again!

In this study they don't use LC-MS, instead opting for the SomaScan thing (which is up to 1,300 targets, now? That's a big bump since the last time I'd heard anything from it!)

I like this study because it shows that we don't always have to push for the highest number of targets to draw conclusions. Maybe there is just as much to learn if you use the same amount of time to run more samples, which allows better statistics!

Monday, October 8, 2018

PaperSpray analysis of a Neurotransmitter from Whole Blood?

Does this open up as many possible new avenues for things as I feel like it does?  At the very least I didn't know this was possible at all and it's really cool! 

Do y'all know about this PaperSpray thing?  You literally just put a drop of blood or whatever on a piece of paper and charge the paper like it is a nanospray emitter. The liquid ionizes right off the edge of the paper and into your mass spec. Cool stuff, but I don't track toxic inorganic compounds or anything, so I haven't needed to do it.

BUT -- here -- this group blows the doors off. While their end goal is instant tracking of some terrifying sounding biological weapons in people -- what they also do here is quantify a neurotransmitter! From a tiny amount of blood! And they do a reaction on this piece of paper that is their ionization source.

Am I (just) crazy or does this now sound like PaperSpray has moved over from the "cool toy" to "this belongs in the clinic" category?!?

Sunday, October 7, 2018

It took 3 years (and me lucking out and finding a great team to do it!) -- RIDAR time!

Hey -- this might seem super self-serving, but I don't care. I'm sooooo psyched.

Okay --- so 3 years ago I was staring at the most important dataset that I've ever been involved in generating -- one full of riddles and no perceivable answers and I had an idea of how to maybe look at it in a new way.  But the idea had to go on the shelf, because I didn't have the brains, skills, expertise, talent, or brains, or brains to pull off what I wanted to do.

What I needed was a team with all of those things!  And this year I found one!! PROOF!

I don't know if anyone else will ever find this tool useful. I know that I'm using it daily (and, if I'm perfectly honest, that's all I care about, but I really really wanted it out there just in case and for everyone to see how smart the people around me are!)

Here is the scenario it was invented for:

I've got 24 fractions of reporter ion quan stuff. The LC-MS/MS was run by an expert's expert (PNNL, FTW, yo!) and this phenotype is as extreme as you can possibly get. The control channels? Yeah... they were still alive when the samples went on the instrument....

And you know what I have from the total protein quan? Besides some concerns regarding my capabilities as a scientist? NOTHING. And, yeah, today I can delta mass search and I can de novo everything and whatever. There are a lot of tools now that weren't around when I got these files. What if I use these? I get a big ol' list of things. But...if the answer is here in these million spectra? I can't make sense of it. And let's face it, sometimes the best quan software tools aren't found in the same place as the best discovery tools. My favorite tools for discovery don't yet have this kind of quan -- and I need it here.

Okay -- so what if I get RIDAR from Conor's GitHub here, take my MGFs, and say -- only keep the MS/MS spectra that are >2-, 5-, or 10-fold different between my controls and everybody else? (You have to edit a text file to set the cutoff, but I've requested a GUI. That's how hard I am to work with, btw.... "Thanks for doing this amazing thing...can you make it so I don't have to open this document, change this number and then save it? That's.too.hard. Thhhaaaannnnkkkkssss.....!")
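The filtering idea in that paragraph boils down to something like the sketch below. Big caveats: this is not RIDAR's code and it skips RIDAR's normalization entirely; it assumes reporter ion intensities have already been pulled out of each MS/MS spectrum and keyed by spectrum title, and the function names are hypothetical. A spectrum is kept if the mean of the non-control channels differs from the mean of the control channels by more than the chosen fold, in either direction.

```python
from statistics import mean

def fold_change(reporters, control_idx):
    """Symmetric fold change between mean control and mean non-control
    reporter intensity (down-regulation counts the same as up)."""
    controls = [reporters[i] for i in control_idx]
    others = [x for i, x in enumerate(reporters) if i not in control_idx]
    c, o = mean(controls), mean(others)
    if c == 0 and o == 0:
        return 1.0
    if c == 0 or o == 0:
        return float("inf")
    return max(o / c, c / o)

def filter_spectra(spectra, control_idx, threshold):
    """Keep only spectra whose fold change is greater than threshold."""
    return {title: r for title, r in spectra.items()
            if fold_change(r, control_idx) > threshold}

# Hypothetical 4-channel example: channels 0 and 1 are the controls
spectra = {
    "scan=1": [100, 110, 1000, 1200],   # ~10.5-fold up
    "scan=2": [100, 100, 120, 90],      # ~1.05-fold, boring
    "scan=3": [400, 400, 100, 100],     # 4-fold down
}
for cutoff in (2, 5, 10):
    print(cutoff, sorted(filter_spectra(spectra, [0, 1], cutoff)))
```

The surviving spectrum titles are what you'd write back out to a smaller MGF and hand to search tools that don't do reporter quan themselves.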

What does this enable?
1) I know these spectra that RIDAR keeps are quantitatively interesting. Now this opens up tools I love that don't have reporter quan built in. Fragger (is it FragPipe yet?), SearchGUI, MetaMorpheus. KER-POW. ALL THE POWER.

2) At 10-fold? I've only got a few thousand spectra -- and you know what they look like? PTM hotspots. Is it real? I don't know yet, but I do know it's the first lead I've ever had on these files. In the study we look at CPTAC data and -- you know what? -- it's similar. Sure, the proteins that change the most come to the top (you'll see bunches of peptides for them), but then you also see loads of spectra that come from one peptide per protein -- and it's PTMs EVERYWHERE.

Sorry if this seems self-promotional (said the blogger, lol!). I didn't make this. It ended up smarter than I ever imagined (I still don't understand the normalization thing they came up with, but it works!), and now I have a tool I've wanted for years!

Saturday, October 6, 2018

Become a data scientist without an expensive computer!!!

In my house, Jeff Leek is a hero. Maybe in a lot of other houses, too. I've never met him, but I've seen him speak and taken an online course he taught. The dude does awesome science and somehow makes it approachable to more people than you'd believe possible.

Okay -- so this is right in line with stuff we're working on in Frederick -- how the heck do you become a data scientist if 24 patient samples is 300GB of Lumos data and you've got a PC with 2GB of RAM? Answer? No idea.

It's even worse for next-gen sequencing. 220GB PER PATIENT??? WHAT???

The Leek lab has knocked down one of these barriers by enabling real-life Data Science on inexpensive Chromebooks. Seriously -- you should check this out!

Friday, October 5, 2018

Get one minute of access to run your program on a quantum computer for free!

My list of things to blog about is about 100 things long at this point. There is ridiculously cool stuff out of Max Planck and the Smith, Glaros, Pandey, Coon and Gundry labs that is at the very top of my "you've gotta see this!!" list, and I keep getting distracted by off-target stuff that matters to what we're working on in Frederick. Oh -- and HUPO was last week?!?!

And I'm rambling about what matters directly to what we're doing in Frederick. 

If you sign up they'll let you have 1 minute of access for free. 1 minute? That's stupid, right? What can you do with 1 minute?