Thursday, October 18, 2018



MaxQuant.Live is here. Go there. Check your compatibility. And get it, yo!

Yo, Google, it just came out today! Of course it hasn't been commonly downloaded. Nuts. I'll probably have to go to work and download it -- but that's okay -- that's where the instruments are!!

Wednesday, October 17, 2018

Kinda last minute -- but there is a free XCMSOnline metabolomics class tomorrow!

EDIT: This blog exists in a wobbly space in time and space. This half-day course is on October 17, 2018!

I know this is the proteomics blog, but if you're also dabbling in the dark side with small mass ions, you can't find something as powerful and easy to use as XCMSOnline.

There is a half day course that starts at 8:30AM (California time -- if it was East Coast I would neither attend nor would I tell you about it. I'm going to assume no one attends in person because that is when people are supposed to be sleeping NOT sitting in meetings....) but they are livecasting it!

I've got to give a talk that I should probably...start...writing....umm....soon... but then I'll log in and try not to ask questions that are too dumb....

You can RSVP here! 

Tuesday, October 16, 2018

INSTANTCLUE!! Easy, ultrapowerful statistics for proteomics!

What a time to be in this field!! As soon as I think I've found the most powerful downstream analysis software package I've ever seen, someone shows me something with even more powers!!

First off, a big thank you to Dr. Ilian A for the link to this awesome paper (open access here!)

A lot of the scientists we get samples from are getting used to statistics. If I sit down with a biologist younger than myself, it's pretty much a given that I won't have to explain what a PCA plot is, because the computers they had in their stats class were powerful enough to run one. My stats class in the 90s didn't have a PC element to it. And -- if I had started clustering or building a PCA plot on 20 samples on a 486 computer, I bet you it STILL wouldn't be done. (However -- it WOULD have Minesweeper built into it. The future isn't always pressing forward in every regard.)

Honestly -- most of them know stats way way better than I do and I'm starting to seem like the unprepared one. Reminder of this sadly titled paper....

What does all this rambling have to do with anything other than Ben's love of espresso and rapidly striking ergonomic keyboards???


Look -- if you've mastered Perseus -- good for you. You're awesome. And you probably don't need this. If you're an R superstar and it's easier for you to do everything in R (did you know some people make their slide decks in R rather than PowerPoint? (@AlexisLNorris)) -- then you probably don't need this either.

But if you need
1) A GUI interface that works in Windows, MacApLitosh, and Linux
2) That has amazing flexibility to upload data into
3) That has short, premade, well scripted tutorial videos in case you get stuck
4) Has every stat you ever heard of and a bunch maybe you weren't sure if you really heard or if it was someone who started to say a real word and then accidentally burped a little, threw in a muffled apology, and then finished what they were saying without inhaling. (Tell me that's not what "latent semantic analysis" sounds like.)
5) A way to rapidly export the cool stuff you find
6) And software you can get started with even if your number one goal for the day is to not read anything smart today at all ---

You should download Instant Clue here!

Sunday, October 14, 2018

Get those hydrophobic membrane peptides/proteins!

Membrane proteins are hard to get to. They've got super hydrophobic regions for stuffing inside membranes, they've often got multiple glycan domains and they can have annoying 3D structures that are just clumpy (best term I've got this morning). Even the most comprehensive global proteomics studies we've ever seen appear to under-represent membrane proteins. (Post I wrote on that last topic last year). 

This group essentially enriches for hydrophobic peptides by throwing in a high organic separation that results in a downstream loss of the most hydrophilic peptides. EDIT: Loss isn't the correct word. Let's go with "enrichment of hydrophobic peptides in relation to the general peptide population."

The RAW files are up at PeptideAtlas here and it's striking how much signal they get in the high organic section of their chromatogram through this process.

It's fair, I think, to mention that this approach isn't entirely new....

...but the changes are definitely novel enough to warrant checking out this iteration if you're looking at membrane proteins!

Saturday, October 13, 2018

The terrifying FDR Averaging study is live on bioRxiv!!!

....Just a little early for Halloween...!! The scariest study of the year just went up on bioRxiv here!

It seems less bad if you start with the fact that this team has a (computationally expensive) solution and I think it's already live on Crux.

Look -- we all know that all our FDR shortcut things (target decoy, Percolator, Elutator, and so on) are imperfect. And -- we know they need appropriate datasets to work right. This study starts out by pointing out what happens if the dataset that hits the FDR calculator IS NOT right. Fluctuations by as much as 20% in your peptide IDs, just by reshuffling your decoy sequences and searching the same data again???  Ummm.....

....yeah....fortunately for those of us who use...well...BASICALLY EVERY PIECE OF SOFTWARE I USE....when you make your decoy sequence, you end up using that one pretty much forever.

Let's see....when was my UniProt human decoy FASTA generated.....

Oh. The week I installed software on my new computer?

The reason this is so disturbing is that if I was using a program that would reshuffle my decoy FASTA every time, I would see this because, given random shuffling, my results could be very different each time I press the <RUN> button. Okay -- Honestly, from a reproducibility standpoint, making one decoy and sticking to it is a good thing and keeps people from asking questions like "wait. are you running my results through a random number generator?!?" and I'm grateful for the fact I don't have to answer this question. This paragraph is poorly written.

Okay -- but -- at the end of the day I want to give people the list that is the absolute closest representation of what the proteins that I can detect in the cells they gave me are doing. And if my current FDR methods are simply masking issues with the data that can be as extreme as described here -- I think upgrading the way I generate my lists and tell true from false needs to be put at the top of my priority list.
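For anyone who wants to poke at the idea, here is a minimal Python sketch of why the decoy draw matters. It is NOT the method from the preprint or anything in Crux. It just counts how many target hits survive a simple separate target-decoy FDR estimate, (decoys + 1) / targets, for several different decoy score sets, then takes a naive average. All function names are made up for illustration, and the shuffling recipe (keep the C-terminal residue, shuffle the rest) is an assumption about how your decoys get made.

```python
import random


def shuffled_decoy(peptide, rng):
    """Shuffle every residue except the C-terminal one (one common decoy recipe)."""
    body = list(peptide[:-1])
    rng.shuffle(body)
    return "".join(body) + peptide[-1]


def accepted_at_fdr(target_scores, decoy_scores, alpha=0.01):
    """Count target hits accepted at an estimated FDR <= alpha.

    Walking down the score-sorted list, the FDR at each cutoff is
    estimated as (#decoys + 1) / #targets seen so far.
    """
    labelled = [(s, True) for s in target_scores]
    labelled += [(s, False) for s in decoy_scores]
    labelled.sort(key=lambda pair: pair[0], reverse=True)

    targets = decoys = best = 0
    for _, is_target in labelled:
        if is_target:
            targets += 1
        else:
            decoys += 1
        if targets and (decoys + 1) / targets <= alpha:
            best = targets  # deepest cutoff that still meets alpha
    return best


def averaged_acceptance(target_scores, decoy_score_sets, alpha=0.01):
    """Naive average of the accepted-target count over several decoy draws."""
    counts = [accepted_at_fdr(target_scores, d, alpha) for d in decoy_score_sets]
    return sum(counts) / len(counts), counts


if __name__ == "__main__":
    rng = random.Random(2018)
    print(shuffled_decoy("PEPTIDEK", rng))  # a different shuffle for every seed
    targets = [10, 9, 8, 7, 6, 5]
    print(averaged_acceptance(targets, [[5.5, 1], [9.5, 8.5, 7.5]], alpha=0.5))
```

The spread in the per-draw counts is the scary part. The average is only there to show the general direction of a fix, and re-searching against many decoy databases is exactly why it gets computationally expensive.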

Friday, October 12, 2018

(Re?) evaluating MS1 quan algorithms!


Check this new paper out of Smith (Montana) lab!


1) Here is the equation for how MaxQuant extracts peaks for quan
2) Here is the equation for how OpenMS extracts peaks for quan

Here is how they perform on a relatively simple standard that someone spent 10,000 hours(???) profiling in its exhaustive entirety....

Now we have a new way to test any label free algorithm!!

Thursday, October 11, 2018

pFIND -- another "next gen" proteomics tool to check out!

Our spectral matching tools are going into some unprecedented territory right now in terms of the ridiculous power that they have. I doubt we'll ever get to a point where we disregard SEQUEST and what it has continued to evolve into, but -- holy cow -- there is some amazing stuff out there right now.

A new entry in this amazing category is Open-pFind. Here is the bioRxiv link, but it's now in Nature something or other (can't find the link yet) -- which seems to be advanced beyond the preprint entry.

pFind isn't new, but this is pFind 3.1. As far as I can tell, a totally free GUI (you just have to go through a licensing procedure so they can keep track!) that you can get here.

What's it do? Well -- like the other entries in the category (the ones I use the most right now are Fragger/FragPipe and MetaMorpheus, but there are obviously others!) pFind doesn't care what modifications you're looking for. It blows up the search space to a huge level and then starts pulling out the modifications. You can find what you never thought to look for, in large scale.
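If the "blow up the search space, then pull out the modifications" idea is new to you, here is a rough Python sketch of the general open-search trick: take the difference between each observed precursor mass and the mass of the unmodified peptide it matched, bin those differences, and see which delta masses pile up. This is only the generic concept. It is not how pFind, Fragger, or MetaMorpheus work internally, and the function name and bin width are my own assumptions.

```python
from collections import Counter


def delta_mass_histogram(matches, bin_width=0.01):
    """Bin (observed - theoretical) precursor mass differences.

    matches: iterable of (observed_mass, theoretical_mass) pairs in Da.
    Returns a list of (bin_center_Da, count), most populated bin first.
    """
    bins = Counter()
    for observed, theoretical in matches:
        delta = observed - theoretical
        bins[round(delta / bin_width)] += 1  # integer bin index
    return [(index * bin_width, count) for index, count in bins.most_common()]


if __name__ == "__main__":
    # Toy numbers: two matches carrying ~+79.966 Da, one ~+15.995 Da, one unmodified.
    toy = [
        (1079.9663, 1000.0),
        (1579.9664, 1500.0),
        (1215.9949, 1200.0),
        (900.0001, 900.0),
    ]
    for center, count in delta_mass_histogram(toy):
        print(f"{center:8.2f} Da  x{count}")
```

Bins that fill up near a known modification mass (phosphorylation is about +79.966 Da, oxidation about +15.995 Da) are the "things you never thought to look for" showing up on their own.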

It looks like it uses a different type of mechanism for making matches than the others.

You know what this blog needs? A DEATHMATCH. There hasn't been a software DEATHMATCH in forever. Time to get a great dataset, these 3 software packages and pit them against one another in a vaguely scientific and moderately unbiased manner. Gotta come up with some rules, though....

Don't be distracted by my rambling --- there is some serious important stuff to learn in this paper.  Their tests are extensive and show both the power and weaknesses of other programs out there.

There are also some surprising insights (to me, at least) into the "Dark Proteome" stuff. And....well ....even about trypsin. It only cuts K/R, right? Right??

This is a great paper that deserves some serious attention.

Shoutout to @Karl_Mechtler for tipping me off to this great resource!

Wednesday, October 10, 2018

Did this team just use proteomics and machine learning to identify tissues and cells?!?!

Okay -- like everyone else -- I'm obsessed with all these buzzwords like "Machine intelligence", "Artificial Learning" and "Neurosis Networks".

I've been erring on the side of caution because it seems really easy to memorize these funny words and use them to get jobs where you just continue to repeat them. But mounting proof is coming that this stuff has real power. (The image above is from MarIO, a program you can download, connect to a video game emulator and watch the 2D character die and die and die until it can run a level perfectly -- cool, right?)

Obviously, we've had Percolator and other programs for a long time -- but outside of those they haven't impacted us all that much.  BOOM!  NEW ENTRY!!

I'm on the wrong computer so I can't read this yet -- but -- if this is real -- this is a seriously big deal. These people train a machine learning program to learn the profiles of different tissues and cells. I'm super motivated to get to a computer where I can read this --- I just need a huge data transfer to finish first!!

EDIT: Okay...we probably all knew that we could probably do this, right? The best part about this might be the fact that this group did.

EVERYTHING is open access on this (except the paper) -- you can get all the code and notes at Github here.

Tuesday, October 9, 2018

Proteomics of healthy aging in humans!

One of the gems of Baltimore is the National Institute on Aging. They do all sorts of cool stuff over there, but the one that I always think of first is the Baltimore Longitudinal Study of Aging (BLSA), which has been running since the 1950s! The goal is to establish some understanding of what healthy aging is and what it is not....

I expect this new study is just the beginning of the cool stuff since they have started pushing proteomics over there again!

In this study they don't use LC-MS, instead opting for the SomaScan thing (which is up to 1,300 targets, now? That's a big bump since the last time I'd heard anything from it!)

I like this study because it shows that we don't always have to push for the highest number of targets to draw conclusions. Maybe there is just as much to learn if you use the same amount of time to run more samples and allow the use of better statistics!

Monday, October 8, 2018

PaperSpray analysis of a Neurotransmitter from Whole Blood?

Does this open up as many possible new avenues for things as I feel like it does?  At the very least I didn't know this was possible at all and it's really cool! 

Do y'all know about this PaperSpray thing?  You literally just put a drop of blood or whatever on a piece of paper and charge the paper like it is a nanospray emitter. The liquid ionizes right off the edge of the paper and into your mass spec. Cool stuff, but I don't track toxic inorganic compounds or anything, so I haven't needed to do it.

BUT -- here -- this group blows the doors off. While their end goal is instant tracking of some terrifying sounding biological weapons in people -- what they also do here is quantify a neurotransmitter! From a tiny amount of blood! And they do a reaction on this piece of paper that is their ionization source.

Am I (just) crazy or does this now sound like PaperSpray has moved over from the "cool toy" to "this belongs in the clinic" category?!?

Sunday, October 7, 2018

It took 3 years (and for me to luck out and find a great team to do it! ) RIDAR time!

Hey -- this, might seem super self-serving, but I don't care. I'm sooooo psyched.

Okay --- so 3 years ago I was staring at the most important dataset that I've ever been involved in generating -- one full of riddles and no perceivable answers and I had an idea of how to maybe look at it in a new way.  But the idea had to go on the shelf, because I didn't have the brains, skills, expertise, talent, or brains, or brains to pull off what I wanted to do.

What I needed was a team with all of those things!  And this year I found one!! PROOF!

I don't know if anyone else will ever find this tool useful. I know that I'm using it daily (and, if I'm perfectly honest, that's all I care about, but I really really wanted it out there just in case and for everyone to see how smart the people around me are!)

Here is the scenario it was invented for:

I've got 24 fractions of reporter ion quan stuff. The LC-MS/MS was run by an expert's expert (PNNL, FTW, yo!) and this phenotype is as extreme as you can possibly get. The control channels? Yeah... they were still alive when the samples went on the instrument....

And you know what I have from the total protein quan? Besides some concerns regarding my capabilities as a scientist? NOTHING. And, yeah, today I can delta mass search and I can de novo everything and whatever. There are a lot of tools now that weren't around when I got these files. What if I use these? I get a big ol' list of things. But...if the answer is here in these million spectra? I can't make sense of it. And let's face it, sometimes the best quan software tools aren't found in the same place as the best discovery tools. My favorite tools for discovery don't yet have this kind of quan -- and I need it here.

Okay -- so what if I get RIDAR from Conor's Github here. And I take my MGFs and I say -- only keep the MS/MS spectra that are >2-, 5-, or 10-fold different between my controls and everybody else? (You have to edit the text file, but I've requested a GUI. That's how hard I am to work with, btw.... "Thanks for doing this amazing thing...can you make it so I don't have to open this document, change this number and then save it? That's.too.hard. Thhhaaaannnnkkkkssss.....!")
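To make the filtering idea concrete, here is a stripped-down Python sketch of "keep only the spectra where the reporter ions differ by some fold between controls and everybody else." To be loud about it: this is NOT RIDAR's code. It skips the normalization step entirely, it assumes the spectra are already parsed into peak lists, and the function names, the tolerance, and the reporter m/z list in the example are all placeholders you would swap for your own reagent and channel layout.

```python
def reporter_intensities(peaks, reporter_mzs, tol=0.003):
    """Sum the intensity found within +/- tol (Da) of each reporter m/z."""
    return [
        sum(intensity for mz, intensity in peaks if abs(mz - reporter) <= tol)
        for reporter in reporter_mzs
    ]


def passes_fold_filter(peaks, reporter_mzs, control_idx, min_fold=2.0, tol=0.003):
    """True if mean(controls) and mean(other channels) differ by >= min_fold."""
    values = reporter_intensities(peaks, reporter_mzs, tol)
    controls = [values[i] for i in control_idx]
    others = [v for i, v in enumerate(values) if i not in control_idx]
    if not controls or not others:
        return False
    control_mean = sum(controls) / len(controls)
    other_mean = sum(others) / len(others)
    if control_mean == 0 and other_mean == 0:
        return False  # no reporter signal at all
    if control_mean == 0 or other_mean == 0:
        return True  # signal in only one group: treat as an extreme change
    ratio = other_mean / control_mean
    return ratio >= min_fold or ratio <= 1.0 / min_fold


def filter_spectra(spectra, reporter_mzs, control_idx, min_fold=2.0, tol=0.003):
    """Keep the spectra (dicts with a 'peaks' list) that pass the fold filter."""
    return [
        s for s in spectra
        if passes_fold_filter(s["peaks"], reporter_mzs, control_idx, min_fold, tol)
    ]


if __name__ == "__main__":
    # Placeholder 4-channel layout; channels 0 and 1 play the controls.
    channels = [126.1277, 127.1248, 128.1281, 129.1315]
    spectra = [
        {"title": "changer", "peaks": [(126.1277, 100.0), (127.1248, 100.0),
                                       (128.1281, 1000.0), (129.1315, 1200.0)]},
        {"title": "flat", "peaks": [(m, 100.0) for m in channels]},
    ]
    kept = filter_spectra(spectra, channels, control_idx={0, 1}, min_fold=10.0)
    print([s["title"] for s in kept])
```

Whatever survives can be written back out as an MGF and handed to any search tool you like, which is the whole point.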

What does this enable?
1) I know these spectra that RIDAR keeps are quantitatively interesting. Now this opens up tools I love that don't have reporter quan built in. Fragger (is it FragPipe yet?), SearchGUI, MetaMorpheus. KER-POW.  ALL THE POWER.

2) At 10-fold? I've only got a few thousand spectra -- and you know what they look like? PTM hotspots. Is it real? I don't know yet, but I do know it's the first lead I've ever had on these files. In the study we look at CPTAC data and -- you know what? -- it's similar. Sure, the proteins that change the most come to the top (you'll see bunches of peptides for them), but then you also see loads of spectra that are from one peptide/protein --> and it's PTMs EVERYWHERE.

Sorry if this seems self-promotional (said the blogger, lol!). I didn't make this. It ended up smarter than I ever imagined. (I still don't understand the normalization thing they came up with, but it works!) and now I have a tool I've wanted for years!

Saturday, October 6, 2018

Become a data scientist without an expensive computer!!!

In my house, Jeff Leek is a hero. Maybe in a lot of other houses. I've never met him, but I've seen him speak and taken an online course he taught. The dude does awesome science and somehow makes it approachable to more people than you'd believe possible.

Okay -- so this is right in line with stuff we're working on in Frederick -- how the heck do you become a data scientist if 24 patient samples is 300GB of Lumos data and you've got a PC with 2GB of RAM? Answer? No idea.

It's even worse for nextGen. 220GB PER PATIENT??? WHAT???

The Leek lab has knocked down one of these barriers by enabling real life Data Science on inexpensive ChromeBooks. Seriously -- you should check this out!

Friday, October 5, 2018

Get one minute of access to run your program on a quantum computer for free!

My list of things to blog about is about 100 things long at this point. There is ridiculously cool stuff out of Max Planck and the Smith, Glaros, Pandey, Coon and Gundry labs that is at the very top of my -- "you've gotta see this!!" list and I keep getting distracted by off-target stuff that matters to what we're working on in Frederick.  Oh -- and HUPO was last week?!?!

And I'm rambling about what matters directly to what we're doing in Frederick. 

If you sign up they'll let you have 1 minute of access for free. 1 minute? That's stupid, right? What can you do with 1 minute? 


Thursday, October 4, 2018

Really wanna see if Ion Mobility will help you out? Here are the plans to 3D print your own.

Honestly, the best thing my niece and I have ever been able to print was a crappy Tie Fighter. Considering she's 10, I'm guessing it's her fault.

But what if instead of us focusing on printing a T.A.R.D.I.S. that isn't terrible, we made an ION MOBILITY MASS SPEC the next time I visited?

You can find Dr. Cooks's most recent entry in the Coolest people in the world competition here.

I've been through it and the supplemental info while distracted by the atrocity that is my cafeteria's possibly 3D printed version of "General Tsao's Vulture" but can't find direct links to the files to upload into a printer program, but there are lots of measurements in case you actually know how to print things.

Wednesday, October 3, 2018

POSTNOVO!! Is this percolator for de novo search engines?

This is the first image that showed up when I looked up POSTNOVO. Google translate was somewhat less than exceptionally helpful, so would someone let me know if it says something reeeeeaaally bad? Thanks!

Big shoutout to Hugo (from Porto!!) for reassuring me that I didn't post anything bad above. (An old friend of mine got some large Kanji symbols tattooed on his arm in college. He was a victim of a somewhat hilarious prank and they don't mean what he was told they mean. To make it even funnier, he lives in Japan now, with an awesome joke tattooed on himself forever. You can never be too careful!)

And I looked up POSTNOVO because of this Just Accepted Manuscript at JPR!

I'll be honest. I forgot my work computer and I can't actually read this. But the abstract says it's like Percolator for de novo searches.

What's the biggest problem with de novo? FDR!!!
What's Percolator best at?  Fixing FDR problems when it has loads of data to look at. De novo produces loads of data (mostly bad, and I'm not being mean) and it's really hard for normal engines to look through so much. When PepNovo came out (that's Ari's one, right? I'm pretty sure) I set it once to allow up to 20 results per MS/MS spectrum. And regretted it because a LOT of them came back with 20 possible sequences. How do you search through all that without going crazy, becoming nocturnal, and adopting a bunch of elderly dogs? No idea.

But you could go and hunt down PostNovo's Github (here) and dig around.

What do you find? You find that it doesn't look all that hard at all to connect this awesome new tool to the DENOVO GUI, the ultrapowerful software from the amazing people at CompOmics.

And if that isn't enough for you to look for your laptop and read the paper, probably nothing is!

Clarification: This isn't Percolator, but it's described as post processing for de novo, like Percolator is for regular searches. I wasn't clear.

Tuesday, October 2, 2018

A lot of software just quantifies the 3 most abundant peptides. This is where it came from!

I honestly might just be putting this here for me. Gee golly, this paper is hard to find.


For real. It's this paper, best I've ever been able to tell.
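Since I'll forget how this works again too, here is the idea as a tiny Python sketch: for each protein, average the intensities of its three most intense peptides and call that the protein's abundance. The function name, the input layout, and the choice to skip proteins with fewer than three peptides are my assumptions, not anything lifted from a particular software package.

```python
from collections import defaultdict


def top3_abundance(peptide_rows, n=3):
    """Average the n most intense peptides per protein.

    peptide_rows: iterable of (protein, peptide, intensity).
    If a peptide shows up more than once, its highest intensity is used.
    Proteins with fewer than n distinct peptides are left out.
    """
    best = defaultdict(dict)
    for protein, peptide, intensity in peptide_rows:
        previous = best[protein].get(peptide, 0.0)
        best[protein][peptide] = max(previous, intensity)

    abundances = {}
    for protein, peptides in best.items():
        top = sorted(peptides.values(), reverse=True)[:n]
        if len(top) == n:
            abundances[protein] = sum(top) / n
    return abundances


if __name__ == "__main__":
    rows = [
        ("P1", "A", 10.0), ("P1", "B", 20.0), ("P1", "C", 30.0), ("P1", "D", 40.0),
        ("P2", "X", 5.0), ("P2", "Y", 7.0),
    ]
    print(top3_abundance(rows))
```

Some tools fall back to the top one or two peptides when a protein doesn't have three. That is a judgment call this sketch just avoids.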

If the world gives you lEMonS and you can somehow come up with something, you, honest to gosh, have my respect!

Am I writing this because I have to cite this in something that is due to editors in like 3 hours and 54 minutes? Yes. Was this blog post a waste of time? Yes. Today. But it would save me hours down the road, for sure!  Did I just schedule an extra meeting with my therapist just now due to repressed memories emerging of when I used the hardware they used in this paper? Maybe.....

Monday, October 1, 2018

What's cooler than skiing in Poland? Skiing in Poland with 130 proteomics nerds!!

Registration is open today for the 2019 EuBiC Winter School!

Is it hosted at a ski resort? Yes!

Is it a workshop focused on informatics and quantitative proteomics? Yes!

Do you also love all 3 of those things?

What about a PROTEOMICS HACKATHON!?!??!! I don't even know what that is! And I'm still going!

You can find out more here!

Sunday, September 30, 2018

Longitudinal analysis of the human "Exposome" shows amazing promise and some scary stuff!

This concept is BIG, has far reaching and hard-to-fathom potential consequences, and is something we should all be thinking about.

I'm not qualified to talk about any of that (it may not stop me) but what about the stuff I do kinda understand? First off -- this is the paper (direct link here)!

What is the Exposome? The authors define it here as "....human airborne environmental biotic and abiotic exposures..." so....the stuff in our air that we're getting exposed to, coming from living things or non-living.

Interested? You should be!

Okay -- so they monitor 15 individuals around the world for up to 2 years. They had a "wearable device" of some kind that collected samples of the stuff they're exposed to. They do a ton of genetics stuff on the people and the samples that are collected (I think. This is a Cell paper, it's like 100 pages and I do have a job).

Now the interesting stuff ---> for the exposure stuff, the samples were run on a UHPLC coupled Exactive using a cool mixed mode column (to presumably separate both polar and nonpolar compounds well) and -- the details are kind of fuzzy in the methods -- but it appears they ran each sample in positive and negative? or with pos/neg switching with 100,000 resolution.

The data was searched with XCMS and someone on this team is an R fanatic (or epidemiologist -- which might be redundant). I've never seen so many individual packages utilized in a single study -- but the genomics and the geographic data are all statistically tied together and ----

We're exposed to TONS of stuff, both from living and non-living sources. And -- geography plays a huge role. And -- there are some clear looking (though mysterious in their actual meaning) links between what you are exposed to and what is going on in your genetics.

Probably not the right response -- but I am certainly definitely completely not qualified to judge. However, this is a really thought-provoking paper in a field where our technologies will obviously be able to help!

Friday, September 28, 2018

20 must-read papers for proteomics students, courtesy of the Liu lab!

I can't claim any credit for this awesome list that Yansheng Liu posted on his lab webpage. I don't even know who originally found it!

I will be permanently linking this list over there --> in the Newbies section later!

You can check out this great list here!

Thursday, September 27, 2018

Positional phospho-isomers are a problem -- get a Thesaurus!!

(Image stolen from [please don't sue me])!

I've been dying to talk about this one since seeing a talk sometime in the spring about it!

Did you know that phosphorylations are commonly associated with phosphorylations on amino acids right beside them or just a few amino acids away?!?  I didn't, but I've asked a bunch of biologists and they said it's true.

This is one of the many cool insights that you could find in this new preprint on THESAURUS!

Before you get too excited -- Thesaurus is for DIA and PRM data. Wait -- You're more excited?!?!

(Groans.....okay...last one, probably....)

Thesaurus is software. You might have guessed that from a couple of the names on the paper. And it -- okay -- figure 1 is awesome and explains it better than I possibly could.

Somebody is good at making flowcharts. The end result of running through that logical circle is going to be a test of whether phosphorylation at E or F is the best match ---

Okay ---- last stolen picture for this post -- but this is the ABSOLUTE COOLEST PART --- what if it is both of them? Because it biologically makes sense that it could be. No -- I don't mean "phosphoRS doesn't have enough information to discern which one it is and gives you 50/50 so you just report both" --- like, biologically it can and does totally happen that you'll get an almost perfectly co-eluting pair of peptides that is both phosphopeptides E and F (obviously not in the example above, but it really does happen a lot (proof in this great paper!)).

But this is the ABSOLUTE COOLEST PART (wait. I said that. I'm excited.) in modern dd-MS2 -- we skip the second one! Almost always!  We're so certain of our massively improved peak shapes and the efficiency of our instruments in making an ID on the first fragmentation that most of us use dynamic exclusion to trigger at some (by historical standards) ludicrously low peptide intensity -- and then we exclude peptides of that exact mass from being fragmented for huge amounts of time. So if there are 2 positional isomers eluting at almost the same time -- we don't see it.
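Here is a toy Python simulation of that, in case the dynamic exclusion logic isn't obvious. It is a cartoon and not any vendor's real instrument control: a precursor gets fragmented only if nothing within a ppm tolerance of its m/z is currently on the exclusion list, and every fragmented precursor goes on that list for a fixed time. The function name, the 30 second window, and the 10 ppm tolerance are all assumptions for illustration.

```python
def simulate_dynamic_exclusion(events, exclusion_s=30.0, tol_ppm=10.0):
    """Return the labels of precursors that actually get fragmented.

    events: list of (time_s, precursor_mz, label); processed in time order.
    """
    excluded = []  # (mz, expiry_time_s) entries currently on the exclusion list
    fragmented = []
    for time_s, mz, label in sorted(events):
        # Drop exclusion entries that have already expired.
        excluded = [(m, expiry) for m, expiry in excluded if expiry > time_s]
        if any(abs(mz - m) / m * 1e6 <= tol_ppm for m, _ in excluded):
            continue  # same mass seen too recently: skipped
        fragmented.append(label)
        excluded.append((mz, time_s + exclusion_s))
    return fragmented


if __name__ == "__main__":
    # Two positional isomers: identical m/z, apexes 4 seconds apart.
    events = [
        (100.0, 700.35, "isomer E"),
        (100.5, 650.30, "unrelated peptide"),
        (104.0, 700.35, "isomer F"),
    ]
    print(simulate_dynamic_exclusion(events, exclusion_s=30.0))
    print(simulate_dynamic_exclusion(events, exclusion_s=2.0))
```

With the long window the second isomer never gets an MS/MS scan. Shorten the window and it does, at the cost of re-fragmenting lots of things you've already identified.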

Is it possible that our improved methods and instruments are actually decreasing our phosphopeptide ID recovery? Yeah, it totally is.

EDIT: Forgot this part --> In DIA and PRM you are constantly acquiring MS/MS spectra for your mass range in a cycle. So you can see fragmentation patterns of two almost completely co-localizing phosphopeptides and Thesaurus can help you identify them!

I think DIA has kinda been floating around looking for something that it's good at -- or better at than dd-MS2 -- this might actually be that thing.

Thesaurus is shown working in conjunction with Skyline throughout the paper. It can also function as a complete stand-alone and you can get it here.

(They got this wrong, btw....🙉🙉🙉🙉🙉🙉🙉!!!!!)