Monday, September 28, 2015

New free label free node for Proteome Discoverer 2.0!

2 hours in and complete brain overload here at HUPO!  So much good science out there in the field!

One important side note that I think you guys will like, though. The first free OpenMS nodes for Proteome Discoverer 2.0 are now available for download. The first is LFQ for label free quan!  The second is a workflow in OpenMS I'm unfamiliar with, but will read up on ASAP.

You can download these here.  (Let me know how you like them! I can't wait to give 'em a shot!)

Saturday, September 26, 2015

Quantitative thermal proteome profiling!

About a year ago a paper came out that introduced me to the concept of Thermal Proteome Profiling.  While this concept will likely have several different applications, it is definitely really good at figuring out what proteins a drug is interacting with!

This month, a brand new paper out of the Savitski lab takes this idea another step further. In this work they use TMT10plex reagents AND thermal proteome profiling to determine what proteins are being directly affected by certain drugs.

Being a Nature Protocols paper, the methodology is set out completely so that any of us can go right out and replicate it. This team also developed software for Python and R that can be used to process the data and they make this all available.

Are you stuck on what that stupid drug's mechanism of action is? You should probably check out this paper!

Friday, September 25, 2015

The long-awaited PD 2.0 (2.0) NIH Training Workshop!

This summer one of my biggest workshops was a PD 2.0 training workshop Alison Wiedergreen set up at the NIH Bethesda campus. It was really well attended and there were tons of great questions.

The crowd requested a follow-up workshop after they got going with the software and we finally made the schedule work!  I'm happy to announce PD 2.0 workshop 2.0!

To register you can follow this link!

In the AM the brilliant and charismatic Dr. Talamantes will be going over the basic functions of PD 2.0, including getting you going if you are new to PD completely or if you have used PD 1.x in the past.

In the afternoon I'll stop watching and we'll go over some more specific workflows. I think we'll look at some real big datasets and how to organize them as well as maybe how to combine quantitative analysis of global proteins with phosphoproteomics or something similar.  If you have a suggestion for what you specifically want to take a look at, shoot me a suggestion and/or a dataset and we'll try to make time for it!  We'll also look at what is coming in Proteome Discoverer 2.1 (which is just a bunch of improvements to the PD 2.0 interface. It works and looks just about the same, I promise!)

Is October 29 too early for Halloween costumes?

Thursday, September 24, 2015

NeuCode labeling nematodes!

As depicted in the clip above, nematodes are hungry little guys. In this brand new paper in press at MCP from Rhoads and Prasad et al., we see a new way of taking advantage of this trait.

These Badgers fed nematodes NeuCode labeled E. coli and, voila!, NeuCode labeled nematodes!

Now, I know I've rambled on about NeuCode in this blog a bunch, but if you are unfamiliar there is a good description in this GenomeWeb article here.  In a nutshell, it's very much like SILAC except that, by NEUtron enCODE(ing), the mass differences between the various channels are very, very small. The number of channels you can use is limited by the maximum resolution of your instrument. More resolution = more NeuCode channels.

(I stole the figure above from this open access paper here.)  P.S., the technology has been progressing significantly since the original study. I saw a slide a while back that suggested 40-plex is theoretically possible.
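
To put rough numbers on the "more resolution = more channels" idea, here is a back-of-the-envelope sketch. It uses the simple m/Δm definition of resolving power; the channel spacings (36, 12 and 6 mDa), the m/z of 800 and the 2+ charge state are illustrative assumptions, not values from the paper, and real-world separation needs some margin on top of this.

```python
def required_resolving_power(mz: float, delta_mass_da: float, charge: int) -> float:
    """Rough resolving power (m / delta-m) needed to tell two channels apart.

    delta_mass_da is the neutral mass difference between two labeled forms;
    on the m/z axis that spacing shrinks by the charge state.
    """
    delta_mz = delta_mass_da / charge
    return mz / delta_mz


# Hypothetical spacings between neighboring channels, in millidaltons.
for spacing_mda in (36, 12, 6):
    r = required_resolving_power(mz=800.0, delta_mass_da=spacing_mda / 1000, charge=2)
    print(f"{spacing_mda} mDa spacing at m/z 800 (2+): R of roughly {r:,.0f}")
```

Squeeze the channels closer together and the resolving power you need climbs fast, which is why the plex level is tied to the instrument.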

In this study, the nematodes are studied with an Orbitrap Elite that is running 480,000 resolution at the MS1 and 30k resolution at the MS/MS (which they refer to as "medium" resolution! man, I love this field!!!)

Now, you might think: 480,000 resolution? That's so slow, they'll never identify anything that way!  What did they see? The top 50 most abundant proteins? Well, they did a little better than that. This might be the single most extensive proteome of the nematode out there. Along the way they did phosphoproteomics and also worked out some of the key regulators of stress response in this important model organism.

Wednesday, September 23, 2015

Analysis of phosphopeptide enrichment strategies

About a year ago I had a great conversation with a scientist from Cell Signaling who described the work they were doing with differential phosphopeptide enrichment. Now they have some figures up that describe the awesome work they've been doing!

If you are still using the FACE technique or a series of different enrichment strategies leading up to FACE you might want to take a step back and think about what you want to get out of your samples. Is a generic anti-phosphotyrosine antibody still the best for what you want out of your analysis? If you are interested in pathways, for example, that preferentially use phospho-Ser, maybe there is a better option now than we had 5 years ago!

Tuesday, September 22, 2015

Proteinaceous -- Where to get resources for Prosight!

If you are looking for information on top down proteomics via Prosight or info on the Prosight nodes for Proteome Discoverer, you need to check out

Here is a direct link in case you have as much trouble spelling it right as I did...

Thursday, September 17, 2015

Nature Milestones mass spectrometry!

Shoutout to David Kusel for the link for this one! Is there anything about the history of mass spectrometry that you've ever wanted to know? This Nature Milestones project probably has it covered. This was compiled by a huge list of authors who all seem to know at least a little about this field and is written to be accessible to everyone.  It would make a great reference for our customers or collaborators who aren't really sure what magic we're doing in our noisy rooms!

Wednesday, September 16, 2015

LC-MS/MS applied to directly study DNA damage in Wilson's disease

This new paper in press from Yang Yu et al., at MCP is fascinating for a ton of reasons. First one, I have never heard of Wilson's disease and I had to read up on it in this Wikipedia article. In a nutshell, it's a recessive genetic disease. If you get stuck with two copies of the gene because your traitless parents both carried it, then you accumulate excessive copper in your system. This copper messes with your liver and maybe your brain and it is somewhat subtle and very difficult to diagnose. Sometimes you have oddness in your eyes that is indicative, as shown above.

Another reason this is fascinating? They detail a painstaking method of directly analyzing DNA damage via LC-MS. The introduction of stable isotopes leads to an absolute quantification method via triple quad and ion trap mass spec. It is really a fascinating method because when we think DNA damage, we think about assessing downstream effects (got the right affect/effect this time, I think!). If I want to quantify DNA damage, I'm going with phospho-H2AX quantification or something like that. These guys cut out all the middlemen and go right to the DNA!

Tuesday, September 15, 2015

Biocrates -- QC'ed kits for metabolomics!

I'll not pretend to be a metabolomics expert, but it's super interesting, right? In terms of sample prep, they have it far far worse than we do. At least we know how to get most proteins in one process. Metabolites? That's a different story.

Biocrates is a company that hopes to make MS based metabolomics easy. They produce QC'ed kits that are specifically focused on clinically interesting metabolites. You get the sample prep kit, the conditions for the experiment and the software to process the data. What you need is the mass spec -- and you are doing full out metabolomics!

Currently the kits are optimized for triple-quads but they are in the process of getting these powerful tools validated for the Q Exactives!

Washington and Baltimore mass spectrometry club

Due to some changes in what I do during the day I now get to spend a whole lot more time in the same state where my house and my dog live.  In my exploration of the area for fun things to do, I lucked into a last minute chance to go to the D.C.-Baltimore mass spec club!

If you are around the area you should check this out. The website is here.

I learned a bunch of stuff!  I  got to meet a senior scientist at AP-MALDI and I think I'm going to get the chance to set up a source and do some imaging mass spec on a Q Exactive!!!

If you're around the area anywhere you should come and check this fun group out. I can't imagine missing another one of the meetings!

Monday, September 14, 2015

Proteome Discoverer 2.0 / 2.1 workshop in Vancouver!

If you are going to beautiful Vancouver for International HUPO, you might want to pop by the Proteome Discoverer workshop where you'll get to see the introduction of this guy!
Wait? What? We just started using PD 2.0...are you crazy? Yes, but that's beside the point. PD 2.1 is a follow-up package that looks just like PD 2.0 but better. There were features and improvements that were recommended by all you users out there that just couldn't make the 2.0 cut.  It's so good that I pretty much just use PD 2.1 for everything.

Here are the details!!  This is meant to be interactive, not "DEATH BY POWERPOINT". Bring questions, data, whatever. This is great software and we want you to walk out of there with the ability to generate better data!

Here are the details I have right now. I'll add more info as I get it.

Saturday, September 12, 2015

The effect of peptides/protein filters.

We're a field that loves to count things! As we've matured as a field the numbers have been a great benchmark for us. And they keep getting bigger all the time. Better sample prep, better separation technologies and faster, more sensitive instrumentation are making it possible to generate in hours the data that used to take days or weeks not all that long ago.

In papers where we detail these new methodologies, we see one type of protein count filter, and I think we see something a little different when we look at the application of these technologies. If you want to show off how cool your new method is, when you count up the number of proteins you found you are going to go with 1 peptide per protein, for sure. In my past labs we recognized those methods as great advances, but I sure had better have at least 2 good peptides before I justified ordering an antibody!

 I don't mean to add to the controversy, by any means. I think using one peptide per protein can be perfectly valid. Heck, we have to trust our single peptide hits when we're doing something like phosphoproteomics! Cause there's just one of them. And if I'm sending my observations downstream for pathway analysis, I'm gonna keep every data point available.  I just wanted to point out how the data changes.

I downloaded a really nice dataset the other day. It's from this Max Planck paper and uses the rocket-fast QE HF. I picked one of the best runs from the paper and ran it through my generic Proteome Discoverer 2.x workflow.

In 2 hours I get about 87,000 MS/MS spectra. If I set my peptide/protein filter so that any single peptide counts as a protein, I get 5,472 proteins from this run.

Now, if I apply the filter 2 peptides per protein minimum...

Ouch!  I lose over 1,200 proteins!  

Are they any good?

Okay, this is an extreme outlier, but this protein is annotated in Uniprot, so it's a real protein and it only has one peptide!  This is 92% coverage!  I didn't know there were entries this short in there.  If we went 2 peptides/protein we'd never ever see this one.

The best metric here probably is looking at FDR at the protein level.  (I did it the lazy target decoy way 1% high confidence/ 5% medium confidence filter)

It's interesting. Of these 1,200 single hit proteins, about 150 of them are red (so...below 95% confidence). Another 150 or so are yellow, but the remaining ~900 proteins are scored as high confidence at the protein level.

Okay. I kind of went off the rails a little. Really, what I wanted to take away from this is how very much using a 2 peptide count filter can affect your protein counts.  The difference between identifying 5,400 proteins and 4,200? That's a big deal and worth keeping in mind. Is your data going to be more confident if you require this filter? Sure. Are you losing some good hits? Sure, but it's your experiment and you should get the data out at the level of confidence that you want it!
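
If you want to see the counting logic spelled out, here is a minimal sketch. The (protein, peptide) rows and the function name are made up for illustration; this is not how Proteome Discoverer does it internally, just the same idea in a few lines.

```python
from collections import defaultdict


def count_proteins(psm_rows, min_peptides=1):
    """Count proteins that have at least min_peptides unique peptides."""
    peptides_by_protein = defaultdict(set)
    for protein, peptide in psm_rows:
        # A set ignores repeat PSMs of the same peptide sequence.
        peptides_by_protein[protein].add(peptide)
    return sum(
        1 for peptides in peptides_by_protein.values()
        if len(peptides) >= min_peptides
    )


# Hypothetical toy data: P2 is a one-hit wonder, P3 has one peptide seen twice.
rows = [
    ("P1", "AAK"), ("P1", "CCR"),
    ("P2", "DDK"),
    ("P3", "EEK"), ("P3", "EEK"), ("P3", "FFR"),
]

print(count_proteins(rows, min_peptides=1))  # every protein with any peptide
print(count_proteins(rows, min_peptides=2))  # only proteins with 2+ unique peptides
```

Note that the filter counts unique peptides, not PSMs, so seeing the same peptide ten times still only gets you one vote.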

Friday, September 11, 2015

pVIEW: tons of tools, including 15N (n15) quantification

I swear I wrote a blog post on this years ago. Seriously, but it took me forever to find this software and then re-remember how to use it.

pVIEW is a really nice piece of software. It does a ton of different things...including 15N quantification!

It is incredibly user-friendly. If you are using a Thermo Instrument, I highly recommend you download this tool as well:

What's it do? Well, you click on it and show it a directory. And then, without complaining and without any extra steps, it converts your data rapidly and perfectly to mzXML (not sure I capitalized the right stuff.) Then you can pull your data right into pVIEW.

pVIEW can be downloaded at the Princeton Proteomics and Mass Spec core website here.

Thursday, September 10, 2015

Run complete programs on any system with BioDocker! Wait...what's a BioDocker?

Okay. I'm going to be pretty excited about this whole thing, cause I knew about exactly none of this 20 minutes ago.

It is totally awesome that we have all these talented programmers and bioinformaticians out there writing interesting new code. A problem is that, just like any expert in anything, they start talking their expert-ese and it becomes hard for outsiders to figure out what they are talking about. I take things about proteomics (terms and such) for granted all the time even though I try very hard not to.

This is an acknowledged problem in their field: they can't reach users cause sometimes users don't know what a Perl thingy is.  Even worse, maybe someone assumes that you have that Perl thing on your PC because they've had it on every PC they've owned in the last 15 years.

An awesome effort is underway and it's called Docker. It's generic for everybody, but what I can understand of it is that it's a "container" for a program that includes all the requirements for running it. Say you need that Perl thing and some Perl add-in things, then they would be included in the Docker.

A more focused thing for us is BioDocker. Same goal, but specifically for bioinformatics type stuff.  Sounds great, right?!?

BTW, I'm learning all this from Yasset's blog.
Cause you know what? They've already constructed two awesome proteomics BioDockers.  The first is the all-powerful Trans Proteomics Pipeline and the second is the DIA-Umpire!

Is it simple enough that a dummy like me can use it? Actually...I think it might be...not without challenges, but it's getting there!

If it isn't 100% what we need/want right now, it's a great step in the right direction. Let's get all these awesome tools and put them into an easily digestible format. They get more users which hopefully translates into more grant justifications and more cool algorithms and we get better data!  Win win win!

Wednesday, September 9, 2015

Video of Dr. Makarov talking about every Orbitrap!

I just stumbled on this video and it's pretty sweet. It's Alexander Makarov talking about every Orbitrap -- from the classic all the way to the Lumos and the developments each one went through!  Worth the half-hour for me!

Tuesday, September 8, 2015

Macrophage S1P Chemosensing -- and an interesting way of integrating genomics and proteomics!

All this next gen sequencing data out there!  How do we leverage all of it to our advantage? We can supplement our databases for mutations and we can cross-reference our quan, but this new paper at MCP from Nathan Manes et al., out of the CNPU, describes a different twist on integrating next gen sequencing data with LC-MS/MS.

The model is also super interesting. The study investigates osteoclasts, the cells that destroy bone. During normal maintenance osteoclasts break down bone where appropriate and osteoblasts rebuild it. This is a tightly controlled process (involving chemotaxis), but one that is only partially understood. Dysregulation of this tight process leads to many different diseases, the most common of which is osteoporosis.

The focus of this study is the use of next gen sequencing technology and mass spec to explore that pathway. As a model they have some mouse cells that function like osteoclasts and they can add the right chemotactic things to activate them. Cool, right?!?

First they started out with the next gen sequencing following all the normal protocols (they did deep sequencing via Hi-Seq) to get a list of transcripts that were differentially regulated in a significant manner.

Then they went a different direction. They used an in-depth literature search to hunt down proteins that have been implicated in these pathways. Some of this info comes from other quantitative proteomics studies and some comes from genomics techniques. Why reproduce data that is already out there for free?  Strong protein candidates were filtered and heavy copies of good peptides were made to develop an absolute quantification method for SRM analysis for these targets.

To wrap it all up they took the results from their next-gen and from their absolute LC-MS quan and compared them (they compare strikingly well!) and then they dumped it all into a cool modeling program called Simmune that they developed that you can check out (and download for free) here!

Great, interesting study on an interesting model that uses some really original thinking and tools.

Monday, September 7, 2015

File migration...

Hey guys!  The following pages through the blog are currently down but are finding their way to new, awesome, permanent homes:

The Orbitrap Methods database is down completely
The Exactive family cycle time calculators are still available (email me: and I'll get them to you). My PPT tutorials are also down.
They might be down for a few days. The migration isn't a simple drop and drag but these new solutions should allow the documents to be accessible to more people...permanently...and will be free for me!

All videos are still up...and also migrating to duplicate locations..w00t!

ProtAnnot -- Highlight sequence variants that might explain your weird masses

In higher organisms proteins commonly have a ton of different forms. Splicing events are very happy to take a protein that has multiple functions and cleave out one of them to make a more specific protein. Of course...these cleavages occur at the genetic level and don't follow the same rules as trypsin. To detect these events with proteomics you have two choices -- the first is Top Down and the second is shotgun proteomics with a database that knows about the alternative sequences.

ProtAnnot is a new tool described in this open access paper by Tarun Mall et al., that is an add-in for the Integrated Genome Browser (IGB).  It highlights your alternative proteoforms within a sequence. I especially like the trick it does with data processing. So your normal session of IGB isn't interrupted in any way, if you choose to use ProtAnnot it fires up an extra thread on your server automatically to do its computations.

If you just can't get the masses of your protein to line up or get that last bit of sequence coverage, this tool might be exactly what you need.

Sunday, September 6, 2015

GOFDR! Analyzing proteomics data from the gene ontology level

Shotgun proteomics is amazing at identifying peptide spectral matches (PSMs). This is what we get out of the instrument: an MS/MS spectrum that we can match with high confidence to something in our database.  The tricky part is getting relevant biological data back out. Figuring out exactly which PSM belongs to which peptide and which peptide belongs to which protein is the hard part. Evolution is working against us here -- it is much easier from a biological standpoint to make proteins with new functions from a similar protein than it is to make a new one from scratch.

There are some really clever people thinking about other ways of inferring biological data out and I think we'll be hearing about a lot of it soon.  One new (to me!) approach is called GOFDR and it's from Qiangtian Gong et al., and is described in this new paper here.

The idea is this: cut out the middlemen. That is, we've got the PSM confidently identified. If it is from a conserved region of a protein, why would we bother going all the way through trying to infer which peptide and protein it is from? Chances are, if it's a PSM that matches multiple different proteins, those proteins are at least similar in their function. That's the gene ontology part.

Example: This drug leads to upregulation of this peptide that can be linked to one of 60 different actin variants? Who cares what one it is, it sounds like this drug has a cytoskeletal component!

That's the "GO" part. The "FDR"? It's cause that's the level where they want to apply the false discovery rates: at the gene [protein] ontology level.
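
Here is a toy sketch of the "skip the protein inference" idea, just to make it concrete. To be loud about it: this is NOT the published GOFDR pipeline. The peptide names, protein names and GO labels are all made up, and the function simply keeps the annotations shared by every protein a peptide could have come from.

```python
def shared_go_terms(peptide_to_proteins, protein_to_go):
    """For each peptide, keep only the GO terms common to all candidate proteins."""
    result = {}
    for peptide, proteins in peptide_to_proteins.items():
        term_sets = [protein_to_go.get(protein, set()) for protein in proteins]
        # Intersect across candidates; a peptide with no candidates gets nothing.
        result[peptide] = set.intersection(*term_sets) if term_sets else set()
    return result


# Hypothetical example: one peptide shared by two actin-like proteins.
peptide_to_proteins = {
    "PEP1": ["ACT_A", "ACT_B"],
    "PEP2": ["KIN_1"],
}
protein_to_go = {
    "ACT_A": {"cytoskeleton", "muscle contraction"},
    "ACT_B": {"cytoskeleton", "cell motility"},
    "KIN_1": {"kinase activity"},
}

go_by_peptide = shared_go_terms(peptide_to_proteins, protein_to_go)
print(go_by_peptide)
```

In this toy case the shared peptide still says "cytoskeleton" even though we never decided which of the two proteins it belongs to, which is the whole point.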

Is it simple in this form? Not at all. To run this pipeline the data is run through multiple programs, including PSI-BLAST. At the end they see that they really have to spend time manually adjusting their scores and thresholds. Is it an interesting way to look and to think about our data? Absolutely.

Saturday, September 5, 2015

Friday, September 4, 2015

Wanna MALDI at half-million resolution?

I don't have a ton of MALDI experience. A little here and there, but I've always found it fascinating. I ran one a few times in grad school and I was very turned off by having to calibrate it constantly throughout the day. That might have been the start of my TOF hatred, come to think of it...

What I want is MALDI on a modern Orbitrap, like the Fusion above.  That's the MassTech APMALDI-HR. Some friends of mine got to mess around with one and what I hear is that it is excellent!

You can check it out here!

Thursday, September 3, 2015

Intelligent optimization of search parameters for best possible data!

This is a very heavy and extremely interesting paper from a search algorithm optimization perspective. Oh, the paper in question is from Sonja Holl et al., and is available here.

You should probably read it yourself (it's open access!) but I'm going to stumble through my layman-level interpretation of what I just read over Holiday Inn coffee that I think was some sort of homeopathy caffeine experiment...

The paper is essentially a meta-analysis of 6 data sets from three different types of instruments. Some come from ion traps, some come from Orbitraps, and some come from something called a Q-TOOF ;)

The goal of the study was to see how much changing the search parameters in a guided way would improve or hurt the results. And it's kind of drastic. What they came up with is a new optimization platform for a big and super interesting project called Taverna (will investigate!). The optimization platform in Taverna looks at your data and determines which search parameters you should be using for ideal levels of high quality peptide spectral matches (PSMs).

The Taverna optimization platform looks at a number of variables including mass accuracy, isotopic distributions and more peptide-centric parameters like missed cleavages and enzyme fidelity. Up to this point, I was wondering why someone would re-write Preview....but then they make a sharp right turn and incorporate retention time prediction into the algorithm!  Interesting, right?!?

Another interesting plus? It appears to be designed for server level applications!  A nice read even if your neurons aren't firing all the way!  Now it's time to figure out what this Taverna thing is all about!

Wednesday, September 2, 2015

Wanna know what's going on in poplar tree proteins?

If you spend a lot of time climbing trees, chances are you hate poplar trees. Wait. What I mean is: if you climbed a lot of trees as a child (because well-adjusted adults don't climb a lot of trees, of course!), chances are you hate poplar trees.  They grow too fast and the branches aren't nearly as strong as their width might suggest.

However, some enterprising geneticists chose a poplar tree (the western balsa wood poplar; sounds strong, right?) rather than some more appropriate climbing tree as the first one to have its genome sequenced a few years back.  This, of course, opens the poplar tree to proteomics!

In this new paper (ASAP at JPR) from Phil Loziuk et al., and linked to some guy named Muddiman, this team does a disturbingly thorough job of proteomic characterization of this tree.  They first section the internal areas of the tree into whatever passes as tree organs and then use multistage fractionation and optimized FASP to end up with nearly 10,000 unique protein groups identified on a Q Exactive. The goal of the study was to hunt down transcription factors involved in cellulose production, which is never easy to do thanks to their low copy numbers.  But when you get plant proteins down to the 10K unique level, you are going to be able to find just about anything, including transcription factors.

They pick the most interesting proteins by tree organ and develop an absolute quantification method that can be used routinely to assay the levels of the proteins most deeply involved in cellulose production.

I like this paper because it's such a good story. "We set out to understand more about tree growth because it's very useful for the lumber industry scientist...and here is a nice assay you can use." It really highlights how we can sit down with a scientist with a unique problem and apply our existing tools all the way to a solution.

Oh...and the sample prep/fractionation method is pretty interesting as well!

Tuesday, September 1, 2015

Experimental Null Method!

This one is really interesting and an idea that I like more the more I think about it.  The authors capture the idea really well in the first picture.

In general, the idea is this: it's hard to find the biomarkers because we see hundreds of thousands or millions of things in a standard peptide ID experiment. So we eliminate a ton of stuff from contention by assessing our entire system variability (sample prep, LC, mass spec, data processing) by comparing two control groups to one another. This gives us a baseline to go on. Then the stuff that's weird in our experimental sample can be considered to have some validity if it exceeds the total variation limit within our experiment.

Now. The big question in my mind is how do I easily do this and automate it. Cause the Qu lab has a pretty great bioinformatician or three...and I don't.... but I'm gonna give this one a whirl later with some commercial tools!
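
For what it's worth, the control-vs-control idea can be sketched in a few lines. This is my own simplified take and not the statistics from the paper: the intensities are invented, the 95% cutoff is an arbitrary assumption, and the function names are hypothetical.

```python
import math


def log2_ratios(numerators, denominators):
    """Per-feature log2 ratio between two matched lists of intensities."""
    return [math.log2(n / d) for n, d in zip(numerators, denominators)]


def null_threshold(control_a, control_b, quantile=0.95):
    """Absolute log2 ratio that covers the given fraction of control-vs-control ratios."""
    abs_ratios = sorted(abs(r) for r in log2_ratios(control_a, control_b))
    index = min(len(abs_ratios) - 1, math.ceil(quantile * len(abs_ratios)) - 1)
    return abs_ratios[index]


def flag_changes(control, treated, threshold):
    """Indices of features whose treated-vs-control change exceeds the null threshold."""
    ratios = log2_ratios(treated, control)
    return [i for i, r in enumerate(ratios) if abs(r) > threshold]


# Invented intensities for four features.
control_a = [100.0, 200.0, 300.0, 400.0]
control_b = [110.0, 190.0, 310.0, 390.0]
treated = [105.0, 400.0, 290.0, 395.0]

threshold = null_threshold(control_a, control_b)
flagged = flag_changes(control_a, treated, threshold)
print(threshold, flagged)
```

The two control groups set the bar for "normal wobble" in the whole workflow, and only features that clear that bar in the treated sample get flagged.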

Shoutout to @pitman_mark for the heads up on this cool paper!