Thursday, April 30, 2015

How to import Proteome Discoverer results into Ingenuity Pathway Analysis


I swear I wrote this up somewhere before or made a video, but I can't seem to find it.  Here goes.  Ingenuity Pathway Analysis (IPA) is a great program for figuring out what is happening in your samples.  It was originally designed for microarray analysis but has grown over the years into a fantastic tool for all sorts of -omics studies.  I used it for years in my previous life as a government researcher and I'll recommend it to just about anyone doing quantitative proteomics.

Why? Cause it will find the missing proteins!  We know we aren't getting 100% proteome coverage in our studies, though many groups are starting to come close.  If you have 5 or 6 proteins up- or down-regulated, IPA can figure out what pathway those are most strongly correlated with and can point you to other interesting targets (or the low copy number regulator that might be causing your phenotype).  Of the many upsides of IPA, one is that all this data is manually curated.  The downside is that it's an awful lot of work for them, so the software isn't free.  You can try it out, though (30-day trials, I think...)

As I said, however, IPA was designed for genomics.  It works for proteomics, but your data needs to be in a genomics-friendly format.  I like the universal gene identifiers that are embedded in the Uniprot/Swissprot FASTA files.  There are other ways to go about this, but this is the easiest one I've found.  I'll show you how I do this in PD 1.4.  PD 2.0 is a little different and I'll highlight that as well.

I grabbed the first data file I could find on my hard drive on this sleepy morning (no coffee in my hotel room?!?! Seriously?!?!  Short pause while I write a scathing review of this place on Google Reviews...)  Okay! Here is the processed TMT 10plex file.



For simplicity's sake I went to the right corner and removed all the (for this purpose) unnecessary stuff, including the accession number, # peptides, # unique peptides, etc. I just kept the Uniprot description and the protein intensities (for label free) or, in this case, the ratios.

Next, you need to export your results so you can edit them in Excel.  In PD 1.0-1.4 you could right-click and "Export to Excel". This function was replaced with the one shown below in PD 2.0.


To keep this easy for me I'm going to stick to the PD 1.4 version. In PD 2.0 this process will be the same.

Open your Excel or text output in Excel.  What you need to do is parse the universal gene identifier out of your protein description.  There are smart ways to do this, or you can just use the "Text to Columns" feature to get your identifier into its own column.
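If you'd rather script this step than click through Text to Columns, here is a minimal Python sketch. It assumes the exported description column still carries the UniProt "GN=" gene field; anything without that field (those oddball mitochondrial descriptors, for example) just comes back as None.

```python
import re

def gene_symbol(description):
    """Pull the gene symbol out of a UniProt-style description via its GN= field."""
    match = re.search(r"GN=(\S+)", description)
    return match.group(1) if match else None

# Example with a typical SwissProt description:
desc = "Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2"
print(gene_symbol(desc))  # -> HBA1
```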



Keep in mind that in many formats mitochondrial proteins don't have the same descriptor fields that other human proteins do. The end result should look something like this.


At this point, I like to save this as tab-separated text. This ditches any weird formatting stuff hidden in your Excel file.  The text file is now ready to import directly into Ingenuity.  If you were doing relative label-free quan with the precursor ion area detector, you'd first need to divide your intensities to get ratios.
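That ratio step is one line if you script it. A hedged sketch with pandas (the file and column names here are placeholders; swap in whatever your export actually calls them):

```python
import pandas as pd

df = pd.read_csv("pd_export.txt", sep="\t")            # hypothetical exported PD table
df["Ratio"] = df["Area Treated"] / df["Area Control"]  # precursor areas -> ratio

# Keep just what IPA needs: the gene identifier and the quan values.
df[["Gene", "Ratio"]].to_csv("for_ipa.txt", sep="\t", index=False)
```

Writing the final file with a script also sidesteps the Excel date problem described below.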

Things to note:  Excel has this annoying issue with septin proteins.  It recognizes a gene name like Sep5 as "September 5th" and will change it to 9/5/15 or something.  You can avoid this by importing that column as Text instead of letting Excel guess the format.  There is something else dumb it does, but I forget what that is.

When you import into IPA, tell the program you have headers so it ignores the top row. Then identify the gene column as the target and the quan output as the conditions you want to study.  You can set up time courses or multiple comparisons this way.

Is IPA a perfect tool for proteomics? Nope, but it's getting better all the time. And if someone in your lab has a license for genomics research, it's a nice way to quickly see if anything cool is hiding in your data.

Wednesday, April 29, 2015

Human variation at every molecular level of control!


It's fun to sit back sometimes and reflect on the hubris of our species.  Hippocrates broke down all human body functions into these 4 humors, or temperaments.


Seems silly in hindsight, right?  But this was the prevailing thought process for more than a thousand years.  In my lifetime I feel like I've seen this occur as well.  Probably the most fascinating example was how the lay public (and politicians...) responded to the Human Genome Project.  Alright...so now...we have a list of 16k or 18k (or whatever...) genes that can make proteins.  EVERY time we really look, we find more and more variation and levels upon levels of control.  When I read things like this paper from Can Cenik et al., it highlights 2 things: 1) how freaking awesome the tools we have right now are for all this stuff, and 2) how little we know about anything.  Honestly, sometimes I think we're closer right now to Hippocrates' understanding of biology than we are to really, really getting it.

Wow, I could use another coffee.  Or more sleep.  Sirens apparently make me rambly...sorry...what was I going on about?  Oh yeah!

In this paper they take a look at a single cell type across a wide cohort of people.  They use every tool in the current book (exaggeration...there are so many tools!): ribosome profiling, RNA sequencing, and proteomics via mass spectrometry.  What did they find?!?! This mind-blowing fact: individuals appear to control the regulation of what they express, and when, at an INDIVIDUAL level.  What?

Consider it this way.  If you wanted to understand how and why I made a certain protein and compared it to how and why you made that same exact protein, you would find that the conditions might very well be different.  Here is a (probably inaccurate) analogy I just made up on the fly: if you and I were sitting in a box in the desert and it was slowly getting hotter, at some point we'd both start expressing heat shock proteins to protect our systems.  We might upregulate them at different times due to an unbelievable level of variation between our two systems.  Even if we did start producing these proteins at the same time, it might not be because of the same mechanisms!

Crazy cool, right?


Tuesday, April 28, 2015

Awesome post on increasing glycopeptide identifications


I'm a big fan of the Glyco MS blog.  The anonymous author of this blog obviously knows what he's talking about.  Evidence? Check out this recent post on increasing glycopeptide IDs.

On my way out of Baltimore this morning...


I'm on my way out of my chosen home of Baltimore this morning with a little time on my hands (finally!), so I'll be backdating some posts that I've been wanting to write but haven't had time for.  If you've been following the news you know it's been pretty exciting here...and sad.  Yes, this is a town with an awful lot of corruption and a recent history of discrimination and police brutality. Unfortunately, the attempts of thousands of people to bring this peacefully to light were subverted by a few hundred criminals who were looking to rob, steal, and destroy. It's a sad thing, but this certainly isn't the first time Bmore has gone through something like this and it'll all be fine again.  As for me, I don't think I'm really home again until late next week.  Hopefully the 10pm curfew will be lifted by then cause I'm definitely going out!

Back to the science!

Monday, April 27, 2015

In the MD/DC area? Wanna talk about Proteome Discoverer 2.0?


Sorry for the potato quality.  Here is the gist.  If you are in the Maryland or Washington D.C. area and you want to try to stump me with questions about PD 2.0 you should come to the NIH campus on Thursday May 28th.  We're going to start out with a RAW file and squeeze every bit of data out of it that I can using the new powers in PD 2.0 including (but not limited to!): multiple databases, Preview, Byonic, MSAmanda, ptmRS, protein markers and Annotation.  Then we're going to take some datasets with >1M spectra and show you how simple it can be to work with them now.

To register go to this link!

I'm working with local Thermo people in Boston, Atlanta, LA, and Indianapolis right now to get workshops going in those areas, and I'm trying to get some time off the road to do a series of online webinars. Information on these events will be coming soon.

Sunday, April 26, 2015

Second party emitters for EasySpray source



I'm not much of a chromatographer.  Yes, maybe I came from the lab of a world-famous separation scientist, but by the time I got samples they were pretty darned clean.  The best chromatography I've EVER gotten in my life has been with EasySpray columns. Viper connections mean I don't leave dead volume, the EasySprays are reliable, and the peaks are perfect just about every time.

However, I am aware that some people need alternative chromatography materials, and for them that may trump the risk of lower peak resolution.  If this is you, there are 2 solutions out there.  Dionex makes direct microspray emitters, and MSWil GmbH makes what they call a "JailBreak" kit that vastly increases the flexibility of the EasySpray source.  It allows ionization at the liquid junction or at the tip and lets any column be added to the system.  It also uses the in-source temperature controls already built in, so you can easily get heated columns.  I'd be interested to hear comments from anyone who has used this.

Shoutout to Vikrant for the heads up here!

Saturday, April 25, 2015

New PRIDE tools for meta-analysis of proteomics data.


Hey all you proteomics coders/bioinformaticians!  PRIDE has some sweet new tools for you and they are open-source Java!

You can check them out here.

Tuesday, April 21, 2015

How to make an ion trap with spoons and wires!




I'm kind of blown away by this...and I have no idea why I didn't see it until now.

Shoutout to Donna Earley for showing me this!

Monday, April 20, 2015

Is throughput your problem? Try double barrel chromatography



I visit a lot of labs. And sometimes I visit one that can keep up with the demand it's getting for samples.  Most of us, though, have a freezer or two full of things that we haven't gotten to yet.

Fabian Hosp and Richard Scheltema et al. might just be in that latter group, because they set out to build probably the highest-throughput method I've ever seen.  The secret is what they call "double barrel chromatography" coupled to the rocket-fast Q Exactive HF.


The trick really is to minimize all the dead time by having the LC pumps working on the next sample while the current one runs.  If you've got an LC with multiple pumps (any 2D-capable nanoLC), you can set up something similar to this and boost your efficiency way, way up.  They go the extra mile here and use a whole separate second emitter.  Double LC, double source, a crazy fast mass spec, and you might just be able to beat down that backlog of samples.

You can check out the paper here.

Saturday, April 18, 2015

Proteome Discoverer 2.0 Ultra deep search options

Alright!  I've been wanting to show you guys this for a while.  Here is the question:  In PD 2.0 we currently have access to a handful of different search engines.  What is the benefit of using multiple engines?

There have been several papers over the years where people compared searching the same dataset with Sequest + Mascot (or X!Tandem), and the crude summary I give people when they ask (from what I remember of these papers {go brain go!}) is this:  adding a second engine gives you somewhere between 5% and 20% new peptide identifications.  It depends on the settings you use, the mass tolerance of the instrument, the database, and so on.

Now, on my personal desktop version of PD 2.0 I have access to the search engines above. (I don't mean to exclude Mascot, but my employer's IT department cited 30+ reasons why I can't access the Mascot server from my home PC. I didn't read them, but I suspect they state things like: "our primary directive is to deter science at any cost and to personally annoy you." You know...the stuff every IT department says but actually doesn't say....man, I need another coffee.)

To keep it simple I say, "adding one engine will probably give you 10% more IDs. Adding a third engine will give you maybe another 5%."  Again, these numbers are based on those studies with Sequest, Mascot, X!Tandem, and OMSSA that I can't actually cite, but they are in this big filing cabinet somewhere.  But here, here we have 3 very different search engines (not to say the others aren't different, but MSAmanda and Byonic are new stuff, and so is the XMAn database).
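The bookkeeping behind those percentages is just set arithmetic. A toy illustration in Python (the peptide sets are fabricated stand-ins, obviously, not real search results):

```python
# Peptides confidently identified by each engine (made-up examples):
sequest = {"PEPTIDEK", "ELVISK", "SAMPLER", "PROTEOMICSR"}
amanda = {"PEPTIDEK", "ELVISK", "NEWHITK", "ANOTHERK"}

new_from_amanda = amanda - sequest  # the IDs a second engine adds
gain = 100 * len(new_from_amanda) / len(sequest)
print(f"{len(new_from_amanda)} new peptides = {gain:.0f}% more than Sequest alone")
```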

The question?  Is it worth my time to set up a search that uses all 3 engines?  What is the most efficient way to do it?  And what is the net result?

Sample:  1 µg HeLa digest, 2-hour gradient. Orbitrap Fusion operating in super speed mode (HCD ion trap MS/MS scans acquired while the Orbi gets the MS1s).  ~78,000 MS/MS scans.

Baseline processing:  SequestHT, UniprotHuman database, 10ppm MS1, 0.6 Da MS/MS tolerance, Percolator

Super processing method:  all 3 engines above. MSAmanda and SequestHT searched with the UniprotHuman + XMAn + cRAP databases; Byonic set up with the UniprotHuman database + all the modifications recommended by the Preview node for the same file.

Baseline processing results:  27,535 peptide groups; 5026 proteins

Super processing method results: 33,530 peptide groups; 5444 proteins.

6k new peptides!!!  (33,530 - 27,535 = 5,995 new peptide groups; 5,995/27,535 is roughly a 22% gain.)  w00t!  The punchline?  I think I can dig deeper.  Honestly.  I short-changed Byonic pretty badly.  It only got to search Uniprot, AND only with the modifications that Preview recommended from its quick scan of the highest-intensity peptides.  If I wanted to do this right, I would (will! this is cool!) allow Byonic to use the same databases, search for more PTMs, and open up the wildcard search capabilities.

I'll queue up some more stuff and then go find something outside to climb.

Friday, April 17, 2015

Happy Friday!



I promise some real science is on the way soon!  Have a great weekend!

Thursday, April 16, 2015

Get all of your glycoprotein biomarkers in a single enrichment

Yes, I know my focus has been on glycos recently.  It's cause that's where this whole field (particularly the cancer people) seems to be going!

Check out this quote from Esther Sok Hwee Cheow et al. in this new paper in press at MCP:  "The most clinically useful protein biomarkers in cancer medicine are glycosylated molecules..."

In plasma you have soluble glycoproteins and insoluble ones that get stuck in lipid vesicles.  Since the available methods seemed to capture only one or the other, this group came up with an entirely new method that pulls down both families of glycoproteins at once.

The LC-MS analysis was performed after de-glycosylating everything with PNGase.  The glyco field seems to go 2 ways on this topic (to deglycosylate, or not to deglycosylate and run with ETD).  Since the LC-MS analysis was performed on an LTQ FT Ultra, they had to go with the former.  I tend to have some reservations about this technique, but if you are doing your MS1 at 100,000 resolution (which they did)...I'm going to be okay with it.

All in all, a cool new technique that gets us closer to all the awesome biomarkers we know are in that plasma somewhere!

Wednesday, April 15, 2015

iFASP -- faster sample prep for isobaric mass tagging


This is a neat idea that my friend Trevor showed me at his lab this week.  iFASP is a technique that takes some steps out of the isobaric labeling process, particularly if you are processing your samples with FASP!  It was described in this work by Gary McDowell et al., here, and from what I can tell from the TMT 10plex data we saw, it works like a charm!

Monday, April 13, 2015

Proteome Discoverer 2.0 is out!


PD 2.0 is now officially released!

And....you're going to find it a little tough to use at first.

Here are some videos that will get you going.  They are rough drafts, but they should be a good way to get started.  I'll add this to the list of Pages on the right side of the blog, where it will say "PD 2.0 Videos!"  Please be sure to watch these in HD.  P.S. More are on the way!

1. Loading a FASTA database from your hard drive: https://vimeo.com/115595009
2. Creating a custom FASTA database: https://vimeo.com/115594400
3. How to use the Administration menu: https://vimeo.com/117010043
4. How to configure your Mascot server: https://vimeo.com/117042117
5. How to set up a simple peptide ID workflow in PD 2.0: https://vimeo.com/115592929
6. How to use the consensus steps in PD 2.0: https://vimeo.com/115593122
7. How to process TMT and iTRAQ MS2 data: https://vimeo.com/117051567
8. How to organize a reporter quan experiment in PD 2.0 (Part 1): https://vimeo.com/121393236
9. How to organize a reporter quan experiment in PD 2.0 (Part 2, the results): https://vimeo.com/121394251
10. How to process SILAC data: https://vimeo.com/115874330
11. Reprocess consensus report data in PD 2.0 (new version): https://vimeo.com/121897580
12. How to perform label-free relative protein quan in Proteome Discoverer 2.0: https://vimeo.com/117064754
13. How to set up a complex relative label-free quan experiment in PD 2.0 (multiple fractions): https://vimeo.com/121074081
14. How to use the Annotation nodes: https://vimeo.com/121397080
15. How to use the Proteome Discoverer 2.0 Daemon: https://vimeo.com/115875441
16. Set up a search with multiple search engines: https://vimeo.com/115594605
17. How to export data to search in another program: https://vimeo.com/63693490
18. Peptide FDR discussion (PSM validation nodes): https://vimeo.com/64465036
19. Protein validation nodes: https://vimeo.com/121399307

Friday, April 10, 2015

Another hurdle in top down proteomics surpassed by the Kelleher lab


We miss so much data when we digest our proteins.  The long-term goal is for us all to get to running all of our proteomics experiments top down.  Problem is, it's really, really hard.  There are hurdles all over the place.  Proteins degrade, proteins are hard to separate, proteins ionize poorly, proteins fragment unpredictably, and the data processing can be a nightmare -- to name a few.

One by one, the Kelleher lab has led the way in chipping away at these hurdles. Is it still hard to do top down proteomics?  You bet your sweet peppy!

One of the hurdles I mentioned above is separating proteins.  Thanks to in-solution isoelectric focusing techniques like the GELFrEE system, the separation of proteins is a whole lot easier than it has been in the past.  Problem is...that system (and things like it) uses SDS to get good separation.  And SDS is not mass spec compatible.  You've got to get rid of it.

Until now, the only way of getting rid of the SDS was to crash the proteins out of solution with a precipitation technique.  Have you ever done this?  It sucks. You precipitate the protein and it is never the same again.  You lose a lot of it because 1) maybe it didn't crash out efficiently (it's really hard to tell), or 2) it won't go back into solution again, or 3) you just turned all the proteins into a big gross mess. It sucks.  You tell me that you are going to have to precipitate your protein with acetone or something, and I'm going to assume what the mass spec is giving you isn't all that relevant to the biological system you had a minute ago.  Okay, maybe it isn't that bad...but sometimes it really seems like that. Did I mention precipitating proteins sucks?

But now?  How about an online separation technique where you just wash out the stupid SDS!?!?!  It exists.  From the figure above, it looks a lot like an online FASP system.  The proteins can't go through, but the nasty detergents are gone.  According to the paper it takes like 5 minutes. And then you can go right from a clever separation of your proteins to identifying what you have.

You can check it out in this paper from Ki Hun Kim et al., here.

Thursday, April 9, 2015

Is this the ultimate phosphopeptide separation study?


I'm not a chemist. I assume that this fact is abundantly clear from most of these blog posts. I'm a biologist. I need this chemistry stuff (particularly methods) set out in front of me in an easily digestible format.  Then the next time some other biologist asks me how best to set up their experiment, I can lead them directly to that great chemistry paper.

Some labs are very good at this side of things. Case in point: this new paper from Andrew Alpert, Otto Hudecz, and Karl Mechtler.  The study starts out by breaking down phosphopeptide separation chemistry so simply and clearly that you'd think it was a reference paper.  Simple and concise: at this pH, phosphopeptides stick to this material, and at that pH they don't.

Then! They go through step by step and compare every currently popular separation technique for phosphopeptide chromatography:

WAX (weak anion exchange)
vs
SAX (strong anion exchange)
and
AEX (anion exchange chromatography)
vs
ERLIC  (electrostatic repulsion-hydrophilic interaction chromatography)

Incidentally, this is essentially the title of the paper.  How clear is that?

"Hey Ben, how do you best separate phosphopeptides and why?" This is the paper that will be attached to my reply.

(The answer?  Oh yeah! It kind of depends on which phosphorylation sites you are interested in, but it looks like low-pH ERLIC is going to give you the highest number of phosphopeptides if you're just going for a bulk study.)

Wednesday, April 8, 2015

Man, do I ever love Preview


I don't know if I've ever really gone into my obsessive love of this little node in Proteome Discoverer. BTW, Preview is by Protein Metrics; it's free and easy to install, and it makes my life so much easier.

If you aren't familiar, you should check it out.  Preview tells me about my sample.  This is part of the front-page readout from something I'm digging through today:


It took a look and said, "You know what? This is a tryptic digest, and it's a pretty good one.  We missed about 10% of the Ks and Rs going through, but that's okay."

Then it says, "Hey, looks like you used iodoacetamide, cause about 99% of your cysteines have a +57.00whatever on them."

Then it says: "Man, you should really search for deamidation of N and Q; you'll get an extra 900 peptides if you do.  Oh, and oxidized Met will help as well.  And it looks like you maybe over-reduced/alkylated a little, cause you've got some artifacts from those 2 processes around, but nothing too bad. Just maybe use a little less heat next time or somethin'."

What a crazy useful tool!  The sample I'm looking at seems just fine.  Nothing to worry about there!
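The flavor of one of those checks is easy to sketch yourself (this is not Preview's actual algorithm, just the same back-of-the-envelope idea). Assuming a plain list of identified peptide sequences, the stand-ins below, the missed-cleavage rate is just counting internal K/R residues:

```python
def missed_cleavages(peptide):
    """Count internal K/R residues not followed by P (the tryptic rule)."""
    return sum(1 for i, aa in enumerate(peptide[:-1])
               if aa in "KR" and peptide[i + 1] != "P")

peptides = ["AEFVEVTK", "LVNELTEFAK", "AEFVEVTKLVTDLTK"]  # stand-in sequences
with_missed = sum(1 for p in peptides if missed_cleavages(p) > 0)
print(f"{100 * with_missed / len(peptides):.0f}% of peptides carry a missed cleavage")
```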

You can get Preview at the Protein Metrics page here.

Tuesday, April 7, 2015

Label free quan on a Q Exactive global proteomics sample!


Alright, this paper is really cool.  I know these days when people think global label-free quan (both inside and outside the field), they think of Data Independent Acquisition approaches like pSMART or SWATH. Despite some existing evidence to the contrary, I think I still fall in the camp where I'd prefer to see a high-resolution MS1 comparison between two different samples.  I see the power of DIA, obviously, but precursor ion intensity is a very valid quantification method when done correctly, and it makes me happy when I see evidence of people doing it right.

A great example is this new paper from Tali Shalit et al. that just came out in JPR.  In this study the researchers performed label-free quantification on a super complex sample using a Q Exactive.  How complex?  They added stuff to a HeLa digest and separated it in 1 dimension!  Using the same amount of HeLa each time, they spiked in different concentrations of an E. coli digest.  This gave them a great way to test the accuracy of their quantification over the full dynamic range of the instrument with known values.

How good is the Q Exactive?  Incredible, of course.  <10% CVs across the entire dynamic range, and measurements that compare accurately to the known values all the way down to the limits of detection.
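For reference, a CV here is just the standard deviation of a peptide's measured intensity across replicates, expressed as a percent of the mean. A quick sketch (the replicate areas below are invented, not the paper's data):

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: standard deviation as a percent of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

replicate_areas = [1.02e8, 9.7e7, 1.05e8]  # one peptide, three injections
print(f"CV = {cv_percent(replicate_areas):.1f}%")  # -> CV = 4.0%
```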

Now, one of the cool things they analyzed was the importance of the processing software.  The data is there in the RAW file, but not every piece of software is created equal.  In this analysis they looked at MaxQuant's iBAQ algorithm (I know, I probably didn't capitalize the right letters...I can hear my coffee machine violently extracting caffeine now...) and Expressionist.


Expressionist has come up in my conversations a few times recently, and the series of events goes like this.  Someone says something about Expressionist.  I write it on my hand or in EverNote or something.  Then I see the word later and Google it.  And I come up with an entire page of pictures like the monstrosity above. Then I remember that I've done this repeatedly and that, whatever it is, it is something I've looked up incorrectly multiple times.  Adding the term "proteomics" to the search gets me to the Genedata Expressionist page (here).

I haven't had a chance to check this software out personally yet.  I do like the tutorial videos and I've watched a couple of them (hey, I don't tell you what to do with your free time!). Whenever someone speaks in an accent that isn't my native one, I assume they are smarter than me. The guy doing the videos sounds super smart, so I just assume it is nice software.  But this is the first comparison I've seen of Expressionist vs. another software package.

In the case of this experiment, Expressionist appears to be a whole lot better than the label-free quantification options in MaxQuant.  This may be an artifact of the sheer complexity of this sample, but maybe not.  You'll have to dig through the Supplemental material yourself cause I've got other stuff to do today ;)

All around, this is a very nice paper and a great short coffee read!  Particularly if this is the kind of study that you want to make a focal point in your lab.

Monday, April 6, 2015

Double nanoLC source!


Somebody told me about this the other day.  I forget who, but then I found it in my LinkedIn stream.  This is a robotic double nanoLC source with a ton of cool features, including the ability to pop one column out while the other is running.  There is actually a setup in the PDF document that shows 3 nano-emitters, so instruments can tune or calibrate with direct injection (or seamlessly switch to direct injection for some experiments: quick intact protein masses, maybe!).

You can read more about this funky source here.


Sunday, April 5, 2015

New to proteomics? Here are some of my favorite videos!


This field is continuing to grow.  For those of us who have been doing this...forever...it's easy to forget that we are using too much jargon and assuming too much pre-existing knowledge.

On the right side of this page you'll find my new page, "Resources for Newbies in Proteomics".  I hope to add to it as time goes by.  I've found 2 great videos from the Broad (it's pronounced like "toad"), as well as some other resources.  I'll post more as I think of them.

To my veteran proteomics readers: please send me resources that you think would be appropriate for this page.  I'll happily add them.  Let's get these new people up to speed!


Friday, April 3, 2015

Why phosphoproteomics is still hard.


We've been talking about phosphoproteomics now for what, 10 years? 15?  Seems like it would be pretty simple by now, right?

If you've tried your hand at it recently after a long break, you'll find the tools have gotten better.  We have better enrichment strategies and faster, more sensitive instrumentation.  Round that off with better software for analysis and...yes....it isn't as impossible as it seemed a few years ago.

Truth is, though, it's not an easy experiment, and good phosphoproteomic studies still only routinely come from the very best labs.

For a breakdown of where the difficulties come from check out this cool paper from Solari and Dell'Aica et al., in last month's Molecular Biosystems here.

Thanks to Jon Humphries for Tweeting this great review!

Wednesday, April 1, 2015

Easily test the dynamic range of your instrument and your LC


I'm seriously not trying to muddy the water by telling you guys about all these quality control options out there.  I've told you what I use, and what I will continue to use: HeLa + PRTC (FTW!).  But I recently visited a lab that was working with Promega to develop this product and I thought it was really smart.  It is commercially available now, and my opinion hasn't changed.

Promega is calling this the "6 x 5 Peptide Reference Standard." As catchy as that is, the name is actually a useful description.  What you get in the kit is 6 peptides of varying hydrophobicity, similar to the PRTC peptides.  The interesting part is that you get different heavy isotopologues of each peptide at decreasing abundances.  6 LC peaks, but 30 peptides.
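That design makes checking your dynamic range almost mechanical. A sketch of the math, assuming ten-fold steps between neighboring isotopologues (check the kit insert for the real loads; the intensities below are invented):

```python
expected_fold = 10  # assumed step between neighboring isotopologues
measured = [2.1e9, 2.0e8, 2.2e7, 1.9e6, 2.4e5]  # invented areas, one peptide's 5 channels

for i in range(1, len(measured)):
    fold = measured[i - 1] / measured[i]
    print(f"step {i}: measured {fold:.1f}-fold vs expected {expected_fold}-fold")
```

Where the measured fold-change starts wandering away from the expected step, you've found the edge of your usable dynamic range.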

Want to step up the coolness factor just a little?  They wrote software for processing your runs that you can get (presumably for free; it looks like you just register, but don't quote me on that!), and that software accepts both Thermo and Sciex data formats.  Great ideas, right?!?

You can read more about the 6 x 5 here.

Credit goes to Katie for sending me the link to the commercial product!