Wednesday, February 25, 2015

Bioconductor for proteomics

I've been meaning to write this one for a while! Up to about a year ago R was a mysterious thing to me.  There was a class on it I could have taken in grad school, and my roommate, an ecology statistics guy (who has an awesome blog here, btw), used it all the time.  Honestly, though, I didn't really know what the hell it was.  Now, I definitely cannot claim to be an expert, but I've taken a class and a half on R and what is out there is pretty mind-blowing.

This is how I'd describe it:  R is this free program that serves as a framework for other programs.  You begin by running R and get into its interface.  Then you load other programs (as far as I can tell, all of them are free) inside R.  You need to know some stuff to use R, like a list of simple phrases to tell it to do things.  For us old guys, it kind of resembles DOS (actually it most reminds me of the Commodore 64 interface).

It gets a little more complex than this, mostly because of how many people use R.  Packages exist for business and sports statistics, for astronomy, for everything.  One project of specific interest to us, however, is Bioconductor, a whole framework for the biological sciences.  Inside Bioconductor there are, as of this morning, nearly 1,000 free programs (called "packages") that either come with the core installation or can be added to it.

A simple search through Bioconductor for proteomics gives us these packages (click should expand):

The next obvious question would probably be "why do these free programs have advantages over some of the other stuff out there?"  Check out the title of the second column:  Maintainer.  In general, R packages don't just get written and forgotten after the reviewer checks them out.  One of the authors is responsible for making sure the package you download isn't corrupt, stays compatible with the newest versions of R, and gains functionality as new features are added.  This is in theory, of course, but you have the name of someone to email if you can't get it to work.  Pretty cool, right?

I was reminded to write this entry the other day thanks to this great paper by Laurent Gatto et al.

Friday, February 20, 2015

Is DIGE back?

Around the time I started my second postdoc, my lab was in the transition away from 2D-DIGE.  I started a SILAC project, did some TMT, and the lab never went back.  My predecessor did some great stuff with 2D-DIGE, though, but I only worked on one study as we overlapped and control of the LTQ was handed over to me.  Considering that it was a technology we used, but then didn't use again, I guess I considered it outdated.  There are obviously some drawbacks, such as limits of detection and proteins of extreme hydrophobicity/size/pI, but there is tons of good data that came from this.

G. Arentz et al. would like us all to know that DIGE isn't dead.  And there have been developments that have improved the technique that we may not have noticed.  Especially if you have all the stuff sitting around to perform DIGE, you might want to take a look at this paper.

Thursday, February 19, 2015

Know someone who doesn't think mass spec can be reproducible and accurate? Send them this paper!

Man, this is awesome.  What is it that we hear about proteomics from the naysayers?  Interlab variability?  Poor reproducibility?  Next time you hear that, send them this paper from Sue Abbatiello, Birgit Schilling, and D.R. Mani et al. (in press and open access at MCP!)

What is it?  A concerted freaking effort that shows, conclusively, that if we eliminate the dumb variables we're all always messing with, use the same prep and instrument methods, and use good quality control, we can get precise and reproducible quantification of peptides from human plasma -- in any lab.  The proof?  11 labs, 14 different LC-MS systems.  The same data!!!

And they didn't go after albumin for a "proof of concept".  They quantified 27 different cancer-linked proteins and controls.  Something around 125 peptides!

One really interesting observation is that they did abundant protein depletion -- and were still reproducible.  A lot of previous observations had suggested that doing abundant protein depletion wreaks havoc on your plasma samples.  There is some solid data here that pointedly refutes that.  Sure, you can introduce variability if you do a lazy job depleting your plasma, but it looks like you can also do a great job (probably following the same protocol helps...)
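Reproducibility in a study like this usually gets summarized as the percent coefficient of variation (CV) of each peptide's measured abundance across sites. A minimal sketch of that calculation -- the peak areas below are made up for illustration, not taken from the paper:

```python
# Percent CV = (standard deviation / mean) * 100
from statistics import mean, stdev

def percent_cv(values):
    """Percent coefficient of variation of replicate measurements."""
    return stdev(values) / mean(values) * 100

# Hypothetical peak areas for one peptide measured at five sites
areas = [1.02e6, 0.98e6, 1.05e6, 0.95e6, 1.00e6]
print(round(percent_cv(areas), 1))  # → 3.8
```

Note that `stdev` is the sample standard deviation, which is what you want when the labs are a sample of all possible labs.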

I cut this image (please don't sue me...this thing is open access and I should eventually learn the details of what that really means) out of the paper!  Digestion and processing at individual sites!

Yes, I'm over excited.  Yes, this is the first paper I've read after opening a terrific chardonnay.  But that doesn't dilute that this is an incredible, thorough, and impressive paper from some top notch researchers at some great labs.  If you're into validation, I can't recommend this paper enough.

Tuesday, February 17, 2015

Impact of regulatory variation from RNA to Protein

Ever tried to compare your mammalian transcript quantities to your protein quantities?  Even if you harvest the same cell lines at the same time, if you are expecting it to match up...

The discussion has mostly drifted toward the easier arguments, like which measurement is better (protein, obviously!).  The question of why we have all this variation between transcript and actual protein expression levels is a much tougher one.  Tough enough that the first decent swing at it made this month's Science.

In this paper from Alexis Battle et al., the authors take a look at gene products at three levels:  messenger RNA, ribosome profiles, and protein quantification (using SILAC).

What did they find?  Tons of stuff.  Holy cow, this is the most dense 3 page paper I've dug through in quite a while.  High level analysis, though?  At the top, they identified something they call "cis quantitative trait loci" that appear to function to minimize the effects of differential mRNA expression at the protein level.  That is, these things exist to make sure that even when the mRNA levels say "stress!  make 1,000 times more of this stuff!!!" these cis things make sure that the protein copies actually produced stay within normal biological limits.

Again, this is a seriously deep paper.  There are observations backed with heavy statistics on the ability to distinguish individual sample variation at each of these levels (including SILAC!). Another interesting point of this paper is that we're looking at what can happen when these next-gen genomics scientists apply their statistics to our field...

 I highly recommend you check this out!
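If you want to put a number on that transcript-protein mismatch yourself, the usual move is a rank (Spearman) correlation between the two abundance lists. A toy sketch with made-up numbers, stdlib only (Spearman computed as Pearson correlation of the ranks; assumes no tied values):

```python
from statistics import mean

def rank(values):
    """Simple 1-based ranks (no tie handling -- fine for this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = float(r + 1)
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical transcript vs. protein abundances for six genes
mrna    = [10.0, 200.0, 35.0, 80.0, 5.0, 150.0]
protein = [12.0, 90.0, 40.0, 300.0, 8.0, 60.0]
print(round(spearman(mrna, protein), 2))  # → 0.83
```

Rank correlation is the common choice here because transcript and protein intensities live on wildly different scales.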

Monday, February 16, 2015

Great discussion on shotgun proteomics false discovery rates from BioCode's notes

Oh FDR, how I love thee....and how I wish I'd taken more maths in college....

In BioCode's Notes, one of my all-time favorite blogs, Yasset walks us through his thoughts on shotgun proteomics FDR, in particular, how we can look at FDR with tools readily available in Perl.  This isn't an early-morning read, at least not for me...this is one of those post-lunch, full-of-caffeine reads.

You can check it out here.
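The core target-decoy idea behind shotgun FDR fits in a few lines. A minimal sketch (Python here rather than Perl, and the scores and threshold are made up): the estimated FDR at a score cutoff is the number of decoy hits passing it divided by the number of target hits passing it.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold via the target-decoy approach:
    decoy hits passing the cutoff approximate the number of false
    positives hiding among the passing target hits."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0
    return decoys / targets

# Made-up search engine scores
targets = [55, 48, 42, 40, 38, 33, 30, 22, 18, 15]
decoys  = [35, 28, 21, 17, 12]
print(target_decoy_fdr(targets, decoys, 30))  # 1 decoy / 7 targets ≈ 0.143
```

Real pipelines layer refinements on top of this (separate vs. concatenated decoy searches, q-values), but this ratio is the heart of it.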

Saturday, February 14, 2015

It might be time to take another look at PEAKS

I had an insanely busy but extremely productive week at the Mayo Clinic.  One of the highlights of the trip was the chance to pop in on my good friends at the MPRC.  The MPRC has that perfect combination of things that we all want where we work:  access to cool samples, cutting edge instrumentation, and a load of top notch scientists to run everything.  Popping in is like Xmas to a nerd like me.

One of the many things that I learned in my few short stops is that I really need to find some free time somewhere to check out PEAKS 7.  I know a lot of you out there have a copy of PEAKS sitting around.  In the earlier versions you could argue that the interface was maybe a little wonky or the FDR didn't seem quite right.  Honestly, I'd probably have said the same thing, but I really need to do de novo sometimes and PEAKS has always had the easiest interface out there.

I saw QE data run through PEAKS yesterday that knocked my socks off.  The PEAKS interface is more intuitive (and more useful!) and the data we were looking at was just perfect.  I don't know how the licensing thing works, but I think if you've got a copy you should see if you can demo the new version or get yours upgraded.  I think you'll be impressed.

Friday, February 13, 2015

sbv IMPROVER. Got time to compete?

Thanks @PastelBio for this super interesting link!

sbv stands for Systems Biology Verification.  What it appears to be is a consortium (this is the word I use when I don't know who is doing something cool and they've got a great website) that wants to improve how we do biology.

The way it works is they set up challenges, kinda similar to what ABRF sets up, but they do this for all sorts of biology. A biological pathway analysis challenge is currently underway, and they recently wrapped up a couple of phosphoproteomics studies.

Check it out here!

Thursday, February 12, 2015

Exact m/z of the Pierce PRTC peptides

I'm throwing these up as much for you guys as for me so I can reference this later!

Here are the exact theoretical m/z values for the Pierce Retention Time Calibration (PRTC) peptides.  Note that these peptides carry a heavy-labeled C-terminal residue, and the values below are the doubly charged [M+2H]2+ m/z, not MH+:

Sequence Theo. [M+2H]2+ m/z
1 SSAAPPPPPR 493.7683
3 HVLTSIGEK 496.2867
4 DIPVPKPK 451.2834
5 IGDYAGIK 422.7363
9 GLILVGGYGTR 558.3259
12 LTILEELR 498.8018
13 NGFILDGFPR 573.3025

Shout out to Tara, cause I'm pretty sure I stole this from a slide deck you made!
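Those values check out if you do the arithmetic: each PRTC peptide carries a heavy C-terminal Lys or Arg, and the table lists the doubly charged m/z. A sketch of the calculation from monoisotopic residue masses (the residue masses are standard values; the heavy offsets are the usual 13C6,15N2-Lys and 13C6,15N4-Arg shifts):

```python
# Monoisotopic amino acid residue masses (Da)
RESIDUE = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'L': 113.08406, 'I': 113.08406,
    'N': 114.04293, 'D': 115.02694, 'E': 129.04259, 'K': 128.09496,
    'R': 156.10111, 'H': 137.05891, 'F': 147.06841, 'Y': 163.06333,
}
WATER, PROTON = 18.010565, 1.007276
HEAVY = {'K': 8.014199, 'R': 10.008269}  # 13C6,15N2-Lys / 13C6,15N4-Arg

def prtc_mz(sequence, z=2):
    """m/z of a peptide with a heavy-labeled C-terminal K or R."""
    mass = sum(RESIDUE[aa] for aa in sequence) + WATER
    mass += HEAVY[sequence[-1]]          # heavy C-terminal residue
    return (mass + z * PROTON) / z

print(round(prtc_mz('SSAAPPPPPR'), 4))  # → 493.7683
print(round(prtc_mz('HVLTSIGEK'), 4))   # → 496.2867
```

Handy for sanity-checking your inclusion lists against the table above.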

Wednesday, February 11, 2015

MitoFates: Predict your cleavage products.

I'm currently processing the 16 files I ran overnight.  All human stuff. All pretty well characterized.  As always I find myself wondering things like "what the hell are all these MS/MS spectra that don't match anything!?!?!"

"Peptide match" on the Q Exactive allowed them to fragment, so they clearly have peptide-like isotopic distributions.  They have charges from +2 to +7, so I should be able to sequence them effectively.  It is a cancer cell line, so I could do some tricks like using the XMan database to find the known mutations, and that always pops up some new peptides.  I can run the file through Byonic and find PTMs and some novel mutations, but I'm still looking at a bunch of spectra that look nice but don't match anything.

A while back I was blown away when I saw a talk describing the well-known (not to me...) facts about how systematic apoptotic cleavage events can be.  Even in cell culture some of these cells are going to be dead or dying or whatever, so could that be some of it?

The biology gets even more complicated, but now you can drop your proteins of interest into the new MitoFates program.  MitoFates will then search your proteins for known cleavage recognition sites and generate a new FASTA for you.  So if programmed mitochondrial degradation is occurring, you'll be able to identify the peptides that are caused by those events.

You can read about MitoFates (in press at MCP) here.

And you can just go ahead and dump your proteins in here!
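To make the idea concrete, here's a toy sketch of what "generate a new FASTA from a predicted cleavage site" looks like. This is not MitoFates' actual code, just the concept: split a sequence at a predicted site and emit both products as FASTA entries (the header, sequence, and site below are all made up):

```python
def cleave_to_fasta(header, sequence, site):
    """Split a protein at a predicted cleavage site (number of residues
    in the N-terminal product) and return both pieces as FASTA text."""
    n_term, c_term = sequence[:site], sequence[site:]
    return (f">{header}_1-{site}\n{n_term}\n"
            f">{header}_{site + 1}-{len(sequence)}\n{c_term}\n")

# Hypothetical presequence cleaved after residue 8
print(cleave_to_fasta("DEMO_PROT", "MLRAAARSTESTPEPTIDEK", 8))
```

Searching your raw files against entries like these is what lets the cleavage-product peptides finally match something.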

Tuesday, February 10, 2015

Top tip -- fractionate your samples dozens of ways with your centrifuge

I just heard about this today.  Top Tip is a product that at first seems kind of boring.  "Fractionate your peptides without an HPLC?  Big deal!"  Then you take a look at the number of chemistries you can use.

SCX?  Check
SAX? Check
WCX (whatever that is?)? Check!
Two kinds of IMAC? Double check

The list of available resins is seriously a page long.  You can check it out here.  I haven't used them, but they got a good endorsement from a guy who is probably better at this proteomics stuff than I am.  And I'd rather pipette or spin to fractionate than set up an offline HPLC fractionation!

Monday, February 9, 2015

xiNET Cross Link Viewer

Okay!  I totally dig this one!

xiNET is described above in the screenshot that I cut from the main website.  You can pop over to check it out here.  Or you can just go to crosslinkviewer.org.

If you are one of those weird Bioinformatics-focused people, you can actually get all the source code to download and check out from GitHub here.

The paper from Colin Combe et al. is currently in press at MCP here.

Now that all the links are out of the way, what the heck does this thing do?  Well, you feed it a CSV file of your crosslinked peptides.  Presumably you found them using something like xComb or Byonic.  Then you give it your FASTA database (or it uses a recent Swiss-Prot), and it generates awesome graphics of the cross-links your mass spec found for you AND gives you some level of statistical significance for those crosslinks!  The paper works through some historic datasets and shows how well xiNET handles this data.

This is a nice new (and, surprisingly, easy-to-use -- at least in the web interface!) tool for anybody out there doing shotgun analysis of protein interactions.  If this describes you, you should definitely check it out!

Thursday, February 5, 2015

Why you should run your nanospray at lower voltage

This is pretty cool.  You know, we always try to set our nanospray voltage to the lowest value that will provide stable spray.  Why wouldn't we just set it higher?

Well, I think this great picture I got to take yesterday at the University of Wisconsin illustrates the point.  This is an EasySpray source running the same peptide (same scale) at different ESI voltages.  Check out the oxidation peak that shows up when you crank up the electrospray voltage to 2.35kV!  Crazy, right!?!
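For reference, that oxidation peak sits a predictable distance from the unmodified one: oxidation (on methionine, most commonly) adds 15.9949 Da, which shows up as a 15.9949/z shift in m/z for a z-charged ion. A quick sketch:

```python
def oxidation_mz_shift(z):
    """m/z shift caused by a +15.9949 Da oxidation on a z-charged ion."""
    OXIDATION_DA = 15.9949
    return OXIDATION_DA / z

print(oxidation_mz_shift(2))  # ~8 m/z away for a 2+ peptide
```

So for a typical 2+ peptide, look about 8 m/z to the right of your precursor to spot voltage-induced oxidation.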

Thanks go to Greg for collecting this data and letting me share it!

Monday, February 2, 2015

Omics Tools. Well-organized omics resources

This is just an incredible repository of info and direct links to tools.  No idea why I've not seen this before.  You can check it out here.