Sunday, November 30, 2014

On vacation

Just to let you know, I'm going off the grid for a while (rock climbing in France and Italy!!!).  I schedule posts on the blog to pop up at later dates, so there should be new stuff every few days.  The only issue is that comments require my approval, so none will post until after I return.

Didn't want you to think your insightful additions to this blog were being ignored.  They are, and will always be, sincerely appreciated!

Saturday, November 29, 2014

Using the ENCODE database to hunt down novel proteoforms?

Is this stuff ever going to get simple?  The answer appears to be a resounding "NO!"  We are some crazy complex creatures and the more I learn the more I realize that we are getting just a tiny bit of the biological picture with any technique we use.  Fortunately there are super smart people out there thinking of ways to integrate all of our tools so we can really get to the bottom of stuff!

I'm going to back up a little.  ENCODE is short for the Encyclopedia of DNA Elements and it is an amazing genomics resource at UCSC that has been ongoing since 2002.  ENCODE has been one way of trying to make sense of the wealth of DNA sequencing and expression data that has been rapidly building up out there.  You can learn more about ENCODE at these two pages (the original) (the new ENCODE portal).

Now, when we look at the genomics stuff, one of the big problems is that we know the starting material, either the DNA or the RNA transcripts present.  Like proteomics, or anything where we're making thousands, millions, or billions of observations, false discoveries are a problem.  And we can only score false discoveries based on what we currently know as true.  Man, am I mangling this post or what?

The reason I'm rambling about this is that this sweet paper in press at JPR took a swing at integrating data from ENCODE with proteomics data in an effort to expand on the C-HPP (the Chromosome-Centric Human Proteome Project).  While I'm simplifying this completely into the ground, the idea is: how many of these things we can't explain that have been scored as false observations in genomics can be explained by unmatched spectra from the proteomics run?

Turns out?  Unsurprisingly, maybe?  Quite a few!  If unmatched spectra are driving you crazy, you might want to check out this cool paper and see if this might help you explain some of them.

Friday, November 28, 2014

...kind of off topic, but an interesting study on data visualization

Sure...this is probably a little off topic, but if I stayed on topic I think this blog would be a whole lot less fun!

This is just a short blog post from a bioinformatician at Michigan State regarding different ways to visualize complex data sets.  In this example he uses the opening moves in a game of chess.  This was totally worth 2 minutes for me.  And I promise I'm not bringing this up as some sort of a statement regarding the quality of the data visualization in some papers I read recently.

A smiling pug!!!!

Monday, November 24, 2014

Cloud computing proteomics!!!

This is really cool.  In fact, if you read a paper today, I think it should be this one.  In it, these researchers detail what is possible if you use the Amazon Cloud to host aspects of the Trans Proteomic Pipeline for processing your data.  End results?  Thousands of files processed in hours, for pennies per file!  They really dig in, too, by showing different ways that this interface can be configured and operated.

The end result, though, is the killer.  They load 1,100 files from all sorts of different instruments, from an LCQ Deca through an Orbitrap Velos, and process them with 3 different search engines using the TPP-Amazon Cloud.  In the end it ran them about $80 to do so!  As a disclaimer, they start with pre-processed data (mzML or something), which would lower their overhead.
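Just to put "pennies per file" into numbers, here is a quick back-of-the-envelope sketch in Python.  It only uses the rough figures quoted above (about $80, 1,100 files, 3 search engines); the function name is mine and the real bill in the paper will obviously not divide this neatly.

```python
def cost_breakdown(total_usd, n_files, n_engines):
    """Return (cost per file, cost per individual search) in US dollars."""
    per_file = total_usd / n_files
    # Each file is searched once per engine, so the search count is files x engines.
    per_search = total_usd / (n_files * n_engines)
    return per_file, per_search


# Rough numbers quoted in this post, not exact values from the paper.
per_file, per_search = cost_breakdown(total_usd=80.0, n_files=1100, n_engines=3)
print(f"{per_file * 100:.1f} cents per file")
print(f"{per_search * 100:.1f} cents per search")
```

That works out to roughly 7 cents per file, or a bit over 2 cents per individual search.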

It would be interesting to see how another super fast processing computer, say a Proteome Destroyer, would compare in a head-to-head, given their sub-1 minute processing specs.

Sunday, November 23, 2014

FoldIt: Help other researchers by playing puzzles!

I know this is kinda old, but I had a great experience with crowd sourcing research recently and I realized I've never talked about FoldIt.  If you aren't familiar, this is a great (and crazy difficult) puzzle game where you solve protein folding problems.  FoldIt players have contributed to solving a slew of biological problems and were featured in Nature a few years back, as described in this article here.

If you are interested and better at puzzles than me, you should check out FoldIt here.  There is a new version out this month for Apple, Linux, and Windows.

Oh, and here is a great introductory video.

Friday, November 21, 2014

Easy membrane protein prep procedure

I can't describe this one much better than the image I borrowed does.  Membrane protein prep is a pain, even if you are using some sort of a subcellular fractionation kit (and the ones I've used have not given me incredible membrane protein yields).  This looks really promising, though, and (cheap and) easy!

You can check out this method in JPR's ASAP section here.

Thursday, November 20, 2014

More proteomic automation!

I'm obviously on a big automation kick recently.  But maybe the whole field is.  Last week I was at an incredible lab I don't think I can tell you about, but they had a GIANT robot that did all of their digestion and sample cleanup.  For those of us that can't afford that kind of thing, small time optimization with stuff that is already available may be the best option.

In this paper at JPR, the Jensen lab demonstrates such a process using disposable StageTips and something that looks like an EASY-nLC 1000.  Using tons of E. coli preps they demonstrate they can achieve CVs <10% with their technique.  While I'd rather have the giant secret robot (and maybe you would too), this might be a great paper to check out for automated proteomics processing in a more affordable format.
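If you want to check your own preps against that <10% number, the CV (coefficient of variation) is just the standard deviation divided by the mean, as a percentage.  Here is a minimal sketch in Python; the function name and the replicate intensities are made up for illustration and are not from the paper.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to compute a CV")
    return stdev(values) / mean(values) * 100


# Hypothetical peptide intensities from five replicate preps.
replicates = [1.02e8, 0.97e8, 1.05e8, 0.99e8, 1.01e8]
print(f"CV = {percent_cv(replicates):.1f}%")
```

Note that this uses the sample standard deviation (n - 1 in the denominator), which is the usual choice for a handful of replicates.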

Tuesday, November 18, 2014

Learn Byonic in a half hour!

Have you wanted to try out Byonic, but you were deterred by learning a new software interface?  Never fear, my good friends over at Protein Metrics have made their software even more accessible with great videos.

Sit through a half hour of these videos here and be an expert in this great software!

Monday, November 17, 2014

Great slideshow on proteomics data/metadata

I may have posted a version of this previously, but I just went through this and learned more stuff.  This slide deck will lead you through the proteomics pipeline as seen by bioinformaticians -- including some hazards we may not be considering.

You can link to this great resource here.

Sunday, November 16, 2014

Enrich phosphopeptides on your HPLC!

Phosphopeptide enrichment via HPLC?  Sign me up!  While maybe not the most novel idea in the world, this new paper in press at MCP out of Utrecht describes the feasibility of such an approach.  In this analysis, they use an Fe-IMAC column for enrichment.  In comparison to tip-based approaches they find massively reduced CVs and overall improved reproducibility.

And the yield does not seem to suffer.  Using an Orbitrap Velos they find ~9,000 unique phosphopeptides when loading directly onto a single dimension LC separation and were able to push it to ~14,000 when adding SAX and performing 2D separation!

Sunday silliness

Friday, November 14, 2014

Standardized method for fruit proteomics studies

Have you ever wanted to do proteomics on fruit?  How about dried fruit?  Have you ever wondered how hard it is to get peptides out of a banana?

If you answered yes to any of these questions, I have the paper for you!  This paper in Electrophoresis is a complete evaluation of a number of different extraction techniques for obtaining peptides from different fruit.  They start with bananas 'cause they are crazy hard, and the method they settle on works for everything they try!

Thursday, November 13, 2014

Extensive list of proteomics resources!

Hey guys!  I just want to share with you this great list that Pastel BioScience put together.  It is the most up-to-date and comprehensive list of proteomics resources that I am aware of.

You can directly link to it at the Pastel BioScience website here.

Tuesday, November 11, 2014

Another great QE HF paper!

This has been a big week for the QE HF.  Another paper is out detailing how great the newest QE is.  This one is from Christian Kelstrup et al., and out of Jesper Olsen's lab.  It appears to be open access, though I'm not sure if it's because it is still listed as ASAP at JPR.  You can find it here.

The results from this closely match the paper out of Max Planck in the improvement in PSMs, peptides, and protein groups. One interesting observation out of this paper is that they see a realistic sequencing speed of 20 Hz!  Pretty awesome.  I still haven't got to test drive one, but that day is going to come.

One other awesome note from this paper, brazenly stolen: all the currently existing Orbitraps and their speeds, listed in resolution per millisecond.

Monday, November 10, 2014

Affinity enrichment vs affinity purification

What a great couple of papers out of Max Planck this week!

In yet another incredible paper in press at MCP, Dr. Mann's lab explores the difference between affinity purification and affinity enrichment experiments and how we should maybe take a step back and evaluate how we've been doing things.

Here is the distinction:
Affinity purification -- you look for a specific interactor(s) of your protein of interest.  You do multistage separations to get it super clean.  Protein presence vs. protein non-presence in your control says it's an interactor.  Spend tons of time perfecting your sample prep.

Affinity enrichment -- you look for protein-protein interactions in a single-stage kind of way.  Pull down your protein of interest and its interactors.  Up-regulation of proteins in your pull down vs your control suggests that it is an interactor. (I'll add these to the translator).

End result?  Enrichment is just as good AND WAY EASIER from a sample prep viewpoint.  The argument they make?  We can do a pretty good job of getting a whole proteome in one shot, so of course we can do protein-protein interactions this way.

The whole thing was validated by pulling down yeast proteins and by scoring the number of known interactors that popped up.
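To make the "up-regulation in your pull down vs your control" idea concrete, here is a minimal sketch in Python.  Big caveat: this is a plain log2 fold-change cutoff that I swapped in for illustration, not the statistics the paper actually uses, and the protein names, intensities, and threshold are all made up.

```python
import math
from statistics import mean


def log2_enrichment(pulldown, control):
    """log2 ratio of mean intensity in the pull-down vs. the control."""
    return math.log2(mean(pulldown) / mean(control))


def candidate_interactors(intensities, min_log2_fc=2.0):
    """Return {protein: log2 fold change} for proteins enriched past the cutoff.

    `intensities` maps protein name -> (pull-down replicates, control replicates).
    The cutoff of 2.0 (4-fold) is an arbitrary illustrative choice.
    """
    hits = {}
    for protein, (pulldown, control) in intensities.items():
        fold_change = log2_enrichment(pulldown, control)
        if fold_change >= min_log2_fc:
            hits[protein] = fold_change
    return hits


# Hypothetical label-free intensities, three replicates each.
example = {
    "BAIT": ([8e8, 9e8, 1e9], [1e6, 2e6, 1.5e6]),
    "PREY_A": ([4e7, 4e7, 4e7], [5e6, 5e6, 5e6]),
    "BACKGROUND": ([1.0e7, 1.1e7, 0.9e7], [1e7, 1e7, 1e7]),
}

for protein, fold_change in candidate_interactors(example).items():
    print(f"{protein}: log2 fold change = {fold_change:.2f}")
```

The point is that background binders don't need to be purified away.  They show up in both the pull-down and the control at about the same level, so they fall out when you compare the two.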

You can find this paper (still in press and open access) here!

Sunday, November 9, 2014

Max Planck heavy analysis of QE vs. QE HF

Do you have a Q Exactive?  You need to read this paper in press at MCP out of Max Planck!  It is an awesome analysis of the original Q Exactive vs a Q Exactive HF.  If that wasn't awesome enough, it is also a full out analysis of how any Q Exactive works and how best to optimize them to squeeze every possible ID out of your complex shotgun analysis.  One of the authors on this great paper is some guy named Makarov, so you can probably assume that these numbers are pretty accurate.

You should be reading this paper right now, but I'll pull out some bullet points I made on the plane this morning and tried to put into some semblance of order:

For shotgun analysis on both instruments, they found the ideal target for MS/MS to be 1e5.  A great chart backs this up.
For the QE, they recommend the ideal isolation window is 2.2 Da.
The QE HF should be set at 1.4 Da for highest efficiency.
For a QE HF, the ideal gradient for maximum numbers of IDs on a 50cm column is 150 min.  They were not able to increase their IDs with a longer gradient!
At a gradient of this length, with a 2.2 Da window, 8% of MS/MS spectra have a second peptide in them of high enough intensity that it can be sequenced.  At a 1.4 Da window, this drops to around 6%.

What about the head-to-head?
Same gradient -- QE HF gets 48% more peptide IDs than the Q Exactive
Both instruments have a dynamic range of about 3 orders of magnitude, but the QE HF can reach some lower copy number proteins than the QE classic.
Both instruments get an efficiency on a HeLa digest of about 62% (of MS/MS have peptide spectral matches).
The cool new flatapole may reduce the ion beam by as much as 75% (way way more robust now!)
The overhead is much better on the QE HF.  It may be as low as 6ms!!!

Again, you should read this awesome paper.  QE or QE Plus or QE HF, there is awesome info in here that will give you more peptide and protein IDs!

Saturday, November 8, 2014

Automate your phosphoproteomics!

We get a lot of flak from people outside our field when it comes to reproducibility. Most of it is our fault.  The genomics people all prep their samples the exact same way using the exact same kits, QC'ed mass produced reagents and the exact same protocols.  We tend to be a little more artsy...and cheap... when it comes to sample prep.  Everybody has their own little variations and most of us will try saving $2 here and there where we can before we load samples onto the most sensitive analytical instruments on earth.  In the very near future we're going to have to standardize all this stuff or the genomics people are gonna be eating our lunches.

Within proteomics, phosphoproteomics is even worse in this regard.  This is mostly because getting phosphopeptides is just hard.  Even if you are using an easy kit like the Pierce spin tips, you can expect to spend your whole day getting samples ready for the instrument.  The number of steps you have to go through introduces a lot of chaos into your system.  It's tough reproducing those steps exactly, even yourself, on Monday and Thursday of the same week.  Impossible? No.  Hard?  Absolutely.

I'm having this sleepy rant because I'm looking at a cool paper with a really appealing solution.  Let's just automate the entire thing!  In this paper from Christopher Tape et al., in this month's Analytical Chemistry (and open access), these researchers demonstrate a fully automated phosphopeptide enrichment strategy that 1) speeds everything way up and 2) makes this process amazingly reproducible.

To validate this method, they use SILAC and spiked-in standards.  Post-enrichment, the well-to-well variability in their phosphopeptide recovery was as low as the variability in their SILAC ratios!

If you do phosphoproteomics or are interested in a good hint in the direction proteomics needs to be heading, I definitely recommend checking out this paper.

Monday, November 3, 2014

Quick FASTA conversion trick I just learned

Maybe y'all already knew this one, but I didn't.

Have you ever had a FASTA sequence that software just wouldn't recognize as a FASTA?  Like, you saved it as "XMAn.FASTA" and you knew that the sequences inside were fine, but you just couldn't open it?

Well, it turns out that Windows has this fantastic little default feature called "Hide Extensions for Known File Types" (Sorry, you'll have to click on the picture to see what I'm talking about)

In this case my copy of Windows 7 Pro identified my .FASTA as a simple .TXT file and to clean it up it went ahead and added a .txt to the end of my file...then hid the .TXT.

To find this feature and turn it off so it never makes you consider smashing your keyboard late at night ever again, simply go to the search bar and type "Folder Options" then turn off this idiotic option and hit Apply.  Then you can go to your folder and delete the stupid .TXT and everything should be working again!
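If you have a whole folder of files that got hit by this, you could also clean them up with a few lines of Python instead of renaming each one by hand.  This is just a minimal sketch: the function name and the list of FASTA-style extensions are my own assumptions, so tweak them for whatever your software expects.

```python
from pathlib import Path

# Extensions treated as "really a FASTA" (an assumption; extend as needed).
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def strip_hidden_txt(folder):
    """Rename files like 'XMAn.FASTA.txt' to 'XMAn.FASTA'; return the new paths."""
    renamed = []
    # sorted() takes a snapshot of the folder so renaming doesn't disturb the loop.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".txt":
            continue
        target = path.with_suffix("")  # drops only the trailing .txt
        if target.suffix.lower() not in FASTA_SUFFIXES:
            continue  # an ordinary .txt file, leave it alone
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    for new_path in strip_hidden_txt("."):
        print(f"fixed: {new_path.name}")
```

Run it from the folder with the misbehaving files.  It only touches files whose name ends in a FASTA-style extension followed by .txt, and it skips anything that would overwrite an existing file.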

Sunday, November 2, 2014

The NCI-60 proteomics project has awesome online tools!

I've mentioned the NCI-60 proteomics project a couple of times over the last year or so since the publication came out in Cell.  If you aren't familiar, the NCI-60 is probably the most well-characterized set of cancer cell lines in the world.  The National Cancer Institute picked these lines years and years ago and hundreds, if not thousands, of papers have been written comparing and contrasting them.  One of the most common uses is for drug sensitivity comparisons.  You invent a new chemotherapy agent and someone at the NCI is going to dose all of these cells with tons of different doses of your drug under different, clinically relevant, circumstances.

Despite talking about it a few times, I've never actually got onto the tools TUM has provided for analyzing this amazing resource.  I've had some time this weekend and they are really really good.  You can use the online web tools to look at comparisons of virtually any proteins against these lines.  You can break them down by chromosome or by function.  The tools are fast, simple, and intuitive.

Best of all, the file downloader is simple and fast.  I'm benchmarking some functions in Proteome Discoverer 2.0 and I wanted a big dataset.  So I downloaded one of the cell lines that they studied "in depth".  Minor minor criticism -- it would be nice if it was easier to separate the files that belong to each cell line.  If you want every MCF-7 fraction, for example, you need to scroll down and pick each one individually.  Told you it was minor!!!

And while I'm talking about the RAW data.  Whoever set up the Orbi Elite that acquired this data knew what they were doing!  Part of my day job is to look at other people's methods and see where a tweak here or there might improve their data.  I'm pretty good at that part.  I wouldn't change a thing in the files I downloaded from this resource.  That method is perfect for the chromatography conditions they used.

Anyway, if you haven't checked out this resource, I strongly recommend that you do.  You can link to it directly here!

Saturday, November 1, 2014

Smart and insightful review of quality control in proteomics

More quality control!!!  Maybe they just drilled this topic way too deep into my brain at Hopkins, but I just can't get enough quality control.  Particularly in this field (where we, universally, do too little...sorry...but it's true).

Michael Bereman took the topic apart in this review (still in press at Proteomics, pre-release available here).  He takes a very insightful and helpful step-by-step approach.  He focuses on the tools that are available and breaks them into three categories: manual extraction, automated, and real time.

I'm a big fan of the graphics that he uses to really streamline what he's talking about.

For example (sorry Michael and publishers...please don't sue me...I deliberately dropped the resolution...and not cause I'm sitting in a HILARIOUS honky tonk bar in the middle of nowhere Washington State...and I'm on 2G...or whatever...)

Anyway, the review is ridiculously clear and highlights the tools that are out there (most, if not all, are free!!) that can make you and your collaborators trust the shit out of your data.