News in Proteomics Research: January 2017

Tuesday, January 31, 2017

Did you miss the iQuan seminar series?

Recently the Thermo Center of Excellence had a series of online webinars called iQuan. If you missed them on the first go-around, I've got good news: they're available online now!

You can go directly to the videos here!

Oh yeah! Here are the video topics:

1. High Resolution Accurate Mass Peptide Quantitation

2. Best Practices for Peptide Quantitation On a Triple Quadrupole Mass Spectrometer

3. Pesticide Quantitation on the TSQ Quantiva

4. TraceFinder™ 4.1 Quantitation Workflow using High Resolution Accurate Mass Data

5. Q Exactive Series Maintenance and Calibration

6. Triple Quadrupole Maintenance and Calibration

Monday, January 30, 2017

ProteomeTools -- A library of 330,000 peptides!

Have y'all seen this one?!?

In a nutshell -- a bunch of awesome people (hey! I know a lot of these authors!) made 330,000 (ultimate goal over 1,000,000) synthetic peptides and fragmented them every which way in an Orbitrap Fusion.

You can download all the RAW data at PRIDE -- and shortly, you'll be able to download all of the spectral libraries!

Sunday, January 29, 2017

FINALLY!! SMART digest for proteomics!!

The SMART digest (previously called FLASH) works amazingly well for digesting commercial antibodies that...you know...are used cause they happen to work perfectly for mass spectrometry.... I've got files with 100% coverage off these kits from a couple of different antibodies. Fast, perfect, reproducible.

But...the question has been...does it work for other proteins?!?!

Check this out!!

Consensus? Yeah! It works for all sorts of proteins! Including ones that are hard to digest by other methods.

Friday, January 27, 2017

Loss-less nano fractionation -- all you need is a Spider!

Sometimes you just need absolute coverage of your sample -- an MS/MS spectrum (or 10) representing every theoretical peptide -- and then you find out you've only got 10 ug to work off of...which means you're pretty much stuck with long column 1-dimensional fractionation....

OR!!! You hook a SPIDER FRACTIONATOR up to an EasyNLC and get loss-less prefractionation, as described in this article in this month's MCP!

Nanoflow fraction collection? Into 96 well plates? Virtually no sample loss? Any other cool things about it?

You don't have to buffer exchange or anything -- you can fraction collect in your first dimension with C-18 as shown above, recombine your fractions in a way that ensures a relatively equal distribution of peptides on your analytical gradient and get maximum coverage without any dumb loss steps.
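The post doesn't spell out how the fractions get recombined, so here is a minimal sketch of one common way to do it (an assumption on my part, not necessarily what the paper did): round-robin concatenation, where fraction i goes into pool i modulo the number of pools. The function name and the choice of 8 pools are hypothetical; the 96 comes from the 96-well plate mentioned above.

```python
def concatenate_fractions(n_fractions: int, n_pools: int) -> dict[int, list[int]]:
    """Assign each collected fraction (0-based index) to a pool, round-robin.

    Early, middle and late eluting fractions end up in the same pool, so each
    pool spreads its peptides across the whole second-dimension gradient.
    """
    pools: dict[int, list[int]] = {p: [] for p in range(n_pools)}
    for fraction in range(n_fractions):
        pools[fraction % n_pools].append(fraction)
    return pools


# Hypothetical example: a 96-well plate recombined into 8 pools.
pools = concatenate_fractions(96, 8)
print(pools[0])  # the wells that make up the first pool
```

Each pool here holds 12 wells spaced 8 apart, which is the whole point: no pool is stuck with only the early or only the late eluters.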

Worth noting: The Spider fractionator is a beta test unit in development for commercial release by PreOmics and isn't quite available yet. You know, cause sometimes Dr. Mann gets access to stuff before the rest of us do....

How can you get away with calling this "Loss less" fractionation? Seems like an exaggeration, right? Ever fraction collected 1 (one!) ug of peptides, run the fractions and gotten 10,000 protein groups? I sure haven't, but...

Thursday, January 26, 2017

Are we in the golden age for public proteomics data?

This new paper in Cell Trends is just awesome!

For years, beautiful proteomics data has been accumulating in all sorts of awesome public repositories. You or your biologist counterparts have extracted out what you needed for that study and you've dutifully put it somewhere that other researchers can now get to it.

Relevant: the Nature family of journals now require statements of data access. There now aren't too many that don't! w00t!

There is a whole new field coming -- of bioinformaticians who re-mine proteomics data with new algorithms and draw new and better conclusions from hundreds or thousands of files in these repositories! These approaches are just good stuff for us. Their new algorithms are gonna trickle down to us and our first pass data is just going to get better and better!

Worth noting -- these authors do mention one of our weaknesses (worth stealing the whole screenshot because it is so well written!)

QC has come a long way recently! More and more labs are spiking in internal standards like the PRTCs (or running them between samples) but we're all gonna have to implement something rigorously. 'Cause it sure won't be a secret if we don't!

Wednesday, January 25, 2017

Why proteomics hasn't yet become the new genomics!

Sorry if I've posted this extremely insightful review on here before. DayQuil and short term memory don't go hand-in-hand in my experience. If I have, maybe it's just because it's worth a revisit!

It has seemed -- for what, at least 10 years? -- like we're on the cusp of becoming what genomics is to biologists. And we've had some awesome victories, but there sure are challenges.

It is only a couple pages, but cuts right through to the heart of it, including where MS is going in the biology lab now.

Tuesday, January 24, 2017

Improving UVPD in ion traps!

This has gotta be the year, right?!?! We've been hearing more and more about UVPD, to the point that I promised to stop writing blog posts about it. Guess that was an alternative fact, cause...check this out!

This new paper in JPR improves on the spectra obtained from the UVPD fragmentation data that you don't have!

One of the challenges that we'll see with the UVPD when some enterprising little startup gets rolling retrofitting our instruments with these systems (disclaimer: I don't know that this exists -- but...it totally ought to...somebody out there with the capacity to do this would like to set up a booth at ASMS 2017...and walk away with millions of dollars worth of orders...right...??) is that the fragmentation spectra can look like this ---

Taken from this abstract!

And...you know what is going to be fun from a data processing perspective? The amazing level of search space that is required to search all those possible fragmentation locations!!!

Do-able of course -- more do-able with high resolution accurate mass -- but even more do-able with this new methodology in this paper!

For the mass spec physics nerds -- it is an optimization of the resonance ejection to improve the S/N of the actual interesting (and searchable) fragment ions. For us biologists, it is better data out of the ion trap (that we don't have yet) faster!

For me -- it is a lazy blog post I put together in under 30 minutes while struggling my way out of this NyQuil/head cold haze! Wins all the way around!

Monday, January 23, 2017

Reminder -- Skyline Webinars tomorrow!!!

Reminder -- the first live Skyline webinar in a while -- and it looks like it's gonna be a doozie! You can direct link to register here! For us east coasters it looks like the second one is at 7pm EST!

X!Tandem Pipeline 3.4.1 -- a really smart strategy for grouping phosphopeptides!

Despite all the phosphoproteomics work that has been done -- working with phosphopeptides still kinda sucks. This new paper in JPR takes a whack at improving one of the pain points at the data processing level.

The strategy is worked into X!Tandem Pipeline 3.4.1 (which is freely downloadable here. Don't worry, the interface isn't in French!)

It employs a new (at least to me!) strategy for grouping identified phosphopeptides into "phosphoislands". There are many reasons why this is a good idea, but the one that is the most obvious is how we deal with miscleavages or peptides that show up as (for example) both +2/+3 species. At the proteoform and biological level they are representative of the same thing -- an increase in phosphorylation at this one site on the protein. If you've done much work with data processing of PTMs -- you know this isn't going to be how your output looks in any software I use. It is going to look like separate events and, while you can figure it out, it can be an annoying exercise!

By grouping these events into a single event -- a phosphoisland -- it takes a lot of work out of figuring out you're looking at upregulation of phosphorylation at this one site. Another advantage is when you don't have great evidence for the actual phosphorylation site on a peptide where multiple modifications could occur -- it looks like you could group those together as well!
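To make the grouping idea concrete, here is a rough sketch. It assumes (my simplification, not the published algorithm or the X!Tandem Pipeline code) that an island is just the set of phosphopeptides on one protein whose sequence ranges overlap, so charge-state duplicates and missed-cleavage variants collapse into one event. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhosphoPeptide:
    protein: str  # protein accession the peptide maps to
    start: int    # 1-based position of the first residue in the protein
    end: int      # position of the last residue (inclusive)
    charge: int   # precursor charge state


def group_into_islands(peptides):
    """Merge phosphopeptides with overlapping ranges on the same protein."""
    islands = []
    for pep in sorted(peptides, key=lambda p: (p.protein, p.start, p.end)):
        last = islands[-1] if islands else None
        if last and last["protein"] == pep.protein and pep.start <= last["end"]:
            # Overlaps the current island: add it and extend the island's range.
            last["members"].append(pep)
            last["end"] = max(last["end"], pep.end)
        else:
            islands.append({"protein": pep.protein, "start": pep.start,
                            "end": pep.end, "members": [pep]})
    return islands


# Hypothetical IDs: the same site seen as +2, +3 and a missed cleavage,
# plus two unrelated phosphopeptides.
ids = [
    PhosphoPeptide("P1", 10, 20, 2),
    PhosphoPeptide("P1", 10, 20, 3),
    PhosphoPeptide("P1", 10, 25, 2),
    PhosphoPeptide("P1", 100, 110, 2),
    PhosphoPeptide("P2", 10, 20, 2),
]
islands = group_into_islands(ids)
for island in islands:
    print(island["protein"], island["start"], island["end"], len(island["members"]))
```

Five rows in the output table become three events, and the first three rows read as what they are: one site, seen three ways.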

Worth noting -- this isn't the only power this snazzy interface has -- it has easy pull-downs to see your sequence coverage and it is grouping peptide/protein IDs really logically as well!

Sunday, January 22, 2017

Monitor your instruments with MS Instrument Connect on the Cloud thing!

Do y'all know about this thing? I think it might be a serious asset! I guess it has been around for a while for a lot of other instruments -- like the thermocyclers and things -- but it is now live for mass specs via MS Instrument Connect!

It runs through the Thermo Fisher Cloud and it has a simple user interface that allows you to monitor, in real time(!), up to 3 instruments from your PC and possibly from your mobile devices! (Can't confirm yet...but the App looks like it'll do it!)

What it does:

1) Overview of your instrument(s) performance and status

2) In depth analysis of the runs if you want to know exactly where things are.

3) Customizable email alerts if there are errors! Right?!? I don't actually want to look all the time, but I just want an email if the instrument stops (or once my Sequences complete!)

4) It isn't TeamViewer or GoogleConnect or other widely used tools (and therefore attractive targets for hacking...) Seems like these are being blocked at more and more institutions in my area....hacking sounds like a lot of work...is somebody gonna really want to spend a lot of time to get a few LC-MS labs? Seems less likely to me! (Scary screenshot from my newsfeed a while back...)

The JHU Center for Proteomic Discovery has been beta testing this for their Lumoses and I've been hearing good things.

There are some requirements:

You have to register with the Thermo Fisher Cloud.

Your instrument(s) need to be on newer Foundations/Xcalibur combinations.

And you need to download Applications to your network connected instruments and your devices you want to monitor them with. There are good instructions to walk you through setting it all up.

Saturday, January 21, 2017

You can't characterize transposable elements with proteomics!! Wait -- what?!?!?

I admit it -- I'm blown away by this one. To the authors of this study -- I owe you a round of drinks. Seriously. See you at ASMS?

1) You can't do proteomics of transposons!
2) You certainly can't do it on a non-model species without 100% coverage, 50x translational coverage and perfect annotation
3) You did it on the yellow fever mosquito? The vector of all sorts of murderous diseases?!?!?

You deserve a medal -- and a round of drinks. Warning: I might hug you. Kidding, probably!

This is the paper -- and everyone should read it (Open Access even? Of course it is...)

1) First off -- why is this a big deal? I'm glad you asked! The reason I said #1 above is that transposons are inherently chaotic from a genomic/proteomic sense. As this stolen image shows...

...the transposon moves around and interrupts things. There are systematic reasons/regions for this, but overall they are tough to deal with. If you're doing proteomics with your nice, curated UniProt database alone and you've had some jumping genes (transposons) in the proteome of the organism that you've actually digested -- well, you aren't going to find the area where the transposon messed up (okay...maybe if it has 2 copies...but that's another problem for another time...let's simplify it).

Hopefully it didn't land in the middle of your coding region or blow up a start or stop codon, but if it did you might not have MS/MS spectra to match your database (cause the genetics just don't match anymore).

2) There is a reason we develop new methods on E. coli and C. elegans and D. melanogaster -- cause we know just about everything about them. A piece of a gene from the model coliform bacteria ends up in the wrong place -- we can figure it out, probably. Outside of our model organisms, there is an awful lot of chaos.

3) This mosquito sucks. It is known to transmit at least 5 viruses -- and it is awesome enough that, on rare occasions, if it bites you once it can infect you with more than one virus. We need to know more about this thing!

Okay -- so how did they do this? They used a technique called PIT (Proteomics Informed (by) Transcriptomics). It is detailed in this Nature Methods paper from 2012 (from several of the same authors as this study).

In a nutshell -- they de novo sequence the RNA transcripts that they find with their fancy next-gen sequencing equipment. Once they have those, that is what they search their MS/MS data against. If you want to do this yourself, you need Galaxy (if you have a big genomics effort at your institution, chances are you already have a server loaded up with Galaxy programs...you may have to leave the safety of mass spec cave for a bit...the comforting sound of roaring vacuum pumps will be there when you return...not making fun of you guys...I'm with ya'!) and you need this GitHub package. Implementation of this into Galaxy is thoroughly detailed in this open! study from last year.
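If the "search against the transcripts" part feels abstract, here is a toy sketch of the core step: turning an assembled transcript into candidate protein sequences you could drop into a FASTA file for your search engine. This is not the authors' Galaxy workflow or the GitHub package; it is a bare six-frame translation with the standard genetic code, and the function names and the 7-residue minimum length are my own assumptions.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(transcript: str, min_len: int = 7) -> set[str]:
    """Return stop-to-stop translated stretches from all six reading frames."""
    seq = transcript.upper().replace("U", "T")
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    orfs = set()
    for strand in (seq, reverse_complement):
        for frame in range(3):
            for piece in translate(strand[frame:]).split("*"):
                if len(piece) >= min_len:  # drop stretches too short to search
                    orfs.add(piece)
    return orfs


# Hypothetical transcript that encodes MAKRELP followed by a stop codon.
candidates = six_frame_orfs("ATGGCTAAACGTGAACTGCCGTAA")
for number, protein in enumerate(sorted(candidates), start=1):
    print(f">transcript1_orf{number}\n{protein}")
```

Peptides that match this transcript-derived database but not the reference proteome are the interesting ones, because they point at places where the genome annotation and the actual biology disagree.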

...only inserted because it was far too many words in a row without a picture. Bernie looking indignant after being accused of stealing my shoe cracks me up every time (and made the front page of Reddit/r/pugs a while back!) Back to seriousness!

Let's go back to #2 and #3 from above -- using a method like PIT is gonna be a whole lot easier on a model organism...but we have a sequenced genome for this sucky mosquito! Why go to all the trouble of PIT?

...cause PIT shows that the genome of this organism needs an awful lot of work! Think about it -- we're flipping the paradigm here! Traditionally, the thought is that we can only identify peptides that are informed by our genomic sequence. Here -- we are leveraging the transcriptomics to give the power to take the MS/MS spectra and show where the genome is wrong! They can go into specific examples of these regions where the spectra and the genome don't match and figure out that this area of the genome, for whatever reason, had low sequencing depth -- or was misannotated or things. You know, 'cause the mass spec never lies.

On this topic -- and something that is only of very minor concern here is that the data was acquired on an Orbitrap Velos running in high/low. Is there a little more wiggle room in the peptide sequencing data because of the lower resolution/lower mass accuracy of the ion trap? High resolution acquisition of the MS/MS spectra as well might very well strengthen the findings of this beautiful paper, but on an OV you are going to take a hit in overall peptide sequencing depth and I can't disagree here that depth was more important.

Okay -- finally back to #1! The transposons! Transposable elements have characteristic regions cause transposases (I think that's what they are called) leave specific signatures behind -- I forget the details and I'm losing motivation -- this one is taking a loong time! The unbelievable sequencing depth this group has from the transcriptomics + proteomics allows them to find all of these (they call it the mobilome!---adding that to the translator for sure!) Some of those places where the genome and PIT disagree end up being inserted transposable elements -- and with information from both the T and P levels, it is darned convincing!

TL/DR: Amazing study with publicly available tools shows how proteomics/ genomics/ transcriptomics can be leveraged together to massively improve our understanding of one of earth's worst pests. Ben puts awkward image of him hugging authors with his freakishly long arms into people's heads.

Friday, January 20, 2017

A changing of the times...?

With a title like this on a day like this, you probably think I'm talking about something happening in about an hour from me in Washington.

Don't worry, my level of denial regarding that series of events is nearing 100.000%. I don't even know what you might be implying.

What I'm actually thinking about is the abundance of these pre-print articles that keep popping up in fields outside of our own!

This is the topic of this article in this month's The Scientist (my 4th favorite magazine stacked in my bathroom -- and probably yours!)

I know this is controversial -- and there are big 'ol weaknesses to preprints -- like -- THEY HAVEN'T BEEN PEER-REVIEWED YET! -- but -- the up-sides can't be ignored either. It doesn't seem uncommon for a traditionally published article to be in some stage of review/revision for 2 years, right? I'm seeing quite a few. And...you could maybe argue...that these kinds of lags aren't exactly allowing science to always proceed at the rocket pace other areas of technology might be able to (and maybe that is a good thing...but....)

I figured it was time to start talking about this when one of my favorite recent studies in our field went the open / pre-print route!

F1000 is an open and transparent publication system. It works like this:

In the case of this paper from Breckels et al., I can see exactly where they are in the review process. As of this morning, 2 reviewers have checked in and it looks to me like it is about to be accepted with some revisions. I also know who is reviewing it, and I can read their comments. This is just a little different than what we're used to. Is it the right system? I dunno...but it sure is fast!

The thing I'm seeing a lot more of...but not proteomics yet (well...some nice meta-analyses) is biorXiV (not sure I capitalized the right stuff).

This resource from CSHL is a hybrid system. It allows the posting of research and comments before the study goes to a journal for publication. It allows you to have a place holder for your findings to some degree while the final work is finding a permanent home. Your data gets out there, allowing the next person to build on it, but you can definitely stake your claim that it was your development.

The next question, of course, is this nice statement we see a lot of the time when we submit an article for review:

If you read the beginning (this is for MCP) it sounds like you can't use a pre-print service, but if you keep digging you'll find this isn't the case at all! You just need to disclose all the info (further down in the highlighted zone).

YEAH!! Go MCP!

You have to download the JPR author guidelines PDF to find that they're already prepared for this revolution in proteomic data dissemination.

(WOOHOO!!) ...which definitely makes it seem like I'm the one who is behind the times!

Thursday, January 19, 2017

SRMs for targeted verification of cancer!

I have a great subtitle for this nice new paper from Anuli Uzozie et al., currently in press at MCP.

Subtitle: "Why a biostatistician should be involved at the beginning of your project".

At first glance, this is a boring paper. They've got a bunch of samples and they've got a bunch of discovery data so they design SRMs on a relatively small number of interesting targets (their heatmap shows less than 30 at the end of the paper) and run these samples on their TSQ Vantage. Sounds like they forgot to write the last chapter of their previous paper and here, somehow, they got it into MCP.

However, this might be the best validation exercise I've ever seen. It really makes me suspect one of those fancy biostatistics people was involved in this from the very beginning -- because they get so much data out of it! Start with the fancy randomization stuff (boring...but good science...) and the level of downstream statistics that makes REALLY impressive conclusions from relatively little data -- and I'm walking away from this paper a little bored (which might just be 'cause I'm typing rather than finishing my coffee) but seriously impressed....

...and wondering if I designed my experiments just a little better if I wouldn't have just a whole lot more data out of every run....?

Wednesday, January 18, 2017

Absorb entire cells into dried acrylamide gels for digestion!??!?

This study is paywalled...and you can't find any details until you get into the Supplemental info, but it is still neat enough that it is worth talking about!

What is it? A new and enormously efficient looking digestion method. They seriously make polyacrylamide gels, dry them down completely, and then add a solution of whole cells (looks like they used non-adherent cancer cells, but I forget now).

Once the cells are inside the gel they basically treat them like a normal in-gel digestion. Compared to what they get with their normal in-gel digestion methods (and something called pro-absorb) they get massively more peptides and proteins per run on their Orbitrap Fusion.

Personally, I'd never have guessed that whole cells could be taken up like that! And, in their analysis, there were far fewer missed cleavages than the other methods they used. Now....you might argue that the total number of proteins they get out of all 3 methods seems a lot lower than what you'd normally expect to see with more traditional methods like in-solution digestion or FASP, but this paper is clearly focused on showing the promise of this novel cell-absorption/digestion method.

Tuesday, January 17, 2017

HUPO 2017 Registration opens today!

No time for a real post today - busy, busy, busy -- wait?!? What great timing! Registration is now open for HUPO 2017.

Monday, January 16, 2017

Biodiversity pathway-centric data browsing plug-in for Skyline!

It has been enormously exciting (at least for me!) to watch our field develop over the last decade or so. Every year we get more and more coverage of our proteomes and the files seem to have more data that we don't have the algorithms to extract yet and...yet...it is also getting kind of intimidating! Discovery is great, but we still want to be able to easily do hypothesis driven research -- which is tough in a datafile with 1e5 spectra and 40,000 quantified peptides!

If you wanted a tool to narrow this down a little without learning a completely new software interface, you should check out this study from Michael Degan et al.!

I'll be honest, I don't have hands-on time with this yet. I just downloaded and unzipped it, but I'm running out of time this morning already. The concept is good enough that I'm gonna trust them that they executed it as they described.

Okay...one super minor criticism...it was a little tough to find where I actually downloaded this thing (okay...accidentally closed the tab where I got it from....then found it tough to find a second time, n=2!), but you get the software from GitHub here. While I'm at it, here are the instructions for the software!

Why would you use this? If you are about 75% of our field, you probably are already using Skyline (I made the number up) so you're not going to have to install and learn a new software interface. And...unless I'm completely mistaken -- Skyline has never had features like this at all. You tell the Biodiversity Plug-in your organism of interest and it populates known pathways for the organism -- and you use those pathways to zero-in on what you want to study -- and browse the quantitative data for those pathways in the RAW files you've loaded! RIGHT!?!?!?

Not excited? I don't think I explained it right.

Say...you've got all this data from high-throughput screens of your drugs that suggest that this drug is inhibiting cell growth by messing up central metabolism. You could process your 400 RAW files the traditional way and then dig through the resulting 5e6 identified and quantified peptides which will take at least a couple days of crunching time -- regardless of what you are using to process your data OR...you can focus on the pathway you are interested in....

This gives you a % coverage OF THE PATHWAY YOU ARE INTERESTED IN!!! 80% of the peptides in the proteins in this PATHWAY (sorry, having caps locks problems) are contained within this library and easily accessible from within this plugin.
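For the arithmetic-minded, the coverage number is just a set overlap. This is a minimal sketch assuming coverage means "percent of the pathway's peptides that are also in the library"; I don't know exactly how the plug-in computes it, and the function name and peptide sequences below are made up for illustration.

```python
def pathway_coverage(pathway_peptides, library_peptides) -> float:
    """Percent of a pathway's peptides that are present in a spectral library."""
    pathway = set(pathway_peptides)
    if not pathway:
        return 0.0  # nothing to cover, so avoid dividing by zero
    found = pathway & set(library_peptides)
    return 100.0 * len(found) / len(pathway)


# Hypothetical peptides: 5 belong to the pathway, 4 of them are in the library.
pathway = ["AAAK", "CCCR", "DDDK", "EEEK", "FFFR"]
library = ["AAAK", "CCCR", "DDDK", "EEEK", "GGGK", "HHHR"]
coverage = pathway_coverage(pathway, library)
print(f"{coverage:.0f}% of the pathway peptides are in the library")
```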

Then you input this specific hypothesis-driven data into Skyline and can look at the quantification of the peptides/proteins in this pathway in all your cells. The data reduction alone means that you can do this in real time (rather than 2 days to process...and maybe a day to filter later).

Even if we ignore the amount of data reduction you have here -- I'm going to guess that you've collaborated with someone who was aghast at the big Excel spreadsheet you gave them with all the IDs and quan at some point -- when they just wanted the mass spec to confirm their hypothesis. And this might be the easiest way to get right to this that I've ever seen, both for you and for them (once they download the amazing free Skyline software, of course!)

Worth noting --- this Plug-In has more features than what I'm focusing on here. I'm selfishly just going after what I find the most interesting -- and since it is in Skyline, I think this is going to be compatible with every instrument type and vendor and scan-mode (they specifically mention that it works with data-independent (DIA) type experiments as well).

Sunday, January 15, 2017

ASMS 2017 -- Abstracts due February 3rd!!

This public service announcement is brought to you from the beautiful and progressive state of Indiana! If you don't want to miss your chance to see it you need to get your abstracts in for ASMS 2017 by February 3rd!!!

Here is the direct link to the abstract submissions page.

Saturday, January 14, 2017

Do you hate PD 2.1 and just want to run it like PD 1.4? I made you a template.

I was having lunch with 2 of the most skilled proteomics guys I personally know -- and both of them talked about how they're still using Proteome Discoverer 1.4 -- and hate PD 2.x. I understand, for real! It is a new architecture -- and with that backlog of samples you have, there isn't a lot of time to learn new software interfaces.

So I put this together this morning and maybe it will help? I call it the "I Hate Proteome Discoverer 2.1" Analysis Template. Disclaimer: This is for simple peptide ID runs. Maybe I'll do a quan one later.

Step 1: Open the accursed new version of Proteome Discoverer and start a new Study, name the study and choose your RAW files. Ignore the processing templates and other things (you can click to expand these pictures).

Step 2: Download this template from my personal DropBox account here. Depending on your browser, you may need to right click here to download (I have to say "save link as").

Step 3: Ignore all that junk in the big grey box. All of it -- just Open the Analysis Template you just downloaded, then Click the weird little button by the "Processing Workflow" text.

Now you are in a window that looks just like what you're used to in PD 1.0 - 1.4 SP1, right? The consensus workflow is set up and you don't need to bother with it. Go to your search engine, add your FASTA file, adjust your tolerances, all exactly the way you do in the version of PD that you don't hate and you're almost set. All you need to do is get your RAW files into that little box below the number 5 (above) and hit the "Run" button.

Step 4: Click the Input tab near the top of the screen to get to the Raw files you added earlier. You can also add more raw files here. Click on the Raw file you want and drag it over to the window below the processing tab. It takes a few tries to figure out where you need to click (and where you can't) to drag the files over. You can hit the <ctrl> button to click and highlight more than one file at once, just the way you would in Excel. Hit the Run button!

Step 5: Go to the Job Queue and open your processed files. You'll need to open the Consensus workflow for each file and you're golden.

Now...the output report might not look like what you're used to. I haven't used PD 1.x in a long time, and I don't remember what it looks like (and it's Saturday, I'm gonna get out and enjoy some of this snowy day, rather than work on this blog all day!). If you hate the output and have suggestions, email me what you want it to look like and I'll see if I can create a filter method that I can add to that download that will make the output closer to what you want.

While I might not have LOVED the PD 2.x architecture at first -- I immediately preferred the PD 2.x output reports -- but if it is bugging you, let me know, I can take a swing at it!

Wait --> One more disclaimer before I put on my boots --> the results from this template may not 100% match what you get in PD 1.4, because of 2 changes in PD 2.x --> one I could change to PD 1.4 format and one I don't think I can change.

1) PD 1.4 only does false discovery rate filtering at the Peptide Spectral Match (PSM) level. PD 2.x can also do FDR at the peptide group and protein (and protein group) level. I left the peptide group FDR on. There is too much evidence in the literature that this step is essential to getting the best data for me to recommend turning it off. There is a video over there to the right where I discuss this and show you how to turn it off if you need to match your peptide IDs exactly.

2) Parsimony for protein group identification. In PD 1.3 and 1.4 (maybe the earlier ones -- it has been too long for me to recall) when we have equal evidence at the peptide level that would equally support the identity of 2 distinct proteins, the protein that would have the highest percent coverage is made the top hit in the protein group ID. In PD 2.x, under these same conditions, the most intact protein reference from the FASTA is chosen for the group ID. The protein (not protein group) list is unchanged. I absolutely love this change because most databases have alternative cleavage events and partial protein fragments that, in the older versions, would get a higher group ranking -- in big databases you'd almost never see your intact protein -- even though it is probably(?) the most likely one biologically to be actually present.
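Here is the difference in point 2 as a toy example. Big caveat: this is not Proteome Discoverer's code, and I'm modeling "most intact protein reference" as simply "the longest FASTA entry", which is my assumption. The class, the function names and the accessions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    accession: str
    length: int            # residues in the FASTA entry
    covered_residues: int  # residues covered by the identified peptides

    @property
    def coverage(self) -> float:
        return self.covered_residues / self.length


def pick_by_coverage(candidates):
    """Older-style tie-break: highest percent coverage wins."""
    return max(candidates, key=lambda c: c.coverage)


def pick_most_intact(candidates):
    """Newer-style tie-break, modeled here as: longest entry wins."""
    return max(candidates, key=lambda c: c.length)


# The same peptides (100 residues of evidence) support both entries equally.
group = [
    Candidate("FULL_LENGTH", length=500, covered_residues=100),
    Candidate("FRAGMENT", length=150, covered_residues=100),
]
old_top = pick_by_coverage(group)
new_top = pick_most_intact(group)
print("coverage rule picks:", old_top.accession)
print("intact rule picks:", new_top.accession)
```

Same evidence, different headline protein: the short fragment wins on percent coverage purely because it has a smaller denominator, which is exactly why the old ranking buried intact proteins in big databases.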

That's it for today-- Gusto is wearing a scarf and he's ready to go!