Tuesday, May 31, 2016

Can't figure out your quantitative proteomics? Try a triangle!!!!

Either my stewardess really liked me and was extremely liberal with my dinner of scotch on my way to San Antonio, or this paper is completely brilliant. Considering my heritage, I'm leaning toward the latter.

What am I rambling about now? This fantastic new paper from Fernando García-Marqués et al. that you can find here.

Every single time we try to pigeonhole biology we're disappointed, because we've completely underestimated it. A recent approach in the -omics community goes something like this (take a deep breath):

Forget everything that we think we know. And just look at what is statistically changing. Seriously. Throw out the gene ontology. Throw out the annotations. Start with what is statistically significant and throw the rest out.
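
"Start with what is statistically significant" has to be defined somehow. One common way to do it (my assumption here, not necessarily what GIA does under the hood) is a Benjamini-Hochberg false discovery rate cutoff on per-protein p-values. A minimal Python sketch; the function name `benjamini_hochberg` and the p-values are made up for illustration:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of features that pass a Benjamini-Hochberg FDR cutoff."""
    m = len(p_values)
    # Rank features from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k where p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every feature at or below that rank is called significant.
    return sorted(order[:cutoff_rank])


# Hypothetical per-protein p-values; note that no annotations are involved.
p = [0.001, 0.04, 0.20, 0.008, 0.03, 0.60]
print(benjamini_hochberg(p))
```

Nothing in there looks at a GO term. The annotations only come back in after the cut.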


See if anything that we have figured out about the biology makes sense in this context. This is a tough one, I know. But it appears to be working. Think about it. How many times have we said "I've got this one. This is a checkpoint protein. So-and-so proved it by western blots in '87, and about a thousand papers show that it's a checkpoint protein." And then you find out that a proteoform of it that is shortened by 43 amino acids and methylated has NOTHING to do with cell cycle checkpoints? Bonus points if you know what I'm talking about. Seriously. Ontology is awesome, but if we think we've got a protein down to "this is what it does," we are often surprised by something new it does. It's energetically favorable for evolution to find a bunch of new things for proteins to do. It's a lot less favorable to evolve a bunch of new proteins.
The triangle approach isn't quite as intense as this. (Seriously, we do know something about biology by now.) But you could lump it into this category.

They start with their algorithm, GIA (which you can download here).

GIA is short for "generic interpretation algorithm." YAY! That's what I'm talking about!!

And they take this triangle approach. There are a bunch of equations that are mostly Greek that walk through how they came to their approach. When it comes to the math stuff, I'm not the most qualified guy, but this seems really elegant and logical. Start out by taking a step to the side. Assume you don't know what you're looking at. Let the Systems Biology Triangle do the hard thinking. Then, once it's done its pairwise analysis of the things that are truly significant, go to what we know regarding the biology.

Okay. How do you test such a nutsy concept? You could pull some historical data and find what the original authors were looking for but didn't find. That's a nice start. Or you could take some complex and confounding biological models, run your own experiments, and see that your results make sense when Panther came up with nuthin'. And then you could go ahead and do the validation yourself -- on a time course where, early on, nothing else they tested could see any perturbations in the system. But they could. And it successfully predicts what is happening earlier in the time course, showing massively higher sensitivity (that's all I could come up with) to the shifts going on at the proteome level.

They wrap up this sweet idea after doing some level of validation on 3 (or 4?) separate experimental sets, and the validation looks top notch. If any of the authors of this paper happen to be in San Antonio this week, hit me with an email at: orsburn@vt.edu. I'd seriously like to introduce you to my secret employer's downstream pathway experts.

Monday, May 30, 2016

De novo sequencing and resurrection of an antibody!

Okay....so what do you do if somebody once created an antibody that works against your disease, but tragedy struck and you can't make any more of it? The cell line is gone, the material is limited...and it's real important? Like, so important that only 2 antibodies to this disease have EVER been isolated?

Well, you could take the little bit of it that you have and crystallize part of it for a high resolution X-ray structure, and you could give some of it to the awesome team at the NYU Proteomics Resource Center, and maybe, just to pull it all together, you might call on the team at Protein Metrics.

And this is what you have here: this sick paper from Walter Bogdanoff et al., called "De Novo Sequencing and Resurrection of a Human Astrovirus-Neutralizing Antibody." Even the title is cool!

I'm going to have to skim over the crystallography stuff -- it's a crystal, and you can see what's on the inside and outside and stuff, and infer which amino acids are close to other ones.

On the mass spec side, though? There are some pros running the equipment up there in NYC!

They break out the Fusion and get an intact mass of the antibody. They mention that they don't get full isotopic resolution (I don't think anyone has ever gotten a full antibody resolved to baseline, but they seem disappointed...), but they get their starting masses for the intact antibody. Then they reduce the antibody down to the heavy and light chains and pull full isotopic resolution on both of them. Wait. Isotopic resolution on the heavy chain??
Told you, this team is seriously good!  They get some really good masses and notice that they've got some variations to the heavy chain. (To do this, they had to use "protein mode" on the Fusion and they run through how they worked their gas pressures and differentials to make this magic happen!)
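
Why is isotopic resolution on the heavy chain such a big deal? A rough back-of-the-envelope sketch: neighboring isotopes are about 1.003 Da apart, so at charge z they sit 1.003/z apart in m/z, and the resolving power needed to just separate them works out to roughly the mass of the protein, whatever the charge. Orbitrap resolving power also drops off roughly with the square root of m/z. The charge states and the 240k-at-m/z-200 setting below are hypothetical round numbers, not values from the paper:

```python
import math

ISOTOPE_SPACING_DA = 1.00336  # approx. 13C minus 12C mass difference
PROTON_DA = 1.00728           # mass added per charge (protonation)


def required_resolving_power(mass_da, charge):
    """Resolving power (m/z divided by isotope spacing) needed to just separate neighboring isotopes."""
    mz = (mass_da + charge * PROTON_DA) / charge
    spacing_mz = ISOTOPE_SPACING_DA / charge
    return mz / spacing_mz


def orbitrap_resolving_power(nominal_at_200, mz):
    """Orbitrap resolving power falls off roughly with 1/sqrt(m/z)."""
    return nominal_at_200 * math.sqrt(200.0 / mz)


# Rough masses and hypothetical charge states for each species.
species = [("light chain", 25_000, 20), ("heavy chain", 50_000, 35), ("intact IgG", 150_000, 50)]
for name, mass, z in species:
    mz = (mass + z * PROTON_DA) / z
    need = required_resolving_power(mass, z)
    have = orbitrap_resolving_power(240_000, mz)  # hypothetical 240k @ m/z 200 setting
    print(f"{name}: m/z ~{mz:.0f}, need ~{need:,.0f}, setting delivers ~{have:,.0f}")
```

On paper the chains look doable and the intact IgG does not, and that is before you even get to the real-world stuff (signal decay, sample heterogeneity) that makes the heavy chain hard in practice.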

Next they digest the subunits with multiple enzymes, run the digests on a Q Exactive to get high res MS1s and MS2s, and the data processing takes center stage. I'm a little fuzzy on the de novo process they used, but it sounds like the best hits are used to build a database that is used in the next round of processing, and so on. Traditionally, de novo has a pretty high FDR. Even when we have a database, we worry about the size of our search space; here the search space is basically infinite. With that many possibilities there are bound to be some bad matches. Run this data through any de novo algorithm and you'll come up with a sequence, but really...is it really, really right? This is important -- we can't be messing around with some okay sequence here. Antibodies aren't real forgiving. The sequence has to be right.
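
To put a number on "basically infinite": if we only count the 20 standard residues and skip any mass filtering, an n-residue peptide has 20^n possible sequences. A quick sketch (`de_novo_candidates` is just an illustrative name; real de novo tools search this space cleverly rather than enumerating it):

```python
def de_novo_candidates(length, alphabet_size=20):
    """Upper bound on distinct sequences of a given length (no mass filtering at all)."""
    return alphabet_size ** length


for n in (8, 10, 12):
    print(f"{n} residues: {de_novo_candidates(n):.2e} possible sequences")
```

A 10-residue peptide already has 20^10, roughly 1e13, possible answers. A precursor mass filter prunes that a lot, but nowhere near down to the size of a sequence database.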


They take the sequence information and use it to create a new antibody. And it binds its target antigen, successfully resurrecting the antibody.

Wait...is this meme using the wrong version of "you're"? Believe it or not, I didn't make this!

What a ridiculously good piece of work! I'm assuming this search strategy is something we can learn more about during ASMS.

Sunday, May 29, 2016

Peak Juggler! Unlabeled peptide quan (with XICs?!?!) inside of Proteome Discoverer 2.1!

Mwaaaaahahahaha!  Look what I got to try out this weekend!  Super cool new nodes for PD that will be launched at ASMS.

What's it do? Peak alignment and easy (free!) label free quan courtesy of the Mechtler lab!

Can you have it? Not yet, Mwaaahahahahaha!

What is that?!?!  Is that seriously an XIC for a label free peptide inside of Proteome Discoverer 2.1? Sure is!
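
For anyone wondering what's behind a label-free XIC: at its simplest, you pull the intensity within a small ppm window around the peptide's m/z from every MS1 scan, then integrate the resulting chromatographic peak. This is a generic sketch with made-up scans, not how Peak Juggler does it (I don't know its internals), and the 10 ppm window is an assumed value:

```python
def extract_xic(scans, target_mz, ppm_tol=10.0):
    """Summed intensity within +/- ppm_tol of target_mz for each MS1 scan.

    scans: list of (retention_time_min, [(mz, intensity), ...]) tuples.
    """
    tol = target_mz * ppm_tol / 1e6
    xic = []
    for rt, peaks in scans:
        signal = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, signal))
    return xic


def trapezoid_area(xic):
    """Integrate the XIC with the trapezoid rule (a simple label-free quan readout)."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(xic, xic[1:]):
        area += (t1 - t0) * (y0 + y1) / 2.0
    return area


# Hypothetical MS1 data around a peptide at m/z 523.7745.
scans = [
    (20.0, [(523.7745, 0.0), (600.1, 5e5)]),
    (20.1, [(523.7747, 2e6), (600.1, 5e5)]),
    (20.2, [(523.7744, 6e6)]),
    (20.3, [(523.7746, 2e6), (523.90, 1e6)]),  # 523.90 is outside the window
    (20.4, [(523.7745, 0.0)]),
]
xic = extract_xic(scans, 523.7745)
print(xic)
print(trapezoid_area(xic))
```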

As cool as this is -- wanna know what I find maybe even cooler?

On the super secret download site there is this interesting tidbit. You need to have R installed for this to work. Take a step back here and let your imagination run away with mine. There are SO many powerful tools in R that aren't exactly accessible to some of us more...venerable...?... scientists who finished our formal schooling before this R thing became so key to science.

I'll be honest. My brain is rapidly losing flexibility. After 2 online R classes, I know enough that I can load a program and the help menu after a couple shots. And that's it. But if you give me a node that takes data out of Proteome Discoverer, runs some of those R tools on it, and brings it back in for me?!?! Holy cow, do I have some stuff I need to rerun! Can you imagine all those things we've been seeing in Bioconductor and from Lgatto? and others at your fingertips? Variance normalization, anyone?!?!

Again, that is my mind running haywire here. I would suggest if you're at ASMS that you look around for posters and talks coming out of Vienna and see if you can get access to this...and maybe get some idea where this skilled and generous team is thinking of heading with all this power!!!

Saturday, May 28, 2016

Structural characterization of missense mutations(!!) by Native intact protein analysis!

[Edited on 5/30/16 to take out some of the stronger language.]

Characterization of mutations is one of the next big things coming in proteomics. I know 4 or 7 groups working on it right now that have all sorts of cool things to show soon.

But this study in the new JASMS in my bathroom is something I hadn't even considered! It's from Gili Ben-Nissan et al., and you can find the abstract here.

In this study they show that they can find missense mutations (change in one amino acid) in NATIVE INTACT PROTEINS!!! (Sorry for shouting, this is rad!)

What they are interested in is a protein involved in Parkinson's disease. And there is a mutation associated with it that changes the native protein structure. To approach this analysis they break out 3 instruments -- two TOFs and the Exactive Plus EMR.

Very early in the study they discover that the EMR is doing such an amazing job that they drop both TOFs (even the fancy pants IMS-TOF) and continue with the EMR, because it provides the data that they need. I seriously love that little box. Someone in Maryland, buy one so I can pop in just to see it!

What they do say is that they end up using the EMR exclusively because even the ion mobility TOF can't do this kind of analysis.

[End Rant.exe]

Since they're down to the EMR, they have to do some sample prep before they go into the instrument. I know there are some fancy IMS things that you can do, and some people have found that they can treat a protein's response to the ion mobility field as a proxy for its stability, but you can definitely argue that the effect of an electric field on an intact protein in the gas phase isn't exactly the most direct way to understand something meant to happily exist in an extremely complex aqueous buffer system. Here they do some really nice degradation assays and thermal stability stuff (in liquid) before the EMR, and they show they can absolutely track the native protein changes from a single amino acid substitution.
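
For a sense of scale on "a single amino acid substitution": the mass change is just the difference between the two residue masses. A small sketch using standard monoisotopic residue masses; the substitutions and the 10+ charge state are illustrative picks, not the mutation studied in this paper:

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_shift(wild_type, mutant):
    """Mass change (Da) of the whole protein when one residue is swapped for another."""
    return RESIDUE_MONO[mutant] - RESIDUE_MONO[wild_type]


def mz_shift(delta_da, charge):
    """The same shift as it appears on one charge state of the intact protein."""
    return delta_da / charge


# Illustrative substitutions; 10+ is a hypothetical native-like charge state.
for wt, mut in [("L", "P"), ("A", "T"), ("Q", "K"), ("L", "I")]:
    d = substitution_shift(wt, mut)
    print(f"{wt}->{mut}: {d:+.4f} Da, {mz_shift(d, 10):+.4f} m/z at 10+")
```

Some swaps are huge, some (Q to K) are a few hundredths of a Da, and L to I is invisible by mass alone, which is part of why tracking structure and stability, as they do here, adds so much.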

What?!?!  I know!

Friday, May 27, 2016

ASMS event planner is up and it is kinda great!

You might be surprised to know I'm not the most organized guy. Though, maybe the 11,000 unanswered emails in my personal gmail account above might reinforce this idea. (Please ignore the Drumpfinator on the right. This blog is politically neutral I think, but I still personally find that App hilarious.)

Anyway, I finally signed in on the ASMS event planner and it's totally rad. The search engine and filters are smart and smooth, and the calendar works well and is intuitive! I don't know yet whether I can download the schedule I build (which posters and talks to see, and when) and import it directly into my Outlook calendar. If I can't, it probably has to do with stuff on my end, but even without that, I can easily access the summary of this plan through the App.

Good going ASMS!  I'm impressed and I might get to see what I want to this year!

Thursday, May 26, 2016

Special Epigenetics issue of MCP!

In January MCP did a big ol' issue focused on epigenetics. Personally, I'd rather ignore this whole interplay where proteins reach back to influence the DNA level because, honestly, I don't think my brain is big enough to think about it. If we have 1e9 proteoforms, isn't that enough mechanisms to completely control a human system?

Unfortunately, evidence seems to be mounting that we need to think about this stuff...or, at least...someone has to...and this special issue gives us a perspective on where we are right now.

A good starting point is this introduction from Mike Washburn, Ying Ming Zhou and Ben Garcia.

Counting the introduction, there are 19 articles here. If this is your field -- congrats, here is a new book on the topic! Other technologies are being applied to epigenetics as well, but here you can see that we've got the tools to take this confusing field head on. A study from Don Hunt's group applies ETD to histones, and the Kelleher lab contributes some quantification via Top-Down of a specific histone methyltransferase.

There is some great work in here and, holy cow, it shows how far we've gotten with our understanding of PTM interplay and of epigenetics as a whole. It isn't all sunshine and rainbows, though. Even a quick read through some of the discussion sections will tell you that -- yes, we've got the best tools we've ever had, AND we can do some great stuff... but this is still a very big, complicated and relatively new field of biology, and we have quite a way to go yet to fully understand what epigenetics really is. I'm glad people are interested in working on it!

One of the last articles is this gem from Gene Hart-Smith et al., that discusses how general shotgun proteomics searches don't do a good job with protein methylation data. My spidey sense warns me of a theme this year -- FDR and PTMs? Of course, they propose some solutions, but I expect some interesting conversations in San Antonio regarding how we improve our PSM confidence.

Wednesday, May 25, 2016

PlantPReS -- A proteomics database for plant stress response stuff!

This week I sat down with a great local scientist who is doing some plant proteomics work. Holy cow!  This stuff is hard!  There are so many problems that haven't been addressed really well yet in plants. Ridiculous redundancy, lack of basic residues, poor databases, lack of depletion technologies for the high abundance stuff...it just keeps going.

I took to some literature searching and found...yeah, this stuff is hard! But there is effort to work on it, so gains will be coming.

This new study from Mousavi et al. introduces some gains on the database side. PlantPReS is a database focused primarily on how plant proteomes respond to stress.

You can access it directly here. The interface is very straightforward and has a lot of info on the topic. At first glance it might seem a little simplistic, but the data is manually curated! And when you're looking at the tremendous amount of redundancy (and often huge genome sizes) present in plants, you probably want a smaller database of real information over something much larger (and probably less true)!

Tuesday, May 24, 2016

Systematic errors in shotgun proteomics

Wow, Google, you totally outdid yourself here. Did someone draw this freehand with a pencil?  It does the trick, though, and frames this somewhat sobering new paper from Boris Bogdanow et al., in press at MCP.

The premise of this paper is that there are PTMs in the stuff we're running. A LOT of PTMs.

".... It is estimated that every unmodified peptide is accompanied by ~10 modified versions that are typically less abundant(16)..." If you want to follow up on this, the reference to this statement is here.  I'll be honest. That is a bigger number than I had in my head....

We know there are lots of PTMs. Big deal. The problem they point out is that PTMs may be the majority of our false discoveries. By assuming that the most common thing we'll be seeing is the direct gene product, we are propagating these errors. It is helpful that the unmodified version is likely to be the most intense, but if, for example, 1% of an albumin peptide is modified, that modified form is still going to be waaaaaaaay more abundant than most transcription factors.

Okay. They succeeded. I'm totally stressed out about this problem. Thanks, guys!

Wait...it's in MCP because they propose a solution! Unfortunately, it isn't the easiest thing ever. This paper is currently open access, so I can put in a figure, right? If not (don't sue me!), email me at orsburn@vt.edu and I'll take it down, but it's easier to explain it this way.

Now...they talk about other search algorithms, but this study employs MaxQuant exclusively and leans heavily on a feature I didn't previously know about called ModifiComb. ModifiComb does an unbiased look at the potential PTMs. As described in the paper, I'm going to consider it somewhat analogous to Preview, because it's pulling out the most common dynamic modifications hanging out in your samples. (Preview is trademarked, so I can't say if it uses the same logic described for this MaxQuant feature, but the end result seems the same.)

Okay. This makes sense so far and makes me feel pretty good about the search strategies we recommended at the PD 2.x workshops last year. But they diverge here a little, and I think I like it. They run parallel searches, each with just one individual modification. Then they compile those, and then they do the FDR stuff. According to the output it works really well. Why would this work better than using Preview to get the most common dynamic mods and throwing them all in? No idea, but if you're running PD it would be easy to try something similar to this approach.
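
Here is how I read the "parallel searches, compile, then FDR" idea, as a minimal Python sketch. The assumptions are mine, not the paper's: each search returns PSMs as (spectrum, peptide, score, is_decoy), higher scores are better, the best match per spectrum wins across searches, and FDR is estimated as decoys over targets on the pooled list. `pool_and_filter` and all the data are hypothetical:

```python
def pool_and_filter(searches, fdr=0.01):
    """Pool PSMs from separate single-modification searches, then apply one FDR cut.

    searches: dict mapping a search name to a list of
              (spectrum_id, peptide, score, is_decoy) tuples; higher score = better.
    Returns accepted target PSMs as (spectrum_id, search_name, peptide, score).
    """
    # 1) Across all searches, keep only the best-scoring match for each spectrum.
    best = {}
    for name, psms in searches.items():
        for spectrum, peptide, score, is_decoy in psms:
            if spectrum not in best or score > best[spectrum][2]:
                best[spectrum] = (name, peptide, score, is_decoy)

    # 2) Rank the pooled PSMs by score, best first.
    ranked = sorted(best.items(), key=lambda item: item[1][2], reverse=True)

    # 3) Find the deepest cutoff where decoys / targets stays within the FDR limit.
    targets = decoys = 0
    last_ok = 0
    for i, (_, (_, _, _, is_decoy)) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            last_ok = i

    # 4) Report the target PSMs above that cutoff.
    return [
        (spectrum, name, peptide, score)
        for spectrum, (name, peptide, score, is_decoy) in ranked[:last_ok]
        if not is_decoy
    ]


# Toy results from three hypothetical single-modification searches.
searches = {
    "unmodified": [("s1", "PEPTIDEK", 50.0, False), ("s2", "AAGMLLR", 12.0, False),
                   ("s3", "KEDITPEP", 11.0, True)],
    "oxidation": [("s2", "AAGM(ox)LLR", 40.0, False), ("s4", "MSSEEK", 30.0, False)],
    "deamidation": [("s5", "RQDEEVK", 35.0, True), ("s6", "N(de)GTTLK", 20.0, False)],
}
for psm in pool_and_filter(searches, fdr=0.25):
    print(psm)
```

The key design point is that the FDR is estimated once, on the pooled list, rather than separately inside each search. (The 25% FDR in the usage line is only there so the toy data shows something.)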

Run your data through the Preview node. Then grab the modifications individually (ignore the screenshot, grabbed randomly...).

Then, set up parallel PD searches like this:

They do something interesting with the target-decoy searches involving doubling the number of scrambled peptides, but this is just a preliminary look, right? I'd much rather just use the node than use the Fixed Value node, pull my decoys, double 'em and plot manually, but that might be necessary.
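
On the doubled decoys: I don't know exactly how they use them, but the generic reason you'd scale anything is that a decoy database larger than the target database collects proportionally more random hits. A minimal sketch of that correction; the function name and the ratio parameter are mine, not from the paper:

```python
def estimated_fdr(n_target, n_decoy, decoy_to_target_ratio=1.0):
    """Target-decoy FDR estimate when the decoy database may be larger than the target one.

    With decoys twice the size of the targets (ratio 2.0), each decoy hit stands in
    for only half an expected false target hit, so the decoy count is scaled down.
    """
    if n_target == 0:
        return 0.0
    return (n_decoy / decoy_to_target_ratio) / n_target


print(estimated_fdr(1000, 20))       # equal-size decoy database
print(estimated_fdr(1000, 20, 2.0))  # doubled decoy database
```

So 20 decoy hits against 1,000 targets reads as a 2% FDR with equal-size decoys but 1% when the decoys are doubled.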

In terms of the Consensus workflows, I'm a little hazy on how they built their proteins from their PSMs, but it doesn't sound too far out of the box. Hey. Might be worth a shot, though!

They also use protein quantification data to assist with their FDR, which other people have proposed (and honestly, that might be some of the most powerful data we get from transcriptomics...that's in here somewhere, right? I forget), but I don't have time to spend on the supplemental figures this morning.

TL/DR: Nice paper that emphasizes that PTMs might be a bigger problem for FDR than we generally think. It proposes a reasonably straightforward approach that might help reduce the number of false discoveries coming from PTM misassignments.

Sunday, May 22, 2016

Proteomics in India!

I guess this is a couple weeks old, but it still made for an interesting coffee read this morning. It is an overview of the recent international forum and workshop on proteomics in Hyderabad.

While there were some big hitters in attendance and some interesting things overall, this last bit probably caught my attention the most.

Saturday, May 21, 2016


And now for something completely different....wait...now for something!  Haven't had much time for hobbies lately, and now that I'm a little caught up I needed some inspiration and an email with the same title as this blog post did it!

The paper in that email is this gem from Y Maio et al., and it is a combined transcriptomic and proteomic approach to understanding how scallops attach themselves to things. Here I've learned two things already. Scallops are attached to things! And I'm bad at counting.

Inside this paper I learn a couple more things rapidly. Scallops are ugly. Like super ugly and they have a lot of different components. Actually, it doesn't look too bad in this public domain image, but they are super ugly in the pic in the paper (and many of the other ones online).

The thing they want to study in this paper is scallop attachment. And they do so by excising the scallop foot (which attaches these things to rocks and stuff). Unfortunately, they don't have a good genome for it, so it's proteogenomics time...well, kinda...it's time for transcriptomics and some peptide mapping! They use an Illumina platform for RNA-Seq during the attachment process, then cut gel bands for MALDI-TOF, and they do the relative protein quan by Coomassie staining.

I'm finding the genomics side of things a little vague. It appears that they took their HiSeq output and then BLASTed the aligned output against everything that NCBI had for scallops and related organisms. This gave them 40,000 or so matches within their cutoffs, and then they summarized it with GO. I'm guessing I missed something there, because if they really did it that way there would be some serious sampling bias. I wouldn't be surprised if this normalization is so intrinsic to translational analysis of this sort that it's commonplace to gloss over it.

What did this get them? For one, some new proteins, and some very strong suggestions that the few proteins that the scallop foot is composed of seem to have some serious PTMs that may be regulating the whole adherence process.

As I am trying to get my brain back into proteomics mode this was a nice paper to start out on. As a side note, I'd like to share one of the first images Google gives you when you look for a "Scallop Foot anatomy"

Wednesday, May 18, 2016

Neil Kelleher did a TED Talk on the Cell-based Human Proteome!!!

HOLY COW!!!  Have you seen this?!?!?!  This is ridiculously clear and approachable....Popularize Proteomics!!!!

Tuesday, May 17, 2016

Stop the Win10 upgrade!

Wow...yeah, this is probably the longest break the blog has had in 3 years? Busy busy busy!!!

This is worth 3 minutes, though.... This Windows 10 upgrade thing is driving me crazy. If you have this infestation, have you noticed how much pushier it has been getting? Where is the "NO" button?!? Are they seriously making it smaller or is that just paranoia? I seriously think it's been getting smaller! So, if you're like me and you've got $100k or more worth of awesome mass spec software on your computer, and you don't want to wake up to find you've now got Win10 and can't process anything, there appears to be a reasonably easy solution called Never10.

I can't vouch for this quite yet, got too much stuff open to risk a reboot! But according to this Reddit thread, the author of the software is well known for solutions like these and runs a podcast for PC security nerds, so it sounds credible enough to me to try. You can find out more about Never10 here.

Worth noting: I do have a Win10 tablet that I run PD 2.1 and CD 2.0 on with no perceivable issues. A local scientist informed me recently that Xcalibur versions before a certain point do not function on Win10, even if you use "compatibility mode." The fact that the Reddit thread says some mass-consumed video games get knocked out tells me that Microsoft has not tested compatibility with every piece of software in the world...as the popup often implies...

That's a rant and a half... Back to work, Ben!

Friday, May 6, 2016

Is there any reason to skip low mass fragment ions?

This is a great question I got from a really good lab recently (and it's part of another cool method project we're working on for a later post). The question came up because I keep leaving recommended QE methods with a fixed first mass of 100, as above (please don't pay much attention to that screenshot, I just put it together so I could have a picture).

One day I was running some TMT samples late and went to bed having forgotten to set the fixed first mass at 100. I know I'm not the only person who has done that (not calling anybody out...) but...wow...wanna get some crappy quan? That is exactly how you get some crappy quan. The QE sets its low mass cutoff by the 3/16ths rule or whatever, so you get good quan for lower m/z precursors, partial reporter ion quan for some in the middle, and no quan for the precursors with higher m/z. To make sure that NEVER happened to me again, I began always setting my Fixed First Mass at 100.0 (m/z).
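
To see why the low mass cutoff wrecks TMT quan, here's a tiny sketch with the TMT 6-plex reporter ion masses. The 128.5 and 140 cutoffs are hypothetical stand-ins for whatever the automatic rule picks for a given precursor; I'm not claiming that's the QE's actual formula:

```python
# TMT 6-plex reporter ion m/z values (monoisotopic, singly charged).
TMT6_REPORTERS = {
    "126": 126.127726, "127": 127.124761, "128": 128.134436,
    "129": 129.131471, "130": 130.141145, "131": 131.138180,
}


def captured_reporters(first_mass):
    """Reporter channels that fall inside an MS2 scan starting at first_mass."""
    return [channel for channel, mz in TMT6_REPORTERS.items() if mz >= first_mass]


print(captured_reporters(100.0))  # fixed first mass of 100: all six channels
print(captured_reporters(128.5))  # hypothetical cutoff: partial quan (129-131 only)
print(captured_reporters(140.0))  # hypothetical cutoff above the reporters: no quan
```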

Now, here is the question: Is this bad at all?

On the instrument side, I know that one. Nope! The Q Exactive is the honey badger of the mass spec world. You tell it to get (as above) 1e5 ions or spend 57 ms trying to get those ions. That packet of 1e5 ions (or as many as it can get) goes into the HCD cell, fragments, and the fragments get read out. The Orbitrap doesn't take longer to read from 100-1500 than it does to read from 500-1000. That transient (the amount of time the Orbitrap takes to acquire its scan) doesn't change. No problem there.
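
A simplified model of the point above, assuming a steady ion flux and ignoring how the real AGC actually estimates it: accumulation time depends on the AGC target, the max injection time, and how fast ions arrive. Notice that the fragment scan range doesn't appear anywhere. The function names and flux values are hypothetical:

```python
def fill_time_ms(agc_target, ion_flux_per_ms, max_injection_ms):
    """Accumulation time: stop at the AGC target or at the max injection time, whichever comes first."""
    time_to_target = agc_target / ion_flux_per_ms
    return min(time_to_target, max_injection_ms)


def ions_collected(agc_target, ion_flux_per_ms, max_injection_ms):
    """Ions actually accumulated under the same simple model."""
    return min(agc_target, ion_flux_per_ms * max_injection_ms)


# Hypothetical precursor fluxes (ions per ms) with a 1e5 target and 57 ms max fill.
for flux in (10_000, 1_000):
    t = fill_time_ms(1e5, flux, 57)
    n = ions_collected(1e5, flux, 57)
    print(f"flux {flux}/ms: fill {t:.0f} ms, {n:.0f} ions")
```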

But this opened a question in my mind. What would it do to the search engines and/or FDR? Could those low mass fragment ions cost us IDs or otherwise hurt us?

That I didn't know. Fortunately, some of the biggest names in this awesome field are super cool people. So I sent some emails!

Jimmy Eng (who just, you know, wrote Sequest and Comet!!!) said:

"There's no direct penalty for acquiring signal in the lower mass range.  If there are real MS/MS fragment signals down there then that should only help.

However, if the search engine doesn't take into account peaks like immonium ion peaks then those do end up being just more noise peaks.  So that could be ever so slightly detrimental to a tool like SEQUEST where the cross correlation scores might be a bit lower because of the inclusion of noise peaks.

At the end of the day, I would suggest that your user go ahead and acquire signal down there (100 m/z or wherever) and not worry about search scores or FDR.  The impact, if there are any would be minimal and you would have some minimal benefit and some minimal detriment.  It shouldn't have any real impact on FDR analysis since every spectrum would be affected the same way (both the target IDs and the decoy IDs)."

And Dr. Darryl Pappin (uummm...you know...maybe wrote Mascot and maybe invented iTRAQ!!!) seconded that low mass fragment ions aren't a problem for the super secret Mascot algorithm either.

TL/DR: Go ahead and acquire the low mass fragment ions! And the people in this field are awesome.

Thanks to Dr. Bill Breuer for starting this conversation out of the other project we've been banging around.

Wednesday, May 4, 2016

Label more cysteines!

Nuts. Google has this sweet new thing on the front page if you look up an amino acid where you can rotate it in 3-D by moving your mouse over it. I tried everything I could come up with -- left clicking and even right clicking -- to embed it in this post to no avail. Guess you can't win 'em all!

What was I talking about? New cysteine labeling approaches!  There are lots of reasons to want to label your cysteines instead of your lysines and arginines. Maybe your search engine can't handle 2 modifications per lysine. Maybe just about every post-translational modification (PTM) appears to happen...on lysine....maybe you're just interested in cysteine peptides in general (at least a few sensor proteins in humans (Nrf?) rely on the state of a cysteine within its own structure as a stress detection mechanism.)

So in this new....wait...this is from last April. Why did I read a 1 year old paper? Because someone on Twitter pointed it out. And you can get great info from even ancient research from 2015. Wow. This post is spastic.

In this study from Liqing Gu et al., out of the University of Pittsburgh, this team shows you how to quantitatively study cysteines using 2 extremely thorough methodologies. How thorough?

Seriously thorough.

They do all this work on an Orbitrap Velos. In case you aren't impressed by the experimental design, they also employ gas phase fractionation on the Velos and describe the increase in coverage they get from that approach -- following SCX fractionation.
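
For anyone who hasn't tried gas phase fractionation: you inject the same sample more than once and restrict each run's precursor selection to a different slice of the m/z range, so the instrument digs deeper into each slice. A minimal sketch of equal-width windows; the 400-1600 range, the 3 injections and the name `gpf_windows` are my assumptions, not the paper's settings:

```python
def gpf_windows(low_mz, high_mz, n_injections):
    """Split a precursor m/z range into equal-width windows, one per injection."""
    width = (high_mz - low_mz) / n_injections
    return [(low_mz + i * width, low_mz + (i + 1) * width) for i in range(n_injections)]


print(gpf_windows(400, 1600, 3))  # hypothetical range and number of injections
```

Equal widths are the simplest option. Since precursors aren't evenly spread across m/z, balancing the windows by precursor density is a common refinement.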

From the 2 approaches above, they find highly complementary results when looking at mouse liver proteomes. Is it overkill? Possibly. But that isn't the point here. I like this paper because it's an extremely different direction. If it looks like you need to be paying more attention quantitatively to what is happening to the cysteines in your organism (or if you just want to try something new with that system that is driving you insane), this is a comprehensive resource on the topic.

Monday, May 2, 2016

Great resource on alternative enzymes!

Whoa!  This is a nice resource. Open Access at Nature Protocols.  A great review for going outside of trypsin!

You can check it out here. 
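
To see why going outside of trypsin changes what you can see, here is a quick in silico digest with simplified cleavage rules. My simplifications: no missed cleavages, Lys-C ignoring proline, and Glu-C cutting only after E (it can also cut after D in phosphate buffer). The rule table and the test sequence are made up for illustration:

```python
import re

# Simplified cleavage rules written as zero-width regex split points.
ENZYME_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, but not before P
    "lys-c": r"(?<=K)",            # after K
    "glu-c": r"(?<=E)",            # after E (bicarbonate-buffer specificity)
    "asp-n": r"(?=D)",             # before D
}


def digest(sequence, enzyme, min_length=1):
    """Cut a protein sequence at every site the rule allows (no missed cleavages)."""
    pieces = re.split(ENZYME_RULES[enzyme], sequence)
    # Drop empty strings (e.g. a cut at the very end) and anything too short.
    return [piece for piece in pieces if len(piece) >= min_length]


seq = "MDKPAEKRGLLEDR"  # made-up sequence for illustration
for enzyme in ENZYME_RULES:
    print(enzyme, digest(seq, enzyme))
```

Same protein, completely different peptides, so different regions land in a size range you can actually detect.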

Sunday, May 1, 2016

Differential proteomics for unsequenced species!

This comes up a LOT in conversation, particularly as I'm wandering about preaching the gospel of modern proteomics superiority over...well...everything else....cause that is just what you have to do sometimes.

What do we do if we've got no genome sequence for the organism we're interested in?

My general answer is de novo! If we're looking at high resolution accurate mass MS/MS spectra -- PEAKS and DeNovoGUI away and BLAST search that output. Or do the BICEPS thing.

Or take a look at this new (paywalled) paper in JPR from Sule Yilmaz et al., here.

Quick and interesting note from the paper:

In 2014 there were:

2334 completed genomes!
21,471 genome drafts

Which sounds like a ton, right? Until you consider estimates of 2-8M species...
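
The arithmetic, treating the completed and draft counts as non-overlapping:

```python
completed = 2_334
drafts = 21_471
total = completed + drafts  # assumes no overlap between the two counts

for species in (2_000_000, 8_000_000):
    print(f"{total:,} genomes / {species:,} species = {total / species:.2%}")
```

So somewhere between roughly 0.3% and 1.2% of species have any genome at all.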

Okay, so how do these guys do it? By massively reducing the number of spectra they have to think about. I'm a little bit fuzzy on the details, not sure if I used decaf or if the method is just a little unclear -- but this is my interpretation.

They run two different samples. In this case they are looking at 2 similar parasites. One has a partial (or unannotated) genome and the other has none. By running them both, you can look at the samples pairwise. What is unique to one can then be evaluated. What is shared can be eliminated, and you end up with a lot fewer MS/MS spectra to worry about.
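
My reading of the elimination step as a minimal sketch (not the authors' algorithm, which I'm fuzzy on): call a spectrum "shared" if the other sample has a spectrum with a matching precursor m/z (within a ppm tolerance) and a similar fragment pattern (cosine similarity of binned peaks), and keep only the unshared ones. The 10 ppm, 0.7 cosine and ~1 Da bins are assumed values, and all names and data are hypothetical:

```python
import math


def bin_peaks(peaks, width=1.0005):
    """Sum fragment intensities into ~1 Da bins (bin width is an assumed value)."""
    binned = {}
    for mz, inten in peaks:
        key = int(mz / width)
        binned[key] = binned.get(key, 0.0) + inten
    return binned


def cosine(a, b):
    """Cosine similarity between two binned spectra (dicts of bin -> intensity)."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def unique_spectra(sample_a, sample_b, ppm=10.0, min_cos=0.7):
    """IDs of spectra in sample_a with no look-alike (same precursor, similar fragments) in sample_b."""
    keep = []
    for id_a, prec_a, peaks_a in sample_a:
        binned_a = bin_peaks(peaks_a)
        shared = any(
            abs(prec_a - prec_b) <= prec_a * ppm / 1e6
            and cosine(binned_a, bin_peaks(peaks_b)) >= min_cos
            for _, prec_b, peaks_b in sample_b
        )
        if not shared:
            keep.append(id_a)
    return keep


# Hypothetical spectra: (id, precursor m/z, [(fragment m/z, intensity), ...]).
sample_a = [("a1", 500.25, [(200.1, 10), (300.2, 50), (400.3, 30)]),
            ("a2", 612.33, [(250.1, 20), (350.2, 40)])]
sample_b = [("b1", 500.2503, [(200.1, 12), (300.2, 45), (400.3, 33)]),
            ("b2", 700.0, [(250.1, 20)])]
print(unique_spectra(sample_a, sample_b))
```

This also shows my undersampling worry below: if the matching spectrum simply never got acquired in sample B, a shared peptide sails through as "unique."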

By using the genome sequence from the one and homology searching the stuff from the other, you end up with a feel-good story that this works well.

I really like the elimination approach here. I have some small concerns about the number of spectra left over due to the intrinsic undersampling still prevalent in LC-MS/MS runs (by that, I mean that we don't fragment every peptide present in every run, so you might end up with a bunch of stuff that shows up in only one sample and looks unique to one organism but is really just a sampling issue). Also, the 4% FDR cutoff here initially hurt my brain, but considering the variables employed and the relatively wide mass tolerances they need to use (presumably due to instrument limitations? I don't know the device they use), 4% is a pretty tight control.

My minor concerns aside, I think the approach described here is smart and unique, and it seems like it would be amenable to several software packages. I'd love to give it a try!

I may revisit this one later. Where is that hyperbolic time chamber again?