Monday, February 29, 2016

Alzheimer's and mass spectrometry references


Alzheimer's sucks. Most diseases do, but man, that one really sucks. And while there are promising developments coming (and...finally...a boost in U.S. government funding to work on it!), it's still not gone.

I was working on something else a while back and I had put together a list of papers that showed the power of mass spectrometry in studying this nefarious and complex disease.

This is just a sampling of the recent work on the topic, but the fact that this disease shows changes at the proteomic, metabolomic, glycopeptide, and phosphorylation levels really illustrates 1) how complex this stupid disease is and 2) why we mass spectrometrists need to be involved in the effort to get rid of this crap.

Alzheimer’s LC-MS papers (2015)

High resolution (Orbi FT) [my good friend Katie Southwick is an author, so I can personally guarantee the work is top notch]:  http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135365

Laser Capture MicroDissection method using Orbitrap (behind paywall): http://link.springer.com/protocol/10.1007/978-1-4939-1872-0_9



Proof that there is a phosphoproteomic component to Alzheimer’s:  http://content.iospress.com/articles/journal-of-alzheimers-disease/jad150417

Detection of glycoprotein biomarkers in Alzheimer's:  http://pubs.acs.org/doi/ipdf/10.1021/acs.jproteome.5b00892

Another phosphoproteomics Alzheimer's paper (performed on a Q Exactive):  http://www.jbc.org/content/early/2015/04/27/jbc.M115.645507.short



Proteomics in Alzheimer's (brand new, open access as of 12/15/15): http://www.mcponline.org/content/early/2015/12/11/mcp.R115.053330.abstract

Sunday, February 28, 2016

Imputation strategies in label free quantification


This bright Sunday morning, I learned a new word, "imputation". And since Google Image only gives you really weird stuff if you try to search for this word, here is a picture of my dog dressed as a sheep.

Google says: "In statistics, imputation is the process of replacing missing data with substituted values."  The paper where I learned this term was Just Accepted at JPR and you can find it here (open access if you are logged in).

This paper and I got off on the wrong foot on the very first line of the introduction, when they state: "Missing values are a genuine issue in label-free quantitative proteomics." We're going to agree to disagree here, because I fall firmly into the camp that "missing values" on modern instrumentation (i.e., Orbitraps) are an illusion, propagated by clever marketing from groups with alternative agendas and by the fact that, until recently, no software existed that could assess all the values in our RAW data files in HRAM mass spectrometry. Again, agree to disagree and move on into this interesting paper!

This team of talented statisticians assumes that:

In this run we didn't get a PSM for this peptide = missing value

Missing value = problem

Since they've defined this as a problem, how do they move forward? First, they define three reasons why a PSM might not have been achieved in a given run. This comes down to whether the missing value was due to a random occurrence (and how random). Next, they use a very simple equation to simulate replacing a value in one set with the value achieved in a separate set. Third, they take a good super-SILAC dataset and check their values, then they go nuts with a bunch of equations.

What did we learn here?

Well, if you are going to do imputation, you'd better do it at the peptide level (though the authors may actually mean PSM here). And if you are going to impute (plug in new values for missing ones), then you should really take into account the reason the value is missing in the first place. So algorithms that can diagnose why a value is missing will be valuable tools for correcting missing values properly.
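The paper's equations are far more careful than this, but the core idea fits in a few lines of Python. This is a toy sketch of mechanism-aware imputation; the mechanism names are standard statistics jargon, but the fill rules (half the observed minimum for left-censored values, the observed mean for random dropouts) are my simplifications, not the authors' method:

```python
import statistics

def impute_peptide(intensities, mechanism):
    """Replace None entries in one peptide's intensity vector.

    mechanism: "MNAR" -- missing not at random (e.g., fell below the
               detection limit), so plug in a small value near the
               observed minimum;
               "MCAR" -- missing completely at random, so plug in the
               mean of the observed values.
    """
    observed = [x for x in intensities if x is not None]
    if mechanism == "MNAR":
        fill = min(observed) * 0.5      # left-censored: fill from the low tail
    else:                               # MCAR
        fill = statistics.mean(observed)
    return [fill if x is None else x for x in intensities]

run = [10.2, None, 9.8, 10.1]           # log2 intensities, one run missing
print(impute_peptide(run, "MNAR"))      # -> [10.2, 4.9, 9.8, 10.1]
print(impute_peptide(run, "MCAR"))      # fills with the observed mean
```

The point of the sketch is just the branch: the number you plug in should depend on *why* the value is missing, which is exactly the diagnosis step the authors argue for.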


Saturday, February 27, 2016

A Biologist's Field Guide to Multiplexed Quantitative Proteomics!


This paper falls firmly under the "things to send my collaborators before we begin designing experiments" category!

"A Biologists Field Guide to Multiplexed Quantitative Proteomics" is short, concise, well-written and perfect for clarifying the techniques we might be using on those cool samples they've been working up. You can get it (open access) at MCP here. 

Thursday, February 25, 2016

"High capacity" ETD massively boosts sequence coverage of intact proteins!



ETD is a valuable tool for sequencing intact proteins. We've known that for a long time. But it's also tough to work with. One problem is how much signal you end up losing, which forces you to rely on things like scan averaging, and that slows everything down.

Nicholas Riley et al. have a solution to this problem: high-capacity ETD! I'm a little hazy on how it works; instrument physics isn't my thing, but my understanding is that they took a standard Orbitrap Lumos and tweaked the code a little to contain and fragment the ions a little differently.

What I do get? The massive boost in coverage!


Check that out!  They mostly worked with small proteins for this study, up to 29 kDa. The top is normal ETD and the bottom is their new high-capacity ETD: almost double the number of fragments (they did the fragment analysis with the free ProSight Lite, btw!).

More coverage is awesome, but the cooler thing is probably that they could get lots of coverage WITH FEWER MICROSCANS!  (Sorry, my caps lock got stuck.) When I help someone set up a top-down experiment, that is the killer. You take this super fast Orbitrap, but then you microscan (scan average) a bunch to get enough signal, so the effective speed of the instrument comes way down. But if those microscans weren't as necessary? Holy cow. Massive increases in what you'll identify in top-down runs!

Now, if you're thinking: "Wow. That is cool and all, but I'm not going to be hacking the software on my Fusion or Lumos, so who cares?"  Turns out it's already equipped on the Lumos!

Another interesting note!  An NBA sharpshooting legend appears to be doing a lot of mass spectrometry these days.  Wait...have I made this joke before?


Sorry...

Wednesday, February 24, 2016

Use more than one search engine? Why?


This week I got some great questions from a researcher who is relatively new to the field. I love when I get these kinds of things because they remind me of questions I had when I started doing this that I later forgot about. (I've started a new FAQ page that will appear on the right at some point; soon, if I finally find the entrance to Kami's hyperbolic time chamber...)

One question was specifically about search engines, including what advantages we'd see from using more than one. There have been several good papers over the years, but I'd probably argue that this one from David Shteynberg et al. is the most comprehensive look at the subject.

While the primary focus of the paper might be more on how to deal with FDR when using a bunch of different algorithms, there are a number of interesting figures and details regarding the peptide spectral matches (PSMs) that show up when you use these other algorithms. I think if you really dug for it, you'd find at least 10 papers over the years that come up with something like this:

1 engine = x PSMs
2 engines = ~10% more PSMs than 1
Add a third? = maybe 3% more PSMs than 2

I'm embarrassed by how long it took me to write that (as well as how dumb the endpoint looks) but I hope that is reasonably clear.

Now, there may be big differences here. Some algorithms are very similar: Comet and Sequest have very similar underlying algorithms, so using the two together might not give you 10% more IDs. In the paper I mention above, they define a concept called Search Engine Complementarity (I'll add this to the Translator now!), and the equation is in the paper. In general, though, it measures the amount of overlap between two search engines. Ben's bad-at-math translation:

Search Engine Complementarity (SEC): higher SEC = more new peptide IDs from the same dataset

In this example, Sequest + Comet would have a lower SEC than two very different algorithms. The super-secret Mascot engine and InSpecT were found to have the highest SEC in this dataset.
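I haven't re-derived the paper's exact equation, but the gist is overlap-based, and you can get surprisingly far with a back-of-the-envelope version. Here is a hedged Python sketch using a Jaccard-style overlap as a stand-in for the real SEC formula (the peptide strings are obviously made up):

```python
def complementarity(psms_a, psms_b):
    """Rough stand-in for Search Engine Complementarity:
    1 minus the overlap between two engines' PSM sets.
    0 = identical results; 1 = completely different results."""
    a, b = set(psms_a), set(psms_b)
    overlap = len(a & b) / len(a | b)   # Jaccard index
    return 1.0 - overlap

sequest = {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"}
comet   = {"PEPTIDEA", "PEPTIDEB", "PEPTIDED"}
inspect = {"PEPTIDEE", "PEPTIDEB", "PEPTIDEF"}

print(complementarity(sequest, comet))    # -> 0.5 (similar engines, low SEC)
print(complementarity(sequest, inspect))  # -> 0.8 (different engines, high SEC)
```

Same intuition as above: the more two engines disagree on which spectra they can match, the more new IDs you stand to gain by running both.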

This paper is a couple of years old, so some new stuff has popped up that didn't make the study, notably MS Amanda and MS-GF+. If you follow the figures in the launch paper for the latter, you will see what looks like a very high SEC for MS-GF+ and Mascot (the only software it was compared against). In my hands, I find a lower SEC when MS-GF+ is paired with Sequest (and a higher one when paired with Mascot), but these are just rough measurements.

An interesting factor to consider, though, is that these are all complex statistical algorithms, and concepts like SEC may shift drastically across different datasets. Case in point: Sequest + MS Amanda produce very similar results in my hands, until I'm looking at a relatively high number of dynamic modifications in high-resolution MS/MS, and then the two begin to diverge.


Tuesday, February 23, 2016

Toward a better transmembrane proteome!


Doing some human membrane proteomics? Maybe wanna try a focused protein database to search it against? Then maybe what you need next is a great transmembrane proteome! You can access the HTMP here.

If you're thinking "wait a minute...didn't someone do this a long time ago...?" Sure! Maybe it's been done a few times. But according to this recent paper from Dobson et al., this database has a superior prediction algorithm for finding the true membrane-crossing segments of human proteins.

A Venn diagram (made with Venny) showing the distribution of the prediction models definitely illustrates some differences between the prediction algorithms...



...which this group says they took into account in making the best model. Is it the best? I honestly can't evaluate the model, but anything that opens discussions on which proteins are localized where is going to move us forward. By some strange coincidence (honestly, this was just on Twitter over the weekend), I'm leaving in a few minutes to visit some friends who are doing what is easily the best membrane proteomics I've ever seen. I've been dying to tell people about these methods and I'm hoping to hear the paper is out the door. Maybe while I'm there I can get their take on this cool new resource!

Sunday, February 21, 2016

Environmental proteomics tackles viral "dark matter"!!!


If it seems like I'm making fun, I'm doing it wrong; I'm totally not. Environmental proteomics (there are definitely better names for it!) is super cool. I think we've wanted to do it for a long time, and we finally have the technology (and really innovative researchers) to use mass spectrometry to better understand how our environment works.

Want further proof? Check out this sweet new paper at PNAS:


This is more than just a catchy title; it turns out "dark matter" is a hot topic in science right now (and something I'd never heard of in this context before). What does dark matter mean here? It's all the biomaterial out there that we have no idea what it is or what it does.

For some cool reading on viral dark matter, check out this New Yorker article from last year. A researcher states that there are probably 1e31 phages on Earth (vastly outnumbering every other organism on Earth combined) and that we really know very, very little about them.

Sounds like a job for some Orbitraps to me!!!  And some genomics. And some really really smart data analysis.

How do you get started looking at viral dark matter? Well...you start by FASP digesting about 80 LITERS of sea water. And then you break out the MudPIT. We're talking low abundance and super low copy number. An Orbitrap classic ran MudPITs of 72 hours in length; for the Q Exactive work they could get usable data out of 8-hour runs, and an Orbitrap Velos Pro fell somewhere in the middle. Once they got their HUGE data files, they searched them against an assembled metagenome of tremendous size. I'm a little fuzzy on the FASTA details, but the assemblies run between 2e5 and 1e6 protein entries.

Because every data point needs to be evaluated, the data was run through Sequest (with DTASelect and with Percolator, separately) and through the TPP using X!Tandem, and some robust statistics were used to combine the results.

So how do you define success with an experiment like this?

What if you found a high level of peptides from new proteins that are structurally very similar to known capsid proteins (the capsid is the protein shell on the outside of a virus...or phage, more correctly, I think...), but that no one has ever characterized before?!?!  And what if you then find that protein in sea water all over the planet, now that you know to look for it, and this illuminates some of that "dark matter" a little?

I think that is EXACTLY how you define success with a project like this one. This is a solid study all the way around and just a great coffee read!

Saturday, February 20, 2016

What happens when you use a tea bag a second time?


I'm just posting this cause I'd always wondered what the answer was, but never cared enough to look it up. Thanks to /r/askscience, I now know the answer!

The numbers vary somewhat, but if you make one cup of tea with a tea bag, you get most of the total extractable caffeine (around 70% or so). If you make a second cup, you get close to the last third that is left. Here is a summary table.

We can thank Monique Hicks et al. at Auburn for this paper (there are other cool measurements in it as well)!
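If you want to play with the numbers yourself, the arithmetic is simple. A quick sketch, under the assumption (mine, not the paper's) that each steep extracts a constant fraction of whatever caffeine is left, with ~40 mg total and ~70% per steep as rough placeholder numbers:

```python
def caffeine_per_cup(total_mg, extraction_fraction, cups):
    """Caffeine extracted in each successive steep of one tea bag,
    assuming each steep pulls a constant fraction of what remains."""
    remaining = total_mg
    per_cup = []
    for _ in range(cups):
        extracted = remaining * extraction_fraction
        per_cup.append(round(extracted, 1))
        remaining -= extracted
    return per_cup

# ~40 mg total extractable caffeine, ~70% extracted per steep (assumed numbers)
print(caffeine_per_cup(40, 0.7, 3))  # -> [28.0, 8.4, 2.5]
```

So under that model, cup two really does get you most of the leftover third, and cup three is barely tea anymore.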

Wednesday, February 17, 2016

Should you narrow down your database to realistically observable peptides?


Hmmm....I'm kinda liking this idea, even if at first it seems a little like cheating...

Here is the paper (JPR, paywalled) from Avinash Shanmugam and Alexey Nesvizhskii. I know this first-hand: if you are studying mouse proteins and you search your MS/MS spectra against the sequences of every organism ever sequenced...


...you're gonna have a bad time. (Sure, there are exceptions, but in general, why would you search mouse proteins against an Archaea FASTA?)

So let's take this thought a little further. What if you built your FASTA database more intelligently by using additional data at your fingertips? For example, what if you went to the GPM and took a swing at targeted (peptides likely to be present in your LC-MS/MS runs) versus untargeted (all sorts of stuff, whether it's likely to be there or not) databases and saw how it affects the results?

Turns out you end up with better data by making your databases more realistic (i.e., biologically relevant)!!!
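Mechanically, the "targeted" database is just the full FASTA filtered down to entries you have prior evidence for. A toy sketch of that filtering step (the accessions, sequences, and evidence set are all made up; in the paper the evidence comes from GPM observation data):

```python
def filter_fasta(entries, observed_accessions):
    """Keep only FASTA entries with prior observational evidence.

    entries: dict mapping accession -> protein sequence
    observed_accessions: accessions previously observed (e.g., in GPM)
    """
    return {acc: seq for acc, seq in entries.items()
            if acc in observed_accessions}

full_db = {
    "P12345": "MKTAYIAKQR",
    "Q99999": "MLSDEDFKAV",
    "A0A000": "MNEVERSEEN",   # no prior evidence -> dropped
}
targeted_db = filter_fasta(full_db, {"P12345", "Q99999"})
print(sorted(targeted_db))  # -> ['P12345', 'Q99999']
```

A smaller, realistic search space means fewer chances for high-scoring random matches, which is exactly why the FDR math works out in your favor.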

Sorry if I've been misusing the "i.e." thing. I just Googled it and one of the first entries was a long thing that explains the difference between i.e. and e.g. and it was way too many words.

I'm pretty fascinated by this idea and I think I'll give it a shot once I come up with a good example file and database. Definitely check this out when you get a chance!

Monday, February 15, 2016

TMT 6 plex reagent used on >1,000 human plasma samples!


Isobaric labeling technologies like TMT and iTRAQ are awesome because they save us a ton of time. The downside has traditionally been thought to be that "once you -plex, you are done": you TMT 10-plex ten samples and you can compare those, but you can't compare anything else. Several studies have shown you can go beyond that with internal controls and such, but I'd argue that this is definitely the biggest one!

The study is from Ornella Cominetti et al., and is in this month's JPR here. They took plasma from 1,050 patients, depleted it, TMT 6-plexed it, and ran it on an Orbitrap Elite (separation was on a 50 cm, 2 µm particle column), using PD 1.4 as their interface for Mascot and Scaffold for X!Tandem. The downstream work was done with R and GraphPad.

They do a really remarkable job setting up the experiment and sample groups (nicely randomized!) and I'm nothing short of impressed with the downstream stats. Good experimental design, great chromatography and MS method, and thorough data searching and setup = one solid paper!

How'd they do with a set this big? Direct quote from the abstract: "We demonstrate that analyzing a large number of human plasma samples for biomarker discovery with MS using isobaric tagging is feasible, providing robust and consistent biological results."

Number-wise? It's pretty sharp. They quantified hundreds of proteins with NO missing quantification points across all plasma samples. Over 1,000 patients! In the discussion they detail why, for a cohort this big, they would stick with this method over a data-INdependent (DIA) approach: they'd be stuck with 6x the run time, even if they would get a few fewer missing values. All around, just a nice solid study. One day I'll be less impressed when people show me proteomics from thousands of people, but that day hasn't gotten here yet!
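I haven't dug into exactly how they bridged their plexes, but the usual trick for comparing samples across many isobaric sets is to include a common reference sample in every plex and express each channel as a ratio to it, so the ratios are comparable set-to-set. A toy sketch of that idea (channel layout and numbers are invented, not from the paper):

```python
def bridge_plexes(plexes, ref_channel=0):
    """Make reporter intensities comparable across TMT plexes by
    expressing each channel as a ratio to a shared reference sample
    that is included in every plex."""
    bridged = []
    for plex in plexes:             # one list of reporter intensities per plex
        ref = plex[ref_channel]
        bridged.extend(ch / ref for i, ch in enumerate(plex)
                       if i != ref_channel)
    return bridged

# two 6-plexes for one protein; channel 0 is the common reference sample
plex1 = [100.0, 150.0, 90.0, 110.0, 200.0, 100.0]
plex2 = [200.0, 300.0, 180.0, 220.0, 400.0, 200.0]  # 2x overall signal
print(bridge_plexes([plex1, plex2]))  # the ratios line up across plexes
```

Note that plex2 has twice the raw signal of plex1, but after dividing by each plex's own reference, the two sets of ratios are identical, which is the whole point of the bridge.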

Sunday, February 14, 2016

SUPERQuant test number 1


So I actually read some of the SuperQuant paper and then set up the method to run like this. After multiple failures that turned out to just be me running out of C:\ space, I got a couple of runs in.

At first approach, I think the darned thing totally works. I also think that it needs some further evaluation.

I'll queue up a bunch of stuff to run now that I have some space, but here are a couple of screenshots.

Experiment here:
HeLa, 1 µg, 120 min run on a 25 cm EasySpray column running in High/High mode (this is the normal HeLa file that I have used for just about every experiment you see on this blog; it used to be publicly available via my old FTP site, so I know a lot of you out there have it).
Ran just like above for the SUPERQuant runs, and without the 2 new nodes for the non-SuperQuant comparator.

Nonsuperquant:

SUPERquant!


Interesting!  In the end, what do we get? 39 new proteins. Okay. Worth noting that this took approximately 50% more total processing time than the regular method.

Check out how many MS/MS spectra it thinks it looked at!!!  It thinks it saw over 30,000 more MS/MS spectra after the processing. Despite that, Percolator actually ended up with FEWER PSMs. One thing we know about Percolator, though, is that it works better the more MS/MS spectra it sees, right?

Does that mean that the matches I'm looking at after SuperQuant are better than those before? I'm trying to come up with a good way to assess that. I know for sure that the worst-scoring peptide (a totally bad match) disappears after SuperQuanting. But the output here is just a little dense without plotting. What I do know is that I get more protein groups IDed on a small, quick dataset after SuperQuanting, that it didn't take long to install OR run, and that a really bad peptide that slipped through all my filters disappears after using these cool free nodes.

Sounds like a darned win to me. I'll feed it some more stuff and see if I can make sense out of it, but I definitely recommend you check this out!

EDIT: Another couple of runs finished up!  Bigger dataset (55K MS/MS scans, OT/OT on a Lumos).
SuperQuant gives me 100 new protein groups from over 1,000 new PSMs. It's still hard to tell, honestly, whether the data coming out is better, but I feel pretty confident that it isn't markedly worse. Here is a quick screenshot: PSM #s vs. XCorr (I know...if you come up with a smarter metric, let me know...)


I let the binning occur automatically, so it isn't identical, but it's not too far off. Both runs come back with some peptides with XCorrs under 1.5, but the numbers are similar.
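If you want the two histograms to be directly comparable bar-for-bar, one easy fix is to set the bin edges yourself instead of letting the software pick them. A minimal sketch (the XCorr values and edges here are toy numbers of my own):

```python
def histogram(values, edges):
    """Count values into fixed, shared bins so two runs can be
    compared bar-for-bar (values outside the edges are ignored)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

edges = [0, 1.5, 3.0, 4.5, 6.0]      # the SAME XCorr bins for both runs
normal     = [1.2, 2.8, 3.1, 4.0, 5.5]
superquant = [1.4, 2.2, 2.9, 3.3, 5.1]
print(histogram(normal, edges))      # -> [1, 1, 2, 1]
print(histogram(superquant, edges))  # -> [1, 2, 1, 1]
```

With shared edges, a shift in the score distribution shows up as a real difference in the bars instead of a binning artifact.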

The 1,000 new PSMs look as solid to me as the rest of the data. More data for free? Consider me all signed up!

Saturday, February 13, 2016

SUPERQuant! Identify those coeluting peptides directly in Proteome Discoverer


Aaaaallllllrrrrriiiiiiggggghhhhtttt!!!!

I'm going to get excessively excited (surprise!) about this one before I actually even run any data through it. You know how sometimes we fragment more than one peptide at once, even when we don't want to? For a while, people have been kicking around code that can figure out what the identities of those other peptides are. I heard a while back that MaxQuant had a feature like this built in.

Well, according to this sweet new paper from Vladimir Gorshkov et al., the new node you see in my PD 2.1 screenshot above, called "Complementary Finder", can do this as well. You can download this software (along with easy installation instructions!!!) from GitHub here. At first glance the approach seems a good bit different from other ideas I've heard of, but I haven't delved in deep yet.

I'll try to queue up some stuff through it tomorrow and share impressions, but in the paper they test the method by doing HeLa runs with isolation windows 1, 2, and 4 Da wide and show massive increases in peptide identifications. Super complex mixture? High numbers of MS/MS scans that include more than one peptide. Maybe you can use SuperQuant to figure out what all of them are!
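My reading of the general idea (and this is my simplification, not necessarily the authors' implementation): a genuine b/y fragment pair from one peptide must sum to a predictable mass, so pairs that sum to something other than your target precursor point at a coisolated peptide. A back-of-the-envelope sketch with a rounded proton mass and invented fragment masses:

```python
PROTON = 1.00728  # Da, rounded

def complementary_pairs(fragments, precursor_neutral_mass, tol=0.02):
    """Find singly charged b/y fragment pairs whose masses sum to
    precursor + 2 protons -- evidence they came from the same peptide."""
    target = precursor_neutral_mass + 2 * PROTON
    pairs = []
    for i, f1 in enumerate(fragments):
        for f2 in fragments[i + 1:]:
            if abs((f1 + f2) - target) <= tol:
                pairs.append((f1, f2))
    return pairs

# toy spectrum: target peptide has neutral mass 800.40;
# 350.00 doesn't pair up, so it may belong to a coisolated peptide
frags = [200.10, 602.31, 350.00, 300.20, 502.21]
print(complementary_pairs(frags, 800.40))
# -> [(200.1, 602.31), (300.2, 502.21)]
```

Fragments that pair up support the target ID; the leftovers become the raw material for identifying the coisolated species.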


Thursday, February 11, 2016

How should we store and handle all these peptides for best results?


This article, honestly, isn't the most exciting thing...but it's really important. If we're ever going to standardize this field, we're all going to have to start following the same protocols. In this study from Andrew Hoofnagle et et et et al. (40 or so people at labs ALL over threw in on this!), the authors systematically assemble a list of "Recommendations for the Generation, Quantification, Storage and Handling of Peptides" for LC-MS.

How long is that peptide good for? Do we store those standards at 4C, -20, or -80? How many times can we freeze-thaw before we chuck it? All super important things to keep in mind, and this might be the current definitive work on the topic.

Wednesday, February 10, 2016

Proteostasis in intact animals!!!


This new paper from Dean Hammond et al. is an absolutely elegant approach to exploring amino acid incorporation and protein turnover.

What'd they do? They took a bunch of voles (had to look up what those were; knew it was a rodent...)


and fed them food containing stable isotope-labeled lysine!  Then they studied the proteins from various organs of the cohort over time.  How much time? 40 days!!

From these samples they did single-dimension LC separation with 15 cm EasySprays on <100 min gradients on a Q Exactive running a nice standard top 10 method. The resulting RAW data was run through Mascot, and quantification was done with Progenesis.

To really assess the big question here, protein turnover, they had to work up their own math and processing methods in-house, but the RAW data was put on PRIDE (PXD002054) and the calculations and logic are laid out in great detail, so anyone who wants to look at these dynamics can check their methods (seriously...a well-written paper!).

So what did we get out of this one? It seems like this is a starting point for this group. Here is a protein incorporation/turnover model system, and now that they (and we!) have it in place, they can start to look at more complex perturbations. They specifically mention that they will next apply this system to models with altered body mass. What we get is a really good dataset describing, in a mammal and on a global scale, how reliable our turnover models (based largely on unicellular organisms!) actually are.
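The math at the heart of a labeling time course like this is simple even if the full workup isn't. Under the textbook first-order model (my sketch, not the authors' in-house pipeline), the heavy-label fraction of a protein follows F(t) = 1 - e^(-kt), so each time point gives you an estimate of the turnover rate k and a half-life:

```python
import math

def turnover_rate(time_days, heavy_fraction):
    """First-order turnover: heavy fraction F(t) = 1 - exp(-k*t),
    so k = -ln(1 - F) / t."""
    return -math.log(1.0 - heavy_fraction) / time_days

def half_life(k):
    """Time for half the original (light) protein pool to be replaced."""
    return math.log(2) / k

# toy time point: 50% of a protein's pool is heavy-labeled by day 10
k = turnover_rate(10, 0.5)
print(round(k, 4))             # -> 0.0693 (per day)
print(round(half_life(k), 1))  # -> 10.0 (days)
```

Fit that across the whole 40-day course, protein by protein, and you have a turnover rate table; the paper's actual treatment handles precursor pool dynamics and a lot more.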

One challenge they had here was that the vole doesn't have the best genome. But you know, someone somewhere out there is probably working on it, so this is a good dataset that will only get better with time!

I'd just like to throw out this thought: in the liver, they were able to calculate turnover rates for over 1,000 proteins. >1,000!!! They did this with an incomplete genome, a 15 cm EasySpray column, a Q Exactive classic, and some really good biology! I am so psyched about where we are these days!

Tuesday, February 9, 2016

Brain region resolved proteomes!


Maybe it has something to do with my common spatial proximity to a certain neuroscience postdoc, but I've definitely heard that neural proteomics is some tough stuff to do. There are limitations due to lots of things, including the low amount of protein relative to lipids and other less interesting biomolecules, as well as historic limitations in mass spec sensitivity.

A lot of mouse brain proteomics has come out of Max Planck over the years, and in this new article in Nature Neuroscience they decided to show us just where you can get with today's methods and technology.

Mouse brains taken out, chopped into the correct sub-regions, proteins extracted, peptides subjected to simple StageTip fractionation, and then full-out label-free proteomics on a Q Exactive HF.

The sensitivity problems of the past are history: with simple fractionation and a Q Exactive, there is plenty of material in a region of a tiny mouse brain to get full proteome coverage. The output is incredible. Their base analysis ran about 12k proteins, and anywhere from hundreds to thousands of proteins show differential expression across the various sub-regions.

To sweeten the analysis, they went ahead and did RNA-Seq on these brain regions and compared both the IDs and the quantification -- cause, you know, why not!

I'll throw it out there that I'm pretty impressed by the data integration they demonstrate in Perseus with these two datasets. And the fact that they draw some really interesting conclusions about the processes leading to neural cell differentiation is just icing on the cake!

Thursday, February 4, 2016

Go to beautiful Cold Spring Harbor to learn Proteomics! AND Metabolomics!!


One of my favorite things to do each summer is to go up and secretly help out at the Cold Spring Harbor Proteomics course. (Secretly, cause there is never any record on the website of me being there at all...though there might be incriminating photos here and there if  I really needed proof...)

If you aren't familiar with the program at all, here is Ben's description: 20 or so people show up who have often heard of proteomics but have never really done it, and about a week in, there are 10 of us gathered around a Q Exactive screen at 3 am watching beautiful phosphopeptide data come flying off the instrument.
The instructors are always top notch, with the biggest names in the field either there full time or popping in to teach everything from sample prep through data processing and everything in between.

In sum, it's a really high-quality proteomics boot camp at a beautiful location.

This year a new course has been added to the Q Exactive's workload -- there will be a full metabolomics course!

You can find out about these courses at the CSHL Courses page here.

Oh, and in my new role I probably can't justify spending a week up there to my boss. However, if any of the instructors/assistants see this: if you want the QE set up, calibrated, and QC'ed for the day the class starts, I've got an open schedule that weekend!

Wednesday, February 3, 2016

Online capillary electrophoresis for separation of antibody-drug conjugates!


Immunotherapies!  Man, that is about all anyone in the pharma world wants to talk about these days. And for good reason. Engineer an antibody that goes after tumors and just happens to have a killer chemotherapy "warhead" on it, and BOOM you kill the tumor without killing the patient. The successes so far have been amazing and people are going after it.

Problem is....building an antibody with a drug conjugated to it is tricky stuff. And you don't want to dose patients until you really understand what those weirdos back there in chemistry have done. And...even figuring out what they built can be hard.

Have you tried doing LC with intact antibodies before? Not the Waters antibody standard, but a real honest-to-goodness-this-might-be-a-drug-one-day antibody?

Well, I have, and they tend to look something like this C4 separation I did a while back...

If you want to see pretty monoclonal antibody separations, I've got hundreds of those on this hard drive too, but they're the Waters standard or one of the cool standards we can run in Europe but not in the U.S. When you are looking at something that isn't just 4 glycoforms, but 3 drug conjugates on top of 4 glycoforms, getting good LC can be the hardest part. Fortunately for us, we can generally make sense of most of this with resolution. There are limits, though, because antibodies are so big that the signal often degrades too much to use resolutions above 15k or 30k (instrument depending).

What's that image at the tip top, then?!?!  It's this new paper from Erin Redman et al. In the study they describe the use of capillary electrophoresis coupled to mass spec to separate ADCs with very small modifications into nice, rapid, distinct peaks!

How good is their separation? It's good enough that a TOF can pull the individual components out of the mixture. Can you imagine what you could do by coupling CE to a Q Exactive? You'd make the job of ADC analysis a whole lot easier. Hell, you might open up the possibility of really looking at polyclonal antibodies, which is honestly kind of the holy grail for a lot of my friends out there.

They reference a study they did last year with this same setup. I don't have time to read it this morning, but this is the front page image from JPR...



Independent separation of glycoforms?!?!


Monday, February 1, 2016

Large scale protein turnover calculations!


One criticism mass spectrometry gets from the classical biochemistry community is that our readouts are more like "snapshots" of what a cell population is doing, rather than the elegant dynamics of other assays.

For someone like me, who would choose being tasered repeatedly over working out one PK or equilibrium problem, this criticism hasn't been one I've given a lot of attention.

Hey, wait!  What is this thing? This looks like a dynamic proteomics assay where transient 15N labels are applied to growing cells (in kind of a pulse-chase sort of way) and the software reads the data from multiple samplings into a dynamic readout of protein turnover!

You can check out this interesting study (paywalled) and free software (in R) from Kai-Ting Fan et al. here.

It is worth noting that they did this cool study in Arabidopsis, but I'm sure these resources could also be applied to interesting organisms as well.  ;)