Thursday, October 31, 2013

Do we still need to fractionate our samples for deep proteome coverage?


Here is a good question:  Are we still in a place, technologically, where we gain more from pre-fractionation than we lose?

At first, we had to, right?  If we didn't cut gel slices or fractionate a sample by SCX or IEF or some other manner all we'd ever see was albumin and keratin from Pan troglodytes.  Maybe that was just me.  In my first several proteomics experiments, that's all I got.  Chimpanzee keratin, because mine is more similar, apparently, to chimp skin than human.  I have proof, btw, I'm not just being funny.

Fractionation seemed to be the key.  20 fractions yielded more than 10 fractions and we dealt with the fact that we went from a 24 hour run time to a 48 hour one, because we were getting some sample depth.

Here is the question, though:  is it still necessary?  Or have the speed and sensitivity of MS and the quality of nanospray chromatography separations gotten to the point that the inevitable losses incurred by pre-fractionation and the staggering increase in run time are far worse than the gains?

The literature this month would go in two completely different directions on this one.  Two weeks ago I summarized a paper in Proteomics that used a pretty unique 3D fractionation method and a lot of you guys really liked that paper.  Going in the complete opposite direction, the paper from Josh Coon's lab with the one hour proteome showed that you can get into the high numbers with just 1D, if you optimize the LC really well (oh, and have a Fusion).

Here is a new one for camp number 2:  "Rapid and Deep Human Proteome Analysis by Single-dimension shotgun proteomics" by Pirmoradian et al., out of Sweden, the home of the world's greatest band (In Flames), and also the home of Roman Zubarev, who is corresponding author on this work.  From the title, you might be able to guess a little bit about the paper...but I'll still tell you more.  By using a 50cm column and optimizing the LC conditions to the ideal dynamic exclusion settings (see, I told you this was super important! see rant #1, rant #2), they were able to knock out 4,800 unique proteins out of an A375 cell pellet.

I'm going to reserve my opinion for now.  Nope, I lied.  If you're currently doing lots of prefractionation or 3D separations, I think that you owe it to yourself to step back from your system and give one of these optimized 1D separations a shot.  The evidence is building.

You can link to the abstract for Pirmoradian et al. here, but the pre-print release link has expired.

Tuesday, October 29, 2013

Broad researchers reassess high throughput cancer screens with good results


Last week I was in Boston and was informed that Broad rhymes with toad.  The way I was saying it was wrong and, in the absolute wrong context in casual conversation, could be misconstrued as a somewhat sexist statement.  So Broad, like toad, everybody!
For everyone else in the entire world who already knew that (my home state is infamous for saying things incorrectly, by the way, it's just how we roll), here is some really cool science that they are doing at the Broad Institute of MIT:

High throughput cancer drug screens normally work like this -- hundreds or thousands of plates or wells of immortalized cancer cells are grown under identical conditions.  An automated system doses each separate well with a different prospective drug or dosage of said drug and the efficacy of the drug is recorded by means of the destruction of cells or the inhibition of cell division or something similar.

Researchers at the Broad took a step back and decided to make the system more physiologically relevant.  By mixing leukemia cells with normal stromal cells, they can more accurately mimic the microenvironment that these cells exist in.

What they found is that they can eliminate some false positives.  Some drugs will work on leukemia cells alone, but the leukemia cells are protected from them by the presence of normal stromal cells.  Re-screening potential drugs this way will bring you to drugs that not only work in the well, but are also more likely to work under normal physiological conditions.

You can read more about it in this press release that I found through Twitter.

Monday, October 28, 2013

QE Plus, first impressions


I just wrapped up my first day on a QE Plus.  This was a full out shotgun proteomics instrument -- no extended resolution (280,000 resolution is an option) or protein mode (for improved intact analyses).  It is the "base model" plus.  First impressions:  if you've run the QE, you are ready to go on the Plus.  The controls have not changed noticeably.  In fact, if you've purchased a QE Plus, you can still get going by using my QE training videos without much of a hangup.
An option that I like a lot:  you can set your mass tolerances for your inclusion, exclusion, and dynamic exclusion lists, where the original QE is set for 10ppm (a perfectly suitable window for almost every application!).
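If you want a feel for what a ppm tolerance means in m/z units, here is a minimal sketch.  The function name and the m/z 500 example are my own choices for illustration and have nothing to do with the instrument software.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a +/- tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6  # ppm is parts per million of the m/z itself
    return mz - delta, mz + delta


# Hypothetical example: a 10 ppm tolerance around a precursor at m/z 500
low, high = ppm_window(500.0, 10.0)
print(f"{low:.4f} to {high:.4f}")
```

The window scales with m/z, so the same 10ppm is twice as wide in absolute terms at m/z 1000 as it is at m/z 500.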

Aside from that, from the instrument setup side, it is every bit as intuitive and streamlined to calibrate and set up methods as I've come to expect from this great instrument.  All improvements in engineering and sensitivity have taken place inside the box.  We experimented with some very tight MS/MS isolation windows on overnight runs and I should have something to say about the power of the new quadrupole sometime soon.

Sunday, October 27, 2013

Sirtuin (Sirt6/Sirt7) proteomics!


Sirtuins are genes/proteins of significant interest these days.  Why, you ask?  Because they seem to be key regulators of the aging functions in Eukaryotes and a lot of us self-aware and self-centered beings out there would rather not die, nor suffer age-linked fun restrictions!
Knock out SIRTs and you totally mess up the yeast life cycle.
The same holds for mice (as shown above, in an image stolen from the Nature paper that clicking on the picture will take you to).  But no one really seems to know why yet.  We have their in vitro functional activities all worked out, but there doesn't appear to be a clear link between what they're doing and the complex breakdown of senescence (programmed aging!).

Sounds like a job for proteomics!  Nuts.  I photoshopped a Q Exactive wearing a superman cape and symbol a while back for a talk I gave in Seoul, but I can't seem to find it right now.  I'll add it later!  Nevermind (I made a new one!)  I need a hobby.  This study didn't even use a QE....



Two really nice papers in press at MCP now take a swing at the functions of SIRT7 and SIRT6.
In the SIRT6 paper, "A proteomic perspective of SIRT6 phosphorylations and interactions...," by Miteva and Cristea out of Princeton, this team goes after human SIRT6 using an impressive array of molecular techniques.  The proteomics are fleshed out by an Orbitrap XL and Velos.

In "SIRT7 plays a role in ribosome biogenesis and protein synthesis," Yuan-Chin Tsai et al., out of the same group, use a similar approach to study SIRT7 knockdowns with an Orbitrap Velos.  Again, this lab demonstrates a remarkable mastery of a wide variety of molecular skills, employing top notch microscopy, genetic techniques and proteomics to really make their case and demonstrate more about the pathways of these proteins than we ever knew before.

If this is your field, or you just want to see proteomics seamlessly integrated into a comprehensive pathway study, I suggest you download one or both of these nice new papers.  The links above are to the Early release abstracts and won't be there forever.





Saturday, October 26, 2013

How does Percolator affect MSAmanda search results?

This Saturday afternoon analysis is brought to you by the fact that I don't have any hobbies that you can do when it is too cold to rock climb but not cold enough to snowboard.  It is also brought about by one of my 10 favorite questions asked by other attendees of the PD user's meeting.

Here is a paraphrase of the question:
MSAmanda gives you more peptides than Sequest using a target decoy search on high resolution MS/MS data, but how does it fare when we use Percolator?  I'm actually going to extend that question one step further, if both using MSAmanda and using Percolator give you more peptides than Sequest + Target decoy alone, are these the same peptides?  This analysis will come later.

Dataset:  A 120 minute HeLa digest run on an Orbitrap Elite using a 25 cm EasySpray column and operating in standard high-high mode employing a Top15 methodology.  So, an extremely complex high-high dataset with nice chromatography.

Processing:
1) Sequest + target decoy
2) Sequest + percolator
3) MSAmanda + target decoy
4) MSAmanda + percolator

Conditions for processing were as similar and simple as possible:  Uniprot/Swissprot database parsed on "Sapiens" alone, iodoacetamide as a static mod and M oxidation as a dynamic mod.  FDR of 0.01 as the "strict" cutoff, and those are the only peptides I looked at.  Mass tolerance of 10ppm at the MS1 level and 0.02 Da at the MS/MS level.

The following data is all at the Unique protein group level:
First Sequest + target decoy vs. Sequest+ percolator

As expected, Percolator ends up giving us more total unique protein groups.  No surprise there.

Question #1 then, does Percolator + MSAmanda do the same thing?


Yup!  Okay, I totally dig question #1.  And I wonder why the heck I added more work to it, because if I hadn't we're looking at an open and shut case.  MSAmanda is definitely Percolator compatible and at the end of the day, we are looking at more protein groups from MSAmanda whether we use target decoy OR percolator than we get from Sequest.

My conscience is saying that I need to check to see if these new peptides are any good (ugh...).  This is my opinion on false discovery rate calculations (feel free to look at my other discussions on this site), they're a shortcut.  Inherently, I do not trust them, and neither should you.  They are a mechanism to help you, but manual verification is ALWAYS a good idea.  Unfortunately, looking at tens of thousands of MS/MS spectra is a poor use of time.
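For anyone who hasn't looked under the hood, here is a minimal sketch of the simplest version of that shortcut:  count the decoy hits above a score threshold and divide by the target hits.  This is the textbook estimate only.  It is not the exact calculation in Proteome Discoverer, and Percolator's q-values are considerably more involved.  The function name, the (score, is_decoy) layout, and the example scores are all made up for illustration.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoys / targets among PSMs scoring >= threshold.

    psms: iterable of (score, is_decoy) pairs, where a higher score is better.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical scores, not taken from the dataset in this post
example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
print(fdr_at_threshold(example, 6))  # one decoy against four targets
print(fdr_at_threshold(example, 9))  # no decoys survive the stricter cutoff
```

The estimate tells you roughly how many bad matches are in the pile.  It tells you nothing about which ones they are, which is why I still look at spectra.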

My strategy:  manually look at a sample of the worst scoring peptides at your 1% FDR cutoff and at your 5% FDR (please excuse my shorthand, you know what I mean, or you probably wouldn't be reading this unless you were really odd.)  If you have crappy peptides at your 1% FDR, it isn't strict enough.  If you have great looking peptides all over the place at your 5% FDR, you are too stringent.  Adjust your cutoffs accordingly and re-evaluate.

This is how I do it on a Saturday afternoon on a dataset I'm not getting paid to analyze.
1) Go to the peptide tab for each analysis
2) Arrange the peptides in order of respective peptide score from worst to best
3) Double click on peptides 1, 5, 10, 15 and 20 to reveal the XICs with the overlaid fragment matches.
4) Rapidly score them by this point system:  10 points if you would publish that peptide spectral match as it is, 5 points if you think it is okay, and -5 points if it is some junk.  Yes, I made this up, geez!  But it works.  Remind me and I'll show more evidence at some point.

Here is how they did:
Sequest + Target decoy:  45 points (one mediocre peptide match)
Sequest + Percolator:  5 points, several bad matches
MSAmanda + Target decoy:  35 points (3 mediocre ones.  not bad, but I wouldn't publish alone)
MSAmanda + Percolator:  50 points.  In this small sample set, I would trust every one of these PSMs.  Ummm...not exactly what I was expecting...but I'm cautiously excited about it!
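
The point system is simple enough to write down.  This is a minimal sketch of it; the grade labels ("publish", "okay", "junk") and function names are mine, and the grading itself still has to be done by eye.

```python
# Points from the made-up scoring system described above
POINTS = {"publish": 10, "okay": 5, "junk": -5}


def sample_ranks(n_samples=5, step=5):
    """1-based ranks to inspect in a worst-to-best list: 1, 5, 10, 15, 20."""
    return [1] + [step * i for i in range(1, n_samples)]


def tally(grades):
    """Sum the points for a list of manual grades."""
    return sum(POINTS[grade] for grade in grades)


print(sample_ranks())
# Four publishable matches plus one mediocre one
print(tally(["publish"] * 4 + ["okay"]))
```

Four publishable matches and one mediocre one add up to the 45 points above, two publishable plus three mediocre give 35, and five publishable give the full 50.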

Okay, this is getting out of hand. Now my conscience says:  Is this due to the sample size?  I looked at the next 20 spectra, spaced every 5 and I don't think so.  You'll have to take my word on it.  But at this default cutoff, there is NO doubt in my mind that the peptides scored by MSAmanda in conjunction with Percolator are significantly better than the peptides scored by Sequest + Percolator and are on par with, or are better(!?!?!), than the peptides scored by the much more conservative target decoy search.

Examples:
In this sample set, the WORST PSM scored by MSAmanda + Percolator and passing default cutoffs:


By comparison, the lowest scoring peptide from the Sequest + Percolator search that passed default FDR cutoffs.



2 y ions?  Seriously?  This is why you CAN NOT trust your default FDR cutoffs.  Take this as a shortcut.  In case you were wondering, I gave this peptide a -5!

I'm going to cut this analysis off now.  Enough data processing for a Saturday afternoon.

Again:  Question #1, does MSAmanda work with Percolator?  My answer, based on 4 runs of 1 dataset.  Absolutely.  In fact, Percolator seems to work a whole lot better with MSAmanda than it even works for Sequest.

By the way, I'm not putting down Percolator + Sequest, I would simply tighten the FDR cutoff until I got to consistently good data.  In this example Percolator simply over-shot the mark a little and dug too hard trying to get us as many peptides as possible.  In fact, that peptide may be a good match, but it is one that I certainly would not show someone to convince them that we found their protein of interest.

Disclaimer, because I'm still a little thrown off by this:  This is one dataset.  The results are surprisingly convincing, however, and the logic is beginning to make sense to me.  Percolator is trying to dig into the data to pull out PSMs that we mistakenly threw out as false (oversimplification, but let's roll with it) and it can only do that based on the quality of the data that was originally identified.  If MSAmanda is doing a superior job of making peptide to spectral matches, Percolator has more to work with.

TL;DR:  Use MSAmanda for high resolution MS/MS spectra.  Also use Percolator; they are compatible and give you more data.  Always verify that your FDR cutoffs are giving you good data!


Friday, October 25, 2013

What happened at the PD user's meeting?


Not to rub it in if you couldn't attend, but the International Proteome Discoverer User's meeting kicked ass.  It was easily the most valuable learning experience that I had personally this year.  I'm working on hunting down the talks now and I'll provide links to them as soon as I can obtain them. I want to provide an overview of what happened.

Talk 1)  Bernard Delanghe went over basic PD functions, then plowed head on into the power of using multiple search engines, both in parallel and in series, for digging into your data.  I recently touched on the extreme results that BRIMS has had when using multiple engines in parallel.  A big emphasis of Bernard's talk was the enormous gain in processing speed and identification rates that comes from using multiple engines in series.  I've touched on that a little, but expect a number of experiments from me to follow.


Talk 2) Marshall Bern described Byonic, a software that I've previously beamed about, in particular how Byonic can be used effectively for glycopeptide analysis.  He also described the Byonic node that will soon be a purchasable upgrade for PD.  Expect an explosion of analyses and announcements and data from me when this launches.

Talk 3) Viktoria Dorfer gave a talk on the power of her creation, MSAmanda, in the scoring of high resolution MS/MS spectra.  An interesting note for proteomics software teams out there should be the fact that the creator of this fantastic new tool is still working on her Ph.D.  A highlight of her talk, for me was a comparison of the number of peptides that MSAmanda found for ETD spectra when compared to the other search engines.  This should be an extremely interesting observation for the Orbitrap Fusion teams out there, as the incredible speed of the Orbitrap (and the ease of obtaining high ETD signal) allows high resolution ETD spectra to be an efficient experimental design.  Expect experiments and data from me to follow.

Talk 4) Not to take anything away from the other great speakers, but this was Oliver Serang's day. In an incredibly amusing and informative talk, Oliver walked us through a new way of thinking about assigning protein identity to identified PSMs.  These new functions are to be available in PD 2.0.  I am currently awaiting library access to the papers that Oliver has written so far on this and other topics.  When these arrive, I'll spend some time trying to figure these out.  In the meantime, I highly recommend a Google Scholar search on his name.

Talk 5)  Automated spectral library generation node for Proteome Discoverer!  My good friend Maryann Vogelsang presented a node in development at BRIMS that will take your PD data and automatically generate spectral libraries, which can then be directly imported into Pinpoint!!!!!  Remember my entry that spectral libraries are about to blow up?  They are, especially with the thousands of terabytes of high resolution/high quality Orbitrap data out there that we can easily convert into high res spectral libraries.  Exciting!!!!

Talk 6) David Perlman gave a talk that I regretfully had to miss due to an important meeting.  Fortunately, I did get to meet him finally later that evening.  In case you aren't familiar, Dr. Perlman is the Director of the Proteomics and mass spectrometry core facility at Princeton University.  You can find out more about his facility here.

Talk 7) Proteome Discoverer 2.0!  Bernard showed it in action and took audience suggestions.  It is currently in alpha testing (and open on this PC I'm typing on, haha!) and it looks great.  Tons of new features are coming, but I'll go into them later.

Q/A session:  This was a great session.  I wrote down a large number of questions and many of them will be the feature of upcoming blog entries.

In sum, it was a great session.  I'm eagerly looking forward to next year's.  Expect much greater press coverage when the date is set for #4 and seriously think about showing up.  It is a valuable experience.

Thursday, October 24, 2013

Check out the new links on the side bar!


New on the side bar!  Direct links to all of the current Proteome Discoverer 1.4 videos, as well as access to the beginnings of the Orbitrap methods database.  The third one is only a placeholder, but it will be my attempt to de-mystify some of the excessive terminology and get us all speaking the same language!  These new pages came about from suggestions I received during the amazing Proteome Discoverer International User's meeting.  Highlights from the meeting will follow!

Wednesday, October 23, 2013

Two new papers in press at MCP highlight the value of peptide immunoaffinity enrichment


For phosphoproteomics, this has been a foregone conclusion -- you can't get down to the majority of your PTMs without enriching at the peptide level.
Two papers in press now at MCP show the value of these approaches for other PTMs, namely ubiquitination and methylation.
"Peptide level immunoaffinity enrichment enhances ubiquitination site identifications on individual proteins" from Anania et al., and "Immunoaffinity enrichment and mass spectrometry analysis of protein methylation" from Guo and Gu, et al., both support this fact for their PTMs of interest.

Perhaps more importantly, both papers provide a very nice and effective method for identifying high numbers of peptides with these modifications in a complex environment.  If you are interested in either of these PTMs, definitely download these papers now!

Monday, October 21, 2013

One hour yeast proteome!


Holy shit!  One hour for a proteome?  Not one hour for a protein or two and calling it "proteomics".  Full theoretical proteome coverage for a Eukaryote in one hour.
This study, in press at MCP, from Josh Coon's lab uses the Orbitrap Fusion to pull out ~4,000 yeast proteins in a 1 hour run. And reproduces it.

This is what happens when a great lab gets its hands on the fastest and most sophisticated mass spec ever built.  They change our perspective of what can be done and when!  Get it here.

An extremely thorough new review of plant proteomics -- where are we now?


This review is absolutely a work of love.  Called, "A decade of plant proteomics and mass spectrometry:  Translation of technical advancements to food security and safety issues," this review is co-authored by a group representing no less than 6 different countries.  I haven't worked a lot with plants, but I've helped some people who have, so I appreciate the complexity of working with organisms with such complex and often repetitive genomes.

If this is your field, download this thorough review.  You can find it open access here.


Thank you to SpectroscopyNow for leading me to it.  You can find their overview of the review here.

Sunday, October 20, 2013

Georgia Tech Starter -- Let's crowdsource some science!


Crowdsourcing is huge right now.  We're using it to refine ideas, locate speed traps on the highway (Waze!), find the best cat videos (Reddit!), and start new businesses through programs like Kickstarter.

Georgia Tech recently came up with the idea of crowdsourcing funds to get scientific research programs going.  The research that is selected for this program is extremely well filtered through a peer review process before it can be listed on the GTS page.

{Begin Ben tirade}Honestly, I consider this a little excessive.  One of the cool things about programs like Kickstarter is that we get to decide what programs we want to fund.  Having some professors pre-filter it takes a little bit of the fun out of it, in my opinion.  Their intentions are good:  the peer review process is intended to filter the research down to that which is most likely to succeed.  But I think this intention actually allows us to forget how much we learn from scientific "failures".
{End Ben tirade}

This is a really cool program and I hope to see more of these in the future!  For more information, hit the GTS website here.

Saturday, October 19, 2013

New UCSD paper shows novel way to think about proteogenomics!



This paper, currently in press at MCP, may get my vote for bioinformatics paper of 2013.  The study comes from Natalie Castellana et al., out of UCSD.

Let me frame it, first, as I see it.  We have an organism that lacks a fully complete and annotated database.  What we do have, however, is a ton of high quality next-gen sequencing data and a few million MS/MS spectra from shotgun proteomics on this organism.  Can we possibly put the sequencing and MS/MS spectra together without having the complete sequence?

It turns out the answer is yes.  Yes we can.  In this impressive study, the team took next gen sequencing data and MS/MS spectra from corn (Zea mays) and lumped the two together, using a really impressive logical progression.  I don't want to ruin the story for you, but what if we stopped thinking that unique MS/MS spectra were the coolest part of our data?  What if we, instead, took a probability based approach and considered the repeat occurrence of spectra to be an indication of how likely that observation is to be true?  Obviously, we're doing some de novo type sequencing here, and considering that every peptide spectral match has a degree of uncertainty to it (and de novo even more so!), the fact that we've made that identification more than once can actually be considered a very complex functional measurement of the level of certainty of that measurement.
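
Here is a toy way to see why repeats help.  This is my own simplification and not the model from the paper:  if each identification of a peptide has some chance of being wrong, and the errors are independent (a big assumption), then the chance that every one of n repeat identifications is wrong shrinks as that error rate raised to the power n.  The 0.2 error rate below is a made-up number.

```python
def prob_all_wrong(p_wrong, n_obs):
    """Chance that all n independent identifications are wrong."""
    return p_wrong ** n_obs


# Hypothetical per-identification error rate of 20%
for n in (1, 2, 3, 5):
    print(f"{n} observation(s): {prob_all_wrong(0.2, n):.5f}")
```

Even a fairly shaky single identification becomes hard to dismiss once the same peptide keeps turning up.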

I'm going to stop here.  I lied.  I do want to ruin the story for you, but I am not doing the story justice.

If you are working with an organism that is not fully sequenced, or you want to but the lack of sequencing is stopping you, definitely check out this paper.  "An Automated Proteogenomic Method Utilizes Mass Spectrometry to Reveal Novel Genes in Zea mays" is available in pre-release version at MCP here.

Blood based proteomics markers demonstrated as effective diagnostics for lung cancer


A fantastic new study in this issue of Science Translational Medicine demonstrates remarkable correlation between a particular combination of blood-borne peptide markers and lung cancer.  By evaluating over 300 markers in patients with benign and with cancerous lung lesions, the team found 13 markers that were highly predictive of lung cancer.

I first found this article through a mention in GenomeWeb regarding plans to commercialize this and other techniques.
The original article can be found here.

Perfinity. Remove the variability from your digestion. Oh..and do it in 4 minutes.


I think you could very successfully argue that sample prep may be the aspect of proteomics where the most chaos is introduced.  This is particularly true if you are accepting samples from multiple sources.  Digestion is one of those steps.  How much detergent (if any), urea (if any), reduction and alkylation techniques (and on and on) are all highly variable.  I'm not sure I know two people who do this exactly the same way.  And even when you do this the same way, there is still variability (see the very thorough analysis from Piehowski et al. from May of this year).

Perfinity is a company in Indiana that wants to remove the variables from your protein digestion.  They do this by employing a thermocycler and some very cool and very proprietary techniques to obtain reproducible and incredibly thorough digestion and peptide map coverage.  The technique, called flash digestion, can digest up to 96 samples at once and has the potential for creating these digestions in as little as 4 minutes.

For more information, see the Perfinity website.


Friday, October 18, 2013

New quality control tools for proteomics!


A few years ago I first submitted my book proposal for a text detailing Quality Control in Proteomics.  One of my favorite rejections involved a statement about a general lack of interest in the field.

But now!  QC is all over the place.  We have cool programs like SIMPATIQCO, and awesome reagents like the peptide retention time calibration mixture (PRTC).  More and more people all the time are using some level of quality control before just shooting their samples onto their instruments.

Now we have a new resource for targeted studies of human plasma, namely commercially available protein quality control standards.  These are being produced by a company called MRM Proteomics and you can read more about them here.

The kits from MRM proteomics are heavy labeled peptides from known biomarkers.  It is important to note that they are single peptides from these proteins of interest.  They are going to be extremely useful for testing the general health of your LC-MS system (like the PRTC peptides).

They may also be useful for setting up preliminary assays for these proteins of interest.  An important note:  the ASCP and other clinical organizations will not accept a single peptide from a protein as a positive finding for any assay.  Most reviewers will not accept a single peptide either.  But for setting up your experiment and for making sure your LC-MS is in tip-top condition, this is going to be another extremely valuable tool for our labs.

Thursday, October 17, 2013

U.S. Shutdown is over! Let's do some proteomics!


What good news!  The shutdown is over.  Time to flush air, purge solvents, calibrate, and get back at it!  This has been a particular kicker for me, as a good friend of mine is on maternity leave and she left me open access to her lab while she's out.  I got one fun full day of experimentation before everything locked down.  This morning I will be frantically rearranging my schedule to 1) get back in to retrieve the things I left there (because I never thought the government would shut down) and resume the experiments I had started and 2) try to get in to see all the people at the NIH that I had planned to work with over the last couple of weeks.

Wednesday, October 16, 2013

Do you need nanospray to do good proteomics? (Part 2)

I threw out this idea a while back, and some recent experiences really make me want to reapproach it.

This week I worked with a core facility in Pittsburgh (which is quietly becoming a mass spectrometry powerhouse of a city, btw!).  At this facility, the majority of the work coming in is small molecule work, though nucleotides, whole proteins and shotgun work do make appearances.  Due to the chemical nature of the work, the HPLC in use is a high flow Ultimate 3000, and not enough shotgun proteomics is coming in (yet) to justify the purchase of a nanoflow LC.

No problem.  Using the Peptide Retention Time Calibration mixture as a standard, we first benchmarked the sensitivity of the system (a Q Exactive with a 15 cm x 2.1 mm C-18 column, HESI source and 200 uL/min flow rate).


I know this is hard to see, but this is 1 picomol of each of the 15 PRTC peptides.  The TIC baseline is almost 1E8.  At 1 picomol.  All 15 peptides came off nicely, when they should have.  The chromatography, obviously, could be improved, but our goal was to benchmark our sensitivity.

Next came the real sample.  A group down the hall digested a mouse liver (in solution), desalted and brought over a vial.  We set up a quick, first pass run to go overnight and came in this morning to this beautiful TIC.



Again, I apologize for the grainy JPEG.  The basepeak signal intensity is around 8E8.  These are quick runs.  Little time was spent optimizing the source conditions, dynamic exclusion, fill times, etc.; we just injected a few different size aliquots of this digest.  The digest was 200 ug of mouse liver protein digested in solution and desalted.  Assuming 0% loss, the above injection is 10 ug of protein.  Considering the losses involved due to membrane proteins (liver has an awful lot of membranes in it and no detergents were used) and to desalting, I would be surprised if we were looking at 5ug of protein.
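
The back-of-the-envelope math looks like this.  It is a minimal sketch:  the 5% aliquot is simply what 10 ug out of 200 ug works out to, and the recovery values are assumptions for illustration, not anything we measured.

```python
def on_column_ug(total_ug, fraction_injected, recovery):
    """Protein reaching the column after prep losses, in micrograms."""
    return total_ug * fraction_injected * recovery


# 10 ug out of a 200 ug digest is a 5% aliquot
fraction = 10 / 200

no_loss = on_column_ug(200, fraction, 1.0)     # the ideal, zero-loss case
half_lost = on_column_ug(200, fraction, 0.5)   # assumed 50% recovery
print(f"no loss: {no_loss:.1f} ug, 50% recovery: {half_lost:.1f} ug")
```

In other words, getting 5 ug on column would require recovering half of the starting material, and I doubt we did that well.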

Yes, 10 ug on a nanocolumn is a lot.  But, come on!  10ug of protein is nothing for most biologists. You get 1-2 mg of protein from a T-75 flask of poorly growing adherent cells without even trying.

How are the results?  Using default percolator outputs provided, ~5550 peptides  and ~1250 unique protein groups.  You can't tell me that isn't awesome for a first pass, non-optimized 80 minute run.  You can, I guess, but I won't believe you.

Next, we took that run and exported a Q Exactive exclusion list through PD and put that in for an otherwise identical rerun.  Summing the two runs resulted in ~1650 unique proteins and ~7000 unique peptides.  Not too shabby, in my opinion!

This experience was reinforced when I spent some time working with a big company recently.  Although I saw dozens of mass spectrometers doing proteomics experiments, I never once saw a nanoflow system.  I think that experiments of this kind are becoming more of a regularity than a novelty.  And the work this week demonstrates why.  Mass spectrometers are sensitive enough to work without the added sensitivity from nanoflow sources.  Unless you really need to be digging into the noise to look for the lowest copy number peptides and PTMs, electrospray or microspray may be enough to get you the proteins that you need.  Nanospray will always result in higher signal, but sometimes you have to take a step back and think about just how much signal you really need.
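
To put numbers on what the exclusion list rerun bought us, here is a quick sketch using the approximate counts from this post.  I am treating the first run's protein groups and the combined run's unique proteins as comparable, which is an assumption.

```python
def percent_gain(before, after):
    """Percent increase going from `before` to `after`."""
    return 100.0 * (after - before) / before


protein_gain = percent_gain(1250, 1650)   # ~1250 groups -> ~1650 proteins
peptide_gain = percent_gain(5550, 7000)   # ~5550 -> ~7000 peptides
print(f"proteins: +{protein_gain:.0f}%, peptides: +{peptide_gain:.0f}%")
```

That is roughly a third more proteins and a quarter more peptides for one extra 80 minute run.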

BTW, these screenshots were graciously given to me by Dr. Bhaskar Godugu, Director of the Mass Spectrometry Facility (Chemistry) at the University of Pittsburgh.  



Tuesday, October 15, 2013

Spectral libraries vs. search engines


Wow.  That image is terrible.  Ugh...
Anyway, it's the content that matters!  And this topic is going to be cool, and important in the future.

Spectral libraries are something we're hearing more about.  That's because the libraries are getting bigger, more useful, and improving in quality.  I think it is a good time to take a look at what they are and what advantages/disadvantages they are bringing for us.

Spectral library searches have been around for a long time.  Originally, they were used for small molecule searching, but they were quickly adapted for peptide searching.  The two that are integrated into Proteome Discoverer 1.4; MSPepSearch (NIST; link coming when the government reopens...) and SpectraST  were two of the earlier algorithms to pop up.  

The concept is simple -- you take spectra that you (or somebody else) have sequenced and identified with high confidence in the past and you put them in a library.  Then on your next experiment, rather than go through the statistical magickery of a search engine, you simply compare all of your MS/MS spectra to those in your library.  If the new spectrum looks like the old spectrum, you have a match.
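To make the "looks like" part concrete, here is a minimal sketch of one common way to compare two spectra: bin the peaks by m/z and take a normalized dot product (cosine score).  This is an illustration only, not how SpectraST or MSPepSearch actually score matches.  The function names, the 1 Da bin width and the 0.7 cutoff are all assumptions of mine, and a real library search would also restrict candidates by precursor mass first.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumed value)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_score(query, reference, bin_width=1.0):
    """Normalized dot product of two binned spectra: 1.0 = same shape, 0.0 = no shared bins."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    dot = sum(q[b] * r[b] for b in q if b in r)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in r.values())
    )
    return dot / norm if norm else 0.0


def best_library_match(query, library, min_score=0.7):
    """Return (name, score) of the best-scoring library entry, or (None, score) below the cutoff.

    `library` is a hypothetical dict of peptide name -> list of (m/z, intensity) peaks.
    """
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = cosine_score(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= min_score:
        return best_name, best_score
    return None, best_score
```

Because every comparison is against a spectrum that was really observed, the library entry carries real fragment intensities, which is what lets a score like this work with fewer and weaker fragments.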

PROS
Faster.  Way faster.  In the original paper for SpectraST (by the way, I just found out today that this rhymes with "contrast"), on the same PC, the spectral query speed for SpectraST was 0.005 seconds, while Sequest was 6.4 seconds.  That is almost 1300 times faster (6.4 / 0.005 = 1,280).  Partly this comes from the fact that you are comparing one spectrum that actually occurred to another.  In a Sequest search, the engine has to look at every possible MS/MS fragment ion and do that comparison.

More sensitive.  By comparing two spectra, you can get away with fewer fragments, and lower-intensity ones, than you can with a traditional search engine, mostly for the reasons mentioned above.

CON
You've got to have a library.  And an okay library only cuts it if you want okay results.  If you want good results, you need a good library, and excellent results...  If your spectrum has a PTM, but that PTM has never been recorded in a library, that result is gone.

And this is why I haven't really used these engines.  The libraries just haven't been good enough.  But this is the good news:  They are getting better.  Much better.  High resolution MS/MS libraries are around the corner and new tools are coming.

But this is the best news:


With the completion of all these new libraries, we won't have to worry about whether spectral library or traditional searching is better, because we can use them both.  We can use the spectral library engine to filter rapidly through matches, then we can take the spectra that don't match and send those (and only those) through Sequest, Mascot or another engine.  Then, if you really want to, you can take the spectra that still don't match, export those, and search them with Byonic or Peaks or PepNovo+ and really get down into your data.
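The logic of that cascade is easy to sketch.  What follows is a hedged illustration of the idea only, not the Proteome Discoverer workflow: `cascade_search` and the shape of the stage functions are hypothetical names I made up, and each stage is just any function that takes a spectrum and returns an identification or None.

```python
def cascade_search(spectra, stages):
    """Run each stage only on the spectra that earlier stages left unidentified.

    spectra: dict of spectrum id -> spectrum data (any object the stages accept)
    stages:  ordered list of (stage_name, search_function); each search function
             takes one spectrum and returns an identification, or None for no match
    Returns (identified, remaining), where identified maps
    spectrum id -> (stage_name, identification).
    """
    identified = {}
    remaining = dict(spectra)
    for stage_name, search in stages:
        still_unmatched = {}
        for spec_id, spectrum in remaining.items():
            hit = search(spectrum)
            if hit is None:
                # No match at this stage: hand the spectrum to the next one.
                still_unmatched[spec_id] = spectrum
            else:
                identified[spec_id] = (stage_name, hit)
        remaining = still_unmatched
    return identified, remaining
```

You would call it with something like `stages = [("library", library_search), ("database", database_search), ("de novo", de_novo_search)]`, where those three functions are stand-ins for whatever engines you have.  The fast stage goes first so the slow ones only ever see the leftovers.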

For a video on how to set up this last part, follow this link (watch in HD only!)



Monday, October 14, 2013

Tandem mass spectral libraries for phosphopeptides

Analyzing proteomics data with spectral libraries has become possible, but it is held back by the lack of a comprehensive spectral library.  It is evident, though, that such libraries will become more popular in the coming years.

This technical note from Henry Lam's lab, just accepted in JPR, describes the generation of spectral libraries for phosphopeptides from human as well as four other model organisms. The claim is better sensitivity than conventional database searching for identifying phosphorylated peptides. The other good part is that this library is freely available and can be searched using the SpectraST algorithm (PD 1.4 has this algorithm built in for spectral library searching).

I will be testing this phospho-spectral library to see how it performs in my hands.


Proteome Discoverer International User's Meeting 2013!



Thermo Proteome Discoverer Users’ Meeting Agenda

Courtyard Marriott, 777 Memorial Drive, Cambridge

Wednesday, Oct 23, 2013

08:00 – 09:00 Registration and Breakfast

09:00 – 09:15 Welcome

09:15 – 10:00 PD 1.4: Overview of New Features; Bernard Delanghe, Thermo Fisher Scientific


10:00 – 10:45 Byonic and Preview: New Tools for Proteomics and Glycoproteomics Analysis;

Marshall Bern, Protein Metrics Inc.


10:45 – 11:00 Break


11:00 – 11:45 Using MS Amanda for Identifying High Resolution and High Accuracy Tandem Mass Spectra;  Viktoria Dorfer, University of Applied Sciences Upper Austria, Campus Hagenberg


11:45 – 12:30 A Graphical Bayesian Approach to Mass Spectrometry-based Protein Identification
Spectra;  Oliver Serang, Thermo Fisher Scientific


12:30 – 13:15 An Automated Tool For Creating and Using Curated Spectral Libraries; Barbara Frewen, Thermo Fisher Scientific


13:15 – 14:15 Lunch

14:15 – 15:00 PD 2.0 preview;  Bernard Delanghe, Thermo Fisher Scientific

15:00 – 15:30 User presentation TBC

15:30 – 15:45 Break

15:45 – 17:00 Q&A session and wrap-up

17:15 – 18:00 BRIMS tour

Thanks to some last minute scheduling changes, partially brought about by the Government shutdown, I will get to be there.  To register, follow this link.

Sunday, October 13, 2013

Thorough analysis of the most influential authors in proteomics



I found this really nice blog through a Twitter link the other day.  I strongly recommend that you check it out.  A cool entry is a breakdown of the number of proteomics/genomics/bioinformatics publications over time, as well as the number of citations the big guys are getting.  What is interesting/unfortunate is the complete lack of women on this list.

Saturday, October 12, 2013

Iterative searching in Proteome Discoverer!


Want to boost your peptide IDs in Proteome Discoverer as well as the confidence of those IDs?  Move from your regular search to iterative searching.  I looked it up (it·er·a·tive, adj. 1. Characterized by or involving repetition, recurrence, reiteration, or repetitiousness).

Here is the gist of it.  You take your file and you give it different instances of search engines with different combinations of likely modifications.  For example, if your peptide is one that has acetylation and phosphorylation, you need to have both of those mods in that search, but it is impossible to have every combination (or even very many) in one search engine.  Using multiple engines allows you to search more modifications.  It also allows you to get better confidence on your ID'ed peptides.  For example, if Mascot, Sequest and MSAmanda all give you the same peptide spectral match, then it is more likely to be true than if only one of the engines gives it to you.  So your confidence increases.
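The "more engines agree, more confidence" part can be sketched as a simple vote.  This is a toy illustration under my own assumptions, not how Proteome Discoverer combines results: `consensus_psms` and the `min_engines` cutoff are hypothetical, and each engine's output is reduced to a plain spectrum id -> peptide mapping.

```python
from collections import Counter, defaultdict


def consensus_psms(engine_results, min_engines=2):
    """Keep peptide-spectrum matches that at least `min_engines` engines agree on.

    engine_results: dict of engine name -> {spectrum id: peptide sequence}
    Returns {spectrum id: (peptide, number of engines reporting that peptide)}.
    """
    votes = defaultdict(Counter)
    for psms in engine_results.values():
        for spectrum_id, peptide in psms.items():
            votes[spectrum_id][peptide] += 1

    consensus = {}
    for spectrum_id, counts in votes.items():
        peptide, n_engines = counts.most_common(1)[0]
        if n_engines >= min_engines:
            consensus[spectrum_id] = (peptide, n_engines)
    return consensus
```

A spectrum where each engine reports a different peptide never reaches the cutoff, so disagreements drop out instead of being counted as IDs.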

How are the results?

On each sample, this iterative (comprehensive) search increased the number of ID'ed peptides.  By a lot.  Yes, at first it seems crazy, but it sure looks like it works.  For full information, look for this on Planet Orbitrap:



Friday, October 11, 2013

DanteR: Data Analysis software by PNNL


Pacific Northwest National Laboratory (PNNL) has developed many useful and handy software tools, especially for proteomic data analysis. DanteR is one of them, and it uses the R programming environment. It is user-friendly, with a graphical user interface, and helps with many of the common tasks such as normalization, ANOVA, and clustering, to name a few. It is easy to install and intuitive. The best part is that if you are a programmer, you can add your own programs to it using the add-on feature. You can download DanteR from here.

Deep proteome coverage by 3 dimensional fractionation


Do you REALLY need proteomics depth?  I mean, do you REALLY need it?  Then we know how to get it.  Fractionate, fractionate, fractionate.

This new paper from Ilian Atanassov and Henning Urlaub shows how to get crazy proteome depth by employing a 3D separation approach.

1) Do SDS-PAGE at the protein level
2) Cut band and digest
3) Separate the peptides by isoelectric focusing
4) Run those fractions on LC-MS/MS

Using this approach, they were able to identify >3,000 peptides out of a single gel slice.  Worth checking out if you really just have to get to the bottom of your proteome.

Thursday, October 10, 2013

NeuCode for de novo peptide sequencing?


On Tuesday I saw my favorite talk of the year.  Anna Larson-Merrill from the Coon lab gave a talk on NeuCode proteomics for the Indianapolis NPI conference.  What an amazing and flexible technology!  All of the benefits of SILAC and TMT, with only the limitation that ultra-resolution is needed for quan.

Then I found out that yet another application of NeuCode has been discovered.  In press at MCP is this paper from the Coon lab where they show the use of NeuCode doublets for the improvement of de novo sequencing data.  I stole this figure (below) 'cause it's too cool to not post (don't sue me! see disclaimer page!)

If you are interested in de novo you need to check this out!  (Bonus points for these authors for using PepNovo+!)

Updated TMT 10plex method in database


The Orbitrap methods database just had its first suggestion/correction.  The scientists of Thermo's Proteomics marketing division, Dr. Rosa Viner and Dr. Michael Blank, have spent months optimizing methods for running the TMT 10plex experiments on various instruments and have provided me with an optimized method for this.  It is titled "revised" in the methods database.

Thank you!!!  Let's keep these comments and suggestions coming.  I really want to make this a working resource available for everyone.  Orbitrap methods database.

Wednesday, October 9, 2013

Hekate - A new crosslinking analysis program for people with experience in Linux and Perl



This new article (ASAP) at JPR describes Hekate, a new software package for evaluating protein crosslinking.  You can download it from GitHub here.

And that's where I have to stop.  I have Perl and when I really want to, I can run a Perl script.  Unfortunately, this is a tough one.  GitHub suggests that you get support from another resource (link provided).

Hekate then falls on my list of nice software ideas that are implemented in such a way that the software really isn't accessible to labs that don't have their own bioinformatician or programmer.

New phosphoproteomics review in press at MCP


In press at MCP is this nice new phosphoproteomics review from Philippe Roux and Pierre Thibault.  While the first half of it is your run of the mill review on this topic, the second half is a very nice review of recent findings in cross-talk between various post-translational modifications.  The figures are also of extremely nice quality.  All in all, a pretty nice review and definitely deserving of placement in MCP.

Monday, October 7, 2013

Nice article on Next-gen sequencing.


Shotgun proteomics requires databases.  So-called "next gen" sequencing devices rapidly and (relatively) inexpensively generate new databases.  This article at IEEE Spectrum gives a very nice run down of how these new devices work, as well as a nice look at the problems we are facing with where to place all this data.  Since this is obviously a problem we're also seeing in our field, hopefully one group will come up with a good solution soon and we can share it!

Intact antibody run on the Orbitrap Fusion


I heard a weird rumor the other day:  That the Fusion can't do intact protein analysis.  I disputed it when I heard it, and figured I ought to have these screenshots up somewhere in case this myth persists somehow.


This is from a Fusion OT run on a commercial reduced antibody at 15k resolution.  I did a rough deconvolution and it came out pretty solid:



For this antibody, the expected masses are in the table above.  Not too bad, right?  Now, if you were being really critical, you'd probably say this isn't all that pretty.  And the truth is that 15k resolution isn't that amazing for an intact analysis -- fortunately this is an instrument that can do 450,000 resolution.  And this file wasn't optimized forever to look amazing.  It was just a run to see what the intact masses look like on the instrument.  When it looked okay, we moved on to the next test.  If you have further doubt about the ability of the Fusion to do intacts, I can upload this RAW data file.  Please leave me a comment below if you want it and I'll make it available.