Monday, June 25, 2012

Free Graphpad calculations

This webpage came up in a conversation over the weekend.  It's one of those things that has been around for so long I thought everyone used it.  This site by GraphPad will do statistical calculations (unpaired Student's t test and Welch's t test) online in about 5 seconds.  Cut the columns out of an Excel spreadsheet and paste them right into the site's two columns.  Forget what p value is significant or extremely significant?  Who cares, GraphPad will tell you.  The only drawback is that you are limited to 2 sets for comparison.  You have to buy the real version for more.
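If you are curious what the two tests actually compute, here is a minimal Python sketch of the t statistic and degrees of freedom for each.  This is my own illustration, not GraphPad's code, and the function names are made up.  It stops at the t statistic; turning that into a p value needs the t distribution, which SciPy's `scipy.stats.ttest_ind` handles (with `equal_var=False` for Welch's test).

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Unpaired Student's t test: assumes both groups share one variance."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t test: does not assume equal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

With equal group sizes the two t statistics come out the same; only the degrees of freedom (and therefore the p value) differ.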

Saturday, June 23, 2012


Okay, so this is probably a weird entry.  Weird because I don't want to lose credibility with my small audience because I'm writing about a product produced by my new employer.  Regardless, I really wanted to write this entry because Pinpoint is a great piece of software.  And if you're like me, you've heard of it, but don't really know what it can do.

This entry is also probably a bit premature, because it looks like I've only scratched the surface of what Pinpoint is capable of.  But after several days of using it, I'm beginning to become very comfortable with the basic functions of the software.

First of all, Pinpoint is for targeted proteomic studies.  Specifically, if you have a protein (or, more importantly, proteinS) of interest and want to perform targeted qual/quan analysis on different samples for the specific peptides, Pinpoint is your software.

When I participated in targeted studies at the NIH, we always felt that we were limited to peptides that we had identified in discovery runs.  When we specifically went looking for a protein or peptide that had not shown up in a previous experiment we went through the following steps:
1)  Looking up the protein sequence through NCBI (and trying to guess the right one, because the NCBI is almost at the point where it has too much information to sort through)
2) Taking the sequence (that is hopefully correct!) to the UCSF protein prospector and performing an in silico digest of the protein sequence.  Since you can't save your settings, every modification and setting has to be re-inputted every time you reopen the webpage.  (Please don't think I'm knocking the Prospector project, btw, I will never fully express my gratitude to the University of California for setting up and maintaining that site!  See the lavish praise I showered on this site in my first book for more information!)
3) From the Prospector output, manually remove all singly charged, redundant, and unlikely peptides from the massive list of possible peptide masses that resulted
4) Move that list into the Always Include box within your Xcalibur method.
5) Run the sample
6) Manually extract the peaks and areas for each peptide that you found using Xcalibur and plot your standard curve using Excel
7) Make a professional looking output graph in Powerpoint.  The trick is adjusting your peaks for the inevitable little shifts in retention time (or big shifts, if you are using an Eksigent...)
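For anyone who wants to script steps 2 and 3 instead of doing them by hand, here is a minimal Python sketch of an in silico tryptic digest plus an m/z calculation.  It is my own illustration, not how Prospector or Pinpoint does it.  The assumptions: trypsin cuts after K or R unless the next residue is P, cysteines are carbamidomethylated (iodoacetamide), and the function names and the length filter are hypothetical.

```python
# Monoisotopic residue masses in Da.  Cysteine includes the +57.02146
# carbamidomethyl shift from iodoacetamide alkylation (an assumption here).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence, missed_cleavages=2, min_length=6):
    """Return unique tryptic peptides, allowing up to N missed cleavages."""
    seq = sequence.upper()
    # Collect cut positions: after K or R, but not when followed by P.
    cuts = [0]
    for i, aa in enumerate(seq[:-1]):
        if aa in "KR" and seq[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(seq))

    peptides = []
    last = len(cuts) - 1
    for start in range(last):
        # Join 1 to (missed_cleavages + 1) adjacent fragments.
        for end in range(start + 1, min(start + 1 + missed_cleavages, last) + 1):
            pep = seq[cuts[start]:cuts[end]]
            if len(pep) >= min_length and pep not in peptides:
                peptides.append(pep)
    return peptides


def peptide_mz(peptide, charge):
    """Monoisotopic m/z of a peptide carrying `charge` protons."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge
```

Calling `peptide_mz(pep, 2)` on each peptide from `tryptic_digest(your_sequence)` gives a doubly charged mass list that could be pasted into an inclusion list, which skips the "manually remove all singly charged peptides" step.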

And this way works.  I know that some of my readers are still doing it this way.  After you've done it a few times, it doesn't take all day to show that your protein is upregulated after drug treatment, just most of a day.

Pinpoint is so amazing because it does all of it for you.  Every bit of it.  In about a minute.

This is the order of events
1) You tell Pinpoint what protein(s) you are interested in.  It can be from your own FASTA file, or it will look it up for you and it inputs the sequence.
2) You tell Pinpoint what you want to digest it with.  You can save these settings, so it knows you use trypsin and you expect no more than 2 missed cleavages, and you use iodoacetamide.
3) It digests your protein and gives you only the good, relevant peptides that you will be able to get information on.  
4) It also predicts the RETENTION TIME of said peptides.  No joke.
5) You export your file and copy it right into your Method file.
6) You make your runs
7) You drag the completed .raw files into Pinpoint
8) It plots the abundance of all of the peptides from your theoretical digest from each of the samples, lining up the peaks and making all the small adjustments in retention time.  The result is an absolutely gorgeous output file.
If you are doing targeted proteomics studies the way that I mentioned above, you want this software.  But seriously, don't take my word for it.  Go to the Thermo-BRIMS portal and download the free trial version.

What mass range contains the most (and best) fragment ions? Part 2

This is a continuation of an entry from a couple of weeks ago.  The question is this:  If I'm using an LTQ Orbitrap system, should I always be scanning in MS1 (and selecting fragment ions) with an m/z from 300 to 2000, or am I wasting valuable scan time?  Should I really be scanning from 500-1,000 because there is no useful information in the low or high mass ranges?  In the first analysis we looked at a file that was generated from an LTQ Orbitrap Velos using a standard Top10 CID method.  What we observed in that experiment was that although some fragment ions were generated in the m/z range >1600, no peptide sequences were actually obtained from these ions.

The follow up question:  Is this some artifact of the LTQ?  Does the ion trap simply do a better job of fragmenting ions in the mass range I observed?

In answer to this, I obtained two files from an old colleague.  In this experiment, their lab was comparing the Top 10 CID method on an Orbitrap Velos (the high/low method) to the HCD based Top 10 method (high/high).  The same sample (a fraction of human serum) was run in both methods.  The same amount of sample was injected, and the MS1 was set at 60,000 resolution.  The HCD Top 10 ions were read in the FT at a resolution of 7,500.

The CID method identified 57 proteins from 138 high confidence peptides
The HCD method identified 44 proteins from 99 peptides.
-Since the high/high method on a standard Velos is slower than the high/low method, these results are definitely in the range I would expect.

The average m/z of the identified peptides from the CID method was ~821
The average m/z of the peptides from the HCD was 769, which seems interesting until you note that the standard deviations around these two averages are ~150 amu.  I was about to consider them the same, but decided to do a Student's unpaired t test.  The p value is 0.0162, which is technically a significant value.  With error bars overlapping this much, I am only willing to say that the HCD may allow the identification of peptides of lower m/z.

Here are the ID'ed peptide distributions by m/z

First of all, I want to caution that this is, again, one experiment, from one run of one particular sample run in one software (PD 1.3) versus one search engine (Sequest).  But from this extremely limited dataset, it looks like using the high/high method versus the high/low method results in a somewhat different distribution of confidently identified ions.
I'm definitely not done with this concept, or even the analysis of this one pair of samples, but that is all the time I'm willing to commit to this today.

Monday, June 18, 2012

Protein Identification using Top-Down

I admit it, I first downloaded this article in June's MCP because I thought that it was going to be a nice review of recent advances in top down proteomics.  For anyone who hasn't done this kind of work, this is where you do not digest your proteins before performing LC-MS/MS analysis.  When I was doing top down work years ago, I liked to start with 1 single protein and post-deconvolution, I could deliver results + or - 10 daltons.
Recent advances in LC separation of intact proteins as well as high resolution mass spectrometry and new fragmentation methods have allowed several recent studies to report the identification of hundreds of proteins in a single LC-MS/MS run.  It isn't uncommon for labs these days to report deconvoluted mass accuracy in the parts-per-million (PPM) range.
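To put my old + or - 10 daltons next to a ppm figure, here is a small Python sketch of the conversion.  The function names are my own, and the 30 kDa protein in the note below is a hypothetical example, not a number from the paper.

```python
def ppm_error(observed, theoretical):
    """Mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def ppm_window(theoretical, tolerance_ppm):
    """Lower and upper mass limits for a given ppm tolerance."""
    delta = theoretical * tolerance_ppm / 1e6
    return theoretical - delta, theoretical + delta
```

For a hypothetical 30,000 Da protein, being off by 10 Da works out to roughly 333 ppm, which shows how far intact mass measurements have come.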

The paper we are discussing here is not, however, a review of top-down proteomics.  This paper is a description of a new piece of software for performing top-down analysis, called MS-Align+.

In order to evaluate the effectiveness of MS-Align+, the authors separately harvested proteins from yeast and a species of Salmonella.  The proteins were separated on long LC runs (600 minutes) on a system coupled to an LTQ Orbitrap.  The MS1 spectra were obtained at 60,000 or 30,000 resolution, depending on the experiment.  The intact proteins were fragmented using the HCD cell and the MS/MS spectra were also obtained in the FT cell at a resolution of 30,000.

The obtained spectra were then evaluated with MS-Align+, Mascot, and OMSSA.  Even though one of the primary goals of the project was to optimize MS-Align+ for simplicity and speed, the authors report that MS-Align+ compared favorably against every other algorithm they used.

Summary:  Despite what the title suggests, this is not a review paper on top-down proteomics.  It is a nice paper describing a new algorithm for top-down analysis.  The software uses some clever mathematics to run quickly, even on outdated desktop computers.  Unfortunately, due to the incredible similarity of the article's title to that of recent reviews on this subject, I fear that information on this algorithm will not disseminate as quickly as it deserves.

Wednesday, June 13, 2012

Systematic Comparison of Fractionation Methods for In-Depth Analysis of Plasma Proteomes

This paper from Cao, et al., came to my attention while digging through references for a paper I am constructing with my soon-to-be ex-lab.  I've read this through 4 times, because I am absolutely amazed by how different the protocols described in this paper are from the way that I have learned to do things.  These are primarily small differences, but it is the number of these alterations that astounds me.
For example:
-Dimethylacrylamide (DMA) was used to alkylate proteins
-Peptides were desalted with ultra-MicroSpin columns from the Nest Group
-The Orbitrap XL performed MS1 at 60,000 resolution, and accumulated MS/MS data from the Top 6 ions
-The data was processed with Bioworks against the UniRef 100 database
-MS1 tolerance was set at 100 ppm and the MS/MS tolerance was 1 Da
-A variable modification was set for the deamidation of asparagine
-MySQL was used to compile the data

Not one of the steps listed above is how I would have done this experiment, but that obviously doesn't matter because their data is beautiful.  They show thousands of unique high scoring peptides for each sample preparation method they evaluate.

That brings me back to the actual paper.  The goal, as stated in the title, is to show what methods produce the best coverage of the plasma proteome.  They compare 1-dimensional gels to OFFGEL fractionation to high pH RP-HPLC, as well as the pros and cons of collecting more fractions within each method.  Overall, it is an extremely meticulous work.  If you are interested in plasma proteomics you need to read this paper.

The real take-away for me, however, is how many changes can be made in MS/MS downstream processing that will still result in good peptide coverage.  This also suggests how difficult it is going to be to make all of us transition over to the unified protocols we all know are necessary to really move proteomics into the promised land of cross-laboratory reproducibility.

Tuesday, June 5, 2012

What mass ranges yield the most (and best) fragment ions?

Here is the experimental setup:  normal mouse serum was taken and depleted of the 4 most common proteins: albumin, transferrin, IgG, and the fourth one (I forget).
The remaining proteins were digested with trypsin overnight, using a simple FASP-like method in an Amicon filter with a 10 kD MW cutoff.
The peptides were desalted by ZipTip and 1 ug of peptides was loaded on a 30 cm MAGIC column from Proteomics Plus, running at 250 nL/min.  A standard top 10 method (CID) was used on an Orbitrap Velos with a dynamic exclusion after 2 occurrences.  Only ions with >1 charge were used, with an m/z of 350-2000.  The data was processed in Proteome Discoverer 1.3 using Mascot and Sequest with Percolator rescoring.
1348 unique fragmentation events resulted in 956 high confidence peptide IDs (71% identification rate).

I have been curious for a long time about the distribution of useful ions.  The real question is, I guess, am I wasting time?  Are there any peptides in the low and high mass ranges, and if not, why am I scanning all the way to 2,000 on every MS/MS?
The average positively ID'ed peptide m/z was:  890.45, with a median of 892.98
The average fragment m/z was:  856.61, with a median of 836.96

What does the actual distribution look like?
Peptides first:

Wow.  Keep in mind that I started at an m/z of 350, but it's pretty obvious that we don't get a lot of ID'ed peptides from the lower m/z range.  Nor do we see anything in the high mass range, with only 1 peptide ID'ed with an m/z >1600.

Does the distribution of fragmented ions look the same?
No.  Up to about 1,400 m/z the distribution looks exactly the same.  It seems like we are picking fragments purely by random distribution.

How do the two overlap?  If the totals from each chart are adjusted to 100%, the distribution looks like this:
The red bars are the adjusted number of fragment ions within that mass range and the blue bars are the number of confident peptide IDs.  What is striking, I think, is the number of low mass fragments we obtained that were not ID'ed as peptides.  It is possible that these sequences were too short to reach our stringent peptide ID cutoffs.  The adjustment causes a funny occurrence around the peptide/fragment median, where we actually identify more peptides than we have fragments.  This is merely an adjustment error that reflects the peptide ID m/z median.
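The adjustment described above is just each series divided by its own total.  Here is a minimal Python sketch of that binning and normalization; the bin width, the mass limits, and the function name are my assumptions for illustration, not the exact settings behind the chart.

```python
def percent_by_bin(mz_values, low=300, high=2000, width=100):
    """Bin m/z values and express each bin as a percentage of the total."""
    n_bins = (high - low) // width
    counts = [0] * n_bins
    for mz in mz_values:
        # Values outside [low, high) are ignored rather than clipped.
        if low <= mz < high:
            counts[int((mz - low) // width)] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]
```

Because the fragment list and the peptide ID list are each scaled to 100% separately, a bin can hold a larger share of the IDs than of the fragments even though every ID came from a fragment.  That is the funny occurrence around the median.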

These results beg further analysis, and the first questions I have when looking at this are: 1) is this reproducible, or simply a single occurrence in mouse serum and 2) is this a consequence of using the FT for MS and IT for MS/MS?  I'm going to look at 2 experiments where we performed FT-IT and FT-FT (HCD) analysis of the same samples.

The real take away message here is that we may be wasting precious scanning time in the >1600 m/z range in an FT-IT experiment that is reaping no benefits to our research.

Sunday, June 3, 2012

Static exclusion on an Orbitrap Velos

One of the most useful, and possibly least exploited features on an Orbitrap is the Reject Mass List dialog box. This feature allows you to create a list of up to 2,000 ion masses that are ignored when ions are selected for fragmentation.  For uncovering low-abundance peptides, there may be no more powerful function in Xcalibur.
For example, 95% of the protein in human plasma consists of only 14 proteins, primarily albumin.  Even when serum depletion columns are used to strip these proteins out of solution, they are still prevalent.  A single 140 minute LC-MS/MS gradient on a Velos using a standard Top10 method with a dynamic exclusion on peptides > 1 occurrence will result in a peptide output list that results primarily from these proteins.  This is where static exclusion can help.
In a recent study in our lab, we separated tryptic peptides from depleted plasma on a 240 minute LC gradient using a 30 cm MAGIC column from Proteomics Plus.
The first run resulted in roughly 300 peptides from <80 proteins.  A reasonably typical result.

All of the peptides from the first run were excluded from the second run.

The second run resulted in >600 peptides from >140 proteins.

All of the peptides from the first two runs were excluded from the third run.

Now, we're getting somewhere.  The third run pulled out >900 peptides from >400 proteins.  The best part?  Less than 5% of the peptides were from our 14 highest abundance proteins.

It's important to note that this was static exclusion + dynamic exclusion.  Using the two hand-in-hand will allow you to dig deeper into the proteome than either alone.
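Building the list for each successive run is simple bookkeeping, and it can be sketched in a few lines of Python.  This is a hypothetical helper, not something Xcalibur provides.  It assumes you have the precursor m/z of every identified peptide from each earlier run, and that when you hit the 2,000 entry limit you would rather keep the masses seen in the most runs.

```python
from collections import Counter


def build_reject_list(runs, max_entries=2000, decimals=2):
    """Merge identified precursor m/z values from earlier runs into one list."""
    counts = Counter()
    for run in runs:
        # Count each rounded m/z once per run so repeats within a run
        # do not inflate its rank.
        counts.update({round(mz, decimals) for mz in run})
    # Rank by how many runs contained the mass, then by m/z for stable ties.
    ranked = sorted(counts, key=lambda mz: (-counts[mz], mz))
    return sorted(ranked[:max_entries])
```

The rounding to 2 decimals and the ranking rule are my choices for illustration; the right mass tolerance depends on how the reject list is set up in your method.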

For our work, we are attempting to develop a universal exclusion list to use on the first run of every sample.  That way we aren't wasting precious running time on albumin and IgG.

Saturday, June 2, 2012

Enhanced Identification of Peptides Lacking Basic Residues

The new issue of Proteomics has 2 articles in it that I find really interesting.  The first is by Martin Biniossek and Oliver Schilling of the BIOSS Center.  This paper begins to address one of the gaping holes in shotgun proteomics, the requirement that peptides are multiply-charged.  I know I wrote about this once before, but it may be in an article that I haven't transferred from the old blog.
Anyway, in most cases, your peptide must have more than 1 charge on it or you won't get a good fragmentation pattern.  The MS/MS spectra will show only the fragments that maintained the charge.  For example, if the terminal amino acid on one side of your peptide is charged, you will only see fragments that contain that amino acid.  Since half of the potential fragments are now invisible, you probably won't get enough information from the MS/MS spectra to accurately sequence that ion.  There are other reasons for investigating doubly charged ions that I go into in that missing post.  When I find it, I'll insert it here.
This paper attempts to fill that hole by studying the MS/MS spectra of singly charged ions.  Their motivation is the fact that the use of some alternative proteolytic enzymes results in fewer multiply charged peptides than you get with a tryptic digest.  The real emphasis of this project was to determine if the analysis of singly charged peptides would increase the coverage of peptides lacking a basic residue.

For this study they used two enzymes, GluC and chymotrypsin, to digest the proteome of E. coli strain MG1655.  Since these enzymes do not cut at basic residues, the peptides are less likely to multiply-charge.  The proteome was also digested with trypsin, for comparison's sake.
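The reasoning here can be sketched in Python with a common rule of thumb: a peptide can pick up roughly one proton at its N-terminus plus one for each basic residue (K, R, or H).  This is my own simplification, not a calculation from the paper, and the function names are hypothetical.

```python
BASIC_RESIDUES = "KRH"


def lacks_basic_residue(peptide):
    """True if the peptide contains no lysine, arginine, or histidine."""
    return not any(aa in BASIC_RESIDUES for aa in peptide.upper())


def rough_max_charge(peptide):
    """Rule-of-thumb charge ceiling: N-terminus plus one per basic residue."""
    return 1 + sum(peptide.upper().count(aa) for aa in BASIC_RESIDUES)
```

A tryptic peptide always ends in K or R (except at the protein C-terminus), so it starts at a ceiling of 2 by this estimate.  A GluC or chymotrypsin peptide with no basic residue tops out at 1, which is exactly the population this paper is chasing.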

The MS/MS analysis was performed with a QStar Pulsar coupled to an Ultimate 3000 system.
The experiment was done in three ways:
1) Ions were only selected for MS/MS if they were singly charged
2) Ions were only selected for MS/MS if they were multiply charged
3) Ions were fragmented regardless of charge state

As expected, experiment 1 resulted in low peptide coverage, with the chymotrypsin digest producing the most identified peptides, 64.  However, >95% of the peptides identified from the chymotrypsin and GluC digests were found to be the peptides of interest, the ones lacking basic residues.
Experiment 2, the classical approach, resulted in the largest number of ID'ed peptides, with trypsin the clear leader at 1108 peptides.
Experiment 3 resulted in fewer peptide IDs, with the highest number again coming from trypsin, but only 989 were ID'ed.  This is surprising at first, until you think about it.  I am sure that more ions were selected for fragmentation in the third experiment than in the second.  Unfortunately, only a small percentage of the singly charged ions could be identified with high confidence.  I have tried similar experiments twice in the past with virtually the same results.  The number of fragment ions goes through the roof, but the number of peptides ID'ed decreases.

In summary:  This paper shows the promise of targeting singly charged peptides for increasing the coverage of peptides lacking basic residues.  It is a quick read and an elegant approach to this problem, and ultimately a nice first step toward addressing a key weakness in our field.

Friday, June 1, 2012

New Position

Things have been happening really quickly in my life recently.  First off, I am leaving the world of government contracting to begin my new career as a Field Applications Scientist for Thermo Fisher Scientific.  Second, I am leaving the grinding commutes of the Washington, D.C. metro area for the wide open plains of Indiana.  A consequence of all these amazing events is that I have fallen behind on my blog entries.  I didn't actually realize how long it had been since I had written until I got an email from a reader in India asking me to keep writing.  (Shout out to Santosh!)
On top of this, a super secret project I contributed to at the NIH is finally going to be unveiled and I'll get to write about it sometime in the next few months, as the paper should be submitted soon.  The manuscript has been sucking up a lot of my time recently, but it's off my desk temporarily and I can get away from the revision process for a few days.