News in Proteomics Research

Wednesday, February 6, 2013

HighPoint RocketCache -- accelerate your database searching

I was on a plane, reading a lackluster review of the HighPoint RocketCache in Maximum PC magazine and going a little out of my mind.
Let me sum it up real quick: the RocketCache lets you pair a large (slow) rotary drive with a significantly faster solid state drive (SSD). The SSD is then used as a cache to speed up data transfers to and from the big spinning drive.
From the reviewers: "As an example, when we ran HD Tune on the one-SSD-plus-1TB combo, we initially saw the drive hit 107MB/s sequential read speeds (the same score it hit on its own), then 169MB/s on the next run, then 194MB/s, and on it went all the way up to 242MB/s." That's more than double the speed.
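To put numbers on why a cache helps: if some fraction of the data is served from the SSD and the rest from the spinning drive, the time per megabyte is a weighted sum of the two, so the combined speed is a weighted harmonic mean. The Python sketch below assumes that simple model only (it says nothing about how RocketCache actually decides what to cache). It takes the 107MB/s uncached figure from the review and uses a purely hypothetical 500MB/s for the SSD.

```python
def effective_read_speed(hit_rate: float, ssd_mb_s: float, hdd_mb_s: float) -> float:
    """Average read speed (MB/s) when `hit_rate` of the data comes from the SSD cache.

    Seconds per MB is a weighted sum of each drive's seconds per MB,
    so the combined speed is a weighted harmonic mean.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    seconds_per_mb = hit_rate / ssd_mb_s + (1.0 - hit_rate) / hdd_mb_s
    return 1.0 / seconds_per_mb


HDD_MB_S = 107.0  # uncached score quoted in the review
SSD_MB_S = 500.0  # hypothetical SSD read speed, for illustration only

for hit_rate in (0.0, 0.25, 0.5, 0.75, 1.0):
    speed = effective_read_speed(hit_rate, SSD_MB_S, HDD_MB_S)
    print(f"hit rate {hit_rate:.0%}: ~{speed:.0f} MB/s")
```

Under these assumed speeds, the reviewers' 242MB/s would correspond to roughly 70% of reads coming from the SSD; a faster or slower SSD shifts that number.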

The review was lackluster because Maximum PC is primarily for hobbyists who are building or modifying their own PCs for gaming performance. There aren't too many times in games where that kind of increase in hard drive read or write speed is going to make a difference.

Where will it make a huge difference? Database searching!!! If you are running multithreaded processing on a modern processor with 4 or more cores, the limiting factor is almost always your hard drive speed. Slap in a solid state drive and you'll get a bigger speed boost than from almost any other modification. But big SSDs are expensive, and there is limited testing of how they hold up under prolonged heavy use. If you use a small SSD while you are processing, then you have to constantly transfer data back and forth between the two drives.

But if you RocketCache it, you get the large, cheap storage of the spinning drive, the speed and price of a small SSD, and none of the hassle of constantly transferring data.

BTW, I will own this soon. It currently runs $155 at Amazon and Newegg.

Tuesday, February 5, 2013

Cancer signaling pathway videos from MIT

This link came to me by way of the LinkedIn Biological Mass Spec forum. It is a series of videos of talks on cancer signaling pathways given at the Koch Institute. If this is your kind of thing, I definitely suggest that you check them out. The link is here. And if you're on LinkedIn, you should check out the BMSF!

Monday, February 4, 2013

Printable mass table from PEAKS

I just ran across this handy tool from the nice people at PEAKS. It is a printable mass table that has forward and reverse masses for individual amino acids and for combinations of amino acids. It is a handy reference that I just placed above my monitor in my office. You can find it here.

Get your publication reviewed here!

Crazy pre-coffee idea I just came up with. I currently have a little more difficulty accessing the literature than I once did. Obviously, I can get it, but I have to request full text one at a time from the library system that we use. Here is my idea: what if my readers started suggesting papers? If you have a new paper of your own, or one that you simply find really interesting, send it to me! You can forward me the URL of the abstract and I'll get our library to send it to me. That way you can help me filter. I request a lot of articles and often don't review very many of them because I feel they kind of suck. If I can't blog something nice, I try not to blog anything at all.

Sunday, February 3, 2013

Proteomics for Biomarker Discovery Amazon Presale

This is pretty cool. Tim Veenstra and Ming Zhou's new methods compilation text is now available via Amazon through a special pre-sale offer. If you pre-order it now, the book is $101.67, which is crazy cheap for a Springer Protocols text. Amazon also guarantees that if the price changes at any time before the release, you will get it for the lowest price. If the title isn't enough to make you buy it, the book also includes the full method for my three-stage enrichment process for global phosphoproteomics. Yes, this is a shameless plug.

Friday, February 1, 2013

Optimizing your nanoLC conditions part 4: The effects of a longer column

This is the 4th part of my somewhat convoluted monologue on optimizing your nanoLC gradient.
You can read part 1, part 2, and part 3 here.
Again, I'm not a huge chromatography expert. I remember some things from college about theoretical plates and column loading, but not well enough to describe them here without embarrassing myself. What I do know is that the image above, taken from this Dionex app note, very accurately reflects what I've seen when experimenting with column lengths. In this experiment, they used three column lengths, the same flow rates, and the same injection volume. The important part to me, from the MS side of things, is the peak height: when the column length increases, so does the peak height. You can't see it in the above screenshot, but we can infer this -- if we loaded the same amount onto the column and the height increased, the peak width had to decrease. Decreased peak width means increased chromatographic resolution, less coelution, and more results for your MS/MS, and that's just from adding more column material.

The real way to jack up your results is to increase the chromatographic resolution AND your peak intensity AND your gradient length. You can read about an old comparison of a 10 cm column with a 140 min gradient and a 30 cm column with a 240 min gradient here. In these limited experiments, I found that I could save time AND increase my coverage by using a longer column/gradient combination.

Again, you have to take a good hard look at the requirements of your lab and the study in question, but I hope this helps get you started. As always, if I can answer further questions or if you have suggestions, don't hesitate to contact me or post questions here and I'll do what I can.
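The "same amount, taller peak, so narrower peak" logic follows from the peak area staying constant. A minimal Python sketch, assuming ideal Gaussian peaks: area = height * sigma * sqrt(2 * pi), so at a fixed area sigma (and the width at half height) scales as 1/height. It also uses the common gradient peak capacity approximation, n = 1 + gradient time / baseline width (baseline width taken as about 4 sigma), to show why narrower peaks and longer gradients stack. All areas, heights, and gradient pairings in the example are hypothetical.

```python
import math


def gaussian_sigma(area: float, height: float) -> float:
    """Standard deviation of a Gaussian peak from its area and apex height.

    For a Gaussian, area = height * sigma * sqrt(2 * pi).
    """
    return area / (height * math.sqrt(2.0 * math.pi))


def fwhm(sigma: float) -> float:
    """Width at half height of a Gaussian: 2 * sqrt(2 * ln 2) * sigma (about 2.355 sigma)."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


def peak_capacity(gradient_min: float, baseline_width_min: float) -> float:
    """Common gradient approximation: n = 1 + gradient time / baseline peak width."""
    return 1.0 + gradient_min / baseline_width_min


# Hypothetical: same amount injected (same area), apex height doubles on the longer column.
area = 1.0e6  # intensity * minutes, arbitrary units
sigma_short = gaussian_sigma(area, height=1.0e6)
sigma_long = gaussian_sigma(area, height=2.0e6)
print(f"width ratio, long/short column: {fwhm(sigma_long) / fwhm(sigma_short):.2f}")

# Peak capacity for each column on its own hypothetical gradient (baseline width = 4 sigma).
for label, gradient_min, sigma in (("short", 140, sigma_short), ("long", 240, sigma_long)):
    capacity = peak_capacity(gradient_min, 4 * sigma)
    print(f"{label}: {gradient_min} min gradient, peak capacity ~{capacity:.0f}")
```

Halving the width and lengthening the gradient multiply together, which is the point of going after resolution, intensity, and gradient length at the same time.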

Wednesday, January 30, 2013

Proteomics journals by Impact Factor

I must be preparing some publications because I keep looking up the stupid impact factors of proteomics journals. To save both of us time, I just wrote a bunch of them down. I didn't check to see which system is used, because we all know that every journal is going to use the metric that favors it the most. This list is very incomplete and I can't verify all of the sources, but it could be a useful starting point.

Molecular and Cellular Proteomics: 7.4
Journal of Proteome Research: 5.1
Proteomics (Wiley): 4.5
Expert Review of Proteomics: 3.7
Proteins and Proteomics: 2.9
Proteome Science: 2.3
Journal of Proteomics and Bioinformatics: ?
Open Proteomics: 2.0?
Genomics, Proteomics & Bioinformatics: 1.0

If anyone has more information they'd like to contribute, please let me know! I know there are more journals out there (particularly internationally!) and I'd love to have a more comprehensive list.

Tuesday, January 29, 2013

Optimizing your nanoLC conditions part 3: How many full scans do you need?

This is part 3 of this week's monologue on optimizing our nanoLC conditions. BTW, it seems like the title is evolving...
Anyway, this entry deals with matching our sample, and what we want out of it, to our nanoLC and MS/MS settings.
As I said in part 2, we can go one of two ways -- we can optimize our LC gradient to match our MS/MS settings, or we can go the other direction. Here are the important questions to ask:
1) How complex is the sample?
2) What is more important right now, run time or sample depth?
3) How many MS1 scans do I need?

1) How complex is the sample? Is it a gel spot? An old(ish) paper said that each gel spot from a human sample contains, on average, 5 proteins. That's a really simple sample by today's standards. If you are looking at gel spots, run fast LC, short columns and low cycle times. You'll be fine.
If you are running a whole proteome, which some estimates put at 1,000,000 proteins (1 million! At least for human), you don't want to follow this same plan. Important note: if these estimates are correct, the most extensive study of the human proteome published so far found peptides belonging to fewer than 5% of the total proteins present. Every global proteomics study of a complex organism is going to be a small snapshot of the proteins in the cell and what they are doing. Leading into...

2) What is more important to you right now -- the amount of time you put in for each sample, or the total depth of the sample? I have friends who do beautifully reproducible studies of patient proteomes and reliably get 3,000 quantifiable proteins for each patient in 4-6 hours of run time. They decided that this was deep enough for what their facility is funded to do. Another group I am working with has truly unique human samples that are probably the key to a malaria vaccine. They may separate a single patient's blood into 144 or more fractions and take months of run time, because the depth of their data is far more important than time. Anything you decide for yourself is going to be a compromise.

3) How many MS1 scans do you need? This gets us (finally!) to the sketch at the top of this entry. Keep in mind that on just about every instrument, the MS1 scan takes the longest, particularly if you want your MS1 scan to be the best quality. It is important to get some MS1 scans, but how many?
This is my opinion, take it or leave it: If I am doing label-free quan, I want to have 10 MS1 scans over my average peak. If I am doing SILAC, I shoot for 4 to 6. If I am doing reporter ion quan, I want as few as possible! With reporter ions there is no quan data in the MS1; it is all in the MS2, which also contains the sequencing information. So the MS1 is only useful for selecting ions for MS/MS.
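Those MS1-per-peak targets translate directly into a cycle time budget: one full cycle (an MS1 plus its MS/MS scans) has to fit into the peak width divided by the number of MS1 points you want. A minimal Python sketch, assuming fixed per-scan times; the 30 s peak, 0.25 s MS1, and 0.1 s MS2 values are hypothetical placeholders, not specs for any instrument, so plug in your own measured scan times.

```python
import math


def max_cycle_time_s(peak_width_s: float, ms1_points: int) -> float:
    """Longest cycle that still puts `ms1_points` MS1 scans across the peak."""
    return peak_width_s / ms1_points


def max_top_n(peak_width_s: float, ms1_points: int,
              ms1_scan_s: float, ms2_scan_s: float) -> int:
    """How many MS/MS scans fit in each cycle after the MS1 scan."""
    budget_s = max_cycle_time_s(peak_width_s, ms1_points) - ms1_scan_s
    if budget_s <= 0:
        return 0
    # Small tolerance so float rounding (e.g. 9.9999999) doesn't drop a scan.
    return math.floor(budget_s / ms2_scan_s + 1e-9)


# Hypothetical peak width and scan times; replace with your own numbers.
PEAK_WIDTH_S = 30.0
MS1_SCAN_S = 0.25
MS2_SCAN_S = 0.10

for label, points in (("label-free", 10), ("SILAC", 5)):
    cycle = max_cycle_time_s(PEAK_WIDTH_S, points)
    top_n = max_top_n(PEAK_WIDTH_S, points, MS1_SCAN_S, MS2_SCAN_S)
    print(f"{label}: {points} MS1 points -> cycle <= {cycle:.1f} s, up to Top {top_n}")
```

For reporter ion quan you would push the other way: as few MS1 points as you can live with, leaving more of each cycle for MS/MS.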

Not too long ago, I wrote something about cycle time calculations using a Q Exactive as an example. I also made some estimates of the cycle times of the other hybrid instruments (that was before I worked for my current employer, and I've never double-checked the numbers). I won't bore you with those details again, but that post gives you a feel for how I'm thinking about this.

What's even better than thinking about it? Doing the experiment! Here is how I actually do it (or did, back when I was running lots of samples): I look at how many samples are on their way and decide on a run time that makes sense. My go-to gradients for generic sample types are 80 minutes for a gel spot, 140 minutes for a gel band or OFF-GEL fraction, and 240 minutes for a pull-down, a bacterial proteome, or a survey study of a mammalian proteome. If all that is coming that week is 20 gel bands, I might run a 160 minute gradient just to squeeze out some extra data.
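Deciding on a run time from the incoming sample list is mostly arithmetic. A minimal Python sketch using the go-to gradients above; the 20 minute per-run overhead (loading, washing, re-equilibration) is a hypothetical placeholder, so substitute whatever your own system actually needs.

```python
# Go-to gradient lengths from above, in minutes.
GO_TO_GRADIENTS_MIN = {
    "gel spot": 80,
    "gel band / OFF-GEL fraction": 140,
    "pull-down / bacterial / mammalian survey": 240,
}

OVERHEAD_MIN = 20  # hypothetical per-run overhead (loading, wash, re-equilibration)


def queue_hours(n_samples: int, gradient_min: float,
                overhead_min: float = OVERHEAD_MIN) -> float:
    """Total instrument time in hours to run every sample once."""
    return n_samples * (gradient_min + overhead_min) / 60.0


# A week with only 20 gel bands: the standard gradient vs. stretching it to 160 min.
for gradient_min in (GO_TO_GRADIENTS_MIN["gel band / OFF-GEL fraction"], 160):
    print(f"20 gel bands at {gradient_min} min: {queue_hours(20, gradient_min):.1f} h")
```

If the longer gradient still fits comfortably in the time before the next batch arrives, the extra data is essentially free.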

When you get the samples, make a test run. Take a small aliquot of one of the samples (or something representative) and run it using your base method. When it is finished, look at the resulting RAW file. If you are using a Thermo instrument, don't even look at it yourself; just drop it into the RAW Meat program from Vast Scientific.

RAW Meat does a lot of great things, which is probably a topic for another entry. The important feature here is TopN spacing, which shows how many MS/MS scans were triggered after each full scan, and therefore how often you hit your Top N. For example, look at the picture below:

In this example, a Top 10 method was used. In almost every cycle, the Orbitrap selected 10 ions for fragmentation, suggesting that there is a whole lot more in there to fragment and that we're only scratching the surface.
Now, we could lengthen our gradient to improve our chromatography, or we could increase our TopN. In this case, we raised it to a Top 20.

Look at the improvement! Yes, we're still hitting the maximum number of fragmentations as the most common event, but it isn't the only event. And in this particular case, we nearly doubled the number of MS/MS events -- giving us more peptide IDs in the same length of time.
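If you want the same kind of TopN spacing histogram outside of RAW Meat, it boils down to counting how many MS/MS scans follow each full scan. A minimal Python sketch, assuming you already have the MS level of every scan in acquisition order (1 for full scans, 2 for MS/MS); how you pull that list out of your RAW file isn't shown here, and the scan sequence in the example is made up.

```python
from collections import Counter
from typing import Iterable


def topn_spacing(ms_levels: Iterable[int]) -> Counter:
    """Histogram of how many MS2 scans follow each MS1 scan.

    Keys are MS2-per-cycle counts, values are how many cycles had that count.
    MS2 scans seen before the first MS1 are ignored.
    """
    counts: Counter = Counter()
    current = None  # MS2 count for the cycle in progress; None until the first MS1
    for level in ms_levels:
        if level == 1:
            if current is not None:
                counts[current] += 1
            current = 0
        elif level == 2 and current is not None:
            current += 1
    if current is not None:
        counts[current] += 1  # close out the last cycle
    return counts


def fraction_at_max(counts: Counter, top_n: int) -> float:
    """Fraction of cycles that used every available MS/MS slot."""
    total = sum(counts.values())
    return counts[top_n] / total if total else 0.0


# Tiny made-up scan sequence from a Top 3 method: two full cycles, then a short one.
example = [1, 2, 2, 2, 1, 2, 2, 2, 1, 2]
spacing = topn_spacing(example)
print(sorted(spacing.items()))
print(f"cycles hitting Top 3: {fraction_at_max(spacing, 3):.0%}")
```

If nearly every cycle lands in the top bin, as in the Top 10 example above, the method is probably leaving ions unsampled.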

In part 4, I'll get back around to column lengths, I promise. I swear, there is a point to all of this!
On to part 4!

Monday, January 28, 2013

Optimizing your nanoLC conditions part 2: Calculating your peak width at threshold (PWAT)

This is part 2 of this who-knows-how-many-part monologue on optimizing your nanoLC conditions. Part 1 was yesterday, or you can click here to go straight to it.

We're going to talk about the next thing I consider when I'm setting up nanoLC flow conditions: peak width and cycle time. By peak width, I mean how wide your average peak is. Classically, we look at the peak width at half height for an HPLC peak like the one below (stolen from LcResources.com):

It is important to note that the width at half height isn't that useful in mass spectrometry. What you are interested in is what I will call the peak width at threshold (PWAT). Somewhere in your method, regardless of your instrument, you have set a fragmentation threshold. You don't want to try to fragment everything you see in your MS1 spectra, so you set this threshold so that you only fragment things that are unlikely to be noise. In discovery experiments, thresholds are often set low (1e3 or lower, though 5e3 is probably the setting I see the most on hybrid instruments) just in case that biomarker is down there in the noise.

So, if your peak looks like the one above and reaches 1e6 in intensity, then you fragment it clear down at the base. If you then use dynamic exclusion to ignore that ion for the duration of the elution, you will not fragment it again. If your threshold is higher, say 1e5, then you will fragment that eluting ion first (and perhaps only) 10% of the way up the peak. A low threshold sounds like it is hurting you, but only in the case of an abundant peak like this one. If you set the threshold too high, like 1e5, and your biomarker peptide only elutes with a peak height of 5e4, then you've missed it. A full discussion is well beyond the scope of what I am writing here, and I refuse to turn this into another monologue on appropriate dynamic exclusion settings; you can read one of those here.
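For an idealized Gaussian peak, the PWAT follows directly from the apex height, the threshold, and the width at half height: the signal stays above threshold T for |t| <= sigma * sqrt(2 * ln(apex / T)), which works out to PWAT = FWHM * sqrt(ln(apex / T) / ln 2). A minimal Python sketch of that model; real nanoLC peaks tail, so treat it as a rough guide, and the 20 s width at half height is hypothetical.

```python
import math


def pwat_gaussian(fwhm_s: float, apex: float, threshold: float) -> float:
    """Peak width at threshold for an ideal Gaussian peak.

    Returns 0.0 when the apex never reaches the threshold, i.e. the ion is never picked.
    """
    if apex <= threshold:
        return 0.0
    return fwhm_s * math.sqrt(math.log(apex / threshold) / math.log(2.0))


FWHM_S = 20.0  # hypothetical width at half height, in seconds

# The three cases from the text: low threshold, high threshold, and a low-abundance biomarker.
for apex, threshold in ((1e6, 5e3), (1e6, 1e5), (5e4, 1e5)):
    width = pwat_gaussian(FWHM_S, apex, threshold)
    print(f"apex {apex:.0e}, threshold {threshold:.0e}: PWAT {width:.1f} s")
```

The 5e4 biomarker comes back as zero: with a 1e5 threshold it never triggers at all, which is the trade-off described above.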

Back on topic: what you need to know is your PWAT. The best way to find it is to measure a few peaks in your RAW data and get a good estimate of your PWAT under your LC conditions. Once you know this, you have two choices: you can either change your LC conditions to match your MS settings, or change your MS settings to match your LC conditions.
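To measure the PWAT on real data, pull an extracted ion chromatogram for a few representative peptides and find where each one crosses your threshold. A minimal Python sketch, assuming you have already exported each chromatogram as matching lists of retention times and intensities (the export step isn't shown) and that each window holds a single peak; the crossings are linearly interpolated between scans, and the chromatograms in the example are made up.

```python
import statistics
from typing import Sequence


def measured_pwat(times: Sequence[float], intensities: Sequence[float],
                  threshold: float) -> float:
    """Width of the region where a single peak sits at or above `threshold`.

    Finds the first and last points at/above threshold and linearly
    interpolates the crossing time on each side. Assumes one peak per window.
    """
    above = [i for i, y in enumerate(intensities) if y >= threshold]
    if not above:
        return 0.0
    first, last = above[0], above[-1]

    def crossing(i_out: int, i_in: int) -> float:
        # Interpolate between a point below threshold and one at/above it.
        t0, y0 = times[i_out], intensities[i_out]
        t1, y1 = times[i_in], intensities[i_in]
        return t0 + (threshold - y0) * (t1 - t0) / (y1 - y0)

    start = crossing(first - 1, first) if first > 0 else times[first]
    end = crossing(last + 1, last) if last < len(times) - 1 else times[last]
    return end - start


# Made-up chromatograms (time in s, intensity) just to show the call.
peaks = [
    ([0, 5, 10, 15, 20, 25], [0, 2e4, 1e5, 8e4, 1e4, 0]),
    ([0, 5, 10, 15, 20, 25], [0, 5e4, 3e5, 2e5, 3e4, 0]),
]
widths = [measured_pwat(t, y, threshold=5e3) for t, y in peaks]
print(f"typical PWAT: {statistics.median(widths):.1f} s")
```

Taking the median over several peptides keeps one oddly shaped peak from skewing the estimate.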

Sorry this is so short, I ran out of time this morning. Next up: cycle time vs peak width (link here!)

Sunday, January 27, 2013

Optimizing your nanoLC gradient and number of MS/MS events to your column length, part 1: How to calculate your gradient delay

First of all, this is a big topic and one that is definitely beyond the scope of what I could possibly write here today. I'm going to break it into several parts throughout the week.
With that out of the way, I needed to get some ideas out there. I recently visited one lab that was using 50 cm nanoLC columns and trying to get efficient peptide separations and IDs using 15 to 30 minute gradients for whole cell digests. The following week I visited a lab that was using 10 cm columns and 180 minute gradients to study single protein digests. While both approaches can work, it highlighted just how differently people in the field are approaching this. I'm by no means a great chromatography expert, but I will walk you through what I consider when establishing a nanoflow setup.
1) The very first thing I consider is the total dead volume of my LC system. I start by taking a ruler and measuring the total length of the output line from my mixer to my LC column. I take the internal diameter (ID) of these lines and the length I measured and calculate the empty dead volume. There are good calculators at IonSource.com, but I usually use the MSBioworks app on my iPad or phone. I then make the same calculation for my column. While the column is filled with stationary phase, I ignore this because the packing is porous and because it is just simpler that way (it makes the estimate a bit high, since the particles do take up some of the volume). When I add the two together, I have my total system dead volume.

2) I divide the total system dead volume by my base mobile phase flow rate to get what I call my gradient delay, the amount of time it takes for what I have in my mixer to actually elute from the tip of my emitter.
On some systems this is pretty small. If you are using nanoflow tubing with a 20 um ID, a 10 cm picofrit column (column + emitter), and 20 cm of line to connect your LC to your nanosource, you are looking at a total system dead volume of ~1 uL. At 200 nL/min, you will see a 5 minute gradient delay. If you are using a system with 75 um ID lines where your LC isn't exactly beside your instrument (55 cm inlet lines are pretty common) and a 50 cm column, you are looking at a dead volume of ~4.6 uL. Your gradient delay at 200 nL/min is going to be 23 minutes. In other words, if your gradient runs to 30% B in 30 minutes, 30% organic will not actually be eluting from the tip of your emitter until 53 minutes into your run. If your gradient is 45 minutes in length, your peptides will be eluting during the following run.
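The dead volume and gradient delay arithmetic above is easy to script. A minimal Python sketch that treats every segment as an empty cylinder (ignoring the packing, as described above). The post doesn't state the column ID, so a 75 um column is assumed here; with that assumption, the second example comes out at the ~4.6 uL and ~23 minute delay quoted above.

```python
import math


def tube_volume_ul(inner_diameter_um: float, length_cm: float) -> float:
    """Empty volume of a cylindrical line or column, in microliters (1 uL = 1 mm^3)."""
    radius_mm = inner_diameter_um / 2.0 / 1000.0
    length_mm = length_cm * 10.0
    return math.pi * radius_mm ** 2 * length_mm


def gradient_delay_min(dead_volume_ul: float, flow_nl_min: float) -> float:
    """Time for liquid leaving the mixer to reach the emitter tip, in minutes."""
    return dead_volume_ul * 1000.0 / flow_nl_min


# Second example above: 55 cm of 75 um ID inlet line plus a 50 cm column.
# The column ID is not stated in the post; 75 um is an assumption.
line_ul = tube_volume_ul(75, 55)
column_ul = tube_volume_ul(75, 50)
total_ul = line_ul + column_ul
delay = gradient_delay_min(total_ul, flow_nl_min=200)

print(f"dead volume ~{total_ul:.1f} uL, gradient delay ~{delay:.0f} min")
# A gradient that reaches 30% B at 30 min reaches the emitter tip roughly this late:
print(f"30% B at the tip after ~{30 + delay:.0f} min")
```

Swapping in your own line lengths, IDs, and flow rate gives you the delay to add to every gradient segment.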

On to part 2!
On to part 3!