Wednesday, January 30, 2013

Proteomics journals by Impact Factor


I must be preparing some publications because I keep looking up the stupid impact factors of proteomics journals.  To save both of us time, I just wrote a bunch of them down.  I didn't check to see which system is used, because we all know that every journal is going to use the metric that favors them the most.  This list is very incomplete and I can't verify all of the sources, but it could be a useful starting point.

Molecular and Cellular Proteomics:  7.4
Journal of Proteome Research:  5.1
Proteomics (Wiley):  4.5
Expert Review of Proteomics:  3.7
Proteins and Proteomics:  2.9
Proteome Science:  2.3
Journal of Proteomics and Bioinformatics: ?
Open Proteomics:  2.0?
Genomics, Proteomics & Bioinformatics:  1.0

If anyone has more information they'd like to contribute, please let me know!  I know there are more journals out there (particularly internationally!) and I'd love to have a more comprehensive list.

Tuesday, January 29, 2013

Optimizing your nanoLC conditions part 3: How many full scans do you need?


This is part 3 of this week's monologue on optimizing our nanoLC conditions.  BTW, it seems like the title is evolving...
Anyway, this is going to deal with matching our sample, and what we want out of it, to our nanoLC and MS/MS settings.
As I said in part 2, we can go one of two ways -- we can optimize our LC gradient to match our MS/MS settings, or we can go the other direction.  Here are the important questions to ask:
1) How complex is the sample?
2) What is more important right now, run time or sample depth?
3) How many MS1 scans do I need?

1) How complex is the sample?  Is it a gel spot?  An old(ish) paper said that each gel spot from a human sample contains, on average, 5 proteins.  That's a really simple sample by today's standards.  If you are looking at gel spots, run fast LC, short columns and low cycle times.  You'll be fine.
If you are running a whole proteome, which some estimates put at 1,000,000 proteins (1 million! At least for human), you don't want to follow this same plan.  Important note:  If these estimates are correct, the most extensive study of human proteomics published so far found peptides belonging to less than 5% of the total proteins present.  Every global proteomics study of a complex organism is going to be a small snapshot of the proteins in the cell and what they are doing.  Leading into...

2)  What is more important to you right now -- the amount of time you put in for each sample, or the total depth of the sample?  I have friends who do beautifully reproducible studies of patient proteomes and reliably get 3,000 quantifiable proteins for each patient in 4-6 hours of run time.  They made the decision that this was far enough for what their facility is funded to do.  Another group I am working with has unique human samples that are probably the key to the malaria vaccine.  They may separate a single patient's blood into 144 or more fractions and take months of run time, because the depth of their data is far more important than time.  Anything you decide for yourself is going to be a compromise.

3) How many MS1 scans do you need?  This gets us (finally!) to the sketch at the top of this entry. Keep in mind that on just about every instrument, the MS1 scan takes the longest, particularly if you want your MS1 scan to be the best quality.  It is important to get some MS1 scans, but how many?
This is my opinion, take it or leave it:  If I am doing label free quan, I want to have 10 MS1 scans over my average peak.  If I am doing SILAC, I shoot for 4 to 6.  If I am doing reporter ion quan, I want as few as possible!  With reporter ions there is no quan data in the MS1, only in the MS2, which also contains the sequencing information.  So the MS1 is only useful for selecting ions for MS/MS.
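To put rough numbers on that, here is a minimal sketch of the arithmetic: divide your average peak width by the number of MS1 scans you want across it to get a maximum cycle time, then see how many MS/MS events fit in what is left after the MS1 scan.  The 30 second peak width and the 500 ms / 100 ms scan times below are made-up illustrative values, not specs for any instrument, and the function names are mine.

```python
def max_cycle_time_ms(peak_width_s, ms1_scans_per_peak):
    """Longest cycle (one MS1 plus its MS/MS scans) that still gives the
    requested number of MS1 scans across an average peak."""
    return peak_width_s * 1000.0 / ms1_scans_per_peak


def max_top_n(peak_width_s, ms1_scans_per_peak, ms1_time_ms, ms2_time_ms):
    """Largest TopN that fits in the cycle after the MS1 scan is done.
    Scan times are assumed constant, which is a simplification."""
    budget_ms = max_cycle_time_ms(peak_width_s, ms1_scans_per_peak) - ms1_time_ms
    if budget_ms <= 0:
        return 0
    return int(budget_ms // ms2_time_ms)


# Hypothetical numbers: 30 s average peak, 500 ms MS1, 100 ms per MS/MS.
for label, scans in (("label free", 10), ("SILAC", 5)):
    print(label, max_cycle_time_ms(30, scans), max_top_n(30, scans, 500, 100))
```

Plug in your own measured peak width and the scan times you actually see in your RAW files; the point is only that fewer required MS1 scans leaves more room for MS/MS.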

Not too long ago, I wrote something about cycle time calculations using a Q Exactive as an example.  I also made some estimates of the cycle times of the other hybrid instruments (before I worked for who I work for now, and I've never checked the numbers.) So I won't bore you with those details again, but you can get a feel for how I'm thinking about this.

What's even better than thinking about it?  Doing the experiment!  This is the way I really do it (at least when I was running lots of samples):  I look at how many samples are on their way and I decide on a run time that makes sense.  My go-to gradients for generic sample types are:  80 minutes for a gel spot, 140 minutes for a gel band or OFF-GEL fraction, and 240 minutes for a pull-down, bacterial proteome, or a survey study of a mammalian proteome.  If all that is coming that week is 20 gel bands, I might run a 160 minute gradient just to squeeze some extra data out.

When you get the samples, make a test run.  Take a small aliquot of one of the samples (or something representative) and run it using your base method.  When it is finished, look at the resulting RAW file.  If you are using a Thermo instrument, don't even look at it, just drop it in the RAW Meat program from Vast Scientific.

RAW Meat does a lot of great things, probably another entry for later.  The important thing for here is the TopN spacing feature.  This tells you how many times you hit your Top N.  For example, look at the picture below:

In this experiment, a Top 10 experiment was employed.  In almost every case, the Orbitrap selected 10 ions for fragmentation, suggesting that there is a whole lot more in there to fragment and that we're only scratching the surface.
Now, we could lengthen our gradient to improve our chromatography, or we could increase our TopN.  In this case, we raised the cycle to a Top20.


Look at the improvement!  Yes, we're still hitting the maximum number of fragmentations as the most common event, but it isn't the only event.  And in this particular case, we nearly doubled the number of MS/MS events -- giving us more peptide IDs in the same length of time.
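If you can't run RAW Meat, the same kind of TopN spacing count can be sketched in a few lines.  This is not how RAW Meat does it; it is just my guess at the idea, and it assumes you have already pulled the MS level of every scan (1 or 2), in acquisition order, out of your file with whatever converter you use.

```python
from collections import Counter


def topn_spacing(ms_levels):
    """Count how many MS/MS scans follow each MS1 scan.

    ms_levels: iterable of scan levels (1 or 2) in acquisition order.
    Returns a Counter mapping 'MS/MS scans in the cycle' to 'number of cycles'.
    """
    counts = Counter()
    current = None  # MS/MS scans seen since the last MS1; None before the first MS1
    for level in ms_levels:
        if level == 1:
            if current is not None:
                counts[current] += 1
            current = 0
        elif level == 2 and current is not None:
            current += 1
    if current is not None:
        counts[current] += 1  # close out the final cycle
    return counts


# Toy scan list: three cycles with 3, 2 and 3 MS/MS events.
levels = [1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2]
for n_ms2, n_cycles in sorted(topn_spacing(levels).items()):
    print(n_ms2, "MS/MS scans:", n_cycles, "cycles")
```

If nearly every cycle lands on your TopN limit, you are in the situation described above and there is probably more in the sample to fragment.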

In part 4, I swear I'll get back around to column lengths -- I swear, there is a point to all of this!
On to part 4!

Monday, January 28, 2013

Optimizing your nanoLC conditions part 2: Calculating your peak width at threshold (PWAT)

This is part 2 of this who-knows-how-many parts monologue on optimizing your nanoLC conditions.  Part 1 was yesterday, or you can click here to go straight to it.

We're going to talk about the next thing I consider when I'm setting up nanoLC flow conditions:  peak width and cycle time.  By peak width, I mean how wide your average peak is.  Classically, we consider the 1/2 peak width of an HPLC peak like the one below (stolen from LcResources.com):

It is important to note that the 1/2 peak width isn't that useful in mass spectrometry.  What you are interested in is what I will call the peak width at threshold (PWAT).  Somewhere in your method, regardless of your instrument, you have set a fragmentation threshold.  You don't just want to be trying to fragment everything you saw in your MS1 spectra, so you set this threshold to fragment things that are unlikely to just be noise.  Hence your threshold.  In discovery experiments, these are often set low (1e3 or lower, though 5e3 is probably the setting I see the most on hybrid instruments) just in case that biomarker is down there in the noise.  So, if your peak looks like the one above, and the peak is 1e6 in intensity, then you fragment it clear down at the base.  If you use dynamic exclusion to then ignore that peak for the duration of the elution, then you will not fragment it again.  If your threshold is higher, say 1e5, then you will fragment that eluting ion first (and perhaps only) at 10% of the way up the peak.  Now, these settings sound like they are hurting you, but only in the case of this peak.  If you set the threshold too high, like 1e5, and your biomarker peptide only elutes with a peak height of 5e4, then you've missed it.  Again, this discussion is well beyond the scope of what I am writing here.  I also refuse to turn this into another monologue on appropriate dynamic exclusion settings; you can read one of those here.

Back on topic:  What you need to know is your PWAT.  The best way to find this is to measure a few peaks in your RAW data and get a good estimate of your PWAT at your LC conditions.  Once you know this, you have two choices -- you can either change your LC conditions to match the MS settings, or you can change the MS settings to match your LC conditions.
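If you would rather not measure peaks by eye, here is a rough sketch of the PWAT idea.  It assumes you have exported an extracted ion chromatogram for a single peak as two lists (retention time in minutes, intensity); the function name and the toy triangular peak are mine, and the linear interpolation at the edges is a simplification.

```python
def peak_width_at_threshold(times_min, intensities, threshold):
    """Width in minutes of the stretch where one peak sits at or above threshold.

    Assumes the trace contains a single peak. The crossing points are
    linearly interpolated between neighboring data points.
    """
    above = [i for i, y in enumerate(intensities) if y >= threshold]
    if not above:
        return 0.0  # the peak never reaches the threshold: it is never fragmented

    def crossing(i_below, i_above):
        t0, y0 = times_min[i_below], intensities[i_below]
        t1, y1 = times_min[i_above], intensities[i_above]
        return t0 + (threshold - y0) * (t1 - t0) / (y1 - y0)

    first, last = above[0], above[-1]
    start = times_min[first] if first == 0 else crossing(first - 1, first)
    end = times_min[last] if last == len(times_min) - 1 else crossing(last + 1, last)
    return end - start


# Toy triangular peak with an apex of 1e6, like the example in the text.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
big_peak = [0.0, 5e5, 1e6, 5e5, 0.0]
small_peak = [0.0, 2.5e4, 5e4, 2.5e4, 0.0]

print(peak_width_at_threshold(times, big_peak, 5e3))    # low discovery threshold
print(peak_width_at_threshold(times, big_peak, 1e5))    # higher threshold, narrower PWAT
print(peak_width_at_threshold(times, small_peak, 1e5))  # 5e4 peak never triggers
```

Run it over a handful of real peaks and average the results to get a working PWAT for your LC conditions.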

Sorry this is so short, I ran out of time this morning.  Next up:  cycle time vs peak width (link here!)

Sunday, January 27, 2013

Optimizing your nanoLC gradient and number of MS/MS events to your column length, part 1: How to calculate your gradient delay

First of all, this is a big topic and one that is definitely beyond the scope of what I could possibly write here today.  I'm going to break it into several parts throughout the week.
With that out of the way, I needed to get some ideas out there.  I recently visited one lab that was using 50 cm nanoLC columns and trying to get efficient peptide separations and IDs using 15 to 30 minute gradients for whole cell digests.  The following week I visited a lab that was using 10 cm columns and 180 minute gradients to study single protein digests.  While both approaches can work, it just served to outline the extremely different approaches that are being used in the field.  I'm by no means a great chromatography expert, but I will walk you through what I consider when establishing a nanoflow setup.
1)  The very first thing I consider is the total dead volume of my LC system.  I start by taking a ruler and measuring the total length of my output line from my mixer to my LC column.  I take the internal diameter (ID) of these lines and the length I measured and calculate my empty dead volume.  There are good calculators at IonSource.com, but I usually use the MSBioworks App on my iPad or phone.  I then make the same calculation for my column.  While the column is filled with stationary phase, I ignore this effect because it's permeable and because it is just simpler that way.  When I add the two together, I have my total system dead volume.

2)  I divide the total system dead volume by my base mobile phase flow rate to get what I call my gradient delay, the amount of time it takes for what I have in my mixer to actually elute from the tip of my emitter.
On some systems this is pretty small.  If you are using nanoflow tubing with a 20 um ID and a 10 cm picofrit column (column + emitter) and 20 cm of line to connect your LC to your nanosource, you are looking at a total system dead volume of ~1 uL.  At 200 nL/min, you will see a 5 minute gradient delay.  If you are using a system with 75 um ID lines where your LC isn't exactly beside your instrument (55 cm inlet lines are pretty common) and a 50 cm column, you are looking at a dead volume of ~4.6 uL.  Your gradient delay at 200 nL/min is going to be 23 minutes.  In other words, if your gradient runs to 30% B in 30 minutes, 30% organic will not actually be eluting from the tip of your emitter until 53 minutes into your run.  If your gradient is 45 minutes in length, your peptides will be eluting during the following run.
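If you don't have a calculator app handy, the two steps above are easy to sketch.  This treats every line and the column as an empty cylinder (the same simplification I make above) and reworks the second example; the 75 um column ID is my assumption, since only the line ID is given, and the function names are just for illustration.

```python
import math


def tube_volume_ul(inner_diameter_um, length_cm):
    """Volume of an empty cylinder (line or column) in microliters."""
    radius_cm = (inner_diameter_um / 2.0) * 1e-4  # 1 um = 1e-4 cm
    volume_cm3 = math.pi * radius_cm ** 2 * length_cm
    return volume_cm3 * 1000.0  # 1 cm^3 = 1000 uL


def dead_volume_ul(segments):
    """Total dead volume for a list of (inner_diameter_um, length_cm) pairs."""
    return sum(tube_volume_ul(d, length) for d, length in segments)


def gradient_delay_min(segments, flow_nl_per_min):
    """Minutes for mobile phase to travel from the mixer to the emitter tip."""
    return dead_volume_ul(segments) * 1000.0 / flow_nl_per_min


# 55 cm of 75 um ID line plus a 50 cm column (column ID assumed to be 75 um).
system = [(75, 55), (75, 50)]
print(round(dead_volume_ul(system), 2), "uL dead volume")
print(round(gradient_delay_min(system, 200), 1), "min gradient delay at 200 nL/min")
```

With those assumptions the numbers come out at about 4.6 uL and about 23 minutes, which matches the estimate above.  Swap in your own measured lengths and IDs.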

On to part 2!
On to part 3!


Saturday, January 26, 2013

SprayQC -- take the guesswork out of nanospray!

I talk about this paper so often that  I was certain I had written about it.  SprayQC is an open source software program from Richard Scheltema and Matthias Mann that aims to provide easy computer monitored analysis of your nanospray.  It requires a bit of hardware, including a new camera and video card for your PC ($100?) and is absolutely worth your time.
Picture this:  Middle of the night and you're deep into fraction 7 of these tumor digests and your spray goes all wonky.  Normally, you get in the next morning and it ruins your day.  Sample lost forever.  But if you have SprayQC and your spray goes wonky?  The runs stop.  And by wonky, I mean that it uses video to monitor your spray stability in real time, it monitors your LC system for errors, and it monitors your TIC baseline for weirdness.  If any of these problems occur, SprayQC stops after your run and then sends you an email describing why and how the run was stopped.  In my opinion, everyone should have this software installed on their PC.  It was written with the EasyNano and the NanoFlex source in mind, but it is easily adaptable to other LCs and nanosources.
You can read more about this fantastic program here.

Friday, January 25, 2013

Comet search engine -- free multicore Sequest?

Comet is the topic of this new paper from Jimmy Eng et al., and it appears to be a search engine based on the original Sequest algorithm.  It has been adapted to be multi-threaded so that it can work on multicore PCs.  There are some caveats, however, to this (as well as most) free stuff.  The first is that there currently is no GUI (graphical user interface); you'll have to write one yourself.  The alternative is that you will have to order your searches using a command line (DOS, for us old guys), which will get very tiresome.  It also doesn't appear to be 64-bit compatible.  It is definitely nice that it is out there, but you will have to do a little work to get it where you want it.  I do like the icon.  It's blown up here because I couldn't find a higher resolution graphic, but it looks nice on a desktop.

Thursday, January 24, 2013

Proteomic snapshot of breast cancer cell cycle: G1/S transition point


When DNA damage occurs in normal cells, the cell development/division cycle will stop at the next of several checkpoints that occur throughout normal progression.  The cell will stay at that point until the damage can be repaired.  If the damage can't be repaired the cell will never leave that checkpoint and won't divide again.  If the damage is extreme, the cell may voluntarily self-destruct through the process of apoptosis.  Cancer cells, however, will just shoot through these checkpoints and keep dividing regardless of the amount of DNA damage present.
How and why a cancer cell does this is the focus of this new paper from Milagros Tenga and (a member of my thesis committee) Iulia M. Lazar.  In this study they look at the breast cancer cell line MCF-7 and perform label free quan using spectral counting to identify pathways that are involved in enriched nuclear and mitochondrial fractions.  The paper is a featured article of this month's Proteomics, so you should check it out.  Go Hokies!

Monday, January 21, 2013

PBI Shredder -- use pressure to extract your proteins?

I recently read about this instrument in The Scientist.  It is a permutation of your basic homogenizer systems.  While I'm not completely clear on its mechanism of operation, what I do get is that it uses a finely adjustable pressure to lyse cells with minimal shear pressure.  While it is marketed primarily for DNA extraction, the producers do mention that it can be used for proteins.  While it would probably have limited usefulness for most of us shotgun MS/MS people, particularly the membrane centric ones (like me!), I could see where it could be useful for others.  This might be a great way to get proteins out for top-down analysis of protein complexes.  Sonication tends to damage larger proteins more, so who knows what it does to large protein complexes (probably bad things!).  You can read more about the PBI Shredder at the manufacturer's website, as well as watch a video!

Sunday, January 20, 2013

StavroX -- software for disulfide link analysis

This paper is a little older (2011) but just recently came to my attention.  The study, by Gotze et al., details the construction and testing of StavroX, a software package for studying enzymatically linked peptides with intact disulfide bridges.  Following the development, the authors go on to show that the software works with three different biological systems.  If you are interested in protein-protein interactions, you might want to check this out!

Wednesday, January 16, 2013

Does Hyper threading work in proteomics applications?


This is an interesting thing that recently came to my attention.  Does "hyper" or "virtual" threading actually work in Proteomics applications?  In at least one instance I've seen evidence that it does not.
What is hyperthreading?  It is a virtual processing unit used by Intel processors that enables tasks to be performed while one core is not busy.  Here is an illustration I stole using  Google Images:


The gist is this:  In normal applications using multiple processing cores, sometimes one core isn't doing anything.  When that occurs, hyper-threading goes ahead and runs the next processing thread.  In most applications, this allows the CPU and motherboard to pretend they have additional cores.  This runs on the assumption that there will be dead time for the cores.
This is where the problem comes in with hyperthreading and proteomics data processing:  when you're running a search algorithm on a huge proteomics file, the cores never get a chance to take a break -- or at least very rarely.  With no pause in the processing onslaught, the processor that is pretending to be an 8 core processor is just the 4 core it really is, and no advantage is gained from pretending.
Keep in mind that this is from an extremely limited experiment, but a comparison of an Intel processor with 4 cores and hyperthreading enabled to an AMD 8 core processor came out very differently than one would predict from their rankings on the Passmark CPU benchmarking chart.  The Intel processor in question was ranked considerably faster by benchmark, but it was absolutely smoked by the 8 core AMD, even though Passmark's ranking put the AMD at less than 70% of the speed of the Intel processor.  Again -- limited experiment, but when the Intel processor in question was almost 10 times as expensive as the AMD, it makes you want to try it out yourself, right?
 I'd love to hear from other people who have data on this!
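If you want to try it yourself without setting up a full search, here is a crude sketch of the kind of test I mean: time the same pile of CPU-bound work with different numbers of worker processes and see where the speedup flattens out.  The workload is a toy arithmetic loop, not a search engine, so treat it only as a rough stand-in; the function names, task counts and worker counts are all my own arbitrary choices.

```python
import time
from multiprocessing import Pool


def cpu_bound_task(n):
    """Toy stand-in for search work: pure arithmetic, no waiting on disk or network."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_workers(num_workers, num_tasks, n):
    """Seconds needed to finish num_tasks identical tasks on num_workers processes."""
    start = time.perf_counter()
    with Pool(processes=num_workers) as pool:
        pool.map(cpu_bound_task, [n] * num_tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Compare worker counts at and above your physical core count.
    for workers in (1, 2, 4, 8):
        print(workers, "workers:", round(time_workers(workers, 16, 500_000), 2), "s")
```

On a 4 core chip with hyperthreading, the interesting comparison is 4 workers versus 8.  If the time barely moves, the virtual cores aren't buying you much for this kind of load.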


Sunday, January 13, 2013

Proteois: FDR for label free quan!


Currently in press at MCP is this paper from Marianne Sandin et al. that describes Proteois, an adaptive alignment algorithm for label free quantification experiments.  An interesting aspect of this algorithm is the use of a false discovery rate (FDR) calculation during the alignment stages.  Another nice feature is that there are multiple readouts during the alignment and quan steps that allow you to rapidly troubleshoot problems with your analysis.

Thursday, January 10, 2013

ASMS 2013 Deadline is rapidly approaching!


It is January 10th, people!  Less than a month left to get your abstracts in for ASMS 2013.  Don't miss your chance to show off your work in sunny Minneapolis!

Wednesday, January 9, 2013

iPathways


iPathways is a program available through the App store for iPhone and iPad that catalogs biological pathways.  The pros are that the App is really fast, since the pathways are images rather than changing objects.  The downside to this particular App is that each pathway is so extensive that you have to pinch and drag an awful lot to move around the image.  Is it as good as a searchable manually curated database such as Protein Center?  Definitely not!  But is it useful?  Absolutely.  And don't take my word for it, this App has over 6,500 other registered users.

Tuesday, January 8, 2013

Another good reason to use Proteome Discoverer for Orbitrap data!


So this is another thing that, yes, is probably biased cause you know who I work for.  But if you've read anything I've written over the years, you know that I have been plugging this software for a long time, even back when I worked for other companies and government entities.
Anyway, I first heard back this summer that some open sourceware proteomics programs might be mishandling Orbitrap RAW data.  I only got the chance a few weeks ago to check it out.

Summary (and, as always, email me directly if you want the data):

Conversion of Orbitrap Elite data:
1.)  Mascot Distiller only does it correctly if peak picking is employed.  If you take results from the file, it does not grab the correct monoisotopic mass.
2.)  MM File Conversion 3 does not grab the high resolution monoisotopic mass.
3.)  Neither does MassMatrix (both only use the pre-scan lower resolution mass).
4.)  ProteoWizard and the Trans-Proteomic Pipeline appear to use the correct mass, but the monoisotopic mass does not match ours exactly.

Conversion of Q Exactive data
1.)  I haven't had a chance to investigate the monoisotopic mass (part 2, later?) but this week I discovered that ProteoWizard, while getting the correct monoisotopic mass, incorrectly centroids profile MS/MS spectra, leading to massive confusion and processing crashes.

These are all good and very useful programs.  But if they aren't reading your RAW data correctly, what is the point?  Thermo is so concerned about this problem that there is a free solution that has been around for quite a while.  If you aren't going to use Proteome Discoverer for processing your data, you can at least use the free demo version and the Daemon to convert your data to MGF, mzML, or mzData before running your hard-earned data on these other processing programs.  You can download the Proteome Discoverer demo at the BRIMS software portal.  At Planet Orbitrap you can find tutorials on how to set up PD and the Daemon so it will still convert your data after the 60 day trial has ended (when it converts to the PD Viewer package).  You'll have to register (for free) at both sites for access to the downloadable content.


Wednesday, January 2, 2013

iCOPa Heart Proteome Database App

Happy New Year!
For anyone doing proteomics on heart tissue -- there's an App for that!  iCOPa gives you access to a curated database of stored proteomics data.  Unfortunately, I can't really speak to the ease of use or efficiency, because I don't actually have any heart tissue proteomics data to filter it with, but it's a great idea nonetheless!
It is linked to the data of the Cardiac Organellar Protein Atlas Knowledgebase (COPaKB), which you can learn more about here.