Tuesday, March 31, 2015
Wow! I haven't posted in days. I have been busy! And I've got lots of cool stuff to write about thanks to some reader emails and some neat new literature studies, but this one has to go first. I hosted a Proteome Discoverer workshop last week and the coolest idea occurred to me partway through. Possibly this was brought on by a participant comment (there were a lot of smart people in the room!) but I'm going to claim this as my own semi-original thought.
If you are a cancer researcher you have a big interest in what mutations are present in your genes or proteins. What if I told you that PD 2.0 can tell you if you have cancer mutations present and detectable in your proteomics sample? Maybe you already have an awesome and easy pipeline. But I think this is pretty darned cool (and it's new to me!)
What you need:
1) PD 2.0 (sorry...)
2) The XMAn database
3) A good contaminant database (go for cRAP)
Import all 3 databases into PD and then set up a normal Processing Workflow:
In PD 2.0, you can select as many databases as you want to use for any search. Here I selected my contaminants, my normal human database and my mutants (the XMAn database).
Next you need to set up a consensus workflow. The critical thing you need to use is the Consensus node called the Protein Marker. This node lets you keep track of which ID comes from where AND lets you label what the output is:
Here I've labeled my contaminants as such, my proteins that match Uniprot exactly as "normal" and anything from the XMAn database as a cancer mutation. This is what you get in the PD output:
As you'd expect, most proteins aren't mutated. Thanks to the Protein Marker node, however, you get three new columns, with an X in each column whose database matched your spectra. Again, most are going to be normal. Here, I cheated, though. I used a HeLa digest and it's got mutations all over the place. By clicking on the top of the column I can sort for proteins that have 1 or more mutations in them.
This gets really interesting when you go to the peptide group level.
Here, I did a quick filter for identified peptides that were NOT present in the "normal" Uniprot database. It turns up several peptide sequences that were only found in XMAn. Known cancer mutations found in HeLa, what do you know? And what can I do next? Well, at the top level you can pull out the XMAn nomenclature for the protein ID (the one I highlighted is O00203).
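If you'd rather run that same filter on an exported peptide table instead of inside PD, the logic is just a set check. Here's a minimal Python sketch, assuming each peptide group has already been reduced to its sequence plus the set of Protein Marker labels that flagged it; the label strings and the peptide sequences are hypothetical placeholders, not PD's actual export format.

```python
from dataclasses import dataclass, field


@dataclass
class PeptideGroup:
    sequence: str
    # Protein Marker labels that flagged this peptide (hypothetical label names)
    markers: set = field(default_factory=set)


def mutant_only(groups, normal_label="normal", mutant_label="cancer mutation"):
    """Keep peptide groups matched in the mutant (XMAn) database but NOT in the normal one."""
    return [g for g in groups if mutant_label in g.markers and normal_label not in g.markers]


groups = [
    PeptideGroup("PEPTIDEK", {"normal", "cancer mutation"}),  # shared sequence (made up)
    PeptideGroup("PEPTLDEK", {"cancer mutation"}),            # mutant-only (made up)
    PeptideGroup("CRAPPEPTIDEK", {"contaminant"}),            # contaminant hit (made up)
]

for g in mutant_only(groups):
    print(g.sequence)
```

A peptide shared between the normal and XMAn entries carries both labels and gets dropped, which is what the NOT-normal filter is after.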
I figured the easiest place to get info was COSMIC. I typed the nomenclature (as I got it from the PD column) into the search bar...and BOOM!
Tons of info! This mutation has been noted in over 100 studies in the past. I get references to all of them, and I can look at the structural info at the genome level that leads to this mutation. Now, the obvious next test (running right now!) is: can annotation and/or ProteinCenter make sense of the XMAn nomenclature? If not, I bet you they could get it going pretty quick!
Okay. Again, maybe you have a cooler way of doing this. But I didn't.
P.S. The data looked better when I used MS Amanda than with SequestHT. Better FDRs. Maybe that's because I'm searching a single file with Percolator.
Thursday, March 26, 2015
Histones are hard to study via LC-MS. Problems with histones include tons of post-translational modifications, many of which are isobaric (same mass!). For some reason, however, people want to do it anyway. Like any LC-MS/MS experiment, we can apply the normal tools. Sometimes, however, you want to have the correct tool for the job.
(Thanks Google Images, for exactly the right image!).
Zuo-Fei Yuan et al. would argue in this new paper, in press at MCP, that the tools they really needed for studying these tough and important modifications did not exist, so they had to write their own.
Epiprofile is the tool they came up with, and from their description it sounds great. Unfortunately, that is all I have to go on since I can't find a download link for it anywhere. This paper is in press, after all!
What does it do? It separates the isobaric ions in data-dependent experiments by using fragment ions specific to each peptide. It then goes back to the chromatographic peaks for quantification data. From the description in the paper, it seems to work very well with different quantification technologies, SILAC included, and it is good at quantifying a modified peptide separately from its unmodified counterpart.
Wednesday, March 25, 2015
Okay, one more. So I was scrolling around on SharedProteomics this morning (it requires very little bandwidth!) and ran into a thread discussing a proteomics conference back in August. Turns out it was kind of a scam?! I'm still confused about this. The Cabbages of Doom blog gave the most in-depth analysis of it. You can check it out here!
If you're currently asking yourself, what is worse than going to a conference in Missouri? How about a fake conference in Illinois!?!
Tuesday, March 24, 2015
I stumbled upon this paper while looking for one thing and realized it completely solved another problem I've been having.
The paper in question is this gem from Collin Wetzel et al., which came out last year about this time. In this study, the Limbach Group shows a way to improve the quantification of RNA (one of this group's specialties) via heavy oxygen (18O) labeling. In their previous work they had some difficulty distinguishing the oxygen isotopes from the natural C isotopes all over the place, particularly in more complex sample matrices.
How'd they clean it up? They de-isotoped their media! What?!? I know!!
They cultured their cells in media that is enriched in C-12 and depleted in C-13! All of a sudden there are fewer isotopes to deal with. It made every step of their analysis easier: fewer misidentifications, better quan, easier processing.
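A quick way to see why this helps: each 18O adds about 2 Da, which lands right on top of the peak from molecules carrying two 13C atoms. Here's a rough Python sketch that treats carbon as the only isotope source (ignoring N, H, O and everything else) and models 13C with a binomial distribution. The 100-carbon analyte and the 0.1% "depleted" 13C abundance are made-up placeholders, not values from the paper; ~1.07% is the usual natural 13C abundance.

```python
from math import comb


def carbon_isotope_fraction(n_carbons, n_heavy, p13c):
    """Binomial probability that exactly n_heavy of n_carbons atoms are 13C (carbon only)."""
    return comb(n_carbons, n_heavy) * p13c**n_heavy * (1 - p13c) ** (n_carbons - n_heavy)


def m2_to_m0_ratio(n_carbons, p13c):
    """Height of the 13C2 (+2 Da) peak relative to the all-12C peak."""
    return carbon_isotope_fraction(n_carbons, 2, p13c) / carbon_isotope_fraction(n_carbons, 0, p13c)


N_CARBONS = 100  # hypothetical analyte size
for label, p in [("natural", 0.0107), ("depleted (placeholder)", 0.001)]:
    print(f"{label:>24}: M+2/M0 from carbon = {m2_to_m0_ratio(N_CARBONS, p):.4f}")
```

With these placeholder numbers the carbon-only +2 Da background drops by roughly two orders of magnitude, which is the intuition behind depleting 13C before looking for a +2 Da label.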
I wonder if you could apply this to other isotope-noisy systems?
Monday, March 23, 2015
A local chemist in my area politely emailed me with a question regarding no enzyme searches and how much time these could take.
I realized that I'd never actually looked at this stuff before. Since her instrument is a Q Exactive, I happen to have a nice QE-like run lying around, and I haven't left my desk anyway, I queued up a bunch of runs in PD 2.0. Here's the setup:
8-core PC running at 4.7 GHz per thread
16 GB of RAM
Solid state drive
Atlanta Hawks/San Antonio Spurs game streaming in HD (NBA League Pass FTW!) in the background.
1ug of HeLa digest run on an Orbitrap Elite operating in High-High mode (high resolution MS1/high resolution MS2, just like a Q Exactive would run), separated on a 2 hr gradient with a 50 cm EasySpray C-18 column
FASTA: The human uniprot database (Uniprot-Swissprot parsed on the term "sapiens" in PD 1.4) ~ 15MB
10ppm MS1 tolerance
20 mmu (0.02 Da) MS/MS tolerance
Oxidation of M as a variable mod
Carbamidomethylation of C as a static mod
All pretty normal, right?
Here is what I changed around between runs:
1) Trypsin; 0 missed cleavages
2) Trypsin; 1 missed cleavage
3) Trypsin; 2 missed cleavages
4) Trypsin; 0 missed cleavages, Semi-tryptic digestion
5) No enzyme search; 0 missed cleavages
Now, in hindsight I should have queued these up on PD 1.4, because PD 2.0 threw my numbers off by running the first 4 at once. When the first one finished, it queued up the last one. They are probably close enough. Let's just talk about the Sequest search, because that is what I'm most interested in here.
SequestHT times (as reported by PD)
1) 1 min 37 seconds
2) 1 min 57 seconds (I'm not joking. when people round things to the nearest 7, I assume they are making them up.)
3) 1 min 19 seconds? What? Umm... This may have had to do with PD queuing up different runs and maybe how the game was buffering.
4) Semi-tryptic: 17 min 39 seconds. I'm just going to go right out there and admit I don't entirely know what that means. (Don't tell anybody.) I've always assumed it's like this: sometimes it hits trypsin, sometimes it doesn't. In my head I assumed it's the same as, like, 2 missed cleavages. It obviously isn't. It requires a whole lot more power than that. Guess I'd better look it up...ugh...
According to Matrix Science (original page here):
"semiTrypsin" means that Mascot will search for peptides that show tryptic specificity (KR not P) at one terminus, but where the other terminus may be a non-tryptic cleavage. This is a half-way house between choosing "Trypsin" and "None". It will only fail to find peptides that are non-specific at both ends.
I take back the "ugh". This is actually pretty cool. But I digress...
5) No enzyme: 3 hours, 4 minutes and an odd number of seconds.
Now, this may have got a boost cause it was the last run and toward the end PD could focus solely on processing that run. Plus the Spurs won; honestly it wasn't even close and the game was definitely taking up processing power.
Interesting observation 1: With normal searches, lots of threads and tight tolerances, PD 2.0 just tears through these data sets. A minute or two each, give or take.
Interesting observation 2: Semi-tryptic search is a big boost in search space (that is, the size of the in silico theoretical digest that we are comparing our spectra to).
Interesting observation 3: During the semi-tryptic and no enzyme searches, PD 2.0 doesn't make an index. It says so here (highlighted). I circled and drew a smiley face around the 2 missed cleavage search that took 1 min. I'm not entirely sure why, though I'm gonna blame it on cold medicine.
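To get a feel for why the semi-tryptic and no-enzyme searches blow up, here's a rough Python sketch that counts candidate peptides from one sequence under each rule. It assumes a simplified trypsin rule (cut after K or R unless the next residue is P), a 6-30 residue length window and an arbitrary example sequence; real search engines also filter by precursor mass and handle modifications, so these are illustrative counts, not what SequestHT actually scores.

```python
def tryptic_sites(seq):
    """Cleavage positions (index after each cut), including both termini."""
    sites = [0]
    for i in range(len(seq) - 1):
        if seq[i] in "KR" and seq[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(seq))
    return sites


def fully_tryptic(seq, missed=0, min_len=6, max_len=30):
    """Both ends at tryptic sites, with up to `missed` internal missed cleavages."""
    sites = tryptic_sites(seq)
    peps = set()
    for a in range(len(sites) - 1):
        for b in range(a + 1, min(a + 2 + missed, len(sites))):
            pep = seq[sites[a]:sites[b]]
            if min_len <= len(pep) <= max_len:
                peps.add(pep)
    return peps


def semi_tryptic(seq, min_len=6, max_len=30):
    """At least ONE end at a tryptic site; the other end can be anywhere."""
    sites = set(tryptic_sites(seq))
    return {
        seq[i:j]
        for i in range(len(seq))
        for j in range(i + min_len, min(i + max_len, len(seq)) + 1)
        if i in sites or j in sites
    }


def nonspecific(seq, min_len=6, max_len=30):
    """Every substring in the length window (no enzyme)."""
    return {
        seq[i:j]
        for i in range(len(seq))
        for j in range(i + min_len, min(i + max_len, len(seq)) + 1)
    }


SEQ = "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVKLVNELTEFAK"  # arbitrary
for name, peps in [
    ("trypsin, 0 missed", fully_tryptic(SEQ, 0)),
    ("trypsin, 2 missed", fully_tryptic(SEQ, 2)),
    ("semi-tryptic", semi_tryptic(SEQ)),
    ("no enzyme", nonspecific(SEQ)),
]:
    print(f"{name:>18}: {len(peps)} candidate peptides")
```

Missed cleavages only add a handful of candidates per cleavage site, while semi-tryptic and no-enzyme rules open up nearly every start or end position, which lines up with the jump in search times above.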
Okay, the next question that pops into my strangely performing brain is this: Did I gain anything here?
Here is the number of peptides from each run:
Interesting. Recently, I began working on making videos for Protein Metrics nodes for PD. When I ran this same dataset with the Preview node, it told me that 15 or so percent of my cleavage sites were missed. That comes dangerously close to being right on the money (15% more than 6675 is ~7700). Interestingly, we didn't gain anything at all by looking at the second missed cleavage event. At 15% probability, missing 2 seems unlikely but to gain exactly zero? Seems like a big coincidence. I'll rerun this one later and update if necessary. The interesting thing is that the semi-tryptic search did the best. It took a whole lot more time to run, but it came back with the most peptides. I did my old manual verification sampling trick and I think these are good matches.
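The back-of-the-envelope numbers in that paragraph are easy to reproduce. Here's a minimal Python sketch, assuming (as a simplification, not something the Preview node reports) that every K/R site is missed independently at the ~15% rate; real missed cleavages depend on the neighboring residues, so treat the independence model as a rough guide only.

```python
def boosted_count(base_count, fraction):
    """Peptide count after adding a fractional gain, e.g. 0.15 for 15%."""
    return base_count * (1 + fraction)


def all_missed(n_sites, p_miss):
    """Chance that n specific cleavage sites are ALL missed, assuming each is missed independently."""
    return p_miss**n_sites


print(round(boosted_count(6675, 0.15)))  # 7676, i.e. the ~7700 in the text
print(f"{all_missed(1, 0.15):.4f}")     # 0.1500: one specific site missed
print(f"{all_missed(2, 0.15):.4f}")     # 0.0225: two specific sites both missed
```

Under that simple model, about 2% of any specific pair of adjacent sites would both be missed: small, but not zero.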
I guess, for high resolution MS/MS data and small databases, you might as well use a couple of missed cleavages for searching. It will hardly affect a high-end PC running SequestHT at all. But I think we learned that trypsin isn't perfect. It is supposed to cut at K and R. And it does...most of the time...but at least 15% of the time it will blow right by a K or R and go to the next one. It might make more mistakes than that, though. I'm sure this is in the literature somewhere!
Sunday, March 22, 2015
A reader asked me the other day something like this "hey Dogbreath, how do you do targeted quan on the Orbitrap Fusion?"
So I fished around and found some resources. These videos were created by someone who would prefer to remain anonymous but who also seems to share my obsession with Carl Sagan. He asked me to password lock these, so I used "MisterX"
They aren't much, but maybe there will be more on the way since he just got a whole lot more Fusion access.
How to set up a targeted MS2 type experiment (also called parallel reaction monitoring, or PRM) on the Fusion: https://vimeo.com/122895643
How to set up a T-SIM-ddMS2 : https://vimeo.com/122895995
Shoutout to Darryl for the suggestion. I'll post more resources if I find them.
Saturday, March 21, 2015
Friday, March 20, 2015
Since my dumb body picked up a stupid virus and I'm not heading up into the mountains this weekend, I figure it's time to work on a few big blog projects I've been wanting to tackle.
This one I've been leading up to for a while and I've gotten some emails from you guys about it, along the way. They go something like this "hey Captain talks-to-much, you are always talking about quality control, but you never tell us what you run. P.S. pugs are stupid."
Since you're all so nice about it, I should tell you. For discovery proteomics my favorite QC is the PRTC peptides spiked into the HeLa digest. For the Q Exactives, Orbitrap Elite and Orbitrap Fusion with nanoflow I run 200ng of HeLa with 100fmol of PRTC spiked in.
If you buy the 50fmol/uL PRTC you can add 450uL of 0.1% FA to it and then just put 200uL of that into the 100ng/uL HeLa vial. Inject 2uL and you are there. You have 100 QC injections for < $2 each. If you keep it at -20C it's stable for like 6 months. You can buy the bigger vials, aliquot them out and keep the aliquots at -80C. I think you can get it down to about $0.50 per run.
Okay, so you're asking now "Hey, Smelly, why are you using both!?! Why not just one or the other?" Excellent question (and I've got a cold, I'm supposed to smell bad)! The HeLa digest gives you a nice quick metric. Search it real quick and have a good feel for where you are. Is your number of peptides the same as it was at your last PM? When the instrument was new? Bingo! If it's down, then you can extract the PRTC peptides. You'll always see them, they are equimolar and you know where they ought to elute.
Best of all? I have reference files! I stole this method from work being done by Tara Schroeder and Lani Cardasis who use these to QC the instruments in their labs! And I stole some of their RAW files! So I know what I ought to be looking at. So if I load up the exact same method (ask me if you want it, I'll send it to you) on a QE and run the same gradient and I don't get, I don't know, 16,000 unique peptides, then I know something is wrong. I can line up the two files and see what happened.
This is the QC gradient we go with. And here are the method parameters (click to zoom in):
Why is there a T-SIM in there? So you can test your isolation with multiplex SIM! On select PRTCs!
This is cool because if you are having problems, you can quickly see if it is an isolation issue. At 100fmol you're going to likely end up fragmenting the PRTCs. So you will have a measurement of the PRTCs isolated and fragmented vs. just isolated. Smart, right!?!!?
If you need to diagnose things, you can easily build 2 XIC layouts in Xcalibur. This will allow you to see your PRTCs and peak shapes. If your peaks start tailing and looking gross then you know your column is getting old. If your IDs are low and you are missing the most hydrophilic or hydrophobic PRTCs, then you know that you aren't trapping right or that your pump isn't putting out enough B. Here is a table of what the extracted data should look like:
What else? Okay, what might this look like from the processed level?
DISCLAIMERS (Critical, please don't get me in trouble with people who do this stuff professionally):
1) I'm only leaving up these protein/peptide/PSM numbers (below) as an example to show that there is always run to run variation, even if you run these back to back.
2) Please don't consider these numbers as what you should be getting in all your runs. Your column conditions, emitter age, buffer quality, ion transfer tube cleanliness, background contaminant ions from your deodorant (no joke), the compounds leaching out of your centrifuge tube, the pipettor you picked up the formic acid with, vacuum quality level -- I've seen this for real -- the amount of direct airflow in your room, the construction they're doing on the other side of the building, who prepped your sample (and, especially, how), etc., etc., etc., can all affect your peptide ID numbers. I looked for a link but couldn't find it: there was an old study showing that 0.1% formic acid in your samples sitting on the autosampler caused a noticeable decrease in detectable peptides when samples were queued up for a few days....not sure if that was ever reproduced.
3) Even under perfect conditions (new LC, new column, new instrument), run-to-run variation exists in back-to-back samples.
Here are some representative processed runs. Clean instrument (QE Plus) new column, clean LC, perfect spray stability. Honestly, this is better data than I normally get, but you get the point.
You should establish yourself a baseline for when your instrument is ideal and go from there. A few years ago at ASMS there was some serious controversy regarding whether processed data should be used for QC. I use it to troubleshoot some things, but I'd much rather see the peak shape and intensity!
One thing I do use it for is checking dynamic exclusion, and you can get a feel for that by paying attention to the final column (or row?...whatever it is...): the % Unique.
This is the number of unique peptide groups divided by the number of PSMs. Why is this so important? If the unique % goes down then you, my friend, are over sampling. You need to work with your peak widths and dynamic exclusion cause you are fragmenting the same peptide over and over.
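Since % Unique is just a ratio, it's easy to track across QC runs with a few lines of code. Here's a minimal Python sketch; the 5-point drop used as a warning threshold is an arbitrary placeholder, not a recommended value (establish your own baseline, as above), and the counts are made up.

```python
def percent_unique(unique_peptide_groups, psms):
    """% Unique = unique peptide groups / PSMs * 100."""
    if psms <= 0:
        raise ValueError("PSM count must be positive")
    return 100.0 * unique_peptide_groups / psms


def looks_oversampled(current_pct, baseline_pct, tolerance_pct=5.0):
    """Flag a run whose % Unique fell more than tolerance_pct points below your own baseline."""
    return current_pct < baseline_pct - tolerance_pct


baseline = percent_unique(8000, 10000)  # made-up counts from a "good" QC run
today = percent_unique(7000, 10000)     # made-up counts from today's run
print(f"baseline {baseline:.1f}%, today {today:.1f}%, oversampled? {looks_oversampled(today, baseline)}")
```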
By the way, you can apply this method to other instruments as well. It is always easiest to explain ideas when you start with a QE, though!
Shoutout to my good friends Tara and Lani and Josh Nicklay cause I didn't do any of this. They blew my mind with this at a meeting last summer and I've been using it ever since. I run this at almost every lab I visit and I meant to tell you guys about it a while ago.
Wednesday, March 18, 2015
I had dinner with Dr. Gary Paul and he brought up that there aren't a lot of resources for the peptide mapping crowd and I thought I'd spend some time doing something about it. I figured I could start by rambling about it while I wait for my plane!
I'm going to stick to the term mapping. My blog. My nomenclature. (And this is the most common term right now anyway, I think.) I'll add it to the translator to make it official. What I mean when I say peptide mapping is this: I have a protein species and I want to see every single amino acid and post-translational modification at every possible level of abundance, and I want to do this by LC-MS/MS.
Step 1: Get the protein sequence from somewhere.
People doing this are often studying antibodies or some other pain-in-the-you-know-what protein (I gotta watch the profanities right now....) so they know what they are looking for...to a certain extent. So we've gotta start with what genomics has given us as the protein sequence. This is where some people get caught. When you pull a protein sequence from NCBI, remember this....lots of them are wrong! We shotgun proteomics people get to ignore that fact because the sequences are mostly right. Mostly. Where do the human Uniprot proteins come from? Like 2 people! Seriously. So we sequenced a couple people's DNA, assumed every start and stop codon and intron/exon is correct, then we 6-frame translate that DNA into protein and everything is okay? And it is...mostly....
But we start where we have to.
Get the best annotated sequence for your protein that you can get. Be cautious that it might be wrong. Keep this in mind: What you get from the mass spec trumps what you get from NCBI or Uniprot. Every time.
Step 2: Theoretically digest your protein sequence:
Man, I can't stress this one enough. It takes like 10 seconds to do this and it can save you so much trouble. If you have PepFinder, it'll do it for you, Pinpoint does it, Skyline does it. If you don't have one of these installed (you should at least have Skyline! come on!) you can do it online, and there are many good tools for it. By default I'll probably do it with this old guy:
This is the UCSF Protein Prospector. Yes, he's been around forever, but it isn't a dead project. New features were just added a few months ago.
Why would I do this second? Once upon a time, when I was young and stupid, rather than just stupid, I studied a phosphorylation cascade caused by a promising chemotherapy drug. I developed a really extreme method for forcing an Orbitrap XL to get an insane number of phospho IDs (triple enrichment + 3D fractionation). Unfortunately, while I got enough for a cool method paper, I didn't get anything from my pathway of interest. Because I used trypsin. Because the phospho-sites of interest are surrounded by lysines. Honestly, this entire pathway had a motif that was KxYK. The peptides were too small to be sequenced by LC-MS/MS. They were singly charged. Had I spent, I don't know, maybe a half hour on Protein Prospector, maybe I would have realized this and could have gone with an alternative enzyme...or chemical cleavage...or even forced semi-tryptic cleavage, and would have gotten more than a method paper out of 4 months of work. (Sorry for the rant.)
Don't follow in Dumb Ben's footsteps. Look at your protein digest in silico (theoretically). An ideal protein for tryptic digestion should have lysines or arginines spaced reasonably evenly throughout the protein. They should make up around 1/13 of the total amino acid sequence. If they only make up 1/30 of the total amino acids present, the peptides being produced may be too big for CID or HCD fragmentation (but they may be perfect for ETD, if you have it!). If K and R make up 1/4, all of your peptides may be singly charged and invisible to the mass spec. Consider switching enzymes or doing something extreme like 30-minute tryptic digestions at room temp with a decreased amount of trypsin. There are ways around this.
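That K/R rule of thumb is easy to check in code before you ever touch trypsin. Here's a minimal Python sketch using the ratios from the paragraph above (1/30 as too sparse, 1/4 as too dense) and a simplified "cut after K or R, not before P" rule. The example sequence is made up, and the 6-residue "too short" cutoff is an assumption. Protein Prospector, Skyline or PepFinder will give you a far more complete in silico digest.

```python
import re


def kr_fraction(seq):
    """Fraction of residues that are K or R."""
    seq = seq.upper()
    return (seq.count("K") + seq.count("R")) / len(seq)


def trypsin_outlook(seq, too_sparse=1 / 30, too_dense=1 / 4):
    """Rough call based on the K/R frequency rules of thumb in the text."""
    frac = kr_fraction(seq)
    if frac <= too_sparse:
        return "sparse: expect big peptides (maybe try ETD or another enzyme)"
    if frac >= too_dense:
        return "dense: expect tiny, singly charged peptides"
    return "reasonable for trypsin"


def tryptic_peptides(seq):
    """Simplified trypsin rule: cut after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq.upper()) if p]


protein = "MSGKAYKLSEKAYKPGDLKVDESLR"  # made-up sequence with KxYK-style spacing
print(f"K+R fraction: {kr_fraction(protein):.3f} -> {trypsin_outlook(protein)}")
# Peptides under 6 residues (assumed cutoff) are likely too short to sequence well
print([p for p in tryptic_peptides(protein) if len(p) < 6])
```

Note that an "okay" overall K/R fraction can still hide a cluster of lysines, which is why listing the short peptides is worth the extra line.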
Optional step X: Get an intact protein mass. If you can do it, getting the intact protein mass can be awesome here. This is where you'll find out that your Uniprot sequence was wrong. Or that this protein, when produced in E. coli, doesn't have its initial methionine cleaved. Or...that you are looking at a mixed population. Just because that protein comes off an FPLC size exclusion column as one peak doesn't mean every single protein molecule is the same. Heck (Albert Heck! lol!), a single peak of ovalbumin has >60 protein forms. Keep that in mind. If you have the capability and enough information, man, this is going to be so great for you!
Step 3: Get great chromatography!
I know. This is one protein. I should be able to get this whole thing in 10 minutes, easy, right?!? Maybe. But it depends on what you want. You're peptide mapping, so I assume you want everything. I assume you want at the end 100% sequence coverage. Best chromatography is going to give you the best chance of success. A 4 hour gradient for a single protein is probably excessive....but I don't think a 2 hour run is crazy at all...
Step 4: Sequence everything you can.
In an ideal world, the mass spec will pick out every peptide, every peptide will be of high enough intensity and perfectly compatible with your MS/MS fragmentation method of choice, and you'll walk away with 100% coverage on the first run.
Realistically? Some of those peptides will ionize poorly and will be of low intensity. Some of them will fragment too poorly to sequence and you'll get 10 MS/MS events of high quality for peptides in every region you don't care about. That's okay.
Step 5: Rerun that dumb sample.
A.) There are two approaches here. Both are equally valid. You can take all of the precursors that gave you sequenceable MS/MS spectra, put those on an exclusion list and rerun the sample with this new method. Put a nice tight mass tolerance on the list (<10ppm) and maybe a retention time limit on it if you can.
B.) Target it. This is where PepFinder is real powerful. Pepfinder gives you a list of everything that matches your protein of interest...whether it was triggered for MS/MS or not. If it wasn't you can export the list and then build a targeted list. Increase your fill time and go after those regions you don't know about. Get MS/MS for everything that you can in there. If you have to, raise your fill times and do it again!
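The matching behind option A is simple enough to sanity-check offline before you load a list into the method. Here's a minimal Python sketch, assuming a 10 ppm tolerance (matching the <10ppm suggestion above) and a ±1 minute retention time window as a placeholder; the instrument applies its own exclusion logic, and the class and function names here are just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExclusionEntry:
    mz: float      # precursor m/z that already gave a sequenceable MS/MS
    rt_min: float  # retention time where it was seen, in minutes


def is_excluded(mz, rt_min, entries, ppm=10.0, rt_window_min=1.0):
    """True if a precursor falls inside the m/z (ppm) AND retention time window of any entry."""
    for e in entries:
        ppm_error = abs(mz - e.mz) / e.mz * 1e6
        if ppm_error <= ppm and abs(rt_min - e.rt_min) <= rt_window_min:
            return True
    return False


exclusions = [ExclusionEntry(600.3000, 35.2), ExclusionEntry(812.4150, 52.8)]  # made-up values
print(is_excluded(600.3030, 35.5, exclusions))  # ~5 ppm off, 0.3 min off
print(is_excluded(600.3100, 35.2, exclusions))  # ~17 ppm off
```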
Step 6: Export the MS/MS spectra that didn't match anything.
This is often overlooked. It can be skipped if you have software (like PepFinder or Byonic) that can search for unknown PTMs or amino acid substitutions. If you are just using Sequest, for example, export the MS/MS spectra that don't match and try sequencing those de novo. You have options. PEAKS is commercial and powerful. DeNovoGUI is simple and free. PepNovo+ (command line) can actually do BLAST sequence alignments after sequencing your peptides. Now, please keep in mind that these unknown peptides may be keratins or other junk from around your lab. Or they may be the reason your protein doesn't do what it's supposed to!
Step 7: Find out what is missing!
Now, if everything went well you should have at least most of this protein figured out. Look at that theoretical sequence. What is missing? Does it make sense that it is missing? If there are two amino acids flanked by lysines and those are missing, that makes sense. If there are big regions that are 12 amino acids or so long (between Ks and Rs) and you didn't sequence those, then something weird is going on. Maybe you have a point mutation. Maybe you have a big glycan mucking up that part of the peptide. There is a logical reason for why that region is missing, and getting to the bottom of it is going to be some work, but it might just be the icing on top of this awesome study you just put together!
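Step 7 is mostly bookkeeping: map every identified peptide back onto the theoretical sequence and list the stretches nothing covered. Here's a minimal Python sketch, assuming exact substring matching (so a peptide carrying a point mutation won't map, which is exactly the kind of gap you want flagged) and using ~12 residues, from the paragraph above, as the "something weird is going on" cutoff. The sequence and peptides are made up; short gaps between neighboring lysines fall below the cutoff, as they should.

```python
def coverage_mask(protein, peptides):
    """Mark every residue covered by at least one identified peptide (exact substring matches)."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return covered


def uncovered_regions(protein, peptides, min_len=12):
    """Return (first, last, sequence) for uncovered stretches of at least min_len residues, 1-based."""
    covered = coverage_mask(protein, peptides)
    gaps, start = [], None
    for i, is_covered in enumerate(covered + [True]):  # sentinel closes a trailing gap
        if not is_covered and start is None:
            start = i
        elif is_covered and start is not None:
            if i - start >= min_len:
                gaps.append((start + 1, i, protein[start:i]))
            start = None
    return gaps


protein = "MAAAAK" + "G" * 13 + "R" + "CCCCCK"  # made-up sequence
found = ["MAAAAK", "CCCCCK"]                    # made-up identified peptides
print(f"coverage: {100 * sum(coverage_mask(protein, found)) / len(protein):.0f}%")
print(uncovered_regions(protein, found))
```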
Now, I should get on a plane. I'd like to follow up here later with more visual stuff so I may build on this. I don't have a lot of good single protein stuff on this laptop. But I do have stuff around. More on it later!
Tuesday, March 17, 2015
Everyone wants spectral libraries for something or other! How can you build them? Why not grab Bibliospec from those good people at the MacCoss lab?
What does it do? It makes spectral libraries!
How much does it cost? Nuthin!
Do the programmers know what they are doing?
...Yes. Yes, they know what they are doing....
If you are moving from DDA to DIA (say, SpectroNaut) you don't even have to learn Skyline! Just run Bibliospec on its own. I'm gonna take a swing at it soon and I'll let you know how it works out.
You can get Bibliospec here.
That image above is our good friend Campylobacter jejuni. For some reason I had strongly associated this bacterium in my mind with "traveler's sickness." A quick Google search says it's probably one of many bacteria that can be linked to these unpleasant symptoms we sometimes incur while globe-trotting.
Now, our buddy Campy here lives just fine in chickens, but tends to cause some serious problems for us, and this new paper from Weston Struwe et al. (in press at MCP and currently open access) wanted to figure out what the difference was.
To get to the bottom of it, they use a complex experimental design involving both human cells and chicken intestines, a system they previously designed. Then they go after released glycans and analyze them with an LTQ.
The answer? (Gosh, there is a theme here in the literature....) It's a protein glycosylation thing. Sometimes it seems like everything is coming around to glycans!
If glycans are your thing...or if protein abundance isn't answering your questions....you might want to check this out. Man, I know I shouldn't be bummed out that we keep finding solutions to these problems, but glycans are a whole lot harder to work with than regular old peptides. Guess I'm going to have to dust off those old protocols from grad school...
Monday, March 16, 2015
This site at Etsy features hand made jewelry, often with a science theme. Notice above the 2 different Orbitrap designs?!?! Pretty awesome, right?!?
Shout out to Julian Saba for leading me to this cool site.
Sunday, March 15, 2015
We're in that in-between stage right now, I think. That stage where we're about to conquer bottom up proteomics. We're maybe one or two instrument upgrades from being able to quickly and reliably get all the theoretically identifiable peptides in a cell. The next stage (IMHO) is top down. But this isn't a little step; it's an exponential one. To move from the goal of 1e5 peptides to 1e6 intact proteoforms....that's gonna be a while....
But we need to know more about how proteins work. How they interact, and how super important things that we've been ignoring, like 2D and 3D and 4D (quaternary?) peptide/protein structure, affect things. Biologically, the peptide sequence isn't all that important in relation to those other things.
In an effort to bridge this gap, a great team of researchers at different Belgish (is that right?) institutions developed PepShell.
It is detailed in this JPR ASAP paper by Elien Vandermarliere et al. (sorry, not open access). The goal of PepShell is to allow us to consider the peptides identified by MS/MS in a more realistic sense, that is, from the biologically relevant protein level. For global proteomics, this goal is so lofty that I can't imagine applying it, though maybe you can. The applications I imagine are for when you've narrowed down the field some. My first run would be to dig through some pull down data I've got around here somewhere to see what PepShell makes of it. I think this could be invaluable for protein-protein interaction studies, particularly when your lists of linear peptide data just didn't answer your questions!
Saturday, March 14, 2015
Finding post-translational modifications (PTMs) like phosphorylation is awesome. And maybe it's super useful to what you're doing. What is almost always more useful, however, is getting quantification out of your PTMs. Even if we ignore PTMs and just go to the peptide level, there are still more peptides in a typical cell digest than any mass spectrometer can sequence. In the end, we're really just skimming the surface. The absence of a PTM in one sample may seem like the missing key to your puzzle when really it might just be a sampling error.
I'm a big fan of labeled quan. Yes, I know there are drawbacks. The reagents are expensive, they may not be applicable to all experiments, and we get ratio suppression in reporter quan experiments. But I'm a busy guy. And I don't have my own Orbitrap. So I've got to steal time on other people's, and nothing out there gets you anywhere near as much data as fast as TMT does.
This cool paper in Open Proteomics from Benedetta Lombardi et al. links the two paragraphs above together. In this paper they evaluate different strategies for doing TMT based quantification of phosphorylation. It is a nice little analysis that will save you a great deal of work in case you want to move from compiling lists of phosphorylation events (boring...) to quantifying the shifts in a phospho cascade over a time course following drug treatment or something (way less boring!).
Friday, March 13, 2015
DISCLAIMER: Like anything/everything on this blog, this isn't official vendor stuff. I have successfully generated some intact protein data and a little bit of top-down using the Fusions I have run into, and people have asked me a couple of times recently how I've done it. I'm sharing what I did.
With that out of the way! Intact protein masses are relatively easy to generate on the Fusion. The front end is very QE-like, and you can generate intact protein masses that look as good as a QE's without having to work all that hard.
That being said, this is the most advanced mass spectrometer ever made. You can put more effort into adjusting your parameters and get better data than the QE. It depends what you want. Do you need to confirm that your E. coli expressed protein didn't suffer a read-through? Inject 200ng on column and you'll get a mass that will confirm it. Do you need to know the position of a 1 Da shift in that protein? Then you're going to need to put in a lot more work. I can't tell you how to do that second one...yet!
The first one, though:
As with any Orbitrap intact run, I like to start with the lowest resolution first. On the Fusion, that's 15k resolution. You do have the option here, like on the hybrids, of starting with an ion trap scan first. Your choice.
I'm going to list these settings as a starting point for an intact mAB run. Another disclaimer: Good LC is going to help you a lot here. For this assay I generally use 200 uL/min on a C4 column and divert the first 2-3 minutes to waste, particularly if this is a stored commercial antibody. There are lots of buffer components used to keep the mAB safe, and it's probably a good idea to keep that weird stuff out of the mass spec.
Starting points for mAB:
1e5 AGC target
50ms max fill time
MS1 scans from 1500-4000
S-lens RF at 60%
In-source fragmentation energy (SID) at or around 40
325C ion transfer tube temp
And appropriate gas settings for the best solvation at your flow rate. Honestly, I LOVE that flow rate optimization button in the tune settings. You can likely tune it better yourself by putting in some extra work, but it gives you a darned nice starting point. (If you haven't seen it: you put in your flow rate and it sets what it considers the ideal gas pressures and HESI temperatures for best solvation.) It's really cool, particularly for those of us who mostly do nano!
Run these settings, tweak them a little (particularly the SID, which often needs to be higher for bigger proteins), and run a nice clean intact protein. Average some scans and you're going to get something nice out of it.
Still doesn't look good?
Go into the scan headers and see what is going on. Are you hitting the maximum injection time? If so, move it to 100ms and see if that cleans it up. If that doesn't appear to be it, increase the SID. There is an application note out there on Rituximab (you can find it at Planet Orbitrap) that uses an SID of 80 on a QE with HESI. This is often the most important factor I end up moving around.
In general, you want higher SID for bigger proteins. This is my typical rule:
Light chain SID: 0-15
Heavy chain SID: 20-50
Intact mAB: 40-80
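The troubleshooting order above (check the scan headers for a maxed-out fill time first, then push the SID) and the SID rule of thumb can be sketched as a small Python helper. This is a hypothetical illustration, not a vendor API: the `Settings` fields simply mirror the starting points listed for an intact mAB, the SID ranges are the rule of thumb above, and the 10-unit SID step is an arbitrary assumption.

```python
from dataclasses import dataclass, replace

# Rule-of-thumb SID ranges from the text, keyed by protein class.
SID_RANGES = {
    "light_chain": (0, 15),
    "heavy_chain": (20, 50),
    "intact_mab": (40, 80),
}


@dataclass(frozen=True)
class Settings:
    """Starting points for an intact mAB run (hypothetical container, not a vendor API)."""
    agc_target: float = 1e5
    max_it_ms: float = 50.0
    scan_low_mz: float = 1500.0
    scan_high_mz: float = 4000.0
    s_lens_rf_pct: float = 60.0
    sid: float = 40.0
    transfer_tube_c: float = 325.0


def next_adjustment(settings: Settings, observed_it_ms: float,
                    protein_class: str, sid_step: float = 10.0) -> Settings:
    """Return the settings to try next, following the troubleshooting order above."""
    # 1) Scan headers show the fill time pinned at the maximum: allow up to 100 ms.
    if observed_it_ms >= settings.max_it_ms and settings.max_it_ms < 100.0:
        return replace(settings, max_it_ms=100.0)
    # 2) Otherwise step the SID up, clamped into the rule-of-thumb range for this class.
    low, high = SID_RANGES[protein_class]
    new_sid = min(max(settings.sid + sid_step, low), high)
    return replace(settings, sid=new_sid)


s = Settings()
s = next_adjustment(s, observed_it_ms=50.0, protein_class="intact_mab")  # max fill time -> 100 ms
s = next_adjustment(s, observed_it_ms=35.0, protein_class="intact_mab")  # SID 40 -> 50
```

Keeping the settings immutable means every tweak produces a new record, so it's easy to keep a history of what you tried.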
Why the big range? I'm not sure. If I had to guess, I'd say that the solvation of the protein is an additive effect. The goal is to get this big protein all into gas phase. I think the temperature/ESI voltage, SID, S-lens RF and other things all contribute to this effect. You can probably use more voltage and less SID to some degree. I'm not sure.
Once you've tweaked those to the best settings, take a look at the LC peak. Can you make your gradient steeper and sharpen your peaks so that you get more signal?
What does the isotopic envelope look like? Does it fall cleanly within m/z 2,000-4,000? If so, change your scan range to that. Get as much signal in there as you can.
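To see why a big antibody lands in that window, you can compute where each charge state of the protonated protein ends up, using m/z = (M + z × 1.007276) / z. A minimal sketch, assuming an illustrative intact IgG mass of 148,000 Da (pick your own protein's mass):

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_states_in_window(mass_da, mz_low, mz_high):
    """Return (z, m/z) pairs for [M + zH]z+ ions that fall inside [mz_low, mz_high]."""
    hits = []
    z = 1
    while True:
        mz = (mass_da + z * PROTON) / z
        if mz < mz_low:
            break  # m/z only decreases as z grows, so nothing further can fit
        if mz <= mz_high:
            hits.append((z, mz))
        z += 1
    return hits


envelope = charge_states_in_window(148_000.0, 2000.0, 4000.0)
print(envelope[0][0], envelope[-1][0], len(envelope))  # 38 74 37
```

So a ~148 kDa protein puts charge states 38+ through 74+ inside m/z 2,000-4,000, which is why narrowing the scan range to that window costs you nothing.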
Do you see things that don't belong and can't be deconvoluted into your protein of interest? Lower some energies! Lower the ESI voltage, crank back on the SID, lower the S-lens RF.
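Whether a peak "deconvolutes into your protein of interest" comes down to charge-state math. Two neighboring peaks of the same envelope, at charges z and z + 1, are enough to solve for both z and the neutral mass: from m1 = M/z + p and m2 = M/(z+1) + p it follows that z = (m2 - p) / (m1 - m2). A minimal sketch of that classic two-peak calculation (dedicated deconvolution software does far more, averaging across the whole envelope and isotopes); the 148,000 Da test mass is an illustrative assumption:

```python
PROTON = 1.007276  # mass of a proton in Da


def deconvolve_adjacent(mz_at_z, mz_at_z_plus_1):
    """Infer charge z and neutral mass M from two adjacent charge-state peaks.

    mz_at_z is the higher-m/z peak (charge z); mz_at_z_plus_1 is its
    neighbor at charge z + 1 (slightly lower m/z).
    """
    # z = (m2 - p) / (m1 - m2); round because charge must be an integer.
    z = round((mz_at_z_plus_1 - PROTON) / (mz_at_z - mz_at_z_plus_1))
    mass = z * (mz_at_z - PROTON)
    return z, mass


# Synthetic peaks for a hypothetical 148,000 Da protein at 50+ and 51+.
m1 = 148_000.0 / 50 + PROTON
m2 = 148_000.0 / 51 + PROTON
z, mass = deconvolve_adjacent(m1, m2)  # z == 50, mass ~ 148000.0
```

A peak that won't pair up with its neighbors to give a consistent mass is a good hint that it's a fragment or adduct, which is where backing off the energies helps.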
Intact protein measurement by mass spec is still a kind of art. There isn't a great list of settings that will work for every protein. Heck, I may sometimes re-optimize if I'm moving from one intact mAB to another. Secondary structure, the number of exposed basic amino acids, the number and types of PTMs, and on and on are all at work here.
Again, in case this wasn't clear: These are just starting points and are not vendor doctrine (as always, please don't sue me!). After I get emailed a question a couple times I figure I can save everybody some time by putting some stuff up here in the hopes that someone out there might find it useful. Correctly tuning the instrument for proteins and fully optimizing it for the higher m/z range is going to get you better data than this, but sometimes you just need to confirm your protein identity or number of PTMs and this is how I would get you there!
Thursday, March 12, 2015
Wanna see where the Human Protein Atlas is right now? Science is hosting a webinar next Wednesday (3/18/15) on this awesome resource.
You can register for it at Science here.
I already scheduled my next Wednesday...doing oligonucleotides on Q Exactive (woooohooo!) so I'll miss it, but you ought to check it out.
Wednesday, March 11, 2015
Parallel reaction monitoring (or quantification via targeted-MS2 fragments) seems to be rapidly increasing in popularity. There is, however, a cap on the number of targets you can look at. This cap is much lower than more traditional targeted quan instruments like the triple quadrupole mass specs (QQQs).
What if I told you that someone just wrote custom software that blows that limit away? Now, instead of doing PRM on 20 or 30 peptides at a time, you can do PRM on over 600 peptides?!?!
Well, that is exactly what is in this paper in press at MCP from Sebastien Gallien et al., out of the Luxembourg Clinical Proteomics Facility (btw, the LCP needs a cool logo for me to put up).
How'd they do it? First of all, they wrote some sweet custom software (I think they said it's all in C#). Second, they spike in an isotopically labeled peptide for each of their peptides of interest. Each method is built around a full scan followed by PRMs. The full scans are rapid and are looking not for the peptide of interest, but for the isotopically labeled spike-in. When the instrument sees the spiked-in peptide, which will be eluting at the same time as its endogenous friend, it starts doing PRMs for the endogenous peptide.
Is that crazy smart or what?!? Think about this. You can spike in your standard at a high level if you want. High enough that the less sensitive full scan can easily see that it's time to start monitoring for your peptide of interest. Then the super sensitive PRMs get to work.
Did you change your column and experience a 2 minute retention time shift? This method doesn't care! You aren't trapped in a finely tuned window that was designed on the last column. Because your standards shift right along with your endogenous peptides, your target windows shift with them automatically.
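The triggering idea can be illustrated with a toy Python sketch. This is NOT the authors' C# software; the data structures, the 0.01 m/z tolerance, the 1e5 intensity threshold, and the assumption that standards carry a heavy C-terminal Lys (13C6 15N2, +8.0142 Da) or Arg (13C6 15N4, +10.0083 Da) are all illustrative choices. Peptide names and m/z values in the usage line are hypothetical placeholders.

```python
from dataclasses import dataclass

# Assumed heavy-label mass shifts (Da) for a C-terminal Lys or Arg.
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008269}


@dataclass(frozen=True)
class Target:
    peptide: str      # sequence; the last residue decides the label shift
    charge: int
    light_mz: float   # m/z of the endogenous (light) precursor

    @property
    def heavy_mz(self) -> float:
        # The spiked-in standard sits at +shift/z above the light precursor.
        return self.light_mz + HEAVY_SHIFT[self.peptide[-1]] / self.charge


def prm_targets_for_scan(full_scan, targets, tol_mz=0.01, min_intensity=1e5):
    """Return the light m/z values to schedule for PRM after one full scan.

    full_scan is a list of (mz, intensity) peaks. A target is triggered when its
    heavy standard is seen above min_intensity within tol_mz of the expected m/z.
    """
    triggered = []
    for t in targets:
        hmz = t.heavy_mz
        if any(abs(mz - hmz) <= tol_mz and inten >= min_intensity
               for mz, inten in full_scan):
            triggered.append(t.light_mz)
    return triggered


targets = [Target("AAAK", 2, 500.0), Target("GGGR", 2, 600.0)]
to_fragment = prm_targets_for_scan([(504.0071, 3e5), (700.0, 1e6)], targets)  # [500.0]
```

Because the decision depends only on whether the heavy standard shows up in the current scan, there is no retention time window to go stale when the column changes.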
Now, I do have to throw a caveat out there. The 600+ peptide targets were achieved on a QE HF. Using this method on the slower QE or QE Plus would probably mean about half the target list...but still! 300 PRM targets on a QE? Heck of a step in the right direction!
Tuesday, March 10, 2015
I don't even care to keep my love of this instrument under wraps. I'm more of a discovery proteomics oriented guy, but this stuff is so easy on the Q Exactive that there isn't all that much to talk about.
More and more I see the Q Exactive replacing more traditional (QQQ) platforms for validation and targeted measurements. Don't get me wrong, a triple quadrupole instrument (QQQ) is a great thing to have, but you can really achieve a lot of what it does by subverting your Q Exactive from discovery for a little bit.
You have these options on the QE for targeted quantification:
1.) MS1 based quan - full scan
2.) MS1 based quan with targeted ddMS2
3.) AIF (all ion fragmentation)
4.) MS1 - AIF
5.) Targeted SIM (selected ion monitoring)
6.) Targeted SIM - ddMS2 (triggers MS2 if something in the SIM matches your target)
7.) Targeted SIM - Targeted MS2 (performs SIM and then targets everything in the SIM window for MS2)
8.) Targeted MS2 (called PRM, for Parallel Reaction Monitoring)
9.) DIA (data independent acquisition)
10.) msXDIA (multiplexed DIA)
11.) pSMART (MS1 followed by an intelligent number of DIA windows)
Eleven methods! And this is without really considering multiplexing!
I'm going to divide them up into 2 groups:
1) Maximum number of targets
2) Maximum sensitivity
Maximum number of targets, lowest sensitivity:
Methods 1-4 and 9-11
On a normal chromatography gradient (60-120 minutes is what I consider "normal") if you want to look at, say, more than 200 targets, you need to go with one of these.
In order of sensitivity, from most sensitive to least, this is the order I'd generally go with.
11 & 2 > 1 & 3 & 4 > 10 > 9
Where I'd consider MS1 based quan with targeted ddMS2 to be equal to pSMART in terms of sensitivity and regular old DIA is the least sensitive. Honestly, they all come pretty close and a well tweaked DIA may beat out a not-as-well optimized pSMART.
Lowest number of targets, highest sensitivity:
Methods 5-8. Super sensitive, but you can only look at a few targets at a time. For reliable quan, I use the cutoffs Mark Chance's lab uses at Case Western for their clinical proteomics samples: I want to see 12 measurements across the peak to trust that I have a good quantifiable peptide. The realistic tradeoff is this: more sensitivity = fewer targets. In order of sensitivity:
7 > 5 & 6 & 8. In my hands (old data...), 7 is more sensitive because you get 2 chances to measure your ion of interest: once with the SIM and once with the PRM. The cost is that you can only monitor half as many targets. Otherwise, the rest are about the same.
Consider this really simple example. You have 30-second-wide peaks. At 17.5k resolution, the QE can get about 13 scans per second. If we use the cutoff above of 12 measurements per peak (heck, round to 13 to make the math easy), this means that, at most, we can really look at 30 targets at a time. This also assumes that the limit here is the speed of the Orbi; for that to be true, the fill time needs to be 50 ms for the QE or 57 ms for the QE Plus or QE HF. A lot of times, you aren't targeting something that you can pick up well with a 50 ms fill time. And some people might argue that a 17.5k resolution MS1 scan isn't enough resolution to clearly say that is your target of interest. If we go to a more realistic example of 35,000 resolution and a fill time of 114 ms for the QE, we now have to cut the maximum number of targets we can quantify ...at once... in half, to 15. At 70k, we'll have to drop to about 8.
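The arithmetic in that example is just (peak width × scan rate) / points per peak. A minimal Python sketch, assuming (as the example does) that fill time is not the bottleneck, ignoring scan overhead, and assuming the scan rate halves each time the resolution doubles:

```python
import math


def max_concurrent_targets(peak_width_s, scans_per_s, points_per_peak):
    """Upper bound on targets that can share the scan cycle while one peak elutes."""
    total_scans = peak_width_s * scans_per_s  # scans available across the peak
    return total_scans / points_per_peak


# 30 s peaks, 13 points per peak; scan rates assumed to halve per resolution doubling.
for resolution, rate in [(17_500, 13.0), (35_000, 6.5), (70_000, 3.25)]:
    bound = max_concurrent_targets(30, rate, 13)
    print(resolution, bound, math.floor(bound))
```

This gives 30, 15, and 7.5 targets; since you can't monitor half a target, the 70k case works out to 7 or 8 depending on how strictly you hold the 13-point cutoff.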
Tricky, right? For a really good analysis of this including illustrations that explain this a whole lot better, I recommend you go to Planet Orbitrap and download the iORBI 2015 called Maximum Peptide IDs by the great Tara Schroeder. I can't direct link to it because you have to be registered and logged into Planet Orbitrap to follow the link.
Sunday, March 8, 2015
Alright...there is a ton of interesting stuff in this paper....and, in general, I don't know exactly what to think of it yet. But I'm gonna tell you about it anyway!
What is it? It's this paper in Nature Medicine from Tiannan Guo et al., and it is entitled: "Rapid mass spectrometric conversion of tissue biopsy samples into permanent quantitative digital proteome maps."
What did they do? They took biopsies and digested them using something called a Barocycler. Google Images tells me it looks like this.
And the first website to pop up (here) describes it as a cell lysis technique using barometric pressure. Considering it includes the word "cycler" in its name, I have to assume it is a more complex piece of equipment than the pressure bombs we used in Microbiology class for chemical-free cell lysis.
The authors of this paper describe their digestion method as one that minimizes transfers. They put the biopsy tissue into the Barocycler tube and it appears to stay there throughout the several steps until they have nice digested peptides at the end. I'll take that. Let's minimize some variables! Anyone know how much one of these costs?
Now, I don't know anything about pressure cycling digestion. Maybe it's massively superior. I'd love to see more data. For the more conservative among us, I would point out that there are other devices out there, like the stuff from Perfinity, that also minimize transfers and have a good track record for reproducibility in the literature. I'd be interested to see how these compare.
The digested samples were then run in a data independent acquisition style experiment (called SWATH) using 25 Da windows. The resulting MS/MS files were processed with OpenSWATH against established libraries. They also use a decoy database to build in an effective false discovery rate metric. They do all of their statistics using established, available R packages, including aLFQ (which you can find, open access, here...oh...which I guess these authors had a hand in developing as well; I don't know how I missed that before. Nice package!)
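For anyone new to SWATH, the 25 Da windows are just a fixed tiling of the precursor range, and the number of windows sets the cycle time. A minimal sketch, assuming a 400-1200 m/z precursor range and illustrative accumulation times (100 ms per MS/MS window, 250 ms survey scan); the paper's exact acquisition parameters may differ, and real methods often add about 1 Da of overlap between windows:

```python
import math


def swath_windows(start_mz=400.0, end_mz=1200.0, width=25.0):
    """Tile start_mz..end_mz with fixed-width precursor isolation windows."""
    n = math.ceil((end_mz - start_mz) / width)
    # Step by integer index so the window edges don't accumulate float error.
    return [(start_mz + i * width, min(start_mz + (i + 1) * width, end_mz))
            for i in range(n)]


def cycle_time_s(n_windows, ms2_accum_s, ms1_accum_s):
    """Approximate cycle time: one survey scan plus one MS/MS per window (no overhead)."""
    return n_windows * ms2_accum_s + ms1_accum_s


windows = swath_windows()
cycle = cycle_time_s(len(windows), 0.100, 0.250)  # 32 windows -> about 3.45 s per cycle
```

The cycle time is what decides how many points you get across each chromatographic peak, which is the same points-per-peak tradeoff that governs targeted methods.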
The output? Some really nice data. Quantitative analysis of ~2,000 proteins per biopsy.
Now, if you're thinking, "1 mg of protein and they only got 2,000 proteins?!?! How the heck did this get into Nature Medicine? Is it 1995?"
Ummm...did you see the awesome title...and the nice heat maps...?
Update 3/10/15: A reader whose opinion I respect wrote me and said this regarding the Barocycler:
"I used it for a couple of months looking at how well it does homogenizing small amounts of insect tissue. It works, and works well."
Might be worth checking out, especially as it sounds like evaluation of the instrument is a possibility!
Saturday, March 7, 2015
These days, Percolator is the gold standard for separating good peptide MS/MS matches from bad ones. Is it perfect? No, but it is darned good at its job.
The problem with Percolator, however, is that it assesses your peptide spectral matches using something like 18 different criteria (list here), and this takes time...sometimes a massive amount of time. But we are getting used to the power of Percolator, particularly as more limitations of much simpler (and way faster!) false discovery rate approaches such as target-decoy become apparent.
What's the solution? Maybe it's NOKOI. If you Google it, you'll have to tell the search engine 20 times that "No...I do NOT mean Nokia..." NOKOI comes to us from some great researchers in Belgium and is described in this new JPR paper from et al.,
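For reference, the "much simpler" target-decoy approach boils down to counting: rank all matches by score, and at each cutoff estimate FDR as decoys / targets above it. A minimal Python sketch of that simple estimate (this is plain target-decoy, not Percolator's or NOKOI's algorithm), assuming a concatenated search where each PSM is a (score, is_decoy) pair and higher scores are better:

```python
def target_decoy_threshold(psms, fdr=0.01):
    """Lowest score cutoff whose estimated FDR (decoys / targets) is <= fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    Returns (cutoff, n_targets_accepted), or (None, 0) if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = (None, 0)
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a cutoff once every PSM tied at this score is counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets and decoys / targets <= fdr:
            best = (score, targets)  # keep walking down to find the lowest passing cutoff
    return best


psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
cutoff, accepted = target_decoy_threshold(psms, fdr=0.25)  # (6, 4)
```

It's fast because it only sorts and counts; the price is that a single score has to carry all the discrimination, which is exactly what Percolator's multi-feature rescoring improves on.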
I recently saw a great talk on how this works. I'd rather not spoil the surprises in this great paper (or show how bad I actually am at maths...). Suffice to say, the data looks really good. Results almost as good as Percolator when used in conjunction with Mascot, but without nearly so severe a bottleneck in processing capabilities.
Thursday, March 5, 2015
My collaborators at SUNY Buffalo recently generated an incredible data set that I've been working on. It consists of 80 GB of label-free Fusion runs, about 2.5 million spectra in total. Even my Destroyer takes about 20 hours to do the peak picking, searching and quantification of an experiment this big.
It gives me plenty of time, however, to figure out what Proteome Discoverer is doing behind the scenes. I really like this shot at the top. This is what PD 2.0 is doing during the "Spectrum Selector" node. It is truly using all 8 of my processing cores for this step. Check out PD 1.x at this step sometime. It's typically maxed out on a single core. I'm seriously not trying to rub it in that I have this software and you don't. I'm hoping to get you all excited for when the lawyers finally let them release it! True multicore processing at every step!
Wednesday, March 4, 2015
Jargon and acronyms are the bane of scientific understanding. Ever popped in on a talk outside your field and felt like a moron? Maybe that's just me, 'cause I feel dumb most of the time, but it's something we're pretty bad about in this field. Sure, it speeds up communication between us, but when we try to convey these thoughts to the people with the cool samples, we have to find a way to drop the insider terms.
What is worse than internal jargon and standards that make communication outside our field harder? When we aren't using the same terms and standards within our field! Maybe I'm pessimistic, but I like the long view here. I want this field to grow up and be everything it promises to be. We've yet to realize that, and in order to do so, we need to get more money. And in order to get more money we need to be reproducible, have great controls and solve awesome biological problems.
Obviously I'm not the only one on this page. There are lots of groups trying to standardize everything. HUPO is definitely a frontrunner. The Proteomics Standards Initiative is a big push to unify how we talk about experiments, how we display data, and how we define what is significant.
A new paper detailing these standards is available here and is open access. Now, the inner punk rocker in the back of my brain looks at this and recoils in disgust. I love being the proteomics rock star just as much (okay, maybe more...I don't know...) as you guys do. It's a whole lot more fun to pop in, solve in 5 minutes on an Orbitrap a biological problem some schmuck's been working on for 10 years with gels and stuff, check out early and go get some beer and some cool tattoos or piercings and call it a day (yes...this is how your average day unfolds in my head...don't ruin it for me!) But the fact is we're all going to have to start following rules like QC and data sharing standards if we're going to make it to the next level.
Heck, if nothing else I think it's becoming apparent that we're starting to run out of the easy problems...and places for piercings...
This is photoshopped, I'm sure. Shoutout to @PastelBio for today's cool coffee read!
Tuesday, March 3, 2015
Hey! Are you a Proteome Discoverer 2.0 Beta Tester? Want to Beta Test my PD 2.0 tutorial videos? Send me an email: email@example.com and I'll send you access information. In return, please send me your thoughts and comments on the videos and I'll do my best to integrate them into the final release!
I've recently become aware of something called "Feedly?" that some of y'all use to know when I post new things to the blog. Maybe there is something else? I swear, I need to read something that isn't a journal article once in a while....
Anyway, I'm putting a note here that I made a dumb mistake when reporting some values on the great DIA paper I read yesterday. The blog post has been corrected and notes are added at the bottom. Shoutout to the reader who tipped me off to the error! And apologies to the authors for reporting an incorrect value. If I understand this correctly, making the actual edit wouldn't alert the Feedler, but this would? It's an experiment!
Sunday, March 1, 2015
More QC for mass spectrometry, please!
I just found out about this one on Friday. You can check it out here at www.QCmyLCMS.com
What is it? It's a useful web interface for your SProCoP data from Skyline. It can take historical data from your LCMS system over any length of time and generate fantastic information on valuable metrics of your LCMS runs.
It is vendor neutral (Skyline!) and can be used for high res or low res (QQQ!) data. Best of all, you don't have to do any coding or script writing. Handy user interface. It is also fast, as it is housed on a big server.
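The core idea behind this kind of QC is statistical process control: learn what "normal" looks like from historical runs, then flag new runs that fall outside the limits. A minimal Shewhart-style sketch in Python with mean ± 3 SD limits, in the spirit of what SProCoP does but not its actual code; the metric (retention time of one QC peptide, in minutes) and all values are made-up illustrations:

```python
from statistics import mean, stdev


def control_limits(history, k=3.0):
    """Lower and upper control limits: mean +/- k sample standard deviations."""
    m = mean(history)
    s = stdev(history)
    return m - k * s, m + k * s


def flag_runs(history, new_runs, k=3.0):
    """Indices of new runs whose metric falls outside the historical control limits."""
    low, high = control_limits(history, k)
    return [i for i, v in enumerate(new_runs) if not low <= v <= high]


# Hypothetical QC peptide retention times (minutes) from past runs.
history = [20.1, 20.0, 19.9, 20.2, 19.8]
out_of_control = flag_runs(history, [20.3, 21.0, 19.4])  # [1, 2]
```

The same two functions work for any per-run metric (peak area, mass error, peak width), which is why having months of historical runs makes the limits so much more meaningful.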