Wednesday, March 18, 2015
Ben's sleepy airport monologue on peptide mapping
I had dinner with Dr. Gary Paul and he brought up that there aren't a lot of resources for the peptide mapping crowd and I thought I'd spend some time doing something about it. I figured I could start by rambling about it while I wait for my plane!
I'm going to stick to the term mapping. My blog. My nomenclature. (And this is the most common term right now anyway, I think.) I'll add it to the translator to make it official. What I mean when I say peptide mapping is this: I have a protein species and I want to see every single amino acid and post-translational modification at every possible level of abundance, and I want to do this by LC-MS/MS.
Step 1: Get the protein sequence from somewhere.
People doing this are often studying antibodies or some other pain-in-that-you-know-what protein (I gotta watch the profanities right now....) so they know what they are looking for...to a certain extent. So we've gotta start with what genomics has given us as the protein sequence. This is where some people get caught. When you pull a protein sequence from NCBI, remember this....lots of them are wrong! Us shotgun proteomics people get to ignore that fact because the sequences are mostly right. Mostly. Where do the human UniProt proteins come from? Like 2 people! Seriously. So we sequenced a couple people's DNA, assumed every start and stop codon and intron/exon is correct, then we 6-frame translate that DNA into protein and everything is okay? And it is...mostly....
But we start where we have to.
Get the best annotated sequence for your protein that you can get. Be cautious that it might be wrong. Keep this in mind: What you get from the mass spec trumps what you get from NCBI or UniProt. Every time.
Step 2: Theoretically digest your protein sequence:
Man, I can't stress this one enough. It takes like 10 seconds to do this and it can save you so much trouble. If you have PepFinder, it'll do it for you. Pinpoint does it. Skyline does it. If you don't have one of these installed (you should at least have Skyline! Come on!) you can do it online, and there are many good tools to do it. By default I'll probably do it with this old guy:
This is the UCSF Protein Prospector. Yes, he's been around forever, but it isn't a dead project. New features were just added a few months ago.
Why would I do this second? Once upon a time, when I was young and stupid rather than just stupid, I studied a phosphorylation cascade caused by a promising chemotherapy drug. I developed a really extreme method for forcing an Orbitrap XL to get an insane number of phospho IDs (triple enrichment + 3D fractionation). Unfortunately, while I got enough for a cool method paper, I didn't get anything from my pathway of interest. Because I used trypsin. Because the phospho-sites of interest are surrounded by lysines. Honestly, this entire pathway had a motif that was KxYk. The peptides were too small to be sequenced by LC-MS/MS. They were singly charged. Had I spent, I don't know, maybe a half hour on Protein Prospector, maybe I would have realized this and could have gone with an alternative enzyme...or chemical cleavage...or even forced semi-tryptic cleavage, and would have gotten more than a method paper out of 4 months of work. (Sorry for the rant)
Don't follow in Dumb Ben's footsteps. Look at your protein digest in silico (theoretically). An ideal protein for tryptic digestion should have lysines or arginines spaced reasonably evenly throughout the protein. They should make up around 1/13 of the total amino acid sequence. If they only make up 1/30 of the total amino acids present, the peptides being produced may be too big for CID or HCD fragmentation (but they may be perfect for ETD, if you have it!). If K and R make up 1/4, all of your peptides may be singly charged and invisible to the mass spec. Consider switching enzymes or doing something extreme like 30 minute tryptic digestions at room temp with a decreased amount of trypsin. There are ways around this.
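If you'd rather see the idea in a few lines of code than in a web tool, here is a minimal Python sketch of an in silico tryptic digest plus the K/R check described above. It is not what Protein Prospector or Skyline actually do under the hood. It assumes the simple textbook rule (cut after K or R unless the next residue is P) with no missed cleavages, and the `min_len` / `max_len` cutoffs and the toy sequence are my own placeholders, not numbers from any real protein.

```python
def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, but not when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Whatever is left after the last cleavage site is the C-terminal peptide.
        peptides.append(sequence[start:])
    return peptides


def kr_fraction(sequence):
    """Fraction of residues that are lysine or arginine."""
    return (sequence.count("K") + sequence.count("R")) / len(sequence)


def digest_report(sequence, min_len=6, max_len=30):
    """Flag peptides that look too short or too long (cutoffs are assumptions)."""
    peptides = tryptic_digest(sequence)
    return {
        "kr_fraction": kr_fraction(sequence),
        "too_short": [p for p in peptides if len(p) < min_len],
        "too_long": [p for p in peptides if len(p) > max_len],
        "ok": [p for p in peptides if min_len <= len(p) <= max_len],
    }


# Made-up toy sequence, only for illustration.
toy_protein = "MKAYKGSYKLLDEGTWAVSPR"
report = digest_report(toy_protein)
print(f"K/R fraction: {report['kr_fraction']:.2f}")
print("too short:", report["too_short"])
print("ok:", report["ok"])
```

On a toy sequence like this one, three of the four peptides come out too short to be useful, which is exactly the kind of thing you want to know before you spend months on the instrument.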
Optional step X: Get an intact protein mass. If you can do it, getting the intact protein mass can be awesome here. This is where you'll find out that your UniProt sequence was wrong. Or that this protein, when produced in E. coli, doesn't cleave the initial methionine. Or...that you are looking at a mixed population. Just because that protein comes off an FPLC size exclusion column as one peak doesn't mean every single protein molecule is the same. Heck (Albert Heck! lol!), a single peak of ovalbumin has >60 protein forms. Keep that in mind. If you have the capability and enough information, man, this is going to be so great for you!
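To make the initiator methionine example concrete, here is a rough Python sketch that computes a theoretical average mass for a sequence with and without the first Met, so you have two numbers to compare against your deconvoluted intact mass. It assumes an unmodified, fully reduced chain (no disulfides, no PTMs, no signal peptide removal) and uses the commonly tabulated average residue masses. The toy sequence is made up.

```python
# Commonly tabulated average residue masses in Da (amino acid minus water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVERAGE_WATER = 18.01528


def average_mass(sequence):
    """Average mass of an unmodified linear chain (assumption: no PTMs)."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + AVERAGE_WATER


def met_on_and_off(sequence):
    """Return (mass with initiator Met, mass without it, or None if no Met)."""
    with_met = average_mass(sequence)
    without_met = average_mass(sequence[1:]) if sequence.startswith("M") else None
    return with_met, without_met


# Made-up toy sequence, only for illustration.
toy_protein = "MKAYKGSYKLLDEGTWAVSPR"
with_met, without_met = met_on_and_off(toy_protein)
print(f"with Met:    {with_met:.2f} Da")
print(f"without Met: {without_met:.2f} Da")
print(f"difference:  {with_met - without_met:.2f} Da")
```

If your measured mass sits about 131 Da away from the database prediction, Met on versus Met off is one of the first things worth checking.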
Step 3: Get great chromatography!
I know. This is one protein. I should be able to get this whole thing in 10 minutes, easy, right?!? Maybe. But it depends on what you want. You're peptide mapping, so I assume you want everything. I assume you want 100% sequence coverage at the end. The best chromatography is going to give you the best chance of success. A 4 hour gradient for a single protein is probably excessive....but I don't think a 2 hour run is crazy at all...
Step 4: Sequence everything you can.
In an ideal world, the mass spec will pick out every peptide, each peptide will be of high enough intensity and perfectly compatible with your MS/MS fragmentation method of choice, and you'll walk away with 100% coverage on the first run.
Realistically? Some of those peptides will ionize poorly and will be of low intensity. Some of them will fragment too poorly to sequence and you'll get 10 MS/MS events of high quality for peptides in every region you don't care about. That's okay.
Step 5: Rerun that dumb sample.
There are two approaches here. Both are equally valid.
A.) Exclude what you already have. You can take all of the precursors that gave you sequenceable MS/MS spectra and put those on an exclusion list. Put a nice tight mass tolerance on it (<10ppm) and maybe a retention time restriction on it if you can. Then re-run the sample with this new method.
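Here is a small Python sketch of what an exclusion list with a tight ppm tolerance and a retention time window boils down to. The precursor values, the dictionary keys, and the 1 minute retention time half-width are all assumptions I made up for illustration. Your instrument software will want its own format, so treat this as the arithmetic, not as an export tool.

```python
def ppm_window(mz, tol_ppm=10.0):
    """Return the (low, high) m/z bounds for a given ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def build_exclusion_list(identified, tol_ppm=10.0, rt_half_width_min=1.0):
    """identified: iterable of (mz, charge, retention_time_min) tuples."""
    rows = []
    for mz, charge, rt in identified:
        low, high = ppm_window(mz, tol_ppm)
        rows.append({
            "mz": mz,
            "z": charge,
            "mz_low": low,
            "mz_high": high,
            # Retention time window is clamped so it never starts below zero.
            "rt_start": max(0.0, rt - rt_half_width_min),
            "rt_end": rt + rt_half_width_min,
        })
    return rows


# Hypothetical precursors that already gave good MS/MS in the first run.
already_sequenced = [(652.3412, 2, 34.2), (435.2301, 3, 0.4)]
for row in build_exclusion_list(already_sequenced):
    print(f"{row['mz_low']:.4f}-{row['mz_high']:.4f} m/z, "
          f"{row['rt_start']:.1f}-{row['rt_end']:.1f} min, z={row['z']}")
```

At 10 ppm, a precursor at m/z 1000 gets a window only 0.02 wide in total, which is why a tight tolerance keeps you from accidentally excluding a neighbor you still need.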
B.) Target it. This is where PepFinder is real powerful. PepFinder gives you a list of everything that matches your protein of interest...whether it was triggered for MS/MS or not. If it wasn't, you can export the list and then build a targeted list. Increase your fill time and go after those regions you don't know about. Get MS/MS for everything that you can in there. If you have to, raise your fill times and do it again!
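If you don't have software that exports a target list for you, the m/z values themselves are easy to calculate. This is a minimal Python sketch, not PepFinder's method: it assumes unmodified peptides (so no carbamidomethyl on cysteine, no oxidation, nothing) and standard monoisotopic residue masses, and the charge states it tries are just a guess at what is reasonable for tryptic peptides.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MONO_WATER = 18.01056
PROTON = 1.007276


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide (assumption: no PTMs)."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + MONO_WATER


def precursor_mz(peptide, charge):
    """m/z of the [M + zH]z+ ion."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


def target_list(peptides, charges=(2, 3)):
    """Return (peptide, charge, m/z) rows for every requested charge state."""
    return [(pep, z, precursor_mz(pep, z)) for pep in peptides for z in charges]


# Hypothetical peptides from regions that were never triggered for MS/MS.
for pep, z, mz in target_list(["LLDEGTWAVSPR", "PEPTIDE"]):
    print(f"{pep}\t{z}+\t{mz:.4f}")
```

Anything modified will need the modification mass added to the neutral mass before dividing by charge.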
Step 6: Export the MS/MS spectra that didn't match anything.
This is often overlooked. And it can be if you have software (like PepFinder or Byonic) that can search for unknown PTMs or amino acid substitutions. If you are just using Sequest, for example, export the MS/MS spectra that don't match and try sequencing those de novo. You have options. PEAKS is commercial and powerful. DeNovoGUI is simple and free. PepNovo+ (command line) can actually do BLAST sequence alignments after sequencing your peptides. Now. Please keep in mind, these unknown peptides may be keratins or other junk from around your lab. Or, the reason your protein doesn't do what it's supposed to!
Step 7: Find out what is missing!
Now, if everything went well you should have at least most of this protein figured out. Look at that theoretical sequence. What is missing? Does it make sense that it is missing? If there are two amino acids flanked by lysines and those are missing that makes sense. If there are big regions that are 12 amino acids or so long (between Ks and Rs) and you didn't sequence those, then something weird is going on. Maybe you have a point mutation. Maybe you have a big glyco mucking up that part of the peptide. There is a logical reason for why that is missing and getting to the bottom of that is going to be some work, but it might just be the icing on the top of this awesome study you just put together!
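The "what is missing?" question is easy to answer with a few lines of Python if your software doesn't draw the coverage map for you. This sketch assumes you have the theoretical sequence and a plain list of identified peptide sequences with modifications stripped off. It does exact string matching only, so a peptide carrying a point mutation will not be placed on the map. The toy inputs are made up.

```python
def coverage_gaps(protein, peptides):
    """Return (fraction covered, list of (start, end, sequence) gaps, 1-based)."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            # Mark every occurrence, in case the peptide repeats in the protein.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)

    gaps, gap_start = [], None
    # The trailing True is a sentinel that closes a gap at the C-terminus.
    for i, is_covered in enumerate(covered + [True]):
        if not is_covered and gap_start is None:
            gap_start = i
        elif is_covered and gap_start is not None:
            gaps.append((gap_start + 1, i, protein[gap_start:i]))
            gap_start = None

    return sum(covered) / len(protein), gaps


# Made-up toy sequence and identifications, only for illustration.
toy_protein = "MKAYKGSYKLLDEGTWAVSPR"
identified = ["AYK", "GSYK"]
coverage, gaps = coverage_gaps(toy_protein, identified)
print(f"coverage: {coverage:.0%}")
for start, end, seq in gaps:
    print(f"missing {start}-{end}: {seq} ({len(seq)} residues)")
```

A 2 residue gap flanked by lysines is nothing to lose sleep over. A 12 residue gap that should have made a perfectly good tryptic peptide is the one to chase.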
Now, I should get on a plane. I'd like to follow up here later with more visual stuff so I may build on this. I don't have a lot of good single protein stuff on this laptop. But I do have stuff around. More on it later!