Recently, I worked with a couple of labs that use single-protein digests and percent sequence coverage as a QC metric. Lots of people do this. It isn't my favorite QC, but as long as people are benchmarking their instruments against some sort of constant standard, I'm sure not going to stand in the way. A question occurred to me when I saw some very high coverage numbers: how much of a protein can we actually see with a single-enzyme digest and mass spectrometry?
Take this coverage map, for example. It is the Mascot coverage output for one of these QC proteins. Mascot reports 79% coverage (the residues that were found are shown in red).
Because of the amount of intact and top-down analysis I've been doing, I've become very concerned about signal peptides and propeptides. This protein is BSA, but the first 24 amino acids are not part of the mature BSA sequence. They serve a role during translation and secretion and are cleaved off before the mature protein is formed, so I don't think they should count toward coverage.
Let's look at what is left. If we assume 100% cleavage, we have:
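If you want to build a fully cleaved peptide list like this for your own QC protein, the in silico digest takes only a few lines. Here is a minimal Python sketch, assuming the simple textbook trypsin rule (cleave after K or R unless the next residue is P) and zero missed cleavages. `tryptic_digest` is just a name I made up for illustration, not a library function, and real digests are messier than this 100%-cleavage idealization.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Return fully cleaved tryptic peptides (no missed cleavages).

    Assumption: cut C-terminal to K or R, except when the next residue is P.
    """
    # Zero-width split point: right after K/R, as long as P does not follow.
    # (Splitting on empty matches requires Python 3.7+.)
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # A sequence ending in K/R leaves a trailing empty string; drop it.
    return [p for p in pieces if p]


if __name__ == "__main__":
    for peptide in tryptic_digest("DTHKSEIAHRFKDLGEEHFK"):
        print(peptide, len(peptide))
```

Running it on that short example string gives DTHK, SEIAHR, FK and DLGEEHFK, which is the kind of list you then sort into "long enough to see" and "too short."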
What are our instrument settings requiring? I, for one, almost never look at ions with a mass-to-charge ratio below 400. I also ignore singly charged ions, because in most cases they don't sequence well. Setting aside the fact that not every amino acid can or will accept a proton, if my only requirement is that a peptide have a mass above 800 Da, then of the peptides we didn't see, only DLGEEHFK makes the cut. It also has two basic residues (H and K), so it should charge to at least +2. If it charges to +3 or higher, that would explain why we didn't see it: as a +3 ion it falls below our 400 m/z cutoff.
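The 800 Da rule of thumb is just the 400 m/z cutoff at +2 rearranged: M > 2 x 400 - 2 x 1.007, or roughly 798 Da. To check DLGEEHFK directly, here is a small Python sketch using standard monoisotopic residue masses. The helper names (`peptide_mass`, `mz`, `passes_cutoffs`) are hypothetical, cysteine is left unmodified (add 57.02146 Da per Cys if you carbamidomethylate), and the charge and m/z cutoffs are the ones I described above, not universal settings.

```python
# Monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide to account for the free termini
PROTON = 1.007276  # mass of a proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + z * PROTON) / z


def passes_cutoffs(seq: str, z: int, min_z: int = 2, min_mz: float = 400.0) -> bool:
    """True if this charge state satisfies both the charge and m/z filters."""
    return z >= min_z and mz(peptide_mass(seq), z) > min_mz


if __name__ == "__main__":
    mass = peptide_mass("DLGEEHFK")
    for z in (2, 3):
        print(z, round(mz(mass, z), 2), passes_cutoffs("DLGEEHFK", z))
```

By this sketch DLGEEHFK is about 973.45 Da, so the +2 ion sits near 487.73 m/z (kept) and the +3 ion near 325.49 m/z (filtered out).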
So what does our coverage look like if we only consider what is actually possible? If we start with the 608-residue FASTA sequence of BSA and subtract the 24-residue region that isn't in the mature protein, we get 584 amino acids in the mature protein. The peptides I just deemed too short for my mass spec analysis contain 109 amino acids, and 584 - 109 = 475. Let's assume that DLGEEHFK will charge +2, so it counts as a peptide we could have seen but didn't. That gives (475 - 8)/475 = 98% achievable coverage of BSA in this example.
Real achievable coverage (RAC? Is that term already in use?) is 475/608 = 78% of the FASTA sequence. I wonder whether that ratio is anywhere near consistent across natural proteins.
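If you want to repeat this arithmetic for other QC proteins, it is easy to script. Below is a minimal Python sketch; `coverage_summary` is a hypothetical name, the inputs are the residue counts from this BSA example, and "achievable" follows my definition here (mature length minus residues in peptides too short to see), not any standard metric.

```python
def coverage_summary(fasta_len: int, removed_len: int,
                     too_short_len: int, missed_len: int) -> tuple[int, float, float]:
    """Return (achievable residues, fraction of achievable observed, RAC)."""
    mature_len = fasta_len - removed_len     # drop signal/propeptide residues
    achievable = mature_len - too_short_len  # drop residues in peptides too short to see
    observed_fraction = (achievable - missed_len) / achievable
    rac = achievable / fasta_len             # "real achievable coverage" vs. the FASTA entry
    return achievable, observed_fraction, rac


if __name__ == "__main__":
    achievable, observed, rac = coverage_summary(608, 24, 109, 8)
    print(achievable)          # 475
    print(f"{observed:.0%}")   # 98%
    print(f"{rac:.0%}")        # 78%
```

Keeping the four inputs separate makes it easy to see which assumption (processing, length cutoff, or charge state) is costing you the most coverage.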