There is a seriously fascinating article in this month's The Scientist (the cover story, even) that shows how some systematic oversights might be causing us to miss some cool stuff.
You know those nice databases we use that come from the genome sequences? It turns out that, in an effort to cut down on the tiresome annotation process, pretty much anything between two stop codons that was under 300 base pairs in length was skipped. That's up to roughly a 100 amino acid protein (300 bp / 3 bp per codon)! Take the average mass of an amino acid as 110 Da and that is an 11 kDa protein that is systematically skipped: it doesn't get considered for annotation in the genome, doesn't get translated into a predicted protein sequence, and doesn't end up in our FASTA database file.
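To make that back-of-the-envelope math concrete, here is a minimal Python sketch. It is not how any real annotation pipeline works; it just assumes the rule as described above (skip any stop-free stretch under 300 bp) and the rough 110 Da average residue mass. The function names are made up for illustration, and it only looks at one forward reading frame with no start codon requirement.

```python
# Rough numbers taken from the post; both are approximations.
AVG_RESIDUE_MASS_DA = 110   # average mass of one amino acid residue
MIN_ORF_NT = 300            # stretches shorter than this get skipped

STOP_CODONS = {"TAA", "TAG", "TGA"}


def approx_mass_kda(n_residues):
    """Estimate protein mass in kDa from residue count (110 Da per residue)."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000


def stop_free_stretches(seq, frame=0):
    """Yield stop-free runs of codons in a single forward reading frame.

    Hypothetical helper: runs at the very start or end of the sequence are
    yielded too, even though they are not flanked by a stop on both sides.
    """
    seq = seq.upper()
    current = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            if current:
                yield "".join(current)
            current = []
        else:
            current.append(codon)
    if current:
        yield "".join(current)


def split_by_cutoff(stretches, min_nt=MIN_ORF_NT):
    """Separate stretches into those kept and those skipped by the cutoff."""
    stretches = list(stretches)
    kept = [s for s in stretches if len(s) >= min_nt]
    skipped = [s for s in stretches if len(s) < min_nt]
    return kept, skipped


if __name__ == "__main__":
    # Toy sequence: one 100-codon stretch and one 31-codon stretch.
    toy = "ATG" + "GCT" * 99 + "TAA" + "ATG" + "GCT" * 30 + "TGA"
    kept, skipped = split_by_cutoff(stop_free_stretches(toy))
    for s in skipped:
        n = len(s) // 3
        print(f"skipped: {len(s)} nt, {n} residues, ~{approx_mass_kda(n):.2f} kDa")
```

In the toy sequence the 300 nt stretch just clears the cutoff, while the 93 nt one (about 3.4 kDa by this estimate) would never make it into the database.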
They talk about the implications for proteomics a little when they mention us letting the little stuff run right off the bottom of the gel! For a more modern take on this problem, perhaps: YM10 is the typical FASP filter cutoff most of y'all are using, right? That 10 means that it retains anything over 10 kDa (yeah, I know from my own hands-on that it's plus or minus a good bit), but... whoa!
Turns out that we're losing the evidence on the protein end, it isn't in our databases, and there is a ton of cool stuff down in that range! They talk about labs that are just trying to fill in those blanks, and they are finding all sorts of cool small regulatory proteins in all sorts of organisms!
I'm gonna pull some of the C-HPP papers on "missing proteins" to see if they're considering this (they probably are, I'm probably the only one surprised by this!). Also, wouldn't our friends in the top-down arena, who are mostly looking at 30 kDa and down, be finding a lot of stuff that isn't in our annotated genomes?
Worth a thought, anyway, right?