Sunday, July 20, 2014
Heavy analysis of the human proteome drafts
I'm certainly not the only person who has jumped on the new resources provided by the human proteome drafts and checked them out. In this brand new paper in JPR, a group out of Madrid takes a look at some of their favorite proteins in the human proteome drafts and comes back with an interesting analysis. (Abstract here.)
I love the fact that, in this paper, they did the same experiment Alexis and I did the day the drafts came out. We chose proteins that we knew would lead to cancer if they were over- or under-expressed and analyzed those. This group took proteins from nasal tissue (olfactory receptor proteins) and looked for those in the various tissues.
At first glance, the image on the abstract looks pretty damning:
These are olfactory (smelling) receptors. What are they doing being expressed in colon cells and platelets?!?! (It is worth noting that the image above is from HumanProteomeMap.org, the data from the Pandey lab.)
The authors of this analysis indicate, even in the abstract, that the "experimental data from these studies should be used with caution." And I agree. There is inherent error in studies this big; hell, a 1% false discovery rate on 100 million observations is 1 million observations that are false, right? But...the experimental data from every study should be used with caution. And we all know that (by "we" I mean you proteomics experts who read this). I am glad that this caution is stated, though, for the people outside our field who have discovered this resource through mainstream news outlets.
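If you want to play with that back-of-the-envelope number, here is a minimal Python sketch. It assumes the simplest possible reading of FDR: the expected number of false hits is just the rate times the number of accepted observations. The function name and the example counts are mine, purely for illustration; none of this comes from the paper or from either draft map.

```python
def expected_false_discoveries(n_observations: int, fdr: float) -> float:
    """Expected number of false hits among n accepted observations.

    Assumption: the FDR applies uniformly, so expected false = n * fdr.
    """
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must be between 0 and 1")
    return n_observations * fdr


if __name__ == "__main__":
    # Illustrative dataset sizes only; the last one is the 100 million from the text.
    for n in (10_000, 1_000_000, 100_000_000):
        false_hits = expected_false_discoveries(n, 0.01)
        print(f"{n:>11,} observations -> ~{false_hits:,.0f} expected false")
```

The point is just that a fixed error rate means the absolute number of wrong calls grows right along with the size of the dataset.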
That being said, I have some problems with this experimental design. There are 3 big assumptions being made here:
1) The annotation of these proteins is 100% correct
2) These proteins have 1 function
3) These proteins only function in one tissue
Number 1 is easy. Annotations suck. The system for annotation sucks. The first person to identify a protein in the first tissue gets to name it, right? So there are tons and tons of proteins named in tissues that are heavily studied.
Number 2 is relatively easy, as well. Making new proteins takes a ton of energy. Evolutionarily (that's not a word? whatever...) there will be a lot of pressure for proteins to function in more than one way, in more than one context. (Side note: one of my graduate committee members, Jiann-Shin Chen, proved the first dual substrate enzyme...in bacteria...in the 1970s...sorry, couldn't find the link, I'll add it later if you're interested.) Considering the sophistication of eukaryotic proteins, it is naive to think that a protein annotated as "Butt_itching_protein_1" would ONLY be utilized in the itchy butt response pathway.
Number 3 is an impressive coincidence. Like millions of Americans, I subscribe to "I Fucking Love Science" and get Elise's feed of cool articles. From this feed I know that zebrafish embryos highly express functional olfactory response proteins and that olfactory receptors are highly active in human skin. Heck, I've looked through more than a few high-quality proteomics assays and seen "olfactory response proteins" in bunches of different tissues. So...I think this was a poor choice for analysis.
TL;DR: One, please interpret the results of the human proteome draft maps with caution. They are draft maps. Two, consider proteins in an evolutionary context before using those proteins to generate excessive criticism of datasets that a ton of work went into.
Thanks, Karl, for suggesting something to read over coffee this morning!