Wednesday, December 23, 2015
What is a PFAM? And how do they deal with all this data?
Personally, I think the biologists and biochemists need to hurry up and annotate the function of every protein from every organism under every biological condition. Until they stop slacking and get that stuff done, we need to use some shortcuts to extract biological data from our peptide spectral matches. Fortunately, smart people have been working on this gap for us.
Gene Ontology (GO) is tricky stuff. If we don't know exactly what a gene does, can we infer what the heck it does from its similarity to genes we understand better?
More tricky, and way more biologically relevant? Protein families! One way of getting this data is via Pfam (which you can access here). I'll be honest: I didn't really know what this was for a long time. I just knew that it was an option in the Annotation node in Proteome Discoverer. Cool, I have a new column that says that all this stuff that is upregulated shares a Pfam ID (actually, I made that part up. It's never that easy, is it?)
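Just to sketch what that dream scenario would look like in practice, here's a toy example of grouping upregulated proteins by a shared Pfam ID. All the accessions and Pfam IDs below are made up for illustration; a real workflow would pull the annotations from Pfam itself.

```python
from collections import defaultdict

# Hypothetical annotation output: protein accession -> Pfam family IDs.
# These mappings are invented for illustration, not real annotations.
pfam_annotations = {
    "PROT_001": ["PF00069"],
    "PROT_002": ["PF00069", "PF00017"],
    "PROT_003": ["PF07714"],
    "PROT_004": ["PF00069"],
}

# Proteins our (imaginary) quant experiment called upregulated.
upregulated = ["PROT_001", "PROT_002", "PROT_004"]

# Group the upregulated proteins by shared Pfam ID.
by_family = defaultdict(list)
for acc in upregulated:
    for pfam_id in pfam_annotations.get(acc, []):
        by_family[pfam_id].append(acc)

for pfam_id, members in sorted(by_family.items()):
    if len(members) > 1:
        print(f"{pfam_id}: shared by {members}")
```

In this made-up case, three of the upregulated proteins land in the same family (PF00069), which is exactly the kind of shortcut a Pfam column in your results table would hand you for free.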
Turns out that the people making Pfam are working really hard to make this data:
1) More accurate
2) More relevant
3) More current
As you can imagine, all of this is hard, but...
Can you imagine what the 3rd one is like these days?
The amount of sequencing information in databases is increasing EXPONENTIALLY, while the throughput of the current tools for creating Pfam information grows only linearly. It doesn't take a stolen GoogleImage to show that this is a problem, but...I'm nervously waiting for an important phone call...so...
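To put rough numbers on that gap, here's a back-of-the-envelope sketch. The figures are entirely illustrative (not real UniProt or Pfam statistics): a database that doubles every couple of years versus a fixed annotation capacity per year.

```python
# Illustrative numbers only -- not real database statistics.
sequences = 1_000_000          # sequences in the database at year 0
annotated_per_year = 200_000   # fixed (linear) annotation capacity
doubling_time_years = 2        # database doubles every 2 years

annotated = 0
for year in range(1, 11):
    sequences *= 2 ** (1 / doubling_time_years)  # exponential growth
    annotated += annotated_per_year              # linear progress

print(f"After 10 years: ~{sequences/1e6:.1f}M sequences, "
      f"{annotated/1e6:.1f}M annotated")
```

With these toy numbers, the database grows 32-fold in a decade while the annotated fraction shrinks from manageable to hopeless, which is exactly why fixing the algorithms (rather than just throwing more curator-hours at it) matters.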
So, what do we do about it? Well, Robert Finn et al. say in this new open-access paper that we fix the algorithms to deal with this glut of data. So they did.
When I clicked on this link in Twitter this morning, I honestly expected a dense paper that I would hardly be able to read and would likely not understand at all. I was pleasantly surprised to find that this team can seriously write. I not only learned a lot about how Pfam works, but I also (I think) got a good understanding of their challenges and how their new algorithms power through them. Solid and interesting paper that makes me want to add this column to all of my processed data from now on!