Wednesday, September 14, 2016
GEMPro -- Genome Scale Models with Protein Structures!
This paper from Elizabeth Brunk, Nathan Mih et al. is not the first paper to jump on if you're already feeling dumb.
It is elegant and brilliant and imposing. The concept is an extension of a modeling approach called genome-scale models (GEMs). There is a nice open access review on the topic that is aimed at people who aren't planning to code their own tools.
There appear to be multiple iterations of GEM, but the one that seems the most straightforward to me is the integration of genomic changes with metabolic ones. Obviously there are several levels of regulation between the gene and the metabolite (from transcriptional regulation through post-translational modification), and all of them affect metabolite production, but GEM steps around that. The concept takes our existing knowledge of these relationships and feeds it into a framework: we know it isn't a direct link from RNA X to metabolite Y, but all the same, when we see an upregulation in X we see a downregulation in Y.
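For anyone who wants the math: the usual way a GEM gets queried is flux balance analysis (FBA). The model writes down the stoichiometry of every reaction, assumes the cell's internal metabolite levels are at steady state, and asks which reaction rates (fluxes) are possible. Below is the standard textbook formulation. It isn't necessarily the exact setup used in this paper.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v_{\mathrm{lb}} \le v \le v_{\mathrm{ub}}
\end{aligned}
```

Here S is the m-by-n stoichiometric matrix (m metabolites, n reactions), v is the vector of reaction fluxes, c picks out the objective (often a biomass reaction), and the bounds encode things like reaction reversibility and nutrient uptake limits.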
I probably slaughtered the concept, but that's what I'm getting out of it.
GEMPro builds on this. 'Cause what would make this more complicated? What if you also threw protein 3D structures into the mix!?!? The whole idea definitely makes my head hurt, but...
- The GEM framework is in place and yielding dividends.
- We have structural information on 110k proteins (seriously...!?!... that's what the paper says!), and more all the time.
- For those protein 3D structures we have useful information, like what the structure of a protein looks like at a given temperature... or in a given disease state.
- More metabolomics data is showing up all the time that could correspond to changes in either.
With this many variables, this is obviously a big data problem... and a big focus of the paper is that any framework built to do this MUST be able to grow with the existing knowledge bases, 'cause our knowledge of everything biological is growing much faster than linearly.
How do you test something like this? They go with 2 bacteria: E. coli and T. maritima (which I don't think I've ever heard of... Wikipedia says it's a cool extremophile from Italian volcanoes. Estremofilo!)
Cool point in the paper: if they rerun this analysis using only the data that was available in earlier years, you get a really cool picture of how our knowledge is expanding.
The myoglobin crystal structure was published in 1958. Between then and 2013, all the groups doing protein 3D structure work got us to the point where about 34% of E. coli proteins have high-quality structures that can be used for this type of analysis. Step forward to when they finalized this paper? They're at 44%. Wow! (Google doesn't know the word "Wow" in Italian. Well... it knows one... but apparently it's not appropriate in all dialects, and I'm watching my language today.)
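To make the "step back in time" idea concrete: for each cutoff year you only count structures that already existed by then, and you divide by the size of the proteome. Here's a minimal Python sketch of that bookkeeping. The `Protein` record, the gene names and the deposition years are all made up for illustration (this is not the paper's data or code). Real numbers would come from deposition dates in a structure database like the PDB.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protein:
    gene_id: str  # hypothetical gene identifier
    # years in which experimental structures were deposited (hypothetical)
    structure_years: List[int] = field(default_factory=list)


def coverage(proteins: List[Protein], cutoff_year: int) -> float:
    """Fraction of proteins with at least one structure deposited on or before cutoff_year."""
    if not proteins:
        return 0.0
    covered = sum(
        1 for p in proteins if any(year <= cutoff_year for year in p.structure_years)
    )
    return covered / len(proteins)


# Toy five-protein "proteome" with made-up deposition years (not real PDB data)
toy_proteome = [
    Protein("geneA", [1995, 2010]),
    Protein("geneB", [2014]),
    Protein("geneC"),  # no structure yet
    Protein("geneD", [2008]),
    Protein("geneE", [2015]),
]

for year in (2000, 2013, 2016):
    print(year, f"{coverage(toy_proteome, year):.0%}")
# prints:
# 2000 20%
# 2013 40%
# 2016 80%
```

Same proteome, different cutoff year, different coverage. That's the whole trick behind the 34% vs. 44% comparison.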
And they dump all of this data from these 2 organisms in and look around. This is my favorite analysis:
E. coli isn't very tolerant of heat compared to our estremofilo friend. They go into the literature and find the E. coli proteins that are known to be adversely affected by growing in culture that is too hot. Here they can draw on their GEM models: which genes are known to be similar, as well as which gene products are linked to the same metabolic functions (even if the genes don't look alike, entries that are tightly linked to making the same metabolite can be treated as the same thing).
This gives them a little over 200 entries with the same (or very similar) metabolic functions in the 2 organisms... and only 10% of them have similar 3D structures.
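The pairing logic is simple enough to sketch, under loud assumptions. In the toy Python version below, the gene names, the reaction IDs and the similarity scores are all made up. The 0.5 cutoff borrows a common rule of thumb for TM-scores ("probably the same fold"). This is not the paper's actual pipeline. Step 1 matches proteins across the two organisms by shared metabolic function, and step 2 keeps only the pairs whose 3D structures look alike.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Hypothetical annotations: gene -> metabolic function (made-up reaction IDs)
ecoli_function: Dict[str, str] = {
    "ecoli_g1": "RXN_A",
    "ecoli_g2": "RXN_B",
    "ecoli_g3": "RXN_C",
    "ecoli_g4": "RXN_D",  # no counterpart in the other organism
}
tmar_function: Dict[str, str] = {
    "tmar_g1": "RXN_A",
    "tmar_g2": "RXN_B",
    "tmar_g3": "RXN_C",
}

# Hypothetical precomputed structural similarity scores on a 0..1 scale (TM-score-like)
structure_score: Dict[Pair, float] = {
    ("ecoli_g1", "tmar_g1"): 0.82,
    ("ecoli_g2", "tmar_g2"): 0.31,
    ("ecoli_g3", "tmar_g3"): 0.44,
}


def functional_pairs(a: Dict[str, str], b: Dict[str, str]) -> List[Pair]:
    """Step 1: pair genes from organism a with genes from organism b that share a function."""
    genes_by_function: Dict[str, List[str]] = {}
    for gene, func in b.items():
        genes_by_function.setdefault(func, []).append(gene)
    return [(ga, gb) for ga, func in a.items() for gb in genes_by_function.get(func, [])]


def structurally_similar(
    pairs: List[Pair], scores: Dict[Pair, float], threshold: float = 0.5
) -> List[Pair]:
    """Step 2: keep pairs whose score meets the threshold; unscored pairs count as dissimilar."""
    return [p for p in pairs if scores.get(p, 0.0) >= threshold]


pairs = functional_pairs(ecoli_function, tmar_function)
similar = structurally_similar(pairs, structure_score)
print(f"{len(pairs)} function-matched pairs, {len(similar)} structurally similar "
      f"({len(similar) / len(pairs):.0%})")
# prints: 3 function-matched pairs, 1 structurally similar (33%)
```

With the paper's real data, step 1 yields a little over 200 pairs, and only about 10% of them survive step 2.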
So... the evolutionary pressure is there to conserve the basic DNA sequence for making, for example, a particular amino acid. Or, if the two organisms have evolved very different ways of making that amino acid, we can still link some of these proteins together by the fact that they make it. But at the 3D protein level they are very, very different.
So... E. coli has 200 proteins that presumably just up and fall apart in the heat. No amino acid in my example = dead, but our Italian friend just keeps chugging along, enjoying its relaxing volcanic sauna.
I totally dig this paper. I'm not sure what I'm going to do with this information, but I really like it!