Sunday, June 19, 2016

The consequences of genetic variation at the proteome!


I'm having a ridiculously great weekend. Perfect climbing weather and a cookout with my 3 favorite (and intimidatingly brilliant) geneticists and I've still got NBA Finals game 7 tonight?!?! What a weekend!

So, if I read a couple papers that make me feel a little intellectually inadequate, I can handle it.

I'm going to start with this one from Joel Chick and Steven Munger et al., in this issue of Nature.

What is it? It's looking at genetic variation at the proteome level. At first glance that doesn't sound like such a big deal, but it totally is. As fun as it is for us to have distinct proteins from one (or a couple) organisms in a nice FASTA database for a species, that database is just a summary of what we think we know about the protein coding regions of one organism.

I'm going to summarize this in a way that works well for my brain.


If we have a nice curated Dog FASTA, that file is likely going to represent the whole species -- Heck, it might even come from a sample from just one dog! Therefore, that poor deformed experiment of irresponsible breeding in the photo above would be represented by the exact same FASTA database as the majestic Pug beside it. And, heck, at the straight up DNA base pair level, they probably wouldn't be all that different.

At the proteome level, it's gonna be incredibly different! But how do you evaluate it?

Back to the paper!  There is this group of mice called the Diversity Outbred (DO) model. They have been developed to have a bunch of genetic variation so that we can better understand how individual organisms turn out different from their forebears.

If you're thinking "what? we know how that happens! my dad is homozygous for the bushy eyebrows gene and my mom is bald and that is recessive, so I get both of them. Simple!" It turns out, if you look closely, that the exchange of traits isn't anywhere near as simple as Mendelian models suggest. Genes don't get cleanly copied and transferred, and we don't actually know how the whole system works. (Summary of a ton of Wikipedia articles I read while trying to understand what Joel and Steve are even working on here. My explanation is definitely way too simple, but it helps in my espresso-charged brain!)

There are these things called Quantitative Trait Loci (QTL). These are regions of a chromosome that are associated with variation in a measurable trait. Once we know what/where those are, sequencing techniques can start to figure out what actual genes/proteins come from that region.
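To make the QTL idea a little more concrete for my own brain, here is a toy sketch. It is NOT how the paper does it (real QTL mapping uses far more sophisticated statistical models). All the marker names and numbers below are made up, and the "score" is just a plain correlation between genotype and a measured trait at each marker.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either list has no variation."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def scan(genotypes, trait):
    """Score every marker by how well its genotype tracks the trait."""
    return {marker: pearson(calls, trait) for marker, calls in genotypes.items()}

# Hypothetical data: 6 mice, 3 markers, genotype coded as 0/1/2 copies of an allele.
genotypes = {
    "marker_A": [0, 1, 2, 0, 1, 2],
    "marker_B": [0, 0, 1, 1, 2, 2],
    "marker_C": [2, 0, 1, 2, 0, 1],
}
# Hypothetical abundance of one protein in the same 6 mice.
protein_abundance = [1.0, 1.1, 2.0, 2.1, 3.0, 3.1]

scores = scan(genotypes, protein_abundance)
# The marker with the strongest association is our toy "QTL" for this protein.
best = max(scores, key=lambda m: abs(scores[m]))
print(best, round(scores[best], 3))
```

In this made-up example the protein level tracks marker_B almost perfectly, so that is the spot on the chromosome you would go dig into.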

Back to the paper, I swear!  So this group starts with some of these mice that are deliberately genetically diverse and they take some of their livers and do some TMT SPS MS3 Fusion proteomics on the livers to get the proteomics data.  By "some" mice, I mean 192 (slackers) with half the mice on a normal diet and the other half on a high fat diet. You know, 'cause this wasn't complex enough. They, of course, did transcript-level profiling of all of these mice as well. It's worth noting that there are not complete genomes for these DO mice. The original mice that were bred to create this heterozygous model colony are sequenced and well annotated, though, and this info is critical to the...imposing...level of downstream analysis.

Edit 6/21/16: I left out some of the coolest bits!  Okay, so when you are doing stuff like this you toss the gene ID stuff: what you think you know of the protein coding region and so forth, 'cause we need to make the assumption that maybe not all of that stuff is 100% accurate -- seriously, this is going to be a theme that pops up going forward (unassigned spectra, what!?). Rather, focus on the stuff you do know, such as where in the chromosome this stuff maps to!

Now, there is all sorts of awesome biology to infer from this paper. There are close to 100 supplemental figures and tables. There are fantastic conclusions made here in terms of how much the genetic variation affects both protein expression in general and the response to this extreme dietary change. But check this one out.


 In case showing this figure is totally against the rules (please don't sue me...Nature! See disclaimer statement or email me: orsburn@vt.edu, and I'll take it down! Promise! But this is really cool and I'm sending people to read your paper, you can loan me one tiny screenshot from one huge supplemental picture, right?).

Blue is looking at these loci at the transcript (RNA) level and orange is at the proteome level. Panel C makes sense, right? Lots of messenger RNA --> lots of protein. D? More mRNA? Less protein!  Turns out that they can pinpoint site mutations that cause post-transcriptional regulation.
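Here is how I picture that comparison, as a tiny sketch with completely made-up numbers (the function names and the gene_c / gene_d labels are mine, not the paper's, and this is not their actual analysis). For each gene, ask which direction a variant pushes the mRNA and which direction it pushes the protein, then see whether the two agree.

```python
def mean(values):
    return sum(values) / len(values)

def allele_effect(genotypes, values):
    """Mean of mice carrying the variant (1) minus mean of mice without it (0)."""
    carriers = [v for g, v in zip(genotypes, values) if g == 1]
    others = [v for g, v in zip(genotypes, values) if g == 0]
    return mean(carriers) - mean(others)

def classify(mrna_effect, protein_effect):
    """Same sign means mRNA and protein move together; opposite sign means they don't."""
    product = mrna_effect * protein_effect
    if product > 0:
        return "concordant"
    if product < 0:
        return "discordant"
    return "no effect at one level"

# Hypothetical: 6 mice, the last 3 carry the variant.
genotype = [0, 0, 0, 1, 1, 1]
# Made-up gene where more mRNA goes with more protein.
gene_c = {"mrna": [1.0, 1.2, 0.9, 2.0, 2.1, 1.9], "protein": [1.0, 1.1, 1.0, 1.8, 2.0, 1.9]}
# Made-up gene where more mRNA goes with LESS protein.
gene_d = {"mrna": [1.0, 1.2, 0.9, 2.0, 2.1, 1.9], "protein": [2.0, 1.9, 2.1, 1.0, 1.1, 0.9]}

result_c = classify(allele_effect(genotype, gene_c["mrna"]), allele_effect(genotype, gene_c["protein"]))
result_d = classify(allele_effect(genotype, gene_d["mrna"]), allele_effect(genotype, gene_d["protein"]))
print(result_c, result_d)
```

The "discordant" case is the interesting one: something is happening after transcription that the RNA data alone would never show you.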

Take home points out of this great paper that I'm seriously concerned I still might not understand at all and probably completely butchered?

We are still vastly oversimplifying the biology of our eukaryotic models.
But we have the technology (right now!) to throw out some of our erroneous preconceived notions and readdress how we do all of this stuff!


