I admit it -- I'm blown away by this one. To the authors of this study -- I owe you a round of drinks. Seriously. See you at ASMS?
1) You can't do proteomics of transposons!
2) You certainly can't do it on a non-model species without 100% coverage, 50x translational coverage and perfect annotation
3) You did it on the yellow fever mosquito? The vector of all sorts of murderous diseases?!?!?
You deserve a medal -- and a round of drinks. Warning: I might hug you. Kidding, probably!
This is the paper -- and everyone should read it (Open Access even? Of course it is...)
1) First off -- why is this a big deal? I'm glad you asked! The reason I said #1 above is that transposons are inherently chaotic in a genomic/proteomic sense. As this stolen image shows...
...the transposon moves around and interrupts things. There are systematic reasons/regions for this, but overall they are tough to deal with. If you're doing proteomics with your nice, curated UniProt database alone and you've had some jumping genes (transposons) in the proteome of the organism that you've actually digested -- well, you aren't going to find the area where the transposon messed up (okay...maybe if it has 2 copies...but that's another problem for another time...let's simplify it).
Hopefully it didn't land in the middle of your coding region or blow up a start or stop codon, but if it did, the MS/MS spectra from that region might not match anything in your database ('cause the protein you digested just doesn't match the sequence you're searching anymore).
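To make that concrete, here is a minimal sketch of what an in-frame insertion does to the peptides a search engine is looking for. Everything in it is a toy: the protein sequence and the insertion are invented for illustration, and the digest is a bare-bones trypsin rule (cleave after K or R unless followed by P, no missed cleavages), not what any particular search engine does.

```python
import re

def tryptic_peptides(protein, min_len=6):
    """Fully tryptic peptides: cleave after K or R unless the next residue is P.

    Splitting on a zero-width pattern needs Python 3.7 or newer.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}

# Toy sequences, invented for illustration (not from the paper).
reference = "MASTLLEVAGRDNQWEFGHIKSVTPLMAYCER"   # what the database says
insertion = "GGTRWW"                              # hypothetical in-frame insertion
sample = reference[:16] + insertion + reference[16:]  # what was actually digested

db_peptides = tryptic_peptides(reference)
real_peptides = tryptic_peptides(sample)

# Peptides that exist in the tube but have no database entry to match.
print("in the sample, not in the database:", sorted(real_peptides - db_peptides))
# Peptides the database expects that the sample can no longer produce.
print("in the database, not in the sample:", sorted(db_peptides - real_peptides))
```

The peptides on either side of the insertion still match; only the ones spanning it go missing, which is why this kind of thing is so easy to overlook.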
2) There is a reason we develop new methods on E. coli and C. elegans and D. melanogaster -- 'cause we know just about everything about them. If a piece of a gene from the model coliform bacterium ends up in the wrong place -- we can figure it out, probably. Outside of our model organisms, there is an awful lot of chaos.
3) This mosquito sucks. It is known to transmit at least 5 viruses -- and it is awesome enough that, on rare occasions, if it bites you once it can infect you with more than one virus. We need to know more about this thing!
Okay -- so how did they do this? They used a technique called PIT (Proteomics Informed by Transcriptomics). It is detailed in this Nature Methods paper from 2012 (from several of the same authors as this study).
In a nutshell -- they assemble the RNA transcripts de novo from the reads that come off their fancy next-gen sequencing equipment. Once they have those transcripts, they translate them into protein sequences, and that is what they search their MS/MS data against. If you want to do this yourself, you need Galaxy (if you have a big genomics effort at your institution, chances are you already have a server loaded up with Galaxy programs...you may have to leave the safety of the mass spec cave for a bit...the comforting sound of roaring vacuum pumps will be there when you return...not making fun of you guys...I'm with ya'!) and you need this GitHub package. Implementation of this into Galaxy is thoroughly detailed in this (open!) study from last year.
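For the curious, the "turn transcripts into something you can search" step can be sketched in a few lines of Python. To be clear, this is not the authors' Galaxy workflow or the GitHub package: it is a minimal six-frame translation using the standard genetic code, and the function names, the `PIT_orf` header prefix and the `min_len` cutoff are all made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna, frame):
    """Translate one reading frame; codons with ambiguous bases become X."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def six_frame_orfs(transcript, min_len=30):
    """Stop-to-stop open reading frames from all six frames of a transcript."""
    dna = transcript.upper().replace("U", "T")
    strands = (dna, dna.translate(COMPLEMENT)[::-1])  # forward, reverse complement
    orfs = set()
    for strand in strands:
        for frame in range(3):
            for segment in translate(strand, frame).split("*"):
                if len(segment) >= min_len:  # hypothetical length cutoff
                    orfs.add(segment)
    return orfs

def to_fasta(orfs, prefix="PIT_orf"):
    """Format ORFs as FASTA text that a search engine could use as a database."""
    return "\n".join(f">{prefix}_{i}\n{seq}" for i, seq in enumerate(sorted(orfs), 1))

# Toy transcript (invented): ATG GCT AAA TGA encodes M-A-K-stop.
print(to_fasta(six_frame_orfs("ATGGCTAAATGA", min_len=3)))
```

A real pipeline would also have to deal with assembly errors, redundant isoforms and a much stricter ORF filter, which is a good part of why the Galaxy tools exist.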
...only inserted because it was far too many words in a row without a picture. Bernie looking indignant after being accused of stealing my shoe cracks me up every time (and made the front page of Reddit/r/pugs a while back!) Back to seriousness!
Let's go back to #2 and #3 from above -- using a method like PIT is gonna be a whole lot easier on a model organism...but we have a sequenced genome for this sucky mosquito! Why go to all the trouble of PIT?
...'cause PIT shows that the genome of this organism needs an awful lot of work! Think about it -- we're flipping the paradigm here! Traditionally, the thought is that we can only identify peptides that are informed by our genomic sequence. Here -- we are leveraging the transcriptomics to give us the power to take the MS/MS spectra and show where the genome is wrong! They can go into specific examples of regions where the spectra and the genome don't match and figure out that this area of the genome, for whatever reason, had low sequencing depth -- or was misannotated, or something along those lines. You know, 'cause the mass spec never lies.
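Here is a rough sketch of that "where do the spectra and the genome disagree?" bookkeeping. It assumes you already have a list of peptides identified against the transcript-derived database and the protein sequences predicted from the genome annotation; the function names and the toy sequences are mine, not the paper's. The one bit of real chemistry in it is that leucine and isoleucine have the same mass, so both are collapsed to L before comparing.

```python
def normalize(seq):
    """Collapse I to L: the two residues are isobaric, so MS/MS can't tell them apart."""
    return seq.upper().replace("I", "L")

def peptides_missing_from_genome(identified, annotated_proteins):
    """Peptides seen by MS/MS that no annotated protein sequence can explain."""
    reference = [normalize(p) for p in annotated_proteins]
    return sorted(
        pep for pep in set(identified)
        if not any(normalize(pep) in prot for prot in reference)
    )

# Toy data, invented for illustration.
annotated = ["MASTLLEVAGRDNQWEFGHIKSVTPLMAYCER"]
identified = ["MASTLLEVAGR", "DNQWEFGHLK", "WWFGHIK"]

# Each peptide printed here points at a spot worth a second look in the genome.
print(peptides_missing_from_genome(identified, annotated))
```

Anything that comes out of a filter like this is a lead, not a verdict: it still takes a look at the spectrum and the transcript evidence to decide whether the genome or the peptide ID is the thing that's wrong.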
On this topic -- something of only very minor concern here is that the data were acquired on an Orbitrap Velos running in high/low (high resolution MS1 in the Orbitrap, low resolution MS/MS in the ion trap). Is there a little more wiggle room in the peptide sequencing data because of the lower resolution/lower mass accuracy of the ion trap? High resolution acquisition of the MS/MS spectra as well might very well strengthen the findings of this beautiful paper, but on an OV you are going to take a hit in overall peptide sequencing depth, and I can't disagree that depth was more important here.
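To put a rough number on "wiggle room": the arithmetic below compares a fragment matching window in Da with one in ppm. The tolerances (0.5 Da for ion trap MS/MS, 20 ppm for Orbitrap MS/MS) are values I am assuming as typical search settings, not the settings used in the paper, and the fragment m/z is arbitrary.

```python
def ppm_to_da(mz, ppm):
    """Convert a relative tolerance in ppm to an absolute tolerance in Da at a given m/z."""
    return mz * ppm / 1e6

fragment_mz = 1000.0        # arbitrary example fragment ion
ion_trap_tol_da = 0.5       # assumed typical ion trap fragment tolerance
orbitrap_tol_ppm = 20.0     # assumed typical high resolution fragment tolerance

orbitrap_tol_da = ppm_to_da(fragment_mz, orbitrap_tol_ppm)
ratio = ion_trap_tol_da / orbitrap_tol_da

print(f"Orbitrap window at m/z {fragment_mz:.0f}: +/- {orbitrap_tol_da:.3f} Da")
print(f"Ion trap window: +/- {ion_trap_tol_da:.3f} Da ({ratio:.0f}x wider)")
```

Under those assumptions the ion trap window is about 25 times wider at m/z 1000, which is the extra room for a wrong sequence to sneak through, and also the price you'd pay in scan speed to close it.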
Okay -- finally back to #1! The transposons! Transposable elements have characteristic regions 'cause transposases (I think that's what they are called) leave specific signatures behind -- I forget the details and I'm losing motivation -- this one is taking a loong time! The unbelievable sequencing depth this group has from the transcriptomics + proteomics allows them to find all of these (they call it the mobilome! -- adding that to the translator for sure!) Some of those places where the genome and PIT disagree end up being inserted transposable elements -- and with information from both the T and P levels, it is darned convincing!
TL;DR: Amazing study with publicly available tools shows how proteomics/genomics/transcriptomics can be leveraged together to massively improve our understanding of one of earth's worst pests. Ben puts awkward image of him hugging authors with his freakishly long arms into people's heads.