Monday, March 27, 2017
Multi-institute study uses proteomics to fix errors in 16 Mosquito genomes!
I can't seem to fix the resolution on this image. It is just too big. You'll just have to believe that what I'm pointing at is an awesome step in the bioinformatic pipeline in this new paper in Genome Research!
The blurry highlighted line is where they use the 5 million MS/MS spectra that they got in their deep proteomics of this mosquito to correct the genome that they started this study with! As a mass spectrometrist you might not be aware that this journal is a big deal. More proof that our field is coming of age -- proteomics correcting genomic information in one of the top Genomics journals?
To do this they also integrated RNA-Seq (transcript) data from this organism, and the pipeline is, understandably, complicated. Neither proteomics nor genomics is perfect. But if you've got a peptide from this organism's tissue that the genome can't explain, and you look for it in the transcriptome and it's there, maybe editors of a big journal will let you:
Add almost 400 genes that were removed from the genome in error
And fix almost 1,000 errors in the genes that are there!
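To make that logic concrete, here is a toy sketch of the decision described above: a peptide the annotated genome can't explain, but which shows up in the translated transcriptome, gets flagged as a candidate correction. This is not the paper's actual pipeline. The function name, the category labels, and the made-up sequences are all assumptions for illustration, and a real workflow would use a proper search engine and FDR control rather than plain substring matching.

```python
def classify_peptide(peptide, annotated_proteins, translated_transcripts):
    """Toy classification of one identified peptide (hypothetical helper).

    annotated_proteins: protein sequences predicted from the current gene models
    translated_transcripts: protein sequences translated from RNA-Seq transcripts
    """
    # The existing annotation already explains this peptide.
    if any(peptide in protein for protein in annotated_proteins):
        return "explained_by_annotation"
    # The annotation can't explain it, but the transcriptome can:
    # evidence for a missing gene or an error in a gene model.
    if any(peptide in protein for protein in translated_transcripts):
        return "candidate_correction"
    # Neither source explains it; leave it alone.
    return "unexplained"


# Made-up sequences, purely for illustration.
annotated = ["MKTAYIAKQR", "MSDLLVNGEK"]
transcripts = ["MKTAYIAKQR", "MPEPTIDEKGLSR"]
peptides = ["TAYIAK", "PEPTIDEK", "WWWWK"]

results = {p: classify_peptide(p, annotated, transcripts) for p in peptides}
for peptide, label in results.items():
    print(peptide, label)
```

Only the peptides that land in the middle category would be worth taking back to the gene models.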
Mass spec nerd highlight for the paper -- to convince people outside your field that your data is amazing, maybe you need to show them that your median mass error for your peptides was 350 ppBILLION!
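For anyone outside the field wondering what that number means: 350 parts per billion is 0.35 ppm. Here is a minimal sketch of the usual relative mass error calculation. The function names and the example m/z values are made up for illustration and are not taken from the paper.

```python
from statistics import median


def mass_error_ppm(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_to_ppb(ppm):
    """1 ppm is 1,000 ppb."""
    return ppm * 1000.0


# Hypothetical (observed, theoretical) m/z pairs, not real data.
pairs = [
    (1000.00035, 1000.0),
    (500.00010, 500.0),
    (750.00045, 750.0),
]

# Median of the absolute errors, reported in both units.
abs_errors_ppm = [abs(mass_error_ppm(obs, theo)) for obs, theo in pairs]
median_ppm = median(abs_errors_ppm)
print(f"median error: {median_ppm:.2f} ppm = {ppm_to_ppb(median_ppm):.0f} ppb")
```

At an m/z of 1000, an error of 0.35 ppm is a difference of only 0.00035.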
I definitely like that part of the paper -- but what I love about this paper is that they took this proof of principle (deadly mosquito vector #1) and applied it to 15 other species (15 other deadly mosquito vectors). And, you know what? They found that a lot of the mistakes made in the mosquito genome they started with had also been systematically propagated to the other mosquito species!
This makes a lot of sense. When we're automatically assembling genomic information, it is often assembled based on previous genomes. Even when manually annotating a genome, you are going to rely on a lot of the same assumptions. This study shows that we don't necessarily have to run deep proteomics on every tissue of every organism on earth to drastically improve our understanding of biology!!