Tuesday, August 13, 2019
FusionPro -- Find the fused proteins with transcript data!
Fusion proteins -- at this point -- seem like the result of relatively rare events, at least in humans. And...woweee....have some of them been controversial....but they absolutely do exist.
If you want to find them there are different ways of doing it, but FusionPro goes at it with a different toolbox than what I'd use. Oh no. This gets rambly. Here is the paper!
I'd also like to throw this in up front -- this tool will NOT work for post-translational protein/peptide fusion products. This is purely for events that produce a fused transcript (RNA) product that would then be translated to protein. These are still important, but might not be what you're looking for.
The most well-characterized example for me to ramble about is probably the BCR/ABL protein fusion that is the result of the "Philadelphia Chromosome".
(Photo taken from the source used in this link and used here in accordance with the GNU Agreement, thanks Pmx!)
It's a little hard to make out, but these are human chromosomes stained with DAPI (blue) and the ends of two separate chromosomes are tagged, probably by FISH (not the awful band, the fluorescence in situ hybridization).
What you should see is 2 chromosomes with green dots and 2 chromosomes with red dots.
That thing in the upper corner with both colors is the Philly chromosome. An unfortunate event has caused the ends of two chromosomes to break and rejoin, making an evil little chromosome.
There might be more negative effects of this chromosome fusion -- but the one I know about is that the break points line up to make a long reading frame that produces a BCR/ABL fusion transcript and then protein. Unless something has changed recently, we don't actually know what the BCR protein does in the normal context, but it appears to be a serine/threonine kinase (responsible for direct S/T + phospho). ABL is a tyrosine kinase (Y + phospho). See where this is going?
Tyrosine phosphorylation is supposed to be our fast-response, sensitive signal regulatory system -- and now this stupid fusion protein is turning on tyrosine phosphorylation -- permanently. This triggers an entire regulatory system inside the cells carrying the protein that is saying divide! divide! divide! You don't want every cell trying to divide all the time....you especially DO NOT want damaged cells to divide -- you want them to stop, try to repair themselves and then divide, or kill themselves if the damage is too bad to repair...but a damaged cell with this protein will still want to make damaged copies of itself.
Wow. That was a lot of words. I'll link the paper way above.
Back to FusionPro -- I think the standard way of doing this kind of work, for us, is to de novo sequence every peptide and then try to work your way back. To be honest, that's what I'd still do first. But proteomics is 1000x easier when you have a database to reference. And this is where FusionPro comes in. It can build that database for you from transcript data.
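To make the "build a database from transcript data" idea a bit more concrete, here is a minimal Python sketch. To be clear, this is NOT FusionPro's actual code or method. The function names (`translate`, `junction_peptides`, `to_fasta`), the `flank_nt` parameter, and the choice to simply try all three forward reading frames are my own assumptions for illustration. It takes the nucleotide sequence on each side of a fusion breakpoint, joins them, translates, and keeps only the stop-free stretch that actually crosses the junction.

```python
from itertools import product

# Standard genetic code, built in TCAG order (stop codons are "*").
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a nucleotide string in frame 0; unknown codons become 'X'."""
    nt = nt.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def junction_peptides(upstream, downstream, flank_nt=60):
    """Return (frame, peptide) pairs for stop-free stretches crossing the junction.

    upstream / downstream are the transcript sequences on either side of the
    breakpoint (hypothetical inputs, e.g. from an RNA-Seq fusion caller).
    """
    left, right = upstream[-flank_nt:], downstream[:flank_nt]
    joined = left + right
    hits = []
    for frame in range(3):
        protein = translate(joined[frame:])
        # Index of the codon that contains the first downstream base.
        j_aa = (len(left) - frame) // 3
        # Trim to the open stretch between the nearest stops on each side.
        start = protein.rfind("*", 0, j_aa) + 1
        end = protein.find("*", j_aa)
        if end == -1:
            end = len(protein)
        # Keep it only if residues exist on both sides of the junction.
        if start < j_aa < end:
            hits.append((frame, protein[start:end]))
    return hits


def to_fasta(name, hits):
    """Format the hits as FASTA text that a search engine could read."""
    return "".join(f">{name}|frame{frame}\n{pep}\n" for frame, pep in hits)


if __name__ == "__main__":
    # Toy sequences, not real BCR or ABL sequence.
    demo = junction_peptides("ATGGCTGAA", "AAACGTTAA")
    print(to_fasta("toy_fusion", demo))
```

A real pipeline would use the known reading frame of the upstream gene rather than brute-forcing all three, but the idea is the same: the transcript tells you what the junction peptide should look like, so the search engine has something to match against.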
You'll need a bioinformagician for this. FusionPro uses a combination of Perl and Python and you can get it all here. It totally works, though. They go through CPTAC data that has both high read depth RNA-Seq and deep proteomic data (95% sure this is CPTAC 2 since it's labeled reporter ion data) and they find fusion events with high accuracy. Pretty nice to have a load of transcript reads and high resolution labeled MS/MS spectra to make your case that you found a new fusion protein!
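On the proteomics side, the peptides that really make your case are the ones that cross the fusion junction, since everything else could have come from the normal proteins. Here is a rough Python sketch of how you might list those from a predicted fusion protein. Again, this is my own illustration and not anything from the FusionPro code: the function names, the `min_len` cutoff, and the simple trypsin rule (cut after K or R unless the next residue is P, no missed cleavages) are all assumptions.

```python
import re


def tryptic_peptides(protein):
    """Return (start, end, peptide) tuples for a simple in silico trypsin digest.

    Cleaves after K or R unless the next residue is P; no missed cleavages.
    """
    pieces = []
    start = 0
    for match in re.finditer(r"[KR](?!P)", protein):
        end = match.end()
        pieces.append((start, end, protein[start:end]))
        start = end
    if start < len(protein):
        # Whatever is left after the last cleavage site.
        pieces.append((start, len(protein), protein[start:]))
    return pieces


def junction_spanning(protein, junction_aa, min_len=6):
    """Peptides with residues on both sides of the junction.

    junction_aa is the index of the first residue contributed by the
    downstream partner (a hypothetical input from the transcript step).
    """
    return [pep for start, end, pep in tryptic_peptides(protein)
            if start < junction_aa < end and len(pep) >= min_len]


if __name__ == "__main__":
    # Toy sequence, not a real fusion protein.
    print(junction_spanning("MAGSKLLEEDPRTTYVK", junction_aa=8))
```

If none of the junction-spanning peptides are a sensible length for MS/MS, that fusion is going to be hard to confirm with trypsin alone, no matter how good the transcript evidence is.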