Saturday, February 1, 2020
Predicting PTMs in 2019-nCoV Wuhan Coronavirus
Yeah....maybe I need a hobby....but I think this stuff is cool AND I've learned how to use some new tools thanks to my curiosity about this new virus and thinking about how I would analyze proteomics data from the virus if I could get my hands on it....
Here is the question: PTMs don't typically just happen indiscriminately. There are particular motifs that are the targets of the enzymes that add the PTMs. So...can we start with just some unknown linear protein sequences and predict which PTMs we would find?
And...are those predictions any good? I can't yet answer that part directly, but I'm trying.
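To make the "motif" idea concrete, here is a minimal sketch of the simplest possible motif scan: looking for the classic N-linked glycosylation sequon (Asn, then anything except Pro, then Ser or Thr). To be clear, this is NOT how ModPred works, and the function name and toy sequence are made up for illustration. It just shows what "a motif in a linear sequence" means in practice.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


# Made-up toy sequence, not a real viral protein.
toy = "MKNGSAANPTLLNNTS"
print(find_sequons(toy))  # prints [3, 13, 14]
```

The Asn at position 8 is skipped because it is followed by Pro. A motif hit like this only says a site is possible, which is exactly why the "are those predictions any good?" question matters.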
There are a LOT of tools that predict PTM sites. After two late nights of trying a few of them and doing a lot of failing -- this older one is my current leading favorite -- and you can read about it here.
If you've got better things to do on a Saturday than read, I got you, yo!
You can also just go and dump stuff into their server at ModPred.org. The interface is super straightforward. Put in your protein FASTA entry (one at a time), pick your mods and hit the button. (You can also install it locally, but I'd rather use their electricity.)
You are capped at 5,000 amino acids per model with the web interface of their server. And you definitely pay for longer sequences in wait time. At 1,000 amino acids, I recommend walking your dog.
Okay -- so only one protein from the 2019-nCoV translated FASTA is over the cap, so I broke it into 5 separate translated regions with a large overlap in peptide sequences (in case the domains it is modeling against for PTM prediction are large ones). And -- it took basically all morning.
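If you'd rather not cut the sequence up by hand, the splitting step could be sketched roughly like this. The 500-residue overlap is my assumption for the example (I'm not claiming it is the right number), and the function name is hypothetical. Each chunk comes back with its 1-based start so predicted positions can be mapped back onto the full-length protein.

```python
import math


def split_with_overlap(sequence, n_chunks=5, overlap=500, max_len=5000):
    """Split a sequence into n_chunks windows sharing `overlap` residues.

    Returns a list of (start, chunk) tuples, where start is the 1-based
    position of the chunk's first residue in the full sequence.
    """
    if n_chunks < 1 or overlap < 0:
        raise ValueError("n_chunks must be >= 1 and overlap must be >= 0")
    # Chunk length needed so that n_chunks overlapping windows cover everything.
    chunk_len = math.ceil((len(sequence) + (n_chunks - 1) * overlap) / n_chunks)
    if chunk_len > max_len:
        raise ValueError("chunks would exceed the server cap; use more chunks")
    if chunk_len <= overlap:
        raise ValueError("overlap is too large for this many chunks")
    step = chunk_len - overlap
    return [
        (i * step + 1, sequence[i * step : i * step + chunk_len])
        for i in range(n_chunks)
    ]


# Made-up 7,000-residue stand-in sequence, not the real viral protein.
fake_protein = "ACDEFGHIKL" * 700
for start, chunk in split_with_overlap(fake_protein):
    print(start, len(chunk))
```

A predicted site at position p within a chunk sits at position start + p - 1 in the full protein, which also makes it easy to spot the same site reported twice in an overlap region.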
You get a pretty output that you can keep or have it kick you out a Tab(?) delimited text file. I spent a lot of time swearing while combining everything into a single Excel file. (I need to grow up and stop using Excel. It always seems like it will be easier -- even though it increasingly is not the easiest solution.)
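For the "stop using Excel" version of that combining step, something like the sketch below might do it. Big assumptions here: that every export really is tab-delimited and that they all start with the same header row. I'm not reproducing the actual column names from the server output, and the function name is hypothetical.

```python
import csv
from pathlib import Path


def combine_tsv(paths, out_path):
    """Concatenate tab-delimited files that share a header row.

    Adds a leading `source_file` column so each row can be traced back to
    the export it came from. Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        for path in paths:
            with open(path, newline="") as handle:
                reader = csv.reader(handle, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(["source_file"] + header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header row")
                for row in reader:
                    writer.writerow([Path(path).name] + row)
                    rows_written += 1
    return rows_written
```

Usage would be along the lines of `combine_tsv(["region1.txt", "region2.txt"], "combined.tsv")`, with those file names being placeholders for whatever you saved the exports as. The combined file still opens fine in Excel if you want that Pivot Table.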
Okay -- and here I'm talking smack about Excel -- and the Ideas button just did something smart!! Normally, it's just funny to hit the button, but -- darn -- it made a decent Pivot Table!
If you're interested in the actual motifs predicted to be modified, you can download them from my Google drive here.
Okay -- so -- that's all nice and all. Predicted PTMs are a pretty big step away from actual PTMs.
Can we test this?
I mentioned a couple of days ago that there was some cool unpublished MERS-CoV proteomics data on MassIVE.
Now -- this is CID ion trap MS/MS data -- not my favorite source of data for identifying PTMs. It also kind of rules out some of my favorite tools, because they were designed with HRAM MS/MS data in mind. So...back in the time machine to the 1990s to fire up SEQUEST and take a minute to polish up my sense of skepticism....
Okay -- this will take more than a minute or two....I forgot how long CID MS/MS takes to search with a couple of PTMs.
I broke it up into queues and only one has finished -- aaaaaaannnnnddddd....nothing!
Okay...so I do actually need another hobby....maybe something I can do inside, in case I screw up my knee and have to do a lot of sitting around for a while.
However -- there is A LOT wrong with this system. One -- we're looking at a single-shot analysis from 2009's best mass spectrometer -- in a human cell background. We're not exactly digging to the full depth of the proteome -- and PTMs rarely want to announce themselves. Two -- I'm using a prediction model of one virus that is similar to another, but we are definitely reaching when trying to make predictions off so little data across the board. Three through 41 -- ? I didn't even look to see if that region of the similar protein is even digested by trypsin. Maybe that is for next Saturday.
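For that trypsin question, a quick in silico digest is probably the place to start. Here is a minimal sketch, assuming the usual textbook rule (cleave after K or R unless the next residue is P), no missed cleavages, and a 6 to 30 residue "observable" window that I picked as a rough guess rather than anything measured. Function names and the toy sequence are made up.

```python
import re


def tryptic_peptides(protein):
    """Fully tryptic peptides as (start, end, sequence), 1-based inclusive.

    Rule used: cleave C-terminal to K or R unless the next residue is P.
    """
    peptides = []
    start = 0
    for match in re.finditer(r"[KR](?!P)", protein):
        end = match.end()
        peptides.append((start + 1, end, protein[start:end]))
        start = end
    if start < len(protein):
        # C-terminal peptide that does not end in K or R
        peptides.append((start + 1, len(protein), protein[start:]))
    return peptides


def site_is_observable(protein, position, min_len=6, max_len=30):
    """Return (in_length_window, peptide) for the peptide covering `position`."""
    for start, end, peptide in tryptic_peptides(protein):
        if start <= position <= end:
            return min_len <= len(peptide) <= max_len, peptide
    raise ValueError("position is outside the protein")


# Made-up toy sequence, not a real viral protein.
toy = "MAGSKRPLLNESTVKAAR"
print(tryptic_peptides(toy))
print(site_is_observable(toy, 10))  # prints (True, 'RPLLNESTVK')
```

If a predicted site lands on a peptide that is too short or too long, no search engine was ever going to find it in that data, and a different protease would be the thing to try. Passing the length check says nothing about whether the peptide actually ionizes or fragments well.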