Thursday, October 12, 2023

Bulk make a custom .FASTA file from anything (tutorial for me!) then deep learn it!

I swear I thought this was on here somewhere and then I had to go Googling all over the place.

Let's quickly get a FASTA (and then deep learn it for any program at all)

First, get a protein list of any kind (UniProt, RefSeq, gene identifiers, whatever) and go here:

https://www.uniprot.org/id-mapping


Easy! This is 3 KRAS isoforms (though the 3rd one, I think, has an A or B in it after the 3, so it might be wrong. Going off memory.)

Make sure that "From database" is correct, particularly if you're using a gene identifier, because UniGene and Official Gene are different (I forget what they're really called).

Keep "To database" set to UniProtKB. Unless you need something weird at UniProt, this will get you to what you need.

There is a "MAP IDs" button at the bottom. Give it a minute or two. Then click the "Completed" link that pops up.
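If clicking through the form gets old, the same three inputs (IDs, "From database", "To database") can probably be sent from a script. This is a rough Python sketch, not an official client. It assumes the UniProt REST ID mapping endpoints (run, status, results stream) and the database names "UniProtKB_AC-ID" and "UniProtKB" are still what UniProt uses, so check their current API help before trusting it. The function names are mine.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed base URL of UniProt's ID mapping REST service; verify before use.
API = "https://rest.uniprot.org/idmapping"


def build_job_form(ids, from_db="UniProtKB_AC-ID", to_db="UniProtKB"):
    """Encode the same three fields the web form asks for."""
    fields = {"from": from_db, "to": to_db, "ids": ",".join(ids)}
    return urllib.parse.urlencode(fields).encode("ascii")


def fetch_json(url, data=None):
    """GET (or POST, when data is given) and parse the JSON reply."""
    with urllib.request.urlopen(url, data=data) as response:
        return json.load(response)


def map_ids_to_fasta(ids, from_db="UniProtKB_AC-ID", to_db="UniProtKB", poll_seconds=3):
    """Submit a mapping job, wait for it, and return the FASTA text."""
    job = fetch_json(f"{API}/run", data=build_job_form(ids, from_db, to_db))
    job_id = job["jobId"]

    # The scripted version of "give it a minute or two".
    # Assumption: unfinished jobs report jobStatus NEW or RUNNING.
    while fetch_json(f"{API}/status/{job_id}").get("jobStatus") in ("NEW", "RUNNING"):
        time.sleep(poll_seconds)

    # Assumption: this stream URL is valid when the target is UniProtKB.
    url = f"{API}/uniprotkb/results/stream/{job_id}?format=fasta"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Example use (needs internet, so it is left commented out):
# fasta = map_ids_to_fasta(["P01116"])  # P01116 is human KRAS
# open("my_proteins.fasta", "w").write(fasta)
```

As on the web form, the "from" value has to match the kind of identifier you paste in, or you get nothing useful back.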


Go to Download and toggle off the "compressed" button. Otherwise you get a .gz (gzip) file, an annoying Linux-style compressed format.
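If you forgot the toggle and ended up with the compressed file anyway, it isn't actually indecipherable. Here is a minimal Python sketch for unpacking it, assuming the download is an ordinary gzip file whose name ends in .gz (the file name in the example is made up).

```python
import gzip
import shutil
from pathlib import Path


def gunzip(gz_path):
    """Decompress 'something.fasta.gz' next to itself as 'something.fasta'."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so big files are fine
    return out_path


# Example use (hypothetical file name):
# gunzip("idmapping_download.fasta.gz")
```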

BOOM -- FASTA! 
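Before feeding the file to anything downstream, it is worth a quick check that the FASTA has the entries you expected. This is a small Python sketch for that, not a full parser. It assumes UniProt-style headers of the form db|accession|entry name, and the file name in the example is made up.

```python
def read_fasta(text):
    """Return {header: sequence} from FASTA text (headers without the '>')."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequences can wrap over many lines
    return {h: "".join(parts) for h, parts in records.items()}


def accession(header):
    """Pull the accession out of a 'db|accession|name ...' style header."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else header.split()[0]


# Example use (hypothetical file name):
# records = read_fasta(open("my_proteins.fasta").read())
# print(len(records), "entries:", [accession(h) for h in records])
```

If the count is lower than the number of IDs you pasted in, some of them probably failed to map, which usually points back at the "From database" setting.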

Wanna deep learn it? HAHA! The buttons in Prosit are more complex these days, but this 3-year-old tutorial has the rest of it.

After today's spectral library generation exercise, I was reminded that about 9/10 of the proteomics studies that get funded are for generating new and exciting spectral library formats!

Thank you to all the study sections who immediately move a proteomics study to the front of the queue when someone proposes yet another way to display and store the same thing. There is nothing we need more today than more spectral library formats. 


