Sunday, June 9, 2013

FASTX Toolkit -- Convert your next gen sequencing data to something usable

Recent studies have shown that searching against a cell-line-specific database can significantly boost the number of peptide matches in your proteomics data. Really, it makes sense. The human sequence we primarily use is derived from the genome sequence of J. Craig Venter. My proteins might be a little different. This opens the door for next generation DNA sequencing to finally contribute something useful to science! (I don't really mean that, I only said it because it is funny...)

Many labs are now sequencing their cells of interest and then running their proteomics searches against a database built from that sequencing data. If you have tried to use this data, I'm sure you've noticed that the output isn't exactly at the quality level we're used to from manually curated UniProt entries.

Never fear, though: the FASTX toolkit is a set of tools that can clean up this data and make it a whole lot more presentable to our favorite peptide search algorithms. You can find out more about FASTX here.
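To give a feel for what "cleaning up" means here, below is a minimal Python sketch of one common step: dropping low-quality reads from a FASTQ file and writing the survivors out as FASTA. This is not the FASTX code itself (the toolkit is a set of command-line programs, such as fastq_quality_filter and fastq_to_fasta). The function names, the mean-quality cutoff of 20, and the Phred+33 encoding are all my assumptions for illustration, and the toolkit's own filter uses a different criterion.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@'


def mean_quality(qual, offset=33):
    """Average Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def fastq_to_fasta(lines, min_mean_quality=20):
    """Keep reads whose mean quality passes the (hypothetical) cutoff
    and return them as FASTA-formatted text."""
    records = []
    for read_id, seq, qual in parse_fastq(lines):
        if qual and mean_quality(qual) >= min_mean_quality:
            records.append(f">{read_id}\n{seq}")
    return "\n".join(records)


# Two toy reads: 'I' is Phred 40 (good), '!' is Phred 0 (bad).
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "!!!!",
]
cleaned = fastq_to_fasta(example)
print(cleaned)
```

On the toy input only read1 survives. For real data you would read the lines from your FASTQ file instead of a list, and the resulting nucleotide FASTA would still need to be translated into protein sequences before a peptide search engine could use it.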

