I'm still investigating PepNovo for performing de novo sequencing on our data sets. The second paper from the list at CSE Bioinformatics is a 2008 paper by Ari Frank that describes the PepNovo Plus algorithm.
While many of the statistics are a little beyond my level, this somewhat long paper contains a lot of very useful information.
In the introduction, Dr. Frank points out the problem with most statistical models used in bioinformatics -- that "such models tend to oversimplify the phenomenon they describe and are consequently inaccurate."
In order to address these shortcomings, the paper describes the use of a machine learning boosting algorithm to analyze a large database of low resolution MS/MS spectra.
The dataset contained more than 300,000 peptide-spectrum pairs.
Boosting, in the paper's words, "produces highly accurate prediction rules by combining many 'weak' rules that, each on their own, might be only moderately accurate."
The boosting algorithm, as described here, is able to make use of a combination of over 800 possible features produced by CID fragmentation of a peptide.
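To get a feel for how combining "weak" rules can work, here is a minimal sketch of boosting in Python. It is plain AdaBoost with one-feature threshold rules on a made-up toy data set, not the algorithm or the features from the paper; the function names (train_stump, adaboost, predict) and the data are my own assumptions for illustration only.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best weak rule.

    A weak rule predicts `sign` when row[feature] >= threshold, else `-sign`.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost(X, y, rounds):
    """Train `rounds` weak rules, re-weighting the examples after each one."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below is always defined.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this rule
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all weak rules."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): one feature, label +1 only inside the range 3..6.
X = [[v] for v in range(10)]
y = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]

model = adaboost(X, y, rounds=3)
predictions = [predict(model, row) for row in X]
print(predictions == y)
```

On this toy data no single threshold rule can label every point correctly, because the positive examples sit in the middle of the range. The weighted vote of three such rules can, which is the basic idea behind the quote above.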
It's pretty clear that this algorithm is much more complicated than simpler programs like Sequest. Considering the amount of thought that has gone into the PepNovo program, I'm expecting big things from it once I can actually get the file to run.
A handicap is that Thermo RAW files cannot be read directly by the software. They must be converted before they will load successfully. I'm still working on that one....
PepNovo Part 1