Thursday, October 15, 2015
An application interface for all those funny MS file types
All of our friends on the bioinformatics side of the proteomics world have been throwing out all these funny letters for years. They tend to start with an "m", end with an "L", and have something random in the middle: mzML, mzXML, mzTab (no L! cheater!), mzIdentML, and on and on. On cursory examination, these are all attempts to store our data more efficiently without the data loss we see when converting to MGF (where we lose almost all of our MS1 data!).
The problem is that some of us have actually used these things, one or the other, so the public repositories may have cool data hidden away in any of these formats.
This new program (definitely meant for the bioinformaticians out there who can code and stuff!) is called ms-data-core-api. It is an Application Programming Interface (API) that should take care of all these formats for you. Adding it to your programs should let you pull in data from any of these sources and read it in one unified format, so your downstream processing doesn't get all jumbled.
You can read about it at BioCode's notes here.
And it can be downloaded at GitHub here.
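For the non-coders wondering what "one unified format" buys you, here is a tiny Java sketch of the general idea: every file format gets its own reader, but they all hand back the same spectrum type, so downstream code never has to care where the data came from. To be clear, this is NOT the ms-data-core-api code. The names `Spectrum`, `SpectrumReader`, `FakeMzMLReader` and `FakeMgfReader` are made up for illustration, and the "readers" just return hard-coded spectra instead of parsing real files.

```java
import java.util.List;

// Hypothetical common spectrum type (not the ms-data-core-api model).
record Spectrum(String id, int msLevel, double[] mz, double[] intensity) {}

// Hypothetical shared interface: one reader per file format, same output type.
interface SpectrumReader {
    List<Spectrum> readAll();
}

// Stand-in for an mzML reader; a real one would parse the XML file.
// mzML can hold both MS1 and MS2 spectra.
class FakeMzMLReader implements SpectrumReader {
    public List<Spectrum> readAll() {
        return List.of(
            new Spectrum("scan=1", 1, new double[]{400.2, 500.3}, new double[]{1e5, 2e5}),
            new Spectrum("scan=2", 2, new double[]{175.1, 262.1}, new double[]{3e3, 4e3}));
    }
}

// Stand-in for an MGF reader; MGF peak lists are MS/MS only, so no MS1 here.
class FakeMgfReader implements SpectrumReader {
    public List<Spectrum> readAll() {
        return List.of(new Spectrum("query1", 2, new double[]{175.1}, new double[]{3e3}));
    }
}

public class Main {
    // Downstream code depends only on the shared interface, not the file format.
    static long countByLevel(SpectrumReader reader, int msLevel) {
        return reader.readAll().stream().filter(s -> s.msLevel() == msLevel).count();
    }

    public static void main(String[] args) {
        List<SpectrumReader> readers = List.of(new FakeMzMLReader(), new FakeMgfReader());
        for (SpectrumReader r : readers) {
            System.out.println(r.getClass().getSimpleName()
                + ": MS1=" + countByLevel(r, 1) + ", MS2=" + countByLevel(r, 2));
        }
    }
}
```

The nice part of this design is that adding another format (say, mzXML) would just mean writing one more reader class; `countByLevel` and everything else downstream stays exactly the same.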