Tuesday, January 7, 2014

Uniprot vs IPI databases

I guess I kicked up a little bit of a controversy today, as I've gotten a couple of emails already about this (the previous entry).

I've got a lot going on today, so I don't want to go into the true differences between Uniprot TrEMBL, Uniprot Swissprot, and the IPI databases. I'm just going to show you some real data.

I ran some HeLa digest a while back on an Orbitrap (Velos, I think). I used a high-high mode (60k, 15k), which is relatively slow on that instrument in comparison to the QE, the Elite, or the Fusion. I think this was for a limit-of-detection study or something. It doesn't matter. The experiment will illustrate the point.

I downloaded the IPI Human database and set the searches to run over lunch. Same file, same everything; all I changed was the database, IPI or Swissprot. I didn't search with any mods except carbamidomethylation on C.

Remember, nothing else was changed:

Uniprot (Swissprot) database: 2,940 proteins
IPI database: 11,150 proteins

Want the screenshots?  Email me.  Want the file and the XML copies of the methods?  You can have those as well.  Let me know.

Why the big difference?

For one, look at the file sizes: our IPI Human database is 50MB, vs. 13MB for our Uniprot database. Why so much bigger?

Because the IPI database is full of putative crap. Putative 22kDa protein? Super useful, right? This is why very few people use the IPI database. Uniprot/Swissprot has real proteins with annotations that can help you arrive at a biological conclusion. Could that 22kDa protein be super useful later? Sure, but we have no idea what it is right now!
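If you want to check this on your own copies of the databases, here is a minimal Python sketch. It is not what I ran for the numbers above. It just counts the entries in a FASTA file and how many header lines mention a keyword such as "putative". The function names and the file names in the usage note are hypothetical, and it assumes a plain-text FASTA file where every entry starts with a ">" header line.

```python
import os


def summarize_fasta(lines, keyword="putative"):
    """Return (total entries, entries whose header mentions `keyword`).

    `lines` is any iterable of FASTA lines. An entry is counted for every
    line that starts with '>'. The keyword match is case-insensitive.
    """
    total = 0
    flagged = 0
    needle = keyword.lower()
    for line in lines:
        if line.startswith(">"):
            total += 1
            if needle in line.lower():
                flagged += 1
    return total, flagged


def summarize_file(path, keyword="putative"):
    """Return (total entries, flagged entries, file size in MB) for one file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        total, flagged = summarize_fasta(handle, keyword)
    size_mb = os.path.getsize(path) / 1e6
    return total, flagged, size_mb
```

To compare two databases, call summarize_file("ipi_human.fasta") and summarize_file("uniprot_human.fasta") (hypothetical file names, swap in your own paths) and look at the counts side by side. Keep in mind that a header keyword match is only a rough proxy for how well an entry is annotated.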

Hope this helps clarify some things!

1 comment:

  1. Hi,
    Do you refer to "proteins" or "protein groups"? The redundancy in IPI clearly inflates the number of proteins because of sequence homology.

    On a separate note, I noticed your 64-bit Percolator node. Is this available?

    Great post once again