Wednesday, January 16, 2013

Does hyper-threading work in proteomics applications?


This is an interesting thing that recently came to my attention. Does "hyper" or "virtual" threading actually work in proteomics applications? In at least one instance I've seen evidence that it does not.
What is hyper-threading? It is an Intel technology that presents each physical core to the operating system as two logical ("virtual") cores, so that a second thread can use the core whenever the first one leaves it idle. Here is an illustration I stole using Google Images:


The gist is this: in normal applications using multiple processing cores, sometimes a core isn't doing anything. When that occurs, hyper-threading goes ahead and runs the next processing thread on it. In most applications, this lets the CPU present itself to the operating system as if it had additional cores. All of this rests on the assumption that the cores will have dead time.
This is where the problem comes in with hyper-threading and proteomics data processing: when you're running a search algorithm on a huge proteomics file, the cores never get a chance to take a break, or at least very rarely do. With no pause in the processing onslaught, the processor that is pretending to be an 8-core processor performs like the 4-core it really is, and no advantage is gained from pretending.
Keep in mind that this is from an extremely limited experiment, but a comparison of a 4-core Intel processor with hyper-threading enabled against an 8-core AMD processor turned out very differently than their rankings on the Passmark CPU benchmarking chart would predict. The Intel processor in question was ranked considerably faster by the benchmark, with the AMD scoring less than 70% of the Intel's speed on Passmark, yet in practice the Intel was absolutely smoked by the 8-core PC. Again, this was a limited experiment, but when the Intel processor in question was almost 10 times as expensive as the AMD, it makes you want to try it out yourself, right?
I'd love to hear from other people who have data on this!
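If you want to collect that kind of data on your own machine, here is a minimal Python sketch of the idea. It is not what I ran for the comparison above, and the toy arithmetic loop is only a stand-in for a real search engine, so treat the numbers as a rough indication at best. The function names are made up for this example, and the script assumes two logical processors per physical core when it picks the "half" worker count.

```python
import os
import time
from multiprocessing import Pool


def cpu_bound_task(n):
    """Toy stand-in for a scoring loop: pure arithmetic, no file I/O."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def time_workers(n_workers, n_tasks=32, work_size=200_000):
    """Return wall-clock seconds to finish n_tasks using n_workers processes."""
    start = time.perf_counter()
    with Pool(processes=n_workers) as pool:
        pool.map(cpu_bound_task, [work_size] * n_tasks)
    return time.perf_counter() - start


def speedups(timings):
    """Turn {workers: seconds} into {workers: speedup vs. the fewest workers}."""
    baseline = timings[min(timings)]
    return {w: baseline / t for w, t in timings.items()}


if __name__ == "__main__":
    # os.cpu_count() reports logical processors, so a hyper-threaded
    # 4-core chip shows up as 8 here.
    logical = os.cpu_count() or 1
    # Assumption: 2 logical processors per physical core.
    counts = sorted({1, max(1, logical // 2), logical})
    timings = {w: time_workers(w) for w in counts}
    for w, s in speedups(timings).items():
        print(f"{w:>3} workers: {timings[w]:.2f} s, speedup {s:.2f}x")
```

If the speedup stops improving between the "half" and "all" worker counts, the extra logical cores are not buying anything for that workload. A real search engine may behave differently from this loop, so the better test is to time your actual software at each thread count with hyper-threading switched on and off in the BIOS.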


2 comments:

  1. Interesting that you brought this up. We're upgrading our systems to dual-processor Intel Xeon E5-2450s and had a similar question about the effect of hyper-threading. So I ran a benchmark using X!Tandem on a representative data set with the thread count set to 1, 8, 16, and 32, first with hyper-threading turned on and then with it off. I fully expected hyper-threading to give little benefit, or even worse times than with it off, since this is largely a CPU-bound problem with little file I/O. Surprisingly, using 32 threads on 32 virtual cores was much better than using 16 threads on 16 real cores. My guess is that hyper-threading does help with memory cache misses.
