Tuesday, October 15, 2024

Revisiting the Harvard "FragPipe on an HPC" technical note in terms of total time/costs!

I read and posted about this great technical note from the Steen groups a while back, and I've had an excuse to revisit it today.


Quick summary - they ran EvoSep 60 SPD proteomics on a TIMSTOF Pro 2 on plasma from 3,300 patients. They looked at the run time on their desktop and estimated that processing the data the way they wanted to would take about 3 months. Ouch.

What they did instead was set the whole thing up on their local high-performance cluster, and they walk you through just about every step.

It took them just about 9 days to process the data using a node with 96 cores and 180 GB of RAM. They do note that they never appeared to use even 50% of the available resources, so they could have scaled back in different ways.

Where I got interested was this - if I were paying for HPC access, how many core hours would doing it this way set me back? 9 days x 24 hours = 216 hours of wall time, and 216 hours x 96 cores puts it at roughly 20,700 core hours. I know some HPCs track how much you actually use in real time based on the load you're putting on their resources, but others don't. So it's, at the very most, about 20,700 core hours. Which is the estimate I was looking for when I went digging for this paper.
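
A minimal sketch of that back-of-the-envelope math in Python, assuming the worst case where the full 96-core node is billed for the entire 9-day wall time:

    # Worst-case core-hour estimate: the whole node billed for the whole run
    days = 9
    cores = 96
    wall_hours = days * 24               # 216 hours of wall time
    core_hours = wall_hours * cores      # 216 x 96 = 20,736
    print(f"{core_hours:,} core hours")  # 20,736 core hours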

Not counting blanks/QCs/maintenance, that's about 2 months of instrument run time for a 3,300-patient study (3,300 samples at 60 SPD is roughly 55 days), plus 9 days to process. It's such an exciting time to be doing proteomics for people who care about the biology. And - I'll totally point this out - 60 SPD isn't even all that fast right now! At 100 SPD it's a 6-week end-to-end study!
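
If you want to play with the timeline numbers yourself, here is a quick sketch under my own assumptions (acquisition time is just samples divided by SPD, processing stays at 9 days, and blanks/QCs/maintenance are ignored, as in the paragraph above):

    # Rough end-to-end timeline at two EvoSep throughputs
    samples = 3300
    processing_days = 9
    for spd in (60, 100):                  # samples per day
        acquisition_days = samples / spd   # ~55 days at 60 SPD, 33 days at 100 SPD
        total_weeks = (acquisition_days + processing_days) / 7
        print(f"{spd} SPD: ~{total_weeks:.0f} weeks end to end")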

1 comment:

  1. Might want to check this out Ben... https://www.biorxiv.org/content/10.1101/2024.09.05.611509v1
