Friday, October 25, 2024

Parallelizing the most challenging steps in proteomic analysis on the cloud!

I got this preprint sent to me after my brainstorming on core hour usage for proteomics. I was largely doing that to figure out whether it was worth spending the time slurming around with the 50,000 CPU core hours I just got access to. What I didn't get into in that post was where the Harvard team found their HPC spending the most time - it was, by far, on match between runs.

In this preprint, the team demonstrates some early results in parallelizing that pain point on the cloud.


The best figure in the paper is probably the panel at the top. Go to 1,000 files and - yeah - you use a lot of cores, but you cut 6 days of processing time down to a few hours. Since clouds (which are just someone else's HPC) tend to do a really good job of charging you only for the resources you actually use (it's a highly competitive commercial environment, and if they didn't do it right you'd give your money to someone else), the costs end up working out to just about the same. Same cost, but you get your results back almost a week sooner? Everyone is taking that deal.
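To put rough numbers on why that deal works, here's a back-of-the-envelope sketch. The key point is that with per-core-hour billing the bill tracks total core-hours consumed, not wall time, so an embarrassingly parallel per-file workload can burst across far more cores for the same money. All the rates and core counts below are my own illustrative assumptions, not figures from the preprint; only the ~6-day serial turnaround comes from the discussion above.

```python
# Back-of-the-envelope cost/time comparison for a pay-per-core-hour cloud.
# Assumed numbers (NOT from the preprint): $/core-hour rate, node core count,
# and the size of the cloud burst. The ~6-day serial wall time is from the post.

CORE_HOUR_RATE = 0.05       # assumed price in $/core-hour
SERIAL_WALL_HOURS = 6 * 24  # ~6 days of processing on one node
SERIAL_CORES = 32           # assumed cores on that node

# Total work is fixed: core-hours consumed stays constant either way.
CORE_HOURS = SERIAL_WALL_HOURS * SERIAL_CORES

def wall_time_and_cost(cores):
    """Wall time (hours) and cost ($) for the same workload spread over
    `cores` cores, assuming perfect per-file (embarrassingly parallel) scaling."""
    return CORE_HOURS / cores, CORE_HOURS * CORE_HOUR_RATE

hpc_time, hpc_cost = wall_time_and_cost(SERIAL_CORES)   # one HPC node
cloud_time, cloud_cost = wall_time_and_cost(2000)       # assumed cloud burst

print(f"HPC node: {hpc_time:.0f} h  for ${hpc_cost:.0f}")
print(f"Cloud:    {cloud_time:.1f} h for ${cloud_cost:.0f}")
```

Under these toy assumptions the bill is identical in both cases; the only thing that changes is whether you wait six days or a couple of hours. Real scaling is never perfectly linear, of course, which is part of what makes the preprint's measured results interesting.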

Again, very preliminary, but you should be excited, because you know someone who would like to talk to you about their 5,000 FFPE blocks for proteomics and you can only avoid them for so long. Pretty cool to know that someone is thinking about a bottleneck you haven't gotten to yet!
