Sometimes this blog is just what I learn as I work through something new for myself - and this is clearly one of those posts.
One thing that was not emphasized nearly as much as it could have been during my interviews at Pitt was the absolutely amazing, world-class High Performance Computing (HPC) cluster that we have.
It took a little work and me bugging colleagues with dumb questions, but I've got some workflows going that need a lot of firepower!
Namely things like FragPipe open search - and R packages that almost inevitably require ludicrous amounts of RAM.
Things I've learned so far:

1) The time it takes to get my data to the HPC can be a bottleneck worth considering. My TT Ultra2 is generating around 160GB of data/day right now - around 1.5GB per single cell and closer to 4GB for libraries and QC samples, which seems to average out pretty close to 160GB. Transferring 1 day of files to the HPC seems to take around 1-2 hours. Not a big deal, but something to consider if you're the person prepping samples, running the instruments, writing the grants and papers, writing blog posts and picking your kid up on time from daycare every day. Worth planning those transfers out (there's a quick back-of-the-envelope transfer sketch after this list).
2) NOT ALL PROGRAMS YOU USE WORK IN LINUX. FragPipe, SearchGUI/PeptideShaker, and MaxQuant are all very, very pretty in Linux. Honestly, they look nicer and probably run better than they do in Windows. DIA-NN will run in Linux, but you do lose the GUI - you have to go command line. But what you can do is set up your runs in the GUI and then export those from DIA-NN. Maybe I'll show that later.
3) You may need to have good estimates of your time usage. In my case I currently get a 50,000 core hour allotment. If I am just doing 80 FragPipe runs, I need to think about (the cores I need) x (the number of hours I need those cores) - there's a quick core-hour sketch after this list. I can't request more than 128 cores simultaneously right now (for some reason, yesterday I could only request 64 with FragPipe - I should check). But if I need 128 cores for 10 hours, that's 1,280 core hours I will blow through.
Since MSFragger itself is ultra-fast, while match between runs and MS1 ion extraction are slower and cap out at fewer cores per file, there isn't much of a difference on a small dataset whether I use 32 cores or more. The bottleneck steps aren't the ones that keep scaling up forever.
4) Things that are RAM dependent may be WAY WAY FASTER. I think we scale to 8GB of RAM/core on our base clusters here, so 32 cores gives me 256 GB of RAM (see the RAM math in the sketch after this list)! If your program normally has to lean on disk read/writes to offset a lack of RAM, or can put every bit of RAM it finds to maximum effect, those things can be much, much faster.
5) Processes that depend on per-core clock speed may be slower. As a test, I gave FragPipe 22 the same 2 LFQ files with 14 cores on a desktop in my lab and 14 cores on the HPC. Unsurprisingly, the desktop came out ahead - you can really crank up the GHz on a desktop PC, whereas it makes sense to run lower clock speeds when you have 10,000 cores sitting around.
6) You probably need help with installations and upgrades. Most of us are used to that by now, though. I can upgrade my lab PCs to FragPipe 23 today; on the HPC I need to put in a service request to have someone upgrade me.
7) You may have to wait in line. I tried to set up some FragPipe runs before bed and requested the HPC allotments. Then I dozed off in my chair waiting for my turn, and when I woke up the clock had already started ticking. I wasn't using my cores, but I had blocked them so no one else could use them, so they did count against me.
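For point 1, here's the kind of back-of-the-envelope math I mean. This is just a sketch - the 160GB/day figure is from my instrument above, but the sustained throughput is an assumption you'd swap out for whatever your own transfer to the HPC actually gets:

```python
# Back-of-the-envelope transfer time for one day of TT Ultra2 output.
# The ~160 GB/day is from above; the throughput is an assumption - measure your own.

GB_PER_DAY = 160          # daily instrument output from above
THROUGHPUT_MB_S = 30      # assumed sustained transfer rate in MB/s

def transfer_hours(gb, mb_per_s):
    """Hours to move `gb` gigabytes at `mb_per_s` megabytes/second."""
    return (gb * 1024) / mb_per_s / 3600

print(f"{transfer_hours(GB_PER_DAY, THROUGHPUT_MB_S):.1f} hours")  # ~1.5 hours at 30 MB/s
```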
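For point 3, the core-hour math is simple, but it's easy to forget how fast it adds up against a 50,000 core hour allotment. Again, just a sketch using the numbers from above:

```python
# Rough core-hour budgeting against the 50,000 core hour allotment mentioned above.
# The 128 cores x 10 hours is just the example from point 3.

ALLOTMENT = 50_000   # core hours in the allocation

def core_hours(cores, hours):
    """Core hours burned by holding `cores` cores for `hours` wall-clock hours."""
    return cores * hours

job = core_hours(128, 10)   # 1,280 core hours for one big run
print(job, "core hours per run")
print(ALLOTMENT // job, "runs like that before the allotment is gone")  # ~39
```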
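And for point 4, here's what the RAM-per-core scaling works out to, assuming the ~8GB of RAM/core figure I think our base nodes use (check your own cluster's documentation for the real number):

```python
# RAM you get along with the cores you request, assuming ~8 GB RAM/core
# on the base nodes (an assumption - your cluster may scale differently).

GB_RAM_PER_CORE = 8

for cores in (14, 32, 64, 128):
    print(cores, "cores ->", cores * GB_RAM_PER_CORE, "GB RAM")
# 32 cores -> 256 GB RAM, which is the number from point 4
```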
I'll probably add to this later - but I highly recommend this recent study out of Harvard, which has been my go-to guide.