Saturday, August 24, 2019

BOLT -- A Scalable Cloud Based Search Engine with an easy GUI input/output!


Full Disclaimer up front: Two papers featuring the Bolt search engine were recently accepted, one on the engine and one pushing the crap out of someone else's "Cloud computers" to look at EVERY currently known cancer mutation that alters human protein sequences in a bunch of files, and I'm an author on both. Somehow I ended up last, but that was clearly just a nod to the fact that I'm, by far, the oldest contributor. It'll happen to you one day. "Oh no, look how hard it is for Ben to get out of his chair after all those knee procedures....it's so sad...let's put him in as the senior author...."

To unnecessarily clarify my position on the papers: The Bolt engine is the invention of OptysTech. (And if you don't want to take my word for it and read all these poorly written words, just go here and contact them for a demo of the software!) Conor and I were lucky enough to get involved in this project and provide the comparison data (we'd search the files on different software, mainly through PD, and then compare it to Bolt output) and feedback on the input/output and biological interpretation of the data. OptysTech has given me nothing to write this blog post or the papers (see disclaimers page over there somewhere --> no one wants my irate responses, space on this blog is not for sale, fortunately...who the heck would want it?), except for access to the beta and demo versions of Bolt, and I guess they paid the publication fees on the manuscripts if the journals charge them. I do subscribe to their other great software package, Pinnacle, and pay the normal annual fees. And, to be honest, I think I forgot to pay them for this year, meaning that my license is probably expired now (...wait...it's August...?..oh...yeah, definitely expired...), and I ought to look into that and get them a P.O....cause there is a ton of DIA data we need to look at, and Pinnacle is my preferred way of looking at DIA data due to the cool thumbnails that allow you to QC hundreds of peptides by eye almost instantly.

END DISCLAIMERS. Start cool stuff!

We knocked out this preprint on Bolt a few months ago. Like many preprints, it improved a ton during what ended up being a really positive peer review process (more engines are compared, and there's a lot more exploration of FDR).


Using "Cloud Computing" isn't a crazy new idea or anything. Everyone is using the Cloud for everything else. Big clouds like Amazon Web Services are said out loud in so many places by so many people that I now think when someone starts saying the word "always" that they are starting to say "AWS." Using the Cloud for proteomics isn't a new concept either. This team set up the Trans Proteomic Pipeline to run on AWS 4 years ago and search over 1,000 files in 9 hours. I'm not going to read through the paper, my thoughts are on this dumb blog somewhere, but I remember thinking it was amazingly inexpensive. Dollars or cents per file.

Great proof of concept, with 2 major flaws
1) I can hardly figure out the TPP on my desktop (I'm dumb)
2) I have exactly zero chance whatsoever of setting that up myself in my lifetime.

So when a longtime friend offered to show me a rough Cloud interface his team was working on that I could actually use? Yes. Sign me up.

Bolt is a commercial product and I haven't run a search on it in a while, so it might be a little different now that it's available -- but it was already in an interface that I knew exactly how to use. I load my data into Pinnacle -- Pinnacle spends a few minutes converting the file, exporting it to the Cloud, and it's done. The first time I saw it, there was some showmanship involved. Like "pick any human file on your PC and load it into this new box in Pinnacle -- and -- let's talk about something else for a minute -- BOOM that popup is your completed file searched against every mutation in this huge library, and I threw in several PTMs."

I made this picture that sums it up, I think. Our first pressure test started with just single files (HeLa we got off ProteomeXchange or something) and then we loaded as many sequences as we possibly could into it and kept adding modifications to see what would break it. Turns out we could load every sequence we could find!

Behind the scenes --


(--I've always liked pointing this out...) -- but behind Bolt is something called an Azure. Around my eyes glazing over when the younger authors explain these things, I have absorbed the interpretation that this is Microsoft's equivalent to AWS.

This Azure Cloud thing can, apparently, scale according to the demands put upon it. Therefore, if you do something stupid, like load up the entire NCI-60 proteome project and then search that against every mutation in COSMIC (for example; btw, COSMIC is free for academic use and you should check with them if you're going to use it for non-academic use) and then throw in 30+ PTMs and partial cleavage (which -- now that we've really taken a look at -- there are an awful lot of....), Bolt isn't sitting around for days thinking about it. Bolt just magically (from my perspective) uses a ton more cores and memory and things (I'd assume Microsoft's power bill goes up? which I guess probably bills OptysTech more? Magic!) and you get your output in just about the same amount of time as you do for a single file.

I say "just about" with this caveat. Bolt is much faster at my work with the super speed internet than it is on Holiday Inn WiFi. You've got to get the files there and get the interpreted data back. The files are converted before going up and integrated when they return. However, there isn't much of a difference whether I'm using a laptop or my PC tower. The conversion is fast and it might be a little faster on my big tower, but that's it.
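Here's a back-of-the-envelope sketch of that caveat. To be clear, this is my toy model, not anything from Bolt or OptysTech: the function name, the file sizes, the link speeds, and the per-file search time are all made-up assumptions. The only ideas taken from the text above are that the total time is upload plus search plus download, and that the Cloud side can add workers as the job grows.

```python
import math


def turnaround_seconds(n_files, file_mb, result_mb, bandwidth_mbps,
                       search_s_per_file, workers):
    """Rough end-to-end time in seconds: upload + search + download.

    Hypothetical model; every parameter is an illustrative assumption.
    """
    # Megabytes -> megabits (x8), divided by link speed in megabits/second.
    upload_s = n_files * file_mb * 8 / bandwidth_mbps
    download_s = n_files * result_mb * 8 / bandwidth_mbps
    # Files are searched in "waves"; each wave handles `workers` files at once.
    waves = math.ceil(n_files / workers)
    search_s = waves * search_s_per_file
    return upload_s + search_s + download_s


# Made-up scenario: 60 converted files of 500 MB each, 50 MB of results per
# file, 2 minutes of search per file.
job = dict(n_files=60, file_mb=500, result_mb=50, search_s_per_file=120)

# "Elastic" Cloud: one worker per file, so the search takes a single wave.
office = turnaround_seconds(bandwidth_mbps=1000, workers=60, **job)
hotel = turnaround_seconds(bandwidth_mbps=10, workers=60, **job)
# Same fast link, but a single fixed worker chewing through files one by one.
one_box = turnaround_seconds(bandwidth_mbps=1000, workers=1, **job)

print(f"fast link, elastic workers: {office / 60:.1f} min")
print(f"hotel WiFi, elastic workers: {hotel / 60:.1f} min")
print(f"fast link, single worker:    {one_box / 60:.1f} min")
```

With numbers like these, the search step stays the same size however many files go up, as long as the worker count grows with the job, while the slow link turns the upload and download into nearly the whole wait. Swap in your own file sizes and connection speed to see where your bottleneck would land.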

And -- this is a serious perk -- all the quan is done and interpreted in the same informative interface in Pinnacle that I'm used to, where little thumbnails for all the signals used to generate that numeric ratio are visible and I can go right into them to examine if one of the thumbnails looks funky. For me this is a huge advantage. I've only got so much space left in my brain to learn new software. I already use Pinnacle. I mostly kinda know how to use it! If you're a Pinnacle subscriber already, there's almost nothing to learn!

At ASMS this year I saw a couple new Cloud proteomics technologies on posters. Our data is getting so large that it's inevitable, right? But it has taken us an awful long time to get here compared to every other field in the world. (Given there are over 1,000 proteomics software packages out there, who knows, maybe there have been easy Cloud engines for a while. I can't keep track of 4 dogs and where they pooped at the park all at once -- 1,000 software packages?? But this was definitely the first I've ever seen.) If nothing else, Bolt is a great proof of concept that you can have easy-to-use GUI software with powerful visual output without sacrificing behind-the-scenes power.

That's a lot of words, I know, this is a thing I've wanted to talk about for quite a while!

Worth noting -- the last I checked, Bolt could only search human data, but that's just cause they have to load the FASTAs on the back end.

And -- I've totally got to point this out --- there is a lot of proteomics software you can buy out there -- and I've been using Pinnacle for quite a while, in part, due to this page on their website:

I spent a lot of time contracting for the US government and they love price level caps. You can order a single nanospray column without getting permission from anyone because it's under your personal spending cap. However, you need to provide a written justification for why 6 columns for $3,502 is a better deal than buying 3 columns for $2,200 now and then repeating it later. Software is even worse. You have to find where the IT guys are playing Fortnite (or whatever) and get one of them to sign it. So...software you can customize and lease for an amount that doesn't require 90 minutes of tracking Dorito crumbs and body odor through your building's sub-basement? Intrinsically valuable and, admittedly, what first drew me in and got me hooked on this powerful software.

I don't know what Bolt will cost, but if OptysTech's other software is any indication, I don't think they're going to try and use it to buy an island or anything....
