Wednesday, October 11, 2023

SpectroScape -- Real time spectral query and visualization in proteomics!

The outside world (outside of proteomics) is generally a lot better at computer things than we are. We're a tiny little industry compared to things like social media where inserting an ad for a heated toilet seat in-between someone's wedding photos can simultaneously trigger revulsion and $100 million in bathroom product sales. 

A great strategy for a tiny innovative industry that's okay with borrowing ideas is to borrow everything we possibly can! And SpectroScape is a superb example!

SpectroScape borrows a central piece of Facebook tech, a similarity search library called FAISS. Here is how the authors describe it:

"This is made possible by an indexing scheme based on the inverted file and product quantization encoding (IVF-PQ) algorithm in the Facebook AI Similarity Search (FAISS)22 library that groups spectra in neighborhoods in high-dimensional space, defined by approximate spectral similarity. Given any query spectrum that the user supplies, the method efficiently retrieves all its approximate nearest neighbors in the entire repository and performs real-time spectrum clustering by computing accurate pairwise distances among the query and the neighbors to reveal any cluster(s) in the neighborhood. The user can then visualize the result in an interactive web-based user interface. " 
The end result is that you can hunt down, in real time, a spectrum matching one that you're interested in -- OR -- one that is SIMILAR TO the one you've got, across something as big as MassIVE.
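
To make the "approximate neighbors first, accurate distances second" idea a little more concrete, here is a toy sketch in plain Python. To be clear, this is NOT the SpectroScape code and it does not call FAISS. Everything in it is my own assumption for illustration: the 1 Da binning, the cosine similarity, the randomly picked centroids (real IVF indexes train them), and names like ToyIVFIndex. It also skips the product quantization (PQ) compression step entirely and only shows the inverted file part plus the exact re-ranking.

```python
import math
import random


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Turn (m/z, intensity) peaks into a unit-length sparse vector {bin: value}."""
    vec = {}
    for mz, intensity in peaks:
        if 0 <= mz < max_mz:
            b = int(mz / bin_width)
            vec[b] = vec.get(b, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


class ToyIVFIndex:
    """Hypothetical, simplified inverted-file index: no training, no PQ."""

    def __init__(self, vectors, n_lists=3, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Assumption: centroids are just randomly chosen library spectra.
        self.centroids = rng.sample(vectors, n_lists)
        self.lists = [[] for _ in range(n_lists)]
        # Each spectrum goes into the list of its most similar centroid.
        for i, v in enumerate(vectors):
            self.lists[self._nearest_lists(v, 1)[0]].append(i)

    def _nearest_lists(self, v, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: cosine(v, self.centroids[c]),
                       reverse=True)
        return order[:n]

    def search(self, query, k=3, nprobe=2):
        # Stage 1: only look inside the nprobe closest neighborhoods.
        candidates = [i for c in self._nearest_lists(query, nprobe)
                      for i in self.lists[c]]
        # Stage 2: exact similarity on that short candidate list only.
        scored = sorted(((cosine(query, self.vectors[i]), i) for i in candidates),
                        reverse=True)
        return scored[:k]


# Made-up peak lists, purely for illustration.
library = [
    [(175.1, 50.0), (288.2, 100.0), (401.3, 80.0)],
    [(175.1, 45.0), (288.2, 95.0), (401.3, 85.0), (530.3, 5.0)],
    [(147.1, 100.0), (262.1, 60.0), (375.2, 40.0)],
    [(147.1, 90.0), (262.1, 70.0), (489.3, 30.0)],
    [(120.1, 100.0), (233.2, 20.0), (650.4, 70.0)],
    [(120.1, 80.0), (650.4, 90.0), (777.4, 10.0)],
]

index = ToyIVFIndex([bin_spectrum(p) for p in library])
hits = index.search(bin_spectrum(library[0]), k=2)
for score, idx in hits:
    print(f"library spectrum {idx}: cosine {score:.3f}")
```

The point of the two stages is that the expensive exact comparison only ever touches the handful of spectra living in the probed neighborhoods, not the whole repository. Turning nprobe up trades speed for a lower chance of missing a true neighbor that landed in a different list.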

The authors have a lot of big plans for something that allows data analysis/clustering/visualization at this kind of scale, but the first thing that pops out to me is the whole dark proteome thing and how valuable this can be.

Imagine if we flipped proteomics sideways and did it the way we do metabolomics most of the time. What we do there is basically say "this ion comes off my column at 2.3 minutes and has this mass and fragment pattern and it is important under these conditions. No fucking idea what it is, but it is clearly involved." Then you go and try to figure out what that thing is. I accidentally agreed to do another natural product discovery study because an earlier one, which I was sure had no chance in hell of working, ended up being one of the biggest papers I've ever been on. That's how you do that as well. Here is this peak, wtf is it? I commonly wonder if we're just being dense doing proteomics the opposite way.

This is something I think could be enabled by this truly awesome new resource. The authors also envision the almost instantaneous generation of new spectral libraries and the integration of new data into our comprehensive knowledge of the proteomes of life on this planet, and that's pretty cool too.
