Wednesday, August 30, 2023

SCP Viz - A universal toolkit for single protein analysis in single cell proteomics data!

 


Hahahahahaha! Another self-serving post already? 

Not entirely, please read to the bottom! 

Okay -- this is something that I really desperately needed since I started doing single cell proteomics basically full time (around mentoring, grants, trying to keep the mass specs in my lab NOT UNDERWATER, trying to convince biologists that mass spectrometry COSTS MONEY EVEN WHEN WE COLLABORATE) and other fun things. 

And Ahmed Warshanna wrote it for me! Early print here. 

https://www.biorxiv.org/content/10.1101/2023.08.29.555397v1

Imagine this scenario -- you do a silencing RNA experiment or treat cells with a drug that should drop the abundance of a protein and you just want to figure out how well that actually dropped the protein abundance across a couple hundred single cells. Try visualizing that shit with anything you typically use. (Actually PD does a decent job until you get above like 800, but nothing else really does). I've made heatmaps using conditional formatting in Excel from the FragPipe and DIA-NN output, and that sort of works, and I can drop that report into GraphPad and wait a week for it to make what I want. I came up with a kludgey use of relational databases through GlueViz in Anaconda (which I do honestly still use, but it's clear there is some facet of my particular neurochemistry that makes this compatible with me, and even close friends trying to be supportive are really wondering what all that acetonitrile inhalation has done to me) and I don't think it has EVER been downloaded from the GitHub by anyone.

Introducing SCP Viz!  

SCP Viz lets you put in your SCP data from Proteome Discoverer or FragPipe or Spectronaut (and everything else; let us know if it doesn't work with yours. The preprint features scSeq data as well!) and put in that protein you "silenced" and visualize how it changed. You can view either the raw abundance/intensity from the software (or the normalized data if you had FragPipe MaxLFQ it or whatever) across every single cell, or the box plots or histograms or the violin plots that nerdy reviewers seem to like and that I literally do not understand at all. You can log transform your data or impute it, and then you can use a sliding scale for your background imputation (I'd rather just see zeroes personally, but people do impute for whatever reason). But here is the coolest part.
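Quick aside for anyone wondering what "log transform plus background imputation" boils down to. Here is a minimal Python sketch. To be clear, this is NOT the SCP Viz code (that runs in R); the function name, the `background` parameter standing in for the slider, and the intensities are all made up for illustration.

```python
import math


def log2_with_background(intensities, background=1.0):
    """Log2-transform intensities, replacing missing or non-positive values
    with a user-chosen background value (a stand-in for the slider)."""
    if background <= 0:
        raise ValueError("background must be positive")
    transformed = []
    for value in intensities:
        if value is None or value <= 0:
            value = background  # impute "not detected" with the background
        transformed.append(math.log2(value))
    return transformed


# Made-up intensities for one protein across five cells; None = not detected.
cells = [1024.0, None, 4096.0, 0.0, 256.0]
transformed = log2_with_background(cells, background=16.0)
```

Slide `background` up or down and the imputed cells move with it while the measured ones stay put, which is roughly why I'd personally rather just see the zeroes.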

Imagine that you've got 50 cells where you've still got background expression of your protein. Your "silencing" doesn't appear to have silenced anything or your inhibitor doesn't seem to have inhibited your protein. What makes those cells special? 



Use a box select or lasso select on the cells exhibiting the WT or higher abundance (random protein shown, I gotta publish some things), then Unleash that Analysis (that was a compromise; it used to say F' it), wait a minute till R does all the things in the background, then download a file that only contains those special cells with that higher than normal protein abundance!
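If you strip away the pretty plot, a box select along the abundance axis is just a range filter. A rough Python sketch, assuming made-up cell IDs, made-up abundances, and a hypothetical `wild_type_level` cutoff (none of this is taken from the SCP Viz source):

```python
def box_select(abundance_by_cell, lower, upper=float("inf")):
    """Return the IDs of cells whose abundance falls inside [lower, upper]."""
    return [
        cell
        for cell, value in abundance_by_cell.items()
        if lower <= value <= upper
    ]


# Made-up abundances of the "silenced" protein in four cells.
abundances = {
    "cell_01": 120.0,
    "cell_02": 950.0,
    "cell_03": 80.0,
    "cell_04": 1010.0,
}
wild_type_level = 900.0  # hypothetical cutoff for "WT or higher"
escapers = box_select(abundances, lower=wild_type_level)
```

A lasso does the same job with a freehand shape instead of a rectangle.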

Now you can import that CSV with your subpopulation of cells and really dig into what makes them special. Are those stupid things all twice the size (histone H4 signal is off the charts?) Are they using a pore to pump out the silly RNA or drug? Now you can actually get to that data. I've been copying my headers across and then loading the whole text file back in with the important cells flagged, so I've got multiple populations to look at.
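The flagging step doesn't have to be copy and paste. Here's a small Python sketch of the same idea, assuming both the full table and the downloaded subset share a cell identifier column. I'm calling it `cell_id`, which is a made-up name; your export will have whatever headers your search software wrote, and the tiny tables below are invented.

```python
import csv
import io


def flag_cells(full_rows, selected_ids, id_column="cell_id", flag_column="selected"):
    """Return copies of the full-table rows with a yes/no flag marking
    the cells that appear in the selected subset."""
    selected = set(selected_ids)
    return [
        {**row, flag_column: "yes" if row[id_column] in selected else "no"}
        for row in full_rows
    ]


# Invented stand-ins for the full export and the downloaded subset.
full_csv = "cell_id,protein_x\ncell_01,120\ncell_02,950\ncell_03,80\ncell_04,1010\n"
subset_csv = "cell_id,protein_x\ncell_02,950\ncell_04,1010\n"

full_rows = list(csv.DictReader(io.StringIO(full_csv)))
subset_ids = [row["cell_id"] for row in csv.DictReader(io.StringIO(subset_csv))]
flagged = flag_cells(full_rows, subset_ids)
```

Swap the `io.StringIO` bits for real file handles and you've got one table with both populations marked.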

GitHub is up and we actively (desperately) need users and collaborators. We found a funny glitch after submitting to bioRxiv with TMT data we downloaded from the cool TMT cell death study, and a fix is in the works. It's a living, breathing document/program that will improve a lot as we test more data.

And here is why this post isn't entirely self-serving and why I pushed this out while we're still doing some bugfixes. 

Like just about any American kid who didn't luck into being born rich, the author of this software had to borrow a lot of money to go to college. They'll hand you tons of money to cover the spiraling out of control costs of higher education in my country (last I looked a year of undergrad at the school where I work was pretty darned close to what I take home per year....)

And thanks to a corrupt pile of garbage in power, and especially this trash


everyone has to start paying their student loans back starting on September 1st. We can bail out corrupt bankers, forget to tax billionaires, spend trillions on bullshit wars to murder millions of people - as long as they're darker in complexion and far away -  but we can't help an entire generation of young people in this country. And the author of this software can't afford to work for me for the ridiculously low wages that I'm allowed to pay him out of my grants when he has to start paying his loans back. 

SO if you see an application for a young trainee in protein informatics cross your desk, do me a favor and give it a serious look even if it isn't the number of years of experience you were hoping for. Ain't the first paper with his name on it that you'll see in the very near future. 

1 comment:

  1. Gotta say that unleash the analysis is the best button text i have ever seen
