Tuesday, February 20, 2018

Do you have a data processing task that sounds impossible? Perseus time!

It's NBA All Star Weekend here in the U.S.A., and it's a big enough deal that in my rural community we get a school holiday for it. From the sounds I can hear from my yard, I think that nearly all of the local children are celebrating this respite from arithmetic by firing semi- and fully-automatic weapons. I'm exaggerating. I'm sure there are also untrained adults out there with military grade weapons. I like to hope there are two distinct groups of people who go down my middle-of-nowhere dirt road: the group with the machine guns and the group that throws all the "Lite" beer cans out the windows of their vehicles. What can I say? I'm an optimist!

Around all this revelry I somehow have found time to start checking something critical off of my bucket list: finally taking a look at where MaxQuant and Perseus are today. And...I feel kinda dumb...

I'm going to start with Perseus. If you don't have this one on your desktop and you have any intention of doing an analysis that is more than peptide ID, you should go here and register (it's free, of course) and get the newest version on your desktop.

Am I always telling everyone to download all sorts of software? Probably. I should justify this.

The current iteration of Perseus can do everything you've ever wanted to do with a complicated proteomics or transcriptomics dataset.

It can process your data through logical and hierarchical filters (and it allows you to export your data at every point in the step-by-step process. NOT JUST AT THE END). Think about how useful this is for a second. If your workflow looks like poop at the end, you can go back through your data manipulations and look at the report at each step. You can find out exactly where you took that beautiful mass spec data and messed it up.
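If the "export at every step" idea is hard to picture, here is a minimal sketch of it in plain Python. To be clear, this is not how Perseus is implemented and none of these names come from Perseus: the table, the filter names, and the `run_steps` helper are all made up for illustration. It just shows why keeping a snapshot after each filter makes it easy to see where rows disappeared.

```python
# Hypothetical protein table; the names and values are invented for illustration.
rows = [
    {"protein": "P1", "peptides": 5, "ratio": 2.0},
    {"protein": "P2", "peptides": 1, "ratio": 0.5},
    {"protein": "P3", "peptides": 3, "ratio": None},
]


def run_steps(table, steps):
    """Apply each (name, keep_function) filter in order.

    Returns a list of (step_name, table_after_step) snapshots,
    starting with the untouched input.
    """
    snapshots = [("input", list(table))]
    current = list(table)
    for name, keep in steps:
        current = [row for row in current if keep(row)]
        snapshots.append((name, list(current)))
    return snapshots


# Two made-up filters, applied one after the other.
steps = [
    ("at least 2 peptides", lambda r: r["peptides"] >= 2),
    ("has a ratio", lambda r: r["ratio"] is not None),
]

# Print what survived after every step, not just the final result.
for name, table in run_steps(rows, steps):
    print(name, [r["protein"] for r in table])
```

Each snapshot could just as easily be written out to a file, which is roughly what exporting the report at each point in the workflow buys you.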

It also allows single-step, insanely powerful manipulations of your data. Example: Imagine that, out of the sheer goodness of your heart, you have taken on the data processing of a huge clinical proteomics cohort in a virtually unknown disease. Imagine that this study had the most rigorous QC methodology anyone has ever done for a proteomics study (I didn't do that part. holy cow. the team that did is good. wait. this is hypothetical). Also imagine that you have delivered 16 LFQ reports and everyone is really annoyed that you did Control/DiseaseState, rather than DiseaseState/Control (it's clinical, this is a bunch of MDs), and recreating those 16 Consensus reports would take more than all the goodness that has, or ever will, exist in your heart.

Perseus? Just pull in all the tables for all the values and hit the Transform button. Type 1/x and export the report.
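For anyone who wants to see what that 1/x transform is doing to the numbers, here is a small sketch in plain Python. The ratio values and the `invert_ratios` name are made up for illustration, and how missing or zero values should be handled is my assumption, not something taken from Perseus.

```python
def invert_ratios(ratios):
    """Return 1/x for every ratio.

    Missing values (None) and zeros are returned as None,
    since 1/0 is undefined. That handling is an assumption.
    """
    return [None if r is None or r == 0 else 1.0 / r for r in ratios]


# Hypothetical Control/DiseaseState ratios for four proteins.
control_over_disease = [2.0, 0.5, 4.0, None]

# The same proteins expressed as DiseaseState/Control.
disease_over_control = invert_ratios(control_over_disease)
print(disease_over_control)
```

One thing to watch: if the ratios have already been log2 transformed, flipping the direction means flipping the sign of each value, not taking 1/x.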

I think I literally or figuratively (I get those mixed up) just chose the absolute least powerful thing that you can do with Perseus as an example because it saves me 16 hours of Consensus workflow processing.

What if you have a bunch of SILAC experiments that were done a few years apart and someone realizes that these would be perfect for comparing the light labeled version of 3 of them from the 2011 study to the heavy standards done last year? Sounds like a nightmare, right? There are 10 ways you could do this (PD could do it) but Perseus is actually designed to do it. That's kind of what it is for. There are tutorials specifically made to address this!

If you are thinking -- "wait. aren't you really hard on MaxQuant and Perseus in this blog?" Yeah. Totally. I can't remember even 1% of what I've written on this site, but I think that all of the criticism has been regarding how challenging the software is for beginners or for simple experiments. My first favorable comparison of the two software packages was when PD 1.2 (I believe) could get me the same results the version of MaxQuant did at the time but could do it with a simple saved template that I could generate results from just by hitting the "Play" button. PD has grown up a lot and it is the software I will go to every time (my lab has like 7 licenses and Mascot! w00t!). But if you have something nuts --like -- absolutely nuts -- you may enjoy your life a lot more if you go to software that can do something like this.

This is a multiscatter plot showing the Spearman correlation coefficients for the quantification of 9 different cell lines versus one another. The coefficient is overlain on each plot, and the orange marks one set of proteins selected in a single plot, carried over to show where that same set of proteins falls in EVERY OTHER SAMPLE SET. Is there a set where your proteins of interest are not showing up in the low ratio range? It's easy to find that plot and highlight it; it becomes the active plot, and then you can examine the proteins manually.
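If you haven't met the Spearman coefficient before, it is just the Pearson correlation computed on the ranks of the values instead of the values themselves. Here is a rough sketch in plain Python, assuming tied values get their average rank (the usual convention). The function names and the two example vectors are hypothetical; this is not the code Perseus runs.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of tied values starting at i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up protein intensities for two hypothetical cell lines.
cell_line_a = [1.2, 3.4, 2.2, 8.9, 5.1]
cell_line_b = [0.9, 2.8, 3.0, 7.5, 4.4]
print(round(spearman(cell_line_a, cell_line_b), 3))
```

Because it only looks at rank order, the coefficient doesn't change if you log transform the intensities first, which is handy when two samples are on very different scales.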

Now -- I have to be honest. I haven't done these plots. I stole them from last year's MaxQuant summer school lectures. But -- this is important -- I'm giving it a go right now -- and I'm just feeding Perseus PD data. I want to do something that is tough and time consuming in PD, so I'm just feeding it into Perseus. Oh -- and I'm also giving Perseus transcriptomics data, too. Cause Perseus doesn't really care what it's looking at, so long as you tell it the right format!

If I convinced you to also give up your next holiday to learn Perseus, I recommend you take the time and start here.

Part II is here:

And Part III (my personal goal for today): This is the video where Dr. Tyanova shows all the clustering!!

As an added bonus, Dr. Geiger is really funny. You have to really be paying attention to catch it and I suspect if you are replaying the video and pausing it while trying to replicate her live data manipulations it's easier to catch her subtle jokes than if you are sitting in the audience. Or the summer school participants are just really serious (as they should be). You may find yourself looking around and wondering why no one else laughed and then realize you're in your office and there is just a sleeping dog and it's 5pm and you haven't had breakfast and maybe low blood sugar makes you laugh at things no one else laughs at. Who knows? I prefer to think that Dr. Geiger is really funny.

Yes, I just suggested you watch 3 hours of videos and work along with these awesome operators to learn Perseus. This much power doesn't come for free! There are other resources as well. There is this great recent paper, and there are great focused tutorials and (non video) use cases here at www.coxdocs.org.
