Sunday, August 11, 2019

Correlation tools for coregulation(?) analysis!

This is another win from the awesome scientific tool that is Tweeter. I wish there was one just for science, though...

I'm studying two things right now where we're paving new ground. One is human brains and the other is plant material. I can't go to Ingenuity Pathways to help me interpret my data (yes, there is brain stuff there, but no one has ever seen what we're looking at, and no established pathways line up with our differential proteins. This is surprisingly common, by the way. Overlaying your data on established networks is starting to seem less powerful to me every time I do it.)

Last year I saw a talk by Wilhelm Haas that has stuck with me for a load of reasons. One of those reasons is that it seemed like coregulation is what his team finds most important in cancer. It's a simple idea that I'd almost stumbled onto blindly myself, but it is now a central thought in every study I do.

Here is the idea ---> what is important is the proteins that all go up or down in abundance at the same time. Wait. Was that exactly what you were always doing anyway? Of course! But here is where it diverges from my thinking. I get a list of upregulated proteins and a list of downregulated proteins and I try to overlay those lists on an established pathway. Seriously, the more I think about this the more dumb I feel... What if I've got a new pathway no one has seen before? Did I just bias my results toward old data that might not be relevant? What about the cool new proteins that changed? Where did they go? Did they just get ignored because they don't line up with the pretty Ingenuity figures...?

By treating the list of coregulated proteins as intrinsically important we might be able to find the pathways ourselves! Should I keep typing if every word makes me seem even more clueless than the last?

Bernard (not named after Dr. Delanghe, despite what you might have heard) struggling to escape a koala costume (best picture we could get. He's grumpy for a Belgian Terrier) puts things in perspective for me and I feel much better!

Okay -- back on topic. I'm building new pathways where they've never been found before. Time for coregulation analysis!
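For anyone who wants to see the coregulation idea in code, here is a minimal Python sketch of "proteins that go up or down together": correlate every protein against every other protein across the samples and keep the pairs that track each other closely. Everything in it is an assumption for illustration (the function name `coregulated_pairs`, the 0.9 cutoff, and the made-up toy numbers); it is not what Dr. Haas's team does, just one simple way to get started.

```python
import numpy as np
import pandas as pd


def coregulated_pairs(abundance: pd.DataFrame, min_r: float = 0.9) -> pd.DataFrame:
    """Return protein pairs whose abundances rise and fall together.

    abundance: rows = proteins, columns = samples (hypothetical layout).
    min_r: illustrative Pearson cutoff, not a recommended value.
    """
    # DataFrame.corr correlates columns, so transpose to correlate proteins.
    corr = abundance.T.corr(method="pearson")
    names = corr.index.to_numpy()
    # Upper triangle only, so each pair appears once and self-pairs are skipped.
    i, j = np.triu_indices(len(names), k=1)
    pairs = pd.DataFrame(
        {
            "protein_a": names[i],
            "protein_b": names[j],
            "pearson_r": corr.to_numpy()[i, j],
        }
    )
    # Positive correlation only: proteins moving in the same direction.
    keep = pairs["pearson_r"] >= min_r
    return (
        pairs[keep]
        .sort_values("pearson_r", ascending=False)
        .reset_index(drop=True)
    )


# Made-up toy data: 3 proteins across 4 samples.
toy = pd.DataFrame(
    {
        "S1": [1.0, 2.0, 4.0],
        "S2": [2.0, 4.0, 3.0],
        "S3": [3.0, 6.0, 2.0],
        "S4": [4.0, 8.1, 1.0],
    },
    index=["P1", "P2", "P3"],
)
result = coregulated_pairs(toy)
print(result)
```

In the toy data P1 and P2 climb together while P3 drops, so only the P1/P2 pair passes the cutoff. Anti-correlated pairs are left out on purpose here; whether those belong in a coregulation list is a judgment call.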

Before I made Conor (@SpecInformatics) spend his weekend writing what I wanted, I went to Tweeter and asked for suggestions. Obviously, I could use Perseus for correlation analysis. Or Excel. I don't want to use the former because I'm lazy. I don't want to use the latter because people will make fun of me. Sometimes people in my own home...

Want some powerful correlation analysis tools on your desktop? Check out this Java thing at Metscape. It's called the correlation calculator. Definitely download the text file example if you want to use it so you get the formatting correct.

HeatMapper is a great suggestion. 100% recommended. Web based and easy. It's here. A HeatMapper figure is definitely going in the supplemental of this paper I'm currently putting together...hey, it's my Saturday, I can procrastinate a little....

Just in case you know R or Python, these suggestions were popular solutions as well.

I presume that corrr would be an easy-to-find central package in R. Maybe part of the tidyverse thing I hear so much about.

Payne lab already has a Python package out that utilizes the looping of pandas or something. It's available here.

All great suggestions, and I sincerely appreciate the Twitter Proteomics community for the ones I'm not going to get to here as well.

In the end, I took the data (1,000 metabolites quantified over 12 samples, plus the by-the-book IonStar results from the proteomics of those same 12 samples, about 5,000 proteins) and gave the two lists to Conor. Late on Saturday night I got back a list pairing each metabolite with each protein, with the Pearson and Spearman correlation coefficients and corresponding p-values for each pair.... I don't ask details, I just ask for them to be typed up in the methods section, but I presume that it is similar to the flipping Pandas thing from the CPTAC GitHub above....
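I don't know what Conor actually wrote, but the output described above (Pearson and Spearman coefficients plus p-values for every metabolite/protein pair) can be sketched in Python with `scipy.stats.pearsonr` and `scipy.stats.spearmanr`. The function name `cross_correlate`, the table layout (rows = analytes, columns = samples), and the toy numbers are all assumptions for illustration, and the sketch assumes there are no missing values.

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr


def cross_correlate(metabolites: pd.DataFrame, proteins: pd.DataFrame) -> pd.DataFrame:
    """Correlate every metabolite with every protein across shared samples.

    Both inputs: rows = analytes, columns = sample names (hypothetical layout).
    Assumes complete data; rows with missing values would need handling first.
    """
    # Use only samples present in both tables, in the same order.
    shared = metabolites.columns.intersection(proteins.columns)
    rows = []
    for m_name, m_vals in metabolites[shared].iterrows():
        for p_name, p_vals in proteins[shared].iterrows():
            r, r_p = pearsonr(m_vals, p_vals)
            rho, rho_p = spearmanr(m_vals, p_vals)
            rows.append(
                {
                    "metabolite": m_name,
                    "protein": p_name,
                    "pearson_r": float(r),
                    "pearson_p": float(r_p),
                    "spearman_rho": float(rho),
                    "spearman_p": float(rho_p),
                }
            )
    return pd.DataFrame(rows)


# Made-up toy data: 1 metabolite and 2 proteins over 5 samples.
samples = ["S1", "S2", "S3", "S4", "S5"]
toy_metabolites = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0]], index=["M1"], columns=samples)
toy_proteins = pd.DataFrame(
    [[2.0, 4.0, 6.0, 8.0, 10.0], [5.0, 3.0, 4.0, 1.0, 2.0]],
    index=["P1", "P2"],
    columns=samples,
)
table = cross_correlate(toy_metabolites, toy_proteins)
print(table.sort_values("pearson_r", ascending=False))
```

At 1,000 metabolites by 5,000 proteins that is about five million pairs, so a plain double loop like this will be slow, and the raw p-values are uncorrected for that many tests. Sorting the table by `pearson_r` is how you would pull out the top correlating protein for a metabolite of interest.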

The most accurate Tweet of my summer... And the results are exactly what I wanted!  Data from just 6 samples shown below. My collaborator is very interested in this metabolite. Patient 4? Tons of it!

The top correlating protein from the PD results? 0.999 Pearson?

2 peptides for this protein in sample 4. Spot checking more of them suggests this worked great!

Is this what Dr. Haas meant by coregulation analysis? Maybe? Maybe this song is just a tribute to the greatest data interpretation method in the world. However, I'm not sure I've ever had collaborators as pumped to see a spreadsheet before....
