Saturday, February 22, 2020

Proteomics Data Mining Challenge 2020 is go!

(Previously known as the First Annual News in Proteomics Data Mineathon Challenge)

Some people have signed up from these places* 
(*Of course, participation is voluntary and does not imply, in any way, the endorsement of these institutions.) 
However, this looks pretty impressive, right?  And it's not all of them! 

Participants in the challenge are coming from 5 continents!

Details can be downloaded from the page -- but here is an overview.

Amyotrophic Lateral Sclerosis (ALS), often called Lou Gehrig's disease, fucking sucks.

You can read about it in the New England Journal of Medicine here.

Here is an article in Frontiers that helps elaborate on how things have been going:

There isn't a cure yet and, best I can tell, the diagnostics aren't very good. There are some promising treatments, but they aren't getting out to patients rapidly at all.

Here is one from a company called Brainstorm that is showing promise
as well as
another recent advance from researchers in Houston.

I'm no ALS expert, I'm a loud-mouthed mass spectrometrist, but I've been listening to people about this a lot lately. My understanding is that in some cases there are genomic components, but in some patients there are not -- or they haven't been uncovered.

However, in both cases, like most of these neurological diseases, this is a post-transcriptional problem. It's either proteins or it is post-translational modifications.

No surprise that there has been little in the way of success on it, right? Genetic diseases can be approached with genomics technology that is either mature -- or, well, at least 10 years ahead of protein technology, in most regards -- and definitely when it comes to informatics!

This challenge is a test to see if we can help find something interesting in ALS patient data with today's proteomics data analysis techniques. We're going to use files from this study. (Chorus 1439)

What do we have?

33 cerebrospinal fluid (CSF) samples from patients with ALS
33 matched controls
Plasma samples as well!

However, we've chosen to focus on the CSF for this. (We will NOT turn down plasma data, but the focus is on the CSF.)

It's from Michael Bereman's lab (AutoQC, SProCoP), so the quality is obviously great. QE Plus single shot data. Still a lot to process for PTMs.

I'll put a more formal FAQ up on the page, but these have been some of the questions so far:

Q1) I'm from a software vendor, can I participate?
A1) Fuck yes, you can participate. If you find the important PTMs and have the best data, I will buy your software. That is a promise. I'll go around telling everyone else they should buy it.

Q2) Can I just use a commercial software package I have?
A2) Please see answer A1.

Q3) Weren't you involved in some software development stuff? (No one has actually asked that)
A3) Yes. However, I am not judging in any way. I'm the hype man. Picture DJ Khaled with a more nasal voice, just going "Yo!" and "What!" while you're talking. That is what I'm doing here.

Q4) What is the goal again?
A4) We want to find the most important PTMs that appear linked to the disease. Bereman lab's original study found some interesting proteins. Let's take it beyond that. Let's see if, as a huge -- kinda scary huge -- team, we can find something that can help move ALS research forward.

Official emails will go out to all the participants as soon as I figure out how to get everyone's email addresses into my contacts folder correctly.

1 comment:

  1. I am so excited by this! Downloading now and will begin searches later today!!!!