For anyone unfortunate enough to visit this blog in the past, you might have seen some of my early analysis and reanalysis of a high profile Nature paper in December. I'm not going to put the link here. But the idea - which got mainstream attention - was that we could measure protein abundance in plasma and use that to infer the biological age of different human organs.
Here was me going through it - increasingly appalled by several aspects of it.
As you'll see I was less annoyed by the central premise - that 4x more transcript abundance might mean a protein is organ specific - than I was by the lack of publicly available data - or that the validation was performed by ...measuring RNA......not protein....
That rant got me a really cool interview with the science reporter for the Wall Street Journal and a soundbite in the mainstream press.
Here is the thing, though: being appalled by a seemingly arbitrary and overall rather ignorant set of assumptions about using transcript counts to predict whether a PROTEIN came from a specific organ doesn't mean that all of the results are meaningless. It could be that you really can examine every organ in complete isolation and say "if I count 4x more transcripts for this protein in organ A than in any other organ then that protein might be pretty specific to that organ."
So I thought something like "wow, wouldn't it be great if there existed somewhere in the world actual protein level measurements of different human organs?" Something like these 2 studies that got the cover of this exact same journal 10 years ago?
Or - more convenient -
more recent data that is higher depth and really addresses a lot of the weaknesses in the two articles in this 10 year old thing I love so much I have the cover framed (the artist who made it is super cool and I respect him a lot.)
Imagine this - you use the same cutoff that Oh et al. used - the protein abundance in one organ needs to be 4x more than in any other organ. In isolation. As if every organ is completely disconnected and protein-bearing material doesn't get transferred between them in some sort of an interconnected fluid-based system.
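If you want to see what that cutoff looks like outside of a spreadsheet, here is a minimal sketch. It is not the code from Oh et al. or from my Excel file. The function name, the protein IDs, and the abundance numbers are all made up for illustration, and I'm assuming "4x more" means "at least 4x the next highest organ."

```python
def organ_specific(abundance_by_organ, fold=4.0):
    """Return the organ whose abundance is at least `fold` times every
    other organ's abundance, or None if no organ clears the cutoff."""
    if len(abundance_by_organ) < 2:
        return None  # nothing to compare against
    ranked = sorted(abundance_by_organ.items(), key=lambda kv: kv[1], reverse=True)
    (top_organ, top_value), (_, runner_up) = ranked[0], ranked[1]
    # Beating the runner-up by `fold` means beating every other organ by `fold`
    if top_value > 0 and top_value >= fold * runner_up:
        return top_organ
    return None


# Hypothetical abundances, for illustration only
example = {
    "PROT_A": {"liver": 800.0, "kidney": 150.0, "heart": 90.0},
    "PROT_B": {"liver": 300.0, "kidney": 250.0, "heart": 280.0},
}
calls = {protein: organ_specific(values) for protein, values in example.items()}
print(calls)
```

Here PROT_A gets called liver-specific (800 is more than 4 x 150) and PROT_B gets no call at all. Note that every organ is treated as a sealed box, which is exactly the assumption I'm complaining about.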
What's the overlap between the proteins predicted to be organ-specific between the transcript based data and the proteomic data? (Keep in mind there is not 100% overlap of every target or organ).
Want to follow my step by step analysis? It's in Excel and I tried to make it very clear. It's a bunch of VLOOKUP and things. Heck - this is how the Open Science Framework is supposed to work! Check it out and please tell me if it's wrong or flawed. I spent a lot of time looking at it (and spotchecking "organ specific proteins" at http://www.humanproteomemap.org/).
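For anyone who would rather not open Excel, the VLOOKUP-style comparison can be roughly sketched like this. To be clear, this is one plausible way to score the overlap and not a transcript of my spreadsheet. The function name and the toy calls are hypothetical, and I'm assuming that proteins or organs missing from the proteomic data simply get skipped.

```python
def percent_agreement(transcript_calls, protein_calls, shared_organs):
    """Percent of comparable transcript-based organ calls that the
    proteomic data assigns to the same organ."""
    comparable = 0
    agree = 0
    for protein, organ in transcript_calls.items():
        # Skip what the proteomic data cannot speak to (the "#N/A" rows)
        if organ not in shared_organs or protein not in protein_calls:
            continue
        comparable += 1
        if protein_calls[protein] == organ:
            agree += 1
    return 100.0 * agree / comparable if comparable else float("nan")


# Hypothetical calls, for illustration only
transcript_calls = {"PROT_A": "liver", "PROT_B": "kidney", "PROT_C": "heart",
                    "PROT_D": "brain", "PROT_E": "liver"}
protein_calls = {"PROT_A": "liver", "PROT_B": None, "PROT_C": "heart",
                 "PROT_D": "brain"}
shared_organs = {"liver", "kidney", "heart"}

result = percent_agreement(transcript_calls, protein_calls, shared_organs)
print(f"{result:.1f}% agreement")
```

In this toy case PROT_D is dropped because brain isn't in the shared organ list and PROT_E is dropped because it has no proteomic measurement, so only three proteins get scored.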
Drumroll....?
59.6%. Better than flipping a coin! But...not...much...better....
Okay - but hear me out. What if you actually consider that organs ARE connected by a, I dunno, let's call it a "circulation system" or something "circulatory?" I like that one - that could hypothetically carry proteins from multiple organs. How many of those proteins are higher in abundance - not 4x higher - just any amount higher than the summed abundance of the organs we have solid PROTEIN LEVEL measurements on? Let's just use the organs in the 29 healthy human tissues map that have a match in the recent Nature paper.
45.4%. Worse than flipping a coin, but -again - not by very much.
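Here is a small sketch of that looser test. I'm assuming it means "is the protein more abundant in the predicted organ than in all of the other matched organs added together?" The function name, protein IDs, and numbers are hypothetical, so treat it as an illustration and not as the spreadsheet itself.

```python
def dominates_sum(abundance_by_organ, predicted_organ):
    """True if the predicted organ's abundance exceeds the summed
    abundance of every other organ in the table."""
    if predicted_organ not in abundance_by_organ:
        return False
    own = abundance_by_organ[predicted_organ]
    others = sum(value for organ, value in abundance_by_organ.items()
                 if organ != predicted_organ)
    return own > others


# Hypothetical abundances and transcript-based predictions
abundances = {
    "PROT_A": {"liver": 800.0, "kidney": 150.0, "heart": 90.0},
    "PROT_B": {"liver": 300.0, "kidney": 250.0, "heart": 280.0},
}
predicted = {"PROT_A": "liver", "PROT_B": "kidney"}

passed = [p for p, organ in predicted.items() if dominates_sum(abundances[p], organ)]
percent_passing = 100.0 * len(passed) / len(predicted)
print(passed, percent_passing)
```

With no fold-change requirement at all, this is about the most forgiving version of "organ specific" I can think of once you admit that blood carries proteins from everywhere.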
Now, is Ben just screaming at the sky again? There are grownup ways of doing this stuff. Like contacting the editor at the journal and asking if you could put in a commentary or a "Matters Arising" that discusses that the very basis of a paper is intrinsically flawed.
I did that.
And the editor asked me to have a conversation with the authors - so I contacted the senior author about my concerns. I'm not sure if I can share the emails so I won't but I'll provide my interpretation - I am not sure if gaslighting is the correct term or if it is just sometimes a feature of academia where a Professor assumes anyone who isn't one probably doesn't know 1% of what they do and talks down to them? Hard to tell. But this is a summary of the conversation.
1) They'd love to share the proteomics data, but it's impossible to share any sort of -omics data without waiting months or years. Ben sighs. Obviously this is not only inaccurate, it is shockingly ignorant.
2) They might be forming some sort of a consortium to make -omics data publicly available. Ben shuts his PC off for the day. We have a global, extremely well organized multi-national system to share proteomics data. Please do not invent one. Please.
3) Looking at publicly available proteomics data is something that they might try one day. So...yet again... someone with SomaLogic data skips the easy and obvious experiment.
I spent a lot of time working on my focused breathing and redrafted my email a few times, explaining that in proteomics, data sharing is considered mandatory - has been for a decade - unless patient data would be compromised or it is flagged for national defense or something. I also shared a summary of the analysis I linked above, plus something else that is a decade old about why we have to share proteomics data.
Then I shared this with the editor with my analysis and they thought about it for a month or two.
I received an email from the editor that they had a meeting and couldn't see how actual protein level data could add anything to the findings of the paper and rejected my paper.
I guess that was the issue.
I don't want to add anything to the findings of this paper. I want to point out that the whole central premise of the study is silly and that - in the absence of publicly available data for reanalysis - no aspect of it should be taken seriously at all. Because when you actually look at proteins themselves - which this study is based on - and compare those to results that have been analyzed and reanalyzed, there is virtually no support for this study.
Somewhere along the way I found out that - unsurprisingly - there is a whole company being spun out of the results of this study. I mean....who wouldn't want to know that their liver is 15 years older than it should be? Right? Cool idea, sign me up! However - there is no reason to believe - at all - that the methods detailed in the study can make anything at all like those kinds of measurements.
Here is my analysis - whole thing open with step by step instructions. Please check my work!
Oh wow. While I was rereading this rant the preprint went live, but given the format I wrote it in, the post is actually longer.