Friday, March 13, 2026

NIFty - Never Impute Features (thank you)!

 


This was the first poster I hunted down after the Day 2 Lightning Talks at US HUPO 2026, and now I can share it with you guys! It's great because another SCP bioRxiv preprint came out this week and my opinions on that one definitely have to stay in my drafts folder. This is a place for positive commentary, mostly! 


Proteomics sometimes has a very strange relationship with the concept of zero. I get it, dividing by zero is not allowed in Excel, and that's probably a significant part of the problem, so there are piles of smart-ish ways to not have a zero. Realistically, single cells are still right at the edge of our detection limits and zeroes are pervasive. Everyone's heard this a million times, but the first good scSeq dataset I got on a drug I was studying was a little more than 90% zeroes. There were a couple of transcripts that were detected in fewer than 10 cells out of 7,000 cells so...for those transcripts, in that study, it was more than 99.85% zeroes. 
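If you want to check that kind of back-of-the-envelope sparsity math yourself, here is a tiny sketch. The function name `zero_fraction` is just something I made up for illustration, and the cell counts are the ones from the paragraph above.

```python
def zero_fraction(detected_cells: int, total_cells: int) -> float:
    """Fraction of cells in which a feature was NOT detected."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= detected_cells <= total_cells:
        raise ValueError("detected_cells must be between 0 and total_cells")
    return 1 - detected_cells / total_cells


# A transcript detected in 10 of 7,000 cells (the upper bound from the text).
# Anything detected in fewer cells will be even closer to 100% zeroes.
print(f"{zero_fraction(10, 7000):.3%}")  # prints 99.857%
```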

SCP also has batch effect problems that primarily affect lower abundance proteins. Those high abundance proteins can look pretty great from study to study. 

This is me rambling about what I understand about NIFty, which tries to solve both problems by 1) using zero values as if they mean things and 2) classifying data using the proteins that are less variable between batches.
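To make those two ideas a little more concrete, here is a toy sketch in plain Python. To be loud about it: this is NOT NIFty's actual algorithm or API (go read the repo for that). It is just my illustration of the general flavor, assuming that "using zeroes" means treating detected/not detected as the signal, and that "less variable between batches" can be approximated by how much a protein's detection rate differs from batch to batch. Every function name and number here is hypothetical.

```python
import math
from collections import defaultdict


def is_detected(value):
    """Zero, None, and NaN all count as 'not detected' in this sketch."""
    return value is not None and not math.isnan(value) and value > 0


def detection_rates_by_batch(rows, batches):
    """Per-batch fraction of cells in which each protein was detected."""
    grouped = defaultdict(list)
    for row, batch in zip(rows, batches):
        grouped[batch].append([is_detected(v) for v in row])
    return {
        batch: [sum(col) / len(col) for col in zip(*cells)]
        for batch, cells in grouped.items()
    }


def batch_stable_features(rows, batches, k):
    """Indices of the k proteins whose detection rate differs least across batches."""
    rates = detection_rates_by_batch(rows, batches)
    n_features = len(rows[0])
    spread = [
        max(r[j] for r in rates.values()) - min(r[j] for r in rates.values())
        for j in range(n_features)
    ]
    # sorted() is stable, so ties keep their original column order.
    return sorted(range(n_features), key=lambda j: spread[j])[:k]


# Made-up toy data: 4 cells x 3 proteins, run in two batches.
intensities = [
    [5.0, 0.0, 2.0],  # batch A
    [4.0, 0.0, 0.0],  # batch A
    [6.0, 3.0, 0.0],  # batch B
    [5.5, 2.5, 1.0],  # batch B
]
batches = ["A", "A", "B", "B"]

stable = batch_stable_features(intensities, batches, k=2)
print(stable)  # prints [0, 2]
```

In the toy data, protein 1 is never detected in batch A and always detected in batch B, so it looks like a batch artifact and gets dropped. Proteins 0 and 2 have the same detection rate in both batches, so those are the ones you would hand to a classifier.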

And then there are a bunch of things like this, which an AI just informed me indicates that you are supposed to read the formula in a snobby British accent. Which....makes me think I've pushed the Chipotle support bot a little too hard today. 


You might just want me to stop typing and post the Github so you can check this awesome thing out yourself! 

https://github.com/PayneLab/nifty
