Wednesday, September 25, 2019
Error rates in Match Between Runs!
I can't read this yet -- way way way behind on everything -- but this is a super important missing piece in the Match Between Runs (MBR) puzzle.
Again -- that's the secret (not secret) thing the Europeans have been doing for years in MaxQuant, and it accounts for a lot of why they always get more peptide IDs than we do with all our stuff here. If a peptide is ID'ed in one run, it doesn't need to be fragmented in every run -- if the MS1 feature is there and it matches in m/z, isotopic envelope (?) and retention time, the ID gets transferred.
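Just to make the idea concrete (this is a toy sketch with made-up tolerances and feature records, NOT MaxQuant's actual algorithm):

```python
# Toy sketch of the Match Between Runs (MBR) idea: transfer an ID from a
# library run to an unfragmented MS1 feature if m/z and retention time
# agree within tolerance. Tolerances and record fields are hypothetical.

def mbr_match(library_feature, observed_feature,
              mz_tol_ppm=5.0, rt_tol_min=1.0):
    """Return True if the observed MS1 feature matches the library
    feature in m/z (ppm) and retention time (minutes)."""
    mz_err_ppm = (abs(observed_feature["mz"] - library_feature["mz"])
                  / library_feature["mz"] * 1e6)
    rt_err_min = abs(observed_feature["rt"] - library_feature["rt"])
    # A real implementation would also compare isotopic envelopes and
    # charge states; this toy check uses only m/z and RT.
    return mz_err_ppm <= mz_tol_ppm and rt_err_min <= rt_tol_min

# A feature ID'ed by MS/MS in one run...
library = {"peptide": "PEPTIDER", "mz": 478.7364, "rt": 42.1}
# ...and an MS1-only feature in another run:
observed = {"mz": 478.7366, "rt": 42.4}
print(mbr_match(library, observed))  # → True (0.4 ppm off, 0.3 min off)
```

The whole question this study addresses is how often that simple-looking match is wrong.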
Some group in Boston has been doing really smart stuff with proteomics quantification by building samples with known ground truth, things like the TKO standards. In this study they do something similar to estimate the errors that occur when you match between runs. Again -- this needs time when I have time -- but it would be AMAZING to have a metric for how many MBR measurements are true/not true.
For context: I've essentially built an MS1 library from around 48 ProteomeXchange files from a region of the human brain that my friends really really care about, and I'm using it with MBR to boost the number of IDs from the tiny tiny tiny amount of protein they've enriched from that same brain region from something like 80 dead people. Searched alone, I can get something like 500 total protein IDs from all of their samples. With the MS1 library I'm up to around 2,300. Are 1,800 of them artifacts of MBR (or of me misusing it)? Hooooooooly cow, I hope not, but it isn't the simplest thing to manually evaluate a dataset this large. This study gives me hope, because it looks like MBR is making mistakes at the PSM level -- but after you roll the data up, the error rate diminishes markedly!! I think it is a terrible idea to put blind trust in anything, but my life would be a lot simpler if I just sat back and said "maybe someone at Max Planck knows what they're doing!" (and... maybe I can follow a nice instructional YouTube video without screwing it all up...)
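Here's a toy illustration of why rolling up can shrink the error rate -- assuming (hypothetically) that false MBR matches scatter thinly across proteins while true matches pile up on the same ones, so requiring two or more matched peptides per protein filters most of the junk:

```python
from collections import Counter

# Pretend dataset of 10 transferred matches: two real proteins picked up
# several peptides each, while three false matches each hit a different
# (decoy-labeled) protein. Names and numbers are made up for illustration.
matches = (["protA"] * 4 + ["protB"] * 3 +      # correct, multi-peptide
           ["decoy1", "decoy2", "decoy3"])      # false, one hit each

def proteins_passing(matches, min_peptides=2):
    """Keep only proteins supported by at least min_peptides matches."""
    counts = Counter(matches)
    return {p for p, n in counts.items() if n >= min_peptides}

passed = proteins_passing(matches)
psm_error = 3 / len(matches)                       # 3 of 10 matches wrong
protein_error = (len([p for p in passed if p.startswith("decoy")])
                 / len(passed))
print(psm_error, protein_error)  # → 0.3 0.0
```

Thirty percent wrong at the match level, zero wrong at the protein level in this cartoon -- which is roughly the shape of the result that's giving me hope here, though obviously the real numbers depend on how the false matches actually distribute.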