News in Proteomics Research: 2024

Sunday, October 20, 2024

Streamlined proteome stability - find those drug on- /off- targets on 30 samples/day!

What a week for proteomic applications!

This came up in a great study I rambled about earlier this week, but drugs almost always target or affect proteins. Hopefully just the one you care about, but off-target effects can and do happen where that drug binds some other proteins.

PISA has been discussed on this blog before and you can probably find it in the search bar, but this and related applications expose a proteome to a drug and look at the proteome effects after treating the proteome at different temperatures prior to digestion.

If your drug is binding to a protein chances are that protein is going to fold/unfold at different temperatures and alter the downstream peptides you quantify. Super cool stuff, right?!?
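
For anyone who wants to see the shape of that readout, here is a minimal sketch, not the method from the paper. It assumes you already have per-protein soluble intensities after heating for drug-treated and vehicle samples, and it just takes a log2 ratio of the replicate means. The function name, protein names and intensities are all made up for illustration.

```python
from math import log2
from statistics import mean


def solubility_shift(treated, vehicle):
    """Return log2(mean treated / mean vehicle) soluble abundance per protein.

    Both arguments map protein name -> list of replicate intensities.
    Only proteins quantified in both conditions are reported.
    """
    shifts = {}
    for protein in treated.keys() & vehicle.keys():
        t = mean(treated[protein])
        v = mean(vehicle[protein])
        if t > 0 and v > 0:  # log2 is undefined for zero signal
            shifts[protein] = log2(t / v)
    return shifts


# Hypothetical intensities (arbitrary units), three replicates each.
treated = {"TARGET1": [2.0e6, 2.2e6, 1.8e6], "BYSTANDER": [1.0e6, 1.1e6, 0.9e6]}
vehicle = {"TARGET1": [1.0e6, 1.1e6, 0.9e6], "BYSTANDER": [1.0e6, 1.0e6, 1.0e6]}

shifts = solubility_shift(treated, vehicle)
for name, value in sorted(shifts.items()):
    print(f"{name}: {value:+.2f}")
```

A protein that stays soluble at higher temperatures when the drug is around shows up with a positive shift. A real analysis would add statistics across replicates before calling anything a target.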

The problem with just about every earlier study is that it takes freaking forever. Remember that to get a decent coverage of the proteome even 5 years ago could take 1 day/sample. This new study walks through optimization of a bunch of steps and gets to a really solid and approachable method...

...with great throughput! At 30SPD using DIA on a solid and pretty affordable workhorse of an instrument, these authors characterize the on- and off- target effects of 22 different drugs in record time.

The data on drugs we know about matches up really well with older data and adds a ton of credibility to the uncharacterized drugs. If you're just interested in the drug output data, it appears to all be up there on Zenodo here.

Saturday, October 19, 2024

Spatial proteomics identifies an effective treatment for a lethal skin disease!

Holy shit, y'all.... if you're going to read 1 proteomics paper in 2024, this should be that paper.

I'm going to change the formatting of the blog to add a "success stories for proteomics technology" or something over there -->

I haven't before because .....it's not a very long list..... THIS IS ONE.

1) Let's start with one of the most horrific diseases you've never heard of. TEN or Toxic epidermal necrolysis. It's as bad as that last word suggests. Patients are on other treatments - it looks like typically from chemotherapies, but possibly also from other treatments - AND 30% or more of their skin dies and falls off. Mortality rates are high. Skin is an important thing for humans to have all of.

2) Screw the method and approach and the fact that a TIMSTOF SCP and an Orbitrap Astral were used in combination with multiple really good spatial techniques. All cool stuff -

THEY IDENTIFIED A MECHANISM - and it was something we already have approved drugs in the clinic for -

AND THEY CURED PEOPLE! 10 patients?!? I was excitedly reading this on my phone while my kid was digging a hole outside and I'm only getting to type this while he's in the bath.

All the stuff in the middle is important. They did the deep visual proteomics workflow with the TIMSTOF SCP. They derived cells for deep proteomics/phosphoproteomics from limited material with the Astral. What they found in both the FFPE tissues they were analyzing and multiple relevant models was that the disease messes with JAK/STAT. JAK inhibitors cured mice - and then - they worked on people!

Such an inspirational, exciting and beautiful study.....

Friday, October 18, 2024

More evidence the blood brain barrier is a drug metabolizing system!

I've never had a pharmacology class. I started with a book called something like "pharmacology made very very simple for people who are a little slower than average." In that book it is pretty clear that drug metabolism occurs in the liver. You can find similar things by googling "where does drug metabolism occur" like this nice picture from the European Patient's Academy.

So when (now Dr.) Abigail Wheeler hypothesized that toxic effects of HIV antiretroviral drugs were due to drugs being metabolized by cells at the brain and not cells at the liver, she had several tough years to build evidence for that case. She had to quantify metabolism products and use painful targeted quantification to make the case that drug metabolizing enzymes were really present in a lot of places outside the liver.

Fast forward some technology improvements and a couple years of hard work by another young scientist and some helpers and - here is how and where that drug metabolism (and transport of those drugs and drug metabolites) happens at the blood brain barrier!

Again - this is some controversial stuff - so there are pages and pages of validation including western blots and FACS and efflux assays and other words I don't know.

For the proteomics stuff, diaPASEF on a TIMSTOF Flex (later model, so Pro2 cartridge) was used to characterize the cells that make up the blood brain barrier. The files were processed in SpectroNaut and the proteomic ruler technique was adapted to generate solid copy number and nM concentration estimates for 8,000 or so proteins. Those numbers are summarized on a nice Shiny web portal which can be directly accessed here.
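
If you haven't run into the proteomic ruler before, the textbook version goes roughly like this. This is a sketch under stated assumptions, not the adapted version the authors used. It assumes total histone mass per cell is about equal to DNA mass per cell, uses 6.5 pg of DNA for a diploid human cell as a default, and the example intensities, molecular weight and cell volume are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(protein_intensity, histone_intensity, mw_da, dna_pg_per_cell=6.5):
    """Estimate protein copies per cell by scaling to the summed histone signal.

    Assumption: total histone mass per cell ~ DNA mass per cell, so the
    histone intensity acts as a built-in mass standard.
    """
    protein_pg = protein_intensity / histone_intensity * dna_pg_per_cell
    moles = protein_pg * 1e-12 / mw_da  # pg -> g, then g / (g/mol)
    return moles * AVOGADRO


def concentration_nM(copies, cell_volume_pL):
    """Convert copies per cell to nM, given an assumed cell volume in picolitres."""
    moles = copies / AVOGADRO
    litres = cell_volume_pL * 1e-12
    return moles / litres * 1e9


# Hypothetical numbers: a 50 kDa protein at 1% of the summed histone signal.
copies = copies_per_cell(1e8, 1e10, mw_da=50_000)
conc = concentration_nM(copies, cell_volume_pL=2.0)
print(f"{copies:,.0f} copies/cell, {conc:.0f} nM")
```

The nM number is only as good as the cell volume you plug in, which is why that value is an explicit argument here.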

Oh yeah, and I didn't do any of this study, I taught author 1 how to do really good proteomics and author 3 how to write stuff in R, kept the service plans paid and tried (unsuccessfully) to keep all the instruments from being destroyed by floods. Boom - Hannah wrote a really nice story that helps illuminate some serious questions we have about drug toxicity and I have a great new resource bookmarked at the top of my browser.

Thursday, October 17, 2024

Three different retinal degeneration mutations result in the same (treatable?) phenotype!

Need to read something super positive and optimistic today? I strongly recommend this new study in press at MCP that totally made my day!

It's really easy to look at the broad range of different genetic mutations that can lead to a single disease and think.....

Retinal degeneration diseases ABSOLUTELY fall in this category. Check out this associated paper on progressive vision loss in dogs.

Mutations on 17 different stupid genes are known to lead to just progressive retinal atrophy - which is just one of many diseases that cause dogs to go blind later in life.

If you are in drug development in either primary research or for applied for-profit stuff what do the odds of success sound like for a disease caused by at least 17 different things? Can you convince someone to help fund you while you chase targets that may only help a small percentage of those afflicted?

Almost always? No. That's a bad elevator pitch and a worse grant application. In pharma? Start sending out CVs before you ask.

Why this paper is so very very cool is that they took some of the mouse models for progressive retinal degradation (mutations on different genes!) and looked at the proteins that actually change vs controls. They're the same!

Unnecessary reminder for most people here (good for outsiders, who still can't seem to get this stuff straight)

Genome is genotype, that's what the DNA says, but that isn't what is physically happening

Proteome is often the phenotype (what is physically happening!) (or at least very close and involved in the phenotype)

AND - Nearly all drugs target proteins!

These authors don't miss the point here either. Who cares what the gene is that caused the protein change if you know the protein causing the problem? Not me, not these authors, and certainly not patients. Cause now you've got something to develop a drug against!

Tuesday, October 15, 2024

Revisiting the Harvard FragPipe on an HPC technical note in terms of total time/costs!

I read and posted on this great technical note from the Steen groups a while back and I've had an excuse to revisit it today.

Quick summary - they ran EvoSep 60SPD proteomics on a TIMSTOF Pro2 on the plasma of 3,300 patients. They looked at their run time on their desktop and estimated processing it the way they wanted to would take about 3 months. Ouch.

What they did instead was set the whole thing up on their local high performance cluster and they walk you through just about every step.

It took them just about 9 days to process the data using a node with 96 cores and 180GB of RAM. They do note that they never appeared to use even 50% of the available resources, so they could have scaled back in different ways.

Where I was interested was - if I was paying for HPC access, how many core hours would I be set back for doing it this way? 9 days x 24 hours = 216 hours, and 216 hours x 96 cores puts it at roughly 20,700 core hours, right? I know some HPCs track how much you actually use in real time based on the load you're putting on their resources, but others don't. So it's probably at the very most about 20,700 core hours. Which is the estimate that I was looking for when I went looking for this paper.
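
The back-of-the-envelope version of that math, in case you want to plug in your own numbers. The `utilization` argument is my own addition to cover the HPCs that bill on actual load. The 50% figure comes from the authors' note that they never seemed to use even half of the node.

```python
def core_hours(days, cores, utilization=1.0):
    """Core hours for a job holding `cores` for `days` of wall-clock time.

    `utilization` scales the bill for clusters that charge by actual load;
    leave it at 1.0 for clusters that charge for the whole reservation.
    """
    return days * 24 * cores * utilization


upper_bound = core_hours(9, 96)           # billed for the full node
half_loaded = core_hours(9, 96, 0.5)      # billed on ~50% actual use

print(f"full reservation: {upper_bound:,.0f} core hours")
print(f"at 50% load:      {half_loaded:,.0f} core hours")
```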

Not counting blanks/QCs/maintenance - 2 months of run time for a 3,300 patient study. 9 days to process. It's such an exciting time to be doing proteomics for people who care about the biology. And - I'll totally point this out - 60 SPD isn't even all that fast right now! It's a 6 week end to end study at 100SPD!
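
Same idea for the study timeline. A quick sketch that ignores blanks, QCs and maintenance, just like the numbers above, and assumes processing only starts once acquisition is finished.

```python
from math import ceil


def acquisition_days(n_samples, samples_per_day):
    """Whole days of instrument time, ignoring blanks, QCs and downtime."""
    return ceil(n_samples / samples_per_day)


def end_to_end_days(n_samples, samples_per_day, processing_days):
    """Acquisition followed by processing, with no overlap between the two."""
    return acquisition_days(n_samples, samples_per_day) + processing_days


for spd in (60, 100):
    run = acquisition_days(3300, spd)
    total = end_to_end_days(3300, spd, processing_days=9)
    print(f"{spd} SPD: {run} days acquiring, {total} days end to end")
```

At 60 SPD that is 55 days on the instrument, and at 100 SPD it is 33 days plus 9 for processing, which is where the 6 weeks comes from.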

Thursday, October 10, 2024

Use a carrier channel - to reduce(!?!) your boring background!

This smart new technical note does something that I think many people have thought about, but both pulls it off AND methodically dissects it so it's now a completely valid tool to put in our utility belts.

Problem: There are 10,000 proteins here and I don't care about any of them. I care about the stuff after those first 10k.

Traditional solution: Fractionate and fractionate some more and cross your fingers.

New idea - Isobaric tag (TMT is one solution) all your peptides. Then tag (with a different channel) a higher abundance amount of the peptides that you care about.

Perfect application? Infected cells! Even if you've got a super duper bad bacterial infection pretty close to 100% of the protein around is going to be human. But if you label bacterial proteins and spike those in at a higher level you've biased your stochastic sampling toward the bacterial proteins and effectively reduced the host background!
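
Here is some toy arithmetic for why that works. It is not a model of how the instrument samples ions, and every number in it is hypothetical. It assumes each sample channel carries one unit of peptide, that a fixed fraction of that is bacterial, and that the carrier channel is pure bacterial peptide.

```python
def target_share(n_sample_channels, target_fraction, carrier_amount):
    """Fraction of the pooled peptide load that comes from the target organism.

    Each sample channel contributes 1 unit of peptide, of which
    `target_fraction` is target (e.g. bacterial). The carrier channel
    contributes `carrier_amount` units and is assumed to be all target.
    """
    target = n_sample_channels * target_fraction + carrier_amount
    total = n_sample_channels + carrier_amount
    return target / total


without_carrier = target_share(9, 0.01, carrier_amount=0)
with_carrier = target_share(9, 0.01, carrier_amount=20)

print(f"no carrier:   {without_carrier:.1%} of the pool is bacterial")
print(f"with carrier: {with_carrier:.1%} of the pool is bacterial")
```

With 9 sample channels at 1% bacterial and a 20x carrier, most of the pooled peptide is bacterial, so the precursors picked for fragmentation are far more likely to be the ones you care about.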

Where this shines is the pressure testing. Smart standards are made and tested and tested. Instruments that can reduce coisolation with tricks like MS3 seem to be the best. Ion mobility (here FAIMS) coupled MS2 comes in second and MS2 alone has a lot of background, but still works.

The proof is divided between a bunch of public repositories. Easier to copy paste than link them here.

Wednesday, October 9, 2024

How much do sample specific libraries help in DIA low input/single cell proteomics?

At first this new study is a bit of a head scratcher, but once you get past the unnecessary nomenclature, it's worth the time to read.

Ignore the DIA-ME thing altogether. I should remove it from the title. Wait - I have a car analogy - just about every review of the Ford Mustang Mach-E is something like "this is a really nice EV, we were just confused about the whole Mustang thing."

DIA-ME is just a name for how literally everyone processes single cell DIA data. We know library free isn't as good as library. And we know that it really doesn't make sense to look for transcription factors in global single cell data. Not even the marketing releases at ASMS have claimed to get to proteins at 10 copies/cell and - oh boy - there are some slide decks from ASMS 2022 that no one has published yet...and not just because I'm reviewing every other SCP paper and limping around punching things while typing anonymous snarky things (I'd rather write snarky things where everyone knows who I am and why). So you run 100 or 200 of your cells on your super sensitive new instrument and you make a library out of that data. Maybe you do that 10 times. Then you analyze your single cells against that library. Works great. Walkthrough here for 2 popular programs.

However - we're all largely doing that because you've got to get 1,000 proteins/cell to get your paper published in a Nature family journal. How much does using these sample specific libraries affect our results and the biological findings?

That's the gold in the method of this paper. These authors painstakingly dissect it with spike ins and different library loads and it's all very telling. They use 5 cell and 20 cell and 100 cell libraries and on and on.

If you're interested you can read it. I'm adding it to my reference folder for later.

THEN - the paper gets cool. Forget the mass spec stuff - this group takes some U-2 OS cells which are one of the best studied cell lines for understanding circadian rhythm (smart! stealing this idea for some targeted stuff coming up) and they hit the cells with Interferon gamma. I don't know how to make the funny greek letter thing.

And - no real surprise to anyone who has seen a control/dose response thing in single cells - they identify 2 very different populations of cells. In fact, the two populations appear to be almost entirely opposite in their response! There isn't as much on this as you might hope from the biology side, but it's still cool. Would we want every single one of our cells to go into a pro-inflammatory response? Probably not! Most adult humans I know are doing everything they possibly can to reduce inflammation whenever possible because that stuff is gross and toxic.

It drives home how important it is for eukaryotic cells that not every cell is going into a full out inflammation cascade when messed up cells derived from a cancer patient and grown in plastic since 1964(!!!) are exhibiting a bimodal response. I was snarky at the beginning of this post, but I think it's both an important and very interesting study, as well as both visually pretty and well organized.

Thursday, September 26, 2024

Wait - do we even need high resolution mass spectrometry if we're doing protein ID/quan?

This one is totally worth thinking about - AND - it's open access!

A lot of proteomics today is just measuring protein abundance, right? And now that we have all these cool ways of predicting and matching the relative intensity distributions of fragmented peptides, do we even need to go past unit resolution mass? Or....did someone....just convince us we absolutely needed it all the time....

Yo, I am not a big unit resolution anything fan. I've been stuck on things like - is this a nitrate or a sulfate or a phospho and it's big enough I can't tell what the monoisotopic ion is.

You know - mass spectrometrists probably don't get enough credit for how absolutely bizarre our sense of humor can be about the stuff we do.

Chris posted this paper and it descended into chaos

This is funny because citrulline is such a pain in the ass PTM that even Orbitraps suck at determining what is a citrulline vs what is an M+1 isotope when it's an intact peptide. And you aren't just fragmenting that probably-not-really-citrullinated peptide - you're fragmenting all the crap around it in this big dumb window.

And the whole reason I'm writing this post instead of cleaning my house before my Mom - who will totally tell everyone back home that my house isn't clean?

Spit out my coffee. OMG. It's so great to know a group of people this funny.

Back to the paper - this is super important. We've got people out there measuring proteins with arrays and antibodies - poorly - but rapidly - and some of us are about to lose our lunch money. Maybe we are overdoing it here and there. And ion traps are tough - and easy to build - and fix - and they can be screaming fast. And they can be cheaper to buy and run with those little vacuum pumps. It's totally worth thinking about.

Wednesday, September 25, 2024

The current status of the NCI Proteomic Data Commons - it'll get there!

The National Cancer Institute Proteomic Data Commons is such a big big big idea. And it is dealing with super important human samples in formats that are generally evolving. I honestly can't imagine what a hassle it is to pull something like that together - but they've got one heck of a team working on it.

You can read about the current status of things and where it's going here.

If you just want to dig around and try to look for things you can check out the portal here.

If you're used to other big endeavors like the Human Protein Atlas or ProteomicsDB you might find yourself wondering - did US Government employees design one of these things and ...not.... the others? Well.....maybe....why would you ask that.....?......but again, that's an absolute shitload of data and it's tough to make it organized. Again - super cool plan - and when they inevitably get it all working please check the date on this post before you leave a comment "what is this weirdo talking about - it's awesome!" Or do leave the comment so I can check back.

Tuesday, September 24, 2024

How many scans/peak do you need for accurate quan in LCMS?

This is from a couple of years ago but this group makes a compelling argument for 6 scans/peak!

That's about 1/2 what I'm generally trying to get (10-12) but as I'm looking at a LOT of recent data from different instruments it looks like I'm old fashioned. I might need to put up a poll to hear what the community thinks.
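
If you want to check where your own methods land, the math is just peak width over cycle time. A small sketch with made-up numbers (a 6 second wide peak is my assumption, not something from the paper):

```python
def scans_per_peak(peak_width_s, cycle_time_s):
    """Number of times a peak is sampled, given its width and the cycle time."""
    return peak_width_s / cycle_time_s


def max_cycle_time_s(peak_width_s, target_scans):
    """Longest cycle time that still gives `target_scans` across the peak."""
    return peak_width_s / target_scans


peak_width = 6.0  # seconds at the base, hypothetical

print(scans_per_peak(peak_width, cycle_time_s=1.0))      # scans at a 1 s cycle
print(max_cycle_time_s(peak_width, target_scans=6))      # budget for 6 scans/peak
print(max_cycle_time_s(peak_width, target_scans=12))     # budget for 12 scans/peak
```

Going from 12 scans/peak down to 6 doubles the cycle time budget, which is a lot of extra windows or injection time.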

Monday, September 23, 2024

What could you do with some free proteomics? EuPA YPIC Student Awards accepting applications now!

Are you a student with a problem that maybe some proteomics could solve? Or do you have a great new idea that you just need some real evidence could change the future of proteomics research?

Applications are open til November 24th. Students at European universities who apply can get up to 5,000 Euros to use for their work. Finalists get a free trip to Greece to pitch their ideas.

Another game changing idea from the EuPA YPIC! Find out more here!

Sunday, September 22, 2024

Accurate transient lengths/times for Exploris 480 (and very related systems!)

I stumbled backwards into this when I realized my Excel sheet cycle time calculator wasn't lining up with Astral data. Turns out I either had the D20 high fields wrong, or there have been some incremental improvements since the QE HF launched (....a while ago....) either way - this is a cool paper and it's what I'm using to fix my math.
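
For the curious, this is roughly what that kind of cycle time calculator does, rewritten as a Python sketch instead of an Excel sheet. The transient and injection times below are placeholders, NOT the values from the paper, so swap in the real ones. It assumes ions for the next scan are accumulated while the current transient is being recorded, so each MS2 scan costs whichever is longer of the two.

```python
def dia_cycle_time_ms(ms1_transient_ms, ms2_transient_ms, n_windows,
                      max_inject_ms, overhead_ms=0.0):
    """Estimate one DIA cycle: one MS1 scan plus `n_windows` MS2 scans.

    Assumes parallel accumulation, so each MS2 scan takes the longer of
    the transient and the maximum injection time, plus a per-scan overhead.
    """
    per_ms2 = max(ms2_transient_ms, max_inject_ms) + overhead_ms
    return ms1_transient_ms + overhead_ms + n_windows * per_ms2


# Placeholder values for illustration only.
transient_limited = dia_cycle_time_ms(128, 32, n_windows=40, max_inject_ms=25)
injection_limited = dia_cycle_time_ms(128, 32, n_windows=40, max_inject_ms=50)

print(f"transient limited: {transient_limited / 1000:.2f} s per cycle")
print(f"injection limited: {injection_limited / 1000:.2f} s per cycle")
```

A few milliseconds of error per transient gets multiplied by every window in the cycle, which is how a slightly wrong table turns into a calculator that doesn't match real data.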

Friday, September 20, 2024

Did y'all know DIA-NN has an integrated viewer now?!

Alllriiiiight! I've been doing this thing recently where I forget things here and there like -

Maybe not a lot but I was really surprised to find out that my desktop PC at home is 8 years old (eek!) and my wife said something really weird about being 40 soon and I used to be older than her.

I also just discovered that this old PC is running DIA-NN 1.0.0 -and Vadim mentioned something about 1.9.1 - and - WHOA - what an upgrade! (I was doing SLICE-PASEF a couple years ago so some computer was on a more modern version at some point) - HOWEVER - it isn't really obvious what one you're currently using and this one HAS A VIEWER!

On top of "hey! here is a viewer!" it's really fun to go and just see how many scans/peak you're getting for published data!

Thursday, September 19, 2024

Calculating and reporting CVs for DIA proteomics!

Want a great use of 10-20 minutes? How 'bout some quick tips on how to get the best possible DIA data?

There are a bunch of gems here - like 3 in just one paragraph here!
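
If you just want the basic calculation, here is a minimal sketch. It assumes your quantities are on the linear scale (a CV calculated on log-transformed intensities will look much better than it really is) and the protein names and values are hypothetical.

```python
from statistics import mean, median, stdev


def percent_cv(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100.0


def cv_table(quant):
    """Per-protein %CV for proteins with at least 2 replicate values."""
    return {
        protein: percent_cv(values)
        for protein, values in quant.items()
        if len(values) >= 2 and mean(values) > 0
    }


# Hypothetical linear-scale intensities from three replicate injections.
quant = {
    "PROT_A": [100.0, 110.0, 90.0],
    "PROT_B": [200.0, 200.0, 200.0],
    "PROT_C": [50.0],  # single value, so no CV can be calculated
}

cvs = cv_table(quant)
print(cvs)
print(f"median CV: {median(cvs.values()):.1f}%")
```

Whatever you report, say how many proteins made it into the calculation, since dropping everything with a missing value makes the median CV look a lot prettier.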

Thermo and SpectroNaut are friends again!

So....I think there were some ...concerns... in some corners about how one very popular and polished commercial software package was getting with one company.

Boom! At least 3 vendors seem to be fully partnered up with one of my personal favorite tools! You can read this new announcement here.

Wednesday, September 18, 2024

Analyzing the complexosome(?!) of malaria infected red blood cells!

Okay - so - y'all ready for a cool sideways approach to find protein protein interacting pairs?

Is your first thought?

Ummm....don't we have 1 million of those? Like immunoprecipitation and affinity enrichments?

Sure - if you have an antibody to every protein in your organism. Do you have an antibody to every protein from a malaria parasite? We don't even have a good FASTA database for it.

Ummm...okay well we've totally got APEX and BioID!

Sure - you just need to convince someone to fund the development of hundreds of mutant strains of a parasite that pretty much only kills very very poor people. Again, we don't even have a very good FASTA database for this organism.

What about native CE complexosomics with a gaussian interaction profiler? WTF is that? It's a technique that can fill in the blanks I mentioned above!

Now - it doesn't look like a ton of fun - the other things are easier - but you basically lyse your cells under friendly enough conditions that you don't bust up the protein complexes and interactors. Then you take fractions (in this case by capillary electrophoresis), then you just digest everything in those fractions and analyze it like regular old shotgun proteomics. You need to use the Gaussian thing to backtrack your way to the interactors. It appears to be, in this case, totally compatible with MaxQuant label free output.

This is where I probably sorta get what is happening - but I think this is a lot like when we try to hunt down a natural product with an enzymatic reaction. If I have 30 fractions of all the small molecules that a weird mushroom/algae/or bacteria I've never heard of makes, and fraction 6 has a little activity, 7 has a little more, 8 has a ton, and 9 has an almost detectable amount - we start eliminating molecules by those that do/do not follow those trends. Ideally the one molecule with activity will perfectly track to that, right? As a disclaimer I send every request for natural product discovery to ANYONE else and if they strike out there then I'll do it. Has worked twice - in 2 decades.

Similar here - these gaussian models help backtrack the proteins that are most statistically within the clusters. Sounds smart, right?!? And if anything else can help you really backtrack to native protein-protein complex interactions in understudied organisms (and in this case a parasite- human interactions) I can't think of one. What a superb new tool for our utility belts!
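
To make the co-elution idea concrete, here is a stand-in sketch. This is NOT the Gaussian interaction profiler from the paper. It just summarizes each protein's profile across fractions with an intensity-weighted center and width, then uses a plain Pearson correlation to ask whether two proteins track together. The profiles are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def peak_center_and_width(profile):
    """Intensity-weighted mean fraction and SD, a crude one-Gaussian summary.

    Fractions are numbered from 1. Assumes the profile has a single peak.
    """
    total = sum(profile)
    center = sum(i * v for i, v in enumerate(profile, start=1)) / total
    var = sum(v * (i - center) ** 2 for i, v in enumerate(profile, start=1)) / total
    return center, sqrt(var)


# Hypothetical intensities across 10 fractions.
prot_a = [0, 0, 0, 0, 0, 1, 3, 9, 1, 0]
prot_b = [0, 0, 0, 0, 0, 2, 6, 18, 2, 0]   # same shape as A, twice the signal
prot_c = [0, 5, 9, 4, 0, 0, 0, 0, 0, 0]    # elutes much earlier

print(peak_center_and_width(prot_a))
print(f"A vs B: {pearson(prot_a, prot_b):.2f}")
print(f"A vs C: {pearson(prot_a, prot_c):.2f}")
```

Proteins A and B peak in the same fractions and correlate, so they are candidates for being in the same complex, while C is somewhere else entirely. Real data has overlapping peaks and noise, which is where proper Gaussian models and statistics earn their keep.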

Tuesday, September 17, 2024

Proteomic signatures of immunotherapy response!

...sometimes it's all about getting access to super cool and important samples and just not cutting corners! That appears to be the case in this great paper in press at MCP!

They had access to patients undergoing PD-1/PDL1 immunotherapy and tracked their plasma. The cohort isn't the biggest one on the planet, but it sure does seem well controlled for variables.

The authors depleted, facilitated digestion by doing a very quick one-lane SDS-PAGE cut and digest - and then SWATH'ed everything. Large spectral libraries derived from cancer studies were used in addition to more common human ones.

Where this really shines is that they tracked this and found out how the individual patients responded to the therapies and backtracked to interesting patterns when people respond well - or don't. As the patients weren't on the exact same regimens, the statistics do seem to get really complicated - and beyond my ability to really evaluate for a short blog post, but the downstream validation on a second cohort upholds the strength of their initial findings!

Thursday, September 12, 2024

FragPipeAnalyst - Save yourself a couple of button clicks to BOOM - Analyst data!

A little while ago FragPipe overtook the always-amazing MaxQuant in terms of number of global users. There are probably some error bars on that, such as maybe the old server I have offline because it doesn't have a CMOS(?) battery in its motherboard, so it thinks it is 2018 and exactly zero of my annual licenses on anything have expired (it runs MaxQuant 1.6.17, which was one of my very favorites). No FragPipe on that, it has 32 bit Java.

That was a joke. AND 1) That probably wouldn't work and 2) No sane person would admit to that if it did work.

However, Fragpipe is a powerhouse today and Analyst has rapidly become an easy push button go-to for mid-stream proteomic data analysis. Here I'm going with mid-stream being "getting to a list of plausible targets of interest". And then I'm going to use downstream as "getting to the targets that likely explain your phenotype". I'm blasting FDA Omics Days on the other computer in the converted old garage where I'm definitely not housing some computers that don't know what year it is.

What if you're having one of these days...

..and you just can't find the energy to upload your FragPipe data and your burdensome experimental design into Analyst? You can still do it!

AND if you need some middle road rather than pushing the "open FragPipe Analyst" button, starting somewhat recently (I had a scan header thing for an in house tool so we stuck to a pre-20 version for quite a while) now the experimental design is already filled out for Analyst. This is assuming you filled in the box in FragPipe. Super cool, right?

Wednesday, September 11, 2024

The human genome and proteome are larger than we thought - with some caveats!

A whole international consortium got together in 2022 and found something like 10% more human proteins!

Does that mean that you now have a FASTA you can reprocess your data with and get like 10% more IDs?

...not exactly...at least not yet....but it's super cool! Here is the preprint!

Wow. That's a lot of names, including some of the wettest blankets in all of proteomics - "false discovery" this, "analytical metrics of precision" that, "standard pipelines and data storage types" on and on. Names you may not recognize are even worse - they're RiboSeq people.... (I wrote up some stuff on what RiboSeq is a few years ago here, if you're interested)

Please read the paragraph above in this voice, if you didn't already.

With the important stuff out of the way, what is all of this? Well, it puts into question how we build those nice protein level FASTA files everyone in mass spec based proteomics takes for granted today - until you don't have one.

In a nutshell, they threw out some of the assumptions and looked at a few billion human MS/MS sequences on ProteomeXchange that are from tryptic datasets. Billion with a B. And they looked at a few hundred million MS/MS sequences from HLA immunopeptidomics experiments. Honestly, I was pretty surprised there was that much HLA data publicly available. Y'all have been busy! There wasn't very much (good) stuff out there when I un-retired from science in 2018.

Have you ever 6 frame translated your own genomic data in MaxQuant? There is a little tool for it. And it defaults to something like 50 amino acids. What if the genomics people have also been doing something like that all along? Would you care? What use is a 31 amino acid protein? At 110 Da each that's only 3,410 Da. Cut it with trypsin once or twice and it is probably too small to detect. And you won't get more than 1 peptide for it.
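
Here is a bare-bones sketch of what a 6 frame translation with a length cutoff looks like. This is not the MaxQuant tool. It uses the standard genetic code, reports stop-to-stop stretches without requiring a start codon (a simplification), and uses 50 amino acids as the default cutoff only because that is the ballpark mentioned above.

```python
from itertools import product

# Standard genetic code, codons ordered by base in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
AVG_RESIDUE_DA = 110  # rough average residue mass used in the text


def translate(dna):
    """Translate in frame 0; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_orfs(dna, min_len=50):
    """Stop-to-stop stretches of at least `min_len` residues in all 6 frames."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    orfs = []
    for strand in (dna, reverse_complement):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append(segment)
    return orfs


def approx_mass_da(n_residues):
    """Quick mass estimate at ~110 Da per residue."""
    return n_residues * AVG_RESIDUE_DA


toy = "ATGGCCAAATAA"  # toy sequence: Met-Ala-Lys-stop in frame 0
print(six_frame_orfs(toy, min_len=3))
print(six_frame_orfs(toy))          # nothing survives the default cutoff
print(approx_mass_da(31), "Da")
```

Anything shorter than the cutoff never makes it into the FASTA, so it can never be identified, no matter how good the spectrum is.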

Here is where it gets cool, though. For about 4 years people have been confidently finding surface peptides (MHCs or HLAs or NeoAntigens, whatever you want to call them) on the cell surface that map to genetic information that isn't in our FASTAs. There was a flurry of this in 2020-2022. In the study I know the best out of these Amol Prakash found over 700 that he was super confident about. And that was one of maybe 5 papers that dropped over this period of time where everyone was like ...ummm....WTF...?

And - get this - the RiboSeq nerds have been seeing the same thing. There are mRNA transcripts going to the ribosome - presumably to be ribosomed into chains of amino acids - and they come from regions of the DNA that are annotated as noncoding.

So these two groups worked on it for like 2 years and this is what they found - overlapping data supported by both MS based proteomics data in repositories and whatever stuff the RiboSeq thingamabobs produce.

And what did they find? I'm just going to take screenshots of the coolest stuff. I started this on my phone earlier today.

100 codons! Wait. That's 100 amino acids, right? That's not as small as my example above!

Boom! Add these to my FASTA! Let's gooooooo!

But we're not there yet. This is a cautious group. They first built some cool new resources.

Then they remind us (me? you?) that there are long established rules about calling something a protein that were agreed upon by the Chromosome Centric Human Proteome Project (you're using the SpongeBob voice now, right?) And that there is validation and other stuff. BUT - this is all still really cool.

If you find a section of the DNA turned into mRNA and hanging out inside of a ribosome AND you find (probably a gross looking, immunopeptides or no fun) MS1 and MS2 fragmentation spectra showing that same sequence occupying one of the HLA things you pulled down - that's probably around somewhere doing stuff, right? AND what if you're being all nosy and looking in other people's proteomics data AND you see those peptides there?

Evolution is pretty stingy. It doesn't generally go out of its way to make new mRNA and then put it in the thing that translates it and then leave it floating around for some nerds to detect, right? Accidents happen where there isn't sufficient evolutionary pressure to lead to the removal of things, but they are the exception rather than the rule.

Super exciting stuff, right?

Man, while I was looking for the preprint on my PC and this was half written, I found a much better breakdown of the study and results in the form of a Tweetorial. You can check it out here.

Did I forget to make fun of the fact they used the Trans Proteomic Pipeline thing? That is what they processed the data with. And the fun people at the Broad probably sequenced the peptides by hand with a ruler.

I'll leave you with one last screenshot from this super cool likely text book altering study.

(BTW, they're calling these cool things they found "ncORFs". They leave a lot of questions open to the community for how these should be categorized and dealt with, etc., but you'll have to go to the paper for those.)

If you are new here I should probably clarify that me taking the time to poke fun at a study wherever I can is the highest form of compliment I generally can come up with. This study may contribute to answering so many riddles like -what are these other spectra? Why is our coverage of the immunopeptidome so abysmal? It also shows why we can't just target the proteome for every study - What percentage of the proteome do we even understand now?!? If these data were all from targeted experiments, we'd never know that the genome/proteome may be 10% larger than we thought. What other stuff is hiding?

I can't recommend this (51 pages???? WTaF?) enough.

Tuesday, September 10, 2024

MonoMS1 - Can you just identify peptides from predicted precursor/RT and ion mobility alone?

Didn't I just post an MS1 only based prediction paper? I sure did, but here is another!

Have you ever just thought we're doing a little too much stuff? How many degrees of certainty do you need that the peptide you're interested in is the one you're looking at?

MonoMS1 takes a step back and asks this question: If I have

Solid chromatography

High resolution ion mobility

And a high resolution precursor

Do I need everything else? If I can predict where my peptide is going to elute, how many charges it will pick up, what the isotopic envelope will look like, and - here is the twist - a solid 1/k0 value, can I do proteomics?
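
A minimal sketch of what matching on those three dimensions could look like. This is not MonoMS1's actual algorithm. The predictions would come from some model I'm not showing, and the tolerances, peptide name and feature values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float       # precursor m/z
    rt_min: float   # retention time in minutes
    inv_k0: float   # ion mobility as 1/K0


def matches(pred, obs, ppm_tol=10.0, rt_tol_min=0.5, im_tol=0.02):
    """True if an observed feature is within all three tolerances of a prediction."""
    ppm_error = abs(obs.mz - pred.mz) / pred.mz * 1e6
    return (ppm_error <= ppm_tol
            and abs(obs.rt_min - pred.rt_min) <= rt_tol_min
            and abs(obs.inv_k0 - pred.inv_k0) <= im_tol)


def assign(predicted, observed, **tolerances):
    """Map peptide -> observed feature, keeping only unambiguous matches."""
    hits = {}
    for peptide, pred in predicted.items():
        candidates = [o for o in observed if matches(pred, o, **tolerances)]
        if len(candidates) == 1:  # skip peptides with zero or several candidates
            hits[peptide] = candidates[0]
    return hits


predicted = {"PEPTIDEK": Feature(mz=500.2500, rt_min=20.0, inv_k0=0.850)}
observed = [
    Feature(mz=500.2520, rt_min=20.1, inv_k0=0.855),  # close in all 3 dimensions
    Feature(mz=500.2500, rt_min=35.0, inv_k0=0.850),  # right mass, wrong RT
]

hits = assign(predicted, observed)
print(hits)
```

Every dimension you add shrinks the list of candidates, and the whole approach lives or dies on how good the predictions are and how you control false matches.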

What paper? Oh yeah - this one -

They model on HeLa peptides on a TIMSTOF Pro then they move up from an E.coli digest to increasingly complex samples - next serum - and then to single cells (by a reanalysis of this TIMSTOF SCP based paper). Someone has a big hard drive! It's a zipped terabyte of data!

In the final case the improvements are really interesting (from figure 4C). The MS2 identifications and MS1 predictions don't always agree, but the information they pull out appears complementary in downstream analysis.

For fields looking at a whole lot more precursors than fragmentation events (still!) deep learning precursors might be a solid avenue to figuring out what some of this stuff actually is.