Thursday, April 17, 2025

An update on the proteomics of extracellular vesicles!

 



Are you also confused by what an extracellular vesicle is? And/or why people want you to do proteomics on them? Did you get two collaboration requests this week to work on them while also remembering that you already ran 400 for someone and you aren't entirely sure where you put the .d files that you are supposed to be using to write a grant?

This isn't me I'm talking about. There is no way a single lab move would make me so disorganized that I have 20 cases of p20 tips and no way to pipette a larger volume without adding to a long list on my tablet that says "I owe Matt MacDonald lab..." And I absolutely have every file, and every file backed up. It just might be on the hard drives with the p200 tips that I absolutely have on a shelf...somewhere.... 

To whoever you are, you hypothetical person who needs to know about EVs, I present this really nice, succinct, and new review.



OH! And this is important! I just found this out and you guys should know this. 

There is a Qiagen kit out there for EV enrichment. The buffer that they use in the final step to lyse the EVs contains Triton X-100. I got suspicious when I checked the MSDS and it was essentially a blank document. Of course, a vendor won't tell you what is in their secret buffer, so I put in a help ticket and asked: does this contain Triton X-100, Tween-20, or any other scary detergents I should know about? And - yes - it contains a low amount of Triton X-100. 


Wednesday, April 16, 2025

Deep spatial proteomics of ovarian cancer precursors!

 


Wow - this is just another stunning example of spatial proteomics and what it can do.


Highlights? 

The cuts are made with a little Leica LMD7 system which can drop cuts directly into 96- or (as used here) 384-well plates. These smaller systems appear a bit more approachable, but might not have the resolution to do some of the really precise cuts - I'm not sure.

Here the regions that were excised were in the 50-100 cell range and guided by histological stains (I think? not my field and I can't spend all morning on it).

Cells were digested in-plate - ACN first to lyse - that's dried off, then LysC, then trypsin. 

The cleanup was done on EvoTips and analysis was on a TIMSTOF Ultra (I forget exactly, but pretty sure) using the EvoSep 30SPD method.

Analysis with DIA-NN allowed them to quantify 10,000 protein groups?? From 50-100 cell cuts! Just stunning work that I can't wait to circulate to my colleagues this morning! 

Tuesday, April 15, 2025

AI IN PROTEOMICS! An ABRF / US HUPO Crossover event is go!

 

Yeah! It's happening!

My favorite session from ABRF 2025 is now open to all you bums who weren't heavily exposed to second hand smoke for 4 days in Las Vegas so you could see it when I did.

You can register for it here - and, like all US HUPO webinars, it'll be recorded if you've already got plans or are in a suboptimal time zone.

Monday, April 14, 2025

Pirates of the proteome!


Too funny and well done not to share! I'm going back home to Baltimore, Hon!  

True story, I'm somehow presenting exclusively in the mornings so I don't even think I can stay at my house because I-83 in the morning isn't the most fun you can have. 

Sunday, April 13, 2025

Multiplexed method allows hydroxyproline enrichment and quantification!

 


This is a really interesting approach - and application! Ever thought much about hydroxyproline? It looks like it might play a role in the progression of prostate cancer! Bonus - 12 plex DiLeu labeling makes an appearance! 



Saturday, April 12, 2025

A tissue proteome specificity map that is actually good!

 


I am so excited by this great new resource!  



And not just because the most recent big paper purporting to do such a thing (this one - still out there and still misdirecting people!) was such a pile of garbage! 

In this new Cell paper, this group did the work to FIND PROTEINS that legitimately appear to be organ specific! 

Don't want to read and just want a list of proteins that are - by protein and by RNA - basically almost entirely produced by specific organs? Check out their nifty little portal.

Their RAW data is publicly available? What?!? What an amazing concept! They actually compare protein-level and RNA measurements before claiming a protein in plasma originated in said organ? Amazing. For proteomic nerds - the organ data was acquired by DDA on a QE Plus running nanospray on 2-hour gradients. The plasma proteomics was done by DIA on a QE HF-X. High-accuracy, high-precision measurements. I'll dig into it more soon, for sure, but it's absolutely worth checking out now. 

Friday, April 11, 2025

Is there anything we CAN'T do proteomics on today?

 


Another self-serving blog post but I'm so pumped to see this out! 


About 2 years ago I was asked to review a paper for some journal and my peer review was so positive that the editor asked me if I wanted to write a commentary on it. 

I wrote the most over-the-top commentary - about a paper which was on MANGOS - the fruit - because that group somehow did great quantitative proteomics - and glycoproteomics - on mangos sitting on a shelf to figure out why they change when they ripen.

Why was I so impressed? Because the available protein sequences for the mango suck. And - it's this thing, right? We can only do proteomics on organisms that we have really high quality curated genome sequences for! Right? 

Okay - so then, concurrent story - I had to kill somewhere around 70 very venomous black widow spiders who took up residence on my property. It was not cool - at all. 

Like this grumpy asshole


and these violent little jerks - the reason the one above was so grumpy about my murder plans. 


and this asshole -


and this one 

I could keep going. Searching "spider" in my iCloud thing will keep you very busy for a while.

Okay - so the big ones are not only super venomous but they are creepy and they'll do this "I'll roll up so you can't see me - oh, that didn't work because I'm black and RED? Time to run right at you." Particularly if they're protecting an egg sac or a bunch of babies that just came out of an egg sac. BTW, they hatch in groups of 50 or so and then they play a little game of Survivor so only the meanest ones get out. They're awful. 

Tarsh Shah was working on his Master's at Hopkins and was looking for a cool thesis project. He dropped by my lab right after I'd lost an amazing PhD student to her graduation and I had a free lab bench. I had a couple ideas for projects and this was one (a drug analysis project is also on my desktop somewhere). 

Colten, Hannah, and Ahmed trained Tarsh on proteomics prep and data analysis, and Ben Neely - in my opinion the world's #1 expert on doing proteomics on under-studied organisms - provided invaluable support and advice for working with organisms with almost no annotated protein sequences. If you want to do a study on something you can't download a good UniProt protein database for - 100% start with Ben Neely's blog.

This wasn't a funded project, so S-Traps and EvoTips were donated or scrounged and instrument runs were performed on weekends when instruments weren't in use on real projects. 

Concurrent with this work, the somewhat closely related Western Black Widow Spider had a genome assembly pulled together. With even more help from Neely and tools on his GitHub (thank you!) we followed his instructions and were able to make this into a FASTA file as well.

To be clear - UniProt had 140 proteins from any black widow spider when we started out on this. There are 53,000 spider species. Scrounging RefSeq got us 14 - and when we entered all these sequences into SpectroNaut for it to generate spectral libraries for DIA analysis, I sorta expected 400 proteins and a student who now knew how to do S-Traps, load EvoTips, and evaluate a mirror plot of mass spectra.
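Just to put numbers on how thin the starting material was versus what we ended up searching, here's a minimal sketch - hypothetical file names, Python rather than anything we actually ran - of tallying and merging the public sequences with the assembly-derived FASTA before handing it all to the library-building software:

```python
# Minimal sketch (hypothetical file names) for tallying and merging the public
# black widow sequences with an assembly-derived FASTA before library building.
from pathlib import Path

def count_fasta_entries(path):
    """Count sequences by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))

sources = {
    "uniprot_black_widow.fasta": None,            # the ~140 UniProt entries
    "refseq_scraps.fasta": None,                  # the RefSeq scroungings
    "western_black_widow_assembly.fasta": None,   # built from the genome assembly
}

for name in sources:
    if Path(name).exists():
        sources[name] = count_fasta_entries(name)
        print(f"{name}: {sources[name]} sequences")

# Concatenate whatever exists into one search database for the DIA software.
with open("spider_combined.fasta", "w") as out:
    for name, n in sources.items():
        if n:
            out.write(Path(name).read_text())
```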

Regardless of what spider FASTA we used we could get over 2,000 protein groups! WTF, right? Not that long ago I was impressed when I got 2,000 proteins from human materials. When we used the Western Black Widow FASTA we got 5,500 or so! 

Now, you can totally just get a bunch of proteins, but how do you know the deep learning neural network things didn't make them up? We started hunting proteins that made sense - like what protein should be in a spider head? No idea. But can we find one, and do we only find it in the head? Yes. Then Tarsh - from his interests in pharmacology and drug functions (and possibly some ideas about how we might learn from these toxins) - found some cool papers about how black widow spider toxins work, including a recent study that showed that small spiders only produce toxins for sedating insects, but big mean spiders produce different toxins for murdering amphibians and different ones for murdering mammals. So we looked for those. And - boom - that little spider I was able to capture intact had almost no toxin expression. One of the big mean ones? Toxin proteins EVERYWHERE. Proof of concept, right? That's the figure at the top - red is high and blue is low. What's fun is that regardless of the FASTA we used, if we filter on the toxins we see those same trends. 
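If you want to run the same sanity check on your own data, the filtering step really is as simple as it sounds. A hedged sketch - hypothetical file and column names, not our actual pipeline - of pulling toxin-annotated protein groups out of a quant report and ranking the samples:

```python
# Hedged sketch - hypothetical file and column names, not the actual pipeline.
import pandas as pd

report = pd.read_csv("spider_protein_groups.tsv", sep="\t")

# Keep protein groups whose description mentions a toxin (e.g., latrotoxins).
toxins = report[report["Description"].str.contains("toxin", case=False, na=False)]

# Per-sample mean intensity of the toxin subset - the small intact spider should
# sit near the bottom, the big mean ones near the top, regardless of FASTA.
sample_cols = [c for c in report.columns if c.startswith("Intensity_")]
print(toxins[sample_cols].mean().sort_values(ascending=False))
```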

Ultimately, this started with something bad that happened that I might have nightmares about for the rest of my life. But we hope it ended as something inspirational - like maybe we can do good proteomics on just about anything, even if that organism doesn't have a beautiful FASTA library on the easiest-to-access websites? 

Thursday, April 10, 2025

Preprint compares 7 plasma proteomics methods - and doesn't get a single good correlation!

The title is overstated, I guess - there are some "strong correlations," but not the ones I really care about. SomaScan 11k vs 7k gets a strong one. Same with Olink 5k vs 3k. Olink 3k vs SureQuant gets a barely, but technically, strong one. I'm still weirded out by this preprint though. 

Huge shoutout to Cliff for bringing this preprint to my attention. Now -- I'm all hopped up on DayQuil, but Figure 2B has caused me to both laugh and have chills of terror (or DayQuil). 

Before I get too far into what I think will be every plasma proteomics person's favorite preprint of 2025 - this is where you find it. 


For a reminder - this is a general breakdown of what Spearman rho values mean - 


(Stolen from this place)
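And if you want to rebuild that kind of correlation table yourself from the supplemental data, it's only a few lines - a hedged sketch with hypothetical column names:

```python
# Hedged sketch (hypothetical column names): Spearman rho between two platforms
# for the proteins measured by both, e.g., pulled from a supplemental table.
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("cross_platform_values.csv")  # one row per protein per sample

# Keep only proteins quantified on both platforms, then correlate.
paired = df.dropna(subset=["olink_npx", "somascan_rfu"])
rho, pval = spearmanr(paired["olink_npx"], paired["somascan_rfu"])
print(f"Spearman rho = {rho:.2f} (p = {pval:.2g}, n = {len(paired)})")
```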

The methods section in the paper is really well written. This little company has access to lots of plasma; they sent some to the amazing Biognosys facility, they sent some to SEER, and they did Olink 3k and 5k, SomaScan 11k, and SureQuant. 

And.....honestly I'm not sure what to think, but this is where I'd start.

I've never done SureQuant, but the impression from friends who use it in their cores is that it is pretty damned accurate. You buy kits of spike-in heavy peptides and you use those to trigger quan on an Orbitrap, right? Slow, but high precision. That should work well.

The Biognosys prep appears to have been a classic Top14 depletion - and SEER is SEER. My impression is that they know what they are doing on those Astrals. 

The fact that there appear to be basically no good correlations between any of these platforms is striking and weird and worrisome and ultimately not at all what we've seen from others. I'll probably come back to this.

Monday, April 7, 2025

The Single Cell Omics issue of JPR is out! Some papers feature actual single cells!

 


Wooohooo! We knew this was coming for a year, but none of my stuff got accepted in it, which is totally okay. We rebundled a couple for Nature family journals where they'll probably get in. 😇 I did get to contribute by reviewing 3 or 4 of them. 

Like most of the single cell proteomics literature out there, it's mostly perspectives and highlights and reviews. 

Actually - this is funny - let's do it this way - here is each paper in the issue and the number of single cells that were analyzed. It's actually a whole lot better than I guessed! Out of 12 articles 7 of them acquired single cell data! 

Take these numbers with a grain of salt - most single cell studies go to amazing lengths to not let you know how many cells they actually analyzed. For some of these I had to go to the supplemental Excel tables, and for one I gave up and counted and recounted the number of dots on various UMAP and t-SNE figures because I was too lazy to go to ProteomeXchange and count the .raw files. 


Now - don't twist this - there is some great stuff here - but there is a misconception out there that because there are lots of single cell papers, every lab is doing lots of single cell proteomics by mass spectrometry. The perception that there is lots of data out there in the world makes it seem like it's easy. That means people who have never done it can get grants to do it and will then spend the first couple of years trying to figure out how to do it. 

Again - there is great stuff here. I haven't read it all, but I've read about half of these to some level and I'm already integrating findings here into my personal work. 

I can't follow the scSeq meta-analysis papers - and applying scSeq to my set of biological problems just isn't in my area of interest right now. 

The MALDI mass spec of the funny fish cells looks like an awesome study. Those fish cells might be bigger than frog eggs (200 ng of protein?) - I honestly don't know - but 6,000 single cells by MALDI is a fun read, right? 

I've cited the EMT preprint at least 3 times already - it's a lot of fun, and I had an entire session that I named after the Payne lab perspective paper. The use of LysC rather than trypsin to simplify low input samples was featured on this blog last week and I just got a bunch of LysC in to see if their findings will translate to actual cells. Single cell lipidomics on the TIMSTOF is something that we've already looked at using this pipeline and I asked someone to buy me the software already. How you store single cells might have already been posted here - but it's 100% absolutely worth thinking about right now. And I've got to get cells to a collaborator and we're discussing whether we should fast fix them in formaldehyde now. 

That was...a....rant..... but you should check it out! 

Sunday, April 6, 2025

metLinkR - can we finally translate all metabolite identifiers??

 


I don't have time to dig into this but I will have to. Do you remember all the IUPAC stuff they lied to us about in undergrad? Like this molecule has this name because it's all been standardized? All lies. Metabolomics people spend as much time making up new names for old molecules as proteomics people do making up spectral library formats. 

Trying to link those data back to a molecule should be easy, right? I've been assured by many people that no. Absolutely and completely not easy. Go check out the Fiehn lab Abacus stuff. You can do it one at a time but it is very hard to do a bunch of them. Maybe this will help??



Saturday, April 5, 2025

TIMSTOF Ultra2 is up and running - first impressions!

 


I'm only up to 300 LCMS injections but I'm at least at a point where I have first impressions of the instrument. This is a new TIMSTOF Ultra2 and refurb/demo EvoSep One System. 5 years ago (!!!) I had impressions of the Flex, my first ever Bruker instrument. A lot has happened in 5 years, and these things have continued to mature.

First off - the source is so much better. I won't mince words (is that a thing? mix?) - I hate the classic CaptiveSpray source on the TIMSTOFs. This was the source I had on the Flex and on the TIMSTOF SCP. It's fragile, has too many pieces, and is too tricky for people to put together correctly and repeatedly. Since this post is mostly positive I will share something negative and funny. The old CaptiveSpray source is such a pain in the butt to put together that - not even kidding - an extremely capable Bruker field apps person once left my lab with one put together wrong. That's a pretty good sign you need a new source, right? 

The Ultra2 has a completely different source and it's a night-and-day difference. It isn't simple - you remove the CaptiveSpray and brackets separately - but unless you try to do those things backward there appears to be little chance of doing it wrong. The pieces are largely steel (with the obvious exception of the silly glass capillary and gold O-rings), but you don't need a checklist and the hands of a surgeon to put it together. Huge upgrade. Is an EasySpray source easier? Absolutely! But this is a tremendous one. 

Huge surprise for me - the PASER and HyStar integration have come a long, long way. It's now called BPS - I can't even guess what that stands for - but if you also have a PASER sitting in a corner that you stopped using when you started running DIA, you might want to revisit it. 

BPS not only runs Bruker's version of DIA-NN, but it can also integrate TIMSScoring and - get this - it'll run SpectroNaut! You can still run stuff from the command line through PASER, so this is how mine is set up now - 


It's currently triggering stand-alone DIA-NN 1.8 to run on the data acquisition PC and it is running SpectroNaut (let's call it SpectroNaut Lite) on the PASER PC. I'll move it up to a new version of DIA-NN this week. Running the GPU version of 2.x definitely results in more IDs, but 1.8 and SpectroNaut seem to be in better agreement (again - very limited run time). 

Worth noting - SpectroNaut lite will make you a .SNE file AND you can go back and merge .SNE files into reports - but if you want to look at your data (which is the whole reason I pay for SpectroNaut) you need to pay for SpectroNaut. I got a fantastic discount on full SpectroNaut for being a BPS user, which makes it the least I've ever paid for a SpectroNaut annual key. I'll prioritize paying for that before the tariffs kick in. 

How are the results? Unreal. I had an SCP but it was one of the very first commercial models. I don't know if the reason this is a bigger jump in my hands than the Flex to SCP is because my SCP was an early model - or if this is just that big of a jump. 

I wonder if the jump from the Fusion 1 to Fusion 3 is a good analogy? Or the jump from a QE to a QE HF? Probably. That's not to say the SCP didn't get great data. I'll eventually have a couple of papers out from that system. But if you wanted my Ultra2 right now and offered me two SCPs that for some reason couldn't be upgraded to this one - I wouldn't take the trade. Here is what is running off the system this weekend (again - only 300 injections). 

These are in reverse order. Definitely ignore the sample names. There is no way I had a retired chromatographer help me come up with a column compatible with the 80 SPD method that has 90% more theoretical plates than the one that is recommended by the manufacturer. I'm not going to get in trouble because separating peptides on a 5 cm column is silly. 


These are in reverse order, so 19 was my first 2 ng injection on my way out the door on Friday. 31,000 peptides from 2 ng is nuts. But it seems to stabilize at 40,000 peptides and 5k protein groups. 200 pg seems to hang out around exactly half of that. 

Samples 9 and 10 were where I ran out of the 200 pg peptide sample. That's the dilution I made the most of, and I have trouble estimating how much I have left from the volume in a 1.5 mL tube. So my blanks are blank.

I'm so so so so pumped with this system so far. 

Another big huge jump for the system is that I can tune it directly from TIMSControl - I don't have to switch to the antiquated OTOFControl to tune my resolution and TOF sensitivity and then reboot my PC 11 times to get it to recognize TIMSControl again. 

Where are the problems? Compass Data Analysis. For Thermo users: imagine that you had a 16-bit version of Xcalibur that worked just great for a TSQ Quantum or LCQ and seemed to struggle with an LTQ, but you're opening Astral data with it. Are the functions still there? Sure! Can you find them? Maybe! Can it do it fast? Ummmm.....no...not fast. And then imagine that if you wanted to extract an XIC on another PC you had to buy another license of it. Is it that big of a deal? No, but it only feels right to complain about something while gushing about the single biggest purchase of my career. 

Until people join the lab (this summer, I think!) I'm not going to play with it much. I'm prepping single cells almost entirely label-free (largely using the One-Tip method, by printing cells from a Tecan Uno into EvoTips - though I'm having some problems with some cells that I'm working through). My plan is to spend 1 day every 2 weeks sorting/prepping single cells. Then they run on the EvoSep until the next batch. I can easily do 600 cells, and at 40SPD that's 2 weeks of run time. Then I can spend my time writing grants, papers, meeting faculty, writing hiring exemptions - and other PI stuff. When the first people join the lab we'll have Bruker out to train them and I'll ...ummm.... pass off the single biggest purchase of my career...off to those super smart and capable young people... mostly. I have some dumb ideas and I should probably do them myself! 

Friday, April 4, 2025

SimpliFi is live now! (Commercial cloud software for data interpretation and sharing!)

 


Disclaimer: I've been a long time alpha/beta tester for this commercial product. That ultimately means that I've been using this Cloud based tool kit for free for years and occasionally providing useful (?) feedback to the developers. 

They've never once asked me to blog about it, but I'm about to stop being a freeloader and buy an annual license and some credit hours. Also, it might have been live for a while and I only just discovered that 1) it was and 2) that $800/year and $0.20/credit is something I can afford (academic pricing?). For most of my stuff, $200/1,000 credit hours goes a long way. 

$800 puts it at just about the same price as a big group-negotiated bundle deal for Ingenuity. I've paid less each year for Ingenuity, but then I've only been able to log on super early in the morning because we had limited licenses. I like this so so so much better than Ingenuity. 

SimpliFi, however, is designed for proteomics (and metabolomics and can do transcriptomics) but I've only ever pushed the one button. 

Why do I like this? It's smart and simple. You just load your CSV or TSV or Excel or whatever into it and then it can generally recognize exactly what you're looking at. It says stuff like "this is your accession column and I think these are your quantification columns." If it is wrong or you want to ignore a sample, you just un-highlight them. 
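For what it's worth, you can get a feel for how that kind of column guessing might work with a few lines of pandas - this is just a hedged sketch of the general idea, not whatever SimpliFi actually does under the hood:

```python
# Hedged sketch of naive column guessing - NOT SimpliFi's actual logic.
import pandas as pd

df = pd.read_csv("my_protein_report.csv")  # hypothetical input table

numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = [c for c in df.columns if c not in numeric_cols]

# Guess the accession column: the text column with the most unique values.
accession_guess = max(text_cols, key=lambda c: df[c].nunique(), default=None)

print("Accession column (guess):", accession_guess)
print("Quantification columns (guess):", numeric_cols)
```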

In my opinion, the figures are also publication ready 


And biology is easy to get to (in some cases, of course) - in this one my drug definitely screws with the nucleus (found that out myself, but it's cool that SimpliFi would have found it had I initially used it) 


Also! I can load data into this, make a link, and send it to collaborators, and they can just dig through their own data themselves! That part doesn't use credits - only the data normalization, clustering, that sort of stuff does - and if you're doing small-n experiments it doesn't cost a lot of them.

Why might you not like it? 

When it is detecting batch effects - I don't know how - there isn't a paper (yet?). When it is normalizing your input data - I don't know how it is doing it. When it is looking for run-order effects (like your signal dropping over time?) - I don't know how. If you don't like black boxes, this isn't for you. 

It also might cost a million dollars/year for industry, I don't know. 

www.simplifi.protifi.com

Thursday, April 3, 2025

Deeper spatial proteomics with MALDI and collagenase digestion!

 


MALDI mass spectrometry is beautiful and can have really impressive spatial resolution these days, but a single spectrum can only look at so many things at once. Even if you had amazing ion capacity and dynamic range, once you divide that by the thousands of tryptic peptide (and matrix) ions that are around, you're not going to see much past the absolute highest-intensity stuff. 

I think the very best we ever saw from a single MALDI shot in Namandje's lab was 100 peptides(?) and I think reasonable FDR wouldn't have been so kind. That was also a very large sampling size with FTMS readout (high resolution and high capacity but low dynamic range - sort of averages out). 

What if you could simplify your proteomic matrix so there were just fewer peptides hanging around? We've seen some interesting stuff recently for single cell loads where bigger peptides are better. Sounds like MALDI is something that could also benefit. 

What about collagenase proteomics? 

What? 

Yes, collagenase. The stuff you use to rapidly extract DNA from tissue for quick genotype tests? Yup! 

Unlike our friend trypsin, which cuts at K and R and makes nice medium-sized peptides, this enzyme is a lot pickier. It cuts at G-P-X domains - and while I'm very unclear on whether this would be helpful outside of regions where there is lots and lots of collagen - this study focused on the tricky proteomics of the extracellular matrix, or ECM (which appears to be lots and lots and lots of collagen).

Cool - so how on earth do you analyze peptides produced from this weird enzyme off of a MALDI spectrum? You can make a ridiculous number of guesses - or - you can do LCMS and use a lot of standard proteomic tools to understand the peptides and move backwards! 
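To make the "ridiculous number of guesses" point concrete, here's a hedged sketch of a naive in-silico digest using a simplified Gly-Pro-X rule - purely an illustration of the combinatorics, not the enzyme's true specificity and not the authors' workflow:

```python
# Hedged sketch: naive in-silico "digest" with a simplified non-tryptic rule.
# Cleaves after any Gly-Pro-X triplet - an illustration only, not the true
# collagenase specificity or the pipeline used in the paper.
import re

def gpx_digest(sequence, min_len=3, max_len=30):
    # Split after every Gly-Pro-X triplet and keep plausibly sized fragments.
    fragments = re.split(r"(?<=GP.)", sequence)
    return [f for f in fragments if min_len <= len(f) <= max_len]

collagen_like = "GPQGPAGPPGPIGNVGAPGAKGARGSAGPPGATGFPGAAGRVGPPGPS"
peptides = gpx_digest(collagen_like)
print(len(peptides), "candidate fragments:", peptides)
```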

This study was a lot of work, btw.... LCMS is used to understand the peptide sequences, including where and how they charge and their ion mobility - then that LCMS information is used to inform the peptide picking from the MALDI. 

End result? They analyze some patient FFPE tissues at 20 µm resolution and come back with hundreds of peptides identified by MALDI matching. Compared to trypsin, collagenase helps them identify nearly 2x the peptides in the ECM, and digesting directly off of tissue slices for LCMS is way more relevant than in-solution digestion. There is a lot of biology here that they seem excited about that is outside of my wheelhouse, but there is some neat stuff because the collagenase peptides are often +1 charged in ESI and MALDI, so they're straightforward matches. 

Ultimately, MALDI papers sometimes seem like pretty pictures and not a whole lot else, but this is not one of those. This looks like a really innovative way to get completely new insights from those FFPE blocks. 


Tuesday, April 1, 2025

It looks like Lab Developed Tests for diagnostics are back on the table in the US?!?

 


For an old post on what a Lab Developed Test (LDT) is vs an In Vitro Diagnostic (IVD) you can go here

And...in what was an altogether extremely bad day for the FDA - with thousands of people finding out when they tried to use their badges and they simply didn't work - the agency also lost a federal appeal in this recent ruling.

The American Clinical Lab Association sued to overturn FDA's new rules for moving all new (?) or all (?) diagnostics to IVD designation. So....for those of us who put things like "and then we'll move this to an FDA-approved medical device LCMS system..." in Aim 4 of our grants, we don't have to come up with something more clever to write for how to translate our findings. 

To the thousands of HHS/NIH/FDA employees who just found out they were cut by a heartless and misinformed administration, I'm sorry and I hope you can continue making the world a better place in a better role for yourselves. 

Monday, March 31, 2025

Two people I know are in ScienceNews talking about AI in Proteomics!

 

Yeah! Proteomics is bigtime! 

Check this out: 1) InstaNovo is finally out and 2) two people I know were interviewed about it and they seem to think it's smart! 

InstaNovo final paper here


omicsGMF - Why I'm going to have to install R Studio on my new laptop...

 


You know - I was really pumped when my kid's puppy was like "HEY HEY HEY GUY I HATE, WAKE UP OR I'M ABSOLUTELY GOING TO TAKE A DUMP IN YOUR APARTMENT!!" 

So we walked in the pouring rain without either of our coats until she found the absolute perfect place to poop about 3 blocks from our apartment at 4am. 

And then my Inbox was like - "HEY HEY HEY GUESS WHAT! YOU'RE GOING TO HAVE TO TOTALLY INSTALL R STUDIO ON YOUR PERFECTLY OKAY NEW LAPTOP!" 

And here is why I have to stop putting it off....


I don't have an accessible (free) solution for large-scale batch effect correction right now. Do you? I guess I can MSstats it, but a long time ago I realized I'm probably just not smart enough to use it. omicsGMF does require me to fire up R, which I do have an official certificate saying that I took a bunch of classes in. But it looks to me like I don't have to think about it much after I do. It looks like it is smart enough to apply the corrections if I just get everything formatted the right way. 
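While I get everything formatted, a crude run-order/batch sanity check doesn't need any fancy package at all - a hedged sketch in Python with hypothetical column names, which is obviously not what omicsGMF (an R package) is actually doing:

```python
# Hedged sketch of a crude run-order / batch sanity check - hypothetical column
# names, and not a substitute for omicsGMF's actual correction.
import pandas as pd

df = pd.read_csv("protein_intensities_long.csv")  # columns: run, batch, protein, intensity

# Median intensity per run, in acquisition order - a drifting median is the
# classic "my signal is dropping over time" signature.
per_run = df.groupby(["batch", "run"])["intensity"].median().reset_index()
print(per_run.sort_values("run"))

# Per-batch medians - big offsets here are what you'd want corrected.
print(df.groupby("batch")["intensity"].median())
```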

Now - the upside is that as long as I don't try to rollerblade to work today (Pittsburgh pavement is tough to predict when it's wet) my day absolutely has to get better! 

https://github.com/statOmics/GMFProteomicsPaper

Sunday, March 30, 2025

Selected knockouts of the HLA / MHC presentation system!

 


Forwarding this one in just a second for sure! 


Most of what I know about the HLA/MHC immunopeptidomics / neoantigen presentation system comes from this amazing old paper in Cell Immunity. 

The mechanisms for processing and presentation are largely inferred from the large number of high confidence peptides they identify - but again - this is inference.

If we really wanted to understand how this system worked, couldn't we just knock out each protein along the way and do proteomics and immunopeptidomics? I mean, that's what a lot of people would tell you to do, and that's what this cool new MCP paper does. 

They knock out 11 proteins in the system and there are 3 or 4 that cause big systematic changes in what peptides are expressed on the cell surface. The ramifications are probably beyond me and absolutely beyond what time I've got to think about it today but some collaborators are going to totally dig all of this new insight! 

Saturday, March 29, 2025

MASSIVE.UCSD links are all updating! Here's how you find your data!

 


I was freaking out just a little late last night in lab. You ever get those 100GB warnings from MASSIVE and then the time limit goes by and you're like "false alarm, nothing changed?" 

If you're over your FairUse limits with Thermo RAW files those files will disappear.

With Bruker .d file format you have embedded folders. Those folders stay, they just get emptied out. For real - it'll look like you still uploaded like 800 .d files and they're still there - but they...aren't....

As I was trying to figure out what I still had backups of -and what I didn't - none of my FTP links that worked before ABRF seemed to work now. 

Right now you just need to add the -ftp into every address.

For example - the Proteome Discoverer files I need to answer a reviewer's question were at

ftp://massive.ucsd.edu/v06/MSV000093434/

And the webportal version thinks that's what they still are - 


But they're actually at 

ftp://massive-ftp.ucsd.edu/v06/MSV000093434/search/
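If you have a pile of old links sitting in a manuscript or a reviewer response, the host swap is a one-line substitution (the search/ subfolder in my case is just where the Proteome Discoverer results happen to live):

```python
# Rewrite the old MASSIVE FTP host to the new one (same accession as above).
old_link = "ftp://massive.ucsd.edu/v06/MSV000093434/"
new_link = old_link.replace("ftp://massive.ucsd.edu", "ftp://massive-ftp.ucsd.edu")
print(new_link)  # ftp://massive-ftp.ucsd.edu/v06/MSV000093434/
```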

That's probably all stuff that will be fixed at the end of the migration. You might need to update some reviewers if you told them they could find the stuff at the former, but it's now at the latter. 

Friday, March 28, 2025

Another...unbelievable.... single cell proteomics study - in situ - even....

 


Would you believe you could sample single cells directly off a plate and lyse/digest and transfer that cell with virtually no losses? 

No? Sounds sort of unbelievable, and there is that whole "extraordinary claims require extraordinary evidence" thing or something.  

Would you believe over 3,000 proteins per single cell with a TIMSTOF Pro 1 system running a 21 minute active gradient on those same cells? 

No? Would that immediately make you think - wow - that should have ultra super fucking amazingly extraordinary goddamned evidence, 'cause I don't think I've ever gotten more than 500 proteins on a very similar system regardless of how long the gradient has been - even on a cell line I like a lot that has a higher protein content than either of those? 

Would you be really annoyed that those instrument files weren't provided? 

What if, when you follow a bunch of links to the Electronic Supplemental Information where you were promised access to the data, you instead find an Excel spreadsheet with 20 columns (10 cells each?) and 2,150 rows of protein values and not one single missing value? 


In SINGLE CELLS PROCESSED IN SPECTRONAUT? You can't get 0 missing values if you had 2 ug of each of these injections, yo.
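If you ever want to run that particular smell test yourself, it takes about four lines - a hedged sketch with a hypothetical file name:

```python
# Hedged sketch: how much missing data is in a single cell protein matrix?
# (hypothetical file name; rows = proteins, columns = cells)
import pandas as pd

matrix = pd.read_excel("single_cell_protein_matrix.xlsx", index_col=0)
missing_fraction = matrix.isna().mean().mean()
print(f"Overall missing values: {missing_fraction:.1%}")
# Label-free single cell data with 0.0% missing values across thousands of
# proteins is a red flag worth asking the authors about.
```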

Would you start to wonder if there is actually something of a link between some groups and their ...unbelievable...advances in a new field and the fact they only seem to publish in places where providing your data isn't required? Maybe I'm just a jerk, but this study seemed so cool until about page 3. 

I'm not linking the study, but if you see it, I'd suggest you crank the skepticism up to 11 or so. 



Thursday, March 27, 2025

PCA-N - 36,000 human plasma proteomics samples by mass spec in 311 days!

 


WHOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAAAA!!!!!

Dataset for reanalysis alert!! 

EDIT - EEEEEEEEEEEEEEEEEEEEEEEEEEEEKKKKKK! I can't find the data files. Contacting authors now. 

I have some 100SPD files from Matt Foster's Astral and I think they're like 8GB? So....8GB x 36,000? Is that 288 TB? It might just be uploading and they wanted to preprint something this year? 

The Mann lab optimized a perchlorate-based enrichment to get a super cheap sample prep method giving them about 2,000 proteins per sample in human plasma. Then did 36,000 of them?!?! No nanoparticles? Off-the-shelf stuff? I need to read this, but - whoa. 






Wednesday, March 26, 2025

Artificial Intelligence (AI) in Proteomics - at ABRF 2025!


Artificial intelligence stuff has come so far in such a short time. Just last year, a sick 3 year old and I could not get either of the ones I pay for to generate an image of a turtle swimming in ketchup. And - today? Check that out! Totally did it! The future is now! 

Okay - so in what might possibly be slightly more useful - what about a killer session at ABRF 2025 on AI in proteomics? This featured two young scientists who actually know what they're talking about. 



Sebastian Paez from Talus Bio went through where and how we're using AI (surprising places! like inside some of the instruments?? what??) and how our "raw" data has already been manipulated a whole lot. Right now we use our old-fashioned search tools and then we go back with AI learning machine things and clean them up. He emphasized new tools where the smart computer is on the front end rather than cleaning stuff up afterward. He also showed a cautionary tale where a modification searched in closed vs open search resulted in completely different results. 

The session ended with Justin Sanders from the Noble lab at the University of Washington giving the best description of how Percolator (and derivative tools) work that I've ever seen. Really cool stuff in the context of where these things succeed (and how we know) and where they are still not very good (PTMs).
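If you've never thought about what these rescoring tools are actually optimizing, the target-decoy bookkeeping underneath them is simple enough to sketch - hedged, heavily simplified, and not Percolator's actual algorithm:

```python
# Hedged, simplified target-decoy FDR sketch - not Percolator's actual algorithm,
# just the bookkeeping idea it (and derivative tools) build on.
import pandas as pd

psms = pd.DataFrame({
    "score":    [9.1, 8.7, 8.5, 7.9, 7.2, 6.8, 6.1, 5.5],
    "is_decoy": [False, False, False, True, False, False, True, False],
})

psms = psms.sort_values("score", ascending=False).reset_index(drop=True)
decoys = psms["is_decoy"].cumsum()
targets = (~psms["is_decoy"]).cumsum()
psms["fdr"] = (decoys / targets).clip(upper=1.0)
# q-value: the lowest FDR achievable while still accepting this PSM.
psms["q_value"] = psms["fdr"][::-1].cummin()[::-1]
print(psms)
```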

A big surprise in this session was Graham Wiley at the University of Oklahoma. My picture from that was even worse than the ones above. My wife ordered me a new phone this morning. The battery couldn't survive a 5-hour flight home on airplane mode, which made finding an Uber at 2 in the morning a lot of fun. 

Dr. Wiley is a genomics guy who is doing tons of proteomics now thanks to Olink technology. Like the really good stuff (5,100 proteins). While the session was about AI, this part was a side benefit - I couldn't scribble notes fast enough.

Like every question that I've had about -

How much does it cost? (If you've already got an Illumina Nova) it'll add about $350k in mandatory robotics. (List price, so probably a lot less). 

What's your throughput? One tech can do about 172 samples in a day (sample prep side) with all the nice robotics. 

What's it cost per sample in reagents and labor? His group (which is a certified global service provider) is running about $406/sample in costs. 

What are other limitations to consider? Runs on the Nova thing are very sample specific. Like you can't fire the thing up with plasma in some wells(or equivalent?) and cell lysate in the other. You need those runs to be separate. 

A big advantage for his group has been using the nice robots for other purposes when they aren't running these preps. Again, it might seem like I got distracted, but this was a super valuable addition for me to this cool session! If you're interested in the deep Olink stuff, you can find more about his lab and services here

Tuesday, March 25, 2025

LimeLight! Share DDA proteomics data if you can install Docker!

 


Okay - so this is really cool as a user - even if I feel like I'd only ever use it to look at someone else's data because I don't think I meet the minimum requirements to share my data. 


Getting an account and digging through the example data is really intuitive. I'm all about more transparency, and so it goes here. Maybe I'm sleepy and moving fast, but when we get to a bunch of prompts to install my Docker my brain sort of checked out. It seems like every time a piece of software has a Docker requirement, that means I can no longer look at MALDI imaging data on my PC because I messed up something in software I paid for. On new PCs here I don't think I have the permissions necessary to actually try (which is probably a good thing). 

For real - I'm probably just old and cantankerous and can't keep up with all this newfangled stuff. But - I tell you what - if you put your data there and I'm the reviewer, I'll totally dig through it on LimeLight rather than downloading all your files locally and digging through them. It looks like it can answer most of my first questions (which is totally a good thing!)