Wednesday, February 13, 2019

Ever had trouble transferring RAW data files and getting them to work??

Check out the bottom part first. Have you seen this?!? I feel like I have...or....

But the upper part of this awesome data transfer Twitter discourse is what's super cool. Phil wrote a python package for data transfer and quality measurements of said data transfer!

You can get it here.
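For anyone who just wants a quick sanity check on a transferred file, here's a minimal sketch of the generic idea: hash the original and the copy and compare them. This is NOT Phil's package and says nothing about how it works; the function names `sha256_of` and `transfer_is_intact` are made up for illustration, and only the Python standard library is used.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file.

    The file is read in 1 MB chunks so a multi-GB RAW file
    never has to fit in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source, copy):
    """True when the copy matches the source in size and SHA-256 digest.

    Hypothetical helper for illustration only.
    """
    source, copy = Path(source), Path(copy)
    # Cheap check first: a size mismatch means the copy is already bad.
    if source.stat().st_size != copy.stat().st_size:
        return False
    return sha256_of(source) == sha256_of(copy)
```

If `transfer_is_intact("original.raw", "copy.raw")` comes back `False`, the copy differs from the original at the byte level, which would explain a file that refuses to open after the transfer.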

Proteomics provides a way of assessing the health of fish!

(Photo borrowed from Hanover Koi Farms )

I like the advice in the image. But how do you tell if a fish is sick?!?  

I often can't tell if Miss Puff is alive or not unless I pet her and she opens her eyes. I've spent the majority of my life with dogs. 


Eyes open? She's alive! 

(Vet says she's surprisingly healthy for a dog who has outlived several previous owners)

Okay -- so what about FISH!?!?  At first glance you'd assume they're dead, or at least drowning, since they aren't coming up for air all the time, right? 

And we also probably don't want to eat sick fish. And we don't want sick fish making other fish sick. 

How straightforward is that?

How ridiculously commercially valuable could that be?? Particularly as aquaculture continues to pick up the slack for our over-fished, heating, and plastic-filled oceans!?!? 

Tuesday, February 12, 2019

Sneaky MetaMorpheus update!! LOW RES MS/MS enabled!!!

If you haven't noticed, my favorite tools these days are MSAmanda 2.0 (CharmeRT enabled so I can second search each spectrum -- and then look at it in PD or MS2Go) and MetaMorpheus. We're generating new conclusions and new inspiration from the two of them all the time!

The drawback of the two is that there is still a lot of low res MS/MS out there. PRIDE is full of it! So are like 15 hard drives here! Both engines can process these spectra, but neither is ideal for it. It totally makes sense. If you're trying to confirm PTMs, you don't want to do that in ion trap MS/MS if you don't have to, right? It's so much easier and more confident in high res. (Both engines do more than this, of course, but that's primarily why I'm firing them up.)

However, if you've got 900 low res MS/MS files that you've acquired over the last 10 years or so, and you've been dying for the software to get better so that you can look indiscriminately in those files for a ton of different PTMs using free software, you should be VERY interested in the new MetaMorpheus update!

If you don't have the software -- you can get it here.

P.S. What the heck is dioxidation on Tryptophan?!?!?  And why are really nice looking spectra appearing as super abundant in these weird cells with the metabolic defects?!?!? 

Monday, February 11, 2019


I'm not a visual learner. I still -- to this day -- have no idea how anyone interprets anything from a 3D ion map. It all just looks the same to me, even when other people can point out real data variations in color and density and whatever. When I was younger and cared more, I'd nod my head a lot and say "I see..."

(Found by Googling "serious intellectual pug" -- I know you were wondering)

Let's find all the PTMs!! Let's do it because this file shifts like crazy compared to this file! And let's do it with DeltaMass.

You can get it here (you can get it with Java)!

Sunday, February 10, 2019

TABULIZER -- Pull TABLES out of PDFs!!!!

Have you ever pulled a great paper like this one and been super excited to replicate the data, but been shocked to find:

1) A big table with 510 exact masses for EXACTLY what you want to target? Thank you, awesome authors for doing this incredible amount of work. I'd sure hate to even type 510 exact mass targets into my instrument method window, or TraceFinder, or Skyline, or whatever....
2) There is no table of the data in the supplemental info?!?!?
3) You've got to be shitting me. There is no way the journal doesn't have a supplemental table with the exact masses of 510 compounds. What... the.... actual...

Guess what! Now you can channel rage into the 80 pound heavy bag you just installed in your office (or, more appropriately, into learning how to use R....) because TABULIZER can take tables from PDFs and pull them out so you can use them! (with R...)

You can get it through CRAN -- or the GitHub thing is here.

Saturday, February 9, 2019

Please participate in this cool survey on free proteomics data processing software!

Hey! You! You got under 30 seconds to answer a two question survey?

Check it out. It's here!

Yasset's gathering information. And that has never, as far as I know, turned out to be a bad thing for our field.

You'll note HIVE is not listed on the survey. If you are using that, skip the survey and email me directly. I'll put out an S.O.S. and try to find help to extricate you.

Friday, February 8, 2019

Doing MHC/HLA peptidomics? Please stop if you aren't using charge based mass filters!!!

Peptidomics or MHC or HLA profiling is all the rage right now. And for good reason. By figuring these things out there are all sorts of potential drug targets, therapies, etc.

I'm going to screw up the nomenclature here. When I say "HLA peptide" I mean the endogenous peptide that is held by the HLA protein on the cell surface as shown in the HLA picture near the bottom.

I've rambled about these things on this blog almost since the beginning, but I'd always looked at it from the data processing side until recently. Our group got a cool grant this year on it and we've looked at it in detail and even hired a great young scientist to specifically work on just this.

We've downloaded just about every HLA study from the literature we can --- and -- wow.... there are loads and loads of wasted cycle time. In fact, in a lot of the studies we have in hand, the majority of the ions selected for fragmentation cannot be HLA peptides. Like the picture above (files were from a paper in a journal called Cell? or something). We're writing this up and have proof, but one of the other Senior Scientists and I talked about this and decided we needed to get these ideas out to people ASAP. This is too important to wait for months of peer review. Here is the summary: If you run HLA peptides the same way as tryptic digests (and, especially, if you just add +1 peptides) you are doing yourself and your samples a disservice.

There are really easy ways to improve this, btw. If you've got a Fusion system, feel free to stop reading this and go to this paper.

After processing like 12 data sets and running around screaming -- I was thrilled to find out that there was a recent HLA study where someone optimized the mass spec for the targets they were trying to identify. It's that one from the Elias lab, and I'm thrilled that we're all working on the same project. This is the highest number of HLA peptides ID'ed/unit time of any study we've analyzed so far. (Having a Lumos doesn't hurt, but the advanced instrument logic is the star).

Let's think about these peptides for a second. This image was put into the public domain by Pdeitiker and is on the Wikipedia page.

These peptides are loosely(?) packed into a pocket on the surface of the cell. As far as anyone can tell they're held in this 3 dimensional thing and they have extremely tight size restraints.

If you look online for information about these things you'll get to visit some amazing websites. Bioinformaticians of various levels of skill have been working really hard with 3D modeling, fancy sounding statistics things, and what appears to be the AOL webpage editor to make amazingly inaccurate predictions of what peptides will be "presented" by the HLA proteins. However, if you look hard enough you can find huge spreadsheets of all the actually identified HLA peptides. (Ben: Add link when you remember)

I found this one spreadsheet that has 156,000 identified endogenous HLA class 1 peptides (again, nomenclature) in it -- and this is the distribution pie chart from it:

Over 122,000 of them are 9-mers, and the next 18% are 10-mers.

Yo. Don't fragment something that has an uncharged mass of 4,000 Da!!!! It ain't what you're looking for.

So -- how do you set this up, fancy pants?

With size and charge based filters.

This is how the Elias lab set theirs up. If it's a +1 and it's 300 m/z, it isn't an HLA peptide. Don't fragment it!! If it's a +3 and 300 m/z, SURE!
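The logic behind a size and charge filter is just arithmetic, and a rough sketch looks like the block below. To be clear, these are NOT the Elias lab's actual settings. The assumptions here: the 8 to 14 residue window borrowed from the QE figure discussed in this post, and the loosest possible mass limits (an all-glycine 8-mer on the low end, an all-tryptophan 14-mer on the high end). The function names are invented for illustration, and a real method would use a much tighter window.

```python
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da
GLY = 57.021464     # lightest residue (glycine), monoisotopic, Da
TRP = 186.079313    # heaviest standard residue (tryptophan), monoisotopic, Da


def neutral_mass(mz, charge):
    """Uncharged monoisotopic mass of a positive ion from its m/z and charge."""
    return (mz - PROTON) * charge


def mass_window(min_len=8, max_len=14):
    """Loosest possible mass limits for peptides of min_len..max_len residues.

    Assumption: lower bound is poly-Gly, upper bound is poly-Trp,
    each plus one water for the peptide termini.
    """
    return min_len * GLY + WATER, max_len * TRP + WATER


def could_be_hla_peptide(mz, charge, min_len=8, max_len=14):
    """True if the precursor's neutral mass falls inside the window."""
    low, high = mass_window(min_len, max_len)
    return low <= neutral_mass(mz, charge) <= high


print(could_be_hla_peptide(300.0, 1))   # +1 at 300 m/z: about 299 Da, too small
print(could_be_hla_peptide(300.0, 3))   # +3 at 300 m/z: about 897 Da, plausible
print(could_be_hla_peptide(1001.0, 4))  # about 4,000 Da uncharged, too big
```

Even with limits this generous (roughly 474 to 2,623 Da), the +1 ion at 300 m/z and the 4,000 Da precursor are both ruled out before any cycle time gets spent on them.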

If you're asking -- what about me and my blue collar Q Exactive?

You need 2 scan events. I swear, it totally pays off. Check out the figure at the top where a QE spent >60% of its time fragmenting things that are smaller than 8 amino acids or bigger than 14.

This is what you can get with 2 scan events in a QE

You set one dd-MS2 event that is only allowed to fragment +1 peptides. That needs to be a high mass range. For example, only scan from 700-1600 m/z or something. Trigger only on the ions that are +1.

Then put in a second dd-MS2 scan event that runs from 300 up to 800 or so. Then only allow +2 and +3 peptides to fragment.
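To make the two-event idea concrete, here's a small sketch of the decision the instrument is being asked to make for each precursor. This is plain Python logic, not the QE method editor or any vendor API. The `ScanEvent` class and `triggering_event` function are hypothetical names, and the m/z ranges are just the "or something" numbers from the two paragraphs above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanEvent:
    """One dd-MS2 event: an m/z range plus the charge states allowed to trigger."""
    name: str
    mz_low: float
    mz_high: float
    charges: frozenset

    def accepts(self, mz, charge):
        return self.mz_low <= mz <= self.mz_high and charge in self.charges


# Illustrative values taken from the text, not from a validated method.
EVENTS = (
    ScanEvent("event 1: +1 only, high m/z", 700.0, 1600.0, frozenset({1})),
    ScanEvent("event 2: +2/+3 only, low m/z", 300.0, 800.0, frozenset({2, 3})),
)


def triggering_event(mz, charge, events=EVENTS):
    """Return the name of the first event that would fragment this ion, or None."""
    for event in events:
        if event.accepts(mz, charge):
            return event.name
    return None


print(triggering_event(300.0, 1))   # None: a +1 at 300 m/z never gets fragmented
print(triggering_event(950.0, 1))   # picked up by event 1
print(triggering_event(450.0, 2))   # picked up by event 2
print(triggering_event(450.0, 4))   # None: +4 is not allowed in either event
```

The point of splitting it this way is that a +1 ion only earns an MS2 scan when its m/z is high enough to be a plausible peptide, while the low m/z range is reserved for +2 and +3 ions.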

I have pictures. I can't seem to find them.

I promise, though, that the time you lose in adding an extra MS1 scan isn't anywhere close to a loss of 60% of your cycle time because you took your normal tryptic peptide method and said to the QE "go ahead and fragment +1 peptides as well."

I'll also post the methods to   Yes, I hope that is still a real thing. I haven't responded to an email from anyone who has volunteered to help the project in 4 months because I'm a terrible person and very very sleepy, but it's about to become my #1 priority again. Hopefully the volunteers are still onboard (I have the strangest suspicion that I'm tough to work with).

Wednesday, February 6, 2019

Harvard won the race. Real time Search -- MS3 on Fusion Lumos

I know of one other group that was working on this as well -- and I'd be confident guaranteeing there were some others, but it probably isn't a surprise that Gygi lab won the race.

Some instrument vendors have had real time peptide sequencing for years. Unfortunately, those vendors seem to have their instruments designed and constructed by lemurs with unlimited access to amphetamines (best explanation I have for some of this stuff), so the technology hasn't reached more than a few -- unfortunate -- scientists.

Thermo's metabolomics flagship, the IDX, has real time data acquisition and partial processing capabilities, demonstrating that the onboard resources can support something like this.

It was putting the pieces together. Here they are and the results are predictably marvelous.

10-plex TMT proteomics. 50% the acquisition time.... Is this the end of another of the final TMT limitations?

Tuesday, February 5, 2019

Measuring individual ions increases protein resolution....

Wait. What?

255 milliseconds?


ON AN INTACT PROTEIN (yes, a ridiculously easy one to work with, but still....)?!?!?

Anyone else have the feeling that ASMS Atlanta is going to be crazy....?

Saturday, February 2, 2019

Machine Learning Reveals Protein Signatures for ALS!!

How'd I miss this one?!? It's 🔥🔥🔥🔥🔥

Nevermind. It came out in November. And I wasn't aware November had happened yet.

You can check it out (open!) here! 

ALS is another one of those stupid human diseases that we don't put nearly enough resources into investigating and -- consequently -- know very very little about.

Time for some proteomics! (Hey! I know some of these people! Whasssup!?!? I love this study!)

Okay -- so -- let's complicate matters. How much do we know about CSF? Well...I know two people who are currently trying to figure out what the "normal CSF"-ome even is. So -- not as much as you'd guess. Yes. This is 2018 and we still don't know what normal human body fluids are like. We'll get there.

How do you get over a hurdle like this? A LEARNING MACHINE and expert level proteomics.

In case you're interested in the proteomic data -- it's on Chorus and it's #1439.

It's what I know proteomics data is supposed to look like when it's coming off the instrument -- but -- somehow -- mine doesn't ever seem to quite look that way.

Sample #64...for example....(right? I know!)

The samples are Plasma and CSF from 30+ patients with ALS and 30+ healthy controls. The samples were depleted and digested.

A QE Plus instrument and -- fuuuuuucccck...... you have to read this yourselves..... It's like a real scientist came into our big fun proteomics science fair and took over..... Yes. This is awesome.

Despite the quality of the separation, you may be surprised to learn that nanoLC was used and the detector was a QE Plus. The CV of the plasma data is 1.6%. I was fine not knowing that the number could get that low. It was certainly better for my self-esteem.
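In case the CV number needs unpacking: assuming the usual definition (sample standard deviation divided by the mean, as a percentage), it can be sketched like this. How the paper actually calculated its 1.6% isn't stated here, so treat this as the textbook formula only. The intensities in the example are invented for illustration and have nothing to do with the study's data.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two measurements to compute a CV")
    average = mean(values)
    if average == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / average * 100.0


# Made-up replicate intensities for one peptide, purely illustrative.
replicates = [9.8e6, 1.00e7, 1.02e7, 9.9e6]
print(f"CV = {percent_cv(replicates):.1f}%")
```

A CV in the low single digits means replicate measurements of the same thing barely move from run to run, which is why 1.6% on plasma is worth being impressed by.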

Starting with this beautiful data -- everything then got more awesome somehow.

How do you do machine learning on a body fluid proteomics dataset? You break out the RStudio and you follow these directions.

I'm going to stop gushing over this paper and go do something else, but if you want an expert lesson in proteomics, or just an ego check, I strongly suggest you check this one out!

Friday, February 1, 2019

RUN!!! It's ASMS Abstract deadline!!!

What are the odds that 3 conferences -- 2 months, 4 months and 5 months away -- would all have their abstract deadlines in the same 10 days?  Wait. What are the odds that we can't present any of the same stuff at any of them...? How did that happen...? Do we all have super severe ADHD...?

Welcome to January 2019!!!  And this is the big one!!  ASMS ATLANTA!!

Abstracts are due today, Friday February 1st!!!