Monday, March 27, 2017

Multi-institute study uses proteomics to fix errors in 16 Mosquito genomes!

I can't seem to fix the resolution on this image. It is just too big. You'll just have to believe that what I'm pointing at is an awesome step in the bioinformatic pipeline in this new paper in Genome Research!

The blurry highlighted line is where they use the 5 million MS/MS spectra that they got in their deep proteomics of this mosquito to correct the genome that they started this study with!  As a mass spectrometrist you might not be aware that this journal is a big deal. More proof that our field is coming of age -- proteomics correcting genomic information in one of the top Genomics journals?

To do this they also integrated RNA-Seq (transcript) data from this organism, and the pipeline is, understandably, complicated. Proteomics isn't perfect, but neither is genomics. If you've got a peptide that comes from tissue of this organism that the genome can't explain, and you look for it in the transcriptome and it's there, maybe the editors of a big journal will let you:

Add almost 400 genes that were removed from the genome in error
And fix almost 1,000 errors in the genes that are there!
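The core check behind those corrections is simple to sketch. Below is a minimal, hypothetical Python illustration (toy sequences, not the actual mosquito data): a confidently identified peptide that no annotated protein can explain, but that does appear in a translated transcript, becomes a candidate for a genome correction.

```python
def classify_peptide(peptide, annotated_proteins, translated_transcripts):
    """Triage one confidently identified MS/MS peptide for genome correction.

    A peptide covered by the current annotation needs no fix; a peptide
    absent from the annotated proteins but present in the translated
    transcriptome points at a likely annotation error.
    """
    if any(peptide in protein for protein in annotated_proteins):
        return "explained by annotation"
    if any(peptide in transcript for transcript in translated_transcripts):
        return "annotation error candidate"
    return "unexplained"
```

The real pipeline obviously does far more (six-frame translations, splice junctions, FDR control), but this is the decision at its heart.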

Mass spec nerd highlight for the paper -- to convince people outside your field that your data is amazing, maybe you need to show them that your median mass error for your peptides was 350 ppBILLION!
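For scale, sub-ppm mass error is just a tiny relative deviation, and the arithmetic is one line. The numbers below are made up purely for illustration:

```python
def mass_error_ppb(observed_mz, theoretical_mz):
    """Relative mass error in parts per billion."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e9

# A theoretical 500.000000 m/z ion measured at 500.000175
# is off by 350 ppb (i.e., 0.35 ppm).
```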

I definitely like that part of the paper -- but what I love about this paper is that they took this proof of principle (deadly mosquito vector #1) and applied it to 15 other species (15 other deadly mosquito vectors). And, you know what? They found that a lot of the mistakes made in the mosquito genome they started with had also been systematically propagated to the other mosquito species!

This makes a lot of sense: when we're automatically assembling genomic information, it is often assembled based on previous genomes. Even when manually annotating a genome you are going to carry over a lot of the same assumptions. This study shows that we don't necessarily have to run deep proteomics on every tissue of every organism on earth to drastically improve our understanding of biology!!

Sunday, March 26, 2017

Phosphoproteomics of irradiated cancer cells!

This awesome new paper in press at MCP takes a swing at filling in some of the blanks in our understanding of DNA repair when cells are hit with different kinds of radiation!

We know lots and lots about our various DNA repair pathways. Every big school has a radiation oncology department and that's what they do. Is there still more to learn here?  SILAC phosphoproteomics says yes!

The figure above describes what they did. They chose A549 because (unlike many of our normal lab cell lines) it has a reasonably normal DNA repair pathway (p53 works right), so it will be more like what we'd see with normal human cells than lines that just keep happily dividing, pushing broken DNA right into the next generation of cells.

They used an Orbitrap XL for the global proteomics/phosphoproteomics, running in high/low mode (CID ion trap MS/MS). The protein was divided by 1D gels and the phosphoproteome was enriched thoroughly with both TiO2 and IMAC. All the data was processed in MaxQuant, with thorough downstream analysis using some R packages, Cytoscape, and PhosphoSite.

Cool stuff they found was validated by PRM with heavy peptide standards, but I'm not clear on the details and this paper has way too much supplemental info for a Sunday morning paper ;)

What did they find? With these treatments, little to no changes in the global protein level -- but phosphorylation changes all over the place!

I'd like to mention that this isn't a "peak bagging" paper where the goal was "how many phosphopeptides can we detect?" (which is great, we do need those to test new methodologies, no criticism!). This is a purely biology-centric study. If the confidence level of the phosphopeptide ID wasn't awesome -- they tossed it early in the analysis. If it wasn't clearly differentially regulated in response to treatment (with statistical validity) -- they tossed it. They're looking for big hits that they can (and do) validate.
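That kind of stringent triage is easy to picture in code. Here's a minimal sketch, with hypothetical threshold values (common defaults in phosphoproteomics, not necessarily the exact numbers this paper used):

```python
import math

def passes_filters(site, min_loc_prob=0.75, max_q=0.05, min_fold=2.0):
    """Keep a phosphosite only if it is confidently localized AND
    significantly regulated in response to treatment.

    `site` is a dict with the localization probability, multiple-testing
    corrected q-value, and log2 fold change for one phosphosite.
    """
    return (site["loc_prob"] >= min_loc_prob
            and site["q_value"] <= max_q
            and abs(site["log2_fc"]) >= math.log2(min_fold))
```

Anything that fails any one of the three tests gets tossed before the biology even starts.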

You start out with amazing confidence in their methodology -- because they find the normal players (brazenly stolen table...don't sue me and I'll take it down if it is a problem! I'm just excited about these awesome results!)

BRCA1/RAD50/53BP1? Check! (They mention in the paper ATM/ATR, but it didn't make this figure.)

They find about 200 things that pass their stringent thresholds and about 1/3 of them are the normal stuff we know about. And then...the rest...IS ALL NEW STUFF (meaning it isn't listed in PhosphoSite anywhere!)

The Supplemental Info is seriously old school (and I love it!) -- page after page of manually annotated phosphopeptide spectra, their validation for each phosphopeptide with quan and stats for each radiation type.

They did make some poor person do a bunch of westerns to prove the PRMs were okay on the ones they could find antibodies for....drives me a little crazy, but it may never really go away, because a western is so much easier to explain to people outside of mass spectrometry. This team were experts from beginning to end -- the westerns look great and (surprise!) match the PRMs exactly.

Saturday, March 25, 2017

Process TMT11-plex data in PD 1.4!

...cause somebody really did ask for it...if you've got TMT11-plex reagent on the way and are using PD 1.4, you'll need to add a channel, or download this template from my DropBox.


1) Tandem Mass Tags are the property of Proteome Sciences. I'd put trademarks every time I mention these reagents on the blog, but I don't have access to symbols without using HTML.
2)  I'm not a professional mass spectrometrist
3) This looks like it works fine. No promises, though!
4) I totally made up the mass in the average mass box for the method. You've got to use HRAM to even use the TMT11-plex reagent, so I figure it doesn't matter. I took the difference between the C and N average masses in the box for the other reagents, and added that to the average for the original 131 reagent. No promises.
5) Use at your own risk!
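Point 4 above is just arithmetic, and it's worth seeing laid out. This sketch uses placeholder numbers, not real TMT reagent masses -- and as point 4 says, the value is a guess that shouldn't matter on an HRAM instrument:

```python
def guess_average_mass(avg_131, avg_paired_n, avg_paired_c):
    """Back-of-envelope average mass for the new reagent, as described
    in point 4: take the average-mass difference between an existing
    N/C reagent pair and add it to the original 131 reagent's value.

    Purely a guess -- the monoisotopic mass is what actually matters
    for high-resolution quantification.
    """
    return avg_131 + (avg_paired_c - avg_paired_n)
```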

Wednesday, March 22, 2017

MvM workflow -- Combine DDA with DIA!

This one takes a second to wrap your brain around -- but to get to proteins that are only estimated to be expressed at 50 copies/cell(!!), it is worth it.

The paper is brand new and can be found at JPR here.

The basic idea is that if you run your normal DDA (TopN-type) experiment, you can break the peptides coming off the column into 3 groups:

Group 1 -- Fragmented and identified in all runs and any label free algorithm will give you amazing quantification

Group 2 -- Fragmented in a few, but not all runs. Identified, but you'd have to infer (or impute)  their identity from MS1 only in the other runs

Group 3 -- Peptides you never fragment that are just too low in abundance to ever crack the N-most intense in your TopN experiment

The MvM strategy (Missing Value Monitoring) specifically focuses on Group 2. You have this subgroup of peptides that have been identified -- which means you have a Peptide Spectral Match (PSM) that you can use to create a spectral library.

If you then run DIA on every file you can use the spectral libraries you made to quantify the peptides with missing values across all of your runs.
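The sorting into groups can be sketched from the DDA identifications alone (Group 3 is invisible here by definition -- those peptides were never fragmented). A minimal illustration, assuming you have a set of identified peptide sequences per run:

```python
def split_dda_groups(runs):
    """Split identified peptides into Group 1 (seen in every run; easy
    label-free quan) and Group 2 (seen in only some runs -- the missing
    values that the MvM/DIA step targets).

    `runs` maps a run name to the set of peptide sequences identified
    by MS/MS in that run.
    """
    n_runs = len(runs)
    counts = {}
    for identified in runs.values():
        for pep in identified:
            counts[pep] = counts.get(pep, 0) + 1
    group1 = {pep for pep, n in counts.items() if n == n_runs}
    group2 = set(counts) - group1
    return group1, group2
```

Group 2 is the set you'd build the spectral library from and then chase across the DIA runs.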

To test this strategy, this group uses a QE (the paper says QE HF, but the method section uses resolutions that show it as a QE Classic or Plus) on yeast cells during different stages in their developmental cycle or something. They are able to get incredible depth, with even the lowest-abundance proteins being quantified in all samples.

Up-side -- This approach doesn't use any funky software and you get much better label free quan!
Down-side -- You need to run every sample for both DDA and DIA.

I really like this paper because it is a clever approach I haven't considered before. If the queue in your freezer seems to be growing at an ever faster rate, this might not be the LFQ method of your dreams ;)

But...if you have the available instrument time that you could run each sample twice, this might be a great method to consider!

Tuesday, March 21, 2017

Prosight Lite! Free top-down analysis of single proteins.

Now that 20 papers are out that have cited the use of Prosight Lite, it may be time that I actually link the paper on the blog -- as a partial thank you for how often I use this awesome free resource!

I'm too lazy to search the blog for some of the older posts on the software and I'm too busy with work to write a real post for Tuesday, so here is the paper for Prosight Lite!

Monday, March 20, 2017

Great review on structural proteomics techniques!

I've never done HDX-MS before. I think the idea is fascinating, but despite reading lots about it over the years -- well, I forget the key points.

This little review is awesome for linking stuff I do know well and stuff I don't and making a cohesive unit out of the big picture -- why we'd do this in the first place!

Even better? It is a great introduction for people who might be new to all of this -- good enough I'm gonna add it to the "Resources for Newbies" page over there -->

Shoutout to @KyleMinogue who is NOT this person! I checked.

Saturday, March 18, 2017

Great slide show on data storage and standardization!

Two great things come from following this link!

The first is (finally) something useful that came from LinkedIn!

The second -- is the great slide deck that walks you through challenges and perspectives in relation to proteomics data storage and meta-analysis!

Friday, March 17, 2017

Should you be using lower collision energy for PRM experiments?

I was running my mouth again about how PRMs on a Q Exactive could beat SRMs on a QQQ and had to blow a weekend in the lab running stuff to prove it to a bunch of skeptical people.

Caveats here for why I made this very costly dare (I probably only have a few thousand weekends left in my whole life after all...)

This researcher has only one peptide that he can use to confirm a positive outcome for this assay. One peptide. (Plus controls and whatever, of course).

There will be pressure for the LC-MS assay to be as short as possible.

The matrix is...whole...digested...human...plasma (or serum or whatever. A friend told me there was a difference yesterday and I still don't know what it is)

If you've got a protein you can get 3 peptides from for this, okay -- a QQQ might be the better choice for this assay -- but if you've just got one? I'm going PRM all day and never consider the QQQ.

I can't show the actual data cause I signed a thing that looked seriously scary. But I can tell you this -- there were so many peptides in the close mass range of this peptide in the digest on a 20 minute gradient that there was no way I could even trust SIM -- even at 70,000 resolution (max I had on the instrument I used) -- nope.  HAD to be PRM.

And -- when I was looking for fragment ions for my quantification (btw, I just extracted with Xcalibur, and I believe it sums the fragment intensities rather than averages them -- but I'm not 100% sure; the peptides look great in Skyline as well) there was enough coisolation interference with a 2.2 Da window that I couldn't use anything in the low mass range at all.

With this information I created the super-scientific scale that you see at the top of this post. I really had to go to high mass fragment ions for specificity in my quan (and the best possible signal to noise!). How complex is the matrix, when with a 2.2 Da isolation window there are low mass fragments you can't trust -- even extracted at 5 ppm...?
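The extraction itself is straightforward to sketch: pull only the targeted high-mass fragment ions, within a tight ppm tolerance, and sum (not average) the matched intensities per scan. This is a toy illustration of that logic, not Xcalibur's actual implementation:

```python
def fragment_trace(scans, target_mzs, tol_ppm=5.0):
    """Build a chromatographic trace by summing targeted fragment ion
    intensities across PRM MS/MS scans.

    `scans` is a list of (mz_list, intensity_list) pairs, one per scan;
    `target_mzs` are the fragment m/z values chosen for quantification.
    """
    trace = []
    for mzs, intensities in scans:
        total = 0.0
        for target in target_mzs:
            tol = target * tol_ppm / 1e6  # ppm tolerance in m/z units
            total += sum(i for m, i in zip(mzs, intensities)
                         if abs(m - target) <= tol)
        trace.append(total)
    return trace
```

Restricting `target_mzs` to big y-ions is exactly where the lower collision energy pays off, since that's the signal you're trying to boost.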

And, you know what? I could boost the intensity of these big fragment ions by dialing the collision energy back some.  Not a huge boost, but dropping the nCE down to 25 might have picked me up 10-20% in this particular assay for this particular peptide. (Your results may differ)

Let's check some experts!

I went to ProteomeXchange, searched "PRM", and downloaded some RAW data at random from a couple studies out there....and...I totally "discovered" settings I should have been using the whole time -- you should probably use a little less collision energy for your PRMs!

The first 2 studies I pulled...used...25! (PXD003781 and PXD001731). Two other studies, whose RAW files finished downloading just as I was wrapping this up, appear to have used 27. We're at 50/50, but my peptide really liked the lower energy.

Side note -- these samples were given to a lab that ran them on a QQQ that would cost this researcher MORE than the Q Exactive I used, LOL!

BTW, the  QQQ lost again. In ultra complex matrices where QQQ is going to lose the S/N game -- and you don't really need the 500 scans/second -- what you need is certainty that what you are quantifying is the correct compound -- my money is on PRM. And -- holy cow -- if you can save money getting a Q Exactive over a QQQ for the assay....

Thursday, March 16, 2017

High precision prediction of retention time for improving DIA!

We've had in silico peptide retention time predictors for at least 15 years -- and sometimes they work great. I don't think it is controversial at all to say that real peptide standards work better.

This recent Open Access Paper takes a look at the difference between the two -- as well as different retention time calibration mathematical models in the context of SWATH and DIA.

And the results are pretty clear from their work -- in DIA it helps a lot to have retention time control for your identifications. With the added uncertainty of the bigger windows or having the MS1 for quan that is not directly linked by the instrument to the MS/MS fragments -- this is really valuable.
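The simplest of the calibration models compared in this kind of work is just a straight line mapping library (or predicted) retention times onto what your column actually did that day. A minimal sketch, assuming you have matched library/observed RT pairs for a set of anchor peptides:

```python
def fit_rt_calibration(library_rts, observed_rts):
    """Ordinary least-squares line mapping library retention times to
    observed retention times. Returns a predictor function you can use
    to build RT windows for the rest of the library.
    """
    n = len(library_rts)
    mean_x = sum(library_rts) / n
    mean_y = sum(observed_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in library_rts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(library_rts, observed_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda rt: slope * rt + intercept
```

The paper's point is that higher-precision (e.g., nonlinear, segment-wise) models beat this naive line -- but the line is the baseline everything is measured against.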

Also, this paper is great because it highlights how ridiculously great the Q Exactive Classic is for DIA. They can get over 10% more protein IDs with their high precision iRT model, pushing standard 2 hour DIA on human digests from 4,500 protein groups up to 5,000 protein groups!

5,000 protein groups in 2 hours from human digest!!!!!  I need to do more DIA....

Wednesday, March 15, 2017

Cell wide analysis of protein thermostability!

Okay --- I've GOT to get out the door before 5 if I've got any shot of making it to my talk at the NIH this morning...

BUT...I've got to say something about this AWESOME NEW PAPER IN SCIENCE!

Man, THIS is a Science paper. One of those things where you're scratching your head wondering -- "um...okay...why would we even want to know that...?...but that was a really smart way of doing it and I bet something will come out of it!"

It's 4:47! I've gotta steal @ScientistSaba's notes (thanks!) on the paper and go!

It uses "LFQ to explore thermostability on a proteome-wide scale in bacteria, yeast, and human cells by using a combination of limited proteolysis and MS...The group maps thermodynamic stabilities of more than 8000 proteins across these 4 organisms. Their results suggest that temperature-induced cell death is caused by the loss of a subset of proteins with key functions." Sweet, right!?!

Worth noting, they do all the analysis with LFQ on a QE Plus using Progenesis IQ.
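The readout behind "mapping thermodynamic stabilities" is a melting temperature per protein. The paper fits proper denaturation curves; this is only a crude, hypothetical stand-in for that step, estimating where a measured fraction-folded signal crosses 50%:

```python
def estimate_tm(temperatures, fraction_folded):
    """Estimate a melting temperature (Tm) as the temperature where the
    fraction-folded signal crosses 0.5, by linear interpolation between
    the two bracketing measurements.

    Assumes `temperatures` is sorted ascending and the signal decreases
    with temperature. Returns None if the curve never crosses 0.5.
    """
    points = list(zip(temperatures, fraction_folded))
    for (t1, f1), (t2, f2) in zip(points, points[1:]):
        if f1 >= 0.5 >= f2:
            return t1 + (f1 - 0.5) / (f1 - f2) * (t2 - t1)
    return None
```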

Tuesday, March 14, 2017

Param-Medic -- Automatic parameter optimization without database searching!

I'm honestly having trouble wrapping my brain around how this new free piece of software works -- and whether it would be an advantage over the tools I currently use for this -- but regardless, it is an interesting read!

Somehow -- it can look at your RAW data and determine the mass accuracy settings that you ought to use for your database search, without looking at your database at all, the way Preview or the IMP Recalibration node do.

If you are using the Crux pipeline tools -- it has already been integrated as an option for you to check out. For the rest of us who don't want to use awesome free data processing pipelines from some guys in Seattle (what do they know about mass spec data processing anyway...), you can download the stand-alone version and run it in Python.
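The rough intuition, as I understand it (heavily simplified): if you can find pairs of spectra that almost certainly come from the same peptide, the spread of their precursor m/z differences tells you about your real mass accuracy -- no database needed. Here is a toy sketch of only that final step; the real Param-Medic algorithm does proper mixture modeling, and the multiplier here is arbitrary:

```python
def recommend_tol_ppm(paired_ppm_diffs, multiplier=4.0):
    """Given precursor m/z differences (in ppm) between spectrum pairs
    believed to come from the same peptide, suggest a search tolerance
    as a multiple of their standard deviation."""
    n = len(paired_ppm_diffs)
    mean = sum(paired_ppm_diffs) / n
    variance = sum((d - mean) ** 2 for d in paired_ppm_diffs) / n
    return multiplier * variance ** 0.5
```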

Monday, March 13, 2017

Awesome clinical proteomics study on weight loss!

I'm gonna be conservative and say there are about 12 reasons to read this awesome new open access paper!

I'll name a few and see how far I get

1) A "how to" for clinical proteomics. 1 hour digestion? 45 minute runs? Now -- this is something practical for a clinical setting.

1.5) This had to move up the list. The samples were prepped with a robot liquid handler thing!

2) This section title: "Plasma Protein Levels Are Individual-Specific". Holy cow! Why don't I have my own plasma proteome done yet?

3) XIC based label free quan (MaxQuant LFQ) applied to a clinical sized cohort (300+ patients; over 1200 runs!)

4) Beautiful downstream analysis -- that leads to clear biological conclusions on this cohort, including inflammation response, insulin resistance, etc.

I really think I could get to 12, but I do have a job and I should probably not be late for it!

Saturday, March 11, 2017

Ready for a new PTM to worry about? Cysteine glycosylation is all over the place in human cells!

Fun fact: Did you know that O-GlcNAc modified proteins were discovered in Baltimore over 30 years ago? See, there's more to my home town than fantastic infrastructure and friendly people!

Glycoproteomics is kind of exploding right now -- the enrichments are better, the separations are better, and the mass specs are ridiculously better, and the software has almost caught up....and I wonder if this great new paper at ACS is just the tip of the iceberg....

A whole new class of glycopeptides right under our noses! The evidence looks pretty clear cut to me -- and first analysis from this group suggests that it isn't even rare. Once they had a mechanism to enrich and a pipeline to search for them in the data they report proteins with this modification in virtually every subcellular fraction!

Friday, March 10, 2017

Changes in coffee proteomics during processing.

Want to learn a lot about coffee this morning and see some classic proteomics techniques put to good use?

Check out this new paper in Food Chem (Elsevier)

The idea? They dry coffee in different ways -- and some people have linked how they dry the coffee during processing to the quality of the coffee. Apparently, making coffee is really complicated.

So this group extracted proteins from coffee beans (btw, you need liquid N2 to extract peptides from coffee beans), did some 2D-gels and spot picked for an old MALDI-TOF to get to work on.

They find a couple dozen spots --  and can get a peptide or two from each spot for identification. Unsurprisingly they find some heat shock proteins are differentially regulated as well as a few other interesting proteins that make sense. Their next plan is to see if they can create model systems to tell if one (or more of these) are responsible for the taste difference.

I want to imagine this is how the taste test goes ---  coffee supplemented with Hsp70:

Coffee supplemented with: "homologous protein to putative rMLC Termites like superfamily protein" (another big spot on the gels)

...and now we know which one it is!

Thursday, March 9, 2017

MetaMorpheus -- Amazing software and strategy for finding all the PTMs!

I'm going to end my blog hiatus with the best paper I read while I was out recovering -- and it's this new one out of Wisconsin!

Let's start with a minor criticism -- if you saw the title of this article in your Twitter feed you might think that this is a review on the topic of PTMs and just go right past it. And you shouldn't pass by this one.

Here is the thing -- our database tools are really good at finding peptides from unmodified proteins in our databases. If your job as a proteomics scientist is to identify peptides from model organisms with perfectly complete, annotated UniProt proteins that are not regulated in any way by PTMs, you are in the clear -- we've got all the tools for you. If, however, you are studying something that actually exists in nature (i.e., modifies virtually all of its proteins with chemical modification combinations of some kind), it's still tough in this field. Our tools are designed for unmodified proteins. Looking for any modification is possible -- but computationally super expensive (example).

I LOVE this paper, btw. I had worried that my enthusiasm for it had something to do with all the painkillers from my knee procedures, but -- narcotic free -- still love the paper!  Here is the idea.

1) Screen the data at a high level with a great big mass tolerance window and look for PTMs
2) If finding evidence of the PTMs -- take a FASTA and build a more intelligent FASTA (at this point it must be XML) that includes this stuff (think of it like a ProSight Flat file where instead of using biologically curated data to build your database you are building your database with PTMs on the fly with the data that you have in hand)
3) With your smart database, re-search your RAW data with your normal tight tolerances so you get everything right.
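Step 1 is the part that's easy to sketch: in a wide-tolerance search, the leftover precursor-minus-peptide mass difference can be matched against a table of modification masses to nominate PTMs for the smarter database. A minimal illustration -- the monoisotopic delta masses below are standard textbook values, and the matching tolerance is arbitrary:

```python
# Common monoisotopic modification delta masses (Da)
KNOWN_MODS = {
    "phospho": 79.96633,
    "oxidation": 15.99491,
    "acetyl": 42.01057,
}

def nominate_ptm(precursor_mass, peptide_mass, mods=KNOWN_MODS, tol_da=0.01):
    """Match an open-search delta mass against known modifications.

    Returns the name of the first modification whose mass matches the
    precursor-minus-peptide difference within `tol_da`, else None.
    """
    delta = precursor_mass - peptide_mass
    for name, mass in mods.items():
        if abs(delta - mass) <= tol_da:
            return name
    return None
```

Collect enough of these nominations across the dataset and you have the ingredients for the augmented database in step 2.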

If you're thinking -- "hey, I can do that, I have all the tools necessary on my desktop right now" -- you might be right. You can do a wide tolerance mass search, find all your deltaM masses, convert them to PTMs, make a better database (okay...maybe you can do that...I can't...unless now I'm firing up ProSight...building a Flat file and doing the rest of it that way...) and then re-search your database.

My response -- can you download a free piece of software right now -- that'll just do the whole darned thing for you? It's called MetaMorpheus and you can get it right now -- right here!

(No relation.)

Okay -- so this doesn't come without a hitch -- you are STILL doing a huge delta M search to start your program -- and even as fast as Morpheus is, the search space is tragically large. For one of their human cancer digests it takes 13 days to run the project on what sounds like a seriously beefy machine. But to really, truly get to the bottom of these PTMs with ultra high confidence of their presence and their site specificity -- in one workflow...??!? I can't wait to give this a try!!