Wednesday, March 29, 2017

ProtVista -- Visualize all the protein information!


As we continue to pile new genome and protein sequences into all these databases, we have to come up with something better than just text sequences. You could argue that the few graphical sequence tools out there could use a refresh as well.

I'm happy to see that people are actively working on this stuff! Proof is in this extremely short open access paper in Bioinformatics!


It details this new visualization tool: what resources it uses, where to download it, and where to get the source code. But the coolest info in the paper (to me, anyway!) is that it is already in use by multiple tools we all already use!

To verify, I went after a protein I studied for a while, typed it into UniProt -- and BOOM!


Once you find the protein (and the correct species), click the accession ID, then click Feature Viewer, and you get the ProtVista results! This isn't just colorful, it's super useful. For example, if you click on PTMs, it expands and you get all these little symbols:


Hover over them and it gives you a little box that shows what the modification is and who has verified it. Check out the disulfide bond row (column? whatever...)!! How ridiculously valuable is that if you are really characterizing this protein in depth? I'll tell you. Very ridiculously valuable.

If this isn't enough ProtVista for you, you can go directly to the page and get the software for yourself -- with instructions -- and the source code here!


Minor aside here -- while looking for this tool, I discovered UniProt hosts a surprisingly useful YouTube page. Seriously, I normally just go to UniProt and bumble around looking for FASTAs, clicking lots of things till I find them. I was vaguely aware they were adding lots of cool tools to it (and maybe always had some), but I'd just grab my Zip files and run away.

There is a lot of cool stuff here and these videos will show you what they are. Maybe even more useful? There are really basic intro videos you can send to customers or collaborators if they're outside the field and ask you what a "FASTer" is!  ;)

Tuesday, March 28, 2017

Integrated proteomics and metabolomics of fungal-endobacterial systems!


Another amazing paper this week in a non-proteomics centric journal.  This one is in Environmental Microbiology and integrates proteomics and metabolomics to figure out how symbiotic relationships work between fungi and bacteria!


A lot of the biology is mostly new to me. I vaguely recall something from some biology class (...the more I think about it the more I feel like it was in the 90s...then the pastel flashbacks start happening again...did I really think a mullet was a good idea...stop thinking about the 90s, Ben!!) about fungi and bacteria working together in symbiotic relationships.

These fungal-endobacterial relationships are the focus of this awesome paper. There are some serious technical challenges here, like:
- you extract proteins from these systems and you've got proteins from multiple species
- the fungal biomass is way larger than the bacterial biomass

It is worth noting that this isn't trivial at all. There are several studies leading up to this one, and several more on the way, but they have assembled a really smart system for separating what is what!

They have a model fungus that they can grow -- and they can suppress the bacterial symbiont! (Spell check says "symbiont" is not a word, but I'm leaving it.) If they look at quantifiable changes in the proteome (also not a word, I might add...) and the metabolome under different conditions, in both the normal fungus-bacteria system and the fungus with its bacteria suppressed, then they can start to figure out what's what.

In this study they use media that has plenty of nitrogen and media that is low in nitrogen and compare the two conditions and extract for global proteomics and metabolomics.

The metabolomics is done on an Exactive running in negative only mode. In an interesting variation I've never considered (that makes a lot of sense) early in their run they use a lower scan range than later in the run. Since I keep seeing things like lactic acid come off my columns here around the 1.5 minute mark, and my lipids don't start showing up until around 10 minutes, I'm absolutely stealing this trick for the next project!
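If it helps to picture the trick, a time-segmented method is basically a lookup from retention time to scan range. Here's a minimal Python sketch; the segment boundaries and m/z ranges are made-up numbers for illustration, not the values the authors used.

```python
# Hypothetical method segments: (start_min, end_min, low_mz, high_mz).
# Illustrative numbers only; NOT the scan ranges from the paper.
SEGMENTS = [
    (0.0, 5.0, 70.0, 600.0),     # early: small polar stuff (lactic acid and friends)
    (5.0, 30.0, 150.0, 1500.0),  # later: bigger, greasier molecules like lipids
]


def scan_range_at(rt_min):
    """Return the (low, high) m/z scan range in effect at a retention time (minutes)."""
    for start, end, low, high in SEGMENTS:
        if start <= rt_min < end:
            return (low, high)
    raise ValueError(f"no method segment covers {rt_min} min")


print(scan_range_at(1.5))   # (70.0, 600.0)
print(scan_range_at(10.0))  # (150.0, 1500.0)
```

On a real instrument you'd set this up as time segments in the method editor; the point is just that each part of the gradient gets the scan range that fits what elutes there.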

For the proteomics, I'm just a little fuzzy on the sample prep...nope! I get it! I think they are taking part of their protein lysate and digesting it in solution, FASP'ing the other part, and then combining the two. I'm not sure why I haven't done this....for every paper we've seen showing that one digestion method and another give slightly different results, it's obvious, right? If you want a comprehensive picture, why not use a couple of digestion techniques...? Honestly it seems stupid obvious right now, but I'm not sure I've ever seen this done either!

The proteomics is straight online MudPIT -- an 11 hour 2D run into an Orbitrap Elite running in high/high mode. The data is at PRIDE PXD003240. The dataset isn't available yet (this paper is brand new). Honestly, I just wanted to see how big these RAW files are!

Okay, gotta wrap this up and go to work! What do they get out of the metabolomics and proteomics? 1) They can tell which parts of the total metabolism of this biomass are coming from the bacteria! Depleting the bacterial load depletes these metabolites (boom!) and the global proteomics backs it up.

They get such awesome coverage of both the proteomics and metabolomics that they have supporting info on these observations. Stuff like "there is much less of this metabolite, oh and there is much less of the enzyme that makes it!!," forming a beautifully complementary picture!!

Side notes: they do label free quan with ProRata and downstream stats with the qvalue package in R; they have a custom designed set of metabolite standards they use for verifying metabolite IDs; and the differential stats appear to all be done in various R packages.
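If "q-value" is a fuzzy concept, here's a simplified Python sketch of the idea behind Storey's method. It is not the R qvalue package's implementation: it estimates pi0 (the fraction of true nulls) at a single lambda of 0.5, whereas the package smooths over many lambdas by default. With pi0 = 1 it reduces to Benjamini-Hochberg.

```python
def qvalues(pvals, lam=0.5):
    """Simplified Storey-style q-values from a list of p-values.

    pi0 is estimated at a single lambda; with very few p-values above lambda
    this crude estimate can get unrealistically small.
    """
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of pi0 * m * p / rank over its own rank and every larger rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


print(qvalues([0.001, 0.01, 0.02, 0.6, 0.8, 0.9]))
```

With these six p-values, three are above 0.5, so pi0 comes out to exactly 1 and the result matches Benjamini-Hochberg.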


Monday, March 27, 2017

Multi-institute study uses proteomics to fix errors in 16 Mosquito genomes!


I can't seem to fix the resolution on this image. It is just too big. You'll just have to believe me that what I'm pointing at is an awesome step in the bioinformatics pipeline in this new paper in Genome Research!



The blurry highlighted line is where they use the 5 million MS/MS spectra that they got in their deep proteomics of this mosquito to correct the genome that they started this study with!  As a mass spectrometrist you might not be aware that this journal is a big deal. More proof that our field is coming of age -- proteomics correcting genomic information in one of the top Genomics journals?

To do this they also integrated RNA-Seq (transcript) data from this organism, and the pipeline is, understandably, complicated. Proteomics isn't perfect, and neither is genomics. But if you've got a peptide from tissue of this organism that the genome can't explain, and you look for it in the transcriptome and it's there, maybe the editors of a big journal will let you:

Add almost 400 genes that were removed from the genome in error
And fix almost 1,000 errors in the genes that are there!
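The core logic (peptide observed in tissue, genome can't explain it, transcriptome can) boils down to a triage step. Here's a hypothetical Python sketch using plain substring matching; the function name and buckets are mine, and the paper's actual pipeline is far more involved than this.

```python
def triage_peptides(peptides, genome_proteins, transcript_proteins):
    """Sort MS/MS-identified peptides by which sequence source can explain them.

    genome_proteins: protein sequences translated from the current gene models
    transcript_proteins: protein sequences translated from RNA-Seq transcripts
    """
    buckets = {"genome": [], "transcript_only": [], "unexplained": []}
    for pep in peptides:
        if any(pep in prot for prot in genome_proteins):
            buckets["genome"].append(pep)
        elif any(pep in prot for prot in transcript_proteins):
            # Real peptide, missing from the gene models, supported by transcripts:
            # a candidate missing gene or gene-model error worth a closer look.
            buckets["transcript_only"].append(pep)
        else:
            buckets["unexplained"].append(pep)
    return buckets


genome = ["MKTAYIAKQR"]
transcripts = ["MKTAYIAKQR", "GSHMLEDPVK"]
print(triage_peptides(["TAYIAK", "LEDPVK", "WWWWK"], genome, transcripts))
```

The "transcript_only" bucket is the interesting one: those are the peptides that argue for adding or fixing a gene.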

Mass spec nerd highlight for the paper -- to convince people outside your field that your data is amazing, maybe you need to show them that your median mass error for your peptides was 350 ppBILLION!
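For scale, 350 ppb is 0.35 ppm. The arithmetic is just the usual ppm formula with 10^9 in place of 10^6; a tiny Python sketch with made-up m/z values (not data from the paper):

```python
from statistics import median


def mass_error_ppb(observed_mz, theoretical_mz):
    """Signed mass error in parts per billion (ppm would use 1e6)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e9


def median_abs_error_ppb(pairs):
    """Median absolute error over (observed, theoretical) m/z pairs."""
    return median([abs(mass_error_ppb(obs, theo)) for obs, theo in pairs])


# Made-up (observed, theoretical) m/z pairs
pairs = [(500.2501, 500.25), (750.3749, 750.375), (1000.50035, 1000.5)]
print(round(median_abs_error_ppb(pairs)))  # 200
```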

I definitely like that part of the paper -- but what I love about this paper is that they took this proof of principle (deadly mosquito vector #1) and applied it to 15 other species (15 other deadly mosquito vectors). And, you know what? They found that a lot of the mistakes made in the mosquito genome they started with had been systematically propagated to the other mosquito species!

This makes a lot of sense. When we're automatically assembling genomic information, it is often assembled based on previous genomes. Even when manually annotating a genome you are going to rely on a lot of the same assumptions. This study shows that we don't necessarily have to run deep proteomics on every tissue of every organism on earth to drastically improve our understanding of biology!!

Sunday, March 26, 2017

Phosphoproteomics of irradiated cancer cells!


This awesome new paper in press at MCP takes a swing at filling in some of the blanks in our understanding of DNA repair when cells are hit with different kinds of radiation!


We know lots and lots about our various DNA repair pathways. Every big school has a radiation oncology department and that's what they do. Is there still more to learn here?  SILAC phosphoproteomics says yes!

The figure above describes what they did. They chose A549 because (unlike many of our normal lab cell lines) it has a reasonably normal DNA repair pathway (p53 works right), so it will be more like what we'd see with normal human cells than others where they'll just keep happily dividing, pushing broken DNA right into the next generation of cells.

They used an Orbitrap XL for the global proteomics/phosphoproteomics, running in high/low mode (CID ion trap MS/MS). The proteins were fractionated on 1D gels and the phosphoproteome was enriched thoroughly with both TiO2 and IMAC. All the data was processed in MaxQuant, with thorough downstream analysis using some R packages, Cytoscape and PhosphoSite.

Cool stuff they found was validated by PRM with heavy peptide standards, but I'm not clear on the details and this paper has way too much supplemental info for a Sunday morning paper ;)

What did they find? With these treatments, little to no changes in the global protein level -- but phosphorylation changes all over the place!

I'd like to mention that this isn't a "peak bagging" paper where the goal was "How many phosphopeptides can we detect?" (which is great, we do need those to test new methodologies, no criticism!). This is a purely biology-centric study. If the confidence level of a phosphopeptide ID wasn't awesome -- they toss it early in the analysis. If it wasn't clearly differentially regulated in response to treatment (with statistical validity) -- they toss it. They're looking for big hits that they can (and do) validate.
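The "toss it" logic is easy to picture as a filter. A hypothetical Python sketch; the field names and the thresholds (0.75 localization probability, 2-fold change, adjusted p < 0.05) are my placeholder assumptions, not the cutoffs the authors used.

```python
from dataclasses import dataclass


@dataclass
class PhosphoSite:
    site: str                 # e.g. "A_S10" (hypothetical naming)
    localization_prob: float  # confidence the phosphate is on this residue
    log2_ratio: float         # log2(irradiated / control) from SILAC
    adj_p: float              # multiple-testing-corrected p-value


def keep_hits(sites, min_loc=0.75, min_abs_log2=1.0, max_adj_p=0.05):
    """Keep only confidently localized sites with a clear, significant change."""
    hits = []
    for s in sites:
        if s.localization_prob < min_loc:
            continue  # weak ID/localization: toss early
        if abs(s.log2_ratio) < min_abs_log2 or s.adj_p >= max_adj_p:
            continue  # not clearly differentially regulated: toss
        hits.append(s.site)
    return hits


sites = [
    PhosphoSite("A_S10", 0.99, 2.1, 0.001),   # keeper
    PhosphoSite("B_T5", 0.50, 3.0, 0.001),    # big change, shaky localization
    PhosphoSite("C_S7", 0.95, 0.3, 0.001),    # significant but tiny change
    PhosphoSite("D_Y2", 0.90, -1.5, 0.2),     # big change, not significant
]
print(keep_hits(sites))  # ['A_S10']
```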

You start off with amazing confidence in their methodology -- because they find the normal players (brazenly stolen table...don't sue me...contact me and I'll take it down if it is a problem! I'm just excited about these awesome results!)


BRCA1/RAD50/53BP1? Check! (They mention in the paper ATM/ATR, but it didn't make this figure.)

They find about 200 things that pass their stringent thresholds and about 1/3 of them are the normal stuff we know about. And then...the rest...IS ALL NEW STUFF (meaning it isn't listed in PhosphoSite anywhere!)

The Supplemental Info is seriously old school (and I love it!) -- page after page of manually annotated phosphopeptide spectra, their validation for each phosphopeptide with quan and stats for each radiation type.

They did make some poor person do a bunch of westerns to prove the PRMs were okay on the ones they could find antibodies for....which....you know....drives me a little crazy, but may never really go away because a western is so much easier to explain to people outside of mass spectrometry.  This team were experts from beginning to end -- the westerns look great and (surprise!) match the PRMs exactly.

Saturday, March 25, 2017

Process TMT11-plex data in PD 1.4!


...cause somebody really did ask for it...if you've got TMT11-plex reagent on the way and are using PD 1.4, you'll need to add a channel, or download this template from my DropBox.


DISCLAIMERS:

1) Tandem Mass Tags are the trademark property of Proteome Sciences. I'd put trademarks every time I mention these reagents on the blog, but I don't have access to symbols without using HTML.
2)  I'm not a professional mass spectrometrist
3) This looks like it works fine. No promises, though!
4) I totally made up the mass in the average mass box for the method. You've got to use HRAM to even use the TMT11-plex reagent, so I figure it doesn't matter. I took the difference between the C and N average masses in the box for the other reagents, and added that to the average for the original 131 reagent. No promises.
5) Use at your own risk!
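Disclaimer 4's trick (borrow the N-to-C offset from the existing pairs and add it to 131) also works on the monoisotopic reporter ions, which are what HRAM quan actually extracts. A Python sketch using the commonly published TMT reporter ion m/z values; this is just the arithmetic, not how PD 1.4 stores its method.

```python
# Monoisotopic TMT10-plex reporter ion m/z values (singly charged)
REPORTERS = {
    "126": 126.127726,
    "127N": 127.124761, "127C": 127.131081,
    "128N": 128.128116, "128C": 128.134436,
    "129N": 129.131471, "129C": 129.137790,
    "130N": 130.134825, "130C": 130.141145,
    "131": 131.138180,
}


def predict_131c(reporters):
    """Estimate the new 131C channel from the average N->C offset of existing pairs.

    Each N/C pair differs by swapping a 15N for a 13C, roughly 6.32 mDa.
    """
    offsets = [reporters[f"{n}C"] - reporters[f"{n}N"] for n in ("127", "128", "129", "130")]
    return reporters["131"] + sum(offsets) / len(offsets)


print(f"{predict_131c(REPORTERS):.4f}")  # 131.1445
```

That lands right on the ~131.1445 m/z commonly listed for the TMT11-plex 131C reporter (the old 131 channel becomes 131N).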


Wednesday, March 22, 2017

MvM workflow -- Combine DDA with DIA!



This one takes a second to wrap your brain around -- but...to get to proteins that are only estimated to be expressed at 50 copies/cell(!!) it is worth it.

The paper is brand new and can be found at JPR here.


The basic idea is that if you run your normal DDA (TopN-type) experiment, you can break the peptides coming off the column into 3 groups:

Group 1 -- Fragmented and identified in all runs and any label free algorithm will give you amazing quantification

Group 2 -- Fragmented in a few, but not all runs. Identified, but you'd have to infer (or impute) their identity from MS1 only in the other runs

Group 3 -- Peptides you never fragment that are just too low in abundance to ever crack the N-most intense in your TopN experiment

The MvM strategy (Missing Value Monitoring) specifically focuses on Group 2. You have this subgroup of peptides that have been identified -- which means you have a peptide-spectrum match (PSM) that you can use to create a spectral library.

If you then run DIA on every file you can use the spectral libraries you made to quantify the peptides with missing values across all of your runs.
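The grouping above is easy to express once you have per-run identifications. A hypothetical Python sketch (the function name and data layout are mine, not from the paper); note that Group 3 can't show up at all, because by definition those peptides never got an MS/MS ID.

```python
def mvm_groups(ids_per_run):
    """Split DDA peptide IDs into Group 1 (all runs) and Group 2 (some runs).

    ids_per_run: dict mapping run name -> set of peptides identified by MS/MS.
    Returns (group1, group2, missing), where missing maps each Group 2 peptide
    to the runs in which it would need library-based DIA quantification.
    """
    runs = list(ids_per_run)
    all_ids = set().union(*ids_per_run.values())
    group1 = {p for p in all_ids if all(p in ids_per_run[r] for r in runs)}
    group2 = all_ids - group1
    missing = {p: [r for r in runs if p not in ids_per_run[r]] for p in sorted(group2)}
    return group1, group2, missing


runs = {
    "run1": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
    "run2": {"PEPTIDEA", "PEPTIDEB"},
    "run3": {"PEPTIDEA", "PEPTIDEC"},
}
g1, g2, missing = mvm_groups(runs)
print(sorted(g1), sorted(g2), missing)
```

The `missing` map is exactly the list of holes the DIA runs plus the Group 2 spectral library are supposed to fill.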

To test this strategy, this group uses a QE (the paper says QE HF, but the methods section uses resolutions that point to a QE Classic or Plus) on yeast cells during different stages of their developmental cycle or something. They are able to get incredible depth, with even the lowest-abundance proteins being quantified in all samples.

Up-side -- This approach doesn't use any funky software and you get much better label free quan!
Down-side -- You need to run every sample for both DDA and DIA.

I really like this paper because it is a clever approach I haven't considered before. If the queue in your freezer seems to be growing at an ever faster rate, this might not be the LFQ method of your dreams ;)

But...if you have the available instrument time that you could run each sample twice, this might be a great method to consider!

Tuesday, March 21, 2017

Prosight Lite! Free top-down analysis of single proteins.




Now that 20 papers are out that have cited the use of Prosight Lite, it may be time that I actually link the paper on the blog -- as a partial thank you for how often I use this awesome free resource!

I'm too lazy to search the blog for some of the older posts on the software and I'm too busy with work to write a real post for Tuesday, so here is the paper for Prosight Lite!


Monday, March 20, 2017

Great review on structural proteomics techniques!


I've never done HDX-MS before. I think the idea is fascinating, but despite reading lots about it over the years -- well, I forget the key points.

This little review is awesome for linking stuff I do know well and stuff I don't and making a cohesive unit out of the big picture -- why we'd do this in the first place!

Even better? It is a great introduction for people who might be new to all of this -- good enough I'm gonna add it to the "Resources for Newbies" page over there -->

Shoutout to @KyleMinogue who is NOT this person! I checked.



Saturday, March 18, 2017

Great slide show on data storage and standardization!


Two great things come from following this link!

The first is (finally) something useful that came from LinkedIn!

The second -- is the great slide deck that walks you through challenges and perspectives in relation to proteomics data storage and meta-analysis!

Friday, March 17, 2017

Should you be using lower collision energy for PRM experiments?


Okay...so I was running my mouth again about how PRMs on a Q Exactive could beat SRMs for a QQQ and had to blow a weekend in lab running stuff to prove it to a bunch of skeptical people.

Caveats here for why I made this very costly dare (I probably only have a few thousand weekends left in my whole life after all...)

This researcher has only one peptide that he can use to confirm a positive outcome for this assay. One peptide. (Plus controls and whatever, of course).

There will be pressure for the LC-MS assay to be as short as possible.

The matrix is...whole...digested...human...plasma (or serum or whatever. A friend told me there was a difference yesterday and I still don't know what it is)

If you've got a protein you can get 3 peptides from for this, okay -- a QQQ might be the better choice for this assay -- but if you've just got one? I'm going PRM all day and never consider the QQQ.

I can't show the actual data cause I signed a thing that looked seriously scary. But I can tell you this -- there were so many peptides in the close mass range of this peptide in the digest on a 20 minute gradient that there was no way I could even trust SIM -- even at 70,000 resolution (max I had on the instrument I used) -- nope.  HAD to be PRM.

And -- when I was looking for fragment ions for my quantification (btw, I just extracted with Xcalibur, and I believe it sums the fragment intensities rather than averaging them, but I'm not 100% sure; the peptides look great in Skyline as well) there was enough coisolation interference with a 2.2 Da window that I couldn't use anything in the low mass range at all.

With this information I created the super-scientific scale that you see at the top of this post. I really had to go to high mass fragment ions for specificity in my quan (and the best possible signal to noise!). How complex is this matrix -- that with a 2.2 Da isolation window there are smaller peptides you can't trust, even when you extract at 5 ppm...?
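Here's roughly what that fragment selection and extraction amounts to, as a hypothetical Python sketch: keep only fragments above some m/z cutoff, match each one within 5 ppm, and sum the matched intensities (summing mirrors what I believe Xcalibur does, though I'm not 100% sure). The cutoff value and the peak list are made-up illustrations.

```python
def within_ppm(observed_mz, target_mz, tol_ppm=5.0):
    """True if observed_mz is within tol_ppm of target_mz."""
    return abs(observed_mz - target_mz) / target_mz * 1e6 <= tol_ppm


def prm_quant(peaks, fragment_mzs, min_fragment_mz, tol_ppm=5.0):
    """Sum intensities of selected fragment ions in one PRM MS/MS scan.

    peaks: list of (mz, intensity) tuples from the scan
    fragment_mzs: theoretical fragment ion m/z values for the target peptide
    min_fragment_mz: fragments below this are skipped (low-mass region hit by
        co-isolation interference); the value is an assumption per assay
    """
    total = 0.0
    for target in fragment_mzs:
        if target < min_fragment_mz:
            continue
        # Take the most intense peak inside the ppm window (0 if nothing matches)
        matched = [inten for mz, inten in peaks if within_ppm(mz, target, tol_ppm)]
        total += max(matched, default=0.0)
    return total


peaks = [(300.1601, 5.0e5), (650.3302, 2.0e5), (820.4101, 1.0e5), (820.4201, 9.0e4)]
print(prm_quant(peaks, [300.16, 650.33, 820.41], min_fragment_mz=500.0))  # 300000.0
```

The 820.4201 peak sits about 12 ppm away from the 820.41 target, so the 5 ppm window ignores it, and the 300.16 fragment is skipped by the cutoff even though it's the most intense one.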

And, you know what? I could boost the intensity of these big fragment ions by dialing the collision energy back some.  Not a huge boost, but dropping the nCE down to 25 might have picked me up 10-20% in this particular assay for this particular peptide. (Your results may differ)



Let's check some experts!

I went to ProteomeXchange and searched "PRM" and downloaded some RAW data at random from a couple studies out there....and...I totally "discovered" settings I should have been using the whole time....yeah...you should probably use a little less collision energy for your PRMs!

The first 2 studies I pulled...used...25! (PXD003781 and PXD001731). 2 other studies, whose RAW files finished downloading just as I was wrapping this up, appear to have used 27. We're at 50/50, but my peptide really liked lower energy.

Side note -- these samples were given to a lab that ran them on a QQQ that would cost this researcher MORE than the Q Exactive I used, LOL!

BTW, the QQQ lost again. In ultra complex matrices, where the QQQ is going to lose the S/N game and you don't really need the 500 scans/second -- what you need is certainty that what you are quantifying is the correct compound -- my money is on PRM. And -- holy cow -- if you can save money getting a Q Exactive over a QQQ for the assay....

Thursday, March 16, 2017

High precision prediction of retention time for improving DIA!


We've had in silico peptide retention time predictors for at least 15 years -- and sometimes they work great. I don't think it is controversial at all to say that real peptide standards work better.

This recent Open Access Paper takes a look at the difference between the two -- as well as different retention time calibration mathematical models in the context of SWATH and DIA.


And the results are pretty clear from their work -- in DIA it helps a lot to have retention time control for your identifications. With the added uncertainty from the bigger isolation windows, and with the MS1 used for quan not being directly linked by the instrument to the MS/MS fragments, this is really valuable.
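The simplest version of retention time calibration is a straight line from library iRT values to observed retention times, fitted on the spiked-in standards, then used to reject IDs that elute somewhere implausible. The paper compares several calibration models; this Python sketch shows only the basic linear one, with invented numbers and an arbitrary 2-minute tolerance.

```python
def fit_irt_calibration(library_irt, observed_rt):
    """Least-squares line mapping library iRT values to observed RT (minutes)."""
    n = len(library_irt)
    mean_x = sum(library_irt) / n
    mean_y = sum(observed_rt) / n
    sxx = sum((x - mean_x) ** 2 for x in library_irt)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(library_irt, observed_rt))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_consistent(candidate_irt, observed_rt, slope, intercept, tol_min=2.0):
    """Accept an identification only if it elutes near its predicted RT."""
    predicted = slope * candidate_irt + intercept
    return abs(observed_rt - predicted) <= tol_min


# Made-up standards: iRT units vs. observed minutes on a hypothetical gradient
std_irt = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
std_rt = [10.0, 26.0, 42.0, 58.0, 74.0, 90.0]
slope, intercept = fit_irt_calibration(std_irt, std_rt)
print(rt_consistent(50.0, 50.5, slope, intercept))  # True
print(rt_consistent(50.0, 60.0, slope, intercept))  # False
```

A tighter (higher precision) calibration lets you shrink that tolerance, which is where the extra confident IDs come from.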

Also, this paper is great because it highlights how ridiculously great the Q Exactive Classic is for DIA. They can get over 10% more protein IDs with their high precision iRT model, pushing standard 2 hour DIA on human digests from 4,500 protein groups up to 5,000 protein groups!

5,000 protein groups in 2 hours from human digest!!!!!  I need to do more DIA....


Wednesday, March 15, 2017

Cell wide analysis of protein thermostability!


Okay --- I've GOT to get out the door before 5 if I've got any shot of making it to my talk at the NIH this morning...

BUT...I've got to say something about this AWESOME NEW PAPER IN SCIENCE!


Man, THIS is a Science paper. One of those things where you're scratching your head wondering -- "um...okay...why would we even want to know that...?...but that was a really smart way of doing it and I bet something will come out of it!"

It's 4:47! I've gotta steal @ScientistSaba's notes (thanks!) on the paper and go!

It uses "LFQ to explore thermostability on a proteome-wide scale in bacteria, yeast, and human cells by using a combination of limited proteolysis and MS...The group maps thermodynamic stabilities of more than 8000 proteins across these 4 organisms. Their results suggest that temperature-induced cell death is caused by the loss of a subset of proteins with key functions." Sweet, right!?!

Worth noting, they do all the analysis with LFQ on a QE Plus using Progenesis QI.

Tuesday, March 14, 2017

Param-Medic -- Automatic parameter optimization without database searching!

I'm honestly having trouble wrapping my brain around how this new free piece of software works -- and whether it would be an advantage over the tools I currently use for this. Regardless, it is an interesting read!


Somehow -- it can look at your RAW data and determine the mass accuracy settings that you ought to use for your database search, without looking at your database at all -- unlike Preview or the IMP Recalibration node, which do.
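This is not Param-Medic's actual algorithm, just a hypothetical Python sketch of one crude way a tool could guess a precursor tolerance without a database: if the instrument picks the same peptide more than once, those precursors sit a few ppm apart, and the spread of those tiny gaps says something about your mass error. The 50 ppm pairing window and the k = 3 multiplier are arbitrary assumptions.

```python
from math import sqrt
from statistics import median


def estimate_precursor_tol_ppm(precursor_mzs, pair_window_ppm=50.0, k=3.0):
    """Crude, database-free guess at a precursor tolerance (in ppm).

    Assumes that neighboring precursors within pair_window_ppm are mostly the
    same peptide sampled more than once, so their spacing reflects mass error.
    """
    mzs = sorted(precursor_mzs)
    gaps = []
    for a, b in zip(mzs, mzs[1:]):
        gap_ppm = (b - a) / a * 1e6
        if gap_ppm <= pair_window_ppm:
            gaps.append(gap_ppm)
    if not gaps:
        raise ValueError("no close precursor pairs to learn from")
    # Treat each gap as |difference of two noisy measurements| (only roughly
    # true when a peptide is sampled more than twice): if each measurement is
    # ~N(0, sigma), the gap has sd sigma*sqrt(2), and median(|N(0, s)|) ~ 0.6745*s.
    sigma = median(gaps) / (0.6745 * sqrt(2))
    return k * sigma


# Made-up data: five peptides, each measured three times at -2, 0 and +2 ppm
mzs = [base * (1 + off * 1e-6) for base in (400.0, 500.0, 600.0, 700.0, 800.0) for off in (-2, 0, 2)]
print(round(estimate_precursor_tol_ppm(mzs), 1))  # 6.3
```

Real data is messier (co-eluting different peptides with similar m/z, for one), which is presumably why a real tool needs more than this.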

If you are using the Crux pipeline tools, it has already been integrated as an option for you to check out. For the rest of us who don't want to use awesome free data processing pipelines from some guys in Seattle (what do they know about mass spec data processing anyway...), we can download the stand-alone version and run it in Python.

Monday, March 13, 2017

Awesome clinical proteomics study on weight loss!


I'm gonna be conservative and say there are about 12 reasons to read this awesome new open access paper!



I'll name a few and see how far I get

1) A "how to" for clinical proteomics. 1 hour digestion? 45 minute runs? Now -- this is something practical for a clinical setting.

1.5) This had to move up the list. The samples were prepped with a robot liquid handler thing!

2) This section title "Plasma Protein Levels Are Individual-Specific" Holy cow! Why don't I have my own plasma proteome done yet?

3) XIC based label free quan (MaxQuant LFQ) applied to a clinical sized cohort (300+ patients; over 1200 runs!)

4) Beautiful downstream analysis -- that leads to clear biological conclusions on this cohort, including inflammation response, insulin resistance, etc.

I really think I could get to 12, but I do have a job and I should probably not be late for it!

Saturday, March 11, 2017

Ready for a new PTM to worry about? Cysteine glycosylation is all over the place in human cells!


Fun fact: Did you know that O-GlcNAc modified proteins were discovered in Baltimore over 30 years ago? See, there's more to my home town than fantastic infrastructure and friendly people!

Glycoproteomics is kind of exploding right now -- the enrichments are better, the separations are better, the mass specs are ridiculously better, and the software has almost caught up....and I wonder if this great new paper at ACS is just the tip of the iceberg....



A whole new class of glycopeptides right under our noses! The evidence looks pretty clear cut to me -- and first analysis from this group suggests that it isn't even rare. Once they had a mechanism to enrich and a pipeline to search for them in the data they report proteins with this modification in virtually every subcellular fraction!


Friday, March 10, 2017

Changes in coffee proteomics during processing.


Want to learn a lot about coffee this morning and see some classic proteomics techniques put to good use?

Check out this new paper in Food Chem (Elsevier)

The idea? They dry coffee in different ways -- and some people have linked how they dry the coffee during processing to the quality of the coffee. Apparently, making coffee is really complicated.

So this group extracted proteins from coffee beans (btw, you need liquid N2 to extract peptides from coffee beans), ran some 2D gels, and picked spots for an old MALDI-TOF to get to work on.

They find a couple dozen spots -- and can get a peptide or two from each spot for identification. Unsurprisingly, they find some heat shock proteins are differentially regulated, as well as a few other interesting proteins that make sense. Their next plan is to see if they can create model systems to tell if one (or more) of these is responsible for the taste difference.

I want to imagine this is how the taste test goes ---  coffee supplemented with Hsp70:


Coffee supplemented with: "homologous protein to putative rMLC Termites like superfamily protein" (another big spot on the gels)


...and now we know which one it is!


Thursday, March 9, 2017

MetaMorpheus -- Amazing software and strategy for finding all the PTMs!


I'm going to end my blog hiatus with the best paper I read while I was out recovering -- and it's this new one out of Wisconsin!


Let's start with a minor criticism -- if you saw the title of this article in your Twitter feed you might think that this is a review on the topic of PTMs and just go right past it. And you shouldn't pass by this one.

Here is the thing -- our database tools are really good at finding peptides from unmodified proteins in our databases. If your job as a proteomics scientist is to identify peptides from model organisms with perfectly complete, annotated UniProt proteins that are not regulated in any way by PTMs, you are in the clear -- we've got all the tools for you. If, however, you are studying something that actually exists in nature (i.e., something that modifies virtually all of its proteins with chemical modification combinations of some kind), it's still tough in this field. Our tools are designed for unmodified proteins. Looking for any modification is possible -- but computationally super expensive (example).

I LOVE this paper, btw. I had worried that my enthusiasm for it had something to do with all the painkillers from my knee procedures, but -- narcotic free -- still love the paper!  Here is the idea.

1) Screen the data at a high level with a great big mass tolerance window and look for PTMs
2) If you find evidence of the PTMs -- take a FASTA and build a more intelligent FASTA (at this point it must be XML) that includes this stuff (think of it like a ProSight Flat file where, instead of using biologically curated data to build your database, you are building your database with PTMs on the fly from the data that you have in hand)
3) With your smart database, re-search your RAW data with your normal tight tolerances so you get everything right.
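Step 1 and the first half of step 2 boil down to: collect the mass shifts from a wide-tolerance search, find the ones that keep showing up, and put names on them. A toy Python sketch of that idea only; the bin width, count cutoff and tiny modification table are arbitrary assumptions, and this is not how MetaMorpheus itself is implemented.

```python
from collections import Counter

# A few well-known monoisotopic modification mass shifts (Da)
KNOWN_MODS = {
    "Oxidation": 15.994915,
    "Acetyl": 42.010565,
    "Carbamidomethyl": 57.021464,
    "Phospho": 79.966331,
}


def frequent_deltas(delta_masses, bin_width=0.01, min_count=3, min_abs_delta=0.5):
    """Bin precursor-minus-peptide mass shifts from a wide-tolerance search.

    Shifts near zero (unmodified peptides) are ignored; bins seen fewer than
    min_count times are treated as noise.
    """
    bins = Counter(round(d / bin_width) for d in delta_masses if abs(d) >= min_abs_delta)
    return [(b * bin_width, n) for b, n in bins.items() if n >= min_count]


def annotate(delta, tol=0.01):
    """Name a mass shift if it matches a known modification, else None."""
    for name, mass in KNOWN_MODS.items():
        if abs(delta - mass) <= tol:
            return name
    return None


deltas = [79.9661, 79.9665, 79.9663, 15.9949, 15.9948, 15.9947, 3.2, 0.0, 0.0001]
for center, count in frequent_deltas(deltas):
    print(f"{center:.2f} Da x{count}: {annotate(center)}")
```

A frequent shift that annotates to None is the exciting case: evidence of something your database doesn't know about yet.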

If you're thinking -- "hey, I can do that, I have all the tools necessary on my desktop right now." You might be right. You can do a wide tolerance mass search, find all your deltaM masses, convert them to PTMs, make a better database (okay...maybe you can do that...I can't...unless I'm firing up ProSight....building a Flat file and doing the rest of it that way....) and then re-search your data.

My response -- can you download a free piece of software right now -- that'll just do the whole darned thing for you? It's called MetaMorpheus and you can get it right now -- right here!


(No relation.)

Okay -- so this doesn't come without a hitch -- you are STILL doing a huge delta M search to start your program -- and even as fast as Morpheus is, the search space is tragically large. For one of their human cancer digests it takes 13 days to run the project on what sounds like a seriously beefy PC...but...to really truly get to the bottom of these PTMs with ultra high confidence in their presence and their site specificity -- in one workflow...??!? I can't wait to give this a try!!

Wednesday, March 8, 2017

Why do I hate this wine? Or...how I learned how to do metabolomics...


This is off the proteomics topic completely! Here is the thing. I have a very skeptical friend with some ridiculously cool samples. Like -- if anyone else in the world is brave enough to get these things -- I don't know who they are. And -- we talked about doing metabolomics on these samples. But before I could have these samples I needed to be able to prove that I could...well...do metabolomics.... And -- I may talk a LOT -- but I'm not gonna pretend I know how to do something especially at the risk of wasting precious samples!  I'll just spend a year of my spare time learning how to do it!

To learn the field, I did what I normally do -- I started a metabolomics blog -- and forced myself over the last year to read as many papers as I could on the topic. It is still a new field to me, so I know I've got tons to learn, but I may still link it to the right somewhere. Maybe someone will learn something from it, and I don't mind feeling dumb.

Okay -- so you can read a lot about something and that's awesome, but you need to run the instrument, clog a few columns and lose your temper with the software a few times to learn a new discipline, right? And it helps if you have some motivation....


...okay...so here is a perfectly anonymized map of a region with 15 commercial vineyards within a 20 minute or so drive of one another. My favorite wines in the world come from this area -- amazing and in my happy range of $8-$15 a bottle even after they arrive here!

I went to such efforts to anonymize this -- cause there is an exception here. I don't like one of these vineyards. They are using the same grape varietal as everyone else. They are using the same strict rules of their appellation, in terms of how long they have to age, etc., but there is something that I really don't like about what they make.

Wine is just a mix of small molecules, right? As good of an excuse as any -- and with wine you're not exactly material limited!

Over an undisclosed time period I collected 1mL from a number of different bottles of wine from this region. The rest of each bottle was disposed of in a manner that meets the strict ethical guidelines of my undergraduate fraternity. Once a number of samples were collected, I borrowed a Q Exactive classic system with RSLC3000 from an old colleague using some vague statements and a promise to clean the S-lens later (which I totally did).

There aren't a bunch of "Q Exactive wine metabolomics app notes" but if you erase the word "wine" you're set -- I found 2 that were very similar, couldn't decide which one was better (I now know this one is -- warning, .PDF download) and settled on the following methodology (used the columns and flow rates they describe, btw -- oh, and I injected 5uL of wine on column, cause why not...?)

You're basically doing C-18 separation in positive and negative just like for everything else except you're using a lower mass cutoff and +1/-1 charged ions are a good thing!  Pull that off and you are doing the instrument side of metabolomics!

Metabolomics is, however, ahead of us (in my humble opinion) in terms of the data processing in some ways. In most of the software I've tried so far they start with what is quantified -- and statistically significant between their sample sets -- THEN they care about finding out what it is. They have massive reductions in their search space by going to the XIC and throwing out all the stuff that is 1 to 1.  Who cares about the molecules that aren't changing? Not me!

To find what is significant, metabolomics software relies heavily on statistical tools.

Check this out --



This is a shot from Compound Discoverer (which, btw, is super easy to learn if you are using Proteome Discoverer 2.0 or newer).

(Look familiar?)

This is one of the first steps in the analysis -- volcano plots showing the fold change of your compounds on one axis and the p-value (!!!) on the other. You can just take the statistically(!!) significant changes you find graphically and export them into a darned list! Out of thousands of compounds detected in these weekend runs, there are about 200 that are 10x up- or downregulated with a p-value cutoff of 0.05. Wish you could do something that easy in Proteome Discoverer to get to the bottom of what is interesting...? I hope I'll have good news for you soon!
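The volcano-plot filter boils down to two numbers per compound: a fold change between groups and a p-value. Here is a minimal sketch, assuming each compound comes with replicate peak areas for a control group and a test group, and using Welch's t-test; that test choice is an assumption, and Compound Discoverer's own statistics may differ. It sticks to the Python standard library, so the t-distribution p-value is computed through the regularized incomplete beta function. All function names and example numbers are hypothetical.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_p_value(a, b):
    """Welch's unequal-variance t-test between two replicate groups."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    diff = statistics.mean(b) - statistics.mean(a)
    if va + vb == 0.0:
        return 1.0 if diff == 0.0 else 0.0
    t = diff / math.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_two_sided_p(t, df)


def volcano_hits(features, fc_cutoff=10.0, p_cutoff=0.05, pseudo=1e-9):
    """Return (name, fold_change, p) for compounds passing both cutoffs."""
    hits = []
    for name, (control, test) in features.items():
        fc = (statistics.mean(test) + pseudo) / (statistics.mean(control) + pseudo)
        p = welch_p_value(control, test)
        # |log2 FC| >= log2(cutoff) keeps both up- and downregulated compounds.
        if abs(math.log2(fc)) >= math.log2(fc_cutoff) and p < p_cutoff:
            hits.append((name, fc, p))
    return hits


# Hypothetical peak areas: (control replicates, test replicates).
features = {
    "compound_A": ([100, 110, 90], [1200, 1100, 1000]),
    "compound_B": ([100, 105, 95], [102, 98, 101]),
}
print(volcano_hits(features))
```

Tightening the cutoffs is just a matter of raising `fc_cutoff` or lowering `p_cutoff`. The tiny `pseudo` count keeps compounds that are virtually zero in one group from dividing by zero.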

Interesting note -- there are thousands of soluble small molecules in a bottle of red wine that will stick to a C-18 column and ionize! What?!? Initially I thought "that is way too high," but you've got small molecules from the grapes, from the yeast, from the wood of the barrels and from the stems -- so it doesn't seem that crazy...

Also -- and this is funny -- wines from the same vineyard cluster together using nothing more than PCA. Want to start a wine-counterfeiter-busting business on the side with your Q Exactive (if it is yours to do what you please with, of course)? It is really easy to do. This is interesting to me cause everything you read on that stuff is done with big FTICRs, and they (and their hungry helium habit) aren't necessary -- you can do this easily with a benchtop system.
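For the PCA clustering, here is a minimal sketch, assuming a sample-by-compound table of peak areas. Intensities are log2(x + 1) transformed (a common metabolomics choice, but an assumption here), mean-centred, and projected onto the top components found by power iteration with deflation. The function name and the tiny "vineyard" table are hypothetical, and a real analysis would lean on a proper linear-algebra library.

```python
import math


def pca_scores(samples, n_components=2, iters=500):
    """Project samples (rows) onto their top principal components."""
    # Log-transform (assumed preprocessing) and mean-centre each compound.
    logged = [[math.log2(v + 1.0) for v in row] for row in samples]
    n, p = len(logged), len(logged[0])
    means = [sum(col) / n for col in zip(*logged)]
    centred = [[v - m for v, m in zip(row, means)] for row in logged]
    # Sample covariance matrix between compounds.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    components = []
    for _ in range(n_components):
        # Alternating-sign start vector, so we are unlikely to start orthogonal.
        vec = [(-1) ** i * (1.0 + i) for i in range(p)]
        for _ in range(iters):
            w = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            vec = [x / norm for x in w]
        # Rayleigh quotient: variance captured by this component.
        lam = sum(vec[i] * cov[i][j] * vec[j] for i in range(p) for j in range(p))
        components.append(vec)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[i][j] - lam * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centred]


# Hypothetical peak areas for three compounds in wines from two vineyards.
vineyard_a = [[10, 1, 5], [11, 1.2, 5.1], [9.5, 0.9, 4.9]]
vineyard_b = [[1, 10, 5], [1.2, 11, 5.2], [0.8, 9.6, 4.8]]
scores = pca_scores(vineyard_a + vineyard_b)
for row in scores:
    print(f"PC1 = {row[0]:+.2f}, PC2 = {row[1]:+.2f}")
```

With these made-up numbers, the three vineyard A samples land on one side of PC1 and the three vineyard B samples on the other, which is the kind of grouping a PCA score plot makes obvious at a glance.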


That big circle? Wines from one vineyard in particular are quite inexpensive and multiple years were available. They clustered really well together. Proof of the terroir myth, LOL?

So -- the big question -- what is it about wines from that one place that makes them different from the others? To find this I've got to do one of them volcano plot thingies comparing those wines against all the rest.

I tightened my cutoffs to narrow the list way down (yeah, I'm not screening 200 compounds), and I have a few huge outliers. A few of them are quite informative... but this one ends up being one of my favorites.


Wow -- that one is kind of an ugly-looking peak, and a lot of the samples are virtually zero so you can't see a good comparison -- but I'm still gonna leave it here. Check out the numbers, though! We're looking at something that is upregulated something like 200-fold over my control bottles!

If you've got high-resolution MS/MS fragmentation, mzCloud does a good job of identifying things. It is pretty strict, though, and low-mass fragment ions are wobblier than you'd think without a negative-mode calibrant lower in mass than SDS. I took a significant hit by not adding an additional lower-mass calibration ion... but ChemSpider had no problem making the ID of this massively upregulated molecular species.


It is called 3-Methyl-4-octanolide, but we generally call it "whiskey lactone" cause it is a big part of the taste of whiskey. Long story short -- there is significantly more of it in some oaks than in others, and American oak is loaded with it compared to other oaks around the world.
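To put numbers on the mass-accuracy point: mass error is usually quoted in ppm, (observed - theoretical) / theoretical × 10^6, so the same absolute miss counts for more at low m/z. This minimal sketch also computes the theoretical mass of whiskey lactone (C9H16O2). Treating it as an [M+H]+ ion is an assumption, since the post doesn't say which ion was matched, and the function names are illustrative.

```python
# Monoisotopic masses (u) of the most abundant isotopes, plus the proton mass.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.00727646688


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass from an element-count dict."""
    return sum(MONO[el] * n for el, n in formula.items())


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Whiskey lactone (3-methyl-4-octanolide) is C9H16O2.
whiskey_lactone = {"C": 9, "H": 16, "O": 2}
neutral = monoisotopic_mass(whiskey_lactone)
mh = neutral + PROTON  # [M+H]+ is an assumed ion type, not from the post
print(f"M = {neutral:.4f}, [M+H]+ = {mh:.4f}")
# prints: M = 156.1150, [M+H]+ = 157.1223

# The same 1 mDa miss is 10 ppm at m/z 100 but only 2 ppm at m/z 500.
print(round(ppm_error(100.001, 100.0), 3), round(ppm_error(500.001, 500.0), 3))
# prints: 10.0 2.0
```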

Now -- this may have absolutely nothing to do with why I don't like wine from that one vineyard -- but, of all the vineyards there, as far as I have been told in follow-up emails, only one uses American oak in their barrels... guess which one? It is likely just a funny coincidence, but it makes a good story.

I wasn't really setting out to solve this -- I wanted to learn the techniques and the software, and it's funny to me that I used a couple of weekends and some cutting-edge technology to tell what I think is an interesting story. I actually started writing this up to publish, but then I got lazy. The important part is that the friends I mentioned earlier gave me some cultured cells they'd prepped, and the data for our ASMS poster (it's on the last day, if you want to see something I'm actually putting time into) convinced them that I could be trusted with the REALLY cool stuff.


Tuesday, March 7, 2017

Super phosphotyrosine enrichment!

As I keep chipping away at my backlog of blog posts I was working on -- what about this awesome new paper and completely new strategy for pulling down phosphotyrosine peptides?!?!? (Big shoutout to Saddiq for tipping me off to it!)


What is it? A completely different way to pull down peptides with phosphotyrosines on them. SH2 domains specifically bind to phosphotyrosines. This group figured out that if they took a protein and modified its SH2 domain, they'd end up with a protein that binds pTyr with super strength!

How well does it work? Better than any enrichment I've ever seen!

Proof? The Orbitrap Velos files are at ProteomeXchange (PXD003563) here!

Monday, March 6, 2017

Is there still stuff to discover in the red blood cell proteome?!?!


This new open access paper is low on color, but high on perspective. It seems we got to a certain point and then assumed we'd kinda conquered the red blood cell proteome.

In this short description of their analyses, they show evidence that we might want to look a little deeper. 


There might still be cool stuff in there!