Sunday, July 23, 2017

Quantitative assessment of digestion techniques for protein-protein interactions!



This new study ASAP at JPR is an incredibly thorough (QUANTITATIVE!) analysis of different pull-down and digestion techniques for Protein-Protein Interaction (PPI) experiments.


Honestly, I'm filing it away as a reference for the next time I try to do one of these. This thing is a textbook of how to optimize a PPI experiment.

Of course, this isn't the first one of these we've seen, but rather than relying on PSM counts or even the number of peptides/proteins ID'ed, this study does LFQ with MaxQuant (quadrupole Orbitrap) and then follows up with stable isotope SRM quantification.

An interesting observation is that higher abundance proteins are easily quantifiable just about regardless of the pull-down and digestion methods. To get to the lower abundance things, they really need to fine-tune the methodology.

Saturday, July 22, 2017

How does Proteome Discoverer maintenance work?


I've gotten a lot of questions about Proteome Discoverer Maintenance and how it works. I wanted to put this post together so I can reference it when the next question comes in.

Proteome Discoverer 1.4/2.0/2.1/2.2 all appear to reference a shared maintenance key. If you get a quote from your vendor here in MD, the maintenance key says "Proteome Discoverer 1.4 maintenance." I'm not sure why, but I'm going to assume it is hard to change the product description. Either that, or my local rep does it wrong....maybe explaining the questions....

If you go into any version of Proteome Discoverer you should be able to add that maintenance key.

Admin/Manage licenses. Should look something like this:


Here you can add the activation code. Then you should be able to upgrade to any version of Proteome Discoverer following something like these instructions.

If your maintenance is expired (you may have to click "show expired licenses") then you won't be able to upgrade. You will be able to try out the demo if the demo has never been installed on that PC, though!



Friday, July 21, 2017

Proteomics of white button mushrooms post-harvest


I honestly started reading this paper 



because I was 12% awake and the main question I had was something like...


If I'd stopped at the abstract, I'd have known that this group is interested in finding biomarkers that will enable them to select for mushrooms that have enhanced shelf life! As my espresso is being absorbed, this seems like a better idea. I imagine as someone is selectively breeding mushrooms they are probably first focusing on obvious phenotypes like size and shape, but how long that mushroom will last on the shelf might be a lot more problematic to test for at the farm.

Off topic: Did you know that portobello and white button mushrooms are the same species? Just different stages in the maturation process? 5.1 million fungal species on this planet, and in my country we essentially eat just one of them...maybe this is where I should put the Catbug picture I used above....

Back to the paper!  They get a bunch of mushrooms post-harvest, drop them into liquid nitrogen, manually grind the tissue into powder, and extract the proteins/peptides in an undisclosed manner. The tryptic peptides are labeled with 4-plex isobaric tags and an online 2D-LC/MS method is used, following a protocol in a paper that is not open access. The data is processed with ProteinPilot and Mascot using manufacturer-recommended settings.

In the end they obtain 5,878 peptides that correspond to around 1,000 proteins. Over 250 proteins change post-harvest across the datasets. Surprisingly, about 100 proteins are up-regulated! Naively, I assumed that everything would be degradation post-cell death.

The authors pick a few proteins that are shared between the multiple time points, and about half of them appear to match up with the RT-PCR results.

Thursday, July 20, 2017

How to do quantification of reporter ion quan replicates in PD 2.2


Thanks, Dr. K, for this first great PD 2.2 question that Dr. P didn't cover in his/her videos!

Here is the question: If you have an experiment like this reporter ion one above, where your TMT-10plex set has replicates within it, how do you set that up so that PD 2.2 knows these are replicates and will allow you to get p-values and use volcano plots?

First off, you're going to need 2 study factors -- one for your samples and one for your replicates. Something like this will work for the example above:


Here I've set up 5 conditions for my TMT-10plex control runs I did with a friend in Boston a few years ago. This is a human cell digest added at ratios of 1:2:4:8, with 2 replicates of each (and 4 channels at 1:1).

Next, I'll have to set up the Samples tab so PD knows which is which. Something like this:


I just called the replicates 1 and 2 for simplicity's sake.

Now, when I go to the Grouping and Quantification tab and select my conditions, this "Nested Design" wording pops up and my samples look like this:


DISCLAIMER: (Maybe this should be at the top) I don't know if this is how you do this. There is probably a better way. In this example it looks like the ratio of B1/A1 is obtained and then compared to the B2/A2. Works for me for this example, but please only take this as a way to get started!
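
To make that concrete, here's a toy Python sketch of what I think the nested comparison amounts to (my own illustration, NOT PD 2.2's actual algorithm, and the abundance numbers are made up): the B/A ratio is computed within each replicate pair and the replicate log2 ratios are tested against zero.

    # Toy illustration of the nested-design idea (NOT PD 2.2's actual algorithm):
    # compute B/A within each replicate, then test the replicate log2 ratios against 0.
    import numpy as np
    from scipy import stats

    # made-up reporter ion abundances for one protein
    abundances = {
        "A1": 1.0e6, "A2": 1.1e6,   # condition A, replicates 1 and 2
        "B1": 2.1e6, "B2": 1.9e6,   # condition B, replicates 1 and 2
    }

    # ratio within each replicate: B1/A1 and B2/A2
    log2_ratios = [
        np.log2(abundances["B1"] / abundances["A1"]),
        np.log2(abundances["B2"] / abundances["A2"]),
    ]

    # one-sample t-test of the replicate log2 ratios against 0 (i.e., no change)
    result = stats.ttest_1samp(log2_ratios, 0.0)
    print(f"mean log2(B/A) = {np.mean(log2_ratios):.2f}, p = {result.pvalue:.3f}")

With only two replicates the statistics are obviously thin, so don't expect miracles from these p-values.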

What do the results look like? Spot on!


I'll have to look at the RAW file again, because it looks like A and E are the 1:1 channels (and we had a bit of variance due to our pipetting accuracy at 1 uL), but the other channels look right and E/A comes out at about 1:1.

Now that we have replicates, PD allows us to create volcano plots to find what is statistically and quantitatively interesting. They should look something like this (from a Compound Discoverer set we're working on):


However, if ALL your peptides are 1:1 or 1:2...that doesn't work very well...the 1:1 sample looks like this:


(...everything is at/around 1...I made this even messier by just processing peptides in a very narrow m/z range so that I could do several iterations before I had to go to work)

Follow-up question: Where are the p-Values coming from in the Volcano plot?

They are the -log10 of the (unadjusted) abundance ratio p-values for the peptide/protein.


In the default template you probably only see the adjusted p-values. You can unhide this column by clicking the Field Chooser (1) and then checking the box (2) above.

If you've also forgotten all of high school math (logs?), the way to convert the abundance ratio p-value in the sample output to the value in the volcano plot is, in Excel, something like "=ABS(LOG10(value))" -- since p-values are less than 1, the log is negative, so the absolute value is the same thing as -log10. Took me a couple of tries to get it right, but the numbers match.
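
And if you'd rather sanity-check it outside of Excel, the same conversion in Python looks something like this (the column values here are made up):

    # volcano plot coordinates from PD-style output values (made-up numbers)
    import math

    abundance_ratio = 1.05           # value from a ratio column
    abundance_ratio_p_value = 0.82   # unadjusted p-value from the same row

    x = math.log2(abundance_ratio)             # x-axis: log2 ratio
    y = -math.log10(abundance_ratio_p_value)   # y-axis: same as ABS(LOG10(value)) when p < 1
    print(x, y)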

Wednesday, July 19, 2017

Proteome Discoverer 2.2 is now available on the Thermo Omics Portal!


An amazing scientist at the NIH contacted me and told me that Proteome Discoverer 2.2 is live on the Thermo Omics portal! (Thank you Dr. J!)

Demo versions are available, as well as an upgrade key that will work if your copy of Proteome Discoverer has valid maintenance.

The instructions to upgrade to PD 2.2 are about the same as the instructions I posted on how to upgrade to PD 2.0 here a while back.

Once you get upgraded you'll find that PD 2.2 is VERY similar to PD 2.1, just with some awesome new features. For the biggest changes, you'll find some videos over there --->

that may be useful (Thank you Dr. P!)

Tuesday, July 18, 2017

Amazing quantitative coverage of the RBC proteome!


I gotta run, but I want to leave this great paper ASAP at JPR here.


This is the second paper I've posted this year on the RBC proteome. The first paper (post here) suggested that we have been taking our knowledge of this "simple" cell for granted and there is more to discover. This new paper definitely supports this assertion!

They start with RBCs from 4 individuals and digest them with a modified MED-FASP (multiple enzyme) methodology. They do something really cool here: lysing part of the RBC population to form "white ghost" cells (which appear to just be empty membranes) and digesting/running them separately. This approach reveals a more comprehensive RBC proteome than we've seen before, as well as some new information.

They show clear evidence of some membrane transporters that have not previously been seen in RBCs and show that RBCs contain over 2,500 distinct proteins. I expected this study to use the proteomic ruler, but the reference for the math leads me to this paper on the Total Protein Approach (which looks awesome! but I have to go do work rather than spend time on it). Using this they can get a really clear number for the copies of each protein per cell. About 1,800 of the ones they identify are present in the RBCs at >100 copies per cell -- meaning there are a bunch here at less than 100 copies per cell -- and they have the sensitivity to identify and quantify them. Not the focus of the paper, but something I'm still amazed to see.
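
For my own notes, here's my back-of-the-envelope reading of the Total Protein Approach math as a Python sketch (made-up intensities and an assumed total protein mass per cell -- go to the TPA paper for the real treatment): a protein's share of the total MS signal approximates its share of the cell's protein mass, and copies per cell fall out from the molecular weight.

    # Back-of-the-envelope Total Protein Approach (TPA) calculation -- my reading of it,
    # with made-up numbers; see the TPA paper for the actual equations.
    AVOGADRO = 6.022e23

    protein_intensity  = 2.5e9     # made-up summed MS intensity for one protein
    total_intensity    = 1.0e12    # made-up summed MS intensity of all proteins
    mol_weight_da      = 52_000    # molecular weight of the protein (Da = g/mol)
    protein_per_cell_g = 30e-12    # ASSUMED total protein mass per cell (~30 pg, a rough guess)

    mass_fraction = protein_intensity / total_intensity   # share of total protein mass
    copies_per_cell = mass_fraction * protein_per_cell_g / mol_weight_da * AVOGADRO
    print(f"~{copies_per_cell:,.0f} copies per cell")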

Some of their global observations don't jibe with our historical understanding of protein abundance in RBCs -- so they order stable isotope labeled peptides and show that they are right. Global proteomics "relative quan", FTW!

Monday, July 17, 2017

Changes in protein turnover in aging nematodes!

(C. elegans image borrowed from Genie Research)

This morning I learned that protein turnover slows down as most eukaryotic organisms age, which sounds like a dumb idea to me. 



The method is really cool. They feed E. coli either 15N- or 14N-containing NH4Cl. Then they put synchronized nematodes on plates of the labeled E. coli. They can have the worms eating labeled bacteria for however long they want and then move the worms to unlabeled bacteria. When they extract the proteins/peptides from the worms, they can assess protein turnover by following the heavy/light N ratio over time. I don't know about you, but I'm impressed!
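
Just to make the idea concrete, here's a toy first-order model in Python (my own sketch with made-up numbers, not the authors' pipeline or their R package): after the switch to unlabeled food, the labeled fraction of a given protein should decay roughly exponentially, and fitting that decay gives a turnover rate and half-life.

    # Toy first-order turnover model (not the authors' actual pipeline or R package):
    # after the switch to unlabeled food, the labeled ("old") fraction of a protein
    # should decay roughly as exp(-k*t); fitting k gives a turnover rate / half-life.
    import numpy as np
    from scipy.optimize import curve_fit

    def old_fraction(t, k):
        return np.exp(-k * t)

    # made-up data: days after switching to 14N food, and the measured 15N (heavy) fraction
    t_days = np.array([0, 1, 2, 4, 7])
    heavy_fraction = np.array([1.00, 0.78, 0.60, 0.37, 0.17])

    (k,), _ = curve_fit(old_fraction, t_days, heavy_fraction, p0=[0.2])
    print(f"turnover rate k = {k:.2f}/day, half-life = {np.log(2)/k:.1f} days")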

Not only can they assess overall protein turnover speed, but they can assess protein turnover speed of individual proteins. They pull a total of 54 samples and do 3 biological replicates for their downstream stats. The peptides were analyzed on an LTQ Orbitrap and peptide identifications were obtained with a pipeline that includes MS-GF+.

Even more cool stuff -- they developed an R package that can do all the 15N/14N computations! You can get that here.

How'd they do? 

They could accurately track turnover in nearly 900 peptides throughout all these samples, corresponding to about 600 proteins. This gives them a really good picture of different cellular compartments and proteins of different molecular functions.

In interpreting this data, the paper gets even better!  Even in an organism this simple, turnover isn't just slowing down uniformly. It is a mixed bag. A few proteins even increase in turnover. They draw some really thought-provoking biological interpretations regarding the systems protecting eukaryotic cells from proteome collapse that are better left to this great open paper.

Clever system, great free software for the community, AND awesome biology? If you need an inspirational paper to start your week off right, I highly recommend this one! 

Sunday, July 16, 2017

Fascinating GWAS proteomics(?) study.

This post is going to be a bit of a Sunday morning ramble. It began when this interesting paper  showed up in my Twitter feed a couple times today.


And it caught my attention because of the associated text in the retweets:


GWAS stands for Genome-Wide Association Study (Wikipedia here). Generally, they proceed this way: hundreds or thousands of people are genotyped with SNP arrays that can detect literally millions of different genetic variants. The participants are divided by their phenotype or disease state and inferences are drawn from differences in the variant signals between the phenotypes.
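
For the non-GWAS people (like me), the core computation is conceptually simple -- something like a per-variant association test between cases and controls. A minimal Python sketch (made-up genotype counts; real GWAS pipelines do far more, with covariates, population structure correction, and multiple-testing adjustment):

    # Conceptual per-SNP association test (real GWAS pipelines do much more than this)
    from scipy.stats import chi2_contingency

    # made-up allele counts for one SNP: rows = cases/controls, columns = allele A/allele B
    table = [
        [420, 580],   # cases
        [350, 650],   # controls
    ]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
    # repeat across millions of variants, then ask which loci survive correction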

Lots of good stuff has come from GWAS, and lots still will as the tools continue to improve. If all goes well you will identify an Expression Quantitative Trait Locus (eQTL) or two that is associated with your disease. GWAS via SNP doesn't identify a gene that is associated with your disease. It identifies an area in the genome that is associated with your disease. In the best case scenario, you are working with a really well-annotated genome and a gene with really well-understood mechanisms of expression.

Side note: As of this Nature article in 2011, 96% of the 1.7 MILLION samples in the global GWAS catalog were from people of European descent. In this Nature followup last year, this appears to have improved, but there are still shockingly large discrepancies (that same library now has 35 MILLION samples). These articles point out the problems with developing genetic medicines for only certain populations. However, if you are really bored (or interested), check out this paper and the concept of linkage disequilibrium. Genomes aren't static. They can't be, or that evolution thing doesn't work very well. You may not be able to make an inference from the effect of one point on a genome from one population to the next, because that gene might be different or somewhere else entirely.

Wow. What was I writing about? Oh yeah! Okay, so GWAS is powerful, but we're inferring a lot of stuff: 1) that the region that stands out is linked to that gene, and 2) that it is linked to that gene in a way we understand (a variant in that area could cause expression of the associated gene to go UP or DOWN).

If you're still reading along (sorry) you can see why I might do a double-take on a GWAS proteomics study. You might also understand why I might read a paper and be a little surprised that no direct protein measurements were ever performed in this study.


This paper introduced me to the concept of pQTLs. These are QTLs associated with protein levels. 71 proteins known to be associated with cardiovascular disease were integrated as factors here.

My interpretation (which is likely wrong) is that rather than saying cohort A vs cohort B, the factors compared were patient group A who had CRP levels above X.X mg/dL compared to patients with CRP below that level. Then you look for QTLs that stand out.

How did they fare? Pretty well. They make some interesting biological conclusions and connect those to the characteristics that patients manifest. They come up with 20 or so observations where the GWAS predicts the proteins that they know are elevated from the patient files. They find some other QTLs that seem to be associated with the known elevated proteins that might make for better predictive models of different stages of cardiovascular disease down the road. Some enterprising CVD researcher should pull out this list and see if they do correspond at the protein level.

Is it really a proteomics study? Well...it's more of a transcriptomics study with some integration to a small set of proteins, but it is interesting and it forced me to read 2 or 3 papers to get to this (likely incorrect) interpretation of what they set out to do -- and whether it worked.

Got some guy doing GWAS down the hall and wondering if you could work together? Maybe you should check out this paper!

Incidentally, last summer a great study came out of some lab at Harvard where QTL and protein quantification were systematically compared using an amazing mouse model system. I wrote a post on it here that certainly didn't do it justice, but the original paper is seriously good and helps bridge some gaps -- including terminology -- and can give you a feel for when you can trust QTL measurements and when you can't.

Saturday, July 15, 2017

Picky -- take all the work out of targeted human proteomics!


Targeted proteomics is cool and all, but don't you hate entering ion mass and retention time into those stupid windows? It is even worse if you're using one of those quadrupole things and you have to load up fragment ions and optimized collision energies...ugh...

What if there was an easy web interface that would just make the method for you for any human protein you select?

Say hello to Picky, which you can read about here! 

It makes use of the ProteomeTools database (synthetic peptides for ALL human proteins!) and knowledge of retention time and instrument parameters, and it completely simplifies the worst part of the targeted proteomics process.

Heck, why don't you just give it a shot? You can check out the awesome Picky user interface here! 

Friday, July 14, 2017

UHPLC-MS lipidomics of dog plasma can differentiate breeds!

I admit it. I used this awesome new paper in Springer Metabolomics to justify Google Image searching for pictures of 9 different dog breeds in funny hats and used my favorites to illustrate a figure from the paper. You caught me. However, I also think that this is a really interesting study with results I would not have guessed possible.


First off, I'm no lipidomics expert. I did spend a couple of days at Pitt's Center for Environmental and Occupational Health, which is one of the premier lipidomics centers in the world. What I got out of that visit, however, was mostly that lipidomics is really, really hard and I'm glad they were doing it and not me.

The conclusion of the study I'm talking about now is in the title of the post and the paper. I would never have imagined that during our occasionally demented differentiation of wild dogs into this diverse set of physical features we would also alter their plasma lipidomes, but it looks like we did!

They took plasma from around 100 dogs, n > 9 in all cases. These weren't lab-controlled dogs; they were plasma samples from healthy dogs that selflessly volunteered some plasma for science, then went home to their different houses, probably put on funny hats, and ate different food. There are a TON of variables here.

The lipids were extracted from the plasma in a very straightforward manner and separated on a C18 column, and MS1 spectra were acquired at 100,000 resolution on a benchtop Orbitrap (appears to be an Exactive "classic" system). Post-processing, lipids of interest were validated by MS/MS via direct infusion using a NanoMate-coupled FT-ICR (LTQ-FT Ultra).

All the data was processed in XCMS and FIEmspro (new to me) -- and, you know what? The breeds are distinct. The paper is open access -- if you're curious, you should totally check out the PCA plots!

From the global profiles they are able to hunt down the features/compounds that are driving breed clustering. This is another reason to check out this awesome paper -- in case you're like me and don't know how to make that step from principal components to biomarkers. I'm still unclear after reading this, but I think this is a cognitive deficiency on my part, because this is a very clear and well-written paper. They provide some great references that I hope will help me bridge this gap eventually.
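
For what it's worth, here's my naive understanding of that step as a little Python sketch (a random matrix stands in for the lipid feature table, so this is purely illustrative): the loadings of the component that separates the groups point back at the features worth chasing as biomarker candidates.

    # Naive sketch of going from principal components back to candidate features:
    # look at the loadings of the component that separates the groups.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 500))    # stand-in for 100 dogs x 500 lipid features

    pca = PCA(n_components=2)
    scores = pca.fit_transform(X)      # per-dog coordinates (what the PCA plots show)

    # features with the largest absolute loadings on PC1 drive that separation
    loadings_pc1 = pca.components_[0]
    top_features = np.argsort(np.abs(loadings_pc1))[::-1][:10]
    print("feature indices to look at first:", top_features)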

On the topic of references in this paper -- I spent some time on this open access one as well.


I'm just linking it in case anyone else wants to know what breeds have the highest cholesterol levels and how those vary throughout a dog's life (the n is only high for a few breeds/genders/age groups, but still interesting to me).



Thursday, July 13, 2017

Spatiotemporal proteomic profiling of human cerebral development!


This study is incredible! I passed over it once because I was sure that this was another bait-and-switch paper. The title says human, but I figured the work was probably done on mouse or Drosophila brains. Nope. This is perfectly performed single-shot LFQ proteomics on human post-mortem FFPE brain tissue! (Nothing wrong with model organisms, no criticism intended, we can learn a lot from those too! But...human brains in early development?!?! Wow! Have you seen this before? I haven't!) You can check it out in press at MCP here.


They also did a lot with in vitro cells forced into differentiation, which I guarantee is interesting to people out there as well.

Check out those Venn diagrams! They get reproducible quan without missing values on an impressive number of proteins from the formalin-fixed tissues. We've seen higher numbers in some other studies, but these are specifically taken from one area of the brain and robustness of quantification is the focus here. Honestly -- if they mentioned this in the paper I didn't see it -- but from the way this is done, I wouldn't be surprised if we're looking at a group directly considering clinical applications for this beautiful study design.

A remarkably small amount of peptide was loaded on column (I'm too late this morning to do the math) and the nLC separation was only 60 minutes on a 15 cm column -- and they quantified >3,000 proteins! Not to get too excessively excited, but I love this trend of shorter nLC columns and gradients for today's faster instruments. A quadrupole Orbitrap (plus) type system was used.

The data hasn't been triggered for release on ProteomeXchange, but it will be available here when it is (PXD004076 for the FFPE tissue). Actually, the authors put in the reviewer login information so it can be accessed prior to the release trigger, but I shouldn't be looking at beautiful RAW files. I should be getting ready for some meetings.


Wednesday, July 12, 2017

Proteomics of hydrophobic samples!


Need to do some proteomics on something like...adipose tissue...that is mostly lipids?  If trying 27 different lysis/digestion techniques to find the best one doesn't sound like a good time, you should check out this paper where they already did it!


A major emphasis of the study is quantitative reproducibility in these tough samples.

Tuesday, July 11, 2017

Convert .RAW files in Linux!


Random note for Linux users. Did you know UWPR set up instructions for RAW to mzXML conversion?

You can find the instructions here.

While you're there, you might want to bounce around the always awesome UWPR site! There is always something there that I didn't know about!

What the heck is precision psychiatry?


Want to see an amazingly ambitious undertaking? Check out this review on Precision Psychiatry!

If the picture doesn't completely encapsulate the level of this undertaking (where proteomics and metabolomics are just little bubbles of info feeding in), read just a little ways into it. This is my second favorite quote from this awesome and informative paper...


...this makes the diseases I've worked on sound really simple! Malaria and cancer are pretty much binary. You either have a parasite living in your blood eating your hemoglobin (or hiding in your liver) or you don't. That cell is either blowing by a cell cycle checkpoint...or it isn't... How much harder must it be if your proteomic profile doesn't actually match to a concrete and established phenotype? Yikes.

I'm often in conversations here in Maryland about precision medicine. This question sometimes comes up (maybe I'm the one asking it sometimes): "What does precision medicine mean to your field?"

My favorite quote from this great paper might be my favorite answer I've heard so far--




Monday, July 10, 2017

Low MW COLOR markers for peptide OFFGEL!



If you have an OFFGEL(TM) system, I highly recommend you check out this paper in press at Electrophoresis here.

This group came up with markers that migrate with the pI of the sample wells and can give you a quick heads up on the progress of your fractionation!

One of the challenges with this offline separation technique is that it can be difficult to tell when your fractionation has proceeded to completeness. This team evaluated the use of several markers with known pIs. For example, the markers dark orange and yellow have pIs of 3.9 and 10.1, respectively. If the diluted colors focus in the expected wells over the IPG strip, can we assume that the peptides have completed migration?

They find that they can, and they confirm it with MS/MS. Even better? They can use these markers with peptides that have been isobarically labeled!
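
If you want a rough feel for where a marker of a known pI should end up, it's basically linear interpolation along the pH gradient. Here's a little Python sketch assuming a linear pH 3-10 gradient split across 12 wells (hypothetical numbers on my part, not the paper's actual setup):

    # Which well should a marker of a given pI focus in?
    # Assumes a linear pH 3-10 gradient split across 12 wells (hypothetical setup).
    def expected_well(pi, ph_min=3.0, ph_max=10.0, n_wells=12):
        frac = (pi - ph_min) / (ph_max - ph_min)
        frac = min(max(frac, 0.0), 1.0)                     # clamp markers outside the gradient
        return min(int(frac * n_wells), n_wells - 1) + 1    # wells numbered 1..n_wells

    for name, pi in [("dark orange", 3.9), ("yellow", 10.1)]:
        print(name, "-> expected well", expected_well(pi))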

Sunday, July 9, 2017

NetProt: A new R package for complex-based feature selection


I'm not going to embarrass myself by trying to explain this paper. I'm just going to throw it out there because I recognize the value this R package provides in making some really complicated stuff a little more accessible.

Protein Complex-Based Feature Selection (PCBFS) has shown some really good results in some previous proteomics studies. This is the big data stuff where our classical approaches may not matter as much (by classical I mean: ID this peptide, quantify this peptide). One particular value of the approach is that it helps with batch effects.

NetProt is an attempt to make PCBFS somewhat more approachable to everyone else by integrating 5 different approaches into an R package. Seems like a nice step in the right direction to me! You can check it out in this month's JPR here.


Saturday, July 8, 2017

Metaproteomics of poo from preterm infants!


In this Saturday's edition of "Things I didn't know you could do until now," I present this amazing study in press at MCP!


We're hearing lots about our microbiomes these days. We're even seeing it reach mainstream media and popsci. The aim of this study was to figure out the microbiota of preterm human babies -- who might not have mature microbiota and might get introduced early to hospital bacteria and other gross stuff -- and how it differs from the norm through early development!

I think the normal approach to this is going to be genetic. And they do that (via 16S profiling), but they also follow these babies over time using metaproteomics.

How'd they do it?
They identified some candidate babies born at different points in gestation (some as early as 25 weeks!) and collected poo samples (do NOT google search these fancy baby poo terms!) at birth (I think) and definitely at 1, 2, 4 & 6 weeks post-birth for the study.  The microbiota proteins were extracted with a bead lysis technique, the proteins were separated in the first dimension with SDS-PAGE, and the peptides were digested out. The digested peptides were run on a linear ion trap Orbitrap system. I haven't pulled a RAW file or the supplemental to look at the method or instrument model.

What I'm interested in is how they link the peptides back to the microbiota to make a chart like the one at the top of the post!

First off -- they get their FASTA from this thing I didn't know about 


that apparently has loads of information on the bacteria that occupy us! They process the data with MaxQuant using iBAQ (label-free quan) and the relative intensities go into their calculations of the relative bacterial content (probably obvious to less sleepy people? I shouldn't admit I was surprised. How else would you do it? Come on, brain!)
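
In case it helps anyone else's sleepy brain, my understanding of that step looks roughly like this Python sketch (made-up values, not the authors' actual code): sum the iBAQ intensities of the proteins assigned to each organism and divide by the total.

    # Rough sketch of turning protein-level iBAQ values into relative organism content
    # (my understanding of the approach, not the authors' actual code; values are made up)
    from collections import defaultdict

    protein_ibaq = [                     # (organism the protein was assigned to, iBAQ intensity)
        ("Klebsiella",   4.2e8), ("Klebsiella",   1.1e8),
        ("Enterobacter", 2.0e8), ("Escherichia",  0.7e8),
    ]

    per_taxon = defaultdict(float)
    for organism, ibaq in protein_ibaq:
        per_taxon[organism] += ibaq

    total = sum(per_taxon.values())
    for organism, ibaq_sum in sorted(per_taxon.items(), key=lambda x: -x[1]):
        print(f"{organism}: {100 * ibaq_sum / total:.1f}%")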

What did they find? Really interesting stuff! Big microbiota shifts that can be correlated to the child's nutritional needs at early development, some reasons to be concerned that early microbiota might need to be monitored for later health implications (the abstract explains this better than I can).

Figure 3 might be the most interesting to me from a proteomics standpoint -- in some places the genomics and metaproteomics line up amazingly. At some points there appears to be some serious disagreement about the dominant class of organisms. For example, at weeks 3/4 the metaproteomics calls Klebsiella the highest-abundance organism, while the 16S profiling appears to call Enterobacter.

Check out what the authors say about it!


WHOA!!! The metaproteomics finds misclassifications from the 16S profiling!

I've really got to get going this morning. Grammar check (maybe) later!

All the data described here is available on ProteomeXchange (PRIDE) via the identifier PXD005574.

Friday, July 7, 2017

How many actual phosphorylation sites are there?

We've seen some amazing papers that have identified tens of thousands of phosphopeptides. Numbers that are boggling in size and can seem kind of unbelievable.

Ever wondered how many sites could possibly exist? Wonder no longer, because this team did the hard math in this new open access paper in GigaScience(?)!


To get their numbers they pulled 1,000 papers. Not kidding, that's what they said. They narrowed the papers down to a smaller subset of experiments -- they looked at 97 human studies and did curve-fitting stuff to the data. They look at other organisms, but they have the most data points to work on from human studies.
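
I don't know exactly which curve they fit, but the general idea is a saturation fit to cumulative discoveries: as you pile on studies, the number of new sites levels off, and the asymptote is your estimate of the total. A toy Python version (made-up numbers and a simple hyperbolic model that I picked, not necessarily theirs):

    # Toy saturation-curve estimate of the total number of phosphosites.
    # Made-up numbers and a hyperbolic model chosen for illustration only.
    import numpy as np
    from scipy.optimize import curve_fit

    def saturation(n_studies, total_sites, halfway):
        return total_sites * n_studies / (halfway + n_studies)

    n_studies = np.array([5, 10, 20, 40, 80, 97])
    cumulative_sites = np.array([14e3, 27e3, 50e3, 86e3, 133e3, 148e3])

    (total_sites, halfway), _ = curve_fit(saturation, n_studies, cumulative_sites, p0=[300e3, 80])
    found_fraction = cumulative_sites[-1] / total_sites
    print(f"estimated total: ~{total_sites:,.0f} sites; found so far: ~{found_fraction:.0%}")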

Their results, in a nutshell: we may still have a long way to go before we identify every human phosphorylation site -- current technology may be able to find about 40-60% (and you know the ones we haven't found yet aren't gonna be easy).

However, we've probably identified most proteins that can be phosphorylated! It sounds like finding the other sites on these proteins is what is needed at this point.

Are they right? I'm certainly not qualified to answer that, but it does help put the numbers in perspective!

Thursday, July 6, 2017

The developmental proteome of Drosophila over 15 life stages!


Drosophila melanogaster is the geneticist's tool! What is someone doing stealing it and doing protein work on it?!?  Oh, you know, just finding out more about it than we ever knew!

You can check it out in this new paper in GENOME RESEARCH (yeah!) here!


They sampled fruit flies at 15 stages of their development -- including early stages that have nauseating names like "pupae". They harvested this grossness and used 2 different protein extraction techniques. I'm just a little unclear on this, but it appears that the early embryonic stage organisms need to have their proteins extracted in a different way. Dechorionation is involved. The proteins were separated in the first dimension with SDS-PAGE and peptides digested out for 280 min nanoLC runs into a quadrupole Orbitrap (plus) system running a standard Top15 method.

The resulting output? 8 MILLION high resolution MS/MS spectra. The data was all processed in MaxQuant using LFQ and all the downstream stuff was done in R.

Nearly 8,000 proteins were monitored across the life cycle. In case you're wondering, a study at Max Planck using GFP-tagged proteins thinks that there are around 10,000 Drosophila proteins. Considering that GFP tagging into the genome would put a tag separately on two proteins with 100% homology...8,000 is probably pretty darned close to the whole thing.

Okay...so why is this cool? I'm sure there's some genomics/transcriptomics resource for every stage of development in fruit flies that has been around for years, right?

They track a bunch of proteoforms.
They establish the existence of some small proteins flagged as noncoding by the genetics.
They find clear discrepancies between the mRNA expression levels and the protein-level expression. (Yeah!)

All the RAW data is available at ProteomeXchange PXD005691 and PXD005713

Better yet, all the expression data is easily searchable in their handy web interface that you can access here!

Wednesday, July 5, 2017

Are we underestimating the artifacts created by reduction/alkylation?


I don't have time to go through this one, but from the abstract alone, this looks worth sharing (also reminds me to read this when I get home!) Honestly...I feel like this has been evaluated in depth a few times in the past...but still worth reading later!

You can check it out here.