Sunday, December 31, 2017

Functional proteolytic cleavage proteoforms as PTMs?


Ouch...okay....this review is seriously dense and worth taking a look at, if only to think about how much potential biology we may be missing because we cut all our proteins up with enzymes and lose much, if not all, of the information this review is focused on....

It is no surprise that there are shortened protein variants out there. My problem conceptually here is the number of them that are stable and functional, and the number of mechanisms that they are derived from. Maybe I just need more coffee, but this bears a much closer look! Especially as you see how critical these cleavage variants appear to be to central canonical pathways. Considering how few pathways are known compared to the unknown ones makes this matrix seem more than a little daunting.

Saturday, December 30, 2017

The Advanced Precursor Determination paper is finally out!

At ASMS 2017 in scenic Indianapolis, a new functionality called Advanced Precursor Determination was rolled out for the Q Exactive HF-X and Fusion Lumos systems from TFS.

Details were a little vague on how this worked, but this new Just Accepted study at JPR fills in all the blanks!

Maybe details were a little scarce because you need a flow chart to explain how it works?

There is a lot of other good stuff in this study, including some up-to-date numbers on the theoretical cycle time required to select all peptides for fragmentation in a complex mixture, as well as a good explanation of why we don't always hit our TopN.

Highly recommended, even just for better understanding sampling dynamics!

Friday, December 29, 2017

Extended deadline -- ABRF needs participating labs for their phosphoproteomics standards study!!

Need an awesome phosphoproteomics standard?
Want to interact with some of the top experts in the world on this type of experiment?

Check out the sPRG study! They are still accepting volunteer labs!!

Thursday, December 28, 2017

Microwave digestion returns -- and appears really useful for single protein mapping!

I was initially skeptical when a friend told me I should check out this paper.

I've been down the microwave digestion route before. And, while I may have primarily been young(er)  and stupid(er), I still believe that the microwave cooked off my phosphorylations.

However -- this new study is pretty convincing that the technique has real value! 

Unlike enzymatic digestion, which will occur reliably at specific residues, put some formic acid in with your proteins and microwave them -- and the proteins break all over the place. Reactive oxygen? I don't know, but it does really work. (But it might cook off the phosphos first.)

Wednesday, December 27, 2017

Specificity of phosphorylation responses to MAP kinase pathway inhibitors!

This new study in press at MCP is a practical "how to" guide for studying MAP kinase inhibitor phosphoproteomics with high certainty in cell culture!

This is actually an extension of previous work from this group on established inhibitors (that is open access, has some incredibly nice figures and is available here!)

In this study the authors look at two ERK inhibitors and compare them to the clinically used ones that inhibit other proteins like BRAF and MKK1/2.

The proteomics is pretty clearly explained in the figure at the top. An Orbitrap Fusion is used for the LC-MS/MS analysis, running each sample in 3 replicates. This is where something interesting happens. 2 of the replicates are run using OT-OT (Orbitrap for MS1 and MS2) analysis using the "top speed" method. The third replicate is identical, as far as I can tell, except the MS/MS is obtained with HCD-ion trap. You'll discover rapidly when running the same sample with just this difference in instrument settings (one toggle on the Fusion) that ID coverage is a little different. The little differences between the speed, sensitivity and general physics of the two MS/MS acquisition methods can lead to differences in PSM identification. This is a really neat trick that I hope to see more of in the future!

The downstream analysis is performed with MaxQuant, with LIMMA thrown in for significance changes. The SILAC labeling is switched between each replicate (! lots of work!) and the result of all this effort is a really nice picture of the total effects on the MAPK pathway, with global proteome and (most importantly) phosphoproteome shifts, when each of these inhibitors is employed. One of the compounds is shown to have a high number of off-target effects, while others seem to do exactly what they are stated to do!
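The label-swap idea is easier to see with numbers. Here is a minimal sketch, assuming each replicate reports a heavy/light ratio and that the treated cells carry the heavy label in some replicates and the light label in others. The function name and the example ratios are made up for illustration, and this is not the MaxQuant/LIMMA processing the authors ran.

```python
import math

def treated_vs_control_log2(heavy_over_light, treated_is_heavy):
    """Return log2(treated/control) from a SILAC heavy/light ratio.

    If the treated cells carry the light label (a swapped replicate),
    the ratio is inverted, which flips the sign of the log2 value.
    """
    log2_ratio = math.log2(heavy_over_light)
    return log2_ratio if treated_is_heavy else -log2_ratio

# Hypothetical phosphosite measured in three replicates: (ratio, treated_is_heavy)
replicates = [(0.25, True), (4.0, False), (0.5, True)]
log2_values = [treated_vs_control_log2(r, heavy) for r, heavy in replicates]
mean_log2 = sum(log2_values) / len(log2_values)
print(log2_values, mean_log2)
```

A real biological change keeps the same sign once the swapped replicate is re-oriented, while an artifact tied to the label itself would point in opposite directions. That is a big part of why the extra labeling work pays off.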

This is an MCP paper, so a serious level of downstream validation goes into the study as well. I'm obviously not all that qualified to assess how they did that, but the experimental design, sample prep, enrichment and downstream data processing are all top-notch work. What we get out of this is a method section that I would follow as if it were a Kristen Kish recipe in my kitchen if I had to assess the specificity of new drugs that affected anything in the MAP kinase pathway (or, maybe more importantly, if their inventors claimed that's what their compound did...) ;)

All the files from the study are available on ProteomeXchange via the PRIDE repository. There is a serious amount of work here, so the files are divided into 3 separate numbers.

PXD007620, PXD001560 and PXD007621

Tuesday, December 26, 2017

Impact of detergents on membrane complex isolation!

Ever needed a good protocol to analyze a protein complex with some membrane components? Check out this awesome systematic analysis!

This team focuses on something called Cad11 (it appears to have some really interesting significance in cancer, on top of having a membrane-linked protein complex structure appropriate to exactly this type of assay optimization).

There are some really striking differences in the study, including one detergent that leads to the complete loss of the ability to pull down their complex(!!) when others can, as well as novel proteins that are only observed when certain detergents are utilized. Sample prep optimization, FTW!

iTRAQ 4-plex reagent is used for a quantitative readout of complex isolation efficiency and is analyzed on an Orbitrap Fusion Lumos system using MS2 based quantification.

Considering how important the different detergents appear to be here, it might be fair to wonder if these results would reproduce for different protein complexes.... If this isn't your protein of interest, it at least shows you how valuable picking the right detergents can be and gives you a good starting point!

Tuesday, December 19, 2017

ProteomeGenerator -- Another step closer to comprehensive proteogenomics!

Wow. Um...this is a solid study. Awesome new tool? Top notch mass spec work? The best gains I've ever seen in incorporating transcriptomics into a proteomics workflow? Check. check. check.

It is on bioRxiv here.

We've seen some big proteogenomics papers and I'm sure that we've only seen the tip of the iceberg, but there is some important stuff in this paper. First off -- yeah -- we have to do some bioinformatics to make this work, but this one is laid out pretty well.

Second -- the pipeline here runs in something new to me called Snakemake. It appears to be a framework to basically make any code as scalable as you want. Scalable isn't a word? Too bad.

A really interesting finding is the size of the databases this tool generates from the transcriptomic data -- it isn't huge amounts of FASTA coming out of this tool and skewing your FDR all over the place. It is compact. Smaller than the canonical databases, because your cell type isn't producing every theoretical human protein at every point in time. It's only producing transcripts for the ones it needs. Smaller databases, lower FDR, and more peptide matches!
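To make the "smaller database" point concrete, here is a minimal sketch that trims a FASTA down to the entries with evidence of expression. This is not ProteomeGenerator's actual pipeline (which assembles the database from the transcriptomic data itself); the accessions, sequences and the `expressed` set are all hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_by_expression(fasta_text, expressed_ids):
    """Keep only entries whose first header token is in expressed_ids."""
    kept = []
    for header, seq in parse_fasta(fasta_text):
        accession = header.split()[0]
        if accession in expressed_ids:
            kept.append((header, seq))
    return kept

# Toy database: three hypothetical proteins, two with detected transcripts
canonical = """>PROT_A example protein A
MKTAYIAKQR
>PROT_B example protein B
MSDNELLK
>PROT_C example protein C
MAAPLR
"""
expressed = {"PROT_A", "PROT_C"}  # hypothetical output of transcript quantification
sample_db = filter_by_expression(canonical, expressed)
print(len(sample_db), "of 3 entries kept")
```

Fewer candidate sequences means fewer chances for a random match, which is the intuition behind the FDR benefit described above.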

Thursday, December 14, 2017

A simple workflow to diagnose some sample/instrument quality issues using PD!

A couple people have asked me to look into this over the years and I thought I'd finally give it a try.

Here it goes!

Sometimes you just need a quick snapshot that will tell you if the samples that are running on your instrument for the next 2 weeks are worth your time. Could we just build a quick Proteome Discoverer template that would allow you to snapshot that first fraction to give you confidence that everything is okay?

To keep it simple for the first one I'm going to say these are the requirements:

1) A histogram of the relative mass discrepancy at the MS1 level
2) A measurement of the relative number of missed cleavages for determining your enzymatic digestion efficiency
3) A measurement of your alkylation efficiency?
4) Complete data analysis in under 10 minutes on a normal desktop.
5) Must use either Proteome Discoverer normal nodes or IMP-PD (the free Proteome Discoverer version)

#1 is super easy. #2 requires some serious computational power to do correctly on a modern complete RAW file and will require a bit of data analysis reduction.

If you are working on human samples, I'll walk through it. I'll try to post the FASTA and templates somewhere here a bit later (out of time).

If you are working on something else -- tough luck. You'll have to do this yourself.

Step 1) Generate yourself a good limited FASTA. Something small enough to allow you to perform very large data permutations rapidly, but large enough to get a good picture of your data.

To get this, do a normal search of a representative data file. Feel free to use the default Proteome Discoverer templates. The only thing we're doing here is finding the most abundant proteins in your data. Fractionation may complicate this, but I ain't never seen a human offline fraction that didn't have an albumin or Titin peptide in it. I'd also use the cRAP database, but it isn't super important at this step as long as you do use it here somewhere.

I threw in Minora, but don't feel like you have to here. If you are using IMP-PD, use MSAmanda, Elutator, and PeakJuggler. Use normal tolerances for your search (10 ppm/0.02 Da for FT/FT & 10 ppm/0.6 Da for FT/IT).

Same thing for the consensus -- something normalish. I'd throw in the post-processing nodes as well as the ProteinMarker node so that you can clearly distinguish your contaminants from your matches.

Step 2) Run this full search.

Let's find the most abundant proteins and make a FASTA. You can do this a couple of different ways. I recommended using Minora and/or PeakJuggler so that you can sort your proteins by XIC abundance.

Interestingly, the most abundant protein is a cRAP entry. I'm starting to remember why I was asked to troubleshoot this file a few years ago and why I marked it "keep for example purposes"

Step 3: Make a small FASTA! What I'm going to do is filter out the contaminants and make a FASTA of the 150 most abundant proteins. You can use your mouse to hover over it and your down button on your keyboard to scroll. Once you have the area covered, then right click "check all selected in this table" then File > Export > Fasta > Checked only
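If you'd rather script this step than click through it, the logic is just "drop contaminants, sort by abundance, keep the top N". A minimal sketch, assuming you have exported your protein table into a list of dictionaries; the field names and the toy entries are hypothetical and do not reflect any PD export format.

```python
def top_abundant_fasta(proteins, n=150):
    """Build FASTA text for the n most abundant non-contaminant proteins.

    proteins: list of dicts with keys 'accession', 'sequence',
    'abundance' and 'is_contaminant' (hypothetical field names).
    """
    real = [p for p in proteins if not p["is_contaminant"]]
    real.sort(key=lambda p: p["abundance"], reverse=True)
    lines = []
    for p in real[:n]:
        lines.append(">" + p["accession"])
        lines.append(p["sequence"])
    return "\n".join(lines) + "\n"

# Toy table standing in for an exported protein list
table = [
    {"accession": "CRAP_KERATIN", "sequence": "MSRQ", "abundance": 9e9, "is_contaminant": True},
    {"accession": "P_ALBUMIN", "sequence": "MKWV", "abundance": 5e9, "is_contaminant": False},
    {"accession": "P_LOW", "sequence": "MAAA", "abundance": 1e5, "is_contaminant": False},
    {"accession": "P_TITIN", "sequence": "MTTQ", "abundance": 2e8, "is_contaminant": False},
]
small_fasta = top_abundant_fasta(table, n=2)
print(small_fasta)
```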

Step 4: Process with this crazy FASTA! Now you have a FASTA to work with! Import it into PD through the Administration tab. Once it's in make a crazy method.

I'm allowing up to 10 missed cleavages, 100 ppm MS1 tolerance and 0.6 Da MS/MS tolerance for FT/FT (ion trap, maybe try 2 Da?). Please note -- this database is likely too small for Percolator to work well on. I've turned it off here and am relying on Xcorr alone (Fixed Value PSM Validator).

Even with 10 missed cleavages, my old 8 core Proteome Destroyer completed the file in 4 minutes.

Step 5: Evaluate the data quality: Let's check the deltaM. This is the pic at the very top of this post and it looks kinda bad. However, this is mostly a histogram binning issue. Change the number of bins to 100 and it looks much better:
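Why would the bin count matter that much? A minimal sketch of the same idea outside of PD: compute the ppm error for each precursor, then bin the same values coarsely and finely. The m/z pairs are invented, and the ±100 ppm range simply mirrors the wide tolerance used in the search above.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def histogram(values, n_bins, low, high):
    """Count values into n_bins equal-width bins spanning [low, high]."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        if low <= v <= high:
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts

# Hypothetical (observed, theoretical) precursor m/z pairs
pairs = [(500.0015, 500.0), (500.0025, 500.0), (750.00525, 750.0), (600.0, 600.003)]
errors = [ppm_error(obs, theo) for obs, theo in pairs]

coarse = histogram(errors, 4, -100.0, 100.0)    # 50 ppm wide bins
fine = histogram(errors, 100, -100.0, 100.0)    # 2 ppm wide bins
print(coarse)
print([(i, c) for i, c in enumerate(fine) if c])
```

With only a handful of wide bins, everything within a few ppm of zero lands in one or two bars and the plot looks ugly; with narrow bins you can actually see the shape and center of the distribution.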

What about missed cleavages?

A few -- but it looks like you'd capture well over 90% if you used 2 missed cleavages on this data. I'd say the digestion was okay.
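The "how many would I capture with 2 missed cleavages" question can be sketched in a few lines. This assumes the usual simplified trypsin rule (cut after K or R unless the next residue is P); the peptide list is hypothetical.

```python
def count_missed_cleavages(peptide):
    """Count internal K/R residues not followed by proline (trypsin rule)."""
    missed = 0
    for i in range(len(peptide) - 1):      # the last residue is the cleaved C-terminus
        if peptide[i] in "KR" and peptide[i + 1] != "P":
            missed += 1
    return missed

def fraction_within(peptides, max_missed):
    """Fraction of peptides with at most max_missed missed cleavages."""
    ok = sum(1 for p in peptides if count_missed_cleavages(p) <= max_missed)
    return ok / len(peptides)

# Hypothetical identified peptides
peptides = ["LVNELTEFAK", "AEFVEVTK", "KQTALVELLK", "DAHKSEVAHR", "RPAVK", "AKRKTTR"]
for limit in range(4):
    print(limit, round(fraction_within(peptides, limit), 2))
```

If the fraction at 2 missed cleavages is already near 1, the digestion is probably fine; a long tail out to 5 or more is the warning sign.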

Alkylation output:

In order to see your relative alkylation efficiency (please keep in mind I'm assuming iodoacetamide), you will need to make carbamidomethylation a dynamic modification.

In your output report you can see your relative alkylation efficiency by applying this filter:

Then go to and plot this data:

In this output we're looking at around 73% alkylation efficiency. A quick look shows me that about 49 of these peptides are from cRAP -- even if you take those out of consideration (which only makes sense for peptides introduced late in the process) -- this still is pretty low. I'd check the expiration date on this iodoacetamide, or see if it has spent a lot of time exposed to direct sunshine.
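For anyone who wants the arithmetic spelled out, here is a minimal sketch of one way to compute an alkylation efficiency: alkylated cysteines divided by all observed cysteines. The PSM representation is hypothetical, and PD's filter-and-plot route above may count at the peptide level rather than the residue level, so treat the two numbers as related rather than identical.

```python
def alkylation_efficiency(psms):
    """Percent of cysteines carrying carbamidomethyl across all PSMs.

    psms: list of (sequence, modified_positions) where modified_positions
    holds the 0-based indices of carbamidomethylated cysteines
    (a hypothetical representation of the search output).
    """
    total = 0
    alkylated = 0
    for sequence, modified_positions in psms:
        for index, residue in enumerate(sequence):
            if residue == "C":
                total += 1
                if index in modified_positions:
                    alkylated += 1
    if total == 0:
        return None   # no cysteines observed, efficiency undefined
    return 100.0 * alkylated / total

# Hypothetical PSMs: three alkylated cysteines, one free cysteine, one peptide without Cys
psms = [
    ("ACDEFK", {1}),
    ("LCCTR", {1, 2}),
    ("GCPEK", set()),
    ("AEFVEVTK", set()),
]
print(alkylation_efficiency(psms))
```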

This is an evolving project (there is a lot more we can do here) but I'm going to stop here for now.

Tuesday, December 12, 2017

Trypsin may be a limiting factor in alternative splicing event detection in proteomics!

Thanks WikiPedia Alternative Splicing page!

Alternative splicing is a big deal in eukaryotic systems. One of the first big revelations in "next gen" transcript sequencing is that these aren't rare events.

However, there hasn't been a huge amount of data from the proteomics side to back up the numbers that the transcriptomics people have been finding.

This new study in press at MCP suggests it might be our fault -- primarily our reliance on good ol' trypsin.

Earlier this year we had this bummer study on cysteine alkylation reagents -- and now another weakness of trypsin...?  Isn't anything perfect?

Really -- this is only going to be a problem if you're specifically focused on alternative splicing -- it turns out that lysines and arginines are involved more often than other amino acids in these junctions. The trypsin cuts them up to things far too small to sequence -- or cuts them at just the right point that there is no useful information about the alternative splicing. (The espresso must have kicked in just now, there isn't a single period in that paragraph. Maybe I should go back later and add one).

These authors in silico digest some human proteomes, simulating different enzyme activities. Trypsin doesn't lose all the information, but it appears that chymotrypsin and AspN provide better coverage of these sites -- however, as in all things, it looks like using all 3 (separately, of course) will provide the greatest amount of coverage.
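Here is a toy version of that in silico experiment. It is a minimal sketch with deliberately simplified cleavage rules (trypsin after K/R unless followed by P, chymotrypsin after F/W/Y unless followed by P, AspN before D), no missed cleavages, and an invented sequence with a lysine/arginine pair sitting right at the junction. The length window of 7 to 30 residues is my assumption for "sequenceable", not a number from the paper.

```python
def digest(protein, enzyme):
    """Cut a protein sequence with simplified cleavage rules (no missed cleavages)."""
    cut_sites = []                      # index i means a cut between i-1 and i
    for i in range(1, len(protein)):
        before, after = protein[i - 1], protein[i]
        if enzyme == "trypsin":
            cut = before in "KR" and after != "P"
        elif enzyme == "chymotrypsin":
            cut = before in "FWY" and after != "P"
        elif enzyme == "aspn":
            cut = after == "D"
        else:
            raise ValueError("unknown enzyme: " + enzyme)
        if cut:
            cut_sites.append(i)
    edges = [0] + cut_sites + [len(protein)]
    return [protein[a:b] for a, b in zip(edges, edges[1:])]

def junction_peptides(protein, junction, enzyme, min_len=7, max_len=30):
    """Peptides of usable length that span the junction.

    junction is the index of the first residue after the splice junction.
    """
    hits, start = [], 0
    for pep in digest(protein, enzyme):
        end = start + len(pep)
        if start < junction < end and min_len <= len(pep) <= max_len:
            hits.append(pep)
        start = end
    return hits

# Invented sequence; the junction sits right after the "KR" at positions 11-12
protein = "GASDLTEFNAVKRGLKESPDAWTTLMQ"
junction = 13
for enzyme in ("trypsin", "chymotrypsin", "aspn"):
    print(enzyme, junction_peptides(protein, junction, enzyme))
```

In this toy case trypsin cuts exactly at the junction and leaves nothing that spans it, while the other two enzymes each give one peptide that does. Real proteomes are messier, but that is the effect in miniature.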

Monday, December 11, 2017

The Synaptosome Proteome!

At the very top of my "favorite new (to me) field to say 3 times fast list" -- I present this awesome new study on the synaptosome proteome!

A little looking around and I had to add the "to me" part. There are dozens of studies on the proteomes of synaptic junctions going back to before I ever learned how to use a mass spec, but having not read any of the others -- this is, by far, my favorite one!


1) Real human brains were employed.
2) Even more impressive? This is tissue from a bunch of human brains with really interesting phenotypes.
3) Multi-plex iTRAQ was performed (2 4-plexes) -- and performed expertly. 8 "normal" brain controls were combined in equal ratios and these were used as channel 117 in both 4-plex groups. That's really smart, right? They could look at 6 patients from their disease state (the other 3 channels times 2) and compare it to 8 of their control groups. The control mixture in 117 can be used to normalize between the two 4-plex sets AND all interesting observations in the patient samples can be obtained by just using the 117 channel as the denominator. Simple -- and I'm totally using this later.
4) Other samples were also studied using label free quan. I have the RAW data files on my desktop, but I'm not 100% sure how the data was processed. The findings from the LFQ and the iTRAQ analysis were compared and combined.
5) PD 2.1 was used for the data analysis, and InfernoRDN (which I'd forgotten about somehow!) was used for statistical analysis. If you don't know about this -- I highly recommend you check it out!
6) PRM was used to validate the interesting findings!
7) This data is also integrated into the C-HPP's search for missing proteins!
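The pooled-control trick from the iTRAQ point above is simple enough to sketch. This assumes reporter intensities per protein for each 4-plex, with channel 117 holding the pooled controls; the intensity values are invented, and the real data obviously went through PD 2.1 rather than anything like this.

```python
import math

def log2_vs_reference(intensities, reference="117"):
    """log2 ratio of every non-reference channel to the pooled reference channel."""
    ref = intensities[reference]
    return {ch: math.log2(value / ref)
            for ch, value in intensities.items() if ch != reference}

# Hypothetical reporter intensities for one protein in the two 4-plex sets.
# Set B was loaded at twice the amount, so raw intensities are not comparable.
set_a = {"114": 200.0, "115": 400.0, "116": 100.0, "117": 100.0}
set_b = {"114": 800.0, "115": 200.0, "116": 400.0, "117": 200.0}

ratios_a = log2_vs_reference(set_a)
ratios_b = log2_vs_reference(set_b)
print(ratios_a)
print(ratios_b)
```

Because the same pooled mixture is the denominator in both sets, the six patient ratios end up on one common scale even though the two runs differ in overall intensity.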

I can't gauge the value of their biological findings, but the samples are really cool, and the proteomics is some top-notch stuff. They point out some pathways that seem to make sense with the theme of the paper and that's good enough for me!

All the RAW data is up on ProteomeXchange (where I actually found out about this study) here. 

Sunday, December 10, 2017

Awesome matrix detailing the free PD nodes from IMP!

Without fail, every time a new version of PD comes out, someone discovers their favorite free tool from IMP hasn't been ported over.

IMP provides these nodes and does so out of their own good will for the community, and it's a lot of work.

This super handy new matrix shows you what tools are compatible with which PD versions. Keep in mind that many of these work within the awesome free IMP-Proteome Discoverer version as well!

Saturday, December 9, 2017

What are the best open Linux Proteomics tools?

I run into this question a lot and I'm surprised I haven't thought to work on a post like this before.

Linux is a family of alternative (and mostly free) operating systems. I tried really hard during grad school to live my life with these -- primarily because I was broke -- but also because I had this strong anti-corporate thing (I appear to have misplaced that along the way...).  This experiment failed dramatically. The versions I tried just couldn't support my crappy hardware and I wasn't smart enough to install/alter my own drivers to make them work.

These operating systems have come a LOOONG way since then -- are still mostly free -- and when universities build supercomputing complexes this is what they're going to use (or UNIX, but we'll ignore that).

Quick note on these, however: I've gotten to mess with one quite a lot recently. These huge "cloud cluster" computers have unbelievable numbers of processing cores and threads available. However, the architecture of these processors might be very different from the desktop ones that we're more familiar with. I was first allotted 8 cores for Proteome Discoverer 1.4 and 2.1, which worked within a simulated Windows environment. It was SO slow. I complained and they gave me 16, then 32, and it still wasn't as fast as my old 8 core desktop. I got bored and did something else. Really, it might be that software needs to be designed specifically for these things to work optimally on them....

And that's a lot of words leading up to  -- what proteomics programs can I install on Linux!?!
In no particular order:

#1 SearchGUI!  

I <3 SearchGUI. You know all those command line search engines everyone talks about in the literature? SearchGUI makes them all work in one simple, easy interface. You can then bring all the results together in PeptideShaker. I would make #2 DeNovoGUI -- but it has now been integrated into SearchGUI, so it only gets 1 entry. You can get it here.

#2 OpenMS!

I've only installed the great OpenMS package in Windows. It is a fully integrated environment for mass spectrometry with quantification and identification for proteomics and metabolomics. There may be some extra steps to get it installed in your Linux environment. Fortunately, those instructions are here.

Now, it is probably worth noting that you can really make any Windows program work in a Linux environment by creating a simulated Windows environment within the Linux system. This is what I was doing with Proteome Discoverer, but the performance was too much of a problem. This really might have been that I didn't know what I was doing...

This page is a resource that will help you set up MaxQuant working in the same way!

Heck, this guide will help you set up Proteome Discoverer (1.4) on Linux as well.  Please keep the limitations in mind.

There are other tools that will work in these environments as well, but I don't have hands-on with them personally:


The TransProteomic Pipeline!

And...of course you can install R and use R for Proteomics.

Many of the IMP tools (including MSAmanda) can be operated stand-alone in Linux

Please let me know what I've forgotten, I'm sure there is a ton!

Thursday, December 7, 2017

Q Exactive HF-X paper is out!

This "just accepted" paper at JPR is a gold mine! 

Head to head comparisons -- QE HF with HF-X.

How fast is the HF-X in real proteomics samples? Really fast!
How sensitive? Same gradient conditions, 100ng of peptides on the HF-X gets the same coverage as the HF at 1000ng!
What is the overhead of the HF-X? Somehow -- it again drops in this model. It's somewhere between 3-4ms!

The paper goes through TMT, phosphoproteomics and single shot runs, and it's almost an afterthought that these authors take the 46 offline high pH reversed phase fractions from the Rapid Comprehensive Proteomes Cell paper -- and cut the nanoLC runs to 19 min with the same degree of coverage.

At 19 minutes it is realistically possible to obtain 2 (TWO!) complete human proteomes in about 1 24-hour day of run time! (My crude math)

The thing that slows you down the most? The nanoLC column loading and equilibration time.

Tuesday, December 5, 2017

Proteoform Suite!

I don't know if this picture is related, but Google thinks it is and I like it!

I can't yet read this paper. It's ASAP at JPR and my library doesn't list those for a few days typically.

You can access it at JPR directly here: 

Check this out, though! Before paper launch these authors have already put an instructional video on their cool new software (quantify and visualize proteoforms!!) on YouTube.

I'm having trouble embedding the video here. I took a screenshot at this point so you can see some of the awesome output capabilities of this new tool. Direct link to the video is here.

Since we have to admit that proteoforms exist and complicate our work, any tool that will help you group them into families -- quantify your changes -- and help you make sense of all that stuff you found is going to be seriously useful.

Monday, December 4, 2017

pSITE -- A computational bulldozer approach to de novo global PTM analysis

Sometimes elegant solutions are in order.

Other times you've got to put on your cowboy hat and check every possible AMINO ACID where a PTM could co-localize in a gigantic peptide matrix to determine where the heck that modification has ended up. And...maybe that's how you have to do it to accurately determine your FDR.

I don't mean to insult the authors of this cool new paper -- at all -- quite the opposite.

I have loads of respect for the fancy pants statistics stuff that I don't fully understand. However -- pSITE cuts most of that out of the way by just checking everything, and in their hands it looks superior to our classic methods like Ascore and phosphoRS for global de novo identification of modifications and their localizations.

You'll have to check out the math for yourself if you're interested. One of the big surprises for me in this paper was in the supplemental info. With all the amino acid specific calculation, I'd expected the search space for pSITE to reflect the actual amino acid length -- for example, a 12 amino acid peptide would have a search space ^12 larger than NovoR or Peaks.

I don't know how this is possible, but pSITE somehow (on the same server configuration) isn't slower than these other algorithms. It is somehow faster than some of them....

pSITE is free to download and you can get it here. 

Sunday, December 3, 2017

MORPHEUS (not that one!) at the Broad for downstream analysis!

Have you been trying all weekend to successfully do some sweet clustering on a collaborator's data, but keep getting this when you try to run ClustVis? 

Have you already tried the popular American strategy of calling the person(s) responsible a name on Twitter to see if that helps?

..and it totally didn't help at all?

Are you also too lazy to do it in R yourself, despite the fact that Brett Phinney tipped you off to an awesome and super easy looking package that would do it?

Well -- do I ever have the solution for you!

Check out the MORPHEUS web interface hosted by the Broad Institute here!

1) Wow, is this thing ever easy! It was designed for genomics stuff -- wait, some of the example data is quantitative proteomics! You have a format to follow!  Just cut your data out into a Tab delimited text file and go. It looks like you can load your entire file and then determine what is important, but I found it simpler to cut it myself down to my protein accession numbers and my normalized protein abundances from each RAW file.
2) It has a lot of fancy stats things in it that you can use. Do they work? Sure, probably! My results seem to make a lot of sense....
3) It doesn't like enormously huge names in the Row and column titles. If you have, for example, clustered your RAW files 7 different ways in Proteome Discoverer 2.x and the title of your row has over 128 characters, it will compress the space for visualizing the clustering distance. Shorten the name and it's fine!
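Points 1 and 3 above boil down to "make a small tab-delimited table with short names". A minimal sketch of that prep step, assuming you already have normalized abundances per protein; the accessions, sample names and the 30-character limit are arbitrary choices of mine, not anything MORPHEUS requires.

```python
import csv
import io

def shorten(name, limit=30):
    """Trim long names (for example, full RAW file names) so labels stay compact."""
    return name if len(name) <= limit else name[:limit - 3] + "..."

def to_tab_delimited(abundances, sample_names):
    """Return tab-delimited text: one row per protein, one column per sample.

    abundances: dict mapping accession -> list of normalized abundances,
    in the same order as sample_names.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Accession"] + sample_names)
    for accession, values in abundances.items():
        writer.writerow([accession] + values)
    return buffer.getvalue()

# Hypothetical sample names and abundances
raw_names = ["ctrl_1", "ctrl_2", "treated_1_fraction_07_reinjected_after_column_change"]
samples = [shorten(n) for n in raw_names]
abundances = {"P_ALBUMIN": [10.5, 11.0, 9.25], "P_TITIN": [3.5, 3.0, 7.75]}
text = to_tab_delimited(abundances, samples)
print(text)
```

Save the printed text as a .txt file and you have something in the general shape of the proteomics example data.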

And it's got a bunch of cool stuff to choose from once your tables are loaded!

(This isn't patient confidential data or anything like that -- but unless I've gotten explicit permission to share someone's stuff I figure you can never be too careful. See what a good collaborator I am? Just don't expect analysis in a hurry...)

You have loads of power to take a first pass at your data here. All the normal clustering algorithms (that I have to go straight to WikiPedia to remember what is what) are here. Surprisingly, it appears to do all the analysis within your local browser! When I really loaded it with data (clustering proteins in one tab and all quantified peptides in another tab) it used up a good 10% of the processing power available on my PC for upwards of 10 minutes before wrapping it up. (This suggests to me it is limited by the amount of math my web browser can handle). Come on, Chrome! You can do it!

Are there a ton of ways to get your data clustered? Sure! But it never hurts to have another nice (and easy) tool to get there.

Is it a little confusing that it shares a name with another great tool? 

Saturday, December 2, 2017

PoGo! Overlay your identified peptides onto your gene assembly!

You should be able to click on the picture I stole above to expand it!

I'm kinda leaving this awesome tool here so that I can find it later to check it out.

You can access PoGo directly here. 

Why would you want to? Proteogenomics, Kyle. Don't know which frame is the correct one? Overlay them (this all goes back to the codon uncertainty thing)

Big shoutout to @clairmcwhite for the image and tipping our multi-omics Twitter community off to this awesome resource --and to the team at something called the...

...for putting this up. Can't wait to get ahead a little so I can check it out!

Friday, December 1, 2017

Mitochondrial dysfunction in heart tissue revealed by RNA-Seq + Proteomics!

The biology in this new study is intense. I know approximately nothing about cardiovascular proteomics (honestly -- as far as I can tell, not many people do -- definitely an underdeveloped field at this point, but there are great people working hard on it!)

What I'm interested in here is how these authors approached the experimental design in terms of combining transcriptomic analysis with their quantitative proteomics.

Don't get me wrong -- there is some solid proteomics work in this. If you are looking for a recent method to enrich mitochondria from cells -- this (and the associated study referenced in the method section) is/are for you. These authors start with an impressively pure mitochondrial fraction before the peptides from it go onto a Q Exactive HF. In just a mass spec nerd note -- this group sacrifices some scan speed and MS1 resolution in order to get higher resolution MS/MS. They use 60,000 resolution for MS1, top10 selected for 30,000 resolution MS/MS. We've been seeing this more often recently.

My inclination is always going to be to get more MS/MS scans -- at the sacrifice of relative quality/scan. In simpler mixtures of peptides, it seems like more researchers prefer to get fewer MS/MS scans if the scans can have higher injection times -- and if you're using longer fill times, you might as well get better resolution MS/MS right? It will be free in terms of cycle time on the lower abundance peptides and will only cost you cycle time on the higher abundance peptides. I should check later to see if anyone has done a comprehensive comparison at different complexities....
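The "free on low abundance peptides" argument is easier to see as a tiny model. This is a minimal sketch assuming ion accumulation for the next scan runs in parallel with detection of the current one, so the slower of the two steps sets the pace. The transient lengths are approximate values I'm assuming for illustration; check the numbers for your own instrument.

```python
def scan_time_ms(fill_ms, transient_ms, overhead_ms=0.0):
    """Time per MS/MS scan when ion filling runs in parallel with detection.

    The slower of the two steps sets the pace.
    """
    return max(fill_ms, transient_ms) + overhead_ms

# Assumed transient lengths (ms) for two MS/MS resolution settings
TRANSIENT_MS = {15000: 32.0, 30000: 64.0}

for fill in (20.0, 64.0, 100.0):
    low = scan_time_ms(fill, TRANSIENT_MS[15000])
    high = scan_time_ms(fill, TRANSIENT_MS[30000])
    print(f"fill {fill:5.0f} ms: 15k = {low:.0f} ms, 30k = {high:.0f} ms, extra = {high - low:.0f} ms")
```

An abundant peptide that fills the trap quickly pays the full price of the longer transient, while a low abundance peptide that needs a long fill pays nothing extra for the higher resolution.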

Back to the paper -- all the label free proteomics is processed in MaxQuant and the transcriptomics is performed using a commercial kit for heart proteins for the RT-PCR and Hi-Seq analysis using a mouse kit.

The impression you get from this paper at first is that combining this staggering amount of data from all of these mutant mice was easy. There's no way it was. No way. But they make it look that way till you dig deep into this massive body of work.

And that's why I recommend this paper -- they did this work and laid it out end to end here so I don't have to. If someone hasn't dropped by your lab with samples they've done transcriptomics on, wanting to combine that with quantitative proteomics -- you'll see them soon. And -- it's tough. We don't have unified pipelines -- yet. You need to use some of their genomics tools and our tools and find things like IPA, Cytoscape and the right R packages (if they've done the transcriptomics, they probably have an R specialist around), and this study is a great framework for how to pull this off!