Tuesday, December 29, 2015

CPTAC shows high reproducibility in Orbitrap quan between systems AND methodologies!

Once in a while I run into someone who heard from someone else that Orbitraps aren't good for quantification.
...and I try really hard to not make this face...

Our good friends at CPTAC decided to make the ultimate comparison. Over 1,000 (one-thousand!!!) LC-MS/MS runs. From different mass spectrometers. From different institutes. With different quantification technologies. On xenografts! (That's a human tumor grown in a mouse. You don't get much more variable.)

They compared iTRAQ quan with XIC-based label-free quan (peak area integration) and spectral counting. What did they find? I'll just quote it.

"If laboratories deploy different methodologies to analyze the differences between the same two complex samples, then they will assuredly see differences in the gene or protein lists produced by the two technologies. The degree of conformity observed in this study, however, was encouraging. When label-free data were analyzed by spectral counting rather than precursor intensity, the differences yielded a high degree of overlap. When iTRAQ rather than label-free methods are deployed, the differential genes were again quite similar. These overlaps suggest a degree of maturity in proteomic methods that has grown through years of development along multiple tracks.
At base, biologists need to know that differential proteomics technologies can produce meaningful results. Our assessment showed that biological pathway and network analysis is highly consistent across instruments."
Right?!? Ben's interpretation: We're still getting a subset of the data in something as complex as a human tumor. We can bias this subset by using completely different methodologies, but even on the most complex human samples and experiments, we're at a point where we are HIGHLY reproducible. And this is the global/fractionated stuff....
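If the two label-free readouts are new to you, here's a minimal Python sketch of what each one actually counts: spectral counting (how many PSMs land on a protein) and XIC peak-area integration (area under the extracted ion chromatogram). It also shows one simple way to score how much two differential lists overlap. The Jaccard index is my choice for illustration, not necessarily the statistic the CPTAC team used, and all the inputs are made up.

```python
from collections import Counter

def spectral_counts(psm_protein_ids):
    """Spectral counting: number of PSMs assigned to each protein."""
    return Counter(psm_protein_ids)

def xic_area(times, intensities):
    """Peak-area quan: trapezoidal integration of an extracted ion chromatogram."""
    return sum(
        (t1 - t0) * (i0 + i1) / 2.0
        for t0, t1, i0, i1 in zip(times, times[1:], intensities, intensities[1:])
    )

def jaccard(list_a, list_b):
    """Overlap of two differential gene/protein lists: |A & B| / |A | B|."""
    a, b = set(list_a), set(list_b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

# Hypothetical inputs
counts = spectral_counts(["P1", "P2", "P1", "P3", "P1"])
area = xic_area([10.0, 10.5, 11.0, 11.5], [0.0, 4e6, 2e6, 0.0])
overlap = jaccard(["EGFR", "MYC", "TP53", "KRAS"], ["MYC", "TP53", "KRAS", "PTEN"])
print(counts["P1"], area, overlap)
```

Spectral counts are integers that saturate for abundant proteins, while areas are continuous, which is one reason the two methods can disagree on individual proteins and still agree on the big picture.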

Monday, December 28, 2015

Protein carbamylation is a hallmark of aging - and how to detect it

A recent paper in PNAS makes the statement in the title "Protein Carbamylation is a Hallmark of Aging." You can find it here.

They find that you can almost assess the age of a mammal by looking at the degree of carbamylation in its proteins. I'm not 100% awake yet, so it took me a minute or two to remember what carbamylation was and why it sets off a little alarm in my head. Then I found the image above. Most of the time when I think about carbamylation, it's because it's a sample prep issue.

Here is a paper that discusses this modification. When I run Preview on a sample and it pulls up carbamylation as a modification to consider, I've always assumed it came from a protein prep in which either excessive urea was used, or urea was used and the prep was performed at too high a temperature. Turns out, it might be flagging old samples as well? Interesting thought, right?

Detection of this modification is very straightforward in any search engine. In PD you just need to activate the modification in the Administration --> Maintain chemical modifications tab.
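If you want a quick sanity check outside the search engine, carbamylation adds +43.005814 Da (CONH) per modified site, most often at peptide N-termini and lysines. Here's a minimal Python sketch of what that does to a precursor m/z. The site rule is deliberately simplified (other residues can get hit too) and the example peptide mass is made up.

```python
CARBAMYL = 43.005814  # monoisotopic mass shift of carbamylation (+CONH), Da
PROTON = 1.007276     # proton mass, Da

def carbamyl_sites(sequence):
    """Simplified: the peptide N-terminus plus every lysine counts as a possible site."""
    return 1 + sequence.count("K")

def precursor_mz(neutral_mass, charge, n_carbamyl=0):
    """m/z of a peptide carrying n_carbamyl carbamylations at a given charge state."""
    return (neutral_mass + n_carbamyl * CARBAMYL + charge * PROTON) / charge

# Hypothetical 1000.0 Da peptide at 2+, unmodified vs. one carbamylation
unmodified = precursor_mz(1000.0, 2)
shifted = precursor_mz(1000.0, 2, n_carbamyl=1)
print(round(unmodified, 6), round(shifted, 6))
```

At 2+ each carbamylation moves the precursor by about 21.503 m/z, which is big enough that a search without this modification turned on simply won't match those spectra.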

With this valuable new information, I expect y'all to get on reversing this aging stuff 'cause it kinda sucks.

Sunday, December 27, 2015

Pinnacle -- the best translational software I've ever seen.

I've been wanting to talk about this one for months!!! Unfortunately, I do have a day job and there are rules I have to follow to keep that day job, so I held my tongue until I found out I was finally allowed to talk about it this week.

At HUPO I got to see Pinnacle. Pinnacle is software specifically meant for all you translational people out there. I know, there is a ton of software out there, but I'm going to argue that you ought to demo this one if:

1) You have so many clinical samples (especially high resolution ones) that you can't process them in anything like a reasonable amount of time
2) You are doing label free quantification
3) You are doing data-independent acquisition (DIA, pSMART, WiSIMDIA)
4) You just want to use a piece of software that is graphically pretty.

This software is fast. Sick fast. It-shouldn't-possibly-be-this-fast FAST.  Put in HUNDREDS of Q Exactive Raw files -- targeted, untargeted, DIA, whatever -- and watch it pull the data out in minutes.

Wonder what the data quality is like? Just look at the color and shape of the icons on the left (click on the pic above to zoom in) and get a feel for the quality, OR look to the right of the peptide sequence where you ACTUALLY SEE THE INTEGRATED PEAKS. Sorry to shout, but how cool is that? "Wow, that is crazy upregulated! Should I investigate it? Nope, that is obviously just a poor integration. Better readjust that integration right now." In real time. Without changing the settings and reprocessing the data. Just fixing that peak. Click, click, done.

Pinnacle has a bunch of other functions. It's a thorough software package and you purchase the modules that you need for your work. You can also download a free trial version here that lets you process one dataset and see what I'm talking about.

Saturday, December 26, 2015

Interesting, though somewhat morbid, article on elite scientists and progress

I'm not entirely sure what to think of this. Partially because I'm having a little bit of trouble wrapping my head around it. Maybe part of the difficulty is that the article is from the National Bureau of Economic Research. Which, Wikipedia tells me, is a real thing.

Anywho...you can read the article I found on Vox here.

And the abstract for the original article is here (there is a $5 charge to download the complete article)

Friday, December 25, 2015

Christmas Magic -- Multiply charged proteins ionized with no energy!

This is really interesting. What if you could just mix your proteins, including the big ones, with your matrix compound(s) and then magically get multiply charged species into your mass spectrometer? No energy. They just grab some protons and go flying into the air? Well, it sounds like you could save a lot of energy on lasers....AND....maybe you could finally give up on that weird old TOF in the corner that can go to 100 kDa (you know...the one that is 8 feet tall and has accuracy within 1 kDa...or 2...)

Well, that appears to be exactly what happens. What?!!?  I know!
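Why does multiple charging matter so much? Because the analyzer sees m/z, not mass. Here's a minimal Python sketch, assuming simple [M + zH]z+ protonation and a hypothetical 100 kDa protein, of how many charges it takes to pull a big protein down into a range an Orbitrap is comfortable with.

```python
import math

PROTON = 1.007276  # Da

def mz(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge

def min_charge_for_range(neutral_mass, max_mz):
    """Smallest z that puts [M + zH]z+ at or below max_mz.
    (M + z*p) / z <= max_mz  <=>  z >= M / (max_mz - p)."""
    return math.ceil(neutral_mass / (max_mz - PROTON))

z = min_charge_for_range(100_000, 2000)
print(z, round(mz(100_000, z), 2))
```

Singly charged, that 100 kDa protein sits at m/z ~100,001, which is why you needed the 8-foot TOF. At 50+ charges or so, it's just another ion around m/z 2000.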

Check out this paper from Sarah Trimpin for more details. Hey, if nothing else, it has one of the single most amusing abstracts I've ever read.

And it's got this great chart!

Thursday, December 24, 2015

CaspDB - A database of caspase cleavage products!

Another tool to help find identifications for unmatched MS/MS spectra! Caspases are proteases that hang around just to destroy other proteins. They are a critical component of apoptosis and normal cell maintenance, and if you believe the recent in silico protein cycle predictions -- they are active constantly. If the mix of proteins I just harvested is full of incomplete, complete, modified AND degraded proteins, then all these unmatched spectra start to make sense.

Caspases have specific substrates for degradation and a bunch of them have been worked out. CaspDB is a new online tool to help you work with this information. It is described in this new Open Access paper from Sonu Kumar et al.

While the paper is totally neat and all, you can go directly to the online tool here. You'll quickly find out that this tool requires a good bit of pre-existing information before it is useful. Once you've got some data, you can use it to run through your protein of interest and different caspases to see if you've got stuff that makes sense. The paper goes on to show how awesome these prediction tools are by going ahead and experimentally validating a ton of their software predictions.

This is obviously a very powerful and interesting tool and this will generate some great data from the validation end. But first you need to get some observations....

...(how did we ever get anything done before this...????...)

Check out this thing!!! It's called Pripper and, wait, we'll need this...

...to go WayBack to 2010....to this paper from Mirva Piippo et al. that describes Pripper. Pripper is a Java tool that will take any FASTA database you give it, perform in silico caspase cleavages on that database and give you a new FASTA that has all the predicted caspase cleavage products.

If you're thinking "How can I trust a tool that is 5 whole years old?", never fear, it has been updated multiple times (the version I just unzipped is timestamped from 2013). Oh. If you download Pripper here, you might want to right click on the zip file, go to Properties, click "Unblock" then "Apply" and THEN unzip it. Windows Defender on my PC blocked it as a threat.

Now you have a tool that will make you a predicted caspase cleavage FASTA that you can run against your samples. If it comes up with something really cool then you can go to the CaspDB and search those observations against their more advanced prediction models (and validated data!)
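To make "in silico caspase cleavage of a FASTA" concrete, here's a toy Python sketch. The cleavage rule is a deliberately crude assumption (cut after the P1 Asp of any D-x-x-D motif, the DEVD-like pattern), the header format is made up, and this is NOT how Pripper or CaspDB model specificity. It just shows the shape of the output you'd feed to your search engine.

```python
import re

# Crude, hypothetical rule: cut C-terminal to the P1 Asp of any D-x-x-D motif.
# Pripper and CaspDB use far richer specificity models than this.
MOTIF = re.compile(r"(?=D..D)")  # lookahead so overlapping motifs are all found

def caspase_sites(seq):
    """Cut positions: 0-based index just after each motif's P1 Asp."""
    return [m.start() + 4 for m in MOTIF.finditer(seq)]

def caspase_products(name, seq, min_len=6):
    """Single-cleavage products: for each site, the N- and C-terminal pieces."""
    products = []
    for cut in caspase_sites(seq):
        if cut >= len(seq):
            continue  # Asp is the last residue, nothing gets released
        for label, start, frag in (("N", 0, seq[:cut]), ("C", cut, seq[cut:])):
            if len(frag) >= min_len:
                header = f"{name}|casp{label}|{start + 1}-{start + len(frag)}"
                products.append((header, frag))
    return products

def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

records = caspase_products("TEST1", "MKTAYDEVDGAPLKQR")  # made-up sequence
print(to_fasta(records))
```

Only single cleavages are generated here because caspase cleavage in a cell is rarely complete. A real tool would also handle multiple sites per protein and specificity per caspase.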

Wednesday, December 23, 2015

What is a PFAM? And how do they deal with all this data?

Personally, I think the biologists and biochemists need to hurry up and annotate the function of every protein from every organism under every biological condition. Until they stop slacking and get that stuff done, we need to use some shortcuts to extract biological data from our peptide spectral matches. Fortunately, smart people have been working on this gap for us.

Gene Ontology (GO) is tricky stuff. If we don't know exactly what a gene does, can we infer what the heck it does from its similarity to genes we understand better?

More tricky, and way more biologically relevant? Protein Ontology (PO?)! One way of getting this data is via PFAM (which you can access here). I'll be honest. I didn't really know what this was for a long time. I just knew that it was an option in the Annotation node in Proteome Discoverer. Cool, I have a new column that says that all this stuff that is upregulated shares a PFAM ID (actually, I made that part up. It's never that easy, is it?)

Turns out that the people making PFAM are working really hard making this data:
1) More accurate
2) More relevant
3) More current

As you can imagine, all of this is hard, but...

(holy cow)...

Can you imagine what the 3rd one is like these days?

The amount of sequencing information in databases is increasing EXPONENTIALLY, while the output of the current tools for creating PFAM information grows at a linear rate. It doesn't take a stolen GoogleImage to show that this is a problem, but...I'm nervously waiting for an important phone call...so...
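The exponential-versus-linear problem is easy to see with a few lines of arithmetic. This Python sketch uses made-up growth parameters (sequence data doubling every 2 years, curation capacity growing by a fixed amount per year). They're placeholders for illustration, not numbers from the Pfam paper.

```python
def sequences_in_db(year, start=1.0, doubling_years=2.0):
    """Hypothetical exponential growth of sequence data (arbitrary units)."""
    return start * 2 ** (year / doubling_years)

def annotation_capacity(year, start=1.0, per_year=1.0):
    """Hypothetical linear growth of what the annotation pipeline can handle."""
    return start + per_year * year

def fraction_covered(year):
    """Share of the database the pipeline can keep up with (capped at 100%)."""
    return min(1.0, annotation_capacity(year) / sequences_in_db(year))

for year in (0, 2, 4, 6, 8, 10):
    print(year, round(fraction_covered(year), 3))
```

Whatever constants you pick, a linear curve falls behind an exponential one eventually. That's why "just run the old pipeline more often" isn't an answer and the algorithms themselves had to change.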

So, what do we do about it? Well, as Robert Finn et al. say in this new Open Access paper, we fix the algorithms to deal with this glut of data. So they did.

When I clicked on this link in Twitter this morning, I honestly expected a dense paper that I probably would hardly be able to read and would likely not understand at all. I was pleasantly surprised to find that this team can seriously write and that I not only learned a lot about how PFAM works, but I also (think) I got a good understanding of their challenges and how their new algorithms power through in dealing with them. Solid and interesting paper that makes me want to add this column to all of my processed data from now on!

Tuesday, December 22, 2015

Updated guide to connecting your NanoLC-MS!

Got a Thermo nanoLC? Wanna connect it to a Thermo mass spec? Want every frickin' part number and easy to follow diagrams?

TAAADAAA!!! This link will lead you to a new and updated version of the nanoLC connection guide. It is at PlanetOrbitrap so you might need to log in and then re-click the link to get directly in.

Monday, December 21, 2015

PTMs in centromeres!

I had to dig deep in my brain and then finally just look at Google Images to remember what a centromere is and why it's important. Hopefully the nice sketch I found above clarifies it for you as well. 'Cause it's the region where the sister chromatids of a chromosome are held together, the proteins that assemble there are gonna be deeply involved in cell/chromosome division, sexual reproduction and probably all sorts of other things.

In this new paper from Aaron Bailey et al., in press (and currently open access) at MCP, the group looked at the post-translational modifications that can show up on these important proteins.

They started with a HeLa cell line that had a stable affinity tag on a centromere protein and then immunoprecipitated to get at their proteins of interest. Chemicals were used to arrest the cells in certain stages of mitosis or something. Multiple enzymes, including Lys-C and Asp-N, were used to get big chunks of the cleaned-up protein for effective PTM identification and localization.
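Why bother with more than one enzyme? Different specificities cut the same protein into different, overlapping pieces, so a modified stretch that comes out badly with one enzyme can land in a much friendlier piece with another. Here's a toy Python sketch with simplified cleavage rules (Lys-C after K, Asp-N before D, no exceptions or missed cleavages) and a made-up sequence.

```python
import re

# Simplified specificity rules (assumptions for illustration only):
#   Lys-C: cleaves C-terminal to K
#   Asp-N: cleaves N-terminal to D
RULES = {
    "Lys-C": re.compile(r"(?<=K)"),
    "Asp-N": re.compile(r"(?=D)"),
}

def digest(seq, enzyme):
    """Complete in silico digest with one enzyme; returns non-empty pieces.
    Splitting on zero-width patterns needs Python 3.7+."""
    return [piece for piece in RULES[enzyme].split(seq) if piece]

seq = "MSKADPKLDE"  # made-up sequence
for enzyme in RULES:
    print(enzyme, digest(seq, enzyme))
```

Notice the two enzymes break the sequence at different places, so a residue sitting near a cut site for one enzyme ends up mid-peptide for the other, which helps with localizing modifications.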

What did they find?

Sunday, December 20, 2015

BetterExplained -- a great site for math concepts

I seem to have forgotten what little I ever knew about math. This site, BetterExplained, uses clever examples to either teach or remind you of what a math concept is.

Tuesday, December 15, 2015

Open Genomics Engine

Sorry, this is something I just stumbled on that I didn't want to forget about! I lost the password to my EverNote account...but it does look super cool, right? If you're into that weird DNA/RNA sequencing stuff, that is...

Monday, December 14, 2015

Use protein solubility to get around protein abundance issues in biofluids?

For biofluids, one of the biggest problems is the high abundance junk. "Junk" probably isn't the right word since evolution probably wouldn't have erred toward filling our fluids with albumin if it wasn't important, but...you know what I mean....

In an interesting take on this problem, Bollineni et al. tried a protein solubility approach. Rather than specifically depleting the most abundant proteins using an immuno-affinity approach, they used different concentrations of ammonium sulfate to precipitate or solubilize different populations of plasma proteins. This gave them a less directly biased way of fractionating out the high abundance things.
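If you want to try solubility-based fractionation yourself, the bench arithmetic is how much solid ammonium sulfate to add for each saturation cut. Here's a minimal Python sketch. The formula is a commonly quoted room-temperature approximation (it is not from the Bollineni et al. paper), so check it against a temperature-appropriate table before trusting it, and the cut points are hypothetical.

```python
def ammonium_sulfate_g_per_l(s_start, s_end):
    """Grams of solid (NH4)2SO4 to add per liter of solution to go from
    s_start% to s_end% saturation. Commonly quoted approximation near
    room temperature (an assumption, verify against a table):
        g = 533 * (S2 - S1) / (100 - 0.3 * S2)
    """
    if not 0 <= s_start < s_end <= 100:
        raise ValueError("need 0 <= start < end <= 100 (% saturation)")
    return 533.0 * (s_end - s_start) / (100.0 - 0.3 * s_end)

# A hypothetical series of stepwise cuts; each pellet is one fraction
cuts = [0, 20, 40, 60, 80]
for s1, s2 in zip(cuts, cuts[1:]):
    print(f"{s1}% -> {s2}%: add {ammonium_sulfate_g_per_l(s1, s2):.1f} g/L")
```

The denominator is there because the dissolved salt itself adds volume, so each step toward saturation needs a bit more salt per percentage point than the last.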

To my friends out there who are in the "do not deplete!" camp: sure, you're probably going to run into the same problems (the fraction that has albumin will pull down tons of interesting things with it). But for people who will accept this loss in order to see the stuff that isn't at 1e9 copies per µL, this might be a simple approach to see something different from what your Top 4, 10, or 14 depletion column is giving you.

Saturday, December 12, 2015

proBAMsuite! Great new proteogenomics tools!

Man, I love a software package with a catchy title. And I love a free software package that has a ton of promise!  proBAMsuite has all of these things!

It's a set of R tools that are meant to help you integrate the data from your next gen sequencing files with your LC-MS/MS spectra. This is an overview of the steps involved.

Of course, the process isn't trivial. The RNAseq data needs to be lined up and QC'ed, and so do the MS/MS spectra, the PSMs and the peptide matches. When we're looking at millions of measurements, the number of false discoveries has to go up, just mathematically, never mind the fact that not every MS/MS spectrum or next gen read is as good as the others.

To keep those false discoveries in check, the suite can control the FDR at the PSM and peptide levels. Even cooler, maybe, is this idea: the decoy matches are kept and allowed to be mapped against the total genomics data, so you can get a good idea of the FDR at the complete, reassembled level! Total system FDR.
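If you've never implemented it, target-decoy FDR at the PSM level boils down to a few lines. Here's a minimal Python sketch of the generic approach (decoys divided by targets above a score threshold, then converted to q-values). This is not proBAMsuite's own code, and the scores are made up.

```python
def q_values(psms):
    """psms: iterable of (score, is_decoy), higher score = better match.
    Returns (score, is_decoy, q_value) tuples sorted best-first."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdr, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr.append(decoys / max(targets, 1))  # estimated FDR at this score cutoff
    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the worst score upward
    q, best = [0.0] * len(fdr), float("inf")
    for i in range(len(fdr) - 1, -1, -1):
        best = min(best, fdr[i])
        q[i] = best
    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]

def accepted_targets(psms, max_q=0.01):
    """Scores of target PSMs that pass the q-value cutoff."""
    return [s for s, d, qv in q_values(psms) if not d and qv <= max_q]

psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
print(accepted_targets(psms), accepted_targets(psms, max_q=0.25))
```

The neat trick in the post is that the decoys don't get thrown away after this step. They get carried through the genome mapping too, so the same ratio can be estimated at the end of the whole pipeline.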

Why would we go to all this trouble?

1) How bout more data about your protein than you'd maybe even want? Check out the suite's sweet output!

And, of course, more explanations for what those weird MS/MS spectra are!

Open access pre-release of paper here!

Friday, December 11, 2015

The second version of the OpenMS LFQ nodes are available! Now for PD 2.1!

The label free quan nodes from OpenMS I keep going on about?  Version 2 is now available!  More stable, faster, and works in Proteome Discoverer 2.1.

You can get them here. Once this PC stops looking like this:

I'll install 'em and give 'em a good hard run!

Keep this good code coming, people!

Thursday, December 10, 2015

Find unidentified differentially regulated reporter biomarkers in reporter ion datasets!

I feel kind of smart for this one, though I'm afraid I'm getting to the point where I really really should get an indoor hobby of some kind since this is most of what I did last weekend. What do you guys do when it's too cold to rock climb but you can't snowboard yet?

Anyway. I have access to an amazingly cool set of TMT/iTRAQ samples. I have access because there is a distinct and observable phenotype. Not a little one, either. The hundreds of samples in group 1 and group 2 are extremely different. Proteomics, so far, has shown just about nothing different between the two. Weird, right?  For years we've been suspecting a novel mutational system or PTM that we've just never seen before, but we've not been able to find a way to hunt it down.

So, here was the thought that killed this last weekend: What if I completely ignored the IDs? What if I only looked at the spectra that showed a significant difference at the reporter ion level?  And then I tried to figure out what they were later?

In PD 2.1 + Quan you can do this. There is a tab in your report that is your "Quan spectra".

You can actually go to that and look at every MS/MS spectrum. You can see the RAW reporter values and you can even see your quantification spectra zoomed in.

So, you can actually go through and see all the stuff that is different. See the reporter ions above? This is exactly the trend I should be seeing in this sample set based on the phenotype. Exactly. And this MS/MS spectrum is the most differentially regulated observation in this entire sample set of 1M or so MS/MS spectra. And this PSM shows up just like this three times in different, overlapping fractions. I think the precursor intensity for this is 1e6-5e6. More importantly, since in PD 2.1 we can plot our reporter ion intensities by their SIGNAL TO NOISE (yay!!!!!!), the S/N of these reporter ions is >500!!!

In sum, this is the perfect biomarker for this experiment and maybe the thing we've been trying to find in one form or another for 5 years (Holy cow, I don't think I'm exaggerating. It's 2015?!?!). Not to get my hopes up too high or anything....

Where it gets difficult, however, is linking that back to the full fragmentation spectrum.

For example, check this out, and I'd LOVE it if you guys had advice. I'm putting in a feature request and will be bugging the great people at PD.Support but I'll take any ideas I can get.

Anything from the Protein/Peptide/PSM and MS/MS spectrum can be checked and exported to .DTA, mgf, or whatever. Then I can do big DeltaM searches in Byonic or DeNovo GUI it or PEAKS it.

But I've got to go through one at a time and find the MS/MS spectrum info to export. Kinda looks like next weekend's gonna be a wash if I can't find a shortcut ('cause I have about 200 interesting things to look at now that I have NO idea what the fudge they are!)
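Here's one possible shape for that shortcut, assuming you can get the quan spectra table (scan, precursor m/z, charge, reporter S/N per channel) and the peak lists into plain Python structures. The field names, channel groups and thresholds below are hypothetical, this is not a PD feature, and the log2 fold change on group means is a crude stand-in for a proper statistical test. The idea: flag spectra on reporter ions alone, ignore IDs entirely, and batch-write the hits to MGF for Byonic, DeNovoGUI or PEAKS.

```python
import math
from statistics import mean

def log2_fc(reporters, group_a, group_b, pseudo=1.0):
    """log2(mean S/N of group A / mean S/N of group B), with a small pseudocount."""
    a = mean(reporters[ch] for ch in group_a) + pseudo
    b = mean(reporters[ch] for ch in group_b) + pseudo
    return math.log2(a / b)

def flag_spectra(spectra, group_a, group_b, min_abs_log2=2.0, min_sn=10.0):
    """Keep spectra whose reporter pattern differs strongly between groups,
    requiring the brighter group to clear a minimum mean S/N."""
    flagged = []
    for spec in spectra:
        rep = spec["reporters"]
        brighter = max(mean(rep[c] for c in group_a), mean(rep[c] for c in group_b))
        if brighter >= min_sn and abs(log2_fc(rep, group_a, group_b)) >= min_abs_log2:
            flagged.append(spec)
    return flagged

def to_mgf(spectra):
    """Write spectra as Mascot Generic Format text."""
    out = []
    for spec in spectra:
        out += ["BEGIN IONS",
                f"TITLE=scan={spec['scan']}",
                f"PEPMASS={spec['precursor_mz']:.5f}",
                f"CHARGE={spec['charge']}+"]
        out += [f"{mz:.5f} {inten:.1f}" for mz, inten in spec["peaks"]]
        out.append("END IONS")
    return "\n".join(out) + "\n"

# Hypothetical data: two groups of TMT channels, two spectra
group_1, group_2 = ["126", "127N"], ["128C", "129N"]
spectra = [
    {"scan": 101, "precursor_mz": 650.3, "charge": 2,
     "reporters": {"126": 600, "127N": 550, "128C": 20, "129N": 15},
     "peaks": [(126.1277, 600.0), (175.119, 1200.0)]},
    {"scan": 102, "precursor_mz": 712.4, "charge": 3,
     "reporters": {"126": 100, "127N": 100, "128C": 100, "129N": 100},
     "peaks": [(126.1277, 100.0)]},
]
hits = flag_spectra(spectra, group_1, group_2)
mgf_text = to_mgf(hits)
print(mgf_text)
```

With hundreds of samples per group you'd want a real test (and multiple-testing correction) instead of a fixed fold-change cutoff, but the flag-then-export structure stays the same.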

I suspect I'm looking at a PTM but I don't have anything to match any of our normal suspects. Or...I'm looking at unique class-switch sequences in the variable regions of antibodies! Either way, there are biomarkers in this dataset that traditional peptide searching cannot identify and the dataset is just too big for Byonic WildCard, but here I've vastly reduced (computationally, at least...) the complexity of this problem! Will I find my biomarkers this way? Who knows, but on some of these hard datasets we need every lead we can get, right?

Again, if you have any advice or thoughts on how I might simplify this, I'd love to hear it!!!

Tuesday, December 8, 2015

Full optimization of a QE HF for TMT quan!

Got a QE HF and wondering how you can best optimize that speedy monster for the best possible TMT 10plex quan? Well, you don't have to do the experiment yourself, 'cause my buddy Tabiwang (et al.) already did that for you.

You can check out a description of the method on Accelerating Science here.

And this will directly link you to the poster describing the optimized parameters.

Rumor is that an extensive application note may be in development.

Monday, December 7, 2015

Playing with the OpenMS Proteome Discoverer community nodes!

Hot diggity dog!

I got some samples and got to work playing with the OpenMS PD Community nodes for PD 2.0, which you can get here!  BTW, new and improved nodes are coming for PD 2.1!!!

Here is the processing setup. The LFQ nodes require Sequest and Percolator for now. I looked at my samples and picked a retention time window that made sense. The peaks were real nice, so I used a typical retention time window of 60 seconds. I would have used a smaller window with other LFQ software, but this stuff is fast enough that I didn't really care.

Note:  In Spectrum selector "MS Order" MUST say "Any" or it won't work.

These are the settings I used for Consensus. It appears that you only need the two nodes on the right, but I don't see any problems when I use the other nodes. The data may not fully integrate, but it doesn't hurt the output. There is a dramatic difference in speed on my PC when I change the number of cores that the Profiler is allowed to use. If I give it 8 cores this thing is faaaaaassssttttt!!!!

Okay. Boring part over!  How's the data look?

Well, you get these sweet new tabs! Quantified proteins, quantified peptides and EVEN BETTER?!?!? Quantified features!!! You get quantification even if you didn't identify stuff.
"Hey, what's that thing that's upregulated 27-fold in the tumor?" Well, sir/madam, that is your biomarker. Figure out what the heck it is now.

Okay. Sorry for all the scribbles. This isn't my data. This anonymous protein is present in all 12 samples analyzed. The files I put in are labeled in order of their "F" value in PD.  My first file is "Abundance number 1".

I can go into quantified peptides and/or features to see the individual quan values, or I can pop over to the PSM tab and see how the original MS1 intensities look.

Okay. But this is the real test. How do the values compare to the RAW intensity values and XIC areas?

Really, really well. Definitely try out this software.
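If you'd like to put a number on "really well", one simple check is the correlation of log-transformed LFQ abundances against XIC areas for the same features. Here's a minimal Python sketch. The paired values are hypothetical, and Pearson on log10 values is just one reasonable choice (log scale keeps the few huge features from dominating).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log_correlation(lfq_values, xic_areas):
    """Pearson r on log10 values, skipping pairs with a missing or non-positive value."""
    pairs = [(math.log10(a), math.log10(b))
             for a, b in zip(lfq_values, xic_areas)
             if a is not None and b is not None and a > 0 and b > 0]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Hypothetical paired values for the same five features
lfq = [1e5, 2e6, 3e7, 4e8, None]
xic = [2e5, 4e6, 6e7, 8e8, 5e6]
print(round(log_correlation(lfq, xic), 6))
```

A constant scale factor between the two (here exactly 2x) doesn't hurt the correlation at all, which is what you want: the LFQ numbers don't have to equal the XIC areas, they just have to track them.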

Saturday, December 5, 2015

Confirmation of NIA standards for Alzheimer's disease via protein biomarkers

So, I've read 2 biology sciency things today. In both cases, the scientific method was at work (YAY!). Researchers were looking at published results, and in the first case (the tardigrade genome I mentioned earlier in the week) striking problems were found with the data.

The second paper is more positive for the studies pre-dating it! In this paper from Huded et al., some researchers in India decided to test the National Institute on Aging's criteria for Alzheimer's diagnosis and progression. This requires removal of cerebrospinal fluid (CSF) and testing for a number of known protein biomarkers. Quantification reveals presence and severity of the disease.

Now, there hasn't just been one test on these biomarkers, there have been tons of them. So it would be super weird if they didn't check out properly (or you could blame it on the ELISA assays they were using). But, hey! Sometimes you want to verify it yourself, especially if it requires extracting fluid from someone's nervous system!! On diseases that are this nefarious, every data point is going to help. Let's get early detection and drugs on the market, STAT!

Thursday, December 3, 2015

A team at UVA decided to rewrite the textbook on antibody profiling.

This is such a great paper! AND it's Open Access. Several people who occasionally read my ramblings here and who need to see this right now are about to get this link emailed directly to them! You're welcome!

The paper is from Lichao Zhang et al., and some guy named Don Hunt was apparently involved which might explain some things about it.

When I visit people who profile antibodies, they are doing 2 things. First, they are getting intact masses on the antibody. In big facilities, maybe it's a whole group of people figuring out what intact protein masses are there. The second thing is digesting with trypsin and peptide mapping. Between the two, they pretty much figure out what they're looking at. Groups that use multiple enzymes get better coverage, but you're looking at a ton of runs.

This approach? Kind of a lower-middle down approach with just enough awesome tweaks to maybe get the whole antibody figured out in one shot!

They start with the whole antibody and then they reduce and alkylate it (more on that in a minute). Then they run it through or over an immobilized enzyme I've never ever heard of, aspergillopepsin I, which instantly cuts the antibody to pieces around 3-9 kDa long.  See? Lower-middle-down! What else would you call it?

What else would you call peptides that are 3-9 kDa long? Perfect for ETD! In this case they used an LTQ Orbitrap Velos with ETD. And these perfectly sized fragments give amazing levels of coverage. They process everything with ProSightPC BioMarker search functions.

Okay. Neat, right?  But it gets better.

The digestion occurs in a bioreactor. The antibody goes in and comes out digested...and the reaction quenched. Want bigger fragments? Increase the flow rate. Smaller? Decrease it.
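The flow-rate knob works because it sets how long the antibody is in contact with the immobilized enzyme. Here's a tiny Python sketch under the simplifying assumption that residence time is just bed volume divided by flow rate (a real packed bed has a smaller void volume), with a hypothetical 10 µL bed.

```python
def residence_time_s(bed_volume_ul, flow_ul_per_min):
    """Approximate contact time (seconds) with the enzyme bed: volume / flow."""
    return 60.0 * bed_volume_ul / flow_ul_per_min

# Hypothetical 10 uL enzyme bed at a few flow rates
for flow in (5.0, 10.0, 20.0):
    print(f"{flow} uL/min -> {residence_time_s(10.0, flow):.0f} s")
```

Double the flow, halve the contact time, fewer cuts, bigger fragments.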

One last thing. They alkylate the cysteines with a new reagent. It's called NAEM.

Not only does it alkylate in 10 minutes, but it also puts a positive charge on the cysteines which aids in fragmentation.

How's it work out? Absolutely ridiculous levels of coverage of these huge and hugely important proteins and their PTMs in record instrument time!

Wednesday, December 2, 2015

What the heck is this metformin stuff, and how is it slowing aging?

So I'm sitting here watching the Q Exactive blow away yet another triple quad (the Q E is like "what's a matrix effect? never heard of it...") in terms of small molecule sensitivity and I have some time to browse Twitter and I see this article from June!  What the Albert heck is this about?  (Shoutout to @attilacordas for the Tweet)

Human beings are being given an anti-aging drug? Sign me up! Turns out you have to be suffering from one of three conditions to be eligible: cancer, heart disease, or cognitive impairment. You might argue I qualify under the third condition, but I think they mean decline from baseline and I've always been this way. The drug they are testing is metformin.

Next question is the title of this post. What does this stuff do?

Well, Chengkai Dai says its something like this (in this here paper):

In this paper he argues that nutrient stress and/or metformin represses HSF1, which induces "proteomic chaos" that messes up tumors. Which sounds pretty awesome. But what does this have to do with aging?

Time to turn off the "since 2015" filter on Google Scholar and travel far, far back in time!

So, this group dosed worms with the drug, did 2D-DIGE and peptide mass fingerprinting, and concluded that it's all about reactive oxygen species and peroxiredoxin.

Which sounds an awful lot like what the resveratrol people have been thinking their drug does...either way this is super cool.

Tuesday, December 1, 2015

Can phospho- protein profiling be highly reproducible?

DNA damage response proceeds along a tightly controlled phosphorylation cascade. The main players are very well studied and predictable enough that immunoassays for these phosphorylation sites are used in the clinic.

Could you quantify the entire cascade with a single LC-MS run? And could you do it with a high level of reproducibility?  Sure looks like it!

In this paper from Jacob Kennedy et al., out of the Fred Hutch, they use a single-step IMAC pull-down followed by MRMs and the data looks fantastic. (Max CV on phosphos of ~16%?!?!)
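For anyone wondering what that CV number means: it's the standard deviation of replicate measurements divided by their mean. Here's a minimal Python sketch with hypothetical replicate peak areas for two made-up phosphosites (not data from the paper), reporting the worst one the way a "max CV" would.

```python
from statistics import mean, stdev

def cv_percent(replicates):
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    return 100.0 * stdev(replicates) / mean(replicates)

# Hypothetical replicate peak areas for two phosphopeptides
areas = {
    "site_A": [100.0, 110.0, 90.0],
    "site_B": [2.0e5, 2.4e5, 1.9e5, 2.1e5],
}
per_site = {site: cv_percent(vals) for site, vals in areas.items()}
worst = max(per_site.values())
print({s: round(v, 1) for s, v in per_site.items()}, round(worst, 1))
```

Because CV is relative, a 16% max CV means even the noisiest phosphosite in the panel wobbles by only about a sixth of its own signal between runs.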

Do 150 Western blots? Or monitor all of this pathway in a single run? Makes the mass spec seem like a pretty cheap option, right? Now, being the resolution snob that I am, I would like to point out that, given the relatively small number of targets here, this is something that could easily be adapted to a Q Exactive. Again, the data looks seriously fantastic here, but if I were to improve this assay in any way, I'd want to see my fragment ions in PRM at +/- 1 ppm.
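For scale, here's what a +/- 1 ppm fragment window means in absolute terms. A minimal Python sketch with hypothetical m/z values: at m/z 500, 1 ppm is only 0.0005.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=1.0):
    """True if the observed fragment falls inside +/- tol_ppm of theory."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

# Hypothetical fragment with a theoretical m/z of 500.0
for obs in (500.0004, 500.0010):
    print(obs, round(ppm_error(obs, 500.0), 2), within_tolerance(obs, 500.0))
```

A unit-resolution triple quad transition window is hundreds of times wider than that, which is the whole argument for PRM on a high-resolution instrument.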