Monday, September 26, 2016

GAPP -- Proteogenomics and PTMs for microorganisms!


...um...this statue is in DC...and I'm pretty sure you aren't allowed to climb on it. But it is my favorite image that popped up when I went for "on the shoulders of giants" cause this is what GAPP reminds me of.  Watch for the park police, kid!

So...what if you looked at the open source toolkit landscape for proteomics (and had programming capabilities!)? Would it make sense to completely build your own tools right now? I totally wouldn't!  There is so much cool stuff out there, I'd just piggyback some existing framework and add what I needed.

And this is EXACTLY what GAPP is doing.

You can read about it (in press and currently open at MCP!) here.

Check out what is inside the Zip file if you get it from SourceForge here!


I was a little alarmed by how long it took Windows Defender to scan through this file (it was clean, but always scan whatever you download!) but it made sense when I opened it.

GAPP fills the gaps all our favorite tools leave in the microbial proteogenomics workflow. The authors couldn't find anything that would make their database, search their proteomics data (extensively! look at all those engines), and process their PTM data, so they made a Java interface that does all of it by using a bunch of tools that are already out there -- tools that are awesome and can be networked together.

This is the general scheme:


To validate that the whole complicated thing works, they downloaded a very nice and extensive H.pylori dataset from PRIDE (this one from Muller et al.) and went to work.

Worth noting:
There are some pre-requisites necessary to install this. You must have Java 1.6 or later and you must have Perl already installed to use the Java interface. I can't tell for sure, but I suspect that you are going to need MSFileReader separately installed as well, as it's a pre-requisite for ProteoWizard.

I really like this study. We don't need to reinvent the wheel every time!  Sometimes we can just tweak the awesome stuff that is already out there!

Sunday, September 25, 2016

Bethesda Thermo User's Meeting October 27!!


I LOVE the User's Meetings. I generally take a vacation day and fly up to Boston to see that one, but -alas- I'd already booked alternate vacation plans before they announced the date -- next year!!

This year I got to help plan and pick speakers for the Maryland one. This is the lineup!

You can register here (space is limited!)



Why you wouldn't want to miss it if you're here!?!?

Dr. Pandey showing how he can integrate DIA with proteogenomics!!!
Dr. Cole showing the use of capillary electrophoresis in proteomics!!
FPOP!!  I still haven't seen Dr. Jones since she relocated to Baltimore!
Dave Muddiman is here!!
Dr. Jenkins is going to talk about METAL!



...in Zinc finger proteins!  There is lots of other great stuff as well, of course!!



Saturday, September 24, 2016

How does the Precursor Ion Area Detector node work?


I'm procrastinating this morning and just when I was running out of excuses for not finishing an ongoing bathroom remodel, I realized there were a bunch of unapproved questions/comments on the blog!  This is the last one. After writing far too many lines in the little comment box about why the NIST antibody is so much better than the commercial sources that have been around, I didn't want to tackle this one the same way.

How does the Precursor Ion Area Detector node work? And a reference?

The reference might surprise you!


You can direct link to it here, and I think it's open access.  Look, I'm gonna give Q-TOFs a hard time. I've only had one in all my career and it was, on its best day, a turd sandwich not very good. [Completely redacted rant about Ben's hatred for Q-TOFs and sarcastic statements about their many uses as well as recently acquired facts regarding their value in scrap metal components].  Wow, I feel much better about publishing this post now -- and it is much much shorter!

Remember, though, that there was a day when this was the cutting edge and people were just as smart back then as they are now, and they did good research despite the limits of their instrumentation!  This paper is such a study. It is definitely intended to be a paper showing off a new (at the time) fragmentation technology, but in it they set the framework that most label free quantification is based on --or at least, influenced by.

The idea -- extract, at high resolution (here 10,000), the intensity of the 3 most intense peptides for each protein, each measured at its own intensity maximum. That value is very strongly correlated with the abundance of the protein.

This is the Proteome Discoverer interpretation -- you're ticking along and identifying peptides and you assign each PSM (peptide spectral match) the intensity it had in the MS1 event that it was selected from.  When you compile the PSMs into the peptide, if there is more than one PSM the peptide is assigned the intensity of the highest PSM. When the peptides are pulled into the protein or protein group, the intensities of the (up to) 3 most intense peptides (the number is adjustable in PD 2.1) are averaged into the protein area.

If you have a protein that has only one PSM, this is easy. The "area" of that protein is the intensity of the PSM.
If you have 3 PSMs that all go to one peptide and into one protein, still easy. The "area" of the protein is the intensity of the most intense PSM.
If you have 3 PSMs for each of 3 peptides, the protein "area" will be the average of the most intense PSM from each peptide.
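The three cases above can be sketched in a few lines. This is a minimal sketch of the logic as I've described it, not the actual Proteome Discoverer code. The function name, the (peptide, intensity) input format, and the example intensities are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def protein_area(psms, top_n=3):
    """Estimate a protein "area" from the PSMs assigned to ONE protein.

    psms  -- iterable of (peptide_sequence, ms1_intensity) pairs (hypothetical format)
    top_n -- how many of the most intense peptides to average (3 in the text)
    """
    # Step 1: each peptide gets the intensity of its most intense PSM.
    best_per_peptide = defaultdict(float)
    for peptide, intensity in psms:
        best_per_peptide[peptide] = max(best_per_peptide[peptide], intensity)

    if not best_per_peptide:
        return None  # no PSMs, no area

    # Step 2: average the (up to) top_n most intense peptides.
    top = sorted(best_per_peptide.values(), reverse=True)[:top_n]
    return mean(top)


# One PSM: the area is just that PSM's intensity.
one_psm = protein_area([("PEPTIDEK", 5e6)])

# Three PSMs for one peptide: the area is the most intense PSM.
one_peptide = protein_area([("PEPTIDEK", 2e6), ("PEPTIDEK", 7e6), ("PEPTIDEK", 4e6)])

# Three peptides: the area is the average of each peptide's best PSM.
three_peptides = protein_area([
    ("AAAK", 9e6), ("AAAK", 1e6),
    ("CCCK", 6e6), ("CCCK", 5e6),
    ("DDDR", 3e6),
])
```

Note that nothing in here forces the same peptides to be used from run to run, which is exactly the caveat in the next paragraph.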

Important note here!  The protein "areas" will not always be calculated from the same peptides. If you've got something where you had 50-60% sequence coverage and have 200 PSMs, chances are it won't be the same peptides at all. But, seriously, this totally works at the protein level. You are going to need to go to the PSM or peptide level intensities if you want to say, for example, how this modified peptide changes from run to run, and that requires a good bit of extra work.

Michael Bereman knows a little something about protein quantification (SProCop! and QCMyLCMS.com), and he told me it worked, if I remember correctly, "surprisingly well". I use it in virtually every sample I process in PD. It has never once hurt me to have that extra information!

Are there better ways of getting relative quantification of proteins and peptides? Sure!  And these algorithms are coming -- and are going to absolutely change EVERYTHING about how we do proteomics -- Minora, PeakJuggler, and IonStar are all getting ready for prime time and are going to usher in something I think will finally be worthy of the title "next gen" proteomics by allowing us to finally see all the stuff in Orbitrap data that we've never seen before. Your Orbitrap, right now, is far better than you think it is.

Friday, September 23, 2016

iMixPro -- less false discoveries in pulldowns with heavy peptides!


Affinity purifications (or much cooler...affinity enrichments!) are in ever-increasing demand. What protein interacts with my other proteins and how may be one of the most important things we'll be contributing to biology in the future -- once it's not completely fracking impossible to do it. The new crosslinking methodologies that are coming are going to help, but iMixPro is another elegant approach.

It is described in this awesome new JPR paper from Sven Eyckerman et al.!  I'll start off by saying it isn't the simplest method you've ever seen, but if you've spent much time doing protein-protein interaction assays you've either developed your own complex methodology that works and you're keeping it secret from the world -- or you'd try just about anything to figure out what is real and what is not! -- especially when today's super sensitive instrumentation is telling you that you pulled down 1,000 proteins with that expensive "specific" antibody you just got in!

It differs from affinity enrichments in that it intelligently employs heavy labels (this is where the "i" comes from in the name). Having essentially SILAC pairs to look at in their data improves even the label free quan approach you have in affinity enrichments. Combining labeled peptides = less batch effects, which is never a bad thing.  They show some great examples where they can remove the noise and find their true interactors, even when the intensity of the true signal is only a fraction of the value of the other signals identified!


Thursday, September 22, 2016

New cool stuff in Q Exactive Tune 2.7



I popped by to visit Dr. Kowalak the other day to see what cutting edge science the NIMH Proteomics Center is doing these days and he showed me that his QE HF Tune doesn't look like my QE HF tune....

So I upgraded my QE HF Tune to 2.7SP1 to check it out!  There is a bunch of cool stuff in here!  One highlight: The confusing %underfill ratio is now gone and replaced by a much more sensible measurement. You now have a "minimum AGC Target" as well as your normal AGC target.

According to the manual, "IF the mass peak of interest reaches this minimum AGC target within the maximum injection time, a data dependent scan will be initiated"

I like this much better!
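Here's how I'd sketch that manual quote as logic. Big assumptions: that the instrument has some estimate of the ion flux for the peak of interest (ions per millisecond here), and that the decision is a simple threshold. The function and parameter names are hypothetical and are not anything taken from Tune.

```python
def triggers_dd_scan(ion_flux_per_ms, min_agc_target, max_injection_time_ms):
    """Return True if the peak could reach the minimum AGC target in time.

    ion_flux_per_ms       -- assumed estimate of ions/ms for the peak of interest
    min_agc_target        -- the new "minimum AGC target" setting
    max_injection_time_ms -- the maximum injection time setting
    """
    ions_reachable = ion_flux_per_ms * max_injection_time_ms
    return ions_reachable >= min_agc_target


def fill_time_ms(ion_flux_per_ms, agc_target, max_injection_time_ms):
    """Injection time actually used: stop at the AGC target or at max IT."""
    if ion_flux_per_ms <= 0:
        return max_injection_time_ms
    return min(agc_target / ion_flux_per_ms, max_injection_time_ms)


# A peak delivering 100 ions/ms reaches a minimum target of 8e3 within 100 ms...
strong_peak = triggers_dd_scan(100.0, 8e3, 100)
# ...but a peak delivering 10 ions/ms does not, so no scan is initiated.
weak_peak = triggers_dd_scan(10.0, 8e3, 100)
```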

If I've got it right, this is the current instrument logic --


--and a pretty good drawing of me and my dog, Gustopheles (you're welcome!)

What else is included in the Tune 2.7SP1?

Loads of upgrades for the QE Focus (extended mass range, more MSX counts & some combination scans, like MS1 and PRM in the same experiment!)! And a software modification to make instrument bakeouts on all Exactives and Q Exactives more efficient!

Please remember that this is my interpretation and may not be 100% factually accurate or well drawn. Sometimes I put things like this up and the vendor involved will contact me to tell me I'm wrong and I'll walk away learning something. If that happens, I'll be sure to edit this later!

The best part of this is that I don't ever have to explain %underfill ratio again!


Wednesday, September 21, 2016

Threonine and Isothreonine have different HCD fragmentation patterns!

Yeah...I totally stole this from another blog in-between meetings, but it's seriously cool. The original blogger put it up on Accelerating Proteomics here.

The original article is from KG Kuznetsova et al., and can be found here.  (Side note: Man, there has been some cool original research coming out of Moscow lately! Keep it coming!)

Wait. What is Isothreonine again? Well, it's also called homoserine and we sometimes see it in proteomics data, but it generally isn't a good thing.

Check out this quick image I borrowed from Alexey Chernobrovkin et al., from this paper a couple years ago:


In this illustration, the protein is yanked out and digested and...crud...overheating the protein with iodoacetamide converts some of the methionines to isothreonines. Gross. Then, cause you don't have IsoThreonine in your FASTA, you end up finding a peptide with regular old Threonine in it. 

Boom. False discovery. Where is that a big deal?

1) De novo sequencing (nuts) -- you totally got a peptide wrong
2) Proteogenomics -- cause your huge database has lots and lots of possibilities in it. And...well...the chances that you'll have a peptide sequence in your database with a xxxTxxxK (matching a peptide that really started out as xxxMxxxK...but isn't there anymore) are higher than when you are using a smaller, manually curated FASTA. Your odds of making that mismatch go up just algebraically.
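Why can't the search engine tell the difference? Because isothreonine (homoserine) and threonine have the same elemental composition, so the residue masses are identical in MS1. Here is a quick sketch of that arithmetic using standard monoisotopic atomic masses. The dictionary and function names are just for illustration.

```python
# Monoisotopic atomic masses (Da).
ATOM = {"C": 12.0, "H": 1.0078250, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}

# Residue compositions (the amino acid minus one water).
RESIDUE = {
    "Met": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "Thr": {"C": 4, "H": 7, "N": 1, "O": 2},
    "Hse": {"C": 4, "H": 7, "N": 1, "O": 2},  # homoserine, an isomer of Thr
}


def residue_mass(name):
    """Sum the atomic masses for one residue composition."""
    return sum(ATOM[element] * count for element, count in RESIDUE[name].items())


met = residue_mass("Met")
thr = residue_mass("Thr")
hse = residue_mass("Hse")

# Met -> Hse shifts the peptide down by about 29.99 Da...
print(f"Met -> Hse shift: {hse - met:.4f} Da")
# ...and the product is indistinguishable from Thr by mass alone.
print(f"Hse - Thr:        {hse - thr:.4f} Da")
```

So a peptide that started life as xxxMxxxK lands exactly on the mass of xxxTxxxK, and only retention time or the fragmentation pattern can give it away.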

All is not lost, researchers who are banking hard on proteogenomics/metagenomics being the future!

Cause the original paper I found at the top did a focused study with synthetic peptides and found that 1) the Isothreonine peptides elute differently AND 2) there is a change in the HCD fragmentation patterns (actually the second paper I mention reports that as well). They suggest that it would be reasonably easy to integrate this shift in fragmentation patterns into most proteogenomic pipelines!


Monday, September 19, 2016

GlycoPep MassList -- automatically build targeted lists for glycopeptides!


Shoutout to @ScientistSaba for helping me keep up with all the awesome stuff happening #HUPO2016 and still having time to tip me off to some cool papers like this one!!

The paper introduces this little program GlycoPep MassList.

The concept is simple AND powerful!  Feed it your protein and it will generate you an inclusion list for the glycopeptides that could occur given your parameters.

They demonstrate that it works on their Orbitrap Velos Pro. You can use it for a purely targeted experiment or within a "gas phase enrichment" strategy (like the "include others" button on a Q Exactive)!
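To give a feel for what "generate an inclusion list" means, here is a rough sketch. This is NOT how GlycoPep MassList is implemented (I haven't seen the code). It only covers N-linked sites, using the standard N-X-S/T sequon (X is anything but proline), and it assumes you already know the mass of the peptide carrying the site. Every function name and the example sequence are invented.

```python
import re

PROTON = 1.00727647
# Monoisotopic residue masses of two common monosaccharides (Da).
SUGAR = {"Hex": 162.052824, "HexNAc": 203.079373}


def find_sequons(protein_sequence):
    """Return 0-based positions of Asn in N-X-S/T sequons (X != Pro)."""
    return [m.start() for m in re.finditer(r"N(?=[^P][ST])", protein_sequence)]


def glycan_mass(composition):
    """Mass added by a glycan given as {"Hex": n, "HexNAc": m}."""
    return sum(SUGAR[sugar] * count for sugar, count in composition.items())


def glycopeptide_mz(peptide_mass, composition, charge):
    """m/z of a peptide (neutral monoisotopic mass) carrying one glycan."""
    neutral = peptide_mass + glycan_mass(composition)
    return (neutral + charge * PROTON) / charge


def inclusion_list(peptide_mass, glycans, charges=(2, 3, 4)):
    """All (glycan name, charge, m/z) combinations for one glycosylated peptide."""
    return [
        (name, z, round(glycopeptide_mz(peptide_mass, comp, z), 4))
        for name, comp in glycans.items()
        for z in charges
    ]


sites = find_sequons("MNGTANPSNAS")  # toy sequence, not a real protein
targets = inclusion_list(
    1000.0,  # hypothetical peptide mass
    {"Man5": {"Hex": 5, "HexNAc": 2}, "Man6": {"Hex": 6, "HexNAc": 2}},
)
for name, z, mz in targets:
    print(name, z, mz)
```

The real tool clearly does more (you hand it a protein and parameters, not a peptide mass), but the output is the same kind of thing: a table of m/z values to paste into your method.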





Saturday, September 17, 2016

MCP wants your opinions on targeted proteomic publishing guidelines!


MCP has always had (famously strict!) guidelines for what and how they will accept global proteomics data. They are now working on a draft for targeted proteomics data and have opened that draft up to the community for contributions.

Want to shape how we publish? Check it out here!!


Thursday, September 15, 2016

Rosetta's mass specs have confirmed complex organic molecules on Comet67P


You've probably already heard this, but if not -- it's totally cool. According to this paper in Nature this week, Comet67P is just flying along leaving its long comet trail -- and that trail has a bunch of complex organic molecules in it!

My first question -- how do you confirm that? I'm first thinking the orbiter is probably doing this by spectroscopy and I'm gonna find those readings -- dubious -- but it turns out there are 2 mass specs on the Orbiter!

COSIMA is a Time Of Flight instrument designed by researchers at Max Planck that is capable of 1500 resolution at m/z 100 and has an effective mass range up to 1,000 m/z. COSIMA's job is to collect dust particles and to analyze those particles by Secondary Ion Mass Spectrometry (SIMS). The surface of the particles is hit with an ion beam that ionizes stuff off the particles and into the analyzer.

ROSINA is another mass spectrometer that is detecting and ionizing gases. I'm having trouble finding much in the way of details on it, but I've run across several descriptions of it as a double focusing mass spectrometer, something I'm not familiar with (and I've got to be at work super early today). The design should be investigated, though. It is capable of 3,000 resolution at 1% peak height. Which...honestly is a lot of resolution.

If I think of resolution at 1/2 height like this image I stole from the Fiehn lab...


We're calculating resolution at FWHM or 50% peak height. Unless I've got my numerator and denominator mixed up, if you're pulling 3,000 resolution at 1% peak height, that...ain't bad at all!

And I think I do have it right, because this double focusing mass spec is supposed to be able to tell CO from N2 -- (27.9949 from 28.0061!  that's 11 millimass units!!)
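A quick back-of-the-envelope check on that, using resolving power as m divided by delta m. This is a sketch with neutral monoisotopic masses (I'm ignoring the electron and the exact peak-height definition, both of which are assumptions on my part), and the variable names are mine.

```python
# Monoisotopic atomic masses (Da).
C12 = 12.0
O16 = 15.9949146
N14 = 14.0030740

co = C12 + O16   # carbon monoxide
n2 = 2 * N14     # molecular nitrogen

delta_m = n2 - co
mean_m = (n2 + co) / 2

# Resolving power needed for the two peaks to sit one peak-width apart.
required_resolution = mean_m / delta_m

print(f"CO = {co:.4f}, N2 = {n2:.4f}")
print(f"delta m = {delta_m * 1000:.1f} millimass units")
print(f"m / delta m = {required_resolution:.0f}")
```

That comes out around 2,500, so an instrument quoting 3,000 should be able to pull those two apart. And since that 3,000 is specified way down at 1% of peak height, where peaks are much wider than at half height, the equivalent FWHM number would be a good bit higher.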

Okay. So...if there is evidence from these two impressive mass specs that survived a 4 BILLION mile trip there, I'm going to believe it!

But that isn't all! This isn't the first time we've shot instruments through the trails of comets. This is just the first time we've gotten readings this good. If you take the mass spectra from the other instruments in the past (as they show in the paper) you'll see that we have seen this in the trails of other comets.

So....there are huge balls of rock that fly through the universe shooting organic molecules the whole way....does anyone else feel like this alters an adjustment factor in the Drake equation at all?!?

Wednesday, September 14, 2016

GEMPro -- Genome Scale Models with Protein Structures!


This paper from Elizabeth Brunk and Nathan Mih et al., is not the first paper to jump on if you're already feeling dumb.

It is elegant and brilliant and imposing. The concept is an extension on a genome regulation tool called genome scale models. This is a nice open access review on the topic that is directed to people who aren't planning to code their own tools.

There appear to be multiple iterations of GEM, but the one that seems the most straightforward to me is the integration of the genomic changes with the metabolic ones. Obviously there are several levels of regulation from the genetic level (from the transcriptional regulation through the post translational) that all have effects on metabolite production, but GEM steps around that. The concept takes our existing knowledge and relationships and feeds it into a framework -- we know that it isn't a direct link from RNA X to metabolite Y, but all the same when we see an upregulation in X we see a down-regulation in Y.

I probably slaughtered the concept, but that's what I'm getting out of it.

GEMPro builds on this. Cause what would make this more complicated? What if you also threw in protein 3D structures into the mix!?!?  The whole idea definitely makes my head hurt, but...

The GEM framework is in place and yielding dividends
We have structural information on 110k proteins (seriously...!?!...that's what the paper says!) and more all the time.
For those protein 3D structures we have useful information -- like what is the structure of this protein at this temperature...or in this disease state.
More metabolomics data is showing up all the time that could correspond to changes in either.

This is obviously a big data problem with this number of variables....and a big focus of the paper is that if they build a framework that can do this --it MUST be able to grow with the existing knowledge bases, cause our knowledge of everything biological is increasing significantly faster than linear rates.

How do you test something like this? They go for 2 bacteria. E.coli and T.maritima (which I don't think I've ever heard of...Wikipedia says it's a cool extremophile from Italian volcanoes (estremofilo!)).
Cool point in the paper -- if they try to do this analysis with all the data that was available in previous years you get a really cool picture of how our knowledge is expanding.

The myoglobin crystal structure was published in 1958. From that time until 2013 all the groups doing protein 3D structure work got to where about 34% of the E.coli proteins are characterized in high quality maps that can be used for this type of analysis. If they step forward in time to when they finalized this paper? They're at 44%. Wow! (Google doesn't know the word "Wow" in Italian. Well...it knows one...but apparently it's not appropriate in all dialects and I'm watching it today.)

And they dump all this data in from these 2 organisms and look around. This is my favorite analysis:
E.coli isn't very tolerant to heat compared to our estremofilo friend. They go into the literature and find the proteins in E.coli that are known to be adversely affected by growing in culture that is too hot. Here they can draw on their GEM models -- what genes are known to be similar as well as what gene products are linked to metabolic functions that are the same (if the genes don't look the same, you can pull the listings that are tightly linked to the creation of this metabolite as the same thing).

This gives them a little over 200 entries that have the same (or very similar) metabolic functions in the 2 organisms....and only 10% of them have similar 3D structures.

So...the genetic pressure is there to conserve this basic DNA sequence for making, for example, this amino acid. Or if the two organisms have evolved very different ways of making that amino acid -- we can link some of these proteins together by the fact that they make that amino acid. But at the 3D protein level they are very very different.

So...E.coli has 200 proteins that presumably just up and fall apart. No amino acid in my example = dead, but our Italian friend just keeps chugging along and enjoying its relaxing volcanic sauna.

I totally dig this paper. I'm not sure what I'm going to do with this information, but I really like it!

Erratum to yesterday's talk.


Thank you to everyone who popped in to hear my 4+ hours of sleepy incoherent rambling about how Orbitraps work.

Important erratum: During a discussion on the potential promised by the experimental NeuCode reagents I mistakenly copied an image that was NOT NeuCode. I then, very sleepily, tried to figure out why I hadn't cited the paper -- cause the slide was from something else entirely.

This slide has been corrected and clarified and this section will need to be deleted entirely from the video recording. No slides have been distributed from the talk, so I don't have to worry about a big and embarrassing mistake being shown to anyone. So...I've got that going for me!

This talk was not officially sponsored or blessed by any representative of any corporation of any kind and was not a responsibility of my day job. I have evidence, cause I had to spend Sunday and late Monday and Tuesday nights doing my day job so I could put on the talk. Hence why I was sleepy enough to put an image that made no sense to the talk at all into the slide. The proof is on my FitBit sleep tracker....

To anyone this slide annoyed -- I'd like to apologize. I need a proofreader!

Tuesday, September 13, 2016

UVPD without lasers!!


UV photodissociation and you don't need a laser!!  Paper here!

Check this out!


That is seriously it. Somebody here in Maryland has got an ion trap sitting around somewhere. Time to set up a weekend. I can get some LEDs on Amazon. Let's do this!

Is it the coolest possible use of LEDs?!?!


...I guess it matters who you ask....

My vote is PUT THEM IN AN ION TRAP!!!!

Monday, September 12, 2016

SCX separation inside your nanospray emitter!?!?!!?


I read this abstract yesterday and thought to myself "....great, someone discovered MudPIT..." and promptly forgot about it and got back to work.

This morning I reread it. Unfortunately, I don't have time to read the paper before I go out the door on this ridiculously early morning, but....they appear to be doing SCX in their nanospray emitter....

Considering the way that I learned to do SCX involved a buffer with 8M salts of some kind, they are either doing something very different -- or they are replacing their mass spec every couple of days.

If you'd like to delve into this mystery you can find it here!

Sunday, September 11, 2016

Known unknowns of cardiolipin signaling: The best is yet to come


A few years ago I had the pleasure of spending a few days with a bunch of lipidomics experts in Pittsburgh and got to learn: 1) How ridiculously insanely hard lipidomics can be if you aren't going after the "easy" compounds, 2) How important they are, and 3) How very very little we know about them.

This group just wrapped up a really nice review on one of their tougher problems -- the analysis of cardiolipins. How much fun are cardiolipins to work with? Start with the fact that structurally similar ones tend to cluster in similar mass ranges, but have different functions and you have a good idea.

One example they mention in the paper: 12 of their compounds of interest are within 0.1 Da in MS1 mass, and even when they fragment them to figure out which one is which, MS2 isn't capable of elucidating the location of a functional site -- which is critical to know cause there are a slew of isomers within these "12" compounds. They have to employ a 2D LC method and utilize MS3 methods on an Orbitrap Fusion to figure out what they are looking at.

They also show genetics techniques they can use, as well as imaging techniques to localize these things. The problem sounds...daunting...but groups all over the world are chipping away at it. And you can't beat the optimism in the title!

Oh yeah! Paper link here!

Friday, September 9, 2016

Basic Orbitrap physics seminar



Hey! Wanna log on for free and listen to me go on about where and how ions move around in Orbitrap devices?  The goal is to have a better understanding of where the ions are going and when to help understand your instruments better!

Note: -- Part 6 should be more like: From the LTQ-Orbitrap XL through the Orbitrap Fusion Lumos!

You can register here (limited to 1,000 total viewers....so be quick about it)

The videos should be available afterward and I'll post them here!