News in Proteomics Research: September 2016

Thursday, September 29, 2016

Cool new TGen mass spec video!

This video is pretty fun!

Wednesday, September 28, 2016

UniprotKB improved its interface for helping us find references!

This might sound minor, but it's going to make some of my days easier!

If you land on the UniprotKB entry for a cool protein that is differentially regulated in your organism, you'll see they've added a handy little button (highlighted above) that just says "Publications".

And it actually pops up publications. AND it doesn't just pull up articles where your protein name appears in the title!!! It pulls up big studies as well where your protein was listed!

Sometimes it's the little things! (Shoutout to @PastelBio for pointing this out)

Tuesday, September 27, 2016

Cysteine modifications in aging and neurodegeneration?!!?!?

Who was I talking to the other day at length about cysteine modifications? Somebody locally here in Maryland. I have the vague impression I felt outclassed intellectually, but that doesn't really narrow it down too much...

Eons ago, I was involved in a study where normal shotgun proteomics techniques totally messed up our work. We were studying a compound that caused big protein mass shifts if you incubated it with a single protein or a mix of proteins, but we couldn't find any peptide mass shifts. Turned out that the compound loosely bound to cysteines, and when we reduced and alkylated, the iodoacetamide displaced the compound. (We weren't reducing and alkylating the intact proteins; just shooting them intact.) I've always wondered since then if we are losing other information about cysteines by using these techniques.

(Side note: at the big Pharma companies I've visited that are studying antibodies, it is very common practice to digest the antibody both with and without reduction and alkylation, and most of their software can take both runs and draw conclusions about what the cysteines are doing.)

Want to worry more about cysteine states? Check out the study in the screenshot above (link here)! Maybe this is all stuff you already know that I don't, but this review matter-of-factly states all sorts of stuff about cysteine PTMs that I knew nothing about until this morning.

It is well established that cysteine oxidation is linked to aging (what?!?) cause the redox stuff cysteine does, which is critical to many protein functions, is inhibited by the buildup of modifications on it (??). But not very much is known about which specific PTMs, or patterns of PTMs, are the most critical.

I DON'T EVEN LOOK FOR CYSTEINE PTMS! And they talk about 8!!! reversible cysteine PTMs that play really key roles in cellular regulation and stuff that I probably can't see cause I've blasted all my cysteines with a crazy strong reducing agent and then bound something to them.

However, all is not lost --- this paper discusses the established techniques for going after these PTMs. By using alternative reduction techniques or even modifying the PTMs themselves, you can study the changes in these -- and even, in some cases, directly enrich for peptides with a particular cysteine modification state.

Each one appears highly involved, but if you are sitting there looking at a phenotype that you can't explain from a global proteome level, maybe this is something to go after? I doubt you could find a more definitive review on this topic!

Monday, September 26, 2016

GAPP -- Proteogenomics and PTMs for microorganisms!

...um...this statue is in DC...and I'm pretty sure you aren't allowed to climb on it. But it is my favorite image that popped up when I went for "on the shoulders of giants" cause this is what GAPP reminds me of. Watch for the park police, kid!

So...what if you looked at the open source toolkit landscape for proteomics (and had programming capabilities!)? Would it make sense to completely build your own tools right now? I totally wouldn't! There is so much cool stuff out there, I'd just piggyback some existing framework and add what I needed.

And this is EXACTLY what GAPP is doing.

You can read about it (in press and currently open at MCP!) here.

Check out what is inside the Zip file if you get it from SourceForge here!

I was a little alarmed by how long it took Windows Defender to scan through this file (it was clean, but always scan whatever you download!) but it made sense when I opened it.

GAPP fills the gaps all our favorite tools leave in the microbial proteogenomics workflow. They couldn't find anything that would build their database, search their proteomics data (extensively! look at all those engines) and process their PTM data, so they made a Java interface that does all of this using a bunch of tools that are already out there, are awesome, and can be networked together.

This is the general scheme:

To validate that the whole complicated thing works, they downloaded a very nice and extensive H. pylori dataset from PRIDE (this one from Muller et al.) and went to work.

Worth noting:
There are some prerequisites necessary to install this. You must have Java 1.6 or later, and you must already have Perl installed to use the Java interface. I can't tell for sure, but I suspect you are going to need MSFileReader installed separately as well, as it's a prerequisite for ProteoWizard.

I really like this study. We don't need to reinvent the wheel every time! Sometimes we can just tweak the awesome stuff that is already out there!

Saturday, September 24, 2016

How does the Precursor Ion Area Detector node work?

I'm procrastinating this morning and just when I was running out of excuses for not finishing an ongoing bathroom remodel, I realized there were a bunch of unapproved questions/comments on the blog! This is the last one. After writing far too many lines in the little comment box about why the NIST antibody is so much better than the commercial sources that have been around, I didn't want to tackle this one the same way.

How does the Precursor Ion Area Detector node work? And a reference?

The reference might surprise you!

You can link directly to it here, and I think it's open access. Look, I'm gonna give Q-TOFs a hard time. I've only had one in all my career and it was, on its best day, ~~a turd sandwich~~ not very good. [~~Completely redacted rant about Ben's hatred for Q-TOFs and sarcastic statements about their many uses as well as recently acquired facts regarding their value in scrap metal components~~]. Wow, I feel much better about publishing this post now -- and it is much much shorter!

Remember, though, that there was a day when this was the cutting edge and people were just as smart back then as they are now, and they did good research despite the limits of their instrumentation! This paper is such a study. It is definitely intended to be a paper showing off a new (at the time) fragmentation technology, but in it they set the framework that most label free quantification is based on --or at least, influenced by.

The idea: the intensities of the 3 most intense peptides from each protein, extracted at high resolution (here 10,000), are very strongly correlated with the abundance of that protein.

This is the Proteome Discoverer interpretation: you're ticking along identifying peptides, and you assign each PSM (peptide spectral match) the intensity it had in the MS1 event it was selected from. When you compile the PSMs into the peptide, if there is more than one PSM, the peptide is assigned the intensity of the most intense PSM. When the peptides are pulled into the protein or protein group, the intensities of the (up to) 3 most intense peptides (the number is adjustable in PD 2.1) are averaged into the protein area.

If you have a protein that has only one PSM, this is easy. The "area" of that protein is the intensity of the PSM.
If you have 3 PSMs that all go to one peptide and into one protein, still easy. The "area" of the protein is the intensity of the most intense PSM.
If you have 3 PSMs for each of 3 peptides, the protein "area" will be the average of the most intense PSM from each peptide.
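Here is a minimal Python sketch of that roll-up as I read it: each peptide takes its most intense PSM, then the (up to) 3 most intense peptides are averaged into the protein. This is my illustration, not Thermo's code; the function name `protein_areas`, the tuple layout and the choice to key peptides per protein (ignoring shared peptides and protein grouping) are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def protein_areas(psms, top_n=3):
    """psms: iterable of (protein, peptide, ms1_intensity) tuples."""
    # PSM -> peptide: each peptide keeps the intensity of its most intense PSM
    peptide_area = {}
    for protein, peptide, intensity in psms:
        key = (protein, peptide)
        peptide_area[key] = max(intensity, peptide_area.get(key, intensity))

    # Peptide -> protein: collect the peptide areas belonging to each protein
    by_protein = defaultdict(list)
    for (protein, _peptide), area in peptide_area.items():
        by_protein[protein].append(area)

    # Protein area = mean of the (up to) top_n most intense peptides
    return {
        protein: mean(sorted(areas, reverse=True)[:top_n])
        for protein, areas in by_protein.items()
    }


# Made-up PSMs covering the three cases above
psms = [
    ("P1", "AAAK", 5.0e6),                                   # one PSM
    ("P2", "CCCK", 1.0e6), ("P2", "CCCK", 3.0e6),            # two PSMs, one peptide
    ("P3", "DDDK", 4.0e6), ("P3", "EEEK", 6.0e6),
    ("P3", "FFFK", 8.0e6), ("P3", "GGGK", 1.0e6),            # four peptides, top 3 used
]
print(protein_areas(psms))
```

Note how P3's weakest peptide simply drops out, which is also why two runs can end up computing a protein "area" from different peptides.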

Important note here! The protein "areas" will not always be calculated from the same peptides. If you've got something where you had 50-60% sequence coverage and 200 PSMs, chances are it won't be the same peptides at all. But, seriously, this totally works at the protein level. You are going to need to go to the PSM or peptide level intensities if you want to say, for example, how a modified peptide changes from run to run, and that requires a good bit of extra work.

Michael Bereman, who knows a little something about protein quantification (SProCop! and QCMyLCMS.com), told me it worked, if I remember correctly, "surprisingly well". I use it in virtually every sample I process in PD. It has never once hurt me to have that extra information!

Are there better ways of getting relative quantification of proteins and peptides? Sure! And these algorithms are coming -- and are going to absolutely change EVERYTHING about how we do proteomics -- Minora, PeakJuggler, and IonStar are all getting ready for prime time and are going to usher in something I think will finally be worthy of the title "next gen" proteomics by allowing us to finally see all the stuff in Orbitrap data that we've never seen before. Your Orbitrap, right now, is far better than you think it is.

Friday, September 23, 2016

iMixPro -- fewer false discoveries in pulldowns with heavy peptides!

Affinity purifications (or, much cooler...affinity enrichments!) are in ever-increasing demand. What protein interacts with my other proteins, and how, may be one of the most important things we'll be contributing to biology in the future -- once it's not completely fracking impossible to do it. The new crosslinking methodologies that are coming are going to help, but iMixPro is another elegant approach.

It is described in this awesome new JPR paper from Sven Eyckerman et al.! I'll start off by saying it isn't the simplest method you've ever seen, but if you've spent much time doing protein-protein interaction assays, you've either developed your own complex methodology that works and you're keeping it secret from the world -- or you'd try just about anything to figure out what is real and what is not! -- especially when today's super sensitive instrumentation is telling you that you pulled down 1,000 proteins with that expensive "specific" antibody you just got in!

It differs from standard affinity enrichments in that it intelligently employs heavy labels (this is where the "i" comes from in the name). Having essentially SILAC pairs to look at in the data improves on even the label free quan approach you'd use in affinity enrichments. Combining labeled peptides = fewer batch effects, which is never a bad thing. They show some great examples where they can remove the noise and find their true interactors, even when the intensity of the true signal is only a fraction of the value of the other signals identified!

Thursday, September 22, 2016

New cool stuff in Q Exactive Tune 2.7

I popped by to visit Dr. Kowalak the other day to see what cutting edge science the NIMH Proteomics Center is doing these days and he showed me that his QE HF Tune doesn't look like my QE HF tune....

So I upgraded my QE HF Tune to 2.7SP1 to check it out! There is a bunch of cool stuff in here! One highlight: The confusing %underfill ratio is now gone and replaced by a much more sensible measurement. You now have a "minimum AGC Target" as well as your normal AGC target.

According to the manual, "IF the mass peak of interest reaches this minimum AGC target within the maximum injection time, a data dependent scan will be initiated"

I like this much better!

If I've got it right, this is the current instrument logic --

--and a pretty good drawing of me and my dog, Gustopheles (you're welcome!)
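A rough Python sketch of that logic as I understand it: the instrument keeps accumulating until it hits the AGC target or runs out of maximum injection time, and the new minimum AGC target just asks whether a precursor could plausibly reach that floor in time. The per-millisecond ion flux estimate, the function names and the example settings are all my assumptions, not values from the manual or the firmware.

```python
def fill_time_ms(agc_target, ion_flux_per_ms, max_it_ms):
    """Accumulate ions until the AGC target is reached or max IT runs out."""
    if ion_flux_per_ms <= 0:
        return max_it_ms
    return min(agc_target / ion_flux_per_ms, max_it_ms)


def triggers_dd_scan(ion_flux_per_ms, min_agc_target, max_it_ms):
    """My reading of the Tune 2.7 rule: only trigger an MS/MS scan if the
    precursor could reach the minimum AGC target within the max injection time.
    ion_flux_per_ms is a hypothetical estimate taken from the survey scan."""
    expected_ions = ion_flux_per_ms * max_it_ms
    return expected_ions >= min_agc_target


# Hypothetical settings, not instrument defaults
AGC_TARGET = 1e5
MIN_AGC_TARGET = 8e3
MAX_IT_MS = 50

for flux in (5000, 1000, 100):
    if triggers_dd_scan(flux, MIN_AGC_TARGET, MAX_IT_MS):
        print(flux, "ions/ms -> scan, fill", fill_time_ms(AGC_TARGET, flux, MAX_IT_MS), "ms")
    else:
        print(flux, "ions/ms -> skipped")
```

With these made-up numbers, a precursor at 1,000 ions/ms collects about 50,000 ions in 50 ms and clears the 8,000 floor, while one at 100 ions/ms only reaches 5,000 and gets skipped.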

What else is included in the Tune 2.7SP1?

Loads of upgrades for the QE Focus (extended mass range, more MSX counts, and some combination scans, like MS1 and PRM in the same experiment!)! And a software modification to make instrument bakeouts on all Exactives and Q Exactives more efficient!

Please remember that this is my interpretation and may not be 100% factually accurate or well drawn. Sometimes I put things like this up and the vendor involved will contact me to tell me I'm wrong and I'll walk away learning something. If that happens, I'll be sure to edit this later!

The best part of this is that I don't ever have to explain %underfill ratio again!

Wednesday, September 21, 2016

Threonine and Isothreonine have different HCD fragmentation patterns!

Yeah...I totally stole this from another blog in between meetings, but it's seriously cool. The original blogger put it up on Accelerating Proteomics here.

The original article is from KG Kuznetsova et al. and can be found here. (Side note: Man, there has been some cool original research coming out of Moscow lately! Keep it coming!)

Wait. What is Isothreonine again? Well, it's also called homoserine, and we sometimes see it in proteomics data, but it generally isn't a good thing.

Check out this quick image I borrowed from Alexey Chernobrovkin et al. from this paper a couple of years ago:

In this illustration, the protein is yanked out and digested and...crud...overheating the protein with iodoacetamide converts some of the methionines to isothreonines. Gross. Then, cause you don't have isothreonine in your FASTA, you end up finding a peptide with regular old threonine in it.
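Why does this turn into a false discovery rather than an unexplained mass shift? Homoserine and threonine have the same elemental composition, so they are exactly isobaric. A quick Python check using standard monoisotopic atomic masses (the residue formulas are standard; the helper name `residue_mass` is mine):

```python
# Monoisotopic atomic masses (Da)
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}


def residue_mass(formula):
    """Monoisotopic mass of a residue given its elemental formula as a dict."""
    return sum(MONO[element] * count for element, count in formula.items())


# In-chain residue formulas (free amino acid minus H2O)
MET = {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1}
THR = {"C": 4, "H": 7, "N": 1, "O": 2}
HSE = {"C": 4, "H": 7, "N": 1, "O": 2}  # homoserine: same formula, different structure

shift = residue_mass(HSE) - residue_mass(MET)
print(f"Met {residue_mass(MET):.5f}  Thr {residue_mass(THR):.5f}  Hse {residue_mass(HSE):.5f}")
print(f"Met -> Hse shift: {shift:.5f} Da")
```

So a Met-to-homoserine conversion moves the residue by about -29.993 Da and lands exactly on threonine, which is why the precursor mass alone can't flag it.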

Boom. False discovery. Where is that a big deal?

1) De novo sequencing (nuts) -- you totally got a peptide wrong
2) Proteogenomics -- cause your huge database has lots and lots of possibilities in it. And...well...the chances that your database contains an xxxTxxxK sequence (matching a peptide that really started out as xxxMxxxK...but isn't that anymore) are higher than when you are using a smaller, manually curated FASTA. Your odds of making that mismatch go up just from the numbers.
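To make the proteogenomics point concrete, here is a hypothetical Python helper that digests a database in silico and lists Met-containing peptides whose Met-to-Thr variant also exists in that same database. The trypsin rule is simplified (cleave after K/R, not before P; no missed cleavages) and the function names are mine. Real search engines score fragment spectra, so this only shows why a bigger database offers more look-alike candidates.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """Simplified trypsin: cleave after K or R unless followed by P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)  # zero-width split needs Python 3.7+
    return {p for p in pieces if len(p) >= min_len}


def met_to_thr_lookalikes(proteins, min_len=6):
    """Return (met_peptide, thr_peptide) pairs where swapping one M for T
    gives a peptide that is also present in the database digest."""
    digest = set()
    for seq in proteins:
        digest |= tryptic_peptides(seq, min_len)
    pairs = []
    for pep in sorted(digest):
        for i, aa in enumerate(pep):
            if aa == "M":
                variant = pep[:i] + "T" + pep[i + 1:]
                if variant in digest:
                    pairs.append((pep, variant))
    return pairs


small_db = ["SAGMLLEKDDDDDDR"]
big_db = small_db + ["SAGTLLEKWWWWWWK"]  # hypothetical extra predicted ORF
print(met_to_thr_lookalikes(small_db))
print(met_to_thr_lookalikes(big_db))
```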

All is not lost, researchers who are banking hard on proteogenomics/metagenomics being the future!

Cause the original paper I mentioned at the top did a focused study with synthetic peptides and found that 1) the isothreonine peptides elute differently AND 2) their HCD fragmentation patterns change (actually, the second paper I mention reports that as well). They suggest that it would be reasonably easy to integrate this shift in fragmentation patterns into most proteogenomic pipelines!

Monday, September 19, 2016

GlycoPep MassList -- automatically build targeted lists for glycopeptides!

Shoutout to @ScientistSaba for helping me keep up with all the awesome stuff happening #HUPO2016 and still having time to tip me off to some cool papers like this one!!

The paper introduces this little program GlycoPep MassList.

The concept is simple AND powerful! Feed it your protein and it will generate you an inclusion list for the glycopeptides that could occur given your parameters.
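GlycoPep MassList's actual algorithm isn't described here, so this is only a hypothetical Python sketch of the general idea: digest the protein, keep peptides carrying an N-X-S/T sequon (X not P), add candidate glycan masses, and compute precursor m/z for a few charge states. The glycan list, charge states and simplifications (no missed cleavages, unmodified cysteines, sequons that span a cleavage site ignored) are my assumptions.

```python
import re

PROTON = 1.007276
WATER = 18.010565
# Monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
GLYCAN_UNIT = {"HexNAc": 203.07937, "Hex": 162.05282,
               "Fuc": 146.05791, "NeuAc": 291.09542}


def has_sequon(peptide):
    """N-X-S/T with X != P (simplified N-glycosylation motif)."""
    return re.search(r"N[^P][ST]", peptide) is not None


def glycopeptide_mz(peptide, glycan, charges=(2, 3, 4)):
    """glycan: composition dict such as {"HexNAc": 2, "Hex": 5}."""
    peptide_mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    glycan_mass = sum(GLYCAN_UNIT[unit] * n for unit, n in glycan.items())
    total = peptide_mass + glycan_mass
    return {z: (total + z * PROTON) / z for z in charges}


def inclusion_list(protein, glycans, charges=(2, 3, 4)):
    """Return (peptide, glycan name, charge, m/z) rows for sequon peptides."""
    peptides = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    rows = []
    for pep in peptides:
        if not has_sequon(pep):
            continue
        for name, composition in glycans.items():
            for z, mz in glycopeptide_mz(pep, composition, charges).items():
                rows.append((pep, name, z, round(mz, 4)))
    return rows


# Hypothetical glycan list; a real one would come from your own parameters
GLYCANS = {"HexNAc2Hex5": {"HexNAc": 2, "Hex": 5},
           "HexNAc4Hex3Fuc1": {"HexNAc": 4, "Hex": 3, "Fuc": 1}}

for row in inclusion_list("MKEEQYNSTYRVVSVLTVLHQDWLNGK", GLYCANS, charges=(2, 3)):
    print(row)
```

Each printed row is one candidate inclusion-list entry; for a "gas phase enrichment" style run you would just feed these m/z values in as preferred masses.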

They demonstrate that it works on their Orbitrap Velos Pro. You can use it for a purely targeted experiment or within a "gas phase enrichment" strategy (like the "include others" button on a Q Exactive)!

Saturday, September 17, 2016

MCP wants your opinions on targeted proteomic publishing guidelines!

MCP has always had (famously strict!) guidelines for what and how they will accept global proteomics data. They are now working on a draft for targeted proteomics data and have opened that draft up to the community for contributions.

Want to shape how we publish? Check it out here!!

Thursday, September 15, 2016

Rosetta's mass specs have confirmed complex organic molecules on Comet 67P

You've probably already heard this, but if not -- it's totally cool. According to this paper in Nature this week, Comet 67P is just flying along leaving its long comet trail -- and that trail has a bunch of complex organic molecules in it!

My first question -- how do you confirm that? I'm first thinking the orbiter is probably doing this by spectroscopy and I'm gonna find those readings -- dubious -- but it turns out there are 2 mass specs on the Orbiter!

COSIMA is a Time Of Flight instrument designed by researchers at Max Planck that is capable of 1,500 resolution at m/z 100 and has an effective mass range up to m/z 1,000. COSIMA's job is to collect dust particles and analyze them by Secondary Ion Mass Spectrometry (SIMS): the surfaces of the particles are hit with a primary ion beam that knocks secondary ions off the particles and into the mass analyzer.

ROSINA is another mass spectrometer, one that ionizes and detects gases. I'm having trouble finding much in the way of details on it, but I've run across several descriptions of it as a double focusing mass spectrometer, something I'm not familiar with (and I've got to be at work super early today). The design should be investigated, though. It is capable of 3,000 resolution at 1% peak height. Which...honestly is a lot of resolution.

If I think of resolution at 1/2 height like this image I stole from the Fiehn lab...

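We usually calculate resolution at FWHM (full width at half maximum), or 50% peak height. A peak is much wider down at 1% of its height than at half height, so unless I've got my numerator and denominator mixed up, if you're pulling 3,000 resolution at 1% peak height, that...ain't bad at all!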
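To put numbers on the 1% vs 50% height question, here is a small Python sketch that assumes a Gaussian peak shape (real sector instrument peaks may not be Gaussian, so treat it as a ballpark). Since resolution is m/Δm, a resolution quoted at 1% height converts to a FWHM resolution by the ratio of the two peak widths. The function names are mine.

```python
import math


def gaussian_width(sigma, fraction):
    """Full width of a Gaussian peak at `fraction` of its maximum height."""
    return 2.0 * sigma * math.sqrt(2.0 * math.log(1.0 / fraction))


def fwhm_equivalent_resolution(resolution, fraction):
    """Convert resolution (m / width) quoted at `fraction` peak height into the
    FWHM resolution of the same peak, assuming a Gaussian shape."""
    width_ratio = gaussian_width(1.0, fraction) / gaussian_width(1.0, 0.5)
    return resolution * width_ratio


print(f"1% width / FWHM = {gaussian_width(1.0, 0.01) / gaussian_width(1.0, 0.5):.3f}")
print(f"3,000 at 1% height ~ {fwhm_equivalent_resolution(3000, 0.01):,.0f} FWHM")
```

For a Gaussian, the width at 1% height is about 2.58 times the FWHM, so 3,000 at 1% height works out to roughly 7,700 at FWHM.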

And I think I do have it right, because this double focusing mass spec is supposed to be able to tell CO from N2 (27.9949 from 28.0061! that's about 11 millimass units!!)
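And a quick check of the CO vs N2 arithmetic in Python, using standard monoisotopic atomic masses and the simple m/Δm criterion (my simplification; how cleanly two peaks actually separate also depends on peak shape):

```python
# Monoisotopic atomic masses (Da)
MONO = {"C": 12.0, "N": 14.0030740048, "O": 15.99491461956}

co = MONO["C"] + MONO["O"]   # carbon monoxide
n2 = 2 * MONO["N"]           # molecular nitrogen
delta = n2 - co
# Simple criterion: resolving power needed ~ m / delta m
required_resolution = ((co + n2) / 2) / delta

print(f"CO {co:.4f}  N2 {n2:.4f}  difference {delta * 1000:.1f} mDa")
print(f"needed resolution ~ {required_resolution:.0f}")
```

That comes out to about 11.2 mDa and a required resolution of roughly 2,500, comfortably under the 3,000 (at 1% height) quoted for ROSINA.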

Okay. So...if there is evidence from these two impressive mass specs that survived a 4 BILLION mile trip there, I'm going to believe it!

But that isn't all! This isn't the first time we've shot instruments through the trails of comets. This is just the first time we've gotten readings this good. If you take the mass spectra from the other instruments in the past (as they show in the paper), you'll see that we have seen this in the trails of other comets.

So....there are huge balls of rock that fly through the universe shooting organic molecules the whole way....does anyone else feel like this alters an adjustment factor in the Drake equation at all?!?
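For reference, here is the standard form of the Drake equation. If comet-borne organics nudge anything, my guess is that it would be f_l, the fraction of suitable planets where life actually gets started, but that is speculation on my part:

```latex
N = R_{*} \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot L
```

Here R_* is the rate of star formation, f_p the fraction of stars with planets, n_e the number of potentially habitable planets per such system, f_i and f_c the fractions that go on to develop intelligence and detectable communication, and L how long such civilizations keep signaling.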

Wednesday, September 14, 2016

GEMPro -- Genome Scale Models with Protein Structures!

This paper from Elizabeth Brunk and Nathan Mih et al. is not the first paper to jump on if you're already feeling dumb.

It is elegant and brilliant and imposing. The concept is an extension of a modeling tool called genome-scale models (GEMs). This is a nice open access review on the topic that is directed at people who aren't planning to code their own tools.

There appear to be multiple iterations of GEM, but the one that seems the most straightforward to me is the integration of the genomic changes with the metabolic ones. Obviously there are several levels of regulation from the genetic level (from transcriptional regulation through post-translational) that all have effects on metabolite production, but GEM steps around that. The concept takes our existing knowledge and relationships and feeds them into a framework -- we know that it isn't a direct link from RNA X to metabolite Y, but all the same, when we see an upregulation in X we see a downregulation in Y.

I probably slaughtered the concept, but that's what I'm getting out of it.

GEMPro builds on this. Cause what would make this more complicated? What if you also threw in protein 3D structures into the mix!?!? The whole idea definitely makes my head hurt, but...

The GEM framework is in place and yielding dividends
We have structural information on 110k proteins (seriously...!?!...that's what the paper says!) and more all the time.
For those protein 3D structures we have useful information -- like what is the structure of this protein at this temperature...or in this disease state.
More metabolomics data is showing up all the time that could correspond to changes in either.

This is obviously a big data problem with this number of variables....and a big focus of the paper is that if they build a framework that can do this --it MUST be able to grow with the existing knowledge bases, cause our knowledge of everything biological is increasing significantly faster than linear rates.

How do you test something like this? They go for 2 bacteria: E. coli and T. maritima (which I don't think I've ever heard of...Wikipedia says it's a cool extremophile from Italian volcanoes (estremofilo!)).
Cool point in the paper -- if they try to do this analysis with all the data that was available in previous years you get a really cool picture of how our knowledge is expanding.

The myoglobin crystal structure was published in 1958. From that time until 2013, all the groups doing protein 3D structure work got to where about 34% of the E. coli proteins are characterized in high quality maps that can be used for this type of analysis. If they step forward in time to when they finalized this paper? They're at 44%. Wow! (Google doesn't know the word "Wow" in Italian. Well...it knows one...but apparently it's not appropriate in all dialects and I'm watching it today.)

And they dump all the data from these 2 organisms in and look around. This is my favorite analysis:
E. coli isn't very tolerant to heat compared to our estremofilo friend. They go into the literature and find the proteins in E. coli that are known to be adversely affected by growing in culture that is too hot. Here they can draw on their GEM models -- what genes are known to be similar as well as what gene products are linked to metabolic functions that are the same (if the genes don't look the same, you can pull the listings that are tightly linked to the creation of this metabolite as the same thing).

This gives them a little over 200 entries that either have the same (or very similar) metabolic functions in the 2 organisms....and only 10% of them have similar 3D structures.

So...the genetic pressure is there to conserve this basic DNA sequence for making, for example, this amino acid. Or if the two organisms have evolved very different ways of making that amino acid -- we can link some of these proteins together by the fact that they make that amino acid. But at the 3D protein level they are very very different.

So...E. coli has 200 proteins that presumably just up and fall apart. No amino acid in my example = dead, but our Italian friend just keeps chugging along and enjoying its relaxing volcanic sauna.

I totally dig this paper. I'm not sure what I'm going to do with this information, but I really like it!

Erratum to yesterday's talk.

Thank you to everyone who popped in to hear my 4+ hours of sleepy incoherent rambling about how Orbitraps work.

Important erratum: During a discussion on the potential promised by the experimental NeuCode reagents I mistakenly copied an image that was NOT NeuCode. I then, very sleepily, tried to figure out why I hadn't cited the paper -- cause the slide was from something else entirely.

This slide has been corrected and clarified and this section will need to be deleted entirely from the video recording. No slides have been distributed from the talk, so I don't have to worry about a big and embarrassing mistake being shown to anyone. So...I've got that going for me!

This talk was not officially sponsored or blessed by any representative of any corporation of any kind and was not a responsibility of my day job. I have evidence, cause I had to spend Sunday and late Monday and Tuesday nights doing my day job so I could put on the talk. Hence why I was sleepy enough to put an image that made no sense to the talk at all into the slide. The proof is on my FitBit sleep tracker....

To anyone this slide annoyed -- I'd like to apologize. I need a proofreader!

Tuesday, September 13, 2016

UVPD without lasers!!

UV photodissociation and you don't need a laser!! Paper here!

Check this out!

That is seriously it. Somebody here in Maryland has got an ion trap sitting around somewhere. Time to set aside a weekend. I can get some LEDs on Amazon. Let's do this!

Is it the coolest possible use of LEDs?!?!

...I guess it matters who you ask....

My vote is PUT THEM IN AN ION TRAP!!!!

Monday, September 12, 2016

SCX separation inside your nanospray emitter!?!?!!?

I read this abstract yesterday and thought to myself "....great, someone discovered MudPIT..." and promptly forgot about it and got back to work.

This morning I reread it. Unfortunately, I don't have time to read the paper before I go out the door on this ridiculously early morning, but....they appear to be doing SCX in their nanospray emitter....

Considering the way that I learned to do SCX involved a buffer with 8M salts of some kind, they are either doing something very different -- or they are replacing their mass spec every couple of days.

If you'd like to delve into this mystery you can find it here!

Sunday, September 11, 2016

Known unknowns of cardiolipin signaling: The best is yet to come

A few years ago I had the pleasure of spending a few days with a bunch of lipidomics experts in Pittsburgh and got to learn: 1) how ridiculously insanely hard lipidomics can be if you aren't going after the "easy" compounds, 2) how important they are, and 3) how very very little we know about them.

This group just wrapped up a really nice review on one of their tougher problems -- the analysis of cardiolipins. How much fun are cardiolipins to work with? Start with the fact that structurally similar ones tend to cluster in similar mass ranges, but have different functions and you have a good idea.

In one example they mention in the paper, 12 of their compounds of interest are within 0.1 Da of each other in MS1 mass, and even when they fragment them to figure out which one is which, MS2 isn't capable of elucidating the location of a functional site -- which is critical to know cause there are a slew of isomers within these "12" compounds. They have to employ a 2D LC method and utilize MS3 methods on an Orbitrap Fusion to figure out what they are looking at.

They also show genetics techniques they can use, as well as imaging techniques to localize these things. The problem sounds...daunting...but groups all over the world are chipping away at it. And you can't beat the optimism in the title!

Oh yeah! Paper link here!

Friday, September 9, 2016

Basic Orbitrap physics seminar

Hey! Wanna log on for free and listen to me go on about where and how ions move around in Orbitrap devices? The goal is to have a better understanding of where the ions are going and when to help understand your instruments better!

Note: Part 6 should be more like: From the LTQ-Orbitrap XL through the Orbitrap Fusion Lumos!

You can register here (limited to 1,000 total viewers....so be quick about it)

The videos should be available afterward and I'll post them here!

Thursday, September 8, 2016

OpenMS 2.0!!!

Does OpenMS officially have everything now?

Ben, what are you rambling about now? Oh...just the evolution of OpenMS into something that can do everything, as described in this brand new paper!

OpenMS can already do:
-Peptide ID
-Peptide Quan
-Integration into Proteome Discoverer via the OpenMS PD Community nodes
-Add DNA/RNA binding (to protein) detection capabilities to both OpenMS and to PD
-Allow people to add their own source code and then use the OpenMS downstream workflows (like FDR) to link to whatever upstream search engines they are using; I think this is how these guys controlled this awesome inference study.
and now?

-INTEGRATION WITH COMPOUND DISCOVERER? I love HRAM metabolomics and it consumes most of my increasingly rare instrument time these days. As much as I may rant about how easy metabolomics is with an Orbitrap after a beer or two, it is a field that still has its own innate challenges -- challenges that we honestly may not fully understand yet. Flexible software platforms that can address these are going to be critical if metabolomics is ever really going to blow up the way we keep thinking it's going to. I'm not surprised that the OpenMS team has the capability to add software to Compound Discoverer....considering the cool stuff they've developed for Proteome Discoverer, but...

...I had no idea they'd started making nodes....and I don't know what the MetaboProfiler is or what it does, but it installed painlessly into my copy of CD 2.0 and I can't wait to give it a try!!!!

-PROTEOGENOMICS!?!? This study points out a case study where it does, as well as...

-Degradomics! Have you tried realistically quantifying the degradation of proteins at a global level? No? Well...It. is. not. fun. The tools have to get better before we can track more than a small group of proteins, and someone is using OpenMS for that.

-Integration into KNIME and R for collaboration and downstream processing, respectively

-And a bunch of other stuff like Galaxy integration(?!?!), but this list is long enough now.

Does this sound like a sales pitch for OpenMS? It probably does, but this team of talented people is quietly making amazing tools for our community and going to great pains to make these tools as accessible as possible. And I don't mind being loud about it. (There are a bunch of new tutorial videos for getting started now!)

You can easily find OpenMS and their spiffy new website with a Google search or directly link here.

Wednesday, September 7, 2016

Origin of Disagreements in Tandem Mass Spectra!

When you search the same RAW file containing tandem mass spectra versus the same database using different search engines, you are going to see some disagreements in the results.

For example, if I take a proteomic sample from myself and run it through Mascot and through Sequest separately, the results are probably not going to be exactly the same. Mascot will identify some peptides that Sequest won't, and vice versa. It is also likely that I'll see a few MS/MS spectra that Sequest said were one sequence and Mascot said...were something different...

Considering that the database we're searching this against is constructed making some textbook assumptions and is starting from a DNA sequence....that is not mine....we do pretty darned good though!

Where do these disagreements come from? That is the topic of this new paper from Dominique Tessier et al. in this month's JPR. To evaluate this question, these researchers grab a cancer dataset from the Gygi lab on PRIDE and then run some plant samples in house on an Orbitrap Velos using high/low (or...medium/low? 30k MS1 + Top5 ion trap MS/MS).

The RAW files are then searched with Mascot, MSGF+, X!Tandem and TPP (presumably also using X!Tandem), and the conflicts between the results are analyzed.

The results are interesting, and the processed results conflict more than any I've ever seen. The authors develop a concept of "peptide space" and conclude that optimizing the search parameters for each engine is essential to getting the best and most overlapping data. They also note that in some versions of the software they used, the parameters that need to be changed to get the best data are not easily accessible to the user.
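A minimal Python sketch of the kind of comparison involved: given each engine's spectrum-to-peptide assignments at some FDR cutoff, count agreements, conflicts, and IDs unique to one engine. This is not the paper's "peptide space" method; the function name, the made-up data, and the option to treat I and L as equal (they are isobaric, so mass alone can't separate them) are my assumptions.

```python
def compare_engines(ids_a, ids_b, leucine_isoleucine_equal=True):
    """ids_a / ids_b: dict of spectrum ID -> peptide sequence (after each
    engine's FDR filter). Returns counts of agreements, conflicts and
    spectra identified by only one engine."""
    def canon(seq):
        # I and L have identical masses, so optionally treat them as the same
        return seq.replace("I", "L") if leucine_isoleucine_equal else seq

    shared = ids_a.keys() & ids_b.keys()
    agree = sum(1 for s in shared if canon(ids_a[s]) == canon(ids_b[s]))
    return {
        "agree": agree,
        "conflict": len(shared) - agree,
        "only_a": len(ids_a.keys() - ids_b.keys()),
        "only_b": len(ids_b.keys() - ids_a.keys()),
    }


# Made-up assignments from two engines
engine_a = {"scan1": "PEPTIDEK", "scan2": "LLSGAK", "scan3": "AVGEFR"}
engine_b = {"scan1": "PEPTLDEK", "scan2": "VVNGQK", "scan4": "YLTEAR"}
print(compare_engines(engine_a, engine_b))
```

The "conflict" bucket is the interesting one: the same spectrum, two different answers.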

I think this is a nice study and a good look at some of the problems we have in the statistics behind the scenes. It is sometimes easy to forget these days what an enormous undertaking, from a mathematical perspective, developing all these tools has been over the last couple of decades. Today's proteomics researchers coming in can simply push a play button to get good results, and it's easy to take that for granted!

Minor criticisms:
1) The RAW files were converted by different tools that I believe are quite different in their underlying mechanisms. I think this is a variable that should have been eliminated by using the same tools. Would it have an effect? I dunno...but it's a variable that could be knocked out with 5 minutes more work.
2) PD 1.7? Wow, I don't have that one! ;)
3) I think the function of the search engines is being focused on cause it's the easiest thing to implicate. The FDR estimations employed were different for each engine, and I think this could have a big impact on these results. I'd suspect that if FDR was controlled the same way for each of these results, the level of agreement would be a little better.
4) The in-house generated data is just a little weird. 30k MS1 followed by 5 MS/MS for plant fractions is going to yield only high copy number proteins and using a search parameter of 0.4 Da for the fragments is probably too tight and will affect the downstream results a little.
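On point 3: one simple way to put every engine on the same footing would be to export each engine's PSMs with a score and a target/decoy flag and apply one target-decoy FDR estimate (decoys divided by targets above a score threshold, converted to q-values) to all of them. This sketch assumes higher scores are better and a concatenated target-decoy search, and it ignores score ties; `q_values` and `accepted_targets` are hypothetical helpers, not any engine's built-in routine.

```python
def q_values(psms):
    """Target-decoy q-values for one engine's PSMs.

    psms: list of (score, is_decoy) tuples; higher score = better match.
    Returns (score, is_decoy, q_value) tuples sorted best-first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each score threshold: decoys / targets seen so far
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value = lowest FDR at this threshold or any less stringent one
    qvals = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvals)]

def accepted_targets(psms, fdr=0.01):
    """Scores of target PSMs that pass the same q-value cutoff."""
    return [score for score, is_decoy, q in q_values(psms) if not is_decoy and q <= fdr]

# Made-up scores for one engine; every engine's export would go through the same cutoff
psms = [(90, False), (85, False), (80, True), (75, False), (70, False), (60, True)]
print(accepted_targets(psms, fdr=0.01))  # [90, 85]
```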

Again, minor criticisms from somebody who just does proteomics as a hobby. Please feel free to ignore! I do like this paper and I'm glad Twitter (PastelBio!) recommended it for my breakfast paper today.

Tuesday, September 6, 2016

2017 EuBIC Winter School on proteomics bioinformatics

What has a bunch of big proteomics bioinformatics speakers...
In Austria...
In January...
At a place called "SportHotel"...
With an organized tobogganing event...
And a Hackathon?!?!?

It must be the EuBIC Winter School!!! While I try to find someone else to pay my registration fee, you should check it out and see if you can find someone to do the same for you!

You can find out more about this EuPA sponsored event here.

Monday, September 5, 2016

Is the Q Exactive HF less sensitive than other models?

Short answer:

At ASMS there was a rumor buzzing around. Earlier in the summer, two groups had independently found that, on their assays, their QE Plus outperformed their QE HF in sensitivity, as judged by limits of detection. Therefore, the rumor said, the QE HF wasn't as sensitive as the Plus.

I actively got involved in the first assay and resolved it myself -- it was a minor misconception on the parameters of the two instruments. The second was much more complicated and didn't seem to be my problem...until it was...and it irreparably destroyed a couple weeks of work I did in the spring.

The first one is the easiest to talk about (both thematically and emotionally), but in the end, it's the same issue.

This simple schematic of the QE was made to show how it works similarly to a triple quadrupole, but it'll do here as well. The focus here is the C-trap, which we control in the instrument software:

The importance of the C-trap in any hybrid Orbitrap system cannot be overstated. It is a critical and often misunderstood component of the system. In the newest instruments we have 2 ways of controlling how the C-trap fills in the instrument method software: the AGC target and the maximum Injection Time (IT; which I often call "fill time").

The AGC Target is the goal. In this case, I am telling this QE Focus that the goal is for it to obtain 50,000 charges. That is 50,000 +1 ions; or 25,000 +2 ions; or whatever adds up to 50,000 total charges. That is the goal.

The Maximum IT is the backup plan. In this example, the QE is told to obtain 50,000 charges OR to fill for 57 milliseconds before it performs the HCD fragmentation and Orbitrap scan.

57ms is a ton of time to collect ions. Consider that a modern QQQ (triple quadrupole) instrument running at maximum speed spends only 2ms on each target ion. Here, the QE can collect the same ion for more than 25 times as long as the QQQ. {A QQQ has an advantage here because the ions actually physically strike the detector, while ions in the Orbitrap pass by the detector many many times, but that is a different conversation.} The fact is that in most experiments you won't ever need 57ms of fill time to collect 5e4 charges. You only need the full maximum IT for extremely low concentration ions.
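The AGC target versus maximum IT logic boils down to "stop filling at whichever comes first." Here is an idealized sketch assuming ions arrive at a constant, made-up rate; real instruments predict fill times from earlier scans, and ion flux changes across a peak. `fill` and `ions_for_target` are hypothetical helpers.

```python
def ions_for_target(agc_target, charge):
    """Number of ions of a single charge state needed to reach the AGC target."""
    return agc_target // charge

def fill(agc_target, max_it_ms, flux_charges_per_ms):
    """Return (fill time in ms, charges collected) for one idealized C-trap fill.

    Filling stops when the AGC target is reached or when the maximum IT
    expires, whichever comes first. Ion flux is assumed constant.
    """
    if flux_charges_per_ms > 0:
        time_to_target = agc_target / flux_charges_per_ms
    else:
        time_to_target = float("inf")  # nothing arriving: only max IT can end the fill
    fill_time = min(time_to_target, max_it_ms)
    return fill_time, flux_charges_per_ms * fill_time

print(ions_for_target(50_000, 1), ions_for_target(50_000, 2))  # 50000 25000
print(fill(50_000, 57, 10_000))  # abundant ion: AGC target hit after 5 ms
print(fill(50_000, 57, 100))     # weak ion: the 57 ms max IT ends the fill
```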

Let's go to the Q Exactive Family Cycle Time Calculator! (Which you can download here.)

QE Plus first:

In this case, MS1s are off; just MS/MS. Here I'm going at maximum speed. These are the most common settings for a QE, where the limiting factor is set to be the slowest part of the experiment: the actual Orbitrap scan (which is 64ms on a QE or QE Plus). If you account for the 7ms of instrument overhead (to collect, fragment, cool, and inject), 57ms is the most efficient maximum IT. If you don't hit your AGC target, it will go to the next scan at 57ms anyway. Let's call this the "High Speed" Experiment.
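The 57ms figure is just the Orbitrap scan time minus the overhead. Under a very idealized model where the next fill overlaps the current Orbitrap scan, the time per MS/MS is whichever is longer: the scan, or the fill plus overhead. That model is my simplification (it ignores MS1 scans and everything else the calculator accounts for), and the function names are hypothetical; the 64ms and 7ms values come from the paragraph above.

```python
def balanced_max_it_ms(orbitrap_scan_ms, overhead_ms=7.0):
    """Longest fill that still hides entirely behind the Orbitrap scan."""
    return orbitrap_scan_ms - overhead_ms

def ms2_step_ms(orbitrap_scan_ms, fill_ms, overhead_ms=7.0):
    """Idealized time per MS/MS when filling overlaps the previous Orbitrap scan."""
    return max(orbitrap_scan_ms, fill_ms + overhead_ms)

print(balanced_max_it_ms(64))  # 57.0
print(ms2_step_ms(64, 57))     # 64: the fill is hidden behind the scan
print(ms2_step_ms(64, 100))    # 107.0: a longer max IT becomes the bottleneck
```

In this model the maximum IT only costs you speed when fills actually run that long; if the AGC target is hit in a few ms, the Orbitrap scan sets the pace.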

Let's take a look at the QE HF "High Speed" Experiment settings (please remember the Cycle Time Calculator is really an unofficial cycle time estimator; it's also a holiday here and I'm noticing this math isn't looking right, but I'm too lazy to check it):

Boom! We're flying here. We went from 12Hz to 18Hz or whatever, so the Orbitrap is faster! We can get 20 MS/MS faster on the HF than on the Plus. If the AGC target is always easy to hit -- the HF is going to tear through 50% more scans than the QE Plus.

See the problem, though?? What if the AGC target ISN'T easy to hit? What if the maximum injection time is needed? Then the QE Plus gets 57ms to collect ions, but the QE HF gets only 32ms, a little over half the time the Plus gets. All of a sudden, you're looking at roughly 1/2 the signal!!
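Here is the same argument as arithmetic: in an idealized fill, the charges collected are the smaller of the AGC target and the ion flux times the maximum IT. Using the 57ms and 32ms maximum IT values from this post and two made-up constant ion fluxes, an abundant ion hits the AGC target on both instruments, while a trace ion collects only about 56% as many charges on the HF. Matching the maximum IT makes them identical again. `charges_collected` is a hypothetical helper.

```python
def charges_collected(agc_target, max_it_ms, flux_charges_per_ms):
    """Idealized charges in the C-trap: capped by the AGC target or by max IT."""
    return min(agc_target, flux_charges_per_ms * max_it_ms)

AGC = 50_000
for label, flux in [("abundant", 5_000.0), ("trace", 200.0)]:
    plus = charges_collected(AGC, 57, flux)        # QE Plus "High Speed" max IT
    hf = charges_collected(AGC, 32, flux)          # QE HF "High Speed" max IT
    hf_matched = charges_collected(AGC, 57, flux)  # HF with max IT set to match the Plus
    print(label, plus, hf, round(hf / plus, 2), hf_matched == plus)
```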

The only way to ensure that the 2 instruments are running equally in terms of limits of detection or limits of quantification is to set the maximum injection time to the same value on both instruments.

Imagine that you had all the same parameters (same LC, column, injected sample amount, AGC target, and maximum IT). How would that experiment turn out? The sensitivity would be just a little bit higher on the QE HF than on the Plus, for an entirely different reason.

Imagine the top of your peak -- where the highest intensity of your target is coming out. This is the place where you are most likely to be hitting your AGC target.

Look at this peak I chose, literally at random, from the very first file I found in my Downloads folder. Also, please remember that if someone sends me a file to check out, 99% of the time something is wrong with it and they are asking me for advice on what might be the problem -- regardless, it is an okay example. I've labeled the scan numbers.

Look lower on the peak: the scans are spaced farther apart than they are at the top. Near the bottom of this weak signal, the maximum IT is being used. Near the top of the peak, where we have the most ions, the AGC target is being reached, and the limiting factor is the scan speed of the instrument.

Now imagine I had a faster scanning instrument, like a QE HF, running this experiment: I would be getting 2x the number of scans at the top of this peak. Honestly, here I may have missed the actual top of this peak entirely. The max signal may have come out between 11739 and 11758, and I might have missed it. Even if 11739 was the highest that signal ever got, there is definitely some signal missed between 11713 and 11739 that could have been picked up more accurately by a faster scanning instrument (or a less complex experiment). Therefore, the faster scanning QE HF would get slightly higher sensitivity on the same experiment than a slower scanning instrument.
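To put a number on the "missed apex" idea, here is a toy model: a Gaussian elution peak (sigma of 2 s, chosen arbitrarily) sampled at a fixed interval, with the worst-case phase where the true apex lands exactly halfway between two scans. Halving the interval, roughly the 2x scans at the top of the peak described above, raises the best sampled point from about 75% to about 93% of the true maximum in this model. Real scan spacing isn't fixed (as the peak above shows, it stretches out at the base), these numbers are not derived from that file, and `sampled_apex_fraction` is a hypothetical helper.

```python
import math

def sampled_apex_fraction(sigma_s, interval_s, offset_s, apex_s=30.0, duration_s=60.0):
    """Highest sampled point of a Gaussian elution peak, as a fraction of its true apex.

    The peak is sampled every interval_s seconds starting at offset_s.
    """
    best = 0.0
    n_points = int((duration_s - offset_s) // interval_s) + 1
    for k in range(n_points):
        t = offset_s + k * interval_s
        best = max(best, math.exp(-0.5 * ((t - apex_s) / sigma_s) ** 2))
    return best

# Worst case: the true apex (30 s) falls exactly halfway between two scans
slow = sampled_apex_fraction(sigma_s=2.0, interval_s=3.0, offset_s=1.5)
fast = sampled_apex_fraction(sigma_s=2.0, interval_s=1.5, offset_s=0.75)
print(round(slow, 2), round(fast, 2))  # 0.75 0.93
```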

I believe this post deserves this image.

The QE HF isn't as sensitive as a QE Plus???

EDITS: Wow! My 4th most read blog post ever? WTHeck? Okay. So, just in case you think I'm making this stuff up, please go to this paper, where the sensitivity of the QE HF is compared to the QE Classic. The HF can achieve better fragmentation quality in ONE-HALF the time the Classic requires for the same sample and LC setup.

This study did not compare the Q Exactive Plus, but...this one did....

And...in this deep analysis, no deficiencies in the QE HF in terms of sensitivity were uncovered. Exactly the opposite.

Sunday, September 4, 2016

Proteomic analysis of how stink bugs mess up tomatoes!

Out of the 15 or so tomato plants we put in this spring, we've gotten something like 12 good tomatoes. A large part of the problem has been my puppy, who will take bites out of any tomato he can reach, but the ones he can't reach have also been messed up, in a weird and different way.

A little investigation on my part, and it turns out it's those jerks in the picture above causing the rest of the problems. That is Halyomorpha halys, or the brown marmorated stink bug, an invasive species brought to the U.S. in the 90s that has definitely made its way into the ecosystem of Pug Mountain.

They don't do a lot of obvious physical damage to the fruit, so at first I guessed it was a coincidence that I had a few wandering around. Not so.

In this PLOS One paper, Michelle Peiffer (not this person) and Gary Felton describe their investigation into this problem. They study both the effects on the tomatoes and the proteome of the pest's salivary glands.

The stinkbugs make little bites in the tomato and inject enzymes from their salivary glands (GROSS!) into the tomatoes that liquefy the region around the bite. Then they can easily drink the tomato.

When they look at what is in the salivary glands (and the salivary sheaths...which go into the tomatoes (GROSS!)), they find a slew of different proteins, mostly digestive enzymes.

When they look at the plants, they find that applying extract from the salivary sheaths to tomato leaves induces a stress response in the plants that they can measure as well.

Solid all-around paper that has some good stats. I particularly like that they looked at both the bug and the host response. What I don't like is the fact that I've eaten 12 tomatoes that probably had bug spit inside of them. In my searching I found some good pesticide recommendations that should fix the problem...though I've still got to do something about the puppy.....

Saturday, September 3, 2016

High pH reverse phase StageTip FTW!

Need a quick method to boost your membrane proteomics coverage?

At first glance this paper might seem a little boring or obvious, but I only know one group that uses techniques like this, they've never published it, and I never would have guessed it would be as powerful as this new paper suggests.

The idea is that they solubilize with SDS-PAGE buffers (which....despite tons of work on other techniques...is still probably the best technique for high membrane proteome coverage), run a gel, and extract from the gel. I've seen some groups run the gel for only a very short time, cut out only the very very top piece (right below the stacking gel), and digest it out. There is still a TON of stuff in there.

This group (sorry, first author, your name is way too long for me to write it out here et al.) takes it a step further: they fractionate the peptides from the gel slice by high pH reversed-phase on a StageTip and walk away with a huge number of membrane proteins, and peptides with typical membrane-spanning domains, from a tiny amount of membrane starting material.

Simple and cheap boost to membrane proteomics coverage? Sign me up!

Thursday, September 1, 2016

Johns Hopkins launches new proteomics initiatives!

As someone who has been associated with Johns Hopkins since my very first full-time job, I've found the lack of support for proteomics on the campus a source of frustration (and I'm not the only one...). There have obviously been proteomics capabilities on campus, but the administration has always treated proteomics like an afterthought, especially compared to the $$$.$$$.$$$.$$$ the school has always thrown at genomics technology!

No longer! Today JHU officially launches the Center for Proteomic Discovery which is definitely the most sophisticated lab in Maryland, and probably on par with anything in the U.S.

With the third Lumos en route and multiple Q Exactives planned for validation exclusively by high-resolution PRM, JHU isn't messing around.

In order to drive innovation in the center, JHU has introduced a "core coins" program that reminds me very much of the PRIME-XS project in Europe. The JHU School of Medicine will sponsor worthy projects with free proteomics work, both at the new Center run by Dr. Chan-Hyun Na and in Dr. Cole's proteomics core facility.

The center is also open to collaborators and service fee customers outside of the school. You can find out more about this awesome new resource here!