Tuesday, October 31, 2017

Cool SIM-DIA technique for screening DNA adducts!



I had NO idea that you could even do this. However, this appears to be mostly my ignorance, because these authors have been developing techniques like these for years! 

Before I go further, this is the new paper at JPR. 


There are a ton of classical genetics techniques for quantifying DNA adducts (essentially DNA that is messed up or altered in some way). Unless a lot of new stuff has popped up since I left JHU, they have:
Low sensitivity
High false negatives/false positives
Detect adducts that could be...well...anything

Maybe I'm exaggerating, but I'm not sure that I am. The techniques my department used were fluorescence based and/or gel migration based (sure -- the gels were like 0.5 meters long, but still...gels...)

I think this paper is an invitation for DNA damage researchers to join us in the 21st century! Check this out:


Yeah -- this method is no joke. As I mentioned above, the Turesky lab has apparently been developing mass spec based DNA adduct detection/quantification methods for years, so they know exactly what they're looking for in terms of mass shifts. However, a classical dd-MS2 method isn't going to cut it here. Honestly, this figure doesn't do the study justice.

At first, I thought this was the first Wi-SIM-DIA paper I'd come across (this is a method on a Tribrid mass spec where the Orbitrap does wide SIM scans while the ion trap simultaneously does small DIA scans). However, to get the level of precision this group needs to detect and identify these adducts, they do all of the steps in the Orbitrap. This requires careful timing, because they are eluting digested nucleotides off nano-LC columns and all of these Orbitrap scans take time.
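To put some rough numbers on the timing problem, here's a back-of-the-envelope sketch. The transient times are approximate high-field Orbitrap values and the scan mix is completely hypothetical -- not the authors' actual method -- but it shows how quickly an all-Orbitrap cycle eats into a nano-LC peak.

```python
# Back-of-the-envelope cycle time for an all-Orbitrap SIM/DIA(/MS3) method.
# Transient times are rough high-field Orbitrap values, and the scan counts,
# overhead, and peak width are made up for illustration only.

TRANSIENT_MS = {15_000: 32, 30_000: 64, 60_000: 128, 120_000: 256, 240_000: 512}
OVERHEAD_MS = 10          # assumed per-scan overhead (ion injection, quad switching, etc.)

def cycle_time_ms(scans):
    """scans: list of (resolution, how_many) tuples making up one duty cycle."""
    return sum((TRANSIENT_MS[res] + OVERHEAD_MS) * n for res, n in scans)

# Hypothetical cycle: one wide-SIM at 120k, eight DIA windows at 30k, two MS3 at 60k
cycle = [(120_000, 1), (30_000, 8), (60_000, 2)]
t_s = cycle_time_ms(cycle) / 1000

peak_width_s = 20         # assumed nano-LC peak width at base
print(f"Cycle time: {t_s:.2f} s -> ~{peak_width_s / t_s:.1f} cycles across a {peak_width_s} s peak")
```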

Looking at it this way, I immediately wonder if someone could pull this off on a simpler instrument if they had complete hardware control -- but then you realize they throw in MS3 as well. Quantification and confirmation of these nasty DNA modifications, all in one go!

Monday, October 30, 2017

New Stuff on the Thermo Omics Portal!

If you've visited the Thermo-Omics Portal recently you'll notice it's been under renovations and looks a whole lot spiffier. You might also notice that they've announced the 2017 User meeting in Bremen!

The on-site navigation has changed and I've gotten a couple of questions about how to find things. For example, if you are looking for the PD 2.2 demo you'll want to follow the horizontal lines, then the sideways arrows. This is ugly, but this is what I mean...



If you click on the horizontal lines by "Navigate products", the big red bar marked 2 shows up. Then you expand that product by clicking on the arrow pointing right. It makes sense once you've done it once or twice, but if you click on the rotating blocks on the home screen with the product names, as you did before, you won't get to where you or your collaborators can download the demo and Viewer software versions.

Saturday, October 28, 2017

Where do you get all those values for proteomics ruler calculations?


The proteomic ruler approach has appeared a couple of times on this blog. It's incredibly cool, but when you start trying to do it you may run into this issue -- "where the heck do I get these total intensity values?" The authors are using MaxQuant, but maybe you're using Proteome Discoverer? And you don't want to go back to the RAW file and start taking averages, right?

BIG SHOUTOUT TO DR. CLELAND at the Smithsonian! We were having lunch and he told me exactly where you get the numbers you need!

If you are using PD 2.0 or newer (you should be using 2.2...) there is a post-processing node called Result Statistics.

I always throw it into my workflows -- 'cause it doesn't seem to add any time to the data processing and it adds a tab of data -- but I've honestly never used it for anything. AND EVERYTHING YOU NEED IS HERE!

For example, what if you want to normalize your data against the total area or intensity of every feature identified in the entire dataset?


It is on row 612!  There is so much information here, including the stuff you'll need to start calculating your absolute protein abundances.
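If you want to see what you'd actually do with those numbers, here's a minimal sketch of the proteomic ruler arithmetic as I understand it from the original Wiśniewski et al. (MCP, 2014) paper, plus the simple total-intensity normalization. The intensity values are placeholders standing in for the numbers you'd pull from the Result Statistics tab and your protein output.

```python
# Minimal sketch of proteomic-ruler-style calculations (my reading of Wisniewski
# et al., MCP 2014). All intensity values below are made up -- substitute the
# summed intensities from the Result Statistics tab and your protein table.

AVOGADRO = 6.022e23
DNA_MASS_PER_CELL_G = 6.5e-12     # ~6.5 pg DNA per diploid human cell (ruler assumption)

def copies_per_cell(protein_intensity, histone_intensity_sum, protein_mw_da):
    """Histone ruler: summed histone MS signal stands in for a constant DNA mass per cell."""
    protein_mass_per_cell = (protein_intensity / histone_intensity_sum) * DNA_MASS_PER_CELL_G
    return protein_mass_per_cell / protein_mw_da * AVOGADRO

def total_intensity_fraction(protein_intensity, total_intensity_sum):
    """Simple normalization against the summed intensity of everything identified."""
    return protein_intensity / total_intensity_sum

# Made-up example values
print(f"{copies_per_cell(3.2e8, 5.1e9, 52_000):.2e} copies per cell")
print(f"{total_intensity_fraction(3.2e8, 7.7e11):.2e} of the total signal")
```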


Friday, October 27, 2017

PhoStar -- Identify phosphopeptides BEFORE database search!


Ummm...thanks google images...not exactly what I was looking for.....guess I have to use it now...

This blog post is not about how I went from reading through an awesome paper to wondering when the Vietnamese restaurant in Frederick opens for lunch.

No. This blog post is about a super smart way to find phosphopeptide MS/MS spectra before your database search -- maybe even without(!!) a database search -- and it is described here.


What if you took your output MS/MS spectra and searched them against a comprehensive spectral library of experimentally determined phosphopeptides -- like this one?


Spectral library searches are, by definition, fast. However, you'll note that NIST got its human phospho spectral libraries from CPTAC -- so they're iTRAQ-4 labeled. PhoStar doesn't care. It uses a supervised machine learning approach to determine how to bin the MS/MS spectra -- do they go into the output file of MS/MS spectra that are likely phosphorylated? Or into the new file of MS/MS spectra that are definitely not phosphorylated?
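To make the binning idea concrete, here's a toy sketch of a supervised classifier routing spectra into "likely phospho" and "not phospho" bins. The features (a neutral-loss intensity fraction, precursor m/z, charge) and the random forest are my own stand-ins for illustration -- PhoStar's actual feature set and model are described in the paper.

```python
# Toy sketch of the "supervised binning" idea: score each MS/MS spectrum with a
# classifier and route it to a "likely phospho" or "not phospho" output file.
# Features, training data, and model choice are illustrative only.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def toy_features(n, phospho):
    """Made-up features: neutral-loss intensity fraction, precursor m/z, charge."""
    nl_frac = rng.beta(5, 2, n) if phospho else rng.beta(1, 8, n)  # the -98 Da loss is the giveaway
    prec_mz = rng.normal(800, 150, n)
    charge = rng.integers(2, 5, n)
    return np.column_stack([nl_frac, prec_mz, charge])

X = np.vstack([toy_features(500, True), toy_features(500, False)])
y = np.array([1] * 500 + [0] * 500)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# "Bin" new spectra: anything above the probability cutoff goes to the phospho file
new_spectra = toy_features(5, True)
probs = clf.predict_proba(new_spectra)[:, 1]
print(["phospho_bin" if p > 0.5 else "other_bin" for p in probs])
```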

How great is this? In what is possibly my all-time favorite dataset (Bekker-Jensen et al.), these authors get such deep coverage of the human proteome (in only 32 hours of instrument run time) that they find over 10,000 phosphorylation sites without enrichment! I've downloaded and reprocessed about half of this huge study, and even on my last-gen Proteome Destroyer I haven't considered trying to search for phosphorylation. 584,000 unique peptides from one cell line!! It's tough to process with no modifications whatsoever....

When you have MILLIONS of MS/MS spectra to search through, every phosphorylation is going to make this job tougher. PhoStar gives you the ability to pull those out and worry about them later and separately. And it looks like it does a darned good job!


Thursday, October 26, 2017

TMT-C+ A simple method to improve MS2-based reporter ion quan?


This new study at bioRxiv is worth taking a look at if you're doing MS2-based reporter quan!


The MS3-based reporter ion quan methods on tribrid Orbitrap systems make use of the fact that there are relatively large MS/MS fragments that still possess tagged regions when fragmented at lower energy.  A bunch of those are isolated together and fragmented at extremely high energy to get a ton of reporter ions at once.

In MS2-based reporter ion quan, we hit the tagged peptides with medium-level collision energy so we can get enough information to quantify the peptide and identify it in a single spectrum. In addition to the free reporter ions in the <132 m/z range, you'll often see fragments carrying complementary information -- for example, peptide fragments that still hold part of the tag and are also quantifiable. These have been referred to as TMTc fragment ions (the "c" stands for complementary).

The IMP-Hyperplex node (pd-nodes.org), an add-on for Proteome Discoverer 1.4, lets you use these to improve your quan by folding the TMTc ratios in with the reporter ion ones.
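Just to make the idea concrete, here is a minimal sketch of what "folding the TMTc ratios in" could look like -- a simple signal-weighted average of the two ratio estimates. To be clear, this is not the actual IMP-Hyperplex or TMTc+ math, just an illustration with made-up intensities.

```python
# Sketch of combining reporter-ion ratios with TMTc-derived ratios for one PSM.
# NOT the published algorithms -- just a signal-weighted average to show the idea
# of letting the (less compressed) TMTc cluster pull the ratio in the right direction.

def combined_ratio(reporter_a, reporter_b, tmtc_a, tmtc_b):
    """Return the channel A/B ratio estimated from both ion series, weighted by summed intensity."""
    r_reporter = reporter_a / reporter_b
    r_tmtc = tmtc_a / tmtc_b
    w_reporter = reporter_a + reporter_b
    w_tmtc = tmtc_a + tmtc_b
    return (r_reporter * w_reporter + r_tmtc * w_tmtc) / (w_reporter + w_tmtc)

# Made-up intensities: the reporter ratio is compressed by co-isolation, the TMTc one less so
print(combined_ratio(reporter_a=1.5e5, reporter_b=1.0e5,
                     tmtc_a=4.0e4, tmtc_b=2.0e4))
```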

TMT-C+ takes this method a step further by developing optimized sample prep and instrument parameters specifically for producing stronger TMTc signals. I'm unclear on the second part, but by integrating the modeling of the isolation window into their calculations (?) they dramatically improve the precision of their measurements!

Tuesday, October 24, 2017

Bioinformatics Resources for Proteomics!


Want a really great (and up-to-date) review of the resources out there for interpreting proteomics data? Check this out!


It is a really good introduction to the data analysis portion of our field and I'll be adding it to the resources for newbies section over there ---> somewhere.  It's also a really nice resource for any of us more seasoned(?) people who might be in a rut with the databases and software we've been using.

It is just one chapter in this nice new methods book -- that has a really great title!


There is a LOT of good stuff in this book!


Tell me you don't want to read these chapters!  Quantitative cell surface glycoproteomics, PRM with ETD!?!?!?


Tell 'em, Spock! And..umm...and for the weirdest 2 minutes of 60s television possibly ever....(why would I post this here...?)

You should check this book out. The more I flip through it, the more I expect you'll be hearing more about it!

Monday, October 23, 2017

Extensive human phosphorylation -- not on S,T, or Y!


I'm honestly leaving this exciting new study at bioRxiv here so that I can remember to get back to it when I get caught up today!


This...um...would answer a bunch of questions, wouldn't it...?  I can definitely think of a puzzle or two...

PHOSPHORYLATE...


....guess I'm heating my office this winter with all my new search space...
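For a sense of just how much heat we're talking about, here's a quick back-of-the-envelope count of modified peptidoforms when you expand the list of phospho-acceptor residues. The peptide, the acceptor sets, and the cap on modifications are all made up for illustration.

```python
# Why extra phospho-acceptor residues heat the office: the number of modified
# forms of a single peptide is sum over k of C(n_sites, k), so every additional
# acceptor residue type inflates the search space. Example values are made up.

from math import comb

def n_modified_forms(sequence, acceptors, max_mods=3):
    """Count peptidoforms with 0..max_mods phospho groups on the given acceptor residues."""
    sites = sum(sequence.count(a) for a in acceptors)
    return sum(comb(sites, k) for k in range(0, min(sites, max_mods) + 1))

pep = "HKDSGSTYELKR"                          # made-up tryptic-ish peptide
print(n_modified_forms(pep, "STY"))           # classic S/T/Y phospho only
print(n_modified_forms(pep, "STYHKRDE"))      # adding His/Lys/Arg/Asp/Glu as acceptors
```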

Sunday, October 22, 2017

Simple (& surprising?) strategies to find more PTMs on quadrupole Orbitraps


I'm at a loss here. Time for more espresso? The observations in this new study don't gel with my mental framework, but I think this is definitely worth checking out, because these authors build a solid case for their observations.


The first part here is easy. These authors demonstrate that static inclusion lists are very useful for finding the acetylated peptides they are interested in. On their quadrupole-Orbitrap they can put in up to 5,000 target ions for fragmentation, and aside from that they run it as a normal dd-MS2 experiment. If the instrument doesn't see anything from the target list in the MS1 scan, it goes ahead and fragments the most intense ions (presumably, the ones that pass the "peptide match" filter selected). Somebody had a really catchy name for this, I forget what it was -- gas phase enrichment or something?

EDIT: Since I'm already here 'cause I misspelled "Orbitrap" in the post title..ugh...something else I forgot. They mention that other systems can accept 50,000 targets in their inclusion lists! I've always wondered what the upper limits were.

WAIT! I get it now. There's hope for you yet, brain!

The part I couldn't figure out was this observation -- that turning off "exclude isotopes" increased the number of acetylated peptides that they identified. They enrich acetylated peptides from these organisms, so the resulting mixture is relatively simple. By turning "exclude isotopes" off, they allow multiple fragmentation events for each acetylated peptide -- one per isotope peak -- essentially getting around the dynamic exclusion settings! Presumably they have plenty of cycle time to get to each acetylated peptide multiple times, and increasing the number of times the peptide is fragmented increases the chance of a positive identification (looks like all IDs are with Mascot, btw).

They find that just turning off that feature allows each peptide they're interested in to be fragmented 4-7 times more often than when they leave the feature on, and this massively increases their chances of accurately identifying the modified peptides.

Honestly, in this same situation, I'd probably have started by crudely matching my dynamic exclusion windows to my peak widths (some of my not-as-terrible ramblings on dynamic exclusion optimization can be found here and here and here), but their strategy appears to work quite well in their hands, and you don't have to go messing around estimating peak widths.
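For what it's worth, this is the crude arithmetic I mean by "matching peak width to dynamic exclusion" -- rules of thumb with made-up peak widths, nothing more.

```python
# Crude rule of thumb: exclude a precursor for roughly as long as its chromatographic
# peak lasts, so you only re-fragment it once it's (mostly) gone. Base width ~2x FWHM
# is a rough Gaussian-ish approximation; the FWHM values below are made up.

def exclusion_window_s(peak_fwhm_s, fraction_of_base=0.75):
    """Return a dynamic exclusion duration covering most of the peak's base width."""
    base_width = 2.0 * peak_fwhm_s
    return base_width * fraction_of_base

for fwhm in (6, 12, 30):    # e.g., fast UHPLC vs. typical nano-LC gradients
    print(f"FWHM {fwhm:>2} s -> exclusion ~{exclusion_window_s(fwhm):.0f} s")
```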

A final interesting note in this paper is that they stress the importance of manual interpretation of MS/MS spectra for modified peptides. I agree 100% -- until I realize the dataset in front of me has over 1e6 matched PSMs....then...


I mean...I definitely check the important ones! (Not to sound like a slacker, but if you spend 1 second on each of 1e6 spectra, that is over 6 full 40-hour work weeks...and on a modified peptide I need a whole lot more than 1 second...)

There are more interesting observations in this nice study as well. Definitely worth checking out!

Saturday, October 21, 2017

Advancing top down proteomics past 30kDa!!


Okay...before you roll your eyes and assume this is going to feature the custom modified quadrupole Orbitrap with the 7 ion funnels that requires vacuum levels that can only be achieved on the International Space Station -- I don't think it does. This appears to be a normal high-field quadrupole-Orbitrap....that can scan at 3,000 resolution and do 25 microscans.....


...I know! Not out of the box, but this is all just instrument control software. However, I swear there is something for everyone doing top-down in this new paper!


Realistically, right now, when someone says "I did comprehensive top down proteomics" it means "I got some great coverage of the proteins from this cell that were in the 15kDa range, and I even got a few that were almost 32kDa." This is where the technology is -- but this range is creeping up all the time.

This paper is great because they got over 400 proteoforms that were in the 30kDa - 60kDa range and they did so by breaking one of the rules of top-down proteomics. They used NARROW isolation widths -- 3 Da!

(If you haven't done much intact protein fragmentation work, this probably doesn't sound too weird, however..it totally is!)


This is a zoomed-in MS1 spectrum (140k resolution, QE Classic) of a 23kDa protein. There is a lot of stuff going on there -- however, the important part is the signal -- it's only ~1e5 in the MS1. That might be enough precursor for a peptide, but larger ions have a couple of problems. They are harder to isolate. They degrade more rapidly during trapping. And they are harder to transfer from one place to another efficiently (for example, from the HCD cell to the C-trap to the Orbitrap -- all places where you are going to lose more ions).

Another big problem is the ion spread. On a peptide with 12 amino acids, there are only a couple dozen possible b/y fragment ions, right? On something at 23kDa, that fragmentation can be spread over 200+ different MS/MS ions (though there are obviously energetic biases). So we cheat. We open up the precursor isolation window around this one protein charge state (or more than one, if multiplexing) to squeeze in as much signal as we can, and we use scoring systems like the ones in ProsightPC that deal with co-isolation interference using strategies different from peptide search engines. (Totally worth discussing some other time!) You will typically see isolation windows of 15 Da or more.

Bigger proteins, however, pick up more charges, and at higher charge everything gets squeezed closer together in m/z -- the isotope spacing shrinks to 1/z, and neighboring proteoforms end up only a few m/z apart.


An easy example -- a mAb. If you capture a 15 Da window around this charge state you would really be isolating at least 5 different proteoforms! This is going to make the job of sequencing the protein(s) much, much harder, even for top-down algorithms that are expecting it.
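The arithmetic behind that claim is worth seeing once. This sketch uses ballpark numbers (a ~148 kDa mAb at charge +50, glycoforms spaced one hexose apart) to show how many proteoforms land inside a 15 m/z window versus the narrow 3 m/z one used in the paper -- the masses and charge state are my assumptions, not values from the study.

```python
# Rough arithmetic behind the mAb example: at high charge, proteoforms that differ
# by a whole hexose end up only a few m/z apart, so a wide isolation window grabs
# several of them at once. Mass and charge-state values are ballpark assumptions.

PROTON = 1.00728
HEXOSE = 162.0528            # mass difference between adjacent glycoforms

def mz(neutral_mass, z):
    return (neutral_mass + z * PROTON) / z

mab_mass, z = 148_000.0, 50  # ballpark intact mAb mass and a mid-envelope charge state
spacing = HEXOSE / z         # m/z gap between adjacent glycoforms at this charge

for window in (15.0, 3.0):   # "classic" wide window vs. the narrow one in the paper
    print(f"{window:>4} m/z window around {mz(mab_mass, z):.0f} m/z "
          f"spans ~{window / spacing:.1f} glycoforms (spacing {spacing:.2f} m/z)")
```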

Wow...that was a lot of words....I missed blogging!! Okay -- so it's a big deal in this paper that they narrow the isolation width. Apparently the segmented quad and the high-field Orbitrap can get enough signal here that we can actually ID these proteins now! It's one of those things where we've always done it this way, but the new technology has advanced our capabilities more than we thought!

Worth noting, to get to the total number of proteoforms reported in the abstract they combine methods, including Autopilot, which you probably don't have. And they did use 3,700 resolution for the MS1 on the bigger proteins. However, that's not too far off from what some quadrupole-Orbitrap systems are capable of doing now when running unmodified versions of the vendor's software!

I'm glad there isn't a word count on this...but the moral of the story is that you may be able to get good results on fragmentation of intact proteins on some instruments with isolation widths that will enable single proteoform isolation, even on larger proteins!

Friday, October 20, 2017

Profiling the BNP Cardiac marker at rocket speed with CE-QE!


I puzzled over this new study for quite a bit. 



First off -- I had never heard of this BNP peptide or how it works, but as far as my limited understanding goes, the important thing is that monitoring it rapidly is super important in some cardiac diseases.

Our standard proteomics techniques are awesome -- but they are ultimately far too slow for a clinical environment. I still don't 100% get this technique, but I do get this about it:

1) It requires very little hands-on work. The reaction incubates automatically.
2) It allows multiple measurements of these critical markers in rapid succession because it is using capillary electrophoresis (CE) connected to a quadrupole-Orbitrap (plus)

One of the coolest things about CE is that it's just constantly going. You don't have to go back and re-equilibrate the way you do with LC -- the analytes are migrating electrophoretically (probably not quite the right word) through the same buffer system. You can make another injection before the first one has reached the mass spectrometer, so you can have tons and tons of throughput.
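Here's a toy version of that throughput math -- the migration time and injection spacing are made up, just to show why overlapped injections are such a big deal.

```python
# Toy throughput math for overlapped CE injections: if you can inject every
# `interval_min` minutes without waiting for the previous run to finish migrating,
# throughput is set by the interval, not the migration time. Numbers are made up.

def samples_per_hour(injection_interval_min):
    return 60.0 / injection_interval_min

migration_time_min = 12      # assumed time for the analytes to reach the MS
interval_min = 4             # assumed spacing between stacked injections

print(f"Waiting for each run to finish: {samples_per_hour(migration_time_min):.1f} samples/h")
print(f"Overlapped injections every {interval_min} min: {samples_per_hour(interval_min):.1f} samples/h")
```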

The way this method takes advantage of these properties is by allowing an enzymatic(?) reaction to occur and then making subsequent injections from minimally processed plasma samples into the CE flow path. There are 5 peptides(?) that they are interested in that aid in the cardiac progression diagnosis and they can monitor all 5 of them as the reaction is going.

Complaints about mass spec speed in the clinic? They go from plasma to read-out of these 5 critical markers in under 1 hour!

Thursday, October 19, 2017

Proteome Discoverer 2.2 Column (row?) Interpreter!


Once upon a time, I made a spreadsheet that explained what all the output columns were in a Proteome Discoverer (1.3? or 1.4?) output with my own (probably inaccurate) descriptions of what they all meant.

Unfortunately, someone was looking for it -- and I lost it along with a bunch of other blog resources a while back (please keep reporting dead links -- I'm eventually going to get them all!)

This is a work in progress to some extent, but I made a new one! It was a great exercise because I'm doing an "experts PD workshop webinar" this afternoon and I'm feeling very sharp for it now!

I focused on PD 2.2 since that's what I'm using all the time. You can download it from my personal Dropbox here.




Wednesday, October 18, 2017

Simplify the top down proteomics problem with NeuCode!


Top down proteomics is still tough  -- some of the problems that were really hard 10 years ago are still really hard now. What if we could simplify the whole thing by doing something completely different?

Like this...?!?!?!?


What if you looked at the challenge of trying to work out an intact protein sequence (including PTMs!) from the MS/MS spectra and set that aside for now? Instead, you take the intact protein masses alone (those are much easier to get!) and combine them with ultra-deep shotgun analysis to work out the PTMs. Could you then link the proteoforms back together?

Some of them, for sure, but there is going to be a whole lot of uncertainty between proteoforms of similar mass. What if you knew something really cool about the intact proteins that would help you link them back to the shotgun measurements -- like EXACTLY how many lysines are in each protein!

This is yet another thing that you can do with the NeuCode reagents. The application of NeuCode in this manner was previously shown in this paper by many of these authors.
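Here's a toy sketch of the extra constraint that lysine count buys you when linking an observed intact mass back to candidate proteoforms. The candidate list, masses, lysine counts, and tolerance are all made up -- this is just the matching idea, not the authors' pipeline.

```python
# Toy illustration of the constraint a NeuCode lysine count adds: match an observed
# intact mass against candidate proteoforms, with and without requiring the lysine
# count to agree. All candidates, masses, and tolerances below are invented.

candidates = [
    # (name, theoretical neutral mass in Da, lysines in sequence)
    ("proteoform_A", 25_001.2, 14),
    ("proteoform_B", 25_001.9, 22),
    ("proteoform_C", 25_002.4, 14),
]

def match(observed_mass, tol_da=2.0, lysine_count=None):
    hits = [c for c in candidates if abs(c[1] - observed_mass) <= tol_da]
    if lysine_count is not None:
        hits = [c for c in hits if c[2] == lysine_count]
    return [name for name, _, _ in hits]

print(match(25_001.5))                    # intact mass alone: still ambiguous
print(match(25_001.5, lysine_count=22))   # mass + lysine count: narrowed down
```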



Honestly, it sounds like a neat trick -- TADAA! this is how many lysines are in this protein -- but there are other ways we could do this, right?

What this new study does is show how we can actually apply this to a biological system -- by delivering the largest number of E. coli PTM-annotated proteoforms we've ever seen in a single analysis (>500). It is worth noting that there are some familiar top-down limitations -- proteins >45kDa were excluded from the analysis, for example -- but what a cool new method to have in our utility belts!

Tuesday, October 17, 2017

Sonic speed digestion for complex proteomics samples!


I just got back from a couple weeks in amazingly beautiful southern Portugal. Great climbing, beautiful beaches, cool people, and the best $2 wine in the world.

I didn't actually intend to be disconnected from this hobby, but I dropped my laptop -- due, in no way whatsoever, to the awesome $2 wine.

I'll be backlogging posts for a bit, though, TONS of cool stuff came out recently!

But first -- SONIC HORN SPEED DIGESTION!


My library doesn't provide digital access to Talanta any longer, but I think this tool is super cool. Why wouldn't ultrasonic treatment speed up sample digestion?!?


According to the abstract they get to full tryptic digestion in 5 MIN!!  It's exciting to see some good science out of Portugal, as well as sonar contributing something positive to our field!

Monday, October 16, 2017

Multiplexed plasma peptidomics!!


I stole this slide on peptidomics from this talk by Harald Tammon (see, LinkedIn is good for something!)

Peptidomics is coming fo' real, yo! And this new paper in MCP jumps one of the biggest hurdles in doing these experiments! 


One of the reasons peptidomics is so hard is that processed circulating peptides are a little bit too big for metabolomics tools -- and often too small for proteomics tools. When a metabolomics person is optimizing their chromatography to separate lactic acid from alanine so they don't just shoot off the column in one single peak, that same chromatography system might not be the best for catching a singly charged peptide at 550 m/z. And our tools? We rely to a huge extent on peptides accepting at least 2 charges so they provide an appropriate b/y spread for identification. +1 peptides?!? Most of the time we tell the mass spectrometer to just ignore them -- 'cause we aren't gonna identify them anyway. (I wrote a post on a classic paper about this here.)


How'd these authors tackle the problem? By TMT tagging everything! In general, the TMT tag will add about 1 extra charge (all pH/pKa {or something} dependent), and all of a sudden their normal proteomics workflow could do quantitative peptidomics!
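To see what that extra charge does in practice, here's a tiny sketch with a made-up ~1,100 Da peptide and the ~229.163 Da TMT-6/10/11 label mass: unlabeled it sits at +1, where our methods usually skip it; labeled and carrying one more charge, it lands back in very familiar m/z territory.

```python
# Where a small endogenous peptide lands in m/z before and after TMT labeling.
# The peptide mass, a single label, and the "one extra charge" are assumptions
# for illustration; the tag mass is the ~229.163 Da TMT-6/10/11 label.

PROTON = 1.00728
TMT_TAG = 229.16293      # mass added per label (here: one label on the N-terminus)

def mz(neutral_mass, z):
    return (neutral_mass + z * PROTON) / z

peptide_mass = 1100.55   # made-up circulating peptide

print(f"Unlabeled, +1:   {mz(peptide_mass, 1):.2f} m/z  (usually skipped by the method)")
print(f"TMT-labeled, +2: {mz(peptide_mass + TMT_TAG, 2):.2f} m/z  (back in normal territory)")
```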

To improve identification they used both HCD and EThcD and searched the data with PEAKS and Byonic, which are both more likely to successfully identify +1 peptides than Sequest.

How'd they do? Thousands of quantified peptides and a method that I'd follow to the letter if someone asked me to quantify changes in the global peptidome!