Saturday, September 30, 2017

Unrestricted data analysis of protein oxidation!


Okay -- you're gonna have to trust me on this one -- the figure above is actually really cool, but I can't even get the single image to copy over here correctly. I even tried (on purpose!) opening this paper in the "Active View" thing...

It's from this paper that is way too smart for me this morning.


In general, we still have to limit the PTMs we go after in a study. Maybe that's going to change soon with some of the next-generation algorithms that are coming, but right now we need to be restrictive. People studying protein oxidation in a biological context -- for example, in aging research -- tend to focus primarily on carbonylation. We know from induced oxidation studies like FPOP (which is probably an extreme example) that oxidation can have all sorts of different effects on a protein.

What this team shows here is a somewhat counter-intuitive way of looking at all sorts of oxidative events, even in complex matrices -- as far as I can tell, by just using MaxQuant in a clever way and some relatively simple post search filtering.

All the data they show is from a Q Exactive running 70,000 resolution MS1 and 35,000 resolution MS/MS scans. I think the resolution in the MS/MS is pretty critical for what they are doing. Even though mass accuracy doesn't really change with increased or decreased Orbitrap resolution, their downstream filtering is super harsh, and corresponding fragment ions that don't get resolved at lower resolution will probably lead to a real PTM getting tossed.

If you're trying to resolve a modification like tryptophan chlorination (+33.96) from homocysteic acid (+33.97), you might want to double that resolution (though... it does help a little that this example occurs on different amino acids ;)
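For fun, here's the back-of-envelope version of that argument. The m/z 800, 1+ fragment is my made-up example, not a number from the paper: resolving power (by the FWHM definition) is roughly m over delta-m, so a ~0.01 Da gap at m/z 800 needs about 80,000 -- more than double that 35,000 MS/MS setting.

```python
# Back-of-envelope math: resolving power (FWHM definition) needed to
# separate two peaks delta_mass apart is roughly m / delta_m.

def resolution_needed(mz, delta_mass, charge=1):
    """Resolving power needed to separate two peaks delta_mass apart."""
    delta_mz = delta_mass / charge   # the gap shrinks by the charge state in m/z space
    return mz / delta_mz

# The ~0.01 Da gap above, sitting on a hypothetical 1+ fragment at m/z 800:
r = resolution_needed(800.0, 0.01)
print(f"~{r:,.0f} resolving power")
```

Obviously real peak shapes and interference make this messier, but it's a decent sanity check on why 35,000 MS/MS resolution is cutting it close.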

Something that ends up being ultra-critical for them is the "dependent peptide search" function in MaxQuant. Fabian Coscia describes this function in this YouTube video here (the description of the function starts at 9:19, but the whole thing is worth watching). This slide screenshot does a good job of summarizing how it works.


These authors use this function, then export the resulting delta-mass peptide modifications and filter them down to known oxidative modifications (oh -- their samples are treated with something that oxidizes the Albert Heck out of them). What they find in a very simple mixture is reflected in a much more complicated sample -- specific oxidation "hot spots" and a whole lot more interesting oxidative protein modifications than carbonylation! And once they find them -- they've got MS1 signal to quantify them with.
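A minimal sketch of what I understand that post-search filtering step to be. Everything here is illustrative -- the observed delta masses and tolerance are invented, and the three-entry modification list uses Unimod-style values, not the authors' actual list:

```python
# Sketch: filter dependent-peptide delta masses down to a short list of
# known oxidative modifications, within a mass tolerance.

KNOWN_OXIDATIVE_MODS = {           # hypothetical 3-entry list (Da)
    "oxidation (+O)": 15.9949,
    "dioxidation (+2O)": 31.9898,
    "Lys -> aminoadipic semialdehyde": -1.0316,
}

def match_oxidative_mods(delta_masses, tol_da=0.005):
    """Keep only delta masses within tol_da of a known oxidative mod."""
    hits = []
    for dm in delta_masses:
        for name, mass in KNOWN_OXIDATIVE_MODS.items():
            if abs(dm - mass) <= tol_da:
                hits.append((dm, name))
    return hits

observed = [15.9946, 0.9840, 31.9902, 57.0215]   # made-up delta masses
print(match_oxidative_mods(observed))
```

The real workflow obviously needs a much longer modification list and per-site localization, but the filtering logic really is about this simple.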


Friday, September 29, 2017

Advanced precursor ion selection strategies on an LTQ Orbitrap!


I'll be honest, when I found this paper I was looking for the answer to a completely different mystery. However, this awesome paper goes a long way toward answering a question that's been rumbling around in my head for a while -- that is: how hard could you push an LTQ-Orbitrap system?

3 paragraphs redacted due to excessive rambling....

What if you could get around that awesome mustard yellow interface and into the guts of the operating software? Could you write better, smarter instrument control software and crank that monster to 11?


Yeah -- this awesome study suggests there is definitely some room for improvement!

This team totally hacks an Orbitrap XL -- and drastically improves its performance! From 200 ng of HeLa run 8 times, they get around 1,600 unique proteins (single-shot top 10, 2 hour gradient). Honestly, that's pretty good and smokes any Q-TOF or ion trap I've ever personally used.

When they modify their instrument parameters to do cool things like better control dynamic exclusion -- and to automatically exclude peptides identified in previous runs using their cool method (Smart MS2) -- they can get that number up to ~2,500 unique protein groups in 4 runs. Wow, right?
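Just to make the idea concrete, here's a toy sketch of cross-run exclusion. This is NOT the actual Smart MS2 implementation -- all the m/z values, the tolerance, and the function itself are invented for illustration:

```python
# Toy version of cross-run exclusion: precursors identified in earlier
# runs are skipped so MS2 time is spent on new ones.

def pick_precursors(survey_mzs, already_identified, tol_da=0.01, top_n=10):
    """Return up to top_n m/z values not on the cumulative exclusion list."""
    selected = []
    for mz in survey_mzs:
        if any(abs(mz - seen) <= tol_da for seen in already_identified):
            continue   # identified in a previous run -- skip it
        selected.append(mz)
        if len(selected) == top_n:
            break
    return selected

identified_so_far = {445.12, 523.28}          # hypothetical IDs from runs 1..n
survey = [445.12, 480.77, 523.28, 610.33]     # hypothetical survey scan
print(pick_precursors(survey, identified_so_far))
```

The hard part on a real instrument is doing this inside the acquisition loop with retention-time drift between runs, which is presumably where the actual cleverness lives.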

And if you're thinking "who cares, I'm no Russian hacker" check this out!

You can buy SmartMS2 for your LTQ Orbitrap! (Please see the disclaimers section of this blog; it is way over to the right somewhere at the top. I am not endorsing this product. I have not used this product. This is a semi-scientific review of the literature only, and mentioning the fact that this thing is out there falls in line with the general story of this paper and blog review thing. And someone reading the paper would find out anyway!)

Thursday, September 28, 2017

Is it finally time to revisit biomarker discovery in plasma proteomics?!?

Admit it -- we jumped the gun. Proteomics was the most exciting thing ever and we had a couple of awesome early successes -- and every lab and biopharma company in the world dumped $$$$ into searching for plasma proteomics biomarkers. And...not to be mean...but very little came out of it...

There are facilities I've been to where people worry about saying "biomarker discovery" out loud. Where the impressions from the fleets of FT-ICRs still remain indented in the floors...and 6 cars fill lots that could hold 600....

We underestimated the problem. We underestimated the matrix. We didn't have enough speed, our separations were too primitive, and -- especially this -- we didn't have the dynamic range. I'm saying this all in past tense...cause this is the real question....


...and this great new open review tries to tackle this question head on!


The review starts off with some very good perspective. We all know the dynamic range of proteins in plasma is 10 or 11 orders of magnitude, right? But, honestly, what does that mean in relation to disease states and current biomarkers? This is addressed very well here.

Okay -- another cool thing in the paper is the literature analysis: the number of published proteomics studies just keeps rising, while plasma proteomics biomarker studies went way up and then way down. I think you could directly correlate that graph with the parking lot(s) I mentioned earlier!

I mentioned above that we didn't have the dynamic range. This is only partly true. It is more accurate to say we didn't have the dynamic range per unit time. I have a good friend who does incredibly deep analysis of samples. She gets coverage on par with anything we see in the literature on today's newest instruments -- and she's been doing it for years with little change in her instruments and methods. However... she may spend 1-2 months of analysis on a single sample. Multiple protein extraction and digestion techniques, 2D offline fractionation, that sort of thing. We've always been able to do that stuff... eventually... but now we have dynamic range per unit time!

This is where this review goes from reviewing the history of plasma biomarker proteomics -- to providing the blueprints we might use and changes we'll need to initiate if it turns out we're finally there. I especially like the grouping of the strategies into 2 clear groups, triangular and rectangular and I plan to add them to my terminology list.

Are we there yet? Maybe? At the very least, we're a whole lot closer than we've ever been, but there's definitely some work ahead still.

Loosely linked side note/footnote: I just learned recently about Eroom's law. It's a play on Moore's law (that computing power doubles every 2 years or whatever). Eroom's law is the opposite: it states that each new drug discovery will cost more and take longer than the last one. This was based on observations from the pharmaceutical industry, and there are lots of thoughts on the causes. One leading theory is the tendency for companies to expand and continuously add non-scientific staff like management, administrators, and HR to "support" the scientific development. I've also seen it thrown out there that as a company expands, it becomes commonplace to bring in outside thinkers from other industries, and this may contribute.

It turns out that the fictional(?) parking lot I mentioned above had 30 spaces for the scientists and 569 spaces for the managers, administrators, marketing and human resources people and, of course, 1 for the hot shot executive bringing in all the freshest ideas from AOL...maybe our technological limitations weren't 100% of the problem...


Wednesday, September 27, 2017

QuiXoT -- Quantify any proteomics dataset labeled in any way?


I feel like QuiXoT has been on the blog before -- because maybe I used a screenshot from this travesty of an 80s cartoon show (the only thing good about it was the pun -- that I didn't get till years later...), but a search doesn't reveal it.

Nope -- it looks like QuiXoT is new!


...and available in pre-print at BiorXiV here! (I can not commit the proper capitalization to memory)

What is it? It is a pile of tools for quantifying mass spec data -- any mass spec data. It was first designed in-house to quantify 18O (O18?) labeled proteins, but then they realized that if they could do that, quantifying everything else was easy.

Immediately, my first question is: what about 15N? It is never mentioned in the paper. Fortunately, in my country right now it is completely acceptable to perform any kind of formal business through Twitter. You can break 100+ year established traditions at 3 in the morning, you can change military policy -- anything you want... just Tweet it out into the universe.


...I also wrote an email to the corresponding author, just in case they aren't up with the modern way we do things here in D.C....(sigh)...

BTW, you can get QuiXoT here. It has an awesome logo.

QuiXoT does need some manipulation to run. It isn't a super user-friendly GUI. However, it has a really nice feature called a "Unit Test". These are short tests that make sure that 1) you have all the prerequisites to run the program, 2) you get familiar with the steps of what you're doing, and 3) you can get the data out that they expect from the data they give you.

Considering how hard it can be to get quan from 18O/15N labeled data (I know there are things out there, but it's nice to have alternative algorithms to run samples through), this doesn't seem too bad at all.

UPDATE 1 hour later (people in Europe get up EARLY!): QuiXoT can't do 15N, but it's still awesome!

Tuesday, September 26, 2017

The Dark Proteome Database!!

(Borrowed this image from ChemistryWorld -- here)

I LOVE talking to people about the dark proteome. I seriously think there is some fundamental biological process occurring in all cells that we don't know about yet. Maybe I'm crazy, but when you realize that we can't identify MOST of the MS/MS spectra we get, it does suggest something like that -- right?

This new paper doesn't diminish this idea at all! 


I love the name of this journal! Just added it to my "watch" list.

In this study, these authors construct a database of the stuff proteomics doesn't identify -- and -- holy cow, it's super weird!

They test a lot of the common assumptions -- like, these are alternative cleavage events, or intrinsically disordered regions -- and find that these assumptions fall well short of explaining the whole.

Okay -- so how about this for cool -- they rate entries according to their "level of darkness" in the database.

I'll be honest. I'm in the web interface now (which you can directly access here) and -- while cool -- I can't come up with a good idea of how I would/could utilize it right now. Then again, considering I just resigned myself to the fact that I will never find my car keys again (and I swear I just had them), maybe I just need (a lot) more espresso this morning!

Monday, September 25, 2017

Deep Dive -- double 96 fractionation for when you absolutely need sensitivity


Ouch. This goes on the list of techniques I really hope I never have to do -- but if you really really need to quantify that peptide and you don't have any way to enrich for it, deep dive will probably get you there.


This is the overall strategy -- (please note they fall in the non-deplete camp) -- and you are reading that right. Fractionate into a 96-well plate, monitor to figure out the well(s?) where your peptide ends up. Fractionate that single well into another 96-well plate -- and then use that final well for quantification.

I guess if it is completely automated and you know that well C6 in the first dimension and well F4 in the second has what you're looking for, it should be the same for every sample... so it wouldn't be that bad? Again... I hope I don't have to do it, but the method is here just in case.
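If you did have to automate it, the tracking is at least simple bookkeeping -- 96 x 96 = 9,216 possible final fractions, each addressable by a pair of plate coordinates. A tiny hypothetical helper for the well names used above (not from the paper):

```python
# Convert standard 96-well plate coordinates (8 rows A-H x 12 columns)
# to a 0-based row-major index.

def well_to_index(well):
    """'C6' -> 29, 'A1' -> 0, 'H12' -> 95."""
    row = ord(well[0].upper()) - ord("A")
    col = int(well[1:]) - 1
    return row * 12 + col

# First-dimension well C6, second-dimension well F4:
print(well_to_index("C6"), well_to_index("F4"))
```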



Sunday, September 24, 2017

SugarQB -- Glycoproteomics just got a whole lot easier!

I've had to sit on this one for a couple of days. This is a dilemma for me (despite the fact it took me 10 tries before Google was satisfied with my spelling of the word "dilemma." Not one "n". At all.)

This is it: I LOVE Byonic. Love it. It doesn't show up on the blog a lot, but I think it is some of the best software we've ever seen for proteomics. It has been pigeon-holed (which apparently is a term) as a glycoproteomics tool -- and maybe that alone. It is, however, a REALLY good proteomics search engine. It is also, however, a commercial product, and not every lab can afford the $7k USD or so for it. (Am I allowed to say that? Guess we'll find out...)

Now -- I'd seen some posters online that suggested that my friends at IMP had glycoproteomics tools in the works. This is good for all of us, because to date, IMP has never charged anyone for a piece of software. They are even responsible for the fact there is a free functional version of Proteome Discoverer (which is, btw, off-the-charts awesome! Have I shown data yet? Man, I need some free time -- glad my vacation starts in 42 hours!)

I forgot what I was talking about. And I don't care. But check this out!


Ummm... is that a free glycoproteomics workflow running in Proteome Discoverer 1.4? Yeah... it totally is... and it's no joke. It is seriously, amazingly -- like, I don't want to admit how good it is -- good.


(Did you know Lavar can't say that phrase out loud due to a court order? He can't. If I ever met him, I'd still try really hard to trick him into saying it. I have scenarios planned and everything...)

More importantly to this conversation -- is this new Nature Letter article --


...where they utilize SugarQb to uncover new glycosylation events that occur upon exposure to ricin. This is totally cool, for sure; we should figure out how ricin works in case we make Walter White mad or something. But SugarQb can be applied to any acquired proteomics data and any biological problem -- whether we know the glycosylation pathway of interest or not.

You can get SugarQb at the revamped pd-nodes.org. As far as I can tell, it currently only installs in PD 1.4, but I haven't tried moving the .DLLs over yet.  I'll let you know.

Saturday, September 23, 2017

Quick guide -- is it a peptidase or a protease?


My formal training is in teaching and in microbiology. This whole mass spec thing kinda happened because no one else wanted to do it -- and I was definitely the worst person at the stuff my lab was good at -- so...well...here ya go!

I appreciate the heck out of anything that can clear up technical things for me fast. Especially if I can just link them on this blog so I can find it later.

Thanks to @PastelBio I now have a link to look at the next time I'm considering using peptidase -- when I really mean protease -- as well as the different kinds of each.

This is courtesy of DifferenceBetween.Com and you can find this here! 

Wednesday, September 20, 2017

APOSTL -- A staggering number of Galaxy AP-MS tools in a user friendly interface!


Thanks to whoever sent me the link to this paper! That doesn't mean that I'll write about it, btw. However, if the paper leads me to an easy user interface that allows me to use a bunch of tools I've heard about, but all require Perlython or Gava or Lunyx or whatever to use otherwise, there's a pretty good shot!

This is the paper that describes this awesome new tool! 


As far as I can tell, bioinformaticians fall into a couple of different branches. You've got the hardcore computational camps writing all their stuff in Python, Perl, or whatever. And you've got your more data-science people, who seem to be using either R or Galaxy. From my perspective, all the awesome tools have one common denominator when I give them a shot...


Okay...maybe 2 common denominators...


....joking of course! But it isn't uncommon for these shells or studios to require some extremely minor alteration or library installation that is a challenge for me to do. And honestly, as cool as all the stuff in Galaxy looks -- I don't even know where to start with that one.

And here is where we click on the link to the APOSTL SERVER, where all the Galaxy tools for affinity purification/enrichment MS experiments are located in a form we can all use!

APOSTL has a flexible input format. Workflows are already established for MaxQuant and PeptideShaker, but it looks like you'd just need to match the formatting to bring in data from anything else.
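That "match the formatting" step is mostly column bookkeeping. Here's a hedged sketch -- the column names on both sides of the mapping are my assumptions for illustration, not APOSTL's or MaxQuant's documented schemas:

```python
# Remap the header row of a CSV export so a generic results table uses
# the column names another workflow expects. Names here are hypothetical.

import csv, io

COLUMN_MAP = {
    "Accession": "Proteins",
    "Description": "Gene names",
    "Spectral Count": "MS/MS count",
}

def remap_header(csv_text):
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [COLUMN_MAP.get(h, h) for h in rows[0]]  # rename known headers
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

demo = "Accession,Description,Spectral Count\nP12345,GENE1,42\n"
print(remap_header(demo))
```

Check the actual expected headers in APOSTL's documentation before trusting anything like this, obviously.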


I don't have any AP-MS data on my desktop right now (and it sounds like it's still working on a big queue anyway) but I have some stuff that needs to go into this later. I'll let you know how it goes.


...and sometimes Matthias Mann uses a picture from your silly blog!!!


One of the downsides of my current job is that I have to miss my favorite conference, HUPO. Fortunately, though, loads of really cool people are there and I've been able to keep on top of what is happening via Twitter.

...and...yesterday I got this picture that made me feel included and also made me laugh a lot!!

Tuesday, September 19, 2017

Two cool new (to me?) tools I somehow missed (?)

I'm leaving these links here so I don't forget them (again?). They've both been around for a while, and I'm wondering if I forgot, or if our field just has a lot of software!

You can check out MZmine 2 here. (now I can close that tab -- WAY too many are open!)



And Mass++ is here. P.S. PubMed thinks + signs mean something else and doesn't like searching for it as text.

Both are free -- look super powerful -- and are waiting on my desktop for me to bring my PC into a hyperbolic time chamber.

Let's plunder some .PDresult files, matey!

(Wait. There's a guy on our team who dresses as a pirate?!?)

As I'm sure you're aware, it's talk like a pirate day. You can go two ways with a holiday like this as a blogger. You can ignore it completely OR you can make a sad attempt to tie it in with what your blog is about. I, unfortunately, chose the latter.

Recently, I've been helping some people with some impressively complex experiments. The days of "how many proteins can you identify" are just about gone. The days of "how does this glycosylation event change globally in relation to this phosphorylation event and how the fourier are you going to normalize this" Arrr upon us. 

The Proteome Discoverer user interface has gotten remarkably powerful over the years. However, I imagine the developers sit back and have meetings like: "we have this measurement that is made as a consequence of this node's calculation, but I can't imagine a situation under any circumstances where someone would want it." To keep from overwhelming us with useless measurements, they don't output some of them.



.MSF files and .pdresult files are really just SQLite files in (pirate? groan....) disguise. DB Browser uses virtually no space and can pillage these files and reveal all the behind-the-scenes data.

For this reporter quan experiment, I can get to:  


78 different tables! Add more nodes into your workflow and there are more! You can get in and pillage the files for tables you can't access otherwise.
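If you'd rather pillage programmatically, Python's standard sqlite3 module can pull the same table list that DB Browser shows you (the file name in the comment is a placeholder, not a real file):

```python
# List every table in an SQLite file -- which is all a .msf or
# .pdresult file is under the hood.

import sqlite3

def list_tables(path):
    """Return the names of all tables in the SQLite file at path."""
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]

# e.g. tables = list_tables("MyReporterQuanStudy.pdresult")  # placeholder path
```

From there it's ordinary SQL to pull whatever the UI doesn't export.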

Is this useful to you? Maybe if you're doing something really weird. If the weird thing you are doing is really smart, you could also make a suggestion to the PD development team to include it in the next release.  In the meantime, maybe this will do, ya scurvy dog (ugh...sorry...)

Monday, September 18, 2017

Metabolic labeling for tracking acetylation dynamics in human cells!


I was going to call this "new strategy" in the title, but I wasn't 100% sure on that fact. I just know that I've never seen anything like this!



The strategy involves using heavy labeled acetate and glucose -- metabolic labeling techniques that are normally reserved for advanced metabolomics experiments -- but here they are used to track acetylation in the proteome!

One of the advantages of using the heavy labeled acetate media is that you pretty much know where that label is heading -- through acetyl-CoA to protein acetylation (sure, it can go other places, but that's where proteins are going to pull from). The first observation this awesome study provides: HOLY COW, HISTONE ACETYLATION is FAST!

Not phosphorylation fast -- but still freaky fast. Full turnover, from de-acetylation back to re-acetylation, in around an hour. Maybe that doesn't sound fast until you remember what histone 3D structure looks like.

(Borrowed from MB-Info here)

Histones aren't just hanging around in linear form where their acetylation sites can be easily accessed by acetyltransferases. They're balled up in complexes, and I'd expect alterations to happen on a more glacial time scale because of it. Sounds like whatever the acetylations are doing in there is seriously important, because a lot of energy is being used to get them in and out of there.

Super cool new (at least to me) methodology and seriously interesting biological implications make this a great Monday morning read.


Sunday, September 17, 2017

What 2D peptide fractionation technique yields the highest number of IDs?


The field of proteomics has changed at a dazzling rate in a remarkably short period of time and it's an absolute challenge to keep up on instruments, software and methods. Separation science is also evolving and it's yet another factor to try to keep up on.

During my postdoc, methods for in-solution isoelectric focusing of peptides were the cutting edge, and that's what I used for all of my 2D fractionations. 7-8 years later... not so much... I've seen one paper that used this separation methodology this year -- and I do watch out for it.

What does the current literature say about the best offline fractionation techniques for shotgun proteomics? I know a lot of my friends are using high-pH reversed-phase offline fractionation. But -- are they doing it for the same reason I was using IEF? Because it's the cool thing right now?

(Did you know you can install this button in your browser window in Chrome?)

The first paper Scholar directed me to is this short review on the topic:


For studies where they have lots of sample (>10 µg total peptides), these authors report dramatically higher peptide IDs (80% more peptides) when using offline high-pH reversed-phase fractionation rather than offline SCX. Interestingly, they report that the desalting step post-SCX is a major point of sample loss, with up to 50% lost in their typical protocol.

However, they do find online SCX (MudPIT-type methodologies) more sensitive when they are limited to less than 10 µg of total peptides.

This review spends a lot of time stressing the importance of concatenation techniques. Its one shortcoming is that it doesn't give me much to work with in terms of technical details -- resins and gradients and so forth.
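Since concatenation comes up so much here, it's worth noting the logic itself is one modulo operation: pool early, middle, and late fractions together so each concatenated fraction spans the whole separation. A sketch (24 fractions into 6 pools is just an example, not a number from the review):

```python
# Concatenation pooling: fraction i goes into pool i % n_pooled, so every
# pool contains fractions spread across the whole hydrophobicity range.

def concatenate(n_collected, n_pooled):
    """Return n_pooled lists of 1-based fraction numbers."""
    pools = [[] for _ in range(n_pooled)]
    for i in range(n_collected):
        pools[i % n_pooled].append(i + 1)
    return pools

for k, pool in enumerate(concatenate(24, 6), start=1):
    print(f"pool {k}: fractions {pool}")
```

Simple on paper; the pain-in-the... part is physically combining all those wells without losing sample.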

However, it appears that all of these details can be found in this study:



The second paper Scholar directs me to is a phosphoproteomics one --


I know this group has been primarily using high-pH reversed-phase (which they usefully abbreviate HpH), but I hadn't seen this technical note.


In terms of phosphopeptide identification, the work seems quite clear cut. Wow. Does the concatenation ever look like a pain in the...


Okay -- these studies have all compared HpH to SCX. What about isoelectric focusing (IEF)? Scholar?

First study that pops up doesn't have a very pro-IEF title...


I have to say, however, that the results aren't quite that drastic. HpH does outperform IEF in every way, but it isn't a night-and-day comparison. Interestingly, the overlap between the two techniques isn't huge. They are sampling a pretty small percentage of the proteome (<3,000 proteins), so they're getting some interesting variation based on stochastic sampling of the whole.

This paper really goes beyond just a comparison of these two techniques. There are some really insightful charts showing peptide/protein distribution and relative protein coverage. Not that the other papers I link in this post aren't worth checking out -- but if you are interested in peptide chemical properties, this is a seriously interesting read.

Whoa...I've read too many papers for a Sunday morning. Time to get out and do something!

(No -- Ben -- do not investigate the impact all this concatenation has on LFQ....do that later...)

HUPO 2017 Kickoff!!

SUPER JEALOUS of everyone at HUPO 2017 in Dublin. Definitely my favorite conference, and the second year I haven't been able to go due to how crazy September is for my day job.

If you are thinking to yourself -- "I probably won't Tweet this. No one will read it anyway." You're wrong!

Saturday, September 16, 2017

msVolcano -- Visualize quantitative proteomic data!


Want to visualize any quantitative proteomic data out of any platform or processing pipeline? Want that output to look like a scientist did it?

Check out msVolcano! 


This should not be confused with Ms. Volcano, which I just discovered is a pop song that I could not handle in its entirety... shudders....

While optimized for MaxQuant output, it sure looks to me like you'd just need to move some columns around if you are using Proteome Discoverer or other software packages... but I haven't verified.
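Under the hood, a volcano plot is just two transformed columns plus a cutoff rule. Here's a plotting-library-free sketch with made-up numbers -- msVolcano's own defaults, column names, and statistics will differ:

```python
# Volcano plot coordinates: log2 fold-change on x, -log10 p-value on y,
# and a simple significance call from a fold-change and p-value cutoff.

import math

def volcano_points(fold_changes, p_values, fc_cut=2.0, p_cut=0.05):
    """Return (x, y, is_significant) triples for each protein."""
    points = []
    for fc, p in zip(fold_changes, p_values):
        x = math.log2(fc)
        y = -math.log10(p)
        significant = abs(x) >= math.log2(fc_cut) and p <= p_cut
        points.append((x, y, significant))
    return points

# made-up fold changes and p-values for three proteins:
pts = volcano_points([4.0, 1.1, 0.2], [0.001, 0.8, 0.01])
print(pts)
```

Hand those triples to any plotting tool and you have the "scientist did it" figure.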


While it definitely seems valuable for any dataset, the authors stress the power this workflow has in terms of affinity purification and affinity enrichment experiments!


Friday, September 15, 2017

Make a histogram of up to 300 masses in 10 bins in 5 seconds!


Feel free to react like Miss Jones above because, yes, this is totally the laziest thing ever. However -- if you need to make a histogram in 5 seconds and you aren't real concerned about what people think about you or your computational skills -- I present the Social Science Statistics Histogram tool!


Put up to 300 masses in the box -- and BOOM!


Laziest mass distribution histogram of all time!
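And for completeness, the even-lazier-but-offline version: the same 10-bin histogram in a few lines of standard-library Python, so you don't even have to paste into a web form. The masses here are made up:

```python
# Equal-width binning: count how many values land in each of n_bins bins
# spanning [min, max]; the max value is clamped into the last bin.

def histogram(values, n_bins=10):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1   # avoid zero width if all values equal
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

masses = [100.1, 102.4, 150.7, 151.2, 152.0, 198.9, 199.5, 200.0]  # made up
print(histogram(masses))
```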

I made an assertion today regarding small molecule quan, and a much smarter person challenged that assertion. This little tool proved in 5 seconds (well, maybe minutes, counting cutting the data from the CSV file... that I opened in Excel first... which I hope elicits more eyerolls...) that I was completely wrong!! I'd based my entire mental framework on what I know from HRAM peptide quan and one small molecule anecdote from a study I worked on last year. Is there anything cooler than finding out you are completely wrong about something? It made my day for sure! Sure, there are 100 ways to make a better histogram, but I'd argue there isn't a faster one.