Saturday, February 24, 2018

STOP. Do not phosphoenrich another sample till you check this out!!

WHOA!!! Some of y'all have spent a good part of a million bucks to double your phosphopeptides. What if you could just about do it for free?!?! What if it just required switching your solvents around and adding an enzyme to degrade the DNA/RNA? That's what they show you can do here!

Okay -- so they only increase the numbers by like 50%. But that's still a lot!!

What you're doing is just switching up the protocol to drop the amount of crap that is also sticking to your enrichment column.

By UV it looks like this:

A has a lot of crap! B has WAY LESS crap! And that's all there is to it: your instrument spends less time trying to sort out your already-difficult-to-ionize-and-fragment phosphopeptides from a bunch of other stuff, and you get more IDs.

As a side effect, maybe missing all that junk is a great way to further preserve column life and keep the instrument cleaner longer.

Even if it doesn't? 50% more phosphopeptides!!!?!?!?!

Tracking metabolite fate with the Colon Simulator!

The importance of mammalian microbiomes is something that just can't be overstated. The genomics people haven't been making it up and we're seeing more and more metabolomics stuff to back it up. Unfortunately it can be a little hard to study what all those things are doing in the context of shifting conditions between the mammal's diet and a zillion other external conditions. Maybe variables need to be minimized.

Solution? The colon simulator! 

You can't just put a mixture of the ten most common gut bacteria in a flask, rotate it at 37C, and expect it to simulate physiological conditions. The colon simulator attempts to recreate something closer to what is actually going on. In this study they use it to track labeled polydextrose (that's the stuff in just about every boxed/canned product labeled "low sugar". It doesn't take much time on Google to find that there are some well-known side effects of polydextrose -- rather than following some forum posts with some very immature statements in their titles down some sort of a rabbit hole, I decided to consult Wikipedia. There I learned that these statements are backed by science, because polydextrose can cause 10x more flatulence than naturally occurring fiber in some people).

If this isn't enough to ban this compound outright, we should definitely be studying it!

These authors put 13C-labeled polydextrose in their simulator and track the heavy breakdown products. They use 2 NMR systems as well as an LC-MicroTOF that they operate at 1Hz (presumably to obtain the highest resolution possible? or for massive scan averaging to simulate resolution?).

The output is really interesting and visually very nice to look at, but you'll have to check it out yourself. How I'm interpreting it: the bacteria aren't doing anything at all uniform with this carbon source. They're each utilizing it -- and its new breakdown products -- in different ways based on their own metabolic processes and relative numbers (explaining the individual variation in processing polydextrose?), and the more we can learn about it, the better.

Have a great weekend!

Thursday, February 22, 2018

Structural prediction of protein models using distance restraints!

Amateur hour is over for structural proteomics, yo'.  Time to take the formaldehyde and the guesswork and get off the stage.

This is how you do it. Step by step. Reagents, mass spec settings, free software. Everything is in here. Okay -- actually -- you have to also have this paper (this is where the mass spec data came from) and THEN you have the entire workflow.

An MS-cleavable crosslinker was utilized (of course), but the gem here is the downstream analysis that takes your data a step forward. You go from "this peptide is crosslinked to that peptide" to "holy cow, this is the way this protein is folded, or this is how these two are linked".

As someone with a bunch of this planned in the near future -- this Protocol couldn't have come at a better time. Now I just need someone to install all this software....

Wednesday, February 21, 2018

DALEX: Take a look behind the scene in machine learning!

I probably don't need to tell you that people love throwing the term "machine learning" around these days. Heck, we're at a point now where some percentage of them aren't just saying it to sound smart and actually know what it means. (I'm not one of them) 😈 (Is that an evil cat Emoji? Yeah!)

A big problem with the actual machine learning algorithms that are real things actually doing computations on computers is how black-boxy they've become. A lot of it is one algorithm built on another and another, and modern PCs just have the firepower to run all of them.

Now you have a new problem. What are they doing to your data back there!?!  And if they are screwing it up, when would you know? In 2 weeks when all the calculations are done? Or when someone does a follow-up to your study and the prose has a condescending tone (that you probably imagined...😈...)

DALEX is an attempt to figure this out. It is a project by a bunch of bored mathematicians called MI^2 (this is pronounced "Am I square").

You can read about this project here.

Shoutout to Dr. Norris (someone who does fall in the group of people who knows what machine learning is and how to use it) for this cool link!

Tuesday, February 20, 2018

Metal bands OR words that exist in the human protein FASTA sequence?

On a lighter note, I just saw this on Twitter. These are some "words" you can find in the human FASTA sequences!

I can't find MISERY, but I checked a few. ANGST looks like it shows up in at least 21 human proteins (NIST fwd FASTA was the first one I had available to check). Finally, a good reason to stop being embarrassed about that Goth period you had in high school!
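If you want to play along at home, here's a minimal sketch of the kind of word hunt I did. The FASTA path and headers in the example are made up -- point it at whatever database you have lying around:

```python
# Sketch: count how many entries in a FASTA file contain a given "word".
# The file path below is a placeholder -- use your own FASTA.

def fasta_sequences(path):
    """Yield (header, sequence) pairs from a FASTA file,
    joining sequences that span multiple lines."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            else:
                chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)

def count_word(path, word):
    """Count proteins whose sequence contains `word` (e.g. 'ANGST')."""
    return sum(1 for _, seq in fasta_sequences(path) if word in seq)
```

Run `count_word("human.fasta", "ANGST")` against your favorite database and settle the metal-band question for yourself.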

Do you have a data processing task that sounds impossible? Perseus time!

It's NBA All-Star Weekend here in the U.S.A., and it's a big enough deal that in my rural community we get a school holiday for it. From the sounds I can hear from my yard, I think that nearly all of the local children are celebrating this respite from arithmetic by firing semi- and fully-automatic weapons. I'm exaggerating. I'm sure there are also untrained adults out there with military grade weapons. I like to hope there are two distinct groups of people who go down my middle-of-nowhere dirt road: the group with the machine guns and the group that throws all the "Lite" beer cans out the windows of their vehicles. What can I say? I'm an optimist!

Around all this revelry I somehow have found time to start checking something critical off of my bucket list. And this is to finally take a look at where MaxQuant and Perseus are today.  And...I feel kinda dumb...

I'm going to start with Perseus first. If you don't have this one on your desktop and you have any intention of doing an analysis that is more than peptide ID, you should go here and register (it's free, of course) and get the newest version on your desktop.

Am I always telling everyone to download all sorts of software? Probably. I should justify this.

The current iteration of Perseus can do everything you've ever wanted to do with a complicated proteomics or transcriptomics dataset.

It can process your data through logical and hierarchical filters (and allow you to export your data at every point in the step-by-step process -- NOT JUST AT THE END). Think about how useful this is for a second. If your workflow looks like poop at the end, you can go back through your data manipulations and look at the report at each step. You can find out exactly where you took that beautiful mass spec data and messed it up.

It also allows single step insanely powerful manipulations of your data. Example: Imagine that, out of the sheer goodness of your heart, you have taken on the data processing of a huge clinical proteomics cohort in a virtually unknown disease. Imagine that this study had the most rigorous QC methodology anyone has ever done for a proteomics study (I didn't do that part. holy cow. the team that did is good. wait. this is hypothetical). Also imagine that you have delivered 16 LFQ reports and everyone is really annoyed that you did Control/Disease state, rather than DiseaseState/Control. (It's clinical, this is a bunch of MDs) and recreating those 16 Consensus reports is more than all the goodness that has, or ever will, exist in your heart.

Perseus? Just pull in all the tables for all the values and hit the Transform button. Type 1/x and export the report.
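For the curious, the actual arithmetic behind that Transform button is just a reciprocal applied across the whole table. A minimal sketch outside Perseus -- the column names here are invented for illustration, not Perseus's actual schema:

```python
# Minimal sketch: flip Control/Disease ratios to Disease/Control,
# the same "1/x" transform, applied across a table of rows.
# Zero or missing ratios become None instead of dividing by zero.

def invert_ratios(rows, ratio_key="abundance_ratio"):
    """Return new rows with each ratio replaced by its reciprocal."""
    out = []
    for row in rows:
        r = row.get(ratio_key)
        new = dict(row)
        new[ratio_key] = (1.0 / r) if r else None
        out.append(new)
    return out
```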

I think I literally or figuratively (I get those mixed up) just chose the absolute least powerful thing that you can do with Perseus as an example because it saves me 16 hours of Consensus workflow processing.

What if you have a bunch of SILAC experiments that were done a few years apart and someone realizes that these would be perfect for comparing the light labeled version of 3 of them from the 2011 study to the heavy standards done last year? Sounds like a nightmare, right? There are 10 ways you could do this (PD could do it) but Perseus is actually designed to do it. That's kind of what it is for. There are tutorials specifically made to address this!

If you are thinking -- "wait. aren't you really hard on MaxQuant and Perseus in this blog?" Yeah. Totally. I can't remember even 1% of what I've written on this site, but I think that all of the criticism has been regarding how challenging the software is for beginners or for simple experiments. My first favorable comparison of the two software packages was when PD 1.2 (I believe) could get me the same results the version of MaxQuant did at the time but could do it with a simple saved template that I could generate results from just by hitting the "Play" button. PD has grown up a lot and it is the software I will go to every time (my lab has like 7 licenses and Mascot! w00t!). But if you have something nuts --like -- absolutely nuts -- you may enjoy your life a lot more if you go to software that can do something like this.

This is a multiscatter plot showing the Spearman correlation coefficients for the quantification of 9 different cell lines versus one another. The coefficient is overlain on each plot, and the orange is the visualization of one set of proteins selected in a single plot -- carried over to show where that set of proteins is present in EVERY OTHER SAMPLE SET. Is there a set where your proteins of interest are not showing up in the low ratio range? Easy to find that plot, highlight it, it becomes the active plot, and then you can examine them manually.
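If you're wondering what's being computed behind those plots: a pairwise Spearman coefficient is just the Pearson correlation of the ranked intensities. A dependency-free sketch (sample names are invented, and ties are ignored for simplicity):

```python
# Sketch of the pairwise Spearman matrix behind a multiscatter plot.
# Pure Python so there are nothing to install; ties are not handled.

def _ranks(values):
    """Rank positions of each value (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = float(rank)
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def correlation_matrix(samples):
    """samples: dict of sample_name -> list of protein intensities."""
    names = list(samples)
    return {(a, b): spearman(samples[a], samples[b])
            for a in names for b in names}
```

In real life you'd reach for `scipy.stats.spearmanr` (which does handle ties), but this is the whole idea of those 9x9 grids.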

Now -- I have to be honest. I haven't done these plots. I stole them from last year's MaxQuant summer school lectures. But -- this is important -- I'm giving it a go right now -- and I'm just feeding Perseus PD data. I want to do something that is tough and time consuming in PD, so I'm just feeding it into Perseus. Oh -- and I'm also giving Perseus transcriptomics data, too. Cause Perseus doesn't really care what it's looking at, so long as you tell it the right format!

If I convinced you to also give up your next holiday to learn Perseus, I recommend you take the time and start here.

Part II is here:

And Part III (my personal goal for today): This is the video where Dr. Tyanova shows all the clustering!!

As an added bonus, Dr. Geiger is really funny. You have to really be paying attention to catch it and I suspect if you are replaying the video and pausing it while trying to replicate her live data manipulations it's easier to catch her subtle jokes than if you are sitting in the audience. Or the summer school participants are just really serious (as they should be). You may find yourself looking around and wondering why no one else laughed and then realize you're in your office and there is just a sleeping dog and it's 5pm and you haven't had breakfast and maybe low blood sugar makes you laugh at things no one else laughs at. Who knows? I prefer to think that Dr. Geiger is really funny.

Yes, I just suggested you watch 3 hours of videos and work along with these awesome operators to learn Perseus. This much power doesn't come for free! There are other resources as well: this great recent paper, plus great focused tutorials and (non-video) use cases here.

Monday, February 19, 2018

Time to plan your summer European vacation around these amazing meetings!

Everyone in proteomics should be in Europe in July! Let me help you plan your vacation.

First stop:  July 2-6 -- Zurich

For the DIA/SWATH course. You can register here until March 31st. There is no intro material. This is for mass spectrometrists. Been doing DDA for 10 years and want to see what DIA can do for you? This is your stop.

Stop #2: July 8-13 -- Barcelona

Want to become an expert in the world's most powerful quantitative proteomic packages? This is how you do it. MaxQuant and Perseus for 5 full days with amazing speakers who design the stuff and power users who influence the designers. You can apply for a spot here.

Stop #3: July 16-20 -- VIENNA! 

EuPA's Advanced Practical Proteomics returns -- this year in Vienna. If you aren't familiar, just check out the amazing lecture material produced from the 2017 academy.

SUMOylation, glycoproteomics, big data, proteogenomics, PTMs I haven't heard of. You can get all these 2017 lectures here.

You can register for this event here. While I'd love to get to all 3 this July, I can only realistically do one, so it's Vienna in July for me!  I cannot wait!!

Stop #4: July 27-28 -- Oland, Sweden!

Heavy metal festival on an island in rural Sweden?!? What?!?  IN FLAMES plays both days? Wait! Wrong blog! How much vacation time do I get again...?

Sunday, February 18, 2018

More details on our optimized gradients for C-18 PepMap

Thanks for the emails!  I legitimately love to receive them. Even if it starts with "you're an idiot!" Which, honestly, is reasonably rare. 

This is a follow-up to a recent post where I talked about how I'd been totally messing up my LC separations by treating C-18 PepMap like other resins.  I know I didn't provide enough details, but this has been a work in progress as we fine tune our instruments to be able to best handle the intimidating number of projects we've got going on.

After messing around with 6 EasyNLCs (1 Easy2, 2 Easy1200s and 3 Easy1000s), this is what appears to be the best separation I can get on the 15cm 2um columns in 120 minutes.

Please note: Recent versions of the EasyNLC user manual recommend using no more than 80% acetonitrile in Buffer B to protect your pump and valve seals. We switched all of our systems to 80% just last week. With 6 EasyNLCs running around the clock --- even a 1% increase in pump seal life will result in at least a 40% decrease in the number of loud profanities coming from this one weird bald guy in the lab.  Also worth noting -- these LCs use viscosity and temperature and some other stuff to determine flow rates. Don't change your solvents without consulting your manual (get the newest one online. there have been significant revisions over the years!), and don't trust what weird people post online on Sunday mornings.

Buffer A is 0.1% formic acid in 100% LC-MS grade water
Buffer B is 80% MS-grade acetonitrile, 20% LC-MS grade water, 0.1% formic acid.

This is with a 2cm C-18 PepMap trap column in line. One with a surprising amount of dead volume. More on that when I have more data.

I'm actually using 500nL/min when I get to the high organics just to flush things out, but I don't like how the graphic represents it.  Translation: I am displaying a method image above that is actually incorrect, purely for my own personal sense of aesthetics...

What happens if I don't ramp up the organic pressure/flow rate? I get some trailing of some extremely hydrophobic peptides.

I know that is hard to see, but if you click on it it should expand. This is a human cell digest from Thermo with the PRTC peptides spiked in. The bottom is the most hydrophobic peptide of the 15, and the middle frame is peptide number 13. You can see that there is still some signal when the method cuts out at 130 min. There really are some peptides coming off at the end there.

If I up the flow rate on high organic, it trails off a good bit better and it sharpens the peak shape on my latest eluting standards.

I really dislike the boring 0-10 minutes at the front. We're running some tests in this weekend's queue, but on one run we've been able to negate it almost completely by using an alternative NanoViper trap column with significantly lower dead volume. I'll share those details when I get them.

For less complex samples we're getting the best separation by mimicking this gradient and ramping to 24% buffer B in 40 minutes and 36 in an additional 15. I brought some RAW files home, but I can't find the funny cord for my portable hard drive. I'll post later if I can find the stupid cord.
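For anyone sketching out a method file, the %B at any point in a segmented gradient like that is just linear interpolation between breakpoints. A toy version using this post's numbers (the 2% starting B is my assumption, not something stated above -- check your own method before trusting any of this):

```python
# Hedged sketch: %B over a two-segment gradient (24% B at 40 min,
# 36% B at 55 min). Breakpoints below are illustrative only.

GRADIENT = [(0, 2), (40, 24), (55, 36)]  # (minutes, %B); 2% start is assumed

def percent_b(t, gradient=GRADIENT):
    """Linearly interpolated %B at time t between gradient breakpoints."""
    if t <= gradient[0][0]:
        return gradient[0][1]
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return gradient[-1][1]  # hold final %B after the last breakpoint
```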

Why aren't all USB cords standardized yet?!? It's 2017, for crying out loud.

EDIT: *2018

Saturday, February 17, 2018

Spectral accuracy of an Orbitrap using Isotope Ratios

There has been this myth out there for years that Orbitraps aren't good at isotope ratio analysis.

Okay, it might not actually be a myth. The original Orbitrap and maybe the Orbitrap XL didn't perform very well against a time of flight (TOF) instrument in a study operated by a TOF manufacturer.

Fast-forward 11 years or something and take a look at this newish study. 

Head to head comparison -- a Q Exactive Plus operating at 140,000 resolution (doesn't appear to utilize the enhanced resolution upgrade) versus an honest to goodness isotope ratio mass spec.

How's it do?

Really well. Honestly, better than I expected despite my borderline obsession with these instruments, with an important caveat or two:

There is a dynamic range where the isotope ratios are spot on. It looks to me from the charts in the paper that if you are above 1e5 counts you're in the clear. Drop below that line and things get wobbly.  Stay above it? And you can just look at the isotopic distribution and count the number of carbons, nitrogens and sulfurs in your molecule without any additional information.
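Here's the back-of-envelope version of that carbon-counting idea: the M+1/M intensity ratio is roughly the number of carbons times the 1.07% natural abundance of 13C, if you ignore the smaller 15N/17O/2H contributions. This is my simplification, not the paper's method, and it only makes sense in that >1e5-counts regime where the ratios are accurate:

```python
# Back-of-envelope carbon counting from an accurate isotope ratio.
# Ignores 15N, 17O, and 2H contributions to the M+1 peak.

C13_ABUNDANCE = 0.0107  # natural fraction of 13C

def estimate_carbon_count(m_intensity, m_plus_1_intensity):
    """Estimate the number of carbons from the M and M+1 peak intensities."""
    return round((m_plus_1_intensity / m_intensity) / C13_ABUNDANCE)
```

For glucose (C6H12O6) you'd expect an M+1/M ratio around 6 x 1.07% ≈ 6.4%, and the estimate comes back as 6 carbons.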

Caveat 2 (I'm adding this one) the mass cutoff of the instrument. Resolution decreases as m/z increases in the Orbitrap so in the low range everything is just incredible, but then you hit the low mass cutoff at 50 m/z....we've got some molecules in the 40 m/z range and it is soul crushing to have to run those compounds on the TSQs...

Is a Q Exactive going to beat a dedicated IRMS instrument? No way. Could you use a QE as a high throughput screening device to determine if compounds needed to be sent off for isotopic determination on an IRMS? Absolutely! As long as it's in range (and in this paper, they never go over 1,400 m/z): 50 m/z < your compound < 1,400 m/z

Friday, February 16, 2018

Systematic analysis of protein turnover in primary cells!!

Edited 2/16/18 for unnecessary profanities, but -- You've still got to check this out!!

What has proteomics done for biology lately? You mean today? Well -- how 'bout figuring out the protein turnover dynamics for over 9,000  (!! over 9,000 !!!) different proteins in B-cells, Natural Born Killer cells, neurons, monocytes, and hepatocytes!?!?

Why is it important? Proteostasis is a critical component of understanding mammalian biology and perturbation of the natural processes is the center of many diseases. Also, all of the cells this study works with are important in their own right.

This dynamic SILAC approach used in the paper is a major improvement over anything I've seen before on this topic -- they can assess protein turnover in a huge dynamic range from a time perspective, assessing proteins that have half-lives as short as 10 hours or as looong as 1,000 hours!
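To make those half-life numbers concrete: in the simple first-order model behind dynamic SILAC, the pre-existing (light) form of a protein decays as exp(-kt) during the label chase, so t1/2 = ln(2)/k. A sketch of that arithmetic (this is the textbook model, not the paper's exact fitting procedure):

```python
import math

# First-order turnover model: fraction of old (light) protein remaining
# after a chase of t hours is exp(-k*t), so k = -ln(fraction)/t and
# the half-life is ln(2)/k.

def half_life_hours(light_fraction, chase_hours):
    """Half-life from the fraction of old (light) protein left after the chase."""
    k = -math.log(light_fraction) / chase_hours
    return math.log(2) / k
```

So a protein that's still 50% light after a 10-hour chase has a 10-hour half-life, while one that's 99% light after 10 hours is up near 700 hours.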

This study has "systematic" in its title. That translates here into "a ridiculous amount of work". Just when I think I've got this under wraps and can't get any more impressed, I realize that they also apply this turnover analysis to protein complexes. How do all the proteins that construct the nuclear pore complex cycle through degradation/replacement in every one of these cell lines? Oh. Like this.

Yes, I know I get excited about a lot of stuff that I read. My basal state probably appears to exist somewhere between "in complete awe" and "totally blown away" to outside observers, but this study is biology-textbook-altering level stuff and I'm having a lot of trouble putting it down and going to work this morning.

Important note -- This study also features major improvements on the already awesome isobarquant Python software package that can be directly downloaded from links within this great paper.

All the RAW and processed data is available. Since it is 571 RAW files (!!!) and I've already used 2 screenshots from this paper...

Oh no. I'm not done yet. (What time is it?)

In this study the authors use a Q Exactive (Plus, I think) and if you are going through the methods you'll notice that they use a higher MS/MS target value than I bet you're using. 1e6.

Now. Let's think about this one for a second. Why do we keep our target values lower? Because we're cramming a lot of positively charged things into a little tiny space (the C-trap and Orbitrap). I wrote Dr. Makarov once for advice on maximum ions for SIM scans and he said that I would start to see space charging if I went above 5e4 (I can't remember what instrument I was on, but I promise you I kept that email. I didn't frame it or anything. I'm not that weird.)

But that is a SIM scan, right? My only proof that my ion is my ion is that it has a perfect mass and isotope distribution. Any shifts from that and I'm in trouble (some FDA pesticide assays on SIM scans require <1ppm mass accuracy for positive ID). This is an MS/MS scan. We're using a much more wobbly tolerance, typically allowing 0.02 Da.

If I go to my new favorite bookmark, the RedElephant, it tells me that on a 200Da fragment ion, a 0.02Da shift is 100ppm!  Even when these guys purposely tried to space charge an Orbitrap I think they couldn't force it to get 30ppm out on a SIM scan (I forget all the details and I really should go to work sometime...)
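That arithmetic is worth having as a two-liner, because I seem to re-derive it about once a week. Sanity check: 0.02 Da at m/z 200 really is 100 ppm:

```python
# Da <-> ppm conversion at a given m/z: ppm = delta_da / mz * 1e6.

def da_to_ppm(delta_da, mz):
    """Mass tolerance in ppm for a given absolute Da tolerance at m/z."""
    return delta_da / mz * 1e6

def ppm_to_da(ppm, mz):
    """Absolute Da tolerance for a given ppm tolerance at m/z."""
    return ppm * mz / 1e6
```

Note the same 0.02 Da window at m/z 2,000 is only 10 ppm, which is why Da tolerances are so much looser at the low-mass end.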

So...if we couldn't possibly space charge our ions out of whack why wouldn't we go for a higher target value for our MS/MS ions? Sure, maybe it is overkill, but if there is no downside?

Okay, so check this out. They didn't just go for 1e6 without evaluating it. They've thoroughly vetted this target level. And it isn't a good idea for reporter ions where you need perfectly focused fragment ion masses.

(I need to read more ACS)

But it appears just fine for everything else. You bet your sweet peppy I'm going to run some head-to-head comparisons as soon as I find some open time on one of our instruments.

Thursday, February 15, 2018

SUMO is back and this is how you identify substrates and partners.

SUMOylation is a PTM that I generally try to forget about, because

1) I don't know how to identify it and
2) I don't know what it does, except...well...if it messes up it's probably bad.

Solution? This great new paper in Press at MCP! 

SUMOylation is a protein-type PTM -- something like ubiquitination, but without the handy-dandy lysine on the third amino acid post-protein binding. We'll probably find out tomorrow, if we haven't already, that this isn't true -- but, ubiquitination means DESTROY THIS PROTEIN IMMEDIATELY.  SUMOylation on a protein -- well...I don't think we're entirely sure, but if the last couple amino acids aren't cleaved off, it doesn't do anything.

This study solves problem #1 for me. These authors describe an enrichment procedure for SUMOylation that they can control in fine detail using Flag tags and stuff. Even better? They develop a protein microarray for SUMOylation substrates!  YES.

Potential collaborators of the future, "What about SUMOylation?"
Ben, "Great idea! Here are the names of 18 people in beautiful Baltimore who know how to do this with protein microarrays. I even know a couple of them. Let's go visit and get a Natty Boh, hon!"

Hey, if you can monitor a variable and complicated PTM with an array, I say do it with an array. And if you aren't interested in sites or are doing exploration, this great paper walks you through the molecular techniques to get a global picture of this weird and complicated modification.

Wednesday, February 14, 2018

phpMS -- More new easy to use proteomics tools online!

This is more like it! 

Powerful tools?
Easy and super simple to use?
Free webserver hosting it?
A random red elephant?

Check check check check!

The tolerance calculator is AWESOME. How often have you immediately wanted to know exactly what the Da or millimass unit tolerance is when you've been thinking in parts per million (ppm) all day? Type it into the box!

In silico digest online or predict your fragments!  Sure, the Protein Prospector has been able to do this for 20 years, but...and I mean this in the most respectful way possible...I use the Protein Prospector at least once a day right now, and it's never been the most user-friendly tool ever written.

The red elephant is a really nice thing to bookmark when I need a simpler answer without the raw power of my hillbilly friend.

Like most tools people are developing these days, this thing has way more power than the smaller functionalities I seem most impressed with here. I'm just talking about the neat little tools that are going to make my life easier and my day maybe a little more productive.

Tuesday, February 13, 2018

Oh no. Batch effects in proteomics?

I'm stuck behind a paywall so high that I can't even see a thumbnail of this paper's main figure -- or an abstract that will tell me if this is proteomics related.

However, Google flags the term "proteomics" in the super secret hidden text of this study and it makes me think that they address things I'm concerned about in this study.

While I'd like to ignore it because of the crappy abstract, the title says it's something that I'd really like to read...

I can't recommend you check it out, but I'm leaving it here so I don't forget to download it when I'm on the other side of the wall.

Zotero -- Open Source Citation Software!

Supposedly my new job has access to EndNote. I can't figure it out and calling my IT help desk only provided me with something completely unrelated in consensus reality.

Then Reddit suggested that Zotero is better anyway! I can't say for sure yet, but it appears to be 1) free, 2) well supported, and 3) loaded with the correct citation formats for the target journals of the open things on my desktop.

Monday, February 12, 2018

Target decoy methods for spectral library searches!

Could you use 18% more peptide IDs with your normal confidence level (assuming a 1% FDR)?

Wait. I can one-up this. Could you use 23% more IDs at a 0.1% FDR?

Is it finally time for us to seriously look at spectral libraries again? I'm gonna say it is. I also think this paper is another good argument to support it. 

If you've also been doing this a while you might think "hey, we tried this, but the libraries weren't good enough"

Okay -- so there are some people, I think they're mostly in Germany (but we won't hold that against them) and they are synthesizing something like every human peptide. If you haven't seen it, I recommend you check it out here.

NIST hasn't been sitting around doing nothing all these years, either. They've already made huge libraries of increasing quality -- and they've already made spectral libraries of the first ProteomeTools release.

You can check out what NIST has available these days here.

Does anyone know if MSPepSearch works in the IMP-Proteome Discoverer? I'm trying it out, but I won't know for sure until the demo key expires in 54 days....