Saturday, August 27, 2016

hEIDI: downstream processing for massive Mascot-processed data sets.

Does your data processing revolve around a central and powerful Mascot server, but you want to compile massive datasets in a streamlined manner?

Maybe you should check out hEIDI. It is described in this new paper in JPR from A.M. Hesse et al.

To be perfectly honest, at first glance it definitely looks like you'll need a good bioinformatician to put this together, even from the downloadable tools here. (I tried, but very rapidly got to a window that seems to assume a lot of pre-existing knowledge. I'm gonna assume I'm dumb and not that the authors of the instruction manual left out several critical pieces of information necessary to get going. Honestly the safer assumption!) But the output seems pretty powerful, and it kicks it out in a format that meets the MIAPE standards automatically.

Fading plastic

What's up with that LCQ...or that 3200? Was it always yellow/brown?

Turns out it is a consequence of the plastic that is in use. Check out how extreme the difference is in these Super Nintendos!

There is a cool article on this here.

Friday, August 26, 2016

Ummm...gene name errors are widespread in the scientific literature.... you've probably run into this before. Honestly, I would be really surprised if you haven't, but if this paper is correct, Excel autoformatting is a much bigger problem than I ever guessed...

Ever sorted your output list and ended up with something weird like this?

Umm...why are there a bunch of dates in September at the top of my output sheet? And why do they have quantification values?

Cause if Excel sees the gene identifier for Septin 3, SEP3, it's gonna assume you're just lazy and didn't feel like writing the date out correctly and it'll fix it for you. Which is all well and good, cause septins are super boring and don't really do anything...wait...they're GTPases! Okay, no problem. We'll just put an apostrophe in front of it 'SEP3' and everything is okay. Everybody else do that and we're fine (except...if you have to convert it to text and back and then it does it again).

This isn't the only one. Alexis knew several off the top of her head when ABRF Tweeted this paper yesterday, so it's affecting the nucleotide bioinformaticians as well.

The title is flashy!

Sounds alarmist, right? How bad could it be?  Shockingly surprisingly bad!

They pulled thousands of published papers and supplemental files and looked for genes that had annotation mistakes that could be directly attributed to Excel autocorrects (autocowrecks?). They found hundreds of affected supplemental files per year in the relatively small list of journals they looked at.

The journal average is about 20% or so: roughly 20% of the supplemental data found in leading genomics journals in the last 10 years had some sort of Excel-linked mess-ups in the data. The sample size was smaller and the number of supplemental tables is always bigger in the biggest journals, but the journal with the highest percentage of spreadsheet autoconversion mistakes? Something called "Nature" -- with over 30% of files showing these issues.
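If you want to check your own output sheets for this, here is a minimal sketch of the idea in Python. It is not the script the authors used; the function name and the pattern are my own assumptions, and it only looks for the "3-Sep" style dates that Excel produces from symbols like SEP3.

```python
import re

# Month abbreviations that Excel writes when it turns a gene symbol into a date.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches values such as "3-Sep" or "1-Mar" (day, hyphen, month abbreviation).
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)


def suspicious_gene_names(names):
    """Return the entries that look like Excel-converted dates, in input order."""
    return [name for name in names if DATE_LIKE.match(name.strip())]


if __name__ == "__main__":
    column = ["SEP3", "3-Sep", "TP53", "1-Mar"]
    print(suspicious_gene_names(column))
```

Run it over the gene column of an exported CSV and anything it returns deserves a second look before you submit that supplemental table.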

The solution these authors suggest after all that searching? Databases!

No thanks! I'll turn off all the autocorrect and autofill functions, like this!

Go to File. Options is at the bottom; click it and this window will pop up.

Important: Change your UserName to something funny so that people will see it as the author of any spreadsheet you send them (they'll see it rarely enough that it will stay funny for a while). Then go to the Proofing menu!

Then open the autocorrect options and turn a bunch of stuff off!

If you don't need it, turn it off!

Shoutout to the ABRF forum that turned this up and @ABRF for Tweeting the paper link!

Thursday, August 25, 2016

A researcher's guide to mass-spec based proteomics!

I can say with absolute certainty that I'm not the only person who considers the guy in the picture above to be kind of a personal hero. And not just because other people in that category occasionally send me pics like the one above late at night.

This new paper (Epub ahead of print as of early this AM) does an even better job of characterizing why. While the title might seem kinda dull, the topic of this paper centers on probably the most common conversation I have during my day job (3 times yesterday and twice with people with really impressive titles). That question is -- How do we translate the brilliant stuff mass spectrometrists are doing into what biologists care about?

Honestly, my entire career has been based on being kinda okay at both mass spec and biology and talking a LOT so this is dear to my heart.

How does this paper help?

WILEY DON'T SUE ME!! Please see disclaimer! I'll take it down, but I'm directing you traffic for free.

By breaking this stuff down to basics as shown in the picture above. How clear is that!??!

Is it the most original paper ever written? Nah... But it sure is approachable, extremely well-written, and breaks down a lot of this stuff to more palatable little blocks without leaning heavily on maths maybe us biologists weren't required to take. I highly recommend downloading this paper!

While I'm on this topic, I'm going to self-promote a little. I've prepared a seminar on something similar. You can call it an "Orbitrap physics for biologists day". It originally started as a way to maybe help collaborators or core lab customers at JHU ask better questions, but has grown into something a little larger.

If you are on campus at the NIH, this is 2 metro stops up at Twinbrook lane. We're working on having the meeting recorded/telecasted and made available, but we don't have details yet.

If you are local and want to attend you can register for this meeting here. 

You bet I'll be stealing (er, borrowing and fully citing) some of the beautiful and concise illustrations and explanations from this new paper as improvements to this workshop material!


Thanks to Twitter, I've been able to follow along a little here and there on what is going on at IMSC in Toronto. Though...not thanks to @theIMSF, cause they haven't tweeted in over 4 years.... but #IMSC2016 and @ScientistSaba have been great for conference highlights.

This reminded me to check out the IMSF website, which has some great resources including direct links to at least 30 other international mass spectrometry groups. This is super useful for seeing if you can leverage your 2017 conference budget to help realize your life-long dream of celebrating Halloween in Switzerland...for example... ;)

Wednesday, August 24, 2016

Convert iTRAQ labeled spectral libraries to TMT (or reverse it!)

Yeah! Been waiting to talk about this one!

Paper first: It is from (Jane) Zheng Zhang et al., out of Steve Stein's lab (and some guy named Markey was involved as well). It's in this month's JPR.

Spectral libraries are faster than general search algorithms and they have great levels of confidence. This is because you aren't looking at theoretical spectra generated on the fly. You are either looking at real observed MS/MS spectra or you are looking at spectra that are informed by real-life MS/MS spectra. (There are fully in silico developed spectral libraries out there, too, but those don't count here).

Downside? You have to have a darned spectral library that is relevant to your experiment. For example, you know the hundreds or thousands of iTRAQ labeled cancer cell line runs people have put in repositories? Great data to pull spectral libraries from -- if you are doing iTRAQ. If you want to use TMT so you can have more channels, you're out of luck, cause you can't search that data against any of those libraries...

Until now!

What this study shows is that --HEY! The peptide backbone fragments the same way whether it's TMT or iTRAQ labeled. And if you just pull the spectral library and convert (clean) it to keep the peptides that will stay (and are experimentally observed) and mass shift the ones that will change thanks to the tag, it works almost as well as running a sample with the correct reagent and original library!

Work is going on at NIST to take libraries and convert them for direct searching. They also state in the paper that spectral library searching that can utilize PTMs is in the works! Yeah!
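The mass shift part is just arithmetic, so here is a rough sketch of it. To be clear, this is not the NIST conversion code. I'm assuming iTRAQ 4-plex and TMT 6-plex monoisotopic tag masses (the post doesn't say which reagents), that the tags sit on the peptide N-terminus and on lysines, and the function names are made up for illustration.

```python
# Assumed monoisotopic tag masses in Da (iTRAQ 4-plex and TMT 6-plex).
ITRAQ4_MASS = 144.102063
TMT6_MASS = 229.162932
TAG_DELTA = TMT6_MASS - ITRAQ4_MASS  # mass added per label site when converting


def count_tags(fragment_seq, has_n_terminus):
    """Count label sites in a fragment: every lysine, plus the peptide N-terminus."""
    return fragment_seq.count("K") + (1 if has_n_terminus else 0)


def shift_fragment_mz(mz, charge, fragment_seq, has_n_terminus):
    """Move an iTRAQ-labeled fragment m/z to where the TMT-labeled one should be.

    Fragments that carry no tag are returned unchanged.
    """
    n_tags = count_tags(fragment_seq, has_n_terminus)
    return mz + n_tags * TAG_DELTA / charge


if __name__ == "__main__":
    # A y-ion without lysine stays put; a doubly charged b-ion with one lysine moves.
    print(shift_fragment_mz(500.0, 1, "PEPT", has_n_terminus=False))
    print(shift_fragment_mz(500.0, 2, "PEK", has_n_terminus=True))
```

The reporter ion region would need separate handling (dropping it is the simplest option), which this sketch leaves out.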

Tuesday, August 23, 2016

Double laser dissociation in the HCD cell of a Q Exactive!

All summer, at virtually the same time every single morning the same two questions have occurred to me:

1) How many other people are reading JASMS in their bathroom right now? Guess it depends on the fiber intake of the average mass spectrometrist -- and where they have JASMS delivered. This is probably a question that could be answered with some effort, but I doubt I'll ever make it a priority to find out. A SurveyMonkey and some epidemiology stats could shed some light, but a tracking App of some kind would probably be necessary to get real numbers. Is it worth the time? Or is it better for it to be one of those deeper philosophical mysteries that we shouldn't ever try to investigate because it would reveal more about us than we should ever really know?

2) Why doesn't anyone I know have a mass spec equipped with lasers?  This is the important one.

Cause...guess what?!?!....the cover of JASMS is another group with a....laser in their HCD cell. I know...I promised not to write about any more of these, but this one raised the bar. They put 2 kinds of lasers in their HCD cell and used it to analyze intact ubiquitin.

One laser is high energy and another is low energy, but it appears they can use them both simultaneously.

They read out the fragments in the Orbitrap at 140k resolution with 3 microscans and they sum 50 scans for their fragment analysis. This is, arguably, an awfully long time to accumulate fragments of an 8.5 kDa protein. But this is a fragmentation method development paper, so we don't really mind about that.

How's it do? 84% sequence coverage of the +13 charge state!  Not so bad!

Now...when is someone in Maryland getting a laser HCD cell?!?!

Monday, August 22, 2016

SPECTRUM ANALYZER!!! Pull 43 metrics out of any .RAW file!!!

Is this old news? Maybe it is, but I've never seen it till this weekend. AND...I'm always hearing people say "I wish RAWMeat still worked for these new instruments..."  While many programs will extract the same data out of RAW files that RAWMeat does/did, most of them provide that after you have processed your data.

SpectrumAnalyzer is the opposite. It is a tiny and fast piece of software that pulls the scan header information from any of your RAW files and makes a handy TXT file out of it.

As a warning, it does do this for EVERY SINGLE SPECTRUM, so it looks a little daunting, but you could easily make a little matrix document that bins the results into histograms and produces a pretty output.
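Here is a minimal sketch of that binning idea in Python instead of a spreadsheet. I haven't documented the real SpectrumAnalyzer column layout, so the tab-delimited format and the "IonInjectionTime" column name below are assumptions; swap in whatever headers your TXT file actually has.

```python
import csv
import io
from collections import Counter


def read_table(handle):
    """Read a tab-delimited text table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def histogram(rows, column, bin_width):
    """Count rows per bin for one numeric column; non-numeric cells are skipped."""
    counts = Counter()
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, ValueError, TypeError):
            continue
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical four-scan file; a real one would be opened with open(path, newline="").
    sample = "Scan\tIonInjectionTime\n1\t12.5\n2\t48.0\n3\t51.2\n4\tNA\n"
    rows = read_table(io.StringIO(sample))
    print(histogram(rows, "IonInjectionTime", 25.0))
```

The keys are the lower edge of each bin, so the result drops straight into any plotting tool you like.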

You can get this nice little tool at the IMP page here (go to Tools!)

Sunday, August 21, 2016

Proteomics and phosphoproteomics of Scott Syndrome!

Not the right Scott at all! And much more serious.

Scott Syndrome (I just learned this today) is an extremely rare bleeding disorder. How rare? Well, there is one guy in the UK who is confirmed to have it and is a registered blood donor. They call him ScottUK.

Some work has gone into studying the disease and conclusions have been drawn regarding how Ca2+ channels are involved, but more work needs to be done.

Enter Fiorella Solari et al., and an incredibly thorough proteomic analysis of platelets from unaffected people and ScottUK.

iTRAQ is used
Phosphoproteomics is used
High-pH reverse phase fractionation is used
A Q Exactive does the heavy lifting and
PD 1.4 plus a publicly available Excel Macro from Karl Mechtler's lab (??) for simplifying modified peptide output reports (find it under software tools!) is used to find the cool stuff
PRMs on an Orbitrap Fusion are used to confirm what they found.

See, thorough!

They did this with activated and unactivated platelets AND they also use open searching techniques (with quantification) to study protein cleavage events.

Honestly, I'm unqualified to assess the biology on this, but if you have a woefully under-analyzed disease you wanted to know more about -- this is a great study to show you how to go from little information to tons of information about that disease!

Saturday, August 20, 2016

What MS1 resolution and mass accuracy do you need to be as precise as an SRM?

I have wondered about this for years. I even put together a really poorly planned idea to try and test it myself that was hindered by 1) my lack of access to a triple quad 2) my lack of free time 3) my lack of motivation to spend some of my little free time trying to get access to a triple quad, figure out how to run it and some stuff on it 4) the poorly planned idea was written on a napkin and the poor idea was written in even worse handwriting and the ink was diluted by something that one might guess was a solution of some reasonably high concentration of EtOH buffer.  I KNOW I'm not the only person who has a manilla envelope full of napkins from late meeting hangouts with brilliant people.

What was this about again? OH YEAH!!!

As resolution and mass accuracy increase in an MS1 scan, your certainty that the ion you are looking at is the right one increases. For example, if I shoot an ion through a single quad and I see a peak that elutes at 10.5 minutes and is 312.1 +/- 0.5 Da and I'm looking for an ion that is 311.8413 -- is that my ion? If the matrix is simple enough and the concentration is high enough, SURE! it probably is. But what if that is a HeLa peptide digest? If you submit that as quantification of your ion, that paper is coming back to you. The complexity there (>1M peptides?) is way too high for you to be sure.

What if you are running a super high end TOF instrument that can get true 20,000 resolution? If that TOF comes back with 311.92, is that your ion? Your certainty is definitely going up and your reviewers might actually take that.

SRM/MRMs are the gold standard. You isolate your MS1 mass +/-0.5Da, you fragment your ion, and you collect your single transition at +/-0.5Da. We've historically considered that good enough to be pretty certain you are looking at the right ion. Anyone who has done years of them in complex matrices (ugh...) can tell you that you really hope you get 2 or 3 really good transitions, and with peptides you typically do. Chances are you can tease out what peak is the real one and what is coisolation. Still, though, this is the gold standard. (Also the gold standard because they all use the same expired or rapidly expiring licensed technologies which makes them 1) all the same --85% of your ions hit the detector 2) suuuper cheap today, in mass spec terms.)
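To put numbers on those two examples, here is a small sketch of the usual ppm error and peak width arithmetic. The function names are mine, and the peak width assumes the common definition of resolution as m divided by the peak width (FWHM).

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_width(mz, resolution):
    """Approximate peak width (FWHM) at this m/z, taking resolution as m / delta_m."""
    return mz / resolution


if __name__ == "__main__":
    target = 311.8413
    print(ppm_error(312.1, target))    # the single quad reading from the example
    print(ppm_error(311.92, target))   # the TOF reading from the example
    print(peak_width(target, 20000))   # peak width at 20,000 resolution
```

The single quad reading works out to roughly 830 ppm off and the TOF reading to roughly 250 ppm off, which is a handy way to see how far both are from the single-digit ppm numbers that come up later in this post.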

SO. BACK ON TOPIC. Here is the question. What mass accuracy and resolution do you need in an MS1 before it is as good as or better than a QQQ???? After years of wondering about this I was reading a review before a meeting about something completely unrelated and BOOM!! these guys did it like 2 years ago!!

It's 2016? 3 years ago!!

I think I know all these people....stalked them on LinkedIn just now...I totally worked with this team around 2013 on something completely different. They had the answer to this question the whole time. I've got to work on saying less and listening more. But not today!

Wow. This is rambling Saturday....

What did they do? They took multiple mixtures of increasing complexity and spiked in peptides that didn't belong. The mixtures started real simple and went all the way up to HeLa digest separated on a short gradient. As complex as you can get without getting into the waking nightmare that is plant proteomics.

The peptides were spiked in at varying concentrations. A nice QQQ was used to quantify the peptides as they went. An Orbi Velos and an Orbitrap Elite were used for MS1 scans.

They did some really imposing math stuff and worked on the certainty.

And they found?!?!

(unenthusiastic drumroll, please!!!!!!!)

60k resolution and extracting the exact mass at +/-4ppm is roughly the equivalent of an MRM when the same retention time extractions are used.
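For anyone who wants to try that kind of extraction on their own data, here is a minimal sketch of a +/-4 ppm extracted ion chromatogram with a retention time window. This is not the code from the paper. The scan structure (a retention time plus a list of m/z and intensity pairs) and the function names are assumptions for illustration.

```python
def ppm_window(target_mz, tol_ppm=4.0):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = target_mz * tol_ppm / 1e6
    return target_mz - delta, target_mz + delta


def extract_ion_chromatogram(scans, target_mz, tol_ppm=4.0, rt_range=None):
    """Sum intensity inside the ppm window for each MS1 scan.

    scans: iterable of (retention_time, [(mz, intensity), ...]).
    rt_range: optional (start, end) retention time window; scans outside are skipped.
    """
    low, high = ppm_window(target_mz, tol_ppm)
    trace = []
    for rt, peaks in scans:
        if rt_range is not None and not (rt_range[0] <= rt <= rt_range[1]):
            continue
        summed = sum(intensity for mz, intensity in peaks if low <= mz <= high)
        trace.append((rt, summed))
    return trace


if __name__ == "__main__":
    # Three hypothetical MS1 scans around the 311.8413 example ion.
    scans = [
        (10.4, [(311.8413, 100.0), (311.92, 50.0)]),
        (10.5, [(311.8420, 200.0)]),
        (12.0, [(311.8413, 999.0)]),
    ]
    print(extract_ion_chromatogram(scans, 311.8413, rt_range=(10.0, 11.0)))
```

At m/z 311.8413 the 4 ppm window is only about 0.0012 wide on each side, which is why the 311.92 peak in the first scan gets ignored.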

Could this blog post have been 4 sentences and a link to this nice Open Access paper? Yeah....

Friday, August 19, 2016

The Colorectal cancer atlas!

Is this the summer of the Atlas? It might be.

This one right now is a little more style than substance, but it is a work in progress. The CRCA is an effort by David Chisanga et al., to pull together all the information out there in the colon cancer world. The paper was just released and the web portal went online just recently.

If this is your field, you'll probably want to bookmark it. As all the resources come online this'll probably be something you'll want to come back to. Just give 'em a little bit to work out the bugs. You can visit the graphically pretty interface directly here. 

Thursday, August 18, 2016

Nanospray single run analysis of peptidoglycan!


I'd wondered how long it would be till someone pulled this off!  Bacterial cell walls are sugar chains connected by peptide crossbridges. To get to it you use enzymes that destroy all the DNA and protein and you're left with the cell wall. You chop what is left up with lysozyme, reduce the sugars so they don't all elute in a single peak and then use LC-LC-MS to quantify the compounds and figure out what they are to reassemble a picture of the cell wall.

Marshall Bern (wait, I know him!) et al., show in this new study that they can ditch the first LC part. Just run the digested reduced peptidoglycan out on a single dimensional LC gradient and use ETD and HCD and BOOM!  you're done. (That sentence is too long).

By simplifying it they get to see some new, low-abundance muropeptides (glycopeptides) in C. difficile that they haven't seen before.

The paper points out that the traditional method is slooow. You use a phosphate buffer system to really get very clean separation of your compounds and fraction collect them. The phosphate buffer system is flushed out and then the remainder is analyzed with reverse phase. Yeah. It's slow. But this is the reason they've been doing it this way:

(From a Dave Popham paper. They all look like this.)

Using the phosphate buffer system you can quantify it like it's a NIST standard. Can you get a 4% CV? Heck yeah, you can.
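In case the CV shorthand is unfamiliar, it is just the standard deviation of your replicate measurements divided by their mean, as a percentage. A quick sketch follows; the peak areas are invented numbers for illustration, not data from either paper.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation as a percentage (sample standard deviation / mean)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


if __name__ == "__main__":
    # Hypothetical peak areas for one muropeptide across three replicate injections.
    replicate_areas = [100.0, 104.0, 96.0]
    print(percent_cv(replicate_areas))
```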

Just shooting muropeptides on reverse phase alone as they did in this study?

Great for finding new muropeptides!  Ummm....maybe not so great for quantifying them and ultimately elucidating the peptidoglycan 3D structure? Hmmm....if only there was a quantification technique that didn't care about peak shape...then you'd have the whole package.

But for straight-up discovery of new muropeptides? Single shot 1D ETD/HCD with Byonic processing looks like the solution!

Wednesday, August 17, 2016

In-depth study of protein inference!

August and September are crazy time for my day job so the blog is probably going to be worse than usual for a while. If I write about something during this time it's cause I really really like it.

Case in point:

(Direct link here)

These guys totally had a protein inference party!  Protein inference is the problem that I think is really well described in the image at the top (this supposedly was taken from the wall in a 1st grade class....this is definitely a better school than the one I went to....).

We KNOW the Peptide Spectral Match (probably...). Our search engines are great at that. "This MS1 mass and MS/MS fragmentation of the area around that MS1 mass matches this peptide from our FASTA index".  The tricky part is what protein is present?
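To make the problem concrete, here is a toy sketch of the simplest possible approach: greedy parsimony, where you keep picking the protein that explains the most still-unexplained peptides. This is a textbook heuristic I'm using for illustration. It is not how FIDO, PIA, ProteinProphet or any of the other tools in the paper work, and the protein and peptide names are invented.

```python
def greedy_parsimony(protein_to_peptides):
    """Pick a small set of proteins that together explain every observed peptide.

    protein_to_peptides: dict mapping a protein name to a set of peptide strings.
    Ties are broken alphabetically by protein name so the result is repeatable.
    """
    remaining = set()
    for peptides in protein_to_peptides.values():
        remaining |= peptides

    chosen = []
    while remaining:
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & remaining),
        )
        chosen.append(best)
        remaining -= protein_to_peptides[best]
    return chosen


if __name__ == "__main__":
    # P2 shares all of its peptides with P1, so it adds no new evidence.
    matches = {
        "P1": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
        "P2": {"PEPTIDEB", "PEPTIDEC"},
        "P3": {"PEPTIDED"},
    }
    print(greedy_parsimony(matches))
```

Notice that P2 gets dropped even though it might really be in the sample. That ambiguity is exactly why the smarter probabilistic algorithms exist.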

So this group of slackers did what everyone else would do...who had the technical capabilities of taking most of the protein inference algorithms and then putting them into the same operating environment. They used something called KNIME, which is an open-source workflow platform for chaining analysis tools together. To get everything working together they assembled an OpenMS workflow within this environment. Of course, they made it all available to download on GitHub here (under the really cool name, KNIME-OMICS).

Once they got everything all operating under the same technical conditions and parameters. Wait. Describe everything:

They used the search engines: Mascot, X!Tandem and MS-GF+
And then the inference algorithms: FIDO, PIA, ProteinProphet, ProteinLP and MSBayesPro (I don't know the last 2. No time to investigate)

They picked 4 datasets of varying levels of complexity from public repositories. They range from a yeast digest all the way up to a lung cancer analysis. Then they go to work. I'd like to mention that the paper is really well written. No guesswork regarding what setting they used for which algorithm. Every one I have the concentration (on this little sleep) to really look at is clearly detailed.

Good news for us non-programmers in the Proteome Discoverer world, cause FIDO seems to perform really well in these studies: of all the inference algorithms, it provides the highest number of unique proteins inferred as the number of databases increases. Go FIDO!

Probably the coolest conclusion is something we've probably all observed a little -- that increasing the number of algorithms doesn't always increase the number of proteins inferred at the end. But it does increase the number of peptides which increases the strength and accuracy of our inferences.

 Despite the fact we didn't see Sequest or Comet employed I think we can infer from the other 3 algorithms and strength of observations that what they show here would reproduce well in the most used search algorithms as well.  This is the most thorough and best controlled study I've ever seen on protein inference so I'll definitely take it!

Wednesday, August 10, 2016


It's fo' real, yo!

DISCLAIMER: Alpha is the first version of a piece of software.  It is perhaps not fo' real per se, yo. It may be a while until even external testing release to collaborators.

This post is not purely for bragging about all the cool stuff my computer gets to do while I'm away at work. It is meant to instill confidence that some of the cool features you may have seen or heard about at ASMS are really making their way to your friendly Discoverer interface. Such as....

The ability to pull 63,000+ unique label free features out of an Orbitrap Elite run in 3 minutes!!!!!  WHAT?!?!

First impressions?
1) Super stable so far!!! (Bremen is getting GOOD at Building Discoverers.)

2) Very very familiar. Why alter a good thing, right? No serious changes in the interface at all. A few minor tweaks that just improve the running environment. {Loaded a new FASTA. Appeared as an option in Sequest. Did not have to close my study!}

3) Ummm....if all of these features make it to release, we definitely didn't get the full story at ASMS. There are some seriously cool new toys in here! Come on PC, knock out these RAW file alignments!!

4) LFQ works! Really really well!

Disclaimer: This is an unreleased Alpha version of the software. The future is not now per se. But the future is en route? I'm not altering that picture I found, though!