Thursday, June 30, 2016

Recognizing millions of unidentified spectra!

At some point you've got to step back away from the unidentified spectra problem and just let go. It will seriously drive you crazy if you don't.

Do we just have a fundamental misunderstanding of how biology works? Seriously!?!?  What is all this stuff?!!?!?!?

Take into account individual genetic variation, partial cleavage events, protein dynamics, non-stoichiometric post-translational modifications -- and you know what? You still have a buttload of spectra that sure as heck match the averagine isotope model -- but I have NO idea what they are.

Want to feel better about it? Check out this paper from Johannes Griss et al.!

In this study, this group of smart people suggests that we can take a deep breath and distance ourselves from this problem a little -- we have these huge databases like PRIDE. What if we just look for patterns in these unidentified spectra?


"Hey, I don't know what these 4,000 spectra are but they pop up every time y'all do a study of cancer cells that are Her2 positive"  (!!!!!)
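To make that idea concrete for my own brain, here is a minimal sketch of the "look for patterns" step. It is NOT the method from the paper. It assumes you already have unidentified spectra grouped into clusters (by whatever clustering tool you like) and that each observation is a made-up (project, condition, cluster) tuple. The function name, the tuple layout and the 0.8 cutoff are all my own hypothetical choices.

```python
from collections import defaultdict


def recurring_unidentified_clusters(observations, min_fraction=0.8):
    """Find clusters that show up in most projects of a given condition.

    observations: iterable of (project_id, condition, cluster_id) tuples,
    one per unidentified-spectrum cluster seen in a project (hypothetical
    layout, not a PRIDE format).
    Returns {condition: sorted list of cluster_ids} for clusters present in
    at least min_fraction of that condition's projects.
    """
    projects_by_condition = defaultdict(set)
    projects_by_cluster = defaultdict(set)  # keyed by (condition, cluster)

    for project, condition, cluster in observations:
        projects_by_condition[condition].add(project)
        projects_by_cluster[(condition, cluster)].add(project)

    recurring = defaultdict(list)
    for (condition, cluster), projects in projects_by_cluster.items():
        fraction = len(projects) / len(projects_by_condition[condition])
        if fraction >= min_fraction:
            recurring[condition].append(cluster)

    return {condition: sorted(ids) for condition, ids in recurring.items()}


# Toy data: cluster "c17" turns up in every HER2+ project.
toy_observations = [
    ("P1", "HER2+", "c17"),
    ("P2", "HER2+", "c17"),
    ("P3", "HER2+", "c17"),
    ("P3", "HER2+", "c99"),
    ("P4", "HER2-", "c42"),
    ("P5", "HER2-", "c17"),
]
recurring = recurring_unidentified_clusters(toy_observations)
print(recurring)
```

Note that this only asks "does the cluster keep showing up in this condition?" It does not check whether the cluster is specific to that condition, which is the part you would really want statistics for.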


PRIDE has been proposing work like this for a while, but I honestly wasn't as excited by the original PRIDE Cluster paper or web interface, cause either I wasn't so frustrated by unmatched spectra...or it never occurred to me that this was a way to use it.

Okay. I want to use this somehow. Gotta think up a strategy.

Wait...MGF?  OH. Okay. This needs more time than I have this morning. GO TO WORK, BEN!

Seriously smart paper though. One of those -- "Man, I totally should have thought about that one!" Glad there are professional proteomics people out there!

Saturday, June 25, 2016

PRM assays show that oncogenic KRAS and BRAF reprogram metabolism!

In all this newfound interest the world has in metabolomics/metabolism these days, this awesome new paper from Josiah Hutton et al., is a breath of fresh air!

In the study they compare tumors that have both wild-type (WT) and mutant KRAS. They do some nice global analysis on the QE Plus and add some nice data to the huge pile of KRAS proteomics data out there.

Where this paper differentiates itself is that they go after the metabolome of these different cells. Instead of extracting the metabolites themselves and trying to infer what they are (which is hard cause lots of those things have the same exact masses -- thank goodness the mzCloud and high res NIST libraries keep getting better!!) they do something I don't think I've seen before.

They use parallel reaction monitoring (PRM) to quantify the proteins that are most key to central metabolism. The PRMs give them absolute certainty that they're looking at the right proteins as well as the sensitivity to see statistically significant changes (that...the global proteomics data wasn't quite sensitive enough to conclude!) and they end up with this awesome heatmap I stole and put at the top.

I can't recall now, but I think they PRM'ed 73 different metabolism proteins. I wouldn't change a thing in the methods as they described them. (Though I'm just a tiny bit confused about what source they are using on the QE Plus...looks like nanoflow rates, but described as micro, probably just a difference in what the authors and I consider the proper term for the flow rate. Again completely minor observation.)

Totally sharp paper. I hope we see a lot more of this kind of work going forward!!

Everybody here in Maryland is talking about the NCI proteomics moonshot initiative. I would like to officially cast my vote for targeted pathway quantification (and mutation confirmation) via sensitive and high certainty PRMs being a central component of the initiative!!  This paper would sure make a nice template!

Thursday, June 23, 2016

BatMass -- LC-MS/MS data visualization!

NA-NA-NA-NA-NA-NA-NA-NA-BATMASS!!!!  So glad this is out so I can finally talk about this!

Okay. SOOOOO...Orbitrap data is BIG DATA. For real. Unfortunately, we've been looking at it in such a limited way that we really haven't realized it yet. Don't worry, I'm going to ramble on and on about this as time permits. [Redacted rambling about other cool things I can't talk about yet!!! Shut up Ben, shut up. You can see the ENTIRE PROTEOME ITS ALL THER..QUIT TYPING!!! Redacted]

How do other fields deal with Big Data? P.S. this book is somewhat off topic, but awesome. You've got to step away from the small ways of dealing with things and look at the big picture. It's hard to do. Let go. And look for trends.

Enter BatMass. BatMass takes us away from looking at individual MS scans and away from searching against limited and often incorrect databases and sums your RAW files up into a picture. A zoomable, scalable, searchable, overlayable picture!

Check out this fuzzy example. Step away from these 9 RAW files and --- BOOM!  (One of these things is not like the others!)

Rapidly, visually locate what is different --- you can use it to find files where something went awry -- or to look for your differentially expressed molecules between files. Once you find them, you have an entire tool suite to zoom in and extract the data of interest. Seriously! I love this concept (if you can't tell)... Better yet, it's experiment-independent. BatMass doesn't care if this is a proteomics experiment or a metabolomics run or an untargeted food screening run - it just does its job and visualizes the differences it sees (this corn syrup has a pesticide we didn't even think to look for -- what?! Your drug changes the global lysine acetylome -- what?!?)

If you just want to get it and mess around with it -- its

You'll find a nice getting started guide for mass spec people and some overview videos. You'll also find resources (and the original source code) for developers who want to take this concept further or integrate it into their workflows!!  

Wednesday, June 22, 2016

Slightly off topic -- the Thanatotranscriptome....

This is off topic, a little, but weird enough to me that I've got to share it. This review has not been peer-reviewed yet. I did my homework, though, and there are several papers leading up to this review that have been, so I'm gonna assume the review is okay.

The topic is the "thanatotranscriptome" which appears to be a distinct pattern of genes that continue to be active after the death of an organism. One of the previous papers shows work in zebrafish and another in mice, but they've taken samples from dead humans as well and shown that these patterns are similar.

Weird, right? Cause I'm sitting here thinking of...where's the evolutionary pressure that would force something like that, then I realize that is probably a stupid thought. What we're likely seeing is a lack of evolutionary pressure to actively stop all mechanisms at a certain point. When an organism is under extreme stress during its living processes it makes sense to have stress response proteins (like our so-limitedly-annotated "heat shock proteins") persist as long as possible -- the organism's life might depend on it!  And at the point at which an organism is "dead" (whatever that means) it would be silly from an evolutionary standpoint to devote energy to stopping that stress protein.

Sorry for the stream of consciousness. I'm up super early (for me) and the cycle was: MIND BLOWN -- Caffeine slowly kicking in -- OH, this isn't as crazy as I thought, but still super interesting.

The most interesting part is the pattern, really. I'm sure they started off thinking "we've got a fast sequencer and we already spent a ton of money on reagents -- let's take a sample from this dead fish -- wait. that's not random...?"

I have this note on the dry erase board in my office that says something cryptic in my awful handwriting: "mass spec time of death." I'm wondering if those two are linked, or if that just refers to how hard it is to find electron multipliers for the LCQ...

Tuesday, June 21, 2016

Neat PD 2.1 trick we stumbled on -- exporting and researching filtered spectra

Honestly, it was a bit of miscommunication between a friend and me that ended up in finding out that this works. And...besides the application we were discussing (which is secret) I can't think of a reason off the top of my head that this would be useful...but...

If you did have some cool PSMs that you were interested in --- wait -- I've got one!  Okay. So..what if you found some PSMs that were differentially regulated but you wanted to see if all the ones that looked significant were possibly contamination. Then you could do this!

Use your filters to reduce your PSMs (or MS/MS spectrum input or whatever) down to the list you are interested in and checkmark them. (You can highlight a whole big group and then tell it to "Check selected").

Then you can go up to export and export out the spectra in any of the PD supported formats. In this case, we did mzML.

Then you can reimport that mzML as an input file.

And then run this smaller file through your pipeline with that contaminating organism database!

Monday, June 20, 2016

Need your RAW files in another format?

Need to convert your Thermo .RAW files into another format? There are a ton of different software packages out there that can do it!

Ummm....but are they all the same? Lin He et al., sure don't think so!

In this paper from November they introduce RawConverter...and the comparison to other software for doing data conversion is somewhat one-sided.

When the vendor gives out a free solution for this conversion, I'm gonna lean toward that (the Proteome Discoverer Viewer) (particularly if said vendor has established something of a precedent for subtly altering .RAW file formats over the years) but, seriously, when you look at this data it is impressive.

Also, this software doesn't care if it is converting data dependent or data INdependent files and I can't guarantee that PD can handle the latter.

Time for a head-to-head matchup!  Unfortunately....I've got a job...and I've got to go to work....sigh... [Disclaimer -- Ben does not consider it in any way "unfortunate" that he has a job. He actually totally digs his job and, if he could cut something out of his life it would be sleeping so he could spend more time on his awesome job AND evaluating all the free proteomics software out there. This has been a message from the Commission for Keeping Mass Spec Nerds off the Streets.]

Shoutout to the incredible Pandey lab for tipping me off to this cool paper!

Sunday, June 19, 2016

The consequences of genetic variation at the proteome!

I'm having a ridiculously great weekend. Perfect climbing weather and a cookout with my 3 favorite (and intimidatingly brilliant) geneticists and I've still got NBA Finals game 7 tonight?!?! What a weekend!

So, if I read a couple papers that make me feel a little intellectually inadequate, I can handle it.

I'm going to start with this one from Joel Chick and Steven Munger et al., in this issue of Nature.

What is it? It's looking at genetic variation at the proteome level. At first that doesn't sound like such a big deal, but it totally is. As much as it is fun for us to have distinct proteins from one (or a couple) organisms in a nice FASTA database for a species, that is just a summary of what we think we know about what is a protein coding region from one organism.

I'm going to summarize this as something that works for my brain well.

If we have a nice curated Dog FASTA, that file is likely going to represent the whole species -- Heck, it might even come from a sample from just one dog! Therefore, that poor deformed experiment of irresponsible breeding in the photo above would be represented by the exact same FASTA database as the majestic Pug beside it. And, heck, at the straight up DNA base pair level, they probably wouldn't be all that different.

At the proteome level, it's gonna be incredibly different! But how do you evaluate it?

Back to the paper!  There is this group of mice called the Diversity Outbred (DO) model. They have been developed to have a bunch of genetic variation so that we can better understand how individual organisms turn out different than their forebears.

If you're thinking "what? we know how that happens! my Dad is homozygous for the bushy eyebrows gene and my mom is bald and that is recessive, so I get both of them. Simple!" It turns out if you look closely...that the exchange of traits isn't anywhere near as simple as Mendelian models suggest. Genes don't get cleanly copied and transferred, and we don't actually know how the whole system works. (Summary of a ton of Wikipedia articles I read while trying to understand what Joel and Steve are even working on here. My explanation is definitely way too simple, but it helps in my espresso charged brain!)

There are these things called Quantitative Trait Loci (QTL). These are areas of the chromosome that are associated with a trait. Once we know what/where those are, then sequencing techniques can start to figure out what actual genes/proteins come from that area.

Back to the paper, I swear!  So this group starts with some of these mice that are deliberately genetically diverse and they take some of their livers and do some TMT SPS MS3 Fusion proteomics on the livers to get the proteomics data.  By "some" mice, I mean 192 (slackers) with half the mice on a normal diet and the other half on a high fat diet. You know, cause this wasn't complex enough. They, of course, did transcript-level profiling of all of these mice as well. It's worth noting that there are not complete genomes for these DO mice. The original mice that were bred to create this heterozygous model colony are sequenced and well annotated, though, and this info is critical to the...imposing...level of downstream analysis.

Edit 6/21/16: I left out some of the coolest bits!  Okay, so when you are doing stuff like this you toss the gene ID stuff. What you think you know of the protein coding region and so forth, cause we need to make the assumption that maybe not all of that stuff is 100% accurate -- seriously, this is going to be a theme that pops up going forward (unassigned spectra, what!?). Rather, focus on the stuff you do know, like where in the chromosome this stuff matches to!

Now, there is all sorts of awesome biology to infer from this paper. There are close to 100 supplemental figures and tables. There are fantastic conclusions made here in terms of how much the genetic variation affects both the protein expression in general -- as well as the response to this extreme dietary change. But check this one out.

 In case showing this figure is totally against the rules (please don't sue me...Nature! See disclaimer statement or email me:, and I'll take it down! Promise! But this is really cool and I'm sending people to read your paper, you can loan me one tiny screenshot from one huge supplemental picture, right?).

Blue is looking at these loci at the transcript (RNA) level and orange is at the proteome. Number C, makes sense, right? Lots of messenger RNA --> lots of protein. D? More mRNA? Less protein!  Turns out that they can pinpoint site mutations that cause post-transcriptional regulation.

Take home points out of this great paper that I'm seriously concerned I still might not understand at all and probably completely butchered?

We are still vastly oversimplifying the biology of our eukaryotic models.
But we have the technology (right now!) to throw out some of our erroneous preconceived notions and readdress how we do all of this stuff!

Saturday, June 18, 2016

m2Lite -- convert PD msf files to mzIdentML

I'm not going to say I know everything that is going on in this rapidly expanding field. But...I feel like I read kind of a lot of papers. So...when someone in a building that I walk by ALL THE TIME writes a piece of software that converts Proteome Discoverer output into the very useful mzIdentML format, I feel pretty dumb for not catching it for a couple of years.

And...that is exactly what this awesome paper from Paul Aiyetan et al., does.

Since it was published in 2014, and presumably work started on it earlier than that, it was originally designed for PD 1.3. Fortunately, however, if you go to Dr. Aiyetan's BitBucket account here, you'll see that he's updated it with new versions as recently as around this time last summer.  You'll also see some stuff that sounds really cool that I'll need to investigate!

Friday, June 17, 2016

Time to ditch the Qual Browser! FreeStyle is rad!

FreeStyle has been floating around out there for a bit. If you got a new instrument in the last 2 years or so, chances are it came with the installation CD or installed on the PC on the instrument, but it's just kind of lurked around. I know these cool guys up on Victoria Island that have been using it and one lab at NIST that uses it to make figures for papers...but...that's it, really.

I heard these words yesterday while anonymously sitting in on a vendor webinar. "FreeStyle WILL replace QualBrowser completely" and it sounds like soon.

YIKES!  I'd better download that sucker and give it a real run. The way the Warriors played last night, they did not deserve even 70% of my attention.

First of all. FreeStyle is very very blue. Like seriously blue. Like Ben Healey blue. Seriously, though, it's worth looking at cause it's honestly a lot better than Qual Browser.

1) FreeStyle knows what scan it is looking at -- every time -- and puts decimal places accordingly
2) FreeStyle can do all sorts of real-time processing stuff for you with single button presses (at the top of the screen). Check this out:

Is that an Xtract button? Seriously? You have your spectra open for your intact mass (protein or nucleotide chain...or...whatever) and you hit the Xtract button. A little menu pops up on the right where you put in very simple parameters and you hit "Apply"

AND IT'S XTRACTED!!!!  You get a window with your masses and a table at the bottom that gives you the outputs. Here I took a myoglobin file I found in my downloads folder (horse? cow? I dunno) and this is your real output. It found the intact mass of 16941.0072 and two less abundant peaks that look like water loss. How simple is that?

What if its small molecule data? Pull up a file, get to your MS/MS spectra and hit the NIST search tab. A NIST window opens to the right asking where your NIST library is. Reference that library -- and match. (Can't show you that one -- I've got to get to work in a minute!)

What I can show you -- hit the mzCloud button!  (First time you use it you'll have to install Microsoft Silverlight. BE VERY CAREFUL. It will try to set Bing as your default search engine...ugh....and make all your browsers start at  Uncheck those boxes! And install the rest. I promise it is worth the risk. Cause...


Here are your molecule matches, predicted structures, evidence, fragmentation tree. YEAH!

Okay, holy cow, I need to shower and get in my car!  There are other features, but I'm behind. How do you get FreeStyle? Get your Xcalibur upgraded to the newest version and it just comes with the install. If you can't upgrade your Xcalibur cause you're on older instrumentation -- register or log in to the Thermo FlexNet and you can download, trial, and register it.

Thursday, June 16, 2016

Are we making a mistake when we throw away the little stuff?

You know those nice databases we use that come from the genome sequences? It turns out that, in an effort to help minimize the tiresome annotation process, pretty much anything between two stop codons that was under 300 base pairs in length was skipped. That's up to a 100 amino acid protein! Figure the average mass of an amino acid at 110 Da and that is an 11 kDa protein that is systematically skipped, doesn't get considered for annotation in the genome -- doesn't get translated -- and doesn't end up in our FASTA database file.

They talk about the implications to proteomics a little when they talk about us letting the little stuff run right off the bottom of the gel!  In a more modern consideration of this problem perhaps -- YM10 is the typical FASP filter cutoff most of y'all are using, right? That 10 means that it retains anything over 10 kDa. (yeah, I know it's plus or minus a good bit from my own hands-on) but..whoa!
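The back-of-the-envelope math there is worth writing down. This is just the arithmetic from the paragraph above as a tiny sketch: 3 bases per codon and the 110 Da per residue rule of thumb. The function name and constants are mine, and a real protein's mass obviously depends on its actual sequence.

```python
BASES_PER_CODON = 3
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average amino acid mass used above


def max_skipped_protein(orf_cutoff_bp=300):
    """Return (residues, approx mass in kDa) for an ORF right at the cutoff."""
    residues = orf_cutoff_bp // BASES_PER_CODON
    mass_kda = residues * AVG_RESIDUE_MASS_DA / 1000
    return residues, mass_kda


residues, mass_kda = max_skipped_protein()
print(f"{residues} residues, roughly {mass_kda:.0f} kDa")
```

So a 300 bp cutoff hides everything up to about 100 residues, which lands right around the 10 kDa filter cutoff a lot of us use.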

Turns out that we're losing evidence on the protein end, it isn't in our databases and -- there is a ton of cool stuff down in that range! They talk about labs that are just trying to fill in those blanks -- and they are finding all sorts of cool small regulatory proteins in all sorts of organisms!

I'm gonna pull some of the C-HPP papers on "missing proteins" to see if they're considering this (they probably are, I'm probably the only one surprised by this!). Also, would our friends in the top down arena who are mostly looking at 30 kDa and down be finding a lot of stuff that isn't in our genomes?

Worth a thought, anyway, right? 

Wednesday, June 15, 2016

What is Morpheus up to these days?

Whoa! I totally stumbled on this.

I still love the Morpheus search engine the Badgers came up with. Simple, faaast, to the point.

Turns out, it's still developing out there.

Case in point? This cool new paper from David Gemperline et al. In it they show how they can add spectral counting functions to the software. You can actually get the program here.

I like the font and layout, so here is a screenshot.

Curious, I went back to PubMed to see if I'd missed anything else awesome that you cool people have done:  Answer? Of course I have.

S.Zhang et al., developed a paired ion strategy for Morpheus and...wait...this deserves its own line.

Michael R. Shortreed et al., wait...seriously?...used Morpheus to search only for curated PTMs, thus vastly reducing the search space overhead and massively improving their FDR. Wow!

It's almost like the ProsightPC flat-file (contains your annotated PTMs) search methodology, but for bottom-up!  Why aren't I doing this?

Only a 10% increase in your search space to add every known human post translational modification? This is brilliant! Improve your matches, improve your FDR, speed up your search?

Tuesday, June 14, 2016

Time to take the free version of PD for a stroll!

What did you do during timeouts in Finals game 5?

I'm running a bunch of different samples through the new free version of PD!

How is it?

If I said it is easily the best piece of proteomics software you can get for $0, you'd probably suspect that I'm biased. So I won't say it. Not trying to insult anyone out there, but this is pretty sweet.

Seriously, though. The framework of Proteome Discoverer is really really good. You get the ability to organize your experiments -- separate processing and consensus workflows -- persistent workflows -- friendly node driven interface -- a whole ton of the awesome things that the full PD version contains -- but you don't have to pay for it!!  Seriously, I'm pretty sure the manufacturer dumped a lot of time, energy -- and money into developing this interface and the IMP-PD piggybacks on all of that.

Throw in the ability to instantly pull out your XICs for the PSM you're looking at? A really nice label free quan node -- and differential statistics? This is a SERIOUSLY nice piece of software.

The next real question is this -- is the Free version of PD better than the pay for version? Well...I'm totally not going that far. I need SILAC quan (pay for version only); I like the PD 2.1 TMT quan algorithm better than the IMP one (honestly, possibly cause I'm misusing it -- waiting for the publication eagerly! ) and the full version filters better. One more difference -- speed of search.

Same data file; 65,000 MS/MS spectra -- normal-ish settings

Sequest first (I highlighted the wrong line. I meant to highlight the 3.5 minutes for the Sequest search. Did I mention I am watching the NBA finals while writing this?)

Then MSAmanda (the only search engine in the free IMP-PD.)

Alright, here is a limitation. The current version of MSAmanda is the fastest one yet, but it's still not as fast as Sequest. (P.S. The numbers here look different just cause of how the formatting lines up in the Administration window.)

60,000 spectra is pretty small by today's standards. Put this up against the much more normal set of 5e5 to 1e6 total MS/MS spectra? This is gonna be a disadvantage. Throw in some more PTMs? Yikes. Consider that my processing computer is probably a good bit faster than yours? Double yikes.

Wanna see something interesting, though?

Sequest total search time runs 18 min with this set. Mostly cause Percolator takes 9 minutes or so.

Elutator is the IMP option for Percolator. And it is also a little slower than Percolator (publication in works on this one and I'm excited for it -- this little algorithm is superb!) But when you look at all the little steps in the processing pathway, the difference isn't all that big. Percolator + Sequest is 18 minutes in this format. MSAmanda + Elutator = 35 min.

At some point I realized I was running off of my HDD storage drive, rather than off my C:/ which is an SSD. But I was too distracted by the game to rerun this. Times might be faster for both of these datasets.

Okay, so I'm gonna take the high road (this has been popular in the media this week). The IMP nodes compared to the pay for nodes?

WOOOOOOOOHHHHOOOOOO!!!!!  Using both gets me a bunch more IDs?  Sure. So I go back into the MSAmanda+Elutator matches, look at them using my normal cutoff procedure -- and they're good -- like, they're real good. My worst scoring peptides are great.

The best solution for me is
Sequest + Percolator
MSAmanda + Elutator

Cause that's a bunch of peptides out of an Orbitrap Elite 2 hour gradient!

See why I'm excited to know about Elutator?!?!

Monday, June 13, 2016

Correcting protein interaction databases by separating protein into >5k fractions!

Yeast 2 hybrid assays (Y2H) are a classical molecular biology technique. They were a huge leap forward in our ability to determine protein-protein interactions. They got even better at it when they could be cranked up with automation -- tons of robot arms whirling about and generating huge knowledge bases of what protein interacts with what other ones.

So....what if those knowledge bases turned out to be somewhat less than perfect? That would be exciting, right? What if other databases of protein-protein interactions also turned out to have some high FDR as well? It would at least be pretty controversial. So maybe you should cover your bases by doing some serious benchmarking. Maybe running more than 5,000 fractions(?!!) before you submit that one to MCP....

And that is what happens here in this new paper from Maxim Shatsky et al. These authors use a seriously convoluted methodology involving protein level fractionation and iTRAQ labeling to determine interacting protein partners. The paper is open access so I can borrow this image, I think.

Get it? Told you, convoluted!

They start by growing 400 liters(!!) of their opportunistic pathogen (D. vulgaris), which I imagine doesn't get chalked up as someone's very favorite semester, so they can start with 10 grams or so of protein. Then they start by doing the ammonium sulfate fractions. Each fraction then goes into ion exchange (this is all still protein level), they shuffle up the fractions to make it less likely they have overlap and do HIC and repeat the step above with SEC.

The goal? End up with fractionation so complete that they should only see proteins hanging out together IF they are part of the same complex. At this point...I'm...skeptical....but curious enough that I keep reading.

Thank goodness, here they don't actually run 5,000 fractions. They digest the protein fractions - iTRAQ labelled with either the 4plex or the 8plex. The method section is very confusing here...unless...and I'm not ruling this out...they actually started this study before the iTRAQ 8 plex reagent was released. I first used that in 2009 or 2010, I think. This might explain some other things here.

Next, iTRAQ labeled fractions are combined by mixing fractions from further down the fractionation scale to further minimize overlap. Then the iTRAQ labeled mixtures are (oh no!!) peptide fractionated and MALDI spotted. Seriously. I'm writing about this study because I think it's good. But I'm envisioning one of these authors completing his/her Ph.D. on this project after 13 years or so of working on it.  Which is fine. If you're gonna spend 13 years in grad school, there are worse places to do it than Berkeley.

Joking aside. This is where it gets more confusing. So now you have these iTRAQ combined 1,400 fractions or so. And you get these quantification values for all of these peptides. With this amount of fractionation they are able to get about 1,400 proteins with the MALDI-TOF. This is about half the bacterium's proteome. Now they have to recombine the data to figure out what is coeluting (and therefore complexing) with the others.
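To convince myself I understand the "coeluting therefore complexing" logic, here is a minimal sketch. This is NOT the authors' pipeline (theirs is far more involved). It just assumes each protein has an abundance profile across the same ordered fractions and flags pairs whose profiles are highly Pearson-correlated. The protein names, the toy numbers and the 0.95 threshold are all made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coeluting_pairs(profiles, threshold=0.95):
    """Return protein pairs whose elution profiles correlate at or above threshold."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical abundance across six fractions.
toy_profiles = {
    "protA": [0, 1, 8, 9, 1, 0],
    "protB": [0, 2, 16, 18, 2, 0],  # same shape as protA, twice the signal
    "protC": [9, 8, 1, 0, 0, 1],    # elutes early, nowhere near protA/protB
}
candidate_pairs = coeluting_pairs(toy_profiles)
print(candidate_pairs)
```

With only one fractionation dimension you'd get tons of coincidental co-elution, which is presumably why they stack so many orthogonal separations on top of each other.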

I mentioned some skepticism above. So they look for their model complexes. The ones that Y2H has seen (as well as other techniques). And they are there. With each other. This crazy thing works. Wow. They do mention that they previously did this on a much smaller scale. But there definitely had to be some relief after all of this work to see proteins showing up together that you know should be together.

The method section then has a lot of words I don't know, but this is probably where you figure out how 17 trillion MALDI-spot files work together. This is where they need to build their pipeline to work out these interactions and build up their database of what interacts with what in this bacterium. For an added complication they get some historic data and run it through their pipeline and it turns out that some of this published stuff has HUGE FDR, and some of these established databases built through Y2H and other MS techniques have FDRs higher than we'd hope to see. In one dataset they suggest an FDR as high as 85%....yikes.

Seriously. I do like this study. And they needed to cover their bases here. Especially with the MS technology they had to work with, the upfront workflow is going to need to be huge. And you can't start something this ambitious and then stop or change the methodology in the center.

And then this guy gets to graduate!  (Last joke. I promise)

BIG and ambitious study. Glad it was done and almost as glad that I didn't have to do it.

The big take away here. If your MS/MS data suggests that you've got some interacting proteins but the databases say it isn't likely, or true, it might not be the mass spectrometer that is wrong. And hopefully, eventually we'll be able to replace some of these existing knowledge bases with something better and more accurate.

Sunday, June 12, 2016

iPRG 2016 study -- Inferring proteoforms from bottom up data.

What's the iPRG up to for 2016?

A survey of our capabilities of inferring protein IDs from the peptides we identify!  Hey, yeah! Where are we with that stuff? We're darned good at identifying Peptide Spectral Matches (PSMs) but we have to take kind of a jump forward when we say what proteins are represented by those peptides.
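If you've never thought about why that jump is hard, here is a minimal sketch of one common heuristic, greedy parsimony: keep picking the protein that explains the most still-unexplained peptides. This is just an illustration of the problem, not what the iPRG study prescribes and not what any particular search engine does. The protein and peptide names are hypothetical.

```python
def greedy_protein_inference(protein_to_peptides):
    """Pick a small protein list that explains every identified peptide.

    protein_to_peptides: dict mapping protein ID -> set of peptides that
    could have come from it. Ties go to the alphabetically first protein.
    """
    unexplained = set().union(*protein_to_peptides.values())
    chosen = []
    while unexplained:
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & unexplained),
        )
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


# P2's peptides are a subset of P1's, so the peptides alone can't prove P2 is there.
toy_candidates = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepB", "pepC"},
    "P3": {"pepD"},
}
inferred = greedy_protein_inference(toy_candidates)
print(inferred)
```

The catch is right there in the toy data: P2 might really be in the sample, but shared peptides can't tell you that, and different tools make different calls.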

You can find more details on the study design here.  I'm super curious to see how it turns out!  Oh, and also more details here (including how to register).

The poster at ASMS said it was still looking for participants.

And something I stumbled across while looking for this listing? The iPRG is also looking for some new members since some people have retired or will be retiring. It's a nice chance to be involved in really seeing where our field is right now - as well as where it might be heading. More details on that here.

Saturday, June 11, 2016

Its official! We're getting FPOP in Baltimore!

I didn't want to break this news until I saw someone else had and I just saw it on a message board, so I don't have to wait. This summer, my good friend Dr. Lisa Jones is relocating to University of Maryland, Baltimore!

This means a couple of things!  One, we've got a leading expert in 3D protein structural elucidation by mass spec right here in town. Two, it means that another major University is paying attention to how mass spectrometry can solve biological problems that were traditionally solved by other (slooower and less precise) methods. And three (selfishly), it means another great lab that I (and others here, of course) can visit and kick ideas around with.

Lisa is looking for a postdoc. The posting is here. Save your time and only apply if you're real good. ;)  Want to know more about her and the group? Here is the current lab website!

She's also bringing 2 graduate students from her lab out here to Maryland. I got to talk with them for a bit and they were seriously sharp. So I guess there is a number 4 - more young and talented mass spectrometrists here, so....

Top level mass spec people
+ awesome instrumentation
+ access to many of the world's leading disease researchers...most of whom have never tried to look at proteomics to solve their problems
=More world changing science happening right here in my home town!!


P.S. If anyone else top-notch is interested in relocating to the beautiful state of Maryland, I can assure you a warm welcome. Genome Alley is finally realizing the power of Proteomics and we need talent here. (Even more secret good news coming!!!)

Tracking aging and senescence with proteomics

In this young field there are so many papers that seem to just revolutionize it. The leaps and bounds you guys are making just seem to appear out of nowhere. Sometimes, though, it's nice to read something a little more straight-forward that introduces a couple new things and doesn't make me feel inferior to the authors for not thinking of it first.

(Seriously, though, there was a ton of stuff at ASMS like this. One of the very smartest things I heard at the whole conference was in a casual conversation where a friend mentioned a method they've been using in their core that is so elegant and simple and brilliant that I swear everyone just stood around with their mouths open. Since nothing on earth seems to stop me talking for much time at all, I was the first one to yell "why haven't you published this yet!" I'm giving him a couple days to tell me if they're writing it up before I make all y'all feel really awkward for not coming up with it....unless everybody is doing it and no one has ever thought to publish it 'cause it's that simple.  Not every smart thing is published...and that's why conferences are a great idea!)

Back on topic:
This paper is a good example. It isn't a run-around-screaming-scaring-dogs-and-squirrels kind of study, but it is a solid piece of work. It is from Yang Lu and Jingchao Wang et al., and is a look at how kidney tubular epithelial cells progress as they age.  Not a headline grabber exactly, but I got 2 really interesting points out of it.

The first is that senescent markers seem to be some really high copy number proteins. Seriously, look at that list above. It is even more pronounced in the tables in the paper. When we were gel slicing and running the Finnigan LTQ we saw virtually all of those in every sample. Going down their table it was like --- wow! I know all of these! Conclusion 1: Senescence (aging) markers are "high fliers" in that they're high copy number and/or very amenable to LC-MS analysis.

The second thing I learned about in this study is the Molecular Annotation System (MAS). Which is an older, but unknown-to-me pathway generation system. It isn't the most high-tech system I've ever seen, but it has some nice functions.

This thing above is simple, but might throw a nice new angle in your downstream processing pipeline. It builds networks from your observations and color codes by P-values. Another function is the auto-generation of heat maps based on the data that you feed into it. The input interface seems pretty flexible, and if your collaborator just wants a heatmap -- hey, here is a web interface to point them toward.

Oh, I guess there is a third thing I learned and get to look forward to as I continue to get older -- the conversion of the epithelial cells in the transfer tubes of my kidneys to mesenchymal cells where they lose flexibility and proper function. Yay!

Thursday, June 9, 2016

It's over!! Ben's ASMS 2016 roundup part 1 -- Software!

Yeah...what a week. As excited as I am to see old friends and cool new science, I always underestimate what a freaking marathon ASMS is.

For those of you that didn't make it, or just want to read my rambling notes on the conference, here I am rapidly typing in the picturesque San Antonio Airport.

Impression 1:  Software was king!

From the first look at ASMS, this wasn't a big year for hardware. There are some cool things definitely, but there were some major software things!!!

 In no particular order.

1) Fusion software 2.1 unveil.  What's it do? It adds a ton of new features to the Fusion or Fusion Lumos. Okay. While the QE HF is my current favorite instrument (I'm a biologist, after all), I seriously dig this path that the manufacturer (whatever their name is..) has taken with this instrument. If you invested in the Fusion or Fusion Lumos, every 6 months or so you effectively get an awesome instrument upgrade. With the hardware configuration and the 2 powerful processors onboard the Fusion all sorts of things are possible. Someone just needs to think up an awesome experiment and BOOM!  It's a free new instrument upgrade for everybody.

Fusion 2.1 has some new tweaks, but the highlights are canned methods for crosslinking analysis and standard triggered TMT quan. Doesn't sound super impressive, but hear me out.

If you've been doing this proteomics thing for any length of time, I bet you've taken a swing at crosslinking some proteins and then figuring out which ones are partying with which other ones.  And chances are you've been seriously frustrated. I've failed at it. Friends WAY smarter than me have failed at it.  So...Thermo created a new crosslinking reagent, added a really amazingly complex workflow to the Fusion software for working with that reagent, and ...

...Proteome Discoverer 2.2 (which I also got to see in action for real!!) automatically processes the data!!!!  P.S. You can also use your crosslinking reagent of choice, but the new one seems pretty bad ass.  Oh, all this work was done with a big collaboration with the Heck lab!

Standard triggered TMT quan (I forget what they're calling it): You know that Bruno Domon method where they spike in a standard for the peptides you're looking for at high concentrations and the QE starts doing PRMs for the light endogenous peptide when it sees it?  (P.S. There were, of course, more very nice and frustrating posters on this method and it still seems like you have to join some secret fraternity on the solstice in the cellar of the Kirchberg campus -- for added effect I suggest you picture Rachmaninoff playing in the background).

In this method, they are using a TMT0 tag to label the mix of standard peptides. These are spiked into the mixture of your TMT10(!!!) labeled samples(!!!). See where this is going? It gets better. When the Fusion or Lumos sees the TMT0 tagged peptide standard it then starts looking for the TMT10 labeled peptides. In what Dr. Gygi described, they then use a huge ion injection time so they can get ridiculous levels of sensitivity. It's TMT so you don't need a bunch of measurements across the peak. Get that sensitivity!  It does a bunch of other smart things including verifying that the tag really is true with a super fast survey scan or something...
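For the curious, the arithmetic behind "sees the TMT0 standard, then goes looking for the TMT10 peptide" can be sketched in a few lines. To be clear, this is my back-of-the-envelope version, not the instrument's actual logic: I'm assuming every peptide is fully labeled at the N-terminus and on each lysine, the function names are made up, and the 10 ppm tolerance is just a placeholder. The tag masses are the monoisotopic values for TMTzero and the isobaric TMT10plex reagents.

```python
# Monoisotopic mass added per labeled site (N-terminus or lysine), in Da.
TMT0_TAG_MASS = 224.152478   # TMTzero
TMT10_TAG_MASS = 229.162932  # TMT10plex (isobaric, so all 10 channels share it)


def count_label_sites(peptide):
    """Assumed labeling: one tag on the N-terminus plus one per lysine."""
    return 1 + peptide.upper().count("K")


def tmt10_target_mz(standard_mz, charge, peptide):
    """m/z where the TMT10 labeled peptide should sit, given the TMT0 standard's m/z."""
    shift_da = count_label_sites(peptide) * (TMT10_TAG_MASS - TMT0_TAG_MASS)
    return standard_mz + shift_da / charge


def within_ppm(observed_mz, expected_mz, tol_ppm=10.0):
    """True if an observed precursor matches the expected m/z (placeholder tolerance)."""
    return abs(observed_mz - expected_mz) / expected_mz * 1e6 <= tol_ppm


# Illustrative numbers only: a 2+ standard observed at m/z 800.0 with one lysine.
standard_mz = 800.0
if within_ppm(800.004, standard_mz):
    target = tmt10_target_mz(standard_mz, charge=2, peptide="LVNELTEFAK")
    print(f"Standard seen, go look for the TMT10 peptide near m/z {target:.4f}")
```

Each tag differs by about 5.01 Da between TMT0 and TMT10, so the offset grows with the number of lysines and shrinks with charge state.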

What I care about? The methods are canned. Load this template and get going!  Ridiculous levels of sensitivity on 10 samples at once. The way they're validating it is ridiculous. No offense to the instrument manufacturer, but the three times they described it I was having trouble keeping my eyes open. Then Gygi got on stage and it was like...oh...wait?..what?...WOW!!!  It's like the super secret triggered QE PRM method, but multiplexed and better controlled!

2) Proteome Discoverer 2.2 (seriously, folks, it's real, hopefully beta testing starts soon) -featuring:
a) Peak alignment and label free quan nodes
b) The aforementioned crosslinking nodes
c) Isn't that enough!  Holy cow!  I'd just take label free quan.

3) Prosight 4.0 -Faster, better, stronger top-down (at least one non-Kelleher group has it, so betas might be available if the commercial release isn't ready)

4) CrapOme 2.0! (Was shown in a poster, didn't appear to be up on the website.)

6) New Byonic (on the way, I guess, no real time to stop by the booth...)

5) [Told you. No particular order]. Nuts. I've got 5 cool open source software things in my notebook, but none of them have papers up or their web interfaces active. Okay. Cool stuff on the way for --
protein network analysis, visualization, full core lab unsupervised QC systems

9) DIA Probe - Quality control for those of you who are legally allowed to run and process data independent acquisition experiments!  It's an add-on for the world's most popular targeted quantification program.

For now, that ends the software stuff. There's definitely more I forgot, but the WiFi on this plane is super slow so I'm gonna make a Part 2 (and probably edit this a bunch later!)

Wednesday, June 8, 2016

Any interest in Proteogenomics?!?! GO TO BOOTH 601!!!

You know how all of us want to do ProteoGenomics but we don't all want to learn how to program?!?

What if there is a little company that has a bunch of cool software (seriously, really cool stuff) but they have something ridiculously awesome.

They have an integrated RNASeq to FASTA generation tool.


They have a tool that you run on your PC that changes your RNASeq data to a FASTA file. You can control your quality filters within the tool. You don't have to learn how to use R, you don't have to set up a collaboration. You put your RNASeq data in this software and it spits out a FASTA file (with quality filters you can control) and you run that database in your software of choice!!
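If you're wondering what "RNASeq in, FASTA out" boils down to, here is a toy sketch of the idea. I have no idea how their tool works under the hood, so treat everything here as my assumption: it starts from already-assembled, in-frame coding sequences with a read count attached (real RNASeq processing has to get you there first), and the function names, the `min_reads` and `min_length` filters and the example transcripts are all invented.

```python
from itertools import product

# Standard genetic code, built in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def transcripts_to_fasta(transcripts, min_reads=10, min_length=7):
    """Build FASTA text from (name, cds, read_count) tuples that pass the filters."""
    entries = []
    for name, cds, reads in transcripts:
        if reads < min_reads:
            continue  # quality filter: not enough read support
        protein = translate(cds)
        if len(protein) < min_length:
            continue  # too short to be a useful database entry
        entries.append(f">{name} reads={reads}\n{protein}")
    return "\n".join(entries) + ("\n" if entries else "")


# Invented example: the second transcript fails the read-support filter.
example = [
    ("tx1", "ATGGCCAAATAA", 50),
    ("tx2", "ATGGCCAAATAA", 2),
]
print(transcripts_to_fasta(example, min_reads=10, min_length=3))
```

The point of the filters is the same one made above: you decide how much evidence a transcript needs before its protein goes into the database you search against.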

For those of you who can't visit these brilliant people in their booth, you can find their website here. 

For those of you at ASMS? GO VISIT!! Tell them I sent you, LOL, when I realized what they're doing, I think my enthusiasm might have come off just a little...crazy....

EDIT (6/11/16): Seriously, this is not the only really smart thing this little company is doing. And when I saw another brilliant poster from them, I realized that they weren't just dropped off by a UFO at ASMS 2016 with proteomics from the future. Tony Koller had mentioned to me the ridiculous work they were doing probably about a year or so ago and I forgot to investigate them. An endorsement from somebody that good at this mass spec stuff reinforces that these are some people we need to keep an eye on. I've asked them if they'd come out to Maryland/DC to show some of this stuff sometime in the coming months. I'll keep you informed!

Tuesday, June 7, 2016

When you get to only 1 talk on a day at ASMS? Top-down KRAS, FTW!!

I consider it a pretty good ASMS in my line of work if I get to 3 talks. Just too much to do. Today, I really get a shot at one talk and only by running really fast and being late for some other stuff.

Top-down proteomics + K-RAS  (the most common protein mutated in cancer!!) Yup, that's where I'm going to end up.

This talk was an extension of this paper I rambled on about recently.

Thinking "Great....another talk on a paper I can read?" Since reporting on the K-RAS intact proteomics data, they report 10x more coverage of intact proteins. They also show that improvements in top-down technology allow identification of positional isoforms (which all come out in one single peak!) and improved statistics that lead to all of these new proteins.

They get enough quantifiable top-down that they can do pathway analysis!!  Quick conversation with the presenter says this paper is in the works! I sure can't wait for it!

Monday, June 6, 2016


Whoa!  What's that thing? That is a capillary electrophoresis equipped Q Exactive HF Biopharma (it can do all the regular QE stuff, but has additional features for intact and native proteins).

The CE system is called the ZipChip. High resolution capillary electrophoresis that can separate -- metabolites, peptides and even native intact proteins.

How high is the CE resolution? It can separate out your intact proteoforms that differ by only one PTM!!!

Sunday, June 5, 2016

Genome level tumor proteome characterization!

WHOA!!! Great talk and it's from this ridiculously great paper in Nature Communications from S Tyanova et al., that you can find here!


Application of SUPER SILAC to a huge cohort of tumors and the application of beautiful levels of statistical analyses to flush out the differences between these tumor types.

Observations found and validated that did NOT appear quantitatively dysregulated at the transcript level. (Though a bunch that did! Don't think I'm giving up on the power of RNA Seq just yet, just reinforcing the fact that they are both necessary to get a real picture.)

Back to the beautiful statistics!  New algorithms implemented into Perseus...which means it's currently outside of my capabilities.

Saturday, June 4, 2016

MutaBind! Does a mutation in your protein mess up your complex!?!?

As I gaze into my crystal ball from the time warp that is this blog (seriously, I have no idea what time it is on this blog at all. I just hit "Automatic Publish" and it puts in a random time. I do really prefer the format of the pages when there is one post per day -- that is the other reason dates and times don't seem to make sense..what was I talking about?) I see that protein interactions will be HUGE in 2016. Like we-finally-figured-out-how-everyone-can-do-crosslinking-experiments kind of huge.

What do you need in terms of downstream analysis? Depends on what you're doing but MutaBind could be a huge asset!

What's it do? It figures out how a specific mutation might affect your protein-protein binding at the tertiary/quaternary level. What?!? I know!

You can read about it here (open access!). It's super sweet!

Friday, June 3, 2016

Off to Sunny San Antonio!!!

Wait. What...?  This is the desert, right?

I made it. The U.S. media makes its money by trying to frighten everyone. It's raining a little. No floods here. Don't be concerned about coming!!!

Thursday, June 2, 2016

How could you improve Proteome Discoverer? How 'bout a free version?!?!

Y'all know I'm biased. I love what Proteome Discoverer has become. It was a voyage for me. From unfiltered hatred for version 1.0 to grudgingly using 1.2 to really liking everything after 1.2 a whole lot. With 2.0 I think we've easily got the most powerful commercial platform in the world.  Sure, it could use some statistics and better label free algorithms, but I hear lots of work is being done to that effect.

But if you were going to do one thing to make PD better today, what would that be?

Well, the IMP thinks that one way to do it would be to make it FREE!

You get free PD, and you get free PD, you all get free PD!

Wait. What? I know! Confused? Me too! But this is what it says:

PD has always had some nodes that anyone could use. The ability to convert Thermo .RAW files to MGFs, for example. In PD 2.x users get the ability to use many of the Consensus and filtering nodes without a license (or with an expired demo key). But what about the free nodes that groups like IMP, OpenMS and Protein Metrics have made? Apparently they can work in that expired demo environment.

So Karl Mechtler and his group decided to fill in the gaps. They made nodes that will allow you to run full workflows in PD 2.x.

Oh...and they also threw in a node that does quantitative statistics based on the Limma R software package that was originally developed for transcriptomics.

As a win for everybody? They also resurrect the MS2 Spectrum processor!!

I'll throw out a healthy disclaimer here. As with any of these nodes that aren't released by the manufacturer, you can't get these and then bug the software manufacturer if you have problems with them.  Fortunately, IMP has set up a GoogleGroup and help documents!!

Enough of my ramblings? You just want to run a single .EXE file and get a bunch of cool new nodes and the ability to run a new version of PD on any computer?

Go to this link: