Thursday, March 31, 2016

Case study -- Beautiful looking data but NO peptide IDs!!!!

...Sherlock Pug...

Anyway!  I recently got involved in some sleuthing helping a friend solve a really weird mystery and he gave me permission to share this in case any of you guys run into it.

Background: GREAT lab. Q Exactive Classic that is approaching historic levels in terms of the number of papers this one little box has turned out. Absolute work horse that has pretty much been calibrated and PM'ed and otherwise hasn't stopped running in 4 years.

Big study underway. Fractionated samples through a time course or something. 400GB of RAW data or so. Ran through a weekend and a holiday break. Why not. This thing has never had a problem and we know from lots of historical evidence that the calibration will be steady throughout the run.  And this instrument is always lockmassed on our good friend 445.120025 (QE rounds to "3", of course)

Go to process the data --- no peptides ID'ed! None. Check the TIC? Fantastic!  Check the MS/MS spectra -- tons of beautiful spectra. WTF!! (what the phenylalanine??)

There are a lot of steps in a proteomics pipeline where things can go bad, right? But the RAW data looks fine, so maybe Proteome Discoverer isn't communicating with the Mascot server correctly? Maybe the Mascot server is just mixed up? Maybe someone switched the FASTA database (don't laugh, I've seen that one, LOL!)

I'm lazy and don't have a lot of free time, so I start with one file. I run it through Sequest on my server. Same results.  Time for Preview (which I run through PD, of course!)

HOLY COW!!!  Wait, what? We're 65ppm off at the MS1 and 64ppm off at the MS/MS??  Is this the best file I ever ran on my QTrap? (LOL. Hey, I loved that QTrap and built my entire career on it, but you don't buy a QTrap for mass accuracy. You buy it for...well, we bought ours cause it was free! True story!)  No, this is Q Exactive data and 65ppm is nuts.

More investigation. This lab had power outages due to a recent severe storm. The Orbitraps are all on UPS, so they never went down. But the climate control? Well, it lost its mind.  Turns out the room where the Orbitraps are may have gotten heated to over 90F (that is over 32C for you people who use intelligent units of measurement. Slackers. Anyone can count by 10s!  In 'Murica we like to challenge ourselves with arbitrary historic units!)  Over 90F...maybe more than once.

So...if you've got an LTQ Orbitrap in a room and your room goes from sauna to freezing a couple times, chances are nothing is going to happen. There is a huge water cooling system on that bugger and it's gonna stay at/around 26C. A Q Exactive, however, has no cooler. It compensates calibration by the temperature and there are limits.

Also, lock mass on a Q Exactive can only correct mass errors within 20ppm.
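To put numbers on that, here is a quick sketch of the ppm math in Python. The observed m/z is a made-up value (not pulled from the actual file), picked to land about 65ppm above the 445.120025 lock mass; the 20ppm window is the limit mentioned above.

```python
LOCK_MASS = 445.120025      # polysiloxane lock mass from the post
LOCKMASS_LIMIT_PPM = 20.0   # correction window mentioned above

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

# Hypothetical observed value, chosen to sit roughly 65 ppm high
observed = 445.148958
error = ppm_error(observed, LOCK_MASS)

print(f"{error:.1f} ppm off")
print("within lock mass window:", abs(error) <= LOCKMASS_LIMIT_PPM)
```

At this m/z, 65ppm is only about 0.029 Th, which is why the spectra can look gorgeous while every search engine with a 10ppm precursor window comes back empty.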

Okay, mystery solved!  How do we fix it?

Well, Preview/Byonic's recalibrator has no problem with it. Let's try another fast recalibrator!!!

Thank you Mechtler lab!!  Now, normally this thing works without a hitch. Boom!  Recalibrated MGF output. In this case, however, we uncovered a limit to this awesome free resource (this is at, btw).  65ppm is too far out, it appears.  I fixed some stuff that was around 20ppm, but that has been my upper limit so far.

What else? Now, there are plenty of tools out there, but I don't know all of them.

One tool that no one seems to know about is a free tool from Thermo (I didn't know about it, shoutout to Detlef for introducing me to it). It might be already installed on your instrument.  Some engineers put it on at install, especially on the LTQ Orbitrap instruments. If you don't have it you can get it at:

The tools are called FTPrograms. There are several cool things in there, but one of them is called RecalOffline. If you install FTPrograms, Windows may not recognize it as a new tool and highlight it. You'll have to use the search bar to find RecalOffline, but then you can make a desktop shortcut to it:

Okay, here I grabbed a random file and tried to anonymize it. This is just an example and I've got a meeting to get to!

So you give RecalOffline a mass to choose. Here I put in my polysiloxane 445 and then you give it a mass tolerance. I cranked this up to 70ppm just to see if it can take it. And you have other setting controls. It will then go through every single MS1 scan and adjust the calibration so that you've essentially lockmassed the file.
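If you want a feel for what "lockmassing after the fact" means, here is a minimal Python sketch. To be clear, this is NOT how RecalOffline works internally (I don't know that); it assumes a simple multiplicative correction per scan, and the function names and the example scan are made up for illustration.

```python
LOCK_MASS = 445.120025  # polysiloxane lock mass from the post

def find_lock_peak(mz_values, target=LOCK_MASS, tol_ppm=70.0):
    """Return the peak closest to the target within tol_ppm, or None."""
    window = target * tol_ppm / 1e6
    candidates = [mz for mz in mz_values if abs(mz - target) <= window]
    if not candidates:
        return None
    return min(candidates, key=lambda mz: abs(mz - target))

def recalibrate_scan(mz_values, target=LOCK_MASS, tol_ppm=70.0):
    """Rescale every m/z in one scan so the lock peak lands on the target."""
    lock_peak = find_lock_peak(mz_values, target, tol_ppm)
    if lock_peak is None:
        return list(mz_values)  # no lock mass found: leave the scan untouched
    factor = target / lock_peak
    return [mz * factor for mz in mz_values]

# Hypothetical scan that reads about 65 ppm high across the board
scan = [mz * (1 + 65e-6) for mz in (445.120025, 500.2500, 785.8421)]
fixed = recalibrate_scan(scan)
print([round(mz, 4) for mz in fixed])
```

The tolerance matters: with a 20ppm window the 65ppm-shifted lock peak would never be found, and the scan would pass through uncorrected.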

Worth noting: this can be very slow. Do this on an old Dell you've got sitting in a corner that you aren't sure why you still have. Or on something real fast (SSD, for the win!). You can also use the Slicer function to cut out a smaller section of the file and then Recal it to reduce your processing. And as far as I know this GUI can only be used for one file at a time. I believe there is/was a way to automate it from the CMD prompt, but I don't know if that feature is still enabled or how to do it or if I just made that up.

And please note, I'm not questioning anybody's skills here. These guys know what they're doing. If you walk away on a Friday and there is no alarm to tell you that your buffers on your LC are boiling, this can totally happen to you. I shared this anecdote with some of the members of the great Thermo COE team and 2 of them had seen something similar happen in their years on the road.

What I want to show you is that all is not lost. Chances are you can totally save those files, and these are tools that can do it.  BEN, SHUT UP AND GO TO YOUR MEETING!!!

Wednesday, March 30, 2016

MaxReport GUI- A friendly complement to MaxQuant

There are a lot of MaxQuant fans out there. I was one once, for sure, and I keep meaning to check out the new versions. No question, it is powerful software!
My objections to the software are 1) Wow, that is a lot of buttons and settings and I have no idea what those are..and really, I don't have time to take a week of classes to figure out what they all do and 2) Wow, that output is really ugly.

We all have different strategies to get what we want. If you want to maintain your aura of the awesome mass spec wizard in the basement with your tools that no one in your department will ever really understand -- actually, most software will let you dazzle your ignorant collaborators with your unsurpassed brilliance --- but MaxQuant just might take the cake. (Just make sure you know what all 75 columns in that output sheet are when they ask! Or do the huffy frustrated genius thing, roll your eyes, storm out and go look them up.)

(First image Google came up with for "frustrated wizard"!  I'll take it!)

Do we have to trade power for ease of use? Not according to Tao Zhou et al., and their new work in PLOSone!  Here they introduce the MaxReport GUI above.

You can download the GUI along with the source code (Python) and a CMD version of the program here.

Alright!  That there is a rational number of buttons!  And all I had to do was download, UnZip and run the .EXE to get an interface.

Okay, so what does it do?  According to the authors its main mission is to:  optimize the results of MaxQuant and to provide additional functions for protein N-terminal modifications, isobaric labeling quantification and descriptive statistical analyses

Yeah!  Now, it's worth noting that the current version is compatible with MaxQuant 1.2-1.5. What do we really get now out of the program? Well, we get some nice output graphics and summaries of the data.

Hey!  It's one heck of a lot friendlier than the imposing Excel worksheet I tend to end up with. And biologists do love pie charts. What I'm seeing is some stuff that Perseus doesn't give me and something that is a whole lot friendlier than an Excel worksheet that goes out to column CQ. Does it really take MaxQuant and package it nicely for muggles? Not quite, but it's certainly a step in the right direction!

Edit: The more I think about it, the more I like the ability to stack predicted isobaric tag quan info over the observed. That actually is a pretty nice touch. Not sure where I'd use it, but I haven't seen it before!

Edit 2:  Stolen from this Tumblr

Tuesday, March 29, 2016

Hunting metabolic biomarkers of Huntington's disease

More metabolomics? Geez, Ben, read some proteomics papers!  Sorry, this is what I'm doing in my free time right now! And doing metabolomics this year has yielded more interesting things than the last 5+ years of proteomics runs I've evaluated on a certain organism I think about in my free time.

In this paper described in the title, Stewart Graham et al., appear to have reached a similar position. The samples they work with are Huntington's disease. Another awful nefarious neurodegenerative condition.

This is interesting. So metabolomics has been done on these samples before.

Someone did NMR...and found nothing.

Someone else did metabolomics with a QTOF...and found nothing...

So...why on earth would you think that you should get some precious human brain samples and do Orbitrap metabolomics on it? Cause...maybe the Orbitrap is the BEST metabolomics tool in the world? And you'll find stuff with it that nothing else can find? Maybe!

In this study, these researchers use an Orbitrap Elite and do positive mode only.  They tuned the instrument with a couple of low mass metabolites. They don't elaborate much here. You can gain some stuff on an Orbitrap with an S-lens if you tune the front lens. Don't mess with the others. Just lock them in the correct settings. (Hint: that's why we don't tune a Q Exactive, the S-lens is locked in the correct settings. Those are on the blog here somewhere, right?) [[ Edit: here is a link to the S-lens ideal settings : 2,3,9,15,20,800,front-lens!  If you've got an LTQ with an S-lens hang that document up somewhere nearby! ]]

They run the instrument in MS1 only mode (as far as I can tell) and used a resolution of 60k. You might argue that with a high field Orbitrap that has its overhead in the front of the instrument that it would make sense to run 120k, but 60k looks pretty darned nice here!
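For a rough feel of what 60k versus 120k buys you down in metabolite territory, here is a back-of-the-envelope sketch. It assumes the nominal resolution is quoted at m/z 400 and that Orbitrap resolution falls off with the square root of m/z; the m/z 200 example is just a convenient made-up small molecule, not one from the paper.

```python
import math

def resolution_at(mz, nominal_resolution, reference_mz=400.0):
    """Estimated resolution at a given m/z, assuming 1/sqrt(m/z) scaling."""
    return nominal_resolution * math.sqrt(reference_mz / mz)

def peak_width(mz, resolution):
    """Full width at half maximum implied by R = m / delta_m."""
    return mz / resolution

for nominal in (60_000, 120_000):
    r = resolution_at(200.0, nominal)
    print(f"{nominal} setting: R ~ {r:.0f} at m/z 200, "
          f"FWHM ~ {peak_width(200.0, r):.4f}")
```

Small molecules sit at low m/z, where the effective resolution is higher than the number on the method page, which fits with 60k looking "pretty darned nice" here.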

They run some pooled samples to improve the statistics, convert the files to mzXML (or something) and run the data through XCMS and use a ton of stats I don't understand.

To review: NMR didn't find anything. A Q-TOF analysis didn't find anything. An Orbitrap found 200+ features that were differentially regulated and significant enough that they went back, reran these and targeted for fragmentation. At MS1 of 60k you can basically tell what a lot of these metabolites are but fragmentation and mzCloud can verify these identities (and figure out the ones that you are unsure about).

They put their 15 favorites in a table.

Then they put their 8 favorite pathways THAT ARE MESSED UP IN THIS DISEASE in the final table.  And this is just the soluble metabolites. That ionize in positive mode. And elute in reverse phase!

Seriously. If you have a biological model and you're thinking...well the metabolome can't be involved here cause So and so et al., did 4,000 Q-TOF runs on it...you might want to just take a look. If you're reading this you probably have an Orbitrap...

Monday, March 28, 2016

A snapshot of how far we've gotten!

Stole this off Twitter tonight (thanks, Julian!). Still counts as a post to Blogger!

Sensitivity is up! Speed is up, and in the immortal words of the great Ricky Bobby....

Thursday, March 24, 2016

FPOP (and lock) footprint quantification!

I am going to admit it. Every single time I've heard about FPOP, my brain has started saying something about "Pop and lock," which I assume is in some dumb Dubstep song on my Pandora station (and apparently that, above, is what it is...)

What is it?  What?  FPOP!  What is FPOP? Oh, it's a mass spec based protein structure characterization method!  Wait! Here is a picture from this awesome new paper from Aimee Rinas et al., that I'm going to talk about once I get back on topic. (Get back on topic, Ben!)

Who needs an NMR (whatever that is...) to figure out what domains of their protein are where!?  Not you if you've got an Orbitrap. Your protein in its 3D state hangs out in some solvent and then you hit it with oxidizing conditions. The domains exposed on the surface are oxidized/modified. Compare, for example, a mutated protein to the wild-type in this way and you can look at how your mutation changes the protein 3D structure. BOOM!
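The comparison boils down to "what fraction of this peptide got modified?" Here is a minimal sketch of that arithmetic, assuming the common footprinting convention of modified signal divided by total signal. I am not claiming this is the exact formula from the paper, and the peak areas are invented.

```python
def extent_of_modification(unmodified_area, modified_areas):
    """Fraction of a peptide's total signal that carries a modification."""
    total_modified = sum(modified_areas)
    total = unmodified_area + total_modified
    if total == 0:
        raise ValueError("no signal for this peptide")
    return total_modified / total

# Hypothetical peak areas for the same peptide in two protein forms
wild_type = extent_of_modification(9.0e6, [1.0e6])
mutant = extent_of_modification(6.0e6, [3.0e6, 1.0e6])

print(f"wild-type: {wild_type:.0%} modified")
print(f"mutant:    {mutant:.0%} modified")
```

A peptide that jumps from 10% to 40% modified in the mutant would suggest that region became more solvent exposed.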

An interesting note from the paper. We only know the 3D structures of about 11% of the proteins in Swissprot. I think it's because whatever that NMR thing is, isn't all that fast. Not only does this FPOP thingy give us the ability to find out stuff about our 3D structures without other equipment, this is LC-MS, so presumably we can automate it and speed it up!

This cool paper addresses this labeling technique and shows you how you can use the FPOP methodology and process the data with Proteome Discoverer.

Now, not everything in the world is easy to do. And setting up Proteome Discoverer for this experiment is one of those things that isn't easy to do. Good thing this clever team in Indianapolis set up the whole thing for us!

That's what it looks like. Why all the nodes? Well, cause a downside of FPOP is that it doesn't just put one modification on the protein; it can put a whole bunch of different mods. And you need to use multiple nodes to get that many mods. The use of Mascot and Sequest allows for higher certainty in the assignment of the modifications (if Mascot and Sequest both agree on this ID, then that adds more certainty, right?) which is important when your search space is this big...even on a single protein.

With all of these competing engines and mods, you'd immediately worry about FDR, right? A second layer of FDR calculations (performed using a cool Excel add-on!) shows that this not only works, but when you are looking at PTMs with HRAM MS1 and MS2, the number of false identifications really isn't that bad!

EDIT: I recently spoke to an author on this study and realized I left out something important here. FPOP labeling is performed with a specific wavelength of lazer (laser?). This is a critical requirement for performing the experiment.

Wednesday, March 23, 2016

"Miniprep" methodology for extracellular vesicles

Ever spent time around genetics people working? It's all "miniprep" this, "maxiprep" that. For over a year I thought that a coworker had an odd speech mannerism or defect of some kind, then I realized there is also a "miDiprep" and he just couldn't say the "N" sound in that particular word.

The subject of this really nice paper from Jaco Knol et al., is a miniprep isolation of extracellular vesicles (EVs). Why? Cause EVs are super important and great places to find biomarkers. Unfortunately, the prep protocols out there are pretty terrible. You ultracentrifuge some huge amounts of samples forever and:
1) Maybe it works (probably doesn't)
2) It takes forever
3) If yours does work and somebody else tries to reproduce it -- it doesn't work. Then you assume they're dumb and you don't want to go out for drinks after work with them and that's a slippery slope of seeming snobby and/or just antisocial and you eventually have no friends and just end up writing a blog or getting a bunch of small animals or something else in some desperate effort to fill that lonely void in your life.  ;)

To save themselves from this fate, these authors go through a previously described protocol that has been shown by various inferior methods to be 1) Fast 2) Scalable 3) Reproducible! and show that the darned thing works for proteomics!!!

Check out the time savings summarized in the pic above. Normal method on left! Almost 4 hours of just centrifugation. Versus...wait...I cut the timing off the right. It's 32 minutes! You'll have to believe me, I'm not taking another screenshot. I've got to go to work this morning.  Way better!

Not only does this method work, but the coverage is awesome. They show something along the line of 3,000 protein IDs with their method which is on par with the much longer method and pretty solid levels of reproduction.

This is a great little method that is going to make the days of some people I know a good bit shorter. It also highlights the power of looking at EVs, and the dangers of overlooking them. That is a big section of the proteome floating around there!

Monday, March 21, 2016

How to prep red sparkling wine for proteomics!

Imagine this: You are about to get ready to prep some red sparkling wine for proteomic analysis -- then OH NO! There are only protocols for white sparkling wine! You search and search the literature only to find that everyone before you has taken shortcuts and studied Champagne or whatever.

That was really the case before this year!  Fortunately, Elisabeth Vogt et al., bit the bullet, did the heavy lifting, opened some red wine and, working 50mL at a time, developed some good methods for getting to clean red wine proteins.

I can't fault this study. They found an unaddressed subject and developed a method to address it that they describe in concise and logical detail, when I probably would've just FASPed it just to see what happened.  I know what you're thinking. Another wine study? I bet this group is European. And......

....they totally are! (A Top Cat reference...seriously...?)

Sunday, March 20, 2016

Unsupervised quality control!

I LOVE quality control (you may see evidence of this fact here and there in this blog). Most of the stuff that you'll see, however, is what I think would be considered "supervised QC". Not 100% on that actually.

This page gives a breakdown of what is supervised and what is unsupervised learning in statistics. And it uses the disembodied head of George Clooney to drive the point home (??)  So it is probably better to say this: the QC software that I've seen before has been more of a supervised type, in that there are specific compounds or parameters that the software is told to look for.

In this new paper from Wout Bittremieux et al., this team describes a suite of tools they have developed that are for unsupervised QC. Seems like you have to do a whole lot less up-front planning to keep watch over how your experiments are going!  That's the up-side. The down-side is that these tools were developed in Python....

You can access these tools at Github here!
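To make the supervised/unsupervised distinction concrete, here is a tiny sketch of the unsupervised flavor: nobody tells the code what a good run looks like, it just flags whatever sits far from the rest. This is NOT the method from the paper or the GitHub tools; it is a stand-in using a robust z-score (median and MAD), and the per-run numbers are made up.

```python
from statistics import median

def robust_z_scores(values):
    """Z-scores built from the median and MAD, so one bad run can't skew them."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

def flag_outliers(values, threshold=3.5):
    """Indices of runs whose robust z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]

# Hypothetical QC metric: identified PSMs per run
psms_per_run = [41200, 40850, 41900, 40100, 12300, 41500]
outliers = flag_outliers(psms_per_run)
print("suspicious runs:", outliers)
```

The 3.5 cutoff is a commonly used rule of thumb for this kind of score, not something tuned for mass spec data.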

Saturday, March 19, 2016

Find some missing transmembrane proteins!!

Word of advice this morning: Do not just search for the word "missing" for a cool image. It can be somewhat depressing. "Found" isn't all that much better... Moving on!!

That one is okay!  Moving on!

Okay, so the article that started this morning's search on the train is this cool new study from O. Vit et al., and is an analysis of the proteins hidden in the membrane.

Membrane proteomics is still not trivial business and this study reveals why when they detail how they uncovered 13 of the missing proteins (the ones that have been identified at the DNA and/or transcript level!) that we've never seen at the proteome level. 

I took this image from WikiPedia. Right, this is how we think of proteins with transmembrane domains. That there is a long series of nonpolar amino acids that cross the membrane, then a long section that is polar on either end. But what if there aren't long polar ends sticking out? What if they're incredibly short and/or undetectable to typical shotgun approaches for other reasons (like no lysines or arginines (too long!) or tons of lysines and arginines (too short?))? Then we've got to get that stupid transmembrane domain.
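Here is a quick in-silico version of that too long/too short problem. It is a sketch under assumptions: a simple trypsin rule (cut after K or R unless the next residue is P, no missed cleavages), a 7 to 35 residue "detectable" window that is just a hypothetical rule of thumb, and a completely made-up protein sequence.

```python
import re

def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def classify(peptide, min_len=7, max_len=35):
    """Length window is a hypothetical rule of thumb, not a hard limit."""
    if len(peptide) < min_len:
        return "too short"
    if len(peptide) > max_len:
        return "too long"
    return "ok"

# Made-up protein: K/R-rich ends around a 60-residue hydrophobic stretch
protein = "MK" + "R" + "AILVFG" * 10 + "K" + "RK"

for peptide in tryptic_digest(protein):
    print(len(peptide), classify(peptide))
```

Not a single peptide from this toy protein lands in the window, which is the whole problem with these things in a standard shotgun run.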

So this group went old-school. They sheared the surface proteins off with mechanical force. They busted the cells and only kept the membranes, then used solvents and (ugh...) CNBr. (P.S. Cyanogen bromide cleaves at methionines if you're careful and it doesn't kill you first. To my fellow old lab rats who think I'm being a wimp about it -- you're right!)

How'd they do? I guess I already gave the secret away. 13 missing proteins ID'ed with high confidence!  One step closer to making sense of the link between the stages of the central dogma? And a great method for getting high coverage of the trickier sections of the membrane proteome!  So pretty good! 

Friday, March 18, 2016

Spectral libraries are a great idea for plants, too!

I only tried to do plant proteomics a couple times and it wasn't great. It is like a plasma proteomics run. Go across the TIC and point out the 50 most abundant peaks -- instead of being albumin, they are something called Rubisco.

Worse yet, there are tons and tons of polymer type things in there. Long polysaccharides all over the place (sometimes with amino acid attachments), but definitely stuff that retains well on C-18.

In this paper from Pier Righetti and Egisto Boschetti, they show the power of a couple of solutions to this problem. While the paper focuses a lot on something called CPLL (ligand binding something or other), I didn't read that part. Why lie to you?

What I found was interesting (is interesting) is that this is another place where spectral libraries show a ton of power. I've got some local friends who are kinda famous for spectral libraries and the more I visit them the more I've come to terms with: 1) Spectral libraries are way more powerful than I ever thought and 2) Ridiculously under-utilized.

Maybe I've mentioned this before? The fact that I saw a human tumor searched against a multi-GB spectral library and that the decoy spectral match was something in the range of 1e-3? Can't give details cause they're gonna publish this (soon, I hope!!!). Honestly, I've actually given talks (with some people from this lab in the audience....cringe...) about spectral libraries being cool, but not really useful, cause I honestly thought they were dumb.  And you know what? I was wrong.  It happens a lot. Maybe someday someone will prove to me that SWATH isn't dumb...or flight tube ion mobility... I've got an open mind, fo' real, yo!

In this study they take a look at their problem. Not that they research plants, but the reasons why plants are hard to do proteomics on. Domination of high abundance proteins and polysaccharides all over the place. Generate some high quality spectral libraries and you can see through some of that noise. Just another paper on the stack of recent evidence that spectral libraries can boost our ID #s and make our lives easier.

Back to dancing time!!

Wednesday, March 16, 2016

GOrilla! Gene ontology enrichment online! And it's fun to say!

Shoutout to Brett Phinney for tipping me off to this super cool tool!

Gene Ontology enrichment is a hot topic these days!  Problem is, I don't have the time to learn an R script to do this stuff.

The GOrilla is a web-based solution!  It is 1) Fun to say (say it with me -- "GooooooooRILLA!!!") 2) It's real easy and fast and 3) It might be a rapid and free way to make some sense of that big list of differentially regulated genes/proteins you're looking at.

First of all, you need your background protein list. So...everything you identified.

If you're using PD 2.1 you can just start at your protein list and Export that top layer to Excel (click to expand)

Next you need to filter your list down to your differentially regulated list. (Thank you Dr. Horn for your assistance with this filter. It was driving me nuts.)

This filtered all of my data down to the ones that are up- and down-regulated beyond an arbitrarily determined statistically determined threshold cutoff.

Repeat the Export. In my case I end up with 148 differentially regulated proteins.
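If you would rather build the two lists outside of PD, the logic is just "everything" versus "everything that passes a cutoff". Here is a minimal sketch. The gene names, ratios, p-values and both thresholds are hypothetical, and this is not the PD filter from the screenshot.

```python
import math

# Hypothetical exported rows: gene identifier, fold change, p-value
proteins = [
    {"gene": "GENE_A", "ratio": 2.8, "p_value": 0.004},
    {"gene": "GENE_B", "ratio": 1.1, "p_value": 0.600},
    {"gene": "GENE_C", "ratio": 0.3, "p_value": 0.010},
    {"gene": "GENE_D", "ratio": 4.0, "p_value": 0.200},
]

def is_differential(protein, min_abs_log2=1.0, max_p=0.05):
    """Up- or down-regulated at least 2-fold with an acceptable p-value."""
    big_change = abs(math.log2(protein["ratio"])) >= min_abs_log2
    return big_change and protein["p_value"] <= max_p

background = [p["gene"] for p in proteins]  # everything identified
target = [p["gene"] for p in proteins if is_differential(p)]

print("background:", background)
print("target:", target)
```

Working on the log2 scale treats a 3-fold drop the same as a 3-fold jump, which a raw ratio cutoff does not.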

Now, go into both of the Excel spreadsheets and select the column that has your Universal Gene Identifier. Wait. You don't have one?? Reprocess your consensus with the Annotation node. That should do it. Maybe do that before you do all the steps above. It'll look like this.

Now, this is SWEET! Just highlight the whole column (first the differential and then your background list) and set up the GOrilla to take both.

Tell it what organism it is looking at, copy in the lists (it is VERY forgiving of formatting!) and hit the Search button.

And it's gonna kick out an awesome pathway like this thing!!
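What is the website doing with those two lists? Here is a sketch assuming a plain hypergeometric test, which is the textbook approach for a target list versus a background list; GOrilla's own statistics and multiple-testing correction may differ. The 148 comes from my list above, the other three numbers are invented.

```python
from math import comb

def hypergeom_enrichment_p(N, B, n, b):
    """P(X >= b): chance of seeing b or more term-annotated genes in a
    target list of n, drawn from a background of N with B annotated."""
    upper = min(n, B)
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return hits / comb(N, n)

# 148 differential proteins (from above); the rest are hypothetical:
# 3000 background proteins, 60 carry the GO term, 12 of those are in the target
p = hypergeom_enrichment_p(N=3000, B=60, n=148, b=12)
print(f"enrichment p-value: {p:.2e}")
```

By chance you would expect about 3 of the 148 to carry that term, so 12 is a strong enrichment. With thousands of GO terms tested, a real tool also has to correct for multiple testing.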


Sorry it doesn't fit on the screen, I only had 20 minutes between meetings!!  This is cool enough to frantically put together, right?

Tuesday, March 15, 2016

Proteomics of MRSA blasted with antibiotic combo!

MRSA (methicillin resistant Staph aureus) is one of a bunch of scary-as-heck superbugs out there that are resistant to just about every drug we can throw at them.

Several recent studies tout the efficacy of so-called "resistance breakers," which are multi-drug combinations that lend efficacy to traditional antibiotics (cause...we'll probably have new antibiotics in 20 years or so...).

In a very similar tilt, this new paper from Xiaofen Lu et al., takes a look at an effective drug combo at a proteomic level in MRSA.  The drugs in this study are plain-old oxacillin and a subtly altered derivative of erythromycin.

In general, the -cillin drugs block the action of Penicillin Binding Proteins (PBPs). These proteins catalyze the formation of short peptide crossbridges that hold the long glycan strands of bacterial cell walls together. How erythromycin, though we've been using it for 60 or so years, kills bacteria of some kinds? We're not really sure how. But it is pretty well established that it doesn't directly affect the bacterial cell wall.

Check out the electron microscopy at the top, though!  The panels go like this 1) MRSA chillin' 2) MRSA chillin' 3) MRSA chillin' 4) MRSA FREAKING OUT!!!!

So we've got a drug combo that kills this scary stuff. Let's do some proteomics!

So they harvested some peptides using normal techniques for bacteria and loaded them on an LTQ Velos and used spectral counting for quantification. Look, I've got exactly zero problems with spectral counting if it works for you. Are there more sophisticated quantification methods? Sure! Should you use spectral counting on a high resolution/accurate mass instrument? Probably not without really thinking about your experimental design (particularly your instrument parameters - as most Orbitrap instrument methods are designed in such a way, by default, to lower the efficacy of spectral counting techniques -- not on purpose, but because you give up sample depth by increasing spectral repetition - and they are designed with sample-depth in mind). [P.S. There are ways to improve the efficacy of spectral counting on instruments like the Q Exactive that are designed, by default, to "ignore isotopes" and not repeat the fragmentation of the same ions. But you will be giving up coverage. If it is worth it for you I can show you how some groups I've worked with are currently doing it, though I just gave you two big hints.]

If you've got a fast ion trap and large amounts of material and good software for spectral counting statistics -- spectral count away!

Just look at the results this group got. Hundreds of statistically significant differentially expressed proteins! [Okay. Is "differentially" not a word? Cause Blogger thinks it's not a word. But I'm going to use it in every post, so there.]

For data processing, this group used OMSSA, threw out any proteins with less than 5 spectral counts, and did downstream analysis with MatLab. Q-RT-PCR was used to validate the observations. In the end, they find a big change in the abundance of a major player in cell wall formation and a disruption in the bacteria's ability to respond to oxidative stress.
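The spectral count part of that is easy to picture in code. A minimal sketch, with assumptions flagged: I am reading "less than 5 spectral counts" as the total across both conditions (the paper may apply it differently), the +1 pseudocount is my addition to avoid dividing by zero, and the proteins and counts are invented.

```python
import math

MIN_SPECTRAL_COUNTS = 5  # cutoff mentioned above

# Hypothetical spectral counts: protein -> (control, treated)
counts = {
    "PROT_A": (40, 12),
    "PROT_B": (3, 1),
    "PROT_C": (8, 30),
}

def passes_filter(control, treated, minimum=MIN_SPECTRAL_COUNTS):
    """Assumption: the cutoff applies to the summed count for a protein."""
    return control + treated >= minimum

def log2_ratio(control, treated, pseudocount=1):
    """Treated over control, with a pseudocount so zeros don't blow up."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

kept = {
    name: log2_ratio(control, treated)
    for name, (control, treated) in counts.items()
    if passes_filter(control, treated)
}
for name, ratio in kept.items():
    print(f"{name}: log2 ratio {ratio:+.2f}")
```

The low-count filter is doing real work here: with 3 versus 1 spectra, PROT_B would show a "2-fold change" that is pure counting noise.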

All in all, this is a really nice study. Great model, good use of a work horse of an instrument and some good downstream analysis that reveals some surprising new information about one of the deadliest things out there!  Highly recommended (and open access!)

Monday, March 14, 2016

Proteogenomics reveals cool insights into hibernating mammals!

It's almost climbing season in the NorthEast! I got out with a good friend over the weekend. It is the new record for the earliest date in the year that I have EVER climbed in Maryland. See, even when we are systematically trying to heat up our planet in some bizarre act of stubborn childish defiance that will definitely kill us all -- there are some silver linings!  I got an 85F day the first week in March!

Wait...what was I talking about? Oh yeah!  Hibernation!

So, around where I grew up and where some of the best rock in the world is, we have a lot of these guys!  

...and this time of year they wake up really hungry and the first thing they do is have a huge and painful looking bowel movement (click for more info on bear fecal plug..why? I don't know..cause it's interesting?).... so they tend to be jerks.

Do we know an awful lot about hibernation? NOPE!  Heck, we didn't even know what those big foot long fecal plugs were made of until just a few years ago.

Sounds like a job for PROTEOGENOMICS!!

In this brand new study in JPR from K.J. Anderson et al., this team of researchers investigates the effects of the hibernation process on a nice model mammal. As you can imagine, bears are terrible models, so they used a 13-lined squirrel (much less of a jerk).

They got some squirrels, took some protein (I skipped how, on the grounds of that guy above being so darned cute) and they iTRAQ 8-plex labeled them, as so.

Now. Proteogenomics comes in...cause of course we don't have a good curated genome for that cute little guy. And maybe not for any hibernating mammal. So they took what sequences they could get from NCBI, did their own RNA-Seq analysis of these squirrels, threw in the cRAP database and they had themselves a FASTA.
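Stitching a custom FASTA together is mostly bookkeeping. Here is a rough sketch of merging several sources and dropping exact duplicate sequences. The three tiny "databases" are made-up placeholders, and this is not the Galaxy-P workflow the authors used.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def merge_fasta(*sources):
    """Combine sources in order, keeping the first copy of each sequence."""
    seen, merged = set(), []
    for text in sources:
        for header, sequence in parse_fasta(text):
            if sequence and sequence not in seen:
                seen.add(sequence)
                merged.append((header, sequence))
    return merged

# Hypothetical stand-ins for the NCBI, RNA-Seq and cRAP pieces
ncbi = ">ncbi|P1 placeholder\nMKTAYIAK\nQR\n>ncbi|P2 placeholder\nMSSLLK\n"
rnaseq = ">rnaseq|T1 same as P1\nMKTAYIAKQR\n>rnaseq|T2 variant\nMKTAYIVKQR\n"
crap = ">cRAP|contaminant placeholder\nACDEFGHIK\n"

merged = merge_fasta(ncbi, rnaseq, crap)
for header, sequence in merged:
    print(f">{header}\n{sequence}")
```

The single amino acid variant from the RNA-Seq set survives the merge while the exact duplicate is dropped. Those variants are the entries that make the single mutations visible at the peptide level.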

Now, the University of Minnesota has done a lot of proteogenomics. I still don't have a good feel for how the heck they do it. A lot of algorithms were used here. A lot in Galaxy-P, some Protein Pilot(??) and they pulled in some DAVID for downstream analysis.

In the end, though, they wind up with a bunch of peptide/database matches, some single mutations that make sense and are observable with high confidence at the peptide level and an overall nice story of how hibernating mammals reduce toxic nitrogen waste products during months of inactivity.

We also get another data point that shows: 1) Yes, transcriptomics is cool useful stuff 2) but without the proteomics to see what is REALLY happening, you've got a lot of spurious data that doesn't really show at the functional (protein!) level.

Sunday, March 13, 2016

BioCyc! Free access to almost 8,000 pathways!

Ever have one of those weeks that was so busy that you didn't want to read anything about mass spectrometry for fun for days?

It took access to some super awesome free tools to get me back into it today!

The tools in question are BIOCYC. You can access them here at

What are they? Just a ton of cool stuff. Nearly 8,000 curated pathways that have been analyzed at the genomic, proteomic or metabolic level. Downloadable tools that will take your differentially regulated gene/protein/metabolite list and overlay them on the established pathways. Tools for researching an outlying variable, etc., etc., an absolute ton of free stuff.

I know...there are several big sites out there with cool random tools all dumped together. What sets this one apart? Well, it's really well organized, it has a slew of nice tutorial and introductory videos and it flows really well once you watch the 8 minute video that explains the overall goal of the project.

Why did I break this one out? Cause I've got a list of ~1,000 statistically valid differential-ly abundant metabolites from a Compound Discoverer 2.0 run and this thing seems to know how to make sense of them!

Wednesday, March 9, 2016

Blast from the past -- Observable peptides!

I'm giving a couple of talks today and I was digging through my references.

In 2011, this team in this paper found that in a normal run on 2011's high end instrumentation, they could see via MS1 that there were >100,000 peptides (or features) available in their samples. They also found that they could only ID about 16k of them.

Funny thought here...I have a Lumos RAW file open and in front of me and I've got >100,000 individual fragmentation scans that were obtained on a 2 hour gradient. The instrument was set to ignore isotopes and make exclusions based on the uncharged mass, so even if it saw a +2 and +3 precursor of the same peptide, it ignored all but the single most intense (if it was the +2, it will not fragment the +3! Yay for tons of onboard processing power!!!)  So...I should really have fragmented 100,000 unique things in this run.
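If the "uncharged mass" bit sounds like magic, it's just arithmetic. Here is a minimal Python sketch of the idea, assuming a simple ppm tolerance check. The function names and the 10 ppm default are my own illustration, not what the instrument firmware actually does.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (rounded)


def neutral_mass(mz, charge):
    """Uncharged mass of a positively charged precursor from its m/z and charge."""
    return (mz - PROTON_MASS) * charge


def same_precursor(mz_a, z_a, mz_b, z_b, tol_ppm=10.0):
    """True if two (m/z, charge) observations share a neutral mass within tol_ppm.

    tol_ppm is a hypothetical tolerance chosen for illustration only.
    """
    mass_a = neutral_mass(mz_a, z_a)
    mass_b = neutral_mass(mz_b, z_b)
    return abs(mass_a - mass_b) / mass_a * 1e6 <= tol_ppm


# A peptide with a neutral mass of 1500.0 Da shows up at two m/z values:
plus_two = (1500.0 + 2 * PROTON_MASS) / 2    # the +2 precursor
plus_three = (1500.0 + 3 * PROTON_MASS) / 3  # the +3 precursor
print(same_precursor(plus_two, 2, plus_three, 3))
```

Both charge states collapse to the same neutral mass, so an exclusion list keyed on that mass only needs to fragment one of them.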

Interestingly, even if I run this file with multiple algorithms using tons of possible PTMs, alternative cleavages, a seriously awesome database, export the unmatched spectra for de novo sequencing with the awesome DeNovoGUI and combine everything with something as close to a 1% FDR as I can get, I still only get about 40% of them matched out.
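For anyone new to the "1% FDR" part: the usual trick is target-decoy counting. Below is a rough Python sketch of that generic idea, assuming each PSM is just a (score, is_decoy) pair where higher scores are better. This is not what any particular search engine or the combined pipeline above does internally; the function name and input format are made up for illustration.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score at which decoys/targets is still <= max_fdr.

    psms: iterable of (score, is_decoy) tuples, higher score = better match.
    Returns None if no cutoff satisfies the requested FDR.
    Tied scores are not handled specially in this sketch.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at or above this PSM
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Toy data: 100 good target hits, then decoys start creeping in.
toy = [(float(s), False) for s in range(200, 100, -1)]
toy += [(100.0, True), (99.0, False), (98.0, True)]
print(score_cutoff_at_fdr(toy))
```

Everything scoring at or above the returned cutoff is kept. Everything below it lands in the "what the heck is this stuff" pile.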

What the heck is the rest of this stuff??  They sure look like peptides from their isotopic distributions, they stick to C18 and elute like peptides....seems like there are some HUGE components of the human proteome that we don't know about yet (or maybe tons of little things?!?). Either way, I'd totally appreciate it if somebody would figure out what that other stuff is just so I'd know. If it is really a major biomolecule class we don't know about, maybe you can collect a Nobel for your troubles.

Monday, March 7, 2016

Natural genetic variation measured in nematodes shows mRNA levels are a small part of the picture!

...darned nematodes...

Wait...natural genetic variation? This is a proteomics blog (most of the time..) don't you need PCR or some super sensitive genetic technique to measure genetic variation? NOPE!  You can do it with mass spec, and you should be doing it with mass spec, and we need to get that into people's heads out there!

This brand new study from Polina Kamkina et al., is a great example. In this study they work with some hungry hungry nematodes using SILAC for protein abundance and microarrays for transcript abundance. What did they find? That "...only a fraction of the changes in protein abundance can be explained by changes in mRNA abundance."

This isn't particularly novel. We've seen this before. In this nice model organism, they come up with a higher correlation between the transcript abundance and the protein abundance than we've seen in some other systems, but it's still just a fraction.

Which begs the question, at least a little...why does it appear that transcript abundance is still king over lowly protein abundance? Which may be another rant for another time. Maybe...

From a transcript level we might argue here that microarrays aren't exactly the cutting edge, so it's a somewhat unfair comparison of technologies, but this organism is pretty simple and extremely well characterized.

Anyway, this group used SILAC and a couple of very divergent nematodes. They did some top-notch proteomics here with sample pre-fractionation with high-pH reverse phase into 47 fractions. They pooled the samples and normalized the fraction pooling content by UV absorbance at 214nm and used their own in-house peptide mixture to QC their separation (nice extra steps!)  They quantify using a combination of MaxQuant and Progenesis IQ and do their enrichment analyses and statistical analyses using some R packages. To top it off they go back in on their quantifiable targets and verify them with Skyline. This group doesn't mess around. If they say it's differentially expressed, it's differentially f'racking expressed!

All the data at the end was uploaded into the WORMQTL (which I assume is something well-known in the nematode world), and the RAW data was put into PRIDE.

Absolutely a solid analysis that is another great example that you need to look at the proteome level if you really want to know what is going on in your organism. Again, stuff we're all too aware of, but that the biologists out there really need to get on board with!

Sunday, March 6, 2016

Full out naive human stem cells successfully derived!

Stem cells have been some confusing business. It's a hot topic, so you can make a name for yourself real quick by making a new development and marketing it well. There were also a couple of false alarms, but this one looks real!

The paper in question is from Ge Guo et al., out of the University of Cambridge. At first glance it doesn't seem like a big deal. People have taken loads of different cells, put them in the right media (almost always pumped full of so many artificial growth factors that, while you could learn a lot from them, aren't really what we're looking for).  And there have been some awesome successes in mouse cell lines. Which, again, while cool isn't really what we're looking for.

An interesting aspect of this work is that the techniques don't appear revolutionary at all. It looks like they just cut out all the shortcuts. Another tricky aspect appears to be the controversial nature of the field along the grounds of what a real naive stem cell is, versus a cell that expresses a certain set of features. Honestly, this is a really interesting paper. There are big differences regarding glycolysis (particularly based on mitochondrial functions), the level of DNA methylation, activity of a slew of different pathways, etc.

But to this un-expert eye, it seems like they put a pretty solid list of requirements together and it looks like they met them. I'm gonna assume the reviewers of something called "Stem Cell Reports" would be ready to lay the smack down on some bad science in this field.

Not proteomics, but the most interesting thing I've read today!

Friday, March 4, 2016

DOE approach to optimizing negative ion coverage in LTQ Orbitraps!

I haven't seen a DOE paper in a while!


DOE (design of experiments) is a reverse engineering approach for figuring out how to optimize an experiment. If you go to one of the world's best engineering schools (go Hokies!!), even as a biologist, you're gonna hear a lot about DOE. It takes all the guesswork out of how to best do your experiment.
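The simplest flavor of DOE is a full factorial design: pick a few levels for each setting and run every combination. A minimal Python sketch is below. The factor names and levels are hypothetical placeholders I made up, not the parameters tuned in any of the papers mentioned here.

```python
from itertools import product


def full_factorial(factors):
    """Build every combination of factor levels as a list of run dictionaries.

    factors: dict mapping a factor name to a list of levels to test.
    """
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical source settings, two levels each (illustration only)
factors = {
    "spray_voltage_kv": [2.5, 3.5],
    "capillary_temp_c": [250, 300],
    "sheath_gas_au": [20, 40],
}

runs = full_factorial(factors)
for run_number, settings in enumerate(runs, start=1):
    print(run_number, settings)
```

Three factors at two levels each gives 2 x 2 x 2 = 8 runs. Real DOE work often uses fractional designs to cut that number down, which this sketch does not attempt.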

Dave Muddiman's group did a great one of these back in 2011 for Proteomics. Totally worth taking a look at if you've got an LTQ-Orbitrap for proteomics.

Nik Lomonakis et al., said "wait a minute! nobody's done this for small molecules that ionize in negative mode? Let's do it!" And...arguably people that do this stuff need it more than the proteomics groups, who have hundreds of app notes and thousands of publications to pick from. There is less information out there for people in the negative small molecule field, so this is a great new resource!

Thursday, March 3, 2016

What goes into making Uniprot?

Call me a boring, unimaginative biologist, but I love Uniprot. I know, I know, you can get more PSMs with RefSeq and Trembl or whatever.

But if I want to turn my PSMs into something that I can actually track down, Uniprot is the way to go. Manually annotated, no silly "putative - whatever - whatever" "predicted -whatever- whatever", "Open Reading Frame - whatever -whatever"....


....everything mapped to a universal gene identifier so I can cheat and use downstream analysis tools the genomics people have been nice enough to build for us, sometimes decades ago?

( this song is in my head again. Good going, Ben...)

What was I talking about again...OH YEAH!

Well, if you want to know how painstakingly the good people at Uniprot edit what you can freely download, check out this brand new paper on the Uniprot Human proteome!  Just like the Uniprot databases, this article is open access!