Sunday, January 22, 2017

Monitor your instruments with MS Instrument Connect on the Cloud thing!


Do y'all know about this thing? I think it might be a serious asset!  I guess it has been around for a while for a lot of other instruments -- like the thermocyclers and things -- but it is now live for mass specs via MS Instrument Connect!



It runs through the Thermo Fisher Cloud and it has a simple user interface that allows you to monitor, in real time!, up to 3 instruments from your PC and possibly from your mobile devices!

What it does:

1) Overview of your instrument(s) performance and status



2) In-depth analysis of individual runs if you want to check their status


3) Customizable email alerts if there are errors!  Right?!?  I don't actually want to look all the time, but I just want an email if the instrument stops (or once my Sequences complete!)



4) It isn't TeamViewer or GoogleConnect or the other widely used remote access tools (and therefore attractive targets for hacking...). Seems like these are being blocked at more and more institutions in my area....and hacking sounds like a lot of work...is somebody really gonna want to spend a lot of time to get into a few LC-MS labs? Seems less likely to me!  (Scary screenshot from my newsfeed a while back...)


The JHU Center for Proteomic Discovery has been beta testing this for their Lumoses and I've been hearing good things.

There are some requirements.

You have to register with the Thermo Fisher Cloud.

Your instrument(s) need to be on newer Foundation/Xcalibur combinations.


And you need to download the applications to your network-connected instruments and to the devices you want to monitor them with. There are good instructions to walk you through setting it all up.


Saturday, January 21, 2017

You can't characterize transposable elements with proteomics!! Wait -- what?!?!?


I admit it -- I'm blown away by this one. To the authors of this study -- I owe you a round of drinks. Seriously. See you at ASMS?

1) You can't do proteomics of transposons!
2) You certainly can't do it on a non-model species without 100% coverage, 50x translational coverage and perfect annotation
3) You did it on the yellow fever mosquito? The vector of all sorts of murderous diseases?!?!?

You deserve a medal -- and a round of drinks. Warning: I might hug you. Kidding, probably!

This is the paper -- and everyone should read it (Open Access even? Of course it is...)


1) First off -- why is this a big deal? I'm glad you asked!  The reason I said #1 above is that transposons are inherently chaotic in a genomic/proteomic sense. As this stolen image shows...

...the transposon moves around and interrupts things. There are systematic reasons/regions for this, but overall they are tough to deal with. If you're doing proteomics with your nice, curated UniProt database alone and you've had some jumping genes (transposons) active in the genome of the organism that you've actually digested -- well, you aren't going to find the area where the transposon messed up (okay...maybe if it has 2 copies...but that's another problem for another time...let's simplify it).

Hopefully it didn't land in the middle of your coding region or blow up a start or stop codon, but if it did you might not have MS/MS spectra that match your database ('cause the genetics just don't match anymore).

2) There is a reason we develop new methods on E. coli and C. elegans and D. melanogaster -- 'cause we know just about everything about them. A piece of a gene from the model coliform bacteria ends up in the wrong place -- we can figure it out, probably. Outside of our model organisms, there is an awful lot of chaos.

3) This mosquito sucks. It is known to transmit at least 5 viruses -- and it is awesome enough that, on rare occasions, if it bites you once it can infect you with more than one virus. We need to know more about this thing!

Okay -- so how did they do this?  They used a technique called PIT (Proteomics Informed by Transcriptomics). It is detailed in this Nature Methods paper from 2012 (from several of the same authors of this study).

In a nutshell -- they de novo sequence the RNA transcripts that they find with their fancy next-gen sequencing equipment. Once they have those, that is what they search their MS/MS data against. If you want to do this yourself, you need Galaxy (if you have a big genomics effort at your institution, chances are you already have a server loaded up with Galaxy programs...you may have to leave the safety of the mass spec cave for a bit...the comforting sound of roaring vacuum pumps will be there when you return...not making fun of you guys...I'm with ya'!) and you need this GitHub package.  Implementation of this into Galaxy is thoroughly detailed in this open! study from last year.
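If you just want to get your head around the core trick (assembled transcripts in, searchable protein FASTA out), here's a minimal sketch in Python. To be clear -- this is NOT their Galaxy/GitHub pipeline, just a toy six-frame ORF translator, and the file names and the 30-residue cutoff are completely made up:

```python
# Minimal sketch of the core PIT idea: translate assembled transcripts into a
# protein FASTA that your search engine can use instead of (or alongside) UniProt.
# NOT the published pipeline -- a toy ORF finder with made-up file names.

from Bio import SeqIO

MIN_ORF_LEN = 30  # amino acids; arbitrary cutoff for this sketch

def orfs_from_transcript(seq):
    """Yield ORFs (as protein strings) from all six reading frames."""
    for nuc in (seq, seq.reverse_complement()):
        for frame in range(3):
            trimmed = nuc[frame:]
            trimmed = trimmed[: len(trimmed) - len(trimmed) % 3]  # whole codons only
            protein = str(trimmed.translate())
            for chunk in protein.split("*"):           # split at stop codons
                if len(chunk) >= MIN_ORF_LEN:
                    yield chunk

with open("transcripts_as_proteins.fasta", "w") as out:
    for record in SeqIO.parse("assembled_transcripts.fasta", "fasta"):
        for i, orf in enumerate(orfs_from_transcript(record.seq)):
            out.write(f">{record.id}_orf{i}\n{orf}\n")
```

Point the search engine at that FASTA and you're searching what the organism is actually transcribing, annotation be damned.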


...only inserted because it was far too many words in a row without a picture. Bernie looking indignant after being accused of stealing my shoe cracks me up every time (and made the front page of Reddit/r/pugs a while back!)  Back to seriousness!

Let's go back to #2 and #3 from above -- using a method like PIT is gonna be a whole lot easier on a model organism...but we have a sequenced genome for this sucky mosquito!  Why go to all the trouble of PIT?

...'cause PIT shows that the genome of this organism needs an awful lot of work!  Think about it -- we're flipping the paradigm here! Traditionally, the thought is that we can only identify peptides that are informed by our genomic sequence. Here -- we are leveraging the transcriptomics to give us the power to take the MS/MS spectra and show where the genome is wrong! They can go into specific examples of these regions where the spectra and the genome don't match and figure out that this area of the genome, for whatever reason, had low sequencing depth -- or was misannotated, or something along those lines. You know, 'cause the mass spec never lies.

On this topic -- and something that is only of very minor concern here -- the data was acquired on an Orbitrap Velos running in high/low. Is there a little more wiggle room in the peptide sequencing data because of the lower resolution/lower mass accuracy of the ion trap? Acquiring the MS/MS spectra at high resolution as well might very well strengthen the findings of this beautiful paper, but on an OV you are going to take a hit in overall peptide sequencing depth, and I can't disagree that depth was more important here.

Okay -- finally back to #1!  The transposons! Transposable elements have characteristic regions 'cause transposases (I think that's what they are called) leave specific signatures behind -- I forget the details and I'm losing motivation -- this one is taking a loong time! The unbelievable sequencing depth this group has from the transcriptomics + proteomics allows them to find all of these (they call it the mobilome! -- adding that to the translator for sure!) Some of those places where the genome and PIT disagree end up being inserted transposable elements -- and with information from both the T and P levels, it is darned convincing!

TL/DR:  Amazing study with publicly available tools shows how proteomics/ genomics/ transcriptomics can be leveraged together to massively improve our understanding of one of earth's worst pests. Ben puts awkward image of him hugging authors with his freakishly long arms into people's heads.

Friday, January 20, 2017

A changing of the times...?

With a title like this on a day like this, you probably think I'm talking about something happening about an hour from me in Washington.

Don't worry, my level of denial regarding that series of events is nearing 100.000%. I don't even know what you might be implying.

What I'm actually thinking about is the abundance of these pre-print articles that keep popping up in fields outside of our own!


This is the topic of this article in this month's The Scientist (my 4th favorite magazine stacked in my bathroom -- and probably yours!)

I know this is controversial -- and there are big ol' weaknesses to preprints -- like -- THEY HAVEN'T BEEN PEER-REVIEWED YET!  -- but -- the upsides can't be ignored either. It doesn't seem uncommon for a traditionally published article to be in some stage of review/revision for 2 years, right? I'm seeing quite a few.  And...you could maybe argue...that these kinds of lags aren't exactly allowing science to always proceed at the rocket pace other areas of technology might be able to (and maybe that is a good thing...but....)

I figured it was time to start talking about this when one of my favorite recent studies in our field went the open / pre-print route!


F1000 is an open and transparent publication system.  It works like this:



In the case of this paper from Breckels et al., I can see exactly where they are in the review process. As of this morning, 2 reviewers have checked in and it looks to me like it is about to be accepted with some revisions. I also know who is reviewing it, and I can read their comments. This is just a little different than what we're used to. Is it the right system? I dunno...but it sure is fast!

The thing I'm seeing a lot more of...but not in proteomics yet (well...some nice meta-analyses) is bioRxiv (not sure I capitalized the right stuff).

This resource from CSHL is a hybrid system. It allows the posting of research and comments before the study goes to a journal for publication. It allows you to have a place holder for your findings to some degree while the final work is finding a permanent home. Your data gets out there, allowing the next person to build on it, but you can definitely stake your claim that it was your development.

The next question, of course, is this nice statement we see a lot of the time when we submit an article for review:

If you read the beginning (this is for MCP) it sounds like you can't use a pre-print service, but if you keep digging you'll find this isn't the case at all! You just need to disclose all the info (further down in the highlighted zone).


YEAH!! Go MCP!

You have to download the JPR author guidelines PDF to find that they're already prepared for this revolution in proteomic data dissemination...


(WOOHOO!!) ...which definitely makes it seem like I'm the one who is behind the times!

Thursday, January 19, 2017

SRMs for targeted verification of cancer!


I have a great subtitle for this nice new paper from Anuli Uzozie et al., currently in press at MCP.

Subtitle: "Why a biostatistician should be involved at the beginning of your project".

At first glance, this is a boring paper. They've got a bunch of samples and they've got a bunch of discovery data so they design SRMs on a relatively small number of interesting targets (their heatmap shows less than 30 at the end of the paper) and run these samples on their TSQ Vantage. Sounds like they forgot to write the last chapter of their previous paper and here, somehow, they got it into MCP.

However, this might be one of the best validation exercises I've ever seen. It really makes me suspect one of those fancy biostatistics people was involved in this from the very beginning -- because they get so much data out of it! Start with the fancy randomization stuff (boring...but good science...), add the level of downstream statistics that draws REALLY impressive conclusions from relatively little data -- and I'm walking away from this paper a little bored (which might just be 'cause I'm typing rather than finishing my coffee) but seriously impressed....

...and wondering whether, if I designed my experiments just a little better, I wouldn't get a whole lot more data out of every run....?
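For what it's worth, here's roughly what that "boring but good science" randomization looks like in practice -- a quick Python sketch of a shuffled run queue with QC injections dropped in. The sample names, block size, and QC scheme are all made up for illustration (this is not from the paper):

```python
# Minimal sketch of block-randomizing an SRM run queue so case/control status
# isn't confounded with instrument drift. Sample names and block size are made up.

import random

random.seed(42)  # so the run list is reproducible

cases    = [f"tumor_{i:02d}"  for i in range(1, 13)]
controls = [f"normal_{i:02d}" for i in range(1, 13)]

samples = cases + controls
random.shuffle(samples)  # one full randomization across the whole queue

# Drop a pooled QC standard in at the start of every block of 6 injections
BLOCK = 6
run_list = []
for i, s in enumerate(samples):
    if i % BLOCK == 0:
        run_list.append("QC_pool")
    run_list.append(s)

for idx, inj in enumerate(run_list, start=1):
    print(f"{idx:02d}\t{inj}")
```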

Wednesday, January 18, 2017

Absorb entire cells into dried acrylamide gels for digestion!??!?


This study is paywalled...and you can't find any details until you get into the Supplemental info, but it is still neat enough that it is worth talking about!


What is it? A new and enormously efficient-looking digestion method. They seriously make polyacrylamide gels, dry them down completely, and then add a suspension of whole cells (looks like they used non-adherent cancer cells, but I forget now).

Once the cells are inside the gel they basically treat them like a normal in-gel digestion. Compared to what they get with their normal in-gel digestion methods (and something called pro-absorb) they get massively more peptides and proteins per run on their Orbitrap Fusion.

Personally, I'd never have guessed that whole cells could be taken up like that! And, in their analysis, there were far fewer missed cleavages than with the other methods they used. Now....you might argue that the total number of proteins they get out of all 3 methods seems a lot lower than what you'd normally expect to see with more traditional methods like in-solution digestion or FASP, but this paper is clearly focused on showing the promise of this novel cell-absorption/digestion method.


Tuesday, January 17, 2017

Monday, January 16, 2017

Biodiversity pathway-centric data browsing plug-in for Skyline!


It has been enormously exciting (at least for me!) to watch our field develop over the last decade or so. Every year we get more and more coverage of our proteomes and the files seem to have more data than we have the algorithms to extract yet and...yet...it is also getting kind of intimidating! Discovery is great, but we still want to be able to easily do hypothesis-driven research -- which is tough in a datafile with 1e5 spectra and 40,000 quantified peptides!

If you want a tool to narrow this down a little without learning a completely new software interface, you should check out this study from Michael Degan et al.!


I'll be honest, I don't have hands-on time with this yet. I just downloaded and unzipped it, but I'm running out of time this morning already. The concept is good enough that I'm gonna trust them that they executed it as they described.

Okay...one super minor criticism...it was a little tough to find where I actually downloaded this thing (okay...accidentally closed the tab where I got it from....then found it tough to find a second time, n=2!), but you get the software from GitHub here. While I'm at it, here are the instructions for the software!

Why would you use this? If you are like about 75% of our field, you are probably already using Skyline (I made that number up), so you're not going to have to install and learn a new software interface. And...unless I'm completely mistaken -- Skyline has never had features like this at all. You tell the Biodiversity Plug-in your organism of interest and it populates known pathways for the organism -- and you use those pathways to zero in on what you want to study -- and browse the quantitative data for those pathways in the RAW files you've loaded!  RIGHT!?!?!?

Not excited? I don't think I explained it right.

Say...you've got all this data from high-throughput screens of your drugs that suggest that this drug is inhibiting cell growth by messing up central metabolism.  You could process your 400 RAW files the traditional way and then dig through the resulting 5e6 identified and quantified peptides which will take at least a couple days of crunching time -- regardless of what you are using to process your data OR...you can focus on the pathway you are interested in....


This gives you a % coverage OF THE PATHWAY YOU ARE INTERESTED IN!!!  80% of the peptides in the proteins in this PATHWAY (sorry, having caps locks problems) are contained within this library and easily accessible from within this plugin.

Then you input this specific hypothesis-driven data into Skyline and can look at the quantification of the peptides/proteins in this pathway in all your cells. The data reduction alone means that you can do this in real time (rather than 2 days to process...and maybe a day to filter later).
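If you want a feel for what that data reduction step amounts to (the plugin handles it for you inside Skyline -- this is just the concept), here's a rough Python sketch. The file names and column headers are hypothetical:

```python
# Rough sketch of the data reduction idea: keep only the quantified peptides whose
# proteins fall in the pathway you actually care about. File names and column
# headers are hypothetical -- not from the plugin or the paper.

import pandas as pd

# e.g. UniProt accessions for a central-metabolism pathway of interest
pathway_proteins = set(pd.read_csv("glycolysis_accessions.txt", header=None)[0])

quant = pd.read_csv("all_quantified_peptides.tsv", sep="\t")  # one row per peptide

in_pathway = quant[quant["Protein Accession"].isin(pathway_proteins)]

print(f"{len(in_pathway)} of {len(quant)} peptide rows map to the pathway")
in_pathway.to_csv("pathway_only_peptides.tsv", sep="\t", index=False)
```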

Even if we ignore the amount of data reduction you have here -- I'm going to guess that you've collaborated with someone who was aghast at the big Excel spreadsheet you gave them with all the IDs and quan at some point -- when they just wanted the mass spec to confirm their hypothesis. And this might be the easiest way to get right to this that I've ever seen, both for you and for them (once they download the amazing free Skyline software, of course!)

Worth noting --- this Plug-In has more features than what I'm focusing on here. I'm selfishly just going after what I find the most interesting -- and since it is in Skyline, I think this is going to be compatible with every instrument type and vendor and scan mode (they specifically mention that it works with data-independent acquisition (DIA) type experiments as well).

Sunday, January 15, 2017

ASMS 2017 -- Abstracts due February 3rd!!


This public service announcement is brought to you from the beautiful and progressive state of Indiana! If you don't want to miss your chance to see it, you need to get your abstracts in for ASMS 2017 by February 3rd!!!

Here is the direct link to the abstract submissions page.

Saturday, January 14, 2017

Do you hate PD 2.1 and just want to run it like PD 1.4? I made you a template.


I was having lunch with 2 of the most skilled proteomics guys I personally know -- and both of them talked about how they're still using Proteome Discoverer 1.4 -- and hate PD 2.x. I understand, for real!  It is a new architecture -- and with that backlog of samples you have, there isn't a lot of time to learn new software interfaces.

So I put this together this morning and maybe it will help? I call it the "I Hate Proteome Discoverer 2.1" Analysis Template.  Disclaimer: This is for simple peptide ID runs. Maybe I'll do a quan one later.

Step 1: Open the accursed new version of Proteome Discoverer and start a new Study, name the study and choose your RAW files. Ignore the processing templates and other things (you can click to expand these pictures)


Step 2: Download this template from my personal DropBox account here.  Depending on your browser, you may need to right click here to download (I have to say "save link as").

Step 3: Ignore all that junk in the big grey box. All of it -- just open the Analysis Template you just downloaded, then click the weird little button by the "Processing Workflow" text.


Now you are in a window that looks just like what you're used to in PD 1.0 - 1.4 SP1, right? The consensus workflow is set up and you don't need to bother with it. Go to your search engine, add your FASTA file, adjust your tolerances, all exactly the way you do in the version of PD that you don't hate and you're almost set.  All you need to do is get your RAW files into that little box below the number 5 (above) and hit the "Run" button.

Step 4: Click the Input tab near the top of the screen to get to the Raw files you added earlier. You can also add more raw files here.  Click on the Raw file you want and drag it over to the window below the processing tab. It takes a few tries to figure out where you need to click (and where you can't) to drag the files over. You can hold the <ctrl> key while clicking to highlight more than one file at once, just the way you would in Excel.   Hit the Run button!


Step 5: Go to the Job Queue and open your processed files. You'll need to open the Consensus workflow for each file and you're golden.


Now...the output report might not look like what you're used to. I haven't used PD 1.x in a long time, and I don't remember what it looks like (and it's Saturday, I'm gonna get out and enjoy some of this snowy day, rather than work on this blog all day!). If you hate the output and have suggestions, email me what you want it to look like and I'll see if I can create a filter method that I can add to that download that will make the output closer to what you want.

While I might not have LOVED the PD 2.x architecture at first -- I immediately preferred the PD 2.x output reports -- but if it is bugging you, let me know, I can take a swing at it!

Wait --> One more disclaimer before I put on my boots --> the results from this template may not 100% match what you get in PD 1.4, because of 2 changes in PD 2.x --> one I could change to PD 1.4 format and one I don't think I can change.

1) PD 1.4 only does false discovery rate filtering at the Peptide Spectrum Match (PSM) level. PD 2.x can also do FDR at the peptide group, protein, and protein group levels. I left the peptide group FDR on. There is too much evidence in the literature that this step is essential to getting the best data for me to recommend turning it off. There is a video over there to the right where I discuss this and show you how to turn it off if you need to match your peptide IDs exactly.

2) Parsimony for protein group identification. In PD 1.3 and 1.4 (maybe the earlier ones -- it has been too long for me to recall), when we have equal evidence at the peptide level that would equally support the identity of 2 distinct proteins, the protein that would have the highest percent coverage is made the top hit in the protein group ID. In PD 2.x, under these same conditions, the most intact protein reference from the FASTA is chosen for the group ID. The protein (not protein group) list is unchanged. I absolutely love this change because most databases have alternative cleavage events and partial protein fragments that, in the older versions, would get a higher group ranking -- in big databases you'd almost never see your intact protein -- even though it is probably(?) the one most likely to actually be present biologically. There's a toy sketch of the difference below.
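To be extra clear, this is NOT Proteome Discoverer's actual code -- just an illustration of the two tie-breaking rules described above, with made-up accessions and protein lengths:

```python
# Toy illustration of the protein-group tie-break. Two database entries are
# supported by exactly the same peptides: a short fragment and the intact protein.
# PD 1.4-style picks the highest percent coverage (the fragment wins);
# PD 2.x-style picks the most intact (longest) reference.

shared_peptides = ["LVNELTEFAK", "YLYEIAR"]          # equal evidence for both entries
covered_residues = sum(len(p) for p in shared_peptides)

candidates = {
    "sp|FRAG|EXAMPLE_frag": 120,   # partial/fragment entry, 120 residues long
    "sp|FULL|EXAMPLE":      607,   # intact protein entry, 607 residues long
}

# PD 1.4-style: highest percent coverage becomes the group's top hit
pd14_top = max(candidates, key=lambda acc: covered_residues / candidates[acc])

# PD 2.x-style: the longest (most intact) reference becomes the group's top hit
pd2x_top = max(candidates, key=lambda acc: candidates[acc])

print("PD 1.4-style group top hit:", pd14_top)   # the fragment
print("PD 2.x-style group top hit:", pd2x_top)   # the intact protein
```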

That's it for today-- Gusto is wearing a scarf and he's ready to go!



Friday, January 13, 2017

Cell-specific proteomics for biological discovery!


I know my Twitter feed keeps telling me I should be boycotting Elsevier for something or other, but if I had, I'd have missed this stellar new (and open access!) review from Shannon Stone et al.


Half-way through this I had some serious questions regarding the background of this lab. I mean this in a very positive way!  It is clear from the outset that this isn't coming from either a classically trained genomics or proteomics lab. Dr. Tirrell was originally a polymer chemist and engineer, but his lab is applying these approaches to biological systems. I suspect we're going to hear a lot from them, because what they are doing is not only unique (to my knowledge!), but it also seems seriously brilliant.

While researching "who the heck are these people?" I ran across this paper from last year and it deserves more time than I have here (they induce a mutation so they can chemically tag proteins from just the organism carrying the mutation!).

If you say cell-specific proteomics to me -- I'm immediately going to think you are talking about flow cytometry or laser capture microdissection. How else are you going to separate distinct cell type populations for proteomics?

This is NOT, however what this group is focusing on! They are using chemical labeling strategies that will allow them to either physically separate the cells OR to just be able to tell afterward from the proteomics or transcriptomics where that particular cell came from!

This paper is OPEN, so I'm gonna borrow part of a table to show you why you should check this out!

(Hopefully the reason I'm not supposed to be reading Elsevier isn't because they've been suing bloggers for posting parts of tables....if it is, someone please let me know!)

This is the centerpiece of this awesome paper. The method -- what you can study -- a brief overview -- the two "Yes" columns are whether it is compatible with secreted proteins and whether it is compatible with PTMs. They then go into advantages/disadvantages of the techniques!

Each one of these methods bears further investigation!

Why? Oh -- because sometimes you just can't separate the individual cells. And -- this is the real advantage in my mind -- maybe because I've got a beautiful set of runs of mixed peptides from primates and Alveolata on my desktop and I'd give my right arm to know which peptide came from which organism -- if we had cell-specific protein labeling techniques in common use, can you imagine the reduction in protein inference problems?!?!  It would be staggering. Just the data reduction alone at the end -- not to mention the possibility of using the instrument to only focus on the tagged proteins -- but I'm getting just a little ahead of myself.

It is worth noting that most/all of these methods seem to be for cells that are actively growing and dividing.

Summing it up -- this paper is seriously worth a read -- at the very least check out the complete table!

Thursday, January 12, 2017

ArgC-like digestion!


This new study at JPR is great -- and introduces a promising alternative to traditional enzymatic digestion!


In theory, ArgC is a great enzyme for shotgun proteomics. If you only cleave at the R residues and skip the lysines, you're going to have half the number of peptides floating around. Sure, the peptide chains will be longer, in general, but with normalized collision energy settings like the ones that are automatically working in the background of today's instruments -- this really isn't going to be a problem.
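If you want to convince yourself of the "half the peptides" math, here's a back-of-the-envelope in-silico digest in Python -- purely illustrative (arbitrary example sequence, no missed cleavages, no proline rule):

```python
# In-silico digest comparing cleavage after K/R (trypsin) versus after R only
# (ArgC-like). Ignores missed cleavages and the proline rule -- just illustrating
# why R-only cleavage gives fewer, longer peptides. Requires Python 3.7+.

import re
from statistics import mean

def digest(protein, residues):
    """Cleave C-terminal to any residue in `residues` and return the peptides."""
    pattern = f"(?<=[{residues}])"
    return [p for p in re.split(pattern, protein) if p]

# arbitrary example sequence
protein = ("MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVK"
           "LVNELTEFAKTCVADESHAGCEKSLHTLFGDELCKVASLRETYGDMADCCEKQEPERNECFLSHK")

tryptic   = digest(protein, "KR")
argc_like = digest(protein, "R")

print(f"Trypsin:   {len(tryptic)} peptides, mean length {mean(map(len, tryptic)):.1f}")
print(f"ArgC-like: {len(argc_like)} peptides, mean length {mean(map(len, argc_like)):.1f}")
```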

There are two problems with ArgC -- it is expensive (relative to sequencing grade trypsin...which used to be pretty expensive itself...though...only us old guys remember that...) and, honestly, ArgC has some specificity issues (it isn't very good).

Solution from this group? Why don't we just block the lysines!?!?  They chemically modify the lysines in an E. coli lysate and then digest with trypsin -- and it works!  They get cheap digestion that ends up looking better than any ArgC digestion I've ever done.

Super interesting note here for us nerds -- they do their peptide mapping with a MALDI-Orbitrap XL! There are so few of these around (2 in Baltimore...which may be more than any other city in the world!) and it always makes me happy to see one of these awesome things in use -- especially for proteomics.

Wednesday, January 11, 2017

UniPept -- a CompOmics resource for metaproteomics!


CompOmics is churning out awesome tools for our community faster than I can keep track of them. Somehow, unsurprisingly, I completely missed that they've got a full metaproteomics pipeline up.

You can check out UniPept here!

I don't want to waste my time and yours going over the features listed on the front page -- but the "Unique Peptide Finder" and "Peptidome clustering" are tools I've never seen before in a web interface.

Wait---I just wasted all my coffee time playing with the clustering tool -- this is unbelievable. How did I not know about this?!?  Check this out!


You can load the proteomes of over 15,000 species for easy unique peptide identification (for species differentiation) and for clustering the peptidomes.  I don't know what they have this housed on, but these analyses occur almost as rapidly as I can add a new proteome/peptidome to the analysis.

In this analysis I went with a simple model organism, Bacillus subtilis, and some close relatives. You can interact fully with the graph and immediately download the table or the figure as is. The green line is my favorite part -- and the most mysterious (where is this paper?!?) -- it gives you a measurement of the "unique peptidome" of each organism. So...how many peptides are there that, by our current knowledge of the genome/proteome, tell you this organism is present if you find them!?!?
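Here's how I picture the "unique peptidome" idea -- just set operations over in-silico tryptic peptides. To be clear, this is my guess at the concept, not UniPept's actual algorithm; the proteome FASTA file names are hypothetical and it ignores I/L equivalence, missed cleavages, and all the clever stuff:

```python
# Rough sketch of a "unique peptidome": of all the tryptic peptides an organism
# can produce, how many show up in no other organism in your comparison set.
# Proteome FASTA file names are hypothetical. Requires Python 3.7+ and Biopython.

import re
from Bio import SeqIO

def tryptic_peptides(fasta_path, min_len=7):
    peps = set()
    for rec in SeqIO.parse(fasta_path, "fasta"):
        for pep in re.split(r"(?<=[KR])", str(rec.seq)):
            if len(pep) >= min_len:
                peps.add(pep)
    return peps

proteomes = {
    "B. subtilis":          tryptic_peptides("b_subtilis.fasta"),
    "B. licheniformis":     tryptic_peptides("b_licheniformis.fasta"),
    "B. amyloliquefaciens": tryptic_peptides("b_amyloliquefaciens.fasta"),
}

for name, peps in proteomes.items():
    others = set().union(*(p for n, p in proteomes.items() if n != name))
    unique = peps - others
    print(f"{name}: {len(unique)} of {len(peps)} tryptic peptides are unique")
```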

This tool is brilliant -- absurdly powerful -- and bears further investigation!  Shoutout to the EuBic Winter school for all the Tweets, one of which led me here -- and to @PastelBio for keeping a huge curated list of proteomics databases so I could easily hunt this down!

Tuesday, January 10, 2017

Perseus Nature Methods paper!


EDIT: Honestly, you should probably skip all the stuff I wrote below and check out this recent paper detailing how powerful Perseus is!  (Shoutout to @UCDProteomics for the paper link)


Over the years I feel like I've been just a little harsh on the amazing free software the Max Planck Institute rolls out for us every year. I don't mean to be. I'm honestly a huge fan! Seriously. I only switched over to using commercial proteomics packages (PD 1.2, Mascot, PEAKS) when I felt like they caught up to what I could get out of the release of MaxQuant that I had instructions for.

In the end, I might honestly think that MaxQuant/Andromeda/Perseus is probably better than any commercial software out there. The downsides are linked to one of the big upsides of all open software. It's free. And evolving. The Max Planck software, honestly, is also kind of intimidating. Holy cow...that is a lot of features!!

But...the power you have as a user is seriously amazing!  This is how I stack it up in my head:



I only made the first part -- and then I felt like it was confusing. Then I tried to clarify it at the bottom. Then some neurons way deep in my brain started making me quote Skeletor (warning..audio)  -- thanks...brain...not sure where our keys are but you're holding onto a 5 minute monologue from a Dolph Lundgren movie...we're gonna need to have a talk about that....what was I talking about?

The top part!  There is no factual basis to that chart. It is just how my...obviously glitchy?...brain stacks these things up.

As a user -- commercial software is super awesome. There are instruction manuals, the software is tested for bugs by people who get paid to test it, and there are people you can blame if you aren't getting the data you want. You are, however, generally limited in the features you can set and the outputs you can have relative to the other things on the list.

The people at Max Planck, one could argue, are pretty good at proteomics. They design expert software for expert users. It can be intimidating to pick up, but you have a ton of control over the input, the output -- the beautiful experimental designs.

The only way you can have more power -- complete power over your input, output, design, everything else -- is to go full out bioinformatician. You break out the R and the Python and you tell the data what the heck you want it to do!
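If you've never taken that plunge, here's about the smallest possible taste of it -- loading MaxQuant's proteinGroups.txt with pandas and tossing the decoys and contaminants. The column names ("Reverse", "Potential contaminant", "Only identified by site") match recent MaxQuant versions, but check your own file, since older versions label things a bit differently:

```python
# Tiny sketch of the "break out the Python" route: load MaxQuant's proteinGroups.txt
# and drop decoy/contaminant rows before doing anything else. Column names are
# assumed from recent MaxQuant output -- verify against your own file.

import pandas as pd

pg = pd.read_csv("proteinGroups.txt", sep="\t", low_memory=False)

for flag in ("Reverse", "Potential contaminant", "Only identified by site"):
    if flag in pg.columns:
        pg = pg[pg[flag] != "+"]          # MaxQuant marks these rows with "+"

lfq_cols = [c for c in pg.columns if c.startswith("LFQ intensity ")]
print(f"{len(pg)} protein groups kept, {len(lfq_cols)} LFQ intensity columns found")
```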

Wow. That was a lot of words. Maybe I should summarize it at the top and here. Perseus is awesome! You should check out this paper!

Monday, January 9, 2017

SILAC proteomics evaluates link between sex hormones and kidney diseases!



It is no surprise to most of us to hear that male sex hormones are dangerous. There are obvious reasons why the life expectancy for male humans in most countries is far lower than that for women.



This stellar new paper in press at MCP from Sergi Clotet et al. explores some less obvious ones!  In this work they use SILAC proteomics to study primary kidney cells that are treated with male sex hormones.

Surprisingly -- they find some pretty massive perturbations in central metabolism that suggest a strong link between diabetes-associated kidney disease and these hormones!

One interesting note for us downstream data nerds is that they used a Cytoscape plug-in called BiNGO to develop the GO and pathway stuff and to select proteins for validation via Western blots.

Great experimental design and methodology with a surprise biological ending?

Sunday, January 8, 2017

Great...now I have a protein list...what do I do next...?


This weekend I added a new page to the side of this blog. Over there --> somewhere! Or link to it here. It is by no means a comprehensive list, but I hope that maybe people new to the field might find it useful. It is just the programs that I, or people I trust who do this stuff professionally, go to after they have some proteomics output.

It is a work in progress!  Please feel free to add your go-to programs for making sense of these big protein lists in the comment section and I'll add 'em!