Just for clarification's sake: I heard some people were looking for me. I'm not at ASMS. I worked in St. Louis (unrelated to ASMS) over the weekend and am in town, so I saw some of you guys!!!!
Unfortunately, I have to head to Chicago to work with another lab tomorrow night. I'm hoping to wrap up my work in St. Louis early enough to catch a talk or two before I grab my plane north.
Sunday, May 31, 2015
Saturday, May 30, 2015
In PD 2.0 we see in every FDR box, even the otherwise sparse Fixed Value PSM Validator, a line that says "Maximum Delta CN". Honestly, this has always been a little unclear to me, but we worked it out this week and then I thought of a good way of illustrating it.
While Sequest is matching theoretical fragmentations to your actual MS/MS spectra, it commonly comes up with more than one candidate match, and it keeps them all and scores each one. The closest thing I could think of is the way that PepNovo+ does it. So I went and downloaded the newest version of the awesome and free de Novo GUI and did a quick run with their example data set.
(Click to expand). Now, this MS/MS spectrum is exactly what I was looking for. It isn't an incredible spectrum, and the de Novo GUI came back with 4 possible sequence matches with pretty similar scores. The first one is 43.98, the second is 41.48, and the last 2 are tied at 41.17. The delta CN measures how close together the scores of the competing peptide spectral matches (PSMs) are. For example, the default is 0.05. So we're saying that if there are multiple matches, we take the best one plus anything that falls within 5% of that match. 5% less than 43.98 is 41.78. So, even though these possibilities look pretty close, we're only going to keep the very highest scoring one. If I changed the cutoff to 0.10, though, all 4 of these matches would be kept and it would be up to Percolator or target decoy to sort them out.
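If it helps, here's a tiny sketch (in Python, using the scores above) of the filtering logic as I understand it; the function name is just made up for illustration:

```python
def filter_by_delta_cn(scores, max_delta_cn=0.05):
    """Keep candidate PSM scores whose delta CN relative to the top hit
    is within the cutoff.  Delta CN for a candidate is
    (best - score) / best, so the best hit is always kept (delta CN = 0)."""
    best = max(scores)
    return [s for s in scores if (best - s) / best <= max_delta_cn]

# The four scores from the example spectrum above:
candidates = [43.98, 41.48, 41.17, 41.17]
print(filter_by_delta_cn(candidates, 0.05))  # [43.98] -- only the top hit survives
print(filter_by_delta_cn(candidates, 0.10))  # all four survive
```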
What is cool here is this: if there are lots of highly similar candidate matches, it is actually theoretically possible to get more peptides identified than you have MS/MS spectra. If you manipulate the Delta CN cutoff you can actually ensure that. If you are using Scaffold to perform FDR on your Proteome Discoverer results, you actually want this to be the case: you set the Maximum Delta CN to 1 so every single match that Sequest makes goes into Scaffold, and it does the rest.
Side note: I hadn't downloaded the de Novo GUI in a while, and it keeps getting better. I know they were planning to add a BLAST function and they did! I can go to that match and just hit that little magnifying glass to the right and BOOM! it goes right to the BLAST interface and inputs my sequence into the BLAST bar. If you haven't checked this out you should.
Friday, May 29, 2015
So this resource is currently under works and I'm super excited about it. I'm just going to put it here on the blog so I can refer back to it later. I want to see what this becomes!
In conjunction with this, check out this funny little video that is on the page. Let's start on the street and ask people "what is proteomics?"
Wednesday, May 27, 2015
One reason I was excited for the trip was the chance to revisit one of my favorite posters from ASMS last year. This poster showed a great workflow for integrating proteogenomics data with the use of the GalaxyP resources (here!)
While the publicly available GalaxyP interface doesn't have the power that the users at UMinn have internally, it is steadily growing. If you haven't visited this interface before (or in a while) you should probably check it out.
Here are some highlights:
Raw data conversion resources (online! w00t!)
FASTA merging with redundancy removal! (the mechanism looks for full sequence homology between FASTA entries and drops any later entry whose complete sequence matches one it has already seen. Useful to me, for sure!)
Access to proteogenomics tools for converting nucleic acid data to protein sequences.
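As a rough sketch of what that redundancy-removal step does (my own toy version for illustration, not GalaxyP's actual code):

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from a FASTA-formatted string."""
    header, seq = None, []
    for line in text.strip().splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        else:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)

def merge_dropping_redundant(*fastas):
    """Merge FASTA databases, keeping only the first entry seen for each
    exact full-length sequence; later identical sequences are dropped."""
    seen, merged = set(), []
    for text in fastas:
        for header, seq in parse_fasta(text):
            if seq not in seen:
                seen.add(seq)
                merged.append((header, seq))
    return merged

db_a = ">sp|A1|demo\nPEPTIDE\n>sp|A2|demo\nMKTAYIAK"
db_b = ">sp|B1|demo\nPEPTIDE\n>sp|B2|demo\nGASPHR"
print(len(merge_dropping_redundant(db_a, db_b)))  # 3 -- the duplicate PEPTIDE entry is gone
```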
What they currently have internally that hasn't been implemented outside of UMinn is even more exciting...such as the ability to blast peptide sequences to aid in the identification of new and novel proteoforms.
If you are in Saint Louis next week I suggest you swing by Pratik Jagtap's poster detailing the progress of this package.
And if you happen to be in London around July 6th this year, you should drop in on the Galaxy user's meeting. Galaxy is a HUGE project with developers around the globe trying to simplify genomics - and now proteogenomics - for the rest of us.
Tuesday, May 26, 2015
Umm...so, how the fudge did I not know about LogViewer before? Seriously?!?
Published in 2011, used by lots of people. Never heard of it.
You can read about it here. It, in conjunction with RawXtract, pulls a ton of useful data out of Thermo RAW files and plots it out. I have a crazy slow...seriously phone-line-modem slow...internet connection on this flight delay so I can't seem to get it all downloaded.
Now, the interesting thing here: RawMeat suffers from an issue called "integer overflow" if you give it RAW files that are too big, and it generates some weird data as a result. Does this cool little program (that I currently have downloaded 6% of...) suffer from the same issue? In other words, can I feed it a Q Exactive HF file without getting weird stuff back? If you know, let me know! If this is new to you, check it out here!
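For anyone who hasn't bumped into integer overflow before, here's what it looks like when a signed 32-bit counter wraps around (a sketch of the general bug class, not a claim about RawMeat's internals):

```python
import ctypes

def as_int32(value):
    """Store a number in a signed 32-bit integer the way an older C
    program might, silently wrapping past 2**31 - 1 (integer overflow)."""
    return ctypes.c_int32(value).value

print(as_int32(2_000_000_000))  # 2000000000 -- still fits
print(as_int32(3_000_000_000))  # -1294967296 -- wrapped into nonsense
```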
Saturday, May 23, 2015
Human tears are surprisingly complicated. They are full of lysozyme to break down the cell walls of invading pathogens, but they have been shown to have other anti-microbial activities as well. To figure out where these are coming from, Mikel Azkargorta et al. performed an in-depth peptidomics analysis of some human tears. Their analysis looks something like this:
Turns out 2 of the things they found look like (in modeling experiments) antimicrobial compounds we didn't know about before!
Friday, May 22, 2015
Hey! Do you guys know about this nanopore thing? It's mostly new to me but it's really cool. Imagine this: you have a membrane with a hole in it that is only a nanometer wide. Then you apply a strong electric potential across it to pull stuff through the hole. But even single stranded DNA can only go through the hole one way: linear and straight through. So it comes out the hole one nucleotide right after the other. Boom! Easy sequencing!
So...this has been going on for a couple years in genomics, and you can read a lot about it if you're interested. I stole the image above from this Wikipedia article. One of the cool things about this technology is how small it is...and how little power it requires. You can literally power it off the 5V that comes out of your USB connection. These are commercially available. DNA sequencing wherever you are!
They can make these this small because the detector is just measuring the voltage shift caused by the charge of the very next thing through the pore. Since there are only 4 bases in DNA, a tiny little voltage shift is pretty easy to interpret (it can only be one of 4 things, right?).
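Just to illustrate the idea, with completely invented signal levels (real pores are much messier and read several bases at once), base calling is basically nearest-level classification:

```python
# Completely invented current levels (arbitrary units) for the four bases;
# real nanopore signals depend on the pore chemistry and span several bases.
LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def call_base(measured):
    """Assign a noisy reading to the nearest of the four base levels."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - measured))

reads = [1.1, 3.9, 2.2, 2.8]
print("".join(call_base(r) for r in reads))  # "ATCG"
```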
Okay, so why am I rambling about this? Well, for one, I'm kinda drunk (it's Friday!). And for two, the next thing? The thing that tons of big name people are working on? Pulling proteins through these dumb little pores! And you know what? It has worked some here and there!
Check out this paper from Christian Rosen et al. In this work the team pulled phosphoproteins through the nanopore. Phosphos have a lot of charge! So they could tell exactly when the phospho came out - BOOM - voltage shift!
The next trick, the one people are working on like crazy? Getting detectors sensitive enough that they can tell which amino acid is coming through at any given moment. Minor voltage shifts (and 20 of them, before you even count modifications!), but it doesn't sound like that big of a problem considering that we can measure the mass discrepancy of a single electron without trying all that hard.
Tuesday, May 19, 2015
This question has come up a couple of times so I figured I'd better cut a screenshot. In old versions of Proteome Discoverer we could open any MSF files we wanted by highlighting them, and they would open as a "multiconsensus report". The problem when we do this, however, is that we get no control over how it combines those files.
In PD 2.0 you can still create multiconsensus reports from your .MSF files but you must combine them into a .PDResult file and you have to do this through a new consensus step.
In the screenshot above I've opened one of the studies I've been messing around with. (Please ignore all of the red exclamation points, I learned this software through trial and error...lots and lots of error!) Anywho, if I highlight multiple .MSF files then when I click the Reprocess button my only option is the one above "use results to create new (multi) consensus report." w00t!
When I activate that, I get this analysis pop-up.
The processing steps that generated these MSF files are locked, but you can go in and edit the consensus steps. Things like how you deal with the quan, how many peptides per protein, and the FDR cutoffs can be set up the same between these files. Since the consensus steps are generally pretty rapid, you get your final consensus .pdresult file filtered the way you want without waiting too long, and everybody wins.
Monday, May 18, 2015
This article gets a picture of my puppy, because you don't get anything interesting if you Google Image Search False Discovery Rates. And if you Google "FDR" you get pictures of some dead guy.
The article in question is this one from Mikhail Savitski and Mathias Wilhelm et al., and is currently in press at MCP.
What is it? A new way of doing FDR.
Why would we need one of those? Don't we have several? We do, but they have drawbacks. Target decoy can't keep up with big datasets and databases. Don't believe me? Run your most recent samples versus the organism database and then run that exact same sample versus the entire TrEMBL database. The number of decoy hits goes through the roof and your positive protein IDs drop through the floor.
Well, how does this group propose we do it?
Okay, this is pretty smart. Instead of competing every target sequence against one giant reversed database, each target protein is paired off with its own reversed decoy, and only the better-scoring member of each pair stays in the competition. That narrows the search down considerably rather than taking the entire database and flipping it backwards. Make sense? I kind of get it. It seems foggy, but I'm also really sleepy.
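Here's a little sketch of how I read it, with invented scores: classic target-decoy FDR is just decoys over targets, and the "picked" twist pre-filters each target/decoy pair first.

```python
def classic_fdr(targets_passing, decoys_passing):
    """Classic target-decoy FDR estimate: decoy hits / target hits."""
    return len(decoys_passing) / max(len(targets_passing), 1)

def picked_survivors(pairs):
    """'Picked' pre-filter (as I read the paper): each target protein is
    paired with its own reversed-sequence decoy, and only the
    higher-scoring member of each pair stays in the competition."""
    targets, decoys = [], []
    for target_score, decoy_score in pairs:
        if target_score >= decoy_score:
            targets.append(target_score)
        else:
            decoys.append(decoy_score)
    return targets, decoys

# Invented (target, decoy) score pairs for four proteins:
pairs = [(85, 20), (60, 15), (45, 50), (70, 10)]
targets, decoys = picked_survivors(pairs)
print(round(classic_fdr(targets, decoys), 2))  # 1 decoy vs 3 targets -> 0.33
```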
The thing is, it seems to work. I recommend this paper to anyone curious about how FDR works. Even if you skip all the new stuff they've done here, it is a great review of FDR and various proposed mechanisms (that are target-decoy based).
To test this mechanism, they pulled 19,000 LC-MS runs (yup, almost 20k runs!) and ran this approach. They got better data than the classic target-decoy approach.
Okay, this is cool and everything, but what about Percolator!?!? You're very right. Percolator is the gold standard right now for this stuff. But they did 19,000 LC-MS runs. I did the crude math and Percolator in its current form could dig through that many files in around 11 years, and I'm guessing they did this a little faster!
Friday, May 15, 2015
The PD 2.0 upgrade process is relatively simple. If you have active "maintenance" in your existing copy of Proteome Discoverer then you can go to the Thermo Omics portal, download the upgrade license, install the upgrade key and you're good to go.
The trick for a lot of people is figuring out whether you still have maintenance. There are two ways of figuring this out. The first is to go to the Administration tab. Find licenses and click "show expired".
As you can see here, my Annotation has expired. This means I don't have maintenance on my copy of PD 1.4 so I can't upgrade without talking first to my sales rep. I can, however, download the 60 day free demo and use it as much as I want. So...after I acquire a ton of data...I'm going to do just that.
Now, I recently saw a copy of PD 1.4 that did not show this in the license box. In that case, go to the Workflow editor and look for the Annotation node. If it's missing, your maintenance has expired. If it's there, go download that upgrade!
Thursday, May 14, 2015
I <3 cytoskeletal proteins. They are super important and they are very conducive to shotgun proteomics (mostly because they are large soluble proteins and there are tons of them per cell.)
Something entirely new to me is a post-translational modification (PTM) that occurs in tubulin during rearrangement events like cell division. This PTM is detyrosination. Apparently it is critical, and I bet it's something a lot of us haven't looked at or for.
According to this paper in the April 23 issue of Science, detyrosination is responsible for this process.
That's mitosis, and the green is stained microtubules. During cell division the chromosomes are replicated and then pulled by cytoskeletal proteins into the two daughter cells. Despite thousands of years of research or whatever, we haven't really ever understood this process. Turns out it's the detyrosination of the microtubules that makes this go. Man, wouldn't it be wacky if this PTM didn't occur at the correct time?!?! Could this be linked to some of the insane cytoskeletal changes in cancer cells?!? The image at the very top I just randomly grabbed from Google Images. It appears this PTM happens a lot, and the antibodies aren't cross-reactive: you can have one antibody that only reacts to detyrosinated proteins.
Searching for this PTM is a little tricky because it requires the removal of an entire amino acid (or acids) from the C-terminus. For a full breakdown of how to set up this search with Mascot you can check this sweet paper from Ziad Sahab et al. in JPR from a couple of years ago (open access!). Turns out this process is totally dysregulated in cancer, and this is just one of the many many many things in this world that there has been tons of work done on that I've never known anything about.
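For the search itself, the key number is the mass you lose: a tyrosine residue is C9H9NO2 (the free amino acid minus water). A quick back-of-the-envelope check:

```python
# Monoisotopic atomic masses in daltons
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def composition_mass(formula):
    """Monoisotopic mass of an {element: count} composition."""
    return sum(MASS[el] * n for el, n in formula.items())

# A tyrosine residue inside a peptide chain is C9H9NO2 (free Tyr minus water),
# so detyrosination should show up as a loss of about 163.063 Da:
tyr_residue = {"C": 9, "H": 9, "N": 1, "O": 2}
print(round(composition_mass(tyr_residue), 4))  # 163.0633
```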
Shoutout to Dr. Norris for the great morning coffee read!
Wednesday, May 13, 2015
Check out this great figure! Does it make you immediately want to download and read this brand new review from Oliver Pagel et al. (open access!)? I especially like it because 1) it focuses on the clinic and 2) phosphorylation isn't all in bold or 10x larger than every other PTM. Phosphorylations are important, don't get me wrong, but they are one of many protein modifications, and we need to eventually spend as much time on all these others as we do on phosphos (IMaHO).
Tuesday, May 12, 2015
I feel like this paper was almost written to give me the best start to a Tuesday ever! Before I go further the paper is from Mark Larance et al., out of the University of Dundee.
Why am I so happy about this paper?
1) I could very easily find a SpongeBob clip with hungry nematodes
2) The starvation stress response is one of the most interesting things that happens in multicellular organisms and we don't know nearly enough about it, particularly how this weird system seems to play into the whole awful aging process thing.
3) This paper is just awesome. It reads like a Cell paper. The model is awesome. The experiment is set out in a very elegant way. The stats look rock solid. The observations are well validated....
Okay! How'd this go? They SILAC labeled nematodes (wild-type and mutants!) then put them through a well-established starvation modeling system. They used a combination of SAX fractionation and organelle-specific fractionation and did the proteomics with a (50cm EasySpray, FTW!) Q Exactive. They deposited the RAW files where they can be easily found. They processed the data with MaxQuant using sensible settings. Then everything coming out was checked for statistical significance in R. GO analysis was generated by DAVID. Cool stuff that was significant was tested with qRT-PCR. A huge matrix of data was generated, and the discussion of the results is clear and broken into the major observations, including changes in the metabolic pathways and in histone composition (cool....). The processed data appears to be available through the Encyclopedia of Proteome Dynamics, but I'm about out of time this morning. Excited to check it out.
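For flavor, here's the kind of significance test you'd run on SILAC log-ratios (a stdlib-only sketch with invented numbers; the authors did their actual statistics in R):

```python
import math
import statistics

def log2_ratio_t(ratios):
    """One-sample t statistic testing whether log2(heavy/light) SILAC
    ratios differ from zero (i.e., no abundance change) across replicates."""
    logs = [math.log2(r) for r in ratios]
    sem = statistics.stdev(logs) / math.sqrt(len(logs))
    return statistics.mean(logs) / sem

# A protein that roughly doubles on starvation in three invented replicates:
print(log2_ratio_t([2.1, 1.9, 2.3]))  # a large t -> likely a real change
```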
This paper is just a great study that just happened to generate an awesome new resource for the biological community! Kudos to this group for doing just a superb job.
Monday, May 11, 2015
Spectral counting isn't my favorite thing in the world. It has its uses, for sure, but I have been lucky enough throughout my career to have access to first SILAC labeled strains and later to freezers full of reporter ion quantification reagents.
Here is an interesting blog post from NonLinear Dynamics regarding why they don't give you spectral counting numbers directly in their interface (though you can export the data and do spectral counts in Excel or, presumably, a nice spectral counting interface like ProtMax).
I was interested to learn that Proteome Discoverer 2.0 actually added in a new column to every report that provides the exponentially modified protein abundance index (emPAI) for every protein in every separately generated output. If you are going to do spectral counting, using the emPAI is generally a better way of doing it than what PD 1.4 offered with simply counting the PSMs.
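For reference, the emPAI formula (Ishihama et al., 2005) is simple enough to compute yourself; the peptide counts below are made up for illustration:

```python
def empai(observed, observable):
    """Exponentially modified protein abundance index (Ishihama et al., 2005):
    10 ** (observed peptides / theoretically observable peptides) - 1."""
    return 10 ** (observed / observable) - 1

# A protein where 6 of 12 theoretically observable peptides were detected:
print(round(empai(6, 12), 3))  # 2.162
```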
Sunday, May 10, 2015
I'm just throwing this out so that people with questions about this function can find it when they do a Google Search.
When you buy a Q Exactive Plus, there are a couple of awesome upgrades available that you can't get on the Q Exactive. One is enhanced resolution mode. This gets you over a quarter-million resolution: 280,000. Boom! This one is self-explanatory.
The second one is called Protein Mode. Protein Mode is a function that improves HIGH RESOLUTION analysis of Proteins. If you can get high resolution spectra of your protein on a Q Exactive, protein mode on a Q Exactive Plus will make these spectra look nicer.
If you are studying large proteins, like un-reduced mAbs or antibody-drug conjugates (ADCs) or anything you can only resolve under 30,000 resolution, you do not need protein mode. Turning it on will likely give you worse data.
If you are buying a Q Exactive Plus for antibody analysis, should you completely skip protein mode then? If there is a chance that you would want to later reduce your antibody and study the light chain at high resolution, then Protein Mode might be a good move for you. If you know that this instrument is only going to be studying intact un-reduced antibodies, you're better off saving your money and buying better clean-up spin columns or trying out more sophisticated chromatography columns than boring old C4.
Saturday, May 9, 2015
Okay, so we've got a little over 3 billion base pairs of DNA in each of our cells (in the conventionally understood model). Every single time a cell divides we need to go through those 3 billion base pairs and make a complete, new, and perfect copy. And we've got to do this while still maintaining things like metabolism, respiration, and rolling our eyes at reviewer-suggested changes to the discussion sections of papers that make us want to just submit them somewhere else.
Sometimes mistakes are made (in the DNA, I mean). And sometimes the process is messed up by things like UV radiation and exposure to oxidative radicals. Even after the process is successfully completed, some of these things can come right into the cell and break the DNA strands clean in two. If everything is working well a bunch of proteins come in and fix these mistakes and put the DNA back together. If things aren't working well, then the break or mistake is allowed to stay there and that damage is duplicated when the cell divides again. This is how we get new mutations, but the goal is pretty much always to not get them.
We know LOTs about the DNA repair process. Entire sections of big universities just study DNA repair mechanisms. We have all sorts of cool proteins that we know are involved in this process.
So...why did M. Rasche et al., do all this proteomics work featured in this month's Science? Cause it turns out that every other technique used over the years to identify proteins involved in DNA repair maybe just scratched the surface of the complexity of these mechanisms!
As their system, they messed up the DNA repair process of frog cells by stalling the replication forks. This is a common mechanism for this kind of study. Essentially the DNA replication is just jammed up (normally by depleting the free nucleotide pool with something like hydroxyurea), which introduces DNA breaks. The cool part is that you can jam the mechanism pretty much whenever you want. Then they used the new technique they are calling Chromatin Mass Spectrometry, or CHROMASS...cause, you know, this field doesn't have enough acronyms...which essentially allowed them to quantify all the proteins that come to the rescue of the stalled fork breaks.
How'd it turn out? Almost 100 proteins appear to be popping in to help out! They identified all the known DNA repair proteins and scaffolding proteins that were expected. And then they have 50 or 60 new ones....and that's why it's in Science!
Friday, May 8, 2015
Lysine acetylation is another of the myriad levels of control our bodies use to regulate lots of things -- possibly everything! Who knows at this point?!?! What we do know is that it is there, and we can find a few instances of this mod in almost any global proteomics experiment if we look for it. What we haven't had is a streamlined workflow for going after these modifications to find what patterns, if any, exist in different states.
Tanya Svinkina et al., (in press at MCP and currently open access here!) have now changed that (BOOM!)
In this study, this group uses a mixture of commercially available antibodies (that you can buy right now, if you want) to selectively enrich peptides with lysine acetylation. They perform this enrichment on multiple cell lines and throw in both SILAC and reporter ion quantification reagents along the way just for kicks.
As I said, we have some idea of what this mechanism does, and since a lot of these researchers are affiliated with the Broad (like toad!), they apply it to a bunch of different cell lines, including mouse and human cancer cells. By the way, that GIF above is really distracting while typing....
LC-MS analysis of the samples was performed with either an Orbitrap Elite or a Q Exactive.
How'd it turn out? For one, they ended up with more lysine acetylated peptides than anyone has ever reported getting at once. For two(?), they showed that their relatively simple technique for enrichments and LC-MS analysis was easily applicable to SILAC, iTRAQ, and TMT. For three(??) they found that about 50% of lysine acetylation sites were conserved between the mouse and Jurkat cancer cell lines (woo!). Anything that is that conserved amongst species is super important, suggesting that further analysis of this PTM is pretty critical to our fundamental understanding of mammalian biology.
Thursday, May 7, 2015
Just a few hours ago I flew out of beautiful St. Louis, Missouri, the scene of this year's ASMS. Here and there in my travels I've heard rumblings about the location of this year's conference. If you are bummed about going, I suggest that you Google Image search the terms "scenic Saint Louis". You'll be bombarded by beautiful pictures such as the one above, which someone took somewhere in Illinois (try it!)
But I digress... what I wanted to lead you guys to was the full schedule of things that Thermo will be doing. You can find it here. Looks like a bunch of cool talks from a bunch of people we've all read papers from!
Wednesday, May 6, 2015
This is the first 2D gel paper I've read in a while. I like it for a lot of reasons. (One of them is the fact that I didn't have to do this gel...NOT something you want my help with...)
The paper in question is by Judy Triplett et al. and is open access here. In this study they look at the proteomics and phosphoproteomics of a PINK1 knockout mouse as a model of inherited Parkinson's disease. Interesting model? Check!
As you can tell from the gel image above, the technique is solid. Whoever did this knows what they're doing. Differential proteomic analysis was done and SYPRO ruby was used for differential phosphoproteomics. Spot IDs were performed with an Orbi XL.
Changes that were observed were validated by western blots. All around, just a nice technical study. It's almost just a bonus that this work provided some insights into the major pathways affected by familial Parkinson's disease.
I'm pretty sure I mentioned this site before, but if you have an Orbitrap Fusion you should probably have it bookmarked. (Here is the link.)
The information is condensed and proteomics centric but it really breaks down the Orbitrap Fusion -- how it works, what the features are, and what experiments will yield you what.
I was helping a Fusion owner yesterday with a method that was yielding weird results. Turns out it was a method logic issue and I found the answer by going right to this awesome resource!
Tuesday, May 5, 2015
Quick blurb here on a hidden feature in Windows 7 called Automatic Metric. This is something that was kind of snuck in and that a lot of us will never notice. One of the functions of Automatic Metric is to let Windows decide which network port to use and when. Not a problem on most PCs, cause most of the time we have one port. But if you have, oh I don't know, 3 ethernet ports on your PC -- maybe one for your LC, one for a mass spec, and one for Internet -- it can cause problems.
The solution? Turn the darned thing off. Honestly, I turn it off on every LC-MS system I visit if I think of it -- particularly if I'm looking at a system with a Waters LC on it. Does it fix every problem? Nope, but it has fixed several.
The way to find it is to go into the Network and Sharing Center (look that up in the search bar if you don't know how to directly navigate to it.) Then open the properties for the ports for each device. It takes a while to toggle through. In the end it looks like this (click to Zoom In):
In the case of the above screenshot, I turned off the Automatic Metric and set the Orbitrap to permanently be port 2. Repeat for the LC (Waters LCs want to be port 1, by the way). If you have had some communication problems with Windows 7, it can't hurt to try it out.
Edit (5/6/15): For further discussion on this topic, see this helpful thread at the ABRF discussion forum (here). Pay particular attention to the posts from September 2014 from my fellow Hokie, Dr. Wingerd.
Edit (5/6/15) 2: If none of these suggestions help...sorry...go after the Windows Firewall settings. I recently had luck with a room full of mixed vendor instruments by taking all of them off the network and disabling all firewalls. An IT security person followed after me and put the devices back on the network after verifying that EVERY program that the LC and Mass spec used for communications (there are secret hidden ones!!! you have to get them all!) was allowed full permissions through the firewall. This took quite a bit of time.
Monday, May 4, 2015
Umm...so..this is the first Google Image that comes back for "direct infusion needle". We're gonna run with it. I've got a lot to do today, but this was too cool not to share.
The paper in question is in this month's Open Proteomics and it's from Christina Looße et al.; you can find it here. (In case you were wondering about the color change...that "B" thing is a German eszett, "ß", and I have no idea how to convince my keyboard to make it...yes, I'm that dumb...)
In this paper they got a synthetic peptide to optimize with and then went on to quantify this peptide in a complex digest. They used all the normal techniques, like SRM and SIM (connected to an LC), then they did the title experiment: direct infusion (with nano emitters and with the HESI source). Turns out that you can quantify just as well without an HPLC! Obviously the background is going to be more complex and the sensitivity will be lower since you have all your compounds at once, but I'll be darned if it doesn't work.
Best of all? They were getting CVs with the synthetic peptides hitting <15%. Not too shabby, right? So, if you're in a hurry or your LC is going through standard maintenance you can still quantify peptides quickly and accurately on your Q Exactive!
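In case CV is new to anyone, it's just the relative standard deviation; here's a quick sketch with made-up replicate peak areas:

```python
import statistics

def percent_cv(values):
    """Coefficient of variation: sample standard deviation over mean, in %."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical replicate peak areas from repeated direct infusions:
areas = [1.00e6, 1.08e6, 0.95e6, 1.05e6]
print(round(percent_cv(areas), 1))  # comfortably under the paper's 15% mark
```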
Sunday, May 3, 2015
Quick lunchtime answer to a question that comes up once in a while: when does the Orbitrap patent expire? A quick look on Google Scholar led me to US Patent 7714283 (here!). That was filed in 2006. Given general US patent law as described in Wikipedia here, we have 20 years from filing, so we're looking at 2026.
Something else to consider, though, is that the C-trap patent wasn't filed until 2008 in the U.S., and an Orbitrap doesn't work without one of those. It's also good to keep in mind that there are multiple patents detailing improvements to the Orbitrap. So...even if the patent for the original device were up tomorrow, you could probably only start building something like the Discovery in your basement!
P.S. This is the number of pages Google Scholar currently has for Dr. Makarov's patents....