Sunday, August 31, 2014
Told you guys I was going to get caught up on the literature again! My day job hasn't been getting less nutso and I've been falling behind on the fun stuff I want to do. Sometimes you have to pull a fun all-nighter backdating blog posts!
Anyway, this new paper in Open Proteomics shows us how to get EXTREMELY deep coverage of a sample. Did you know there are over 450 known protein PTMs?!?! Holy cow. We need to stop thinking about just phosphorylations! The problem is, these other 449 PTMs are also low abundance, but we don't yet have enrichment methodologies for all of them. What we need is extreme chromatography.
Our normal approaches for separation probably aren't going to cut it if we want true -omics coverage. We're going to need combination approaches. Separation at the top down level, followed by separation at the middle-down or bottom-up levels, as well as particular PTM enrichments. If we are really going to be stepping into the domains of trying to track every PTM, we're going to need some extreme measures. Definitely check out this paper here for a nice perspective on what we're dealing with and what strategies we currently have in place for going after weird PTMs.
Saturday, August 30, 2014
This one isn't a proteomics story. I really need to catch up on the literature. I've got some long plane rides coming up so that should help.
Anyway, this is next gen sequencing. And Ebola. And a tragic story. A study released in Science this week shows the use of next gen sequencing at 2000x coverage to trace the Ebola outbreak to a single person who picked it up from the natural reservoir. The detail here is astounding. I stole this pic from the article (it's open access, so I doubt I'll get sued).
It is a pretty conclusive story. And it makes sense from an epidemiology standpoint, right? From the news, we all know how tragic this outbreak has been and how scary it really might become.
To put the severity of this one in perspective, I think this speaks volumes: 5 of the contributing authors on this paper have died from the disease...
You can read more about this all here.
Thursday, August 28, 2014
This is SO smart. And we need an equivalent here in the U.S. If you want to start it, I volunteer to head it.
Prime-XS is a program run by the EU. It ensures two things: 1) that proteomics is used for scientific studies of extreme merit, and 2) that labs participating in XS are exposed to high quality, high impact biological problems. It is a win-win. Top notch labs get top notch collaborators and the EU pays for it!
How it works: Researchers in the EU countries can apply for days of access to proteomics facilities that the EU has reserved for this program. This is how it is currently divided:
What a win for everybody! How many top notch proteomics facilities can you think of that have trouble finding high impact biological questions? Tons, right? Let's face it. Cool problems don't always come from the same places where our best proteomics facilities are. This fixes it. In one fell swoop. Top biological problems -- top proteomics capabilities -- and we all win.
The downside is that this program is going to push the power balance in impact factor toward Europe. Meaning we need something like this over here.
You can read more about Prime-XS here.
Wednesday, August 27, 2014
Tuesday, August 26, 2014
Today I got a super interesting question about Percolator and Spectral libraries. While investigating, I figured I'd better go to the Percolator Google Group and see (btw, I think this is going to be very interesting to everybody!!!)
Anywho, while searching, I came across a whole lot of names I recognize who belong to a Crux-users Google group. What the heck is Crux? Umm....maybe an incredibly awesome and apparently free software package for proteomics?
On further investigation, I found Scholar references back to 2008 (?!?!?!). I feel less out of touch because it appears to be buried within the 1,000 programs and features that make up the Trans Proteomic Pipeline.
You can find out more about Crux here. Feel free to tell me about it. Back to Percolator....
Friday, August 22, 2014
I got to hang out all week with the very nice team of scientists at Protea Biosciences in the beautiful mountains of West Virginia. Not heard of Protea? Me neither! But I expect that they will rapidly be something that we'll be talking about, in part, because of this thing:
Yes. That is attached to a Q Exactive! And that light inside is for the camera that directs the LASER. I'm not going to lie and say I'm some laser ionization expert. I'm not. I've spent some time on a MALDI-Orbi XL and on Rich Helm's MALDIs, but I got a crash course this week, and this was the most badass one I've seen. The source is a LAESI and you can read about it at Wikipedia here. The team here has this source running on all sorts of samples. I was just there to see if we could fine tune the Q Exactive to get even better data. It was cool because we could get about 3 second laser pulses on our controls. The trick was optimizing cycle time in the Q Exactive so that we could optimize every single millisecond.
Definitely a different way of thinking. But think about the Q Exactive, assume that this source can ionize virtually anything, and consider those implications. On the QE we can run, at maximum, about 13Hz. If we multiplex, this gives us a chance to monitor as many as about 65 compounds per second via targeted SIM or targeted MS2 (PRM; btw, I'm considering practicality. We can multiplex 10 compounds in the QE, but 5 is easy. 10 is trickier). If we're just going for detection, the LAESI-QE combination can probably SIM or PRM about 150 molecules per 3 second laser pulse (we were optimizing with small molecule drug mixtures!)
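If you want to sanity check the arithmetic in that paragraph, it's just scan rate times multiplexing. The numbers below are the rough figures from the text, not instrument specs:

```python
# Back-of-envelope throughput math for the LAESI-QE numbers above.
# These are the approximate figures from the post, not official specs.

scan_rate_hz = 13      # max targeted SIM/MS2 scans per second on the QE (approx.)
multiplex = 5          # comfortable number of precursors per multiplexed scan

compounds_per_second = scan_rate_hz * multiplex
print(compounds_per_second)   # 65 compounds per second

pulse_seconds = 3      # one laser pulse on our controls
compounds_per_pulse = scan_rate_hz * pulse_seconds * multiplex
print(compounds_per_pulse)    # 195 in theory; ~150 once you allow for overhead
```

The gap between the theoretical 195 and the ~150 quoted above is just a conservative allowance for duty cycle and scan overhead.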
What else did we do? We worked most of yesterday optimizing native protein and top-down. Cause the LAESI can zap native proteins right out of tissue, right off a slide, right from wherever you want. Point to the area on the microscope and ZAP (it didn't actually make a noise...) native protein MS! We were even able to get nice top down data on our intact native protein with the LAESI by using multiplexing on the QE to fragment multiple charge states from the isotopic envelope. Did I mention that my week was really cool?
Anyway, this source can be put onto just about anything but, honestly, what is cooler than having imaging capabilities on the world's favorite mass spec?
BTW, imaging isn't all this lab does. They have a really exceptional team of mass spectrometrists with experts in virtually anything you can think of from molecule elucidation through quantitative proteomics and everything in between. (Can you tell I was impressed this week?)
TL/DR: You probably know about this already, I didn't, but check out Protea here!
Thursday, August 21, 2014
What is more annoying than doing that calculation above? Seriously! I have x amount of protein in ug; what is my molar concentration? You are probably smarter than me, but I have a tendency to lose a decimal place or two. This week I said "wow, there should be an app for that". And there is. Of course.
Promega has a free BioMath calculator that will do this one for you, among other things. (Lots of DNA calculations and dilutions, but this is what I downloaded it for!) It is available for both Android and Apple.
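For the curious, the conversion the app is doing is one line of arithmetic. Here's a quick sketch; the example numbers (25 ug of a 50 kDa protein in 100 uL) are made up for illustration:

```python
# Convert a protein amount in micrograms to molar concentration.
# Example values below are invented; plug in your own.

def molar_conc(mass_ug, mw_da, volume_ul):
    """Return concentration in micromolar (uM)."""
    moles = (mass_ug * 1e-6) / mw_da   # grams divided by g/mol gives mol
    liters = volume_ul * 1e-6
    return moles / liters * 1e6        # mol/L converted to uM

print(molar_conc(25, 50000, 100))      # about 5 uM
```

Nothing magic, but this is exactly the decimal-place shuffling where I lose track, which is why the app is handy.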
You can read more about it here. A big thanks to the talented team at Protea for pointing this one out to me!
Wednesday, August 20, 2014
Monday, August 18, 2014
I had to go to the doctor today for routine checkup stuff and what magazine was at the top of the stack? This month's issue of The Scientist! That's just how Baltimore is. There are probably almost as many scientists here as drug addicts. My physician is probably married to a researcher at Hopkins or UMD or something.
Anyway, there is a great article this month describing how to move from discovery to targeted proteomics, as well as a description of each open source platform. This'll come as no surprise if you've used it at all, but Skyline made the top of their list.
A couple of these platforms were new to me and it might be worth it to check out this nice little review. Or even to forward it to collaborators. It is concise and nicely written. You can find it here.
Sunday, August 17, 2014
Look what I got this morning! It is looking really really good too. I got it to do some PC benchmarking today while hanging out with a sick dog.
I'll show the benchmarking stuff later. I need to sort out some variables. While I was doing it, I noticed a file was taking a whole lot longer than it did on PD 1.4 (or on earlier PD 2.0 alpha copies).
Check out what our friends in Bremen got working!!! That one file did take longer than normal, but it was because PD was doing a bunch of other things:
If you've spent any time on the job queue on any version of Proteome Discoverer, chances are I just blew your mind a little. PD is running multiple files at once!!!! Now, it remains to be determined if those two matching 81% are because PD detected that I was reprocessing the same Fusion files just with different names, but the fact that it is intelligently allocating time is bound to make more than just me happy. I need to do some digging around. I don't have all that many Fusion files and I'm running them cause they are the hardest to work with. If we can do them fast, I'm not worried about the QE or Elite files. Easy. This RAW file has 16k unique peptides in it!
I'm running on a crazy fast PC (more details on that later, too!) but it knocked out 4 Fusion runs in an hour and 17 minutes. I was experimenting with different peptide and protein FDRs and it just tore right through them. By comparison, I just saw a big fancy dual CPU Xeon choke on HeLa files in PD 1.4 for hours. Better hardware. Better software. And all of a sudden these huge datasets everyone is generating don't seem all that scary!
BTW, wait till you see how PD 2.0 handles complex experiments! Thermo is about to release the best proteomics software we've ever seen.
Friday, August 15, 2014
You know what I love? When people start applying nice statistics to proteomics data. A lot of these datasets are getting far too large for us to say "x is twice y". But we all have a lot on our plates. We can't just take a bunch of stats classes (believe me, I'm trying and I've already had to drop one that I paid for this summer...) in order to get caught up. We need good, trustworthy, time tested stats built into our processing schemes.
Why not go for simple p-values?
Because, obviously, it isn't that simple, dummy!
HAHA! But it turns out that it is!
JJ Howbert and Bill Noble think it is and they have some really good evidence. Check out this paper (it appears to be open access) in press at MCP.
In this study, they went to the original Xcorr values assigned by Sequest and looked at the total score distribution across all the peptide-spectral matches. At that level, they could estimate, for each PSM, the probability of seeing an Xcorr at least that high by chance, cause that's what p-values do.
When they went back and ranked their peptides by p-value, rather than raw Xcorr, they found they had a much more accurate measurement of PSM validity than merely saying "anything above an Xcorr of 2.0 is trustworthy" (which is what most of us have been doing all along, be honest, and we've all secretly known it was silly. It's like saying a TMT fold change of 1.25 is significant. It's just us being lazy....)
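To make the ranking idea concrete, here's a toy sketch. This is NOT Howbert and Noble's actual statistics (they model the score distribution far more carefully); it just shows what it means to rank PSMs by an empirical right-tail p-value drawn from the whole score distribution instead of a hard Xcorr cutoff. All the scores here are invented:

```python
# Toy illustration of ranking PSMs by empirical p-value instead of a
# fixed "Xcorr > 2.0" rule. Not the paper's math; scores are made up.

from bisect import bisect_left

# Pretend these are all the Xcorr values from an entire search, sorted:
all_xcorrs = sorted([0.8, 1.1, 1.3, 1.5, 1.9, 2.1, 2.4, 2.6, 3.0, 3.5])

def empirical_p(score, background):
    """Fraction of background scores >= score (crude right-tail p-value)."""
    n = len(background)
    return (n - bisect_left(background, score)) / n

psms = {"PEPTIDEA": 3.5, "PEPTIDEB": 2.1, "PEPTIDEC": 1.5}
ranked = sorted(psms, key=lambda p: empirical_p(psms[p], all_xcorrs))
print(ranked)   # best PSM (smallest p-value) comes first
```

The point is that the cutoff falls out of the data itself rather than out of folklore.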
Awesome, right? As proof of principle, they ran the same data set through a bunch of different search engines and, predictably, this approach outperformed the other engines tested.
What about Percolator?!?!
This is where I don't know quite enough Greek letters... or at least when you're adding and dividing them it does a funny thing to my brain. What do I know this morning? They were able to work this pipeline into Percolator (somewhere around there I fell asleep). They come from the same place. Of course it works with Percolator!
Thursday, August 14, 2014
I'm still at this awesome LC bootcamp. Yesterday the instructors threw out this idea that has never ever occurred to me. If you have a dual pump system with enough valves, you can set up parallel LC. Dionex actually just sells a kit for this. The gist of the method is that while your peptides are eluting, the second sample is loaded onto a second trapping column and washed. When you get past the point in your gradient where you are just washing crap off your column and re-equilibrating you switch valves and then go right into the elution of the next set of peptides! You could be really ruthless with this and shave a ton of time off of each run or more conservative and still shave a lot of time!
In the example we were looking at, we were able to shave 30 minutes of trapping, desalting, and equilibration time off each sample injected. Imagine a semi-complex sample that you run with a 140 minute peptide elution and you are talking about close to 6 hours of extra run time that you are squeezing into a day (assuming ~12 samples per day and 30 minutes saved on each). That's 2-3 extra runs per day!
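The back-of-envelope math, for anyone who wants to plug in their own gradient lengths:

```python
# Time savings from parallel trapping, using the example numbers above.

minutes_saved_per_run = 30   # trapping + desalting + equilibration per injection
runs_per_day = 12            # roughly what fits in a day at these run lengths

hours_saved = minutes_saved_per_run * runs_per_day / 60
print(hours_saved)           # 6.0 hours reclaimed per day

# With a ~140 minute elution, that reclaimed time is worth 2-3 extra runs:
elution_minutes = 140
extra_runs = hours_saved * 60 / elution_minutes
print(round(extra_runs, 1))
```

Swap in your own gradient and queue length to see what parallel LC would buy you.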
Looking at the schematic makes this seem reasonably simple for any LC that has 2 switching valves and separate loading pumps. Might be a great solution for any of you guys who are just getting buried under your sample queue!
Wednesday, August 13, 2014
This week I am at an intensive chromatography bootcamp taught by legacy Dionex experts Nick Glogowski and Daniel Kutscher. Of the hundreds of interesting things I've picked up so far, one of the coolest things is this explanation of smaller particle sizes and why we keep migrating to smaller and smaller particles.
This chart shows the optimal parameters for separation of different particle size solid phase materials. (Sorry about the colors, original document available here.) For the most striking difference, look at the 10um particles: there is only a very narrow range of flow rates where the separation is optimal. Compare that with the 2um and 3um beads, where we can handle a much larger variation in flow rates and maintain a nice flat line. While I'm not going to pretend I understand all the math that has passed by on the projector this morning (sorry, guys!), I think it is still pretty clear that consistency in our chromatography makes a big difference (especially if we want to load our columns faster than we elute off them!)
Tuesday, August 12, 2014
Boy o' boy! Intact analysis and top down proteomics are all the rage these days! A lot of this has to do with the Exactives. The Q Exactive is great for both. The QE Plus has the new Protein Mode option and the Exactive Plus EMR is probably the easiest and most sensitive instrument for intact analysis ever made. A large percentage of my day job these days is supporting you guys with intacts and top downs. A problem I've run into is that the standards out there kind of suck.
My friend Aimee Rineas at Dr. Lisa Jones's lab at IUPUI took a swing at fixing this problem with me a while back.
Our solution? A pretty thorough analysis of the 6 (?? 7?? read on, lol!) protein Mass Prep mixture from Waters. It is part number 186004900. Be careful, there are several similar products and the Waters website doesn't do a very good job of distinguishing them. This is the one I'm talking about.
Great! So we have a standard. Easy, right? Not so fast. The chart above is all the information you get on the proteins. For Ribonuclease A, the mass is rounded to the nearest 100? Sure, this will be okay for some TOFs, cause that's about as close as you can get in the mass accuracy department, but I'm running Orbitraps. I want to see my mass accuracy in the parts per million, not the level of mass accuracy I will get with a TOF or SDS-PAGE.
This is where the work comes in. Aimee and I used a really short column, a 5 cm C4, and ran this standard several times on the Q Exactive in her lab: first to obtain the intact masses, then a few more times for top down analysis.
Our best chromatography looked like this (this is an RSLCnano; we just ran microflow with the loading pump):
Not bad for a 5 cm column, right?
Lets look at the first peak:
13681.2. This is our Ribonuclease A. If we go to Uniprot, look up the sequence, and cleave off the first 25 amino acids (this is the signal sequence that isn't actually part of the expressed mature protein... it's a genetics thing that we mostly don't have to worry about in shotgun proteomics, but absolutely have to worry about in intact and top down), we can calculate a theoretical mass for the mature chain.
According to ExPASy, the theoretical mass is 13681.32. That puts us 0.12 Da, or 8.7 ppm, off of theoretical. Boom! (On a QE Plus run since then, I've actually tightened up this awesome mass accuracy!)
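If you want to run the ppm math on your own standards, it's just this:

```python
# Mass accuracy in ppm: observed deconvoluted mass vs. theoretical mass.
# Values are the Ribonuclease A numbers from above.

observed = 13681.2       # our deconvoluted intact mass
theoretical = 13681.32   # ExPASy average mass for the mature chain

delta_da = abs(observed - theoretical)
ppm = delta_da / theoretical * 1e6
print(round(delta_da, 2), round(ppm, 2))   # 0.12 Da, ~8.77 ppm
```

Same formula works for peptides; the only difference at intact mass is that you're comparing average rather than monoisotopic masses.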
Okay. I know there are some good TOFs out there. There are probably some that could get us within somewhere close to this mass. But if we dig a little deeper, we see the real power the Orbitrap has on this sample. Look at this below.
Can you see this? Sorry. Screen capture and blogger format only get me so far.
If we look closer at our pure protein we purchased, an inconvenient fact emerges. What we thought was our pure protein, we can see with the QE, most certainly is not. At the very least, we find that this protein is phosphorylated. This only gets more obvious when you increase the sensitivity even further by analyzing this same sample with a QE Plus (I have limited data showing this mix actually has a 7th proteoform in it that I need to further evaluate).
By the way, this protein is known to be phosphorylated in nature. The manufacturer just wasn't aware some of it slipped through their purification process. We also did top down, remember? We should be able to localize the modification. I just haven't quite gotten there yet. Free time is a little limited these days.
(I have used ProsightPC and localized this modification, I'll try to put it up later).
People doing intact analysis are sometimes critical of the "noise" they find in an Orbitrap. Further evaluation will often reveal that those "noise" peaks are really minor impurities in the sample. Are they biologically relevant if a TOF can't see them and only a QE Plus can? I don't know. Probably not. Maybe? Wouldn't it be better to know they are there in any case?
I started this side project months ago, considered actually writing up a short note on it and figured, "what the heck" more people will probably read it if I put it here anyway. I've also gotten to run this sample on a QE Plus, which revealed even more cool stuff.
This is incomplete, and not doublechecked, but these are the masses that I have so far for this standard:
|Protein|Part list mass|Our mass|
|---|---|---|
|Cytochrome C, horse|12384|12358.2|
|Myoglobin, horse + heme|17600| |
P.S. All of this data was processed with ProMass. I, too, am a creature of habit. Protein Deconvolution gives me tons more tools and better data but for a super simple deconvolution I still default to good old ProMass or MagTran. If I had written this up and tried to submit it somewhere, you bet your peppy I'd take screenshots from my PDecon runs though!
Monday, August 11, 2014
Sleep deprivation has been big news lately, ever since a study went viral that showed sleep deprivation can cause permanent damage in the brains of mice. As someone who doesn't sleep all that much and who knows a lot of other people who I can reach just about 24/7 this has attracted my attention.
My criticisms of this study: 1) These are mice, not just that, these are horribly deformed and inbred mice that are produced to have no fear of human beings AND to be genetically identical. 2) For further evidence, Michael Jackson was reported to sleep no more than 3 hours per night and, last I checked, the King of Pop was doing just fine.
A new study takes aim at these observations using proteomics. Mice were forced to stay awake, their sleepy little brains were extracted, the neurons were enriched on a density gradient, and proteomics looked at the differences. Unfortunately, the results are a little underwhelming. They found 80 proteins or so that were differentially regulated (1.5-fold) and the DAVID and IPA analysis was a little inconclusive. The paper hints that further analysis is in the works and that we'll know a lot more when they wrap up the next paper. However, if you are interested in looking at neurons via proteomics, this paper has a nice and concise method.
Sunday, August 10, 2014
This is really really cool and currently in press (open access!) at MCP and comes from work done at Roman Zubarev's lab. Edit: Here is the link to the abstract (left it out before).
In a DDA experiment we pick the ion we're interested in that looks like a peptide, based on the parameters we provide that say "this is a peptide and it is probably one that will fragment well with the method that I'm using right now". Then we isolate it and, too often, a bunch of other co-eluting ions come along for the ride and end up in the same MS/MS spectrum.
For years I've heard people kicking around this idea: what if we identify our peptide from our MS/MS spectrum and then remove every MS/MS fragment that can possibly be linked to that peptide? Then we're left with the fragments from the peptides we accidentally isolated. Let's database search those and find out what they are.
And that is exactly what DeMix does.
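In spirit (and I'm simplifying enormously; this is my toy sketch, not their code), the fragment subtraction step looks something like this:

```python
# Bare-bones sketch of the DeMix idea: once a peptide is identified,
# strip every fragment m/z it explains from the MS/MS spectrum, then
# send the leftover peaks to a second database search.

TOL = 0.01  # fragment matching tolerance in Da (arbitrary for this sketch)

def subtract_matched(spectrum_mz, identified_fragment_mz):
    """Return the peaks not explained by the first identification."""
    leftovers = []
    for mz in spectrum_mz:
        if not any(abs(mz - frag) <= TOL for frag in identified_fragment_mz):
            leftovers.append(mz)
    return leftovers

# Toy chimeric spectrum: peptide A's fragments plus a co-isolated peptide B.
spectrum = [147.11, 276.15, 389.24, 502.32, 201.12, 314.21]
peptide_a_frags = [147.11, 276.15, 389.24, 502.32]

residual = subtract_matched(spectrum, peptide_a_frags)
print(residual)   # the unexplained peaks, ready for a second search
```

The real pipeline does this on deconvoluted, recalibrated spectra with proper tolerances, but the core move is exactly this subtraction.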
Let me rant a little bit about how cool the workflow here is. They ran this stuff on a QE with 70k MS1 and 17.5k MS2 resolution and used isolation widths of 1 - 4 Da. They converted everything over to centroid using TOPP (btw, they found better results when they used the high res conversion option for this data, so I'm using that from now on). Next they ran their results through Morpheus using a 20 ppm window and a modified scoring algorithm. The high scoring MS/MS fragments were used to recalibrate the MS/MS spectra (just like the Mechtler lab PSMR does) using a Pyteomics Python script.
Interestingly, when they made their second pass runs they tightened all of their tolerances and processed the deconvoluted MS/MS fragmentation events where the previously matched fragments were ignored. I should probably finish my coffee and then work my way through the discussion, because I would have done it the opposite way (and, when we do serial searches in PD, that is the default workflow). I'm not knocking it, I just find it counter-intuitive.
So what. Did it work? Of course it did, or it wouldn't have made it into MCP! Final stats? The QE was knocking out about 7 MS/MS events for every MS1. Using this approach, they IDENTIFIED 9 PSMs(!!!) out of each 7 spectra. They didn't get 2 IDs per MS/MS event, but they got about 1.3, which is a heck of a lot better than 1!
I cannot wait to try this, and I've got the perfect data set to run it on sitting right in front of me. I'll let y'all know how it goes.
Saturday, August 9, 2014
Thought I would share this. I love the perspective. Malaria: a disease that is virtually impossible to study with genomics techniques cause the involved genes mutate at a crazy rate, and one that only proteomics has a real shot at deciphering (it's protein-antibody interactions! come on!)
Shark week? We need to have malaria week!
Friday, August 8, 2014
It was just a matter of time before someone did this, right? I get questions about this through the blog and through my day job all the time. The truth is it isn't easy to figure out what is going to make a good processing computer. You'd think it would be simpler, right? Old computers will be slow, new computers will be fast, and expensive new computers will be the fastest. Yet I've been in labs this summer that have dumped $3,000+ on Xeon desktops that are much slower than the quad core laptop I'm always bragging about. To this day, I definitely cannot guess why one PC is going to be slower than another.
Fortunately, I don't have to worry about it anymore. These guys at OmicsComputing have put together 2 processing PCs, a basic "Omics Workstation" for fast processing and a super processing computer called, get this, the "Proteome Destroyer".
If you've read much on my blog, you know what a dork I am for stuff like this. So my first thought was to see if I could borrow some time on one of these and run some stuff. And I got to. And they aren't messing around.
Sorry the text is small here. What is it? My favorite HeLa high-high file run on the Proteome Discoverer 1.4 demo. I used the full human Uniprot. Static modification of alkylated cysteines and dynamic mod of oxidized methionine. Sequest took 57 seconds and Percolator (32-bit, not the awesome beta that I have accidentally shown you guys before) took under 6 minutes. So... a whole HeLa high-high run in 7 minutes or so. Pretty good. I've done better on my overclocked quad core with the Percolator beta we're testing, but this beats the heck out of virtually every other run I've seen.
Okay, who cares, high-high files are easy. What about those big Fusion files that even my quad core suffers through (we're talking high/low and hugely dense data files)? 20 minutes, with Percolator (misplaced the screenshot for proof, but I'll load it later. I know it's in one of these inboxes somewhere...). I have never ever processed a Fusion HeLa file in under 30 minutes before....
Get this, though, apparently this isn't nearly as fast as this PC can go. When I reported back what I saw (probably faster than I've seen, but not mind blowing) they took a closer look at the runs and the processor wasn't running anywhere near maximum. I guess it uses a very aggressive processing boost function when it is under high load. Proteome Discoverer running a Fusion file wasn't enough to trigger high load functioning. The PC was like, "oh well, I'll run this but I don't need to activate all of my cores or memory or anything."
So they tweaked the software or hardware or something so that it recognizes PD as a software that should be run at full capacity and invited me to try re-running my files. As you might guess, I'm psyched to test it out! As you also might guess, it may be a while till I can get to it cause I've got lots of other things to do.
You can check out their simple little webstore here.
TL/DR: This company designs computers just for genomics and proteomics processing and I'm pretty sure they are a whole lot faster than what you are processing your Proteome Discoverer data with. And apparently, I didn't see the full capacity of what these computers can do! BTW, they aren't nearly as expensive as you'd think, as they run from $1,800 - $4,000. Crazy, right!?!?!
Thursday, August 7, 2014
Sorry, I know. That is super gross. But I'm going for impact in my first post since my awesome technology-free vacation (rock climbing all over Appalachia! wooo!)
A criticism I hear of proteomics is the lack of true results in the actual clinic. We can hotly contest that one, obviously, but that doesn't stop people from saying it. I love it when I can have a study in my back pocket to point people to that says "proteomics did THIS."
That's why this study in this month's journal Cancer is so awesome. Proteomics did THIS. Validated biomarker panel to detect cancer in the esophagus. In the clinic. Now.
The study I'm talking about is here and came from researchers in the Allegheny Health Network in Pennsylvania in conjunction with some researchers in Buffalo. Interestingly, this is yet another group I've run into in the last month or so that is successfully using spectral counting -- maybe it's coincidental, but it is really starting to look like this approach is making a major comeback. Something I really want to evaluate on today's super fast instruments (at Fusion speeds, can we get both sample depth and quantitative dynamic range? Maybe... again, thoughts for later!)
Are the approaches revolutionary? Not really. The samples are really cool. The math is good (by that, I mean good use of statistics!) and ELISA validation means that you can rapidly move from a proteomics observation right into a good molecular lab in the clinic. And even cooler that this study didn't go to MCP, but went right into Cancer! (No offense, MCP, but as a biologist I'd much rather read a great proteomics study in a biology journal...) Great example for the haters out there!
BTW, this study is getting lots of press. You can read more about it here.