Monday, November 30, 2015
How on earth do you guys do proteomics on unsequenced organisms? I know there's BICEPS, which relies on genomes from similar organisms and error tolerant searches, but...besides that?
Okay, here are some good examples. They finally did the tardigrade genome, and it's reeeeaaaalllly weird. But if the genome is only just now done, how did:
These groups do the proteomics studies in 2010 and 2011?
Well, the 2010 group did 2D-gels and compared their IDs against the currently known protein sequences, and the group in 2011 focused primarily on highly conserved heat shock proteins.
So, you do what you can with the proteins that are in the databases. Or you study something that doesn't change from organism to organism.
Why is this cool, other than the fact that tardigrades are frickin awesome? Cause this is a perfect example of a great meta-analysis project. Google Scholar pulls up 5 studies that appear to be at least partial proteomic analyses of these ridiculously cool organisms. Every single one of them was performed with imperfect genomic or protein databases. If I was looking to write a nice and reasonably high impact paper this weekend, I'd be downloading this genome here and seeing how many of these papers submitted data to Tranche and PRIDE that I can freely download and meta-analyze. Then maybe we'd know how these weird, indestructible things tick...or dance...or swim....
UPDATE 12/5/15: Ummm...okay, so maybe hold off on that meta-analysis... the study above has set off an explosion in the genomics community. It appears there might have been some errors. I LOVE SEEING SCIENCE IN ACTION!!! Maybe all this is real, and maybe not, but either way we'll be further ahead! Discussion on the controversy here.
Sunday, November 29, 2015
Despite what crime dramas might lead us to believe, forensics technologies still aren't perfect. A big hangup of the DNA evidence is that we have the same copy of DNA in every cell, so telling where the tissue samples came from is difficult/impossible.
Sounds like a job for proteomics!
In this cool paper from Sascha Dammeier et al., out of the Kohlbacher lab, the researchers show that shotgun proteomics runs can tell you what tissue some evidence comes from. When your Materials and Methods section includes bullets and evidence bags, you've entered into interesting proteomics territory!
To make it even more interesting(!!) they did the proof-of-principle stuff on cow organs subjected to blunt trauma and then(!!!) they participated in a real crime investigation!!!
Okay, get this. A bullet passing through an organ carries enough protein for an Orbitrap XL to get a proteome signature good enough to identify the organ!
Saturday, November 28, 2015
Got Proteome Discoverer 2.0?
Want to get a free upgrade that gives you awesome levels of label free proteomics capabilities?
I mentioned this before, but I just installed the nodes that are listed here on a fresh install of PD 2.0 and ran it and I'm blown away. These are really really super nice!
My recommendations (cause I don't know what I did wrong before):
1) Read the instructions
2) Download the nodes AND the processing and consensus workflows
3) Don't use a small file to test. Percolator needs to run for this to work. In PD 2.0 if you have less than 200 peptides going into Percolator it just turns off. Then the nodes can't work.
4) Revel in your new capabilities!!!!
(Look at this!!! I can find stuff in my runs that were not identified, but that were differentially (sp?) regulated in my two samples!!!! MAGIC!)
5) Remember that this is a free second party software. There is a nice mailing group you can sign up to for advice and news about the nodes. Your instrument vendor probably can't help you with these.
6) Do awesome label free proteomics!!!
Friday, November 27, 2015
This stuff has all been done on bacteria long ago, but these guys did this study in human cells. What genes (proteins!) are 100% essential for human cells to survive? Probably worth clarifying this statement -- what genes are 100% essential for human cells to survive in culture...cause that's the only way you're gonna pull this one off.
They come up with a list of 2,000. Wow, right? Depending on how you look at that, it's either really big...or surprisingly small. I can't decide.
Are we surprised at all that a ton of them are proteins involved in protein glycosylation? Apparently that stuff is important.
Thursday, November 26, 2015
In celebration of my country's love of giant slow-cooked birds, I present you with this year's top hit on Google for "Turkey Proteomics!"
A EUPA meeting the third week of June in Istanbul?!?!? I LOVE the name of the meeting:
...Standardization and Interpretation of Proteomics.
Wednesday, November 25, 2015
I think it's about time we start some concerted spy missions. The genomics people have all sorts of cool tools. Let's send proteomics people to genomics meetings and then just steal all their cool ideas, which is probably less like stealing because they give away virtually all of their algorithms.
Case in point? This WGCNA thing. Notice that it's been around since 2008. So...around the time we were all getting our heads wrapped around target decoy searches, the genomics people were like "hey, let's do some unsupervised clustering of our thousands of quantitative changes and see what stands out in a hierarchical sense"
What's it do? It tries to find patterns in your complex data without you spending all day looking up genes that make sense. It just pulls out common traits and clusters them with your sample(s) and type(s).
How's it do it? Well, it's in R, so it does it in the ugliest way possible. Does it look like the screen of a Commodore 64? Yup! Then it's likely R, the world's most powerful and utilized statistical software package!
Besides that? I have no idea. Google Scholar informs me that the initial paper describing this algorithm has been cited nearly 1,000 times. So, somebody has liked this paper. Maybe I'd even toy with it before trying to check the Stats.
You can visit the WGCNA website here. It has links to papers and full tutorials.
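For a feel of the core idea, here is a minimal sketch in plain Python (not the WGCNA R package itself). It assumes the usual unsigned weighted network setup, where the connection strength between two genes or proteins is the absolute Pearson correlation of their quantitative profiles raised to a soft-thresholding power. The power of 6, the protein names and the numbers are all made up for illustration; the real package goes on to do hierarchical clustering and module detection on top of this.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def adjacency(profiles, beta=6):
    """Weighted adjacency |r|**beta for every pair of profiles.

    beta is the soft-thresholding power (6 here is an assumed value);
    raising to a power shrinks weak correlations toward zero while
    keeping strong ones near 1.
    """
    names = sorted(profiles)
    return {
        (a, b): abs(pearson(profiles[a], profiles[b])) ** beta
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical quantitative values across four samples
profiles = {
    "protA": [1.0, 2.0, 3.0, 4.0],
    "protB": [2.1, 3.9, 6.2, 8.0],   # tracks protA closely
    "protC": [4.0, 1.0, 3.5, 2.0],   # does its own thing
}

adj = adjacency(profiles)
for pair, weight in adj.items():
    print(pair, round(weight, 4))
```

Pairs that move together across samples end up with a weight close to 1, and everything else collapses toward 0, which is what lets the clustering step pull out groups without you looking up every gene by hand.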
Shoutout to Alexis for mentioning this algorithm to me a second time and showing me cool clustering data from it so I'd remember to share with people!
Tuesday, November 24, 2015
At first you might think I was initially a fan of this paper because looking up the protein and nucleotide in the title is a gold mine on Google images. I'm not saying you're wrong, but there is more to this cool paper even than the JPGs I'm probably going to insert while I'm writing this.
A few years ago people were pretty hyped about AMP. Remember this?
It looked like AMP was going to be a great big important PTM. Turns out, however, that it's an absolute pain in the foot(?) to study with LC-MS. A couple labs gave it a good hard try and got some nice results, but their techniques maybe looked a little too painful for us to want to replicate.
In this new study in MCP from
Monday, November 23, 2015
This link will take you to what I'm talking about. It's a press release of sorts about a new study in Nature Physics. Interestingly, from the time of the press release until now, it appears that they decided to change the name of the paper and it is available here (paywalled, sorry!)
The press release explains it better than I can, honestly, but they took that crazy super computer down in Tennessee, the Titan (19,000 CPUs AND 19,000 GPUs)
What they did was single cell protein modeling with all that processing power.
What did they conclude? That considering protein production speed and turnover via degradation, proteins may never actually achieve equilibrium within a cell. That you'd constantly be looking at a moving and changing population of even one single gene product. So if you were able to pull out just two proteoforms of a given gene product at any point in time, they probably wouldn't be the same. One might be just formed and the other might be partially degraded...or reacted, or modified.
Is it real? Who knows? It's fascinating, though! I guess it probably doesn't affect what we do much. I mean, we're looking at the averages of signals from thousands of copies of proteins from thousands or millions of cells at once, so maybe all of this averages out into just noise, but I always love things that highlight how little we still know about biology.
Sunday, November 22, 2015
I've been asked this question a couple of times. And maybe now I know the answer.
At first run, it certainly looks like PD 2.1 installs just fine with nothing special whatsoever on Windows 10.
Now...that being said, this isn't officially supported by the vendor. And just because the couple HeLa runs I just did seem to go just fine doesn't mean that every feature will be ready...but, again, I haven't had a problem yet!
Thursday, November 19, 2015
Like an awful lot of people right now, I'm kind of obsessed with the magical thing called "proteogenomics". Which...honestly...seems a little bit like magic. Getting good quality transcriptional data and filtering it so that you can see new mutations....that you can trust...AND THEN using this information to find new matches to your MS/MS data....that you can trust....
A few people have totally pulled it off....and I have the papers stacked up in front of my PC all marked up and highlighted and...well...maybe I'm dumb...but I still don't know how they did it.
For a good starting point, check out this nice review from Sam Faulkner et al. While I'm on the topic of magic, you might be surprised to see this is an article from Springer that isn't behind a paywall!
Oh yeah! And here is the link!
Wednesday, November 18, 2015
Great...another protein post-translational modification...cause my search space isn't nutso enough already....
Oh! Hi, there! Ever looked at this one? It's called 2SC...and it turns out that it has a role in human diseases. It's even detectable by other molecular techniques and has been implicated in diabetes. A brand new paper from Gianluca Miglia et al. takes a computational approach and analyzes a ton of proteins. Turns out that it mostly sticks to cysteines. Which makes me wonder if our typical techniques of cysteine reduction/alkylation/assumption of 100% cysteine alkylation mean that we probably don't see it in our RAW data and couldn't see it in our processed data even if it was there.
The abstract suggests that this modification is prevalent enough that they have seen multiple 2SC modifications per individual protein. I can't get much further because I'm stuck behind a paywall at this Panera this morning. An Open Access paper on this modification in people is available here, though it's worth noting that they appear to have done all their work with Western blots. As a side note, it appears there are a lot of journals out there, and it appears that people are running out of names for them.
Tuesday, November 17, 2015
"Hey guys, here's this awesome new algorithm for label free analysis!!!!"
Want a GREAT dataset to test it out on?
EDIT 8/22/16: Since I've been sending this paper link and/or the PXD number to everyone I know we found two things: 1) I miscopied the PRIDE number (fixed) and 2) I overlooked the fact that 3 authors contributed equally to this. This incredibly useful resource should correctly be called:
Claire Ramus, Agnes Hovasse, Marlene Marcellin, Anne-Marie Hesse, et al.,
Check out this cool new paper from Claire Ramus et al.,! In this, they developed a great standard that all of us can download and use for free. They also use it to test a bunch of the current algorithms people are using.
The sample is the Sigma UPS1 48 protein mix (all equimolar proteins) spiked into a yeast digest background at different concentrations. The spike-ins range from 50 amol to 12,500 amol (yeah...there is a more efficient way of writing this, but I'm not great at metrics...thanks American public schools! I love sounding like an idiot to everyone else in the world...)
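For the record, the more efficient way of writing it is just a prefix swap, since 1 fmol is 1,000 amol. A tiny Python sketch using only the two endpoints quoted above (the function name is made up for illustration):

```python
AMOL_PER_FMOL = 1000  # SI prefixes: 1 femtomole = 1,000 attomoles

def amol_to_fmol(amol):
    """Convert an amount in attomoles to femtomoles."""
    return amol / AMOL_PER_FMOL

low_amol, high_amol = 50, 12500  # the two spike-in endpoints from the post

print(amol_to_fmol(low_amol), "fmol")    # 0.05 fmol
print(amol_to_fmol(high_amol), "fmol")   # 12.5 fmol
print(high_amol / low_amol, "fold range")  # 250.0 fold range
```

So the spike-ins run from 0.05 fmol to 12.5 fmol, a 250-fold range.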
You can download the dataset...wow...my internet is slow right now...well, I guess you can download the dataset...from PRIDE. It will be PXD001819 after it is published. The authors were kind enough to provide a way to download the dataset prior to publication (wow, right!??!) that you can find in the abstract.
Got a couple files. This study was on an Orbitrap Velos running High-Low (FT for MS1 and CID ion trap MS/MS scans). It looks like 60k resolution at the MS1 (around 400 m/z)
Monday, November 16, 2015
It's out of something called the Yates lab. Seems familiar, somehow...
The paper is called: "Off-Line Multidimensional LC and Auto Sampling Results in Sample Loss in LC/LC-MS/MS" and you can give it more time (appears open access! here)
Sunday, November 15, 2015
Saturday, November 14, 2015
Toxoplasmosis is terrifying. Okay, a lot of diseases are....or I'm a coward. Not sure which. But stories of this disease's subtle but powerful effects have been floating around the pop science scene for years. Malcolm Gladwell even mentions it in one of his great books.
Check out this article, titled: "Parasite makes men dumb, women sexy"
The problem with this disease is how highly evolved it is. It's right up there with Plasmodium in terms of complexity. And beating it...after millions of years of successfully existing in the mammalian population? That's gonna be real tough.
Maybe someone ought to take all the data we have on this thing from every Omics technology and make a great big awesome tool for digging through it!
TAAADAAAA!!! ToxoMine is exactly this tool. Interested in complicated diseases? This is a nice example of how we can align all the power these big -omics tools can give us!
Friday, November 13, 2015
What powers are those?
How 'bout the power to Export your data to Excel format?
What about the ability to filter at any level and apply it to other levels? Say I only want to see peptides with cysteines: I filter at the peptide level and then toggle the tab that says "apply to proteins". BOOM!! You have only proteins with peptides with cysteines!
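If you want to picture what that toggle is doing, here is a rough Python sketch of the logic. This is not PD's internal code or data model, just the idea: filter at the peptide level, then keep only the proteins that still have a surviving peptide. The peptide sequences and protein names are invented for illustration.

```python
# Hypothetical peptide table: each peptide points back to its parent protein
peptides = [
    {"sequence": "ACDEFGHIK", "protein": "P1"},
    {"sequence": "LMNPQR",    "protein": "P2"},
    {"sequence": "SCTVWYK",   "protein": "P3"},
    {"sequence": "GGGK",      "protein": "P1"},
]

def peptides_with(residue, peptide_rows):
    """Peptide-level filter: keep peptides containing the given residue."""
    return [p for p in peptide_rows if residue in p["sequence"]]

def apply_to_proteins(filtered_peptides):
    """Carry the filter up a level: proteins with at least one surviving peptide."""
    return sorted({p["protein"] for p in filtered_peptides})

cys_peptides = peptides_with("C", peptides)
cys_proteins = apply_to_proteins(cys_peptides)

print([p["sequence"] for p in cys_peptides])  # ['ACDEFGHIK', 'SCTVWYK']
print(cys_proteins)                           # ['P1', 'P3']
```

P2 drops out because its only peptide has no cysteine, while P1 stays because one of its two peptides does.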
What about the ability to generate your own TMT ratios like you could in PD 1.4? Just want to see the ratio between 126N and 129C? Manually generate that ratio!
Have a super computer like the 32 core Proteome Destroyer I hear is coming out, or have PD installed on a server with hundreds of cores? You can go right into the Administration tab and tell PD how many workflows PD can handle at once. Why run only 2 workflows when your monster PC can handle 8 or 10 or 100 files at once?
The biggest improvements? Advanced TMT quan. Instantly apply corrections to your reporter ion data according to the signal-to-noise ratio of the individual peptide measurements. For example, if you have one peptide that had a signal-to-noise (S/N) of 2000, meaning roughly 99.95% of the signal you're seeing there is from that peptide and not background, this value will have a higher weight in the total protein quan than a peptide that only had 1e3 counts and a S/N of 3. Better data is weighted to a higher level than lower quality data!
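To see why that matters, here is a minimal Python sketch. It assumes the simplest possible scheme, a weighted average where each peptide's weight is just its S/N. I don't know the exact formula PD uses under the hood, so treat this as an illustration of the idea only; the two peptide ratios are invented.

```python
def sn_weighted_ratio(measurements):
    """Average peptide ratios, weighting each by its signal-to-noise (assumed scheme)."""
    total_sn = sum(m["sn"] for m in measurements)
    return sum(m["ratio"] * m["sn"] for m in measurements) / total_sn

def plain_mean_ratio(measurements):
    """Unweighted average, for comparison."""
    return sum(m["ratio"] for m in measurements) / len(measurements)

# Two hypothetical peptides from the same protein
peptide_measurements = [
    {"ratio": 2.0, "sn": 2000.0},  # clean, high S/N measurement
    {"ratio": 0.5, "sn": 3.0},     # noisy, barely above background
]

weighted = sn_weighted_ratio(peptide_measurements)
unweighted = plain_mean_ratio(peptide_measurements)

print(round(weighted, 3), unweighted)
```

With a plain mean, the noisy peptide drags the protein ratio down to 1.25. With S/N weighting, the protein ratio stays just under 2.0 because the clean measurement dominates.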
If your Maintenance is current you can download the upgrade from the Thermo Omics Portal and upgrade right now! Don't have current maintenance? Call your sales rep now!
Thursday, November 12, 2015
This is an interesting and thorough breakdown of the Proteomics Standards Initiative, one of many groups that is trying to demystify this awesome field of ours to outsiders (Open Access).
Wednesday, November 11, 2015
Okay, that's a pretty cool icon, right?
What do Proteome Discoverer and Compound Discoverer have in common (besides the obvious overuse of the letter "r")? A ton of stuff!
Compound Discoverer looks just like the Proteome Discoverer interface. The goal is to make it easier to move from one data processing workflow to another without having to learn a whole new set of interfaces!
See? It looks just like it. You set up studies the same way, you import data files the same way, and then you just have to figure out what all those funny nodes do!
I'm about to put metabolite ID and metabolism via mass spec on my resume. Even better, when they get around to launching Compound Discoverer 2.0, then this interface can easily do straight up metabolomics.
By the way. If you have used this program (or bought it and would like to learn how to use it) this is the place to find info: WWW.myCompoundDiscoverer.com
Honestly, I wrote this whole post so that the next time I Google: MyCompoundDiscoverer, it'll take me here and then I can direct link to the page. There are instructional videos and overviews!
Tuesday, November 10, 2015
Apparently, I've downloaded several versions of the PRIDE inspector, because I have many .ZIP files in my Downloads folder. For some reason, though, I don't guess I've ever Unblocked the Zip file, unzipped it, and then ran it...or I forgot that I did...
A brand new paper in MCP reminded me of it, and I'm assuming that this means that the newest version I just downloaded is the best one anyway (available here!)
Part of the title of the new paper, btw, is:...moving toward a universal visualization tool for proteomics data standard formats...cool, right?
Some people save their data for archiving in some format or the other like: mzml, mxml, mzxml and xzmlzxmllxmzmlllllllzzMxxmz (just to name a few). These have all been very valiant attempts (except that last one. that guy was just being a jerk) to keep our data in a format that would standardize things, so we have all the important data in one place and we can compare instrument-to-instrument.
Problem is, there have been several of these. And it can be daunting. Say I'm dosing these cells with a super cool drug and I want to do proteomics on it, and I see that another lab previously dosed mice with that drug. It might not be easy at all to compare that data.
See where I'm heading?
The PRIDE Inspector reads basically all of these data formats (except that last one. the inventor has been shunned by the field this morning). You get a nice graphical output of the spectra and if the data is processed you can check the results at every level. Perhaps even as nice, if the study is saved in the PRIDE database you can link directly to the dataset. Just want to rapidly meta-analyze everything in the PRIDE repository? This might be your window!
Now, notice the fact that the title includes "moving toward a universal". I'm sure that means there is more to be done here. But it looks like a
Monday, November 9, 2015
The Scientist has a really good article this edition on top down proteomics for protein complexes. If you don't get the free magazine delivered to your house, you can read the article here.
Highlights? Overviews of a bunch of different researchers' current work...oh...and the suggestion that Northwestern has hacked a QE HF to have extended mass range along the lines of the Exactive EMR!?!?! and it has the capability to do pseudo-MS3s!!!!!
Sunday, November 8, 2015
Image Source: 4designersart/Fotolia.com (lifted from this article)
Epidemiology has been one of those big things for a while. In my mind it seems like it kinda blew up the same time all this -omics stuff did. Schools have been putting lots of money into both the last few years. On the outside, it seems like they're opposite things. They are looking at trends in human beings to find disease patterns, while we're looking extremely deep into the disease, or people with the disease. However, now that we can get deep proteomic coverage in single runs, does this open us up to working together?
More and more, it looks like the answer is a resounding yes! For an overview of the topic you should check out this review: Epidemiologic Design and Analysis for Proteomic Studies: A Primer on -Omic Technologies. It is open access and tries to bridge the gap.
I think it is very nicely written. While it is geared more toward the epidemiologists, in telling them what we do, it highlights some studies where the two were combined well. If I wanted to do a big study of some disease popping around in a human population, I wouldn't know how to sample people in a statistically valid sense. "Hey group A, you have the disease, right? You're TMT channels 126-129!"
Their job is to assess the factors that are important and design the experiment in a significant way. And then pull the right data out of the final protein list to show what is important. Turns out half the people here also suffer from a second disease? That's a nice data point to have so we don't draw a spurious conclusion, right? And there is something useful to be gained from that knowledge post-data processing? Even better!
This way we can focus on getting good quantitative protein IDs. And...if someone wants to explain what we do in terminology geared toward my collaborators' specific fields? Well! then I can send them this open source PDF to clear up some misconceptions before we sit down at the table and start designing this killer study!
Saturday, November 7, 2015
I just stole this right off a Twitter feed. Left the Tweet intact, even! (Thanks, Julian!)
Okay, this paper is obviously awesome. It goes after some biological question and it comes up with some great insight. Unfortunately, it contains a lot of words I don't know and on this lovely Saturday afternoon I don't have the motivation to do the research necessary for me to fully appreciate what they are going after.
Why should you check out this paper? Cause it's pure spectralporn! I can say that, right? They say "foodporn" on network TV all the time! I mean, it's like foodporn for LC-MS/MS spectra!
Seriously, though, check this figure out (click to expand)! This is some nice looking data! Benjamin Parker et al., out of the University of Sydney know what the heck they are doing.
Thursday, November 5, 2015
Okay. (Ben slowly gathers thoughts...)...
Now I'm going to tell you about a paper that is so cool that even though I have no idea how they did it, I still think it's worth sharing. I'm hoping I'll figure it out as I write this.
First of all, it's Open Access (yay!) and available here! Second of all, it's cool enough that 2 people sent it to me since it came out and this morning I thought I'd get it on the second read through.
What I do get: The Cancer Genome Atlas is not a leather bound book that sits in a room that smells of rich mahogany....
...instead, it is a huge cohort of clinical cancer samples that have been or are in the process of being studied with a ton of different genomics techniques. The homepage of the project is here.
Browsing through the papers that have been done on this Atlas (to construct this Atlas? that makes more sense...) shows that there is a lot of bioinformatics firepower at work here.
So...in this study this group took these samples and did an interesting protein array analysis of them. This is where I get foggy. The array they used is called an RPPA. This is a Reverse Phase Protein Lysate Microarray (wikipedia link) (and if you are a JoVE user, or care enough to register for a free trial, here is a video that shows how an RPPA works.)
Okay. So they are using fancy antibody arrays to show the presence/absence/abundance of proteins. Got it. What do the arrays detect? Well, they went for a whopping 181 antibody probes! Wait? What? Just 181 targets? And the targets were selected based on what we know of current cancer pathways and stuff. My assumption is that the arrays are very fast and/or very cheap...or we would have done this with a mass spec and looked at hundreds of targets with PRM (people are routinely doing 700+ per assay these days on Q Exactives) or more with SRM, right?
But this is where it gets impressive -- monitoring all 181 targets on these arrays they looked at over 3,000 different samples...which is a lot... And these samples have been previously clustered by neat things like disease type and primary driving mutation. So, you can see how different genes interact with hundreds of samples of the same disease that follow the same -- or different cancer driving pathways.
Take home point for me is: For you guys out there generating insane amounts of clinical data, we need to steal more genomics tools! Cause these guys seem (at least...to an outsider...) to be able to do stuff with the data!
Wednesday, November 4, 2015
Hey! I meant to put this up a bit ago. This was one of my favorite events all last year. My attendance this year isn't all that likely....though not out of the question yet! I still have vacation days. We'll see.
You can register here. Warning, it fills up fast! Oh, and this is what Bremen looks like in December...
...yeah, it totally sucks...
Tuesday, November 3, 2015
Alright! This painfully thought-out and beautifully executed experiment yielded a big list of differentially regulated proteins! Woooo! So...now what....?
This review from Karimpour-Fard et al., is a great place to start. This concise little piece in Human Genomics walks you through some tools and approaches that can help you figure out:
1) What is significant in your list (and all the stuff that isn't)
2) What those words mean that the stats and bioinformatics people are always using (ANOVA?)
3) How to extract some biologically meaningful data out of all that stuff.
A nice short review (and Open Access!) that might help you make that next step forward!
Shoutout to my aunt Beth who took this cool picture downstream off a bridge near home!
Monday, November 2, 2015
Mapping protein-protein interactions in complexes is a tough job. We can go one of two ways with it:
1) The relative way: When I pull down proteins under condition A and under condition B, I see relative upregulation of this protein, so it must be associated
2) The crosslinking way: Under these conditions I throw in a crosslinking compound, then pull everything down, digest and identify the crosslinked peptides.
Both are hard, but the relative way is a good bit simpler from the data processing perspective. Analysis of crosslinked MS/MS spectra? That's hard. There are some nice approaches like XComb and StavroX/MeroX. SIM-XL is a new one. If you wonder why you might want to try a different piece of software, look at this result output (click to Zoom!)
Um....how frickin' cool is that?!?!?
It's a GUI driven interface with all sorts of cool graphical maps to help make sense of your crosslinked data. It'll accept all sorts of MS/MS converted data files (and if you've got the MSFileReader installed, it'll directly read Thermo .RAW files!) and it's even possible to map your data against spatial constraint data obtained from 3D protein structures to see if what you are seeing is possible at the biological level!
You can check out the SIM-XL website here.
And you can find the original paper by Lima et al., here.