Thursday, July 24, 2014

Is the human proteome shrinking?

Is the human proteome (by that, I mean the number of proteins we can express) shrinking the way we've shrunk the majestic mastiff into the super majestic pug?

Let's look at the evidence:

The Scientist this month (thanks Nastratin) reported the results of 7 studies that said we can express even fewer proteins than we thought we could.  For a bit of history, the human genome project initially reported about 30k coding regions.  Subsequent studies have found that a lot of those regions contain junk (by that, I probably don't mean mysterious epigenetic controls...or do I...?)

How did these studies determine that these other regions are not coding?  By looking at in-depth proteomics studies, of course!  One way of doing this would be to say "hey, we've run 7 billion proteomics samples on tissue to this point in time and NO ONE has ever seen a peptide from this protein."  That is one way, right, but that doesn't rule out the possibility that this thing is in plasma and has a copy number of 10 proteins per mg of plasma, right?

This group took the evolutionary approach.  Genes that are highly conserved among many species produce proteins that are essential to life.  The more essential, the more species carry them, and the more conserved they tend to be.  What if we then just go to the gene sequence and compare that sequence against monkeys and dogs and a bunch of other things?  If no one has seen a peptide from this protein & this gene is not expressed by our closest relatives (or it is highly modified) & it has a structure that looks very unlikely to be a protein THEN we can probably safely say that isn't a sequence of DNA for making protein (it's probably for mysterious epigenetic weirdness, 'cause it's unlikely it's just taking up space, right?)
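
If it helps to see that three-part test written down, here is a minimal Python sketch.  To be clear, this is just my paraphrase of the logic above, not the authors' actual pipeline: the function name and the three yes/no fields are made up for illustration, and the real studies obviously use far richer evidence than three booleans.

```python
def probably_noncoding(gene):
    """Flag a gene as probably non-coding only when ALL three lines of evidence agree.

    The dictionary keys are hypothetical labels, not fields from any real database.
    """
    return (
        not gene["peptide_observed"]            # no one has ever seen a peptide from it
        and not gene["conserved_in_relatives"]  # not expressed (or highly modified) in close relatives
        and not gene["protein_like_structure"]  # the sequence looks unlikely to fold into a protein
    )


# Two invented examples: one fails every test, one has a peptide on record.
orphan = {"peptide_observed": False, "conserved_in_relatives": False, "protein_like_structure": False}
seen_once = {"peptide_observed": True, "conserved_in_relatives": False, "protein_like_structure": False}

print(probably_noncoding(orphan))
print(probably_noncoding(seen_once))
```

The point of requiring all three at once is that any single line of evidence (like "never seen a peptide") is too weak on its own, which is the low-copy-number plasma problem again.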

The conclusion is that human beings can probably express about 19,000 proteins, which probably means there are only 4 billion proteoforms....

You can read the original article (open access) here.

You can read the article in the Scientist here.

Wednesday, July 23, 2014

Search GUI!

I can't hide my love of the friendly and powerful DeNovoGUI from you guys.  What if that easy interface also had a cousin with 4 integrated search engines and the display visualization of the cool PeptideShaker?

Then you'd have SearchGUI, yet another thing that has been around forever that is completely new to me.  Sure, these open source tools have weaknesses, but with their powers combined, you are looking at a very nice free (and easy to use!) tool.

You can get SearchGUI here.

The truth about mad scientists

I tried tracking down the original source for this:  it seems to be the redditor Dualaction2.

Tuesday, July 22, 2014

Proteogenomic characterization of cancer!

I have been waiting for this stuff to start showing up in bulk!

Most biology labs now have tons of "next gen" genomics data on their cells of interest.  Increasingly, these labs are all acquiring proteomics data.  The obvious next step?  Making these data sets work together.  Yes, some stuff has trickled out here and there, including some cool strategies for database reduction.  But I think we've seen the tip of the iceberg so far.

In this week's Nature, we start to see what is coming!  In a huge study that features members of the Tabb lab and Reid Townsend and many others, the power of this approach is really explored.  Proteomic data from colon and rectal cancer was compared to "next gen" sequencing data, and tons of new information was discovered.

The punchline?  This NATURE paper required no mass spectrometer.  This data was already deposited in The Cancer Genome Atlas.  This was essentially a meta-analysis but comparing these datasets was powerful enough that it slammed into a journal this big.  Again, this is the kind of stuff that is coming....

Monday, July 21, 2014

More on protein quan trumping mRNA quan

More (now old...2013...where have I been...) evidence that protein quan is superior to RNA quan.  In this cool paper in Science from November, researchers looked at the expression levels of RNA and protein in:  humans, chimpanzees and rhesus (monkeys...I don't care what the correct term is).

The conclusion?  While the mRNA predicted tons of variation in expression levels, none was seen at the protein level.  And this makes sense, right?  Why would our close neighbors have completely different levels of the proteins that we have?  The conclusion in this awesome paper (like the one from Saturday) is that the real selective pressure on regulation is at the protein level, where we can exercise a much finer level of control.  Thanks to Michael Ford for tipping me off to this great study.  And kudos to these researchers for such an elegant experimental design, because it's easier for us to forget our primate cousins since we don't have to look at this every day:

Sunday, July 20, 2014

Heavy analysis of the human proteome drafts

I'm certainly not the only person who has jumped on the new resources provided by the human proteome drafts and checked them out.  In this brand new paper in JPR, a group out of Madrid takes a look at some of their favorite proteins in the human proteome drafts and comes back with an interesting analysis.  (Abstract here.)

I love the fact that, in this paper, they did the same experiment Alexis and I did the day the drafts came out.  We chose proteins that we knew would lead to cancer if they were over or under expressed and analyzed those.  This group took proteins from nasal tissue (olfactory receptor proteins) and looked for those in the various tissues.

At first glance, the image on the abstract looks pretty damning:

These are olfactory (smelling) receptors.  What are they doing being expressed in colon cells and platelets?!?!  (It is worth noting that the image above is from the data from the Pandey lab.)

The authors of this analysis indicate, even in the abstract, that the "experimental data from these studies should be used with caution."  And I agree.  There is inherent error in studies this big; hell, a 1% false discovery rate on 100 million observations is 1 million observations that are false, right?  But...the experimental data from every study should be used with caution.  And we all know that (by "we" I mean you proteomics experts who read this.)  I am glad that this caution is stated, though, for the people outside our field who have discovered this resource through mainstream news outlets.
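
For anyone who wants to check that back-of-the-envelope number, here is a tiny Python sketch.  The function name is my own, and it treats the false discovery rate as a flat fraction of all observations, which is the same simplification the sentence above makes.

```python
def expected_false_hits(n_observations, fdr):
    """Expected number of false observations at a given false discovery rate."""
    return round(n_observations * fdr)


# 1% FDR on 100 million observations
print(expected_false_hits(100_000_000, 0.01))  # 1000000
```

Plug in the size of your own dataset and the number gets sobering in a hurry.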

That being said, I have some problems with this experimental design.  There are 3 big assumptions being made here:
1) The annotation of these proteins is 100% correct
2) These proteins have 1 function
3) These proteins only function in one tissue

Number 1 is easy.  Annotations suck.  The system for annotation sucks.  The first person to identify a protein in the first tissue gets to name it, right?  So there are tons and tons of proteins named in tissues that are heavily studied.

Number 2 is relatively easy, as well.  Making new proteins takes a ton of energy.  Evolutionarily (that's not a word? whatever...) there will be a lot of pressure for proteins to function in more than one way, in more than one context.  (Side note, one of my graduate committee members, Jiann-Shin Chen, proved the first dual substrate in the 1970s...sorry, couldn't find the link, I'll add it later if you're interested).  Considering the sophistication of eukaryote proteins, it is naive to think that if a protein is annotated as "Butt_itching_protein_1" that it would ONLY be utilized in the itchy butt response pathway.

Number 3 is an impressive coincidence.  Like millions of Americans, I subscribe to "I Fucking Love Science" and get Elise's feed of cool articles.  From this feed I know that:  zebrafish embryos highly express functional olfactory response proteins and olfactory receptors are highly active in human skin.  Heck, I've looked through more than a few high quality proteomics assays and seen "olfactory response proteins" in bunches of different tissues.  So...I think this was a poor choice for analysis.

TL;DR:  One, please interpret the results of the human proteome draft maps with caution.  They are draft maps.  Two, consider proteins in an evolutionary context before using those proteins to generate excessive criticism of datasets that a ton of work went into.

Thanks, Karl, for suggesting something to read over coffee this morning!

Saturday, July 19, 2014

Peptide quan beats RNA quan?!?!?!

This is old news, I guess.  At least everyone at the bar last night knew about this except for me.  In my defense, this field publishes a ton of stuff.  As most of y'all know, my background is biology, microbiology to be precise. While I never ever did RT-PCR, I am (was?) under the impression that nothing would give you verifiable insight into the amount of a gene product in a cell the way that technique does.

What if it isn't as precise as peptide quan?  What if it isn't even close?

Well, it seems like an accepted fact now that it is the case.  For a breakdown, take a look at this paper in Nature Reviews Genetics.

Protein abundances are more conserved among species than RNA abundances.  There is plenty of evidence that living systems have protein abundance levels that they are happy with and a whole lot of that regulation is at the post-translational level.

mRNA transcript abundances only partially correlate with protein abundances.  Right.  I guess it seems obvious, I think of the protein levels in a cell as this dynamic mixture, but I assumed that we could tell how much protein is present -- in a very linear way -- from the amount of RNA present...and all the evidence says that we can't.  Post-translational regulation is very very important to the amount of protein that is going to be present.  Again, probably obvious, but I'll have to change my mental framework around a little.

Thursday, July 17, 2014

CorConneX -- tools for zero dead volumes

Something cool I learned about this week is the CorConneX.  I have to admit I don't have a concrete understanding of how this works, but it was highly recommended by someone who knows what she's doing!
You can read more about it here.  The gist, however, is that it automatically (robotically?) makes zero dead volume junctions with silica columns and traps and lines and as many as you want.  This would be especially useful for systems that aren't, for whatever reason, compatible with nanoViper fittings.

Wednesday, July 16, 2014

Cold Spring Harbor proteomics course!

This week I was lucky enough to get to help out at the Cold Spring Harbor Proteomics Course.  If you aren't familiar, it is probably the most intense bootcamp in proteomics in the world.

Want to learn proteomics?  Sign up for next year.  The amount of stuff the instructors run through is just boggling.  If you can think of a proteomics experiment, chances are they at least talk about it at this course --heck, they probably do it.  Dave Muddiman showed up last night and taught a class on imaging mass spec that ran long after dark.

My contribution was to set up a shiny new Q Exactive and to train people on it and to provide instrument support.  I've heard from multiple people that it is pretty tough to find and hire skilled proteomics experts these days.  If you are in this quandary next year, you might want to consider hiring someone who is smart and motivated to work hard and send him/her up there.

Tuesday, July 8, 2014

SIEVE or PD for label free quan

In the context of obtaining peptide IDs, nothing can aid you the way good chromatography does.  More and more, I'm seeing that this is paramount with today's super fast instruments.

What about for assigning quantitative data?  How important is my chromatographic alignment?  SIEVE is a program that puts chromatography up front -- data in tight m/z windows within small retention time windows are chosen as "frames" and those are what you do your quan on.

Proteome Discoverer has a label free quan node - the "Precursor Ion Area Detector".  When used in conjunction with an event detector, you can pull out ions from your files within a narrow ppm range (maximum is 4ppm! I use 2ppm) and compare those peptides.  Retention time is never considered in this current iteration.
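
If you want a feel for how narrow a 2ppm window really is, here is a quick Python sketch.  This is only the standard ppm arithmetic, not PD's internal code, and the function names and the example m/z of 500 are mine.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a +/- ppm tolerance around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed_mz, target_mz, ppm):
    """True if observed_mz falls within +/- ppm of target_mz."""
    return abs(observed_mz - target_mz) / target_mz * 1e6 <= ppm


# A 2ppm window around m/z 500 is only about +/- 0.001 wide
low, high = ppm_window(500.0, 2.0)
print(low, high)
print(within_ppm(500.0005, 500.0, 2.0))  # about 1 ppm off
print(within_ppm(500.002, 500.0, 2.0))   # about 4 ppm off
```

Note that nothing in this check knows anything about retention time, so two different ions at the same m/z eluting 40 minutes apart would both pass.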

This is how I've always done label free quan.  Mostly because my previous employers did not purchase SIEVE.  I'd manually export my label free quan data and use a short script in DigDB to remove matching peptides that did not match in retention time (or pull out the PSMs with the list of retention times, subtract them and throw out anything on the outside of my retention time window, then recompile the report).
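
For the curious, the retention time cleanup boils down to something like the Python sketch below.  This is not the DigDB script itself, just the same subtract-and-throw-out idea.  The field names, the example peptides and the 2 minute window are all made up for illustration, so pick a window that fits your own chromatography.

```python
def filter_by_retention_time(matches, max_rt_diff_min=2.0):
    """Keep only peptide matches whose retention times agree between two runs.

    Each match is a dict with hypothetical keys "peptide", "rt_run_a" and
    "rt_run_b" (retention times in minutes).
    """
    kept = []
    for match in matches:
        rt_difference = abs(match["rt_run_a"] - match["rt_run_b"])
        if rt_difference <= max_rt_diff_min:
            kept.append(match)
    return kept


# Invented example: the second "match" elutes 17 minutes apart, so it gets tossed
matches = [
    {"peptide": "PEPTIDEK", "rt_run_a": 35.2, "rt_run_b": 35.9},
    {"peptide": "SAMPLER", "rt_run_a": 41.0, "rt_run_b": 58.3},
]

for match in filter_by_retention_time(matches):
    print(match["peptide"])
```

A fixed window like this assumes your runs are already reasonably well aligned; if your chromatography drifts a lot between runs, a flat cutoff will throw out real matches.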

Yes, it is a lot of work, and essentially a work-around.  But if you can't afford another software package, you have a backup plan.  You can follow a link here to a video I made regarding setting up PD for label free quan.

Wednesday, July 2, 2014

Sample collection. Level? Super robot!

This week I've been working up in Cape Cod.  Tough life, I know....

The proteomics facility at Woods Hole studies biomarkers found throughout the ocean at different depths.  A complication of the research here is that sample collection requires dragging long (miles long!) cables below the ocean and pulling them up.  This process takes a long time.

Fortunately, WHOI has an amazing capacity for building submersible robots!

Enter CLIO, a robotic submersible for capturing proteomics samples (and genomics and RNA and metabolomics and other boring stuff) from the oceans of the world.

CLIO's job will be to drop down to incredible depths, filter sea water, store it and return it for proteomic analysis.

You can read more about CLIO here.

Tuesday, July 1, 2014

Captive spray is back for mass specs!

Almost a week with no entries!?!?!  I've been busy...  This one is definitely worth a few minutes of writing.

Captive spray is something that has a lot of fans out there.  Recent big business dealings made it go away, however, and now you can only purchase it for NMRs or for instruments made by an NMR company or something.

I recently found out that a new company is now producing these sources again for mass spectrometers.  If you are interested, check out Ultra-FAST at this link.  If English is your language, there is a small button in the right corner that will translate the page.