Sunday, June 25, 2017

Proteomic analysis of human cells grown in bacterial cellulose

A LOT of biological studies of human cells occur with cells grown in culture. And we can thank cell culture techniques for a huge amount of what we know of how cells work, but it has been no secret that there are some limitations to cell culture techniques. This is a nice open review on limitations -- with a strong title.

Over the last 15 years or so there have been numerous ideas put forward to bridge the gap between cells growing in monolayers in cell culture and cells as they actually exist in human tissues and organs. Growing cells in 3 dimensions is something that has been discussed for years, but -- so far hasn't yielded much in the way of new understandings in biology. Please don't take this as a criticism, I know two great people working hard on this stuff right now who will have stuff out soon, but these complex bioreactors seem to come with their own bevy of complications -- which better technology may soon fix. In the meantime, check this out! 

This is the analysis of a bacterial cellulose gel material that can be used to support growing human cells....and the title gives away some of it...but this material seems to allow cells to develop in a more native type state!

The preparation of the cells to grow within the material seems reasonably straight-forward. (A quick Google search indicates that the material they used may be a commercialized product. Please do not interpret this as an endorsement and see my rapidly expanding list of blog disclaimers if you have questions).

This team goes to great lengths to compare classically grown cells with the ones grown in this bacterial cellulose material, employing immunocytochemistry and global RNA analysis. They also extract peptides and use isobaric tagging technology and do relative quantification with a quadrupole high field Kingdon-style trap system.

Of particular interest to this methods nerd, although a 10-plex isobaric tagging reagent is employed, the instrument is run at 35,000 resolution, which the authors state is approximately 50,000 resolution in the reporter ion region. This should allow almost baseline separation of all the reporter ions and may compromise just a little on full separation in favor of maximum speed.
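For the methods nerds, here is a rough sketch of why a 35,000 setting ends up higher down in the reporter ion region. It assumes the resolution setting is specified at m/z 200 and that resolution on this type of trap scales with 1/sqrt(m/z); both are my assumptions, not something taken from the paper. The function names are made up for illustration. Under those assumptions the number comes out in the mid-40,000s, the same ballpark as the approximate figure the authors quote, and the 127N/127C pair sits a bit more than two peak widths apart.

```python
import math

# Assumption: the instrument's resolution setting is defined at m/z 200.
REFERENCE_MZ = 200.0

# Monoisotopic m/z of two neighbouring 10-plex reporter ions (N vs C isotopologue).
MZ_127N = 127.124761
MZ_127C = 127.131081


def resolution_at(mz, set_resolution, reference_mz=REFERENCE_MZ):
    """Scale a resolution setting to another m/z, assuming R ~ 1/sqrt(m/z)."""
    return set_resolution * math.sqrt(reference_mz / mz)


def peak_fwhm(mz, resolution):
    """Full width at half maximum (in m/z units) of a peak at the given resolution."""
    return mz / resolution


reporter_resolution = resolution_at(MZ_127N, 35_000)
width = peak_fwhm(MZ_127N, reporter_resolution)
spacing = MZ_127C - MZ_127N

print(f"Estimated resolution near m/z 127: {reporter_resolution:,.0f}")
print(f"Peak width (FWHM): {width:.5f}")
print(f"Reporter spacing / FWHM: {spacing / width:.2f}")
```

A spacing of roughly two peak widths is what "almost baseline" looks like in practice: good enough for quantification, without paying for a longer transient.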

The data files and MaxQuant processed files are available at ProteomeXchange here (after the fully edited version of the paper is released; PXD003975). I strongly suggest checking out the supplemental info for the paper. They did a LOT of work here (...okay...maybe I just really like to look at expert level ICC and pretty STRING networks...)

What did all these quantified transcripts and proteins reveal? That this technique might be an easy way to obtain information from cells that more closely reflects the way the cells exist in the human body than growing those cells in plastic plates in 2 dimensions -- and that sounds like a step in the right direction!

Saturday, June 24, 2017

Activated Ion Electron Transfer Dissociation for FAST comprehensive top-down!

This one had been in my queue for a while and I'm finally getting caught up on things! This is the paper, and it just came out in June's JPR.

First off, this is an in-house developed fragmentation method that the Coon lab has. They published the setup of this instrument in ACS earlier this year (link here!)

ETD fragmentation is used in conjunction with infrared photoactivation. If I talk any more about how it is done, I'll just embarrass myself, so that's why both links are here at the front of this post.

What this biologist does get out of this paper is an AMAZING degree of sequence coverage. For the proteins examined (up to just around 20kD), they demonstrate sequence coverage you'd only get if you combined every fragmentation method commercially available, which would be...well...

...slow....probably taking several runs unless you had really optimized the timing on each peak with an instrument capable of many fragmentation types.

AI-ETD is NOT slow. They are getting this amazing level of intact protein sequence coverage with 10-35ms activation time!

Friday, June 23, 2017

A lot of noncoding RNA -- is really noncoding!

Transcriptomics is still booming -- there is so much awesome data being generated from all those cool instruments (I recently heard one of the newest ones can generate 3000GB of transcriptomic data per sample!)

If you've been summarily browsing the biology literature, you've undoubtedly seen reclassification of some "non coding" genes as "coding", via these technologies. And there have definitely been several that have been validated at the protein level.

However -- there have definitely been some that have been purported as "coding" via transcriptomics that do not make it to the protein level. Is it because mass spec based protein technologies just can't detect them? This new open paper at JPR takes an in-depth look at some of these disagreements and concludes ---

I feel bad for starting this post this way. It almost says -- here is the controversy and here is a response from some really smart Belgish people, but there are other reasons why you should check out this paper.

1) You can see what happens when all the cool free CompOmics tools are put into action (SearchGUI and PeptideShaker)
2) You can see what power we still have with existing tools to ask intelligent questions of the awesome proteomics repositories and answer today's most pressing fundamental questions!

Hey -- there's definitely stuff out there that is real -- but this study just introduces some caution into the mix. There is tons of info in these huge transcriptomics files and cool stuff waiting to be found -- but if something down in the noise range doesn't translate to a protein -- maybe we should....

....hold our horses before concluding it is some inherent fault on the proteomics side!

Thursday, June 22, 2017

How much coisolation interference is too much for reporter ion quantification?

This question comes up a lot and, while I have my own general rule that I picked up from a talk by someone way smarter than me, I wasn't aware of this paper till I stumbled upon it (while looking for something completely different...) where they try to answer this very question!

It is worth keeping in mind that this paper was submitted in 2013 and we've learned some stuff since then, but this is a really thorough approach to the topic, comparing LFQ and two isobaric tagging technologies on a fully characterized dataset.

There is a lot of good info in the paper -- from analysis of 2D separation for reporter ion experiments, to the LOD/LOQ of said experiments on the instrument platform they utilized in the study. In the abstract they even provide a single summary of the maximum percentage of coisolation interference they recommend using (which is lower than what I use), but the logic for why they use it is much better than what I had!
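If you've never thought about what that percentage actually is, here is a minimal sketch of one common way to define it: the fraction of the MS1 signal inside the isolation window that does not belong to the precursor's own isotope peaks. This is an illustration under my own assumptions (window width, mass tolerance, and all function names are hypothetical), not the calculation used in the paper or in any particular software package. The cutoff is deliberately left as a parameter; use whatever threshold you trust.

```python
ISOTOPE_SPACING = 1.003355  # approximate 13C - 12C mass difference in Da


def isolation_interference(peaks, precursor_mz, charge, window_width=2.0, tol_da=0.01):
    """Percent of in-window MS1 intensity that is NOT from the precursor's isotopes.

    peaks: iterable of (mz, intensity) tuples from the MS1 scan.
    window_width and tol_da are illustrative defaults, not recommendations.
    """
    half = window_width / 2.0
    in_window = [(mz, inten) for mz, inten in peaks if abs(mz - precursor_mz) <= half]
    total = sum(inten for _, inten in in_window)
    if total == 0:
        return 0.0
    # Expected isotope positions around the selected precursor m/z.
    isotope_mzs = [precursor_mz + k * ISOTOPE_SPACING / charge for k in range(-3, 4)]
    own = sum(
        inten
        for mz, inten in in_window
        if any(abs(mz - iso) <= tol_da for iso in isotope_mzs)
    )
    return 100.0 * (1.0 - own / total)


def filter_psms(psms, max_interference):
    """Keep PSMs (dicts with an 'interference' key) at or below the chosen cutoff."""
    return [psm for psm in psms if psm["interference"] <= max_interference]


# Toy MS1 scan: a doubly charged precursor plus one coisolated peak.
toy_peaks = [(500.25, 600.0), (500.7517, 300.0), (500.60, 100.0), (502.0, 5000.0)]
print(isolation_interference(toy_peaks, precursor_mz=500.25, charge=2))
```

In the toy scan the big peak at 502.0 falls outside the window and is ignored, so only the small peak at 500.60 counts as interference.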

Tuesday, June 20, 2017

Over 9,000 LC-MS/MS experiments integrated by machine learning!

This is AWESOME! Before I get too carried away, let's point you to the very nice open access paper here!...then...

...put it into some context!

There have been multiple huge attempts to manually map protein-protein interactions. I have been completely unfiltered in my love and respect for the BioPlex project, and this will not change, but there are other resources with other technologies as well. BioPlex is a reasonably new effort and I've tried not to seem too stressed out about new studies where people have meta-analyzed other datasets, like the huge Y2H (yeast 2 hybrid) assays when BioPlex info is available.

TADAA!!!  Welcome to hu.MAP!

What is it? It is a meta-analysis of the BioPlex data released so far with:
this study from Marco Hein et al., (great Max Planck study in Cell from 2015) and
this cool one from Cuihong Wan et al., that appeared in Nature around that same time (that I missed till now)

These studies are all great and well done and just awesome on their own. Why would anyone mess with them? Answer: Because that is what public repositories of data are for. And -- get this -- analyzing these together with fancy machine learning algorithms -- turns up protein-protein interactions that none of them did on their own!

Not to leave it alone there -- no way -- these authors also look at some of the Y2H databases, which is cool and all -- but they painstakingly validate some of the protein-protein interactions that their methods pull out of these beautiful data sets and show they are VERY VERY real.

How'd they do it? Magic formulas with loads of Greek letters that I can, in no way, confirm are correct or accurate -- but their validation assays sure look great! The important part is that we can go to the hu.MAP site, simply type in what we're interested in, and reap the rewards of their efforts!

Monday, June 19, 2017

Taking on protein complexes with proteomics!

This is a great new paper talking about the methodology for protein complexes. Pubmed link here.

The author gets feedback from some heavy hitters in the field, including: Brian Chait, Neil Kelleher, Mike MacCoss, Carol Robinson, Uwe Schulte and Albert Heck -- among others.

While the author really starts off a little pessimistic regarding the challenges and necessary technology advances in NMR-type technologies, the tone of the mass spectrometry experts is much more upbeat and optimistic. With the tools we have now for native complex mass spectrometry and crosslinking and cryo electron microscopy, maybe we don't need to wait for the next big breakthrough in NMR to fully elucidate protein complexes.

Highly recommended perspective paper -- a big bonus for me is a description of the successful analysis of protein/lipid complexes, a reference I need to look up and send to a friend who may have samples just like that heading her way!

Stop Windows update before setting a huge processing run!

I was overjoyed this morning to find that my PC had installed updates and rebooted....and optimistically hoped that my FASTA had somehow been allowed to complete before Windows decided it was more important to update my Audio Driver...which appears to be the only alteration listed...

...nope...(P.S. TrEMBL is 39GB takes a while....)

I've always hoped to find a Windows Update setting that would let me create a list of rules -- "if X program is using 80% of the CPU, don't reboot" -- but I never have.

This post is more of a reminder to myself than anything else -- that I need to type "Services" into the Windows search bar, find Windows Update, right click and STOP the service prior to starting any huge multi-day runs....or before trying to import a 39GB FASTA file in its entirety....
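If clicking through Services is one step too many to remember, the same thing can be scripted. This is a small sketch that wraps the standard `net stop` command for the Windows Update service (service name `wuauserv`). The function names and the dry-run behavior are my own invention, it has to be run from an administrator prompt to actually stop anything, and Windows may decide to start the service again later on its own.

```python
import subprocess

WINDOWS_UPDATE_SERVICE = "wuauserv"  # internal service name for Windows Update


def stop_windows_update_command():
    """Build the command that stops the Windows Update service."""
    return ["net", "stop", WINDOWS_UPDATE_SERVICE]


def stop_windows_update(dry_run=True):
    """Stop Windows Update before a long run.

    With dry_run=True (the default) nothing is executed; the command is just
    returned so you can see what would happen. With dry_run=False this needs
    an elevated (administrator) prompt on Windows.
    """
    command = stop_windows_update_command()
    if not dry_run:
        subprocess.run(command, check=True)
    return command


print(" ".join(stop_windows_update(dry_run=True)))
```

Remember to start the service again (`net start wuauserv`) once the multi-day run is done.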

Saturday, June 17, 2017

FASTA update time! TrEMBL and Joint Genome Institute!

At the Analytical Lab Manager's meeting at ASMS, I got to meet and answer panel questions with David Tabb. Of the many points he pressed upon us for proteomic bioinformatics, one that stuck with me was the negative consequences of using old FASTA databases. Maybe it stuck with me cause my FASTAs in use look something like this screenshot...

2011...2012...2015...who knows...? Okay. I'm at fault. Especially when FASTAs are free from UniProt. The difference was seriously impressed upon me when I realized TrEMBL is up to 19GB!  The last one I downloaded...was 7GB...that's a lot of data!

An important note from his talk I was very surprised by was about the Department of Energy Joint Genome Institute.

Which hosts tons of cool FASTAs from microorganisms! If you are studying something that isn't present in UniProt/SwissProt (or under-represented?) maybe you should check it out.

Friday, June 16, 2017

NIST is incorporating Proteome Tools resources into their libraries!

Are you familiar with the Proteome Tools project? If not, you can read about it here. It is an ongoing effort to generate a proteome wide synthetic peptide library!  At launch this group had 330,000 synthetic peptides done and they are still going.

There is a group here in my local area that knows something about spectral libraries -- and has an awesome looking Twitter icon...

...that has been curating this released data and setting it up for release. If you're already using tools like MSPepSearch that are fully compatible with these libraries, you don't have to do a thing different to utilize these resources!

You can just go to the ChemData NIST spectral library hub and HCD fragmentation data from the first release of the Proteome Tools library is there.

You can access it here!

Tuesday, June 13, 2017

More tools to find alternative splice variants in humans!

This is a bit of a continuation from yesterday's pre-coffee paper. These weird alternative splice variants -- or genes from over here and genes from over there forming transcripts and/or proteins that we might not be thinking to look for.

The paper I'm looking at is this new ASAP one at JPR.

I'm going to admit up front I don't 100% understand how they did this. The mass spectrometry is straight-forward, and the data has been deposited at PRIDE (PXD006026). The introduction is eye opening (even if you were watching the Finals on the East Coast) and was enough for me to bumble through this...check this out!

I was thinking these weird alternative events were going to be very rare. Apparently I was thinking very incorrectly...

This is where I'm a little fuzzy on the details of this paper. What I do get: they generate databases that contain only tryptic peptides for alternative splice variants. I'm not 100% clear on how the databases were generated and filtered other than they employed a tool called SpliceVista that references two alternative splice variant databases called EcGene and EVBD. Perhaps the output from SpliceVista is just FASTA and that is all there is to it.
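To make the "only tryptic peptides for splice variants" idea concrete, here is a minimal sketch of how such a database could be built in principle: digest the canonical sequence and each variant in silico, then keep only the peptides that the canonical form does not already produce. To be clear, this is my guess at the general idea, not SpliceVista and not the authors' pipeline. The sequences are toy strings, the function names are hypothetical, and real workflows would also handle missed cleavages, I/L ambiguity, and so on.

```python
import re


def tryptic_digest(sequence, min_len=6, max_len=30):
    """Fully tryptic digest: cleave after K or R unless the next residue is P.

    No missed cleavages; peptides outside the length range are dropped.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def variant_specific_peptides(canonical, variants):
    """Return, per variant, the tryptic peptides not found in the canonical digest."""
    canonical_peptides = set(tryptic_digest(canonical))
    return {
        name: [p for p in tryptic_digest(seq) if p not in canonical_peptides]
        for name, seq in variants.items()
    }


def to_fasta(peptides_by_variant):
    """Write the variant-specific peptides as FASTA text (one entry per peptide)."""
    lines = []
    for name, peptides in peptides_by_variant.items():
        for index, peptide in enumerate(peptides, start=1):
            lines.append(f">{name}_pep{index}")
            lines.append(peptide)
    return "\n".join(lines)


# Toy example: the variant differs from the canonical form in one internal stretch.
canonical_seq = "MAAAAAKCCCCGGGGKDDDDDDDR"
variant_seqs = {"variant_1": "MAAAAAKCCCCTTTTKDDDDDDDR"}

unique = variant_specific_peptides(canonical_seq, variant_seqs)
print(to_fasta(unique))
```

The appeal of a peptide-level database like this is that every hit is, by construction, evidence for the variant rather than for the canonical protein.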

Once they had their databases the proteomics gets very interesting. They identify splice variants in their MS/MS data, just as Dr. Lazar found on her instrument in the paper I mentioned yesterday. Where this paper goes one step further is that they did phosphopeptide enrichment on their samples as well...and find phosphorylated splice variants!
...And they don't find just a few variants. They find a bunch of them -- enough to sit back and think -- wow...this HAS to have serious biological implications! How much more can we refine out of the hundreds (thousands?) of great historical phosphoproteomics datasets using the databases they generated here?

Monday, June 12, 2017

Chimeric fusion RNAs in noncancerous cells!

Biology is really complicated. Just throwing that out there. It would be really great if it would just all obey the central dogma...

 ....and we could get back to running our perfectly calibrated and QC'ed instruments (thanks, WikiPedia article on Chimeric RNA!), but then it throws us completely new (at least to me) concepts like Chimeric Fusion RNAs, which are the topic of this cool paper!

I'm familiar with how screwy DNA/RNA can get in cancer and how complex these mixups can be at every level (transflips, for example?)...but...come on....RNA can't be just binding to other RNA and making messed up protein in normal human tissue, right?!?!?

This team shows some convincing evidence that it is and does. They started by looking at RNA-Seq from nearly 300 different libraries and found loads of reads coming back that could only mean that this gene over here and this gene over here were somehow fused in making weird RNA, but they only found them in these tissues...

...yeah, like all of them.

If you've ever looked at RNA-Seq data, or sat through a talk on it where someone understands it and is being honest about the technique you know that there is a lot of noise in the data. Part of the reason their informatics and statistics are so advanced is that they have to be in order to get to the good stuff. So maybe this is all just noise and false positives?

That figure at the top of the paper? This is a clip from this study where they use PCR to amplify some of these products and visualize them on gels (not much noise there), the very top is Sanger Sequencing of RT-PCR products and the bottom is MS/MS that shows that some of these weird fusions are making it to expressed proteins.

In "normal" tissue! Maybe this is what a lot of our unidentified spectra are...?....if so, the only way to get to them is going to be more informed FASTA databases that include these tissue or individual specific fusions as options for database search....or de novo peptide sequencing that can BLAST back to short chains from multiple gene products...

Either way, biology is complicated...and it would be a whole lot less fun if it wasn't!

Sunday, June 11, 2017

Social network architecture + proteomics to study immune response!

Understanding social networks (like Gusto's, above!) is big business and requires sophisticated big data approaches to fully understand and capitalize on. All the algorithms being developed to study social networks must have applications besides the freakishly accurate predictions in my Inbox of when Gusto and Bernie need another case of Deedle Dudes, right?

Leave it to those brilliant people at Max Planck to divert them to understand something as ridiculously complex as the human immune system response! You can check it out in Nature Immunology here.

How'd they do it? First they needed the data. So they flow sorted 28 different types of immune cells (!!!) from a person and did deep proteomics on all of them. The raw instrument data is all available at PRIDE here (PXD004352), but they also set up a beautiful web resource for the full output data at ImmProt.

This is the Immprot front page image that describes the experimental design.

This looks like a big matrix already, right? They do proteomics to a depth of around 10,000 proteins on all of these cell lines -- but get this -- it is bigger than this. They take the flow sorted immune cells, culture them and activate them. I'm no immunologist and it's too early in the morning to call and ask any friends about it, but from a rough understanding those cells in their native state floating around aren't all that interesting, right? They expose them to cytokines in culture that initiate something like the response they'd have to pathogens or other things immune cells destroy. Interesting to the non-immunologist here -- the different cells are activated with different compounds.

These activated cells are studied by doing proteomics on the cells as well as doing proteomics on the media. If you are considering doing something like this -- proteins secreted by growing cells -- I strongly suggest reading the methods section. I wouldn't have thought to remove contaminating background materials from the culture supernatants and they go to great lengths to remove cellular material that might interfere with their results. If nothing else, this section is beautifully written. It's a Max Planck paper for sure...

It's easy to miss here that they also did transcriptomics, but the proteomics was all single shot on a 50cm column with 180min linear gradient. I'm glad to see they are still employing the SprayQC instrument software (I haven't heard of it in a while and wasn't sure it was still updated for new instruments!) The MS1 is 120k resolution and MS/MS was set for more sensitivity over perfect transient matching with 55ms.

The data was expertly processed with MaxQuant and Andromeda and the MaxQuant output is also available at the PRIDE location above.  But the stuff that really knocks your socks off is what they do afterward. Maybe it is less impressive to computational people, but this all falls under bioinformaGics for me that you can check out at ImmProt (but the figures in the paper are just stunning!)

Examples of me just looking around this morning..wait.... You seriously have to check this out. And send this link to any immunologists you know (I just did!) this has got to be an amazing asset for them. I'll skip the volcano plots (which are awesome!) and just go for the protein networks.

I went after my favorite Integrin protein (cause they're involved in everything!)

I love the note at the bottom! There is obviously some serious computational power behind the scenes here. Being too greedy will cost you some time. It populated with interaction partners, and you update the interaction wheel.

And it constructs the network map wheel for you! I wasn't too greedy, this thing was like 10 seconds. Please note I clipped the cell key and output to make them fit here, so scaling looks better on the page. You can output the plot as a high res PDF, or you can go straight to searchable tables.

Wow. This post is really long, if you can't tell, I really like this resource. It shows just another place where we can come in with our technology and techniques and impact researchers who maybe haven't explored what we can do now. Is this a resource that dedicated immunologists can mine for years and continually find new things to explore? It sure looks like it!

Saturday, June 10, 2017

The host-parasite interactome!

Okay....I've got 22 pages of hand written notes from ASMS I'm trying to sort out. Unfortunately...they look something like these notes from Alexey's awesome "Proteomic Dark Matter" talk.... they probably condense down to 1 type written page, but while I'm filing things into my spreadsheets as 1) Published and can talk about it 2) Watch for this paper!!!, etc., I stumbled onto a shockingly cool tool at bioRxiv! You can read about it here.

Maybe host-parasite interactions aren't your thing. But "interactomics" is one of the big data approaches that the bioinformaticians are going to be using increasingly to make sense of the data we're depositing. Also -- it is just fun to spend 3 minutes here looking at this thing! (Direct link to the software is here!)

After you've made a bunch of awesome and wobbly networks between this parasite and that one, WikiPedia has a great breakdown of Interactomics here. In general, it is a big picture approach to link what is changing. When we take our quantitative data out and put it into pathway analysis software like IPA or BioCyc, we're looking at known interactions that have been painstakingly constructed from historic studies. Interactomics steps away from all this known stuff and builds the networks statistically. If I say any more about it, I'll only embarrass myself further.
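For a feel of what "builds the networks statistically" can mean, here is a toy sketch of one simple approach: connect two proteins whenever their abundances track each other closely across samples. This is only an illustration of the general idea. I have no reason to think it is how the tool above works, and the protein names, the data, and the 0.9 cutoff are all made up.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return covariance / (var_x * var_y) ** 0.5


def build_network(abundances, min_abs_r=0.9):
    """Return edges (protein_a, protein_b, r) for strongly correlated pairs.

    abundances: dict mapping protein name -> list of values across samples.
    min_abs_r: illustrative cutoff; real analyses would also test significance.
    """
    edges = []
    for a, b in combinations(sorted(abundances), 2):
        r = pearson(abundances[a], abundances[b])
        if abs(r) >= min_abs_r:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical abundances across four samples.
toy_abundances = {
    "hostA": [1, 2, 3, 4],
    "parB": [2, 4, 6, 8],
    "hostC": [4, 1, 3, 2],
}
print(build_network(toy_abundances))
```

Nothing in this network comes from a curated pathway database; the edges exist only because the numbers move together, which is both the strength and the danger of the approach.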

As someone really interested in host-parasite interactions -- I can't imagine a more appropriate application of these tools!!

Friday, June 9, 2017

The MS Bioworks App!

We finally updated our old personal cell phones to some fancy impossible to use things that are far too large, but have tons of memory. The first thing I installed? The MSBioWorks App!

If you don't have it (and don't have an old phone full of Pug and kitten pictures and no room for it) you should.

You can type in any protein by UniProt ID or Gene Symbol and it will find the protein (I'm not sure of the exact requirements, but the universal gene identifier works for me -- do they call that HUGO now?). You can choose the digestion conditions and then email the report to yourself!

It will predict peptide fragmentation patterns from ones you enter and modify them however you want, as well -- also with the direct email output with a surprising number of features!
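If you're curious what is going on under the hood of a fragment calculator like this, the arithmetic is short. Below is a minimal sketch (not the App's code, which I haven't seen) that computes b and y ion m/z values for an unmodified peptide from standard monoisotopic residue masses. The function names and the example peptide are my own, and modifications are left out entirely.

```python
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ion_mz(peptide, n, charge=1):
    """m/z of the b ion containing the first n residues."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide[:n])
    return (neutral + charge * PROTON) / charge


def y_ion_mz(peptide, n, charge=1):
    """m/z of the y ion containing the last n residues (adds one water)."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER
    return (neutral + charge * PROTON) / charge


peptide = "PEPTIDE"  # illustrative sequence only
for z in (1, 2, 3):
    print(f"y2 (z={z}): {y_ion_mz(peptide, 2, charge=z):.4f}")
```

The same y2 fragment shows up at a different m/z for each charge state, which is why a report lists the z=2 and z=3 versions on separate lines.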

Okay...I apologize to the App developer, this screenshot doesn't do your cool tool justice. I sent myself the results in Korean (cause why not?), but I converted the text to Unicode in Excel, so it lost the formatting (my Excel doesn't have a Korean language pack installed). Line 71 and 84 are the y2 ions as z=2 and z=3, respectively!

Also, the App has a great news section for new papers. I have no idea who keeps this updated, but it is great! And way better than getting news here!

Okay -- but the thing I have ALWAYS loved this App and its developers for is the calculators. I don't have a good screenshot, but if you have an LC line of this X diameter and Y length it calculates the volume of that line for you! It has been invaluable for me in the past and I can't thank this group enough for making this for us and giving it out for free.
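The line volume calculation itself is just the volume of a cylinder, so here is a quick sketch for when the phone isn't handy. The units (inner diameter in microns, length in cm, flow in nL/min), the function names, and the example tubing are my choices for illustration, not anything taken from the App.

```python
import math


def tubing_volume_ul(inner_diameter_um, length_cm):
    """Internal volume of a tube in microliters (cylinder: pi * r^2 * length)."""
    radius_cm = (inner_diameter_um * 1e-4) / 2.0    # 1 um = 1e-4 cm
    volume_ml = math.pi * radius_cm ** 2 * length_cm  # 1 cm^3 = 1 mL
    return volume_ml * 1000.0                        # mL -> uL


def transit_time_min(inner_diameter_um, length_cm, flow_nl_per_min):
    """Minutes for liquid to travel the full length of the line at a given flow."""
    volume_nl = tubing_volume_ul(inner_diameter_um, length_cm) * 1000.0
    return volume_nl / flow_nl_per_min


# Example: a 50 um ID line, 50 cm long, at 300 nL/min (hypothetical setup).
print(f"Volume: {tubing_volume_ul(50, 50):.3f} uL")
print(f"Delay:  {transit_time_min(50, 50, 300):.2f} min")
```

Knowing the delay is the practical payoff: at nanoflow rates, even a microliter of tubing adds minutes between the gradient leaving the pump and arriving at the column.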

Thursday, June 8, 2017

Taking the new PeakJuggler for a Spin!

At ASMS we got an updated version of the PeakJuggler nodes for PD 2.x courtesy of IMP. I don't have a ton of data on them because I was just running them on my tablet at night, but here is some stuff I've learned.

First off - PeakJuggler uses R (in the background, don't worry!) but you have to make sure your version is current enough. I had a couple crashes because my tablet was carrying a version that was too old.

If you never use R for anything other than IMP PD nodes, you can just go to CRAN and download the newest version (3.4.0 as I'm writing this). This will put a second (or third) version of R on your PC which you can remove with Uninstall Programs in Windows.

If you have used it but aren't an expert (example, myself) and don't want to have multiple instances on your PC and want to migrate your files over you can run this script.

# installing/loading the package:
if(!require(installr)) { install.packages("installr"); require(installr) } # load / install+load installr

# using the package:
updateR() # this will start the updating process of your R installation. It will check for newer versions, and if one is available, will guide you through the decisions you'd need to make.

If you are an expert, I bet you know a smarter way of doing this -- or already use the current version!


The great people at IMP already benchmarked the new PeakJuggler with the Ramus et al., Quan test dataset. It is high/low data of yeast with UPS1 proteins spiked in. Since they did that one, I've been working on the Shalit et al., dataset instead.

This is human digest with E.coli digest spiked in at different levels.

As a first test, I ran just one 3ng E. coli spike in and one 10ng spike in. (These files are relatively large; about 1.5GB each.) With PeakJuggler and MSAmanda and Percolator -- I'm hitting about 45 minutes per processing run and around 9 minutes for each consensus for each individual file on my trusty old 8 core PC. This is a marked improvement over the previous version! After combining the results with a MultiConsensus report, I get some great output!

Here I can see that the 10ng E.coli spike in is always higher than the 3ng spike in. Sorting through it is pretty clear the human proteins hover in the 1:1 range. You'll note at the bottom I've got one that wasn't ID'ed in the 3ng. It only has an area of 7e5 in the 10ng sample, and it appears that might be close to the threshold.
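A quick sanity check for a spike-in experiment like this is to compare the observed ratios with what the design predicts: 10ng over 3ng is about 3.3-fold for the E. coli proteins, and 1:1 for the human background. Here is a small sketch of that check. The field names, the toy areas, and the function names are all hypothetical; this is not the Proteome Discoverer export format, just the arithmetic.

```python
import math
from statistics import median


def expected_log2_ratio(high_ng, low_ng):
    """log2 fold change predicted by the spike-in amounts."""
    return math.log2(high_ng / low_ng)


def median_log2_ratio_by_species(proteins):
    """Median observed log2(high/low) area ratio per species.

    proteins: list of dicts with 'species', 'area_low', 'area_high'.
    Proteins missing an area in either run are skipped.
    """
    ratios = {}
    for protein in proteins:
        low, high = protein["area_low"], protein["area_high"]
        if not low or not high:
            continue  # not quantified in one of the runs
        ratios.setdefault(protein["species"], []).append(math.log2(high / low))
    return {species: median(values) for species, values in ratios.items()}


# Toy data standing in for a MultiConsensus export (values are made up).
toy_proteins = [
    {"species": "E. coli", "area_low": 3e6, "area_high": 1e7},
    {"species": "E. coli", "area_low": 6e5, "area_high": 2e6},
    {"species": "E. coli", "area_low": None, "area_high": 7e5},
    {"species": "Human", "area_low": 1e8, "area_high": 1e8},
    {"species": "Human", "area_low": 2e7, "area_high": 2.2e7},
]

print(f"Expected E. coli log2 ratio: {expected_log2_ratio(10, 3):.2f}")
print(median_log2_ratio_by_species(toy_proteins))
```

Proteins seen in only one run, like the low-area one near the threshold, drop out of the ratio calculation entirely, so it is worth counting them separately.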

I don't think the webpage mentions improvements in the plotting, but at first glance the XICs of the peaks look really nice! In the PD 2.1 version the output appears to be the same as the PIAD, meaning that it can't calculate LFQ ratios.

Summing it up -- once you update your R, this free node is a really nice solution for label free quan in Proteome Discoverer!