Wednesday, June 28, 2017

Another awesome label free quan dataset to test with!


I was browsing through ProteomeXchange looking for cool data to queue up before I went to work -- and found a brand new dataset for testing label free quan algorithms!

Then I realized I already had this dataset and it is a great one. You can access it directly here.

Here is the description. Sounds like time for some more LFQ tests!




Tuesday, June 27, 2017

CHarge Ordered Parallel Ion aNalysis (CHOPIN)


They started making hybrid mass spectrometers so that one box could have more capabilities than any single instrument alone. A few years ago, a new type of hybrid instrument was produced that has a quadrupole, an ion trap, and a Kingdon trap system. In the simplest use of the instrument for proteomics, the high resolution trap acquires MS1 peptide signals while the low resolution trap simultaneously acquires fragmentation data from ions isolated via the quadrupole and fragmented either by CID in the ion trap or by HCD on the ions' way to the trap.

With all these capabilities, could there be a better way? These authors seem to think so!


What is it? A really complicated instrument method that further prioritizes the capabilities of each fragmentation type, mass analyzer, and instrument parallelization.

Honestly, I looked at this for a couple hours last night and I'm still not sure I understand where the extra time is coming from, but I do understand that over the course of one cycle they are able to gain 7 extra MS/MS scans. I also understand that this is an instrument method that I could just copy from their Supplemental info and give it a try!

I highly recommend checking out the RAW files at PRIDE here.  They evaluated this method on an offline fractionated cancer cell line with both the standard method and with CHOPIN. In both cases, these are really nice datasets.

I respect the amount of work that they put into optimizing this instrument method. However, the highlight of this beautiful paper for me might be the use of the Elastase enzyme in conjunction with trypsin to get the absolute deepest proteome coverage!

Monday, June 26, 2017

Proteomics of mice in space!!


I'm up stupid early in the morning and have to type this reeeal fast, but I'm way too excited to let it wait till I'm home after work, cause it's Miiiiiiiiice iiiiiiiiiiinnnnn Spaaaaaaaaaaace (sounds like this in my head --warning, audio!)

Before I have too much fun with this important study, this is what I'm talking about -- in this month's JPR.

This is important stuff, especially for any of us who grew up under the Asimovian assumptions that humans take to the stars or our story ends here in our inevitable extinction. Which -- considering that advanced civilizations may receive decades of our television far before our arrival makes it seem less bad now than it did when I was younger. "Hello species that hasn't had war in 1 billion years -- yes...I come from the Tony Danza planet...."

It has always been really really hard to become an astronaut/cosmonaut. This was extremely well publicized here --especially with the Apollo program -- those dudes were the best they could find (of course -- within the time period's sad and systematic biases), and this process hasn't gotten ANY easier (18,000 applicants this year already! Does that make you as happy as it does me?).  But...it has also been well publicized how hard the process of being in space is on these extremely fit and conditioned people (great recent review on the topic).  If we are going to make it to the stars we're gonna need to be physically ready for it. And we're gonna need a new class of heroes to help us with this, namely Miiiiiiiiice iiiiiiiiiiinnnnn Spaaaaaaaaaaace!!


Sorry...back to serious.  They put a bunch of mice in a satellite -- IN SPACE -- FOR A MONTH -- and then studied them directly afterward and during/after recovery. There were, of course, age and gender-matched mice that didn't go to space for control.


A big focus of the study was the skeletal muscle proteomes. The intact proteins from each muscle type were separated in the first dimension by SDS-PAGE and nanoLC was performed on the digested/ extracted peptides.  The resulting peptides were desalted online and separated by nanoLC (25cm column) into a quadrupole Kingdon style trap mass spectrometer running a standard Top10 method. For QC, they used the iRT peptides from Biognosys (coincidentally, the company that makes the SpectroNaut software). This might be the first time I've seen that standard used for DDA experiments, and it sounds like a great use for it! The samples were randomized(!!) because if you're going to pay to put mice in outer space, you had better do some darned good science on the ground -- and they do!

The data was all processed with MaxQuant using LFQ and "match between runs" and the Raw and processed data are available at ProteomeXchange here (PXD005035).

I've got to run and I don't want to spoil the surprises -- but they find some serious cool stuff regarding how mammalian muscles respond to a month(!) in space and suggest key pathways we might be able to target to mitigate these effects.

Yes -- this post allowed me to be silly this morning -- but, make no mistake this is a really nice study performed by a top-notch team on a topic I think is very important and I can't recommend this paper more.


Sunday, June 25, 2017

Proteomic analysis of human cells grown in bacterial cellulose


A LOT of biological studies of human cells occur with cells grown in culture. And we can thank cell culture techniques for a huge amount of what we know of how cells work, but it has been no secret that there are some limitations to cell culture techniques. This is a nice open review on limitations -- with a strong title.

Over the last 15 years or so there have been numerous ideas put forward to bridge the gap between cells growing in monolayers in cell culture and cells as they actually exist in human tissues and organs. Growing cells in 3 dimensions is something that has been discussed for years, but -- so far hasn't yielded much in the way of new understandings in biology. Please don't take this as a criticism, I know two great people working hard on this stuff right now who will have stuff out soon, but these complex bioreactors seem to come with their own bevy of complications -- which better technology may soon fix. In the meantime, check this out! 


This is the analysis of a bacterial cellulose gel material that can be used to support growing human cells....and the title gives away some of it...but this material seems to allow cells to develop in a more native type state!

The preparation of the cells to grow within the material seems reasonably straight-forward. (A quick Google search indicates that the material they used may be a commercialized product. Please do not interpret this as an endorsement and see my rapidly expanding list of blog disclaimers if you have questions).

This team goes to great lengths to compare classically grown cells with the ones grown in this bacterial cellulose material, employing immunocytochemistry and global RNA analysis. They also extract peptides and use isobaric tagging technology and do relative quantification with a quadrupole high field Kingdon-style trap system.

Of particular interest to this methods nerd, although a 10-plex isobaric tagging reagent is employed, the instrument is run at 35,000 resolution, which the authors state is approximately 50,000 resolution in the reporter ion region. This should allow almost baseline separation of all the reporter ions and may compromise just a little on full separation in favor of maximum speed.
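
If you want to sanity-check numbers like these yourself, here is a rough sketch. It rests on assumptions of mine that are not stated in the paper: that the resolution setting is quoted at m/z 200, and that resolving power on this kind of trap falls off with the square root of m/z. The 6.32 mDa spacing between the closest 10-plex reporter ions is the commonly quoted value, and the function names are made up for illustration.

```python
import math


def resolution_at(mz, setting, reference_mz=200.0):
    """Scale a resolution setting quoted at reference_mz to another m/z.

    Assumption: resolving power is proportional to 1 / sqrt(m/z).
    """
    return setting * math.sqrt(reference_mz / mz)


def resolution_needed(mz, delta_mz):
    """Resolving power (m / delta m) at which two peaks sit delta_mz apart."""
    return mz / delta_mz


if __name__ == "__main__":
    # Effective resolution across the reporter ion region for a 35,000 setting.
    for mz in (100.0, 126.0, 131.0):
        print(f"m/z {mz:.0f}: ~{resolution_at(mz, 35_000):,.0f}")

    # Nominal resolving power to split reporters 6.32 mDa apart near m/z 128.
    print(f"needed near m/z 128: ~{resolution_needed(128.0, 0.00632):,.0f}")
```

Under those assumptions, a 35,000 setting works out to just under 50,000 at m/z 100 and somewhat less across m/z 126 to 131, which is still comfortably above the nominal m / delta m for the closest reporter pair. Baseline separation needs a fair bit more than that nominal figure, which fits the "almost baseline" reading above.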

The data files and MaxQuant processed files are available at ProteomeXchange here (after the fully edited version of the paper is released; PXD003975). I strongly suggest checking out the supplemental info for the paper. They did a LOT of work here (...okay...maybe I just really like to look at expert level ICC and pretty STRING networks...)

What did all these quantified transcripts and proteins reveal? That this technique might be an easy way to obtain information from cells that more closely reflects the way the cells exist in the human body than growing those cells in plastic plates in 2 dimensions -- and that sounds like a step in the right direction!

Saturday, June 24, 2017

Activated Ion Electron Transfer Dissociation for FAST comprehensive top-down!


This one had been in my queue for a while and I'm finally getting caught up on things! This is the paper, and it just came out in June's JPR.


First off, this is an in-house developed fragmentation method that the Coon lab has. They published the setup of this instrument in ACS earlier this year (link here!)

ETD fragmentation is used in conjunction with infrared photoactivation. If I talk any more about how it is done, I'll just embarrass myself, so that's why both links are here at the front of this post.

What this biologist does get out of this paper is an AMAZING degree of sequence coverage. For the proteins examined, (up to just around 20kD) they demonstrate sequence coverage you'd only get if you combined every fragmentation method commercially available which would be...well...


...slow....probably taking several runs unless you had really optimized timing on each peak with an instrument capable of many fragmentation types.

AI-ETD is NOT slow. They are getting this amazing level of intact protein sequence coverage with 10-35ms activation time!

Friday, June 23, 2017

A lot of noncoding RNA -- is really noncoding!


Transcriptomics is still booming -- there is so much awesome data being generated from all those cool instruments (I recently heard one of the newest ones can generate 3000GB of transcriptomic data per sample!)

If you've been summarily browsing the biology literature, you've undoubtedly seen reclassification of some "non coding" genes as "coding", via these technologies. And there have definitely been several that have been validated at the protein level.

However -- there have definitely been some that have been purported as "coding" via transcriptomics that do not make it to the protein level. Is it because mass spec based protein technologies just can't detect them? This new open paper at JPR takes an in-depth look at some of these disagreements and concludes ---

I feel bad for starting this post this way. It almost says -- here is the controversy and here is a response from some really smart Belgish people, but there are other reasons why you should check out this paper.

1) You can see what happens when all the cool free CompOmics tools are put into action (SearchGUI and PeptideShaker)
2) You can see what power we still have with existing tools to ask intelligent questions of the awesome proteomics repositories and answer today's most pressing fundamental questions!

Hey -- there's definitely stuff out there that is real -- but this study just introduces some caution into the mix. There is tons of info in these huge transcriptomics files and cool stuff waiting to be found -- but if something down in the noise range doesn't translate to a protein -- maybe we should....



....hold our horses before concluding it is some inherent fault on the proteomics side!

Thursday, June 22, 2017

How much coisolation interference is too much for reporter ion quantification?


This question comes up a lot and, while I have my own general rule that I picked up from a talk by someone way smarter than me, I wasn't aware of this paper till I stumbled upon it (while looking for something completely different...) where they try to answer this very question!


It is worth keeping in mind that this paper was submitted in 2013 and we've learned some stuff since then, but this is a really thorough approach to the topic, comparing LFQ and two isobaric tagging technologies on a fully characterized dataset.

There is a lot of good info in the paper -- from analysis of 2D separation for reporter ion experiments, to the LOD/LOQ of said experiments on the instrument platform they utilized in the study. In the abstract they even provide a single summary of the maximum percentage of coisolation interference they recommend using (which is lower than what I use), but the logic for why they use it is much better than what I had!

Tuesday, June 20, 2017

Over 9,000 LC-MS/MS experiments integrated by machine learning!


This is AWESOME! Before I get too carried away, let's point you to the very nice open access paper here!...then...


...put it into some context!

There have been multiple huge attempts to manually map protein-protein interactions. I have been completely unfiltered in my love and respect for the BioPlex project, and this will not change, but there are other resources with other technologies as well. BioPlex is a reasonably new effort and I've tried not to seem too stressed out about new studies where people have meta-analyzed other datasets, like the huge Y2H (yeast 2 hybrid) assays when BioPlex info is available.

TADAA!!!  Welcome to hu.MAP (proteincomplexes.org)



What is it? It is a meta-analysis of the BioPlex data released so far with:
this study from Marco Hein et al., (great Max Planck study in Cell from 2015) and
this cool one from Cuihong Wan et al., that appeared in Nature around that same time (that I missed till now)

These studies are all great and well done and just awesome on their own. Why would anyone mess with them? Answer: Because that is what public repositories of data are for. And -- get this -- analyzing these together with fancy machine learning algorithms -- turns up protein-protein interactions that none of them did on their own!

Not to leave it alone there -- no way -- these authors also look at some of the Y2H databases, which is cool and all -- but they painstakingly validate some of the protein-protein interactions that their methods pull out of these beautiful data sets and show they are VERY VERY real.

How'd they do it? Magic formulas with loads of Greek letters that I can, in no way, confirm are correct or accurate -- but their validation assays sure look great! The important part is that we can go to proteincomplexes.org and simply type in what we're interested in and yield the rewards of their efforts!


Monday, June 19, 2017

Taking on protein complexes with proteomics!


This is a great new paper talking about the methodology for protein complexes. Pubmed link here.


The author gets feedback from some heavy hitters in the field, including: Brian Chait, Neil Kelleher, Mike MacCoss, Carol Robinson, Uwe Schulte and Albert Heck -- among others.

While the author really starts off a little pessimistic regarding the challenges and necessary technology advances in NMR-type technologies, the tone of the mass spectrometry experts is much more upbeat and optimistic. With the tools we have now for native complex mass spectrometry and crosslinking and cryo electron microscopy, maybe we don't need to wait for the next big breakthrough in NMR to fully elucidate protein complexes.

Highly recommended perspective paper -- a big bonus for me is a description of the successful analysis of protein/lipid complexes, a reference I need to look up and send to a friend who may have samples just like that heading her way!

Stop Windows update before setting a huge processing run!


I was overjoyed this morning to find that my PC had installed updates and rebooted....and optimistically hoped that my FASTA had somehow been allowed to complete before Windows decided it was more important to update my Audio Driver...which appears to be the only alteration listed...


...nope...(P.S. TrEMBL is 39GB unzipped....it takes a while....)

I've always hoped that I would find something that would allow Windows Update services to allow me to create a list -- "if X program is using 80% of the CPU don't reboot" but I never have.

This post is more of a reminder to myself than anything else -- that I need to type "Services" into the Windows search bar, find Windows Update, right click and STOP the service prior to starting any huge multi-day runs....or before trying to import a 39GB FASTA file in its entirety....
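
For anyone who would rather script that reminder than click through the Services panel, here is a minimal sketch. It assumes the Windows Update service goes by its usual internal name, wuauserv, and that the script is launched from an administrator prompt. The helper names are made up, and it defaults to a dry run so nothing gets stopped by accident.

```python
import platform
import subprocess

# Assumption: "wuauserv" is the internal service name for Windows Update.
SERVICE_NAME = "wuauserv"


def stop_command(service=SERVICE_NAME):
    """Build the 'net stop' command for the given service."""
    return ["net", "stop", service]


def stop_windows_update(dry_run=True):
    """Stop the Windows Update service, or just report what would be run.

    Needs an elevated (administrator) prompt to actually stop the service.
    """
    cmd = stop_command()
    if dry_run or platform.system() != "Windows":
        return "would run: " + " ".join(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    print(stop_windows_update(dry_run=True))
```

Call stop_windows_update(dry_run=False) right before kicking off the big run. This only stops the service at that moment and does not guarantee it stays stopped for a multi-day job, so it is worth checking on it.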

Saturday, June 17, 2017

FASTA update time! TrEMBL and Joint Genome Institute!


At the Analytical Lab Manager's meeting at ASMS, I got to meet and answer panel questions with David Tabb. Of the many points he pressed upon us for proteomic bioinformatics, one that stuck with me was the negative consequences of using old FASTA databases. Maybe it stuck with me cause my FASTAs in use look something like this screenshot...


2011...2012...2015...who knows...? Okay. I'm at fault. Especially when FASTAs are free from UniProt. The difference was seriously impressed upon me when I realized TrEMBL is up to 19GB!  The last one I downloaded...was 7GB...that's a lot of data!

An important note from his talk I was very surprised by was about the Department of Energy Joint Genome Institute.


Which hosts tons of cool FASTAs from microorganisms! If you are studying something that isn't present in UniProt/SwissProt (or under-represented?) maybe you should check it out.

Friday, June 16, 2017

NIST is incorporating Proteome Tools resources into their libraries!


Are you familiar with the Proteome Tools project? If not, you can read about it here. It is an ongoing effort to generate a proteome wide synthetic peptide library!  At launch this group had 330,000 synthetic peptides done and they are still going.

There is a group here in my local area that knows something about spectral libraries -- and has an awesome looking Twitter icon...


...that has been curating this released data and setting it up for release. If you're already using tools like MSPepSearch that are fully compatible with these libraries, you don't have to do a thing different to utilize these resources!

You can just go to the ChemData NIST spectral library hub and HCD fragmentation data from the first release of the Proteome Tools library is there.



You can access it here!

Tuesday, June 13, 2017

More tools to find alternative splice variants in humans!


This is a bit of a continuation from yesterday's pre-coffee paper. These weird alternative splice variants -- or genes from over here and genes from over there forming transcripts and/or proteins that we might not be thinking to look for.

The paper I'm looking at is this new ASAP one at JPR.


I'm going to admit up front I don't 100% understand how they did this. The mass spectrometry is straight-forward, and the data has been deposited at PRIDE (PXD006026). The introduction  is eye opening (even if you were watching the Finals on the East Coast) and was enough for me to bumble through this...check this out!


I was thinking these weird alternative events were going to be very rare. Apparently I was thinking very incorrectly...

This is where I'm a little fuzzy on the details of this paper. What I do get: they generate databases that contain only tryptic peptides for alternative splice variants. I'm not 100% clear on how the databases were generated and filtered other than they employed a tool called SpliceVista that references two alternative splice variant databases called EcGene and EVBD. Perhaps the output from SpliceVista is just FASTA and that is all there is to it.
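
To make "a database of only tryptic peptides" a bit more concrete, here is a generic in silico digest. To be clear, this is not the SpliceVista pipeline or anything taken from the paper. It is just the textbook trypsin rule (cut after K or R unless the next residue is P), and the function name, the defaults, and the example sequence are all made up for illustration.

```python
def tryptic_peptides(sequence, missed_cleavages=0, min_length=6):
    """Return tryptic peptides of a protein sequence, in order.

    Rule used here: cleave C-terminal to K or R, except when followed by P.
    """
    # Collect cut positions, including both ends of the sequence.
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        # Extend across 0..missed_cleavages additional cut sites.
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end >= len(cuts):
                break
            peptide = sequence[cuts[start]:cuts[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Toy sequence; the R before P is deliberately not cleaved.
    for pep in tryptic_peptides("PEPTIDEKSAMPLERPEPTIDEKAA", missed_cleavages=1, min_length=1):
        print(pep)
```

Run that over every variant sequence, keep only the peptides that don't also show up in the canonical protein, and you would have something in the spirit of a variant-specific peptide database. How the authors actually filtered theirs is in their methods.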

Once they had their databases the proteomics gets very interesting. They identify splice variants in their MS/MS data, just as Dr. Lazar found on her instrument in the paper I mentioned yesterday. Where this paper goes one step further is that they did phosphopeptide enrichment on their samples as well...and find phosphorylated splice variants!
...And they don't find just a few variants. They find a bunch of them -- enough to sit back and think -- wow...this HAS to have serious biological implications! How much more can we refine out of the hundreds (thousands?) of great historical phosphoproteomics datasets using the databases they generated here?

Monday, June 12, 2017

Chimeric fusion RNAs in noncancerous cells!


Biology is really complicated. Just throwing that out there.  It would be really great if it would just all obey the central dogma...



 ....and we could get back to running our perfectly calibrated and QC'ed instruments (thanks, WikiPedia article on Chimeric RNA!), but then it throws us completely new (at least to me) concepts like Chimeric Fusion RNAs, which are the topic of this cool paper!


I'm familiar with how screwy DNA/RNA can get in cancer and how complex these mixups can be at every level (transflips, for example?)...but...come on....RNA can't be just binding to other RNA and making messed up protein in normal human tissue, right?!?!?

This team shows some convincing evidence that it is and does. They started by looking at RNA-Seq from nearly 300 different libraries and found loads of reads coming back that could only mean that this gene over here and this gene over here were somehow fused in making weird RNA, but they only found them in these tissues...


...yeah, like all of them.

If you've ever looked at RNA-Seq data, or sat through a talk on it where someone understands it and is being honest about the technique you know that there is a lot of noise in the data. Part of the reason their informatics and statistics are so advanced is that they have to be in order to get to the good stuff. So maybe this is all just noise and false positives?

That figure at the top of the paper? This is a clip from this study where they use PCR to amplify some of these products and visualize them on gels (not much noise there), the very top is Sanger Sequencing of RT-PCR products and the bottom is MS/MS that shows that some of these weird fusions are making it to expressed proteins.

In "normal" tissue! Maybe this is what a lot of our unidentified spectra are...?....if so, the only way to get to them is going to be more informed FASTA databases that include these tissue or individual specific fusions as options for database search....or de novo peptide sequencing that can BLAST back to short chains from multiple gene products...

Either way, biology is complicated...and it would be a whole lot less fun if it wasn't!

Sunday, June 11, 2017

Social network architecture + proteomics to study immune response!


Understanding social networks (like Gusto's, above!) is big business and requires sophisticated big data approaches to fully understand and capitalize on. All the algorithms being developed to study social networks must have applications besides the freakishly accurate predictions in my Inbox of when Gusto and Bernie need another case of Deedle Dudes, right?

Leave it to those brilliant people at Max Planck to divert them to understand something as ridiculously complex as the human immune system response! You can check it out in Nature Immunology here.


How'd they do it? First they needed the data. So they flow sorted 28 different types of immune cells (!!!) from a person and did deep proteomics on all of them. The raw instrument data is all available at PRIDE here (PXD004352), but they also set up a beautiful web resource for the full output data at www.immprot.org.

This is the Immprot front page image that describes the experimental design.


This looks like a big matrix already, right? They do proteomics to a depth of around 10,000 proteins on all of these cell lines -- but get this -- it is bigger than this. They take the flow sorted immune cells, culture them and activate them. I'm no immunologist and it's too early in the morning to call and ask any friends about it, but from a rough understanding those cells in their native state floating around aren't all that interesting, right? They expose them to cytokines in culture that initiate something like the response they'd have to pathogens or other things immune cells destroy. Interesting to the non-immunologist here -- the different cells are activated with different compounds.

These activated cells are studied by doing proteomics on the cells as well as doing proteomics on the media. If you are considering doing something like this -- proteins excreted by growing cells -- I strongly suggest the method section. I wouldn't have thought to remove contaminating background materials from the culture supernatants and they go to great lengths to remove cellular material that might interfere with their results. If nothing else, this section is beautifully written. It's a Max Planck paper for sure...

It's easy to miss here that they also did transcriptomics, but the proteomics was all single shot on a 50cm column with 180min linear gradient. I'm glad to see they are still employing the SprayQC instrument software (I haven't heard of it in a while and wasn't sure it was still updated for new instruments!) The MS1 is 120k resolution and MS/MS was set for more sensitivity over perfect transient matching with 55ms.

The data was expertly processed with MaxQuant and Andromeda and the MaxQuant output is also available at the PRIDE location above.  But the stuff that really knocks your socks off is what they do afterward. Maybe it is less impressive to computational people, but this all falls under bioinformaGics for me that you can check out at ImmProt (but the figures in the paper are just stunning!)

Examples of me just looking around this morning..wait.... You seriously have to check this out. And send this link to any immunologists you know (I just did!) this has got to be an amazing asset for them. I'll skip the volcano plots (which are awesome!) and just go for the protein networks.

I went after my favorite Integrin protein (cause they're involved in everything!)

I love the note at the bottom! There is obviously some serious computational power behind the scenes here. Being too greedy will cost you some time. It populated with interaction partners, and you update the interaction wheel.


And it constructs the network map wheel for you! I wasn't too greedy, this thing was like 10 seconds. Please note I clipped the cell key and output to make them fit here, so scaling looks better on the page. You can output the plot as a high res PDF, or you can go straight to searchable tables.

Wow. This post is really long, if you can't tell, I really like this resource. It shows just another place where we can come in with our technology and techniques and impact researchers who maybe haven't explored what we can do now. Is this a resource that dedicated immunologists can mine for years and continually find new things to explore? It sure looks like it!

Saturday, June 10, 2017

The host-parasite interactome!



Okay....I've got 22 pages of hand written notes from ASMS I'm trying to sort out. Unfortunately...they look something like these notes from Alexey's awesome "Proteomic Dark Matter" talk....


...so they probably condense down to 1 type written page, but while I'm filing things into my spreadsheets as 1) Published and can talk about it 2) Watch for this paper!!!, etc., I stumbled onto a shockingly cool tool at bioRxiv!  You can read about it here.


Maybe host-parasite interactions aren't your thing. But "interactomics" is one of the big data approaches that the bioinformaticians are going to be using increasingly to make sense of the data we're depositing. Also -- it is just fun to spend 3 minutes here looking at this thing! (Direct link to the software is here!)

After making a bunch of awesome and wobbly networks between this parasite and that one, WikiPedia has a great break down of Interactomics here.  In general, it is a big picture approach to link what is changing. When we take our quantitative data out and put it into pathway analysis software like IPA or BioCyc, we're looking at known interactions that have been painstakingly constructed from historic studies. Interactomics steps away from all this known stuff and builds the networks statistically. If I say any more about it, I'll only embarrass myself further.

As someone really interested in host-parasite interactions -- I can't imagine a more appropriate application of these tools!!

Friday, June 9, 2017

The MS Bioworks App!


We finally updated our old personal cell phones to some fancy impossible to use things that are far too large, but have tons of memory. The first thing I installed? The MSBioWorks App!

If you don't have it (and don't have an old phone full of Pug and kitten pictures and no room for it) you should.

You can type in any protein by UniProt ID or Gene Symbol and it will find the protein (I'm not sure of the exact requirements, but it works when I use the universal gene identifier -- do they call that HUGO now?). You can choose the digestion conditions and then email the report to yourself!


It will predict peptide fragmentation patterns from ones you enter and modify them however you want, as well -- also with the direct email output with a surprising number of features!


Okay...I apologize to the App developer, this screenshot doesn't do your cool tool justice. I sent myself the results in Korean (cause why not?), but I converted the text to Unicode in Excel so it lost the formatting (my Excel doesn't have a Korean language pack installed). Lines 71 and 84 are the y2 ions at z=2 and z=3, respectively!
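
If you're curious what goes into a number like that y2 line, here is a small sketch of the standard y-ion arithmetic. This is not the App's code. It uses the usual monoisotopic residue masses with no modifications, and the peptide is a made-up example since the one in my screenshot isn't shown here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # mass of a proton


def y_ion_mz(peptide, n, charge=1):
    """m/z of the y_n ion: the last n residues plus water plus 'charge' protons."""
    if not 1 <= n <= len(peptide):
        raise ValueError("n must be between 1 and the peptide length")
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical example peptide, unmodified.
    for z in (1, 2, 3):
        print(f"y2 of PEPTIDEK at z={z}: {y_ion_mz('PEPTIDEK', 2, z):.4f}")
```

A triply charged y2 is not something you'd expect to see much of in a real spectrum, but a calculator will happily list it, which is presumably why it shows up in the report.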

Also, the App has a great news section for new papers. I have no idea who keeps this updated, but it is great! And way better than getting news here!




Okay -- but the thing I have ALWAYS loved this App and developers for is the calculators. I don't have a good screenshot, but if you have an LC line of this X diameter and Y length it calculates the volume of that line for you! It has been invaluable for me in the past and I can't thank this group enough for making this for us and giving it out for free.
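
Since I don't have a screenshot, here is roughly the math a calculator like that has to do. This is my own sketch of the plain cylinder-volume formula, not the App's code. The function names and the example tubing are hypothetical.

```python
import math


def tubing_volume_ul(inner_diameter_um, length_cm):
    """Internal volume of a tube in microliters.

    Cylinder volume V = pi * (d / 2)^2 * L, with d converted from um to cm.
    1 cubic centimeter = 1000 microliters.
    """
    radius_cm = (inner_diameter_um * 1e-4) / 2.0
    volume_cm3 = math.pi * radius_cm ** 2 * length_cm
    return volume_cm3 * 1000.0


def delay_minutes(volume_ul, flow_nl_per_min):
    """Time for the flow to sweep the given volume once."""
    return volume_ul * 1000.0 / flow_nl_per_min


if __name__ == "__main__":
    # Hypothetical line: 50 um inner diameter, 100 cm long, 300 nL/min flow.
    volume = tubing_volume_ul(50, 100)
    print(f"volume: {volume:.2f} uL")
    print(f"delay at 300 nL/min: {delay_minutes(volume, 300):.1f} min")
```

A meter of 50 um line holds just under 2 uL, which at nanoflow rates is several minutes of delay. That is exactly the kind of thing that is easy to forget when you swap out a transfer line.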