Wednesday, May 31, 2017

Fascinating paper outside our field!

As I'm getting things checked off my list for ASMS, there hasn't been a lot of free time...or sleep... However, I get this really anxious feeling if I don't post something interesting.

This is outside of our field and a bit older, but I think it has some ramifications for all scientists and it was new to me.

You can check it out open access here. 

Thank you Dr. M!

Tuesday, May 30, 2017

CLMS Vault -- View crosslinking data from multiple techniques!

I've rambled on here a lot about crosslinking recently. And I am absolutely in love with the new reagents and XlinkX 2.0 software and the soon-to-be-released XlinkX nodes for PD. Crosslinking experiments have never ever been easier than they are now.

However, lots and lots and lots of crosslinking studies have already been done and been publicly deposited. AND there are some of you out there who have been way more successful with other crosslinking reagents and 3d structural work without fancy reagents than I ever have been!

CLMS Vault is an effort to bridge the gap between different crosslinking studies and software and techniques to integrate all these data into a single user interface. You can check it out in this new paper here! 

A demo version of the software is available online with nice instructions and a web interface here. I think the only way to get the full version for full use is to download the Python package. You can get that here.

There is a lot of value in this package, but the thing that stands out as the most powerful in my mind is the meta-analysis prospects. If you are studying a protein that is ridiculously important and mysterious -- p53 would be a good example -- this platform would allow you to integrate the work that you are currently doing with the crosslinking work that previous people have done.

I haven't checked to verify these techniques are compatible or if the files are all publicly accessible (this is just an example), but wouldn't one platform be useful to see if the files and results from your crosslinking study are complementary with the results from these previous studies?

Yeah, it's Python, but you can learn how to load and run a Python package this nice in a day or two -- or find some nerd that can already do it in that same amount of time. Imagine if doing that would allow you to see that someone previously replicated your interesting finding and how much better you'd feel during the 6 months you are about to spend trying to validate it or explore it further?!?

Disclaimer: I don't have hands on with this software, I only scrolled through the demo and user manual -- I just think the idea behind it is fantastic!!!

Monday, May 29, 2017

Need an awesome high resolution dataset to test that label free quan workflow?

Big shoutout to Brett Phinney @UCDProteomics for this one!  Diverting social media platforms for science!

Want to make sure you're using that label free quan algorithm correctly? Want to test that awesome new algorithm you're developing?  Some awesome, expertly prepared and acquired samples might be exactly what you need. (PD 2.2 launch is just around the corner!)

This one is fantastic and described here!

My go-to is the Claire Ramus et al. dataset -- and the Shalit et al. dataset is pretty solid.

How do the 2 stack up?

Ramus dataset -- Yeast digest with Sigma UPS1 peptides spiked in -- ran in High/low on Orbitrap Velos

Shalit dataset -- E.coli complete digest spiked into HeLa -- ran on a Q Exactive.

The Shalit dataset is more complex, but with high resolution fragment ions, depending on where you bottleneck (searching versus peak alignment/extraction) it might search faster.

If I was running an LTQ Orbitrap and using high/low, I'd probably still stick to the previous dataset. If you're using high/high, the other might be a better test for you.

Best of all? Both datasets are publicly available!  The link to the Ramus dataset is here (ProteomeXchange PXD001819)

You can get the Shalit dataset directly here (it is ProteomeXchange PXD001385)

I love MCP!

There is no real substance to this post. Just espresso-fueled enthusiasm on this nice rainy holiday morning. Readers of this silly blog have pointed out before things like "wow, you sure seem to read a lot of MCP." Which is true. The other proteomics journals are, of course, awesome. And it makes me even happier to see more and more mainstream biology and medical journals cover proteomics, but I LOVE MCP. Yeah... it is a journal for mass spec nerds. Yeah...the real impact factor is kinda low for just that reason (many serious mass spectrometrists write very little -- this is primarily due to the fact that many mass spectrometrists are in core lab environments or industry), but everything about this journal is what I want it to be as a reader.

Wait -- there will be substance! The new guidelines for targeted protein quantification, set out by this team of people who know something about the topic, do kick in this week. Definitely review them before you submit anything featuring targeted quan! In my humble opinion, they aren't that restrictive. They are more focused on making sure that results can easily be compared and reproduced.

Probably the most common papers to make this dumb blog are MCP Early Edition articles. These are made available ahead of print. If you've submitted papers to MCP you know how hard the review process is. But even the best review panel in the world is going to miss some things once in a while. Twice this year that I'm aware of, an Early Edition article has been pulled after Epub when it was discovered something was missed. Both times what was missed was so relatively minor compared to the story and impact and the overall conclusions of the studies that I just had to laugh about it! (The second one inspired this post today.)

I hope this isn't taken as any sort of criticism of the other journals out there -- it isn't meant to be. It is just a rambling appreciation of the work the editors and review team of this specific journal put into making and keeping this journal at just a ridiculous level of technical quality. In case you're worried you're just being jerks sometimes, I want to state on the record (does this count as a record?) that it is appreciated.

Sunday, May 28, 2017

The Manual of Cardiovascular Proteomics

My local library finally got this book in a few weeks ago!

I'm never going to read the whole book. Chapters like "a history of proteomics" and reviews on bottom up and top down proteomics aren't meant for me. They are written for cardiovascular researchers who are wondering what this proteomics thing can do for them.

What I have read is pertinent to these questions: Why would you want to do cardiovascular proteomics? (the intro and first chapter) and chapter 5 -- Vascular proteomics.

Let me start off by stealing this kind of shocking chart from chapter 1 (authors or Springer, please let me know if you want this image taken down):

This is the number of studies that have been done in this field! Like all of proteomics -- more stuff all the time, but when you consider that a lot of cardiovascular diseases are environmentally driven -- not genetic or driven by mutations -- it is easy to wonder if this is nearly enough work in this field!

Chapter 5 is really interesting because I never thought to ask the question -- how would you even get cells from a blood vessel?!?!?  Is there enough material present to do subcellular fractionation? What a useful resource if you're sitting at your instrument, minding your own business, and someone emails you a request to do some vascular mitochondrial proteomics!

Whoa! I am ever glad that I skimmed further in the book!

This chapter from Arrell and Terzic at Mayo is useful for everyone! 

You can find this chapter here! This is a really useful review of modern downstream analysis tools, many of which I'd never heard of before. Even better? It pretty much starts with a glossary of network analysis and other really useful terms so that you're following the language and concepts they use in the review.

All in all, this book is a really nice resource. If you are thinking of putting together a graduate course in cardiovascular proteomics, I think I found your textbook!

Friday, May 26, 2017

Soil metaproteomics of the ground where truffles come from!

I could ramble about the little I actually know regarding metaproteomics for quite a while. And I'm going to get to it next week! I'm not lacking in enthusiasm for this exciting new field, but I could definitely increase my knowledge-base.

The best part about metaproteomics papers is that you can learn SO MUCH about the environment, because the authors inevitably have to explain something like why, exactly, they are optimizing protein extraction techniques from this one patch of soil...

Case in point:

Before stumbling across this awesome open access paper, my complete and total knowledge of black truffles was that they are expensive, have a really unique smell, and that you hunt them with pigs. This last tidbit of knowledge courtesy of TaleSpin.

Thanks to this paper, I know a lot more. The fungus forms a commensal relationship with a tree, and a tree that will produce truffles will sometimes have a bare area around it. They call this area a brûlé.

These researchers set out to understand what is different about the brûlé compared to other places in the ground. In metaproteomics, you typically don't know what you're starting with. Here, they take samples of soil from inside and outside the brûlé and extract all the proteins from the soil that they can. There may be thousands -- maybe millions -- of different organisms in this soil sample -- bacteria, fungi, who knows?!? And you do proteomics on it!

The biggest part of this paper is the optimized sample and data processing methodology. They do what I would do -- when in doubt, FASP it! -- but first they dry the soil, pulverize it, and do some chemistry to it:

What can you learn from proteomics on an unknown mixture of an unknown quantity of organisms? The tools and techniques at the data processing level are where they can make predictions of what fungi are present and in what relative amounts! On top of that, there are a lot of proteins that are extremely well conserved among all life forms. They can find what molecular functions are the most represented -- and they find a striking difference at the GO level between the soil inside and outside of the brûlé: binding processes are massively over-represented within the area of interest, compared to the outside.
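If you've never done this kind of GO-level comparison, the arithmetic is simpler than it sounds. Here's a minimal sketch in Python -- the term names and counts are completely made up by me, not taken from the paper -- of comparing normalized GO term abundance between two samples:

```python
from math import log2

def go_enrichment(inside, outside):
    """Compare GO term counts between two soil samples.

    Returns log2 ratios (inside vs outside), normalizing each sample
    to its total count so differences in overall protein yield
    don't masquerade as biology.
    """
    total_in = sum(inside.values())
    total_out = sum(outside.values())
    ratios = {}
    for term in set(inside) | set(outside):
        # add-one smoothing so terms seen in only one sample don't divide by zero
        f_in = (inside.get(term, 0) + 1) / (total_in + 1)
        f_out = (outside.get(term, 0) + 1) / (total_out + 1)
        ratios[term] = log2(f_in / f_out)
    return ratios

# Made-up counts purely for illustration
inside = {"binding": 120, "metabolic process": 80, "transport": 10}
outside = {"binding": 30, "metabolic process": 75, "transport": 12}
ratios = go_enrichment(inside, outside)  # "binding" comes out strongly positive
```

A real analysis would use a proper enrichment test on top of this, but the log ratio is usually the first thing you look at.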

This is a sample of what their output looks like.

How do you translate this into biological relevance? You've got me! This is where you hand these results and the writing back to the fun fungi guy to interpret, but at a surface level what seems interesting is that although there doesn't appear to be much life in the brûlé area -- at least not much vegetation -- there is some serious metabolism happening in the soil! The micro-organisms are getting energy from somewhere and are hard at work.

They go much more in-depth and dissect individual GO processes here. They also previously did metagenomics on this soil and comparing the individual species (or..genus) to these results gives them more insight than I have the capacity to absorb.

This is just another cool paper showing where our tools and techniques can answer questions that I know I never thought about asking!

Thursday, May 25, 2017

Elegant overview of the proteome universe!

As you can tell from a blog post or two, I'm not a fantastic writer. I'd like to think I'm okay when I'm not in a hurry, but these authors can write!

If the title of this awesome Open paper didn't draw you in, I'm gonna guess you didn't grow up wanting to be this guy...

Once you get past a title that great -- this isn't fluff. This is one of the single best reviews of why biological complexity and the proteome are so intrinsically intertwined that I have ever read. It is easy to forget the expectations we once had for what doors the Human Genome Project would open into our understanding. What it opened was a door that showed how little we currently understand.

Look, even if I wasn't being lazy with the blog today, I can't read this paper over my coffee and do any aspect of it justice. If you want to read a great perspective paper on how far we've come -- and how amazingly far we still have to go, I can't recommend anything more.

Wednesday, May 24, 2017

Is de novo sequencing already a viable alternative to database searches?

If you look you might see some ravings on this blog regarding the DeNovoGUI, another incredible free resource out of the CompOmics group.

If you're interested, the original paper is here.

Since that paper came out the DeNovoGUI has expanded and incorporated more algorithms. When I booted my copy today it told me there were new updates as well! (Downloading now)

We all know de novo searching algorithms are out there. I know more and more labs that are using PEAKS as their primary software -- meaning PEAKS has come a long way! I think the consensus maybe 5-6 years ago was -- yeah, it was a nice tool, but it was your fallback plan if you didn't find what you were looking for with database tools.

As a sign -- and thorough measurement of this possible shift, check out this new paper from Thilo Muth and Bernhard Renard (the latter is a fun name to say! try it 3 times fast!)

The question they set out to answer -- are the de novo algorithms, right now, a good alternative to the database tools?

To test it they get 4 publicly deposited datasets. All were generated on Orbitraps. 3 are high/low (Orbitrap for MS1 and ion trap for MS/MS) and one is high/high. Yeast, human, mouse, and some weird thing -- oh, it's an extremophile! cool! Pyrococcus furiosus. My last Latin class was {redacted} years ago, but I'm pretty sure its name means something like -- "we found this tube shaped thing growing in an active volcano." I may need to check these RAW data out later!

For comparison they use PEAKS, Novor and PepNovo -- 2 of which can just be run in the DeNovoGUI (but they may have run them some other way, I didn't check).

To establish their working base, all the data was searched with MS-GF+ and X!Tandem. I'm a little fuzzy on the details (honestly, I skimmed a little...big day ahead!), but I think they took the peptide spectrum matches that both engines agreed upon.
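For anyone wanting to try this at home, the consensus-and-scoring logic is easy to sketch. This is my guess at the general idea, not the authors' actual code -- the spectrum IDs and peptides below are invented, and I collapse I/L since they're indistinguishable by mass:

```python
def consensus_psms(engine_a, engine_b):
    """Keep only spectra where two database engines agree on the peptide.

    engine_a / engine_b map spectrum IDs to peptide strings; I and L
    are collapsed because they have identical masses.
    """
    def squash(p):
        return p.replace("L", "I")
    return {s: engine_a[s] for s in engine_a.keys() & engine_b.keys()
            if squash(engine_a[s]) == squash(engine_b[s])}

def denovo_agreement(denovo, consensus):
    """Fraction of consensus spectra where the de novo call matches."""
    def squash(p):
        return p.replace("L", "I")
    hits = sum(1 for s, pep in consensus.items()
               if s in denovo and squash(denovo[s]) == squash(pep))
    return hits / len(consensus) if consensus else 0.0

# Toy example: two engines, then a de novo result scored against the consensus
msgf = {"s1": "PEPTIDEK", "s2": "LVAMER", "s3": "AAAGGR"}
xtandem = {"s1": "PEPTIDEK", "s2": "IVAMER", "s4": "TTTK"}
truth = consensus_psms(msgf, xtandem)                               # s1 and s2 survive
rate = denovo_agreement({"s1": "PEPTIDEK", "s2": "GVAMER"}, truth)  # 0.5
```

The real benchmark surely handles modifications, scoring thresholds, and partial sequence matches too, but this is the skeleton of the comparison.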

There is a TON to be learned from this paper -- including some really interesting info on what peptide sequences modern de novo engines have the most trouble with, which ones scale the best (more processors meaning much more performance), etc., etc.,

But check this out. Oxford Bioinformatics -- I love your journal and only recently discovered what a treasure trove it is (thanks @PastelBio!). If it is a problem that I used one of the images, please email me to receive my apology and instant removal, but this is an awesome chart!

Again, I'm not 100% on the metrics here -- but this looks pretty darned good, right? I found PepNovo really surprising. I've used it a lot over the years and it was the main reason I started using the DeNovoGUI (cause I did just have an old PC that only ran this program!), but I use PepNovo+ and I don't think that these authors did....

Ignoring this, PEAKS and Novor did REALLY well! Even SequestHT and Mascot disagree on correct matches by 10%-15% or so (crude numbers from long ago when I still had access to both -- don't hold me to them). 60-70% sounds pretty darned good -- given NO database!

The best data -- I won't show here -- is the crazy volcano creature. In an example where we don't have a good database to use the classical engines with (I am imagining them trying to kill this creature to get its DNA out. After years of failure by every international team, President Michelle reveals the truth she's known all along -- that a partially completed deathray had been abandoned in a secret facility in Siberia at the end of the Cold War. An international treaty is established and 2 scientists from each country are selected to join the team and complete work on the project. By diverting all the electrical consumption of Europe for 2 weeks (almost 4 minutes worth for NYC or Vegas) they can accumulate enough power to fire the completed deathray one time. Doing so will also destroy the device and all the resources required to ever build another, but they know it is the only chance they will ever have -- and they fire the ray, finally breaking the Pyrococcus cell wall. Their hopes are shattered, however, when they find the process shreds the DNA completely, leaving only the proteins(??) intact...leaving us right where we started...) and our only choice is to use de novo tools, the comparison between engines is really interesting -- and maybe the most pertinent.

In summary -- maybe the current generation de novo algorithms aren't 100% ready to replace our current database-driven tools, but WOW have they ever gotten good!

Tuesday, May 23, 2017

Two new studies reveal some of the inner workings of toxoplasmosis!

Toxoplasmosis is an absolutely fascinating and terrifying disease. It has been a big topic in the popular science realm -- due to how weird it is. Here is an NPR article on the potential link between this disease and mental illness. Most interesting to the mainstream media have been the odd (and controversial?) links between infection and subtle changes in human behavior. Like this from Scientific American, which isn't the weirdest one I've heard about.

Toxoplasma gondii (Tg) is kind of a mystery, though. It has a 69Mb genome and a ridiculously complex life cycle....

Two new studies took completely different approaches to try and figure out how this weird little thing can do all the stuff that it can.

I took the picture at the top of this post from the newest one, which is in this month's Elsevier JOP and can be found here.

This is an interesting paper for a couple of reasons. The first being that -- Tg can infect just about any cell with a nucleus. So...if you've got a bunch of gerbils around, might as well see what it does to that one, right? Maybe seeing the effects that infection has on different organisms will help reveal some new information?

Very minor note -- the authors state a 5600 TripleTOF was used for this work and mention features in terms specific to the 5600. The resolution settings and the fact that the output was .RAW indicate this was, however, performed on a Q Exactive. Just a little mixup in the methods section that I could have figured out a little quicker if the data had been publicly posted.

They find a fascinating number of differential proteins. Apparently Tg goes crazy in a gerbil brain with big changes in proteins linked to oxidative stress response and others! They check these with RT-PCR and Western and it looks pretty convincing -- and all sorts of terrifying!

The second study -- nuts, I'm running out of time this morning -- is this one in MCP.

I don't really have time to do this one justice at all. In a nutshell, they track a methylation on Arginine that contributes to how this thing regulates itself! They make some mutations to verify that this is involved. Interestingly, there appears to be a lot of crosstalk between this methylation event and phosphorylation. The methylations were found in a previous study with phospho-enrichment.

The downstream analysis is really convincing that this is a major component of parasite control. You don't have to do too much classic genetics + high resolution proteomics to convince me you're on the right track! If you are interested in this (or related) parasites, this one is a gold mine! What else controls itself with this mechanism!?!?

 These were both really interesting reads inspired by my fear of anything making alterations to my poor glitchy brain and by the tiny natural reservoir of the parasite I found lost and dehydrated in the woods this weekend....

...after thorough review Isaiah Tomcat appears to have been accepted into the pack...with the final requirement being that he had to promise that he wouldn't infect anyone with brain parasites!

Monday, May 22, 2017

New York GC Hackathon for proteomics!! Applications due today by 5pm!

Are you a bioinformatician -- or training to become one? NCBI and the New York Genome Center are having an awesome hackathon June 19-June 21.

I just heard about this this morning -- and applications are due by 5pm TODAY!

You can check out the NCBI posting here! 

Interact with great teams, finally get that great idea out of your head, and make an impact!

neXtprot -- Fast new peptide uniqueness checker!

Hey! I just found this cool peptide that is upregulated 11-fold in all these patients with this condition. It has the full y ion spread and looks great!

Want to instantly check out whether it is unique to your organism (or something that shouldn't be there)? Mathieu Schaeffer et al. just provided you the easiest and fastest way I've seen yet! You can read about it in this open paper here (it's 2 pages!)

Or you can go to neXtprot and type your peptide sequence into the box!

It has to be at least 6 amino acids long -- and don't put spaces in your sequence, because a space marks the start of the next peptide (which means it can take a bunch of peptides at once -- up to 1,000!)
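If you're feeding it de novo output, a few lines of Python can enforce those rules before you paste. This is just a convenience sketch based on the constraints described above (6-residue minimum, space-separated, 1,000-peptide cap) -- the function name and defaults are mine:

```python
def format_for_uniqueness_check(peptides, min_len=6, max_batch=1000):
    """Prepare a list of peptide calls for a batch uniqueness query.

    Drops anything under the minimum length, de-duplicates while
    preserving order, and caps the batch; peptides are joined with
    spaces because a space marks the start of the next peptide.
    """
    seen = []
    for pep in peptides:
        pep = pep.strip().upper()
        if len(pep) >= min_len and pep not in seen:
            seen.append(pep)
    if len(seen) > max_batch:
        raise ValueError(f"{len(seen)} peptides exceeds the {max_batch} batch limit")
    return " ".join(seen)

# Short and duplicate entries are dropped automatically
query = format_for_uniqueness_check(["PEPTIDEK", "tiny", "PEPTIDEK", "LVAMER"])
# "PEPTIDEK LVAMER"
```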

If you do it right, your output looks something like this:

My sequence is completely unique to just one protein from one species.

You can download the output as a .csv file. Probably not that useful if you've only entered 1 peptide sequence, but if you are dumping in your de novo results....this could be invaluable!

Sunday, May 21, 2017

ProteoSign -- Powerful, easy statistics for Proteome Discoverer and MaxQuant!

Have you looked at your output report from Proteome Discoverer -- or even MaxQuant and said something like "Wow, it would be awesome if I could easily get some super advanced statistics on all this quan without having to work very hard?"

If so -- I've got GREAT news for you -- and it's called ProteoSign! You can check it out in this nice open paper here.

These authors want to supply you with great differential statistics in a fast, simple and free web interface. They set up a nice little server online somewhere that you can access directly here.

At this point, ProteoSign appears set up for supporting PD 1.4 and a couple versions of MaxQuant only -- but -- since it is taking the text file output from PD -- I think that it would be able to take data from the new versions as well -- heck, I think if you matched the formatting ProteoSign requires you could put in data from any proteomic software with quantification (but I haven't tried yet).

If you are someone like me who loves to just start dumping data into a program before reading any instructions at all -- you'll be very impressed with the speed of the server interface, and you can make a lot of very large and embarrassingly uninformed mistakes about how the whole thing works. If you are persistent with this strategy (there is some value to pressure-testing an online resource, right? Please don't take this as encouragement to skip the instructions -- I was just really excited to try it out!), you can accidentally hover over very nice instructions that will tell you what you should be doing. They will make you feel dumb and teach you how to use the software at the same time!

Some statistical tools like ANOVA and PCA and volcano plots are coming to PD 2.2, but if you are using PD 1.4 and want to keep using it -- here are those tools. There are features in ProteoSign that don't have analogues in the upcoming PD version, such as the (really cool) replicate scatterplot as well.
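If you've never looked under the hood of a volcano plot, it boils down to two numbers per protein: a log2 fold change and some measure of significance. A rough Python sketch with toy replicate values (I use a Welch-style t-statistic rather than a proper p-value, just to show the shape of the calculation):

```python
from math import log2, sqrt
from statistics import mean, variance

def volcano_points(control, treated):
    """log2 fold change and Welch-style t-statistic per protein.

    control/treated map protein IDs to lists of replicate abundances.
    A real tool converts t to a p-value; ranking by |t| is enough
    to see which proteins land in the plot's corners.
    """
    points = {}
    for prot in control.keys() & treated.keys():
        c, t = control[prot], treated[prot]
        fc = log2(mean(t) / mean(c))
        # standard error of the difference of means (Welch)
        se = sqrt(variance(c) / len(c) + variance(t) / len(t))
        points[prot] = (fc, (mean(t) - mean(c)) / se if se else float("inf"))
    return points

# Made-up abundances: P1 changes 4-fold, P2 is flat
ctrl = {"P1": [100, 110, 90], "P2": [50, 55, 45]}
trt = {"P1": [400, 420, 380], "P2": [52, 50, 51]}
pts = volcano_points(ctrl, trt)
```

P1 ends up far right and high on the plot; P2 hugs the origin, which is exactly the separation a volcano plot is for.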

I really appreciate the work the authors put in here. They looked at two pieces of software with thousands of users around the world and thought -- "hey, let's add a bunch of tools to them and make it really easy for users to get and use those tools!" Stuff like this explains: 1) Why this is the coolest field in all of science 2) Why this field continues to advance at the amazing pace that it is.

Disclaimer: Yes, I know there are several downstream statistical tools for MaxQuant. The emphasis of this post was -- great statistics without having to really learn anything!

Saturday, May 20, 2017

How to view or open Proteome Discoverer results or files

This question pops up a lot. This post is really here to help people who might ask Google the question in the title: How do you open or view Proteome Discoverer results or files?  (Does it help the crawler if the words are in the text as well?)

The common scenario:

Your core lab or collaborator ran some samples for you on that mass spec thing and provided you with a spreadsheet of results. After evaluating this you find some really interesting results and you need to explore a protein or pathway further. Your pathway of interest is upregulated. Is the key post-translational modification present? Is the shortened proteoform there as well? General proteomics relies on upfront information. Searching every post-translational mod or sequence variation can explode the computational time, so that data might very well be there, but more focused searching or filtering will probably be necessary to find it.
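As a concrete example of that focused filtering -- if you have the text export of the results, even a few lines of Python can pull out the proteins carrying a modification of interest. The column names below mimic a typical Proteome Discoverer-style export but may not match your spreadsheet exactly, so treat this as a template:

```python
import csv
import io

def proteins_with_mod(report_text, mod="Phospho"):
    """Filter a tab-delimited peptide report for a modification of interest.

    The headers ('Sequence', 'Modifications', 'Master Protein Accessions')
    are assumptions about the export format -- adjust them to whatever
    your spreadsheet actually contains.
    """
    hits = set()
    for row in csv.DictReader(io.StringIO(report_text), delimiter="\t"):
        if mod.lower() in row.get("Modifications", "").lower():
            hits.add(row["Master Protein Accessions"])
    return hits

# Tiny fake export for illustration
demo = ("Sequence\tModifications\tMaster Protein Accessions\n"
        "PEPTIDESK\t1xPhospho [S8]\tP04637\n"
        "AAAGGR\t\tQ9Y6K9\n")
phospho_proteins = proteins_with_mod(demo)  # {'P04637'}
```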

You can communicate all this back to the lab and get back in the data processing queue OR you can do it yourself. If they processed the data in Proteome Discoverer you can hike over with a thumb drive, get the file and look more into the data yourself with the free Proteome Discoverer Viewer.

You can get the Proteome Discoverer Viewer from two different places.

1) The Thermo-Omics Portal
2) The Thermo Flex Net

You'll need to register at one of these, wait for your approval, and then download Proteome Discoverer (PD).

Once inside you have some choices. You can follow 2 different strategies:

1) Get the version that the mass spec nerds processed your data in.
2) Get the newest version -- because new PD will always open old PD results.

My suggestion is #2. As of the writing of this post, this will be PD 2.1. PD 2.2 should be out in the middle of 2017.

The newer versions of PD provide a lot more power to you, the operator of the viewer. There are specific PD viewer keys, but the best thing to do is to just go ahead and install the 60 day demo key.  For 60 days you have pure, unstoppable power and can do anything with your data that you want. At the end of the 60 days most of the searching features will expire, leaving you with the Viewer functions -- basically the "Consensus" workflow nodes and the ability to open files and search through them.

I recommend the demo key version for another important reason. During these 60 days you can add free nodes to PD created by amazing external groups like OpenMS and IMP. IMP has a whole suite of nodes that will continue to function after the demo key expires. It provides a ton of capabilities to the software -- the ability to search with MSAmanda, label free quan with PeakJuggler, differential statistics and a really advanced node for reporter ion (TMT/iTRAQ) experiments.

The IMP PD-nodes (I sometimes call this the PD free version) are really incredible and can be found and installed online (under the IMP nodes collection).

The free OpenMS community nodes are also fantastic. While this software is often associated with label-free quan only, this is not the case. There are some functions here that are truly unique, including a node for protein-RNA crosslink detection. Go around any big genomics department and I bet at least someone there is looking at this problem using DNA-based tools. With the right sample prep and this software you can approach this from an alternative and complementary direction.

If you think you might ever want to search proteomics data on your own -- I highly recommend you take the ten minutes to install at least the IMP PD nodes and it won't hurt to add OpenMS to your free user interface!

Other software exists as well -- and more is coming -- I don't mean to slight anyone I didn't mention here. The point of this post is for people outside of proteomics -- if you paid for proteomics samples or are collaborating with a mass spectrometrist there are sometimes communication gaps. This isn't due to any shortcomings on either side. It is just that 14 years of training in analytical chemistry for the mass spectrometrist didn't leave time to become a specialist in your biological field -- and you probably don't realize what level of expertise and jargon complexity you are using. If this gap is hindering your progress, you have very nice free tools to open, view, filter and even re-interrogate those beautiful, dense, and probably foreign (and crazy looking) data files yourself.

If you did download the newest PD and your data was originally processed in PD 1.4, I made a short walkthrough here that shows you how to open these results.

To you computational nerds, I would also like to mention that PD result files are simple SQLite files with a swapped suffix. You can open them with any database tools you are comfortable with that support this file type.
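For example, here is a quick peek at what's inside one, no PD installation required. Table names differ between PD versions, so treat this as an exploration sketch:

```python
import sqlite3

def peek_pd_result(path):
    """List the tables inside a Proteome Discoverer result file.

    PD results are ordinary SQLite databases with a different file
    suffix, so the standard library can open them directly. Inspect
    the table names before writing queries -- they vary by version.
    """
    with sqlite3.connect(path) as db:
        rows = db.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
        return [name for (name,) in rows]

# e.g. peek_pd_result("MyAnalysis.pdResult")
```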

Friday, May 19, 2017

Global proteome-scale crosslinking! Thousands of protein-protein interactions in one go!

Ever tried to crosslink some proteins, digest them, and figure out what proteins were interacting with what (whom..?)?

If you have, you should probably understand why I've been so excited about these new reagents, instrument methods and data processing software that make this way easier!! And the first paper I know about that uses this is now out!

I've talked about the MS-cleavable reagent they use on the blog previously as well as the data processing workflows. We saw some applications at ASMS, but this paper does true global scale work!

They crosslink an entire E.coli proteome as well as a human cell line and pull out information on thousands of protein interactions! They do reduce the complexity by SCX fractionation -- but thousands of protein-protein interactions in one experiment?!? Come on -- if you've pulled this off before, you are way better at this than me!

Have you been down this road before and are skeptical? Don't worry, I don't blame you.

Luckily, the awesome people at the Heck lab made XlinkX 2.0 a publicly available web application that you can just go and use here.

They also put the example files from the E.coli experiment (converted to MGF) up at this awesome site so you can check it out!

This team has been actively collaborating to bring the XlinkX 2.0 code into Proteome Discoverer as additional nodes if you want to search data like this within a framework you already know!

Thursday, May 18, 2017

Cross-sectional analysis of the salivary proteome and patient associations!

This is really cool and I think it might be foreshadowing of what part of the proteomics field may look like in the future!  You can check it out ASAP at JPR here. 

What is it? They took saliva from a bunch of people (almost 200, I think). They digested the proteins and did single shot proteomics on the peptides with an Orbitrap Velos with label free quantification. All normal stuff.

Where it gets really interesting is in the downstream analysis. They took all the info that they had about these participants they got the saliva proteomes from -- including the data that you can see in the title above -- and did fancy statistics based on the proteins identified and their relative quantification.

The genomics people are doing TONS of stuff like this. I bet you've heard of the GWAS stuff (Wikipedia article here). In these studies they take a snapshot of the genes, typically via SNP arrays or low read genome sequencing, of a bunch of people. In the simplest example, they do this on a group of people without a disease and a group with a disease and they try to figure out what areas of the genome associate with the disease. In the bigger and more ambitious studies, they just collect lots of info about people so they can separate them into classifications and then get genomic information on every participant they can afford.

This cool study takes a page out of that playbook, but instead of getting a picture of a genomic region that might have extra copies (which...might be transcribed...and part of that might be translated...), they cut out the middle steps and go right to the proteins!

What do they find? Protein expression that strongly associates with some of these characteristics mentioned in the paper title! For example, 30 proteins can be associated with the saliva donor's BMI!

This is a nice method paper and proof-of-principle for this kind of study. The exciting part to me -- It doesn't take much imagination to come up with a way to apply it in a clinical sense, right? Collection of the sample couldn't be easier. We already know how to do the sample prep and analysis. Association with different diseases could be used to point us to individual proteins or patterns of proteins that could be early disease predictors.  And maybe patterns are the key point here, and we can easily steal the tools the genomics teams are using for GWAS and divert them to finding patterns in the protein data.
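To make the association idea concrete, here's a minimal sketch (in Python, with entirely made-up numbers -- this is not the paper's actual data or their statistical method) of how you might test every protein in a label-free quant matrix for association with a donor trait like BMI, correcting for the fact that you're testing hundreds of proteins at once:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_donors, n_proteins = 200, 500

# Synthetic stand-in data: donor BMI values and a log-intensity
# matrix (donors x proteins); protein 0 is made to track BMI
bmi = rng.normal(27, 4, n_donors)
intensities = rng.normal(20, 2, (n_donors, n_proteins))
intensities[:, 0] += 0.3 * (bmi - bmi.mean())

# Correlate each protein with BMI, then Bonferroni-correct
pvals = np.array([stats.pearsonr(intensities[:, j], bmi)[1]
                  for j in range(n_proteins)])
hits = np.where(pvals < 0.05 / n_proteins)[0]
print(hits)  # indices of proteins associated with BMI
```

Real studies would use fancier models (covariates, FDR instead of Bonferroni), but the shape of the analysis -- one test per protein, then multiple-testing correction -- is the same one the GWAS folks have been running for years.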

This is a great forward-thinking study that I couldn't be more excited about!

Wednesday, May 17, 2017

TMT-11plex kits are live!

Slow week for the blog as I get ready for that thing in Indianapolis.

Short bit of good news, though! The complete TMT-11plex kits (TMT is a trademark of Proteome Sciences) are live now! You can order them here!   I tried to update the workflow image and I think it turned out okay...

Monday, May 15, 2017

Going to ASMS? Have you downloaded the App yet?

Going to beautiful Indiana in a few weeks? Here is my first public service announcement.

The App is live and it is the best iteration I've seen so far!

You can build your calendar from the App on your phone (if you can remember your password) or you can build it on your PC with the online planner and then the App just pings you reminders when it's time for that talk or poster you're dying to see!

Direct Online planner link! 

Quantifying proteins in dried blood spots after decades of storage!

...and now for something completely different!

I just find this one all-around interesting. First of all -- how did they find blood spots from newborns that were 40 years old?!? Are these commonly acquired and stored in normal clinical practice? If so, what a potential resource!

The paper is an analysis of the feasibility of large scale sample biobanks using blood spots dried on paper. If you were going to set up a biobank, this would be a cheap and easy way to do it. Finger prick, drop of blood on paper, store it.

To study the stability of the proteins they use an immunoassay technology called the Proximity Extension Assay (PEA) and assess 92 proteins across samples going back as far as 50 years (I think I remember it saying 50 years in the paper somewhere), looking at different storage and acquisition techniques. A large number of the proteins appear to be genuinely unaffected by degradation over time. Another population of proteins, however, appears to decline with almost predictable half-lives. I don't have time today to read up on this PEA thing, but it appears to be an established technique despite the lack of a Wikipedia page on it.
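A "predictable half-life" just means the signal falls off exponentially with storage time. As a toy illustration (made-up numbers, not data from the paper), you could estimate a protein's storage half-life by fitting log-signal against sample age:

```python
import numpy as np

# Hypothetical protein signal that decays as signal = s0 * 2^(-t / half_life)
ages = np.array([0, 10, 20, 30, 40, 50], dtype=float)  # years in storage
signal = 100.0 * 2.0 ** (-ages / 25.0)                 # built-in 25-year half-life

# In log2 space the decay is a straight line; the slope gives the half-life
slope, intercept = np.polyfit(ages, np.log2(signal), 1)
half_life = -1.0 / slope
print(round(half_life, 1))  # → 25.0
```

With noisy real measurements you'd get a fitted estimate with error bars rather than the exact value, but the same straight-line fit is the standard way to pull a half-life out of decay data.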

1) Impressed these resources might exist commonly out there in the world for analysis
2) Ultra-impressed that you can quantify proteins 30/40/50 years down the road from them!

Sunday, May 14, 2017

An example of what you can do with the OmicsDI!

I was really excited about the OmicsDI paper, but I realize at this point in time we're kinda being inundated with terms like "Big Data" and everyone's got a "Database" and an "API" and... it does seem like these terms mean different things depending on who you are talking to...and this explosion of new terms and ideas makes it a little hard to separate the really good from the blustery jargon. 

First off -- OmicsDI can help us with one of these problems. It does so by bringing a bunch of databases together. Here are some examples of what you can do with it!

I'm going to pick a random cell line. Let's go Colo205. Just typing the cell line into the little search bar gives me access to a ton of information: 

There are 3 proteomics studies and 5 transcriptomics ones that we can directly access via OmicsDI that feature this cell line! Just from looking at the studies (T -- transcriptome, P -- proteome, etc.) you can see an overview of each study and learn some stuff about them.

For example, 5 studies come from ArrayExpress. Microarrays! Right from the start you can see which studies here won't provide any meaningful data whatsoever and you can move right along (kidding, of course!)

Clicking on the provided link will take you to ArrayExpress (which, btw, I'd never heard of before), where you can see a summary of the study and direct links to download the processed and RAW data from the study.

If someone had said to me -- cool proteomics, has anyone done transcript analysis on this model before? I would have started like this:

Which, btw, doesn't lead you to anything about science on the front page at all. OmicsDI has already made me more efficient in this hypothetical. I was a little bummed that there were only 7 studies on this cell line in the repository. Guess what? There is some disagreement regarding the nomenclature of the cell line. Is it Colo-205 or Colo205? If I type the search "Colo205 or Colo-205" I get 10 more studies.

Including another database I didn't know about (Expression Atlas). Let's follow that one!

It leads me to a table that I can search in the web interface or download in its entirety. It contains the expression levels of 24,000 transcripts across a ton of cell lines, with a heat map indicating relative up/down regulation.

Remember that theoretical question I mentioned above? Did anyone find this in the transcriptomics? Take the proteins you found that were up- or down-regulated and search them here. The data is at your fingertips!  No looking for a database you didn't know existed!