Thursday, January 17, 2019

Phospho-glyco-proteogenomics of early onset gastric cancer!

I've been waiting for the embargo to be lifted so I could download this data from the CPTAC Portal for what seems like a while now. And the paper is now out, the embargo is lifted and ---

--that's a lot of authors! You definitely need at least this many people to put something like this together!

Now that the RAW data is being very nicely deposited into pre-determined and organized folders by Aspera manager -- wow-- the Q Exactive heavily fractionated N-glycoproteomics files sure look pretty ---

I still have no idea how they did this. I can't just download the data and start pressing buttons -- it's time to read, I guess....

HOW DO YOU INTEGRATE N-Glycoproteomics and PhosphoProteomics into your ProteoTranscriptomics?

It's got to be in here, because this is what they did.

Want to make it more fun? This is a lot of patient samples. And iTRAQ 4-plex!?!?!

What an ambitious project....and a beautiful output that we can mine for years going forward!

Whoa! Shoutout to @ScientistSaba for providing this awesome video related to this study!

Wednesday, January 16, 2019

Rapid Microbial ID and Antibiotic Resistance Prediction by LIPID signature!

Wait. This is a totally new idea, right?

Biotyper things work by culturing bacteria, and looking at the protein signatures, right?

What if the lipid signatures were just as valid? Then you'd have a complementary technology. Okay -- so the idea isn't totally new -- Goodlett lab has been working on it for at least a few years and last year produced this open access paper on it I appear to have missed.

The cool part about this new paper is that they figured out how to make this work fast! Like hospital friendly level fast!

Wednesday, January 9, 2019

Central dogma rates may explain why transcript measurements seem like such junk science!

Okay -- this new Nature Communications study might just be totally awesome.

I'm going to go with "might" for the following obvious reasons:

1) The terminology is outside of my wheelhouse
2) The math is so far away from the house where I store my wheels that it...(wait...what the hell is wheelhouse anyway..?...) that it could be complete and total gibberish and I would never ever know. My eyes stopped focusing as soon as I got to the page with the formula things.


1) A world class expert in these transcript words sent it to me with exclamation points on it.

2) Every person in proteomics has tried, at some point, to correlate transcript signal to real protein measurements and has walked away thinking "the transcript people are clearly just making numbers up" or "the transcript people are clearly just making everything up and one day we'll laugh about the trillions of dollars science spent on this generation of sequencing technology the same way we look back on global microarrays -- as expensive technology that yielded little or nothing". Which, hopefully, isn't true. I'm not saying it is! That's just what I might think for just one second when someone says "this MiSeq says there are tons of transcripts so there will be tons of protein" and 15 SILAC peptides say "....NOPE! This is 1:1, yo". Maybe this awesome paper explains what's up!

3) This study might be the great unifier (which, apparently, isn't a word). This group looks at loads of transcription/translation rates in rapidly growing cells from bacteria, yeast, my favorite model organism for human disease, and humans -- and finds there are mechanistic reasons that genes would have high transcription but also low translation. And this might be gene specific. And it might be possible using these techniques to develop a metric for each gene, as in, this is a gene that will correlate in transcript abundance and protein abundance -- and this one will not. What a useful database that could be, right?
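
To make the "metric for each gene" idea a little more concrete, here is a minimal Python sketch of what a per-gene lookup could look like. To be clear, this is NOT the method from the paper (which works from transcription and translation rates). It just computes a plain Pearson correlation between transcript and protein abundance across samples and flags each gene. The gene names, the numbers, the 0.7 threshold, and the function names are all made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile gives no usable correlation
    return cov / sqrt(var_x * var_y)

def flag_genes(transcripts, proteins, threshold=0.7):
    """Label each gene by whether transcript abundance tracks protein abundance."""
    flags = {}
    for gene, rna in transcripts.items():
        r = pearson(rna, proteins[gene])
        label = "tracks" if r >= threshold else "does not track"
        flags[gene] = (label, round(r, 2))
    return flags

# Hypothetical abundances across four samples (invented genes and values)
transcripts = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
proteins = {"GENE_A": [2.0, 4.0, 6.0, 8.0], "GENE_B": [5.0, 5.0, 5.0, 5.0]}

flags = flag_genes(transcripts, proteins)
print(flags)
```

In this toy data GENE_A's protein level rises with its transcript while GENE_B's protein stays flat no matter what the transcript does, which is the "1:1, yo" situation the SILAC peptides were complaining about.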

I guess you could just measure the protein level directly using some sort of comprehensive -omics based measurement of protein levels, if such a thing existed (and cost a tiny fraction of the price to measure transcript abundance), but I'm sure there are diseases and models and all sorts of things where it is still important to understand the indirect things that lead to protein expression as well. It might seem simpler to us to move 1% of what is spent on transcriptomics per year over to direct global protein measurements, but whenever a solution seems that simple and direct to me it's generally that I don't fully understand the problem.

I'll be thinking about this paper all day. Maybe I'll even ask someone who understands the finer points to talk me through more of it!

Tuesday, January 8, 2019

Scary new paper on how centroiding high resolution data affects your quantification!

Wait. What?!? We didn't handle this already? I thought this was all addressed in 2012 and fixed! It's 2019! Okay -- everyone should read this. For real.

CRITICAL INSIGHT.  I didn't even know that was a choice of article until today!

I'm too depressed now to type about it.

Shoutout to Chris Ashwood for highlighting the importance of this to the community this week.

Monday, January 7, 2019


Maybe I'm just leaving this here for me!

ABRF 2019 -- Abstract Deadline January 21st!  That's like 4 days from now!

ASMS 2019 (Atlanta! w0000h000000!!!!) Abstract Deadline January 31st!!  That's like 6 days from now!!

Get your abstracts in, unless you're furloughed, and aren't allowed....

Sunday, January 6, 2019

Follow-up paper on CHESS -- the new "f-word"

This commentary is a follow-up on CHESS that helps drive home the significance of what -omics has been doing and needs to improve on.

Also, possibly my favorite abstract of all time.

Original CHESS rambling on this site is here (some people use "feeds" to land on this page and these may not directly go backward to the next article. It was news to me too! Technology!). 

CHESS -- The New Human Genome Catalog! I don't even like chess. It stresses me out....and I'm already off topic. Back!

Let's talk about this CHESS.

If you do human-based proteomics, chances are you base this on human-based genomics stuff that has been converted and cleaned up and annotated into nice human protein .FASTA or .XML files for you.

The trouble is that the genetics people can't seem to decide yet on how many human genes there actually are. It's gotten so bad that we've had to get involved with projects like C-HPP.

But there is SO MUCH genomics and transcriptomics data out there. Couldn't someone just get 9,795 human RNA-Seq files and come up with a brilliant way to figure out what genes humans actually make transcripts for? Is that so hard?  What would it be, at most, 900 BILLION transcript measurements?

That's what this group did. The scale here is just ridiculous. I'm not all up on the conversion of transcript reads and things, but my understanding is that the HiSeq generates around 200GB of data -- and that system is rapidly being replaced by one that generates a TERABYTE or more of data per sequencing sample. MacCoss lab did some stuff with data minimization of RNA-Seq a few years ago, but for this analysis I don't think there is any way this group could do that. How much HPC firepower did they have access to? A lot.

Honestly, this is yet another paper this weekend that I can read and hear the sounds the words are making in my head, but I can't really grasp. What I can grasp is a HUMAN PROTEIN FASTA I CAN DOWNLOAD!!  You can get it here.

The format is funny when I look at it in my FASTA browser thing.

-- but this can't be anything but useful, right?  This is the stuff that we, as a species, make transcripts for!!

Saturday, January 5, 2019

Targeted proteomics finds early CSF markers for Alzheimer's disease!

There may easily be 12 good reasons to love this brand new study in MCP (early access version is open!). However, unless this espresso changes my priorities this morning, I only have time for a couple. Hopefully the ones that will convince you to go right now and check it out.

If you were told to do some discovery proteomics to find some biomarkers for a pre-clinical marker, how would you do it? I predict you'd get some samples, digest them, fractionate them as much as possible and queue them up for days/weeks of instrument acquisition time and matching amounts of data processing time.

Do we always need to do this?

I just went to the @ProteomeXchange Twitter account and from January 2nd to 8:24am EST on Jan 5, 2019 --32 new public proteomics datasets have been reported (tweeted) as newly available.

This is how Lleo et al., (sorry, my keyboard won't make all the right letters) did their discovery --

-- they mined a ton of deposited datasets! I can't pretend to understand the biology here, but they are hunting markers that will show up in CSF before the normal markers that indicate the patient has progressed into the disease. I have this vague understanding that many diseases have better outcomes the earlier you realize someone may have them, and studies were selected that might lead them toward this end.

Some of these proteins were mined directly from the papers (they describe in detail how they used terms to search literature databases) and others were pulled from PRIDE/ProteomeXchange and other public repositories.

I'm sure this takes loads of time and bioinformatic and medical expertise, but there are at least 25 complete disease studies here. This composite represents years of work and centuries of combined skill acquisition that no new study, regardless of instrumentation advances, could replicate in a reasonable time frame.

While writing this, I realized that this group also did discovery proteomics on enriched "synaptosomes" and somehow leveraged this against all the data they mined. Again -- I think a lack of understanding on my part regarding the biology is keeping me from the full picture here. An Orbitrap Velos was used for that part, on fractionated (SCX) samples.

Now it's validation time!

They made heavy peptides for their markers. They got real patient CSF based on classifications from clinicians and they got to work (nanospray on a 5500). About half their chosen markers had too much interference to be used.

Looks like Skyline with MSStats for all SRM work -- and convincing -- useful preclinical biomarkers were found.

This is a great study all around, despite my perhaps confusing description of it.

I think we're going to see more of this in the future, for sure. I was about to run some stuff and found that a group at Yale had just published a nearly identical discovery experiment! I wrote the authors for clarification regarding which file is which (a problem I have with a lot of PRIDE files, but I think I'm probably just dumb -- anyone else have issues with this?). They used a better instrument than what we have here AND the operator is likely better at her/his job than I am. BOOM! Weeks of instrument time that can be used on something novel. All I have to do is process 80 or so beautiful files and start looking for targets!

Thursday, January 3, 2019

Forensic proteomics overview!

(Photo used entirely without any permission whatsoever from this great article!)

The topic of forensic proteomics is suuuuper interesting and for reasons I'm sure we don't need to discuss -- suuuper scary. Cough cough HIPAA cough cough.

How many mutations actually go on to cause single amino acid variants (SAAVs)? At most one in three, right? I don't remember the details, but I vaguely recall a talk by Zhongqi Zhang where he discussed the likelihood and it is lower than this, due to the distribution of codon tRNA letters and energetics or something (most of the words in this sentence I may have just made up, but it feels like a memory).

However -- there are TONS of mutations in people. Individual genetic variation like all the genes that make me dumb and you smart or contribute to why your hairline isn't what it once was and I still look like I could totally play bass for Slayer.

We're not ready to think about the ramifications forensic proteomics might have for our field. At all. So, let's just think about how cool it is and this article is a fantastic place to start!

Wednesday, January 2, 2019

MoMo -- Find statistically significant PTM motifs!

I've been recently reintroduced to this "motif" idea. The idea is that the enzymes or whatever that modify proteins recognize certain strings of amino acids available in many different proteins as where they should put their PTMs.
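
If the "certain strings of amino acids" idea feels abstract, here is a small Python sketch of the simplest possible version: scanning a sequence for one fixed, well-known motif. I'm using the N-glycosylation sequon (N, then anything except P, then S or T) as the example. This is only an illustration of what a motif is, not what MoMo does (MoMo looks for statistically enriched motifs around your modified sites). The toy sequence and the function name are made up.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. inside "NNST") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_motif_sites(sequence, pattern=SEQUON):
    """Return (1-based position, matched residues) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

toy_protein = "MKNGSAANPTLLNNSTQ"  # invented sequence, not a real protein
sites = find_motif_sites(toy_protein)
print(sites)
```

Note that the "NPT" stretch in the toy sequence is skipped because of the proline, while the two overlapping hits in "NNST" are both reported.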

Proteome Discoverer 2.3 will have a neat little Motif add-on when it releases shortly!

If you need more,  MEME suite has TONS of motif tools.

You can read about MOMO -- the newest addition -- here.

And...if you're thinking that something else had that name but you couldn't remember what --

You probably had a friend with a Civic or Sentra with a turbocharger that was WAY too big for it that had stickers like these all over the inside of it!

Tuesday, January 1, 2019

Cause no one ever asked for it! My favorite papers of 2018!

It's already time to break out my sequin jacket and end the year at the Laser Disco?!??  How did that happen??  I didn't get a fraction of the stuff I planned to do in the last 12 months done...logically, instead of working on those things I'm going to do my biennial review of my favorite papers of the last 12 months?  Meh. Whatever.

In no particular order!!


You're tired of hearing about it. So am I (not!). This is maybe gamechanger of the year, y'all. When you can boost the S/N of your low abundance ions by an order of magnitude or two, VERY good things happen. I swear I've got an 80% written draft waiting for some feedback so I can get it out the door to be summarily and rapidly rejected by my peers like every other thing I've submitted recently. Hint: BoxCar is great for proteomics, but it's waaaaaaaay better for other things. Where do you need dynamic range the most? BoxCar it!


We can't amplify stuff like those weird DNA people down the hall can with their PZR machines, or whatever they're called. With SCoPE we can -- sort of. You can amplify tiny amounts of DNA into loads of it and then you can use whatever super expensive, low quality, error prone sequencing technique you want to. SCoPE lets you leverage TMT reagents to increase the interscan dynamic range, so you can quantify peptides that are below the dynamic range you would get with MS/MS sequencing. SCoPE is a revolutionary idea and I don't think we've even started to explore the ramifications that it can have for us. Perhaps more important than the original SCoPE paper is mPOP, which makes it a lot easier to do SCoPE (trust me, it's not a great project if you're rusty on sample prep. I screwed it up, I think, and that's why it's a robot's job now.)


If you haven't downloaded this marvelous piece of software from the Smith lab you are missing out. We use this every day -- and every week or two it somehow gets better.

Since MetaMorpheus came out this year, upgrades have landed like
FlashLFQ (crazy fast label free quan)
Crosslinking analysis!
And MetaDraw (which hasn't been published yet, but I'm using right now). Once MetaMorpheus finds that PTM you didn't know was there, didn't think to look for, and totally didn't have to do anything to find (cause that's what it does -- BOOM! PTMs!) you can use MetaDraw to make sure it's real.

Yo. That looks like MetaMorpheus found me a darned acetylation! The team is super responsive, and loves to get feedback. I wrote them about a feature that would be really helpful for my work -- and the next time I updated MetaMorpheus (it sends you little popups that new builds are available!) the feature I needed was there. If you don't have it, you can get it here. 

FDR for Spectral Libraries!

Spectral libraries sure seem like the future of mass spectrometry again, right? While DIA methods are pushing the development, shotgun proteomics needs them as well and we're seeing great new stuff! MS-Ana is a great new engine (wait. where is the MS-Ana post?), FDR for MSPepSearch has been added in the NIST stand-alone and is in development in other software where it's used, and we can keep on going on. Now that FDR is in hand and there are amazing new libraries like ProteomeTools -- I just need to get off my butt and start using them again. So should you!

More Proof that NanoLC is mostly dumb!

Okay -- sometimes I say something like "I need 10ug of protein to work with" and the person I'm talking to stares at me like I'm wearing an awesome sequin jacket and strobe lights on my head, but most of the time, people have MILLIGRAMS of protein to work with.  I LOVE this paper. NanoLC is likely the weakest link in your pipeline. There are alternatives. Let's not get caught up in dealing with these engineering atrocities as if they're something we can never do without. Evidence is mounting that they may cost us more in productivity than they provide us in sensitivity and this paper is the best one yet.

Biological Application of a Phase Constrained Orbitrap!

I hope hope hope hope the application of the phase constraint algorithms for Orbitraps is just around the corner. Remember the jump in speed we got when eFT was added? This looks like it could be bigger. Do I want 60,000 resolution in the same amount of time it takes my QE HF to do a 15,000 resolution scan? Yes. Please. Now. Thanks.

ProteomeTools PTMs!

Yup, I already mentioned ProteomeTools once. This project is too awesome. This study on the fragmentation of 20+ PTMs from synthesized standards is an absolute gold-mine. There is more to be learned about PTM fragmentation in this one study than in any book on the topic in the world. No. I haven't read every PTM book in the world, but I'd go to this paper first.

On the topic of PTMs -- what about Chemical Proteomics?

Big, far reaching implications here! Multiple reasons this study is in Cell. You don't have to take the experiment as far as these authors here. Just considering the unmodified peptides that drop in concentration when they're drug treated as something of importance might be the thing to blow the doors open on your project.

I admit it. I'm obsessed with the idea of what we could do with real Clinical Proteomics.

And this is the study I carry around if I want to prove to someone that this isn't 10 or 20 years down the road. We can help now if we're given a chance to.

A re-evaluation of FDR demonstrates some scary stuff about it -- but also how to fix it!

I've noticed a famous software program has some new feature that looks like they took this study seriously. I haven't investigated.

Wow. This could go on all day. Unfortunately it's late, I didn't sleep a ton last night and I have to cut this here. It was an AMAZING YEAR for Proteomics. And I expect nothing but bigger and better stuff from you awesome teams out there revolutionizing all the things. 2019!!!! Let's go!