Wednesday, January 30, 2019

Uncovering the world of MaxQuant!

 I just stumbled across this really nice breakdown of MaxQuant and Andromeda related terms, and a great new blog.

You can check out ProteomicLandScapes here!

DIA -- An almost perfect tutorial!

Have you finally seen that bit of evidence that said "hey -- DIA would be the way to do this project?" but are not really sure how to get started?


This is a great way to get started. I've gone back and forth on the title of this post. I'm going with "almost perfect" for 2 reasons.

1) This is great if you've got a QE HF/HF-X. There are loads of them out there.
2) It is very focused on the leading (and expensive) software package.

I know. I know. I've got 60+ videos on an expensive software package over there -->. I'm a clumsy insomniac in a glass house full of elderly half-blind rescue dogs who juggle their time between being tripping hazards and barking at invisible things.  That's obviously a metaphor or an analogy for "I don't have room to criticize"...I don't know which.

And this is a 48 year old bichon with orange/purple dreadlocks who showed more energy in this adorable attempted escape from the veterinarian than she has in the last 4 months here combined.

Still -- this is a GREAT guide. If you've got a high field orbitrap you don't have to tinker with anything. If you've got a D30 Orbitrap (QE Classic/Plus) you can start with this and adjust the cycle times and window widths and you are good to go! My dumb calculator might help. It is fair to note, however, that there are several other DIA software packages out there. I have my favorite (Pinnacle, but I'm biased, I used to work with the guy who wrote it and it's got a very government-friendly price tag). I used a VERY early version of SpectroNaut and -- for real -- it was awesome.
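If you'd rather do the cycle time math by hand, here is a minimal Python sketch of the arithmetic. To be clear, this is not the calculator mentioned above and not anything from the tutorial. The function names are made up for illustration, and the scan times in the example are placeholder assumptions that you'd replace with numbers measured on your own instrument. It just counts how many fixed-width windows cover a precursor range and estimates how many points you get across a chromatographic peak.

```python
import math


def dia_cycle(mz_start, mz_end, window_width, ms2_scan_s, ms1_scan_s=0.0, overlap=0.0):
    """Estimate window count and cycle time (seconds) for fixed-width DIA windows.

    Every input is a user-supplied assumption; nothing is read from an instrument.
    """
    step = window_width - overlap  # how far each window advances along the m/z axis
    if step <= 0:
        raise ValueError("overlap must be smaller than window_width")
    n_windows = math.ceil((mz_end - mz_start) / step)
    cycle_s = ms1_scan_s + n_windows * ms2_scan_s
    return n_windows, cycle_s


def points_per_peak(peak_width_s, cycle_s):
    """Average number of cycles that fit across one chromatographic peak."""
    return peak_width_s / cycle_s


# Hypothetical example: 400-1000 m/z, 20 m/z windows,
# placeholder scan times of 0.064 s (MS2) and 0.128 s (MS1), 30 s wide peaks.
n_windows, cycle_s = dia_cycle(400, 1000, 20, ms2_scan_s=0.064, ms1_scan_s=0.128)
print(n_windows, round(cycle_s, 3), round(points_per_peak(30, cycle_s), 1))
```

Wider windows or a narrower precursor range buy you a shorter cycle, and the points-per-peak number tells you whether that trade is worth making.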

So I'm leaving the title as it is.

Tuesday, January 29, 2019

Raw Beans! Generate loads of graphs from your RAW data!

Sometimes I leave posts written with a goal of checking them the next day for grammar and profanities before posting. Commonly....I appear to forget to go back and hit the big orange "Publish" button... I'm wondering what these 268 things are. A lot of them are me making fun of SWATH and SONAR and the 70ppm mass accuracy of the TIMSTOF Pro, for sure....but not 268 of them....!

Raw Beans is one of these!!  You can stop reading my stupid words and get it here!

Remember RawMeat? If not, don't worry about it. It probably doesn't work with your instrument unless it's older.

You point Raw-Beans at your file full of RAW data and hit the "go" button. And it generates a whole folder full of pretty graphs about your files!

Mass deviation (off your chosen ions in the box above)?

And loads more sophisticated things! 

Monday, January 28, 2019

Twitter Chat on Proteomics February 22nd! Send questions or people with questions!

I'm not entirely sure what we all signed up for.

I'm not entirely sure I understand or like Twitter. It's an amazing tool for rapidly disseminating science but outside of that it's kind of a mess run by a corporation that obviously puts $$ over ethics at all times. Is that redundant since I typed the word "corporation"? I think so.

Why isn't there a Science Twitter thing? Please start one and if you name it Swatter, I want credit for the idea.

Back on topic:

Some of us Twittering proteomics scientists will be participating in a "Twitter Chat" hosted by the Bioanalysis Zone.

If you've got some proteomics questions or know someone who does, tune in(?) to Twitter on February 22nd. It starts at 3pm GMT (10am EST) and will feature questions answered by:

Matthew Trost (Professor of Proteomics @ Newcastle)

Eduardo Chicano Galvez (Senior Mass Spectrometrist and MS Imaging specialist at IMIBC, Cordoba, Spain)

---I've got questions for both of these guys already....

and on this side of the Atlantic:

John Wilson (Cold Spring Harbor Mass Spec and Proteomics)

and this blogger I know who can't stop just typing and typing and typing.

Please note: Tweets from any of us will be solely our opinions and may not be interpreted as the official statements of the organizations for which we are employed. That sounded fancy enough, right?

Sunday, January 27, 2019

Crazy idea allows single protein detection in zeptomole-scale mixtures....

Imagine that Dr. Fenn never invented electrospray ionization, and no one else was smart enough to come up with it either. This whole mass spectrometry revolution never happened, 'cause even if we all devoted all our time to MALDI or FAB or whatever, they'd never have the impact that directly coupling HPLC or CE directly to mass spectrometry has. 

What if you wanted to do proteomics and you had all these fancy genomics/transcriptomics technologies to draw off of for inspiration. In this issue of "what if" Edman degradation is still a thing that is done all the time. 

Maybe this is what you'd come up with....?

Is this kind of brilliant? Absolutely.
Am I scared to try and describe it beyond what is shown here? Yes.
Is it something that will replace our existing technologies? Not yet, obviously, but there is a lot of potential here and loads to think about!

Saturday, January 26, 2019

Where can I learn mass spectrometry and proteomics?

A lot of people have already noticed the new page --> over there!

Please keep information coming in. I get this question all the time. There are very very very few mass spectrometry degree programs, despite this explosion in the actual need for actual mass spectrometrists. There are ways to learn this stuff. Ways MUCH better than reading the ramblings of some weird blogger.

If you're setting up a summer school or a workshop or something that is aimed toward educating mass spectrometrists, please shoot me an email. Let's build a list!!

If you've already got a great list that is probably better than I can do in my 4 minutes I get to devote to this blog/day these days I will immediately replace that page with a link to your page, just like the Conference thing I couldn't keep track of!!

My comment thing might be broken again. Probably more efficient to email me directly until I look at it!

Saturday, January 19, 2019

Integrative -omics of limb regeneration!

Okay! Now this is a team that isn't afraid to ask the big questions! You can check it out in JPR here.

I don't know what a Cynops is, and -- honestly -- I probably won't like the answer. I'm fine with assuming it's linked to some of the most nostalgic scenes from after-school TV shows of my young life, such as --

-- strange that he'd be the only character to lose limbs multiple times....I guess....or convenient, since I guess he's the only one who could do that.....

Aww...nuts...I had to look up another paper, and now I'm considering ethical implications that I totally shouldn't.....

To stop you from thinking about these as well --

--you're welcome!

This JPR study is really a re-analysis of this open access study from a few years ago, and this is where we'll find the proteomics details.

Not the details of where the samples came from!!!  Don't read those!

The mysterious samples were labeled with iTRAQ reagent. Whether the labeling was done with  4-plex or 8-plex reagent is a tightly guarded secret....wait...I figured it out! They used channels 118,119 and 121 from an 8-plex kit. The samples are taken at 0, 2 and 6 hours when regeneration is supposed to be happening -- suggesting that Piccolo is much better at this than whatever this organism is.

The labeled peptides were separated into 30 SCX fractions and these fractions were run on a 10cm column on a Q Exactive using a secret gradient and ultra top secret mass spectrometry parameters. (Meh -- "you just load the autosampler and press the button" -- have I ever told you guys about the guy I interviewed who said that to me during his interview to run an LTQ Orbitrap -- 😰he...umm....didn't get the job....) Data processing was performed in Proteome Discoverer 1.4 using Mascot with 20ppm MS1 tolerance and 0.1Da MS/MS tolerance, which may explain why the instrument method parameters weren't described. The Q Exactive probably didn't survive the fire that caused it to need search tolerances that wide. Proteins were considered significant using the device many proteomics people use that appears to shorten the life span of the statisticians who see our work: a 1.5-fold change in either direction is considered significant.

I'm joking about most of this, of course! I prefer my search tolerances to reflect the maximum capabilities of the instrument that is producing the files, but that's me. I'm also tired of the meetings with HR about why the stats guy is crying this time, but if you can get your biology story out of your experiment, that's all that really matters.
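For anyone who wants to see why those tolerances made me twitch, here's a small Python sketch of the arithmetic. The m/z value of 500 is just an example picked for illustration, the helper names are hypothetical, and reading "1.5-fold in either direction" as ratio >= 1.5 or ratio <= 1/1.5 is an assumption about the cutoff, not something spelled out in the paper.

```python
def ppm_to_da(mz, ppm):
    """Absolute tolerance (in m/z units, loosely "Da") for a ppm tolerance at a given m/z."""
    return mz * ppm / 1e6


def da_to_ppm(mz, da):
    """Relative tolerance in ppm for an absolute tolerance at a given m/z."""
    return da / mz * 1e6


def passes_fold_cutoff(ratio, fold=1.5):
    """True if the ratio is at least `fold` up, or at least `fold` down (<= 1/fold)."""
    return ratio >= fold or ratio <= 1.0 / fold


example_mz = 500.0  # arbitrary example value, not taken from the study
print(f"20 ppm at m/z {example_mz:.0f} is about {ppm_to_da(example_mz, 20):.4f} Da")
print(f"0.1 Da at m/z {example_mz:.0f} is about {da_to_ppm(example_mz, 0.1):.0f} ppm")

for ratio in (1.6, 1.2, 0.6):
    print(ratio, passes_fold_cutoff(ratio))
```

At m/z 500, a 20 ppm precursor window is roughly 0.01 Da wide on each side, while a 0.1 Da fragment tolerance works out to roughly 200 ppm, which is a lot of room for an Orbitrap MS/MS scan.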

This group produced two peer-reviewed manuscripts out of these files and obviously drew some interesting biological implications from them. They integrated a dizzying amount of genetics data with it and validated results with rt-PCR, making it a well-rounded and validated story.

Friday, January 18, 2019

LOPIT-DC: Subcellular proteomics returns -- in a much more accessible form!

LOPIT, HyperLOPIT (excuse capitalization spelling and hyphens that all may be wrong) has appeared on this blog more than once. It's brilliant and provides a way for quantitative proteomics with subcellular fractionation. We're often not doing ourselves any favors by mixing all the organelles from loads of different cells together, but fractionating beyond that seems really really hard.

LOPIT -- as smart as it is -- never cracked my list of "I have to find someone who needs data like this so I can do it" protocols. Primarily because it looked REAAAAALLLLY hard for someone who isn't good at sample prep things. I'm not the only person who was thinking this, I guess....

Let's fix that right now!

The figure at the top that was stolen from this nice open paper shows the old protocol on the left and the new protocol on the right!

Set a timer on your gradient centrifugation and then pull your subcellular fraction? I can do that! End up with a single TMTplex experiment for analysis? I can do that as well. I bet you I can even take these files from PRIDE (PXD0112554) and process them in something that doesn't start and end with the letter "R"!

And -- get this --BOOM! -- mass spectrometrist friendly Shiny Web Interface!!

Is LOPIT easy now? No, that's probably a stretch. But these tweaks have made it far easier than it was before -- and what other technique can possibly get you these results? Nothing I've ever seen!

Kudos to these authors. Could they have sat back and established a monopoly on subcellular localization proteomics? Probably. Instead, they took the time and made this amazing technique far more approachable for the rest of us.

Thursday, January 17, 2019

Phospho- glyco - proteogenomics of early onset gastric cancer!

I've been waiting for the embargo to be lifted so I could download this data from the CPTAC Portal for what seems like a while now. And the paper is now out, the embargo is lifted and ---

--that's a lot of authors! You definitely need at least this many people to put something like this together!

Now that the RAW data is being very nicely deposited into pre-determined and organized folders by Aspera manager -- wow-- the Q Exactive heavily fractionated N-glycoproteomics files sure look pretty ---

I still have no idea how they did this. I can't just download the data and start pressing buttons, I guess -- it's time to read, I guess....

HOW DO YOU INTEGRATE N-Glycoproteomics and PhosphoProteomics into your ProteoTranscriptomics?

It's got to be in here, because this is what they did.

Want to make it more fun? This is a lot of patient samples. And iTRAQ 4-plex!?!?!

What an ambitious project....and a beautiful output that we can mine for years going forward!

Whoa! Shoutout to @ScientistSaba for providing this awesome video related to this study!

Wednesday, January 16, 2019

Rapid Microbial ID and Antibiotic Resistance Prediction by LIPID signature!

Wait. This is a totally new idea, right?

Biotyper things work by culturing bacteria, and looking at the protein signatures, right?

What if the lipid signatures were just as valid? Then you'd have a complementary technology. Okay -- so the idea isn't totally new -- Goodlett lab has been working on it for at least a few years and last year produced this open access paper on it I appear to have missed.

The cool part about this new paper is that they figured out how to make this work fast! Like hospital friendly level fast!

Wednesday, January 9, 2019

Central dogma rates may explain why transcript measurements seem like such junk science!

Okay -- this new Nature Communications study might just be totally awesome.

I'm going to go with "might" for the following obvious reasons:

1) The terminology is outside of my wheelhouse
2) The math is so far away from the house where I store my wheels that it...(wait...what the hell is wheelhouse anyway..?...) that it could be complete and total gibberish and I would never ever know. My eyes stopped focusing as soon as I got to the page with the formula things.


1) A world class expert in these transcript words sent it to me with exclamation points on it.

2) Every person in proteomics has tried, at some point, to correlate transcript signal to real protein measurements and has walked away thinking "the transcript people are clearly just making numbers up" or "the transcript people are clearly just making everything up and one day we'll laugh about the trillions of dollars science spent on this generation of sequencing technology the same way we look back on global microarrays -- as expensive technology that yielded little or nothing." Which, hopefully, isn't true. I'm not saying it is! That's just what I might think for just one second when someone says "this MiSeq says there are tons of transcripts so there will be tons of protein" and 15 SILAC peptides say "....NOPE! This is 1:1, yo" Maybe this awesome paper explains what's up!

3) This study might be the great unifier (which, apparently, isn't a word). This group looks at loads of transcription/translation rates in rapidly growing cells from bacteria, yeast, my favorite model organism for human disease, and humans -- and finds there are mechanistic reasons that genes would have high transcription but also low translation. And this might be gene specific. And it might be possible using these techniques to develop a metric for each gene, as in, this is a gene that will correlate in transcript abundance and protein abundance -- and this one will not. What a useful database that could be, right?

I guess you could just measure the protein level directly using some sort of comprehensive -omics based measurement of protein levels, if such a thing existed (and cost a tiny fraction of the price to measure transcript abundance), but I'm sure there are diseases and models and all sorts of things where it is still important to understand the indirect things that lead to protein expression as well. It might seem simpler to us to move 1% of what is spent on transcriptomics per year over to direct global protein measurements, but whenever a solution seems that simple and direct to me it's generally that I don't fully understand the problem.

I'll be thinking about this paper all day. Maybe I'll even ask someone who understands the finer points to talk me through more of it!

Tuesday, January 8, 2019

Scary new paper on how centroiding high resolution data affects your quantification!

Wait. What?!? We didn't handle this already? I thought this was all addressed in 2012 and fixed! It's 2019! Okay -- everyone should read this. For real.

CRITICAL INSIGHT.  I didn't even know that was a choice of article until today!

I'm too depressed now to type about it.

Shoutout to Chris Ashwood for highlighting the importance of this to the community this week.

Monday, January 7, 2019


Maybe I'm just leaving this here for me!

ABRF 2019 -- Abstract Deadline January 21st!  That's like 4 days from now!

ASMS 2019 (Atlanta! w0000h000000!!!!) Abstract Deadline January 31st!!  That's like 6 days from now!!

Get your abstracts in, unless you're furloughed, and aren't allowed....

Sunday, January 6, 2019

Follow-up paper on CHESS -- the new "f-word"

This commentary is a follow-up on CHESS that helps drive home the significance of what -omics has been doing and needs to improve on.

Also, possibly my favorite abstract of all time.

Original CHESS rambling on this site is here (some people use "feeds" to land on this page and these may not directly go backward to the next article. It was news to me too! Technology!). 

CHESS -- The New Human Genome Catalog! I don't even like chess. It stresses me out....and I'm already off topic. Back!

Let's talk about this CHESS.

If you do human-based proteomics, chances are you base this on human-based genomics stuff that has been converted and cleaned up and annotated into nice human protein .FASTA or .XML files for you.

The trouble is that the genetics people can't seem to decide yet on how many human genes there actually are. It's gotten so bad that we've had to get involved with projects like C-HPP.

But there is SO MUCH genomics and transcriptomics data out there. Couldn't someone just get 9,795 human RNA-Seq files and come up with a brilliant way to figure out what genes humans actually make transcripts for? Is that so hard?  What would it be, at most, 900 BILLION transcript measurements?

That's what this group did. The scale here is just ridiculous. I'm not all up on the conversion of transcript reads and things, but my understanding is that the HiSeq generates around 200GB of data -- and that system is rapidly being replaced by one that generates a TERABYTE or more of data per sequencing sample. MacCoss lab did some stuff with data minimization of RNA-Seq a few years ago, but for this analysis I don't think there is any way this group could do that. How much HPC firepower did they have access to? A lot.

Honestly, this is yet another paper this weekend that I can read and hear the sounds the words are making in my head, but I can't really grasp. What I can grasp is a HUMAN PROTEIN FASTA I CAN DOWNLOAD!!  You can get it here.

The format is funny when I look at it in my FASTA browser thing.

-- but this can't be anything but useful, right?  This is the stuff that we, as a species, make transcripts for!!
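If you want to poke at a FASTA like this outside of a browser tool, a minimal reader is only a few lines of Python. This sketch assumes plain FASTA conventions (a ">" header line followed by one or more sequence lines) and knows nothing about the specific header layout of the CHESS file. The toy records and the file name in the comment are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


# Toy in-memory example; for a real file you would do something like
#   with open("some_proteins.fasta") as handle:   # hypothetical file name
#       records = list(read_fasta(handle))
toy = [">protein_1 made-up description", "MKTLLV", "AAGR", "", ">protein_2", "MSSPTK"]
records = list(read_fasta(toy))
print(len(records), "entries")
for header, sequence in records:
    print(header, len(sequence))
```

Printing the first few headers this way is a quick check on whether your search engine's FASTA parsing rule is going to choke on the format.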

Saturday, January 5, 2019

Targeted proteomics finds early CSF markers for Alzheimer's disease!

There may easily be 12 good reasons to love this brand new study in MCP (early access version is open!)

however, unless this espresso changes my priorities this morning, I only have time for a couple. Hopefully the ones that will convince you to go right now and check it out.

If you were told to do some discovery proteomics to find some biomarkers for a pre-clinical marker, how would you do it? I predict you'd get some samples, digest them, fractionate them as much as possible and queue them up for days/weeks of instrument acquisition time and matching amounts of data processing time.

Do we always need to do this?

I just went to the @ProteomeXchange Twitter account and from January 2nd to 8:24am EST on Jan 5, 2019 --32 new public proteomics datasets have been reported (tweeted) as newly available.

This is how Lleo et al., (sorry, my keyboard won't make all the right letters) did their discovery --

-- they mined a ton of deposited datasets! I can't pretend to understand the biology here, but they are hunting markers that will show up in CSF before the normal markers that indicate the patient has progressed into the disease. I have this vague understanding that many diseases have better outcomes the earlier you realize someone may possess it, and studies were selected that might lead them toward this end.

Some of these proteins were mined directly from the papers (they describe in detail how they used terms to search literature databases) and others were pulled from PRIDE/ProteomeXchange and other public repositories.

I'm sure this takes loads of time and bioinformatic and medical expertise, but there are at least 25 complete disease studies here. This composite represents years of work and centuries of combined skill acquisition that no new study, regardless of instrumentation advances, could replicate in a reasonable time frame.

While writing this, I realized that this group also did discovery proteomics on enriched "synaptosomes" and somehow leveraged this against all the data they mined. Again -- I think a lack of understanding on my part regarding the biology is keeping me from the full picture here. An Orbitrap Velos was used for that part, on fractionated (SCX) samples.

Now it's validation time!

They made heavy peptides for their markers. They got real patient CSF based on classifications from clinicians and they got to work (nanospray on a 5500). About half their chosen markers had too much interference to be used.

Looks like Skyline with MSstats for all SRM work -- and convincing, useful preclinical biomarkers were found.

This is a great study all around, despite my perhaps confusing description of it.

I think we're going to see more of this in the future, for sure. I was about to run some stuff and found that a group at Yale had just published a nearly identical discovery experiment! I wrote the authors for clarification regarding which file is which (a problem I have with a lot of PRIDE files, but I think I'm probably just dumb -- anyone else have issues with this?). They used a better instrument than what we have here AND the operator is likely better at her/his job than I am. BOOM! Weeks of instrument time that can be used on something novel. All I have to do is process 80 or so beautiful files and start looking for targets!

Thursday, January 3, 2019

Forensic proteomics overview!

(Photo used entirely without any permission whatsoever from this great article!)

The topic of forensic proteomics is suuuuper interesting and for reasons I'm sure we don't need to discuss -- suuuper scary. Cough cough HIPAA cough cough.

How many mutations actually end up causing single amino acid variants (SAAVs)? At most one in three, right? I don't remember the details, but I vaguely recall a talk by Zhongqi Zhang where he discussed the likelihood and it is lower than this, due to the distribution of codon tRNA letters and energetics or something (most of the words in this sentence I may have just made up, but it feels like a memory).

However -- there are TONS of mutations in people. Individual genetic variation like all the genes that make me dumb and you smart or contribute to why your hairline isn't what it once was and I still look like I could totally play bass for Slayer.

We're not ready to think about the ramifications forensic proteomics might have for our field. At all. So, let's just think about how cool it is and this article is a fantastic place to start!

Wednesday, January 2, 2019

MoMo -- Find statistically significant PTM motifs!

I've been recently reintroduced to this "motif" idea. The idea is that the enzymes or whatever that modify proteins recognize certain strings of amino acids available in many different proteins as where they should put their PTMs.
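To make the idea concrete, here is a rough Python sketch of the first step any motif tool needs: cut a fixed-width window around each modified residue and count which amino acids show up at each position. This is not MoMo's algorithm and it does no statistics at all. The toy sequence, the site positions, the flank size and the "_" padding character are all assumptions for illustration.

```python
from collections import Counter


def site_window(sequence, site, flank=3, pad="_"):
    """Return residues from site-flank to site+flank (0-based site), padded at the termini."""
    padded = pad * flank + sequence + pad * flank
    center = site + flank  # index of the modified residue in the padded string
    return padded[center - flank:center + flank + 1]


def position_counts(windows):
    """Count residues at each window position across equal-length windows."""
    return [Counter(column) for column in zip(*windows)]


# Made-up protein sequence with pretend modification sites at 0-based positions 2 and 7.
toy_sequence = "MKSPTRLS"
windows = [site_window(toy_sequence, site) for site in (2, 7)]
print(windows)

for offset, counts in zip(range(-3, 4), position_counts(windows)):
    print(offset, dict(counts))
```

A real motif finder then asks whether a residue at a given offset shows up more often than you'd expect from background sequence, which is the statistical part this sketch skips.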

Proteome Discoverer 2.3 will have a neat little Motif add-on when it releases shortly!

If you need more,  MEME suite has TONS of motif tools.

You can read about MOMO -- the newest addition -- here.

And...if you're thinking that something else had that name but you couldn't remember what --

You probably had a friend with a Civic or Sentra with a turbocharger that was WAY too big for it that had stickers like these all over the inside of it!

Tuesday, January 1, 2019

Cause no one ever asked for it! My favorite papers of 2018!

It's already time to break out my sequin jacket and end the year at the Laser Disco?!??  How did that happen??  I didn't get a fraction of the stuff I planned to do in the last 12 months done...logically, instead of working on those things I'm going to do my biennial review of my favorite papers of the last 12 months?  Meh. Whatever.

In no particular order!!


You're tired of hearing about it. So am I (not!). This is maybe the gamechanger of the year, y'all. When you can boost the S/N of your low abundance ions by an order of magnitude or two, VERY good things happen. I swear I've got an 80% written draft waiting for some feedback so I can get it out the door to be summarily and rapidly rejected by my peers like every other thing I've submitted recently. Hint: BoxCar is great for proteomics, but it's waaaaaaaay better for other things. Where do you need dynamic range the most? BoxCar it!


We can't amplify stuff like those weird DNA people down the hall can with their PZR machines, or whatever they're called. With SCoPE we can -- sort of. You can amplify tiny amounts of DNA into loads of it and then you can use whatever super expensive, low quality, error prone sequencing technique you want to. SCoPE lets you leverage TMT reagents to the increase in interscan dynamic range to quantify peptides that are below the dynamic range you would get with MS/MS sequencing. SCoPE is a revolutionary idea and I don't think we've even started to explore the ramifications that it can have for us. Perhaps more important than the original SCoPE paper is mPOP that makes it a lot easier to do SCoPE (trust me, it's not a great project if you're rusty on sample prep. I screwed it up, I think, and that's why it's a robot's job now.)


If you haven't downloaded this marvelous piece of software from the Smith lab you are missing out. We use this every day -- and every week or two it somehow gets better.

Since MetaMorpheus came out this year, upgrades have landed like
FlashLFQ (crazy fast label free quan)
Crosslinking analysis!
And MetaDraw (which hasn't been published yet, but I'm using right now). Once MetaMorpheus finds that PTM you didn't know was there, didn't think to look for, and totally didn't have to do anything to find (cause that's what it does -- BOOM! PTMs!) you can use MetaDraw to make sure it's real.

Yo. That looks like MetaMorpheus found me a darned acetylation! The team is super responsive, and loves to get feedback. I wrote them about a feature that would be really helpful for my work -- and the next time I updated MetaMorpheus (it sends you little popups that new builds are available!) the feature I needed was there. If you don't have it, you can get it here. 

FDR for Spectral Libraries!

Spectral libraries sure seem like the future of mass spectrometry again, right? While DIA methods are pushing the development, shotgun proteomics needs them as well and we're seeing great new stuff! MS-Ana is a great new engine (wait. where is the MS-Ana post?), FDR for MSPepSearch added in the NIST stand-alone and in development in other software where it's used, and we can keep on going on. Now that FDR is in hand and there are amazing new libraries like ProteomeTools -- I just need to get off my butt and start using them again. So should you!

More Proof that NanoLC is mostly dumb!

Okay -- sometimes I say something like "I need 10ug of protein to work with" and the person I'm talking to stares at me like I'm wearing an awesome sequin jacket and strobe lights on my head, but most of the time, people have MILLIGRAMS of protein to work with.  I LOVE this paper. NanoLC is likely the weakest link in your pipeline. There are alternatives. Let's not get caught up in dealing with these engineering atrocities as if they're something we can never do without. Evidence is mounting that they may cost us more in productivity than they provide us in sensitivity and this paper is the best one yet.

Biological Application of a Phase Constrained Orbitrap!

I hope hope hope hope the application of the phase constraint algorithms for Orbitraps is just around the corner. Remember the jump in speed we got when eFT was added? This looks like it could be bigger. Do I want 60,000 resolution in the same amount of time it takes my QE HF to do a 15,000 resolution scan? Yes. Please. Now. Thanks.

ProteomeTools PTMs!

Yup, I already mentioned ProteomeTools once. This project is too awesome. This study on the fragmentation of 20+ PTMs from synthesized standards is an absolute gold-mine. There is more to be learned about PTM fragmentation in this one study than in any book on the topic in the world. No. I haven't read every PTM book in the world, but I'd go to this paper first.

On the topic of PTMs -- what about Chemical Proteomics?

Big, far reaching implications here! Multiple reasons this study is in Cell. You don't have to take the experiment as far as these authors here. Just considering the unmodified peptides that drop in concentration when they're drug treated as something of importance might be the thing to blow the doors open on your project.

I admit it. I'm obsessed with the idea of what we could do with real Clinical Proteomics.

And this is the study I carry around if I want to prove to someone that this isn't 10 or 20 years down the road. We can help now if we're given a chance to.

A re-evaluation of FDR demonstrates some scary stuff about it -- but also how to fix it!

I've noticed a famous software program has some new feature that looks like they took this study seriously. I haven't investigated.

Wow. This could go on all day. Unfortunately it's late, I didn't sleep a ton last night and I have to cut this here. It was an AMAZING YEAR for Proteomics. And I expect nothing but bigger and better stuff from you awesome teams out there revolutionizing all the things. 2019!!!! Let's go!