Friday, July 13, 2018

A really insightful analysis in paleoproteomics!

I'll be honest, I don't quite understand the paleo part of this study OR the proteomics part! However, even my slowly caffeinating brain can tell from the parts I do get that this is really cool and has implications for what I do every day!

Besides the obvious WOW factor that this is a proteomics study in an Evolutionary Biology journal(!!), there is a lot of insight here that goes well past the paleontology part. Dr. Welker draws conclusions from proteomic analysis of both distant relatives and very close ones (humans to chimpanzees), using different search techniques to show where they do and don't have the power to make connections.

While most of us aren't doing paleoproteomics, almost all of us are searching proteomics data against protein FASTA files that don't contain the exact sequences of the organism we just lysed and chopped up with trypsin. Individual proteomic variation, like single amino acid variants (SAAVs), is undoubtedly having an effect on just about every run we queue up. Maybe error-tolerant searches like the ones used here are the temporary fix we need until proteogenomics gets easier (and sequencing gets a little cheaper)!
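Why can an error-tolerant search catch a SAAV at all? Because a single substitution shows up as a fixed mass shift on the peptide. Here's a minimal Python sketch of that idea, assuming standard monoisotopic residue masses and a made-up 0.005 Da tolerance: take an unexplained mass delta and list which single-residue swaps could produce it. This is just my illustration, not how the search engines in the paper actually work.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def saav_delta(ref: str, alt: str) -> float:
    """Mass shift (Da) a peptide picks up when residue `ref` becomes `alt`."""
    return RESIDUE_MASS[alt] - RESIDUE_MASS[ref]


def explain_shift(observed_delta: float, tol: float = 0.005) -> list[tuple[str, str]]:
    """List every single-residue substitution whose mass shift matches an
    unexplained precursor mass delta within `tol` Da."""
    hits = []
    for ref in RESIDUE_MASS:
        for alt in RESIDUE_MASS:
            if ref != alt and abs(saav_delta(ref, alt) - observed_delta) <= tol:
                hits.append((ref, alt))
    return hits
```

For example, `explain_shift(14.01565)` comes back with six candidates (G to A, S to T, V to L or I, N to Q, D to E), which is a nice reminder of why these searches need more evidence than the precursor mass alone.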

Thursday, July 12, 2018

rawDIAG -- Rational shotgun proteomics optimization!

I'm not 100% sure that I appreciate the implications. Rational optimization?!? What is the alternative? Only some small percentage of us are randomly pushing buttons.  I'm joking about one or more things, possibly....

However -- I'm 100% sure that I love this idea and this new tool!

Even we old, seasoned mass spectrometrists can fall into the optimization pit: building parameters and changing things on feel, hunch and previous experience. It might totally work out, especially if you have loads and loads of experience, but even then -- wouldn't you feel way smarter if there were concrete facts behind how you designed the experiment? In the absence of facts, what about fancy statistics?

Before you move on to looking at something smarter than this site because it says R package (gross), there is also a GUI for this! They even blast through some real world data (hundreds of core lab generated files) to show that this thing knows what it's doing!

Tuesday, July 10, 2018

Some 2018 MaxQuant Summer School Videos are up already!

And here I was thinking after a long but very great day that the dogs and I would watch some PandR -- and I get a YouTube alert that the MaxQuant summer school videos are already up!

If you right click on these videos below it will take you directly to YouTube where you can better control the screen size and resolution. Once you go to one of them, you can find the rest.

Here is this year's intro (and BoxCar)--

This one is...ummm...intimidating...? But really shows what you can do with Perseus if you know what you're doing -- Network analysis. No one ever said this software wasn't crazy powerful....

....but it can get a little complicated still....

This year's program can be downloaded here and that should help to find the lectures as they continue to be uploaded. I thought I'd link them all here, but then I thought...

Wednesday, July 4, 2018

ProCal (JPT Retention Time Standardization Kit) m/z

This might be something I'm leaving on the blog mostly for me so I don't have to look for it again; however, maybe someone else will find it useful?

If you are using the ProCal (now called the JPT Retention Time Standardization Kit) you know what I'm talking about and have probably already done this yourself, but if you lose your Excel spreadsheet with the m/z you can download mine here.
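If you'd rather regenerate the m/z values than hunt for a spreadsheet, here's a minimal Python sketch, assuming unmodified peptides and standard monoisotopic masses. "PEPTIDE" below is just a placeholder; I'm not reproducing the actual ProCal sequences here, so plug in the ones from the paper or the vendor sheet.

```python
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da), added once per peptide

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mz(sequence: str, charge: int) -> float:
    """Monoisotopic m/z of an unmodified peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


for z in (1, 2, 3):
    print(z, round(peptide_mz("PEPTIDE", z), 4))
```

Any modifications (or a non-standard C-terminus) would need their own mass terms on top of this.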

If you don't know what I'm talking about you should check out this paper!

Retention time, mass AND COLLISION ENERGY calibrators.

They are a little expensive compared to some of the other QC reagents (10 pmol of the 40 peptides will set you back $99), but if you want to verify that HCD CE 30 on the Fusion still looks the most like HCD NCE 27 on the HF, this is the easiest way I know to do it!

Monday, July 2, 2018

XRNAX -- The ultimate methodology for studying RNA/Protein interactions?

I'm seriously beyond impressed with this new study on bioRxiv. What an amazing amount of work in a critically underdeveloped area!

I feel like this stuff comes up all the time. How do we study what is happening between RNA and protein? If you're thinking -- bleh -- who cares -- didn't Rosalind Franklin and some guys who lucked out because some crazy alcoholic congressman hated Linus Pauling solve all the DNA --> RNA --> protein interaction stuff? We have the central dogma, right? We're good to go!

Easy to think that, but it turns out that it's way way way more complex than this. You know how they teach us there are, like, 4 nucleotides?

.... As of 2006 there were over 100 post-transcriptional modifications of RNA nucleotides known and I know 2 people that are working on studying new ones and whether they are biologically relevant. I only learned this recently and I was inconsolably depressed about it for like 4 minutes. For real. That's messed up.

Okay -- so with that in mind. Let's go into a less comfortable framework. Maybe we don't know ANYTHING about how protein and RNA interact, why they do it, and what happens when they don't do it right?  What the Heck do you do when someone asks you to study these things? If it's an email, you could probably pretend it went to your junk filter for -- what? -- 5 weeks? That's fair, I think. If you've got an office, though, you're probably going to get an appointment request by week 6.

What do you do? Schedule an appointment for 2pm and take a Xanax at lunch? (Obviously -- your legally obtained, doctor prescribed medication -- I assume having an office automatically qualifies you for some sort of anti-anxiety medication. Everyone knows where to find you now!)

This paper is huge and kind of intimidating. It's on bioRxiv now, but this should be in Cell or something equally impressive soon, I'm sure. But check this out. You can go to and the website walks you through something huge and scary like RNA/Protein interactions -- unbiased -- ridiculously powerful -- and it starts like this --

It's got photographs of everything. All the steps. All the materials and methods are so clearly described and pictured that I could probably do it.

I bet that's the clear stuff!!

It doesn't stop at the sample prep. There are different levels of experimental design depending on what your question is. Do you want to know exactly where the nucleotides and proteins interact? Do you need something focused, or global and quantitative (SILAC)? It's all in here.

Applications of XRNAX are described in the paper and at, but I suspect this is just the very tip of a huge iceberg. There is so much that we can do with this. There are a ton of fundamental biological questions to go after -- you can't tell me that when cells are irradiated or blasted with free radicals it just busts up DNA -- other things have to be affected -- but DNA is the only thing we've ever really looked at (cause it's way easier). XRNAX gives us a whole lot of ways to look at these complex interactions that I don't think we've had before.

While I'm rambling: there are lots of other gems in this study. This is the first time I've seen the awesome MSFragger in action (indiscriminate PTM IDs -- they go out to +1000 Da!!). I should probably stop typing about this, but -- wow -- you should really check this out!
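The general trick behind this kind of open (wide precursor tolerance) search, as I understand it: match each spectrum to an unmodified peptide while letting the precursor mass be off by hundreds of Da, then see which mass differences keep showing up. Here's a rough Python sketch of just that last step, assuming you already have paired observed precursor masses and matched unmodified peptide masses and using a made-up 0.01 Da bin width. This is my illustration, not MSFragger's code.

```python
from collections import Counter


def mass_shift_histogram(precursor_masses, peptide_masses, bin_width=0.01):
    """Bin (observed precursor - unmodified peptide) mass differences.
    Bins that keep recurring point to common modifications or artifacts."""
    counts = Counter()
    for observed, theoretical in zip(precursor_masses, peptide_masses):
        bin_index = round((observed - theoretical) / bin_width)
        counts[round(bin_index * bin_width, 4)] += 1
    return counts.most_common()  # most frequent mass shifts first


shifts = mass_shift_histogram(
    [1079.96633, 1579.96633, 2000.0, 1215.99491],  # hypothetical precursors
    [1000.0, 1500.0, 2000.0, 1200.0],              # matched unmodified peptides
)
print(shifts)
```

With these toy numbers the phosphorylation-sized shift (about +79.97 Da) comes out on top.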

Sunday, July 1, 2018

CharmeRT -- Time to get a lot more peptide IDs with advanced second searching!

Hey. Could you use -- I dunno... --60% more IDs in some of your datasets?!?  I'm going to assume you said "heck yes I could, but that would be crazy to ask for."

Say hello to CharmeRT!

I've been looking at this since it came out on Thursday -- and I still don't get all of it.

What I do get? Just about every time we fragment a "peptide" in a complex dataset, we actually fragment the thing we want to -- and everything within a dalton or two from it. You don't even have to look hard in your data to find really good peptides with tons of background behind them.

Random dataset I just pulled, sorted in terms of highest XCorr and highest coisolation interference.

Literally every b and y ion was detected for this short peptide. I'll go way out on a limb here and say that it's probably this peptide sequence. But look at all that other crap in the background!

Look at its MS1:

The target peptide and its two isotopes are there -- but it looks to me like there is another peptide in there, right? We just want the one, but I fragmented everything in there that is yellow.
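That "coisolation interference" number I sorted on can be roughly thought of as the fraction of MS1 signal inside the isolation window that doesn't belong to the target's isotope envelope. A minimal Python sketch, assuming centroided MS1 peaks, a hypothetical 1.6 m/z window centered on the selected ion and a 0.005 m/z matching tolerance; Proteome Discoverer and other tools calculate their own versions, which may differ.

```python
def isolation_interference(ms1_peaks, target_mzs, center_mz, width=1.6, tol=0.005):
    """Percent of MS1 intensity in the isolation window that is NOT the target.
    ms1_peaks: list of (mz, intensity); target_mzs: m/z of the target's isotopes."""
    half_width = width / 2
    in_window = [(mz, i) for mz, i in ms1_peaks if abs(mz - center_mz) <= half_width]
    total = sum(i for _, i in in_window)
    if total == 0:
        return 0.0
    # Signal that lines up with one of the target's isotope peaks.
    target = sum(i for mz, i in in_window
                 if any(abs(mz - t) <= tol for t in target_mzs))
    return 100.0 * (1 - target / total)


peaks = [(500.25, 1000), (500.75, 500), (500.60, 500), (502.0, 9999)]
print(isolation_interference(peaks, [500.25, 500.75], center_mz=500.25))
```

Here the peak at 500.60 is the "other peptide" and counts as interference; the big peak at 502.0 sits outside the window and doesn't.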

Could we go in, remove those nicely matching MS/MS fragments above and re-search this spectrum, knowing that something in that yellow MS1 isolation window was the parent peptide?
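Here's how I picture that basic idea, as a minimal Python sketch: strip out the fragment peaks the first match already explained, then list the other MS1 peaks in the isolation window as candidate precursors for a second search. The 20 ppm fragment tolerance, 1.6 m/z window and 0.005 m/z precursor tolerance are assumptions, and this leaves out everything that makes CharmeRT clever (including the retention time scoring).

```python
def remove_matched_fragments(peaks, matched_mzs, tol_ppm=20.0):
    """Return the MS/MS peaks NOT explained by the first-pass peptide match.
    peaks: list of (mz, intensity); matched_mzs: fragment m/z values matched."""
    remaining = []
    for mz, intensity in peaks:
        explained = any(abs(mz - m) / m * 1e6 <= tol_ppm for m in matched_mzs)
        if not explained:
            remaining.append((mz, intensity))
    return remaining


def second_pass_precursors(ms1_peaks, center_mz, first_hit_mzs, width=1.6, tol=0.005):
    """MS1 peaks in the isolation window that don't belong to the first hit."""
    half_width = width / 2
    return [mz for mz, _ in ms1_peaks
            if abs(mz - center_mz) <= half_width
            and all(abs(mz - t) > tol for t in first_hit_mzs)]


leftover = remove_matched_fragments([(175.119, 100), (300.5, 50), (404.2, 80)], [175.119, 404.2])
candidates = second_pass_precursors(
    [(500.25, 1000), (500.75, 500), (500.60, 500), (502.0, 10)], 500.25, [500.25, 500.75])
print(leftover, candidates)
```

The leftover peaks and the candidate precursor list are what you'd hand to the search engine for round two.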

And that's what CharmeRT is doing. Now -- this is where I get a little lost. It's more complex than this because it uses retention time calculation through Elutator in some way to aid in the scoring. How cool is that? My guess is that on the second round you have to open up your MS1 tolerance -- so anything that adds confidence has gotta help. Retention time is a great place to start!

The problem with retention time calculations is that you might be doing something really weird with your chromatography. If you are handpacking with noble gases while underwater or something and you just thought that this wasn't for you -- you just need this thing!

The Elutator RT trainer lets you load up some of your own data from peptides that have eluted off that mix of 80% C-18 and 20% Camel Dental Plaque that you know does the best job of getting those hydrophilic peptides -- and lets you create the training dataset that takes this into account for your lab and your conditions.
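A crude stand-in for the idea of training on your own chromatography: fit a straight line that maps predicted retention times onto what you actually observed for confidently identified peptides, then flag second-pass candidates whose observed RT is far from the calibrated prediction. This Python sketch is ordinary least squares with hypothetical numbers and a made-up 2-minute window; it is not the Elutator model.

```python
def fit_line(predicted, observed):
    """Least-squares slope and intercept mapping predicted RT -> observed RT."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_plausible(predicted, observed, slope, intercept, window=2.0):
    """True if the observed RT (min) is within `window` of the calibrated prediction."""
    return abs(observed - (slope * predicted + intercept)) <= window


# Calibrate on peptides you trust (hypothetical values, in minutes).
slope, intercept = fit_line([10.0, 20.0, 30.0], [12.0, 22.0, 32.0])
print(rt_plausible(25.0, 27.5, slope, intercept), rt_plausible(25.0, 35.0, slope, intercept))
```

Real gradients are rarely this linear, which is presumably why a proper trainer is worth having.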

You can get all of this right now here!

Saturday, June 30, 2018

Battle Royale! 7 Serum depletion kits. Which is best?!?!


As fast as things have been changing in proteomics -- and with all the upstart little companies with all their cool technologies, it's hard to keep up. If you haven't had to do plasma/serum proteomics in a while -- this might be the ultimate guide right now.

Yeah -- I know it's ElfSeverer -- but the pre-formatted version is open access -- and the supplemental table is an Excel document that you can download from the abstract that provides a lot of the results.

I'm definitely not going to complain about anything since this great team in -- wait. what? Google Images informs me that they are on an island in the Mediterranean that looks like this --

-- this is correct -- good for them!

Back on topic -- this study was obviously a lot of work and it sure is going to save me time the next time I have to think about which depletion kit to use for what.

Friday, June 29, 2018

How to get data from Proteome Discoverer 2.x results uploaded to ProteomeXchange!

Congratulations! You're about to submit a manuscript. I, quite seriously, probably can't wait to read it. Shoot me an email if it's really smart. I can't guarantee I'll put it here, but I'm always looking for tips on what to read next.

Okay -- now you've got to get that data uploaded somewhere so the reviewers and nerds with insomnia can download and look at it, right!?!?

If you haven't submitted data in a while and you're using a new version of Proteome Discoverer, you might rapidly find that the tools you used for PD 1.x require you to upload a .MSF file.

Your results in PD 2.0 or newer are actually in something called a .PDresult file. (You still have an MSF file, but it doesn't hold your FDR-filtered results.) Don't fret. You can export the data directly out of PD into mzIdentML (which is often shortened to .mzId -- just to keep you on your toes).

Open this revolutionary dataset that you have generated in PD 2.x (2.2 SP1 pictured), apply your filters, and export your mzIdentML or mzTab. Boom! Done!
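Before you upload, it might be worth a quick check that the exported file actually contains the filtered PSMs you expect. A minimal Python sketch using only the standard library, assuming the export follows the mzIdentML schema, where each PSM is a SpectrumIdentificationItem element carrying a passThreshold attribute. The file name in the usage line is a placeholder.

```python
import xml.etree.ElementTree as ET


def count_passing_psms(source):
    """Count PSMs (SpectrumIdentificationItem elements) in an mzIdentML file.
    Returns (passing threshold, total). `source` is a path or binary file object."""
    passing = total = 0
    for _event, elem in ET.iterparse(source):  # "end" events by default
        # Drop the XML namespace: "{http://psidev.info/...}Tag" -> "Tag".
        if elem.tag.rsplit("}", 1)[-1] == "SpectrumIdentificationItem":
            total += 1
            if elem.get("passThreshold") == "true":
                passing += 1
            elem.clear()  # keep memory down on big files
    return passing, total
```

Usage would be something like `passing, total = count_passing_psms("my_results.mzid")`; if `passing` doesn't roughly match the PSM count PD showed you after filtering, something went sideways in the export.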

Add that to your complete submission.

Oh -- this is a great recent shortcut as well -- you can, of course, still upload directly to your storage site of choice, but if you are using a ProteomeXchange partner, you can take a shortcut by using the PX Submission Java GUI. You can download it here.

Thursday, June 28, 2018

MzMine -- Find changing features in any LC-MS datasets!!

I left a post here back in September mostly to remind myself to check out this software. Then I found that post while looking for a piece of software that does this. Who needs a memory? Not me!


You can get it here.

Did you think the idea behind SIEVE from Vast (and later Thermo) was a great idea? Let's look at what is changing from one LC-MS run to the next -- and THEN let's try to ID those?
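That "find what changes first, ID it later" idea boils down to something like this minimal Python sketch, assuming you already have an aligned feature table (feature, then sample, then intensity) and using hypothetical cutoffs of a 2-fold change and |t| of at least 3. It's a plain Welch t statistic on log2 intensities, not what SIEVE or MZmine actually compute; with real data you'd want proper p-values and multiple-testing correction.

```python
from math import copysign, inf, log2, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (unequal variances allowed)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def changing_features(table, group_a, group_b, min_log2_fc=1.0, min_abs_t=3.0):
    """Feature IDs that differ between groups.
    table maps feature ID -> {sample name: intensity (must be > 0)}."""
    hits = []
    for feature, intensities in table.items():
        a = [log2(intensities[s]) for s in group_a]
        b = [log2(intensities[s]) for s in group_b]
        log2_fc = mean(b) - mean(a)  # difference of mean log2 intensities
        if abs(log2_fc) >= min_log2_fc and abs(welch_t(b, a)) >= min_abs_t:
            hits.append(feature)
    return hits


table = {
    "f1": {"A1": 100, "A2": 110, "A3": 90, "B1": 400, "B2": 420, "B3": 380},
    "f2": {"A1": 100, "A2": 105, "A3": 95, "B1": 102, "B2": 98, "B3": 100},
}
print(changing_features(table, ["A1", "A2", "A3"], ["B1", "B2", "B3"]))
```

Only the features that survive this step would go on to the (expensive) identification step.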

Did you also try SIEVE and wished it was: More stable, faster, more stable, had more features, was more customizable, more stable and didn't make you frightened to leave your 32-bit PC alone in your house when it was aligning big files?

MZMine is everything that SIEVE could have been. I can be mean about it now that it's been retired, right? Worth noting -- Compound Discoverer can do everything (and much more) that SIEVE could do -- for small molecules. It doesn't work well with peptides -- in my hands at least.

There are two downsides to MZMine that I've found already --


You can pick up most software for LFQ analysis and it will make a lot of useful assumptions for you. Like -- you'd like to align your peaks the one way it knows how to and you'd like to filter them in the one way it knows how to and you'd like to merge and filter them and it has a logical order for doing all those things.

In MZMine, your destiny is your own.

Ummm...what..?. ....


In MzMine -- you need to do some thinking -- maybe lots of thinking. You need to take your data and walk it through each processing step yourself. You appear to be able to do stuff that might not make sense at all, but you can always remove what you've just generated. I'm currently pressure testing that functionality.

Filtering? There are 5 of those. Peak detection? 8. Alignment? At least 3.
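To give a feel for what one of those alignment choices is doing, here's a toy Python sketch of tolerance-based alignment: pair each feature in one run with the closest unclaimed feature in another run that falls inside both an m/z (ppm) window and a retention time window. The 10 ppm and 0.5 min values are placeholders, and this is a simplification rather than any specific MZmine module.

```python
def align_features(run_a, run_b, mz_ppm=10.0, rt_tol=0.5):
    """Greedily pair features (mz, rt) between two runs.
    Returns a list of (index in run_a, index in run_b) pairs."""
    pairs, used_b = [], set()
    for i, (mz_a, rt_a) in enumerate(run_a):
        best_j, best_score = None, None
        for j, (mz_b, rt_b) in enumerate(run_b):
            if j in used_b:
                continue
            ppm = abs(mz_a - mz_b) / mz_a * 1e6
            drt = abs(rt_a - rt_b)
            if ppm <= mz_ppm and drt <= rt_tol:
                # Closer in both dimensions (relative to the tolerances) wins.
                score = ppm / mz_ppm + drt / rt_tol
                if best_score is None or score < best_score:
                    best_j, best_score = j, score
        if best_j is not None:
            used_b.add(best_j)
            pairs.append((i, best_j))
    return pairs


print(align_features([(500.2500, 10.0), (600.3000, 20.0)],
                     [(600.3030, 20.2), (500.2502, 10.1), (700.0, 30.0)]))
```

Even this toy version has knobs (tolerances, how to weigh m/z against RT, greedy vs. global matching), which is exactly why MZmine makes you choose.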

Once you develop a pipeline that isn't crazy, you can build a batch mode to walk your data through all these steps. Okay -- I'll assume that's what you can do, because for all the attempts I've made to get everything to turn out right, Elizabeth is still being obstinate about Mr. Darcy and Mr. Wickham seems like he'll never marry Lydia -- who was what? 14? Blech. What I mean is that nothing is working out anywhere near the way I (or old Mr. Bennet) want it to.

Did I successfully interest you in checking out MZMine? Or did I just bring up half forgotten memories of creepy old books you had to read in 9th grade? If it's the former -- there's tons of documentation on MZMine. Here is the manual I just found (PDF download). This is not a piece of software where you can succeed by randomly pushing buttons.

Wednesday, June 27, 2018

Need a cool fall class project? Have an extra turkey?

Is anyone else jealous of the cool stuff undergrads get to do in class these days? I've met a number of people recently who used MS in undergrad classes and have done some really serious science.

Last year, Rich Helm's BioChem 4115 at Virginia Tech did this one (all the students are even listed in the manuscript)!

If you're looking to mix up your lab class in the fall, this might be an interesting way to do it.

Tuesday, June 26, 2018

Complex Portal -- New superpowers for UniProt!

It can be hard to keep track of all the awesome information and links that are around EMBL and UniProt. There is so much information now that sometimes I can't even find what I know I'm looking for when I know I'm on the right page (this is a good thing even if it makes me feel just a little crazy sometimes).

ComplexPortal is another new UniProt power -- looks to me like it's easier to go to ComplexPortal and then follow the UniProt links back to get specific information on specific proteins.

Bonus: If there is lots of information on your complex it'll get all wobbly and animated for a second and make you wonder when you left lab last night.

Pro Tip: My web browsers at home have adblockers installed. I don't have to whitelist this page (I only need to click links twice to get them to load), but if you aren't getting anything to show here, you might need to allow pop-ups from EMBL.

2 dimensional online fractionation with EasyNanos?!?!?

I have honestly never considered doing this. And can't think of a good reason for doing so now, but in case one comes up -- at least I now know it is possible -- and there is loads of proof. 

Check this out --

What the Heck!?! These authors take our boring old, inflexible EasyNLC and, without adding pumps or anything, turn it into an online 2D system. Somehow they get it up to 21-hour separations, moving their QE Classic from 3k proteins from a HEK293 digest to closer to 8,000.

I found this while looking for something in no way related. It turns out this isn't the first or the last time someone has pulled this off; it's just the one that Google Images knows about (if you hit a paywall -- I definitely didn't tell you that you could maybe get the paper if you went through Images, because I'd never ever recommend circumventing a paywall!)

Now that I know about this, I've found another instance of people doing something similar. This one is from 2010.

Monday, June 25, 2018

Common contaminants in mass spectrometry!

I thought for sure this was on this dumb blog somewhere..but I can't find it...

Hey -- and if anyone has ever updated this ultimate guide to contamination from 2008 -- please let me know!

Got something weird coming off your column? Chances are it's in this thing!
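If you want a quick first pass before digging through the lists, here's a small Python sketch: check observed m/z values against a few known background ions and look for the 44.0262 Da ladders that polyethylene glycol leaves behind (singly charged). The two reference masses are well-known examples I added for illustration (a phthalate plasticizer and the polysiloxane lock-mass ion), not entries copied from the paper, and the tolerances are assumptions.

```python
# Two well-known background ions, for illustration only.
# For real work, use the published contaminant lists.
KNOWN_IONS = {
    391.28429: "phthalate plasticizer, e.g. bis(2-ethylhexyl) phthalate [M+H]+",
    445.12003: "polysiloxane ion (often used as a lock mass)",
}
PEG_SPACING = 44.02621  # Da, one C2H4O repeat unit (singly charged ladder)


def flag_contaminants(mzs, tol_ppm=5.0):
    """Return (observed m/z, description) for peaks matching a known ion."""
    hits = []
    for mz in mzs:
        for ref, name in KNOWN_IONS.items():
            if abs(mz - ref) / ref * 1e6 <= tol_ppm:
                hits.append((mz, name))
    return hits


def peg_ladders(mzs, tol=0.002, min_length=4):
    """Find runs of peaks spaced by the PEG repeat unit (singly charged)."""
    peaks = sorted(mzs)
    ladders, used = [], set()
    for start in peaks:
        if start in used:
            continue
        ladder = [start]
        while True:
            expected = ladder[-1] + PEG_SPACING
            nxt = next((p for p in peaks if abs(p - expected) <= tol), None)
            if nxt is None:
                break
            ladder.append(nxt)
        if len(ladder) >= min_length:
            ladders.append(ladder)
            used.update(ladder)
    return ladders
```

Multiply charged PEG shows up with a spacing of 44.0262/z instead, which this sketch deliberately ignores.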

The paper isn't open access so if you can't get to it -- do I ever have good news for you!

Boston College Chemistry Department hosts the supplemental lists. If you use them, please cite this amazing work, but you can get the full lists here!

Saturday, June 23, 2018

Worse than boring post -- Trying to find specific MS/MS scans in Xcalibur!

Imagine this strange scenario -- you have a peptide that you discovered with your search engine and you just want to take a quick look at it in Xcalibur.

You have the assigned monoisotopic mass from your search engine -- all the way out to 4 decimals.

You are easily able to find it by XIC with ranges like this --

And now you want to take a look at the MS/MS spectra in Xcalibur -- you open up ranges for the MS/MS spectra triggered -- and it isn't there -- there is some stuff that is close -- but nothing exact...

Ummm....where'd the MS2 go? 

Turns out Xcalibur has it listed twice. 

If you are just scrolling along through with no range filters applied you'll find the top spectrum (T), but if you go along with ranges you'll need to hunt down the bottom spectrum (F).

Note that this is the exact same MS/MS scan -- but the top one is the mass you are looking for and the bottom is the one you'll have to use in ranges to find it. How fun is that?!?!

I have no proof of this -- but I believe that the bottom mass (and the one that is assigned in ranges) is the preview scan mass. The Orbitrap doesn't complete its full MS1 scan before it starts acquiring the ions it is going to do MS/MS on. We don't want to waste that much time. Partway through the scan, it has already created a list of ions that passed either MIPS (also called PeptideMatch) or the minimum intensity and charge cutoffs you provided, and then it starts working on getting them.

In this case -- this wasn't hard to find. We're off by just a tiny amount. And this is the Tribrid. This issue is much worse on the Q Exactives.  Here is a typical example.

A standard tactic I employ is opening an MS/MS range for every ion mass close to my ion of interest.

In this case, I can find the scan on the second attempt.

The difference here is about 0.02. Off the top of my head, that sounds like about 30ppm. Neat, right!!??!  My second favorite part about it is the fact that the ranges aren't listed in --- numerical order. 
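For the record, the conversion is ppm = (observed − reference) / reference × 10^6, so whether a 0.02 difference is 30 ppm depends on the m/z (it is right around 30 ppm at m/z 667). Since exact matching is the whole problem, here's a minimal Python sketch of the "open a range around it" tactic: look up MS/MS scans by precursor m/z within a ppm window instead of by exact value. It assumes you've already exported a list of (scan number, listed precursor m/z) from your raw file somehow; the scan numbers and the 50 ppm default are placeholders.

```python
def ppm_error(observed, reference):
    """Signed mass error in parts per million."""
    return (observed - reference) / reference * 1e6


def find_ms2_scans(scans, target_mz, tol_ppm=50.0):
    """scans: list of (scan_number, listed precursor m/z).
    Return (scan_number, m/z) within tol_ppm of target_mz, closest first."""
    candidates = sorted(
        (abs(ppm_error(mz, target_mz)), scan, mz) for scan, mz in scans
    )
    return [(scan, mz) for err, scan, mz in candidates if err <= tol_ppm]


print(round(ppm_error(667.02, 667.0), 1))
print(find_ms2_scans([(1001, 667.02), (1002, 700.0), (1003, 666.99)], 667.0))
```

A wide window like this will pull in neighbors too, so you'd still eyeball retention time before trusting the hit.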

This post is worse than boring. I honestly thought when I started this that I'd come up with a clever solution for finding what I was looking for -- and I still don't have one...

It is worth noting that Xcalibur is kinda old. And FreeStyle is probably meant to replace it in the near future for good reason.  Maybe FreeStyle is the solution?!?! 

EDIT: 6/26/18. Thanks for the tips in the comments, y'all!!  I will try tracking by scan number. Unfortunately -- and I should talk about this later -- sometimes I'm trying to track Minora features that don't provide you with scan numbers. 

I did try FreeStyle and it does the same thing -- 

--- but it does it SO MUCH FASTER that it might become my go to almost immediately with one really weird outlier -- data from our Elite only gets 2 decimal places by default while the Fusion gets 5....

...which is probably my fault, but I can't seem to change it...FreeStyle automatically recognizes the correct settings for the Exactives and Tribrid and that saves me a lot of steps. 

Friday, June 22, 2018

MOFA -- Reduce the dimensionality of all the data!

What a great way to start my morning!
1) My Twitter feed popped up a paper that I checked out because it had a funny name (MOFA! link here!) and, while I was a little scared to check Google Images, it turns out a mofa is a moped that has pedals!

2) I realized on page 3 or so of the paper that this is the one my wife was talking about, the one that started a conversation about how we should write to some journals and suggest that software links be provided in abstracts.

You can get the software for R or Python here.  This post isn't just rambly wasted time! The link is hard to find in the paper. With my service today complete -- time for (probably inaccurate) rambling!

What is MOFA?!?!  Multi-Omics Factor Analysis, duh.
Could that mean anything? Sure it could!

What does it mean here?

It means a new way of integrating data from all sorts of input -- the more I think about it, the more I like it. However, after 4 shots of espresso there is a period of time in the morning when I like everything, especially my cat.  Sorry, this has been cracking me up all week....he's fine with business catsual.

Stop laughing at the cat, Ben -- be serious and talk about dimensionality reduction!

How are we doing things in proteogenomics/metabogenomics/multi-omics right now?

1) Somebody does transcriptomics on the cell/patient and works out a huge list of the transcripts that are changing (and probably those that are unique to the cell -- variant call files and such, but let's ignore those right now)
2) Somebody else finds a list of small molecule features that are changing from sample to sample and assigns the best metabolite ID they can to all of those features
3) You identify as many PSMs as you can and then quantify those.

Generally these lists are reduced to what appears to be significantly different between these groups -- based on the significance that makes sense for each individual experiment. This is likely highly driven by the depth of coverage and the number of samples. It isn't hard to imagine a problem if you had 300 metabolites quantified compared to 30,000 transcripts quantified, right? Is the significance cutoff for the two lists the same? Sure, your cutoffs make sense in each individual experiment....
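To put a number on that, here's the simplest (and most conservative) multiple-testing correction, Bonferroni, in a short Python sketch using the hypothetical list sizes above. Nothing here is from the paper, and many omics pipelines use FDR control instead, but the list-size effect is similar: the same raw p-value of 1e-5 clears the bar for 300 metabolites and misses it for 30,000 transcripts.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-feature p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


p = 1e-5  # one hypothetical raw p-value
for source, n_features in [("metabolites", 300), ("transcripts", 30_000)]:
    cutoff = bonferroni_threshold(0.05, n_features)
    print(f"{source}: cutoff {cutoff:.2e}, p = {p:.0e}, significant? {p <= cutoff}")
```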

Then someone converts those lists to something universal -- probably the proteins to gene IDs (which has some serious weaknesses I should ramble about some day) and then puts those lists all into KEGG or Ingenuity(tm) or something similar. (Perhaps the complete lists are fed to Ingenuity).

MOFA says -- before you do all that stuff -- why don't you just try reducing all the factors to what changes between your sample sets?

What is the output from all of these things? 3 dimensions.

Dimension 1: The patient or sample
Dimension 2: The transcript, PSM/protein, metabolite ID
Dimension 3: The relative quantification you get for Dimension 2
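Those three "dimensions" are really a long table of (sample, feature, value) rows. A minimal Python sketch of pivoting them into the samples × features matrix that factor-analysis methods start from; prefixing feature names with their data type is my own convention for illustration, and the None entries mark the missing values every multi-omics dataset has.

```python
def to_matrix(records):
    """Pivot (sample, feature, value) rows into a samples x features matrix.
    Missing measurements become None."""
    records = list(records)
    samples = sorted({sample for sample, _, _ in records})
    features = sorted({feature for _, feature, _ in records})
    row = {sample: i for i, sample in enumerate(samples)}
    col = {feature: j for j, feature in enumerate(features)}
    matrix = [[None] * len(features) for _ in samples]
    for sample, feature, value in records:
        matrix[row[sample]][col[feature]] = value
    return samples, features, matrix


samples, features, matrix = to_matrix([
    ("S1", "transcript:TP53", 5.0),
    ("S1", "metabolite:lactate", 2.0),
    ("S2", "transcript:TP53", 7.0),
])
print(samples, features, matrix)
```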

What if -- for just a minute -- you forget where that data came from? What if you didn't care that this was a metabolite and this was a transcript and so on? Now you just have a big list of things about your sample versus the other samples and their quan. Could you just reduce the data to seek the factors that are explaining the variance between Sample A and Sample B? (More realistically -- Sample Set A and Sample Set B -- a big n is going to be required to do it)
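Here's roughly what "forget where the data came from" could look like in code. This is NOT MOFA, which is a Bayesian factor model that also handles missing values and different kinds of data. It's a much cruder stand-in to illustrate the idea: z-score each feature, down-weight each omics view by the square root of its feature count so 30,000 transcripts don't drown out 300 metabolites, glue the views together and pull out the first principal component with power iteration. Pure Python, no missing values allowed, toy sizes only.

```python
import random
from math import sqrt


def standardize_view(rows):
    """Z-score every feature (column) of one samples x features view, then divide
    by sqrt(number of features) so each view carries similar total weight."""
    n_samples, n_features = len(rows), len(rows[0])
    scaled_columns = []
    for column in zip(*rows):
        mean = sum(column) / n_samples
        sd = sqrt(sum((x - mean) ** 2 for x in column) / (n_samples - 1)) or 1.0
        scaled_columns.append([(x - mean) / sd / sqrt(n_features) for x in column])
    return [list(row) for row in zip(*scaled_columns)]


def first_factor(views, iterations=200, seed=0):
    """Per-sample scores (unit length, sign arbitrary) on the first principal
    component of all views glued together, via power iteration on the
    samples x samples Gram matrix."""
    # One row per sample: that sample's standardized features from every view.
    X = [sum(parts, []) for parts in zip(*(standardize_view(v) for v in views))]
    n = len(X)
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance at all
            break
        v = [x / norm for x in w]
    return v


transcripts = [[1, 2, 3], [1.1, 2.1, 2.9], [5, 6, 7], [5.2, 6.1, 7.1]]
metabolites = [[10], [11], [30], [31]]
print(first_factor([transcripts, metabolites]))
```

In this toy example, samples 1-2 and samples 3-4 land on opposite sides of the first factor, because every feature in both views agrees on that split.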

This is probably inaccurate -- but this is how I interpret what MOFA is doing: massive multiomics data reduction. Figure 5 was what finally convinced me I was on the right track logically about what was happening here. I suggest scrolling down to it and then starting into the results section.

The paper is open access, you should check it out, because they look at 200 patient samples with multiomics data integration and they pull out some really interesting observations with this approach suggesting that this makes a lot of sense.

30,000 transcripts with abundance --> get a significant list
+ 3,000 metabolites with quan --> get a significant list
+ 8,000 proteins quantified --> get a significant list
Try to combine those significant lists with cutoffs that make sense in terms of each data source itself but perhaps border on arbitrary compared to the total variance from all the data as a whole.

OR MOFA it all down to what is really different between your samples first while using the sum of all the data points you've all worked so hard to generate together to increase the true power of this huge effort?

By the way -- they don't ignore the mutations and stuff in their study. They integrate all that too!