Monday, April 30, 2018

Orphan kinase demonstrates remarkable phosphorylation control in malaria!

Ummmm...yeah...I have to come back to this one....what...?...gonna need more espresso to tackle this one....

Great TMT phosphoproteomics + knockout of a weird "orphan kinase" that has a lot of homology to all sorts of other kinases we know about = really weird downstream effects suggesting that we really have no idea what happens when protein X is phosphorylated in stage Y in P. falciparum's life cycle.

Intimidating because...well...we have tons of beautiful diagrams (like this) saying what and how phospho cascades work in this organism.

A relief(? no. probably not.?) because it still seems to be doing all sorts of mysterious things despite how hard really smart people are working to try to figure it out.

This is some top notch work that shows how many mysteries there still are in a disease that infected around 200 million people in 2015 -- and killed over 400,000....

Handy Venn diagram for amino acid properties

I just saw this great figure on Twitter. Shoutout to @reducentropy for posting it.

It is from this incredibly useful chapter on amino acids that you can find open access here. This is getting posted on my wall....

Sunday, April 29, 2018

Prot-SpaM -- Fast alignment of PROTEOMES for phylogeny reconstruction

It is really hard to type this because the picture is just cracking me up. Focus, Ben. Stop laughing.


The genomics people do things like this all the time -- take the DNA sequences and figure out how everyone is related to whom. Prot-SpaM allows us to jump in and look at the phylogeny of organisms (alignment free) from the protein sequences. 

The results look seriously cool when they come out of the software but you have to zoom in like 100 times to tell that you aren't just looking at a weird smear on the page.

Those are compressed circular dendrograms!!  How cool is that?

Now here is the neat thing about this -- classically this stuff is ALWAYS done with gene alignments (essentially permutations on just BLASTing the entire genomes of organisms). This requires supercomputer level resources. Prot-SpaMming doesn't. It requires far fewer resources AND (this makes sense to us protein people) protein sequence information appears to be a far more sensitive way of detecting relatedness than DNA alignment. Wins all around!

Saturday, April 28, 2018

MetaUniDec - Deconvolution software that can even handle native spectra!!!

I'm currently (well -- not right this second -- each run is only like 12 minutes -- but...) comparing a protein or 3 under both reduced (acidic conditions) and native (non acidic -- tried like Albert Heck [most appropriate use of this joke EVER] -- not to alter the protein in any way) conditions.

A minor problem is that there is exactly one computer with software that can handle deconvolution of native proteins in our building.

WooHoo! Thank you Deseree Reid et al., for this awesome free deconvolution software that is exactly what I need!

I was kinda bummed because the paper leads you to a download link to the software that looks like you have to use Python to use it (which isn't so much of a bummer since a guy in my lab is a Python expert) BUT then I found this link that leads you to a GUI!!

This is what it looks like in action --

See all these things it does? It has features that are not present in our commercial package -- including some things that I really wish that it did.

Now -- you do have to get the data into the correct format first -- from the manual (which is online here)

Okay -- so that's a lot of words and I don't know what most of them are. FORTUNATELY -- all this AMAZING software is looking for is a text file to deconvolute. That I can handle.

You can make Xcalibur do that for you. First, find your ugly protein peak (you don't get to see mine -- the peak is like 3 minutes wide -- it's essentially just a desalting column)

Then go to Excel and paste it!

Whoa! How cool is that? The RAW file doesn't start at exactly 2,000 (m/z)!!  You'll have to believe me, but it doesn't stop at exactly 6,000 either! Quads get kind of wobbly at that kind of range, right? Makes sense to me.

Now you can make that into the text file and load it. Okay -- actually -- I'm having some trouble with the formatting on it. While my first suspicion is that this was designed for a TOF and the fact I have mass accuracy is probably the issue -- it might be the header format. It's a Saturday night and I've still got to do some sample prep so I'm going to worry about this later.
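If the header really is the problem, one way to rule it out is to strip everything that isn't a pair of numbers before saving the text file. This is a minimal sketch, not anything from the MetaUniDec manual: it assumes the software is happy with two whitespace-separated columns (m/z, then intensity), and the header lines in the example are made up to stand in for whatever Xcalibur puts on the clipboard. The function name `clean_spectrum_text` is hypothetical.

```python
def clean_spectrum_text(pasted: str) -> str:
    """Keep only lines whose first two fields parse as numbers (m/z, intensity)."""
    rows = []
    for line in pasted.splitlines():
        parts = line.split()  # splits on tabs or spaces
        if len(parts) < 2:
            continue  # blank line or a single stray value
        try:
            mz, intensity = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # header or comment line, not data
        rows.append(f"{mz:.4f} {intensity:.1f}")
    return "\n".join(rows) + "\n"


# Made-up clipboard contents: two header lines, then two data points.
example = "SPECTRUM - MS\nMass\tIntensity\n2000.1234\t1500.0\n2000.2500\t1720.5\n"
print(clean_spectrum_text(example), end="")
```

Writing the returned string to a .txt file gives a header-free file to try loading. If that still fails, the header wasn't the issue.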

Even better? You can just directly paste the clipboard in.

It's even better than I thought!! Check this out.

Okay -- this will clearly need some optimization -- but with no use of the manual I've got a nice peak out of this software on my first shot. For perspective, I don't know how to set the parking brake in the car I got in December. I just don't park on hills.

It was probably dumb to start with an unknown (that appears to suffer from some degradation with in-source collision energy at 90eV) so I put in one of my QC proteins. Check out this cool output deconvolution plot!

The protein should be 42,882 -- this trippy output requires some finagling (which appears to be a word?) but the software looks spot on and I haven't scratched the surface of what I can do with this awesome new tool I now have!
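For a sanity check on any deconvolution, it helps to know where the charge states should land in the first place. The sketch below uses the standard relationship m/z = (M + z × proton mass) / z to list which charge states of a 42,882 Da protein fall inside a 2,000 to 6,000 m/z window. It treats 42,882 as the neutral mass and ignores adducts, both of which are assumptions on my part, and the function names are just for illustration.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_for_charge(neutral_mass: float, z: int) -> float:
    """m/z of an [M + zH]z+ ion for a given neutral mass and charge."""
    return (neutral_mass + z * PROTON_MASS) / z


def charge_states_in_window(neutral_mass: float, mz_low: float, mz_high: float):
    """Return (z, m/z) pairs whose m/z falls inside the scan window."""
    found = []
    z = 1
    while True:
        mz = mz_for_charge(neutral_mass, z)
        if mz < mz_low:
            break  # m/z only gets smaller as z goes up
        if mz <= mz_high:
            found.append((z, mz))
        z += 1
    return found


for z, mz in charge_states_in_window(42882.0, 2000.0, 6000.0):
    print(f"{z:>2}+  {mz:9.3f}")
```

With these assumptions the window covers 8+ through 21+. Which of those actually show up depends on the conditions, so a peak list that matches only part of the series isn't automatically wrong.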

Friday, April 27, 2018

Mango -- Clean up chimeric spectra for crosslinking experiments!

Yes there are around 40 different software packages for crosslinking analysis. And some of them actually work!

However -- Mango (JPR paper here) looks like it meets a specific and important need -- cleaning up the chimeric (multi peptide) spectra for processing!

You know the way that ETD totally doesn't look like it worked unless you clean up the data? I think Mango is going to allow us to look at crosslinked MS/MS spectra that we couldn't extract anything from before -- and provide us with all sorts of new information.

Wednesday, April 25, 2018

MSstats -- Way more than just QC!

MSstats has shown up in my ramblings more than once here, but always in the context of MSstatsQC.  I just sat through an awesome talk that demonstrated that it is capable of much more than this.

You can check out MSstats directly here.

Highlights? An R package that can take data from MaxQuant, Proteome Discoverer, Skyline, OpenMS, OpenSwath, and other stuff (it's at least 7 of them) and make sense of it all with advanced statistics all over the place.

The imputation algorithm appears to be called the "Accelerated failure time model". I don't know yet how it compares to the ones we more typically use (like the k nearest neighbor) but it sure sounds way cooler.

There is so much power here. Choose wisely.

Tuesday, April 24, 2018

MassIVE made some umm...massive...spectral libraries

I've got pages and pages of notes from ABRF already and as I'm sitting here trying to organize them I'll probably pull together a few blog posts out of them.

On Sunday Nuno Bandeira talked about MassIVE. Of course, I know about MassIVE. It's one place where you can deposit your RAW data so the journal editors will leave you alone about it.

However -- it's not just sitting there. Busybody bioinformaticians are combing through the data trying to find new things (succeeding) -- and they are compiling huge spectral libraries.

What do you get when you compress the most meaningful data out of 30TB of HCD fragmentation spectra?  Other than a file that takes a REALLY long time to download on a hotel WiFi connection? Over 2 million annotated spectra.

I may have to give up on downloading it -- or remote login to something on a much faster connection.

Now -- the question remains -- how does this help me?  I was hoping that after I had it I could see if I had anything that could open it or use it as an input (MSPepSearch maybe?). To be continued....

Monday, April 23, 2018

Wow! I'm fully aware that this blog has recently been worse even than normal. Part of this is a result of this other project that I'm working on that has been revealed at ABRF today and a description of the project will show up in bioRxiv sometime this week.

If you've had the misfortune of reading this blog for a while you might remember that there was once a tab over there --> that said something like "Orbitrap Methods Database"

This was a problem for me for a lot of reasons. First of all -- I'm not going to write the best instrument method in the world. Chances are I can write you one that will work (if it's stuff I'm good at -- I'm going to write a method that will work pretty darned well) but it's crazy to think that you should be running your instruments the way I tell you to.  There were other downsides and I really had to take it down -- always hoping I could bring it back somehow in a smarter way.

What about this?!?!  What if I could set up a website and Google Team Drive and get a ton of instrument methods -- then -- could I convince some great mass spectrometrists to get with me a few times a year and go through the methods and pick the very best ones?!?

WAIT -- Could I also come up with a completely ridiculous way of recruiting help that requires me to put a dog in a costume and make him do a pose he really doesn't want to do...?

Was this the project I was born to do?!?

(This picture is on the ABRF poster)

Most jokes aside -- here is the idea --

If you are brand new to mass spectrometry and your instrument just got installed -- HOW DO YOU DO ANYTHING?

If you are awesome at shotgun proteomics and someone asks you to get an accurate mass of an intact antibody -- how do you get going?  Do you know that in-source fragmentation is a critical parameter?

Maybe you just go to Www.MassSpectrometryMethods.Org and you download the "Intact protein > 60kDa" method for your instrument.  Now you've got a starting point.

Now -- back to the initial idea -- and the picture of Gusto above -- if this is just me putting methods in a Google Drive -- this idea is dumb. However -- if I can get some of the top experts in mass spectrometry to sign on and help me out -- can you imagine how cool the next release could be?

I don't know how to do lipidomics. I'm not very good at Top Down (I'd like to get better). The metabolomics methods could definitely improve. Heck -- the shotgun proteomics methods could be better.

If you'd like to help out -- check out the site and shoot me an email -- its now

Friday, April 20, 2018

FRACTION OPTIMIZER -- Take the guesswork out of 2D optimization!

i had to disable my caps lock key or it would look like i was crazily shouting about how much i love this new study and software throughout this whole post.  you can check out this awesome new tool here!

if you do high ph offline fractionation followed by the same low ph gradient on every one of these fractions and you look at the data coming off you'll think something like "wow...i really could get more ids if i optimized each fraction separately."

but that's a lot of work (and it will impact your reproducibility if you are doing something like label free quan). plus it would be a lot of work. one run to see the relative elution times and a second with the reoptimized gradient..?...

fraction optimizer can do this for you!!!

100% recommend you check it out. you can get the software directly here.

Wednesday, April 18, 2018

PRESTO! Collect all sorts of data and let MatLab sort it out.

EDIT 4/24/18: If you don't have MatLab there is a PRESTO stand-alone! You can find it here. 

I can clearly remember my excitement when I first saw 2 different "-omics" datasets integrated together (that totally worked) in a great study. It might still be in the 1,000+ entries on this scrambled blog, but who will ever know (the search bar appears to lose power as the entries get older -- which is honestly fine by me, I've said some dumb stuff over the years)

These days -- I'm still impressed -- but it's a lot more common. However, I haven't pulled one off myself yet but I'd sure love to....but where do you even start....what about here?

First impression -- wow -- this looks seriously powerful. This team pulls in a bunch of different datasets -- microarrays, proteomics, my heart jumped for just a second because I thought they were also integrating CyTOF data (I don't think they do here, they just discuss the statistics involved) and through "t-distributed stochastic neighbor embedding" --


-- they massively reduce a staggering amount of signals from various sets to very small and shockingly meaningful observations.

Is it a trick?

Honestly, I can't say for sure, but the results seem logical and really impressive. (Two gifs is probably too much for one post).

I start to get nervous as soon as we start talking statistics things -- but if it gets you to a small number of targets that you can validate (and they do) it's a WIN.

I made sure to mention MatLab in the title of this post. MatLab is not free software (though trials are available and many, many universities have license deals for the software). There is also a home version that is roughly 1/10 the normal license price, but has some limits on its functionality. If you have access to this software --- and have some huge and intimidating "omics" data sets you should definitely check this out!

EDIT: I accidentally reread this and I feel like I didn't emphasize this study correctly. PRESTO is a utility that these authors developed that runs in/on MATLAB. I don't mean the title of this post (which -- to some extent -- can't be changed now) to detract from the awesome amount of work PRESTO is.

Tuesday, April 17, 2018

2018 EuPA School of Practical Proteomics in Vienna!

Hello Vienna in the summer time!! (I was there in October on vacation 2 years ago and fled south as fast as a 2 cylinder diesel KIA rental "car" could get us somewhere NOT COLD.)

I plan to spend far more than 8 hours in Vienna this time. I want to learn all the things on that list. Oh. And I'll be there rambling about some stuff that I'm doing in my lab as well, but nothing as cool as the topics listed on the cool picture above.

If you are interested in learning Advanced Practical Proteomics or know someone who might, send them this link (I think the course tops out at 30 students!)

We can try to solve this mystery together...

...why are there rubber ducks everywhere!?!? 

Friday, April 13, 2018

Determination of Site-Specific Phosphorylation Ratios in a single PRM run!

I've been sending this great new JPR study to lots of people for weeks now and I forgot to post it here! (If you think this blog seems frantic -- you haven't received many emails from me....I need an embedded exclamation mark filter...)

Honestly, the picture does a great job of describing the workflow, but I love to type!

The phosphopeptides are going to have low relative ion signal on their own -- maybe too low to find in cell lysate even with a PRM, but a TiO2 enrichment will do a great job of enriching your single phosphorylation site. And -- okay -- hard to admit, but some proteins are too low to find with high confidence without a stupid antibody enrichment thing -- particularly in body fluids, but do you have to run the instrument 3 times? These authors say --

(How did he know that was going to work...?...)

Just combine it back together and schedule the single (per appropriate replicate, of course) LC-PRM assays. If your protein and phospho are high enough in abundance that you don't have to add in the extra variables of TiO2 and IP and can just build your own -- maybe using the amazing resources at PhosphoPedia -- even better, but if you don't have enough sensitivity, this method could save you valuable instrument time!

Thursday, April 12, 2018

A standard to test inference algorithms!!!

I absolutely have to run out the door, but I have got to leave this here so I can read it if I ever get caught up again!

Proteomics is great at assigning MS/MS spectra to peptide identifications (Peptide Spectral Match -- PSMs).
We are...not so great at figuring out what protein that PSM belongs to.
We try really hard and we've got lots of things that do it for us.

As far as I know we've NEVER had a good standard for testing how well that algorithm builds a protein identification out of a PSM.  (Sure, we have years of evidence that the things we use work) but this is a standard specifically engineered to help test the efficacy of algorithms!

Check this out here!

Wednesday, April 11, 2018

Skyline User's meeting registration is open (and free) for ASMS 2018

Need a reason to get to beautiful San Diego a day early for ASMS 2018?

Check out the ridiculously cool lineup of things going on Saturday at the Skyline User Group Meeting!

Normal Skyline stuff is covered. Then it goes crazy.
Skyline for glycans
Skyline for small molecules
Skyline LIPIDs!!
Skyline for drug monitoring stuff.

The final talk is Matt McDonald from the University of Pittsburgh. I've had the distinct pleasure recently of working on a project with his lab where they generated the data and I did the downstream data processing. This experience was humbling because I've never -- in my life -- generated data as good as what this team rolled off of an Orbitrap XL. The study is being written up now by the people who generated the samples so I can't go into it, but I think this lab is quietly re-establishing the boundaries of what we can do with clinical proteomics if we step away from our normal routines and put experimental design and QC as our number one priorities. 

Registration is free, but space is limited. Also -- some evil corporations are providing lunch!

Tuesday, April 10, 2018

Clostridium acetobutylicum regulation with acetylation and butyrylation!

From the title of this blog post (and this great new paper in press at MCP) this sounds really obvious, right?

A bacteria that is famous because it can make really high quantities of industrial chemicals as it grows anaerobically (some of which are shown in the picture above I stole from this great ASM study) regulates itself primarily by the derivatives of the two things it's most famous for producing.  End of story.

However -- what the heck is a butyrylation!?! And just because your metabolic pathways produce weird things that doesn't mean those are great things to control your metabolism with, right? I don't know, I find this study really cool for these reasons

1) This is a super weird way of regulating anything.
2) Lots of people are working with this bacteria and trying to make it make more of the chemicals it produces. Virtually all of the world's acetone and butanol is made from propylene (from fossil fuels). Dirty, inefficient systems that are only functional right now because we're pumping these things out of the ground like they aren't in any way finite. At some point our species might take a step back -- consider that we aren't acting any more intelligently in terms of resource utilization/waste management than log phase E. coli in a swirling flask -- and look at alternative ways of doing things. This bacteria could be a great alternative way of getting chemicals we take for granted! What was I saying...?

OH YEAH!! But consider this -- microbiology is done by genetics. Knockouts, overexpressions, but -- this awesome paper shows that this isn't how this bacteria regulates itself in terms of fine control -- it's regulating itself for all the important (industrial type production) things with PTMs that even 100x coverage on a HiSeq is not going to show you. You want to REALLY engineer this bacteria? You need to find out what a butyrylation is and how to monitor and regulate that.

Saturday, April 7, 2018

Optimized collision energy by ID rates!

HCD is awesome. It's super fast. It breaks more-or-less right along the b/y ion backbone that you get from CID and it's super fast.  Also -- it's fast.

Unfortunately -- nothing beats CID in terms of predictable fragmentation. HCD fragmentation efficiency can vary quite a bit (look at the actual eV used in your scan header when you're using normalized CE for proof of this) and is dependent on some additional variables like the fragmentation energy calibration, your HCD gas pressure (and -- no proof of this yet -- the quality of the N2 being delivered.)

I think I've rambled on here more than once about my love of the PROCAL peptide standard. One reason for that is it gives you the capability of calibrating your HCD CE so that your fragmentation patterns match between instruments (PROCAL paper here).

Okay -- so that is one part of the equation -- I need 29 HCD on the Fusion (today) to match 27 nCE on the HF -- but....what's the ideal CE? This team takes a swing at it here! 

It's more complicated than you'd guess, unfortunately. But this team sets up a really nice mechanism for determining it.  They use a series of different fragmentation energies versus Mascot scores and other metrics and work out this multiphase ideal shown above. There is some further interesting info in here, including how to strengthen different ion series --- super important if you're thinking about doing something crazy like studying peptides where the y-ions aren't going to be as helpful to you.

Friday, April 6, 2018

Time to level up glycomics -- with your old ion trap!

It should come as no surprise to you that glycans are super important in all sorts of diseases. What we normally do, however, is say -- "hey! this site used to be glycosylated" and put it on the list (because we got rid of the glycans). Totally valid. Great science comes from this and will continue to. While we're seeing more studies with glycan oxonium ion triggered ETD that can work out peptides with glycosylations and what those chains are there are some limitations. First off -- 2 fragmentations slow your instrument down and second off (?) there is a finite limit to the length of the glycan chain you can study and third off (? I should restructure this sentence?) many of the stupid sugars have exactly the same stupid mass -- so you can't tell them apart.

As the body of work continues to build that the actual sugars within the glycan chain are of paramount importance -- here...and here.... --- is it fair to ask the question -- are we focusing our powers on the right side of the molecule?

Of course whether the glycosylation occurs or not is important.
Of course it would be great to know whether you get a di -- or branched-penta peptide at this site. But -- what if you treated a cell and the biggest and most profound difference in that cell was something like this --

-- umm -- I give up. I can't figure out how to rotate this. You're going to have to turn your head. What if your drug functions by completely eliminating an important class of glycosylations -- or two -- that have the same mass as one that it doesn't eliminate? Maybe you could find it, but it sure would be great to have a pipeline to specifically find stuff like this (P.S. those papers up there aren't weird -- cancer people are talking about glycans all the time and coming up with crazy ideas for how to study them like making them stick to glass arrays and using lasers and stuff -- dedicated glycan analysis workflows could be VERY popular for your lab) -- but how would you ever set one up?!?!  I sure don't want to think about it....WAIT....CHECK THIS OUT! 

What if this team was already setting up a revolutionary new kind of glycan analysis pipeline? What if you already have everything in your lab that you need to set it up? HPLC? Check. Ion trap? Check! Skyline?!?!?  YEAH!! 

In negative ion mode, diagnostic fragment ions can be produced that can distinguish between the sugar isomers. This team works this out and shows you how to set up high throughput workflows to figure out what these sugars are. And -- it's Skyline -- you know we can quantify them. Time to break out the Accela (...or...umm...something else....) and put that Ion Trap back to work full-time!

Thursday, April 5, 2018

Boring stuff Thursday #2: How to improve your life with Morpheus!

If you are one of the people who read this weird blog, I've probably convinced you to download Morpheus (and METAMORPHEUS, btw, I think the paper just came out for it -- I can't tell you about that today because it's boring stuff Thursday).

Here are some boring tips that will make you like regular Morpheus better. Associate your TSV files!

Your Morpheus results will pop out as a bunch of .TSV files and some XML thing. If you are a smart data person you probably know how to use the XML thing. I'm not -- and I don't.  I use the TSV files and if you are a smart data person please stop reading right about 2 lines ago. Because I'm going to use Excel (...trigger groans....)

However, Excel doesn't know what a .TSV file is either, so you have to associate it. To do this (told you this was gonna be boring!) find the folder where your processed Morpheus stuff went:

Pick one (I only open the Protein_Groups one. I mostly use Morpheus just to get a good snapshot of my experiments and to check my QC standards.)

Right click on it and go to Properties

In properties you need to find "Open With"

Here it looks like Windows recognized Microsoft Excel. I assure you that it does not -- and will not. You'll need to Browse for Excel. I can't describe to you how to do that without leaving a lot of profanities typed on this page. If you've used Bing much, you won't be surprised to find out that if you do choose Browse and then think that the search bar that pops up in Microsoft Windows can find the program Microsoft Excel....

...but it is in there somewhere. Once you find it, checkmark "always use the selected program". From now on your Morpheus output files will all open in Excel. And you'll be so mad when you go to the LTQ you don't use as often and realize you have to do it all over again. But -- eventually -- they'll all be right...
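And if Excel never cooperates, a TSV is just a tab-separated text file, so a few lines of Python's built-in csv module will read one. This is a minimal sketch: the column names and values in the sample are invented for illustration and are not what Morpheus actually writes, and `read_tsv` is just a name I picked.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Stand-in for a real file; the column names here are made up.
sample = io.StringIO("Protein\tPSMs\nP12345\t12\nQ67890\t3\n")
rows = read_tsv(sample)
for row in rows:
    print(row["Protein"], row["PSMs"])
```

For a real results file, swap the `io.StringIO(...)` stand-in for `open("your_file.tsv", newline="")` and use whatever header names show up in the first line.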

Boring stuff Thursday #1 -- Not all Glu-C enzymes are the same.

SO...Here I was with evidence it was finally here. I'm insane. For real. Not "Oh, look at the old guy climbing a tree outside the bar, I bet his kids are embarrassed" crazy.  The real one. Where you concentrated really hard and on the 5th time you still FAILED TO DO AN IN-GEL DIGESTION.  You QC'ed the instrument again. And still nothing. Air bubbles in all the vials? You've checked everything and that help wanted sign you saw on the door at the gas station this morning strangely pops into your mind. What is that doing there? I'm trying to troubleshoot something. No random flashbacks!! Wait. Did I change a nanoLC column at 9pm last night? Why would that even make sense?!? 

Wait. What's this? The Biopharma Finder runs you queued up last night finally finished (my gosh -- great software -- but we need to see if we can install it on a cluster or something -- slooooow....) and they say this other protein digested great with trypsin AND LysC, but no Glu-C peptides at all?   It's not (exclusively) me!?!? It might (also) be the enzyme!?!?

Okay -- so -- I'd just assumed everybody had the exact same enzyme source and they probably just change the label on the vial. Apparently not true.

Promega (don't sue me!) clearly states -- suitable for in-solution digestion. Doesn't say in-gel. Ask my friend who didn't take 5 years off from science. Of course she already knew this.

Okay -- crisis averted -- I guess. Especially since a friend can loan me one that is compatible tonight!

Pierce's Glu-C specifically states it is in-gel compatible.

NEB says theirs is and recommends a specific buffer they provide and protocol.

Roche Glu-C is in-gel compatible.

Probably others. And maybe Promega's is too, but it doesn't work in my hands. It worked great for an in-solution digestion, though.

Wednesday, April 4, 2018

Just leaving this here so I don't forget to read it!

Link here.

I bet it wasn't this one....which is still hilarious....

...and better than I could do...and turned out okay in the end (NY Post article)