Friday, September 21, 2018

The modern guide to proteomic statistics!

Okay, on the surface this probably isn't the most exciting topic in the world, but this review is perhaps the best recent example I've seen on the topic of false discovery rates in proteomics.

I'm permanently linking this over there ---> somewhere in the (probably needs updated anyway) section for people new to proteomics ideas.

Next time you're having the FDR conversation with a customer or collaborator -- oh, you have one today? me too! -- maybe think about starting with this amazingly insightful and well-written tutorial.

Definitely on this topic -- this xkcd I'd never seen till today. Shoutout to Ben Neely for the link!

Somewhat related and something that will go in that section over there --> as well -- if you do the Twitter thing -- @BioTweeps posted a really concise and well-written overview of Mass Spectrometry that fits surprisingly well within the Twitter character limits. You can find it here.

Thursday, September 20, 2018

Need some inspiration? REAL CLINICAL PROTEOMICS. This is what we can do today!!

Okay -- cool developments with our toys aside for a second -- this might be one of my all-time favorite papers. If you can look at that picture without getting inspired by how far proteomics has come -- AND WHAT WE MIGHT DO NEXT?? you probably didn't click the right link to end up here.

I think I might have passed by this paper once because I didn't know what a lot of the words were in that title. 

To be honest -- I have one criticism of this paper. The title is terrible. 

What I would have made the title --

 In a clinically relevant time frame we can help diagnose a cancer patient and pick a personalized therapy to kill their tumor and massively increase the chance the patient survives!!  

How'd they do it? They did label free proteomics on slides (? pretty sure?)  from an excised tumor as well as the surrounding stromal tissue. 

They rapid tip digest it (they use the rapid reduction/alkylation together method as well) and they pop 1ug of peptides onto an inhouse 40cm column (probably cost them $5 to make?) on a Q Exactive HF. Yup, just an HF. Like the one that is running brain digests from mice with generalized anxiety disorders or something in your lab right now that you were considering trading in for something more expensive to run mouse brain digest with? Same one! 

They use the always free MaxQuant and Perseus for the data processing and downstream analysis/stats. 

Why am I fixating on prices? Because the first argument in clinical anything (at least in the US) is "how can we make an absurd profit for our shareholders again this year if the test we charge $8000 for costs more than $1.16 to actually run? Do you think we're running a charity in this hospital?!?!?" 

This group just showed us that we could do personalized proteomics to help patients TODAY with an aging benchtop Orbitrap (that will fit neatly into any clinical lab -- have you seen how much smaller colorimetric blood analyzers are now? They're tiny!  Boom -- put an HF there). Consider the free software and the virtually nonexistent reagent cost (10uL of acetonitrile and some trypsin), and I don't think we're too far off that $1.16 target for the assay cost. We can stop talking about personalized medicine and actually start doing it already! 

Monday, September 17, 2018

Ultra-high pressure column loading?

I'm torn on this one, cause this is real footage of me the first, last and only time I ever packed a nanoLC column... know what would make this even better? Throwing in an extra 25,000 PSI? Ummmm.... no...not the best idea for me....

However -- if you have successfully packed a nanoLC column without injuring yourself or others AND can make one that doesn't negatively impact the performance of that $1M instrument you're using...maybe this is a solution that would work for you!

You can find the paper at ACS here!

The MSM blog has also posted a summary of the paper here!

Me...I'll probably keep buying nanoLC columns from vendors who are responsible for doing it safely and correctly and are responsible for the upfront QC on their products....

Saturday, September 15, 2018

Miss the days when PD was slower and less stable? Tips to relive those days!

This has come up a lot over the years and I was surprised to see I couldn't find a case of me rambling about it here! So....I present Ben's guide to running Proteome Discoverer way slower and with lots of weird random chaotic surprises!

Tip 1 (picture above):

Keep all your stuff on separate drives while processing! Bonus points if you keep your RAW files on network drives. Double bonus points if you process your RAW files over your network from one drive and then deposit the results on a different processing drive!!  Want to level all the way up? Pull your RAW files from one network storage drive. THEN transmit your processed results to a DIFFERENT network storage drive!

...hours of processing....

Besides the fact that you've gone from eSATA data transfer rates (according to Google -- 6Gbps) to, at best, 1Gbps on a true gigabit ethernet LAN (6x slower) or a whopping 100Mbps on an older Fast Ethernet connection (60x slower), you also get to deal with a bunch of cool extra things that are described well in this page.

It totally cracks me up that the physical distance between your network drive and your PC is a tangible factor that can affect your network rate. High traffic on your network doesn't speed things up either (a win for us nocturnal scientists!), but that is often negated by the huge FAIL that the drives tend to do things like perform their backups and security scans at 2am when there is only the one weird guy in the building using them.

Honestly, our files aren't all that big. We just did some deep fractionated proteomes (15 fractions) and they're maybe 24GB per patient. Going from 6Gbps to 100Mbps shouldn't be that big of a change, even if you had 10 of them, right? However, it isn't just one reading step. It's constant R/W steps (have you seen the funny huge ".scratch" file that is generated while you're running?). You are constantly reading that back and forth across the network.
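For a rough feel of what those link speeds mean for a single pass over the data, here is a back-of-the-envelope sketch in Python. It assumes the nominal link rates are actually achieved (they rarely are) and uses the 24GB-per-patient figure from above; the function name and the table of links are made up for illustration.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal time to move size_gb gigabytes over a link of link_gbps gigabits/s."""
    size_gbit = size_gb * 8  # 8 bits per byte
    return size_gbit / link_gbps


PATIENT_GB = 24  # fractionated proteome size quoted in the post

# Nominal link rates in Gbps; real-world throughput will be lower.
links = {
    "local drive (6 Gbps)": 6.0,
    "gigabit ethernet (1 Gbps)": 1.0,
    "Fast Ethernet (100 Mbps)": 0.1,
}

for name, gbps in links.items():
    secs = transfer_seconds(PATIENT_GB, gbps)
    print(f"{name}: {secs:.0f} s ({secs / 60:.1f} min) per one-way pass")
```

One pass is no big deal even on the slow link. The pain comes from multiplying that number by every read and write of the scratch file during a run.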

Beyond the fact that PD is super slow -- you get all sorts of hilarious strange bugs. This week I saw one where PD would claim there was something wrong with the name of the output file that someone was trying to use! Wins all around!

Tip 2: Even on the same PC --- process your data on different drives!  

I think I have proof around here somewhere. I think I worked it out to 24x slower if you R/W to different drives as opposed to processing the same data all on one drive.  I think it's on my old PC....I'll update if I find it, it's striking.

Wait -- side note --- did you know that even HDDs can have markedly different speeds? They totally do! There are drives designed for storage that are much slower than ones meant for working on. I've described my problem with that recently on here I think.

This is from a paper currently in review from our lab, but I think it's cool to use it here out of context ---

The cool part is how our new software makes processing huge proteomics sets much faster while kicking out the same data -- but what is pertinent in this ramble is the two shorter bars. Using the exact same files, huge multi-gig proteogenomic FASTA and software settings, we can drop a processing run from 24 hours down to 14 or so just by moving everything from a HDD to a faster standard commercial Solid State Drive (SSD). If you aren't processing on these, I'd recommend checking them out again. They are getting cheaper every day. I think we just ordered some 1TB ones for less than $200. Bonus: I've still never had an SSD fail. And I've got 2 HDDs on different boxes that sound like they are popping popcorn (not the best sign ever) that aren't as old as the SSDs sharing space with them.

Can I call this a "guide" if there are only two tips? On the first Saturday in approximately 3 years in Maryland where the sun is shining? Looks like I sure can. I need to put on some brake pads.

TL/DR: PD HATES processing over network drives. Move your data and output files to the same drive when running PD then put them back. Yeah, transferring is a pain, but you'll more than make up for it in processing your data faster and with less random chaos.

Big shoutout to the two great scientists who introduced me to new PD errors this week that inspired this post!  I promise I'm not making fun. This really does come up a lot. It's too tempting to use your >100TB network storage rather than move things around, but I think system architecture needs improved before you can do it bug-free.

Friday, September 14, 2018

ANN-Solo -- Use spectral libraries to search for modified spectra?

Another big thanks to @PastelBio for something I would have missed!

Okay -- so what if you took one of the remaining limitations of spectral libraries and threw it out the window?  I'm talking about the fact that your library must contain the modification that you're looking for(!?!) -- Then you'd have ANN-Solo!  You can read about it at JPR here. An earlier version of the text was released at bioRxiv as well.

Now...I'm unclear how the ANN (Approximate Nearest Neighbor) part of this differs from the NIST Open Search functionality added to MSPepSearch last year.  At first it seems interesting that the authors use the NIST library here but don't appear to compare their code to MSPepSearch + HybridSearch.  They do use other libraries and since MSPepSearch only utilizes NIST library format, maybe the comparison isn't possible?  I would be very interested in seeing a comparison between the two.

Unfortunately, while Open Search has an .exe that I can run and use, ANN-Solo requires Python with NumPy to work and I'll have to ask for help if I want to try it. Honestly, with results as good as the paper reports -- 100% worth it.

Interested and don't feel like reading on a Friday? You can get the software here!

Thursday, September 13, 2018

Mislabeled Data Challenge starts September 24th!

Yo! Bioinformatics peeps! Want in on the coolest challenge you've ever heard of?

Check this out!

Can you get clinical data and proteomic data (and RNA-Seq data, but who cares about that?) from "patient samples" that have been deliberately mixed up and blinded and then sort them back out?

I'm not 100% sure our group qualifies to compete, but we entered anyway. Fingers crossed, we REALLY want in on this after work/weekend project.

I would like to unofficially increase the level of this challenge. Forget the RNA-Seq data. We can do this with proteomics alone!

If we get in we aren't even gonna download the RNA-Seq. The signal is there in the proteomics data. We just need smarter ways to pull it out for comparison.

Time to shift the paradigm!  It's FINALLY the age of the proteome and this is a test case where we can prove it.

You can read more about this challenge in Nature Medicine here.

You can directly sign up here.

Be warned, I'm already planning an award ceremony for when someone pulls this off without looking at the nucleotide data.

My proposal -- we should have an award ceremony for ourselves at ASMS or HUPO next year. I also propose it features the great Dr. Jurgen Cox coming in and kicking over a stool that has a MiSeq on it. Come on, tell me you had trouble visualizing that happening when you read it!

{Edited to remove some statements regarding RNA-Seq and shoes}

Wednesday, September 12, 2018

Rapid assessment of non-protein contaminants with Skyline!

Hopefully if you're doing proteomics you're always throwing in some great FASTA entries from cRAP or the MaxQuant contaminant database or have even generated your own list of stuff that you find in every water blank (or a combination of all 3).

Have you ever seen a way to keep track of the junk in your sample that isn't from sheep wool or gorilla keratin peptides?

Me either! Here you go!

Loads of reasons to read this paper.
1) PEG is in just about every sample in some way. It's only when there is tons of it that it's a serious problem. This can help you keep track of this!

2) PEG is the first thing you might think of, but there are other contaminants as well. And this method doesn't just work for proteomics. It'll work for any LC-MS experiment.

3) The author totally pulls off a full (and awesome) application note as the single author. It's a great precedent for people with a bunch of stuff on their desktop that they felt funny about writing alone. Writing "I" a lot in a paper feels really weird while you're doing it, just because you're so used to reading "we".  It doesn't come off as weird when you read someone else who wrote it that way.

4) In Excel you can =MROUND([Cell],5) to round to the nearest 5. Which no person ever in the history of the world has ever needed. You're welcome.
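If you ever need the same trick outside of Excel, here is a minimal Python sketch of the idea. The function name just mirrors the Excel one, and this version is a simplification that only handles non-negative values and positive multiples. It rounds halfway cases up, which is how I understand MROUND to behave (Python's built-in round() sends exact ties to the nearest even number instead).

```python
import math


def mround(value: float, multiple: float) -> float:
    """Round a non-negative value to the nearest multiple, halves rounding up."""
    if value < 0 or multiple <= 0:
        raise ValueError("this sketch only handles value >= 0 and multiple > 0")
    # Adding 0.5 before flooring pushes exact halves up to the next multiple.
    return math.floor(value / multiple + 0.5) * multiple


print(mround(12, 5))    # nearest multiple of 5 below
print(mround(13, 5))    # nearest multiple of 5 above
print(mround(12.5, 5))  # exact halfway case
```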


Tuesday, September 11, 2018

Probing the sensitivity of the Fusion Lumos system!

This new paper at JPR sets up a terrific standard method for determining the sensitivity of an LC-MS system.

The NIST reference antibody was digested and spiked at different levels into a constant concentration of a standard yeast digest. The Lumos was operated in different ways to determine relative sensitivity by picking up the mAb digest at different spike levels.

The most interesting comparisons are probably when the ion trap and Orbitrap are compared and when the Lumos is compared head-to-head with a Q Exactive Plus instrument.

While the Lumos comes out ahead in every comparison, it's only when the ion trap is involved that the gap between the two instruments becomes something you couldn't overcome with some optimization and gradient lengthening -- the gap is just too large.

There are a lot of gems in this study that help guide instrument selection and method optimization on this great platform.

Monday, September 10, 2018

Time to justify that Virtual Reality gaming rig you've been thinking about!

Sometimes we need to ride some coattails to move science forward. Case in point?

First of all -- there is an entire journal called "Computer-Aided Molecular Design" !?!? 

Second --- you might mostly know VR headsets from videos of how stupid people look while playing games with them....

Okay -- but what if you could take one of these things and, with shockingly little code that is freely available here, immerse yourself in protein structures from the immense PDB database all those weird structural people are already uploading to?

The better the PDB structure present (newer ones tend to have way more snapshots of the protein from different angles) the better this all works, but if you can't seem to sort out those protein interactions, maybe a visual/pseudo kinesthetic approach will help you get that breakthrough!

Wednesday, September 5, 2018

Deep diving in spinal fluid proteomics!

Human body fluids have unbelievable dynamic ranges in terms of protein abundance. Spinal fluid is no different. Which is surprising, honestly, if you look at it, because it just looks like water.

(Not my freakishly large hands).

Just like the blood/serum/plasma proteome, we don't know with 100% certainty what proteins are present in the fluid under normal conditions, but this brand new JPR study does the best job yet of thorough characterization.

Cool stuff from this study -- there are companies you can just buy commercial human body fluids from! They just bought a bunch of CSF!

They digest the CSF, label with TMT6-plex, and then they break out the OFF-GEL and use the high resolution fractionator (24 isoelectric peptide fractions).

It looks like they take the peptides directly from the OFF-GEL and desalt online (! awesome if true !) and run a complex 171.354 minute gradient (my math) on a 50cm column into an Orbitrap Fusion Lumos running in OT-OT mode (120k MS1 15k MS/MS).

That's 68 hours of Lumos time and the highest number of peptides and proteins from CSF to date, by a large margin!   Now that there is an improved baseline for "normal" is it time to re-evaluate some of these historic datasets from studies on different pathologies?  I'd think so!
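The 68 hours is just fractions times gradient length. A quick Python sketch of that arithmetic, assuming one injection per OFF-GEL fraction and ignoring loading and equilibration time (both assumptions are mine, as is the function name):

```python
def instrument_hours(n_fractions: int, gradient_min: float) -> float:
    """Total acquisition time in hours for one injection per fraction."""
    return n_fractions * gradient_min / 60


FRACTIONS = 24          # isoelectric peptide fractions from the post
GRADIENT_MIN = 171.354  # per-run gradient length from the post (my math)

print(f"{instrument_hours(FRACTIONS, GRADIENT_MIN):.1f} hours of Lumos time")
```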

All RAW files from this great new reference dataset are available at PRIDE/ProteomeXchange here.

Saturday, September 1, 2018

Computers get worse at processing data when their hard drives are full.....

Maybe everyone already knows this except me. But over the last week both Compound Discoverer 3.0 (beta?) and PD 2.2 have slowed down on my PC when they're running simultaneously.

At first I thought "maybe my PC shouldn't be running big PD and CD batches at the same time," but I bothered Dave at OmicsPCs (where I get my hardware) and he checked it out.

So....apparently....these bars are red because even Windows realizes that it's a problem.....

Dave said that even with a super speed SSD thing (the 931GB drive that I processed everything on before I got too lazy and then just started processing everything on the not-precisely-meant-for data processing slow storage drive that I specifically requested he add even after he warned me that it would be a poor choice for data processing, but would be super awesome for low voltage long term secure data storage) when it gets that full, it slows down.

And then I get to answer the question -- why do I have 288GB of stuff in my recycling bin?

Delete that! And it's way faster again!
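If you'd rather catch this before a batch crawls, here is a small Python sketch that checks how full a drive is with the standard library's shutil.disk_usage. The 90% cutoff and the function names are my own arbitrary choices for illustration, not a number from Dave or from Windows, and "." is just a stand-in for whatever drive you process on.

```python
import shutil


def fraction_used(path: str) -> float:
    """Fraction (0 to 1) of the drive holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total


def is_nearly_full(path: str, threshold: float = 0.9) -> bool:
    """True if the drive is at or above the (arbitrary) fullness threshold."""
    return fraction_used(path) >= threshold


drive = "."  # hypothetical: point this at your processing drive
status = "nearly full, go empty the recycling bin" if is_nearly_full(drive) else "fine"
print(f"{drive}: {fraction_used(drive):.0%} used, {status}")
```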

Moral of the story -- if you're a small business and I buy stuff from you, expect really stupid questions if you answer your phone.  Shoutout to the teams at Protifi and OptysTech that are probably far too nice to agree with this statement.

Wednesday, August 29, 2018

mPOP --> Streamline single cell proteomics!!

Have you taken a swing at SCoPE-MS?

Are you also really bad at pipetting?

It doesn't take too much distraction before you will lose that single freaking cell that is in the bottom of your stupid plate. ("Did I add the 0.25uL of label to this tube? I'll get the microscope to see. Hopefully it won't evaporate before I get back")

Okay -- this HAS to help!

I'm aware there is some concern regarding SCoPE-MS, such as why it hasn't been peer-reviewed yet, but I know of at least one independent lab that is pulling in results from this approach (not counting me -- yet -- pipetting issues....) so the data is coming! If you use Tweeter, or whatever, I recommend you follow @SlavovLab. This group is kicking ass and appears characteristic of the growing community of research labs that are looking to alternatives to the classic peer review model for rapidly pushing science forward.

Tuesday, August 28, 2018

You can add Ion Mobility to your Fusion!

*Cue someone to email me and tell me FAIMS and Ion mobility are different.*

High-Field Asymmetric waveform Ion Mobility Spectrometry is back in the form of the FAIMS Pro device you can add to any Tribrid mass spec!

What's it do? It allows gas phase selection of peptides directly inline on your instrument.

I took this from the manufacturer's website here -- 24% gain when using FAIMS on a Lumos, vs not?  This is ion trap MS/MS, btw, but that is an amazing boost!

Some details I've found between the spec sheet and the press release.

The data can be processed in Proteome Discoverer 2.2 (no digging through GoogleGroups to find the patch to process your data like some other ion mobility medium resolution system we hear a lot about)

FreeStyle can also process the data (might mean you finally have to ditch QualBrowser -- which, honestly, you really should. Yeah, I'm still opening everything in Xcalibur too -- but FreeStyle is so so so much better.)

FAIMS Fusion operational software templates are also ready to go (new Fusion tune upgrade coming?!?)

Skyline can also deal with the data automatically!

If the manufacturer plans to put this device on its fastest and most sensitive mass spectrometers, I don't see it mentioned anywhere, but it provides an enormous boost for the ones made here in the U.S.!