Thursday, November 28, 2019

PDV --- An integrated proteomics data viewer!

Do you mess around with all sorts of different search engines?

Would it be awesome if someone spent a ton of time making sure that you could look at all your data in the same downstream interface?

Paper link here.

Are there other ways to visualize data from a bunch of different engines and tools? Sure. But PDV is probably the easiest and most widely compatible -- and has some features that are ridiculously handy.

This little Java program does a perfect job of looking simple while pulling off crazy powerful things.

You can get it at this GitHub where you'll see the ridiculous number of tools it supports. I can't take a screenshot big enough to capture them all. Basically, if you can think of a search engine, PDV can load the results, and it all comes out looking like this!

Did you, for example, reprocess a bunch of MaxQuant files from PRIDE with MetaMorpheus and MSFragger to see if the original analysis missed anything? Have 4 little Java windows open at once and look at the evidence in the same format! How valuable is that?  (Can't seem to clip the other monitor simultaneously -- but here are two! MaxQuant on the right, MM on the left.) MaxQuant has SO MANY ROWS (columns? it's early here. whatever the vertical ones are) that it can be a little tough to visualize. And that ends my complaints about this great tool I somehow missed.

There is more here. The PROTEOGENOMICS button lets you upload a gosh darned proBED or proBAM file and look at it directly in this as well. Another button lets you look for individual peptides inside a file (just type the peptide sequence in the box!) and plot it if it sees it. I put in a bunch of fake peptide sequences and the visualization was a little odd. I haven't checked it with anything real. I've got a bunch of cooking to do.

Great tool. Solidly supported with tutorials AND example files. Recently updated. Unifies a lot of our tools, which we really need to do more of so we can keep them around. 100% recommended!

Monday, November 25, 2019

The Okinawa Analytical Instrument Network Meeting 2019!

Have you ever heard of OIST? I'll be honest, neither had I. I am, however, 100% confident that you will hear lots about it soon. OIST stands for the Okinawa Institute of Science and Technology and it might be the most beautiful campus on earth. No picture I've taken has done it justice at all. The facility is built on mountaintops to responsibly protect the ecosystems around it. And it is growing! 

I was inexplicably invited to talk at the Analytical Instrument Network Meeting this year along with some serious mass spectrometry experts. I'm going to ramble through an overview of this meeting here:

Day 1: Meet people, see the amazing facility and Q/A session.

This facility has an amazing array of instrumentation and hardware -- from robotics through NMR and various arrays of GCMS and LCMS -- to support the needs of a facility dedicated to primary scientific discovery. The work they are doing is on a lot of organisms I've never even heard of. The best part of day #1, however, was the Q/A session. The speakers (who I'll detail below by talk on Day #2) and I were invited to sit down with the Instrument Analysis Section Staff (the people running the 40 analytical instruments!) and their collaborators who are utilizing the mass spectrometers the most.

This is a plea from the very bottom of my heart to everyone with an aging mass spectrometer: Please, please do not retire that 10 year old Orbitrap. Don't trade it in for a $15,000 credit unless you really truly can't afford your new instrument without it. Do this instead: Set that instrument aside and let the power users of your facility -- or better yet -- students at your or a nearby university use it. Have a workshop and show people how to do the basic stuff, set it up with some nice generic workflow, and let them go. It might be the best investment you've EVER made in the future.

OIST has done something similar. They have several LTQ Orbitraps and one is user walk-up. I know your objections. "The students will just break it!" Sure. Set it up so it's hard to. Capillary flow or, heck, analytical flow. Put a good desalting/trap system on it. Use a wider bore emitter. Give them enough training, and accept that they will still mess it up some. The payout is astronomical.

I mean no offense to any biologists I've worked with in the past, present, or future, but if your collaborator's best understanding of what happens after they give you the sample is something akin to this..... you are handicapping yourself, them, and science in general.

They aren't going to ask the right questions of you or the technology. They are going to ask questions that LCMS is not the right solution for, and the project will be a failure. Or they are going to ask questions that the technology could easily answer, without knowing that they could be learning much much more along the way. (And how much longer are they going to even be interested in what we do?) Without some fundamental understanding of what we're doing in the lab (and to be fair, I do a decent amount of dancing in the lab when no one is around, and I KNOW I'm not the only one -- what else do you do when you're sonicating or centrifuging for 3 minutes?) we aren't speaking the same language. OIST previously had an ion trap that some users were trained to utilize, and now they have an Orbitrap Classic. And you know what?

The questions the panelists and I received were some of the best I've EVER heard from a group of people consistently. I'm not kidding -- the biologists who are using the center resources know enough about what the instrument is doing that they are pushing the boundaries of what I or the other panelists (and I think we were talking about 75 years of mass spec experience on the panel, all told) knew or had thought about. I frantically started taking notes, because there were things the biologists cared about that we could absolutely be helping them with, that we aren't because we aren't thinking in the same terms. (Now, it probably doesn't hurt that the proteomics lead here, Alejandro Villar, might be a legitimate genius, but the end users here are impressive. And, yeah, there is probably an abundance of talent, but my bet is that most facilities could come close to replicating this model with similar success.)

As an aside -- this group does have what might be the solution for the hardest part of single cell analysis. They have a slow, methodical robot that is designed 100% toward reproducibility and accurate sample handling. It might cost more than an Eclipse, but how many single cell proteomes do you have to mess up before that seems negligible? Depends on the model, I'd guess!

Day 2: The actual meeting.

Our host Dr. Kazuo Yamauchi kicked off the workshop by reminding everyone that this was an interactive event (yeah! that's how you learn this stuff -- unless you're a giant sample prep robot...)


Dr. Andreas Huhmer presented a remarkable optimization of single cell proteomics by his team, beginning with label free analysis (<500 proteins), to ScoPE-MS (~1,000 proteins), through adding FAIMS, and then Direct Search (the Eclipse function that presearches TMT data to make sure it's decent before doing SPS MS3).

Huge takeaway here (and he's sent me a reference): if you are doing ScoPE-MS you do not want to overload your carrier channel!! It seems like you could just load more and more peptides into your carrier and you will get more data. You will get more peptide IDs -- sure -- but you will screw up your reporter ion signals. You'll suppress the single cells. To be honest -- I was sure I was ruining my sample prep when I did ScoPE-MS (and I probably didn't do it any favors) but I did think that loading more carrier was a great next step when the first experiment didn't look great -- and -- I think my data looks like what he was describing. More on this later, for sure.
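To see why a monster carrier channel starves the single cells, here's some back-of-envelope Python (my own illustrative numbers, not anything from the talk): assume a fixed AGC ion budget gets split roughly in proportion to channel abundance.

```python
# Back-of-envelope sketch (illustrative numbers, not from the talk):
# with a fixed AGC target, ions are sampled roughly in proportion to
# channel abundance, so a huge carrier starves the single-cell channels.

def ions_per_single_cell(carrier_fold, n_single_cells=8, agc_target=50000):
    """Ions landing in ONE single-cell channel when the carrier is
    carrier_fold 'cell equivalents' and each single cell is 1 equivalent."""
    total_equivalents = carrier_fold + n_single_cells
    return agc_target * (1 / total_equivalents)

for fold in (100, 200, 500):
    print(fold, round(ions_per_single_cell(fold)))
```

The exact numbers are made up, but the trend is the point: every extra fold of carrier comes straight out of the ions available to the single-cell channels, and your reporter ion statistics go with them.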

Professor Yet-Ran Chen -- studies of plant immunology leading to a massive success story: leveraging plant immune function to get the world away from pesticides with endogenous immune peptides. Check out this new paper on the topic. Also -- his team has developed a method for using Mascot to search for endogenous peptides. He's doing this with plant immune peptides -- strategies to steal for HLA/MHCs? Probably! (I mean...I was just impressed to learn that plants had immune systems, to be honest)

Professor Newman Siu Kwan -- His group does a lot of work with proteomics and PTMs and aging, and that is what he spoke about. His Google Scholar indicates that a lot of stuff deals with...compounds that mediate...age related terribleness. 100% worth searching through this!

Professor Takeshi Bamba -- Metabolomics by supercritical fluids!  This talk began with me frantically taking notes and looking up what supercritical fluids are. His team uses pressurized CO2 to separate metabolites! You can separate lipid classes with it. And it can be coupled to HPLC and then to mass spec and he's partnered with vendors to bring this technology to market. I'm still not 100% sure I get it, but it's worth checking out.

Other highlights were great questions and conversations and the pages of notes that I have.

I'm still not sure how I got included in this, but I'm deeply grateful to the organizers for this experience and have made some friends here that I hope I have for life. I even got to stick around and sit in front of their instruments and learn and bounce ideas back and forth with this amazing team.

OIST has a dream of doing something truly special to improve the environment and science of the world and they are well on their way to accomplishing it on this ridiculously beautiful and fun little island!

Edit: Right as I was boarding the next leg of my 20+ hour trip home I received the picture at the top from Dr. Yamauchi. I made the joke repeatedly that all you would see was the projector bouncing light off my shiny head. I am waiting for my first of many planes today to board and I can't stop laughing because I was more right than I knew. A consequence of the great aerodynamics I get these days!

Sunday, November 24, 2019

FragPipe -- MSFragger made a bit more mortal, and a lot more powerful!

I've been meaning to do some tests with the new iterations of MSFragger and finally had a few minutes to do some extremely limited tests. More coming. 

Let's clarify what we have here:

MSFragger is the search engine. It is impossibly, ridiculously fast in command line format. If you had to remember one thing about MSFragger it's that it's the open search engine. Let it search for any delta mass shifts within 500 Da of your target. It'll find those masses, and it will find them in seconds or minutes, not hours, days, months, or millennia (some engines might honestly take more than a human lifespan to do a delta search; I don't have the patience to verify). I also only used 500 Da as an example.
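If "open search" is a new term, the core idea fits in a few lines of Python. This is purely conceptual -- the peptide masses below are made-up placeholders, and MSFragger's actual speed comes from a fragment index that this sketch doesn't attempt:

```python
# Conceptual sketch of an open search: instead of requiring the observed
# precursor mass to match the peptide mass within a few ppm, allow any
# shift within a wide window (e.g. +/- 500 Da) and report the delta mass.

OPEN_WINDOW_DA = 500.0

def delta_mass_candidates(observed_mass, peptide_masses):
    """Return (peptide, delta) pairs whose mass shift falls in the window."""
    return [(pep, round(observed_mass - m, 4))
            for pep, m in peptide_masses.items()
            if abs(observed_mass - m) <= OPEN_WINDOW_DA]

# Placeholder peptides and masses, purely for illustration:
peptides = {"PEPTIDEK": 927.4549, "ELVISK": 701.4061}
# A precursor ~57.02 Da heavier than PEPTIDEK (an alkylation-sized shift):
print(delta_mass_candidates(984.4764, peptides))
```

The pile of recurring delta masses you get back (57.02, 15.99, and so on) is where the real fun of interpreting an open search begins.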

MSFragger nodes work directly in Proteome Discoverer and the newest ones even work with all the quan nodes! 

From what I wrote in that post, I'm trying to guess which computer I used for it...and I'm going to guess it was a 7th gen i7 laptop that I don't use much these days. Last year TSA agents were forced to work without pay for 35 days or so and...well...airport security was a little more intense at times and my laptop got dropped and bent and getting it to power on requires some careful flexing around the power button. I'm only rambling about it because it might be important in a minute.


FragPipe is the package that contains MSFragger and all sorts of great stuff to make it useful. I'm using the Windows GUI version and it links directly into Philosopher, and it does all sorts of cool stuff like automatically downloading and formatting your FASTA files and directly reading from RAW files!

I haven't tried the DIA Umpire options yet (or many others, but I suspect I'll get to them soon. This is WAY too cool not to spend time on). 

For those of us who get to one format of data input/output and get stuck on it a little -- there are MSFragger Proteome Discoverer nodes now!  I've got a partially written post on here somewhere that I started a month ago and just realized I never finished....but I'm on the wrong continent and I don't want Proteome Discoverer on the PC I am using right now.

There are some steps involved in setting up FragPipe, but the tutorials are great, and the software gives you prompts. Got a Java that's too new? Installed an old version of Philosopher a while back? It fixes all that stuff. 

To the mortal part -- and the PC. When I first ran the command line MSFragger (I linked the post above) I spent a lot of time just hitting the enter button to see if it really was completing searches faster than I could click the enter button again. I'm not kidding, it is that fast. To the point that you assume it errored out and there could be no data -- but there was data! However, a huge list of potential matches and OPEN SEARCH shifted masses was a little overwhelming. Obviously lots of power but, for me, some limited practicality compared to MetaMorpheus, which provided a more practical output. (Is there some sort of Ann Arbor/Madison rivalry thing based around cricket or quidditch? I think there might be. Let's extend it to free software!)

FragPipe is doing a lot of stuff when I run it now. Way more stuff than MSFragger was doing. It's taking my vendor format binary file (RAW) and it's running with that. It's making a file of my decoy reverse stuff. And -- it's running on an i3 I got off Amazon for <$300 and upgraded (SSD and RAM) to be slightly less of a catastrophe (go ahead and drop it TSA, everything is backed up on Cloud things!)  

Total run time per file? I'm hitting around 11 minutes with a 500 Da open search. A 500 Da open search!  That's hella fast. It's just not instantaneous. It could be the laptop I got for typing and battery life and NOT for data processing, but it could be that MSFragger is the fastest thing ever, but all those other new steps take time. I'll check on other PCs later.

But -- does it work? I wouldn't be typing this at 3am if it didn't! 

I didn't tell FragPipe any modifications at all and just Open Searched it. I kicked out the TSV file into Excel and used the "Ideas Artificial Intelligence Button" (LOLs!) to make a Pivot table and chart. What do you know -- loads of iodoacetamide and alkylations. (I think that's a nice test for an open search engine. Find the stuff you know is there first!)
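If you'd rather skip Excel, the same pivot-table trick is a few lines of stdlib Python. The "Delta Mass" column name here is my placeholder -- check the actual headers in your own FragPipe output TSV before pointing this at real data:

```python
# A scriptable stand-in for that Excel pivot table: bin the open-search
# delta masses to two decimals and count them.

import csv
from collections import Counter
from io import StringIO

def top_mass_shifts(tsv_text, column="Delta Mass", n=3):
    rows = csv.DictReader(StringIO(tsv_text), delimiter="\t")
    counts = Counter(round(float(r[column]), 2) for r in rows)
    return counts.most_common(n)

# Tiny fake TSV standing in for a real results file:
fake = "Peptide\tDelta Mass\nA\t57.02\nB\t57.02\nC\t0.00\nD\t15.99\n"
print(top_mass_shifts(fake))  # carbamidomethyl-sized (+57.02) should dominate
```

Same sanity check as the pivot table: the shifts you already know are there (alkylation, oxidation, unmodified) should bubble to the top first.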

FragPipe -- Totally works -- easy to use -- still crazy fast -- and now has all the tools around it to make MSFragger a complete package. 

Saturday, November 23, 2019

Static Percolator allows application to smaller datasets!!

Okay -- if we've talked at a meeting about data processing -- we've talked about this. I'm at an amazing meeting right now and I was in 2 great conversations about this concept already.

Percolator is fantastic. It is the gold standard for false discovery rate calculations, but it was designed for global applications. If you've looked at your data you've seen this phenomenon where all of a sudden you can't seem to trust what it is giving you by default.

Some of those videos over there on Proteome Discoverer are like 7 years old now? And I ramble incoherently about it there. But what is the solution?  Could it be this!?!?

I have a finite number of stored Obama Boom GIFs left, but this deserves one.

Static modeling percolator!!

What's the difference? Normal Percolator is dynamic. It learns from that big ol' dataset you just gave it and that's what it uses to set your parameters.

Static modeling flips the switch. What if it learns from a big ol' dataset, takes all that stuff it just learned, and then you apply those settings to the little dataset you just gave it?

Well -- it looks like you've got a smart Percolator right out of the box!
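For the idea in miniature -- and this is emphatically NOT Percolator's actual semi-supervised SVM, just a toy of "learn on the big dataset, apply frozen to the small one" with made-up PSM features:

```python
# Toy sketch of static rescoring: learn feature weights on a large
# target/decoy dataset once, freeze them, then score a small dataset
# with those frozen weights instead of re-learning from too few PSMs.

def learn_static_weights(features, labels):
    """One weight per feature: difference of target vs decoy means."""
    n_feat = len(features[0])
    weights = []
    for j in range(n_feat):
        targ = [f[j] for f, y in zip(features, labels) if y == 1]
        dec = [f[j] for f, y in zip(features, labels) if y == 0]
        weights.append(sum(targ) / len(targ) - sum(dec) / len(dec))
    return weights

def score(psm_features, weights):
    return sum(f * w for f, w in zip(psm_features, weights))

# "Big" training set: [search-engine score, delta-score]; 1=target, 0=decoy
big_X = [[5.0, 2.0], [4.0, 1.5], [1.0, 0.2], [0.5, 0.1]]
big_y = [1, 1, 0, 0]
w = learn_static_weights(big_X, big_y)   # the frozen "static model"

small_run = [[4.5, 1.8], [0.8, 0.3]]     # tiny dataset, too small to learn from
print([round(score(p, w), 2) for p in small_run])
```

The real method learns a far richer model, but the payoff is the same: the small run gets scored with knowledge it could never have learned from its own handful of PSMs.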

As shown in the picture at the very top that I stole and then clipfarted over -- when your datasets are too small for Percolator to learn enough from it gets "discordant" (their word). I like fuzzy better. I've always wondered if there was a static cutoff. It's great at 100,000 PSMs. It can be BAD at 1,000 PSMs. Where is the cutoff?

What we learn here is that there is no set number (which makes sense, it would be weird if there were) but you get progressively fuzzier as PSM numbers drop (which we've all seen. we're all totally smart. just not smart enough to fix it). These guys (one of them, I hear, has some evidence he knows something about how the whole thing works) just pulled the whole thing together.

100% Recommended reading (or at least skimming). It has big implications for our field and how we will process ALL our data in the future.

Edit: Want to know more? Check out this blog post!

Friday, November 22, 2019

Wanna go fast?!? SPEED Acid digestion!

I'm going to start off by admitting I mostly moved this paper up the blogging queue so I would have excuses to put more Ricky Bobby quotes on this blog.  However, there are clearly some jewels in here. It might seem off topic, but -- THERE ARE MICROBIOME STANDARDS FROM ATCC!!! -- and they use one for the gut microbiome in this study. I've written to a lot of commercial groups that do genetics microbiome standards to see if they had anything for proteomics, and I've been ignored. Probably because everyone in the world but me knows that ATCC has already locked up that market? Great. I hear they make good standards and it looks to me like they'll be perfect for metaproteomics and metabolomics.

With the dozens (hundreds?) of sample prep methodologies out there, why would you use this one? Cause...

...that was a joke...

SPEED isn't a crazy fast digestion method like FLASH or S-Trap (which can be killer fast if you want to use it that way -- I've noticed it used more slowly in the literature, because our typical overnight digestion can be pretty darned convenient). The goal here appears to be to minimize sample handling, because that should always be our goal.

I stole the overview of the method as the top image in the post and it should expand if you click it. I really really like one part of it: the turbidity measurement for protein concentration. <1 minute to know what my protein concentration is? Now...that being said....I've not found NanoDrop to be the most accurate way of measuring proteins, and turbidity seems even a step back from that. (As always, please keep in mind I'm not very good at sample prep, and my decreasing vision and hand-eye coordination aren't exactly helping -- thanks senescence! you're the best!) If that is linear and sensitive, that would save a lot of time.

Here, microwave digestion is used as a first step for the hardest of samples (10 seconds for gram positives with an 800W microwave [much less time than you'd need for Shake 'n Bake. Yes. That just happened]) -- it isn't used as the ultimate digestion method. Trypsin is still employed as the final digestion reagent.

The authors are quick to point out that while this method has some great pluses -- like no detergents -- there are some obvious concerns, so they look at phosphorylations in this study.

How's it do? On the microbiome standard it appears to outperform iST by a lot and even beat S-Trap. If you'd like to check out the data, it has been uploaded to ProteomeXchange as PXD011189, but hasn't been made live yet.

Hey authors! This is two papers in one weekend. Don't forget to tell the repositories when your papers get accepted so all us nosy people can actually start digging through them.

Do we have a new number 1 digestion method? I'd like to see more data from other people before thinking this beats the current number 1 (which I'd call S-Trap right now).

And you know you can't have 2 number 1s -- cause then you'd have 11.

Thursday, November 21, 2019

ScoPE-MS +TMTPro + FAIMS + Realtime Search (Fusion Eclipse) for Single Cell Proteomics!

What technology should you use for the highest coverage of single cell proteomics?

Let's go with....ALL OF THEM!

Wednesday, November 20, 2019

Precision FDA -- Free super computing power preloaded with applications!

Okay -- I now have a PrecisionFDA account and I've just uploaded data to it, and I'm trying to reduce some spectra with RIDAR with it -- and I have no idea why this is here, but I like it.

Disclaimer: Since this is an HHS US government thing it might be for US people only? But...I'm in Japan right now and I logged right in, even with the 2-factor authentication thing.

You can go to PrecisionFDA here.

What do I know about it?

Well...they are the ones responsible for the CPTAC challenge to identify mislabeled samples -- so they're clearly the good guys. That was cool, even if the challenge's start and end deadlines made it clear they didn't expect anyone interested in participating to have a job. Scientists tend to be busy people, yo. If you want them to volunteer for stuff and they see they have to start and complete in like 3 weeks, no one is going to take you up on it.

What else do I know about them? I just got a free account and uploaded data to their cluster. There are a bunch of tools there already but it currently looks like all dumb genome and transcriptome stuff, but if someone is going to let me run my tools on their power bill they're cool people in my book.

Tuesday, November 19, 2019

A wild TMTPro Paper has appeared!!

About darned time!

Just accepted at JPR -- the first (as far as I'm aware -- please correct me if I'm wrong) study showing the use of TMTPro (previously TMT16-plex)!

Quick summary of my rapid readthrough:

1) This group typically uses an NCE of 38 for TMT10/11-plex reagents; they use 32 for TMTPro (please keep in mind that proper HCD NCE can vary from system to system and there are ways to calibrate for that now). The important part is that the HCD is lower/closer to what we use for unlabeled peptides!  This is particularly good for those of us still using MS2 for TMT. The authors describe the use of both MS2 and SPS MS3 on their instrument. (And -- in my hands an HCD of 32 on an Orbitrap Fusion lines up pretty close to an NCE HCD of 27 on a Q Exactive -- again, varies from instrument to instrument, but this all sounds right to me!)

2) The larger tag makes the peptides a bit more hydrophobic (they elute later), but it is a shift of a few minutes that can be easily adjusted for.

3) When comparing numbers of peptide/protein IDs directly, TMTPro results in a few percent fewer identifications, but you get 5 extra samples done simultaneously, so I still call that a win.
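The trade-off in point 3 is easy to put rough numbers on (illustrative numbers only -- the 3% below is just a stand-in for "a few percent", and the ID count and cohort size are made up):

```python
# Rough throughput arithmetic for the TMTPro vs TMT11 trade-off.
tmt11_ids_per_run = 8000            # hypothetical peptide IDs per run
tmtpro_ids_per_run = 8000 * 0.97    # "a few percent" fewer IDs per run

# Instrument runs needed for the same hypothetical cohort:
cohort = 176                        # e.g. 176 samples
runs_tmt11 = cohort / 11            # 11-plex
runs_tmtpro = cohort / 16           # 16-plex
print(runs_tmt11, runs_tmtpro)      # 16.0 vs 11.0 runs
```

Five fewer instrument runs per ~176 samples buys back a lot more than a few percent of peptide IDs, which is the whole argument for the bigger plex.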

Concise, well-executed little study that will deserve the thousands of citations it will get for being the first one to press.

And -- for those of us dying to get our hands on a TMTPro dataset -- all these files have been deposited! (PRIDE PXD014750)  I'm filling out the form on PRIDE now to have the files released for public download. 

Monday, November 18, 2019

ThermoRawFileParser --- A big little step away from Windows!

A long time ago I was in a relatively serious car accident. My recovery cost me two weeks of classes and I learned that concussions are seriously no fun at all. However, if you gave me an option of going through that again or migrating all my computers to Windows 10....I'd need time to think about it.

Unfortunately, as for all of us, there is no choice at all. Windows 7 support is ending and our field is intrinsically tied to this Cortana and Bing infused catastrophe. Or is it? What is still missing?

Sure -- the instruments need to run on a corporate operating system, but there are increasing numbers of options for the data processing that don't involve someone running an ad to try and sell you stuff while looking for your stuff on your hard drive.

(If you do run into an ad inside your computer, this tutorial will help. This appears disabled in Enterprise versions, but who knows for how long? I removed Cortana from the SysReg manually, and on the next update, there she is, helpfully taking me to a place to purchase Kanye's new album every time I type the exact name of an Excel spreadsheet into the search bar.)

"..thanks Bing! You're the best!"

I should sleep more. This is getting out of hand.

Certain bioinformaticians in our field have been leading the charge away from Windows for quite some time and my obsession with learning how to follow them is filling the pages of this increasingly strange blog these days. And ThermoRawFileParser couldn't have come at a better time!

I'm working on installing ProteoWizard on our cluster now, and as far as I can tell there is still considerable extra functionality in it, so I should definitely still get both up there. But this new tool has some really cool advantages as well, including the direct production of JSON metadata files. And, in a head-to-head with msConvert, it appears the new tool produces mzML files more accurately, as they result in more total peptide IDs!
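For anyone wanting to script it, here's roughly how I'd build the ThermoRawFileParser call from Python on a Linux box (run under mono). The flag meanings (-i input, -o output directory, -f output format, -m metadata format) are from my reading of the tool's help text -- double-check them against --help on your own install before trusting this:

```python
# Sketch: construct a ThermoRawFileParser command line for a Linux cluster.
# Flag codes here are assumptions from the README (2 = indexed mzML,
# 1 = mzML, 0 for -m = JSON metadata) -- verify on your version.

def build_parser_cmd(raw_file, out_dir, indexed_mzml=True, json_metadata=True):
    fmt = "2" if indexed_mzml else "1"
    cmd = ["mono", "ThermoRawFileParser.exe",
           f"-i={raw_file}", f"-o={out_dir}", f"-f={fmt}"]
    if json_metadata:
        cmd.append("-m=0")   # also emit the JSON metadata file
    return cmd

print(" ".join(build_parser_cmd("run01.raw", "mzml_out")))
```

Hand the resulting list to subprocess.run() in a loop over your RAW files and you have a batch converter with no Windows box in sight.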

Sunday, November 17, 2019

Re-Identifiability of Proteomic Data and its Implications.... this is open access and it addresses one of the biggest (and scariest) elephants in the room. I hate to keep drawing attention to it, but with 40+ peer-reviewed studies on forensic proteomics in 2019 already, we need to start talking about this.

Anyone in the world can go to ProteomeXchange and download data from one of the repository partners like PRIDE or MassIVE. If there is personally identifiable information in there, do we need to be thinking about this? Albert Heck, do we need to start having this conversation with the general scientific community and/or...yikes...government regulators...?

This thoughtful paper addresses these and (IMAHMFO) properly describes them as "dilemmas". 

With genetics we need to be extremely cautious with how the data is made anonymous -- and there are explicit disclosure agreements and fancy government forms for the release of genetic data, with descriptions of the potential consequences. I think I've been told that there are people at Hopkins who do this stuff as a job, informing patients of their rights when they're participating in big genetics studies.

If you could track single amino acid variants specific to people in things as benign as hair? It doesn't seem all that hard to imagine that you could definitely identify a person and stuff about them from a plasma proteome, right?  Maybe y'all on the biology side are already doing this stuff and I should just get out of the noisy room more? I hope so!

Saturday, November 16, 2019

Unnatural selection -- 100% Recommended Documentary!

If you need to catch up on a ton of those genetics terms and techniques you've heard people mumbling about, there might not be a better or more interesting way than this new documentary.

CRISPR stuff?  Check! 
GeneDrive stuff? Check!
Some...interesting....looking "Biohacker" guys saying reasonably accurate science things and then injecting themselves with stuff?

Friday, November 15, 2019

What to do with 100,000 core hours of super computer access!??!

I think I just successfully convinced @SpecInformatics to throw in on a study where we try to do ALL THE PROTEOMICS THINGS on High Performance Computing.

I just got 100,000 core hours for free, and I was told that if I could come up with a valid excuse I could probably have another 300,000 hours to use in the next 365 days.

Lesson 1)

CompOmics FTW! The amazing people at the UVA HPC were easily able to set up an Anaconda module and -- BOOM -- SearchGUI.

Interesting thing I forgot 2 Linux boxes ago -- or honestly didn't know -- while SearchGUI installs your 10 search engines with the Windows install package, they might not automatically install in the Linux versions.  Okay -- but this can be a huge advantage.

Wait -- you know about SearchGUI. I ramble about it all the time. Okay -- if you don't -- SearchGUI is this amazing idea from a bunch of smart Belgian(?) people who said "wow, there are a lot of amazing search engines out there for free, but most of them are a pain in the arse to set up and use, so people using one aren't going to have the energy to set up the others. Can we fix this? Oh...and choosing just one is dumb....let's fix that too!"  And you get --- 10 engines you've heard of -- in a super easy interface!

(I was only running with decoy search off because I was trying to troubleshoot something odd.)

It's an amazing bit of convenience and power that you can get here. I can't recommend it enough. I even started making tutorial videos for it a couple years ago and forgot it completely. Maybe I'll finish them later! My calendar says there is some free time coming up in August of 2024.

Can you imagine how much work it would be for this group to keep up with the improvements of each of these engines? They do a great job, but the awesome Comet engine has had at least 2 updates since ASMS 2019, which I'm convinced was yesterday.

I don't know how to do it yet, but it looks like I can just get going with the newest version! Success!!

Right now I've just got Novor and DirecTag going -- because if you've got 100,000 computational core hours and you don't go after de novo first you probably don't need it. I always need de novo! 

How long does this HPC need to Novor + DirecTag search a human HeLa MGF file from 200 ng on a QE HF?  (I've got ProteoWizard, I've just got to get it set up properly so it will accept .RAW and .d)

About 60 seconds for both. Interestingly, at 3 AM it is about 40% faster than at 1 PM....

If you've got an HPC on your campus -- go talk to the nice people that run it -- and see if it can be an asset for you!  My next plan -- MAXQUANT -- because --

MaxQuant isn't just for Windows anymore!!!

Thursday, November 14, 2019

Great GalaxyP Tutorials hosted at GalaxyProject.EU!

Have you seen the great new application study where GalaxyP was used and thought...okay....

The arguments are building up for why you need this.


If you're also thinking "...wait...remind me what Galaxy is again...? I know I saw a talk from that really cool guy from Minnesota (Pratik)..."

Galaxy is a flexible interface for linking all sorts of tools on super computer thingies. GalaxyP is the proteomics version. You can have someone smart build you a GalaxyP instance on your supercomputer thing -- but there is a cooler way of doing this -- you can just borrow time on someone else's!

GalaxyProject.EU has workflows built in that you can use AND they have loads of tutorial stuff so you aren't starting alone on that terrifying project.

You can directly access all this stuff here.

Tuesday, November 12, 2019

Challenges and Opportunities for Mass Spec cores in the Developing World!

This article isn't brand new, but I just stumbled across it and really appreciated the perspective in it. It's open and available here.

1) How do you get funding to set up and run a core outside of where most of them are?
2) What challenges would you face if you packed up and decided to go there? Yo...the 24 hours to pump down your Orbitrap after every brown out....that sounds like a blast, right?
3) And this is the absolute best part of the article -- the Opportunities!  -- yes, there is all sorts of great basic science that you can do with baker's yeast. But -- there are diseases the World Health Organization reference lists as serious people killers that I've never heard of, and I bet that almost no proteomics or metabolomics has ever been done on.  There is such an opportunity to do good and have an impact that we can't possibly ignore the development of biological mass spec in the developing world.

Yeah, you could argue that you could send more samples here, but have you gotten human samples from Africa before? I have and I wish I knew about this new technique that helps you tell how many freeze/thaws your samples have been through!  When your samples are coming thousands of miles there is a very good chance that some valuable data may be lost, particularly in molecules that might not be as structurally robust.

Monday, November 11, 2019

ProtRank -- Go beyond protein value imputation!

How we deal with "missing values" may always be controversial, and I'm going to assume that no level of improvement in mass spectrometry engineering is going to be able to fix this. Sure, we can get better coverage, but sometimes that peptide just isn't going to be there -- maybe because it's got a single amino acid variant (SAAV), or maybe because it's got a post-translational modification in patient or condition A that is not present at all in B.

At some level, though, we've got a tough decision to make. Do you reeeeeeaaaallllly want to divide by zero? Or do you want to throw out that whole peptide measurement in your downstream analysis pipeline? It often makes sense to impute a value for that peptide or molecule that you can't see in your extracted chromatogram.
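To make "imputation" concrete, here's one common (and controversial) strategy -- replacing each missing value with half of that protein's minimum observed intensity. This is a generic sketch of the approach, not code from ProtRank or any specific tool; the function name and toy matrix are mine:

```python
import numpy as np

def impute_half_min(matrix):
    """Replace missing values (NaN) in a proteins-x-samples intensity
    matrix with half of that protein's minimum observed intensity --
    a common low-abundance imputation strategy."""
    out = matrix.copy()
    for row in out:                        # each row is one protein (a view)
        observed = row[~np.isnan(row)]
        if observed.size:                  # skip proteins never observed
            row[np.isnan(row)] = observed.min() / 2.0
    return out

# toy intensity matrix: two proteins across three samples
data = np.array([[10.0, np.nan, 12.0],
                 [np.nan,  8.0,  6.0]])
print(impute_half_min(data))
```

The weakness the paper highlights is exactly this: the number you invent (half-minimum, zero, a sampled low value) carries an assumption that can bias everything downstream.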

ProtRank may not be the ultimate solution (...cause...realistically there may not be one universal solution...), but it's a different take on this old problem. You can read about it in this new open article.

ProtRank is written in Python and is available on GitHub here.

This study is interesting in its examination of some extreme dataset models and the biases that typical imputation methods cause in them. One place where it is really scary to impute is phosphoproteomics. A lot of phosphorylation sites change to such an extent that they exceed the linear dynamic range of the instruments (I don't fall into the school of thought that there are truly 100% on/off switches; I think it's different bi-stability cliffs -- I almost threw in some references here, but I really should go to work). Do you impute here?

Want to talk about a nightmare dataset? They look at phosphoproteomic shifts in IRRADIATED CELLS. DNA damage repair functions through phosphorylating everything it can to stop processes that make the radiation damage worse. The increases in phosphorylation are probably as big as you can get. Imputing some values shifts the data to the point that you lose a lot of the known phosphorylation changes. Whoops.
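Here's how I picture the rank-based alternative working -- strictly my own toy sketch of the general idea from the paper's description, not the actual ProtRank code (that's on their GitHub): rank proteins by fold change and put present/missing transitions at the extremes of the ranking instead of inventing intensity values for them.

```python
import numpy as np

def rank_changes(before, after):
    """Toy rank-based handling of missing values (my sketch, not the
    published ProtRank algorithm): rank proteins from most decreased to
    most increased, placing measured->missing and missing->measured
    transitions at the extremes rather than imputing numbers."""
    scores = np.empty(len(before))
    for i, (b, a) in enumerate(zip(before, after)):
        if np.isnan(b) and np.isnan(a):
            scores[i] = 0.0            # never observed: no evidence of change
        elif np.isnan(b):
            scores[i] = np.inf         # appeared: treat as extreme increase
        elif np.isnan(a):
            scores[i] = -np.inf        # disappeared: treat as extreme decrease
        else:
            scores[i] = np.log2(a / b) # ordinary log2 fold change
    return np.argsort(scores)          # indices, most decreased first

# toy: protein 1 appears, protein 2 disappears, 0 doubles, 3 is unchanged
print(rank_changes([4.0, np.nan, 8.0, 2.0], [8.0, 16.0, np.nan, 2.0]))
```

The appeal for something like the irradiated-cell dataset is that a phospho-site that blows past the dynamic range still lands in the right place in the ranking without anyone guessing a number for it.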

How much better does ProtRank do? In part, we'll have to wait and see. It is applied in a big biological study that is in preparation. This is the introduction and logic behind the code, and a nice way to say "download me!"  So...

MaxQuant Summer School 2019 videos are up!

What great timing. I was just whining about how I can't make Perseus do something that seems really simple in my head -- BOOM! 4 new Perseus videos!

You can access the MaxQuant Summer School videos on the YouTube page here.

I'm personally going to start with video T4. Because I suspect I'm missing something important right at the beginning in my dumb pipeline.

Sunday, November 10, 2019

Precise protein turnover -- IN LIVE ANIMALS -- the ultimate protocol!

Do you have 4-5 weeks?

Do you need to get an absolute understanding of the rates of protein turn-over IN A LIVING ANIMAL SYSTEM?

This isn't the first technique for protein turnover measurements. This may be, however, the most complete picture that we've been able to get.

If your strengths aren't exactly centered in the wet lab aspects of proteomics, does this look a little bit like a nightmare? Yes. I can confirm. However, it's only the first 70 steps of the protocol that will negatively affect my already erratic sleep patterns -- at step 71 we get to the MATLAB data analysis...and it's already all done for you!

What do you get out of this? A comprehensive and precise measurement of protein turnover in an entire organism -- like, for real, whatever organ or system you care about -- up to and including ALL of them.
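For the curious, the math underlying most labeling-based turnover measurements is simple first-order kinetics: the labeled fraction of a protein pool follows 1 - exp(-kt). A numpy-only sketch with made-up numbers (my own code -- the protocol's MATLAB pipeline obviously does far more than this):

```python
import numpy as np

# toy time course (days) and heavy-labeled fractions for one protein
t_days = np.array([3.0, 7.0, 14.0, 28.0])
frac = np.array([0.26, 0.50, 0.76, 0.94])

# first-order kinetics: frac = 1 - exp(-k*t), so -ln(1 - frac) = k*t
# and a zero-intercept least-squares fit recovers the rate constant k
y = -np.log(1.0 - frac)
k = np.sum(t_days * y) / np.sum(t_days ** 2)

half_life = np.log(2) / k      # days for half the pool to turn over
print(f"k = {k:.3f}/day, half-life = {half_life:.1f} days")
```

Fit that per protein across every tissue and you get the organism-wide turnover map the protocol is after.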

Forensic proteomics is coming fast -- Genetic variation detection in hair keratins!

Recently there has been an explosion of new evidence that proteomics has value in forensic analysis. While it's obvious that this is a great thing -- I'd also argue that it might be kind of a scary thing as well. Could you, for example, determine every sample I've ever prepared in my life from the RAW data by identifying a specific keratin peptide variant that is unique to the majestic Pugs I've dedicated my life to rescuing and protecting from a world that isn't nearly good enough for them?

This new study from NIST suggests that -- yes -- this is possible AND the approach can even be used for the identification of human genetic variants (which, you could effectively argue, might be an application of this technology that would be slightly more widely applicable...I guess....)

I'd like to point out a technical detail in this study that is really cool. They did in-gel digests of these hair samples. The gels were then stained with SimplyBlue Safe Stain. Then the gels were scanned.

Why'd they scan the gels? To determine where to cut the gels so that the protein loads would be equivalent!

Should I know about this? Why haven't we all always done this when using SDS-PAGE to fractionate our proteins? We could break out the scanners and the Windows XP software that is up on a shelf somewhere from the days of 2D-gels and make them easily do this, right?
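If you wanted to roll your own version of that trick, the logic is just cumulative densitometry: integrate the stain intensity down the lane and cut where the running total crosses equal fractions. A hypothetical sketch (function name and toy lane profile are mine, not from the paper):

```python
import numpy as np

def equal_load_cuts(profile, n_fractions):
    """Given a densitometry intensity profile along a gel lane (one
    value per scanned row, top to bottom), return the row indices at
    which to cut so each fraction carries roughly equal total signal."""
    cum = np.cumsum(profile, dtype=float)
    targets = cum[-1] * np.arange(1, n_fractions) / n_fractions
    # first row where the running total reaches each equal-load target
    return [int(np.searchsorted(cum, t)) for t in targets]

# toy lane: heavy staining near the top, lighter toward the bottom
lane = np.array([5, 5, 4, 3, 2, 2, 1, 1, 1, 1], dtype=float)
print(equal_load_cuts(lane, 3))
```

Note how the cuts land closer together where the staining is dense -- which is exactly the point of scanning before cutting.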

Back to the study -- they use all sorts of different extraction conditions and protocols and that is a big part of the study -- developing the methods to do this, but I'm obviously going to focus on the data -- and this is reaaaally cool.

They're starting with a standard and well-characterized hair sample (cause you can obviously get standard hair material(?)) and they use MSPepSearch to analyze the peptides from the digested hair. 40% of the peptides don't match anything in the NIST human spectral library database. 40%!!

In my mind there are 2 main causes for this, and my first guess would be:
1) The default setting in MaxQuant and other software that ignores common lab contaminants. I'm sure I've mentioned before my difficulty in studying phosphorylations in keratins because the software just hid them by default -- geez -- that was almost a decade ago.... my layout for PD 2.4 is still set to hide wool, Pug and trypsin peptides.
2) Individual variation? Could it be THAT prevalent? That would be nuts, right?

The authors deploy the NIST Hybrid Search to answer this question. If you haven't tried it, you should. FAST and accurate identification of delta-shifted spectra against spectral libraries. I feel like I've given away too much of this great paper already. It is NIST, so the paper is open access.
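The core idea of a delta-shifted ("hybrid") search can be sketched in a few lines: match fragment peaks both at their library m/z and shifted by the precursor mass difference. This is my toy illustration of the concept only -- emphatically not the NIST implementation, and every name and number below is made up:

```python
def hybrid_match_count(query_peaks, library_peaks, delta, tol=0.02):
    """Toy illustration of delta-shifted spectral matching (my sketch):
    count library fragment peaks that match a query peak either at
    their original m/z or shifted by the precursor mass difference
    `delta` (e.g. a modification or variant mass)."""
    matched = 0
    for lib_mz in library_peaks:
        if any(abs(q - lib_mz) <= tol for q in query_peaks):
            matched += 1                  # fragment without the shift
        elif any(abs(q - (lib_mz + delta)) <= tol for q in query_peaks):
            matched += 1                  # fragment carrying the shift
    return matched

# library spectrum vs a query whose peptide carries a +79.966 (phospho) shift
library = [147.11, 276.16, 389.24]
query = [147.11, 276.16 + 79.966, 389.24 + 79.966]
print(hybrid_match_count(query, library, delta=79.966))
```

With a plain library search the two shifted fragments would be lost; counting both shifted and unshifted matches is what lets a variant peptide hit its unmodified library entry.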

Friday, November 8, 2019

STRING 11.0 -- You should take time to revisit this resource!

Talk about a surprise! I am cranking on this cool dataset for a talented young biologist and I thought -- what the heck -- I haven't put anything into STRING in so long I'm not even sure if it is still supported and --- 

The output is just stunning -- and reeeeeeeaaaaaaaaly helpful for his model. Almost all the pieces fall right into place for this phenotype....obviously results will vary depending on your model, coverage, etc. Dr. JJ Park did the proteomics on these samples on an HF-X, and the data is as good as I've ever seen, so that doesn't hurt at all.

I suggest that if you put some data into String in 2013....

....and blocked the site on your browser so it would never happen again that you consider a revisit. This isn't the same thing at all anymore. 

It's not just me being out of the loop either; v11 is a substantial upgrade. Not only did the number of organisms double and the libraries it references increase markedly in this release, but this is also the first version that allows the upload of complete genome/proteome-sized datasets. In fact, it gives you all sorts of warnings if you attempt to upload just the proteins that you've determined are significant. By default it wants to take all your data.
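If you'd rather script against STRING than click through the website, it also exposes a REST API. A minimal sketch of building a network query -- endpoint and parameter names are per the STRING API docs as I recall them, so double-check against string-db.org/help/api before relying on this (the `caller_identity` value is just a placeholder):

```python
from urllib.parse import urlencode

def string_network_url(identifiers, species=9606):
    """Build a STRING REST API request URL for a functional-association
    network. Parameter names follow my reading of the STRING API docs;
    verify against string-db.org/help/api."""
    params = {
        "identifiers": "\r".join(identifiers),  # CR-separated gene list
        "species": species,                     # NCBI taxon (9606 = human)
        "caller_identity": "my_blog_script",    # placeholder app name
    }
    return "https://string-db.org/api/tsv/network?" + urlencode(params)

url = string_network_url(["TP53", "MDM2", "ATM"])
print(url)
# fetch with urllib.request.urlopen(url) and parse the TSV edge list
```

Handy when you want to pull the edge list into your own downstream analysis instead of screenshotting the pretty network.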

100% recommended you check it out!