Tuesday, June 23, 2020

Spritz -- Proteogenomics for everyone!


WOOOOOOOOOOOOOOOOOOOOOOOOOOOOOHHHHOOOOOOOOOOOO!!!!


Preprint is here!

You can get the software here.

You do need the Docker Desktop thing, and if you were forced to take the mandatory Microsoft Edge update in the last 5 days, Docker may have issues (my 2 PCs that had that update throw an error saying that my Windows version isn't new enough; my PCs that have updates disabled are just fine).

Like most Smith lab software it is really straightforward. You do need to make sure that you allow Docker access to the hard drive where you have Spritz.

You can either download your FASTQ files from your "next gen" stuff or you can pull directly from SRA/SRX at NCBI here.

One surprise is that (being a dummy) I thought that Docker was just for GPU-based software, so I wanted to make sure to run it on a PC with a decent old GPU. It turns out the software recognizes how many CPU cores you have and defaults to using all but 1 of them.
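If you're curious what "all but 1 of them" looks like in practice, here is a minimal Python sketch of that kind of default. This is NOT Spritz's code (I haven't looked at how it does it); the function name `default_worker_count` and the `reserve` parameter are made up for illustration.

```python
import os


def default_worker_count(reserve: int = 1) -> int:
    """Pick a thread count that leaves `reserve` cores free for the rest of the PC.

    Hypothetical helper for illustration only, not taken from Spritz.
    """
    total = os.cpu_count() or 1  # os.cpu_count() may return None on odd systems
    return max(1, total - reserve)  # never go below one worker


if __name__ == "__main__":
    print(f"This machine would default to {default_worker_count()} worker(s)")
```

The `max(1, ...)` guard is just there so a single-core machine still gets one worker instead of zero.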

THEN -- Spritz does all the proteogenomics magic stuff you've heard of by itself --
It makes a snake and then it calls the variants on your cellphone AND --

If the authors aren't lying (it is a preprint and I don't have time to verify this) -- you know how there is a version of UniProt databases in XML format that holds PTM information? What if you could cross reference the variants and this PTM information?
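To make the "cross reference" idea concrete, here is a toy Python sketch, assuming you already have variant positions and PTM positions for one protein. Everything in it is an assumption for illustration: the class names, the function name, and the example positions are invented, and this is not how Spritz itself does the job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PtmSite:
    position: int      # 1-based residue position in the protein
    residue: str       # amino acid that carries the modification
    modification: str  # free-text PTM name


@dataclass(frozen=True)
class Variant:
    position: int  # 1-based residue position in the protein
    ref: str       # reference amino acid
    alt: str       # variant amino acid


def variants_hitting_ptms(variants, ptm_sites):
    """Return (variant, ptm_site) pairs where a variant lands on a modified residue."""
    sites_by_position = {}
    for site in ptm_sites:
        sites_by_position.setdefault(site.position, []).append(site)
    hits = []
    for variant in variants:
        for site in sites_by_position.get(variant.position, []):
            hits.append((variant, site))
    return hits


# Made-up example data, not from any real protein entry.
example_ptms = [PtmSite(15, "S", "Phosphoserine"), PtmSite(42, "K", "N6-acetyllysine")]
example_variants = [Variant(15, "S", "A"), Variant(30, "G", "D")]

for variant, site in variants_hitting_ptms(example_variants, example_ptms):
    print(f"{variant.ref}{variant.position}{variant.alt} removes a {site.modification} site")
```

In the toy data only the S15A variant sits on a modified residue, so that is the only pair reported.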



The authors demonstrate the use of Spritz in conjunction with both bottom up AND top down data!!


Monday, June 22, 2020

What is up with all the ubiquitin memes??


My inbox is pretty weird sometimes.....time to share!






I couldn't even find the one I was thinking about when I started this....I'll add it later if I run into it.

Found it!  Someone texted it to me. (Weird people have my phone number....)


Sunday, June 21, 2020

Advances in Proteomics (and metabolomics) symposium 6/25/2020!



You know what you need right after your 3 weeks of ASMS videos??

Another Proteomics and Metabolomics Symposium!!!  You're in luck, there is one this week. It's free and you can register for it here.

John Yates is the Keynote and he's a great speaker, and I'm not sure I've ever heard Kathryn Lilley speak (bonus!), and Birgit Schilling always has something cool to say about how she's working on stopping people (only cool people) from aging.

I also get to ramble a little about the #ALSMinePTMs project finally!! But this kind of snuck up on me so the data hasn't all been crunched yet.

Saturday, June 20, 2020

The Power of Three -- More Enzymes, More Search Engines, More Databases!


When I read the catchy title of this new paper, I thought of some TV commercial jingle, but if Google images gives you an old Doctor Who image, you use it....


...particularly if it's a cannabis paper. Why? I dunno. I just thought it was funny to type.

I haven't done tons of plant proteomics. Some cannabis stuff, a few grape vines, a scumbag arabidopsis or three, maybe a tomato study that I consciously try to forget? It seems familiar, and besides it being a pain to get the proteins out and to see anything but RuBisCO, the proteins are typically really short.

This group does a stellar job of trying to overcome the two relevant obstacles for them by painstakingly optimizing (in previous studies you'll find in the text) the extraction and digestion methods. They demonstrate here that the use of complementary enzymes is critical to getting good sequence coverage because of the stupid short proteins. They also scrounge up protein sequences from just about everywhere they can for the plant, and throw in Mascot and SEQUEST to sort it all out.

The only dubious part might be the use of a really old version of Proteome Discoverer for FDR. When you get into larger numbers of PSMs, either because you've generated tons of MS/MS spectra or because you just have a tremendous number of sequences to sort through, we know that PSM-level FDR has a tendency to under-filter on its own. This is probably best highlighted by the Ezkurdia et al. paper, which caused most software packages to begin implementing peptide group level FDR steps as well.
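Here is a toy Python sketch of one way PSM-level FDR can look rosier than peptide-level FDR. It assumes a plain target-decoy estimate (decoys divided by targets) and collapses PSMs to the best-scoring one per peptide. The data are invented, the function names are mine, and real tools do more than this (q-values, protein grouping and so on), so treat it as an illustration and not as what Proteome Discoverer does.

```python
def fdr(items):
    """Plain target-decoy estimate: decoys / targets (0.0 if there are no targets)."""
    decoys = sum(1 for _, _, is_decoy in items if is_decoy)
    targets = len(items) - decoys
    return decoys / targets if targets else 0.0


def best_psm_per_peptide(psms):
    """Collapse a PSM list to the single best-scoring PSM for each peptide sequence."""
    best = {}
    for peptide, score, is_decoy in psms:
        if peptide not in best or score > best[peptide][1]:
            best[peptide] = (peptide, score, is_decoy)
    return list(best.values())


# Invented PSMs as (peptide, score, is_decoy) tuples that all passed a score cutoff.
# Two abundant peptides are identified 40 times each; everything else is seen once.
psms = (
    [("PEPTIDEA", 50 + i, False) for i in range(40)]
    + [("PEPTIDEB", 45 + i, False) for i in range(40)]
    + [(f"TARGET{i}", 30, False) for i in range(18)]
    + [(f"DECOY{i}", 30, True) for i in range(2)]
)

psm_level_fdr = fdr(psms)
peptide_level_fdr = fdr(best_psm_per_peptide(psms))
print(f"PSM-level FDR:     {psm_level_fdr:.1%}")
print(f"Peptide-level FDR: {peptide_level_fdr:.1%}")
```

With these numbers the same two decoys give about 2% at the PSM level (2 of 98 targets) but 10% at the peptide level (2 of 20 targets), because the repeated hits to abundant peptides pad the denominator.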


Friday, June 19, 2020

Even more SARS-CoV-2 proteomics! (with phosphoproteomics via DIA!)





Rapid highlights?

-EvoSep 1 ultra rapid gradients + QE HF-X
-Fractionation + DDA for libraries
-HF-X DIA for (most of? again, moving fast, busy day ahead) the global proteomics
-Interestingly -- DIA for the phosphoproteomics and the # of isolation windows changes for the phosphoproteome, but the collision energy does not.
-Some of the prettiest downstream interpretation data you'll see this week. Are y'all keeping artists around for this stuff? My stuff comes out of Perseus looking like this.


(I actually Googled "ugliest R plots" and was relieved that most of the things that popped up are even worse than my histograms)

Thursday, June 18, 2020

The Proteome Landscape of the Kingdoms of Life!



Article titles are something that I think can go one of two directions. 
Direction 1) How many words can I squeeze into this box the publishers provide in order to make sure that I describe everything I did in the most thorough way possible while ensuring that no one will ever actually look forward to reading this paper. 
Direction 2) BOOM. This is what we did and we've got a name for it. 

The problem with direction #2 is that you can sometimes get into the paper with the flashy title and feel like it was a marketing job. "Like....oh....yeah....of course I read it, but it wasn't that good." 

Every once in a while, though, you run into Article title/type #3 where you get a really good and catchy title for something that makes you want to stop your car at a closed rural highway truck weighing station so you can read it -- and even after you have to explain to a police officer why you parked at 4:30 AM at a closed weighing station  (defund the police) you're going to move your car out of the special space that says "emergency services only" and the paper is good enough that you're still going to finish your blog post anyway. 

What I'm typing about right now is the 3rd type. 

You can read about it here. Or you can go dig around in the data at www.proteomesoflife.org



What's it about? Well, they did proteomes of 100 taxonomically distinct organisms. For one study. Even if it was bacteria, that's a buttload of proteomes. And it's not all bacteria. It's all sorts of organisms. 

Quick takeaways in case I have to move a hypothetical car again. 

1) uPAC columns have somehow come in and vindicated what Jun Qu has been doing for upwards of 10 years now. Columns over 1 meter are all the rage! In this study 2 meter long uPAC columns were used. 

2) This is how you set them up! (due to the extra emphasis in this paper on the importance of grounding, I'm imagining there is a story that someone somewhere knows....) 


3) How would you do 100 proteomes for one study? 

You could:
-Digest with preOmics robot/Bravo
-StageTip
-Spider Fractionate (8 fractions) (yup -- there's over 800 HF-X RAW files!) 
-Run uPAC columns at 750nL/min with QE HF-X with DDA
-You could also mess around with MaxQuant targeted (further investigation warranted here)
-Process with MaxQuant
-Develop some new tools that will be necessary to deal with this much data (put them up on GitHub!)
-Put all the data up on ProteomeXchange Partners (PXD014877)
-Set up a snazzy website for interpreting all the data
-And wait till June for Nature to publish the paper that was accepted back in April


Monday, June 15, 2020

piNET -- Downstream data analysis with PTM to MODIFYING ENZYME connections!




Okay --- maybe THIS is finally it? Maybe this is the tool that can link all the phosphoproteomics data together into something that saves us from having to look up each and every one, or relying on that one "pathway analysis guru" that you might or might not be lucky enough to have at your work, or occasionally go rock climbing with? (I've been trying to figure out a social distancing rock climbing solution, to no avail....booooo....2020....booo.....)



I haven't test driven it yet, but a quick check of the figure above (what I highlighted) shows that MAPK1/3 are definitely linked to the phosphorylations on Q15418. If those are linked to TP53 phosphorylation regulation, UniProt doesn't shout it out on the first page, but it seems very likely that they'd be linked by one of the many data sources that piNET is actively referencing!


The trick is filtering out the studies where people were obviously going for peak bagging (trying to grab bragging rights for the highest number of phosphorylation sites they could identify, something that was mistakenly thought to be cool for about 2 weeks one year long long ago, that we all universally agree is not at all cool to do now and was just an unfortunate mixup) but with some filtering capabilities in this software? This looks like an insanely powerful and useful new tool for our utility belts!

Monday, June 8, 2020

Omics Discovery Index Rest Interface!



A lot of these tools have been out for a while (programs out in advance of this great recent paper)....

...but this is a great compendium of the tools that are available and the super simple commands that you can use through the REST interface (what's a REST? here ya go) to pull up proteomics data from anywhere and basically anything!

Proteomic informatics is getting better every day, and we might be leading the way for -omics when it comes to data sharing and transparency (having significantly smaller files might be helpful in this regard, but still)!

There is so much power here that you're basically only limited by your own imagination. Need a good example?

What if you were really interested in 2 proteins and wanted good experimental spectra from situations where you'd find them?

You could make a line like this.

 www.omicsdi.org/ws/dataset/search?query=UNIPROT:P08648%20AND%20UNIPROT:Q99714%20AND%20omics_type:Proteomics.

...after the "UNIPROT:" put in any protein identifiers that you want (in this case I put in Integrin Alpha 5 for humans in place of the example the authors used, the second one appears to be a mitochondrial thing of some sort) and hit enter!


WHOOOOOA!  Overwhelmed with data? Yes you are!

Note at the end of the line where it says "type:Proteomics."??  You know what that means? You don't need to restrict yourself to proteomics. Obviously, I would start with the best -omics, but if you wanted data from the inferior ones, they're available as well!
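If you'd rather build that line in a script than type it by hand, here is a minimal Python sketch that assembles the same query string. It only builds the URL and does not call the service. The function name `build_search_url` is made up, and the `https://` prefix is my assumption, since the line above starts at "www".

```python
from urllib.parse import quote

# Assumed scheme; the endpoint path is copied from the example line in the post.
BASE_URL = "https://www.omicsdi.org/ws/dataset/search"


def build_search_url(accessions, omics_type="Proteomics"):
    """Join UniProt accessions and an omics type into one AND-ed search URL."""
    terms = [f"UNIPROT:{accession}" for accession in accessions]
    terms.append(f"omics_type:{omics_type}")
    query = " AND ".join(terms)
    # Percent-encode the spaces as %20 but keep the colons readable.
    return f"{BASE_URL}?query={quote(query, safe=':')}"


url = build_search_url(["P08648", "Q99714"])
print(url)
```

Swapping the second argument for a different omics type is the scripted version of changing the end of the line. I haven't checked which values the service accepts, so check the OmicsDI documentation before trusting anything other than "Proteomics".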

Sunday, June 7, 2020

Friday's LPDG Talk -- Clinical Classifiers of COVID-19!


Hey! Yes, my favorite week of the world is currently happening (#ASMS2020), and the largest acts of civil disobedience in half a century are occurring around the world, buut -- COVID-19 is still happening.

Our good friends at the London Proteomics Discussion Group haven't forgotten and they're still trucking away to find the science being done and bring it to everyone's attention.

This Friday -- Clinical Classifiers. Huge deal! There are hypotheses being kicked out all over (today I heard that people with certain blood types are faring worse than others? Not sure if that is true). But it's clear some people are doing much worse than others, and viral load (level of exposure) doesn't seem to be answering all the questions.

You can register to see this ultra interesting/pertinent talk here!


Saturday, June 6, 2020

One ASMS 2020 TakeAway -- GPUs are finally coming!


I'm woefully behind on ASMS stuff. 2020 (the year itself) has been a little tiring.

One takeaway that I've got for this year is that we're finally seeing Graphics Processing Units in Proteomics. I think I've rambled about this here before, but I can't find it. Anyway, in today's computer stuff we basically have 2 kinds of processors:

Central Processing Units (CPUs) -- these are the stickers on your computer "i7" "XEON" or, more commonly now "Ryzen" and, if you're really lucky, "Threadripper" (I am not this lucky yet)

CPUs have a small number of cores but each core has access to tons of resources.

Graphics Processing Units (GPUs) have TONS of cores, but each core is capable of doing only very small things, like controlling a few pixels.

A lot of genomics has gotten to the point where it has advanced even further, to Application Specific Integrated Circuits (ASICs). These things are even dumber than GPUs -- an ASIC is designed to do only one job. And when you focus processors on just one task they can be really really good at it. The first ASICs for genomics were advertising 100x increases in speed over GPU alignments. We'll get there one day!

Most of us are using CPUs and doing just fine with it. Someone I talked to at ASMS had his PC running for about a month solid on one analysis for his talk....indicating that sometimes we do need more power....but most of the time we're okay. 

Where do you need GPUs in proteomics? 

1) Deep learning (Prosit, etc.)
2) Processing absurdly large files (like those 40GB per run timsTOF files, maybe?)

Worth noting, John Yates did a talk for Bruker (you can find it in the clunky Horsebrutality suite thing they have set up) and it's one of the best educational talks about the evolution of proteomics data processing I've ever seen. 100% recommended. The Yates lab has been using GPUs for data processing for several years through the commercial program IP2....which... I think my search bar is broken, I know there is stuff here about that....

3) Other programs are out there, like ANN-Solo GPU, and G-MSR. This isn't new, but -- okay -- this is big -- 

4)  You know that Phase Constraint thing that we keep hearing about (and finally saw in a very very limited form on the Exploris 480 and Eclipse instruments last year)? 

You know why it isn't running on everything all the time? It's super computationally difficult. You're pushing your resources on the Exploris hard to even do the phase constraint in the narrow window around your TMT tags (what, 10 Amu?) 

I strongly recommend you check out 

MP 127 High dynamic range proteome analysis with BoxCar DIA and super-resolution Orbitrap mass spectrometry

-- because these jokers set up some GPUs (TITANs!! The crazy expensive server ones) so they could run phase constraint across entire mass windows. The speed and resolution increase is so much that they can do BoxCar DIA using EvoSep separations and dig way deeper into the plasma proteome.

They process the data in Spectronaut, which I'm going to guess now does something with the BoxCars, because I swear it used to look like it didn't look at the BoxCar MS1s at all, but I bet they fixed that!


Tuesday, June 2, 2020

ASMS Statement And Characteristic Rambling



There is some great -- unfreaking believable -- amazing science streaming in the ASMS reboot and ... just when the world seemed like it couldn't get more chaotic ....protests about excessive use of force by the police were met by even more excessive use of force by police.... during a global pandemic ...and it's hard to pay attention to even the best science in the world.

This blog isn't right when it's serious so I'll throw in the true story about the one time that I met Richard Yost. We're all missing our ASMS friends right now. One of mine is about 5 foot tall and hilarious -- and she might be able to lift a small car. One year she'd loaded up on cool vendor swag like cannabis themed light up gear, including a brightly flashing cowboy hat that I think had a pot leaf and Shimadzu logo on it. She was carrying me piggy back style and running really fast. 100% her idea. And I got to say something characteristically intelligent like "Holy shit. You're Richard Yost." And he said something like "It looks like you guys are having a fun conference!" And we were gone. As I said, she was going really fast.

2020 has been a tough year. I cannot wait to see everyone in Philly next year!!

And....


....stay strong, y'all.

Want to help? Here is a great article on how.

Monday, June 1, 2020

Needing inspiration on what ASMS Talks to Watch? Here is the FeMS list!

ASMS can be (i.e. IS TOTALLY) overwhelming even in the best and easiest years. Even if you've been strategizing all morning on what talks to catch, you probably have a hole or two in your schedule. Want to try a different sort of list?

Check out the FeMS list here! It's a live document so I'm sure there will be updates coming!

IT'S ASMS 2020!!!


We're 97 minutes from my favorite holiday! As appalled and furious as I am about what is happening in my country right now, I'm selfishly going to step away from thinking about it and tune in as much as possible to see what the best scientists in the world are doing with earth's most accurate measurement technologies.

My blog is having some glitches this morning, however.... but that isn't important. What IS important is that I see you at ASMS!  Hit this button to go there! 

https://eventpilot.us/web/page.php?page=HomeCustomIntHtml&project=ASMS20&id=customnow