Wednesday, September 25, 2019

Error rates in Match Between Runs!


I can't read this yet -- way way way behind on everything -- but this is a super important missing piece in the Match Between Runs (MBR) puzzle.

Again -- that's the secret (not secret) thing the Europeans have been doing for years in MaxQuant that accounts for a lot of the reason they always get more peptide IDs than we do with all our stuff here. If a peptide is ID'ed in one run, it doesn't need to be fragmented in every run -- as long as the MS1 feature is there and it matches in m/z, isotopic envelope (?) and retention time.
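A back-of-the-napkin sketch of that idea (emphatically not MaxQuant's actual algorithm -- the tolerances and library entries below are made up for illustration): take an MS1 feature that never got fragmented and see if it matches a library entry within m/z and retention time tolerance.

```python
# Toy illustration of the Match Between Runs idea -- NOT MaxQuant's real algorithm.
# Tolerances and library entries are invented for this sketch; the isotopic envelope
# check is skipped entirely.

PPM_TOL = 10.0      # assumed precursor mass tolerance (ppm)
RT_TOL_MIN = 1.0    # assumed retention time tolerance (minutes, post-alignment)

library = [
    # (peptide ID, monoisotopic m/z, aligned retention time in minutes) -- hypothetical
    ("peptide_A", 575.3111, 42.7),
    ("peptide_B", 487.7325, 18.3),
]

def match_feature(mz, rt):
    """Return library peptides whose m/z and RT fall within tolerance of an MS1 feature."""
    hits = []
    for peptide, lib_mz, lib_rt in library:
        ppm_error = abs(mz - lib_mz) / lib_mz * 1e6
        if ppm_error <= PPM_TOL and abs(rt - lib_rt) <= RT_TOL_MIN:
            hits.append(peptide)
    return hits

print(match_feature(575.3120, 42.5))   # -> ['peptide_A']
```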

Some group in Boston has been doing really smart stuff with proteomics quantification by building samples with a known ground truth, things like the TKO standards. In this study they do something similar to estimate the errors that occur when you match between runs. Again -- this needs time when I have time -- but it would be AMAZING to have a metric for how many MBR measurements are true/not true.

For context, I've essentially built an MS1 library from around 48 files from ProteomeXchange from a region of the human brain that my friends really really care about, and I'm using that with MBR to boost the number of IDs from the tiny tiny tiny amount of protein they've enriched from that same brain region from like 80 dead people. Searched alone -- I can get something like 500 total protein IDs from all of their samples. With the MS1 library I'm up to something around 2,300. Are 1,800 of them artifacts of MBR (or of me misusing it)? Hooooooooly cow, I hope not, but it isn't the simplest thing to manually evaluate a dataset this large. This study gives me hope because it looks like MBR is making mistakes at the PSM level -- but after you roll the data up, the error rate diminishes markedly!! I think it is a terrible idea to put blind trust in anything, but my life would be a lot simpler if I just sat back and said "maybe someone at Max Planck knows what they're doing!" (and... maybe I can follow a nice instructional YouTube video without screwing it all up...)

Tuesday, September 24, 2019

Delta-S-Cys-Albumin -- A Rapid Quality Check for Plasma Quality!


"...freeze/thaw histories are often poorly documented.." might be my favorite understatement in the history of understating things. Okay -- everyone should read this paper -- even if this study only directly applies to human serum/plasma studies. This is super powerful.


I am so pumped. This has so much value -- and I passed by this paper because the title sucks.

It should be something like

"A Rapid QC for whether someone screwed up your plasma samples by leaving it out on a bench for 3 days after thawing it!"

You don't know where those patient samples are coming from most of the time. It's a huge problem in our field even finding out what color the tops of the tubes were (there are secret phlebotomist codes -- purple is EDTA, green is heparin, or maybe I have those backwards). Not knowing this can totally wreck your study. But -- freeze/thaw monitoring? That's what this is!


This modification happens to albumin at a predictable rate. So predictable that they can blind samples and then tell how long that plasma has been sitting around not cold.
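Just to make the back-calculation idea concrete (the real kinetics, rates and baselines are in the paper -- every number below is made up):

```python
# Toy version of the back-calculation: if a marker like the S-cysteinylated fraction of
# albumin climbs at a roughly constant rate while plasma sits thawed, the measured change
# tells you how long it sat. All numbers here are invented placeholders.

ASSUMED_RATE_PER_DAY = 5.0   # hypothetical increase in %S-Cys-albumin per day, thawed
BASELINE_PERCENT = 20.0      # hypothetical fresh-sample baseline

def estimated_days_thawed(measured_percent):
    return max(0.0, (measured_percent - BASELINE_PERCENT) / ASSUMED_RATE_PER_DAY)

print(estimated_days_thawed(35.0))   # -> 3.0 days, under these made-up numbers
```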

Man...I love this.....I don't care if you've depleted it, there is still plenty of albumin around. If you've got a question -- like -- did I leave that box of priceless samples sitting out? You have a way to backtrack it. Yes -- this is an intact mass analysis -- and yes -- albumin doesn't fly all that well (due to all the dumb cysteines, I believe) but -- man -- what a tool I don't think we've ever had before!

Monday, September 23, 2019

cobia -- Cofragmentation Risk Prediction for MetaProteomics!

Okay -- I don't know who thought of this, but it's seriously smart.

Wait -- I do know who thought of it -- it was probably one of these people?!??


People doing metaproteomics don't exactly have the same advantages as everyone doing "normal" proteomics.

Normal proteomics example:  Weird biologist grows rapidly mutating cancer cell line that kind of sort of has some sort of similarity to human cancer cells after 45 years of being passaged in tubes by other weird biologists. But when they bring you protein from it, it's probably close to 100% that version of the cancer cell line (and whatever cow serum they grew it in -- and --well, let's face it -- probably mycoplasma -- I'M KIDDING! Geez....)

Metaproteomics example: I found this mud over here and I extracted protein from it. Can you compare it to the proteins from this weird rock we found in the Andes in 2008?

In metaproteomics you often don't know the starting material. If you're lucky, maybe someone did some 16S ribosomal RNA stuff on it and you can narrow it down a little. Often, that's the whole point of metaproteomics -- what organisms are actually here and how many?

If you think you've got coisolation issues with your yeast digest, can you imagine the coisolation issues (yo -- and in this case I mean peptides that are fragmented along with other peptides when you don't want them there) you have when your mud has 4,000 different species of mud bacteria...and yeast...and fungi...and some archaea...and some virus...and some decomposing tree...?

Okay -- so here is the framework for this great idea -- what if you could develop some sort of metric for how bad coisolation might be? Like -- maybe get an idea if your peptides might be more or less prone to disappearing into the background. This is critical because if your peptides are more coisolated (and their ID scores or whatever decrease) then it looks like that peptide -- and possibly that organism -- aren't there and it biases your results!

I don't get the maths stuff. What I do get is that when they apply this metric on all sorts of historic data they can refine and improve the metaproteomics data. I also get that you can get all the code here.
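While you wait for the real code, here's my crude guess at what a cofragmentation risk metric could look like (definitely not cobia's actual scoring function -- the tolerances are made up): for each precursor, count how many other precursors co-elute and fall inside the same isolation window.

```python
# Back-of-the-napkin cofragmentation risk -- NOT the actual cobia metric.
# For each precursor, count co-eluting precursors that would land in the same isolation window.

ISOLATION_HALF_WIDTH = 1.0   # assumed +/- m/z isolation window
RT_OVERLAP_MIN = 0.5         # assumed co-elution window in minutes

def cofrag_risk(precursors):
    """precursors: list of (mz, rt) tuples. Returns one risk count per precursor."""
    risks = []
    for i, (mz_i, rt_i) in enumerate(precursors):
        n = sum(
            1
            for j, (mz_j, rt_j) in enumerate(precursors)
            if i != j
            and abs(mz_i - mz_j) <= ISOLATION_HALF_WIDTH
            and abs(rt_i - rt_j) <= RT_OVERLAP_MIN
        )
        risks.append(n)
    return risks

print(cofrag_risk([(500.25, 30.0), (500.90, 30.2), (650.40, 30.1)]))  # -> [1, 1, 0]
```

Peptides that rack up high counts are the ones whose IDs (and quan) you'd want to trust less.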

Okay -- and this is where I thought I was going to get to finally insert a video about Halifax (where this great work was done!) by one of my all time favorite comedy troupes...and then before I hit "post" I thought I'd watch it again. And...well... maybe the number of random "F-bombs" is too high for a direct link. You'll have to find it yourself, potty mouth.




Thursday, September 12, 2019

Who needs MS/MS? 1,000 proteins in 5 MINUTES with Direct MS1!

You knew this was coming, right? We've been working our way back this direction the last couple of years -- and here it is.

What is "match between runs"? It's essentially just MS1 based identification.

That's why BoxCar retrieves so many identified peptides/proteins. It increases the S/N and increases the number of MS1 based identifications. You lose MS/MS -- because it relies on MS1 based libraries. It seemed inevitable that we'd soon see the intelligent application of stand-alone MS1 based proteomics, but -- I'll be honest -- I didn't expect the data to look this good.

The idea of using MS1 exclusively for your peptide/protein IDs is not new. Peptide mass fingerprinting was described during the FIRST SEASON of Walker, Texas Ranger.


Some people in Washington were doing MS1-based ID and quan in proteomics on big helium-cooled magnet systems and ultra-high quality HPLC systems before the first commercial Orbitrap came out, but as good as the resolution was, they were sloooooooooow and expensive and, I'd argue, the biggest weakness was that our understanding of the depth of the proteome was more than a little flawed. Now that we know that in basically every HRAM MS1 scan there is probably a PTM-modified peptide (or 10) and our libraries can grow up to reflect this...these approaches start to make more sense and false discoveries become somewhat(?) less ubiquitous.

These authors argue some additional points. 120,000 is a lot of resolution, and if you can get more than 4 scans/second, you can do some nice HPLC. And -- if we have learned anything in the last few years, it's that the informatics side of proteomics has been lacking -- in every area -- in every regard. (I do not mean this as a slight in any way to any of the great programs out there, but the people out there writing the new stuff aren't doing it in a vacuum. They're taking the traditional stuff, identifying the weaknesses, and fixing them. The reanalysis of beautiful old data with new, better algorithms is basically what half our field is doing right now).

I can't follow all the weird Greek letters and all the Python scripts that this group has either developed or painstakingly chosen for their daily operation from other groups (comparisons described in numerous previous studies), but I think this idea is definitely worth exploring and you should check this paper out!

My favorite observation from the paper might be that going up to 240,000 resolution did not improve the number of identifications over 120,000 resolution. The authors' conclusion is that it's the relative loss in the number of MS1 scans. In the end, the Orbitrap doesn't get any bigger when you crank up the resolution. Any gains you get in resolving coeluting peaks are offset by the loss in speed.
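Rough arithmetic on why (the transient times below are assumed ballpark numbers and vary by instrument -- the point is just the trade-off):

```python
# Rough arithmetic on the resolution/speed trade-off. Transient times are approximate,
# assumed values -- the takeaway is that doubling resolution roughly halves how many
# MS1 scans you get across a chromatographic peak.

TRANSIENT_SEC = {120_000: 0.256, 240_000: 0.512}   # assumed transient lengths (seconds)

def ms1_scans_across_peak(resolution, peak_width_sec=10.0):
    return peak_width_sec / TRANSIENT_SEC[resolution]

for res in (120_000, 240_000):
    print(res, round(ms1_scans_across_peak(res), 1))
# 120000 -> ~39 scans across a 10 s peak; 240000 -> ~19.5. More resolving power, fewer points.
```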

The deisotoping and peak detection was done with the Dinosaur algorithm. I only mention this now so I can use this as a valid excuse in my mind to use this great picture I just found.



Wednesday, September 11, 2019

SWARM -- Remove the adducts and clean up the data!


I'm not sure I get it. I probably shouldn't admit this, since the authors call it "straightforward" twice.


Wait. 3 times! And I think I get it now! Go espresso go!

Here's the problem: when you use ESI to ionize a protein you always get some dumb adducts, particularly if you are using native-type conditions to try to measure intact protein-protein or protein-ligand complexes and you're spraying out of ammonium acetate or whatever. It is less stressful when you've got one protein or a simple mixture, but it's a lot more stressful as your samples get more complicated.

The Sliding Windows part of SWARM is post-acquisition processing stuff. This is what threw me off. What if you assume that the adduct formation is a constant? Imagine you've got a single protein and you're going to incubate it with a ligand that will bind to it 1 or 2 or 3 or 4 times. You're already looking at intact mass spectra that aren't fun to figure out. Then imagine that you've got no adduct + adduct in there. Counting your no-adduct, no-ligand protein, you've got 10 (?) actual protein combinations present and multiple charge states of each! Gross. Your deconvolution algorithm is going to have a hard time with this, and every time it picks an adduct by accident -- fake mass generated....

In the simplest instance of SWARM (if I've got this right) you would run your protein alone, with no ligands. Then you'd figure out what is your protein and what your protein adducts are. Now you make the assumption that no matter whether you add 1 or 4 ligands, the level of the buffer adducts wouldn't change. So you subtract out all the peaks that have the + adduct signature! Yeah! I think this is what it's doing.
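If I really do have it right, the post-processing boils down to something like this crude sketch (absolutely not the authors' actual code -- the offsets and tolerance are illustrative): learn the adduct mass offsets from the ligand-free control, then strip any deconvolved mass that looks like an already-accepted species plus an adduct.

```python
# Crude sketch of the adduct-subtraction idea (not the authors' SWARM code).
# Learn adduct mass offsets from a ligand-free control, then strip deconvolved masses
# that are explainable as "accepted species + adduct" from the complicated spectrum.

MASS_TOL = 2.0                      # assumed mass tolerance in Da for matching
ADDUCT_OFFSETS = [22.0, 38.0]       # roughly Na- and K-type offsets (approximate, illustrative)

def strip_adducts(masses):
    """masses: deconvolved neutral masses. Remove peaks explainable as adducts of kept peaks."""
    keep = []
    for m in sorted(masses):
        is_adduct = any(
            abs(m - (base + off)) <= MASS_TOL
            for base in keep
            for off in ADDUCT_OFFSETS
        )
        if not is_adduct:
            keep.append(m)
    return keep

print(strip_adducts([18000.0, 18022.1, 18500.0, 18538.3]))  # -> [18000.0, 18500.0]
```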

The authors demonstrate this works in simple cases then in more complicated cases, then backwards. The spectra are all acquired on a Waters QTOF with possibly an interesting nanospray ionization hack I'm unfamiliar with. (Could just be I haven't been around a Waters system in a loooong time). Deconvolution is handled mostly with UniDeC and SWARM is implemented through custom Python scripts. If they're publicly available, I missed the link in the paper.

I'm glad I continued to stare at this thing in sleepy puzzlement. There is a lot of power here, just not in my espresso this morning. I hope that the deconvolution software writing people take note of this. For something like antibody drug conjugates, this could be enormously valuable. The authors are careful to note that the main assumption (that adduct formation dynamics are consistent) may not hold true in all cases, but where it does? I'll take any decrease in spectral complexity I can get.


Tuesday, September 10, 2019

Do you have TMTPro (TMT 16-plex)?!? Here is how you process the data!


First off. TMT and TMTPro and probably the word "plex" are the sole properties of Proteome Sciences. Trademark. Copyright. Whatever is necessary to keep me out of trouble. (Big R with a circle around it?)

Important stuff! (Don't sue me) We can plex 16 channels!!

Next -- HUGE shoutout to -- (Wait. Don't sue them either....I should anonymize the person who works for a company, shouldn't I....?) You know who you are, anyway! Dr. Secret Scientist 1 and Dr. Ed Emmott for the resources. I did none of this. Wait. I'll totally make the method templates for Proteome Discoverer because someone wrote me this morning and asked me for them. I'll contribute something!

#1: MAXQUANT for TMTPro?  Best of luck. Have fun. I won't help you at all with this, however...

Dr. Emmott (who will be opening his lab in 7 weeks in Liverpool) has made all the XML add-ins you'll need to modify MaxQuant to use these reagents and made them available via this DropBox.

(...thank you Ed! and good luck with the new program! Need help carrying boxes?)

#2 PROTEOME DISCOVERERERERERERERER

This will require a couple of steps.

Step 1: You need to add the modifications to your instance of PD.

I recommend you update your UniMod






Both TMTPro 16 and TMTPro ZERO were uploaded today! I don't care about TMTPro Zero (sorry if you do, but you can figure it out. I believe in you! You're very very smart and people like you for obvious reasons.)

If you can't update your UniMod (offline or whatever) you can download this XML from my 100% totally nonprofit Google Drive thingy here.

Then you have to checkmark your TMTPro reagents, hit apply, and then in PD 2.2 I had to close my software and reopen it for it to take effect. Maybe in PD 2.3 as well.

Next you'll need to go to your Administration and import this quantification method (thanks Super Secret Scientist 1!)

Now you should have the picture at the very top of this way-too-long blog post!

16 quantification channels!

For proof that I've contributed something meaningful to human existence today, here is the processing method for MS2-based TMTPro. You may note that the method name includes the words "probably wrong". I suggest you never get your methods from a completely nonprofit -- it costs me a surprising amount of money each month to keep all these things going -- blog.

I wanted to make this as a reminder that TMTPro does not have the same mass you're used to at MS1.
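To put numbers on it (the tag masses here are from memory -- double-check them against UniMod before you trust anything): classic TMT adds roughly 229.1629 Da per label and TMTpro roughly 304.2071 Da, with a label on the N-terminus plus every lysine.

```python
# Quick illustration of why the MS1 mass changes: TMTpro is a heavier tag than classic TMT.
# Tag masses are from memory (approximate) -- verify against UniMod before using them anywhere.

TMT_TAG = 229.1629      # classic TMT 10/11-plex label (Da, approx.)
TMTPRO_TAG = 304.2071   # TMTpro 16-plex label (Da, approx.)

def label_mass_shift(peptide, tag_mass):
    """Labels go on the peptide N-terminus plus each lysine."""
    n_labels = 1 + peptide.count("K")
    return n_labels * tag_mass

pep = "PEPTIDEK"
print(round(label_mass_shift(pep, TMT_TAG), 2))     # ~458.33 Da added
print(round(label_mass_shift(pep, TMTPRO_TAG), 2))  # ~608.41 Da added
```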

And...according to Dr. Kamath (at a University, I checked, put them lawyers back on their leashes), who was just at a talk today about these reagents, you'll need to think about tuning your collision energy down for these reagents (hopefully to 27 NCE on a QE!)  I don't have details yet!



I think I'll be a reporter when I grow up. Today's blog isn't all that bad for a 9 year old.

Monday, September 9, 2019

Proteomics is not an island!


Okay -- move fast. That gif is super distracting and annoying! 



I just gave a Multi-Omics talk or two and this was great to draw from. What is daunting is that my talk centered on using metabolomics and genomics and proteomics in tandem.

What is a bummer is that
--glycomics
--lipidomics
and some other things are coming and -- if the big three don't hold the answer to your disease or model, it may realistically be the others.....


Sunday, September 8, 2019

The Glycan (glycomics?) field is coming -- and they're not messing around!


There is a mandatory quote from someone at every talk about glycan modifications of proteins that's something like "glycans are involved in every human disease." I spent some time trying to find where that came from, but since I couldn't find anything conclusive, I'm going to blame Jerry Hart for it.

Glycan chain analysis suuuuuucks....they all have the same stupid masses. Is it a GlcNAc or is it a GalNAc? It's all the same stupid HexNAc mass. It sucks whether you approach them after you've liberated them or while they're still attached to the peptides. The bond energies are waaaaay different for glycosidic and peptide bonds, and if you are using fragmentation that is biased toward bond strength like CID/HCD you are only going to get part of the picture. But smart people are still finding ways to tackle this stuff.
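A quick composition-to-mass check shows the problem (standard monoisotopic atomic masses, nothing fancy):

```python
# GlcNAc and GalNAc are both "HexNAc" -- same atoms, same monoisotopic mass, different
# stereochemistry -- so mass alone can't tell them apart. Standard monoisotopic atomic masses.

MONO = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}

def residue_mass(composition):
    return sum(MONO[atom] * n for atom, n in composition.items())

hexnac = {"C": 8, "H": 13, "N": 1, "O": 5}   # HexNAc residue as it sits in a glycan chain
print(round(residue_mass(hexnac), 4))        # ~203.0794 -- identical for GlcNAc and GalNAc
```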

Today's awesome example: 



Okay -- as cool as it is to say the glycan and disease thing above -- it's a lot harder to do something that opens up glycomics capabilities to the world. And that is what we see in this great new preprint! I'm not going to embarrass myself or insult anyone with my interpretation of the biology. You know what is important and what I do get? This is making cell-specific libraries of glycans. That you can get and I can get and everyone can get and now we can use them! Because as stupid as the masses of the individual sugar things are -- when they make chains, they are different!



Something this group has been working on for a while is the use of porous graphite columns to resolve glycans chromatographically. The work I'd seen previously had looked great, but the MS/MS was ion trap (still cool!) -- here we see this technique powered up on an Orbitrap Velos Pro.

This group is....intimidatingly.... good at this stuff.

Pooled QC samples? Check
Randomized samples? Check
Internal standards? Of course
Blank + internal standards to verify carryover protection? Oh yeah.
Data publicly available? On Panorama Public here and Glycopost!

Wait. What's a GlycoPost? This is where I change the title of this blog post. Cause this is a new-to-me data storage site for the glycan stuff!  WoooHooo!



It's just as easy to use as ProteomeXchange! I just pulled down some RAW files from glycan analysis of the Atlantic salmon! Look....I'll never tell you what to do with your weekend....do whatever you want. I've got salmon sugars to look at!

Back to the Crashwood paper...okay...so how do you analyze glycan/glycomics data, anyway? Byonic (Protein Metrics) is used here...man...this is a lot of data....as well as GlycoMod -- which is really cool (link here) -- and something called GlycoWorkbench (all code is available here!), and this is all used to filter down to Skyline for the quan and statistics.

Whew....okay...so glycomics still is not easy. It's hard even getting my head wrapped around all the stuff they did!

Okay -- but like I mentioned before -- I don't have to! Because I can just go to the Panorama public link above and just download their library!


BOOM! How cool is that? Look -- any study we do is a phenomenal amount of work and it's still definitely more work, but if you make it as easy as possible to get your resources and output that's how to ensure that you're making the future better!

Great study! 100% recommended.

I should be working on a talk, but I'm going to keep typing in this box.

Part 2: I want to remind you about SugarQB.


What's that? That's a totally free glycoproteomics search engine that works within the Proteome Discoverer framework. (There might be a stand-alone -- I forget)

There hasn't been a SugarQb paper, but it's been applied in a couple of great studies. You can get the nodes at www.pd-nodes.org

Here is the thing: SugarQb is great -- but only as great as the libraries it has. This new resource from the Gundry lab I just rambled on about allows me to power up my instance of SugarQb, because I can add this great new data to the human glycan library that I've got (it's just a CSV file!)
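If your library really is just a CSV, topping it up is a couple of lines (the column names and the example entry below are made up -- match whatever headers your actual SugarQb library file uses):

```python
import csv

# Appending new glycan compositions to a CSV-style library. The filename, column names
# and example entry are made up for illustration -- match your actual SugarQb library file,
# which is assumed to already exist with its header row.

new_glycans = [
    {"name": "HexNAc(2)Hex(5)", "mass": 1216.4229},   # illustrative composition/mass
]

with open("human_glycan_library.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "mass"])
    for glycan in new_glycans:
        writer.writerow(glycan)
```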

I haven't had Byonic in a while, but unless things have changed -- it works the same way, and obviously this all works in Skyline as well!

Saturday, September 7, 2019

30 seconds to make the world better? Help Skyline!


The amazing, wonderful and -- free! -- unifier of all things mass spectrometry -- Skyline software -- is supported by a combination of grants and direct vendor support. To keep this great package of tools we take for granted going, we periodically need to inform people that 1) we are here! and we're a growing field! and 2) we take Skyline and our ability to compare data from instrument to instrument and from lab to lab for granted in what we all do.

Grants are due next week, meaning Skyline needs your help right now! You can spend 30 seconds showing support here at this link (or on the image below), or spend 3 minutes writing a letter and uploading it.

https://skyline.ms/project/home/software/Skyline/funding/2019%20LOS/register-form/begin.view?

Friday, September 6, 2019

TMT 16-plex is out! And someone already used it for single cell!!


We've been hearing rumors about this for months. And now you can order it here!


...and some people have already had access to it -- check out this KILLER application of it in SCoPE-MS!! for single cell!! Since the effects of SCoPE-MS are essentially additive, more channels equals more sensitivity (though...I mean...one cell at a time isn't a lot...but it's better than having one cell less!)


I don't have time to read this...yet...but the HF-X collected MS/MS at 45,000 resolution, so it looks like it seamlessly integrates into your workflow -- just with a bunch more channels!


Oh. Need to process the data? Here you go!

Correction: This study doesn't appear to use the TMTPro. Still a great title, though....

Thursday, September 5, 2019

Determination of Proteolytic Proteoforms with HUNTER!


I really, truly try to read at least one paper in its entirety each day. It's a rule that I started when I worked for the great Michal Fried, and I thought it was the only way I'd ever have a chance of having any context for how to help apply what I know how to do to the brilliant medical stuff she does. A really bad day for me is when I don't get to even a single one.

A great day is when I start something like this brilliant new MCP study and I am learning from the very first sentence!


What?

Wait. So....did you know this? Should I blame this on all-too-frequent head impacts? Look, I know about caspases. I know that they're amazing things to ignore when we're doing proteomics (that your quantitative difference might actually be that one set of cells has decided to go into its own death cycle and is degrading its own proteins), but is there truly something that we should just be considering in all systems that is as broad-stroke as N-terminal degradation? My ignorance in biology aside -- that we could be deriving important context from how one side of a protein is systematically degraded -- how on earth would you quantitatively measure something like this?

TAILS would be my first thought. This technique is covered in these past posts (1, 2, 3)

HUNTER is detailed here and seems like TAILS went all Super Saiyan....wait...[Google]


....okay...of course that is a thing. YouTube video here....

I'd like to point out here that TAILS is a tough experiment from the sample prep side. HUNTER looks crazy hard. As someone who IS NOT good at sample prep, this looks like something I'd only try once I found someone really talented at doing it (or at programming a sample handling robot to do it).

Fortunately......


The authors walk you through how to do it manually as well -- but here are step by step instructions (sorry) on how to set up a robot for an impossible sample prep design!

As further proof that 1) this technique totally works, 2) it can be applied to various biological systems and 3) it produces useful biological data from all of them -- they apply this method to a variety of human systems and to plants.

By selectively labeling and enriching for N-terminal peptides, they demonstrate the recovery, identification and quantification of >1,000 N-termini even when they start with micrograms of material.....

The LC-MS work is demonstrated on both a Q Exactive HF and a Bruker Impact II, showing that this technique, with all of its power and apparent biological significance, can be applied in any proteomics lab. Do I fully get why you'd want to do it from a biology level? Nope! But I know an awful lot of biological models out there where the -omics hasn't solved the phenotype...and here is a fully mature technique provided in excruciating detail that might be the way to the answer.

Wednesday, September 4, 2019

6 hour gradients + HF-X + DIA = 10,000 Human Proteins


...well....maybe I'll interpret the peak width stuff a little later....BECAUSE EVERY INDIVIDUAL FILE IS OVER 15GB!!! (...well...at least the ones I'm interested in from this study, based on the results they report)


The title of this post kind of sums it up.

This team looks at several different chromatography conditions and materials to gradually build an ideal gradient for their ultra-long DIA analysis. I think they settle on a 60 cm column of CSH solid phase at 250 nL/min. This is probably a really good idea because they use a slEasyNLC 1200 system, and at higher flow rates you'd probably run short of buffer.

0.3 uL/min x 360 min (6 hours x 60 min) = 108 uL? Okay, so not as bad as I'd have thought. The total pump capacity is only 140 uL on each pump. If you assume that you use around 12 uL to load (didn't look, but that's typically what I expect), you're still okay.
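Same arithmetic in code, with my assumptions flagged (the flow is rounded up for margin, the 12 uL loading volume is a guess, and 140 uL is the per-pump capacity quoted above):

```python
# Back-of-the-envelope solvent math for a 6 hour gradient on a low-capacity nanoLC pump.
# The loading volume is an assumption; the pump capacity is the number quoted above.

flow_ul_per_min = 0.3        # 250 nL/min rounded up for safety
gradient_min = 6 * 60        # 6 hour gradient
load_ul = 12                 # assumed sample loading volume
pump_capacity_ul = 140

total_ul = flow_ul_per_min * gradient_min + load_ul
print(total_ul, total_ul < pump_capacity_ul)   # 120.0 True -- still under capacity
```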

They use an HF-X system with 120,000 resolution MS1 and what looks like 30,000 resolution DIA windows, but 60,000 when the gradient gets to 6 hours and beyond. 60,000 resolution scans take a long time. Your peaks are gonna shift by the time you get through a full cycle. To compensate for this, they throw in an additional MS1 scan partway through the cycle to give the AGC better data to work from.

MS1 (AGC calculation) -- DIA/DIA/DIA -- MS1 (AGC calculation) -- DIA/DIA/DIA/DIA -- repeat (number of windows not accurate)
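To see why the cycle time is the worry here, some rough math (the scan times and window count below are assumed for illustration, not the paper's actual method parameters):

```python
# Why cycle time matters: with many DIA windows at high resolution, a full cycle gets long
# relative to a chromatographic peak. Scan times and window count are assumed placeholders.

def cycle_time_sec(n_windows, ms2_scan_sec, n_ms1=2, ms1_scan_sec=0.256):
    return n_ms1 * ms1_scan_sec + n_windows * ms2_scan_sec

def points_per_peak(peak_width_sec, cycle_sec):
    return peak_width_sec / cycle_sec

cyc_30k = cycle_time_sec(n_windows=40, ms2_scan_sec=0.064)   # ~30k-resolution windows
cyc_60k = cycle_time_sec(n_windows=40, ms2_scan_sec=0.128)   # ~60k-resolution windows
print(round(cyc_30k, 2), round(points_per_peak(30.0, cyc_30k), 1))   # ~3.07 s, ~9.8 points / 30 s peak
print(round(cyc_60k, 2), round(points_per_peak(30.0, cyc_60k), 1))   # ~5.63 s, ~5.3 points / 30 s peak
```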


3 normalized collision energies are used -- 25.5, 27, and 30. I find this surprising because a lot of the recent DIA work I've seen has used direct eV for the fragmentation, since the normalization doesn't do much. This is easier, and it's interesting to me that such a small step is worth the effort of putting it in!

Spectronaut is the data analysis software, and they do some interesting stuff with the data processing. In some experiments they rely on a library made directly from the FASTA, though it looks like ultimately the best data is obtained when they use it in combination with real libraries.

I'd hoped to look at the RAW data, but it looks like my ConCast home internet has said no. I've got 4 GB downloaded and it still says 2 hours for one file. If you're interested in DIA, there is a solid amount to learn from this new study.

Tuesday, September 3, 2019

It's MASH Suite time!


Have I talked about this yet? I forget and don't care!

I have the deepest and most profound respect for Dr. Neil Kelleher. He's always looking 20 years ahead and his lab has produced some of the best mass spectrometrists I've ever had the pleasure to work with. And he's done this all by being disciplined and 100% serious at all times. You'll never catch him wasting a second doing anything ridiculous. That's his secret, I think.

But -- I'll be honest here -- I've never ever in years of working with it been able to figure out how to use ProSightPC. I have many friends who have figured it out and use it all the time.  It's amazingly powerful software. It is the industry standard by 100 miles, but I'm too dumb to get it.

And -- what else do you use for top down proteomics? The weird command line thing someone at NCBI wrote in 1981? I mean, that'll totally do stuff probably. Not for me. (I assume all command line things were originally written on a Commodore 64).

In what I think might be the first serious back up plan for those of us with various ProSightPC deficiencies -- you should 100% check out the free MASH Suite software!



What's it do? TOP DOWN PROTEOMICS!

How much does it cost? Nuthing!

Do you have to read a manual? I guess not! I sure didn't, and it's been deconvoluting and searching intact protein data for me for months.

For real, it might be a neurochemistry issue. Maybe my childhood fear of PUFfins has something to do with it (top down protein analysis jokes...) but I can make this software do stuff.

And check out all the options it has!


5 deconvoluters! And none of them are Xtract!! Xtract is awesome for one protein. Xtract is NOT AWESOME for cell lysates. If I'm firing up BioPharma Finder with Xtract, I do it before I go home for the night. Fingers crossed  -- it might be done in the morning!

At the very least this is a great new set of tools -- for free!  And they're surprisingly easy to use!

You can get them at the Ye lab website here.



Oh -- and v1.1 just came out this weekend. If you've got the older one, go to "Remove programs" and uninstall it. You'll want v1.1.

Monday, September 2, 2019

How many proteins should there be, anyway?!?


I've spent a lot of time this year wondering things like "okay -- so -- how many fracking proteins are there supposed to be here, anyway?" And the answers are surprisingly murky.

What I do know -- proteomics loooooooves to use cancer cell lines. You know why? Because...


They aren't normal human cell environments. For one, most of them can't stop dividing regardless of what damage they pick up. "Oh....this neuroblastoma cell line is now expressing tooth enamel production proteins? Not normal, but it probably won't stop that cell from continuing to grow."

If you're doing work on healthy human brain tissue, you probably shouldn't see those tooth enamel production proteins, right?

We all have a decent feel for what we should get out of HeLa digests on our instruments (or HEK or K562 or whatever), and unless you're doing cancer stuff all day those numbers are probably crazy high compared to what you're normally doing. Here is the question, though: how many should be there?

The picture at the top is taken from this Human Protein Atlas page.  Of 19,000 or so human proteins, around 11,000 are found in the human liver. Okay -- I actually chose the human liver as an example at random, but this actually comes from this brand new paper.


There aren't just liver cells -- the liver is an organ made of all sorts of different types of cells.

I'd assume that there is no way that a Kerpuffle cell would express every protein that a Marovaculus encoshelail cell would (if they did, they'd be the same cell, right?), so if we subsection the liver cells by flow cytometry or by laser capture microdissection then we'd expect that number of proteins to drop off markedly, right? We're talking less than 11,000 now. A lot less?

It seems very cell-type specific. For example, probably on the low end are the boring simple old red blood cells. Two recent studies (posts 1 and 2 here) suggest they may only have 2,000 or 3,000 total proteins. They don't have to do much but haul hemoglobin and malaria parasites around. They don't need a ton of proteins. I'd expect everything else goes up from there?

Getting a good answer this morning has been tougher than I thought it would be...if anyone knows of a good breakdown or review, that would be great. I feel like I should be able to make one of the Atlas projects make a chart for me, but I haven't figured it out yet. I also can't figure out my stupid washing machine (whatever happened to a dial? what's wrong with the spring-loaded wash -- spin -- rinse -- spin? why does a washing machine need a really crappy touch screen user interface?) so -- grain of salt...it's probably easy....

Scholar insists that the answer is in this paper (it isn't; the title promises a lot and the paper doesn't deliver).



What about the human protein map (JHU version)?  AHA!

There is this sweet chart that provides solid insight --



The bottom chart covers all 30 tissues they tested. There are 2,350 proteins (far right) that were found in every cell type they checked out. On the opposite end are genes/proteins that are unique to one single tissue/cell. Most are in the middle. I think this says a lot -- like, the Venn diagram would be horrendous to look at -- OMG -- it would make the best UpSetR plot...though....okay......I've got other stuff I should be doing. This makes sense to me. I don't think RBCs were done, but they'd be at the low end -- in this 2,500 protein range -- and we'd see this complexity all the way up, since this should all be additive, but each human cell type would exist on a spectrum ranging from about 2,500 proteins right on up.
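For what it's worth, the counting behind that chart (and behind an UpSet-style plot) is simple enough to sketch with toy data:

```python
from collections import Counter

# The counting behind that bottom chart: for each protein, count how many tissues it shows
# up in, then tally. The tissue/protein lists here are toy data, obviously.

tissue_proteins = {
    "liver":  {"ALB", "CYP3A4", "GAPDH", "ACTB"},
    "brain":  {"GFAP", "GAPDH", "ACTB"},
    "muscle": {"MYH7", "GAPDH", "ACTB"},
}

protein_tissue_counts = Counter()
for proteins in tissue_proteins.values():
    for p in proteins:
        protein_tissue_counts[p] += 1

# How many proteins are seen in exactly 1, 2, ... N tissues?
histogram = Counter(protein_tissue_counts.values())
print(dict(histogram))   # 4 proteins in exactly one tissue, 2 found in all three
```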

Wait. What was the point of this? It wasn't to ask a question and then say -- "sorry, I totally don't know" but that seems to be what happened. There is a take-away, though!

If you're running some proteomics experiments, don't freak out if you don't get the 6,000 or 8,000 or 16,000 proteins that you expect from your HeLa cell line under the same conditions. Your cells probably don't have that many proteins. Probably if you look hard enough in the literature for your specific organ or cell, there is guidance on what you should expect. (Transcript studies like this one might be useful guidance -- if it isn't transcribed, it won't be translated -- so those are likely high-end numbers that exclude posttranscriptional/translational thingies).

Chances are it's a lot lower than your cancer control digest, and the more homogeneous the cells going into your digest are, the lower that total number of proteins ID'ed should be.


Sunday, September 1, 2019

There is still no convincing evidence for the frequent occurrence of posttranslationally spliced HLA-I peptides.


HLA peptides are the hottest thing to talk about in mass spectrometry in the US. There are probably 20 posts on this dumb blog about them, demonstrating how little I know about them -- and about immunology in general...

Why they're important:
If you know the neoantigen things on the cell surface you can specifically target those cells for destruction. There have been some successes and many, many, many failures.

A recent and very controversial hypothesis is that part of the reason we can't figure it out is that, during protein processing in the whatever-it's-called, proteins are post-translationally spliced and kicked out.

Zach Rolfs et al. disagree. This is the abstract. The entire abstract.



This short paper brings up a really important point. So important that I'll use both italics and bold again. The database you use for both forward and decoy searches can massively influence the results of your proteomics search. I assume that most of the misguided souls who read this blog just rolled their eyes so hard at this last sentence that it hurt their ears. Yes. Obviously it does. However, have you seen an example as important as this?

There are people who are attempting to make antibody based drugs to target these peptides that are being demonstrated on the cell surface of cancer and other diseases.

This team goes back to a previous study and reanalyzes that study's data with the same software using the same settings, and all they do is change the way the FDR stuff is done.
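I won't pretend to know exactly how they changed the FDR procedure, but here's a toy of why the procedure alone can do this (not the actual Rolfs et al. method -- all names and scores are made up): if a small, weak-scoring class of candidates gets pooled with thousands of strong "normal" IDs, its members can ride through the global cutoff; give that class its own target-decoy FDR and they can vanish.

```python
# Toy target-decoy FDR -- just to show that the procedure, not the raw scores, decides
# which IDs survive. NOT the actual Rolfs et al. method; all names and scores are invented.

def passing_ids(hits, fdr=0.01):
    """hits: list of (name, score, is_decoy), higher score = better.
    Walk down the score-sorted list and keep targets up to the deepest point where the
    running decoy/target ratio stays at or below the FDR cutoff."""
    hits = sorted(hits, key=lambda h: h[1], reverse=True)
    kept, passing, targets, decoys = [], [], 0, 0
    for name, score, is_decoy in hits:
        decoys += is_decoy
        targets += not is_decoy
        if not is_decoy:
            kept.append(name)
        if targets and decoys / targets <= fdr:
            passing = list(kept)
    return passing

normal  = [(f"pep{i}", 300 - i, False) for i in range(200)] + [("dec0", 45, True)]
spliced = [("spl_dec", 42, True), ("spl1", 40, False), ("spl2", 35, False)]

pooled = passing_ids(normal + spliced)
print(sum(name.startswith("spl") for name in pooled))   # 2 -- spliced IDs ride through pooled FDR
print(len(passing_ids(spliced)))                        # 0 -- they vanish with class-specific FDR
```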

And the results are completely different. The spliced peptides appear to disappear. Almost completely.

Which is right? The original study? The new re-analysis? Why would you ask a blogger?

What I do know? That biology shit looks hard. Last year I did hundreds of quality checks on antibody drug conjugates, and there is exactly one person there (who makes like $27k in an exploitation that the NIH is allowed to do that is called a "postbac") who ever sent an antibody that actually was what he was trying to make.

My assumption isn't that everyone else was dumb and useless, it was that you had to be really gifted to make antibody based drugs.

And if someone is going to pay a really gifted person $8 an hour to work on this -- we should manually review the data we send them or...well....at least come up with a better and smarter way to do FDR on endogenous peptides! And this looks like a step in the right direction!