Friday, August 26, 2016

Ummm...gene name errors are widespread in the scientific literature.... you've probably ran into this before. Honestly, I would be really surprised if you haven't but if this paper is correct, Excel autoformatting is a much bigger problem than I ever guessed...

Ever sorted your output list and ended up with something weird like this?

Umm...why are there a bunch of dates in September at the top of my output sheet? And why do they have quantification values?

Cause if Excel sees the gene identifier for Septin 3, SEP3, its gonna assume you're just lazy and didn't feel like writing the date out correctly and it'll fix it for you. Which is all well and good, cause septins are super boring and don't really do anything...wait...they're GTPases!  Okay, no problem. We'll just put an apostrophe in front of it 'SEP3' and everything is okay.  Everybody else do that and we're fine (except...if you have to convert it to text and back and then it does it again).

This isn't the only one. Alexis knew several off the top of her head when ABRF Tweeted this paper yesterday, so its affecting the nucleotide bioinformaticians as well.

The title is flashy!

Sounds alarmist, right? How bad could it be?  Shockingly surprisingly bad!

They pulled thousands of published papers and supplemental files and looked for genes that had annotation mistakes that could be directly attributed to Excel autocorrects (autocowrecks?) they found hundreds of supplemental files per year in the relatively small list of journals they looked at.

The journal average is about 20% or so. About 20% of the supplemental data found in leading genomics journals in the last 10 years had some sort of Excel-linked mess-ups in the data. The sample size was smaller and the number of supplemental tables is always bigger in the biggest journals, but the jounal with the highest percentage of spreadsheet autoconversion mistakes? Something called ?Nature? -- with over 30% of files showing these issues.

The solution these authors suggest after all that searching? Databases!

No thanks! I'll turn off all the autocorrect and autofill functions, like this!

Go to File and Options is at the bottom and this window will pop up.

Important: Change your UserName to something funny so that people will see it as the author of any spreadsheet you send them (they'll see it rarely enough that it will stay funny for a while)/ Then go to the Proofing menu!

Then open the autocorrect options and turn a bunch of stuff off!

If you don't need it, turn it off!

Shoutout to the ABRF forum that turned this up and @ABRF for Tweeting it posting the paper link!

1 comment:

  1. Hi Ben, this is an important issue. But I don`t think changing autocorrect will help you in the SEP3 case. Here you probably need to define the corresponding column as 'text' instead of 'general' during import of the file.