What an interesting week in the big journals! Nature is running a really great focus on Big Data and what the heck we are going to do with all of it. I think we're all pretty familiar with these problems by now: I'm currently carrying three 1 TB hard drives with me so that I can work on my research on the road, keep control data sets to show other researchers, and still have space left over to process whatever data they want analyzed.
Tranche is worthless to most people now. I haven't been able to get a single RAW file off the site in over six months. Supposedly the data is there, but I suspect it's really just a room full of smoking servers. The underlying problem is that most of the current instrument line generates, on average, about 1 GB/hour. What do we do with all of this? How do we store it safely? Hell, how do we even process it?
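To put that rate in perspective, here's a quick back-of-envelope sketch. The assumptions are mine: a single instrument acquiring continuously at the ~1 GB/hour figure above, which overstates things since no lab runs 24/7/365 — but even so, the number is sobering.

```python
# Back-of-envelope storage estimate for one instrument running
# nonstop at the ~1 GB/hour average rate mentioned above.
RATE_GB_PER_HOUR = 1      # assumed average acquisition rate
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

gb_per_year = RATE_GB_PER_HOUR * HOURS_PER_DAY * DAYS_PER_YEAR
tb_per_year = gb_per_year / 1024  # convert to binary terabytes

print(f"{gb_per_year} GB/year ~= {tb_per_year:.1f} TB/year per instrument")
# -> 8760 GB/year ~= 8.6 TB/year per instrument
```

That's most of a decade's worth of my three portable drives filled by a single instrument in one year, before you even think about backups or processed results.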
(This is what our little server is running at right now: 93% load...)
We're not the only ones having problems. The next-gen sequencing people are hitting the exact same wall. A researcher in Maryland I know told me that his sequencers can generate as much data in a weekend as was produced during the entire 13 years of the Human Genome Project. Yeah, they feel our pain.
The good news is that since we're all having the same problems, maybe solutions will come more quickly! If you're at all concerned about where we're headed, I definitely suggest you check this out!