The Relativity of Einstein, Elephants, Air Mattresses and ETL
According to a saying often attributed to Albert Einstein, the definition of insanity is repeating the same actions and expecting a different result. While I won't go quite so far as to call it insanity, it has always bothered me that people keep tuning ETL tools that can't handle larger data volumes. Over Christmas I had an experience that helped me understand at least some of the logic behind it.
It was Christmas Day and I was staying at the home of my fiancé's parents. I had taken an inflatable bed so that we could stay the night after indulging in way too much turkey. Having managed to shoehorn the bed into a room that was far too small for it, I settled down to sleep. Then, at about 3 a.m., I woke to find that I was being swallowed by the mattress. It had developed a slow puncture. For those of you who haven't experienced it, moving around on a deflating air mattress is neither easy nor fun!
Knowing that if I got off the mattress it would deposit my fiancé onto the floor, I had little choice but to reinflate it from where I lay (waking up everyone else in the house in the process). From then until it was time to get up, I repeated the process nearly every hour. Needless to say, I was grumpy and the rest of the house was irritable all day. A large air mattress also went straight into the rubbish bin!
This whole situation got me thinking. Even though I knew it wouldn’t help for more than an hour, why did I continue to inflate the mattress throughout the night?
For starters, I didn’t think that I had any other options (although the 4 hours I spent sleeping on the sofa the next day while Boxing Day chaos continued around me proved that wrong). I also thought (at least for the first inflation at 3 a.m.) that inflating the mattress would permanently solve the problem. It was after the second time (okay, probably the third) that I got wise.
Bringing my crazy story back to ETL, the vast number of people out there "tuning" ETL tools are likely following the same logic. The first time they do it, they probably assume they will only need to do it once. The second time, perhaps they think they just didn't get it quite right before and that this time it will work. By the third time, the harsh reality of their situation starts to seep in as they realise they could keep doing this for the rest of eternity and never get the result they are seeking.
However, here is the thing: I knew I only had to keep inflating the bed for that one night. The next day the leaky air mattress would be in the bin and I'd be at home sleeping in my own bed. People who "tune" ETL tools don't have that luxury. They know data volumes are increasing (by anywhere between 10% and 500% a year, depending on which customer I talk to) and that, fundamentally, their ETL tools aren't going to keep up. Sure, they can try to buy more hardware (a bigger air mattress), but that is only a temporary (and very expensive) fix, because the leak is definitely going to reappear.
In fact, given all the discussion about Hadoop (whose mascot is an elephant) and Big Data, I am now picturing an elephant standing on a deflating mattress! For those of you who made it this far, thank you for sticking with me. Now it is your turn: I'd love to hear your thoughts and experiences on tuning ETL tools to handle larger data volumes. Comments are welcome!