Wednesday, August 18, 2010

Greening the Code

A couple of years ago I worked for a short while at Current TV as a data architect and then a systems architect. The place was as politically labyrinthine as medieval Florence, so the constraints on architecting scalability were heavy: fundamental things like a software release process were verboten, and a certain amount of money had to be spent on specific vendors. No one wanted to say anything like that directly, so planning meetings tended to have a very strange flow, but that's another story altogether.

As a media company, Current is a heavy user of CPU cycles in video production, post-production, transcoding, programming, playback, and many other ancillary activities. The official gospel was that current.com would become an important social web destination similar to MySpace, so the architecture I was tasked with designing needed to be 'fiscally responsible' but at the same time 'scale-ready'. Of course, this was back when Facebook was only open to university affiliations, so those numbers had not quite ramped up to current levels.

Since Current's stated objective was to provide professional production for user-generated content ideologically aligned with Al Gore both politically and socially, and since Amazon was just then launching S3, the very first cloud, I made a proposal that would use dynamically instantiated virtual servers on a cloud, with load-balancing taking the form of controllers that managed these instances in response to demand. My initial estimates showed that instead of spending millions on equipment and hosting, this would enable a staggering reduction in projected costs and carbon footprint - if my numbers were right, this new cloud would, for example, enable MySpace to handle all its volume at that time for a total operations cost of just under $40K per month.
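To make the controller idea a little more concrete, here is a minimal sketch of the sizing decision such a controller might make. The function names, the per-instance capacity figure, and the instance limits are all assumptions for illustration; nothing here is the actual proposal or any vendor's API.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance,
                      min_instances=1, max_instances=100):
    """Return how many virtual servers to run for the observed demand.

    All parameters are hypothetical: capacity_per_instance is the number of
    requests per second one instance is assumed to handle comfortably.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so that demand never exceeds the provisioned capacity.
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, needed))


def scaling_action(current, desired):
    """Positive result: instances to launch. Negative: instances to shut down."""
    return desired - current
```

A controller would call something like this on a timer, then launch or terminate instances to close the gap. Idle capacity is what gets shut down when demand drops, which is where both the cost and the energy savings would come from.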

Well, I thought this was earth-shatteringly good news, and well worth pursuing on an aggressive exploratory line, since even if my figures were off by a factor of ten the numbers were still very attractive... It was also very exciting to come up with an alternative to traditional scalability plans that was so very ecologically friendly, and that very much went along with Current's stated political and social beliefs. On the other hand, it did not satisfy certain political objectives at all, which I would later come to understand were the top priority for management.

Even though I never got to implement the green cloud data center idea at Current, and it's been 3 years, the idea is more relevant today than ever. While I was at Yahoo, I noticed that there was a sort of soft wall of isolation among groups. Yahoo is divided into properties, such as Flickr, Buzz, or Mail, and functional groups that provide infrastructural services to these properties, such as login, video, or reputation services. Unfortunately, some of these services have sub-functions that are very repetitive, result in unnecessary bandwidth and CPU usage, and hold imperfectly duplicated data, causing consistency problems that create yet more usage.

It is not rocket science that every CPU cycle has a direct carbon-footprint cost. Duplicate services translate directly into duplicate energy consumption during use, and they complicate QA cycles, causing more wasted energy. What is even more inefficient is that many of these data are stored in de-normalized forms, causing unnecessary disk I/O at each of these repeat trips.

So there too, at Yahoo, my idea was that the carbon footprint of serving one digital asset to one consumer could be greatly reduced by normalizing data and removing duplication of work. Unfortunately, getting a good picture of the cost in watt-hours, for instance, of a single functional operation, such as logging on, is pretty tough, since it cuts across departments and disciplines, and possibly spans considerable geography as well. Putting the dynamic metrics in place to express the goal clearly would be a bit of work. Nonetheless, there is no question that an investment in analytical and architectural work would pay big direct dividends in energy consumption per unit interaction, and yield some very direct public relations benefits as well.
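As a rough illustration of what such a metric might look like, the sketch below adds up the energy of one operation across every service it touches. The function name and the input shape (average power draw in watts and busy time in seconds per service hop) are assumptions made for the example, and the hard part in practice is collecting those two numbers for each hop, not the arithmetic.

```python
def energy_per_operation_wh(hops):
    """Estimate the energy of one operation in watt-hours.

    hops is an iterable of (avg_power_watts, seconds_busy) pairs, one per
    service the operation touches. Both values are hypothetical measurements.
    """
    # Power (W) times time (s) gives energy in joules for each hop.
    joules = sum(power * seconds for power, seconds in hops)
    # 1 watt-hour = 3600 joules.
    return joules / 3600.0
```

With a number like this per operation, removing a duplicated service call shows up directly as one fewer hop in the sum, which makes the before-and-after comparison easy to state.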

As cloud computing becomes more popular, vendors are scrambling to make their clouds more palatable to architects, and the progress is palpable. Redundancy levels are up, outages are down, and spinning up high-level functionality like virtual data stores that offer an acceptable level of integrity and availability becomes simpler every day. Perhaps it's time to get serious about greening the code.
