A data center 609 square miles times six
By 2030, the amount of data is forecast to have grown to 1 yottabyte (a lotta bytes, for sure... think 1 followed by 24 zeroes). That was the prediction of Martin Saddler from HPLabs, whom I heard give a presentation in London a few weeks ago.
Storing this data, according to Martin, would require a data center six times the size of the area enclosed by the M25 motorway. I did the math: the M25 area is 609 square miles, or around 1,580 square km, so six times that is roughly 3,650 square miles (about 9,460 square km). Add to this the fact that such a data center would consume 36% of the available energy, and I’m sure you understand we have a problem.
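The conversion is easy to check. The sketch below uses the standard factor of 2.589988 square km per square mile; the 609 square mile figure is the one quoted above.

```python
# Convert the quoted M25 area and scale it by six.
SQ_KM_PER_SQ_MILE = 2.589988  # 1 square mile = 2.589988 square km

def sq_miles_to_sq_km(sq_miles: float) -> float:
    """Convert an area from square miles to square kilometres."""
    return sq_miles * SQ_KM_PER_SQ_MILE

m25_sq_miles = 609
m25_sq_km = sq_miles_to_sq_km(m25_sq_miles)
datacenter_sq_miles = 6 * m25_sq_miles
datacenter_sq_km = 6 * m25_sq_km

print(f"M25 area: {m25_sq_km:,.0f} km^2")  # M25 area: 1,577 km^2
print(f"Six times that: {datacenter_sq_miles:,} sq mi, about {datacenter_sq_km:,.0f} km^2")
# Six times that: 3,654 sq mi, about 9,464 km^2
```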
Last week, the Wall Street Journal published an article titled “Forget the Cloud; the Fog is tech’s future”, which highlights the concept of “edge computing”: pushing the processing of data out to where the data is actually gathered. Is that the future?
What data are we talking about?
Let’s go back to basics. The first question we have to ask ourselves is what type of data makes up the 1 yottabyte I mentioned earlier. According to IDC, 68% of the data generated in 2012 was created and consumed by consumers: watching digital TV, interacting with social media, and sending camera phone images and videos between devices and around the internet.
Sure, the Internet of Things will change the proportions. Yet IDC estimates that by 2020 only a third of the data generated will contain information that might be valuable if analyzed. So the bulk of the data will still be content filling up the internet for immediate consumption.
Indeed, there is data and data. In its article, the Wall Street Journal highlights the bandwidth problem of the internet and proposes transforming the data at the source to reduce bandwidth. Such an approach can work with “Internet of Things” data, but is useless for video content such as films or digital TV.
Clearly, not all data is equivalent. On top of that, a large portion of “Internet of Things” data is structured, and a fair amount of it is numerical, so large quantities of information fit in a small payload. Only video (e.g. from surveillance cameras), photos or sound recordings may require significant bandwidth.
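To give an idea of how compact numerical sensor data can be, here is a minimal sketch using Python’s standard `struct` module. The record layout (sensor id, timestamp, value) and the one-reading-per-second rate are assumptions made for illustration, not a standard format.

```python
import struct

# Hypothetical record layout: sensor id (uint32), Unix timestamp in seconds (uint32),
# measured value (float32). "<" means little-endian with no padding.
READING_FORMAT = "<IIf"

def pack_reading(sensor_id: int, timestamp: int, value: float) -> bytes:
    """Serialize one numeric reading into a fixed-size binary record."""
    return struct.pack(READING_FORMAT, sensor_id, timestamp, value)

record = pack_reading(42, 1_400_000_000, 10.5)
print(len(record))  # 12 bytes per reading

readings_per_day = 24 * 60 * 60  # assumed: one reading per second
print(readings_per_day * len(record))  # 1036800 bytes, about 1 MB per sensor per day
```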
The Internet of Things and “edge computing”
The vision of billions of sensors transmitting humongous amounts of information is real. We can already see the first glimpses: fitness bands, wild animal tracking devices, structural sensors on bridges and other civil engineering structures, the first prototypes of sensor-enabled clothes, etc. They all generate data today, using a variety of communication mechanisms ranging from radio signals to USB plug-in. Yes, 3G/4G bandwidth is limited, although 4G has a peak data rate of 500 Mbps, 100 times faster than 3G. Admittedly, these figures should not be taken for granted, as the backbone links are often the real limiting factor.
The concept of edge computing consists of processing the data at the point of generation and transmitting only summary information to the next level up. It has even been suggested that information should be transferred only when alarms or unusual behavior are spotted at the edge, the point of origin of the data.
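A minimal sketch of the idea: the device reduces a window of raw readings to a handful of numbers and, in the “alarms only” variant, sends nothing unless a threshold is crossed. The `Summary` fields, the threshold and the `alarms_only` flag are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

@dataclass
class Summary:
    """The few numbers an edge device sends instead of the raw window."""
    count: int
    minimum: float
    maximum: float
    average: float
    alarm: bool

def summarize_window(readings: List[float], alarm_threshold: float) -> Summary:
    """Reduce a window of raw readings to a summary, flagging threshold breaches."""
    return Summary(
        count=len(readings),
        minimum=min(readings),
        maximum=max(readings),
        average=mean(readings),
        alarm=max(readings) > alarm_threshold,
    )

def edge_message(readings: List[float], alarm_threshold: float,
                 alarms_only: bool = False) -> Optional[Summary]:
    """Return what the device transmits for this window, or None if it stays silent."""
    summary = summarize_window(readings, alarm_threshold)
    if alarms_only and not summary.alarm:
        return None  # nothing unusual: send nothing and save bandwidth
    return summary

window = [10.0, 10.2, 9.9, 10.1]
print(edge_message(window, alarm_threshold=11.0))
print(edge_message(window, alarm_threshold=11.0, alarms_only=True))  # None
```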
Although this may seem an excellent approach for the Internet of Things, it raises three main issues. The first one is security. Putting the intelligence at the point of origin brings security concerns with it. What happens if the device code is altered so that specific events are no longer spotted and alarmed on? Sure, the sensor itself can be tampered with, but as soon as the data stops being emitted or shows strange behavior, alarms are raised centrally. When it is the device itself that manages its alarms, things become more complex.
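The central-side check described above can be as simple as a heartbeat timeout. This sketch flags sensors that have gone quiet; the sensor names and the 30-second limit are hypothetical. Note that it only works when sensors report regularly: if the edge device itself decides to transmit only on alarms, silence is the normal state and this check tells you nothing.

```python
from typing import Dict, List

def silent_sensors(last_seen: Dict[str, float], now: float,
                   max_silence: float) -> List[str]:
    """Return ids of sensors that have not reported within max_silence seconds."""
    return sorted(sensor_id for sensor_id, ts in last_seen.items()
                  if now - ts > max_silence)

# Hypothetical timestamps (in seconds) of the last message received from each sensor.
last_seen = {"bridge-01": 1000.0, "bridge-02": 940.0, "bridge-03": 995.0}
print(silent_sensors(last_seen, now=1000.0, max_silence=30.0))  # ['bridge-02']
```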
Several articles have pointed to the security implications of the Internet of Things, and protecting “edge computing” devices is at the center of these discussions.
But there is a second point. Transmitting only summary information reduces visibility into what actually happens. Operational teams are only interested in what is happening now, and summary information gives them that. But people who want to analyze behaviors over time may want to exploit all the details to reach the level of granularity they need to understand how the “system” measured by the sensor actually behaves. Being told that the bridge vibration at a given moment is 10 is one thing; knowing that it changes from 10 to 12 within a matter of seconds is something else. Capturing the time dimension is critical in many situations.
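The bridge example can be made concrete with a hypothetical per-second vibration series. The 5-second averages an edge device might send look unremarkable, while the raw data shows the jump from 10 to 12 within a single second. The window size and the readings are assumptions for illustration.

```python
from statistics import mean
from typing import List

def window_means(readings: List[float], size: int) -> List[float]:
    """Average consecutive blocks of `size` readings, as an edge summary might."""
    return [mean(readings[i:i + size]) for i in range(0, len(readings), size)]

def max_step(readings: List[float]) -> float:
    """Largest change between two consecutive raw readings."""
    return max(abs(b - a) for a, b in zip(readings, readings[1:]))

# Hypothetical vibration readings, one per second.
raw = [10, 10, 10, 12, 12, 12, 10, 10, 10, 10]

print(window_means(raw, 5))  # [10.8, 10.4]: the 5-second summaries
print(max_step(raw))         # 2: the raw data shows a 10 -> 12 jump in one second
```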
The third point involves correlating data provided by multiple sensors. Again, with aggregated data the correlation granularity is limited to the intervals at which the data becomes available. Working with the raw data gives you a much crisper view of what is happening and may allow you to spot things much more easily.
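The sketch below illustrates the point with two hypothetical sensors, where sensor B reacts two seconds after sensor A. A simple lagged dot product (a basic form of cross-correlation, chosen here for brevity) recovers the two-second delay from the raw per-second data. After averaging into 10-second windows, both events fall into the same window and appear simultaneous.

```python
from typing import List

def best_lag(a: List[float], b: List[float], max_lag: int) -> int:
    """Return the shift (in samples) of b that best lines up with a.

    The score is a plain dot product of the overlapping parts; ties go to the smallest lag.
    """
    def score(lag: int) -> float:
        return sum(x * y for x, y in zip(a, b[lag:]))
    return max(range(max_lag + 1), key=score)

def downsample(series: List[float], size: int) -> List[float]:
    """Average consecutive blocks of `size` samples, as an edge summary would."""
    return [sum(series[i:i + size]) / size for i in range(0, len(series), size)]

# Hypothetical per-second readings: an event at t=5 on A and at t=7 on B.
a = [0.0] * 30
b = [0.0] * 30
a[5] = 1.0
b[7] = 1.0

print(best_lag(a, b, max_lag=5))                                  # 2 (seconds)
print(best_lag(downsample(a, 10), downsample(b, 10), max_lag=2))  # 0 (windows)
```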
The role of the aggregator
As in the satellite business today, new roles will appear with the Internet of Things. Organizations managing a pool of sensors will own the data those sensors generate. They will transform that data into meaningful information and sell it to the companies and individuals who need it, acting as data aggregators. When you want to take the motorway, you are not interested in the exact interval between two cars; you want a view of how fluid the traffic is.
Aggregators may also want to sell raw data to those who need it. For example, a researcher studying traffic movement patterns at a particular location may need the actual sensor data to see how the speed of cars evolves over time and understand how traffic fluidity is influenced by external factors.
Aggregators can decide whether to leave the raw data at the edge or bring it back to the center.
Fundamentally, that decision should balance storage capacity at the edge against security concerns and bandwidth considerations.
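On the bandwidth side, a back-of-the-envelope estimate helps frame the decision. The numbers below (10,000 sensors, 12-byte raw readings every second versus one 20-byte summary per minute) are purely hypothetical, and the storage and security trade-offs are not captured by this arithmetic.

```python
SECONDS_PER_DAY = 86_400

def daily_upload_bytes(sensors: int, interval_seconds: int, bytes_per_message: int) -> int:
    """Bytes sent per day if every sensor transmits one message per interval."""
    messages_per_day = SECONDS_PER_DAY // interval_seconds
    return sensors * messages_per_day * bytes_per_message

# Hypothetical fleet: bring all raw data to the center, or only per-minute summaries.
raw = daily_upload_bytes(sensors=10_000, interval_seconds=1, bytes_per_message=12)
summary = daily_upload_bytes(sensors=10_000, interval_seconds=60, bytes_per_message=20)

print(f"{raw:,} bytes/day raw")          # 10,368,000,000 bytes/day raw
print(f"{summary:,} bytes/day summary")  # 288,000,000 bytes/day summary
print(raw / summary)                     # 36.0
```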
And the other data?
As we pointed out, the majority of data on the internet is not part of the Internet of Things today, and I do not believe the ratio will change dramatically in the near future. That other information does not lend itself to aggregation, so edge computing does not really add value here. This is more about content delivery networks and finding optimal ways to store the appropriate data close to users to reduce the bandwidth required at peak hours. 4G allows live streaming of video to mobile devices. But as an increasing number of users watch an ever-increasing amount of material available on the internet, the bandwidth issue will quickly resurface. That’s why telecommunication companies are already working on 5G, which would provide a peak data rate of 10 Gbps. And guess what, I’m sure we will manage to run into bottlenecks with that bandwidth too. The more we make available to users, the more they consume.
The Internet of Things is an intriguing proposition, one that will force us to rethink many issues, not least privacy. As I tried to sketch, multiple architectural approaches are possible, but as anywhere else, one size will not fit all. We need a holistic view of what we want to achieve with the Internet of Things to identify the best location for processing the data. The disappearance of net neutrality may make the decision even more complex, as companies will then start competing for bandwidth. So, is the future cloud, fog or edge? Frankly, these are nice buzzwords and concepts. What is key is understanding the end-to-end architecture, the implications of the choices made and the communication technologies used at each level.