Determining data value to reduce cloud storage risks
by Char Sample
Do enterprises know which data is being stored in the cloud, and where? Expert Char Sample offers some housecleaning tips to lower cloud storage risks.
Recent news stories like the celebrity iCloud hack have shined a light on cloud security in general, and cloud storage risks specifically. Much like the United States justice system presumes innocence over guilt, many view the cloud as secure enough until it's shown to be insecure. Vulnerabilities or cloud breaches are bound to happen, and when they do, they serve as reflection points where users must rethink their assumptions before deciding on the best choice in the security, cost and performance balancing act.
There are many reasons to store data in the cloud, and users have been doing so for years. Mail services are one of the early examples of data storage in the cloud. Historically, however, when these services were hacked, the attackers would typically use the compromised accounts to deliver malware, spam or both. More recently, the interest of attackers has shifted toward learning the users' personally identifiable information, or PII. This leads to questions about the content of the actual data, and thoughts about where the data should reside.
The distributed nature of cloud data introduces two areas of concern to security-minded people. The first area of concern deals with the processing and handling of the data. The second area deals with the value of the data. Generally speaking, the first area receives more attention than the second; however, both areas are of equal importance.
Data processing and handling
Cloud processing and handling of data has users facing a basic question: Who owns the data? This problem has not been solved and appears to have no solution in the near future. Part of the problem stems from the fact that users do not always want to be responsible for their digital data, and entrusting all users to securely manage their digitally stored data would require a significant portion of the population to learn more about computer security than they are willing to undertake. Thus, the cloud is a viable option for a large portion of the user population.
Another problem is that while cloud service providers (CSPs) are quite willing to store this information, they certainly don't want to take ownership of the data. The legal liability issues for the CSPs would be immense since the Internet is a domain of both legal and illegal commerce and the laws that govern Internet usage vary by country. Many of these issues about data ownership played out a couple of decades ago as e-commerce was introduced; however, when Internet service providers (ISPs) were in the role of today's CSPs, the data was not as widely distributed as it is today with some of the large CSPs.
Let's consider how data is actually stored in the cloud. In order to achieve the high availability that defines the cloud, the major CSPs have geographically distributed data centers, with clustered server farms at each data center. Long-term storage of data at remote facilities represents an attack vector, as do the distributed data centers. These endpoints and the paths between them represent major attack vector classes for data stored in the cloud.
The security industry has spent decades attempting to secure the places where data is stored and transmitted, with mixed success. However, the nature of digital data, with multiple copies being made at various locations, raises the question: Is the data itself an attack vector? If so, this emergent vector needs guiding principles. In the meantime, if enterprises can't secure the places where their data is stored and transmitted, they at least need to know where those places are, because that will give them the opportunity to decide which data should be stored in the cloud and where it will be most secure.
One aspect of data guidance comes from a time before the existence of the cloud. The old adage, "some things do not belong on the Internet," is still relevant; however, with the automatic backups that we see with cloud storage services and the movement of data between personal devices and the enterprise network, this problem has grown exponentially.
In this context, the value of data deals with the utility of data. Data utility requires evaluation for the value of the content in the present, along with the potential value of that same data content in the future. A useful analogy might be to consider an old photograph taken of a subject in his younger days and showing him wearing the styles of that era. At the time the picture was taken, the image provided no offense to the subject. However, the same picture many years later might cause the subject to cringe at the fashion it displays.
Now consider that instead of an old, funny picture, business or personal data is on display. The digital nature of the data means that once a copy of the data is in the cloud, it should be assumed to remain there indefinitely. Data of little value to an organization today becomes a data point in the future for mining and trend analysis. As a result, enterprises need to examine how the value of current data may change in the future.
Even data that passes the "safe-to-post standard" becomes a potential big data issue where it may be mined in the present or the future and joined with other data to create new data that may actually be sensitive or more valuable. This results in the problem that even the "safe" data is no longer truly safe. Yet, migration of data to the cloud continues.
There's also the issue of data that's not meant for cloud storage but inadvertently ends up there. Data that crosses the boundary between work and home provides a significant opening for an attacker to gain knowledge about the organization. This is not news, of course. But less obvious data, such as forwarded e-mails, that provides insight into the projects and roles of various individuals can sometimes go unnoticed as a major risk. Project information, personnel information and organizational information are all important to keep out of the public domain, and away from the cloud.
How can enterprises prevent valuable data from moving to unauthorized cloud storage services? Architecturally speaking, the most logical choice would be to create a decontamination network for personal devices that travel between home and office. A decontamination network is a sandbox, DMZ or some other trusted zone of the network where data comes in through different sources and users but must pass through processing layers of proxies and filters in order to join the network.
The decontamination network can provide the checks through application proxies that examine for keywords during data inventory and potentially prevent data breaches similar to the celebrity iCloud hack. The determination of those keywords will depend on corporate policies. However, a serious discussion on which keywords are important to the organization must take place across all groups of the organization so data can be properly valued.
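To make the keyword check concrete, here is a minimal sketch of the kind of test an application proxy in the decontamination network might run on a piece of text before letting it through. It is an illustration under stated assumptions, not a description of any particular product: the policy categories mirror the project, personnel and organizational information mentioned above, but the keywords themselves and the function names scan_payload and allow_transfer are hypothetical.

```python
import re

# Hypothetical keyword policy. In practice these terms would come out of the
# cross-organization discussion about which data is valuable.
POLICY_KEYWORDS = {
    "project": ["project falcon", "roadmap"],
    "personnel": ["salary", "performance review"],
    "organizational": ["org chart", "reorg"],
}


def scan_payload(text, policy=POLICY_KEYWORDS):
    """Return a dict mapping each policy category to the keywords found in text."""
    matches = {}
    for category, keywords in policy.items():
        hits = sorted(
            kw
            for kw in keywords
            # Whole-word, case-insensitive match so "salary" does not
            # trigger on a longer word such as "salaryman".
            if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
        )
        if hits:
            matches[category] = hits
    return matches


def allow_transfer(text, policy=POLICY_KEYWORDS):
    """Allow the transfer only if no policy keyword appears in the text."""
    return not scan_payload(text, policy)


if __name__ == "__main__":
    outbound = "Fwd: Project Falcon roadmap and the new org chart"
    print(allow_transfer(outbound))
    print(scan_payload(outbound))
```

A plain keyword match like this is deliberately simple. It will miss paraphrased or encoded content and can flag harmless messages, which is one more reason the keyword list needs agreement across the organization rather than being set by one group.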
The fundamental question of what to do about enterprise data remains, as does the bigger problem of what the server does with the data. But having a conversation about enterprise data value and inventory sooner rather than later will give enterprises the opportunity to make the problem manageable. Putting off this conversation, however, will likely result in an exponential growth of the problem.
About the author:
Char Sample, CERT security solutions engineer, has close to 20 years' experience in Internet security, and she has been involved with integrating various security technologies in both the public and private sectors. She is presently a doctoral candidate at Capitol College, where her dissertation topic deals with the use of soft markers in attack attribution.