How Big Data is Changing Everything
This article originally appeared in Xconomy.
There’s a radical transformation happening in information technology today, one that promises to be every bit as significant—and every bit as disruptive to existing business models—as were Web applications in the 1990s and virtualization in the first decade of the 21st century. It’s a foundational change in the way enterprises, their employees, and their customers manage, share, and secure the staggering amounts of data that pass through their hands every day. It will make data available at higher speeds, on more massive scales, and at lower costs than anyone could have imagined even a few years ago. It’s Storage 3.0, and it’s happening right now.
The big story in IT today is “big data”—the almost inconceivable volumes of digital information created and delivered by sensors, financial transactions, video surveillance, Web logs, animation studios, genomics, online gaming networks and a virtually unlimited number of other sources. It’s the inevitable but still breathtaking extension of Moore’s Law: There’s more of everything now, on corporate networks, on home computers and on mobile devices. More data is being produced by more endpoints, and the data being produced, like high-definition video, is denser than would have been imaginable even a few years ago. All those ones and zeroes have to be stored somewhere—and, crucially, many enterprises want to keep their data forever—and IT systems worldwide are strained to the limit and beyond as they try to accommodate that demand.
A vast array of solutions—some of them enterprise-class, some consumer-oriented—has emerged to deliver the data storage capacity the world is crying out for. E-mail users frustrated by attachment size limits can use services like Box and Dropbox to share oversized files. Online backup sites like Carbonite, one of the portfolio investments at Menlo Ventures, reduce the risk of data loss caused by system failure. Amazon and Apple let businesses and individuals keep everything from financial information to family photos in the cloud. The multibillion-dollar investments being made in these services highlight our insatiable demand for storage, and the major business benefits to those who can harness it at scale. But these systems are not well-suited to big-data analytics.
The benefits of analyzing big data—that’s data measured not in gigabytes or terabytes, but in petabytes—are far-reaching, and they’re only beginning to be realized. The Human Genome Project is leveraging petabytes of DNA sequencing data as it transforms medical research, and automakers are crunching equally huge amounts of safety test data to improve their new-car designs. But smaller businesses, too, are generating vast amounts of data that they want to store, analyze, and preserve. Casinos now store and mine petabytes of video surveillance data. Gaming companies collect all their users’ in-game interactions to find ways to improve retention and monetization. Digital advertising companies collect and process terabytes of display, mobile and video ad impressions daily to improve campaign performance. Retailers analyze consumer purchases side-by-side with their ad campaigns to optimize revenues and gross margin dollars per shopping basket. Every business with a product or customer base of any size is feeling the competitive pressure to get smarter with its data, and do it now.
But traditional data storage architectures—what we now think of as Storage 1.0—can’t keep up with the demands of these environments. Fibre Channel is too rigid, and too complex to set up and manage, for fast-growing multi-petabyte data farms. And legacy storage arrays are simply too costly to maintain at the petabyte level.
Storage 2.0 emerged in the last decade, with significant improvements on existing storage array designs and smart software features like thin provisioning, deduplication and storage tiering. The changes, and their bottom-line potential, got noticed, with providers like 3PAR, Isilon Systems and Spinnaker Networks bought up in a series of acquisitions worth more than $12 billion. (3PAR and Spinnaker were also Menlo Ventures investments.) These deals injected badly needed innovation into the storage industry, but they didn’t change the fundamental architecture of storage to enable flexibility and massive scale. Storage 2.0 still runs on the same old protocols, and that means the same old bottlenecks. Think of it this way: Storage 2.0 is a used car—one with a more powerful, more efficient engine, but still a used car. And like a lot of used cars, it just can’t keep up with the traffic.
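To make one of those software features concrete, here is a minimal, hypothetical Python sketch of block-level deduplication: data is split into fixed-size blocks, each block is identified by a content hash, and a block whose hash has already been seen is never stored twice. (The DedupStore class, the 4 KB block size and the use of SHA-256 are illustrative assumptions for this sketch, not any vendor’s actual implementation.)

    import hashlib

    BLOCK_SIZE = 4096  # illustrative 4 KB block size

    class DedupStore:
        """Toy block store: identical blocks are kept only once, keyed by content hash."""

        def __init__(self):
            self.blocks = {}   # hash -> block bytes (unique blocks only)
            self.files = {}    # filename -> ordered list of block hashes

        def write(self, name, data):
            hashes = []
            for i in range(0, len(data), BLOCK_SIZE):
                block = data[i:i + BLOCK_SIZE]
                digest = hashlib.sha256(block).hexdigest()
                # Store the block only if this content has never been seen before.
                self.blocks.setdefault(digest, block)
                hashes.append(digest)
            self.files[name] = hashes

        def read(self, name):
            # Reassemble the file from its list of block hashes.
            return b"".join(self.blocks[h] for h in self.files[name])

    store = DedupStore()
    store.write("a.bin", b"hello world" * 10_000)
    store.write("b.bin", b"hello world" * 10_000)  # duplicate content adds no new blocks
    assert store.read("b.bin") == b"hello world" * 10_000

Applied at petabyte scale, that simple trick is a big part of why Storage 2.0 arrays were so much more space-efficient than their predecessors, even though the underlying architecture stayed the same.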
Big data demands a radically different approach. Today’s data-intensive enterprises need massively scalable, high-performance storage arrays with extremely low per-petabyte costs. Those arrays have to be connected with easy-to-manage networks that run at 10 gigabits per second (Gbps) and above. And performance bottlenecks must be eliminated by the optimal combination of flash and disk technologies. The good news is that this different approach is being created right now. It’s Storage 3.0.
Storage 3.0 is being driven by a new crop of venture-capital-backed innovators that leverage technology disruptors like solid-state flash storage, 10-gigabit Ethernet and new data storage algorithms. Flash storage greatly accelerates big-data analytics, but on its own it’s too expensive to scale to petabyte volumes. Fusion-io pioneered the use of flash to accelerate the performance of datasets that can reside in a single server. Avere, another Menlo Ventures portfolio company, takes flash a step further, putting it in the network instead of on the server. The private storage providers Violin Memory and Coraid, where Kevin is CEO, are delivering flash-powered storage arrays that can replace the biggest boxes from companies like EMC and Hitachi Data Systems.
But the Storage 3.0 story doesn’t end with new storage media and algorithms. It’s rapidly expanding to encompass “macro” innovations in the design of the entire networking and storage layers of the data center. The Ethernet storage-area network (SAN) model, for example, leverages the incredible price-performance gains of 10-gigabit Ethernet and replaces the old, excessively complex SAN design with a fluid, massively parallel cluster of low-cost commodity hardware.
Storage 3.0 represents the intersection of enterprise storage capabilities with modern cloud architectures. This disruption in storage is a necessity if it is to keep up with big-data growth and with rapid changes in the compute and networking layers. Billions of dollars in market share will be reallocated, and new tech leaders will be created. No one’s sure exactly how it will play out, but it’s clear that the enterprise and cloud storage of the future will run on high-performance media, commodity hardware economics, scale-out architectures, virtualization and self-service automation. And it’s clear that it will run on Ethernet.
And that’s all good news. Whenever you take a fundamental IT component like storage and enable it to handle 10 or 100 times more capacity, at higher speeds and much lower costs, great things happen. Storage 3.0’s big-data innovations can deliver better medicines, smarter power grids, new quantum physics discoveries and more intelligent online services—and they can do it right now. So let’s hurry it up!