In the E-Discovery Features Arms Race, Scalability Still Reigns Supreme
In Disney's 1961 film, Babes in Toyland, an elderly toymaker struggling to meet a fast-approaching Christmas deadline is presented with a machine capable of making toys with no manual effort. The toymaker expresses skepticism and reluctantly gives the machine a try, assuming it won't live up to the promise. A few seconds later, out pops a toy doll. Overjoyed, the toymaker ramps the machine up, hoping to churn out toys in rapid succession. As the machine starts working faster and faster, it quickly malfunctions before ultimately exploding.
Of course, Babes in Toyland is fiction. Even the most sophisticated machines in today's world don't crank out toys like the one in the film. But the story of the toy machine isn't without an e-discovery parallel. Companies are constantly being presented with technologies to address various e-discovery needs, spanning the full EDRM lifecycle. Many of these products are adorned with snazzy-looking interfaces, “powerful” features and “advanced” capabilities, which perform admirably in controlled demos with small data sets. However, more than a few of these products inevitably will start to perform poorly or, in extreme cases, malfunction altogether when presented with real-world demands. Like the toy machine, these technologies aren't scalable. TechTarget defines scalability as the ability of a computer application or product (hardware or software) to continue to function well when it (or its context) is changed in size or volume in order to meet a user need.
Scalability is a revered concept in the technology realm. Many would argue that it's the single most important factor that separates the truly powerful technologies from those that never make it off the ground. Google and Facebook, for instance, have grown into practical institutions thanks in large part to their remarkable ability to deliver the same user experience and performance no matter how much information is poured into each system.
In e-discovery, the topic of scalability is usually addressed in the context of exponentially rising data volumes. The technology being used to collect, process and analyze that data must be scalable in order to handle the digital information explosion. Achieving scalability is easier said than done. In many cases, the computing resources a system requires grow far faster than the amount of data fed into it.
How technologies scale is a complex topic. On a very basic level, it is important to understand the concepts of vertical and horizontal scalability:
- Vertical scalability refers to the ability to increase capacity by putting different layers of a business application each on their own server. For example, putting the database, application logic and web server all on separate machines will almost certainly improve overall performance. Multi-tiered applications offer the added benefit of tailoring computing resources to the specific tier. If the application requires a high-speed database, additional memory and storage could be made available for the database tier (scaling up) while the other tiers use less expensive hardware.
- Horizontal scalability refers to the ability to increase capacity by putting a single tier on multiple servers, so that they work towards a single purpose. An example of scaling out is the way Google utilizes thousands of independent servers to deliver the processing speed necessary to perform user search functions.
Most IT environments require a combination of vertical and horizontal scaling. The e-discovery systems that work best tend to have built-in flexibility to scale up and out, depending on user needs. One way software companies can ensure their technologies are vertically scalable is by installing different processes (e.g., deNISTing and indexing) on separate machines, so each has greater capacity to perform its specific task and can scale up if needed. As workload increases, horizontal scalability can be achieved through load balancing, which involves assigning work to the least busy computer in a cluster of individual systems, to avoid overloads and minimize response time.
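To make the load-balancing idea concrete, here is a minimal sketch in Python of a "least busy" assignment rule. It is an illustration only, not how any particular e-discovery product works: the function name `assign_least_busy`, the node names, the task names and the numeric cost estimates are all hypothetical.

```python
import heapq

def assign_least_busy(task_costs, worker_names):
    """Assign each task to whichever worker currently has the least total work."""
    # Heap entries are (current_load, worker_name); the lowest load is popped first.
    heap = [(0, name) for name in worker_names]
    heapq.heapify(heap)
    assignments = {name: [] for name in worker_names}
    for task_id, cost in task_costs:
        load, name = heapq.heappop(heap)            # least busy worker right now
        assignments[name].append(task_id)
        heapq.heappush(heap, (load + cost, name))   # record its new load
    return assignments

# Hypothetical tasks with rough cost estimates (for example, gigabytes to process).
tasks = [("index-A", 5), ("index-B", 3), ("deNIST-C", 4), ("index-D", 1)]
result = assign_least_busy(tasks, ["node-1", "node-2"])
print(result)
```

Adding a third node to the list is all it takes to spread the same tasks more thinly, which is the essence of scaling out. A production load balancer would also have to account for node failures and for cost estimates that turn out to be wrong.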
One of the increasingly onerous e-discovery challenges faced by multinational corporations is complying with international data privacy laws, which prohibit certain ESI from leaving its native country. Advanced e-discovery technologies leverage horizontal scalability to protect these companies from running afoul of the laws by assigning a matter's collection and processing tasks to distributed locations, so that they can be performed locally.
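As a rough sketch of that idea, the Python snippet below groups custodians by a processing site in their own country, so that collection and processing can run where the data lives. Everything in it is an assumption made for illustration: the site names, the country codes, the `route_tasks` function and the choice to reject custodians with no in-country site are hypothetical, not a description of any vendor's implementation or of what a given privacy law requires.

```python
from collections import defaultdict

# Hypothetical mapping from a custodian's country to an in-country processing site.
LOCAL_SITES = {"DE": "frankfurt-site", "FR": "paris-site", "US": "chicago-site"}

def route_tasks(custodians):
    """Group custodians by the processing site located in their own country."""
    plan = defaultdict(list)
    for name, country in custodians:
        site = LOCAL_SITES.get(country)
        if site is None:
            # No in-country site: stop rather than silently send data abroad.
            raise ValueError(f"no local processing site for {country}")
        plan[site].append(name)
    return dict(plan)

plan = route_tasks([("Anna", "DE"), ("Bob", "US"), ("Clara", "DE")])
print(plan)
```

Failing loudly when no local site exists is a deliberate choice in this sketch. Whether data may leave a country in a given matter is a legal question the software cannot answer by itself.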
These are just a few of the many considerations that fall under the scalability discussion. As demands on e-discovery systems continue to intensify, scalability figures to be an increasingly important factor in purchasing decisions. It will also be what separates the viable e-discovery technologies from the toy machines.
To learn more about the importance of scalability in e-discovery software, read Exterro's recent article, “Key Considerations for Deploying E-Discovery Software in the Cloud.”