What a New Email Behavior Study Can Teach Us about the Future of E-Discovery and Information Governance

Created on April 14, 2015

The typical email inbox is a random hodgepodge of new messages, replies, spam, forwarded messages and auto replies. Guessing what message will appear next at the top of the unread list is basically a crapshoot. Our email distribution patterns tend to be equally random. Most of us have no idea at the beginning of the work day what emails we will end up sending or to whom. We've grown largely accustomed to the reactive nature of email, so much so that we can't really imagine working or living without it.

It turns out our email behavior may be a bit less random and chaotic than we thought.

Researchers at Yahoo Labs in Barcelona and California and USC studied patterns of behavior from 16 billion emails exchanged between two million people in what has been described as the first large-scale analysis of email conversations. Results of the study were reported in MIT Technology Review (I encourage you to read the full article here).

Some of the findings were somewhat predictable, such as the fact that replies on mobile devices tend to be shorter than those sent from a desktop or laptop, and emails without attachments tend to get replies much more quickly.

But the study also produced some more interesting findings, including:

  • Men tend to send faster, shorter replies than women
  • Email replies tend to be shorter later in the day
  • Emails at the beginning of a chain tend to be similar in both reply time and length but grow increasingly random as the chain nears its end

Based on the results, researchers were able to build a machine learning algorithm to predict the stages of email conversations and the time and length of emails depending on where they fell in the chain.

According to the article, researchers believe that the findings could impact how email systems are designed in the future and ease the burden of email overload by organizing emails based on priority, not just time received.

But email system designers aren't the only people who should be taking notice. The study should perk up the ears of e-discovery professionals as well.

Despite the emergence of new communication platforms, email is still by far the most commonly requested form of electronically stored information (ESI) during e-discovery.

We've already seen in recent years the emergence of predictive coding technologies, which automatically detect patterns in documents and categorize them as either relevant or non-relevant. One shortcoming of these systems, which are advancing rapidly, is that they tend to operate in a binary fashion, providing little context as to the nature of a given document's relevancy. For instance, an email might contain a high volume of a particular relevant keyword. But that doesn't necessarily mean it's going to be important to the matter. And even if it is, how that email relates to others isn't always clear.   

As we learn more about email communication patterns and design more advanced algorithms that can predict the importance of an email based on its attributes (length, time to reply, etc.), legal teams can begin piecing together the larger story. For instance, the length of an email containing a high volume of keywords might indicate that it was sent towards the end of a longer exchange, prompting legal teams to investigate earlier emails that may have otherwise gone undetected using more traditional search techniques.

There are important information governance implications as well. The more we learn about email behavior and the better we become at detecting patterns in communication, the easier it will be to proactively address potential legal or compliance risks before they balloon into major issues.

There is still a long way to go. After all, this was the first study of its kind, and the algorithm's predictions, while impressive, were far from perfect according to the article. But it is highly encouraging to see that through science and technology, we're starting to learn that there is at least some method to the email madness.
