198 Million US Voter Records Left Online For Two Weeks
Some of the Information May Have Never Been Previously Released Publicly
A data analytics firm aligned with the Republican Party says it accepts "full responsibility" after it exposed online a list that includes virtually all U.S. voter registration records along with extensive research that attempts to guess people's political views.
The data came from Deep Root Analytics, which describes itself as a predictive media analytics company. Exposed were 198 million voter registration records, which included names, dates of birth, home addresses, phone numbers and registration details.
In a statement, Deep Root Analytics says the data was exposed after access settings were changed on June 1. The data was secured two weeks later.
"We accept full responsibility, will continue with our investigation, and based on the information we have gathered thus far, we do not believe that our systems have been hacked," the company says.
Chris Vickery, a cyber risk analyst with cybersecurity startup UpGuard who has uncovered several major data breaches by scanning the internet, discovered the data, according to an UpGuard blog post. The discovery follows Vickery's December 2015 finding of a mysterious database containing 191 million U.S. voter registration records (see 191 Million U.S. Voter Registration Records Exposed?).
The Deep Root Analytics data recently discovered by Vickery was contained in an unsecured Amazon Web Services S3 "bucket," which is the term for a storage instance on the popular cloud hosting service.
Anyone could have accessed the Amazon bucket by navigating to a subdomain with the letters "dra-dw," an abbreviation for the Deep Root Analytics Data Warehouse, UpGuard writes. The publicly exposed data amounted to 1.1 terabytes.
Security researchers have warned that configuration and other mistakes by developers could result in sensitive information being exposed online. With the Shodan search engine, which indexes internet-connected devices, it is often possible to determine whether, for example, a database has been left online without requiring a username and password.
"The fact that these confidential files were left on a publicly accessible server should not be a surprise," says Mike Shultz, CEO of Cybernance, a risk management security firm. "An organization's greatest threat is usually not an outside attacker, it's the people inside the organization and their mistakes that are the most frequent offenders."
What's the Risk?
Vickery's findings received substantial media attention on Monday. But whether the leak poses risks from a personal data perspective depends in part on the state where a registered voter lives.
Laws across the 50 U.S. states vary in relation to access and use of voter registration data. All but 11 states allow some public access to electoral rolls. All states do, however, allow political parties and candidates to have access to voter registration records.
Wider public access varies. For the states with the fewest restrictions, it's easy to find current voter registration records online. For example, the site VoterRecords.com claims to hold 50 million registration records from Alaska, Arkansas, Colorado, Connecticut, Delaware, Florida, Michigan, Nevada, North Carolina, Ohio, Oklahoma, Rhode Island, Utah and Washington.
According to that site's FAQ, its data has been amassed from public sources. The data includes party affiliation, residential and mailing addresses, and race-related information if a voter has chosen to share that during registration. The Deep Root Analytics cache has some of that data as well, UpGuard writes.
Reading Voters' Minds
What's perhaps more concerning about Deep Root Analytics' cache is that the leaked data shows efforts to predict how voters feel about certain issues, as well as guess people's race and religion.
But this shouldn't come as a surprise. President Donald Trump's campaign is believed to have worked with firms that specialize in very detailed profiling of voters. Built from publicly available data, the profiles can be used to craft carefully tailored online advertisements that may be more likely to drive voters to back a certain candidate.
The data compiled by Deep Root Analytics also appeared to draw on two other companies that work for Republican interests: TargetPoint Consulting, a digital market and political research consultancy, and The Data Trust, a Washington, D.C., group that collects voter files.
UpGuard says it appears Deep Root Analytics has compiled 9.5 billion data points on 198 million U.S. voters with the aim to guess "their likely political preferences using advanced algorithmic modeling across 48 different categories.
"The result is a database of grand scope and scale, collecting the modeled personal and political preferences of most of the country - adding up to an unsecured political treasure trove of data which was free to download online," UpGuard contends.
UpGuard also found data fields that read "modeled ethnicity" and "modeled religion," suggesting that the companies took a stab at guessing those characteristics.
Scale of Zero to One
But are the predictions accurate? The author of UpGuard's blog post, Dan O'Sullivan, writes that he located his information after finding his "RNC ID," which is a 32-character alphanumeric identifier assigned to voters.
Vickery and O'Sullivan found that they could link other analyses in the leaked data sets to real people using the RNC ID. That would allow someone who had the data to tie the algorithmic predictions to real names.
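The linkage the researchers describe amounts to an ordinary join on a shared key. The sketch below is a minimal illustration of that idea, not a reconstruction of the exposed files: every identifier, name and score in it is invented, the IDs are shortened placeholders rather than 32-character values, and the function name `link_by_id` is hypothetical.

```python
# Hypothetical records: every ID, name and score below is invented.
voter_file = {
    "a1b2c3": {"name": "Jane Example", "state": "OH"},
    "d4e5f6": {"name": "John Sample", "state": "FL"},
}
model_scores = {
    "a1b2c3": {"foreign_policy": 0.82},
    "zzz999": {"foreign_policy": 0.10},
}


def link_by_id(people: dict, scores: dict) -> dict:
    """Return one merged record for each ID present in both datasets."""
    shared_ids = people.keys() & scores.keys()
    return {key: {**people[key], **scores[key]} for key in shared_ids}


linked = link_by_id(voter_file, model_scores)
print(linked)
```

Only the ID that appears in both tables survives the join, which is why a single common identifier is enough to attach a modeled score to a named person.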
"This reporter was able, after determining his RNC ID, to view his modeled policy preferences and political actions as calculated by TargetPoint," O'Sullivan writes. "It is a testament both to their talents, and to the real danger of this exposure, that the results were astoundingly accurate."
One 50 GB file gives an insight into the analysis, UpGuard writes. It contains a list of voters, listed by RNC ID, and then columns that indicate which trait the model is trying to predict, such as whether a person agrees or disagrees with U.S. foreign policy.
What's recorded is a decimal fraction. The closer the number is to zero, the less likely the voter is to support a particular policy, while a decimal fraction closer to one indicates that support is "very likely," UpGuard writes.
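As a rough illustration of how such a score might be read, the sketch below maps a decimal fraction to a coarse label. The function name and the cut-off values are assumptions made for this example; UpGuard describes only the zero-to-one scale, not any thresholds.

```python
def describe_support(score: float) -> str:
    """Map a modeled score in [0, 1] to a coarse label.

    The 0.25 and 0.75 cut-offs are illustrative assumptions, not values
    taken from the exposed files.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score >= 0.75:
        return "very likely to support"
    if score <= 0.25:
        return "unlikely to support"
    return "uncertain"


if __name__ == "__main__":
    for value in (0.05, 0.5, 0.93):
        print(value, "->", describe_support(value))
```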
It's unclear what other sources the data analytics companies fed into their algorithms to generate those scores. But UpGuard did find what it described as a large cache of posts from Reddit, the news aggregation and comments board. That suggests that other public data sources, such as social media postings, may have been scraped.
RNC Pauses Work With Deep Root
In its statement, Deep Root Analytics says the data found by UpGuard "was not built for or used by any specific client," and claims that it was instead "our proprietary analysis to help inform local television ad buying."
The company adds: "Deep Root Analytics builds voter models to help enhance advertiser understanding of TV viewership."
But a statement supplied to the Washington Post by the Republican National Committee suggests that the data may indeed have been intended for use by a client. The RNC, notably, says it has "halted any further work with the company pending the conclusion of their investigation into security procedures."
The organization adds: "While Deep Root has confirmed the information accessed did not contain any proprietary RNC information, the RNC takes the security of voter information very seriously, and we require vendors to do the same."
Vendor Management Questions
Chris Pierson, chief security officer and general counsel for payment services firm Viewpost, says the data exposure also raises vendor management and information assurance questions. "Every company that deals in sensitive or valuable data should have an information assurance program that risk rates their vendors, monitors them for security and other factors, and provides governance to the company regarding their third party and the risk appetite set by the company," he says.
One unanswered question about the data exposure is the extent to which Deep Root was actively monitoring for this type of information exposure. "The RNC database leak root cause appears to be sloppiness by their third party and might have been caught [via] mandated configuration scanning or cloud storage providers or other types of penetration testing," Pierson says.
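One form such configuration scanning can take is a check of a bucket's access control list for grants to public groups. The sketch below is a minimal example under stated assumptions: the input dictionary mirrors the shape of the response returned by Amazon S3's GetBucketAcl operation, the sample ACL is invented, and the function name `public_grants` is hypothetical. It is not a complete audit, since bucket policies and other settings can also open a bucket to the public, and nothing here reflects how Deep Root's bucket was actually configured.

```python
# Group URIs that S3 uses for "everyone" and "any authenticated AWS user".
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl: dict) -> list:
    """Return the permissions an ACL grants to public groups."""
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            found.append(grant.get("Permission"))
    return found


# Invented sample: the owner has full control and everyone can read.
sample_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}

if __name__ == "__main__":
    exposed = public_grants(sample_acl)
    print("public permissions:", exposed or "none")
```

A scheduled job that runs a check like this against every bucket and alerts on any non-empty result is one way a change in access settings could be noticed quickly.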
Neither Deep Root Analytics nor the RNC could be immediately reached for comment.
Potential Long-Term Repercussions
One worry about this information exposure is that the data may already have been harvested by hackers working counter to U.S. national interests. "The exposure of RNC voter data ups the ante for election security in 2018 and beyond," says Pierson. "In the 2016 election we saw the influence and impact of indirect attacks on the DNC and how this could shape and influence our election."
Other countries, including France, Germany and the Netherlands, have also been targeted by these types of cyber propaganda campaigns (see Au Revoir, Alleged Russian 'Fancy Bear' Hackers).
Pierson says the information amassed by the RNC contractor could potentially be used to influence future U.S. elections. "With such a large data dump of RNC voter data and contact information, a nation state could reverse-engineer an influence attack on those individuals that might be able to affect their voting predisposition or the communications they receive in future elections," he says. "The unique problems attached to a voter database include the fact that the immutable characteristics, location, and age data will be viable points of attack for decades to come."
Executive Editor Mathew Schwartz also contributed to this story.