
AI firm with ties to U.S. government exposes billions of documents in breach

UpGuard finds PII, biometrics among 550GB of data exposed due to configuration error

New research from data security firm UpGuard shows that a U.S. government AI contractor’s massive database of sensitive documents was exposed on the internet until the end of last month. In a blog post, UpGuard details how Veritone AI exposed 550GB of internal and client data, including audio, video and biometric image media, employee PII, police body camera footage, FOIA requests and related documents, employee credentials, system logs with authorization tokens, and more.

The exposed dataset centralized sensitive information about Veritone resources and users, including employees’ full names, usernames, and email addresses. The exposure of government personnel data was of particular concern, and so were the credentials found in the logs. According to UpGuard, “Internal credentials also appear in the exposed logs, such as application tokens and, in some cases, plain text passwords. The unauthorized use of these credentials would grant a threat actor whatever level of access the exposed accounts held, possibly exposing additional sensitive data to a malicious third party.”

At least some of the exposed personal data was being used to train AI systems, which has some observers asking whether AI vendors touting their security bona fides are in fact building honeypots full of vulnerable data.

“What we have become accustomed to call ‘artificial intelligence’ relies on concatenating pieces of an enormous dataset with a complex algorithm and detailed data tagging,” says UpGuard. “Because AI technologies often require massive databases full of whatever information they are analyzing, both the likelihood and impact of a data exposure rapidly increase.” It notes that “a significant portion of the services Veritone provides for government and police agencies involves automatically redacting sensitive information from documents, analyzing facial recognition data (referred to as identifying suspects), and processing audio and video surveillance data to find insights, keywords, and types of images.” It also points out that Veritone provides AI services to a wide array of industries, including law, energy, and entertainment, meaning the potential for data exposure extends well beyond government.

UpGuard discovered Veritone’s first exposed Elasticsearch server, hosted on the Microsoft Azure Government Cloud, on March 23. It contained 464 million documents. The next day, UpGuard discovered a second server containing 1.2 billion documents. According to UpGuard’s post, “these servers did not require or ask for any credentials but rather provided anonymous access to anyone on the internet.”

After being made aware of the exposure, Veritone secured the Elasticsearch servers on March 30. The data is no longer publicly available.

In this case, the fault does not lie with Elasticsearch. The software, an open source search and analytics engine designed to quickly search large datasets, can be configured to require authentication. However, Veritone’s servers were not configured that way, an oversight that undercut other security measures and left the government data exposed. Elastic, the company behind Elasticsearch, has been transparent about the need to configure the software for authentication. A 2020 blog post outlines simple steps users can take to secure their data from breaches.
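An administrator can confirm that a cluster actually demands credentials by sending it one request with none attached. The sketch below is a minimal illustration using only Python’s standard library; it relies on the fact that an Elasticsearch node with security enabled answers unauthenticated requests with HTTP 401, while an open node like the ones UpGuard found returns 200. The function names `classify_status` and `probe_cluster` are hypothetical helpers, not part of any Elastic tool, and the check should only be run against clusters you administer.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def classify_status(status: int) -> str:
    """Map the HTTP status of an unauthenticated request to a verdict."""
    if status == 200:
        # The cluster answered without any credentials: anonymous access.
        return "open"
    if status == 401:
        # Security is on: the server challenged the request for credentials.
        return "credentials required"
    # Anything else (for example 403 or a 5xx error) needs a human to look at it.
    return "inconclusive"


def probe_cluster(base_url: str, timeout: float = 5.0) -> str:
    """Send one credential-free GET to base_url and classify the response.

    Hypothetical helper; only run it against clusters you administer.
    """
    try:
        with urlopen(base_url, timeout=timeout) as response:
            return classify_status(response.status)
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; err.code holds the status.
        return classify_status(err.code)
    except OSError:
        # Connection refused, DNS failure, or timeout: nothing answered.
        return "unreachable"
```

For example, `probe_cluster("http://localhost:9200/")` (9200 is Elasticsearch’s default HTTP port; the host is a placeholder) should return `"credentials required"` for a properly secured node and `"open"` for one configured like Veritone’s.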

In an interview with Axios, UpGuard VP of Cyber Research Greg Pollock says Microsoft is likely also off the hook. “Microsoft is providing the government cloud as a service; they’re probably not involved in the administration of this database,” Pollock says.

UpGuard’s assessment implies that responsibility lies with Veritone for failing to properly configure the Elasticsearch servers, stating that “operational tasks such as spinning up an Elastic server should have controls in place to ensure that the server is not publicly accessible.” If so, Veritone is not the first AI firm to mishandle data. Still, given the volume and sensitivity of Veritone’s information, the breach could have significant implications for how AI training databases are collected, stored and secured.
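One form the “controls” UpGuard describes could take is an automated check that blocks deployment of a node whose settings would expose it. The Python sketch below is a hypothetical pre-deployment guard, not an Elastic-provided tool. It assumes the node’s `elasticsearch.yml` has already been parsed into a flat dictionary, and it looks at two real settings: `network.host`, which defaults to the loopback value `_local_`, and `xpack.security.enabled`. Because the default for `xpack.security.enabled` has differed across Elasticsearch versions, the sketch conservatively treats a missing value as unsafe. It also ignores other settings such as `http.host`, so it is an illustration rather than a complete audit.

```python
# Values of network.host that keep a node reachable only from the same machine.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1", "[::1]", "_local_"}


def _as_list(value) -> list:
    """network.host may be a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def exposure_problems(settings: dict) -> list[str]:
    """Return reasons a node's settings look unsafe to deploy; empty means OK.

    Hypothetical guard; `settings` is a flat dict of elasticsearch.yml keys.
    """
    problems = []
    # Elasticsearch binds to loopback unless network.host says otherwise.
    hosts = [str(h) for h in _as_list(settings.get("network.host", "_local_"))]
    non_loopback = [h for h in hosts if h not in LOOPBACK_HOSTS]
    # Accept a YAML boolean true or the string "true"; anything else,
    # including a missing key, counts as security being off.
    security_on = str(settings.get("xpack.security.enabled")).lower() == "true"
    if non_loopback and not security_on:
        problems.append(
            f"binds to {', '.join(non_loopback)} without xpack.security.enabled: true"
        )
    return problems
```

A CI job could, for instance, load the YAML file, call `exposure_problems`, and fail the deployment whenever the returned list is non-empty, so that a server cannot be spun up reachable and unauthenticated at the same time.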
