Private medical record photos spotted in biometrics training dataset
Medical record photos are private — but that may not stop them from showing up in datasets used to train artificial intelligence (AI) and biometric systems, according to a story on Ars Technica.
A California artist who works with AI was shocked to discover that LAION-5B, a dataset scraped from publicly available images on the web, contained two post-op medical photos of her taken nearly a decade ago. The artist, who calls herself Lapine, said the photos were shot following procedures to treat dyskeratosis congenita, a genetic disorder that inhibits blood cell production in the bone marrow.
A signed release Lapine posted on Twitter clearly shows she did not consent to the photos being used anywhere outside her medical record. The surgeon who took the pictures died in 2018. How they got into LAION-5B is anyone’s guess. But one thing is certain: they are not the only sensitive biometric data in there. Ars Technica conducted a search to confirm that Lapine’s photos were indeed present in LAION-5B, and discovered “thousands of similar patient medical record photos in the data set, each of which may have a similar questionable ethical or legal status.” Furthermore, many of these may already have been integrated into commercial AI image synthesis services and used to train facial recognition algorithms.
LAION is a non-profit organization “aiming to make large-scale machine learning models, datasets and related code available to the general public.” Its datasets are composed of lists of URLs pointing to the original images; LAION does not actually host the images itself. Its website does offer brief instructions on how EU citizens can request takedowns in specific scenarios (e.g., when an image and a name are linked), but when Lapine posted a question about her problem to LAION’s Discord server, an engineer from the organization suggested she ask for the photos to be taken down at the source, the implication being that it was not LAION’s fault her pictures were out there to be scraped.
Lapine, for her part, still wants her photos removed from LAION-5B and has paused her work with AI for now, citing ethical concerns about what, or who, might end up in it. “Just because they scraped it from the web doesn’t mean it was supposed to be public information,” she says. “Or even on the web at all.”
The discovery comes weeks after AlgorithmWatch found that a facial recognition dataset of trans people remained available online for several years after controversy first erupted over its existence.