Photos of Australian children found in AI training dataset, creating deepfake risk
Personal photos of Australian children are being used to train AI through a dataset that has been built by scraping images from the internet – exposing kids to the risk of private information leaks and their images being used in pornographic deepfakes.
Biometrics researchers have been struggling with how to train algorithms to recognize children, particularly as they age, for instance for investigations of child sexual abuse material, and have turned to synthetic data to avoid potential harm to real data subjects.
The images of the children were collected without the knowledge or consent of their families and used to build the Laion-5B dataset, according to findings from human rights organization Human Rights Watch (HRW). The photos were then used by popular generative AI services such as Stability AI and Midjourney, The Guardian reports.
HRW claims that AI tools trained on the dataset were later used to create synthetic images that could be categorized as child pornography.
The dataset was created by the German nonprofit open AI organization Laion. The photos were collected from personal blogs, video and photo-sharing sites, school websites and photographers’ collections of family portraits. Some were uploaded decades before the Laion-5B dataset was created, and many were not publicly available.
Human Rights Watch has so far found 190 photos of children from Australia, but this is likely only the tip of the iceberg. The dataset contains 5.85 billion images and captions, and the organization has reviewed less than 0.0001 percent of it. Some photos were listed with the children’s names and other information, making their identities traceable.
Laion has confirmed that the dataset contained the children’s photos found by Human Rights Watch and pledged to remove them. The nonprofit also said that children and their guardians were responsible for removing children’s personal photos from the internet.
“LAION datasets are just a collection of links to images available on public internet. Removing links from LAION datasets DOES NOT result in removal of actual original images hosted by the responsible third parties on public internet,” the organization told The Guardian.
HRW’s children’s rights and technology researcher Hye Jung Han called on the Australian government to urgently adopt laws to protect children’s data from “AI-fueled misuse.” Australia is currently preparing to amend its Privacy Act, including drafting the Children’s Online Privacy Code.
“Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable,” says Han.