VTuber Songs Confirmed in Large AI Training Datasets
The inclusion of VTuber music in these datasets means the vocal styles and catalogues of many popular talents can be directly synthesized by AI without their consent, raising questions about copyright and artist control in the VTuber industry.
Reporting from 1 source: VTuber NewsDrop.
An investigative report by The Atlantic, confirmed by VTuber NewsDrop, has found that songs from corporate and indie VTubers, including hololive and NIJISANJI talents, Kizuna Ai, and Ironmouse, are included in two large AI training datasets, LAION-DISCO-12M and Sleeping-DISCO-9M. The datasets, totaling over 22 million tracks from YouTube, are used to train AI music generators like Suno and Udio.
The Atlantic's Alex Reisner reported on four giant datasets shared among AI developers, two of which (LAION-DISCO-12M and Sleeping-DISCO-9M) contain VTuber music. VTuber NewsDrop confirmed that hololive and NIJISANJI talents appear, alongside Kizuna Ai, HIMEHINA, and indie VTubers like Ironmouse, Isaa Corva, and Bao The Whale. Producers such as Alohaii, PrettyPatterns, and TeddyLoid also have tracks in the datasets. HIMEHINA, Nanahira, Mori Calliope, AmaLee, AZKi, KAF, Derivakat, Hoshimachi Suisei, and Akuma Nihmune each have over 100 tagged results, meaning most of their song catalogues are vulnerable to AI synthesis. LAION-DISCO-12M was built by the German non-profit LAION, which received funding from Hugging Face and Stability AI's co-founder. Sleeping-DISCO-9M was assembled by the Sleeping AI collective. AI companies Suno and Udio use these datasets to generate tracks mimicking specific genres, vocal techniques, or artist sounds.
Synthesized by Yomimono from the 1 cited source below, including Japanese-language reporting where cited, then editorially reviewed before publishing.
Sources
- VTuber NewsDrop: VTuber Songs Tagged in Large AI Music Training Datasets