
[P] Deezer showed CNN detection fails on compressed audio. Here's a dual-engine approach that survives MP3

I've been working on detecting AI-generated music and ran into the same wall that Deezer's team documented in their paper: CNN-based detection on mel-spectrograms breaks when audio is compressed to MP3.

The problem: A ResNet18 trained on mel-spectrograms works well on WAV files, but real-world music is distributed as MP3/AAC. Compression destroys the subtle spectral artifacts the CNN relies on.

What actually worked: Instead of trying to make the CNN more robust, I added a second engine based on source separation (Demucs). The idea is simple:

  1. Separate a track into 4 stems (vocals, drums, bass, other)
  2. Re-mix them back together
  3. Measure the difference between original and reconstructed audio
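
A minimal sketch of steps 2 and 3, under stated assumptions. The post doesn't say which difference metric is used, so the residual-energy ratio in dB here is an assumption, as are all the function names. The separator is passed in as a plain callable (hypothetical signature) so the sketch stays independent of the Demucs API, and waveforms are plain lists of mono samples to keep it dependency-free.

```python
import math
from typing import Callable, Sequence

Waveform = Sequence[float]
# Hypothetical separator signature: one waveform in, a list of stem waveforms out.
Separator = Callable[[Waveform], Sequence[Waveform]]


def remix(stems: Sequence[Waveform]) -> list[float]:
    """Step 2: sum the stems sample by sample to rebuild the mix."""
    return [sum(samples) for samples in zip(*stems)]


def reconstruction_error_db(original: Waveform, reconstructed: Waveform) -> float:
    """Step 3: residual energy relative to the original, in dB.

    More negative means the re-mix is closer to the original.
    The choice of metric is an assumption, not taken from the post.
    """
    signal = sum(x * x for x in original)
    residual = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    eps = 1e-12  # avoids log(0) for silent or perfectly reconstructed audio
    return 10.0 * math.log10((residual + eps) / (signal + eps))


def reconstruction_score(original: Waveform, separate: Separator) -> float:
    """Steps 1-3 end to end: separate, re-mix, compare."""
    stems = separate(original)
    return reconstruction_error_db(original, remix(stems))
```

With a real separator plugged in as `separate`, a very negative score would correspond to the "nearly identical" case and a score closer to 0 dB to the "noticeable differences" case. Where to put the decision threshold is left open here.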

For human-recorded music, stems bleed into each other during recording (room acoustics, mic crosstalk, etc.), so separation + reconstruction produces noticeable differences. For AI music, each stem is synthesized independently, so separation and reconstruction yield nearly identical results.

Results:

  • Human false positive rate: ~1.1%
  • AI detection rate: 80%+
  • Works regardless of audio codec (MP3, AAC, OGG)

The CNN handles the easy cases (high-confidence predictions), and the reconstruction engine only kicks in when the CNN is uncertain. This saves compute, since source separation is expensive.
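
The gating logic can be sketched roughly like this. The thresholds `low` and `high` are placeholder values rather than the ones actually used, and `Verdict`, `classify`, and `reconstruction_check` are hypothetical names. The expensive check is passed as a callable so it only runs when the CNN probability falls in the uncertain band.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    is_ai: bool
    engine: str  # which engine made the call: "cnn" or "reconstruction"


def classify(
    cnn_prob_ai: float,
    reconstruction_check: Callable[[], bool],
    low: float = 0.2,   # assumed threshold, not from the post
    high: float = 0.8,  # assumed threshold, not from the post
) -> Verdict:
    """Trust the CNN when it is confident, otherwise fall back."""
    if cnn_prob_ai >= high:
        return Verdict(is_ai=True, engine="cnn")
    if cnn_prob_ai <= low:
        return Verdict(is_ai=False, engine="cnn")
    # Uncertain band: only now pay for source separation.
    return Verdict(is_ai=reconstruction_check(), engine="reconstruction")
```

Because `reconstruction_check` is only called inside the uncertain band, confident CNN predictions never trigger source separation.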

Limitations:

  • Detection rate varies across different AI generators
  • Demucs is non-deterministic, so borderline cases can flip between runs
  • Only tested on music, not speech or sound effects
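
On the non-determinism point, one way to at least see how unstable a borderline track is would be to repeat the reconstruction check a few times and look at the agreement. This is a sketch of that idea, not something the current pipeline does. `vote_over_runs` and `run_once` are hypothetical names, and the default of 5 runs is arbitrary.

```python
from collections import Counter
from typing import Callable


def vote_over_runs(run_once: Callable[[], bool], n_runs: int = 5) -> tuple[bool, float]:
    """Majority vote over repeated runs, plus the share of runs that agreed.

    Use an odd n_runs so a boolean vote cannot tie.
    """
    votes = Counter(run_once() for _ in range(n_runs))
    label, count = votes.most_common(1)[0]
    return label, count / n_runs
```

An agreement well below 1.0 marks a track as borderline. The trade-off is that every extra run costs another full source separation.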

Curious if anyone has explored similar hybrid approaches, or has ideas for making the reconstruction analysis more robust.

submitted by /u/Leather_Lobster_2558