[GSoC-Genre] Initial DB schema for multi-genre support #14898

sdv0001 · 2025-06-04T10:16:24Z

This PR introduces the foundational database schema changes for the GSoC 2025 Multi-Genre Support project.

This PR modifies res/schema.xml

Changes Details: Phase 1: Core Database & Logic

1.1 Database Schema Definition & Initial Migration
Establish the essential data structure for multiple and hierarchical genres, and ensure that existing data is migrated.

1.1.1 Create a new genres table with the following columns:
* id (INTEGER PRIMARY KEY AUTOINCREMENT)
* name (TEXT NOT NULL COLLATE NOCASE, UNIQUE)
* parent_id (INTEGER DEFAULT NULL, FOREIGN KEY to genres.id ON DELETE SET NULL ON UPDATE CASCADE) - For hierarchical structure.
* is_active (INTEGER DEFAULT 1) - To allow "disabling" genres.
* display_order (INTEGER DEFAULT 0) - For custom sorting of sibling genres.
* original_name (TEXT DEFAULT NULL) - To store the original name from a source if the user renames it.
* notes (TEXT DEFAULT NULL) - For optional user notes.

1.1.2 Create a new genre_tracks junction table (renamed from track_genres for consistency) with:
* track_id (INTEGER NOT NULL, FOREIGN KEY to library.id ON DELETE CASCADE)
* genre_id (INTEGER NOT NULL, FOREIGN KEY to genres.id ON DELETE CASCADE)
* Composite PRIMARY KEY (track_id, genre_id).

1.1.3 Implement necessary indexes on the new tables for performance and uniqueness
(idx_genres_name_unique, idx_genres_parent_id, idx_genre_tracks_track_id, idx_genre_tracks_genre_id).

1.1.4 Increment the database schema version to 40
const int MixxxDb::kRequiredSchemaVersion = 40;

1.1.5 Implement a basic migration script to:
* Populate the genres table with unique, trimmed genre names from the existing library.genre column (as top-level, active genres, with original_name set).
* Populate the genre_tracks table by linking library.id to the corresponding new genres.id based on a case-insensitive name match.

1.1.6 Locally test schema creation (empty DB) and migration (existing DB)
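
The schema from 1.1.1 to 1.1.3 and the migration from 1.1.5 can be tried outside Mixxx with a rough sketch like the one below. It uses Python's built-in sqlite3 module instead of res/schema.xml, so it is not the code in this PR. The two-column library table is only a stand-in for the real Mixxx table, the names create_schema and migrate are made up for illustration, and each library.genre value is assumed to hold a single genre name (no splitting on separators).

```python
import sqlite3

# Stand-in for the real Mixxx library table (assumption: only id and genre matter here).
LIBRARY_SQL = "CREATE TABLE library (id INTEGER PRIMARY KEY, genre TEXT)"

SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS genres (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL COLLATE NOCASE,
    parent_id INTEGER DEFAULT NULL
        REFERENCES genres(id) ON DELETE SET NULL ON UPDATE CASCADE,
    is_active INTEGER DEFAULT 1,
    display_order INTEGER DEFAULT 0,
    original_name TEXT DEFAULT NULL,
    notes TEXT DEFAULT NULL
);
CREATE TABLE IF NOT EXISTS genre_tracks (
    track_id INTEGER NOT NULL REFERENCES library(id) ON DELETE CASCADE,
    genre_id INTEGER NOT NULL REFERENCES genres(id) ON DELETE CASCADE,
    PRIMARY KEY (track_id, genre_id)
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_genres_name_unique ON genres(name);
CREATE INDEX IF NOT EXISTS idx_genres_parent_id ON genres(parent_id);
CREATE INDEX IF NOT EXISTS idx_genre_tracks_track_id ON genre_tracks(track_id);
CREATE INDEX IF NOT EXISTS idx_genre_tracks_genre_id ON genre_tracks(genre_id);
"""


def create_schema(conn):
    # SQLite only enforces foreign keys when this pragma is on.
    # It must run while no transaction is open.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA_SQL)


def migrate(conn):
    # One top-level genre per distinct trimmed name, grouped case-insensitively.
    # MIN() picks a deterministic spelling when names differ only in case.
    conn.execute("""
        INSERT INTO genres (name, original_name)
        SELECT MIN(TRIM(genre)), MIN(TRIM(genre))
        FROM library
        WHERE TRIM(genre) <> ''
        GROUP BY TRIM(genre) COLLATE NOCASE
    """)
    # Link every track to its genre through a case-insensitive name match.
    conn.execute("""
        INSERT INTO genre_tracks (track_id, genre_id)
        SELECT l.id, g.id
        FROM library AS l
        JOIN genres AS g ON g.name = TRIM(l.genre) COLLATE NOCASE
    """)
    conn.commit()


# Example run against an in-memory database with made-up tracks.
conn = sqlite3.connect(":memory:")
conn.execute(LIBRARY_SQL)
conn.executemany(
    "INSERT INTO library (id, genre) VALUES (?, ?)",
    [(1, " House "), (2, "house"), (3, "Techno"), (4, None), (5, "")],
)
conn.commit()  # close the implicit transaction before enabling foreign keys
create_schema(conn)
migrate(conn)
```

In this run, tracks 1 and 2 end up linked to the same genre row, while tracks 4 and 5 (NULL and empty genre) get no link at all.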

Context and Purpose:

These changes are the first step as outlined in the GSoC project proposal. They lay the database groundwork required for all future multi-genre functionalities, including hierarchical management, improved UI, and enhanced metadata handling.

This PR addresses Task 1.1 in the main GSoC tracking issue

Testing Done:

Verified successful schema creation from an empty database.
Verified successful data migration from an existing database populated with various single-genre scenarios.
Database structure and migrated data inspected with DB Browser for SQLite.

This PR focuses solely on the database schema and initial data migration. Subsequent PRs will address DAO changes, UI implementation, and other features.

Swiftb0y

A couple of thoughts. Looks pretty reasonable so far.

res/schema.xml

daschuer · 2025-06-04T15:21:03Z

res/schema.xml

+        id INTEGER PRIMARY KEY AUTOINCREMENT,
+        name TEXT NOT NULL COLLATE NOCASE,
+        parent_id INTEGER DEFAULT NULL,
+        is_active INTEGER DEFAULT 1,

"is_active": Maybe we can be more specific. "Active" is probably not the best-fitting word:
https://dictionary.cambridge.org/dictionary/english/active

As I understand it, we want to hide certain genres from the suggestions when selecting one.
We probably want to hide them from the genre tree as well, but we can't if the genre is used by even a single track.

So I think first of all we need a flag for the suggestion box, "for_tagging".
Do we need other degrees or aspects of active?

Your understanding of the feature's goal is spot on.
My intention was to allow users to "hide" or "prune" genres from the main genre tree in preferences and from suggestion lists (like autocompletion when tagging).

You also raised a crucial constraint: a genre cannot be completely hidden if it's currently in use by any track. I also agree that we need a more specific name for the column.

"for_tagging" is a good suggestion. Thinking about it, though, the flag controls more than just tagging suggestions.
What do you think of one of these alternatives?

is_visible (INTEGER DEFAULT 1): This seems clear and concise. A genre is either visible in UI helpers (trees, suggestion lists) or not.

is_hidden (INTEGER DEFAULT 0): This is the inverse, but can also be very clear. A genre is not hidden by default.
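
To make the "cannot be hidden while in use" constraint concrete, here is a small sketch of how the two UI helpers could query such a flag. It assumes the is_visible variant, uses Python's sqlite3 module with trimmed-down tables, and the function names and sample genres are made up. It is only one possible reading of the rule: suggestions respect the flag strictly, while the tree still shows a hidden genre as long as a track uses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL COLLATE NOCASE,
    is_visible INTEGER DEFAULT 1
);
CREATE TABLE genre_tracks (
    track_id INTEGER NOT NULL,
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (track_id, genre_id)
);
INSERT INTO genres (id, name, is_visible) VALUES
    (1, 'House', 1),
    (2, 'Schlager', 0),
    (3, 'Polka', 0);
-- Track 101 still uses the hidden genre 'Schlager'; 'Polka' is unused.
INSERT INTO genre_tracks (track_id, genre_id) VALUES (100, 1), (101, 2);
""")


def suggestion_names(conn):
    # Tagging suggestions: only genres the user has not hidden.
    rows = conn.execute(
        "SELECT name FROM genres WHERE is_visible = 1 ORDER BY name"
    )
    return [name for (name,) in rows]


def tree_names(conn):
    # Genre tree: a hidden genre stays listed while at least one track uses it.
    rows = conn.execute("""
        SELECT g.name
        FROM genres AS g
        WHERE g.is_visible = 1
           OR EXISTS (SELECT 1 FROM genre_tracks AS gt WHERE gt.genre_id = g.id)
        ORDER BY g.name
    """)
    return [name for (name,) in rows]
```

With this data, suggestion_names(conn) lists only House, and tree_names(conn) lists House and Schlager but not Polka.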

daschuer · 2025-06-04T15:39:15Z

res/schema.xml

+      CREATE TABLE IF NOT EXISTS genres (
+        id INTEGER PRIMARY KEY AUTOINCREMENT,
+        name TEXT NOT NULL COLLATE NOCASE,
+        parent_id INTEGER DEFAULT NULL,


In the original proposal you mentioned an "n to n" relationship. We have also discussed that we need a tree view with a strict single-parent relationship for sorting in playlists. So the parent_id is good.

But what do we do with "Electro swing"? The user has to decide whether it goes under "Swing" or under "EDM". https://en.wikipedia.org/wiki/Electro_swing. My expectation is that I find "Caro Emerald - A Night Like This" in the genre tree under both Swing and EDM.

So do we want to allow a second parent, like for git merge commits? We could make use of "merge genres" to create more parents if this is really necessary. The alternative would be an association table to describe the parent/child relationship.
Feels a bit overdone...

> But what do we do with "Electro swing"? The user has to decide whether it goes under "Swing" or under "EDM".

That's up to the user; that's why we want to be able to have 'multiple genres' in the genre field of a track.

There will also be tracks that fit, e.g., the 'Acid Jazz' genre, which can be seen as a subgenre of Electronic - Deep House or as a subgenre of Jazz.

So we need to allow a track to be in different branches of the tree.

So we don't want to have a genre in different branches of the tree? From the user's perspective it would be convenient if all Electro Swing tracks appeared automatically under Swing and EDM without editing anything.
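
For comparison, the association-table alternative mentioned above could look roughly like this. It is a sketch only, written with Python's sqlite3 module: the genre_parents table, the tracks_under function and the sample IDs are hypothetical and not part of this PR. A genre may have several parents, and a recursive query walks down from a genre to all of its descendants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL COLLATE NOCASE
);
-- Hypothetical association table: one row per parent/child edge.
CREATE TABLE genre_parents (
    child_id INTEGER NOT NULL REFERENCES genres(id) ON DELETE CASCADE,
    parent_id INTEGER NOT NULL REFERENCES genres(id) ON DELETE CASCADE,
    PRIMARY KEY (child_id, parent_id)
);
CREATE TABLE genre_tracks (
    track_id INTEGER NOT NULL,
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (track_id, genre_id)
);
INSERT INTO genres (id, name) VALUES (1, 'EDM'), (2, 'Swing'), (3, 'Electro Swing');
-- 'Electro Swing' has two parents.
INSERT INTO genre_parents (child_id, parent_id) VALUES (3, 1), (3, 2);
-- The track is tagged once, with 'Electro Swing' only.
INSERT INTO genre_tracks (track_id, genre_id) VALUES (100, 3);
""")


def tracks_under(conn, genre_id):
    # Collect the genre and all of its descendants, then the tracks tagged with any of them.
    # UNION (not UNION ALL) drops duplicates, so shared descendants are visited once.
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT ?
            UNION
            SELECT gp.child_id
            FROM genre_parents AS gp
            JOIN subtree AS s ON gp.parent_id = s.id
        )
        SELECT DISTINCT track_id
        FROM genre_tracks
        WHERE genre_id IN (SELECT id FROM subtree)
        ORDER BY track_id
    """, (genre_id,))
    return [track_id for (track_id,) in rows]
```

Here track 100 shows up under both EDM and Swing without any extra tagging, at the cost of an extra table and a tree that is no longer a strict tree.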

Strict Tree Hierarchy with Multi-Tagging: a genre can only have one parent (using the parent_id model), but a track can be tagged with multiple genres from different branches of the tree via the genre_tracks junction table.

The user would place "Electro Swing" under its primary parent (e.g., "Swing"). To have a track also appear under "Electronic", the user would simply tag the track with both the "Electro Swing" genre and the "Electronic" genre. The search functionality (a later task) will then find this track under both categories and their respective parent genres.

| id | name | parent_id |
|----|------|-----------|
| 10 | Electronic | NULL |
| 30 | Swing | NULL |
| 31 | Electro Swing | 30 |

The flexibility comes from the genre_tracks junction table, which holds two separate entries for that one track:

(track_id, 31) <--- Links the track to the "Electro Swing" genre ID

(track_id, 10) <--- Links the track to the "Electronic" genre ID
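
As a quick illustration of the strict-tree model, the sketch below loads the three example genres, tags one track with both 31 and 10, and looks up the tracks under a genre including its subgenres. It uses Python's sqlite3 module; the tables are trimmed to the relevant columns, and the tracks_under function and the track IDs are made up. The real search is a later task and may work differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL COLLATE NOCASE,
    parent_id INTEGER DEFAULT NULL REFERENCES genres(id)
);
CREATE TABLE genre_tracks (
    track_id INTEGER NOT NULL,
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (track_id, genre_id)
);
INSERT INTO genres (id, name, parent_id) VALUES
    (10, 'Electronic', NULL),
    (30, 'Swing', NULL),
    (31, 'Electro Swing', 30);
-- Track 1 is tagged with 'Electro Swing' and 'Electronic'; track 2 only with 'Swing'.
INSERT INTO genre_tracks (track_id, genre_id) VALUES (1, 31), (1, 10), (2, 30);
""")


def tracks_under(conn, genre_id):
    # Walk down the single-parent tree from genre_id, then collect tagged tracks.
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM genres WHERE id = ?
            UNION ALL
            SELECT g.id
            FROM genres AS g
            JOIN subtree AS s ON g.parent_id = s.id
        )
        SELECT DISTINCT track_id
        FROM genre_tracks
        WHERE genre_id IN (SELECT id FROM subtree)
        ORDER BY track_id
    """, (genre_id,))
    return [track_id for (track_id,) in rows]
```

Track 1 is found under Swing through its "Electro Swing" tag and under Electronic through its explicit second tag.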

WDYT?

daschuer · 2025-06-04T15:44:21Z

res/schema.xml

+        name TEXT NOT NULL COLLATE NOCASE,
+        parent_id INTEGER DEFAULT NULL,
+        is_active INTEGER DEFAULT 1,
+        display_order INTEGER DEFAULT 0,

"display_order" is associated with "by artist" or such. Is it a "sort_id"?

It's a number used to build the custom tree; otherwise the tree (in preferences) can only be built by ID or alphabetically by name.
I can imagine the user wants to put the most-used genres higher in the tree than less-used genres.

Is this part of your sorting proposal with a number for each level?

> Is this part of your sorting proposal with a number for each level?

More or less: it's to save the user's chosen custom view.

Ah, a simple number makes sense, populated as soon as the user changes the order. The only issue is that the word "order" does not convey that 100%, IMHO.

In this case, it is a number that is used for sorting. Entries with equal numbers are sorted by name. So I suggest making this clearer with something like "sorting_number", "sort_key" or "display_order_index".
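
A minimal sketch of that ordering rule, using Python's sqlite3 module: siblings are sorted by the number first and by name second. The column name sort_key is just one of the suggestions above, and the function name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL COLLATE NOCASE,
    parent_id INTEGER DEFAULT NULL REFERENCES genres(id),
    sort_key INTEGER DEFAULT 0
);
INSERT INTO genres (id, name, parent_id, sort_key) VALUES
    (1, 'Swing', NULL, 1),
    (2, 'Techno', NULL, 2),
    (3, 'house', NULL, 2),
    (4, 'Ambient', NULL, 2),
    (5, 'Electro Swing', 1, 0);
""")


def siblings_in_order(conn, parent_id=None):
    # "IS ?" also matches NULL, so parent_id=None selects the top-level genres.
    # Ties on sort_key fall back to the name, compared with the column's NOCASE collation.
    rows = conn.execute(
        "SELECT name FROM genres WHERE parent_id IS ? ORDER BY sort_key, name",
        (parent_id,),
    )
    return [name for (name,) in rows]
```

For the top level this yields Swing, Ambient, house, Techno: Swing comes first because of its lower number, and the three genres sharing the number 2 are ordered by name regardless of case.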

daschuer · 2025-06-04T15:54:05Z

res/schema.xml

+        parent_id INTEGER DEFAULT NULL,
+        is_active INTEGER DEFAULT 1,
+        display_order INTEGER DEFAULT 0,
+        original_name TEXT DEFAULT NULL,

"original_name" is a good idea to track the relationship to the default model, right? Is this the "default_name", so the user can select "back to default"? Or is this a kind of unique ID, like a "programmatic_name"?
Or is this the en_US version in case the user wants to translate the genre?
Or do we even want to allow the default genres to be translated via Transifex?

original name = the name in e.g. MusicBrainz / Discogs.
The name field contains the custom name.
ATM I wouldn't translate via Transifex, because then we get a custom name from a translator. Let's start with a custom name = "name" and "original name".

The translation is only a display issue, not a database issue. We just need to consider how we pass the untranslated genres to tr() so that they are picked up.

When reading this, I just stumbled over "original". Maybe we can find something more precise. If a user adds a custom genre, this field is empty, right?

> Maybe we can find something more precise. If a user adds a custom genre, this field is empty, right?

Correct. referenced_genre?

This sounds like a link to another genre. Can the user edit it, or is it a read-only value from our Mixxx genre tree?
Do we even need that value, or can it be hard-coded in C++? This would solve the tr() issue.
Here we need only a unique ID as a pointer into the Mixxx tree.

The other question is which data input we want to distinguish.

We have agreed on a built-in genre tree that the user may use as a default. This default is not editable as instructions in our GitHub repository.

Then we have the editable database. This can be filled from the different sources mentioned above, including from the built-in genre tree. The question is if and how we want to trace the genre source and how to populate the two genre name fields. I have tried above to brainstorm scenarios as a basis for discussion. Probably confusing.

How about this idea:

Have a built-in, "hard-coded" table of genres.

The table has a unique ID, an English name with tr(), a list of alias names, and a brief description of the genre, also with tr().

The genre table in the database has a name and also the unique ID. If the name is empty, the translated content from the built-in table is used.

All data from external services is treated as user data.
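
To illustrate the fallback rule, here is a rough sketch in Python with sqlite3. Everything in it is hypothetical: the BUILTIN_GENRES dictionary stands in for the hard-coded C++ table, the IDs, alias and descriptions are placeholders, builtin_id is a made-up column name, and tr() is a stub for Qt's translation function that returns its input unchanged.

```python
import sqlite3


def tr(text):
    # Stub for Qt's tr(); a real build would return the translated string.
    return text


# Hypothetical built-in table: unique ID -> English name, aliases, description.
BUILTIN_GENRES = {
    "swing": {
        "name": "Swing",
        "aliases": [],
        "description": "Placeholder description.",
    },
    "electro_swing": {
        "name": "Electro Swing",
        "aliases": ["Elektro Swing"],
        "description": "Placeholder description.",
    },
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    builtin_id TEXT DEFAULT NULL,  -- pointer into the built-in table, NULL for user genres
    name TEXT DEFAULT NULL         -- user-supplied name, NULL or empty if not set
);
INSERT INTO genres (id, builtin_id, name) VALUES
    (1, 'electro_swing', NULL),
    (2, 'swing', 'Big Band Swing'),
    (3, NULL, 'My Custom Genre');
""")


def display_name(builtin_id, name):
    # A non-empty database name wins; otherwise fall back to the built-in name.
    if name:
        return name
    if builtin_id in BUILTIN_GENRES:
        return tr(BUILTIN_GENRES[builtin_id]["name"])
    return ""


def display_names(conn):
    rows = conn.execute("SELECT builtin_id, name FROM genres ORDER BY id")
    return [display_name(builtin_id, name) for builtin_id, name in rows]
```

Row 1 has no name in the database and is shown with the built-in "Electro Swing", row 2 keeps its user-supplied name although it points at a built-in entry, and row 3 is pure user data.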

To be clear:
If a user chooses to follow the genres as offered by an external source (discogs, musicbrainz) and wants to automatically import the genre for a track from that source in the library (and in the track metadata), the complete genre tree from that source must be in the database.

If a user chooses to use another name for a genre/subgenre... than used by the source, the user needs to create a genre link: original name (= source name) linked to name (custom name).

If a user wants to group different genres from the source to a custom genre they all need to have a genre link.

If a user doesn't want to use subgenres as offered by the source but instead wants them to be grouped in their parent (main or sub), the user just 'disables' the subgenres (they need to have the names used by the source in custom name or original name).

All this was my complex number 3, in which I proposed to let the user choose a template to follow (template = genre tree from an external source).
As that could be complex, we can choose to add different source trees in the 'default' table; then we need a source-model field. In the preferences we can let the user choose which model they'd like to use.

> If a user chooses to follow the genres as offered by an external source (discogs, musicbrainz) and wants to automatically import the genre for a track from that source in the library (and in the track metadata), the complete genre tree from that source must be in the database.

They may introduce new genres after Mixxx has been released. That's why I came up with the idea of importing genres when they appear for tracks (in addition to the Mixxx genre tree).

> If a user chooses to use another name for a genre/subgenre... than used by the source, the user needs to create a genre link: original name (= source name) linked to name (custom name).

Ah OK, I did not understand it at first. So the original_name is actually a reference to another genre with "is_active = 0"? Or is it a link to the hard-coded genre tree?
My first understanding was that a user is allowed to rename a genre via "name" and the "original_name" just keeps track of the old name. This is wrong, correct?

Can you please outline the steps that happen with the group genre scenario you brought up above?

Another reason why I suggested the template import is the experimental nature of our users.
If the genre tree is 'hardcoded' and a user starts messing around with it (deleting genres in a DB tool, linking genres to other parents...), questions will arise: 'how can I get the default genre tree back?'
As there will be other valuable info in the database, we won't be able to tell them 'delete the database' or 'reinstall Mixxx'...
So I have to return to my original idea: importing genre models (discogs, musicbrainz, Mixxx...).
Exporting the tree as a model will be needed too, because users will love to share their custom genre tree with others (especially when they start adding custom definitions for moods, energy...).
I lost focus on that.

> My first understanding was that a user is allowed to rename a genre via "name" and the "original_name" just keeps track of the old name. This is wrong, correct?

In my original idea the user could indeed enter a custom name for the genre in the (external) model (linking external source to custom genres), but with all the input and emerging questions I need to reread my notes and re-check all questions.
I think my original analysis was correct, but we lost each other in communication over the last few days.

daschuer

This already looks good. I have added some thoughts.
Thank you also for the really good PR description.
I am often too lazy to do the same.

daschuer · 2025-06-04T16:42:55Z

res/schema.xml

+    <sql>
+      CREATE TABLE IF NOT EXISTS genres (
+        id INTEGER PRIMARY KEY AUTOINCREMENT,
+        name TEXT NOT NULL COLLATE NOCASE,


Isn't this value null by default? I can imagine that this field contains only the user override name.

daschuer · 2025-06-04T21:12:45Z

Regarding the unique ID: https://www.wikidata.org/wiki/Q3990466 collects unique genre IDs of different services.
I think we may consider adopting one. Such an ID will make the database immune to subtle name changes that might be introduced in future Mixxx default trees.

sdv0001 mentioned this pull request Jun 4, 2025

[GSoC 2025] Multi-Genre Support Implementation (Antonio Giordano) #14897

Open

Swiftb0y reviewed Jun 4, 2025


[GSoC-Genre] Initial DB schema for multi-genre support

f8c09b7

sdv0001 force-pushed the genre-db-schema branch from 94fcb47 to f8c09b7 Compare June 4, 2025 13:33

Swiftb0y requested a review from daschuer June 4, 2025 14:15

daschuer reviewed Jun 4, 2025
