[GSoC-Genre] Initial DB schema for multi-genre support #14898
A couple of thoughts. Looks pretty reasonable so far.
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL COLLATE NOCASE,
parent_id INTEGER DEFAULT NULL,
is_active INTEGER DEFAULT 1,
"is_active": maybe we can be more specific. "Active" is probably not the best-fitting word:
https://dictionary.cambridge.org/dictionary/english/active
As I understand it, we want to hide certain genres from the suggestions when selecting one.
We probably want to hide them from the genre tree as well, but we can't if the genre is used by even a single track.
So I think first of all we need a flag for the suggestion box, "for_tagging".
Do we need other degrees or aspects of active?
Your understanding of the feature's goal is spot on.
My intention was to allow users to "hide" or "prune" genres from the main genre tree in preferences and from suggestion lists (like autocompletion when tagging).
You also raised a crucial constraint: a genre cannot be completely hidden if it's currently in use by any track. I also agree that we need a more specific name for the column.
"for_tagging" is a good suggestion. Thinking about it, though, the flag controls more than just tagging suggestions.
What do you think of one of these alternatives?
- is_visible (INTEGER DEFAULT 1): This seems clear and concise. A genre is either visible in UI helpers (trees, suggestion lists) or not.
- is_hidden (INTEGER DEFAULT 0): This is the inverse, but can also be very clear. A genre is not hidden by default.
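To make the two use cases concrete, here is a minimal sketch of how such a flag could be queried. It assumes the is_visible naming and uses Python's sqlite3 module with an in-memory database; the helper names suggestion_names and tree_names, the sample genres, and the track IDs are hypothetical and not part of this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL COLLATE NOCASE,
    parent_id INTEGER DEFAULT NULL,
    is_visible INTEGER DEFAULT 1  -- candidate column name from this thread
);
CREATE TABLE genre_tracks (
    track_id INTEGER NOT NULL,
    genre_id INTEGER NOT NULL,
    PRIMARY KEY (track_id, genre_id)
);
-- Illustrative rows only.
INSERT INTO genres (id, name, is_visible) VALUES
    (1, 'Jazz', 1),
    (2, 'Schlager', 0),  -- hidden, but still used by a track
    (3, 'Polka', 0);     -- hidden and unused
INSERT INTO genre_tracks (track_id, genre_id) VALUES (100, 1), (101, 2);
""")


def suggestion_names(conn):
    """Genres offered in the tagging suggestion box: visible ones only."""
    rows = conn.execute(
        "SELECT name FROM genres WHERE is_visible = 1 ORDER BY name")
    return [r[0] for r in rows]


def tree_names(conn):
    """Genres shown in the tree: visible ones plus hidden ones still in use."""
    rows = conn.execute("""
        SELECT name FROM genres g
        WHERE g.is_visible = 1
           OR EXISTS (SELECT 1 FROM genre_tracks gt WHERE gt.genre_id = g.id)
        ORDER BY name
    """)
    return [r[0] for r in rows]


suggestions = suggestion_names(conn)  # ['Jazz']
tree = tree_names(conn)               # ['Jazz', 'Schlager']
```

With this split, a hidden genre disappears from suggestions immediately but stays in the tree for as long as a track references it, which matches the constraint raised above.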
CREATE TABLE IF NOT EXISTS genres (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL COLLATE NOCASE,
parent_id INTEGER DEFAULT NULL,
In the original proposal you mentioned an "n to n" relationship. We have also discussed that we need a tree view with a strict single-parent relationship for sorting in playlists. So the parent_id is good.
But what do we do with "Elektro swing"? The user has to decide if it is under "Swing" or under "EDM". https://en.wikipedia.org/wiki/Electro_swing. My expectation is that I find "Caro Emerald - A Night Like This" in a genre tree under both Swing and EDM.
So do we want to allow a second parent, like for git merge commits? We can make use of merge genres to create more parents if this is really necessary. The alternative would be an association table to describe the parent/child relationship.
Feels a bit overdone...
> But what do we do with "Elektro swing"? The user has to decide if it is under "Swing" or under "EDM".

That's up to the user; that's why we want to be able to have multiple genres in the genre field of a track.
There will also be tracks fitting, e.g., the 'Acid Jazz' genre, which can be seen as a subgenre of Electronic - Deep House or as a subgenre of Jazz.
So we need to allow a track to be in different branches of the tree.
So we don't want to have a genre in different branches of the tree? From the user's perspective it would be convenient if all Electro Swing tracks appeared automatically under Swing and EDM without editing anything.
Strict Tree Hierarchy with Multi-Tagging: a genre can only have one parent (using the parent_id model), but a track can be tagged with multiple genres from different branches of the tree via the genre_tracks junction table.
The user would place "Electro Swing" under its primary parent (e.g., "Swing"). To have a track also appear under "Electronic", the user would simply tag the track with both the "Electro Swing" genre and the "Electronic" genre. The search functionality (a later task) will then find this track under both categories and their respective parent genres.
| id | name | parent_id |
|---|---|---|
| 10 | Electronic | NULL |
| 30 | Swing | NULL |
| 31 | Electro Swing | 30 |
The flexibility comes from the genre_tracks junction table, with two separate entries in that table for that one track:
- (track_id, 31) <--- Links the track to the "Electro Swing" genre ID
- (track_id, 10) <--- Links the track to the "Electronic" genre ID
WDYT?
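The lookup described in this proposal can be sketched with a recursive query that walks the parent_id tree. This is only an illustration using Python's sqlite3 module; the function name tracks_under and the track IDs 1 and 2 are hypothetical, and the real search is a later task that may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL COLLATE NOCASE,
    parent_id INTEGER DEFAULT NULL REFERENCES genres(id)
);
CREATE TABLE genre_tracks (
    track_id INTEGER NOT NULL,
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (track_id, genre_id)
);
INSERT INTO genres (id, name, parent_id) VALUES
    (10, 'Electronic', NULL),
    (30, 'Swing', NULL),
    (31, 'Electro Swing', 30);
-- Hypothetical track 1 is tagged with both genres, track 2 only with Electronic.
INSERT INTO genre_tracks (track_id, genre_id) VALUES (1, 31), (1, 10), (2, 10);
""")


def tracks_under(conn, genre_name):
    """Return IDs of tracks tagged with the genre or any of its descendants."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM genres WHERE name = ?
            UNION
            SELECT g.id FROM genres g JOIN subtree s ON g.parent_id = s.id
        )
        SELECT DISTINCT gt.track_id
        FROM genre_tracks gt JOIN subtree s ON gt.genre_id = s.id
        ORDER BY gt.track_id
    """, (genre_name,))
    return [r[0] for r in rows]


under_swing = tracks_under(conn, "Swing")            # found via child "Electro Swing"
under_electronic = tracks_under(conn, "Electronic")  # found via the direct tag
```

Track 1 shows up under both top-level genres even though "Electro Swing" has a single parent, because the second branch comes from the extra row in genre_tracks rather than from a second parent.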
name TEXT NOT NULL COLLATE NOCASE,
parent_id INTEGER DEFAULT NULL,
is_active INTEGER DEFAULT 1,
display_order INTEGER DEFAULT 0,
"display_order" is associated with "by artist" or such. Is it a "sort_id"?
It's a number used to build the custom tree; otherwise the tree (in preferences) can only be built by ID or alphabetically by name.
I can imagine the user wants to put the most-used genres higher in the tree than less-used genres.
Is this part of your sorting proposal with a number for each level?
> Is this part of your sorting proposal with a number for each level?

More or less; it's to save the custom view chosen by the user.
Ah, a simple number makes sense, populated as soon as the user changes the order. The only issue is that the word "order" does not convey that 100%, IMHO.
In this case, it is a number that is used for sorting. Genres with equal numbers are sorted by name. So I suggest making this clearer with something like "sorting_number", "sort_key" or "display_order_index".
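As a small sketch of the sorting rule described here (number first, name as tie-breaker), assuming the column keeps the display_order name from the diff and using Python's sqlite3 module. The helper siblings_in_display_order and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL COLLATE NOCASE,
    parent_id INTEGER DEFAULT NULL,
    display_order INTEGER DEFAULT 0
);
-- Illustrative top-level genres; two pairs share the same sort number.
INSERT INTO genres (name, display_order) VALUES
    ('Techno', 0), ('House', 1), ('ambient', 0), ('Jazz', 1);
""")


def siblings_in_display_order(conn, parent_id=None):
    """Return sibling names sorted by display_order, then by name (NOCASE)."""
    rows = conn.execute("""
        SELECT name FROM genres
        WHERE parent_id IS ?           -- IS also matches NULL for top level
        ORDER BY display_order, name   -- name inherits the column's NOCASE
    """, (parent_id,))
    return [r[0] for r in rows]


ordered = siblings_in_display_order(conn)
# ['ambient', 'Techno', 'House', 'Jazz']
```

Because the default is 0 for every row, an untouched tree simply falls back to alphabetical order, and the number only matters once the user starts rearranging siblings.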
parent_id INTEGER DEFAULT NULL,
is_active INTEGER DEFAULT 1,
display_order INTEGER DEFAULT 0,
original_name TEXT DEFAULT NULL,
"original_name" is a good idea to track the relationship to the default model, right? Is this the "default_name", so the user can select "back to default"? Or is this a kind of unique ID, like a "programatic_name"?
Or is this the en_US version in case the user wants to translate the genre?
Or do we even want to allow the default genres to be translated via Transifex?
original name = name in e.g. MusicBrainz / Discogs.
The name field contains the custom name.
ATM I wouldn't translate via Transifex, because then we get a custom name from a translator. Let's start with a custom name = "name" and "original name".
The translation is only a display issue, not a database issue. We just need to consider how we pass the untranslated genres to tr() so that they are picked up.
When reading this, I just stumbled over "original". Maybe we can find something more precise. If a user adds a custom genre, this field is empty, right?
> Maybe we can find something more precise. If a user adds a custom genre, this field is empty, right?

Correct. referenced_genre?
This sounds like a link to another genre. Can the user edit it, or is it a read-only value from our Mixxx genre tree?
Do we even need that value, or can it be hard-coded in C++? That would solve the tr() issue.
Here we only need a unique ID as a pointer into the Mixxx tree.
The other question is which data inputs we want to distinguish.
We have agreed on a built-in genre tree that the user may use as the default. This default is not editable by the user; it is maintained as part of our GitHub repository.
Then we have the editable database. This can be filled from the different sources mentioned above, including the built-in genre tree. The question is if and how we want to trace the genre source, and how to populate the two genre name fields. I have tried above to brainstorm scenarios as a basis for discussion. Probably confusing.
How about this idea:
- Have a built-in "hard-coded" table of genres.
- The table has a unique ID, an English name with tr(), a list of alias names, and a brief description of the genre, also with tr().
- The genre table in the database has a name and also the unique ID. If the name is empty, the translated content from the built-in table is used.
- All data from external services is treated as user data.
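The name fallback in this idea could look roughly like the sketch below. It is written in Python only for illustration; in Mixxx the built-in table would live in C++ and use Qt's tr(). The BuiltinGenre type, the BUILTIN_GENRES entries, the "swing" ID format, and the display_name helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def tr(text: str) -> str:
    """Stand-in for Qt's tr(); returns the text unchanged in this sketch."""
    return text


@dataclass(frozen=True)
class BuiltinGenre:
    unique_id: str           # stable pointer into the built-in tree
    name: str                # English name, passed through tr() for display
    aliases: tuple = ()      # alternative spellings
    description: str = ""    # brief description, also translatable


# Hypothetical built-in table; the ID scheme and entries are illustrative only.
BUILTIN_GENRES = {
    "swing": BuiltinGenre(
        "swing", "Swing", ("Swing Jazz",),
        "Jazz style popular in the 1930s and 1940s"),
}


def display_name(db_name: Optional[str], unique_id: Optional[str]) -> str:
    """Resolve the name shown to the user for one row of the database table."""
    if db_name:  # a user-supplied or imported name always wins
        return db_name
    builtin = BUILTIN_GENRES.get(unique_id) if unique_id else None
    if builtin is None:
        raise KeyError(f"no name and no built-in genre for ID {unique_id!r}")
    return tr(builtin.name)


default_shown = display_name(None, "swing")         # falls back to built-in
override_shown = display_name("My Swing", "swing")  # user override
custom_shown = display_name("Neurofunk", None)      # purely custom genre
```

One consequence of this layout is that "back to default" reduces to clearing the name column for a row that still carries a unique ID.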
To be clear:
If a user chooses to follow the genres as offered by an external source (Discogs, MusicBrainz) and wants to automatically import the genre for a track from that source into the library (and into the track metadata), the complete genre tree from that source must be in the database.
If a user chooses to use a different name for a genre/subgenre than the one used by the source, the user needs to create a genre link: original name (= source name) linked to name (custom name).
If a user wants to group different genres from the source into a custom genre, they all need to have a genre link.
If a user doesn't want to use subgenres as offered by the source but instead wants them to be grouped under their parent (main or sub), the user just 'disables' the subgenres (they need to have the names used by the source in custom name or original name).
All this was my complex number 3, in which I proposed to let the user choose a template to follow (template = genre tree from an external source).
As that could be complex, we can choose to add different source trees to the 'default' table; then we need a source-model field. In the preferences we can let the user choose which model they'd like to use.
> If a user chooses to follow the genres as offered by an external source (discogs, musicbrainz) and wants to automatically import the genre for a track from that source in the library (and in the track metadata), the complete genre tree from that source must be in the database.

They may introduce new genres after Mixxx has been released. That's why I came up with the idea of importing genres when they appear for tracks (in addition to the Mixxx genre tree).

> If a user chooses to use another name for a genre/subgenre... than used by the source, the user needs to create a genre link: original name (= source name) linked to name (custom name).

Ah OK, I did not understand it at first. So the original_name is actually a reference to another genre with "is_active = 0"? Or is it a link to the hard-coded genre tree?
My first understanding was that a user is allowed to rename a genre via "name" and the "original_name" just keeps track of the old name. This is wrong, correct?
Can you please outline the steps that happen in the group genre scenario you brought up above?
Another reason why I suggested the template import is the experimental nature of our users.
If the genre tree is hard-coded and a user starts messing around with it (deleting genres in a DB tool, linking genres to other parents...), questions will arise like 'how can I get the default genre tree back?'
As there will be other valuable info in the database, we won't be able to tell them 'delete the database' or 'reinstall Mixxx'...
So I have to return to my original idea: importing genre models (Discogs, MusicBrainz, Mixxx...).
Exporting the tree as a model will be needed too, because users will love to share their custom genre tree with others (especially when they start adding custom definitions for moods, energy...).
I lost focus on that.
> My first understanding was that a user is allowed to rename a genre via "name" and the "original_name" just keeps track of the old name. This is wrong, correct?

In my original idea the user could indeed enter a custom name for the genre in the (external) model (linking external source to custom genres), but with all the input and emerging questions I need to reread my notes and recheck all questions.
I think my original analysis was correct, but we lost each other in communication over the last few days.
This already looks good. I added some thoughts.
Thank you also for the really good PR description.
I am often too lazy to do the same.
<sql>
CREATE TABLE IF NOT EXISTS genres (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL COLLATE NOCASE,
Isn't this value null by default? I can imagine that this field contains only the user override name.
Regarding the unique ID: https://www.wikidata.org/wiki/Q3990466 collects unique genre IDs of different services.
This PR introduces the foundational database schema changes for the GSoC 2025 Multi-Genre Support project.
This PR modifies res/schema.xml
Changes Details: Phase 1: Core Database & Logic
1.1 Database Schema Definition & Initial Migration
Establish the essential data structure for multiple and hierarchical genres, ensure migration of existing data
1.1.1 Create a new genres table with the following columns:
* id (INTEGER PRIMARY KEY AUTOINCREMENT)
* name (TEXT NOT NULL COLLATE NOCASE, UNIQUE)
* parent_id (INTEGER DEFAULT NULL, FOREIGN KEY to genres.id ON DELETE SET NULL ON UPDATE CASCADE) - For hierarchical structure.
* is_active (INTEGER DEFAULT 1) - To allow "disabling" genres.
* display_order (INTEGER DEFAULT 0) - For custom sorting of sibling genres.
* original_name (TEXT DEFAULT NULL) - To store the original name from a source if the user renames it.
* notes (TEXT DEFAULT NULL) - For optional user notes.
1.1.2 Create a new genre_tracks junction table (renamed from track_genres for consistency) with:
* track_id (INTEGER NOT NULL, FOREIGN KEY to library.id ON DELETE CASCADE)
* genre_id (INTEGER NOT NULL, FOREIGN KEY to genres.id ON DELETE CASCADE)
* Composite PRIMARY KEY (track_id, genre_id).
1.1.3 Implement necessary indexes on the new tables for performance and uniqueness (idx_genres_name_unique, idx_genres_parent_id, idx_genre_tracks_track_id, idx_genre_tracks_genre_id).
1.1.4 Increment the database schema version to 40
const int MixxxDb::kRequiredSchemaVersion = 40;
1.1.5 Implement a basic migration script to:
* Populate the genres table with unique, trimmed genre names from the existing library.genre column (as top-level, active genres, with original_name set).
* Populate the genre_tracks table by linking library.id to the corresponding new genres.id based on a case-insensitive name match.
1.1.6 Locally test schema creation (empty DB) and migration (existing DB)
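For readers who want to try steps 1.1.1 to 1.1.5 outside of Mixxx, the sketch below assembles the tables, indexes, and migration from the list above and runs them against an in-memory SQLite database using Python's sqlite3 module. It is not the SQL from res/schema.xml: the two-column library table, the sample rows, and the migrate helper are simplifications made up for this example, and it assumes each library.genre value holds a single genre name (no splitting of combined values).

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS genres (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL COLLATE NOCASE,
    parent_id INTEGER DEFAULT NULL
        REFERENCES genres(id) ON DELETE SET NULL ON UPDATE CASCADE,
    is_active INTEGER DEFAULT 1,
    display_order INTEGER DEFAULT 0,
    original_name TEXT DEFAULT NULL,
    notes TEXT DEFAULT NULL
);
CREATE TABLE IF NOT EXISTS genre_tracks (
    track_id INTEGER NOT NULL REFERENCES library(id) ON DELETE CASCADE,
    genre_id INTEGER NOT NULL REFERENCES genres(id) ON DELETE CASCADE,
    PRIMARY KEY (track_id, genre_id)
);
-- The unique index inherits NOCASE from the column, so 'house' == 'House'.
CREATE UNIQUE INDEX IF NOT EXISTS idx_genres_name_unique ON genres(name);
CREATE INDEX IF NOT EXISTS idx_genres_parent_id ON genres(parent_id);
CREATE INDEX IF NOT EXISTS idx_genre_tracks_track_id ON genre_tracks(track_id);
CREATE INDEX IF NOT EXISTS idx_genre_tracks_genre_id ON genre_tracks(genre_id);
"""

MIGRATION = """
-- Unique, trimmed names become top-level, active genres.
INSERT OR IGNORE INTO genres (name, original_name)
SELECT TRIM(genre), TRIM(genre) FROM library
WHERE genre IS NOT NULL AND TRIM(genre) != ''
ORDER BY id;

-- Link every track to its genre; the match is case-insensitive via NOCASE.
INSERT OR IGNORE INTO genre_tracks (track_id, genre_id)
SELECT l.id, g.id FROM library l
JOIN genres g ON g.name = TRIM(l.genre);
"""


def migrate(conn):
    """Create the new tables and indexes, then populate them from library."""
    conn.executescript(SCHEMA)
    conn.executescript(MIGRATION)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for the cascades
conn.executescript("""
-- Simplified stand-in for the real library table.
CREATE TABLE library (id INTEGER PRIMARY KEY, genre TEXT);
INSERT INTO library (id, genre) VALUES
    (1, 'House'), (2, '  house '), (3, NULL), (4, 'Jazz'), (5, '');
""")
migrate(conn)

genre_names = [r[0] for r in conn.execute(
    "SELECT name FROM genres ORDER BY name")]
links = conn.execute("""
    SELECT gt.track_id, g.name
    FROM genre_tracks gt JOIN genres g ON g.id = gt.genre_id
    ORDER BY gt.track_id
""").fetchall()
```

Both INSERT statements use OR IGNORE, so running migrate a second time on the same database leaves the tables unchanged; tracks with a NULL or empty genre simply get no row in genre_tracks.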
Context and Purpose:
These changes are the first step as outlined in the GSoC project proposal. They lay the database groundwork required for all future multi-genre functionalities, including hierarchical management, improved UI, and enhanced metadata handling.
This PR addresses Task 1.1 in the main GSoC tracking issue
Testing Done:
This PR focuses solely on the database schema and initial data migration. Subsequent PRs will address DAO changes, UI implementation, and other features.