Skip to content

Enum obsolete value handling maybe slow with many records #11986

Description

@beerbohmdo

Description

I have a database with relative large tables. When I run db:build its hangs multiple minutes for checking if the database still contains old enum values.

I found the origin and patched it for me in the past. As we use postgres I think it be a niche issue.

As now I have to migrate my patch to ss-6, I will submit a PR for that anyway.

Additional context or points of discussion

I was able to narrow down the issue to this section:

        // Get all enum values
        $enumValues = $this->getEnum();
        if (DB::get_schema()->hasField($table, $name)) {
            $existing = DB::query("SELECT DISTINCT \"{$name}\" FROM \"{$table}\"")->column();
            $enumValues = array_unique(array_merge($enumValues, $existing));
        }

I am not completly sure why, but for large tables this can be realy slow:

QUERY PLAN
Unique  (cost=1000.46..588711.55 rows=1 width=10) (actual time=2764.849..2786.626 rows=1 loops=1)
  ->  Gather Merge  (cost=1000.46..588711.55 rows=2 width=10) (actual time=2764.848..2786.615 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Unique  (cost=0.44..587711.29 rows=1 width=10) (actual time=0.062..1831.997 rows=1 loops=3)
"              ->  Parallel Index Only Scan using ix_e3dc4f5497f7676548ca90e713c5e001 on ""Container_Versions""  (cost=0.44..559543.66 rows=11267055 width=10) (actual time=0.060..1187.245 rows=9052704 loops=3)"
                    Heap Fetches: 6232848
Planning Time: 0.160 ms
JIT:
  Functions: 4
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 0.591 ms, Inlining 150.968 ms, Optimization 23.279 ms, Emission 23.194 ms, Total 198.031 ms"
Execution Time: 2786.999 ms

A group by does't help either.

My solution:

I fetch the current schema of the enum column, compare it to the new schema.

Than I check for the obsolete classes if any of them exists in the table. I don't know why but a EXISTS-Query is fast 🤷

        // Get all enum values
        $newEnumValues = $this->getEnum();

        if (!DB::get_schema()->hasField($table, $name)) {
            DBEnum::$enum_cache[$table][$name] = $newEnumValues;
            return $newEnumValues;
        }

        $currentConstraintValues = DB::get_schema()->enumValuesForField($table, $name);

        if (!in_array('', $currentConstraintValues)) {
            $currentConstraintValues[] = '';
        }

        if (!in_array('', $newEnumValues)) {
            $newEnumValues[] = '';
        }

        $removedValues = array_diff($currentConstraintValues, $newEnumValues);

        if (empty($removedValues)) {
            DBEnum::$enum_cache[$table][$name] = $newEnumValues;
            return $newEnumValues;
        }

        $obsoleteValues = [];
        foreach ($removedValues as $value) {
            $exists = DB::prepared_query(
                "SELECT EXISTS(
                    SELECT 1 FROM \"{$table}\"
                    WHERE \"{$name}\" = ?
                    LIMIT 1
                ) as exists",
                [$value]
            )->value();

            if ($exists) {
                $obsoleteValues[] = $value;
            }
        }

        if (!empty($obsoleteValues)) {
            DB::alteration_message(
                "Found obsolete {$name} values in {$table}: " . implode(', ', $obsoleteValues),
                'notice'
            );
            $newEnumValues = array_merge($newEnumValues, $obsoleteValues);
        }

        // Cache and return
        DBEnum::$enum_cache[$table][$name] = $enumValues;

PR

Validations

  • You intend to implement the feature yourself
  • You have read the contributing guide
  • You strongly believe this feature should be in core, rather than being its own community module
  • You have checked for existing issues or pull requests related to this feature (and didn't find any)

Metadata

Metadata

Assignees

No one assigned
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions