
taxids created with create-taxdump skip numbers #59

@apcamargo


When you create a taxdump using create-taxdump (ICTV taxonomy, for example), the taxids "skip" some numbers. For example:

$ head ictv-taxdump/names.dmp
1	|	root	|		|	scientific name	|
287205	|	Hoswirudivirus MRV1	|		|	scientific name	|
287935	|	Shomudavirus limadaptatum	|		|	scientific name	|
1096518	|	Sclerotimonavirus betaclarireediae	|		|	scientific name	|
1138752	|	Potato virus H	|		|	scientific name	|
1536674	|	Rhopapillomavirus 1	|		|	scientific name	|
1845995	|	Monomorium pharaonis virus 1	|		|	scientific name	|
1890985	|	Aquamavirus A	|		|	scientific name	|
2079526	|	Hylipavirus	|		|	scientific name	|
2290567	|	Fattrevirus	|		|	scientific name	|

This is not a problem in itself, as the nodes are still connected. However, it causes a bug when you try to create an MMseqs2 taxonomy database from the custom taxonomy: MMseqs2 apparently assumes that taxids are not skipped (unless the missing ones are listed in delnodes.dmp or merged.dmp, I guess).
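To see how large the gaps in a given taxdump are, a quick check along these lines may help. It is a minimal sketch, not part of TaxonKit or MMseqs2: it assumes an NCBI-style nodes.dmp (taxid in the first `\t|`-separated column), and the helper names `read_taxids` and `find_gaps` are hypothetical.

```python
"""Hypothetical check: list the ranges of unused taxids in a taxdump's nodes.dmp."""
import sys


def read_taxids(path):
    """Read the first column (taxid) of every non-empty line of an NCBI-style .dmp file."""
    with open(path) as fh:
        return [int(line.split("\t|", 1)[0]) for line in fh if line.strip()]


def find_gaps(taxids):
    """Return (start, end) ranges of integers missing between consecutive taxids."""
    ids = sorted(set(taxids))
    gaps = []
    for prev, cur in zip(ids, ids[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


if __name__ == "__main__":
    # Usage (assumed): python check_gaps.py ictv-taxdump/nodes.dmp
    if len(sys.argv) == 2:
        gaps = find_gaps(read_taxids(sys.argv[1]))
        print(f"{len(gaps)} gap(s) in the taxid numbering")
        for start, end in gaps[:10]:
            print(f"missing {start}-{end}")
```

With the taxids shown above, the first gap would be 2 to 287204, between root (1) and Hoswirudivirus MRV1 (287205).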

I wrote a script that remaps the taxids so that no numbers are skipped, and that solved the issue. After remapping:

$ head ictv-taxdump/names.dmp
1	|	root	|		|	scientific name	|
2	|	Hoswirudivirus MRV1	|		|	scientific name	|
3	|	Shomudavirus limadaptatum	|		|	scientific name	|
4	|	Sclerotimonavirus betaclarireediae	|		|	scientific name	|
5	|	Potato virus H	|		|	scientific name	|
6	|	Rhopapillomavirus 1	|		|	scientific name	|
7	|	Monomorium pharaonis virus 1	|		|	scientific name	|
8	|	Aquamavirus A	|		|	scientific name	|
9	|	Hylipavirus	|		|	scientific name	|
10	|	Fattrevirus	|		|	scientific name	|
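A renumbering step like the one described above could look roughly like this. This is a minimal sketch under stated assumptions, not the script used in this report: it assumes NCBI-style .dmp files (`\t|\t` between fields, a trailing `\t|`), assigns 1..N to the taxids in nodes.dmp in ascending order (so a root with taxid 1 stays 1), and gives taxids that appear only in merged.dmp or delnodes.dmp the numbers after N so they never collide. Whether MMseqs2 needs those two files renumbered this way is an assumption. All function names and the `taxid_remap.tsv` output are hypothetical.

```python
"""Hypothetical helper: renumber the taxids of a taxdump so they are contiguous (1..N)."""
import sys
from pathlib import Path

SEP = "\t|\t"


def parse_dmp_line(line: str) -> list[str]:
    """Split one .dmp line ("a\t|\tb\t|") into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split(SEP)


def format_dmp_line(fields: list[str]) -> str:
    """Join fields back into the .dmp layout, including the trailing "\t|"."""
    return SEP.join(fields) + "\t|\n"


def build_mapping(nodes_lines, extra_taxids=()):
    """Map old taxids to consecutive new ones: nodes.dmp taxids first, extras after."""
    active = sorted({int(parse_dmp_line(l)[0]) for l in nodes_lines if l.strip()})
    mapping = {old: new for new, old in enumerate(active, start=1)}
    for old in sorted(set(extra_taxids) - set(mapping)):
        mapping[old] = len(mapping) + 1
    return mapping


def remap_columns(lines, mapping, columns):
    """Rewrite the given taxid columns of each non-empty .dmp line using mapping."""
    out = []
    for line in lines:
        if not line.strip():
            continue
        fields = parse_dmp_line(line)
        for c in columns:
            fields[c] = str(mapping[int(fields[c])])
        out.append(format_dmp_line(fields))
    return out


def main(src: str, dst: str) -> None:
    src_dir, dst_dir = Path(src), Path(dst)
    dst_dir.mkdir(parents=True, exist_ok=True)

    def read(name):
        path = src_dir / name
        return path.read_text().splitlines(keepends=True) if path.exists() else []

    nodes, names = read("nodes.dmp"), read("names.dmp")
    merged, delnodes = read("merged.dmp"), read("delnodes.dmp")
    # Old taxids that only survive in merged.dmp / delnodes.dmp.
    extra = [int(parse_dmp_line(l)[0]) for l in merged + delnodes if l.strip()]
    mapping = build_mapping(nodes, extra)

    outputs = {
        "nodes.dmp": remap_columns(nodes, mapping, (0, 1)),  # taxid, parent taxid
        "names.dmp": remap_columns(names, mapping, (0,)),
        "merged.dmp": remap_columns(merged, mapping, (0, 1)),  # old taxid, new taxid
        "delnodes.dmp": remap_columns(delnodes, mapping, (0,)),
    }
    for name, lines in outputs.items():
        (dst_dir / name).write_text("".join(lines))
    # Save the old -> new table so other taxid-keyed files can be renumbered too.
    with open(dst_dir / "taxid_remap.tsv", "w") as fh:
        for old, new in sorted(mapping.items(), key=lambda kv: kv[1]):
            fh.write(f"{old}\t{new}\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: python {sys.argv[0]} SRC_TAXDUMP_DIR DST_TAXDUMP_DIR", file=sys.stderr)
```

Keep in mind that any sequence-to-taxid mapping file you give MMseqs2 still refers to the old taxids, so it has to be rewritten with the same old-to-new table.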

This is not a TaxonKit bug in any way. But because MMseqs2 is pretty popular, I thought it best to report it here in case anyone else runs into the same issue.
