Open
Labels
ci, infrastructure: Changes to infrastructure, testing, CI/CD, pipelines, etc.
Description
Is your feature request related to a problem?
Issue from the customer: https://opensearch.slack.com/archives/C0526AVT84S/p1689036508739229
For context, `my_alias` points to 800 indices. Each index is sorted by `hw_id` and `snapshot_day`, has only one unique value of `snapshot_day`, and has ~750 million records distributed on 10 shards of ~30GB each. The cluster is in AWS and has 20 data nodes and 3 master nodes. All are `r6g.12xlarge.search`.
SQL query runs ~8 seconds:
select * from my_alias where hw_id = 'abcd' and (snapshot_day between cast('2023-04-27' as date) and cast('2023-05-03' as date) or snapshot_day between cast('2023-06-30' as date) and cast('2023-07-05' as date)) limit 10000
DSL equivalent runs ~4 seconds:
{
  "from": 0,
  "size": 10000,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "hw_id": "abcd"
          }
        },
        {
          "bool": {
            "should": [
              {
                "range": {
                  "snapshot_day": {
                    "gte": "2023-04-27",
                    "lte": "2023-05-03"
                  }
                }
              },
              {
                "range": {
                  "snapshot_day": {
                    "gte": "2023-06-30",
                    "lte": "2023-07-05"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
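To make the shape of the DSL request easier to reuse in a test, here is a minimal Python sketch that builds the same body from an `hw_id` and a list of date ranges. The function name `build_query` and its parameters are hypothetical and not part of the plugin or any client library; only the field names and values come from the query above.

```python
import json
from typing import Iterable, Tuple


def build_query(hw_id: str,
                day_ranges: Iterable[Tuple[str, str]],
                size: int = 10000) -> dict:
    """Build a filter-only bool query: term on hw_id AND (range OR range ...)."""
    # One inclusive range clause per (start, end) pair of snapshot days.
    range_clauses = [
        {"range": {"snapshot_day": {"gte": start, "lte": end}}}
        for start, end in day_ranges
    ]
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"hw_id": hw_id}},
                    # A bool with only "should" clauses requires at least
                    # one of them to match, so this acts as an OR.
                    {"bool": {"should": range_clauses}},
                ]
            }
        },
    }


body = build_query(
    "abcd",
    [("2023-04-27", "2023-05-03"), ("2023-06-30", "2023-07-05")],
)
print(json.dumps(body, indent=2))
```

Everything sits in filter context, so no relevance scoring is requested for any clause.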
NOTE
Read through the whole Slack discussion: the query was optimized and accelerated there.
What solution would you like?
- Get or generate a huge dataset
- Allocate a cluster for tests
- Build a test framework (maybe reuse Jenkins)
- Run the test and investigate this specific issue
- Run more tests to find other bottlenecks
- Re-run all tests on all OpenSearch releases to detect degradation
- Update release workflow to repeat these tests before every code freeze
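The measure-and-compare part of the steps above could be sketched as follows. This is only an illustration under assumptions: `measure`, `find_regressions`, and the 20% tolerance are hypothetical, and the query itself is passed in as a callable so the sketch does not depend on any particular client or cluster.

```python
import statistics
import time
from typing import Callable, Dict


def measure(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def find_regressions(baseline: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.2) -> Dict[str, float]:
    """Return slowdown ratios for queries slower than baseline * (1 + tolerance)."""
    regressions = {}
    for name, base_seconds in baseline.items():
        now = current.get(name)
        if now is not None and now > base_seconds * (1 + tolerance):
            regressions[name] = now / base_seconds
    return regressions


# Baseline uses the timings reported in this issue; the "current" numbers
# are made up purely to show how a degradation would be flagged.
baseline = {"sql": 8.0, "dsl": 4.0}
current = {"sql": 10.0, "dsl": 4.1}
print(find_regressions(baseline, current))
```

Using the median rather than the mean keeps a single slow run (for example a cold cache) from dominating the result.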
What alternatives have you considered?
- Create `Best Practices` documentation section
- Automatically optimize some functions and replace them by literals, e.g. `DATE('...')` to `DATE '...'`, `PI()` to `3.1415` and so on to reduce or even completely avoid scripts in filters pushed down
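The literal-replacement alternative can be illustrated with a small text-level sketch. It is not how the plugin works: a real implementation would presumably fold constants on the parsed expression tree, and a regex like this would also rewrite matches inside string literals. The name `fold_constants` is hypothetical.

```python
import math
import re

# DATE('2023-04-27') -> DATE '2023-04-27', only for a plain string literal argument.
_DATE_CALL = re.compile(r"\bDATE\s*\(\s*('[^']*')\s*\)", re.IGNORECASE)
# PI() -> numeric literal.
_PI_CALL = re.compile(r"\bPI\s*\(\s*\)", re.IGNORECASE)


def fold_constants(sql: str) -> str:
    """Replace constant function calls with equivalent literals (text-level sketch)."""
    sql = _DATE_CALL.sub(r"DATE \1", sql)
    sql = _PI_CALL.sub(repr(math.pi), sql)
    return sql


query = "SELECT * FROM t WHERE d > DATE('2023-04-27') AND x < PI()"
print(fold_constants(query))
# SELECT * FROM t WHERE d > DATE '2023-04-27' AND x < 3.141592653589793
```

Calls whose argument is not a plain string literal, such as `DATE(col)`, are left untouched because they cannot be evaluated ahead of time.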
Do you have any additional context?
Opened #1847