
[FEATURE] Perform big data performance and regression tests #1862

@Yury-Fridlyand

Description

Is your feature request related to a problem?

Issue from the customer: https://opensearch.slack.com/archives/C0526AVT84S/p1689036508739229

For context, my_alias points to 800 indices. Each index is sorted by hw_id and snapshot_day, has only one unique value of snapshot_day, and holds ~750 million records distributed across 10 shards of ~30GB each.

The cluster is in AWS, has 20 data nodes and 3 master nodes. All are r6g.12xlarge.search.
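To give a sense of the scale a test dataset would need to approximate, here is a back-of-the-envelope calculation from the figures above. It assumes the quoted shard count and size refer to primary shards only (replicas are not mentioned) and uses decimal terabytes; the variable names are illustrative.

```python
# Rough sizing from the numbers quoted in the issue (all values approximate).
INDICES = 800
SHARDS_PER_INDEX = 10            # assumed to be primaries only
SHARD_SIZE_GB = 30
RECORDS_PER_INDEX = 750_000_000
DATA_NODES = 20

total_shards = INDICES * SHARDS_PER_INDEX
total_size_tb = total_shards * SHARD_SIZE_GB / 1000   # decimal TB
total_records = INDICES * RECORDS_PER_INDEX
shards_per_node = total_shards / DATA_NODES

print(f"shards behind the alias: {total_shards}")
print(f"approximate data size:   {total_size_tb:.0f} TB")
print(f"approximate records:     {total_records:,}")
print(f"shards per data node:    {shards_per_node:.0f}")
```

Under these assumptions the alias fans out to roughly 8,000 shards holding about 240 TB and 600 billion records, or about 400 shards per data node.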

The SQL query runs in ~8 seconds:

select *
from my_alias
where hw_id = 'abcd'
  and (snapshot_day between cast('2023-04-27' as date) and cast('2023-05-03' as date)
    or snapshot_day between cast('2023-06-30' as date) and cast('2023-07-05' as date))
limit 10000

The DSL equivalent runs in ~4 seconds:

{
  "from": 0,
  "size": 10000,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "hw_id": "abcd"
          }
        },
        {
          "bool": {
            "should": [
              {
                "range": {
                  "snapshot_day": {
                    "gte": "2023-04-27",
                    "lte": "2023-05-03"
                  }
                }
              },
              {
                "range": {
                  "snapshot_day": {
                    "gte": "2023-06-30",
                    "lte": "2023-07-05"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
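For a test framework it may be convenient to generate this request body rather than hard-code it, so the same shape can be replayed with different hw_id values and date ranges. The following is a minimal sketch; `build_filter_query` is a hypothetical helper, not part of any OpenSearch client.

```python
import json


def build_filter_query(hw_id, day_ranges, size=10000):
    """Build a DSL body of the same shape as the hand-written query above.

    day_ranges is a list of (start, end) date strings, both inclusive.
    """
    should = [
        {"range": {"snapshot_day": {"gte": start, "lte": end}}}
        for start, end in day_ranges
    ]
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"hw_id": hw_id}},
                    {"bool": {"should": should}},
                ]
            }
        },
    }


body = build_filter_query(
    "abcd",
    [("2023-04-27", "2023-05-03"), ("2023-06-30", "2023-07-05")],
)
print(json.dumps(body, indent=2))
```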

NOTE
Read through the whole Slack discussion: the query was later optimized and accelerated.

What solution would you like?

  • Get or generate a huge dataset
  • Allocate a cluster for tests
  • Build a test framework (possibly reusing Jenkins)
  • Run the test and investigate this specific issue
  • Run more tests to find other bottlenecks
  • Re-run all tests on all OpenSearch releases to detect degradation
  • Update release workflow to repeat these tests before every code freeze
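The measurement core of such a framework could look roughly like the sketch below. It is only an illustration: `measure` and `is_regression` are hypothetical names, the 20% tolerance is an arbitrary assumption, and `run_query` stands for any callable that sends the SQL or DSL request to the test cluster (a stand-in is used here so the example runs without one).

```python
import statistics
import time


def measure(run_query, repeats=5, warmup=1):
    """Return the median wall-clock latency (seconds) of run_query()."""
    for _ in range(warmup):
        run_query()  # warm caches; these runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def is_regression(current_s, baseline_s, tolerance=0.2):
    """True if current latency exceeds the baseline by more than tolerance."""
    return current_s > baseline_s * (1 + tolerance)


if __name__ == "__main__":
    # Stand-in workload; a real test would issue the query over HTTP.
    latency = measure(lambda: sum(range(100_000)))
    print(f"median latency: {latency:.6f} s")
    # Using the figures from this issue: SQL ~8 s against DSL ~4 s.
    print("SQL slower than DSL beyond tolerance:", is_regression(8.0, 4.0))
```

The same comparison could be used between releases by storing the previous release's median as the baseline.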

What alternatives have you considered?

  • Create Best Practices documentation section
  • Automatically optimize some function calls by replacing them with literals, e.g. DATE('...') with DATE '...', PI() with 3.1415 and so on, to reduce or even completely avoid scripts in pushed-down filters
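The literal-replacement idea can be illustrated with a small text-level sketch. This is not how the SQL plugin works: a real implementation would fold constants on the parsed expression tree, whereas this regex version would also rewrite matching text inside string literals. `fold_constants` is a hypothetical name, and PI() is replaced with Python's `math.pi` value rather than the shortened 3.1415.

```python
import math
import re

# DATE('<literal>') with a single quoted string argument.
_DATE_CALL = re.compile(r"\bDATE\s*\(\s*('(?:[^']|'')*')\s*\)", re.IGNORECASE)
# PI() with no arguments.
_PI_CALL = re.compile(r"\bPI\s*\(\s*\)", re.IGNORECASE)


def fold_constants(sql):
    """Rewrite constant function calls into literals (text-level sketch)."""
    sql = _DATE_CALL.sub(r"DATE \1", sql)       # DATE('x') -> DATE 'x'
    sql = _PI_CALL.sub(repr(math.pi), sql)      # PI() -> 3.141592653589793
    return sql


query = "SELECT * FROM t WHERE d > DATE('2023-04-27') AND r < PI()"
print(fold_constants(query))
```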

Do you have any additional context?

Opened #1847

Metadata

Labels

ci, infrastructure (Changes to infrastructure, testing, CI/CD, pipelines, etc.)
