Edge cases with 0-vectors in knn_search #81167

@cbuescher

While playing around with the new vector search endpoint, I ran into two interesting edge cases with "cosine" similarity.
I may not have read the docs carefully enough, so I used a zero-valued vector in the query, which raised an assertion error on my locally started nodes,
where we run with "-ea" turned on. With assertions disabled, and starting from this setup:

DELETE my-index

PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my-index/_doc/1
{
  "my_text" : "text1",
  "my_vector" : [3, 0, 6]
}

PUT my-index/_doc/2
{
  "my_text" : "text2",
  "my_vector" : [0, 1, 0]
}


POST my-index/_knn_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0, 0, 0],
    "k": 2,
    "num_candidates" : 100
  }
}

I get this error:

org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        [...]
Caused by: org.elasticsearch.ElasticsearchException$1: Index 2147483645 out of bounds for length 1
        at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:639) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:410) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        ... 20 more
Caused by: java.lang.IndexOutOfBoundsException: Index 2147483645 out of bounds for length 1
        at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) ~[?:?]
        at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) ~[?:?]
        at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266) ~[?:?]
        at java.util.Objects.checkIndex(Objects.java:359) ~[?:?]
        at org.apache.lucene.index.CodecReader.checkBounds(CodecReader.java:103) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
        at org.apache.lucene.index.CodecReader.document(CodecReader.java:88) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
        at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:374) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
        at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:374) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
        at org.elasticsearch.search.internal.FieldUsageTrackingDirectoryReader$FieldUsageTrackingLeafReader.document(FieldUsageTrackingDirectoryReader.java:123) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:374) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
        at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:448) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.search.fetch.FetchPhase.prepareNonNestedHitContext(FetchPhase.java:314) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.search.fetch.FetchPhase.prepareHitContext(FetchPhase.java:270) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:158) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:90) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:653) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:628) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:483) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
        ... 6 more

When running with assertions enabled I run into this check even earlier:

java.lang.AssertionError: null
»       at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:75) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
»       at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:274) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
»       at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
»       at org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:45) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
»       at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
»       at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:194) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
»       at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:167) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
»       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:541) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
»       at org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:233) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
»       at org.elasticsearch.search.query.QueryPhase.executeInternal(QueryPhase.java:187) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
»       at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:88) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
»       at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:458) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
»       at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:621) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
»       at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:483) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]

I also noticed that it is possible to index a document with e.g. a [0,0,0] vector into a field with similarity "cosine". If such a document is matched, it is returned with a "null" score (or the search runs into an assertion error if -ea is enabled, as in local tests).
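
For context, here is a minimal sketch of why a zero vector is a problem for cosine similarity: the formula divides by the product of the two vector magnitudes, so a zero-magnitude vector leads to 0 / 0. The function below is a plain textbook implementation written for illustration, not the Lucene code. That the resulting NaN is what surfaces as the "null" score is my assumption and not verified against the source.

```python
import math


def cosine_similarity(a, b):
    """Textbook cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    denominator = norm_a * norm_b
    if denominator == 0.0:
        # 0 / 0 is NaN under IEEE 754 float division (as in Java).
        # Python raises ZeroDivisionError instead, so return NaN explicitly.
        return math.nan
    return dot / denominator


score = cosine_similarity([0, 0, 0], [3, 0, 6])
print(math.isnan(score))  # the score is not a number
print(score >= 0)         # every comparison against NaN is false
```

Since any comparison with NaN is false, a check that scores are non-negative or ordered cannot hold for such a hit, which would be consistent with the assertion failure above.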

Just wanted to mention this; maybe it would be useful to protect against it with a better error message.
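
To illustrate the kind of check I have in mind, here is a rough client-side sketch. The function name `check_cosine_vector` and the error wording are hypothetical and not part of Elasticsearch or any client library. A server-side check would presumably live in the dense_vector mapper and the knn query parsing instead.

```python
import math


def check_cosine_vector(vector, dims):
    """Hypothetical guard: reject vectors that cosine similarity cannot score."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("vector contains NaN or infinite values")
    if all(x == 0 for x in vector):
        # Zero magnitude means the cosine denominator is 0.
        raise ValueError(
            "cosine similarity is undefined for a zero-magnitude vector"
        )
    return vector


check_cosine_vector([3, 0, 6], dims=3)  # passes and returns the vector
try:
    check_cosine_vector([0, 0, 0], dims=3)
except ValueError as err:
    print(f"rejected: {err}")
```

The same check would apply both to "query_vector" in the search request and to "my_vector" at index time, since both cases above come from a zero magnitude.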
