Description
While playing around with the new vector search endpoint, I ran into two interesting edge cases with "cosine" similarity.
I maybe didn't read the docs too carefully, so I used a zero-valued vector in the query, which raised an assertion error on my locally started nodes, where we run with "-ea" turned on. If assertions are disabled and I start with this setup:
DELETE my-index
PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine"
      },
      "my_text": {
        "type": "keyword"
      }
    }
  }
}
PUT my-index/_doc/1
{
  "my_text": "text1",
  "my_vector": [3, 0, 6]
}
PUT my-index/_doc/2
{
  "my_text": "text2",
  "my_vector": [0, 1, 0]
}
POST my-index/_knn_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0, 0, 0],
    "k": 2,
    "num_candidates": 100
  }
}
I get this error:
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
[...]
Caused by: org.elasticsearch.ElasticsearchException$1: Index 2147483645 out of bounds for length 1
at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:639) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:410) [elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
... 20 more
Caused by: java.lang.IndexOutOfBoundsException: Index 2147483645 out of bounds for length 1
at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) ~[?:?]
at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) ~[?:?]
at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266) ~[?:?]
at java.util.Objects.checkIndex(Objects.java:359) ~[?:?]
at org.apache.lucene.index.CodecReader.checkBounds(CodecReader.java:103) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
at org.apache.lucene.index.CodecReader.document(CodecReader.java:88) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:374) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:374) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
at org.elasticsearch.search.internal.FieldUsageTrackingDirectoryReader$FieldUsageTrackingLeafReader.document(FieldUsageTrackingDirectoryReader.java:123) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:374) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:448) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.search.fetch.FetchPhase.prepareNonNestedHitContext(FetchPhase.java:314) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.search.fetch.FetchPhase.prepareHitContext(FetchPhase.java:270) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:158) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:90) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:653) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:628) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:483) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
... 6 more
When running with assertions enabled I run into this check even earlier:
java.lang.AssertionError: null
» at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:75) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
» at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:274) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
» at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
» at org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:45) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
» at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
» at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:194) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
» at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:167) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
» at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:541) ~[lucene-core-9.0.0-snapshot-cc2a31f2be8.jar:9.0.0-snapshot-cc2a31f2be8 cc2a31f2be843935a67c0fdcd3478a65970e791d - mayyasharipova - 2021-11-02 14:36:40]
» at org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:233) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
» at org.elasticsearch.search.query.QueryPhase.executeInternal(QueryPhase.java:187) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
» at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:88) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
» at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:458) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
» at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:621) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
» at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:483) ~[elasticsearch-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
I also noticed that indexing a document with, e.g., a [0, 0, 0] vector into a field with "cosine" similarity is possible and, if the document matches, returns "null" scores (or runs into an assertion error if -ea is enabled, as in local tests).
Just wanted to mention this; maybe it is useful to protect against this with a better error message.
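
To make the edge case concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the Lucene or Elasticsearch code). It assumes the "null" scores come from a 0/0 division in the cosine formula, which yields NaN under IEEE 754 floating-point rules. The names `cosine_similarity` and `check_cosine_vector` are hypothetical, and the error message is only a suggestion of what a clearer failure could look like.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors.

    Returns NaN when either vector has zero magnitude, mimicking what
    0.0 / 0.0 produces in IEEE 754 arithmetic (plain Python would raise
    ZeroDivisionError instead).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm != 0.0 else math.nan


def check_cosine_vector(vector, field="my_vector"):
    """Hypothetical guard: reject zero-magnitude vectors up front.

    Illustrates the kind of validation that could replace the
    out-of-bounds and assertion errors with a readable message.
    """
    if all(x == 0 for x in vector):
        raise ValueError(
            f"The [cosine] similarity does not support vectors with zero "
            f"magnitude (field [{field}], vector {list(vector)})"
        )
    return vector


if __name__ == "__main__":
    # The query vector from the reproduction above has no defined cosine score.
    print(cosine_similarity([0, 0, 0], [3, 0, 6]))
    # A non-zero vector passes the guard unchanged.
    print(check_cosine_vector([0, 1, 0]))
```

The same guard would apply both to query vectors and to vectors at index time, since either side having zero magnitude makes the score undefined.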