
Commit 85f7fd9

Knn vector rescoring to sort score docs (elastic#122653)
RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of doc ids and a corresponding array of scores for those docs. A binary search is performed over the docs array, and the global ids are converted back to segment-level ids (by subtracting the context docBase) when scoring docs.

RescoreKnnVectorQuery did not sort the array of docs, which caused the binary search to return non-deterministic results. That in turn made us look up the wrong docs, sometimes using out-of-bounds ids. One symptom was a DfsProfilerIT test failure, which triggered a Lucene assertion about a doc id being outside the range of the bitset of live docs.

The fix is simply to sort the score docs array before extracting doc ids and scores and providing them to KnnScoreDocQuery upon rewrite.

Relates to elastic#116663
Closes elastic#119711
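To illustrate the failure mode described above, here is a minimal sketch that uses only the JDK, not Lucene. The class name, the helper methods and the sample doc ids are hypothetical and chosen for illustration. `Arrays.binarySearch` documents its result as undefined when the input is not sorted, so a lookup over ids in score order may miss ids that are present, while the same lookup over a sorted copy finds each of them.

```java
import java.util.Arrays;

public class UnsortedBinarySearchDemo {

    /** Index of target in docIds, or a negative value when binary search does not find it. */
    static int lookup(int[] docIds, int target) {
        return Arrays.binarySearch(docIds, target);
    }

    /** Returns a copy of docIds in ascending order, as binary search requires. */
    static int[] sortedCopy(int[] docIds) {
        int[] copy = docIds.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical doc ids in score order, the way a top-k search returns them.
        int[] byScore = {42, 7, 19, 3, 28};
        int[] byDocId = sortedCopy(byScore);
        for (int doc : byScore) {
            // The unsorted lookup is undefined behaviour per the Javadoc; the sorted one is reliable.
            System.out.println("doc " + doc
                + ": unsorted lookup -> " + lookup(byScore, doc)
                + ", sorted lookup -> " + lookup(byDocId, doc));
        }
    }
}
```

Running `main` prints both lookups side by side; only the sorted column is guaranteed to return a valid index for every id.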
1 parent 1ff3021 commit 85f7fd9

3 files changed: 8 additions, 3 deletions

docs/changelog/122653.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 122653
+summary: Knn vector rescoring to sort score docs
+area: Vector Search
+type: bug
+issues:
+ - 119711

muted-tests.yml

Lines changed: 0 additions & 3 deletions
@@ -162,9 +162,6 @@ tests:
   issue: https://github.com/elastic/elasticsearch/issues/117740
 - class: org.elasticsearch.xpack.security.authc.ldap.MultiGroupMappingIT
   issue: https://github.com/elastic/elasticsearch/issues/119599
-- class: org.elasticsearch.search.profile.dfs.DfsProfilerIT
-  method: testProfileDfs
-  issue: https://github.com/elastic/elasticsearch/issues/119711
 - class: org.elasticsearch.multi_cluster.MultiClusterYamlTestSuiteIT
   issue: https://github.com/elastic/elasticsearch/issues/119983
 - class: org.elasticsearch.xpack.test.rest.XPackRestIT

server/src/main/java/org/elasticsearch/search/vectors/RescoreKnnVectorQuery.java

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,7 @@

 import java.io.IOException;
 import java.util.Arrays;
+import java.util.Comparator;
 import java.util.Objects;

 /**
@@ -60,6 +61,7 @@ public Query rewrite(IndexSearcher searcher) throws IOException {
         TopDocs topDocs = searcher.search(query, k);
         vectorOperations = topDocs.totalHits.value();
         ScoreDoc[] scoreDocs = topDocs.scoreDocs;
+        Arrays.sort(scoreDocs, Comparator.comparingInt(scoreDoc -> scoreDoc.doc));
         int[] docIds = new int[scoreDocs.length];
         float[] scores = new float[scoreDocs.length];
         for (int i = 0; i < scoreDocs.length; i++) {
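
The hunk above is cut off at the loop header. As a rough, standalone sketch of the pattern it sets up (not the actual Elasticsearch or Lucene code), the example below sorts hits by doc id, splits them into parallel `docIds` and `scores` arrays, and looks scores up by binary search. The `Hit` class is a hypothetical stand-in for Lucene's `ScoreDoc`, and `scoreOf` and `toSegmentDoc` are illustrative helpers. They are assumptions, not methods of `KnnScoreDocQuery`.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortedScoreDocsSketch {

    /** Hypothetical stand-in for Lucene's ScoreDoc: a global doc id and its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    final int[] docIds;   // global doc ids, ascending
    final float[] scores; // scores[i] belongs to docIds[i]

    SortedScoreDocsSketch(Hit[] hits) {
        // Sort a copy by doc id first so the two arrays stay aligned and searchable.
        Hit[] sorted = hits.clone();
        Arrays.sort(sorted, Comparator.comparingInt(hit -> hit.doc));
        docIds = new int[sorted.length];
        scores = new float[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            docIds[i] = sorted[i].doc;
            scores[i] = sorted[i].score;
        }
    }

    /** Score of a global doc id, or NaN when the id is not among the hits. */
    float scoreOf(int globalDoc) {
        int idx = Arrays.binarySearch(docIds, globalDoc);
        return idx >= 0 ? scores[idx] : Float.NaN;
    }

    /** Converts a global doc id to a segment-level id by subtracting the segment's docBase. */
    static int toSegmentDoc(int globalDoc, int docBase) {
        return globalDoc - docBase;
    }
}
```

Sorting the pairs together, rather than sorting the id array alone, keeps each score attached to its doc id after the reorder.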
