|
| 1 | +[role="xpack"] |
| 2 | +[[rank-vectors]] |
| 3 | +=== Rank Vectors |
| 4 | +++++ |
| 5 | +<titleabbrev> Rank Vectors </titleabbrev> |
| 6 | +++++ |
| 7 | +experimental::[] |
| 8 | + |
| 9 | +The `rank_vectors` field type enables late-interaction dense vector scoring in Elasticsearch. The number of vectors |
| 10 | +per field can vary, but they must all share the same number of dimensions and element type. |
| 11 | + |
| 12 | +Vectors stored in this field are intended for second-order ranking of documents using max-sim similarity. |
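|  | + |
|  | +A sketch of max-sim, assuming `maxSimDotProduct` follows the standard late-interaction definition. The formula is an illustration and is not taken from the Elasticsearch source. For `m` query vectors `q_i` and `n` stored vectors `d_j`, each query vector keeps only its best dot product against the stored vectors, and these maxima are summed: |
|  | + |
|  | +[latexmath] |
|  | +++++ |
|  | +\operatorname{maxSim}(Q, D) = \sum_{i=1}^{m} \max_{1 \le j \le n} \; q_i \cdot d_j |
|  | +++++ |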
| 13 | + |
| 14 | +Here is a simple example of using this field with `float` elements. |
| 15 | + |
| 16 | +[source,console] |
| 17 | +-------------------------------------------------- |
| 18 | +PUT my-rank-vectors-float |
| 19 | +{ |
| 20 | + "mappings": { |
| 21 | + "properties": { |
| 22 | + "my_vector": { |
| 23 | + "type": "rank_vectors" |
| 24 | + } |
| 25 | + } |
| 26 | + } |
| 27 | +} |
| 28 | + |
| 29 | +PUT my-rank-vectors-float/_doc/1 |
| 30 | +{ |
| 31 | + "my_vector" : [[0.5, 10, 6], [-0.5, 10, 10]] |
| 32 | +} |
| 33 | + |
| 34 | +-------------------------------------------------- |
| 35 | +// TESTSETUP |
| 36 | + |
| 37 | +In addition to the `float` element type, `byte` and `bit` element types are also supported. |
| 38 | + |
| 39 | +Here is an example of using this field with `byte` elements. |
| 40 | + |
| 41 | +[source,console] |
| 42 | +-------------------------------------------------- |
| 43 | +PUT my-rank-vectors-byte |
| 44 | +{ |
| 45 | + "mappings": { |
| 46 | + "properties": { |
| 47 | + "my_vector": { |
| 48 | + "type": "rank_vectors", |
| 49 | + "element_type": "byte" |
| 50 | + } |
| 51 | + } |
| 52 | + } |
| 53 | +} |
| 54 | + |
| 55 | +PUT my-rank-vectors-byte/_doc/1 |
| 56 | +{ |
| 57 | + "my_vector" : [[1, 2, 3], [4, 5, 6]] |
| 58 | +} |
| 59 | +-------------------------------------------------- |
| 60 | + |
| 61 | +Here is an example of using this field with `bit` elements. |
| 62 | + |
| 63 | +[source,console] |
| 64 | +-------------------------------------------------- |
| 65 | +PUT my-rank-vectors-bit |
| 66 | +{ |
| 67 | + "mappings": { |
| 68 | + "properties": { |
| 69 | + "my_vector": { |
| 70 | + "type": "rank_vectors", |
| 71 | + "element_type": "bit" |
| 72 | + } |
| 73 | + } |
| 74 | + } |
| 75 | +} |
| 76 | + |
| 77 | +POST /my-rank-vectors-bit/_bulk?refresh |
| 78 | +{"index": {"_id" : "1"}} |
| 79 | +{"my_vector": [[127, -127, 0, 1, 42]]} |
| 80 | +{"index": {"_id" : "2"}} |
| 81 | +{"my_vector": [[-127, 0, 1, 42, 127], [127, -127, 0, 1, 42]]} |
| 82 | +-------------------------------------------------- |
| 83 | + |
| 84 | +[role="child_attributes"] |
| 85 | +[[rank-vectors-params]] |
| 86 | +==== Parameters for rank vectors fields |
| 87 | + |
| 88 | +The `rank_vectors` field type supports the following parameters: |
| 89 | + |
| 90 | +[[rank-vectors-element-type]] |
| 91 | +`element_type`:: |
| 92 | +(Optional, string) |
| 93 | +The data type used to encode vectors. The supported data types are |
| 94 | +`float` (default), `byte`, and `bit`. |
| 95 | + |
| 96 | +.Valid values for `element_type` |
| 97 | +[%collapsible%open] |
| 98 | +==== |
| 99 | +`float`::: |
| 100 | +indexes a 4-byte floating-point |
| 101 | +value per dimension. This is the default value. |
| 102 | + |
| 103 | +`byte`::: |
| 104 | +indexes a 1-byte integer value per dimension. |
| 105 | + |
| 106 | +`bit`::: |
| 107 | +indexes a single bit per dimension. Useful for very high-dimensional vectors or models that specifically support bit vectors. |
| 108 | +When using `bit`, the number of dimensions must be a multiple of 8 and counts bits, not bytes. For example, a 5-byte vector has 40 dimensions. |
| 109 | + |
| 110 | +==== |
| 111 | + |
| 112 | +`dims`:: |
| 113 | +(Optional, integer) |
| 114 | +Number of vector dimensions. Can't exceed `4096`. If `dims` is not specified, |
| 115 | +it will be set to the length of the first vector added to the field. |
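|  | + |
|  | +A minimal sketch that sets `dims` explicitly for `bit` vectors. The index name `my-rank-vectors-bit-dims` is hypothetical. Because `dims` counts bits, `40` dimensions means each vector is 5 bytes, written here as five byte values: |
|  | + |
|  | +[source,console] |
|  | +-------------------------------------------------- |
|  | +PUT my-rank-vectors-bit-dims |
|  | +{ |
|  | +  "mappings": { |
|  | +    "properties": { |
|  | +      "my_vector": { |
|  | +        "type": "rank_vectors", |
|  | +        "element_type": "bit", |
|  | +        "dims": 40 |
|  | +      } |
|  | +    } |
|  | +  } |
|  | +} |
|  | + |
|  | +PUT my-rank-vectors-bit-dims/_doc/1 |
|  | +{ |
|  | +  "my_vector": [[127, -127, 0, 1, 42], [1, 2, 3, 4, 5]] |
|  | +} |
|  | +-------------------------------------------------- |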
| 116 | + |
| 117 | +[[rank-vectors-synthetic-source]] |
| 118 | +==== Synthetic `_source` |
| 119 | + |
| 120 | +IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices |
| 121 | +(indices that have `index.mode` set to `time_series`). For other indices |
| 122 | +synthetic `_source` is in technical preview. Features in technical preview may |
| 123 | +be changed or removed in a future release. Elastic will work to fix |
| 124 | +any issues, but features in technical preview are not subject to the support SLA |
| 125 | +of official GA features. |
| 126 | + |
| 127 | +`rank_vectors` fields support <<synthetic-source,synthetic `_source`>>. |
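|  | + |
|  | +A minimal sketch of an index whose `_source` is synthesized, assuming a version in which synthetic `_source` is enabled through the `index.mapping.source.mode` index setting. Earlier versions configured `_source.mode` in the mapping instead. The index name `my-rank-vectors-synthetic` is hypothetical, and synthetic `_source` may depend on your license level: |
|  | + |
|  | +[source,console] |
|  | +-------------------------------------------------- |
|  | +PUT my-rank-vectors-synthetic |
|  | +{ |
|  | +  "settings": { |
|  | +    "index": { |
|  | +      "mapping": { |
|  | +        "source": { |
|  | +          "mode": "synthetic" |
|  | +        } |
|  | +      } |
|  | +    } |
|  | +  }, |
|  | +  "mappings": { |
|  | +    "properties": { |
|  | +      "my_vector": { |
|  | +        "type": "rank_vectors" |
|  | +      } |
|  | +    } |
|  | +  } |
|  | +} |
|  | +-------------------------------------------------- |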
| 128 | + |
| 129 | +[[rank-vectors-scoring]] |
| 130 | +==== Scoring with rank vectors |
| 131 | + |
| 132 | +Rank vectors can be accessed and used in <<query-dsl-script-score-query,`script_score` queries>>. |
| 133 | + |
| 134 | +For example, the following query scores documents by the max-sim dot-product similarity between the query vectors in `params.query_vector` and the vectors stored in the `my_vector` field: |
| 135 | + |
| 136 | +[source,console] |
| 137 | +-------------------------------------------------- |
| 138 | +GET my-rank-vectors-float/_search |
| 139 | +{ |
| 140 | + "query": { |
| 141 | + "script_score": { |
| 142 | + "query": { |
| 143 | + "match_all": {} |
| 144 | + }, |
| 145 | + "script": { |
| 146 | + "source": "maxSimDotProduct(params.query_vector, 'my_vector')", |
| 147 | + "params": { |
| 148 | + "query_vector": [[0.5, 10, 6], [-0.5, 10, 10]] |
| 149 | + } |
| 150 | + } |
| 151 | + } |
| 152 | + } |
| 153 | +} |
| 154 | +-------------------------------------------------- |
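|  | + |
|  | +As a worked check, assume that max-sim sums each query vector's largest dot product with the stored vectors. This is an assumption about `maxSimDotProduct`, not a statement of its implementation. Document 1 stores `d_1 = [0.5, 10, 6]` and `d_2 = [-0.5, 10, 10]`, and the query uses the same two vectors as `q_1` and `q_2`: |
|  | + |
|  | +[latexmath] |
|  | +++++ |
|  | +\begin{aligned} |
|  | +\max(q_1 \cdot d_1,\; q_1 \cdot d_2) &= \max(136.25,\; 159.75) = 159.75 \\ |
|  | +\max(q_2 \cdot d_1,\; q_2 \cdot d_2) &= \max(159.75,\; 200.25) = 200.25 \\ |
|  | +\operatorname{maxSim}(Q, D) &= 159.75 + 200.25 = 360 |
|  | +\end{aligned} |
|  | +++++ |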
| 155 | + |
| 156 | +Asymmetric similarity functions can also score `bit` vectors against floating-point query vectors. For example, the following query scores documents by the `maxSimDotProduct` similarity between floating-point query vectors and the bit vectors stored in the `my_vector` field: |
| 157 | + |
| 158 | +[source,console] |
| 159 | +-------------------------------------------------- |
| 160 | +PUT my-rank-vectors-bit |
| 161 | +{ |
| 162 | + "mappings": { |
| 163 | + "properties": { |
| 164 | + "my_vector": { |
| 165 | + "type": "rank_vectors", |
| 166 | + "element_type": "bit" |
| 167 | + } |
| 168 | + } |
| 169 | + } |
| 170 | +} |
| 171 | + |
| 172 | +POST /my-rank-vectors-bit/_bulk?refresh |
| 173 | +{"index": {"_id" : "1"}} |
| 174 | +{"my_vector": [[127, -127, 0, 1, 42]]} |
| 175 | +{"index": {"_id" : "2"}} |
| 176 | +{"my_vector": [[-127, 0, 1, 42, 127], [127, -127, 0, 1, 42]]} |
| 177 | + |
| 178 | +GET my-rank-vectors-bit/_search |
| 179 | +{ |
| 180 | + "query": { |
| 181 | + "script_score": { |
| 182 | + "query": { |
| 183 | + "match_all": {} |
| 184 | + }, |
| 185 | + "script": { |
| 186 | + "source": "maxSimDotProduct(params.query_vector, 'my_vector')", |
| 187 | + "params": { |
| 188 | + "query_vector": [ |
| 189 | + [0.35, 0.77, 0.95, 0.15, 0.11, 0.08, 0.58, 0.06, 0.44, 0.52, 0.21, |
| 190 | + 0.62, 0.65, 0.16, 0.64, 0.39, 0.93, 0.06, 0.93, 0.31, 0.92, 0.0, |
| 191 | + 0.66, 0.86, 0.92, 0.03, 0.81, 0.31, 0.2 , 0.92, 0.95, 0.64, 0.19, |
| 192 | + 0.26, 0.77, 0.64, 0.78, 0.32, 0.97, 0.84] |
| 193 | + ] <1> |
| 194 | + } |
| 195 | + } |
| 196 | + } |
| 197 | + } |
| 198 | +} |
| 199 | +-------------------------------------------------- |
| 200 | +<1> The query vector has 40 elements, one for each bit of the 5-byte vectors stored in `my_vector`. |
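|  | + |
|  | +Because these vectors are meant for second-order ranking, `maxSimDotProduct` can also be applied only to the top hits of a cheaper first-stage query. The following sketch uses the search `rescore` section. The `match_all` first stage and the `window_size` of `10` are placeholders, and `query_weight: 0` makes the order within the rescored window depend only on the max-sim score: |
|  | + |
|  | +[source,console] |
|  | +-------------------------------------------------- |
|  | +GET my-rank-vectors-float/_search |
|  | +{ |
|  | +  "query": { |
|  | +    "match_all": {} |
|  | +  }, |
|  | +  "rescore": { |
|  | +    "window_size": 10, |
|  | +    "query": { |
|  | +      "rescore_query": { |
|  | +        "script_score": { |
|  | +          "query": { |
|  | +            "match_all": {} |
|  | +          }, |
|  | +          "script": { |
|  | +            "source": "maxSimDotProduct(params.query_vector, 'my_vector')", |
|  | +            "params": { |
|  | +              "query_vector": [[0.5, 10, 6], [-0.5, 10, 10]] |
|  | +            } |
|  | +          } |
|  | +        } |
|  | +      }, |
|  | +      "query_weight": 0.0, |
|  | +      "rescore_query_weight": 1.0 |
|  | +    } |
|  | +  } |
|  | +} |
|  | +-------------------------------------------------- |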
| 201 | + |