Memory-optimized search
Introduced 3.1
Memory-optimized search allows the Faiss engine to run efficiently without loading the entire vector index into off-heap memory. Without this optimization, Faiss typically loads the full index into memory, which can become unsustainable when the index size exceeds the available physical memory. With memory-optimized search, the engine memory-maps the index file and relies on the operating system's file cache to serve search requests, so repeated reads are served directly from the system cache instead of triggering additional disk I/O.
Memory-optimized search affects only search operations. Indexing behavior remains unchanged.
Limitations
The following limitations apply to memory-optimized search in OpenSearch:
- Supported only for the Faiss engine with the HNSW method
- Does not support IVF or product quantization (PQ)
- Requires closing and reopening the index to enable or disable the setting
If you use IVF or PQ, the engine loads data into memory regardless of whether memory-optimized mode is enabled.
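Before enabling the setting, you can confirm that a field uses a supported configuration by retrieving the index mapping. The following request uses my_index as a placeholder index name; check that the knn_vector field specifies the faiss engine with the hnsw method and does not use a pq encoder:

GET /my_index/_mapping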
Configuration
To enable memory-optimized search, set index.knn.memory_optimized_search to true when creating an index:
PUT /test_index
{
  "settings": {
    "index.knn": true,
    "index.knn.memory_optimized_search": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss"
        }
      }
    }
  }
}
To enable memory-optimized search on an existing index, you must close the index, update the setting, and then reopen the index:
POST /test_index/_close

PUT /test_index/_settings
{
  "index.knn.memory_optimized_search": true
}

POST /test_index/_open
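After reopening the index, you can optionally confirm that the setting took effect by retrieving the index settings:

GET /test_index/_settings

Because the setting was explicitly applied, the response lists "index.knn.memory_optimized_search": "true" for the index.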
Integration with disk-based search
When you configure a field with on_disk mode and 1x compression, memory-optimized search is automatically enabled for that field, even if memory optimization isn't enabled at the index level. For more information, see Memory-optimized vectors.
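For example, the following request (the index name is illustrative) creates a field in on_disk mode with 1x compression; memory-optimized search is enabled for this field even though index.knn.memory_optimized_search is not set:

PUT /disk_mode_index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk",
        "compression_level": "1x"
      }
    }
  }
}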
Memory-optimized search differs from disk-based search because it doesn’t use compression or quantization. It only changes how vector data is loaded and accessed during search.
Performance optimization
When memory-optimized search is enabled, the warm-up API loads only the essential information needed for search operations, such as opening streams to the underlying Faiss index file. This minimal warm-up results in:
- Faster initial searches.
- Reduced memory overhead.
- More efficient resource utilization.
For fields where memory-optimized search is disabled, the warm-up process loads vectors into off-heap memory.
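For example, you can warm up the index created earlier in this section by calling the k-NN warm-up API. With memory-optimized search enabled, this performs only the lightweight preparation described above instead of loading all vectors into off-heap memory:

GET /_plugins/_knn/warmup/test_index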