Memory-optimized search

Introduced 3.1

Memory-optimized search allows the Faiss engine to run efficiently without loading the entire vector index into off-heap memory. Without this optimization, Faiss typically loads the full index into memory, which becomes unsustainable when the index is larger than the available physical memory. With memory-optimized search, the engine memory-maps the index file and relies on the operating system’s file cache to serve search requests. This approach avoids unnecessary I/O because repeated reads are served directly from the system cache.

Memory-optimized search affects only search operations. Indexing behavior remains unchanged.

Limitations

The following limitations apply to memory-optimized search in OpenSearch:

If you use the inverted file (IVF) method or product quantization (PQ), the engine loads data into memory regardless of whether memory-optimized mode is enabled.

Configuration

To enable memory-optimized search, set index.knn.memory_optimized_search to true when creating an index:

PUT /test_index
{
  "settings": {
    "index.knn": true,
    "index.knn.memory_optimized_search": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss"
        }
      }
    }
  }
}

To enable memory-optimized search on an existing index, you must close the index, update the setting, and then reopen the index:

POST /test_index/_close

PUT /test_index/_settings
{
  "index.knn.memory_optimized_search": true
}

POST /test_index/_open
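After reopening the index, you may want to confirm that the setting was applied. The following request is a minimal sketch that uses the standard get settings API, filtered to the single setting name; it assumes the index is named test_index, as in the preceding examples:

```json
GET /test_index/_settings/index.knn.memory_optimized_search
```

If the update succeeded, the response should list the setting for test_index with the value you set.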

When you configure a field with on_disk mode and 1x compression, memory-optimized search is automatically enabled for that field, even if memory optimization isn’t enabled at the index level. For more information, see Memory-optimized vectors.
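As an illustration of the field-level case described above, the following sketch creates an index in which only the field mapping requests on_disk mode with 1x compression. The index name test_index_on_disk is hypothetical, and the mapping omits the method definition on the assumption that the defaults are acceptable for your use case:

```json
PUT /test_index_on_disk
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk",
        "compression_level": "1x"
      }
    }
  }
}
```

Note that this example does not set index.knn.memory_optimized_search at the index level; the field configuration alone is what enables memory-optimized search for vector_field.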

Memory-optimized search differs from disk-based search because it doesn’t use compression or quantization. It only changes how vector data is loaded and accessed during search.

Performance optimization

When memory-optimized search is enabled, the warm-up API loads only the essential information needed for search operations, such as opening streams to the underlying Faiss index file. This minimal warm-up results in:

  • Faster initial searches.
  • Reduced memory overhead.
  • More efficient resource utilization.

For fields where memory-optimized search is disabled, the warm-up process loads vectors into off-heap memory.
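For reference, a warm-up request for the index created earlier might look like the following. This sketch uses the k-NN warm-up endpoint and assumes the index is named test_index; the same request applies whether or not memory-optimized search is enabled, and only the amount of data loaded differs:

```json
GET /_plugins/_knn/warmup/test_index?pretty
```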
