Memory-optimized search
Introduced 3.1
Memory-optimized search allows the Faiss engine to run efficiently without loading the entire vector index into off-heap memory. Without this optimization, Faiss typically loads the full index into memory, which can become unsustainable when the index size exceeds the available physical memory. With memory-optimized search, the engine memory-maps the index file and relies on the operating system's file cache to serve search requests, so repeated reads are served directly from the system cache instead of triggering additional disk I/O.
Memory-optimized search affects only search operations. Indexing behavior remains unchanged.
Limitations
The following limitations apply to memory-optimized search in OpenSearch:
- Supported only for the Faiss engine with the HNSW method
- Does not support IVF or product quantization (PQ)
- Requires closing and reopening the index to enable or disable the setting
If you use IVF or PQ, the engine loads data into memory regardless of whether memory-optimized mode is enabled.
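Before enabling the setting, you can confirm that a field uses a supported configuration by retrieving the index mapping. The following request uses my_index as a placeholder index name; check that the knn_vector field specifies the faiss engine with the hnsw method and does not use a pq encoder:

GET /my_index/_mapping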
Configuration
To enable memory-optimized search, set index.knn.memory_optimized_search to true when creating an index:
PUT /test_index
{
  "settings": {
    "index.knn": true,
    "index.knn.memory_optimized_search": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss"
        }
      }
    }
  }
}
To enable memory-optimized search on an existing index, you must close the index, update the setting, and then reopen the index:
POST /test_index/_close

PUT /test_index/_settings
{
  "index.knn.memory_optimized_search": true
}

POST /test_index/_open
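After reopening the index, you can optionally confirm that the setting took effect by retrieving the index settings:

GET /test_index/_settings

Because the setting was explicitly applied, the response lists "index.knn.memory_optimized_search": "true" for the index.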
Integration with disk-based search
When you configure a field with on_disk mode and 1x compression, memory-optimized search is automatically enabled for that field, even if memory optimization isn't enabled at the index level. For more information, see Memory-optimized vectors.
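For example, the following request (the index name is illustrative) creates a field in on_disk mode with 1x compression; memory-optimized search is enabled for this field even though index.knn.memory_optimized_search is not set:

PUT /disk_mode_index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk",
        "compression_level": "1x"
      }
    }
  }
}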
Memory-optimized search differs from disk-based search because it doesn’t use compression or quantization. It only changes how vector data is loaded and accessed during search.
Performance optimization
When memory-optimized search is enabled, the warm-up API loads only the essential information needed for search operations, such as opening streams to the underlying Faiss index file. This minimal warm-up results in:
- Faster initial searches.
- Reduced memory overhead.
- More efficient resource utilization.
For fields where memory-optimized search is disabled, the warm-up process loads vectors into off-heap memory.
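For example, you can warm up the index created earlier in this section by calling the k-NN warm-up API. With memory-optimized search enabled, this performs only the lightweight preparation described above instead of loading all vectors into off-heap memory:

GET /_plugins/_knn/warmup/test_index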