Range Search) search. - CSDN Blog


"radius": 0.9, "range_filter": 0.5
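For the COSINE metric, larger values mean more similar: radius is the lower bound on similarity (exclusive) and range_filter is the upper bound (inclusive), so results satisfy radius < similarity <= range_filter.
For the IP metric, the bounds work the same way as for COSINE: radius < similarity <= range_filter.
For the L2 metric, smaller values mean closer, so the roles flip: radius is the upper bound on distance (exclusive) and range_filter is the lower bound (inclusive), so results satisfy range_filter <= distance < radius.
The sample pair above (radius 0.9, range_filter 0.5) is therefore valid for L2; for COSINE or IP, radius must be smaller than range_filter.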
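The rules above can be checked with a brute-force sketch in plain Python. It does not call Milvus: the helper names (`score`, `in_window`, `range_search`) are hypothetical, and the inclusive/exclusive bounds follow our reading of the Milvus range-search documentation. In Milvus itself the two values go inside the search `params`, for example `{"metric_type": "L2", "params": {"radius": 0.9, "range_filter": 0.5}}`.

```python
import math
from typing import List, Sequence, Tuple


def score(a: Sequence[float], b: Sequence[float], metric: str) -> float:
    """Distance (L2) or similarity (IP, COSINE) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    if metric == "L2":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "IP":
        return dot
    if metric == "COSINE":
        # Assumes neither vector is all zeros
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    raise ValueError(f"unsupported metric: {metric}")


def in_window(s: float, metric: str, radius: float, range_filter: float) -> bool:
    if metric == "L2":
        # Smaller is closer: keep range_filter <= distance < radius
        return range_filter <= s < radius
    # Larger is closer (IP, COSINE): keep radius < similarity <= range_filter
    return radius < s <= range_filter


def range_search(query, vectors, metric, radius, range_filter) -> List[Tuple[int, float]]:
    """Return (index, score) for every stored vector that falls inside the window."""
    hits = []
    for i, v in enumerate(vectors):
        s = score(query, v, metric)
        if in_window(s, metric, radius, range_filter):
            hits.append((i, s))
    return hits


vectors = [[0.0, 0.0], [0.6, 0.0], [0.8, 0.0], [1.0, 0.0]]
l2_hits = range_search([0.0, 0.0], vectors, "L2", radius=0.9, range_filter=0.5)
print([i for i, _ in l2_hits])  # [1, 2]
```

This scan touches every vector, so it only models which results qualify, not how Milvus finds them efficiently with an index.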

10. Upsert (update or insert) data. Upsert is a combination of insert and update. If the primary key exists, the row is updated. If the primary key does not exist, the row is inserted.

Upsert is useful for incremental updates to your data. For example, if you have a collection of news articles, you can upsert new articles as they are published.
Upsert is not atomic. If the upsert fails, some rows may have been updated and others not.
Upsert is not supported in Milvus Client.
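The insert-or-update rule can be modelled with a plain dictionary keyed by primary key. This is a hypothetical in-memory sketch, not Milvus code: it assumes an upsert replaces the whole row, and it does not model the partial-failure behaviour described above.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def upsert(table: Dict[Any, Row], rows: List[Row], pk: str = "id") -> Dict[str, int]:
    """Insert rows with a new primary key; replace rows whose key already exists."""
    counts = {"inserted": 0, "updated": 0}
    for row in rows:
        key = row[pk]
        counts["updated" if key in table else "inserted"] += 1
        # Whole-row replacement (assumption): fields absent from `row` are not kept
        table[key] = dict(row)
    return counts


# Hypothetical news-article collection keyed by "id"
articles = {1: {"id": 1, "title": "Draft headline"}}
result = upsert(articles, [{"id": 1, "title": "Final headline"}, {"id": 2, "title": "Breaking news"}])
print(result)  # {'inserted': 1, 'updated': 1}
```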

11. Query data. Query is similar to search, but instead of ranking vectors by their similarity to a query vector, it returns the entities (and their selected fields) that match a scalar filter expression.

Query is useful for retrieving data that matches specific metadata filters.
Query is not supported in Milvus Client.
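A metadata query can be modelled as a filter plus a projection, with no similarity scoring. The plain-Python sketch below is hypothetical and runs without Milvus; the lambda stands in for the boolean expression string Milvus accepts, so the pymilvus ORM call would look roughly like `collection.query(expr="year >= 2022", output_fields=["id", "title"])`, where `year` and `title` are made-up field names.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def query(rows: List[Row], predicate: Callable[[Row], bool], output_fields: List[str]) -> List[Row]:
    """Return the requested fields of every row that passes the scalar filter.

    No vector similarity is computed and no ranking is applied.
    """
    return [{f: row[f] for f in output_fields} for row in rows if predicate(row)]


# Hypothetical rows: scalar metadata stored alongside each vector
rows = [
    {"id": 1, "year": 2021, "title": "A", "vector": [0.1, 0.2]},
    {"id": 2, "year": 2023, "title": "B", "vector": [0.3, 0.4]},
]
print(query(rows, lambda r: r["year"] >= 2022, ["id", "title"]))  # [{'id': 2, 'title': 'B'}]
```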

Example Notebooks

Milvus Client - No-schema wrapper around Milvus collection.
Milvus Collection - Schema-based collection.
Zilliz Pipelines RAG - Quick way to try out Milvus with built-in embedding model and retrieval quality.


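Milvus Vector Database: Core Technology Analysis and Best-Practice Guide

Preface

In the era of AI and big data, the ability to process unstructured data has become essential. Milvus is an open-source vector database designed for efficient storage, indexing, and querying of embedding vectors, and it has become an important piece of infrastructure for building AI applications. This article walks through Milvus's core technical architecture and offers best practices that take you from getting started to running in production.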

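1. Milvus Core Concepts

1.1 What Is a Vector Database

A vector database is a database system built specifically to store and retrieve vector data. Unlike a traditional database, it can efficiently handle the embedding vectors produced by deep learning models; these vectors capture the semantic features of unstructured data such as text, images, and video.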

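Key characteristics:

Stores high-dimensional vectors (embeddings commonly have hundreds to thousands of dimensions, e.g. 768 or 1024)
Provides approximate nearest neighbor (ANN) search algorithms
Optimized for similarity search over very large vector collections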
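ANN search is easiest to understand next to the baseline it approximates: exact k-nearest-neighbor (k-NN) search, which scores the query against every stored vector. The sketch below is plain Python, not Milvus code, and uses L2 distance as an example metric. ANN algorithms give up a small amount of recall in exchange for examining far fewer vectors than this full scan.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[Tuple[float, int]]:
    """Exact k-nearest-neighbor search: compare the query with every stored vector."""
    # O(N * d) work per query, which is what ANN indexes try to avoid at scale
    scored = ((l2_distance(query, v), i) for i, v in enumerate(vectors))
    # nsmallest returns the k (distance, index) pairs sorted by ascending distance
    return heapq.nsmallest(k, scored)


vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.5, 0.0]]
print(exact_knn([0.0, 0.0], vectors, k=2))  # [(0.0, 0), (0.5, 3)]
```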

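1.2 Milvus Architecture

Milvus uses a layered architecture in which each layer can be scaled independently:

Access layer: exposes RESTful and gRPC interfaces
Coordinator service: handles cluster management and task scheduling
Worker nodes: do the actual work of executing queries and building indexes
Storage layer: persists data and logs

This separation gives Milvus good horizontal scalability and fault tolerance.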
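One way to see why independently scalable worker nodes help: if data is split across workers, each worker can compute a local top-k and a front-end node only has to merge the partial lists. The plain-Python sketch below illustrates this generic scatter-gather pattern; it is not Milvus's code, and the round-robin placement and function names are hypothetical.

```python
import heapq
import itertools
import math
from typing import List, Sequence, Tuple

Hit = Tuple[float, int]  # (distance, global row id)


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard_top_k(shard: List[Tuple[int, Sequence[float]]], query: Sequence[float], k: int) -> List[Hit]:
    """Exact top-k inside one shard, sorted by ascending distance."""
    return heapq.nsmallest(k, ((l2(query, v), i) for i, v in shard))


def scatter_gather_search(vectors: List[Sequence[float]], query: Sequence[float], k: int, num_shards: int) -> List[Hit]:
    # Scatter: place rows on shards round-robin (hypothetical placement policy)
    shards: List[List[Tuple[int, Sequence[float]]]] = [[] for _ in range(num_shards)]
    for i, v in enumerate(vectors):
        shards[i % num_shards].append((i, v))
    # Each shard answers independently; the global top-k is contained in the union
    partials = [shard_top_k(s, query, k) for s in shards]
    # Gather: k-way merge of the already-sorted partial lists, keep the first k
    return list(itertools.islice(heapq.merge(*partials), k))


vectors = [[float(i), 0.0] for i in range(10)]
print([i for _, i in scatter_gather_search(vectors, [3.2, 0.0], k=3, num_shards=4)])  # [3, 4, 2]
```

Because every shard returns its own top-k, the merged result matches a single-node search, while the per-shard work shrinks as shards are added.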

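2. Quick Start Guide

2.1 Environment Setup

Milvus supports several deployment options:

Local development: Docker containers or a Kubernetes cluster
Production: managed cloud services (available on AWS, GCP, Azure, and others)

For beginners, we recommend starting with Docker: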

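# Download the official standalone launch script (it runs the milvusdb/milvus image via Docker with embedded etcd)
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
# Start a standalone Milvus container listening on port 19530
bash standalone_embed.sh start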

2.2 Basic Workflow

Connect to the database

from pymilvus import connections
connections.connect("default", host="localhost", port="19530")

Create a collection

from pymilvus import CollectionSchema, FieldSchema, DataType, Collection

# Define the fields: an INT64 primary key and a 768-dimensional float vector
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("vector", DataType.FLOAT_VECTOR, dim=768)
]

# Build the schema and create the collection
schema = CollectionSchema(fields)
collection = Collection("my_collection", schema)

Insert data

import random
vectors = [[random.random() for _ in range(768)] for _ in range(1000)]
collection.insert([list(range(1000)), vectors])

Build an index

index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128}
}
collection.create_index("vector", index_params)

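Run a search

collection.load()  # a collection must be loaded into memory before it can be searched
query_vector = [random.random() for _ in range(768)]  # example query with the same dimension as the field
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
results = collection.search([query_vector], "vector", search_params, limit=5)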
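The `nlist` (index) and `nprobe` (search) parameters above belong to an inverted-file (IVF) index. The toy version below, in plain Python, shows what they control: the build step splits vectors into `nlist` clusters, and each query scans only the `nprobe` clusters nearest to it. It is a simplified sketch, not Milvus's implementation; in particular it samples centroids from the data instead of training them with k-means, and all function names are hypothetical.

```python
import heapq
import math
import random
from typing import Dict, List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors: List[List[float]], nlist: int, seed: int = 0):
    """Toy IVF index: nlist centroids, each owning an inverted list of row ids."""
    rng = random.Random(seed)
    # Simplification: sample centroids from the data instead of training them with k-means
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    lists: Dict[int, List[int]] = {c: [] for c in range(nlist)}
    for i, v in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return centroids, lists


def search_ivf(index, vectors: List[List[float]], query: Sequence[float], k: int, nprobe: int) -> List[Tuple[float, int]]:
    centroids, lists = index
    # Probe only the nprobe clusters whose centroids are closest to the query
    probe = heapq.nsmallest(nprobe, range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    # "FLAT": scan the raw, uncompressed vectors stored in the probed lists
    candidates = ((l2(query, vectors[i]), i) for c in probe for i in lists[c])
    return heapq.nsmallest(k, candidates)


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
index = build_ivf(data, nlist=16)
print(search_ivf(index, data, data[0], k=5, nprobe=4))
```

Raising `nprobe` scans more clusters, improving recall at the cost of latency; with `nprobe == nlist` the search becomes an exhaustive scan and returns the exact nearest neighbors.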

3. Core Technology Deep Dive

3.1 Index Types and Selection Strategy

Milvus supports several vector index algorithms:

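| Index type | Typical use case | Characteristics |
|---------|---------|------|
| IVF_FLAT | Medium-sized datasets | Balances accuracy and performance |
| HNSW | Large datasets | High recall; relatively high memory usage |
| ANNOY | Very large datasets | Supports memory mapping; suited to disk-based storage |
| SCANN | Quantization / compression workloads | High compression ratio; suited to resource-constrained deployments |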
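For the two most common options in the table, the helper below builds `create_index` and `search` parameter dicts in the same shape as the quick-start example. The numbers are illustrative starting values, not tuned recommendations, and the helper name `example_params` is hypothetical; ANNOY and SCANN are left out of this sketch.

```python
from typing import Any, Dict, Tuple

Params = Dict[str, Any]


def example_params(index_type: str, metric_type: str = "L2") -> Tuple[Params, Params]:
    """Return (index_params, search_params) shaped like the quick-start dicts."""
    if index_type == "IVF_FLAT":
        # nlist: clusters built at index time; nprobe: clusters scanned per query
        build, search = {"nlist": 128}, {"nprobe": 10}
    elif index_type == "HNSW":
        # M: max neighbors per graph node; efConstruction: build-time candidate pool;
        # ef: query-time candidate pool (should be at least the search limit)
        build, search = {"M": 16, "efConstruction": 200}, {"ef": 64}
    else:
        raise ValueError(f"no example parameters for {index_type}")
    index_params = {"index_type": index_type, "metric_type": metric_type, "params": build}
    search_params = {"metric_type": metric_type, "params": search}
    return index_params, search_params


hnsw_index, hnsw_search = example_params("HNSW", metric_type="COSINE")
# With a live collection these would be passed to
# collection.create_index("vector", hnsw_index) and collection.search(..., hnsw_search, ...)
print(hnsw_index)
```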

Selection guidance: