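Below is a complete example of an Elasticsearch Ingest Pipeline that calls an external embedding service when documents are indexed. It covers the code, configuration notes, and optimization suggestions:
1. Create the Ingest Pipeline (Python example)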
python
from elasticsearch import Elasticsearch
from elasticsearch.client.ingest import IngestClient

# Connect to the Elasticsearch cluster
es = Elasticsearch(
    hosts=["https://your-es-cluster:9200"],
    http_auth=("username", "password"),
    verify_certs=True
)

# Create the Ingest Pipeline
pipeline_body = {
    "description": "Generate text embeddings via external API",
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": """
                    // Text preprocessing: clean special characters and limit length
                    def text = ctx.content_text;
                    if (text != null) {
                        // Remove non-text characters