Web search in AnythingLLM with SearXNG

1. Deploy SearXNG with docker-compose

GitHub - searxng/searxng-docker: The docker-compose files for setting up a SearXNG instance with docker.

cd /usr/local
git clone https://github.com/searxng/searxng-docker.git
cd searxng-docker

2. Edit the .env file

# By default listen on https://localhost
# To change this:
# * uncomment SEARXNG_HOSTNAME, and replace <host> by the SearXNG hostname
# * uncomment LETSENCRYPT_EMAIL, and replace <email> by your email (require to create a Let's Encrypt certificate)

SEARXNG_HOSTNAME=172.16.50.25
LETSENCRYPT_EMAIL=381599113@qq.com

# Optional:
# If you run a very small or a very large instance, you might want to change the amount of used uwsgi workers and threads per worker
# More workers (= processes) means that more search requests can be handled at the same time, but it also causes more resource usage

# SEARXNG_UWSGI_WORKERS=4
# SEARXNG_UWSGI_THREADS=4

3. Run the following command to generate the secret key

sed -i "s|ultrasecretkey|$(openssl rand -hex 32)|g" searxng/settings.yml
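
The sed command above replaces the placeholder ultrasecretkey in searxng/settings.yml with 32 random bytes written as hex. On a host without openssl or GNU sed, the same substitution could be sketched in Python as follows. The function names replace_secret and rewrite_settings are illustrative assumptions and are not part of searxng-docker.

```python
import secrets
from pathlib import Path

PLACEHOLDER = "ultrasecretkey"  # placeholder value shipped in searxng/settings.yml


def replace_secret(text: str) -> str:
    """Return text with every placeholder replaced by one fresh 64-character hex key."""
    key = secrets.token_hex(32)  # 32 random bytes as hex, comparable to `openssl rand -hex 32`
    return text.replace(PLACEHOLDER, key)


def rewrite_settings(path: str = "searxng/settings.yml") -> None:
    """Hypothetical helper: apply replace_secret to the settings file in place."""
    settings = Path(path)
    updated = replace_secret(settings.read_text(encoding="utf-8"))
    settings.write_text(updated, encoding="utf-8")
```

Calling rewrite_settings() from the searxng-docker directory would have the same effect as the sed one-liner; running it a second time changes nothing, because the placeholder is no longer present.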

4. Edit the Caddyfile so that Caddy's default port 80 does not conflict with a service already using it

{
	admin off
	http_port 8880
	log {
		output stderr
		format filter {
			# Preserves first 8 bits from IPv4 and 32 bits from IPv6
			request>remote_ip ip_mask 8 32
			request>client_ip ip_mask 8 32

			# Remove identificable information
			request>remote_port delete
			request>headers delete
			request>uri query {
				delete url
				delete h
				delete q
			}
		}
	}
}
172.16.50.25:8880 {


tls {$SEARXNG_TLS}

encode zstd gzip

@api {
	path /config
	path /healthz
	path /stats/errors
	path /stats/checker
}

@search {
	path /search
}

@imageproxy {
	path /image_proxy
}

@static {
	path /static/*
}

header {
	# CSP (https://content-security-policy.com)
	Content-Security-Policy "upgrade-insecure-requests; default-src 'none'; script-src 'self'; style-src 'self' 'unsafe-inline'; form-action 'self' https://github.com/searxng/searxng/issues/new; font-src 'self'; frame-ancestors 'self'; base-uri 'self'; connect-src 'self' https://overpass-api.de; img-src * data:; frame-src https://www.youtube-nocookie.com https://player.vimeo.com https://www.dailymotion.com https://www.deezer.com https://www.mixcloud.com https://w.soundcloud.com https://embed.spotify.com;"

	# Disable some browser features
	Permissions-Policy "accelerometer=(),camera=(),geolocation=(),gyroscope=(),magnetometer=(),microphone=(),payment=(),usb=()"

	# Set referrer policy
	Referrer-Policy "no-referrer"

	# Force clients to use HTTPS
	Strict-Transport-Security "max-age=31536000"

	# Prevent MIME type sniffing from the declared Content-Type
	X-Content-Type-Options "nosniff"

	# X-Robots-Tag (comment to allow site indexing)
	X-Robots-Tag "noindex, noarchive, nofollow"

	# Remove "Server" header
	-Server
}

header @api {
	Access-Control-Allow-Methods "GET, OPTIONS"
	Access-Control-Allow-Origin "*"
}

route {
	# Cache policy
	header Cache-Control "max-age=0, no-store"
	header @search Cache-Control "max-age=5, private"
	header @imageproxy Cache-Control "max-age=604800, public"
	header @static Cache-Control "max-age=31536000, public, immutable"
}

# SearXNG (uWSGI)
reverse_proxy localhost:8080 {
	header_up X-Forwarded-Port {http.request.port}
	header_up X-Real-IP {http.request.remote.host}

	# https://github.com/searx/searx-docker/issues/24
	header_up Connection "close"
}
}
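
Before settling on a value such as 8880 for http_port, it can help to check whether the port is already taken on the host. The following is a minimal sketch using only the standard library; the function name port_in_use is an assumption made for illustration. Note that an unprivileged user cannot bind ports below 1024, so for port 80 the check reports "in use" unless it is run as root.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails, for example because a service already listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Covers "address already in use" as well as "permission denied".
            return True
    return False


if __name__ == "__main__":
    for candidate in (80, 8880, 8080):
        state = "busy or not permitted" if port_in_use(candidate) else "free"
        print(f"port {candidate}: {state}")
```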

5. The docker-compose configuration file

version: "3.7"

services:
  caddy:
    container_name: caddy
    image: docker.io/library/caddy:2-alpine
    network_mode: host
    restart: always
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data:rw
      - caddy-config:/config:rw
    environment:
      - SEARXNG_HOSTNAME=${SEARXNG_HOSTNAME:-http://localhost}
      - SEARXNG_TLS=${LETSENCRYPT_EMAIL:-internal}
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "1"

  redis:
    container_name: redis
    image: docker.io/valkey/valkey:8-alpine
    command: valkey-server --save 30 1 --loglevel warning
    restart: always
    networks:
      - searxng
    volumes:
      - valkey-data2:/data
    cap_drop:
      - ALL
    cap_add:
      - SETGID
      - SETUID
      - DAC_OVERRIDE
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    restart: always
    networks:
      - searxng
    ports:
      - "8080:8080"
    volumes:
      - ./searxng:/etc/searxng:rw
    environment:
      - SEARXNG_BASE_URL=https://${SEARXNG_HOSTNAME:-localhost}/
      - UWSGI_WORKERS=${SEARXNG_UWSGI_WORKERS:-4}
      - UWSGI_THREADS=${SEARXNG_UWSGI_THREADS:-4}
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "1"

networks:
  searxng:

volumes:
  caddy-data:
  caddy-config:
  valkey-data2:

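If you do not need a reverse proxy, the caddy service in this file can be removed.

SEARXNG_BASE_URL is the base URL of the SearXNG service. It defines the protocol, host name, and port under which the instance is reached from outside. It directly affects how links in search results are generated, where static resources are loaded from, and how the instance fits behind a reverse proxy, so it is the key setting for keeping internal and external access consistent.

Local debugging scenario
If the instance is only accessed locally and HTTPS is not enabled, it can be set to: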

  - SEARXNG_BASE_URL=http://172.16.50.25:8080/
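
Since the base URL is just protocol, host, and port joined together with a trailing slash, the value can be assembled and checked programmatically. The sketch below is illustrative only; make_base_url and describe are hypothetical helper names, and the rule of dropping the default port (80 for http, 443 for https) is a convention chosen here, not something SearXNG requires.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def make_base_url(scheme: str, host: str, port: Optional[int] = None) -> str:
    """Build a base URL with a trailing slash, omitting the scheme's default port."""
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme}")
    netloc = host if port in (None, DEFAULT_PORTS[scheme]) else f"{host}:{port}"
    return f"{scheme}://{netloc}/"


def describe(base_url: str) -> dict:
    """Split a base URL back into the three parts the setting encodes."""
    parts = urlsplit(base_url)
    return {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}


if __name__ == "__main__":
    url = make_base_url("http", "172.16.50.25", 8080)
    print(url, describe(url))
```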

On the first run, you must remove cap_drop: - ALL from the docker-compose.yaml file for the searxng service to successfully create /etc/searxng/uwsgi.ini. This is necessary because the cap_drop: - ALL directive removes all capabilities, including those required for the creation of the uwsgi.ini file. After the first run, you should re-add cap_drop: - ALL to the docker-compose.yaml file for security reasons.

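6. Configure settings.yml

By default the project returns search results as HTML. When the API is called by a web crawler or some other kind of parser, JSON is usually wanted instead, so the list of allowed response formats has to be extended.

Add the following: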

search:
  formats:
  - html
  - csv
  - json
  - rss

The complete file then looks like this:

# see https://docs.searxng.org/admin/settings/settings.html#settings-use-default-settings
use_default_settings: true
server:
  # base_url is defined in the SEARXNG_BASE_URL environment variable, see .env and docker-compose.yml
  secret_key: "58e0507b6a5428c2f87e26fc83509028f53a12daa1800448dcd7c47c19c7bb6e"  # change this!
  limiter: false  # can be disabled for a private instance
  image_proxy: true
ui:
  static_use_hash: true
redis:
  url: redis://redis:6379/0
  
search:
  formats:
  - html
  - csv
  - json
  - rss

Configuration reference

Step by step installation — SearXNG Documentation (2025.3.22+5986629c6)

Test the JSON response

http://172.16.50.25:8080/search?q=deepseek&format=json
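
The same check can be scripted. The following is a minimal sketch using only the Python standard library. It assumes the JSON body carries a top-level "results" list whose entries have "title" and "url" fields; the function names are illustrative. If json is not listed under search.formats, expect the server to reject the request instead of returning results.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, fmt: str = "json") -> str:
    """Compose the /search URL with URL-encoded q and format parameters."""
    params = urlencode({"q": query, "format": fmt})
    return f"{base_url.rstrip('/')}/search?{params}"


def extract_hits(payload: str, limit: int = 5) -> list:
    """Return (title, url) pairs from a JSON response body.

    Assumes a top-level "results" list with "title" and "url" keys.
    """
    data = json.loads(payload)
    return [(r.get("title"), r.get("url")) for r in data.get("results", [])[:limit]]


def search(base_url: str, query: str, timeout: float = 10.0) -> list:
    """Query the instance and return the first few hits (performs a network request)."""
    with urlopen(build_search_url(base_url, query), timeout=timeout) as resp:
        return extract_hits(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for title, url in search("http://172.16.50.25:8080/", "deepseek"):
        print(title, url)
```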

7. Configure web search in AnythingLLM

8. Workspace configuration

9. Usage

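### Analysis of AnythingLLM's web search capability

Current descriptions of the DeepSeek V2 series indicate that these models already support web search[^1]. Note, however, that AnythingLLM is not mentioned directly in the cited material, so its actual implementation may differ.

In general, a large language model (LLM) can gain web search capability through the following techniques:

#### 1. **Post-training**

DeepSeek reports that post-training improved performance in several areas, including the newly added web search capability. This means that after targeted training the model is better able to understand web data and work it into generated content. If AnythingLLM adopts a similar strategy, its mechanism is presumably to process data fetched in real time and then return the result to the user.

#### 2. **Hybrid architecture**

Another common approach combines local computation with online resources. As the second reference notes, a "local + online" hybrid architecture is used to build more flexible and reliable applications[^2]. In this mode the core inference logic still runs on the user's device or on a private server, while up-to-date information from the internet is downloaded first and then passed to the LLM as supplementary input for the final answer.

The following simple pseudocode shows how the two parts can be combined (the helper functions are placeholders and are not defined here):

```python
def anythingllm_query(prompt, enable_web_search=False):
    local_model_output = run_local_llm(prompt)  # run inference on the local model
    if not enable_web_search:
        return local_model_output
    web_results = perform_web_search(prompt)  # search the web for the prompt; returns a list of content snippets
    combined_input = merge_outputs(local_model_output, web_results)  # merge both into a new context
    final_answer = refine_with_additional_context(combined_input)  # refine the answer using the added information
    return final_answer
```

When `enable_web_search` is set, the function no longer relies on the offline model alone; it also collects additional material from publicly available sources to improve the quality of the answer.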