LLM Cache Proxy is a FastAPI-based application that serves as a caching layer for OpenAI's API. It intercepts requests to the OpenAI API, caches responses, and serves cached responses for identical requests, potentially reducing API costs and improving response times.
```bash
# Run the container
docker run -p 9999:9999 \
  -v $(pwd)/data:/app/data \
  ghcr.io/so2liu/llm-cache-server:latest
```

OR

```bash
# Run the container with custom API key
docker run -p 9999:9999 \
  -e OPENAI_API_KEY=your_api_key_here \
  -e OPENAI_BASE_URL=https://api.openai.com/v1 \
  -v $(pwd)/data:/app/data \
  ghcr.io/so2liu/llm-cache-server:latest
```

The proxy is now available at http://localhost:9999.

- Use `/cache/chat/completions` for cached requests
- Use `/chat/completions` for uncached requests
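As a quick sanity check, you can send any OpenAI-style chat completion request to the cached endpoint. The sketch below uses the `requests` library and assumes the proxy relies on the server-side `OPENAI_API_KEY` (so no `Authorization` header is sent from the client); the model name is a placeholder, so adjust it to one you have access to.

```python
# Minimal sketch: send one chat completion request through the cached endpoint.
# Assumptions (not from this README): the proxy authenticates upstream with the
# server-side OPENAI_API_KEY, and "gpt-4o-mini" is just a placeholder model name.
import requests

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

resp = requests.post(
    "http://localhost:9999/cache/chat/completions", json=payload, timeout=60
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```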
- Caches responses from OpenAI's API
- Supports both streaming and non-streaming responses (see the streaming sketch after this list)
- Compatible with OpenAI's chat completion endpoint
- Configurable via environment variables
- Dockerized for easy deployment
- Persistent cache storage
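The streaming sketch below shows how a client might consume a streamed response through the proxy. It assumes the proxy relays OpenAI-style server-sent events (`data: {...}` lines terminated by `data: [DONE]`); the model name is again a placeholder.

```python
# Sketch: stream a chat completion through the cached endpoint, assuming the
# proxy forwards OpenAI-style SSE chunks unchanged.
import json
import requests

payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "Count to five."}],
    "stream": True,
}

with requests.post(
    "http://localhost:9999/cache/chat/completions",
    json=payload,
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Each SSE event arrives as a line of the form "data: {json chunk}".
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```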
- Python 3.12+
- Docker (optional, for containerized deployment)
- Clone the repository:

  ```bash
  git clone https://github.com/so2liu/llm-cache-server.git
  cd llm-cache-server
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
Set the following environment variables:
- `OPENAI_API_KEY`: Your OpenAI API key
- `OPENAI_BASE_URL`: The base URL for OpenAI's API (default: `https://api.openai.com/v1`)
You can set these in a .env file in the project root.
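For example, a minimal `.env` could look like the following (both values are placeholders):

```
OPENAI_API_KEY=sk-your_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1
```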
- Start the server:

  ```bash
  python -m app.main
  ```

- The server will be available at `http://localhost:9999`
Recommended: Use the pre-built image from GitHub Container Registry:

```bash
docker run -p 9999:9999 \
  -e OPENAI_API_KEY=your_api_key_here \
  -e OPENAI_BASE_URL=https://api.openai.com/v1 \
  -v $(pwd)/data:/app/data \
  ghcr.io/so2liu/llm-cache-server:latest
```

Or build locally:

- Build the Docker image:

  ```bash
  docker build -t llm-cache-server .
  ```

- Run the container with persistent storage:

  ```bash
  docker run -p 9999:9999 \
    -e OPENAI_API_KEY=your_api_key_here \
    -e OPENAI_BASE_URL=https://api.openai.com/v1 \
    -v $(pwd)/data:/app/data \
    llm-cache-server
  ```

  This command mounts a `data` directory from your current working directory to the `/app/data` directory in the container, ensuring that the cache persists between container restarts.
- `/chat/completions`: Proxies requests to OpenAI's chat completion API without caching
- `/cache/chat/completions`: Proxies requests to OpenAI's chat completion API with caching
Both endpoints accept the same parameters as OpenAI's chat completion API.
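To see the cache in action, you can send the identical request to `/cache/chat/completions` twice and compare wall-clock times; per the description above, identical requests are served from the cache, so the second call is expected to return faster. This is a sketch under the same assumptions as the earlier examples (server-side API key, placeholder model name).

```python
# Sketch: issue the same payload twice against the cached endpoint and time both
# calls; the second call is expected to be answered from the persistent cache.
import time
import requests

payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "Give me a one-line fun fact."}],
    "temperature": 0,  # keep the payload identical so the cache can match it
}

for label in ("first call (expected miss)", "second call (expected hit)"):
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:9999/cache/chat/completions", json=payload, timeout=60
    )
    resp.raise_for_status()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
```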
To run the application in verbose mode, use the `--verbose` flag:

```bash
python -m app.main --verbose
```
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.