A Gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.

Features
- Dropdown menu for switching between models
- Notebook mode that resembles OpenAI's playground
- Chat mode for conversation and role playing
- Instruct mode compatible with Alpaca and Open Assistant formats
- Nice HTML output for GPT-4chan
- Markdown output for GALACTICA, including LaTeX rendering
- Custom chat characters
- Advanced chat features (send images, get audio responses with TTS)
- Very efficient text streaming
- Parameter presets
- 8-bit mode
- Layer splitting across GPU(s), CPU, and disk
- CPU mode
- FlexGen
- DeepSpeed ZeRO-3
- API with streaming and without streaming
- LLaMA model, including 4-bit GPTQ
- RWKV model
- LoRA (loading and training)
- Softprompts
- Extensions
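
To illustrate what "Alpaca format" means for instruct mode, here is a minimal sketch of the publicly documented Alpaca no-input prompt template. It is not the web UI's own code: `ALPACA_TEMPLATE` and `build_alpaca_prompt` are hypothetical names chosen for this example, and the web UI may assemble its prompts differently.

```python
# Alpaca "no input" prompt template, as published with the Stanford Alpaca dataset.
# ALPACA_TEMPLATE and build_alpaca_prompt are illustrative names, not part of this project.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca template, trimming stray whitespace."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


prompt = build_alpaca_prompt("Summarize the plot of Hamlet in one sentence.")
print(prompt)
```

The model is expected to continue the text after `### Response:`, which is why the template ends there rather than with a closing marker.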