> [!IMPORTANT]
> A free large language model (LLM) plugin that allows you to interact with LLMs in Neovim.
>
> - Supports any LLM, such as GPT, GLM, Kimi, DeepSeek, or local LLMs (such as ollama).
> - Allows you to define your own AI tools, with different tools able to use different models.
> - Most importantly, you can use free models provided by any platform (such as Cloudflare, GitHub Models, SiliconFlow, OpenRouter, or other platforms).
> [!NOTE]
> The configurations of different LLMs (such as ollama or DeepSeek), the UI configuration, and the AI tools (including code completion) are covered in the examples, where you will find most of the information you need. Additionally, before using the plugin, make sure your LLM_KEY is valid and that the environment variable is in effect.
- virtual text
- blink.cmp or nvim-cmp
(Screenshots: streaming output and non-streaming output)
One-time, no history retained.
You can configure inline_assistant to decide whether to display diffs (default: show by pressing 'd').
curl

- Register on the official website and obtain your API key (for Cloudflare, you also need to obtain your account ID).
- Set the `LLM_KEY` environment variable (Cloudflare additionally needs `ACCOUNT`) in your `zshrc` or `bashrc`.
```bash
export LLM_KEY=<Your API_KEY>
export ACCOUNT=<Your ACCOUNT> # just for cloudflare
```
Platform | Link to obtain API key | Note |
---|---|---|
Cloudflare | https://dash.cloudflare.com/ | You can see all of Cloudflare's models here; the ones marked as beta are free models. |
ChatGLM (智谱清言) | https://open.bigmodel.cn/ | |
Kimi (月之暗面) | Moonshot AI Open Platform | |
GitHub Models | GitHub Token | |
SiliconFlow (硅基流动) | siliconflow | You can see all of SiliconFlow's models here; select "Only Free" to see all free models. |
DeepSeek | https://platform.deepseek.com/api_keys | |
OpenRouter | https://openrouter.ai/ | |
ChatAnywhere | https://api.chatanywhere.org/v1/oauth/free/render | 200 free calls to GPT-4o-mini are available every day. |
For local LLMs, set `LLM_KEY` to `NONE` in your `zshrc` or `bashrc`.
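For example:

```bash
export LLM_KEY=NONE
```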
- lazy.nvim
```lua
{
  "Kurama622/llm.nvim",
  dependencies = { "nvim-lua/plenary.nvim", "MunifTanjim/nui.nvim" },
  cmd = { "LLMSessionToggle", "LLMSelectedTextHandler", "LLMAppHandler" },
  config = function()
    require("llm").setup({
      url = "https://models.inference.ai.azure.com/chat/completions",
      model = "gpt-4o-mini",
      api_type = "openai",
    })
  end,
  keys = {
    { "<leader>ac", mode = "n", "<cmd>LLMSessionToggle<cr>" },
  },
}
```
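This minimal setup points `url` at the GitHub Models endpoint; to use another provider from the table above, change `url`, `model`, and `api_type` accordingly.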
Cmd | Description |
---|---|
LLMSessionToggle | Open/hide the Chat UI |
LLMSelectedTextHandler | Handle the selected text; the way it is processed depends on the prompt you input |
LLMAppHandler | Call AI tools |
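As an illustration, the commands can be bound through lazy.nvim's `keys` table, with `LLMSelectedTextHandler` taking the prompt as its argument. The bindings and prompt text below are arbitrary examples, not plugin defaults:

```lua
keys = {
  -- toggle the chat UI
  { "<leader>ac", mode = "n", "<cmd>LLMSessionToggle<cr>" },
  -- process the visual selection with the given prompt
  { "<leader>ae", mode = "v", "<cmd>LLMSelectedTextHandler Explain the following code<cr>" },
}
```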
Parameter | Description | Value |
---|---|---|
url | Model endpoint | String |
model | Model name | String |
api_type | Result parsing format | workers-ai, zhipu, openai, ollama |
timeout | The maximum time to wait for a response (in seconds) | Number |
fetch_key | Function that returns the API key | Function |
max_tokens | Limits the number of tokens generated in a response | Number |
temperature | From 0 to 1. Lower values make the response more deterministic; higher values make it more creative, but also more likely to go off topic if set too high | Number |
top_p | A threshold from 0 to 1. Higher values make the response more diverse and less repetitive, but can also admit less likely tokens, which again means off-topic responses | Number |
enable_thinking | Enable the model's deep-thinking ability (the model itself must support this feature) | Boolean |
thinking_budget | The maximum length of the thinking process; only takes effect when enable_thinking is true | Number |
schema | Description of the function parameters required for function calling | Table |
functions_tbl | Table of functions required for function calling | Table |
keep_alive | Keep the connection alive (usually for ollama) | See keep_alive/OLLAMA_KEEP_ALIVE |
streaming_handler | Customize the parsing of streaming output | Function |
parse_handler | Customize the parsing of non-streaming output | Function |
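A minimal sketch of how several of these options might be combined; the endpoint and model are taken from the example above, and the numeric values are illustrative rather than recommended defaults:

```lua
require("llm").setup({
  url = "https://models.inference.ai.azure.com/chat/completions",
  model = "gpt-4o-mini",
  api_type = "openai",

  -- read the key from the environment instead of hard-coding it
  fetch_key = function()
    return vim.env.LLM_KEY
  end,

  timeout = 30,      -- give up after 30 seconds
  max_tokens = 1024, -- cap the length of each response
  temperature = 0.3, -- lower = more deterministic
  top_p = 0.7,
})
```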
Style | Keyname | Description | Default: [mode] keymap | Window |
---|---|---|---|---|
float | Input:Submit | Submit your question | [i] ctrl+g | Input |
float | Input:Cancel | Cancel dialog response | [i] ctrl+c | Input |
float | Input:Resend | Regenerate the response | [i] ctrl+r | Input |
float | Input:HistoryNext | Select the next session history | [i] ctrl+j | Input |
float | Input:HistoryPrev | Select the previous session history | [i] ctrl+k | Input |
float | Input:ModelsNext | Select the next model | [i] ctrl+shift+j | Input |
float | Input:ModelsPrev | Select the previous model | [i] ctrl+shift+k | Input |
split | Output:Ask | Open the input box (in the normal mode of the input box, press Enter to submit your question) | [n] i | Output |
split | Output:Cancel | Cancel dialog response | [n] ctrl+c | Output |
split | Output:Resend | Regenerate the response | [n] ctrl+r | Output |
float/split | Session:Toggle | Toggle session | [n] `<leader>ac` | Input+Output |
float/split | Session:Close | Close session | [n] `<esc>` | float: Input+Output; split: Output |
float/split | Session:Models | Open the model-list window | [n] ctrl+m | float: App input window; split: Output |
split | Session:History | Open the history window (`j`: next, `k`: previous, `<cr>`: select, `<esc>`: close) | [n] ctrl+h | Output |
float | Focus:Input | Jump from the output window to the input window | - | Output |
float | Focus:Output | Jump from the input window to the output window | - | Input |
float | PageUp | Output window page up | [n/i] ctrl+b | Output |
float | PageDown | Output window page down | [n/i] ctrl+f | Output |
float | HalfPageUp | Output window page up (half) | [n/i] ctrl+u | Output |
float | HalfPageDown | Output window page down (half) | [n/i] ctrl+d | Output |
float | JumpToTop | Jump to the top (output window) | [n] gg | Output |
float | JumpToBottom | Jump to the bottom (output window) | [n] G | Output |
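A sketch of overriding some of these defaults, assuming the plugin accepts a `keys` table in `setup()` keyed by the names in the Keyname column (check your installed version for the exact field names):

```lua
require("llm").setup({
  keys = {
    -- submit from normal mode in the input window instead of [i] ctrl+g
    ["Input:Submit"] = { mode = "n", key = "<cr>" },
    -- cancel the response from normal mode
    ["Input:Cancel"] = { mode = "n", key = "<C-c>" },
  },
})
```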
Handler name | Description |
---|---|
side_by_side_handler | Display results in two windows side by side |
action_handler | Display results in the source file in the form of a diff |
qa_handler | AI for single-round dialogue |
flexi_handler | Results are displayed in a flexible window (the window size is calculated automatically from the amount of output text) |
disposable_ask_handler | Flexible questioning: select a piece of code to ask about, or ask directly (the current buffer is the context) |
attach_to_chat_handler | Attach the selected content to the context and ask a question |
completion_handler | Code completion |
curl_request_handler | The simplest curl interaction with the LLM; generally used to query the account balance, the list of available models, etc. |
Each handler's parameters are documented here.
Examples can be found in AI Tools Configuration.
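As a sketch of the general pattern, an AI tool is registered under `app_handler` in `setup()` and invoked through the `LLMAppHandler` command. The tool name `CodeExplain`, its prompt, and the module path for the built-in handlers are assumptions here, not defaults from this document:

```lua
local tools = require("llm.tools") -- assumed module path for the built-in handlers listed above

require("llm").setup({
  app_handler = {
    -- hypothetical tool: explain the selected code in a flexible window
    CodeExplain = {
      handler = tools.flexi_handler,
      prompt = "Explain the following code, only return the explanation",
      opts = {
        enter_flexible_window = true, -- jump into the result window when it opens
      },
    },
  },
})
```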
See UI Configuration and nui/popup
Local LLMs require custom parsing functions: for streaming output, use a custom `streaming_handler`; for AI tools that return their results in one go, use a custom `parse_handler`.

Below is an example of `ollama` running `llama3.2:1b`.
```lua
local function local_llm_streaming_handler(chunk, ctx, F)
  -- no more data: return what has been accumulated so far
  if not chunk then
    return ctx.assistant_output
  end
  local tail = chunk:sub(-1, -1)
  if tail:sub(1, 1) ~= "}" then
    -- partial JSON object: keep buffering until the closing brace arrives
    ctx.line = ctx.line .. chunk
  else
    ctx.line = ctx.line .. chunk
    -- decode the buffered JSON object and append its content to the output window
    local status, data = pcall(vim.fn.json_decode, ctx.line)
    if not status or not data.message.content then
      return ctx.assistant_output
    end
    ctx.assistant_output = ctx.assistant_output .. data.message.content
    F.WriteContent(ctx.bufnr, ctx.winid, data.message.content)
    ctx.line = ""
  end
  return ctx.assistant_output
end

local function local_llm_parse_handler(chunk)
  -- non-streaming responses arrive as a single decoded table
  local assistant_output = chunk.message.content
  return assistant_output
end

return {
  {
    "Kurama622/llm.nvim",
    dependencies = { "nvim-lua/plenary.nvim", "MunifTanjim/nui.nvim" },
    cmd = { "LLMSessionToggle", "LLMSelectedTextHandler", "LLMAppHandler" },
    config = function()
      local tools = require("llm.tools") -- built-in AI tool handlers
      require("llm").setup({
        url = "http://localhost:11434/api/chat", -- your url
        model = "llama3.2:1b",
        streaming_handler = local_llm_streaming_handler,
        app_handler = {
          WordTranslate = {
            handler = tools.flexi_handler,
            prompt = "Translate the following text to Chinese, please only return the translation",
            opts = {
              parse_handler = local_llm_parse_handler,
              exit_on_move = true,
              enter_flexible_window = false,
            },
          },
        },
      })
    end,
    keys = {
      { "<leader>ac", mode = "n", "<cmd>LLMSessionToggle<cr>" },
    },
  },
}
```
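With this configuration, the `WordTranslate` tool can be called through the `LLMAppHandler` command (for example from a visual-mode keymap); the translation is shown in a flexible window that presumably closes when the cursor moves, per `exit_on_move = true`.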
We would like to express our heartfelt gratitude to the contributors of the following open-source projects, whose code has provided invaluable inspiration and reference for the development of llm.nvim:
- olimorris/codecompanion.nvim: Diff style and prompt.
- SmiteshP/nvim-navbuddy: UI.
- milanglacier/minuet-ai.nvim: Code completions.