
LLM technique that stochastically injects tokens at inference time to redirect the Chain-of-Thought reasoning process


stephen1cowley/doubt-injection


Doubt Injection

Developed as part of the MEng project "Mitigating Hallucinations in LLMs".

arXiv paper: (link coming soon)

Doubt Injection is a proposed technique that aims to encourage an LLM's Chain-of-Thought (CoT) to explore a wider set of ideas, motivated by observations of idea exploration in CoTs. It works by randomly injecting a short string such as "But" at the start of new paragraphs in the CoT, each time with a chosen injection probability. This can yield marginal (but currently statistically insignificant) improvements for distilled DeepSeek models on arithmetic reasoning (29.2% → 29.6% for DeepSeek-R1-Distill-Qwen-1.5B) and adversarial question datasets (26.1% → 26.7% for DeepSeek-R1-Distill-Qwen-32B).
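The injection loop can be sketched in Python as below. This is a minimal illustration, not the repository's implementation. It assumes a hypothetical `generate_paragraph(context)` callable that continues the text up to and including the next paragraph break (taken here to be "\n\n") and reports whether generation has finished. With a real model, this callable would wrap the decoding loop. Here, a toy stand-in replays fixed paragraphs.

```python
import random
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical interface: continue `context` up to and including the next
# paragraph break, returning (new_text, finished).
ParagraphGenerator = Callable[[str], Tuple[str, bool]]


def doubt_injection_generate(
    prompt: str,
    generate_paragraph: ParagraphGenerator,
    injection_string: str = "But",
    injection_prob: float = 0.5,
    max_paragraphs: int = 50,
    rng: Optional[random.Random] = None,
) -> str:
    """Generate paragraph by paragraph, stochastically injecting a string."""
    if rng is None:
        rng = random.Random()
    text = prompt
    for _ in range(max_paragraphs):
        new_text, finished = generate_paragraph(text)
        text += new_text
        if finished:
            break
        # A new paragraph is about to start: with probability injection_prob,
        # seed it with the injection string so the model continues from it.
        if rng.random() < injection_prob:
            text += injection_string
    return text


def toy_generator(paragraphs: List[str]) -> ParagraphGenerator:
    """Stand-in for a real model: replays fixed paragraphs, ignoring context."""
    chunks: Iterator[str] = iter(paragraphs)

    def generate_paragraph(context: str) -> Tuple[str, bool]:
        chunk = next(chunks)
        # Generation is "finished" once a chunk ends without a paragraph break.
        return chunk, not chunk.endswith("\n\n")

    return generate_paragraph


# injection_prob=1.0 makes every injection visible in this demo.
paras = ["2 + 2 = 4.\n\n", ", let me double-check: 2 + 2 is indeed 4.\n\n",
         " overall, the answer is 4."]
print(doubt_injection_generate("Q: What is 2 + 2?\n", toy_generator(paras),
                               injection_prob=1.0))
```

The injected string is appended to the context before the next call, so a real model must continue its reasoning from "But". That continuation is what redirects the CoT. How a leading space is tokenized after the injected string depends on the tokenizer.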

[Figure: example of Doubt Injection (screenshot)]

The research investigates the effect of injection string, injection probability, temperature and model size on the performance of Doubt Injection compared with regular generation.

The paper also lays out a simple statistical framework for gauging the significance of LLM accuracy results, which we hope will be used more widely. For example, for a single run through a 200-question dataset, the claim that LLM B (73.0%) is better than LLM A (72.5%) is statistically insignificant: there is only a 54% chance that LLM B has a higher true accuracy than LLM A. This comes from comparing the posterior probability distributions over each LLM's accuracy.
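The comparison above can be sketched as follows. The sketch treats each question as an independent Bernoulli trial and puts a uniform Beta(1, 1) prior on accuracy. Under those assumptions, each posterior is Beta(1 + correct, 1 + incorrect). These choices are illustrative, and the paper's exact framework may differ. P(B > A) is evaluated with the standard closed-form sum for comparing two Beta-distributed rates (familiar from Bayesian A/B testing), using only the standard library. The function names are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_b_beats_a(correct_a: int, total_a: int,
                   correct_b: int, total_b: int,
                   prior_alpha: int = 1, prior_beta: int = 1) -> float:
    """Exact P(p_B > p_A) for independent Beta posteriors over accuracy.

    Each posterior is Beta(prior_alpha + correct, prior_beta + incorrect).
    The closed-form sum below requires integer posterior parameters.
    """
    alpha_a, beta_a = prior_alpha + correct_a, prior_beta + total_a - correct_a
    alpha_b, beta_b = prior_alpha + correct_b, prior_beta + total_b - correct_b
    total = 0.0
    for i in range(alpha_b):
        # Each term is computed in log space to avoid overflow in Beta functions.
        total += math.exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - math.log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


# LLM A: 72.5% of 200 = 145 correct; LLM B: 73.0% of 200 = 146 correct.
print(f"P(B more accurate than A) = {prob_b_beats_a(145, 200, 146, 200):.3f}")
```

With 145 and 146 correct answers out of 200, this lands close to the 54% quoted above. For non-integer priors, a Monte Carlo estimate that draws from both posteriors (for example with `random.betavariate`) is a simple alternative.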

Evaluation Code

This repository contains the evaluation scripts used to obtain all results in the research paper, primarily on the AIME 2024 and SimpleBench benchmarks. Additional scripts used to obtain the motivating results, analyse the ideas explored in CoT responses, and reformat result files are provided in additional_results/.

A small number of example LLM responses and experimental results are provided in responses/.

To run

First install PyTorch from the official website. Then:

pip install numpy pandas transformers protobuf sentencepiece

To run an evaluation on the AIME 2024 dataset:

python aime_eval.py --doubt_injection $1 --llm_name "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" --temperature_set "0.6" --injection_string "But"

To evaluate a single question from the SimpleBench dataset:

python simplebench_eval.py --q_id 2 --llm_name deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --temperature_set "0.6" --injection_string "But"
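To evaluate several SimpleBench questions in one go, a small Python wrapper can call the script once per question. This sketch uses only the flags shown in the command above. The helper names are hypothetical, and you supply the question IDs yourself. By default it only prints the commands (a dry run). It uses `sys.executable` so the script runs under the same interpreter.

```python
import shlex
import subprocess
import sys
from typing import Iterable, List


def build_simplebench_command(q_id: int,
                              llm_name: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
                              temperature: str = "0.6",
                              injection_string: str = "But") -> List[str]:
    """Build the argv for one simplebench_eval.py run, using the README's flags."""
    return [
        sys.executable, "simplebench_eval.py",
        "--q_id", str(q_id),
        "--llm_name", llm_name,
        "--temperature_set", temperature,
        "--injection_string", injection_string,
    ]


def run_questions(q_ids: Iterable[int], dry_run: bool = True, **kwargs) -> None:
    """Run (or, with dry_run=True, just print) one evaluation per question ID."""
    for q_id in q_ids:
        cmd = build_simplebench_command(q_id, **kwargs)
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the sweep if any single evaluation fails.
            subprocess.run(cmd, check=True)


# Print the commands for question IDs 1 to 3; pass dry_run=False to run them.
run_questions(range(1, 4))
```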
