Secure your LLM prompts with confidence
A TypeScript library for validating and securing LLM prompts. This package provides various guards to protect against common LLM vulnerabilities and misuse.
- Validate LLM prompts for various security concerns
- Support for multiple validation rules:
- PII detection
- Jailbreak detection
- Profanity filtering
- Prompt injection detection
- Relevance checking
- Toxicity detection
- Batch validation support
- CLI interface
- TypeScript support
npm install llm-guard
import { LLMGuard } from 'llm-guard';
const guard = new LLMGuard({
pii: true,
jailbreak: true,
profanity: true,
promptInjection: true,
relevance: true,
toxicity: true
});
// Single prompt validation
const result = await guard.validate('Your prompt here');
console.log(result);
// Batch validation
const batchResult = await guard.validateBatch([
'First prompt',
'Second prompt'
]);
console.log(batchResult);
# Basic usage
npx llm-guard "Your prompt here"
# With specific guards enabled
npx llm-guard --pii --jailbreak "Your prompt here"
# With a config file
npx llm-guard --config config.json "Your prompt here"
# Batch mode
npx llm-guard --batch '["First prompt", "Second prompt"]'
# Show help
npx llm-guard --help
You can configure which validators to enable when creating the LLMGuard instance:
const guard = new LLMGuard({
pii: true, // Enable PII detection
jailbreak: true, // Enable jailbreak detection
profanity: true, // Enable profanity filtering
promptInjection: true, // Enable prompt injection detection
relevance: true, // Enable relevance checking
toxicity: true, // Enable toxicity detection
customRules: { // Add custom validation rules
// Your custom rules here
},
relevanceOptions: { // Configure relevance guard options
minLength: 10, // Minimum text length
maxLength: 5000, // Maximum text length
minWords: 3, // Minimum word count
maxWords: 1000 // Maximum word count
}
});
Detects personally identifiable information like emails, phone numbers, SSNs, credit card numbers, and IP addresses.
Filters profanity and offensive language, including common character substitutions (like using numbers for letters).
Detects attempts to bypass AI safety measures and ethical constraints, such as "ignore previous instructions" or "pretend you are".
Identifies attempts to inject malicious instructions or override system prompts, including system prompt references and memory reset attempts.
Evaluates the relevance and quality of the prompt based on length, word count, filler words, and repetitive content.
Detects toxic, harmful, or aggressive content, including hate speech, threats, and discriminatory language.
Contributions are welcome! Please feel free to submit a Pull Request on GitHub. We appreciate any help with:
- Bug fixes
- New features
- Documentation improvements
- Code quality enhancements
- Test coverage
- Performance optimizations
- Fork the repository on GitHub
- Create a new branch for your feature or bugfix
- Make your changes
- Write or update tests as needed
- Ensure all tests pass
- Submit a Pull Request with a clear description of the changes
For more complex changes, please open an issue first to discuss the proposed changes.
For more detailed documentation, visit our documentation site.