A modern, extensible research automation tool that leverages LLMs, web search, and browser-based extraction to generate actionable, professional reports from the web.
## Features

- Automated Research Pipeline: Generates focused queries, scrapes the web, summarizes findings, and recursively explores follow-up questions.
- Prompt-Driven Extraction: Uses an editable prompt file (`prompts/browser-use.md`) for browser-based content extraction, so you can customize instructions without code changes.
- Multi-Model Support: Choose your preferred OpenAI and BrowserUse LLM models at runtime.
- Flexible CLI: Control depth, breadth, concurrency, and models from the command line.
- Professional Reports: Automatically generates a clean, markdown-formatted summary report with findings and references after each run.
- Environment-First: All API keys are loaded from `.env` for security and flexibility.
## Installation

For end users, install from npm:

```bash
npm install alchemy-deep-research
```

For development (to run from source or contribute):

```bash
git clone https://github.com/your-username/alchemy-deep-research.git  # Replace with your repo URL
cd alchemy-deep-research
npm install  # or npm ci
```
## Getting Started

1. Set up environment variables: Create a `.env` file in the root directory:

   ```
   OPENAI_API_KEY=your-openai-key
   FIRECRAWL_KEY=your-firecrawl-key
   BROWSER_USE_API_KEY=your-browser-use-key
   ```
2. Edit the browser extraction prompt (optional): Customize `prompts/browser-use.md` to change how web content is extracted and summarized.
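   For example, an extraction prompt might contain instructions along these lines (purely illustrative; the prompt shipped with the package will differ):

   ```markdown
   Extract the substantive content from the page, then:

   - Summarize the key findings in concise bullet points.
   - Preserve important statistics, dates, and named sources.
   - Ignore navigation menus, ads, cookie banners, and comment threads.
   ```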
3. Run a research query: After installing the package locally in your project (`npm install alchemy-deep-research`):

   ```bash
   npx alchemy-deep-research \
     --query "How do LLMs handle long-term memory?" \
     --openai-model "gpt-4o-mini" \
     --browser-model "gpt-4.1-mini"
   ```

   If you installed the package globally (`npm install -g alchemy-deep-research`), you can omit `npx`:

   ```bash
   alchemy-deep-research --query "Your query here"
   ```

   To run from source (after cloning and running `npm install` in the package directory):

   ```bash
   npx ts-node src/cli.ts --query "Your query here"
   # or
   npm run build && node dist/cli.js --query "Your query here"
   ```
## Output

The tool generates reports in a structured `report/` directory:

- `report/raw/`: Contains detailed LLM outputs for each research query and each piece of browser-enhanced content.
  - Files are named based on the sanitized query, e.g., `[sanitized_query].md` (learnings) and `[sanitized_query].json` (full digest).
  - Enhanced content outputs are named like `[sanitized_query]_enhanced_[identifier].md/.json`.
- `report/cleaned_report/`: Contains the final, consolidated summary report.
  - Named `[sanitized_query]_summary_report.md`.
  - If the `--json` flag is used, `[sanitized_query]_summary_report.json` is also created here.
- Console Output: Key logs, extraction steps, and the final summary report are also printed to the terminal.
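For orientation, the layout looks roughly like this (a sketch; actual file names depend on the sanitized query):

```text
report/
├── raw/
│   ├── [sanitized_query].md                          # learnings
│   ├── [sanitized_query].json                        # full digest
│   ├── [sanitized_query]_enhanced_[identifier].md    # browser-enhanced content
│   └── [sanitized_query]_enhanced_[identifier].json
└── cleaned_report/
    ├── [sanitized_query]_summary_report.md
    └── [sanitized_query]_summary_report.json         # only with --json
```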
## CLI Options

You can customize your research run with the following options:

| Option | Description | Default |
| --- | --- | --- |
| `--query` | Main research question or topic | `"How do LLMs handle long-term memory?"` |
| `--openai-model` | OpenAI model for LLM and report (e.g. `gpt-4o-mini`) | `gpt-4o-mini` |
| `--browser-model` | BrowserUse model for extraction | `gpt-4.1-mini` |
| `--depth` | Recursion depth (levels of follow-up) | `1` |
| `--breadth` | Breadth (queries per level) | `1` |
| `--concurrency` | Number of queries to process in parallel | `1` |
| `--json` | Also save the research results and report as JSON | `false` |
## Customization

- Prompt Engineering: Edit `prompts/browser-use.md` to control how browser-based extraction works.
- Pipeline Logic: Extend or modify the pipeline in `src/pipeline.ts`.
- Report Formatting: Tweak `src/report.ts` for advanced report customization.
## Troubleshooting

- API Rate Limits: The pipeline logs rate limit errors and retries automatically (see the sketch after this list).
- Empty Results: Try adjusting your query, increasing breadth/depth, or using a different model.
- Missing Keys: Ensure your `.env` file is correctly formatted (no quotes around values).
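As a mental model, the retry behavior resembles generic exponential backoff, as in the sketch below. This is illustrative only, not the package's actual implementation:

```ts
// Illustrative sketch of retry-with-backoff, similar in spirit to how the
// pipeline recovers from rate-limit errors. Not the package's actual code.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === attempts) throw err; // out of retries
      const delayMs = 500 * 2 ** (attempt - 1); // 500ms, 1s, 2s, ...
      console.warn(`Attempt ${attempt} failed; retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("unreachable");
}
```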
## Examples

Add the `--json` flag to also save the research results and the markdown report as JSON:

```bash
alchemy-deep-research \
  --query "How do LLMs handle long-term memory?" \
  --openai-model "gpt-4o-mini" \
  --browser-model "gpt-4.1-mini" \
  --json
```
This will produce `how_do_llms_handle_long_term_memory_summary_report.md` and `how_do_llms_handle_long_term_memory_summary_report.json` in the `report/cleaned_report/` directory.
For a broader and deeper run that fans out to more queries and processes them in parallel:

```bash
alchemy-deep-research \
  --query "Latest advancements in AI ethics" \
  --openai-model "gpt-4o" \
  --browser-model "gpt-4.1-mini" \
  --depth 2 \
  --breadth 3 \
  --concurrency 4 \
  --json
```
## Sample Report

When you run the tool, it generates a comprehensive markdown report like this example (from a query about LLMs and long-term memory):

```markdown
# REPORT: How Do LLMs Handle Long-Term Memory?
## Introduction
Large Language Models (LLMs) have transformed the landscape of artificial intelligence and natural language processing. One of the critical areas of research is how these models can effectively manage long-term memory. This report summarizes recent findings on the mechanisms, challenges, and architectural considerations involved in enhancing LLMs' long-term memory capabilities.
## Key Findings
### 1. Memory Components
- **Enhancing Memory Retention**: LLMs can significantly improve long-term memory retention and retrieval by incorporating additional memory components. This includes:
- **Database Storage**: External databases that allow for the storage of relevant information over extended periods.
- **Internal Memory Modules**: Built-in memory systems that help retain context and information.
### 2. Architectural Design Considerations
- **Storage Capacity**: Effective storage solutions are crucial for maintaining large amounts of information.
- **Memory Update Mechanisms**: Systems must be in place to update stored information, ensuring it remains relevant.
- **Integration with Attention Mechanisms**: Attention mechanisms help ensure that responses are contextual and relevant, enhancing the user experience.
## References
1. Yulleyi. (2023). *Large Language Models (LLM) with Long-Term Memory: Advancements and Opportunities in GenAI*
2. arXiv. (2023). *Long-Term Memory in Language Models*
```
## Programmatic Usage

You can also use `alchemy-deep-research` as a library in your own TypeScript/JavaScript projects.
1. Install the package:

   ```bash
   npm install alchemy-deep-research
   ```

2. Import and use in your code:

   ```ts
   import { deepResearch, generateReport, DeepResearchOptions, ResearchResult } from 'alchemy-deep-research';
   import * as dotenv from 'dotenv';

   dotenv.config(); // Ensure your API keys are loaded from .env

   async function run() {
     const topic = "Future of renewable energy storage";
     const options: DeepResearchOptions = { openaiModel: "gpt-4o-mini" };

     const result: ResearchResult = await deepResearch(topic, 1, 1, 1, options);
     console.log("Learnings:", result.learnings);

     const reportMd = await generateReport(topic, result.learnings, result.visited, options.openaiModel);
     console.log("\nReport:\n", reportMd);
     // You can save reportMd to a file using fs.writeFile
   }

   run();
   ```
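To persist the generated markdown, standard Node file APIs are enough. A minimal sketch (the `saveReport` helper is hypothetical, assuming Node 18+):

```ts
import { writeFile } from "node:fs/promises";

// Hypothetical helper: write a generated markdown report to disk.
async function saveReport(path: string, markdown: string): Promise<void> {
  await writeFile(path, markdown, "utf8");
}

// e.g. inside run() above: await saveReport("REPORT.md", reportMd);
```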
For a more detailed example, see `examples/programmatic_usage.ts`.
Please note that `alchemy-deep-research` is an ES Module package (`"type": "module"`). Ensure your project is configured to work with ESM packages if you're importing it.
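If you're importing it from TypeScript, compiler options along these lines resolve ESM packages correctly (a minimal sketch, not the only valid configuration):

```json
{
  "compilerOptions": {
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "target": "ES2022"
  }
}
```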
## License

MIT

## Acknowledgements

- Built with the OpenAI, Firecrawl, and BrowserUse APIs.
- Inspired by the need for fast, deep, and actionable research.
Note: This is an early-stage project. Expect rapid changes! I will be building out more agentic flows and advanced features to enable even better, deeper research in the near future.