oomol-lab/epub-translator
EPUB Translator

Translate EPUB books using Large Language Models while preserving the original text. The translated content is displayed side-by-side with the original, creating bilingual books perfect for language learning and cross-reference reading.

Features

  • Bilingual Output: Preserves original text alongside translations for easy comparison
  • LLM-Powered: Leverages large language models for high-quality, context-aware translations
  • Format Preservation: Maintains EPUB structure, styles, images, and formatting
  • Complete Translation: Translates chapter content, table of contents, and metadata
  • Progress Tracking: Monitor translation progress with built-in callbacks
  • Flexible LLM Support: Works with any OpenAI-compatible API endpoint
  • Caching: Built-in caching for progress recovery when translation fails

Installation

pip install epub-translator

Requirements: Python 3.11, 3.12, or 3.13

Quick Start

Using OOMOL Studio (Recommended)

The easiest way to use EPUB Translator is through OOMOL Studio, which provides a visual interface. A video tutorial is available.

Using Python API

from pathlib import Path
from epub_translator import LLM, translate, language

# Initialize LLM with your API credentials
llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
)

# Translate EPUB file using language constants
translate(
    llm=llm,
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language=language.ENGLISH,
)

With Progress Tracking

from pathlib import Path

from tqdm import tqdm

with tqdm(total=100, desc="Translating", unit="%") as pbar:

    def on_progress(progress: float) -> None:
        # progress is a fraction between 0.0 and 1.0.
        # pbar.n is the bar's current position, so advance it by the difference.
        pbar.update(progress * 100 - pbar.n)

    translate(
        llm=llm,  # the LLM instance created in the previous example
        source_path=Path("source.epub"),
        target_path=Path("translated.epub"),
        target_language="English",
        on_progress=on_progress,
    )
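If tqdm is not installed, a plain-text callback can serve the same purpose. The following is a minimal sketch using only the standard library; `make_progress_printer` is a hypothetical helper, not part of epub-translator, and it assumes only what the API reference states: that `on_progress` receives a float between 0.0 and 1.0.

```python
import sys
from typing import Callable, TextIO


def make_progress_printer(stream: TextIO = sys.stderr) -> Callable[[float], None]:
    """Return a callback that prints a line whenever the whole-number percentage changes."""
    last_percent = -1  # nothing printed yet

    def on_progress(progress: float) -> None:
        nonlocal last_percent  # the variable lives in the enclosing function
        # Clamp to 0.0-1.0 in case a value falls slightly outside the range.
        percent = int(max(0.0, min(1.0, progress)) * 100)
        if percent != last_percent:
            print(f"Translating: {percent}%", file=stream)
            last_percent = percent

    return on_progress
```

Pass the result as `on_progress=make_progress_printer()` in the `translate` call above.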

API Reference

LLM Class

Initialize the LLM client for translation:

LLM(
    key: str,                                          # API key
    url: str,                                          # API endpoint URL
    model: str,                                        # Model name (e.g., "gpt-4")
    token_encoding: str,                               # Token encoding (e.g., "o200k_base")
    cache_path: PathLike | None = None,                # Cache directory path
    timeout: float | None = None,                      # Request timeout in seconds
    top_p: float | tuple[float, float] | None = None,  # Nucleus sampling parameter
    temperature: float | tuple[float, float] | None = None,  # Sampling temperature
    retry_times: int = 5,                              # Number of retries on failure
    retry_interval_seconds: float = 6.0,               # Seconds to wait between retries
    log_dir_path: PathLike | None = None,              # Log directory path
)

translate Function

Translate an EPUB file:

translate(
    llm: LLM,                          # LLM instance
    source_path: Path,                 # Source EPUB file path
    target_path: Path,                 # Output EPUB file path
    target_language: str,              # Target language (e.g., "English", "Chinese")
    user_prompt: str | None = None,    # Custom translation instructions
    max_retries: int = 5,              # Maximum retries for failed translations
    max_group_tokens: int = 1200,      # Maximum tokens per translation group
    on_progress: Callable[[float], None] | None = None,  # Progress callback (0.0-1.0)
)
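Since `translate` handles one file per call, translating a folder of books needs a small loop around it. The sketch below only plans the work: it pairs each source file with an output path and skips books whose output already exists. `plan_batch` and the `.bilingual` suffix are hypothetical names chosen for illustration, not part of the library.

```python
from pathlib import Path


def plan_batch(
    source_dir: Path,
    output_dir: Path,
    suffix: str = ".bilingual",
) -> list[tuple[Path, Path]]:
    """Pair every *.epub in source_dir with an output path, skipping finished books."""
    jobs = []
    for source in sorted(source_dir.glob("*.epub")):
        target = output_dir / f"{source.stem}{suffix}.epub"
        if not target.exists():  # already translated in an earlier run
            jobs.append((source, target))
    return jobs


# Possible usage with the documented API (requires an LLM instance named llm):
# for source, target in plan_batch(Path("books"), Path("out")):
#     translate(llm=llm, source_path=source, target_path=target, target_language="English")
```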

Language Constants

EPUB Translator provides predefined language constants for convenience. You can use these constants instead of writing language names as strings:

from epub_translator import language

# Usage example:
translate(
    llm=llm,
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language=language.ENGLISH,
)

# You can also use custom language strings:
translate(
    llm=llm,
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language="Icelandic",  # For languages not in the constants
)

Configuration Examples

OpenAI

llm = LLM(
    key="sk-...",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
)

Azure OpenAI

llm = LLM(
    key="your-azure-key",
    url="https://your-resource.openai.azure.com/openai/deployments/your-deployment",
    model="gpt-4",
    token_encoding="o200k_base",
)

Other OpenAI-Compatible Services

Any service with an OpenAI-compatible API can be used:

llm = LLM(
    key="your-api-key",
    url="https://your-service.com/v1",
    model="your-model",
    token_encoding="o200k_base",  # Match your model's encoding
)
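To avoid hardcoding credentials in scripts, the four required `LLM` arguments can be read from environment variables. This is a sketch under stated assumptions: the `EPUB_TRANSLATOR_*` variable names and the `llm_kwargs_from_env` helper are hypothetical, and the library itself is not known to read any of them.

```python
import os
from typing import Mapping


def llm_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the required LLM(...) keyword arguments from environment variables.

    The variable names are illustrative; epub-translator does not define them.
    """
    key = env.get("EPUB_TRANSLATOR_KEY")
    if not key:
        raise RuntimeError("Set EPUB_TRANSLATOR_KEY to your API key")
    return {
        "key": key,
        "url": env.get("EPUB_TRANSLATOR_URL", "https://api.openai.com/v1"),
        "model": env.get("EPUB_TRANSLATOR_MODEL", "gpt-4"),
        "token_encoding": env.get("EPUB_TRANSLATOR_ENCODING", "o200k_base"),
    }
```

The result can then be unpacked into the constructor: `llm = LLM(**llm_kwargs_from_env())`.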

Use Cases

  • Language Learning: Read books in their original language with side-by-side translations
  • Academic Research: Access foreign literature with bilingual references
  • Content Localization: Prepare books for international audiences
  • Cross-Cultural Reading: Enjoy literature while understanding cultural nuances

Advanced Features

Custom Translation Prompts

Provide specific translation instructions:

translate(
    llm=llm,
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language="English",
    user_prompt="Use formal language and preserve technical terminology",
)
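Because `user_prompt` is a plain string, longer instructions such as a term glossary can be assembled in code. The helper below is a hypothetical sketch; how the library combines `user_prompt` with its own prompts is not described here, so the wording of the generated text is only a suggestion.

```python
def build_user_prompt(glossary: dict[str, str], style: str = "") -> str:
    """Combine an optional style instruction and a term glossary into one prompt string."""
    lines = []
    if style:
        lines.append(style.strip())
    if glossary:
        lines.append("Translate these terms consistently:")
        # One glossary entry per line, in the order given.
        lines.extend(f"- {source} -> {target}" for source, target in glossary.items())
    return "\n".join(lines)
```

The returned string can be passed directly as `user_prompt=build_user_prompt(...)`.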

Caching for Progress Recovery

Enable caching to resume translation progress after failures:

llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
    cache_path="./translation_cache",  # Translations are cached here
)
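With a cache directory configured, recovering from a failure amounts to running the same `translate` call again. A minimal sketch of an outer retry loop follows; `run_until_done` is a hypothetical helper, and catching every `Exception` is an assumption, since the exceptions `translate` may raise are not listed here.

```python
import time
from typing import Callable


def run_until_done(
    job: Callable[[], None],
    attempts: int = 3,
    wait_seconds: float = 30.0,
) -> int:
    """Call job() until it returns without raising; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            job()
            return attempt
        except Exception:  # assumption: any failure is worth retrying
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            time.sleep(wait_seconds)
    raise ValueError("attempts must be at least 1")


# Possible usage (requires llm created with cache_path, as above):
# run_until_done(lambda: translate(
#     llm=llm,
#     source_path=Path("source.epub"),
#     target_path=Path("translated.epub"),
#     target_language="English",
# ))
```

This loop is separate from the per-request retries that `retry_times` and `retry_interval_seconds` already control.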

Related Projects

PDF Craft

PDF Craft converts PDF files into EPUB and other formats, with a focus on scanned books. Combine PDF Craft with EPUB Translator to convert and translate scanned PDF books into bilingual EPUB format.

Workflow: Scanned PDF → [PDF Craft] → EPUB → [EPUB Translator] → Bilingual EPUB

For a complete tutorial, watch "Convert scanned PDF books to EPUB format and translate them into bilingual books".

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
