math formula ocr #17

Open
lrybbbccc opened this issue Dec 14, 2024 · 1 comment
Labels
enhancement (New feature or request); open for contribution (Invites open-source developers to contribute to the project)

Comments

@lrybbbccc

Hello, this is very good work; it lets me OCR PDFs very quickly.
However, when I convert a PDF to Markdown, the math formulas in the PDF are OCRed poorly. Do you plan to improve this in the future?

@gagb added the enhancement, help wanted, and open for contribution labels, then removed the help wanted label, on Dec 14, 2024
@hongbo-miao

hongbo-miao commented Jan 2, 2025

Just sharing some info that may help. ☺️

MinerU has done a fantastic job. It currently uses:

  • fine-tuned YOLOv8 for formula detection
  • UniMERNet for formula recognition
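
For anyone who wants to contribute, the two-stage idea above (detect formula regions first, then recognize each region as LaTeX) can be sketched roughly as follows. This is only a minimal illustration and not MinerU's actual code or API: `Formula`, `recognize_formulas`, `to_markdown`, and the `detect` / `recognize` callables are hypothetical names. In a real setup, `detect` would wrap a formula detector such as a fine-tuned YOLOv8 model and `recognize` would wrap a recognizer such as UniMERNet.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in page pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Formula:
    box: Box
    inline: bool      # True for inline formulas, False for display formulas
    latex: str = ""   # filled in by the recognition stage


def recognize_formulas(
    page: object,  # whatever image type the two models accept
    detect: Callable[[object], List[Formula]],
    recognize: Callable[[object, Box], str],
) -> List[Formula]:
    """Stage 1 finds formula regions; stage 2 converts each region to LaTeX."""
    formulas = detect(page)
    for formula in formulas:
        formula.latex = recognize(page, formula.box)
    # Naive reading order: top to bottom, then left to right.
    return sorted(formulas, key=lambda f: (f.box[1], f.box[0]))


def to_markdown(formula: Formula) -> str:
    """Wrap recognized LaTeX in Markdown math delimiters."""
    if formula.inline:
        return f"${formula.latex}$"
    return f"$$\n{formula.latex}\n$$"
```

Keeping detection and recognition as separate callables means either model could be swapped out without touching the Markdown conversion step.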

Below is Markdown generated by MinerU. As you can see, even though the formula is very complex, MinerU recognizes it perfectly.

Image

3 participants