Markdownify upgrade is causing problems #1236

Open
harshitgtypeface opened this issue May 5, 2025 · 1 comment

Comments

@harshitgtypeface

Since the recent markdownify upgrade, the library no longer converts HTML content into proper Markdown. Instead, it often returns the original HTML as-is. The problem is especially noticeable with HTML extracted from PDFs, where the output lacks the expected Markdown formatting.

@afourney
Member

Is there a sample document you could provide?

Also, to clarify: if HTML is extracted from a PDF, it won't automatically be converted to Markdown. This kind of recursive processing is not currently automatic.
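
Because that second pass is not automatic, a caller would have to run it explicitly. The following is only a minimal sketch of that idea, not the project's behavior: `still_has_html` and `second_pass` are hypothetical helper names, the tag check is a rough regex heuristic, and the converter is passed in as a plain callable so the snippet does not depend on any particular library version.

```python
import re
from typing import Callable

# Rough heuristic (an assumption, not a full HTML parser): matches things
# that look like opening or closing tags, e.g. <p>, </div>, <a href="...">.
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def still_has_html(text: str) -> bool:
    """Return True if the text appears to contain HTML tags."""
    return bool(TAG_RE.search(text))


def second_pass(text: str, convert: Callable[[str], str]) -> str:
    """Run `convert` only when the text still looks like HTML.

    `convert` is any HTML-to-Markdown function supplied by the caller.
    """
    return convert(text) if still_has_html(text) else text
```

With markdownify installed, `convert` could be `markdownify.markdownify`. The same `still_has_html` check may also help when putting together a sample document for a bug report, since it flags output that came back as HTML rather than Markdown.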
