
Images in docx files cannot be converted to md documents #1222


Open
keller31 opened this issue Apr 28, 2025 · 4 comments
Comments

@keller31

When I convert a docx file, each image comes out as a Markdown image tag like the one below, but the data URI is truncated and the base64 content is missing.
![](data:image/jpeg;base64...)

@keller31
Author

After reading the documentation, I found a solution: the keep_data_uris parameter makes the Markdown output retain the base64 content of each image.

@keller31
Author

Example:
markitdown xxx.docx --keep-data-uris > xxx.md
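For anyone calling the library from Python rather than the CLI, here is a minimal sketch of the same idea. It assumes that `MarkItDown.convert()` accepts the `keep_data_uris` parameter mentioned above as a keyword argument; the helper names `count_truncated_images` and `convert_keeping_images` are made up for illustration and are not part of markitdown.

```python
import re

# Matches image tags whose data URI was cut down to "base64...",
# i.e. the symptom reported at the top of this issue.
TRUNCATED_IMAGE = re.compile(
    r"!\[[^\]]*\]\(data:image/[\w.+-]+;base64\.\.\.\)"
)


def count_truncated_images(markdown: str) -> int:
    """Return how many images in the Markdown lost their base64 payload."""
    return len(TRUNCATED_IMAGE.findall(markdown))


def convert_keeping_images(path: str) -> str:
    """Convert a file to Markdown, keeping full data URIs for images."""
    # Imported lazily so count_truncated_images works without markitdown.
    from markitdown import MarkItDown

    # Assumption: keep_data_uris is forwarded to the docx converter.
    result = MarkItDown().convert(path, keep_data_uris=True)
    return result.text_content
```

Running `count_truncated_images` on the converted text is a quick way to check whether the option took effect: it should return 0 when the payloads are kept.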

@joshjm

joshjm commented Apr 28, 2025

There is PR #277 looking to address this. I'm keen to get this functionality merged; it seems pretty important to me. I'll try to get some code in for this this week. If you can provide any further review on PR #277, I'll try to fork it and address the issues.

@joshjm

joshjm commented Apr 28, 2025

Personally, I have a post-processing step that greps through the output for the base64 data, generates a description, then replaces the binary data with the description. It's a little fast and loose right now, but it has potential.
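A post-processing step of that kind might look roughly like the sketch below. This is not joshjm's actual code: the regex, the `[image: ...]` replacement format, and the `describe_image` placeholder are all assumptions made for illustration. A real pipeline would presumably swap `describe_image` for a call to a captioning model.

```python
import base64
import re

# Markdown image whose target is a full base64 data URI.
DATA_URI_IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\(data:(?P<mime>image/[\w.+-]+);base64,(?P<data>[A-Za-z0-9+/=\s]+)\)"
)


def describe_image(mime: str, raw: bytes) -> str:
    """Hypothetical placeholder description: MIME type and byte size."""
    return f"{mime}, {len(raw)} bytes"


def replace_data_uris(markdown: str, describe=describe_image) -> str:
    """Swap every embedded base64 image for a short text description."""

    def _substitute(match: re.Match) -> str:
        # b64decode ignores whitespace by default, so wrapped payloads work.
        raw = base64.b64decode(match.group("data"))
        return f"[image: {describe(match.group('mime'), raw)}]"

    return DATA_URI_IMAGE.sub(_substitute, markdown)
```

Passing the description function as a parameter keeps the regex handling separate from whatever generates the description, so the slow or expensive part can be replaced without touching the matching logic.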
