Images in Docx file #317

abhilash-kozhikkot opened this issue Feb 8, 2025 · 3 comments

@abhilash-kozhikkot

Hello Team,

We are using MarkItDown and so far it is working very well.

We came across an issue with docx files that contain images. Looking into the code, it appears that the mammoth library, which DocxConverter uses, allows passing a handler that can process each image and return, for example, alt text. When the result is converted to Markdown, this gives descriptions for the images.

However, I could not find an option for passing this handler in the convert call on the MarkItDown class.

Could this be exposed?

An example in mammoth would look like this:

htmlResult = mammoth.convert_to_html(
    "<path to docx file>",
    convert_image=mammoth.images.img_element(convert_image),
)
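
For illustration, here is a minimal sketch of what the `convert_image` handler above might look like. It assumes the handler receives an object with a `content_type` attribute and an `open()` context manager, and returns a dict of attributes for the `<img>` element. The names `describe_image` and `docx_to_html` are hypothetical, and the generated alt text is only a placeholder for whatever description logic (for example, a captioning model) would actually be used.

```python
def describe_image(image):
    """Build <img> attributes for one embedded image.

    `image` is assumed to expose `content_type` and an `open()` context
    manager that yields a binary file-like object.
    """
    with image.open() as image_bytes:
        data = image_bytes.read()
    # Placeholder description; a real handler might call a captioning model.
    alt = "{0} image ({1} bytes)".format(image.content_type, len(data))
    # Empty src: keep only the description and embed no image data.
    return {"src": "", "alt": alt}


def docx_to_html(path):
    """Convert a .docx file to HTML, routing images through describe_image."""
    import mammoth  # third-party dependency, imported lazily

    with open(path, "rb") as docx_file:
        result = mammoth.convert_to_html(
            docx_file,
            convert_image=mammoth.images.img_element(describe_image),
        )
    return result.value
```

The request in this issue is for a way to hand a function like `describe_image` to MarkItDown so that it reaches the mammoth call.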
@afourney
Member

afourney commented Feb 9, 2025

Fantastic find. We should certainly expose this. I'm looking into a mechanism that will allow passing more options to the converters (as well as allowing for more of a plugin architecture).

@pcliupc

pcliupc commented Feb 25, 2025

> Fantastic find. We should certainly expose this. I'm looking into a mechanism that will allow passing more options to the converters (as well as allowing for more of a plugin architecture).

It would be really nice if the mechanism could also handle images in xlsx files.

@Sillocan

They implemented plugins and I have an example of keeping the images in files using mammoth's CLI ImageWriter. I haven't explored alt text yet: #1099 (comment)
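
The linked comment is not reproduced here, so the following is not that example. It is a rough, hand-written sketch of the general idea of keeping images as files: a callable handler that writes each embedded image into a directory and returns the file name as `src`. The class name `SaveImages` and the file naming scheme are assumptions, and it is a stand-in for, not a copy of, mammoth's own ImageWriter.

```python
import os
import shutil


class SaveImages:
    """Image handler that writes each embedded image into `output_dir`."""

    def __init__(self, output_dir):
        self._output_dir = output_dir
        self._count = 0

    def __call__(self, image):
        self._count += 1
        # Derive a file extension from the MIME type, e.g. "image/png" -> "png".
        extension = image.content_type.partition("/")[2] or "bin"
        filename = "image{0}.{1}".format(self._count, extension)
        target = os.path.join(self._output_dir, filename)
        with image.open() as source, open(target, "wb") as dest:
            shutil.copyfileobj(source, dest)
        # The returned dict becomes the attributes of the <img> element.
        return {"src": filename}
```

With mammoth, an instance would presumably be wrapped the same way as in the original post, for example `convert_image=mammoth.images.img_element(SaveImages("images"))`.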
