-
Awesome. Happy to hear that people are experimenting with plugins. In the plugin's
or
BUT plugins are always added after built-ins, so I'm guessing something else is going wrong. Can you add code to make sure the register_converters method is being called? Remember you need to use
You can also try the sample:
And then do:
And see if that works for you -- it should. Finally, the output of
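To illustrate why "added after built-ins" should favor the plugin, here is a toy model of a converter registry. It is a sketch under stated assumptions, not MarkItDown's actual code: it assumes lower priority values are tried first, that ties go to the most recently registered converter, and the numeric values and the `ToyRegistry` class are hypothetical.

```python
from dataclasses import dataclass

# Assumed, illustrative values; only the constant names echo the thread.
PRIORITY_SPECIFIC_FILE_FORMAT = 0.0
PRIORITY_GENERIC_FILE_FORMAT = 10.0


@dataclass
class Registration:
    name: str
    priority: float


class ToyRegistry:
    """Hypothetical stand-in for a converter registry (not the real API)."""

    def __init__(self) -> None:
        self._registrations: list[Registration] = []

    def register(self, name: str, priority: float = PRIORITY_SPECIFIC_FILE_FORMAT) -> None:
        # Newest registration goes to the front of the list.
        self._registrations.insert(0, Registration(name, priority))

    def resolution_order(self) -> list[str]:
        # sorted() is stable, so equal priorities keep "newest first".
        ordered = sorted(self._registrations, key=lambda r: r.priority)
        return [r.name for r in ordered]


registry = ToyRegistry()
registry.register("builtin-plaintext", PRIORITY_GENERIC_FILE_FORMAT)
registry.register("builtin-docx")             # built-ins are registered first
registry.register("plugin-docx-with-images")  # plugin is registered afterwards
order = registry.resolution_order()
print(order)
```

Under these assumptions the plugin is consulted before the built-in docx converter even though both use the same priority, which is why a plugin that never runs usually points to a registration problem rather than an ordering one.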
-
I have a working example:
"""Markitdown plugin for converting a docx while keeping image content in local files."""

# The version of the plugin interface that this plugin uses.
# The only supported version is 1 for now.
__plugin_interface_version__ = 1

# Any is imported at runtime because it appears in unquoted annotations below.
from typing import TYPE_CHECKING, Any

import mammoth
from mammoth.cli import ImageWriter
from mammoth.documents import Image
from mammoth.html import Element
from markitdown import PRIORITY_SPECIFIC_FILE_FORMAT
from markitdown.converters import DocxConverter

if TYPE_CHECKING:
    from typing import BinaryIO, Callable

    from markitdown import DocumentConverterResult, MarkItDown, StreamInfo


class DocxConverterWithImages(DocxConverter):
    """Subclass of DocxConverter which outputs images rather than stripping them.

    TODO: Provide method to instead output alttext
    """

    # The whole return annotation is one string: `"..." | None` fails at import time.
    def get_image_converter(self, **kwargs: Any) -> "Callable[[Image], list[Element]] | None":
        """Return the value for the `convert_image` argument of `mammoth.convert_to_html`.

        NOTE: This defaults to the current directory so the CLI still works.
        """
        output_dir = kwargs.get("output_dir", ".")
        return mammoth.images.img_element(ImageWriter(output_dir))

    def convert(
        self,
        file_stream: "BinaryIO",
        stream_info: "StreamInfo",
        **kwargs: Any,  # Options to pass to the converter
    ) -> "DocumentConverterResult":
        """Convert docx while handling images in a user-defined way."""
        style_map = kwargs.get("style_map", None)
        # Convert docx to HTML (writing images to disk), then HTML to Markdown.
        return self._html_converter.convert_string(
            mammoth.convert_to_html(
                file_stream,
                style_map=style_map,
                convert_image=self.get_image_converter(**kwargs),
            ).value
        )


def register_converters(markitdown: "MarkItDown", **kwargs):
    """Register MarkItDown plugin."""
    markitdown.register_converter(DocxConverterWithImages(), priority=PRIORITY_SPECIFIC_FILE_FORMAT)
# rest of file
[project.entry-points."markitdown.plugin"]
markitdown_docx_img_plugin = "markitdown_docx_img_plugin"
# rest of file
Then install that package and run
-
Hi,
I've made a plugin that converts docx files and extracts the images into a folder.
However, when I call markitdown, the plugin isn't invoked (although markitdown does recognize it), because markitdown calls the built-in docx converter first.
How can we force a particular plugin to run, or set a priority (or is there another approach)?
Or is the only solution to deactivate the built-in DocxConverter? If so, how?
Thanks a lot for your feedback.
Btw, great work @afourney