
[Feature Request] Magika Dependency Optional #1234



Open
RKeelan opened this issue May 4, 2025 · 2 comments

Comments

@RKeelan

RKeelan commented May 4, 2025

Would it be possible to make the Magika dependency optional (i.e., pulled in as an extra)? I'm trying to use MarkItDown in the browser via Pyodide, and Magika's dependency on ONNX is causing trouble.
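
A minimal sketch of the optional-dependency pattern being requested, assuming Magika moves behind a hypothetical `magika` extra (the extra name and these helper functions are illustrations, not MarkItDown's actual code). The package is located with `importlib.util.find_spec` and only imported when detection is actually needed, so an environment like Pyodide never has to load ONNX.

```python
import importlib.util
from typing import Any, Optional

# Hypothetical extra name; MarkItDown's real packaging may differ.
_MAGIKA_EXTRA_HINT = "pip install 'markitdown[magika]'"


def magika_available() -> bool:
    """Return True if the magika package is installed, without importing it."""
    return importlib.util.find_spec("magika") is not None


def load_magika() -> Optional[Any]:
    """Import magika lazily; return a Magika instance, or None if it is not installed."""
    if not magika_available():
        return None
    import magika  # deferred so the ONNX runtime only loads when detection is used

    return magika.Magika()


def require_magika() -> Any:
    """Like load_magika(), but fail with an install hint instead of returning None."""
    instance = load_magika()
    if instance is None:
        raise ImportError(
            f"Content-based type detection needs the optional extra: {_MAGIKA_EXTRA_HINT}"
        )
    return instance
```

Callers that always pass the stream type would never reach `require_magika()`, which is the trade-off described in the next paragraph.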

My understanding is that Magika is used to determine the stream type, presumably in cases where the application doesn't provide it (or accidentally provides incorrect information). In my case, I'd be happy to trade away that flexibility in exchange for dropping the Magika / ONNX dependency.

I tested with a forked repo where I removed `self._magika.identify_stream(file_stream)` from `_get_stream_info_guesses()`, and I was able to convert PDFs in the browser.
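
Roughly what that change amounts to, as a hedged sketch: caller-supplied hints are kept as guesses, and content-based detection runs only if a detector is available. `StreamGuess`, `get_stream_info_guesses`, and the `identify` parameter are hypothetical stand-ins, not the real signature of MarkItDown's `_get_stream_info_guesses()`.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Callable, List, Optional


@dataclass
class StreamGuess:
    """Hypothetical stand-in for MarkItDown's stream-info object."""
    mimetype: Optional[str] = None
    extension: Optional[str] = None


def get_stream_info_guesses(
    stream: BinaryIO,
    base_guess: StreamGuess,
    identify: Optional[Callable[[BinaryIO], StreamGuess]] = None,
) -> List[StreamGuess]:
    guesses: List[StreamGuess] = []
    # Keep whatever the application already said about the stream.
    if base_guess.mimetype or base_guess.extension:
        guesses.append(base_guess)
    # Content sniffing is optional: with no detector (e.g. Magika absent), skip it.
    if identify is not None:
        position = stream.tell()
        try:
            guesses.append(identify(stream))
        finally:
            stream.seek(position)  # restore the position the converters expect
    return guesses


# Usage: the caller supplies the type, so no detector (and no ONNX) is involved.
pdf = io.BytesIO(b"%PDF-1.7\n...")
print(get_stream_info_guesses(pdf, StreamGuess(mimetype="application/pdf", extension=".pdf")))
```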

@josh-levinson-scratchpad

👍 We're trying to use this in an AWS Lambda, and the >100 MB runtime dependency makes this more challenging than it needs to be. Being able to specify a file type based on extension, MIME type, or something lighter-weight like `file` would be nice.
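
One way to read the `file`-style suggestion, sketched under the assumption that a handful of well-known signatures is enough: trust the filename extension via the standard library's `mimetypes`, then fall back to checking leading magic bytes. The signature table and `sniff_mimetype` are illustrative, not part of MarkItDown, and this approach cannot tell apart formats that share a container (every .docx/.xlsx/.pptx file starts with the ZIP signature).

```python
import mimetypes
from typing import Optional

# A tiny, hand-picked signature table (illustrative, far from exhaustive).
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # also the container for OOXML formats
]


def sniff_mimetype(head: bytes, filename: Optional[str] = None) -> Optional[str]:
    """Guess a MIME type from a filename, then from the first bytes of a stream."""
    # The extension wins: it is what the caller told us, and it disambiguates ZIP-based formats.
    if filename:
        guessed, _encoding = mimetypes.guess_type(filename)
        if guessed:
            return guessed
    # Otherwise compare the leading bytes against known signatures, like `file` does.
    for magic, mimetype in _SIGNATURES:
        if head.startswith(magic):
            return mimetype
    return None
```

A lookup like this costs only the standard library, at the price of missing anything not in the table (plain-text formats such as CSV or Markdown have no signature at all).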

@RKeelan
Author

RKeelan commented May 26, 2025

I forked the repo and removed the Magika dependency, and it seems to work okay.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants