Would it be possible to make the Magika dependency optional (i.e., pulled in as an extra)? I'm trying to use MarkItDown in the browser via Pyodide and Magika's dependency on ONNX is causing trouble.
My understanding is that Magika is used to determine the stream type, presumably for cases where the application doesn't provide it (or provides incorrect information by accident). In my case, I'd be happy to trade away that flexibility in exchange for dropping the Magika / ONNX dependency.
I tested with a forked repo where I removed `self._magika.identify_stream(file_stream)` from `_get_stream_info_guesses()`, and I was able to convert PDFs in the browser.
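A rough sketch of how "Magika as an extra" could look is below. It is hypothetical: `guess_stream_types`, `Guess` and `_get_magika` are illustrative names, not MarkItDown's actual internals. The idea is a guarded import. Hints supplied by the caller (here just an extension) are always used, and content sniffing only runs when the optional package is importable. The Magika branch assumes a recent release where `identify_stream()` exists (as in the line quoted above) and the result exposes `.output.mime_type`.

```python
from __future__ import annotations

import functools
import io
import mimetypes
from dataclasses import dataclass
from typing import BinaryIO, Optional

# Optional extra: Magika pulls in onnxruntime, which may be unavailable
# (e.g. under Pyodide) or too heavy (e.g. in an AWS Lambda package).
try:
    from magika import Magika
except ImportError:
    Magika = None


@dataclass
class Guess:
    """Illustrative stand-in for a stream-info guess (not MarkItDown's StreamInfo)."""
    mimetype: Optional[str] = None
    extension: Optional[str] = None
    source: str = "hint"


@functools.lru_cache(maxsize=1)
def _get_magika():
    # Load the model once and reuse it; only called when Magika is installed.
    return Magika()


def guess_stream_types(stream: BinaryIO, extension: Optional[str] = None) -> list[Guess]:
    guesses: list[Guess] = []
    if extension:
        # Cheap, dependency-free guess from the caller-provided extension.
        mimetype, _ = mimetypes.guess_type("file" + extension)
        guesses.append(Guess(mimetype=mimetype, extension=extension, source="extension"))
    if Magika is not None:
        pos = stream.tell()
        try:
            result = _get_magika().identify_stream(stream)
        finally:
            stream.seek(pos)  # leave the stream where we found it for the converter
        # Assumption: recent Magika releases expose the prediction as result.output.
        guesses.append(Guess(mimetype=result.output.mime_type, source="magika"))
    return guesses


if __name__ == "__main__":
    print(guess_stream_types(io.BytesIO(b"%PDF-1.7\n"), extension=".pdf"))
```

With the extra missing, a caller that passes an extension still gets a usable guess, and a caller that passes nothing simply gets no guesses rather than an import error.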
👍 We're trying to use this in an AWS Lambda, and the >100 MB runtime dependency makes this more challenging than it needs to be. Being able to specify the file type by extension or MIME type, or to detect it with something lighter-weight like the Unix `file` utility, would be nice.
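For the lighter-weight route, `file`-style detection of the common document formats needs only the standard library: check a few magic-byte prefixes, peek inside ZIP containers to tell the Office Open XML formats apart, and fall back to the filename extension. This is a minimal sketch under those assumptions. `sniff_mimetype` is a hypothetical helper, the signature table covers only a handful of formats, and it is not how MarkItDown currently detects types.

```python
import io
import mimetypes
import zipfile
from typing import BinaryIO, Optional

# (prefix, mimetype) pairs for a few well-known formats; illustrative, not exhaustive.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

# .docx/.xlsx/.pptx are ZIP archives; one characteristic member identifies each.
_OOXML_MARKERS = {
    "word/document.xml": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xl/workbook.xml": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "ppt/presentation.xml": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def sniff_mimetype(stream: BinaryIO, filename: Optional[str] = None) -> Optional[str]:
    """Guess a MIME type from magic bytes, falling back to the filename extension."""
    pos = stream.tell()
    try:
        head = stream.read(16)
        for prefix, mimetype in _SIGNATURES:
            if head.startswith(prefix):
                return mimetype
        if head.startswith(b"PK\x03\x04"):
            stream.seek(pos)
            try:
                # ZipFile does not close a file object it was handed.
                with zipfile.ZipFile(stream) as zf:
                    names = set(zf.namelist())
            except zipfile.BadZipFile:
                names = set()
            for marker, mimetype in _OOXML_MARKERS.items():
                if marker in names:
                    return mimetype
            return "application/zip"
    finally:
        stream.seek(pos)  # always rewind so the converter reads from the start
    if filename:
        return mimetypes.guess_type(filename)[0]
    return None


if __name__ == "__main__":
    print(sniff_mimetype(io.BytesIO(b"%PDF-1.7\n")))
```

Unlike a learned model, this only recognizes what is in the table, so anything else falls through to the extension or `None`; that trade-off is the one the comment above is asking for.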