mark-down: unable to convert pdf file #275

Closed
@PranshuJain97

Description

I'm using this command: markitdown path to .pdf > document.md, but I'm getting the error below.

Traceback (most recent call last):
  File "/opt/anaconda3/bin/markitdown", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/markitdown/__main__.py", line 42, in main
    result = markitdown.convert(args.filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/markitdown/_markitdown.py", line 1094, in convert
    return self.convert_local(source, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/markitdown/_markitdown.py", line 1114, in convert_local
    return self._convert(path, extensions, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/markitdown/_markitdown.py", line 1255, in _convert
    raise FileConversionException(
markitdown._markitdown.FileConversionException: Could not convert '/Users/pranshujain/Desktop/python/markitdown/src/test.pdf' to Markdown. File type was recognized as ['.pdf', '.pdf']. While converting the file, the following error was encountered:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/markitdown/_markitdown.py", line 1239, in _convert
    res = converter.convert(local_path, **_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/markitdown/_markitdown.py", line 490, in convert
    text_content=pdfminer.high_level.extract_text(local_path),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pdfminer/high_level.py", line 169, in extract_text
    for page in PDFPage.get_pages(
  File "/opt/anaconda3/lib/python3.11/site-packages/pdfminer/pdfpage.py", line 154, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pdfminer/pdfdocument.py", line 748, in __init__
    raise PDFSyntaxError("No /Root object! - Is this really a PDF?")
pdfminer.pdfparser.PDFSyntaxError: No /Root object! - Is this really a PDF?
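
The final error is raised by pdfminer rather than by markitdown itself, and it may mean that test.pdf is truncated, corrupted, or not actually a PDF (for example, an HTML page saved with a .pdf extension). A minimal sketch of a sanity check, using only the Python standard library, is below. The helper names looks_like_pdf and has_eof_marker and the 1024-byte probe size are hypothetical choices for illustration and are not part of markitdown or pdfminer. Passing both checks does not guarantee that the file will parse.

```python
from pathlib import Path


def looks_like_pdf(path, probe_bytes=1024):
    """Return True if a '%PDF-' header appears in the first probe_bytes bytes.

    Hypothetical helper: a valid PDF normally starts with this header.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    return b"%PDF-" in head


def has_eof_marker(path, probe_bytes=1024):
    """Return True if '%%EOF' appears in the last probe_bytes bytes.

    Hypothetical helper: a missing marker often points to a truncated file.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as f:
        f.seek(max(0, size - probe_bytes))
        tail = f.read()
    return b"%%EOF" in tail
```

Calling looks_like_pdf("test.pdf") and has_eof_marker("test.pdf") on the file from the traceback would show whether the header or the end-of-file marker is missing. If either returns False, the file itself is the likely cause rather than the conversion command.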
