'charmap' codec can't encode characters in position 0-2: character maps to <undefined> #313

Closed
@Wonder-donbury

I was testing the CLI on Korean PDF documents and got the following error.

PS C:\Users\donghwan.lee\Documents\markitdown> markitdown test.pdf > document.md
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\donghwan.lee\AppData\Local\Programs\Python\Python311\Scripts\markitdown.exe\__main__.py", line 7, in <module>
  File "C:\Users\donghwan.lee\AppData\Local\Programs\Python\Python311\Lib\site-packages\markitdown\__main__.py", line 43, in main
    print(result.text_content)
  File "C:\Users\donghwan.lee\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

I have also tried the workaround of setting PYTHONIOENCODING=utf-8, but it doesn't work.
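
For reference, here is a minimal sketch of what seems to be going on, based only on the traceback above: print() encodes the text with the encoding of stdout (cp1252 here), and Korean characters have no mapping in that code page. The helper names encode_for_console and write_utf8 are hypothetical and are not part of markitdown; the second one shows one possible way to sidestep the stream's text encoding by writing UTF-8 bytes directly.

```python
import sys


def encode_for_console(text: str, encoding: str = "cp1252") -> bytes:
    """Encode text the way a text stream using `encoding` would.

    Hypothetical helper for illustration. It raises UnicodeEncodeError
    when a character has no mapping in the target code page.
    """
    return text.encode(encoding)


def write_utf8(text: str, stream=None) -> None:
    """Write text as UTF-8 bytes to a binary stream.

    Hypothetical helper. Defaults to sys.stdout.buffer, which bypasses
    the text-layer encoding that print() would otherwise use.
    """
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(text.encode("utf-8"))
    stream.write(b"\n")


# Example usage (assumed sample text, not taken from the PDF in question):
#   encode_for_console("한국어")   # raises UnicodeEncodeError under cp1252
#   write_utf8("한국어")           # emits UTF-8 bytes regardless of code page
```

This is only a sketch under the assumption that the failing call is the print(result.text_content) line shown in the traceback; whether redirected output ends up as UTF-8 on disk may also depend on how the shell handles the redirect.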
