bug: markitdown fails with 403 Forbidden when converting Zhihu article URL #1196

Open
cubxxw opened this issue Apr 21, 2025 · 0 comments

When attempting to convert a Zhihu article using the CLI tool, markitdown throws an unhandled exception due to a 403 Forbidden response from the target URL.

Reproduction Steps:

markitdown https://zhuanlan.zhihu.com/p/11654788270 > test.md

Error Traceback:

Traceback (most recent call last):
  File "/Users/xiongxinwei/Library/Caches/pypoetry/virtualenvs/telepace-server-WT4oou3h-py3.12/bin/markitdown", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File ".../markitdown/__main__.py", line 197, in main
    result = markitdown.convert(
             ^^^^^^^^^^^^^^^^^^^
  File ".../markitdown/_markitdown.py", line 271, in convert
    return self.convert_uri(source, stream_info=stream_info, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../markitdown/_markitdown.py", line 443, in convert_uri
    response.raise_for_status()
  File ".../site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://zhuanlan.zhihu.com/p/11654788270

Expected Behavior:

The tool should either:

  • Successfully fetch and convert the article if access is allowed, or
  • Gracefully handle 403 responses with a clear error message indicating access is denied.

Environment:

  • OS: macOS
  • Python: 3.12
  • Tool version: latest from repo
  • Installed via: Poetry

Possible Cause:

Zhihu may be blocking automated requests. It might be necessary to:

  • Add custom headers (e.g., a user-agent string) to mimic a browser
  • Handle HTTP errors more gracefully
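
A minimal sketch of both suggestions, written against Python's standard-library urllib rather than the requests calls shown in the traceback, so it is not markitdown's own code. The names BROWSER_HEADERS, describe_http_error and fetch_html are hypothetical, the header values are an assumption, and a browser-like User-Agent may still not be enough if Zhihu also requires cookies or a login.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like headers; the exact values are an assumption
# and are not guaranteed to satisfy any particular site.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
}


def describe_http_error(status_code: int, url: str) -> str:
    """Turn an HTTP status code into a short, readable message."""
    if status_code == 403:
        return (
            f"Access denied (403 Forbidden) for {url}; "
            "the site may be blocking automated requests."
        )
    return f"HTTP error {status_code} while fetching {url}."


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Fetch a page with browser-like headers and readable HTTP errors."""
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Fall back to UTF-8 when the server declares no charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as error:
        # Replace the raw HTTP exception with a clear, single-line message.
        raise RuntimeError(describe_http_error(error.code, url)) from error
```

A caller such as a CLI entry point could catch the RuntimeError, print its message, and exit with a non-zero status instead of showing a full traceback.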

@cubxxw changed the title from "markitdown fails with 403 Forbidden when converting Zhihu article URL" to "bug: markitdown fails with 403 Forbidden when converting Zhihu article URL" on Apr 21, 2025