Pass through links to .md files on GitHub #1200

ewired · 2025-04-22T01:28:22Z

If I use markitdown on a link to a README.md file on GitHub, it currently includes the entire header and footer navigation of GitHub's web interface, when I really only want the contents of the README.md. Should markitdown be able to convert the GitHub URL into a raw CDN URL and put it into the conversion pipeline, pass it through unaltered, or keep the current behavior?

mat-0 · 2025-05-04T10:52:25Z

I'm not sure it is markitdown's responsibility to do this, and it could set a precedent for all sorts of URL manipulation. I would suggest a small helper function in Python that preprocesses your input before you pass it to the MarkItDown class. Here's a simple, untested function you could use or start from:

```python
import requests

def fetch_github_file_content(github_url):
    # Turn a github.com "blob" page URL into the raw.githubusercontent.com URL for the file
    raw_url = github_url.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/")
    response = requests.get(raw_url, timeout=30)
    if response.status_code == 200:
        return response.text
    print("Failed to fetch data. Status code:", response.status_code)
    return None
```
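The plain string replacement above rewrites every occurrence of "github.com" and "/blob/" anywhere in the URL. A slightly stricter sketch, assuming the usual `https://github.com/<owner>/<repo>/blob/<ref>/<path>` layout, only rewrites URLs that actually match that shape and leaves everything else alone. The function name `to_raw_github_url` is hypothetical, not part of markitdown:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url: str) -> Optional[str]:
    """Map a github.com 'blob' URL to its raw.githubusercontent.com equivalent.

    Hypothetical helper (not part of markitdown). Assumes the layout
    https://github.com/<owner>/<repo>/blob/<ref>/<path> and returns None
    for any URL that does not match it.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    segments = parts.path.strip("/").split("/")
    # Need owner, repo, the literal "blob", a ref, and at least one path segment
    if len(segments) < 5 or segments[2] != "blob":
        return None
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    # Query strings and fragments (e.g. "#readme") are dropped
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))
```

Returning None rather than the original URL lets the caller decide whether to fall back to the current behavior (passing the github.com page URL to markitdown unchanged) or to fetch the raw file instead.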
