Pass through links to .md files on GitHub #1200

Open
ewired opened this issue Apr 22, 2025 · 1 comment
Comments

ewired commented Apr 22, 2025

If I use markitdown on a link to a README.md file on GitHub, the output currently includes the entire header and footer navigation of GitHub's web interface, when I really only want the contents of the README.md. Should markitdown convert the GitHub URL into a raw CDN URL and put that into the conversion pipeline, pass the file through unaltered, or keep the current behavior?

mat-0 commented May 4, 2025

I'm not sure this is markitdown's responsibility, and it could set a precedent for loads of URL manipulations. I would suggest a little Python helper function to preprocess your inputs before passing them to the markitdown class. Here's a simple, untested function you could use or start with.

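import requests

def fetch_github_file_content(github_url):
    # Map the github.com "blob" page URL to its raw.githubusercontent.com equivalent.
    # Replace only the first occurrence so a path containing "/blob/" is left alone.
    raw_url = github_url.replace("github.com", "raw.githubusercontent.com", 1).replace("/blob/", "/", 1)
    response = requests.get(raw_url, timeout=30)
    if response.status_code == 200:
        return response.text
    # Report the failure and return None explicitly so callers can check for it.
    print("Failed to fetch data. Status code:", response.status_code)
    return None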
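
If the plain string replacement above turns out to be too loose (it would also rewrite URLs that merely contain "github.com" somewhere), a slightly stricter variant could parse the URL first. This is only a sketch: the function name to_raw_github_url and the example URL are made up for illustration, and it assumes links of the form github.com/<owner>/<repo>/blob/<ref>/<path>. Anything else is returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url):
    """Return the raw.githubusercontent.com URL for a github.com blob link.

    URLs that do not match github.com/<owner>/<repo>/blob/<ref>/<path>
    are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        return url

    # Expected path segments: owner / repo / "blob" / ref / path...
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        return url

    owner, repo, _, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    # Drop the fragment (e.g. "#L10"); it has no meaning for raw content.
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, parts.query, ""))


if __name__ == "__main__":
    # Hypothetical repository, used only to show the shape of the rewrite.
    print(to_raw_github_url("https://github.com/someuser/somerepo/blob/main/README.md"))
```

The returned URL could then be fetched with requests, as in the helper above, or handed to markitdown in place of the original link.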
