
Support for .doc extensions #23

Open
mictab opened this issue Dec 14, 2024 · 3 comments
Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · open for contribution (Invites open-source developers to contribute to the project.)

Comments

@mictab

mictab commented Dec 14, 2024

Are you planning to offer .doc support in addition to the current .docx support?

@gagb gagb added the enhancement, good first issue, and open for contribution labels on Dec 14, 2024
@aviral-bhardwaj

created PR #36

@wfnian

wfnian commented Dec 19, 2024

Patiently waiting for good news.

@Shane32

Shane32 commented Apr 30, 2025

Just wanted to add a data point: I counted files by type across our server, where I was considering using markitdown:

  1. PDF - 1,150k files
  2. JPG - 243k files
  3. DOC - 112k files
  4. XLS - 70k files
  5. XLSX - 69k files
  6. DOCX - 46k files
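For anyone who wants to gather similar numbers for their own share, here is a minimal sketch of one way to tally files by extension. It is an assumption about how such a count could be done, not the commenter's actual method, and `count_extensions` is a hypothetical helper name. It walks the tree with `os.walk` and groups extensions case-insensitively, so `.DOC` and `.doc` are counted together.

```python
import os
from collections import Counter


def count_extensions(root: str) -> Counter:
    """Tally files under `root` by lowercase extension (e.g. '.doc', '.docx')."""
    counts: Counter = Counter()
    # os.walk silently skips directories it cannot read (onerror defaults to None).
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext returns '' as the extension for names without one.
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
    return counts
```

Usage would look like `count_extensions("/path/to/share").most_common(6)`, which returns the six most frequent extensions with their counts. Note that this counts by file name only; a `.doc` that is really a renamed `.docx` (or vice versa) would be tallied under its extension.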

I think this demonstrates just how important DOC file handling is, even in 2025.

On a side note, the list above doesn't show how many of the PDFs are generated content versus scanned documents. Most are probably scanned, and 95% of the scanned PDFs should be searchable images.

Other related PRs / discussions not already linked above:

5 participants