Description:
When converting a DOCX file containing nested tables to Markdown using markitdown, the inner table content is discarded in the output. This occurs consistently with specific document structures.
Steps to Reproduce:
Environment:
• Device: MacBook Pro with M3 chip
• Installation:
pip install -e 'packages/markitdown[all]'
Test File:
• [Attach a minimal DOCX file with nested tables (e.g., outer table → inner table → text)].
Command:
markitdown path-to-file.docx > document.md
Observed Result:
• Outer table structure is preserved, but inner table content is missing in document.md.
Expected Result:
• Both outer and inner tables should be rendered in Markdown (e.g., as nested HTML tables or flattened Markdown).
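Before filing a test file, it can help to confirm that the DOCX really contains a table inside a table cell. The following is a minimal sketch using only the standard library, on the assumption that the main document part lives at word/document.xml; the helper names max_table_depth and docx_table_depth are hypothetical and not part of markitdown.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TBL = f"{{{W_NS}}}tbl"


def max_table_depth(element, depth=0):
    """Return the deepest level of <w:tbl> nesting found below `element`."""
    deepest = depth
    for child in element:
        # Entering a table element increases the nesting level by one.
        child_depth = depth + 1 if child.tag == TBL else depth
        deepest = max(deepest, max_table_depth(child, child_depth))
    return deepest


def docx_table_depth(docx_file):
    """Open a DOCX (path or file-like object) and report its table nesting depth."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return max_table_depth(root)
```

A result of 2 or more means at least one table sits inside another, which is the structure this issue describes; 1 means only flat tables, and 0 means no tables at all.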
I've reviewed the conversion pipeline and identified an issue with nested table handling:
Current Behavior:
• DOCX → HTML conversion works correctly (preserves nested tables)
• HTML → Markdown conversion using markdownify fails to properly handle nested table structures
Problem:
• markdownify flattens nested tables into single-level Markdown tables
• This causes:
  • Loss of table hierarchy
  • Misaligned columns
  • Broken formatting in complex documents
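To check which stage loses the hierarchy, one option is to measure table nesting in the intermediate HTML, however it is obtained, and compare that with the Markdown output. This is a rough sketch using the standard library's html.parser; TableDepthParser and html_table_depth are hypothetical names, and the sketch assumes reasonably well-formed HTML in which every table has a closing tag.

```python
from html.parser import HTMLParser


class TableDepthParser(HTMLParser):
    """Track how deeply <table> elements are nested in an HTML string."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of tables currently open
        self.max_depth = 0  # deepest nesting seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Guard against stray closing tags in malformed input.
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def html_table_depth(html):
    """Return the maximum <table> nesting depth in `html` (0 if there are no tables)."""
    parser = TableDepthParser()
    parser.feed(html)
    parser.close()
    return parser.max_depth
```

If the intermediate HTML reports a depth of 2 or more while the final Markdown shows only a single flat table, the hierarchy is being lost in the HTML → Markdown step rather than in the DOCX → HTML step.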
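I have submitted a PR to markdownify. If you run into a similar problem in the meantime, you can add the following check at the top of process_tag in
markdownify/__init__.py
def process_tag(self, node, parent_tags=None):
    # Handle nested tables. parent_tags may be None on the top-level call,
    # so check it before testing membership.
    if node.name == 'table' and parent_tags and 'table' in parent_tags:
        # This table is nested inside another table, so return its raw HTML
        # instead of converting it to Markdown.
        return str(node)
    # ... the rest of the original process_tag body continues unchanged ...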