English | [简体中文](README_ch.md)
# Layout analysis
- [1. Introduction](#1-Introduction)
- [2. Install](#2-Install)
  - [2.1 Install PaddlePaddle](#21-Install-paddlepaddle)
  - [2.2 Install PaddleDetection](#22-Install-paddledetection)
- [3. Data preparation](#3-Data-preparation)
  - [3.1 English data set](#31-English-data-set)
  - [3.2 More datasets](#32-More-datasets)
- [4. Start training](#4-Start-training)
  - [4.1 Train](#41-Train)
  - [4.2 FGD Distillation training](#42-FGD-Distillation-training)
- [5. Model evaluation and prediction](#5-Model-evaluation-and-prediction)
  - [5.1 Indicator evaluation](#51-Indicator-evaluation)
  - [5.2 Test layout analysis results](#52-Test-layout-analysis-results)
- [6. Model export and inference](#6-Model-export-and-inference)
  - [6.1 Model export](#61-Model-export)
  - [6.2 Model inference](#62-Model-inference)
## 1. Introduction
Layout analysis divides a document image into regions and locates key areas such as text, titles, tables, and figures. The layout analysis algorithm is based on PP-PicoDet, a lightweight detection model from [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection).
<div align="center">
<img src="../docs/layout/layout.png" width="800">
</div>
## 2. Install
### 2.1. Install PaddlePaddle
- **(1) Install PaddlePaddle**
```bash
python3 -m pip install --upgrade pip
# GPU version
python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple
# CPU version
python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple
```
For other environments and requirements, refer to the [installation guide](https://www.paddlepaddle.org.cn/install/quick).
### 2.2. Install PaddleDetection
- **(1) Download the PaddleDetection source code**
```bash
git clone https://github.com/PaddlePaddle/PaddleDetection.git
```
- **(2) Install third-party dependencies**
```bash
cd PaddleDetection
python3 -m pip install -r requirements.txt
```
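After installation, a quick sanity check is to confirm that both packages can be imported. The sketch below uses only the standard library. It assumes PaddlePaddle provides the `paddle` module and that PaddleDetection's source package is named `ppdet`, which is importable when you run from the repository root (or after installing it).

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # `paddle` is PaddlePaddle; `ppdet` is assumed to be the PaddleDetection package.
    for name, found in check_packages(["paddle", "ppdet"]).items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

If `paddle` is found, `python3 -c "import paddle; paddle.utils.run_check()"` additionally checks that PaddlePaddle can run on your device.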
## 3. Data preparation
If you only want to try prediction, you can skip data preparation and download the pre-trained model instead.
### 3.1. English data set
Download the document layout analysis dataset [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) (about 96 GB), which contains 5 classes: `{0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}`.
```bash
# Download data
wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz
# Decompress data
tar -xvf publaynet.tar.gz
```
**Directory structure** after decompression:
```
|-publaynet
  |- test
     |- PMC1277013_00004.jpg
     |- PMC1291385_00002.jpg
     | ...
  |- train.json
  |- train
     |- PMC1291385_00002.jpg
     |- PMC1277013_00004.jpg
     | ...
  |- val.json
  |- val
     |- PMC538274_00004.jpg
     |- PMC539300_00004.jpg
     | ...
```
**Data distribution:**
| File or folder | Description | Count |
| :------------- | :------------- | ------- |
| `train/` | Training set images | 335,703 |
| `val/` | Validation set images | 11,245 |
| `test/` | Test set images | 11,405 |
| `train.json` | Training set annotation file | - |
| `val.json` | Validation set annotation file | - |
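To check that the download and extraction completed, you can count the images in each split and compare them with the table. This is a minimal standard-library sketch; it assumes the dataset was extracted to `./publaynet` with the layout shown above and that all images use the `.jpg` extension.

```python
from pathlib import Path

# Expected image counts per split, copied from the data distribution table.
EXPECTED = {"train": 335_703, "val": 11_245, "test": 11_405}


def count_images(root, splits=("train", "val", "test"), suffix=".jpg"):
    """Count files ending in `suffix` inside each split folder under `root`.

    A missing split folder is reported as 0 images.
    """
    root = Path(root)
    counts = {}
    for split in splits:
        folder = root / split
        if not folder.is_dir():
            counts[split] = 0
            continue
        counts[split] = sum(
            1 for p in folder.iterdir() if p.is_file() and p.suffix.lower() == suffix
        )
    return counts


if __name__ == "__main__":
    # Assumes PubLayNet was extracted to ./publaynet.
    counts = count_images("publaynet")
    for split, expected in EXPECTED.items():
        status = "OK" if counts[split] == expected else "MISMATCH"
        print(f"{split}: {counts[split]} images (expected {expected}) {status}")
```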
**Data Annotation**
Each JSON file contains the annotations for all images in its split, stored as nested dictionaries in COCO format, with the following keys:
- `info`: metadata about the annotation file.
- `licenses`: license information for the annotation file.
- `images`: a list of image records, one per image. For example:
```
{
'file_name': 'PMC4055390_00006.jpg', # file_name
'height': 601, # image height
'width': 792, # image width
'id': 341427 # image id
}
```
- `annotations`: a list of object annotations, one per annotated region. For example:
```
{
'segmentation': [[...]],                    # polygon(s) outlining the object
'area': 60518.099043117836,                 # area of the object
'iscrowd': 0,                               # 0 = single object, 1 = crowd region
'image_id': 341427,                         # id of the image containing the object
'bbox': [50.58, 490.86, 240.15, 252.16],    # bbox [x1, y1, w, h]
'category_id': 1,                           # category id
'id': 3322348                               # annotation id
}
```
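Because the files follow the COCO layout, the standard library is enough to inspect them. The minimal sketch below (not part of PaddleOCR or PaddleDetection) loads an annotation file, groups boxes by image file name, counts objects per category, and converts `bbox` from `[x1, y1, w, h]` to corner form `[x1, y1, x2, y2]`. It assumes the file also carries COCO's usual `categories` list for mapping ids to names, and falls back to the raw `category_id` if that list is absent.

```python
import json
import os
from collections import Counter, defaultdict


def load_coco(path):
    """Load a COCO-style annotation file such as PubLayNet's val.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def xywh_to_xyxy(bbox):
    """Convert a COCO box [x1, y1, w, h] to corner form [x1, y1, x2, y2]."""
    x1, y1, w, h = bbox
    return [x1, y1, x1 + w, y1 + h]


def index_annotations(coco):
    """Group boxes by image file name and count objects per category."""
    # Use names from the `categories` list when the file provides one;
    # otherwise fall back to the raw category_id.
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    per_image = defaultdict(list)
    per_category = Counter()
    for ann in coco["annotations"]:
        label = names.get(ann["category_id"], ann["category_id"])
        per_image[files[ann["image_id"]]].append((label, xywh_to_xyxy(ann["bbox"])))
        per_category[label] += 1
    return per_image, per_category


if __name__ == "__main__":
    path = "publaynet/val.json"  # adjust to where PubLayNet was extracted
    if os.path.exists(path):
        per_image, per_category = index_annotations(load_coco(path))
        print(f"{len(per_image)} annotated images")
        print(per_category.most_common())
```

For the example annotation above, `xywh_to_xyxy([50.58, 490.86, 240.15, 252.16])` gives approximately `[50.58, 490.86, 290.73, 743.02]` (up to floating-point rounding).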
### 3.2. More datasets
We also provide download links for datasets such as CDLA (Chinese layout analysis) and TableBank (table layout analysis). After converting their annotations to the JSON format described above, you can train on them in the same way.
| Dataset | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [cTDaR2019_cTDaR](https://cndplab-founder.github.io/cTDaR2019/) | For table detection (TRACKA) and table recognition (TRACKB). Images include historical documents (file names beginning with cTDaR_t0, such as cTDaR_t00872.jpg) and modern documents (file names beginning with cTDaR_t1, such as cTDaR_t10482.jpg). |
| [IIIT-AR-13K](http://cvit.iiit.ac.in/usodi/iiitar13k.php) | Dataset built by manually annotating figures and pages from publicly available annual reports, with 5 categories: table, figure, natural image, logo, and signature. |
| [TableBank](https://github.com/doc-analysis/TableBank) | Large-scale dataset for table detection and recognition, covering Word and LaTeX documents. |
| [CDLA](https://github.com/buptlihang/CDLA) | Chinese document layout analysis dataset for Chinese academic papers, with 10 categories: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation. |
| [DocBank](https://github.com/doc-analysis/DocBank) | Large-scale dataset (500K document pages) for document layout analysis, built with weak supervision, with 12 categories: Author, Caption, Date, Equation, Figure, Footer, List, Paragraph, Reference, Section, Table, Title. |
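Converting another dataset mostly means producing the same `images` and `annotations` structure, plus COCO's `categories` list that maps ids to names. The sketch below is illustrative only: its input is a hypothetical intermediate format (a list of dicts with `file_name`, `width`, `height`, and `boxes` given as `(label, x1, y1, x2, y2)` tuples), so each dataset still needs its own small parser to produce those records. `to_coco` and `save_coco` are hypothetical helper names, not PaddleOCR APIs.

```python
import json


def to_coco(records, class_names):
    """Build a COCO-style dict from simple per-image records.

    `records` is a hypothetical intermediate format: a list of dicts with
    `file_name`, `width`, `height`, and `boxes`, where each box is
    (label, x1, y1, x2, y2) in pixel corner coordinates.
    """
    # Category ids start at 1, following the usual COCO convention.
    cat_ids = {name: i + 1 for i, name in enumerate(class_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name} for name, cid in cat_ids.items()],
    }
    ann_id = 1
    for img_id, rec in enumerate(records, start=1):
        coco["images"].append({"id": img_id, "file_name": rec["file_name"],
                               "width": rec["width"], "height": rec["height"]})
        for label, x1, y1, x2, y2 in rec["boxes"]:
            w, h = x2 - x1, y2 - y1
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[label],
                "bbox": [x1, y1, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,  # box area, since no finer outline is available
                "iscrowd": 0,
                # The box written as a rectangle polygon, as a placeholder outline.
                "segmentation": [[x1, y1, x2, y1, x2, y2, x1, y2]],
            })
            ann_id += 1
    return coco


def save_coco(coco, path):
    """Write the COCO dict to a JSON file, keeping non-ASCII file names readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(coco, f, ensure_ascii=False)
```

Writing the box as a rectangle polygon in `segmentation` is a common placeholder when a dataset provides only boxes; drop it if your training setup does not use segmentation.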
## 4. Start training
Training, evaluation, and prediction scripts are provided; this section uses the PubLayNet pre-trained model as an example.
If you want to skip training and go straight to model evaluation, prediction, dynamic-to-static export, and inference, download the provided PubLayNet pre-trained model and skip this part.
```bash
mkdir pretrained_model
cd pretrained_model
# Download the PubLayNet pre-trained model (for evaluation, prediction, and export)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams
# Download the PubLayNet inference model (for inference)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
```
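The `.tar` inference model has to be extracted before use (`tar -xf picodet_lcnet_x1_0_fgd_layout_infer.tar` in the shell). If you prefer to script this step, here is a minimal sketch using Python's standard `tarfile` module. `extract_model` is a hypothetical helper name, and the sketch makes no assumption about the file names inside the archive.

```python
import tarfile
from pathlib import Path


def extract_model(tar_path, dest="."):
    """Extract a downloaded model archive and return the paths of the extracted files."""
    dest = Path(dest)
    root = dest.resolve()
    with tarfile.open(tar_path) as tar:
        members = tar.getmembers()
        # Refuse members that would be written outside `dest` (path traversal).
        for m in members:
            target = (dest / m.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {m.name}")
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: also reject unsafe links.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(str(dest / m.name) for m in members if m.isfile())


if __name__ == "__main__":
    archive = "picodet_lcnet_x1_0_fgd_layout_infer.tar"
    if Path(archive).exists():
        for path in extract_model(archive):
            print(path)
```

Run it from the `pretrained_model` directory so the files are extracted next to the downloaded weights.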
If your test images are Chinese documents, you can use the model pre-trained on the Chinese CDLA dataset, which recognizes 10 types of document regions: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, and Equation. Download the trained model and inference model of `picodet_lcnet_x1_0_fgd_layout_cdla` from the [layout analysis model list](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md). If you only need to detect table regions, use the model pre-trained on the table dataset: download the trained model and inference model of `picodet_lcnet_x1_0_fgd_layout_table` from the [layout analysis model list](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models
