The English here is typed entirely by hand! It summarizes and paraphrases the original paper, so some spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments! This post is written as personal notes, so read with that in mind.
1. Takeaways
(1) There really are far too many ICD categories
2. Section-by-Section Close Reading of the Paper
2.1. Abstract
①Challenge of ICD classification: large label space
②To address this, they proposed the Deep Iterative Learning Model (DILM-ICD)
2.2. Introduction
①Diagnosis codes: about 13,500 in ICD-9-CM and about 70,000 in ICD-10-CM.
②Challenges: label imbalance and the very large number of categories
2.3. Related Work
①Reviews both traditional machine learning methods and deep learning methods
2.4. Methodology
2.4.1. Problem Formulation
①Clinical word sequence: $X = \{x_1, x_2, \dots, x_n\}$
②ICD diagnosis: predict a label set $Y \subseteq \mathcal{L}$, where $\mathcal{L}$ is the label space
③BCE loss: $\mathcal{L}_{BCE} = -\sum_{i=1}^{|\mathcal{L}|}\big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\big]$, where $y_i$ denotes the real label and $\hat{y}_i$ the predicted probability (a minimal PyTorch check follows)
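As a quick sanity check of the objective above, here is a minimal PyTorch sketch; the batch size and label count are illustrative only, and `BCEWithLogitsLoss` is the standard fused sigmoid + BCE rather than anything specific to this paper.

```python
import torch
import torch.nn as nn

# Illustrative shapes: a batch of 8 notes, |L| = 8929 ICD codes (MIMIC-III-full).
batch_size, num_labels = 8, 8929

logits = torch.randn(batch_size, num_labels)   # raw model scores before sigmoid
targets = torch.zeros(batch_size, num_labels)  # multi-hot real labels y_i
targets[0, [3, 42]] = 1.0                      # e.g. note 0 carries two ICD codes

# BCEWithLogitsLoss fuses the sigmoid with the BCE sum above for numerical stability.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)
print(loss.item())
```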
2.4.2. DILM-ICD Architecture
①The overall architecture of DILM-ICD: (figure in the original paper, chaining the EMR processing, attention, and iteration modules described below)
2.4.3. EMR Processing Module
①Pretrain word embeddings with the continuous bag-of-words (CBOW) model and use them to initialize the embedding matrix $E = (e_1, e_2, \dots, e_n)$
②Apply layer normalization to $E$
③Capture context with a BiLSTM to get the token representations $H = (h_1, h_2, \dots, h_n)$ (a minimal sketch of this chain follows)
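A minimal PyTorch sketch of this embedding → LayerNorm → BiLSTM chain, assuming CBOW vectors pretrained elsewhere (e.g. with gensim); the vocabulary size and all dimensions here are placeholders, not the paper's values.

```python
import torch
import torch.nn as nn

class EMRProcessing(nn.Module):
    """Sketch of the EMR processing module: CBOW-pretrained embeddings
    -> LayerNorm -> BiLSTM. Dimensions are assumptions, not from the paper."""
    def __init__(self, vocab_size=50000, emb_dim=100, hidden_dim=512,
                 pretrained=None):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        if pretrained is not None:                 # CBOW vectors, e.g. from gensim
            self.embed.weight.data.copy_(pretrained)
        self.norm = nn.LayerNorm(emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, token_ids):                  # (batch, seq_len)
        e = self.norm(self.embed(token_ids))       # (batch, seq_len, emb_dim)
        h, _ = self.bilstm(e)                      # (batch, seq_len, 2*hidden_dim)
        return h

h = EMRProcessing()(torch.randint(0, 50000, (2, 16)))
print(h.shape)  # torch.Size([2, 16, 1024])
```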
2.4.4. Attention Module
①“To learn better representations of the labels, instead of randomly initializing the label weight matrix, we use the title information of the ICD codes to compute a label vector for each ICD code. Specifically, we combine the ICD code itself with its corresponding ICD title to form a comprehensive ICD description. This description exploits both the hierarchical information present in the ICD code and the disease description contained in the ICD title.” (What do these titles actually look like, though?)
②Pretrain the words of the ICD description with skip-gram and encode the description in a normalized (averaged) way: $v = \frac{1}{m}\sum_{j=1}^{m} w_j$, where $m$ is the length of the ICD description and $w_j$ is the word vector of its $j$-th word
③Stack the label vectors into the ICD description matrix $V = (v_1, v_2, \dots, v_{|\mathcal{L}|})$
④Cross attention between the clinical record and the ICD descriptions: the label vectors in $V$ serve as queries over the token representations $H$, producing one label-specific text vector per code
⑤A linear layer and a residual block then produce the module's output (see the sketch below)
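A hedged sketch of how steps ②–⑤ could fit together, using `nn.MultiheadAttention` for the cross attention; all projection sizes, and the exact placement of the residual connection, are my assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class LabelCrossAttention(nn.Module):
    """Sketch of the attention module: ICD-description vectors act as the
    queries of a cross attention over the BiLSTM token states, followed by
    a linear layer with a residual connection."""
    def __init__(self, text_dim=1024, label_dim=100, d_model=512, num_heads=1):
        super().__init__()
        self.q_proj = nn.Linear(label_dim, d_model)
        self.kv_proj = nn.Linear(text_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, H, V):
        # H: (batch, seq_len, text_dim) token states; V: (num_labels, label_dim)
        q = self.q_proj(V).unsqueeze(0).expand(H.size(0), -1, -1)
        kv = self.kv_proj(H)
        ctx, _ = self.attn(q, kv, kv)   # one label-specific vector per ICD code
        return ctx + self.out(ctx)      # linear layer + residual block

out = LabelCrossAttention()(torch.randn(2, 16, 1024), torch.randn(50, 100))
print(out.shape)  # torch.Size([2, 50, 512])
```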
2.4.5. Iteration Module
①Iteration framework: $\hat{y}^{(t)} = f\big(X, \hat{y}^{(t-1)}\big)$, where $t$ denotes the $t$-th iteration and $f$ denotes the model
②The initial score: $\hat{y}^{(0)}$
③The predicted label scores: $\hat{y} = \sigma(W h + b)$, where $W$ and $b$ are learnable parameters
④The text feature vector: $h = \mathrm{ReLU}(W_h z)$, where $\mathrm{ReLU}$ denotes the rectified linear activation
⑤The iteration: the update rule from ① runs for a fixed number of rounds, each round feeding the previous scores back into the model (a toy sketch follows)
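A toy sketch of this iterative refinement loop under the reconstruction above; `ToyRefiner` is a hypothetical stand-in for the full model $f$, and initializing the scores to zero is my assumption.

```python
import torch
import torch.nn as nn

def iterate_scores(model, x, num_labels, T=3):
    """Run the iteration framework: y^(t) = f(x, y^(t-1)) for T rounds."""
    y = torch.zeros(x.size(0), num_labels)       # y^(0): all-zero start (assumption)
    for _ in range(T):
        y = model(x, y)                          # condition on previous scores
    return y

class ToyRefiner(nn.Module):
    """Hypothetical stand-in for f: mixes a text feature with the previous scores."""
    def __init__(self, text_dim=512, num_labels=50):
        super().__init__()
        self.text_head = nn.Linear(text_dim, num_labels)
        self.score_head = nn.Linear(num_labels, num_labels)

    def forward(self, x, y_prev):
        h = torch.relu(self.text_head(x))        # ReLU'd text feature vector
        return torch.sigmoid(h + self.score_head(y_prev))

scores = iterate_scores(ToyRefiner(), torch.randn(2, 512), num_labels=50)
print(scores.shape)  # torch.Size([2, 50])
```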
2.5. Experiments
2.5.1. Datasets and Evaluation Metrics
①Dataset: MIMIC-III
②Samples: 52,722
③Subsets of MIMIC-III: MIMIC-III-full with 8,929 unique ICD codes and MIMIC-III-50 with the 50 most common codes
2.5.2. Baselines
①Not listed here; they all appear in the comparison tables
2.5.3. Implementation and Hyper-parameter Tuning
①Learning rate: 0.001 with AdamW optimizer
②Batch size: 8
③Max text length: 4,000
④The dimension of embedding
⑤BiLSTM hidden dimension: 512 on MIMIC-III-full and 256 on MIMIC-III-50
⑥Attention heads: 1 on MIMIC-III-full and 4 on MIMIC-III-50 (gathered into a config sketch below)
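For reference, the reported settings gathered into one Python dict; the key names are mine, and only the values come from the notes above (the embedding dimension was not captured, so it is omitted).

```python
# Hyper-parameters as reported in the notes above; key names are my own labels.
config = {
    "MIMIC-III-full": {
        "lr": 1e-3, "optimizer": "AdamW", "batch_size": 8,
        "max_text_length": 4000, "lstm_hidden_dim": 512, "attention_heads": 1,
    },
    "MIMIC-III-50": {
        "lr": 1e-3, "optimizer": "AdamW", "batch_size": 8,
        "max_text_length": 4000, "lstm_hidden_dim": 256, "attention_heads": 4,
    },
}
```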
2.6. Experimental Results
2.6.1. Main Results
①Comparison table on MIMIC-III-full: (table in the original paper)
②Comparison table on MIMIC-III-50: (table in the original paper)
2.6.2. Effectiveness of the Iteration Module
①Performance across iteration rounds: (figure in the original paper)
2.7. Conclusions
~