Broca's aphasia is a type of aphasia characterized by non-fluent, effortful, and fragmented speech production with relatively good comprehension. Since traditional aphasia treatment methods are often time-consuming, labour-intensive, and do not reflect real-world conversations, applying natural language processing approaches such as Large Language Models (LLMs) could potentially contribute to improving existing treatment approaches. To this end, we explore the use of sequence-to-sequence LLMs for completing fragmented Broca's aphasic sentences. We first generate synthetic Broca's aphasic data using a rule-based system designed to mirror the linguistic characteristics of Broca's aphasic speech. Using this synthetic data, we then fine-tune four pre-trained LLMs on the task of completing fragmented sentences. We evaluate our fine-tuned models on both synthetic and authentic Broca's aphasic data. We demonstrate the LLMs' capability for reconstructing fragmented sentences, with the models showing improved performance on longer input utterances. Our results highlight the LLMs' potential in advancing communication aids for individuals with Broca's aphasia and possibly other clinical populations.
The jobscripts folder contains all the jobscripts (including two helpers) needed to replicate the experiments on the Hábrók server. Note that a virtual environment should be created beforehand.
For installing the dependencies, execute the following command:
pip install -r requirements.txt
The code targets Python 3.10 and 3.11.
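As a minimal sketch of the environment setup mentioned above (the directory name .venv is an arbitrary choice, and python3 is assumed to be on your PATH):

```shell
# Create and activate a fresh virtual environment (Python 3.10 or 3.11)
# before running the pip install command above.
python3 -m venv .venv
source .venv/bin/activate
python --version
```

After activation, pip install -r requirements.txt installs the dependencies into this environment rather than system-wide.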
Note that the data setup scripts require CHA files from AphasiaBank and SBCSAE. Therefore, first retrieve those files and store them in the expected locations; see helper_preprocessing.sh for the paths.
We created a helper for the setup and pre-processing steps:
jobscripts/helper_preprocessing.sh
The helper first executes the data setup files, converting the raw CHA files into workable dataframes, and then runs the pre-processing files over these dataframes.
Similar to the data setup and pre-processing, we created a helper for generating synthetic sentences and assessing their quality automatically.
jobscripts/helper_data_quality.sh
The helper generates synthetic sentences using the SBCSAE corpus and reproduces the data evaluation as shown in Table 3 in the paper. See the corresponding bash scripts for more information such as the data paths.
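To give an intuition for the rule-based generation, the toy sketch below fragments a sentence by dropping common function words, which Broca's aphasic speech tends to omit. This is an illustration only; the word list and the function name are hypothetical, and the repo's actual rule set (run by helper_data_quality.sh) is more elaborate.

```python
# Toy illustration of rule-based fragmentation: drop common function words
# (articles, auxiliaries, prepositions). Not the repo's actual rule set.
FUNCTION_WORDS = {"the", "a", "an", "is", "are", "was", "were", "to", "of", "in", "on"}

def fragment(sentence: str) -> str:
    """Return a fragmented version of `sentence` with function words removed."""
    kept = [w for w in sentence.split() if w.lower() not in FUNCTION_WORDS]
    return " ".join(kept)

print(fragment("the dog is running in the park"))  # -> dog running park
```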
Before we can fine-tune the sentence completion models, we need to create the data splits:
jobscripts/finetuning_splits.sh 31-10-2024
The splits can be found in the data folder.
Next, we fine-tune the sentence completion models, let them generate completions for the test set, and evaluate their performance using our fine-tune script:
jobscripts/fine_tune.sh SBCSAE
Run python fine_tune_t5.py --help for more information about its parameters. For convenience, the generated completions are stored in the experiment folder.
To gain more insights into the ChrF and Cosine similarity scores for each model, run the following command:
jobscripts/analyse_comp.sh
The bash script provides descriptive statistics about the completions by each model, including the standard error, effectively recreating Table 4 in the paper.
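The kind of descriptive statistics computed here can be sketched in a few lines of pure Python; the score list below is hypothetical and merely stands in for the per-sentence ChrF or cosine similarity values produced by the pipeline.

```python
from math import sqrt
from statistics import mean, stdev

def describe(scores):
    """Mean and standard error (sample stdev / sqrt(n)) of a list of scores."""
    return mean(scores), stdev(scores) / sqrt(len(scores))

chrf_scores = [0.62, 0.58, 0.71, 0.65]  # hypothetical per-sentence ChrF scores
m, se = describe(chrf_scores)
print(f"mean={m:.3f} se={se:.3f}")  # -> mean=0.640 se=0.027
```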
The generated completions for the authentic Broca's aphasic sentences can be reproduced using the authentic completion script:
jobscripts/auth_comp.sh
See the bash script and run python authentic_completion.py --help to reuse the code with different input sentences.
Please find the generated completions for the authentic input in the experiment folder as well.