Afan Oromo Text Keyword Extraction Using Machine Learning

This document discusses keyword extraction from Afaan Oromo text using machine learning. It describes keyword extraction and its applications, as well as different machine learning algorithms that can be used for this task. The document aims to analyze how well term frequency-inverse document frequency (TF-IDF) can identify keywords from Afaan Oromo news texts.

Uploaded by

Bekuma Gudina


1. INTRODUCTION
1.1 Background of the Study

• Natural Language Processing (NLP) refers to the use
of computational methods to process free text in spoken or
written form, which is the mode of communication most
commonly used by humans (Assal et al., 2011). The NLP
pipeline involves many processes. At the syntactic level,
statements are segmented into words and punctuation
(i.e., tokens), and each token is assigned a label
such as noun, verb, adjective, or adverb
(part-of-speech tagging).
At the semantic level, each word is analyzed to build a
meaningful representation of the sentence. Hence, the
basic task of NLP is to process unstructured text and
to produce a representation of its meaning. Several Automatic
Keyword Extraction (AKE) methods have been
proposed in the past. Traditional unsupervised graph-based
approaches consider the central nodes of a graph-of-words
as the most representative ones (Mihalcea and
Tarau, 2004).
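The graph-of-words idea can be illustrated with a minimal sketch. It assumes a much simpler centrality measure (degree, i.e. the number of distinct neighbours) than the ranking used in the cited graph-based work, and the function name, window size, and sample sentence are hypothetical choices made only for illustration.

```python
import re
from collections import defaultdict


def graph_of_words_keywords(text, window=2, top_k=3):
    """Rank words by degree centrality in a word co-occurrence graph."""
    tokens = re.findall(r"\w+", text.lower())
    neighbours = defaultdict(set)
    # Link every token to the tokens that follow it within the window.
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    # Degree centrality: the number of distinct neighbours of each node.
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(neighbours, key=lambda w: (-len(neighbours[w]), w))
    return ranked[:top_k]


sample = "keyword extraction finds keyword candidates and keyword graphs rank candidates"
print(graph_of_words_keywords(sample))
```

A real system would usually filter stop-words before building the graph and might replace degree with an iterative ranking score.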
• Moreover, there exist strong baselines that use
common statistics (e.g., term frequency-inverse
document frequency, known as TF-IDF) and/or
heuristics (e.g., position in the document) to detect the
most significant terms. Furthermore, state-of-the-art
approaches for the task span both classical supervised
machine learning methods (Medelyan et al., 2009) and
deep learning techniques (Meng et al., 2017), which
perform better than unsupervised ones.
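As a rough illustration of the TF-IDF baseline, the sketch below scores each term of one document by its relative frequency multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. This unsmoothed variant, the function names, and the English toy corpus are assumptions for illustration; they are not taken from the study, which targets Afaan Oromo news texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tf_idf_keywords(documents, doc_index, top_k=3):
    """Return the top_k terms of one document ranked by TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    tokens = tokenized[doc_index]
    counts = Counter(tokens)
    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)                   # relative term frequency
        idf = math.log(n_docs / doc_freq[term])    # 0 for terms in every document
        scores[term] = tf * idf
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


# Hypothetical toy corpus, used only to exercise the function.
corpus = [
    "the team won the match and the team celebrated",
    "the market prices rose and the traders celebrated",
    "the rain delayed the harvest",
]
print(tf_idf_keywords(corpus, doc_index=0))
```

Because "the" occurs in every document, its IDF is zero and it never ranks as a keyword, which is the behaviour that makes TF-IDF a useful baseline.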
Keyword extraction is the retrieval of keywords or
key phrases from text documents. They are selected
from among the phrases in the text document and
characterize the document’s topic. This thesis
summarizes the most commonly used methods that
automatically extract keywords. Automatic keyword
extraction methods use
heuristics to select the most frequent and significant
words or phrases from the text document.
Keyword extraction methods belong to the field
of natural language processing, which is an
important field in machine learning and artificial
intelligence. Keyword extractors are used to
extract words (keywords) or groups of two or
more words that form a phrase (key phrases). In
this article, I use the term keyword extraction
to cover both keyword and key-phrase
extraction.
When to use keyword extraction:
• Save time: Based on keywords, a reader can decide whether the topic of a
text (e.g., an article) is of interest and whether to read it. Keywords
provide a summary of the document to the user.
• Find relevant documents: A huge number of articles are written today,
and it is not possible to read all of them. Keyword extraction
algorithms can help find relevant articles. They also automate the
building of book, publication, or web indexes.
• Keyword extraction as support for machine
learning: Keyword extraction algorithms find the
most relevant words that describe the text. These can
later be used for visualizations or to classify text
automatically.
Keyword extraction is one of the most
important tasks in Natural Language Processing.
It covers the various
methods for extracting significant
words and phrases from a collection of texts.
Both keywords and key phrases describe the essence of what a document
concerns. The difference between the two is that keywords are single words,
while key phrases can be either individual words or phrases (i.e., n-grams
with n ≥ 1). Many key phrase extraction methods form and rank the candidate
phrases using the previously scored candidate unigrams by a keyword
extractor (Wan and Xiao, 2008a; Hasan and Ng, 2014; Florescu and Caragea).
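The relationship between unigram scores and candidate phrases can be sketched as follows. The sketch assumes that a phrase score is simply the sum of the scores of its words, which is one common choice and not necessarily the one used in the cited methods; the function names and the example scores are hypothetical.

```python
def candidate_phrases(tokens, max_n=3):
    """Return all contiguous n-grams with 1 <= n <= max_n as tuples."""
    return [
        tuple(tokens[i : i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def rank_phrases(tokens, unigram_scores, max_n=3, top_k=3):
    """Score each candidate phrase by summing the scores of its words."""
    scored = {
        phrase: sum(unigram_scores.get(word, 0.0) for word in phrase)
        for phrase in set(candidate_phrases(tokens, max_n))
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(scored, key=lambda p: (-scored[p], p))
    return [" ".join(phrase) for phrase in ranked[:top_k]]


# Hypothetical unigram scores, e.g. produced by a keyword extractor.
tokens = ["automatic", "keyword", "extraction"]
scores = {"automatic": 0.1, "keyword": 0.6, "extraction": 0.5}
print(rank_phrases(tokens, scores, max_n=2))
```

Summing favours longer phrases, so practical systems often normalise by phrase length or restrict candidates with part-of-speech patterns.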
Nowadays, with the availability of computer technologies, a huge effort
is under way to process natural languages using computers.
Afaan Oromo, also called Afaan Oromoo, is a member of the Cushitic
branch of the Afro-Asiatic language family.
• All of this is done to summarize content and to assist in its
well-organized storage, search, and
retrieval. There are numerous keyword
extraction algorithms available, each of which applies a
distinct set of fundamental and theoretical methods to this
type of problem. There are various types of NLP
algorithms, some of which extract only words and others
which extract both words and phrases. There are also NLP
algorithms that extract keywords based on the complete
content of the texts.
The keyword extraction service is used to extract keywords and key phrases
from text, such as an email or chat. The algorithm parses the text into
sentences and removes the words that are most frequent but least useful for
determining meaning (stop-words). It then applies various statistical and
frequency methods to determine the most significant keywords and key phrases.
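The steps described above (sentence splitting, stop-word removal, frequency scoring) can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list is a toy English list, the scoring is plain word frequency, and the function name is hypothetical. An Afaan Oromo system would need its own stop-word list.

```python
import re
from collections import Counter

# Toy English stop-word list for illustration only; an Afaan Oromo system
# would need a stop-word list built for that language.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on"}


def frequency_keywords(text, top_k=3):
    """Return the top_k most frequent non-stop-words in the text."""
    # Split the text into sentences after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        # Drop stop-words before counting.
        counts.update(w for w in words if w not in STOP_WORDS)
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:top_k]


message = "The server is down. Please restart the server and check the server logs."
print(frequency_keywords(message))
```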
Automatic keyword and key phrase extraction aims to discover a limited
but concise set of words and phrases that reflect the main topics discussed
within a text document, avoiding the expensive and time-consuming
process of manual annotation by experts (Vega-Oliveros et al., 2019).
Afaan Oromo is the third most widely spoken language in
Africa, after Hausa and Arabic. Its original homeland is an area
that includes much of what today is Ethiopia, Somalia,
Sudan, and northern Kenya, as well as parts of other East African
countries. Currently, it is an official language of Oromia Regional
State, the biggest of the current federal
states in Ethiopia. It is used by the Oromo people, the
largest ethnic group in Ethiopia, reported as 50% of the
total population in 2007 (2015 Census statistic of Ethiopia)
(Tesema, n.d.). The processing of natural languages with computers is
known as Natural Language Processing (NLP) or
Computational Linguistics (Mandefro, 2010). With the advent of
the big data era, information has been increasing exponentially.
• Traditionally, people acquired information from books, newspapers,
and magazines. Now they are used to acquiring information via the
Internet. Text is one of the main formats of information.
For textual information, a keyword set consists of several words
that express the meaning of the text. Keywords can help users
quickly understand the topics of a text. In addition, keyword extraction is
the basis of applications such as summarization, information
retrieval, text classification, and clustering.
• In the early stage, keywords were manually extracted from the text
(Wang Zhuohao et al., 2021).
• Manually extracting and tagging keywords is time-consuming and labor-intensive.
The extraction results are subjective, so they can hardly reflect the
meaning of texts objectively. With the rapid increase of information, manual
keyword extraction becomes difficult. Therefore, there is an urgent need to
extract keywords from texts automatically.
• The higher-level tasks in NLP are Machine Translation (MT), Information
Extraction (IE), Information Retrieval (IR), Automatic Text Summarization (ATS),
Question Answering, Parsing, Sentiment Analysis, Natural Language
Understanding (NLU), and Natural Language Generation (NLG). Information
Extraction (IE) refers to the use of computational methods to identify relevant
pieces of information in documents generated for human use and to convert this
information into a representation suitable for computer-based storage, processing,
and retrieval (Wimalasuriya and Dua, 2010).
1.2 Statement of the Problem

Afaan Oromo is the language spoken by a large ethnic group in
Ethiopia, and it is becoming popular even
outside Ethiopia. It is therefore a vital area for NLP research, although
no standard corpus has yet been developed for it.
The identification and classification of keywords in
plain text is of key importance in numerous natural language
processing applications. In Information Extraction (IE) systems,
keywords generally carry important information about
the text itself and are thus targets for extraction. In Machine
Translation (MT), keywords and other sorts of words
have to be handled differently because of the specific
translation rules that apply to them (Farkas et al).
1.3 Research Questions/Hypotheses

Afaan Oromo is one of the local languages spoken in Ethiopia, especially by
the Oromo ethnic group and others. The language is still under development with
respect to its use in current technology, and for this reason
it is an interesting domain for language-dependent research. Based
on this, the proposed research will answer the following
questions:
• What previous research has been conducted related to the proposed work?
• How can machine learning enable keyword extraction from Afaan Oromo
news, words, phrases, and documents?
• What are YAKE (Yet Another Keyword Extractor) and term frequency?
• How will machine learning perform in extracting and
classifying keywords from Afaan Oromo news texts compared to
previously used models?
1.4 Objectives

The objectives of the current study are presented separately as the general objective and
the specific objectives.
1.4.1. General Objective
The main objective of this research is to develop Afaan Oromo text keyword extraction using
a machine learning approach.

1.4.2. Specific Objectives

The following specific objectives are identified in order to achieve the general objective:
• Analyze how effectively TF-IDF can be applied to the identification and
classification of keywords in Afaan Oromo news texts.
• Review techniques and methodologies used for Afaan Oromo text keyword extraction.
• Design an architecture for Afaan Oromo text keyword extraction.
• Develop a prototype for keyword extraction.
• Design and train a TF-IDF Afaan Oromo text keyword extraction system with Afaan Oromo news texts.
• Test and evaluate the performance of the system.
• Draw conclusions from the experimental results and make recommendations for further research in this area.
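The objectives do not state which evaluation measures will be used. As one possible sketch, assuming the common practice of comparing extracted keywords against a manually annotated reference set with precision, recall, and F1, the evaluation step could look like this. The function name and the example keyword lists are hypothetical.

```python
def evaluate_keywords(predicted, reference):
    """Return (precision, recall, F1) of predicted keywords against a reference set."""
    predicted_set = {k.lower() for k in predicted}
    reference_set = {k.lower() for k in reference}
    matches = len(predicted_set & reference_set)
    # Guard against empty sets to avoid division by zero.
    precision = matches / len(predicted_set) if predicted_set else 0.0
    recall = matches / len(reference_set) if reference_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return precision, recall, f1


# Hypothetical system output and hypothetical manual annotation.
extracted = ["market", "price", "rain", "team"]
annotated = ["market", "price", "harvest"]
precision, recall, f1 = evaluate_keywords(extracted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

This exact-match comparison is strict; evaluations of this kind often stem or normalise keywords before matching.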
1.4.3. Scope of the Study
• This research focuses on single-document keyword extraction for Afaan
Oromo texts. It does not
employ abstractive keyword extraction, since that requires deep
linguistic analysis and is difficult to implement with the current state of the art
of the field. The main work is limited to textual documents from an Afaan
Oromo text corpus. Other data types, such as
image, audio, and video, are outside the scope of this study.
• The current study does not cover the
descriptive information
(attributes) of keywords extracted from news texts, i.e., the task of
describing
the attributes of any extracted keyword. Nor does the
study
describe the relationships among keywords extracted
from the document.
