WordNet-Based Chinese-Lao Cross-Language Text Similarity Computing (CSDN Download)

Uploaded 2021-04-09 16:55:57, 581KB PDF

This paper is supported by the National Natural Science Foundation of China (Nos. 61662040 and 61562049).

Chinese-Lao Cross-Language Text Similarity Computing Based on WordNet

Sizhuo Li (1,2), Lanjiang Zhou (*,1,2), Jianan Zhang, Feng Zhou, Jianyi Guo, Wenjie Huo (1,2)

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

The Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming, Yunnan 650500, China

Information Engineering University, Kunming team of the three schools 650500, China

Abstract. Text similarity calculation is widely used in information retrieval, question answering systems, plagiarism detection, and other applications. At present, most research addresses text similarity within a single language, and research on cross-language text similarity calculation is rare, because the differences between languages make cross-language text similarity calculation very difficult. In view of this situation, this paper proposes a WordNet-based method for Chinese-Lao cross-language text similarity calculation. First, Chinese and Lao texts in the medical domain are preprocessed and their features are selected; then the semantic dictionary WordNet is used to convert the Chinese and Lao texts into a middle-layer language; finally, the similarity between the Chinese and Lao texts is computed in the middle layer.

Key words: WordNet; middle layer language; cross-language text similarity

1. Introduction

Text similarity computing has been widely discussed in linguistics, psychology, information theory, and other fields. Text similarity calculation aims to measure the correlation between two texts. In recent years, methods for computing the similarity of texts in the same language [1,2,3] have become increasingly mature, with representative algorithm models including the Boolean model, the vector space model, and the probabilistic model. However, research on cross-language text similarity remains very rare.

Cross-language text similarity quantifies the similarity between two texts written in different languages, with the aim of making the quantitative results agree as closely as possible with human judgments. Because of the grammatical differences between Chinese and Lao, existing methods that compute the similarity of texts in the same language cannot be used to compute the similarity between Chinese and Lao texts. At present, there are several methods for computing cross-language text similarity: methods based on machine translation [4], methods based on statistical translation models [5], and methods based on parallel corpora [6].


WordNet is a semantic dictionary that represents each concept by a synonym set (synset), and it is available in multiple languages. The Chinese WordNet used in this paper was developed by Southeast University, and the Lao semantic dictionary was constructed by our laboratory. Synsets in different language versions of WordNet correspond to one another through their synset_id. Exploiting this property, this paper proposes a WordNet-based method for Chinese-Lao cross-language text similarity computing in the medical domain. The method uses WordNet to convert Chinese and Lao texts into a middle-layer language and then computes the similarity between the Chinese and Lao texts in that middle layer.
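The role of a shared synset_id can be illustrated with a minimal Python sketch. The toy word-to-synset tables, identifiers such as `syn-drug`, and the helper `translation_equivalents` are hypothetical; the actual formats of the Southeast University Chinese WordNet and of the Lao dictionary are not described here. The sketch only assumes that each dictionary maps a word to the synset_id values it belongs to.

```python
from collections import defaultdict

# Hypothetical toy tables: word -> list of synset_id values in that language
# version. The IDs are made up for illustration, not real WordNet offsets.
CHINESE_SYNSETS = {
    "药": ["syn-drug"],
    "医生": ["syn-doctor"],
}
LAO_SYNSETS = {
    "ຢາ": ["syn-drug"],
    "ທ່ານໝໍ": ["syn-doctor"],
}


def invert(table):
    """Map each synset_id back to the set of words that belong to it."""
    index = defaultdict(set)
    for word, synset_ids in table.items():
        for sid in synset_ids:
            index[sid].add(word)
    return index


def translation_equivalents(chinese_word, zh_table, lo_table):
    """Return the Lao words sharing at least one synset_id with chinese_word."""
    lao_index = invert(lo_table)
    result = set()
    for sid in zh_table.get(chinese_word, []):
        result |= lao_index.get(sid, set())
    return result


print(translation_equivalents("药", CHINESE_SYNSETS, LAO_SYNSETS))  # {'ຢາ'}
```

Because both tables use the same synset_id space, no direct Chinese-Lao word list is needed in this sketch: the synset_id serves as the pivot between the two languages.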

2. The process of Chinese-Lao text similarity computing

2.1 Text preprocessing

Although the original text contains all of the text information, current natural language processing technology cannot fully process it, so the text must first be preprocessed. Because the method in this paper analyzes the semantics of words, special words such as personal names and place names must be handled separately: they are converted into specific placeholder strings, which are then ignored during feature selection to avoid noise interference.
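A minimal sketch of this step, assuming the text is already word-segmented and that the special words are supplied in a lookup table (how they are detected is not specified in this section). The placeholder strings `<PERSON>` and `<PLACE>` and both function names are hypothetical.

```python
# Hypothetical placeholder strings; the paper only says special words are
# converted into "a specific string".
PLACEHOLDERS = {"person": "<PERSON>", "place": "<PLACE>"}


def replace_special_words(tokens, special_words):
    """Replace known special words (personal and place names) by placeholders.

    tokens: list of already-segmented words.
    special_words: dict mapping a word to its type, e.g. {"昆明": "place"}.
    Words not in special_words (or of an unknown type) are kept unchanged.
    """
    return [PLACEHOLDERS.get(special_words.get(tok), tok) for tok in tokens]


def drop_placeholders(tokens):
    """Ignore placeholder strings during feature selection to avoid noise."""
    placeholder_set = set(PLACEHOLDERS.values())
    return [tok for tok in tokens if tok not in placeholder_set]


tokens = ["张三", "在", "昆明", "服用", "阿司匹林"]
special = {"张三": "person", "昆明": "place"}
marked = replace_special_words(tokens, special)
print(marked)                     # ['<PERSON>', '在', '<PLACE>', '服用', '阿司匹林']
print(drop_placeholders(marked))  # ['在', '服用', '阿司匹林']
```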

2.2 Text feature selection

The purpose of feature selection is to select the feature items that actually contribute to similarity computing; the selected items should be able to express the theme of the original text. In this paper, words are extracted as text features, and each document is treated as a bag of words. After word segmentation and stop-word removal, the Chinese and Lao documents each form a feature word set. Document frequency (DF) selection is then used to remove useless words that interfere with the original text. The DF of a feature word t is the number of texts in the whole collection that contain t. If the DF of t is greater than an upper threshold, t is removed, because a high DF means that t appears in many texts and therefore contributes little to distinguishing them. If the DF of t is less than a lower threshold, t is also removed, because t is either a rare word or noise.
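The DF-based selection described above can be sketched as follows. The two thresholds are left as parameters because the paper's values are not given in this section; `min_df`, `max_df`, the function names, and the toy documents are hypothetical.

```python
from collections import Counter


def document_frequency(documents):
    """DF(t): number of documents (bags of words) that contain term t."""
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term at most once per document
    return df


def select_features(documents, min_df, max_df):
    """Keep terms whose DF lies in [min_df, max_df]; drop all other terms."""
    df = document_frequency(documents)
    keep = {t for t, n in df.items() if min_df <= n <= max_df}
    return [[t for t in doc if t in keep] for doc in documents]


# Toy, already segmented and stop-word-filtered documents (hypothetical).
docs = [
    ["患者", "发烧", "服用", "药"],
    ["患者", "咳嗽", "药"],
    ["患者", "发烧", "头痛"],
]
print(select_features(docs, min_df=2, max_df=2))
# [['发烧', '药'], ['药'], ['发烧']]
```

In this toy run, 患者 (patient) occurs in all three documents and is removed by the upper threshold, while 服用, 咳嗽, and 头痛 each occur in only one document and are removed by the lower threshold.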

2.3 Conversion of language space between Chinese and Lao

This paper uses WordNet to convert the Chinese and Lao texts into a middle-layer language and then computes the similarity between the Chinese and Lao texts in the middle layer. The conversion model is shown in Figure 1.
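Since the conversion model itself is only given in Figure 1, the following is a sketch under stated assumptions rather than the paper's exact procedure: each text is mapped to a bag of synset_id values, and the two bags are compared with cosine similarity. The choice of cosine similarity, counting every synset of an ambiguous word, dropping words absent from the dictionary, and all names and toy data are assumptions for illustration.

```python
import math
from collections import Counter


def to_middle_layer(words, synset_table):
    """Map a bag of words to a bag of synset_id values (the middle layer).

    Assumptions: every synset listed for a word is counted (no word-sense
    disambiguation), and words missing from the dictionary are dropped.
    """
    bag = Counter()
    for w in words:
        for sid in synset_table.get(w, []):
            bag[sid] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy dictionaries sharing one synset_id space.
zh_table = {"药": ["syn-drug"], "医生": ["syn-doctor"]}
lo_table = {"ຢາ": ["syn-drug"], "ທ່ານໝໍ": ["syn-doctor"]}

zh_doc = ["医生", "药", "药"]
lo_doc = ["ທ່ານໝໍ", "ຢາ"]
zh_vec = to_middle_layer(zh_doc, zh_table)
lo_vec = to_middle_layer(lo_doc, lo_table)
print(round(cosine(zh_vec, lo_vec), 4))  # 0.9487
```

Once both texts live in the synset_id space, any monolingual similarity measure could be applied; cosine is used here only because it is the usual choice for the vector space model mentioned in the introduction.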
