Speech Processing

Uploaded by

Shubham Sagar
2012 Fourth International Conference on Computational and Information Sciences

Overview of the Speech Recognition Technology

Jianliang Meng, Junwei Zhang, Haoquan Zhao


School of Control and Computer Engineering
North China Electric Power University
Baoding, China
e-mail: [email protected]

Abstract—Speech recognition is a cross-disciplinary field that takes the voice as its research object. It allows a machine to turn a speech signal into text or commands through a process of identification and understanding, and thereby enables natural voice communication. Speech recognition draws on physiology, psychology, linguistics, computer science and signal processing, and is even related to a person's body language; its ultimate goal is natural language communication between human and machine. Speech recognition is gradually becoming a key technology of the IT man-machine interface [1]. This paper describes the development of speech recognition technology and its basic principles and methods, reviews the classification of speech recognition systems and technologies, and analyzes the problems that speech recognition faces.

Keywords-basic principles; method; speech recognition; application

I. INTRODUCTION

Speech recognition means that a machine identifies and understands the statements or commands in human speech and reacts accordingly. Taking the voice as its research object, it lets the machine automatically identify and understand spoken language through speech signal processing and pattern recognition. In other words, speech recognition technology turns a voice signal into the corresponding text or command through a process of identification and understanding.

Speech recognition is a broad cross-disciplinary field, closely related to acoustics, phonetics, linguistics, information theory, pattern recognition theory and neurobiology. With the rapid development of computer hardware, software and information technology, it is gradually becoming a key technology in computer information processing. Products built on speech recognition are widely used in voice-activated telephone exchanges, information network queries, medical services, banking services, industrial control and many other aspects of society and daily life. Many experts believe that speech recognition is one of the ten most important scientific and technological developments in the IT field for 2000-2010.

II. THE DEVELOPMENT PROCESS AND CURRENT SITUATION OF THE SPEECH RECOGNITION TECHNOLOGY

Research on speech recognition began in the 1950s, when the Audrey system developed at Bell Labs first recognized the ten English digits. Substantial progress came only in the late 1960s and early 1970s, when speech recognition became an important research topic. The field advanced further in the 1980s, when the hidden Markov model (HMM) and the artificial neural network (ANN) were successfully applied to speech recognition. In 1988, Kai-Fu Lee and others used the VQ/HMM method to build SPHINX, a speaker-independent continuous speech recognition system with a 997-word vocabulary. It was the world's first high-performance, speaker-independent, large-vocabulary continuous speech recognition system. Researchers had finally broken through three major obstacles: large vocabulary, continuous speech and speaker independence. This work also established statistical methods and models as the mainstream in speech recognition and language processing.

Speech recognition systems have begun to move from the laboratory into practical use, and fairly mature products are on the market. Many developed countries, such as the United States, Japan and South Korea, as well as well-known companies such as IBM, Apple, Microsoft and AT&T, have invested heavily in the research and development of practical speech recognition systems [2].

III. BASIC PRINCIPLES AND METHODS OF SPEECH RECOGNITION TECHNOLOGY

A speech recognition system is essentially a pattern recognition system comprising feature extraction, pattern matching and a reference model library. Its basic structure is shown in Figure 1:

Figure 1 The basic principles of speech recognition system
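
As a rough illustration of the structure in Figure 1, the sketch below matches an input feature sequence against a small reference library. It assumes dynamic time warping (DTW) as the pattern matching step, and it assumes feature extraction has already reduced each frame to a single number. The names `dtw_distance`, `recognize` and `REFERENCE_LIBRARY`, and all feature values, are hypothetical and chosen only for this example.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical reference model library: label -> stored feature template.
REFERENCE_LIBRARY = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 0.0],
}


def recognize(features, library=REFERENCE_LIBRARY):
    """Pattern matching: return the label of the closest stored template."""
    return min(library, key=lambda label: dtw_distance(features, library[label]))


# A slower utterance of "yes": same shape, stretched in time.
unknown = [1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(recognize(unknown))
```

Because DTW lets either sequence be stretched in time, the slower utterance still aligns with the stored "yes" template at zero cost. A real system would compare multi-dimensional feature vectors per frame rather than single numbers.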

978-0-7695-4789-3/12 $26.00 © 2012 IEEE
DOI 10.1109/ICCIS.2012.202
The unknown speech is converted by a microphone into an electrical signal at the input of the recognition system and is first preprocessed. The system builds a speech model according to the characteristics of the human voice, analyzes the input signal and extracts the required features; on this basis it establishes the templates needed for speech recognition.

During recognition, the computer compares the speech templates stored in it with the features of the input signal according to the speech recognition model. Search and matching strategies then identify the template that best matches the input speech. Finally, the recognition result is obtained by table lookup from the definition of this template.

Representative speech recognition methods include dynamic time warping (DTW), hidden Markov model (HMM), vector quantization (VQ), artificial neural network (ANN), support vector machine (SVM) and so on. This article focuses on two of them: the hidden Markov model (HMM) and the artificial neural network (ANN).

A. Hidden Markov Model (HMM)

As a statistical model, the hidden Markov model (HMM) was established in the 1970s, was disseminated and developed in the 1980s, and has been successfully applied to acoustic signal modeling. In the 1990s, the HMM was also introduced into computer character recognition and into multi-user detection, a core technology of mobile communications. So far it is still considered the most successful approach to building fast and accurate speech recognition systems.

The parameters of an HMM represent the time-varying characteristics of the speech signal. The model consists of two interrelated stochastic processes that jointly describe the statistical characteristics of the signal: one is a hidden (unobservable) finite-state Markov chain, and the other is the observable stochastic process of observation vectors associated with each state of the chain. The characteristics of the hidden Markov chain are revealed only through the observable signal characteristics. In this way, the characteristics of a time-varying signal such as speech over a given period are described by the stochastic process of observation symbols of the corresponding states, while the evolution of the signal over time is described by the transition probabilities of the hidden Markov chain.

In state j of an HMM, if the observation is described by a set of probabilities $b_{jk}$, k = 1, 2, …, M, that is, the observation takes one of M discrete, countable values, the model is called a discrete HMM. When the observation is a continuous random variable x, the observation in state j is described by a probability density function $b_j(x)$, and the model is called a continuous HMM. The Baum-Welch algorithm used to estimate the initial state distribution π and the state transition matrix A also applies to the continuous HMM, but the re-estimation of the parameters of $b_j(x)$ holds only when $b_j(x)$ satisfies certain restrictions. The most widely used form of $b_j(x)$ at present is the Gaussian mixture, which can be written as follows [3]:

$$ b_j(x) = \sum_{k=1}^{K} c_{jk}\, b_{jk}(x) = \sum_{k=1}^{K} c_{jk}\, \mathcal{N}(x;\, \mu_{jk}, \Sigma_{jk}), \qquad 1 \le j \le N $$

Here $\mathcal{N}(x; \mu_{jk}, \Sigma_{jk})$ is a multi-dimensional Gaussian probability density function with mean vector $\mu_{jk}$ and covariance matrix $\Sigma_{jk}$, K is the number of mixture components of $b_j(x)$, and $c_{jk}$ are the mixture coefficients, which satisfy

$$ \sum_{k=1}^{K} c_{jk} = 1 $$

The HMM gives a fairly complete acoustic model of speech. It trains the underlying acoustic model with statistical methods and integrates it with the upper-level language model into a unified recognition search algorithm, which yields better recognition results and can be used for continuous speech recognition. Its drawback is that it requires very complex computation and long training sequences [4].

B. Artificial Neural Network (ANN)

An artificial neural network (ANN), by analogy with the way biological nervous systems process information, connects a large number of simple processing units in parallel to form a complex information processing system. Such a system is trainable, highly parallel, fast in making decisions and fault tolerant, which suits it to speech signal processing. Neural networks for speech recognition usually fall into two categories: one consists of neural networks alone or of hybrid networks that combine neural networks with traditional HMM or dynamic programming (DP) methods; the other consists of auditory neural network models built on research into human auditory physiology and psychology.

The neural network models most commonly used and most promising for speech recognition include the single-layer perceptron, the multi-layer perceptron, the Kohonen self-organizing feature map, the radial basis function neural network and the predictive neural network. In addition, time-delay neural networks and recurrent neural networks are used so that the network can reflect the dynamic, time-varying characteristics of the speech signal.

Applications of artificial neural network technology in speech recognition mainly cover the following aspects:

a) Reduce the modeling unit, generally modeling at the phoneme level, and improve the recognition rate of the whole system by improving the recognition rate of phonemes.

b) Study in depth the acoustic model, the auditory model and the operating mechanism of the brain, and introduce context information, in order to reduce the impact of the large variability of the speech signal.

c) Extract a variety of features from the speech signal, adopt a hybrid network model (HMM + NN), and apply a variety of knowledge sources (phonemes, vocabulary, syntax and word meaning) to research on speech recognition and understanding, so as to improve system performance [5].

Speech recognition with artificial neural network technology comprises a network learning process and a speech recognition process, as shown in Figure 2.
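
The learning process and the recognition process can be illustrated with a minimal sketch. It assumes the simplest model this section names, a single-layer perceptron, and made-up two-dimensional feature vectors for two classes. The function names `train` and `recognize`, the learning rate and the toy data are all assumptions for illustration; a real system would use acoustic feature vectors and a far larger network.

```python
def recognize(weights, bias, x):
    """Recognition process: feed features through the trained unit (1 or 0)."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train(samples, labels, lr=0.1, epochs=50):
    """Learning process: fit connection weights and a bias to known samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            err = y - recognize(weights, bias, x)
            if err != 0:
                errors += 1
                # Perceptron rule: move the decision boundary toward the sample.
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias


# Hypothetical 2-D feature vectors for two classes (0 and 1).
train_x = [[0.2, 0.1], [0.3, 0.3], [0.9, 0.8], [0.8, 1.0]]
train_y = [0, 0, 1, 1]

weights, bias = train(train_x, train_y)
print(recognize(weights, bias, [0.85, 0.9]))  # unseen input near class 1
print(recognize(weights, bias, [0.1, 0.2]))   # unseen input near class 0
```

The output of `train` is exactly a set of connection weights and a bias; recognition then only evaluates the network on new input. A single-layer perceptron can separate only linearly separable classes, which is one reason multi-layer and recurrent models are also listed above.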

The network learning process takes known speech signals as learning samples and, through the self-learning of the neural network, finally yields a set of connection weights and biases. The speech recognition process takes the speech signal under test as the network input and obtains the recognition result through the network's association. The key to both processes is obtaining the speech feature parameters and the learning of the neural network.

Figure 2 Artificial neural network speech recognition process

The application of artificial neural networks in speech recognition has developed greatly in recent years. Work on artificial neural networks in speech signal processing can be divided into the following areas: first, improving the performance of artificial neural networks; second, combining artificial neural networks with already developed methods to form hybrid systems; third, exploring newly emerging or widely noted mathematical methods to construct neural networks with unique properties and applying them to speech signal processing [6].

The application of artificial neural networks in speech recognition has become a new research hotspot. Artificial neural network technology has been successfully applied to pattern classification problems and has shown enormous potential. We can predict that in the coming decade speech recognition products based on artificial neural networks will appear on the market, and people will adjust their way of speaking to accommodate various recognition systems.

IV. APPLICATION OF SPEECH RECOGNITION TECHNOLOGY AND THE FACING PROBLEMS

A. Application of speech recognition technology

Around the world, research and development of speech recognition applications is accelerating, and some practical speech recognition systems have been put into commercial operation.

A typical example is the VRCP system developed by AT&T in 1992. It is a speaker-independent, small-vocabulary system that recognizes five phrases (collect, person, third number, operator and calling card). It has been deployed on the AT&T communications network to automate operator-assisted calls, handling five call types in place of a human operator.

In September 1996, Charles Schwab launched the first large-scale commercial speech recognition application, a stock quotation system, which was also the first speech recognition system in the financial field. The system effectively improved service quality and customer satisfaction and reduced call center costs. Soon afterwards, Schwab opened a speech-driven stock trading system.

Sprint PCS, a major U.S. telecom operator, has the largest digital wireless network and is also known for excellent and innovative customer service. It has offered voice-driven systems to its customers since 2000. The system provides customer service, voice dialing, number lookup, change of address and other services. In addition, China Telecom has launched CELL-VVAS (Voice Value-Added System), an integrated value-added service system based on voice recognition. The system uses an excellent distributed recognition engine on which stable and efficient applications have been developed. It is also fully integrated with the telecommunications switching network to provide users with a variety of user-friendly, personalized services [7].

Another branch of development is telephone speech recognition, in which Bell Labs is a pioneer. Telephone speech recognition can support telephone inquiries, automatic call connection and specialized services such as tourist information. Once a bank adopts a voice query system based on speech understanding technology, it can provide customers with 24-hour phone banking. In the securities industry, with a telephone speech recognition system, a user who wants a market quote can speak the stock name or code; after confirming the request, the system automatically reads out the latest stock price, which is a great convenience for users. In 114 directory assistance, speech technology lets the computer automatically answer users' requests and play back the requested phone number, thus saving human resources.

B. The facing problems

At present, progress in speech recognition research is slow, mainly because there has been no theoretical breakthrough. Although various new modifications keep emerging, they lack general applicability. The main problems are as follows:

The adaptability of speech recognition systems is poor, which shows mainly in their dependence on the environment: if the speech used to train a system was collected in a particular environment, the system can only be applied in that environment; otherwise its performance drops sharply. Another problem is that such a system does not respond correctly to erroneous user input. In addition, progress on speech recognition in noisy environments is very difficult, because people's pronunciation then changes greatly (a louder voice, a slower speech rate, and changes in pitch and formants), which is known as the Lombard effect; new signal analysis and processing approaches must be found.

The mechanisms of human auditory comprehension, of knowledge accumulation and learning, and of control in the brain are still not well understood. Moreover, applying the existing results in these areas to speech recognition remains a difficult process.

V. CONCLUSIONS

The problems above show that speech recognition systems still need improvement in many areas before they can be widely used. However, it is foreseeable that in the near future, as speech recognition technology continues to progress, research on speech recognition systems will go deeper and their applications will become more extensive [8]. A variety of speech recognition systems will appear on the market, and people will adjust their speech patterns to adapt to them. In the short term it is impossible to create a speech recognition system comparable to a human being; building such a system remains a great challenge, and we can only improve speech recognition systems step by step.

REFERENCES

[1] Yu Tiecheng. The current development of speech recognition [J]. Communication World, 2005.
[2] Ren Tianping. Application of speech recognition technology [J]. Henan Science and Technology, 2005.
[3] L. A. Liporace. Maximum Likelihood for Multivariate Observation of Markov Sources. IEEE Trans. IT, 1982, 28(5): 729-734.
[4] Zhang Ping, Zhang Qiong. Based on HMM and BP neural network for speech recognition [J]. Cross-century, 2008.
[5] Yin Peng, Li Tao, Wang Haibing. Intelligent neural network system composed of the principle in speech recognition. Mini-Micro Systems, 2000, 21(8): 836-839.
[6] Jiang Minghu, Yuan Baozong, Lin Biqin. Neural networks for speech recognition research and progress. Telecommunications Science, 1997, 13(7): 1-6.
[7] Huang Shan. Voice recognition systems in the telecom prepaid business applications [J]. Information Science, 2010.
[8] Yangshang Guo, Yang Jinlong. The speech recognition technology overview [J]. Computer, 2006.
