
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1137

A Class of Physical Modeling Recurrent Networks for Analysis/Synthesis of Plucked String Instruments
Alvin W. Y. Su, Member, IEEE, and Sheng-Fu Liang

Abstract—A new approach is proposed that closely synthesizes the tones of plucked string instruments by using a class of physical modeling recurrent networks. The strategies employed in this paper consist of a fast training algorithm and a multistage training procedure that obtain the synthesis parameters for a specific instrument automatically. The training vectors can be tones of most target plucked instruments recorded with ordinary microphones. The proposed approach delivers encouraging results when it is applied to different types of plucked string instruments such as steel-string guitar, nylon-string guitar, harp, Chin, Yueh-chin, and Pipa. The synthesized tones sound very close to the originals produced by their acoustic counterparts. In addition, this paper presents an embedded technique that can produce special effects such as vibrato and portamento, which are vital to the playing of plucked-string instruments. The computation required in the resynthesis processing is also reasonable.

Index Terms—Physical modeling, plucked string instruments, portamento, recurrent networks.

Manuscript received November 15, 1999; revised March 19, 2001. This work was supported in part by National Science Council, Taiwan, R.O.C., under Grant NSC 89-2218-E-006-132.
A. W. Y. Su is with the Department of CSIE, National Cheng-Kung University, Tainan, Taiwan.
S.-F. Liang is with the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsin-Chu, Taiwan.
Publisher Item Identifier S 1045-9227(02)03982-6.
1045-9227/02$17.00 © 2002 IEEE

I. INTRODUCTION
TRANSIENT responses of most acoustic instruments are very difficult to reproduce. This is also the main reason that synthetic sounds produced with traditional approaches such as wavetable and FM methods are not realistic enough. Model-based approaches claim to be able to reproduce such dynamics by modeling the sounding mechanism of a target instrument physically. There are plenty of works focusing on the analysis and modeling of piano soundboards and of the top plates and air cavities of guitars and violins [1]–[3]. Techniques such as finite-element methods and ray-tracing methods are useful for analyzing musical instruments, but none of them is practical enough to synthesize musical tones in real-time applications. The most successful applications of model-based techniques are the compression, synthesis, and recognition of speech signals by simulating the human vocal tract with a class of digital lattice filters [4], [5]. Among the physical-modeling music synthesis methods, the digital waveguide filters (DWFs) [6]–[8], [11] and the wave digital filters (WDFs) [9], [10] are the most popular and practical ones. An efficient way of applying the DWF method to plucked-string instruments has been proposed in [12], [13]. There are some problems with these approaches, however. First, the synthesizer design is complicated. Second, the excitation wavetable takes a lot of memory space. Finally, special effects such as portamento, which are common on plucked-string instruments, are not addressed.

In [14] and [15], a recurrent-network-based approach called the scattering recurrent network (SRN) was used to simulate the vibration of a plucked string and succeeded in synthesizing plucked-string tones. The structure of the SRN is similar to a lattice filter, because this is the basic form that simulates one-dimensional (1-D) wave propagation. One of the major contributions of this approach is that the system parameters can be determined automatically by the backpropagation through time (BPTT) training algorithm [16]. However, there are some difficulties when this technology is applied to practical music synthesis systems. First, the structures of musical instruments are usually too complicated to be modeled by such a simple 1-D model. Even if a multidimensional architecture is used [17], the computation required for the resynthesis processing becomes enormous. Second, it is usually difficult to measure the time-domain responses of a played instrument at various positions so that the measurements can be used as training vectors. Third, the BPTT method takes many iterations to converge. Finally, the simple waveforms used by SRN as excitation signals cannot produce good synthesis results for complete instrument tones.

What we want to achieve is to accurately synthesize the tones of any specific plucked-string instrument at a reasonable cost. Furthermore, the synthesizer design should be done automatically. Therefore, several modifications to the SRN method are proposed. First, the network architecture is simplified so that the complexity of both the training stage and the synthesis stage is reduced. Second, the training vectors can be musical tones recorded with ordinary microphones. This lets users make measurements easily, without complicated measurement devices. Third, a new training algorithm modified from simulated annealing resilient backpropagation (SARPROP) is used to speed up the training and obtain better system parameters [16], [18]. Fourth, the excitation wavetable is kept small and is obtained in the training process. It is noted that it is very difficult for the training to converge to a good solution without this step. Finally, portamento and vibrato effects are embedded in the model.

In Section II, a class of physical modeling recurrent networks is proposed, the simplified version of the networks is presented, and its connection with the 1-D string model is described. In Section III, resynthesis processing with the proposed technique is presented. In Section IV, a multistage training procedure and a new training algorithm are presented; the synthesis model parameters and the excitation signal are obtained in this stage. Techniques to produce vibrato and portamento are described in Section V. In Section VI, analysis and synthesis experiments are performed on several plucked-string instruments. Conclusions and suggestions for future work are given in Section VII.

II. PHYSICAL MODELING RECURRENT NETWORK FOR PLUCKED-STRING INSTRUMENTS
The basic idea of physical modeling synthesis techniques is to simulate the dynamic behavior of a musical instrument. The major problem with this synthesis technique is the determination of the synthesis model parameters. A recurrent network synthesis model called SRN [14], [15] was proposed to solve the parameter determination problem when a simple musical string is modeled.
The structure of the SRN model is based on the physical model of an acoustic string, and this network has been used successfully to synthesize realistic tones of a plucked musical string analyzed by SRN. Morse derived the 1-D wave equation for a vibrating string [1]. The wave equation of a string with purely resistive loss is

(1)

where the three coefficients are the string tension, the resistive parameter, and the string density. Here, the resistive force is assumed to be linearly proportional to the transverse velocity [2], [15], [19]. The general solution to (1) can be obtained as

(2)

in which one parameter is the traveling wave speed and the solution represents the displacement of the vibrating string as a function of position and time. Sampling (2) with a fixed sampling period gives the discrete-time representation

(3)

Since a real string may not be uniform in its construction, scattering junctions are applied to model a nonuniform string [2], [19]. Consider a nonuniform junction on a string where the characteristic impedances of the two sides differ, as shown in Fig. 1. The right-going traveling wave flowing into this junction from the left-hand side and the left-going traveling wave flowing into it from the right-hand side are related to the outgoing waves as follows (readers can refer to [15], [19] for a thorough physical explanation):

(4)

and

(5)

Fig. 1. A nonuniform junction where the acoustic impedances on the two sides are not identical.

According to (2)–(5), the SRN model is shown in Fig. 2(a). Electromagnetic pickups are used to measure the vibration of a plucked musical string at various sampling positions. The measurements are used as the training vector to obtain the system parameters with the BPTT method [16]. In the synthesis phase, the initial excitation is the waveform obtained by interpolating the magnitudes measured at the pickups. Fig. 3(a) shows such an initial excitation of SRN for the modeling of a Chin E string. In our experiments, simple triangular-like waveforms can be used as the initial excitation to simulate the "plucks."
Although SRN succeeds in synthesizing the tones of plucked strings, it fails as a synthesis model of a musical instrument, because a tone produced by a plucked-string instrument is the combined response of the strings, bridge, body, and air cavity to a pluck. Several modifications to the SRN method are proposed for synthesizing string instrument tones. First, the computation required for the SRN is too large in practice. A simplified version of the physical modeling recurrent network is proposed so that the computation required in the training stage and the synthesis stage can be reduced. The new network structure is shown in Fig. 2(b). This model consists of three basic components: processing blocks (PBs), simple delay lines, and two reflective ends. Each pair of adjacent PBs is connected by a pair of delay lines. The PBs simulate the energy loss as well as the scattering behavior [19]. The structure of a PB is shown in Fig. 4(a), and its computation is identical to that of SRN. There are three types of neurons in a PB: displacement neurons, arrival neurons, and departure neurons. The output of a displacement neuron represents the amplitude at the corresponding sampling position in PB-i. The outputs of the two arrival neurons represent the right-going and the left-going traveling waves flowing into the displacement neuron, respectively. The outputs of the two departure neurons represent the traveling waves leaving the displacement neuron and entering the right-hand-side delay line and the left-hand-side delay line, respectively. A pair of delay lines that connect PB-(i−1) and PB-i is shown in Fig. 4(b). Signals pass through each pair of delay lines directly without any modification, which reduces the computation in the synthesis processing. In our experiments, seven PBs are used in the proposed model and each PB contains three displacement neurons. If the system requires 100 unit delays from one fixed end to the other, the computation cost is only a fraction of that of the original SRN model [15]. When a traveling wave meets a fixed end, it is completely reflected with opposite phase. The operation of the two reflective ends in the proposed model is shown in Fig. 4(c).
Second, the electromagnetic pickups used in [15] can only measure the string vibration, and the result is not what people usually
hear. In fact, the sound picked up at some distance is closer to what we hear. Therefore, microphones instead of pickups are used to record string instrument tones as training vectors.
Third, the tone of a string instrument is much more complex than that of a vibrating string. Simple triangular-like excitation waveforms cannot be used when such complex waveforms are analyzed. Since the SRN approach keeps the wavetable size to a minimum, it is desirable to retain this property. The excitation wavetable is therefore also obtained in the training process, and its size equals the length from one reflective end to the other. Fig. 3(b) shows the excitation signal for synthesizing a Chin tone. This excitation waveform is much more complicated than the one used in modeling musical strings.

Fig. 2. The SRN model and the proposed synthesis model. (a) The SRN model for the modeling of musical strings. (b) The proposed new network structure for synthesis of plucked-string instrument tones.

Fig. 3. The excitation waveforms for string modeling and tone synthesis. (a) The excitation waveform of SRN for the modeling of a Chin E string. (b) An excitation signal for the synthesis of a Chin tone.

III. SYNTHESIS PROCESSING
The synthesis processing of the proposed model contains two stages: the initialization stage and the propagation stage. In the initialization stage, the excitation waveform is loaded into the synthesis model together with suitable system parameters obtained from the training stage. The excitation waveform, such as the one in Fig. 3(b), is distributed into the upper and lower tracks of the synthesis model shown in Fig. 2(b). After initialization, the propagation operation generates the desired synthesized data without any additional information.

Initialization stage
In this stage, an initial excitation waveform has to be provided. The size of the initial waveform equals the total delay length of the upper and lower tracks of the synthesis model shown in Fig. 2(b). This delay length, in unit delays, is computed as

(6)

where the two quantities are the sampling rate of the synthesis system and the fundamental frequency of the desired tone. In our experiments, this length is only hundreds of samples. Therefore, the memory cost is much lower than that of a wavetable method or of other model-based synthesis techniques that require a longer recorded tone as the excitation signal (thousands of samples at a 44.1 kHz sampling rate) [12]. According to Fig. 4(a), the initial magnitude of each displacement neuron in PB-i is

(7)
Then, it is equally distributed into the right-going departure neuron and the left-going departure neuron as follows:

(8)

According to Fig. 4(b), each delay unit of each pair of delay lines also has an initial magnitude taken from the excitation waveform. As in (8), the initial values of the delay buffers are one half of the corresponding excitation values:

(9)

After the initialization stage is finished, the propagation stage starts.

Fig. 4. The three basic components in the synthesis model shown in Fig. 2(b). (a) The structure of a PB. (b) A pair of delay lines that connect PB-(i−1) and PB-i. (c) The operation of the two reflective ends.

Propagation stage
Consider a physical modeling recurrent network with a given number of PBs, each containing the same number of displacement neurons. According to Figs. 2(b) and 4, index the delay buffers of each delay segment in the upper and lower tracks connecting PB-(i−1) and PB-i. The traveling waves in the upper and lower tracks can then be represented as

(10)

and

(11)

where the upper-track boundary delay buffer receives the right-going output signal of the rightmost PB, and the lower-track boundary delay buffer receives the left-going output signal of the leftmost PB.
There are two basic operations in a neuron. The first sums the signals flowing into the neuron to form the net input. The second is the so-called activation function, which maps the net input to the corresponding output. The arrival neurons receive the weighted outputs from the departure neurons or from the delay buffers as

(12)

and

(13)

where each arrival neuron's net input is mapped to its output by the activation function. These notations are also used in the following derivations. The output of a displacement neuron, which receives the outputs of the adjacent arrival neurons, is obtained by

(14)

Finally, the traveling waves departing from the displacement neurons to the nearby segments can be computed by

(15)

and

(16)

Equations (10) through (16) represent one propagation cycle, which produces one synthesized sample per time step. The synthesized signal is the output of a chosen displacement neuron. Although nonlinear activation functions with trainable parameters could possibly improve performance, they increase the computational complexity. Therefore, the activation function of every neuron is the identity function in all of the experiments, and activation functions are omitted in later sections to simplify the notation.
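The propagation cycle of (10)–(16) can be illustrated with a small two-track simulation. This is a sketch under stated assumptions, not the authors' network: because the equations are not reproduced here, it uses a standard Kelly-Lochbaum scattering junction with reflection coefficient k = (Z1 − Z2)/(Z1 + Z2) in place of a PB, one cell per unit delay in each track, a loss factor lumped at the two fixed ends, and identity activations. The names `scatter` and `synthesize` are hypothetical. As in the initialization stage, the excitation is split equally between the upper (right-going) and lower (left-going) tracks, and each output sample is the sum of the two tracks at one observed cell, which plays the role of the chosen displacement neuron.

```python
def scatter(a, b, k):
    """Kelly-Lochbaum scattering junction (assumed stand-in for a PB junction).

    a: wave arriving from the left (upper track, right-going)
    b: wave arriving from the right (lower track, left-going)
    k: reflection coefficient seen by `a`, k = (Z1 - Z2) / (Z1 + Z2)
    Returns (wave leaving to the right, wave leaving to the left).
    """
    # Junction displacement: arriving waves weighted by 2*Z/(Z1 + Z2).
    y_j = (1.0 + k) * a + (1.0 - k) * b
    # Each departing wave is the junction displacement minus the arriving wave.
    return y_j - b, y_j - a


def synthesize(excitation, junctions, loss, observe, num_samples):
    """Two-track string with fixed ends and optional scattering junctions.

    excitation: initial displacement of each of the n cells per track
    junctions:  {j: k} puts a junction between cell j and cell j + 1
    loss:       loss factor applied at each fixed-end reflection
    observe:    cell whose displacement (upper + lower) is the output
    """
    n = len(excitation)
    # Initialization: split the excitation equally between the two tracks.
    upper = [0.5 * d for d in excitation]
    lower = [0.5 * d for d in excitation]
    out = []
    for _ in range(num_samples):
        # Output sample: displacement at the observed cell.
        out.append(upper[observe] + lower[observe])
        new_upper = [0.0] * n
        new_lower = [0.0] * n
        # Fixed ends: total reflection with opposite phase, scaled by the loss.
        new_upper[0] = -loss * lower[0]
        new_lower[n - 1] = -loss * upper[n - 1]
        # Interior boundaries: plain unit delay or scattering junction.
        for j in range(n - 1):
            a, b = upper[j], lower[j + 1]
            k = junctions.get(j)
            if k is None:
                new_upper[j + 1], new_lower[j] = a, b
            else:
                new_upper[j + 1], new_lower[j] = scatter(a, b, k)
        upper, lower = new_upper, new_lower
    return out


# Usage: fs / f0 = 100 unit delays in total, i.e. 50 cells per track.
fs, f0 = 44100, 441.0
n = round(fs / f0) // 2
# Triangular "pluck" peaking at cell 10 and zero at both ends.
pluck = [min(x / 10, (n - 1 - x) / (n - 11)) for x in range(n)]
tone = synthesize(pluck, junctions={20: 0.1, 35: -0.05}, loss=0.995,
                  observe=5, num_samples=2000)
```

With no junctions and no loss, every sample travels once around the 2n-cell loop and is negated twice, so the output repeats exactly every 2n = 100 samples, the period implied by the total delay length in this sketch; the junction coefficients and the loss factor then shape the timbre and the decay.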

IV. TRAINING
The training of the model improves on that in [15] in three ways. First, a multistage training procedure is used to obtain multiple sets of synthesis parameters for the time-varying characteristics of an instrument. Second, a supervised training method is used to obtain the synthesis parameters and the initial excitation waveform automatically. Third, a hybrid training algorithm is used to speed up the training and obtain better synthesis parameters.

Fig. 5. The multistage training strategy for determination of model parameters.

A. Multistage Training Strategy
The multistage training strategy for the synthesis model is shown in Fig. 5. The mean square difference between the recorded and the synthesized tones is used to adjust the initial excitation waveform and the synthesis parameters. In Stage #1, the initial excitation waveform used in (7) and (9), as well as the first set of synthesis parameters, have to be determined. This training stage employs the recorded tone within the first time interval as the training vector, and the resultant parameters are used for synthesizing the tone over that interval. Because the synthesis processing no longer requires external signals after the initialization stage, the initial input waveform need not be updated after this stage. Stage #2 begins at the end of the first interval and uses the recorded tone over the next interval as the training vector; the second set of parameters is obtained when this training stage finishes. This procedure stops when all the training vectors have been used.

B. Training Algorithm
For a recurrent neural network (RNN), BPTT [16], [20] is a widely used training algorithm that extends the standard backpropagation algorithm. This algorithm requires at least 10 000 epochs to converge when it is applied to the proposed synthesis model. In [18], the SARPROP method is proposed for feedforward networks. This algorithm combines a fast gradient descent algorithm, called resilient backpropagation (RPROP) [22], with a simulated annealing (SA)-based global search technique [23]. RPROP uses only the sign of the gradient seen by a particular parameter, not the magnitude of the gradient. SA adds random noise to the parameter updates and gradually decreases the magnitudes of the updates during training. The SARPROP method indeed converges much faster than the BPTT method. In our experiments, however, the SARPROP method is very sensitive to the learning parameters and to the initial values of the system parameters. For a recurrent network, the training sometimes diverges or converges to a totally unacceptable solution.
In this paper, a hybrid training algorithm consisting of BPTT and SARPROP, as shown in Fig. 6, is used in the training procedure of the proposed physical modeling recurrent network. Since this synthesis network is a recurrent neural network, BPTT is used to calculate the gradient for each parameter, and the corresponding parameter update value is obtained by SARPROP. In Stage #1, both the synthesis parameters and the initial excitation waveform must be updated. Only the synthesis parameters are updated in the other stages. In particular, the synthesis network is constructed from a simplified physical model of a musical instrument, so each synthesis parameter has a physical meaning. One type of parameter (the reflection coefficients) simulates the nonuniform characteristics at various physical positions, and the other type simulates the energy decay factors. The initial values of the synthesis parameters can be reasonable values derived from the physical characteristics of the target instrument instead of random values, which improves the training. This is different
from most other applications that use neural networks as their parameter-finding mechanisms.
The temporal operation of the proposed model can be unfolded into a multilayer feedforward architecture with synchronous updates; interested readers can refer to [15], [21]. A network layer representing one time instant is called a time layer. Since the synthesized signal is the output of a chosen displacement neuron, only this displacement neuron can have a teacher signal, namely the recorded tone of the target instrument. The displacement neuron that generates the synthesized output to match the desired output is called the visible neuron, and the remaining neurons are called hidden neurons. In each training stage, the output of one displacement neuron in a chosen PB is taken as the synthetic result of the synthesis model and compared with the recorded tone at each time instant. The error signal at any time is defined as

(17)

and the error function is defined as

(18)

The total cost function to be minimized over the training interval of the current stage is defined by

(19)

The gradient values for one type of synthesis parameter at each time layer can be derived as

(20)

in which each term involves the local error of the corresponding neuron at that time layer. The gradient values of the other type of parameter at each time layer can also be derived as

(21)

and

(22)

Fig. 6. The hybrid-training algorithm consisting of BPTT and SARPROP for the training procedure of the proposed model.

The local error of a displacement neuron can be obtained by (23). If a displacement neuron is a hidden neuron, its local error can be computed from the local errors of the departure neurons connected to it. If this displacement neuron is the visible neuron, it directly contributes the error signal (17) to the total
cost function. Therefore, this error signal must be included in the computation of the local error corresponding to this neuron:

(23)

The local error of a departure neuron is obtained from the local error of the arrival neuron or the delay buffer connected to this departure neuron by

(24)

and

(25)

The local error of an arrival neuron is obtained from the local errors of both the displacement neuron and the departure neuron connected to this arrival neuron, and it can be computed by

(26)

and

(27)

The local errors of the delay buffers in the upper track and the lower track can be computed as

(28)

and

(29)

Since the total gradient for each parameter is the sum of its gradient values over all time layers, the total gradient for the synthesis parameters in one epoch can be computed by

(30)

and

(31)

In addition, if the training is in Stage #1, the gradient for the excitation signal used in (7) and (9) is obtained in a similar way. When the backpropagation computation reaches the initial time step, the gradient value for the excitation signal corresponding to the delay buffers is computed by

(32)

and the gradient value for the initial waveform corresponding to the displacement neurons is computed by

(33)

According to Fig. 6, the gradient values of the synthesis parameters or the excitation waveform are obtained by BPTT in each epoch. They are then passed to SARPROP [18] to determine the amount of adjustment. The neural network used in [18] was a multilayer perceptron (MLP), and its initial update parameter was 0.1. With randomly assigned initial parameter values, this setting is found to be unsuitable for our application. Physically, the reflection-coefficient parameters should fall within [−1, 1], and the energy-loss parameters should be around unity. The following learning parameters were determined empirically and found to be useful in many experiments: the initial update parameter is 0.0001 and the temperature parameter is 0.01. The remaining constants are set as follows.

Fig. 7. Simulation of left-hand finger-gliding playing behavior. (a) The left-hand finger glides along the string from position A to position B. (b) The left-hand finger glides along the string from position B to position A. (c) Modified structure of the proposed model for simulating the finger-gliding effect.

Fig. 8. Operations of the reflection-coefficient parameters used for portamento and vibrato. The solid curve represents the situation in which the reflection coefficient at the gliding position is changed to −1, which makes the junction totally reflective. The dashed curve represents the situation in which the original reflection coefficient is gradually restored.

In [15], the training of SRN required at least 10 000 iterations of BPTT. In this paper, this is reduced to fewer than 1000 epochs for each training stage, and the results are superior in terms of SNR. BPTT is used to update the synthesis parameters and the excitation waveform in Stage #1 to avoid the instability caused by SARPROP.
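The division of labor in Fig. 6, where BPTT supplies the gradients and a SARPROP-type rule turns them into parameter updates, can be sketched as follows. This is an illustrative RPROP update with annealed noise, not the exact SARPROP rule of [18]: the noise form and the annealing factor 2^(−T·epoch) are assumptions, the step-size constants η+ = 1.2 and η− = 0.5 are the usual RPROP defaults rather than values from this paper, and clipping each parameter to a range such as [−1, 1] is our own way of respecting the physical range of the reflection coefficients. Only the initial update parameter (0.0001) and the temperature (0.01) follow the values quoted above. A toy quadratic cost stands in for the BPTT gradients; `rprop_sa_update` and `new_state` are hypothetical names.

```python
import random


def new_state(num_params, initial_step=1e-4):
    """Per-parameter step sizes and previous gradients (initial update 0.0001)."""
    return {"steps": [initial_step] * num_params, "prev": [0.0] * num_params}


def rprop_sa_update(params, grads, state, epoch, rng,
                    eta_plus=1.2, eta_minus=0.5, step_min=1e-8, step_max=1.0,
                    temperature=0.01, noise=0.1, bounds=None):
    """One RPROP-style update with annealed noise (illustrative, not exact SARPROP).

    Only the sign of each gradient is used. In the paper's scheme `grads`
    would come from BPTT. Updates `params` in place and returns it.
    """
    anneal = 2.0 ** (-temperature * epoch)  # noise fades as training proceeds
    steps, prev = state["steps"], state["prev"]
    for i, g in enumerate(grads):
        if g * prev[i] > 0:
            # Same sign as in the previous epoch: accelerate.
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif g * prev[i] < 0:
            # Sign flip: a minimum was jumped over. Shrink the step, perturb it
            # with annealed noise, and skip this parameter's update once.
            shrunk = max(steps[i] * eta_minus, step_min)
            steps[i] = min(shrunk * (1.0 + noise * anneal * rng.random()), step_max)
            g = 0.0
        if g > 0:
            params[i] -= steps[i]
        elif g < 0:
            params[i] += steps[i]
        if bounds is not None:
            # Keep the parameter in its physical range, e.g. [-1, 1].
            lo, hi = bounds[i]
            params[i] = min(max(params[i], lo), hi)
        prev[i] = g
    return params


# Toy usage: a quadratic cost stands in for the BPTT gradients.
targets = [0.35, -0.6, 0.98]
params = [0.0, 0.0, 1.0]
state = new_state(len(params))
rng = random.Random(0)
for epoch in range(1000):
    grads = [2.0 * (p - t) for p, t in zip(params, targets)]  # d/dp of (p - t)^2
    rprop_sa_update(params, grads, state, epoch, rng, bounds=[(-1.0, 1.0)] * 3)
```

Because each step size grows geometrically from 0.0001 while the gradient sign is stable and shrinks whenever the sign flips, the toy parameters settle near their targets well within 1000 epochs; with real BPTT gradients the sensitivity to the initial values noted above would still apply.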
V. EMBEDDED VIBRATO AND PORTAMENTO PROCESSING
Some plucked-string instruments have no frets. A player's fingers can glide along the strings to produce effects such as wide-range vibrato and portamento. In this section, an efficient embedded method for such effects is introduced. An example is shown in the next section.
When a left-hand finger glides along a string from position A to position B, as shown in Fig. 7(a), the vibrating length of the string gradually becomes shorter and the pitch of the tone rises. Conversely, when the left-hand finger glides along the string from position B to position A, as shown in Fig. 7(b), the vibrating length gradually becomes longer and the pitch falls.
Since the proposed model is constructed based on the physical model of vibrating strings, the length of the model must change with the gliding behavior described above to simulate vibrato and portamento. To produce the finger-gliding effect realistically, the delay lines shown in Fig. 4(b) within the corresponding gliding region are replaced by the PB structure shown in Fig. 4(a). If the energy-loss parameter of such a PB is 1 and its reflection coefficient is 0, the traveling waves in the upper and lower tracks pass through the displacement neuron directly without any modification. The proposed model shown in Fig. 2(b) is thus changed to the one shown in Fig. 7(c). Shortening the string, for example shortening the model from position A to position B, can then be realized simply by changing the reflection coefficients: the reflection coefficient at the gliding position is decreased gradually from its original value to −1 along the solid curve shown in Fig. 8. When it reaches −1, the position becomes a fixed end at which the left-going traveling wave in the lower track is reflected back to the upper track with opposite phase. This means that the left-hand
finger is pressed firmly on the physical position corresponding to position B. In this case, the outputs of all the displacement neurons in the region to the left of position B are forced to zero. Conversely, if the reflection coefficient is restored gradually from −1 to its original value along the dashed curve shown in Fig. 8, the model gradually returns to its original state and the pitch of the synthetic tone falls.
Vibrato and portamento effects are produced by combining such shortening and lengthening operations. If the gliding on the string can be described by a function of time, these effects can easily be achieved by changing the reflection coefficients of the proposed model according to this function. The synthesized tones produced with these operations no longer sound as similar to the tones produced by the target acoustic instruments. However, if the initial part of the synthetic tone is similar to the original, subjects tend to consider the two tones to be produced by the same instrument and to regard the special effects simply as ornaments.

Fig. 9. Original and synthetic tones of various plucked-string instruments. (a) Steel-string guitar. (b) Nylon-string guitar. (c) Harp. (d) Pipa. (e) Yueh-chin. (f) Chin.

VI. ANALYSIS/SYNTHESIS OF PLUCKED-STRING INSTRUMENTS
The following analysis/synthesis experiments are performed on various types of plucked-string instruments: steel-string guitar, nylon-string guitar, harp, Pipa, Yueh-chin,
1146 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002

TABLE I
THE SIGNAL-TO-NOISE RATIOS OF SYNTHETIC RESULTS CORRESPONDING TO VARIOUS MUSICAL INSTRUMENTS

(a) (b)

(c) (d)

Fig. 10. Short-time-Fourier analysis of the signals shown in Fig. 9(d) and (f). (a) STFA of original Pipa tone. (b) STFA of synthesized Pipa tone. (c) STFA of
original Chin tone. (d) STFA of synthesized Chin tone.

and Chin to demonstrate the performance of the proposed Fig. 10(c) and (d) that the Chin tone has a smoother decay
method. Pipa, Yueh-Chin, and Chin are three Chinese tra- pattern compared to the Pipa tone. Although the SNR results
ditional plucked-string instruments [24]. In each case, there do not look impressed, such performance is not possible in the
are 7 PBs in the proposed model and each PB contains three past. In fact, most physical modeling synthesis methods can
displacement neurons. only reproduce the magnitude part of the frequency response.
1) Synthesis Results: The analysis/synthesis results are In general, the first few fractions of a second of a tone are how
shown in Fig. 9. Upper part of each subfigure shows the original people judge the instrument. Listening tests show that subjects
tone and lower part shows the corresponding synthesized tone. can find differences between the original and the synthesized
The waveforms of the original tone and the synthesized tone are tones but consider that they do sound very similar and regard
very close to each other. The SNR for each of the pairs is shown that the tones are generated from the same instruments.
in Table I. By examining Fig. 10(a) and (b), there are still small 2) Portamento Effects: Portamemto and vibrato are fre-
differences coming from the high-frequency components. In quently used in the playing of many stringed instruments. Chin
general, if the sounding mechanism of an instrument is less is an ancient Chinese plucked-string instrument that consists
perfect, the synthesis is more difficult. For example, Pipa, of a shallow rectangular-like wooden chamber and seven
a lute-like instrument, has a very thin top plate, a nonrigid strings and is the known instrument that uses portamento and
bridge and less well-constructed strings [25]. Therefore, its vibrato most. Since there is no fret on the top plate, the player’s
response is less smooth compared to instruments such as harp left-hand fingers can glide along strings to produce vibrato and
and Chin. This can also be seen on the STFT plots shown in portamento effects.
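The idea that a glide can be written as a function of time, with the synthesis parameters following it, can be illustrated with a minimal sketch. This is not the authors' method: instead of adjusting the proposed recurrent network's parameters, it swaps in a generic Karplus-Strong-style string loop (the algorithm surveyed in [13]) whose loop delay, standing in for the vibrating string length, tracks a hypothetical pitch trajectory f0(t). The helper names (glide_f0, loop_delay, read_delayed, pluck_with_glide), the 44.1 kHz sampling rate, the loss factor, and the glide timing are illustrative assumptions; only the 190 Hz and 215 Hz endpoints are taken from the Fig. 11 experiment.

```python
import math
import random

# Hypothetical sketch: a generic Karplus-Strong loop with a time-varying delay.
# It is NOT the paper's PB-based recurrent network; it only shows how a glide
# function f0(t) can drive the effective string length during synthesis.


def glide_f0(t, f_start=190.0, f_end=215.0, t0=0.1, t1=0.4):
    """Assumed pitch trajectory f0(t): hold f_start, glide linearly, hold f_end."""
    if t <= t0:
        return f_start
    if t >= t1:
        return f_end
    return f_start + (t - t0) / (t1 - t0) * (f_end - f_start)


def loop_delay(f0, fs):
    """Loop delay of one period in samples; proportional to string length."""
    return fs / f0


def read_delayed(y, n, d):
    """Linearly interpolated read of y[n - d] for a fractional delay d >= 1."""
    pos = n - d
    i = math.floor(pos)
    frac = pos - i
    a = y[i] if 0 <= i < n else 0.0          # samples before time 0 are zero
    b = y[i + 1] if 0 <= i + 1 < n else 0.0
    return (1.0 - frac) * a + frac * b


def pluck_with_glide(duration=0.5, fs=44100, loss=0.996, seed=0):
    """Pluck a noise burst into a string loop whose delay follows glide_f0(t)."""
    rng = random.Random(seed)
    burst = int(loop_delay(glide_f0(0.0), fs))  # one period of noise excitation
    y = []
    for n in range(int(duration * fs)):
        x = rng.uniform(-1.0, 1.0) if n < burst else 0.0
        # The two-tap average below adds half a sample of delay, so subtract
        # it to keep the whole loop fs / f0(t) samples long.
        d = loop_delay(glide_f0(n / fs), fs) - 0.5
        # Two-tap average = simple lowpass loss filter of the string loop.
        feedback = 0.5 * (read_delayed(y, n, d) + read_delayed(y, n, d + 1.0))
        y.append(x + loss * feedback)
    return y


if __name__ == "__main__":
    tone = pluck_with_glide()
    print(len(tone), round(loop_delay(190.0, 44100), 2), round(loop_delay(215.0, 44100), 2))
```

Because the loop delay equals fs/f0, raising the pitch from 190 Hz to 215 Hz shortens the effective string by the factor 190/215 ≈ 0.884, and linear interpolation lets the delay pass smoothly through non-integer lengths. Linear interpolation also adds some high-frequency damping, so a more careful implementation would probably use a better fractional-delay filter (for example an allpass interpolator).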

Fig. 11. The STFT plot of the synthetic tone with portamento effect.

The technique described in the previous section is used to simulate portamento. Fig. 11 shows the STFT plot of the simulation, in which the fundamental frequency is shifted from 190 Hz to 215 Hz. Although the fundamental frequency reaches the desired pitch, the timbre changes and differs from that of the Chin used in this experiment. Because the beginning transient sounds sufficiently similar to the original, this timbre difference can usually be ignored. Nevertheless, simulating these effects without changing the timbre remains an interesting and challenging topic.

VII. CONCLUSION AND FUTURE WORK

A class of physical modeling recurrent networks is proposed to synthesize musical tones of plucked-string instruments. All the parameters required by the synthesis model can be obtained efficiently and automatically by a hybrid BPTT/SARPROP learning algorithm. The tones of a specific instrument can therefore be closely synthesized when electronic musicians consider the sound of that particular instrument indispensable. The approach was also tested on a wide range of plucked-string instruments and shown to be a very general method. With this synthesis model, portamento effects can be synthesized easily. Because the training vector is easy to obtain, users can design their own synthesizers. Although the computational complexity of the resynthesis processing is still high, it is comparable to that of speech synthesis. Given the rapid progress in DSP processor design, a computational cost in this range should not cause much trouble.

Our future work is as follows. First, the SNR of the synthetic tone with respect to the original tone is still not high enough, and most of the error comes from the high-frequency part. This will be our major focus. Second, playing techniques strongly influence how an instrument sounds. For example, the Chin has thousands of techniques, each producing a different timbre, and handling them is a challenging issue. Finally, we intend to extend the proposed methodology to other types of instruments, such as wind or struck-string instruments. Because different types of instruments have different structures, suitable physical models must be designed for them.

ACKNOWLEDGMENT
The authors would like to thank the editor and the reviewers for their valuable comments.

REFERENCES
[1] P. M. Morse, Vibration and Sound. Woodbury, NY: Amer. Inst. Phys./Acoust. Soc. Amer., 1936.
[2] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments. New York: Springer-Verlag, 1991.
[3] L. Cremer, The Physics of the Violin. Cambridge, MA: MIT Press, 1984.
[4] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[5] J. R. Deller, J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals. New York: Macmillan, 1993.
[6] J. O. Smith, “Physical modeling using digital waveguides,” Comput. Music J., vol. 16, no. 4, pp. 74–87, 1992.
[7] J. O. Smith, “Music applications of digital waveguides,” Stanford Univ., Stanford, CA, CCRMA Tech. Rep. STAN-M-67.
[8] J. O. Smith, “Efficient synthesis of stringed musical instruments,” in Proc. 1993 Int. Comput. Music Conf., 1993, pp. 64–71.
[9] A. Fettweis, “Wave digital filters: Theory and practice,” Proc. IEEE, vol. 74, pp. 270–327, Feb. 1986.
[10] A. Sarti and G. De Poli, “Toward nonlinear wave digital filters,” IEEE Trans. Signal Processing, vol. 47, pp. 597–, Sept. 1999.
[11] S. A. Van Duyne et al., “The 3-D tetrahedral digital waveguide with musical applications,” in Proc. 1996 ICMC, Hong Kong, Aug. 1996, pp. 9–16.
[12] V. Välimäki et al., “Physical modeling of plucked string instruments with application to real-time sound synthesis,” J. Audio Eng. Soc., vol. 44, no. 5, pp. 331–353, 1996.
[13] M. Karjalainen et al., “Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond,” Comput. Music J., vol. 22, no. 3, pp. 17–32, 1998.
[14] A. W. Su and S. F. Liang, “Synthesis of plucked-string tones by physical modeling with recurrent neural networks,” in Proc. IEEE 1997 Workshop Multimedia Signal Processing, June 1997, pp. 71–76.
[15] S. F. Liang, A. W. Su, and C. T. Lin, “Model-based synthesis of plucked string instruments by using a class of scattering recurrent networks,” IEEE Trans. Neural Networks, vol. 11, pp. 171–185, Jan. 2000.
[16] F. J. Pineda, “Recurrent backpropagation and the dynamical approach to adaptive neural computation,” Neural Comput., vol. 1, pp. 161–172, 1989.
[17] H. Krauß and R. Rabenstein, “Application of multidimensional wave digital filters to boundary value problems,” IEEE Signal Processing Lett., vol. 2, pp. 183–187, July 1995.
[18] N. K. Treadgold and T. D. Gedeon, “Simulated annealing and weight decay in adaptive learning: The SARPROP algorithm,” IEEE Trans. Neural Networks, vol. 9, no. 4, pp. 662–668, 1998.
[19] L. E. Kinsler et al., Fundamentals of Acoustics, 3rd ed. New York: Wiley, 1982.
[20] R. J. Williams and J. Peng, “An efficient gradient-based algorithm for on-line training of recurrent network trajectories,” Neural Comput., vol. 2, pp. 490–501, 1990.
[21] S. Haykin, Neural Networks. Englewood Cliffs, NJ: Prentice-Hall, 1994.
[22] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” in Proc. ICNN93, San Francisco, CA, 1993, pp. 586–591.
[23] H. Szu, “Fast simulated annealing,” in Neural Networks for Computing, J. S. Denker, Ed. New York: Amer. Inst. Phys., 1986, pp. 420–425.
[24] H. D. Bodman, Chinese Musical Iconography: A History of Musical Instrument Depicted in Chinese Art. Taipei, Taiwan, R.O.C.: Asian-Pacific Cultural Center, 1987.
[25] S. Feng, “Some acoustical measurements on the Chinese musical instrument p’i-p’a,” J. Acoust. Soc. Amer., vol. 75, no. 2, pp. 599–602, 1984.

Alvin W. Y. Su (M’97) was born in Taiwan in 1964. He received the B.S. degree in control engineering from National Chiao-Tung University (NCTU), Taiwan, in 1986, and the M.S. and Ph.D. degrees in electrical engineering from Polytechnic University, Brooklyn, NY, in 1990 and 1993, respectively.
From 1993 to 1994, he was with the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, Stanford, CA. From 1994 to 1995, he was with the Computer Communication Lab of the Industrial Technology Research Institute (CCL, ITRI), Taiwan. In 1995, he joined the Department of Information Engineering and Computer Engineering at Chung-Hwa University, where he served as an Associate Professor. In 2001, he joined the Department of Computer Science and Information Engineering, National Cheng-Kung University. His research interests include digital audio signal processing, physical modeling of acoustic musical instruments, human–computer interface design, video and color image signal processing, and VLSI signal processing.
Dr. Su is a Member of the IEEE Computer Society and the IEEE Signal Processing Society. He is also a Member of the Acoustical Society of America and the Audio Engineering Society.

Sheng-Fu Liang was born in Tainan, Taiwan, in 1971. He received the B.S. and M.S. degrees in control engineering from National Chiao-Tung University (NCTU), Taiwan, in 1994 and 1996, respectively, and the Ph.D. degree in electrical and control engineering from NCTU in 2000.
He is currently a Research Assistant Professor in electrical and control engineering at NCTU. His research activities include model-based music synthesis, neural networks, and image processing. His current projects include audio and video signal processing.
