Real-time Hybrid System Identification
with Online Deterministic Annealing

Christos N. Mavridis and Karl Henrik Johansson
Division of Decision and Control Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm. Emails: {mavridis,kallej}@kth.se. Research partially supported by the Swedish Foundation for Strategic Research (SSF) grant IPD23-0019.
Abstract

We introduce a real-time identification method for discrete-time state-dependent switching systems in both the input–output and state-space domains. In particular, we design a system of adaptive algorithms running in two timescales: a stochastic approximation algorithm implements an online deterministic annealing scheme at a slow timescale and estimates the mode-switching signal, and a recursive identification algorithm runs at a faster timescale and updates the parameters of the local models based on the estimate of the switching signal. We first focus on piece-wise affine systems and discuss identifiability conditions and convergence properties based on the theory of two-timescale stochastic approximation. In contrast to standard identification algorithms for switched systems, the proposed approach gradually estimates the number of modes and is appropriate for real-time system identification using sequential data acquisition. The progressive nature of the algorithm improves computational efficiency and provides real-time control over the performance-complexity trade-off. Finally, we address specific challenges that arise when applying the proposed methodology to the identification of more general switching systems. Simulation results validate the efficacy of the proposed methodology.

Index Terms:
Hybrid System Identification, Switching Systems, Piecewise Affine System Identification, Online Deterministic Annealing.

I Introduction

Hybrid systems, described by interacting continuous and discrete dynamics, are a powerful modeling tool in the analysis of systems where logic and continuous processes are interlaced, as in most complex cyber-physical systems. In addition to being able to describe switching dynamics, hybrid systems can be used as a tool to approximate highly non-linear dynamics by a collection of simpler models, and boost model explainability and robustness, by decomposing the behavior of a complex system into sub-systems where first principles and domain knowledge can be used for precise model tuning [1, 2]. As a result, hybrid systems have attracted significant attention in the control community.

However, first principles modelling is often too complicated and sub-optimal, and a hybrid model needs to be identified on the basis of observations. The majority of the work in this area is based on piece-wise affine (PWA) systems, a class of state-dependent switched systems with important applications in identification, verification, and control synthesis of hybrid and nonlinear systems [2, 3, 4, 5]. PWA systems are a collection of affine dynamical systems, indexed by a discrete-valued switching variable (mode) that depends on a partitioning of the state-input domain into a finite number of polyhedral regions [2, 3]. The input–output representation of PWA systems is the class of piece-wise affine auto-regressive exogenous (PWARX) systems with the switching signal depending on a partitioning of the domain of a vector containing the recent history of input–output pairs. The problem of identifying a PWA system can be quite challenging [6, 7]. As a result, most existing approaches focus on offline identification methods [8, 9].

I-A Contribution and Outline

In this work, we propose a two-timescale stochastic optimization approach for real-time state-dependent switched system identification in both input–output and state-space representations. We first focus on the well-studied case of PWA and PWARX systems. In Section II we present the realization and identifiability conditions for PWA systems, and in Theorem 1 of Section II-B we provide the identifiability conditions for state space PWA systems in the form of a persistence of excitation (PE) criterion. In Section III, we formulate the state-dependent switching system identification problem as a combined identification and prototype-based learning problem, and in Sections IV and V we develop a two-timescale stochastic approximation algorithm to solve it in real-time.

In particular, Theorem 3 of Section IV constructs a stochastic approximation algorithm based on online deterministic annealing that estimates the mode-switching signal, as well as the number of modes, through a bifurcation phenomenon, studied in Section IV-B. In Section V a second stochastic approximation algorithm based on standard adaptive filtering, running at a faster timescale, is developed to update the parameters of the local models based on the estimate of the switching signal. The convergence properties of this system of recursive algorithms are studied in Theorem 4 of Section V-B, and the applicability of the proposed approach in more general state-dependent switching systems is discussed in Section VI. Finally, in Section VII, simulation results validate the efficacy of the proposed approach in PWA systems.

I-B Related Work

Most existing switched system identification methods [2, 3, 4] can be categorized by the problem formulation used as optimization-based [4, 10, 8, 11], algebraic [12, 13], or clustering-based [9, 14, 15], and by the method used as offline [9, 16, 17] or recursive [18, 13]. Algebraic methods are based on transforming the SARX model to a “lifted” ARX model that does not depend on the switching sequence [12, 13]. Offline optimization-based methods often rely on solving a large mixed-integer program, which is tractable only for small data sets [11, 8], or relaxation techniques over the same problem [18]. Finally, clustering-based methods are optimization-based methods that make use of unsupervised learning to estimate the partition of the domain that is needed for the switching signal [9, 14, 15, 19, 20, 21]. Most hybrid identification approaches are offline methods that first classify each observation and estimate the local model parameters (either simultaneously or iteratively), and then reconstruct the partition of the switching signal [9, 14, 22]. In our recent work, we have proposed the use of the online deterministic annealing approach as a clustering method to estimate the partition of the switching signal [23, 24]. Compared to standard offline methods, this approach allows for real-time PWA system identification, provides computational benefits, and offers real-time control over the performance-complexity trade-off, desired in many applications. In this work, we modify and extend this approach and provide an extensive study of a real-time prototype-based learning method for more general switched systems in both input–output and state-space representations. Compared to [23, 24], the proposed method also provides a solution to the central problem of estimating a minimal number of modes.

I-C Notation

The sets $\mathbb{R}$ and $\mathbb{Z}$ represent the sets of real and integer numbers, respectively, while $\mathbb{Z}_{+}$ represents the set of non-negative integers. For a real matrix $A\in\mathbb{R}^{n\times m}$, $A^{\mathrm{T}}\in\mathbb{R}^{m\times n}$ denotes its transpose and $\text{vec}(A)\in\mathbb{R}^{mn}$ the vectorization of $A$. The $n\times n$ identity matrix is denoted $I_{n}$. $A\succeq 0$ is a positive semi-definite matrix, and the condition $A\succeq B$ is understood as $A-B\succeq 0$. Unless otherwise specified, random variables $\mathcal{X}:\Omega\rightarrow\mathbb{R}^{d}$ are defined in a probability space $(\Omega,\mathbb{F},\mathbb{P})$.
The probability of an event is denoted $\mathbb{P}\left[\mathcal{X}\in S\right]\coloneqq\mathbb{P}\left[\omega\in\Omega:\mathcal{X}(\omega)\in S\right]$, and the expectation operator $\mathbb{E}\left[\mathcal{X}\right]=\int_{\Omega}\mathcal{X}\,\mathrm{d}\mathbb{P}$. In case of multiple random variables $(\mathcal{X},\mathcal{Y})$ and a deterministic function $f$, the expectation operator $\mathbb{E}\left[f(\mathcal{X},\mathcal{Y})\right]$ is understood with respect to the joint probability measure, while $\mathbb{E}\left[\mathcal{X}|\mathcal{Y}\right]\coloneqq\mathbb{E}\left[\mathcal{X}|\sigma(\mathcal{Y})\right]$ denotes the expectation of $\mathcal{X}$ conditioned to the $\sigma$-field of $\mathcal{Y}$.
Stochastic processes $\left\{\mathcal{X}(k)\right\}_{k}$, $k\in\mathbb{Z}_{+}$, are defined in the filtered probability space $(\Omega,\mathbb{F},\left\{\mathcal{F}_{n}\right\}_{n},\mathbb{P})$, where $\mathcal{F}_{n}=\sigma(\mathcal{X}(k)|k\leq n)$, $k\in\mathbb{Z}_{+}$, is the natural filtration. The indicator function of the event $\left[\mathcal{X}\in S\right]$ is denoted $\mathds{1}_{\left[\mathcal{X}\in S\right]}$ and $\otimes$ denotes the Kronecker product. Finally, “$\min$” defines the minimization operator while “$\operatorname*{minimize}$” defines a minimization problem.

II Switched and Piecewise Affine Systems

A general discrete-time switched system is described by:

$$\begin{aligned} x_{t+1} &= f_{\sigma_{t}}(x_{t},u_{t})+w_{t} \\ y_{t} &= g_{\sigma_{t}}(x_{t},u_{t})+v_{t},\quad t\in\mathbb{Z}_{+} \end{aligned} \tag{1}$$

where $x_{t}\in\mathbb{R}^{n}$ is the state vector of the system, $u_{t}\in\mathbb{R}^{p}$ the input, $y_{t}\in\mathbb{R}^{q}$ the output, and $w_{t}\in\mathbb{R}^{n}$ and $v_{t}\in\mathbb{R}^{q}$ are noise terms. The signal $\sigma_{t}\in\left\{1,\ldots,s\right\}$ defines the mode which is active at time $t$. System (1) is a switched affine system when it can be expressed as:

$$\begin{aligned} x_{t+1} &= A_{\sigma_{t}}x_{t}+B_{\sigma_{t}}u_{t}+\bar{f}_{\sigma_{t}}+w_{t} \\ y_{t} &= C_{\sigma_{t}}x_{t}+D_{\sigma_{t}}u_{t}+\bar{g}_{\sigma_{t}}+v_{t},\quad t\in\mathbb{Z}_{+}. \end{aligned} \tag{2}$$

The matrices $A_{i}\in\mathbb{R}^{n\times n}$, $B_{i}\in\mathbb{R}^{n\times p}$, $C_{i}\in\mathbb{R}^{q\times n}$, $D_{i}\in\mathbb{R}^{q\times p}$, $\bar{f}_{i}\in\mathbb{R}^{n}$, and $\bar{g}_{i}\in\mathbb{R}^{q}$ define the affine dynamics for each mode $i\in\left\{1,\ldots,s\right\}$. System (2) is PWA when $\sigma_{t}$ is defined according to a polyhedral partition of the state and input space, i.e., when

$$\sigma_{t}=i\iff\begin{bmatrix}x_{t}\\ u_{t}\end{bmatrix}\in R_{i}\subset R, \tag{3}$$

where $R_{i}$, $i=1,\ldots,s$, are convex polyhedra defining a partition of the state-input domain $R\subseteq\mathbb{R}^{n+p}$, that is when $R_{i}\cap R_{j}=\emptyset$ for $i\neq j$, and $\bigcup_{i}R_{i}=R$.
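As an illustration of definitions (2) and (3), the following minimal sketch simulates a hypothetical two-mode scalar PWA system. The numerical values, the dictionary layout, and the helper names `active_mode`, `step`, and `simulate` are assumptions made for this example and are not taken from the paper. Each region is written as a set of half-space constraints, and points on a shared boundary are assigned to the first matching region so that the regions form a partition.

```python
import random

# Hypothetical two-mode scalar PWA system (n = p = 1); all numbers are illustrative.
# Mode i has dynamics (A_i, B_i, f_i) and a region R_i = {(x, u) : H_i [x, u]^T <= h_i}.
MODES = [
    {"A": 0.5, "B": 1.0, "f": 0.2, "H": [(1.0, 0.0)], "h": [0.0]},    # R_1: x <= 0
    {"A": -0.3, "B": 0.5, "f": -0.1, "H": [(-1.0, 0.0)], "h": [0.0]},  # R_2: x >= 0
]


def active_mode(x, u):
    """Return the index of the first region containing (x, u), as in (3).

    Boundary points (here x = 0) satisfy both sets of inequalities and are
    assigned to the first region, which keeps the regions disjoint.
    """
    for i, mode in enumerate(MODES):
        if all(hx * x + hu * u <= b for (hx, hu), b in zip(mode["H"], mode["h"])):
            return i
    raise ValueError("point lies outside the partitioned domain R")


def step(x, u, w=0.0):
    """One step of the state equation in (2) for the active mode."""
    i = active_mode(x, u)
    mode = MODES[i]
    return mode["A"] * x + mode["B"] * u + mode["f"] + w, i


def simulate(x0, inputs, noise_std=0.0, seed=0):
    """Roll out the system and record the state trajectory and mode sequence."""
    rng = random.Random(seed)
    xs, modes = [x0], []
    for u in inputs:
        x_next, i = step(xs[-1], u, rng.gauss(0.0, noise_std))
        xs.append(x_next)
        modes.append(i)
    return xs, modes
```

Calling `simulate(-1.0, [0.0, 0.0, 0.0])` returns the state trajectory together with the sequence of active modes. In an identification setting only the trajectory would be observed, and the mode sequence is what has to be estimated.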

Switched affine systems can be expressed in input–output form as Switched AutoRegressive eXogenous (SARX) systems of fixed orders $n_{a}$, $n_{b}$, such that for every component $y_{t}^{(i)}\in\mathbb{R}$ of the output vector $y_{t}\in\mathbb{R}^{q}$ it holds:

$$y_{t}^{(i)}=\bar{\theta}_{\sigma_{t}}^{(i)\mathrm{T}}\begin{bmatrix}r_{t}\\ 1\end{bmatrix}+\bar{e}_{t}^{(i)},\ i=1,\ldots,q, \tag{4}$$

where the regressor vector $r_{t}\in\mathbb{R}^{\bar{d}}$, $\bar{d}=qn_{a}+p(n_{b}+1)$, is defined by

$$r_{t}=[y_{t-1}^{\mathrm{T}}\ \ldots\ y_{t-n_{a}}^{\mathrm{T}}\ u_{t}^{\mathrm{T}}\ u_{t-1}^{\mathrm{T}}\ \ldots\ u_{t-n_{b}}^{\mathrm{T}}]^{\mathrm{T}}\in\mathbb{R}^{\bar{d}}. \tag{5}$$

The parameter vectors $\bar{\theta}_{j}^{(i)}\in\mathbb{R}^{\bar{d}+1}$, $j\in\left\{1,\ldots,s\right\}$, define each ARX mode, and $\bar{e}_{t}\in\mathbb{R}^{q}$ is a noise term. Similarly, (4) is PWARX if

$$\sigma_{t}=i\iff r_{t}\in P_{i}\subset P\subseteq\mathbb{R}^{d}, \tag{6}$$

and $\left\{P_{i}\right\}_{i=1}^{s}$ define a polyhedral partition of $P\subseteq\mathbb{R}^{d}$.
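The regressor in (5) and the noise-free part of the output equation (4) can be sketched as follows. The function names `regressor` and `predict_component`, the list-based data layout, and the sample values are assumptions made for illustration only.

```python
def regressor(y_hist, u_hist, t, n_a, n_b):
    """Build r_t of (5) from output/input histories (lists of per-step vectors)."""
    if t < max(n_a, n_b):
        raise ValueError("need at least max(n_a, n_b) past samples")
    r = []
    for k in range(1, n_a + 1):   # y_{t-1}, ..., y_{t-n_a}
        r.extend(y_hist[t - k])
    for k in range(0, n_b + 1):   # u_t, u_{t-1}, ..., u_{t-n_b}
        r.extend(u_hist[t - k])
    return r


def predict_component(theta, r):
    """Noise-free right-hand side of (4): theta^T [r; 1] for one output component."""
    if len(theta) != len(r) + 1:
        raise ValueError("theta must have one more entry than r")
    return sum(th * v for th, v in zip(theta, r + [1.0]))


# Illustrative data with q = 2 outputs, p = 1 input, n_a = 2, n_b = 1.
y_hist = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
u_hist = [[10.0], [11.0], [12.0], [13.0]]
r3 = regressor(y_hist, u_hist, t=3, n_a=2, n_b=1)
```

Here `r3` stacks $y_2$, $y_1$, $u_3$, and $u_2$, so its length is $qn_a+p(n_b+1)=6$. In a PWARX model the parameter vector passed to `predict_component` would be the one belonging to the region $P_i$ that contains the regressor.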

II-A Realization and Identification of PWARX Models

Results in [25] show that every observable switched affine system admits a SARX representation. Necessary and sufficient conditions for input–output realization of SARX and PWARX systems are given in [26] and [27], respectively. It is worth mentioning, however, that the number of modes and parameters can grow considerably when a PWA state-space system is converted into a minimum-order equivalent PWARX representation [27].

Identifiability refers to whether or not identification of a given parameterized system from noise-free data is a well-posed problem. In spite of the increasing attention received by SARX and PWARX system identification, there are currently only a few results on the identifiability of these systems [2, 3]. Identifiability with respect to the input–output behavior of switched linear systems is investigated in [28]. The general identification problem for a PWARX system of the form (4)-(6) can be formulated as a stochastic optimization problem over the parameters $\left\{n_{a},n_{b},s,\left\{\theta_{i}\right\}_{i=1}^{s},\left\{P_{i}\right\}_{i=1}^{s}\right\}$. We make the following assumption:

Assumption 1.

Upper bounds $(\tilde{n}_{a},\tilde{n}_{b})$ on the orders of the model $(n_{a},n_{b})$ are known.

Assumption 1 will allow us to concentrate on the properties of PWARX identification, assuming known $(\tilde{n}_{a},\tilde{n}_{b})$ subject to potential computational bounds.

II-B Realization and Identification of PWA State-Space Models

The problem of identifying a state-space representation of a switched affine system can be quite challenging. Traditionally, it has been handled by applying results from classical realization theory to each linear subsystem [7]. However, identifiability issues arise regarding the characterization of minimality of discrete-time switched linear systems. The first issue relates to the known fact that realizations of a switched affine system are not unique [6]. The lack of uniqueness arises because (i) the minimal realizations of the local linear systems from input–output observations are non-unique, and (ii) a realization of a switched affine system can be constructed for any arbitrary number of modes $s^{\prime}\geq s$ [6]. Here we address the problem of the realization of the local systems being unique only up to isomorphisms, even when the switching signal is known [7]. In particular, assuming no affine dynamics $\bar{f}_{\sigma_{t}}$, $\bar{g}_{\sigma_{t}}$, for any set of invertible matrices $P_{\sigma_{t}}\in\mathbb{R}^{n\times n}$, $\sigma_{t}\in\left\{1,\ldots,s\right\}$, the realization
$\left\{P_{\sigma_{t}}A_{\sigma_{t}}P_{\sigma_{t}}^{-1},P_{\sigma_{t}}B_{\sigma_{t}},C_{\sigma_{t}}P_{\sigma_{t}}^{-1},D_{\sigma_{t}}\right\}$ corresponds to the transfer function associated with (2), i.e., the same input–output observations. To ensure uniqueness of the realizations, given that all subsystems $i\in\left\{1,\ldots,s\right\}$ share the same state space, and to simplify the presentation of our methodology, we make the following assumptions.

Assumption 2.

$C_{i}=C$, $\forall i\in\left\{1,\ldots,s\right\}$ in system (2).

Assumption 2 implies that the order $n$ is known (observed) and enforces that the set of observations is acquired using the same observation mechanism, which leads to the realization of (2) being unique. In practice, this corresponds, for example, to the assumption of using a single sensor with the same world reference to measure the states of the system for every mode, without allowing any similarity transformation.
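To illustrate why fixing the observation mechanism matters, the sketch below checks numerically that a similarity transformation changes the matrices of a single linear mode but leaves its impulse-response (Markov) parameters $CA^{k}B$ unchanged. The matrices, the transformation `P`, and the helper names are hypothetical values chosen for this example.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def inv2(P):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("P must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def markov(A, B, C, count):
    """Return [C B, C A B, ..., C A^(count-1) B]."""
    out, AkB = [], B
    for _ in range(count):
        out.append(matmul(C, AkB))
        AkB = matmul(A, AkB)
    return out


# Illustrative single-mode system and an arbitrary invertible P.
A = [[0.5, 0.1], [0.0, 0.8]]
B = [[1.0], [0.5]]
C = [[1.0, 0.0]]
P = [[2.0, 1.0], [1.0, 1.0]]

Pinv = inv2(P)
A2 = matmul(matmul(P, A), Pinv)   # P A P^{-1}
B2 = matmul(P, B)                 # P B
C2 = matmul(C, Pinv)              # C P^{-1}

same_io = all(
    abs(m1[0][0] - m2[0][0]) < 1e-9
    for m1, m2 in zip(markov(A, B, C, 5), markov(A2, B2, C2, 5))
)
```

The flag `same_io` compares the first five Markov parameters of the two realizations, while `A2` differs from `A`. Note that the transformed output matrix is $CP^{-1}$, so requiring a common, fixed $C$ across modes restricts the admissible transformations. In the fully observed case $C=I_{n}$, only $P=I_{n}$ preserves it.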

Assumption 3.

No affine dynamics, i.e., $\bar{f}_{\sigma_{t}}=0$, $\bar{g}_{\sigma_{t}}=0$, no feed-forward terms, i.e., $D_{\sigma_{t}}=0$, the states are fully observable, i.e., $C=I_{n}$, and the error terms $w_{t}$ and $v_{t}$ share the same zero-mean statistics for every mode of the system.

Assumption 3 is made to simplify the presentation of the proposed methodology without loss of generality.

In addition to the realizations of the local systems being non-unique, minimality and identifiability of the switched system do not necessarily imply those of the local subsystems [28]. In Theorem 1, we describe the conditions under which the local linear models of (2) (under Assumptions 2 and 3) can be identified, even when a subset of them is not controllable (minimal) in isolation.

Theorem 1.

Consider a bounded-input bounded-output linear discrete-time system of the form:

$$\begin{aligned} x_{t+1} &= Ax_{t}+Bu_{t},\quad t\in\mathbb{Z}_{+} \\ y_{t} &= x_{t}, \end{aligned} \tag{7}$$

where $x_{t}\in\mathbb{R}^{n}$, $u_{t}\in\mathbb{R}^{p}$, $A\in\mathbb{R}^{n\times n}$, and $B\in\mathbb{R}^{n\times p}$. Denote $r_{t}=[x_{t}^{\mathrm{T}}\ u_{t}^{\mathrm{T}}]^{\mathrm{T}}$. Then, if there exist some $\alpha,\beta,T>0$ such that

$$\alpha I_{n+p}\preceq\sum_{\tau=t}^{t+T}r_{\tau}r_{\tau}^{\mathrm{T}}\preceq\beta I_{n+p},\quad\forall t\geq 0, \tag{8}$$

the augmented parameter matrix Θ^t=[A^t|B^t]subscript^Θ𝑡delimited-[]conditionalsubscript^𝐴𝑡subscript^𝐵𝑡\hat{\Theta}_{t}=[\hat{A}_{t}|\hat{B}_{t}]over^ start_ARG roman_Θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = [ over^ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | over^ start_ARG italic_B end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] updated by the recursion

Θ^t+1=Θ^tγ(Θ^trtxt+1)rtT,t0,formulae-sequencesubscript^Θ𝑡1subscript^Θ𝑡𝛾subscript^Θ𝑡subscript𝑟𝑡subscript𝑥𝑡1superscriptsubscript𝑟𝑡T𝑡0\hat{\Theta}_{t+1}=\hat{\Theta}_{t}-\gamma\left(\hat{\Theta}_{t}r_{t}-x_{t+1}% \right)r_{t}^{\mathrm{T}},\quad t\geq 0,over^ start_ARG roman_Θ end_ARG start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT = over^ start_ARG roman_Θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_γ ( over^ start_ARG roman_Θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , italic_t ≥ 0 , (9)

for some γ>0𝛾0\gamma>0italic_γ > 0, asymptotically converges to Θ=[A|B]Θdelimited-[]conditional𝐴𝐵\Theta=[A|B]roman_Θ = [ italic_A | italic_B ].

Proof.

See Appendix A. ∎
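As an illustration of recursion (9), the following minimal sketch simulates a small noise-free instance of (7) and runs the update on the observed pairs $(r_t, x_{t+1})$. The matrices `A_TRUE` and `B_TRUE`, the two-sinusoid input, the step size, and the horizon are all hypothetical choices made for this example and are not taken from the paper; the input is only intended to be rich enough for a condition like (8) to plausibly hold, and the step size is chosen small relative to the squared regressor norm.

```python
import math

# Hypothetical example system (not from the paper): n = 2 states, p = 1 input.
A_TRUE = [[0.5, 0.1], [0.0, 0.3]]
B_TRUE = [[1.0], [0.5]]


def mat_vec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gradient_step(theta_hat, r, x_next, gamma):
    """One step of recursion (9): Theta <- Theta - gamma (Theta r - x_next) r^T."""
    err = [pred - x for pred, x in zip(mat_vec(theta_hat, r), x_next)]
    return [[th - gamma * e * r_j for th, r_j in zip(row, r)]
            for row, e in zip(theta_hat, err)]


def identify(A, B, gamma=0.05, steps=50000):
    """Simulate (7) with a scalar two-sinusoid input and run recursion (9)."""
    n = len(A)
    theta = [row_a + row_b for row_a, row_b in zip(A, B)]  # true [A | B]
    theta_hat = [[0.0] * len(theta[0]) for _ in range(n)]  # initial estimate
    x = [0.0] * n
    for t in range(steps):
        # Assumed excitation: two sinusoids at distinct frequencies.
        u = [math.sin(0.7 * t) + math.sin(1.9 * t + 1.0)]
        r = x + u                                   # r_t = [x_t; u_t]
        x_next = mat_vec(theta, r)                  # noise-free dynamics (7)
        theta_hat = gradient_step(theta_hat, r, x_next, gamma)
        x = x_next
    return theta_hat, theta


def max_abs_error(theta_hat, theta):
    """Largest entrywise deviation between the estimate and the true [A | B]."""
    return max(abs(a - b)
               for row_a, row_b in zip(theta_hat, theta)
               for a, b in zip(row_a, row_b))
```

Calling `identify(A_TRUE, B_TRUE)` and passing the result to `max_abs_error` gives a quick check that the estimate approaches $[A|B]$ in this particular setting; it says nothing about noisy data or poorly excited inputs.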

As a result of Theorem 1, throughout this paper, we make the following assumption to ensure identifiability of (2) under Assumptions 2, 3:

Assumption 4.

All linear subsystems $i \in \{1, \ldots, s\}$ of (2) are asymptotically bounded, and the bounded control input $u_t$ is designed such that for every mode $i \in \{1, \ldots, s\}$ of (2), there exist some $\alpha_i, \beta_i, T_i > 0$ for which the following persistence of excitation condition holds:

$$\alpha_i I_{n+p} \preceq \sum_{\tau=t}^{t+T_i} \begin{bmatrix} x_\tau x_\tau^{\mathrm{T}} & x_\tau u_\tau^{\mathrm{T}} \\ u_\tau x_\tau^{\mathrm{T}} & u_\tau u_\tau^{\mathrm{T}} \end{bmatrix} \preceq \beta_i I_{n+p}, \ \forall t \geq 0. \tag{10}$$

Remark 1.

Informally, condition (10) states that not every subsystem in (2) needs to be controllable (minimal), as long as the boundaries of each mode (region $R_i$ in the state-input space) are visited often enough and form a rich-enough set of states.

Remark 2.

The assumption of asymptotic boundedness and controllability (thus, minimality) for all subsystems of (2) would simplify the condition (10) to a persistence of excitation criterion for the input $u_t$ for each subsystem separately. Although this assumption is usually adopted, it is limiting in practice. The assumption that all the local systems share the same state space of order $n$ is a modeling assumption that facilitates the identification of the switching signal as a partition of the state-input space. However, it allows for situations where the minimal realization of some of the local models is of order $n' < n$, as long as the switched system as a whole is identifiable.

III Hybrid System Identification as an Optimization Problem

Consider a switched linear system of the form

$$\begin{aligned} \psi_t &= \Theta_i \phi_t + e_t, \\ &= [\phi_t^{\mathrm{T}} \otimes I_m] \theta_i + e_t, \text{ if } \phi_t \in S_i, \ t \in \mathbb{Z}_{+}, \end{aligned} \tag{11}$$

where $\psi_t \in \mathbb{R}^{m}$, $\phi_t \in \mathbb{R}^{d}$, $\sigma_t \in \{1, \ldots, s\}$, $\Theta_i \in \mathbb{R}^{m \times d}$, for all $i = 1, \ldots, s$, $\theta_i = \text{vec}(\Theta_i) \in \mathbb{R}^{md}$, $e_t \in \mathbb{R}^{m}$ is a zero-mean noise signal, and $\{S_i\}_{i=1}^{s}$ define a polyhedral partition of $S \subseteq \mathbb{R}^{d}$.
System (4) can be written in the form (11) with $\psi_t = y_t \in \mathbb{R}^{q}$, $\phi_t = [r_t^{\mathrm{T}}\ 1]^{\mathrm{T}} \in \mathbb{R}^{\bar{d}+1}$, and $\Theta_i = [\bar{\theta}_i^{(1)} \ldots \bar{\theta}_i^{(q)}]^{\mathrm{T}}$, where $m = q$, and $d = \bar{d} + 1$.
In addition, system (2) under Assumptions 2, 3 can be written in the form (11) with $\psi_t = x_{t+1} \in \mathbb{R}^{n}$, $\phi_t = [x_t^{\mathrm{T}}\ u_t^{\mathrm{T}}]^{\mathrm{T}} \in \mathbb{R}^{n+p}$, and $\Theta_i = [A_i | B_i]$, where $m = n$, and $d = n + p$.
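The two expressions for $\psi_t$ in (11) rely on the identity $\Theta \phi = [\phi^{\mathrm{T}} \otimes I_m]\,\text{vec}(\Theta)$. The short sketch below checks this numerically on a small example, assuming that $\text{vec}(\cdot)$ stacks the columns of its argument (the usual convention, though the text does not state it explicitly). The helper names and the numerical values are illustrative only.

```python
def vec(M):
    """Column-stacking vectorization: columns of M stacked top to bottom."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]


def phi_kron_identity(phi, m):
    """Build the m x (m*d) matrix [phi^T (x) I_m] = [phi_1 I_m, ..., phi_d I_m]."""
    return [[phi[j] if i == k else 0.0
             for j in range(len(phi)) for k in range(m)]
            for i in range(m)]


def mat_vec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Illustrative values: m = 2 outputs, d = 3 regressors.
Theta = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
phi = [1.0, 0.0, -1.0]

direct = mat_vec(Theta, phi)                               # Theta * phi
stacked = mat_vec(phi_kron_identity(phi, 2), vec(Theta))   # [phi^T (x) I_m] * theta
```

Both `direct` and `stacked` evaluate to the same vector, which is why the matrix form and the vectorized form of the local model can be used interchangeably.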

Under the identifiability conditions discussed in Section II, the general identification problem for a switching system of the form (11) can be formulated as a stochastic optimization problem over the parameters $\left\{ s, \{\theta_i\}_{i=1}^{s}, \{S_i\}_{i=1}^{s} \right\}$, as follows:

$$\operatorname*{minimize}_{s, \{\theta_i\}, \{S_i\}} \ \mathbb{E}\left[ \sum_{i=1}^{s} \mathds{1}_{[\Phi \in S_i]} \, d_\rho\left( \Psi, [\Phi^{\mathrm{T}} \otimes I_m] \theta_i \right) \right], \tag{12}$$

where $\Psi \in \mathbb{R}^{m}$ and $\Phi \in \mathbb{R}^{d}$ represent random variables, realizations of which constitute the system observations, the nonnegative function $d_\rho$ is an appropriately defined dissimilarity measure, and the expectation is taken with respect to the joint distribution of $(\Psi, \Phi) \in \mathbb{R}^{m+d}$ that depends on the system dynamics, the control input, and the noise term in (11).

It is clear that the optimization problem (12) is computationally hard and becomes intractable as the number of modes and states increases. In particular, the number of modes $s$ is unknown and completely alters the cardinality and the domain of the set of parameter vectors $\{\theta_i\}_{i=1}^{s}$ that represent the dynamics of the system. In addition, a parametric representation for the polyhedral regions $\{S_i\}$ should be defined.

To represent the regions $\{S_i\}$, we will follow a Voronoi tessellation approach based on prototypes. We introduce a set of parameters $\hat{\phi} \coloneqq \{\hat{\phi}_i\}_{i=1}^{K}$, $\hat{\phi}_i \in S$, and define the regions:

$$\Sigma_i = \left\{ \phi \in S : i = \operatorname*{arg\,min}_{j} d_\rho(\phi, \hat{\phi}_j) \right\}, \ i = 1, \ldots, K. \tag{13}$$

The measure $d_\rho$ can be designed such that the Voronoi regions $\Sigma_i$ are polyhedral, e.g., when $d_\rho$ is a squared Euclidean distance or any Bregman divergence, as will be explained in Section IV-A. In this sense, each $S_i$ can be mapped to a region $\Sigma_j$ (for $K = s$) or the union of a subset of $\{\Sigma_j\}$ (for $K > s$), according to a predefined rule, as will be explained in Section IV-C. An illustration of this partition is given in Fig. 1.
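To make the prototype-based partition (13) concrete, the sketch below assigns a point to its Voronoi region under the squared Euclidean distance and computes the half-space that separates two prototypes. The boundary between two cells is a hyperplane because the difference of the two squared distances is affine in $\phi$, which is why the cells are polyhedral in this case. The function names and the prototype values are assumptions made for illustration.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance, one admissible choice of d_rho."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def voronoi_index(phi, prototypes):
    """Index i of the region Sigma_i in (13) that contains phi (ties go to the lowest index)."""
    return min(range(len(prototypes)),
               key=lambda j: sq_euclidean(phi, prototypes[j]))


def separating_halfspace(p_i, p_j):
    """Return (w, c) such that phi is at least as close to p_i as to p_j iff w . phi <= c.

    Follows from expanding |phi - p_i|^2 <= |phi - p_j|^2, which gives
    2 (p_j - p_i) . phi <= |p_j|^2 - |p_i|^2.
    """
    w = [2.0 * (b - a) for a, b in zip(p_i, p_j)]
    c = sum(b * b for b in p_j) - sum(a * a for a in p_i)
    return w, c


def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical prototypes in a two-dimensional state-input space.
prototypes = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
```

Each region $\Sigma_i$ is then the intersection of the half-spaces returned by `separating_halfspace(prototypes[i], prototypes[j])` over all $j \neq i$.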

In addition to the prototype parameters $\{\hat{\phi}_i\}_{i=1}^{K}$, we also introduce a set of parameters $\hat{\theta} \coloneqq \{\hat{\theta}_i\}_{i=1}^{K}$, $\hat{\theta}_i \in \mathbb{R}^{md}$, with each $\hat{\theta}_i$ associated with the region $\Sigma_i$ according to (13). Representing the augmented random vector

$$X = \begin{bmatrix} \Psi \\ \Phi \end{bmatrix} \in \Pi \subseteq \mathbb{R}^{m+d}, \tag{14}$$

we can define a set of augmented codevectors $\mu \coloneqq \{\mu_i\}_{i=1}^{K}$ as

$$\mu_i = \begin{bmatrix} z(\phi, \hat{\theta}_i) \\ \hat{\phi}_i \end{bmatrix} \in \Pi, \ i = 1, \ldots, K, \tag{15}$$

where the first component of each $\mu_i$¹ is a mapping $z(\phi, \hat{\theta}_i) = [\phi^{\mathrm{T}} \otimes I_m] \hat{\theta}_i$ that simulates the local model dynamics in (11) with unknown parameters $\theta_i$, and the second component is a set of unknown codevectors $\hat{\phi}_i$ that define the partition in (13).

¹ Throughout this paper we will use the notation $\mu_i$, $\mu_i(\hat{\phi}_i)$, $\mu_i(\hat{\theta}_i)$, $\mu_i(\hat{\theta}_i, \hat{\phi}_i)$, $\mu_i(\phi, \hat{\theta}_i, \hat{\phi}_i)$ interchangeably, to showcase the dependence on the variables of interest in each case.

Problem (12) can then be decomposed into two interconnected stochastic optimization problems. Assuming $\{\hat{\theta}_i\}_{i=1}^{K}$ are known, the optimization problem

$$\operatorname*{minimize}_{\hat{\phi}} \ \mathbb{E}\left[ \sum_{i=1}^{K} \mathds{1}_{[\Phi \in \Sigma_i(\hat{\phi})]} \, d_\rho\left( X(\Psi, \Phi), \mu_i(\hat{\theta}_i, \hat{\phi}_i) \right) \right] \tag{16}$$

finds the optimal parameters $\{\hat{\phi}_i\}_{i=1}^{K}$ that define the partition $\{\Sigma_i\}_{i=1}^{K}$ subject to the joint distribution of $(\Psi, \Phi)$, and is, therefore, a mode switching signal identification problem.

On the other hand, assuming the partition $\{\Sigma_i\}_{i=1}^{K}$ (and, therefore, $\{S_i\}_{i=1}^{s}$) is known, the optimization problem

$$\operatorname*{minimize}_{\hat{\theta}} \ \mathbb{E}\left[ \sum_{i=1}^{K} \mathds{1}_{[\Phi \in \Sigma_i]} \, d_\rho\left( \Psi, [\Phi^{\mathrm{T}} \otimes I_m] \hat{\theta}_i \right) \right] \tag{17}$$

is a system identification problem for each mode of the system.
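A minimal sketch of problem (17) for a known partition is given below, assuming a squared-error $d_\rho$, scalar outputs ($m = 1$), noise-free data, and a stochastic gradient update of the same form as recursion (9) applied only to the parameters of the active region. The piecewise affine map, the prototypes, the step size, and the sampling scheme are hypothetical choices for illustration, not the algorithm developed later in the paper.

```python
import random

# Hypothetical two-mode piecewise affine map psi = a_i * x + b_i, with phi = [x, 1].
TRUE_THETA = [[0.5, -1.0],   # mode 0 (active for x < 0)
              [-0.8, 0.3]]   # mode 1 (active for x > 0)
PROTOTYPES = [[-0.5, 1.0], [0.5, 1.0]]  # assumed known, defining the partition


def region_index(phi, prototypes):
    """Nearest-prototype rule (13) with squared Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(phi, prototypes[j])))


def per_mode_update(theta_hat, phi, psi, prototypes, gamma):
    """Gradient step on the squared error, applied only to the active region."""
    i = region_index(phi, prototypes)
    err = sum(t * f for t, f in zip(theta_hat[i], phi)) - psi
    theta_hat[i] = [t - gamma * err * f for t, f in zip(theta_hat[i], phi)]
    return theta_hat


def run(num_samples=10000, gamma=0.2, seed=0):
    """Stream noise-free samples and update the local models one at a time."""
    rng = random.Random(seed)
    theta_hat = [[0.0, 0.0] for _ in PROTOTYPES]
    for _ in range(num_samples):
        phi = [rng.uniform(-1.0, 1.0), 1.0]
        mode = region_index(phi, PROTOTYPES)           # true mode of this sample
        psi = sum(t * f for t, f in zip(TRUE_THETA[mode], phi))
        theta_hat = per_mode_update(theta_hat, phi, psi, PROTOTYPES, gamma)
    return theta_hat
```

With the partition fixed, the updates for different regions do not interact, so each local model is identified from the samples that fall in its own region.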

In Section IV we address the question of finding the optimal number $K$ according to a performance-complexity trade-off, as well as finding a mapping between $\{\Sigma_i\}_{i=1}^{K}$ and $\{S_i\}_{i=1}^{\hat{s}}$ for the lowest possible number $\hat{s} \geq s$. In Section V we tackle the problem of solving (16) and (17) as a system of interconnected stochastic optimization problems in real-time using principles from two-timescale stochastic approximation theory.

Figure 1: Illustration of the partition $\{S_i\}_{i=1}^{s}$ of the state-input space $S$ and its connection to the artificial partition $\{\Sigma_j\}_{j=1}^{K}$. The optimal parameters $\{\hat{\phi}_j\}$ induce a partition $\{\Sigma_j\}$ that minimizes the mode switching error.

IV Mode Identification with Online Deterministic Annealing

We aim to construct a recursive stochastic optimization algorithm to solve problem (16) while progressively estimating the number $K$ of the augmented codevectors $\{\mu_i\}_{i=1}^{K}$, an estimate $\hat{s}$ of the actual number of modes, and a mapping between $\{\Sigma_i\}_{i=1}^{K}$ and $\{S_i\}_{i=1}^{\hat{s}}$. Recall that the observed data are represented by the random variable $X \in \Pi$ in (14), and the augmented codevectors $\{\mu_i\}_{i=1}^{K}$ are normally treated as constant parameters to be estimated. To progressively estimate $K$ and $\hat{s}$, we will adopt the online deterministic annealing approach [29, 30], and define a probability space over an arbitrary number of codevectors, while constraining their distribution using a maximum-entropy principle at different levels. First we define a quantizer $Q : \Pi \rightarrow \Pi$ as a stochastic mapping of the form:

$$Q(x) = \mu_i \ \text{ with probability } p(\mu_i | x). \tag{18}$$

Then we formulate the multi-objective optimization

$$\operatorname*{minimize}_{\hat{\phi}} \ F_\lambda(\mu) = (1 - \lambda) D(\mu) - \lambda H(\mu), \ \lambda \in [0, 1), \tag{19}$$

where the dependence on $\hat{\phi}$ comes through $\mu(\hat{\phi})$, the term

$$D(\mu) = \mathbb{E}\left[ d_\rho(X, Q) \right] = \int p(x) \sum_{i} p(\mu_i | x) \, d_\rho(x, \mu_i) \, \mathrm{d}x \tag{20}$$

is a generalization of the objective in (16), and

$$\begin{aligned} H(\mu) &= \mathbb{E}\left[ -\log P(X, Q) \right] \\ &= H(X) - \int p(x) \sum_{i} p(\mu_i | x) \log p(\mu_i | x) \, \mathrm{d}x \end{aligned} \tag{21}$$

is the Shannon entropy. This is now a problem of finding the locations $\{\hat{\phi}_i\}$ and the corresponding probabilities $\{p(\mu_i | x) = \mathbb{P}[Q = \mu_i | X = x]\}$.

Notice that, for $p(\mu_{i}|x)=\mathds{1}_{[\phi\in\Sigma_{i}(\hat{\phi})]}$ and $\lambda=0$, (19) is equivalent to (16). In that sense, (19) introduces extra optimization parameters in the probabilities $\{p(\mu_{i}|x)\}$, and the parameter $\lambda$ that defines a homotopy $F_{\lambda}$. However, the advantages of this approach are notable, and, perhaps counter-intuitively, lead to numerical optimization solutions with several computational benefits. On the one hand, the Lagrange multiplier $\lambda\in[0,1)$ controls the trade-off between $D$ and $H$, which, as will be shown, is a trade-off between performance and complexity. On the other hand, the use of the conditional probabilities $\{p(\mu_{i}|x)\}$ allows for the definition of the entropy term $H$, which introduces several useful properties [29, 30, 31, 32, 33]. In particular, as we will show in Section IV-B, reducing the values of $\lambda$ defines a direction that resembles an annealing process [29, 34] and induces a bifurcation phenomenon, with respect to which, the number of unique codevectors $K_{\lambda}$ depends on $\lambda$ and is finite for any given value of $\lambda>0$.
This process also introduces robustness with respect to initial conditions [29, 35].

IV-A Solving the Optimization Problem

To solve (19) for a given value of $\lambda$, we successively minimize $F_{\lambda}$ first with respect to the association probabilities $\{p(\mu_{i}|x)\}$, and then with respect to the codevector locations $\mu$. The solution of the optimization problem

\begin{align}
F_{\lambda}^{*}(\mu) &=\min_{\{p(\mu_{i}|x)\}}F_{\lambda}(\mu), \tag{22}\\
\text{s.t.}\quad &\sum_{i}p(\mu_{i}|x)=1, \notag
\end{align}

is given by the Gibbs distributions [36]:

\begin{equation}
p^{*}(\mu_{i}|x)=\frac{e^{-\frac{1-\lambda}{\lambda}d_{\rho}(x,\mu_{i})}}{\sum_{j}e^{-\frac{1-\lambda}{\lambda}d_{\rho}(x,\mu_{j})}},\quad \forall x\in\Pi. \tag{23}
\end{equation}
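As a minimal numerical sketch of (23), assuming the squared Euclidean distance as the dissimilarity measure (the function names `sq_euclidean` and `gibbs_association` and the codevector values are illustrative and not part of the original algorithm), the association probabilities can be evaluated with a numerically stable softmax:

```python
import numpy as np

def sq_euclidean(x, mu):
    """Squared Euclidean distance between x and each row of mu (assumed choice of d_rho)."""
    return np.sum((mu - x) ** 2, axis=-1)

def gibbs_association(x, mu, lam):
    """Association probabilities p*(mu_i | x) of Eq. (23) for 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    logits = -(1.0 - lam) / lam * sq_euclidean(x, mu)
    logits -= logits.max()          # shift for numerical stability; cancels in the ratio
    w = np.exp(logits)
    return w / w.sum()

mu = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0]])   # hypothetical codevectors
x = np.array([0.2, 0.1])
p_hot = gibbs_association(x, mu, lam=0.999)   # high temperature: close to uniform
p_cold = gibbs_association(x, mu, lam=0.01)   # low temperature: close to a hard assignment
```

For $\lambda$ close to 1 the exponent $\frac{1-\lambda}{\lambda}$ vanishes and the probabilities approach the uniform distribution, while for small $\lambda$ the nearest codevector receives almost all of the probability mass.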

In order to minimize $F^{*}(\mu)$ with respect to $\hat{\phi}$ we set the gradients to zero

\begin{equation}
\frac{\mathrm{d}}{\mathrm{d}\hat{\phi}}F_{\lambda}^{*}(\mu)=\frac{\mathrm{d}}{\mathrm{d}\mu}F_{\lambda}^{*}(\mu)\frac{\mathrm{d}\mu}{\mathrm{d}\hat{\phi}}=0 \tag{24}
\end{equation}

where $\frac{\mathrm{d}\mu}{\mathrm{d}\hat{\phi}}=\begin{bmatrix}0_{m\times d}\\ I_{d}\end{bmatrix}$, and

\begin{align}
\frac{\mathrm{d}}{\mathrm{d}\mu}F_{\lambda}^{*}(\mu) &=\frac{\mathrm{d}}{\mathrm{d}\mu}\left((1-\lambda)D(\mu)-\lambda H(\mu)\right) \tag{25}\\
&=\sum_{i}\int p(x)p^{*}(\mu_{i}|x)\frac{\mathrm{d}}{\mathrm{d}\mu_{i}}d_{\rho}(x,\mu_{i})~\mathrm{d}x=0, \notag
\end{align}

where we have used (23) and direct differentiation with similar arguments as in [36]. It follows that $\frac{\mathrm{d}}{\mathrm{d}\hat{\phi}}F_{\lambda}^{*}(\mu)=0$ which implies that

\begin{equation}
\int p(x)p^{*}(\mu_{i}|x)\frac{\mathrm{d}}{\mathrm{d}\mu_{i}}d_{\rho}(x,\mu_{i})~\mathrm{d}x\begin{bmatrix}0_{m\times d}\\ I_{d}\end{bmatrix}=0,\quad \forall i. \tag{26}
\end{equation}

Equation (26) has a closed-form solution if the dissimilarity measure $d_{\rho}$ belongs to the family of Bregman divergences [37, 29], information-theoretic dissimilarity measures that include the squared Euclidean distance and the Kullback-Leibler divergence, and are defined as follows:

Definition 1 (Bregman Divergence).

Let $\rho:S\rightarrow\mathbb{R}$ be a strictly convex function defined on a vector space $S\subseteq\mathbb{R}^{d}$ such that $\rho$ is twice F-differentiable on $S$. The Bregman divergence $d_{\rho}:S\times S\rightarrow[0,\infty)$ is defined as:

\begin{equation*}
d_{\rho}\left(x,\mu\right)=\rho\left(x\right)-\rho\left(\mu\right)-\frac{\partial\rho}{\partial\mu}\left(\mu\right)\left(x-\mu\right),
\end{equation*}

where $x,\mu\in S$, and the continuous linear map $\frac{\partial\rho}{\partial\mu}\left(\mu\right):S\rightarrow\mathbb{R}$ is the Fréchet derivative of $\rho$ at $\mu$.
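To make Definition 1 concrete, the following sketch evaluates $d_{\rho}$ for a user-supplied generator $\rho$ and its gradient. The helper name `bregman` and the two generators are illustrative assumptions. With $\rho(x)=\|x\|^{2}$ the definition recovers the squared Euclidean distance, and with the negative entropy $\rho(x)=\sum_{k}x_{k}\log x_{k}$ it gives the generalized Kullback-Leibler divergence, which coincides with the usual one when both arguments are probability vectors.

```python
import numpy as np

def bregman(rho, grad_rho, x, mu):
    """d_rho(x, mu) = rho(x) - rho(mu) - <grad rho(mu), x - mu>."""
    return rho(x) - rho(mu) - np.dot(grad_rho(mu), x - mu)

# Generator 1: rho(v) = ||v||^2  ->  squared Euclidean distance
sq_norm = lambda v: np.dot(v, v)
sq_norm_grad = lambda v: 2.0 * v

# Generator 2: negative entropy  ->  (generalized) Kullback-Leibler divergence
neg_entropy = lambda v: np.sum(v * np.log(v))
neg_entropy_grad = lambda v: np.log(v) + 1.0

x = np.array([0.2, 0.3, 0.5])      # example probability vectors (strictly positive)
mu = np.array([0.4, 0.4, 0.2])

d_euclid = bregman(sq_norm, sq_norm_grad, x, mu)          # equals ||x - mu||^2
d_kl = bregman(neg_entropy, neg_entropy_grad, x, mu)      # equals sum_k x_k log(x_k / mu_k)
```

Note that a Bregman divergence is in general not symmetric in its arguments, as the Kullback-Leibler case illustrates.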

Throughout this manuscript, we will assume that the dissimilarity measure $d_{\rho}$ in (13) is a Bregman divergence. Then the solution to the optimization problem

\begin{equation}
\operatorname*{minimize}_{\hat{\phi}}~F_{\lambda}^{*}\left(\mu(\hat{\phi})\right), \tag{27}
\end{equation}

where $F_{\lambda}^{*}(\mu)$ is the solution of (22) for a given $\lambda\in[0,1)$ and $p^{*}(\mu_{i}|x)$ is given by (23), is given by Theorem 2.

Theorem 2.

If $d_{\rho}:\Pi\times\Pi\rightarrow\mathbb{R}_{+}$ is a Bregman divergence, then

\begin{equation}
\hat{\phi}_{i}^{*}=\frac{\int\phi\,p(x)p^{*}(\mu_{i}|x)~\mathrm{d}x}{p^{*}(\mu_{i})} \tag{28}
\end{equation}

is a solution to the optimization problem (27).

Proof.

By definition, for a Bregman divergence $d_{\rho}:\Pi\times\Pi\rightarrow\mathbb{R}_{+}$ based on a strictly convex function $\rho:\Pi\rightarrow\mathbb{R}$, it holds that $\frac{\partial d_{\rho}}{\partial\mu}(x,\mu)=-\left\langle\nabla^{2}\rho(\mu),(x-\mu)\right\rangle$. Similar to [30], with standard algebraic manipulations, (26) then becomes

\begin{equation}
\int(\phi-\hat{\phi}_{i}^{*})\,p(x)p^{*}(\mu_{i}|x)~\mathrm{d}x=0,\quad \forall i, \tag{29}
\end{equation}

where $p^{*}(\mu_{i}|x)$ is given by (23) and the integral is defined over the domain $\Pi$. Eq. (29) is equivalent to (28) since $\int p(x)p^{*}(\mu_{i}|x)~\mathrm{d}x=p^{*}(\mu_{i})$. ∎
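For intuition, on a finite sample the pair (23) and (28) can be iterated as a batch fixed-point scheme. The sketch below is only an illustrative offline analogue under simplifying assumptions that are not taken from the paper: $d_{\rho}$ is the squared Euclidean distance, and $x=\phi$ so that each codevector coincides with $\hat{\phi}_{i}$ (the local-model component of $\mu_{i}$ is dropped). The function name `batch_fixed_point` and the synthetic data are hypothetical.

```python
import numpy as np

def batch_fixed_point(phi, codevectors, lam, iters=100):
    """Alternate Eq. (23) (associations) and a sample version of Eq. (28) (centroids)."""
    mu = codevectors.copy()
    coeff = (1.0 - lam) / lam
    for _ in range(iters):
        # pairwise squared distances between samples and codevectors, shape (N, K)
        d = ((phi[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = -coeff * d
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # p*(mu_i | x_n)
        # weighted mean: sum_n phi_n p_in / sum_n p_in
        mu = (p.T @ phi) / p.sum(axis=0)[:, None]
    return mu

rng = np.random.default_rng(0)
phi = np.concatenate([rng.normal(-2.0, 0.3, (200, 1)),
                      rng.normal(2.0, 0.3, (200, 1))])   # synthetic two-cluster sample
init = np.array([[-0.1], [0.1]])                  # two slightly perturbed codevectors
mu_hot = batch_fixed_point(phi, init, lam=0.99)   # codevectors collapse onto the sample mean
mu_cold = batch_fixed_point(phi, init, lam=0.05)  # codevectors separate toward the clusters
```

At the high value of $\lambda$ both codevectors converge to the same point, whereas at the low value they settle near the two cluster centers, which previews the bifurcation behavior discussed in Section IV-B.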

Remark 3.

The partition $\{\Sigma_{i}\}$ induced by (13) and a dissimilarity measure $d_{\rho}$ that belongs to the family of Bregman divergences, is separated by hyperplanes [37]. As a result, each $\Sigma_{i}$ is a polyhedral region for a bounded domain $S$.

Based on Theorem 2, Theorem 3 below constructs a gradient-free stochastic approximation algorithm that recursively estimates (28).

Theorem 3.

The sequence $\hat{\phi}_{i}(t)$ constructed by the recursive updates

\begin{equation}
\begin{cases}
\rho_{i}(t+1) &=\rho_{i}(t)+\beta(t)\left[\hat{p}_{i}(t)-\rho_{i}(t)\right]\\
\sigma_{i}(t+1) &=\sigma_{i}(t)+\beta(t)\left[\phi_{t}\hat{p}_{i}(t)-\sigma_{i}(t)\right],
\end{cases} \tag{30}
\end{equation}

where $x_{t}=[\psi_{t}^{\mathrm{T}}\ \phi_{t}^{\mathrm{T}}]^{\mathrm{T}}$ represents external input with $\psi_{t}\sim\Psi$, $\phi_{t}\sim\Phi$, $\sum_{t}\beta(t)=\infty$, $\sum_{t}\beta^{2}(t)<\infty$, and the quantities $\hat{p}_{i}(t)$ and $\hat{\phi}_{i}(t)$ are recursively updated as follows:

\begin{equation}
\hat{\phi}_{i}(t)=\frac{\sigma_{i}(t)}{\rho_{i}(t)},\quad \hat{p}_{i}(t)=\frac{\rho_{i}(t)e^{-\frac{1-\lambda}{\lambda}d(x_{t},\mu_{i}(t))}}{\sum_{j}\rho_{j}(t)e^{-\frac{1-\lambda}{\lambda}d(x_{t},\mu_{j}(t))}}, \tag{31}
\end{equation}

with $\mu_{i}(t)=[z_{i}^{\mathrm{T}}(\phi_{t},\hat{\theta}_{i}),\hat{\phi}_{i}(t)^{\mathrm{T}}]^{\mathrm{T}}$, converges almost surely to $\hat{\phi}_{i}^{*}$ given in (28).

Proof.

The proof follows similar arguments as Theorem 5 of [30]. ∎
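A minimal sketch of the recursion (30)-(31) is given below, under simplifying assumptions that are ours and not the paper's: the observation is taken to be $x_{t}=\phi_{t}$ (the local-model output $z_{i}$ is omitted from $\mu_{i}$), $d$ is the squared Euclidean distance, and $\beta(t)=1/(t+2)$, which satisfies the stated step-size conditions. The function name `oda_step`, the initial values, and the synthetic data stream are hypothetical.

```python
import numpy as np

def oda_step(phi_t, rho, sigma, lam, beta_t):
    """One update of Eqs. (30)-(31) for all codevectors.

    rho:   shape (K,)    running estimates of p(mu_i)
    sigma: shape (K, d)  running estimates of the numerator of Eq. (28)
    """
    phi_hat = sigma / rho[:, None]                        # Eq. (31), left
    d = ((phi_hat - phi_t) ** 2).sum(axis=1)              # assumed squared Euclidean d
    logits = np.log(rho) - (1.0 - lam) / lam * d
    logits -= logits.max()                                # numerical stability
    p_hat = np.exp(logits)
    p_hat /= p_hat.sum()                                  # Eq. (31), right
    rho = rho + beta_t * (p_hat - rho)                    # Eq. (30), first line
    sigma = sigma + beta_t * (phi_t * p_hat[:, None] - sigma)  # Eq. (30), second line
    return rho, sigma

rng = np.random.default_rng(1)
K, lam = 2, 0.05
phi_hat0 = np.array([[-0.5], [0.5]])                      # hypothetical initial codevectors
rho = np.full(K, 1.0 / K)
sigma = phi_hat0 * rho[:, None]
for t in range(5000):
    center = -2.0 if rng.random() < 0.5 else 2.0          # synthetic two-cluster stream
    phi_t = np.array([center + 0.3 * rng.normal()])
    rho, sigma = oda_step(phi_t, rho, sigma, lam, beta_t=1.0 / (t + 2))
phi_hat = sigma / rho[:, None]                            # estimates of Eq. (28)
```

Each observation is processed once and then discarded, so the memory cost depends only on the number of codevectors. Since $\sum_{i}\hat{p}_{i}(t)=1$, the update preserves $\sum_{i}\rho_{i}(t)=1$ whenever the initial values sum to one.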

Remark 4.

Notice that the dynamics of (30) can be expressed as:

\begin{equation}
\hat{\phi}_{i}(t+1)=\hat{\phi}_{i}(t)+\frac{\beta(t)}{\rho_{i}(t)}\left[\frac{\sigma_{i}(t+1)}{\rho_{i}(t+1)}\left(\rho_{i}(t)-\hat{p}_{i}(t)\right)+\phi_{t}\hat{p}_{i}(t)-\sigma_{i}(t)\right], \tag{32}
\end{equation}

where the recursive updates take place for every codevector $\hat{\phi}_{i}$ sequentially. This is a discrete-time dynamical system that presents bifurcation phenomena with respect to the parameter $\lambda$, i.e., the number of equilibria of this system changes with respect to the value $\lambda$ which is hidden inside the term $\hat{p}_{i}(t)$ in (31). According to this phenomenon, the number of distinct values of $\hat{\phi}_{i}$ is finite, and the updates need only be taken with respect to these values that we call “effective codevectors”. This is discussed in Section IV-B.

IV-B Bifurcation Phenomena

In Section IV-A we described how to solve the optimization problem for a given value of the parameter $\lambda$. The main idea of the proposed approach is to solve a sequence of optimization problems of the form (19) with decreasing values of $\lambda$. This process then becomes a homotopy optimization method [38]. In particular, the usage of the entropy term resembles annealing optimization methods and grants $\lambda$ the name of a “temperature” parameter. Notice that, so far, we have assumed an arbitrary number of codevectors $K$. We will show that the unique values of the set $\{\hat{\phi}_{i}\}$ that solves (19), form a finite set of $K_{\lambda}$ values that we will refer to as “effective codevectors”.

Notice that at high temperature ($\lambda\rightarrow 1$), (23) yields uniform association probabilities $p(\mu_{i}|x)=p(\mu_{j}|x),\ \forall i,j,\forall x$, and as a result of (28), all pseudo-inputs are located at the same point $\hat{\phi}_{i}=\mathbb{E}_{X}\left[\phi\right],\ \forall i$, which means that there is one unique “effective” codevector given by $\mathbb{E}_{X}\left[\phi\right]$. As $\lambda$ is lowered below a critical value, a bifurcation phenomenon occurs, when the number of “effective” codevectors increases, which describes an annealing process [29, 34]. Mathematically, this occurs when the existing solution $\hat{\phi}^{*}$ given by (28) is no longer the minimum of the free energy $F^{*}$, as the temperature $\lambda$ crosses a critical value. Following principles from variational calculus, we can track the bifurcation by the condition:

\begin{equation}
\frac{\mathrm{d}^{2}}{\mathrm{d}\epsilon^{2}}F^{*}\left(\{\hat{\phi}+\epsilon\hat{\psi}\}\right)\bigg|_{\epsilon=0}\geq 0, \tag{33}
\end{equation}

for all choices of finite perturbations $\{\hat{\psi}\}$. Using (33) and direct differentiation, one can show that bifurcation depends on the temperature coefficient $\lambda$ (and the choice of the Bregman divergence, through the function $\rho$) [30, 36]. In other words, the number of codevectors increases countably many times as the value of $\lambda$ decreases, and an algorithmic implementation needs only as many codevectors in memory as the number of “effective” codevectors.

In practice, we can detect the bifurcation points by introducing perturbing pairs of codevectors at each temperature level $\lambda$. In this way, the codevectors $\hat{\phi}$ are doubled by inserting a perturbation of each $\hat{\phi}_{i}$ in the set of effective codevectors. The newly inserted codevectors will merge with their pair if a critical temperature has not been reached and separate otherwise. The merging criterion takes the form:

\begin{equation}
\frac{1-\lambda}{\lambda}d_{\rho}(\hat{\phi}_{i},\hat{\phi}_{j})\leq\epsilon_{n},\quad \forall i,j, \tag{34}
\end{equation}

for a given threshold $\epsilon_{n}$. The pseudocode for this algorithm is presented in Alg. 1. A detailed discussion on the implementation of the original online deterministic annealing algorithm, its complexity, and the effect of its parameters, can be found in [29, 30, 36].
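The perturb-and-merge step can be sketched as follows. This is an illustration only and not a transcription of Alg. 1: the function names `perturb` and `merge`, the Gaussian perturbation of size `delta`, the greedy merging pass, and the squared Euclidean choice of $d_{\rho}$ are all assumptions made for brevity.

```python
import numpy as np

def perturb(codevectors, delta, rng):
    """Duplicate every codevector, adding a small random perturbation to each copy."""
    noise = delta * rng.standard_normal(codevectors.shape)
    return np.concatenate([codevectors, codevectors + noise])

def merge(codevectors, lam, eps_n):
    """Greedily keep one representative of every group that satisfies Eq. (34)."""
    scale = (1.0 - lam) / lam
    kept = []
    for c in codevectors:
        # keep c only if it is farther than the threshold from every kept codevector
        if all(scale * np.sum((c - k) ** 2) > eps_n for k in kept):
            kept.append(c)
    return np.array(kept)

rng = np.random.default_rng(0)
effective = np.array([[-2.0], [2.0]])                # current effective codevectors
doubled = perturb(effective, delta=1e-3, rng=rng)    # 4 codevectors before the updates
merged = merge(doubled, lam=0.5, eps_n=1e-4)         # pairs that did not separate are merged
```

In an actual run, the updates (30)-(31) would be applied between `perturb` and `merge`; pairs that remain within the threshold after convergence indicate that no critical temperature was crossed at the current $\lambda$.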

IV-C Estimating the number of modes

As illustrated in Fig. 1, the problem formulation developed in Section III defines a possibly imperfect surjective mapping from $\{\Sigma_{j}\}_{j=1}^{K}$ to $\{S_{i}\}_{i=1}^{s}$ such that each $S_{i}$ is defined as a union of a subset of $\{\Sigma_{j}\}_{j=1}^{K}$. According to Remark 3, it is possible for this mapping to be perfect in the sense that each $S_{i}$ is perfectly represented, inducing zero mode switching error. The design of an appropriate termination criterion for Alg. 1 is an open question and is subject to the trade-off between the number $K$ and the minimization of the identification error. In this work, we make use of the condition $K\leq K_{\max}$ as a termination criterion, where $K_{\max}$ represents the computational capacity of the identification device.

Recall that each ΣjsubscriptΣ𝑗\Sigma_{j}roman_Σ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is associated with a parameter vector θ^jsubscript^𝜃𝑗\hat{\theta}_{j}over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, j=1,,K𝑗1𝐾j=1,\ldots,Kitalic_j = 1 , … , italic_K. Assuming a set θ¯={θ¯i}i=1s^¯𝜃superscriptsubscriptsubscript¯𝜃𝑖𝑖1^𝑠\bar{\theta}=\left\{\bar{\theta}_{i}\right\}_{i=1}^{\hat{s}}over¯ start_ARG italic_θ end_ARG = { over¯ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT over^ start_ARG italic_s end_ARG end_POSTSUPERSCRIPT, we define each θ^jsubscript^𝜃𝑗\hat{\theta}_{j}over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT as the mapping:

θ^j(θ¯)=θ¯i,if i=argminkdρ(θ^j,θ¯k).formulae-sequencesubscript^𝜃𝑗¯𝜃subscript¯𝜃𝑖if 𝑖subscriptargmin𝑘subscript𝑑𝜌subscript^𝜃𝑗subscript¯𝜃𝑘\hat{\theta}_{j}(\bar{\theta})=\bar{\theta}_{i},\ \text{if }i=\operatorname*{% arg\,min}_{k}d_{\rho}(\hat{\theta}_{j},\bar{\theta}_{k}).over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over¯ start_ARG italic_θ end_ARG ) = over¯ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , if italic_i = start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_d start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , over¯ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) . (35)

In this way $\Sigma_j \in S_i$ if $\hat{\theta}_j(\bar{\theta}) = \bar{\theta}_i$. Therefore, given (35), the goal now is to find $\hat{s}$ and $\bar{\theta}$ such that $\hat{s} = s$, and $\bar{\theta}_i = \theta_i$, $\forall i \in \{1, \ldots, s\}$. We follow a similar approach to the bifurcation mechanism described in Section IV-B. Starting with one codevector $\hat{\phi}_0$, we define $\bar{\theta}_0 = \hat{\theta}_0$. Every time a codevector $\hat{\phi}_j$ is split into a pair of perturbed codevectors, a new $\hat{\theta}_{j'}$ is introduced. After convergence for a given $\lambda$, merging of the codevectors is detected by (34). For the insertion of a new $\bar{\theta}_i$ we check the condition:

$$d_\rho(\hat{\theta}_j, \bar{\theta}_i) > \epsilon_s,\ \forall j, \tag{36}$$

with respect to a given threshold $\epsilon_s$. Notice that in contrast to (34), (36) does not depend on $\lambda$. If (36) is satisfied, a new $\bar{\theta}_i$ is introduced and $\hat{s} \leftarrow \hat{s} + 1$. This process is integrated in the mode identification algorithm and its pseudocode is presented in Alg. 1.
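The assignment rule (35) and the insertion test (36) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dissimilarity $d_\rho$ is replaced by the squared Euclidean distance, (36) is read as comparing a candidate parameter vector with the entries already stored in $\bar{\theta}$, and the names `d_rho`, `assign_to_modes`, and `maybe_insert` are hypothetical helpers that do not appear in the paper.

```python
import numpy as np


def d_rho(a, b):
    """Squared Euclidean distance, used here as a stand-in for d_rho (assumption)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff @ diff)


def assign_to_modes(theta_hat, theta_bar):
    """Rule (35): for each theta_hat_j, return the index i of the closest theta_bar_i."""
    return [int(np.argmin([d_rho(th, tb) for tb in theta_bar])) for th in theta_hat]


def maybe_insert(candidate, theta_bar, eps_s):
    """One reading of (36): append `candidate` to theta_bar only if it is
    farther than eps_s from every parameter vector already stored."""
    if all(d_rho(candidate, tb) > eps_s for tb in theta_bar):
        theta_bar.append(np.asarray(candidate, dtype=float))
        return True
    return False


# Example: one stored mode; a nearby candidate is rejected, a distant one is added.
modes = [np.array([1.0, 0.0])]
added_near = maybe_insert([1.01, 0.0], modes, eps_s=0.1)
added_far = maybe_insert([-1.0, 0.5], modes, eps_s=0.1)
labels = assign_to_modes([[0.9, 0.1], [-1.1, 0.4]], modes)
```

Because the threshold $\epsilon_s$ does not depend on $\lambda$, the same test can be reused unchanged at every temperature level.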

Remark 5.

Note that $\{\hat{\theta}_j\}$ are only used as functions of $\bar{\theta}$, and the parameters $\{\bar{\theta}_i\}$ are the ones that are being updated by the local system identification algorithm that will be presented in Section V.

Algorithm 1 Switched System Identification
  Set parameters and initialize $\hat{\phi} = \{\hat{\phi}_0\}$, $\bar{\theta} = \{\bar{\theta}_0\}$
  while $K < K_{\textrm{max}}$ and $\lambda > \lambda_{\textrm{min}}$ do
     Perturb $\hat{\phi}_i \leftarrow \{\hat{\phi}_i + \delta, \hat{\phi}_i - \delta\}$, $\forall i$
     Set $t \leftarrow 0$
     repeat
        Observe $x = (\psi, \phi)$ according to (11)
        Update $\bar{\theta}_w$, $w = \operatorname*{arg\,min}_j d_\rho(\phi, \hat{\phi}_j)$, using (39)
        for $i = 1, \ldots, K$ do
           Update $\hat{\phi}$ using (30), (31)
        end for
        $t \leftarrow t + 1$
     until Convergence
     Discard $\hat{\phi}_i$ if $\frac{1-\lambda}{\lambda} d(\hat{\phi}_j, \hat{\phi}_i) < \epsilon_n$, $\forall i, j, i \neq j$
     Insert $\hat{\theta}_i$ in $\bar{\theta}$ if $d_\rho(\hat{\theta}_j, \hat{\theta}_i) > \epsilon_s,\ \forall j$
     Lower temperature $\lambda \leftarrow \gamma\lambda$, $0 < \gamma < 1$
  end while
  Define $\{\Sigma_i\}_{i=1}^{K}$ using (13)
  Define $\hat{s} = \text{card}(\bar{\theta})$
  Define $\{S_i\}_{i=1}^{\hat{s}}$ by $\Sigma_j \in S_i$ if $\hat{\theta}_j(\bar{\theta}) = \bar{\theta}_i$
  Estimated Model Parameters: $\hat{s}$, $\{S_i\}_{i=1}^{\hat{s}}$, $\{\bar{\theta}_i\}_{i=1}^{\hat{s}}$

V Piecewise Affine System Identification

In this section we review standard recursive system identification for estimating the parameters $\{\bar{\theta}_i\}$ of the local models given knowledge of the partition $\{S_i\}$. We show that this kind of recursive identification can be formulated as a stochastic approximation algorithm, and that it can be combined using the theory of two-timescale stochastic approximation with the stochastic approximation method of estimating $\{S_i\}$ by $\{\Sigma_i\}$ as proposed in Section IV.

V-A Recursive Identification of Local Models

Recall that, given knowledge of the partition $\{S_i\}_{i=1}^{s}$, each local linear model of the PWA system in (11) is completely defined by the parameters $\{\theta_i\}$. In the following, we develop a stochastic approximation recursion to estimate $\{\bar{\theta}_i\}$. First we define the error:

$$\epsilon(t) = \sum_i \mathds{1}_{[\phi_t \in S_i]} [\phi_t^{\mathrm{T}} \otimes I_m] \bar{\theta}_i - \psi_t \tag{37}$$

A stochastic gradient descent approach aims to minimize the error:

$$\operatorname*{minimize}_{\bar{\theta}_i}\ \frac{1}{2} \mathbb{E}\left[\|\epsilon(t)\|^2\right], \tag{38}$$

using the recursive updates:

$$\begin{aligned}
\bar{\theta}_i(t+1) &= \bar{\theta}_i(t) - \alpha(t) \left(\nabla_{\bar{\theta}_i} \epsilon(t)\right) \epsilon(t) \\
&= \bar{\theta}_i(t) - \alpha(t) [\phi_t^{\mathrm{T}} \otimes I_m]^{\mathrm{T}} \epsilon(t)
\end{aligned} \tag{39}$$

where $\sum_n \alpha(n) = \infty$, $\sum_n \alpha^2(n) < \infty$. Here the expectation is taken with respect to the joint distribution of $(\psi_t, \phi_t)$ as explained in Section III. This is a standard recursive identification method and constitutes a stochastic approximation sequence of the form:

$$\bar{\theta}_i(t+1) = \bar{\theta}_i(t) + \alpha(t) \left[h_\theta(\bar{\theta}_i(t)) + M_\theta(t+1)\right],\ t \geq 0, \tag{40}$$

where $h_\theta(\bar{\theta}_i) = -\nabla \mathbb{E}\left[\|\epsilon(t)\|^2\right]$ is Lipschitz, and $M_\theta(t+1) = \nabla \mathbb{E}\left[\|\epsilon(t)\|^2\right] - \nabla \|\epsilon(t)\|^2$ is a martingale difference sequence. This sequence converges almost surely to the equilibrium of the differential equation

$$\dot{\bar{\theta}}_i = h_\theta(\bar{\theta}_i),\ t \geq 0, \tag{41}$$

which can be shown to be a solution of (38) with standard Lyapunov arguments. For more details the reader is referred to [39, 30]. Moreover, notice that (39) is a vectorized representation of (9), for $\gamma = \alpha(t) > 0$. Therefore, under the PE condition (10) of Assumption 10, and under the zero-mean noise assumption, it follows that $\bar{\theta}_i$ converges asymptotically to $\theta_i$ for all $i = 1, \ldots, s$, i.e., the minimum of (38) is achieved.
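To make the recursion (39) concrete, the following is a minimal numerical sketch for a single mode, written under several assumptions that are not part of the paper: the regressors are i.i.d. standard Gaussian (so that they are persistently exciting), the noise is zero-mean Gaussian, the stepsize is $\alpha(t) = 1/(t+10)$, and the parameter vector is the column-major vectorization of an $m \times n$ matrix. The names `sgd_step` and `run_demo` are hypothetical.

```python
import numpy as np


def sgd_step(theta_bar, phi, psi, alpha):
    """One update of (39): theta <- theta - alpha * [phi^T (x) I_m]^T * eps."""
    m = psi.shape[0]
    Phi = np.kron(phi.reshape(1, -1), np.eye(m))  # [phi^T (x) I_m], shape (m, n*m)
    eps = Phi @ theta_bar - psi                   # prediction error, as in (37)
    return theta_bar - alpha * (Phi.T @ eps)


def run_demo(steps=5000, seed=0, m=2, n=3):
    """Identify one linear mode from synthetic data and return the final error norm."""
    rng = np.random.default_rng(seed)
    Theta_true = rng.normal(size=(m, n))
    theta_true = Theta_true.flatten(order="F")    # column-major vec, so Phi @ vec = Theta @ phi
    theta_bar = np.zeros(n * m)
    for t in range(1, steps + 1):
        phi = rng.normal(size=n)                              # exciting regressor (assumption)
        psi = Theta_true @ phi + 0.01 * rng.normal(size=m)    # zero-mean measurement noise
        theta_bar = sgd_step(theta_bar, phi, psi, alpha=1.0 / (t + 10))
    return float(np.linalg.norm(theta_bar - theta_true))


final_error = run_demo()
```

The offset in the stepsize only keeps the first few updates small; any schedule satisfying $\sum_n \alpha(n) = \infty$ and $\sum_n \alpha^2(n) < \infty$ fits the conditions stated above.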

V-B Combined Mode and Dynamics Identification

Recall that the mode identification method is based on the stochastic approximation updates (30) that can be written with respect to the vectors $\xi_i(t) = [\rho_i^{\mathrm{T}}(t)\ \sigma_i^{\mathrm{T}}(t)]^{\mathrm{T}}$ and a stepsize schedule $\beta(t)$ in the form:

$$\xi_i(t+1) = \xi_i(t) + \beta(t) \left[h_\phi\left(\xi(t), \bar{\theta}(t)\right) + M_\phi(t+1)\right],\ t \geq 0, \tag{42}$$

where $h_\phi$ is Lipschitz, $M_\phi(t)$ is a martingale difference sequence, and the dependence on $\bar{\theta}$ comes from the quantity $\hat{p}_i(t)$ in (31) given (35). At the same time, the recursive system identification technique to estimate $\bar{\theta}$ is a stochastic approximation sequence with a stepsize schedule $\alpha(t)$ of the form:

$$\bar{\theta}_i(t+1) = \bar{\theta}_i(t) + \alpha(t) \left[h_\theta\left(\xi(t), \bar{\theta}(t)\right) + M_\theta(t+1)\right],\ t \geq 0, \tag{43}$$

as given in (40). The dependence on $\xi$ comes through (37), since $\xi$ defines $\hat{\phi}$, which defines $\{\Sigma_i\}_{i=1}^{K}$ through (13), which defines $\{S_i\}_{i=1}^{\hat{s}}$ through the rule $\Sigma_j \in S_i$ if $\hat{\theta}_j(\bar{\theta}) = \bar{\theta}_i$.

Theorem 4 shows how the two recursive algorithms (42) and (43) can be combined using the theory of two-timescale stochastic approximation if $\nicefrac{\beta(t)}{\alpha(t)} \rightarrow 0$, i.e., when the estimation of the partition $\{\Sigma_i\}_{i=1}^{K}$ is updated at a slower rate than the updates of the parameters $\{\bar{\theta}_i\}_{i=1}^{\hat{s}}$.

Theorem 4.

Consider the sequence $\{\xi(t)\}_{t \in \mathbb{Z}_+}$ generated using the updates (42), where $\xi_i(t) = [\rho_i^{\mathrm{T}}(t)\ \sigma_i^{\mathrm{T}}(t)]^{\mathrm{T}}$, and $(\rho_i, \sigma_i)$ are defined in (30). Consider the sequence $\{\bar{\theta}(t)\}_{t \in \mathbb{Z}_+}$ generated by the updates (43). Let the stepsizes $(\alpha(t), \beta(t))$ of (43) and (42), respectively, satisfy the conditions $\sum_n \alpha(n) = \sum_n \beta(n) = \infty$, $\sum_n (\alpha^2(n) + \beta^2(n)) < \infty$, and $\nicefrac{\beta(n)}{\alpha(n)} \rightarrow 0$, with the last condition implying that the iterations for $\{\xi(t)\}$ run on a slower timescale than those for $\{\bar{\theta}(t)\}$. If the equation

$$\dot{\bar{\theta}}(t) = h_\theta(\xi, \bar{\theta}(t)),\ \bar{\theta}(0) = \bar{\theta}_0, \tag{44}$$

has an asymptotically stable equilibrium $\lambda(\xi)$ for fixed $\xi$ and some Lipschitz mapping $\lambda$, and the equation

$$\dot{\xi}(t) = h_\phi(\xi(t), \lambda(\xi(t))),\ \xi(0) = \xi_0, \tag{45}$$

has an asymptotically stable equilibrium $\xi^*$, then, almost surely, the sequence $(\xi(t), \bar{\theta}(t))$ generated by (42), (43), converges to $(\xi^*, \lambda(\xi^*))$.

Proof.

It follows directly from Theorem 2, Ch. 6 of [39]. ∎

Corollary 4.1.

Condition (44) of Theorem 4 is satisfied by the definition of $h_\theta$ in (41). Therefore, (45) implies the convergence of $\hat{\phi}$ through (31), and of the partition $\{\Sigma_i\}$ through (13).

Notice that the condition $\nicefrac{\beta(t)}{\alpha(t)} \rightarrow 0$ is of great importance. Intuitively, the stochastic approximation algorithm (42), (43) consists of two components running in different timescales, where the slow component is viewed as quasi-static when analyzing the behavior of the fast transient. In practice, the condition $\nicefrac{\beta(t)}{\alpha(t)} \rightarrow 0$ is satisfied by stepsizes of the form $(\alpha(t), \beta(t)) = (\nicefrac{1}{t}, \nicefrac{1}{(1 + t \log t)})$, or $(\alpha(t), \beta(t)) = (\nicefrac{1}{t^{\nicefrac{2}{3}}}, \nicefrac{1}{t})$. Another way of achieving the two-timescale effect is to run the iterations for the slow component with stepsizes $\{\alpha_{t(k)}\}$, where $t(k)$ is a subsequence of $t$ that becomes increasingly rare (i.e., $t(k+1) - t(k) \rightarrow \infty$), while keeping its values constant between these instants. A good policy is to combine both approaches and update the slow component with the slower stepsize schedule $\beta(t)$ along a subsequence, keeping its values constant in between (e.g., [30, 39]).
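The two stepsize schedules mentioned above, and the subsampling idea, can be checked numerically with the short sketch below. The function names are hypothetical, and the choice $t(k) = k^2$ for the rare update instants is only one illustrative subsequence with $t(k+1) - t(k) = 2k + 1 \rightarrow \infty$; the paper does not prescribe it.

```python
import math


def alpha_inv(t):
    """Fast stepsize alpha(t) = 1/t (first schedule)."""
    return 1.0 / t


def beta_log(t):
    """Slow stepsize beta(t) = 1/(1 + t log t) (first schedule)."""
    return 1.0 / (1.0 + t * math.log(t))


def alpha_pow(t):
    """Fast stepsize alpha(t) = 1/t^(2/3) (second schedule)."""
    return 1.0 / t ** (2.0 / 3.0)


def beta_inv(t):
    """Slow stepsize beta(t) = 1/t (second schedule)."""
    return 1.0 / t


def is_slow_update(t):
    """Update the slow component only at t = k^2 (illustrative choice of t(k))."""
    return math.isqrt(t) ** 2 == t


horizon = (10, 10**3, 10**5)
ratios_log = [beta_log(t) / alpha_inv(t) for t in horizon]   # behaves like 1/log t
ratios_pow = [beta_inv(t) / alpha_pow(t) for t in horizon]   # equals t^(-1/3)
slow_instants = [t for t in range(1, 50) if is_slow_update(t)]
```

Both ratio sequences decrease toward zero, the first one only logarithmically, which is why the second schedule or the subsampled updates may give a more visible timescale separation over a finite horizon.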

VI General Switched System Identification

So far, in Sections III, IV, and V, we have developed a real-time identification method for PWA systems. Notice, however, that neither the proposed methodology nor the algorithmic implementation of Alg. 1 is constrained to PWA systems, meaning that the proposed approach can, in principle, be applied to more general switching and hybrid systems. However, one must proceed with caution, as issues may arise with respect to the identifiability conditions, the mode-switching estimation error, and the possibly non-linear local system identification error. In this section, we discuss the applicability of the proposed approach in different cases often encountered in hybrid control systems, namely switched linear systems with non-polyhedral partition, and switched non-linear systems with polyhedral partition.

VI-A Switched linear systems with non-polyhedral partition.

In the case of linear local dynamics, the recursive identification method discussed in Section V-A remains unchanged, and the same convergence results hold as well. However, if the regions $S_i$ of the mode switching partition $\{S_i\}_{i=1}^{s}$ are non-polyhedral, they cannot be perfectly approximated by a finite union of polyhedral regions $\{\Sigma_i\}_{i=1}^{K}$. Therefore, the mode switching estimation will have inherent non-zero error. It is worth pointing out that from the convergence results of the online deterministic annealing algorithm [36], it follows that the partition error can be arbitrarily small in the limit $K \rightarrow \infty$ (which is the case when $\lambda \rightarrow 0$). Albeit a nice analytical result, in practice there will always be non-zero error in the estimation of the partition $\{S_i\}_{i=1}^{s}$.

We hereby discuss two ways to deal with this problem. The first is to assume the existence of a non-linear transformation that maps each $S_i$ to a polyhedral region $\bar{S}_i$, and proceed with Alg. 1. General-purpose learning machines, such as artificial neural networks, can be incorporated in this process. Further assumptions and analysis are required for this method, which are beyond the scope of this paper. The second refers to mitigating the jumping effect of the identified system to decrease the closed-loop error that naturally occurs due to imperfect mode switching. To this end, recall that, given an observation $\phi_t$, the dynamics of the identified model are given according to (11) by:

$$\hat{\psi}_t = [\phi_t^{\mathrm{T}} \otimes I_m] \bar{\theta}_i,\ \text{if } \phi_t \in \Sigma_j \text{ and } \hat{\theta}_j(\bar{\theta}) = \bar{\theta}_i. \tag{46}$$

To mitigate the jumping behavior one can make use of the association probabilities

$$p(\phi_i|\phi_t)=\frac{e^{-\frac{1-\lambda}{\lambda}d_\rho(\phi_t,\phi_i)}}{\sum_j e^{-\frac{1-\lambda}{\lambda}d_\rho(\phi_t,\phi_j)}}, \tag{47}$$

to instead construct the weighted dynamics:

$$\hat{\psi}_t=\sum_{i=1}^{K}p^*(\phi_i|\phi_t)[\phi_t^{\mathrm{T}}\otimes I_m]\hat{\theta}_i. \tag{48}$$

This jump-mitigation method has been used in the literature to preserve smoothness of the closed-loop dynamics and is particularly useful when hybrid system identification is used for non-linear function approximation, i.e., when the original system is not hybrid but is to be approximated by a hybrid system with simpler local dynamics.
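The weighted dynamics (48) can be sketched in a few lines. The snippet below is a minimal illustration for scalar output ($m=1$), assuming that the squared Euclidean distance stands in for $d_\rho$ and that the codevectors and local parameters are already available; the function names and the two-region model are hypothetical.

```python
import math

def association_probabilities(phi_t, codevectors, lam):
    """Gibbs weights in the form of (47); squared Euclidean distance is an assumed d_rho."""
    scale = (1.0 - lam) / lam
    dists = [sum((a - b) ** 2 for a, b in zip(phi_t, c)) for c in codevectors]
    d_min = min(dists)  # shift the exponents for numerical stability
    weights = [math.exp(-scale * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

def blended_prediction(phi_t, codevectors, thetas, lam):
    """Weighted dynamics in the form of (48) for scalar output (m = 1)."""
    probs = association_probabilities(phi_t, codevectors, lam)
    local = [sum(p * q for p, q in zip(phi_t, th)) for th in thetas]
    return sum(p * y for p, y in zip(probs, local))

# Hypothetical two-region model with regressor phi = [r, 1].
codevectors = [[-1.0, 1.0], [1.0, 1.0]]
thetas = [[1.0, 2.0], [-1.0, 0.0]]
y_soft = blended_prediction([0.0, 1.0], codevectors, thetas, lam=0.5)
```

For $\lambda$ close to 1 the weights are nearly uniform, while for small $\lambda$ they approach the hard assignment of (46), so the temperature also controls how sharply the blended model switches.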

VI-B Switched non-linear systems with polyhedral partition.

In this case, often referred to as piece-wise non-linear hybrid systems [40], the mode switching partition $\left\{S_i\right\}_{i=1}^{s}$ is polyhedral, and can be perfectly approximated by a finite union of polyhedral regions $\left\{\Sigma_i\right\}_{i=1}^{K}$. For the identification of the non-linear local dynamics, the recursive identification method discussed in Section V-A needs to be modified. In particular the recursive updates:

$$\bar{\theta}_i(t+1)=\bar{\theta}_i(t)-\alpha(t)\left(\nabla_{\bar{\theta}_i}\epsilon(t)\right)\epsilon(t), \tag{49}$$

which share the stochastic gradient descent structure of (39), are used, with the error term in this case given by

$$\epsilon(t)=\sum_i\mathds{1}_{\left[\phi_t\in S_i\right]}\hat{f}(\phi_t,\bar{\theta}_i)-\psi_t, \tag{50}$$

where the functions $\hat{f}(\phi_t,\bar{\theta}_i)$ are local parametric models of known form, differentiable with respect to the parameters $\theta_i$. General-purpose learning machines, such as artificial neural networks, can be used. Notice that the identification updates remain stochastic approximation updates of the same form, which means that the convergence results of Theorem 4 continue to hold.
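As an illustration of the update (49) with the error (50), the sketch below uses a hypothetical scalar local model $\hat{f}(\phi,\theta)=\theta_0\tanh(\theta_1\phi)$. The model form, the function names, and the way the active region index is supplied are assumptions; in practice the region would come from the current partition estimate rather than from the true $S_i$.

```python
import math

def f_hat(phi, theta):
    """Hypothetical local parametric model: theta[0] * tanh(theta[1] * phi)."""
    return theta[0] * math.tanh(theta[1] * phi)

def grad_f_hat(phi, theta):
    """Gradient of f_hat with respect to theta (equals the gradient of the error)."""
    th = math.tanh(theta[1] * phi)
    return [th, theta[0] * phi * (1.0 - th * th)]

def sgd_step(phi, psi, thetas, region, alpha):
    """One update of the form (49); only the active region's parameters change."""
    theta = thetas[region]
    eps = f_hat(phi, theta) - psi  # error (50) restricted to the active region
    grad = grad_f_hat(phi, theta)
    thetas[region] = [p - alpha * g * eps for p, g in zip(theta, grad)]
    return eps

# Two regions; the observation (phi, psi) = (1.0, 2.0) is assigned to region 1.
thetas = [[1.0, 1.0], [1.0, 1.0]]
eps_before = sgd_step(1.0, 2.0, thetas, region=1, alpha=0.1)
eps_after = f_hat(1.0, thetas[1]) - 2.0
```

The indicator in (50) is what keeps the parameters of the inactive regions untouched, which is why the step takes the region index as an argument.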

VII Experimental Results

We illustrate the properties and evaluate the performance of the proposed algorithm in two PWA systems, one in PWARX form and the other in state-space form.

VII-A PWARX System

The first one, adopted from [9], is given in the input–output representation of (51):

$$y_t=\begin{cases}\theta_1^{\mathrm{T}}\phi_t+e_t,\quad\text{if }r_t\in P_1\\ \theta_2^{\mathrm{T}}\phi_t+e_t,\quad\text{if }r_t\in P_2\\ \theta_3^{\mathrm{T}}\phi_t+e_t,\quad\text{if }r_t\in P_3\end{cases}, \tag{51}$$

where $y_t\in\mathbb{R}^{1}$, $r_t\in P=[-4,4]$, $\phi_t=[r_t\ 1]^{\mathrm{T}}$, $(P_1,P_2,P_3)=([-4,-1],(-1,2),[2,4])$, and $(\theta_1,\theta_2,\theta_3)=([1,2]^{\mathrm{T}},[-1,0]^{\mathrm{T}},[1,2]^{\mathrm{T}})$. This example is chosen to showcase the properties of the proposed methodology, since its simplicity allows graphical representation of the signaling partition and the convergence of the model parameters. At the same time, it is a switching system that presents a jump at $r_t=2$, and has the same dynamics in different regions of the input space, i.e., $\theta_1=\theta_3$ while $P_1\neq P_3$. System (51) can thus be written in the form:

$$y_t=\begin{cases}\theta_2^{\mathrm{T}}\phi_t+e_t,\quad\text{if }\phi_t\in S_2\\ \theta_1^{\mathrm{T}}\phi_t+e_t,\quad\text{otherwise}\end{cases}, \tag{52}$$

where $S_2=\left\{\phi=[r\ 1]^{\mathrm{T}}:r\in P_2\right\}$. The representation of (52) in the input–output ($r$-$y$) space is shown in Fig. 2. A total of $N=150$ observations under Gaussian noise ($e_t\sim N(0,0.2)$) are accessible sequentially.
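For readers who wish to reproduce a data set of this kind, a minimal generator for system (51) could look as follows. Two points are assumptions, since they are not specified above: the inputs $r_t$ are drawn uniformly from $P$, and the value 0.2 in $N(0,0.2)$ is interpreted as a standard deviation. Function names are hypothetical.

```python
import random

# Local parameters (theta_1, theta_2, theta_3) of system (51), with phi = [r, 1].
THETAS = {1: (1.0, 2.0), 2: (-1.0, 0.0), 3: (1.0, 2.0)}

def mode_of(r):
    """Active mode for P1 = [-4, -1], P2 = (-1, 2), P3 = [2, 4]."""
    if r <= -1.0:
        return 1
    if r < 2.0:
        return 2
    return 3

def pwarx_output(r, noise=0.0):
    """Output theta_i^T [r, 1] + e of the active mode."""
    a, b = THETAS[mode_of(r)]
    return a * r + b + noise

def generate(n=150, noise_std=0.2, seed=0):
    """Sequence of (r, y) pairs; uniform inputs are an assumption."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        r = rng.uniform(-4.0, 4.0)
        data.append((r, pwarx_output(r, rng.gauss(0.0, noise_std))))
    return data

observations = generate()
```

Evaluating the noise-free output on both sides of the boundaries shows that the map is continuous at $r=-1$ and jumps at $r=2$.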

Figure 2: Representation of system (52) in the input–output space. The noisy observations used are also depicted.

Algorithm 1 is applied to the observations for $T=900$ iterations. The same observations can be reused by the algorithm. The temperature parameters used for the online deterministic annealing algorithm are $(\lambda_{\max},\lambda_{\min},\gamma)=(0.99,0.2,0.8)$, and the stepsizes $(\alpha(t),\beta(t))=(1/(1+0.01t),\,1/(1+0.9t\log t))$. At first ($\lambda=\lambda_{\max}$), the algorithm keeps in memory only one codevector $\hat{\phi}_1$ and one model parameter vector $\bar{\theta}_1$, essentially assuming that the system has constant dynamics in the entire domain, i.e., $\hat{S}_1=\Sigma_1=\left\{\phi=[r\ 1]^{\mathrm{T}}:r\in P\right\}$. As new input–output pairs are observed, the estimated parameter $\bar{\theta}_1$ gets updated by the iterations (39). We have assumed $\bar{\theta}_1(0)=[1,1]^{\mathrm{T}}$.
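The hyperparameters above can be made concrete with a short sketch. It assumes that the temperature is reduced geometrically, $\lambda\leftarrow\gamma\lambda$, until it would fall below $\lambda_{\min}$, and that $\beta(t)$ is the stepsize of the slower (codevector) recursion; both are assumptions about how the listed values are used, and the function names are hypothetical.

```python
import math

def temperature_schedule(lam_max, lam_min, gamma):
    """Assumed geometric schedule: lam <- gamma * lam while lam >= lam_min."""
    lams = []
    lam = lam_max
    while lam >= lam_min:
        lams.append(lam)
        lam *= gamma
    return lams

def alpha(t):
    """Stepsize of the parameter updates."""
    return 1.0 / (1.0 + 0.01 * t)

def beta(t):
    """Second stepsize; t * log(t) is taken as 0 at t = 0."""
    if t == 0:
        return 1.0
    return 1.0 / (1.0 + 0.9 * t * math.log(t))

levels = temperature_schedule(0.99, 0.2, 0.8)
# Two-timescale separation: beta(t) / alpha(t) should decay towards zero.
ratios = [beta(t) / alpha(t) for t in (1, 10, 100, 900)]
```

Under this assumed schedule the listed values give eight temperature levels, and the ratio $\beta(t)/\alpha(t)$ decays roughly like $1/\log t$, so the separation between the two timescales grows slowly.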

At the same time, the estimate of $\bar{\theta}_1$ is used to update the location of the codevector towards the mean of the observation domain as shown in (28). This process does not yield accurate identification results, since at this stage it is assumed that $P_1=P$. However, it boosts the robustness of the identification algorithm with respect to initial conditions, since the converged values of the parameters for $\lambda=\lambda_{\max}$ will be used as initial conditions for the next value of $\lambda$. As $\lambda$ is reduced, the bifurcation phenomenon described in Section IV-B takes place, and, after reaching a critical value, the single codevector splits into two duplicates. Now the algorithm assumes that there are two modes in the system and estimates the optimal model parameters $\left\{\bar{\theta}_1,\bar{\theta}_2\right\}$ and partition $\left\{\Sigma_1,\Sigma_2\right\}$ (through the location of the codevectors $\left\{\hat{\phi}_1,\hat{\phi}_2\right\}$). This process continues until a desired termination criterion is reached. In this case, the criterion is the minimum temperature parameter $\lambda_{\min}$, which reflects potential time and computational constraints of the system. The bifurcation phenomenon is illustrated in Fig. 3, where the locations of the codevectors $\left\{\hat{\phi}_i\right\}$, $\hat{\phi}_i\in P=[-4,4]$, generated by Alg. 1 are shown. The algorithm progressively constructs a total of $K=5$ effective codevectors. The number of modes is estimated with the process explained in Section IV-C. Two modes are estimated, i.e., $\hat{s}=2$ with $\bar{\theta}=\left\{\bar{\theta}_1,\bar{\theta}_2\right\}$. The association of each effective codevector with each identified mode according to the rule (35) is shown in Fig. 3.

Figure 3: Evolution of the codevectors $\left\{\hat{\phi}_i\right\}$ generated by Alg. 1 for system (52) illustrating the bifurcation phenomenon described in Section IV-B. The association of each effective codevector with each identified mode according to the rule (35) is also shown.
Figure 4: Identified modes, predicted output and identification error with respect to the true model (52). A single misclassification instance of the mode appears at the boundary of the true mode switching partition.

The final estimated partition, the output of the estimated model, and its error with respect to the true model without noise are shown in Fig. 4. As shown, the identification error is low with the exception of a single misclassification instance of the mode at the boundary of the true partition of the input–output domain. This mode switching error can be avoided by allowing $\lambda$ to go lower, which results in a larger number $K$ of effective codevectors and is indicative of the performance/complexity trade-off of the algorithm. Finally, the convergence of the parameters $\left\{\bar{\theta}_i\right\}$ of each of the $\hat{s}=2$ local models detected is shown in Fig. 5. Parameter values that do not appear at $t=0$ belong to modes identified through the bifurcation phenomenon after a certain critical temperature value.

Figure 5: Convergence of the parameters $\left\{\bar{\theta}_i\right\}_{i=1}^{2}$ to the true values of (52). Parameter values that do not appear at $t=0$ belong to modes identified through the bifurcation phenomenon described in Section IV-B.

VII-B State-Space PWA system

The second system we evaluate is given by the following linearized PWA dynamics in the state-space domain:

$$\begin{cases}x_{t+1}=\left(I_2+\mathrm{d}t\begin{bmatrix}0&1\\0&0\end{bmatrix}\right)x_t+\mathrm{d}t\begin{bmatrix}0\\1\end{bmatrix}u_t,&\text{if }|u_t|>1\\[4pt] x_{t+1}=\left(I_2+\mathrm{d}t\begin{bmatrix}0&1\\0&-1\end{bmatrix}\right)x_t+\mathrm{d}t\begin{bmatrix}0\\0\end{bmatrix}u_t,&\text{if }|u_t|\leq 1\end{cases}, \tag{53}$$

where $x_t\in\mathbb{R}^{2}$ and $u_t\in\mathbb{R}$. System (53) has two modes ($s=2$) and the switching signal is defined by the polyhedral regions $R_1=\left\{[x^{\mathrm{T}}|u^{\mathrm{T}}]^{\mathrm{T}}\in\mathbb{R}^{3}:u<-1\right\}$, $R_2=\left\{[x^{\mathrm{T}}|u^{\mathrm{T}}]^{\mathrm{T}}\in\mathbb{R}^{3}:-1<u<1\right\}$, and $R_3=\left\{[x^{\mathrm{T}}|u^{\mathrm{T}}]^{\mathrm{T}}\in\mathbb{R}^{3}:1<u\right\}$ with $S_1=R_1\cup R_3$ and $S_2=R_2$. The dynamics of (53) consist of a controllable double integrator when the input is of sufficient magnitude, and a stable autonomous system that drives its velocity to zero otherwise. An example of such a system is an autonomous vehicle that ignores minimal, potentially accidental, gas pedal input. In this example, the linear system of the second mode ($s=2$) is not minimal, and its identification relies on the mode switching behavior of the system, as explained in Section II-B. To preserve the PE conditions of Assumption 10, the input signal is chosen as $u_t=2\cos(2\pi t\,\mathrm{d}t)$, $t\in\mathbb{Z}_{+}$, and the noise term $w_t$ is a zero-mean Gaussian random variable with $\sigma^{2}=0.1$. The evolution of (53) over time, as well as the mode switching behavior, are shown in Fig. 6.
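A simulation of (53) under the stated input can be sketched as follows. The noise term does not appear explicitly in (53), so the snippet assumes that it is added to the state after each step; the zero initial state, the default of no noise, and the function names are likewise assumptions made for illustration.

```python
import math
import random

def step(x, u, dt):
    """One noise-free step of (53); returns the next state and the active mode."""
    x1, x2 = x
    if abs(u) > 1.0:
        # Mode 1: double integrator driven by the input.
        return (x1 + dt * x2, x2 + dt * u), 1
    # Mode 2: the input is ignored and the velocity decays.
    return (x1 + dt * x2, x2 - dt * x2), 2

def simulate(n_steps=300, dt=0.01, noise_std=0.0, x0=(0.0, 0.0), seed=0):
    """Trajectory of (x_t, u_t, mode_t) with u_t = 2 cos(2 pi t dt).

    Additive state noise is an assumption; use noise_std = math.sqrt(0.1)
    to match a variance of 0.1.
    """
    rng = random.Random(seed)
    x = x0
    traj = []
    for t in range(n_steps):
        u = 2.0 * math.cos(2.0 * math.pi * t * dt)
        x_next, mode = step(x, u, dt)
        traj.append((x, u, mode))
        x = tuple(xi + rng.gauss(0.0, noise_std) for xi in x_next)
    return traj

trajectory = simulate()
modes_visited = {mode for _, _, mode in trajectory}
```

Because the input oscillates between $-2$ and $2$, both modes are visited repeatedly within the 300 steps, which is what allows the parameters of the non-minimal second mode to be excited through switching.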

Figure 6: Time evolution of system (53) for $T=3$ seconds. The mode-switching behavior is depicted.

The system is allowed to run for $T=3\,\mathrm{s}$ (seconds), with $\mathrm{d}t=0.01$, i.e., a total of $N=300$ observations are acquired online, based on which the proposed method identifies the switched system in real time. The temperature parameters used for the online deterministic annealing algorithm are $(\lambda_{\max},\lambda_{\min},\gamma)=(0.99,0.1,0.8)$, and the stepsizes $(\alpha(t),\beta(t))=(1/(1+0.01t),\,1/(1+0.9t\log t))$. At first ($\lambda=\lambda_{\max}$), the algorithm keeps in memory only one codevector $\hat{\phi}_1$ and one model parameter vector $\bar{\theta}_1$, essentially assuming that the system has constant dynamics in the entire domain. The estimated parameter $\hat{\theta}_1$ gets updated by the iterations (39). We have assumed $\hat{\theta}_1(0)=[1,1,1,1,1,1]^{\mathrm{T}}$. As $\lambda$ is reduced, the bifurcation phenomenon described in Section IV-B takes place, and, after reaching a critical value, the single codevector splits into two duplicates. This process continues until the minimum temperature parameter $\lambda_{\min}$ is reached, which reflects potential time and computational constraints of the system. The bifurcation phenomenon is illustrated in Fig. 7, where the third coordinate of the codevectors $\left\{\hat{\phi}_i\right\}$, which gives an estimate of the control input representation of the mode, is depicted. A total of $K=4$ effective codevectors are estimated, and the association of the regions $\left\{\Sigma_j\right\}_{j=1}^{4}$ with the identified modes $\left\{S_i\right\}_{i=1}^{2}$ is also depicted in Fig. 7.

The identification error and the estimated mode switching behavior are shown in Fig. 8 in comparison with the true mode switching behavior of the system. More specifically, the algorithm identifies a total of $\hat{s}=2$ modes with $S_1=\Sigma_3\cup\Sigma_4$ and $S_2=\Sigma_1\cup\Sigma_2$. In Fig. 9, the convergence of the parameters $\left\{\bar{\theta}_i\right\}$ of each of the $\hat{s}=2$ local models detected to the true parameters $\left\{\theta_i\right\}_{i=1}^{2}$ is shown. Parameter values that do not appear at $t=0$ belong to modes identified through the bifurcation phenomenon after a certain critical temperature value.

Figure 7: Mode estimation illustrating the bifurcation phenomenon for (53) described in Section IV-B. The evolution of the third coordinate ($\hat{u}_i$) of the codevectors $\left\{\hat{\phi}_i\right\}$ is depicted.
Figure 8: Identification error over time for system (53). The estimated modes are also compared against the original modes.
Figure 9: Convergence of the parameters $\left\{\bar{\theta}_i\right\}_{i=1}^{2}$ to the true values of (53). Parameter values that do not appear at $t=0$ belong to modes identified through the bifurcation phenomenon described in Section IV-B.

VIII Conclusion

We proposed a real-time identification scheme for discrete-time switched state-space models. In contrast to most existing identification algorithms for piece-wise affine systems, the proposed approach is appropriate for online identification of both the modes and the subsystems of the switched system, and is computationally efficient compared to standard algebraic, mixed-integer programming, and offline clustering-based methods. The progressive nature of the algorithm also provides real-time control over the performance-complexity trade-off.

Future directions include extensions of the proposed approach to identification of both discrete- and continuous-time partially observable piece-wise affine models in the state-space domain using real-time observations.

References

  • [1] A. Garulli, S. Paoletti, and A. Vicino, “A survey on switched and piecewise affine system identification,” IFAC Proceedings Volumes, vol. 45, no. 16, pp. 344–355, 2012.
  • [2] D. Liberzon, Switching in Systems and Control. Springer, 2003, vol. 190.
  • [3] S. Paoletti, A. L. Juloski, G. Ferrari-Trecate, and R. Vidal, “Identification of hybrid systems: a tutorial,” European Journal of Control, vol. 13, no. 2-3, pp. 242–260, 2007.
  • [4] A. Moradvandi, R. E. Lindeboom, E. Abraham, and B. De Schutter, “Models and methods for hybrid system identification: a systematic survey,” IFAC-PapersOnLine, vol. 56, no. 2, pp. 95–107, 2023.
  • [5] A. Bemporad, G. Ferrari-Trecate, and M. Morari, “Observability and controllability of piecewise affine and hybrid systems,” IEEE Transactions on Automatic Control, vol. 45, no. 10, pp. 1864–1876, 2000.
  • [6] R. Vidal, A. Chiuso, and S. Soatto, “Observability and identifiability of jump linear systems,” in IEEE Conference on Decision and Control, vol. 4, 2002, pp. 3614–3619.
  • [7] M. Petreczky, L. Bako, and J. H. Van Schuppen, “Realization theory of discrete-time linear switched systems,” Automatica, vol. 49, no. 11, pp. 3337–3344, 2013.
  • [8] J. Roll, A. Bemporad, and L. Ljung, “Identification of piecewise affine systems via mixed-integer programming,” Automatica, vol. 40, no. 1, pp. 37–50, 2004.
  • [9] G. Ferrari-Trecate, M. Muselli, D. Liberati, and M. Morari, “A clustering technique for the identification of piecewise affine systems,” Automatica, vol. 39, no. 2, pp. 205–217, 2003.
  • [10] F. Lauer and G. Bloch, “Hybrid system identification,” Hybrid System Identification: Theory and Algorithms for Learning Switching Models, pp. 77–101, 2019.
  • [11] L. Bako, “Identification of switched linear systems via sparse optimization,” Automatica, vol. 47, no. 4, pp. 668–677, 2011.
  • [12] R. Vidal, S. Soatto, Y. Ma, and S. Sastry, “An algebraic geometric approach to the identification of a class of linear hybrid systems,” in IEEE International Conference on Decision and Control, vol. 1, 2003, pp. 167–172.
  • [13] R. Vidal, “Recursive identification of switched ARX systems,” Automatica, vol. 44, no. 9, pp. 2274–2287, 2008.
  • [14] M. Gegundez, J. Aroba, and J. M. Bravo, “Identification of piecewise affine systems by means of fuzzy clustering and competitive learning,” Engineering Applications of Artificial Intelligence, vol. 21, no. 8, pp. 1321–1329, 2008.
  • [15] H. Nakada, K. Takaba, and T. Katayama, “Identification of piecewise affine systems based on statistical clustering technique,” Automatica, vol. 41, no. 5, pp. 905–913, 2005.
  • [16] C. Li, Z. Huang, Y. Wang, and H. Jiang, “Rapid identification of switched systems: A data-driven method in variational framework,” Science China Technological Sciences, vol. 64, no. 1, pp. 148–156, 2021.
  • [17] A. L. Juloski, S. Weiland, and W. M. H. Heemels, “A Bayesian approach to identification of hybrid systems,” IEEE Transactions on Automatic Control, vol. 50, no. 10, pp. 1520–1533, 2005.
  • [18] L. Bako, K. Boukharouba, E. Duviella, and S. Lecoeuche, “A recursive identification algorithm for switched linear/affine models,” Nonlinear Analysis: Hybrid Systems, vol. 5, no. 2, pp. 242–253, 2011.
  • [19] R. Baptista, J. Y. Ishihara, and G. A. Borges, “Split and merge algorithm for identification of piecewise affine systems,” in American Control Conference, 2011, pp. 2018–2023.
  • [20] A. M. Ivanescu, T. Albin, D. Abel, and T. Seidl, “Employing correlation clustering for the identification of piecewise affine models,” in Workshop on Knowledge Discovery, Modeling and Simulation, 2011, pp. 7–14.
  • [21] M. G. Sefidmazgi, M. M. Kordmahalleh, A. Homaifar, and A. Karimoddini, “Switched linear system identification based on bounded-switching clustering,” in American Control Conference, 2015, pp. 1806–1811.
  • [22] A. Bemporad, A. Garulli, S. Paoletti, and A. Vicino, “A bounded-error approach to piecewise affine system identification,” IEEE Transactions on Automatic Control, vol. 50, no. 10, pp. 1567–1580, 2005.
  • [23] C. N. Mavridis and J. S. Baras, “Identification of piecewise affine systems with online deterministic annealing,” in IEEE Conference on Decision and Control, 2023, pp. 4885–4890.
  • [24] C. N. Mavridis, A. Kanellopoulos, J. S. Baras, and K. H. Johansson, “State-space piece-wise affine system identification with online deterministic annealing,” in European Control Conference, 2024, pp. 3110–3115.
  • [25] S. Weiland, A. L. Juloski, and B. Vet, “On the equivalence of switched affine models and switched ARX models,” in IEEE Conference on Decision and Control, 2006, pp. 2614–2618.
  • [26] S. Paoletti, A. Garulli, J. Roll, and A. Vicino, “A necessary and sufficient condition for input-output realization of switched affine state space models,” in IEEE Conference on Decision and Control, 2008, pp. 935–940.
  • [27] S. Paoletti, J. Roll, A. Garulli, and A. Vicino, “On the input-output representation of piecewise affine state space models,” IEEE Transactions on Automatic Control, vol. 55, no. 1, pp. 60–73, 2009.
  • [28] M. Petreczky, L. Bako, and J. H. van Schuppen, “Identifiability of discrete-time linear switched systems,” in ACM International Conference on Hybrid Systems: Computation and Control, 2010, pp. 141–150.
  • [29] C. N. Mavridis and J. S. Baras, “Online deterministic annealing for classification and clustering,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 10, pp. 7125–7134, 2023.
  • [30] C. Mavridis and J. S. Baras, “Annealing optimization for progressive learning with stochastic approximation,” IEEE Transactions on Automatic Control, vol. 68, no. 5, pp. 2862–2874, 2023.
  • [31] C. N. Mavridis, N. Suriyarachchi, and J. S. Baras, “Maximum-entropy progressive state aggregation for reinforcement learning,” in IEEE Conference on Decision and Control, 2021, pp. 5144–5149.
  • [32] C. N. Mavridis and J. S. Baras, “Progressive graph partitioning based on information diffusion,” in IEEE Conference on Decision and Control, 2021, pp. 37–42.
  • [33] C. N. Mavridis, N. Suriyarachchi, and J. S. Baras, “Detection of dynamically changing leaders in complex swarms from observed dynamic data,” in International Conference on Decision and Game Theory for Security.   Springer, 2020, pp. 223–240.
  • [34] K. Rose, “Deterministic annealing for clustering, compression, classification, regression, and related optimization problems,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2210–2239, 1998.
  • [35] C. Mavridis, E. Noorani, and J. S. Baras, “Risk sensitivity and entropy regularization in prototype-based learning,” in Mediterranean Conference on Control and Automation, 2022, pp. 194–199.
  • [36] C. Mavridis and J. Baras, “Multi-resolution online deterministic annealing: A hierarchical and progressive learning architecture,” arXiv preprint arXiv:2212.08189, 2022.
  • [37] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences,” Journal of Machine Learning Research, vol. 6, no. Oct, pp. 1705–1749, 2005.
  • [38] X. Lin, Z. Yang, X. Zhang, and Q. Zhang, “Continuation path learning for homotopy optimization,” in International Conference on Machine Learning.   PMLR, 2023, pp. 21 288–21 311.
  • [39] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint.   Springer, 2009, vol. 48.
  • [40] F. Lauer and G. Bloch, “Switched and piecewise nonlinear hybrid system identification,” in Hybrid Systems: Computation and Control: 11th International Workshop, HSCC 2008, St. Louis, MO, USA, April 22-24, 2008. Proceedings 11.   Springer, 2008, pp. 330–343.
  • [41] B. M. Jenkins, A. M. Annaswamy, E. Lavretsky, and T. E. Gibson, “Convergence properties of adaptive systems and the definition of exponential stability,” SIAM Journal on Control and Optimization, vol. 56, no. 4, pp. 2463–2484, 2018.
  • [42] B. Anderson and J. Moore, “New results in linear system stability,” SIAM Journal on Control, vol. 7, no. 3, pp. 398–414, 1969.

Appendix A Proof of Theorem 1.

We construct the system

$$\hat{x}_{t+1}=\hat{A}x_{t}+\hat{B}u_{t},\quad t\in\mathbb{Z}_{+}, \tag{54}$$

where $\hat{A}\in\mathbb{R}^{n\times n}$, and $\hat{B}\in\mathbb{R}^{n\times p}$. Subtracting (7) from (54), we get:

$$e_{t+1}=\bar{\Theta}r_{t},\quad t\in\mathbb{Z}_{+}, \tag{55}$$

where $e_{t}=\hat{x}_{t}-x_{t}\in\mathbb{R}^{n}$ is the observation error, $r_{t}=[x_{t}^{\mathrm{T}}|u_{t}^{\mathrm{T}}]^{\mathrm{T}}\in\mathbb{R}^{n+p}$ is the augmented state-input vector as defined in (5), and $\bar{\Theta}=[(\hat{A}-A)|(\hat{B}-B)]$ is an augmented matrix of the system parameters of size $n\times(n+p)$. Then (9) is equivalent to:

$$\bar{\Theta}_{t+1}=\bar{\Theta}_{t}-\gamma e_{t+1}r_{t}^{\mathrm{T}},\quad t\geq 0. \tag{56}$$

Notice that (56) can be written in the form of a linear time-varying dynamical system:

$$\bar{\Theta}_{t+1}=\bar{\Theta}_{t}(I_{n+p}-\gamma r_{t}r_{t}^{\mathrm{T}}),\quad t\geq 0. \tag{57}$$

By vectorizing $\bar{\Theta}_{t}$ such that $\bar{\theta}_{t}=\text{vec}(\bar{\Theta}_{t})$, (57) becomes:

$$\bar{\theta}_{t+1}=(I_{n(n+p)}-\gamma\psi_{t}\psi_{t}^{\mathrm{T}})\bar{\theta}_{t}=\Xi_{t}\bar{\theta}_{t},\quad t\geq 0, \tag{58}$$

where $\otimes$ denotes the Kronecker product, and $\psi_{t}=[r_{t}^{\mathrm{T}}\otimes I_{n}]^{\mathrm{T}}$ is an $n(n+p)\times n$ matrix. We will show that (58) is exponentially stable in the large (Definition 1, [41]) as long as (8) is satisfied. Consider the Lyapunov function candidate $V(t,\bar{\theta})=\bar{\theta}_{t}^{\mathrm{T}}\bar{\theta}_{t}$. Since $V(t,\bar{\theta})=\|\bar{\theta}\|^{2}$, there exist $k_{1},k_{2}>0$ (e.g., $k_{1}=k_{2}=1$) such that $k_{1}\|\bar{\theta}\|^{2}\leq V(t,\bar{\theta})\leq k_{2}\|\bar{\theta}\|^{2}$.
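As a sanity check on the vectorization step from (57) to (58), the following minimal Python sketch compares the two forms on a small example. It assumes column-major stacking for $\text{vec}(\cdot)$ and uses $\psi_{t}=r_{t}\otimes I_{n}$. The helper names, the dimensions $n=2$, $p=1$, the value of $\gamma$, and the random test data are illustrative choices that do not come from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def vec(m):
    """Column-major vectorization: stack the columns of m into one list."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]


def psi(r, n):
    """Return r (Kronecker) I_n as an n*len(r) x n matrix (list of rows)."""
    return [[r[j] if i == k else 0.0 for k in range(n)]
            for j in range(len(r)) for i in range(n)]


def identity_minus(gamma, m):
    """Return I - gamma * m for a square matrix m."""
    d = len(m)
    return [[(1.0 if a == b else 0.0) - gamma * m[a][b] for b in range(d)]
            for a in range(d)]


def vectorization_gap(n=2, p=1, gamma=0.1, seed=0):
    """Largest entrywise gap between vec of (57) and the right side of (58)."""
    rng = random.Random(seed)  # hypothetical test data, not from the paper
    theta_mat = [[rng.uniform(-1, 1) for _ in range(n + p)] for _ in range(n)]
    r = [rng.uniform(-1, 1) for _ in range(n + p)]

    # Matrix form (57): Theta_next = Theta (I - gamma r r^T).
    rr_t = [[ra * rb for rb in r] for ra in r]
    lhs = vec(matmul(theta_mat, identity_minus(gamma, rr_t)))

    # Vector form (58): theta_next = (I - gamma psi psi^T) theta.
    ps = psi(r, n)
    ps_t = [list(col) for col in zip(*ps)]
    xi = identity_minus(gamma, matmul(ps, ps_t))
    theta = [[v] for v in vec(theta_mat)]  # column vector
    rhs = [row[0] for row in matmul(xi, theta)]

    return max(abs(x - y) for x, y in zip(lhs, rhs))


print(vectorization_gap())  # should be at floating-point rounding level
```

A gap at rounding level is consistent with the identity $\text{vec}(\bar{\Theta}M)=(M^{\mathrm{T}}\otimes I_{n})\,\text{vec}(\bar{\Theta})$ applied to the symmetric matrix $M=I_{n+p}-\gamma r_{t}r_{t}^{\mathrm{T}}$.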
Notice that $V(t+1,\bar{\theta}_{t+1})-V(t,\bar{\theta}_{t})=\bar{\theta}_{t}^{\mathrm{T}}\left(\Xi_{t}^{\mathrm{T}}\Xi_{t}-I_{n(n+p)}\right)\bar{\theta}_{t}$. As a result, by summing the differences for $T$ timesteps, we get:

$$\begin{aligned}
V(t+T+1,\bar{\theta}_{t+T+1})&-V(t,\bar{\theta}_{t})\\
&=\sum_{\tau=t}^{t+T}\left[V(\tau+1,\bar{\theta}_{\tau+1})-V(\tau,\bar{\theta}_{\tau})\right]\\
&=\sum_{\tau=t}^{t+T}\bar{\theta}_{\tau}^{\mathrm{T}}\left(\Xi_{\tau}^{\mathrm{T}}\Xi_{\tau}-I_{n(n+p)}\right)\bar{\theta}_{\tau}\\
&=\bar{\theta}_{t}^{\mathrm{T}}\left[\sum_{\tau=t}^{t+T}\Phi(\tau;t)^{\mathrm{T}}\left(\Xi_{\tau}^{\mathrm{T}}\Xi_{\tau}-I_{n(n+p)}\right)\Phi(\tau;t)\right]\bar{\theta}_{t}\\
&\leq-\alpha_{1}\bar{\theta}_{t}^{\mathrm{T}}I_{n(n+p)}\bar{\theta}_{t}=-\alpha_{1}V(t,\bar{\theta}_{t}),
\end{aligned} \tag{59}$$

for some $0<\alpha_{1}<1$. Here $\Phi(\tau;t)=\Xi_{t}\Xi_{t+1}\ldots\Xi_{\tau-1}$ is the transition matrix of (58), and the inequality follows from condition (8). Notice that the first inequality in (8) is equivalent to $\alpha I_{n+p}\preceq\sum_{\tau=t}^{t+T}r_{\tau}r_{\tau}^{\mathrm{T}}$ and directly implies that $\alpha_{2}I_{n(n+p)}\preceq\sum_{\tau=t}^{t+T}\psi_{\tau}\psi_{\tau}^{\mathrm{T}}$, for some $\alpha_{2}>0$, as well.
As a result, $\sum_{\tau=t}^{t+T}\Xi_{\tau}^{\mathrm{T}}\Xi_{\tau}\preceq\alpha_{3}TI_{n(n+p)}$ for some $0<\alpha_{3}<1$, and, therefore, $\sum_{\tau=t}^{t+T}\left(\Xi_{\tau}^{\mathrm{T}}\Xi_{\tau}-I_{n(n+p)}\right)\preceq-\alpha_{4}TI_{n(n+p)}$ for some $0<\alpha_{4}<1$.
Finally, this implies that $\left[\sum_{\tau=t}^{t+T}\Phi(\tau;t)^{\mathrm{T}}\left(\Xi_{\tau}^{\mathrm{T}}\Xi_{\tau}-I_{n(n+p)}\right)\Phi(\tau;t)\right]\preceq-\alpha_{1}I_{n(n+p)}$ for some $0<\alpha_{1}<1$ [42]. Notice that the second inequality of (8) is necessary to ensure non-singularity of the transition matrix $\Phi(\tau;t)$ [41]. Therefore, as an immediate result of (59), $V(t+T+1,\bar{\theta}_{t+T+1})\leq(1-\alpha_{1})V(t,\bar{\theta}_{t})$, $\forall t\geq 0$, which implies uniform asymptotic stability in the large, and, due to linearity, exponential stability in the large.
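The decay of $V$ can also be illustrated numerically. Since $\psi_{t}^{\mathrm{T}}\psi_{t}=\|r_{t}\|^{2}I_{n}$ and $\psi_{t}^{\mathrm{T}}\bar{\theta}_{t}=\bar{\Theta}_{t}r_{t}$, the one-step difference of the Lyapunov function reduces to

$$V(t+1,\bar{\theta}_{t+1})-V(t,\bar{\theta}_{t})=-\gamma\left(2-\gamma\|r_{t}\|^{2}\right)\|\bar{\Theta}_{t}r_{t}\|^{2},$$

which is non-positive whenever $\gamma\|r_{t}\|^{2}\leq 2$. The sketch below simulates the error recursion (56) and checks this expression at every step. It is only an illustration under stated assumptions: the regressors are drawn i.i.d. uniform as a stand-in for a persistently exciting signal, and the function name, dimensions, step size, and horizon are hypothetical choices, not a restatement of condition (8).

```python
import random


def simulate_error(n=2, p=1, gamma=0.2, steps=400, seed=1):
    """Run Theta_{t+1} = Theta_t - gamma (Theta_t r_t) r_t^T and track V."""
    rng = random.Random(seed)
    m = n + p
    theta = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
    v = sum(x * x for row in theta for x in row)  # V = ||theta||^2
    history, worst_gap = [v], 0.0
    for _ in range(steps):
        # Assumed regressor: i.i.d. uniform entries in [-1, 1], so that
        # gamma * |r|^2 <= 0.6 for the default sizes and step size.
        r = [rng.uniform(-1, 1) for _ in range(m)]
        # Error e_{t+1} = Theta_t r_t, as in (55).
        e = [sum(row[k] * r[k] for k in range(m)) for row in theta]
        # Update (56).
        theta = [[theta[i][k] - gamma * e[i] * r[k] for k in range(m)]
                 for i in range(n)]
        v_next = sum(x * x for row in theta for x in row)
        # Closed-form one-step change stated in the lead-in.
        r2 = sum(x * x for x in r)
        predicted = -gamma * (2.0 - gamma * r2) * sum(x * x for x in e)
        worst_gap = max(worst_gap, abs((v_next - v) - predicted))
        v = v_next
        history.append(v)
    return history, worst_gap


history, worst_gap = simulate_error()
print(history[0], history[-1], worst_gap)
```

In this setting $V$ is non-increasing at every step and shrinks by many orders of magnitude over the horizon, which is the qualitative behavior the proof establishes over windows of length $T+1$.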

Christos N. Mavridis received his Diploma in electrical and computer engineering from the National Technical University of Athens, Greece, in 2017, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland, College Park, MD, in 2021. His research interests include stochastic optimization, learning theory, hybrid systems, and control theory. He is currently a postdoc at KTH Royal Institute of Technology, Stockholm, and has been affiliated as a research scientist with the Institute for Systems Research (ISR), University of Maryland, MD, the Nokia Bell Labs, NJ, the Xerox Palo Alto Research Center (PARC), CA, and Ericsson AB, Stockholm. Dr. Mavridis is an IEEE member, and a member of IEEE/CSS Technical Committee on Security and Privacy. He has received the A. James Clark School of Engineering Distinguished Graduate Fellowship and the Ann G. Wylie Dissertation Fellowship in 2017 and 2021, respectively. He has been a finalist in the Qualcomm Innovation Fellowship US, San Diego, CA, 2018, and he has received the Best Student Paper Award in the IEEE International Conference on Intelligent Transportation Systems (ITSC), 2021.
Karl H. Johansson is Swedish Research Council Distinguished Professor in Electrical Engineering and Computer Science at KTH Royal Institute of Technology in Sweden and Founding Director of Digital Futures. He earned his MSc degree in Electrical Engineering and PhD in Automatic Control from Lund University. He has held visiting positions at UC Berkeley, Caltech, NTU and other prestigious institutions. His research interests focus on networked control systems and cyber-physical systems with applications in transportation, energy, and automation networks. For his scientific contributions, he has received numerous best paper awards and various distinctions from IEEE, IFAC, and other organizations. He has been awarded Distinguished Professor by the Swedish Research Council, Wallenberg Scholar by the Knut and Alice Wallenberg Foundation, and Future Research Leader by the Swedish Foundation for Strategic Research. He has also received the triennial IFAC Young Author Prize and IEEE CSS Distinguished Lecturer. He is the recipient of the 2024 IEEE CSS Hendrik W. Bode Lecture Prize. His extensive service to the academic community includes being President of the European Control Association, IEEE CSS Vice President Diversity, Outreach & Development, and Member of IEEE CSS Board of Governors and IFAC Council. He has served on the editorial boards of Automatica, IEEE TAC, IEEE TCNS and many other journals. He has also been a member of the Swedish Scientific Council for Natural Sciences and Engineering Sciences. He is Fellow of both the IEEE and the Royal Swedish Academy of Engineering Sciences.