Real-time Hybrid System Identification
with Online Deterministic Annealing
Abstract
We introduce a real-time identification method for discrete-time state-dependent switching systems in both the input–output and state-space domains. In particular, we design a system of adaptive algorithms running in two timescales; a stochastic approximation algorithm implements an online deterministic annealing scheme at a slow timescale and estimates the mode-switching signal, and an recursive identification algorithm runs at a faster timescale and updates the parameters of the local models based on the estimate of the switching signal. We first focus on piece-wise affine systems and discuss identifiability conditions and convergence properties based on the theory of two-timescale stochastic approximation. In contrast to standard identification algorithms for switched systems, the proposed approach gradually estimates the number of modes and is appropriate for real-time system identification using sequential data acquisition. The progressive nature of the algorithm improves computational efficiency and provides real-time control over the performance-complexity trade-off. Finally, we address specific challenges that arise in the application of the proposed methodology in identification of more general switching systems. Simulation results validate the efficacy of the proposed methodology.
Index Terms:
Hybrid System Identification, Switching Systems, Piecewise Affine System Identification, Online Deterministic Annealing.I Introduction
Hybrid systems, described by interacting continuous and discrete dynamics, are a powerful modeling tool in the analysis of systems where logic and continuous processes are interlaced, as in most complex cyber-physical systems. In addition to being able to describe switching dynamics, hybrid systems can be used as a tool to approximate highly non-linear dynamics by a collection of simpler models, and boost model explainability and robustness, by decomposing the behavior of a complex system into sub-systems where first principles and domain knowledge can be used for precise model tuning [1, 2]. As a result, hybrid systems have attracted significant attention in the control community.
However, first principles modelling is often too complicated and sub-optimal, and a hybrid model needs to be identified on the basis of observations. The majority of the work in this area is based on piece-wise affine (PWA) systems, a class of state-dependent switched systems with important applications in identification, verification, and control synthesis of hybrid and nonlinear systems [2, 3, 4, 5]. PWA systems are a collection of affine dynamical systems, indexed by a discrete-valued switching variable (mode) that depends on a partitioning of the state-input domain into a finite number of polyhedral regions [2, 3]. The input–output representation of PWA systems is the class of piece-wise affine auto-regressive exogenous (PWARX) systems with the switching signal depending on a partitioning of the domain of a vector containing the recent history of input–output pairs. The problem of identifying a PWA system can be quite challenging [6, 7]. As a result, most existing approaches focus on offline identification methods [8, 9].
I-A Contribution and Outline
In this work, we propose a two-timescale stochastic optimization approach for real-time state-dependent switched system identification in both input–output and state-space representations. We first focus on the well-studied case of PWA and PWARX systems. In Section II we present the realization and identifiability conditions for PWA systems, and in Theorem 1 of Section II-B we provide the identifiability conditions for state space PWA systems in the form of a persistence of excitation (PE) criterion. In Section III, we formulate the state-dependent switching system identification problem as a combined identification and prototype-based learning problem, and in Sections IV and V we develop a two-timescale stochastic approximation algorithm to solve it in real-time.
In particular, Theorem 3 of Section IV constructs a stochastic approximation algorithm based on online deterministic annealing that estimates the mode-switching signal, as well as the number of modes, through a bifurcation phenomenon, studied in Section IV-B. In Section V a second stochastic approximation algorithm based on standard adaptive filtering, running at a faster timescale, is developed to update the parameters of the local models based on the estimate of the switching signal. The convergence properties of this system of recursive algorithms are studied in Theorem 4 of Section V-B, and the applicability of the proposed approach in more general state-dependent switching systems is discussed in Section VI. Finally, in Section VII, simulation results validate the efficacy of the proposed approach in PWA systems.
I-B Related Work
Most existing switched system identification methods [2, 3, 4] can be categorized by the problem formulation used as optimization-based [4, 10, 8, 11], algebraic [12, 13], or clustering-based [9, 14, 15], and by the the method used as offline [9, 16, 17] or recursive [18, 13]. Algebraic methods are based on transforming the SARX model to a “lifted” ARX model that does not depend on the switching sequence [12, 13]. Offline optimization-based methods often rely on solving a large mixed-integer program, which is tractable only for small data sets [11, 8], or relaxation techniques over the same problem [18]. Finally, clustering-based methods are optimization-based methods that make use of unsupervised learning to estimate the partition of the domain that is needed for the switching signal [9, 14, 15, 19, 20, 21]. Most hybrid identification approaches are offline methods that first classify each observation and estimate the local model parameters (either simultaneously or iteratively), and then reconstruct the partition of the switching signal [9, 14, 22]. In our recent work, we have proposed the use of the online deterministic annealing approach as a clustering method to estimate the partition of the switching signal [23, 24]. Compared to standard offline methods, this approach allows for real-time PWA system identification, provides computational benefits, and offers real-time control over the performance-complexity trade-off, desired in many applications. In this work, we modify and extend this approach and provide an extensive study of a real-time prototype-based learning method for more general switched systems in both input–output and state-space representations. Compared to [23, 24], the proposed method also provides a solution to the central problem of estimating a minimal number of modes.
I-C Notation
The sets and represent the sets of real and integer numbers, respectively, while represents the set of non-negative integers. For a real matrix , denotes its transpose and the vectorization of . The identity matrix is denoted . is a positive semi-definite matrix, and the condition is understood as . Unless otherwise specified, random variables are defined in a probability space . The probability of an event is denoted , and the expectation operator . In case of multiple random variables and a deterministic function , the expectation operator is understood with respect to the joint probability measure, while denotes the expectation of conditioned to the -field of . Stochastic processes , , are defined in the filtered probability space , where , , is the natural filtration. The indicator function of the event is denoted and denotes the Kronecker product. Finally, “” defines the minimization operator while “” defines a minimization problem.
II Switched and Piecewise Affine Systems
A general discrete-time switched system is described by:
(1) | ||||
where is the state vector of the system, the input, the output, and and are noise terms. The signal defines the mode which is active at time . System (1) is a switched affine system when it can be expressed as:
(2) | ||||
The matrices , , , , , and define the affine dynamics for each mode . System (2) is PWA when is defined according to a polyhedral partition of the state and input space, i.e., when
(3) |
where , , are convex polyhedra defining a partition of the state-input domain , that is when for , and .
Switched affine systems can be expressed in input–output form as Switched AutoRegressive eXogenous (SARX) systems of fixed orders , , such that for every component of the output vector it holds:
(4) |
where the regressor vector , , is defined by
(5) |
The parameter vectors , , define each ARX mode, and is a noise term. Similarly, (4) is PWARX if
(6) |
and define a polyhedral partition of .
II-A Realization and Identification of PWARX Models
Results in [25] show that every observable switched affine system admits a SARX representation. Necessary and sufficient conditions for input–output realization of SARX and PWARX systems are given in [26], and [27], respectively. It is worth mentioning, however, that the number of modes and parameters can grow considerably when a PWA state-space system is converted into a minimum-order equivalent PWARX representation [27].
Identifiability refers to whether or not identification of a given parameterized system from noise-free data is a well-posed problem. In spite of the increasing attention received by SARX and PWARX system identification, there are currently only few results on the identifiability of these systems [2, 3]. Identifiability with respect to the input–output behavior of switched linear systems is investigated in [28]. The general identification problem for a PWARX system of the form (4)-(6) can be formulated as a stochastic optimization problem over the parameters . We make the following assumption:
Assumption 1.
Upper bounds on the orders of the model are known.
Assumption 1 will allow us to concentrate on the properties of PWARX identification, assuming known subject to potential computational bounds.
II-B Realization and Identification of PWA State-Space Models
The problem of identifying a state-space representation of a switched affine system can be quite challenging. Traditionally, it has been handled linked to applying results from classical realization theory to each linear subsystem [7]. However, identifiability issues arise regarding the characterization of minimality of discrete-time switched linear systems. The first issue relates to the known fact that realizations of a switched affine system are not unique [6]. The lack of uniqueness is related to that (i) the minimal realizations of the local linear systems from input–output observations are non-unique, and (ii) a realization of a switched affine system can be constructed for any arbitrary number of modes [6]. Here we address the problem of the realization of the local systems being unique only up to isomorphisms, even when the switching signal is known [7]. In particular, assuming no affine dynamics , , for any set of invertible matrices , , the realization corresponds to the transfer function associated with (2), i.e., the same input–output observations. To ensure uniqueness of the realizations, given that all subsystems share the same state space, and simplify the presentation of our methodology, we make the following assumptions.
Assumption 2.
, in system (2).
Assumption 2 implies that the order is known (observed) and enforces that the set of observations is acquired using the same observation mechanism, which leads to the realization of (2) being unique. In practice, this corresponds, for example, with the assumption of using a single sensor with the same world reference to measure the states of the system for every mode, without allowing any similarity transformation.
Assumption 3.
No affine dynamics, i.e., , , no feed-forward terms, i.e., , the states are fully observable, i.e., , and the error terms and share the same zero-mean statistics for every mode of the system.
Assumptions 3 are made to simplify the presentation of the proposed methodology without loss of generality.
In addition to the realizations of the local systems being non-unique, minimality and identifiability of the switched system does not necessarily imply that of the local subsystems [28]. In Theorem 1, we describe the conditions under which the local linear models of (2) (under Assumptions 2–3) can be identified, even when a subset of them is not controllable (minimal) in isolation.
Theorem 1.
Consider a bounded-input bounded-output linear discrete-time system of the form:
(7) | ||||
where , , , and . Denote . Then, if there exist some such that
(8) |
the augmented parameter matrix updated by the recursion
(9) |
for some , asymptotically converges to .
Proof.
See Appendix A. ∎
As a result of Theorem 1, throughout this paper, we make the following assumption to ensure identifiability of (2) under Assumptions 2–3:
Assumption 4.
Remark 1.
Remark 2.
The assumption of asymptotic boundedness and controllability (thus, minimality) for all subsystems of (2) would simplify the condition (10) to a persistence of excitation criterion for the input for each subsystem separately. Although this assumption is usually adopted, it is a limiting assumption in a practical sense. The assumption that all the local systems share the same state space of order is a modeling assumption that facilitates the identification of the switched signal as a partition of the state-input space. However, it allows for situations when the minimal realization of some of the local models is of order , as long as the switched system as a whole is identifiable.
III Hybrid System Identification as an Optimization Problem
Consider a switched linear system of the form
(11) | ||||
where , , , , for all , , is a zero-mean noise signal, and define a polyhedral partition of . System (4) can be written in the form (11) with , , and , where , and . In addition, system (2) under Assumptions 2, 3 can be written in the form (11) with , , and , where , and .
Under the identifiability conditions discussed in Section II, the general identification problem for a switching system of the form (11) can be formulated as a stochastic optimization problem over the parameters , as follows:
(12) |
where and represent random variables, realizations of which constitute the system observations, the nonnegative measure is an appropriately defined dissimilarity measure, and the expectation is taken with respect to the joint distribution of that depends on the system dynamics, the control input, and the noise term in (11).
It is clear that the optimization problem (12) is computationally hard and becomes intractable as the number of modes and states increases. In particular, the number of modes is unknown and completely alters the cardinality and the domain of the set of parameter vectors that represent the dynamics of the system. In addition, a parametric representation for the polyhedral regions should be defined.
To represent the regions , we will follow a Voronoi tessellation approach based on prototypes. We introduce a set of parameters , and define the regions:
(13) |
The measure can be designed such that the Voronoi regions are polyhedral, e.g., when is a squared Euclidean distance or any Bregman divergence, as will be explained in Section IV-A. In this sense, each can be mapped to a region (for ) or the union of a subset of of (for ), according to a predefined rule, as will be explained in Section IV-C. An illustration of this partition is given in Fig. 1.
In addition to the prototype parameters , we also introduce a set of parameters , , with each associated with the region according to (13). Representing the augmented random vector
(14) |
we can define a set of augmented codevectors as
(15) |
where the first component of each 111Throughout this paper we will use the notation , , , , interchangeably, to showcase the dependence on the variables of interest in each case. is a mapping that simulates the local model dynamics in (11) with unknown parameters , and the second component is a set of unknown codevectors that define the partition in (13).
Problem (12) can then be decomposed into two interconnected stochastic optimization problems. Assuming are known, the optimization problem
(16) |
finds the optimal parameters that define the partition subject to the joint distribution of , and is, therefore, a mode switching signal identification problem.
On the other hand, assuming the partition (and, therefore, ) is known, the optimization problem
(17) |
is a system identification problem for each mode of the system.
In Section IV we address the question of finding the optimal number according to a performance-complexity trade-off, as well as finding a mapping between and for the lowest possible number . In Section V we tackle the problem of solving (16) and (17) as a system of interconnected stochastic optimization problems in real-time using principles from two-timescale stochastic approximation theory.

IV Mode Identification with Online Deterministic Annealing
We aim to construct a recursive stochastic optimization algorithm to solve problem (16) while progressively estimating the number of the augmented codevectors , an estimate of the actual number of modes, and a mapping between and . Recall that the observed data are represented by the random variable in (14), and the augmented codevectors are normally treated as constant parameters to be estimated. To progressively estimate and , we will adopt the online deterministic annealing approach [29, 30], and define a probability space over an arbitrary number of codevectors, while constraining their distribution using a maximum-entropy principle at different levels. First we define a quantizer as a stochastic mapping of the form:
(18) |
Then we formulate the multi-objective optimization
(19) |
where the dependence on comes through , the term
(20) |
is a generalization of the objective in (16), and
(21) | ||||
is the Shannon entropy. This is now a problem of finding the locations and the corresponding probabilities .
Notice that, for and , (19) is equivalent to (16). In that sense, (19) introduces extra optimization parameters in the probabilities , and the parameter that defines a homotopy . However, the advantages of this approach are notable, and, perhaps counter-intuitively, lead to numerical optimization solutions with several computational benefits. On the one hand, the Lagrange multiplier controls the trade-off between and , which, as will be shown, is a trade-off between performance and complexity. On the other hand, the use of the conditional probabilities allows for the definition of the entropy term , which introduces several useful properties [29, 30, 31, 32, 33]. In particular, as we will show in Section IV-B, reducing the values of defines a direction that resembles an annealing process [29, 34] and induces a bifurcation phenomenon, with respect to which, the number of unique codevectors depends on and is finite for any given value of . This process also introduces robustness with respect to initial conditions [29, 35].
IV-A Solving the Optimization Problem
To solve (19) for a given value of , we successively minimize first with respect to the association probabilities , and then with respect to the codevector locations . The solution of the optimization problem
(22) | ||||
s.t. |
is given by the Gibbs distributions [36]:
(23) |
In order to minimize with respect to we set the gradients to zero
(24) |
where , and
(25) | ||||
where we have used (23) and direct differentiation with similar arguments as in [36]. It follows that which implies that
(26) |
Equation (26) has a closed-form solution if the dissimilarity measure belongs to the family of Bregman divergences [37, 29],information-theoretic dissimilarity measures that include the squared Euclidean distance and the Kullback-Leibler divergence, and are defined as follows:
Definition 1 (Bregman Divergence).
Let , be a strictly convex function defined on a vector space such that is twice F-differentiable on . The Bregman divergence is defined as:
where , and the continuous linear map is the Fréchet derivative of at .
Throughout this manuscript, we will assume that the dissimilarity measure in (13) is a Bregman divergence. Then the solution to the optimization problem
(27) |
where is the solution of (22) for a given and is given by (23), is given by Theorem 2.
Theorem 2.
Proof.
Remark 3.
Based on Theorem 2, Theorem 3 below constructs a gradient-free stochastic approximation algorithm that recursively estimates (28).
Theorem 3.
The sequence constructed by the recursive updates
(30) |
where represents external input with , , , , and the quantities and are recursively updated as follows:
(31) |
with , converges almost surely to given in (28).
Proof.
The proof follows similar arguments as Theorem of [30]. ∎
Remark 4.
Notice that the dynamics of (30) can be expressed as:
(32) |
where the recursive updates take place for every codevector sequentially. This is a discrete-time dynamical system that presents bifurcation phenomena with respect to the parameter , i.e., the number of equilibria of this system changes with respect to the value which is hidden inside the term in (31). According to this phenomenon, the number of distinct values of is finite, and the updates need only be taken with respect to these values that we call “effective codevectors”. This is discussed in Section IV-B.
IV-B Bifurcation Phenomena
In Section IV-A we described how to solve the optimization problem for a given value of the parameter . The main idea of the proposed approach is to solve a sequence of optimization problems of the form (19) with decreasing values of . This process then becomes a homotopy optimization method [38]. In particular, the usage of the entropy term resembles annealing optimization methods and grants the name of a ’temperature’ parameter. Notice that, so far, we have assumed an arbitrary number of codevectors . We will show that the unique values of the set that solves (19), form a finite set of values that we will refer to as “effective codevectors”.
Notice that at high temperature (), (23) yields uniform association probabilities , and as a result of (28), all pseudo-inputs are located at the same point , which means that there is one unique “effective” codevector given by . As is lowered below a critical value, a bifurcation phenomenon occurs, when the number of “effective” codevectors increases, which describes an annealing process [29, 34]. Mathematically, this occurs when the existing solution given by (28) is no longer the minimum of the free energy , as the temperature crosses a critical value. Following principles from variational calculus, we can track the bifurcation by the condition:
(33) |
for all choices of finite perturbations . Using (33) and direct differentiation, one can show that bifurcation depends on the temperature coefficient (and the choice of the Bregman divergence, through the function ) [30, 36]. In other words, the number of codevectors increases countably many times as the value of decreases, and an algorithmic implementation needs only as many codevectors in memory as the number of “effective” codevectors.
In practice. we can detect the bifurcation points by introducing perturbing pairs of codevectors at each temperature level . In this way, the codevectors are doubled by inserting a perturbation of each in the set of effective codevectors. The newly inserted codevectors will merge with their pair if a critical temperature has not been reached and separate otherwise. The merging criterion takes the form:
(34) |
for a given threshold . The pseudocode for this algorithm is presented in Alg. 1. A detailed discussion on the implementation of the original online deterministic annealing algorithm, its complexity, and the effect of its parameters, can be found in [29, 30, 36].
IV-C Estimating the number of modes
As illustrated in Fig. 1, the problem formulation developed in Section III defines a possibly imperfect surjective mapping from to such that each is defined as a union of a subset of . According to Remark 3, it is possible for this mapping to be perfect in the sense that each is perfectly represented, inducing zero mode switching error. The design of an appropriate termination criterion for Alg. 1 is an open question and is subject to the trade-off between the number and the minimization of the identification error. In this work, we make use of the condition as a termination criterion, where represents the computational capacity of the identification device.
Recall that each is associated with a parameter vector , . Assuming a set , we define each as the mapping:
(35) |
In this way if . Therefore, given (35), the goal now is to find and such that , and , . We follow a similar approach to the bifurcation mechanism described in Section IV-B. Starting with one codevector , we define . Every time a codevector is split into a pair of perturbed codevectors, a new is introduced. After convergence for a given , merging of the codevectors is detected by (34). For the insertion of a new we check the condition:
(36) |
with respect to a given threshold . Notice that in contrast to (34), (36) does not depend on . If (36) is satisfied, a new is introduced and . This process is integrated in the mode identification algorithm and its pseudocode is presented in Alg. 1.
Remark 5.
Note that are only used as functions of , and the parameters are the ones that are being updated by the local system identification algorithm that will be presented in Section V.
V Piecewise Affine System Identification
In this section we review standard recursive system identification for estimating the parameters of the local models given knowledge of the partition . We show that this kind of recursive identification can be formulated as a stochastic approximation algorithm, and that it can be combined using the theory of two-timescale stochastic approximation with the stochastic approximation method of estimating by as proposed in Section IV.
V-A Recursive Identification of Local Models
Recall that, given knowledge of the partition , each local linear model of the PWA system in (11) is completely defined by the parameters . In the following, we develop a stochastic approximation recursion to estimate . First we define the error:
(37) |
A stochastic gradient descent approach aims to minimize the error:
(38) |
using the recursive updates:
(39) | ||||
where , . Here the expectation is taken with respect to the joint distribution of as explained in Section III. This is a standard recursive identification method and constitutes a stochastic approximation sequence of the form:
(40) |
where is Lipschitz, and is a Martingale difference sequence. This sequence converges almost surely to the equillibrium of the differential equation
(41) |
which can be shown to be a solution of (38) with standard Lyapunov arguments. For more details the reader is referred to [39, 30]. Moreover, notice that (39) is a vectorized representation of (9), for . Therefore, under the PE condition (10) of Assumption 10, and under the zero-mean noise assumption, it follows that converges asymptotically to for all , i.e., the minimum of (38) is achieved.
V-B Combined Mode and Dynamics Identification
Recall that the mode identification method is based on the stochastic approximation updates (30) that can be written with respect to the vectors and a stepsize schedule in the form:
(42) |
where is Lipschitz, is a Martingale difference sequence and the dependence on comes from the quantity in (31) given (35). At the same time, the recursive system identification technique to estimate is a stochastic approximation sequence with a stepsize schedule of the form:
(43) |
as given in (40). The dependence on , comes through (37), since defines , which defines through (13), which defines through the rule if .
Theorem 4 shows how the two recursive algorithms (42) and (43) can be combined using the theory of two-timescale stochastic approximation if , i.e., when the estimation of the partition is updated at a slower rate than the updates of the parameters .
Theorem 4.
Consider the sequence generated using the updates (42), where , and are defined in (30). Consider the sequence generated by the updates (43). Let the stepsizes of (43) and (42), respectively, satisfy the conditions , , and , with the last condition implying that the iterations for run on a slower timescale than those for . If the equation
(44) |
has an asymptotically stable equilibrium for fixed and some Lipschitz mapping , and the equation
(45) |
has an asymptotically stable equilibrium , then, almost surely, the sequence generated by (42), (43), converges to .
Proof.
It follows directly from Theorem 2, Ch. 6 of [39]. ∎
Corollary 4.1.
Notice that the condition is of great importance. Intuitively, the stochastic approximation algorithm (42), (43) consists of two components running in different timescales, where the slow component is viewed as quasi-static when analyzing the behavior of the fast transient. In practice, the condition is satisfied by stepsizes of the form , or . Another way of achieving the two-timescale effect is to run the iterations for the slow component with stepsizes , where is a subsequence of that becomes increasingly rare (i.e. ), while keeping its values constant between these instants. A good policy is to combine both approaches and update the slow component with slower stepsize schedule along a subsequence keeping its values constant in between (e.g., [30, 39]).
VI General Switched System Identification
So far, in Sections III, IV, and V we have developed a real-time idenitification method for PWA systems. Notice, however, that neither the proposed methodology, nor the algorithmic implementation of Alg. 1 are constrained to PWA systems, meaning that the proposed approach can, in principle, be applied to more general switching and hybrid systems. However, one must proceed with caution, as issues may arise with respect to the identifiability conditions, the mode-switching estimation error, and the possibly non-linear local system identification error. In this section, we discuss the applicability of the proposed approach in different cases often encountered in hybrid control systems, namely switched linear systems with non-polyhedral partition, and switched non-linear systems with polyhedral partition.
VI-A Switched linear systems with non-polyhedral partition.
In the case of linear local dynamics, the recursive identification method discussed in Section V-A remains unchanged, and the same convergence results hold as well. However, if the regions of the mode switching partition are non-polyhedral, they cannot be perfectly approximated by a finite union of polyhedral regions . Therefore, the mode switching estimation will have inherent non-zero error. It is worth pointing out that from the convergence results of the online deterministic annealing algorithm [36], it follows that the partition error can be arbitrarily small in the limit (which is the case when ). Albeit a nice analytical result, in practice there will always be non-zero error in the estimation of the partition .
We hereby discuss two ways to deal with this problem. The first is to assume the existence of a non-linear transformation that maps each to a polyhedral region , and proceed with Alg. 1. General-purpose learning machines, such as artificial neural networks can be incorporated in this process. Further assumptions and analysis is required for this method, which is beyond the scope of this paper. The second refers to mitigating the jumping effect of the identified system to decrease the closed-loop error that naturally occurs due to imperfect mode switching. To this end, recall that, given an observation the dynamics of the identified model are given according to (11) by:
(46) |
To mitigate the jumping behavior one can make use of the association probabilities
(47) |
to instead construct the weighted dynamics:
(48) |
This jump-mitigation method has been used in the literature to preserve smoothness of the closed-loop dynamics and is particularly useful when hybrid system identification is used for non-linear function approximation, i.e., when the original system is not hybrid but is to be approximated by a hybrid system with simpler local dynamics.
VI-B Switched non-linear systems with polyhedral partition.
In this case, often referred to as piece-wise non-linear hybrid systems [40], the mode switching partition is polyhedral, and can be perfectly approximated by a finite union of polyhedral regions . For the identification of the non-linear local dynamics, the recursive identification method discussed in Section V-A needs to be modified. In particular the recursive updates:
(49) |
given in (39) of the same stochastic gradient descent structure are used, with the error term in this case given by
(50) |
where the functions are local parametric models of known form, differentiable with respect to the parameters . General-purpose learning machines, such as artificial neural networks can be used. Notice that the identification updates remain stochastic approximation updates of the same form, which means that the convergence results of Theorem 4 continue to hold.
VII Experimental Results
We illustrate the properties and evaluate the performance of the proposed algorithm in two PWA systems, one in PWARX form and the other in state-space form.
VII-A PWARX System
The first one, adopted from [9], is given in the input–output representation of (51):
(51) |
where , , , , and . This example is chosen to showcase the properties of the proposed methodology, since its simplicity allows graphical representation of the signaling partition and the convergence of the model parameters. At the same time, it is a switching system that presents a jump at , and same dynamics for different regions of the input space, i.e., while . System (51) can thus be written in the form:
(52) |
where . The representation of (52) in the input–output () space is shown in Fig. 2. A total of observations under Gaussian noise () are accessible sequentially.

Algorithm 1 is applied to the observations for iterations. The same observations can be reused by the algorithm. The temperature parameters used for the online deterministic annealing algorithm are , and the stepsizes . At first (), the algorithm keeps in memory only one codevector and one model parameter vector , essentially assuming that the system has constant dynamics in the entire domain, i.e., . As new input–output pairs are observed, the estimated parameter gets updated by the iterations (39). We have assumed .
At the same time, the estimate of are used to update the location of the codevector towards the mean of the observation domain as shown in (28). This process does not yield any accurate identification results since at this stage it is assumed that . However, it boosts the robustness of the identification algorithm with respect to initial conditions, since the converged values of the parameters for will be used as initial conditions for the next value of . As is reduced, the bifurcation phenomenon described in Section IV-B takes place, and, after reaching a critical value, the single codevector splits into two duplicates. Now the algorithm assumes that there are two modes in the system and estimates the optimal model parameters and partition (through the location of the codevectors ). This process continues until a desired termination criterion is reached. In this case it is the minimum temperature parameter that reflects to a potential time and computational constraint of the system. The bifurcation phenomenon is illustrated in Fig. 3 where the locations of the codevectors , generated by Alg. 1 are shown. The algorithm progressively constructs a total of effective codevectors. The number of modes is estimated with the process explained in Section IV-C. Two modes are estimated, i.e., with . The association of each effective codevector with each identified mode according to the rule (35) is shown in Fig. 3.



The final estimated partition, the output of the estimated model, and its error with respect to the true model without noise are shown in Fig. 4. As shown, the identification error is low with the exception of a single misclassification instance of the mode at the boundary of the true partition of the input–output domain. This mode switching error can be avoided by allowing to go lower, which results in a larger number of effective codevectors and is indicative of the performance/complexity trade-off of the algorithm. Finally, the convergence of the parameters of each of the local models detected are shown in Fig. 5. Parameter values that do not appear at indicate that belong to modes identified through the bifurcation phenomenon after a certain critical temperature value.


VII-B State-Space PWA system
The second system we evaluate is given by the following linearized PWA dynamics in the state-space domain:
(53) |
where and . System (53) has two modes () and the switching signal is defined by the polyhedral regions , , and with and . The dynamics of (53) consist of a controllable double integrator when the input is of sufficient magnitude, and a stable autonomous system that drives its velocity to zero, otherwise. An example of such system can be an autonomous vehicle that avoids minimal, potentially accidental, gas pedal input. In this example, the linear system of the second mode () is not minimal, and its identification relies on the mode switching behavior of the system, as explained in Section II-B. To preserve the PE conditions of Assumption 10, the input signal is chosen as , , and the noise term is a zero-mean Gaussian random variable with . The evolution of (53) over time, as well as the mode switching behavior, are shown in Fig. 6.

The system is allowed to run for (seconds), with , i.e., a total of observations are acquired online, based on which, the proposed method identifies the switched system in real time. The temperature parameters used for the online deterministic annealing algorithm are , and the stepsizes . At first (), the algorithm keeps in memory only one codevector and one model parameter vector , essentially assuming that the system has constant dynamics in the entire domain. The estimated parameter gets updated by the iterations (39). We have assumed . As is reduced, the bifurcation phenomenon described in Section IV-B takes place, and, after reaching a critical value, the single codevector splits into two duplicates. This process continues until the minimum temperature parameter that reflects to a potential time and computational constraint of the system. The bifurcation phenomenon is illustrated in Fig. 7 where the third coordinate of the codevectors , which gives an estimate of the control input representation of the mode, is depicted. A total of effective codevectors are estimated and the association of the regions with the identified modes is also depicted in Fig. 7.
The identification error and the estimated mode switching behavior are shown in Fig. 8 in comparison with the true mode switching behavior of the system. More specifically, the algorithm identifies a total of modes with and . In Figure 9, the convergence of the parameters of each of the local models detected to the actual observed are shown. Parameter values that do not appear at indicate that they belong to modes identified through the bifurcation phenomenon after a certain critical temperature value.




VIII Conclusion
We proposed a real-time identification scheme for discrete-time switched state-space models. In contrast to most existing identification algorithms for piece-wise affine systems, the proposed approach is appropriate for online identification of both the modes and the subsystems of the switched system, and is computationally efficient compared to standard algebraic, mixed-integer programming, and offline clustering-based methods. The progressive nature of the algorithm also provides real-time control over the performance-complexity trade-off.
Future directions include extensions of the proposed approach to identification of both discrete- and continuous-time partially observable piece-wise affine models in the state-space domain using real-time observations.
References
- [1] A. Garulli, S. Paoletti, and A. Vicino, “A survey on switched and piecewise affine system identification,” IFAC Proceedings Volumes, vol. 45, no. 16, pp. 344–355, 2012.
- [2] D. Liberzon, Switching in Systems and Control. Springer, 2003, vol. 190.
- [3] S. Paoletti, A. L. Juloski, G. Ferrari-Trecate, and R. Vidal, “Identification of hybrid systems a tutorial,” European Journal of Control, vol. 13, no. 2-3, pp. 242–260, 2007.
- [4] A. Moradvandi, R. E. Lindeboom, E. Abraham, and B. De Schutter, “Models and methods for hybrid system identification: a systematic survey,” IFAC-PapersOnLine, vol. 56, no. 2, pp. 95–107, 2023.
- [5] A. Bemporad, G. Ferrari-Trecate, and M. Morari, “Observability and controllability of piecewise affine and hybrid systems,” IEEE Transactions on Automatic Control, vol. 45, no. 10, pp. 1864–1876, 2000.
- [6] R. Vidal, A. Chiuso, and S. Soatto, “Observability and identifiability of jump linear systems,” in IEEE Conference on Decision and Control, vol. 4, 2002, pp. 3614–3619.
- [7] M. Petreczky, L. Bako, and J. H. Van Schuppen, “Realization theory of discrete-time linear switched systems,” Automatica, vol. 49, no. 11, pp. 3337–3344, 2013.
- [8] J. Roll, A. Bemporad, and L. Ljung, “Identification of piecewise affine systems via mixed-integer programming,” Automatica, vol. 40, no. 1, pp. 37–50, 2004.
- [9] G. Ferrari-Trecate, M. Muselli, D. Liberati, and M. Morari, “A clustering technique for the identification of piecewise affine systems,” Automatica, vol. 39, no. 2, pp. 205–217, 2003.
- [10] F. Lauer, G. Bloch, F. Lauer, and G. Bloch, “Hybrid system identification,” Hybrid System Identification: Theory and Algorithms for Learning Switching Models, pp. 77–101, 2019.
- [11] L. Bako, “Identification of switched linear systems via sparse optimization,” Automatica, vol. 47, no. 4, pp. 668–677, 2011.
- [12] R. Vidal, S. Soatto, Y. Ma, and S. Sastry, “An algebraic geometric approach to the identification of a class of linear hybrid systems,” in IEEE International Conference on Decision and Control, vol. 1, 2003, pp. 167–172.
- [13] R. Vidal, “Recursive identification of switched ARX systems,” Automatica, vol. 44, no. 9, pp. 2274–2287, 2008.
- [14] M. Gegundez, J. Aroba, and J. M. Bravo, “Identification of piecewise affine systems by means of fuzzy clustering and competitive learning,” Engineering Applications of Artificial Intelligence, vol. 21, no. 8, pp. 1321–1329, 2008.
- [15] H. Nakada, K. Takaba, and T. Katayama, “Identification of piecewise affine systems based on statistical clustering technique,” Automatica, vol. 41, no. 5, pp. 905–913, 2005.
- [16] C. Li, Z. Huang, Y. Wang, and H. Jiang, “Rapid identification of switched systems: A data-driven method in variational framework,” Science China Technological Sciences, vol. 64, no. 1, pp. 148–156, 2021.
- [17] A. L. Juloski, S. Weiland, and W. M. H. Heemels, “A bayesian approach to identification of hybrid systems,” IEEE Transactions on Automatic Control, vol. 50, no. 10, pp. 1520–1533, 2005.
- [18] L. Bako, K. Boukharouba, E. Duviella, and S. Lecoeuche, “A recursive identification algorithm for switched linear/affine models,” Nonlinear Analysis: Hybrid Systems, vol. 5, no. 2, pp. 242–253, 2011.
- [19] R. Baptista, J. Y. Ishihara, and G. A. Borges, “Split and merge algorithm for identification of piecewise affine systems,” in American Control Conference, 2011, pp. 2018–2023.
- [20] A. M. Ivanescu, T. Albin, D. Abel, and T. Seidl, “Employing correlation clustering for the identification of piecewise affine models,” in Workshop on Knowledge Discovery, Modeling and Simulation, 2011, pp. 7–14.
- [21] M. G. Sefidmazgi, M. M. Kordmahalleh, A. Homaifar, and A. Karimoddini, “Switched linear system identification based on bounded-switching clustering,” in American Control Conference, 2015, pp. 1806–1811.
- [22] A. Bemporad, A. Garulli, S. Paoletti, and A. Vicino, “A bounded-error approach to piecewise affine system identification,” IEEE Transactions on Automatic Control, vol. 50, no. 10, pp. 1567–1580, 2005.
- [23] C. N. Mavridis and J. S. Baras, “Identification of piecewise affine systems with online deterministic annealing,” in IEEE Conference on Decision and Control, 2023, pp. 4885–4890.
- [24] C. N. Mavridis, A. Kanellopoulos, J. S. Baras, and K. H. Johansson, “State-space piece-wise affine system identification with online deterministic annealing,” in European Control Conference, 2024, pp. 3110–3115.
- [25] S. Weiland, A. L. Juloski, and B. Vet, “On the equivalence of switched affine models and switched arx models,” in IEEE Conference on Decision and Control, 2006, pp. 2614–2618.
- [26] S. Paoletti, A. Garulli, J. Roll, and A. Vicino, “A necessary and sufficient condition for input-output realization of switched affine state space models,” in IEEE Conference on Decision and Control, 2008, pp. 935–940.
- [27] S. Paoletti, J. Roll, A. Garulli, and A. Vicino, “On the input-output representation of piecewise affine state space models,” IEEE Transactions on Automatic Control, vol. 55, no. 1, pp. 60–73, 2009.
- [28] M. Petreczky, L. Bako, and J. H. van Schuppen, “Identifiability of discrete-time linear switched systems,” in ACM International Conference on Hybrid Systems: Computation and Control, 2010, pp. 141–150.
- [29] C. N. Mavridis and J. S. Baras, “Online deterministic annealing for classification and clustering,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 10, pp. 7125–7134, 2023.
- [30] C. Mavridis and J. S. Baras, “Annealing optimization for progressive learning with stochastic approximation,” IEEE Transactions on Automatic Control, vol. 68, no. 5, pp. 2862–2874, 2023.
- [31] C. N. Mavridis, N. Suriyarachchi, and J. S. Baras, “Maximum-entropy progressive state aggregation for reinforcement learning,” in IEEE Conference on Decision and Control, 2021, pp. 5144–5149.
- [32] C. N. Mavridis and J. S. Baras, “Progressive graph partitioning based on information diffusion,” in IEEE Conference on Decision and Control, 2021, pp. 37–42.
- [33] C. N. Mavridis, N. Suriyarachchi, and J. S. Baras, “Detection of dynamically changing leaders in complex swarms from observed dynamic data,” in International Conference on Decision and Game Theory for Security. Springer, 2020, pp. 223–240.
- [34] K. Rose, “Deterministic annealing for clustering, compression, classification, regression, and related optimization problems,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2210–2239, 1998.
- [35] C. Mavridis, E. Noorani, and J. S. Baras, “Risk sensitivity and entropy regularization in prototype-based learning,” in Mediterranean Conference on Control and Automation, 2022, pp. 194–199.
- [36] C. Mavridis and J. Baras, “Multi-resolution online deterministic annealing: A hierarchical and progressive learning architecture,” arXiv preprint arXiv:2212.08189, 2022.
- [37] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences,” Journal of Machine Learning Research, vol. 6, no. Oct, pp. 1705–1749, 2005.
- [38] X. Lin, Z. Yang, X. Zhang, and Q. Zhang, “Continuation path learning for homotopy optimization,” in International Conference on Machine Learning. PMLR, 2023, pp. 21 288–21 311.
- [39] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Springer, 2009, vol. 48.
- [40] F. Lauer and G. Bloch, “Switched and piecewise nonlinear hybrid system identification,” in Hybrid Systems: Computation and Control: 11th International Workshop, HSCC 2008, St. Louis, MO, USA, April 22-24, 2008. Proceedings 11. Springer, 2008, pp. 330–343.
- [41] B. M. Jenkins, A. M. Annaswamy, E. Lavretsky, and T. E. Gibson, “Convergence properties of adaptive systems and the definition of exponential stability,” SIAM Journal on Control and Optimization, vol. 56, no. 4, pp. 2463–2484, 2018.
- [42] B. Anderson and J. Moore, “New results in linear system stability,” SIAM Journal on Control, vol. 7, no. 3, pp. 398–414, 1969.
Appendix A Proof of Theorem 1.
We construct the system
(54) |
where , and . Subtracting (7) from (54), we get:
(55) |
where is the observation error, is the augmented state-input vector as defined in (5), and is an augmented matrix of the system parameters of size . Then (9) is equivalent to:
(56) |
Notice that (56) can be written in the form of a linear time-varying dynamical system:
(57) |
By vectorizing such that , (57) becomes:
(58) |
where denotes the Kronecker product, and is a matrix. We will show that (58) is exponentially stable in the large (Definition 1, [41]) as long as (8) is satisfied. Consider the Lyapunov function candidate . It is obvious that there exist such that . Notice that . As a result, by summing the differences for timesteps, we get:
(59) | ||||
for some . Here is the transition matrix of (58), and the inequality follows from condition (8). Notice that the first inequality in (8) is equivalent to and directly implies that , for some , as well. As a result for some , and, therefore, for some . Finally this implies that for some [42]. Notice that the second inequality of (8) is necessary to ensure non-singularity of the transition matrix [41]. Finally, as an immediate result of (59), , , which implies uniform asymptotic stability in the large, and, due to linearity, exponential stability in the large.
![]() |
Christos N. Mavridis received his Diploma in electrical and computer engineering from the National Technical University of Athens, Greece, in 2017, and the M.S. and Ph.D. degrees in electrical and computer engineering at the University of Maryland, College Park, MD, in 2021. His research interests include stochastic optimization, learning theory, hybrid systems, and control theory,. He is currently a postdoc at KTH Royal Institute of Technology, Stockholm, and has been affiliated as a research scientist with the Institute for Systems Research (ISR), University of Maryland, MD, the Nokia Bell Labs, NJ, the Xerox Palo Alto Research Center (PARC), CA, and Ericsson AB, Stockholm. Dr. Mavridis is an IEEE member, and a member of IEEE/CSS Technical Committee on Security and Privacy. He has received the A. James Clark School of Engineering Distinguished Graduate Fellowship and the Ann G. Wylie Dissertation Fellowship in 2017 and 2021, respectively. He has been a finalist in the Qualcomm Innovation Fellowship US, San Diego, CA, 2018, and he has received the Best Student Paper Award in the IEEE International Conference on Intelligent Transportation Systems (ITSC), 2021. |
![]() |
Karl H. Johansson is Swedish Research Council Distinguished Professor in Electrical Engineering and Computer Science at KTH Royal Institute of Technology in Sweden and Founding Director of Digital Futures. He earned his MSc degree in Electrical Engineering and PhD in Automatic Control from Lund University. He has held visiting positions at UC Berkeley, Caltech, NTU and other prestigious institutions. His research interests focus on networked control systems and cyber-physical systems with applications in transportation, energy, and automation networks. For his scientific contributions, he has received numerous best paper awards and various distinctions from IEEE, IFAC, and other organizations. He has been awarded Distinguished Professor by the Swedish Research Council, Wallenberg Scholar by the Knut and Alice Wallenberg Foundation, Future Research Leader by the Swedish Foundation for Strategic Research. He has also received the triennial IFAC Young Author Prize and IEEE CSS Distinguished Lecturer. He is the recipient of the 2024 IEEE CSS Hendrik W. Bode Lecture Prize. His extensive service to the academic community includes being President of the European Control Association, IEEE CSS Vice President Diversity, Outreach & Development, and Member of IEEE CSS Board of Governors and IFAC Council. He has served on the editorial boards of Automatica, IEEE TAC, IEEE TCNS and many other journals. He has also been a member of the Swedish Scientific Council for Natural Sciences and Engineering Sciences. He is Fellow of both the IEEE and the Royal Swedish Academy of Engineering Sciences. |