xrdocs_io_cloud_scale_networking_blogs_ios_xr_evolution_part2
Jag Tangirala
Distinguished Engineer, Cisco.
ON THIS PAGE
DATA DISTRIBUTION AND ACCESS DESIGN PATTERNS
HIGH AVAILABILITY FOUNDATION
UPGRADEABILITY ARCHITECTURE
CONCLUSION
This is part two of the blog about IOS XR’s software architecture. If you missed part 1, it’s advisable to start with it. In
the first part, under the IOS XR Architecture Strategy, we covered the following concepts: Decoupled Planes Abstraction,
State Management, Process Distribution Across Available Compute, and High Performance Messaging Infrastructure.
This second part covers the Data Distribution and Access Design Patterns and High Availability & Upgradeability.
After scalable data partitioning, intelligent process placement to work with that data, and a high performance messaging
infrastructure to enable communication among the processes working with this data, we now explore the next important
aspect: the data distribution and access patterns that applications running in an XR system use.
Note that data distribution and access is inherently different from the messaging infrastructure/IPC mechanism. The
messaging infrastructure provides a means of communication; it does not define how data is presented
to other parts of the system that need it, or how an application locates the resources it needs for its operation.
These issues are fundamental to the applications that run in IOS XR and are reflected in the implicit decisions an
application is built upon.
Understanding the inherent data characteristics of a router is essential in designing and optimizing the NOS properly. IOS
XR is designed taking into account the data distribution and access patterns of everything from a small single-CPU IOS XR
router to a large distributed router built from multiple chassis. Based on these insights, the IOS XR NOS data access and
distribution characteristics can be pictorially categorized as follows:
The following informational section gives additional details about these characteristics.
Data distribution has four sub characteristics: size of the data, number of consumers, liveness of producers/consumers, and
tracking of producer.
FIB routes are an example of a large amount of data that is going to be consumed by a large number of nodes in a
distributed routing cluster. Operational data is an example of a large set of data that is going to be consumed by a few
management agents. Throughput and latency are two performance metrics that matter in this kind of data distribution.
Many of the data sharing applications would want to know the state of the producer(s), i.e., whether the producer is active or
has gone down, so that they can take an appropriate action. A good example of this is the interaction between a routing
protocol and RIB. If a protocol has populated routes to RIB, the RIB should be notified when the protocol goes down so that
it can mark the routes stale, and in the event that the protocol does not come back, it can delete those routes and promote
backup routes, if any.
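The stale-then-purge behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rib` and `Route` classes, their method names, and the alphabetical tie-break in `best` are hypothetical and are not IOS XR APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    prefix: str
    producer: str          # name of the protocol that installed the route
    stale: bool = False


@dataclass
class Rib:
    """Toy RIB keyed by (prefix, producer); all names here are hypothetical."""
    routes: dict = field(default_factory=dict)

    def add(self, prefix: str, producer: str) -> None:
        # A fresh update from the producer replaces the entry and clears any stale mark.
        self.routes[(prefix, producer)] = Route(prefix, producer)

    def producer_down(self, producer: str) -> None:
        # Keep the old routes in place, but remember they are unconfirmed.
        for route in self.routes.values():
            if route.producer == producer:
                route.stale = True

    def purge_stale(self, producer: str) -> list:
        # Called when the hold time expires: drop whatever was never refreshed.
        dead = [key for key, route in self.routes.items()
                if route.producer == producer and route.stale]
        for key in dead:
            del self.routes[key]
        return sorted(prefix for prefix, _ in dead)

    def best(self, prefix: str):
        # Placeholder preference (alphabetically first owner); a real RIB
        # would compare administrative distance and metrics.
        owners = sorted(p for (pfx, p) in self.routes if pfx == prefix)
        return owners[0] if owners else None


rib = Rib()
rib.add("10.0.0.0/8", "bgp")
rib.add("10.0.0.0/8", "ospf")
rib.add("192.0.2.0/24", "bgp")
rib.producer_down("bgp")
rib.add("192.0.2.0/24", "bgp")       # bgp returned and refreshed one route
removed = rib.purge_stale("bgp")     # the unrefreshed bgp route is deleted
```

After the purge, the only remaining owner of 10.0.0.0/8 is the backup producer, which is the "promote backup routes" step in miniature.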
Many of the applications would also want to track producers of the data for various purposes. Going back to the RIB
example, the routing protocols populating routes to RIB are the producers and RIB is the consumer. RIB explicitly tracks
each such producer so that when all the producers are done with their routes, RIB declares routing convergence. This is an
important state for the operation of the router.
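As a rough sketch of producer tracking, the consumer can keep a set of registered producers and declare convergence once none is pending. The `ConvergenceTracker` class and its method names are assumptions made for illustration, not the actual RIB interface.

```python
class ConvergenceTracker:
    """Tracks which registered producers have finished their initial download."""

    def __init__(self):
        self._pending = set()
        self._done = set()

    def register(self, producer: str) -> None:
        self._pending.add(producer)

    def mark_done(self, producer: str) -> None:
        # Producers signal completion explicitly; unknown names are ignored.
        if producer in self._pending:
            self._pending.remove(producer)
            self._done.add(producer)

    @property
    def converged(self) -> bool:
        # Converged only when at least one producer finished and none is pending.
        return bool(self._done) and not self._pending


tracker = ConvergenceTracker()
for name in ("bgp", "ospf", "static"):
    tracker.register(name)
tracker.mark_done("static")
tracker.mark_done("ospf")
before = tracker.converged     # bgp is still downloading
tracker.mark_done("bgp")
after = tracker.converged      # every tracked producer is done
```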
In some cases, the data producer needs to be aware of individual consumers. In most cases, the producer needs to know
only about the aggregate set of consumers, but in some cases, more intimate knowledge of the consumer is needed. For
example, consider the case of RIB downloading routes to various FIBs across the routing cluster. Typically, this is done by
putting all the consumers in a group and multicasting to them. If one consumer restarts, the producer should start a
separate download session to that consumer so that it is aware of all the updates thus far while continuing to update the
remaining consumers. The producer cannot suspend downloads to the other consumers because network events may
happen at the same time, and delaying downloads results in unacceptably slow network convergence. Once the restarting
consumer catches up with the rest of the group, the producer has to merge the consumer back with the larger multicast
group.
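The restart-and-merge flow can be modeled as a toy producer that keeps a log of updates, a multicast group, and a separate catch-up session per restarting consumer. Everything in this sketch (class name, batch size, the in-memory log) is a simplifying assumption; it only illustrates the control flow, not how IOS XR implements route download.

```python
class Producer:
    """Toy route downloader; group membership and catch-up are simplified."""

    def __init__(self):
        self.log = []            # every update sent so far, in order
        self.group = {}          # consumer name -> list of received updates
        self.catching_up = {}    # consumer name -> [received list, next log index]

    def join(self, name: str) -> None:
        self.group[name] = []

    def publish(self, update: str) -> None:
        # Group members get the update immediately; nobody waits for stragglers.
        self.log.append(update)
        for received in self.group.values():
            received.append(update)

    def restart(self, name: str) -> None:
        # The consumer lost its state: pull it out of the group, replay from zero.
        self.group.pop(name, None)
        self.catching_up[name] = [[], 0]

    def catch_up_step(self, name: str, batch: int = 2) -> None:
        # Replay a bounded batch on the separate session, then merge when level.
        received, index = self.catching_up[name]
        chunk = self.log[index:index + batch]
        received.extend(chunk)
        index += len(chunk)
        if index == len(self.log):
            self.group[name] = received
            del self.catching_up[name]
        else:
            self.catching_up[name] = [received, index]


p = Producer()
p.join("lc0")
p.join("lc1")
for update in ("r1", "r2", "r3"):
    p.publish(update)
p.restart("lc1")
p.publish("r4")              # lc0 keeps receiving during lc1's catch-up
p.catch_up_step("lc1")       # replays r1, r2
p.catch_up_step("lc1")       # replays r3, r4 and merges lc1 back
p.publish("r5")              # both consumers are in the group again
```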
On the data access front, if we look at breadth of data access, most data is accessed only by a few entities in the router
cluster, and there is a very limited set of data that is accessed very broadly. One fine example of narrowly accessed data is
BGP’s private internal RIB structures. This data is referenced solely by BGP itself and the manageability agents. It is
extremely large, and in many cases, is the majority of the data in the router cluster. BGP uses this data for computing routes
and then injects specific paths into the RIB. Feature (ACL, QoS, etc.) data is another example of data with small breadth of
access in terms of number of clients. By contrast, there are other data items, such as the interface database, system
configuration, etc., that are referenced by almost every other process in the router cluster.
Similarly, frequency of access of data items is highly variable, with a few data items that are very frequently accessed and
the vast majority that are rarely accessed. BGP provides an excellent example of a great deal of data that is
accessed infrequently. Since BGP only makes incremental changes, once a path is in BGP’s internal RIB, and advertised to
the RIB processes and any neighbors, that information may not be accessed again until the advertising BGP session fails.
This could easily be days or weeks. On the high frequency side, the interface database can be accessed extremely
frequently, such as every time an application needs to resolve an interface name into a data structure, or to access
interface-dependent statistics or configuration. Thus, routing and configuration are a couple of examples of data that is
accessed infrequently while statistics and interfaces are data that are accessed frequently.
For some data, the consumer consumes the data obtained from the producer as is. In some cases, consumers need to
transform the received data.
Design Patterns
Based on the above examples, we can observe some usage patterns. There is small, frequently and broadly accessed
data, and large, infrequently and sparsely accessed data. There are large volumes of data that need to be moved to a lot
of nodes, and there are small volumes of data that need to be moved to a lot of nodes.
What is the best way to satisfy these requirements? After careful consideration of data access metrics such as
breadth of access, data size, frequency of access, degree of producer/consumer decoupling, liveness, and data
transformation requirements, IOS XR is built around the following two fundamental data distribution/access patterns:
Data-centric
In a data-centric approach, IOS XR applications are structured around the data that is read or written. The identity of the
processes that provide or consume the data is hidden in the infrastructure. Data is located by its description, and that
acts as the rendezvous point for end applications. To that end, it mimics the publisher-subscriber model, more popularly
known as “pub-sub”. Since processes that share data do not connect to each other to exchange data, this model
is characterized by a looser coupling between processes exchanging data.
The time, space, and synchronization decoupling is achieved in IOS XR systems via a data centric broker between
producers and consumers.
IOS XR designed the following two data-centric infrastructures based on the needs:
SysDB - a logically centralized, physically distributed, in-memory pub-sub datastore.
ENS (Event Notification System) - a data-centric pub-sub messaging network.
The above categorization is very important to design a good NOS. For example, if we look at configuration data, it is
essentially a state that is relatively small, fairly static, with a large set of consumers, and with almost no need of data
transformation. Databases are great for storing such a state and retrieving it repeatedly. But when something changes in
there, it is good to have “don’t call us, we will call you” support. Thus, SysDB provides messaging/notification support.
SysDB, like Redis, is an advanced in-memory data structure server. Interestingly, like Redis, SysDB also acts separately
as a publish/subscribe server.
ENS, on the other hand, is a decentralized, flat, topic-based pub-sub messaging infrastructure that actually moves data
from one place to another with reliability semantics. It is useful for the distribution of data that is written by one process to
one or more processes on different nodes (a push model). ENS also works equally well in a pull model. Readers can
come up and pull the data from the writer somewhere in the system. It is decentralized, as it is a collection of brokers
spread across all nodes.
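To make the push and pull models concrete, here is a minimal single-process sketch of a flat, topic-based broker. It is not ENS: the `TopicBroker` class, its methods, and the topic name are assumptions for illustration, and it leaves out the multi-node brokers and reliability semantics that the real infrastructure provides.

```python
from collections import defaultdict


class TopicBroker:
    """Flat topic broker: push to current subscribers, keep last value for pull."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> callbacks
        self._latest = {}                       # topic -> last published value

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, value):
        # Push model: the writer never learns who the readers are.
        self._latest[topic] = value
        for callback in self._subscribers[topic]:
            callback(value)

    def pull(self, topic, default=None):
        # Pull model: a reader that started late fetches the current value.
        return self._latest.get(topic, default)


broker = TopicBroker()
seen = []
broker.subscribe("ifstate", seen.append)
broker.publish("ifstate", "Gi0/0/0/0 up")
late_reader_view = broker.pull("ifstate")
```

The writer and the readers only share the topic name, which is the rendezvous point; neither side holds a reference to the other.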
If we compare ENS and SysDB, ENS creates direct links between distributed nodes, whereas SysDB is a
logically centralized node which must be written to, then read from. In this respect, this is similar to the comparison
between ZeroMQ and Redis, respectively. ZeroMQ/ENS is primarily a messaging infrastructure whereas Redis/SysDB is
primarily a database. SysDB is covered in more detail in the following informational section.
SysDB - Logically centralized, physically distributed, scalable, neutral, in-memory, highly available, model driven, pub-sub
datastore
That is a mouthful of a description for SysDB, but each word in there is important, and collectively they define the concept
and the power of SysDB. Configuration and operational data is a significant part of the router state, and IOS XR’s SysDB
is an advanced datastore to hold this data. SysDB has the following attributes shown in the picture, and each of these
attributes is explained below:
Logically centralized, physically distributed
The configuration and operational data is eventually consumed by various manageability agents interacting with the router
on the northbound side. From an operations point of view, it is hugely important to present manageability agents the
configuration/operational data in a centralized fashion, irrespective of the internal implementation. But internally the
datastore should be distributed to achieve the required scalability, reliability, and responsiveness goals of large systems. IOS
XR designed SysDB as a scalable distributed data store that presents a single logical view for the whole system’s
configuration and operational data. Data is partitioned according to the principles mentioned elsewhere in this blog.
Logically centralized/physically distributed is a concept that we see these days being used in many SDN Controllers as they
try to present one logical global view of the network state while physically distributing the state across multiple nodes.
Scalable
SysDB is scalable to a large number of nodes, large sets of configuration, high volume operational data, and to frequently
changing operational data.
One can apply more than two million lines of configuration, and IOS XR works without a hiccup! One can retrieve large
amounts of operational data from SysDB easily. This scalability can be attributed to proper data partitioning, and to
distribution of the required processing across the available compute.
Due to carefully designed shared state concurrency, many applications can also access the data in parallel, yielding greater
scalability.
Neutral
SysDB stores data in a format that is independent of the data formats of the manageability agents that it interacts with in
the northbound direction and application backends it interacts with in the southbound direction. This is important for
decoupling SysDB’s producers and consumers. All manageability agents work on the same data, and any new
manageability agent support can be easily added.
With SysDB, applications do not have to track the data by its location, nor do they have to track the location of the data
providers. Since the applications are unaware of the identity or location of data providers/consumers, they are unaffected if
these parties crash or relocate.
Model Driven
SysDB data is modeled and has evolved over the years to support native and OpenConfig YANG models. For many years, XR
has provided programmatic access to configuration and operational data.
In-memory
The SysDB data is stored in memory (RAM). This is critical for the required performance.
Pub-sub
SysDB is a hierarchical, topic-based publish-subscribe mechanism that decouples producers and consumers. It is designed
to store fairly static data as well as dynamic, fast-changing, and/or high volume data. Subscribers are usually interested in
particular events or event patterns, and not in all events.
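A hierarchical, topic-based subscription can be pictured as a path-keyed store in which a subscriber watches a subtree and is notified only about writes beneath it. The sketch below is an assumption-laden illustration: `HierarchicalStore`, its methods, and the `/cfg/...` and `/oper/...` paths are hypothetical and do not reflect the real SysDB namespace or API.

```python
class HierarchicalStore:
    """In-memory path -> value store with prefix subscriptions (hypothetical API)."""

    def __init__(self):
        self._data = {}
        self._watchers = []      # (path prefix as tuple, callback)

    @staticmethod
    def _split(path):
        return tuple(part for part in path.split("/") if part)

    def watch(self, prefix, callback):
        self._watchers.append((self._split(prefix), callback))

    def set(self, path, value):
        key = self._split(path)
        self._data[key] = value
        # Notify only watchers whose prefix is an ancestor of (or equal to) the path.
        for prefix, callback in self._watchers:
            if key[:len(prefix)] == prefix:
                callback(path, value)

    def get(self, path):
        return self._data.get(self._split(path))


store = HierarchicalStore()
events = []
store.watch("/cfg/interface", lambda path, value: events.append((path, value)))
store.set("/cfg/interface/Gi0/mtu", 9000)
store.set("/oper/bgp/neighbors", 12)      # outside the watched subtree
```

The second write is stored but produces no notification, which mirrors the point that subscribers care about particular events rather than all of them.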
Highly Available
Active/standby HA is supported.
Process-centric
In the process-centric model, specific processes own the data, and any process interested in the data must contact the
owning process to retrieve or update the data. Processes discover each other through a distributed directory service
and register with each other their interest in data.
One main benefit of the process-centric approach is the ability of the consumer to efficiently get liveness information
about the producer. In cases where that liveness impacts the usability of the data, this is essential. This is not possible in
a true data-centric system, as the producer is kept anonymous behind the publishing mechanism.
Since the producer of the data is aware of the data semantics, efficient data structures that are well-suited for the data
being produced are used in IOS XR to build efficient notification mechanisms.
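By way of contrast with the data-centric pattern, a process-centric exchange might look like the following sketch: a client looks up the owner in a directory, registers with it, and then hears directly about both updates and the owner going away. The `Directory`, `Owner`, and `Client` classes and the service name are hypothetical, and the explicit `shutdown` call stands in for whatever failure detection the real infrastructure performs.

```python
class Directory:
    """Maps a service name to the process object that owns the data."""

    def __init__(self):
        self._owners = {}

    def advertise(self, service, owner):
        self._owners[service] = owner

    def lookup(self, service):
        return self._owners.get(service)


class Owner:
    """Owning process: holds the data and knows each registered client."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.clients = []

    def register(self, client):
        self.clients.append(client)

    def update(self, key, value):
        self.data[key] = value
        for client in self.clients:
            client.on_update(self.name, key, value)

    def shutdown(self):
        # Because clients are known individually, each one hears about the exit.
        for client in self.clients:
            client.on_owner_down(self.name)


class Client:
    def __init__(self):
        self.events = []

    def on_update(self, owner, key, value):
        self.events.append(("update", owner, key, value))

    def on_owner_down(self, owner):
        self.events.append(("down", owner))


directory = Directory()
directory.advertise("ifdb", Owner("ifmgr"))
client = Client()
owner = directory.lookup("ifdb")
owner.register(client)
owner.update("Gi0/0/0/0", "up")
owner.shutdown()
```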
There are many use cases among router applications where a process-centric model turns out to be the best choice and
IOS XR makes the optimal use of this construct.
High Availability Foundation
The fact that IOS XR services run outside of the kernel in separate address spaces means that the crash of an IOS XR
process is isolated and does not crash the system, and in general does not crash other processes. Thus, a measure of
fault isolation is available, and it can be exploited for purposes of service upgrade. Restartable processes are a major
aspect of the high availability of an IOS XR system, and all IOS XR processes are required to be restartable.
Decoupled planes abstraction, explained earlier, also helps in the high availability of the system.
For restart and redundancy support, in general, checkpointing and replication services for local as well as remote
checkpointing/replication are available so that restarted applications can start up warm or even hot. Replication services
that allow checkpointing to more than one replica at a time are also available.
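A warm restart from a checkpoint can be illustrated with ordinary files standing in for local and remote replicas. This is a sketch only: the `Checkpoint` class, the JSON encoding, and the file names are assumptions, and the actual IOS XR checkpointing services are not file-based in this way as far as this blog describes.

```python
import json
import os
import tempfile


class Checkpoint:
    """Writes state to one or more replica files; paths here are illustrative."""

    def __init__(self, replica_paths):
        self.replica_paths = list(replica_paths)

    def save(self, state):
        payload = json.dumps(state)
        for path in self.replica_paths:
            # Write to a temp file, then rename, so a crash never leaves half a file.
            tmp = path + ".tmp"
            with open(tmp, "w") as handle:
                handle.write(payload)
            os.replace(tmp, path)

    def load(self):
        # A restarting process takes the first replica it can read.
        for path in self.replica_paths:
            try:
                with open(path) as handle:
                    return json.load(handle)
            except (OSError, ValueError):
                continue
        return None      # cold start: nothing to recover


workdir = tempfile.mkdtemp()
replicas = [os.path.join(workdir, "local.ckpt"),
            os.path.join(workdir, "remote.ckpt")]
Checkpoint(replicas).save({"sessions": 3, "last_seq": 42})
os.remove(replicas[0])                      # simulate losing the local copy
recovered = Checkpoint(replicas).load()     # a "restarted" process starts warm
```

Writing to more than one replica is what lets the restarted process recover even when the first copy is gone.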
IOS XR also supports many levels of redundancy in the system, including switch-over of paired RPs, OIR scenarios, and
process-level and role redundancy.
IOS XR has runtime monitoring of CPU and memory resources, so that applications that are using up too much of either
of these system resources may be terminated or have their scheduling semantics modified to reduce their impact on the
rest of that node.
IOS XR also supports NSF (Non-stop Forwarding), GR (Graceful Restart) protocols and NSR (Non-stop Routing).
The overall coordination of a router cluster, comprising either a single chassis or multiple chassis, requires that there be
communications within the cluster, some sense of the topology of the cluster, and some consistent decision making
capabilities within the cluster. To do this, IOS XR has a top-level control protocol that is responsible for electing an
overall leader of the cluster and computing the topology of the cluster.
Upgradeability Architecture
The modularity of IOS XR makes it possible to upgrade individual or smaller pieces of the software. That is, the same
architectural features that accomplish fault isolation can also be exploited to achieve a finer-grain level of software
upgradeability. The problem of software upgradeability starts at the smallest unit of software divisibility and goes to
collections of these units. It is not at all a given with an IOS XR system that a software upgrade has much of an impact on
the system. Much effort continually goes into reducing the impact of software upgrades, and it is the architecture of an
IOS XR system, as well as the software organization, that enables this process.
Because much of an IOS XR process’s code may actually reside in shared libraries/DLLs, it is possible to replace/upgrade
a portion of a process’s code by loading a new shared library. Because IOS XR processes are restartable, all that is
required to pick up a new version of a DLL is to restart a process. Some IOS XR processes do not even restart, but simply
unload and reload newer versions of DLLs.
The fundamental building block of IOS XR is the component. This is the minimum versionable unit of software. A
component is potentially field replaceable. A component may contain a process’s executable file, DLLs for sharing code
between processes, binary files for implementing a portion of the CLI, text files describing configuration rules, or other
related data. A typical example of an IOS XR component is Border Gateway Protocol (BGP). BGP is a component in
the Cisco IOS XR system that includes the BGP process, the configuration and operational models, and associated
processes.
IOS XR software is organized into packages at the highest level. Packages are collections of components and contain
meta-data that indicates the compatibility of the components in the package with other components. Packages contain
features that can be installed/activated/deactivated at runtime on an individual node, a set of nodes, or across the entire
IOS XR system.
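The relationship between packages, components, and compatibility metadata can be sketched as a pre-activation check. The blog does not describe the real metadata format, so everything here is assumed for illustration: the `Component` and `Package` classes, integer versions, and a "minimum required version" rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    name: str
    version: int


@dataclass
class Package:
    name: str
    components: list
    # Hypothetical compatibility metadata: component name -> minimum version.
    requires: dict = field(default_factory=dict)


def can_activate(package, installed):
    """Return the unmet requirements; an empty list means activation may proceed."""
    have = {c.name: c.version for p in installed for c in p.components}
    return sorted(name for name, minimum in package.requires.items()
                  if have.get(name, -1) < minimum)


base = Package("base", [Component("infra", 3), Component("rib", 2)])
routing = Package("routing", [Component("bgp", 5)],
                  requires={"rib": 2, "infra": 4})
missing = can_activate(routing, [base])   # infra 3 is older than the required 4
```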
Conclusion
The blog started off by giving a brief introduction to IOS XR’s rapid evolution over the years. It then explained the
architecture strategy via six steps: decoupled plane abstractions, state management, process distribution, high
performance messaging infrastructure, data distribution/access patterns, and high availability and upgradeability. This
logical sequencing, followed by details on each, hopefully gave you a better perspective of the IOS XR design thought
process.
I hope this blog gave good insights into various powerful architectural patterns built into IOS XR and how IOS XR has
pioneered various popular scalable and highly available frameworks in the context of a NOS. More importantly, it tries to
give a sense of what it takes to build a carrier grade NOS and how much of a key role the infrastructure plays in a well
designed NOS.
IOS XR anticipated and baked in many architectural patterns that are considered trendy and cutting edge today, and
the IOS XR evolution continues as the networking scene keeps changing around it, as it has over the last several years.
IOS XR kept the foundations simple, which made complex things possible over the years. These solid foundations and
continuous evolution have made IOS XR the best NOS for decades to come. For IOS XR, change is a process and the
future is the destination.
This site is maintained by Cisco Systems, Inc. employees. Powered by Jekyll & Minimal Mistakes.