Paper 6 - Schema-Based JSON Data Stores in Relational Databases

This article discusses the integration of JSON data with relational databases to leverage the strengths of both technologies, particularly focusing on schema-based JSON data management. It highlights the challenges of NoSQL document stores, such as the lack of standard query languages and ACID compliance, and proposes a new approach to create relational schemas from JSON schemas for efficient querying and data management. The authors present principles for mapping, querying, and projecting JSON data within relational database systems, aiming to enhance developer productivity and data analysis capabilities.

Journal of Database Management

Volume 30 • Issue 3 • July-September 2019

Schema-Based JSON Data Stores in Relational Databases
Lubna Irshad, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Li Yan, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Zongmin Ma, Nanjing University of Aeronautics and Astronautics, Nanjing, China
https://orcid.org/0000-0001-7780-6473

ABSTRACT

JSON is a simple, compact, and lightweight data exchange format for communication between web
services and client applications. NoSQL document stores have evolved with the popularity of JSON;
they support schema-less JSON storage, reduce cost, and facilitate quick development. However,
NoSQL still lacks a standard query language and supports the eventually consistent BASE transaction
model rather than the ACID transaction model, which is challenging and places a burden on the
developer. Relational database management systems (RDBMS) support JSON in binary format with SQL
functions (also known as SQL/JSON). However, these functions are not yet standardized and vary
across vendors, with different limitations and complexities. More importantly, complex searches,
partial updates, composite queries, and analyses are cumbersome and time consuming in SQL/JSON
compared to standard SQL operations. It is essential to integrate JSON into databases that use standard
SQL features, support ACID transactional models, and have the capability of managing and organizing
data efficiently. In this article, we empower JSON to use relational databases for analysis and complex
queries. The authors reveal that the descriptive nature of the JSON schema can be utilized to create
a relational schema for the storage of JSON documents. Then, the powerful SQL features can be
used to gain consistency and ACID compatibility for querying JSON instances from the relational
schema. This approach opens a gateway to combining the best features of both worlds: the fast
development of JSON, the consistency of the relational model, and the efficiency of SQL.

Keywords
JSON, JSON Schema, NoSQL, Relational Database Systems, Schema-Less

DOI: 10.4018/JDM.2019070103

Copyright © 2019, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

Web services are the means of exchanging information on mobile devices, search engines, enterprise
applications, and many more, through formats like XML (Extensible Markup Language) and JSON
(JavaScript Object Notation). For serialization, the data interchange format plays a great role in
terms of the rate of data transfer and performance (Helland, 2017). The structure of JSON is similar
to the type model of many programming languages, in particular JavaScript (JS), which makes it a
flexible, easy-to-use, and independent text format. It is simple and requires no prior knowledge to
learn and practice. JSON is a format that fills a particular niche: integrating multiple services
across many platforms. XML and JSON, both being semi-structured and hierarchical data models for
data exchange, are popular serialization techniques in Web development. Compared to JSON, XML is
heavy and complex and requires additional libraries to exchange data. Its model consists of complex
hierarchical tags and requires more bytes for data transfer even for a small task. JSON is
lightweight, compact, and close to many programming languages. It is an emerging data-interchange
format that stands out against XML and many others such as Atom, RDF (Ma, Lin, Yan, & Zhao, 2018;
Ma, Jia, Cheng, & Angryk, 2016), REBOL, Gellish, YAML, and so on.
A JSON document consists of objects that are made up of attributes/keys (of string type), values
(String, Number, Boolean, and NULL), arrays, and objects. An object is a collection of pairs
(attributes and values), and the value of a pair can itself be a JSON object (Bourhis, Reutter,
Suárez, & Vrgoč, 2017). A JSON document can hold hierarchical data such as nested objects and
arrays. Being simple, easy, and lightweight, JSON has become the format of choice for most Web
services. JSON is especially favored for storing temporary data (such as the contents of a Web form
being filled in) and for exchanging information between servers and clients. Note that it cannot be
used for permanent storage, data analysis, and processing of complex queries¹. So, database support
for storing and querying JSON came into existence (Liu, 2019; Irshad, Ma, & Yan, 2019; Junkkari et
al., 2016; Hu & Dessloch, 2015). Although JSON can work standalone, database support provides
secure, easy, and fast processing of information. The situation resembles that of XML in databases
(Fong & Shiu, 2012). In addition, it is difficult to manage data shared by multiple users in a
document, but it is easy in databases. Two types of database models support JSON data management:
the relational database management system (RDBMS) and NoSQL (Not only SQL) (Liu, 2019; Irshad, Ma,
& Yan, 2019). We categorize them on the basis of their data structure, model, data organization,
and transaction and querying mechanisms.
NoSQL databases encompass document databases, key-value stores, columnar stores, and graph stores
(Ma, Capretz, & Yan, 2016). JSON document stores belong to the category of NoSQL document
databases, which have evolved with the popularity of JSON. NoSQL data stores are gaining popularity
but still lack a powerful standard query language. Although the query language of MongoDB is easy
and close to SQL compared to those of many other document stores, it requires MapReduce for
aggregation and complex queries. CouchDB provides better consistency but has a complex query
language. Learning a new language every time an application or project demands it is a cumbersome
and time-consuming task. Standardization of the query language across all NoSQL document stores is
the biggest demand at present. The NoSQL document stores CouchDB and MongoDB claim that they will
provide the ACID (Atomicity, Consistency, Isolation, Durability) transaction model in a coming
version. However, first, the ACID model is implemented at the single-document level only and,
second, it is not yet fully tested and available. The BASE transaction model, under which data
become only eventually consistent, unlike under the ACID model, is not suitable for all
applications.
For a traditional RDBMS, it is compulsory to design the schema carefully before storing data in
the relational tables, creating relations among tables, and then querying the data (Liu,
Hammerschmidt, & McMahon, 2014). It enforces relationships among tables using the
entity-relationship model (Chen, 2002) by breaking down entities into tables and relations (Liu,
Hammerschmidt, & McMahon, 2014). Good schema design enables fast and efficient querying. Advocates
of NoSQL document stores claim that schema maintenance is costly for rapidly growing data, while
proponents of RDBMS argue that schema-less development results in long-term issues (Tahara,
Diamond, & Abadi, 2014). NoSQL databases accommodate semi-structured and unstructured data that may
change from time to time, while relational databases deal with structured data. NoSQL does not have
a powerful, generalized query language, while RDBMS have SQL, a strong native language for fast and
efficient querying. NoSQL follows a “schema later or never” approach, while RDBMS follow the
relational data model (Melton, 2003) based on a “schema earlier, data afterward” approach (Liu et
al., 2016). Relational databases strictly follow the ACID model, while NoSQL databases follow BASE
principles (Chandra, 2015). NoSQL document stores generally do not provide explicit locks and have
weaker concurrency and atomicity properties than traditional ACID-compliant databases (Cattell,
2011).
In a contemporary world where time to market is crucial and agile development is gaining
popularity, the no-schema (or never-schema) nature of JSON really helps quick development. Removing
the schema definition overhead reduces the workload and makes the application functional in no time
without worrying about incidental details. However, these incidental details may later cause many
difficulties in the organization, maintenance, reusability, and sharing of data (Tahara, Diamond, &
Abadi, 2014). Besides, dealing with huge JSON documents requires validation rules, constraint
checking for integrity, and joining of multiple documents. The notion of structure or schema is
essential for traditional application development and data analysis (Klettke, Störl, & Scherzinger,
2015). JSON schema plays this role for JSON documents (Baazizi, Colazzo, Ghelli, & Sartiani, 2019;
Pezoa et al., 2016). A JSON schema is used optionally to describe the structure of a JSON document
by defining attributes with their names, types, properties, constraints, hierarchy, and relations
with other attributes. It describes the structure, defines interaction control, and affirms the
validity of the JSON document. However, it differs from a relational schema in two ways. First, it
is not compulsory to design a JSON schema prior to the creation of a JSON document. Second, apart
from validation, it plays little role in querying a JSON document efficiently. The representation
of data in the JSON format is increasing. As a matter of fact, this substantial and growing amount
of data will later require analysis too. Pragmatically, analysis requires up-to-date information, a
structure definition, and an efficient query language. The descriptive power of JSON schema can be
used to create the schema in an RDBMS, and the prevailing SQL query language can then provide
quick, efficient, and reliable querying.
Both NoSQL document databases and RDBMS are used in different scenarios according to the
requirements. The developer needs to learn both of them to enhance productivity. However,
integrating the two approaches would be more beneficial, letting the developer gain the best of
both on a single platform. This paper is devoted to combining both worlds in order to draw out the
best of each and resolve the issues mentioned above. We intend to incorporate JSON into RDBMS to
obtain the benefits of both platforms along with the powerful features of SQL. This can reduce the
burden of learning a new technology and a new language for every application, as required with
NoSQL document databases, and of learning the new JSON-specific operations offered by relational
database systems. For this vision, we articulate a new approach that will open new horizons for
research. A few efforts have been devoted to JSON data management with RDBMS, but the proposed
approaches mainly focus on schema-less development in RDBMS (Liu, Hammerschmidt, & McMahon, 2014;
Liu, 2019). In contrast to (Liu, Hammerschmidt, & McMahon, 2014; Liu, 2019), in (Chasseur, Li, &
Patel, 2013; Petković, 2017a & 2017b), JSON objects are split and stored in one or several
relational tables. In this paper, we suggest using the descriptive nature of JSON schema to create
a relational schema for JSON document management, following three principles.

Mapping and Storage Principle
We transform a JSON schema into a relational schema using our proposed mapping algorithm (Algorithm
1), following an architecture similar to that of relational databases, and create tables using a
vertical table approach. We keep the singleton attributes in the master table and manage objects
and arrays in child tables (Atzeni et al., 2013). We develop an algorithm for storing JSON objects
with a unique object_id, which is used for reference and for maintaining the relations between the
master and child tables.
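As an illustration of the master/child layout described above, the following is a minimal sketch and not the authors' Algorithm 1. The example schema, the table naming rule (master name plus attribute name), and the `position` and `value` columns are all assumptions made for this example; for brevity it uses ordinary column-per-attribute tables rather than the vertical layout of the paper, and it handles only one level of nesting.

```python
import sqlite3

# Hypothetical JSON schema; the property names are illustrative only.
SCHEMA = {
    "title": "person",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "zip": {"type": "string"}},
        },
        "phones": {"type": "array", "items": {"type": "string"}},
    },
}

# Assumed mapping from JSON schema primitive types to SQL column types.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "number": "REAL", "boolean": "INTEGER"}


def schema_to_ddl(schema):
    """Return CREATE TABLE statements: one master table plus one child table
    for every object-valued or array-valued attribute."""
    master = schema["title"]
    master_cols = ["object_id INTEGER PRIMARY KEY"]
    child_ddl = []
    for name, spec in schema["properties"].items():
        kind = spec["type"]
        ref = f"object_id INTEGER REFERENCES {master}(object_id)"
        if kind == "object":
            cols = [f"{k} {SQL_TYPES[v['type']]}" for k, v in spec["properties"].items()]
            child_ddl.append(f"CREATE TABLE {master}_{name} ({ref}, {', '.join(cols)})")
        elif kind == "array":
            item_type = SQL_TYPES[spec["items"]["type"]]
            child_ddl.append(
                f"CREATE TABLE {master}_{name} ({ref}, position INTEGER, value {item_type})"
            )
        else:
            # Singleton attributes stay in the master table.
            master_cols.append(f"{name} {SQL_TYPES[kind]}")
    return [f"CREATE TABLE {master} ({', '.join(master_cols)})"] + child_ddl


conn = sqlite3.connect(":memory:")
for statement in schema_to_ddl(SCHEMA):
    conn.execute(statement)

tables = sorted(row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"))
print(tables)
```

Every child table carries the `object_id` of its parent row, which is what later allows a stored instance to be joined back together.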

Query Principle
For querying, we apply some generalized transformation principles along with traditional SQL
statements to perform all operations. Because we use a simple relational structure, querying is as
simple and easy as in any relational database system. Our approach allows the user to work with
JSON as with a relational database by querying directly with standard SQL statements.
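To make the idea of querying with standard SQL concrete, here is a small sketch under assumed names: a `person` master table and a `person_phones` child table linked by `object_id`. The tables, columns, and data are hypothetical and are not taken from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (object_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE person_phones (
        object_id INTEGER REFERENCES person(object_id),
        position INTEGER,
        value TEXT
    );
    INSERT INTO person VALUES (1, 'Ada', 36), (2, 'Lin', 29);
    INSERT INTO person_phones VALUES
        (1, 0, '555-0100'), (1, 1, '555-0101'), (2, 0, '555-0200');
    """
)

# A selection on a singleton attribute combined with an aggregate over an
# array attribute, written with ordinary JOIN / WHERE / GROUP BY.
rows = conn.execute(
    """
    SELECT p.name, COUNT(ph.value) AS phone_count
    FROM person AS p
    JOIN person_phones AS ph ON ph.object_id = p.object_id
    WHERE p.age > 30
    GROUP BY p.object_id, p.name
    """
).fetchall()
print(rows)
```

No JSON-specific function appears in the query; the array is reached through a plain join on `object_id`.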

Projection Principle
Data can be projected to the user in relational or JSON format. Owing to the basic relational
structure, queried data can easily be displayed in relational format. To display data in JSON
format, we suggest recomposing the JSON document from the relational tables according to the unique
object_id of the stored JSON objects. We propose an algorithm (Algorithm 3) to transform
user-queried data into JSON format.
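The recomposition step can be sketched as follows. This is an illustrative simplification and not the authors' Algorithm 3; the `person` and `person_phones` tables, their columns, and the `recompose` helper are assumptions made for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (object_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE person_phones (object_id INTEGER, position INTEGER, value TEXT);
    INSERT INTO person VALUES (1, 'Ada', 36);
    INSERT INTO person_phones VALUES (1, 1, '555-0101'), (1, 0, '555-0100');
    """
)


def recompose(conn, object_id):
    """Rebuild one JSON object from the master row and its child rows."""
    name, age = conn.execute(
        "SELECT name, age FROM person WHERE object_id = ?", (object_id,)
    ).fetchone()
    # Array elements are collected from the child table in their stored order.
    phones = [
        value
        for (value,) in conn.execute(
            "SELECT value FROM person_phones WHERE object_id = ? ORDER BY position",
            (object_id,),
        )
    ]
    return {"name": name, "age": age, "phones": phones}


document = recompose(conn, 1)
print(json.dumps(document, indent=2))
```

The `position` column is what preserves array order, since relational rows have no inherent ordering.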
Our approach is simple and close to relational database architecture. It is easy to query JSON
data by using traditional SQL statements. To the best of our knowledge, our approach is the first
effort to manage schema-based JSON data with RDBMS. We hope this approach will help JSON and RDBMS
complement each other on a single platform and help developers complete their tasks efficiently and
rapidly.
The above-mentioned principles are discussed in detail in the following sections. Section 2
highlights the related work. Section 3 defines generalized mapping principles and an algorithm to
create a relational schema from a JSON schema. Section 4 states the principles of JSON instance
operations for projection, selection, insertion, updating, and deletion of JSON instances in the
relational schema. In addition, two algorithms are defined: one to store JSON data in the created
relational schema and a second to display user-queried results in JSON file format. Section 5
presents implementation and experimental details. Section 6 draws conclusions and outlines future
work.

RELATED WORK

JavaScript Object Notation (JSON) is unstructured, flexible, and readable by both humans and
machines. With the rapid increase of data and users, JSON has become a current hotspot for data
management on the Web. Various approaches have been proposed for JSON data representation and
processing (Bourhis, Reutter, Suárez, & Vrgoč, 2017). Many database management systems, both NoSQL
document stores and relational databases, support the JSON format to store, retrieve, and organize
JSON data. JSON documents can be stored as text in key-value stores (NoSQL) or in relational
database systems as a JSON datatype.

JSON Data Processing
JSON is a raw data format, and JSON data must be parsed before it can be further processed or
analyzed (Li, 2017). Parsing JSON data is expensive, and a new JSON parser called Mison is proposed
in (Li, 2017), which is reported to be typically one order of magnitude faster than any existing
JSON parser. By pushing down both the projection and filter operators of analytical queries into
the parser, Mison supports a wide range of applications, including analytics on JSON data,
real-time streaming of JSON data, and processing of JSON messages at client machines. To make JSON
parsing as fast as possible, Langdale & Lemire (2019) present the first standard-compliant JSON
parser to process gigabytes of data per second on a single core, using commodity processors.
For efficient processing of JSON objects in data-intensive applications, Bonetta & Brantner
(2017) introduce a runtime system named Fad.js, which is based on speculative just-in-time (JIT)
compilation and selective access to data. With respect to JSON data access, querying is one of the
most useful forms. Hidders, Paredaens, & Van den Bussche (2017) investigate the foundations of
querying JSON data and, based on Datalog, propose a logical framework named J-Logic. The main
feature of their approach is the emphasis on paths, which are sequences of keys and are used to
access the tree structure of nested JSON objects. J-Logic also features “packing” as a means to
generate a new key from a path or subpath. For JSON data querying, query languages are essential.
Florescu & Fourny (2013) review JSONiq, a query language for JSON data. In addition to JSONiq,
there are other query languages such as JSONPath, JSONSQL, and JAQL. Similar to XPath for locating
data in an XML document, JSONPath² retrieves information by a path expression written in
“dot-notation” or “bracket-notation”. In addition, the existing SQL/JSON standard lacks support for
full-text search queries. Petkovic (2018) proposes a set of design goals that a full-text search
extension of the SQL/JSON language should support; the given set is valid for any full-text search
language in relation to JSON documents.
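The notion of a path as a sequence of keys, written in dot-notation or bracket-notation, can be illustrated with a small evaluator. This is a sketch covering only a tiny subset of path syntax (member names, quoted names, and integer indexes); it is not an implementation of JSONPath or J-Logic, and the function names are hypothetical.

```python
import re

# One path step: .name  |  [index]  |  ['quoted name']
STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]|\['([^']*)'\]")


def parse_path(path):
    """Turn a path expression starting with '$' into a list of keys and indexes."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    keys, pos = [], 1
    while pos < len(path):
        match = STEP.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse path at offset {pos}")
        name, index, quoted = match.groups()
        if index is not None:
            keys.append(int(index))
        else:
            keys.append(name if name is not None else quoted)
        pos = match.end()
    return keys


def evaluate(document, path):
    """Follow the sequence of keys through nested dicts and lists."""
    node = document
    for key in parse_path(path):
        node = node[key]
    return node


doc = {"store": {"book": [{"title": "A"}, {"title": "B"}]}}
print(parse_path("$.store.book[1].title"))
print(evaluate(doc, "$.store.book[1].title"))
print(evaluate(doc, "$['store']['book'][0]['title']"))
```

Both notations reduce to the same key sequence, which is why they can be mixed within one expression.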
In addition to the JSON document, the JSON schema is also an important part of JSON data, and it
generally plays a crucial role in JSON data management (Baazizi, Colazzo, Ghelli, & Sartiani, 2019;
Pezoa et al., 2016; Wright, Andrews, & Lu, 2016). In (Hai, Quix, & Kensche, 2018), for example,
JSON schema is used for JSON data integration, where a nested mapping representation is generated.
Recognizing that schema information is sometimes essential for data retrieval, integration, and
analysis tasks, some efforts have been devoted to dealing with JSON schema. In (Mok, 2016), using
Nested Normal Form as a guide, a JSON schema design methodology is proposed in order to design
redundancy-free JSON schemas; it begins with UML use case diagrams, communication diagrams, and
class diagrams that model the system under study. Note that a JSON schema is not always available
for direct use. To discover implicit schemas in JSON data, a model-based approach is proposed in
(Cánovas Izquierdo & Cabot, 2013) to generate the underlying schema of a set of JSON documents. In
(Klettke, Störl, & Scherzinger, 2015), an algorithm is introduced for schema extraction that
operates outside of the NoSQL data store. Identifying a JSON type language, Baazizi et al. (2017)
deal with the problem of inferring a schema from massive JSON datasets. They design a schema
inference algorithm and implement it on Spark. Frozza et al. (2018) propose an approach for
extracting a schema from a JSON or extended JSON document collection stored in a NoSQL
document-oriented database or other document repository. Focusing on the use of open Web APIs from
different sources with JSON as the interchange data format, Cánovas Izquierdo & Cabot (2016)
develop a tool named JSONDiscoverer, which can discover and visualize the implicit schema of JSON
documents as well as possible composition links among JSON-based Web APIs. More importantly, for
JSON document stores, a schema management framework is proposed in (Wang et al., 2015), which
discovers and persists schemas of JSON records in a repository and supports queries and schema
summarization.
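To give a feel for what discovering an implicit schema involves, the following is a deliberately naive sketch that records, for the top-level attributes of a document collection, the observed types and whether the attribute occurs in every document. It does not reproduce any of the cited algorithms; the output format and function names are assumptions.

```python
def type_name(value):
    """Map a parsed JSON value to a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before numbers: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(documents):
    """Collect observed types per top-level key and flag keys present in every document."""
    types, counts = {}, {}
    for doc in documents:
        for key, value in doc.items():
            types.setdefault(key, set()).add(type_name(value))
            counts[key] = counts.get(key, 0) + 1
    return {
        key: {"types": sorted(types[key]), "required": counts[key] == len(documents)}
        for key in sorted(types)
    }


docs = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": None, "tags": ["x"]},
]
schema = infer_schema(docs)
for key, info in schema.items():
    print(key, info)
```

Real inference approaches additionally have to merge nested structures and scale to large collections, which this sketch ignores.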

JSON in NoSQL Databases
Increasing demands from modern applications have given rise to NoSQL databases, which include a
wide variety of database technologies; among them, the document store is the most popular and
widely used. MongoDB is an open-source NoSQL document database that provides high performance, high
availability, and automatic scaling (Chodorow & Dirolf, 2010). It supports dynamic schemas and
collections of documents, which can reduce the expensive joins needed in relational databases.
Besides its schema-less nature, its main features include horizontal scaling, indexing, aggregation
support, multiple storage engines, and so on. MongoDB stores data in groups called collections. A
collection is a set of documents or key-value pairs, which can be retrieved by their unique
identifiers. The stored values can be of Boolean, integer, float, date, string, and binary types as
well as arrays of documents and key-value pairs. MongoDB stores data in BSON (binary JSON) format.
In addition, MongoDB supports a rich query language, supporting CRUD (Create, Read, Update, Delete)
operations, data aggregations, text search, and geospatial queries (Chodorow & Dirolf, 2010).
Simple queries are flexible and easy to construct for those who understand SQL. For complex
aggregation, a MapReduce job is required, which is a bit tricky to handle. The replication model,
known as the “Replica Set”, manages fail-overs and data redundancy automatically. A replica set is
a group of MongoDB servers that maintain the same data set, providing redundancy and increasing
data availability (Chodorow & Dirolf, 2010). MongoDB supports dynamic queries with automatic use of
indices, like RDBMS (Cattell, 2011). MongoDB provides scaling through the sharding technique.
Sharding, a tedious task, is the process of distributing data across multiple machines, which
requires a proper division of the data and management of read/write operations between the
different machines.
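The division of data that sharding requires can be sketched with a simple hash-based routing rule. This illustrates the general idea only and is not how MongoDB implements sharding; the shard key `user_id`, the shard count, and the helper names are assumptions.

```python
import hashlib


def shard_for(key, shard_count):
    """Pick a shard index from a stable hash of the shard key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(documents, shard_key, shard_count):
    """Place every document on the shard selected by its shard-key value."""
    shards = [[] for _ in range(shard_count)]
    for doc in documents:
        shards[shard_for(str(doc[shard_key]), shard_count)].append(doc)
    return shards


docs = [{"user_id": f"u{i}", "score": i} for i in range(10)]
shards = distribute(docs, "user_id", 4)
for index, shard in enumerate(shards):
    print(index, [doc["user_id"] for doc in shard])
```

Because the routing depends only on the key, a read for a given `user_id` can be sent to a single machine, while a query on any other attribute has to visit every shard.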
Another famous NoSQL document store that has gained a lot of popularity is CouchDB, an Apache
project (Cattell, 2011). CouchDB stores document data in the JSON format. CouchDB is somewhat
similar to MongoDB. It is schema-less and lockless, and it provides indexes on collections,
scalability, and a document query mechanism. Here, the document query mechanism handles documents
like attachments, and the fields consist of text, numbers, Booleans, and lists. CouchDB uses a
RESTful HTTP API to save, update, retrieve, and delete data in JSON documents. Querying CouchDB is
a bit complex and is possible only on predefined views. These predefined views make it possible to
filter documents, retrieve data in a specific order, and create indexes. CouchDB uses MapReduce for
defining views and for indexing, searching, and aggregating data. This way of working is
cumbersome. Here, Map/Reduce is twofold: it looks at all of the documents, and it creates a mapping
of the documents for further processing. Mapping is a one-time process and occurs again only if a
document is updated. Like MongoDB, CouchDB provides a replication model, named “Eventual
Consistency.” In this model, changes are copied to the nodes one by one without affecting the other
nodes, and eventually all nodes are in sync (MongoDB's replication model, by contrast, is intended
for failover rather than scalability). When documents are updated, the commit operation flushes all
updates to disk by appending them to the end of the file, which lowers the risk of conflicts. A
developer can replicate selectively, filter documents, and copy them to a specific device. This
helps in optimizing the memory usage of mobile devices. So, many platforms recommend CouchDB for
mobile applications due to its low memory utilization.
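The map and reduce steps behind a predefined view can be illustrated in plain Python. This sketch does not use CouchDB or its API; the `map_doc`, `reduce_values`, and `build_view` names and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit one (key, value) pair per tag of a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def build_view(documents):
    """Run the map step over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}


docs = [
    {"_id": "a", "tags": ["json", "nosql"]},
    {"_id": "b", "tags": ["json"]},
    {"_id": "c"},
]
view = build_view(docs)
print(view)
```

The view has to be defined before it can be queried, which is the restriction the paragraph above describes: only the keys emitted by the map step are available for lookup.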
JSON document stores such as MongoDB (Chodorow & Dirolf, 2010) and CouchDB (Anderson, Lehnardt, &
Slater, 2010) can store and retrieve JSON objects in their primary format. Its efficient binary
format, query language, and secondary indices make MongoDB lead CouchDB in the race of NoSQL
document stores (Chasseur, Li, & Patel, 2013). Couchbase Server 4.0 has introduced N1QL, a powerful
query language that extends SQL to JSON, which enables developers to leverage both the power of SQL
and the flexibility of JSON (Petkovic, 2017).
Note that JSON document stores, along with their many pros, have some cons compared to relational
database systems. Most importantly, JSON document stores lack a powerful query language and ACID
transaction features (Chandra, 2015). Data analysis or processing without a powerful query language
and ACID transaction properties is very challenging and a burden on the developer. Document stores
generally do not provide explicit locks and have weaker concurrency and atomicity properties than
traditional ACID-compliant databases (Cattell, 2011). On the one hand, JSON document stores help
the developer by reducing development overhead and cost and can provide quick services. On the
other hand, the developer loses the rich power of native SQL query constructs, analytical
processing, safety-guaranteed transactions, and much more. For these reasons, CouchDB and MongoDB
have recently been trying to achieve ACID-compliant transactions for a single document. They want
to leverage the capabilities of traditional relational databases and make their design more mature.
ACID compliance in NoSQL document stores is going to be an added benefit for the data collected or
generated in the future. With the popularity of NoSQL databases, JSON persistence in RDBMS became
an open question (Liu, Hammerschmidt, & McMahon, 2014; Liu, 2019). RDBMS have been serving and
evolving for more than 30 years; they have faced many threats and yet have not only survived but
also excelled afterward (Atzeni et al., 2013). A great deal of work has been done on RDBMS and SQL.
It does not seem sensible to throw away all these efforts and start the journey from the outset
without building on the work of many years (Atzeni et al., 2013).

JSON in Relational Databases
NoSQL and RDBMS now both handle JSON documents well, depending upon the application requirements.
NoSQL document stores perform very well in applications that do not require many operations, much
analysis, or strong consistency. But RDBMS is the best option if consistency, performance, ACID
transactions, and analytical processing are required. JSON in RDBMS has been managed by both
schema-less and schema-based designs.

JSON Schema-Less Development in RDBMS
Schema-less JSON in relational databases is a new paradigm, which opens a new door to fast and
efficient development of JSON data management. The main theme of this approach is to store valid
JSON instances as a whole in a binary or text column. It stores JSON as an object along with a
unique identifier, which identifies each instance, and uses the latest SQL/JSON constructs for its
management. As a result, the functional, performance, and conceptual gap between the SQL and NoSQL
worlds can be reduced. The biggest advantage of using the relational model for JSON data is that
the capabilities of database servers (e.g., fast tuning, high scalability, and refined optimization
techniques) can be applied to JSON data to achieve high performance (Beyer et al., 2005).
The approach in (Wright, Andrews, & Lu, 2016) emphasizes managing both relational and JSON data
under the one umbrella of RDBMS, where, after the implementation of a JSON path query language, SQL
can serve both structured and unstructured data. It proposes to save native JSON objects in
relational databases without shredding them into a relational model. So, the JSON object instances
are stored in a relational schema without a formal schema definition. Nowadays, major database
systems, such as Oracle, MySQL, and PostgreSQL, provide a (binary) JSON data type and a textual
type for storing JSON data. Text and binary data types both support JSON document management, but
it is recommended to use the binary data type with the provided JSON-specific functions. Almost all
database vendors use the same approach on the front end: storing JSON document instances in binary
format, and using JSON-specific functions and JSON-formatted data retrieval for querying. To
navigate the path hierarchy, JSON instance data can be accessed with the “.” prefix or with the
“->” and “->>” notations, depending on the relational system.
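The store-as-a-whole pattern can be sketched as follows. Because the SQL/JSON functions and path operators differ between vendors, this example deliberately avoids them: it keeps each instance in a text column with a unique identifier and extracts attributes in application code. The `docs` table and the helper names are assumptions, and a real schema-less RDBMS deployment would push the filtering into the database through its JSON-specific functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per JSON instance: a unique identifier plus the whole document as text.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def store(conn, instance):
    """Serialize the instance and store it unshredded; returns the new identifier."""
    text = json.dumps(instance)
    return conn.execute("INSERT INTO docs (body) VALUES (?)", (text,)).lastrowid


def select_where(conn, key, value):
    """Return all stored instances whose top-level attribute `key` equals `value`.
    Every document is parsed again on each call, which is the cost of not shredding."""
    matches = []
    for (body,) in conn.execute("SELECT body FROM docs ORDER BY id"):
        instance = json.loads(body)
        if instance.get(key) == value:
            matches.append(instance)
    return matches


first_id = store(conn, {"name": "Ada", "city": "Nanjing"})
store(conn, {"name": "Lin", "city": "Suzhou", "tags": ["x"]})
store(conn, {"name": "Mei", "city": "Nanjing"})
found = select_where(conn, "city", "Nanjing")
print([doc["name"] for doc in found])
```

No table definition mentions `name`, `city`, or `tags`, so instances with different attributes can share the same column.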
In (Liu, Hammerschmidt, & McMahon, 2014), three architectural principles are proposed, which
facilitate a schema-less development style within an RDBMS. As a result, RDBMS users can store,
query, and index JSON data without requiring schemas. It is shown in (Liu, Hammerschmidt, &
McMahon, 2014) that the three principles can be applied to the Oracle RDBMS Server with relatively
little effort. With the approach proposed in (Liu, Hammerschmidt, & McMahon, 2014), an RDBMS can
manage both relational data and JSON data on one platform, where SQL is used with an embedded JSON
path language as a single declarative language to query both relational data and JSON data. To
support the operations defined by the SQL/JSON standard effectively and efficiently, design
approaches for storing, indexing, and querying JSON in the kernel of an RDBMS are presented in
(Liu, 2019). The issue of how the JSON data model and the classical relational model complement
each other in a single RDBMS is further addressed there. Concentrating on the field of criminal
data, (Piech & Marcjan, 2018) propose a solution for storing open-schema data in a relational
database that supports JSON as a native type. The proposed solution is implemented in two
relational databases, PostgreSQL and MySQL.
Only valid JSON documents are stored, according to the JSON validation rules. A somewhat complex
part is the JSON-specific functions. Being new and not yet well understood, they require much more
knowledge than traditional SQL statements when used for complex queries and analysis. In addition,
querying and handling the same attribute with multiple values requires many embedded constructs,
which must be part of the query to retrieve the full data without errors. For recursive structures,
it is difficult to manage variation in many attributes, changes in cardinalities among attributes,
and large hierarchical structures. Last but not least, many APIs are readily available to work with
JSON. But, in general, these APIs work well with text/String rather than binary data, which
requires Web developers to do a lot of coding. SQL/JSON is the standard for querying JSON data, but
the provided JSON-specific functions and operations are not standardized. The drafts from different
organizations vary from one relational system to another in terms of storing, updating, retrieving,
and handling attributes with multiple data types in a JSON document. With all the issues mentioned
above, shredding and saving JSON objects is not only tedious but also requires a lot of time.
One idea that has not been explored yet is the use of JSON schema in addition to the JSON document
across platforms. A JSON schema describes the structure and can be used to validate a JSON
document. A JSON schema can help create a relational schema, which is helpful in converting a user
query into SQL parameters for information retrieval. Because the result is simple and close to
relational database architecture, it will be easy to query JSON data from the created relational
tables by using traditional SQL operators, with less overhead.

Schema-Based Development of JSON in RDBMS


The approach proposed in (Chasseur, Li, & Patel, 2013) breaks down JSON objects into
path-value vertical relational tables based on their primitive types, with “Argo” running on
top of a traditional RDBMS. “Argo” gives applications and users direct access to JSON data over
the relational model, along with an SQL-based query language for JSON called Argo/SQL
(Chasseur, Li, & Patel, 2013). The approach consists of two major parts: the “Argo Mapping Layer”
and “Argo/SQL”. The Argo Mapping Layer maps JSON documents to the relational database, and Argo/SQL
is used for querying JSON data. “Argo” thus provides a layer that bridges the gap between JSON data and
relational systems by combining JSON’s flexible schema-less nature with the efficient query constructs
and ACID transactions of traditional relational systems (Chasseur, Li, & Patel, 2013). “Argo”
recommends shredding JSON objects into vertical relational tables, one table per primitive data
type, each consisting of three columns that store the key path (keystr), the value (keyval) and a unique
identifier (keyId). For hierarchical data such as objects and arrays, a key-flattening approach is used. The
parent key together with the separator character “.” is used to store nested objects and arrays as a
path expression in the “keystr” column, with the value in the “keyval” column of the table. Array elements
are stored with their indexes (enclosed in square brackets) to record their positions. Some substantial
issues may arise in this approach. If a JSON instance contains a deeply nested hierarchy with long attribute
key names, the path value requires a large capacity that may exceed the column size limit of the RDBMS,
and querying the underlying data becomes more complex and inefficient.
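The key-flattening idea described above can be illustrated with a minimal Python sketch. It is not the Argo implementation: the function names, the sample document and the use of key_id as a per-document identifier are assumptions made for illustration only.

```python
import json

def flatten(value, path=""):
    """Yield (keystr, keyval) pairs for every primitive value in a JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            # nested keys are joined to the parent key with the "." separator
            yield from flatten(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # array positions are recorded in square brackets
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, value

def shred(key_id, document):
    """Split flattened pairs into one row list per primitive type."""
    tables = {"string": [], "number": [], "boolean": []}
    for keystr, keyval in flatten(document):
        if isinstance(keyval, bool):          # test bool first: bool is a subclass of int
            tables["boolean"].append((key_id, keystr, keyval))
        elif isinstance(keyval, (int, float)):
            tables["number"].append((key_id, keystr, keyval))
        elif isinstance(keyval, str):
            tables["string"].append((key_id, keystr, keyval))
        # null values are skipped in this sketch
    return tables

# Hypothetical sample document
document = json.loads("""
{"Name": "Ali", "Age": 40,
 "Address": {"Temporary_Address": {"City": "Lahore"}},
 "Kids": ["Sara", "Omar"]}
""")
tables = shred(1, document)
for row in tables["string"]:
    print(row)
```

The path "Address.Temporary_Address.City" shows how quickly keystr grows with nesting depth and key length, which is the capacity concern raised above.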
The approach in (Petkovic, 2017) maps a whole JSON document into one table for all primitive types
and manages hierarchical objects and arrays through key-path values in that single table. Attribute keys
are stored in keystr, and their values are stored by basic JSON type in the columns valstr for TEXT,
valnum for DOUBLE PRECISION, and valboolean for BOOLEAN. If a value of one of the three types
does not exist, the column of that type contains Null (Petkovic, 2017). For queries, this approach
introduces a language called “Argo/SQL”, which converts SQL queries to JSON format to store and
retrieve JSON objects. The approach works well if instances of a JSON document hold values of multiple
types for the same attribute. Note, however, that storing everything in one table not only increases the
size of the table but also wastes storage space because of the null values in each row. In addition, nested
objects and arrays may produce complex key-path values that are difficult to retrieve.
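To make the storage cost of the single-table layout concrete, the following Python sketch builds rows with the columns keystr, valstr, valnum and valboolean and counts the unused (Null) cells. The helper name, the document identifier and the sample attributes (including “Married”) are hypothetical.

```python
def to_single_table_rows(doc_id, flat_pairs):
    """Build (doc_id, keystr, valstr, valnum, valboolean) rows; unused type columns stay None."""
    rows = []
    for keystr, value in flat_pairs:
        valstr = valnum = valboolean = None
        if isinstance(value, bool):            # test bool before number
            valboolean = value
        elif isinstance(value, (int, float)):
            valnum = float(value)              # DOUBLE PRECISION column
        else:
            valstr = str(value)                # TEXT column
        rows.append((doc_id, keystr, valstr, valnum, valboolean))
    return rows

rows = to_single_table_rows(1, [("Name", "Ali"), ("Age", 40), ("Married", True)])
null_cells = sum(cell is None for row in rows for cell in row)
print(null_cells)  # two of the three value columns are empty in every row
```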
In this paper, we use JSON schema as a bridge to fill the gap between NoSQL
document stores and relational systems. Our schema-based development of JSON in RDBMS uses
several tables rather than the single table used in (Petkovic, 2017). Similar to our approach, three tables
with separate types (in one of the two variants of the approach) are used in (Chasseur, Li, & Patel, 2013).
However, we use completely different tables from those in (Chasseur, Li, & Patel, 2013), consider
data types in the mapping from JSON to RDBMS, and develop detailed processing algorithms. Our
approach also differs from that in (Chasseur, Li, & Patel, 2013) because we explicitly
consider JSON schema in addition to JSON instance storage.

JSON SCHEMA TO RELATIONAL SCHEMA

JSON schema is used to define, describe and validate JSON documents. In this paper, we
present an approach that uses JSON schema with RDBMS. We aim to combine the best features
of RDBMS and JSON. To this end, we suggest using the descriptive nature of
JSON schema to create a relational schema for the management of JSON documents. We develop
an algorithm to transform JSON schema into a relational schema. The approach consists of a set of
principles that define the details of this transformation. To illustrate the concept, we take the
JSON schema in Figure 1 and transform it into a relational schema systematically. We also take the
sample JSON data in Figure 2 and store it in the created schema.

Figure 1. Sample JSON schema file

JSON to Relational Schema Extraction


Extracting a relational schema from JSON schema has three steps. The first is to create metadata tables.
The second is to parse the JSON schema file and save the extracted information in the metadata tables.
The third is to create relational tables by reading these metadata tables. To create a relational
schema from JSON schema, we first create the metadata tables “Relational Structure Master Table”
(RS_Master_Table) and “Relational Structure Child Table” (RS_Child_Table). Their structures are
shown in Table 1 and Table 2, respectively. Collectively we call them “Relational Structure Tables”
(RS_Tables). Then, we use these RS_Tables to build the structure of the relational schema extracted
from the JSON schema file. The most important step is to parse the JSON schema and extract its metadata
information. For this purpose, we parse the JSON schema file line by
line, resolve all reference pointers ($ref) and special keys (‘oneOf’, ‘allOf’, ‘anyOf’), and store the
structure information in RS_Tables. We keep category-wise details in the Relational
Structure Master Table (RS_Master_Table) and attribute-wise details in the Relational Structure Child
Table (RS_Child_Table). Once the structure is saved in RS_Tables, we can query these RS_Tables
to create tables as well as the relationships between the tables.

Figure 2. Sample JSON data file

Table 1. Relational structure master table (RS_Master_Table)

Table Columns     | Mapping From JSON Schema | Details of Columns
RS_ID             | ----------               | Primary key of the table
Level             | ----------               | Hierarchy of the object
objectName        | Object Title             | Title of the root object
tableName         | Object Short Name        | Short alias of the root object
attributeCategory | Object, Array, Singleton | Object, Array, Singleton
parentLevel       | ----------               | Level of the parent or container attribute
pkColumn          | ----------               | Title of the primary key column

Table 2. Relational structure child table (RS_Child_Table)

Table Columns     | Mapping From JSON Schema | Details of Columns
RS_CID            | ----------               | Primary key of the child table
attributeName     | Attribute Title          | Name of the attribute
columnName        | Attribute Short Name     | Short alias of the attribute
attributeDataType | Attribute Data Type      | Mapped SQL data type
requiredAttribute | Required Constraint      | Required constraint
RS_ID             | ----------               | Foreign key of RS_Master_Table
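As an illustration, the two metadata tables of Table 1 and Table 2 could be declared as follows. This is a sketch only: it uses Python's built-in sqlite3 module with an in-memory database, and the SQL column types, the CHECK constraint and the two sample rows are assumptions, since the paper does not fix them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.executescript("""
CREATE TABLE RS_Master_Table (
    RS_ID             INTEGER PRIMARY KEY,
    "Level"           INTEGER NOT NULL,
    objectName        TEXT NOT NULL,
    tableName         TEXT NOT NULL,
    attributeCategory TEXT NOT NULL
        CHECK (attributeCategory IN ('Object', 'Array', 'Singleton')),
    parentLevel       INTEGER,          -- NULL marks the root object
    pkColumn          TEXT              -- empty for pure container objects
);
CREATE TABLE RS_Child_Table (
    RS_CID            INTEGER PRIMARY KEY AUTOINCREMENT,
    attributeName     TEXT NOT NULL,
    columnName        TEXT NOT NULL,
    attributeDataType TEXT NOT NULL,
    requiredAttribute INTEGER NOT NULL DEFAULT 0,
    RS_ID             INTEGER NOT NULL REFERENCES RS_Master_Table (RS_ID)
);
""")

# One master row and one child row, taken from the sample mapping
conn.execute(
    "INSERT INTO RS_Master_Table VALUES (1, 0, 'Personal', 'personalTbl', 'Object', NULL, 'persID')"
)
conn.execute(
    "INSERT INTO RS_Child_Table (attributeName, columnName, attributeDataType, requiredAttribute, RS_ID) "
    "VALUES ('Name', 'name', 'varchar', 1, 1)"
)
row = conn.execute(
    "SELECT m.tableName, c.columnName FROM RS_Master_Table m "
    "JOIN RS_Child_Table c ON c.RS_ID = m.RS_ID"
).fetchone()
print(row)
```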

To illustrate the concept, we take the sample JSON schema file in Figure 1 and map it
to the relational structure tables (RS_Tables) given in Table 3 and Table 4. A pictorial
presentation is also given in Figure 3. A separate table is not
required for container objects that just hold other objects and do not possess any attributes.
“Address”, for example, is a container object that does not have any attributes but just holds the
Temporary_Address and Permanent_Address objects.
Table 3 is created by mapping the sample JSON schema file in Figure 1. It stores all singleton
attributes and some information about objects that further contain objects and attributes. The value
“Personal” in the “objectName” column is the root object, which holds all singleton attributes, hierarchical
objects, and arrays. The value “null” in the “parentLevel” column and the value “0” in the “Level” column show
that it is a root object. A value of “0” in the “parentLevel” column shows that the attribute is contained
directly in the root object, a value of “1” shows that it is contained within an object whose “Level”
value is “1”, and so on. To display the result to the user in JSON file format, the “objectName” column defines

Table 3. RS_Master_Table mapped from sample JSON schema

RS_ID | Level | objectName        | tableName      | attributeCategory | parentLevel | pkColumn
1     | 0     | Personal          | personalTbl    | Object            | null        | persID
2     | 1     | Address           | addressTbl     | Object            | 0           |
3     | 2     | Temporary_Address | tempAddressTbl | Object            | 1           | tempAddID
4     | 2     | Permanent_Address | permAddressTbl | Object            | 1           | permAddID
5     | 1     | Kids              | kidsTbl        | Array             | 0           | kidsID

Table 4. RS_Child_Table mapped from sample JSON schema

RS_CID | attributeName  | columnName    | attributeDataType | Required | RS_ID
1001   | Name           | name          | varchar           | True     | 1
1002   | Age            | age           | number            | False    | 1
1005   | city           | city          | varchar           | False    | 3
1006   | Street_Address | streetAddress | varchar           | False    | 3
1005   | City           | city          | varchar           | False    | 4
1006   | Street_Address | streetAddress | varchar           | False    | 4
1007   | Kids           | kids          | varchar           | False    | 5

Figure 3. Relational schema created from JSON schema

the attribute name, and the “attributeCategory” column identifies the type of braces to be used
for the category of the attribute. The columns “Level” and “parentLevel” are used to reconstruct
the hierarchy of objects, which makes it possible to show output to the user according to the structure
defined in the JSON schema file.
To keep the details of all objects, we create the child table shown in Table 4, which covers
singleton attributes, objects, container objects and arrays. “RS_CID” is the primary key of the
child table and is automatically generated and incremented. Column “attributeName” holds
the actual attribute title given in the JSON schema file, and “columnName” holds the alias of
that attribute. We use aliases to shorten long attribute names and to avoid conflicts
with database keywords. We map back to “attributeName” when displaying results to
the user. Column “Required” preserves the constraint that the attribute is mandatory.
Column “RS_ID” is a foreign key to the master table shown in Table 3, which
links each attribute to its parent (container) object.

Mapping Principles
A JSON schema file defines a variety of constructs to describe all properties of a JSON document. We
use these constructs to create tables and their relationships by following the basic structural definition
of a relational schema. We parse the JSON schema and keep all information in RS_Tables. Then, we
read RS_Tables to create the tables and their relationships with the parent table according to their level in the
hierarchy. In RS_Master_Table, the column “Level” defines the hierarchy of tables. Here, level “0”
defines the master table, level “1” defines the child table, level “2” defines the child of the child table
and so on. JSON schema holds a variety of keywords, which are described below.

Properties Keyword
The properties keyword encompasses the name, type, format, and restraints of the attributes in a JSON
document. We use “name” to define the column name, “type” to define the column type and restraints to define
database constraints on the columns. Restraints are predefined validation keywords for every data type in
JSON schema. There are structural and semantic validation restraints for numbers (e.g., minimum,
exclusiveMinimum, maximum, exclusiveMaximum, multipleOf), strings (e.g., maxLength, minLength,
pattern), arrays (e.g., maxItems, minItems, uniqueItems, additionalItems, contains, items), objects
(e.g., maxProperties, minProperties, required, properties, patternProperties) and so on (Wright, Andrews,
& Lu, 2016). They are used to validate JSON document instances. Note that it is not necessary to map
all validation rules into the relational schema because we only store valid JSON documents (a document is
valid if it fulfills all restraints).

Type Keyword
The “type” keyword is fundamental for restricting the type of attributes in JSON schema.
It specifies the type of value that a specific attribute/key can accept (Baazizi,
Colazzo, Ghelli, & Sartiani, 2019). We use the “type” specification to map
JSON data types to SQL database types, as shown in Table 5. A String-typed attribute can hold
values in different defined formats such as DATE, TIME, DATETIME and so on. We therefore define
the data type of a column by taking into account the format described in the JSON
schema file. This SQL data type mapping may vary across relational database vendors
such as MySQL and PostgreSQL.

Reusability by $ref Pointer


Software development practice recommends reusability because it saves time in developing, updating
and maintaining software. In JSON schema, the definition of one attribute can be reused to define other
attributes that hold almost the same properties. For example, “Temporary_Address” and “Permanent_Address” may
have the same attributes and definition. Instead of defining both separately, the definition of one object
can be used to define the other by means of the “$ref” pointer. We parse and use this pointer to define
the columns of the same structure in RS_Tables.
A pictorial presentation of the above-mentioned mapping for the sample JSON schema
file is given in Figure 3.
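A minimal Python sketch of resolving local “$ref” pointers is given below. It is an illustration rather than the parser used in this paper: it assumes that every reference has the form “#/a/b” and points inside the same file, and it does not handle circular references or escaped characters in pointers. The function name and the sample schema are hypothetical.

```python
def resolve_refs(node, root):
    """Return a copy of node in which every local "$ref" is replaced by its target definition."""
    if isinstance(node, dict):
        if "$ref" in node:
            target = root
            # "#/definitions/address" -> ["definitions", "address"]
            for part in node["$ref"].lstrip("#/").split("/"):
                target = target[part]
            return resolve_refs(target, root)
        return {key: resolve_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    return node

# Hypothetical schema fragment in which both addresses share one definition
schema = {
    "definitions": {
        "address": {
            "type": "object",
            "properties": {
                "City": {"type": "string"},
                "Street_Address": {"type": "string"},
            },
        }
    },
    "properties": {
        "Temporary_Address": {"$ref": "#/definitions/address"},
        "Permanent_Address": {"$ref": "#/definitions/address"},
    },
}
resolved = resolve_refs(schema, schema)
print(sorted(resolved["properties"]["Permanent_Address"]["properties"]))
```

After resolution, both address objects carry the same column definitions, so they can be written to RS_Tables in the same way as objects that were defined inline.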

Table 5. Mapping of JSON data types to SQL data types

JSON Data Types                                     | SQL Data Types
String                                              | VARCHAR
Number                                              | NUMBER
Boolean                                             | BOOLEAN
String (Format: Calendar dates in ISO 8601)         | DATE
String (Format: Time zone designators in ISO 8601)  | TIME
String (Format: JDBC timestamp)                     | TIMESTAMP
String (Format: Combined date and time in ISO 8601) | TIMESTAMP
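The mapping in Table 5 can be expressed as a small lookup, sketched here in Python. The sketch assumes that the schema marks date and time strings with the JSON Schema format names “date”, “time” and “date-time”; the JDBC timestamp row is omitted because the source does not say how such a format is named in a schema. The function name is hypothetical.

```python
# Lookup tables derived from Table 5 (format names are an assumption, see above)
TYPE_TO_SQL = {"string": "VARCHAR", "number": "NUMBER", "boolean": "BOOLEAN"}
FORMAT_TO_SQL = {"date": "DATE", "time": "TIME", "date-time": "TIMESTAMP"}

def sql_type(attribute_schema):
    """Map one attribute definition of a JSON schema to an SQL column type."""
    json_type = attribute_schema.get("type")
    if json_type == "string" and attribute_schema.get("format") in FORMAT_TO_SQL:
        return FORMAT_TO_SQL[attribute_schema["format"]]
    if json_type not in TYPE_TO_SQL:
        raise ValueError(f"no SQL mapping defined for JSON type {json_type!r}")
    return TYPE_TO_SQL[json_type]

print(sql_type({"type": "string"}))
print(sql_type({"type": "string", "format": "date-time"}))
print(sql_type({"type": "number"}))
```

Keeping the mapping in plain dictionaries makes it easy to swap in vendor-specific type names, for example for MySQL or PostgreSQL.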

Relational Schema Creation From JSON Schema


We parse the JSON schema file line by line and store all information, such as attribute names, their data
types and their hierarchies, in the RS_Tables shown in Table 3 and Table 4. After the JSON schema has been
mapped into RS_Master_Table and RS_Child_Table, we can create the relational table structure.
We read “tableName” from RS_Master_Table and the columns “attributeName” and “attributeDataType”
from RS_Child_Table row by row, concatenate them into SQL statements, and execute these statements
to create the relational tables and maintain their relationships. RS_Tables not only
make the creation and linking of tables easy but also help in querying and displaying data in JSON
format. Algorithm 1 describes the whole procedure for creating a relational schema from the JSON schema.
Algorithm 1 has three steps. First, it creates the relational structure tables (RS_Tables), namely
RS_Master_Table and RS_Child_Table. Second, it parses the JSON schema file line by line and
inserts the extracted information into the created RS_Tables. Third, it creates relational tables and their
relationships from RS_Master_Table and RS_Child_Table, shown in Table 3 and Table 4, respectively.
In the following, we explain Algorithm 1 in detail.
First, Algorithm 1 applies the openConnection() function to establish a database connection and then
creates RS_Master_Table and RS_Child_Table as shown in Table 1 and Table 2, respectively. Second,
it uses function parseAllReferences() to read the JSON schema file line by line and parse all reference
pointers ($ref) defined in the same file. In addition, function resolveSpecialKeys() is used to resolve the
special keys (‘oneOf’, ‘allOf’, ‘anyOf’) in the JSON schema file. Function saveAttributesDetails() is used
to store the category-wise details in RS_Master_Table and the attribute-wise details in RS_Child_Table.
Function saveAttributesDetails() also maps JSON data types to SQL data types as
shown in Table 5. Lastly, Algorithm 1 creates a relational schema from the information collected
in RS_Tables. It gets the total number of rows in RS_Master_Table using function getRowCount().
It reads the rows in a loop and gets details such as RS_ID, table name and primary key from
RS_Master_Table using functions getRSID(recMaster), getTableName(RS_ID) and getPriamryKey(RS_ID),
respectively. After reading one record from RS_Master_Table, it passes the primary key RS_ID to
function getChildRowCount(RS_ID) to retrieve the attribute information. It then concatenates the
attributes into a query string, which forms an SQL statement to be executed by function createTable().
Next, the relationships among tables are generated. Here, the parent level and child level columns are used
along with the primary key. For this purpose, functions
getParentLevel(RS_ID) and getLevel(RS_ID) are applied to get the parent level and the child level from
RS_Master_Table for the input parameter RS_ID. Function createRelation(parentLevel, level)
takes these levels as parameters to create relationships among the created tables using an SQL statement.
In this way, all tables and their relations are created one by one. Primary keys are generated by an
automatic sequence generator. To give an overview of the schema, the showMapping() function
displays a file showing the mapping of attributes from the JSON schema file to the
created relational schema. After the whole schema has been created, function closeConnection()
is applied to close the open database connection. We use separate functions for these simple
operations for the sake of reusability. Values such as “tableName”, “primaryKey” and the row counts
could also be obtained by simple code within a single function.

JSON INSTANCE OPERATIONS PRINCIPLES

After the creation of relational schema from JSON schema, we can store JSON document instance by
instance in the related master and child tables according to the structure defined in RS_Master_Table
and RS_Child_Table.

Algorithm 1. Creation of relational schema from JSON schema

Algorithm: createRelationalTables
Input: JSON Schema
Output: Creation of relational tables according to JSON Schema
// Create RS Tables
1. openConnection()
2. createRSTables()
// Parse and save information in RS Tables
3. fLength = openJSONSchemaFile()
4. for fRec in 1 to fLength do
5.     parseAllReferences()
6.     resolveSpecialKeys()
7.     saveAttributesDetails()
8. end for
// Create relational schema
9. recMasterCount = getRowCount()
10. for recMaster in 1 to recMasterCount do
11.     RS_ID = getRSID(recMaster)
12.     tableName = getTableName(RS_ID)
13.     primaryKey = getPriamryKey(RS_ID)
14.     recChildCount = getChildRowCount(RS_ID)
15.     String queryStr = "Create Table " + tableName + " (" + primaryKey + " Primary Key, "
16.     for recChild in 1 to recChildCount do
17.         getAttributeName = getAttributeName(RS_ID, recChild)
18.         getDataType = getAttributeDataType(RS_ID, recChild)
19.         queryStr = queryStr + getAttributeName + " " + getDataType
20.         if recChild < recChildCount then
21.             queryStr = queryStr + ", "
22.         end if
23.     end for
24.     queryStr = queryStr + ");"
25.     createTable(queryStr)
26.     parentLevel = getParentLevel(RS_ID)
27.     level = getLevel(RS_ID)
28.     if parentLevel <> null then
29.         createRelation(parentLevel, level)
30.     end if
31. end for
32. showMapping()
33. closeConnection()
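The third step of Algorithm 1 can be sketched in Python as follows. This is not the authors' implementation: RS_Tables are represented by in-memory lists copied from Table 3 and Table 4, container objects without a primary key column are skipped, and, as a simplifying assumption, every child table receives a foreign key to the root table instead of following the parentLevel and level logic of createRelation(). The function name build_ddl is hypothetical.

```python
import sqlite3

# (RS_ID, Level, objectName, tableName, attributeCategory, parentLevel, pkColumn), from Table 3
rs_master = [
    (1, 0, "Personal", "personalTbl", "Object", None, "persID"),
    (2, 1, "Address", "addressTbl", "Object", 0, None),
    (3, 2, "Temporary_Address", "tempAddressTbl", "Object", 1, "tempAddID"),
    (4, 2, "Permanent_Address", "permAddressTbl", "Object", 1, "permAddID"),
    (5, 1, "Kids", "kidsTbl", "Array", 0, "kidsID"),
]
# (columnName, attributeDataType, RS_ID), from Table 4
rs_child = [
    ("name", "varchar", 1), ("age", "number", 1),
    ("city", "varchar", 3), ("streetAddress", "varchar", 3),
    ("city", "varchar", 4), ("streetAddress", "varchar", 4),
    ("kids", "varchar", 5),
]

def build_ddl(master, child):
    """Generate one CREATE TABLE statement per object that owns attributes."""
    root_table, root_pk = next((m[3], m[6]) for m in master if m[5] is None)
    statements = []
    for rs_id, _level, _name, table, _category, parent_level, pk in master:
        columns = [f"{col} {dtype}" for col, dtype, owner in child if owner == rs_id]
        if pk is None or not columns:
            continue  # pure container object such as Address: no table needed
        parts = [f"{pk} INTEGER PRIMARY KEY"] + columns
        if parent_level is not None:
            # assumption: link every child table directly to the root table
            parts.append(f"{root_pk} INTEGER REFERENCES {root_table}({root_pk})")
        statements.append(f"CREATE TABLE {table} ({', '.join(parts)});")
    return statements

ddl = build_ddl(rs_master, rs_child)
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
created = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(ddl[0])
```

Building the column list first and joining it afterwards avoids the trailing-comma handling needed when the statement is concatenated attribute by attribute.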

JSON Instance Storage Principle


With our suggested approach, after the creation of relational tables and their relations, we parse and
insert the whole JSON document instance by instance into the created schema according to the structure
defined in RS_Master_Table and RS_Child_Table. The attributes at level-0 are stored in the master
table, and the objects and arrays at level-1 or above are stored in the child tables along with a unique
ID, which is automatically generated by an SQL SEQUENCE. As in any database structure, the unique ID or
reference key in each table is used for unique identification and maintains the relationships of the
objects, as shown in Figure 4.
To store a JSON document, we propose Algorithm 2, which parses and stores all JSON objects in the
mapped schema. Once the JSON documents are stored in the relational database, any information can be
retrieved easily, efficiently and securely by using standard SQL operations.
An additional benefit is that all the relational database server features can be used to provide extra
security, scaling, consistency, and isolation. We validate each JSON document against the JSON schema before
it is stored in the relational tables.
Algorithm 2 takes a JSON data file and saves it in the database according to the mapping of attributes
in RS_Master_Table and RS_Child_Table. First, Algorithm 2 uses the openConnection()
function to open a database connection, and uses functions getObjectNames() and getPKValues() to
get the names of the objects and the RS_ID values (the primary key of RS_Master_Table), respectively. In a loop,
it takes the RS_ID values one by one and gets the array attributeTitle[] by using function getAttributeTitle().
Function getAttributeName() reads the JSON file and the title of the attribute from RS_Child_Table. If the

Figure 4. JSON schema mapping to relational schema with sample data

Algorithm 2. Saving JSON document in created relational schema

Algorithm: JSONDataToRelationalStorage
Input: Parameter JSONDataFile; JSON data file that the user wants to save in the created relational schema
Output: Status “File Saved” or “File cannot save” with an error message
1. openConnection()
2. objectName[] = getObjectNames()
3. RS_ID[] = getPKValues()
4. for rec in 1… length(RS_ID[]) loop
5.     attributeTitle[] = getAttributeTitle()
6.     for recV in 1… length(attributeTitle[]) loop
7.         Read JSON data file line by line
8.         String attributeName = getAttributeName()
9.         if attributeName exists in attributeTitle[] then
10.             parseSaveValue()
11.             attributeExists = true
12.         end if
13.     end loop
14.     if attributeExists = true then
15.         createRelation()
16.         attributeExists = false
17.     end if
18. end loop
19. closeConnection()
20. return status true if transaction ended successfully else false with error message

title of the attribute exists in the array attributeTitle[], function parseSaveValue() parses the value
according to the data type mapped in RS_Child_Table and saves the value in the column
mapped to attributeTitle[recV] in RS_Child_Table. If any column is populated in a row of
the table, we generate the primary key of the row. We set the Boolean variable attributeExists
to true if the row exists. If attributeExists is true, we use function createRelation()
to generate a primary key and relate the row to its parent table record by using
RS_ID and the parent level given in RS_Master_Table. If the JSON data are successfully stored
in the relational schema, we return a status such as “JSON data stores are successful” to the user. Finally,
all database connections are closed by using the closeConnection() function. In summary, Algorithm 2 takes a JSON
data file, stores it in the created relational schema, and returns the storage status to the user.
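The storage step can be illustrated with the following Python sketch, which inserts one JSON instance into three of the sample tables. It is a simplified stand-in for Algorithm 2: the attribute-to-column mapping is hard-coded instead of being read from RS_Tables, only Temporary_Address is handled, and the foreign key column persID in the child tables as well as the sample values are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personalTbl (persID INTEGER PRIMARY KEY, name varchar, age number);
CREATE TABLE tempAddressTbl (tempAddID INTEGER PRIMARY KEY, city varchar,
    streetAddress varchar, persID INTEGER REFERENCES personalTbl(persID));
CREATE TABLE kidsTbl (kidsID INTEGER PRIMARY KEY, kids varchar,
    persID INTEGER REFERENCES personalTbl(persID));
""")

# Hard-coded stand-in for the attributeName -> columnName mapping of RS_Child_Table
ROOT_COLUMNS = {"Name": "name", "Age": "age"}
ADDRESS_COLUMNS = {"City": "city", "Street_Address": "streetAddress"}

def insert_row(table, values):
    """Insert one row and return its generated primary key."""
    columns = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    cursor = conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(values.values()))
    return cursor.lastrowid

def store_instance(doc):
    """Store level-0 attributes in the master table and nested data in child tables."""
    with conn:  # one transaction per instance: commit on success, roll back on error
        root = {col: doc[attr] for attr, col in ROOT_COLUMNS.items() if attr in doc}
        pers_id = insert_row("personalTbl", root)
        temp = doc.get("Address", {}).get("Temporary_Address")
        if temp:
            row = {col: temp[attr] for attr, col in ADDRESS_COLUMNS.items() if attr in temp}
            row["persID"] = pers_id
            insert_row("tempAddressTbl", row)
        for kid in doc.get("Kids", []):
            insert_row("kidsTbl", {"kids": kid, "persID": pers_id})
    return pers_id

doc = json.loads("""
{"Name": "Ali", "Age": 40,
 "Address": {"Temporary_Address": {"City": "Lahore", "Street_Address": "12 Mall Road"}},
 "Kids": ["Sara", "Omar"]}
""")
pers_id = store_instance(doc)
print(conn.execute("SELECT * FROM kidsTbl").fetchall())
```

Wrapping each instance in a single transaction means a document is either stored completely or not at all, which relies on the ACID guarantees of the underlying RDBMS.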

JSON Instance Projection Principle


Projection in JSON document stores is the most complex part, especially for complex querying
and analysis, because of the complex query languages involved. It may work fine for simple tasks, but
complex queries require a MapReduce job every time (Liu et al., 2016). The main advantage of using JSON
with a relational database is querying flexibility, thanks to a powerful standard query language refined
over decades of research [12]. The user can query the tables in any direction instead of following a
single hierarchical path query. In a classical RDBMS, the SQL/JSON path language is required
for the JSON binary format. It looks like an extension of SQL that supports various functions to handle
JSON data (Petkovic, 2017). However, it is not standardized yet and requires considerable improvement for
storing, updating and querying JSON data in binary format as compared to standard SQL statements
(Petkovic, 2017).
Our approach of transforming JSON data into the relational model makes querying simple by using
standard SQL functions. As in relational databases, we use a simple mechanism for projection
with SQL commands. Projection uses the “select” statement with a list of column
titles or ‘*’ to project specific attributes or all attributes, respectively. Predicates are used for
criteria-based retrieval, optionally through a “where” clause. A predicate requires matching the master-child
reference id to fulfill the criteria. Because we follow the basic relational database structure, comparison, string
operations and pattern matching work as in any SQL query. Complex
queries, analysis, and data processing are also easy to handle. We define a materialized view
based on all tables for fast querying.
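The following Python sketch shows what such a projection with a predicate may look like on two of the sample tables. The foreign key column persID in kidsTbl and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personalTbl (persID INTEGER PRIMARY KEY, name varchar, age number);
CREATE TABLE kidsTbl (kidsID INTEGER PRIMARY KEY, kids varchar,
    persID INTEGER REFERENCES personalTbl(persID));
INSERT INTO personalTbl VALUES (1, 'Ali', 40), (2, 'Sana', 29);
INSERT INTO kidsTbl VALUES (1, 'Sara', 1), (2, 'Omar', 1), (3, 'Zain', 2);
""")

# Project two attributes; the join matches the master-child reference id,
# and the predicate combines a comparison with pattern matching.
rows = conn.execute("""
    SELECT p.name, k.kids
    FROM personalTbl p
    JOIN kidsTbl k ON k.persID = p.persID
    WHERE p.age > ? AND k.kids LIKE ?
    ORDER BY k.kids
""", (30, "S%")).fetchall()
print(rows)
```

The same question asked of a document store would need a path expression into the Kids array; here it is an ordinary join with a “where” clause.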
To support our vision and perform all operations, we formalize transformation principles from
JSON queries to traditional SQL statements. Our approach benefits the user in two ways: the
result of a relational query can be used directly, exactly as with any RDBMS, or the query results can
be transformed into JSON file format. Once the relational schema is created and the JSON data are stored, we can
use JSON data as relational data, query it and perform all operations on it.
We propose an algorithm to recompose JSON documents from the stored JSON objects/instances
by using the mapping kept in RS_Tables. We run the query and save the result set in a temporary
table. This temporary table exists only for the duration of the session, is dropped when the session
terminates (Chasseur, Li, & Patel, 2013) and is unique to every connection. We use the temporary table
along with RS_Tables to convert the information into JSON format. After querying, we iterate through the
records and save the information instance by instance in JSON file format for presentation to the user.
A summary of the steps is presented in Section 4.2.1.

Steps of Operating Principles to Present Data in JSON Format


The following steps describe how we present user-queried data in JSON format:

• Populating the temporary table with the result set according to the user criteria.

We execute the query according to the user criteria and save the result set in a temporary table
(temporary tables are dropped when the session terminates (Klettke, Störl, & Scherzinger, 2015) and are
unique to every connection (Chasseur, Li, & Patel, 2013), so they require less memory space). We
use the temporary table with RS_Tables to get the required results in JSON format. The temporary
table contains only distinct records and only their primary keys. We use these keys to get information
from RS_Tables such as the table name, the level (to check the hierarchy) and the parent level (to identify
the container object).

• Copying objects in JSON file by using a recursive function.

We retrieve the required information by getting the table name, level and parent level details from
RS_Master_Table and RS_Child_Table. To this end, we select attributes from the child tables by using
the temporary table and RS_Child_Table. We apply the recursive function “relationalToJSON” to query
and iterate table by table through RS_Master_Table and row by row through RS_Child_Table.
We concatenate braces “{,}” for objects and “[,]” for arrays according to the category of the attribute
stored in RS_Master_Table. We save the information instance by instance in JSON file format.

• Displaying the resultant JSON file to the user.

Finally, we display the file to the user and close all databases and their connections if no further
processing is required. The user can view the resulting information in both relational and JSON formats. At
this point, the data are already in relational format, so they are easy to display to the user. Algorithm
3 describes how to display the queried data in JSON file format.
To project the whole document, for example, we iterate through the tables and generate a file in
JSON format. We read the tables and their levels from RS_Tables to display objects in the proper
hierarchy. We get the reference id (primary key id) from the table at level-0. The reference id of the
level-0 table is used to retrieve the values from level-1 tables. The reference ids of the level-1 and
level-0 tables are then used to query level-2 tables and so on. A full iteration of one record from level-1 to the
last level describes one complete JSON instance along with all its hierarchical objects. We store this record in
JSON file format and concatenate all objects in the same pattern one by one until the record set ends. A
generalized algorithm to traverse all levels is given in Algorithm 3. In addition, to display
specific columns, we can modify the algorithm by adding an array of the selected columns as an input
parameter and passing it to the getQuery() function.
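The level-by-level traversal can be sketched in Python as a recursive function. The sketch departs from the paper in several labeled ways: the hierarchy is described by a hypothetical nested dictionary (LAYOUT) rather than by RS_Tables, every child table is assumed to carry the root key persID, and the result is built as a Python object that json.dumps serializes instead of being concatenated as text.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personalTbl (persID INTEGER PRIMARY KEY, name varchar, age number);
CREATE TABLE tempAddressTbl (tempAddID INTEGER PRIMARY KEY, city varchar,
                             streetAddress varchar, persID INTEGER);
CREATE TABLE kidsTbl (kidsID INTEGER PRIMARY KEY, kids varchar, persID INTEGER);
INSERT INTO personalTbl VALUES (1, 'Ali', 40);
INSERT INTO tempAddressTbl VALUES (1, 'Lahore', '12 Mall Road', 1);
INSERT INTO kidsTbl VALUES (1, 'Sara', 1), (2, 'Omar', 1);
""")

# Hypothetical stand-in for RS_Tables: object name, category, table, attribute -> column
LAYOUT = {
    "name": "Personal", "category": "Object", "table": "personalTbl",
    "columns": {"Name": "name", "Age": "age"},
    "children": [
        {"name": "Address", "category": "Object", "table": None, "columns": {},
         "children": [
             {"name": "Temporary_Address", "category": "Object",
              "table": "tempAddressTbl",
              "columns": {"City": "city", "Street_Address": "streetAddress"},
              "children": []}]},
        {"name": "Kids", "category": "Array", "table": "kidsTbl",
         "columns": {"Kids": "kids"}, "children": []},
    ],
}

def rebuild(node, pers_id):
    """Recompose one JSON instance by walking the hierarchy from the root downwards."""
    if node["table"] is None:
        value = {}  # container object: holds other objects only
    else:
        columns = ", ".join(node["columns"].values())
        rows = conn.execute(
            f"SELECT {columns} FROM {node['table']} WHERE persID = ? ORDER BY rowid",
            (pers_id,)).fetchall()
        if node["category"] == "Array":
            return [row[0] for row in rows]           # rendered with "[ ]"
        if not rows:
            return None
        value = dict(zip(node["columns"], rows[0]))   # rendered with "{ }"
    for child in node["children"]:
        child_value = rebuild(child, pers_id)
        if child_value not in (None, {}, []):
            value[child["name"]] = child_value
    return value

instance = rebuild(LAYOUT, 1)
print(json.dumps(instance, indent=2))
```

Primary key columns are never selected, so design details such as persID do not appear in the reconstructed document.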
Algorithm 3 first uses the openConnection() function to open a database connection and then takes
the user query as input. Using functions getColumnString() and getCriteraString(), which record
the selected columns (select statement) and the criteria (from and where clauses) of the query,
respectively, it divides the query into two variables, columVar and criteriaVar. Function getTableNames()
takes these two variables as parameters and returns, in array tableArr[], the distinct names of the tables
that appear in the select statement (columVar) and in the condition (criteriaVar). The array
tableArr[] is sorted in ascending order of the “Level” column in RS_Master_Table by using the
sortTable() function. This helps to display JSON attributes, arrays, and objects in the proper hierarchy.
Function concatinatePK() is used in the loop to concatenate the primary key columns of the distinct
tables that are missing from the select statement of queryStr, which helps to avoid duplication of
data. Note that a primary key column is added only where no aggregate function is
applied, so that the added primary key column does not affect the user query. After the primary key
columns have been concatenated, function executeQuery(queryStr, criteriaVar) checks
whether an aggregate function exists in the query and then executes the query after concatenating the
criteria to variable queryStr. This function also saves the result of the executed query in a temporary table.
Function getResultSet() splits the resultSet of the executed query by tableArr[] into a two-dimensional
array tableResultSet[][], which keeps the records per table and column. Finally, we use the recursive
function displayJSONFormat(), which takes
this two-dimensional array and writes the resultSet in JSON file format.
The recursive function displayJSONFormat() initially takes as input the primary key column title,
the primary key value of the first record of the table and the first table name from array tableArr[].
After printing the first record of the first table, it calls itself recursively for the records of the second
table, which are linked through the primary key passed as an input parameter. After the
second table is completed, the primary key of the second table is taken and the records of the table at
level-3 are displayed. In this way, all JSON instances are displayed one by one, excluding
design details such as the primary key columns. In addition, according to the category of an attribute
defined in RS_Master_Table, the output is concatenated with opening and closing identifiers such as
“{,}” for objects and “[,]” for arrays, together with colons “:” and commas “,”. Function displayFile()
shows the resulting JSON data file to the user. Open connections and objects are then closed by
closeConnection() to free the used memory space. If the user gives ‘*’ in the select statement, we take
all the columns of the tables mentioned in criteriaVar. Last but not least, the user can use the query
results directly, as from any RDBMS, or as a JSON format file, according to the requirement.
Note that, in some cases, we can get the same results by simply querying the tables directly without
using a materialized view and temporary tables. However, a materialized view can enhance performance
for complex queries. While querying the materialized view, only reference ids are stored in the
temporary table, which does not require much space. The temporary table is dropped automatically when
the session terminates. With the temporary tables, it is not necessary to repeat the predicate
(criteria) for every table. The user can use the query results directly, as from an RDBMS, as well as
in a JSON format file, as required.

Algorithm 3. Transformation of user’s queried results into JSON file format

Algorithm: relationalToJSON
Input: queryParm, the query that the user wants to execute and whose result is wanted in JSON format
Output: a JSON file containing the result set of the queried relational data
openConnection()
String columVar = getColumnString()
String criteriaVar = getCriteraString()
Array tableArr[] = getTableNames(criteriaVar, columVar)
sortTable()
for rec in 1… length(tableArr[]) loop
    String queryStr = concatinatePK()
end loop
resultSet[] = executeQuery(queryStr, criteriaVar)
for rec in 1… length(tableArr[]) loop
    tableResultSet[rec][] = getResultSet()
end loop
pkColTitle = getPKColumn(tableArr[1])
plColVal = getPKValue(tableResultSet[1][1], tableArr[1])
openFile()
File jsonData = displayJSONFormat(pkColTitle, plColVal, tableArr[1])
displayFile()
end

// recursive function to display records in JSON format
function displayJSONFormat(pkColTitle, plColVal, tableArr[])
    tableLevel = 1
    pkColumn = getPkColumnTitle(tableArr[tableLevel])
    for recA in 1… length(tableResultSet[tableLevel][]) loop
        for recV in 1… length(tableResultSet[tableLevel][recA]) loop
            getCategory(tableResultSet[rec])
            embed opening identifiers according to the category of the attribute
            displayAttributes(tableResultSet[rec])
            pkVal = getPKValRec(tableResultSet[rec])  // primary key value of tableResultSet[rec]
            displayJSONFormat(pkColumn, pkVal, tableArr[])
            embed closing identifiers according to the category of the attribute
        end loop
    end loop
    closeConnection()
end
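
The recursive idea behind Algorithm 3 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL, and the tables person, address and kid, the CHILDREN and PK dictionaries, and the function build() are hypothetical names chosen for illustration. CHILDREN plays the role that RS_Child_Table plays in the paper, but it is hard-coded here.

```python
import json
import sqlite3

# Hypothetical RS-style tables: person (root), address (nested object), kid (array).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE address (address_id INTEGER PRIMARY KEY, person_id INTEGER, city TEXT);
CREATE TABLE kid (kid_id INTEGER PRIMARY KEY, person_id INTEGER, value TEXT);
INSERT INTO person VALUES (1, 'Ann', 34);
INSERT INTO address VALUES (10, 1, 'Nanjing');
INSERT INTO kid VALUES (100, 1, 'Bo'), (101, 1, 'Cy');
""")

# For each table: (child table, JSON key, category, foreign key column in the child).
# Assumption: hard-coded here; the paper reads this from RS_Child_Table.
CHILDREN = {
    "person": [("address", "Address", "object", "person_id"),
               ("kid", "Kids", "array", "person_id")],
    "address": [],
    "kid": [],
}
PK = {"person": "person_id", "address": "address_id", "kid": "kid_id"}


def build(table, fk_col, fk_val):
    """Return the records of `table` linked to fk_val, with children nested recursively."""
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {fk_col} = ? ORDER BY {PK[table]}", (fk_val,))
    cols = [d[0] for d in cur.description]
    records = []
    for row in cur.fetchall():
        rec = dict(zip(cols, row))
        pk_val = rec[PK[table]]
        for child, key, category, child_fk in CHILDREN[table]:
            nested = build(child, child_fk, pk_val)      # recurse one level down
            if category == "array":
                rec[key] = [n["value"] for n in nested]  # array of plain values
            else:
                rec[key] = nested[0] if nested else None  # single nested object
        # Exclude design details such as key columns from the JSON output.
        records.append({k: v for k, v in rec.items() if not k.endswith("_id")})
    return records


doc = build("person", "person_id", 1)[0]
print(json.dumps(doc))
```

The category of each child ("object" or "array") decides whether the nested rows are emitted between braces or brackets, which corresponds to the opening and closing identifiers of Algorithm 3.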

JSON Instance Deletion and Updating Principles


Deletion in a relational structure is somewhat complex because records may be linked to other records
across multiple tables. Deleting a record requires first deleting the corresponding records from its
child tables (if any exist). There are two choices for deletion. First, we can use a cascading delete
(a foreign key declared with ON DELETE CASCADE) to delete a record along with its child records.
Second, we can use a bottom-to-top approach that deletes the corresponding child records before the
parent record. Because the JSON data are transformed into a relational model, the user can apply
simple SQL statements to delete records according to any criteria with either approach. Here, the
cascade delete is more efficient than the bottom-up approach.
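
The two deletion choices can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: SQLite (via Python's sqlite3 module) stands in for MySQL, and the tables person and kid and the helpers make_db() and counts() are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE kid (
    kid_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id) ON DELETE CASCADE,
    value TEXT
);
INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Raj', 41);
INSERT INTO kid VALUES (1, 1, 'Bo'), (2, 1, 'Cy'), (3, 2, 'Di');
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
    conn.executescript(SCHEMA)
    return conn


def counts(conn):
    """Return (number of persons, number of kids)."""
    persons = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    kids = conn.execute("SELECT COUNT(*) FROM kid").fetchone()[0]
    return persons, kids


# Option 1: cascading delete. One statement on the parent table is enough.
cascade_db = make_db()
cascade_db.execute("DELETE FROM person WHERE age > ?", (40,))
cascade_result = counts(cascade_db)

# Option 2: bottom-up delete. Child rows are removed first, then the parent rows.
bottom_up_db = make_db()
bottom_up_db.execute(
    "DELETE FROM kid WHERE person_id IN (SELECT person_id FROM person WHERE age > ?)",
    (40,))
bottom_up_db.execute("DELETE FROM person WHERE age > ?", (40,))
bottom_up_result = counts(bottom_up_db)

print(cascade_result, bottom_up_result)
```

Both options leave the same rows behind; the cascade variant needs a single statement because the database follows the foreign key itself.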
Regarding updates in JSON document stores, a major issue is that de-normalized storage requires
duplicate data to be updated, which may corrupt the data or create a heavy burden for the database
administrator. For the binary format in relational databases, not all vendors fully support updates
in the true sense: updating part of the data or a batch of data is not allowed, and it is
recommended instead to replace the existing object with the updated one. There are also serious
issues regarding the storage size of updated data. In MySQL, updating data with new data of the same
size works, but problems arise if the new data require more bytes than the existing data. Besides,
update limitations vary across vendors. Update is therefore a major issue that needs to be improved
for JSON storage in binary format in relational databases, and it may also corrupt the data
(Petkovic, 2017).

IMPLEMENTATIONS AND EXPERIMENTS

Our approach transforms a JSON schema into a relational schema by parsing the JSON schema file and
then storing the metadata in relational structure tables, following the principles defined in the
above sections. We transformed a simple JSON schema file and a complex JSON schema file, which
includes self-references ($ref) and special keys (‘oneOf’, ‘allOf’, ‘anyOf’), into relational
schemas. To assess performance, we compared our approach with the binary format of a NoSQL document
store (MongoDB) and the binary format of a relational database (MySQL). We evaluated the results
with simple and aggregate queries on all three formats. In the subsequent sections, we explain in
detail the category of the files used and the SQL operations performed on the JSON data stored in
the relational database, along with their evaluations.

Experimental Settings
We evaluated our models on an Intel(R) Xeon(R) CPU E5-2620 v4 at 2.10GHz with 16 cores and 64GB
of RAM, running the CentOS 7 operating system and equipped with an NVIDIA Tesla K40m with 12GB of
GPU memory and 2880 CUDA cores, and a 1000.2 GB HDD. We used the Eclipse Framework
with the Python programming language to implement the systems. We used MySQL and MongoDB
in a client-server setting.
As a proof of concept, our experiments used the same JSON schema and JSON data file as shown in
Figures 1 and 2, respectively. We generated sample JSON data with the online data generation tool
“mockaroo”4, following the guidelines and patterns of other database micro-benchmarks
(Melton, 2003). The generated sample data, detailed below, contain strings, numbers, hierarchical
objects, and arrays. We used arrays and hierarchical objects to check the effectiveness of our
approach on multilevel tables:


• Name: String type
• Age: Number type, ranging from 20 to 50
• Address: Nested object containing two more objects:
   ◦ Temporary_Address: Object with two attributes:
      ▪ Street_Address: String
      ▪ City: String
   ◦ Permanent_Address: Object with two attributes:
      ▪ Street_Address: String
      ▪ City: String
• Kids: An array of type String with 0 to 3 items

We saved the JSON data in the relational database according to our approach. To evaluate
efficiency, we measured the time taken to query the full data and partially selected data. We scaled
the JSON data up from 1 million to 16 million JSON objects. One JSON instance/object is the record
of one person, including all of his or her information, in a JSON document.
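
For illustration, one such JSON instance could be generated as sketched below. This is not the Mockaroo configuration used in the experiments: the value lists NAMES and CITIES, the street pattern, and the function make_person() are assumptions made only to show the shape of a record.

```python
import json
import random

NAMES = ["Ann", "Raj", "Lin", "Omar", "Eva"]          # illustrative values only
CITIES = ["Nanjing", "Lahore", "Karachi", "Beijing"]  # illustrative values only


def make_person(rng):
    """Build one JSON instance with the fields described in the sample data list."""
    def address():
        return {"Street_Address": f"{rng.randint(1, 999)} Main Street",
                "City": rng.choice(CITIES)}
    return {
        "Name": rng.choice(NAMES),
        "Age": rng.randint(20, 50),                   # 20 to 50 inclusive
        "Address": {"Temporary_Address": address(),
                    "Permanent_Address": address()},
        "Kids": [rng.choice(NAMES) for _ in range(rng.randint(0, 3))],  # 0 to 3 items
    }


rng = random.Random(42)                               # fixed seed for repeatable samples
sample = [make_person(rng) for _ in range(5)]
print(json.dumps(sample[0], indent=2))
```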

Evaluations of JSON Schema Mapping and JSON Document Storage


Our approach first parses the whole JSON schema and builds the structural tables (“RS_Tables”). We
tested various JSON schema files of differing complexity and length. As shown in Table 6, the
difference between the time taken to convert a simple JSON schema file and a complex JSON schema
file into RS_Tables is very small. Once the RS_Tables were created, it took just a few seconds to
create the database tables and store the JSON data file.
Table 6 shows that, with our approach, creating a relational structure from a simple JSON
schema file (whether lengthy or not) takes little time. A complex JSON schema file requires
slightly more time because references ($ref pointers) and special keys (‘oneOf’, ‘allOf’, ‘anyOf’)
must be resolved using recursive functions.
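
The recursive resolution of references can be sketched as follows. This is a simplified illustration rather than the authors' parser: it assumes only local references of the form "#/definitions/...", ignores JSON Pointer escape sequences, and the functions resolve_pointer() and expand() as well as the sample schema are hypothetical. A self-reference is left unexpanded the second time it is met, so the recursion terminates.

```python
def resolve_pointer(root, ref):
    """Follow a local reference such as '#/definitions/address' inside `root`."""
    node = root
    for part in [p for p in ref[1:].split("/") if p]:
        node = node[part]
    return node


def expand(node, root, active=()):
    """Return a copy of `node` with local $ref pointers replaced by their targets."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#"):
            if ref in active:            # self-reference: stop to avoid infinite recursion
                return {"$ref": ref}
            return expand(resolve_pointer(root, ref), root, active + (ref,))
        return {key: expand(value, root, active) for key, value in node.items()}
    if isinstance(node, list):           # e.g. the alternatives under oneOf/allOf/anyOf
        return [expand(item, root, active) for item in node]
    return node


schema = {
    "definitions": {
        "address": {"type": "object", "properties": {"City": {"type": "string"}}},
        "person": {"type": "object",
                   "properties": {"Name": {"type": "string"},
                                  "Friend": {"$ref": "#/definitions/person"}}},
    },
    "type": "object",
    "properties": {
        "Temporary_Address": {"$ref": "#/definitions/address"},
        "Owner": {"oneOf": [{"$ref": "#/definitions/person"}, {"type": "null"}]},
    },
}

resolved = expand(schema, schema)
print(resolved["properties"]["Temporary_Address"])
```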
For the storage of JSON documents, we used a data generation tool to generate the data sets. We
generated about 1 million objects and scaled them up to 16 million for the analysis. We saved
these JSON objects in MongoDB BSON format, in MySQL JSON binary format, and in RS format
(our proposed approach). The detailed results are shown in Table 7:

• NoSQL Binary Format: MongoDB stores JSON data instance by instance in a collection. Storage of
JSON objects is slightly more efficient than in the relational binary format and the RS format.
Objects are stored in collections, where a collection is a group of similar documents. If the user
does not create a collection, it is created automatically when an object is inserted. Insertion of
JSON is slightly faster than, or equal to, insertion in the MySQL binary JSON format;
• Relational Binary Format: Storing JSON documents is easy in the relational binary format. Objects
must be filtered and saved one by one in rows. Most database administrators write a script file
that contains all the statements of the operations. Running the whole script completes the storage
at once, which saves time and avoids ambiguity. Although storage in the binary format is easy,
performing SQL operations is somewhat complex and varies across RDBMS vendors, as shown in the
following subsection;
• RS Format: Our approach takes slightly more time than the two approaches mentioned above.
Saving JSON data in relational format is accomplished through Algorithm 2, which parses the whole
JSON data file, stores the attribute values in the related tables, and creates the relationships.
It does not require separating the objects one by one and writing a script to execute; it takes a
JSON document and stores it in the created schema. Storage time also depends on the size of the
JSON file, the size and number of the JSON objects, and the complexity of the JSON data file. Once
the JSON objects are stored as relational data, our suggested RS format can perform many database
operations.

Table 6. Creation of relational schema from JSON schema

JSON schema file type    Time taken (seconds)
Simple                   4.53
Complex                  6.57

The simple file contains no reference pointers or special keys.
The complex file contains reference pointers ($ref) and special keys (‘oneOf’, ‘allOf’, ‘anyOf’).
File size = 0.003 megabytes (MB)

Table 7. Storage time of JSON objects in various formats (time in seconds)

Number of objects    NoSQL Binary Format    Relational Binary Format    RS Format
1 million            14.52                  16.91                       19.62
4 million            52.15                  53.38                       65.85
16 million           187.35                 193.27                      208.71
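
A minimal sketch of storing one JSON object across related tables is given below. It is not Algorithm 2: the table layout is hard-coded rather than derived from the JSON schema, SQLite (Python's sqlite3 module) stands in for MySQL, and the names person, address, kid and store_person() are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    kind TEXT, street_address TEXT, city TEXT);
CREATE TABLE kid (
    kid_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    value TEXT);
"""


def store_person(conn, doc):
    """Split one JSON object into a root row plus child rows and return the root key."""
    cur = conn.execute("INSERT INTO person (name, age) VALUES (?, ?)",
                       (doc["Name"], doc["Age"]))
    person_id = cur.lastrowid
    # Nested objects become rows in a child table linked by the root key.
    for kind, addr in doc.get("Address", {}).items():
        conn.execute(
            "INSERT INTO address (person_id, kind, street_address, city) VALUES (?, ?, ?, ?)",
            (person_id, kind, addr["Street_Address"], addr["City"]))
    # Each array item becomes one row in its own child table.
    for kid in doc.get("Kids", []):
        conn.execute("INSERT INTO kid (person_id, value) VALUES (?, ?)", (person_id, kid))
    return person_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
new_id = store_person(conn, {
    "Name": "Ann", "Age": 34,
    "Address": {
        "Temporary_Address": {"Street_Address": "1 Lake Road", "City": "Nanjing"},
        "Permanent_Address": {"Street_Address": "9 Hill Street", "City": "Lahore"}},
    "Kids": ["Bo", "Cy"],
})
conn.commit()
row_counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
              for t in ("person", "address", "kid")}
print(new_id, row_counts)
```

One object turns into several inserts, which is consistent with the observation that RS format storage takes somewhat longer than storing a single binary value per object.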

Experimental Queries and Criteria


To test the performance of JSON data in MongoDB BSON format (called NoSQL Binary Format),
MySQL binary format (called Relational Binary Format), and our proposed approach (called RS
Format), we defined 14 queries covering projection, selection, update, deletion, and
insertion. We performed these operations on 1 million, 4 million, and 16 million JSON objects.
We executed every query three times with different parameters. The evaluation results shown in
Table 8 are the averages over these runs.
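
The measurement procedure (three runs with different parameters, then the average) can be sketched as below. This is an assumed harness, not the one used for Table 8: SQLite replaces MySQL and MongoDB, and the function average_time(), the table person, and the chosen parameters are hypothetical.

```python
import sqlite3
import statistics
import time


def average_time(conn, sql, parameter_sets):
    """Run `sql` once per parameter set and return the mean wall-clock time in seconds."""
    timings = []
    for params in parameter_sets:
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch all rows so the full cost is measured
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person (name, age) VALUES (?, ?)",
                 [(f"p{i}", 20 + i % 31) for i in range(10_000)])  # ages 20 to 50

avg = average_time(conn, "SELECT name FROM person WHERE age = ?", [(25,), (33,), (47,)])
print(f"average over 3 runs: {avg:.6f} s")
```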

Projection Query Criteria


For projection, we set four criteria C1, C2, C3, and C4 as follows:

C1: This query projects whole objects, including singleton attributes, hierarchical objects, and
arrays, without applying any criteria.
C2: This query projects a small subset of data, namely the “Name” and “Age” attributes of String and
Numeric data type, from all the objects in the dataset, to test the efficiency of simple queries
that retrieve root-level singleton attributes. We applied predicates/criteria (where clause)
category by category: first on singleton attributes (“Name” and “Age”), then on objects
(“City” of “Temporary_Address” and “Permanent_Address”), and lastly on arrays (“Kids”).
C3: This query projects two singleton attributes (“Name” and “Age”) and two attributes from the
nested object, namely “Street_Address” and “City” from “Temporary_Address”, to test the
results for attributes on various hierarchical levels. We applied predicates/criteria (where
clause) category by category: first on singleton attributes (“Name” and “Age”), then on objects
(“City” from “Temporary_Address” and “Permanent_Address”), and lastly on arrays
(the number of “Kids” equal to 3).
C4: This query projects two singleton attributes (“Name” and “Age”) and the “Kids” data from the
array. The number of kids in the array varies from 0 to 3 across objects. This query evaluates the
efficiency of the various approaches on array data. We applied predicates/criteria (where clause)
category by category: first on singleton attributes (“Name” and “Age”), then on
objects (“City” from “Temporary_Address” and “Permanent_Address”), and lastly on arrays
(the number of “Kids”).
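
In RS format, projections such as C2 to C4 reduce to ordinary SELECT statements with joins. The sketch below illustrates this under assumptions: SQLite stands in for MySQL, and the tables person, address and kid with their columns are hypothetical rather than the tables generated by the approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE address (address_id INTEGER PRIMARY KEY, person_id INTEGER, kind TEXT,
                      street_address TEXT, city TEXT);
CREATE TABLE kid (kid_id INTEGER PRIMARY KEY, person_id INTEGER, value TEXT);
INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Raj', 41);
INSERT INTO address VALUES
    (1, 1, 'Temporary_Address', '1 Lake Road', 'Nanjing'),
    (2, 1, 'Permanent_Address', '9 Hill Street', 'Lahore'),
    (3, 2, 'Temporary_Address', '5 Park Lane', 'Nanjing'),
    (4, 2, 'Permanent_Address', '7 Mill Road', 'Karachi');
INSERT INTO kid VALUES (1, 1, 'Bo'), (2, 1, 'Cy');
""")

# C2-style: root-level singleton attributes only, no join needed.
c2 = conn.execute("SELECT name, age FROM person ORDER BY person_id").fetchall()

# C3-style: singleton attributes plus attributes of a nested object (one join).
c3 = conn.execute("""
    SELECT p.name, p.age, a.street_address, a.city
    FROM person p JOIN address a ON a.person_id = p.person_id
    WHERE a.kind = 'Temporary_Address'
    ORDER BY p.person_id""").fetchall()

# C4-style: singleton attributes plus array items; LEFT JOIN keeps persons without kids.
c4 = conn.execute("""
    SELECT p.name, p.age, k.value
    FROM person p LEFT JOIN kid k ON k.person_id = p.person_id
    ORDER BY p.person_id, k.kid_id""").fetchall()

print(c2, c3, c4, sep="\n")
```

The C2-style query touches a single table, whereas the C3-style and C4-style queries need joins, which matches the paper's remark that projecting hierarchical objects and arrays requires multiple joins.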

Selection Query Criteria


For selection, we selected about 0.1% of the data among 1 million, 4 million, and 16 million JSON
objects by using three criteria, C5, C6, and C7, as follows:

C5: This query returns the count of persons of all ages by applying a predicate on “Age”, to
check the performance of an aggregate function on a singleton attribute (“Age”).
C6: This query displays the objects selected by a predicate on the hierarchical attribute
“City” contained in the “Temporary_Address” object, to check the performance of an aggregate
function on hierarchical objects. It shows the full details of all objects from the most
populated “City”, that is, the city of “Temporary_Address” in which the maximum number of
persons live.
C7: This query shows the full details of all objects selected by a predicate on arrays, where
the total number of kids of a person is equal to 3, combined with a predicate on the hierarchical
attribute “City” of “Temporary_Address”, where the city is the most populated, to check the
performance of aggregate functions on arrays and objects.
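
Aggregate selections of this kind can be written with standard SQL once the data are in relational form. The following sketch is an illustration under assumptions (SQLite instead of MySQL; hypothetical tables person, address and kid; hypothetical query strings C6 and C7), not the queries used in the experiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE address (address_id INTEGER PRIMARY KEY, person_id INTEGER, kind TEXT, city TEXT);
CREATE TABLE kid (kid_id INTEGER PRIMARY KEY, person_id INTEGER, value TEXT);
INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Raj', 41), (3, 'Lin', 34);
INSERT INTO address VALUES
    (1, 1, 'Temporary_Address', 'Nanjing'),
    (2, 2, 'Temporary_Address', 'Lahore'),
    (3, 3, 'Temporary_Address', 'Nanjing');
INSERT INTO kid VALUES (1, 1, 'Bo'), (2, 1, 'Cy'), (3, 1, 'Di'), (4, 3, 'Ed');
""")

# C5-style: aggregate on a singleton attribute.
c5 = conn.execute("SELECT age, COUNT(*) FROM person GROUP BY age ORDER BY age").fetchall()

# Subquery that finds the most populated temporary city.
MOST_POPULATED_CITY = """
    SELECT city FROM address WHERE kind = 'Temporary_Address'
    GROUP BY city ORDER BY COUNT(*) DESC LIMIT 1"""

# C6-style: persons living in the most populated temporary city.
C6 = f"""
    SELECT p.name FROM person p
    JOIN address a ON a.person_id = p.person_id
    WHERE a.kind = 'Temporary_Address' AND a.city = ({MOST_POPULATED_CITY})"""

# C7-style: the same, restricted to persons with exactly three kids.
C7 = C6 + " AND (SELECT COUNT(*) FROM kid k WHERE k.person_id = p.person_id) = 3"

c6 = [row[0] for row in conn.execute(C6 + " ORDER BY p.person_id")]
c7 = [row[0] for row in conn.execute(C7 + " ORDER BY p.person_id")]
print(c5, c6, c7)
```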

Update Query Criteria


For the update, we updated about 0.5% of the data among 1 million, 4 million, and 16 million JSON
objects according to the criteria C8, C9, and C10 as follows:

C8: This query updates singleton data at the root level by using exact criteria on the predicates
“Name” and “Age”, to check the performance of a partial update operation.
C9: This query updates the value of the hierarchical attribute “City” contained in the
“Temporary_Address” object by using a predicate on the same hierarchical attribute “City”. The
update is performed on a batch of data simultaneously to test the results of an update on multiple
hierarchical levels.
C10: This query updates array data, namely the names in “Kids”, by using a predicate on the
singleton attribute “Name”. This query tests the results of an update on the data of arrays.
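
In RS format, each of these updates targets one table with a plain UPDATE statement. The sketch below illustrates the three cases under assumptions: SQLite stands in for MySQL, and the tables, columns, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE address (address_id INTEGER PRIMARY KEY, person_id INTEGER, kind TEXT, city TEXT);
CREATE TABLE kid (kid_id INTEGER PRIMARY KEY, person_id INTEGER, value TEXT);
INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Raj', 41);
INSERT INTO address VALUES
    (1, 1, 'Temporary_Address', 'Nanjing'),
    (2, 2, 'Temporary_Address', 'Nanjing');
INSERT INTO kid VALUES (1, 1, 'Bo'), (2, 2, 'Di');
""")

# C8-style: partial update of a root-level singleton attribute.
c8 = conn.execute("UPDATE person SET age = ? WHERE name = ? AND age = ?",
                  (35, "Ann", 34)).rowcount

# C9-style: batch update of a hierarchical attribute, filtered on the same attribute.
c9 = conn.execute(
    "UPDATE address SET city = ? WHERE kind = 'Temporary_Address' AND city = ?",
    ("Beijing", "Nanjing")).rowcount

# C10-style: update array items, filtered by a singleton attribute of the parent.
c10 = conn.execute(
    "UPDATE kid SET value = ? "
    "WHERE person_id IN (SELECT person_id FROM person WHERE name = ?)",
    ("Bob", "Ann")).rowcount

conn.commit()
print(c8, c9, c10)  # number of rows changed by each statement
```

Only the affected rows are rewritten, so no whole-document replacement is needed, in contrast to the binary formats discussed earlier.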

Deletion Query Criteria


For deletion, we set three criteria, C11, C12, and C13, as follows:

C11: This query deletes the single attribute “Age” by using exact criteria on the “Name” and
“Age” columns, to check the performance of a partial delete operation.
C12: This query deletes the hierarchical attribute “City” contained in the “Temporary_Address”
object by using a predicate on the singleton root attribute “Age”. The deletion is performed on a
batch of data simultaneously to test the deletion of attributes at various hierarchical levels.
C13: This query deletes whole records based on an aggregate count on the kid column, where the
number of kids in the array data is equal to 3.
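
The difference between column-wise and row-wise deletion can be sketched as below. Several points are assumptions rather than details given in the paper: deleting an attribute value is modeled as setting the column to NULL, SQLite stands in for MySQL, and the tables and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id) ON DELETE CASCADE,
    kind TEXT, city TEXT);
CREATE TABLE kid (
    kid_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id) ON DELETE CASCADE,
    value TEXT);
INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Raj', 41), (3, 'Lin', 29);
INSERT INTO address VALUES (1, 2, 'Temporary_Address', 'Lahore');
INSERT INTO kid VALUES (1, 3, 'Bo'), (2, 3, 'Cy'), (3, 3, 'Di');
""")

# C11-style, column-wise: remove one attribute value (modeled here as NULL).
conn.execute("UPDATE person SET age = NULL WHERE name = ? AND age = ?", ("Ann", 34))

# C12-style, column-wise on a nested attribute, filtered by a root attribute.
conn.execute(
    "UPDATE address SET city = NULL WHERE kind = 'Temporary_Address' "
    "AND person_id IN (SELECT person_id FROM person WHERE age = ?)", (41,))

# C13-style, row-wise: delete whole records of persons with exactly three kids;
# the child rows disappear through the cascading foreign keys.
conn.execute(
    "DELETE FROM person WHERE person_id IN "
    "(SELECT person_id FROM kid GROUP BY person_id HAVING COUNT(*) = 3)")
conn.commit()

remaining = conn.execute("SELECT name, age FROM person ORDER BY person_id").fetchall()
kids_left = conn.execute("SELECT COUNT(*) FROM kid").fetchone()[0]
print(remaining, kids_left)
```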

Insertion Query Criteria


For insertion, we set one criterion, C14, as follows. We inserted about 0.1% of the data for
1 million, 4 million, and 16 million objects:

C14: This query adds a batch of JSON objects to the data to check the performance of the
insertion operation.

Evaluations of Queries
Query criteria C1 to C4 show the projection of JSON objects. Query criteria C5 to C7 show the
selection using different parameters and predicates on multiple tables. Query criteria C8 to C10
show the update results. Query criteria C11 to C13 show the deletion operations and their results.
Query criterion C14 shows the insertion of 0.1% of the total JSON objects. We evaluated the query
criteria (C1 to C14) based on their average execution time. As shown in Table 8, we took JSON
in various formats: NoSQL binary, relational binary, and RS format (JSON converted to a
traditional relational structure). We applied the 14 query criteria to 1 million, 4 million, and
16 million JSON objects that contain strings, numbers, arrays, and hierarchical objects.
The queries show that our approach performs significantly better for selection, update, and
deletion, plausibly because, once the relational schema is created, manipulation and
analysis of the data are very efficient. Besides, aggregate and complex queries are difficult and
time-consuming in the binary formats. The experimental results are summarized in Table 8,
which shows some major differences in the performance of the query criteria on
different formats with different numbers of objects. We performed each query three times
with random parameters and report the average results in Table 8. Details
of each operation on all the formats are as follows.

Projection
Projecting JSON objects in both the NoSQL and relational binary formats requires displaying all
objects by iterating through the rows one by one. The RS format requires iterating through multiple
tables. The performance evaluation of the projection of JSON data is shown in Figure 5:

Table 8. Performance evaluation using JSON in various formats

                   1 Million Objects              4 Million Objects              16 Million Objects
Operation   Query  NoSQL    Relational  RS        NoSQL    Relational  RS        NoSQL    Relational  RS
                   Binary   Binary      Format    Binary   Binary      Format    Binary   Binary      Format
Projection  C1     9.26     10.05       20.65     35.84    38.89       79.92     137.25   148.96      306.07
            C2     10.05    11.16       21.59     38.90    43.19       83.55     148.96   165.41      320.01
            C3     11.95    12.51       24.81     46.34    48.41       96.01     177.12   185.42      367.74
            C4     12.13    13.68       23.59     46.94    52.94       91.29     179.79   202.77      349.65
Selection   C5     2.81     2.05        0.38      10.87    7.93        1.47      41.65    30.39       5.63
            C6     3.52     3.11        0.49      13.62    12.04       1.89      52.17    46.10       7.27
            C7     5.65     4.47        0.53      21.87    17.30       2.05      83.74    66.25       7.86
Update      C8     0.38     2.56        0.07      1.47     9.91        0.27      5.63     37.94       1.03
            C9     1.56     4.13        0.15      6.04     15.98       0.58      23.12    61.21       2.22
            C10    1.89     3.5         0.17      7.32     13.54       0.66      28.01    51.87       2.52
Deletion    C11    1.51     1.49        0.09      5.84     5.77        0.35      22.38    22.08       1.33
            C12    3.6      4.2         2.76      13.93    16.25       10.68     53.36    62.25       40.91
            C13    0.28     0.39        0.31      1.08     1.50        1.19      4.15     5.78        4.59
Insertion   C14    0.04     0.04        2.82      0.15     0.15        10.91     0.59     0.59        41.78

Measuring unit: time in seconds


Figure 5. Projection of JSON data

• NoSQL Binary Format: The NoSQL binary format outperforms the two other formats in the
projection of all JSON objects without any condition (query criteria C1) and in the projection
of hierarchical data (query criteria C3);
• Relational Binary Format: The relational binary format performs equally well or slightly worse
in the projection of the whole JSON document. However, it performs better in the projection of
array data (query criteria C4) as compared to the two other formats;
• RS Format: The RS format outperforms the two other formats in the projection of singleton
attributes (query criteria C2) without any condition. However, the other criteria (C1, C3, and C4)
require projecting the full object, hierarchical objects, and arrays, respectively, and therefore
making multiple joins. This makes it run slower than the two other formats. Here, indexing
techniques and the use of materialized views can make it more efficient.

Selection
For the selection, we applied query criteria C5, C6, and C7 to test simple and complex conditions on
all three formats, using aggregate functions on singleton attributes, hierarchical objects, and
arrays, respectively. Each criterion is applied to almost 1% of the total JSON objects, that is, of
1 million, 4 million, and 16 million objects. The performance evaluation of the selection of JSON
data is shown in Figure 6:

• NoSQL Binary Format: For the aggregate queries C5 to C7, the NoSQL binary format is
the slowest;
• Relational Binary Format: For all the aggregate queries C5 to C7, the relational binary format
is somewhat faster than the NoSQL binary format but slower than the RS format;

Figure 6. Selection of JSON data


• RS Format: The RS format outperforms the two other formats in all the aggregate queries C5 to C7
because it uses standard SQL functions. Criteria-based queries on the relational database are
simple to write with the standard built-in SQL functions. The RS format performs efficiently,
especially for complex queries and data analysis.

Update
For the update, we used query criteria C8 to C10. Query C8 updates only a singleton attribute
(column-wise) at the root level. Queries C9 and C10 update a batch of data in various rows, from
root to child at different levels, by updating hierarchical objects and arrays. We repeated each of
the three queries and report the average results. Each criterion is applied to almost 0.5% of
1 million, 4 million, and 16 million JSON objects. The performance evaluation of the update of JSON
data is shown in Figure 7:

• NoSQL Binary Format: The update operation in the NoSQL binary format is faster than in the
relational binary format. All queries C8 to C10 perform better than in the relational binary
format but not as well as in the RS format;
• Relational Binary Format: The relational binary format performs worse than the two other
formats. For a partial update, it is sometimes necessary to replace the existing instance with
a new one, and multiple functions are needed to perform a simple update on a single
instance or a batch of objects;
• RS Format: For the RS format with query criteria C8 to C10, updating singleton attributes,
multiple objects, and arrays is quite easy with simple SQL queries on specific tables.
Therefore, the RS format outperforms the two other formats on the update operation.

Deletion
We performed the delete operation in two ways. Column-wise deletion deletes attribute values from
objects. Row-wise deletion deletes a whole instance. The query criteria C11, C12, and C13 delete
singleton attribute values, hierarchical objects, and data from arrays, respectively. Each criterion
is applied to almost 1% of 1 million, 4 million, and 16 million JSON objects. The performance
evaluation of the deletion of JSON data is shown in Figure 8:

• NoSQL Binary Format: For query criteria C11, column-wise deletion of a singleton attribute
in the NoSQL binary format takes almost the same time as in the relational binary format and is
slower than in the RS format. The same holds for query criteria C12. For query criteria C13, the
NoSQL binary format is the slowest as it uses an aggregate function;

Figure 7. Update of JSON data


Figure 8. Deletion of JSON data

• Relational Binary Format: Deletion in the relational binary format for query criteria C11 to C13
is the slowest among all the formats;
• RS Format: The cascade delete operation quickly deletes all relevant records from all the related
child tables. Its execution is fast compared to the NoSQL and relational binary formats.
For query criteria C11 and C12, the RS format outperforms the two other formats.

Insertion
We performed the insertion of 0.1% of the total JSON objects on 1 million, 4 million, and
16 million JSON objects. The performance evaluation of the insertion of JSON data is shown in
Figure 9:

• NoSQL Binary Format: The NoSQL binary format outperforms the two other formats in the
insertion of new records for query criteria C14;
• Relational Binary Format: Insertion in the relational binary format is also fast for criteria
C14, as it only requires inserting a new row instead of populating many tables;
• RS Format: For the RS format, inserting a new instance requires populating all the relevant
tables, which is somewhat time-consuming. Once this is completed, however, all operations become
easy with standard SQL constructs and perform better than in the other approaches.

Figure 9. Insertion of JSON data

THEORETICAL AND PRACTICAL CONTRIBUTIONS

Both NoSQL and RDBMS are used in different scenarios, and each has its own features, limitations,
and uses in different applications. Developers need to learn both to enhance productivity.
Integrating the various approaches benefits developers by offering the best of all of them on a
single platform. JSON document stores in NoSQL lack powerful query and ACID transaction
features: they do not provide explicit locks and have weaker concurrency and atomicity properties
than traditional ACID-compliant databases. As a result, data analysis and processing
are very challenging and place a burden on the developer. For this reason, RDBMSs have been
extended to support JSON in binary format with SQL functions (i.e., SQL/JSON). However, these
functions are not yet standardized and vary across vendors, with different limitations and
complexities. More importantly, complex searches, partial updates, composite queries, and analysis
are cumbersome and time-consuming in SQL/JSON compared with standard SQL operations. It is
essential to integrate JSON into databases that use the standard SQL features, support the ACID
transactional model, and can manage and organize data efficiently.
In this paper, we enable JSON data to use relational databases for analysis and complex
queries. We show that the descriptive nature of a JSON schema can be used to create a
relational schema for the storage of JSON documents. The powerful SQL features can then
be used to gain consistency and ACID compatibility when querying JSON instances from the
relational schema. The approach proposed in this paper explicitly considers the JSON schema
in addition to JSON instance storage. Our approach uses multiple tables, considers data
types in the mapping from JSON to RDBMS, and develops detailed processing algorithms.
We quantitatively compared our approach with the NoSQL and relational binary formats. For
the storage of all JSON objects, the NoSQL binary format and the relational binary format are
faster than our approach (RS format). For selection, update, and deletion, the RS format
performs better, especially for complex aggregate queries. The insertion of new
objects, in contrast, is more efficient in the NoSQL and relational binary formats than in the RS
format. In the near future, we can further enhance our processing algorithms and introduce
indexing techniques to make our approach more efficient.

CONCLUSION AND FUTURE WORK

The NoSQL document database, an evolving technology for supporting JSON, still lacks a
powerful standard query language and does not fully support the ACID transaction model. In
addition, learning and adopting a new technology requires time and may affect software reusability.
Relational database systems are the main building blocks of many applications. They have evolved
and been enhanced through decades of continued research and offer a powerful standard query
language (SQL), numerous skilled experts, and abundant utilities, libraries, and APIs. In this
paper, we propose an approach for using JSON with relational databases. We develop an approach
that uses the JSON schema to bridge JSON and relational database systems, including the creation
of a relational schema from a JSON schema and the transformation of JSON data into relational data,
as well as the retrieval, addition, deletion, and updating of the stored JSON data with standard SQL
queries. Once the relational schema is created and the JSON data are stored in relational format,
querying and displaying the data become very easy because standard SQL queries can be used
to handle both simple and complex criteria. Queried results can be displayed in both relational
and JSON file formats. In addition, all RDBMS features can be used for efficiency, accuracy,
up-to-date data access, ACID transactions, etc. Our approach combines fast development
with JSON, consistency through relational databases, and efficient querying with SQL to achieve
efficient data management.
For future work, there are many directions for improving the whole process.
The mapping of the JSON schema to the relational schema can be enhanced, and the proposed
algorithms can be improved from the developer's perspective to make them more efficient. Our
research on creating intermediate mappings from JSON to a relational schema considers only the data
definition and data manipulation components. There are also other issues to be investigated in
future research. The definition of the JSON schema should be normalized before a relational
schema is created. Security issues in the transformation of a JSON schema into a relational schema,
considering users' roles and access rights, need to be addressed. In addition, other services of
relational databases, such as data backup and replication, are important research problems
to be solved in the future.

ACKNOWLEDGMENT

This work was supported in part by National Natural Science Foundation of China (61772269,
61370075, 61572118). Zongmin Ma is a corresponding author on this paper.


REFERENCES

Anderson, J. C., Lehnardt, J., & Slater, N. (2010). CouchDB - The Definitive Guide: Time to Relax. O’Reilly.
Atzeni, P., Jensen, C. S., Orsi, G., Ram, S., Tanca, L., & Torlone, R. (2013). The relational model is dead, SQL
is dead, and I don’t feel so good myself. SIGMOD Record, 42(2), 64–68. doi:10.1145/2503792.2503808
Baazizi, M. A., Colazzo, D., Ghelli, G., & Sartiani, C. (2019). Schemas and types for JSON data. In Proceedings
of the 22nd International Conference on Extending Database Technology (pp. 437-439). Academic Press.
Baazizi, M. A., Lahmar, H. B., Colazzo, D., Ghelli, G., & Sartiani, C. (2017, March). Schema inference for
massive JSON datasets. In Proceedings of the 20th International Conference on Extending Database Technology
(pp. 222-233). Academic Press.
Beyer, K., Cochrane, R. J., Josifovski, V., Kleewein, J., Lapis, G., Lohman, G., ... & Truong, T. (2005, June).
System RX: one part relational, one part XML. In Proceedings of the 2005 ACM SIGMOD international
conference on Management of data (pp. 347-358). ACM. doi:10.1145/1066157.1066197
Bonetta, D., & Brantner, M. (2017). FAD.js: Fast JSON data access using JIT-based speculative optimizations.
Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, 10(12), 1778–1789.
doi:10.14778/3137765.3137782
Bourhis, P., Reutter, J. L., Suárez, F., & Vrgoč, D. (2017). JSON: Data model, query languages and schema
specification. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database
Systems (pp. 123-135). ACM. doi:10.1145/3034786.3056120
Cánovas Izquierdo, J. L., & Cabot, J. (2013). Discovering implicit schemas in JSON data. In Proceedings of the
2013 International Conference on Web Engineering (pp. 68-83). Academic Press.
Cánovas Izquierdo, J. L., & Cabot, J. (2016). JSONDiscoverer: Visualizing the schema lurking behind JSON
documents. Knowledge-Based Systems, 103, 52–55. doi:10.1016/j.knosys.2016.03.020
Cattell, R. (2011). Scalable SQL and NoSQL data stores. SIGMOD Record, 39(4), 12–27.
doi:10.1145/1978915.1978919
Chandra, D. G. (2015). BASE analysis of NoSQL database. Future Generation Computer Systems, 52, 13–21.
Chasseur, C., Li, Y., & Patel, J. M. (2013). Enabling JSON document stores in relational systems. In Proceedings
of the 2013 International Workshop on the Web and Databases (pp. 1-6). Academic Press.
Chen, P. P. S. (1976). The entity-relationship model—toward a unified view of data. ACM Transactions
on Database Systems (TODS), 1(1), 9–36.
Chodorow, K., & Dirolf, M. (2010). MongoDB - The Definitive Guide: Powerful and Scalable Data Storage.
O’Reilly.
Florescu, D., & Fourny, G. (2013). JSONiq: The History of a Query Language. IEEE Internet Computing, 17(5),
86–90. doi:10.1109/MIC.2013.97
Fong, J., & Shiu, H. (2012). An interpreter approach for exporting relational data into XML documents
with structured export markup language. Journal of Database Management, 23(1), 49–77.
doi:10.4018/jdm.2012010103
Frozza, A. A., dos Santos Mello, R., & da Costa, F. D. S. (2018, July). An Approach for Schema Extraction of
JSON and Extended JSON Document Collections. In Proceedings of the 2018 IEEE International Conference
on Information Reuse and Integration (IRI) (pp. 356-363). IEEE. doi:10.1109/IRI.2018.00060
Hai, R. H., Quix, C., & Kensche, D. (2018). Nested schema mappings for integrating JSON. In Proceedings of
the 37th International Conference on Conceptual Modeling (pp. 397-405). Academic Press.
doi:10.1007/978-3-030-00847-5_28
Helland, P. (2017). XML and JSON are like cardboard. Communications of the ACM, 60(12), 46–47.
doi:10.1145/3132269

Hidders, J., Paredaens, J., & Van den Bussche, J. (2017). J-Logic: Logical foundations for JSON querying. In
Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (pp.
137-149). ACM. doi:10.1145/3034786.3056106
Hu, Y., & Dessloch, S. (2015). Temporal data management and processing with column oriented NoSQL
databases. Journal of Database Management, 26(3), 41–70. doi:10.4018/JDM.2015070103
Irshad, L., Ma, Z. M., & Yan, L. (2019). A survey on JSON data stores. In Emerging Technologies and Applications
in Data Processing and Management (pp. 45–69). Hershey, PA: IGI Global.
Junkkari, M., Vainio, J., Iltanen, K., Arvola, P., Kari, H., & Kekäläinen, J. (2016). Path expressions in SQL: A
user study on query formulation. Journal of Database Management, 27(3), 1–22. doi:10.4018/JDM.2016070101
Klettke, M., Störl, U., & Scherzinger, S. (2015). Schema extraction and structural outlier detection for JSON-based
NoSQL data stores. In Proceedings of the 2015 International Conference on Datenbanksysteme für Business,
Technologie and Web (pp. 425-444). Academic Press.
Langdale, G., & Lemire, D. (2019). Parsing gigabytes of JSON per second.
Li, Y., Katsipoulakis, N. R., Chandramouli, B., Goldstein, J., & Kossmann, D. (2017). Mison: A fast JSON
parser for data analytics. Proceedings of the VLDB Endowment International Conference on Very Large Data
Bases, 10(10), 1118–1129. doi:10.14778/3115404.3115416
Liu, Z. H. (2019). JSON data management in RDBMS. In Emerging Technologies and Applications in Data
Processing and Management (pp. 20–44). Hershey, PA: IGI Global.
Liu, Z. H., Hammerschmidt, B., & McMahon, D. (2014). JSON data management: Supporting schema-less
development in RDBMS. In Proceedings of the 2014 ACM SIGMOD International Conference on Management
of Data (pp. 1247-1258). ACM. doi:10.1145/2588555.2595628
Liu, Z. H., Hammerschmidt, B., McMahon, D., Liu, Y., & Chang, H. J. (2016). Closing the functional and
performance gap between SQL and NoSQL. In Proceedings of the 2016 ACM SIGMOD International Conference
on Management of Data (pp. 227-238). ACM. doi:10.1145/2882903.2903731
Ma, R. Z., Jia, X. Y., Cheng, J. W., & Angryk, R. A. (2016). SPARQL queries on RDF with fuzzy constraints
and preferences. Journal of Intelligent & Fuzzy Systems, 30(1), 183–195. doi:10.3233/IFS-151745
Ma, Z. M., Capretz, M. A. M., & Yan, L. (2016). Storing massive Resource Description Framework (RDF) data:
A survey. The Knowledge Engineering Review, 31(4), 391–413. doi:10.1017/S0269888916000217
Ma, Z. M., Lin, X. Q., Yan, L., & Zhao, Z. (2018). RDF keyword search by query computation. Journal of
Database Management, 29(4), 1–27. doi:10.4018/JDM.2018100101
Melton, J. (2003). Information Technology-Database Languages-SQL-Part 14: XML-Related Specifications
(SQL/XML) (ISO/IEC 9075-14: 2003). OASIS.
Mok, W. Y. (2016). Utilizing nested normal form to design redundancy free JSON schemas. International
Journal of Recent Contributions from Engineering, Science & IT, 4(4), 21–25.
Petković, D. (2017a). SQL/JSON standard: Properties and deficiencies. Datenbank-Spektrum, 17(3), 277–287.
doi:10.1007/s13222-017-0267-4
Petković, D. (2017b). JSON integration in relational database systems. International Journal of Computers and
Applications, 168(5), 14–19. doi:10.5120/ijca2017914389
Petković, D. (2018). Full-text search extensions for JSON documents: Design goals and implementations. In
Proceedings of the 2018 International Conference on Beyond Databases, Architectures and Structures (pp.
283-293). Academic Press. doi:10.1007/978-3-319-99987-6_22
Pezoa, F., Reutter, J. L., Suárez, F., Ugarte, M., & Vrgoč, D. (2016, April). Foundations of JSON schema. In
Proceedings of the 25th International Conference on World Wide Web (pp. 263-273). International World Wide
Web Conferences Steering Committee. doi:10.1145/2872427.2883029
Piech, M., & Marcjan, R. (2018). A new approach to storing dynamic data in relational databases using JSON.
Computer Science, 19(1), 5. doi:10.7494/csci.2018.19.1.2505

Tahara, D., Diamond, T., & Abadi, D. J. (2014). Sinew: A SQL system for multi-structured data. In Proceedings
of the 2014 ACM SIGMOD International Conference on Management of Data (pp. 815-826). ACM.
doi:10.1145/2588555.2612183
Wang, L., Zhang, S., Shi, J., Jiao, L., Hassanzadeh, O., Zou, J., & Wang, C. (2015). Schema management for
document stores. Proceedings of the VLDB Endowment, 8(9), 922–933.
Wright, A., Andrews, H., & Luff, G. (2016). JSON Schema validation: A vocabulary for structural validation of
JSON. IETF Internet-Draft.

ENDNOTES
1 https://www.quora.com/Is-it-okay-to-use-JSON-as-a-database
2 https://www.toolsqa.com/rest-assured/jsonpath-and-query-json-using-jsonpath/
3 http://json-schema.org/latest/json-schema-validation.html
4 https://mockaroo.com/

Lubna Irshad is currently a master's student at Nanjing University of Aeronautics and Astronautics, China. Her
research interests include databases and Web data management.

Li Yan is currently a full professor at Nanjing University of Aeronautics and Astronautics, China. Her research
interests include web data modeling and temporal/spatiotemporal data management. She has published more
than fifty papers on these topics. She is the author of three monographs published by Springer.

Zongmin Ma is currently a full professor at Nanjing University of Aeronautics and Astronautics, China. His research
interests include databases, the Semantic Web, and knowledge engineering with a special focus on information
uncertainty. He has published more than one hundred and seventy papers on these topics. He is also the author
of five monographs published by Springer. He is a Fellow of the IFSA and a senior member of the IEEE.
