Paper 6 - Schema-Based JSON Data Stores in Relational Databases
ABSTRACT
JSON is a simple, compact, and lightweight data exchange format for communication between web
services and client applications. NoSQL document stores have evolved with the popularity of JSON;
they support schema-less JSON storage, reduce cost, and facilitate quick development. However,
NoSQL still lacks a standard query language and supports the eventually consistent BASE transaction
model rather than the ACID transaction model, which places a considerable burden on the developer.
Relational database management systems (RDBMS) support JSON in binary format with SQL
functions (also known as SQL/JSON). However, these functions are not yet standardized and vary
across vendors, each with different limitations and complexities. More importantly, complex searches,
partial updates, composite queries, and analyses are cumbersome and time consuming in SQL/JSON
compared to standard SQL operations. It is therefore essential to integrate JSON into databases that
use standard SQL features, support the ACID transaction model, and can manage and organize
data efficiently. In this article, we enable JSON data to use relational databases for analysis and
complex queries. We show that the descriptive nature of the JSON schema can be used to create
a relational schema for the storage of JSON documents. The powerful features of SQL can then be
used to query JSON instances from the relational schema with consistency and ACID compliance.
This approach opens a gateway to combining the best features of both worlds: the fast
development of JSON, the consistency of the relational model, and the efficiency of SQL.
Keywords
JSON, JSON Schema, NoSQL, Relational Database Systems, Schema-Less
INTRODUCTION
Web services are the means of exchanging information for mobile applications, search engines,
enterprise applications, and many more, through formats like XML (Extensible Markup Language) and
JSON (JavaScript Object Notation). For serialization, the data interchange format plays a great role
in the rate of data transfer and performance (Helland, 2017). The structure of JSON is similar to the
type model of many programming languages, especially JavaScript (JS), which makes it a flexible, easy-to-use, and
DOI: 10.4018/JDM.2019070103
Copyright © 2019, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Journal of Database Management
Volume 30 • Issue 3 • July-September 2019
independent text format. It is simple and requires no prior knowledge to learn and use. JSON
fills a particular niche: integrating multiple services across many platforms. XML
and JSON, both semi-structured and hierarchical data models for data exchange, are popular
serialization techniques in Web development. Compared to JSON, XML is heavy and complex and
requires additional libraries to exchange data. Its model consists of complex hierarchical tags and
requires more bytes for data transfer even for a small task. JSON is lightweight, compact, and
close to many programming languages. It is an emerging data-interchange format
compared to XML and many others like Atom, RDF (Ma, Lin, Yan, & Zhao, 2018; Ma, Jia, Cheng, &
Angryk, 2016), REBOL, Gellish, YAML, and so on.
A JSON document consists of objects, which are made up of attributes/keys (of string type), values
(String, Number, Boolean, and NULL), arrays, and nested objects. An object is a collection of pairs
(attributes and values), and the value of a pair can itself be a JSON object (Bourhis, Reutter, Suárez,
& Vrgoč, 2017). A JSON document can hold hierarchical data such as nested objects and arrays. Being
simple, easy, and lightweight, JSON has become the format of choice for most Web services. JSON is
especially favored for storing temporary data (such as the contents of a Web form) and for exchanging
information between servers and clients. Note that JSON by itself cannot serve for permanent storage,
data analysis, and the processing of complex queries¹. Database support for storing and querying JSON
therefore came into existence (Liu, 2019; Irshad, Ma, & Yan, 2019; Junkkari et al., 2016; Hu &
Dessloch, 2015). Although JSON can work standalone, database support provides secure, easy, and fast
processing of information. The situation resembles that of XML in databases (Fong & Shiu, 2012). In
addition, it is difficult to manage the sharing of data among multiple users in a document, but it is
easy in databases. Two types of database models support JSON data management: relational database
management systems (RDBMS) and NoSQL (Not only SQL) (Liu, 2019; Irshad, Ma, & Yan, 2019). We
categorize them on the basis of their data structure, model, data organization, transaction model,
and querying mechanism.
NoSQL databases encompass document databases, key-value stores, columnar stores, and
graph stores (Ma, Capretz, & Yan, 2016). JSON document stores belong to the category
of NoSQL document databases, which have evolved with the popularity of JSON. NoSQL data stores are
gaining popularity but still lack a powerful standard query language. Although, compared to many
document stores, the query language of MongoDB is easy and close to SQL, it requires MapReduce
for aggregation and complex queries. CouchDB provides better consistency but has a complex query
language. Learning a new language for each application or project is a cumbersome
and time-consuming task. Standardization of the query language across all NoSQL document stores is
the biggest demand at present. The NoSQL document stores CouchDB and MongoDB claim that they will
provide the ACID (Atomicity, Consistency, Isolation, Durability) transaction model in a coming version.
However, first, the ACID model is implemented at the single-document level only and, secondly, it is
not yet fully tested and available. The BASE transaction model, under which data are only eventually
consistent, is not suitable for all applications, in contrast to the ACID model.
For a traditional RDBMS, it is compulsory to design the schema carefully before storing data
in the relational tables, creating relations among tables, and then querying data (Liu, Hammerschmidt,
& McMahon, 2014). It enforces relationships among tables using the entity-relationship model (Chen,
2002) by breaking down entities into tables and relations (Liu, Hammerschmidt, & McMahon, 2014).
Good schema design helps fast and efficient querying. Advocates of NoSQL document stores
claim that schema maintenance costs a lot for rapidly growing data, while proponents of RDBMS
argue that schema-less development results in long-term issues (Tahara, Diamond, & Abadi, 2014).
NoSQL databases accommodate semi-structured and unstructured data that may change from time to time,
while relational databases deal with structured data. NoSQL does not have a powerful, generalized
query language, while RDBMS have SQL as a strong native language for fast and efficient querying. NoSQL
follows a “schema later or never” approach, while RDBMS follow the relational data model (Melton,
2003) based on a “schema earlier, data afterward” approach (Liu et al., 2016). Relational databases
strictly follow the ACID model, while NoSQL databases follow BASE principles (Chandra,
2015). NoSQL document stores generally do not provide explicit locks and have weaker concurrency
and atomicity properties than traditional ACID-compliant databases (Cattell, 2011).
In a contemporary world where time to market is crucial and agile development is gaining
popularity, the no-schema (or never-schema) nature of JSON really helps quick development. Eliminating
schema definition overhead reduces workload and makes the application functional quickly without
worrying about incidental details. However, these incidental details may later cause many difficulties
in the organization, maintenance, reusability, and sharing of data (Tahara, Diamond, &
Abadi, 2014). Besides, dealing with huge JSON documents requires validation rules, constraint
checking for integrity, and the joining of multiple documents. The notion of structure or schema is
essential for traditional application development and data analysis (Klettke, Störl, & Scherzinger,
2015). JSON schema plays this role for JSON documents (Baazizi, Colazzo, Ghelli, & Sartiani, 2019;
Pezoa et al., 2016). JSON schema is used optionally to describe the structure of a JSON document by
defining attributes with their names, types, properties, constraints, hierarchy, and relations with other
attributes. It describes the structure, defines interaction control, and affirms the validity of a JSON
document. It differs from a relational schema in two respects. First, it is not compulsory to design a
JSON schema prior to the creation of a JSON document. Secondly, apart from validation, it plays little
role in querying JSON documents efficiently. The representation of data in the JSON format is
increasing, and a substantial increase in the amount of data will later require analysis too.
Pragmatically, analysis requires up-to-date information, a structure definition, and an efficient
query language. The descriptive power of JSON schema can be used to create the schema in an RDBMS,
and the prevailing SQL can then provide quick, efficient, and reliable querying.
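To make the descriptive and validating roles of JSON schema concrete, the following minimal sketch defines a small schema and checks two instances against it. The schema, the attribute names, and the helper validation_errors are hypothetical and cover only the "required" and "type" keywords; this is an illustration under those assumptions, not a full JSON schema validator.

```python
import json

# Hypothetical schema describing a person with a mandatory name.
schema = json.loads("""
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age":  {"type": "integer"}
  },
  "required": ["name"]
}
""")

# Python types that correspond to a few JSON schema type names.
PY_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}

def validation_errors(instance, schema):
    """Check only "required" and top-level property types; return a list of problems."""
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required attribute: {key}")
    for key, rule in schema.get("properties", {}).items():
        expected = PY_TYPES.get(rule.get("type"))
        if key not in instance or expected is None:
            continue
        value = instance[key]
        # bool is a subclass of int in Python, so reject it explicitly for "integer".
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            errors.append(f"wrong type for attribute: {key}")
    return errors

print(validation_errors({"name": "Alice", "age": 30}, schema))
print(validation_errors({"age": "thirty"}, schema))
```

The same property definitions that drive this check (names, types, required attributes) are the information a relational schema needs for column names, column types, and NOT NULL constraints.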
Both NoSQL document databases and RDBMS are used in different scenarios according to
requirements. The developer needs to learn both of them to enhance productivity. However,
integrating the two approaches would be more beneficial, giving the developer the best of both on
a single platform. This paper aims to combine both worlds, taking the best of each and resolving
the issues mentioned above. We intend to incorporate JSON into RDBMS to obtain the benefits of both
platforms along with the powerful features of SQL. This can reduce the burden of learning a new
technology and a new language for each application, as is required with NoSQL
document databases, and of learning new JSON-specific operations, as in relational database systems.
For this vision, we articulate a new approach that will open new horizons for research. A few efforts
have been devoted to JSON data management with RDBMS, but the proposed approaches mainly focus on
schema-less development in RDBMS (Liu, Hammerschmidt, & McMahon, 2014; Liu, 2019). In contrast
to (Liu, Hammerschmidt, & McMahon, 2014; Liu, 2019), in (Chasseur, Li, & Patel, 2013;
Petković, 2017a & 2017b), JSON objects are split and stored in one or several relational tables. In
this paper, we suggest using the descriptive nature of JSON schema to create a relational schema for
JSON document management by using the following three principles.
Query Principle
For querying, we apply some generalized transformation principles along with traditional SQL
statements to perform all operations. As we use a simple relational structure, querying is as simple
and easy as in any relational database system. Our approach allows the user to treat JSON data as
a relational database by querying it directly with standard SQL statements.
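As an illustration of the query principle, suppose a JSON object with one nested object has already been stored in two tables linked by object_id. The table names, column names, and sample values below are assumptions made for this sketch, and SQLite is used only because it ships with Python; the point is that a lookup on a nested attribute becomes an ordinary join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personal (
        object_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        age       INTEGER
    );
    CREATE TABLE temporary_address (
        address_id INTEGER PRIMARY KEY,
        object_id  INTEGER NOT NULL REFERENCES personal(object_id),
        city       TEXT
    );
    INSERT INTO personal VALUES (1, 'Alice', 30), (2, 'Bob', 41);
    INSERT INTO temporary_address VALUES (10, 1, 'Lahore'), (11, 2, 'Nanjing');
""")

# A condition on the parent plus a nested attribute in the result is a plain
# join with a WHERE clause; no JSON-specific function is involved.
rows = conn.execute("""
    SELECT p.name, a.city
    FROM personal AS p
    JOIN temporary_address AS a ON a.object_id = p.object_id
    WHERE p.age > ?
    ORDER BY p.name
""", (35,)).fetchall()

print(rows)
```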
Projection Principle
Data can be presented to the user in relational and JSON format. Owing to the basic relational
structure, queried data can be displayed easily in relational format. To display data in JSON format,
we suggest recomposing the JSON document from the relational tables according to the unique object_id
of the stored JSON objects. We propose an algorithm (Algorithm 3) to transform user-queried data into JSON format.
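The recomposition step can be sketched as follows. This is not Algorithm 3 itself, only a minimal illustration of rebuilding one JSON object from a master row and its child rows by object_id; the tables personal and phone, their columns, and the helper recompose are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted to dictionaries
conn.executescript("""
    CREATE TABLE personal (object_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE phone (
        phone_id  INTEGER PRIMARY KEY,
        object_id INTEGER REFERENCES personal(object_id),
        number    TEXT
    );
    INSERT INTO personal VALUES (1, 'Alice', 30);
    INSERT INTO phone VALUES (1, 1, '555-0100'), (2, 1, '555-0101');
""")

def recompose(conn, object_id):
    """Rebuild one JSON object from its master row and its child rows."""
    master = conn.execute(
        "SELECT name, age FROM personal WHERE object_id = ?", (object_id,)
    ).fetchone()
    if master is None:
        return None
    doc = dict(master)  # singleton attributes become key/value pairs
    phones = conn.execute(
        "SELECT number FROM phone WHERE object_id = ? ORDER BY phone_id",
        (object_id,),
    ).fetchall()
    doc["phone"] = [row["number"] for row in phones]  # child rows become an array
    return doc

print(json.dumps(recompose(conn, 1)))
```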
Our approach is simple and close to relational database architecture. It is easy to query JSON
data by using traditional SQL statements. To the best of our knowledge, our approach is the first
effort to manage JSON data in an RDBMS on the basis of JSON schema.
We hope this approach will help JSON and RDBMS to complement each other on a single platform
and help developers to complete their tasks efficiently and rapidly.
Details of the above-mentioned principles are discussed in the following sections. Section 2 highlights
the related work. Section 3 defines generalized mapping principles and an algorithm to create a
relational schema from JSON schema. Section 4 states the JSON instance operation principles for the
projection, selection, insertion, updating, and deletion of JSON instances in the relational schema.
In addition, two algorithms are defined: one to store JSON data in the created relational schema and
a second to display user-queried results in JSON file format. Section 5 presents implementation
and experimental details. Section 6 draws conclusions and outlines future work.
RELATED WORK
JavaScript Object Notation (JSON) is unstructured, flexible, and readable by humans and machines.
With the rapid increase of data and users, JSON has become a hotspot for data management
on the Web. Various approaches have been proposed for JSON data representation and processing (Bourhis,
Reutter, Suárez, & Vrgoč, 2017). Many database management systems, both NoSQL document stores and
relational databases, support the JSON format to store, retrieve, and organize JSON data. JSON documents
can be stored as text in key-value stores (NoSQL) or in relational database systems as a JSON datatype.
set of design goals that a full-text search extension of the SQL/JSON language should support; the
given set is valid for any full-text search language for JSON documents.
In addition to the JSON document, JSON schema is also an important part of JSON data and generally
plays a crucial role in JSON data management (Baazizi, Colazzo, Ghelli, & Sartiani, 2019; Pezoa
et al., 2016; Wright, Andrews, & Lu, 2016). In (Hai, Quix, & Kensche, 2018), for example, JSON
schema is used for JSON data integration, where a nested mapping representation is generated.
Recognizing that schema information is sometimes essential for data retrieval, integration, and analysis
tasks, some efforts have been devoted to dealing with JSON schema. In (Mok, 2016), using Nested Normal
Form as a guide, a JSON schema design methodology is proposed for designing redundancy-free
JSON schemas; it begins with UML use case diagrams, communication diagrams, and
class diagrams that model the system under study. Note that a JSON schema is not always available
for direct use. To discover implicit schemas in JSON data, a model-based approach is proposed in
(Cánovas Izquierdo & Cabot, 2013) to generate the underlying schema of a set of JSON documents.
In (Klettke, Störl, & Scherzinger, 2015), an algorithm is introduced for schema extraction that
operates outside of the NoSQL data store. Identifying a JSON type language, Baazizi et al. (2017) deal
with the problem of inferring a schema from massive JSON datasets. They design a schema inference
algorithm and implement it on Spark. Frozza et al. (2018) propose an approach for extracting a
schema from a JSON or extended JSON document collection stored in a NoSQL document-oriented
database or other document repository. Focusing on the use of open Web APIs from different sources
with JSON as the interchange data format, Cánovas Izquierdo & Cabot (2016) develop a tool named
JSONDiscoverer, which can discover and visualize the implicit schema of JSON documents as well
as possible composition links among JSON-based Web APIs. More importantly, for
JSON document stores, a schema management framework is proposed in (Wang et al., 2015),
which is used to discover and persist schemas of JSON records in a repository and supports queries
and schema summarization.
scalability and a document query mechanism. Here, the document query mechanism handles documents
like attachments, and the fields consist of text, numbers, Booleans, and lists. CouchDB uses a RESTful
HTTP API to save, update, retrieve, and delete data in JSON documents. Querying CouchDB is somewhat
complex and is possible only on predefined views. These predefined views make it possible to filter
documents, retrieve data in a specific order, and create indexes. CouchDB uses MapReduce for defining
views and for indexing, searching, and aggregating data, which is cumbersome. Here,
MapReduce has two parts: it looks at all of the documents, and it creates a mapping of the documents
for further processing. Mapping is a one-time process and occurs again only if a document is updated.
Like MongoDB, CouchDB provides a replication model named “Eventual Consistency.” In this model,
changes are copied to the nodes one by one without affecting other nodes, and eventually
all nodes sync (the MongoDB replication model serves failover rather than scalability). Regarding
document updates, on a commit operation, all updates are flushed to disk by appending them to the end
of the file, which lowers the risk of conflicts. A developer can select what to replicate, filter
documents, and make a copy on a specific device. This helps in optimizing the memory usage of mobile
devices, so many platforms recommend CouchDB for mobile applications.
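The view mechanism described above can be illustrated with a small emulation. CouchDB views are normally written as JavaScript functions; the Python sketch below only mimics the idea of a map step that emits key/value pairs into a sorted index and a reduce step that aggregates them. The documents and the function names map_order, build_view, and reduce_sum are hypothetical.

```python
from collections import defaultdict

docs = [
    {"_id": "a", "type": "order", "customer": "alice", "total": 20},
    {"_id": "b", "type": "order", "customer": "bob", "total": 5},
    {"_id": "c", "type": "order", "customer": "alice", "total": 7},
    {"_id": "d", "type": "note", "text": "not an order"},
]

def map_order(doc):
    """Map step: emit (key, value) pairs for the documents of interest."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]

def build_view(documents, map_fn):
    """Run the map function over every document and keep the pairs sorted by key."""
    return sorted(pair for doc in documents for pair in map_fn(doc))

def reduce_sum(view):
    """Reduce step: aggregate the emitted values per key."""
    totals = defaultdict(int)
    for key, value in view:
        totals[key] += value
    return dict(totals)

view = build_view(docs, map_order)
print(reduce_sum(view))
```

Even a simple per-customer total requires a predefined view of this kind, whereas in SQL the same result is a single GROUP BY query.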
JSON document stores like MongoDB (Chodorow & Dirolf, 2010) and CouchDB (Anderson,
Lehnardt, & Slater, 2010) can store and retrieve JSON objects in their primary format. An efficient
binary format, its query language, and secondary indices have made MongoDB the leading NoSQL
document store compared to CouchDB (Chasseur, Li, & Patel, 2013). Couchbase Server 4.0
introduced N1QL, a powerful query language that extends SQL to JSON and enables developers
to leverage both the power of SQL and the flexibility of JSON (Petkovic, 2017).
Note that JSON document stores, along with their many advantages, have some disadvantages compared to
relational database systems. Most importantly, JSON document stores lack a powerful query language and
ACID transaction features (Chandra, 2015). Data analysis or processing without a powerful query language
and ACID transaction properties is very challenging and a burden on the developer. Document stores
generally do not provide explicit locks and have weaker concurrency and atomicity properties than
traditional ACID-compliant databases (Cattell, 2011). On the one hand, JSON document stores
help the developer by reducing development overhead and cost and can provide quick services.
On the other hand, the developer loses the rich power of native SQL query constructs, analytical
processing, safety-guaranteed transactions, and more. For these reasons, CouchDB and MongoDB
have recently been trying to achieve ACID-compliant transactions for a single document. They want to
leverage the capabilities of traditional relational databases and make their design more mature. ACID
compliance in NoSQL document stores will be an added benefit for data collected or generated
in the future. With the popularity of NoSQL databases, the JSON persistence of RDBMS
became an open question (Liu, Hammerschmidt, & McMahon, 2014; Liu, 2019). RDBMS
have been serving and evolving for more than 30 years; they have faced many threats and yet have not
only survived but also excelled afterward (Atzeni et al., 2013). A great deal of work has been done on
RDBMS and SQL, and it does not seem sensible to discard the efforts of many years and start again
from the outset (Atzeni et al., 2013).
valid JSON instances as a whole in a binary or text column. It stores JSON as an object along with a
unique identifier, which can identify each instance, and uses the latest SQL/JSON constructs for its
management. As a result, the functional, performance, and conceptual gap between the SQL and NoSQL
worlds can be reduced. The biggest advantage of using the relational model for JSON data is that the
capabilities of database servers (e.g., fast tuning, high scalability, and refined optimization
techniques) can be applied to JSON data to achieve high performance (Beyer et al., 2005).
The approach in (Wright, Andrews, & Lu, 2016) emphasizes managing both relational and JSON
data under the single umbrella of an RDBMS, where, once a JSON path query language is incorporated, SQL
can serve both structured and unstructured data. It proposes saving native JSON objects in
relational databases without shredding them into a relational model. The JSON object instances
are thus stored in a relational schema without a formal schema definition. Nowadays, major database
systems, such as Oracle, MySQL, and PostgreSQL, provide a JSON data type (binary) and a textual
type for storing JSON data. Both the text and binary data types support JSON document management,
but it is recommended to use the binary data type with the provided JSON-specific functions. Almost
all database vendors use the same approach on the front end: storing JSON document instances in
binary format, using JSON-specific functions, and retrieving JSON-formatted data for querying. To
traverse the path hierarchy, JSON instance data can be accessed by using the “.” notation or the
“->” and “->>” operators, depending on the relational system.
In (Liu, Hammerschmidt, & McMahon, 2014), three architectural principles are proposed, which
facilitate a schema-less development style within an RDBMS. As a result, RDBMS users can store,
query, and index JSON data without requiring schemas. It is shown in (Liu, Hammerschmidt, &
McMahon, 2014) that the three principles can be applied to the Oracle RDBMS Server with relatively
little effort. With the approach proposed in (Liu, Hammerschmidt, & McMahon, 2014), an RDBMS
can manage both relational data and JSON data in one platform, where SQL is used with an embedded
JSON path language as a single declarative language to query both relational data and JSON data.
To support SQL/JSON standard defined operations effectively and efficiently, the design approaches
to store, index, and query JSON in the kernel of an RDBMS are presented in (Liu, 2019). The issue of
how the JSON data model and the classical relational model complement each other in a single RDBMS is
further addressed. Concentrating on the field of criminal data, in (Piech & Marcjan, 2018), a solution
for storing open-schema data in a relational database is proposed to support JSON as a native type. The
proposed solution is implemented in two relational databases, PostgreSQL and MySQL.
Only valid JSON documents are stored, according to JSON validation rules. A somewhat complex part
is the JSON-specific functions. Being new and not very well understood, they require
a lot of knowledge to use for complex queries and analysis compared to traditional SQL statements. In
addition, querying and handling the same attribute with multiple values also requires many embedded
constructs, which must be part of the query to retrieve full data and avoid errors. For recursive
structures, it is difficult to manage variation in many attributes, changes in cardinalities among
attributes, and large hierarchical structures. Last but not least, many APIs are readily available to
work with JSON, but, in general, they work well with text/String rather than binary data, which
requires Web developers to do a lot of coding. SQL/JSON is the standard for querying JSON data, but the
provided JSON-specific functions and operations are not standardized. Several drafts by different
organizations vary from one relational system to another in terms of storing, updating, retrieving,
and handling attributes with multiple data types in a JSON document. With all the issues mentioned
above, shredding and saving JSON objects is not only tedious but also time consuming.
One idea that has not been investigated yet is the use of JSON schema in addition to the JSON
document across platforms. JSON schema describes the structure and can be used to validate JSON
documents. JSON schema can help to create a relational schema, which is helpful in converting a user
query into SQL parameters for information retrieval. Because the result is simple and close to
relational database architecture, it will be easy to query JSON data from the created
relational tables with traditional SQL operators and less overhead.
JSON schema is used to define, describe, and validate JSON documents. In this paper, we
articulate an approach that uses JSON schema with RDBMS. We aim to combine the best features
of RDBMS with JSON to obtain the advantages of both. For this vision, we suggest using the descriptive
nature of JSON schema to create a relational schema for the management of JSON documents. We develop
an algorithm to transform JSON schema into a relational schema. The approach comprises a set of
principles that define the details of such a transformation. For clarity, we take the
JSON schema in Figure 1 and transform it into a relational schema systematically. We also take the
sample JSON data in Figure 2 and store it in the created schema.
For clarity, we take the sample JSON schema file in Figure 1 and map it
to the relational structure tables (RS_Tables) given in Table 3 and Table 4.
A pictorial presentation is also given in Figure 3. A separate table is not
required for container objects that just hold other objects and do not possess any attributes.
“Address”, for example, is a container object that does not have any attributes but just holds the
Temporary_Address and Permanent_Address objects.
Table 3 is created by mapping the sample JSON schema file in Figure 1. It stores all singleton
attributes and some information about objects that further contain objects and attributes. The value
“Personal” in the “objectName” column is the root object, which holds all singleton attributes,
hierarchical objects, and arrays. The value “null” in the “parentLevel” column and the value “0” in the
“level” column show that it is the root object. The value “0” in the “parentLevel” column shows that the
attribute is contained directly in the root object, the value “1” shows that it is contained within
the object with the value “0” in the “level” column, and so on. To display the result to the user in
JSON file format, the “objectName” column defines
the attribute name, and the “attributeCategory” column identifies the type of braces to be used
according to the category of the attribute. The “level” and “parentLevel” columns are used to
reconstruct the hierarchy of objects, which helps to show output to the user according to the
structure defined in the JSON schema file.
To keep the details of all objects, we create a child table, shown in Table 4, which consists
of singleton attributes, objects, container objects, and arrays. “RS_CID” is the primary key of the
child table and is automatically generated and incremented. The column “attributeName” holds
the actual attribute title presented in the JSON schema file, and “columnName” holds the alias of
that attribute. We use aliases to shorten long attribute names and to avoid conflicts
with database keywords. We map back to and reuse “attributeName” when displaying results to
the user. The column “Required” preserves the constraint that the attribute is mandatory.
The column “RS_ID” is a foreign key to the master table shown in Table 3, which
links each container attribute to its parent attribute.
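The two RS tables described above could be declared roughly as follows. This is a sketch under assumptions: Tables 3 and 4 are not reproduced here, so the exact column list, the dataType column, the alias p_name, and the level values of the sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RS_Master_Table (
        RS_ID             INTEGER PRIMARY KEY AUTOINCREMENT,
        objectName        TEXT NOT NULL,
        attributeCategory TEXT NOT NULL,   -- e.g. 'object' or 'array'
        level             INTEGER NOT NULL,
        parentLevel       INTEGER          -- NULL marks the root object
    );
    CREATE TABLE RS_Child_Table (
        RS_CID        INTEGER PRIMARY KEY AUTOINCREMENT,
        attributeName TEXT NOT NULL,       -- name as written in the JSON schema
        columnName    TEXT NOT NULL,       -- alias used as the SQL column name
        dataType      TEXT NOT NULL,       -- assumed column, not named in the text
        Required      INTEGER NOT NULL DEFAULT 0,
        RS_ID         INTEGER NOT NULL REFERENCES RS_Master_Table(RS_ID)
    );
""")

# Root object "Personal" at level 0 and one nested object at level 1.
conn.execute(
    "INSERT INTO RS_Master_Table (objectName, attributeCategory, level, parentLevel) "
    "VALUES ('Personal', 'object', 0, NULL)"
)
conn.execute(
    "INSERT INTO RS_Master_Table (objectName, attributeCategory, level, parentLevel) "
    "VALUES ('Temporary_Address', 'object', 1, 0)"
)
# One singleton attribute of the root object, stored under an alias.
conn.execute(
    "INSERT INTO RS_Child_Table (attributeName, columnName, dataType, Required, RS_ID) "
    "VALUES ('name', 'p_name', 'VARCHAR(255)', 1, 1)"
)

root = conn.execute(
    "SELECT objectName FROM RS_Master_Table WHERE parentLevel IS NULL"
).fetchone()
print(root[0])
```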
Mapping Principles
The JSON schema file defines a variety of constructs to describe all properties of a JSON document. We
use these constructs to create tables and their relationships by following the basic structural
definition of a relational schema. We parse the JSON schema and keep all information in the RS_Tables.
Then, we read the RS_Tables to create tables and their relationships with the parent table according to their level in the
hierarchy. In RS_Master_Table, the column “level” defines the hierarchy of tables. Here, level “0”
defines the master table, level “1” defines the child table, level “2” defines the child of the child
table, and so on. JSON schema holds a variety of keywords.
Properties Keyword
The properties keyword encompasses the name, type, format, and restraints of the attributes in a JSON
document. We use “name” to define the column name, “type” to define the column type, and restraints to
define database constraints on the columns. Restraints are predefined validation keywords for every
data type in JSON schema. There are structural and semantic validation restraints³ for numbers (e.g.,
minimum, exclusiveMinimum, maximum, exclusiveMaximum, multipleOf), strings (e.g., maxLength, minLength,
pattern), arrays (e.g., maxItems, minItems, uniqueItems, additionalItems, contains, items), objects
(e.g., maxProperties, minProperties, required, properties, patternProperties), and so on (Wright,
Andrews, & Lu, 2016). They are used to validate JSON document instances. Note that it is not necessary
to map all validation rules to the relational schema, as we only store valid JSON documents (a document
is valid if it fulfills all restraints).
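Where some restraints are nevertheless carried over to the database, the translation can be sketched as below. The helper column_constraints and the choice of keywords (minimum, maximum, minLength, maxLength, plus membership in "required") are assumptions made for illustration; the text does not state which restraints the authors map.

```python
def column_constraints(column, prop, required=False):
    """Translate a few JSON schema validation keywords into SQL constraint text."""
    parts = []
    if required:
        parts.append("NOT NULL")  # attribute listed under "required"
    checks = []
    if "minimum" in prop:
        checks.append(f"{column} >= {prop['minimum']}")
    if "maximum" in prop:
        checks.append(f"{column} <= {prop['maximum']}")
    if "minLength" in prop:
        checks.append(f"LENGTH({column}) >= {prop['minLength']}")
    if "maxLength" in prop:
        checks.append(f"LENGTH({column}) <= {prop['maxLength']}")
    if checks:
        parts.append("CHECK (" + " AND ".join(checks) + ")")
    return " ".join(parts)

age = {"type": "integer", "minimum": 0, "maximum": 150}
print("age INTEGER " + column_constraints("age", age, required=True))
```

The column name and numeric bounds are interpolated directly into the SQL text, which is acceptable only because they come from a trusted schema file.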
Type Keyword
To restrict the type of attributes in JSON schema, it is fundamental to use the “type” keyword.
The type keyword specifies the type of value that a specific attribute/key can accept (Baazizi,
Colazzo, Ghelli, & Sartiani, 2019). We use the specification of the “type” keyword to map
JSON data types to SQL database types, as given in Table 5. A string-typed attribute can hold
values in a defined format such as DATE, TIME, DATETIME, and so on. We define
the data type of columns in the table by taking into account the format described in the JSON
schema file. This SQL data type mapping may vary among relational database vendors
such as MySQL and PostgreSQL.
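A type mapping of this kind can be sketched as a lookup that also consults the “format” keyword for strings. Table 5 is not reproduced here, so the concrete SQL types in TYPE_MAP and FORMAT_MAP and the helper sql_type are assumptions, and they would differ between vendors.

```python
# Assumed mapping from JSON schema types to SQL types; vendors differ.
TYPE_MAP = {
    "string": "VARCHAR(255)",
    "integer": "INTEGER",
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
}
# Assumed mapping from JSON schema string formats to SQL temporal types.
FORMAT_MAP = {
    "date": "DATE",
    "time": "TIME",
    "date-time": "TIMESTAMP",
}

def sql_type(prop):
    """Pick an SQL column type from a JSON schema property definition."""
    json_type = prop.get("type")
    if json_type == "string":
        # A string with a known format maps to the matching temporal type.
        fmt = prop.get("format")
        if fmt in FORMAT_MAP:
            return FORMAT_MAP[fmt]
        if "maxLength" in prop:
            return f"VARCHAR({prop['maxLength']})"
    if json_type not in TYPE_MAP:
        # Objects and arrays become tables rather than columns.
        raise ValueError(f"no scalar SQL type for JSON type {json_type!r}")
    return TYPE_MAP[json_type]

print(sql_type({"type": "string", "format": "date"}))
print(sql_type({"type": "integer"}))
```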
After the creation of the relational schema from the JSON schema, we can store JSON documents instance
by instance in the related master and child tables according to the structure defined in
RS_Master_Table and RS_Child_Table.
Algorithm: createRelationalTables
Input: JSON Schema
Output: Creation of relational tables according to JSON Schema
// create RS Tables
openConnection()
createRSTables()
// parse and save information in RS Tables
fLength = openJSONSchemaFile()
for fRec in 1 to fLength do
    parseAllReferences()
    resolveSpecialKeys()
    saveAttributesDetails()
end for
// create relational schema
recMasterCount = getRowCount()
for recMaster in 1 to recMasterCount do
    RS_ID = getRSID(recMaster)
    tableName = getTableName(RS_ID)
    primaryKey = getPrimaryKey(RS_ID)
    recChildCount = getChildRowCount(RS_ID)
    String queryStr = "Create Table " + tableName + " (" + primaryKey + " Primary Key"
    // each attribute is prefixed with a comma, so no trailing comma is produced
    for recChild in 1 to recChildCount do
        attributeName = getAttributeName(RS_ID, recChild)
        dataType = getAttributeDataType(RS_ID, recChild)
        queryStr = queryStr + ", " + attributeName + " " + dataType
    end for
    queryStr = queryStr + ");"
    createTable(queryStr)
    // link the new table to its parent unless it is the root
    parentLevel = getParentLevel(RS_ID)
    level = getLevel(RS_ID)
    if parentLevel <> null then
        createRelation(parentLevel, level)
    end if
end for
showMapping()
closeConnection()
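The schema-creation loop of the algorithm above can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' implementation: the metadata is held in in-memory lists standing in for RS_Master_Table and RS_Child_Table, all names and values are hypothetical, the relation to the parent is expressed as a foreign-key column, and the parent is found by level, which assumes one table per level.

```python
import sqlite3

# Hypothetical metadata, standing in for rows read from the RS tables.
master_rows = [
    {"RS_ID": 1, "table": "personal", "pk": "personal_id", "level": 0, "parentLevel": None},
    {"RS_ID": 2, "table": "temporary_address", "pk": "temp_addr_id", "level": 1, "parentLevel": 0},
]
child_rows = [
    {"RS_ID": 1, "column": "p_name", "type": "VARCHAR(255)"},
    {"RS_ID": 1, "column": "age", "type": "INTEGER"},
    {"RS_ID": 2, "column": "city", "type": "VARCHAR(255)"},
]

def build_create_statements(master_rows, child_rows):
    """Return one CREATE TABLE statement per master row, parents first."""
    by_level = {m["level"]: m for m in master_rows}  # assumes one table per level
    statements = []
    for m in sorted(master_rows, key=lambda r: r["level"]):
        columns = [f'{m["pk"]} INTEGER PRIMARY KEY']
        columns += [f'{c["column"]} {c["type"]}'
                    for c in child_rows if c["RS_ID"] == m["RS_ID"]]
        if m["parentLevel"] is not None:
            parent = by_level[m["parentLevel"]]
            # The relation to the parent is expressed as a foreign-key column.
            columns.append(
                f'{parent["pk"]} INTEGER REFERENCES {parent["table"]}({parent["pk"]})'
            )
        statements.append(f'CREATE TABLE {m["table"]} ({", ".join(columns)});')
    return statements

statements = build_create_statements(master_rows, child_rows)
conn = sqlite3.connect(":memory:")
for stmt in statements:
    conn.execute(stmt)  # identifiers come from trusted schema metadata
    print(stmt)
```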
Algorithm: JSONDataToRelationalStorage
Input: JSONDataFile, the JSON data file that the user wants to save in the created relational schema
Output: Status "File Saved" or "File cannot save" with an error message
openConnection()
objectName[] = getObjectNames()
RS_ID[] = getPKValues()
attributeExists = false
for rec in 1 to length(RS_ID[]) loop
    attributeTitle[] = getAttributeTitle()
    for recV in 1 to length(attributeTitle[]) loop
        read the JSON data file line by line
        String attributeName = getAttributeName()
        if attributeName exists in attributeTitle[] then
            parseSaveValue()
            attributeExists = true
        end if
    end loop
    if attributeExists = true then
        createRelation()
        attributeExists = false
    end if
end loop
closeConnection()
return status true if the transaction ended successfully, else false with an error message
If the title of an attribute exists in the array attributeTitle[], the function parseSaveValue() parses
the value according to the data type mapped in RS_Child_Table and saves it in the column mapped to
attributeTitle[recV] in RS_Child_Table. If any column is populated in a row of the table, we generate
the primary key of the row. We set the Boolean variable attributeExists to true if the row exists.
When attributeExists is true, the function createRelation() generates a primary key and relates the
row to its parent table record by using RS_ID and the parent level stored in RS_Master_Table. If the
JSON data are stored successfully in the relational schema, we return a status like “JSON data stores
are successful” to the user. Finally, all database connections are closed by using the
closeConnection() function. In short, Algorithm 2 takes a JSON data file, stores it in the created
relational schema, and returns the storage status to the user.
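The storage step can be sketched as follows, under stated assumptions: SQLite replaces the RDBMS, the three tables are a hypothetical result of the schema creation step, the “Kids” array is assumed to hold objects with a “Kid_Name” attribute, and store_instance() is an illustrative helper rather than the parseSaveValue() and createRelation() functions themselves.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE temporary_address (
        temporary_address_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        street_address TEXT, city TEXT);
    CREATE TABLE kids (
        kids_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        kid_name TEXT);
""")


def store_instance(conn, instance):
    """Store one JSON instance: the root row first, then nested object and array rows."""
    cur = conn.execute(
        "INSERT INTO person (name, age) VALUES (?, ?)",
        (instance.get("Name"), instance.get("Age")),
    )
    person_id = cur.lastrowid  # generated primary key of the parent row
    address = instance.get("Temporary_Address")
    if address:  # a child row is created only when the object is present
        conn.execute(
            "INSERT INTO temporary_address (person_id, street_address, city) "
            "VALUES (?, ?, ?)",
            (person_id, address.get("Street_Address"), address.get("City")),
        )
    for kid in instance.get("Kids", []):  # one child row per array element
        conn.execute(
            "INSERT INTO kids (person_id, kid_name) VALUES (?, ?)",
            (person_id, kid.get("Kid_Name")),
        )
    return person_id


document = json.loads("""[
  {"Name": "Ann", "Age": 34,
   "Temporary_Address": {"Street_Address": "1 Main St", "City": "Lyon"},
   "Kids": [{"Kid_Name": "Tom"}, {"Kid_Name": "Eva"}]},
  {"Name": "Bob", "Age": 41}
]""")
with conn:  # one transaction for the whole file: all instances are saved or none
    stored_ids = [store_instance(conn, instance) for instance in document]
print(stored_ids)
```

Wrapping the whole file in one transaction mirrors the status returned by the algorithm: either every instance is stored, or the error leaves the schema unchanged.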
for JSON binary format. It resembles an extension of SQL that supports various functions for handling
JSON data (Petkovic, 2017). However, it is not standardized yet and requires considerable improvement
for storing, updating, and querying JSON data in binary format as compared to standard SQL statements
(Petkovic, 2017).
Our approach of transforming JSON data into the relational model keeps queries simple by using
standard SQL functions. As in any relational database, projection relies on plain SQL commands: a
“select” statement lists the column titles of the attributes to project, or uses ‘*’ to project all
attributes. Predicates provide criteria-based retrieval through an optional “where” clause. A
predicate that spans tables requires matching the master-child reference id to fulfill the criteria.
Because we follow the basic relational database structure, comparison, string operations, and
pattern matching work as in an ordinary SQL query. Complex queries, analysis, and data processing
are also easy to handle. We define a materialized view based on all tables for fast querying.
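A minimal sketch of such a query is given below. It assumes a hypothetical two-table layout in SQLite, with person as the master table and temporary_address as its child; the table and column names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE temporary_address (
        temporary_address_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        city TEXT);
    INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Bob', 41), (3, 'Cid', 29);
    INSERT INTO temporary_address VALUES (1, 1, 'Lyon'), (2, 2, 'Oslo'), (3, 3, 'Lyon');
""")

# Projection of two root-level attributes, with one predicate on a nested
# attribute and one pattern-matching predicate on a root attribute.
rows = conn.execute(
    """
    SELECT p.name, p.age
    FROM person AS p
    JOIN temporary_address AS t ON t.person_id = p.person_id  -- master-child reference id
    WHERE t.city = ? AND p.name LIKE ?
    ORDER BY p.person_id
    """,
    ("Lyon", "%n%"),
).fetchall()
print(rows)
```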
To support our vision and perform all operations, we formalize transformation principles from
JSON queries to traditional SQL statements. Our approach benefits the user in two ways: the result of
a relational query can be used directly, exactly as with an RDBMS, or the query result can be
transformed into JSON file format. Once the relational schema is created and the JSON data are
stored, we can treat the JSON data as relational data and perform all query operations on them.
We propose an algorithm to recompose a JSON document from the stored JSON objects/instances with
the help of the RS_Tables in the mapping schema. We run the query and save the result set in a
temporary table. This temporary table exists only for the session, is dropped when the session
terminates (Chasseur, Li, & Patel, 2013), and is unique to every connection. We use the temporary
table along with the RS_Tables to convert the information into JSON format. After querying, we iterate
through the records and save the information instance by instance in JSON file format for
presentation to the user. A summary of the steps is presented in Section 4.2.1.
• Populating the temporary table with the result set according to the user criteria.
We execute the query according to the user criteria and save the result set in a temporary table.
Temporary tables are dropped when the session terminates (Klettke, Störl, & Scherzinger, 2015) and
are unique to every connection (Chasseur, Li, & Patel, 2013), and they thereby require less memory
space. This temporary table contains only distinct records, and only their primary keys. We use
these keys together with the RS_Tables to obtain the table name, the level (to check the hierarchy),
and the parent level (to identify the container object) from RS_Master_Table and RS_Child_Table. We
then select the attributes from the child tables by using the temporary table and RS_Child_Table. We
apply the recursive function “relationalToJSON” to iterate table by table through RS_Master_Table
and row by row through RS_Child_Table. We concatenate braces “{,}” for objects and brackets “[,]” for
arrays according to the category of the attribute stored in RS_Master_Table, and save the
information instance by instance in JSON file format.
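The key-only temporary table can be sketched as follows. This is an illustration in SQLite with hypothetical table names (person, kids, result_keys); it shows only the idea of storing the distinct primary keys once and reusing them, not the exact mechanism of our implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE kids (kids_id INTEGER PRIMARY KEY, person_id INTEGER, kid_name TEXT);
    INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Bob', 41);
    INSERT INTO kids VALUES (1, 1, 'Tom'), (2, 1, 'Eva'), (3, 2, 'Joe');
""")

# The temporary table is private to this connection and disappears with it.
conn.execute("CREATE TEMP TABLE result_keys (person_id INTEGER PRIMARY KEY)")
conn.execute(
    "INSERT INTO result_keys "
    "SELECT DISTINCT p.person_id FROM person AS p "
    "JOIN kids AS k ON k.person_id = p.person_id WHERE p.age < ?",
    (40,),
)

# Later per-table queries reuse the stored keys instead of repeating the predicate.
names = conn.execute(
    "SELECT p.name FROM person AS p JOIN result_keys AS r ON r.person_id = p.person_id"
).fetchall()
kid_names = conn.execute(
    "SELECT k.kid_name FROM kids AS k "
    "JOIN result_keys AS r ON r.person_id = k.person_id ORDER BY k.kids_id"
).fetchall()
print(names, kid_names)
```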
Finally, the file is displayed to the user, and all databases and their connections are closed if no
further processing is required. The user can view the resulting information in both relational and
JSON formats. At this point the data are already in relational format, so displaying them to the
user is straightforward. Algorithm 3 describes how to display the queried data in JSON file format.
To project the whole document, for example, we iterate through the tables and generate a file in
JSON format. We read the tables and their levels from the RS_Tables to display objects in the proper
hierarchy. We get the reference id (primary key id) from the table at level-0. The reference id of
the level-0 table is used to retrieve values from the level-1 tables. The reference ids of the
level-1 and level-0 tables are then used to query the level-2 tables, and so on. A full iteration of
one record from level-1 to the last level describes one complete JSON instance along with all its
hierarchical objects. We store this record in JSON file format and concatenate all objects in the
same pattern, one by one, until the record set ends. A generalized algorithm to traverse all levels
is given in Algorithm 3. To display only specific columns, we can modify the algorithm by adding an
array of the selected columns as an input parameter and passing it to the getQuery() function.
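The level-by-level traversal can be sketched as a short recursive function. The sketch makes several assumptions: SQLite replaces the RDBMS, the MAPPING dictionary is a hypothetical stand-in for the metadata kept in the RS_Tables, key columns are recognized by the suffix "_id", and a child table carries the primary key column of its parent under the same name.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows expose their column names
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE temporary_address (temporary_address_id INTEGER PRIMARY KEY,
        person_id INTEGER, City TEXT);
    CREATE TABLE kids (kids_id INTEGER PRIMARY KEY, person_id INTEGER, Kid_Name TEXT);
    INSERT INTO person VALUES (1, 'Ann', 34);
    INSERT INTO temporary_address VALUES (1, 1, 'Lyon');
    INSERT INTO kids VALUES (1, 1, 'Tom'), (2, 1, 'Eva');
""")

# Hypothetical mapping metadata: table -> (primary key, JSON key, category, parent table)
MAPPING = {
    "person": ("person_id", None, "object", None),
    "temporary_address": ("temporary_address_id", "Temporary_Address", "object", "person"),
    "kids": ("kids_id", "Kids", "array", "person"),
}


def children_of(table):
    """Return the tables whose parent is the given table."""
    return [name for name, meta in MAPPING.items() if meta[3] == table]


def row_to_json(table, row):
    """Build one JSON object from a row, then recurse into its child tables."""
    pk = MAPPING[table][0]
    # Design columns (primary and foreign keys) are excluded from the output.
    obj = {col: row[col] for col in row.keys() if not col.endswith("_id")}
    for child in children_of(table):
        _, json_key, category, _ = MAPPING[child]
        rows = conn.execute(
            f"SELECT * FROM {child} WHERE {pk} = ? ORDER BY 1", (row[pk],)
        ).fetchall()
        nested = [row_to_json(child, r) for r in rows]
        if category == "array":
            obj[json_key] = nested       # "[...]" for arrays
        elif nested:
            obj[json_key] = nested[0]    # "{...}" for a nested object
    return obj


root_rows = conn.execute("SELECT * FROM person ORDER BY 1").fetchall()
instances = [row_to_json("person", r) for r in root_rows]
text = json.dumps(instances, indent=2)
print(text)
```

Here the json module emits the braces, brackets, colons, and commas, so the sketch does not concatenate the identifiers by hand.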
Algorithm 3 first uses the openConnection() function to open a database connection and then takes
the user query as input. Using the functions getColumnString() and getCriteraString(), which keep
the selected columns (select statement) and the criteria (from and where clauses) of the query,
respectively, it divides the query into two variables, columVar and criteriaVar. The function
getTableNames() takes these two variables as parameters and returns, in the array tableArr[], the
distinct names of the tables that appear in the select statement (via columVar) and in the condition
(via criteriaVar). The array tableArr[] is sorted in ascending order of the level column in
RS_Master_Table by using the sortTable() function. This helps to display JSON attributes, arrays,
and objects in the proper hierarchy. The function concatinatePK() is used in a loop to concatenate
the primary key columns of the distinct tables that are missing from the select statement of
queryStr, which helps to avoid duplication of data. Note that a primary key column is added only
where no aggregate function is applied, so that the added column does not affect the user query.
After the primary key columns are concatenated, the function executeQuery(queryStr, criteriaVar)
checks whether the query contains an aggregate function and then executes the query after
concatenating the criteria to queryStr. This function also saves the result of the executed query in
a temporary table. The function getResultSet() separates the resultSet of the executed query, table
by table according to tableArr[], into the two-dimensional array tableResultSet[][], which keeps the
records by table and column.
Finally, we use the recursive function displayJSONFormat(), which takes this two-dimensional array
and displays the resultSet in JSON file format. Initially, displayJSONFormat() takes as input the
title of the primary key column, the primary key value of the first record of the first table, and
the first table name from the array tableArr[]. After printing the first record of the first table,
it calls itself recursively for the records of the second table that are linked through the primary
key passed as an input parameter. After the second table is completed, the primary key of the second
table is taken and the records of the table at level-3 are displayed. In this way, the JSON
instances are displayed completely, one by one, excluding design details such as the primary key
columns. In addition, according to the category of an attribute defined in RS_Child_Table, the
opening and closing identifiers are concatenated, such as “{,}” for objects and “[,]” for arrays,
together with colons “:” and commas “,”. The function displayFile() shows the resulting JSON data
file to the user. Open connections and objects are then closed by using closeConnection() to free
the used memory space. If the user gives ‘*’ in the select statement, we take all the columns of the
tables mentioned in criteriaVar. Last but not least, the user can use the query results directly, as
from any RDBMS, or as a JSON format file, according to the requirement.
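The splitting of the user query and the concatenation of missing primary keys can be sketched as follows. The helpers split_query() and add_primary_keys(), the PRIMARY_KEYS dictionary, and the regular expressions are assumptions made for this sketch. It handles only a single flat SELECT ... FROM ... query, and it simplifies the rule above by adding no key columns at all when any aggregate function is present.

```python
import re

# Hypothetical primary key per table.
PRIMARY_KEYS = {"person": "person_id", "kids": "kids_id"}
AGGREGATE = re.compile(r"\b(count|sum|avg|min|max)\s*\(", re.IGNORECASE)


def split_query(query):
    """Split 'SELECT <columns> FROM <criteria>' into its column and criteria parts."""
    match = re.match(
        r"\s*select\s+(.*?)\s+(from\s+.*)", query, re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("expected a SELECT ... FROM ... query")
    return match.group(1), match.group(2)


def add_primary_keys(columns, tables):
    """Append the primary key columns that are missing from the select list."""
    if AGGREGATE.search(columns) or columns.strip() == "*":
        return columns  # leave aggregate queries and '*' untouched
    selected = [c.strip() for c in columns.split(",")]
    for table in tables:
        pk = f"{table}.{PRIMARY_KEYS[table]}"
        if pk not in selected:
            selected.append(pk)
    return ", ".join(selected)


colum_var, criteria_var = split_query(
    "SELECT person.name, kids.kid_name FROM person JOIN kids "
    "ON kids.person_id = person.person_id WHERE person.age > 30"
)
query_str = f"SELECT {add_primary_keys(colum_var, ['person', 'kids'])} {criteria_var}"
print(query_str)
```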
Note that, in some cases, we can get the same results by simply querying the tables directly, without
using the materialized view and temporary tables. However, the materialized view can enhance
performance for complex queries. While querying the materialized view, only reference ids are stored
in the temporary table, which does not require much space. The temporary table is dropped
automatically when the session terminates. With the temporary table, it is not required to repeat
the predicate (criteria) for every table. The user can directly use the query results, as from an
RDBMS, or as a JSON format file, as per requirement.
Algorithm: relationalToJSON
Input: queryParm, the query that the user wants to execute to get a result in JSON format
Output: JSON file containing the queried relational result set in JSON format
openConnection()
String columVar = getColumnString()
String criteriaVar = getCriteraString()
Array tableArr[] = getTableNames(criteriaVar, columVar)
sortTable()
for rec in 1 to length(tableArr[]) loop
    String queryStr = concatinatePK()
end loop
resultSet[] = executeQuery(queryStr, criteriaVar)
for rec in 1 to length(tableArr[]) loop
    tableResultSet[rec][] = getResultSet()
end loop
pkColTitle = getPKColumn(tableArr[1])
pkColVal = getPKValue(tableResultSet[1][1], tableArr[1])
openFile()
File jsonData = displayJSONFormat(pkColTitle, pkColVal, tableArr[1])
displayFile()
end
// recursive function to display records in JSON format
function displayJSONFormat(pkColTitle, pkColVal, tableArr[])
    tableLevel = 1
    pkColumn = getPkColumnTitle(tableArr[tableNo])
    for recA in 1 to length(tableResultSet[tableNo][]) loop
        for recV in 1 to length(tableResultSet[tableLevel][recA]) loop
            getCategory(tableResultSet[rec])
            embed opening identifiers according to the category of the attribute
            displayAttributes(tableResultSet[rec])
            pkVal = getPKValRec(tableResultSet[rec])  // primary key value of tableResultSet[rec]
            displayJSONFormat(pkColumn, pkVal, tableArr[])
            embed closing identifiers according to the category of the attribute
        end loop
    end loop
    closeConnection()
end
Our approach transforms a JSON schema into a relational schema by parsing the JSON schema file and
storing the metadata in the relational structure tables, following the principles defined in the
above sections. We transform a simple JSON schema file and a complex JSON schema file, which
includes self-references ($ref) and special keys (‘oneOf’, ‘allOf’, ‘anyOf’), into a relational
schema. To compare performance against our approach, we use the binary format of a NoSQL document
store (MongoDB) and the binary format of a relational database (MySQL). We evaluated the results
based on simple and aggregate queries on all the mentioned formats. In the subsequent sections, we
explain in detail the category of the file used and the SQL operations performed on the stored JSON
data in the relational database, along with their evaluations.
Experimental Settings
We evaluated our models on an Intel(R) Xeon(R) CPU E5-2620 v4 at 2.10GHz with 16 cores and 64GB
of RAM, running the CentOS 7 operating system and equipped with an NVIDIA Tesla K40m with 12GB of
GPU memory and 2880 CUDA cores, and a 1000.2 GB HDD. We used the Eclipse framework with the Python
programming language to implement the systems. We used MySQL and MongoDB in client-server
settings.
For the experiments and as a proof of concept, we used the same JSON schema and JSON data file as
shown in Figures 1 and 2, respectively. We generated sample JSON data with the online data generation
tool “mockaroo”, following the guidelines and patterns of other database micro-benchmarks (Melton,
2003). The generated sample data contain string and numeric attributes, hierarchical objects, and
arrays. We used arrays and hierarchical objects to check the effectiveness of our approach on
multilevel tables.
We saved JSON data in the relational database according to our approach. To evaluate the
efficiency, we measured the time by querying full data and partially selected data. We scaled up JSON
data from 1 million to 16 million JSON objects to show the results. One JSON instance/object means
a record of one person including all his/her information in a JSON document.
• NoSQL Binary Format: MongoDB stores whole JSON data instance by instance in a collection. The
storage of JSON objects is a little more efficient than in the relational binary format and the RS
format. Objects are stored in collections, where a collection is a group of similar documents. If
the user does not create a collection, it is created automatically when an object is inserted.
Insertion of JSON is a bit faster than, or equal to, the MySQL binary JSON format;
• Relational Binary Format: Storing a JSON document is easy in the relational binary format. The
objects have to be filtered and saved one by one in rows. Most database administrators make a script
file that contains all statements of the operations. By running the whole script, the storage is
completed at once, which saves time and avoids ambiguity. Although storage in binary format is easy,
performing SQL operations is a bit complex and varies across RDBMS vendors, as shown in the
following subsection;
• RS Format: Our approach takes a little more time than the two approaches mentioned above. Saving
JSON data in relational format is accomplished through Algorithm 2, which parses the whole JSON data
file, stores the attribute values in the related tables, and creates the relationships. It does not
require separating the objects one by one or writing a script to execute. It takes a JSON document
and stores it in the created schema. The storage time also depends on the size of the JSON file, the
size of the JSON objects, the complexity of the JSON data file, and the number of JSON objects. Once
the JSON objects are stored as relational data, our suggested RS format can perform many database
operations.
Storage time (NoSQL = NoSQL binary format, Rel. = relational binary format, RS = RS format)
                 1 million               4 million               16 million
             NoSQL   Rel.    RS      NoSQL   Rel.    RS      NoSQL    Rel.     RS
Storage      14.52  16.91  19.62     52.15  53.38  65.85    187.35  193.27  208.71
C1: This query projects whole objects, including singleton attributes, hierarchical objects, and
arrays, without applying any criteria.
C2: This query projects a small subset of data, namely the “Name” and “Age” attributes of String and
Numeric data type, from all the objects in the dataset to test the efficiency of simple queries on
root-level singleton attributes. We applied predicates (where clause) category by category: first on
singleton attributes (“Name” and “Age”), then on objects (“City” of “Temporary_Address” and
“Permanent_Address”), and lastly on arrays (“Kids”).
C3: This query projects two singleton attributes (“Name” and “Age”) and two attributes of the nested
object “Temporary_Address” (“Street_Address” and “City”) to test the results for attributes on
various hierarchical levels. We applied predicates (where clause) category by category: first on
singleton attributes (“Name” and “Age”), then on objects (“City” from “Temporary_Address” and
“Permanent_Address”), and lastly on arrays (the number of “Kids” equal to 3).
C4: This query projects two singleton attributes (“Name” and “Age”) and the “Kids” data from the
array. The number of kids in the array varies from 0 to 3 across objects. This query evaluates the
efficiency of the approaches on array data. We applied predicates (where clause) category by
category: first on singleton attributes (“Name” and “Age”), then on objects (“City” from
“Temporary_Address” and “Permanent_Address”), and lastly on arrays (the number of “Kids”).
C5: This query counts the persons of each age by using a predicate on “Age” to check the
performance of an aggregate function on a singleton attribute (“Age”).
C6: This query displays objects by using a predicate on the hierarchical attribute “City” contained
in the “Temporary_Address” object to check the performance of an aggregate function on hierarchical
objects. It shows the full details of all objects from the most populated “City”, that is, the city
of “Temporary_Address” in which the maximum number of persons live.
C7: This query shows the full details of all objects by using a predicate on arrays, where the total
number of kids of a person is equal to 3, and a predicate on the hierarchical attribute “City” of
“Temporary_Address”, where the city is the most populated, to check the performance of aggregate
functions on arrays and objects.
C8: This query updates singleton data at the root level by using exact criteria on “Name” and “Age”
to check the performance of a partial update operation.
C9: This query updates the value of the hierarchical attribute “City” contained in the
“Temporary_Address” object by using a predicate on the same attribute “City”. The update is
performed on a bunch of data simultaneously to test the results of an update on multiple
hierarchical levels.
C10: This query updates array data, namely the names of the “Kids”, by using a predicate on the
singleton attribute “Name”. This query tests the results of an update on array data.
C11: This query deletes the single attribute “Age” by using exact criteria on the “Name” and “Age”
columns to check the performance of a partial delete operation.
C12: This query deletes the hierarchical attribute “City” contained in the “Temporary_Address”
object by using a predicate on the singleton root attribute “Age”. The deletion is performed on a
bunch of data simultaneously to test the deletion of attributes at various hierarchical levels
collectively.
C13: This query deletes whole records based on an aggregate count on the kids column, where the
number of kids in the array data is equal to 3.
C14: This query inserts a bulk of JSON objects into the data to check the performance of the
insertion operation.
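To make the criteria concrete, two of them are sketched below in standard SQL, executed through SQLite for illustration. The table and column names are hypothetical, and the statements are only one plausible formulation of C5 and C7 on the RS format, not the exact queries used in the experiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE temporary_address (temporary_address_id INTEGER PRIMARY KEY,
        person_id INTEGER, city TEXT);
    CREATE TABLE kids (kids_id INTEGER PRIMARY KEY, person_id INTEGER, kid_name TEXT);
    INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Bob', 41), (3, 'Cid', 34);
    INSERT INTO temporary_address VALUES (1, 1, 'Lyon'), (2, 2, 'Oslo'), (3, 3, 'Lyon');
    INSERT INTO kids VALUES (1, 1, 'Tom'), (2, 1, 'Eva'), (3, 1, 'Max'), (4, 2, 'Joe');
""")

# C5-style: number of persons per age (aggregate on a singleton attribute).
persons_per_age = conn.execute(
    "SELECT age, COUNT(*) FROM person GROUP BY age ORDER BY age"
).fetchall()

# C7-style: persons with exactly three kids who live in the most populated city
# (aggregates on an array table and on a hierarchical object table).
three_kids_in_top_city = conn.execute("""
    SELECT p.name
    FROM person AS p
    JOIN temporary_address AS t ON t.person_id = p.person_id
    WHERE t.city = (SELECT city FROM temporary_address
                    GROUP BY city ORDER BY COUNT(*) DESC, city LIMIT 1)
      AND (SELECT COUNT(*) FROM kids AS k WHERE k.person_id = p.person_id) = 3
    ORDER BY p.person_id
""").fetchall()
print(persons_per_age, three_kids_in_top_city)
```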
Evaluations of Queries
Query criteria C1 to C4 show the projection of JSON objects. Query criteria C5 to C7 show selection
with different parameters and predicates on multiple tables. Query criteria C8 to C10 show the update
results. Query criteria C11 to C13 show the deletion operations and their results. Query criterion
C14 shows the insertion of 0.1% of the total JSON objects. We evaluated the query criteria (C1 to
C14) based on their average execution time. As shown in the table, we took JSON in three formats:
NoSQL binary, relational binary, and RS format (JSON converted to a traditional relational
structure). We applied the 14 query criteria to 1 million, 4 million, and 16 million JSON objects
that contain string, numeric, array, and hierarchical-object data.
From the evaluation point of view, the performed queries show that our approach is significantly
better for selection, update, and deletion, plausibly because, once the relational schema is
created, manipulation and analysis of the data are very efficient. Besides, aggregate and complex
queries are difficult and time-consuming in binary format. The experimental results are summarized
in Table 8, which shows some major differences in the performance of the query criteria across file
formats and numbers of objects. We performed each query three times with random parameters and
report the average in Table 8. Details of each operation on all the mentioned formats are as
follows.
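The averaging procedure can be sketched as a small timing helper. The function average_time(), the sample table, and the fixed parameter values are assumptions for this illustration; SQLite is used only so that the sketch runs without a database server.

```python
import sqlite3
import statistics
import time


def average_time(conn, sql, parameter_sets):
    """Run the query once per parameter set and return the mean wall-clock time in seconds."""
    timings = []
    for params in parameter_sets:
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch everything so retrieval is included
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [(f"name{i}", 20 + i % 50) for i in range(10_000)],
)

# Three runs with different parameter values, averaged as described above.
mean_seconds = average_time(
    conn, "SELECT COUNT(*) FROM person WHERE age = ?", [(25,), (40,), (61,)]
)
print(f"mean over three runs: {mean_seconds:.6f} s")
```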
Projection
Projecting JSON objects in both the NoSQL and relational binary formats requires displaying all
objects by iterating through the rows one by one. The RS format requires iterating through multiple
tables. The performance evaluation of the projection of JSON data is shown in Figure 5.
Average execution time (NoSQL = NoSQL binary format, Rel. = relational binary format, RS = RS format)
                           1 million               4 million               16 million
Operation   Criteria   NoSQL   Rel.    RS      NoSQL   Rel.    RS      NoSQL    Rel.     RS
Projection  C1          9.26  10.05  20.65     35.84  38.89  79.92    137.25  148.96  306.07
Selection   C5          2.81   2.05   0.38     10.87   7.93   1.47     41.65   30.39    5.63
Update      C8          0.38   2.56   0.07      1.47   9.91   0.27      5.63   37.94    1.03
            C10         1.89    3.5   0.17      7.32  13.54   0.66     28.01   51.87    2.52
Deletion    C11         1.51   1.49   0.09      5.84   5.77   0.35     22.38   22.08    1.33
            C12          3.6    4.2   2.76     13.93  16.25  10.68     53.36   62.25   40.91
            C13         0.28   0.39   0.31      1.08   1.50   1.19      4.15    5.78    4.59
Insertion   C14         0.04   0.04   2.82      0.15   0.15  10.91      0.59    0.59   41.78
• NoSQL Binary Format: The NoSQL binary format outperforms the two other formats in the projection
of all JSON objects without any condition (query criteria C1) and in the projection of hierarchical
data (query criteria C3);
• Relational Binary Format: The relational binary format performs equally well or slightly worse in
the projection of the whole JSON document. However, it performs better than the two other formats in
the projection of array data (query criteria C4);
• RS Format: The RS format outperforms the two other formats in the projection of singleton
attributes (query criteria C2) without any condition. However, the other criteria (C1, C3, and C4)
require projecting the full object, hierarchical objects, and arrays, respectively, and thus
multiple joins. This makes the RS format slower than the two other formats. At this point, indexing
techniques and materialized views can make it more efficient.
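The effect of an index on the join column of a child table can be illustrated as follows. The sketch uses SQLite and its EXPLAIN QUERY PLAN statement; the table, column, and index names are hypothetical, and no index was part of the reported experiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE kids (kids_id INTEGER PRIMARY KEY, person_id INTEGER, kid_name TEXT);
""")
query = "SELECT kid_name FROM kids WHERE person_id = 1"


def plan(conn, sql):
    """Return the textual query plan reported by SQLite (fourth column of each row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(conn, query)  # without an index the child table is scanned
conn.execute("CREATE INDEX idx_kids_person ON kids(person_id)")
plan_after = plan(conn, query)   # with the index the matching rows are looked up
print(plan_before)
print(plan_after)
```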
Selection
For the selection, we applied query criteria C5, C6, and C7 to test simple and complex conditions on
all three formats, using aggregate functions on singleton attributes, hierarchical objects, and
arrays, respectively. Each criterion is applied to almost 1% of the total JSON objects in the
datasets of 1 million, 4 million, and 16 million objects. The performance evaluation of the selection
of JSON data is shown in Figure 6:
• NoSQL Binary Format: For the aggregate queries C5 to C7, the NoSQL binary format is
the slowest;
• Relational Binary Format: For all aggregate queries C5 to C7, the relational binary format is a
bit faster than the NoSQL binary format but slower than the RS format;
• RS Format: The RS format outperforms the two other formats in all aggregate queries C5 to C7
owing to the use of standard SQL functions. Criteria-based queries on the relational database are
simple with the standard built-in SQL functions. The RS format performs efficiently, especially
for complex queries and data analysis.
Update
For the update, we used query criteria C8 to C10. Query C8 updates only a singleton attribute
(column-wise) at the root level. Queries C9 and C10 update a bunch of data in various rows, from
root to child at different levels, by updating hierarchical objects and arrays. We repeated all
operations for the three queries and show the average results. Each criterion is applied to almost
0.5% of the 1 million, 4 million, and 16 million JSON objects. The performance evaluation of the
update of JSON data is shown in Figure 7:
• NoSQL Binary Format: The update operation in the NoSQL binary format is faster than in the
relational binary format. All queries from C8 to C10 perform better than in the relational binary
format, but not as well as in the RS format;
• Relational Binary Format: The relational binary format does not perform better than either of the
two other formats. For a partial update, it sometimes requires replacing the existing instance with
a new one. It requires multiple functions to perform a simple update on a single instance or on a
bunch of objects;
• RS Format: For the RS format on query criteria C8 to C10, updating singleton attributes,
multiple objects, and arrays is quite easy with simple SQL queries on specific tables.
Therefore, the RS format outperforms the two other formats on the update operation.
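The three kinds of update can be sketched with plain SQL statements on the RS format. The schema, the sample values, and the exact predicates are assumptions for this illustration, which uses SQLite in place of the RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE temporary_address (temporary_address_id INTEGER PRIMARY KEY,
        person_id INTEGER, city TEXT);
    CREATE TABLE kids (kids_id INTEGER PRIMARY KEY, person_id INTEGER, kid_name TEXT);
    INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Bob', 41), (3, 'Cid', 29);
    INSERT INTO temporary_address VALUES (1, 1, 'Lyon'), (2, 2, 'Oslo'), (3, 3, 'Lyon');
    INSERT INTO kids VALUES (1, 1, 'Tom'), (2, 1, 'Eva'), (3, 2, 'Joe');
""")

with conn:  # the three updates are committed together or not at all
    # C8-style: singleton attribute at the root level, exact criteria.
    c8 = conn.execute(
        "UPDATE person SET age = ? WHERE name = ? AND age = ?", (35, "Ann", 34)
    ).rowcount
    # C9-style: attribute of a hierarchical object, many rows at once.
    c9 = conn.execute(
        "UPDATE temporary_address SET city = ? WHERE city = ?", ("Lyon-2", "Lyon")
    ).rowcount
    # C10-style: array element, selected through a predicate on the parent.
    c10 = conn.execute(
        "UPDATE kids SET kid_name = ? WHERE kid_name = ? AND person_id IN "
        "(SELECT person_id FROM person WHERE name = ?)",
        ("Tim", "Tom", "Ann"),
    ).rowcount
print(c8, c9, c10)
```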
Deletion
We performed the delete operation in two ways. Column-wise deletion deletes attribute values from
objects. Row-wise deletion deletes a whole instance. The query criteria C11, C12, and C13 delete
singleton attribute values, hierarchical objects, and data from arrays, respectively. Each criterion
is applied to almost 1% of the 1 million, 4 million, and 16 million JSON objects. The performance
evaluation of the deletion of JSON data is shown in Figure 8:
• NoSQL Binary Format: For query criteria C11, the column-wise deletion of a singleton attribute
in the NoSQL binary format takes almost the same time as, or is a little faster than, in the
relational binary format, and is slower than in the RS format. The same holds for query criteria
C12. For query criteria C13, the NoSQL binary format is the slowest, as it uses an aggregate
function;
• Relational Binary Format: Deletion in the relational binary format for query criteria C11 to C13
is the slowest among all the databases;
• RS Format: The cascade delete operation quickly deletes all relevant records from all the related
child tables. Its execution is fast compared to the binary NoSQL and binary relational formats.
For query criteria C11 to C12, the RS format outperforms the two other formats.
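Row-wise and column-wise deletion on the RS format can be sketched as follows. The sketch assumes SQLite, hypothetical table names, and a foreign key declared with ON DELETE CASCADE; in SQLite, foreign key enforcement has to be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE kids (
        kids_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id) ON DELETE CASCADE,
        kid_name TEXT);
    INSERT INTO person VALUES (1, 'Ann', 34), (2, 'Bob', 41);
    INSERT INTO kids VALUES (1, 1, 'Tom'), (2, 1, 'Eva'), (3, 1, 'Max'), (4, 2, 'Joe');
""")

with conn:
    # C13-style row-wise deletion: remove every person who has exactly three kids;
    # the child rows in kids are removed by the cascade.
    conn.execute("""
        DELETE FROM person WHERE person_id IN (
            SELECT person_id FROM kids GROUP BY person_id HAVING COUNT(*) = 3)
    """)
    # C11-style column-wise deletion: clear one attribute value, keep the instance.
    conn.execute("UPDATE person SET age = NULL WHERE name = ? AND age = ?", ("Bob", 41))

persons = conn.execute("SELECT person_id, name, age FROM person").fetchall()
remaining_kids = conn.execute("SELECT kid_name FROM kids").fetchall()
print(persons, remaining_kids)
```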
Insertion
We performed the insertion of 1% of the total JSON objects on the 1 million, 4 million, and 16 million
JSON objects. The performance evaluation of the insertion of JSON data is shown in Figure 9:
• NoSQL Binary Format: The NoSQL binary format outperforms the two other formats in the
insertion of new records for query criteria C14;
• Relational Binary Format: Insertion in the relational binary format is also fast for criteria C14,
as it only requires inserting a new row instead of populating many tables;
• RS Format: For the RS format, the insertion of a new instance requires populating all relevant
tables, which is a bit time-consuming. Once it is completed, however, all operations become easy
with standard SQL constructs and perform better than in the other approaches.
Both NoSQL and RDBMS are used in different scenarios, and each has its own features, limitations,
and uses in different applications. The developer needs to learn both of them to enhance productivity.
Integrating the various approaches will be more beneficial for the developer to gain the best of all
in a single platform. JSON document stores in NoSQL lack powerful query and ACID transaction
features: they do not provide explicit locks and have weaker concurrency and atomicity properties
than traditional ACID-compliant databases. As a result, data analysis and processing are very
challenging and a burden on the developer. For this reason, RDBMS support JSON in binary format
with SQL functions (i.e., SQL/JSON). However, these functions are not standardized yet and vary
across vendors, each with different limitations and complexities. More importantly, complex
searches, partial updates, composite queries, and analyses are cumbersome and time-consuming in
SQL/JSON compared to standard SQL operations. It is essential to integrate JSON into databases that
use standard SQL features, support the ACID transactional model, and have the capability of managing
and organizing data efficiently.
In this paper, we empower JSON to use relational databases for analysis and complex
queries. We show that the descriptive nature of the JSON schema can be utilized to create a
relational schema for the storage of JSON documents. The powerful SQL features can then
be used to gain consistency and ACID compatibility when querying JSON instances from the
relational schema. The approach proposed in this paper explicitly considers the JSON schema
in addition to JSON instance storage. Our approach uses multiple tables, considers data
types in the mapping from JSON to RDBMS, and develops detailed processing algorithms.
We quantitatively compared our approach with the JSON file format and the binary format. For
the storage of all JSON objects, the NoSQL binary format and the relational binary format come
first, ahead of our approach (RS format). For selection, update, and deletion, the RS format
performs better, especially for complex aggregate queries. The insertion of new objects,
however, is more efficient in the NoSQL binary format and the relational binary format than in
the RS format. In the near future, we can further enhance our processing algorithms and
introduce indexing techniques to make our approach more efficient.
NoSQL document databases, an evolving technology for supporting JSON, still lack a
powerful standard query language and full support for the ACID transaction model. In addition,
learning and adopting a new technology requires time and may also affect software reusability.
Relational database systems are the main building blocks of many applications. They have evolved
and been enhanced through decades of continued research, and they offer the powerful standard
query language SQL, numerous skilled experts, and abundant utilities, libraries, and APIs. In this
paper, we propose an approach for using JSON with relational databases. The approach utilizes
JSON schema to bridge JSON and relational database systems, including the creation
of a relational schema from a JSON schema, the transformation of JSON data into relational data,
and the retrieval, addition, deletion and updating of the stored JSON data with standard SQL
queries. Once the relational schema is created and JSON data are stored in relational format,
querying and displaying data become straightforward because standard SQL queries can
handle both simple and complex criteria. Query results can be displayed in both relational
and JSON file formats. In addition, all RDBMS features can be used for efficiency, accuracy,
up-to-date data access, and ACID transactions. Our approach combines fast development
with JSON, consistency from relational databases, and efficient querying with SQL to achieve
efficient data management.
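As a rough illustration of this workflow, and not the processing algorithms developed in this paper, the following minimal Python sketch maps a flat JSON schema to a single table, stores JSON instances as rows, and then updates, deletes, and queries them with standard SQL. The schema, the TYPE_MAP type mapping, the helper names, and the use of SQLite are all assumptions made for illustration. Nested objects and arrays, which require the multiple tables described above, are deliberately left out.

```python
import json
import sqlite3

# Hypothetical JSON Schema used only for illustration; not taken from the paper.
SCHEMA = {
    "title": "person",
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "score": {"type": "number"},
    },
    "required": ["id", "name"],
}

# Assumed mapping from JSON Schema primitive types to SQLite column types.
TYPE_MAP = {"integer": "INTEGER", "number": "REAL", "string": "TEXT", "boolean": "INTEGER"}


def create_table_sql(schema):
    """Build a CREATE TABLE statement from a flat (non-nested) object schema."""
    required = set(schema.get("required", []))
    columns = []
    for name, spec in schema["properties"].items():
        column = f'"{name}" {TYPE_MAP[spec["type"]]}'
        if name in required:
            column += " NOT NULL"  # required properties become NOT NULL columns
        columns.append(column)
    return f'CREATE TABLE "{schema["title"]}" ({", ".join(columns)})'


def insert_instance(conn, schema, instance):
    """Store one JSON instance as a row; absent optional properties become NULL."""
    names = list(schema["properties"])
    quoted = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f'INSERT INTO "{schema["title"]}" ({quoted}) VALUES ({placeholders})',
        [instance.get(n) for n in names],
    )


def query_as_json(conn, sql, params=()):
    """Run a SQL query and return the rows as JSON-style dictionaries."""
    cursor = conn.execute(sql, params)
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(SCHEMA))
for doc in (
    {"id": 1, "name": "Ann", "age": 31, "score": 8.5},
    {"id": 2, "name": "Bob", "age": 45},
    {"id": 3, "name": "Cy", "age": 27, "score": 6.0},
):
    insert_instance(conn, SCHEMA, doc)

# Partial update and deletion are ordinary SQL statements on the mapped table.
conn.execute("UPDATE person SET age = age + 1 WHERE name = ?", ("Bob",))
conn.execute("DELETE FROM person WHERE id = ?", (3,))

# Selection with a criterion; the result is serialized back to JSON.
older = query_as_json(
    conn, "SELECT name, age FROM person WHERE age > ? ORDER BY id", (30,)
)
print(json.dumps(older))
```

In this sketch the required list of the schema is mapped to NOT NULL constraints and each primitive type to a column type, so the database itself enforces part of what the JSON schema describes.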
For future work, there are many directions for improving the overall process.
The mapping from JSON schema to relational schema can be enhanced, and the proposed algorithms
can be improved from the developer's perspective to make them more efficient. Our research on
creating intermediate mappings from JSON schema to relational schema considers only the data
definition and data manipulation components. Also, there are some other issues to be investigated in future
research. The definition of the JSON schema should be normalized before creating a relational
schema. Considering users' roles and access rights, security issues in the transformation of
JSON schema into relational schema need to be addressed. In addition, other services of
relational databases, such as data backup and replication, are important research problems
to be solved in the future.
ACKNOWLEDGMENT
This work was supported in part by the National Natural Science Foundation of China (61772269,
61370075, 61572118). Zongmin Ma is a corresponding author on this paper.
Journal of Database Management
Volume 30 • Issue 3 • July-September 2019
REFERENCES
Anderson, J. C., Lehnardt, J., & Slater, N. (2010). CouchDB - The Definitive Guide: Time to Relax. O’Reilly.
Atzeni, P., Jensen, C. S., Orsi, G., Ram, S., Tanca, L., & Torlone, R. (2013). The relational model is dead, SQL
is dead, and I don’t feel so good myself. SIGMOD Record, 42(2), 64–68. doi:10.1145/2503792.2503808
Baazizi, M. A., Colazzo, D., Ghelli, G., & Sartiani, C. (2019). Schemas and types for JSON data. In Proceedings
of the 22nd International Conference on Extending Database Technology (pp. 437-439). Academic Press.
Baazizi, M. A., Lahmar, H. B., Colazzo, D., Ghelli, G., & Sartiani, C. (2017, March). Schema inference for
massive JSON datasets. In Proceedings of the 20th International Conference on Extending Database Technology
(pp. 222-233). Academic Press.
Beyer, K., Cochrane, R. J., Josifovski, V., Kleewein, J., Lapis, G., Lohman, G., ... & Truong, T. (2005, June).
System RX: one part relational, one part XML. In Proceedings of the 2005 ACM SIGMOD international
conference on Management of data (pp. 347-358). ACM. doi:10.1145/1066157.1066197
Bonetta, D., & Brantner, M. (2017). FAD.js: Fast JSON data access using JIT-based speculative optimizations.
Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, 10(12), 1778–1789.
doi:10.14778/3137765.3137782
Bourhis, P., Reutter, J. L., Suárez, F., & Vrgoč, D. 2017, JSON: Data model, query languages and schema
specification. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database
Systems (pp. 123-135). ACM. doi:10.1145/3034786.3056120
Cánovas Izquierdo, J. L., & Cabot, J. (2013). Discovering implicit schemas in JSON data. In Proceedings of the
2013 International Conference on Web Engineering (pp. 68-83). Academic Press.
Cánovas Izquierdo, J. L., & Cabot, J. (2016). JSONDiscoverer: Visualizing the schema lurking behind JSON
documents. Knowledge-Based Systems, 103, 52–55. doi:10.1016/j.knosys.2016.03.020
Cattell, R. (2011). Scalable SQL and NoSQL data stores. SIGMOD Record, 39(4), 12–27.
doi:10.1145/1978915.1978919
Chandra, D. G. (2015). BASE analysis of NoSQL database. Future Generation Computer Systems, 52, 13–21.
Chasseur, C., Li, Y., & Patel, J. M. (2013). Enabling JSON document stores in relational systems, In Proceedings
of the 2013 International Workshop on the Web and Databases (pp. 1-6). Academic Press.
Chen, P. P. S. (1976). The entity-relationship model—toward a unified view of data. [TODS]. ACM Transactions
on Database Systems, 1(1), 9–36.
Chodorow, K., & Dirolf, M. (2010). MongoDB - The Definitive Guide: Powerful and Scalable Data Storage.
O’Reilly.
Florescu, D., & Fourny, G. (2013). JSONiq: The History of a Query Language. IEEE Internet Computing, 17(5),
86–90. doi:10.1109/MIC.2013.97
Fong, J., & Shiu, H. (2012). An interpreter approach for exporting relational data into XML documents
with structured export markup language. Journal of Database Management, 23(1), 49–77. doi:10.4018/
jdm.2012010103
Frozza, A. A., dos Santos Mello, R., & da Costa, F. D. S. (2018, July). An Approach for Schema Extraction of
JSON and Extended JSON Document Collections. In Proceedings of the 2018 IEEE International Conference
on Information Reuse and Integration (IRI) (pp. 356-363). IEEE. doi:10.1109/IRI.2018.00060
Hai, R. H., Quix, C., & Kensche, D. (2018). Nested schema mappings for integrating JSON. In Proceedings of
the 37th International Conference on Conceptual Modeling (pp. 397-405). Academic Press. doi:10.1007/978-
3-030-00847-5_28
Helland, P. (2017). XML and JSON are like cardboard. Communications of the ACM, 60(12), 46–47.
doi:10.1145/3132269
68
Journal of Database Management
Volume 30 • Issue 3 • July-September 2019
Hidders, J., Paredaens, J., & Van den Bussche, J. (2017). J-Logic: Logical foundations for JSON querying. In
Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (pp.
137-149). ACM. doi:10.1145/3034786.3056106
Hu, Y., & Dessloch, S. (2015). Temporal data management and processing with column oriented NoSQL
databases. Journal of Database Management, 26(3), 41–70. doi:10.4018/JDM.2015070103
Irshad, L., Ma, Z. M., & Yan, L. (2019). A survey on JSON data stores. In Emerging Technologies and Applications
in Data Processing and Management (pp. 45–69). Hershey, PA: IGI Global.
Junkkari, M., Vainio, J., Iltanen, K., Arvola, P., Kari, H., & Kekäläinen, J. (2016). Path expressions in SQL: A
user study on query formulation. Journal of Database Management, 27(3), 1–22. doi:10.4018/JDM.2016070101
Klettke, M., Störl, U., & Scherzinger, S. (2015). Schema extraction and structural outlier detection for JSON-based
NoSQL data stores. In Proceedings of the 2015 International Conference on Datenbanksysteme für Business,
Technologie and Web (pp. 425-444). Academic Press.
Langdale, G. & Lemire, D. (2019). Parsing gigabytes of JSON per second.
Li, Y., Katsipoulakis, N. R., Chandramouli, B., Goldstein, J., & Kossmann, D. (2017). Mison: A fast JSON
parser for data analytics. Proceedings of the VLDB Endowment International Conference on Very Large Data
Bases, 10(10), 1118–1129. doi:10.14778/3115404.3115416
Liu, Z. H. (2019). JSON data management in RDBMS. In Emerging Technologies and Applications in Data
Processing and Management (pp. 20–44). Hershey, PA: IGI Global.
Liu, Z. H., Hammerschmidt, B., & McMahon, D. (2014). JSON data management: Supporting schema-less
development in RDBMS. In Proceedings of the 2014 ACM SIGMOD International Conference on Management
of Data (pp. 1247-1258). ACM. doi:10.1145/2588555.2595628
Liu, Z. H., Hammerschmidt, B., McMahon, D., Liu, Y., & Chang, H. J. (2016). Closing the functional and
performance gap between SQL and NoSQL. In Proceedings of the 2016 ACM SIGMOD International Conference
on Management of Data (pp. 227-238). ACM. doi:10.1145/2882903.2903731
Ma, R. Z., Jia, X. Y., Cheng, J. W., & Angryk, R. A. (2016). SPARQL queries on RDF with fuzzy constraints
and preferences. Journal of Intelligent & Fuzzy Systems, 30(1), 183–195. doi:10.3233/IFS-151745
Ma, Z. M., Capretz, M. A. M., & Yan, L. (2016). Storing massive Resource Description Framework (RDF) data:
A survey. The Knowledge Engineering Review, 31(4), 391–413. doi:10.1017/S0269888916000217
Ma, Z. M., Lin, X. Q., Yan, L., & Zhao, Z. (2018). RDF keyword search by query computation. Journal of
Database Management, 29(4), 1–27. doi:10.4018/JDM.2018100101
Melton, J. (2003). Information Technology-Database Languages-SQL-Part 14: XML-Related Specifications
(SQL/XML) (ISO/IEC 9075-14: 2003). OASIS.
Mok, W. Y. (2016). Utilizing nested normal form to design redundancy free JSON Schemas, International
Journal of Recent Contributions from Engineering. Science & IT, 4(4), 21–25.
Petković, D. (2017a). SQL/JSON standard: Properties and deficiencies. Datenbank-Spektrum, 17(3), 277–287.
doi:10.1007/s13222-017-0267-4
Petković, D. (2017b). JSON integration in relational database systems. International Journal of Computers and
Applications, 168(5), 14–19. doi:10.5120/ijca2017914389
Petkovic, D. (2018). Full-text search extensions for JSON documents: Design goals and implementations. In
Proceedings of the 2018 International Conference on Beyond Databases, Architectures and Structures (pp.
283-293). Academic Press. doi:10.1007/978-3-319-99987-6_22
Pezoa, F., Reutter, J. L., Suarez, F., Ugarte, M., & Vrgoč, D. (2016, April). Foundations of JSON schema. In
Proceedings of the 25th International Conference on World Wide Web (pp. 263-273). International World Wide
Web Conferences Steering Committee. doi:10.1145/2872427.2883029
Piech, M., & Marcjan, R. (2018). A new approach to storing dynamic data in relational databases using JSON.
Computer Science, 19(1), 5. doi:10.7494/csci.2018.19.1.2505
69
Journal of Database Management
Volume 30 • Issue 3 • July-September 2019
Wang, L., Zhang, S., Shi, J., Jiao, L., Hassanzadeh, O., Zou, J., & Wangz, C. (2015). Schema management for
document stores. Proceedings of Very Large Database Endowment, 8(9), 922–933.
Tahara, D., Diamond, T., & Abadi, D. J. (2014). Sinew: A SQL system for multi-structured data, Proceedings
of the 2014 ACM SIGMOD International Conference on Management of Data (pp. 815-826). CM.
doi:10.1145/2588555.2612183
Wright, A., Andrews, H., & Lu, G. (2016). JSON Schema validation: A vocabulary for structural validation of
JSON. IETF Standard.
ENDNOTES
1 https://www.quora.com/Is-it-okay-to-use-JSON-as-a-database
2 https://www.toolsqa.com/rest-assured/jsonpath-and-query-json-using-jsonpath/
3 http://json-schema.org/latest/json-schema-validation.html
4 https://mockaroo.com/
Lubna Irshad is now a master's student at Nanjing University of Aeronautics and Astronautics, China. Her research
interests include databases and Web data management.
Li Yan is currently a full professor at Nanjing University of Aeronautics and Astronautics, China. Her research
interests include web data modeling and temporal/spatiotemporal data management. She has published more
than fifty papers on these topics. She is the author of three monographs published by Springer.
Zongmin Ma is currently a full professor at Nanjing University of Aeronautics and Astronautics, China. His research
interests include databases, the Semantic Web, and knowledge engineering with a special focus on information
uncertainty. He has published more than one hundred and seventy papers on these topics. He is also the author
of five monographs published by Springer. He is a Fellow of the IFSA and a senior member of the IEEE.