Semi-Structured Data Design (RDF & JSON)
Week 6
Sep 26 - Oct 2
IS/ICT 303 Online Fall 2016
UNIVERSITY OF KENTUCKY, SCHOOL OF INFORMATION SCIENCE
IS/ICT-303
AGENDA
1. XML XPath
ANNOUNCEMENT: QUIZ 1
The first online quiz (Quiz 1) takes place next week.
- 20 questions (true/false, multiple choice, and short answer)
- 120 minutes
- You can start the quiz online at any time between Oct. 3rd, 12pm and Oct. 9th, 11:59pm.
- All questions will be drawn from the lecture notes.
XML XPATH
XPath
XPath is a language for selecting nodes in an XML document.
XPath
[Figure: XPath shown together with the related XML standards XQuery, XPointer, XLink, and XSLT]
XPath
[Figure: XML tree of BOOK elements, each with Title, Authors (containing Author elements), and Price]
XPath
Basic Constructs
[Figure: the same tree with BOOKSTORE as the root, annotated with the constructs below]
/ (root element)
/BOOK (element)
/BOOK/Title (element)
* (wildcard: any element)
@ (attribute)
XPath
Example path expressions:
/bookstore/book/title
/menu/food/description
XPath Nodes
XPath Root Element Nodes
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore> <!-- Root Element Node: "/" or "/bookstore" -->
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
Note: xmlns:xsl="http://www.w3.org/1999/XSL/Transform" points to the official W3C XSLT namespace. If you use this namespace, you must also include the attribute version="1.0".
XPath Nodes
XPath Element Nodes, Attribute Nodes
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book> <!-- Element Node: "/bookstore/book" -->
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year> <!-- Element Node: "/bookstore/book/year" -->
    <price>29.99</price>
  </book>
</bookstore>
Relationship of Nodes
Parent - Children
- Each element can have only one parent.
- Element nodes may have zero, one, or more children.
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
<bookstore> : Parent
  <book> : Child
<book> : Parent
  <title> : Child
  <author> : Child
  <year> : Child
  <price> : Child
Relationship of Nodes
Siblings
- Nodes that have the same parent.
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
<title>, <author>, <year>, <price> : Siblings
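The parent, child, and sibling relationships can be explored in code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on the bookstore document from the slides; the variable names (parent_of, siblings) are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The bookstore document used on the slides.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

bookstore = ET.fromstring(XML)

# Children: iterating over an element yields its direct child elements.
book = bookstore[0]
children = [child.tag for child in book]

# ElementTree elements do not keep a link to their parent,
# so build a child -> parent lookup by walking the tree once.
parent_of = {child: parent for parent in bookstore.iter() for child in parent}

# Siblings: the other nodes that share the same parent.
title = book.find("title")
siblings = [el.tag for el in parent_of[title] if el is not title]

print(children)              # ['title', 'author', 'year', 'price']
print(parent_of[title].tag)  # book
print(siblings)              # ['author', 'year', 'price']
```

Each element appears exactly once as a key in parent_of, which mirrors the rule that an element has only one parent.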
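XPath Syntax
EXPRESSION | DESCRIPTION
nodename   | Selects all nodes with the name "nodename"
/nodename  | Selects the element "nodename" starting from the root
//nodename | Selects all "nodename" nodes, wherever they are in the document
@          | Selects attributes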
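XPath Syntax
EXPRESSION      | DESCRIPTION
book            | Selects all nodes with the name "book"
/bookstore      | Selects the root element bookstore
/bookstore/book | Selects all book elements that are children of bookstore
//book          | Selects all book elements, wherever they are in the document
//@lang         | Selects all attributes named lang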
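These path expressions can be tried out in Python. The sketch below uses the standard xml.etree.ElementTree module, which implements only a subset of XPath and evaluates paths relative to an element, so /bookstore/book becomes ./book when starting from the root element. The attribute lookup for //@lang is a workaround chosen for this example, since ElementTree cannot select attributes directly.

```python
import xml.etree.ElementTree as ET

# The bookstore document used on the slides.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# /bookstore : the root element itself
root_name = root.tag

# /bookstore/book : book elements that are children of bookstore
books = root.findall("./book")

# //title : title elements anywhere below the root
titles = [t.text for t in root.findall(".//title")]

# //@lang : collect the attribute from every element that carries it
langs = [el.get("lang") for el in root.iter() if "lang" in el.attrib]

print(root_name, len(books), titles, langs)
# bookstore 1 ['Harry Potter'] ['en']
```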
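XPath Syntax
Predicates (conditions)
- Predicates are used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets.
EXPRESSION                     | DESCRIPTION
/bookstore/book[1]             | Selects the first book element that is a child of bookstore
/bookstore/book[last()]        | Selects the last book element that is a child of bookstore
/bookstore/book[position()<3]  | Selects the first two book elements that are children of bookstore
//title[@lang]                 | Selects all title elements that have a lang attribute
//title[@lang="en"]            | Selects all title elements whose lang attribute has the value "en"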
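Predicates can also be tested in code. This is a hedged sketch with Python's xml.etree.ElementTree, which supports positional predicates, last(), and attribute tests but not position()<3, so list slicing stands in for that expression. Only the first book comes from the slides; the other two books and the helper title_of are invented for illustration.

```python
import xml.etree.ElementTree as ET

# First book is from the slides; the other two are made up for this example.
XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="fr">Livre Exemple</title><price>15.00</price></book>
  <book><title>Untitled Sample</title><price>9.50</price></book>
</bookstore>"""

root = ET.fromstring(XML)


def title_of(book):
    """Return the text of a book's title element."""
    return book.find("title").text


# /bookstore/book[1] and /bookstore/book[last()]
first = title_of(root.find("./book[1]"))
last = title_of(root.find("./book[last()]"))

# /bookstore/book[position()<3] : not supported here, so slice the list.
first_two = [title_of(b) for b in root.findall("./book")[:2]]

# //title[@lang] : titles that have a lang attribute
with_lang = [t.text for t in root.findall(".//title[@lang]")]

# //title[@lang='en'] : titles whose lang attribute equals "en"
english = [t.text for t in root.findall(".//title[@lang='en']")]

print(first, "|", last)
print(first_two, with_lang, english)
```

Note that XPath positions start at 1, while Python list indexes start at 0.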
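XPath Syntax
Selecting Several Paths
- By using the | operator in an XPath expression you can select several paths.
PATH EXPRESSION                  | RESULT
//book/title | //book/price      | Selects all the title AND price elements of all book elements
//title | //price                | Selects all the title AND price elements in the document
/bookstore/book/title | //price  | Selects all the title elements of the book elements of bookstore AND all the price elements in the document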
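Python's xml.etree.ElementTree does not implement the | operator, so the sketch below emulates //book/title | //book/price by filtering the children of each book element. This is an assumed workaround for illustration, not an XPath engine; the name wanted is hypothetical.

```python
import xml.etree.ElementTree as ET

# The bookstore document used on the slides.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# Emulates: //book/title | //book/price
wanted = {"title", "price"}
selected = [
    child
    for book in root.iter("book")   # every book element in the document
    for child in book               # its direct children, in document order
    if child.tag in wanted
]

result = [(el.tag, el.text) for el in selected]
print(result)  # [('title', 'Harry Potter'), ('price', '29.99')]
```

The result contains both kinds of element, which is why the operator is read as "title AND price" rather than as a choice between them.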
SEMANTIC WEB
"The Semantic Web is an extension of the
current web in which information is given well-defined
meaning, better enabling computers
and people to work in cooperation."
(Tim Berners-Lee)
LINKED DATA
In Semantic Web terminology, Linked Data is
the term used to describe a method of
exposing and connecting data on the Web from
different sources.
LINKED DATA
[Figure: a network of items connected to one another by relationship (Rel) links]
LINKED DATA
[Figure: Thing 1 and Thing 2, each described by the properties name, Bday, Mood, and Location]
LINKED DATA
Challenges
HTML, JSON, XML, CSV, RDF
Linking
LINKED DATA
Web Data
Name  | Birthday   | Job         | Location
Nick  | 2099-09-09 | Future baby | Milwaukee
Yuna  | 2014-03-27 | Student     | Seoul
Asada | 2009-02-02 | Programmer  | Tokyo
LINKED DATA
Web Data + Relationships
Name  | Birthday   | Job         | Location
Nick  | 2099-09-09 | Future baby | Milwaukee
Asada | 2009-02-02 | Programmer  | Tokyo
Yuna  | 2014-03-27 | Student     | Seoul
[Figure: the three records connected by relationship links]
LINKED DATA
Web Data + Relationships
URI                            | Name  | Birthday   | Job         | Location
http://mysite.fake.com/nick    | Nick  | 2099-09-09 | Future baby | Milwaukee
http://mysite.fake.co.kr/yuna  | Yuna  | 2014-03-27 | Student     | Seoul
http://mysite.fake.co.jp/asada | Asada | 2009-02-02 | Programmer  | Tokyo
WHAT IS RDF?
RDF TRIPLES
Subject: Dave
Predicate: likes
Object: Cookie
RDF TRIPLES
Nick hasColleague Yuna
Subject: Nick
Predicate: hasColleague
Object: Yuna
RDF TRIPLES
A Sentence: Jane sells books
Subject: Jane
Predicate: sells
Object: books
RDF RESOURCE
RDF DESCRIPTION
Resource
Description Framework
RDF TRIPLES
@prefix pref: <http://example.org/vocabulary#> .
<#dave> pref:likes <#cookies> .
Subject: Dave
Predicate: likes
Object: Cookie
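A triple is simply a three-part statement, which makes it easy to model in code. The sketch below stores the example triples from the slides as Python named tuples and looks them up by pattern; the Triple class and the match function are hypothetical helpers written for this illustration, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The example statements from the slides.
triples = [
    Triple("Dave", "likes", "Cookie"),
    Triple("Nick", "hasColleague", "Yuna"),
    Triple("Jane", "sells", "books"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that match the given parts; None is a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


print(match(triples, predicate="sells"))
print(match(triples, subject="Nick")[0].obj)  # Yuna
```

In real RDF the three parts would normally be URIs (or literal values for objects) rather than plain names, so that data from different sources can refer to the same thing.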
PROPERTY VALUE:
A property value is the value of a property, for example "Bob Dylan" for the property artist.
URI
RDF EXAMPLE
Title            | Artist       | Country | Company     | Price | Year
Empire Burlesque | Bob Dylan    | USA     | Columbia    | 10.90 | 1985
                 | Bonnie Tyler | UK      | CBS Records | 9.90  | 1988
and availability
Describing time schedules for web events
Describing information about web pages (content, author, created and modified date)
Describing content and rating for web pictures
RDF ELEMENTS
<rdf:RDF>: the root element of an RDF document
<rdf:Description>: identifies a resource
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description
      rdf:about="http://www.recshop.fake/cd/Empire_Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>
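Each property element inside rdf:Description corresponds to one triple about the resource named in rdf:about. The sketch below reads the RDF/XML example with Python's standard xml.etree.ElementTree and lists those triples. It is a simplified reading that only handles this flat shape of RDF/XML, not a full RDF parser; in practice a dedicated RDF library would normally be used.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The CD description from the slide.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire_Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(RDF_XML)

triples = []
for desc in root.findall(f"{{{RDF_NS}}}Description"):
    # The rdf:about attribute holds the URI of the resource (the subject).
    subject = desc.get(f"{{{RDF_NS}}}about")
    for prop in desc:
        # ElementTree expands cd:artist to "{namespace}artist"; joining the
        # namespace and the local name gives the property URI (the predicate).
        predicate = prop.tag[1:].replace("}", "", 1)
        triples.append((subject, predicate, prop.text))

for triple in triples:
    print(triple)
```

The first printed triple has the subject http://www.recshop.fake/cd/Empire_Burlesque, the predicate http://www.recshop.fake/cd#artist, and the object "Bob Dylan".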
Dublin Core
The Dublin Core is a set of predefined properties
for describing documents.
The first Dublin Core properties were defined at the
Metadata Workshop in Dublin, Ohio in 1995 and are
currently maintained by the Dublin Core Metadata
Initiative.
1. Title
2. Creator
3. Subject
4. Description
5. Publisher
6. Contributor
7. Date
8. Type
9. Format
10. Identifier
11. Source
12. Language
13. Relation
14. Coverage
15. Rights
URI = http://collections.lib.uwm.edu/cdm/ref/collection/af/id/37
JSON
What is JSON?
It is a data interchange format.
Based on JavaScript Objects.
JSON
What is JSON?
Like XML, JSON can be thought of as a
data model, an alternative to the
relational model.
WHY JSON?
Why use JSON?
- Easy to read and write
- Fast and compact
- Ideal for ordered lists or key/value pairs
- Maps naturally onto the data structures of most programming languages
JSON
Why use JSON? (continued)
- Human-readable, useful for data interchange.
- Useful for representing and storing semi-structured data.
JSON
JSON in Web applications
Many web applications provide APIs that
return their data in JSON format.
JSON
Basic constructs
Objects { }
sets of label-value pairs
Arrays [ ]
lists of values
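The two constructs map directly onto built-in types in many languages. As a minimal sketch with Python's standard json module, a JSON object becomes a dict and a JSON array becomes a list. The price value comes from the slides; the other keys in this sample document are assumptions made for the example.

```python
import json

# A small JSON document: one object containing a number, a string,
# and an array. Keys other than "price" are invented for this example.
text = '{"title": "Harry Potter", "price": 29.95, "authors": ["J K. Rowling"]}'

data = json.loads(text)

print(type(data).__name__)             # dict  (JSON object)
print(type(data["authors"]).__name__)  # list  (JSON array)
print(data["price"])                   # 29.95

# Writing the data back out produces JSON text again.
round_trip = json.loads(json.dumps(data))
```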
JSON
Objects { }
A label-value pair:
"price" : 29.95
An object containing two label-value pairs:
{ "firstName":"John" , "lastName":"Doe" }
JSON
Arrays [ ]
lists of values
JSON arrays are written inside square brackets.
An array can contain multiple objects:
"employees": [
  { "firstName":"John" , "lastName":"Doe" },
  { "firstName":"Anna" , "lastName":"Smith" },
  { "firstName":"Peter" , "lastName":"Jones" }
]
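To read the employees array in a program, the fragment has to be wrapped in an enclosing object so that it forms a complete JSON document. The following sketch, using Python's standard json module, shows one way to do that and to access the objects inside the array; the variable names are illustrative.

```python
import json

# The employees array from the slide, wrapped in { } to make it
# a complete JSON document.
text = """
{
  "employees": [
    { "firstName": "John",  "lastName": "Doe" },
    { "firstName": "Anna",  "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
  ]
}
"""

doc = json.loads(text)
employees = doc["employees"]  # a Python list of dicts

# Arrays are ordered, so items can be reached by position (starting at 0).
second_last_name = employees[1]["lastName"]

full_names = [f'{e["firstName"]} {e["lastName"]}' for e in employees]

print(second_last_name)  # Smith
print(full_names)        # ['John Doe', 'Anna Smith', 'Peter Jones']
```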
JSON
Example
JSON
Relational Model versus JSON
               | Relational                  | JSON
Structure      | Tables                      | Sets / Arrays
Schema         | Fixed in advance            | Self-describing, flexible
Queries        | Simple expressive languages |
Ordering       | None                        | Arrays
Implementation | Most information systems    | Coupled with programming languages
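The difference in structure and schema shows up when a table is turned into JSON: each row becomes a self-describing object, and the rows together form an array. The sketch below does this in Python for the three-person table from the Linked Data slides. The "people" key and the extra "Languages" field are assumptions added for illustration only.

```python
import json

# Relational view: a fixed list of columns and one tuple per row.
columns = ["Name", "Birthday", "Job", "Location"]
rows = [
    ("Nick", "2099-09-09", "Future baby", "Milwaukee"),
    ("Yuna", "2014-03-27", "Student", "Seoul"),
    ("Asada", "2009-02-02", "Programmer", "Tokyo"),
]

# JSON view: every row becomes an object that carries its own labels.
records = [dict(zip(columns, row)) for row in rows]

# Flexible schema: one record may hold a field the others lack.
# "Languages" is a made-up field, used only to show this flexibility.
records[0]["Languages"] = ["English"]

document = json.dumps({"people": records}, indent=2)
print(document)
```

Unlike a table row, the first object now has five labels while the others have four, and the document is still valid JSON.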
JSON
XML versus JSON
                      | XML           | JSON
Complexity            | More complex  | Less complex
Validity              |               |
Programming interface | XML parsing   | More directly applicable to programming languages
Querying              | XPath, XQuery |
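The "programming interface" row can be illustrated by reading the same record from both formats. In this hedged Python sketch, the XML version needs explicit navigation and type conversion, while the JSON version arrives as native values. The record reuses the book data from the XPath slides; the JSON rendering of it is an assumption made for the comparison.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<book><title>Harry Potter</title>"
    "<year>2005</year><price>29.99</price></book>"
)
json_text = '{"title": "Harry Potter", "year": 2005, "price": 29.99}'

# XML: parse, navigate to the element, then convert its text by hand.
book_el = ET.fromstring(xml_text)
xml_year_text = book_el.find("year").text       # a string
xml_price = float(book_el.find("price").text)   # converted explicitly

# JSON: parsing yields a dict whose values already have usable types.
book = json.loads(json_text)
json_year = book["year"]    # an int
json_price = book["price"]  # a float

print(repr(xml_year_text), xml_price)
print(json_year, json_price)
```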
ONLINE DISCUSSION
ASSIGNMENT #5
JSON (Due: 10/3)
- Create a JSON document by transforming a table into JSON format.
- An example is provided in the instructions.
- Use any text editor, such as Notepad, Notepad++, Oxygen XML Editor, TextWrangler, or Adobe Dreamweaver.
- Please see the instructions for details.
or By Appointment