Chapter 4 XML

Uploaded by

denzil
Chapter-4

XML
What is XML?
• The eXtensible Markup Language (XML) is a text-based markup language used mainly for exchanging data between different applications on the internet.
• An XML document is a text file, usually saved with the extension .xml
• It is a format for storing and transporting data.
• Its tag syntax is similar to HTML.
• In XML, users define their own tags, and these tags are used to describe the data.
• It is a markup language, not a scripting or programming language: it describes data and is not executed.
Advantages
• XML documents are easy to create.
• It is self-describing: the tags name the data they hold.
• XML can be read and written from most programming languages, such as Java.
• It is a portable format.
• It is platform independent.
*Difference between XML and HTML
XML syntax:
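XML declaration:
• The XML declaration indicates that the document is written in XML and specifies which version of XML.
• The XML declaration can also specify the character encoding for the document.
• Ex: <?xml version="1.0" encoding="UTF-8"?>
• There is no space between <? and xml, and the declaration ends with ?>. It takes only version, encoding and standalone; a language is set on an element with the xml:lang attribute instead.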
• Comments: notes for the reader that the parser does not treat as data.
• XML comments begin with <!-- and end with -->.
• XML comments allow us to write notes within the document.
• Ex: <!--This file is related to book information-->
• Root element:
• The first element in the XML document is
called root element, which is the parent of all
other elements in the document.
• Ex: <books>
----------
-----------
</books>
• Child elements:
The elements that are contained within the root element are called child elements.
• Empty elements:
An empty element holds no content. It is written as a single self-closing tag, or as a start tag immediately followed by its end tag.
• Ex: <br/>, <hr/>, <img/>…..
• Closing tag: the end tag of the root element (for example </books>) closes the document.
Elements
• An XML document consists of 3 main building blocks
– Elements
– Attributes
– Entities
Elements
• Element:
The content between the start tag <..> and the end tag </..>, including the tags, is called an element.
Ex: <title>Web programming</title>
<title>System programming</title>
Here both lines are title elements; "Web programming" and "System programming" are their content.
*Attributes
• An attribute is a name/value pair that we place within an opening tag; it allows us to provide extra information about an element.
• A property that describes an element is called an attribute.
• Ex: <img src="myimage.gif"/>
• <input type="text"/>
• Here src & type are the attributes.
• An element can contain one or more attributes.
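A minimal sketch of how a program sees elements and attributes, using Python's standard xml.etree.ElementTree module. The <books> document and its lang attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a <books> root holding two <title> elements.
doc = """<books>
  <title lang="en">Web programming</title>
  <title lang="en">System programming</title>
</books>"""

root = ET.fromstring(doc)

# Each child is an element: a tag name, a dict of attributes, and text content.
for child in root:
    print(child.tag, child.attrib, child.text)

titles = [child.text for child in root]
langs = [child.get("lang") for child in root]
```

Here tag is the element name, attrib holds the name/value pairs from the opening tag, and text is the content between the tags.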
Entities
• In XML, an entity is a named piece of replacement text that is referenced as &name;
• Eg: &lt; stands for <
• &amp; stands for & etc
*XML syntax Rules:
• All XML documents must have a root element
• XML is Case sensitive.
• All XML elements must have closing tags.
• All XML elements must be properly nested.
• Attribute values must be quoted.
eg: <input type="text"/>
• The first character of each tag name must be a letter or the "_" character, not a digit or other punctuation.
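The rules above can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative assumptions, one sample per rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One correct sample, then one sample that breaks each rule.
samples = {
    "correct": '<book id="1"><title>XML</title></book>',
    "case mismatch": "<Title>XML</title>",
    "missing closing tag": "<book><title>XML</book>",
    "improper nesting": "<b><i>XML</b></i>",
    "unquoted attribute": "<book id=1/>",
    "two root elements": "<a/><b/>",
    "name starts with digit": "<1book/>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "error")
```

Only the first sample is accepted; every other sample is rejected by the parser before the application sees any data.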
*XML CDATA:
• CDATA stands for character data.
• A CDATA section holds text that the parser does not interpret as markup.
• Characters such as < and & are illegal as literal text in XML elements.
• Using them directly generates an error.
• To avoid such errors, for example in script code, the text can be placed in a CDATA section.
syntax
<![CDATA[ contents ]]>
The section starts with <![CDATA[ and ends with ]]>.
• "<" will generate an error because the parser interprets it as the start of a new element.
• "&" will generate an error because the parser interprets it as the start of an entity reference.
• To avoid that error, script code can be defined as CDATA as follows:
Example
<script type="text/javascript">
<![CDATA[
function greatest(a,b)
{
  if(a>b)
    return a;
  else
    return b;
}
]]>
</script>
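A small sketch, assuming Python's standard xml.etree.ElementTree parser, showing that a CDATA section and escaped characters give the application the same text, while the raw characters are rejected. The script content is a made-up one-liner.

```python
import xml.etree.ElementTree as ET

# The same text written three ways: in a CDATA section, with entity
# references, and raw (which is not well-formed).
with_cdata = "<script><![CDATA[if (a < b && b > 0) return a;]]></script>"
with_escapes = "<script>if (a &lt; b &amp;&amp; b > 0) return a;</script>"
raw = "<script>if (a < b && b > 0) return a;</script>"

text_cdata = ET.fromstring(with_cdata).text
text_escaped = ET.fromstring(with_escapes).text

try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False

print(text_cdata)
print(text_cdata == text_escaped, raw_ok)
```

CDATA is a convenience for the author of the document: the parser hands over plain text either way.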
*Types of XML Documents
• There are two types
– Well Formed document
– Valid document
Well Formed document
• An XML document with correct syntax is called "Well Formed". Well-formedness refers to syntax.
• A well-formed document is an XML document that conforms to all the syntax rules of XML.
• A well-formed XML document must have a corresponding end tag for each of its start tags.
• Elements in an XML document must be properly nested within each other.
• Eg: <?xml version="1.0" encoding="UTF-8"?>
<!-- Sample xml document -->
<person>
<name> Manoj</name>
<age> 34</age>
<address> Hebbal</address>
</person>
Valid document
• An XML document is said to be valid when it is not only well-formed but also conforms to a DTD that specifies which tags it uses and what attributes those tags can contain.
• Well-formedness refers to syntax; validity refers to whether the structure matches what the DTD declares (loosely, the semantics of the document).
• Syntax defines the rules for writing a statement in a language, while semantics refers to the meaning of what is written.
• Eg: <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person SYSTEM "strict.dtd">
<!-- Sample xml document -->
<person>
<name> Manoj</name>
<age> 34</age>
<address> Hebbal</address>
</person>
**DTD- Document Type Definition
• A DTD (Document Type Definition) consists of
a list of syntax definitions and rules for each
element in the XML document.
• The purpose of a DTD is to define the
structure and the legal elements and
attributes of an XML document.
• DTD specifies which element names can be
included in the document, the attributes that
each element can have, whether or not these
are required or optional and more.
DTD
• DTD <!DOCTYPE>
• The <!DOCTYPE> declaration appears near the top of an XML document that uses a DTD, just as it does in an HTML document.
• To use a DTD within an XML document, we need to declare it.
• Syntax:
<!DOCTYPE rootname [DTD]>
• Eg: <!DOCTYPE books SYSTEM "note.dtd">
Rules for DTD
• The DOCTYPE declaration must be written between the XML declaration and the root element (i.e., it normally comes second, right after the XML declaration).
• The keyword DOCTYPE must be followed by the name of the root element.
• The keyword DOCTYPE must be in uppercase.
**Types of DTD
• Internal DTD
• External DTD
Internal DTD
• A DTD is referred to as an internal DTD if the elements are declared within the XML file.
• If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE> definition.
• An internal DTD is defined between the square brackets of the <!DOCTYPE> definition.
Syntax
<!DOCTYPE root-element [element-declarations]>
Example
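A sketch of an internal DTD, read with Python's standard xml.dom.minidom. The note document and its element names are assumptions made for illustration. Note that the standard library parser only reads the DOCTYPE; it does not validate the document against the DTD, so the check below is limited to the rule that the DOCTYPE name matches the root element.

```python
from xml.dom import minidom

# Hypothetical document: the DTD sits inside the square brackets of DOCTYPE.
doc_text = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Manoj</to>
  <from>Ravi</from>
  <body>Read chapter 4</body>
</note>"""

doc = minidom.parseString(doc_text)

doctype_name = doc.doctype.name            # name written after DOCTYPE
root_name = doc.documentElement.tagName    # name of the root element

# Collect the child element names, skipping whitespace text nodes.
child_names = [n.tagName for n in doc.documentElement.childNodes
               if n.nodeType == n.ELEMENT_NODE]

print(doctype_name, root_name, child_names)
```

Full DTD validation needs a validating parser, which is outside what this sketch shows.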
External DTD
• In an external DTD, elements are declared outside the XML file.
• If the DTD is declared in an external file, the <!DOCTYPE> definition must contain a reference to the DTD file.
• It is the same as an internal DTD except that the declarations live in a separate file.
• An external DTD can be used with more than one XML document.
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
Syntax
• <!DOCTYPE root-element SYSTEM "file-name">
• where file-name is the file with .dtd extension.
**XML NAMESPACE:
• In XML, a namespace is used to prevent conflicts between element names.
• Because XML allows us to create our own tag names, there is always the possibility of naming a tag exactly the same as one in another XML document.
• The XML namespace identifies the set of tags used by the XML document.
• It is used to ensure that names used by one DTD don't conflict with user-defined tags or tags defined elsewhere.
Example of a name conflict
• When XML fragments from different sources are combined, there can be a name conflict.
• For example, both fragments may contain a <table> element, but the elements have different content and meaning.
Solving the Name Conflict Using a Prefix

• Name conflicts can be avoided by giving each element name a prefix, for example <h:table> and <f:table>.
• There is then no conflict because the two <table> elements have different names.
XML Namespaces - The xmlns Attribute

• When using prefixes in XML, a namespace for the prefix must be defined.
• The namespace can be defined by an xmlns attribute in
the start tag of an element.
• The namespace declaration has the following syntax.
xmlns:prefix="URI".
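A sketch of how prefixes and xmlns declarations keep two <table> elements apart, using Python's standard xml.etree.ElementTree. The prefixes h and f and both example.com URIs are hypothetical; the URIs only need to be distinct.

```python
import xml.etree.ElementTree as ET

# Two different <table> elements, each bound to its own namespace URI.
doc = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>"""

root = ET.fromstring(doc)

# The parser replaces each prefix with its URI, written as {URI}name.
tags = [child.tag for child in root]
print(tags)

# To search, map prefixes to the same URIs and pass the map to find().
ns = {"h": "http://example.com/html", "f": "http://example.com/furniture"}
html_table = root.find("h:table", ns)
furniture_name = root.find("f:table/f:name", ns).text
print(furniture_name)
```

The prefix itself is only shorthand; what makes the two names different is the URI each prefix stands for.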
**XML SCHEMAS
• An XML Schema describes how an XML document is structured, and it can be used in place of a DTD.
• An XML Schema is itself written in XML.
• The XML Schema language is known as XML Schema Definition (XSD).
• The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.
• An XML Schema:
– defines elements that can appear in a document.
– defines attributes that can appear in a document.
– defines which elements are child elements.
– defines the order of child elements.
– defines the number of child elements.
– defines whether an element is empty or can include text.
– defines data types for elements and attributes.
– defines default and fixed values for elements and attributes.
(TYPES OF ELEMENTS IN XML SCHEMA)
• A simple type
• A complex type
• "SIMPLE" TYPE ELEMENTS
• A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes.
• Eg: <xs:element name="hai" type="xs:string"/>
• "COMPLEX" TYPE ELEMENTS
• A complex element is an XML element that contains other elements and/or attributes.
• A complex element may be empty, or it may contain text, other elements, or both text and other elements.
• Eg: <product pid="1345"/>
• Complex Elements
A complex element is an XML element that contains other elements and/or attributes.
• There are four kinds of complex elements:
• Empty elements
<product pid="1345"/>
This element has an attribute but no content and no child element.
• Elements that contain only other (child) elements
Ex: A complex XML element, "employee", which contains only other elements:

<employee>
<firstname>John</firstname>
<lastname>Smith</lastname>
</employee>
• Elements that contain only text
Ex: A complex XML element, "food", which contains only text (it is complex because it carries an attribute):
<food type="dessert">Ice cream</food>
• Elements that contain both other elements and text
Ex: A complex XML element, "description", which contains both elements and text:

<description>
It happened on <date>03.03.99</date>
....
</description>
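A sketch of what a schema for the "employee" element could look like, assuming an extra id attribute that is not in the example above. Python's standard library cannot validate a document against an XSD, so the code only illustrates that a schema is itself XML and can be read with an ordinary parser (here xml.etree.ElementTree).

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: "employee" is complex (children plus an attribute),
# "firstname" and "lastname" are simple (text only).
schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)

# List every element declaration with its declared type, in document order.
declared = [(el.get("name"), el.get("type"))
            for el in schema.iter(f"{{{XS}}}element")]
print(declared)
```

The employee declaration has no type attribute because its type is given inline by the xs:complexType child.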
**XSL (Extensible Stylesheet Language)

• It is a styling language for XML, just as CSS is a styling language for HTML.
• XSL is a language for formatting XML documents.
• XSL has two main parts (both use XPath to select parts of the document)
- XSLT
- XSL-FO
XSLT
• XSLT stands for XSL Transformations.
• It is a language for transforming XML documents into other XML documents or into other text-based formats, such as HTML for web pages or plain text.
• XSLT does not produce PDF or images by itself; PDF is usually produced by transforming to XSL-FO and then running a formatter.
XSLT Transformation Process
• The process of transforming an XML document into another format is called XSL transformation.
• The XSLT processor is responsible for transforming the XML document.
• The XSLT processor reads the XML document and the XSLT stylesheet and produces output in the form of HTML, XHTML, XML or plain text.
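Python's standard library has no XSLT processor, so the sketch below is not XSLT. It is a hand-written transformation with xml.etree.ElementTree that mimics the same idea: read a source tree, build a result tree, write it out as HTML. The function name books_to_html and the <books> input are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
source = """<books>
  <title>Web programming</title>
  <title>System programming</title>
</books>"""

def books_to_html(xml_text):
    """Build an HTML list from the <title> elements of a <books> document."""
    src_root = ET.fromstring(xml_text)
    ul = ET.Element("ul")                      # root of the result tree
    for title in src_root.findall("title"):    # roughly what a template match does
        li = ET.SubElement(ul, "li")
        li.text = title.text
    return ET.tostring(ul, encoding="unicode")

html = books_to_html(source)
print(html)
```

With a real XSLT processor the rules inside the loop would live in a separate stylesheet, so the output format could change without touching the program.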
Advantages
• XSLT provides an easy way to merge XML data
to produce output.
• By using XML and XSLT, the application will
look clean and will be easier to maintain.
• XSLT can be used as a validation language.
XSL-FO
• XSL-FO (XSL Formatting Objects) is a markup language for XML document formatting that is most often used to generate PDF files.
• A markup language is a text-encoding system: tags in the text describe its structure and presentation.
• XSL-FO is part of XSL (Extensible Stylesheet Language), a set of W3C technologies designed for the transformation and formatting of XML data.
Parser
• A parser is a compiler or interpreter
component that breaks data into smaller
elements for easy translation into another
language. A parser takes input in the form of a
sequence of tokens or program instructions.
*XML PARSER or Processors
• An XML parser is a software library or package that provides interfaces for client applications to work with an XML document. It reads the XML and gives programs a way to use it.
• The parser reads in XML data and checks that the document is well formed; a validating parser also checks it against a DTD or schema.
• There are two types of parser APIs (an API is a set of functions and procedures that applications call)
– SAX: Simple API for XML (event-based)
– DOM: Document Object Model (object/tree based)


SAX(Simple API for XML)
• An event-based parsing technique: the flow of the program is determined by events. In a GUI the events are user actions like mouse clicks; in SAX they are things the parser finds in the document.
• The parser generates an application event whenever it encounters an element or data in the document being parsed.
• It works like an event handler in Java: the programmer attaches "event handlers" to handle each event, much as an onclick handler is attached to a click.
• Advantages
• 1) It is simple and memory efficient, because the whole document is never held in memory.
• 2) It is very fast and works for huge documents.
• Disadvantages
• 1) It is event-based, so its API is less intuitive.
• 2) Clients never know the full information because the data is broken into pieces.
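A minimal SAX sketch using Python's standard xml.sax module. The handler class TitleHandler and the <books> input are illustrative assumptions. The parser calls the handler methods as it meets each start tag, piece of text and end tag.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Records every start/end event and collects the text of <title> elements."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is buffered until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        self.events.append(("end", name))
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

handler = TitleHandler()
xml.sax.parseString(
    b"<books><title>Web programming</title>"
    b"<title>System programming</title></books>",
    handler,
)
print(handler.titles)
print(handler.events)
```

The handler has to remember where it is in the document (the _in_title flag), which is the "data is broken into pieces" disadvantage in practice.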
DOM
• Refer from Chapter 2
