
Fundamental XML For Developers: Dr. Timothy M. Chester Texas A&M University

This document provides an overview of XML and related technologies for software developers who are new to XML. It introduces XML and its advantages over HTML, discusses how XML documents are parsed and validated, and explains the document object model (DOM) for programmatically accessing XML documents as tree structures in memory. The agenda outlines topics to be covered including XML, the DOM, XPath, XSLT, schemas, WSDL, SOAP, and leaves time for questions.

Copyright
© Attribution Non-Commercial (BY-NC)

Fundamental XML for Developers
Dr. Timothy M. Chester
Texas A&M University
Timothy M. Chester is. . .
• Senior IT Manager, Texas A&M University
– Application Development, Systems Integration, Developer Tools & Training
• Lecturer, Texas A&M College of Business
– Courses on Business Programming Fundamentals (VB.NET, C#), XML &
Advanced Web Development.
• Author
– Visual Studio Magazine, Dr. Dobb's Journal, IT Professional
• Consultant
– President & Principal, eInternet Studios
• Contact Information
– E-mail: [email protected]
– Web: http://tim-chester.tamu.edu
You Are. . .
• Software Developers
– New to XML, Object Oriented Development
– Require ‘basics’ of XML course
• IT Managers
– Need familiarity with XML basics and
terminology
– Interested in how XML can affect both
software development and legacy system
integration
This session . . .
• Assumes you know nothing about XML or
XML based technologies
• Provides a basic introduction to XML
based technologies
• Demonstrates some of the basics of
working with the DOM, XSLT, Schema,
WSDL, and SOAP.
Agenda
► XML
• Document Object Model (DOM)
• XPATH
• XSLT
• Schema
• WSDL
• SOAP
• Questions
Underlying Technologies
XML Is the Glue
[Figure: technology plotted against innovation in three stages. TCP/IP provides connectivity (FTP, e-mail, Gopher): connect the Web. HTML provides presentation (Web pages): browse the Web. XML connects applications (Web services): program the Web.]
Evolution of Web
[Figure: three generations of the Web. Generation 1: static HTML. Generation 2: Web applications. Generation 3: Web services. Diagram labels: HTML; HTML, XML.]
Web Services Overview
Application Model
[Figure: end users reach YourCompany.com, which consists of an application business logic tier on top of a data access and storage tier. Over the Internet + XML it connects to partner Web services, other Web services and other applications.]
Introducing XML
• XML stands for Extensible Markup
Language. A markup language specifies
the structure and content of a document.

• Because it is extensible, XML can be used
to create a wide variety of document
types.
Introducing XML
• XML is a subset of the Standard Generalized
Markup Language (SGML), which was
introduced in the 1980s. SGML is very complex
and can be costly.

• These drawbacks led to the creation of Hypertext
Markup Language (HTML), a more easily used
markup language. XML can be seen as sitting
between SGML and HTML: easier to learn than
SGML, but more robust than HTML.
The Limits of HTML
• HTML was designed for formatting text on a Web page.
It was not designed for dealing with the content of a Web
page. Additional features have been added to HTML, but
they do not solve data description or cataloging issues in
an HTML document.

• Because HTML is not extensible, it cannot be modified to
meet specific needs. Browser developers have added
features making HTML more robust, but this has resulted
in a confusing mix of different HTML standards.
Introducing XML
• HTML cannot be applied consistently.
Different browsers require different
standards making the final document
appear differently on one browser
compared with another.
Introduction to XML Markup
• XML document (intro.xml)
– Marks up message as XML
– Commonly stored in text files
• Extension .xml
1 <?xml version = "1.0"?>
2
3 <!-- Fig. 5.1 : intro.xml -->
4 <!-- Simple introduction to XML markup -->
5
6 <myMessage>
7    <message>Welcome to XML!</message>
8 </myMessage>

– Line 1: document begins with declaration that specifies XML version 1.0
– Line 7: element message is child element of root element myMessage
– Line numbers are not part of XML document. We include them for clarity.
Introduction to XML Markup
(cont.)
• XML documents
– Must contain exactly one root element
• Attempting to create more than one root element is
erroneous
– Elements must be nested properly
• Incorrect: <x><y>hello</x></y>
• Correct: <x><y>hello</y></x>
– Must be well-formed
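The rules above can be checked with any conforming parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is hypothetical and the document text is the intro.xml example without its figure comment.

```python
import xml.etree.ElementTree as ET

# intro.xml from the earlier slide (line numbers removed).
INTRO_XML = """<?xml version="1.0"?>
<!-- Simple introduction to XML markup -->
<myMessage>
   <message>Welcome to XML!</message>
</myMessage>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(INTRO_XML)
print(root.tag)                    # name of the single root element
print(root.find("message").text)   # character data of the child element

print(is_well_formed("<x><y>hello</y></x>"))   # properly nested
print(is_well_formed("<x><y>hello</x></y>"))   # improperly nested
print(is_well_formed("<a/><b/>"))              # two root elements
```

The parser rejects both the badly nested input and the input with two root elements, which is the strictness the slides describe.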
XML Parsers
• An XML processor (also called XML parser)
evaluates the document to make sure it conforms
to all XML specifications for structure and syntax.

• XML parsers are strict. It is this rigidity built into
XML that ensures XML code accepted by the
parser will work the same everywhere.
XML Parsers
• Microsoft’s parser is called MSXML and is
built directly into Internet Explorer versions 5.0 and above.

• Netscape’s parser comes from the Mozilla project
and is built into Netscape versions 6.0 and
above.
Parsers and Well-formed XML
Documents (cont.)
• XML parsers support
– Document Object Model (DOM)
• Builds tree structure containing document data in
memory
– Simple API for XML (SAX)
• Generates events when tags, comments, etc. are
encountered
– (Events are notifications to the application)
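To make the contrast concrete, here is a small sketch that reads the same document both ways with Python's standard library: `xml.sax` delivers events to a handler, while `xml.dom.minidom` builds a tree in memory. The class name `EventLogger` is an illustrative choice, not something from the slides.

```python
import xml.sax
from xml.dom import minidom

DOC = "<myMessage><message>Welcome to XML!</message></myMessage>"


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX notification in the order the parser sends it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # A parser may deliver one run of text in several chunks.
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


# SAX: the application is notified as the parser encounters each tag.
handler = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.events)

# DOM: the whole document is available as a tree after parsing.
dom = minidom.parseString(DOC)
message = dom.documentElement.firstChild
print(dom.documentElement.tagName, message.firstChild.data)
```

SAX keeps no document in memory, which suits large inputs; DOM costs memory but lets the program move around the tree freely.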
Parsing an XML Document with
MSXML
• XML document
– Contains data
– Does not contain formatting information
– Load XML document into Internet Explorer 5.0
• Document is parsed by msxml.
• Places plus (+) or minus (-) signs next to container elements
– Plus sign indicates that all child elements are hidden
– Clicking plus sign expands container element
» Displays children
– Minus sign indicates that all child elements are visible
– Clicking minus sign collapses container element
» Hides children
• Error generated, if document is not well formed
XML document shown in IE6.
Character Set
• XML documents may contain
– Carriage returns
– Line feeds
– Unicode characters
• Enables computers to process characters for
several languages
Characters vs. Markup
• XML must differentiate between
– Markup text
• Enclosed in angle brackets (< and >)
– e.g., child elements
– Character data
• Text between start tag and end tag
– Welcome to XML!
– Elements versus Attributes
White Space, Entity References
and Built-in Entities
• Whitespace characters
– Spaces, tabs, line feeds and carriage returns
• Significant (preserved by application)
• Insignificant (not preserved by application)
– Normalization
» Whitespace collapsed into single whitespace character
» Sometimes whitespace removed entirely

<markup>This     is     character   data</markup>

after normalization, becomes

<markup>This is character data</markup>
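As a rough illustration of normalization, the sketch below collapses runs of the four XML whitespace characters into one space. The function names are hypothetical, and whether an application normalizes at all (or also trims the ends) is its own decision.

```python
import re

# The four XML whitespace characters: space, tab, carriage return, line feed.
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(text):
    """Collapse each run of XML whitespace into a single space."""
    return XML_WHITESPACE_RUN.sub(" ", text)


def collapse_and_trim(text):
    """Collapse runs and also drop leading and trailing whitespace."""
    return collapse_whitespace(text).strip(" ")


print(collapse_whitespace("This     is     character   data"))
print(collapse_and_trim("  This\n\tis   character data  "))
```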
White Space, Entity References and
Built-in Entities (cont.)
• XML-reserved characters
– Ampersand (&)
– Left-angle bracket (<)
– Right-angle bracket (>)
– Apostrophe (')
– Double quote (")
• Entity references
– Allow XML-reserved characters to be used as character data
• Begin with ampersand (&) and end with semicolon (;)
– Prevent the parser from misinterpreting character data as markup
White Space, Entity References
and Built-in Entities (cont.)
• Built-in entities
– Ampersand (&amp;)
– Left-angle bracket (&lt;)
– Right-angle bracket (&gt;)
– Apostrophe (&apos;)
– Quotation mark (&quot;)
– Mark up characters “<>&” in element message
<message>&lt;&gt;&amp;</message>
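A short sketch of the round trip, using Python's standard `xml.sax.saxutils.escape` to produce the entity references and `xml.etree.ElementTree` to read them back. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "<>&"

# escape() always replaces &, < and > with their built-in entities.
escaped = escape(raw)
print(escaped)

# Quotes are only replaced when asked for via the optional mapping.
quoted = escape("\"'", {'"': "&quot;", "'": "&apos;"})
print(quoted)

# When the document is parsed, the references turn back into characters.
document = "<message>" + escaped + "</message>"
parsed_text = ET.fromstring(document).text
print(parsed_text)
```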
Agenda
• XML
► Document Object Model (DOM)
• XPATH
• XSLT
• Schema
• WSDL
• SOAP
• Questions
Introduction
• XML Document Object Model (DOM)
– Builds tree structure in memory for XML
documents
– DOM-based parsers build and expose these structures
• Exist for several languages (Java, C, C++, Python,
Perl, C#, VB.NET, VB, etc.)
Introduction
• DOM tree
– Each node represents an element, attribute, etc.
<?xml version = "1.0"?>
<message from = "Paul" to = "Tem">
   <body>Hi, Tim!</body>
</message>

• Node created for element message
– Element message has child node for body element
– Element body has child node for text "Hi, Tim!"
– Attributes from and to also have nodes in tree
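The node relationships listed above can be inspected directly. This is a minimal sketch with Python's standard `xml.dom.minidom`, one DOM implementation among many; the message document is written on one line so that no whitespace-only text nodes appear between elements.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'
)

message = doc.documentElement      # node for element message
body = message.firstChild          # child node for element body
text = body.firstChild             # child node for the text

print(message.tagName, body.tagName, text.data)

# Attributes have nodes too, but they are reached through the element's
# attribute map rather than through its list of children.
from_attr = message.getAttributeNode("from")
print(from_attr.name, from_attr.value, message.attributes.length)
```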
DOM Implementations
• DOM-based parsers
– Microsoft’s msxml
– Microsoft .NET System.Xml Namespace
– Sun Microsystems’ JAXP
Creating Nodes
• Create XML document at run time
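As one possible illustration, the sketch below builds the intro.xml message at run time with Python's standard `xml.dom.minidom`. The session itself targets other DOM implementations, so treat the method names as those of this particular library.

```python
from xml.dom import minidom

# Start from an empty document and attach nodes one by one.
doc = minidom.Document()

root = doc.createElement("myMessage")
doc.appendChild(root)

message = doc.createElement("message")
message.appendChild(doc.createTextNode("Welcome to XML!"))
root.appendChild(message)

# Serialize the tree that was built in memory.
xml_text = root.toxml()
print(xml_text)
```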
Traversing the DOM
• Use DOM to traverse XML document
– Output element nodes
– Output attribute nodes
– Output text nodes
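A traversal of this kind could look like the following sketch, again with Python's `xml.dom.minidom`. The function name `walk` and the output format are assumptions made for illustration.

```python
from xml.dom import Node, minidom


def walk(node, depth=0, out=None):
    """Collect one line per element, attribute and non-blank text node."""
    if out is None:
        out = []
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        out.append(f"{indent}element: {node.tagName}")
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            out.append(f"{indent}  attribute: {attr.name}={attr.value}")
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        out.append(f"{indent}text: {node.data.strip()}")
    # Recurse into children; text nodes simply have none.
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


doc = minidom.parseString(
    '<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'
)
lines = walk(doc.documentElement)
print("\n".join(lines))
```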
DOM Components
• Manipulate XML document
Agenda
• XML
• Document Object Model (DOM)
► XPATH
• XSLT
• Schema
• WSDL
• SOAP
• Questions
Introduction
• XML Path Language (XPath)
– Syntax for locating information in XML
document
• e.g., attribute values
– String-based language of expressions
• Not structural language like XML
– Used by other XML technologies
• XSLT
Nodes
• XML document
– Tree structure with nodes
– Each node represents part of XML document
• Seven types
– Root
– Element
– Attribute
– Text
– Comment
– Processing instruction
– Namespace
• Attributes and namespaces are not children of their parent node
– They describe their parent node
XPath node types
Each node type has a string-value, an expanded-name and a description.
• root
– String-value: determined by concatenating the string-values of all text-node descendants in document order.
– Expanded-name: none.
– Description: represents the root of an XML document. This node exists only at the top of the tree and may contain element, comment or processing-instruction children.
• element
– String-value: determined by concatenating the string-values of all text-node descendants in document order.
– Expanded-name: the element tag, including the namespace prefix (if applicable).
– Description: represents an XML element and may contain element, text, comment or processing-instruction children.
• attribute
– String-value: the normalized value of the attribute.
– Expanded-name: the name of the attribute, including the namespace prefix (if applicable).
– Description: represents an attribute of an element.
XPath node types. (Part 2)
• text
– String-value: the character data contained in the text node.
– Expanded-name: none.
– Description: represents the character data content of an element.
• comment
– String-value: the content of the comment (not including <!-- and -->).
– Expanded-name: none.
– Description: represents an XML comment.
• processing instruction
– String-value: the part of the processing instruction that follows the target and any whitespace.
– Expanded-name: the target of the processing instruction.
– Description: represents an XML processing instruction.
• namespace
– String-value: the URI of the namespace.
– Expanded-name: the namespace prefix.
– Description: represents an XML namespace.
Location Paths
• Location path
– Expression specifying how to navigate XPath
tree
– Composed of location steps
• Each location step composed of
– Axis
– Node test
– Predicate
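To show how a location step breaks down, here is a deliberately simplified sketch that splits a step into axis, node test and predicate. It is not an XPath parser: the names `split_step` and `split_path` are hypothetical, it handles at most one predicate per step, and it would mis-split paths whose predicates contain a slash.

```python
import re

# axis::node-test[predicate], where the axis and the predicate are optional.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?:\[(?P<pred>.*)\])?$"
)


def split_step(step):
    """Return (axis, node test, predicate) for one location step."""
    match = STEP.match(step.strip())
    if match is None:
        raise ValueError(f"not a recognizable location step: {step!r}")
    axis = match.group("axis") or "child"   # child is the default axis
    test = match.group("test").strip()
    pred = match.group("pred")
    if test.startswith("@"):                # @name abbreviates attribute::name
        axis, test = "attribute", test[1:]
    return axis, test, pred.strip() if pred is not None else None


def split_path(path):
    """Split a simple relative location path into its steps."""
    return [split_step(step) for step in path.split("/") if step]


print(split_step("child::book[position() = 3]"))
print(split_path("head/title[last()]"))
```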
Axes
• XPath searches are made relative to context
node
• Axis
– Indicates which nodes are included in search
• Relative to context node
– Dictates node ordering in set
• Forward axes select nodes that follow context node
• Reverse axes select nodes that precede context node
Node Tests
• Node tests
– Refine set of nodes selected by axis
• Rely upon axis’ principal node type
– Corresponds to type of node axis can select
Node-set Operators and
Functions (cont.)
• Location-path expressions
– Combine node-set operators and functions
• Select all head and body children element nodes
head | body
• Select last title element node in head element node
head/title[ last() ]
• Select third book element
book[ position() = 3 ]
– Or alternatively
book[ 3 ]
• Return total number of element-node children
count( * )
• Select all book element nodes in document
//book
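Several of these expressions can be tried with Python's standard `xml.etree.ElementTree`, with the caveat that it implements only a subset of XPath: positional predicates and `last()` work, but `position() = 3`, `count()` and the `|` union operator do not, so the sketch substitutes plain Python for those. The sample `library` document is invented for the example.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library>
  <book><title>A</title></book>
  <book><title>B</title></book>
  <book><title>C</title></book>
  <magazine><title>M</title></magazine>
</library>"""

root = ET.fromstring(LIBRARY)   # context node for the searches below

third_book = root.find("book[3]")          # same meaning as book[position() = 3]
last_book = root.find("book[last()]")      # last book child
all_books = root.findall(".//book")        # every book below the context node

# count(*) has no direct equivalent here; count the element children instead.
child_count = len(list(root))

# The union operator is unsupported; concatenate two searches instead.
# Note that, unlike a real union, this is not sorted into document order.
books_and_magazines = root.findall("book") + root.findall("magazine")

print(third_book.findtext("title"), last_book.findtext("title"))
print(len(all_books), child_count, len(books_and_magazines))
```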
Agenda
• XML
• Document Object Model (DOM)
• XPATH
► XSLT
• Schema
• WSDL
• SOAP
• Questions
Introduction
• Extensible Stylesheet Language (XSL)
– Used to format XML documents
– Consists of two parts
• XSL Transformation Language (XSLT)
– Transform XML document from one form to another
– Use XPath to match nodes
• XSL formatting objects
– Alternative to CSS
Setup
• XSLT processor
– Microsoft Internet Explorer 6
– Java 2 Standard Edition
– Microsoft.NET System.Xml Namespace
Templates
• XSLT document
– XML document with root element stylesheet
– template element
• Matches specific XML document nodes
• Uses XPath expression in attribute match
Templates (cont.)
• XSLT
– Two trees of nodes
• Source tree corresponds to original XML document
• Result tree contains nodes produced by
transformation
– Transforms intro.xml into HTML document
Iteration and Sorting
• XSLT allows
– Iteration through node set
• Element for-each
– Sorting node set
• Element sort
– Attribute order with value ascending (i.e., A-Z)
– Attribute order with value descending (i.e., Z-A)
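Python's standard library has no XSLT processor, so the following is only an analogue of what a for-each with a sort does: it walks a node set from a source tree in sorted order and emits nodes into a separate result tree. The function name and the sample `books` document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<books>"
    "<book><title>XSLT</title></book>"
    "<book><title>DOM</title></book>"
    "<book><title>SOAP</title></book>"
    "</books>"
)


def titles_as_html_list(source_xml, descending=False):
    """Build an HTML list of titles, sorted A-Z or Z-A."""
    source = ET.fromstring(source_xml)    # source tree
    result = ET.Element("ul")             # result tree
    # Plays the role of the sort element: order the node set by title.
    books = sorted(
        source.findall("book"),
        key=lambda book: book.findtext("title"),
        reverse=descending,
    )
    # Plays the role of for-each: one output node per selected node.
    for book in books:
        ET.SubElement(result, "li").text = book.findtext("title")
    return ET.tostring(result, encoding="unicode")


print(titles_as_html_list(SOURCE))
print(titles_as_html_list(SOURCE, descending=True))
```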
Conditional Processing
• Perform conditional processing
– Such as if statement
– Use element choose
• Allows alternate conditional statements
• Similar to switch statement
• Has child elements when and otherwise
– when element content used if its condition is met
– otherwise element content used if none of the
when conditions is met
XSLT and XPath
• XPath Expression
– locates elements, attributes and text in XML
document
Agenda
• XML
• Document Object Model (DOM)
• XPATH
• XSLT
► Schema
• WSDL
• SOAP
• Questions
Working with Namespaces
• Name collision occurs when elements from two
or more documents share the same name.

• Name collision isn’t a problem if you are not
concerned with validation. The document
content only needs to be well-formed.

• However, name collision will keep a document
from being validated.
Name Collision
This figure shows two documents each with a Name element
Using Namespaces to Avoid
Name Collision
This figure shows how to use a namespace to avoid collision
Declaring a Namespace
• A namespace is a defined collection of element
and attribute names.

• Names that belong to the same namespace
must be unique. Elements can share the same
name if they reside in different namespaces.

• Namespaces must be declared before they can
be used.
Declaring a Namespace
• A namespace can be declared in the prolog or as an
element attribute. The syntax to declare a namespace in
the prolog is:

<?xml:namespace ns="URI" prefix="prefix"?>

• Where URI is a Uniform Resource Identifier that assigns
a unique name to the namespace, and prefix is a string
of letters that associates each element or attribute in the
document with the declared namespace.

• Note: this prolog form dates from early namespace drafts. The W3C
Namespaces in XML recommendation declares namespaces with xmlns
attributes on elements.
Declaring a Namespace
• For example,

<?xml:namespace ns="http://uhosp/patients/ns" prefix="pat"?>

• Declares a namespace with the prefix “pat” and
the URI http://uhosp/patients/ns.

• The URI does not have to be a Web address. A URI identifies a
physical or an abstract resource.
<?xml version = "1.0"?>

<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->

<directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
<?xml version = "1.0"?>

<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->

<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
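A sketch of how a program can tell the two file elements apart, using Python's standard `xml.etree.ElementTree` on the prefixed (namespace.xml) version of the document. The prefix mapping passed to `findall` is local to the program; only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

DIRECTORY = """<directory xmlns:text="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
   <text:file filename="book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename="funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width="200" height="100"/>
   </image:file>
</directory>"""

# Prefix-to-URI mapping used only for the searches below.
NS = {"text": "urn:deitel:textInfo", "image": "urn:deitel:imageInfo"}

root = ET.fromstring(DIRECTORY)
text_files = root.findall("text:file", NS)
image_files = root.findall("image:file", NS)

# Both elements are named file, but each is qualified by its namespace URI.
print(text_files[0].tag, text_files[0].get("filename"))
print(image_files[0].tag, image_files[0].get("filename"))
```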
Schemas
• A schema is an XML document that defines the content
and structure of one or more XML documents.

• To avoid confusion, the XML document containing the
content is called the instance document.

• It represents a specific instance of the structure defined
in the schema.
Comparing Schemas and DTDs
This figure compares schemas and DTDs
Schema Dialects
• There is no single schema form.

• Several schema “dialects” have been
developed in the XML language.

• Support for a particular schema depends on the
XML parser being used for validation.
Starting a Schema File
• A schema is always placed in a separate
XML document that is referenced by the
instance document.
Schema Types
• XML Schema recognizes two categories of
element types: complex and simple.

• A complex type element has one or more
attributes, or is the parent to one or more
child elements.

• A simple type element contains only
character data and has no attributes.
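The distinction can be expressed as a small test on an instance element. The sketch below applies the two definitions above with Python's `xml.etree.ElementTree`; it only classifies elements and does not validate anything against a schema, and the function name `element_category` is hypothetical.

```python
import xml.etree.ElementTree as ET


def element_category(element):
    """Classify an element as 'complex' or 'simple' per the definitions above."""
    has_attributes = bool(element.attrib)
    has_children = len(element) > 0
    if has_attributes or has_children:
        return "complex"
    return "simple"


file_element = ET.fromstring(
    '<file filename="book.xml"><description>A book list</description></file>'
)
description = file_element.find("description")

print(element_category(file_element))   # has an attribute and a child
print(element_category(description))    # character data only
```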
Schema Types

This figure shows types of elements


Understanding Data Types
• XML Schema supports two data types: built-in and
user-derived.

• A built-in data type is part of the XML Schema
specifications and is available to all XML Schema
authors.

• A user-derived data type is created by the XML
Schema author for specific data values in the
instance document.
Understanding Data Types
• A primitive data type, also called a base
type, is one of 19 fundamental data types
not defined in terms of other types.

• A derived data type is one of 25 built-in
data types that the XML Schema
developers created based on the 19
primitive types.
Agenda
• XML
• Document Object Model (DOM)
• XPATH
• XSLT
• Schema
► WSDL
• SOAP
• Questions
WSDL
• Think "TypeLib for SOAP"
• WSDL = Web Services Description Language
• Uniform representation for services
– Transport Protocol neutral
– Access Protocol neutral (not only SOAP)
• Describes:
– Schema for Data Types
– Call Signatures (Message)
– Interfaces (Port Types)
– Endpoint Mappings (Bindings)
– Endpoints (Services)
UDDI
• Think "Yahoo!" for WebServices
• Universal Description, Discovery and Integration
• WebService-Programmable "Yellow Pages"
• Advertise Sites and Services
• May point to DISCO resources
• Initiative driven by Microsoft, IBM, Ariba
Agenda
• XML
• Document Object Model (DOM)
• XPATH
• XSLT
• Schema
• WSDL
► SOAP
• Questions
SOAP
Overview

• A lightweight protocol for exchanging information
in a distributed, heterogeneous environment
– It enables cross-platform interoperability
• Interoperable
– OS, object model, programming language neutral
– Hardware independent
– Protocol independent
• Works over existing Internet infrastructure
SOAP
Overview
• Guiding principle: “Invent no new technology”
• Builds on key Internet standards
– SOAP ≈ HTTP + XML
– Submitted to W3C
• The SOAP specification defines:
– The SOAP message format
– How to send messages
– How to receive responses
– Data encoding
SOAP
SOAP Is Not…
• Objects-by-reference
– Distributed garbage collection
– Bi-directional HTTP
• Activation
• Complicated
– Doesn’t try to solve every problem in
distributed computing
– Can be easily implemented
SOAP
The HTTP Aspect
• SOAP requests are HTTP POST requests

POST /WebCalculator/Calculator.asmx HTTP/1.1
Content-Type: text/xml
SOAPAction: "http://tempuri.org/Add"
Content-Length: 386

<?xml version="1.0"?>
<soap:Envelope ...>
...
</soap:Envelope>
SOAP
Message Structure

SOAP Message: the complete SOAP message
– Headers: protocol binding headers
– SOAP Envelope: <Envelope> encloses payload
  • SOAP Header: <Header> encloses headers
    – Headers: individual headers
  • SOAP Body: <Body> contains SOAP message name
    – Message Name & Data: XML-encoded SOAP message name & data
SOAP
SOAP Message Format
• An XML document using the SOAP schema:

<?xml version="1.0"?>
<soap:Envelope ...>
   <soap:Header ...>
      ...
   </soap:Header>
   <soap:Body>
      <Add xmlns="http://tempuri.org/">
         <n1>12</n1>
         <n2>10</n2>
      </Add>
   </soap:Body>
</soap:Envelope>
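As a sketch of how such a request might be assembled, the following builds the envelope and the HTTP headers with Python's standard library, without sending anything. The slide elides the envelope's attributes, so the SOAP 1.1 envelope namespace used here is an assumption, and `build_add_request` is a hypothetical helper. The serializer picks its own prefix for the `http://tempuri.org/` namespace instead of a default namespace, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

# Assumption: SOAP 1.1 envelope namespace (the slide shows only "...").
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "http://tempuri.org/"


def build_add_request(n1, n2):
    """Return (headers, payload) for an Add call; nothing is sent."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    add = ET.SubElement(body, f"{{{SERVICE_NS}}}Add")
    ET.SubElement(add, f"{{{SERVICE_NS}}}n1").text = str(n1)
    ET.SubElement(add, f"{{{SERVICE_NS}}}n2").text = str(n2)

    payload = ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
    headers = {
        "Content-Type": "text/xml",
        "SOAPAction": '"http://tempuri.org/Add"',
        "Content-Length": str(len(payload)),   # byte length of the body
    }
    return headers, payload


headers, payload = build_add_request(12, 10)
print(headers)
print(payload.decode("utf-8"))
```

The headers and payload would then be sent as the body of an HTTP POST to the service URL.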
SOAP
Server Responses
• Server replies with a “result” message:
HTTP/1.1 200 OK
...
Content-Type: text/xml
Content-Length: 391

<?xml version="1.0"?>
<soap:Envelope ...>
   <soap:Body>
      <AddResult xmlns="http://tempuri.org/">
         <result>22</result>
      </AddResult>
   </soap:Body>
</soap:Envelope>
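On the client side, reading the result comes down to parsing the envelope and locating the result element by namespace. This is a minimal sketch under the same assumption as before (SOAP 1.1 envelope namespace, which the slide elides); the sample response text and the helper name `read_add_result` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: SOAP 1.1 envelope namespace (the slide shows only "...").
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "svc": "http://tempuri.org/",
}

SAMPLE_RESPONSE = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <AddResult xmlns="http://tempuri.org/">
         <result>22</result>
      </AddResult>
   </soap:Body>
</soap:Envelope>"""


def read_add_result(response_xml):
    """Extract the numeric result from an AddResult response body."""
    envelope = ET.fromstring(response_xml)
    result = envelope.find("soap:Body/svc:AddResult/svc:result", NS)
    if result is None or result.text is None:
        raise ValueError("response does not contain an AddResult/result element")
    return float(result.text)


print(read_add_result(SAMPLE_RESPONSE))
```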
SOAP
Industry Support
• DevelopMentor Inc.
• Digital Creations
• IONA Technologies PLC
• Jetform
• ObjectSpace Inc.
• Rockwell Software Inc.
• SAP
• Compaq
• Intel
• Microsoft
• Rogue Wave Software Inc.
• Scriptics Corp.
• Secret Labs AB
• UserLand Software Inc.
• Zveno Pty. Ltd.
• IBM
• Hewlett Packard
Agenda
• XML
• Document Object Model (DOM)
• XPATH
• XSLT
• Schema
• WSDL
• SOAP
► Questions
Questions
Bibliography
• Harvey Deitel’s “XML How to Program”
• Prentice Hall XML Reference
• Microsoft Academic Resource Kit
