
XML Sem 3

The document describes how to parse and extract information from an XML file using the xml.etree.ElementTree module in Python. It provides an example XML file containing food item data, then demonstrates how to parse the file, extract the root element and child elements, retrieve attribute values, and extract text from elements. The document also shows how to write XML data to a file using the ElementTree module.

Uploaded by prashanth kumar
Copyright © All Rights Reserved

Reading XML File :-

<?xml version="1.0" encoding="UTF-8"?>


<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>
Two idly's with chutney
</description>
<calories>553</calories>
</food>
<food>
<item name="breakfast">Paper Dosa</item>
<price>$2.7</price>
<description>
Plain paper dosa with chutney
</description>
<calories>700</calories>
</food>
<food>
<item name="breakfast">Upma</item>
<price>$3.65</price>
<description>
Ravaupma with bajji
</description>
<calories>600</calories>
</food>
<food>
<item name="breakfast">BisiBele Bath</item>
<price>$4.50</price>
<description>
BisiBele Bath with sev
</description>
<calories>400</calories>
</food>
<food>
<item name="breakfast">Kesari Bath</item>
<price>$1.95</price>
<description>
Sweet rava with saffron
</description>
<calories>950</calories>
</food>
</metadata>

The example above shows the contents of a file, which I have named ‘Sample.xml’.

Python XML Parsing Modules

Python can parse these XML documents with two standard-library modules: xml.etree.ElementTree and xml.dom.minidom (Minimal DOM Implementation). Parsing means reading information from a file and splitting it into pieces by identifying the parts of that XML file.
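The rest of this document uses ElementTree. For comparison, here is a minimal sketch of the second module, xml.dom.minidom, reading the same kind of data. It assumes a shortened copy of ‘Sample.xml’ held in a string; the variable names are illustrative only.

```python
from xml.dom import minidom

# A shortened, hypothetical copy of Sample.xml held in a string
data = """<metadata>
<food><item name="breakfast">Idly</item><price>$2.5</price></food>
<food><item name="breakfast">Upma</item><price>$3.65</price></food>
</metadata>"""

doc = minidom.parseString(data)

names = []
# getElementsByTagName() returns every matching element in document order
for item in doc.getElementsByTagName("item"):
    # The element's text is stored in its first child, a Text node
    names.append(item.firstChild.data)
    print(item.getAttribute("name"), item.firstChild.data)
```

minidom exposes the W3C DOM interface (nodes, child nodes, Text nodes). ElementTree offers a smaller, more Pythonic interface, which is why the examples below use it.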
xml.etree.ElementTree Module:
This module represents XML data as a tree, which is the most natural representation of hierarchical data. The Element type stores hierarchical data structures in memory and has the following properties:

Property          Description
Tag               A string representing the type of data being stored
Attributes        A number of attributes, stored as a dictionary
Text string       A text string holding the information to be displayed
Tail string       An optional string holding any text that follows the element's closing tag
Child elements    A number of child elements, stored as a sequence
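The five properties in the table can be inspected directly on an Element. This is a minimal sketch that assumes a one-line food entry with some extra text (" after price") added after the price element, purely so that a tail string exists to show.

```python
import xml.etree.ElementTree as ET

# A hypothetical food entry; " after price" becomes the tail of <price>
food = ET.fromstring(
    '<food><item name="breakfast">Idly</item><price>$2.5</price> after price</food>'
)

item = food[0]                 # child elements are indexed like a list
print(item.tag)                # tag string
print(item.attrib)             # attributes as a dictionary
print(item.text)               # text string

price = food[1]
print(repr(price.tail))        # tail string: text after </price>
print(repr(item.tail))         # None, because nothing follows </item>

# Child elements behave like a sequence
print(len(food), [child.tag for child in food])
```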

ElementTree is a class that wraps the element structure and allows conversion to and from XML.
Let us now parse the XML file above using this module.

There are two ways to parse XML with the ‘ElementTree’ module: the parse() function and the fromstring() function. parse() parses an XML document supplied as a file, whereas fromstring() parses XML supplied as a string, for example one written within triple quotes.
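As a minimal sketch of the difference, both functions can reach the same root element. Here io.StringIO stands in for the file ‘Sample.xml’, an assumption made so the example runs without a file on disk. Note that parse() returns an ElementTree, while fromstring() returns the root Element directly.

```python
import io
import xml.etree.ElementTree as ET

# A shortened copy of the sample data
xml_text = """<metadata>
<food><item name="breakfast">Idly</item><price>$2.5</price></food>
</metadata>"""

# fromstring() takes the XML text and returns the root Element
root_from_string = ET.fromstring(xml_text)

# parse() takes a file name or file object and returns an ElementTree;
# getroot() is then needed to reach the root Element
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(root_from_string.tag, root_from_file.tag)
```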

Using parse() function:-


As mentioned earlier, this function takes an XML file and parses it. Take a look at the following
example:

import xml.etree.ElementTree as ET

mytree = ET.parse('Sample.xml')   # parse the file into an ElementTree
myroot = mytree.getroot()         # get the root element of the tree
print(myroot)
As you can see, the first thing you need to do is import the xml.etree.ElementTree module.
Then, the parse() function parses the ‘Sample.xml’ file and returns an ElementTree object. Its getroot() method returns the root
element of ‘Sample.xml’.

To check the root element, print it with the print() function, as the last line of the code does. The output is:

OUTPUT:

<Element 'metadata' at 0x033589F0>

The output above indicates that the root element in our XML document is ‘metadata’. The hexadecimal memory address will differ from run to run.
Using fromstring() function:
You can also use the fromstring() function to parse XML held in a string. To do this, pass your XML as a
string, here written within triple quotes:

import xml.etree.ElementTree as ET

# The XML is held in a triple-quoted string instead of a file
data='''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>
Two idly's with chutney
</description>
<calories>553</calories>
</food>
</metadata>
'''
myroot = ET.fromstring(data)   # returns the root Element directly
#print(myroot)
print(myroot.tag)              # prints: metadata

Because the tag is an ordinary string, you can also slice it to show only the part you want to see
in your output.

EXAMPLE:

print(myroot.tag[0:4])

OUTPUT:  

meta

As mentioned earlier, elements can also have attributes, which are stored as a dictionary. To check whether the root element has any
attributes, use its ‘attrib’ property as follows:

EXAMPLE:

print(myroot.attrib)

OUTPUT: 

{}

As you can see, the output is an empty dictionary because our root tag has no attributes.
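For contrast, the item element inside each food entry does carry an attribute. This minimal sketch assumes a one-line copy of the first food entry from ‘Sample.xml’ and shows that attrib is a plain dictionary.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the first food entry from Sample.xml
myroot = ET.fromstring(
    '<metadata><food><item name="breakfast">Idly</item></food></metadata>'
)

item = myroot[0][0]          # <item> inside the first <food>
print(myroot.attrib)         # the root has no attributes
print(item.attrib)           # the item has one: name="breakfast"
print(item.attrib["name"])   # attrib is a plain dict, so indexing works
```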

Finding Elements of Interest:


The root also has child elements. To retrieve the first child of the root, index it like a list:

print(myroot[0].tag)

OUTPUT: food
Now, if you want to retrieve all the child tags of the root's first child (the first food element), you can iterate over it with a for loop
as follows:

for x in myroot[0]:
    print(x.tag, x.attrib)   # tag and attribute dictionary of each child

OUTPUT:

item {'name': 'breakfast'}
price {}
description {}
calories {}

The output lists the tag and the attributes of each child of the first food element.

To extract the text from XML elements with ElementTree, use the text attribute. For
example, to retrieve all the information about the first food item, use the
following code:

for x in myroot[0]:
    print(x.text)   # text content of each child element

OUTPUT:

Idly
$2.5

Two idly's with chutney

553

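As you can see, the text of each child of the first food item has been returned as the output. The description text keeps the line breaks that surround it in the file; calling x.text.strip() removes them. Now, if you
want to display every item with its price, you can use the findall() and find() methods: findall() returns all direct children with a matching tag, and find() returns the first matching child. To read an attribute, use the get() method, which accesses the element’s attributes.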

EXAMPLE:

for x in myroot.findall('food'):        # every <food> child of the root
    item = x.find('item').text          # text of the first <item>
    price = x.find('price').text        # text of the first <price>
    print(item, price)
OUTPUT:

Idly $2.5
Paper Dosa $2.7
Upma $3.65
BisiBele Bath $4.50
Kesari Bath $1.95
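The example above reads only element text. Here is a minimal sketch of the get() method, which reads the name attribute, combined with findtext() and strip() to clean up the description. The ‘size’ attribute is hypothetical. It is not in ‘Sample.xml’ and is used only to show get()'s default value.

```python
import xml.etree.ElementTree as ET

myroot = ET.fromstring("""<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>
Two idly's with chutney
</description>
</food>
</metadata>""")

for food in myroot.findall("food"):
    item = food.find("item")
    # get() reads an attribute; the second argument is returned if it is missing
    meal = item.get("name")
    size = item.get("size", "regular")   # 'size' is a hypothetical attribute
    # findtext() returns the text of the first match; strip() drops the line breaks
    description = food.findtext("description").strip()
    print(item.text, meal, size, description)
```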

The output above lists every item along with its price. Using
ElementTree, you can also modify XML files.
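As a minimal sketch of modification, the tree can be changed in memory and then serialized. This assumes a one-line copy of the first food entry. The new price, the "snack" value, and the calories figure are illustrative.

```python
import xml.etree.ElementTree as ET

myroot = ET.fromstring(
    "<metadata><food><item name='breakfast'>Idly</item><price>$2.5</price></food></metadata>"
)

food = myroot.find("food")
food.find("price").text = "$2.75"              # change an element's text
food.find("item").set("name", "snack")         # change an attribute with set()
ET.SubElement(food, "calories").text = "553"   # add a new child element

result = ET.tostring(myroot, encoding="unicode")
print(result)
# With a tree from ET.parse('Sample.xml'), mytree.write('Sample.xml') saves the change
```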
Writing XML Documents:-

Using ElementTree

ElementTree is also well suited to writing data to XML files. The code below shows how to create an
XML file with a structure similar to the file used in the previous examples.

The steps are:

1. Create an element, which will act as our root element. In our case the tag for this element is "data".

2. Once we have our root element, we can create sub-elements by using the SubElement function. This
function has the syntax:

SubElement(parent, tag, attrib={}, **extra)

Here parent is the parent node to connect to, tag is the tag name of the new element, attrib is a dictionary containing the element
attributes, and extra holds additional attributes passed as keyword arguments. The function returns the new element,
which can in turn be used as the parent of other sub-elements, as the code below does.

3. Although we can add attributes with the SubElement function, we can also use the element's
set() method. The element text is set through the text
property of the Element object.

4. In the last two lines of the code below we wrap the root element in an ElementTree object and write
it to a file with the tree's write() method.
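Steps 2 and 3 can be illustrated in isolation. This minimal sketch passes attributes both as the attrib dictionary and as keyword arguments, then adds one more with set(). The ‘category’ and ‘veg’ attributes are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element("data")

# attrib as a dictionary plus an extra attribute as a keyword argument;
# 'category' is a hypothetical attribute
item = ET.SubElement(root, "item", {"name": "breakfast"}, category="south-indian")
item.text = "Idly"

# set() adds (or overwrites) an attribute after the element exists;
# 'veg' is also hypothetical
item.set("veg", "yes")

print(item.attrib)
```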
Example code:

import xml.etree.ElementTree as ET   # cElementTree was removed in Python 3.9

root = ET.Element("data")            # step 1: the root element
doc = ET.SubElement(root,"food")     # step 2: first <food> child

ET.SubElement(doc, "item", name="breakfast").text = "idly"
ET.SubElement(doc, "price").text = "25"
ET.SubElement(doc, "description").text = "Two idly's with chutney"

doc = ET.SubElement(root,"food")     # second <food> child
ET.SubElement(doc, "item", name="breakfast").text = "Dosa"
ET.SubElement(doc, "price").text = "35"
ET.SubElement(doc, "description").text = "one dosa with chutney"

tree = ET.ElementTree(root)          # step 4: wrap the root in a tree
tree.write("FILE3.xml")              # and write it to disk
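Executing this code creates a new file, "FILE3.xml", whose structure
resembles the original "Sample.xml" file: a root element containing
food elements, each with item, price and description children. It is
not identical, because the root tag here is "data", there are only two
food items, and there is no calories element. You'll probably notice
that the resulting file is only one line and contains no indentation.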
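One way to get indented output is ET.indent(), which adds whitespace to the tree before it is written. This is a minimal sketch that assumes Python 3.9 or later, where ET.indent() was added. It also passes xml_declaration=True so that the file starts with an XML declaration like the one in ‘Sample.xml’.

```python
import xml.etree.ElementTree as ET

root = ET.Element("data")
doc = ET.SubElement(root, "food")
ET.SubElement(doc, "item", name="breakfast").text = "idly"
ET.SubElement(doc, "price").text = "25"

# ET.indent() (Python 3.9+) fills element text/tail with newlines and spaces
ET.indent(root, space="  ")

tree = ET.ElementTree(root)
# xml_declaration=True writes the <?xml ...?> line; encoding sets the file encoding
tree.write("FILE3.xml", encoding="UTF-8", xml_declaration=True)

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Indenting changes only whitespace, so the element names, attributes and text stay the same.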
