Slide10 Part2

This document covers Chapter 10 of the course "Web Data Analysis", which uses Selenium for web scraping and for interacting with web pages programmatically. It shows how to find elements by XPath with Selenium and how to get an element's parent, children, siblings, next sibling and previous sibling. Code examples demonstrate how to locate elements and their relatives using the Selenium Python API and XPath queries. The chapter also introduces HTML, CSS and the Beautiful Soup library for web scraping.

FACULTY OF INFORMATION SYSTEMS

Course:
Web Data Analysis
(3 credits)

Lecturer: Nguyen Thon Da Ph.D.


LECTURER’S INFORMATION

Chapter 10
Working with Web-Based APIs,
Beautiful Soup and Selenium
(Part 2)

Web Data Analysis :: Thon-Da Nguyen Ph.D.


MAIN CONTENTS
- Using Selenium for web scraping (cont.)
- Hypertext Markup Language: HTML
- Using Your Browser as a Development Tool
- Cascading Style Sheets: CSS
- The Beautiful Soup Library
- Scraping JavaScript



Python Selenium – Find Elements by XPATH
To find HTML elements by XPath (a query language for selecting nodes in an HTML or XML document) with
Selenium in Python, call the find_elements() method on the WebDriver (or on a WebElement to search inside
it) and pass By.XPATH as the first argument and the XPath expression as the second argument.
Code: find_elements(By.XPATH, "xpath_value")
The find_elements() method returns all HTML elements that match the given XPath expression, as a list.
If no element in the document matches the expression, find_elements() returns an empty list.
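A minimal sketch of this behavior, assuming the third-party lxml package is installed. It evaluates the XPath expression with lxml instead of a live browser, so it runs without a WebDriver; the sample page is hypothetical. With Selenium you would pass the same expression string, e.g. driver.find_elements(By.XPATH, "//ul[@id='menu']/li"). Because browsers also evaluate XPath 1.0, the matches should be the same, but this is not Selenium's own code.

```python
from lxml import html  # assumption: lxml stands in for the browser here

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)

# Same idea as driver.find_elements(By.XPATH, ...): every match, as a list
items = root.xpath("//ul[@id='menu']/li")
print([li.text for li in items])  # ['Apple', 'Banana', 'Cherry']

# No match: an empty list rather than an exception
missing = root.xpath("//table")
print(missing)  # []
```

In Selenium, the singular find_element() behaves differently: it raises NoSuchElementException when nothing matches, so find_elements() plus a length check is a common way to test whether an element exists.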


Python Selenium – Get the parent element
To get the parent element of a given element in Selenium Python, call the find_element() method on that
element, passing By.XPATH as the by parameter and '..' as the value parameter. If myelement is the
WebElement whose parent we want, the call is myelement.find_element(By.XPATH, '..')
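The '..' expression can be checked offline. This sketch again assumes lxml is installed and uses it in place of a browser; the page is hypothetical, and myelement is an lxml element playing the role of the WebElement.

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)
myelement = root.xpath("//li[@id='b']")[0]  # plays the role of a WebElement

# Selenium equivalent: myelement.find_element(By.XPATH, "..")
parent = myelement.xpath("..")[0]
print(parent.tag, parent.get("id"))  # ul menu
```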



Python Selenium - Get the child elements
To get the child elements of a given element in Selenium Python, call the find_elements() method on
that element, passing By.XPATH as the by parameter and '*' as the value parameter. If myelement is the
WebElement whose child elements we want, the call is myelement.find_elements(By.XPATH, '*')
It returns a list of WebElement objects, one for each direct child element (text nodes and deeper
descendants are not included).
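A sketch showing that '*' selects only direct child elements, assuming lxml as an offline stand-in for the browser and a hypothetical page. The nested span is there to contrast children with descendants (".//*").

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page; the <span> is a grandchild of the <ul>
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana <span>ripe</span></li>
  <li id="c">Cherry</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)
myelement = root.xpath("//ul[@id='menu']")[0]

# Selenium equivalent: myelement.find_elements(By.XPATH, "*")
children = myelement.xpath("*")
print([c.tag for c in children])  # ['li', 'li', 'li']

# For comparison, ".//*" returns all descendants, including the nested <span>
descendants = myelement.xpath(".//*")
print([d.tag for d in descendants])  # ['li', 'li', 'span', 'li']
```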



Python Selenium - Get all the sibling elements
To get all the sibling elements of a given element in Selenium Python, call the find_elements() method
on that element, passing By.XPATH as the by parameter and 'following-sibling::* | preceding-sibling::*'
as the value parameter. If myelement is the WebElement whose siblings we want, the call is
myelement.find_elements(By.XPATH, "following-sibling::* | preceding-sibling::*")
It returns a list of WebElement objects containing the sibling elements; myelement itself is not included.
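The union expression can be tried offline with a sketch like the following, which assumes lxml is installed and stands in for the browser; the page is hypothetical.

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
  <li id="d">Date</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)
myelement = root.xpath("//li[@id='b']")[0]

# Selenium equivalent:
# myelement.find_elements(By.XPATH, "following-sibling::* | preceding-sibling::*")
siblings = myelement.xpath("following-sibling::* | preceding-sibling::*")
print([s.get("id") for s in siblings])  # ['a', 'c', 'd']
```

lxml returns the union in document order, so the element before myelement comes first even though following-sibling is written first in the expression.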



Python Selenium - Get the next sibling element
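One way to get the next sibling is the expression 'following-sibling::*[1]', i.e. myelement.find_element(By.XPATH, "following-sibling::*[1]") in Selenium. The sketch below checks it offline, assuming lxml is installed as a stand-in for the browser; the page is hypothetical.

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
  <li id="d">Date</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)
myelement = root.xpath("//li[@id='b']")[0]

# Selenium equivalent: myelement.find_element(By.XPATH, "following-sibling::*[1]")
next_sibling = myelement.xpath("following-sibling::*[1]")[0]
print(next_sibling.get("id"))  # c

# The last item has no next sibling: lxml gives an empty list,
# whereas Selenium's find_element() would raise NoSuchElementException
last = root.xpath("//li[@id='d']")[0]
print(last.xpath("following-sibling::*[1]"))  # []
```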



Python Selenium - Get the previous sibling element
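The previous sibling can be selected with 'preceding-sibling::*[1]'. Because preceding-sibling is a reverse axis in XPath, position 1 is the nearest element before the context node, not the first child of the parent. A sketch, assuming lxml as an offline stand-in for the browser and a hypothetical page:

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
  <li id="d">Date</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)
myelement = root.xpath("//li[@id='c']")[0]

# Selenium equivalent: myelement.find_element(By.XPATH, "preceding-sibling::*[1]")
prev_sibling = myelement.xpath("preceding-sibling::*[1]")[0]
print(prev_sibling.get("id"))  # b (the nearest one, not the first item "a")
```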



Python Selenium - Get all the next sibling elements
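Dropping the position predicate returns every later sibling: 'following-sibling::*', i.e. myelement.find_elements(By.XPATH, "following-sibling::*") in Selenium. A sketch under the same assumptions (lxml installed and used instead of a browser, hypothetical page):

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
  <li id="d">Date</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)
myelement = root.xpath("//li[@id='b']")[0]

# Selenium equivalent: myelement.find_elements(By.XPATH, "following-sibling::*")
following = myelement.xpath("following-sibling::*")
print([f.get("id") for f in following])  # ['c', 'd']
```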



Python Selenium - Get all the previous sibling elements
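Likewise, 'preceding-sibling::*' returns every earlier sibling. In this sketch (lxml assumed installed as an offline stand-in for the browser, page hypothetical) the node-set comes back in document order, so the nearest previous sibling is the last item of the list.

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
  <li id="d">Date</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)
myelement = root.xpath("//li[@id='d']")[0]

# Selenium equivalent: myelement.find_elements(By.XPATH, "preceding-sibling::*")
preceding = myelement.xpath("preceding-sibling::*")
print([p.get("id") for p in preceding])  # ['a', 'b', 'c']

# Reverse the list if you want nearest-first order
nearest_first = preceding[::-1]
print([p.get("id") for p in nearest_first])  # ['c', 'b', 'a']
```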



Python Selenium - XPath for parent element
If you already have a WebElement myelement and want to get its parent element using XPath, use the
following code: myelement.find_element(By.XPATH, "..")
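When myelement has not been fetched yet, the parent can also be reached with one absolute expression, e.g. driver.find_element(By.XPATH, "//li[@id='b']/.."). The sketch below (lxml assumed installed and used instead of a browser, page hypothetical) shows that '..' and the explicit axis form 'parent::*' select the same element.

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)

# ".." is the abbreviation of "parent::node()"; "parent::*" restricts it to elements
via_dots = root.xpath("//li[@id='b']/..")[0]
via_axis = root.xpath("//li[@id='b']/parent::*")[0]
print(via_dots.get("id"), via_axis.get("id"))  # menu menu
```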



Python Selenium - XPath for all child elements
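In XPath, '*' is shorthand for 'child::*' because child is the default axis, and a positional predicate picks a single child (positions start at 1). A sketch using absolute expressions, assuming lxml as an offline stand-in for the browser and a hypothetical page; in Selenium the same strings would go to driver.find_elements(By.XPATH, ...).

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)

all_children = root.xpath("//ul[@id='menu']/*")        # abbreviated form
explicit = root.xpath("//ul[@id='menu']/child::*")     # explicit child axis
second = root.xpath("//ul[@id='menu']/*[2]")           # 1-based position

print([c.get("id") for c in all_children])  # ['a', 'b', 'c']
print([c.get("id") for c in explicit])      # ['a', 'b', 'c']
print(second[0].get("id"))                  # b
```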



Python Selenium - XPath for all sibling elements
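An alternative to the union of the two sibling axes is to go up to the parent and take every child except the element itself. This sketch (lxml assumed installed as an offline stand-in for the browser, page hypothetical) identifies the element by its id, so it only works when that id is unique; the 'following-sibling::* | preceding-sibling::*' form needs no such assumption.

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
  <li id="d">Date</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)

# Parent's children minus the element itself (identified by its id)
others = root.xpath("//li[@id='b']/../*[not(@id='b')]")
print([o.get("id") for o in others])  # ['a', 'c', 'd']
```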



Python Selenium - XPath for the next immediate sibling element
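The immediate next sibling is 'following-sibling::*[1]' whatever its tag; naming a tag, as in 'following-sibling::p[1]', instead returns the first later sibling of that tag and skips anything in between. A sketch assuming lxml as an offline stand-in for the browser and a hypothetical page:

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page with headings and paragraphs as siblings
PAGE = """<html><body>
<div id="doc">
  <h2 id="intro">Intro</h2>
  <p id="p1">First</p>
  <p id="p2">Second</p>
  <h2 id="usage">Usage</h2>
  <ul id="list"><li>Item</li></ul>
  <p id="p3">Third</p>
</div>
</body></html>"""

root = html.fromstring(PAGE)

# Immediate next sibling, whatever its tag
nxt = root.xpath("//h2[@id='usage']/following-sibling::*[1]")[0]
print(nxt.get("id"))  # list

# First following sibling that is a <p>
nxt_p = root.xpath("//h2[@id='usage']/following-sibling::p[1]")[0]
print(nxt_p.get("id"))  # p3
```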



Python Selenium - XPath for all the following sibling elements
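The following-sibling axis does not stop at the next heading: 'following-sibling::p' returns every later paragraph in the parent. To keep only the paragraphs of one section, the sketch below adds a predicate requiring that the nearest preceding h2 be the starting heading. It assumes lxml as an offline stand-in for the browser and a hypothetical page.

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<div id="doc">
  <h2 id="intro">Intro</h2>
  <p id="p1">First</p>
  <p id="p2">Second</p>
  <h2 id="usage">Usage</h2>
  <p id="p3">Third</p>
</div>
</body></html>"""

root = html.fromstring(PAGE)

# Every following <p>, including those after the next heading
all_p = root.xpath("//h2[@id='intro']/following-sibling::p")
print([p.get("id") for p in all_p])  # ['p1', 'p2', 'p3']

# Only the <p> elements whose nearest preceding <h2> is "intro"
SECTION = ("//h2[@id='intro']/following-sibling::p"
           "[preceding-sibling::h2[1][@id='intro']]")
section_p = root.xpath(SECTION)
print([p.get("id") for p in section_p])  # ['p1', 'p2']
```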



Python Selenium - XPath for the previous sibling element
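An absolute expression can combine a positional start point with the previous-sibling step, for example the item just before the last one in a list. A sketch assuming lxml as an offline stand-in for the browser and a hypothetical page; in Selenium the string would go to driver.find_element(By.XPATH, ...).

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
  <li id="d">Date</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)

# last() picks the final <li>; preceding-sibling::li[1] is the nearest <li> before it
prev = root.xpath("//ul[@id='menu']/li[last()]/preceding-sibling::li[1]")[0]
print(prev.get("id"))  # c
```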



Python Selenium - XPath for all the previous sibling elements
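'preceding-sibling::li' returns all earlier li siblings, and the number of them gives the element's 1-based position among its li siblings. A sketch assuming lxml as an offline stand-in for the browser and a hypothetical page; with Selenium you would take len() of the list returned by find_elements().

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a">Apple</li>
  <li id="b">Banana</li>
  <li id="c">Cherry</li>
  <li id="d">Date</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)

before_c = root.xpath("//li[@id='c']/preceding-sibling::li")
print([b.get("id") for b in before_c])  # ['a', 'b']

# Position of item "c" among its <li> siblings (1-based)
position = len(before_c) + 1
print(position)  # 3
```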



Python Selenium - XPath for all the next sibling elements (using class)
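Filtering following siblings by class needs care: '[@class='item']' compares the whole attribute string, so it misses an element with class="item active". The usual XPath 1.0 workaround pads the normalized class list with spaces and tests for ' item '. A sketch assuming lxml as an offline stand-in for the browser and a hypothetical page; in Selenium the same string would be passed as myelement.find_elements(By.XPATH, TOKEN).

```python
from lxml import html  # assumption: offline stand-in for the browser

# Hypothetical sample page
PAGE = """<html><body>
<ul id="menu">
  <li id="a" class="item">Apple</li>
  <li id="b" class="item active">Banana</li>
  <li id="x" class="divider"></li>
  <li id="c" class="item">Cherry</li>
</ul>
</body></html>"""

root = html.fromstring(PAGE)
myelement = root.xpath("//li[@id='a']")[0]

# Exact match on the whole attribute misses "item active"
exact = myelement.xpath("following-sibling::*[@class='item']")
print([e.get("id") for e in exact])  # ['c']

# Token match: pad with spaces so "item" matches as a whole class name
TOKEN = ("following-sibling::*"
         "[contains(concat(' ', normalize-space(@class), ' '), ' item ')]")
by_class = myelement.xpath(TOKEN)
print([e.get("id") for e in by_class])  # ['b', 'c']
```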
