Data Struct 2

Data structure Unit-2

Uploaded by

sakshi.swami25

Unit II:

Linear Data Structures, Searching and Sorting

Dinesh Satre
[email protected]
Unit II: Linear Data Structures, searching and sorting

• Overview of Array, Array as an Abstract Data Type, Operations on Array, Storage Representation, Multidimensional Arrays [2D, nD], Sparse matrix representation using 2D
• Searching: Sequential Search/Linear Search, Binary Search,
Fibonacci Search, and Indexed Sequential Search.
• Sorting: Concepts- Stability, Efficiency, and Number of Passes,
Internal and External Sorting, Bubble sort, Insertion Sort,
Selection Sort, Quick Sort, Merge Sort
• Case Study : Social Network Adjacency Matrix Representing
friendship connections among millions of users.
SEQUENTIAL ORGANIZATION
● Sequential organization stores data elements a fixed distance apart.
● If the ith element is stored at location X, then the next sequential (i + 1)th element is stored at location X + C, where C is a constant.
● Linear arrays, linear stacks, and linear queues are some examples of sequential organization.
SEQUENTIAL ORGANIZATION
● One major advantage of sequential organization is the direct or random access to
any data element of the list in constant time.
● As sequential organization uses continuous memory locations to store data, the data
access time remains constant for accessing any element of the list

● The drawback of sequential organization: when performing in-between insertions or deletions of elements, we have to shift data to keep the organization consistent and intact.
● So in-between insertions and deletions become expensive with respect to time.
LINEAR DATA STRUCTURE USING ARRAYS
● To store a group of data together in a sequential manner in computer’s memory,
arrays can be one of the possible data structures.
● Arrays enable us to organize more than one element in consecutive memory locations; hence, an array is also termed a structured or composite data type.
● The only restriction is that all the elements must be of the same data type.
● It can be thought of as a box with multiple compartments, where each compartment is
capable of holding one data item.
● Arrays support direct access to any of those data items just by specifying the name of
the array and its index as the item’s position
LINEAR DATA STRUCTURE USING ARRAYS
● Arrays are the most general and easy to use of all the data structures.

● An array as a data structure is defined as a set of pairs (index, value) such that with
each index, a value is associated.
○ index—indicates the location of an element in an array
○ value—indicates the actual value of that data element
LINEAR DATA STRUCTURE USING ARRAYS
● The index allows direct addressing (or accessing) of any element of an array.
● Most of the time, an array is implemented using continuous or consecutive memory locations.
LINEAR DATA STRUCTURE USING ARRAYS
● Definition of Array: An array is a finite ordered collection of homogeneous data
elements that provides direct access to any of its elements.
○ Finite: The number of elements in an array is finite or limited.
○ Ordered collection: The arrangement of all the elements in an array is very specific, that is, every
element has a particular ranking in the array.
○ Homogeneous: All the elements of an array should be of the same data type.
LINEAR DATA STRUCTURE USING ARRAYS
● Let us see how to declare an array in C++.
int Array_A[20];

● This statement will allocate a memory space to store 20 integer elements, and the
name assigned to the array is Array_A.
LINEAR DATA STRUCTURE USING ARRAYS
● Size of array: The maximum number of elements that would be stored in an array is
the size of that array. It is also the length of that array. Arrays are static data structures
because once the size of an array is defined, it cannot be changed after compilation.
For the array Array_A, the size is 20.

● Base Address: The base address of an array is the memory location where the first
element of an array is stored. It is decided at the time of execution of a program. The
value of this base address varies at every program execution as it is decided at the
run-time. It cannot be decided or defined even by a programmer.
LINEAR DATA STRUCTURE USING ARRAYS
● Data type of an array: The data type of an array indicates the data type of elements
stored in that array.

● Index: A user or a programmer can access the elements of an array by using subscripts such as Array_A[0], Array_A[1], ..., Array_A[i]. This subscript is called the index of an element. It indicates the relative position of every element in the array with respect to its first element. Often, an array is also referred to as a subscripted variable.

● Range of index: If N is the size of an array, then the range of the index is 0 to (N - 1).
Array as an Abstract Data Type
● For defining an array as an abstract data type (ADT), we have to define the very basic
operations or functions that can be performed on it.

● The basic operations on arrays are:
● Creating an array
● Storing an element
● Accessing an element
● Traversing the array
Array as an Abstract Data Type
● Let us specify an ADT array in which we provide specifications with operations to
be performed.
Array as an Abstract Data Type
● The function Create() produces a new, empty array.
● Access() takes an array and an index as input, and returns either the appropriate value or an error.
● Store() is used to enter new index–value pairs.
● The axioms in the ADT specification relate these operations; for example, accessing an index immediately after storing a value there returns that stored value.
Array as an Abstract Data Type
● ADT is a collection of domains, operations, and axioms (or rules).

● Domain: A domain is the intended set of values that any array may use either as an
index or as a value.
● We can say that a domain of an array is a collection of fixed, homogeneous elements that may be atomic
or structured.
● The restriction is that all the elements should be homogeneous.
● Arrays use a set of indices or subscript values that have one-to-one correspondence with the positive
integer values.
MEMORY REPRESENTATION AND ADDRESS CALCULATION

● During compilation, the appropriate number of locations is allocated for the array.
● The location of the entire block of memory is referenced by the base address.
● The remaining elements are stored sequentially, a fixed distance apart.
● So if the ith element is mapped into a memory location of address x, then the (i + 1)th element is mapped into the memory location with address (x + C).
MEMORY REPRESENTATION AND ADDRESS CALCULATION

● The address of the ith element is calculated by the following formula:
Address of ith element = (Base address) + (Offset of ith element from the base address)
● Base address is the address of the first element, where array storage starts.
● If the base address is x, the offset is computed as:
Offset of ith element = (Number of elements before the ith element) * (Size of each element)
● Address of A[i] = Base + i * Size of element
MEMORY REPRESENTATION AND ADDRESS CALCULATION

● The index, address, and values are shown in Fig. for an array of six real numbers.
Operations on Array
● Inserting an Element into an Array
● Deleting an Element
Inserting an Element into an Array
● To insert an element at the ith position in an array of size N, all the elements originally
at positions i, i + 1, i + 2, ..., N - 1 will be shifted to i + 1, i + 2, i + 3, ..., N,
respectively so that each element gets shifted to the right by one position.
Inserting an Element into an Array
● Consider the following array:

● To insert ‘z’ at index = 2, that is at position 3, create room at 3 by data shifting.

● Then insert ‘z’ at position 3.


Deleting an Element
● After one deletion operation, one location becomes empty, so all the elements should
be shifted by one position after the deleted element to fill in the empty location of the
deleted element.

● Only deletion at the last position can be handled by simply overwriting (ignoring) the specified location; deletion elsewhere requires the shifting described above.


Deleting an Element

● Delete ‘c’ from the 3rd position, that is, index = 2.


MULTIDIMENSIONAL ARRAYS
● Most of the time, data is organized in multiple dimensions.
● When a one-dimensional array proves to be insufficient, we need two-dimensional, three-dimensional, or n-dimensional arrays.
Two-dimensional Arrays
• A two-dimensional array A of dimension m X n is a collection of m X n elements in
which each element is identified by a pair of indices [i, j], where in general, 1 ≤ i ≤ m
and 1 ≤ j ≤ n.

● For the C/C++ languages this range is 0 ≤ i < m and 0 ≤ j < n.

● A two-dimensional array has m rows and n columns.


Two-dimensional Arrays
● The pictorial representation of a two-dimensional array Student of size 100 X 9.
Memory Representation of Two-dimensional Arrays
● The elements of a multidimensional array can be stored in the memory as
● Row-major representation
● Column-major representation
Row-major Representation
● In row-major representation, the elements of matrix M are stored row-wise, that is, elements of the 0th row, then the 1st row, the 2nd row, and so on, up to the (m - 1)th row.
Row-major Representation
● The address of the element in the ith row and jth column of a matrix of size m X n can be calculated as
● Address of (A[i][j]) = Base address + Offset
= Base address + (ith row * Number of Columns * Size of element) + (jth Column * Size of element)
= Base address + ((i * Number of Columns) + j) * Size of element
Row-major Representation
● The base is the address of A[0][0].
● Address of A[i][j] = Base + (i * n * Size of element) + (j * Size of element)

● For Size of element = 1, the address is


● Address of A[i][j] = Base + (i * n) + j
Column-major Representation
● In column-major representation, m X n elements of a two-dimensional array A are
stored as one single row of columns.
● The elements are stored in the memory as a sequence: first the elements of column 0,
then the elements of column 1, and so on, till the elements of column n - 1.
Column-major Representation
● The address of A[i][j] is computed as

● Address of (A[i][j]) = Base address + Offset

= Base address + (jth column * Number of rows * Size of element) + (ith row * Size of
element)

Address of (A[i][j]) = Base address + ((j * Number of rows ) + i )* Size of element


Column-major Representation
● If the base is the address of A[0][0], then
● Address of A[i][j] = Base + (j * m * Size of element) + (i * Size of element)

● For Size of element = 1, the address is


● Address of A[i][j] for column-major arrangement = Base + (j * m) + i
Common Compilers and Their Default Matrix Storage Order
• C / C++: Row-major
• Fortran: Column-major
• MATLAB: Column-major
• Python (NumPy): Row-major by default, but supports both
• Java: Row-major
EXAMPLE
● Consider an integer array, int A[3][4] in C++. If the base address is 1050, find the
address of the element A[2][3] with row-major and column-major representation of
the array.
● Solution: For C++, the LB of index is 0, and we have m = 3, n = 4, and Base = 1050.
Let us compute the address of the element A[2][3] using the address computation
formula
EXAMPLE
● Row-major: Address of A[2][3] = 1050 + ((2 * 4) + 3) * Size of element = 1050 + 11 * Size of element
● Column-major: Address of A[2][3] = 1050 + ((3 * 3) + 2) * Size of element = 1050 + 11 * Size of element
● Assuming 4-byte integers, both give 1050 + 44 = 1094. A[2][3] is the last element of the 3 X 4 array, so both layouts place it at the same address.
Example-2
• Find the address offset of A[2][1] in both layouts (using 0-based index)
○ Matrix: 3 rows × 3 columns
○ Element size = 4 bytes
○ Base address = 1000
• Row Major:
○ Address of (A[i][j]) = Base address+((i * Number of Columns) + j) * Size of element
○ = 1000 + ((2 * 3) + 1) * 4 = 1028
• Column Major:
○ Address of (A[i][j]) = Base address+((j * Number of Rows) + i) * Size of element
○ = 1000 + ((1 * 3) + 2) * 4 = 1020
n-Dimensional Arrays
● An n-dimensional m1 X m2 X m3 X ... X mn array A is a collection of m1 X m2 X
m3 X ... X mn elements in which each element is specified by a list of n integers such
as k1, k2, ... kn called subscripts where 0 ≤ k1 ≤ m1 - 1, 0 ≤ k2 ≤ m2 - 1, ..., 0 ≤ kn ≤
mn - 1.

● The element of array A with subscripts k1, k2, ..., kn is denoted by A[k1][k2]...[kn].
n-Dimensional Arrays
● Consider the three-dimensional array A[2][3][4].
● There are 2 X 3 X 4 = 24 elements in array A.
Q&A
• 1. Which of the following best describes an array?

A. A dynamic collection of elements with different data types


B. A fixed-size sequential collection of elements of the same data type
C. A linked list of heterogeneous elements
D. A structure storing key-value pairs
Q&A
• 2. What is the primary advantage of using an array?

A. Fast insertion at the beginning


B. Random access to elements using index
C. Dynamic resizing
D. Ability to store multiple data types
Q&A
• 3. Which of the following is not a valid operation on an array?

A. Traversal
B. Insertion
C. Deletion
D. Hashing
Q&A
• 4. Arrays are considered as which type of Abstract Data Type (ADT)?

A. Linear ADT
B. Non-linear ADT
C. Hierarchical ADT
D. Graph-based ADT
Q&A
• 5. What happens when an element is inserted into the middle of an array?

A. Only that element is inserted without affecting others


B. All elements are shifted one position right
C. All elements are shifted left
D. Array becomes non-contiguous
Q&A
• 6. What is the worst-case time complexity of searching an element in an unsorted
array?

A. O(1)
B. O(log n)
C. O(n)
D. O(n log n)
Q&A
• 7. In memory, array elements are stored:

A. In randomly distributed memory locations


B. In non-contiguous memory blocks
C. In contiguous memory locations
D. Using pointer-based links
Q&A
• 8. In C/C++, how is the address of the ith element of a 1D array A computed?

A. Base_Address + i * sizeof(datatype)
B. Base_Address + i
C. i + sizeof(datatype)
D. Base_Address * i
Q&A
• 9. Which of the following best describes a two-dimensional array?

A. A list of arrays
B. A linear sequence
C. A tree structure
D. A hash table
Q&A

• 10. If int A[3][4]; is declared in C, how many elements does the array have?

A. 3
B. 4
C. 7
D. 12
Q&A
• 11. In a row-major storage of a 2D array, which element comes first in memory?

A. The last element of the first column


B. The first element of the first row
C. The last element of the last row
D. The first element of the last column
Q&A
• 12. Which of the following is true about multidimensional arrays in most
programming languages?

A. They are implemented as nested linked lists


B. They are stored in row-major or column-major order
C. They allow dynamic resizing at runtime
D. Their size must be defined at runtime
SPARSE MATRIX
● To represent a matrix, we need a two-dimensional array with two different indices for
row and column references.

● The representation of a matrix for operations on it should be efficient so that the space
and time requirement is less.

• In many situations, the matrix size is very large but most of the elements in it are 0s
(less important or irrelevant data).

• A matrix of such type is called a sparse matrix


SPARSE MATRIX
● A sparse matrix must be represented and stored in an alternate way to achieve good space utilization.
● Such a representation avoids operations with 0s (addition or multiplication of 0s).
● Consequently, good time complexity along with efficient storage is achieved if a sparse matrix is stored in an alternate way.
SPARSE MATRIX

(Examples: sparse square matrix, sparse triangular matrix, sparse tridiagonal matrix)
Sparse Matrix Representation
● In the sparse representation of a matrix, there are three columns.

● A triple (i, j, value) can easily represent the non-zero elements of the matrix.

● In general, for the representation to actually save space, 3 X (No_Of_NonZeroValues + 1) should always be less than or equal to m X n, where m = number of rows and n = number of columns.
Sparse Matrix Representation
• Let’s take this 5×4 matrix A

• Sparse Matrix Representation (3-tuple)


Sparse Matrix Representation
• Represent sparse matrix in normal matrix form
Sparse Matrix Addition
Let A and B be two sparse matrices to be added

● Only if the size of both the matrices is the same can they be added.
● M and N are the number of non-zero elements in A and B, respectively.
● C is the resultant sparse matrix.
Sparse Matrix Addition
Sparse Matrix Addition
• Do the addition of sparse matrices
Transpose of Sparse Matrix
• In the conventional approach, by interchanging rows and columns, we get the transpose of the matrix: the elements at positions [i][j] and [j][i] are swapped.
• Let m and n be the number of rows and columns of matrix A.
• The transpose of A can be obtained with a simple double loop over all elements.
• The time complexity of this technique is O(mn).


Transpose of Sparse Matrix
• The conventional transpose procedure is not suitable for the alternate (triplet) representation of a sparse matrix.
• If we simply interchange the row and column entries of each triple, the resulting entries are not sorted row- and column-wise; we would need to sort them further.
Simple Transpose
• Let A be a matrix of size m X n with T non-zero elements and let B be its transpose.

• One of the easiest ways is to search for each column (column = 0 to n - 1) and
sequentially place each column as a row in the transposed matrix B by placing the
interchanged entries as row, column, and value
Simple Transpose
• In the algorithm, we first take the header row of matrix A, (m, n, t), and store it as (n, m, t) in matrix B.
• The entry (2, 1, 21) in A is stored as (1, 2, 21) in matrix B.
• The next entry (3, 1, 31) is stored as (1, 3, 31) in matrix B.
• Further, the entry (1, 2, 12) is stored as (2, 1, 12) in matrix B.
• Similarly, the algorithm goes on searching for each column value.
Simple Transpose
• Find the transpose of sparse matrix using simple transpose method
Fast Transpose
• Let A be a sparse matrix of size m X n with T non-zero elements. Its transpose will be
stored in matrix B.

• Let Freq and RowStartPos be two one-dimensional arrays of size n.



• In Freq array, the frequency count of each column in matrix A is stored, and
RowStartPos will be computed and stored at the position where each row entry of
matrix A is to be inserted in matrix B.

• Then, the RowStartPos is computed using Freq.


Fast Transpose
Fast Transpose
• This algorithm will first find the number of non-zero elements in each column and
store it in an array Freq.

• The second array RowStartPos is used to store the starting address of each column,
which will be a row in the corresponding transposed matrix.

• The starting address of each row in the transposed matrix is given by


RowStartPos[i] = RowStartPos[i - 1] + Freq[i - 1]
Fast Transpose
Fast Transpose

Column i   Freq[i]   RowStartPos[i]
   0          0            1
   1          1            1
   2          3            2
   3          1            5
   4          1            6
Q&A
• What defines a sparse matrix?

• a) A matrix with all elements equal to zero


• b) A matrix with most elements equal to zero
• c) A matrix with all diagonal elements equal to one
• d) A matrix with an equal number of non-zero and zero elements
Q&A
• What is the primary advantage of using sparse matrix representations?

• a) Reduced memory usage


• b) Faster computation
• c) Simplified matrix operations
• d) All of the above
Q&A
• When adding two sparse matrices, what is the time complexity?

• a) O(n)
• b) O(m + n)
• c) O(n²)
• d) O(1)
Q&A
• What is the main difference between simple transpose and fast transpose of a
sparse matrix?

• a) Simple transpose uses extra space, while fast transpose does not
• b) Fast transpose is more efficient in terms of time complexity
• c) Simple transpose is used for dense matrices only
• d) There is no difference; they are the same
Q&A
• In the fast transpose algorithm, what is the purpose of the 'index' array?

• a) To store the row indices of non-zero elements


• b) To store the column indices of non-zero elements
• c) To keep track of the position of elements in the transposed matrix
• d) To store the values of non-zero elements
Q&A
• In the context of sparse matrices, what does the term 'sparsity' refer to?

• a) The ratio of non-zero elements to total elements


• b) The ratio of zero elements to total elements
• c) The number of rows in the matrix
• d) The number of columns in the matrix
Searching
• The process of locating target data is known as searching.
○ Consider a situation where you are trying to get the phone number of your friend from a telephone
directory. The telephone directory can be thought of as a table or a file, which is a collection of records.
Each record has one or more fields such as name, address, and telephone number.

• The fields, which are used to distinguish records, are known as keys.

• While searching, we are asked to find the record which contains information along with
the target key. When we think of a telephone directory, the search is usually by name.
Searching
• If the key is unique and if it determines a record uniquely, it is called a primary key.
For example, telephone number is a primary key

• As any field of a record may serve as the key for a particular application, keys may not
always be unique. For example, if we use ‘name’ as the key for a telephone directory,
there may be one or more persons with the same name.

• In addition, sorted organization of a directory makes searching easier and faster.


Searching
• We may use one of the two linear data structures, arrays and linked lists, for storing the
data.
• Search techniques may vary according to data organization.
• The data may be stored on a secondary storage or permanent storage area.
• If the search is applied on the table that resides at the secondary storage (hard disk), it is
called as external searching, whereas searching of a table that is in primary storage
(main memory) is called as internal searching which is faster than external searching.
Searching
• A searching algorithm accepts two arguments as parameters—a target value to be
searched and the list to be searched.

• The search algorithm searches a target value in the list until the target key is found or
can conclude that it is not found
Search techniques
• Depending on the way data is scanned for searching a particular record, the search
techniques are categorized as follows:
○ 1. Sequential search
○ 2. Binary search
○ 3. Fibonacci search
○ 4. Index sequential search
• The performance of a searching algorithm can be computed by counting the number of
comparisons to find a given value.
Sequential Search/Linear Search
• The easiest search technique is a sequential search.

• This is a technique that must be used when records are stored without any consideration
given to order, or when the storage medium lacks any type of direct access facility.

○ For example, magnetic tape and linked list are sequential storage media where the
data may or may not be ordered.
Sequential Search/Linear Search
• Let us assume that we have a sequential file F, and we wish to retrieve a record with a
certain key value k.
○ If F has n records, then key retrieval is by examining the key values in the order
until the correct record is located.
• Such a search is known as sequential search

• A sequential search begins with the first available record and proceeds to the next
available record repeatedly until we find the target key or conclude that it is not found.

• Sequential search is also called as linear search.


Sequential Search/Linear Search
Sequential Search/Linear Search
• Sequential search for target data of 89
Sequential Search/Linear Search
Sequential Search/Linear Search
• The function SeqSearch() is defined with three parameters—the element to be searched,
the array A where the element is to be searched, and the total number of elements in the
array.

• The function SeqSearch() returns the location of the element if found or returns -1 if the
element is not found.
Sequential Search/Linear Search
• Let us compute the amount of time the sequential search needs to search for a target
data.
• We must compute the number of times the comparisons of keys is done.
• The number of comparisons depends on where the target data is stored in the search
list.
• If the target data is placed at the first location, we get it in just one comparison.
• Similarly, i comparisons are required if the target data is at the ith location and n
comparisons, if it is at the nth location.
Sequential Search/Linear Search

• In the worst case, the number of comparisons is n, and the complexity is denoted as O(n).


Pros and Cons of Sequential Search
• Pros
• 1. A simple and easy method
• 2. Efficient for small lists
• 3. Suitable for unsorted data
• 4. Suitable for storage structures which do not support direct access to data, for
example, magnetic tape, linked list, etc.
• 5. Best case is one comparison, worst case is n comparisons, and average case is (n +
1)/2 comparisons
• 6. Time complexity is in the order of n denoted as O(n).
Pros and Cons of Sequential Search
• Cons
• 1. Highly inefficient for large data
• 2. In the case of ordered data other search techniques such as binary search are found
more suitable.
Variations of Sequential Search
• The time complexity of sequential search is O(n); this amounts to one comparison in
the best case, n comparisons in the worst case, and (n + 1)/2 comparisons in the average
case. The algorithm starts at the first location and the search continues till the last
element. We can make a few changes leading to a few variations in the sequential
search algorithm.
• There are three such variations:
• 1. Sentinel search
• 2. Probability search
• 3. Ordered list search
Sentinel search
• We note that in Sequential Algorithm, there are two comparisons one for the element
(key) to be searched and the other for the end of the array. The algorithm ends either
when the target is found or when the last element is compared.
• The algorithm can be modified to eliminate the end of list test by placing the target at
the end of list as just one additional entry. This additional entry at the end of the list is
called as a sentinel.
• Now, we need not test for the end-of-list condition within the loop; we merely check after the loop completes whether we found the actual target or the sentinel. This modification avoids one comparison within a loop that repeats up to n times. The only care to be taken is not to count the sentinel entry as a data member.
Probability search
• In probability search, the elements that are more probable are placed at the beginning
of the array and those that are less probable are placed at the end of the array.
Ordered list Search
• When elements are ordered, binary search is preferred.
• However, when data is ordered and is of smaller size, sequential search with a small
change is preferred to binary search.
• In addition, when the data is ordered but stored in a data structure such as a linked list,
modified sequential search is preferred.
• While searching an ordered list, we need not continue the search till the end of list to
know that the target element is not in the list. While searching in an ascending ordered
list, whenever an element that is greater than or equal to the target is encountered, the
search stops. We can also add a sentinel to avoid the end of list test.
Q&A
• Which of the following statements about sequential search is TRUE?
A. It only works on sorted lists
B. It is also known as binary search
C. It searches each element one by one
D. It uses a tree-like structure
Q&A
• In sentinel search, what is placed at the end of the array?
A. The maximum element
B. The minimum element
C. A null value
D. The target value (sentinel)
Q&A
• Probability search improves performance by:
A. Sorting the list before searching
B. Removing duplicates
C. Moving frequently accessed elements toward the front
D. Using binary trees
Q&A
• In probability search, which of the following operations is typically performed after a
successful search?
A. Move the found item to the end
B. Delete the found item
C. Swap the found item with the first item
D. Reverse the list
Q&A
• Ordered list search can stop early if:
A. The current element is greater than the target
B. The list is reversed
C. All elements are the same
D. The search starts from the end
Binary Search
• Sequential search is not suitable for larger lists; it requires n comparisons in the worst case.
• In binary search, the list to be searched is divided into two halves at every step, and the search continues in only one of the halves.
• To search for a particular element, it is first compared with the element at the middle position; if it matches, the search is successful. Otherwise, if the middle value is greater than the target, the search continues in the first half of the list; else, the target is searched in the second half.
Binary Search

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14]

x = 10
Low = 1
High = 14
Mid = (1+14)/2 = 7
Binary Search

• Let us consider the list


Time Complexity Analysis
• Time complexity of binary search is O(log(n)) as it halves the list size in each step.

• The time complexity can be written as a recurrence relation: T(n) = T(n/2) + c, with T(1) = 1.
Time Complexity Analysis
• The most popular and easiest
way to solve a recurrence
relation is to repeatedly make
substitutions for each occurrence
of the function T on the
right-hand side until all such
occurrences disappear.
Binary Search
• The recurrence relation for binary search is T(n) = T(n/2) + Θ(1).
• The general divide-and-conquer recurrence is T(n) = a T(n/b) + f(n).
• Therefore,
• a = 1
• b = 2
Binary Search
• Here f(n) ∈ Θ(n^d) where d ≥ 0; d = 0, since f(n) = 1.
• According to the master theorem, since a = b^d (1 = 2^0), T(n) ∈ Θ(n^d log n).
• Therefore, T(n) ∈ Θ(log n).
Pros and Cons of Binary Search
Pros

1. Suitable for sorted data

2. Efficient for large lists

3. Suitable for storage structures that support direct access to data

4. Time complexity is O(log2(n))


Pros and Cons of Binary Search
Cons

1. Not applicable for unsorted data

2. Not suitable for storage structures that do not support direct access to data, for example, magnetic tape and linked list

3. Inefficient for small lists


Fibonacci search

• We all know about Fibonacci numbers.

• The Fibonacci series has 0 and 1 as the first two terms, and each successive term is the
sum of the previous two terms.

• Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ... with

• Fm = Fm−1+ Fm−2 for m ≥ 2 where, F0 = 0 and F1 = 1.


Fibonacci search
• The detailed procedure of the searching is seen below

• Step 1 − First, find the smallest Fibonacci number that is greater than or equal to the size of the input array. Also hold the two preceding numbers of the selected Fibonacci number; that is, we hold Fm, Fm-1, Fm-2 from the Fibonacci series.

• Step 2 − Initialize the offset value as -1, as we are considering the entire array as the
searching range in the beginning.
Fibonacci search
• The detailed procedure of the searching is seen below

• Step 3 − Until Fm-2 is greater than 0, we perform the following steps

○ Compare the key element to be found with the element at index i = [min(offset+Fm-2,n-1)]. If a
match is found, return the index.

○ If the key element is smaller than this element, we reduce the search range to the elements before this index. The Fibonacci numbers are updated with Fm = Fm-2.

○ But if the key element is greater than the element at this index, we remove the elements before
this element from the search range. The Fibonacci numbers are updated as Fm = Fm-1.
The offset value is set to the index of this element.
Fibonacci search
• The detailed procedure of the searching is seen below

• Step 4 − As there are two 1s in the Fibonacci series, there arises a case where the two
preceding numbers become 1. So if Fm-1 becomes 1, there is only one element left in the
search range. We compare the key element with that element and, if it matches, return its
index. Otherwise, the algorithm reports an unsuccessful search.
Example
• Suppose we have a sorted array of elements {12, 14, 16, 17, 20, 24, 31, 43, 50, 62} and
need to identify the location of element 24 in it using Fibonacci Search.
Example
• Suppose we have a sorted array of elements {12, 14, 16, 17, 20, 24, 31, 43, 50, 62} and
need to identify the location of element 24 in it using Fibonacci Search.

• Step 1
• The size of the input array is 10. The smallest Fibonacci number greater than 10 is 13.
• Therefore, Fm = 13, Fm-1 = 8, Fm-2 = 5.
• We initialize offset = -1
Example
• Suppose we have a sorted array of elements {12, 14, 16, 17, 20, 24, 31, 43, 50, 62} and
need to identify the location of element 24 in it using Fibonacci Search.

• Step 2
• In the first iteration, compare the key with the element at index = minimum (offset + Fm-2,
n-1) = minimum (-1 + 5, 9) = minimum (4, 9) = 4.
Example
• Suppose we have a sorted array of elements {12, 14, 16, 17, 20, 24, 31, 43, 50, 62} and
need to identify the location of element 24 in it using Fibonacci Search.

• Step 3
• In the second iteration, update the offset value and the Fibonacci numbers.
• Since the key is greater, the offset value will become the index of the element, i.e. 4.
Fibonacci numbers are updated as Fm = Fm-1 = 8.
• Fm-1 = 5, Fm-2 = 3.
Example
• Suppose we have a sorted array of elements {12, 14, 16, 17, 20, 24, 31, 43, 50, 62} and
need to identify the location of element 24 in it using Fibonacci Search.

• Now, compare it with the element at index = minimum (offset + Fm-2, n-1) = minimum
(4 + 3, 9) = minimum (7, 9) = 7.
Example
• Suppose we have a sorted array of elements {12, 14, 16, 17, 20, 24, 31, 43, 50, 62} and
need to identify the location of element 24 in it using Fibonacci Search.

• Step 4
• We discard the element at the 7th index and everything after it, so n = 7 and the offset value remains 4.
• Fibonacci numbers are pushed two steps backward, i.e. Fm = Fm-2 = 3.
• Fm-1 = 2, Fm-2 = 1.
Example
• Suppose we have a sorted array of elements {12, 14, 16, 17, 20, 24, 31, 43, 50, 62} and
need to identify the location of element 24 in it using Fibonacci Search.

• Now, compare it with the element at index = minimum (offset + Fm-2, n-1) = minimum
(4 + 1, 6) = 5. The element at index 5 is 24, so the search is successful.
Example-2
• Search for 81 using Fibonacci search in the list {6, 14, 23, 36, 55, 67, 76, 78, 81, 89},
where n = 10.

• Step 1
• The size of the input array is 10. The smallest Fibonacci number greater than 10 is 13.
• Therefore, Fm = 13, Fm-1 = 8, Fm-2 = 5.
• We initialize offset = -1
Example-2
• Search for 81 using Fibonacci search in the list {6, 14, 23, 36, 55, 67, 76, 78, 81, 89},
where n = 10.

• Step 2
• In the first iteration, compare the key with the element at index = minimum (offset + Fm-2,
n-1) = minimum (-1 + 5, 9) = minimum (4, 9) = 4.

0 1 2 3 4 5 6 7 8 9

6 14 23 36 55 67 76 78 81 89
Example-2
• Search for 81 using Fibonacci search in the list {6, 14, 23, 36, 55, 67, 76, 78, 81, 89},
where n = 10.

• Step 3
• In the second iteration, update the offset value and the Fibonacci numbers.
• Since the key is greater, the offset value will become the index of the element, i.e. 4.
Fibonacci numbers are updated as Fm = Fm-1 = 8.
• Fm-1 = 5, Fm-2 = 3.
Example-2
• Search for 81 using Fibonacci search in the list {6, 14, 23, 36, 55, 67, 76, 78, 81, 89},
where n = 10.

• Now, compare it with the element at index = minimum (offset + Fm-2, n-1) = minimum
(4 + 3, 9) = minimum (7, 9) = 7.

0 1 2 3 4 5 6 7 8 9

6 14 23 36 55 67 76 78 81 89
Example-2
• Search for 81 using Fibonacci search in the list {6, 14, 23, 36, 55, 67, 76, 78, 81, 89},
where n = 10.

• Step 4
• In the third iteration, update the offset value and the Fibonacci numbers.
• Since the key is greater, the offset value will become the index of the element, i.e. 7.
Fibonacci numbers are updated as Fm = Fm-1 = 5.
• Fm-1 = 3, Fm-2 = 2.
Example-2
• Search for 81 using Fibonacci search in the list {6, 14, 23, 36, 55, 67, 76, 78, 81, 89},
where n = 10.

• Now, compare it with the element at index = minimum (offset + Fm-2, n-1) = minimum
(7 + 2, 9) = minimum (9, 9) = 9.

0 1 2 3 4 5 6 7 8 9

6 14 23 36 55 67 76 78 81 89
Example-2
• Search for 81 using Fibonacci search in the list {6, 14, 23, 36, 55, 67, 76, 78, 81, 89},
where n = 10.

• Step 5
• We discard the element at the 9th index and everything after it, so n = 9 and the offset value remains 7.
• Fibonacci numbers are pushed two steps backward, i.e. Fm = Fm-2 = 2.
• Fm-1 = 1, Fm-2 = 1.
Example-2
• Search for 81 using Fibonacci search in the list {6, 14, 23, 36, 55, 67, 76, 78, 81, 89},
where n = 10.

• Now, compare it with the element at index = minimum (offset + Fm-2, n-1) = minimum
(7 + 1, 8) = 8. The element at index 8 is 81, so the search is successful.

0 1 2 3 4 5 6 7 8 9

6 14 23 36 55 67 76 78 81 89
Algorithm
Time Complexity of Fibonacci Search
• When we solve the recurrence relation Fn = Fn−1 + Fn−2 for Fibonacci numbers, we get
the closed-form solution Fn = (1/sqrt(5)) * [((1 + sqrt(5))/2)^n − ((1 − sqrt(5))/2)^n].

• For large n, the term ((1 − sqrt(5))/2)^n tends to zero.

• Hence, Fn is approximately (1/sqrt(5)) * ((1 + sqrt(5))/2)^n; that is, the Fibonacci
numbers grow exponentially with base (1 + sqrt(5))/2.

• Hence, the number of Fibonacci numbers not exceeding n, and therefore the number of
probes made by the search, is at most about log(n) / log[(1 + sqrt(5))/2].

• Hence, the algorithm for Fibonacci search is an O(log(n)) algorithm.


Time Complexity of Fibonacci Search
• Consider an example where for a list of 10 numbers, each element of the 10 numbers is
to be searched once.
• For an unsuccessful search, the algorithm needs a total of 13 searches.
• In case of binary search, the number of comparisons would be 40, and for Fibonacci
search, it will be 41.
• Since this is a small-scale example, binary search scores better, but in larger instances, it
may be the other way around.
• Fibonacci search is more efficient than binary search for large lists.
• However, it is inefficient in case of small lists.
Pros and Con
Pros

1. Faster than binary search for larger lists

2. Suitable for sorted lists

Con

1. Inefficient for smaller lists


Indexed Sequential Search
• Indexed sequential search is suitable for sequential files.

• File index is a data structure similar to a list of keys and their location or reference to
the location of the record associated with the key.

• An index file can be used to effectively overcome the problem associated with
sequential files and to speed up the key search.

• Only a subset of data records,evenly spaced along the data file, is indexed to mark the
intervals of data records
Indexed Sequential Search
• A key search then proceeds as follows:
○ the search key is compared with the index to find the highest index key
preceding the search, and a linear search is performed from the current record
until the search key is matched or until the record pointed by the next index entry
is reached.
○ In spite of the double file access (index + data) needed by this kind of search, the
decrease in access time with respect to a sequential file is significant.
Indexed Sequential Search
• Consider the data file as in Table
Indexed Sequential Search
Indexed Sequential Search
• Searching a record from this index file involves the following issues:

1. The index file is ordered, so the searching can be done using the binary search method.

2. The search is successful if we find the target element in the index.

3. The record position is used to access the details of that record from the data file.
Example
• Consider a sorted array of integers: arr = {6, 7, 8, 9, 10, 11, 12, 13, 14, 15}. We want
to search for the element 8.
• Create an Index: Divide the array into blocks and create an index that stores the
starting element and its corresponding index for each block. For this example, let's
create blocks of 3 elements.

0 1 2 3 4 5 6 7 8 9

6 7 8 9 10 11 12 13 14 15
Example
• Consider a sorted array of integers: arr = {6, 7, 8, 9, 10, 11, 12, 13, 14, 15}. We want
to search for the element 8.
Index Key

0 6

1 9

2 12

3 15

0 1 2 3 4 5 6 7 8 9

6 7 8 9 10 11 12 13 14 15
Example
• Consider a sorted array of integers: arr = {6, 7, 8, 9, 10, 11, 12, 13, 14, 15}. We want
to search for the element 8.
• Compare the target element (8) with the keys in the index:

○ 8 is not less than index[0] (which is 6).

○ 8 is less than index[1] (which is 9).

○ Hence, the target lies in the block beginning at key 6, i.e., array indices 0 to 2.

Index Key

0 6

1 9

2 12

3 15

0 1 2 3 4 5 6 7 8 9

6 7 8 9 10 11 12 13 14 15
Example
• Consider a sorted array of integers: arr = {6, 7, 8, 9, 10, 11, 12, 13, 14, 15}. We want
to search for the element 8.

• Perform a sequential search within the identified block (from index 0 to 2) in the
original array. The element 8 is found at index 2.

Index Key

0 6

1 9

2 12

3 15

0 1 2 3 4 5 6 7 8 9

6 7 8 9 10 11 12 13 14 15
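The whole example can be sketched in Python; the block size of 3 matches the example, and all names are our own:

```python
def build_index(arr, block_size):
    """One (position, key) entry per block of the sorted array."""
    return [(i, arr[i]) for i in range(0, len(arr), block_size)]

def indexed_sequential_search(arr, index, key):
    """Return the position of key in arr, or -1, using the block index."""
    start = -1
    for pos, k in index:         # find the last index key not exceeding the search key
        if k <= key:
            start = pos
        else:
            break
    if start == -1:              # key is smaller than every indexed key
        return -1
    i = start                    # sequential search within the selected block
    while i < len(arr) and arr[i] <= key:
        if arr[i] == key:
            return i
        i += 1
    return -1

arr = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
index = build_index(arr, 3)      # [(0, 6), (3, 9), (6, 12), (9, 15)]
print(indexed_sequential_search(arr, index, 8))  # 2
```

In a file-based setting, the index would hold keys and record addresses, and the linear scan would read records from the data file rather than a list.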
Q&A
• What is the time complexity of Binary Search in the worst case?
A) O(1)
B) O(n)
C) O(log n)
D) O(n log n)
Q&A
• Binary Search can only be applied to:
A) Unsorted arrays
B) Linked lists
C) Sorted arrays
D) Hash tables
Q&A
• Which of the following is not a requirement for Binary Search to work correctly?
A) The list must be sorted
B) Random access to elements
C) Array must contain unique elements
D) Array indexing support
Q&A
• Fibonacci Search divides the array using:
A) Middle element
B) Golden ratio
C) Fibonacci numbers
D) Prime numbers
Q&A
• What is the worst-case time complexity of Fibonacci Search?
A) O(1)
B) O(n)
C) O(log n)
D) O(n log n)
Q&A
• Indexed Sequential Search is a combination of:
A) Binary Search and Linear Search
B) Hashing and Linear Search
C) Indexing and Sequential Search
D) Fibonacci and Interpolation Search
Q&A
• The index in Indexed Sequential Search typically stores:
A) Every element of the main list
B) Only the first and last elements
C) Keys and their addresses
D) All records
Sorting
• Sorting is the operation of arranging the records of a table according to the key value of
each record, or it can be defined as the process of converting an unordered set of
elements to an ordered set.

• A table or a file is an ordered sequence of records r[1], r[2], …, r[n], each containing a
key k[1], k[2], … , k[n]. This key is usually one of the fields of the entire record. The
table is said to be sorted on the key if i < j implies that k[i] precedes k[j] in some
ordering on the keys.
General Sort Concepts
• Stability :-A sorting algorithm is said to be stable if it preserves the order for all records
with duplicate keys; that means, if for all records i and j is such that k[i] is equal to k[j]
and if r[i] precedes to r[j] in the unsorted table, then r[i] precedes to r[j] in the sorted
table too.

• Bubble sort, insertion sort, and merge sort are stable sort methods; selection sort and quick sort, in their usual implementations, are not.
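A quick way to see stability in practice is Python's built-in `sorted`, which is stable; the records below are invented for illustration:

```python
# records of (name, marks); Anil and Chetan have the same key (marks = 70)
records = [("Anil", 70), ("Bina", 85), ("Chetan", 70), ("Dipa", 60)]

# sort on the marks key only
by_marks = sorted(records, key=lambda r: r[1])

print(by_marks)
# [('Dipa', 60), ('Anil', 70), ('Chetan', 70), ('Bina', 85)]
# Anil still precedes Chetan, as in the unsorted list: the sort is stable
```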
General Sort Concepts
• Consider the following unsorted sequence of marks to be sorted in descending order.
Efficiency
• Each sorting method may be analysed depending on the amount of time necessary for
running the program and the amount of space required for the program.
• The amount of time for running a program is proportional to the number of key
comparisons and the movement of records or the movement of pointers to records.
• Sort efficiency is a measure of the relative efficiency of a sort.
• It is usually an estimate of the number of comparisons and data movement required to
sort the data.
Passes

• During the sorting process, the data is traversed many times.


• Each traversal of the data is referred to as a sort pass.
• Depending on the algorithm, the sort pass may traverse the whole list or just a section
of the list.
• In addition, the characteristic of a sort pass is the placement of one or more elements in
a sorted list
Types of Sorting

• Sorting algorithms are divided into two categories: internal and external
sorts.
• Internal Sorting:- Any sort algorithm that uses main memory exclusively
during the sorting is called an internal sort algorithm. This assumes
high-speed, random access to all data members. Internal sorting is
faster than external sorting.
• External Sorting :- Any sort algorithm that uses external memory, such
as tape or disk, during the sorting is called an external sort algorithm.
Merge sort is commonly used for external sorting.
Types of Sorting

• The various internal sorting techniques are the following:


• 1. Bubble sort 2. Insertion sort
• 3. Selection sort 4. Quick sort
• 5. Heap sort 6. Shell sort
• 7. Bucket sort 8. Radix sort
• 9. File sort 10. Merge sort
Bubble Sort
• The bubble sort is the oldest and the simplest sort in use.
• Unfortunately, it is also the slowest.
• The bubble sort works by comparing each item in the list with the item next to it and
swapping them if required.
• The algorithm repeats this process until it makes a pass all the way through the list
without swapping any items.
• This causes larger values to ‘bubble’ to the end of the list while smaller values ‘sink’
towards the beginning of the list.
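The compare-and-swap passes described above can be sketched in Python; the early-exit flag implements "a pass all the way through the list without swapping any items" (names are our own):

```python
def bubble_sort(a):
    """Sort list a in place by repeatedly swapping adjacent out-of-order items."""
    n = len(a)
    for i in range(n - 1):            # at most n-1 passes
        swapped = False
        for j in range(n - 1 - i):    # the last i items have already bubbled up
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:               # a full pass without swaps: list is sorted
            break
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```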
Algorithm
Example
Example
Example-2
Example-2
Analysis of Bubble Sort
• For n data items, the method requires n(n - 1)/2 comparisons and on an average,
almost one-half as many swaps.

• The bubble sort, therefore, is very inefficient in large sorting jobs.

• In bubble sort, there are (n - 1) comparisons in the first iteration, (n - 2) comparisons
in the second iteration, and so on:

(n - 1) + (n - 2) + (n - 3) + … + 1 = n(n - 1)/2

• Thus, the total number of comparisons is n(n - 1)/2, which is O(n²).


Insertion Sort
• The insertion sort works just like its name suggests—it inserts each item into its proper
place in the final list.

• The simplest implementation of this requires two list structures: the source list and the
list into which the sorted items are inserted.
Insertion Sort
• Let us consider a list L = {3, 6, 9, 14}. Given this sorted list, we need to insert a new
element 5 in it.

• The commonly used process would involve the following steps:

○ 1. Compare the new element 5 and the last element 14

○ 2. Shift 14 right to get 3, 6, 9, _, 14

○ 3. Shift 9 right to get 3, 6, _, 9, 14

○ 4. Shift 6 right to get 3, _, 6, 9, 14

○ 5. Insert 5 to get 3, 5, 6, 9, 14
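The shift-and-insert process above generalizes to the following Python sketch (names are our own):

```python
def insertion_sort(a):
    """Sort list a in place by inserting each item into the sorted prefix."""
    for i in range(1, len(a)):
        key = a[i]                    # element to insert
        j = i - 1
        while j >= 0 and a[j] > key:  # shift larger elements one place right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                # insert into the gap
    return a

print(insertion_sort([3, 6, 9, 14, 5]))  # [3, 5, 6, 9, 14]
```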
Example
Example
Example
Example
Example-2
Algorithm
Analysis of Insertion Sort
• Although the insertion sort is almost always better than the bubble sort, the time
required in both the methods is approximately the same, that is, it is proportional to n²

• The total number of comparisons is given as follows:

(n - 1) + (n - 2) + … + 1 = (n - 1) * n/2

• which is O(n²).
Selection Sort
• This algorithms construct the sorted sequence, one element at a time, by adding
elements to the sorted sequence in order.
• At each step, the next element to be added to the sorted sequence is selected
from the remaining elements.
• Because the elements are added to the sorted sequence in order, they are always
added at one end.
• This makes the selection sorting different from the insertion sorting.
• In insertion sorting, the elements are added to the sorted sequence in an arbitrary
order.
• Therefore, the position in the sorted sequence at which each subsequent element
is inserted is arbitrary.
Selection Sort
• In this method, we sort a set of unsorted elements in two steps.
• In the first step, find the smallest element in the structure.
• In the second step, swap the smallest element with the element at the first
position.
• Then, find the next smallest element and swap with the element at the second
position.
• Repeat these steps until all elements get arranged at proper positions.
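These two steps, find-the-smallest and swap, can be sketched in Python as (names are our own):

```python
def selection_sort(a):
    """Sort list a in place: select the smallest remaining item on each pass."""
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):     # step 1: find the smallest in a[i..n-1]
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]   # step 2: swap it into position i
    return a

print(selection_sort([29, 10, 14, 37, 13]))  # [10, 13, 14, 29, 37]
```

Note that the long-distance swap is what can break stability: it may jump one of two equal keys past the other.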
Example
Example-2
Analysis of Selection Sort
• During the first pass, (n - 1) comparisons are made. In the second pass, (n - 2)
comparisons are made. In general, for the ith pass, (n - i) comparisons are required

• The total number of comparisons is as follows


(n - 1) + (n - 2) + … + 1 = n(n -1)/2
• Therefore, the number of comparisons for the selection sort is proportional to n²,
which means that it is O(n²).
Quick Sort
• Quick sort is based on the divide-and-conquer strategy.
• This sort technique initially selects an element called as pivot that is near the middle of
the list to be sorted, and then the items on either side are moved so that the elements
on one side of pivot are smaller and on the other side are larger.
• Now, the pivot is at the right position with respect to the sorted sequence.
• These two steps, selecting the pivot and arranging the elements on either side of pivot,
are now applied recursively to both the halves of the list till the list size reduces to
one.
• To choose the pivot, there are several strategies.
• The popular way is considering the first element as the pivot
Quick Sort
• Thus, the recursive algorithm consists of four steps
• 1. If the array size is 1, return immediately.
• 2. Pick an element in the array to serve as a ‘pivot’
• 3. Partition the array into two parts—one with elements smaller than the pivot and the
other with elements larger than the pivot by traversing from both the ends and
performing swaps if needed.
• 4. Recursively repeat the algorithm for both partitions.
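The four steps can be sketched in Python, taking the first element of each range as the pivot and scanning inward from both ends as described (a sketch; names are our own):

```python
def quick_sort(a, low=0, high=None):
    """Sort list a in place, using the first element of each range as pivot."""
    if high is None:
        high = len(a) - 1
    if low >= high:                          # step 1: range of size 0 or 1
        return a
    pivot = a[low]                           # step 2: first element as pivot
    i, j = low + 1, high
    while True:                              # step 3: partition from both ends
        while i <= high and a[i] <= pivot:   # scan right for an element > pivot
            i += 1
        while a[j] > pivot:                  # scan left for an element <= pivot
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            break                            # bounds have crossed
    a[low], a[j] = a[j], a[low]              # put the pivot in its final place
    quick_sort(a, low, j - 1)                # step 4: recur on both partitions
    quick_sort(a, j + 1, high)
    return a

print(quick_sort([13, 11, 14, 12, 15, 10]))  # [10, 11, 12, 13, 14, 15]
```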
Example

A[pivot] = 13
Example
• Let us first find the elements larger than the pivot, that is, 13. In addition, let us find
the last element not larger than the pivot. These elements are in positions 2 and 9. Let
us swap those.
Example
• Let us again start scanning from both the directions
Example
• Let us repeat the steps to get the following sequence:

• Here, the lower and upper bounds have crossed. So let us now swap the pivot with
element 12.
Example
• Here, we get two partitions as represented in the following sequence:

• Recursively applying similar steps to each sub-list on the right and left side of the
pivot, we get,
Algorithm
Analysis of Quick Sort
• Now, let us see the efficiency of quick sort. On the first pass, every element in the
array is compared to the pivot, so there are n comparisons.
• The array is then divided into two parts each of size (n/2).
• We assume that the array is divided into approximately one-half each time.
• For each of these sub-arrays, (n/2) comparisons are made and four sub-arrays of size
(n/4) are formed.
• So at each level, the number of sub-arrays doubles.
• It will take log2n divisions if we are dividing the array approximately one-half each
time.
• Therefore, quick sort is O(n log2 n) on the average.
Virtual Lab
• https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html
• https://ds1-iiith.vlabs.ac.in/exp/quick-sort/quick-sort/partition.html
• https://www.hackerearth.com/practice/algorithms/sorting/quick-sort/visualize/
• https://opendsa-server.cs.vt.edu/embed/quicksortAV
• https://visualgo.net/en/sorting

Merge Sort
• The most common algorithm used in external sorting is the merge sort.
• Merging is the process of combining two or more sorted files into the third sorted file.
• We can use a technique of merging two sorted lists.
• Divide and conquer is a general algorithm design paradigm that is used for merge sort
Merge Sort
• Merge sort has three steps to sort an input sequence S with n elements:
• 1. Divide—partition S into two sequences S1 and S2 of about n/2 elements each
• 2. Recur—recursively sort S1 and S2
• 3. Conquer—merge S1 and S2 into a sorted sequence

• A file (or sub-file) is divided into two files, f1 and f2. These two files are then
compared, one pair of records at a time, and merged
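The divide-recur-conquer steps can be sketched in Python as follows (a sketch; names are our own):

```python
def merge_sort(s):
    """Return a new sorted list built by divide and conquer."""
    if len(s) <= 1:
        return s
    mid = len(s) // 2
    s1 = merge_sort(s[:mid])          # divide and recur on the left half
    s2 = merge_sort(s[mid:])          # divide and recur on the right half
    merged, i, j = [], 0, 0           # conquer: merge the two sorted halves
    while i < len(s1) and j < len(s2):
        if s1[i] <= s2[j]:            # <= keeps the sort stable
            merged.append(s1[i])
            i += 1
        else:
            merged.append(s2[j])
            j += 1
    merged.extend(s1[i:])             # append whichever half has leftovers
    merged.extend(s2[j:])
    return merged

print(merge_sort([8, 3, 2, 9, 7, 1, 5, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

In external sorting, the same merge step combines sorted runs read from disk or tape instead of in-memory halves.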
Example
• The operation of the algorithm on the list 8, 3, 2, 9, 7, 1, 5, 4 is illustrated as
Algorithm
Algorithm
Time Complexity

• In the worst case, the number of comparisons made by the merge step is Cmerge(n) = n − 1

• Therefore, the recurrence relation is

T(n) = 2T(n/2) + (n − 1), with T(1) = 0

• Hence, according to the master theorem,
– a = 2, b = 2, f(n) = n
– If f(n) ∈ Ө(n^d) where d ≥ 0
• Then, d = 1, since f(n) = n
• Since a = 2 = 2^1 = b^d, T(n) ∈ Ө(n^d log n) = Ө(n log2 n)
Q&A

• What is the key difference between internal and external sorting?

• A. Internal sorting uses disk, external sorting uses RAM


B. Internal sorting uses cache, external sorting uses CPU
C. Internal sorting is done in memory, external sorting uses secondary storage
D. Internal sorting is slower than external sorting
Q&A

• Which sorting algorithm is not efficient for large datasets due to its O(n²)
time complexity in worst case?
• A. Merge Sort
B. Quick Sort
C. Insertion Sort
D. Heap Sort
Q&A

• What is the average case time complexity of Bubble Sort?

• A. O(n)
B. O(log n)
C. O(n log n)
D. O(n²)
Q&A

• Which sorting algorithm repeatedly selects the minimum element and


places it at the beginning?
• A. Insertion Sort
B. Quick Sort
C. Selection Sort
D. Merge Sort
Q&A

• What is the worst-case time complexity of Quick Sort?

• A. O(n)
B. O(n log n)
C. O(n²)
D. O(log n)
Q&A

• Merge Sort uses which algorithmic paradigm?

• A. Backtracking
B. Divide and Conquer
C. Greedy
D. Dynamic Programming
Q&A

• Which sorting algorithm is stable and has a worst-case time complexity of


O(n log n)?
• A. Quick Sort
B. Heap Sort
C. Merge Sort
D. Selection Sort
Q&A

• External sorting is mainly required when:

• A. Data is small
B. Data fits into RAM
C. Data is too large to fit into main memory
D. Data is sorted already
