Map, Filter and Reduce Functions

The document discusses various Python functions - map, filter and reduce. Map applies a function to each element of an iterable and returns a new iterable. Filter returns elements from an iterable that satisfy a given condition. Reduce takes an iterable and combines elements using a function to produce a single value. Examples of using each function on lists, tuples and dictionaries are provided to demonstrate their usage.

Uploaded by

Ritik Mitra

Map, Filter and Reduce Functions

Now that we have covered loops and comprehensions, let's look at the map, filter and reduce
functions, which offer a compact, functional way to process the elements of an iterable.
Starting with map: map does a job similar to a list comprehension or a for loop. It is used
when you need to apply the same function to every element of an iterable.
The syntax of the map function looks as shown below:
map(function, iterable)
The function here can be a lambda function or a function object.
The iterable object can be a string, list, tuple, set or dictionary.
Let’s look at an example to understand the map function better:
In this example, the map function is used to create a list from a tuple with each of its
elements doubled.
list_numbers = (1,2,3,4)
sample_map = map(lambda x: x*2, list_numbers)
print(list(sample_map))
In this implementation, the expression lambda x: x*2 creates an anonymous function object.
The map function passes each element of the iterable to that function and yields the results
through a map object, which is a lazy iterator. Using the list function on the map object,
you finally obtain the output list [2, 4, 6, 8].
def multi(x):
    return x*2

list_numbers = [1,2,3,4]
sample_map = map(multi, list_numbers)
print(list(sample_map))
The only difference from the previous code is that a named function object, multi, is passed
to map instead of a lambda function. The output is the same.
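The following is a minimal sketch that compares map with the equivalent list comprehension. The variable names are illustrative and not from the course. It also shows that a map object is an iterator, so it can be consumed only once.

```python
numbers = (1, 2, 3, 4)

# map returns a lazy iterator; nothing is computed until it is consumed
doubled_map = map(lambda x: x * 2, numbers)
first_pass = list(doubled_map)    # consumes the iterator
second_pass = list(doubled_map)   # the iterator is now exhausted

# The same result written as a list comprehension
doubled_comprehension = [x * 2 for x in numbers]

print(first_pass)
print(second_pass)
print(doubled_comprehension)
```

If you need the results more than once, store the list rather than the map object.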
Now let's look at the Filter operation:
'Filter' is similar to the map function. The distinguishing feature is that the function
passed to it tests a condition, and filter returns only those elements of the collection that
satisfy the condition.
The syntax of the filter function looks as shown below:
filter(function, iterable)
The function passed to filter should return a boolean value (more precisely, any value that Python can treat as true or false).
Let's take a look at an example to understand the filter function.
In this example, the filter function is used to count the number of students whose age is
above 18.
students_data = {1:['Sam', 15] , 2:['Rob',18], 3:['Kyle', 16], 4:['Cornor',19], 5:['Trump',20]}

len(list(filter(lambda x : x[1] > 18, students_data.values())))
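As a sketch of what happens step by step, the snippet below keeps the filtered records in a variable before counting them, and shows an equivalent list comprehension. The names adults and adult_names are illustrative only.

```python
students_data = {1: ['Sam', 15], 2: ['Rob', 18], 3: ['Kyle', 16],
                 4: ['Cornor', 19], 5: ['Trump', 20]}

# Each value is a [name, age] list; keep only records with age above 18
adults = list(filter(lambda record: record[1] > 18, students_data.values()))
adult_count = len(adults)

# Equivalent list comprehension that keeps just the names
adult_names = [name for name, age in students_data.values() if age > 18]

print(adult_count)
print(adult_names)
```

Note that Rob, who is exactly 18, is not counted because the condition is strictly greater than 18.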


Now, let's take a look at the last function in the sequence, the reduce function.
'Reduce' breaks the whole computation down into pair-wise operations: it applies the function
to the first two elements, then to that result and the next element, and so on until the
iterable is exhausted. The syntax of the reduce function is given below.
reduce(function, iterable)
The function passed to reduce must accept two arguments: the result accumulated so far and
the next element of the iterable.
Unlike map and filter, the reduce function produces a single output value.
Note: The reduce function is not a built-in function; it needs to be imported from the
'functools' module.
from functools import reduce
Let's take a look at an example to understand the reduce function:
from functools import reduce
list_1 = ['Paul','Ted']
reduce(lambda x,y: x + y,list_1)
In the code snippet given above, the functools module is imported to access the reduce
function. In the implementation, the lambda function lambda x,y: x + y combines two objects
from list_1 at a time; remember that + concatenates the objects if they are strings and adds
them if they are numbers.
For the example above, the reduce function will combine the list of strings into a single
string. The output is given below:
'PaulTed'
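The pair-wise behaviour can be sketched with numbers as well. The optional third argument of reduce is an initial value; it is not used in the course example and is shown here only as an illustration.

```python
from functools import reduce

names = ['Paul', 'Ted']
joined = reduce(lambda x, y: x + y, names)

numbers = [1, 2, 3, 4]
# Evaluated pair-wise as ((1 + 2) + 3) + 4
total = reduce(lambda acc, n: acc + n, numbers)

# An initial value is used as the starting accumulator
total_with_start = reduce(lambda acc, n: acc + n, numbers, 100)

# With an initial value, reduce also works on an empty list
empty_total = reduce(lambda acc, n: acc + n, [], 0)

print(joined, total, total_with_start, empty_total)
```

Without an initial value, calling reduce on an empty iterable raises a TypeError.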
In the next segment, let's solve a few questions related to Map, Filter, and Reduce.
Dictionary and List

One of the most common ways to store data is in a dictionary or in a data frame (you will learn
about the data frame in later modules). So, it is important to have a good grasp of iterating
through dictionary keys and values. Please have a look at the coding quiz question below before
you proceed to see Sajan solve it.
You saw Sajan build pseudocode for the problem at hand. Can you try and solve it now? Sajan
will also demonstrate how to convert this to code.
In this segment, you learned how to iterate on the keys and the values of a dictionary. You
can iterate on values in two ways: by using dict.values(), or by iterating on dict.keys()
and using dict[k] to access the value of key k. Which way you choose to traverse the
dictionary depends on your objective.
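Here is a small sketch of both ways of traversing a dictionary. The dictionary ages is a hypothetical example, and the items() method is an additional option that the segment above does not mention.

```python
ages = {'Sam': 15, 'Rob': 18, 'Kyle': 16}  # hypothetical data

# Way 1: iterate on the values directly
values_direct = [value for value in ages.values()]

# Way 2: iterate on the keys and look each value up with dict[k]
values_by_key = [ages[k] for k in ages.keys()]

# items() gives the key and the value together
pairs = [(k, v) for k, v in ages.items()]

print(values_direct)
print(values_by_key)
print(pairs)
```

Iterating on values alone is enough when the key does not matter; use the keys (or items()) when you need to know which key a value belongs to.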
In the next segment, you will solve a custom problem to determine whether a given string is an
'upGrad string' or not.
Session Overview

In this session, you will learn about how having two or more pointers or iterators can sometimes
improve the algorithm time complexity significantly. You must have already done this in a few
problems before, like in searching algorithms, binary search and rotated list search etc. We will
reinforce this learning using the following problems:
Merge two sorted lists
Specific sum
Sort 0s and 1s
Module Introduction

Welcome to the module Programming in Python - II. In this segment, you will get an overview
of the topics covered in this module.
In this module
In the previous module, you learned how to approach a given problem in different ways. In this
module, you will learn about comparing those different approaches based on their time and
space complexities. You will also learn about a few searching and sorting techniques,
recursion, and two-pointer methods to iterate through different data structures to improve time
and space complexities. This module has been broken down into 2 major portions:
Two Pointers
This is just a slightly different approach taken to solve problems to improve their time
complexities.
Recursion
You briefly stumbled upon recursion in the module Introduction to Python earlier. Here, you
will learn how to think about recursive algorithms and the cost associated with the recursion
approach.
Module Objectives
At the end of this session, you will be able to:
Understand how to break down complex real-life problems into smaller problems
Understand how to draw use case of a given problem
Understand how to write pseudocode for a given coding problem
Pre-requisites
We expect the learner to have gone through the previous two modules.
Guidelines for in-module questions
The in-video and in-content questions for this module are not graded.
Guidelines for graded lab questions
The lab questions at the end of the module are graded.
People you will hear from in this module
Faculty
Sajan Kedia
Data Science Lead, Myntra
Sajan completed B.Tech. and M.Tech. in Computer Science from IIT BHU. During his
master’s, he worked on data mining and published research papers on that topic. Currently, he
is leading the data science team of pricing at Myntra, building AI systems for personalised
pricing. He has expertise in big data technologies, machine learning, and NLP.
Summary

In this session, you solved several problems using Python programming. Let's revise each
problem statement one by one:
Remove duplicates: You learnt how to remove duplicate values from a given list of integers.
Dictionary and list: You learnt about a dictionary which is a widely used data structure in
Python.
upGrad string: You also solved a problem where you determined whether a string is an 'upGrad'
string or not.
Balanced brackets: Finally, you also wrote some code to find if a given string of brackets is
balanced or not.
Sajan's Anecdotes

Here are a few of the many pointers from the industry expert to remember while coding as a
beginner in the field of Data Science.
Let us summarise our understanding from this session in the next segment.
Practice Questions - II

Here are some more assessments for your practice.


In the next segment, the industry expert Sajan will highlight some tips for data science.
Practice Questions - I

Here is a problem for you to practice.


Let's practice some more problems in the next segment.
Balanced Brackets

By now you must have realised that matching brackets get highlighted when you are coding on
the console. Your next problem is based on the same thing. Given a string of brackets, can you
determine if the string of brackets is balanced or not?
Can you try and solve it now? Sajan will explain how to code the above-explained logic
and solve this problem. However, we highly recommend that you try to solve the problem before
seeing Sajan code the solution.
Do you think you will be able to give out the index of matching brackets now? Discuss how
you would do it in the discussion forums.
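One common way to check for balanced brackets is to use a list as a stack. The sketch below is not necessarily the approach Sajan takes; it assumes the three bracket types (), [] and {} and ignores any other character.

```python
def is_balanced(brackets):
    """Return True if every opening bracket is closed in the right order."""
    pairs = {')': '(', ']': '[', '}': '{'}  # closing bracket -> opening bracket
    stack = []
    for ch in brackets:
        if ch in pairs.values():
            stack.append(ch)                 # remember the opening bracket
        elif ch in pairs:
            # A closing bracket must match the most recent opening bracket
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                         # nothing should be left open


print(is_balanced('{[()]}'))
print(is_balanced('([)]'))
```

To report the index of each matching bracket, you could push the index onto the stack along with the character.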
upGrad String

The next problem will help you in developing a strong sense of dictionary key and value. For
the purpose of this question, we will define something called an upGrad string. Note that this
definition is not valid outside this question.
A string is an upGrad string if the frequency of its characters is something like 1, 2, 3, 4, ... i.e.,
a character appears only once, another appears twice, another appears thrice and so on.
For example, the string '$yrr$ssrsr' is an upGrad string since the frequencies are y:1, $:2, s:3, r:4.
However, the string '$yrr$ssrsr%' is not an upGrad string since it has two characters (y and %)
with frequency 1. The frequencies of the characters should be of the form 1, 2, 3, 4, 5... only. Given
a string, can you determine if the string is an upGrad string or not?
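The definition above can be sketched in a few lines. This version uses collections.Counter to build the frequency dictionary, which may differ from the approach shown in the course; the function name is illustrative.

```python
from collections import Counter


def is_upgrad_string(text):
    """Check that the character frequencies are exactly 1, 2, 3, ..., k."""
    frequencies = sorted(Counter(text).values())
    return frequencies == list(range(1, len(frequencies) + 1))


print(is_upgrad_string('$yrr$ssrsr'))
print(is_upgrad_string('$yrr$ssrsr%'))
```

With this definition an empty string is treated as an upGrad string, which is an assumption of the sketch rather than something the problem statement specifies.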
Here Sajan explains how to code this problem. The key takeaway from the last two problems
is to learn how to iterate on dictionary keys and values.
In the next segment, Sajan will introduce you to the problem of 'balancing brackets' and also
explain how to solve it.
Remove Duplicates

You often encounter duplicates in your data when you get it from different sources. Although
there are inbuilt functions in the Pandas library to remove duplicates, which you will learn about in
upcoming modules, this raises the question of how you would actually remove duplicates yourself.
Let's take a smaller version of the same problem. You will be given a list of integers, and you have
to remove all duplicate values from the list. How would you do it?
Let's see what this will look like in code on the console.
It is your turn to try it out now!
Be careful though: not all objects can be used as a key in a dictionary. One obvious condition
is that no two keys in a dictionary are the same. Apart from this, the key object should be
hashable, which in practice usually means immutable. In layman's terms, hashing means mapping an
object to a fixed number that the dictionary uses to find the object quickly. How hashing works is beyond the scope of this module.
However, if you are interested in learning more about it, you can read this python wiki link, and
more information on hashing can be found here. There is, however, one more way to remove duplicates: using sets.
Why do you think we did not use sets?
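A minimal sketch of the dictionary-based idea, assuming the goal is to keep the first occurrence of each value in its original position. The function name is illustrative.

```python
def remove_duplicates(values):
    """Keep the first occurrence of each value, in the original order."""
    seen = {}
    for value in values:
        seen[value] = True   # a repeated key simply overwrites itself
    return list(seen)        # dictionaries keep insertion order (Python 3.7+)


data = [3, 1, 3, 2, 1]
unique_in_order = remove_duplicates(data)
unique_sorted = sorted(set(data))  # a set also removes duplicates

print(unique_in_order)
print(unique_sorted)
```

A set removes duplicates too, but it does not promise to preserve the order in which the values appeared, which is one possible reason to prefer a dictionary here.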
A dictionary is another very commonly used data structure in Python programming. Let's learn
how to use it in the next segment.
Refresher

Please attempt these problems to test your knowledge of dictionaries, sets, and tuples. You may
revise it from the previous module - Introduction to Python. We recommend you revise the
concepts based on your performance in the following questions.
Now that you have the basics of sets and dictionaries revised, let's solve some problems that
will make use of these data structures.
Session Overview

In this session
In this session, we will learn about other data structures such as tuples, sets, and dictionaries.
From the data science perspective, the dictionary is one of the most important data structures,
as you will see in upcoming modules. You will focus on iterating through the dictionary
keys and values. The key learnings from the previous sessions will also be useful here as, more often
than not, the keys and values are either strings or lists or lists of strings. You will also see
different scenarios where a dictionary will make the problem much easier.
The learning objectives will be achieved with the help of the following problems:
Remove Duplicates
Dictionary and List
upGrad String
Balanced Brackets
Summary

In this session, you solved several problems using Python programming. Let's revise each
problem statement one by one:
Palindrome string: You wrote a program to determine whether a string is a palindrome or not.
Reverse words: You also learnt how to reverse the order of the words in a given sentence.
No spaces: You learnt how to clean a given data value by removing spaces and converting the
case of the letters as required.
Move vowels: You also solved an interesting problem where you had to move the vowels to
the front and the consonants to the back of a given string.
Common prefix: You wrote a program to find the common prefix in two given strings. This is
a basic implementation of pattern-matching.
Anagrams: Finally, you also determined whether two given strings are anagrams of each other
or not.
Practice Questions - II

Here are some more assessments for your practice.


Let's summarise what you have understood in this session in the next segment.
Practice Questions - I

Here are some practice problems for you to try.


Let's practice some more problems in the upcoming segment.
Anagrams

Let's try a fun problem now. Two words are called anagrams if they have the exact same letters
but in a different order. For example, 'night' and 'thing' are anagrams.
Can you write code to find out if two strings are anagrams or not? We strongly recommend
that you try this question on your own by thinking of the logic, building the flow chart, and
translating that logic into code in the console below before you watch Sajan's explanation of
the code.
Here is Sajan's explanation in case you were not able to solve the problem on your own.
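One possible approach, sketched below, is to compare the letter counts of the two strings. This is an illustration and may differ from the approach in Sajan's explanation; the comparison is case-sensitive, and two identical strings are also reported as anagrams.

```python
from collections import Counter


def are_anagrams(first, second):
    """Two strings are anagrams if they contain the same letters the same number of times."""
    return Counter(first) == Counter(second)


print(are_anagrams('night', 'thing'))
print(are_anagrams('night', 'nights'))
```

Comparing sorted(first) with sorted(second) gives the same answer and is another common way to write the check.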
Practice makes perfect. Use your newly acquired Python skills to solve the problems in
the upcoming segments of this session.
Common Prefix

We often have to find the common part of two sets of data. This can happen when we are merging two
data sets, or when we check whether a certain piece of data appears several times,
which may imply that it is more important or more reliable.
In this question, we will take up a simpler problem of the same type. You will be given two
strings. Can you think of a way to find only the common prefix part in the two strings? Let's
see as Sajan explains it.
The pseudocode of this problem may seem complex but if you follow it in the order Sajan
explains it, you'll find it simple. Let's see how we write it in Python.
You saw that Sajan made an error in the code and how he was able to debug it. You will also face
such issues while coding, and you will keep developing the skill of debugging. Now, it's your turn to try
the problem.
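A minimal sketch of one way to find the common prefix, comparing the two strings character by character and stopping at the first mismatch. The function name and the sample strings are illustrative.

```python
def common_prefix(first, second):
    """Return the longest prefix shared by both strings."""
    prefix = ''
    # zip stops at the end of the shorter string
    for a, b in zip(first, second):
        if a != b:
            break
        prefix += a
    return prefix


print(common_prefix('flower', 'flow'))
print(common_prefix('abc', 'xyz'))
```

Because zip stops at the shorter string, no index check is needed to avoid running past the end of either input.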
In the next segment, you will determine whether two given strings are anagrams of each other
or not. What exactly are anagrams though? Let's find out.
Move Vowels

Next, we will write code to shift all the vowels to the front and the consonants to the back of the
given string, without changing their order. This might seem quite tough at first, doesn't it? Give
it a thought and try to think of an approach before we see Sajan break down this complex-looking
problem.
This goes to illustrate how a complex-looking problem can turn out to be easy once we calmly
break it down into smaller chunks and think about it in the right way.
Wasn't that really simple? Well, you'll soon be able to do it as you go ahead. Let's see how
we translate the above pseudocode into Python.
Can you try the problem now? Try to think about what would have happened if you had used 'v = i + v'
instead of 'v = v + i'.
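The sketch below shows one way to write this, using longer variable names than the v and i mentioned above. It treats every character that is not a, e, i, o or u as a consonant, which is an assumption of this example.

```python
def move_vowels(text):
    """Move vowels to the front and consonants to the back, keeping their order."""
    vowels = ''
    consonants = ''
    for ch in text:
        if ch.lower() in 'aeiou':
            vowels = vowels + ch        # appending keeps the original order
        else:
            consonants = consonants + ch
    return vowels + consonants


def vowels_prepended(text):
    """Same loop, but prepending each vowel, to show the effect of 'v = i + v'."""
    vowels = ''
    for ch in text:
        if ch.lower() in 'aeiou':
            vowels = ch + vowels        # prepending reverses the order
    return vowels


print(move_vowels('upgrad'))
print(vowels_prepended('upgrad'))
```

Prepending each new vowel reverses the order of the vowels, which breaks the requirement of keeping the original order.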
You just saw how easy it was once we broke down the problem and thought about it. This is
what coding is all about, finding the right approach. Typing it on the keyboard is just trying to
explain the process to the computer. In the next segment, you will determine the common prefix
between two given strings. This is generally used for pattern-matching.
No Spaces

The two fundamental skills for any data analyst are getting data and then cleaning it. Only then
can you work your magic on the data to draw actionable insights from it. When
you get your data, it is rarely in a usable form; one of the most common problems is having to
deal with spaces. Spaces are best avoided in variable names, column
names and many similar places. Another problem is that values like cost or other numbers often
have commas in their representation, e.g. one lakh is written as 1,00,000, and due to this the system treats
the value as a string and not as an integer or float.
In this segment, you will address the first problem with the help of Sajan, and then address the
second problem on your own in practice problems. You may look at the detailed problem
statement in the coding quiz question below.
Well, you have already understood the pseudocode. Can you solve it now? Think about how
you will convert the first letter of each word to a capital.
Let's see how Sajan has coded the above-mentioned problem in Python.
You got to know how to use the 'title()' method in the above problem, which helped in
making the code cleaner. You can read more about it here. In the next segment, you will learn
how to solve an interesting problem using basic string manipulation.
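Since the detailed problem statement is in the coding quiz, the sketch below assumes the task is to capitalise the first letter of every word and then remove the spaces. It also shows one way to handle the second problem mentioned in this segment, numbers written with commas. The function name and sample values are illustrative.

```python
def no_spaces(text):
    """Capitalise the first letter of each word, then drop the spaces."""
    return text.title().replace(' ', '')


cleaned = no_spaces('customer first name')

# Removing commas lets the value be converted to a number
amount = int('1,00,000'.replace(',', ''))

print(cleaned)
print(amount)
```

Be aware that title() also capitalises the letter after an apostrophe, so "don't" becomes "Don'T"; for such inputs a different approach may be needed.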
Reverse Words

In this segment, you will learn about a useful list method called reverse(). Strings do not have
this method, so the sentence is first split into a list of words.
Let’s take an example to illustrate it. You will be given a sentence in the form of a string. You
have to reverse the order of the words in the sentence. Remember not to reverse the individual
words, character by character, but the order of words. Before we see Sajan explain how we
can do this with very little code, can you try it out?
Did you get the desired output? Did you use a for loop? Let's see how Sajan approaches this
problem.
Let's see how we implement the above pseudocode in Python.
Note that Sajan uses split(' '). Using only split() would have worked here too, although the two
are not identical: with no argument, split() splits on any run of whitespace and drops empty
strings, whereas split(' ') splits on every single space. Passing the separator explicitly makes the intended behaviour clear to the reader.
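A short sketch of the idea, using split, the list method reverse() and join. The function name and the sample sentence are illustrative. The last two lines show how split(' ') and split() behave when a sentence contains two consecutive spaces.

```python
def reverse_words(sentence):
    """Reverse the order of the words, not the characters inside each word."""
    words = sentence.split(' ')
    words.reverse()              # reverses the list in place
    return ' '.join(words)


reversed_sentence = reverse_words('I love Python')

# The two forms of split differ when there are repeated spaces
explicit_separator = 'a  b'.split(' ')
default_separator = 'a  b'.split()

print(reversed_sentence)
print(explicit_separator)
print(default_separator)
```

With split(' '), the empty string between the two spaces is kept as a word, so choose the form that matches how your data is spaced.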
In the next segment, you will learn about one of the most common ways to clean your data, by
removing spaces.
Palindrome String

Let’s start off with a simple and common problem: Palindrome String.
A string is considered a palindrome if it stays the same upon reversing it, for example ‘racecar’.
Can you write code that takes a string as input and checks if it's a palindrome or not?
Before we start off with the coding section, let’s understand the logic.
Let’s move to the console and see how we convert the pseudocode to a code in Python.
It is now your turn to try it out. Note that you can do it by some other methods too. Hence, we
encourage you to design multiple approaches to the same problem.
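Two of the possible approaches are sketched below: reversing the string with slicing, and comparing characters from both ends. The function names are illustrative, and the check is case-sensitive.

```python
def is_palindrome(text):
    """Compare the string with its reverse, obtained by slicing."""
    return text == text[::-1]


def is_palindrome_loop(text):
    """Compare characters from both ends, moving towards the middle."""
    left, right = 0, len(text) - 1
    while left < right:
        if text[left] != text[right]:
            return False
        left += 1
        right -= 1
    return True


print(is_palindrome('racecar'))
print(is_palindrome_loop('python'))
```

The slicing version is shorter, while the loop version stops as soon as it finds a mismatch.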
In a previous session, you learnt how to output the reverse of a given number. You can perform
the same operation for a string as well. Let's see how it can be done in the next segment.
Refresher

In this session, we are going to cover applications of some of the concepts related to string
manipulation like appending using '+', title(), lower(), upper() etc. Please attempt these
problems to test your knowledge of strings. You may revise it from the previous module
(Introduction to Python + Graded Questions, Session 2). We recommend you revise the
concepts based on your performance in the following questions.
Now that you have had a quick revision of the concepts, let’s start off by solving some questions
that would require us to think logically.
Session Overview

In this session
After covering lists, you can now start exploring the interaction of different data structures,
namely a list of strings. You will first revise basic syntax like the previous two sessions and
then move on to coding questions. You will also learn to make use of basic functionalities like
appending, slicing, typecasting etc.
One of the first steps in data analysis as a data scientist is acquiring the data. The acquired
data is mostly in string format, and cleaning this data is the next step. Learning the functionalities of
strings is thus one of the most important objectives of this session. You will be doing similar
activities in the upcoming questions.
You will work through the following problems in this session:
Palindrome
Reverse Words
No Spaces
Move Vowels
Common Prefix
Anagrams
Summary

In this session, you solved some more problems using Python programming. Let's revisit each
problem statement one by one:
Smallest element: You learnt how to determine the smallest element from a given set of
integers.
Above average: You also determined how to find whether a given number in a list is above the
average value or not.
Recruit new members: You solved a real-life problem of selecting better candidates using basic
concepts.
Calendar: You also implemented a basic solution to determine overlaps between multiple
events.
Fenced matrix: Finally, you learnt and implemented the concept of a fenced matrix using
Python.
Practice Questions - II

Here are a couple of practice questions for you to try.


Let's summarise what you have understood in this session in the next segment.
Practice Questions - I

Here are some more assessments for your practice.


Let's practice some more coding problems in the upcoming segment.
Fenced Matrix

You will be given two positive integers m and n. You have to make a list of lists (which can be
visualised as a matrix) of size m*n, that is m sublists (rows), with each sublist having n integers
(columns).
The matrix should be such that it should have 1 on the border and 0 everywhere else. Check
the coding quiz question at the end of this segment for more details. Sajan will explain the
concept of a fenced matrix and the problem statement in detail.
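As a rough sketch of the problem statement above, the function below builds the matrix row by row and places a 1 wherever the cell lies on the first or last row or column. The function name is illustrative, and the exact input and output format expected by the coding quiz may differ.

```python
def fenced_matrix(m, n):
    """Build an m x n list of lists with 1 on the border and 0 elsewhere."""
    matrix = []
    for row in range(m):
        current_row = []                 # a fresh list for every row
        for col in range(n):
            on_border = row in (0, m - 1) or col in (0, n - 1)
            current_row.append(1 if on_border else 0)
        matrix.append(current_row)
    return matrix


for row in fenced_matrix(3, 4):
    print(row)
```

Creating a fresh list for every row avoids the situation where several rows refer to the same list object.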
Let’s see how you will convert this to code. Sajan will also explain what a deep and shallow
copy is and how it can create severe problems while coding.
Confusing references, shallow copies and deep copies is one of the most common sources of errors. To illustrate the
difference, let us consider the following piece of code.
original_list=[0, 0, 0]
copy1=original_list
copy2=list(original_list)
copy3=list.copy(original_list)
original_list[0] = 1
print(copy1)
print(copy2)
print(copy3)
The output we were expecting is
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
Since all three (copy1, copy2 and copy3) are meant to be copies of original_list, a change in the original list
should not be reflected in copy1, copy2 or copy3. However, the result we get is
[1, 0, 0]
[0, 0, 0]
[0, 0, 0]
Why is that?
Let's dig a little deeper. When you create ANY data structure, you ask the computer to give
you some space in its memory. When you created original_list in
the first line, the computer set aside that space for you.
Think of this as a box. You can store anything in this box: list, dict, anything. When you next
wrote copy1=original_list, you wanted a new box with the same contents as the box named
original_list, but what the computer did was to put a second name, copy1, on the box named original_list instead of
making a new box.
Now if you change the contents of the box using either original_list or copy1, the change is
visible through the other name as well, because it is the same box. Strictly speaking, this is not a copy at all: copy1 is just another reference (an alias) to the same list.
In the case of copy2 and copy3, the computer does give you a new box. These are shallow copies: the outer list is new, but its elements are still shared with the original, which is harmless here because the elements are integers. A deep copy, made with copy.deepcopy(), also duplicates any nested lists, and that matters for a list of lists such as a matrix.
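The difference becomes visible with a list of lists. The sketch below uses the standard copy module; the variable names are illustrative. It also shows a related pitfall when building a matrix by repeating a row with the * operator.

```python
import copy

original = [[0, 0], [0, 0]]      # a small 2 x 2 matrix
alias = original                 # same box, second name
shallow = list(original)         # new outer list, shared inner rows
deep = copy.deepcopy(original)   # new outer list and new inner rows

original[0][0] = 1               # change inside a shared row
original.append([9, 9])          # change to the outer list only

# Repeating a row with * reuses one row object for every row
repeated = [[0] * 2] * 2
repeated[0][0] = 1

# A comprehension builds a separate row each time
independent = [[0] * 2 for _ in range(2)]
independent[0][0] = 1

print(alias)
print(shallow)
print(deep)
print(repeated)
print(independent)
```

The alias sees both changes, the shallow copy sees only the change inside the shared row, and the deep copy sees neither.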
Now try coding the same thing below.
Additional Resources
Another common 2D- Matrix problem is the Spiral Traversal of a Matrix.
With all this knowledge under your belt, go about solving the practice problems given in the
next segment.
Calendar

Let’s raise the bar a little now by taking level-2 lists (2-D list) into consideration. Here is a
simple problem to start.
You are planning to go to your friend's wedding and you have long events all month, going on
for at least a few days. You have the start and end dates of events and your task is to find out
events that are overlapping with the wedding date.
Let’s understand how Sajan approaches the problem.
Can you convert this logic to code now? You can check out Sajan's explanation of how to
convert the logic to code for your reference.
Here is the explanation of how Sajan coded the above-explained logic.
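A minimal sketch of the overlap check, assuming each event is stored as a [start, end] pair of day numbers within the month, with both ends inclusive. The data format, the function name and the sample values are assumptions made for this illustration.

```python
def overlapping_events(events, wedding_day):
    """Return the events whose date range includes the wedding day."""
    overlapping = []
    for start, end in events:            # each event is a [start, end] pair
        if start <= wedding_day <= end:
            overlapping.append([start, end])
    return overlapping


events = [[1, 5], [4, 12], [10, 15], [20, 25]]   # hypothetical events
clashes = overlapping_events(events, 11)
print(clashes)
```

The loop unpacks each inner list into start and end, which is a convenient way to iterate through a 2-D list whose rows have a fixed length.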
This question was just to make you a little more comfortable with iterating through a 2-D list
or a level-2 list. Let's solve one more question on a 2-D list before wrapping up the session.
Recruit New Members

This can be seen as an extension of the previous segment. Suppose you are the manager of a
big firm and are looking to hire new members for your team. You published an advertisement
and have received a few applications.
You rate people on a scale of 0 to 100 and have given scores to all the members of your team
and the new applicants. The selection process is very straightforward - if the applicant improves
the average of the team then you hire the applicant or else reject the applicant. Remember the
order of processing applications is going to be important here.
Maintaining modularity in your code is a very important aspect of writing code in the industry.
You will be building on programs that are developed by your colleagues and vice versa. To
make that easy for them and for you, it's always better to have a modular code. Let us see Sajan
use the code from the previous question and solve this one.
You can read more about the sum functionality here. Try out coding the above question on
your own below.
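The selection rule can be sketched as below. It assumes that an applicant is hired only when their score is strictly greater than the current team average, that the team is not empty, and that a hired applicant joins the team before the next application is processed. The function name and scores are illustrative.

```python
def recruit(team_scores, applicant_scores):
    """Return the scores of the applicants who get hired, in order."""
    team = list(team_scores)             # work on a copy of the team list
    hired = []
    for score in applicant_scores:
        average = sum(team) / len(team)
        if score > average:              # the applicant improves the average
            team.append(score)
            hired.append(score)
    return hired


hired_first_order = recruit([50, 70], [65, 62, 61])
hired_second_order = recruit([50, 70], [61, 62, 65])
print(hired_first_order)
print(hired_second_order)
```

The same three applicants give different results in the two runs, which illustrates why the order of processing applications matters.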
Lists can have more than one dimension. Let's see how to solve an interesting problem using
something known as a 2-D list.
Above Average

Just like the previous problem, finding the average is another action you will perform quite
frequently while analysing the data. Given a list of integers, can you find the average and
determine if the given number is above the list’s average or not?
Let us see Sajan turn the logic explained above into a piece of code.
Can you try writing the same code using the reduce function? How many more ways can
you think of? Use the console below to try out different approaches.
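Two of the possible approaches are sketched below, one using the inbuilt sum function and one using reduce. Both assume a non-empty list, and the function names are illustrative.

```python
from functools import reduce


def is_above_average(numbers, value):
    """Use the inbuilt sum function to compute the average."""
    average = sum(numbers) / len(numbers)
    return value > average


def is_above_average_reduce(numbers, value):
    """Compute the total with reduce instead of sum."""
    total = reduce(lambda acc, n: acc + n, numbers)
    return value > total / len(numbers)


scores = [1, 2, 3, 4, 5]   # the average is 3.0
print(is_above_average(scores, 4))
print(is_above_average_reduce(scores, 3))
```

A value equal to the average is not considered above average in this sketch.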
It is time to move on and see another real-world application which is an extension to the
problem you solved in this segment.
Smallest Element

Let's start with a very basic question: finding the smallest (minimum) element in
the given list of integers. You will be writing this type of code, such as finding the minimum,
average, maximum, median, and a lot more, when dealing with new data. Let's listen to Sajan
explain the approach that you can take.
We can also use the Python inbuilt functionality to perform a lot of actions. Let’s see Sajan
code the problem explained above and show us the Python inbuilt functionality to do the same.
You can read more about the min() function of Python here. Other important Python inbuilt
functions can be found here. It is usually better to use these than to write code that performs
the same action. However, the aim of this question is to develop a coding intuition and help
you understand the working of these inbuilt functions.
Try writing code in both ways in the question provided below.
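Both ways can be sketched as follows, assuming a non-empty list. The function name and the sample list are illustrative.

```python
def smallest(numbers):
    """Find the minimum by keeping track of the smallest value seen so far."""
    minimum = numbers[0]
    for n in numbers[1:]:
        if n < minimum:
            minimum = n
    return minimum


values = [7, 3, 9, -2, 5]
manual_result = smallest(values)
inbuilt_result = min(values)   # the inbuilt function does the same job

print(manual_result)
print(inbuilt_result)
```

Writing the loop once helps you see what min() does internally; in everyday code the inbuilt function is the shorter choice.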
Lists are the most common iterable data structure in Python. This question was to familiarise
you with iterations on the list. Let's check the next problem.
Refresher

Here are a few questions to check if you remember about lists from the previous module -
Introduction to Python. We recommend revising the basics based on your performance in the
following questions.
You can proceed with this session once you are thorough with the basics.
Session Overview

In this session
In this session, we will build upon the basics of lists learned in the previous module -
Introduction to Python. You will learn to iterate on lists and use loops and conditionals with
lists, apply list comprehensions to make code more comprehensible, short, and clean. You will
also learn about different inbuilt Python functionalities to make use of while coding. We will
also demonstrate how shallow copy can backfire and collapse your code and logic.
We will do this by covering the following problems:
Smallest Element
Above Average
Recruit New Members
Calendar
Fenced Matrix
Summary

In this session, you solved several problems using Python programming. Let's revise each
problem statement one by one:
Real-life scenarios of programming - I: You learnt a basic implementation of the following in
this segment:
Back button
Search bar
Real-life scenarios of programming - II: You learnt a basic implementation of the following in
this segment:
Filtering a few restaurants out of many, based on the location.
Dating apps like Tinder.
Swapping: You also learnt how to swap the values stored in two integers.
Even or odd: You wrote a program to determine whether a given number is even or odd.
Alarm clock: You wrote a basic implementation of an alarm clock.
Factorial: You also learnt how to determine the factorial of a given number.
Reverse the digits: You learnt how to write Python code to reverse the digits of a given
number.
How many chocolates?: You solved an interesting problem where you needed to calculate the
number of chocolates you would get based on certain conditions.
Print the pattern: Finally, you printed different patterns using Python.
Practice Questions - II

Here are some more assessments for your practice.


Let us summarise what you have understood in this session in the next segment.
Practice Questions - I

Try and solve the following pattern printing problem for practice.
Let's look into more coding problems in the upcoming segment.
Print the Pattern

Pattern printing is an exercise done frequently to learn and master loop iterations. This type of
exercise is recommended for all new coders and professionals. Please go to the coding quiz
question below and check the detailed problem statement.
Given a positive integer, you have to print the pattern as illustrated. We recommend solving a
lot more of these types of problems on your own. You will also be solving these types of
problems in the upcoming lessons and modules.
Well, those were three nested loops! Can you take up the challenge and code it without further
help?
Don't worry if you are still confused. Watch Sajan explain the problem in detail, and then you
will feel confident enough to solve it on your own.
These types of problems may seem a little overwhelming at first, but you have already learnt
how to break them into pieces and solve them. Pattern printing problems make the best
questions for practice. Try making an account on different coding platforms and attempt
more of these for practice.
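The exact pattern from the problem statement is not reproduced in this text, so the pyramid below is only a stand-in chosen for illustration. It is a minimal sketch of how nested loops build a pattern row by row; the function name build_pyramid is hypothetical.

```python
def build_pyramid(n):
    """Return the rows of a centred star pyramid of height n (assumed pattern)."""
    lines = []
    for row in range(1, n + 1):          # outer loop: one pass per row
        line = ""
        for _ in range(n - row):         # inner loop 1: leading spaces
            line += " "
        for _ in range(2 * row - 1):     # inner loop 2: stars in this row
            line += "*"
        lines.append(line)
    return lines


for line in build_pyramid(3):
    print(line)
```

Returning the rows as a list, instead of printing inside the loops, makes the logic easy to check before printing.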
How Many Chocolates?

This question is actually a puzzle that some of you must have solved as a kid. Let's say you
have m Rupees and one chocolate costs c Rupees. The shopkeeper will give you a bar of
chocolate for free if you give him 3 wrappers. Can you determine how many chocolates you
can get with the m Rupees you have?
This might seem like a tough problem to code at first glance, but let's see Sajan break it down
and make it easier.
Well, that doesn't look very tough now, does it? Can you try to write the code for the logic
explained above?
Let's see how Sajan would have written the code for the above problem.
The first takeaway from the segment is that you should always break a problem into smaller
ones.
Suppose you were to write a more generalised version: instead of one free chocolate for every 3
wrappers, the number of wrappers is a variable r taken via input, so you now get a free
chocolate for every 'r' wrappers. Writing the code for m, c and r doesn't seem like a big
problem now, does it?
Well, that is simply because you first solved the simpler problem with r=3 by breaking it down,
and then generalised the solution.
This is a common practice in the industry. You first start with a very easy and simple problem,
solve that and then build upon it. Repeatedly building on the existing code or product in this
way is how you end up with a worthy end product.
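A minimal sketch of the generalised chocolate problem follows. It assumes that m, c and r are positive integers and that r is at least 2; the function and parameter names are chosen for illustration and are not taken from Sajan's solution.

```python
def count_chocolates(money, cost, wrappers_needed=3):
    """Count chocolates bought with `money`, plus free ones earned from wrappers."""
    if cost <= 0 or wrappers_needed < 2:
        raise ValueError("cost must be positive and wrappers_needed at least 2")

    chocolates = money // cost       # chocolates you can buy outright
    wrappers = chocolates            # every chocolate gives one wrapper

    # Keep exchanging wrappers while you have enough for a free chocolate.
    while wrappers >= wrappers_needed:
        free = wrappers // wrappers_needed
        chocolates += free
        # Leftover wrappers plus the wrappers of the free chocolates.
        wrappers = wrappers % wrappers_needed + free

    return chocolates


print(count_chocolates(15, 1, 3))
```

With r fixed at 3 this is the original puzzle; passing a different wrappers_needed gives the generalised version without changing the loop.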
You can also print some interesting patterns using Python code! Didn't see that coming, did
you? Let's see how you can do so in the next segment.
Reverse the Digits

Until now, we saw how to use loops and conditionals. Let's now move a step further and solve
a tougher problem. Here you will reverse a number.
Given a number, say 24312, you have to reverse it and print it, 21342. (hint: you will find %
and // operators useful here). Can you try making a flow diagram and then converting it to
code? Try it below!
If you feel this was a tough problem, let's see Sajan break it down and create a flow diagram
for the same.
Now that you have a flow diagram, the problem seems almost trivial. If you weren't able to
write the code earlier, or are still having a tough time understanding how to code this, try
again now with the flow diagram in hand.
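One way to turn the hint about the % and // operators into code is sketched below. It assumes a non-negative integer as input; the function name reverse_digits is chosen for illustration.

```python
def reverse_digits(number):
    """Reverse the digits of a non-negative integer using % and //."""
    reversed_number = 0
    while number > 0:
        last_digit = number % 10                            # peel off the last digit
        reversed_number = reversed_number * 10 + last_digit # append it to the result
        number = number // 10                               # drop the last digit
    return reversed_number


print(reverse_digits(24312))  # prints 21342
```

Note that trailing zeros are dropped in the result, so 120 becomes 21.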
You just saw how having the right approach to a problem makes it easy, even trivial. Rather
than jumping straight into the code, focus first on deciding how to approach the problem and
which algorithm to use, and build a flow diagram for it. This is how we will always
approach every problem. Let's see if you can solve the next one.
Factorial

The next problem is to find the factorial of a number. The factorial is defined for integers
greater than or equal to zero as:
n! = 1 × 2 × 3 × 4 × 5 × ... × n, with 0! = 1
Now, if you need to find the factorial of a number 'n', you just need to multiply 1 by 2, then
the result by 3, then by 4 and so on until 'n'. This sounds like a job for a loop. So, given a
number n, can you find n!? Let's hear from
Sajan on how to approach it.
Try and code the above-explained logic. If you feel stuck, Sajan will show you how to convert
the above logic into code. You will find loops useful in this problem.
Here is Sajan showing how to write the code for this problem using the above-explained logic.
But before you hit the 'Play' button, do you want to try it out yourself?
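The loop described above can be sketched as follows. This is one possible version and not necessarily the code shown in the video; the function name factorial is chosen for illustration.

```python
def factorial(n):
    """Return n! for an integer n >= 0 by repeated multiplication."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1                      # 0! and 1! are both 1
    for i in range(2, n + 1):       # multiply by 2, 3, ..., n
        result = result * i
    return result


print(factorial(5))  # prints 120
```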
It is now time to move beyond conditionals and loops. In the next segment, you will learn how
to output a number in reverse form.
Alarm Clock

Do you wake up at the same time every morning? Well, the answer is probably no for most of
you. If it is a normal working day where you're swamped with work and meetings starting early
morning, you try to wake up as soon as possible. If it is a weekend or a vacation day, you
mostly want to give yourself some extra rest, right? Now, you might set alarms for this
manually, depending on what day it is. Instead of manually setting up an alarm,
what if you tried to write code that automates the time the alarm goes off? If there was some
way for the alarm to know the kind of day it is, it would automatically ring according to the
preferences set by you. Let's attempt to do this.
So, the problem statement is, you're trying to automate your alarm clock by writing a function
for it. You're given a day of the week encoded as 1 = Mon, 2 = Tue, ..., 6 = Sat, 7 = Sun, and
whether you are on vacation as a boolean value (a boolean object is either True or False. Google
"booleans Python" to get a better understanding).
Based on the day and whether you're on vacation, write a function that returns a time in the
form of a string indicating when the alarm clock should ring.
When not on vacation, the alarm should ring at "7:00" on weekdays and at "10:00" on the
weekends (Saturday and Sunday).
While on vacation, it should ring at "10:00" on weekdays and should not ring on
weekends, i.e., it should return "off". You may have a look at the problem statement from the
coding quiz question.
You must have a pretty good idea of how to draw simple flow diagrams now. Let's try and
convert it to code now.
The approach Sajan discussed used nested if statements. You can do it using if... elif... elif... elif...
else statement as well. Can you try writing the code with that approach? (Hint: You will find the
'and' operator quite useful here).
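One possible version of the if... elif... else approach is sketched below. The function name alarm_time and its parameter names are chosen for illustration; the coding quiz may expect a different signature.

```python
def alarm_time(day, vacation):
    """Return the alarm time for a day (1 = Mon ... 7 = Sun) and a vacation flag."""
    is_weekend = day == 6 or day == 7

    if not vacation and not is_weekend:
        return "7:00"       # working weekday
    elif not vacation and is_weekend:
        return "10:00"      # ordinary weekend
    elif vacation and not is_weekend:
        return "10:00"      # weekday while on vacation
    else:
        return "off"        # weekend while on vacation


print(alarm_time(1, False))  # prints 7:00
print(alarm_time(6, True))   # prints off
```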
So, you have now automated your alarm clock to ring according to your needs so that you can
always catch that extra hour of sleep depending on the day. Feeling lazy already? Well, right
now you have to keep your minds open and brains charged as there are many such questions to
go in the upcoming segments.
Even or Odd

Let’s take up another problem of finding whether a given integer is even or odd. You may have
a look at the problem statement from the coding quiz question.
The aim of this question is to revise the if-else conditions and also to learn to draw a flow
chart for every problem and convert it to code. Let's see Sajan make a simple flow diagram to
show how to solve the question.
Making the flow diagram is the same as representing the solution in a diagrammatic form. You
may choose to write it and explain it in simple words too, but it is easier with a diagram.
Let us now see Sajan write a code for the same diagram that he explained above.
You just saw how to make a simple flow diagram and then code it. Can you write similar
code yourself?
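A minimal sketch of the even-or-odd check using an if-else condition is given below. The function name and the returned strings are assumptions; the quiz may expect a different output format.

```python
def even_or_odd(number):
    """Return "Even" if the integer is divisible by 2, otherwise "Odd"."""
    if number % 2 == 0:      # remainder 0 means the number is even
        return "Even"
    else:
        return "Odd"


print(even_or_odd(10))  # prints Even
print(even_or_odd(7))   # prints Odd
```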
How about the next problem below? Can you draw a flow diagram for this next question and
then convert it to code?
In the next segment, you will see another interesting use case where you can write code for a
highly basic alarm clock.
Swapping

Let’s start writing the code now. Sajan will help you break down the problem and then show
you how to code it on the console. Let’s start with a very simple problem first. You will be
given two integers x and y. You have to swap values stored in them. Have a look at the detailed
problem statement in the coding quiz question at the end of this page.
Having understood the logic behind the problem, let's now take a look at how to code it using
the coding console.
The approach explained by Sajan is quite simple and straightforward. Can you think of a
method that doesn't involve making a new variable or using a new container? Try your method
below!
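For comparison, here is a sketch of three ways to swap two integers. The first uses a temporary variable; the other two are possible answers to the question above. The function names are chosen for illustration.

```python
def swap_with_temp(x, y):
    """Swap using a temporary variable."""
    temp = x
    x = y
    y = temp
    return x, y


def swap_with_unpacking(x, y):
    """Swap using Python's tuple unpacking; no named temporary variable needed."""
    x, y = y, x
    return x, y


def swap_with_arithmetic(x, y):
    """Swap two integers using only addition and subtraction."""
    x = x + y
    y = x - y    # y now holds the original x
    x = x - y    # x now holds the original y
    return x, y


print(swap_with_unpacking(3, 7))  # prints (7, 3)
```

Tuple unpacking is the idiomatic choice in Python; the arithmetic version is mainly of interest as a puzzle.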
In the next segment, you will continue on your journey of basic Python programming. You will
learn how to determine whether a number is even or odd by writing a small piece of code.
Real Life Scenarios of Programming - II

The next problem on our plate is the food delivery application platforms. If you have ever
placed a food order online, you might have noticed that you only see the restaurants that will
deliver an order to you, at the top of the display list, from the thousands of restaurants registered
with the application. How do you think this happens? How would you do it if you were asked
to solve this problem?
As you realised, you don't only have to design the flow or think about storing the data; you also
have to think about what additional information is needed from the users or from some other
source of data. In this case, we had to take GPS locations and map data. Once this is stored, we
can later analyse the behaviour of different users, for example to see whether they tend to order
from restaurants closer to them, to help make better business decisions.
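As a rough illustration of the filtering step, the sketch below keeps only the restaurants within a delivery radius of the user. The restaurant names, the coordinates, the delivery_radius_km value and the straight-line distance on a flat grid are all assumptions made for this example; a real application would work with GPS coordinates and map data.

```python
import math

# Hypothetical data: (name, x_km, y_km) positions on a flat grid.
restaurants = [
    ("Spice Hub", 1.0, 2.0),
    ("Pizza Point", 8.0, 6.0),
    ("Dosa Corner", 2.5, 0.5),
]


def deliverable_restaurants(user_x, user_y, restaurants, delivery_radius_km=3.0):
    """Return names of restaurants within the delivery radius, nearest first."""
    nearby = []
    for name, x, y in restaurants:
        distance = math.hypot(x - user_x, y - user_y)   # straight-line distance
        if distance <= delivery_radius_km:
            nearby.append((distance, name))
    nearby.sort()                                       # nearest restaurant first
    return [name for distance, name in nearby]


print(deliverable_restaurants(0.0, 0.0, restaurants))
```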
You should have a pretty good idea by now about how to draw flowcharts depicting the
solutions to different problems. Let us conclude this discussion with this slightly complex
problem about any dating website or application. Can you think of some way to design
an application similar to Tinder?
Here are a few problem statements for which you can try to design a flow diagram. You
can make a few assumptions, but don't forget to state them so that your peers and the TAs can
comment on them.
It is now time to get started with writing some Python code! In the next segment, you will learn
how to write a basic implementation of swapping two integers - a highly common use case in
the world of programming.
Real Life Scenarios of Programming - I

The first step to solving any problem is to think about what you are going to do. Only after
deciding the right process and flow of the solution do you explain that process to the
computer using a programming language. So, no matter what language you write your code in
to explain to the computer what you want it to do, the first step is always deciding the flow.
This is called writing pseudocode.
Let’s listen to Sajan as he shows how to break down some complex real-life problems into
smaller problems and write pseudocode for those using flow diagrams.
Our first problem for this discussion is the implementation of the back button. The back button
is not only used in browsers but also in smartphones and different applications. Ever wondered
how that works? Of course, we will not be designing the real back button for different
applications or browsers, but we can still try and break down the basic implementation. This
basic flow is then worked on and improved, for example by applying the same concept to
different forms of data.
Let's see how you will solve the implementation of the back button.
Here you saw a very rudimentary application of the back button using stacks. Another concept
similar to stack is a queue. You can read more about stacks and queues here. We saw how the
back button can be implemented as a stack using lists. This is also how the 'undo' function
works in your text editors. The programs are built upon many times to improve performance
and speed but the core idea never changes.
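The stack idea can be sketched with a plain Python list, where append pushes a page and pop removes the most recent one. This is a minimal illustration; the function names and page names are hypothetical.

```python
def visit(history, current_page, new_page):
    """Push the current page onto the history stack and move to the new page."""
    history.append(current_page)
    return new_page


def go_back(history, current_page):
    """Pop the most recently visited page; stay put if the stack is empty."""
    if history:
        return history.pop()
    return current_page


history = []                              # the stack of previously visited pages
page = "home"
page = visit(history, page, "search")
page = visit(history, page, "product")
page = go_back(history, page)             # back to "search"
page = go_back(history, page)             # back to "home"
page = go_back(history, page)             # nothing left, stays on "home"
print(page)
```

The last page pushed is the first one popped, which is exactly the last-in, first-out behaviour of a stack.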
You must have realised by now that YouTube keeps a count of the unique views each video
gets. How do you think this is done? Let us see Sajan explain this algorithm, which also
tracks whether you have already viewed the video.
Now you know why different websites store cookies or collect IP addresses or even ask you to
sign up or link accounts before viewing the content.
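The algorithm itself is not spelled out in this text, so the following is only a sketch of one common idea: keep a set of viewer identifiers so that repeat views by the same viewer are counted once. The identifiers and the function name are hypothetical.

```python
def count_unique_views(view_events):
    """Count distinct viewers in a sequence of viewer identifiers."""
    seen_viewers = set()
    for viewer_id in view_events:
        seen_viewers.add(viewer_id)    # adding an existing id changes nothing
    return len(seen_viewers)


views = ["user_1", "user_2", "user_1", "user_3", "user_2"]
print(count_unique_views(views))  # prints 3
```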
With this much data being saved, the next big problem is to search the entry of interest from
this accumulated ton of data. Our next problem is the same. While shopping online on a
platform you enter the product name and the platform finds all the relevant results for
displaying, or while using any other filter as well. How would you implement this functionality
on a platform?
Let us look at a few more such problems in the next segment.
Although a 'search bar' in real life has been further developed to perform more complex
searches than just basic string matching, this should serve as a very good starting point to
further develop a more sophisticated search bar. This solution might seem very trivial now, but
that is only because we selected the right data structure. Selecting the right data structure to
store the data for the problem at hand can make the difference between an extremely easy
problem and a hard one.
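A basic string-matching search can be sketched as below. It assumes the product names are stored in a list of strings, which may differ from the data structure chosen in the video; the product names and the function name are hypothetical.

```python
def search_products(query, product_names):
    """Return the product names that contain the query, ignoring case."""
    query = query.strip().lower()
    if not query:
        return []                       # an empty search matches nothing
    matches = []
    for name in product_names:
        if query in name.lower():       # basic substring matching
            matches.append(name)
    return matches


catalogue = ["Running Shoes", "Leather Belt", "Shoe Polish"]
print(search_products("shoe", catalogue))
```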
Try the following easy problem statements and see if you can come up with a flow
diagram to solve each problem. Remember to put some thought into what data you will be
saving and in what format.
Basic Refresher

In this session, we will be learning how to approach different problems and develop a strong
foundation in logic building. Here are a few questions for your revision. Kindly revise the basic
conditionals, loops and operators from the previous module, Introduction to Python, before
starting.
You have already been using the coding console in previous modules. With an increase in
graded questions and diving headfirst into coding, it is important to know the console you will
be using to code.
Hope that made you familiar with the console. If you still have any doubts or face any difficulty
while attempting the coding questions later in the module, you can refer to the following PDF
that briefs you about the platform coding console and its various functionalities.
In the next segment, you will begin with learning about some real-life scenarios where Python
programming is used.
Session Overview

In this session
In this session, you will first revise the basics of conditional statements, looping syntax, and
operators in Python. This session focuses on building a strong sense of logic and the importance
of making a flow diagram or writing pseudocode before solving any problem. You will also
see how to break down an intimidating problem and approach it with sound logic.
We will do this by:
First, breaking down a few famous industry application problems and demonstrating how they
can be solved using simple concepts that you have learned in the previous module, Introduction
to Python.
Next, taking up a few problems, starting from an easy one and slowly increasing the
difficulty. It is recommended that you solve all of them and also try them on your own using
approaches different from the ones that are illustrated.
We will consider the following problems in this session:
Swapping
Even or Odd
Alarm Clock
Factorial
Reverse The Digits
How Many Chocolates
Print the Pattern
Session Objectives
At the end of this session, you will be able to:
Understand how to break down complex real-life problems into smaller problems
Understand how to draw the use case for a given problem
Understand how to write pseudocode for a given coding problem
Module Introduction

Welcome to the module 'Programming in Python - I'.


Programming is one of the most fundamental skills you will need as a data scientist/analyst.
Among the numerous tools available for analysing data, we choose Python as it is robust,
has an intuitive syntax, and is a modern and widely used language.
You will see a module mindmap that will give you a better idea of the structure of the entire
module.
In this module
The module will be focused on developing problem-solving ability using Python and
familiarizing you with programming. This will mostly be done by solving various problems
and a lot of practice.
Sajan will introduce himself and give a brief overview of the learning objectives in this module
and the next.
This module has been broken down into four major portions:
Basic Programming
Here you will learn about how to think and approach coding questions. This portion will be
mostly based on building a strong sense of logic using only the basics.
Lists
In this session, you will solve questions based on lists. You already know most of the basic
syntax from the previous module. Here you will be challenged to use those learnings in an
effective way to tackle different problems.
Strings
Here you will solve questions based on strings. You already know the basic syntax from the
previous module. In this session, you will see how to leverage those to try and solve different
problems and build a strong logical thought process.
Other Data Structures
In this session, you will solve questions based on the dictionary, which is one of the most
important data structures for a data scientist. You will be able to leverage your learnings from
the previous sessions where the keys and values of a dictionary can be analogous to lists and
strings.
Module Objectives
At the end of this module, you will be able to:
Understand how to break down complex real-life problems into smaller problems
Understand how to draw the use case for a given problem
Understand how to write pseudocode for a given coding problem
Pre-requisites
We expect you to have gone thoroughly through the previous module Introduction to Python.
Guidelines for in-module questions
The in-video and in-content questions for this module are not graded.
Guidelines for graded lab questions
The lab questions at the end of the module are graded.
People you will hear from in this session
Faculty
Sajan Kedia
Data Science Lead, Myntra
Sajan did B.Tech. & M.Tech. in Computer Science from IIT BHU. During his Masters, he
worked on Data Mining & published research papers on the topic. Currently, he is leading the
Data Science Team of Pricing at Myntra, building AI systems for personalised pricing. He has
very good expertise in Big Data technologies, Machine learning, and NLP.
Summary

Now that you have learnt all the basic concepts and techniques in Python, you are ready to
move further along on the programme. But before doing that, let's revise the concepts learnt as
part of this module using the practice exercise given below.
In this exercise, you will need the concepts of object-oriented programming, control structures
and fundamental arithmetic operations.
The concepts learnt as part of this module would be used to build an Ice-Cream Sundae ordering
application, through which a user will be able to order an ice cream or a customisable sundae.
We will program the application to dynamically calculate the cost of the order based on the
user's customisation to the dessert.
Download the Jupyter Notebook linked below and get started. All the best!
It is highly recommended that you give the practice exercise your best shot. If you are facing
any challenge with respect to any particular component, you could look at the solution to the
exercise given below:
Now, let's summarise what you learnt in this session.
Overall in this module, in the first session, you started your learning journey in Python by
writing your first program and then went on to see the different data types supported by Python
and various arithmetic and string operations using the same.
The second session was all about grasping the concepts of the different data structures
supported by Python, where you learned in detail about lists, tuples, sets and dictionaries.
The third session was about control structures and functional programming in Python, and the
essential role that decision-making statements and loops play.
Finally, in this session, you learnt about classes, objects, methods and various OOP
methodologies. You started with what classes and objects are, how the two are related,
and how you implement methods. Moving further, you learnt in detail about one of the most
crucial object-oriented programming methodologies, 'inheritance': how Python supports
it, and how a child class overrides a method of its parent.
Additional Reading:
With this module, you have now learnt the fundamental skills and syntax associated with the
Python language. This knowledge is enough to learn the next main skill in Python: how to use
Python for data-related tasks, which is the central point of discussion for your next main
module, Python for Data Science.
Apart from this, you can also use Python for general problem-solving. Think of it like this:
when you are learning a new spoken language, you first learn the alphabet, basic words, and
the grammar (syntax) that binds them together. This knowledge then enables you to create
structured sentences that help you express your thoughts. The learnings in this module were
that alphabet, those basic words, and that syntax. The next step would be to leverage them
to create structured programs that could help you solve problems. You have, of course,
already solved some basic problems in this module itself, like repeating the same task a large
number of times using loops, filtering out some values based on certain conditions using the
filter() function, and even creating classes and objects that enable you to think as if you are
working with real-life entities.
Class Inheritance and Overriding

In the previous segment, we have learnt about methods and functions in Python. Next, we will
be learning about inheritance and overriding. Let's start by understanding what inheritance is
and how it works.
As the name suggests, inheritance means to receive something. In Python as well, inheritance
has a similar meaning. Let's say class A is derived from (inherits from) class B; in this case,
class A would have all the properties of class B.
Inheritance helps with code reusability. Just like a child inherits their parents' qualities, the
class that inherits properties is known as a child class, and the class from which it inherits the
properties is called the parent class. Let's take a look at an example to understand this better.
In the example shown above, you saw that the rectangle and circle classes inherit the properties
of the parent class shape, and because of this parent-child relationship, you did not need to
define set_colour and colour_the_shape again in the circle and rectangle classes.
One more thing to notice is the calculate area method. Since the calculation is unique to each
shape, the method was only declared in the parent class and then defined in each child class
as per that class's functionality. This is nothing but method overriding. In a similar way, a class
can define special methods such as __str__ or __len__ to customise how inbuilt functions like
str() and len() behave on its objects.
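The example from the video is not reproduced in this text, so the following is a reconstruction under assumptions. It uses the names mentioned above (a shape parent class, rectangle and circle child classes, set_colour, colour_the_shape and a calculate area method); the method bodies, constructor parameters and returned message are hypothetical.

```python
import math


class Shape:
    colour = None                       # default until set_colour is called

    def set_colour(self, colour):
        self.colour = colour

    def colour_the_shape(self):
        return "Colouring the shape " + str(self.colour)

    def calculate_area(self):
        pass                            # placeholder; each child class overrides this


class Rectangle(Shape):                 # Rectangle inherits from Shape
    def __init__(self, length, breadth):
        self.length = length
        self.breadth = breadth

    def calculate_area(self):           # overrides Shape.calculate_area
        return self.length * self.breadth


class Circle(Shape):                    # Circle inherits from Shape
    def __init__(self, radius):
        self.radius = radius

    def calculate_area(self):           # overrides Shape.calculate_area
        return math.pi * self.radius ** 2


r = Rectangle(2, 3)
r.set_colour("red")                     # inherited from Shape, not redefined
print(r.colour_the_shape())
print(r.calculate_area())
```

Both child classes reuse set_colour and colour_the_shape unchanged, while each supplies its own calculate_area.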
That brings us to the end of the segment on inheritance and overriding. Before moving forward
please attempt the question given below:
Additional Reading:
If you want to learn more about inheritance then click on the link given below:
Inheritance
Methods

In the last two sessions, you used a lot of in-built methods in the case of lists, tuples and other
data structures. Methods are essentially functions that are responsible for implementing a
certain functionality when they are used in code. This segment will help you learn how these
methods are implemented in Python.
The self parameter in the __init__ method refers to the instance being created. It links the
attributes defined in the class to the values passed as arguments while creating an instance of
the class, as shown in the image below:
In the execution, you can see that the E1 employee object is created and that calling the update
method both updates the age of the employee and returns the updated age.
You can write a similar method to update the company code as well; however, there would be
a critical flaw if you did so, because a class variable should not be handled within an ordinary
method that can be accessed/changed by any instance object. There are separate methods called
class methods to do this. Let's understand more about these methods.
A class method is defined using a class method decorator (@classmethod) and takes a class
parameter (cls) that has access to the state of the class. In other words, any change made using
the class method would apply to all the instances of the class.
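A minimal sketch contrasting an ordinary method with a class method is given below. The update method follows the description above; the name update_company_code and the second employee are hypothetical additions for illustration.

```python
class Employee:
    company_code = "EMZ"                  # class variable shared by all instances

    def __init__(self, age, name, eid):
        self.age = age
        self.name = name
        self.eid = eid

    def update(self, age):
        """Ordinary method: changes only this employee's age."""
        self.age = age
        return self.age

    @classmethod
    def update_company_code(cls, new_code):
        """Class method: cls is the class itself, so every instance sees the change."""
        cls.company_code = new_code


E1 = Employee(24, "Ravi", 101)
E2 = Employee(30, "Asha", 102)

print(E1.update(25))                      # prints 25
Employee.update_company_code("XYZ")
print(E1.company_code, E2.company_code)   # prints XYZ XYZ
```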
Food for thought: Methods vs Functions
A Python function is called on its own, whereas a method is called on an object. Since we call a
method on an object, it can access the data within it.
A 'method' may alter an object’s state, but a Python 'function' usually only operates on the
object and then returns a value.
Additional Resource:
For a better understanding of the difference between a class method and a static method.
Note: Please be aware that the environment used for running Python is different from the one
we have been using, but the concept is explored really well.
Class and Objects

In the previous sessions, you must have heard about using a list object. In this session, we will
first understand 'what exactly is an object in Python?' and then, we will explore the concept of
object-oriented programming, which will help us understand the language better.
Let's take a look for a better understanding of the concept.
The above provides a basic introduction to classes and objects. These concepts are a little
complex to digest but you will get a better understanding after looking at a few examples.
You have learnt about classes from the previous video. You can now try to create an Employee
class with three attributes, namely, age, name and employee id.
The class keyword is used to define a class, and the __init__ method is used to initialise the
attributes that define our class. Here, the attributes 'name', 'age' and 'employee id' define our
employee class, and using self, you assign these attributes inside the __init__
method.
You will learn about the significance of 'self' in future segments. Take a look at
the code snippet below and try creating your own class:
It is very important to understand the __init__ method. It is this method that is called
automatically whenever an object of the class is created; it also determines the number of values
that are to be passed.
Having created the employee class, you can now create an object of this class by just passing
the details of the employee as arguments. Take a look at the details and code snippet given
below:
E1 = Employee(24,'Ravi',101) → This would create E1 with age = 24, name = Ravi, and eid
= 101.
This object E1 is nothing but an instance of the class Employee. When you try to apply the
type function on E1, it will return the class to which it belongs.
The next question that arises is whether it is possible for this employee class to contain certain
attributes such as the company code that is common to all the employees? These attributes are
class variables, and they are common to all instances of the class.
So, let’s add a class variable called company code to our employee class using the code given
below:
class Employee:
    company_code = "EMZ"

    def __init__(self, age, name, eid):
        self.age = age
        self.name = name
        self.eid = eid
This would make the company code a common property of all the employees. On creating an
employee instance, the company code attribute and its value are assigned automatically to the
employee as shown below:
E1 = Employee(24, 'Ravi', 101)
E1.company_code

'EMZ'
You cannot simply use E1.company_code = 'XYZ' to change the company_code. This would
change the company_code of only the E1 employee, because it creates a separate attribute on
that one object. Since the company code applies to all the employees, you need to write:
Employee.company_code = 'XYZ'
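The difference between the two assignments can be seen in the short sketch below. The second employee and the code 'ABC' are hypothetical values added for illustration.

```python
class Employee:
    company_code = "EMZ"                  # class variable

    def __init__(self, age, name, eid):
        self.age = age
        self.name = name
        self.eid = eid


E1 = Employee(24, "Ravi", 101)
E2 = Employee(30, "Asha", 102)

# Assigning through the instance creates an attribute on E1 only.
E1.company_code = "XYZ"
after_instance_change = (E1.company_code, E2.company_code, Employee.company_code)
print(after_instance_change)              # prints ('XYZ', 'EMZ', 'EMZ')

# Assigning through the class changes the shared value.
Employee.company_code = "ABC"
after_class_change = (E1.company_code, E2.company_code, Employee.company_code)
print(after_class_change)                 # prints ('XYZ', 'ABC', 'ABC')
```

E1 keeps 'XYZ' even after the class-level change, because its own instance attribute now hides the class variable.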
In the upcoming segments, you will be learning about the different methods and functions that
can be applied to the data in Python.
Session Overview

Welcome to the fifth session on 'Object-Oriented Programming in Python'.


In this session, you will learn about the object-oriented programming concepts in Python,
which will help you in building different complex applications. Let's hear from Behzad about
what object-oriented programming is all about.
In this session, we will be covering the important concepts of object-oriented programming,
which you may come across while working in the field of Software Engineering. The
object-oriented programming paradigm enables us to think in a natural way, as it mimics the
working of real-life entities or objects.
Note: Please keep in mind that object-oriented programming is a world in itself, and we are
learning this paradigm in order to get exposed to software engineering skills that might be
required as part of the complex application building process. This might be a reasonably
challenging concept to grasp, so please do not feel disheartened if you find it difficult to follow
now. You can always come back later to review the concepts, and with a few iterations, things
will start making sense. Essentially, the depth of topics covered in this module is enough for
you to work in the field of Data Science, but for the curious ones who wish to know more, we
will be sharing curated resources at the end of this module.
You can open the link below to download the Python notebooks used in this session. We
recommend that you keep executing the commands on your computer in pace with the lecture.
You can parallelly try experimenting with other commands you may have in mind.
Session Objectives
By the end of this session, you will be able to:
Understand the role of object-oriented programming in writing structured applications;
Implement objects and classes in Python;
Understand the different methods and functions that can be applied to data in Python;
Define and implement inheritance in Python; and
Understand the concept of overriding in programming.
People you will hear from in this session
Faculty
Behzad Ahmadi
Data Scientist at Walmart Labs
Behzad holds a Doctor of Philosophy (PhD) in Electrical and Computer Engineering
(communication and signal processing) from the New Jersey Institute of Technology. He has been
working in the software engineering and data science field for the last 12+ years. Behzad
currently employs his machine learning skill set to create retail graphs for Walmart Labs.
Summary

Let's summarise what you learnt in this session.


To conclude, in this session you learnt about the various control structures and functions
supported by Python like the if-else constructs, nested if constructs, various loop concepts, and
then finally learnt about functional programming in Python.
In the next session, you will be learning about object-oriented programming concepts in
Python, where you will dive deep into the concepts of classes, objects and object-oriented
programming methodologies.
Additional Reading
If you want to know more about functions and comprehensions in Python then click on the
links given below:
1. Functions-A Byte of Python
2. Defining functions of your own
3. Comprehensions explained visually
4. Python 3 idioms test
Practice Exercise III - Map, Filter and Reduce

By now you are well versed with the concepts of basic Python programming. Given below are
some practice questions. Download the Jupyter Notebook given below and solve these
questions before moving on to the practice exercise:
Once you are done solving the questions, refer to the solution notebook below to find the
correct answers:
Now let's test our newly acquired Python skills on some practice questions (un-graded).
Attempt the questions on map, filter and reduce functions.
Functions in Python

In the previous segments, you saw how to take two lists and perform an element-wise
subtraction. Now, imagine you are trying to build an app which the general public could use.
For such an app, the 'expense calculator function' would be reused. Now instead of writing or
copying the code each time you could build it as a function that is similar to methods you have
seen earlier. Functions serve the purpose of reusable and customisable methods.
By the end of this segment, you would be able to create your own functions which could be
customized to perform a task based on your preference.
Syntax of a function:
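The example from the video is not reproduced in these notes. As a minimal sketch, a function definition consists of the def keyword, a name, parameters in parentheses, a colon and an indented body. The function name always_four below is a hypothetical stand-in chosen for illustration:

```python
# General shape of a function definition:
#
# def function_name(parameters):
#     statements
#     return value

# Hypothetical example: ignores its input and always returns 4
def always_four(x):
    return 4

print(always_four(10))       # 4
print(always_four("hello"))  # 4
```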
The function which was built was not really useful. It returned the number four no matter what
the input was. Let's build a function that actually serves a purpose.
Now, based on your understanding attempt the coding question given below:
As you saw in the 'factorial function', functions can take multiple parameters. When the
functions get complicated, programmers often get confused while passing parameters to a
function. Let's now look into more complex examples to build a stronger foundation of
functions.
You heard about arguments and parameters a couple of times in the video and might be
feeling a little fuzzy about them, so here is a refresher: function parameters are the variables used
while defining the function, and the values passed while calling the function are the function
arguments.
There are four types of function parameters:
Required parameters: These parameters are necessary for the function to execute. While calling
an inbuilt max() function, you need to pass a list or dictionary or other data structure. This
means that a sequence is a required parameter for this function.
Default arguments: These parameters have a default value and will not throw any error if they
are not passed while using the function.
Keyword parameters: These parameters are expected to be passed using a keyword. In the
example of printing a list, you saw that passing end = ',' prints the elements of the list separated
by a comma instead of the default newline format. This is a classic example of a keyword
argument, the value of which is changed by using the end keyword and specifying the value.
Variable-length parameters: These enable the function to accept any number of arguments.
Now, let’s try to write a function that takes two lists as the input and performs an element-wise
subtraction. It is simple: first, you define your function with two parameters:
def fun(L1,L2):
Now, from your previous experience, you know how to perform element-wise subtraction. So,
in order to build it into a function for the same, you just need to put the code inside a function
definition:
def list_diff(list1, list2):
    list3 = []
    for i in range(0, len(list1)):
        list3.append(list1[i] - list2[i])
    return list3

L1 = [10, 20, 30, 24, 18]
L2 = [8, 14, 15, 20, 10]

print(list_diff(L1, L2))
Note: The return statement is very crucial in the whole function definition, as it is responsible
for returning the output of the function.
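To make the four parameter types described earlier in this segment more concrete, here is a minimal sketch. The function describe_order and its parameter names are hypothetical and chosen only for illustration:

```python
# Hypothetical function illustrating the four kinds of parameters
def describe_order(item, quantity=1, *extras, sep=", "):
    # item     -> required parameter: the call fails without it
    # quantity -> parameter with a default value of 1
    # *extras  -> variable-length parameter, collected into a tuple
    # sep      -> can only be passed by keyword because it follows *extras
    parts = [f"{quantity} x {item}"] + list(extras)
    return sep.join(parts)

# The values passed in these calls are the arguments
print(describe_order("pen"))                                        # 1 x pen
print(describe_order("pen", 3))                                     # 3 x pen
print(describe_order("pen", 3, "gift wrap", "express", sep=" | "))  # 3 x pen | gift wrap | express
```

Calling describe_order() with no arguments raises a TypeError, because the required parameter item has no default value.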
Based on your understanding of functions, attempt the quiz given below:
Lambda functions are another way of defining functions; they are used for small pieces of
functionality that occur while implementing something more complex. These functions can take
multiple parameters as input but can only contain a single expression, whose value is returned.
The format of a lambda expression is as follows:
function_name = lambda input_parameters : expression
For example, diff = lambda x, y: x - y is a lambda function to find the difference of two elements.
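As a short sketch, the lambda above behaves like an ordinary function defined with def. The names diff_def and pairs are hypothetical and used only for this illustration:

```python
# Lambda function from the text
diff = lambda x, y: x - y
print(diff(10, 8))  # 2

# The same functionality written with def
def diff_def(x, y):
    return x - y

print(diff_def(10, 8))  # 2

# Lambdas are often passed directly to other functions, for example as a sort key
pairs = [(1, "b"), (2, "a")]
print(sorted(pairs, key=lambda p: p[1]))  # [(2, 'a'), (1, 'b')]
```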
In the upcoming segment, you will understand the concept of map, reduce and filter.
Comprehensions

Comprehensions are syntactic constructs that enable sequences to be built from other sequences
in a clear and concise manner. Here, we will cover list comprehensions, dictionary
comprehensions and set comprehensions.
Using list comprehensions is much more concise and elegant than explicit for loops. An
example of creating a list using a loop is as follows:
L1 = [10, 20, 30, 24, 18]
L2 = [8, 14, 15, 20, 10]
L3 = []
for i in range(len(L1)):
    L3.append(L1[i] - L2[i])
L3
You know this code from our earlier discussions. The same code using a list comprehension is
as follows:
# using list comprehension
L1 = [10, 20, 30, 24, 18]
L2 = [8, 14, 15, 20, 10]
L3 = [L1[i] - L2[i] for i in range(0, len(L1))]
L3
You can use list comprehension to iterate through two lists at a time.
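One way to iterate through two lists at a time is the built-in zip() function, which pairs up the elements of the lists position by position. This is a sketch of one possible approach and may differ from the one shown in the video:

```python
L1 = [10, 20, 30, 24, 18]
L2 = [8, 14, 15, 20, 10]

# zip(L1, L2) yields the pairs (10, 8), (20, 14), and so on
L3 = [income - expense for income, expense in zip(L1, L2)]
print(L3)  # [2, 6, 15, 4, 8]
```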
Apart from iterating through two lists, you also saw dictionary comprehension. It is similar to
list comprehension in its construction.
Let’s look at an example to understand dictionary comprehension better. First, using the
traditional approach, let’s create a dictionary that has the first ten even natural numbers as keys
and the square of each number as the value to the key.
# Creating a dictionary consisting of even natural numbers as keys and the square of each number as the value
ordinary_dict = {}
for i in range(2, 21):
    if i % 2 == 0:
        ordinary_dict[i] = i**2
print(ordinary_dict)
The same code in terms of dictionary comprehension is as follows:
#Using dictionary comprehension
updated_dict = {i : i**2 for i in range(2,21) if i % 2 ==0}
print(updated_dict)
You can see that the comprehension is inside curly brackets, representing that it is dictionary
comprehension. The expression inside the brackets first starts with the operation/output that
you desire, and then loops and conditionals occur in the same order of the regular code.
The comprehension technique can work on sets as well. Let's look at an application of set
comprehensions.
You saw the use of set comprehension to create a small application that returns the vowels
in a name.
word = input("Enter a word : ")
vowels = {i for i in word if i in "aeiou"}
vowels
Now, based on your learning from the previous segments, attempt the coding question given below:
Loops and Iterations

In the previous session, you learnt about lists and various other data structures. Let’s look at a
small example where you have a person’s income and expense data across five months in the
form of a list, and you want to compute his savings across these five months. You may be
thinking to do this manually by taking the first elements from the two lists and subtracting
them, then again taking the second elements and subtracting, and so on. This may look simple,
but let’s say this task has to be done for 10 or 20 years timeframe, in that case, would you have
the same strategy?
In cases where we need to repeat a pre-set process n number of times the concept of iteration
comes in handy, as you are repeating the same operation multiple times. With this in mind,
let’s learn more about it.
While loop
As you saw, the while loop keeps on executing as long as the defined condition is true. Once the
condition becomes false, the loop is exited. Such programs are used in everyday applications such
as the pin code lock:
#Let's create a pin checker which we generally have in our phones or ATMs
pin = input("Enter your four digit pin: ")
while pin != '1234':  # Pre-set pin is 1234
    pin = input('Invalid input, please try again: ')
print("Pin validation successful.")
But usually, you only get three attempts to unlock anything with a pin code. Can this condition
be written in the while loop?
To add the functionality of the number of attempts, you use a counter variable. The code
is given below:
# Now, to add a maximum number of tries allowed, we'll add an 'if' check
import sys  # required for exiting the code and displaying an error
pin = input("Enter your four digit pin: ")
attempt_count = 1
while pin != '1234':
    if attempt_count >= 3:
        sys.exit("Too many invalid attempts")  # error code
    pin = input('Invalid input, please try again: ')
    attempt_count += 1
print("Pin validation successful.")
That was about the while loop, let's move on to the next looping technique that is the 'for' loop.
'For loop' Control Structure:
By now, you have understood the syntax of a for loop.
for val in seq:
    statements
A few things to note in the syntax include:
seq, which represents a sequence; it can be a list, a tuple or a string. To put it in a simple way
it can be any iterable object.
The in keyword, which you may recall as the membership operator that checks for the
presence of an element in a sequence.
But what role is the 'in' operator playing here?
When you say for val in seq, the keyword in is part of the loop syntax rather than a membership
test: val is assigned the first value in the sequence, the set of statements in the code block is
executed once, control returns to 'for', and val then shifts to the next value.
Let’s say you are at the last element in the seq. Here, the for block executes, and now, when
control goes back to for val in seq, there are no more elements to assign to val, and hence the
loop stops executing.
The for loop can iterate over any iterable object; we have seen a few examples of this already.
The dictionary data type is also an iterable object. Let's learn how the for loop can be applied over
dictionaries.
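As a minimal sketch, a for loop over a dictionary visits its keys, while the items() and values() methods give access to the key-value pairs and the values. The dictionary marks below is hypothetical sample data:

```python
# Hypothetical sample data: student names mapped to marks
marks = {"Asha": 82, "Ravi": 75, "Meera": 91}

# Looping over a dictionary directly yields its keys
for name in marks:
    print(name)

# items() yields (key, value) pairs
for name, score in marks.items():
    print(name, score)

# values() yields only the values
total = 0
for score in marks.values():
    total += score
print(total)  # 248
```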
Now that you have learnt about the 'for statement', can you answer our initial question of
calculating savings using a person’s income and expenses data? Let’s try this out.
Assume you have -
L1 = [10, 20, 30, 24, 18] (in thousands)
L2 = [8, 14, 15, 20, 10]
What you were doing manually is subtracting the first element of each list and then the second
element, and so on. In other words, it is L1[i] - L2[i].
Since you need these indexes, let’s create a dummy array of five elements that represent the
index positions L3 = [0, 1, 2, 3, 4].
Let’s implement the for loop using the list L3:
L1 = [10, 20, 30, 24, 18]
L2 = [8, 14, 15, 20, 10]
L3 = [0, 1, 2, 3, 4]
for i in L3:
    L3[i] = L1[i] - L2[i]
Here, you are updating elements of L3 at each iteration. Now think about whether you can use
the same approach for a list with 1000 elements?
Note: Here 'i' is the loop variable and L3 is the iterable. An iterable is any Python object capable
of returning its members one at a time, permitting it to be iterated over in a for loop.
Now, let's revisit what you have learnt above. The syntax of the range function is simple.
Different implementations of range function include:
range(n): This creates a range object that has elements from 0 to n-1 (both inclusive).
range(m,n): This creates a range object that has elements from m to n-1 (both inclusive).
range(m,n,t): This creates a range object that has elements from m up to, but not including, n with
a step count of t. In other words, the range object has elements m, m+t, m+2t and so on. If t is
negative, the elements would be decreasing, and that’s exactly what happens in range(100, 0, -1).
An important thing to note here is that you saw a way to create lists using the list function on
the range object. list(range(0, 10)) would create a list of 10 elements.
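The following sketch shows the three forms of range and then uses range(len(L1)) to compute the savings without writing the index list by hand, so the same approach scales to a list with 1000 elements. The variable name savings is chosen for illustration:

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # [2, 3, 4, 5]
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]

L1 = [10, 20, 30, 24, 18]
L2 = [8, 14, 15, 20, 10]

# range(len(L1)) generates the index positions 0, 1, 2, 3, 4
savings = []
for i in range(len(L1)):
    savings.append(L1[i] - L2[i])
print(savings)  # [2, 6, 15, 4, 8]
```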
Now, attempt the coding question given below based on your understanding of iterations.
Suppose a situation arises where the developer has no idea about the number of iterations he
needs to have. Such iterations are called event-based iterations. Let's learn about them in detail.
And before moving forward let's use the learnings of this segment, to create a small app which
can find the prime numbers in the given list of numbers. It is highly recommended that you try
to build this app on your own. If you feel that you are not comfortable with the syntaxes yet,
try to come up with a logical flow for the app. After you give it a try, look at the following to
understand the solution.
Note: 1 is not a prime number, so the code written is incorrect. To rectify this output,
you need to make a small change (change the start point of the range function from 1 to 2) in
the first line of the code as given below.
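The code from the video is not reproduced in these notes. A minimal sketch of one possible solution, which may differ from the one shown in the video, is given here; the function name find_primes and the sample list are hypothetical:

```python
def find_primes(numbers):
    """Return the prime numbers found in the given list."""
    primes = []
    for n in numbers:
        if n < 2:
            # 0, 1 and negative numbers are not prime
            continue
        is_prime = True
        # Try every possible divisor from 2 up to n-1
        for d in range(2, n):
            if n % d == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(n)
    return primes

print(find_primes([1, 2, 3, 4, 9, 11, 15, 17]))  # [2, 3, 11, 17]
```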
Let's learn about another important concept, comprehensions in Python, in the
upcoming segment.
Decision Making

If statements are a crucial part of the decision making constructs in Python. The construct of
the statement is inspired by the natural language. For instance:
if it rains today then I would order an umbrella.
In the example above, there is a logic to the decision taken to order an umbrella. If a condition
is met then the action is executed. The if construct in Python is exactly the same. You will learn
to code the condition that is to be checked.
Having understood relational operators, let’s now look at the if-else construct and the role of
these relational operations in its implementation.
In the example discussed above:
x = 450
if x < 99:
    print(x, "is less than 99")
else:
    print(x, "is greater than or equal to 99")
x < 99 is a relational condition that evaluates to True or False based on the value of x. If this
condition is true, the set of statements under the if block will be executed; otherwise, the
statements under the else block will be executed.
Important things to note here are as follows:
The colon indicates the start of a block.
The indentation defines the scope of the if or else statement.
Either the code under the if block or the code under the else block is executed, not both.
So now that you have understood the basic if-else construct, can you try to write a code to
return YES if x lies in the range of 1000 to 1100, else return NO?
If you break this down, you can get the following conditions:
Implementation:
if (x < 1000):
    print('No')
else:
    if (x > 1100):
        print('No')
    else:
        print('Yes')
In cases like this, where you would like to make a decision based upon multiple conditions
combined together, you can use the logical operators in Python to combine various conditions
within a single if-else construct.
You learnt that there are two types of logical operators:
If you apply this concept to our earlier example for checking whether x lies in the range of
1000 to 1100, the code gets modified as shown below:
if (x >= 1000 and x <= 1100):
    print('Yes')
else:
    print('No')
Isn’t this simple compared with the earlier code? That is how logical operators make our logic
building easy.
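As a short sketch of the logical operators and, or and not, consider the range check again. Note that conditions are combined with the keyword and; the bitwise operator & binds more tightly than the comparison operators, so it is not a substitute here. The function name in_range is hypothetical:

```python
def in_range(x):
    # 'and' is True only when both conditions are True
    if x >= 1000 and x <= 1100:
        return "Yes"
    else:
        return "No"

print(in_range(1050))  # Yes
print(in_range(450))   # No

x = 1050
# Python also allows the two comparisons to be chained
print(1000 <= x <= 1100)     # True
# 'or' is True when at least one condition is True
print(x < 1000 or x > 1100)  # False
# 'not' inverts a condition
print(not (x < 1000))        # True
```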
Similar to an if-else construct, there is if-elif-else construct available in Python. The example
used in the video is given below:
shopping_total = 550
if shopping_total >= 500:
    print("You won a discount voucher of flat 1000 on next purchase")
elif shopping_total >= 250:
    print("You won a discount voucher of flat 500 on next purchase")
elif shopping_total >= 100:
    print("You won a discount voucher of flat 100 on next purchase")
else:
    print("OOPS!! no discount for you!!!")
Note that in the elif construct, a particular block is executed if the conditions of all the blocks
above it are false and the condition of this particular block is true. For instance, in the above
example, if the shopping total is less than 500 and greater than or equal to 250, the output will be:
You won a discount voucher of flat 500 on next purchase
So the elif construct can also be used to replace the and operator in certain situations.
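To see how elif stands in for the and operator, the sketch below writes the voucher logic twice: once with elif, and once with separate if statements that need explicit and conditions. The function names voucher_elif and voucher_and are hypothetical, and the functions return the voucher value instead of printing a message:

```python
def voucher_elif(total):
    # Each elif is checked only if every condition above it was False
    if total >= 500:
        return 1000
    elif total >= 250:
        return 500
    elif total >= 100:
        return 100
    else:
        return 0

def voucher_and(total):
    # Independent if statements need 'and' to exclude the higher ranges
    value = 0
    if total >= 500:
        value = 1000
    if total >= 250 and total < 500:
        value = 500
    if total >= 100 and total < 250:
        value = 100
    return value

print(voucher_elif(550), voucher_and(550))  # 1000 1000
print(voucher_elif(300), voucher_and(300))  # 500 500
```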
Now let's look at the nested if-else constructs.
In the example shown in the video:
world_cups = {2019: ['England', 'New Zealand'], 2015: ["Australia", "New Zealand"],
              2011: ["India", "Sri Lanka"], 2007: ["Australia", "Sri Lanka"],
              2003: ["Australia", "India"]}
year = int(input("Enter year to check New Zealand made it to Finals in 21st century : "))
if year in world_cups:
    if "New Zealand" in world_cups[year]:
        print("New Zealand made it to Finals")
    else:
        print("New Zealand could not make it to Finals")
else:
    print("World cup wasn't played in", year)
Using an if-else construct would also give us an answer, but imagine that you have 10
conditions like this; wouldn’t it be clumsy? Instead, if you use something like an elif-construct,
it would make the code more readable.
In this segment, you learnt about relational operators, logical operators, if-else, if-elif-else, and
nested if-else constructs. You might be pretty confused about when to use each of them. Read
the tips given below, which may help you to arrive at a conclusion faster.
Tips:
When there is more than one condition to check and you need to:
Perform a different operation at each condition involving a single variable, you use an if-elif-
else condition.
Just return a boolean functionality, i.e., Yes or No, you use logical operators in a single if-else
construct.
Perform a different operation in each condition involving different variables, you use a nested
if-else construct.
Now let's look at a real-world example of the if-else construct.
In the next segment, you will learn about the uses of looping constructs to automate tasks that
require multiple iterations such as reading 1000+ records and adding them to a single file.
Session Overview

Welcome to the fourth session on 'Control Structures and Functions'.


Control structures are the essence of programming; they help computers do what they do best:
automate repetitive tasks intelligently. The most common control structures are if-else
statements, for and while loops, and list and dictionary comprehensions. This session will cover
all these concepts.
Another crucial thing you will learn in this session is how to write your own functions. Almost
every powerful program, be it a web app or a machine learning algorithm, is a set of functions
written to perform specific tasks.
In this session, we will first cover control structures followed by 'Functions in Python'. Control
structures are used to control the order of execution of a program on the basis of logic and
values. Also, a function is a set of statements that takes input, does some computation, and
returns an output. Instead of writing a set of commonly-used tasks repeatedly, we can simply
write a function and call it. We will be covering both these concepts in-depth, and you will be
given practice exercises to assimilate the concepts learnt. This is a reasonably challenging
topic; hence, we recommend going through the content a few times before attempting the
practice exercise to gain a better grasp on the subject matter.
In this session, you will learn:
Control structures
If-elif-else
For loop
While loop
List comprehensions
Dictionary comprehensions
Functions
Map
Filter
Reduce
You can download the Python notebooks used in the lecture from the link below. It is
recommended that you keep executing the commands on your computer at pace with the
lecture. You can also parallelly try experimenting with other commands that you may have in
mind.
Session Objectives
By the end of this session, you will be able to:
Understand the need for control structures in programming;
Implement various types of control structures;
Implement logical operators in decision-making statements;
Understand the various types of loop structures; and
Understand the effectiveness of list comprehensions over loop iterations.
People you will hear from in this session
Faculty
Behzad Ahmadi
Data Scientist at Walmart Labs
Behzad holds a Doctor of Philosophy (PhD) in Electrical and Computer Engineering
(communication and signal processing) from the New Jersey Institute of Technology. He has been
working in the software engineering and data science field for the last 12+ years. Behzad
currently employs his machine learning skill-set to create retail graphs for Walmart Labs.
Summary

Congrats on completing your second session. We hope you had fun creating the data containers for
various use-cases. Let's summarise what you learnt in this session.
To conclude, in this session we first had a look at various data structures supported by Python
which are tuples, list, sets, and dictionaries and then we understood the use-case or say features
each one of these data structures offer. Hope this was an insightful session wherein we saw
certain real-world application-based use-cases and various operations supported by these data
structures.
In the next session, we will dive deep into control structures and functions where you will learn
about programming the decision-making capabilities and automating tasks via several
constructs and loops supported by python.
Additional Reading
If you want to know more about Python data structures and advanced algorithms, click on the
links given below:
1. Python data structures
2. Problem-solving with algorithms
Practice Exercise II - Part II

Download the Python file given below; it includes an exercise to test your understanding of the
concepts learnt in this session:
Based on your answers obtained in the notebook, attempt the following quiz:
Practice Exercise II - Part I

Before you get started with solving the practice exercise, here are some additional questions
which will help you revise all the concepts you have learnt in this session. Download the
jupyter notebook provided below and get started:
Given below are the solutions to the above questions, download the notebook given below and
take a look at the solution:
Now let's test our newly acquired python skills on some practice questions (un-graded).
Download the Python file given below; it includes an exercise to test your understanding of the
concepts learnt in this session:
Based on your answers obtained in the notebook, attempt the following quiz:
Dictionaries

The first thing that comes to mind when you hear about dictionaries is the Oxford Dictionary
where you can look up the meanings of words. So, you can imagine how dictionaries work. A
dictionary is a collection of words along with their definitions or explanations.
At a broader level, you could describe a dictionary as a mapping of words with their synonyms
or meanings. Let's learn about Python dictionaries conceptually.
The dictionary structure looks as shown below:
Dictionary is one data structure that is very useful because of the way in which it is defined.
Let's take a look at a small example to understand this. Imagine we have employee data with
the following attributes: employee id, name, age, designation. Now, let’s say you want to
retrieve employee details based on employee id (since this is unique to each employee). How
would you store the data?
Let’s say you use a list or tuple here; it would simply be difficult. Let’s see how.
First, you should have a list where each element represents an employee, and this element
should be iterable again since you have different values to represent every employee.
[[e_id1, name1, age1, designation1],[e_id2, name2, age2, designation2]...]
But would this serve the purpose? There is a problem here: e_id is unique and cannot be
changed, or, in other words, it is an immutable entity. Let’s say we make this entire element a
tuple:
[(e_id1, name1, age1, designation1),(e_id2,name2, age2, designation2)...]
Now, this would make name, age and designation immutable entities; instead, they should be
mutable entities:
[(e_id1, [name1, age1, designation1]),(e_id2,[name2, age2, designation2])...]
Having arrived at this point, imagine how it is to retrieve information about a particular
employee based on their employee id. To achieve this, you need to use the loops concepts in
Python, but isn’t this whole thing difficult? Imagine how simple this would be if you used a
dictionary here:
E = { e_id1 : [name1, age1, designation1],e_id2 : [name2, age2, designation2],...}
Here -
e_id is unique;
name, age and designation are mutable; and
simply using E[e_id1] will give employee e_id1’s information.
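A minimal sketch of this employee dictionary is shown here. The employee ids and all records other than the 'Gupta' example used elsewhere in these notes are hypothetical sample data:

```python
# Employee id (unique key) mapped to [name, age, designation]
E = {
    101: ["Gupta", 24, "Project Manager"],
    102: ["Rao", 31, "Data Analyst"],
}

# Retrieve an employee's details directly by id, without any loop
print(E[101])  # ['Gupta', 24, 'Project Manager']

# The value is a list, so name, age and designation remain mutable
E[101][1] = 25

# Adding a new employee is a simple assignment to a new key
E[103] = ["Iyer", 28, "Engineer"]

# get() returns a fallback value instead of raising an error for a missing id
print(E.get(999, "No such employee"))  # No such employee
```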
Now that you understand the application and use of a dictionary, let's learn to declare a
dictionary and also explore some of its features.
By now, you should have a good overview of dictionaries and how to use them. The further
segments will provide you with practice questions which will help you explore more about
lists, tuples and sets.
Sets

In the earlier segments, you learnt about lists and tuples, which are ordered sequences. In this
segment, you will learn about a data structure 'sets' that is unordered, or, in other words, it does
not maintain an order in which elements are inserted. This makes sets unfit for indexing or
slicing, but what is their use then?
By now, you would have found the answer to our earlier question about why sets are used:
They can eliminate duplicates. This feature is necessary while handling massive data where
most of it is just redundant and repeating. Let’s take a look at an example to understand this
better:
Let’s say you have a huge list that contains student grades:
Grades = ['A', 'A', 'B', 'C', 'D', 'B', 'B', 'C', 'D', 'E', 'C', 'C', 'A', 'B', 'F', 'D', 'C',
'B', 'C', 'A', 'B', 'F', 'B', 'A', 'E', 'B', 'B', 'C', 'D'...]
You want to identify distinct grades allotted to students. Obviously, you cannot check every
element of this list; instead, we make use of sets which gets our job done here.
By using the set function on the grades, you would get the distinct grades in the form of a set:
Grades = ["A", "A", "B", "C", "D", "B", "B", "C", "D", "E", "C", "C", "A", "B", "F", "D", "C",
"B", "C", "A", "B", "F", "B", "A", "E", "B", "B", "C", "D"]
set(Grades)
{'A', 'B', 'C', 'D', 'E', 'F'}
With all the conceptual understanding of the application of sets let's learn to declare sets and
add and remove elements from them.
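As a minimal sketch, sets can be declared with curly brackets or with the set() function, and elements are added and removed with the add(), remove() and discard() methods. The variable names are chosen for illustration:

```python
grades = {"A", "B", "C"}  # declaring a set with curly brackets
empty = set()             # {} would create an empty dictionary, not a set

grades.add("D")      # adds a new element
grades.add("A")      # no effect: "A" is already present
grades.remove("B")   # removes "B"; raises KeyError if the element is missing
grades.discard("Z")  # removes the element if present; no error otherwise

print(sorted(grades))  # ['A', 'C', 'D']
print(len(empty))      # 0
```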
These sets further help you perform all the typical set operations that you learnt in high school.
Imagine you have two sets:
A = {0,2,4,6,8}
B = {1,2,3,4,5}
Set methods
Union represents the total unique elements in both sets.
A.union(B) → {0, 1, 2, 3, 4, 5, 6, 8}
Intersection represents the elements common to both sets.
A.intersection(B) → {2, 4}
Difference(A-B) represents the elements present in A and not in B.
A.difference(B) → {0, 6, 8}
Symmetric difference represents the union of the elements A and B minus the intersection of
A and B.
A^B → {0, 6, 8, 1, 3, 5}
The order of elements shown above may not be the same when you actually execute the above
operations. It is because sets are unordered, which means they do not store the positional index
of an element in the set.
A Fun Activity
Try to decode the set operation given below:
(A.union(B)).difference(A.intersection(B))
Instead of using commands, you can also use simple mathematical operators between sets to
perform the various operations. Go through the below link to understand how:
Operations on Sets
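As a short sketch, the operators |, &, - and ^ correspond to the union, intersection, difference and symmetric difference methods. The last line also checks the expression from the fun activity; the printed order of set elements may vary:

```python
A = {0, 2, 4, 6, 8}
B = {1, 2, 3, 4, 5}

print(A | B)  # union, same as A.union(B)
print(A & B)  # intersection, same as A.intersection(B)
print(A - B)  # difference, same as A.difference(B)
print(A ^ B)  # symmetric difference, same as A.symmetric_difference(B)

# The fun activity: union minus intersection equals the symmetric difference
print((A.union(B)).difference(A.intersection(B)) == A ^ B)  # True
```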
Tuples

Tuples are similar to lists in almost all of their functionalities, but there is one significant
difference between the two data structures: lists are mutable and tuples are not. Let’s start
learning about tuples.
Let's start by understanding what tuples are.
A tuple contains a sequence of comma-separated values within parentheses. An important
feature of a tuple is immutability, which means the elements of a tuple cannot be altered. It is
used to store data such as employee information, which is not allowed to be changed.
For example: ('Gupta', 24 , 'Project Manager')
Here, we are storing the employee information name, age and designation in the form of a
tuple. This makes this information unchangeable or immutable.
One crucial point to observe from the list of examples is that a tuple can also be defined without
using parentheses. For example:
X = 1, 2, 3, 4 → makes X a tuple
Accessing the elements of tuples is similar to accessing elements in a list. You can use the same
indexing method. In a list, each element is assigned an index; similarly, each
element is given an index in a tuple.
Tuples are ordered sequences, which means the order in which the elements are inserted
remains the same. This makes them flexible for indexing and slicing just like lists. Using
slicing, you were able to obtain sections of the data from a list; here, you will be able to obtain
a subset of elements.
Immutability is the differentiating property between lists and tuples.
The elements in a tuple cannot be changed once the tuple is declared. You have to make a new
tuple using concatenation if you wish to change a value in a given tuple. You also saw that the
elements of a tuple can be sorted using the built-in sorted() function, which returns a new list.
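The sketch below illustrates indexing, slicing, immutability and concatenation on the employee tuple used earlier. The variable names employee and updated are chosen for illustration:

```python
employee = ('Gupta', 24, 'Project Manager')

print(employee[0])    # Gupta
print(employee[0:2])  # ('Gupta', 24)

# Trying to change an element raises a TypeError
try:
    employee[1] = 25
except TypeError as err:
    print("Cannot modify a tuple:", err)

# To "change" a value, build a new tuple using slicing and concatenation
updated = employee[:1] + (25,) + employee[2:]
print(updated)  # ('Gupta', 25, 'Project Manager')

# sorted() accepts a tuple and returns a new list
print(sorted((3, 1, 2)))  # [1, 2, 3]
```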
You learnt that a tuple can have an iterable object or another sequence as its elements.
t = (1,5,"Disco", ("Python", "Java"))
If you apply the type function on the element at index 3 (the fourth element), it would return a tuple:
t = (1,5,"Disco", ("Python", "Java"))
type(t[3])
tuple
You saw how the inbuilt dir() function helps to look for the list of methods that can be used
while handling tuples. The dir() function only gives the list of methods available; instead, using
the help() function and passing an empty tuple gives you a brief description of each of the
methods available with tuples.
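As a small sketch, filtering the output of dir() to drop the names that start with an underscore leaves the two regular methods available with tuples. The variable name methods and the sample tuple are chosen for illustration:

```python
# Keep only the method names that do not start with an underscore
methods = [name for name in dir(tuple) if not name.startswith("_")]
print(methods)  # ['count', 'index']

t = (1, 5, 1, "Disco")
print(t.count(1))  # 2, the number of times 1 occurs
print(t.index(5))  # 1, the position of the first occurrence of 5

# Uncomment to read the description of each method:
# help(())
```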
So far you should have a good understanding of both tuples and lists. In the next segment, you
will be introduced to sets in python.
Lists

In this session, we will understand more about data structures. A data structure is nothing but a
collection of data values.
Practically, whenever you are handling data, you are not given a single number or a string;
instead, you are given a set of values and asked to manage them. There are multiple data
structures that can handle such a set of values, each with its unique properties.
Lists are the most basic data structure available in python that can hold multiple
variables/objects together for ease of use. Let's explore a few examples of lists to understand
more about how they are used.
There are different ways of accessing elements that are present in a list. You can use indexing
to access individual elements from a list or you can use slicing to access multiple elements.
Implement it yourself using the code given below:
# Indexing example
L = ["Chemistry", "Biology", [1989, 2004], ("Oreily", "Pearson")]
L[0]

# Slicing
L[0:3]
For more examples of indexing and slicing in lists, you can refer to the python documentation
on lists. Let's continue exploring the capabilities of lists in python.
You can check the members of a list by using the 'in' keyword. The membership check
operation returns a boolean output. Lists are mutable, which means the elements of a list can
be changed.
Some of the essential methods available with lists include:
extend(): Extend a list by adding elements at the end of the list.
append(): Append an object to the end of a list.
The significant difference between the two methods is that the append() method takes an object
passed as a single element and adds it to the end of the list, whereas extend() takes an object
passed as an iterable and adds every element in the iterable at the end of the list. Take a look
at the code below and implement it yourself:
# extend()
L = ["Chemistry", "Biology", [1989, 2004] ,("Oreily" , "Pearson")]
L.extend([5, 8])
L
# append()
L = ["Chemistry", "Biology", [1989, 2004], ("Oreily" , "Pearson")]
L.append([5, 8])
L
In the examples above, you can see that when a list is passed in the extend method, it takes
each element from the list and appends it to the end of the list. However, when the same list is
passed to an append method, it considers this list as a single element and appends it to the end
of the list.
Some of the common list methods for removing elements are:
pop(index): Remove and return the item at index (default last)
remove(value): Remove the first occurrence of a value in the list
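A minimal sketch of the two removal methods listed above is given below. The list contents and the variable names last_item and first_item are assumptions chosen for illustration.

```python
L = ["Chemistry", "Biology", "Physics", "Biology", "Maths"]

# pop() with no argument removes and returns the last item.
last_item = L.pop()        # "Maths"

# pop(index) removes and returns the item at that index.
first_item = L.pop(0)      # "Chemistry"

# remove(value) deletes only the first occurrence of the value.
L.remove("Biology")

print(last_item, first_item)
print(L)                   # ['Physics', 'Biology']
```

The second "Biology" is still in the list, because remove() stops after the first match.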
Till now you have learnt indexing, slicing, adding elements, and removing elements from a
list. Now let's understand how to order the elements in a list.
There are two ways to sort a list:
sort(): A list method that sorts the elements of the list in place.
sorted(): A built-in function that returns the sorted elements as a new list and leaves the original list as is.
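The difference between the two can be seen in a small sketch such as the following. The list marks and the other variable names are hypothetical.

```python
marks = [72, 45, 90, 61]

# sorted() returns a new list and leaves the original untouched.
ordered = sorted(marks)
original_after_sorted = list(marks)   # snapshot: still [72, 45, 90, 61]

# sort() reorders the list in place and returns None.
return_value = marks.sort()

print(ordered)                 # [45, 61, 72, 90]
print(original_after_sorted)   # [72, 45, 90, 61]
print(marks)                   # [45, 61, 72, 90]
print(return_value)            # None
```

Because sort() returns None, writing marks = marks.sort() would lose the list; this is a common mistake.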
Next, you will learn about a copying technique in Python known as shallow copying.
You learnt about shallow copying, which is an important concept to understand while handling
lists. As you saw, by assigning B = A, you refer to the same list object in the memory, and the
changes made to list A will be reflected in list B as well. With A[:], you are creating a new
object, and this new object is assigned to B; here, any changes in list A would not affect list B
anymore. Take a look at the image below to understand better:
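The behaviour described above can also be checked directly with a short sketch. The names C, nested and nested_copy are assumptions introduced for illustration.

```python
A = [1, 2, 3]

B = A       # B refers to the same list object as A
C = A[:]    # C is a new list object holding the same elements

A.append(4)

print(B is A)   # True: one object, two names
print(B)        # [1, 2, 3, 4]
print(C)        # [1, 2, 3]

# The copy is "shallow": nested objects are shared, not duplicated.
nested = [[1, 2], [3, 4]]
nested_copy = nested[:]
nested[0].append(99)
print(nested_copy[0])   # [1, 2, 99]
```

The last lines show why the copy is called shallow: the outer list is new, but the inner lists are still shared between nested and nested_copy.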
In this segment, you have learnt about lists. In the upcoming segment, you will be learning
about tuples.
Session Overview

Welcome to the second session on 'Data Structures in Python'.


In the previous session, you learnt the basics of Python and the most common data types used
in Python. In this session, we will extend our discussion to the essential data structures
frequently used in data analysis.
In this session, we will go into the depths of data structures, that is, structures that can hold
the data together. A data structure is a particular way of organizing data in a computer so that
it can be used effectively. We will cover the four built-in data structures in Python, i.e., tuples,
lists, dictionaries, and sets, in a fair amount of depth. You will get ample opportunity to practice
the exercises shared. Please space your learning throughout the week and try to go through the
content multiple times before attempting the exercises.
You can open the link below to download the Python notebooks used in this session. We
recommend that you keep executing the commands on your computer in pace with the lecture.
In parallel, you can experiment with other commands you may have in mind.
Session Objectives
By the end of this session, you will be able to:
Understand the functionalities of list data structure;
Define tuple data structure in python and implement the tuple data structure;
Understand the operations of set data structure;
Understand the intricacies of dictionaries data structure; and
Understand the relationship between the key-value pair in dictionaries.
People you will hear from in this session
Faculty
Behzad Ahmadi
Data Scientist at Walmart Labs
Behzad holds a Doctor of Philosophy (PhD) in Electrical and Computer Engineering
(communication and signal processing) from the New Jersey Institute of Technology. He has been
working in the software engineering and data science field for the last 12+ years. Behzad
currently employs his machine learning skill set to create retail graphs for Walmart Labs.
Summary

Let's summarise what you have learned in this session.


First, we explored the very basics of Python and then moved on and wrote our first
Python program. After that, we saw the different data types supported by Python and the
various arithmetic and string operations. Finally, we attempted the coding questions based on a
practice exercise.
In the next session, you will have a look at data structures that could store multiple values
together.
Data structures are typically data containers that could store numerous values together-your
driving license details, which might include numeric, text, or alphanumeric values, can be
stored in a single data structure.
Up next, we will dive deep into various data structures supported by Python and learn about
each one of them in detail.
Additional Reading
If you want to learn more about the topics covered in this module, you can use the
following resources.
Beginner Level:
1. Think Python
2. The hitchhiker's guide to Python
3. A byte of Python
4. Jupyter Notebook mac shortcuts
5. Jupyter magic commands
Practice Exercise I

By now you are well versed with the concepts of basic Python programming. Given below are
some practice questions. Download the Jupyter Notebook given below and solve these
questions before moving on to the practice exercise:
Once you are done solving the questions, refer to the solution notebook below to find the
correct answers:
Now let us test our newly acquired Python skills on some practice questions (un-graded).
The following Python file consists of certain questions based on the concepts you have learned
in this session. You are expected to code in the Jupyter Notebook to find the correct solutions
to the given questions and answer the below given MCQs.
For your reference, string methods and their input parameters are listed on this link.
Based on your answers obtained in the notebook, attempt the following quiz.
String Operations

90% of the data in the world is in the form of text data. As we have seen in the last few segments,
Python generally handles text data via the string data type. This makes 'strings' perhaps one
of the most important data types, and hence, learning how to work with strings is crucial. You
will learn about various operations and manipulations applicable to a string.
You understood the basics of the string data type, and you also understood the functionalities
a string could provide, such as the escape character or '\' operator. Let's continue exploring the
functionalities of the string data type and start by building a small application that could take
your dessert order!
The application that we built above used string concatenation to order a dessert.
# Let's try building a system that asks for the flavour and the type of dessert the user wants
flavour = input("What flavour would you like? ")
dessert_type = input("What type of dessert would you like? ")
print("You have ordered", flavour + "-" + dessert_type)
String concatenation is just one of the ways to manipulate a string. Let's have a look at what other
string manipulation techniques Python offers.
You learned about indexing in strings. Always remember that forward indexing starts with 0,
and reverse indexing starts with -1.
We should also keep in mind the distinct feature of immutability that string data type provides,
which means that once a string is created, it cannot be changed.
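Forward indexing, reverse indexing and immutability can be tried out with a small sketch like the one below. The string "Python" and the variable names are assumptions chosen for illustration.

```python
word = "Python"

first_char = word[0]    # forward indexing starts at 0  -> 'P'
last_char = word[-1]    # reverse indexing starts at -1 -> 'n'

# Strings are immutable: assigning to an index raises a TypeError.
try:
    word[0] = "J"
except TypeError as err:
    error_message = str(err)

# To "change" a string, build a new one instead.
new_word = word.replace("P", "J")

print(first_char, last_char)
print(error_message)
print(word, new_word)   # the original string is unchanged
```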
And for cases where we may want to add data from multiple strings together, such as the
creation of a variable 'Full name' from variables 'First name' and 'Last name' you will use string
concatenation.
Now, let's move on and understand another important concept: string slicing.
You saw slicing in a string through several examples. Now, based on your learning, attempt the
quiz given below:
Earlier in the segment, you saw how to do indexing and slicing in strings. Now let's take a step
forward and learn how to change the character-case for a string or remove unwanted characters
adjacent to a given string variable.
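As a hedged illustration of what this looks like in practice, the sketch below uses the standard string methods upper(), lower(), title() and strip(). The sample strings are made up.

```python
raw = "  Hello World!  "

# strip() removes leading and trailing whitespace by default.
stripped = raw.strip()            # 'Hello World!'

# Changing the character case returns new strings.
shouting = stripped.upper()       # 'HELLO WORLD!'
quiet = stripped.lower()          # 'hello world!'
titled = "hello world".title()    # 'Hello World'

# strip() can also remove other characters from both ends.
trimmed = "xxhixx".strip("x")     # 'hi'

print(stripped, shouting, quiet, titled, trimmed, sep=" | ")
```

Because strings are immutable, none of these methods modify raw; each call returns a new string.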
Apart from the methods you saw there is one more important method which you might want to
use in certain situations. The split() method splits the string into substrings based on the
separator chosen; the final outcome is a list that has the substrings in it.
a = "Hello World!"
print(a.split(" ")) # returns ['Hello', 'World!']
This method is used in situations where you might want to separate certain strings. For
example, consider the categories of products in an e-commerce data set. It might be possible
that they are given in the following manner:
electronics-phone
electronics-headset
furniture-table
Now, with the data structured in this manner, in order to get the category and the sub-category
level data, you will have to split each entry into two sub-strings.
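The split described above could look like the following sketch. The list names products, categories and sub_categories are hypothetical; the entries are the three shown above.

```python
products = ["electronics-phone", "electronics-headset", "furniture-table"]

categories = []
sub_categories = []

for product in products:
    # split("-") returns a list with two substrings for these entries.
    category, sub_category = product.split("-")
    categories.append(category)
    sub_categories.append(sub_category)

print(categories)       # ['electronics', 'electronics', 'furniture']
print(sub_categories)   # ['phone', 'headset', 'table']
```

This unpacking into two names only works when every entry contains exactly one "-" separator.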
Now, you will learn how to count the number of occurrences of a substring in a string.
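One way to do this is the string method count(); the sketch below is an illustration with a made-up sentence.

```python
sentence = "the quick brown fox jumps over the lazy dog near the river"

# count() returns the number of non-overlapping occurrences.
the_count = sentence.count("the")     # 3
missing_count = sentence.count("cat") # 0, no error is raised

# Overlapping matches are not counted twice.
overlap_count = "aaaa".count("aa")    # 2

print(the_count, missing_count, overlap_count)
```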
Let's practice more questions based on your understanding of this session in the upcoming
segment.
Arithmetic Operations

In the previous segment, you learned about various data types in Python. Next, let's learn about
arithmetic operations. Arithmetic operations are an integral part of every programming
language. Let's understand how you perform them in Python:
In mathematics, an expression that involves more than one operator is evaluated according to
a set of rules. The same is the case in Python: if multiple operations have to be done
in a single expression, then we make use of the operator precedence rule.
Let's understand Operator precedence using an example.
a = 4 + (8 ** 2) - 3 ** 2 % 1
To find the value of the variable 'a' the following steps have to be followed:
Step 1: The first step is to deal with the brackets, as they hold the highest precedence among all
operators in the given expression. The expression inside these brackets [(8**2)] will get
executed first to return 64.
Updated Expression:
4 + 64 - 3 ** 2 % 1
Step 2: Moving on, you deal with the exponentiation operator [3**2] as it has the next highest
precedence when compared to other operators in the expression.
Updated Expression:
4 + 64 - 9 % 1
Step 3: Now, you deal with the remainder operator as it has higher precedence over subtraction
and addition. This means the value 9%1 gets evaluated to return 0.
Updated Expression:
4 + 64 - 0
Step 4: Addition and subtraction have the same precedence and are evaluated from left to
right, so in the next step, the addition 4 + 64 gets executed first.
Updated Expression:
68 - 0
Step 5: The final step would be to perform subtraction.
Answer:
68
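The steps above can be checked in Python with a short sketch. The intermediate names step_1 to step_4 are assumptions introduced to mirror the walkthrough.

```python
# The full expression, evaluated by Python in one go.
a = 4 + (8 ** 2) - 3 ** 2 % 1

# The same evaluation, one step at a time.
step_1 = 8 ** 2          # brackets first             -> 64
step_2 = 3 ** 2          # exponentiation             -> 9
step_3 = step_2 % 1      # remainder                  -> 0
step_4 = 4 + step_1      # leftmost of + and -        -> 68
result = step_4 - step_3 # then the subtraction       -> 68

print(a, result)

# + and - share the same precedence and run left to right:
left_to_right = 10 - 4 + 2   # (10 - 4) + 2 = 8, not 10 - (4 + 2) = 4
print(left_to_right)
```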
And this is how the operator precedence rule plays an essential part while doing arithmetic
operations in Python. Based on the concepts learnt so far, let's check our understanding of the
concepts using the quiz given below.
A meme to help you remember the operator precedence rule. :)
Additional Resources:
You can also go through this video tutorial to understand the Arithmetic operators.
Data Types in Python

Now that you know how to use Jupyter Notebooks, let's write your first piece of code: the hello
world program.
Now that you have learnt how to write the basic “Welcome to Upgrad, name” program in
Python, it’s time to understand how to declare a variable in Python. We will also learn about
the different data types available in Python.
The following will offer you an introduction to the basic syntax of declaring a variable in
Python.
You can find the Jupyter Notebook used throughout this session below.
You have learnt that variables are nothing but a memory location to store values assigned to
them. Some of the properties of these variables are:
There is no need to declare the data type of a variable, as is done in other programming
languages such as C, C++ and Java, where a declaration looks like this:
int c = 5
string name = 'Alex'
In Python, you simply write c = 5 and name = 'Alex'.
The variable name (or identifier) cannot start with a number, i.e., declaring something like
2name = 7 throws an error.
Python is case sensitive; in other words, variable names are case sensitive. This means that
declaring name = 2 and Name = 4 would create two different variables.
You understood how to declare a variable in python. Let's use the variables to make a small
application that can calculate the age of a person in months.
You saw that the input function always reads data as a string, whereas arithmetic operations
need a numeric value such as an integer. So you will need to change one data type to another in
Python; this process of changing one data type to another is called typecasting. You saw the
process of typecasting in the video above. Let's look at a few more examples.
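As a minimal sketch of the age-in-months application, the function below takes the age as text, the way input() would return it, and casts it to an integer before multiplying. The function name age_in_months and the sample value "25" are assumptions; in a notebook, the text would typically come from input().

```python
def age_in_months(age_text):
    """Convert an age given as text (as input() returns it) to months."""
    years = int(age_text)   # typecast the string to an integer
    return years * 12

# In a notebook you might write: age_text = input("Enter your age in years ")
age_text = "25"
months = age_in_months(age_text)

print(type(age_text))   # <class 'str'>
print(months)           # 300

# Without the cast, * repeats the string instead of multiplying a number.
repeated = "25" * 2     # '2525'
```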
You learnt two commands:
Use type() to find the data type of a variable.
For casting variables use the keyword of the target data type.
In order to change the data type of a particular variable, you can simply call the following
inbuilt methods in python.
In the above snapshot of code, you are assigning a value to a variable (x) and then using
typecasting, converting it into different data types. So for example, when you convert x into an
integer it converts the floating-point value into an integer value.
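A small sketch of type() and typecasting is given below. The value 5.7 and the variable names are assumptions chosen for illustration.

```python
x = 5.7

x_type = type(x)        # <class 'float'>

as_int = int(x)         # 5: the fractional part is dropped, not rounded
as_str = str(x)         # '5.7'
as_bool = bool(x)       # True: any non-zero number is truthy
as_float = float("3")   # 3.0: a numeric string can be cast to float

print(x_type, as_int, as_str, as_bool, as_float)
```

Note that int() truncates towards zero, so int(5.7) gives 5 and int(-5.7) gives -5.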
Based on your learning about various data types and typecasting in Python from the previous
segment, attempt the quiz given below.
In the next segment, you will learn about various arithmetic operators in python.
Additional Resources:
Confused between multiple data types and their input parameters?
Here is a quick reference guide on - Various data types in Python
Introduction to Jupyter Notebook

Welcome to the Jupyter Notebook introductory session. You will use the Jupyter IPython
Notebook as the main environment for writing Python code throughout this program. For
this reason, learning how to use the various functionalities present in the Jupyter
Notebook is crucial so that your coding experience going forward can be smooth.
The main advantage of using Jupyter Notebook is that you can write both code and normal text
(using the Markdown format in Jupyter) in the Notebooks. These notebooks are easy to read
and share, and can even be used to present your work to others.
You can find the Jupyter IPython Notebook used in this session below. To open this notebook:
1. Download it to your local drive.
2. Open the Jupyter Notebook as you have seen in the last segment, then find the file that you
downloaded in step 1 and double-click on it.
The notebook will open in a new tab; you can explore the notebook environment and get
comfortable with its use:
Here's a brief of the concepts in the notebook:
Headings
# for the titles
## for the main headings
### for the subheadings
#### for the smaller subheadings
##### for the italic subheadings
Emphasis
__string__ or **string** for bold text
_string_ or *string* for italic text
Monospace fonts
A back single quotation mark ` on both sides to get monospace fonts.
Line breaks
<br> wherever you want a line break, as the notebook sometimes doesn't give you the required
line break where you want it.
Indenting
> to indent the text
>> for further indenting it, and so on
Bullets and numbering
A single dash, i.e. - followed by two spaces to make bullet points
A number and a dot followed by a space, i.e. 1. to make numbered lists
Colouring
<font color = blue, yellow, red, pink, green, etc.> String </font> to give your font any colour
that you want
LaTeX Equations
$ on both the sides of the text to write LaTeX equations
Next, it is also crucial that you know about the various shortcuts while using the Jupyter
Notebook.
Command mode shortcuts
Esc: To go into command mode
Enter: To go back to edit mode
M: To convert a cell to a markdown cell
Y: To convert a cell back to a code cell
A: To insert a new cell above
B: To insert a new cell below
D + D: To delete cell
Z: Undo the last operation
F: To find and replace in your code
Shift + Up/Down: To select multiple cells
Space: Scroll notebook downwards
Shift + Space: Scroll notebook upwards
Edit mode shortcuts
Shift + Enter: To execute the code in the current cell and go to the next cell
Alt + Enter: To execute the code in the current cell and insert a new cell below
Shift + Tab: To get brief documentation of the object that you have just typed in the coding
cell
Ctrl + Shift + -: To split the cell at the cursor
Shift + M: To merge selected cells
We have also provided a link to all the shortcuts for Mac users below.
Jupyter Notebook Mac shortcuts
You will gradually get used to these shortcuts and write code more efficiently in
your Jupyter Notebook as you move forward, so do not worry about memorising them
all at once right now.
Let's answer a couple of questions about what you just learnt before moving forward.
In the upcoming segment, you will be introduced to various data types in python.
Additional Reading for Curious Students: Jupyter Notebook Magic Commands
Getting Started with Python

Python is a general-purpose programming language that was named after Monty Python. It is
simple and incredibly readable since it closely resembles the English language. But still, why
should you use Python?
Python is a language that finds use in nearly every domain possible. Its official website will
give you an overview of this. In addition, its simplicity, as well as the way it ensures tasks can
be performed using fewer lines of code, is encouraging many developers across the world to
take it up.
Currently, there are two common versions of Python: version 2 and version 3. Apart from
some syntactical differences, they are pretty similar. As support for version 2 would fade over
time, our course supports version 3.
To install Python 3 on your system, follow the steps in the document provided below.
Installation Instructions
You will need various Python packages (or synonymously, libraries) for specific purposes.
Anaconda is an open-source distribution that simplifies package management and deployment.
The package management system 'Conda' manages package versions.
We strongly recommend using Anaconda to install Python as well as the packages, since it
comes preloaded with most of the packages you will need.
Advantages of using Anaconda
It is easy to manage and supports most of the libraries required for Machine learning/Artificial
Intelligence problems.
Anaconda comes with many libraries such as NumPy, OpenCV, SciPy, PyQt, the Spyder IDE,
etc.
Anaconda can be downloaded from this link and can be installed like any other regular
software. There is no need to download Python separately; the Anaconda installer will do this
for you. Make sure you select Python 3.x while downloading Anaconda.
Note for experienced Python programmers: In case you are already using Python along with
an existing package manager such as pip or easy_install, you can continue to do so. However,
make sure you are using Python 3.x.
Jupyter Notebook
You will use the Jupyter IPython Notebook as the main environment for writing Python code
throughout this program. The key advantage of using Jupyter Notebook is that you can write
both code and normal text (using the Markdown format in Jupyter) in the notebooks. These
notebooks are easy to read and share, and can even be used to present your work to others.
Here is a brief overview of Jupyter Notebook.
The document given below provides instructions for installing Python and the Jupyter
Notebook using Anaconda.
Also, refer to the document for help with installing Anaconda successfully. It is important to
note that you need to install Python 3.x (the latest 3.x version available), not 2.x.
Please proceed to the next segment only after installing Anaconda and the Jupyter notebook.
In the next segment, you will be introduced to the various commands that are used for
formatting in the Jupyter Notebook.
Additional Reading
If you want to know about the Jupyter notebook in brief check out this link:
Jupyter Notebook quick starter guide
Session Overview

In this first session, we will start with the reasons behind using Python as the language of choice
for Data Science. Then, in order to understand how to design and code programs, you need to
understand the types of data that you want to work with and how to manipulate these data types.
To understand this, you will go into the details of the different data types and the operations
possible on each data type. This will be followed by a series of practice exercises to make the
concepts clear. Please ensure that you go through the content multiple times before attempting
the practice questions, especially if you are new to programming.
Session Objectives
By the end of this session, you will be able to:
Understand the need for python programming language;
Understand various arithmetic operators used in python;
Implement arithmetic operators based on operator precedence rule;
Understand various string operations and their importance in python programming; and
Implement various commands used in jupyter notebook.
Let's start with setting up the python programming IDE in the upcoming segment.
Module Introduction

Welcome to the module on 'Introduction to Python'!


In this module
You will become familiar with the basic syntax of Python and be introduced to certain
programming basics of the Python language.
The first module includes four sessions:
Basics of Python: This session introduces you to Python Programming and the environment
you require for coding. Once you have the basic setup ready, you will learn to write your first
program in Python, followed by learning about different data types. Finally, towards the end
of the session, you will look at various arithmetic and string operations supported by Python.
Data Structures in Python: The session starts by introducing various data structures in
Python, which include tuples, lists, sets, and dictionaries. Further, you will learn about all these
data structures in detail and the various operations related to them.
Control Structure and Functions in Python: This session covers the essence of programming, since
control structures and functions help computers do what they do best: automate repetitive tasks
intelligently. It incorporates all the decision-making control structures and functions supported by Python.
OOP in Python: This session will teach you about the various object-oriented programming
methodologies in Python that include classes, objects, and methods.
How to learn a new programming language, especially if you are new to programming?
In case you are new to programming, please go through the videos multiple times to make
sense of what is being discussed. Programming is essentially a different way of thinking. Just
like driving, it may feel challenging initially, but once you get the hang of it, it almost becomes
second nature.
Please ensure that you do not go through the content in one go. You need to set up dedicated
time daily throughout the week to learn and review. Trust us; if you do this, you will surprise
yourself with how much you learn and retain through each week.
Unlike other fields, the only way to learn to program is by practice-please feel free to make as
many mistakes as you want when trying different programming tasks. It is very rare that we
get the program right the first time we write it. Expect to make mistakes but learn from them.
Things will slowly start making sense.

We hope you enjoy the process of learning with us.


Guidelines for in-module questions
The in-video and in-content questions for this module are not graded.
Guidelines for graded questions
The lab questions and multiple-choice questions at the end of the module are graded.
People you will hear from in this module
Faculty
Behzad Ahmadi
Data Scientist at Walmart Labs
Behzad holds a Doctor of Philosophy (PhD) in Electrical and Computer Engineering
(communication and signal processing) from the New Jersey Institute of Technology. He has been
working in the software engineering and data science field for the last 12+ years. Behzad
currently employs his machine learning skill set to create retail graphs for Walmart Labs.
Python for Data Science

Please note that it is mandatory to go through the Python for Data Science module before
attempting these graded questions.
The questions in this session will adhere to the following guidelines:
MCQs:
All the questions below are graded. All the best!
Graded Lab Session
Lab Session - NumPy

This segment is based on the concepts covered in the Python for Data Science module. In this
segment, you are expected to write and submit a solution for each coding problem on the
platform. These coding problems require an understanding of only the concepts covered
in this module. After you submit your solutions, you will be able to access the sample solutions
to these coding problems. Note that all the coding problems are graded.
Note that there are a limited number of submissions for each code. So, before submitting, do
verify and run your code against sample test cases.
In the next segment, you will solve lab questions based on the concept of pandas.
Summary

Let's summarise what you have learnt in this session. You learnt about the Pandas library,
which provides various functions to conduct data analysis in Python.
The various topics that were covered are:
Pandas Series and Dataframes, which are the basic data structures in the Pandas library
Indexing, selecting and subsetting a dataframe
Merging and appending two dataframes, which can be done using the merge() and concat()
functions
Grouping and summarising dataframes, which can be done using groupby() to first create a
grouped object and then apply aggregate functions to it
The pivot table function in a dataframe, which is similar to pivot tables in MS Excel
Finally, you learnt how to perform different functions over time-series data using Pandas
Congratulations on the completion of the module on Python for Data Science.
Practice Exercise

Now, solve the questions provided below to test your understanding of the topic. To set the
expectation, you will have to surf through the Pandas documentation to solve some of the
problems. This is intended to inculcate the habit of reading the official documentation to
increase your knowledge.
For your reference, here is a cheat sheet by Pandas that will help you quickly refer to the
command and syntax.
Practice exercise - I: Data preparation
Usually, in machine learning applications, the data is not very clean; in other words, it is not
very easy to load and operate on. This exercise is meant to give you some exposure to how to use
Pandas to make a dataset fit for use. Find the Question notebook attached below. The dataset
that you will be using for this session is the weather data from Seoul, Korea. You can download the
data from the Kaggle page. Kaggle, a subsidiary of Google, is a platform for the data science
community to share datasets and code notebooks. It will prove a valuable resource in your data
science journey.
Try to solve the question in the notebook on your own. Once you feel like you have given a
genuine attempt, then look up the solution notebook below.
Let's move to the next exercise now.
Practice exercise - II: Movies
In this assignment, you will try to find some interesting insights into a few movies released
between 1916 and 2016, using Python. You will have to download a movie dataset, write
Python code to explore the data, gain insights into the movies, actors, directors, and collections,
and submit the code.
The following Python file consists of certain questions based on the concepts that you have
learned in this session. You are expected to code in the Jupyter Notebook in order to arrive at
the correct solutions to the questions provided and to answer the MCQs given below.
Pivot Tables

A pivot table is a useful tool for representing a DataFrame in a structured and simplified
manner. It acts as an alternative to the groupby() function in Pandas. Pivot tables provide
Excel-like functionalities to create aggregate tables.
So, you can use the following command to create pivot tables in Pandas:
df.pivot(columns='grouping_variable_col', values='value_to_aggregate',
index='grouping_variable_row')
Unlike pivot(), which only reshapes the data, the pivot_table() function also lets you specify
the aggregate function that you want Pandas to execute over the columns that are provided. It
could be the same or different for each column in the DataFrame. You can write the pivot_table
command as shown below:
df.pivot_table(values, index, aggfunc={'value_1': np.mean,'value_2': [min, max, np.mean]})
The function above, when substituted with proper values, will result in the mean of value_1
and three values (the minimum, maximum and mean) of value_2 for each group defined by the index.
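The two commands above can be tried on a small, made-up DataFrame. This is only a sketch: the column names city, year, sales and returns and all the values are assumptions, and the aggregate functions are passed as strings ('mean', 'min', 'max'), which Pandas also accepts.

```python
import pandas as pd

# Hypothetical data: one row per city and year.
df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "year": [2018, 2019, 2018, 2019],
    "sales": [100, 120, 80, 90],
    "returns": [5, 7, 2, 4],
})

# pivot() only reshapes: one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="sales")

# pivot_table() aggregates: a different set of functions per column.
summary = df.pivot_table(
    values=["sales", "returns"],
    index="city",
    aggfunc={"sales": "mean", "returns": ["min", "max", "mean"]},
)

print(wide)
print(summary)
```

In the summary, the columns form two levels (the original column and the aggregate applied), so a single value is read with a pair such as summary.loc["Delhi", ("sales", "mean")].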
In the next segment, you will attempt a few coding questions. But before attempting those
questions, revise all the concepts and codes that have been covered in this session. Since this
module is based on coding, it is important that you practise, as practice is the best way to learn
to code.
Additional resources
Now that you have hands-on experience on Pandas, here is the Pandas official documentation
for your reference.
Merging DataFrames

In this segment, you will learn how to merge and concatenate multiple DataFrames. In a real-
world scenario, you would rarely have the entire data stored in a single table to load into a
DataFrame. You will have to load the data into Python using multiple DataFrames and then
find a way to bring everything together.
This is why merge and append are two of the most common operations that are performed in
data analysis. You will now learn how to perform these tasks using different DataFrames. First,
let’s start with merging. The data set that you have been working with contains only weather
data and no sales data. Sales data is stored in a different data set. How would you combine
these data sets?
So, you see an error when trying to join the two DataFrames. What do you think could be the
reason for this error? Give it some thought; watch the video again if you want to. Here is a hint:
look closely at all the column names in both the DataFrames.
You can use the following command to merge the two DataFrames above:
dataframe_1.merge(dataframe_2, on = ['column_1', 'column_2'], how = '____')
We will take a look at the useful parameter ‘how’, which is provided by the merge function.
The parameter how in the code above specifies the type of merge that is to be performed. Merges
are of the following different types:
left: This keeps all the entries of the first dataframe; where the second dataframe has no match, the values are filled with NaN.
right: This keeps all the entries of the second dataframe; where the first dataframe has no match, the values are filled with NaN.
outer: This takes the union of the keys from both dataframes.
inner: This takes the intersection of the keys from both dataframes.
Depending on the situation, you can use an appropriate method to merge the two DataFrames.
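The four merge types can be compared on a small example. This is only a sketch: the weather and sales tables and their values are made up, and "Date" is assumed to be the common key column.

```python
import pandas as pd

# Hypothetical tables that share the key column "Date".
weather = pd.DataFrame({"Date": ["d1", "d2", "d3"], "Rainfall": [0.0, 5.2, 1.1]})
sales = pd.DataFrame({"Date": ["d2", "d3", "d4"], "Sales": [120, 95, 140]})

inner = weather.merge(sales, on="Date", how="inner")  # keys present in both tables
left = weather.merge(sales, on="Date", how="left")    # every key from weather
right = weather.merge(sales, on="Date", how="right")  # every key from sales
outer = weather.merge(sales, on="Date", how="outer")  # union of all keys

print(outer)
```

In the left, right and outer results, the cells that have no matching row in the other table are filled with NaN.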
Concatenating dataframes
Concatenation is much more straightforward than merging. It is used when you have
dataframes with the same columns and want to stack them on top of each other, or with the
same rows and want to append them side by side.
You can add columns or rows from one dataframe to another using the concat function:
pd.concat([dataframe_1, dataframe_2], axis = _)
To append rows, you have to set the axis value as 0. For adding columns from one dataframe
to another, the axis value must be set as 1. If there are any extra columns or rows where there
are no values, they are replaced with ‘NaN’.
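A short sketch of both axis values, using two made-up dataframes. The ignore_index argument is an extra option that is not discussed above; it simply renumbers the rows of the result.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5, 6], "z": [7, 8]})

# axis=0 stacks rows; columns missing from one dataframe are filled with NaN.
rows = pd.concat([a, b], axis=0, ignore_index=True)

# axis=1 places the dataframes side by side, aligned on the row index.
cols = pd.concat([a, b], axis=1)

print(rows)
print(cols)
```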
You can also perform various mathematical operations between two or more dataframes. For
example, you may have two dataframes for storing the sales information for 2018 and 2019.
Now, you want the sales data combined for a period of two years. In such a case, the add
function in Pandas allows you to directly combine the two dataframes easily.
Apart from the merge, append or concat, you can perform mathematical operations to combine
multiple dataframes. When two dataframes have the same row and column labels, you can
directly use the mathematical operators provided in the list below:
add(): +
sub(): -
mul(): *
div(): /
floordiv(): //
mod(): %
pow() :**
Pandas will return the derived values with the same labels in a combined dataframe. It also
provides the parameter fill_value to control how you want to deal with the values that are not
common between two dataframes. You can refer to the documentation for the same. For a better
understanding of these functions, explore the following notebook.
Let's solve an example of the add() function.
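A minimal sketch of such an example, assuming two small made-up sales tables whose row labels only partly overlap:

```python
import pandas as pd

# Hypothetical sales figures; the labels overlap only on "Feb".
sales_2018 = pd.DataFrame({"A": [10, 20]}, index=["Jan", "Feb"])
sales_2019 = pd.DataFrame({"A": [1, 2]}, index=["Feb", "Mar"])

# The + operator leaves NaN wherever a label is missing from either side.
plain = sales_2018 + sales_2019

# add() with fill_value treats the missing entries as 0 before adding.
filled = sales_2018.add(sales_2019, fill_value=0)

print(plain)
print(filled)
```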
In the next segment, you will learn how to create pivot tables using Pandas.
Additional resources:
A brief explanation to know which operation should be used and when it should be used:
Merge, Join, Append, Concat using Pandas - is there a preference?
Groupby and Aggregate Functions

Grouping and aggregation are two of the most frequently used operations in data analysis,
especially while performing exploratory data analysis (EDA), where it is common to compare
summary statistics across groups of data.
As an example, in the weather time-series data that you are working with, you may want to
compare the average rainfall of various regions or compare temperature across different
locations.
A grouping analysis can be thought of as having the following three parts:
Splitting the data into groups (e.g., groups of location, year, and month)
Applying a function on each group (e.g., mean, max, and min)
Combining the results into a data structure showing summary statistics
You will learn how to perform grouping over the Pandas DataFrames using the same data set
as before.
So, the groupby() function is quite a powerful function, and it can significantly reduce the work
of a data scientist. The groupby() function returns a Pandas object, which can be used further
to perform the desired aggregate functions. Let’s take a look at another example of a groupby()
object in use in the upcoming video.
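The split, apply and combine steps can be sketched on a few made-up rows. The column names follow the weather data dictionary, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical rows in the style of the weather data set.
df = pd.DataFrame({
    "Location": ["Sydney", "Sydney", "Perth", "Perth"],
    "Rainfall": [2.0, 4.0, 0.0, 1.0],
    "MaxTemp": [25.0, 27.0, 31.0, 33.0],
})

grouped = df.groupby("Location")        # split: one group per location
mean_rain = grouped["Rainfall"].mean()  # apply and combine: a Series indexed by location

# Different aggregate functions per column, combined into one summary table.
summary = grouped.agg({"Rainfall": "sum", "MaxTemp": ["min", "max"]})

print(mean_rain)
print(summary)
```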
You will see another example, which is a high-level problem, wherein you will not only use
the groupby() and the aggregate function but also use a user-defined function to create a new
column; you can then apply the groupby() and the aggregate function over this column. It is a
bit complex to reiterate the problem, and it is alright if you feel lost.
NOTE: At 0:30, the text reads "cill factor"; it should actually be "chill factor".
That was a fun example, was it not? Now, before moving any further, note that if you apply
the groupby function on an index, you will not encounter any error while executing the
grouping and aggregation commands together. However, when grouping on columns, you
should first store the DataFrame and then run an aggregate function on the new DataFrame.
In the next segment, you will learn how to deal with multiple DataFrames.
Operations on Dataframes

So far, in this session, you have been working with a dummy data set, and the functions that
were performed on the data were more theoretical than practical. From this point, you will be
working with a real-life data set. Find the dataset and the notebook used from this point
onwards linked below.
The data set contains weather data from Australia, and the same data is being used by an FMCG
company to predict sales in their stores. Before going into the details of the tasks that you will
be performing in this session, let’s watch the upcoming video where Behzad will share some
more details.
So, the data set contains weather data from Australia. Take a look at the data dictionary below:
Date: Date on which the data was recorded
Location: The location where the data was recorded
MinTemp: Minimum temperature on the day of recording data (in degrees Celsius)
MaxTemp: Maximum temperature on the day of recording data (in degrees Celsius)
Rainfall: Rainfall in mm
Evaporation: The so-called Class A pan evaporation (mm) in the 24 hours up to 9 AM
Sunshine: Number of hours of bright sunshine in the day
WindGustDir: Direction of the strongest gust of wind in the 24 hours up to midnight
WindGustSpeed: Speed (km/h) of the strongest gust of wind in the 24 hours up to midnight
Data that is recorded at regular time intervals is called time-series data. The data
being used in this case study is an example of time-series data: the observations
are recorded once per day. We will have a brief discussion on how to work with
time-series data a bit later in the session; for now, let’s focus on the data set and the tasks
associated with it. Behzad will begin solving the examples given in the notebook.
You are already familiar with the filter function, which was used to solve the problem. Similar
to the filtering in NumPy, running a Pandas DataFrame through a conditional statement also
returns boolean values. These boolean values can be used to slice out data.
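As a small sketch of this idea, the condition below produces a boolean Series, which is then used to keep only the matching rows. The data values and the 1.0 mm threshold are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["Sydney", "Perth", "Darwin"],
    "Rainfall": [0.0, 12.5, 3.1],
})

mask = df["Rainfall"] > 1.0  # boolean Series: one True/False per row
wet = df[mask]               # keeps only the rows where mask is True

print(mask)
print(wet)
```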
The next important task that you will learn is to create new columns in the DataFrame.
Creating a new column is as simple as assigning the output of an operation to a new
column name. It might not seem so simple at first, but it will become
clear after you watch the demonstration.
To create new columns, you will use the time-series functionality. So, before moving on to the
actual demonstration, let’s discuss a time series briefly next.
Handling time-series data
Time-series data refers to a series of data points that are indexed over time. The data is recorded
over regular time intervals and is stored along with the time it was recorded. Some common
examples of time series include stock prices, temperature, weather report, etc., as this
information would make sense only when presented with the time it was recorded.
If a date-time variable has values in the form of a string, then you can pass the ‘parse_dates’
parameter to read_csv() while loading the data into the Pandas DataFrame. This will convert
that particular variable to the date–time format. Fortunately, no such data-type conversion is required in
the given data set. You will learn how to extract the series data.
So, once data is loaded in a date-time format, Pandas can easily interpret the different
representations of date and time.
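A minimal sketch of loading a date column and deriving new columns from it. The CSV text is made up and is read from memory through io.StringIO, so no file is needed; the Year and Month column names are assumptions.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "Date,Rainfall\n2019-01-31,0.0\n2019-02-01,3.4\n"

# parse_dates converts the listed columns from strings to date-time values.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# The .dt accessor exposes the date-time components of the column.
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

print(df.dtypes)
print(df)
```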
Apart from handling time-series data, another important feature of a DataFrame is the user-
defined functions. In the previous module, you have already seen that lambda functions are the
most accessible of all types of user-defined functions. Let’s take a look at lambda functions
once again before proceeding further.
Lambda functions
Suppose you want to create a new column ‘is_raining’, which categorises days into rainy or
not rainy days based on the amount of rainfall. You need to implement a function, which returns
‘Rainy’ if rainfall > 50 mm, and ‘Not raining’ otherwise. This can be done easily by using the
apply() method on a column of the DataFrame. You can see a demonstration of the same.
So, you saw the use of the ‘apply()’ method to apply a simple lambda function on a column in
the DataFrame. Now, the next step is to add the data in a new column to the DataFrame.
The columns that are created by the user are known as ‘Derived Variables’. Derived variables
increase the information conveyed by a DataFrame. Now, you can use the lambda function to
modify the DataFrames.
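The is_raining example described above can be sketched as follows, using a few made-up rainfall values and the 50 mm threshold from the text:

```python
import pandas as pd

df = pd.DataFrame({"Rainfall": [0.0, 75.5, 50.0]})  # hypothetical values in mm

# apply() runs the lambda on every value in the column;
# assigning the result creates the derived variable.
df["is_raining"] = df["Rainfall"].apply(
    lambda mm: "Rainy" if mm > 50 else "Not raining"
)

print(df)
```

A value of exactly 50 mm is labelled ‘Not raining’ because the condition is strictly greater than 50.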
In the upcoming segments, you will learn how to use the groupby function to aggregate the
created DataFrame.
Indexing and Slicing

There are multiple ways to select rows and columns from a dataframe or series. In this segment,
you will learn how to:
Select rows from a dataframe
Select columns from a dataframe
Select subsets of dataframes
The selection of rows in dataframes is similar to the indexing that you saw in NumPy arrays.
The syntax df[start_index:end_index] will subset the rows according to the start and end
indices.
Note that this syntax returns all the columns for each selected row. With
the introduction of column labels, selecting columns is no longer similar to that in arrays. Let’s
learn how to select the required column(s) from a dataframe.
The notebook above will help in this segment. Next, Behzad will explain how to extract data
from a DataFrame.
You can select one or more columns from a dataframe using the following commands:
df['column'] or df.column: It returns a series
df[['col_x', 'col_y']]: It returns a dataframe
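A short sketch of the row and column selections described in this segment, using a made-up dataframe (col_z is an extra hypothetical column):

```python
import pandas as pd

df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": list("abcd"),
})

rows = df[1:3]                # rows at positions 1 and 2 (the end is excluded), all columns
one = df["col_x"]             # single label: returns a Series
two = df[["col_x", "col_y"]]  # list of labels: returns a DataFrame

print(type(one).__name__, type(two).__name__)
```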
Pandas series data type:
To visualise pandas series easily, it can be thought of as a one-dimensional (1D) NumPy array
with a label and an index attached to it. Also, unlike NumPy arrays, they can contain non-
numeric data (characters, dates, time, booleans, etc.). Usually, you will work with Series only
as part of dataframes.
You could create a Pandas series from an array-like object using the following command:
pd.Series(data, dtype)
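For illustration, the sketch below creates one Series from a NumPy array and one from a list of strings. The index and name arguments are additional options of pd.Series that are not covered in the text above.

```python
import numpy as np
import pandas as pd

# From an array, with an explicit data type.
s = pd.Series(np.array([1, 2, 3]), dtype="float64")

# Non-numeric data with custom labels (index) and a name.
labelled = pd.Series(["a", "b"], index=["first", "second"], name="letters")

print(s)
print(labelled)
```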
The methods taught above allow you to extract columns. But how would you extract a specific
column from a specific row?
Let’s learn how to do this over Pandas dataframes.
You can use the loc method to extract rows and columns from a dataframe based on
their labels, as follows:
dataframe.loc[[list_of_row_labels], [list_of_column_labels]]
This is called label-based indexing over dataframes. Now, you may face some challenges while
dealing with the labels. As a solution, you might want to fetch data based on the row or column
number.
As you learnt, another method for indexing a dataframe is the iloc method, which uses the row
or column number instead of labels.
dataframe.iloc[rows, columns]
Since positions are used instead of labels to extract values from the dataframe, the process is
called position-based indexing. With these two methods, you can easily extract the required
entries from a dataframe based on their labels or positions. The same set of commands, loc and
iloc , can be used to slice the data as well. Behzad will demonstrate the slicing of DataFrames.
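The difference between the two methods can be sketched on a made-up orders table whose row labels are o1, o2 and o3:

```python
import pandas as pd

df = pd.DataFrame(
    {"Sales": [100, 200, 300], "Profit": [10, 20, 30]},
    index=["o1", "o2", "o3"],  # hypothetical row labels
)

by_label = df.loc[["o1", "o3"], ["Profit"]]  # label-based indexing
by_position = df.iloc[[0, 2], [1]]           # position-based indexing, same cells

label_slice = df.loc["o1":"o2", "Sales"]     # label slices include the end label
position_slice = df.iloc[0:2, 0]             # position slices exclude the end position

print(by_label)
print(label_slice)
```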
Subsetting rows based on conditions
Often, you want to select rows that meet some given conditions. For example, you may want
to select all orders where Sales > 3,000, or all orders where 2,000 < Sales < 3,000 and Profit <
100. Arguably, the best way to perform these operations is to use df.loc[], since df.iloc[] would
require you to remember the integer column indices, which is tedious. Let’s start first with one
condition to filter the elements in the dataframe.
As you can see, you can easily segregate the entries based on the single or multiple conditions
provided. To get the desired results by subsetting the data, it is important to have well-written
conditional statements.
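The two conditions mentioned above can be written as follows. This is a sketch on made-up Sales and Profit values; each condition is wrapped in parentheses and combined with & (and).

```python
import pandas as pd

df = pd.DataFrame({
    "Sales": [1500, 2500, 3500, 2800],
    "Profit": [50, 80, 400, 150],
})

# Single condition.
high = df.loc[df["Sales"] > 3000]

# Multiple conditions: 2,000 < Sales < 3,000 and Profit < 100.
mid = df.loc[(df["Sales"] > 2000) & (df["Sales"] < 3000) & (df["Profit"] < 100)]

print(high)
print(mid)
```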
You already know the basic conditional operators like "<" or ">". There are a couple of other
functions which might come in really handy while handling real-life datasets. These are isin()
and isna().
isin() : Similar to the membership operator in lists, this function can check if the given element
"is in" the collection of elements provided.
isna() : It checks whether the given element is null/empty.
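A brief sketch of both functions on a made-up dataframe with one missing rainfall value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Location": ["Sydney", "Perth", "Darwin"],
    "Rainfall": [1.2, np.nan, 0.0],
})

in_cities = df[df["Location"].isin(["Sydney", "Darwin"])]  # membership test per row
missing = df[df["Rainfall"].isna()]                        # rows with a null value

print(in_cities)
print(missing)
```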
In the next segment, you will learn how to run operations over the dataframes; this will help
you create or modify the stored data.
Describing Data

In the previous segment, you learnt how to load data into a dataframe and manipulate the
indices and headers to represent the data in a meaningful manner. Let's first load the data that
will be used in the demonstrations in this segment. You can use the Jupyter Notebook provided
below to code along with the instructor.
Behzad will demonstrate a different way of hierarchical indexing.
Now that you know how hierarchical indexing is done and you understand its benefits, let's
learn the ways of extracting information from a DataFrame. In this segment, you will learn
some basic functions that will be useful for describing the data stored in the dataframes. The
same notebook will be used in the next segment as well.
While working with Pandas, the dataframes may hold large volumes of data; moreover, it
would be an inefficient approach to load the entire data whenever an operation is performed.
Hence, you must use the following code to load a limited number of entries:
dataframe_name.head()
By default, it loads the first five rows, although you can specify a number if you want fewer or
more rows to be displayed. Similarly, to display the last entries, you can use the tail() command
instead of head().
You learnt about two commands which give statistical information as well:
dataframe.info(): This method prints information about the dataframe, which includes the index
data type and column data types, the count of non-null values and the memory used.
dataframe.describe(): This function produces descriptive statistics for the dataframe, that is, the
count, central tendency (mean and median), dispersion (standard deviation, min, max and quartiles), etc.
By default, it summarises only the numeric columns; passing include='all' also summarises the non-numeric ones.
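The commands from this segment can be tried on a small made-up dataframe, as sketched below:

```python
import pandas as pd

df = pd.DataFrame({"Rainfall": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

first_five = df.head()   # first five rows by default
last_two = df.tail(2)    # last two rows

df.info()                # prints index type, column types, non-null counts, memory use
stats = df.describe()    # count, mean, std, min, quartiles and max per numeric column

print(stats)
```

The median appears in the describe() output as the row labelled 50%.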
Let’s try to visually understand the findings of the describe function using a box plot.
In the next segment, you will learn how to slice and index the data in a dataframe.
Pandas - Rows and Columns

An important concept in Pandas dataframes is that of the row and column indices. By default,
each row is assigned indices starting from 0, which are represented to the left of the dataframe.
For columns, the first row in the file (CSV, text, etc.) is taken as the column header. If a header
is not provided (header=None), then the columns are numbered in the same way as the row indices
(starting from 0).
Pandas library offers the functionality to set the index and column names of a dataframe
manually. Let's now learn how to change or manipulate the default indices and replace them
with more logical ones. The required notebook is the same as the previous segment.
You can use the following code to change the row indices:
dataframe_name.index = list_of_row_labels
To change the index while loading the data from a file, you can use the attribute 'index_col':
pd.read_csv(filepath, index_col = column_number)
It is also possible to create a multilevel indexing for your dataframe; this is known as
hierarchical indexing. Let’s watch the following video and learn how to do it.
For column header, you can specify the column names using the following code:
dataframe_name.columns = list_of_column_names
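The following sketch sets the row and column labels manually and then sets the index while loading. The CSV text is made up and read from memory through io.StringIO; passing a list to index_col is one possible way to obtain a hierarchical index.

```python
import io
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])  # default labels: 0, 1 for rows and columns
df.index = ["row_a", "row_b"]        # replace the row labels
df.columns = ["col_x", "col_y"]      # replace the column labels

# Use the first column of the file as the index while loading.
indexed = pd.read_csv(io.StringIO("id,value\nk1,10\nk2,20\n"), index_col=0)

# A list of columns in index_col gives a multilevel (hierarchical) index.
multi = pd.read_csv(io.StringIO("a,b,v\nx,1,10\nx,2,20\n"), index_col=[0, 1])

print(df)
print(indexed)
print(multi)
```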
In the next segment, you will learn methods of extracting statistical information from
DataFrames.
Basics of Pandas

Pandas has two main data structures:


Series
Dataframes
DataFrames are the more commonly used data structure, so most of this session will be
focused on them. When you encounter the Series data structure, Behzad will explain it
briefly to you. Let's begin the session by introducing Pandas DataFrames.
DataFrame
It is a table with rows and columns, with rows having an index each and columns having
meaningful names. There are various ways of creating dataframes, for instance, creating them
from dictionaries, reading from .txt and .csv files. Let’s take a look at them one by one.
Creating dataframes from dictionaries
If you have data in the form of lists present in Python, then you can create the dataframe directly
through dictionaries. The ‘key’ in the dictionary acts as the column name and the ‘values’
stored are the entries under the column.
You can refer to the Notebook provided below for this segment.
The demonstration below shows how to create DataFrames from dictionaries.
To create a dataframe from a dictionary, you can run the following command:
pd.DataFrame(dictionary_name)
You can also provide lists or arrays to create dataframes, but then you will have to specify the
column names as shown below.
pd.DataFrame(list_or_array_name, columns = ['column_1', 'column_2'])
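Both commands can be sketched with made-up data. The column names name and price and their values are assumptions for illustration.

```python
import pandas as pd

# Keys become column names; each list holds the entries of that column.
cars = {"name": ["Swift", "Creta"], "price": [7.0, 12.5]}
from_dict = pd.DataFrame(cars)

# A list of rows has no column names, so they are passed explicitly.
from_list = pd.DataFrame(
    [["Swift", 7.0], ["Creta", 12.5]],
    columns=["name", "price"],
)

print(from_dict)
```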
Creating dataframes from external files
Another method to create dataframes is to load data from external files. Data may not
necessarily be available in the form of lists. Mostly, you will have to load the data stored in the
form of a CSV file, text file, etc.
Download the file provided 'cars.csv' before you proceed.
Pandas provides the flexibility to load data from various sources and has different commands
for each of them. You can go through the list of commands here. The most common files that
you will work with are csv files. You can use the following command to load data into a
dataframe from a csv file:
pd.read_csv(filepath, sep=',', header='infer')
You can specify the following details:
separator (by default ‘,’)
header (takes the top row by default, if not specified)
names (list of column names)
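A minimal sketch of these three options. Instead of the 'cars.csv' file, whose contents are not shown here, it reads made-up semicolon-separated text from memory through io.StringIO.

```python
import io
import pandas as pd

# Hypothetical file content: no header row, ';' as the separator.
text = "Swift;7.0\nCreta;12.5\n"

df = pd.read_csv(
    io.StringIO(text),
    sep=";",                  # separator used in the file
    header=None,              # the file has no header row
    names=["name", "price"],  # column names to use instead
)

print(df)
```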
In the next segment, you will learn about row and column indices in a dataframe.
Introduction to Pandas

Pandas is a library specifically for data analysis; it is built using NumPy. You will be using
Pandas extensively for data manipulation, visualisation, building machine learning models, etc.
Let’s hear from Behzad as he explains the topics that will be covered in this session.
As mentioned, Pandas is one of the most used libraries in Python, and this is because of the
powerful data constructs that it offers. You will learn about the data constructs as you move
ahead in the session. But first, to initialise the Pandas library, you can use the following
command:
import pandas as pd
In this session, you will learn about:
Creating dataframes
Importing CSV data files as Pandas dataframes
Reading and summarising dataframes
Sorting dataframes
Labelling, indexing and slicing data
Merging dataframes using joins
Pivoting and grouping
Guidelines for coding console questions
The lectures are interspersed with coding consoles to help you practise writing Python code.
You will be given a brief problem statement and some pre-written code. You can write the code
in the space provided, verify your answer using test cases and submit when you are confident
about your answer.
Note that the coding console questions are non-graded. Some instructions for these questions
are as follows:
Ignore the pre-written code on the console. Please do not change the code.
Write your answer where you are asked to write it.
You may run your codes and verify them any number of times.
People you will hear from in this session
Faculty
Behzad Ahmadi
Data Scientist at Walmart Labs
Behzad is a Doctor of Philosophy (PhD) in Electrical and Computer Engineering;
Communication and Signal Processing from the New Jersey Institute of Technology. He has
been working in the software engineering and data science field for more than 12 years. Behzad
currently employs his machine learning skillset to create retail graphs for Walmart Labs.
Summary

Let's summarise what you have learnt in this session.


In this session, you learnt about the most important package for scientific computing in Python:
NumPy. The various operations that you learnt about include:
Arrays, which are the basic data structure in the NumPy library
Creating NumPy arrays from a list or a tuple
Creating large arrays of evenly spaced values using the arange command
Analysing the shape and dimension of an array using array.shape, array.ndim and so on
Indexing, slicing and subsetting an array, which is very similar to indexing in lists
Working on multidimensional arrays
Manipulating arrays using reshape(), hstack() and vstack()
In the next session, you will dive deep into a new library, pandas, which is used to manipulate
heterogeneous data.
Additional Reading:
If you want to learn more about this topic than what is covered in this module, you can
optionally use the additional resources provided below.
NumPy in detail
Practice Exercise II

The following Python file consists of certain questions based on the concepts you have learnt
in this session. You are expected to code in the Jupyter Notebook to find the correct solutions
to the given questions and answer the MCQs given below.
Here is a cheat sheet for you to quickly refer to the commands and syntax.
As discussed during the previous lectures, NumPy is a library that helps in scientific
calculations. The notebook exercise given below will ask you to do the same. You will be
writing code to perform a data transformation which is common in machine learning. Although
the task you will be performing can be easily done using machine learning libraries like
scikit-learn (which you will learn later in the program), it is always good practice to write the
code from scratch. The notebook below has the questions for you to try. It will need some
functions which you might not have studied in the session, but you can always visit the
documentation to find the function that you want to use.
The solutions to the problems in the notebook above are given below. Before you look at the
solutions try to give the practice problem your best shot. You could also reach out to your
teaching assistants (TAs) or the discussion forum if you get stuck anywhere.
This activity is designed to give you an opportunity to learn how to write code from scratch.
Additional Practice Questions
Draw Tic Tac Toe gameboard
Computation Times in NumPy vs Python Lists

You will often work with extremely large datasets; thus, it is important for you to understand
how much computation time (and memory) you can save using NumPy as compared with the
use of standard Python lists.
To compare both these data structures, it is recommended that you code along with us and
experiment with the different kinds of data to see the difference in real-time.
Now, let's compare the computation times of arrays and lists through a simple task of
calculating the element-wise product of numbers.
There is a huge difference in the time taken to perform the same operation using lists vs
NumPy arrays. Let’s try to find the ratio of the speed of the NumPy array as compared to lists.
In the example discussed above, NumPy is an order of magnitude faster than lists. This is with
arrays of sizes in millions, but you may work on much larger arrays with sizes in billions. Then,
the difference may be even larger.
Some reasons for such difference in speed are as follows:
NumPy is written in C, which is basically being executed behind the scenes.
NumPy arrays are more compact than lists, i.e., they take less storage space than lists.
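A rough way to reproduce such a comparison is sketched below. The array size of one million and the use of time.perf_counter are choices made for this sketch; the measured times depend on the machine, so no particular ratio is assumed.

```python
import time
import numpy as np

n = 1_000_000
list_a, list_b = list(range(n)), list(range(n))
arr_a = np.arange(n, dtype=np.int64)
arr_b = np.arange(n, dtype=np.int64)

# Element-wise product with plain Python lists.
start = time.perf_counter()
list_product = [x * y for x, y in zip(list_a, list_b)]
list_time = time.perf_counter() - start

# The same product with NumPy arrays (vectorised, no Python-level loop).
start = time.perf_counter()
array_product = arr_a * arr_b
array_time = time.perf_counter() - start

print(f"lists: {list_time:.4f} s, arrays: {array_time:.4f} s")
```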
The following discussions demonstrate the differences in the speeds of NumPy and standard
Python lists.
Why are NumPy arrays so fast?
Why NumPy instead of Python lists?
Mathematical Operations on NumPy II

The objective of this segment is to cover the mathematical capabilities of NumPy. Note that
these mathematical functions might not be of direct use for you. In actual practice as a data
scientist, you might not use these functions, but the advanced functions that you will use will
be built using these functions. So, it would help if you remember that the NumPy library has
all of these capabilities.
Let’s understand the trigonometric capabilities of NumPy.
The next set of mathematical capabilities is exponential and logarithmic functions.
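A few of the trigonometric, exponential and logarithmic functions, sketched on small made-up arrays:

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])  # angles in radians
sines = np.sin(angles)
cosines = np.cos(angles)

values = np.array([1.0, np.e])
exponentials = np.exp(np.array([0.0, 1.0]))  # e**x for each element
natural_logs = np.log(values)                # natural logarithm
base10_logs = np.log10(np.array([1.0, 100.0]))

print(sines, exponentials, natural_logs)
```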
You learnt about the mathematical functions that can be directly calculated. Another important
feature offered by NumPy is empty arrays, where you can initialise an empty array and later
use it to store the output of your operations. Behzad will explain how to create empty arrays
and use them.
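A minimal sketch of this pattern: np.empty() allocates the array without setting its values, so every position should be written before it is read. The squaring operation is only a placeholder.

```python
import numpy as np

out = np.empty(5)  # allocated but uninitialised; the contents are arbitrary

# Fill the array with the results of an operation.
for i in range(5):
    out[i] = i ** 2

print(out)
```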
Once you have created an array, you may also want to run aggregation operations on the data
stored in it. An aggregation function helps you summarise the numerical data.
Using the reduce() and accumulate() functions, you can easily summarise the data available in
arrays. The reduce() function results in a single value, whereas the accumulate() function helps
you apply your aggregation sequentially on each element of an array. These functions require
a base function to aggregate the data, for example, add() in the case given above.
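The two functions can be sketched with add() and multiply() as the base functions:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

total = np.add.reduce(arr)             # single value: 1 + 2 + 3 + 4
running = np.add.accumulate(arr)       # running sum at each position
product = np.multiply.reduce(arr)      # single value: 1 * 2 * 3 * 4

print(total, running, product)
```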
The last mathematical capability that we will discuss is the linear algebra module in the NumPy
library. Linear algebra is widely used in machine learning. You might not be
expected to write this code yourself using the NumPy library, but the functions that you will use
will depend on the functions demonstrated below. You will learn about the linear algebra functions in detail.
You learnt about numerous functions, such as rank and the inverse of a matrix. Although
NumPy can calculate the results of these functions, the operations need to be valid. For
example, if it is not possible for a matrix to be inverted, then the NumPy inverse operations
will also throw an error. With this, we have come to the end of the demonstration of the
mathematical capabilities of the NumPy library. Let’s summarise what you learnt in the last
segments.
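As a sketch of the point about valid operations, the example below computes the rank and inverse of a small invertible matrix and then attempts to invert a singular one. Both matrices are made up for illustration.

```python
import numpy as np

a = np.array([[2.0, 1.0], [1.0, 3.0]])
rank = np.linalg.matrix_rank(a)
inverse = np.linalg.inv(a)

# The second row is twice the first, so this matrix has no inverse.
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
try:
    np.linalg.inv(singular)
    could_invert = True
except np.linalg.LinAlgError:
    could_invert = False

print(rank, could_invert)
```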
In the next segment, we will have a detailed demonstration of the computational speeds of
NumPy arrays versus the lists.
Mathematical Operations on NumPy

The objective of this segment is to discuss the mathematical capabilities of NumPy as a library
that is meant for scientific calculations. Behzad will walk you through the topics mentioned
below.
Manipulate arrays
  Reshape arrays
  Stack arrays
Perform operations on arrays
  Perform basic mathematical operations
    Power
    Absolute
    Trigonometric
    Exponential and logarithmic
  Apply built-in functions
  Apply your own functions
  Apply basic linear algebra operations
In the first half of this segment, you will learn about concepts such as reshaping and stacking
arrays. These concepts are important and might come in handy. In the latter half of this
segment, demonstrations will show all the mathematical topics mentioned above. The notebook
below will be useful in this and the next segment.
You will learn about the operations on arrays with mismatching dimensions.
Behzad performed algebraic operations on arrays. A point to note here is that element-wise
operations generally require the arrays to have the same shape (or shapes that NumPy can
broadcast against each other); otherwise, the operation raises an error. Because of this
restriction, you will often need to change the shape of an array before operating on it.
Fortunately, NumPy provides a few functions that can modify the shape of a
NumPy array. They will be covered one by one.
The commands demonstrated in this segment can be used to change the dimensions of a given NumPy
array. These commands are as follows:
hstack: It puts two arrays with the same number of rows together. By using this command, the
number of rows stays the same, while the number of columns increases.
vstack: It puts two arrays on top of each other. This command works only when the number of
columns in both the arrays is the same. This command can only change the number of rows of
an array.
reshape: It can change the shape of an array as long as the number of elements in the array
before and after the reshape operation is the same.
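The three commands can be sketched on two small made-up arrays of shape (2, 3):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
b = np.arange(6, 12).reshape(2, 3)  # [[6, 7, 8], [9, 10, 11]]

side_by_side = np.hstack((a, b))    # same rows, more columns
stacked = np.vstack((a, b))         # same columns, more rows
reshaped = stacked.reshape(3, 4)    # 12 elements before and after

print(side_by_side.shape, stacked.shape, reshaped.shape)
```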
Now that you have learnt about the function that can be used to reshape a NumPy array, let’s
take a look at some inbuilt functions in the NumPy library that can help transform an array.
You learnt about a few commands such as power and absolute that are available in the NumPy
library. In the next segment, let’s continue exploring the mathematical capabilities of the
NumPy library.
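For reference, the power and absolute commands can be sketched on a small made-up array:

```python
import numpy as np

arr = np.array([-2, -1, 0, 3])

squares = np.power(arr, 2)      # each element raised to the power 2
magnitudes = np.absolute(arr)   # absolute value of each element

print(squares, magnitudes)
```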
Creating NumPy Arrays

In the previous segments, you learnt how to convert lists or tuples to arrays using np.array().
There are other ways in which you can create arrays. The following ways are commonly used
when you know the size of the array beforehand:
np.ones(): It is used to create an array of 1s.
np.zeros(): It is used to create an array of 0s.
np.random.randint(): It is used to create a random array of integers within a particular range.
np.random.random(): It is used to create an array of random numbers.
np.arange(): It is used to create an array with increments of fixed step size.
np.linspace(): It is used to create an array of fixed length with evenly spaced values over a given interval.
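Each of the functions listed above can be sketched in one line. The sizes and ranges are arbitrary choices for illustration, and the random arrays will differ on every run.

```python
import numpy as np

ones = np.ones((2, 3))                        # 2 x 3 array of 1s
zeros = np.zeros(4)                           # 1-D array of four 0s
integers = np.random.randint(0, 10, size=5)   # random integers in [0, 10)
floats = np.random.random(3)                  # random floats in [0, 1)
stepped = np.arange(0, 10, 2)                 # start, stop (excluded), step
spaced = np.linspace(0, 1, 5)                 # 5 evenly spaced values from 0 to 1

print(stepped, spaced)
```

Passing a tuple such as (2, 3) as the shape or size is also how these functions create multidimensional arrays.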
The iPython file attached here has been used in the demonstrations in this segment.
Let’s take a look at each of these methods one by one.
You learnt about the functions of ones and zeros. Learn about the other functions that are used
to create arrays.
Behzad talked about creating NumPy arrays using the functions mentioned earlier. But, if you
notice, all the arrays that Behzad created were one-dimensional. These functions can also be
used to create multidimensional arrays. You will learn how to create nD arrays.
Many other functions can also be used to create arrays. A few methods that will not be covered
in this segment are mentioned below. Please read the official NumPy documentation to
understand the usage of these methods.
np.full(): It is used to create a constant array of any number ‘n’.
np.tile(): It is used to create a new array by repeating an existing array a particular number of
times.
np.eye(): It is used to create an identity matrix of any dimension.
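A brief sketch of these three methods, with arbitrary shapes and values:

```python
import numpy as np

constant = np.full((2, 2), 7)            # 2 x 2 array filled with 7
repeated = np.tile(np.array([1, 2]), 3)  # the array [1, 2] repeated three times
identity = np.eye(3)                     # 3 x 3 identity matrix

print(constant, repeated, identity, sep="\n")
```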
In the next segment, you will learn about the mathematical methods available for performing
mathematical transformations on NumPy arrays.
Multidimensional Arrays

Until now, you learnt about one-dimensional arrays, where all the data is stored in a
single line or row. In this segment, you will learn about multidimensional arrays.
A multidimensional array is an array of arrays. For example, a two-dimensional array
would be an array with each element as a one-dimensional array.
1-D array : [1, 2, 3, 4, 5]
2-D array : [ [1, 2, 3, 4, 5], [6, 7, 8, 9, 10] ]
Similarly, a three-dimensional array can be thought of as an array with each element as
a two-dimensional array. To create multidimensional arrays, you can give a
multidimensional list as an input to the np.array function.
Let’s hear from Behzad, as he explains two-dimensional arrays in detail.
NumPy arrays have certain features that help in analysing multidimensional arrays.
Some features are as follows:
shape: It represents the shape of an array as the number of elements in each dimension.
ndim: It represents the number of dimensions of an array. For a 2D array, ndim = 2.
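A minimal sketch of these two features, using the 2-D example shown earlier in this segment (the variable name grid is only illustrative):

```python
import numpy as np

# A nested (multidimensional) list given as input to np.array
grid = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 8, 9, 10]])

print(grid.shape)   # (2, 5): 2 rows, 5 columns
print(grid.ndim)    # 2
```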
Similar to 1D arrays, nD arrays can also operate on individual elements without using
list comprehensions or loops. The following video will give you a small demonstration
of operating on nD arrays.
NumPy can also combine arrays of different shapes in a single operation, for example,
multiplying each column of a 2-D array by a different value. This property
of NumPy is called broadcasting. As this is a slightly advanced topic with respect to the
scope of this module, it will not be explained in detail here. However, if you wish to
learn more about broadcasting, you can visit the following link.
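A small sketch of element-wise operations and broadcasting on a 2-D array; the arrays grid and factors are hypothetical examples, not data from the lecture:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Element-wise operation: every element is doubled, with no loop
doubled = grid * 2

# Broadcasting: the 1-D array is stretched across both rows, so each
# column is multiplied by a different value
factors = np.array([10, 100, 1000])
scaled = grid * factors

print(doubled.tolist())   # [[2, 4, 6], [8, 10, 12]]
print(scaled.tolist())    # [[10, 200, 3000], [40, 500, 6000]]
```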
In NumPy, each dimension is called an axis. In NumPy terminology, for 2-D arrays:
axis = 0 - refers to the rows
axis = 1 - refers to the columns
Multidimensional arrays are indexed using as many indices as the number of
dimensions or axes. For instance, to index a 2-D array, you need two indices: array[x,
y]. Each axis has an index starting at 0. The figure provided above shows the axes and
their indices for a 2-D array.
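A minimal sketch of two-index access and of an operation applied along each axis, assuming a small illustrative array:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Two indices: row 1 (second row), column 2 (third column)
element = grid[1, 2]

# Summing along axis 0 moves down the rows, giving one total per column
column_totals = grid.sum(axis=0)

# Summing along axis 1 moves across the columns, giving one total per row
row_totals = grid.sum(axis=1)

print(element)                  # 6
print(column_totals.tolist())   # [5, 7, 9]
print(row_totals.tolist())      # [6, 15]
```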
You will learn how to subset and operate on multidimensional arrays.
The indexing and slicing of nD arrays are similar to those of 1D arrays. The only
difference between the two is that you need to give the slicing and indexing instructions
for each dimension separately. See the example below:
players_converted[:, 0]
This returns all the rows in the 0th column. Here, ‘:’ is the instruction for all the rows, and
0 is the instruction for the 0th column. Similarly, for a 3D array, the slicing command
will have three arguments. You can also opt for conditional slicing and indexing of data
in nD arrays. Behzad will explain this concept in detail.
Behzad demonstrated a few different ways to slice an array. You can apply a condition
on the array itself or use a different array with the same dimensions to apply the
condition. Now, let’s summarise the learnings of this segment.
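The slicing ideas above can be sketched as follows. The array players is a hypothetical stand-in for players_converted, assuming column 0 holds heights in metres and column 1 holds weights in kilograms:

```python
import numpy as np

# Hypothetical data: one row per player, columns = [height_m, weight_kg]
players = np.array([[1.88, 81.6],
                    [1.75, 70.3],
                    [2.01, 97.5]])

heights = players[:, 0]       # all rows, 0th column
first_two = players[:2, :]    # first two rows, all columns

# Condition applied on the array itself: rows where height exceeds 1.8
tall_players = players[players[:, 0] > 1.8]

# Condition taken from a different array of matching length:
# heights of the players whose weight exceeds 80
heavy_heights = heights[players[:, 1] > 80]

print(heights.tolist())         # [1.88, 1.75, 2.01]
print(tall_players.shape)       # (2, 2)
print(heavy_heights.tolist())   # [1.88, 2.01]
```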
In the next segment, you will learn how to initialise fixed-length one-dimensional
NumPy arrays using different functions.
Additional Resources
Practice Exercise I

The following Python file consists of certain questions based on the concepts you learnt in this
session. You are expected to code in the Jupyter Notebook provided below to find the correct
solutions to the questions given below:
Operations Over 1-D Arrays

In the previous segment, you learnt how to create NumPy arrays using existing lists. Once you
have loaded the data into an array, NumPy offers a wide range of operations to perform on the
data.
Next, you will learn how to efficiently operate on NumPy arrays.
You learnt about the calculation of BMI using NumPy arrays. In the BMI example given above,
if you had been working with lists, then you would have needed to map a lambda function (or
worse, write a for loop). With NumPy, you simply use the relevant operators, and
NumPy applies them to every element on its own.
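A minimal sketch of this comparison, assuming BMI is computed as weight in kilograms divided by the square of height in metres; the sample heights and weights are made up for illustration:

```python
import numpy as np

heights_m = [1.70, 1.80, 1.65]     # hypothetical heights in metres
weights_kg = [65.0, 81.0, 54.45]   # hypothetical weights in kilograms

# With lists: map a lambda function over both lists
bmi_from_lists = list(map(lambda h, w: w / h ** 2, heights_m, weights_kg))

# With NumPy: the operators act on every element at once
bmi_from_arrays = np.array(weights_kg) / np.array(heights_m) ** 2

print(bmi_from_lists)
print(bmi_from_arrays)
```

Both approaches give the same values; the NumPy version simply reads like the formula itself.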
Now that you have learnt how to use operators to perform basic operations on a 1D array, let’s
understand how to access the elements of an array. For one-dimensional arrays, indexing,
slicing, etc. are similar to those in Python lists, which means that indexing starts at 0. The
following video will demonstrate the methodologies of indexing and slicing arrays.
As explained, indexing refers to extracting a single element from an array, while slicing refers
to extracting a subset of elements from an array. Both indexing and slicing are exactly the same
as those in lists. Having a unified method of extracting elements from lists and NumPy arrays
helps in keeping the library simpler.
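A short sketch showing that the same indexing and slicing syntax works on a list and on a 1-D array (the values are illustrative):

```python
import numpy as np

values = [10, 20, 30, 40, 50]
array = np.array(values)

# Indexing: a single element, counting from 0
first = array[0]     # same position as values[0]
last = array[-1]     # negative indices count from the end

# Slicing: a subset of elements, from index 1 up to (not including) 4
middle = array[1:4]

print(first, last)       # 10 50
print(middle.tolist())   # [20, 30, 40]
print(values[1:4])       # [20, 30, 40]
```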
The aforementioned element extraction methods will only help you when you know the
location of the element that you want to extract. In the following video, you will learn how to
access elements based on a condition.
To summarise, similar to lists, you can subset your data through conditions based on your
requirements in NumPy arrays. To do this, you need to use comparison operators such as ‘<’ and
‘>’. NumPy also has a few inbuilt functions such as max(), min() and mean(), which allow you
to compute summary statistics on the data directly. Behzad will explore these
functions and also summarise the learnings of this segment.
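A minimal sketch of conditional subsetting and the inbuilt functions mentioned above, using a made-up array of scores:

```python
import numpy as np

scores = np.array([35, 72, 58, 91, 44])   # hypothetical data

# The comparison produces a boolean array, one True/False per element
mask = scores > 50

# Using the boolean array as an index keeps only the True positions
passed = scores[mask]

print(mask.tolist())     # [False, True, True, True, False]
print(passed.tolist())   # [72, 58, 91]
print(scores.max())      # 91
print(scores.min())      # 35
print(scores.mean())     # 60.0
```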
Please note: For the embedded coding question listed below, we request you to type the code
directly, as the console loads the libraries used by Python automatically. If you wish to copy
and paste this code to Jupyter or any other external IDE, you will need to import the libraries
manually for the code to run.
Let's move to the next segment where you will be solving a few problems on NumPy arrays.
Basics of NumPy

NumPy, which stands for ‘Numerical Python’, is a library meant for scientific calculations.
The basic data structure of NumPy is an array. A NumPy array is a collection of values stored
together, similar to a list.
You will learn about the difference between lists and NumPy arrays.
The video mentioned two advantages that NumPy arrays have over lists. These include:
Ability to operate on individual elements in the array without using loops or list comprehension
Speed of execution
The demonstration in the video above did not cover the aspect of speed, so for now, you can
assume that a NumPy array is faster than a list. Later in this session, you will be able to take a
look at a detailed demonstration to compare the speed of NumPy arrays.
You can download the IPython notebook used in the lecture from the link given below. As
mentioned in the introduction, you are expected to code along with the instructor in the
notebook.
Now, let’s continue exploring the properties of NumPy arrays. You will learn how to use
operators to perform operations on NumPy arrays.
You learnt how a NumPy array behaves differently with the ‘+’ operator. You also learnt that
a NumPy array can be created using a pre-existing list. There will be more details on the use
of operators with arrays in the latter part of the session. For now, let’s discuss how to create
arrays.
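A minimal sketch of how the ‘+’ operator behaves differently on lists and on NumPy arrays created from those lists (the values are illustrative):

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [4, 5, 6]

# On lists, '+' joins the two lists end to end
joined = list_a + list_b

# On arrays created from the same lists, '+' adds element by element
summed = np.array(list_a) + np.array(list_b)

print(joined)            # [1, 2, 3, 4, 5, 6]
print(summed.tolist())   # [5, 7, 9]
```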
Creating NumPy Arrays
There are two ways to create NumPy arrays, which are mentioned below.
By converting the existing lists or tuples to arrays using np.array
By initialising fixed-length arrays using the NumPy functions
In this session, you will learn about both these methods.
The key advantage of using NumPy arrays over lists is that arrays allow you to operate over
the entire data, unlike lists. However, in terms of structure, NumPy arrays are extremely similar
to lists. If you try to run the print() command over a NumPy array, then you will get the
following output:
[element_1 element_2 element_3…]
The only visible difference between a printed NumPy array and a printed list is that the
elements in the NumPy array are separated by a space instead of a comma. This is purely an
aesthetic feature that differentiates the output of a list and a NumPy array.
An important point to note here is that the array given above is a one-dimensional array. You
will learn about multidimensional arrays in the subsequent segments.
Another feature of NumPy arrays is that they are homogeneous in nature. By homogeneous, we
mean that all the elements in a NumPy array have to be of the same data type, which could be
an integer, float, string, etc. The quiz below will help you understand the homogeneity of
NumPy arrays a bit better.
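A small sketch of what homogeneity means in practice: when the input list mixes types, NumPy converts every element to one common data type (the sample values are illustrative):

```python
import numpy as np

# Integers mixed with a float: every element becomes a float
numbers = np.array([1, 2, 3.5])

# A number mixed with a string: every element becomes a string
mixed = np.array([1, "two"])

print(numbers.dtype)       # float64
print(numbers.tolist())    # [1.0, 2.0, 3.5]
print(mixed.dtype.kind)    # U (a Unicode string type)
print(mixed[0])            # 1, now stored as the string '1'
```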
In the next segment, you will learn about the different operations that can be performed on one-
dimensional NumPy arrays.
Introduction to NumPy

Welcome to this module on Python for Data Science.


In this module
You will learn about the two most important and popular Python libraries for handling data:
NumPy and Pandas. Let's hear from our SME, Behzad Ahmadi, who will explain the contents
of the module.
In this module, you will learn about the basics of NumPy, which is the fundamental package
for scientific computing in Python. NumPy provides a robust data structure called the
multidimensional array. Pandas is another powerful Python library that provides a fast and
easy-to-use data analysis platform.
Important Note
To enhance the learning outcome, you are expected to pause the videos and code along with
the instructor. You will be provided with a structured and blank IPython notebook for coding.
It is a must for you to answer certain in-segment questions, as they serve the purpose of
practice, and the final notebook will have practice tasks for you to solve.
Prerequisite for This Module
The learners are expected to have gone through the module on ‘Introduction to Python’ before
beginning with this module.
In this session
Behzad will talk about the scope of the session.
You will understand the advantages of using NumPy. You will also learn how to:
Create NumPy arrays,
Convert lists and tuples to NumPy arrays,
Inspect the structure and content of arrays, and
Subset, slice, index, and iterate through arrays.
Before we get into the technicalities of a NumPy array, explore its useful functions and
understand how it is implemented in Python, it is crucial to understand why NumPy is an
important library for working with data.
NumPy, short for ‘Numerical Python’, is a library in Python which is used
extensively for efficient mathematical computing. This library allows users to store large
amounts of data using less memory and perform extensive operations efficiently. It provides
optimised and simpler functionalities to perform the aforementioned operations using
homogeneous, one-dimensional and multidimensional arrays (you will learn more about this
later).
Now, before delving deep into the concept of NumPy arrays, it is important to note that Python
lists can very well perform all the actions that NumPy arrays perform; it is simply the fact that
NumPy arrays are faster and more convenient than lists when it comes to extensive
computations, which makes them extremely useful, especially when you are working with large
amounts of data.
Guidelines for coding console questions
The lectures are interspersed with coding consoles to help you practise writing Python code.
You will be given a brief problem statement and some pre-written code. You can write the code
in the provided space, verify your answer using test cases and submit it when you are confident
about it.
Note that the coding console questions are non-graded. Some instructions for these questions
are as follows:
Ignore the pre-written code on the console. Please do not change it.
Write your answer only in the space where you are asked to write.
You may run and verify your code any number of times.
People you will hear from in this session
Faculty
Behzad Ahmadi
Data Scientist at Walmart Labs
Behzad is a Doctor of Philosophy (PhD) in Electrical and Computer Engineering;
Communication and Signal Processing from the New Jersey Institute of Technology. He has
been working in the software engineering and data science field for more than 12 years. Behzad
currently employs his machine learning skillset to create retail graphs for Walmart Labs.
Practice Questions

Let's check your knowledge of recursive programming by predicting the outputs of some
recursive functions below. Please note that these questions are just meant for practice; they are
not graded.
Let's practice recursive programming through a coding exercise as well.
Fibonacci

Fibonacci numbers are quite interesting and have a lot of significance in mathematics and
computer science in general. You can read more about them here.
In the previous module, you wrote a piece of code to find Fibonacci series using loops. You
can refer to the segment here. Let's try to do it using recursion now.
Now, try to find the nth Fibonacci number using recursion.
Using recursion carelessly here has huge costs. Let's say you call fib(5). This will in turn call
fib(4) and fib(3), and fib(4) will further call fib(3) and fib(2). Do you realise the problem now?
You are evaluating fib(3) again and again, and even more so fib(2) and fib(1), which wastes
precious computing time. It also wastes memory: fib(5) cannot return until fib(4) and fib(3)
give their outputs, so it keeps occupying memory while it waits; likewise, fib(4) cannot return
until it has evaluated fib(3) and fib(2), and so on. The amount of memory taken up will also
go up if we call fib() on a larger number.
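The repeated evaluations described above can be made visible with a minimal sketch. It assumes the convention fib(0) = 0 and fib(1) = 1, and the calls counter is added purely for illustration:

```python
from collections import Counter

# Counts how many times fib() is evaluated for each argument
calls = Counter()

def fib(n):
    """Naive recursive Fibonacci, assuming fib(0) = 0 and fib(1) = 1."""
    calls[n] += 1
    if n < 2:                        # base cases: fib(0) and fib(1)
        return n
    return fib(n - 1) + fib(n - 2)   # two recursive calls per evaluation

result = fib(5)
print(result)                               # 5
print(calls[3], calls[2], calls[1])         # 2 3 5
print(sum(calls.values()))                  # 15 calls in total
```

Even for fib(5), fib(3) is evaluated twice, fib(2) three times and fib(1) five times; these counts grow rapidly as n increases.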
One way to solve this is by making a global list that will save all Fibonacci numbers calculated.
You will calculate a new Fibonacci number only if it is not already evaluated. This type of
approach is called Dynamic Programming. We will not be going in-depth on Dynamic
Programming keeping in mind that it is a fairly complex concept for this beginner stage. The
key takeaway, however, is to remember and use recursion carefully since it has this huge space
and time complexity associated with it and can backfire if not used carefully.
You may read about the Dynamic Programming solution for Fibonacci here.
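As a minimal sketch of the global-list idea described above (one possible way to write it, assuming fib(0) = 0 and fib(1) = 1; the names memo and fib_memo are illustrative):

```python
# Global list of Fibonacci numbers calculated so far; memo[i] holds fib(i)
memo = [0, 1]

def fib_memo(n):
    """Return the nth Fibonacci number, reusing values that are already stored."""
    if n < len(memo):            # already evaluated: just look it up
        return memo[n]
    # fib_memo(n - 1) fills the list up to index n - 1 first,
    # so the second call is only a lookup
    value = fib_memo(n - 1) + fib_memo(n - 2)
    memo.append(value)           # the new value lands at index n
    return value

print(fib_memo(10))   # 55
print(fib_memo(50))   # 12586269025
```

Each Fibonacci number is now calculated only once, so the number of calls grows linearly with n instead of exploding.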
Printing

Having seen how to break down a recursive function, can you try and think of a recursive function
that will perform the task as explained in the question below?
Try to think in terms of the three components discussed in the previous segment.
Here, Sajan explains the same problem in terms of the three components described in the
previous segment: base case, recursive call and action.
This type of approach is always going to work for all kinds of recursive functions. Even the
most complex recursive function will boil down to these three components as discussed. But
this does not mean you should use recursion very often just because it is easy to understand,
code and implement.
There is always a huge cost in terms of space and time complexities if recursion is not used
carefully. This is a double-edged sword. Let's learn more about it in the next problem.
Factorial

You have already studied recursive functions in the previous module. In this segment, we will
quickly revisit the same topic. We will then try to break down the components of a recursive
function and give some structure to it. This should make algorithms involving recursive functions
easier to think of and implement.
Sajan will explain some things that you should keep in mind whenever you try to solve a
problem using recursion.
The example we will take to break down recursion is the factorial problem that you all have
solved in the previous module. There, you used loops to solve the problem; in this case, we will
use recursion to solve the same problem. Let's see how Sajan breaks down recursion.
As you saw, any recursive algorithm consists of three major components:
Base case
Recursive call
Action
If you only make a recursive call, the function will keep calling itself again and again infinitely.
This is where the base case comes into action: it determines the point at which the function
stops calling itself. In the example taken, fact(1) will call 1*fact(0), and fact(0) will not make a
recursive call but will simply return 1. The action is always written assuming that the recursive
call will give the correct output. You just take a leap of faith that fact(n-1) will return (n-1)!
and write the action, that is, multiplying by n.
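The three components can be sketched in code as follows. This is a minimal version that assumes n is a non-negative integer; it is not necessarily the exact code written in the video:

```python
def fact(n):
    """Return n! for a non-negative integer n."""
    # Base case: fact(0) stops the recursion and returns 1
    if n == 0:
        return 1
    # Recursive call: trust that fact(n - 1) returns (n - 1)!
    smaller = fact(n - 1)
    # Action: multiply the result of the recursive call by n
    return n * smaller

print(fact(1))   # 1
print(fact(5))   # 120
```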
Let's see Sajan implement these three steps in code.
Try and code this problem keeping in mind the three components of a recursive function.
In the next segment, you will solve an interesting problem that requires the application of
recursion.
Session Overview

In this session, you will start with a very basic problem and solve it using recursion. While
doing that, you will learn about three basic components of recursion and how to break down
any recursion problem or think of a recursion algorithm in a very easy and structured way.
Using that you will then solve a problem to improve your understanding of the same. In the
end, you will see how recursion has huge space complexity and time complexity associated
with it and why we have to be careful while using recursion. Let's begin.
Practice Questions

In the previous segment, you saw how you can use the two-pointer technique to sort a list of
0s and 1s. Let's put this new-found knowledge to action by solving a similar coding console
question below. Please note that this question is just meant for practice; it is not graded.
Let us solve another coding problem which is listed as follows.
Sort 0s and 1s

In the last session, you learnt about different sorting algorithms. Now, these algorithms are
crafted for a general case, wherein you are unsure about the type of numbers present in the list.
But in some cases, your list only has a specific set of numbers, say just 0 and 1.
When you need to sort in such cases, you can use your general programming knowledge to
solve the problem, since standard sorting algorithms do more work than is necessary here. For
example, a list containing only 0s and 1s can be sorted in just O(n) by traversing the list only
once. This can be achieved using a slight variation of the two-pointer technique that we just
saw. Take a look at the following algorithm to see how you can achieve this:
Initialise two variables 'i' and 'j' to 0, indicating that they are pointing at the first element of the
list.
'i' here will be used to travel through the list and 'j' will be pointing to the first occurrence of 1
in the list.
Run a loop from 0 to the (len(list)-1), using the variable i.
If the current element, i.e. list[i] is 0, swap list[i] and list[j] and increment the value of ‘j’ by 1
and also 'i' by 1.
If the list[i] is 1, increment only 'i' by 1.
Now, look at the code given below to understand the process further.
# Program to sort a list of 0s and 1s in one traversal of the list

# Given list of 0s and 1s
v = [0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
n = len(v)

# Initialise two variables 'i' and 'j' to 0, indicating that they are currently pointing at the
# first element in the list.
i = 0
j = 0

# Run a loop from 0 to n - 1 (the last index of the list), with the variable 'i'
for i in range(n):
    # If you encounter a zero, swap the values of v[i] and v[j] and increment 'j' as well. 'i'
    # anyway gets incremented with every iteration of the loop. Think about it. This way, 'j' will
    # always point at the first '1' that hasn't been sorted. Swapping the values of v[i]
    # and v[j] moves the 0 ahead of the 1s that come before it. If v[j] is pointing at zero,
    # swapping won't matter anyway.
    if v[i] == 0:
        v[i], v[j] = v[j], v[i]
        j = j + 1

# Print the sorted list
print(v)
In the next segment, you will start solving some practice questions based on what you have
learnt in this session so far.
Specific Sum

Given a list of integers and a number, you have to find two integers in the list whose sum is
equal to the number given.
Normally, you would check for all the pairs in the list, but this will take a lot of time. Let’s
hear Sajan explain this.
Now, that is a very inefficient algorithm; we can do much better than that. Let’s see if we can
do better than O(n^2).
You just saw how using two iterators or pointers cleverly makes the algorithm more efficient
sometimes. Can you try and convert this logic into a working code?
Let's see Sajan convert this logic into code below.
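A minimal sketch of one two-pointer approach to this problem. It assumes that the list may be sorted first and that returning the two values (rather than their positions) is enough; the function name find_pair_with_sum is illustrative, and the code in the video may differ:

```python
def find_pair_with_sum(numbers, target):
    """Return two values from numbers that add up to target, or None."""
    ordered = sorted(numbers)            # assumption: sorting is allowed
    left, right = 0, len(ordered) - 1    # one pointer at each end

    while left < right:
        total = ordered[left] + ordered[right]
        if total == target:
            return ordered[left], ordered[right]
        if total < target:
            left += 1     # sum too small: move to a larger left value
        else:
            right -= 1    # sum too large: move to a smaller right value
    return None           # no pair adds up to target

print(find_pair_with_sum([8, 1, 5, 3, 11], 9))   # (1, 8)
print(find_pair_with_sum([1, 2, 3], 10))         # None
```

After sorting, which takes O(n log n), the two pointers together cross the list only once, so the search itself is O(n) instead of checking all O(n^2) pairs.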
It need not always be just two pointers; you may have three or even four. You learnt about
sorting in the previous session. We saw how we can sort in O(n log n) and not any better.
But a few lists are special. Let’s see them in the next segment and understand
what is so special about them.
Merge Sorted

While learning merge sort, we assumed we could merge two sorted lists in linear time
complexity. Let’s learn how to perform that task.
Here you saw how to merge the sorted lists in linear time. Can you try to code the same
below?
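A minimal sketch of merging two sorted lists in linear time with one pointer per list (the function name merge_sorted is illustrative):

```python
def merge_sorted(first, second):
    """Merge two lists that are already sorted in ascending order."""
    merged = []
    i = j = 0   # i points into first, j points into second

    # Repeatedly take the smaller of the two elements being pointed at
    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            merged.append(first[i])
            i += 1
        else:
            merged.append(second[j])
            j += 1

    # One list is exhausted; whatever remains in the other is already sorted
    merged.extend(first[i:])
    merged.extend(second[j:])
    return merged

print(merge_sorted([1, 4, 7], [2, 3, 9, 10]))   # [1, 2, 3, 4, 7, 9, 10]
```

Every element is appended exactly once, so the running time is proportional to the combined length of the two lists.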
Earlier, in most cases, we would iterate the loop using only one pointer or an iterator. You
also attempted a few questions where having more than one iterator sometimes helped in
saving time. For example, when you tried to code a program to find the second smallest
element from an integer list using two pointers, one iterator pointed towards the smallest
value and the other towards the second smallest value. This saved the time that would have been
required for a second iteration, thereby resulting in more efficient performance.
Another example we saw was in the case of binary search, where we reduced the search time
complexity from O(n) to O(log n) using two pointers (left and right) instead of only one as
used in a linear search.
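As a reminder of the second-smallest idea mentioned above, here is a minimal sketch that tracks both values in a single pass. It assumes the list holds at least two distinct values; the function name is illustrative:

```python
def second_smallest(numbers):
    """Return the second smallest distinct value in one traversal."""
    smallest = second = float("inf")
    for value in numbers:
        if value < smallest:
            # New minimum: the old minimum becomes the second smallest
            second = smallest
            smallest = value
        elif smallest < value < second:
            # Falls between the two tracked values
            second = value
    return second

print(second_smallest([4, 2, 9, 1, 7]))   # 2
```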
Let's see one more example where having two pointers instead of one can help us solve the
question in a much more efficient way.
