Solutions to the Exercises (AI Python)

The document provides solutions to the exercises from Modules 1-6 of a course. Key solutions include: 1) using list comprehensions and sets to merge and sort lists, filter lists based on conditions, and concatenate dictionaries; 2) applying pandas to load, clean, and analyze stock market data, including correcting date formats and data types; 3) implementing logic puzzles and knowledge graphs using Prolog-style rules to infer relationships and solve puzzles; 4) building a family-tree knowledge graph from a JSON file and answering queries about relationships through inference; 5) implementing iterative deepening search by adapting depth-first search and comparing its performance to other search algorithms on graphs.


Solutions to the Exercises of Module 1

Exercise 1.2: Take the following list and write a program that
prints out all the elements of the list that are less than 5.
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

b = [num for num in a if num < 5]


print(b)

Exercise 1.3: Take the following two lists and write a program that
returns a list that contains all elements of the lists (without
duplicates). Make sure your program works on two lists of
different sizes. Moreover, try to find a 1-line-solution (using sets).
import random
a = random.sample(range(1, 100), 5)
b = random.sample(range(1, 100), 12)
print(a)
print(b)
# function to merge and sort two lists
def merge_and_sort_lists(x, y):
    merged_list = list(set(x + y))
    merged_list.sort()
    return merged_list

print(merge_and_sort_lists(a,b))
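
The 1-line solution hinted at in the task is simply the sorted set union:

# 1-line variant using set union
print(sorted(set(a) | set(b)))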

Exercise 1.5: Write a Python program to concatenate the following dictionaries to create a new one.
dic1={1:10, 2:20}
dic2={3:30, 4:40}
dic3={5:50, 6:60}

dic4 = {}
for x in (dic1, dic2, dic3):
    dic4.update(x)
print(dic4)

Exercise 1.7: Generate a random number between 1 and 15 (including 1 and 15). Ask the user to guess the number, then tell him/her whether he/she guessed too low, too high, or exactly right. Remark: Import and use the random library.
# generate a random number
import random
number = random.randint(1,15)
# remark: user input can be captured using the command input() --> n=int(input())
guess = -1
while guess != number:
    guess = int(input())
    if guess < number:
        print("too low")
    elif guess > number:
        print("too high")
    else:
        print("exactly right")

Exercise 2.1: Load the data-set '02_dow_jones_index.data' using pandas ('02_dow_jones_index.names' contains a description of the data).
import pandas as pd

df = pd.read_csv('02_dow_jones_index.data')

Exercise 2.2: Validate the numerical columns in the following ways: (a) list all values of the columns with non-numerical values and their occurrences (use a dictionary); (b) convert share prices to floats (spoiler alert: df['open'] = df['open'].str.strip('$').astype('float64') ); (c) look at the dispersion of the columns with numerical values (7-number-summary and/or boxplot).
df['open'].describe()
# convert share price to float
df['open'] = df['open'].str.strip('$').astype('float64')

df['open'].describe()
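
Part (a) is not shown above; a minimal sketch, assuming df as loaded in Exercise 2.1:

# (a) collect the values of the non-numerical (object-typed) columns
# and their occurrences in a dictionary
non_numerical = df.select_dtypes(include='object')
occurrences = {col: non_numerical[col].value_counts().to_dict()
               for col in non_numerical.columns}
print(occurrences)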

Exercise 2.3: Check if all rows are in the proper chronological order and fix the order if necessary. Correct the date (spoiler alert: df['date_corr'] = pd.to_datetime(df['date']) ) and plot the chart of the closing stock price for the Cisco share.
# correct a date
df['date_corr'] = pd.to_datetime(df['date'])
# show some entries
df.head()
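
The chronological check itself is not shown; a minimal sketch (for a per-stock check, one would group by stock first):

# check whether the corrected dates are already sorted; if not, sort them
if not df['date_corr'].is_monotonic_increasing:
    df = df.sort_values('date_corr').reset_index(drop=True)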

# also fix the closing price
df['close'] = df['close'].str.strip('$').astype('float64')
df.describe()

df[df.stock=='CSCO'].plot(x = 'date_corr', y = 'close', title = 'Closing price of Cisco')


Solutions to the Exercises of Module 2
Exercise 3.3: Solve the following questions by creating inference
rules on this logic-based knowledge base: (a) Which kind of drink
does the Englishman prefer? (b) Who drinks beer? (c) Which
colors do the houses have, from left to right?
Given the solutions variable from the notebook that was shown in the lecture, the three
questions can be answered as follows:

# Which kind of drink does the Englishman prefer?
output_drink = [house for house in solutions[0] if 'Englishman' in house][0][2]
print('\n' + 'Englishman prefers ' + output_drink)

# Who drinks beer?
output_beer = [house for house in solutions[0] if 'beer' in house][0][0]
print('\n' + output_beer + ' drinks beer.')

# Which colors do the houses have, from left to right?
output_zebra = [house for house in solutions[0]]
for house in output_zebra:
    print(house[4])
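
Note that the indices used above rely on the house-tuple layout from the lecture notebook: position 0 holds the nationality, position 2 the drink, and position 4 the house color. The exact ordering is an assumption here; adjust the indices if your notebook orders the attributes differently.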

Exercise 3.4: Solve the following puzzle! (as described in the notebook)
# import packages
from kanren import *
from kanren.core import lall

# Declare the variable
people = var()

# Define the rules
rules = lall(
    # There are 4 people
    (eq, (var(), var(), var(), var()), people),
    # Steve's car is blue
    (membero, ('Steve', var(), 'blue', var()), people),
    # The person who has a cat lives in Canada
    (membero, (var(), 'cat', var(), 'Canada'), people),
    # Matthew lives in the USA
    (membero, ('Matthew', var(), var(), 'USA'), people),
    # The person who has a black car lives in Australia
    (membero, (var(), var(), 'black', 'Australia'), people),
    # Jack has a cat
    (membero, ('Jack', 'cat', var(), var()), people),
    # Alfred lives in Australia
    (membero, ('Alfred', var(), var(), 'Australia'), people),
    # The person who owns the dog lives in France
    (membero, (var(), 'dog', var(), 'France'), people),
    # Who has a rabbit?
    (membero, (var(), 'rabbit', var(), var()), people)
)

# Run the solver
solutions = run(0, people, rules)

# extract the output from the solver
output_zebra = [people for people in solutions[0] if 'rabbit' in people][0][0]

print('\n' + output_zebra + ' owns a rabbit.')

Exercise 4.1: Build up the following family tree given the following
description. Define the following relationships (always check if 'x'
is related to 'y', e.g. is 'x' the father of 'y'): father(x,y) and
mother(x,y) as basic relationships; parent(x,y), grandparent(x,y),
sibling(x,y), uncleOrAunt(x,y).
# import libraries
import json
from kanren import Relation, facts, run, eq, membero, var, conde

# read in data from a JSON file (or create the knowledge base manually...)
with open('04_relationships.json') as f:
    d = json.loads(f.read())

# Check if 'x' is the parent of 'y'
def parent(x, y):
    return conde([father(x, y)], [mother(x, y)])

# Check if 'x' is the grandparent of 'y'
def grandparent(x, y):
    temp = var()
    return conde((parent(x, temp), parent(temp, y)))

# Check for a sibling relationship between 'x' and 'y'
def sibling(x, y):
    temp = var()
    return conde((parent(temp, x), parent(temp, y)))

# Check if 'x' is y's uncle or aunt
def uncleOrAunt(x, y):
    temp = var()
    return conde((sibling(temp, x), parent(temp, y)))

# Now define the core relations father and mother:
father = Relation()
mother = Relation()

for item in d['father']:
    facts(father, (list(item.keys())[0], list(item.values())[0]))

for item in d['mother']:
    facts(mother, (list(item.keys())[0], list(item.values())[0]))
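
The loops above assume that 04_relationships.json stores each relation as a list of single-pair objects; a hypothetical excerpt matching that access pattern (the names here are purely illustrative):

{
    "father": [{"John": "William"}, {"John": "David"}],
    "mother": [{"Emma": "William"}, {"Emma": "David"}]
}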

Exercise 4.2: Answer these questions through inferences on your knowledge base! (question given in the notebook)
# define variable for queries...
x = var()

# John's children
name = 'John'
output = run(0, x, father(name, x))
print("\nList of " + name + "'s children:")
for item in output:
    print(item)

# William's mother
name = 'William'
output = run(0, x, mother(x, name))[0]
print("\n" + name + "'s mother:\n" + output)

# Adam's parents
name = 'Adam'
output = run(0, x, parent(x, name))
print("\nList of " + name + "'s parents:")
for item in output:
    print(item)

# Wayne's grandparents
name = 'Wayne'
output = run(0, x, grandparent(x, name))
print("\nList of " + name + "'s grandparents:")
for item in output:
    print(item)

# David's siblings
name = 'David'
output = run(0, x, sibling(x, name))
siblings = [x for x in output if x != name]
print("\nList of " + name + "'s siblings:")
for item in siblings:
    print(item)

# Tiffany's uncles and aunts
name = 'Tiffany'
name_father = run(0, x, father(x, name))[0]
name_mother = run(0, x, mother(x, name))[0]
output = run(0, x, uncleOrAunt(x, name))
output = [x for x in output if x != name_father and x != name_mother]
print("\nList of " + name + "'s uncles or aunts:")
for item in output:
    print(item)

# All spouses
a, b, c = var(), var(), var()
output = run(0, (a, b), (father, a, c), (mother, b, c))
print("\nList of all spouses:")
for item in output:
    print('Husband:', item[0], '<==> Wife:', item[1])

Solutions to the Exercises of Module 3


Exercise 5.1: Implement the Iterative Deepening algorithm by
adapting the dfs function and run it on the test graph. Compare
the debug output with the other search techniques.
# depth-first implementation with border (limit for depth)
def dfs(graph, start, goal, border):
    visited = set()
    stack = [start]
    # iterate through the tree until depth 'border'
    counter = 0
    while counter < border:
        # iterate through the stack and check all nodes
        new_stack = []
        while stack:
            node = stack.pop()
            #print("DFS - Checking node", node, "; Visited =", visited)
            if node not in visited:
                visited.add(node)
                if node == goal:
                    return 1
                for neighbor in graph[node]:
                    if neighbor not in visited:
                        new_stack.append(neighbor)
        stack = new_stack
        #print("DFS - Checked node", node, "; Visited =", visited, "; Stack =", stack)
        counter += 1
    return 0

def iterdeep(graph, start, goal):
    print("Iter.Deep. - Goal", goal)
    depth = 1
    while True:
        if dfs(graph, start, goal, depth) == 1:
            return depth
        depth += 1
        print("Iter.Deep. - Goal not found, increasing depth to", depth)

Exercise 5.2: Run the four algorithms on the following graph (due
to the visited flag the search techniques also work on graphs).
Analyze runtime and memory usage for the three uninformed
search algorithms.
Only shown for Depth First Search! Can be applied to the other search algorithms
accordingly...

# import matplotlib
import matplotlib.pyplot as plt

# if necessary: > pip install networkx
import networkx as nx

# let's play with a classic Barabasi graph :-)

ba = nx.barabasi_albert_graph(100, 5)

# package and variable for memory stats...
import os
import psutil

# return the memory usage in MB using psutil
def memory_usage_psutil():
    process = psutil.Process(os.getpid())
    mem = process.memory_info()[0] / float(2 ** 20)
    return mem

# let's evaluate DFS first
mem_psutil = []

def dfs(graph, start, goal):
    visited = set()
    stack = [start]
    # iterate through the stack
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.add(node)
            mem_psutil.append(memory_usage_psutil())
            if node == goal:
                return
            for neighbor in graph[node]:
                if neighbor not in visited:
                    stack.append(neighbor)

# we use the adjacency view here, so the algorithms
# have to be adapted slightly
search_tree = ba.adj
# run DFS on our test graph (%time for one iteration, %timeit for 1000)
%timeit dfs(search_tree, 1, 95)

import matplotlib.pyplot as plt
%matplotlib inline
plt.title('memory usage')
plt.plot(mem_psutil)
plt.show()

Exercise 5.3: Extend the UCS algorithm to a real heuristic search technique by using the costs to the goal vertex (remark: it is necessary to extend the data model, pre-define these costs and provide a method to return the costs-to-goal). Try to implement greedy search and the A* algorithm.
The trick is to add a heuristic function to estimate the costs to the goal - in this example the following is used: (ord(goal)-ord(i))/len(graph.edges)

# from UCS to A*
class Graph:
    def __init__(self):
        self.edges = {
            'A': ['B', 'C'],
            'B': ['D', 'A'],
            'C': ['A'],
            'D': ['B', 'E'],
            'E': []
        }
        self.weights = {
            'AB': 0.4,
            'AC': 0.6,
            'BA': 0.2,
            'BD': 0.8,
            'DE': 0.5
        }

    def neighbors(self, node):
        return self.edges[node]

    def get_cost(self, from_node, to_node):
        print('costs from last node', from_node, 'to', to_node, ':',
              self.weights[from_node + to_node])
        return self.weights[from_node + to_node]

# moreover, the PriorityQueue from the queue package
from queue import PriorityQueue

def a_star(graph, start, goal):
    print("A* - Goal", goal)
    visited = set()
    queue = PriorityQueue()
    queue.put((0, start))
    while not queue.empty():
        cost, node = queue.get()
        print("A* - Checking node", node, "Cost =", cost, "Visited =", visited)
        if node not in visited:
            visited.add(node)
            if node == goal:
                return
            for i in graph.neighbors(node):
                if i not in visited:
                    # remark: the heuristic here is the difference between the keys of the nodes!
                    print("estimated cost to goal:", (ord(goal) - ord(i)) / len(graph.edges))
                    total_cost = cost + graph.get_cost(node, i) + (ord(goal) - ord(i)) / len(graph.edges)
                    queue.put((total_cost, i))

# run a_star on our test graph
sg1 = Graph()
a_star(sg1, 'A', 'E')
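
Only the A* variant is shown above; greedy (best-first) search, which the task also asks for, differs only in the priority. A minimal sketch under the same assumptions, ordering the queue by the heuristic estimate alone instead of cost-so-far plus heuristic:

def greedy(graph, start, goal):
    # greedy best-first search: the priority is the heuristic estimate only
    visited = set()
    queue = PriorityQueue()
    queue.put((0, start))
    while not queue.empty():
        _, node = queue.get()
        print("Greedy - Checking node", node, "Visited =", visited)
        if node not in visited:
            visited.add(node)
            if node == goal:
                return
            for i in graph.neighbors(node):
                if i not in visited:
                    queue.put(((ord(goal) - ord(i)) / len(graph.edges), i))

greedy(sg1, 'A', 'E')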

Exercise 6.1: Alter the initial state of the 8-puzzle and the maze -
experiment with the two solvers! Find a configuration for each
example for which the task is not solvable. What happens?
Such scenarios simply lead to an error...

# Maze solver example: no solution possible

# we need the math and the simpleai package
import math
from simpleai.search import SearchProblem, astar

# Class containing the methods to solve the maze
class MazeSolver(SearchProblem):
    # Initialize the class
    def __init__(self, board):
        self.board = board
        self.goal = (0, 0)

        for y in range(len(self.board)):
            for x in range(len(self.board[y])):
                if self.board[y][x].lower() == "o":
                    self.initial = (x, y)
                elif self.board[y][x].lower() == "x":
                    self.goal = (x, y)

        super(MazeSolver, self).__init__(initial_state=self.initial)

    # Define the method that takes actions
    # to arrive at the solution
    def actions(self, state):
        actions = []
        for action in COSTS.keys():
            newx, newy = self.result(state, action)
            if self.board[newy][newx] != "#":
                actions.append(action)

        return actions

    # Update the state based on the action
    def result(self, state, action):
        x, y = state

        if action.count("up"):
            y -= 1
        if action.count("down"):
            y += 1
        if action.count("left"):
            x -= 1
        if action.count("right"):
            x += 1

        new_state = (x, y)

        return new_state

    # Check if we have reached the goal
    def is_goal(self, state):
        return state == self.goal

    # Compute the cost of taking an action
    def cost(self, state, action, state2):
        return COSTS[action]

    # Heuristic that we use to arrive at the solution
    def heuristic(self, state):
        x, y = state
        gx, gy = self.goal

        return math.sqrt((x - gx) ** 2 + (y - gy) ** 2)

# Define the map
MAP = """
##############################
# # # #
# #### ######## # #
# o# # # #
# ### ##### ###### #
# # ### # # #
# # # # # # ###
# ##### # # # x #
# # # #
##############################
"""

# Convert the map to a list
print(MAP)
MAP = [list(x) for x in MAP.split("\n") if x]

# Define the cost of moving around the map
cost_regular = 1.0
cost_diagonal = 1.7
# Create the cost dictionary
COSTS = {
"up": cost_regular,
"down": cost_regular,
"left": cost_regular,
"right": cost_regular,
"up left": cost_diagonal,
"up right": cost_diagonal,
"down left": cost_diagonal,
"down right": cost_diagonal,
}

# Create the maze solver object
problem = MazeSolver(MAP)

# Run the solver
result = astar(problem, graph_search=True)

# Extract the path
path = [x[1] for x in result.path()]

# Print the starting state and the result
print()
for y in range(len(MAP)):
    for x in range(len(MAP[y])):
        if (x, y) == problem.initial:
            print('o', end='')
        elif (x, y) == problem.goal:
            print('x', end='')
        elif (x, y) in path:
            print('·', end='')
        else:
            print(MAP[y][x], end='')
    print()

Solutions to the Exercises of Module 4


Exercise 7.1: Calculate P(A|B) for the given scenario!
# calculate the probability of cancer patient and diagnostic test

# calculate P(A|B) given P(A), P(B|A), P(B|not A)
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    # calculate P(not A)
    not_a = 1 - p_a
    # calculate P(B)
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    # calculate P(A|B)
    p_a_given_b = (p_b_given_a * p_a) / p_b
    return p_a_given_b

# P(A)
p_a = 0.0002
# P(B|A)
p_b_given_a = 0.85
# P(B|not A)
p_b_given_not_a = 0.05
# calculate P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_b_given_not_a)
# summarize

print('P(A|B) = %.3f%%' % (result * 100))
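
As a quick sanity check, the numbers can be plugged in by hand: P(A|B) = (0.85 * 0.0002) / (0.85 * 0.0002 + 0.05 * 0.9998) = 0.00017 / 0.05016 ≈ 0.00339, so the program prints P(A|B) = 0.339%.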

Exercise 7.2: Consider additional information about phone usage for this scenario, as given with the likelihoods below! (the MCMC model from above can be reused)
# Numpy and pandas for data manipulation
import numpy as np
import pandas as pd

# Matplotlib for visualization
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline

# Logistic parameters from the Markov Chain Monte Carlo notebook
alpha = 0.977400
beta = -0.067270

def calculate_prior(time, alpha, beta):
    p = 1.0 / (1.0 + np.exp(np.dot(beta, time) + alpha))
    return p

# likelihoods

# P(light | sleep)
light_sleep = 0.01
# P(-light | sleep)
nolight_sleep = 0.99
# P(light | -sleep)
light_nosleep = 0.8
# P(-light | -sleep)
nolight_nosleep = 0.2

# P(phone | sleep)
phone_sleep = 0.95
# P(-phone | sleep)
nophone_sleep = 0.05
# P(phone | -sleep)
phone_nosleep = 0.25
# P(-phone | -sleep)

nophone_nosleep = 0.75

def add_update_probability(time_offset, light, phone):
    # Calculate the prior for the time
    prior_probability = calculate_prior(time_offset, alpha, beta)

    # Account for evidence
    if light == 0:
        l_likelihood = nolight_sleep
        l_non_likelihood = nolight_nosleep
        l_status = 'OFF'
    elif light == 1:
        l_likelihood = light_sleep
        l_non_likelihood = light_nosleep
        l_status = 'ON'

    if phone == 0:
        p_likelihood = nophone_sleep
        p_non_likelihood = nophone_nosleep
        p_status = 'NOT charging'
    elif phone == 1:
        p_likelihood = phone_sleep
        p_non_likelihood = phone_nosleep
        p_status = 'charging'

    numerator = l_likelihood * p_likelihood * prior_probability
    denominator = (l_likelihood * p_likelihood * prior_probability) + \
                  (l_non_likelihood * p_non_likelihood * (1 - prior_probability))

    conditional_probability = numerator / denominator

    # the prints only make sense for a single (integer) time offset,
    # not for the array inputs used for plotting below
    if type(time_offset) == int:
        time = pd.Timestamp(2017, 1, 1, 10, 0, 0)
        new_time = str((time + pd.DateOffset(minutes=time_offset)).time())

        print('Time is {} PM \tLight is {} \tPhone IS {}.'.format(
            new_time, l_status, p_status))
        print('\nThe prior probability of sleep: {:.2f}%'.format(100 * prior_probability))
        print('The updated probability of sleep: {:.2f}%'.format(100 * conditional_probability))

    return conditional_probability

# some sample probabilities (baseline is 10pm)
result = add_update_probability(-15, 1, 1)
result = add_update_probability(-15, 1, 0)
result = add_update_probability(60, 1, 0)
result = add_update_probability(15, 0, 1)

# Data formatted in a different notebook
sleep_data = pd.read_csv('07_sleep_data.csv')

# Labels for plotting
sleep_labels = ['9:00', '9:30', '10:00', '10:30', '11:00', '11:30', '12:00']

# Sort the values by time offset
sleep_data.sort_values('time_offset', inplace=True)

# Time is the time offset
time = np.array(sleep_data.loc[:, 'time_offset'])
# Observations are the indicator
sleep_obs = np.array(sleep_data.loc[:, 'indicator'])

# Time values for probability prediction
time_est = np.linspace(time.min() - 5, time.max() + 5, 1000)[:, None]

# Probability at each time using the mean values of alpha and beta
sleep_est = calculate_prior(time_est, alpha, beta)

time_est = np.linspace(time.min() - 15, time.max() + 30, 1000)[:, None]


sleep_labels.append('00:30')

light_phone = add_update_probability(time_est, 1, 1)
nolight_phone = add_update_probability(time_est, 0, 1)
light_nophone = add_update_probability(time_est, 1, 0)
nolight_nophone = add_update_probability(time_est, 0, 0)

from IPython.core.pylabtools import figsize

figsize(18, 8)

plt.plot(time_est, sleep_est, color='black',
         lw=3, linestyle='--', label="Prior Probability")
plt.plot(time_est, light_phone, color = 'orange', lw = 4,
label = 'Light ON Phone Charging')
plt.plot(time_est, nolight_phone, color = 'darkmagenta', lw = 4,
label = 'Light OFF Phone Charging')
plt.plot(time_est, light_nophone, color = 'brown', lw= 4,
label = 'Light ON Phone NOT Charging')
plt.plot(time_est, nolight_nophone, color = 'darkblue', lw = 4,
label = 'Light OFF Phone NOT Charging')
plt.scatter(time, sleep_obs, edgecolor = 'slateblue',
s=50, alpha=0.2)
plt.legend(loc=2); plt.xlabel('Time'); plt.ylabel('Probability')
plt.title('Probability of Sleep with Light and Phone');
plt.xticks([-60, -30, 0, 30, 60, 90, 120, 150], sleep_labels);

Exercise 8.1: Make use of a Bayesian Network for the Monty Hall
problem!
# Import required packages;
# requires: pip install pomegranate
import math
from pomegranate import *

# Initially the door selected by the guest is completely random
guest = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})

# The door containing the prize is also a random process
prize = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})

# The door Monty picks, depends on the choice of the guest and the prize door
monty = ConditionalProbabilityTable(
[[ 'A', 'A', 'A', 0.0 ],
[ 'A', 'A', 'B', 0.5 ],
[ 'A', 'A', 'C', 0.5 ],
[ 'A', 'B', 'A', 0.0 ],
[ 'A', 'B', 'B', 0.0 ],
[ 'A', 'B', 'C', 1.0 ],
[ 'A', 'C', 'A', 0.0 ],
[ 'A', 'C', 'B', 1.0 ],
[ 'A', 'C', 'C', 0.0 ],
[ 'B', 'A', 'A', 0.0 ],
[ 'B', 'A', 'B', 0.0 ],
[ 'B', 'A', 'C', 1.0 ],
[ 'B', 'B', 'A', 0.5 ],
[ 'B', 'B', 'B', 0.0 ],
[ 'B', 'B', 'C', 0.5 ],
[ 'B', 'C', 'A', 1.0 ],
[ 'B', 'C', 'B', 0.0 ],
[ 'B', 'C', 'C', 0.0 ],
[ 'C', 'A', 'A', 0.0 ],
[ 'C', 'A', 'B', 1.0 ],
[ 'C', 'A', 'C', 0.0 ],
[ 'C', 'B', 'A', 1.0 ],
[ 'C', 'B', 'B', 0.0 ],
[ 'C', 'B', 'C', 0.0 ],
[ 'C', 'C', 'A', 0.5 ],
[ 'C', 'C', 'B', 0.5 ],
[ 'C', 'C', 'C', 0.0 ]], [guest, prize] )

d1 = State(guest, name="guest")
d2 = State(prize, name="prize")
d3 = State(monty, name="monty")

# Building the Bayesian Network
network = BayesianNetwork("Solving the Monty Hall Problem With Bayesian Networks")
network.add_states(d1, d2, d3)
network.add_edge(d1, d3)
network.add_edge(d2, d3)

network.bake()

# What are the odds (A, B, C) for both players if the guest decides for door B?
beliefs = network.predict_proba({'guest': 'B'})
beliefs = map(str, beliefs)
print("\n".join("{}\t{}".format(state.name, belief)
                for state, belief in zip(network.states, beliefs)))

# What are the odds (A, B, C) for both players if the guest decides for door A and Monty for door B?
beliefs = network.predict_proba({'guest': 'A', 'monty': 'B'})
print("\n".join("{}\t{}".format(state.name, str(belief))
                for state, belief in zip(network.states, beliefs)))
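
For the second query, the network should reproduce the classic Monty Hall result: the probability that the prize is behind the guest's door A stays at 1/3, door B (opened by Monty) drops to 0, and door C rises to 2/3 - so switching doubles the chance of winning.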

Exercise 8.2: Design a Bayesian Network and calculate the CPDs for the following scenario... (as given in the task description)
from pgmpy.models import BayesianModel

# this is how we realize the Bayesian network: raining -> parking space available <- working day
model = BayesianModel([('raining', 'available'), ('working', 'available')])

# plot it
import networkx as nx
%matplotlib inline

nx.draw(model)

# creative part: create data according to the task description
import pandas as pd
data = pd.DataFrame(data={
'raining': [True, True, True, False, False, False, False,
True, True, True, True, True, False, False,
True, True, True, False, False, False, False,
False, False, True, True, False, False, False,
True, True],
'available': [0.5, 0, 0, 0.5, 1, 1, 1,
0.5, 0, 0, 0.5, 0.5, 1, 1,
0.5, 0, 0, 0.5, 0.5, 1, 1,
0.5, 0, 0.5, 0.5, 1, 1, 1,
0.5, 0.5],
'working': [True, True, True, False, False, False, False,
True, True, True, True, True, False, False,
True, True, True, False, True, False, False,
True, True, True, True, True, False, False,
True, True]
})
print(data)

# create the Bayes estimator
from pgmpy.estimators import BayesianEstimator
est = BayesianEstimator(model, data)
print(est.estimate_cpd('available'))

Solutions to the Exercises of Module 5


Exercise 9.1: Build a k-Nearest-Neighbour (k=7) as well as a
Decision Tree classifier (max.depth=4) and train it with the
following data (X = input, y = labels, 70% of data for training).
Visualize the prediction models on the basis of the training and
test data-set. Calculate the accuracy of these two models.
# data-set for this exercise
import numpy as np
import matplotlib.pyplot as plt

def twospirals(n_points, noise=.5):
    """
    Returns the two spirals dataset.
    """
    n = np.sqrt(np.random.rand(n_points,1)) * 780 * (2*np.pi)/360
    d1x = -np.cos(n)*n + np.random.rand(n_points,1) * noise
    d1y = np.sin(n)*n + np.random.rand(n_points,1) * noise
    return (np.vstack((np.hstack((d1x,d1y)), np.hstack((-d1x,-d1y)))),
            np.hstack((np.zeros(n_points), np.ones(n_points))))

X, y = twospirals(1000)

plt.title('training set')
plt.plot(X[y==0,0], X[y==0,1], '.', label='hot dog')
plt.plot(X[y==1,0], X[y==1,1], '.', label='not hot dog')
plt.legend()
plt.show()

from sklearn import neighbors
from sklearn.model_selection import train_test_split

# split the data (70% training, 30% test)
train_data, test_data, train_labels, test_labels = train_test_split(
    X, y, test_size=0.3, random_state=51)

print("train-data: ", train_data.shape, ", train-labels: ", train_labels.shape)


print("test-data: ", test_data.shape, ", test-labels: ", test_labels.shape)

# we set k to 7
n_neighbors = 7

# we create an instance of the kNN Classifier and fit the data
knn_clf = neighbors.KNeighborsClassifier(n_neighbors, weights='distance')
knn_clf.fit(train_data, train_labels)

### visualize_classifier() as given by example!
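
visualize_classifier() itself comes from the lecture example and is not reproduced in these solutions; a minimal sketch, assuming the (classifier, data, labels, title, boundary) signature used in the calls below:

def visualize_classifier(classifier, data, labels, title, boundary):
    # evaluate the classifier on a mesh grid to draw the decision regions
    min_x, max_x, min_y, max_y = boundary
    x_vals, y_vals = np.meshgrid(np.arange(min_x, max_x, 0.1),
                                 np.arange(min_y, max_y, 0.1))
    output = classifier.predict(np.c_[x_vals.ravel(), y_vals.ravel()])
    output = output.reshape(x_vals.shape)
    plt.figure()
    plt.title(title)
    plt.pcolormesh(x_vals, y_vals, output, cmap=plt.cm.Paired, shading='auto')
    plt.scatter(data[:, 0], data[:, 1], c=labels, edgecolors='black', s=20)
    plt.xlim(min_x, max_x)
    plt.ylim(min_y, max_y)
    plt.show()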

# define the boundary for visualize_classifier
boundary = []
if len(X) > 0:
    boundary.append(X[:, 0].min() - 1.0)
    boundary.append(X[:, 0].max() + 1.0)
    boundary.append(X[:, 1].min() - 1.0)
    boundary.append(X[:, 1].max() + 1.0)

# visualize the kNN Classifier for the training data
visualize_classifier(knn_clf, train_data, train_labels,
                     "kNN Classifier / training data (k = %d, weights = 'distance')" % n_neighbors,
                     boundary)

# visualize the kNN Classifier for the test data
visualize_classifier(knn_clf, test_data, test_labels,
                     "kNN Classifier / test data (k = %d, weights = 'distance')" % n_neighbors,
                     boundary)

# Decision Tree classifier
from sklearn.tree import DecisionTreeClassifier

params = {'random_state': 0, 'max_depth': 4}
dt_clf = DecisionTreeClassifier(**params)
dt_clf.fit(train_data, train_labels)

# visualize the DT Classifier for the training data
visualize_classifier(dt_clf, train_data, train_labels,
                     "DT Classifier / training data (depth = 4)", boundary)

# visualize the DT Classifier for the test data
visualize_classifier(dt_clf, test_data, test_labels,
                     "DT Classifier / test data (depth = 4)", boundary)

Exercise 9.2: Train, evaluate and visualize a decision tree for the
following data-set.
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# data for this exercise
data = pd.DataFrame(
    {"toothed": ["True","True","True","False","True","True","True","True","True","False"],
     "hair": ["True","True","False","True","True","True","False","False","True","False"],
     "breathes": ["True","True","True","True","True","True","False","True","True","True"],
     "legs": ["True","True","False","True","True","True","False","False","True","True"],
     "species": ["Mammal","Mammal","Reptile","Mammal","Mammal","Mammal","Reptile",
                 "Reptile","Mammal","Reptile"]},
    columns=["toothed","hair","breathes","legs","species"])
features = data[["toothed","hair","breathes","legs"]]
target = data["species"]
data
from sklearn.tree import DecisionTreeClassifier

# some preprocessing (convert True to 1 and False to 0)
features = (features=="True").astype(int)
print(features)
print(target)

# Decision Tree classifier
params = {'random_state': 0, 'max_depth': 5}
classifier = DecisionTreeClassifier(**params)
classifier.fit(features, target)

classifier.predict(features)

# some more visualization (the decision tree itself)
# install if not available: > conda install python-graphviz
import graphviz
from sklearn import tree
dot_data = tree.export_graphviz(classifier, out_file=None,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph
Exercise 10.1: Apply the k-Means algorithm to the following data-set using several values for k (2,3,4,5,6,7).
# import libraries
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# data-set for this exercise
X, y_true = make_blobs(n_samples=500, centers=5,
                       cluster_std=0.60, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=50);
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import metrics
%matplotlib inline

for k in range(2, 8):
    num_clusters = k

    # Create the KMeans object
    kmeans = KMeans(init='k-means++', n_clusters=num_clusters, n_init=10)

    # Train the KMeans clustering model
    kmeans.fit(X)

    # Step size of the mesh
    step_size = 0.01

    # Define the grid of points to plot the boundaries
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    x_vals, y_vals = np.meshgrid(np.arange(x_min, x_max, step_size),
                                 np.arange(y_min, y_max, step_size))

    # Predict the output labels for all the points on the grid
    output = kmeans.predict(np.c_[x_vals.ravel(), y_vals.ravel()])

    # Plot the different regions and color them
    output = output.reshape(x_vals.shape)
    plt.figure()
    plt.clf()
    plt.imshow(output, interpolation='nearest',
               extent=(x_vals.min(), x_vals.max(),
                       y_vals.min(), y_vals.max()),
               cmap=plt.cm.Paired,
               aspect='auto',
               origin='lower')

    # Overlay the input points
    plt.scatter(X[:,0], X[:,1], marker='o', facecolors='none',
                edgecolors='black', s=80)

    # Plot the centers of the clusters
    cluster_centers = kmeans.cluster_centers_
    plt.scatter(cluster_centers[:,0], cluster_centers[:,1],
                marker='o', s=210, linewidths=4, color='black',
                zorder=12, facecolors='black')

    plt.title('Boundaries of clusters')
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.xticks(())
    plt.yticks(())
    print("\nk-Means Algorithm for k =", k)
    plt.show()

Exercise 10.2: Calculate the Silhouette score for the data-set of Exercise 10.1 and build a kMeans clustering model using the optimal k.
# Initialize variables
scores = []
values = np.arange(2, 10)

# Iterate through the defined range
for num_clusters in values:
    # Train the KMeans clustering model
    kmeans = KMeans(init='k-means++', n_clusters=num_clusters, n_init=10)
    kmeans.fit(X)
    score = metrics.silhouette_score(X, kmeans.labels_,
                                     metric='euclidean', sample_size=len(X))

    print("\nNumber of clusters =", num_clusters)
    print("Silhouette score =", score)

    scores.append(score)
# Plot silhouette scores
plt.figure()
plt.bar(values, scores, width=0.7, color='black', align='center')
plt.title('Silhouette score vs number of clusters')

# Extract the best score and the optimal number of clusters
num_clusters = np.argmax(scores) + values[0]
print('\nOptimal number of clusters =', num_clusters)

plt.show()
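
Building the final model with the optimal k then reuses the pattern from above:

# train the final KMeans clustering model with the optimal number of clusters
kmeans = KMeans(init='k-means++', n_clusters=num_clusters, n_init=10)
kmeans.fit(X)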

Solutions to the Exercises of Module 6


Exercise 11.1: Experiment with this 2-layer neural network - vary
the architecture (number of neurons in the hidden layer), the
activation function or the number of training iterations. Also add
some debug output in the backpropagation function (after 100
training runs) and observe the error/delta.
import numpy as np

def sigmoid(x):
    return 1.0/(1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1.0 - x)

def linear(x):
    return x

def linear_derivative(x):
    return 1

def tanH(x):
    return np.tanh(x)

def tanH_derivative(x):
    return 1 - tanH(x)**2

class NeuralNetwork:
    def __init__(self, x, y, hidden_neurons, act, act_derivative):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], hidden_neurons)
        self.weights2 = np.random.rand(hidden_neurons, 1)
        self.y = y
        self.output = np.zeros(self.y.shape)
        self.act = act
        self.act_derivative = act_derivative

    def feedforward(self):
        self.layer1 = self.act(np.dot(self.input, self.weights1))
        self.output = self.act(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # application of the chain rule to find the derivative of the loss
        # function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) *
                            self.act_derivative(self.output)))
        d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) *
                            self.act_derivative(self.output), self.weights2.T) *
                            self.act_derivative(self.layer1)))

        # update the weights with the derivative (slope) of the loss function
        self.weights1 += d_weights1
        self.weights2 += d_weights2

    def print_debug(self, num_runs):
        print('Difference after {} runs: {}'.format(num_runs, np.mean(abs(self.y - self.output))))

# input vectors
X = np.array([[0,0,1],
              [1,1,0],
              [1,0,1],
              [0,1,1],
              [1,1,1]])

# output values (labels)
y = np.array([[0],[1],[1],[1],[0]])

def run_nn(x, y, hidden_neurons, iterations, act, act_der):
    # initialize the neural network
    nn = NeuralNetwork(x, y, hidden_neurons, act, act_der)

    print('\nHidden Neurons:', hidden_neurons)
    print('Iterations:', iterations)

    # run the given number of training iterations
    for i in range(iterations):
        nn.feedforward()
        nn.backprop()

        if i == 100:
            nn.print_debug(i)

    nn.print_debug(i)

def run_tests(act, act_der):
    run_nn(X, y, 4, 1500, act, act_der)
    run_nn(X, y, 10, 1500, act, act_der)
    run_nn(X, y, 40, 1500, act, act_der)
    run_nn(X, y, 4, 1500, act, act_der)  # oops, this one is a duplicate
    run_nn(X, y, 4, 15000, act, act_der)

print("\n\nSigmoid:")
run_tests(sigmoid, sigmoid_derivative)

print("\n\nLinear:")
run_tests(linear, linear_derivative)

print("\n\ntanH:")
run_tests(tanH, tanH_derivative)

Exercise 11.2: Build a perceptron and train it with the following data (X = input, y = labels). What do you observe?
import numpy as np
import matplotlib.pyplot as plt
import neurolab as nl

def twospirals(n_points, noise=.5):
    """
    Returns the two spirals dataset.
    """
    n = np.sqrt(np.random.rand(n_points,1)) * 780 * (2*np.pi)/360
    d1x = -np.cos(n)*n + np.random.rand(n_points,1) * noise
    d1y = np.sin(n)*n + np.random.rand(n_points,1) * noise
    return (np.vstack((np.hstack((d1x,d1y)), np.hstack((-d1x,-d1y)))),
            np.hstack((np.zeros(n_points), np.ones(n_points))))
X, y = twospirals(1000)

plt.title('training set')
plt.plot(X[y==0,0], X[y==0,1], '.', label='hot dog')
plt.plot(X[y==1,0], X[y==1,1], '.', label='not hot dog')
plt.legend()

plt.show()

# Define the minimum and maximum values for each dimension
dim1_min, dim1_max = min(X[:, 0]), max(X[:, 0])
dim2_min, dim2_max = min(X[:, 1]), max(X[:, 1])

y = y.reshape((X.shape[0], 1))

# Number of neurons in the output layer
num_output = y.shape[1]

# Define a perceptron with 2 input neurons (because we
# have 2 dimensions in the input data)
dim1 = [dim1_min, dim1_max]
dim2 = [dim2_min, dim2_max]
perceptron = nl.net.newp([dim1, dim2], num_output)

# Train the perceptron using the data (lr...learning rate)
error_progress = perceptron.train(X, y, epochs=100, show=20, lr=0.03)

# Plot the training progress
plt.figure()
plt.plot(error_progress)
plt.xlabel('Number of epochs')
plt.ylabel('Training error')
plt.title('Training error progress')
plt.grid()

plt.show()

# The error rate does not really improve - a single perceptron is a linear
# classifier, and the two-spirals data is not linearly separable.

Exercise 11.3: Build a single layer neural network and train it with
the data from the last exercise (X = input, y = labels). What do you
observe?
# Define the minimum and maximum values for each dimension
dim1_min, dim1_max = min(X[:, 0]), max(X[:, 0])
dim2_min, dim2_max = min(X[:, 1]), max(X[:, 1])

y = y.reshape((X.shape[0], 1))

# Define the input ranges (we have 2 dimensions in the input data)
dim1 = [dim1_min, dim1_max]
dim2 = [dim2_min, dim2_max]
# Number of neurons in the output layer
num_output = y.shape[1]

# Define a single-layer neural network (one hidden layer with 5 neurons)
nn = nl.net.newff([dim1, dim2], [5, num_output])

# Train the neural network using the data
error_progress = nn.train(X, y, epochs=100, show=20)

# Plot the training progress
plt.figure()
plt.plot(error_progress)
plt.xlabel('Number of epochs')
plt.ylabel('Training error')
plt.title('Training error progress')
plt.grid()

plt.show()

# a clear improvement of the situation, yet the error rate is still high

Exercise 11.4: Figure out how to change the learning rate and
examine if you can improve the training (e.g. less oscillation
effects in the error rate)?
import numpy as np
import matplotlib.pyplot as plt
import neurolab as nl

# Generate some training data
min_val = -15
max_val = 15
num_points = 130
x = np.linspace(min_val, max_val, num_points)
y = 3 * np.square(x) + 5
y /= np.linalg.norm(y)

# Create data and labels
data = x.reshape(num_points, 1)
labels = y.reshape(num_points, 1)

# Plot the input data
plt.figure()
plt.scatter(data, labels)
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')

plt.title('Input data')

# Define a neural network with two hidden layers
nn = nl.net.newff([[min_val, max_val]], [10, 6, 1])
# Train the network with gradient descent with an adaptive learning rate (lr...learning rate)
nn.trainf = nl.train.train_gdx
error_progress = nn.train(data, labels, epochs=500, show=20, lr=0.07)

# Plot the training progress
plt.figure()
plt.plot(error_progress)
plt.xlabel('Number of epochs')
plt.ylabel('Training error')
plt.title('Training error progress')
plt.grid()

plt.show()

Exercise 11.5: Build a multilayer neural network and train it with the data from the last exercise (X = input, y = labels). What do you observe now?
import numpy as np
import matplotlib.pyplot as plt
import neurolab as nl

def twospirals(n_points, noise=.5):
    """
    Returns the two spirals dataset.
    """
    n = np.sqrt(np.random.rand(n_points,1)) * 780 * (2*np.pi)/360
    d1x = -np.cos(n)*n + np.random.rand(n_points,1) * noise
    d1y = np.sin(n)*n + np.random.rand(n_points,1) * noise
    return (np.vstack((np.hstack((d1x,d1y)), np.hstack((-d1x,-d1y)))),
            np.hstack((np.zeros(n_points), np.ones(n_points))))

X, y = twospirals(1000)

plt.title('training set')
plt.plot(X[y==0,0], X[y==0,1], '.', label='hot dog')
plt.plot(X[y==1,0], X[y==1,1], '.', label='not hot dog')
plt.legend()
plt.show()

# Define the minimum and maximum values for each dimension
dim1_min, dim1_max = min(X[:, 0]), max(X[:, 0])
dim2_min, dim2_max = min(X[:, 1]), max(X[:, 1])

y = y.reshape((X.shape[0], 1))

# Number of neurons in the output layer
num_output = y.shape[1]

# Define the input ranges
dim1 = [dim1_min, dim1_max]
dim2 = [dim2_min, dim2_max]

# Define a multilayer neural network with 3 hidden layers (seems to work best);
nn = nl.net.newff([dim1, dim2], [10, 20, 4, num_output])

# This training algorithm, based on scipy.optimize, seems to give the best results
nn.trainf = nl.train.train_cg

# Train the neural network
error_progress = nn.train(X, y, epochs=1000, show=50, goal=0.01)

# Plot the training progress
plt.figure()
plt.plot(error_progress)
plt.xlabel('Number of epochs')
plt.ylabel('Training error')
plt.title('Training error progress')
plt.grid()

plt.show()
