
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Belagavi - 590018

Mini Project Report On

“AI DESKTOP VOICE ASSISTANT”


Submitted in partial fulfillment of the requirements of the Sixth Semester Mini Project for the award of the degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by

Vandana GM (1VK21CS086)
Tejeswini MJ (1VK21CS084)
Under the Guidance of
Viswanath P
Assistant Professor, Department of Computer Science

VIVEKANANDA INSTITUTE OF TECHNOLOGY


Gudimavu, Kumbalgodu(P), Kengeri (H), Bengaluru-560074
2023-2024
Janatha Education Society®
VIVEKANANDA INSTITUTE OF TECHNOLOGY
Gudimavu, Kumbalagodu Post, Kengeri Hobli, Bengaluru – 560 074

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the Mini Project work entitled “AI DESKTOP VOICE ASSISTANT” has been carried out by
VANDANA GM, bearing the USN 1VK21CS086, and TEJESWINI MJ, bearing the USN 1VK21CS084,
bonafide students of Vivekananda Institute of Technology, Bengaluru, in partial fulfilment of the
requirements for the Sixth Semester Mini Project (21CSMP67) in Computer Science and Engineering of the
Bachelor of Engineering degree prescribed by the VISVESVARAYA TECHNOLOGICAL UNIVERSITY, Belagavi,
for the academic year 2023-24. It is certified that all corrections and suggestions indicated for internal
assessment have been incorporated in the report deposited in the departmental library. The mini project
report has been approved as it satisfies the academic requirements in respect of the Mini Project
prescribed for the said degree.

Signature of the Guide: Viswanath P, Asst. Professor, Department of CSE, VKIT, Bengaluru
Signature of the HOD: Dr. Vidya A, Prof. & Head of Department, CSE, VKIT, Bengaluru
Signature of the Principal: Dr. K M Ravikumar, Principal, VKIT, Bengaluru

External Viva-Voce
Name of the Examiners Signature with date

1.__________________ _________________
2.__________________ _________________
ACKNOWLEDGEMENT

We express our profound gratitude and deep regards to our guide Viswanath P for his exemplary
guidance, monitoring and constant encouragement throughout the course of this project. The help
and guidance given by him from time to time provided us with valuable information and assistance,
which helped us complete this task through its various stages.

We are obliged to thank the staff members of the Department of Computer Science and Engineering, VKIT,
for the valuable information provided by them in their respective fields. We would like to extend our
heartfelt thanks to our Head of Department, Dr. Vidya A, without whom the planning and execution of this
assignment would not have been possible. We extend our cordial regards and thanks to our Principal,
Dr. K M Ravikumar. We also thank everyone who contributed directly or indirectly to this mini project.

Vandana GM(1VK21CS086)
Tejeswini MJ(1VK21CS084)

ABSTRACT

The project aims to develop a personal assistant for the desktop. Jarvis draws its inspiration from
virtual assistants such as Cortana for Windows and Siri for iOS. It has been designed to provide a
user-friendly interface for carrying out a variety of tasks by employing certain well-defined commands.
Users interact with the assistant through voice commands.

Computer vision is a field of computer science that enables computers to identify and process
objects in videos and images just the way humans do. Although computer vision might seem like a
recent concept, it dates back to the late 1960s, when the first digital image scanner, which
transformed images into grids of numbers, was invented.

In this project voice is the main means of communication, so the assistant is essentially a
speech recognition application.

As a personal assistant, Jarvis assists the end user with day-to-day activities such as general human
conversation, searching queries on Google, searching for videos, retrieving images, live weather
conditions, word meanings, searching for medicine details, health recommendations based on
symptoms, and reminding the user about scheduled events and tasks. The user's
statements and commands are analyzed with the help of machine learning to give an optimal response.

A speech synthesizer takes text as input and produces an audio stream as output. A speech recognizer,
on the other hand, does the opposite: it takes an audio stream as input and turns it into a text
transcription. The voice signal carries a very large amount of information, which makes direct analysis
and synthesis of the raw signal impractical. Therefore, digital signal processing techniques such as
feature extraction and feature matching are introduced to represent the voice signal. We tested the
system with two speakers (one female and one male) to gauge its accuracy.

CONTENTS
Acknowledgement i

Abstract ii

Contents iii

List of figures v

CHAPTER 1: INTRODUCTION 1

1.1 Introduction 1

1.2 Objectives 2

1.3 Applications of Speech 3

1.4 Speech-to-text 4

1.5 Overview of the project 6

CHAPTER 2: ANALYSIS AND DESIGN 7

2.1 Benefits of voice assistant 7

2.2 User interface 8

2.2.1 Graphical User Interface (GUI) 8

2.2.2 Voice User Interface (VUI) 8

2.3 Data flow diagram 9

2.4 Architecture of voice assistant 9

2.5. Use case Diagram 10

2.6 Sequence diagram 10

CHAPTER 3: IMPLEMENTATION 11

3.1. Modules Description 11

3.2. Implementation Details 13

3.3. Technology used 14

3.3.1 Voice Recognition 14

3.3.2 Artificial Intelligence 14

3.4 Code 15

CHAPTER 4: EXPERIMENT RESULT 26

4.1. Testing 26

4.1.1 Voice App Testing Layers 26

4.1.2 Unit Testing 27

4.2. Results 27

CHAPTER 5: FURTHER SCOPE AND CONCLUSION 33

5.1 Future Scope 33

5.1.1 Future Integration 33

5.1.2 Natural Conversation 33

5.2 Conclusion 34

5.3 References 35

LIST OF FIGURES

Figure Name Page No.

4.2.1 Wikipedia Search Result 28

4.2.2 Play video on YouTube 28

4.2.3 Open Google 29

4.2.4 Fact Telling 29

4.2.5 Open Gaana 30

4.2.6 Open calculator 30

4.2.7 Joke Telling 31

4.2.8 Open Microsoft Word 31

4.2.9 Open Microsoft Excel 32

4.2.10 Open Microsoft Power Point 32


CHAPTER – 1

1.1. Introduction

In this chapter we look at voice assistants: what a voice assistant is and how it works. Many of us
already know about voice assistants and use them in our day-to-day life. A voice assistant is a digital
assistant that uses voice recognition, language processing algorithms, and voice synthesis to listen to
specific voice commands and return relevant information or perform specific functions as requested by
the user. A brief description of these ideas is given in this chapter.

Speech is an effective and natural way for people to interact with applications, complementing
or even replacing the use of mice, keyboards, controllers, and gestures. As a hands-free yet accurate
way to communicate with applications, speech lets people be productive and stay informed in a
variety of situations where other interfaces cannot. Speech recognition is therefore useful
in many applications and environments in our daily life.

Generally, a speech recognizer is a machine that understands humans and their spoken words
in some way and can act accordingly. Another aspect of speech recognition is accessibility for
people with functional disabilities or other impairments. Voice control can make their daily chores
easier: with their voice they can switch lights on and off or operate other domestic appliances.
This leads to the idea of intelligent homes, where these operations are available to the common
man as well as to people with disabilities.

This application acts as a personal assistant, such as Google Assistant or Siri. It can make
notes, search for information on Wikipedia, open YouTube and Google, tell jokes, and play


music from your local directory; it can tell you the current date or time, and it also has the feature to
send emails. It uses the Google Calendar API so that you stay updated on your current events as well.

1.2 Objective

The voice assistant will send emails without the user typing a single word, perform Wikipedia searches
without opening a web browser, and carry out many other daily tasks, such as playing music, with the
help of a single voice command. Voice-based personal assistants have gained a lot of popularity in
this era of smart homes and smart devices.
These personal assistants can be easily configured to perform many of your regular tasks
by simply giving voice commands. The most famous application on the iPhone is Siri, which lets
the end user communicate with the mobile device by voice and responds to the voice
commands of the user. Google has developed a similar application, Google Voice Search, which is
used on Android phones. However, these applications mostly work only with an internet connection,
whereas our proposed system is capable of working both with and without internet connectivity.
It is named Personal Assistant with Voice Recognition Intelligence; it takes the user's input in the
form of voice or text, processes it, and returns the output in various forms, such as an action to be
performed or a search result dictated to the end user. In addition, this proposed system can change
the way end users interact with mobile devices. The system is designed so that all the services
provided by the device are accessible to the end user through voice commands.


1.3 Applications of Speech

The Speech Application Programming Interface (SAPI) is an API developed by Microsoft to
allow the use of speech recognition and speech synthesis within Windows applications. To date, a
number of versions of the API have been released, which have shipped either as part of a Speech
SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office,
Microsoft Agent and Microsoft Speech Server.

In our project we use SAPI 5. In general, the API has been designed so that a software developer
can write an application to perform speech recognition and synthesis by using a standard set of
interfaces, accessible from a variety of programming languages.

In addition, it is possible for a third-party company to produce its own speech recognition
and text-to-speech engines or adapt existing engines to work with SAPI. Basically, the speech
platform consists of an application runtime that provides speech functionality, an Application

Programming Interface (API) for managing the runtime, and runtime languages that enable speech
recognition and speech synthesis (text-to-speech, or TTS) in specific languages.
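
As an illustration, the SAPI5 engine can be reached from Python through the pyttsx3 wrapper that is used later in this report. The following minimal sketch (the available voices depend on the local Windows installation) lists the installed SAPI5 voices and speaks a short test sentence:

import pyttsx3

engine = pyttsx3.init('sapi5')              # explicitly request the SAPI5 driver
for voice in engine.getProperty('voices'):
    print(voice.id, voice.name)             # list the installed SAPI5 voices
engine.say("This is a SAPI5 text to speech test")
engine.runAndWait()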

1.4 Speech-To-Text:
This is the process of converting speech to text. It is also called speech recognition.

Through this process, the user's commands are converted into text. This text is then used by further
processing to extract information from it. Note that the speech-to-text model should be independent
of the user's accent and way of pronunciation. Speech-to-text software is a type of software that
takes audio content and transcribes it into written words in a word processor or another
display destination. This type of speech recognition software is extremely valuable to anyone who
needs to generate a lot of written content without a lot of manual typing.
It is also useful for people with disabilities that make it difficult for them to use a keyboard.

Speech-to-text software may also be known as voice recognition software.
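
As an illustration, the following is a minimal speech-to-text sketch using the SpeechRecognition package adopted in Chapter 3; it assumes a working microphone and uses Google's free web recognizer, which requires an internet connection:

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # calibrate for background noise
    print("Say something...")
    audio = recognizer.listen(source)             # capture the audio stream

try:
    text = recognizer.recognize_google(audio)     # audio stream in, text transcription out
    print("You said:", text)
except sr.UnknownValueError:
    print("Speech was not understood")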


1.5 Overview of the project:

This project is a virtual assistant developed in Python that can understand voice commands
and complete tasks for the user. The user speaks specific commands, the assistant executes them,
and the result is reported back to the user. A voice assistant is a digital assistant that uses voice
recognition, language processing algorithms, and voice synthesis to listen to specific voice commands
and return relevant information or perform specific functions as requested by the user.

Based on specific commands, sometimes called intents, spoken by the user, voice assistants
can return relevant information by listening for specific keywords and filtering out the ambient
noise. Today, voice assistants are integrated into many of the devices we use on a daily basis, such
as cell phones, computers, and smart speakers. Because of their wide array of integrations, some
voice assistants offer a very specific feature set, while others are more open-ended and help with
almost any situation at hand.


CHAPTER - 2
2. Analysis and Design


2.1 Benefits of Voice assistant:

It can play music for you, perform Wikipedia searches for you, and open websites such as Google
and YouTube in a web browser.

One of the main reasons for the growing popularity of Voice User Interfaces (VUIs) is the growing
complexity of mobile software without an increase in screen size, which puts a Graphical User
Interface (GUI) at a serious disadvantage. As more iterations of phones come out, screen sizes stay
relatively the same, leading to very cramped interfaces and frustrating user experiences, which is
why more and more developers are switching to voice user interfaces.


2.2 User Interface:


To further understand voice assistants, it is important to take a look at the overall user
experience: what a user interface is and how a VUI differs from the more traditional graphical
user interface that modern apps currently use.

2.2.1 Graphical User Interface (GUI): A graphical user interface is what is most
commonly used today. For example, the web browser you use every day is a graphical user
interface. Using graphical icons and visual indicators, the user is able to interact with machines
more quickly and easily than before. A graphical user interface can also be used in something like
a chatbot, where the user communicates with the device over text and the machine responds with
natural conversational text. The big downside is that, since everything is done in text, it can seem
cumbersome and inefficient, and it can take longer than voice in certain situations.

2.2.2 Voice User Interface (VUI): An example of a VUI is something like Siri, where there
is an audio cue that the device is listening, followed by a verbal response. Most apps today combine
both graphical and voice user interfaces. For example, when using a maps application,
you can use voice to search for destinations and the application will show you the most relevant
results, placing the most important information at the top of the screen. Some examples of popular
smart assistants today are Alan, Amazon Alexa, Apple's Siri, and Google Assistant. A
voice user interface (VUI) makes spoken human interaction with computers possible, using speech
recognition to understand spoken commands and answer questions, and typically text-to-speech to
play a reply. A voice command device (VCD) is a device controlled with a voice user interface.
VUIs are the primary way of interacting with virtual assistants on smartphones and smart speakers.
Older automated attendants (which route phone calls to the correct extension) and interactive voice
response systems (which conduct more complicated transactions over the phone) can respond to
the pressing of keypad buttons via DTMF tones, but those with a full voice user interface allow
callers to speak requests and responses without having to press any buttons.


2.3 Data Flow Diagram:

2.4 Architecture of voice assistant:


2.5 Use case Diagram:

2.6 Sequence Diagram:


CHAPTER – 3

3. Implementation

For building any voice-based assistant you need two main functions. One for listening to your
commands and another to respond to your commands. Along with these two core functions, you
need the customized instructions that you will feed your assistant.


The required Python modules for the voice assistant are: SpeechRecognition, wikipedia, webbrowser,
os, smtplib, pyttsx3, datetime, random, and Pass1.

3.1 Modules Description

➢ Pyttsx3: A Python library that helps us convert text to speech; in short, it is a text-to-speech
library. It works offline, and it is compatible with Python 2 as well as Python 3. To install this
module, run the command pip install pyttsx3. An application invokes the pyttsx3.init() factory
function to get a reference to a pyttsx3.Engine instance. It is a very easy-to-use tool which converts
the entered text into speech. The pyttsx3 module supports two voices, one female and one male,
which are provided by “sapi5” on Windows. (A combined usage sketch of the modules in this section
is given at the end of the section.)

➢ Speech Recognition: SpeechRecognition is a library for performing speech
recognition, with support for several engines and APIs, both online and offline. It recognizes the voice
command and converts it to text.


➢ Wikipedia: The wikipedia library allows us to get information about the user's query
from Wikipedia. Wikipedia is a multilingual online encyclopedia created and maintained as
an open collaboration project by a community of volunteer editors using a wiki-based editing
system. In order to extract data from Wikipedia, we must first install the Python wikipedia
library, which wraps the official Wikipedia API.

➢ Pypiwin32: Python extensions for Microsoft Windows provide access to much of the Win32
API, the ability to create and use COM objects, and the Pythonwin environment.

➢ Pyaudio: PyAudio provides Python bindings for PortAudio, a cross-platform library for audio input
and output. This means that we can use PyAudio to record and play sound on all major platforms and
operating systems, such as Windows, macOS, and Linux.

➢ Randfacts: Randfacts is a versatile Python package designed to generate random, interesting


facts, suitable for enhancing various applications, from chatbots to educational tools. Easy to install
via pip, it supports generating facts in multiple languages and offers a feature to filter out offensive
content, ensuring suitability for all audiences. By calling a simple function, users can integrate
random facts into their projects, adding an element of fun and engagement. Whether used in
entertainment apps, educational platforms, marketing content, or chatbots, Randfacts provides a
quick and delightful way to present trivia. Its open-source nature invites customization and
contributions, making it a valuable addition to any developer's toolkit.
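
The following is a minimal combined sketch of how the modules described above fit together (an illustration only; the project's full logic appears in Section 3.4). pyttsx3 speaks the output, the wikipedia package fetches a one-sentence summary, and randfacts supplies a random fact; the topic string is just an example:

import pyttsx3
import wikipedia
import randfacts

engine = pyttsx3.init()

def speak(text):
    engine.say(text)
    engine.runAndWait()

speak(wikipedia.summary("Python (programming language)", sentences=1))   # Wikipedia lookup
speak("Did you know that " + randfacts.get_fact())                       # random fact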


3.2 Implementation Details

➢ Random: The random module is a built-in module for generating pseudo-random values. It
can be used to perform actions randomly, such as getting a random number, selecting a random
element from a list, or shuffling elements. (A combined sketch of the modules in this section is
given at the end of the section.)

➢ Speech Recognition: The SpeechRecognition package exposes a Recognizer class which, as the name
suggests, recognizes speech (whether from an audio file or a microphone). A recognizer
instance can be created as follows (a minimal sketch):
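
import speech_recognition as sr

r = sr.Recognizer()   # the recognizer object used to capture and transcribe audio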

➢ Datetime: The datetime library allows us to get the current date and time. This
module comes built in with Python. Unlike some other programming languages, Python does not
allow us to work with date objects directly; to work with date and time objects, we import the
datetime module. The datetime module holds different classes for representing and manipulating
dates and times. In this project it is used to announce the date and time.

➢ Web browser: The webbrowser module provides a high-level interface for displaying
web-based documents to users. Under most circumstances, simply calling the open()
function opens the URL in the default web browser. The module includes functions to open
URLs in interactive browser applications and a registry of available
browsers, in case multiple options are available on the system. It can also be controlled
with the BROWSER environment variable. To open any website, we need to import the webbrowser
module. It is a built-in module, so we can import it directly into our program with an import
statement.

➢ OS: The os module in Python provides a way of using operating-system-dependent functionality. The
functions that the os module provides allow you to interface with the underlying operating system
that Python is running on. It is one of Python's standard utility modules, providing a portable
way of using operating-system-dependent functionality; the os and os.path modules include many
functions to interact with the file system. To open desktop applications we import the os module
directly with an import statement.
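
A minimal combined sketch of these standard-library modules, as they are used in this project (the paths and URLs are illustrative, and os.startfile() is Windows-only):

import random
import datetime
import webbrowser
import os

print(random.choice(["Hello!", "Hi there!", "Greetings!"]))   # pick a random reply
now = datetime.datetime.now()
print(now.strftime("%d of %B, %I:%M %p"))                     # current date and time
webbrowser.open("https://www.google.com/")                     # open a website in the default browser
os.startfile("calc.exe")                                      # launch a desktop application (Windows only)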

3.3 Technology Used:

Voice assistants use Artificial Intelligence and Voice recognition to accurately and efficiently
deliver the result that the user is looking for. While it may seem simple to ask a computer to set a
timer, the technology behind it is fascinating.

3.3.1. Voice Recognition: Voice recognition works by taking the analog signal of a user's
voice and turning it into a digital signal. After doing this, the computer takes the digital signal and
attempts to match it to words and phrases in order to recognize the user's intent. To do this, the
computer requires a database of pre-existing words and syllables in a given language against which
the digital signal can be closely matched. Checking the input signal against this database is known
as pattern recognition, and it is the primary force behind voice recognition.
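
In this project, the pattern matching applied to the transcribed text reduces to simple keyword checks, as in the main loop shown in Section 3.4; a minimal sketch:

def handle(command):
    command = command.lower()
    if "joke" in command:
        return "telling a joke"
    elif "calculator" in command:
        return "opening the calculator"
    elif "google" in command:
        return "opening google.com"
    return "sorry, I did not catch that"

print(handle("Open Google please"))   # -> opening google.com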


3.3.2. Artificial Intelligence: Artificial intelligence is the use of machines to simulate and
replicate human intelligence.

In 1950, Alan Turing published his paper “Computing Machinery and Intelligence”, which first asked
the question: can machines think? Turing then went on to develop the Turing Test, a method of
evaluating whether a computer is capable of thinking like a human. Four approaches that define AI
were later developed: thinking humanly, thinking rationally, acting humanly, and acting rationally.
While the first two deal with reasoning, the other two deal with actual behavior. Modern AI is
typically seen as a computer system designed to accomplish tasks that typically require human
intelligence. These systems can improve upon themselves using a process known as machine learning.

3.4 CODE

main.py file
import pyttsx3 as p
import speech_recognition as sr
from selenium_web import *
from YT_auto import *
from News import *
import randfacts
from jokes import *
from selenium_web import infow
from wheather import *
import datetime
from Whatsappcall import *
from Whatsappmessage import *
from mail import send_email # Ensure you import the send_email function
import webbrowser
import os

engine = p.init()
rate = engine.getProperty('rate')
engine.setProperty('rate', 170)

voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)

r = sr.Recognizer()

def speak(text):
    engine.say(text)
    engine.runAndWait()

def wishme():
    hour = int(datetime.datetime.now().hour)
    if 0 <= hour < 12:
        return "Morning"
    elif 12 <= hour < 16:
        return "afternoon"
    else:
        return "evening"

def listen_command():
    with sr.Microphone() as source:
        r.energy_threshold = 10000
        r.adjust_for_ambient_noise(source, 1.2)
        print("Listening...")
        audio = r.listen(source)

    try:
        command = r.recognize_google(audio)
        print(f"Recognized command: {command}")
        return command.lower()
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
        speak("Sorry, I did not catch that.")
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
        speak("Sorry, there was an issue with the speech recognition service.")
    return ""


def open_microsoft_word():
    try:
        # Update with the actual path to MS Word
        path = "C:\\Program Files\\Microsoft Office\\root\\Office16\\WINWORD.EXE"
        os.startfile(path)
        print(f"Attempting to open: {path}")
        speak("Microsoft Word is now opened")
    except Exception as e:
        print(f"Failed to open Microsoft Word: {e}")
        speak("Failed to open Microsoft Word")

def main():
    today_date = datetime.datetime.now()
    formatted_date = today_date.strftime("%d of %B")
    formatted_time = today_date.strftime("%I:%M %p")

    speak("Hello ma'am, good " + wishme() + " I am your voice assistant")
    speak('Today is ' + formatted_date + ' and it\'s currently ' + formatted_time)
    speak("Temperature in Bengaluru is " + str(temp()) + " degree celsius" + " and with " + str(des()))

    speak("What can I do for you")
    print("What can I do for you")

    command = listen_command()

    if "information" in command:
        speak("You need information related to which topic")
        print("You need information related to which topic")
        info_topic = listen_command()
        if info_topic:
            speak(f"Searching {info_topic} in Wikipedia")
            print(f"Searching {info_topic} in Wikipedia")
            assist = infow()
            assist.get_info(info_topic)


elif "play" in command and "video" in command:


speak("You want me to play which video?")
print("You want me to play which video?")
video_name = listen_command()
if video_name:

speak(f"Playing {video_name} on YouTube")


print(f"Playing {video_name} on YouTube")
assist = Music()
assist.play(video_name)

elif "news" in command:

print("Sure ma'am. Now I will read news for you")


speak("Sure ma'am. Now I will read news for you")
arr = news(json_data)

for i in range(len(arr)):
print(arr[i])
speak(arr[i])

elif "fact" in command:


speak("Sure ma'am")
print("Sure ma'am")
x = randfacts.get_fact()
print(x)
speak("Did you know that " + x)

elif "joke" in command:


print("Sure ma'am. Get ready to laugh")
speak("Sure ma'am. Get ready to laugh")

arr = joke()
print(arr[0])
speak(arr[0])
print(arr[1])
speak(arr[1])


elif "calculator" in command:


os.startfile("calc.exe")
speak("Calculator is now opened")

elif "spotify" in command:


webbrowser.open("https://spotify.com/")
speak("spotify.com is now ready for you")

elif "music" in command:


webbrowser.open("https://gaana.com/")
speak("gaana.com is now ready for you ")

elif "google" in command:


webbrowser.open("https://google.com/")
speak("google.com is now ready for you ")

elif "microsoft word" in command:


open_microsoft_word()

elif "excel" in command:


path = "C:\\Program Files\\Microsoft Office\\root\\Office16\\EXCEL.EXE" # Update with
the actual path to MS Excel

try:
os.startfile(path)
print(f"Attempting to open: {path}")
speak("Microsoft Excel is now opened")

except Exception as e:
print(f"Failed to open Microsoft Excel: {e}")
speak("Failed to open Microsoft Excel")

elif "powerpoint" in command:

path = "C:\\Program Files\\Microsoft Office\\root\\Office16\\POWERPNT.EXE" # Update


with the actual path to MS PowerPoint

try:
os.startfile(path)


print(f"Attempting to open: {path}")


speak("Microsoft PowerPoint is now opened")

except Exception as e:

print(f"Failed to open Microsoft PowerPoint: {e}")


speak("Failed to open Microsoft PowerPoint")

elif "youtube" in command:

webbrowser.open("https://youtube.com/")
speak("youtube.com is now ready for you")

if __name__ == "__main__":
main()

os.py file

import os
import pyttsx3 as p
import speech_recognition as sr

engine = p.init()
rate = engine.getProperty('rate')
engine.setProperty('rate', 170)

voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)

r = sr.Recognizer()

def speak(text):
    engine.say(text)
    engine.runAndWait()

with sr.Microphone() as source:
    r.energy_threshold = 10000
    r.adjust_for_ambient_noise(source, 1.2)
    print("Listening...")
    audio = r.listen(source)


text = r.recognize_google(audio)
print(text)

# Update with the actual path to MS Word
path = "C:\\Program Files\\Microsoft Office\\root\\Office16\\WINWORD.EXE"
os.startfile(path)

print(f"Attempting to open: {path}")
speak("Microsoft Word is now opened")

Youtube.py file

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import time

class Music:
    def __init__(self):
        chrome_options = Options()
        chrome_options.binary_location = r'C:\Program Files\Google\Chrome\Application\chrome.exe'
        self.driver = webdriver.Chrome(options=chrome_options)

    def play(self, query):
        self.query = query
        self.driver.get("https://www.youtube.com")

        try:
            # Wait for the search input field
            search_input = WebDriverWait(self.driver, 20).until(


                EC.element_to_be_clickable((By.XPATH, '//input[@id="search"]'))
            )
            search_input.click()
            search_input.clear()
            search_input.send_keys(query)
            search_input.send_keys(Keys.RETURN)

            # Handle search suggestions if they appear
            try:
                WebDriverWait(self.driver, 10).until(
                    EC.visibility_of_element_located((By.CSS_SELECTOR, 'li.sbsb_c'))
                )
                first_suggestion = self.driver.find_element(By.CSS_SELECTOR, 'li.sbsb_c')
                first_suggestion.click()
            except:
                pass

            # Wait for the video link to be clickable
            video_link = WebDriverWait(self.driver, 20).until(
                EC.element_to_be_clickable((By.XPATH, '//a[@id="video-title"]'))
            )
            self.driver.execute_script("arguments[0].click();", video_link)

            # Wait for the video player to load
            video_player = WebDriverWait(self.driver, 20).until(
                EC.presence_of_element_located((By.CLASS_NAME, 'html5-video-player'))
            )

            # Attempt to click the play button
            self.click_play_button()

            # Wait for the video to start playing


            WebDriverWait(self.driver, 20).until(
                EC.presence_of_element_located((By.CLASS_NAME, 'playing-mode'))
            )

            # Add a delay to keep the browser open for a while
            time.sleep(60)  # Keeps the browser open for 60 seconds after the video starts playing
        except Exception as e:
            print(f"An error occurred: {e}")
        finally:
            # Close the browser after a delay
            time.sleep(10)  # Wait for 10 seconds before closing the browser
            try:
                self.driver.quit()
            except:
                pass

    def click_play_button(self):
        try:
            # Attempt to click the play button
            play_button = WebDriverWait(self.driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//button[@aria-label="Play"]'))
            )
            self.driver.execute_script("arguments[0].click();", play_button)
        except Exception as e:
            print(f"Error clicking play button: {e}")
            # Retry clicking the play button
            self.click_play_button()

News.py file

import requests
from secret import *

api_address = f"https://newsapi.org/v2/top-headlines?sources=techcrunch&apiKey={key}"
json_data = requests.get(api_address).json()
ar = []

def news(json_data):
    if "articles" in json_data:
        for i in range(3):
            ar.append("number" + str(i + 1) + ": " + json_data["articles"][i]["title"] + ".")
        return ar
    else:
        print("Error: 'articles' key not found in JSON response")
        return ["No news articles found."]

jokes.py file

import requests

def get_joke():
    url = "http://official-joke-api.appspot.com/random_joke"
    json_data = requests.get(url).json()
    return [json_data["setup"], json_data["punchline"]]

def joke():
    arr = get_joke()
    return arr

weather.py file

import requests
from secret import *

# Form the correct API address with the API key
api_address = f'http://api.openweathermap.org/data/2.5/weather?q=bengaluru&appid={key2}'

# Make the request to the API
response = requests.get(api_address)

# Check if the request was successful
if response.status_code == 200:
    json_data = response.json()
    # Print the entire JSON response for debugging
    print(json_data)


def temp():
    if "main" in json_data:
        # Using 273.15 for accurate Celsius conversion
        temperature = round(json_data["main"]["temp"] - 273.15, 1)
        return temperature
    else:
        return "Temperature data not available"

def des():
    if "weather" in json_data and len(json_data["weather"]) > 0:
        description = json_data["weather"][0]["description"]
        return description
    else:
        return "Weather description not available"

secret.py file

key = 'f3ae43247b5f4f489a57000e51d11257'
key2 = '0888b99dc59f34ae0edcca177fad19d4'


CHAPTER – 4

4. Experiment Results

4.1 Testing:

We face different challenges when testing voice apps than when testing GUI apps. For
instance, GUI apps limit the number of possible interactions a user might perform. Voice, on the other
hand, allows a much richer and more complex set of spoken interactions, which increases the difficulty of
testing. Additionally, the backend behind voice apps includes several components not owned by the
developers. These AI-powered elements are constantly learning and evolving by gathering insights
from the myriad interactions they receive. This is why they get constant updates and
improvements, which requires us to keep up on our side by doing continuous verification to be
sure nothing has broken and that our app continues to deliver great voice experiences to our users.
We are witnessing a notable increase in the complexity of voice applications as a result of the effort
companies make to provide enriched experiences that allow users to solve real, day-to-day
problems. In this scenario, testing voice apps is a must. It does not matter whether your approach to
testing follows the popular waterfall model (requirements, analysis, design, coding, testing, deployment)
or test-driven development (TDD) practices; in any case, it is vital that you find the bugs in your
code before your customers do. A voice app free from errors is the key to ensuring that your users
enjoy the content you are offering.


4.1.1. Voice app testing layers:

4.1.2. Unit testing: This type of test is targeted at voice app developers, who need unit
testing to ensure the code is working correctly in isolation. As you perform unit testing while
coding, you need it to be fast, so that it does not interrupt your coding pace. Unit testing is focused on
making sure your code and logic are correct, so there is no need to hit the cloud (where most voice
app backends live) or call real external services. It is important that the unit testing tool you
choose supports mocks and preferably runs locally.
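
As an illustration, the greeting logic of this assistant can be unit tested offline by mocking the clock, so the test never touches the microphone, the speech engine, or any cloud service. The sketch below assumes the wishme() function from main.py (importing main still pulls in its other dependencies, so in practice the logic under test may be moved into a small separate module):

import datetime
import unittest
from unittest.mock import patch

from main import wishme


class TestWishme(unittest.TestCase):
    def _fake_now(self, hour):
        return datetime.datetime(2024, 1, 1, hour, 0, 0)

    @patch("main.datetime")
    def test_morning_greeting(self, mock_dt):
        # Pretend it is 9 AM; wishme() should report morning
        mock_dt.datetime.now.return_value = self._fake_now(9)
        self.assertEqual(wishme(), "Morning")

    @patch("main.datetime")
    def test_evening_greeting(self, mock_dt):
        # Pretend it is 8 PM; wishme() should report evening
        mock_dt.datetime.now.return_value = self._fake_now(20)
        self.assertEqual(wishme(), "evening")


if __name__ == "__main__":
    unittest.main()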

4.2. Results:

Here are some examples of the assistant's output, which can help you understand how the
processing described above works. The assistant does what it is told; if it cannot understand a
command, it asks the user to say it again.


4.2.1. Wikipedia search result:

4.2.2. User says “Play Oye movie song ”:


4.2.3. User says “open google”:

4.2.4. User says “Tell me a fact”:


4.2.5. User says “open gaana”:

4.2.6. User says “Open Calculator”:


4.2.7 User says “Tell me a joke”

4.2.8 User says “Open Microsoft word”


4.2.9 User says “Open Microsoft excel”

4.2.10 User says “Open Microsoft Power Point”


CHAPTER - 5

5. Conclusion and Further Scope:

5.1 Further scope:


As AI becomes more advanced and voice technology becomes more widely accepted, voice-controlled
digital assistants will not only become more natural, they will also be integrated into more everyday
devices. Conversations will also become much more natural, emulating human conversations, which will
begin to introduce more complex task flows. More and more people are using voice assistants, too: in
early 2019 it was estimated that 111.8 million people in the US would use a voice assistant at least
monthly, up 9.5% from the previous year.


5.1.1. Further Integration: In the future, devices will be more integrated with voice, and it will
become easier and easier to search using voice. For example, Amazon has already released a wall
clock that comes enabled with Amazon Alexa, so you can ask it to set a timer or tell you the time.
While these devices aren’t full blown voice activated personal assistants, they still show a lot of
promise in the coming years. Using vocal commands, we will be able to work with our devices
just by talking.

5.1.2. Natural Conversations: Currently, as users are getting more used to using voice to
communicate with their digital devices, conversations can seem very broken and awkward. But in
the future, as digital processing becomes quicker and people become more accustomed to using
voice assistants in their everyday devices, we will see a shift where users won’t have to pause and
wait for the voice assistant to catch up, and instead we will be able to have natural conversations
with our voice assistants, creating a more soothing and natural experience.


5.2. Conclusion:

The voice-controlled personal assistant system uses natural language processing
and can be integrated with artificial intelligence techniques to achieve a smart assistant that can
control IoT applications and even answer user queries using web searches. It can be designed to
minimize the human effort needed to interact with many other subsystems that would otherwise have
to be operated manually. By achieving this, the system makes human life more comfortable. More
specifically, this system is designed to interact with other subsystems intelligently and control
these devices; this includes controlling IoT devices, getting news from the internet, providing other
information, and retrieving personalized data saved previously on the system. The application should
let the user add data such as calendar entries and set alarms or reminders. The software facilitates
ease of access to various other devices and platforms. The system has the following phases:
data collection in the form of voice; voice analysis and conversion to text; data storage and
processing; and generating speech from the processed text output. The data generated at every phase
can further be used to find patterns and make suggestions to the user later. This can be a major base
for artificial intelligence machines that learn and understand users. Thus, on the basis of the
literature survey and by analyzing the existing systems, we have come to the conclusion that the
proposed system will not only make it easier to interact with other systems and modules but will also
keep us organized. There is still a lot of ground to be covered in the world of automation, but the
skills of this device can help build a new generation of voice-controlled devices and bring a new,
sustaining change to the field of automation. This report can also act as a prototype for many
advanced applications.


5.3 References:

[1] How to build your own AI personal assistant using Python,
https://towardsdatascience.com/how-to-build-your-own-ai-personal-assistant-using-python-f57247b4494b

[2] Intelligent personal assistant architecture,
https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paArchitecture-1-1.htm

[3] Analytics Vidhya, Build your own desktop voice assistant in Python,
https://www.analyticsvidhya.com/blog/2020/11/build-your-own-desktop-voice-assistant-in-python/

[4] Deller John R., Jr., Hansen John H. L., Proakis John G., Discrete-Time Processing of Speech
Signals, IEEE Press, ISBN 0-7803-5386-2

[5] www.investopedia.com/terms/a/artificial-intelligence-ai.asp
