Web Tech 2 and Data Analytics Notes

The document provides an introduction to web techniques, specifically focusing on PHP and XML. It covers key concepts such as variables, forms, error handling, and state maintenance in PHP, as well as XML structure, parsing, and characteristics. The document includes both short and long answers to various questions, offering examples and explanations for better understanding.

Uploaded by

gauravjanmare07

Chp1: Introduction to Web Techniques

Short-answer questions:


1. Define variable.
A variable is a named storage location in memory used to store
data that can be changed during program execution. In PHP,
variables start with a $ sign (e.g., $name = "John";).
2. Give the purpose of form.
A form in HTML is used to collect user input and send it to a
server for processing. It is created using the <form> element
and can include input fields like text boxes, radio buttons,
checkboxes, and submit buttons.
3. List predefined variables in PHP.
PHP has several predefined variables, including:
o $_GET
o $_POST
o $_REQUEST
o $_SESSION
o $_COOKIE
o $_FILES
o $_SERVER
o $_ENV
o $GLOBALS
4. What is the purpose of POST and GET methods in forms in
PHP?
o GET: Sends form data as a URL query string, making it
visible in the browser's address bar. It is used for
retrieving data.
o POST: Sends form data in the request body, so it does not
appear in the URL. It is suited to sensitive information or
large amounts of data, although the data is only protected
in transit when HTTPS is used.
5. Which file is used to configure the behavior of PHP?
The php.ini file is used to configure PHP's behavior, including
error reporting, file uploads, and execution time limits.
6. Define sticky form.
A sticky form retains user input values even after submission,
allowing users to see their previously entered data. This is
achieved using PHP by setting the value attribute dynamically.
7. Give the purpose of error_log().
The error_log() function is used to send error messages to a log
file, system log, or an email address, helping in debugging and
error tracking.
8. Which array is used for file uploading?
The $_FILES superglobal array is used for handling file uploads
in PHP.
9. Give the purpose of header().
The header() function is used to send raw HTTP headers, such
as redirecting to another page, setting content types, or
controlling caching.
10. Define cookies.
A cookie is a small piece of data stored on the client's browser
to remember user preferences and session information across
multiple visits. It is created using setcookie().
11. What is the use of trigger_error()?
The trigger_error() function is used to generate a user-defined
error message and can be used for debugging or custom error
handling.
12. Define session.
A session is a mechanism to store user data across multiple web
pages during a visit. Unlike cookies, session data is stored on
the server and is accessed using the $_SESSION superglobal.
13. Define error.
An error is an issue in a program that prevents it from executing
correctly. PHP errors can be syntax errors, runtime errors, or
logical errors.
14. Enlist techniques for maintaining the state.
Techniques for maintaining the state in PHP include:
o Sessions
o Cookies
o Hidden form fields
o URL rewriting
o Databases
Long-answer questions:

1. What are variables in PHP? Explain in detail.


A variable in PHP is a named storage location used to hold data that
can change during program execution. Variables in PHP start with a $
sign, followed by the variable name.
Rules for Naming Variables:
• Must start with a letter or an underscore (_).
• Cannot start with a number.
• Can contain letters, numbers, and underscores.
• PHP variables are case-sensitive ($name and $Name are
different).
Example:
$name = "John"; // String variable
$age = 25; // Integer variable
$price = 19.99; // Float variable
$isStudent = true; // Boolean variable

// Note: a boolean true is printed as 1 when interpolated into a string
echo "Name: $name, Age: $age, Price: $price, Student: $isStudent";

2. What is a form? How to create and process it? Explain with an example.
A form is an HTML element used to collect user input and send it to a
server for processing. PHP processes the form data using $_POST or
$_GET.
Example of a Form:
<form action="process.php" method="POST">
Name: <input type="text" name="name"><br>
Age: <input type="number" name="age"><br>
<input type="submit" value="Submit">
</form>
Processing the Form in PHP (process.php):
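if ($_SERVER["REQUEST_METHOD"] == "POST") {
    // Escape user input before echoing it to prevent script injection (XSS)
    $name = htmlspecialchars($_POST['name']);
    $age = htmlspecialchars($_POST['age']);
    echo "Name: $name, Age: $age";
}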

3. What is an error and an error handler?


An error in PHP is an issue that prevents the program from running
correctly. Errors are categorized as:
• Syntax errors (e.g., missing semicolon)
• Runtime errors (e.g., division by zero)
• Logical errors (e.g., incorrect calculations)
Error Handling Functions:
• error_reporting()
• set_error_handler()
• trigger_error()
• error_log()
Example of a Custom Error Handler:
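function customError($errno, $errstr) {
    echo "Error [$errno]: $errstr";
}
set_error_handler("customError");
// Reading an undefined variable raises a warning (a notice before PHP 8),
// which customError handles. Note: 10 / 0 throws DivisionByZeroError in
// PHP 8, which is an exception and bypasses this handler.
echo $undefinedVariable;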

4. How to get server information? Describe in detail.


PHP provides the $_SERVER superglobal to access server information.
Example:
echo "Server Name: " . $_SERVER['SERVER_NAME'] . "<br>";
echo "Server Software: " . $_SERVER['SERVER_SOFTWARE'] . "<br>";
echo "Client IP: " . $_SERVER['REMOTE_ADDR'] . "<br>";

5. Describe GET and POST methods with examples. Differentiate them.
• GET sends data via the URL.
• POST sends data in the request body.
Example:
<form action="process.php" method="GET">
Name: <input type="text" name="name">
<input type="submit" value="Submit">
</form>
Differences:
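| Feature | GET | POST |
|---------|-----|------|
| Data Visibility | Visible in URL | Not shown in URL (sent in request body) |
| Data Length | Limited by URL length | Limited only by server settings |
| Security | Less secure | More secure, but not encrypted without HTTPS |
| Use Case | Retrieving data | Sending sensitive data |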

6. Short note: Automatic quoting of parameters.


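Automatic quoting refers to the old PHP feature magic_quotes_gpc,
which automatically added backslashes before quotes in GET, POST, and
cookie data in an attempt to prevent SQL injection. It was deprecated
in PHP 5.3 and removed in PHP 5.4; prepared statements and output
escaping (e.g., htmlspecialchars()) should be used instead.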

7. Explain self-processing pages in detail.


A self-processing page submits its form data to itself.
Example:
<form method="POST">
Name: <input type="text" name="name">
<input type="submit">
</form>

if ($_SERVER["REQUEST_METHOD"] == "POST") {
    // Escape user input before output
    echo "Hello, " . htmlspecialchars($_POST['name']);
}
8. Sticky Forms (Example)
Sticky forms retain user input after submission.
Example:
<form method="POST">
Name: <input type="text" name="name" value="<?php echo
isset($_POST['name']) ? htmlspecialchars($_POST['name']) : ''; ?>">
<input type="submit">
</form>

9. Multi-valued and Sticky Multi-valued Parameters


Multi-valued parameters handle multiple selections.
Example:
<form method="POST">
Select hobbies:
<select name="hobbies[]" multiple>
<option value="reading">Reading</option>
<option value="sports">Sports</option>
</select>
<input type="submit">
</form>
if (!empty($_POST['hobbies'])) {
print_r($_POST['hobbies']);
}
10. File Uploads in PHP (Example)
<form action="upload.php" method="POST"
enctype="multipart/form-data">
<input type="file" name="file">
<input type="submit" value="Upload">
</form>

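if (isset($_FILES["file"]) && $_FILES["file"]["error"] === UPLOAD_ERR_OK) {
    // basename() strips any directory part from the client-supplied name
    $target = "uploads/" . basename($_FILES["file"]["name"]);
    if (move_uploaded_file($_FILES["file"]["tmp_name"], $target)) {
        echo "File uploaded!";
    }
}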

11. Form Validation in PHP (Example)


if ($_SERVER["REQUEST_METHOD"] == "POST") {
if (empty($_POST["name"])) {
echo "Name is required";
}
}
<form method="POST">
Name: <input type="text" name="name">
<input type="submit">
</form>
12. Setting Response Headers (Example)
header("Content-Type: application/json");
echo json_encode(["message" => "Hello, World!"]);

13. Maintaining State in PHP


State can be maintained using:
• Sessions
• Cookies
• Hidden fields
• URL parameters

14. Cookies in PHP (Diagram & Example)


Cookies store user data on the client side.
setcookie("user", "John", time() + 3600);

15. Sessions in PHP (Example)


session_start();
$_SESSION["user"] = "John";
echo "Session set!";
16. Custom Error Handler (Example)
function myErrorHandler($errno, $errstr) {
echo "Custom Error: [$errno] $errstr";
}
set_error_handler("myErrorHandler");
trigger_error("An error occurred!", E_USER_WARNING);

17. Differences Between Cookies and Sessions


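| Feature | Cookies | Sessions |
|---------|---------|----------|
| Storage | Client-side | Server-side |
| Expiry | Can be set | Expires on browser close (by default) |
| Security | Less secure | More secure |
| Data Limit | Limited (about 4 KB per cookie) | Limited only by server storage |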

18. How to Trigger and Log an Error?


Triggering an Error
trigger_error("This is a custom error!", E_USER_WARNING);
Logging an Error
error_log("Custom error message", 3, "errors.log");
Chp2: XML

Short-answer questions:


1. What is XML?
XML (Extensible Markup Language) is a markup language used
to store and transport data in a structured and readable format.
It is both human- and machine-readable.
2. What is SimpleXML?
SimpleXML is a PHP extension that provides an easy way to
read, parse, and manipulate XML data using object-oriented
and array-like access.
3. List major parts of XML.
o Declaration (<?xml version="1.0" encoding="UTF-8"?>)
o Elements (<tag>data</tag>)
o Attributes (<tag attribute="value">data</tag>)
o Text content
o Comments (<!-- This is a comment -->)
4. Give any two syntax rules for XML.
o Every XML document must have a root element.
o All tags must be properly nested and closed.
5. What is CDATA?
CDATA (Character Data) is a section within XML that allows
including characters that would otherwise be treated as XML
markup.
Example:
<![CDATA[ <message>Hello World!</message> ]]>
6. What is the structure of XML?
An XML document follows a tree structure with a single root
element containing child elements, attributes, and text data.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<students>
<student id="1">
<name>John</name>
<age>20</age>
</student>
</students>
7. Give the relationship between XML and PHP.
PHP can parse, read, and manipulate XML data using extensions
like SimpleXML, DOM, and XML Parser, making XML useful for
data storage and exchange.
8. Define XML parser.
An XML parser is a software component that reads XML
documents and provides a way to access or modify their data
programmatically.
9. Define DOM.
DOM (Document Object Model) is a programming interface that
treats an XML document as a tree structure, allowing programs
to modify its content and structure dynamically.
10. Enlist parts of an XML document.
• XML Declaration
• Root Element
• Child Elements
• Attributes
• Text Data
• Comments
• Processing Instructions

Long-answer questions:

1. What are the characteristics of XML? Explain in detail.


XML (Extensible Markup Language) has several important
characteristics:
1. Self-descriptive: XML uses tags to define data, making it easy to
understand.
2. Structured & Hierarchical: XML follows a tree structure, where
each element has a parent and can have child elements.
3. Custom Tags: Unlike HTML, XML allows users to create their
own tags to define data meaningfully.
4. Platform-independent: XML can be used across different
systems and applications.
5. Supports Unicode: XML supports multiple languages and
character sets.
6. Extensible: New elements and attributes can be added without
affecting existing structures.
7. Interoperability: XML allows data exchange between different
applications and platforms.
Example XML:
<?xml version="1.0" encoding="UTF-8"?>
<person>
<name>John Doe</name>
<age>30</age>
<city>New York</city>
</person>

2. Explain document structure of XML.


An XML document follows a hierarchical tree structure. It consists of:
1. XML Declaration: Defines the XML version and encoding.
<?xml version="1.0" encoding="UTF-8"?>
2. Root Element: The main element containing all other elements.
<company>
3. Child Elements: Nested elements representing data.
<employee>
<name>John Doe</name>
<age>30</age>
</employee>
4. Attributes: Additional information about elements.
<employee id="101">
5. CDATA & Comments: Sections for special content.
<!-- This is a comment -->
<![CDATA[ Special characters like < and > can be included here ]]>

3. How to process XML? Describe with example.


Processing XML in PHP can be done using SimpleXML or the DOM
parser.
Using SimpleXML:
$xml = simplexml_load_file("data.xml");
echo $xml->name; // Access XML data
Using DOM Parser:
$dom = new DOMDocument();
$dom->load("data.xml");
echo $dom->getElementsByTagName("name")->item(0)->nodeValue;

4. Short Note: Predefined Character Entities.


Some characters are reserved in XML and must be replaced with
predefined entities:
| Character | Entity |
|-----------|--------|
| < | &lt; |
| > | &gt; |
| & | &amp; |
| ' | &apos; |
| " | &quot; |
Example:
<message>3 &lt; 5 is true</message>

5. Describe CDATA with an example.


CDATA (Character Data) is used to include special characters without
XML parsing them.
Example:
<note>
<![CDATA[ This <tag> is inside CDATA and will not be parsed! ]]>
</note>

6. XML and PHP (Example)


PHP can read and parse XML data.
Example XML File (data.xml):
<user>
<name>John</name>
<email>[email protected]</email>
</user>
PHP Code:
$xml = simplexml_load_file("data.xml");
echo "User Name: " . $xml->name;

7. What is an XML parser? What are its types? Compare them.


An XML parser reads and processes XML data.
Types of XML Parsers:
1. DOM (Document Object Model)
o Reads the entire XML file into memory.
o Allows modification of XML data.
o Uses more memory.
2. SAX (Simple API for XML)
o Reads XML data sequentially (event-driven).
o More memory efficient.
o Cannot modify data directly.
| Feature | DOM Parser | SAX Parser |
|---------|------------|------------|
| Memory Usage | High | Low |
| Read Method | Whole Document | Stream (Event-based) |
| Modification | Yes | No |

8. What is DOM? How to create an XML document using DOM?


DOM (Document Object Model) represents XML as a tree.
Example: Creating XML using DOM in PHP
$dom = new DOMDocument("1.0", "UTF-8");
$root = $dom->createElement("students");
$dom->appendChild($root);

$student = $dom->createElement("student");
$student->setAttribute("id", "101");
$name = $dom->createElement("name", "John");
$student->appendChild($name);

$root->appendChild($student);
echo $dom->saveXML();

9. Short Note: Converting SimpleXML and DOM Objects


SimpleXML and DOM can be converted into each other.
Convert SimpleXML to DOM:
$xml = simplexml_load_file("data.xml");
$dom = dom_import_simplexml($xml);
Convert DOM to SimpleXML:
$simplexml = simplexml_import_dom($dom);

10. Describe SimpleXML with an example.


SimpleXML provides an easy way to parse XML.
Example:
$xmlString = '<person><name>John</name></person>';
$xml = simplexml_load_string($xmlString);
echo $xml->name;

11. How to create a new XML parser? Explain with an example.


XML Parser can be created using xml_parser_create().
Example:
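// Element names are reported in uppercase by default (case folding)
function startElement($parser, $name, $attrs) {
    echo "Start: $name\n";
}

function endElement($parser, $name) {
    echo "End: $name\n";
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "startElement", "endElement");

$xml = "<note><to>John</to></note>";
xml_parse($parser, $xml, true); // true marks this as the final chunk of data
xml_parser_free($parser);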

12. Applications of XML


XML is widely used in various fields:
1. Web Development: Storing and exchanging data (e.g., RSS
feeds).
2. Configuration Files: Used in software settings (config.xml).
3. Data Storage: Database alternatives for lightweight storage.
4. Web Services: Used in SOAP-based APIs.
5. Document Markup: Used in XHTML and SVG graphics.
6. Communication between systems: Used in IoT and enterprise
applications.
Chp1: Introduction to Data Analytics

Short-answer questions:


1. What is data science?
Data science is an interdisciplinary field that combines statistics,
machine learning, programming, and domain knowledge to
extract insights and knowledge from structured and
unstructured data.
2. Define the term analytics.
Analytics is the process of analysing data to discover meaningful
patterns, trends, and insights that help in decision-making.
3. Enlist types of data analytics.
o Descriptive Analytics
o Diagnostic Analytics
o Predictive Analytics
o Prescriptive Analytics
4. Define data analysis.
Data analysis is the process of inspecting, cleaning,
transforming, and modeling data to uncover useful insights and
support decision-making.
5. Define mathematical model.
A mathematical model is a representation of a system or
process using mathematical concepts and equations to analyse,
predict, or optimize outcomes.
6. What is the purpose of diagnostic analytics?
Diagnostic analytics helps to determine the cause of past events
by analysing historical data and identifying patterns and
relationships.
7. Define class imbalance.
Class imbalance occurs when one class in a dataset has
significantly more instances than another, which can affect the
performance of machine learning models.
8. Differentiate between predictive analytics and prescriptive
analytics (Any two points).
| Feature | Predictive Analytics | Prescriptive Analytics |
|---------|----------------------|------------------------|
| Purpose | Forecasts future trends based on historical data | Suggests actions to achieve desired outcomes |
| Example | Predicting customer churn | Recommending strategies to reduce churn |
9. Define exploratory analysis.
Exploratory analysis involves summarizing and visualizing data
to identify patterns, trends, and relationships before applying
formal statistical models.
10. Define linear model.
A linear model is a mathematical model that assumes a linear
relationship between input variables (features) and the target
variable.
11. What is model evaluation?
Model evaluation is the process of assessing a machine learning
model's performance using metrics like accuracy, precision,
recall, and F1-score.
12. Define predictive analytics.
Predictive analytics uses statistical techniques and machine
learning to analyse past data and make future predictions.
13. What is the purpose of AUC and ROC curves?
o AUC (Area Under Curve): Measures the overall
performance of a classification model.
o ROC (Receiver Operating Characteristic) Curve: Shows the
trade-off between true positive rate (sensitivity) and false
positive rate.
14. Define baseline model.
A baseline model is a simple model used as a reference to
compare the performance of more complex machine learning
models.
15. Define descriptive analytics.
Descriptive analytics focuses on summarizing historical data to
understand trends and patterns using statistical methods and
visualizations.
16. Define the terms metric and classifier.
o Metric: A quantitative measure used to evaluate a
machine learning model (e.g., accuracy, precision).
o Classifier: A machine learning algorithm that categorizes
data into predefined classes (e.g., Decision Tree, SVM).
Long-answer questions:

1. Define data science. What is its purpose? Explain in detail.


Definition:
Data Science is an interdisciplinary field that combines statistics,
machine learning, data analysis, and domain expertise to extract
meaningful insights from structured and unstructured data.
Purpose:
• Identify patterns and trends in data.
• Make data-driven decisions.
• Build predictive and prescriptive models.
• Automate decision-making processes.
• Improve efficiency and productivity across various industries.

2. What is data analytics? Enlist its different roles. Also, state its
advantages and disadvantages.
Definition:
Data Analytics is the process of analysing raw data to identify
patterns, trends, and useful information.
Roles in Data Analytics:
• Data Engineer – Prepares and processes data.
• Data Analyst – Examines data to generate insights.
• Data Scientist – Develops predictive and machine learning
models.
• Business Analyst – Uses data for business strategy and
decision-making.
Advantages:
• Improves decision-making.
• Helps detect fraud and anomalies.
• Enhances customer experience.
Disadvantages:
• Requires high computational resources.
• Can lead to biased results if not handled properly.

3. With the help of a diagram, describe the lifecycle of data analytics.
The Data Analytics Lifecycle includes the following stages:
1. Problem Definition: Identify the business problem and
objectives.
2. Data Collection: Gather raw data from various sources.
3. Data Cleaning: Remove errors and inconsistencies in data.
4. Data Exploration & Analysis: Identify patterns and relationships
in data.
5. Model Building: Apply machine learning algorithms to create
predictive models.
6. Model Evaluation: Assess model performance using metrics.
7. Deployment & Maintenance: Implement the model in a real-
world environment.
Diagram:
[Problem Definition] → [Data Collection] → [Data Cleaning] →
[Exploratory Analysis] → [Model Building] → [Model Evaluation] →
[Deployment]
4. Explain four layers in the data analytics framework
diagrammatically.
The Data Analytics Framework consists of four layers:
1. Data Layer: Raw data from sources (databases, APIs, IoT
devices).
2. Processing Layer: Data cleaning, transformation, and storage.
3. Analytics Layer: Application of algorithms, statistics, and AI.
4. Presentation Layer: Visualizing insights using dashboards and
reports.
Diagram:
[Data Layer] → [Processing Layer] → [Analytics Layer] →
[Presentation Layer]

5. Differentiate between data analysis and data analytics.


| Feature | Data Analysis | Data Analytics |
|---------|---------------|----------------|
| Definition | Examines raw data for insights | Uses tools and models to predict future trends |
| Approach | Descriptive | Predictive & Prescriptive |
| Example | Summarizing sales data | Forecasting future sales |

6. What are the types of data analytics? Describe two of them in detail.
• Descriptive Analytics – Summarizes historical data.
• Diagnostic Analytics – Explains why something happened.
• Predictive Analytics – Forecasts future trends.
• Prescriptive Analytics – Suggests actions to improve outcomes.
Example of Predictive Analytics:
Using past customer behavior to predict future purchases.
Example of Prescriptive Analytics:
Recommending strategies to increase sales.

7. What is prescriptive analytics? Explain in detail.


Prescriptive analytics uses AI, machine learning, and statistical
models to recommend actions based on data.
Example:
An e-commerce platform suggesting product discounts to increase
sales.

8. What is exploratory analytics? What is its purpose? Explain with an example.
Exploratory analytics identifies hidden patterns and relationships in
data.
Example:
Analysing customer purchase history to group similar buyers.

9. Short Note: Mechanistic Analytics


Mechanistic analytics focuses on cause-and-effect relationships in
data, often used in scientific research.
10. What is a mathematical model? List its types. Explain two in
detail.
Definition:
A mathematical model represents real-world systems using
mathematical formulas.
Types:
• Linear Models
• Non-Linear Models
• Statistical Models
• Machine Learning Models
Example:
Linear Regression predicts sales based on advertising spend.
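As a rough illustration of the linear regression example above, the sketch below fits a straight line by ordinary least squares in plain Python. The function name `fit_line` and the advertising/sales numbers are made up for this example; they are not from any real dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend and the sales it produced.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]  # constructed so that sales = 2 * spend + 1

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5.0 + intercept  # predicted sales for a spend of 5.0
print(slope, intercept, predicted)
```

Because the sample data lie exactly on a line, the fit recovers that line; real data would leave some residual error.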

11. What is a linear and non-linear model? Compare them.


| Feature | Linear Model | Non-Linear Model |
|---------|--------------|------------------|
| Relationship | Straight-line relationship | Complex curves |
| Complexity | Simple | Advanced |
| Example | Linear Regression | Neural Networks |

12. What is a baseline model? Enlist two in detail.


A baseline model serves as a simple benchmark for model
comparison.
Examples:
• Mean Model: Predicts using the average value.
• Random Model: Assigns random outputs as predictions.
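The mean model above can be sketched in a few lines of Python. This is only an illustration: `mean_baseline`, `mean_absolute_error`, and the target values are hypothetical names and numbers chosen for the example.

```python
def mean_baseline(train_targets):
    """Return a predictor that ignores its input and outputs the training mean."""
    mean_value = sum(train_targets) / len(train_targets)
    return lambda _features: mean_value


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical target values for training and testing.
train_y = [10.0, 20.0, 30.0]
test_y = [15.0, 25.0]

predict = mean_baseline(train_y)
baseline_preds = [predict(None) for _ in test_y]
baseline_mae = mean_absolute_error(test_y, baseline_preds)
print(baseline_preds, baseline_mae)
```

Any more complex model should achieve a lower error than `baseline_mae` on the same test data to justify its extra complexity.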
13. How to evaluate a model? Describe in detail.
Steps:
1. Choose evaluation metrics (accuracy, precision, recall, etc.).
2. Use a test dataset to assess performance.
3. Compare results with baseline models.
4. Tune parameters for better accuracy.

14. Short Note: Metrics for Evaluating Classifiers


Common metrics for evaluating classification models:
• Accuracy: Correct predictions vs. total predictions.
• Precision: True positives vs. predicted positives.
• Recall: True positives vs. actual positives.
• F1-Score: Balance between precision and recall.

15. What is a confusion matrix? How to use it in data analytics?


A confusion matrix is a table showing true positives, false positives,
true negatives, and false negatives in classification models.
Example:
| Actual \ Predicted | Positive | Negative |
|--------------------|----------|----------|
| Positive | TP | FN |
| Negative | FP | TN |
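To show how the four cells are filled in, here is a minimal sketch that counts them from lists of actual and predicted labels. The helper `confusion_counts` and the label lists are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for a binary classification problem."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # positive correctly predicted
        elif actual == positive:
            fn += 1  # positive missed
        elif predicted == positive:
            fp += 1  # negative wrongly flagged as positive
        else:
            tn += 1  # negative correctly predicted
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)
```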

16. Define accuracy, precision, recall, and F-score.


• Accuracy: (TP + TN) / Total Samples
• Precision: TP / (TP + FP)
• Recall: TP / (TP + FN)
• F-score: 2 * (Precision * Recall) / (Precision + Recall)
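The four formulas above translate directly into code. The sketch below is illustrative only: the function name and the counts are made up, and returning 0.0 when a denominator is zero is a convention assumed here, not something the formulas themselves define.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-score from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * (precision * recall) / (precision + recall)
    else:
        f_score = 0.0
    return accuracy, precision, recall, f_score


# Hypothetical counts for 100 samples.
accuracy, precision, recall, f_score = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(accuracy, precision, recall, f_score)
```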

17. What is ROC curve? How to implement it? Explain with an example.
The ROC (Receiver Operating Characteristic) curve plots the true
positive rate vs. false positive rate for classification models.
Implementation in Python:
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, _ = roc_curve(y_true, y_scores)

plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()
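To make the link between the ROC curve and AUC concrete, the following sketch rebuilds the ROC points without any library and integrates them with the trapezoid rule. It reuses the same four labels and scores as the example above; `roc_points` and `auc_trapezoid` are hypothetical helper names, and the code assumes both classes are present in the labels.

```python
def roc_points(y_true, y_scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(y_scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_scores)
auc = auc_trapezoid(points)
print(points, auc)
```

An AUC of 1.0 would mean every positive is scored above every negative, while 0.5 corresponds to random ranking.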

18. What is class imbalance? Describe in detail.


Class imbalance occurs when one class in a dataset has significantly
fewer instances than another. This can lead to biased model
predictions.
Example:
Fraud detection datasets often have 99% legitimate transactions and
1% fraud cases.
Solutions:
• Resampling: Oversampling the minority class or undersampling
the majority class.
• Weighted Loss Functions: Assigning higher penalties to
misclassified minority classes.
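Both solutions can be sketched in plain Python. The weighting formula used here (samples divided by classes times class count) is one common heuristic, not the only option, and the helper names and toy data are assumptions made for this example.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (a common heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical imbalanced data: nine samples of class 0, one of class 1.
labels = [0] * 9 + [1]
samples = list(range(10))

weights = balanced_class_weights(labels)
new_samples, new_labels = random_oversample(samples, labels)
print(weights, Counter(new_labels))
```

Oversampling should be applied to the training data only, so that the test set still reflects the real class distribution.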

19. Short Note: Evaluating Value Prediction Models


Value prediction models estimate numerical outputs (e.g., stock
prices).
Common metrics:
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
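A minimal sketch of the three metrics, using made-up actual and predicted values; the function name `regression_errors` is hypothetical.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, MSE, and RMSE for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)  # same units as the target variable
    return mae, mse, rmse


# Hypothetical values for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 9.0]

mae, mse, rmse = regression_errors(actual, predicted)
print(mae, mse, rmse)
```

MSE and RMSE penalize large errors more heavily than MAE because each error is squared before averaging.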
Chp2: Machine Learning Overview

Short-answer questions:

1. Define machine learning.


Machine learning is a branch of artificial intelligence that
enables computers to learn patterns from data and make
predictions or decisions without being explicitly programmed.
2. Define deep learning.
Deep learning is a subset of machine learning that uses artificial
neural networks with multiple layers to model complex patterns
and relationships in data.
3. List types of machine learning.
o Supervised Learning
o Unsupervised Learning
o Reinforcement Learning
o Semi-Supervised Learning
4. Enlist three parameters for machine learning.
o Learning Rate
o Number of Iterations (Epochs)
o Model Complexity
5. Define classification and regression.
o Classification: A supervised learning task that assigns data
to predefined categories (e.g., spam vs. non-spam emails).
o Regression: A supervised learning task that predicts
continuous values (e.g., predicting house prices).
6. Define reinforcement machine learning.
Reinforcement learning is a type of machine learning where an
agent learns by interacting with an environment and receiving
rewards or penalties.
7. State any two uses of machine learning.
o Fraud detection in banking.
o Recommendation systems (Netflix, Amazon).
8. Define Neural Networks (NNs).
Neural networks are computational models inspired by the
human brain, consisting of interconnected layers of artificial
neurons that process and learn from data.
9. Define Artificial Intelligence (AI).
AI is the simulation of human intelligence in machines, enabling
them to perform tasks such as problem-solving, learning, and
decision-making.
10. List AI applications. Any two.
• Autonomous vehicles (self-driving cars).
• Chatbots and virtual assistants (e.g., Siri, Alexa).
11. Define model.
A model is a mathematical or computational representation
that learns patterns from data and makes predictions or
classifications.
12. Define supervised machine learning.
Supervised learning is a type of machine learning where the
model is trained on labeled data (i.e., data with known
outputs).
13. Give the purpose of k-NN algorithm.
The k-Nearest Neighbors (k-NN) algorithm is used for
classification and regression by finding the k closest data points
to make a decision.
14. Define decision tree.
A decision tree is a machine learning model that splits data into
branches based on decision rules to classify or predict
outcomes.
15. What is the purpose of SVM?
Support Vector Machine (SVM) is used for classification and
regression by finding the optimal hyperplane that separates
different classes.
16. Give the use of Naïve Bayes.
Naïve Bayes is used for text classification, spam filtering, and
sentiment analysis due to its simplicity and efficiency.
17. Define unsupervised machine learning.
Unsupervised learning is a type of machine learning where the
model learns patterns and structures from unlabeled data.
18. Define clustering.
Clustering is an unsupervised learning technique that groups
similar data points together.
19. Define association rule mining.
Association rule mining discovers relationships and patterns
between variables in large datasets (e.g., market basket
analysis).
20. What is the purpose of Apriori algorithm?
The Apriori algorithm is used in association rule mining to find
frequent itemsets and generate strong association rules.
21. Define anomaly detection.
Anomaly detection is the process of identifying unusual or
unexpected data points that deviate from normal patterns.
22. Differentiate between supervised and unsupervised
machine learning.
| Feature | Supervised Learning | Unsupervised Learning |
|---------|---------------------|----------------------|
| Data | Labeled data | Unlabeled data |
| Purpose | Classification & regression | Clustering & pattern discovery |
23. Define semi-supervised machine learning.
Semi-supervised learning is a mix of supervised and
unsupervised learning, where a small amount of labeled data is
used alongside a large amount of unlabeled data.
24. Define regression analysis.
Regression analysis is a statistical method used to model
relationships between dependent and independent variables.
25. Define regression model.
A regression model predicts continuous values based on input
features.
26. What is logistic regression?
Logistic regression is a classification algorithm that predicts the
probability of a categorical outcome using a logistic function.
27. Define linear regression.
Linear regression is a statistical method that models the
relationship between a dependent variable and one or more
independent variables using a straight-line equation.
28. Define polynomial regression.
Polynomial regression is an extension of linear regression
where the relationship between variables is modeled as an nth-
degree polynomial.
29. List ensemble techniques.
• Bagging (e.g., Random Forest)
• Boosting (e.g., AdaBoost, XGBoost)
• Stacking
30. Define classification.
Classification is a supervised learning technique that assigns
data points to predefined categories.
31. Define cluster and clustering.
• Cluster: A group of similar data points.
• Clustering: The process of grouping similar data points
together.
32. Enlist types of clustering.
• Hierarchical Clustering
• K-Means Clustering
• DBSCAN (Density-Based Spatial Clustering of Applications with
Noise)
33. Give the long form of DBSCAN.
Density-Based Spatial Clustering of Applications with Noise.
34. Define SOM.
Self-Organizing Map (SOM) is an unsupervised neural network
that reduces the dimensionality of data while preserving
topological relationships.
Here are some detailed answers:

1. What is machine learning? State its advantages and
disadvantages. Also list its various applications.
Definition:
Machine learning (ML) is a subset of artificial intelligence (AI) that
enables systems to automatically learn from data and improve
performance without being explicitly programmed.
Advantages of Machine Learning:
• Automation – Reduces human effort in decision-making.
• Improved Accuracy – Learns from large amounts of data to
improve predictions.
• Scalability – Can handle large-scale data efficiently.
• Continuous Improvement – Learns from new data dynamically.
• Application Variety – Used in multiple domains (healthcare,
finance, marketing, etc.).
Disadvantages of Machine Learning:
• Data Dependency – Requires a large amount of high-quality
data.
• Computational Cost – Requires high computational power and
memory.
• Interpretability – Some models, like deep learning, act as "black
boxes" and lack transparency.
• Bias & Fairness Issues – Can reinforce biases present in the
training data.
Applications of Machine Learning:
• Healthcare: Disease prediction, drug discovery.
• Finance: Fraud detection, stock market prediction.
• Retail: Customer segmentation, product recommendation.
• Autonomous Systems: Self-driving cars, robotics.
• Natural Language Processing (NLP): Chatbots, speech
recognition.

2. What is deep learning? How does it work? Explain
diagrammatically.
Definition:
Deep learning is a subset of machine learning that uses neural
networks with multiple layers (deep neural networks) to learn
complex patterns in data.
How It Works:
• Input Layer: Takes in raw data (e.g., images, text).
• Hidden Layers: Multiple layers of artificial neurons extract and
transform features.
• Output Layer: Provides the final prediction or classification
result.
Example Diagram of a Neural Network:
Input Layer → Hidden Layers → Output Layer
[Features] → [Neurons Processing Data] → [Predicted Output]
(Imagine a simple feedforward neural network with multiple hidden
layers).
Key Deep Learning Architectures:
• Convolutional Neural Networks (CNNs) – Image processing.
• Recurrent Neural Networks (RNNs) – Sequential data (e.g.,
speech, text).
• Transformer Models – Advanced NLP tasks (e.g., ChatGPT).

3. What is AI? What is its purpose? State its advantages and
disadvantages.
Definition:
Artificial Intelligence (AI) refers to the development of computer
systems that can perform tasks requiring human intelligence, such as
problem-solving, learning, reasoning, and perception.
Purpose of AI:
• Automate repetitive tasks.
• Enhance decision-making.
• Improve efficiency in industries like healthcare, finance, and
manufacturing.
Advantages of AI:
• Automation: Reduces the need for human intervention.
• Data Processing: Handles vast amounts of data efficiently.
• Accuracy: AI models can outperform humans in specific tasks.
• Personalization: Used in recommendation systems (Netflix,
Amazon).
Disadvantages of AI:
• Job Displacement: Can replace human jobs in some industries.
• Bias and Ethics Issues: AI models can reflect human biases.
• High Costs: Requires substantial computational power.
4. Relationship Between AI, ML, and DL (with Diagram)
AI → ML → DL
Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning
• AI is the broadest concept, covering any machine that exhibits
human-like intelligence.
• ML is a subset of AI that learns from data.
• DL is a subset of ML that uses deep neural networks.

10. What are the types of machine learning? Compare them.


| Type | Description | Example |
|------|-------------|---------|
| Supervised Learning | Learns from labeled data. | Email spam detection |
| Unsupervised Learning | Finds patterns in unlabeled data. | Customer segmentation |
| Reinforcement Learning | Learns by trial and error using rewards. | Self-driving cars |
| Semi-Supervised Learning | Uses a small amount of labeled data with a large amount of unlabeled data. | Medical diagnosis |

16. Describe unsupervised learning with a diagram, advantages, and
disadvantages.
Definition:
Unsupervised learning is a machine learning approach where the
model identifies patterns in data without explicit labels.
Advantages:
• Can find hidden patterns and relationships.
• Works well for clustering and anomaly detection.
Disadvantages:
• Harder to evaluate than supervised learning.
• Requires human interpretation of results.
Example Diagram:
Input Data → Clustering Algorithm → Grouped Clusters

17. k-Means Clustering Algorithm (with Example)


Definition:
k-Means is an unsupervised clustering algorithm that partitions data
into k clusters based on similarity.
Example:
Consider customer segmentation for an e-commerce business.
• k-Means groups customers based on purchasing behavior.
• Each cluster represents a group with similar shopping habits.
Steps of k-Means:
1. Select k cluster centers randomly.
2. Assign each data point to the nearest cluster.
3. Recalculate cluster centroids.
4. Repeat until convergence.
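The four steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `k_means`, the `seed` parameter, and the customer data are assumptions made up for this example, and points are limited to two dimensions.

```python
import random

def k_means(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial cluster centers.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's center unchanged
        # Step 4: stop once the centroids no longer move (convergence).
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (orders per month, average order value)
customers = [(1, 20), (2, 22), (1, 25), (9, 80), (10, 85), (11, 78)]
centroids, labels = k_means(customers, k=2)
print(labels)
```

The first three customers end up in one cluster and the last three in the other. Which cluster is numbered 0 depends on the random initial centers.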
30. Reinforcement Learning (with Diagram)
Definition:
Reinforcement Learning (RL) is a type of machine learning where an
agent learns by interacting with an environment and receiving
rewards for desirable actions.
Example Use Cases:
• Gaming: AI learns to play chess.
• Robotics: Self-learning robots.
• Autonomous Vehicles: AI learns driving behavior.
Reinforcement Learning Process (Diagram):
Agent → Action → Environment → Reward → Update Policy →
Repeat
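The loop in the diagram can be sketched with tabular Q-learning, one common RL algorithm. Everything specific here is an assumption for illustration: the five-cell corridor environment, the reward of 1 at the goal, and the values of `ALPHA` and `GAMMA`. The agent acts randomly while learning, which Q-learning permits because it learns the value of the best action regardless of the action actually taken.

```python
import random

N_STATES = 5             # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)       # move left or move right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (illustrative values)

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates the long-term reward of taking that action.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))                     # agent picks an action
            next_state, reward, done = step(state, ACTIONS[a])  # environment responds
            # Update: move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy: in each non-goal cell, pick the action with the highest estimate.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

After training, the greedy policy moves right (+1) in every cell, which is the shortest route to the reward.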

31. Difference Between Supervised, Unsupervised, Semi-Supervised,
and Reinforcement Learning
| Feature | Supervised Learning | Unsupervised Learning | Semi-Supervised Learning | Reinforcement Learning |
|---------|---------------------|-----------------------|--------------------------|------------------------|
| Data | Labeled | Unlabeled | Small labeled + large unlabeled | Environment-based |
| Goal | Predict outcomes | Find hidden patterns | Learn with minimal labels | Maximize cumulative reward |
| Example | Spam detection | Customer segmentation | Medical image classification | Self-driving cars |


5. Learning Models for Algorithms


Learning models define how an algorithm learns from data. The
primary learning models include:
• Supervised Learning – Learns from labeled data. Example:
Linear Regression.
• Unsupervised Learning – Learns from unlabeled data. Example:
k-Means Clustering.
• Reinforcement Learning – Learns by interacting with an
environment. Example: Q-Learning.
• Semi-Supervised Learning – Uses a small amount of labeled
data with a large set of unlabeled data. Example: Self-training
classifiers.

6. Applications of Machine Learning in Data Science


Machine learning is widely used in data science for:
• Healthcare: Disease prediction, drug discovery.
• Finance: Fraud detection, credit scoring.
• Marketing: Customer segmentation, recommendation systems.
• Retail: Demand forecasting, inventory optimization.
• Autonomous Systems: Self-driving cars, robotics.
• Natural Language Processing (NLP): Chatbots, speech
recognition.

7. Machine Learning Model (with Diagram)


A machine learning model is a mathematical representation that
makes predictions based on input data.
Basic Workflow of an ML Model:
Input Data → Data Preprocessing → Feature Engineering → Model
Training → Model Evaluation → Predictions
Diagram:
Raw Data → Preprocessing → Feature Selection → Model Training →
Evaluation → Prediction

8. Model Selection and Feature Engineering


• Model Selection:
o Choose based on the problem type (classification,
regression, clustering).
o Consider accuracy, interpretability, and computational
efficiency.
• Feature Engineering:
o Selecting relevant features.
o Handling missing values.
o Scaling and normalization.
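Two of the feature engineering tasks above, handling missing values and scaling, can be sketched as follows. Mean imputation and min-max scaling are only one choice among several, and the function names and the `ages` column are made up for this example.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, None, 40, 60]        # hypothetical feature column with one missing value
filled = impute_mean(ages)       # the missing entry becomes 40.0, the mean of 20, 40, 60
scaled = min_max_scale(filled)   # values now lie between 0 and 1
print(filled, scaled)
```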

9. Training and Validating a Model


Steps to Train a Model:
1. Split Data: Train-test split (e.g., 80%-20%).
2. Train the Model: Fit data to the selected algorithm.
3. Hyperparameter Tuning: Optimize settings like learning rate,
number of layers, etc.
4. Validation: Use techniques like cross-validation to check
performance.
Validation Techniques:
• Holdout Validation: Split into training and test sets.
• K-Fold Cross-Validation: Divide data into k subsets and train
multiple times.
• Leave-One-Out Cross-Validation: Use one sample for testing,
rest for training.
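A minimal sketch of how k-fold cross-validation divides the data, assuming samples are identified by their index and are not shuffled. The helper name `k_fold_indices` is hypothetical. Setting k equal to the number of samples gives leave-one-out cross-validation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so every observation is used for both training and testing across the k rounds.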

11. Supervised Learning


• Definition: Learning from labeled data.
• How It Works: The model maps inputs to outputs using a
training dataset.
• Examples:
o Regression (e.g., predicting house prices).
o Classification (e.g., spam detection).
• Advantages: Can reach high accuracy when enough labeled data is
available; performance is easy to measure against known labels.
• Disadvantages: Requires labeled data, risk of overfitting.

12. k-Nearest Neighbors (k-NN) Algorithm


• Definition: A simple classification algorithm that assigns a data
point to the class held by the majority of its k nearest neighbors.
• How It Works:
1. Select the number of neighbors (k).
2. Measure distances (e.g., Euclidean distance).
3. Classify based on majority voting.
• Advantages: Easy to implement.
• Disadvantages: Slow for large datasets.
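The three steps can be sketched as below. The function name `knn_predict` and the sample points are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    # Step 2: measure the Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest neighbors (k was chosen in step 1).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, query=(2, 2)))
```

Every prediction scans the whole training set, which is why k-NN becomes slow on large datasets.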

13. Decision Tree Algorithm


• Definition: A tree-like model that splits data into branches
based on feature values.
• How It Works:
o Each node represents a decision.
o Splitting continues until a stopping criterion is met.
• Advantages: Easy to interpret.
• Disadvantages: Prone to overfitting.

14. Support Vector Machine (SVM) (with Diagram)


• Definition: A classification algorithm that finds the best
hyperplane to separate data points.
• How It Works:
o Finds the optimal decision boundary.
o Uses support vectors (key data points) to maximize the
margin.
• Example Diagram:
Class A (●●●) ---- SVM ---- Class B (▲▲▲)

15. Naïve Bayes Algorithm


• Definition: A probabilistic classifier based on Bayes’ Theorem.
• Uses: Spam filtering, sentiment analysis.
• Advantages: Works well with small datasets.
• Disadvantages: Assumes independence of features.
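Written out, the independence assumption lets the classifier multiply one probability per feature. For a class $C$ and features $x_1, \dots, x_n$ (symbols introduced here for illustration), Bayes' Theorem with that assumption gives:

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class $\hat{C}$ is the one with the largest product. In spam filtering, each $x_i$ could be the presence of a particular word.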

18. Association Rule Mining (with Example)


• Definition: A method to find relationships between variables in
large datasets.
• Example: Market Basket Analysis.
o If a customer buys "Milk" and "Bread," they are likely to
buy "Butter."
o Rule: Milk, Bread → Butter
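Rules like this are usually ranked by support (how often the items occur together) and confidence (how often the rule holds when its left side occurs). A small sketch with made-up baskets; the function names are assumptions for this example:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets
baskets = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Eggs"},
    {"Milk", "Eggs"},
]
rule_support = support(baskets, {"Milk", "Bread", "Butter"})          # 2 of 5 baskets
rule_confidence = confidence(baskets, {"Milk", "Bread"}, {"Butter"})  # 2 of the 3 Milk+Bread baskets
print(rule_support, rule_confidence)
```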

19. Polynomial Regression (with Diagram)


• Definition: A regression model that fits data using polynomial
functions.
• Equation: $y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$
• Example Diagram:
Linear Model (Straight Line)
Polynomial Model (Curved Fit)
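Once the coefficients are known, predicting with a polynomial model is just evaluating the polynomial. A minimal sketch, assuming the coefficients are listed from a_0 up to a_n; the model y = 1 + 2x + 3x^2 is invented for the example, not fitted to data.

```python
def predict_polynomial(coefficients, x):
    """Evaluate a_0 + a_1*x + ... + a_n*x^n, with coefficients given as [a_0, a_1, ..., a_n]."""
    result = 0.0
    # Horner's method: process coefficients from a_n down to a_0.
    for a in reversed(coefficients):
        result = result * x + a
    return result

# Hypothetical model: y = 1 + 2x + 3x^2, evaluated at x = 2 (1 + 4 + 12)
print(predict_polynomial([1, 2, 3], 2))
```

With only two coefficients the same function evaluates a straight line, which shows that linear regression is the degree-1 case.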

20. Semi-Supervised Learning (with Diagram)


• Definition: Uses both labeled and unlabeled data for training.
• Example Use Case: Medical image classification (few labeled
images).

21. Regression Model and Linear Regression (with Diagram)


• Regression Model: Predicts continuous values.
• Linear Regression Equation: $y = mx + c$
• Diagram:
Scatter Plot → Best Fit Line
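The best-fit line is commonly found by least squares, which picks the slope m and intercept c that minimize the squared vertical distances to the points. A minimal sketch for one input variable; the function name `fit_line` and the data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]   # hypothetical data lying exactly on y = 2x + 1
m, c = fit_line(xs, ys)
print(m, c)
```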

22. Logistic Regression (with Assumptions)


• Definition: A classification algorithm used for binary outcomes.
• Equation: $P(Y=1) = \frac{1}{1 + e^{-(b_0 + b_1 x)}}$
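• Assumptions:
o The outcome is binary.
o Observations are independent of each other.
o Little or no multicollinearity among the independent variables.
o Large sample size.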
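The logistic equation turns any number into a probability between 0 and 1, which is then compared with a threshold to pick a class. A minimal sketch for a single feature; the coefficients `b0` and `b1` and the 0.5 threshold are hypothetical values chosen for the example, not fitted to data.

```python
import math

def predict_probability(x, b0, b1):
    """P(Y=1) for a single feature x under a logistic model with intercept b0 and slope b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def classify(x, b0, b1, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold, else class 0."""
    return 1 if predict_probability(x, b0, b1) >= threshold else 0

# Hypothetical coefficients
b0, b1 = -4.0, 1.0
print(predict_probability(4.0, b0, b1))  # b0 + b1*x is 0 here, so the probability is 0.5
print(classify(6.0, b0, b1), classify(2.0, b0, b1))
```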

23. Ensemble Techniques (Short Note)


• Definition: Combines multiple models to improve accuracy.
• Types:
o Bagging: Random Forest.
o Boosting: XGBoost, AdaBoost.

24. Classification (with Example)


• Definition: Categorizes data into predefined labels.
• Example: Spam detection (Spam vs. Not Spam).
• Techniques:
o Decision Trees.
o SVM.
o Naïve Bayes.

25. Random Forest (with Diagram)


• Definition: An ensemble of decision trees, each trained on a
random sample of the data, whose predictions are combined.
• Diagram:
Multiple Decision Trees → Majority Voting → Final Output
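The voting step in the diagram can be sketched as follows. The three "trees" are hand-written stand-in rules invented for this example; in a real random forest each tree would be trained on a different bootstrap sample of the data.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

# Stand-ins for trained decision trees: each maps a message length to a label.
trees = [
    lambda n: "spam" if n > 100 else "not spam",
    lambda n: "spam" if n > 150 else "not spam",
    lambda n: "spam" if n > 50 else "not spam",
]

def forest_predict(n):
    """Collect one prediction per tree, then take the majority."""
    return majority_vote([tree(n) for tree in trees])

print(forest_predict(120))  # two of the three trees vote "spam"
```

For regression tasks the tree outputs are averaged instead of voted on.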

26. Clustering (with Example)


• Definition: Groups similar data points together.
• Example: Customer segmentation.

27. Clustering Techniques


• Types:
o Hierarchical Clustering – Forms a tree structure.
o k-Means Clustering – Divides into k clusters.

28. DBSCAN Clustering (with Example)


• Definition: A density-based clustering algorithm that groups
points lying in dense regions and marks points in sparse regions
as noise.
• Example: Detecting anomalies in financial transactions.

29. Self-Organizing Map (SOM) (with Example)


• Definition: A type of artificial neural network for clustering.
• Example: Visualizing high-dimensional data.

32. Predicting New Observations


• Definition: Using trained models to predict outcomes for
unseen data.
• Process:
o Input new data.
o Apply trained model.
o Generate predictions.
