AI PROJECT CYCLE SHORT NOTES
The AI project cycle is a structured process that guides the development and implementation of AI solutions. Here’s a concise
overview of each phase within the AI project cycle:
1. Problem Scoping
*Goal*: Clearly define the problem to be solved and gather requirements.
Activities: Analyze case studies, engage in stakeholder discussions, and refine problem statements.
Identify the four critical parameters (Who, What, Where, Why) of the problem using the 4Ws Problem Canvas.
2. Data Acquisition
*Goal*: Collect and prepare data for analysis.
Activities: Source data from APIs, surveys, web scraping, sensors, and cameras; clean and preprocess data to ensure quality.
Only public data should be acquired, i.e. data available on open-source websites or government portals. The General Data
Protection Regulation (GDPR) must not be violated, as doing so is a punishable offence. Examples of such websites: data.gov.in,
india.gov.in, etc.
3. Data Exploration
*Goal*: Explore and visualize data to uncover patterns and insights.
*Activities*: Perform statistical analysis, create visualizations, and identify key variables.
Different types of graphs can be used: line graph, bar graph, pie chart, etc.
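As a sketch of the statistical-analysis step, Python's standard `statistics` module can summarize a column of values; the exam marks below are made up purely for illustration:

```python
import statistics

# Hypothetical column of exam marks collected during data acquisition
marks = [35, 42, 42, 58, 61, 70, 70, 70, 88, 95]

print("mean:  ", statistics.mean(marks))    # average value -> 63.1
print("median:", statistics.median(marks))  # middle value  -> 65.5
print("mode:  ", statistics.mode(marks))    # most frequent value -> 70
print("stdev: ", round(statistics.stdev(marks), 2))  # spread of the data
```

These summaries are usually the first step before plotting the same data as a line graph, bar graph, or pie chart.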
4. Data Modelling
*Goal*: Develop and train machine learning models.
*Activities*: Implement algorithms, optimize hyperparameters, and compare model performance.
A. RULE BASED APPROACH
Refers to AI modelling where the rules are defined by the developer. The machine follows the rules or instructions
given by the developer and performs its task accordingly.
Example:
Step 1: Train system with training data fed into the system.
[Dataset containing 1000 images of onions and carrots with labels]
Step 2: Feed a testing data [Say one image of onion]
Step 3: Compare training data with testing data as per rules [Compare image of onion with all
others]
Step 4: Identify the correct output [Determine that it is an onion]
Advantage:
The algorithms are simple and easy to implement. The amount of data required is limited, so training the machine
is easy.
Limitation:
This learning is static. The machine, once trained, does not take into account any changes made to the
original training dataset.
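The four steps above can be sketched as a tiny rule-based matcher in Python; the feature tuples and labels are invented purely for illustration:

```python
# Step 1: "training" data -- labelled examples stored by the developer.
# Each image is reduced to a hypothetical feature tuple (colour, shape).
training_data = [
    (("purple", "round"), "onion"),
    (("orange", "long"),  "carrot"),
]

def classify(features):
    """Steps 2-4: compare the test item against every stored example
    using the developer's fixed rule (exact feature match)."""
    for known_features, label in training_data:
        if features == known_features:
            return label
    return "unknown"  # static: anything outside the rules fails

print(classify(("purple", "round")))  # -> onion
print(classify(("green", "round")))   # -> unknown (the rules never adapt)
```

The second call illustrates the stated limitation: a rule-based system cannot handle inputs its rules never anticipated.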
B. LEARNING BASED APPROACH
In a learning-based AI model, the machine gets trained on the data fed to it and can then design a model that adapts to
changes in the data. Example implementation: classification of images, used in Computer Vision.
Advantage:
It is a dynamic model. If the model is trained on one type of data and the machine designs the algorithm around
it, the model will adjust itself according to changes in the data to handle exceptions.
Disadvantage:
A huge amount of quality data is required for training the machine, along with large storage and efficient
algorithms. It is expensive and time-consuming to implement.
Example:
Step 1: Random data is fed into the system. [10,000 images of people in a city]
Step 2: Machine analyses data. [to identify sick and healthy people]
Step 3: System tries to extract similar features. Algorithm needs to derive relationship. [Identify facial
expressions and emotions]
Step 4: Cluster same data together. [Form group with identical facial expressions]
Step 5: Output is the broad trends observed in the data set. [Identify whether a
given picture belongs to a sick or healthy person]
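The five steps can be mimicked with a toy Python sketch that groups records by a shared extracted feature; the "facial expression" values are invented stand-ins for features a real model would learn on its own:

```python
from collections import defaultdict

# Step 1: random, unlabelled data (here: person id, extracted expression)
people = [(1, "smiling"), (2, "frowning"), (3, "smiling"),
          (4, "frowning"), (5, "smiling")]

# Steps 2-4: analyse the data, extract the shared feature,
# and cluster identical features together
clusters = defaultdict(list)
for person_id, expression in people:
    clusters[expression].append(person_id)

# Step 5: the broad trend is simply which group each record fell into
print(dict(clusters))  # -> {'smiling': [1, 3, 5], 'frowning': [2, 4]}
```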
Supervised Learning
In a supervised learning model, the dataset which is fed to the machine is labelled.
A label is some information which can be used as a tag for data.
i. Regression:
The algorithm generates a mapping function from the given data, represented by a line, which helps to predict or
forecast future data. Regression works with continuous data.
e.g. - Prediction of marks in the next exam based on historical data.
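A minimal sketch of the marks-prediction example, fitting a least-squares line in plain Python (all numbers invented):

```python
# Past exams (x = exam number, y = marks scored)
xs = [1, 2, 3, 4]
ys = [50, 55, 65, 70]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept of the mapping line y = m*x + c
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
c = mean_y - m * mean_x

# Forecast marks in the next (5th) exam from the fitted line
print(m * 5 + c)  # -> 77.5
```

Because the output is a number on a continuous scale (marks), this is regression rather than classification.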
ii. Classification:
The algorithm classifies the data according to the labels and sorts it accordingly. It works on discrete data
sets.
e.g. - Classify images of men and women, where numerous images of men and women in different structures and
formats are fed as training data.
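As a hedged sketch of classification, a nearest-neighbour rule over labelled feature vectors; the two-number "image features" (height, hair length) are invented simplifications of what a real vision model would extract:

```python
import math

# Labelled training data: hypothetical (height_cm, hair_length_cm) features
training = [((175, 5), "man"), ((180, 3), "man"),
            ((160, 30), "woman"), ((165, 25), "woman")]

def classify(features):
    """Assign the label of the closest labelled training example."""
    return min(training,
               key=lambda item: math.dist(item[0], features))[1]

print(classify((178, 4)))   # -> man
print(classify((162, 28)))  # -> woman
```

The output is one of a discrete set of labels, which is what distinguishes classification from regression.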
Unsupervised Learning
An unsupervised learning model works on unlabelled dataset. Data fed to the machine is random. The
unsupervised learning models are used to identify relationships, patterns, and trends out of the training data. It
helps the user in understanding what the data is about and what are the major features identified by the machine
in it.
Example
Random data of 1000 dog images is fed into the system, and patterns can be found in it, such as colour,
size of dogs, etc.
i. Clustering:
This unsupervised learning algorithm clusters unknown data according to the patterns or trends identified
in it. Clustering works on random, unlabelled, and discrete data sets. The patterns observed might be ones already
known to the developer, or the algorithm might come up with some unique patterns of its own.
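A minimal k-means sketch in pure Python, clustering one-dimensional values (hypothetical dog shoulder heights) into two groups without any labels:

```python
# Unlabelled 1-D data, e.g. hypothetical dog shoulder heights in cm
data = [20, 22, 25, 60, 62, 65]

# Start with two guessed cluster centres
centres = [20.0, 60.0]
for _ in range(10):                      # repeat the assign/update steps
    groups = [[], []]
    for x in data:                       # assign each point to nearest centre
        nearest = min(range(2), key=lambda i: abs(x - centres[i]))
        groups[nearest].append(x)
    centres = [sum(g) / len(g) for g in groups]  # move centres to group means

print(groups)   # -> [[20, 22, 25], [60, 62, 65]]  (small vs large dogs)
print(centres)
```

The algorithm was never told which dogs are "small" or "large"; the two groups emerge from the data itself, which is the essence of unsupervised clustering.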
ii. Dimensionality Reduction:
Human beings can visualise up to three dimensions only, but according to many theories and algorithms,
various entities exist beyond three dimensions. A dimensionality reduction algorithm reduces the number of
dimensions while still making sense of the data.
Some information gets distorted or lost whenever dimensions are reduced, so there is a trade-off between
simplicity and fidelity.
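One very simple form of dimensionality reduction (a sketch, not full PCA) is dropping the axis that varies least, since it carries the least information; the 3-D points below are invented:

```python
import statistics

# Invented 3-D points whose third coordinate barely varies
points = [(1, 10, 5.0), (2, 20, 5.1), (3, 30, 4.9), (4, 40, 5.0)]

# Variance of each of the three dimensions
variances = [statistics.pvariance([p[i] for p in points]) for i in range(3)]

# Drop the least informative (lowest-variance) dimension: 3-D -> 2-D
drop = variances.index(min(variances))
reduced = [tuple(v for i, v in enumerate(p) if i != drop) for p in points]

print(reduced)  # -> [(1, 10), (2, 20), (3, 30), (4, 40)]
```

The third coordinate is discarded with little loss here because it was nearly constant; real techniques such as PCA generalize this idea to combinations of axes.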
| Supervised Learning | Unsupervised Learning |
| --- | --- |
| Uses known and labelled data as input | Uses unknown, unlabelled data as input |
| Less computational complexity | More computational complexity |
| Uses off-line analysis of data | Uses real-time analysis of data |
| Accurate and reliable results | Moderately accurate and reliable results |
| Training data and testing data are given | Training data and testing data are not given |
| Not possible to learn larger and more complex models | Possible to learn larger and more complex models |
| Can test the model | Cannot test the model |
5. Evaluation
*Goal*: Assess the model's performance using evaluation metrics.
*Activities*: Calculate accuracy, precision, recall, and F1 scores; apply cross-validation techniques.
Accuracy: Measures the proportion of true results (both true positives and true negatives) among the total number of cases.
Precision: Measures the proportion of true positive predictions among the total positive predictions.
Recall: Measures the proportion of true positive predictions among the total actual positives.
F1 Score: The harmonic mean of precision and recall, providing a single metric that balances both.
Consider a model that predicts whether a forest fire has broken out:
True Positive: A forest fire has broken out (Reality: Yes) and the model also predicts Yes. The prediction matches the reality.
True Negative: There is no fire in the forest (Reality: No) and the model correctly predicts No.
False Positive: In reality there is no forest fire, but the model incorrectly predicts that there is one.
False Negative: A forest fire has broken out (Reality: Yes), but the model incorrectly predicts No, i.e. that there is no forest fire.
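The four outcome counts plug directly into the metric definitions above; a sketch with made-up forest-fire counts:

```python
# Hypothetical confusion-matrix counts for the forest-fire model
TP, TN, FP, FN = 60, 25, 5, 10  # true/false positives and negatives

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # correct among all cases
precision = TP / (TP + FP)                   # correct among predicted fires
recall    = TP / (TP + FN)                   # caught among actual fires
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(accuracy, 2),   # -> 0.85
      round(precision, 2),  # -> 0.92
      round(recall, 2),     # -> 0.86
      round(f1, 2))         # -> 0.89
```

For a forest-fire model, recall matters most: a false negative (a missed fire) is far costlier than a false positive.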
6. Deployment
*Goal*: Deploy the model into a production environment.
*Activities*: Develop APIs, use cloud platforms, and set up CI/CD pipelines for continuous
deployment.