Final Report
DEPARTMENT OF
COMPUTER SCIENCE AND ENGINEERING
A SEMINAR REPORT ON
SEMINAR BY
SREENIVASAN KN
[4GM20CS106]
GUIDE
Dr. Shankarayya Shastri
Assistant Professor
CERTIFICATE
GUIDE COORDINATOR
------------------------------ -------------------------------
Dr. Shankarayya Shastri Dr. Rachana P G
------------------------------
Dr. B N Veerappa
ABSTRACT
Design defects affect project quality and hinder development and maintenance.
Consequently, experts need to minimize these defects in software systems. A promising
approach is to apply the concepts of refactoring at a higher level of abstraction, based on
UML diagrams instead of source code. Unfortunately, many defects in the literature are
described only textually, and there is no consensus on how to decide whether a particular
design violates model quality. Defects can be quantified as metrics-based rules that
combine software metrics; however, it is difficult to find the best threshold values for these
metrics manually. In this paper, I propose a new approach to identify design defects at the
model level using the ID3 decision tree algorithm, building one decision tree for each
defect. I experimented with this approach on four design defects, the Blob, Data Class,
Lazy Class and Feature Envy, using 15 object-oriented metrics. The rules generated by the
decision trees give very promising detection results on the four open-source projects tested
in this paper. In the Lucene 1.4 project, the precision is 67% at a recall of 100%. In general,
the accuracy varies from 49% up to 80%, the latter for the Lucene 1.4 project.
ACKNOWLEDGEMENT
First and foremost, I take this opportunity to express my deep sense of gratitude
to the Principal, Dr. Sanjay Pande M B, for his guidance and encouragement throughout
the Technical Seminar.
I take this opportunity to express my deep sense of gratitude to GMIT for providing
me an opportunity to carry out the Technical Seminar. I would also like to thank all the
teaching and non-teaching staff of the Dept. of CS&E for their kind co-operation during
the Technical Seminar. The support provided by the College and Departmental library is
gratefully acknowledged.
Finally, I’m thankful to my parents and friends, who helped me in one way or the
other throughout this Technical Seminar.
SREENIVASAN KN
[4GM20CS106]
LIST OF FIGURES
LIST OF TABLES
5.1 The number of detections for each project and each defect
6.1 F1 score
CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
LIST OF TABLES
CHAPTER 1
INTRODUCTION
CHAPTER 2
LITERATURE SURVEY
CHAPTER 3
ALGORITHM
CHAPTER 4
DATASET USED
CHAPTER 5
EXPERIMENTAL RESULT
CHAPTER 6
APPLICATIONS
CHAPTER 7
CONCLUSION
REFERENCES
CHAPTER 1
INTRODUCTION
Decision tree-based Design Defects Detection is a data-driven approach used to
identify and diagnose design defects in software systems. The method involves creating a
decision tree model that represents the relationships between various factors, such as design
components, defect types, and defect severities. By analyzing the data collected from the
software development process, the model can help in detecting design defects at an early
stage and improving the overall quality of the software.
1.2 Objectives
• Early Detection: Identify design defects as early as possible in the software
development process to minimize their impact on the project timeline and budget.
• Accurate Diagnosis: Develop a decision tree model that accurately predicts the
likelihood of design defects based on various factors, such as design components,
defect types, and defect severities.
1.3 Advantages
• Data-Driven Insights: The method relies on a decision tree model that is
trained on data collected from the software development process, providing
accurate and reliable insights into the likelihood of design defects.
• Scalability: Decision tree-based design defects detection can be applied to
various software systems, regardless of their size or complexity, making it
a versatile tool for identifying defects.
• Early Warning System: The model can predict the likelihood of design
defects based on input variables, serving as an early warning system for
developers to address issues before they become critical.
• Customizable: The decision tree model can be tailored to suit specific
software systems or industries, making it a flexible solution for design
defect detection.
• Cost-Effective: Implementing decision tree-based design defects detection
can lead to cost savings by identifying and resolving design defects early in
the development process, reducing the need for costly rework and
maintenance.
• Continuous Improvement: By continuously monitoring the software system
and detecting new design defects, the method promotes a culture of
continuous improvement and ensures that high-quality software is
maintained throughout its lifecycle.
1.4 Disadvantages
• High Variance: Decision trees are sensitive to small variations in the training
data, leading to high variance in the model's predictions. This can result in
instability and inconsistency in defect detection performance across
different datasets or sampling variations.
• Bias towards Features with Many Levels: Decision trees tend to favor
features with many levels or categories, potentially overlooking other
relevant but less complex features.
CHAPTER 2
LITERATURE SURVEY
2. Title of the paper: An empirical study to investigate different SMOTE data sampling
techniques for improving software refactoring prediction
Author and Year: R. Panigrahi, L. Kumar, and S. Kuanar, in Proc. ICONIP, 2020, pp. 23–31.
Description: This empirical study explores the efficacy of various Synthetic Minority
Over-sampling Technique (SMOTE) data sampling methods in enhancing the accuracy of
software refactoring prediction. By employing SMOTE, the research aims to address class
imbalance issues commonly encountered in software datasets. Through experimentation
and analysis, the study investigates how different SMOTE techniques impact the predictive
performance of software refactoring models, providing insights into optimizing data
preprocessing strategies for better prediction outcomes.
3. Title of the paper: Classification of model refactoring approaches
Author and Year: M. Mohamed, R. Mohamed, and G. Khaled, J. Object Technol., vol. 8,
no. 6, pp. 121–126, 2009.
Description: This paper surveys existing model refactoring approaches and classifies them
along common criteria so that the different approaches can be compared.
4. Title of the paper: Generic and domain-specific model refactoring using a model
transformation engine
Author and Year: J. Zhang, Y. Lin, and J. Gray, in Model-Driven Softw. Develop.
Berlin, Germany: Springer, 2005.
Description: This work presents an approach to model refactoring built on a model
transformation engine, supporting both generic refactorings that apply across modeling
languages and domain-specific refactorings tailored to a particular domain.
5. Title of the paper: Detection strategies: Metrics-based rules for detecting design flaws
Author and Year: R. Marinescu, in Proc. 20th IEEE Int. Conf. Softw. Maintenance, Sep.
2004, pp. 350–359.
Description: This paper introduces detection strategies, metrics-based rules that combine
software metrics with threshold values to locate design flaws directly in object-oriented
code.
6. Title of the paper: Defects detection technique of use case views during requirements
engineering
Author and Year: P. Tianual and A. Pohthong, in Proc. 8th Int. Conf. Softw.
Comput. Appl., Feb. 2019, pp. 277–281.
Description: The "Defects detection technique of use case views during requirements
engineering" refers to systematic approaches aimed at identifying and resolving flaws
within use case diagrams and scenarios, crucial components of requirements engineering.
7. Title of the paper: Comparing and experimenting machine learning techniques for
code smell detection
Author and Year: F. Arcelli Fontana, M. V. Mantyla, M. Zanoni, and A. Marino,
Empirical Softw. Eng., vol. 21, no. 3, pp. 1143–1191, Jun. 2016.
Description: This large-scale empirical study compares and experiments with a wide range
of machine learning classifiers for code smell detection, evaluating their performance on
smells such as Data Class and Feature Envy and showing that, given suitable training data,
high detection accuracy can be achieved.
CHAPTER 3
ALGORITHM
A decision tree is a graphical representation of potential solutions to a decision,
based on conditional control statements, and is commonly used in supervised machine
learning. It aims to classify data features into homogeneous groups, with each branch
representing a possible decision. The ID3 algorithm is often employed to construct such
trees, where each non-leaf node corresponds to an input metric, and each arc represents a
possible value of that metric. The algorithm starts by selecting the most informative metrics
and evaluates metric information using Shannon Entropy. Once the tree is built, designers
can filter the extracted rules based on a heuristic value N, which determines the minimal
number of metrics in a detection rule. This value balances over-detection against
under-detection of defects: small values lead to more false positives, while high values
lead to more false negatives.
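As an illustration, the ID3 construction described above can be sketched in Python. This is a generic textbook implementation operating on already-discretized metric values; the function names and data layout are illustrative, not taken from the authors' tool:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels ('Yes'/'No')."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_metric(rows, labels, metrics):
    """Pick the metric whose split yields the lowest weighted entropy."""
    def weighted_entropy(metric):
        score = 0.0
        for value in set(row[metric] for row in rows):
            subset = [l for row, l in zip(rows, labels) if row[metric] == value]
            score += len(subset) / len(labels) * entropy(subset)
        return score
    return min(metrics, key=weighted_entropy)

def build_tree(rows, labels, metrics):
    """Recursively build an ID3 tree; leaves are class labels."""
    if len(set(labels)) == 1:
        return labels[0]                              # pure node
    if not metrics:
        return Counter(labels).most_common(1)[0][0]   # majority vote
    metric = best_metric(rows, labels, metrics)
    remaining = [m for m in metrics if m != metric]
    tree = {metric: {}}
    for value in set(row[metric] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[metric] == value]
        tree[metric][value] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx], remaining)
    return tree
```

Each path from the root of the returned tree to a leaf labeled "Yes" corresponds to one candidate detection rule.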
3.2 Choosing the Best Cut Point
To choose the best cut point, we first discretize the metrics threshold in the set of
examples. The discretization consists of transforming values into a finite number of
intervals. After that we re-encode, each value for the selected attribute by associating it
with its corresponding interval. It is a powerful heuristic to classify a set of training
examples using the best decision tree. It is a good method to determine the most relevant
attributes for the classification task. Each metric value is compared to the cut point. The
idea is to transform the continuous interval into two intervals according to the cut point.
We illustrate how we adapted the set of examples based on Table 3.1, which lists the
observations in the set of examples for the defect ‘‘Data Class’’ in different projects.
In this example, we have two projects, P1 and P2. Let’s consider the first metric, Access
To Foreigner Data (ATFD). Metric thresholds must be classified into two classes
depending on the existence of a defect; that is, we have to make a supervised discretization.
In this work, we adopted a bottom-up hierarchical clustering: each item is placed in its own
cluster, and the next partition is created by merging the two nearest clusters.
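The bottom-up clustering step can be sketched as follows. This sketch assumes one-dimensional metric values and centroid distance between clusters; the report does not fix the distance measure, so both are assumptions of the sketch:

```python
def agglomerate(values, n_clusters):
    """Bottom-up hierarchical clustering of 1-D metric values:
    start with singleton clusters, then repeatedly merge the two
    nearest clusters until n_clusters remain."""
    clusters = [[v] for v in sorted(values)]

    def centroid(cluster):
        return sum(cluster) / len(cluster)

    while len(clusters) > n_clusters:
        # on sorted 1-D data the nearest pair is always adjacent
        i = min(range(len(clusters) - 1),
                key=lambda k: centroid(clusters[k + 1]) - centroid(clusters[k]))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters
```

For example, `agglomerate([1, 2, 10, 11], 2)` groups the two low and the two high metric values into separate clusters.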
In Figure 3.2.1, we give a cut point example. Each continuous interval is divided
into three parts: the first quartile, the median and the third quartile. We choose the cut
point giving the best degree of association. We use the phi coefficient defined in (1) to
assess the degree of association between the two variables.
∅ = √(χ² / N)    (1)
where χ² is derived from Pearson’s chi-squared test and N is the total number of
observations.
As presented in Figure 3.2.2, ∅2 is the best value and the best cut point is 16.5. In Table
3.2.1, we present the final discretization of the observations related to the defect ‘‘Data
Class’’.
We evaluate the phi coefficient for all metrics. Therefore, the node ATFD will have two
arcs, one with values <= 16.5 and the other with values > 16.5.
3.3 Building the Decision Tree
At each node, the ID3 algorithm selects the metric that provides the most clarity in
distinguishing between classes. Each branch of the tree emanates from the most
informative metric, with leaf nodes representing decisions based on computed attributes.
The path from root to leaf embodies the design defect detection rules, with Shannon
entropy serving as a guiding metric in optimizing information gain:
E(BE) = − Σ (c ∈ Cl) p(c) log2 p(c)    (2)
where BE is the set of examples, Cl is the set of classes in BE (Defect, Not Defect), and
p(c) is the proportion of elements of BE that belong to class c.
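The weighted entropy E(Yes/metric) used below can be sketched as follows; the interval labels and the Yes/No encoding are illustrative:

```python
import math

def h(p_yes, p_no):
    """Binary Shannon entropy of a class distribution, equation (2)."""
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)

def conditional_entropy(pairs):
    """Weighted entropy of the class labels after splitting on a metric.
    `pairs` maps each interval (e.g. '<=16.5', '>16.5') to the list of
    Yes/No labels falling in it; lower values mean a more informative
    metric, so ID3 picks the metric minimizing this quantity."""
    total = sum(len(labels) for labels in pairs.values())
    result = 0.0
    for labels in pairs.values():
        p_yes = labels.count('Yes') / len(labels)
        result += len(labels) / total * h(p_yes, 1 - p_yes)
    return result
```

The example values 0.11, 0.27, 0.22 and 0.25 quoted below come from the observations of Table 3.2, which are not reproduced here.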
In Figure 3.4, we present the decision tree for the example presented in Table 3.2.
We measure the Shannon entropy for each metric.
E(Yes/ATFD) = − P(>16.5) × (P(Yes/>16.5) × log P(Yes/>16.5) + P(No/>16.5) × log
P(No/>16.5)) - P(<=16.5) × (P(Yes/<=16.5) × log P(Yes/<=16.5) + P(No/<=16.5) × log
P(No/<=16.5)) = 0.11
E(Yes/NOM) = 0.27
E(Yes/NOA) = 0.22
E(Yes/NC) = 0.25
The ID3 algorithm selects ATFD as the root metric due to its lowest entropy. Values
of ATFD <= 16.5 lead to a leaf node labeled "Yes," while for ATFD > 16.5, entropy is
re-evaluated for the remaining metrics, with classification stopping at the NC node. In this
example, three rules are extracted:
R1: IF ATFD <= 16.5 THEN Data class = Yes
R2: IF ATFD > 16.5 AND NOA > 11.25 THEN Data class = Yes
R3: IF ATFD > 16.5 AND NOA <= 11.25 AND NC > 251 THEN Data class = Yes.
It is clear that R1 and R2 combine only a few metrics and will generate a
huge number of suspect classes. Furthermore, if the number of metrics (N) is fixed to 3,
then we can extract only the rule R3, which seems to be the most appropriate rule for this
illustrative example. The final tree is shown in Figure 3.3.1.
FIGURE 3.3.1 Final decision tree.
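The three extracted rules can be read as a simple predicate, with the heuristic N keeping only rules that combine at least N metrics; the function name and argument order are illustrative:

```python
def is_data_class(atfd, noa, nc, n=3):
    """Apply the extracted 'Data Class' detection rules R1-R3.
    Each rule is paired with the number of metrics it combines;
    with the default n=3, only R3 is kept, as in the example above."""
    rules = [
        (1, atfd <= 16.5),                               # R1: one metric
        (2, atfd > 16.5 and noa > 11.25),                # R2: two metrics
        (3, atfd > 16.5 and noa <= 11.25 and nc > 251),  # R3: three metrics
    ]
    return any(fired for size, fired in rules if size >= n)
```

Lowering `n` to 1 re-enables R1 and R2, which is exactly how small N values produce more suspects (and more false positives).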
3.4 Validation
The experiments assess the performance of the ID3 algorithm on detecting Lazy
Class (LC), Blob, Data Class (DC), and Feature Envy (FE) defects using precision and
recall metrics. Precision measures the correctness of defect identification, while recall
evaluates the completeness of defect detection, specifically highlighting the number of true
defects missed by the algorithm.
Precision = |Detected defects ∩ Expected defects| / |Detected defects|    (3)
Recall = |Detected defects ∩ Expected defects| / |Expected defects|    (4)
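Both measures can be sketched over sets of flagged model elements:

```python
def precision_recall(detected, expected):
    """Precision and recall over sets of flagged model elements,
    following equations (3) and (4)."""
    true_positives = detected & expected
    precision = len(true_positives) / len(detected) if detected else 0.0
    recall = len(true_positives) / len(expected) if expected else 0.0
    return precision, recall
```

For example, detecting {A, B, C} when the expected defects are {A, B, D} gives a precision and recall of 2/3 each.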
Static metrics are evaluated using class diagrams, while behavioral metrics
are assessed using sequence diagrams, generating a set of suspect elements in the model
based on detection rules. The hypothesis underlying the work is that analysis relies on
complete and final design, with the quality of rule detection contingent upon the
completeness of sequence diagram models. Validation is conducted on reverse-engineered
designs of five open-source Java systems—Xerces v2.7, Argo UML 0.19.8, Lucene 1.4,
Log4j 1.2.1, and Gantt Project v1.10.2—chosen based on project size and open-source
availability. PMD 5.4.3 and Nutch 1.12 projects are utilized for creating the set of
examples, with defects manually entered for detection. Validation iterates twice, initially
with only the PMD project and subsequently incorporating the Nutch project to increase
the number of defects, enabling evaluation of the impact on detection quality.
The choice of the N value significantly impacts both recall and precision, with a
trade-off between the two metrics. Minimizing N tends to increase recall but decrease
precision, while maximizing it has the opposite effect, influenced by project size.
Optimization of N seeks to strike a balance between precision and recall, with experimental
findings suggesting a range between four and ten as optimal, depending on project
characteristics. Our experiments on three projects, with varying class counts and defect
numbers, indicate N thresholds of 4, 5, and 7 for optimal trade-offs. Although determining
the best N value is challenging, our approach provides acceptable results, with a suggested
range of 4 to 9, approximately one-third of the total number of metrics. Designers may
explore other N values if results are unsatisfactory, with computations performed
efficiently on a desktop computer within 30 seconds, excluding reverse-engineered designs.
Figure 3.4.1 provides the variation of recall and precision depending on N.
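The search for a good N can be sketched as a scan over the suggested range, scoring each value by the F1 trade-off between precision and recall; the `evaluate` callback, standing in for a full detection run with a given N, is an assumption of this sketch:

```python
def pick_n(evaluate, candidates=range(4, 10)):
    """Scan heuristic values of N (the minimal number of metrics per
    detection rule) over the suggested range 4..9 and keep the one with
    the best F1 trade-off. `evaluate(n)` is assumed to run the detection
    with that N and return a (precision, recall) pair."""
    def f1(n):
        p, r = evaluate(n)
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(candidates, key=f1)
```

Because small N favors recall and large N favors precision, maximizing F1 captures the balance described above.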
CHAPTER 4
DATASET USED
The set of examples contains two projects, P1 and P2. Let’s consider the first metric,
Access To Foreigner Data (ATFD). Metric thresholds must be classified into two classes
depending on the existence of a defect; that is, we have to make a supervised discretization.
In this work, we adopted a bottom-up hierarchical clustering: as shown in Table 4.1, each
item is placed in its own cluster, and the next partition is created by merging the two
nearest clusters.
CHAPTER 5
EXPERIMENTAL RESULT
Figure 5.1 reports the precision and recall values for the three executions using
different values of N (4, 5 and 9). The best precision values are 62%, 59%, 74%, 100%
and 100% for Xerces v2.7, ArgoUML 0.19.8, Lucene 1.4, Log4j 1.2.1 and GanttProject
v1.10.2, respectively. The best recall value is 100% for all projects.
Table 5.1 shows the number of detections for each project and each defect. Even
though the detection rate is under 50% for ArgoUML 0.19.8, this is due to the low
detection rate for the FE defect (0.38). For the defects LC, Blob and DC, the recall is higher
than 50%, at 0.51, 0.60 and 0.73 respectively. We can conclude that the proposed
approach is able to detect the majority of defects.
TABLE 5.1 The number of detections for each project and each defect.
CHAPTER 6
DISCUSSION ON RESULT
Figure 6.1 shows the variation of the recall and precision according to the size of the
set of examples. Results show that the quality of the detection increases with the number
of defect examples. In this work, we limit the set of examples to two open-source projects,
which give results varying from excellent to satisfying. However, finding the optimal size
of the set of examples needs further investigation, which will be discussed in future
research.
TABLE 6.1 F1 score
Project F1 Score N
Xerces v2.7 49% 9
ArgoUML 0.19.8 52% 9
Lucene 1.4 80% 4
Log4j 1.2.1 67% 4
Ganttproject v1.10.2 64% 5
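The F1 scores above are the harmonic mean of precision and recall; for instance, Lucene 1.4’s reported precision of 67% at a recall of 100% gives:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Lucene 1.4: precision 67% at recall 100% (from the abstract)
print(round(f1_score(0.67, 1.0), 2))  # → 0.8
```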
APPLICATIONS
CHAPTER 7
CONCLUSION
This report presented a new approach for design defect detection at the model level.
This work leads toward a standard way of quantifying model quality. We introduced an
adaptation of the ID3 decision tree algorithm to identify anti-patterns and bad smells in
object-oriented design. We tested and evaluated our approach on five open-source projects
by measuring precision and recall, and showed that the efficiency and precision of the
detection vary from satisfying to excellent, with a recall that reaches 100 percent. Using
our approach, the majority of design anomalies can be detected at the model level by
analyzing the class and sequence diagrams.
As future work, we plan to eliminate these defects to avoid their propagation to the
code. We also plan to extend the detection to other defects. Finally, we plan to implement
a model-refactoring framework integrating the detection and correction approaches.
REFERENCES
[1] M. Misbhauddin and M. Alshayeb, ‘‘UML model refactoring: A systematic literature
review,’’ Empirical Softw. Eng., vol. 20, no. 1, pp. 206–251, Feb. 2015.
[2] R. Panigrahi, L. Kumar, and S. Kuanar, ‘‘An empirical study to investigate different
SMOTE data sampling techniques for improving software refactoring prediction,’’ in Proc.
ICONIP, 2020, pp. 23–31.
[5] J. Zhang, Y. Lin, and J. Gray, ‘‘Generic and domain-specific model refactoring using
a model transformation engine,’’ in Model-Driven Softw. Develop. Berlin, Germany:
Springer, 2005.
[6] S. Freire, A. Passos, M. Mendonca, C. Sant’Anna, and R. O. Spinola, ‘‘On the influence
of UML class diagrams refactoring on code debt: A family of replicated empirical studies,’’
in Proc. 46th Euromicro Conf. Softw. Eng. Adv. Appl. (SEAA), Aug. 2020, pp. 346–353.
[8] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts, Refactoring: Improving the
Design of Existing Code. Reading, MA, USA: Addison-Wesley, 1999.
[9] M. Zhang, T. Hall, and N. Baddoo, ‘‘Code bad smells: A review of current knowledge,’’
J. Softw. Maintenance Evol., Res. Pract., vol. 23, no. 3, pp. 179–202, Oct. 2010.
[10] H. Mumtaz, M. Alshayeb, S. Mahmood, and M. Niazi, ‘‘A survey on UML model
smells detection techniques for software refactoring,’’ J. Softw., Evol. Process, vol. 31, no.
3, Mar. 2019, Art. no. e2154.
[12] R. Marinescu, ‘‘Detection strategies: Metrics-based rules for detecting design flaws,’’
in Proc. 20th IEEE Int. Conf. Softw. Maintenance, Sep. 2004, pp. 350–359.
[13] M. Alzahrani, ‘‘Measuring class cohesion based on client similarities between method
pairs: An improved approach that supports refactoring,’’ IEEE Access, vol. 8, pp. 227901–
227914, 2020.
[14] S. Mäkelä and V. Leppänen, ‘‘Client-based cohesion metrics for java programs,’’ Sci.
Comput. Program., vol. 74, nos. 5–6, pp. 355–378, Mar. 2009.
[17] P. Tianual and A. Pohthong, ‘‘Defects detection technique of use case views during
requirements engineering,’’ in Proc. 8th Int. Conf. Softw. Comput. Appl., Feb. 2019, pp.
277–281.
[19] N. Moha, Y.-G. Gueheneuc, L. Duchien, and A.-F. Le Meur, ‘‘DECOR: A method
for the specification and detection of code and design smells,’’ IEEE Trans. Softw. Eng.,
vol. 36, no. 1, pp. 20–36, Jan. 2010.