Final Report
DEPARTMENT OF
COMPUTER SCIENCE AND ENGINEERING
A SEMINAR REPORT ON
SEMINAR BY
SREENIVASAN KN
[4GM20CS106]
GUIDE
Dr. Shankarayya Shastri
Assistant Professor
CERTIFICATE
GUIDE COORDINATOR
------------------------------ -------------------------------
Dr. Shankarayya Shastri Dr. Rachana P G
------------------------------
Dr. B N Veerappa
ABSTRACT
Design defects affect project quality and hinder development and maintenance.
Consequently, experts need to minimize these defects in software systems. A promising
approach is to apply the concepts of refactoring at a higher level of abstraction, based on
UML diagrams instead of source code. Unfortunately, many defects in the literature are
described only textually, and there is no consensus on how to decide whether a particular
design violates model quality. Defects can be quantified as metrics-based rules that
combine software metrics; however, it is difficult to find the best threshold values for these
metrics manually. In this paper, I propose a new approach to identify design defects at the
model level using the ID3 decision tree algorithm, building one decision tree for each
defect. I experimented with this approach on four design defects, the Blob, Data Class,
Lazy Class and Feature Envy, using 15 object-oriented metrics. The rules generated by the
decision trees give very promising detection results on the four open-source projects tested
in this paper. In the Lucene 1.4 project, the precision is 67% at a recall of 100%. In general,
the accuracy varies from 49% up to 80%, the latter for the Lucene 1.4 project.
ACKNOWLEDGEMENT
First and foremost, I take this opportunity to express my deep sense of gratitude
to the Principal, Dr. Sanjay Pande M B, for his guidance and encouragement throughout
the Technical Seminar.
I take this opportunity to express my deep sense of gratitude to GMIT for providing
me an opportunity to carry out the Technical Seminar. I would also like to thank all the
teaching and non-teaching staff of the Dept. of CS&E for their kind co-operation during
the Technical Seminar. The support provided by the College and Departmental library is
gratefully acknowledged.
Finally, I’m thankful to my parents and friends, who helped me in one way or the
other throughout this Technical Seminar.
SREENIVASAN KN
[4GM20CS106]
LIST OF FIGURES
LIST OF TABLES
5.1 The number of detections for each project and each defect
6.1 F1 score
CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
LIST OF TABLES
CHAPTER 1
INTRODUCTION
CHAPTER 2
LITERATURE SURVEY
CHAPTER 3
ALGORITHM
CHAPTER 4
DATASET USED
CHAPTER 5
EXPERIMENTAL RESULT
CHAPTER 6
APPLICATIONS
CHAPTER 7
CONCLUSION
REFERENCES
CHAPTER 1
INTRODUCTION
Decision tree-based Design Defects Detection is a data-driven approach used to
identify and diagnose design defects in software systems. The method involves creating a
decision tree model that represents the relationships between various factors, such as design
components, defect types, and defect severities. By analyzing the data collected from the
software development process, the model can help in detecting design defects at an early
stage and improving the overall quality of the software.
1.2 Objectives
• Early Detection: Identify design defects as early as possible in the software
development process to minimize their impact on the project timeline and budget.
• Accurate Diagnosis: Develop a decision tree model that accurately predicts the
likelihood of design defects based on various factors, such as design components,
defect types, and defect severities.
1.3 Advantages
• Data-Driven Insights: The method relies on a decision tree model that is
trained on data collected from the software development process, providing
accurate and reliable insights into the likelihood of design defects.
• Scalability: Decision tree-based design defects detection can be applied to
various software systems, regardless of their size or complexity, making it
a versatile tool for identifying defects.
• Early Warning System: The model can predict the likelihood of design
defects based on input variables, serving as an early warning system for
developers to address issues before they become critical.
• Customizable: The decision tree model can be tailored to suit specific
software systems or industries, making it a flexible solution for design
defect detection.
• Cost-Effective: Implementing decision tree-based design defects detection
can lead to cost savings by identifying and resolving design defects early in
the development process, reducing the need for costly rework and
maintenance.
• Continuous Improvement: By continuously monitoring the software system
and detecting new design defects, the method promotes a culture of
continuous improvement and ensures that high-quality software is
maintained throughout its lifecycle.
1.4 Disadvantages
• High Variance: Decision trees are sensitive to small variations in the training
data, leading to high variance in the model's predictions. This can result in
instability and inconsistency in defect detection performance across
different datasets or sampling variations.
• Bias towards Features with Many Levels: Decision trees tend to favor
features with many levels or categories, potentially overlooking other
relevant but less complex features.
CHAPTER 2
LITERATURE SURVEY
2. Title of the paper: An empirical study to investigate different SMOTE data sampling
techniques for improving software refactoring prediction
Author and Year: R. Panigrahi, L. Kumar, and S. Kuanar, in Proc. ICONIP, 2020, pp. 23–31.
Description: This empirical study explores the efficacy of various Synthetic Minority
Over-sampling Technique (SMOTE) data sampling methods in enhancing the accuracy of
software refactoring prediction. By employing SMOTE, the research aims to address class
imbalance issues commonly encountered in software datasets. Through experimentation
and analysis, the study investigates how different SMOTE techniques impact the predictive
performance of software refactoring models, providing insights into optimizing data
preprocessing strategies for better prediction outcomes.
3. Title of the paper: Classification of model refactoring approaches
Author and Year: M. Mohamed, R. Mohamed, and G. Khaled, J. Object Technol., vol. 8,
no. 6, pp. 121–126, 2009.
Description: This paper surveys existing model refactoring approaches and classifies them
along common criteria so that the different approaches can be compared.
4. Title of the paper: Generic and domain-specific model refactoring using a model
transformation engine
Author and Year: J. Zhang, Y. Lin, and J. Gray, in Model-Driven Softw. Develop.
Berlin, Germany: Springer, 2005.
Description: This work presents an approach to model refactoring built on a model
transformation engine, supporting both generic refactorings that apply across modeling
languages and domain-specific refactorings tailored to a particular domain.
5. Title of the paper: Detection strategies: Metrics-based rules for detecting design flaws
Author and Year: R. Marinescu, in Proc. 20th IEEE Int. Conf. Softw. Maintenance, Sep.
2004, pp. 350–359.
Description: This paper introduces detection strategies, metrics-based rules that combine
software metrics with threshold values to locate design flaws directly in object-oriented
code.
6. Title of the paper: Defects detection technique of use case views during requirements
engineering
Author and Year: P. Tianual and A. Pohthong, in Proc. 8th Int. Conf. Softw.
Comput. Appl., Feb. 2019, pp. 277–281.
Description: The "Defects detection technique of use case views during requirements
engineering" refers to systematic approaches aimed at identifying and resolving flaws
within use case diagrams and scenarios, crucial components of requirements engineering.
7. Title of the paper: Comparing and experimenting machine learning techniques for
code smell detection
Author and Year: F. Arcelli Fontana, M. V. Mantyla, M. Zanoni, and A. Marino,
Empirical Softw. Eng., vol. 21, no. 3, pp. 1143–1191, Jun. 2016.
Description: This large-scale empirical study compares and experiments with a wide range
of machine learning classifiers for code smell detection, evaluating their performance on
smells such as Data Class and Feature Envy and showing that, given suitable training data,
high detection accuracy can be achieved.
CHAPTER 3
ALGORITHM
A decision tree is a graphical representation of potential solutions to a decision,
based on conditional control statements, and is commonly used in supervised machine
learning. It aims to classify data features into homogeneous groups, with each branch
representing a possible decision. The ID3 algorithm is often employed to construct such
trees, where each non-leaf node corresponds to an input metric, and each arc represents a
possible value of that metric. The algorithm starts by selecting the most informative metrics
and evaluates metric information using Shannon Entropy. Once the tree is built, designers
can filter the extracted rules based on a heuristic value N, which determines the minimal
number of metrics in a detection rule. This value balances over-detection against
under-detection of defects: small values lead to more false positives, while high values
lead to more false negatives.
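As an illustration, the ID3 construction described above can be sketched in Python. This is a generic textbook implementation operating on already-discretized metric values; the function names and data layout are illustrative, not taken from the authors' tool:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels ('Yes'/'No')."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_metric(rows, labels, metrics):
    """Pick the metric whose split yields the lowest weighted entropy."""
    def weighted_entropy(metric):
        score = 0.0
        for value in set(row[metric] for row in rows):
            subset = [l for row, l in zip(rows, labels) if row[metric] == value]
            score += len(subset) / len(labels) * entropy(subset)
        return score
    return min(metrics, key=weighted_entropy)

def build_tree(rows, labels, metrics):
    """Recursively build an ID3 tree; leaves are class labels."""
    if len(set(labels)) == 1:
        return labels[0]                              # pure node
    if not metrics:
        return Counter(labels).most_common(1)[0][0]   # majority vote
    metric = best_metric(rows, labels, metrics)
    remaining = [m for m in metrics if m != metric]
    tree = {metric: {}}
    for value in set(row[metric] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[metric] == value]
        tree[metric][value] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx], remaining)
    return tree
```

Each path from the root of the returned tree to a leaf labeled "Yes" corresponds to one candidate detection rule.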
3.2 Choosing the Best Cut Point
To choose the best cut point, we first discretize the metrics threshold in the set of
examples. The discretization consists of transforming values into a finite number of
intervals. After that we re-encode, each value for the selected attribute by associating it
with its corresponding interval. It is a powerful heuristic to classify a set of training
examples using the best decision tree. It is a good method to determine the most relevant
attributes for the classification task. Each metric value is compared to the cut point. The
idea is to transform the continuous interval into two intervals according to the cut point.
We illustrate how we adapted the set of examples based on Table 3.1, which lists the
observations in the set of examples for the defect ‘‘Data Class’’ in different projects.
In this example, we have two projects, P1 and P2. Let’s consider the first metric, Access
To Foreigner Data (ATFD). Metric thresholds must be classified into two classes
depending on the existence of a defect; that is, we have to make a supervised discretization.
In this work, we adopted a bottom-up hierarchical clustering: each item is placed in its own
cluster, and the next partition is created by merging the two nearest clusters.
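The bottom-up clustering step can be sketched as follows. This sketch assumes one-dimensional metric values and centroid distance between clusters; the report does not fix the distance measure, so both are assumptions of the sketch:

```python
def agglomerate(values, n_clusters):
    """Bottom-up hierarchical clustering of 1-D metric values:
    start with singleton clusters, then repeatedly merge the two
    nearest clusters until n_clusters remain."""
    clusters = [[v] for v in sorted(values)]

    def centroid(cluster):
        return sum(cluster) / len(cluster)

    while len(clusters) > n_clusters:
        # on sorted 1-D data the nearest pair is always adjacent
        i = min(range(len(clusters) - 1),
                key=lambda k: centroid(clusters[k + 1]) - centroid(clusters[k]))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters
```

For example, `agglomerate([1, 2, 10, 11], 2)` groups the two low and the two high metric values into separate clusters.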
In Figure 3.2.1, we give a cut point example. Each continuous interval is divided
into three parts: the first quartile, the median and the third quartile. We choose the cut
point giving the best degree of association. We use the phi coefficient defined in (1) to
assess the degree of association between the two variables.
∅ = √(χ² / N)    (1)
where χ² is derived from Pearson’s chi-squared test and N is the total number of
observations.
As presented in Figure 3.2.2, ∅2 is the best value and the best cut point is 16.5. In Table
3.2.1, we present the final discretization of the observations related to the defect ‘‘Data
Class’’.
We evaluate the phi coefficient for all metrics. Therefore, the node ATFD will have two
arcs, one with values <= 16.5 and the other with values > 16.5.
3.3 Building the Decision Tree
At each node, the ID3 algorithm selects the metric that provides the most clarity in
distinguishing between classes. Each branch of the tree emanates from the most
informative metric, with leaf nodes representing decisions based on computed attributes.
The path from root to leaf embodies the design defect detection rules, with Shannon
entropy serving as a guiding metric in optimizing information gain:
E(BE) = − Σ (c ∈ Cl) p(c) log2 p(c)    (2)
where BE is the set of examples, Cl is the set of classes in BE (Defect, Not Defect), and
p(c) is the proportion of elements of BE that belong to class c.
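The weighted entropy E(Yes/metric) used below can be sketched as follows; the interval labels and the Yes/No encoding are illustrative:

```python
import math

def h(p_yes, p_no):
    """Binary Shannon entropy of a class distribution, equation (2)."""
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)

def conditional_entropy(pairs):
    """Weighted entropy of the class labels after splitting on a metric.
    `pairs` maps each interval (e.g. '<=16.5', '>16.5') to the list of
    Yes/No labels falling in it; lower values mean a more informative
    metric, so ID3 picks the metric minimizing this quantity."""
    total = sum(len(labels) for labels in pairs.values())
    result = 0.0
    for labels in pairs.values():
        p_yes = labels.count('Yes') / len(labels)
        result += len(labels) / total * h(p_yes, 1 - p_yes)
    return result
```

The example values 0.11, 0.27, 0.22 and 0.25 quoted below come from the observations of Table 3.2, which are not reproduced here.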
In Figure 3.4, we present the decision tree for the example presented in Table 3.2.
We measure the Shannon entropy for each metric.
E(Yes/ATFD) = − P(>16.5) × (P(Yes/>16.5) × log P(Yes/>16.5) + P(No/>16.5) × log
P(No/>16.5)) - P(<=16.5) × (P(Yes/<=16.5) × log P(Yes/<=16.5) + P(No/<=16.5) × log
P(No/<=16.5)) = 0.11
E(Yes/NOM) = 0.27
E(Yes/NOA) = 0.22
E(Yes/NC) = 0.25
The ID3 algorithm selects ATFD as the root metric due to its lowest entropy. Values
of ATFD <= 16.5 lead to a leaf node labeled "Yes," while for ATFD > 16.5, entropy is
re-evaluated for the remaining metrics, with classification stopping at the NC node. In this
example, three rules are extracted:
R1: IF ATFD <= 16.5 THEN Data class = Yes
R2: IF ATFD > 16.5 AND NOA > 11.25 THEN Data class = Yes
R3: IF ATFD > 16.5 AND NOA <= 11.25 AND NC > 251 THEN Data class = Yes.
It is clear that R1 and R2 combine only a few metrics and will generate a
huge number of suspect classes. Furthermore, if the number of metrics (N) is fixed to 3,
then we can extract only the rule R3, which seems to be the most appropriate rule for this
illustrative example. The final tree is shown in Figure 3.3.1.
FIGURE 3.3.1 Final decision tree.
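The three extracted rules can be read as a simple predicate, with the heuristic N keeping only rules that combine at least N metrics; the function name and argument order are illustrative:

```python
def is_data_class(atfd, noa, nc, n=3):
    """Apply the extracted 'Data Class' detection rules R1-R3.
    Each rule is paired with the number of metrics it combines;
    with the default n=3, only R3 is kept, as in the example above."""
    rules = [
        (1, atfd <= 16.5),                               # R1: one metric
        (2, atfd > 16.5 and noa > 11.25),                # R2: two metrics
        (3, atfd > 16.5 and noa <= 11.25 and nc > 251),  # R3: three metrics
    ]
    return any(fired for size, fired in rules if size >= n)
```

Lowering `n` to 1 re-enables R1 and R2, which is exactly how small N values produce more suspects (and more false positives).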
3.4 Validation
The experiments assess the performance of the ID3 algorithm on detecting Lazy
Class (LC), Blob, Data Class (DC), and Feature Envy (FE) defects using precision and
recall metrics. Precision measures the correctness of defect identification, while recall
evaluates the completeness of defect detection, specifically highlighting the number of true
defects missed by the algorithm.
Precision = |Detected defects ∩ Expected defects| / |Detected defects|    (3)
Recall = |Detected defects ∩ Expected defects| / |Expected defects|    (4)
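Both measures can be sketched over sets of flagged model elements:

```python
def precision_recall(detected, expected):
    """Precision and recall over sets of flagged model elements,
    following equations (3) and (4)."""
    true_positives = detected & expected
    precision = len(true_positives) / len(detected) if detected else 0.0
    recall = len(true_positives) / len(expected) if expected else 0.0
    return precision, recall
```

For example, detecting {A, B, C} when the expected defects are {A, B, D} gives a precision and recall of 2/3 each.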
Static metrics are evaluated using class diagrams, while behavioral metrics
are assessed using sequence diagrams, generating a set of suspect elements in the model
based on detection rules. The hypothesis underlying the work is that analysis relies on
complete and final design, with the quality of rule detection contingent upon the
completeness of sequence diagram models. Validation is conducted on reverse-engineered
designs of five open-source Java systems—Xerces v2.7, Argo UML 0.19.8, Lucene 1.4,
Log4j 1.2.1, and Gantt Project v1.10.2—chosen based on project size and open-source
availability. PMD 5.4.3 and Nutch 1.12 projects are utilized for creating the set of
examples, with defects manually entered for detection. Validation iterates twice, initially
with only the PMD project and subsequently incorporating the Nutch project to increase
the number of defects, enabling evaluation of the impact on detection quality.
The choice of the N value significantly impacts both recall and precision, with a
trade-off between the two metrics. Minimizing N tends to increase recall but decrease
precision, while maximizing it has the opposite effect, influenced by project size.
Optimization of N seeks to strike a balance between precision and recall, with experimental
findings suggesting a range between four and ten as optimal, depending on project
characteristics. Our experiments on three projects, with varying class counts and defect
numbers, indicate N thresholds of 4, 5, and 7 for optimal trade-offs. Although determining
the best N value is challenging, our approach provides acceptable results, with a suggested
range of 4 to 9, approximately one-third of the total number of metrics. Designers may
explore other N values if results are unsatisfactory, with computations performed
efficiently on a desktop computer within 30 seconds, excluding reverse-engineered designs.
Figure 3.4.1 provides the variation of recall and precision depending on N.
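The search for a good N can be sketched as a scan over the suggested range, scoring each value by the F1 trade-off between precision and recall; the `evaluate` callback, standing in for a full detection run with a given N, is an assumption of this sketch:

```python
def pick_n(evaluate, candidates=range(4, 10)):
    """Scan heuristic values of N (the minimal number of metrics per
    detection rule) over the suggested range 4..9 and keep the one with
    the best F1 trade-off. `evaluate(n)` is assumed to run the detection
    with that N and return a (precision, recall) pair."""
    def f1(n):
        p, r = evaluate(n)
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(candidates, key=f1)
```

Because small N favors recall and large N favors precision, maximizing F1 captures the balance described above.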
CHAPTER 4
DATASET USED
The set of examples contains two projects, P1 and P2. Let’s consider the first metric,
Access To Foreigner Data (ATFD). Metric thresholds must be classified into two classes
depending on the existence of a defect; that is, we have to make a supervised discretization.
In this work, we adopted a bottom-up hierarchical clustering: as shown in Table 4.1, each
item is placed in its own cluster, and the next partition is created by merging the two
nearest clusters.
CHAPTER 5
EXPERIMENTAL RESULT
Figure 5.1 reports the precision and recall values for the three executions using
different values of N (4, 5 and 9). The best precision values are 62%, 59%, 74%, 100%
and 100% for Xerces v2.7, ArgoUML 0.19.8, Lucene 1.4, Log4j 1.2.1 and GanttProject
v1.10.2, respectively. The best recall value is 100% for all projects.
Table 5.1 shows the number of detections for each project and each defect. Even
though the detection rate is under 50% for ArgoUML 0.19.8, this is due to the low
detection rate for the FE defect (0.38). For the defects LC, Blob and DC, the recall is higher
than 50%, at 0.51, 0.60 and 0.73 respectively. We can conclude that the proposed
approach is able to detect the majority of defects.
TABLE 5.1 The number of detections for each project and each defect.
CHAPTER 6
DISCUSSION ON RESULT
Figure 6.1 shows the variation of the recall and precision according to the size of the
set of examples. Results show that the quality of the detection increases with the number
of defect examples. In this work, we limit the set of examples to two open-source projects,
which give results varying from excellent to satisfying. However, finding the optimal size
of the set of examples needs further investigation, which will be discussed in future
research.
TABLE 6.1 F1 score
Project F1 Score N
Xerces v2.7 49% 9
ArgoUML 0.19.8 52% 9
Lucene 1.4 80% 4
Log4j 1.2.1 67% 4
Ganttproject v1.10.2 64% 5
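The F1 scores above are the harmonic mean of precision and recall; for instance, Lucene 1.4’s reported precision of 67% at a recall of 100% gives:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Lucene 1.4: precision 67% at recall 100% (from the abstract)
print(round(f1_score(0.67, 1.0), 2))  # → 0.8
```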
APPLICATIONS
CHAPTER 7
CONCLUSION
This report presented a new approach for design defect detection at the model level.
This work leads toward a standard way of quantifying model quality. We introduced an
adaptation of the ID3 decision tree algorithm to identify anti-patterns and bad smells in
object-oriented design. We tested and evaluated our approach on five open-source projects
by measuring precision and recall, and showed that the efficiency and precision of the
detection vary from satisfying to excellent, with a recall that reaches 100 percent. Using
our approach, the majority of design anomalies can be detected at the model level by
analyzing the class and sequence diagrams.
As future work, we plan to eliminate these defects to avoid their propagation to the
code. We also plan to extend the detection to other defects. Finally, we plan to implement
a model-refactoring framework integrating the detection and correction approaches.
REFERENCES
[1] M. Misbhauddin and M. Alshayeb, ‘‘UML model refactoring: A systematic literature
review,’’ Empirical Softw. Eng., vol. 20, no. 1, pp. 206–251, Feb. 2015.
[2] R. Panigrahi, L. Kumar, and S. Kuanar, ‘‘An empirical study to investigate different
SMOTE data sampling techniques for improving software refactoring prediction,’’ in Proc.
ICONIP, 2020, pp. 23–31.
[5] J. Zhang, Y. Lin, and J. Gray, ‘‘Generic and domain-specific model refactoring using
a model transformation engine,’’ in Model-Driven Softw. Develop. Berlin, Germany:
Springer, 2005.
[6] S. Freire, A. Passos, M. Mendonca, C. Sant’Anna, and R. O. Spinola, ‘‘On the influence
of UML class diagrams refactoring on code debt: A family of replicated empirical studies,’’
in Proc. 46th Euromicro Conf. Softw. Eng. Adv. Appl. (SEAA), Aug. 2020, pp. 346–353.
[8] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts, Refactoring: Improving the
Design of Existing Code. Reading, MA, USA: Addison-Wesley, 1999.
[9] M. Zhang, T. Hall, and N. Baddoo, ‘‘Code bad smells: A review of current knowledge,’’
J. Softw. Maintenance Evol., Res. Pract., vol. 23, no. 3, pp. 179–202, Oct. 2010.
[10] H. Mumtaz, M. Alshayeb, S. Mahmood, and M. Niazi, ‘‘A survey on UML model
smells detection techniques for software refactoring,’’ J. Softw., Evol. Process, vol. 31, no.
3, Mar. 2019, Art. no. e2154.
[12] R. Marinescu, ‘‘Detection strategies: Metrics-based rules for detecting design flaws,’’
in Proc. 20th IEEE Int. Conf. Softw. Maintenance, Sep. 2004, pp. 350–359.
[13] M. Alzahrani, ‘‘Measuring class cohesion based on client similarities between method
pairs: An improved approach that supports refactoring,’’ IEEE Access, vol. 8, pp. 227901–
227914, 2020.
[14] S. Mäkelä and V. Leppänen, ‘‘Client-based cohesion metrics for java programs,’’ Sci.
Comput. Program., vol. 74, nos. 5–6, pp. 355–378, Mar. 2009.
[17] P. Tianual and A. Pohthong, ‘‘Defects detection technique of use case views during
requirements engineering,’’ in Proc. 8th Int. Conf. Softw. Comput. Appl., Feb. 2019, pp.
277–281.
[19] N. Moha, Y.-G. Gueheneuc, L. Duchien, and A.-F. Le Meur, ‘‘DECOR: A method
for the specification and detection of code and design smells,’’ IEEE Trans. Softw. Eng.,
vol. 36, no. 1, pp. 20–36, Jan. 2010.