2012 05ArjanvanZadelhoff
Universiteit Leiden
Opleiding Informatica
BACHELOR THESIS
2 Background Information
2.1 UML Class Diagram
2.2 Survey Structures
2.2.1 Structural Questionnaire
2.2.2 Non-Structural Questionnaire
2.3 Software Design Metrics
2.4 Tools
2.4.1 Software Metric Tools
2.4.2 Statistical Software
2.4.3 Design and Reverse Engineering Tools
2.5 Related Works
2.5.1 Eye Tracking
2.5.2 Software Visualization
2.5.3 Automated Abstraction of Class Diagrams
2.5.4 Reasoning on UML Class Diagrams
3.2.2.1 Size Category (Question B1 - B4)
3.2.2.2 Coupling Category (Question B5 - B10)
3.2.2.3 Inheritance Category (Question B11 - B13)
3.2.2.4 Class Inclusion/Exclusion (Question B14)
3.2.3 Practical Problems
3.2.3.1 Question C1: Referring Figure 9, select the classes that you think should not be included in a class diagram
3.2.3.2 Question C2: Referring Figure 10, select the classes that you think should not be included in this class diagram
3.2.3.3 Question C3: Referring Figure 11, select the classes that you think should not be included in this class diagram
3.2.3.4 Question C4: Referring to Figure 9, 10, and 11. Which class diagram do you prefer working with?
3.2.3.5 Question C5: Referring Figure 12, select the classes that you think should not be included in this class diagram
3.2.3.6 Question C6: Referring Figure 11 and 12, which class diagram do you prefer working with and why?
3.3 Discussion
3.3.1 Respondents’ Background
3.3.2 Software Design Metrics
3.3.3 Class Names and Coupling
3.3.4 Class Diagram Preferences
3.3.5 Threat of Validity
3.4 Conclusion
4.2.1.2 Question A2: How many year(s) of experience do you have in working with class diagrams?
4.2.1.3 Question A3: Where did you learn about UML?
4.2.1.4 Question A4: How do you rate your own skill in creating, modifying and understanding a class diagram?
4.2.1.5 Question A5: Indicate whether you (dis)like to look at source code for understanding a system? + Question A6: Indicate whether you (dis)like to look at UML models for understanding a system?
4.2.1.6 Others
4.2.2 Part B: Practical Problems
4.2.2.1 Category 1: Attribute
4.2.2.2 Category 2: Operation
4.2.2.3 Category 3: Class
4.2.2.4 Category 4: Relationship
4.2.2.5 Category 5: Inheritance
4.2.2.6 Category 6: Package
4.2.2.7 Category 7: Others
4.2.3 Part C: Class Diagram Indicators for Class Inclusion/Exclusion
4.2.3.1 Question C1: In software documentation, particularly in class diagrams, what type of information do you look for to understand a software system?
4.2.3.2 Question C2: In a class diagram, what type of information do you think can be left out without affecting your understanding of a system?
4.2.3.3 Question C3: Do you think that a class diagram should show the full hierarchy of inheritance? If not, which parts could be left out? (for example: parent, child, intermediate parent/child, leaf, ...)
4.2.3.4 Question C4: What criteria do you think indicate that a class (in a class diagram) is important for understanding a system?
4.2.3.5 Question C5: If you try to understand a class diagram, which relationships do you look at first? (Example: dependencies, inheritance, associations, etc)
4.2.3.6 Question C6: If there is a tool for simplifying class diagrams (e.g. obtained from reverse engineering), what features/functions would you expect from such a tool?
4.3 Discussion
4.3.1 Respondents’ Background
4.3.2 Class Properties
4.3.3 Class Role and Semantics
4.3.4 Class Diagram Simplification Tool Features
4.3.5 Threat of Validity
4.4 Conclusion
5 Conclusions
5.1 Summary of Findings
5.2 Recommendations
5.3 Future Works
5.4 Conclusions
UML Class Diagram Simplification: A
Survey Study
By: Arjan van Zadelhoff
Supervisors: Hafeez Osman and Michel R. V. Chaudron
Abstract
Class diagrams play an important role in software development.
However, in some cases, these diagrams contain a lot of information
that makes it hard for software maintainers to use them to understand
a system. To reduce the information in a class diagram, a method to simplify class diagrams is needed. A simplified class diagram leaves out details that are not needed while retaining the important information. To this end, we conducted two surveys asking respondents what type of information they would include or exclude in order to simplify a class diagram. The first survey involved 32 software developers, 75 percent of whom had more than 5 years of experience with class diagrams. The second survey involved 25 respondents who answered it online, 76 percent of whom rated their skill in creating, modifying and understanding class diagrams as average or above. We found that the important elements in a class diagram are class relationships, meaningful class names and class properties. We also found that, in a simplified class diagram, GUI related information, private and protected operations, helper classes and library classes should be excluded. Finally, we explored which features are needed in class diagram simplification tools.
1 Introduction
In this chapter we present the context, motivation and objectives of this study. We also explain our research methodology, our contributions and the outline of this thesis. After reading this chapter, the reader should understand our aim and the problem we attempted to solve.
1.1 Problem Statement
The UML class diagram is a valuable artefact in software development and software maintenance. It helps software developers and maintainers understand the architecture, design, implementation and behavior of a software system. UML class diagrams describe the static structure of programs at a higher level of abstraction than source code [15].
It is widely known that UML models, which are usually created during the design phase, are often poorly kept up-to-date during the realization and maintenance phases. As the implementation evolves, the correspondence between design and implementation degrades [18]. For legacy software, reliable designs are often no longer available, even though they are considered valuable for maintaining such systems.
tant for a new programmer to be able to identify which classes, attributes
and operations play a major role in the system. Tools that provide support during maintenance, re-engineering or re-architecting activities have become important to decrease the time software personnel spend on manual source code analysis and to help them focus attention on important program understanding issues [7]. A method is also needed that helps the software engineer focus on the relevant information and leave out unnecessary information in the design.
1.2 Objective
This thesis aims at simplifying a UML class diagram by leaving out unnecessary information without affecting the developer’s understanding of the software system as a whole. To this end, we conducted two surveys to gather information from our respondents about what type of information should not be included in a class diagram.
This study consisted of two surveys that were part of the overall Class Diagram Simplification study. The two surveys had different objectives. The overall Class Diagram Simplification study is illustrated in Figure 1, and Table 1 explains the terms. As shown in Figure 1, the first survey, which we conducted online, falls under the scope of Class Diagram Structural Design Metrics. This survey aimed at finding out which structural design metrics are important for class selection and class diagram simplification. It consisted of 24 questions divided into 3 parts.
The second survey falls under the scope of General Class Diagram Information. In this survey we tried to discover which elements in a class diagram indicate whether a class should or should not be included in a class diagram. This survey consisted of 15 questions, also divided into three parts, aimed at discovering the elements that the respondents find important in a class diagram.
class diagrams. It could also serve as basic guidance for software developers/maintainers on how to determine the important or relevant classes in a class diagram.
2 Background Information
This chapter provides background information. We first briefly explain the UML class diagram as used in this study. We then explain the two survey structures that we used and how they differ. Next, we present the software design metrics that we chose for our online survey and explain why we chose them. The tools we used to support the study are described after that. In the last part of this chapter, we present related work.
Figure 2: UML Class Diagram
software industry based on the Level of Detail (LoD). A class diagram may
contain only class names and the relationships between the classes; such a diagram has a low level of detail (LLoD). A class diagram can also show the attributes and their types, the operations and their parameters, and the relationships between the classes; such a diagram has a high level of detail (HLoD). Software developers/maintainers have their own preferences regarding the level of detail, most probably based on their experience and on the task at hand for the particular system. These class diagrams are modeled based on the user requirements and are used to help software developers and users further understand the system.
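To make the LLoD/HLoD distinction concrete, the following minimal Python sketch renders the same classes at both levels of detail. The `UmlClass` type, the `render` function and the Borrower/Loan example are illustrative assumptions, not part of any tool used in this study.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    """A class box in a diagram; attributes and operations are plain strings."""
    name: str
    attributes: list = field(default_factory=list)
    operations: list = field(default_factory=list)


def render(classes, relations, high_detail):
    """Return a plain-text view of a class diagram.

    relations is a list of (source, target) class-name pairs. With
    high_detail=False only class names and relationships are shown (LLoD);
    with high_detail=True attributes and operations are added (HLoD).
    """
    lines = []
    for cls in classes:
        lines.append(f"[{cls.name}]")
        if high_detail:
            lines.extend(f"  {attr}" for attr in cls.attributes)
            lines.extend(f"  {op}" for op in cls.operations)
    lines.extend(f"{src} --> {dst}" for src, dst in relations)
    return "\n".join(lines)


# Hypothetical two-class fragment of a library system.
library = [
    UmlClass("Borrower", ["name: String"], ["borrow(item: Item)"]),
    UmlClass("Loan", ["dueDate: Date"], ["close()"]),
]
links = [("Borrower", "Loan")]
print(render(library, links, high_detail=False))  # LLoD view
print(render(library, links, high_detail=True))   # HLoD view
```

The same model yields both views, which mirrors the idea that LLoD and HLoD are presentations of one design rather than different designs.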
• Structural Questionnaire
• Non-Structural Questionnaire
finding a result within a specific context. The process of collecting data is
much simpler and the analysis of the answers takes less time to complete.
Structured questions are best suited to situations in which (1) the possible responses are well enough understood that answer choices can be developed in advance, and (2) the aim is not to capture new ideas or thoughts from the respondent [24].
2.4 Tools
In this section we briefly describe the tools used for this study. The first part of this section describes the software metric tools that we used. The second part describes a simple statistical tool, and the third part describes the tools that we used for designing and reverse engineering class diagrams.
No. Metrics Category Description
8. CLD Inheritance The longest path from the class to a leaf node
in the inheritance hierarchy below the class.
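The three inheritance metrics used in this study (NOC, DIT and CLD) can be sketched in Python for a single-inheritance hierarchy. This is a minimal sketch under stated assumptions: the hierarchy is a hypothetical dictionary mapping each class to its direct superclass, a root class has DIT 0, and a leaf has CLD 0; the metric tools used in this study may use different conventions.

```python
def children(parent_of, cls):
    """Direct subclasses of cls."""
    return [c for c, p in parent_of.items() if p == cls]


def noc(parent_of, cls):
    """Number of Children: count of direct subclasses."""
    return len(children(parent_of, cls))


def dit(parent_of, cls):
    """Depth in Tree: number of ancestors above cls (a root has depth 0)."""
    depth = 0
    while parent_of[cls] is not None:
        cls = parent_of[cls]
        depth += 1
    return depth


def cld(parent_of, cls):
    """Longest path from cls down to a leaf node below it (a leaf has CLD 0)."""
    kids = children(parent_of, cls)
    return 0 if not kids else 1 + max(cld(parent_of, k) for k in kids)


# Hypothetical hierarchy: each class maps to its direct superclass (None = root).
hierarchy = {"Base": None, "Mid": "Base", "LeafA": "Mid", "LeafB": "Base"}

for name in hierarchy:
    print(name, noc(hierarchy, name), dit(hierarchy, name), cld(hierarchy, name))
```

Here `Base` has two children and a CLD of 2 (via `Mid` to `LeafA`), while both leaves have NOC and CLD equal to 0.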
code. Both questionnaires contain several reverse engineered models reconstructed with this tool.
The most common answer was graph (49 subjects). The conclusion of this study was that many of the surveyed researchers prefer only to integrate existing visualization techniques. Source code is also one of the most important artefacts for these subjects. Finally, this survey revealed a tendency to extend software visualization toward what might be paraphrased as software perception.
Bassil et al. [6] did a survey study about software visualization (SV) tools
that existed in 2000. This study addresses various functional, practical, cog-
nitive and code analysis aspects that users may be looking for in SV tools.
The participants, who are users of such tools in their industries or users
that are in a research setting, rated the usefulness and importance of these
aspects, and described their own wishes. In essence, the questionnaire asks which aspects of SV tools have and have not worked for the participants when applying a specific tool. It is organized in two parts: the first part is intended for any SV tool user, and the second part addresses expert users of SV tools. The analysis of the responses showed that, in general, the participants were quite pleased with the SV tools at hand. Functional aspects such as searching and browsing, use of
colors, and easy access from the symbol list to the corresponding source code
were rated as the most essential aspects. Hierarchical representations and navigation across hierarchies were also strongly desired. Animation effects, 3D visualization and Virtual Reality (VR) techniques were the least appreciated. Regarding the practical aspects of these tools, reliability was rated as the most important aspect. The study also confirmed that code comprehension is considered key to carrying out various maintenance and software life cycle tasks. Concerning code analysis aspects, only 3 out of 24 (desirable) aspects were supported by more than half of the tools: visualization of function calls, visualization of inheritance graphs, and visualization of different levels of detail in separate windows. In short, no tool yet fulfills all of the users' desires.
properties of classes and relationships, which makes it possible to eliminate a helper class and derive a slightly more abstract class diagram. Another part of the abstraction rules deals with ambiguous model definitions. In total, the article provides 121 rules to abstract a class diagram. The authors validated their abstraction technique and its rules on numerous third-party applications and models with up to several hundred model elements. They showed that their technique scales, produces correct results most of the time, and addresses issues such as model ambiguities that are inherently part of many (UML) diagrams.
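The core idea of eliminating a helper class can be illustrated with a much-simplified Python sketch: every path `source -> helper -> target` is collapsed into a direct `source -> target` relationship. This is an assumption-laden illustration of the general idea only; it is not one of the 121 rules from the article, which also reason about the properties of classes and relationships. The class names are hypothetical.

```python
def abstract_helper(relations, helper):
    """Remove a helper class and connect its users directly to what it uses.

    relations: set of directed (source, target) pairs between class names.
    Each pair source -> helper -> target becomes source -> target.
    """
    sources = {s for s, t in relations if t == helper}
    targets = {t for s, t in relations if s == helper}
    kept = {(s, t) for s, t in relations if helper not in (s, t)}
    derived = {(s, t) for s in sources for t in targets if s != t}
    return kept | derived


# Hypothetical model: Order uses a PriceCalculator helper that reads Product.
model = {("Order", "PriceCalculator"), ("PriceCalculator", "Product"),
         ("Order", "Customer")}
print(sorted(abstract_helper(model, "PriceCalculator")))
# [('Order', 'Customer'), ('Order', 'Product')]
```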
diagrams. They were also asked which class diagram flavor they preferred working with. In total, 25 complete responses were received, with 76% of the respondents rating their skills with class diagrams as average or above. As a result, we found that the metric that counts the number of public operations is the most important metric of all. We also discovered that class names and a coupling of 2 or less are influencing factors when it comes to excluding classes from a class diagram.
This chapter is organized as follows: we first describe the survey methodology. Next, we present our results and the findings based on them. Finally, we discuss our findings and draw conclusions from this questionnaire.
questions, the respondents were asked mainly about indicators for including classes in a class diagram based on design metrics. In each of the 13 questions, we briefly explained the metric in question and offered 5 answers to choose from. The answer choices are shown in Table 3.
In the last question of part B (i.e. question 14), we asked the respondents for their reasons for including or excluding a class in a class diagram. This question aimed to gather information, other than software metrics, about why respondents include or exclude a class. It is an open-ended question, and answering it was compulsory.
system that we show in this questionnaire contains 24 classes, and each class shows only its operations. The reverse engineered design was used for this questionnaire.
We also tried to simulate the various flavors of class diagrams found in the software industry by providing class diagrams with different Levels of Detail (LoD) and from different sources. These different flavors allowed us to distinguish indicators of class exclusion for each type of class diagram provided. The information about the class diagrams used in questions 1, 2, 3 and 5 of part C is shown in Table 4.
In the second question (question 6 in the survey), the respondents were required to choose between the forward design and the reverse engineered design of Pacman, and to give the reason for their choice. Both questions (4 and 6) were multiple choice, and the respondents had to choose exactly one answer. Answering these questions was mandatory, as was answering the accompanying open-ended questions in which the respondents gave the reason for their choice. The multiple choice answers are shown in Table 5.
No. Responses Amount
1 Complete Responses 25
2 Incomplete Responses 73
Total Responses 98
Figure 3: Respondents' current status (Researcher/Academic 40%, IT Professional 32%, Student 28%)
Under “Other”, respondents could specify a status different from the three listed answers. The results are shown in Figure 3. 40% of the respondents stated that their current status is Researcher/Academic, while 32% are IT Professionals and 28% are Students. None of the respondents answered “Other”, so Figure 3 covers all respondents. These results show that the respondents’ status is fairly evenly distributed.
3.2.1.2 Question A2: Indicate the location where you are cur-
rently working/studying
Figure 4: Respondents' location (The Netherlands 45%, Malaysia 32%; Sweden, Italy, Austria, Spain and Czech Republic 4-5% each)
In this optional question, the respondents could state the location from which they answered this questionnaire. The question was open-ended, meaning that the respondents were free to give any answer. As a result, many respondents stated, for example, their university; in those cases we looked up the country in which the university is located and recorded that country. The complete results of this question are shown in Figure 4. 45% of the respondents stated that they are located in The Netherlands, and 32% are from Malaysia. This share is large because we asked people at Universiti Utara Malaysia, a university in Malaysia, to answer this online questionnaire. The remaining respondents (4-5% each) found the questionnaire through the various forums where we posted it.
3.2.1.3 Question A3: How many years of experience do you have
in working with class diagrams? And Question A4: How
do you rate your skills in creating, modifying and under-
standing a class diagram?
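Figure 5: Self-rated skill by years of experience with class diagrams (number of respondents)
Skill | < 1 Year | 1 - 3 Years | 3 - 7 Years | 7 - 10 Years | 10+ Years
Excellent | 0 | 0 | 0 | 0 | 4
Good | 0 | 1 | 1 | 2 | 1
Average | 1 | 5 | 3 | 1 | 0
Low | 4 | 0 | 0 | 0 | 0
Poor | 2 | 0 | 0 | 0 | 0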
We combined the last two questions of part A to examine how the respondents’ experience relates to their self-rated skill; this is discussed below.
Question A3 was about the experience the respondents have with class diagrams. The respondents could choose from 5 answers: < 1 Year, 1 - 3 Years, 3 - 7 Years, 7 - 10 Years, and 10+ Years. 28% of the respondents stated that their experience with class diagrams is less than 1 year. 24% mentioned that their experience is between 1 and 3 years, while 16% answered “3 - 7 Years”. 12% answered “7 - 10 Years” and 20% mentioned that they have more than 10 years of experience with class diagrams. Overall, experience is fairly evenly distributed. Even though many respondents have less than 1 year of experience, most respondents rated their skill as average or above in the next question, as shown in Figure 5. This suggests that most respondents have a working knowledge of class diagrams.
In Question A4, we asked the respondents to rate their skills in creating, modifying and understanding class diagrams. 40% of the respondents rated their skill as Average, while 20% answered “Good”. 16% rated their skill as Excellent, and 16% and 8% rated their skill as Low and Poor, respectively. In total, 76% of the respondents rated their skill as average or above. Of the 7 respondents who answered “< 1 Year” in the previous question, 6 answered “Low” or “Poor” in this question, and one answered “Average”. All respondents with at least one year of experience rated their skill as Average or above. Self-rated skill is therefore consistent with experience, and most of the respondents have knowledge about class diagrams. The complete results of the combination of these two questions are shown in Figure 5.
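The percentages above can be rechecked directly from the counts in Figure 5. The short Python sketch below uses only those counts; the variable names are our own.

```python
# Respondents per experience level, in the SKILLS order below (from Figure 5).
SKILLS = ["Excellent", "Good", "Average", "Low", "Poor"]
FIGURE_5 = {
    "< 1 Year": [0, 0, 1, 4, 2],
    "1 - 3 Years": [0, 1, 5, 0, 0],
    "3 - 7 Years": [0, 1, 3, 0, 0],
    "7 - 10 Years": [0, 2, 1, 0, 0],
    "10+ Years": [4, 1, 0, 0, 0],
}

total = sum(sum(row) for row in FIGURE_5.values())
# Column sums: respondents per skill rating across all experience levels.
skill_totals = {s: sum(row[i] for row in FIGURE_5.values())
                for i, s in enumerate(SKILLS)}
average_or_above = sum(skill_totals[s] for s in ("Excellent", "Good", "Average"))
# The last two entries of the "< 1 Year" row are the Low and Poor counts.
novices_low_or_poor = sum(FIGURE_5["< 1 Year"][3:])

print(f"{average_or_above}/{total} = {100 * average_or_above / total:.0f}% Average or above")
print(f"{novices_low_or_poor} of {sum(FIGURE_5['< 1 Year'])} novices rated Low or Poor")
# 19/25 = 76% Average or above
# 6 of 7 novices rated Low or Poor
```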
Answer Score
The rationale for the scores is straightforward: if a respondent answers that a class with a high value of the metric should not be included, the metric is considered unimportant and receives negative points. If a respondent answers “Class(es) sometimes be included”, the respondent is neutral and the metric receives no points. If a respondent answers that such a class should be included, the metric gains positive points. Figure 6 shows the maximum and minimum scores that a metric can obtain. If a metric has a negative total score after all answers have been processed, we conclude that the metric is not important and that a class with a high value of this metric should be excluded. If a metric has a positive total score, the metric is important, and so is a class with a high value of this metric. However, if all metrics have a positive score, this does not automatically mean that every metric is important. In that case, the metrics are ranked by score from highest to lowest: the metrics at the top of the list are the important metrics that should be presented in the class diagram, while the low-scoring metrics at the bottom of the list should not be included in a class diagram.
Figure 6: Minimum and Maximum Score (from -50 to 50)
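The scoring and ranking procedure can be sketched in Python. Only the neutral answer “Class(es) sometimes be included” and the sign of the other answers come from the text; the other four answer labels are hypothetical placeholders (Table 3 is not reproduced here), and the weights of -2 to +2 are an assumption that is consistent with the range of -50 to 50 in Figure 6 for 25 respondents and 5 answer choices.

```python
# Assumed points per answer; labels other than the neutral one are hypothetical.
ANSWER_POINTS = {
    "Class(es) must be excluded": -2,
    "Class(es) should be excluded": -1,
    "Class(es) sometimes be included": 0,
    "Class(es) should be included": 1,
    "Class(es) must be included": 2,
}


def metric_score(answers):
    """Sum the points of all respondents' answers for one metric question."""
    return sum(ANSWER_POINTS[a] for a in answers)


def rank_metrics(answers_by_metric):
    """Return (metric, score) pairs ordered from highest to lowest score."""
    scores = {m: metric_score(a) for m, a in answers_by_metric.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical answers for two made-up metrics.
example = {
    "MetricX": ["Class(es) should be included"] * 3
               + ["Class(es) sometimes be included"],
    "MetricY": ["Class(es) should be excluded"] * 2,
}
print(rank_metrics(example))  # [('MetricX', 3), ('MetricY', -2)]
```

With 25 respondents all choosing the most extreme answer, a metric would reach 50 or -50, which matches the bounds in Figure 6.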
The metrics are grouped into three categories: the Size category, the Coupling category and the Inheritance category. Number of Attributes (NumAttr), Number of Operations (NumOps), Number of Public Operations (NumPubOps) and Setters/Getters are grouped in the Size category. Outgoing and Incoming Dependencies (Dep Out and Dep In), Export Coupling Attributes and Operations (EC Attr and EC Par), and Import Coupling Attributes and Operations (IC Attr and IC Par) are grouped in the Coupling category. Number of Children (NOC), Depth in Tree (DIT) and Class in Leaf Depth (CLD) are grouped in the Inheritance category. The results for these three groups are presented in separate subsections.
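The four Size metrics can be computed from a class's attributes and operations, as in the minimal Python sketch below. The `Operation` type and the example class are hypothetical, and recognizing getters/setters by a `get`/`set`/`is` prefix followed by an upper-case letter is a naming-convention heuristic we assume here; the metric tools used in this study may detect accessors differently.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    visibility: str  # "public", "protected", "private" or "package"


def is_accessor(name):
    """Heuristic: getX/setX/isX-style names are treated as getters/setters."""
    for prefix in ("get", "set", "is"):
        if name.startswith(prefix) and name[len(prefix):][:1].isupper():
            return True
    return False


def size_metrics(attributes, operations):
    """Size metrics for one class; attributes is a list of attribute names."""
    return {
        "NumAttr": len(attributes),
        "NumOps": len(operations),
        "NumPubOps": sum(op.visibility == "public" for op in operations),
        "Setters/Getters": sum(is_accessor(op.name) for op in operations),
    }


# Hypothetical Title class from a library system.
ops = [Operation("getTitle", "public"), Operation("setTitle", "public"),
       Operation("reserve", "public"), Operation("validate", "private"),
       Operation("isbnCheck", "private")]
print(size_metrics(["title", "isbn"], ops))
```

Note that `isbnCheck` is not counted as an accessor, because the letter after `is` is lower case.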
Figure: Score Size Category (points per metric: NumPubOps, NumOps, NumAttr, Setters/Getters)
operations. In other words, they consider public operations more informative and think that they need to be included in a class. Setters/getters received the lowest score in this category, which indicates that the respondents do not consider setters/getters an important element of a class diagram. One possible reason is that they are common operations whose functionality can also be integrated into other operations that the system actually needs. NumAttr and NumOps received intermediate scores. We can say that these metrics are normally needed in a class diagram, but that public operations are preferred.
Figure: Score Coupling Category (points per metric: Dep_Out, IC_Attr, Dep_In, EC_Attr, EC_Par, IC_Par)
Figure: Score Inheritance Category (points per metric: NOC, DIT, CLD)
that gave an irrelevant answer to this question (for example, “Not sure”). We used the following keywords:
• Size of Class/Diagram
• Complex Class
• Coupling
• Domain Related
• Understandability
• Frequent Class
• Based on Granularity
• Cohesion
Figure: Keywords to Include a Class in a Class Diagram (number of respondents per keyword)
18.5% of the respondents said that if a class has many relations, then that class should be included. The last keyword is Domain Related: classes that are related to the concept or domain of the system. Without these classes, it is hard for a software maintainer to understand a system, so these classes should be included in a class diagram.
3.2.3 Practical Problems
In this part, we tried to obtain information about which classes should not be included in a class diagram, by asking the respondents to select those classes in several class diagrams.
The results show that 48% of the respondents chose to exclude the class
Money and 36% of the respondents chose to not include the OperatorPanel
and Status class in a class diagram.
By our count, these 3 classes each have a coupling of at most 2. 32% of the respondents chose to exclude the classes Deposit, EnvelopeAcceptor, ReceiptPrinter, Transfer and Withdrawal, each of which has a coupling of exactly 2. Thus, 8 out of the 24 classes in this class diagram, all with a coupling of at most 2, were chosen for exclusion by 32% or more of the respondents.
The classes considered most important in this class diagram are Transaction and ATM, with a coupling of 7 and 9, respectively. This indicates that the amount of coupling plays a major role in selecting the classes that should or should not be included in a class diagram. In this question, meaningful class names seem to have had little influence on the respondents: 32% of the respondents chose to exclude the domain related classes Withdrawal, Transfer and Deposit, even though these three classes represent functionality offered by the ATM.
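The coupling criterion observed in this question can be sketched as a simple filter in Python. We assume here that the coupling of a class is the number of distinct classes it is related to, ignoring direction; the thesis's exact counting rule may differ. The diagram below is hypothetical and does not reproduce the ATM diagram of Figure 9.

```python
from collections import defaultdict


def coupling(relations):
    """Number of distinct classes each class is related to (direction ignored)."""
    neighbours = defaultdict(set)
    for a, b in relations:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {cls: len(others) for cls, others in neighbours.items()}


def exclusion_candidates(classes, relations, threshold=2):
    """Classes whose coupling is at most `threshold` (unrelated classes count as 0)."""
    counts = coupling(relations)
    return sorted(c for c in classes if counts.get(c, 0) <= threshold)


# Hypothetical diagram, not the ATM diagram of Figure 9.
classes = ["Hub", "A", "B", "C", "Isolated"]
relations = [("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("A", "B")]
print(coupling(relations))                       # Hub has coupling 3
print(exclusion_candidates(classes, relations))  # ['A', 'B', 'C', 'Isolated']
```

The threshold of 2 reflects the respondents' choices in this question; it is a parameter, so a tool could let the user adjust it.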
3.2.3.2 Question C2: Referring Figure 10, select the classes that
you think should not be included in this class diagram
A reverse engineered class diagram of a Library System was used for this question (Figure 10 in this questionnaire). The diagram presented all elements (HLoD), and we expected to discover which elements influence the selection of classes that should not be included. The results of the survey are shown in Figure 12.
The results clearly show that most of the respondents chose not to include classes that have no relationships: the top 7 classes chosen for exclusion all have no relationships. In this question, we found that class names also play a major role in determining whether a class should be included or excluded. The three classes most often chosen for exclusion are AboutDialog, MessageBox and QuitDialog. From the class names, the respondents were able to predict the functionality of these classes: the names AboutDialog, MessageBox and QuitDialog clearly indicate that these classes are used to display information. They were therefore considered unimportant, because they are only used to display messages. Conversely, the 5 classes that few respondents chose to exclude are related to the domain and have a coupling greater than 2. Borrower, Reservation, Loan, Item and Title have meaningful names that may indicate the functionality of the classes, and they relate closely to the domain, i.e. the Library System.
Figure 12: Respondents' Selection of Classes that Should not be Included in a Library System (percentage of respondents per class; classes: Borrower, Reservation, Loan, Item, Title, ReservationFrame, UpdateTitleFrame, ReturnItemFrame, Persistent, LendItemFrame, FindTitleDialog, CancelReservationFrame, BorrowerInfoWindow, UpdateBorrowerFrame, TitleFrame, Objld, FindBorrowerDialog, MainWindow, BrowseWindow, TitleInfoWindow, BorrowerFrame, QuitDialog, MessageBox, AboutDialog)
From the analysis of the results, we found that meaningful class names and the amount of coupling influenced the selection of important and unimportant classes for inclusion in or exclusion from this class diagram.
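The two indicators found in this question (no relationships, and names that reveal a display-only role) can be combined into a simple heuristic, sketched below in Python. The suffix pattern is a hypothetical rule derived from class names that appear in Figure 12 (such as AboutDialog, MessageBox, LendItemFrame and MainWindow), and the relations in the example are made up; they do not reproduce Figure 10.

```python
import re

# Hypothetical GUI naming pattern, based on class names seen in Figure 12.
GUI_NAME = re.compile(r"(Dialog|Frame|Window|MessageBox)$")


def likely_unimportant(classes, relations):
    """Flag classes without any relationship or with a GUI-like name."""
    related = {cls for pair in relations for cls in pair}
    return sorted(c for c in classes
                  if c not in related or GUI_NAME.search(c))


# Hypothetical classes and relations (not the actual Library System diagram).
classes = ["Borrower", "Loan", "AboutDialog", "LendItemFrame", "Persistent"]
relations = [("Borrower", "Loan"), ("LendItemFrame", "Loan")]
print(likely_unimportant(classes, relations))
# ['AboutDialog', 'LendItemFrame', 'Persistent']
```

A name-based rule like this only works when class names follow recognizable conventions, which is consistent with the observation that meaningful names help respondents judge a class.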
3.2.3.3 Question C3: Referring Figure 11, select the classes that
you think should not be included in this class diagram
In this question, the respondents were required to select the classes that are
not important in a forward designed Pacman Game class diagram (Figure 11
in this questionnaire). Most of the classes in this diagram have relationships
and meaningful class names. The complete results for this question are presented in Figure 13. 64% of the respondents chose to exclude the class Direction. This class is an enumeration with a coupling of 0, which might explain why respondents would leave it out. 52% of the respondents chose to exclude the Iterator class, while 40% chose not to include the Iterable class. Both classes are interfaces, which might be why the respondents considered them unimportant. PacShell contains only a main operation, which is common in programming; this may be why 35% of the respondents chose to leave it out of this class diagram.
Figure 13: Percentage of respondents that selected each class of the forward designed Pacman Game class diagram for exclusion (classes: Maze, GameModel, Player, Ghost, Character, Tile, GameLevel, GameListener, ConsoleControl, GameEvent, ConsoleView, MazeIterator, GameListenerAdapter, PacShell, Iterable, Iterator, Direction)
From the results of this question, we found that the respondents consider enumeration and interface classes unimportant to show in a simplified class diagram.
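Excluding classes by their kind is straightforward once the kind of each class is known from the model. The Python sketch below assumes a hypothetical `DiagramClass` type with a `kind` field; the Direction, Iterator and Iterable kinds follow the text above, while GameModel and Maze are assumed to be ordinary classes.

```python
from dataclasses import dataclass


@dataclass
class DiagramClass:
    name: str
    kind: str = "class"  # "class", "interface" or "enumeration"


def simplify(classes, drop_kinds=("interface", "enumeration")):
    """Names of the classes whose kind is not listed in drop_kinds."""
    return [c.name for c in classes if c.kind not in drop_kinds]


pacman = [DiagramClass("GameModel"), DiagramClass("Maze"),
          DiagramClass("Direction", "enumeration"),
          DiagramClass("Iterator", "interface"),
          DiagramClass("Iterable", "interface")]
print(simplify(pacman))  # ['GameModel', 'Maze']
```

Keeping `drop_kinds` as a parameter lets a user decide, for example, to hide enumerations but keep interfaces.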
diagram in C2 presents the reverse engineered design. Class diagram C3 presents a forward design with a High Level of Detail (HLoD). In this question, we tried to discover which class diagram the respondents prefer.
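Answer | Total | % | Student | Researcher/Academic | IT Professional | Poor | Low | Average | Good | Excellent
I prefer class diagram A (figure 9) | 5 | 20 | 0 | 2 | 3 | 0 | 0 | 2 | 1 | 2
I prefer class diagram B (figure 10) | 2 | 8 | 0 | 1 | 1 | 0 | 0 | 2 | 0 | 0
I prefer class diagram C (figure 11) | 12 | 48 | 6 | 5 | 1 | 1 | 3 | 6 | 1 | 1
I prefer them all | 1 | 4 | 1 | 2 | 0 | 0 | 0 | 0 | 1 | 0
I do not prefer them | 2 | 8 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 1
It does not matter which one | 3 | 12 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0
Total | 25 | 100 | 7 | 10 | 8 | 2 | 4 | 10 | 5 | 4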
The results in Table 8 show that almost half of the respondents preferred
working with class diagram C, which is a HLoD forward design class diagram.
48% of the respondents preferred diagram C because, as they mentioned,
the class diagram is clear, the necessary information (e.g. attributes and
operations) is provided, and most of the classes that are presented are
important. This diagram was preferred mostly by students and researchers,
plus one IT Professional. 20% of the respondents preferred to use class
diagram A. Most of the respondents who chose this diagram were
Researchers/Academics and IT Professionals with class diagram skills
ranging from Average to Excellent. This suggests that respondents with more
skill and experience in class diagrams tend to prefer this diagram. These
respondents mentioned that they preferred this diagram because it is simple,
less technical, domain oriented, systematic and has meaningful classes. 12%
of the respondents mentioned that it did not matter which diagram they were
given, while only 4% of the respondents preferred all the presented class
diagrams. 8% of the respondents preferred class diagram B, and another 8%
did not prefer any of the presented class diagrams. They did not prefer the
class diagrams because, as they mentioned, there is “no story” in the class
diagrams: the class diagrams only show the solution, not the foundation of
the domain.
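The percentages in Table 8 are each answer's count divided by the 25 complete responses. The short Python sketch below reproduces that arithmetic from the counts in the table; the dictionary keys are our own shorthand labels, not the wording of the questionnaire.

```python
# Answer counts for Question C4, taken from Table 8 (25 complete responses).
TABLE_8_COUNTS = {
    "diagram A (Figure 9)": 5,
    "diagram B (Figure 10)": 2,
    "diagram C (Figure 11)": 12,
    "all of them": 1,
    "none of them": 2,
    "does not matter": 3,
}


def to_percentages(counts: dict[str, int]) -> dict[str, int]:
    """Convert raw answer counts into whole-number percentages of the total."""
    total = sum(counts.values())
    return {answer: round(100 * n / total) for answer, n in counts.items()}


percentages = to_percentages(TABLE_8_COUNTS)
for answer, pct in percentages.items():
    print(f"{answer}: {pct}%")
```

With 25 respondents every answer corresponds to exactly 4 percentage points, so no rounding occurs and the percentages sum to 100.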
3.2.3.5 Question C5: Referring Figure 12, select the classes that
you think should not be included in this class diagram
This class diagram was derived from the domain of a Pacman Game. It is
slightly different from the class diagram presented in question C3 because
this class diagram was constructed using a reverse engineering technique.
The Pacman Game implementation closely follows the forward design, which
is why there is only a small difference between the forward engineered
class diagram and the reverse engineered class diagram. In this question,
we tried to discover whether respondents select the classes that should be
excluded from a class diagram differently when the reverse engineered
class diagram closely resembles the forward design class diagram. We also
tried to discover which class diagram was preferred, which was asked in a
later question.
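[Figure 14: Percentage of respondents who selected each class of the reverse engineered Pacman Game class diagram (Figure 12) for exclusion. Classes shown: Tile, GameModel, Player, Maze, GameLevel, GameEvent, Character, Ghost, MazeIterator, GameListener, ConsoleView, ConsoleControl, GameListenerAdapter, PacShell, Direction.]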
The complete result of this question is shown in Figure 14. The result
shows that the classes Direction and PacShell were selected by 72% of the
respondents to be left out of the class diagram. The reasons could be that
these classes have no relationships to other classes, that Direction is an
enumeration class, and that PacShell is a common programming class.
Compared to question C3, the Iterator and Iterable classes are presented
differently in this reverse engineered diagram: instead of appearing as
separate interface classes, they are shown inside the classes that use them.
For instance, the interface Iterator is presented in the Maze class. This
result shows that coupling influenced the selection of classes to be
excluded from a class diagram.
Answer | No. | % | Breakdown by role | Breakdown by skill
I prefer class diagram D (Figure 12) | 10 | 40 | 1 7 2 | 1 1 5 2 1
I prefer class diagram C (Figure 11) | 4 | 16 | 1 0 3 | 0 1 1 1 1
I prefer them both | 3 | 12 | 1 2 0 | 0 0 2 1 0
I don’t prefer them | 3 | 12 | 2 1 0 | 0 2 0 0 1
It doesn’t matter which one | 5 | 20 | 2 0 3 | 1 0 2 1 1
Total | 25 | 100 | 7 10 8 | 2 4 10 5 4
The results in Table 9 show that the largest group of respondents (mainly
researchers) preferred to use the reverse engineered class diagram (Class
Diagram D). 40% of the respondents chose this diagram because it is more
detailed and clear, contains no interface classes, and is easier to
understand. 20% of the respondents did not choose either of the two class
diagrams because for them it does not matter which one; they mentioned that
both class diagrams are equally good and similar. On the other hand, 16%
of the respondents preferred class diagram C. The respondents mentioned
that class diagram D has a complete view of the attributes and operations
and all classes are successfully linked. There is no pattern of selection
in this result in terms of the respondents’ role and skill.
If we compare the results of this question with the results of question C4,
we find that the reverse engineered class diagram is preferred when the
source code closely implements the forward design. Such source code can
yield a very helpful class diagram that is largely comparable with the
forward design class diagram and sometimes even better. This means that
reverse engineered class diagrams that closely match the forward design
are preferred by software engineers for understanding a system design.
3.3 Discussion
In this section we discuss the results and findings presented in the previ-
ous section. This section is divided into 5 parts which are the following:
Respondents’ Background, Software Design Metrics, Class Names and Cou-
pling, Class Diagrams Preferences, and Threat of Validity.
This also confirms that every respondent had the minimum knowledge to
answer this questionnaire.
In the Coupling category, we discovered that classes with many incoming
and outgoing dependencies are important, since these two metrics scored
17 and 16 points, respectively. These scores are not high compared with
the other categories, but within this category these two metrics are among
the highest, next to IC Attr, which also scored 17 points. We thus conclude
that dependencies are important. Dependencies are also relationships, and
the results of the next part of the questionnaire show that coupling (in
other words, relationships) is an influencing factor when deciding whether
to include or exclude a class. We mentioned earlier that attributes are
a common element in a class. Here, we found that IC Attr and EC Attr
have high scores (17 and 15 points, respectively), more than EC Par
(11 points) and IC Par (9 points). The reason might be that a class that is
declared as an attribute type is more important because it can be used by
every operation in the declaring class. In contrast, if a class is only
declared as a parameter type of an operation, its objects can only be used
inside that operation.
In the Inheritance category, we discovered that a class with a high
number of children (NOC) should be included in a class diagram.
Such a parent class helps to show the abstraction of a group of classes. On
the other hand, DIT and CLD have the lowest scores among the software
design metrics. For DIT, a high value does not indicate an important
class, because it means that the class is located very low in the
inheritance hierarchy; such a class is too detailed and most of the time
not needed. For CLD, a high value means that the class is very abstract,
so this class alone is not enough to understand the whole hierarchy. It is
basically a class that presents an abstraction of classes.
As for the complete results, we found that NumPubOps has the highest
score of all the metrics. Also, all the metrics have a positive score, which
means that all software metrics used in this study are useful. A negative
score would mean that the class is not useful and should not be included
in a class diagram. As we mentioned before, the main purpose of this study
is to identify the important metrics that influence the inclusion of a class
in a class diagram. Based on the results, we can obtain this information by
ranking the scores of these metrics. The overall ranking is shown in Table
10. Software designers could apply this result to simplify a class diagram
during the documentation phase. This result somewhat contradicts the
result in Part C. In Part C, many metrics that are related to relationships
have a higher score than the metrics from the Size category.
Part C shows that relationships are an important element in a class diagram.
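Ranking metrics by their scores amounts to a sort in which tied metrics share a rank. The Python sketch below applies this only to the Coupling-category scores quoted earlier in this section; the labels "Incoming deps" and "Outgoing deps" are our own names for the two dependency metrics, and NumPubOps and the other categories are left out because their scores are not stated here. It illustrates the ranking step and does not reproduce Table 10.

```python
# Coupling-category points quoted in this section.
# "Incoming deps" / "Outgoing deps" are descriptive labels, not survey metric names.
COUPLING_SCORES = {
    "Incoming deps": 17,
    "Outgoing deps": 16,
    "IC Attr": 17,
    "EC Attr": 15,
    "EC Par": 11,
    "IC Par": 9,
}


def rank_metrics(scores: dict[str, int]) -> list[tuple[int, str, int]]:
    """Return (rank, metric, points), highest points first.

    Equal points share a rank (competition ranking: 1, 1, 3, ...).
    sorted() is stable, so tied metrics keep their input order.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranked: list[tuple[int, str, int]] = []
    for position, (metric, points) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == points:
            rank = ranked[-1][0]  # tie: reuse the previous rank
        else:
            rank = position
        ranked.append((rank, metric, points))
    return ranked


for rank, metric, points in rank_metrics(COUPLING_SCORES):
    print(rank, metric, points)
```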
on the results, we have seen that most of the respondents exclude classes
that have no coupling at all, meaning that these classes do not have any
relationships. Many respondents also exclude classes with a coupling of 2
or less. Another influencing factor is the class name. Many respondents
excluded GUI related classes in the Library system because of their class
names and coupling. However, the class name is not always an influencing
factor, as we have seen in the ATM system, where many respondents actually
excluded domain related classes, i.e. classes that are needed for the
functionality of the ATM system.
Aside from these two major influencing factors, many respondents excluded
certain types of classes, such as enumerations and interfaces. These classes
either did not contain much information or had very low coupling. Another
reason why interface classes are excluded could be that these classes
are GUI related.
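The influencing factors of coupling and class type can be expressed as a simple filter. The Python sketch below is hypothetical: it assumes a diagram is given as a list of classes with a kind ("class", "interface" or "enumeration") plus a list of relationships as (source, target) pairs, and it counts coupling as the number of relationships a class takes part in. Apart from Direction and PacShell, which have no relationships in the Pacman Game diagram, the class names and relationships are invented for illustration. The default threshold of 2 follows the finding above; the developer survey in Chapter 4 points to a stricter threshold of 1, so the threshold is a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassInfo:
    name: str
    kind: str  # "class", "interface" or "enumeration"


def coupling(classes: list[ClassInfo], relationships: list[tuple[str, str]]) -> dict[str, int]:
    """Count the relationships each class takes part in, as source or target."""
    counts = {c.name: 0 for c in classes}
    for source, target in relationships:
        counts[source] += 1
        counts[target] += 1
    return counts


def exclusion_candidates(
    classes: list[ClassInfo],
    relationships: list[tuple[str, str]],
    max_coupling: int = 2,
) -> list[str]:
    """Return enumerations, interfaces, and classes with coupling <= max_coupling."""
    counts = coupling(classes, relationships)
    return [
        c.name
        for c in classes
        if c.kind in ("enumeration", "interface") or counts[c.name] <= max_coupling
    ]


# Illustrative diagram: only Direction and PacShell come from the survey.
CLASSES = [
    ClassInfo("Direction", "enumeration"),
    ClassInfo("PacShell", "class"),
    ClassInfo("Core", "class"),      # hypothetical
    ClassInfo("ServiceA", "class"),  # hypothetical
    ClassInfo("ServiceB", "class"),  # hypothetical
    ClassInfo("Model", "class"),     # hypothetical
    ClassInfo("Helper", "class"),    # hypothetical
    ClassInfo("Adapter", "class"),   # hypothetical
]
RELATIONSHIPS = [
    ("Core", "ServiceA"), ("Core", "ServiceB"), ("Core", "Helper"), ("Core", "Model"),
    ("ServiceA", "ServiceB"), ("ServiceA", "Model"), ("ServiceB", "Model"),
    ("Adapter", "Model"), ("Adapter", "ServiceA"),
]

print(exclusion_candidates(CLASSES, RELATIONSHIPS))
# ['Direction', 'PacShell', 'Helper', 'Adapter']
print(exclusion_candidates(CLASSES, RELATIONSHIPS, max_coupling=1))
# ['Direction', 'PacShell', 'Helper']
```

Such a purely structural rule only reproduces the tendencies reported above; as the ATM system showed, respondents sometimes excluded domain classes that a coupling threshold would keep.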
From our observation, the reverse engineered class diagram of the Library
system was not preferred because the structures of the classes were not well-
presented. This might be because the implementation was not conforming
to the design or there was no design in the system before implementation.
consider that the number of complete responses was not sufficient. The
locations of the respondents were also not well distributed in this survey,
because most of the respondents came from The Netherlands and Malaysia.
Most of the questions in this study require the respondent to choose the best
answers. We had to infer why the respondents chose these answers, and
these inferences may not be accurate. The questionnaire should contain
questions that ask respondents why they chose an answer.
3.4 Conclusion
In this survey we have discovered the most important elements in a class
design that should be included in a class diagram. We also discovered which
flavor of class diagrams respondents prefer to work with. We obtained these
findings through an online questionnaire, which was completed by 25
respondents.
From the results, we discovered that the most important software design
metric is the Number of Public Operations. This means that if a class has
a high number of public operations, this indicates that the class is
important and should be included in a class diagram. In this survey we
also discovered that class names and coupling are influencing factors
when selecting a class to be excluded from a class diagram. Classes with a
coupling of 2 or less are most likely to be excluded from the class diagram.
Our most significant discovery in this survey concerns the respondents’
preference of class diagrams: the reverse engineered design is preferred
over the forward design. However, the source code must implement the
forward design so that the reverse engineered design is largely similar to
the forward design.
With these results we can now assess whether reverse engineered class
diagrams are good for understanding a system. We can also indicate which
classes should be included or excluded, based on our results and on our
analysis of the metrics and of the behavior of the respondents in Part C.
Although the number of responses to this questionnaire is not high, we still
managed to find some influencing factors for selecting a class to be included
in or excluded from a class diagram, and we discovered which type of class
diagrams these respondents prefer. These are important elements that could
be used for simplifying UML class diagrams.
4 Class Diagram Simplification: What is in the
developer’s mind?
Class diagrams should support software developers in understanding a
system. However, sometimes a class diagram is too complex to understand
within a short period of time. Is there a way to simplify such a diagram?
This survey asks software developers what type of information they would
include or exclude in order to simplify a class diagram. The survey involved
32 software developers, 75 percent of whom have more than 5 years of
experience with class diagrams.
We discovered various elements that are important in a class diagram, such
as relationships, class names and properties. We also discovered elements
that are not important in a class diagram, such as GUI related information,
private and protected operations, and constructors without parameters.
The outline of this chapter is the following: we first describe the survey
methodology. Then, we show our results and give our findings based on
these results. We end this chapter by discussing our findings and presenting
our conclusions based on our analysis of this questionnaire.
wanted to compare the respondents’ preferences for UML models or source
code for understanding a system.
Every set of the questionnaire had both MLoD and HLoD. In set A, the ATM
system in MLoD and the Library System in HLoD were used, and in set B, the
ATM system in HLoD and the Library System in MLoD were used.
No | Class Diagram | Set A | Set B
1 | ATM System | Medium Level of Detail (MLoD) | High Level of Detail (HLoD)
2 | Pacman Game | Forward Engineered Design | Reverse Engineered Design
3 | Library System | High Level of Detail (HLoD) | Medium Level of Detail (MLoD)
Different Levels of Detail (LoD) were used to simulate the different types of
details that normally exist in a class diagram. We also used different sources
of class diagrams, namely forward design and reverse engineered class
diagrams, to simulate the different flavors of class diagrams that exist in the
software industry. Table 12 explains the Level of Detail.
No. | Question | Description
2. | Question C2: In a class diagram, what type of information do you think can be left out without affecting your understanding of a system? | To find out what type of information can be left out of a class diagram.
4. | Question C4: What criteria do you think indicate that a class (in a class diagram) is important for understanding a system? | To discover how developers recognize that a class in a class diagram is important.
5. | Question C5: If you try to understand a class diagram, which relationships do you look at first? (Example: dependencies, inheritance, associations, etc.) | To determine which relationships can be considered important in a class diagram.
6. | Question C6: If there is a tool for simplifying class diagrams (e.g. obtained from reverse engineering), what features/functions would you expect from such a tool? | To find out what kind of features or functions are needed for a class diagram abstraction tool.
4.2.1 Part A: Personal Questions
This part consists of six questions related to personal characteristics,
knowledge and experience. We give our findings on each question in this
part, as in the other parts. In the part “Others” we present several
combinations of the results.
[Figure 15: Roles of the respondents (% of respondents): Project Manager 9, Architect 50, Analyst 13, Designer 28, Programmer 81, Tester 3]
As for the results, 81% of the respondents are programmers and half of
the respondents are software architects. As shown in Figure 15, 28% of
the respondents are software designers. Figure 15 also highlights that the
majority of the respondents are involved in the Design and Implementation
phases of software development. 14 out of 26 programmers (54%) are also
software architects or software designers. This means that about half of the
programmers are involved in designing software. All project managers who
took part in this study are also programmers. This indicates that all the
respondents who participated in this study are directly involved in software
development.
these results we found that 50% of the respondents have more than 10 years
of experience with class diagrams. This is expected, because most of the
participants of this survey indicated that they knew UML when we asked
them before the questionnaire was handed out. The results also show that
75% of the respondents have more than 5 years of experience with class
diagrams. Only about 11% (3 respondents) have less than 1 year of
experience with class diagrams. Even though they have less experience with
class diagrams, they do have knowledge about UML, based on the results of
Question A3.
[Figure: Experience with class diagrams (% of respondents): > 10 years 50, 7 to 9 years 11, 5 to 6 years 14, 3 to 4 years 7, 1 to 2 years 7, < 1 year 11]
[Figure: Where did the respondent learn about UML (percent)]
4.2.1.4 Question A4: How do you rate your own skill in creating,
modifying and understanding a class diagram?
This question aimed to gain knowledge about the skills of the respondents
in creating, modifying, and understanding class diagrams. Based on
Figure 18, most of the respondents (88%) have average or good skills in
creating, modifying, and understanding class diagrams, and only 3% have
excellent skills. This indicates that over 90% of the respondents have
average or better skills related to class diagrams. Meanwhile, 2 respondents
(6%) have low skills and only 1 respondent (3%) has poor skills related to
class diagrams. The 2 respondents with low skills are software architects
(with no other role), and the only respondent with poor skills is a
programmer (with no other role).
[Figure 18: Self-rated class diagram skill (% of respondents): Poor 3, Low 6, Average 44, Good 44, Excellent 3]
4.2.1.5 Question A5: Indicate whether you (dis)like to look at
source code for understanding a system? + Question A6:
Indicate whether you (dis)like to look at UML models for
understanding a system?
[Figure: Responses to questions A5 and A6 on a five-point scale: Strong Dislike, Dislike, Neutral, Like, Really Like]
Figure 20 shows the results of questions A5 and A6 for respondents with the
role of programmer. The results show that programmers are slightly more
positive about source code than about UML, but the difference is small.
These results are very similar to the overall results shown in Figure
19. These results were expected, because an experienced programmer can
understand source code within a short time.
[Figure: Responses to questions A5 and A6 for a subgroup of respondents by role, on a five-point scale from Strongly Dislike to Really Like]
It was quite a surprise to see that many software architects prefer source
code over UML to understand a system (Figure 21). The same goes for the
software designers: they also prefer source code over UML to understand a
system (Figure 22). However, these results should be interpreted with
caution because, as we can see in question A1 (role of respondents), most
of the designers and architects in this survey are also programmers or are
otherwise involved in development.
4.2.1.6 Others:
Combination of Question A2 & A3
Number of respondents | less than 1 year | 1 to 2 years | 3 to 4 years | 5 to 6 years | 7 to 9 years | more than 10 years
From Colleagues / Industrial practice | 1 | 0 | 0 | 0 | 1 | 3
Learn by Myself | 1 | 1 | 0 | 0 | 0 | 9
Professional Training | 0 | 1 | 0 | 2 | 1 | 3
HBO/University | 1 | 0 | 2 | 2 | 2 | 5
No | 0 | 0 | 0 | 0 | 0 | 0
Figure 23 combines the answers given to question A2 with the answers given
to question A3. This figure shows that 45% of the respondents with more than
10 years of experience learned UML by themselves. Also, most of the
respondents who answered “Learn by Myself” came from this group. The
answers for HBO/University are more spread out over the years of
experience, and this is the most frequently chosen option in question A3.
Figure 23 also shows that all respondents in this survey have at least a basic
knowledge of UML, even though some respondents answered that they have
less than one year of experience with UML.
Class Diagram Skill per Role (number of respondents)
Role | Poor | Low | Average | Good | Excellent
Tester | 0 | 0 | 1 | 0 | 0
Programmer | 1 | 0 | 12 | 12 | 1
Designer | 0 | 0 | 3 | 5 | 1
Analyst | 0 | 0 | 2 | 1 | 1
Architect | 0 | 2 | 6 | 7 | 1
Project Manager | 0 | 0 | 0 | 3 | 0
We divided the Attribute category into two subcategories: Properties and
Type of Attribute. We divided the Properties subcategory into three elements:
Protected, Public and Private. This means that if a respondent marked the
private variables in a class diagram or mentioned excluding private
attributes, we considered that the respondent chose not to include the
Private attribute element in a class diagram. The same goes for the other
elements in this subcategory. We also divided the Type of Attribute
subcategory into three elements: No primitive type, GUI related, and
Constant. No primitive type refers to attributes whose type is not a
primitive type. GUI related attributes are attributes whose types come
from the Graphical User Interface (GUI) libraries provided by the
development tools, such as Textbox, Label and Button. Constant variables
are variables that cannot be changed. Figure 25 illustrates the subcategories
and elements in the Attribute category.
[Figure: Attribute elements that should not be included in a class diagram (% of respondents); elements shown: Instance Variable, GUI Related, Private, Constant, Protected]
4.2.2.2 Category 2 : Operation
[Figure: Operation properties chosen to be excluded from a class diagram (number of responses): Private, Protected, Public, Return Type]
Figure 28 shows the Operation Properties that the respondents chose to
exclude from a class diagram. Each operation property was chosen for
exclusion by only one respondent. The results show that the majority of the
respondents think that all elements in Operation Properties should be
included in a class diagram.
[Figure 29: Type of operation that should not be included in a class diagram (% of respondents): Constructor Without Parameter 25, Getters/Setters 19, Constructor 16, General Function 9, Event Handler 6]
The results of the Type of Operation category are presented in Figure 29.
The results show that 25% of the respondents chose to exclude Constructors
Without Parameters. This type of operation conveys little information,
because the default initialization of an object takes no parameters. In
addition, 16% of the respondents suggested that all constructors should be
left out of a class diagram. For Getters and Setters, 19% of the respondents
suggested that these operations should be excluded from a class diagram. A
reason for this could be that they are common operations created for
accessing and modifying the attributes of a class. 9% of the respondents
mentioned that General Functions should not be included in a class diagram
because these functions are commonly used and well-known to programmers.
Event Handlers were chosen to be excluded from a class diagram by 6% of the
respondents. Most of the event handlers in the class diagrams in this survey
are derived from GUI libraries. Apart from the results presented in Figure
29, 15% of the respondents indicated that all operations should be excluded
from a class diagram. These respondents mentioned that only class names
and relationships are needed in a class diagram.
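The two most frequently excluded operation types can be filtered mechanically. The hypothetical Python sketch below drops constructors without parameters and getters/setters from a class's operation list; detecting accessors by the get/set naming convention is our own simplifying assumption, and the Account class and its operations are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    parameters: tuple[str, ...] = ()
    is_constructor: bool = False


def is_accessor(op: Operation) -> bool:
    """Naive getter/setter check: 'get' or 'set' followed by an uppercase letter."""
    for prefix in ("get", "set"):
        next_char = op.name[len(prefix):len(prefix) + 1]
        if op.name.startswith(prefix) and next_char.isupper():
            return True
    return False


def simplify_operations(ops: list[Operation]) -> list[Operation]:
    """Drop parameterless constructors and getters/setters; keep everything else."""
    kept = []
    for op in ops:
        if op.is_constructor and not op.parameters:
            continue  # default initialization only, conveys little information
        if not op.is_constructor and is_accessor(op):
            continue  # routine attribute access
        kept.append(op)
    return kept


# Hypothetical Account class.
ACCOUNT_OPS = [
    Operation("Account", is_constructor=True),
    Operation("Account", ("owner",), is_constructor=True),
    Operation("getBalance"),
    Operation("setBalance", ("amount",)),
    Operation("withdraw", ("amount",)),
    Operation("settle"),
]

print([op.name for op in simplify_operations(ACCOUNT_OPS)])
# ['Account', 'withdraw', 'settle']
```

Requiring an uppercase letter after the prefix keeps an ordinary operation such as settle from being mistaken for a setter.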
Figure 30: Class Category
Classes, Log, and GUI Related. The Role subcategory means that classes
have a specific role in a system. To exclude a class from a class diagram, we
focus on the classes that perform a supporting role in a system, such as GUI
related classes, Log classes and Console classes. For instance, the class Log
in the ATM system is categorized in the Role subcategory.
Figure 31: Type of Class that Should not be Included in a Class Diagram (% of respondents): Enumeration 38, Interface 19, Abstract 13
For the subcategory Type of Class (Figure 31), 38% of the respondents chose
not to include Enumeration classes. This is followed by Interface classes
(19%), and 13% suggested that Abstract classes should not be included
in simplified class diagrams. Enumeration classes are classes whose values
are enumerated in the model as enumeration literals, which are not needed
to understand a system.
Figure 32 shows the Role subcategory results. It shows that half of the
respondents suggested that GUI related classes and classes for logging tasks
should be left out in order to simplify a class diagram. Most GUI related
classes were presented in the Library system, and the Log class was
presented in the ATM system. The respondents suggested eliminating these
classes because the system can still be understood without them. The
Input function role covers classes that are used to take input from the
interface or a device. In the case of the ATM system, the “Money” and
“Card” classes are examples of input function classes. 22% of the
respondents said that this type of class should not be included in a class
diagram. The “Console” and “Listener” functions appear in the Pacman Game
in Part B. These classes can be considered as classes that interact with user
input and other system input. 6% of the respondents chose to exclude the
listener function from the class diagram, while 3% of the respondents chose
to exclude the console function.
[Figure 32: Class Role that Should be Excluded in a Class Diagram (% of respondents): GUI Related 50, Log 50, Input 22, Listener 6, Console 3]
The Relationship category is divided into two subcategories: Role and
Coupling <= 1. The Role subcategory refers to the role label attached to a
relationship. Coupling <= 1 refers to classes that have exactly one
relationship or no relationship to other classes at all. Figure 33 shows the
subcategories of the Relationship category.
[Figure: Relationship elements that should not be included in a class diagram (% of respondents): Classes with Coupling <= 1 31, Role 6]
Almost all the respondents who participated in this survey agreed that the
Relationship element is important in a class diagram. However, some
information related to the Relationship element should not be included in a
class diagram: classes with a coupling less than or equal to 1, and the Role
of a relationship. 31% of the respondents would exclude classes with
Coupling <= 1, because such classes seem unimportant and are seen more as
helper classes. 6% of the respondents chose to remove the Role of
relationships. The results are shown in Figure 34.
Inheritance | No of Respondents
Inherited Operations | 3
Package | No of Respondents
Separation of Class Diagram | 4
The number of classes in the three class diagrams ranges from 15 to 22.
Specifically in the Library System class diagram, 4 respondents drew
several lines to separate the GUI related classes from the classes that were
created by the software developer. They suggested that the class diagram
should be split into two different diagrams, keeping the GUI related classes
and the classes created for the system separate. One respondent mentioned
that the class diagram is too big, and another respondent suggested that a
class diagram should consist of only 5 to 7 classes. In psychology there is a
theory that humans can only focus on 7 ± 2 objects at the same time;
beyond that, there are too many objects to focus on.
Figure 37: Others Category
[Figure: Others category, classes chosen to be excluded (% of respondents): PacShell 47, GameEvent 16, MazeIterator 16, ObjId 9, EnvelopeAcceptor 6, Ghost 6, OperatorPanel 3]
Figure 38 shows the overall results of this category. In this figure, 47% of
the respondents chose to exclude the PacShell class from the Pacman Game.
The PacShell class consists of the main function, and this type of class is
common in programs. 16% of the respondents excluded the GameEvent and
MazeIterator classes from the Pacman Game class diagram. The GameEvent
class is excluded because the class looks like a helper class and it is only
related to one class (i.e. coupling = 1). The MazeIterator class also looks like
a helper class for another class (i.e. a helper class for the class Maze in the
Pacman Game).
9% of the respondents excluded the ObjId class. A possible reason for this
is that this class is a helper class that gives input to another class. Although
this class has many connections with other classes, respondents apparently
felt that this helper class does not need to be shown in a class diagram. 6% of
the respondents chose to exclude EnvelopeAcceptor and Ghost. The
EnvelopeAcceptor class is only a helper class that is used to transfer data to
another class. This could be the reason why the respondents decided to
exclude this class. For the Ghost class, this may be a misinterpretation of the
class name. Possibly the respondents did not understand the role of the ghost
in Pacman and thought that the Ghost class is a helper class or a dummy
class. In fact, the Ghost class is an actor in the Pacman Game. 3% of the
respondents chose to exclude the OperatorPanel class, probably because its
name suggests a GUI related class.
Hence, from our analysis we may say that most of the classes that have
been named in this category are either helper classes or GUI related classes,
which the respondents find unimportant. This statement is further supported
by Figure 40 in Part C (question C2A), which is described later.
[Table: Categories and keywords used to classify the answers; recoverable keywords include Association, Main Classes/Object/Purpose, Object related, Class functionality and responsibility, Reasoning, Abstraction, Concept, Data, All “starting” point, Generic Classes]
3. Class structure: Types of information about the structural design
of a class.
(a) Some respondents said that a class diagram should show what
kind of Data a system needs.
(b) Some respondents said that they only need Generic Classes to
understand a system.
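[Figure 39: Information that respondents look for when trying to understand a class diagram (% of respondents): Class Relationship 69, Class Semantic 50, Class Structure and Properties 34, Class Diagram High Level 31, Others 6]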
The results of this question are shown in Figure 39. The results clearly
show that class relationships are the most important information that
respondents search for when trying to understand a class diagram; 69% of
the respondents mentioned this. 50% of the respondents search for class
semantics such as meaningful class names, class functionality and behavior,
class properties and so on. It is possible that the respondents were trying to
understand the structure of the system by searching for the classes that are
related to the software domain. This means that class diagrams that present
the semantics of classes (such as good class names and class properties)
would provide a better understanding of the software design. About 34% of
the respondents were looking at class properties such as attributes,
operations, class interfaces and so on. This is followed by 31% of the
respondents, who were looking at the high level abstraction of the class
diagram, for example design concepts, design patterns and class overviews.
[Figure 40: Classes that should not be included in a class diagram (% of respondents): Helper Class 44, Library Class 25, Interfaces 22, Logging 9, Persistency Classes 3, Utility Classes 3, Technical 3, Not Related to Domain 3, Without Relationship 3]
As shown in Figure 40, almost half of the respondents (44%) suggested that
helper classes should not be included in a class diagram. However, it is not
easy to detect a helper class in a class diagram. Detection based on the
class name, operations, and attributes of the class may be used, but it is not
accurate, since helper classes do not have standard characteristics; they
depend on the system domain and on the software developer who
implements them. This result also confirms our result in the “Others”
category in Part B, because most of those classes are helper classes as well.
A quarter of the respondents (25%) did not want library classes to appear
in a class diagram. These library classes could make a class diagram more
complex and hard to understand. The classes that should be included in
a class diagram are only the ones that are created by the designer or
programmer. 22% of the respondents suggested that the interface class type
should not be included in the class diagram. A reason for this could be
that interface classes are GUI related and are not important for
understanding a system. 9% of the respondents indicated that the log class
should be excluded. The log class seems unimportant since it is common to
create a log for a transaction or activity in a system. Also, log classes are
normally linked to many other classes, which could make the class diagram
more complex. Infrastructure, technical, and framework classes, classes not
related to the domain, and classes without relationships were each
suggested to be left out of a class diagram by 3% of the respondents.
[Figure 41 (percentage of respondents): Private 66, Constructor/Destructor 56, Protected 41, Getters/setters 16, Constructor without parameter 9, Supporting/default function 9, Public 3, Overload function 3, GUI event handler 3]
Figure 41 shows that 66% of the respondents chose to exclude private operations from a class diagram. Constructors and destructors are also operations that are not needed in a class diagram in order to understand a system (56% of the respondents), while only 9% of the respondents said that they do not need constructors without parameters. 41% of the respondents mentioned that protected operations should be left out of a class diagram. A reason for this could be that a protected operation can be regarded as a private operation that is visible to some classes only. It was quite a surprise that few respondents (16%) suggested removing accessor and mutator methods (getters/setters) from the class diagram, since these operations can be integrated into other operations that a system actually needs. 9% of the respondents mentioned that supporting/default functions such as “toString” should be excluded from a class diagram. 3% of the respondents suggested that public operations, overloaded functions and GUI event handlers should be left out.
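As an illustration of how the most frequently mentioned exclusions could be applied automatically, here is a minimal sketch of an operation filter. The `Operation` record, its field names and the `SUPPORTING_FUNCTIONS` set are hypothetical (only “toString” is named by the respondents); a real tool would read the operations from a UML model. Which rules to include, and from what level of agreement, is a design choice; getters/setters are kept here because few respondents wanted them removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical, simplified model of a UML operation."""
    name: str
    visibility: str  # "public", "protected", "private" or "package"
    is_constructor: bool = False
    is_destructor: bool = False


# Visibilities that many respondents wanted to hide (private 66%, protected 41%).
HIDDEN_VISIBILITIES = {"private", "protected"}
# Supporting/default functions; only "toString" comes from the survey, the rest are assumptions.
SUPPORTING_FUNCTIONS = {"toString", "equals", "hashCode"}


def simplify_operations(operations: list[Operation]) -> list[Operation]:
    """Keep only the operations that pass the survey-inspired exclusion rules."""
    return [
        op for op in operations
        if op.visibility not in HIDDEN_VISIBILITIES
        and not (op.is_constructor or op.is_destructor)
        and op.name not in SUPPORTING_FUNCTIONS
    ]


if __name__ == "__main__":
    ops = [
        Operation("Book", "public", is_constructor=True),
        Operation("borrow", "public"),
        Operation("validateIsbn", "private"),
        Operation("toString", "public"),
    ]
    print([op.name for op in simplify_operations(ops)])  # ['borrow']
```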
D: Other(s)
As shown in Figure 43, about 9% of the respondents said that private fields should not be included in a class diagram. Only 3% of the respondents each suggested leaving out technical information, duplicates and user-interface information.
Figure 43: Other Information in Class Diagram that should be left out (categories: Private Fields, Technical, Duplicates, User Interface)
[Figure 44 (percentage of respondents): Relevant/Key/Important Classes 25, Full Hierarchy 25, Parent/abstract Classes 9, only 1 level 9, Leave out Library 9, Concept Related 3, <= 2 level 3]
Figure 44 shows that 25% of the respondents suggested that only relevant/key/important classes should be presented in a class diagram, regardless of whether they are part of an inheritance hierarchy, while another quarter of the respondents want the full inheritance hierarchy to be shown in a class diagram (answered ‘Yes’). 9% of the respondents only want the parent class to appear in the class diagram, and another 9% suggested that the hierarchy should consist of only one level of children or parents. 3% of the respondents mentioned that they only need inheritance that is concept related, and another 3% mentioned that a maximum hierarchy depth of 2 levels should be enough.
This result suggests that the respondents want the full hierarchy, but if key/important/relevant classes can be identified, the inheritance hierarchy can be simplified by showing only these classes.
[Bar chart (percentage of respondents): Meaningful Classnames 38; the other categories (Business/Domain Value, Relationships, Position of Class, Functionality/Responsibility, Size of Class, Simplified Classes, Highlighted Information) received 16, 16, 16, 9, 9, 6 and 3]
4.2.3.5 Question C5: If you try to understand a class diagram, which relationships do you look at first? (Example: dependencies, inheritance, associations, etc)
This question aims to find out which type of relationship the respondents look at first to understand a class diagram. Three types of relationships were provided as example answers. The answers are quite biased, because most respondents mentioned only these example types. None of the respondents mentioned other types of relationships such as composition, aggregation or realization.
Figure 46: The Type of Relationship in a Class Diagram that the Respondents Look at First (Association 41%, Dependency 19%, Inheritance 9%)
Figure 46 shows the results of this question: 41% of the respondents look for association relationships first, while 19% look for dependency relationships. Only 9% of the respondents look for inheritance relationships first. Several respondents named more than one relationship, and some ranked the three relationships in order of importance. We counted only the first answer given: if someone answered “association and dependency”, for example, we counted only the first relationship, association. This result indicates that the association relationship is the most important relationship in a class diagram, since it shows how classes are related to each other.
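The coding rule described above (count only the first relationship a respondent mentions, and report counts as a percentage of all respondents) can be sketched as follows. The reported values for this question (41, 19 and 9) add up to 69%, which suggests that respondents who named none of the example relationships stay in the denominator; the sketch assumes this. The word stems, the half-up rounding and the sample answers are our own assumptions, and the answers are made up, not survey data.

```python
import math
from collections import Counter
from typing import Optional

# Word stems of the three example relationships in question C5 (stems are an assumption).
RELATIONSHIP_STEMS = {
    "associat": "association",
    "dependenc": "dependency",
    "inherit": "inheritance",
}


def first_relationship(answer: str) -> Optional[str]:
    """Return the relationship type that is mentioned first in a free-text answer."""
    text = answer.lower()
    found = {label: text.find(stem) for stem, label in RELATIONSHIP_STEMS.items() if stem in text}
    if not found:
        return None  # the answer mentions none of the example relationships
    return min(found, key=found.get)


def percentages(answers: list[str]) -> dict[str, int]:
    """Share of all respondents per first-mentioned relationship, rounded half up."""
    counts = Counter(first_relationship(a) for a in answers)
    counts.pop(None, None)  # such respondents are still counted in len(answers)
    return {label: math.floor(100 * n / len(answers) + 0.5) for label, n in counts.items()}


if __name__ == "__main__":
    answers = [  # made-up answers, not the survey data
        "association and dependency",
        "Inheritance first, then associations",
        "dependencies",
        "Associations, then inheritance",
        "I look at the class names first",
    ]
    print(percentages(answers))  # {'association': 40, 'inheritance': 20, 'dependency': 20}
```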
Figure 47: The Features that a Tool Should have for Simplifying UML Class Diagrams (percentage of respondents: Hide/Unhide Information 31, Drill up/down 22, Show more Information 16, Navigation/Change Layout 13, Give Advice 6; 3 each for Generate from Source Code, Generate from XMI files, Classify Classes in different Importance, UI Classifier, Generate Source Code, Visual indication of data, and Change Log)
Figure 47 shows the results of this question. The respondents mainly want a tool that can hide/unhide information (31% of the respondents). A related feature is drilling up/down: when you drill up, a class diagram shows less information, and when you drill down, it shows more. 22% of the respondents want such a feature. The two features differ in that the first lets you manually mark the information that you do not want, whereas the second automatically reduces the information as you zoom out.
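The difference between the two features can be sketched as two independent filters: a manually maintained set of hidden elements, and a zoom level that automatically determines how much detail is shown. The three detail levels and the `ClassView` structure below are hypothetical choices for illustration; the respondents did not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class ClassView:
    """Hypothetical view model of one class box in a class diagram."""
    name: str
    public_members: list[str] = field(default_factory=list)
    other_members: list[str] = field(default_factory=list)


def visible_members(cls: ClassView, level: int, hidden: set[str]) -> list[str]:
    """Return the members shown for a class.

    level 0: class name only (fully drilled up)
    level 1: public members
    level 2: all members (fully drilled down)
    `hidden` holds members the user has hidden manually; it applies at every level.
    """
    members: list[str] = []
    if level >= 1:
        members += cls.public_members
    if level >= 2:
        members += cls.other_members
    return [m for m in members if m not in hidden]


if __name__ == "__main__":
    book = ClassView("Book", public_members=["borrow()", "getTitle()"], other_members=["isbn"])
    print(visible_members(book, 2, hidden={"getTitle()"}))  # ['borrow()', 'isbn']
    print(visible_members(book, 0, hidden=set()))           # []
```

Keeping the two filters separate means a manually hidden element stays hidden no matter how far the user drills down.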
16% of the respondents want to see more information about a class, for example by hovering over it in a class diagram. This could show different kinds of information, such as the number of relationships the class has. Another feature that many respondents want (13% of the respondents) is a changeable class diagram layout that improves navigation; one option could be to resize the layout of the class diagram. 6% of the respondents want the tool to give advice on how to improve the class diagram. Each of the other features shown in this figure was requested by only one respondent (3%).
Thus, these results show that the respondents want a tool in which users can manipulate the level of detail of a class diagram by hiding/showing information or by zooming out/in.
4.3 Discussion
In this section we discuss the results and findings presented in the previ-
ous section. The discussion is divided into five subsections: Respondents’
Background, Class Properties, Class Role and Semantics, Class Diagram
Simplification Tool Features, and Threat of Validity.
important element to show the structure of classes in a class diagram. Without relationships, a class diagram would only be a list of classes that does not show which classes interact with each other. In terms of inheritance, a quarter of the respondents need the full inheritance hierarchy to be presented, but another quarter mentioned that only relevant or important classes should be presented. The largest group of respondents (41%) looked at association relationships first, which shows that the association relationship is important in class diagrams. However, this result may not be accurate, since the respondents only answered with the examples given in the question.
4.3.3 Class Role and Semantics
One of our useful discoveries in this study is the importance of class roles and semantics in a class diagram. Class roles based on class names are important because, from our observation, the respondents seemed to try to understand a system based on class names and roles. Not only class names convey the role of a class; operation names and attribute names are also crucial. This became apparent when we asked about the important criteria in a class diagram for understanding a system: many of the respondents mentioned ‘meaningful’ class names, business and domain value, and functionality or responsibility. From these meaningful class names, they tried to understand the structure of the system and the interaction between its classes. Using this information, they can get an overall idea of how a system works and some hints about the functionality of the classes in a class diagram.
In this survey we also discovered that the classes that should be left out of a class diagram are helper classes, library classes and interface classes. The most frequent suggestion was to leave out helper classes (44% of the respondents). Nevertheless, it is not easy to identify helper classes automatically based on the class name or other information, because they can only be identified manually by the software developer, and the results differ with the developer's experience. Helper classes could possibly be detected automatically if criteria for identifying them were available.
whether to choose software architect or software designer because both roles
were used in different terms but with the same meaning.
4.4 Conclusion
This study presented a survey on how to simplify a class diagram without affecting developers' understanding of a system. In particular, the questions in this survey were about what information should be left out of a class diagram and what important information should remain. 32 software developers from the Netherlands participated in this survey.
From the results, it is not surprising that the most important element in a class diagram is the relationship. Class relationships are important to show the structure of a system. The type of relationship that developers look at first is association, followed by dependency. In this survey we discovered that class roles and semantics are important, because many of the respondents look for meaningful class names and class roles in order to get a high-level understanding of how a system works. This means that meaningful class names, operation names and attribute names are important to show the functionality or responsibility of the classes in a system.
5 Conclusions
In this chapter we present a summary of the findings of this study. Then, we present several recommendations based on our analysis. Next, we describe future work that can be done after this study. Finally, we present our conclusions based on the analysis of the two questionnaires.
5.1 Summary of Findings
In this section we present a summary of our findings and the similarities and contradictions between the responses to the two questionnaires.
In terms of operations, we discovered that in both questionnaires the respondents preferred public operations, since these operations are not restricted to one class. In the structured questionnaire, the metric that counts the number of public operations scored 25 points and was the most important metric. In the non-structural questionnaire, most of the respondents suggested leaving out private and protected operations, hence our statement that the respondents preferred public operations. Another discovery in the non-structural questionnaire is that respondents also suggested leaving out constructors, specifically constructors without parameters. This is because such a constructor is a default function provided by development tools when a class is created.
One of our useful discoveries in this study is the importance of class names. We found in the non-structural questionnaire that the respondents seemed to try to understand a system based on class names and roles. This was shown when we asked about the important criteria in a class diagram for understanding a system. Many respondents mentioned that there must be a story around the class diagram and that it should show the functionality and flow of the system. The most frequently mentioned criterion was that a class diagram needs to contain “meaningful” class names and must be domain related. This is further supported by the structured questionnaire: in question 14 of part B we discovered that “Domain related” was one of the most important keywords that the respondents mentioned. Also, the class names were an influential factor in part C of the structured questionnaire.
Another useful discovery is that the respondents preferred the reverse engineered class diagram in the structured questionnaire. However, the source code must correspond to its forward design in order to obtain a good reverse engineered class diagram. This may explain why the respondents preferred the reverse engineered design of Pacman over the reverse engineered design of the Library system: the Library system might not correspond to its forward design.
5.2 Recommendations
We recommend highlighting the classes in a class diagram based on our
analysis. These highlighted classes can be used to advise the software devel-
oper/maintainer as a hint on which classes should be included or excluded
in a simplified class diagram.
We also recommend that all classes in a class diagram should have a meaningful name that reflects the functionality or features provided by the class. This also applies to the names of attributes and operations. With meaningful names, software developers can understand a system better and faster, because they can predict the flow of the system and the functionality of each class.
Also, from this study we have discovered what information should be left out to simplify a class diagram and which metrics are important. Using this information, a simplified class diagram could be produced. We propose to validate the resulting class diagram with an industrial case study to assess how suitable the simplified class diagram is for practical use. This proposed study may discover other information that is needed in a class diagram and other information that can be excluded. It would also be interesting to include other metrics that we have not chosen and check whether they are important.
From the results, we found that class role and responsibility are important indicators in a class diagram. The role and responsibility of a class are detected by using the class name, operation names and attribute names. We suggest a study on which names (of classes, operations and attributes) software developers find important or meaningful for understanding a system. The results of such a study could be used to predict the important classes in a class diagram.
5.4 Conclusions
In this study we created two different questionnaires to find out what kind of information should be left out and which metrics are important in a class diagram in order to simplify it. One of the two questionnaires was conducted online, and we received 25 complete responses. The other questionnaire was answered by 32 software developers from the Netherlands. We have discovered the important elements that should be included in a class diagram.
The respondents also mentioned that they would exclude GUI related information and library classes. This basically means that software developers only want classes that are created by the designer. Helper classes and similar classes should also be omitted from a class diagram. However, it is not easy to detect a helper class.
References
[1] Enterprise Architect. http://www.sparxsystems.com.au/.
[11] A. Craig, A. Dinardo, and R. Gillespie. Pacman game. http://code.
google.com/p/tb-pacman/.
[24] A. Parasuraman. Marketing Research. Addison-Wesley Publishing
Company, second edition, 1991. http://www.sciencebuddies.org/
science-fair-projects/project_ideas/Soc_survey.shtml.