Measuring Technical Debt Interest Risk

This research paper presents a metric called Interest Generation Risk Importance (IGRI) to quantify the risk of generating technical debt interest, which refers to the additional maintenance costs incurred by technical debt items in software systems. The study validates this metric by demonstrating its effectiveness in prioritizing technical debt remediation and assessing the impact of new code on the risk of interest generation, concluding that new code typically poses less risk than legacy code. The findings emphasize the importance of managing technical debt to maintain software sustainability and reduce maintenance costs.

SN Computer Science (2021) 2:12

https://doi.org/10.1007/s42979-020-00406-6

ORIGINAL RESEARCH

The Risk of Generating Technical Debt Interest: A Case Study


Georgios Digkas¹ · Apostolos Ampatzoglou² · Alexander Chatzigeorgiou² · Paris Avgeriou¹ · Oliviu Matei³ · Robert Heb³

Received: 1 July 2020 / Accepted: 13 November 2020 / Published online: 16 December 2020
© The Author(s) 2020

Abstract
Technical Debt (TD) interest refers to the extra maintenance costs incurred by the very existence of TD items in a system. The
generation of TD interest can make or break a system: too little interest and the effect of TD is negligible; too much interest
and the system becomes unsustainable. In this paper, we consider the generation of interest as a risk and present a metric to
quantify this risk. Subsequently, we validate this metric in two ways. First, we explore whether the metric can be effectively
used to prioritize TD remediation. Second, we investigate if adding new code reduces the risk of interest generation. The
results of the study suggest that: (a) the proposed risk management metric is capable of efficiently prioritizing TD items; and
(b) that the new code that is introduced in the system is usually less risky for producing interest, compared to legacy code.

Keywords Technical debt · Maintainability · New code · Clean code

This article is part of the topical collection “Interaction between Energy Consumption, Quality of Service, Reliability and Security, Maintainability of Computer Systems and Network” guest edited by Erol Gelenbe.

* Georgios Digkas, g.digkas@rug.nl
Apostolos Ampatzoglou, a.ampatzoglou@uom.edu.gr
Alexander Chatzigeorgiou, achat@uom.edu.gr
Paris Avgeriou, paris@cs.rug.nl
Oliviu Matei, oliviu.matei@holisun.com
Robert Heb, robert.heb@holisun.com

¹ Institute of Mathematics and Computer Science, University of Groningen, Groningen, Netherlands
² Department of Applied Informatics, University of Macedonia, Macedonia, Greece
³ Holisun SRL, Baia Mare, Romania

Introduction

Technical Debt is a software engineering metaphor that draws an analogy between shortcuts in development and taking out a loan [14]. In particular, the metaphor considers that a software development organization (intentionally or unintentionally) limits the development time/resources through shortcuts, and thus saves a specific amount of money (amount of loan–TD Principal) [1, 2]. This benefit comes with an associated cost, as the product is released with sub-optimal quality, leading to the occurrence of maintenance costs [18]; such costs are termed TD Interest and include bug fixing, understanding the existing code, adding new features, etc. [1, 2]. While TD Principal is deterministic, TD interest is probabilistic: we are not sure how frequently and to what extent a software artifact will change in the upcoming versions (thus generating interest). The probability of an artifact to generate interest is termed TD Interest Probability [28].

The generation of interest plays a crucial role for the impact of TD on software maintenance. Modules that are rarely maintained do not cause real problems along software evolution even if they suffer from high TD; paying back the TD is in such cases unnecessary. On the contrary, modules with TD that are often maintained can cause severe overhead when performing future changes. Thus, we consider the generation of interest as a risk that threatens software maintainability.

In this study, we propose a metric, namely Interest Generation Risk Importance (IGRI), to estimate the risk of interest generation. According to Barry Boehm [9], the importance of a risk can be calculated as the product of its impact and likelihood to occur. In the case of IGRI, the likelihood of the risk corresponds to interest probability, whereas the impact to the amount of technical debt interest. The proposed metric can be useful in a number of ways; in this study, we validate two of them. The first is to assist TD Prioritization, i.e., the priority to refactor a software artifact [22]. Artifacts that pose a higher risk to generate TD interest would be more urgent for refactoring to prevent excessive maintenance costs. The second is to assess the effect of writing clean new code on the technical debt evolution of the system. If new code is less risky to generate interest, the sustainability of the system can be improved by the addition of clean new code. The clean code paradigm is supported in the literature as an alternative to refactoring for the improvement of software quality [23], and it tends to be preferable from the developers’ side, as a means to control the amount of technical debt in the system [5].

The research work reported in this study has been conducted in the context of the SDK4ED¹ project, funded by the European Union’s Horizon 2020 research and innovation programme. The goal of the project is to investigate trade-offs between optimizations applied to improve Technical Debt, Security, and Energy dissipation in software intensive systems. Furthermore, the SDK4ED platform aims at assisting decision-making with respect to investments on software improvements. The assessment of artifacts which pose a high risk of generating TD interest outlined in this study is aligned with the overall goal of the project to narrow down the recommended refactoring opportunities. Choosing among optimizations to mitigate software vulnerabilities detected through static analysis [32], to improve performance [30, 31] and energy consumption [37], and to improve software maintainability [3, 11] is a non-trivial task. Research has proved the existence of interrelations between these qualities [25, 33, 34] rendering the extraction of the best possible sequence of software refactoring subject to a Multi-Criteria Decision-Making (MCDM) analysis which has been implemented in the SDK4ED platform.

The rest of the paper is organized as follows. In “Related Work and Background Information”, we present: (a) related work on technical debt prioritization; (b) background work on software risk management. The framework that we use for calculating Technical Debt Interest and Interest Probability, as well as the proposed metric are introduced in “Assessing the Risk of Generating Interest”. In “Case Study Design”, we present the empirical design through which we explore the two aforementioned usage scenarios of the metric. In “Results”, we answer the research questions. In “Discussion”, we discuss the main findings, whereas in “Threats to Validity”, the main threats to validity. Finally, in “Conclusions”, we conclude the paper.

Related Work and Background Information

In this section, we present related work and background information necessary for understanding this study. In particular, in “Technical Debt Prioritization”, we present related work: i.e., studies on technical debt prioritization; whereas in “Software Risk Management”, we discuss background concepts from the software risk management literature.

Technical Debt Prioritization

The process of TD prioritization ranks identified TD items, according to certain predefined rules to support deciding which TD items should be repaid first and which TD items can be tolerated until later releases [22]. According to Li et al. [22], TD Prioritization has been studied in 18% of the TD research corpus.

TD prioritization methods can be discussed under two perspectives: based on the concepts used as inputs, as well as, based on the approach used for prioritization per se. With respect to inputs, according to Seaman and Guo [28], TD prioritization can be performed, either based on Technical Debt principal, Technical Debt interest, or Technical Debt interest probability. With respect to approach, existing methods for TD prioritization can be categorized into four main classes.

– The first class uses cost/benefit analysis, suggesting that if resolving a TD item can yield a higher benefit than cost, then this TD item should be repaid. TD items with higher cost/benefit ratios of repayment should be repaid first [29].
– The second class suggests that TD items that are more costly to resolve should be repaid first [20].
– The third class uses portfolio management. In these approaches, TD items along with other new functionalities and bugs are considered as risks and investment opportunities (i.e., assets). “The goal of portfolio management is to select the asset set that can maximize the return on investment or minimize the investment risk [17]”.
– The final class suggests that TD items incurring the higher interest should be repaid first [28].
¹ https://sdk4ed.eu/

Software Risk Management

Risk management is a software engineering practice (involving processes, methods, and tools) that: (a) assesses continuously what can go wrong (risks); (b) determines what risks are important to deal with; and (c) implements strategies to deal with those risks [7]. According to Boehm, there are three main categories of risks: project risks, product risks, and business risks [10]. Among these categories, the generation of Technical Debt Interest falls in the product risk category: i.e., “risks that affect the quality or performance of the software being developed”. In this paper, we focus on Risk Analysis (see Sommerville [36]): risk analysis aims at assessing the likelihood and consequences of all risks identified in a system. Therefore, the rest of this sub-section focuses on how risks can be assessed. In the literature, there are two main schools for risk assessment: categorical risk assessment and continuous risk assessment.

Categorical risk assessment. According to Sommerville [36], risk analysis relies on judgement and experience to find the probability of a risk (rare, unlikely, possible, likely, or almost certain) and the effects of the risk (catastrophic, major, moderate, minor, or negligible). Based on this, the project managers generate a table according to seriousness of risk and update it during each iteration of the risk process, as shown in Fig. 1.

Fig. 1 Risk assessment matrix

Continuous risk assessment. In a seminal work of software project management, Boehm [10] introduced the basic principles of software risk management. As part of risk analysis, he suggests that risk exposure (also termed as risk impact) can be calculated as the product of the probability of unsatisfactory outcome and the loss caused by the unsatisfactory outcome.

Assessing the Risk of Generating Interest

This section focuses on tailoring the concepts covered in “Software Risk Management” to fit the technical debt metaphor. IGRI is meant to quantify the impact of a possible change in a specific artifact that suffers from technical debt, by assessing the probability of this artifact to change and the amount of interest that is going to be generated, upon such a tentative change. The rationale of the metric relies on the risk importance formula, as proposed by Boehm [8], which can be formulated as:

$$\mathrm{IGRI} = \mathrm{Interest} \times \mathrm{Interest\ Probability} \tag{1}$$

The rest of this section describes the way that TD Interest and TD Interest Probability are calculated.
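A minimal Python sketch of Eq. 1 may help to see how the product of likelihood and loss orders artifacts. The artifact names and values below are hypothetical and serve only as an illustration; the function name `risk_exposure` is likewise an assumption and not part of any tool mentioned in this paper.

```python
def risk_exposure(probability: float, loss: float) -> float:
    """Risk exposure as the product of likelihood and loss (cf. Eq. 1)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability * loss


# Hypothetical artifacts: name -> (interest probability, interest).
artifacts = {
    "A": (0.50, 40.0),
    "B": (0.10, 90.0),
    "C": (0.25, 8.0),
}

# Rank from most to least urgent by the product, i.e., by IGRI.
ranking = sorted(
    artifacts,
    key=lambda name: risk_exposure(*artifacts[name]),
    reverse=True,
)
print(ranking)
```

In this toy input, artifact B carries the highest interest but changes rarely, so it ranks below artifact A, whose interest is lower but far more likely to be incurred.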


Technical Debt Interest

In this study, we calculate Technical Debt Interest based on the FITTED framework [2]. The estimation of Technical Debt Interest is a challenging endeavor as interest refers to the “additional” maintenance costs, incurred by inefficiencies in software. The nominal software maintenance effort, i.e., the effort that would have been required in case the system had zero technical debt, is unknown. The FITTED methodology, which has been proposed [2] and empirically validated in our previous work [4, 38], attempts to calculate interest by estimating the “sub-optimality” of any given software artifact. FITTED assumes that any software artifact (or an entire system) has an actual implementation, and a hypothetical optimal one (in terms of maintainability). Maintaining the optimal system would require less effort than maintaining the actual system (see Fig. 2).

Fig. 2 Increased maintenance effort for technical debt items [3]

Despite the fact that a system can by no means be characterized as globally optimal, based solely on the optimization of some structural characteristics, several studies in the area of multi-objective software optimization aim at extracting an optimal sequence of refactoring operations that improve the software quality [24]. As shown in Fig. 2, adding a new feature A to the optimal system would need a certain effort, noted as Effort(optimum), whereas adding the same feature to the actual system necessitates a larger effort, noted as Effort(actual). The difference between these two efforts represents the Technical Debt Interest that is accumulated during this maintenance activity. The overarching assumption of FITTED is that maintenance effort is inversely proportional to the maintainability of the system (or of an individual artifact); see Eq. 2:

$$\mathrm{Effort} = \alpha \times \frac{1}{\mathrm{maintainability}} \tag{2}$$

Given Eq. 2, the ratio of the maintenance effort for the optimal system (which is unknown) over the effort for the actual system can be expressed as:

$$\frac{\mathrm{Effort}_{opt}}{\mathrm{Effort}_{act}} = \frac{\mathrm{maintainability}_{act}}{\mathrm{maintainability}_{opt}} \tag{3}$$

For convenience, we call the ratio in the right-hand side of Eq. 3 Maintainability Level of the actual artifact, as it expresses its relative quality compared to its hypothetical optimal implementation. Thus, the effort for maintaining the optimal system can be expressed as:

$$\mathrm{Effort}_{opt} = \mathrm{Effort}_{act} \times \mathrm{MaintainabilityLevel}_{act} \tag{4}$$

Finally, based on its definition, Technical Debt Interest can be calculated using the difference between the actual and the optimal effort, as follows:

$$\mathrm{TD\ Interest} = \mathrm{Effort}_{act} - \mathrm{Effort}_{opt} = \mathrm{Effort}_{act} \times (1 - \mathrm{MaintainabilityLevel}_{act}) \tag{5}$$

Maintainability. Although no single function can capture all aspects of quality, for the sake of simplicity, we assume that the hypothetical ‘optimal’ system is the one that optimizes a certain fitness function assessing the quality of software (e.g., in terms of complexity, cohesion, coupling, etc. or any aggregate form of selected qualities). To enable the calculation of the aforementioned maintainability level, we first identify a set of artifacts (e.g., classes, packages, and systems [4]) that can be considered (structurally) similar, i.e., in terms of lines of code, number of methods, cognitive complexity, etc. Next, we calculate the optimal value of selected metrics among the set of similar artifacts. These best metric scores are assumed to characterize the hypothetical ‘optimal’ which the artifact under study could potentially reach. A simplified example is outlined in Fig. 3.

Fig. 3 Notion of hypothetical optimal among similar artifacts

Then, we calculate the average ratio of the metric score of the artifact under study, compared to the optimal value, yielding its maintainability level. The metrics that we have selected to use in our study for quantifying maintainability (see Table 1) belong to well-known metric suites [12, 21]. The metric selection was based on a secondary study by Riaz et al. [26], which reported on a systematic literature review (SLR) aimed at summarizing software metrics that can be used as maintainability predictors.

Table 1 Maintainability proxy metrics

Property            Metric   Description
Inheritance (Inh)   DIT      Depth of Inheritance Tree: Inheritance level number, 0 for the root class.
                    NOCC     Number of Children Classes: Number of direct sub-classes that the class has.
Coupling (Cpl)      MPC      Message Passing Coupling: Number of send statements defined in the class.
                    RFC      Response for a Class: Number of local methods plus the number of methods called by class methods.
                    DAC      Data Abstraction Coupling: Number of abstract types defined in the class.
Cohesion (Coh)      LCOM     Lack of Cohesion of Methods: Number of disjoint sets of methods in the class.
Complexity (Com)    CC       Cyclomatic Complexity: Average cyclomatic complexity of methods in the class.
                    WMPC     Weighted Methods per Class
Size (Size)         SIZE1    Lines of Code: Number of semicolons in the class.
                    SIZE2    Number of Properties: Number of attributes and methods in the class.

Maintenance Effort. Since the evolution of software cannot be predicted under normal circumstances, it is not possible to foresee what kind of modifications (e.g., bug fixes, introduction of new features, refactoring, etc.) will be made in a system during future releases. Hence, we follow a simple, yet relatively reasonable approach, and base our assessment of future maintenance effort on historical data. In particular, to consider past effort spent on maintenance activities for each artifact, we record the average lines of code added/deleted/modified between all pairs of successive versions of a system (code churn). Consequently, we project this average maintenance effort per version to future releases of the analyzed artifact. This strategy has been used in a variety of studies on software evolution [13, 19].

Technical Debt Interest Probability

Interest probability is calculated based on past maintenance data. To this end, we use the Percentage of Commits in which a Class has Changed (PCCC) metric [6]. Despite the fact that there is a relation between Maintenance Effort and PCCC, the two measures correspond to different views of the same phenomenon, in the sense that Maintenance Effort captures lines of code (i.e., the average extent of change), whereas PCCC captures a number of commits (frequency of changes). Thus, we consider them independent and suitable for being used in the same calculation.
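The calculations of this section (Eqs. 1 and 5, the maintainability level, code churn, and PCCC) can be sketched in Python as follows. The sketch rests on assumptions that the text does not fix: lower metric values are taken as better, the optimal value of each metric is the minimum among the similar artifacts, and the maintainability level is the mean of the optimal-to-actual ratios. All function names and numbers are hypothetical and do not reproduce the FITTED tooling; effort is expressed in lines of churn rather than in monetary units.

```python
from statistics import mean


def maintainability_level(artifact: dict, similar: list) -> float:
    """Mean ratio of optimal to actual metric score (assumes lower is better)."""
    ratios = []
    for name, actual in artifact.items():
        # Optimal value among the artifact itself and its similar artifacts.
        optimal = min([actual] + [peer[name] for peer in similar])
        ratios.append(1.0 if actual == 0 else optimal / actual)
    return mean(ratios)


def td_interest(effort_actual: float, level: float) -> float:
    """Eq. 5: interest is the effort in excess of the optimal effort."""
    return effort_actual * (1.0 - level)


def pccc(commits_touching: int, total_commits: int) -> float:
    """Share of commits in which the class has changed."""
    return commits_touching / total_commits if total_commits else 0.0


def igri(interest: float, probability: float) -> float:
    """Eq. 1: risk importance as interest times interest probability."""
    return interest * probability


# Hypothetical class under study and two structurally similar classes.
artifact = {"CC": 4.0, "LCOM": 10.0}
similar = [{"CC": 2.0, "LCOM": 5.0}, {"CC": 8.0, "LCOM": 20.0}]

level = maintainability_level(artifact, similar)   # 0.5
effort = mean([30, 10, 20])                         # average churn per version
interest = td_interest(effort, level)               # 10.0 lines of extra effort
probability = pccc(commits_touching=5, total_commits=20)  # 0.25
print(level, interest, probability, igri(interest, probability))
```

Including the artifact itself when searching for the optimal value keeps every ratio at or below 1, so the maintainability level stays within [0, 1] and the interest of Eq. 5 never becomes negative.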


Case Study Design

The case study is designed and reported based on the linear-analytic structure as described by Runeson et al. [27]. This section presents the study design in detail.

Research Goals and Questions

The goal of this study, as mentioned in the Introduction section, is twofold: (a) to assess whether the proposed metric can perform effective prioritization of TD items; and (b) to examine the risk of interest generation posed by new code. Based on the above, we have set two research questions that correspond to these two goals:

RQ1: Is IGRI able to prioritize TD items similarly to an expert? To answer this research question, we calculate IGRI for all classes of a project and we record the urgency to fix TD (specifically to improve the quality of individual classes), based on the expert opinion of software engineers. A correlation analysis between the IGRI values and the expert opinions could validate or invalidate IGRI as a suitable prioritization indicator. In case IGRI is able to resemble the expert opinion with a strong correlation, we would be able to resolve an important scalability problem in TDM, since experts cannot afford to assess hundreds or even thousands of artifacts manually. Using IGRI, they would instead get automated suggestions on which TD items to check first for refactoring opportunities.

RQ2: Does new code pose a lower risk (in terms of IGRI) for generating TD interest? This question is refined through two sub-questions to distinguish between the quantity and quality of new code: (RQ2.1) Is IGRI of a component related to the amount of new code introduced to that component over time? and (RQ2.2) Is IGRI of a component related to the average quality of new code introduced to that component over time?

This research question focuses on new code introduced over time, which, as explained in “Introduction”, can be a promising technical debt reduction approach, if the new code is of high quality. To answer this research question, we need to first separate new from modified code in each commit, and then capture the extent as well as the quality of the new code. As a second step, we need to perform the FITTED analysis, and calculate IGRI. Finally, a correlation analysis will be performed to answer this research question. The outcome of the analysis can inform researchers and practitioners whether the introduction of (clean) new code can lead to a more sustainable evolution, that generates less technical debt interest. We conjecture that the more and the cleaner the new code that is added in a component, the less the risk for that component to produce interest. Subsequently, one could advocate the writing of clean new code as a way to decrease the risk of generating interest, effectively reversing the negative effect of TD.

Cases and Units of Analysis

This study is an embedded multiple case study, in which the case is an existing software system (written in Java), and the units of analysis are its classes. The system that we have analyzed is MaQuali that is developed by Holisun SRL. MaQuali is a quality management system (ISO 9001) supporting the handling of business processes. It consists of approx. 100 classes (more than 150,000 lines of code) and has been maintained for more than a decade. The system consists of six main modules, managing the following entities: (a) fiches of progress, (b) actions to be taken, (c) documents involved in ISO quality control, (d) planning, (e) useful information, and (f) milestones.

Data Collection

To answer the aforementioned research questions, we have performed the following process. In the first step, we initially analyzed the MaQuali source code with the SDK4ED toolkit² and quantified IGRI (Interest Generation Risk Importance) for every class of the software. Then, we aggregated the results at the level of packages. Next, we randomly picked³ 10 packages and asked Holisun’s engineers to provide a ranking of these packages in terms of maintenance risk. This process has led us to the dataset outlined in Table 2. The first column corresponds to the name of the package, the second column to interest (per commit), the third to interest probability, and the fourth to IGRI.

² https://sdk4ed.eu.
³ The selection process was as follows. First, we sorted the packages by IGRI, and then, we have demarcated 10 areas (bins), each one containing 10% of the packages. Finally, we selected 10 software packages, randomly picked from each one of the 10% bins.

Table 2 TD interest assessment of MaQuali packages

Package             Interest ($)   Interest probability   IGRI
fr.icms.db          57.67          0.50                   28.83
fr.icms.sorters     0.12           0.00                   0.00
fr.icms.models      16.22          0.25                   4.05
fr.icms.streams     0.17           0.12                   0.02
fr.icms.mail        16.50          0.50                   8.25
fr.icms.renderers   0.46           0.25                   0.11
fr.icms.printing    1.71           0.12                   0.21
fr.icms.graph       0.70           0.25                   0.17
fr.icms.ui          9.97           0.25                   2.49
fr.icms.os          4.55           0.25                   1.13

As a second step, we asked the software engineers of Holisun that focus on MaQuali maintenance to rank the aforementioned packages, based on the following question: “Please rank the aforementioned packages (ties are acceptable; however, not preferable) in terms of the risk that their maintenance might lead to extreme delays. As maintenance, please consider the time that you spend for adding a new requirement, for fixing a bug, etc. In this question, consider not only the time required for one maintenance action, but also how frequently you need to maintain them. Assign 1 to the package that is the least risky and 20 to the most risky packages”. Packages have been shuffled for each respondent, while the assessment of each package by the SDK4ED platform was hidden from the engineers. The answers of the respondents (five software engineers) have been aggregated.

As a third step, we have performed the analysis of new code technical debt, similarly to our earlier work [15]. For a software system evolving through a number of revisions, we track new methods introduced either in entirely new packages or in existing classes. We then compute the quality of these new methods in terms of their technical debt by mapping identified technical debt issues to the line range of these methods. Note that we use the concept of TDdensity, which is the technical debt of these methods normalized over their size in lines of code. TDdensity enables the comparison of technical debt between artifacts of different sizes (such as new methods vs. the already existing system).

The process for analyzing git repositories (such as the repository of MaQuali) is briefly outlined in the following phases and individual steps:

Phase 1: Retrieval of commits

1. First, the git history for the project under study is retrieved from its master branch.
2. All commits are sorted to form a time-series of revisions that have been performed on the source code. In case of commits with more than one parent, we have extracted the nodes leading to the longest path between the commit node under examination and the start node (i.e., the only node with no parent). This choice avoids any (chronological) inconsistencies among revisions, and at the same time, the longest path yields the largest number of commits to be analyzed, yielding a higher granularity for the analysis.
3. To reduce the computation time, a filtering step is applied by ignoring transitions between successive commits that do not involve any changes to Java files.

Phase 2: Mapping of technical debt issues to methods

To map the identified technical debt issues to the class methods of each revision, we perform the following steps:

1. First, for each revision, we retrieve all technical debt issues by performing the corresponding query to the SonarQube database.
2. Next, we map the identified technical debt issues to the methods of the corresponding revision. This is performed by matching the line in which each technical debt issue is reported by SonarQube with the method containing that line.

Phase 3: Tracking new methods

We identify the introduction of new methods and the associated TDdensity as follows:

1. For the new files of each revision (obtained from git history), we obtain their representation in the form of an Abstract Syntax Tree (AST)⁴. For each new file, we extract all its methods from the AST representation and then tag all these methods as new.
2. For the modified files of each revision, we track new methods in each transition with the help of the Gumtree Spoon AST Diff tool⁵.

⁴ The AST is obtained through the Eclipse Java Development Tools (JDT).
⁵ https://github.com/SpoonLabs/gumtree-spoon-ast-diff.

Phase 4: Calculating the contribution of new methods to the change in the system’s TDdensity

Finally, we need to calculate, for each revision in the system’s history, the contribution of new methods to the change of the system’s TDdensity. Let us consider a transition from revision t-1 to revision t. The contribution of new methods to the change in the TDdensity of the system is obtained with the following formula:

$$\text{Contribution of new methods} = \Delta TD_{density}(new) = \frac{TD_{t-1} + TD_{new(t)}}{LOC_{t-1} + LOC_{new(t)}} - TD_{density}(t-1) \tag{6}$$

Based on the aforementioned process, the following dataset has been developed: each row represented a class, whereas the columns held the following information:

[V1] Package Name
[V2] IGRI
[V3] Expert opinion of Holisun Software Engineers on the Risk of the Class
[V4] Average LoC added as new code in the history of the package
[V5] Average contribution of new code in the TDdensity of the package.
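The contribution behind [V5] follows Eq. 6 and can be sketched in Python as below. The function names and the numbers are hypothetical; technical debt is assumed to be expressed in minutes of remediation effort, and TDdensity(t-1) is assumed to equal TD(t-1) divided by LOC(t-1).

```python
def td_density(td: float, loc: int) -> float:
    """Technical debt normalized over size in lines of code."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return td / loc


def new_code_contribution(td_prev: float, loc_prev: int,
                          td_new: float, loc_new: int) -> float:
    """Eq. 6: change in system TD density caused by the new methods alone."""
    density_with_new = td_density(td_prev + td_new, loc_prev + loc_new)
    return density_with_new - td_density(td_prev, loc_prev)


# Hypothetical revision t-1: 500 minutes of debt in 10,000 lines (density 0.05).
clean = new_code_contribution(500.0, 10_000, td_new=10.0, loc_new=1_000)
dirty = new_code_contribution(500.0, 10_000, td_new=100.0, loc_new=1_000)
print(clean < 0, dirty > 0)
```

A negative value indicates that the new methods are cleaner than the existing system and lower its TDdensity, whereas a positive value indicates the opposite; this sign is what the categorical version of [V5] captures.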

Data Analysis

The aforementioned data have been analyzed using descriptive statistics and pairwise Spearman correlation. To answer RQ1, we use the pair [V2]–[V3]; for RQ2.1, the pair [V2]–[V4]; and for RQ2.2, the pair [V2]–[V5]. For RQ2.2 in particular, we have additionally transformed the [V5] variable into a categorical one (positive or negative contributions) and provided additional analysis.
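For readers unfamiliar with the test, a Spearman correlation can be sketched in plain Python as the Pearson correlation of the ranks, with tied values receiving their average rank. This is an illustrative implementation under those assumptions, not the statistical package used in the study, and the input vectors are toy values rather than study data.

```python
from statistics import mean


def average_ranks(values: list) -> list:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant input")
    return cov / (var_x * var_y) ** 0.5


# Toy vectors with one tie in the second variable.
rho = spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
print(round(rho, 3))  # 0.821
```

Because only ranks enter the computation, the two variables may have entirely different value ranges, which is the situation for IGRI values and expert scores.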

Results

In this section, we present the results of the case study,


organized by research question. In particular, in “Ability of IGRI to Prioritize TD Artifacts”, we present the results on the ability of IGRI to predict the risk of software packages to produce high interest. Subsequently, in “Relation of IGRI and New Code”, we use the newly proposed index to assess the contribution of new code to the risk of producing technical debt interest.

Ability of IGRI to Prioritize TD Artifacts

To validate the ability of IGRI to estimate the risk of a package to generate Technical Debt Interest, we contrast the metric with the stakeholders' perception of the risk that maintaining each package entails. Because (a) the two variables have different value ranges, and (b) we focus on prioritization instead of prediction, we treat the two variables as ordinal. Thus, we transformed each value to its rank (i.e., the highest IGRI is assigned the value 1, whereas the lowest IGRI is assigned the value 10). To visualize the ability of IGRI to consistently rank packages based on their risk to produce interest (metric property: consistency, according to the 1061–1998 IEEE Standard for Software Metrics [35]), we present a scatter plot in Fig. 4.

Fig. 4  Risk ranking consistency of IGRI and perception of stakeholders

As can be observed from Fig. 4, the ranking is almost consistent (the dots are close to the main diagonal), with only some exceptions. Most deviations are by one rank, with only one exception (package: fr.icms.printing). This package is ranked as high risk by stakeholders, but as low risk based on IGRI. According to a lead developer of the MaQuali software: “the printing package was difficult to maintain and keep sustainable, because of the lack of strong printing support for java programs (especially custom HTML printing).” Given that the interest probability of this package is quite low, we can infer that the opinion of stakeholders on risk is more related to Technical Debt Interest (i.e., maintenance difficulty) than to maintenance frequency.

Regarding the extreme cases (highest or lowest IGRI), the packages fr.icms.db and fr.icms.sorters are correctly characterized as high and low risk, respectively, by both stakeholders and IGRI (the dot on the lower left is ranked 1 by both IGRI and practitioners, and the dot on the upper right is ranked 10 by both). Quoting a practitioner: “On the one hand, the fr.icms.sorters package is rarely maintained, because the code is pretty basic (no complex logic inside) and classes inside are used as basic components in lots of other parts of the application, so most of maintenance was made on the beginning of development phase of the project. On the other hand, the difficulty in maintaining the fr.icms.db package comes from the fact that new requested features implied the modification of the underlying database structure.” This confirms that both ease of maintenance and maintenance load are deemed important by the practitioners.

To explore whether the aforementioned results are statistically significant, we performed a Spearman rank correlation analysis. The results of the test (correlation coefficient: 0.827; sig.: 0.003) suggest that the two rankings are very strongly correlated, and that the correlation is statistically significant. Thus, an IGRI-based prioritization can safely subsume [35] the ranking that an experienced practitioner would provide to packages in terms of risk to generate Technical Debt Interest.
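
The rank transformation and the Spearman analysis described above can be illustrated with a minimal sketch. The scores below are hypothetical values for ten packages, not the data of the study, and the function names (rank_descending, pearson, spearman) are illustrative helpers rather than part of any tool used by the authors. The sketch computes the Spearman coefficient as the Pearson correlation of the two rank vectors; it does not compute the significance value.

```python
from math import sqrt


def rank_descending(values):
    """Rank values so that the highest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(rank_descending(x), rank_descending(y))


# Hypothetical scores for ten packages (assumption: NOT the study's data).
igri_scores = [41.0, 17.5, 12.0, 9.3, 8.1, 5.0, 3.2, 2.4, 1.1, 0.2]
stakeholder_risk = [9.5, 8.0, 8.5, 6.0, 7.0, 4.0, 5.0, 9.0, 2.0, 1.0]

igri_rank = rank_descending(igri_scores)
stakeholder_rank = rank_descending(stakeholder_risk)
rho = spearman(igri_scores, stakeholder_risk)

print("IGRI ranks:        ", igri_rank)
print("Stakeholder ranks: ", stakeholder_rank)
print("Spearman rho:      ", round(rho, 3))
```

In this made-up example one package is ranked much higher by the stakeholders than by the index, which mirrors the kind of single large deviation discussed for Fig. 4; in practice, a statistics package would also report the significance of the coefficient.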

The agreement between the IGRI-based ranking and the stakeholders' ranking is significant in the sense that IGRI calculation is automated; therefore, it can easily scale to large codebases, and it is unbiased by stakeholders' experience. Thus, inexperienced developers can use the ranking to identify maintainability challenges similarly to more experienced developers.

Relation of IGRI and New Code

In this section, we use the IGRI metric to explore the relation between new code and the risk of generating TD Interest. New code has been discussed in the literature as an alternative to refactoring for reducing the amount of technical debt [16, 39]. To this end, we explore: (a) whether the average percentage of new code that is accumulated in a package along evolution is associated with a decrease or increase of IGRI (RQ2.1); and (b) whether there is a relation between IGRI and the quality of the new code (RQ2.2).

Regarding RQ2.1, we first explore whether the average percentage of new code (in all versions) against all lines of code of the package is correlated to the IGRI of the package (calculated as the average value of the IGRI score of its classes). The results suggest that there exists a very strong negative relation (correlation coefficient: -0.745; sig.: 0.012) that is statistically significant. The negative sign of the relation suggests that the less new code is introduced in a package, the higher the risk of the corresponding package generating interest. To visualize this relationship, in Fig. 5 we present the boxplots of IGRI for each quartile of new code density. For instance, the first quartile corresponds to packages in which 0–25% of the code in each version (on average) is new. Based on Fig. 5, the median IGRI for the packages of this group is approx. 8 (mean value: 13.84), whereas for packages in which new code accounts for 26–50% of their codebase the median IGRI is almost zero (mean value: 1.09).

Fig. 5  Percentage of new code and risk of producing interest

Regarding RQ2.2, rather than focusing on the amount of new code that is added, we focus on the quality of the new code. Thus, we correlated the TD density of the new code with IGRI. The results of the Spearman correlation suggest a moderate negative correlation (correlation coefficient: -0.547; sig.: 0.100), which, however, is not statistically significant. This result also suggests that new code of better quality tends to reduce the risk of generating interest, but this result cannot be generalized. To visualize the difference, we split the dataset into two groups (poor quality: TD density > 1.0; good quality: TD density < 1.0) and provide the boxplots of IGRI (see Fig. 6). Despite the difference in the mean values (2.55 for Good Quality Code vs. 7.49 for Poor Quality Code), the difference between the two samples is not statistically significant. However, this could be due to the small sample size of our case study.

By synthesizing the results of RQ2.1 and RQ2.2, we can claim that new code is related to the risk of generating interest. In the general case, the more new code is inserted along evolution, the lower the risk, and if this new code is “clean”,
the impact of the Technical Debt Interest Risk is further reduced. This result complies with the literature, which suggests that clean new code reduces the amount of Technical Debt Principal along evolution [16, 39]. Additionally, we emphasize that the amount of new code appears to be a more important factor for reducing the risk of producing Technical Debt Interest than the quality of the code. This finding is surprising, as code of better quality would be expected to decrease the risk of heavy maintenance; thus, it deserves further investigation.

Fig. 6  Quality of new code and risk of producing interest

Discussion

Interpretation of Results. The findings of this study (RQ1) confirm that the proposed metric captures with sufficient accuracy the urgency of fixing problems, as it is perceived by software engineers. This result is reasonable: technical debt items with limited probability to undergo changes in the future are naturally deemed less urgent to fix. The same holds for items with reduced interest; software engineers are less concerned about the maintenance of artifacts that exhibit low interest (because they are simple or well designed). The second research question of the study revealed that the risk of code packages to generate interest is negatively associated with the amount (frequency and extent) of new code introduced into them along evolution. New code is often of better quality than the existing code base; thus, the more new code is added in each revision, the lower the risk of incurring technical debt interest.

Implications for Researchers and Software Practitioners. Prioritizing preventive maintenance tasks is a key activity in Technical Debt Management, especially for large codebases with numerous opportunities for improvement. The proposed Interest Generation Risk Importance (IGRI) accurately captures the perception of software engineers as to whether a software package should be ‘refactored’ to address its technical debt. We conjecture that IGRI can efficiently prioritize software artifacts at other levels of granularity as well, but this is a subject of future work. A development team can systematically obtain IGRI by tracking technical debt interest (through a framework such as SDK4ED) and the frequency of changes to software artifacts. Furthermore, the preliminary evidence that new code (and especially ‘better’ new code) is associated with lower risk of incurring interest highlights the importance of tracking the quality of new code. Imposing the use of Quality Gates as a means of controlling the quality of new code can naturally lower the risk of maintainability issues in the future.

Threats to Validity

In this section, we discuss threats to the validity of the study, including threats to construct validity, external validity, and reliability. The study does not aim at establishing cause-and-effect relations, and is thus not concerned with internal validity.

Construct Validity reflects how far the examined phenomenon is connected to the intended objectives. As the primary construct validity threat in technical debt analysis, we should acknowledge the inherent difficulties in assessing technical debt interest. Interest is defined as the additional (future) maintenance effort because of code, design or architectural inefficiencies. By definition, neither future maintenance nor the additional effort compared to an ideal TD-free implementation can be anticipated.

Reliability reflects whether the study has been conducted and reported in a way that others can replicate it and reach the same results. To mitigate any reliability threats, we report all steps followed to obtain the dataset for the investigated research questions and provide links to the employed tools. Moreover, the employed dataset along with the variable values for the statistical analysis is available in a replication package.⁶

⁶ https://drive.google.com/drive/folders/1c2RX6KmmBCLoU-ac2uEPxc5F2NMlSgjx

External Validity is related to the ability of generalizing the findings to other settings, e.g., other software projects, other programming languages, and possibly other technical debt tools. The current study suffers from such threats, as only one software system, written in a particular language, has been analyzed. Given the importance of technical debt prioritization, we plan to conduct further studies on the validity of IGRI in other settings.

Conclusions

Acknowledging that efficient prioritization of technical debt repayment is key for software sustainability, we have introduced a simple, yet effective way to estimate the risk importance of technical debt items. By considering both the amount of technical debt interest and the probability of artifacts to undergo changes, we have proposed the Interest Generation Risk Importance (IGRI) measure. IGRI quantifies the impact of a possible change in a specific artifact that suffers from technical debt.

The empirical validation in an industrial setting revealed that IGRI accurately captures the notion of urgency to fix issues, as perceived by software engineers. Moreover, the more new code is added to a software system, the lower the risk to generate interest, compared to already existing code. Future work can shed light on the particular characteristics of software artifacts and development practices that lead to increased risk of technical debt interest generation.

Acknowledgements Work reported in this paper has received funding from the European Union H2020 research and innovation programme under grant agreement No. 780572 (project: SDK4ED).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Alves NS, Mendes TS, de Mendonça MG, Spínola RO, Shull F, Seaman C. Identification and management of technical debt: a systematic mapping study. Inf Softw Technol. 2016;70:100–21.
2. Ampatzoglou A, Ampatzoglou A, Chatzigeorgiou A, Avgeriou P. The financial aspect of managing technical debt: a systematic literature review. Inf Softw Technol. 2015;64:52–73. https://doi.org/10.1016/j.infsof.2015.04.001.
3. Ampatzoglou A, Michailidis A, Sarikyriakidis C, Ampatzoglou A, Chatzigeorgiou A, Avgeriou P. A framework for managing interest in technical debt: an industrial validation. In: Proceedings of the 2018 International Conference on Technical Debt; 2018. p. 115–124.
4. Ampatzoglou A, Michailidis A, Sarikyriakidis C, Ampatzoglou A, Chatzigeorgiou A, Avgeriou P. A framework for managing interest in technical debt: An industrial validation. In: Proceedings of the 2018 International Conference on Technical Debt, TechDebt ’18, p. 115–124. Association for Computing Machinery, New York, NY, USA; 2018. https://doi.org/10.1145/3194164.3194175.
5. Arvanitou EM, Ampatzoglou A, Bibi S, Chatzigeorgiou A, Stamelos I. Monitoring technical debt in an industrial setting. In: Proceedings of the Evaluation and Assessment on Software Engineering, EASE ’19, p. 123–132. Association for Computing Machinery, New York, NY, USA; 2019. https://doi.org/10.1145/3319008.3319019.
6. Arvanitou EM, Ampatzoglou A, Chatzigeorgiou A, Avgeriou P. A method for assessing class change proneness. In: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, EASE ’17, p. 186–195. Association for Computing Machinery, New York, NY, USA; 2017. https://doi.org/10.1145/3084226.3084239.
7. Boehm B. Software risk management. In: European Software Engineering Conference. Springer; 1989. p. 1–19.
8. Boehm B, Sullivan K. Software economics: a roadmap, the future of software engineering. In: Proceedings of the 22nd International Conference on Software Engineering; 2000. p. 319–343. https://doi.org/10.1145/336512.336584.
9. Boehm BW. Software risk management: principles and practices. IEEE Softw. 1991;8(1):32–41.
10. Boehm BW. Software risk management: principles and practices. IEEE Softw. 1991;8(1):32–41. https://doi.org/10.1109/52.62930.
11. Charalampidou S, Arvanitou EM, Ampatzoglou A, Avgeriou P, Chatzigeorgiou A, Stamelos I. Structural quality metrics as indicators of the long method bad smell: An empirical study. In:
2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA); 2018. p. 234–238. IEEE.
12. Chidamber SR, Darcy DP, Kemerer CF. Managerial use of metrics for object-oriented software: an exploratory analysis. IEEE Trans Softw Eng. 1998;24(8):629–39.
13. Conejero JM, Rodríguez-Echeverría R, Hernández J, Clemente PJ, Ortiz-Caraballo C, Jurado E, Sánchez-Figueroa F. Early evaluation of technical debt impact on maintainability. J Syst Softw. 2018;142:92–114. https://doi.org/10.1016/j.jss.2018.04.035. http://www.sciencedirect.com/science/article/pii/S0164121218300736.
14. Cunningham W. The wycash portfolio management system. OOPS Messenger. 1993;4(2):29–30. http://dblp.uni-trier.de/db/journals/oopsm/oopsm4.html#Cunningham93.
15. Digkas G, Ampatzoglou A, Chatzigeorgiou A, Avgeriou P. On the temporality of introducing code technical debt. In: 13th International Conference on the Quality of Information and Communications Technology (QUATIC 2020). Springer; 2020.
16. Digkas G, Lungu M, Chatzigeorgiou A, Avgeriou P. The evolution of technical debt in the apache ecosystem. In: European Conference on Software Architecture. Springer; 2017. p. 51–66. https://doi.org/10.1007/978-3-319-65831-5_4.
17. Guo Y, Seaman C. A portfolio approach to technical debt management. In: Proceedings of the 2nd Workshop on Managing Technical Debt, MTD ’11, p. 31–34. Association for Computing Machinery, New York, NY, USA; 2011. https://doi.org/10.1145/1985362.1985370.
18. Harrington HJ. Poor-quality cost: implementing, understanding, and using the cost of poor quality. Boca Raton: CRC Press; 1987.
19. Kazman R, Cai Y, Mo R, Feng Q, Xiao L, Haziyev S, Fedak V, Shapochka A. A case study in locating the architectural roots of technical debt. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering; 2015. vol. 2, p. 179–188.
20. Letouzey JL. The sqale method for evaluating technical debt. In: 2012 Third International Workshop on Managing Technical Debt (MTD); 2012. p. 31–36. IEEE. https://doi.org/10.1109/MTD.2012.6225997.
21. Li W, Henry S. Object-oriented metrics that predict maintainability. J Syst Softw. 1993;23(2):111–22. https://doi.org/10.1016/0164-1212(93)90077-B. http://www.sciencedirect.com/science/article/pii/016412129390077B. Object-Oriented Software.
22. Li Z, Avgeriou P, Liang P. A systematic mapping study on technical debt and its management. J Syst Softw. 2015;101:193–220.
23. Martin RC. Clean code: a handbook of agile software craftsmanship. London: Pearson Education; 2009.
24. Ouni A, Kessentini M, Sahraoui H. Multiobjective optimization for software refactoring and evolution. In: Advances in Computers, vol. 94. Elsevier; 2014. p. 103–167. https://doi.org/10.1016/B978-0-12-800161-5.00004-9.
25. Papadopoulos L, Marantos C, Digkas G, Ampatzoglou A, Chatzigeorgiou A, Soudris D. Interrelations between software quality metrics, performance and energy consumption in embedded applications. In: Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems; 2018. p. 62–65.
26. Riaz M, Mendes E, Tempero E. A systematic review of software maintainability prediction and metrics. In: 2009 3rd International Symposium on Empirical Software Engineering and Measurement; 2009. p. 367–377. IEEE. https://doi.org/10.1109/ESEM.2009.5314233.
27. Runeson P, Host M, Rainer A, Regnell B. Case study research in software engineering: Guidelines and examples. Hoboken: Wiley; 2012.
28. Seaman C, Guo Y. Chapter 2 - measuring and monitoring technical debt. Elsevier; 2011. p. 25–46. https://doi.org/10.1016/B978-0-12-385512-1.00002-5. http://www.sciencedirect.com/science/article/pii/B9780123855121000025.
29. Seaman C, Guo Y, Zazworka N, Shull F, Izurieta C, Cai Y, Vetrò A. Using technical debt data in decision making: potential decision approaches. In: 2012 Third International Workshop on Managing Technical Debt (MTD); 2012. p. 45–48. IEEE. https://doi.org/10.1109/MTD.2012.6225999.
30. Siavvas M, Gelenbe E. Optimum Checkpointing for Long-running Programs. In: 15th China-Europe International Symposium on Software Engineering Education; 2019.
31. Siavvas M, Gelenbe E. Optimum interval for application-level checkpoints. In: 2019 6th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2019 5th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom); 2019. p. 145–150. IEEE.
32. Siavvas M, Gelenbe E, Kehagias D, Tzovaras D. Static analysis-based approaches for secure software development. In: International ISCIS Security Workshop. Springer, Cham; 2018. p. 142–157.
33. Siavvas M, Marantos C, Papadopoulos L, Kehagias D, Soudris D, Tzovaras D. On the relationship between software security and energy consumption. In: 15th China-Europe International Symposium on Software Engineering Education; 2019.
34. Siavvas M, Tsoukalas D, Jankovic M, Kehagias D, Chatzigeorgiou A, Tzovaras D, Anicic N, Gelenbe E. An empirical evaluation of the relationship between technical debt and software security. In: 9th International Conference on Information Society and Technology (ICIST), vol. 2019; 2019.
35. Society IC. 1061–1998: IEEE standard for a software quality metrics methodology. IEEE; 2009.
36. Sommerville I. Software engineering. 9th ed. Boston: Addison-Wesley Publishing Company; 2010.
37. Tsimpourlas F, Papadopoulos L, Bartsokas A, Soudris D. A design space exploration framework for convolutional neural networks implemented on edge devices. IEEE Trans Comput Aided Des Integr Circuits Syst. 2018;37(11):2212–21.
38. Tsintzira A, Ampatzoglou A, Matei O, Ampatzoglou A, Chatzigeorgiou A, Heb R. Technical debt quantification through metrics: An industrial validation. In: 15th China-Europe International Symposium on Software Engineering Education (CEISEE ’19); 2019.
39. Zabardast E, Gonzalez-Huerta J, Šmite D. Refactoring, bug fixing, and new development effect on technical debt: An industrial case study. In: 46th EUROMICRO Conference on Software Engineering and Advanced Applications (SEAA 2020). IEEE; 2020.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.