CodeMine
64 IEEE Software | Published by the IEEE Computer Society | 0740-7459/13/$31.00 © 2013 IEEE
[Figure 1 diagram: artifacts including code reviews, procedures/methods, and classes/types, linked by relationships such as "implements," "submits as," "calls," and "uses."]
FIGURE 1. The types of data the CODEMINE platform collects. Artifacts are cross-referenced as much as possible, allowing queries against CODEMINE to go beyond an individual repository.
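The following minimal sketch illustrates why cross-referencing matters. It assumes a simple in-memory graph: each artifact (work item, change, source file, build) is a node, and each cross-reference is a labeled, directed edge. The names `Artifact` and `CrossReferenceIndex`, the relationship labels, and the sample identifiers are all hypothetical. They do not describe CODEMINE's actual schema or API. A query that starts from a defect and follows the links reaches artifacts held in other repositories.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


# Hypothetical artifact record; not the real CODEMINE schema.
@dataclass(frozen=True)
class Artifact:
    kind: str  # e.g. "work_item", "change", "file", "build"
    key: str   # identifier within the artifact's own source repository


class CrossReferenceIndex:
    """Hypothetical store of labeled, directed links between artifacts."""

    def __init__(self) -> None:
        self._edges: dict[Artifact, set[tuple[str, Artifact]]] = defaultdict(set)

    def link(self, src: Artifact, relation: str, dst: Artifact) -> None:
        # Record that src relates to dst, e.g. a change "modifies" a file.
        self._edges[src].add((relation, dst))

    def reachable(self, start: Artifact) -> set[Artifact]:
        """Return every artifact reachable from start (depth-first traversal)."""
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for _relation, nxt in self._edges.get(node, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen - {start}


# Example: a defect is fixed by a change, which modifies a file compiled into a build.
defect = Artifact("work_item", "Bug 101")
change = Artifact("change", "Change 2001")
source = Artifact("file", "net/socket.c")
build = Artifact("build", "Build 17")

index = CrossReferenceIndex()
index.link(defect, "fixed_by", change)
index.link(change, "modifies", source)
index.link(source, "compiled_into", build)

print(sorted(a.kind for a in index.reachable(defect)))  # ['build', 'change', 'file']
```

The index stores only links, so adding a new kind of artifact, such as a test run, requires no change to the existing ones.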
former category, the primary concepts are source files and their attributes: total size, size of code versus comments, implemented methods, and defined classes or types. In the latter, concepts of a change, a branch, and an integration characterize the team’s output over time.

Another large and important body of data resides in work item repositories. These typically encompass both features and defects, both types of which are often tightly linked to source code changes. It’s a bidirectional relationship: features and defects are both a trigger for and a cause of source code changes.

Data on builds describes the composition of the final software product and also allows us to map source code to the resulting executable. Code reviews and tests complete the picture of the engineering activity, taking into account the two most common software verification and validation activities.

Organization information and process information (such as release schedules and development milestones) are also a part of CODEMINE. They provide context for the engineering activity, the code being developed, and all activities around that.

As Figure 1 depicts, artifacts are cross-referenced as much as possible, allowing queries against CODEMINE to go beyond an individual repository.

Architecture
Figure 2 describes the CODEMINE platform’s high-level architecture. More than one instance currently exists; all conform to the same blueprint. We’re assuming a high degree of commonality in the data stored in and accessible from each instance of the CODEMINE data platform. However, each instance might have slightly different capabilities, in terms of both the data stored and the analytics that execute on it. Even so, client applications will be able to run on the data platform as long as the data they need is present, ideally scaling their capabilities on the basis of which data is actually present. If an application can’t run on a particular instance of the data platform, it will be able to fail gracefully.

Data Store
The core element of the data platform is the data store. It’s a logical concept realized as a collection of data sources: typically databases but also file shares with either text or binary files. These data sources don’t have to be colocated but are likely to remain geographically
close to the raw data they cache, consistent with individual product groups’ data and security policies. It’s not necessary for all data platform deployments to have the same data sources. Applications use the data catalog service to query for the presence and logical location (such as a connection string or file share name) of specific pieces of data.

July/August 2013 | IEEE Software 65

[Figure 2: CODEMINE platform high-level architecture. Components shown: CODEMINE loaders, data archiving, CODEMINE datamart services (data model exposed for querying), CODEMINE platform API, and single-purpose analysis tools.]

Data Loaders
Data loaders are modules of code that read raw data and put it directly into the data store. They understand the schema of the raw data source they’re querying from. Data loaders are built to be as independent of and decoupled from one another as possible.

The data collection workflow takes care of orchestrating data collections, enforcing any dependencies, and ensuring collections happen in the correct order. The workflow will be defined in close cooperation with product groups and will adhere to the “pull once” model of data collection as closely as possible.

Platform APIs (Data Model)
CODEMINE has a standard set of interfaces that expose data from the data platform. The interfaces target the most common entities, such as code, defects, features, tests, and people, along with their attributes and relationships. The most common usage patterns should be realized through this data model.

Applications that make use of the data platform will most often follow this pattern:
Platform Services
Platform services encompass a variety of features related to data cataloging, security and access permissions, event logging, data archiving, and data publishing.

Each part of the data platform system needs to be able to log events to a common place. Reasons for logging include health monitoring and trending, data access auditing, execution tracing, and alerting in failure cases.

Product groups need the ability to control access to their cached data the same way they control access to raw data sources. The security policy module must be able to understand the security configuration systems used by product groups, query the security policies at the right frequency, and apply them to both stored data and interfaces accessible from outside the data platform. Currently, data platform instances are protected by individual and separate security groups.

FIGURE 3. CRANE tool screenshot.

Data Platform Usage Scenarios
In the process of creating the platform and opening it up to both the Microsoft internal research community and product groups, three distinct patterns of data use emerged:

• As a data source for a reporting tool or methodology that’s part of a product team’s process. When a product team uses the CODEMINE platform and the client application in production, this usage pattern requires fresh data and reliable data acquisition and analysis, as well as operational uptime and efficient access to the data.
• For one-time, custom analysis focusing on answering a specific question. Although the data might not be stored in a way that’s optimized for a particular query, the fact that the data is available at all and easy to access (compared to accessing raw data sources for the same data) makes CODEMINE the go-to data source when a product team needs to make a decision related to their product, process, or organization.
• To enable new research. Data from each product team, and especially from across product teams, is a compelling source of information and inspiration for new lines of research.

What follows are examples from each of these categories.

Example 1: Mature Research Encoded into a Tool
Change is a fundamental unit of work for software development teams, one that exists regardless of whether a product is a traditional boxed version or a service, or whether a team uses an agile process or a more traditional approach.

Making postrelease changes requires a thorough understanding of not only the architecture of the software component to be changed but also its dependencies and interactions with