CodeMine

The document discusses CODEMINE, a software development data analytics platform created by Microsoft to collect and analyze engineering process data across various product teams. It highlights the platform's architecture, data sources, and the common practices for utilizing the data to improve software development processes. The article aims to share lessons learned and design rationale to assist other organizations in building similar analytics platforms.

FOCUS: SOFTWARE ANALYTICS: SO WHAT?

CODEMINE: Building a Software Development Data Analytics Platform at Microsoft

Jacek Czerwonka, Microsoft
Nachiappan Nagappan, Microsoft Research
Wolfram Schulte, Microsoft
Brendan Murphy, Microsoft Research

IEEE Software, July/August 2013. Published by the IEEE Computer Society, © 2013 IEEE.

// The process Microsoft has gone through developing CODEMINE—a software development data analytics platform for collecting and analyzing engineering process data—includes constraints and pivotal organizational and technical choices. //

EARLY, TRUSTWORTHY DATA available at the required frequencies lets engineers and managers make data-driven decisions that enable the software development process to deliver a high-quality, on-time software system. At Microsoft, several teams use data to improve processes. Examples include

• trend monitoring and reports on development health,1
• risk evaluation and change impact analysis tools,2
• version control branch structure optimization,3
• socio-technical data analysis,4 and
• custom search for bugs and debug logs, speeding up investigations of new issues.5

When reviewing these and other solutions from our existing portfolio of tools, our teams realized that even though each solution is unique in its intended purpose and the way it improves the engineering process, there are commonalities of inputs, outputs, and methods among the tools. For example, a majority of the reviewed tools need similar input data: source code repositories and system binaries, defect databases, and organization hierarchies.

In late 2009, a team at Microsoft was established to explore and implement a common platform, CODEMINE, for collecting and analyzing engineering process data from across a diverse set of Microsoft's product teams. The project wasn't done for the sake of research or academic impact: CODEMINE quickly became pervasive, has hundreds of users, and is now deployed in all major Microsoft product groups: Windows, Windows Phone, Office, Exchange, Lync, SQL, Azure, Bing, and Xbox.

This article presents the motivation, challenges, solutions, and, most important, the lessons learned by the CODEMINE team to aid in replicating such a platform in other organizations. We hope our design rationale can help others who are building similar analytics platforms.

Data Sources and Schema

Figure 1 depicts a high-level schema of the repositories and types of artifacts mined by CODEMINE. In terms of both volume and frequency of change, source code repositories are the largest sources of engineering data for a company like Microsoft. They contain information on a variety of source code-related artifacts, divided into data describing the code's state, composition, and high-level attributes, as well as data describing ongoing code changes.


[Figure 1 omitted: a graph of work item, source code, and process information entities (person, organization, feature/defect, change, branch, integration, build, product, schedule, test, code review, source file, executable, test job) and the relationships among them.]

FIGURE 1. The types of data the CODEMINE platform collects. Artifacts are cross-referenced as much as possible, allowing queries against CODEMINE to go beyond an individual repository.
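The cross-referencing that Figure 1 depicts can be rendered as a toy schema. This is a sketch of the idea only; the class and field names below are our own invention, not CODEMINE's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    alias: str

@dataclass
class WorkItem:            # a feature or a defect
    id: int
    kind: str              # "feature" or "defect"
    opened_by: Person

@dataclass
class Change:              # one submission to version control
    id: int
    author: Person
    branch: str
    files: list                                   # edited source files
    resolves: list = field(default_factory=list)  # linked work items

# Because the defect and the change that resolves it are cross-referenced,
# a query can walk from a bug to the files it touched without consulting
# the bug tracker and the version control system separately.
alice = Person("alice")
bug = WorkItem(101, "defect", opened_by=alice)
fix = Change(5001, author=alice, branch="main", files=["parser.c"], resolves=[bug])
```

With the link in place, a question such as "which files were touched to fix bug 101?" reduces to reading `fix.files`.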

In the former category, the primary concepts are source files and their attributes: total size, size of code versus comments, implemented methods, and defined classes or types. In the latter, the concepts of a change, a branch, and an integration characterize the team's output over time.

Another large and important body of data resides in work item repositories. These typically encompass both features and defects, both of which are often tightly linked to source code changes. It's a bidirectional relationship—features and defects are both a trigger for and a cause of source code changes.

Data on builds describes the composition of the final software product and also allows us to map source code to the resulting executable. Code reviews and tests complete the picture of the engineering activity, taking into account the two most common software verification and validation activities. Organization information and process information (such as release schedules and development milestones) are also a part of CODEMINE. They provide context for the engineering activity, the code being developed, and all activities around that.

As Figure 1 depicts, artifacts are cross-referenced as much as possible, allowing queries against CODEMINE to go beyond an individual repository.

Architecture

Figure 2 describes the CODEMINE platform's high-level architecture. More than one instance currently exists; all conform to the same blueprint. We assume a high degree of commonality in the data stored in and accessible from each instance of the CODEMINE data platform; however, each instance might have slightly different capabilities, in terms of both the data stored and the analytics that execute on it. Client applications should run on any instance of the data platform as long as the data they need is present, ideally scaling their capabilities on the basis of which data is actually present. If an application can't run on a particular instance of the data platform, it should fail gracefully.

Data Store

The core element of the data platform is the data store. It's a logical concept realized as a collection of data sources—typically databases but also file shares with either text or binary files. These data sources don't have to be colocated but are likely to remain geographically close to the raw data they cache, consistent with individual product group data and security policies.
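A data store in this sense is little more than a named collection of source descriptors with logical locations. The following is a minimal sketch of that idea; every name, server, and connection string is invented for illustration.

```python
# Hypothetical sketch: a data store as a logical collection of data
# sources that are not necessarily colocated. Each entry records the
# kind of source and its logical location, which is all a client
# needs in order to connect.
DATA_STORE = {
    "changes":  {"kind": "database",   "location": "Server=vc-sql01;Database=Changes"},
    "builds":   {"kind": "database",   "location": "Server=build-sql02;Database=Builds"},
    "coverage": {"kind": "file_share", "location": r"\\lab-share\coverage"},
}

def locate(name):
    """Return the logical location of a data source, or None if this
    instance of the platform doesn't carry that data."""
    entry = DATA_STORE.get(name)
    return entry["location"] if entry else None
```

A client that receives `None` for an optional source can scale its functionality down rather than fail, which is the graceful-degradation behavior described above.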


It's not necessary for all data platform deployments to have the same data sources. Applications use the data catalog service to query for the presence and logical location (such as a connection string or file share name) of specific pieces of data.

[Figure 2 omitted: layered diagram showing, top to bottom, single-purpose analysis tools; the CODEMINE platform API; CODEMINE datamart services (data model exposed for querying; data catalog and data recovery); the CODEMINE data store (code, people, process, work items, builds; data publishing and replication, flexibility in data availability, access permissions, data archiving); CODEMINE loaders (understand the format of repositories, failure-resistant loading, noise removal, a common schema); and the product teams' repositories (code, bugs, builds, organization, tests), where a small number of source schemas yield many combinations.]

FIGURE 2. High-level architecture of a CODEMINE data platform instance.

Data Loaders

Data loaders are modules of code that read raw data and put it directly into the data store. They understand the schema of the raw data source they're querying from. Data loaders are built to be as independent of and decoupled from one another as possible.

The data collection workflow takes care of orchestrating data collections, enforcing any dependencies, and ensuring collections happen in the correct order. The workflow is defined in close cooperation with product groups and adheres to the "pull once" model of data collection as closely as possible.

Platform APIs (Data Model)

CODEMINE has a standard set of interfaces that expose data from the data platform. The interfaces target the most common entities, such as code, defects, features, tests, and people, and their attributes and relationships. The most common usage patterns should be realized through this data model.
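The loader behavior described above (one raw schema per loader, tolerance for bad records, output in a common schema) might be sketched like this. The raw field names and the common schema here are illustrative only.

```python
def load_bugs(raw_rows, store, log):
    """Failure-resistant loader: parse each raw bug-tracker row,
    skip and log noise instead of aborting, and save the survivors
    into the platform's common work-item schema."""
    for row in raw_rows:
        try:
            item = {                          # common schema, not the raw one
                "id": int(row["BugID"]),
                "kind": "defect",
                "state": row["Status"].strip().lower(),
            }
        except (KeyError, ValueError) as err:
            log.append(f"skipped noisy row {row!r}: {err}")
            continue                          # one bad record must not stop the load
        store.append(item)

store, log = [], []
load_bugs([{"BugID": "7", "Status": " Active "},
           {"BugID": "oops", "Status": "Closed"}], store, log)
```

The second row is noise (a non-numeric ID), so it is logged and skipped while the load continues, which is the failure-resistant behavior the loaders need.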


Applications that make use of the data platform will most often follow this pattern:

1. Query the data catalog to ensure that the needed data exists. Fetch connection strings to data sources or URLs to needed services.
2. Tailor functionality depending on the available data.
3. Connect to services and query for data through the data model.
4. Display the data.

The data model is the preferred way to access the data stored in the data platform. It needs to be expressive enough to support the data needs of the productized solutions. However, for specialized queries, one-off research tasks, or prototyping, access to the interfaces exposed by individual data sources is also available.

Platform Services

Platform services encompass a variety of features related to data cataloging, security and access permissions, event logging, data archiving, and data publishing.

Each part of the data platform system needs to be able to log events to a common place. Reasons for logging include health monitoring and trending, data access auditing, execution tracing, and alerting in failure cases.

Product groups need the ability to control access to their cached data the same way they control access to raw data sources. The security policy module must be able to understand the security configuration systems used by product groups, query the security policies at the right frequency, and apply them to both stored data and interfaces accessible from outside the data platform. Currently, data platform instances are protected by individual and separate security groups.

Data Platform Usage Scenarios

In the process of creating the platform and opening it up to both the Microsoft internal research community and product groups, three distinct patterns of data use emerged:

• As a data source for a reporting tool or methodology that's part of a product team's process. When a product team uses the CODEMINE platform and the client application in production, this usage pattern requires data freshness, reliability of data acquisition and analysis, and operational uptime and efficiency in getting to data.
• For one-time, custom analysis focused on answering a specific question. Although the data might not be stored in a way that's optimized for a particular query, the fact that the data is available at all and easy to access (compared to accessing raw data sources for the same data) makes CODEMINE the go-to data source when a product team needs to make a decision related to its product, process, or organization.
• To enable new research. Data from each product team, and especially from across product teams, is a compelling source of information and inspiration for new lines of research.

What follows are examples from each of these categories.

[Figure 3 omitted: screenshot.]

FIGURE 3. CRANE tool screenshot.

Example 1: Mature Research Encoded into a Tool

Change is a fundamental unit of work for software development teams that exists regardless of whether a product is a traditional boxed version or a service, or whether a team uses an agile process or a more traditional approach.

Making postrelease changes requires a thorough understanding of not only the architecture of the software component to be changed but also its dependencies and interactions with other system components.


Testing such changes in reasonable time and at reasonable cost is a problem because an infinite number of test cases can be executed for any modification.2 Furthermore, the changes apply to hundreds of millions of users; even the smallest mistakes can translate to very costly failures and rework.

CRANE is a failure prediction, change risk analysis, and test prioritization system at Microsoft that leverages existing research for the development and maintenance of the Windows operating system family.2 CRANE is built on top of the CODEMINE infrastructure, in the top layer of Figure 2, where tools leverage the CODEMINE platform. The CODEMINE data platform constantly monitors changes happening in the source code repository and can cross-reference them with features, defects, people, code reviews, and auxiliary data such as code coverage. CRANE uses this data, and consequently, teams can automatically receive information on change composition, associated bugs, similar changes, and involved people, along with possible risks and recommended risk-mitigation steps.

CRANE not only surfaces information about changes but also provides interpretation by overlaying coverage data and statistical risk models to identify the most risky and least covered parts of a change. Figure 3 shows a snapshot of a CRANE analysis, which identifies change, coverage, dependency, people, and prior bug information. It allows engineers and engineering managers to focus their attention on the most failure-prone parts of their work. Through the use of code coverage data and a maximum-coverage/minimum-cost algorithm, CRANE is able to recommend specific, high-value tests.

The system has already been successfully deployed in Windows, and pilots are underway in other product teams.

[Figure 4 omitted: scatterplot of velocity cost versus conflict avoidance under single-branch removal.]

FIGURE 4. Velocity versus conflict avoidance. Red dots indicate branches that aren't useful, green dots indicate branches that are useful, and blue dots indicate branches with mixed utility.

Example 2: Ad Hoc Analysis for Decision Making

Here's a simple but very important question: Is code coverage effective, and is there a code coverage percentage at which we should stop testing?

We analyzed code coverage of multiple released versions of Microsoft products and correlated branch and statement coverage with postrelease failures. There was a strong positive correlation between coverage and failures. From discussions with the relevant teams, we found that there are several reasons for this counterintuitive relationship:

• covering code doesn't guarantee that the code is correct;
• having 100 percent code coverage doesn't mean the system will have no failures—rather, it means that bugs could be found outside anticipated coverage scenarios; and
• each time a fix is made, a test case is written to cover the fix (changed binaries might therefore often have high code coverage simply because they have been modified several times).

This finding led us to a follow-up study on the use of code coverage in conjunction with code complexity (for example, cyclomatic complexity and class coupling) as a better indicator of code quality. In addition, we were able to benchmark our results against external organizations such as Avaya.6 Studies of unit testing show its increased effectiveness in obtaining high-quality code because it eliminates the need for testers to find the category of bugs that could be more easily found by developers, and it lets testers focus more on scenario testing.7
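The kind of correlation analysis in Example 2 can be reproduced in miniature. The numbers below are made up for illustration, not Microsoft data, and the Pearson coefficient is computed directly to keep the sketch self-contained.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-binary statement coverage (%) and postrelease failures.
coverage = [55, 60, 70, 80, 90]
failures = [2, 3, 4, 6, 9]
r = pearson(coverage, failures)  # positive here: more coverage, more failures
```

A positive `r` on data like this is exactly the counterintuitive pattern the study observed, which is why coverage alone, without complexity or change history, is a weak quality indicator.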


Example 3: Use of Data in New Research

Many companies use branches in version control systems to coordinate the work of hundreds to thousands of developers building a software system or service. Branches isolate concurrent work, avoiding instability during development. The downside is an increase in the time changes take to move through the system. So, can we determine the optimal branch structure, guaranteeing fast code velocity and a high degree of isolation? Answering this question is important not only to Microsoft but also to other commercial companies and the research community.

Toward this end, we performed various experiments simulating different branch structures.3 For example, we replayed the check-in history of several product groups, assuming specific branches didn't exist. Under these conditions, all changes hierarchically roll up to a higher-level branch, and we can detect conflicts by identifying files getting modified together. The resulting graph (see Figure 4) plots the cost of a branch versus its value as a factor isolating parallel lines of development. In Figure 4, red dots indicate branches that aren't useful—that is, adding velocity cost and not providing much conflict avoidance. Green dots indicate branches that are useful, and blue dots indicate branches with mixed utility. Branch structures are created in context, to suit the needs of a specific product and organization; such branch evaluation lets teams identify the cost paid for the benefit and identify parts of the branch tree that should be restructured.3

We also analyzed the architectural structure of Windows (for both Vista and Windows 7) and observed that a branch structure that aligns with the team's organizational structure leads to fewer postrelease failures than branches aligned to the product's architectural layering.8

Lessons on Replicating CODEMINE

One of our primary goals in this article is to help other organizations replicate the work we're doing with CODEMINE to build their own data analytics platforms. We've compiled a list of suggestions from our experience that would assist in replicating our CODEMINE effort, along with some things for other organizations to consider doing differently when building their platforms.

Create an Independent Instance for Each Product Team in the Data Platform

Easy partitioning, the ability to constrain access, and the ability to move parts of the infrastructure greatly assisted us in creating independent instances.

Have Uniform Interfaces for Data Analysis

Even though multiple instances will exist, applications need to rely on a common set of services, APIs, or a stable schema present in each. The data platform interfaces must evolve very carefully; preserving backward compatibility should be of primary concern. This also greatly helps when you build an application once and can redeploy it multiple times across several data instances.

Encode Process Information

Process information, including the release schedule (milestones and dates), the organization of code bases, team structure, and so on, is very important for providing context—for example, why is there a sudden spike in bugs (more users added) or in code churn (a code integration milestone)? At Microsoft, this information isn't present in one place or tool. It might pop up in project-tracking, code repository, or bug-tracking tools, each needing some level of customization to interpret. Organizations should make plans to embed this information in the system to provide valuable metadata.

Provide Flexibility and Extensibility for Collected Data and Deployed Analytics

Product teams have varying requirements and need the ability to define which data and metadata are stored in the data platform and how they're analyzed; this will allow teams to best reflect their existing processes or enable new ones. For example, one team might decide to add customer user data to its instance of the data store. The system should be able to fully support such extensions.

Allow Dynamic Discovery of the Data Platform's Capabilities by Applications

Each application relying on the data platform needs the ability to identify the capabilities of a particular data platform instance and adjust its function accordingly. For example, some product groups collect and archive historical code coverage data and some choose not to.
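As a sketch of this scale-down behavior (every name and field below is invented, not a CODEMINE API), a client might probe the instance for optional data and degrade when it's missing:

```python
def build_report(catalog):
    """Build a quality report from whatever data this platform
    instance actually carries, degrading gracefully when a source
    (for example, code coverage) is absent."""
    if "changes" not in catalog:          # required data: fail gracefully
        return {"error": "change data unavailable on this instance"}
    report = {"churn": sum(c["edits"] for c in catalog["changes"])}
    if "coverage" in catalog:             # optional data: scale up if present
        report["coverage"] = catalog["coverage"]["statement_pct"]
    return report

# Two instances with different capabilities, one client.
full = {"changes": [{"edits": 3}, {"edits": 5}],
        "coverage": {"statement_pct": 72}}
minimal = {"changes": [{"edits": 2}]}
```

The same client code produces a richer report on the full instance and a smaller one on the minimal instance, instead of failing outright.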


The tools must be able to seamlessly scale their functionality down if code coverage data isn't available for a particular product group.

Support Policies for Security, Privacy, and Audit

The data platform must allow for setting authorization, authentication, privacy, and audit policies to reflect the security requirements and policies of the product group or the data owner. As a general rule, information leaving the data platform will be accessible only to people who were granted permission to the original raw data coming in; however, stricter rules might apply to subsets of data.

Allow Ongoing Support and Maintenance Outside of CODEMINE

In most cases, product teams eventually take over ownership and operations of their respective data platform instances. To ensure a smooth transition, the data platform must adhere to the rules of a well-behaved service defined by operations teams. Resiliency to failure, retry logic, logging of fatal and nonfatal errors, health monitoring, and notifications should be built in.

Host as a Cloud Service

Based on need, economic considerations, load, and availability requirements, carefully evaluate the necessity of hosting the service in the cloud or on traditional servers. Overengineering always leads to wasted effort.

Know the Data Platform Might Not Fulfill All Data Needs

The data platform will be scoped to provide data that's used by several client applications—that is, there must be a level of commonality of inputs for the platform to start serving the data. However, applications can still access other, more specialized data sources and federate with the data platform as their needs dictate.

Innovate at the Right Level of the Stack

Use mature foundational technology and existing programming skills. As much as possible, we try to use operating system, storage, and database platform technology that's mature and already part of Microsoft's stack, to avoid spending time innovating, for example, at the level of raw storage or methods of distributed computation. Instead, we focus on data availability, accurate data acquisition, data cleaning, abstracting representation of the engineering process, and data analysis. In terms of accessing data, we need to ensure any new programming models used are absolutely necessary for the task so we don't create artificial barriers of entry for users of our data.

[Figure 5 omitted: a cycle in which new research in software engineering yields solutions and tools easily deployed in product groups, which solve business problems, produce additional clean data available for further research, and surface further areas of improvement through collaboration.]

FIGURE 5. Cycle of collaboration and data availability.

We've observed that once data is easily accessible, new usage scenarios open up; for instance, CODEMINE is currently being used to understand onboarding processes, optimize individual processes (like build), and optimize overall code flow.

Another significant goal of the CODEMINE platform is enabling future research and analysis. Figure 5 shows the cycle of data availability, in which new research in software engineering spawns new solutions to be deployed in product groups. These solutions solve large business problems and, as the engineering work scales out, enable additional research. This further strengthens the collaboration, opens new avenues for it, and again leads to new research ideas.

As a way to propagate the ideas of data-driven decision making, we recently started a virtual community focused on sharing questions, solutions, methods, and tools related to engineering process data analysis. It is a cross-disciplinary group of product team members and researchers with experience and backgrounds in empirical software engineering, data analysis, and data visualization.


The group's goal is to emphasize data-driven decision making in our teams and to equip product teams with relevant guidelines, methods, and tools. As we realize our goals, the CODEMINE data platform often serves as the common denominator in our community activities.

References

1. N. Nagappan and T. Ball, "Use of Relative Code Churn Measures to Predict System Defect Density," Proc. 27th Int'l Conf. Software Eng. (ICSE 05), ACM, 2005, pp. 284–292.
2. J. Czerwonka et al., "CRANE: Failure Prediction, Change Analysis and Test Prioritization in Practice—Experiences from Windows," Proc. 4th Int'l Conf. Software Testing, Verification and Validation (ICST 11), IEEE CS, 2011, pp. 357–366.
3. C. Bird and T. Zimmermann, "Assessing the Value of Branches with What-If Analysis," Proc. ACM SIGSOFT 20th Int'l Symp. Foundations of Software Eng. (FSE 12), ACM, 2012, pp. 45–54.
4. C. Bird et al., "Putting It All Together: Using Socio-technical Networks to Predict Failures," Proc. 20th Int'l Symp. Software Reliability Eng. (ISSRE 09), IEEE CS, 2009, pp. 109–119.
5. B. Ashok et al., "DebugAdvisor: A Recommender System for Debugging," Proc. 7th Joint Meeting European Software Eng. Conf. and ACM SIGSOFT Symp. Foundations of Software Eng. (ESEC/FSE 09), ACM, 2009, pp. 373–382.
6. A. Mockus, N. Nagappan, and T.T. Dinh-Trong, "Test Coverage and Post-verification Defects: A Multiple Case Study," Proc. 3rd Int'l Symp. Empirical Software Eng. and Measurement (ESEM 09), IEEE CS, 2009, pp. 291–301.
7. L. Williams, G. Kudrjavets, and N. Nagappan, "On the Effectiveness of Unit Test Automation at Microsoft," Proc. 20th Int'l Symp. Software Reliability Eng. (ISSRE 09), IEEE CS, 2009, pp. 81–89.
8. E. Shihab, C. Bird, and T. Zimmermann, "The Effect of Branching Strategies on Software Quality," Proc. Int'l Symp. Empirical Software Eng. and Measurement (ESEM 12), ACM, 2012, pp. 301–310.

ABOUT THE AUTHORS

JACEK CZERWONKA is a principal software architect in the Tools for Software Engineers group at Microsoft. His research interests include software testing and quality assurance, systems-level testing, pairwise and model-based testing, and data-driven decision making on software projects. Czerwonka received an MSc in computer science from the Technical University of Szczecin. Contact him at [email protected].

NACHIAPPAN NAGAPPAN is a principal researcher in the Empirical Software Engineering group at Microsoft Research. His research interests include software analytics, focusing on software reliability, and empirical software engineering processes. Nagappan received a PhD in computer science from North Carolina State University. Contact him at [email protected].

WOLFRAM SCHULTE is an engineering general manager and principal researcher at Microsoft. His research interests include software engineering, focusing on build, modeling, verification, test, and programming languages, ranging from language design to runtimes. Schulte received a PhD in computer science from the Technical University of Berlin. Contact him at [email protected].

BRENDAN MURPHY is a principal researcher at Microsoft Research. His research interests include system dependability, encompassing measurement, reliability, and availability. Contact him at [email protected].
