Nikolai Avteniev
Software Engineer
LinkedIn NYC
Software engineering practices and
software quality empirical research
results
Motivation and Inspiration
Becoming a Software Engineer
3
My Journey
• First Line Of Code (Pascal 1997)
• Discovered other languages (Assembly, C/C++, 1998)
• Discovered Java (1999)
• First Software Engineering Job (2001)
• Discovered XP (2002)
• Founded a Software Company (2004)
• Joined A Web 2.0 Startup (2009)
• First Software Conference (2011)
• Joined LinkedIn (2012)
• Discovered Empirical Software Engineering (2013)
4
Making Software
What Really Works and Why We Believe It
By: Andy Oram & Greg Wilson
Covers:
• Empirical Software Engineering
• Results From Research
5
Motivation
• What is software engineering as a craft?
• How to measure software quality?
• How do others measure software quality?
6
Summary
• Empirical Software Engineering Research Works!
• Post Release Defect Counts / Rates are useful data!
• RESULT: Structure of Engineering Organization is Important
• RESULT: Code Reviews Work, but not how we expect.
• RESULT: Unit Testing Works!
• RESULT: Defect prediction might be useful with some work
• RESULT: Static Code Analysis can help, but needs some work
• Software Giants are using empirical research to improve their process.
Software Engineering Org and Code
Ownership
The influence of organizational structure on software quality: an
empirical case study.
8
Organization Structure might Predict Quality
• Microsoft Windows Vista OS, made up of roughly 3,000 components
• Overlaid the organization structure on the components
• Built a logistic regression model to predict defective components
• The resulting model identified defective components with 84% recall and 86% precision (a sketch of this kind of evaluation follows below)
• The best model they built!
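To make the precision and recall numbers concrete, here is a minimal sketch (not the study's actual model or data) of fitting a logistic regression on organizational-structure features and scoring it; the feature names and synthetic labels below are illustrative assumptions only.

```python
# Illustrative sketch: a logistic-regression defect predictor scored with
# precision and recall, in the spirit of the organizational-structure study.
# Features and data are made up for the example.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
n = 3000  # roughly the number of Vista components cited on the slide

# Hypothetical org-structure features per component, e.g. number of engineers,
# number of ex-engineers, edit frequency, org depth of the owning manager.
X = rng.normal(size=(n, 4))
# Synthetic label: component had a post-release defect.
y = (X.sum(axis=1) + rng.normal(scale=2.0, size=n) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Precision: of the components we flagged, how many were truly defect-prone.
# Recall: of the truly defect-prone components, how many we flagged.
print(f"precision={precision_score(y_test, pred):.2f}  recall={recall_score(y_test, pred):.2f}")
```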
Don't touch my code!: examining the effects of ownership on software
quality.
9
Code Ownership Matters
• Microsoft Windows Vista and Windows 7 operating systems
• Split contributors into minor (< 5% of a component's changes) and major (>= 5%); see the sketch below
• The number of minor contributors has a significant correlation with post-release defects
• Removing minor contributors from the predictive model reduced its effectiveness
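A minimal sketch of how the minor/major ownership split might be computed from a change log; the 5% threshold comes from the paper, while the input format and helper below are illustrative assumptions.

```python
# Illustrative sketch: classify contributors to a component as minor (< 5% of
# its changes) or major (>= 5%), the ownership split used in the study.
from collections import Counter

def ownership_metrics(edits, threshold=0.05):
    """edits: list of author names, one entry per change to the component."""
    counts = Counter(edits)
    total = sum(counts.values())
    shares = {author: n / total for author, n in counts.items()}
    return {
        "minor_contributors": sum(s < threshold for s in shares.values()),
        "major_contributors": sum(s >= threshold for s in shares.values()),
        "top_ownership": max(shares.values()),
    }

# Hypothetical change log for one component.
print(ownership_metrics(["alice"] * 40 + ["bob"] * 8 + ["carol"] * 1 + ["dave"] * 1))
```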
Programmer-based fault prediction.
10
Individuals Are Unpredictable
• AT&T: 16 releases of a 1.4M LOC, 3,100-file system
• Does a developer's bug ratio remain the same between releases?
• NO!
• Does including information on a particular developer improve the accuracy of the
predictions?
• NO!
• Cumulative number of contributors can improve prediction!
Code Reviews
Expectations, outcomes, and challenges of modern code review.
12
Code reviews are not just for defects
• Microsoft Code Review Process
• Managers and ICs think code reviews are primarily for finding defects
• Code reviews result primarily in code improvements
• Finding defects requires more context and better understanding of the code
• Communication is important to effective code reviews
Characteristics of useful code reviews: an empirical study at
Microsoft.
13
Not all comments are useful
• Microsoft analyzed over a million code review comments across five teams
• Comments that identify defects: < 5%; false positives: ~11%
• 64% to 68% of review comments were classified as useful (by a learned classifier; a toy sketch follows below)
• Reviewers experienced with the code give more useful comments
• Smaller change sets get more useful comments
• Build and configuration file changes get fewer useful comments
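The usefulness labels in the study came from a machine-learned classifier trained on Microsoft's own data and features; the toy text classifier below is only a sketch of the general idea, with made-up comments and labels.

```python
# Toy sketch of a "useful comment" classifier; the Microsoft study used its own
# labeled data and features, none of which appear here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [  # hypothetical review comments, labeled 1 = useful, 0 = not useful
    "This will throw a NullPointerException when the list is empty",
    "Consider extracting this block into a helper to avoid duplication",
    "nit: extra blank line",
    "LGTM",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(comments, labels)
print(clf.predict(["Looks good to me", "This loop never terminates if n is 0"]))
```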
Best kept secrets of peer code review.
14
Code Reviews Done Right
• Cisco Systems code review process
• “defect” definition: reviewer and author agree it needs to be fixed
• Smaller code reviews are more effective
• After 90 minutes reviewers become less effective
• Inspect at no more than 300 LOC per hour (a back-of-the-envelope check follows below)
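A back-of-the-envelope check of how those two limits combine; the pace and session length come from the study, the arithmetic is just illustration.

```python
# The study's limits: inspect no faster than ~300 LOC/hour, and keep a session
# under ~90 minutes, which caps what one sitting can cover effectively.
max_rate_loc_per_hour = 300
max_session_hours = 1.5
print(max_rate_loc_per_hour * max_session_hours, "LOC per session, at most")
```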
Testing
Evaluating the efficacy of test-driven development: industrial case
studies.
16
TDD takes longer, and it’s worth it.
• Microsoft: 3 projects in total
• Compared to teams working on similar projects without TDD
• Managers estimated TDD took 15-35% longer
• Projects that didn't use TDD saw 2x the defect density
• Verified by a similar study at IBM
Test coverage and post-verification defects: A multiple case study.
17
More tests, fewer bugs, but it will cost you!
• Microsoft and Avaya
• Measured relationship between code coverage and defect density
• Better coverage correlates with reduced defect density*
• Biggest wins at 80-100% coverage
• It takes more effort to go from 50% to 90% coverage than from 0 to 50% (a correlation sketch follows below)
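A minimal sketch of the kind of measurement involved: correlating per-component coverage with defect density. The data below is synthetic; the study used real Microsoft and Avaya components and a more careful model that controlled for confounds.

```python
# Illustrative sketch: correlation between per-component test coverage and
# post-release defect density, on synthetic data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
coverage = rng.uniform(0.2, 1.0, size=200)  # fraction of code covered per component
defect_density = np.maximum(0, 5 * (1 - coverage) + rng.normal(scale=0.5, size=200))

rho, p = spearmanr(coverage, defect_density)
print(f"Spearman rho={rho:.2f} (p={p:.3g})")  # negative: more coverage, fewer defects
```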
Defect Prediction
Where the bugs are.
19
80:20 Rule For Bugs
• AT&T: two production systems
• 20% of files account for between 71% and 92% of bugs
• This can help focus testing efforts (see the sketch below)
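A minimal sketch of the 80/20 check itself: rank files by a fault-proneness score, take the top 20%, and measure the share of bugs they hold. The scores and bug counts below are synthetic stand-ins, not the AT&T model or data.

```python
# Illustrative sketch of the 80/20 check: do the top 20% of files by predicted
# fault-proneness contain most of the bugs?
import numpy as np

rng = np.random.default_rng(2)
n_files = 1000
# Synthetic bug counts, heavily skewed so a few files hold most of the bugs.
bugs = rng.pareto(a=1.5, size=n_files).round().astype(int)
# Stand-in for a model's fault-proneness score: a noisy view of the truth.
score = bugs + rng.normal(scale=1.0, size=n_files)

top20 = np.argsort(score)[::-1][: n_files // 5]
share = bugs[top20].sum() / max(bugs.sum(), 1)
print(f"top 20% of files hold {share:.0%} of the bugs")
```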
Assessing the impact of using fault-prediction in industry
20
Need More Data
• AT&T Research
• All analysis was post hoc; it did not affect how the software was developed
• Track defects over time
• Might help with testing efficiency
Does bug prediction support human developers? findings from a
Google case study.
21
Predictors aren't helpful
• Google: two production systems
• Evaluated state-of-the-art predictors
• Predictions need to be relevant, reasonable, and actionable
• “Please Fix” in code review tool had no effect on reviewers
Static Code Analysis
Evaluating static analysis defect warnings on production software.
23
FindBugs needs some work
• Google's Java code base, JRE 6, and Sun's GlassFish JEE server
• FindBugs finds true but low-impact bugs
• Out of 1,127 medium/high-priority warnings, 193 were impossible
• Needs an effort to curate the initial set of issues and a system to track the tool's effectiveness
Static analysis tools as early indicators of pre-release defect density.
24
Find Issues Before You Ship!
• Microsoft Windows Server 2003
• Two tools: PREfix and PREfast
• A high density of static analysis issues positively correlates with a high defect density found in testing
Assessing the relationship between software assertions and code
quality: an empirical investigation.
25
Asserts Are Better!
• Microsoft Visual Studio
• Assertion density has a negative correlation with defect density (a density-counting sketch follows below)
• Assertions identified more defects than static code analysis
• Code Assertions > Static Code Checks
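Assertion density is simply assertions per KLOC. The sketch below computes it for a Python source tree as an illustration; the study measured C/C++ assertions in Microsoft components, not Python code.

```python
# Illustrative sketch: assertion density (asserts per 1,000 non-blank lines)
# for a tree of Python files.
import ast
from pathlib import Path

def assertion_density(root: str) -> float:
    """Assertions per 1,000 non-blank lines of Python code under `root`."""
    asserts = loc = 0
    for path in Path(root).rglob("*.py"):
        source = path.read_text(encoding="utf-8", errors="ignore")
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip files that are not valid Python
        loc += sum(1 for line in source.splitlines() if line.strip())
        asserts += sum(isinstance(node, ast.Assert) for node in ast.walk(tree))
    return 1000 * asserts / loc if loc else 0.0

if __name__ == "__main__":
    print(f"{assertion_density('.'):.1f} asserts per KLOC")
```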
Applications in Industry
Development and deployment at Facebook.
27
Facebook
• Facebook practices perpetual development
• Unit testing and automated unit test execution
• Peer code reviews
• Feature flags (a minimal gating sketch follows below)
• Frequent deploys
• Engineers on IRC during deploys
• The INFER static analysis tool for mobile apps
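A minimal sketch of the feature-flag idea: ship code dark and turn it on for a slice of users. Facebook's production gating system is far richer; the flag name, rollout fraction, and helper below are made up for illustration.

```python
# Illustrative sketch of percentage-based feature gating.
import hashlib

FLAGS = {"new_composer": 0.10}  # hypothetical feature -> fraction of users who see it

def is_enabled(feature: str, user_id: str) -> bool:
    """Deterministically bucket a user and compare against the rollout fraction."""
    rollout = FLAGS.get(feature, 0.0)
    bucket = int(hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest(), 16) % 1000
    return bucket < rollout * 1000

for user in ("user-17", "user-42", "user-99"):
    variant = "new code path" if is_enabled("new_composer", user) else "existing behavior"
    print(user, "->", variant)
```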
Google's Innovation Factory: Testing, Culture, and Infrastructure.
28
Google
• Peer Code Reviews
• Automated Testing
• Encourage teams to improve code quality
• Tricorder turns every engineer into a static code analysis tool builder (a toy checker sketch follows below)
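A toy example of the "anyone can write a checker" idea: a tiny analyzer that flags bare except: clauses in Python. Tricorder itself is a language-agnostic platform and this is not its API; the checker below is purely illustrative.

```python
# Illustrative sketch of a tiny static check, in the spirit of Tricorder's
# pluggable analyzers (this is not Tricorder's actual interface).
import ast
import sys

def check_bare_except(source: str, filename: str = "<input>"):
    """Return findings for bare `except:` handlers in the given source."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(f"{filename}:{node.lineno}: avoid bare 'except:'; catch a specific exception")
    return findings

if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as f:
            for finding in check_bare_except(f.read(), name):
                print(finding)
```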
Q&A
30
References
1. Ayewah, Nathaniel, et al. "Evaluating static analysis defect warnings on production software." Proceedings of the 7th ACM SIGPLAN-SIGSOFT workshop on Program
analysis for software tools and engineering. ACM, 2007.
2. Bacchelli, Alberto, and Christian Bird. "Expectations, outcomes, and challenges of modern code review." Proceedings of the 2013 international conference on software
engineering. IEEE Press, 2013.
3. Bird, Christian, et al. "Don't touch my code!: examining the effects of ownership on software quality." Proceedings of the 19th ACM SIGSOFT symposium and the 13th
European conference on Foundations of software engineering. ACM, 2011.
4. Bhat, Thirumalesh, and Nachiappan Nagappan. "Evaluating the efficacy of test-driven development: industrial case studies." Proceedings of the 2006 ACM/IEEE international
symposium on Empirical software engineering. ACM, 2006.
5. Calcagno, Cristiano, et al. "Moving Fast with Software Verification." NASA Formal Methods (2015): 3-11.
6. Cohen, Jason, et al. "Best kept secrets of peer code review." (2006).
7. Copeland, Patrick. "Google's Innovation Factory: Testing, Culture, and Infrastructure." Software Testing, Verification and Validation (ICST), 2010 Third International
Conference on. IEEE, 2010.
8. Feitelson, Dror G., Eitan Frachtenberg, and Kent L. Beck. "Development and deployment at Facebook." IEEE Internet Computing 4 (2013): 8-17.
9. Johnson, Brittany, et al. "Why don't software developers use static analysis tools to find bugs?." Software Engineering (ICSE), 2013 35th International Conference on. IEEE,
2013.
31
References
10. Kudrjavets, Gunnar, Nachiappan Nagappan, and Thomas Ball. "Assessing the relationship between software assertions and code quality: an empirical investigation."
Proceedings of the 17th International Symposium on Software Reliability Engineering. 2006.
11. Lewis, Chris, et al. "Does bug prediction support human developers? Findings from a Google case study." Software Engineering (ICSE), 2013 35th International Conference
on. IEEE, 2013.
12. Mockus, Audris, Nachiappan Nagappan, and Trung T. Dinh-Trong. "Test coverage and post-verification defects: A multiple case study." Empirical Software Engineering and
Measurement, 2009. ESEM 2009. 3rd International Symposium on. IEEE, 2009.
13. Nagappan, Nachiappan, et al. "Realizing quality improvement through test driven development: results and experiences of four industrial teams." Empirical Software
Engineering 13.3 (2008): 289-302.
14. Nagappan, Nachiappan, Brendan Murphy, and Victor Basili. "The influence of organizational structure on software quality: an empirical case study." Proceedings of the 30th
international conference on Software engineering. ACM, 2008.
15. Nagappan, Nachiappan, and Thomas Ball. "Static analysis tools as early indicators of pre-release defect density." Proceedings of the 27th international conference on
Software engineering. ACM, 2005.
16. Newcombe, Chris, et al. "Use of Formal Methods at Amazon Web Services." (2014).
17. Oram, Andy, and Greg Wilson. Making software: What really works, and why we believe it. " O'Reilly Media, Inc.", 2010.
18. Ostrand, Thomas J., Elaine J. Weyuker, and Robert M. Bell. "Programmer-based fault prediction." Proceedings of the 6th International Conference on Predictive Models in Software Engineering. ACM, 2010.
32
References
19. Ostrand, Thomas J., Elaine J. Weyuker, and Robert M. Bell. "Where the bugs are." ACM SIGSOFT Software Engineering Notes. Vol. 29. No. 4. ACM, 2004.
20. Sadowski, Caitlin, et al. "Tricorder: Building a Program Analysis Ecosystem." Proceedings of the 37th International Conference on Software Engineering. IEEE, 2015.
21. Sanchez, Julio Cesar, Laurie Williams, and E. Michael Maximilien. "On the sustained use of a test-driven development practice at IBM." Agile Conference (AGILE), 2007.
IEEE, 2007.
22. Weyuker, Elaine J. "Empirical software engineering research-the good, the bad, the ugly." Empirical Software Engineering and Measurement (ESEM), 2011 International
Symposium on. IEEE, 2011.
23. Weyuker, Elaine J., Thomas J. Ostrand, and Robert M. Bell. "Assessing the impact of using fault-prediction in industry." Testing: Academic & Industrial Conference (TAIC
2011). 2011.
24. Williams, Laurie, Gunnar Kudrjavets, and Nachiappan Nagappan. "On the effectiveness of unit test automation at Microsoft." Software Reliability Engineering, 2009.
ISSRE'09. 20th International Symposium on. IEEE, 2009.


Editor's Notes

  • #5 This is a great introduction to empirical research in software engineering and the results that can be achieved by applying this method to industrial practices. I read this book shortly after it came out, and it made me question many of the things in software engineering folklore that I had accepted as truth over the roughly 10 years I had spent in the industry by then.
  • #6 This research was motivated by the desire to understand what it means to improve the way we practice software engineering as a craft. To understand that, it is important to understand how we can measure the quality of the software we produce. Others have attempted similar things; what measurements did they use to learn whether they were improving their craft? The idea is to create a dashboard of how well the craft is practiced, and this work was motivated by the need to know what to put on that dashboard.
  • #7 Empirical software engineering research is being used to evaluate engineering best practices in industrial settings, where running a control group is expensive and challenging. Post-release defect counts have been used to evaluate some long-standing engineering techniques and processes.
  • #9 MSFT researched the effects of the structure of the software engineering organization on the quality of the software that is produced. Quality was measured as post-release defect density. The OS was broken up into roughly 3,000 components, and several organizational measures were used: the number of primary owners, how many different teams contribute to a particular binary, and the level of the manager in the organization who is responsible for 75% of the code changes to a module. This information was then used to build a logistic regression model predicting defect-prone components. The model built using organizational structure had a recall of 84% and a precision of 86%. These proved to be better predictors than other good predictors like complexity, churn, code coverage, and pre-release defects. Precision is the fraction of components predicted to be defect-prone that actually are; recall is the fraction of actually defect-prone components that the model predicts.
  • #10 In another paper, MSFT investigated the effect of ownership on defects. The metrics used were the number of minor contributors (< 5% of changes), the number of major contributors (>= 5%), and the maximum percentage contributed by the top contributor. These metrics have a strong relationship with post-release defect density. To verify this finding further, they took a defect predictor that depended on contributor data and removed the data about minor contributors, which reduced the effectiveness of the predictor. The researchers made recommendations on how to account for this: pay careful attention to contributions from minor owners, sync minor owners with major owners, and allocate more testing to these contributions.
  • #11 AT&T wanted to see if information about an individual contributor can be used to predict defects. They concluded that while the cumulative number of contributors can improve the predictive power of the base model, information about a particular individual does not have a significant impact.
  • #13 Researchers at MSFT compared the expectations for code reviews with their outcomes. They noted that while the number one reason to conduct code reviews seems to be to find defects, the number one result of a code review is code improvement. They suggest this might be because you need a lot of context to provide a thorough code review that finds defects in the implementation. They also make observations on code review effectiveness: relying on code reviews alone for quality assurance is not good enough, since reviews find fewer defects than expected and rarely identify subtle defects; understanding is key to a successful code review, and context is important; code reviews are useful for tasks other than finding defects, for example knowledge sharing or finding alternative solutions; and communication is an important aspect of a successful code review and needs to be supported by the tooling.
  • #14 Engineers at Microsoft find comments that relate to the correctness of the implementation useful, and those that relate to the structure and alternative approaches somewhat useful. The researchers then went on to create a machine-learned classifier to identify useful comments and applied it to five projects within the Microsoft code base. A total of 1,496,340 review comments were categorized from 190,050 review requests. Across the five projects analyzed, 64% to 68% of review comments were classified as useful. This data set was further analyzed for insights into the attributes of reviewers and changesets that lead to a higher density of useful comments. The researchers observed that reviewers experienced with the code under review give more useful comments than reviewers looking at the code for the first time. Based on this information they recommend carefully picking reviewers, and caution that new reviewers need to be included in order to gain experience with the code base. The researchers also observed that smaller changesets get a higher density of useful comments and that source code gets a higher density of useful comments than build and configuration files. Based on this information they recommend breaking up changesets into small incremental changesets when possible and paying particular attention to build and configuration files under review. Finally, the researchers observed that useful comment density stabilizes over time for a code base, and that dips in usefulness can be analyzed by teams to understand issues with the review process.
  • #15 A large case study was conducted on the code review process at Cisco Systems. The study used a different definition of defect: "When a reviewer or consensus of reviewers determines that code must be changed before it is acceptable, it is a 'defect'." This case study provided some interesting insights into effective code reviews: lines of code under review should be under 200 and not above 400, or the reviewers start to miss defects; inspecting at a rate of less than 300 LOC per hour is best for defect detection; and total time spent on the review should be around 60 minutes and not exceed 90 minutes.
  • #17 MSFT researchers published several studies in which they saw a significant improvement in defect density for teams that used TDD compared to teams working on similar projects that didn't use test-driven development. This finding was confirmed by similar results at IBM. The IBM researchers didn't directly report on defect density improvements but indicated that the defect density was better than industry standards.
  • #18 An interesting paper was published by a research team at Avaya and MSFT. It examined the relationship between test coverage and post-verification defects. The initial measure of correlation between code coverage and defect density revealed a small negative correlation for Avaya and a small positive correlation for MSFT. The researchers argued that this might be because buggy components get more tests to prevent future bugs from being introduced. They adjusted for this by building a logistic regression model that controlled for this effect. After applying this change, they noted a strong negative relationship between coverage and defect density in both the Avaya and Microsoft systems. On the Avaya system the researchers were able to conduct further research into the effect of code coverage on defect density. They noted that the biggest drop in defect density occurred when moving to the highest coverage. They also noted that moving to the highest coverage is more expensive than getting to a baseline.
  • #20 The researchers at AT&T investigated whether bugs were concentrated in a specific set of files, testing the 80/20 rule on defect distribution. They built a model that predicted the likelihood of a file containing a defect and then took the top 20% of files to see how many actual defects those files contained. They turned out to contain between 71% and 92% of all defects. This relationship held across multiple applications at AT&T. The same team recommended that this information be used to more efficiently allocate testing resources, which are always scarce.
  • #21 At AT&T, at the time of writing, this analysis was done post hoc and didn't affect the actual development process. In a subsequent paper the researchers propose metrics that can be tracked to see if defect prediction has an effect on testing effectiveness. The metrics proposed were the number of faults detected at different stages of testing and the average time to detect a bug.
  • #22 Researchers at Google wanted to find out why defect predictors are not used in industry. They evaluated a few defect predictors on two Google systems and noted that the existing predictors didn't necessarily select files that the developers thought were buggy. They then improved on existing algorithms and integrated a defect predictor into the code review process. They noticed that engineers didn't spend any more time on the flagged files. They concluded that the current tools don't provide the information necessary to help practitioners and suggest that predictors should provide a way to fix the defect.
  • #24 Google researchers investigated using FindBugs on Google's Java code base and other Java code bases. They noted that static analysis tools for memory-safe programming languages tend to find a significant number of trivial issues along with actual issues. At Google the bugs returned by FindBugs were carefully curated and tracked in an effort to reduce false positives from invalid or impossible bugs.
  • #25 MSFT researchers investigated using the defect density produced by static analysis tools as an indicator of pre-release defect density. They found a strong positive correlation between the two, meaning a high defect density from static analysis tools corresponds to a high defect density in pre-release testing.
  • #26 Another research team looked into the relationship between code assertions and defect density. They concluded that assertion density has a negative correlation with defect density: the more assertions, the fewer defects. They also compared these results to the results from static analysis tools.
  • #28 Facebook published details of their development and deployment process for the main facebook.com application. This is a large application which grew in both the number of contributors and the size of the code base. The tools they use to help maintain quality are unit testing, peer code reviews, feature flags, A/B tests, and frequent deployments. They also instill code ownership by requiring that developers are on call during a deployment; this leads to a strong ownership practice where fewer than 10% of files are modified by 7 or more engineers. In a different study, Facebook researchers report on using formal methods in the perpetual development process. They note that even in this process it is sometimes important to catch bugs before release, for example in mobile applications, since in that scenario deployment is controlled by the app store or the device owner. INFER is a static analysis tool which looks for null dereference and resource-use defects in Android and iOS applications. It was integrated into the code review process, where the bugs are displayed on the relevant line of code with a suggested fix.
  • #29 Google researchers report on how they keep quality high at Google. They encourage teams to be responsible for the quality of their own code. This is not necessarily enforced, but it is encouraged through certain programs. For example, they have a test certification program which at lower levels encourages teams to take some simple steps and at later levels specifies code coverage goals. Google also reports on a system they've developed called Tricorder which enables program analysis at scale. This system is used to develop and deploy 30 code analyzers on the Google code base while keeping the rate of false positives low and providing feedback to the check authors. Tricorder is integrated into the code review process.