Nikolai Avteniev
Software Engineer
LinkedIn NYC
Software engineering practices and
software quality empirical research
results
Motivation and Inspiration
Becoming a Software Engineer
3
My Journey
• First Line Of Code (Pascal 1997)
• Discovered other languages (Assembly, C/C++, 1998)
• Discovered Java (1999)
• First Software Engineering Job (2001)
• Discovered XP (2002)
• Founded a Software Company (2004)
• Joined A Web 2.0 Startup (2009)
• First Software Conference (2011)
• Joined LinkedIn (2012)
• Discovered Empirical Software Engineering (2013)
4
Making Software
What Really Works and Why We Believe It
By: Andy Oram & Greg Wilson
Covers:
• Empirical Software Engineering
• Results From Research
5
Motivation
• What is software engineering as a craft?
• How to measure software quality?
• How do others measure software quality?
6
Summary
• Empirical Software Engineering Research Works!
• Post Release Defect Counts / Rates are useful data!
• RESULT: Structure of Engineering Organization is Important
• RESULT: Code Reviews Work, but not how we expect.
• RESULT: Unit Testing Works!
• RESULT: Defect prediction might be useful with some work
• RESULT: Static Code Analysis can help, but needs some work
• Software Giants are using empirical research to improve their process.
Software Engineering Org and Code
Ownership
The influence of organizational structure on software quality: an
empirical case study.
8
Organization Structure might Predict Quality
• Microsoft Windows Vista OS, made up of roughly 3,000 components
• Overlaid the organization structure on the components
• Built a logistic regression model to predict defective components
• The resulting model identified defective components with 84% recall and 86% precision (a sketch of this kind of evaluation follows below)
• The best model they built!
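To make the precision and recall numbers concrete, here is a minimal sketch (not the study's actual model or data) of fitting a logistic regression on organizational-structure features and scoring it; the feature names and synthetic labels below are illustrative assumptions only.

```python
# Illustrative sketch: a logistic-regression defect predictor scored with
# precision and recall, in the spirit of the organizational-structure study.
# Features and data are made up for the example.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
n = 3000  # roughly the number of Vista components cited on the slide

# Hypothetical org-structure features per component, e.g. number of engineers,
# number of ex-engineers, edit frequency, org depth of the owning manager.
X = rng.normal(size=(n, 4))
# Synthetic label: component had a post-release defect.
y = (X.sum(axis=1) + rng.normal(scale=2.0, size=n) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Precision: of the components we flagged, how many were truly defect-prone.
# Recall: of the truly defect-prone components, how many we flagged.
print(f"precision={precision_score(y_test, pred):.2f}  recall={recall_score(y_test, pred):.2f}")
```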
Don't touch my code!: examining the effects of ownership on software
quality.
9
Code Ownership Matters
• Microsoft Windows Vista and Windows 7 operating systems
• Split contributors into minor (< 5% of a component's changes) and major (>= 5%); see the sketch below
• The number of minor contributors has a significant correlation with post-release defects
• Removing minor contributors from the predictive model reduced its effectiveness
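A minimal sketch of how the minor/major ownership split might be computed from a change log; the 5% threshold comes from the paper, while the input format and helper below are illustrative assumptions.

```python
# Illustrative sketch: classify contributors to a component as minor (< 5% of
# its changes) or major (>= 5%), the ownership split used in the study.
from collections import Counter

def ownership_metrics(edits, threshold=0.05):
    """edits: list of author names, one entry per change to the component."""
    counts = Counter(edits)
    total = sum(counts.values())
    shares = {author: n / total for author, n in counts.items()}
    return {
        "minor_contributors": sum(s < threshold for s in shares.values()),
        "major_contributors": sum(s >= threshold for s in shares.values()),
        "top_ownership": max(shares.values()),
    }

# Hypothetical change log for one component.
print(ownership_metrics(["alice"] * 40 + ["bob"] * 8 + ["carol"] * 1 + ["dave"] * 1))
```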
Programmer-based fault prediction.
10
Individuals Are Unpredictable
• AT&T: 16 releases of a 1.4M LOC, 3,100-file system
• Does a developer's bug ratio remain the same between releases?
• NO!
• Does including information on a particular developer improve the accuracy of the
predictions?
• NO!
• Cumulative number of contributors can improve prediction!
Code Reviews
Expectations, outcomes, and challenges of modern code review.
12
Code reviews are not just for defects
• Microsoft Code Review Process
• Managers and ICs think code reviews are primarily for finding defects
• Code reviews result primarily in code improvements
• Finding defects requires more context and better understanding of the code
• Communication is important to effective code reviews
Characteristics of useful code reviews: an empirical study at
Microsoft.
13
Not all comments are useful
• Microsoft analyzed over a million code review comments across five teams
• Comments that identify defects: < 5%; false positives: ~11%
• 64% to 68% of review comments were classified as useful (by a learned classifier; a toy sketch follows below)
• Reviewers experienced with the code give more useful comments
• Smaller change sets get more useful comments
• Build and configuration file changes get fewer useful comments
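The usefulness labels in the study came from a machine-learned classifier trained on Microsoft's own data and features; the toy text classifier below is only a sketch of the general idea, with made-up comments and labels.

```python
# Toy sketch of a "useful comment" classifier; the Microsoft study used its own
# labeled data and features, none of which appear here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [  # hypothetical review comments, labeled 1 = useful, 0 = not useful
    "This will throw a NullPointerException when the list is empty",
    "Consider extracting this block into a helper to avoid duplication",
    "nit: extra blank line",
    "LGTM",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(comments, labels)
print(clf.predict(["Looks good to me", "This loop never terminates if n is 0"]))
```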
Best kept secrets of peer code review.
14
Code Reviews Done Right
• Cisco Systems code review process
• “defect” definition: reviewer and author agree it needs to be fixed
• Smaller code reviews are more effective
• After 90 minutes reviewers become less effective
• Inspect at no more than 300 LOC per hour (a back-of-the-envelope check follows below)
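A back-of-the-envelope check of how those two limits combine; the pace and session length come from the study, the arithmetic is just illustration.

```python
# The study's limits: inspect no faster than ~300 LOC/hour, and keep a session
# under ~90 minutes, which caps what one sitting can cover effectively.
max_rate_loc_per_hour = 300
max_session_hours = 1.5
print(max_rate_loc_per_hour * max_session_hours, "LOC per session, at most")
```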
Testing
Evaluating the efficacy of test-driven development: industrial case
studies.
16
TDD takes longer, and it’s worth it.
• Microsoft: 3 projects in total
• Compared to teams working on similar projects without TDD
• Managers estimated TDD took 15-35% longer
• Projects that didn't use TDD saw 2x the defect density
• Verified by a similar study at IBM
Test coverage and post-verification defects: A multiple case study.
17
More tests, fewer bugs, but it will cost you!
• Microsoft and Avaya
• Measured relationship between code coverage and defect density
• Better coverage correlates with reduced defect density*
• Biggest wins at 80-100% coverage
• It takes more effort to go from 50% to 90% coverage than from 0 to 50% (a correlation sketch follows below)
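A minimal sketch of the kind of measurement involved: correlating per-component coverage with defect density. The data below is synthetic; the study used real Microsoft and Avaya components and a more careful model that controlled for confounds.

```python
# Illustrative sketch: correlation between per-component test coverage and
# post-release defect density, on synthetic data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
coverage = rng.uniform(0.2, 1.0, size=200)  # fraction of code covered per component
defect_density = np.maximum(0, 5 * (1 - coverage) + rng.normal(scale=0.5, size=200))

rho, p = spearmanr(coverage, defect_density)
print(f"Spearman rho={rho:.2f} (p={p:.3g})")  # negative: more coverage, fewer defects
```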
Defect Prediction
Where the bugs are.
19
80:20 Rule For Bugs
• AT&T: two production systems
• 20% of files account for between 71% and 92% of bugs
• This can help focus testing efforts (see the sketch below)
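A minimal sketch of the 80/20 check itself: rank files by a fault-proneness score, take the top 20%, and measure the share of bugs they hold. The scores and bug counts below are synthetic stand-ins, not the AT&T model or data.

```python
# Illustrative sketch of the 80/20 check: do the top 20% of files by predicted
# fault-proneness contain most of the bugs?
import numpy as np

rng = np.random.default_rng(2)
n_files = 1000
# Synthetic bug counts, heavily skewed so a few files hold most of the bugs.
bugs = rng.pareto(a=1.5, size=n_files).round().astype(int)
# Stand-in for a model's fault-proneness score: a noisy view of the truth.
score = bugs + rng.normal(scale=1.0, size=n_files)

top20 = np.argsort(score)[::-1][: n_files // 5]
share = bugs[top20].sum() / max(bugs.sum(), 1)
print(f"top 20% of files hold {share:.0%} of the bugs")
```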
Assessing the impact of using fault-prediction in industry
20
Need More Data
• AT&T Research
• All analysis was post hoc; it did not affect how the software was developed
• Track defects over time
• Might help with testing efficiency
Does bug prediction support human developers? findings from a
Google case study.
21
Predictors aren't helpful
• Google: two production systems
• Evaluated state-of-the-art predictors
• Predictions need to be relevant, reasonable, and actionable
• “Please Fix” in code review tool had no effect on reviewers
Static Code Analysis
Evaluating static analysis defect warnings on production software.
23
FindBugs needs some work
• Google's Java code base, JRE 6, and Sun's GlassFish JEE server
• FindBugs finds true but low-impact bugs
• Out of 1,127 medium/high-priority warnings, 193 were impossible
• Needs an effort to curate the initial set of issues and a system to track the tool's effectiveness
Static analysis tools as early indicators of pre-release defect density.
24
Find Issues Before You Ship!
• Microsoft Windows Server 2003
• Two tools: PREfix and PREfast
• A high density of static analysis issues positively correlates with a high defect density found in testing
Assessing the relationship between software assertions and code
quality: an empirical investigation.
25
Asserts Are Better!
• Microsoft Visual Studio
• Assertion density has a negative correlation with defect density (a density-counting sketch follows below)
• Assertions identified more defects than static code analysis
• Code Assertions > Static Code Checks
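Assertion density is simply assertions per KLOC. The sketch below computes it for a Python source tree as an illustration; the study measured C/C++ assertions in Microsoft components, not Python code.

```python
# Illustrative sketch: assertion density (asserts per 1,000 non-blank lines)
# for a tree of Python files.
import ast
from pathlib import Path

def assertion_density(root: str) -> float:
    """Assertions per 1,000 non-blank lines of Python code under `root`."""
    asserts = loc = 0
    for path in Path(root).rglob("*.py"):
        source = path.read_text(encoding="utf-8", errors="ignore")
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip files that are not valid Python
        loc += sum(1 for line in source.splitlines() if line.strip())
        asserts += sum(isinstance(node, ast.Assert) for node in ast.walk(tree))
    return 1000 * asserts / loc if loc else 0.0

if __name__ == "__main__":
    print(f"{assertion_density('.'):.1f} asserts per KLOC")
```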
Applications in Industry
Development and deployment at Facebook.
27
Facebook
• Facebook practices perpetual development
• Unit testing and automated unit test execution
• Peer code reviews
• Feature flags (a minimal gating sketch follows below)
• Frequent deploys
• Engineers on IRC during deploys
• The INFER static analysis tool for mobile apps
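A minimal sketch of the feature-flag idea: ship code dark and turn it on for a slice of users. Facebook's production gating system is far richer; the flag name, rollout fraction, and helper below are made up for illustration.

```python
# Illustrative sketch of percentage-based feature gating.
import hashlib

FLAGS = {"new_composer": 0.10}  # hypothetical feature -> fraction of users who see it

def is_enabled(feature: str, user_id: str) -> bool:
    """Deterministically bucket a user and compare against the rollout fraction."""
    rollout = FLAGS.get(feature, 0.0)
    bucket = int(hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest(), 16) % 1000
    return bucket < rollout * 1000

for user in ("user-17", "user-42", "user-99"):
    variant = "new code path" if is_enabled("new_composer", user) else "existing behavior"
    print(user, "->", variant)
```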
Google's Innovation Factory: Testing, Culture, and Infrastructure.
28
Google
• Peer Code Reviews
• Automated Testing
• Encourage teams to improve code quality
• Tricorder turns every engineer into a static code analysis tool builder (a toy checker sketch follows below)
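A toy example of the "anyone can write a checker" idea: a tiny analyzer that flags bare except: clauses in Python. Tricorder itself is a language-agnostic platform and this is not its API; the checker below is purely illustrative.

```python
# Illustrative sketch of a tiny static check, in the spirit of Tricorder's
# pluggable analyzers (this is not Tricorder's actual interface).
import ast
import sys

def check_bare_except(source: str, filename: str = "<input>"):
    """Return findings for bare `except:` handlers in the given source."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(f"{filename}:{node.lineno}: avoid bare 'except:'; catch a specific exception")
    return findings

if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as f:
            for finding in check_bare_except(f.read(), name):
                print(finding)
```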
Q&A
30
References
1. Ayewah, Nathaniel, et al. "Evaluating static analysis defect warnings on production software." Proceedings of the 7th ACM SIGPLAN-SIGSOFT workshop on Program
analysis for software tools and engineering. ACM, 2007.
2. Bacchelli, Alberto, and Christian Bird. "Expectations, outcomes, and challenges of modern code review." Proceedings of the 2013 international conference on software
engineering. IEEE Press, 2013.
3. Bird, Christian, et al. "Don't touch my code!: examining the effects of ownership on software quality." Proceedings of the 19th ACM SIGSOFT symposium and the 13th
European conference on Foundations of software engineering. ACM, 2011.
4. Bhat, Thirumalesh, and Nachiappan Nagappan. "Evaluating the efficacy of test-driven development: industrial case studies." Proceedings of the 2006 ACM/IEEE international
symposium on Empirical software engineering. ACM, 2006.
5. Calcagno, Cristiano, et al. "Moving Fast with Software Verification." NASA Formal Methods (2015): 3-11.
6. Cohen, Jason, et al. "Best kept secrets of peer code review." (2006).
7. Copeland, Patrick. "Google's Innovation Factory: Testing, Culture, and Infrastructure." Software Testing, Verification and Validation (ICST), 2010 Third International
Conference on. IEEE, 2010.
8. Feitelson, Dror G., Eitan Frachtenberg, and Kent L. Beck. "Development and deployment at Facebook." IEEE Internet Computing 4 (2013): 8-17.
9. Johnson, Brittany, et al. "Why don't software developers use static analysis tools to find bugs?." Software Engineering (ICSE), 2013 35th International Conference on. IEEE,
2013.
31
References
10. Kudrjavets, Gunnar, Nachiappan Nagappan, and Thomas Ball. "Assessing the relationship between software assertions and code quality: an empirical investigation."
Proceedings of the 17th International Symposium on Software Reliability Engineering. 2006.
11. Lewis, Chris, et al. "Does bug prediction support human developers? Findings from a Google case study." Software Engineering (ICSE), 2013 35th International Conference
on. IEEE, 2013.
12. Mockus, Audris, Nachiappan Nagappan, and Trung T. Dinh-Trong. "Test coverage and post-verification defects: A multiple case study." Empirical Software Engineering and
Measurement, 2009. ESEM 2009. 3rd International Symposium on. IEEE, 2009.
13. Nagappan, Nachiappan, et al. "Realizing quality improvement through test driven development: results and experiences of four industrial teams." Empirical Software
Engineering 13.3 (2008): 289-302.
14. Nagappan, Nachiappan, Brendan Murphy, and Victor Basili. "The influence of organizational structure on software quality: an empirical case study." Proceedings of the 30th
international conference on Software engineering. ACM, 2008.
15. Nagappan, Nachiappan, and Thomas Ball. "Static analysis tools as early indicators of pre-release defect density." Proceedings of the 27th international conference on
Software engineering. ACM, 2005.
16. Newcombe, Chris, et al. "Use of Formal Methods at Amazon Web Services." (2014).
17. Oram, Andy, and Greg Wilson. Making software: What really works, and why we believe it. " O'Reilly Media, Inc.", 2010.
18. Ostrand, Thomas J., Elaine J. Weyuker, and Robert M. Bell. "Programmer-based fault prediction." Proceedings of the 6th International Conference on Predictive Models in Software Engineering. ACM, 2010.
32
References
19. Ostrand, Thomas J., Elaine J. Weyuker, and Robert M. Bell. "Where the bugs are." ACM SIGSOFT Software Engineering Notes. Vol. 29. No. 4. ACM, 2004.
20. Sadowski, Caitlin, et al. "Tricorder: Building a Program Analysis Ecosystem." Proceedings of the 37th International Conference on Software Engineering. IEEE, 2015.
21. Sanchez, Julio Cesar, Laurie Williams, and E. Michael Maximilien. "On the sustained use of a test-driven development practice at IBM." Agile Conference (AGILE), 2007.
IEEE, 2007.
22. Weyuker, Elaine J. "Empirical software engineering research-the good, the bad, the ugly." Empirical Software Engineering and Measurement (ESEM), 2011 International
Symposium on. IEEE, 2011.
23. Weyuker, Elaine J., Thomas J. Ostrand, and Robert M. Bell. "Assessing the impact of using fault-prediction in industry." Testing: Academic & Industrial Conference (TAIC
2011). 2011.
24. Williams, Laurie, Gunnar Kudrjavets, and Nachiappan Nagappan. "On the effectiveness of unit test automation at Microsoft." Software Reliability Engineering, 2009.
ISSRE'09. 20th International Symposium on. IEEE, 2009.


Editor's Notes

  • #5 This is a great introduction to empirical research in software engineering and the results that can be achieved by applying this method to industrial practices. I read this book shortly after it came out, and it made me question many of the things in software engineering folklore that I had accepted as truth over the roughly 10 years I had spent in the industry by then.
  • #6 This research was motivated by the desire to understand what it means to improve the way we practice software engineering as a craft. To understand that, it is important to understand how we can measure the quality of the software we produce. Others have attempted similar things; what measurements did they use to learn whether they were improving their craft? The idea is to create a dashboard of how well the craft is practiced, and this work was motivated by the need to know what to put on that dashboard.
  • #7 Empirical software engineering research is being used to evaluate engineering best practices in industrial settings, where running a control group is expensive and challenging. Post-release defect counts have been used to evaluate some long-standing engineering techniques and processes.
  • #9 MSFT researched the effects of the structure of the software engineering organization on the quality of the software that is produced. Quality was measured as post-release defect density. The OS was broken up into roughly 3,000 components, and several organizational measures were used: the number of primary owners, how many different teams contribute to a particular binary, and the level of the manager in the organization who is responsible for 75% of the code changes to a module. This information was then used to build a logistic regression model predicting defect-prone components. The model built using organizational structure had a recall of 84% and a precision of 86%. These proved to be better predictors than other good predictors like complexity, churn, code coverage, and pre-release defects. Precision is the fraction of components predicted to be defect-prone that actually are; recall is the fraction of actually defect-prone components that the model predicts.
  • #10 In another paper, MSFT investigated the effect of ownership on defects. The metrics used were the number of minor contributors (< 5% of changes), the number of major contributors (>= 5%), and the maximum percentage contributed by the top contributor. These metrics have a strong relationship with post-release defect density. To verify this finding further, they took a defect predictor that depended on contributor data and removed the data about minor contributors, which reduced the effectiveness of the predictor. The researchers made recommendations on how to account for this: pay careful attention to contributions from minor owners, sync minor owners with major owners, and allocate more testing to these contributions.
  • #11 AT&T wanted to see if information about an individual contributor can be used to predict defects. They concluded that while the cumulative number of contributors can improve the predictive power of the base model, information about a particular individual does not have a significant impact.
  • #13 Researchers at MSFT compared the expectations for code reviews with their outcomes. They noted that while the number one reason to conduct code reviews seems to be to find defects, the number one result of a code review is code improvement. They suggest this might be because you need a lot of context to provide a thorough code review that finds defects in the implementation. They also make observations on code review effectiveness: relying on code reviews alone for quality assurance is not good enough, since reviews find fewer defects than expected and rarely identify subtle defects; understanding is key to a successful code review, and context is important; code reviews are useful for tasks other than finding defects, for example knowledge sharing or finding alternative solutions; and communication is an important aspect of a successful code review and needs to be supported by the tooling.
  • #14 Engineers at Microsoft find comments that relate to the correctness of the implementation useful, and those that relate to the structure and alternative approaches somewhat useful. The researchers then went on to create a machine-learned classifier to identify useful comments and applied it to five projects within the Microsoft code base. A total of 1,496,340 review comments were categorized from 190,050 review requests. Across the five projects analyzed, 64% to 68% of review comments were classified as useful. This data set was further analyzed for insights into the attributes of reviewers and changesets that lead to a higher density of useful comments. The researchers observed that reviewers experienced with the code under review give more useful comments than reviewers looking at the code for the first time. Based on this information they recommend carefully picking reviewers, and caution that new reviewers need to be included in order to gain experience with the code base. The researchers also observed that smaller changesets get a higher density of useful comments and that source code gets a higher density of useful comments than build and configuration files. Based on this information they recommend breaking up changesets into small incremental changesets when possible and paying particular attention to build and configuration files under review. Finally, the researchers observed that useful comment density stabilizes over time for a code base, and that dips in usefulness can be analyzed by teams to understand issues with the review process.
  • #15 A large case study was conducted on the code review process at Cisco Systems. The study used a different definition of defect: "When a reviewer or consensus of reviewers determines that code must be changed before it is acceptable, it is a 'defect'." This case study provided some interesting insights into effective code reviews: lines of code under review should be under 200 and not above 400, or the reviewers start to miss defects; inspecting at a rate of less than 300 LOC per hour is best for defect detection; and total time spent on the review should be around 60 minutes and not exceed 90 minutes.
  • #17 MSFT researchers published several studies in which they saw a significant improvement in defect density for teams that used TDD compared to teams working on similar projects that didn't use test-driven development. This finding was confirmed by similar results at IBM. The IBM researchers didn't directly report on defect density improvements but indicated that the defect density was better than industry standards.
  • #18 An interesting paper was published by a research team at Avaya and MSFT. It examined the relationship between test coverage and post-verification defects. The initial measure of correlation between code coverage and defect density revealed a small negative correlation for Avaya and a small positive correlation for MSFT. The researchers argued that this might be because buggy components get more tests to prevent future bugs from being introduced. They adjusted for this by building a logistic regression model that controlled for this effect. After applying this change, they noted a strong negative relationship between coverage and defect density in both the Avaya and Microsoft systems. On the Avaya system the researchers were able to conduct further research into the effect of code coverage on defect density. They noted that the biggest drop in defect density occurred when moving to the highest coverage. They also noted that moving to the highest coverage is more expensive than getting to a baseline.
  • #20 The researchers at AT&T investigated whether bugs were concentrated in a specific set of files, testing the 80/20 rule on defect distribution. They built a model that predicted the likelihood of a file containing a defect and then took the top 20% of files to see how many actual defects those files contained. They turned out to contain between 71% and 92% of all defects. This relationship held across multiple applications at AT&T. The same team recommended that this information be used to more efficiently allocate testing resources, which are always scarce.
  • #21 At AT&T, at the time of writing, this analysis was done post hoc and didn't affect the actual development process. In a subsequent paper the researchers propose metrics that can be tracked to see if defect prediction has an effect on testing effectiveness. The metrics proposed were the number of faults detected at different stages of testing and the average time to detect a bug.
  • #22 Researchers at Google wanted to find out why defect predictors are not used in industry. They evaluated a few defect predictors on two Google systems and noted that the existing predictors didn't necessarily select files that the developers thought were buggy. They then improved on existing algorithms and integrated a defect predictor into the code review process. They noticed that engineers didn't spend any more time on the flagged files. They concluded that the current tools don't provide the information necessary to help practitioners and suggest that predictors should provide a way to fix the defect.
  • #24 Google researchers investigated using FindBugs on Google's Java code base and other Java code bases. They noted that static analysis tools for memory-safe programming languages tend to find a significant number of trivial issues along with actual issues. At Google the bugs returned by FindBugs were carefully curated and tracked in an effort to reduce false positives from invalid or impossible bugs.
  • #25 MSFT researchers investigated using the defect density produced by static analysis tools as an indicator of pre-release defect density. They found a strong positive correlation between the two, meaning a high defect density from static analysis tools corresponds to a high defect density in pre-release testing.
  • #26 Another research team looked into the relationship between code assertions and defect density. They concluded that assertion density has a negative correlation with defect density: the more assertions, the fewer defects. They also compared these results to the results from static analysis tools.
  • #28 Facebook published details of their development and deployment process for the main facebook.com application. This is a large application which grew in both the number of contributors and the size of the code base. The tools they use to help maintain quality are unit testing, peer code reviews, feature flags, A/B tests, and frequent deployments. They also instill code ownership by requiring that developers are on call during a deployment; this leads to a strong ownership practice where fewer than 10% of files are modified by 7 or more engineers. In a different study, Facebook researchers report on using formal methods in the perpetual development process. They note that even in this process it is sometimes important to catch bugs before release, for example in mobile applications, since in that scenario deployment is controlled by the app store or the device owner. INFER is a static analysis tool which looks for null dereference and resource-use defects in Android and iOS applications. It was integrated into the code review process, where the bugs are displayed on the relevant line of code with a suggested fix.
  • #29 Google researchers report on how they keep quality high at Google. They encourage teams to be responsible for the quality of their own code. This is not necessarily enforced, but it is encouraged through certain programs. For example, they have a test certification program which at lower levels encourages teams to take some simple steps and at later levels specifies code coverage goals. Google also reports on a system they've developed called Tricorder which enables program analysis at scale. This system is used to develop and deploy 30 code analyzers on the Google code base while keeping the rate of false positives low and providing feedback to the check authors. Tricorder is integrated into the code review process.