
Friday, September 29, 2017

Agile Application Security book

This is the first post in a while. I've been busy working on a bunch of projects. One of them is now finally complete: a book on Agile Application Security for O'Reilly, with Laura Bell, Michael Brunton-Spall, and Rich Smith.

In this book we try to build bridges between the security community and Agile teams, by taking advantage of our different experiences and viewpoints:

  • Rich's extensive experience as a pen tester and in running the security team at Etsy.
  • Michael's experience in hyperdrive Agile development, DevOps and security at The Guardian and the UK Government Digital Service.
  • Laura's work as a software developer and application security cat herder with large and small organizations at many different stages of their journeys to Agile adoption.
  • My work in development and operations in enterprise financial technology.

This is a unique book that looks at Agile from a security perspective, and security from an Agile perspective.

We explain the driving ideas and key problems in security, the core enabling practices in Agile that help teams succeed, and how security programs can leverage Agile ideas and practices: how to deal with important risks and problems, and how to scale.

We look in detail at security practices and tools in an Agile context: threat and risk management, how to think about security in requirements, secure coding and code reviews, security testing in Continuous Integration and Continuous Deployment, what scanning can and cannot do for you, building hardened infrastructure and running secure systems, and putting all of this together into automated pipelines and feedback loops.

We also step through regulatory compliance and how to achieve continuous compliance; and how to get value from working with outsiders, including auditors, pen testers and bug bounty programs. We end with how to build an agile security culture and how to break down walls between engineers and security.

It was a unique opportunity to work with experts around the world: Michael in the UK, Laura in New Zealand, Rich in the US. Challenging, exhausting, and a great learning experience.

Our hope is that it offers value to developers who work in Agile environments and are new to security; to people in the security community who want to understand how security can keep up with high-velocity Agile and DevOps teams; and even to people who are expert in both.

Monday, June 22, 2015

What does DevOps mean to Developers?

A post that I wrote for JavaWorld, DevOps for Developers: A New Agility, describes how DevOps changes the way that developers work and what they need to know to succeed.

Wednesday, March 4, 2015

Putting Security into Sprints

To build a secure app, you can’t wait until the end and hope to “test security in”. For teams who follow Agile methods like Scrum, this means you have to find a way to add security into Sprints. Here’s how to do it:

Sprint Zero

A few basic security steps need to be included upfront in Sprint Zero:

  1. Platform selection – when you are choosing your language and application framework, take some time to understand the security functions they provide. Then look around for security libraries like Apache Shiro (a framework for authentication, session management and access control), Google Keyczar (crypto), and the OWASP Java Encoder (XSS protection) to fill in any blanks (see the sketch after this list).
  2. Data privacy and compliance requirements – make sure that you understand what data needs to be protected and audited for compliance purposes (including PII), and what you will need to prove to compliance auditors.
  3. Secure development training – check the skill level of the team, fill in as needed with training on secure coding. If you can’t afford training, buy a couple of copies of Iron-Clad Java, and check out SAFECode’s free seminars on secure coding.
  4. Coding guidelines and code review guidelines – consider where security fits in. Take a look at CERT’s Secure Java Coding Guidelines.
  5. Testing approach – plan for security unit testing in your Continuous Integration pipeline. And choose a static analysis tool and wire it into Continuous Integration too. Plan for pen testing or other security stage gates/reviews later in development.
  6. Assigning a security lead - someone on the team who has experience and training in secure development (or who will get extra training in secure development), or someone from infosec, who will act as the point person on risk assessments, lead threat modeling sessions, coordinate pen testing and scanning, triage the vulnerabilities found, and bring new developers up to speed.
  7. Incident Response - think about how the team will help ops respond to outages and to security incidents.
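
To make the platform and library choices concrete, here is a minimal sketch of what leaning on security libraries instead of rolling your own might look like in Java, assuming Apache Shiro is already configured (for example through shiro.ini) and the OWASP Java Encoder is on the classpath; the user name, password and permission string are placeholders:

    import org.apache.shiro.SecurityUtils;
    import org.apache.shiro.authc.UsernamePasswordToken;
    import org.apache.shiro.subject.Subject;
    import org.owasp.encoder.Encode;

    public class SecurityLibrariesSketch {
        public static void main(String[] args) {
            // Authentication and session management handled by Apache Shiro
            // instead of hand-rolled login code.
            Subject user = SecurityUtils.getSubject();
            if (!user.isAuthenticated()) {
                user.login(new UsernamePasswordToken("alice", "placeholder-password"));
            }

            // Access control: ask the framework, don't scatter role checks.
            if (user.isPermitted("account:transfer")) {
                // ... perform the sensitive operation ...
            }

            // Output encoding with the OWASP Java Encoder to protect against XSS.
            String untrusted = "<script>alert('xss')</script>";
            System.out.println(Encode.forHtml(untrusted));
        }
    }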

Early Sprints

The first few Sprints, where you start to work out the design and build out the platform and the first-ofs for key interfaces and integration points, are when the application’s attack surface expands quickly.

You need to do threat modeling to understand security risks and make sure that you are handling them properly.

Start with Adam Shostack’s 4 basic threat modeling questions:

  1. What are you building?
  2. What can go wrong?
  3. What are you going to do about it?
  4. Did you do an acceptable job at 1-3?

Delivering Features (Securely)

A lot of development work is business as usual, delivering features that are a lot like the other features that you’ve already done: another screen, another API call, another report or another table. There are a few basic security concerns that you need to keep in mind when you are doing this work. Make sure that problems caught by your static analysis tool or security tests are reviewed and fixed. Watch out in code reviews for proper use of frameworks and libraries, and for error and exception handling and defensive coding.

Take some extra time when a security story comes up (a new security feature or a change to security or privacy requirements), and think about abuser stories whenever you are working on a feature that deals with something important like money, or confidential data, or secrets, or command-and-control functions.

Heavy Lifting

You need to think about security any time you are doing heavy lifting: large-scale refactoring, upgrading framework code or security plumbing or the run-time platform, introducing a new API or integrating with a new system. Just like when you are first building out the app, spend extra time threat modeling, and be more careful in testing and in reviews.

Security Sprints

At some point later in development you may need to run a security Sprint or hardening Sprint – to get the app ready for release to production, or to deal with the results of a pen test or vulnerability scan or security audit, or to clean up after a security breach.

This could involve all or only some of the team. It might include reviewing and fixing vulnerabilities found in pen testing or scanning. Checking for vulnerabilities in third party and Open Source components and patching them. Working with ops to review and harden the run-time configuration. Updating and checking your incident response plan, or improving your code review or threat modeling practices, or reviewing and improving your security tests. Or all of the above.

Adding Security into Sprints. Just Do It.

Adding security into Sprints doesn’t have to be hard or cost a lot. A stripped down approach like this will take you a long way to building secure software. And if you want to dig deeper into how security can fit into Sprints, you can try out Microsoft’s SDL for Agile. Just do it.

Tuesday, January 20, 2015

ThoughtWorks Takes Security Sandwiches off the Menu

Most people in software development have heard about ThoughtWorks.

ThoughtWorks' Chief Scientist, Martin Fowler, is one of the original Agile thought leaders, and they continue to drive new ideas in Agile development and devops, including Continuous Delivery.

At least once a year the thought leaders of ThoughtWorks get together and publish a Technology Radar – a map of the techniques and tools and ideas that they are having success with and recommend to other developers, or that they are trying out in their projects and think other people should know more about, or that they have seen fail and want to warn other people about.

I always look forward to reading the Radar when it comes out. It’s a good way to learn about cool tools and new ideas, especially in devops, web and mobile development, Cloudy stuff and IoT, and other things that developers should know about.

But until recently, security has been conspicuously absent from the Radar – which meant that security wasn't something that ThoughtWorks developers thought was important or interesting enough to share. Over the last year this has changed, and ThoughtWorks has started to include application security and data privacy concerns in design, development and delivery, including privacy vs big data, forward secrecy, two-factor authentication, OpenID Connect, and the OWASP Top 10.

The first Radar of 2015 recommends that organizations avoid the “Security Sandwich” approach to implementing appsec in development projects, and instead look for ways to build security into Agile development:

Traditional approaches to security have relied on up-front specification followed by validation at the end. This “Security Sandwich” approach is hard to integrate into Agile teams, since much of the design happens throughout the process, and it does not leverage the automation opportunities provided by continuous delivery. Organizations should look at how they can inject security practices throughout the agile development cycle.

This includes: evaluating the right level of Threat Modeling to do up-front; when to classify security concerns as their own stories, acceptance criteria, or cross-cutting non-functional requirements; including automatic static and dynamic security testing into your build pipeline; and how to include deeper testing, such as penetration testing, into releases in a continuous delivery model. In much the same way that DevOps has recast how historically adversarial groups can work together, the same is happening for security and development professionals.

The sandwich – policies upfront, and pen testing at the end to “catch all the security bugs” – doesn't work, especially for Agile teams and teams working in devops environments. Teams who use lightweight, iterative, incremental development practices and release working software often need tools and practices to match. Instead of scan-at-the-end-then-try-to-fix, we need simple, efficient checks and guides that can be embedded into Agile development, and faster, more efficient tools that provide immediate feedback in Continuous Integration and Continuous Delivery. And we need development and security working together more closely and more often.

It’s good to see pragmatic application security on the ThoughtWorks Radar. I hope it’s on your radar too.

Tuesday, January 13, 2015

We can’t measure Programmer Productivity… or can we?

If you go to Google and search for "measuring software developer productivity" you will find a whole lot of nothing. Seriously -- nothing.
Nick Hodges, Measuring Developer Productivity

By now we should all know that we don’t know how to measure programmer productivity.

There is no clear cut way to measure which programmers are doing a better or faster job, or to compare productivity across teams. We “know” who the stars on a team are, who we can depend on to deliver, and who is struggling. And we know if a team is kicking ass – or dragging their asses. But how do we prove it? How can we quantify it?

All sorts of stupid and evil things can happen when you try to measure programmer productivity.

But let’s do it anyways.

We’re writing more code, so we must be more productive

Developers are paid to write code. So why not measure how much code they write – how many lines of code get delivered?

Because we've known since the 1980s that this is a lousy way to measure productivity.

Lines of code can’t be compared across languages (of course), or even between programmers using the same language working in different frameworks or following different styles. Which is why Function Points were invented – an attempt to standardize and compare the size of work in different environments. Sounds good, but Function Points haven’t made it into the mainstream, and probably never will – very few people know how Function Points work, how to calculate them and how they should be used.

The more fundamental problem is that measuring productivity by lines (or Function Points or other derivatives) typed doesn’t make any sense. A lot of important work in software development, the most important work, involves thinking and learning – not typing.

The best programmers spend a lot of time understanding and solving hard problems, or helping other people understand and solve hard problems, instead of typing. They find ways to simplify code and eliminate duplication. And a lot of the code that they do write won’t count anyways, as they iterate through experiments and build prototypes and throw all of it away in order to get to an optimal solution.

The flaws in these measures are obvious if we consider the ideal outcomes: the fewest lines of code possible in order to solve a problem, and the creation of simplified, common processes and customer interactions that reduce complexity in IT systems. Our most productive people are those that find ingenious ways to avoid writing any code at all.
Jez Humble, The Lean Enterprise

This is clearly one of those cases where size doesn’t matter.

We’re making (or saving) more money, so we must be working better

We could try to measure productivity at a high level using profitability or financial return on what each team is delivering, or some other business measure such as how many customers are using the system – if developers are making more money for the business (or saving more money), they must be doing something right.

Using financial measures seems like a good idea at the executive level, especially now that “every company is a software company”. These are organizational measures that developers should share in. But they are not effective – or fair – measures of developer productivity. Too many business factors are outside of the development team’s control. Some products or services succeed even if the people delivering them are doing a lousy job, or fail even if the team did a great job. Focusing on cost savings in particular leads many managers to cut people and try “to do more with less” instead of investing in real productivity improvements.

And as Martin Fowler points out, there is a time lag, especially in large organizations – it can sometimes take months or years to see real financial results from an IT project, or from productivity improvements.

We need to look somewhere else to find meaningful productivity metrics.

We’re going faster, so we must be getting more productive

Measuring speed of development – velocity in Agile – looks like another way to measure productivity at the team level. After all, the point of software development is to deliver working software. The faster that a team delivers, the better.

But velocity (how much work, measured in story points or feature points or ideal days, that the team delivers in a period of time) is really a measure of predictability, not productivity. Velocity is intended to be used by a team to measure how much work they can take on, to calibrate their estimates and plan their work forward.

Once a team’s velocity has stabilized, you can measure changes in velocity within the team as a relative measure of productivity. If the team’s velocity is decelerating, it could be an indicator of problems in the team or the project or the system. Or you can use velocity to measure the impact of process improvements, to see if training or new tools or new practices actually make the team’s work measurably faster.

But you will have to account for changes in the team, as people join or leave. And you will have to remember that velocity is a measure that only makes sense within a team – that you can’t compare velocity between teams.

Although this doesn't stop people from trying. Some shops use the idea of a well-known reference story that all teams in a program understand and use to base their story point estimates on. As long as teams aren't given much freedom in how they come up with estimates, and as long as the teams are working in the same project or program with the same constraints and assumptions, you might be able to do a rough comparison of velocity between teams. But Mike Cohn warns that

If teams feel the slightest indication that velocities will be compared between teams there will be gradual but consistent “point inflation.”

ThoughtWorks explains that velocity <> productivity in their latest Technology Radar:

We continue to see teams and organizations equating velocity with productivity. When properly used, velocity allows the incorporation of “yesterday's weather” into a team’s internal iteration planning process. The key here is that velocity is an internal measure for a team, it is just a capacity estimate for that given team at that given time. Organizations and managers who equate internal velocity with external productivity start to set targets for velocity, forgetting that what actually matters is working software in production. Treating velocity as productivity leads to unproductive team behaviors that optimize this metric at the expense of actual working software.

Just stay busy

One manager I know says that instead of trying to measure productivity

“We just stay busy. If we’re busy working away like maniacs, we can look out for problems and bottlenecks and fix them and keep going”.

In this case you would measure – and optimize for – cycle time, like in Lean manufacturing.

Cycle time – turnaround time or change lead time, from when the business asks for something to when they get it in their hands and see it working – is something that the business cares about, and something that everyone can see and measure. And once you start looking closely, waste and delays will show up as you measure waiting/idle time, value-add vs. non-value-add work, and process cycle efficiency (total value-add time / total cycle time).
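
As a quick worked example (with made-up numbers): if a change takes 10 days from the business request to running in production, but only 2 of those days are active analysis, coding and testing, process cycle efficiency is 2/10 = 20% – and the 8 days of queues, approvals and handoffs are the first place to look for waste.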

“It’s not important to define productivity, or to measure it. It’s much more important to identify non-productive activities and drive them down to zero.”
Erik Simmons, Intel

Teams can use Kanban to monitor – and limit – work in progress and identify delays and bottlenecks. And Value Stream Mapping to understand the steps, queues, delays and information flows which need to be optimized. To be effective, you have to look at the end-to-end process from when requests are first made to when they are delivered and running, and optimize all along the path, not just the work in development. This may mean changing how the business prioritizes, how decisions are made and who makes the decisions.

In almost every case we have seen, making one process block more efficient will have a minimal effect on the overall value stream. Since rework and wait times are some of the biggest contributors to overall delivery time, adopting “agile” processes within a single function (such as development) generally has little impact on the overall value stream, and hence on customer outcomes.
Jez Humble, The Lean Enterprise

The downside of equating delivery speed with productivity? Optimizing for cycle time/speed of delivery by itself could lead to problems over the long term, because it incents people to think short term, and to cut corners and take on technical debt.

We’re writing better software, so we must be more productive

“The paradox is that when managers focus on productivity, long-term improvements are rarely made. On the other hand, when managers focus on quality, productivity improves continuously.”
John Seddon, quoted in The Lean Enterprise

We know that fixing bugs later costs more. Whether it’s 10x or 100+x, it doesn't really matter. And that projects with fewer bugs are delivered faster – at least up to a point of diminishing returns for safety-critical and life-critical systems.

And we know that the costs of bugs and mistakes in software to the business can be significant. Not just development rework costs and maintenance and support costs. But direct costs to the business. Downtime. Security breaches. Lost IP. Lost customers. Fines. Lawsuits. Business failure.

It’s easy to measure whether you are writing good – or bad – software. Defect density. Defect escape rates (especially defects – including security vulnerabilities – that escape to production). Static analysis metrics on the code base, using tools like SonarQube.

And we know how to write good software - or we should know by now. But is software quality enough to define productivity?

Devops – Measuring and Improving IT Performance

Devops teams who build/maintain and operate/support systems extend productivity from dev into ops. They measure productivity across two dimensions that we have already looked at: speed of delivery, and quality.

But devops isn't limited to just building and delivering code – instead it looks at performance metrics for end-to-end IT service delivery:

  1. Delivery Throughput: deployment frequency and lead time, maximizing the flow of work into production
  2. Service Quality: change failure rate and MTTR
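
To make these concrete (hypothetical numbers): a team that deploys 40 changes in a month and sees 2 of them fail has a change failure rate of 2/40 = 5%; if those two failures took 30 minutes and 90 minutes to restore service, MTTR for the month is 60 minutes.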

It’s not a matter of just delivering software faster or better. It’s dev and ops working together to deliver services better and faster, striking a balance between moving too fast or trying to do too much at a time, and excessive bureaucracy and over-caution resulting in waste and delays. Dev and ops need to share responsibility and accountability for the outcome, and for measuring and improving productivity and quality.

As I pointed out in an earlier post, this makes operational metrics more important than developer metrics. According to recent studies, success in achieving these goals leads to improvements in business success: not just productivity, but market share and profitability.

Measure Outcomes, not Output

In The Lean Enterprise (which you can tell I just finished reading), Jez Humble talks about the importance of measuring productivity by outcome – measuring things that matter to the organization – not output.

“It doesn't matter how many stories we complete if we don’t achieve the business outcomes we set out to achieve in the form of program-level target conditions”.

Stop trying to measure individual developer productivity.

It’s a waste of time.

Everyone knows who the top performers are. Point them in the right direction, and keep them happy.

Everyone knows the people who are struggling. Get them the help that they need to succeed.

Everyone knows who doesn't fit in. Move them out.

Measuring and improving productivity at the team or (better) organization level will give you much more meaningful returns.

When it comes to productivity:

  1. Measure things that matter – things that will make a difference to the team or to the organization. Measures that are clear, important, and that aren't easy to game.
  2. Use metrics for good, not for evil – to drive learning and improvement, not to compare output between teams or to rank people.

I can see why measuring productivity is so seductive. If we could do it we could assess software much more easily and objectively than we can now. But false measures only make things worse.
Martin Fowler, CannotMeasureProductivity

Wednesday, November 19, 2014

Different Ways of Scaling Agile

At this year's Construx Software Executive Summit one of the problems that we explored was how to scale software development, especially Agile development, across projects, portfolios, geographies and enterprises. As part of this, we looked at 3 different popular methods for scaling Agile: LeSS (Large Scale Scrum), SAFe (Scaled Agile Framework), and DAD (Disciplined Agile Delivery).

LeSS and LeSS Huge - Large Scale Scrum

Craig Larman, the co-author of LeSS (and LeSS Huge - for really big programs), started off by criticizing the "contract game" or "commitment game" that management, developers and customers traditionally play to shift blame upfront for when things (inevitably) go wrong on a project. It was provocative and entertaining, but it had little to do with scaling Agile.

He spent the rest of his time building the case for restructuring organizations around end-to-end cross-functional feature teams who deliver working code, rather than specialist component teams and functional groups or matrices. Feature teams can move faster by sharing code and knowledge, solving problems together and minimizing handoffs and delays.

Enterprise architecture in LeSS seems easy. Every team member is a developer - and every developer is an architect. Architects work together outside of teams and projects in voluntary Communities of Practice to collaborate and shape the organization's architecture together. This sounds good - but architecture, especially in large enterprise environments, is too important to try and manage out-of-band. LeSS doesn't explain how eliminating specialization and working without upfront architecture definition and architectural standards and oversight will help build big systems that work with other big systems.

LeSS is supposed to be about scaling up, but most of what LeSS lays out looks like Scrum done by lots of people at the same time. It's not clear where Scrum ends and LeSS starts.

SAFe - Scaled Agile Framework

There's no place for management in LeSS (except for Product Owners, who are the key constraint for success - like in Scrum). Implementing LeSS involves fundamentally restructuring your organization around business-driven programs and getting rid of managers and specialists.

Managers (as well as architects and other specialists) do have a role in SAFe's Scaled Agile Framework - a more detailed and heavyweight method that borrows from Lean, Agile and sequential Waterfall development approaches. Teams following Scrum (and some XP technical practices) to build working code roll up into programs and portfolios, which need to be managed and coordinated.

In fact, there is so much for managers to do in SAFe as "Lean-Agile Leaders" that Dean Leffingwell spent most of his time enumerating and elaborating the roles and responsibilities of managers in scaling Agile programs and leading change.

Some of the points that stuck with me:

  • The easiest way to change culture is to have success. Focus on execution, not culture, and change will follow.
  • From Deming: Only managers can change the system - because managers create systems. Change needs to come from the middle.
  • Managers need to find ways to push decisions down to teams and individuals, giving them strong and clear "decision filters" so that they understand how to make their own decisions.

DAD - Disciplined Agile Delivery

Scott Ambler doesn't believe that there is one way to scale Agile development, because in an enterprise different teams and projects will deliver different kinds of software in different ways: some may be following Scrum or XP, or Kanban, or Lean Startup with Continuous Deployment, or RUP, or SAFe, or a sequential Waterfall approach (whether they have good reasons, or not so good reasons, for working the way that they do).

Disciplined Agile Delivery (DAD) is not a software development method or project management framework - it is a decision-making framework that looks at how to plan, build and run systems across the enterprise. DAD layers over Scrum/XP, Lean/Kanban or other lifecycles, helping managers make decisions about how to manage projects, how to manage risks, and how to drive change.

Projects, and people working in projects, need to be enterprise-aware - they need to work within the constraints of the organization, follow standards, satisfy compliance, integrate with legacy systems and with other projects and programs, and leverage shared resources and expertise and other assets across the organization.

Development isn't the biggest problem in scaling Agile. Changes need to be made in many different parts of the organization in order to move faster: governance (including the PMO), procurement, finance, compliance, legal, product management, data management, ops, ... and these changes can take a long time. In Disciplined Agile Delivery, this isn't easy, and it's not exciting. It just has to be done.

Scaling Agile is Hard, but it's worth it

Almost all of us agreed with Dean Leffingwell that "nothing beats Agile at the team level". But achieving the same level of success at the organizational level is a hard problem. So hard that none of the people who are supposed to be experts at it could clearly explain how to do it.

After talking to senior managers from many different industries and different countries, I learned that most organizations seem to be finding their own way, blending sequential Waterfall stage-gate development and large-scale program management practices at the enterprise-level with Agile at the team level. Using Agile approaches to explore ideas and requirements, prototyping and technical spikes to help understand viability and scope and technical needs and risks early, before chartering projects. Starting off these projects with planning and enough analysis and modeling upfront to identify key dependencies and integration points, then getting Agile teams to fill in the details and deliver working software in increments. Managing these projects like any other projects, but with more transparency into the real state of software development - because you get working software instead of status reports.

The major advantage of Agile at scale isn't the ability to react to continuous changes or even to deliver faster or cheaper. It's knowing sooner whether you should keep going, change direction, or stop and do something else instead.

Tuesday, July 29, 2014

Devops isn't killing developers – but it is killing development and developer productivity

Devops isn't killing developers – at least not any developers that I know.

But Devops is killing development, or the way that most of us think of how we are supposed to build and deliver software. Agile loaded the gun. Devops is pulling the trigger.

Flow instead of Delivery

A sea change is happening in the way that software is developed and delivered. Large-scale waterfall software development projects gave way to phased delivery and Spiral approaches, and then to smaller teams delivering working code in time boxes using Scrum or other iterative Agile methods. Now people are moving on from Scrum to Kanban, and to One-Piece Continuous Flow with immediate and Continuous Deployment of code to production in Devops.

The scale and focus of development continues to shrink, and so does the time frame for making decisions and getting work done. From phases and milestones and project reviews, to sprints and sprint reviews, to Lean controls over WIP limits and task-level optimization. The size of deliverables shrinks too: from what a project team could deliver in a year, to what a Scrum team could get done in a month or a week, to what an individual developer can get working in production in a couple of days or a couple of hours.

The definition of “Done” and “Working Software” changes from something that is coded and tested and ready to demo to something that is working in production – now (“Done Means Released”).

Continuous Delivery and Continuous Deployment replace Continuous Integration. Rapid deployment to production doesn't leave time for manual testing or for manual testers, which means developers are responsible for catching all of the bugs themselves before code gets to production – or doing their testing in production and trying to catch problems as they happen (aka “Monitoring as Testing”).

Because Devops brings developers much closer to production, operational risks become more important than project risks, and operational metrics become more important than project metrics. System uptime and cycle time to production replace Earned Value or velocity. The stress of hitting deadlines is replaced by the stress of firefighting in production and being on call.

Devops isn't about delivering a project or even delivering features. It’s about minimizing lead time and maximizing flow of work to production, recognizing and eliminating junk work and delays and hand offs, improving system reliability and cutting operational costs, building in feedback loops from production to development, standardizing and automating steps as much as possible. It’s more manufacturing and process control than engineering.

Devops kills Developer Productivity too

Devops also kills developer productivity.

Whether you try to measure developer productivity by LOC or Function Points or Feature Points or Story Points or velocity or some other measure of how much code is written, less coding gets done because developers are spending more time on ops work and dealing with interruptions, and less time writing code.

Time learning about the infrastructure and the platform and understanding how it is setup and making sure that it is setup right. Building Continuous Delivery and Continuous Deployment pipelines and keeping them running. Helping ops to investigate and resolve issues, responding to urgent customer requests and questions, looking into performance problems, monitoring the system to make sure that it is working correctly, helping to run A/B experiments, pushing changes and fixes out… all take time away from development and pre-empt thinking about requirements and designing and coding and testing (the work that developers are trained to do and are good at).

The Impact of Interruptions and Multi-Tasking

You can’t protect developers from interruptions and changes in priorities in Devops, even if you use Kanban with strict WIP limits, even in a tightly run shop – and you don’t want to. Developers need to be responsive to operations and customers, react to feedback from production, jump on problems and help detect and resolve failures as quickly as possible. This means everyone, especially your most talented developers, need to be available for ops most if not all of the time.

Developers join ops on call after hours, which means carrying a pager (or being chased by PagerDuty) after the day’s work is done. It means time wasted on support calls for problems that end up not being real problems, and long nights and weekends firefighting and tracking down production issues and helping to recover from failures, then coming in tired the next day to spend more time on incident dry runs, testing failover and roll-forward and roll-back recovery, and participating in post mortems and root cause analysis sessions when something goes wrong and the failover or roll-forward or roll-back doesn’t work.

You can’t plan for interruptions and operational problems, and you can’t plan around them. Which means developers will miss their commitments more often. Then why make commitments at all? Why bother planning or estimating? Use just-in-time prioritization instead to focus in on the most important thing that ops or the customer need at the moment, and deliver it as soon as you can – unless something more important comes up and pre-empts it.

As developers take on more ops and support responsibilities, multi-tasking and task switching – and the interruptions and inefficiency that come with it – increase, fracturing time and destroying concentration. This has an immediate drag on productivity, and a longer term impact on people’s ability to think and to solve problems.

Even the Continuous Deployment feedback loop itself is an interruption to a developer’s flow.

After a developer checks in code, running unit tests in Continuous Integration is supposed to be fast, a few seconds or minutes, so that they can keep moving forward with their work. But to deploy immediately to production means running through a more extensive set of integration tests and systems tests and other checks in Continuous Delivery (more tests and more checks takes more time), then executing the steps through to deployment, and then monitoring production to make sure that everything worked correctly, and jumping in if anything goes wrong. Even if most of the steps are automated and optimized, all of this takes extra time and the developer’s attention away from working on code.

Optimizing the flow of work in and out of operations means sacrificing developer flow, and slowing down development work itself.

Expectations and Metrics and Incentives have to Change

In Devops, the way that developers (and ops) work changes, and the way that they need to be managed changes. It’s also critical to change expectations and metrics and incentives for developers.

Devops success is measured by operational IT metrics, not on meeting project delivery goals of scope, schedule and cost, not on meeting release goals or sprint commitments, or even meeting product design goals.

  • How fast can the team respond to important changes and problems: Change Lead Time and Cycle Time to production instead of delivery milestones or velocity
  • How often do they push changes to production (which is still the metric that most people are most excited about – how many times per day or per hour or minute Etsy or Netflix or Amazon deploy changes)
  • How often do they make mistakes - Change / Failure ratio
  • System reliability and uptime – MTBF and especially MTTD and MTTR
  • Cost of change – and overall Operations and Support costs

Devops is more about Ops than Dev

As more software is delivered earlier and more often to production, development turns into maintenance. Project management is replaced by incident management and task management. Planning horizons get much shorter – or planning is replaced by just-in-time queue prioritization and triage.

With Infrastructure as Code, Ops become developers, designing and coding infrastructure and infrastructure changes, thinking about reuse and readability and duplication and refactoring, technical debt and testability, and building on TDD to implement TDI (Test Driven Infrastructure). They become more agile and more Agile, making smaller changes more often, spending more time programming and less on paperwork.

And developers start to work more like ops. Taking on responsibilities for operations and support, putting operational risks first, caring about the infrastructure, building operations tools, finding ways to balance immediate short-term demands for operational support with longer-term design goals.

None of this will be a surprise to anyone who has been working in an online business for a while. Once you deliver a system and customers start using it, priorities change, and everything about the way that you work and plan has to change too.

This way of working isn't better for developers, or worse necessarily. But it is fundamentally different from how many developers think and work today. More frenetic and interrupt-driven. At the same time, more disciplined and more Lean. More transparent. More responsibility and accountability. Less about development and more about release and deployment and operations and support.

Developers – and their managers – will need to get used to being part of the bigger picture of running IT, which is about much more than designing apps and writing and delivering code. This might be the future of software development. But not all developers will like it, or be good at it.

Thursday, March 27, 2014

Secure DevOps - Seems Simple

The DevOps security story is deceptively simple. It’s based on a few fundamental, straightforward ideas and practices:

Smaller Releases are Safer

One of these ideas is that smaller, incremental and more frequent releases are safer and cause fewer problems than big bang changes. Makes sense.

Smaller releases contain fewer code changes. Less code means less complexity and fewer bugs. And less risk, because smaller releases are easier to understand, easier to plan for, easier to test, easier to review, and easier to roll back if something goes wrong.

It’s also easier to catch security risks, by watching out for changes to high-risk areas of code: code that handles sensitive data, or security features or other important plumbing, new APIs, error handling. At Etsy, for example, they identify this code in reviews or pen testing or whatever, hash it, and automatically alert the security team when it gets changed, so that they can make sure that the changes are safe.
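
Etsy hasn’t published the mechanics in detail, but the idea is simple enough to sketch. Here is a minimal, hypothetical Java version: hash the files flagged as high risk, compare them against the hashes recorded at the last security review, and raise an alert when they drift (the file paths and baseline hash values are placeholders):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;
    import java.util.Map;

    public class HighRiskCodeWatcher {
        // SHA-256 hashes recorded when these high-risk files were last
        // reviewed (paths and hash values are placeholders).
        static final Map<String, String> BASELINE = Map.of(
            "src/main/java/app/auth/PasswordStore.java", "3f2a90placeholder",
            "src/main/java/app/payments/Transfer.java", "9c41d7placeholder");

        public static void main(String[] args) throws Exception {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            for (var entry : BASELINE.entrySet()) {
                byte[] source = Files.readAllBytes(Path.of(entry.getKey()));
                String current = HexFormat.of().formatHex(sha256.digest(source));
                if (!current.equals(entry.getValue())) {
                    // In a real pipeline this would page or message the
                    // security team rather than just print.
                    System.out.println("ALERT: high-risk file changed: " + entry.getKey());
                }
            }
        }
    }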

Changing the code more frequently may also make it harder for the bad guys to understand what you are doing and find vulnerabilities in your system – taking advantage of a temporary “Honeymoon Effect” between the time you change the system and the time that the bad guys figure out how to exploit weaknesses in it.

And changing more often forces you to simplify and automate application deployment, to make it repeatable, reliable, simpler, faster, easier to audit. This is good for change control: you can put more trust in your ability to deploy safely and consistently, you can trace what changes were made, who made them, and when.

And you can deploy application patches quickly if you find a problem.

“...being able to deploy quick is our #1 security feature”
Effective Approaches to Web Application Security, Zane Lackey

Standardized Ops Environment through Infrastructure as Code

DevOps treats “Infrastructure as Code”: infrastructure configurations are defined in code that is written and managed in the same way as application code, and deployed using automated tools like Puppet or Chef instead of by hand. Which means that you always know how your infrastructure is set up and that it is set up consistently (no more Configuration Drift). You can prove what changes were made, who made them, and when.

You can deploy infrastructure changes and patches quickly if you find a problem.

You can test your configuration changes in advance, using the same kinds of automated unit test and integration test suites that Agile developers rely on – including tests for security.

And you can easily setup test environments that match (or come closer to matching) production, which means you can do a more thorough and accurate job of all of your testing.
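
For example, configuration tests can run in the same suite as application unit tests. This is a hypothetical JUnit 5 sketch (the file name and property keys are made up) that fails the pipeline if a production config weakens TLS settings:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.io.FileReader;
    import java.util.Properties;

    import org.junit.jupiter.api.Test;

    class ProductionTlsConfigTest {
        @Test
        void productionConfigEnforcesTls() throws Exception {
            Properties config = new Properties();
            try (FileReader reader = new FileReader("config/production.properties")) {
                config.load(reader);
            }
            // Fail the build if someone turns TLS off or lowers the floor.
            assertEquals("true", config.getProperty("ssl.enabled"));
            assertEquals("TLSv1.2", config.getProperty("ssl.minimum-version"));
        }
    }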

Automated Continuous Security Testing

DevOps builds on Agile development practices like automated unit/integration testing in Continuous Integration, to include higher level automated system testing in Continuous Delivery/Continuous Deployment.

You can do automated security testing using something like Gauntlt to “be mean to your code” by running canned attacks on the system in a controlled way.

Other ways of injecting security into Devops include:

  1. Providing developers with immediate feedback on security issues through self-service static analysis: running Static Analysis scans on every check-in, or directly in their IDEs as they are writing code.
  2. Helping developers to write automated security unit tests and integration tests and adding them to the Continuous testing pipelines (see the sketch below).
  3. Automating checks on Open Source and other third party software dependencies as part of the build or Continuous Integration, using something like OWASP’s Dependency Check to highlight dependencies that have known vulnerabilities.

Fast feedback loops using automated testing mean you can catch more security problems – and fix them – earlier.
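
As a sketch of what one of those automated security unit tests (point 2 above) might look like, here is a hypothetical JUnit 5 test that locks in output encoding with the OWASP Java Encoder, so a regression in XSS protection fails the build instead of reaching production:

    import static org.junit.jupiter.api.Assertions.assertFalse;

    import org.junit.jupiter.api.Test;
    import org.owasp.encoder.Encode;

    class XssEncodingTest {
        @Test
        void htmlEncoderNeutralizesScriptTags() {
            String payload = "<script>alert('xss')</script>";
            String encoded = Encode.forHtml(payload);
            // If raw angle brackets survive encoding, the page is injectable.
            assertFalse(encoded.contains("<"));
            assertFalse(encoded.contains(">"));
        }
    }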

Operations Checks and Feedback

DevOps extends the idea of feedback loops to developers from testing all the way into production, allowing (and encouraging) developers visibility into production metrics and getting developers and ops and security to all monitor the system for anomalies in order to catch performance problems and reliability problems and security problems.

Adding automated asserts and health checks to deployment (and before start/restart) in production to make sure key operational dependencies are met, including security checks: that the configuration is correct, ports that should be closed are closed, ports that should be open are open, permissions are correct, SSL is set up properly…

Or even killing system processes that don’t conform (or sometimes just to make sure that they failover properly, like they do at Netflix).
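
Checks like these don’t need special tooling. Here is a minimal, hypothetical Java sketch of a deploy-time assert that fails the deployment if a port that should be closed in production is reachable (the host, port and policy are made-up examples):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class DeployTimeChecks {
        // True if nothing accepts a TCP connection on the port within the timeout.
        static boolean portClosed(String host, int port) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000);
                return false; // something is listening
            } catch (IOException expected) {
                return true;
            }
        }

        public static void main(String[] args) {
            // Hypothetical policy: the admin console must never be reachable
            // on a production host.
            if (!portClosed("localhost", 8081)) {
                System.err.println("Deploy check failed: admin port 8081 is open");
                System.exit(1);
            }
        }
    }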

People talking to each other and working together to solve problems

And finally DevOps is about people talking together and solving problems together. Not just developers talking to the business/customers. Developers talking to ops, ops talking to developers, and everybody talking to security. Sharing ideas, sharing tools and practices. Bringing ops and security into the loop early. Dev and ops and security working together on planning and on incident response and learning together in Root Cause Analysis sessions and other reviews. Building teams across silos. Building trust.

Making SecDevOps Work

There are good reasons to be excited by what these people are doing, and the path that they are going down. It promises a new, more effective way for developers and security and ops to work together.

But there are some caveats.

Secure DevOps requires strong engineering disciplines and skills. DevOps engineering skills are still in short supply. And so are information security (and especially appsec) skills. People who are good at both DevOps and appsec are a small subset of these small subsets of the talent available.

Outside of configuration management and monitoring, the tooling is limited – you’ll probably have to write a lot of what you need yourself (which leads quickly back to the skills problem).

A lot more work needs to be done to make this apply to regulated environments, with enforced separation of duties and where regulators think of Agile as “the A Word” (so you can imagine what they think of developers pushing out changes to production in Continuous Deployment, even if they are using automated tools to do it). A small number of people are exploring these problems in a Google discussion group on DevOps for managers and auditors in regulated industries, but so far there are more people asking questions than offering answers.

And getting dev and ops and security working together and collaborating might take an extreme makeover of your organization’s structure and culture.

Secure DevOps practices and ideas aren't enough by themselves to make a system secure. You still need all of the fundamentals in place. Even if they are releasing software incrementally and running lots of automated tests, developers still need to understand software security, design security in, and follow good software engineering practices. Whether they are using "Infrastructure as Code" or not, Ops still has to design and engineer the datacenter and the network and the rest of the infrastructure to be safe and reliable, and run things in a secure and responsible way. And security still needs to train everyone and follow up on what they are doing, and run their scans and pen tests and audits to make sure that all of this is being done right.

Secure DevOps is not as simple as it looks. It needs disciplined secure development and secure ops fundamentals, and good tools and rare skills and a high level of organizational agility and a culture of trust and collaboration. Which is why only a small number of organizations are doing this today. It’s not a short term answer for most organizations. But it does show a way for ops and security to keep up with the high speed of Agile development, and to become more agile, and hopefully more effective, themselves.

Thursday, March 13, 2014

Application Security – Can you Rely on the Honeymoon Effect?

I learned about some interesting research from Dave Mortman at this year’s RSA conference in San Francisco which supports the Devops and Agile arguments that continuous, incremental, iterative changes can be made safely: a study by the MIT Lincoln Laboratory (Milk or Wine: Does Software Security Improve with Age?) and The Honeymoon Effect, by Sandy Clark at the University of Pennsylvania.

These studies show that most software vulnerabilities are foundational (introduced from start of development up to first release), the result of early decisions and not the result of later incremental changes. And there is a “honeymoon period” after software is released, before bad guys understand it well enough to find and exploit vulnerabilities. Which means the more often that you release software changes, the safer your system could be.

Understanding the Honeymoon Effect

Research on the honeymoon period, the time “after the release of a software product (or version) and before the discovery of the first vulnerability”, seems to show that finding security vulnerabilities is “primarily a function of familiarity with the system”. Software security vulnerabilities aren't like functional or reliability bugs, which are mostly found soon after release, slowing down over time:

“…we would expect attackers (and legitimate security researchers) who are looking for bugs to exploit to have the easiest time of it early in the life cycle. This, after all, is when the software is most intrinsically weak, with the highest density of ‘low hanging fruit’ bugs still unpatched and vulnerable to attack. As time goes on, after all, the number of undiscovered bugs will only go down, and those that remain will presumably require increasing effort to find and exploit.

But our analysis of the rate of the discovery of exploitable bugs in widely-used commercial and open-source software, tells a very different story than what the conventional software engineering wisdom leads us to expect. In fact, new software overwhelmingly enjoys a honeymoon from attack for a period after it is released. The time between release and the first 0-day vulnerability in a given software release tends to be markedly longer than the interval between the first and second vulnerability discovered, which in turn tends to be longer than the time between the second and the third…”

It may take a while for attackers to find the first vulnerability, but then it gets progressively easier – because attackers use information from previous vulnerabilities to find the next ones, and because the more vulnerabilities they find, the more confident they are in their ability to find even more (there's blood in the water for a reason).

This means that software may actually be safest when it should be the weakest:

“when the software is at its weakest, with the ‘easiest’ exploitable vulnerabilities still unpatched, there is a lower risk that this will be discovered by an actual attacker on a given day than there will be after the vulnerability is fixed!”

Code Reuse Shortens your Honeymoon

Clark’s team also found that re-use of code shortens the honeymoon, because this code may already be known to attackers:

“legacy code resulting from code-reuse [whether copy-and-paste or using frameworks or common libraries] is a major contributor to both the rate of vulnerability discovery and the numbers of vulnerabilities found…

We determined that the standard practice of reusing code offers unexpected security challenges. The very fact that this software is mature means that there has been ample opportunity to study it in sufficient detail to turn vulnerabilities into exploits.”

In fact, reuse of code can lead to “less than Zero day” vulnerabilities – software that is already known to be vulnerable before your software is released.

Leveraging Open Source or frameworks and libraries, and copying-and-pasting from code that is already working, obviously saves time and reduces development costs, and helps developers to minimize technical risks, including security risks – it should be safer to use a special-purpose security library or the security features of your application framework than it is to try to solve security problems on your own. But this also brings along its own set of risks, especially the dangers of using popular software components with known vulnerabilities – software that attackers know and can easily exploit on a wide scale. This means that if you’re going to use Open Source (and just about everybody does today), then you need to put in proactive controls to track what code is being used and make sure that you keep it up to date.

Make the Honeymoon Last as Long as you can

One risk of Agile development and Devops is that security can’t keep up with the rapid pace of change - at least not the way that most organizations practice security today. But if you’re moving fast enough, the bad guys might not be able to keep up either. So speed can actually become a security advantage:

“Software that was changed more frequently had a significantly longer median honeymoon before the first vulnerability was discovered.”

The idea of constant change as protection is behind Shape Shifter, an interesting new technology which constantly changes attributes of web application code so that attackers, especially bots, can’t get a handle on how the system works or execute simple automated attacks.

But speed of change isn't enough by itself to protect you, especially since a lot of the changes that developers make don't materially affect the Attack Surface of the application – the points in the system that an attacker can use to get into (or get data out of) an application. Changes like introducing a new API or file upload, or a new user type, or modifying the steps in a key business workflow like an account transfer function could make the system easier or harder to attack. But most minor changes to the UI, or behind-the-scenes changes to analytics and reporting and operations functions, don't factor in.

The honeymoon can’t last forever anyway: it could be as long as 3 years, or as short as 1 day. If you are stupid or reckless or make poor technology choices or bad design decisions, it won’t take the bad guys that long to find the first vulnerability, regardless of how often you fiddle with the code, and it will only get worse from there. You still have to do a responsible job in design and development and testing, and carefully manage code reuse, especially use of Open Source code – whatever you can to make the honeymoon last as long as possible.

Thursday, January 30, 2014

Small Projects and Big Programs

The Standish Group’s CHAOS 2013 Report has some interesting things to say about what is driving software development project success today. More projects are succeeding (39% in 2012, up from 29% in 2004), mostly because projects are getting smaller (which also means more projects done using Agile, since small projects are the sweet spot for Agile):

“Very few large projects perform well to the project management triple constraints of cost, time, and scope. In contrast to small projects, which have more than a 70% chance of success, a large project has virtually no chance of coming in on time, on budget, and within scope… Large projects have twice the chance of being late, over budget, and missing critical features…. A large project is more than 10 times more likely to fail outright, meaning it will be cancelled or will not be used because it outlived its useful life prior to implementation.”

So don’t run large projects. Of course it’s not that simple: many problems, especially in enterprises, are much too big to be solved by small teams in small projects. Standish Group says that you can get around this if you “Think Big, Act Small”:

“…there is no need for large projects… any IT project can be broken into a series of small projects that could also be done in parallel if necessary.”

Anything that can be done in one big project can obviously be done in a bunch of smaller projects. You can make project management simple – by pushing the hard management problems and risks up to the program level.

Program Management isn't Project Management

Understanding and orchestrating work across multiple projects isn’t as simple as breaking a big project down into small projects and running them independently. Managing programs, managing horizontally across projects, is different than managing projects. There are different risks, different problems to be solved. It requires different skills and strengths, and a different approach.

PMI recognizes this, and has a separate certification for Program Managers (PgMP). Program management is more strategic than project management. Program Managers are not just concerned with horizontal and cross-project issues, coordinating and managing interdependencies between projects – managing at scale. They are also responsible for understanding and helping to achieve business goals, for managing organizational risks and political risks, and they have to take care of financing and contracts and governance: things that Agile coaches running small projects don’t have to worry much about (and that Agile methods don’t help with).

Agile and Program Management

The lightweight tools and practices that you use to successfully coach an Agile team won’t scale to managing a program. Program management needs all of those things that traditional, plan-driven project management is built on. More upfront planning to build a top-down roadmap for all of the teams to share: project teams can’t be completely free to prioritize work and come up with new ideas on the fly, because they have to coordinate handoffs and dependencies. Architecture and technology strategy. More reporting. Working with the PMO. More management roles and more management. More people to manage. Which = more politics.

Johanna Rothman talks a little bit about program management in her book Managing Your Project Portfolio, and has put up a series of blog posts on Agile and Lean Program Management as work in progress for another book she is writing on program management and Agile.

Rothman looks at how to solve problems of organization in programs using the Scrum of Scrums hierarchy (teams hold standups, then Scrum Masters have their own standups together every day to deal with cross-program issues). Because this approach doesn't scale to handle coordination and decision making and problem solving in larger programs, she recommends building loose networks between projects and teams using Communities of Practice (a simple functional matrix in which architects, testers, analysts, and especially the Product Owners in each team coordinate continuously with each other across teams).

Rothman also looks at the problems of coordinating work on the backlog between teams and of evolving architecture across a program, and argues that Program Managers need to be Servant Leaders who don’t concern themselves with what teams do or how they do it – only with the results.

Rothman believes that Program Managers should establish and maintain momentum from the beginning of the program. Rather than taking time upfront to initiate and plan (because, who actually needs to plan a large, complex program?!), get people learning how to work together from the start. Release early, release often, and keep work in progress to a minimum – the larger the program, the less work in progress you should have. Finally she describes some tools that you could use to track and report progress and provide insight into a program’s overall status, and explains how and why they need to be different than the tools used for projects.

There are some ideas here that make sense and would probably work, and some others that don’t - like skipping planning.

Get Serious about Program Management

A more credible and much more comprehensive approach for managing large programs in large organizations would be one of the heavyweight enterprise Agile hybrids: the Scaled Agile Framework (SAFe) or Disciplined Agile Delivery, both of which take Agile ideas and practices and envelop them inside a structured, top-down, governance-heavy process/project/program/portfolio management framework based on the Rational Unified Process. But now you’re not trying to manage and coordinate small, simple Agile projects any more; you’re doing something quite different, and much more expensive.

The most coherent and practical framework I have seen for managing programs is laid out in Executing Complex Programs, a course offered by Stanford University, as part of its Advanced Project Management professional development certificate.

This framework covers how to manage distributed cross-functional and cross-organizational teams in global environments; managing organizational and political and logistical and financial risks; and modeling and understanding and coordinating the different kinds of interdependencies and interfaces between projects and teams (shared constraints and resources, APIs and shared data, hand-offs and milestones and drop-dead dates, experts and specialists…) in large, complex programs. The course explores case studies using different approaches, some Agile, some not, some in high reliability / safety critical and regulated environments. This should give you everything that you need to manage a program effectively.

You can and should make projects simpler and smaller – which means that you’ll have to do more program management. But don’t try to get by at the program level with improvising and iterating and leveraging the same simple techniques that have worked with your teams. Nothing serious gets done outside of programs. So take program management seriously.

Thursday, January 23, 2014

Can you Learn and Improve without Agile Retrospectives? Of course you can…

Retrospectives – bringing the team together on a regular basis to examine how they are working and identify where and how they can improve – are an important part of Agile development.

Scrum and “Inspect and Adapt”

So important that Schwaber and Sutherland burned retrospectives into Scrum at the end of every Sprint, to make sure that teams will continuously Inspect and Adapt their way to more effective and efficient ways of working.

End-of-Sprint retrospectives are now commonly accepted as the right way to do things, and are one of the more commonly followed practices in Agile development. VersionOne’s latest State of Agile Development survey says that 72% of Agile teams are doing retrospectives.

Good Retrospectives are Hard Work

Good retrospectives are a lot of work.

For the leader/Coach/Scrum Master who needs to sell them to the team – and to management – and build a safe and respectful environment to hold the meetings and guide everyone through the process properly.

For the team, who need to take the time to learn and understand together, act on what they've learned, and then follow up and actually get better at how they work.

So hard that there are several books written just on how to do retrospectives (Agile Retrospectives: Making Good Teams Great, The Retrospective Handbook, Getting Value out of Agile Retrospectives), as well as chapters on retrospectives in other books on Agile, retrospective websites (including one just on how to make retrospectives fun), a wiki, at least one prime directive for running retrospectives, and dozens of blog posts with suggestions and coaching tips and alternative meeting formats and collaborative games and tools and techniques to help teams and coaches through the process, to energize retrospectives or re-energize them when teams lose momentum and focus.

Questioning the need for Retrospectives

Because retrospectives are so much work, some people have questioned how useful running retrospectives each Sprint really is, whether they can get by without a retrospective every time, or maybe without doing them at all.

There are good and bad reasons for teams to skip – or at least want to skip – retrospectives.

Because not everyone works in a safe environment where people trust and respect each other – which can make retrospectives dangerous and alienating, a forum for finger pointing and blame and egoism.

Because they don’t result in meaningful change, because the team doesn’t act on what they find – or aren’t given a chance to – and so the meetings become a frustrating and pointless waste of time, rehashing the same problems again and again.

Because the real problems that they need to solve in order to succeed are larger problems that they don’t have the authority or ability to do anything about, and so the meetings become a frustrating and pointless waste of time….

Because the team is under severe time pressure: they have to deliver now, or there may not be a chance to get better in the future.

Because the team is working well together, they've “inspected and adapted” their way to good practices and don’t have any serious problems that have to be fixed or initiatives that are worth spending a lot of extra time and energy on, at least for now. They could keep on trying to look for ways to get even better, or they could spend that time getting more work done.

Inspecting and Adapting – without Regular Retrospectives

Regular, frequent retrospectives can be useful – especially when you are first starting off in a new team on a new project. But once the team has learned how to learn, the value that they can get from retrospectives will decline.

This is especially the case for teams working in rapid cycles, short Sprints every 2 weeks or every week or sometimes every few days. As the Sprints get shorter, the meetings need to be shorter too, which doesn’t leave enough time to really review and reflect. And there’s not enough time to make any meaningful changes before the next retrospective comes up again.

At some point it makes good sense to stop and try something different. Are there other ways to learn and improve that work as well, or better than regular team retrospective meetings?

XP and Continuous Feedback

Retrospectives were not part of Extreme Programming as Kent Beck et al defined it (in either the first or second edition).

XP teams are supposed to follow good engineering (at least coding and testing) practices and work together in an intelligent way from the beginning – it should be enough to follow the rules of XP, and fix things when they are broken.

XP relies on built-in feedback loops: TDD, Continuous Integration and continuous testing, pair programming, frequently delivering small releases of software for review. The team is expected to learn from all of this feedback, and improve as they go. If tests fail, or they get negative feedback from the Customer, or find other problems, they need to understand what went wrong, why, and correct it.
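
To make one of those loops concrete, here is TDD in miniature – a hypothetical JUnit 5 example, with the fee rule and class names invented for illustration:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    // The test is written first, and fails until the waiver rule exists. That
    // failing build is the feedback loop: the problem surfaces in minutes and
    // gets corrected immediately, without waiting for a scheduled meeting.
    class TransferFeeTest {

        @Test
        void feeIsWaivedForInternalTransfers() {
            TransferFee fee = new TransferFee();
            assertEquals(0, fee.calculateCents(1_000_00, true));       // internal: no fee
            assertEquals(25_00, fee.calculateCents(1_000_00, false));  // external: flat fee
        }
    }

    // Just enough production code to make the test pass.
    class TransferFee {
        int calculateCents(int amountCents, boolean internal) {
            return internal ? 0 : 25_00;
        }
    }

The mechanics are trivial on purpose: the point is that every red bar is a small, immediate retrospective on the last few minutes of work.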

Devops and Continuous Delivery/Deployment

Delivering software frequently, or continuously, to production pushes this one step further. If you are delivering working software to real customers on a regular basis, you don’t need to ask the team to reflect internally, to introspect – your customers will tell you if you are doing a good job, and where you need to improve:

Are you delivering what customers need and want? Is it usable? Do they like it?

Is the software quality good – or at least good enough?

Are you delivering fast enough?

By understanding and acting on this feedback, the team will improve in ways that make a real difference.

Root Cause Analysis

If and when something goes seriously wrong in testing or production or within the team, call everyone together for an in-depth review and carefully step through Root Cause Analysis: understand what happened and why, work out what you need to change to prevent problems like this from happening again, and put together a realistic plan to get better.

Reviews like this, where the team works together to confront serious problems in a serious way and genuinely understand them and commit to fixing them, are much more important than a superficial 2-hour meeting every couple of weeks. These can be – and often are – make-or-break situations. Handled properly, they can pull teams together and make them much stronger. Never waste a crisis.

Kanban and Micro-Optimization

Teams following Kanban are constantly learning and improving.

By making work visible and setting work limits, they can immediately detect delays and bottlenecks, then get together and correct them. This micro-optimization at the task level – always tuning and fixing problems as they come up – might seem superficial, but the results are immediate (recognizing and correcting problems as soon as they come up makes more sense than waiting until the next scheduled meeting), and small improvements are all that many teams are actually able to make anyway.
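
The mechanism itself is almost embarrassingly simple. A hypothetical sketch – the column names and limits are invented, and every team tunes its own:

    import java.util.Map;

    // Hypothetical sketch: make work visible, set WIP limits, and treat a
    // blocked pull as the signal to swarm on the bottleneck right away.
    public class KanbanBoard {

        private final Map<String, Integer> wipLimits;
        private final Map<String, Integer> cardCounts;

        public KanbanBoard(Map<String, Integer> wipLimits, Map<String, Integer> cardCounts) {
            this.wipLimits = wipLimits;
            this.cardCounts = cardCounts;
        }

        // Refuse to pull new work into a column that is already at its limit.
        public boolean canPullInto(String column) {
            int limit = wipLimits.getOrDefault(column, Integer.MAX_VALUE);
            return cardCounts.getOrDefault(column, 0) < limit;
        }

        public static void main(String[] args) {
            KanbanBoard board = new KanbanBoard(
                Map.of("In Progress", 3, "In Review", 2),
                Map.of("In Progress", 3, "In Review", 1));
            System.out.println(board.canPullInto("In Progress")); // false: at the limit
            System.out.println(board.canPullInto("In Review"));   // true: one slot free
        }
    }

The point isn't the code – it's that the limit is checked on every pull, so the feedback is continuous instead of scheduled.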

Take advantage of audits and reviews

In large organizations and highly regulated environments, audits and other reviews (for example security penetration tests) are a fact of life. Instead of trying to get through them with the least amount of effort and time wasted, use them as valuable learning opportunities. Build on what the auditors or reviewers ask for and what they find. If they find something seriously missing or wrong, treat it as a serious problem, understand it and correct it at the source.
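
One practical way to correct a finding at the source – and keep it corrected – is to pin it down with an automated regression test. A hypothetical example, where the page renderer and the reflected-XSS finding are both invented for illustration:

    import static org.junit.jupiter.api.Assertions.assertFalse;

    import org.junit.jupiter.api.Test;

    // Hypothetical example: a reflected-XSS finding from a pen test, pinned
    // down as a permanent regression test that runs on every build.
    class SearchEscapingRegressionTest {

        @Test
        void searchTermIsHtmlEscapedBeforeRendering() {
            String rendered = SearchPage.render("<script>alert(1)</script>");
            assertFalse(rendered.contains("<script>"),
                "raw markup from user input must never reach the page");
        }
    }

    // Minimal stand-in for the real page renderer.
    class SearchPage {
        static String render(String searchTerm) {
            String escaped = searchTerm.replace("&", "&amp;")
                                       .replace("<", "&lt;")
                                       .replace(">", "&gt;");
            return "<p>Results for: " + escaped + "</p>";
        }
    }

Done this way, a finding from an outside review keeps paying off on every build, long after the auditors or pen testers have left.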

Moving Beyond Retrospectives

There are other ways to keep learning and improving, other ways to get useful feedback – ways that can be as effective as frequent retrospectives, or more effective and less expensive: from continuous checking and tuning, to deep dives when something goes seriously wrong.

You can always schedule regular retrospective meetings if the circumstances demand it: if quality or velocity start to slide noticeably, or conflicts arise in the team, or if key people leave, or there’s been some other kind of shock, a sudden change in direction or priorities that requires everyone to work in a much different way, and start learning all over again.

But don’t tie people down and force them to go through a boring, time-wasting exercise because it’s the “right way to do Agile”, or turn retrospectives into a circus because it’s the only way you can keep people engaged. Find other, better ways to keep learning and improving.

Thursday, December 12, 2013

Stop Telling Stories

There are beautiful, simple ideas in today’s Agile development methods that work really well. And some that don’t. Like defining all of your requirements as User Stories.

I don’t like the name. Stories are what you tell children before putting them to bed, not valuable information that you use to build complex systems. I don’t like the format that most teams use to write stories. And I don’t like how they use them.

Sometimes you need Stories, Sometimes you need Requirements

One of the “rules” of Agile is that stories have to be small – small enough to fit on an index card or a sticky note. They are too short on purpose, because they are supposed to be placeholders, reminders to have conversations with the customer when you are ready to work on them:

They're not requirements. They're not use cases. They're not even narratives. They're much simpler than that.

Stories are for planning. They're simple, one-or-two line descriptions of work the team should produce.

This isn't enough detail for the team to implement and release working software, nor is that the intent of stories. A story is a placeholder for a detailed discussion about requirements. Customers are responsible for having the requirements details available when the rest of the team needs them.
James Shore, The Art of Agile - Stories

According to Mike Cohn in his book Succeeding With Agile, making stories short forces the team to shift their focus from writing about features to talking about them. Teams want to do this because these discussions are more important than what gets written down.

But this idea can be – and often is – taken too far. Sure, most people have learned that it’s not possible to write correct, complete, comprehensive requirements specs for everything upfront. But there are lots of times when it doesn't make sense to limit yourself to 1- or 2-line placeholders for something that you hope to fill in later.

Some requirements aren't high-level expressions of customer intent that can be fleshed out in a conversation and demoed to verify that you got it right. They are specs which need to be followed line by line, or rules or tolerances that constrain your design and implementation in ways that are important and necessary for you to understand as early as possible.

Some requirements, especially in technical or scientific domains, are fundamentally difficult to understand and expensive to get wrong. You want to get as much information as you can upfront, so developers – and customers – have the chance to study the problem and think things through, share ideas, ask questions and get answers, explore options and come up with experiments and scenarios. You want and need to write these things down and get things as straight as you can before you start trying to solve the wrong problem.

And there are other times when you've already had the conversation – you were granted a brief window with people who understood very well what they needed and why. You might not get the same chance with the same people again. So you better write it down while you still remember what they said.

Short summary stories to be detailed later or detailed requirements worked out early – different problems and different situations need different approaches.

The Connextra Template: As a {type of user} I want {something}…

Stories started off as simple, free-form descriptions of work that the development team needed to do, like “Warehouse Inventory Report”. But now the Role-Feature-Reason template, also known as the Connextra template/format (because somebody working there came up with it back in 2001), is the way that we are told we should all write user requirements.

as a {type of user}, I want to {do something}, so that {reason}

How did this happen? And why?

According to Mike Cohn (who helped to popularize this template in his books and courses) there are a few reasons that stories should be written this way:

Reason 1

Something significant and I'm tempted to say magical happens when requirements are put in the first person…

Reason 2

Having a structure to the stories actually helps the product owner prioritize. If the product backlog is a jumble of things like:
  • Fix exception handing
  • Let users make reservations
  • Users want to see photos
  • Show room size options
… and so on, the Product Owner has to work harder to understand what the feature is, who benefits from it, and what the value of it is.

Reason 3

I've heard an argument that writing stories with this template actually suppresses the information content of the story because there is so much boilerplate in the text. If you find that true, then correct it in how you present the story… [which is not a Reason to use this template, but a workaround if you do use it].

Trying to fit every requirement into this template comes with its own set of problems:

User story format is awkward and heavy-handed. “As a ___, I want ___, so I can ____.” The concept is good – there should always be an explanation of “why” the task is desired, to make sure the end result fulfills the actual need. But the amount of verbal gymnastics I've seen people go through to try to make a simple and obvious requirement into a “User Story” proves that despite what Agile says, it’s not always the best way to go.
Talia Fukuroe, 6 Reasons Why Agile Doesn't Work

Lots of other people have seen similar problems:

Steve Ropa at VersionOne (“Why I Don’t Like User Story Templates”) has worked with teams that don’t understand or properly follow core Agile ideas and practices…

But where they excel is making sure every story is expressed in the format As a ____ I can____So That____. No matter what the story is, they find a way to shoe horn it into that template. And this is where things start to fall apart. The need to fit the story into the template becomes more important than the content of the actual story. …

The Template has become a formalized gate. My understanding of stories when I first learned about them was that they were to bring natural language back to the conversation around what the software is intended to do. How are we to move away from formal “shall lists” and requirements documents if we are just replacing them with Story Templates?

Gojko Adzic says that robotically following a standardized story template leads to stories that “are grammatically correct but completely false”:

Stories like this are fake, misleading, and only hurt. They don’t provide a context for a good discussion or prioritisation. They are nothing more than functional task breakdowns, wrapped into a different form to pass the scrutiny of someone who spent two days on some silly certification course, and to provide the organisation some fake comfort that they are now, in fact, agile.

Everything is about the Customer

We are also told that User Stories must be customer-centric, written from the customer’s point of view and always describe something that customers care about.

This argument is built around two central ideas:

  1. The development team must always be clearly delivering Customer Value. From the start of the project, the team is supposed to deliver working features that customers can see, touch, explore and respond to.
  2. The Customer/Product Owner has to be able to understand every requirement, which means that every requirement must be in their language and something that they care about. (This is another example of what’s wrong with the Customer/Product Owner idea in Agile development – that a single person can be responsible for defining everything that is done in a project.)

Focusing exclusively on delivering Customer Value leaves little room for non-functional requirements and constraints, which are critically important in building any real system, and de-emphasizes important design problems and technical work that developer teams need to do in order to deliver a high-quality system and to minimize technical risks.

It has led to lots of unnecessary confusion and disagreement over how developers should take care of non-functional requirements and constraints, with developers hiding technical requirements inside customer stories, or trying to warp their own technical and architectural requirements into customer-style stories that end up not making sense to customers or to developers.

It is especially a problem with under-the-covers technical requirements like security, maintainability and supportability – cross-cutting concerns that don't make sense to a customer, but are fundamental constraints that apply across all of the work that the team does and how they do it, not just work done in one Sprint.

Like the idea of using a common template, putting the customer first in requirements was well-intentioned: a way to bring the development team and customers closer together, and a way to make sure that the people paying for the work actually got what they asked for. But insisting that every requirement has to be framed this way creates unnecessary problems and risks and makes the team’s work harder than it has to be.

Don’t Get Hung Up on User Stories – Do Whatever Works

Insisting that all of the work that every development team needs to do has to be defined in the same way is arbitrary, unnecessary, and wrong. It's like back in the UML days when we learned that requirements had to be captured through Use Cases. Anyone remember how silly - and pointless - it was trying to model functional specifications using stick-men and bubbles?

Teams should use whatever format makes sense for their situation, with as little or as much detail as the problem and situation requires. There are times when I would much rather work with well-defined, detailed documented rules and scenarios that have been carefully thought through and reviewed, or a prototype that has been shown to customers and improved iteratively, than to be limited to a long list of 2-line User Stories and hope that I will be able to get someone who understands each requirement to fill the details in when the team needs them.

At Agile 2013, Jeff Patton, in his talk on Agile Requirements and Product Management, made it clear that the Story Template is a tool for beginners, meant to teach them how to ask questions. Like the “snow plow” technique for beginning skiers – something to be thrown aside once you know what you’re doing. His recommendation is to “use whatever you need to capture requirements: pictures, slides, notes, acceptance tests”. At the same conference, Scott Ambler reiterated that “stories are not enough. They are just one tool, a usage view. There are so many more options”.
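
For example, a rule-type requirement can go straight into an executable acceptance test – precise, reviewable, and checked on every build. A hypothetical JUnit 5 sketch, where the withdrawal rule, the numbers and the class names are all invented:

    import static org.junit.jupiter.api.Assertions.assertFalse;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    import org.junit.jupiter.api.Test;

    // Hypothetical sketch: a business rule captured as an executable
    // acceptance test instead of a one-line story card.
    class WithdrawalLimitTest {

        @Test
        void withdrawalsOverTheDailyLimitAreRejected() {
            WithdrawalPolicy policy = new WithdrawalPolicy(500_00); // $500 daily limit, in cents
            assertTrue(policy.allows(400_00, 0));        // first withdrawal of the day
            assertFalse(policy.allows(200_00, 400_00));  // would exceed the daily cap
        }
    }

    // Minimal implementation that makes the rule concrete and testable.
    class WithdrawalPolicy {
        private final int dailyLimitCents;

        WithdrawalPolicy(int dailyLimitCents) {
            this.dailyLimitCents = dailyLimitCents;
        }

        boolean allows(int amountCents, int withdrawnTodayCents) {
            return withdrawnTodayCents + amountCents <= dailyLimitCents;
        }
    }

There is no template to warp the rule into, and no detail left to be filled in later from someone's memory of a conversation.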

Don’t worry about stories or epics and themes – or themes and epics (the Agile community can’t all agree on what’s a theme, what’s an epic, and it doesn't matter anyway). Add details when you need to. Get rid of details that you don’t need.

Don't get caught up in what Roman Pichler calls “Story Mania”:

Some product owners and teams are so fond of user stories that everything is expressed as a story. This either results in some rather odd stories – stories that capture the user interface design, complex user interactions, and technical requirements; or these aspects are simply overlooked.

Like any technique, user story writing has its strengths and limitations. I find stories particularly well suited to capture product functionality, and when applied properly, nonfunctional requirements. But user interface design and complex user interactions are better described by other techniques including design sketches, mock-ups, scenarios, and storyboards. Complement your user stories therefore with other techniques, and don’t feel obliged to only use stories.

Even some of the people who developed the original idea of User Stories agree that not everything can or should be done as a story.

Capture requirements in whatever format works best. Keep requirements as light and clear and natural as possible. If you have important technical requirements that need to be worked on, write them down and don’t worry about trying to warp them into some kind of arbitrary customer stories because you've been told that’s what you have to do in order to “be Agile”.

Let’s all stop telling stories and get some work done.
