Who This Guide is For
This guide is for executives, managers, and entrepreneurs who want to make better strategic decisions about software. It’s designed to give you a baseline conceptual understanding of the software development process to help you buy software, build software, and interact with software development teams.
Developers will also find this guide useful for improving their communication with business stakeholders and clients.
For help applying these ideas to your organization, email me. I offer consulting, coaching, and specialized development services.
Software development, like business, is social and collaborative. Software projects fail when communication breaks down between customers, business stakeholders, and developers. Practices such as Agile aim to ensure that the necessary communication occurs. However, there’s no silver bullet — you must adapt practices to meet your project’s specific needs.
Agile teams recognize that they don’t know exactly what needs to be built and how to build it. Instead, they seek to continuously deliver value, identify problems early, and get feedback from their users. They organize work around what users should be able to accomplish rather than software features.
Software quality can be managed just like any other requirement. The same goes for software security, but it’s difficult to estimate the probability and cost of security incidents. Testing is perhaps the most effective way to ensure quality. However, because testing can only detect problems and not show the absence of problems, teams should also consider code analysis tools and manual code review, especially for security.
Managing a software development project isn’t fundamentally different from managing any other project. However, managers should appreciate that they likely have both a different personality and different time management needs than their developers. Managers should also maintain awareness of which technical decisions have the potential to impact business outcomes now and in the future.
Table of Contents
- Components and Contracts
- Making Software vs. Making Cars
- Software Engineering vs. Excel vs. SQL vs. Scripting
- The Software Development Lifecycle (SDLC)
- Parkinson’s Law of Triviality
- Continuous Integration and Deployment
- Post-Mortems and Retrospectives
- Figure: Sample Post-Mortem Analysis
- Managing Software Quality
- Anticipating the Future (controlling Technical Debt)
- Managing Development Teams
- Time Management and Interruptions
- Skills Acquisition and Novelty
- Cloud Computing, Infrastructure-as-a-Service, Platform-as-a-Service, and Beyond
- From Monolith to Microservices
What is Software?
Components and Contracts
The most effective way to think about software is as a set of components and contracts (agreements) between those components. The contracts have provisions such as: "If you give me X, then I’ll do Y" and "If you instead give me Z, I will crash or do something unpredictable."
Thinking about software as components and contracts can help you understand how it works at multiple levels. Software engineers call this way of thinking — envisioning multiple components as a single component — abstraction. Abstraction isn’t unique to software engineering — it’s how all humans handle complexity. For example, a CEO of a large company thinks in terms of business units, such as accounting and sales, rather than the responsibilities of each accountant and salesperson.
To demonstrate the concept of abstraction, let’s take the Event Ticketing app on your phone. At a high level, you can think of it as a single component with the following contract with the customer: "If you select an available event and give me valid payment information, I will reserve you a spot and show you a confirmation. Otherwise, I will show you an error."
A level down, the software can be thought of as having two components: the app that’s running on the customer’s phone, and a ticket processing server. The app has a contract with both the customer and the server. The contract with the server states, "If you send an event identifier and payment information in a particular format, I will send you a confirmation code, an error that the event is sold-out, or an error that the payment was invalid. Otherwise, I will send you a generic error."
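The server's side of this contract can be sketched in Python. This is a minimal, hypothetical illustration. The request fields (`event_id`, `card_number`), the simplified 16-digit payment check, and the in-memory seat counts are assumptions that stand in for a real payment processor and database.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass(frozen=True)
class Confirmation:
    code: str

@dataclass(frozen=True)
class SoldOut:
    event_id: str

@dataclass(frozen=True)
class PaymentInvalid:
    reason: str

@dataclass(frozen=True)
class GenericError:
    message: str

Response = Union[Confirmation, SoldOut, PaymentInvalid, GenericError]

# Hypothetical stand-in for the server's event database: event id -> seats left.
seats_left: Dict[str, int] = {"concert-42": 1}

def reserve(request: dict) -> Response:
    """Contract: send {"event_id": str, "card_number": str} and receive a
    Confirmation, SoldOut, or PaymentInvalid. Anything else gets GenericError."""
    event_id = request.get("event_id")
    card_number = request.get("card_number")
    if not isinstance(event_id, str) or not isinstance(card_number, str):
        return GenericError("request is not in the agreed format")
    if event_id not in seats_left:
        # The contract is silent about unknown events; this sketch
        # chooses to treat them as a generic error.
        return GenericError(f"unknown event {event_id!r}")
    if not (card_number.isascii() and card_number.isdigit() and len(card_number) == 16):
        return PaymentInvalid("card number must be 16 digits")  # simplified check
    if seats_left[event_id] == 0:
        return SoldOut(event_id)
    seats_left[event_id] -= 1
    return Confirmation(code=f"{event_id}-{seats_left[event_id]}")
```

The unknown-event branch is the kind of gap a contract can leave open. The code has to do something in that case, and without documentation a caller can't tell whether that behavior is a bug or a feature.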
Yet another level down, the ticket processing component might actually consist of both a component for processing payments and a database component for storing events and orders. We can keep conceptually breaking down the software until we hit the lowest level — each line of code is a component with a contract.
A bug occurs when a component doesn’t do what it agreed to in its contract. But remember, in some cases, unexpected behavior might just be a component dutifully executing a bad contract! Just as with a legal contract, you have to be aware of loopholes, especially when you’re up against an adversarial counterparty, such as a hacker.
Developers communicate the intended contract for a component using documentation. When you hear developers complain about bad documentation, they’re complaining that the software’s functionality and contracts are not clear. In these cases, if they’re lucky, the code is available and they can try to guess what the contract is. Without a documented contract though, it can be impossible to distinguish a bug (unintended) from a feature (intended).
Over time, common design patterns have been developed for putting together components. Like business frameworks (e.g., Porter’s 5 Forces, a Business Model Canvas, etc.), design patterns provide a common conceptual language for discussing and building software. Their strengths and limitations are generally well understood and are taught as part of a software engineering education.
Making Software vs. Making Cars
The first automobile was built in 1768 — it was steam powered. Cars wouldn’t hit production until over a hundred years later when, in 1886, Karl Benz produced several copies of the same car. Today we understand the basics of automotive engineering. And, because cars are bound by the laws of physics, we can build in a margin of error that ensures reliable performance and safety under most conditions.
Compared to automotive engineering, software engineering is a young profession. Software engineers regularly have to design and build systems that haven’t been built before. And, unlike cars, software is subject to environments that don’t follow the laws of physics. Therefore, it shouldn’t come as a surprise that only 39% of IT projects are completed on-time, on-scope, and on-budget. The good news is that software engineering, as a profession, is continually learning how to build more types of software reliably. One prominent business example is the CRUD (create, read, update, delete) application, used for business data entry and data management.
When you take your car into the shop for maintenance, the mechanic might change the oil and rotate the tires. Software maintenance, on the other hand, often isn’t maintenance — it’s modifying the software to meet changing business needs. In this way, software maintenance is akin to asking your mechanic to attach a propeller to your car so that it can fly.
Figure 1: Is it a car? Is it a helicopter? No, it’s Helicopter-Car!
Software Engineering vs. Excel vs. SQL vs. Scripting
You’ve constructed glorious spreadsheets in Excel, queried databases with SQL, and even scripted in Python, so software engineering doesn’t seem so hard! What you’re missing is that you’ve typically only had to worry about what’s called the happy path: taking a known input and computing an output. When building software, developers are forced to consider all paths, not just the happy ones. Exceptions can arise accidentally from misbehaving components or even because a malicious hacker is attacking your software. Software engineers have to build software that works across a range of inputs, across time, and with multiple users and components interacting simultaneously. Resilience is hard and time-consuming to get right.
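Here is a small Python sketch of the difference. Both functions compute an order total from a ticket quantity typed by a user. The first handles only the happy path. The second checks the other paths explicitly. The per-order limit of 8 tickets is a hypothetical business rule.

```python
import re

MAX_TICKETS_PER_ORDER = 8  # hypothetical business rule

def total_price_happy_path(quantity_text: str, unit_price_cents: int) -> int:
    # Fine for "2", but crashes on "two" and returns a negative total for "-3".
    return int(quantity_text) * unit_price_cents

def total_price(quantity_text: str, unit_price_cents: int) -> int:
    """Handle the unhappy paths explicitly instead of assuming good input."""
    text = quantity_text.strip()
    if not re.fullmatch(r"[0-9]+", text):
        # Rejects "two", "", "-3", "2.5", and other non-numeric input.
        raise ValueError(f"quantity must be a whole number, got {quantity_text!r}")
    quantity = int(text)
    if not 1 <= quantity <= MAX_TICKETS_PER_ORDER:
        raise ValueError(f"quantity must be between 1 and {MAX_TICKETS_PER_ORDER}")
    return quantity * unit_price_cents
```

The negative-quantity case is a contract loophole. The happy-path version would effectively issue a refund, which is exactly the kind of behavior an adversarial user goes looking for.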
The Software Development Lifecycle (SDLC)
Software development, like business, is social and collaborative. Communication is key. Product managers (PMs) communicate with customers to determine needs. PMs communicate with developers to create requirements and schedules. Developers communicate to hash out specifications, coordinate development, and keep the PM up-to-date on progress and problems.
Projects fail when communication fails. Executives fail to provide vision. Product managers fail to understand customer needs. Developers fail to prioritize. Sub-teams fail to coordinate. In 1999, the $125MM Mars Climate Orbiter famously disintegrated in the Martian atmosphere due to a software bug. Why? One piece of ground software was speaking in Imperial units (pound-force seconds), while the navigation software consuming its thruster data was listening for metric units (newton-seconds).
All of the practices you hear about — Agile, Waterfall, etc. — are designed to ensure that necessary communication occurs. Unfortunately, there’s no silver bullet for communication. Instead, you must adopt and adapt practices to fit the specific needs of your project.
In the past, the dominant practice for software development was Waterfall. With Waterfall, a team would follow a sequential process of first gathering all of the requirements, then designing the system, implementing it, verifying it, and finally deploying it. Teams using Waterfall often learned the hard way that it’s nearly impossible to get each step right the first time around.
This guide presents Agile software development practices. Agile has been successful because, unlike Waterfall, it recognizes and accommodates the reality that teams rarely know exactly what they need to build and how to build it.
In Agile, teams describe the software requirements with User Stories. Each User Story details the steps a user will take to accomplish a goal. Compared to describing software features (what functionality the software has), User Stories place the emphasis on the value that the software provides to the customer.
As part of creating user stories, teams create mockups (wireframes) that show the user completing the story. Compared to building prototypes and demos, mockups are an inexpensive way to facilitate communication and identify problems early.
Parkinson’s Law of Triviality
Teams often spend a disproportionate amount of time discussing surface-level and easy-to-grasp issues. For these issues, people feel more entitled to have an opinion. For example, a team can endlessly discuss where a button should go in a phone app but may spend little time discussing the best way to implement encryption to protect sensitive data.
Agile teams break work into sprints lasting one to four weeks. The short turnaround time is designed to deliver business value regularly and create ample opportunity for feedback.
At the beginning of a sprint, the team sets the scope for what they’re trying to learn and accomplish. During the sprint, the team can adjust the scope of the sprint based on their progress and what they’ve learned so far.
To plan a sprint, a team has to estimate how long tasks will take. Estimates are notoriously inaccurate and should be monitored and revised during the sprint. Despite their inaccuracy, estimates can be valuable in eliciting differences in understanding. For example, if one developer expects a story to take one hour and another expects it to take ten hours, the two developers likely have a different understanding of what’s required.
For some projects (e.g., ongoing website maintenance), it may make sense to eliminate sprints and instead focus on continuously delivering value. One system for this approach is Kanban, which is inspired by Toyota’s lean manufacturing process of the same name. Teams practicing Kanban maintain a continuous flow of work. For example:
Gather Requirements → Develop → Test → Deploy → Collect Feedback
To prevent bottlenecks, teams place limits on how much work can exist at each stage in the process at a given time. For example, if work is getting backed up in testing, a team might temporarily shift resources from development and requirements gathering to quality assurance. The team then deploys the tested changes to deliver value and get feedback from users.
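A Kanban board with work-in-progress limits can be modeled in a few lines of Python. This is a minimal sketch, not any particular tool’s behavior. The class, its method names, and the stage limits are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List

class KanbanBoard:
    """Minimal sketch of a Kanban board with a work-in-progress limit per stage."""

    def __init__(self, limits: Dict[str, int]):
        self.stages: List[str] = list(limits)  # dicts keep insertion order
        self.limits = dict(limits)
        self.items: Dict[str, Deque[str]] = {s: deque() for s in self.stages}
        self.done: List[str] = []

    def has_room(self, stage: str) -> bool:
        return len(self.items[stage]) < self.limits[stage]

    def add(self, item: str) -> bool:
        """Start new work only if the first stage is under its limit."""
        first = self.stages[0]
        if not self.has_room(first):
            return False
        self.items[first].append(item)
        return True

    def pull(self, stage: str) -> bool:
        """Move the oldest item in `stage` to the next stage (or to done)."""
        if not self.items[stage]:
            return False
        i = self.stages.index(stage)
        if i + 1 == len(self.stages):
            self.done.append(self.items[stage].popleft())
            return True
        nxt = self.stages[i + 1]
        if not self.has_room(nxt):
            return False  # downstream stage is full: work backs up here
        self.items[nxt].append(self.items[stage].popleft())
        return True

    def full_stages(self) -> List[str]:
        """Stages at their limit: candidates for temporarily shifting people over."""
        return [s for s in self.stages if not self.has_room(s)]

board = KanbanBoard({"develop": 2, "test": 1, "deploy": 1})  # hypothetical limits
board.add("A")
board.add("B")
board.pull("develop")         # A moves to test
print(board.pull("develop"))  # False: test is at its limit, so B waits
print(board.full_stages())    # ['test']
```

The refusal to pull is the point of the limit. Instead of piling up untested work, the board makes the testing bottleneck visible so the team can shift effort there.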
Signs of a Dysfunctional Software Development Lifecycle

|Red Flag|Possible Causes|
|---|---|
|Your team is always in crunch mode| |
|Work regularly gets bottlenecked at the end of a sprint| |
|Sprint deadlines regularly slip| |
|Work is regularly finished right at the end of the sprint and is low quality| |
Continuous Integration and Deployment
Agile teams integrate their changes into the software on a regular basis rather than integrating all of their changes at the end of a sprint. Continuous integration allows problems to be detected early, and with the help of tools, automatically.
The shift to Software-as-a-Service (SaaS) has enabled teams to take continuous integration a step further. Agile teams can automatically deploy changes to users once the changes have passed automated checks and/or received approval. Advanced teams conduct staged rollouts, allowing them to catch problems and get feedback from a subset of users before they deploy changes to everyone.
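Staged rollouts are often implemented by assigning each user to a stable "bucket" and enabling a change for a growing percentage of buckets. The Python sketch below shows one common way to do this: hashing the user ID together with a feature name. The function names, the feature name, and the stage percentages are hypothetical and are not taken from any particular deployment tool.

```python
import hashlib

def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

def is_enabled(user_id: str, feature: str, percent: int) -> bool:
    """True if the user falls within the first `percent` buckets."""
    return rollout_bucket(user_id, feature) < percent

# A hypothetical staged rollout: widen exposure only while checks stay green.
for percent in (1, 10, 50, 100):
    exposed = sum(is_enabled(f"user-{i}", "new-checkout", percent) for i in range(10_000))
    print(f"{percent:>3}% stage: {exposed} of 10,000 users")
```

The bucket depends only on the user and the feature. As a result, a user who sees the change at the 10% stage keeps seeing it at 50%, which keeps that user’s experience consistent while the team watches for problems.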
Version Control (such as Git or Mercurial) is a tool for keeping track of the software’s source, including its source code, documentation, configuration, and data schemas. Version Control is similar to Microsoft Word’s Track Changes feature, but it provides better support for merging edits from multiple team members.
Additionally, as the name suggests, Version Control enables team members to work on multiple versions of the software at the same time, and switch between them. For example, one developer may be fixing bugs in the production version of the software while another is prototyping improvements.
Scrums and Stand-ups
Agile teams hold scrums (also called stand-ups) to promote awareness and uncover previously unknown interdependencies and blockages. Team members take turns saying what they’ve accomplished since the previous stand-up, what they’re working on next, and any risks and impediments they foresee. Some teams hold these short meetings every day. However, you should always consider other ways (e.g., email updates) of accomplishing the same goals.
Post-Mortems and Retrospectives
After a sprint, major deliverable, or incident, Agile teams hold post-mortems (also called retrospectives). The goals of a post-mortem are to determine (1) what went well or badly, (2) why, and (3) what to do about it. Ideally, you want to get to the root cause of an issue. One effective way to diagnose a problem (or positive outcome) is by asking the "5 Whys". By repeatedly asking "why?", you uncover a chain of causality.
When trying to determine the root cause, don’t depersonalize. Be specific about each individual’s responsibilities and the actions they took. Additionally, recognize that in some cases, the process is flawed (e.g., it is based on unrealistic expectations). In these cases, you have to explore who designed the process (or failed to design it).
Figure 2: A partial post-mortem of a service outage. The analysis clearly identifies who was involved, what their responsibilities were, and what actions they took (or failed to take). The analysis explores multiple possibilities instead of just one line of reasoning. The analysis also makes blunt statements about what people are like (e.g., John is over-confident). Your ability to go to that level will depend on the culture of your company and team. Based on this analysis, what changes would you suggest?
An Issue Tracker (such as JIRA) is a database of all the user stories, bugs, and tasks for a project as well as discussions about them. Additionally, a team can track information such as how important an issue is, how long it’s estimated to take, who is working on it, and its current status.
An Issue Tracker is most effective when the entire team is using it. When only part of the team is using it, team members have to hunt down information: was that requirement in an email from last month? Did the requirement come from a phone call? Why did that requirement change? Fragmented information wastes time and is a breeding ground for software bugs.
Teams can also create workflows for different types of issues. Workflows allow the team to enforce policies such as, "Each user story we’ve implemented must be signed off on by the product manager before it’s delivered to the customer." While workflows ensure that particular communications occur, overly complex workflows can impair a team’s agility. The workflows you implement should be grounded in the needs of your project, such as addressing a problem analyzed in a post-mortem.
Managing Software Quality
Software quality is a requirement to be managed like any other requirement. Investments in quality should be tied to user needs. Software security can also be managed as a requirement. However, the probability and impact of security vulnerabilities are hard to estimate.
To effectively manage software quality, the team has to understand their software’s unique risk profile: which components are the riskiest, what kinds of failures can occur, and how the failure of one component might impact the others. Based on the risk profile, the team can decide on the appropriate architectural changes, development practices, and development tools.
Quality checks are most effective when they’re automated, as automated checks enable quality to be continuously monitored and enforced. The upfront and maintenance costs of automated checks are strongly linked to how the components are organized. Often, code that is organized to be conducive to checking is also more conducive to development, as automated checks encourage a clearer separation of responsibilities between components.
Testing is perhaps the best approach to measuring and enforcing quality. Every test has two parts: (1) an input and (2) a check that the software did the right thing.
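To make the two parts concrete, here is a minimal Python test written with the standard `unittest` module. The function under test, `seats_remaining`, is hypothetical.

```python
import unittest

def seats_remaining(capacity: int, sold: int) -> int:
    """Hypothetical function under test: seats left, never below zero."""
    return max(capacity - sold, 0)

class SeatsRemainingTest(unittest.TestCase):
    def test_partially_sold_event(self):
        result = seats_remaining(capacity=100, sold=40)  # (1) the input
        self.assertEqual(result, 60)                     # (2) the check

    def test_oversold_event_does_not_go_negative(self):
        self.assertEqual(seats_remaining(capacity=100, sold=105), 0)
```

A test runner (for example, `python -m unittest` pointed at the module containing this class) executes every such test automatically, which is what lets a team re-run the whole suite on every change.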
The quality of a test suite is generally measured by how much of the software it tests (code coverage). The reasoning is that a test can’t find bugs in code it doesn’t execute. The most common way of calculating coverage is statement coverage, which measures the percentage of lines of code the tests execute. The problem with statement coverage is that most code is on the happy path, while most bugs occur because the developer didn’t account for a certain scenario. A test suite can execute every line without ever exercising those scenarios. Therefore, a team may also want to consider metrics that measure decision coverage (closely related to branch coverage): the percentage of decision outcomes, such as both the true and the false outcome of each "if", that a test suite executes. Regardless of the method you use, be aware of its limitations so you aren’t lured into a false sense of confidence.
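The gap between statement coverage and decision coverage shows up even in a tiny function. In this hypothetical Python example, the function names and the "SAVE5" promo code are made up. `ticket_price_buggy` contains a deliberate bug, and `ticket_price` is the corrected version.

```python
from typing import Optional

PROMO_DISCOUNT_CENTS = 500  # hypothetical $5-off promotion code "SAVE5"

def ticket_price_buggy(base_cents: int, promo_code: Optional[str]) -> Optional[int]:
    price = None
    if promo_code == "SAVE5":
        price = base_cents - PROMO_DISCOUNT_CENTS
    return price  # deliberate bug: when the decision is False, price is still None

def ticket_price(base_cents: int, promo_code: Optional[str]) -> int:
    # Fixed: both outcomes of the decision return a price.
    if promo_code == "SAVE5":
        return base_cents - PROMO_DISCOUNT_CENTS
    return base_cents

# This single check executes every line of ticket_price_buggy (100% statement
# coverage) and passes, yet never exercises the False outcome of the `if`.
assert ticket_price_buggy(2500, "SAVE5") == 2000

# Decision coverage also requires the False outcome, which exposes the bug.
print(ticket_price_buggy(2500, None))  # prints None instead of 2500
```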
While tests are good for detecting problems, they can’t show the absence of bugs. For example, with testing, it’s nearly impossible to demonstrate that only certain users have access to confidential data. For these kinds of safety requirements, a team has to adopt a combination of testing, code analysis tools, and code review.
Automated Code Analysis
Code analysis tools "read" a program and automatically identify corner cases and what might go wrong. Remember the Mars Climate Orbiter that disintegrated in the atmosphere? When units are made explicit in the code (for example, as distinct types), today’s code analysis tools can automatically discover that Imperial units are being used where metric units are expected. Like an automated test suite, these checks can be run with every change to the software.
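One way to make units visible to a code analysis tool is to give each unit its own type. In this Python sketch, a static type checker such as mypy reports an incompatible argument type at the commented-out call. The runtime `isinstance` guard is included so the mistake also fails loudly in code that isn't type-checked. The class and function names are hypothetical, and this is not how the orbiter's software was actually written.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
NEWTON_SECONDS_PER_LBF_SECOND = 4.4482216152605

@dataclass(frozen=True)
class PoundForceSeconds:
    value: float

@dataclass(frozen=True)
class NewtonSeconds:
    value: float

def to_newton_seconds(impulse: PoundForceSeconds) -> NewtonSeconds:
    return NewtonSeconds(impulse.value * NEWTON_SECONDS_PER_LBF_SECOND)

def update_trajectory(impulse: NewtonSeconds) -> float:
    """Hypothetical navigation step that only accepts metric impulse."""
    # Runtime guard for code paths that are not type-checked.
    if not isinstance(impulse, NewtonSeconds):
        raise TypeError(f"expected NewtonSeconds, got {type(impulse).__name__}")
    return impulse.value

thruster_report = PoundForceSeconds(10.0)
# update_trajectory(thruster_report)  # a type checker such as mypy rejects this call
metric_impulse = update_trajectory(to_newton_seconds(thruster_report))
```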
You can think of code analysis tools as being similar to grammar and spell-check in Microsoft Word. Even though they can catch a lot of problems (and sometimes things that aren’t actually problems), you’ll still want to have someone review your paper for content and style. This is where code review comes in.
Code review is the practice of having developers review each other’s code. An independent review uncovers bugs and potential ways of improving the code. As an added benefit, code reviews reduce operational risk by ensuring that multiple developers are familiar with each part of the software. Additionally, code review can be a good opportunity to mentor junior developers.
Anticipating the Future (controlling Technical Debt)
Whenever a team takes a shortcut, they might have to make extra efforts in the future to compensate. The shortcuts that build up over time result in technical debt. The team pays "interest" on this debt in that future changes require more resources to implement. A team can pay down the debt by refactoring — re-organizing the code to make it easier to understand and maintain.
As you know, debt can actually be a good way to finance a business. The same is true for technical debt, but you should note that technical debt differs from financial debt in three key ways: (1) you don’t necessarily know when you’re going to pay interest, (2) you sometimes never have to pay interest, and (3) you can sometimes walk away from your debt without penalty.
Teams can also go too far in the other direction: they build contingencies for future capabilities that they’ll never need. The response to this is "You Aren’t Going to Need It!" (YAGNI).
Effectively navigating technical debt requires strong communication between the business stakeholders and developers. Developers are not necessarily in the best position to judge how likely certain future requirements are. Similarly, business stakeholders don’t know how certain development decisions will impact development time and costs.
Managing Development Teams
Managing a development team isn’t fundamentally different from managing other teams. However, due to the nature of the work, there are some general areas you should be aware of.
Time Management and Interruptions
Software developers have to keep a lot of information in their heads at once, so the cost of switching tasks is high. It can take anywhere from 10 to 15 minutes for a developer to recover from a 5-minute interruption. Managers need to give developers uninterrupted blocks of time to make progress. Similarly, developers need to manage their own distractions (chat, notifications, etc.). Responsiveness expectations should balance the need for questions to be answered quickly with the need for developers to have uninterrupted work time.
Good developers are expensive. You can improve the return on your investment by ensuring developers have access to the best tools money can buy. A top-of-the-line computer (memory, monitors, etc.) is an especially strong investment because its cost is low relative to a developer’s salary.
On the Myers-Briggs scale, developers tend to show a preference for Judging (structure and decisions). Conversely, entrepreneurs lean toward Perceiving (freestyle thinking and staying open to options). If you also lean toward Perceiving, recognize that you are likely to find it draining to hammer out detailed requirements with developers. At the same time, recognize that many developers will be frustrated by what they view as a lack of clarity from you. By accounting for differences in personality types and communication styles, you can adapt a team’s practices to streamline communication and boost motivation.
Skills Acquisition and Novelty
Like any professional, developers want to improve their skill set and do interesting work.
At its worst, this desire leads to something called resume-driven development. A developer will choose the latest, sexiest technology instead of a relatively boring but battle-proven one. Alternatively, a developer may choose to rewrite some software from scratch instead of modifying legacy software.
Undoubtedly, these are sometimes the right decisions. However, a developer likely does not have sufficient context to make the best decision for the business. For example, a technology might provide a large enough advantage to offset the time required for that developer to learn the technology, but did the developer also factor in the risk of that technology being abandoned by the industry? How about the cost to hire new developers with those skills to maintain the system? Similarly, it’s easy for developers to underestimate the vast amount of business expertise that is often hidden within a legacy system.
Appendix: Hot Trends
In practice, big data is data that can’t be analyzed on a single computer. Typically, this means that the data is bigger than the memory (RAM) on the machine. These days, computers can support a lot of memory, so chances are your company has small data. Small data is good because standard development is a lot cheaper than having to deploy and program for a big data system (e.g., Hadoop or Spark).
If buying a more powerful computer doesn’t make your data seem small, there are often ways to make your data smaller in order to analyze it. For example, a dataset with the values "Male" and "Female" can be made 56x smaller by encoding the information as either 0 or 1 (the exact savings depend on how the original strings are stored). A little creativity can go a long way in avoiding complexity and saving on development time and costs.
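The encoding trick can be sketched in a few lines of Python. This toy example uses eight made-up values and compares three representations: raw UTF-8 text, one byte per value, and one bit per value. The ratio it produces depends on how the original data was stored, so it won't necessarily match the 56x figure.

```python
values = ["Male", "Female", "Female", "Male", "Female", "Male", "Male", "Female"]

# Raw text: 4 or 6 bytes per value (ignoring separators and other overhead).
text_bytes = sum(len(v.encode("utf-8")) for v in values)

# One byte per value: 0 for "Male", 1 for "Female".
codes = bytes(0 if v == "Male" else 1 for v in values)

# One bit per value: pack eight codes into each byte.
packed = bytearray((len(codes) + 7) // 8)
for i, code in enumerate(codes):
    if code:
        packed[i // 8] |= 1 << (i % 8)

# Decoding recovers the original values, so no information is lost.
decoded = ["Female" if (packed[i // 8] >> (i % 8)) & 1 else "Male" for i in range(len(values))]
assert decoded == values

print(text_bytes, len(codes), len(packed))  # 40 8 1
```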
Cloud Computing, Infrastructure-as-a-Service, Platform-as-a-Service, and Beyond
In 2013, the popular messaging app WhatsApp had 200 million active users and a staff of only 50 people. In 2014, it had 600 million active users and was acquired by Facebook for $19BB. WhatsApp was able to achieve this scale and leverage because of cloud computing.
Cloud computing provides a flexible pool of resources for teams to use when they need them. The first form of cloud computing was Infrastructure-as-a-Service (IaaS), which provides computing resources such as servers, storage, and networking on demand. IaaS eliminated the need for teams to provision and manage their own hardware in a data center. Platform-as-a-Service (PaaS) went a step further and eliminated the need for a team to configure and administer those servers. PaaS frees the team to focus on the core business logic.
Over time, cloud computing has continued "up the stack" by packaging capabilities and services on top of the lower layers. One prominent example is Machine-Learning-as-a-Service (MLaaS). MLaaS gives teams the ability to train models over vast amounts of data without having to administer machines or deploy machine learning software.
Teams that leverage cloud computing are making a trade-off between convenience and flexibility. The further up the stack a team goes, the more decisions they’ve ceded to the cloud computing provider. A lack of flexibility can make certain changes impossible. For these changes, the team will have to drop down a level to implement their custom logic. Additionally, due to differences in services, it can take significant effort to switch providers. A team can potentially become locked in to a provider that has a steep pricing structure.
From Monolith to Microservices
A monolith is software written as a single service running on a single server (or multiple copies of the same service across multiple servers). A recent trend is to attempt to break up the monolith into multiple different microservices, each serving a specific purpose. In theory, microservices are supposed to make your software more resilient and enable sub-teams to work independently. In practice, the benefits are more nuanced.
The primary limitation of microservices is that splitting software into multiple parts doesn’t, by itself, eliminate dependencies. If you have an Event Ticketing system, moving payment processing to its own microservice doesn’t eliminate the dependency on payment processing. When the payment processing system fails, customers still won’t be able to buy tickets. Furthermore, the payment team can’t work completely independently. The team still has to coordinate with the other teams to determine the contract (agreement) between the payment processing microservice and the other services.
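The dependency doesn't disappear when payment processing moves to its own service. It just crosses a network boundary. In this hypothetical Python sketch, `PaymentClient.charge` stands in for a call to a separate payment microservice, and its signature is the contract the two teams would have to agree on. All names here are illustrative.

```python
class PaymentServiceUnavailable(Exception):
    """Raised when the (hypothetical) payment microservice can't be reached."""

class PaymentClient:
    """Stand-in for a network client of a separate payment microservice."""

    def __init__(self, healthy: bool = True):
        self.healthy = healthy  # simulates whether the remote service is up

    def charge(self, card_number: str, amount_cents: int) -> str:
        # The agreed contract: return a transaction id or raise on failure.
        if not self.healthy:
            raise PaymentServiceUnavailable("payment service did not respond")
        return f"txn-{amount_cents}"

def buy_ticket(payments: PaymentClient, card_number: str, amount_cents: int) -> str:
    """The ticketing service can fail gracefully, but it still can't sell tickets."""
    try:
        transaction_id = payments.charge(card_number, amount_cents)
    except PaymentServiceUnavailable:
        return "Sorry, ticket sales are temporarily unavailable."
    return f"Confirmed ({transaction_id})"

print(buy_ticket(PaymentClient(healthy=True), "4" * 16, 2500))   # Confirmed (txn-2500)
print(buy_ticket(PaymentClient(healthy=False), "4" * 16, 2500))  # Sorry, ticket sales are temporarily unavailable.
```

Splitting the service changed how the failure shows up (a friendly message instead of a crash), not whether customers can buy tickets while payments are down.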
Additionally, the design patterns and development tools dealing with microservices are relatively immature. A team needs to critically examine the conditions under which they can actually realize the benefits of microservices.
Open Source Software
Open Source Software (OSS) is software for which the source code is available and licensed so that developers can modify and share it. Though not technically accurate, the term has come to refer to software that is developed in the open and is made available at no monetary cost.
Teams should evaluate Open Source Software just as they would any other software.
Permissive licenses, such as the MIT License, allow you to incorporate the code into proprietary software. Copyleft licenses, such as the GNU General Public License (GPL), "contaminate" your software’s licensing, requiring you to distribute derivative works under the same license. In this case, you may be required to release the source code for part or all of your software.
Some Open Source Software is supported by a company. However, being supported by a company does not mean that the company has any obligation to provide support to your team now or in the future.
Before deciding to use a project, a team should evaluate the health of that project and its community, including what level of support they can expect. To perform this evaluation, look at all the available information, including release history, the Issue Tracker, mailing lists, and popular community forums (e.g., Stack Overflow, the most popular Q&A site for developers). A team will likely have to build missing features or fix bugs impacting their particular project. If a feature isn’t already available or on a roadmap, it’s likely not a priority for the project’s developers. Therefore, a team’s developers should also review the Open Source Software’s source code to evaluate its quality.
CHAOS Manifesto 2013: Think Big, Act Small. In 2012, large projects (with a cost of labor over $10MM) only had a success rate of 10%. Small projects (with a cost of labor under $1MM) had a success rate of 76%.
No Silver Bullet: Essence and Accidents of Software Engineering is a famous essay by Frederick P. Brooks, Jr. It’s part of The Mythical Man-Month, perhaps the most influential book on software project management.
Parkinson’s Law, or the Pursuit of Progress. Parkinson’s main law is that work will expand to fill the time allotted. Parkinson’s Law of Triviality is also commonly called bike-shedding: Parkinson illustrated it with a hypothetical finance committee that spends more time debating a proposed bike shed than a proposed nuclear power plant.
The 5 Whys approach was developed at Toyota by Sakichi Toyoda. Eric Ries, author of The Lean Startup, also recommends this approach for start-ups. He also describes the approach in a Harvard Business Review article.
Bridgewater Associates, one of the most successful hedge funds of all time, applies the diagnosis process to all business problems, not just software development. "Don’t depersonalize mistakes" is Principle #15 in the company’s guiding Principles. The founder, Ray Dalio, believes that the root cause is always what people are like: their personality, strengths, and weaknesses.
On the reliability of programs, a lecture by Edsger Dijkstra. Dijkstra, who passed away in 2002, was a Turing Award recipient. The Turing Award is often described as the computing equivalent of the Nobel Prize.
Programmer Interrupted. An approachable blog post by researcher Chris Parnin reviewing the academic research on programmer interruptions.
Developer Personalities: Audience Brief. Developers might also tend toward Introversion (vs. Extroversion) and Thinking (vs. Feeling). Developers who lean toward Sensing may have problems seeing the big picture.
Knowing Entrepreneurial Personalities – A Prerequisite for Entrepreneurial Education. Unlike entrepreneurs, managers tend to lean toward Judging (structure and decisions). A major implication of this difference is that the best person to start a company or project may not be the best person to manage its day-to-day operations.