Decisions that matter broadly in software development

Ifeora Okechukwu
20 min read · Dec 8, 2022

The ideal software with the best PX (people experience) would be created without any real, visible effort in the development process and without any constraints or trade-offs whatsoever. But in the real world, software takes time, and there are many control structures around the process of making the outcome wholesome and valuable. These include: methods, version control, tasks, estimates, designs, concrete experimentation, meetings, deadlines and man-hours.

There are two broad components to consider when embarking on a software project: the technical-domain component (e.g. methods/tooling, data, design, version control, domain knowledge) and the people-domain component (e.g. deadlines, team spirit, effective communication, work breakdown, estimates/story points, ego, motivation, man-hours, tasks/tickets, meetings/discussions). It turns out that most of the decisions we make in pursuing a software development project to a successful end are for and about the technical-domain (and product) component, and hardly ever about the people-domain component. This causes a lot of pain for the members of a software engineering team (and for other stakeholders, like the product teams or upper management), and that pain cycles back in a feedback loop, destroying the motivation and drive of the people involved. Like an infection, it spreads and reduces the overall team velocity and operating cadence.

Most people believe that the primary source of a lack of excitement and enthusiasm on software projects (not to be confused with burnout, though the two are related) is that software developers spend an awfully long time and jump through awfully many hoops to build out a feature or feature set. However, in my limited experience, I have found this not to be true (even a seemingly exciting software project or problem can become boring quickly). The major reason for boredom and lack of excitement is a diminishing overall people experience on software projects.

When we speak about system design these days (from the myriad of technical articles online and GitHub repos and gists available on the topic), we talk very little about one very integral and broadly vital part of the entire system design machinery — People!

System design is hardly what it should mean in a broader sense. System design is not about boxes and arrows on a whiteboard. We spend our time designing and planning for the technical details of software to be built (or being built) but not designing and planning for the people (ourselves) who will build this software. We do everything possible to accommodate or compensate for the idiosyncrasies and inadequacies of the tooling/methods we use to build software, but hardly ever try to accommodate the idiosyncrasies (biases especially) and inadequacies of the people working to build these software systems.

There’s always much talk about Processes (think: sprint planning, backlog grooming, retrospectives, team culture, technical planning and team cohesion) and the Product (think: requirements, implementation, business/domain alignment and customer needs), but all that talk hardly translates into well-built software products or well-built processes. The hierarchy of decisions is skewed against the people involved in making those decisions for the processes that lead to the creation of the product. Our insights, best intentions or intuition, however frugal, lead us to prioritise one “seemingly important” thing over another, because our knowledge of how we get in the way of our own productivity is incomplete most of the time. Prioritisation is good only when it serves to bring actual order to chaos. Just generating estimates for our to-do list isn’t enough; it is quantifying the risk in those estimates that makes estimates tolerable. For software to be successful in both development and deployment, you have to make decisions that cater not just to Processes and the Product but also to the People (the three Ps that make up any coherent system) involved.

In other words, prioritise decisions that matter broadly (i.e. matter the most) over decisions that matter narrowly (i.e. matter very little). Ostensibly, we confuse decisions that matter narrowly for those that matter broadly.

A summary of primal history

In the 1970s, the “Waterfall Model” for developing software was quite popular. However, it had many drawbacks. Most of its drawbacks can be attributed to the general misunderstanding, and consequently the improper implementation, of a model for software development explained in a 1970 seminal paper by Winston Royce. Owing to this misunderstanding, there was a tendency to create a “complete” and rigid design upfront (at the software design stage), well before the coding and testing stages. Furthermore, this “complete” design tended to change and morph more frequently than its creators anticipated. Moreover, the time it took to implement this “complete” design tended to increase over successive project iterations. This erroneous practice of trying, yet failing every time, to create a “complete” design before starting to write code plagued several software development teams for a long while before the Agile Manifesto (and by extension the Agile Model) was created in 2001. The core theme of the Agile Model is: “We value responding to changes in a fluid plan or design over religiously following a rigid plan or design”.

If you study the myriad of events that led to the abandonment of the Waterfall Model and the adoption of the Agile Model, you will notice two things. Most people overestimate their ability to deduce, ahead of time, everything they need to know and do to break down a complex task into several simple tasks that will promote progress. Simultaneously, they underestimate how much their ignorance of the interplay between various seemingly independent yet related and complex tasks can hinder that progress in a sufficiently elaborate endeavour such as software development.

A perfect collection of essays that elucidates this human flaw of estimating incorrectly, from the beginning, how things will work out in the end is a book by the computer scientist Fred Brooks titled The Mythical Man-Month. In this book, he outlines many useful ideas that can guide managers of software projects in effectively managing timelines and, more importantly, schedules that involve people (software engineers). Brooks discusses several causes of scheduling failures in software development projects. The most salient point in his essays is Brooks’s Law: “Adding manpower to a late software project makes it later”. Most sufficiently complex software projects cannot be perfectly partitioned into discrete tasks that can be worked on without communication between the workers and without establishing a set of complex interrelationships between tasks and the workers performing them.

Why is this? Because humans are the biggest source of overhead costs in any endeavour (see Conway’s Law). Our minds cannot fully appreciate, ahead of time, the explicit cost of an undertaking, especially the cost of our own bad judgment (the implicit cost). Humans suck at accurately counting the exact cost of any task. It’s just the way we are. Sometimes we overestimate, and at other times we underestimate.

The human mind/memory is beautiful but limited

Have you ever tried to read a 500-page book quickly in a single sitting? You quickly find out that there’s only so much the brain (or, by extension, the mind) can hold at any moment in time. You remember only about 20% of the contents of those 500 pages just after reading them. As coders, when we read code written by other people, our minds can hold only so much information about the entire (huge) codebase at a time. This is why great code readability is predicated on smallness, modularity and whatever is easy to understand quickly.

Then there’s also Boehm’s law, which states that the risk and cost of finding and fixing a bug go up (exponentially) over time. Why? Well, it’s easier and quicker to spot a bug, or understand the functionality, in code you wrote today than it will be tomorrow or in a week. By the same time next week, someone will have modified the code, and their changes (via a pull request) will be intertwined with yours. So, if there was a bug in your code, it’s now more difficult to find or reproduce. Your small and limited cognitive carrying capacity is a vital aspect of what drives code quality. This is why you need help from code comments, instrumentation and software tests (feedback loops and structures).
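A minimal sketch of that feedback loop (the function and values here are hypothetical, just for illustration): a cheap test written the same day as the code pins down the intended behaviour while it is still fresh in the author’s mind.

```typescript
// Hypothetical example: a discount routine and its same-day regression check.
function applyDiscount(price: number, percent: number): number {
  if (percent < 0 || percent > 100) {
    throw new RangeError("percent must be between 0 and 100");
  }
  // Round to 2 decimal places to avoid floating-point drift.
  return Math.round(price * (100 - percent)) / 100;
}

// The checks double as documentation for next week's reader (and for the
// colleague whose pull request will soon intertwine with this code).
if (applyDiscount(200, 25) !== 150) {
  throw new Error("expected 25% off 200 to be 150");
}
if (applyDiscount(99.99, 10) !== 89.99) {
  throw new Error("expected 10% off 99.99 to be 89.99");
}
```

Next week, when someone else’s changes land on top of this code, these checks fail loudly at the cheapest possible moment instead of letting the bug age.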

Overwhelming complexity is always at odds with human cognition. It’s the reason simplistic scientific models exist for extremely complex real-world phenomena. For instance, the water cycle is a very simple model for understanding and explaining how water moves around the earth in different forms to support life, even though reality is a lot more complex. Heck, mathematics is routinely used to usefully over-simplify complex real-world activities and fields of study, e.g. chemistry, biology and economics. Several economic models have been put forward to explain complex human behaviour around consumption, marginal utility and production, but no single economic model fully captures the entirety of complex human economic behaviour. This is because human beings are complex life forms that are unpredictable (much like software) and do very rational and irrational things with or without reason. As humans, we also find it difficult to deeply understand anything that is too complex in one go. Therefore, simplistic models are the means by which we dumb it all down so that it’s easy to explain and follow. However, there’s a danger to this manner of dumbing things down: we lose insight into other aspects of the behaviour or complexity being studied. We can ruin the endeavour being pursued through human oversight, overconfidence or over-expectation based on a false sense of having reached exhaustive detail.

Much of this means that people can only execute tasks in small, incremental steps or chunks at a time (before moving to the next set or chunk of work), and never in large ones.

Unfortunately, sometimes, we as humans find ourselves continually working against this constraint of our limited cognitive range rather than working with it.

The many avoidable errors in our ways

Human imagination and silliness are both equally infinite, though seemingly innocent and sometimes unintentional. It is said (by Churchill) that those who do not learn from history are doomed to repeat it. Experience is indeed the best teacher, but you learn nothing from experience if you do not process it. Well, this hasn’t stopped humans from repeating mistakes made by much earlier generations, now has it? Every now and then, some bad idea comes back in vogue, just a little worse than the last time it was in vogue. Mark Twain said, “History hardly repeats itself, but it often rhymes”. Sometimes we are just not attentive enough to the idea as it is being applied in real time, and so do not perceive the regression that accompanies it, because we are conditioned to believe it’s a good idea when it’s in fact a bad one.

There are many past cautionary tales of software systems created and/or updated based on intuition or assumption alone (mostly faulty ones), which led to very horrible outcomes. The not-so-recent Boeing 737 Max plane crashes of 2018 and 2019 come to mind. The key issue with the flight control system (MCAS) for the Boeing 737 Max planes was a lack of failsafes or corrective redundancy, because someone (I hope not an engineer) at Boeing assumed it was never going to need a failsafe 🤦🏾‍♂️.

When writing software tests, for example, engineers are plagued by so many biases and heuristics that it can become difficult to test effectively. These biases blind software engineers to mistakes and faulty assumptions that are right in front of them. It turns out that human psychology plays an important role in the quality of software testing done.

Also, when web frontend engineers write CSS, for example, they unintentionally find nice ways to include footguns as part of their work. For instance, working against normal flow puts engineers at a disadvantage and forces them to write unnecessary code in media queries to clean up after themselves. My Twitter thread here speaks to this.

The cloud-native software ecosystem has over 300 distinct tools! How cool is that? Well, not very cool, indeed. However, for some reason, that seems not to bother most of us. Before humans move ever so quickly to ruin anything, they always start out with the best intentions (most of the time, irrational ones).

We often intuit how well things are going to turn out without reasoning deeply about it. Between 2002 and 2021, the world-wide web changed drastically for both software consumers and software developers. The number of options and checklist items exploded almost exponentially, to the point where it was difficult to keep track. There’s a novel front-end or back-end library/framework or roadmap article every year now. In this way, we unintentionally increase complexity.

Even when we cannot fully intuit how well things are going to turn out, we somehow rely on some sort of confirmation bias or congruence bias to validate rash decisions and/or the faulty assumptions upon which those decisions were made, instead of turning to actual experimentation or reviewing data from past experience (our own or others’) to determine whether the decision is well informed or not.

Lastly, one thing I have continued to notice is that people blame themselves last when things go wrong. People are more inclined to blame their tools or other people first, even when the negative results are squarely their fault.

Some overheads are hardly ever obvious in software development

How do we build sufficiently large software systems that have high commercial and social value? We build them by putting together smaller software systems (or pieces) that work well on their own and work well together (as part of a bigger whole). When a scrum master schedules meetings and a product owner creates subtasks for a much larger task, with epics and stories (and whatnot), the way they divide up the work affects the people on the team and how they deliver, ultimately increasing or reducing friction between them and possibly leading them to cut corners (ignoring software quality checks) just to get their own part of the work done. When this happens, team members are no longer interested in team effort and efficiency. They instinctively opt out of the painful overhead of communicating with each other to get work done (I say painful because of the friction and unpleasantness that occurs when they try to reach out to a team member to help make their work easier through higher efficiency).

Overhead is an unavoidable consequence of any important endeavour that has inputs and outputs. Entrepreneurs who run businesses have to deal with overhead, like direct and indirect costs, in the course of trying to make money and sell their services or products to customers. These costs are usually present in the profit/loss or income statement of the business. This overhead problem is hardly ever obvious to a scrum master or a product owner at the start of a software project. Why? Because it is assumed that the team members will get along just fine, simply by asking (naively): “I hope we are all ready for the next sprint of our new software project? 😀”, to which all the members of the software team answer with a resounding yet unconvincing “YES!! 😁” ever so excitedly. The oversight here is that the scrum master and product owner fail to plan for people and their idiosyncrasies! They assume they don’t need to plan for how people can get in the way of their own progress, even when they genuinely mean to progress.

During sprint planning, we like to focus squarely on the work to be done and how it should be done, without really discussing the arrangements for, and organisation around, how it will get done. People just assume that the easier thing is to hope for the best (optimistic estimation of tasks) but plan for the worst (handling roadblocks, delays, etc.), or not think too much about blind spots, scope realisation or missing criteria, and kick the can down the road while assuming that team members will figure shit out eventually. Well, this assumption is always flawed. Team members will most likely not figure shit out. If anything, they’d step on each other’s toes, intentionally and/or unintentionally, because of the costs associated with the change that comes with building software and the limited time stipulated to effect such change.

So, what are the costs associated with change? These are the overheads that plague the process of building software as a team; if they are not identified early by the software team and adequately planned for, the team will find that the consequences are dire.

The will is infinite but the execution bounded

We do not have all the time in the world, nor do we have an endless supply of resources. Our first job as software engineers is not to master syntax, algorithms and logic for solving problems with computer instructions. Our first job is also not to provide elaborate software solutions to the problems we are tasked with. Our first job is actually to bring down the cost of change across the full spectrum of software in development. Our second job is then to sustainably develop software solutions to problems posed in business, industry, etc.

Change, as with most (if not all) things in life (and in software development too), is the only constant. We must manage change at every ugly or beautiful turn and at every decision point, and, more importantly, manage its costs. There are two broad categories of costs associated with software development, namely:

  • Transaction Costs
  • Integration Costs

These two costs must be managed at both the technical and non-technical levels of software development, as failure to do so might result in very serious consequences (like technical debt). Software is created in diseconomies of scale (as opposed to economies of scale): one cannot simply mass-produce software, the way shoes or milk or biscuits are mass-produced in a factory, while bringing down the cost per unit made. Economies of scale are cost advantages gained from the manufacture and production of tangible goods. When it comes to software (an intangible good), it becomes extremely difficult to mass-produce it, because software is not stable for long periods of time (when subjected to constant change) and lacks any means by which it can be easily, quickly and almost effortlessly “merged”, “sliced” or “diced”.

Transaction costs are the costs incurred in the making of one or more software artefacts and their constituent parts. They are usually driven by the skill level of the team of software engineers and/or the individual software engineer involved. The cost of creating a new source code file or installing a third-party OSS package into your software project folder is a transaction cost.

A very good example of a transaction cost is how easy or hard it is to read a piece of code because of the choices of all, some or one of the members of a team of software developers working on a project. We do all agree (somewhat, I guess) that the best way to manage large codebases is to organize by feature and not by type.

We know that organizing by feature improves developer ergonomics when modifying the codebase and reduces the cognitive stress when reading a codebase for the first time. However, we have been conditioned in a manner to always organize by type.
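As a hedged sketch of the idea (all names here are hypothetical), organizing by feature means everything a reader needs about, say, the cart sits together, instead of being scattered across type-based folders like components/, reducers/ and services/:

```typescript
// --- feature: cart ---------------------------------------------------------
// Model, business rule and presentation for the cart live side by side,
// so a first-time reader finds the whole story in one place.
interface CartItem {
  productId: string;
  quantity: number;
  unitPrice: number;
}

const cartTotal = (items: CartItem[]): number =>
  items.reduce((sum, item) => sum + item.quantity * item.unitPrice, 0);

const renderCartBadge = (items: CartItem[]): string =>
  `Cart: ${items.length} line(s), total ${cartTotal(items).toFixed(2)}`;

// --- feature: catalog ------------------------------------------------------
// The catalog feature touches the cart only through one small, explicit seam.
interface Product {
  id: string;
  name: string;
  price: number;
}

const toCartItem = (product: Product, quantity: number): CartItem => ({
  productId: product.id,
  quantity,
  unitPrice: product.price,
});
```

The transaction cost of reading drops because the reader pays for one feature at a time, instead of reassembling it from folders named after technical types.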

When someone who is new to the team proceeds to read code organized by type, is it possible that the transaction between their eyes, brain and the codebase doesn’t pay off at all? Does their understanding of the workings of the codebase diminish as they spend more time reading? Also, when a brittle test fails because the test code is tightly coupled to the application code being tested, the cost of the transaction between writing test code and writing actual code lies in the degree of coupling between them. When the coupling is reduced, the transaction cost comes down. To cut down transaction costs, one has to use tools like code reviews sparingly and effectively (as code reviews can themselves be a source of transaction costs too).
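A minimal sketch of that coupling point (the `Inventory` class is hypothetical): the check below exercises only the public interface, so swapping the internal `Map` for another structure would not break it, which is exactly how the transaction cost of test maintenance comes down.

```typescript
// Hypothetical example: a small inventory whose internal representation
// (a Map) is a private detail.
class Inventory {
  private counts: Map<string, number> = new Map();

  add(sku: string, quantity: number): void {
    this.counts.set(sku, (this.counts.get(sku) ?? 0) + quantity);
  }

  count(sku: string): number {
    return this.counts.get(sku) ?? 0;
  }
}

// A brittle test would reach into `counts` directly and break on any
// internal refactor. A loosely coupled test asserts observable behaviour:
const inventory = new Inventory();
inventory.add("sku-1", 3);
inventory.add("sku-1", 2);
if (inventory.count("sku-1") !== 5) {
  throw new Error("expected 5 units of sku-1");
}
```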

Integration costs, on the other hand, are incurred during the merging of related or unrelated software units or artefacts (built by different members of the same or different team(s) of software engineers) which must eventually go together in order to create software that is even more capable, powerful and valuable than what was initially available. Installing a third-party open source library or integrating with a third-party API service comes with costs that are not obvious at the outset. For example, the latest version of a third-party library may not agree with the version of ReactJS you already have installed, which can cause a delay, as you may need to look for another third-party open source package that does the same task.

The thing about integration costs is that they are harder to determine upfront than transaction costs, especially when estimating software engineering tasks on a JIRA, Trello or Asana board. When delving into the endeavour of task estimation and work breakdown, I find Agile Poker sessions to be quite an ineffective use of time for all involved! Why? Because these sessions are quick to estimate each task solely on transaction costs, while leaving out the integration costs. In my experience, I have found that actual task estimation sessions (that discuss the technical details of the work to be done with respect to the current state of the codebase) and story breakdown sessions (that map each user story item to its own block of technical detail) do a better job. Also, I find that these story breakdown sessions work best after sprint planning and not before. Now, I must state that it is the job of the product owner/manager to create epics (epic tickets) and stories (story tickets), but it is your job as a software engineer to create tasks (task tickets) and assign the story-point estimates. By creating the task ticket yourself, you are in a better position to estimate it and assign a more accurate story point.

A very good example of an integration cost is merge conflicts. Merge conflicts are frequent in codebases, and among teams, where the tasks (on the JIRA boards) are not created and distributed with some bounded context in mind. They are also common when team members aren’t communicating as often as they should, and where there are no agreed-upon coding conventions and guides for the team in general. To cut down integration costs, one has to use tools like comprehensive story-splitting techniques that split across multiple primary product concerns and never along a single product concern. It is only the tasks under each story (for a given epic) that should be split along a single product concern that is deliverable, independent and testable. Also, the probability of a merge conflict is considerably higher when replacing or modifying a piece of code or a software feature than when adding a new feature.

For instance, when building an e-commerce website, the task of a single software engineer (under a story) in a team should never cut across multiple unrelated domains (business functions) all at once. What do I mean by this? Most software has areas that are inter-related but do not work together directly (i.e. they work independently of one another but share a similar dependency or affect the same data source).

Furthermore, when building an e-commerce site, you don’t want to work on the shopping cart feature without defining who can own the shopping cart and what products can go into it. So you need a product management model (not necessarily a full-blown inventory system) and then a user management model. So, when dividing up tasks, you must work on the cart and the product management model at the same time. Also, the user management system needs to be worked on before the shopping cart feature. Ordering and prioritizing the work to be done is a great way to reduce integration costs.
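A hedged sketch of that ordering (the types here are hypothetical): the cart model literally cannot be written until the user and product models exist, which is why those tasks must come first or be worked on together.

```typescript
// Built first: the models the cart depends on.
interface User {
  id: string;
}

interface Product {
  id: string;
  price: number;
}

// Built second: the cart references who owns it and what can go into it.
interface Cart {
  ownerId: User["id"];
  lines: { productId: Product["id"]; quantity: number }[];
}

const emptyCartFor = (user: User): Cart => ({ ownerId: user.id, lines: [] });

const addToCart = (cart: Cart, product: Product, quantity: number): Cart => ({
  ...cart,
  lines: [...cart.lines, { productId: product.id, quantity }],
});
```

Sequencing the tickets in this dependency order means the cart engineer integrates against models that already exist, rather than inventing placeholder shapes that must be reconciled (at a cost) later.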

Incidentally, whenever you are tempted to break down a task across (not along) the same software concern (presentation, data modelling, business rules) where the pieces share similar dependencies, please don’t. You are inviting chaos, needless duplication and disruption down the line. You should always strive to break down tasks along multiple concerns (presentation, data modelling, business rules). Never ask two different engineers, during the same sprint, to work on registration and authentication separately. It never bodes well, and it’s worse when they hardly communicate with each other.

Also, as a product manager, you should not have two or more software engineers on a team working on tasks that all have the same dependency or relate to the same software artefact. It’s like asking several frontend engineers to work on different sections of the same web page as their individual tasks. That can easily go wrong, with lots of merge conflicts, if the team members aren’t over-communicating! But you knew that already, didn’t you? You knew there was a chance that they wouldn’t over-communicate amongst themselves, yet you didn’t split the tasks well.

Don’t get sucked in by shortcuts, guesswork and rabbit holes

Sometimes it can be difficult to assess the full impact of a seemingly simple/small change in a codebase (especially if you’re new to that codebase), let alone a large one. A more readable, cohesive yet loosely coupled codebase helps with this, but this is hardly the first instinct for most software developers in the early days of development. Our first instinct is usually to do it rough and fast. We want to quickly get an implementation out (a proof of concept, so to speak) in the shortest time possible, driven by “business value”.

This is a very horrible approach for both the long term and the short term. It is said that it pays to be a lazy programmer who looks for the cheapest, simplest and easiest ways to implement software features. Well, yes, I agree! However, it pays even more to be an objectively lazy programmer. You have to be able to weigh the cost of a rough-and-fast approach against its benefit, and consider how feasible it is to mend things as you go. The argument against this from most software practitioners, however, is that proper refactoring will rescue the codebase eventually.

Another argument is that working and delivering fast and early is better than working yet delivering late, because the former is good for your career in general while the latter is only good for solid engineering practices. I must say, these are all valid points, because I am not against occasional hotfixes to production that bypass QA. I am also not against monkey-patching a feature with a feature flag to hide a partial implementation of the feature in production. So, what am I against? I am against starting a serious software project as if it were a proof of concept or the outcome of a JIRA spike ticket. The truth is, if you do so, you hardly ever recover from it, and it will most likely affect your career (which you are fighting so hard to protect), not only your engineering practices going forward. I am also against making the use of shortcuts a consistent habit.

The usual solution to rushed, dirty, tightly coupled code is to provide a middleman object or process that indirectly relates/connects the originally coupled pieces of code (i.e. by providing a level of indirection). The solution to a piece of code that lacks cohesion is to hide implementation details behind a friendly, transparent and accessible code interface and provide a simple dependency link from it to other dependent parts of the software (i.e. by presenting a layer of abstraction). The reason for hiding the implementation details isn’t to keep the details unknown; it’s to make it possible to modify the implementation without breaking dependent code.
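A minimal sketch of both ideas, under hypothetical names: callers depend on the `PriceSource` interface (the level of indirection), while `InMemoryPriceSource` hides its storage details (the layer of abstraction), so the implementation can change without breaking `quote`.

```typescript
// The abstraction that dependent code is allowed to know about.
interface PriceSource {
  priceOf(sku: string): number;
}

// One implementation; its Record-based storage is a hidden detail that can
// be swapped (e.g. for an HTTP-backed source) without touching callers.
class InMemoryPriceSource implements PriceSource {
  constructor(private readonly prices: Record<string, number>) {}

  priceOf(sku: string): number {
    const price = this.prices[sku];
    if (price === undefined) {
      throw new Error(`unknown sku: ${sku}`);
    }
    return price;
  }
}

// Dependent code sees only the interface, never the implementation.
function quote(source: PriceSource, sku: string, quantity: number): number {
  return source.priceOf(sku) * quantity;
}
```

Replacing the in-memory source with any other `PriceSource` leaves `quote` and every other caller untouched; that is the payoff of hiding implementation details.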

As I mentioned earlier, humans have a limited working memory at any given time. Abstractions and indirections allow us to reason about large or larger systems incrementally as we build them up from smaller subsystems. Furthermore, abstractions should be designed and implemented to make understanding the implementation details easy when needed or desired. Therefore, the readability of the codebase and the transparency of the encapsulation should not be sacrificed on the altar of abstractions and indirections. That said, it is very possible to hide too much and nest interfaces too deep. As a famous computer scientist (David J. Wheeler) once noted: every problem in computer science can be solved with an extra level of indirection, except for the problem of too many levels of indirection.

The outcome of the many decisions we make when building software can increase or decrease the cost of change as we go through the software development and release lifecycle. It is crucial, therefore, to provide safeguards for curbing such high costs and key indicators for detecting them. However, in doing so, we must strike a balance between our lofty goals and what is simple to manage. Complexity is the enemy of execution!

Conclusion

It is pertinent to do away with assumptions and needless complexity early in the life of any software project. Every decision should be based on actual experimentation or on data from past experience. Every decision has to be aligned with the way the people involved are disposed to think and act. For instance, you want to plan for how overheads can derail the goals of a sprint, irrespective of the change schedule.


Ifeora Okechukwu

I like analysis. Mel-phleg. Software engineer. Very involved in building useful web applications of today and the future.