Review: Working Effectively With Legacy Code

In this post we will review the book Working Effectively with Legacy Code by Michael C. Feathers.

The basic premise of the book is that we want to make a change in an existing (legacy) code base and we want to make it safely. The flowchart below describes in broad strokes the content of the book:

 

The core of the book focuses on breaking dependencies of legacy code so that we can add tests more easily. How can we break the vicious cycle? We need to make changes to add tests, but we need to have tests to make changes safely. The idea is to break the dependencies in safe ways even if it temporarily leads to a worse design until we can add tests and refactor the code.

Organization of the book

The book is divided into 3 parts with a total of 25 chapters. In Part I, the author describes concepts and introduces tools that are used in the rest of the book. In Part II, he describes several different situations and problems and provides a solution to each. Part III contains a single chapter describing several dependency-breaking techniques. As the author suggests, the chapters do not need to be read in order.

Selected Notes

Here I’m going to list some of the techniques the book provided that I found interesting and novel.

How to break dependencies

  • Seam model – the idea is to make behavior changes while minimizing code changes. One example is to add a new subclass in order to remove/mock dependencies for tests.
  • Sprouting – move blocks of code to new methods and classes which can be easily tested.
  • Wrapper class – wrap code inside a new class which we have more control of, and can be mocked. This is a variant of the seam model, but instead of relying on inheritance it uses composition. As the author notes, this is the decorator design pattern.

A common pattern among them is keeping code changes to a minimum. The bigger the change, the more likely it is to introduce subtle changes in behavior.
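
To make the seam idea concrete, here is a minimal Python sketch of my own (the book's examples are in C++, Java and C#). The names InvoiceMailer, TestableInvoiceMailer and _deliver are hypothetical. The testing subclass overrides only the method that touches the outside world, so the formatting logic can be tested without changing it.

```python
class InvoiceMailer:
    """Legacy-style class with a hard-wired dependency (hypothetical example)."""

    def send_invoice(self, customer, amount):
        body = f"Dear {customer}, you owe {amount:.2f}"
        self._deliver(customer, body)
        return body

    def _deliver(self, customer, body):
        # In the imagined legacy code this would talk to a real mail server.
        raise RuntimeError("no mail server available in tests")


class TestableInvoiceMailer(InvoiceMailer):
    """Testing subclass: overrides only the dependency, not the logic under test."""

    def __init__(self):
        self.sent = []

    def _deliver(self, customer, body):
        # Record the message instead of sending it.
        self.sent.append((customer, body))


mailer = TestableInvoiceMailer()
body = mailer.send_invoice("Ada", 12.5)
```

The production class is untouched apart from having an overridable method, which is the kind of small, safe change the techniques above aim for.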

Where to test

  • Effect analysis – draw an informal DAG, where nodes represent functions or classes and a directed edge implies that changing one node will affect the other. This helps identify which code may be affected and must be tested.
  • Pinch points – the effect analysis might yield too many affected nodes. Pinch points are nodes that cover a large portion of the affected code but have relatively few dependencies, so they are easier to test.
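
The effect analysis is drawn by hand in the book, but the same idea can be sketched in a few lines of Python. The graph below is entirely made up for illustration, and affected_by is a hypothetical helper: it just walks the edges to collect everything a change might reach.

```python
from collections import deque

# Hypothetical effect graph: an edge X -> Y means "changing X may affect Y".
EFFECTS = {
    "parse_price": ["Order.total"],
    "tax_rate": ["Order.total"],
    "Order.total": ["Invoice.render"],
    "Invoice.render": ["Billing.send"],
    "Billing.send": [],
}


def affected_by(graph, start):
    """Return every node reachable from `start` (in a DAG this excludes `start`)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


impacted = affected_by(EFFECTS, "parse_price")
```

In this toy graph both parse_price and tax_rate funnel through Order.total, which would make it a reasonable pinch point candidate to put under test.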

Interpreted Languages

The book strongly assumes that we are working with a compiled object-oriented language, namely C++, Java or C#. The author dedicates only a page or two to dynamic languages, in this case Ruby.

Quotes

The author shares several design principles throughout the chapters. Here are some quotes describing some of them.

Temporal coupling:

When you group things together just because they have to happen at the same time, the relationship between them isn’t very strong.

Refactoring:

Rename class is a powerful refactoring. It changes the way people see code and lets them notice possibilities that they might not have considered before.

Command and Query Separation:

A method should be a command or a query but not both. A command is a method that can modify the state of the object but that doesn’t return a value. A query is a method that returns a value but that does not modify the object.
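
As a small illustration of the quote above, here is a Python sketch with a hypothetical Counter class of my own: increment is a command (changes state, returns nothing) and value is a query (returns a value, changes nothing).

```python
class Counter:
    def __init__(self):
        self._count = 0

    def increment(self):
        """Command: modifies the object and returns nothing."""
        self._count += 1

    def value(self):
        """Query: returns a value and leaves the object unchanged."""
        return self._count


counter = Counter()
result = counter.increment()  # a command has no return value
```

Calling value() any number of times gives the same answer, which is what makes queries safe to use in assertions and logging.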

Psychology of working with legacy code:

If morale is low on your team and it’s slow because of code quality, here’s something that you can try: pick the ugliest most obnoxious set of classes in the project and get them under test. When you’ve tackled the worst problem as a team you will feel in control of your situation.

Conclusion

Overall there are many interesting techniques for breaking dependencies and making the code more testable. I’ve been using some of them in a less methodical way. I like the pragmatism of the techniques (done is better than perfect), which can be seen when good practices and designs are temporarily violated to write tests.

I’ve also enjoyed the code design tips interspersed throughout the chapters, but I disagree with some of the suggestions, for example converting static utils to methods in class instances. In my opinion static classes are a good forcing function to keep implementation modular and stateless. This is especially the case if there’s a good mocking framework available which can mock out static functions.

I found the tips less useful for dynamic languages or just-in-time compiled ones that allow objects to be modified at runtime. That flexibility enables much more powerful testing frameworks, which can, for example, easily mock dependencies for us. Nowadays I work mostly with Hack and JavaScript, which both provide a lot of runtime flexibility and support for mocking dependencies. The challenge I often face is having to mock too many dependencies, which is tedious.
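
To show what this runtime flexibility looks like, here is a sketch in Python (not Hack or JavaScript) using the standard library's unittest.mock. The function seconds_since is a hypothetical piece of code under test. No seam has to be introduced: the dependency is swapped out at runtime and restored afterwards.

```python
import time
from unittest import mock


def seconds_since(start):
    """Code under test: depends directly on the module-level time.time()."""
    return time.time() - start


# Replace time.time at runtime, only for the duration of the with-block.
with mock.patch("time.time", return_value=105.0):
    elapsed = seconds_since(100.0)
```

In a compiled language the same test would typically require one of the dependency-breaking techniques from the book, such as extracting an interface for the clock.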

The major negative of the book is that it contains a lot of obvious typos and typographic errors, such as duplicated listings.

Related Posts

  • Review: Code Complete 2 – This is another review of a highly practical and pragmatic book that I believe made me a better programmer. The overlap in content is minimal, so they’re a nice complement.

Review: The Design of Everyday Things

Don Norman is the director of The Design Lab at University of California, San Diego.

In The Design of Everyday Things, he discusses human psychology, introduces many concepts around design and provides suggestions to improve the usability of products. He takes into account practical real-world challenges, such as time and budget constraints during development.

The book is divided into seven chapters which we’ll summarize in this short post.

1. The psychopathology of everyday things

This first chapter focuses on the attributes of products that influence their usability. It introduces concepts such as affordances, mapping and feedback that improve usability. Affordances help people figure out what actions are possible without the need for labels or instructions. These are relationships (between human and object), not (object) properties.

Affordances make it obvious that this side is to be pushed. The asymmetric bar suggests which side of the door to press.

Sometimes it’s not possible to make actions obvious, in which case we need signifiers to help with it. Signifiers include messages, symbols and legends.

Signifiers, labels in this case, aid users in deciding whether to pull or push.

Mapping is useful when the controls the human interacts with are not in the same place as the object being controlled. A common example is a switch and a light. When there are many lamps to control, a physical correspondence between switches and lamps makes it easier to find out which switch controls each light.

The physical layout of the switches maps to the actual location of the lights.

My first reaction to the switch above is that it looks ugly and cluttered. One message I got from the book is that good design is not necessarily beautiful and minimal. Sometimes these goals even conflict, because a minimal design might hide affordances and signifiers.

Feedback is communicating the result of an action immediately. This includes turning the light on the elevator button when it has been pressed or in web design depressing a button and disabling it temporarily (if the result cannot be returned immediately).

Conceptual model is the ability for the user to keep a simplified version of the system in their mind, often relating to an existing product. One example is the use of terms like Desktop, Folders and Files in the GUI of an operating system, relying on the existing model of organization from an office.

One example of a bad conceptual model is the heater/oven regulated by a thermostat. If you want to pre-heat the oven quicker, one natural idea is to set the temperature to the maximum and then lower it when it’s ready. The problem is that this is not how thermostat ovens work. They have a heater providing a constant flow of heat, and they control the temperature by turning it on and off. The longer the heater stays on, the higher the temperature gets, but a higher setting doesn’t make the oven reach a given temperature any faster.
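
A tiny simulation makes the oven example easier to see. This is my own toy model in Python, with made-up numbers (20 degrees at the start, 8 degrees of heat per minute) and a hypothetical function name. The heater is either on or off, so the setpoint only decides when it stops, not how fast it heats.

```python
def minutes_to_reach(target, setpoint, start=20.0, heat_per_minute=8.0):
    """Simulate an on/off thermostat oven (all numbers are made up).

    The heater adds a fixed amount of heat per minute while the oven is
    below `setpoint`; it never heats "harder" for a higher setpoint.
    """
    temperature = start
    minutes = 0
    while temperature < target:
        if temperature >= setpoint:
            break  # heater switched off before the target was reached
        temperature += heat_per_minute  # heater is simply on
        minutes += 1
    return minutes


normal = minutes_to_reach(target=180, setpoint=180)
maxed = minutes_to_reach(target=180, setpoint=260)
```

Under these assumptions both calls take the same number of minutes, which is exactly the mismatch between the user's mental model and the actual system.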

2. The psychology of everyday actions

This chapter focuses on the user side, more specifically, on what goes on in their head when interacting with a product. The author proposes breaking down an action into stages.

Stages of an action

He discusses levels of processing: visceral (instinct), behavioral (habit) and reflective (conscious). In the picture above, the stages are aligned by these levels. Intention and evaluation are both at the conscious level, plan and interpretation at the behavioral, and finally execution and perception are visceral.

Users blame themselves. Humans are usually eager to blame other people in day-to-day life, but when interacting with machines they often blame themselves, even when the confusion is caused by bad design.

3. Knowledge in the head and in the world

This chapter focuses on how we use knowledge to interact with a product. He categorizes knowledge into two kinds: knowledge in the head (memory) and knowledge in the world (conventions, standards).

Delving into the workings of memory, he talks about short-term vs. long-term memory and how short-term memory can only keep a few items “in cache” (using a computer analogy). The author mentions how constraints help us remember things, such as why it’s easier to remember poems than prose: poetry has a more rigid structure. He brings back ideas from Chapter 1, like conceptual models and mapping, which reduce the number of things to remember.

Regarding knowledge in the world, a lot of conventions vary according to culture or country (e.g. which side of the road to drive on), which must be taken into account especially when developing systems available internationally.

Systems should rely more on knowledge in the world than in the head. Some systems rely on knowledge in the head on purpose, often for security reasons, for example reliance on passwords.

4. Knowing what to do: constraints, discoverability and feedback

This chapter focuses on how the product can help users to interact with it by limiting the universe of possible actions (constraints), making it easy to discover the right way to use it (discoverability) and providing feedback information along the way to tell users whether they’re using it correctly.

He categorizes constraints into four types: physical, cultural, semantic (derived from the purpose of the action) and logical (for example: there’s only one logical way to perform an action).

For discoverability the author analyzes the design of faucets, which have to make it easy for users to control water flow and temperature.

For feedback, he discusses the pros (does not require focused attention) and cons (annoyance, surrounding noise) of using sound as feedback.

5. Human error? No, bad design

In this chapter, the author focuses on user errors. He categorizes them into slips (execution errors) and mistakes (planning errors). Slips are easier to detect because they are a deviation from the intended plan, while a mistake may mean correctly executing the wrong plan.

He suggests designing for errors. This includes preventing errors in the first place (constraints), sensibility checks (e.g. input validation), the option to undo actions, and making errors obvious and easy to correct.

6. Design thinking

This chapter provides a framework for the process of designing. It includes the double diamond: the first diamond tries to find/define the problem, while the second is to find the solution.

The analogy with the diamond shape is that in both phases it starts by expanding the range of ideas and then narrowing down to specific ones. More technically, he defines four phases in each of the diamonds:

1. Observation
2. Idea generation
3. Prototyping
4. Testing

Observation requires a deep understanding of a small set of customers (as opposed to other forms of observation such as large-scale general A/B testing).

Idea generation is basically brainstorming. This, together with prototyping and testing, should be an iterative process.

In the rest of the chapter the author discusses related topics of designing, how external factors influence the design process (budget and time constraints), the fact that the buyer might not be the end user (e.g. appliances for a rental place) and how making something harder to use might be desirable (such as to improve security and provide access control).

7. Design in the world of business

In this final chapter, the author focuses on real-world design. Besides budget and time constraints, one source of bloated design is the featuritis that arises from competition. If a competitor adds a new feature, the product has to follow suit and add it too.

Another challenge with design arises from the fact that people don’t like change. An improved design or a new technology sometimes doesn’t take off until much later, when people start getting used to it and adopting it. Around this theme, he discusses the tradeoffs of incremental and radical innovation, and argues that both are important for the development of products.

Conclusion

My impressions: I did like that the book uses consistent terminology to explain concepts and that the author provides a lot of examples. I also like the fact that he comes up with conceptual models, defining relationships between different concepts, such as the stages of an action.

I didn’t think the book was very organized. He does mention the book doesn’t have to be consumed linearly, but I did feel that the book was a collection of topics around a theme instead of a cohesive text. I’m used to technical books where you look at the table of contents and how the small parts (chapters) usually have well defined boundaries and how they assemble together to form the big picture.

Most of my work consists of developing Web interfaces for people to do their jobs better. Usability is a very important concept in this field, so I’m eager to learn more about this subject.

Thoughts: Usability of code

In light of a recent read, Code Complete 2, I’ve been constantly aware of the usability (readability) of source code. If we think about it, it shares similar challenges with end products and maybe it’s possible to leverage ideas from this book and apply them to coding.

Some analogies: good function names are affordances for how to use a function; sticking to code conventions is a good way to move knowledge from the head to the world (Chapter 3); comments can act as signifiers; invariants and unit tests can act as constraints that convey the expected behavior of a function. Conceptual models are achieved by using good abstractions that map intuitively to the business rules the code is meant to implement.

As emphasized in the book, we write code for people, not for machines, so there’s no reason not to strive to make it as usable as the products we interface with every day.

Review: Code Complete 2

In this post I’ll share my notes about the book: Code Complete 2, by Steve McConnell. The book has great information about several aspects of Software Development, but it’s quite long: 862 pages.

This is not a summary of the book by any means, but rather a collection of points that I found interesting, novel or useful. I hope you find it useful, or at least that it inspires you to go after the book for more details on a specific topic.

I’ve written down bullet points about each chapter, to some of which I added my own thoughts/comments.

Chapter 1 – Introduction

* Motivation: What is Software construction and why it’s important
* Explains the overall format of the rest of the book

Chapter 2 – Metaphors

* Metaphors help us understand a problem better: we can take solutions developed for an analogous problem and apply them to the original one.
* A common analogy for software development is civil construction.
* Bad metaphors come from forced analogies and can be misleading.

Chapter 3 – Prerequisites (gathering requirements)

* Errors caught in the specification phase are the cheapest to fix (about 10x more expensive if the error is only caught during construction).
* Different types of software require different degrees of investment in prerequisites. Three categories of software:
— business (e.g. website),
— mission-critical (e.g. packaged software) and
— embedded/life critical (e.g. avionics or medical devices).
* Requirements change over time, so your design has to be flexible enough to allow changes.

Chapter 4 – Construction Planning

* Important decisions that have to be taken before the construction phase: programming language, code conventions, development practices (e.g. pair-programming), etc.
* Technology wave and the tradeoffs of choosing technology in different stages of this wave. For example, late-wave technology is more mature, has better documentation and user-friendly error messages. On the other hand early-wave environments are usually created to address problems with existing solutions.

Comments: Adopting early technology also helps with recruiting. Programmers like new technology.

Chapter 5 – Design

* The main goal of the design should be to keep software complexity low.
* Different levels of design: software-level, packages, classes, routines and internal routines — different types of software require different amounts of design details.
* Two main flavors to carry over the design: bottom-up or top-down approaches.
* Information hiding is key in a good design: it lowers complexity by letting the person reading the code abstract away details, and it reduces coupling.

Comments: Hiding information properly is an art. It doesn’t help to stick as much code as possible into private methods if the public methods are not intuitive and require diving into implementation details.

Chapter 6 – Classes

* Consistent abstractions – different methods in the class should model the problem at the same level. Example: a class representing an employees record which inherits from a List with two methods:

addEmployee()
firstItem()

The second one is closer to an implementation detail. In general, the closer the abstraction is to the business level, the better.

* Inheritance vs. composition: inheritance is often abused, and long chains of inheritance are often hard to read. Arthur Riel suggests no more than 6 levels; the author says it should be limited to 3 levels.
* Be careful about assigning too many responsibilities to a single class. One heuristic is that a class should have at most 7 members.
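
Going back to the consistent abstractions point and the addEmployee()/firstItem() example, here is a sketch of how I would fix it, written in Python with hypothetical names. The roster holds a list instead of inheriting from one (composition), and every public method speaks the language of employees.

```python
class EmployeeRoster:
    """Hypothetical roster that hides its list behind business-level methods."""

    def __init__(self):
        self._employees = []  # implementation detail, not part of the interface

    def add_employee(self, name):
        self._employees.append(name)

    def first_employee(self):
        # Same abstraction level as add_employee, unlike a generic firstItem().
        return self._employees[0] if self._employees else None


roster = EmployeeRoster()
roster.add_employee("Grace")
roster.add_employee("Linus")
```

Because callers never see the underlying list, it could later be swapped for another container without touching them.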

Chapter 7 – Routines

* Naming: should be verb + noun and should describe the value it returns (if any).
* The purpose of a routine is to reduce complexity.
* The heuristic for the maximum number of parameters is 7.
* Routines should follow the Unix philosophy: do one thing and do it well.

Comments: in the past I used to think of routines as ways to share code. This sometimes conflicts with readability and the Unix principle. This is especially true when you group several routine calls into one because the group is used in two places, but the calls are not cohesive enough to give the routine a clear name, so we end up using vague terms such as init, prepare, preprocess, cleanup, etc. Nowadays I prefer being verbose (i.e. repeating the lines in both places) in favor of readable code.

Chapter 8 – Defensive Programming

* When to use assertions: use error handling for conditions you expect to occur and assertions for the ones you don’t.
* When to use exceptions: a convention should be defined. The name of an exception should match the level of abstraction of the current code (e.g. no RuntimeException where business logic is defined). This also means catching and re-throwing if the exception crosses the boundary between two abstraction layers.
* Barricades: a specific layer that deals with bad input so that the core code doesn’t have to deal with them.

Comments: the barricade is pretty nice. It helps reduce complexity in the main code, which no longer has to handle too many corner cases, and it also centralizes where bad data is handled, so you don’t risk double or triple checking the same conditions in several places.
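
Here is how I picture the barricade together with the assertion vs. error handling rule, as a Python sketch. The functions parse_age and ticket_price and the age limit of 150 are hypothetical. The barricade uses error handling because bad input is expected; the core code only asserts, because bad data past the barricade should never happen.

```python
def parse_age(raw):
    """Barricade: turns untrusted input into a clean value or raises ValueError."""
    text = str(raw).strip()
    if not text.isdigit():
        raise ValueError(f"age must be a whole number, got {raw!r}")
    age = int(text)
    if age > 150:
        raise ValueError(f"age out of range: {age}")
    return age


def ticket_price(age):
    """Core code: trusts its input and only asserts what should never happen."""
    assert isinstance(age, int) and 0 <= age <= 150, "barricade was bypassed"
    return 5 if age < 12 else 10


price = ticket_price(parse_age(" 9 "))
```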

Chapter 9 – Pseudocode Programming Process (PPP)

* Describe the routine first in pseudo-code and then add the actual implementation but leaving the pseudo-code as comment.
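
A minimal sketch of what the end result might look like, in Python with a hypothetical routine of my own. The comments are the original pseudo-code lines, written first, and each one is followed by the code that implements it.

```python
def average_order_value(orders):
    # If there are no orders, report zero instead of dividing by zero
    if not orders:
        return 0.0
    # Add up the totals of all orders
    total = sum(order["total"] for order in orders)
    # Divide by the number of orders
    return total / len(orders)


avg = average_order_value([{"total": 10.0}, {"total": 20.0}])
```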

Chapter 10 – Variables

* The span of a variable is defined as the number of lines between where a variable is declared and where it’s used. The average span of a variable is an indicator of complexity. A high-span variable is one whose uses are spread out across the code. To reduce this number, one can try to re-order statements so that variables are declared close to where they’re used and all their uses are grouped close together.

Chapter 11 – Variable naming

* Optimal length is 10 to 16 characters.
* Computed-value qualifiers such as total or max should be a suffix, not a prefix (e.g. revenueTotal rather than totalRevenue).
* Use conventions to indicate special types of variables such as loop indexes, temporary, boolean, enums, etc.
* Define a document for consistent variable naming conventions, including abbreviations.

Chapter 12 – Fundamental Data Types

* Consider using special-purpose containers instead of plain arrays. In most cases we only need sequential access, so queues, stacks or sets can be more appropriate.
* Use intermediate boolean variables for the sole purpose of making complex predicates (in if clauses) simpler.
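
A quick Python sketch of the intermediate boolean idea. The function and its parameters are hypothetical; the point is only that each named boolean documents one part of a predicate that would otherwise be a single long condition.

```python
def can_checkout(cart_items, user_is_logged_in, card_on_file, store_credit):
    # Each intermediate boolean names one part of the predicate.
    has_items = len(cart_items) > 0
    can_pay = card_on_file or store_credit > 0
    return user_is_logged_in and has_items and can_pay


ok = can_checkout(["book"], True, False, 25)
```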

Chapter 13 – Unusual Data Types

* Organize related sets of variables into a structure so they become more readable and easier to copy.
* Global variables are evil. If you need them, at least put them behind access routines (e.g. static member variables in a class).

Chapter 14 – Organizing code within a routine

* Make dependencies between 2 pieces of code obvious: via parameters, comments or flags + invariants.
* To break dependencies between chunks of code, initializing variables at the beginning of the routine might help.

Comments: I also like memoized functions to break dependencies. If B depends on A being run, I create A as a memoized function that B can call no matter if it had been called already.
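
A sketch of what I mean, in Python, using functools.lru_cache from the standard library as the memoizer. The names load_config and greet are hypothetical, and the calls list exists only to show that the body of A runs once.

```python
import functools

calls = []  # only here to show how often the body of load_config runs


@functools.lru_cache(maxsize=None)
def load_config():
    """Step A: runs its body at most once, however many times it is called."""
    calls.append("load_config")
    return {"greeting": "hello"}


def greet(name):
    """Step B: calls A itself instead of assuming someone already ran it."""
    return f"{load_config()['greeting']}, {name}"


first = greet("Ada")
second = greet("Bob")
```

With this shape the ordering dependency disappears from the caller: B can be called first, last or repeatedly, and A still executes only once.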

Chapter 15 – Conditionals

* When doing if/else conditionals, test the “normal” case first and the exception in the “else”.

Comments: I tend to do the opposite in favor of early returns. This helps reduce nesting and clears up corner cases first. It works well when the handling of the exception case is simple.
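
For illustration, this is the early-return style I have in mind, as a Python sketch with a hypothetical function and made-up prices. It is my preference, not what the book recommends.

```python
def shipping_cost(order):
    # Corner cases first, each handled with an early return.
    if order is None:
        return 0.0
    if order["total"] >= 50:
        return 0.0  # free shipping
    # The "normal" case is left un-nested at the end.
    return 4.99


cost = shipping_cost({"total": 20})
```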

Chapter 16 – Loops

* Keep as much code outside of the loop as possible, and treat it as a black box, if possible.
* Perform only one function in the loop, prefer using multiple loops with simple functions than one loop with many functions – unless you can prove that using multiple loops is the bottleneck of the code.
* The loop body should be short (<20 lines) and should be entirely visible on the screen.
* Loop nesting should be limited to 3 levels.

Chapter 17 – Unusual Control Structures

* Multiple returns within the same routine: use only if this improves readability.
* Avoid recursion if possible. Restrict it to a single routine (no chains like A -> B -> A -> B...).

Chapter 18 – Table-driven methods

* Simplify complicated conditional logic by pre-computing the values and hard-coding them in a lookup table (works if inputs are discrete).
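
A small Python sketch of a table-driven method, using days per month as the discrete input. The function name and the leap_year flag are my own choices for the example.

```python
# Lookup table replacing an if/elif chain over a discrete input (the month).
DAYS_IN_MONTH = {
    1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
    7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31,
}


def days_in_month(month, leap_year=False):
    # The one irregular case stays as code; everything else is data.
    if month == 2 and leap_year:
        return 29
    return DAYS_IN_MONTH[month]


days = days_in_month(9)
```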

Chapter 19 – General control issues

* When comparing numbers, use the number-line order, in other words, always use the “<” operator rather than “>”. For example, instead of

a < 0 || a > 10 do a < 0 || 10 < a

* Do not put side effects on conditionals.
* Switch/case statements indicate poorly factored code in OO programming.
* Measuring control complexity in a routine: count the number of if, while, for, and, or and case. It should be less than 10.
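
The counting rule in the last bullet is easy to automate. Below is a rough Python sketch using the standard ast module. It is my approximation for Python source, not the book's exact procedure: decision_points is a hypothetical helper, it ignores case branches, and an elif is counted as another if.

```python
import ast


def decision_points(source):
    """Count if/while/for statements and and/or operators in `source`."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For)):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` is one BoolOp with three values: two operators.
            count += len(node.values) - 1
    return count


SAMPLE = """
def classify(n, strict):
    if n < 0 or (strict and n == 0):
        return "invalid"
    for digit in str(n):
        if digit == "7":
            return "lucky"
    return "plain"
"""

score = decision_points(SAMPLE)
```

For the sample routine this counts two ifs, one for, one `or` and one `and`, which is comfortably below the threshold of 10.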

Chapter 20 – Quality assurance

* Testing and code reviews are not enough by themselves. A combination of different techniques yields the lowest number of bugs in general.
* In studies, code reviews by 2 people found twice as many errors as code reviews by 1 person – this is surprising because one would think a lot of the errors found by each reviewer would overlap. The book doesn’t mention the optimal number of reviewers.
* Software has many quality attributes, including correctness, usability, efficiency, reliability, flexibility and robustness, and some of them conflict (e.g. improving robustness decreases efficiency). Decide upfront which characteristic to optimize and keep the tradeoffs in mind.

Chapter 21 – Collaborative Development

* Formal inspections of design: the author of the design creates a document and other people in the team have to review it independently. This not only forces the designer to think it through thoroughly, but also ensures other people in the team are familiar with the architecture.

Chapter 22 – Testing

* Exercise control flows. Instead of testing all combinations of conditionals (which would be exponential and prohibitive), add 2 tests for each conditional (one for the true case and another for the false case).
* Test data flow. The suggestion is to test all pairs of (definition, usage) of all variables. For example, if we have

  int a = 10; // 1
  ...
  if (x < a) { // 2
     ...
  } 
  int b = a + 1; // 3

In this case we would add two tests: one that exercises lines (1) and (2) and another that exercises (1) and (3).

* Bug distribution is not uniform across the code. It’s more likely that 20% of the code contains 80% of the bugs. It’s important to identify these areas and avoid over-testing, especially if using TDD.
* Keep records of bugs and fixes: where the bugs were found, the severity of each bug, etc. This will help to identify the critical paths.

Chapter 23 – Debugging

* A bug in your code means you don’t fully understand your code.
* Reproduce the error in different ways, to make sure you understand what is really causing the problem.
* Rewriting code might be a better alternative if debugging is taking too long. The idea is to set a maximum time dedicated to debugging, after which one is allowed to use more drastic solutions such as rewrites.

Chapter 24 – Refactoring

* Make refactorings safe. In order to accomplish that, they should be small, self-contained, easy to review, documented, and tested.
* Different refactorings carry different degrees of risk and should be planned accordingly.
* Book recommendation: Refactoring: Improving the Design of Existing Code by Martin Fowler.

Chapter 25 – Code Tuning

* Code tuning is overrated.
* Performance is often not the best feature to optimize: throughput and correctness are more important.
* 4% of the code accounts for 50% of the performance – and programmers are bad at guessing code bottlenecks. Finish the product first, then profile to find the bottlenecks.
* Common sources of performance bottlenecks are system calls, I/O and paging.

Chapter 26 – Code Tuning Techniques

* Specific techniques to improve runtime of code.
* Experimental results for each technique show that in some environments the optimizations provide great performance gains, but in others no significant improvement is obtained (and performance sometimes degrades).
* The key takeaway is: profile! Compilers are very smart nowadays, so it’s hard to predict how an optimization will be translated into the final machine code.
* Downside of optimizations is loss of readability:

“The impact of unmeasured code tuning on performance is speculative at best, whereas the impact on readability is as certain as it is detrimental.”

Chapter 27 – How Program Size Affects Construction

* As the code size grows,
— More people are necessary, increasing communication overhead
— Productivity is lower
— Error density increases
— More time has to be proportionally spent on non-construction phases (planning and design)

Chapter 28 – Managing Construction

This chapter seems to be more targeted to managers, but also useful to developers to understand the “other side”.

* On encouraging good coding:

“If someone on a project is going to define standards, have a respected architect define the standards rather than the manager. Software projects operate as much on an expertise hierarchy as on an authority hierarchy.”

* Configuration management: practices to deal with changes, either in requirements, design or source code.
— Discuss planned changes as a group, no matter how easy they are to implement, so people keep track of them.
* Estimating construction time:
— Use estimating software,
— Treat estimation seriously
— Adjust estimates periodically and
— If initial estimations were off, learn why.

Chapter 29 – Integration

* Different strategies for doing code integration, mainly top-down (start with a skeleton and fill in the blanks) and bottom-up (start with individual pieces and glue them together).
* Strategies to make the integration smoother, such as automated builds and smoke tests.

Chapter 30 – Programming Tools

* Covers: IDEs, tools for compiling, refactoring, debugging and testing
* Large projects require special purpose tools
* Programmers overlook some powerful tools for years before discovering them

Chapter 31 – Layout and Style

* Covers indentation, line breaks
* Do not align assignment statements on the ‘=’ sign. The idea is noble and it improves readability, but it’s a standard hard to maintain. Not everyone will have the same care and also automated refactors will likely miss it.
* Add at least one blank line before each comment

Chapter 32 – Self-documenting code

* Don’t comment too much or too little :)
* The author admits there’s not a lot of hard data regarding the usefulness of comments and what the “right” amount is
* Comment while or, better yet, before coding
* Especially in bug fixes, the comment should explain why the code works now, not why it didn’t work in the past.
* Comment styles have to be easy to maintain (avoid end-of-line comments, because if the variable gets renamed, the comment will become misaligned)

Chapter 33 – Personal Character

Traits that the author considers important in a good programmer:

* Humility – admits their limitations, is open to learning new things and changing their mind
* Curiosity

Analyze and plan before you act: dichotomy between analysis and action. Programmers tend to err on the side of acting.

* Intellectual Honesty – admit mistakes, be honest/realistic about estimates and delays
* Communication
* Creativity
* Discipline
* Laziness

The most important work in effective programming is thinking, and people tend not to look busy when they’re thinking. If I worked with a programmer who looked busy all the time, I’d assume he was not a good programmer because he wasn’t using his most valuable tool, his brain.

Traits that the author thinks are overrated:

* Persistence
* Experience
* Gonzo programming – programming like crazy, non-stop

Chapter 34 – Themes in Software Craftsmanship

Conclusion and review of the book

* The primary goal of software design and construction is conquering complexity
* Process matters.

My message to the serious programmer is: spend a part of your working day examining and refining your own methods. Even though programmers are always struggling to meet some future or past deadline, methodological abstraction is a wise long-term investment – Robert W. Floyd.

* Write programs for people first, computers second
* Watch for warning signs. Examples: a high number of bugs in a particular class may indicate the class is poorly designed. A lot of bugs in the project overall might indicate a flawed development process. Difficulty reusing code in another place indicates it’s too tightly coupled, etc.

Chapter 35 – Where to find more information

Book recommendations:

* Pragmatic Programmer – Hunt and Thomas.
* Programming Pearls – Bentley, J.
* Extreme Programming Explained: Embrace Change – Beck, K.
* The Psychology of Computer Programming – Weinberg.
* The Mythical Man-Month – Brooks
* Software Creativity – Glass, R.
* Software Engineering: A Practitioner’s Approach – Pressman, R.
* Facts and Fallacies of Software Engineering – Glass, R.
* UML Distilled – Fowler, M.
* Refactoring: Improving the Design of Existing Code – Fowler, M.
* Design Patterns – Gamma et al.
* Writing Solid Code – Maguire, S.

Conclusion

This book contains a lot of valuable information and I’ve incorporated several of his ideas in my day-to-day work, especially regarding making code easier to read.

The suggestions in the book are often backed by hard data, making them more credible. Sometimes the advice is subjective, even contradictory, but he often provides several points of view or alternatives, so that the reader can use their best judgement about when to apply them.