Refactoring

Introduction

In schools and universities, we are taught how to write software, a good software. Yet, too often in enterprises we encounter examples of a badly written code, the code that runs in production and it is responsible for doing the less/very important business of that enterprise.

Obviously, there is a gap between the (sample) problems we solve in school and the ones we encounter in real life. So, how is that gap created actually?

Well, for a start, the real-life scenarios are (much) more complex than all the Hello worlds, Fibonaccis, abstract bodies with concrete spheres and cubes or whatever else we used to cope with while progressing in our learn process.

Beside the real-life complexity of the domain problem we are solving, there is usually a shortage in resources. The most obvious is the shortage of time and money, because projects should finish in some reasonable time, that is within the planned budget. Then there is a shortage in experience, since not everyone has had the opportunity to solve the same or similar problem previously.

When combined all together, this often results in a badly designed or written software. With poor test coverage, if any.

So, when you are assigned to a project that looks a bit like this, what can you do? Of course, you can protest and explain to your manager that you don’t want to be part of that project. Or you can roll up your sleeves and perform some…

Refactoring

What is refactoring? The term can be seen as a verb and noun as well.

  • Refactoring (verb)

To restructure software by applying a series of refactorings, without changing its observable behavior.

  • Refactoring (noun)

A change made to the internal structure of software to make it easier to understand and cheaper to modify.

The key takeaway is that refactoring is a change to the codebase that does not modify the software from the outside perspective and results in an improved code quality afterwards.

Clear enough. But what is code quality?

I personally consider a (software) code to be of a good quality if it (is):

  • Easily adjustable and extendable,

which is a direct consequence of a good software design and architecture;

  • Understandable,

which is a direct consequence of well thought layering, division of responsibility and expected (natural) naming of things;

  • Contains no duplication of (core) logic,

which is an indirect consequence of the previous two and a direct consequence of donnish coding;

  • Covered by meaningful unit tests,

which protects us from making stupid mistakes with ease.

 

Now since we are clear what the refactoring is and why it is helpful, let us see some refactoring in action.

Example

I was a new team member on a project that produced a software tool that was generating various reports for the Dutch railways. Something like Google Analytics, just for the railways.

Among many reports, there was a report called Details report, which was responsible for the generation of single (data) rows that were the part of the bigger, aggregated data, a sort of a drill-down view on a report.

As one of the first to-dos on the project I received a NSZFR-103 Jira ticket with the content:

(Details) Reports generated once user clicks on a table row are hard-coded and too complicated on back end side.

Detail reports should just query original data and return results rather than making whole report from which context user is coming from and then filtering those data.

method

After some investigation throughout the solution I realized this was true – the implementation was too complicated and too long.

There was a single method in the Details report with some 420+ lines of code that was fetching the data for all the other reports in a form of a huge if – else if code block.

Such method does not satisfy at least 2 criteria that I have mentioned previously in relation to the code quality:

  • it contains duplication of (core) logic, because it takes responsibility of all the other reports and does the work for them,
  • it cannot be covered by meaningful unit tests, because it takes a whole lot of context (and setup), to work with all the reports.

(Not to mention the other two criteria here.)

By the way, this is one of the most typical code smells. Check out more on this here https://sourcemaking.com/refactoring/smells/long-method.

Bonus problems

Beside this there were additional issues with the Details report. A report can have several filters to narrow down the displayed data. These filters were not working correctly on the Details report.

There was also the ‘Download’ button. The download was also not respecting filtered data.

This had to be solved with the task as well.

How did I solve this?

It is a bit hard to formalize the ways of programming and refactoring, but I came up with this in the end.

Step 1 – Structure the code for readability

We always start by reading the code and analyzing the solution. We generally try to understand what is being accomplished by what part of the code or module (the responsibilities) and how the things are linked (that is the relationships). Therefore, my first step is always to make reading and analysis part comfortable for myself.

This can be formalized by the series of actions:

  • Format the code

Use the Ctrl + K + D (Visual Studio) or Ctrl + Shift + F (Visual Studio Code) shortcuts to accomplish this. Let the default formatters do their job and let them clear out the basic things for you. Formatted code is prettier and easier to read.

 

  • Correct spelling / naming

While you are reading through the code correct (obvious) spelling errors. Feel free to correct the naming* as well, if you think it does not describe the best the purpose to which it serves. This includes project, namespace, interface and class renaming, method and variable renaming and correcting comments, should they be present.

* It is known that the naming is one of the hardest problems in programming. My advice is always stick to the real world. Most of the time we are transferring things from the real world to the code, so stick to that real domain.

 

  • Public things go up 5, private go down 6 the file

We are reading from left to right, top to bottom. Therefore, I prefer much if I can see the public things of a class at the top of it. Usually the public things tell us what a class can do, what it offers as an interface towards the outside world. This comes quite beneficial if you are discovering things, because you can abstract then the other stuff and concentrate your attention only on important things.

Tip: You can collapse to definitions with Ctrl + M + O (Visual Studio) or Ctrl + K + 0 (Visual Studio Code) shortcut to make this phase even faster.

 

  • Remove unused things

If there are unused parts of the code that your IDE can handle, use that. Less text (code) is always better. If there are perhaps parts of the code that your IDE does not recognize, but you are confident that those parts are not used, mark them as obsolete or remove them.

Step 2 – Reading the problem from the code

While you are going through the code, check the code against the actual requirements or the real-world domain.

The actual requirements can be found in your specification, in a Jira ticket or can be shared with you by your team member. If none of this is available (this is not a rare case) ask you project manager or a business owner for actual clarifications.

This is important because you must understand the problem (and the domain) before you do any of the refactoring.

When you understand the problem, then you can adjust your code so that it provides the correct correspondence to that problem.

 

Step 3 – Action

This is the step where I finally do the refactoring. This includes a series of actions like moving and extracting code blocks (methods and variables), creation of the new interface and class files, but also the destruction (deletion) of the unnecessary ones.

There is a formalization of some 50+ refactoring maneuvers and I encourage you to check out the catalog here https://refactoring.com/catalog. However, don’t bother too much with it. You will notice that actions and maneuvers come naturally once you prepare the ground for that, as I have laid that out previously.

Important thing for you to remember at this step is not to spread too much. Be focused and perform your modifications (refactoring) in small and digestible steps. This is truly the best advice I can give you.

Keep in mind that refactoring should not break any software functionality. After your refactoring, make sure the modified application parts are working. If you have your code covered by unit tests, even better. If by some chance, you work with integration test suite, marvelous. If not, then do everything manually.

Commit your changes when you are sure you have made the right improvement. Then do the steps all over again.

Example solved

In the following I will present the classes involved in the solution of the task (NSZFR-103). Unfortunately, no code samples are allowed to be shared (NDA).

At the start of the refactoring we had the following model:

example solved

The BaseReport is an abstract class and every report in the solution implements that abstract class. Also, report holds a reference to an interface called IReportMetadata, which stores some meta information about the report, for example the unique identifier of the report, its name, the report group to which the report belongs, etc.

From my understanding of the problem I have concluded that the Details report class holds logic of the other report classes and does their job when deciding which rows will be part of the result set. Therefore, my main idea around this refactoring was to cut out the code pieces from the large method and return that logic to the specific report classes.

 r methodFor that I have added the virtual method to the BaseReport class called GetDetailsFilter. The purpose of this method would be to return a filer function, which will do the job of deciding which rows will be the part of the result set. The signature of that method is:

virtual GetDetailsFilter(): Func<TableData, bool>

For your reference the TableData object represents a single data row in a report, and every report essentially holds a collection of such objects.

So the result of the first refactoring was this

result

The next I did was to cut the specific parts of the large method and paste that code into the specific overrides of the GetDetailsFilter.

With this in place the length of the long method became zero and I entirely removed that method. After each extraction I had manually tested that report generation of that specific report still works, since that part was not covered by unit tests.

The next I did was to fix the bugs related to filtering on the Details report (check Bonus problems). I soon realized that the reason why the filtering bug was present was the inability of the model to express filtering on the Details report.

There was no property that could hold the filtering information for the Details report. For that purpose, I added a property called DetailFilters to the IReportMetadata interface. There I also added another property DefaultFilters, to be able to define filters that should always be executed for a given report.

Performing the second step in the process (Reading the problem from the code) was a mess. For what would seem to be a simple modeling problem, there was an explosion of related classes which appeared to be having the same or similar kind of responsibility.

The following classes

  • Report.Definition.Filtergrid
  • Models.Report.Filters.FilterInput
  • Models.Report.Filters.FilterItem
  • Report.Definition.FilterValue and Core.ReportEngine.FilterValue
  • ReportEngine.IFilter
  • Models.Filters.ComboBox.SingleSelectComboBoxFilter
  • Models.Filters.ComboBox.MultiselectComboBoxFilter

were all related to filtering.

Of course, I needed to perform some cuts there, because there is no reason to keep so many classes. I was sure they are essentially not needed.

Anyway, what do we need for filtering in general? Ability to define the domain, which is a set of possible values. I called this the Filter definition. Next, we must have the ability to express what is a selected value (or a set of values in case of multi-selection) from the set of possible values in a filter definition. I called this simply a Filter. Optionally, we can provide the information on operation that should be applied when filtering. The operation can be any of contains, equals, larger than, etc.

refactor 1

 

At the start of this refactoring the model had some obvious duplication and the naming was not to the best of my taste.

I renamed the FilterInput to be simply Filter, as I have concluded that it does the same work as my Filter class from the above description.

refactor 2

 

Afterwards I removed what appears to be property duplication. Both IReportMetadata and the BaseReport contain properties that have the same signature, like FiltersDefinition and DetailFilters.

Adding property DetailFilters in the abstract class, next to the existing one in the IReportMetadata interface, was my own bad. 😇

refactor 3

 

I also renamed FiltersDefinition to be FilterDefinitions as it is more natural. As the result of that I got the following model:

 refactor 7

I also found out that the IFilter is not the best naming. It can easily get associated with Filter (which is wrong), yet the interface has the same responsibility as what I described previously as Filter definition. Therefore, I renamed the IFilter to be IFilterDefinition.

Finally, the model looked like this:

 refactor 8

This looks better than it was before. We used only 3 abstractions to represent the filtering (IFilterDefinition, Filter and FilterItem).

After this step, both bugs were gone, as I was filling the DetailFilters with the right filters, that were then being consumed by the Details report accordingly, both when generation and/or downloading the report. Great! 🎉

 

Conclusion

I have showed a real-world example of a task in a project that required refactoring. The method with 420+ lines was basically eliminated from the Details report and distributed across several other reports. The division of responsibility means less complexity, which means faster problem understanding and easier maintenance. All this is good for the software health (quality).

I have also fixed two bugs as a side effect. A lot of similar classes tend to cause confusion when looking into the code. By removing unnecessary classes, I have lowered the code complexity, which is also a good thing. Actually, a very good thing.

The refactoring should be done regularly and there is always room for new polishing. I also refactored myself. 😊

I strongly believe that constant refactoring is the best only way to build complex enterprise software of high-quality.

Author: Aleksandar Bojinovic