Debugging 101

Posted: October 23, 2016
Categories: teaching, software, debian

While teaching a class on concurrent programming this semester, I realized during the labs that most of the students couldn't properly debug their code. They are at the end of a two-year curriculum and know many different programming languages and frameworks, but when it comes to tracking down a bug in their own code, they often lack the basics. Instead of debugging for them, I tried to give them general directions that they could apply to their next bugs. I will try here to summarize the very first basic things to know about debugging. Because, remember, writing software is 90% debugging and 10% introducing new bugs (that is not from me, but I could not find the original quote).

So here is my take on Debugging 101.

Use the right tools

Many good tools exist to assist you in writing correct software, and not using them will cost you productivity. Editors that catch syntax errors as you type, for example, will help you a lot. And there are many features out there in editors, compilers and debuggers that will prevent you from introducing trivial bugs. Your editor should be your friend; explore its features and customization options, and find an efficient workflow that you like and can improve over time. The best way to fix bugs is not to have them in the first place, obviously.

Test early, test often

I've seen students write code for an hour before running make, which would then fail so hard that it printed hundreds of lines of errors and warnings. There are two main reasons why doing this is a bad idea:

You have to debug all the errors at once, and the complexity of solving many bugs, some dependent on others, is way higher than the complexity of solving a single bug. Moreover, it's discouraging.
Wrong assumptions you made at the beginning will make the following lines of code wrong. For example, if you chose the wrong data structure for storing some information, you will have to fix all the code using that structure. It's less painful to realize early that it was the wrong choice, and you are more likely to notice if you compile and execute often.

I recommend testing your code (compilation and execution) every few lines you write. When something breaks, chances are it comes from the last line(s) you wrote. Compiler errors will be shorter, and will point you to the same place in the code. Once you get more confident with a particular language or framework, you can write more lines at once without testing. That's a slow process, but that's OK. If you set up the right keybinding for compiling and executing from within your editor, testing early and often shouldn't be painful.
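As an illustration of testing every few lines, here is a minimal sketch in C++. The function clamp_percentage and its checks are hypothetical, made up for this example; the point is only that a handful of assertions written right after a new function can be compiled and run immediately.

```cpp
#include <cassert>

// Hypothetical function, written a few minutes ago: keep a value in [0, 100].
int clamp_percentage(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

// Quick checks to run right after writing the function.
// Note: assert is compiled out when NDEBUG is defined, so run these in a debug build.
void check_clamp_percentage() {
    assert(clamp_percentage(-5) == 0);
    assert(clamp_percentage(42) == 42);
    assert(clamp_percentage(150) == 100);
}
```

Calling check_clamp_percentage() at the top of main while you develop is enough: if the last lines you wrote broke something, the program stops at the failing assertion and tells you which one.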

Read the logs

Spot the places where your program/compiler/debugger writes text, and read it carefully. It can be your terminal (quite often), a file in your current directory, a file in /var/log/, a web page on a local server, anything. Learn where the different pieces of software write their logs on your system, and integrate reading them into your workflow. Often, it will be your only information about the bug. Often, it will tell you where the bug lies. Sometimes, it will even give you hints on how to fix it.

You may have to filter out a lot of garbage to find relevant information about your bug. Learn to spot keywords like error or warning. In long stack traces, spot the lines concerning your own files, because more often than not your code is to blame rather than deeper library code. grep the logs with relevant keywords. If you have the option, colorize the output. Use tail -f to follow a file as it gets updated. There are many ways to dig through logs, so find what works best for you and never forget to use it!

Print foobar

That one doesn't concern compilation errors (unless it's a Makefile error, in that case this file is your code anyway).

When the program logs and output fail to tell you where an error occurred (oh hi Segmentation fault!), and before having to dive into a memory debugger or system trace tool, spot the portion of your program that causes the bug and add some print statements there. You can either print("foo") and print("bar"), just to know whether your program reaches a certain place in your code, or print(some_faulty_var) to get more insight into your program state. It will give you precious information.

std::cerr << "foo" << std::endl;
my_db.connect(); // is this broken?
std::cerr << "bar" << std::endl;

In the example above, you can be sure it is the connection to the database my_db that is broken if you get foo and not bar on your standard error.

(That is a hypothetical example. If you know something can break, such as a database connection, then you should always enclose it in a try/catch structure.)
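To make the try/catch remark concrete, here is a minimal sketch. The Database class is entirely hypothetical and only exists so that the example compiles; a real database library will have its own connection API and its own exception types.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical database handle, only here to make the sketch compile.
class Database {
public:
    explicit Database(std::string host) : host_(std::move(host)) {}

    // Pretend that connecting fails when no host is given.
    void connect() {
        if (host_.empty()) {
            throw std::runtime_error("no host to connect to");
        }
        connected_ = true;
    }

    bool is_connected() const { return connected_; }

private:
    std::string host_;
    bool connected_ = false;
};

// Try to connect and report the failure on standard error instead of crashing.
bool try_connect(Database& db) {
    try {
        db.connect();
        return true;
    } catch (const std::exception& e) {
        std::cerr << "database connection failed: " << e.what() << std::endl;
        return false;
    }
}
```

With this in place, the failure shows up as an explicit message on standard error, and the caller can decide what to do when try_connect returns false.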

Isolate and reproduce the bug

This point is linked to the previous one. You may or may not have isolated the line(s) causing the bug, but maybe the issue does not show up on every run. It can depend on many other things: the program or function parameters, the network status, the amount of memory available, the decisions of the OS scheduler, the user rights on the system or on some files, etc. More generally, any assumption you made about an external dependency can turn out to be wrong (even if it's right 99% of the time). Depending on the context, try to isolate the set of conditions that trigger the bug. It can be as simple as "when there is no internet connection", or as complicated as "when the CPU load of some external machine is too high, it's a leap year, and the input contains invalid UTF-8 characters" (OK, that one is a lot, but it surely happens!). But you need to be able to reproduce the bug reliably, in order to be sure later that you have indeed fixed it.

Of course, when the bug is triggered at every run, it can be frustrating that your program never works, but it will in general be easier to fix.
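One way to isolate the triggering conditions is to run the suspect code on a list of candidate inputs and note which ones fail. The sketch below assumes a hypothetical function first_letter that breaks on empty input; both it and the failing_inputs helper are made up for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical function under investigation.
char first_letter(const std::string& word) {
    return word.at(0);  // std::string::at throws std::out_of_range on an empty string
}

// Run the suspect function on each candidate and collect the indices that fail.
std::vector<std::size_t> failing_inputs(const std::vector<std::string>& candidates) {
    std::vector<std::size_t> failures;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        try {
            first_letter(candidates[i]);
        } catch (const std::out_of_range&) {
            failures.push_back(i);
        }
    }
    return failures;
}
```

Once you know which inputs fail, keep the smallest one as your reproduction case: rerun it after each attempted fix to check that the bug is really gone.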

RTFM

Always read the documentation before reaching out for help. Be it man, a book, a website or a wiki, you will find precious information there to assist you in using a language or a specific library. It can be quite intimidating at first, but it's often organized the same way. You're likely to find a search tool, an API reference, a tutorial, and many examples. Compare your code against them. Check the FAQ: maybe your bug and its solution are already listed there.

You'll quickly get used to the way documentation is organized, and you'll become more and more efficient at finding what you need. Always keep the doc window open!

Google and Stack Overflow are your friends

Let's be honest: many of the bugs you'll encounter have been encountered before. Learn to write efficient queries on search engines, and use the knowledge you can find on question-and-answer forums like Stack Overflow. Read the answers and comments. Be wise though, and never blindly copy and paste code from there. At worst you will introduce security issues into your code, and in any case you won't learn anything. Oh, and don't copy and paste anyway. You have to be sure you understand every single line, so better write them by hand; it's also better for memorizing the issue.

Take notes

Once you have identified and solved a particular bug, I advise you to write about it. No need for shiny interfaces: keep a list of your bugs along with their solutions in one or many text files, organized by language or framework, that you can easily grep.

It can seem slightly cumbersome to do so, but it proved (at least to me) to be very valuable. I can often recall I have encountered some buggy situation in the past, but don't always remember the solution. Instead of losing all the debugging time again, I search in my bug/solution list first, and when it's a hit I'm more than happy I kept it.

Reach out for help (from your duck)

Rubber duck debugging encourages you to explain your problem out loud, for example to a rubber duck sitting on your desk. Using plain words to describe a problem forces you to think about it clearly, and you'll be amazed how fast it can lead you to the root cause of your bug: explaining everything from the beginning makes you notice a problem you would not have thought about otherwise.

Further debugging

Remember this was only Debugging 101, that is, the very first steps on how to debug code on your own, instead of getting frustrated and helplessly staring at your screen without knowing where to begin. As you write more software, you'll get used to more efficient workflows, and you'll discover tools that help you write bug-free code and spot complex bugs efficiently. Listed below are some of the tools or general ideas used to debug more complex software. They belong more to a software engineering course than to a Debugging 101 blog post. But it's good to know as soon as possible that these exist, and if you read the manuals there's no reason you can't rock with them!

Loggers. To make the "foobar" debugging more efficient, some libraries are especially designed for the task of logging information about a running program. They often have way more features than a simple print statement (at the price of being over-engineered for simple programs): severity levels (info, warning, error, fatal, etc.), output in rotating files, and many more.
Version control. Following the evolution of a program over time, across multiple versions, contributors and forks, is a hard task. That's where version control comes into play: it allows you to keep the entire history of your program, and switch to any previous version. This way you can identify more easily when a bug was introduced (and by whom), along with the patch (a set of changes to a code base) that introduced it. Then you know where to apply your fix. Famous version control tools include Git, Subversion, and Mercurial.
Debuggers. Last but not least, it wouldn't make sense to talk about debugging without mentioning debuggers. They are tools to inspect the state of a program (for example the type and value of variables) while it is running. You can pause the program, and execute it line by line, while watching the state evolve. Sometimes you can also manually change the value of variables to see what happens. Even though some of them are hard to use, they are very valuable tools, totally worth diving into!

Don't hesitate to comment on this, and provide your debugging 101 tips! I'll be happy to update the article with valuable feedback.

Happy debugging!