Just the other day I was debugging a particularly troublesome problem on a project. We were seeing a CPU exception with no discernible cause.  It got me thinking about debugging and how sometimes it feels more like art than science.  I’ve seen more than one developer get “stuck in a rut” trying to debug a problem because of the environment or tools and an inability to push past it.  Let’s face it, debugging is a frustrating but necessary evil in our work.  Some people loathe it, some people love it.  In case you’re wondering, I fall into the latter camp – I absolutely love testing and debugging the code I’ve written.  There’s something very satisfying about testing and verifying all the hard work you’ve put in designing and developing something.  That’s not to say it isn’t aggravating.  Like most things in life, how you apply your experience really helps get you through tough debugging sessions.  How dependent on source-level debuggers are you?  Can you be effective without one?  What if you only have printf-like functionality and it tells you nothing?  What if you don’t even have print capability?  Many software engineers throw up their hands at these things, scoff and say, “in this day and age that never happens!”  To this I laugh and wonder what exactly they have been working on.

In the recent past I’ve encountered more than one project with no available source-level debugging, limited print/log capability and strange debug output at the time of the problem.  All difficult but not impossible problems.  This is where you cast what I like to think of as “debug nets”.  Think fishing.  Even if you’re not a fisherman or outdoorsy type, you should still get the metaphor.  You cast a net trying to trap the problem as part of your debug efforts:

Very nice, tightly laced nets capable of catching lots of bugs of varying types

    • With a source-level debugger you set breakpoints
    • With a source-level debugger and memory dumps you can analyze what was happening at the time of the crash
    • Use a dynamic analysis tool, such as Valgrind

Decent, moderately laced nets capable of catching many bugs of varying types

    • No source-level debugger, but print/log output telling you when things are happening

Poor, widely laced nets used to catch a wide variety of nondescript bugs

    • CPU exception dump providing raw hex values of CPU registers

Any one of these things is better than nothing.  Yet none of them is a silver bullet either.  Source-level debuggers are great, but what if by the time the debugger stops, memory is so trashed it doesn’t help?  Printing debug output is still a tried and true method of debugging, but what if your debug output changes the timing and skews the problem, or worse yet prevents it?  There really is no one good answer for how to overcome these issues.  But there are some common concepts which can be applied to aid in these problems.  Think of these as adding hooks to your nets or weaving a tighter net to catch the problem:

  • Wrap your heap calls with diagnostic versions: track alloc/free, make sure they match and balance, add bounds checking of buffers, and trap double-free problems
  • Add fixed data patterns to memory/objects
  • Use terse logging, with very minimal variable arguments, reducing the time it takes to log
  • If your language supports exceptions, use them!
  • Try to exacerbate the problem! The more easily and frequently you can make the problem happen, the better.  This gives you more data points to work with.
  • Change something to cause the problem to happen or change.  Re-running the same thing with no new data is just spinning your wheels – you are not getting anywhere!
  • Get comfortable with how your compiler and linker work for the CPU and be prepared to work backwards from raw addresses. Get creative; this can be very hard to do with dynamically loadable modules/libraries!
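
The first two ideas in the list (diagnostic heap wrappers and fixed data patterns) can be sketched in C roughly like this.  Treat it as a minimal sketch under assumptions, not a drop-in allocator: the names dbg_malloc, dbg_free and dbg_outstanding, the 0xA5 guard byte and the fixed-size tracking table are all made up for illustration, and nothing here is thread safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All names and constants below are hypothetical, chosen for illustration. */
#define DBG_MAX_LIVE   256   /* most allocations tracked at once */
#define DBG_GUARD_LEN  8     /* guard bytes appended after each buffer */
#define DBG_GUARD_BYTE 0xA5  /* fixed pattern; any unlikely value works */

typedef enum { DBG_OK = 0, DBG_BAD_FREE, DBG_OVERRUN } dbg_status;

typedef struct {
    void  *ptr;   /* NULL marks an unused slot */
    size_t size;  /* bytes the caller asked for */
} dbg_slot;

static dbg_slot dbg_live[DBG_MAX_LIVE];
static unsigned long dbg_allocs;
static unsigned long dbg_frees;

/* Allocate size bytes plus a trailing guard region filled with a fixed pattern. */
void *dbg_malloc(size_t size)
{
    if (size > SIZE_MAX - DBG_GUARD_LEN)
        return NULL;

    unsigned char *p = malloc(size + DBG_GUARD_LEN);
    if (p == NULL)
        return NULL;
    memset(p + size, DBG_GUARD_BYTE, DBG_GUARD_LEN);

    for (size_t i = 0; i < DBG_MAX_LIVE; i++) {
        if (dbg_live[i].ptr == NULL) {
            dbg_live[i].ptr = p;
            dbg_live[i].size = size;
            dbg_allocs++;
            return p;
        }
    }

    /* Table full: fail the allocation rather than lose track of it. */
    fputs("dbg: live table full\n", stderr);
    free(p);
    return NULL;
}

/* Free a tracked buffer, reporting overruns and frees of unknown pointers. */
dbg_status dbg_free(void *ptr)
{
    if (ptr == NULL)
        return DBG_OK;

    for (size_t i = 0; i < DBG_MAX_LIVE; i++) {
        if (dbg_live[i].ptr != ptr)
            continue;

        /* Bounds check: the guard pattern must still be intact. */
        const unsigned char *guard = (const unsigned char *)ptr + dbg_live[i].size;
        dbg_status status = DBG_OK;
        for (size_t j = 0; j < DBG_GUARD_LEN; j++) {
            if (guard[j] != DBG_GUARD_BYTE)
                status = DBG_OVERRUN;
        }
        if (status == DBG_OVERRUN)
            fprintf(stderr, "dbg: write past end of %zu-byte buffer\n",
                    dbg_live[i].size);

        dbg_live[i].ptr = NULL;
        dbg_frees++;
        free(ptr);
        return status;
    }

    /* Not in the table: never allocated here, or already freed. */
    fputs("dbg: free of untracked pointer (double free?)\n", stderr);
    return DBG_BAD_FREE;
}

/* Allocations minus frees; non-zero at shutdown suggests a leak. */
long dbg_outstanding(void)
{
    return (long)(dbg_allocs - dbg_frees);
}
```

The tracking table is deliberately simple and slow (a linear scan), which is usually acceptable for a debug build.  One caveat: if the allocator hands the same address out again between the first and second free, the second free looks legitimate, so this traps many double frees but not all of them.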

What are some of your favorite ways to debug very difficult problems?  What has been your most difficult problem to find and fix?  Good luck and happy debugging!