The "Why" Loop - the only technique you need to debug

2025-03-09

A fundamental skill that helps you become a more efficient developer is the ability to debug effectively.

Even if you type at 150 WPM, know all the vim shortcuts and use the coolest new tools, your productivity as a developer will hit a brick wall if you can't quickly and effectively find the root cause of the errors that you run into - both while developing and while maintaining software.

This post shares a technique that I use to tackle errors and bugs in code effectively - the "why" loop.

I like to classify errors into two broad categories - Errors on the Developer's System, and Errors on the Deployed Service. Both can be fixed with the same process, but the prerequisites are different.

To efficiently debug errors on your system, maintain a well-organized local setup. Avoid hacky workarounds, ensure proper logging, use an appropriate IDE and tools, and understand the code you wrote.
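As a rough illustration of what "proper logging" can look like locally, here is a minimal Python sketch. The service name `profile_service`, the `save_profile` function and the in-memory `store` are all hypothetical stand-ins, not part of any real codebase; the point is only that each log line says where it came from and what it was acting on.

```python
import logging

# Timestamp, level, logger name and line number make each entry traceable.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s:%(lineno)d %(message)s",
)
logger = logging.getLogger("profile_service")  # hypothetical service name


def save_profile(user_id: int, profile: dict, store: dict) -> bool:
    """Save a profile into `store` (a dict standing in for a real database)."""
    logger.debug("saving profile for user_id=%s", user_id)
    try:
        if "email" not in profile:
            raise ValueError("profile is missing 'email'")
        store[user_id] = profile
    except ValueError:
        # logger.exception records the message plus the full stack trace.
        logger.exception("could not save profile for user_id=%s", user_id)
        return False
    logger.info("saved profile for user_id=%s", user_id)
    return True


if __name__ == "__main__":
    db = {}
    save_profile(1, {"email": "a@example.com"}, db)
    save_profile(2, {}, db)  # logs the failure with a stack trace
```

With entries like these, the first "why" usually has something concrete to start from instead of a bare failure.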

On the other hand, debugging errors on deployed services requires your solution to have proper monitoring practices and tools in place - using logs, traces, and metrics helps you track down problems quickly.

Even with all these enablers, a lot of developers struggle to figure out why something is not working.

The "Why" Loop

When I run into a roadblock, I use a simple technique to figure out the root cause of my errors and bugs - I ask myself "Why". This may seem like "common sense", but you'd be surprised how many issues can be easily resolved by asking yourself this simple question repeatedly. You might also feel like you already do this, but try doing it more consciously and you'll notice the difference.

As the name suggests, you need to keep repeating the question till you are able to solve your issue. Start at the first bit of feedback you get - an error message, stack trace, link etc. - and ask yourself why it might be giving you that feedback. This will lead you to one of three possibilities:

- You find the root cause - stop asking why, and fix it
- You find a linked cause at a different place - go there, analyze, and ask yourself why again
- You don't know why - ask someone who might know why
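The loop and its three exits can be sketched in a few lines of Python. This is only an illustration under assumptions: `Outcome`, `Finding`, `why_loop` and the canned `KNOWN_ANSWERS` are names invented here, and the canned answers mirror the database example later in this post. In real debugging, `investigate` is you reading logs and checking dashboards, not a dictionary lookup.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class Outcome(Enum):
    ROOT_CAUSE = auto()    # stop asking why and fix it
    LINKED_CAUSE = auto()  # go to the linked place and ask why again
    UNKNOWN = auto()       # ask someone who might know why


@dataclass
class Finding:
    outcome: Outcome
    detail: str
    next_symptom: Optional[str] = None  # only meaningful for LINKED_CAUSE


def why_loop(symptom: str,
             investigate: Callable[[str], Finding],
             max_steps: int = 10) -> list:
    """Ask "why" repeatedly and return the trail of steps taken."""
    trail = []
    for _ in range(max_steps):
        finding = investigate(symptom)
        trail.append(f"Why: {symptom} -> {finding.detail}")
        if finding.outcome is Outcome.ROOT_CAUSE:
            trail.append("Root cause found: fix it")
            return trail
        if finding.outcome is Outcome.UNKNOWN:
            trail.append("Blocked: ask someone who might know why")
            return trail
        symptom = finding.next_symptom  # linked cause: keep digging
    trail.append("Step limit reached: re-check your assumptions")
    return trail


# Hypothetical, hard-coded answers standing in for real investigation work.
KNOWN_ANSWERS = {
    "database connection failed": Finding(
        Outcome.LINKED_CAUSE, "the database server is down",
        "database server is down"),
    "database server is down": Finding(
        Outcome.LINKED_CAUSE, "it crashed, no manual actions in the audit trail",
        "server crashed"),
    "server crashed": Finding(
        Outcome.LINKED_CAUSE, "memory hit 100%", "memory spiked"),
    "memory spiked": Finding(
        Outcome.ROOT_CAUSE, "a recent code change leaks memory"),
}


def investigate(symptom: str) -> Finding:
    return KNOWN_ANSWERS.get(symptom, Finding(Outcome.UNKNOWN, "no idea yet"))


if __name__ == "__main__":
    for step in why_loop("database connection failed", investigate):
        print(step)
```

The step limit is a design choice of this sketch rather than part of the technique: it is a reminder that a loop which never terminates probably rests on an unverified assumption.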

A few guiding pointers to keep in mind when using the Why Loop:

- Don't make assumptions: If you aren't sure of the answer to the Why, don't move forward on assumptions - you can end up in rabbit holes for hours, or even days.
- It's OK to not know why: Not knowing why something is breaking is OK! But it's important that you find out why from someone who does - don't just ask a senior dev to fix it or send a ticket to the support team. Find out how they debug and learn from it.
- Share your learnings: You can save others valuable time by explaining your RCA (root cause analysis), so that the next time they face the same issue they don't reinvent the wheel to reach the point you reached. Share it with your colleague, team, company or even the world.

A quick real-world example might help illustrate what I mean. You are writing code for a backend service that processes user data, and you encounter an error when trying to save a user's profile. The error message you receive is: Error: Unable to save user profile. Database connection failed.

Let's apply the "Why" Loop to this scenario:

Why did the database connection fail? Maybe the connection string is wrong? Maybe the server is down?
- Check the connection string - nope, looks good! Let's go back.
- Check the server status - looks like it's down! Let's dig deeper.
Why is the database server down? Maybe the server crashed, or was restarted?
- Check the audit logs on the server - everything looks normal, no manual actions. So the server went down abruptly.
Why did the server crash? There might be an issue with the server's resources, such as memory or CPU usage.
- Analyze the monitoring on the server - you notice memory hit 100% over the past hour! Let's dig deeper.
Why did the memory spike? There could be a memory leak in the application or an unexpected spike in traffic.
- Check the traffic on your server - looks normal, no spikes. Then it could be a code issue - let's check it out and verify before assuming it is.
- You check the most recent code changes and notice a pointer that was never freed - the source of your leak.

You have found your root cause! Now take the necessary measures to fix it.

Don't stop till you are blocked

A lot of developers stop at one of the steps in between and pass the burden on - an issue that was completely on your (or your team's) plate gets forwarded as a support ticket because you didn't want to check the database logs and assumed it was a database issue that another team needed to fix.

You shouldn't stop your Why Loop until you reach a point where you are blocked - for example, if the server logs are not accessible to you for security reasons, you need to reach out to someone else for help.

Hopefully, this technique can help you too! Less dependency on others, and faster development for you!

Till the next post, au revoir!