How to make sense of error logs?

Errors are scary for most people; especially when starting off.

I’m not talking about the scenario where you expect a certain result and get something else; that might be related more to the code you personally wrote.

I’m specifically talking of a scenario when you either got started up at a new place with a completely new project; maybe the framework is new for you; maybe they have a custom-made framework for their org.

One day while working on a task; you somehow introduced a bug that throws some sort of exception which you have no idea why it came nor can you deduce how your code could have broken anything. You don’t even know the codebase well enough to be able to figure out remotely what could’ve caused it.

During a time like this, you can either get overwhelmed by that 200 lines of errors; or calmly befriend them. As it is the only thing that can help you understand the issue the quickest; not solve, but understand.

Error messages might not specifically help you fix the issue or come up with a solution, but they’re a great tool to understand what went wrong.

In layman terms, the gibberish that gets printed when your code fails is known as a stack trace.

Understanding Stack-Traces

For example, you have a piece of code that starts executing from a file called A, calls a function in file B, and B, in turn, calls a function in C; something goes wrong in C and it throws an exception Ec; which if not handled, will result in the program to exit with non-zero code. For such a scenario, you can always assume that the info about C will come first followed by B and A.

There’s a very special reason for this as well; since thinking outside-in in a scenario like this wouldn’t really help much for a developer to understand the context as well. As when you start from the outside you have a lot of context to understand and process till you reach that line that actually throws an error. And also have to think about code that might not even be related to the error you’re trying to debug.

So the better way is to look inside-out; you start with looking at the most minimal context and try to figure out the reason why it failed; failed to understand? Ok, increase a level of context, now you can look at B which is called C, and figure out what might’ve gone wrong there, and just move contexts up and gradually build up on it until you figure out what actually went wrong.

What does it look like?

Generally speaking, a stack trace will start with the name of an exception(or error); it will provide some background information about it and then there will be N number of lines starting from where the exception originated in leading up to where the execution began from.

Most probably, each line of the stack trace will also contain some information about which function/method the exception originated from, and additionally also provide information about the file and even the line number where it happened.

For example:

caused by SomeException
    at someClass.someFunction(SomeFileName.someExt:lineNumber)
    at someClass.someFunction(SomeFileName.someExt:lineNumber)

Needle in a haystack - Recognising lines of interest

I feel like stating the obvious, but the quickest way to understand the reason why some error came lies in trying to understand the flow of control leading up to the error; i.e. how did the flow of control reach that point of failure; and what was the path in terms of lines of code it travels.

However, it may not seem obvious when we think about how modern software development works; we mostly use a framework or a bunch of library functions to get started. And while we try to manifest the app we want to create, we often end up using a lot of functions that we ourselves have not written; aka library code.

So, realistically in a stack trace, you can expect only at most 5 lines in a stack trace to be significant enough for you to pay attention to; but those 5 lines will be hidden within a plethora of methods from the framework or library which is being used.

So, the lines you need to be paying attention to are basically lines that are from your project; as those are the only lines where you might have added something which would break the app.

This is where IDEs can be of good help; most good IDE has this amazing feature where while showing errors they highlight all the files that belong to your project and are written by you as well as gives a link over it, upon clicking which the exact file and line is opened on that IDE.

Image of an error log in Intellij Community.

In the above image, we can see a stack trace of what seems like a validatePassword gone wrong. You can very easily just glance and locate the orange lines, and you will notice that all of those lines start with com.company which is the package name for the current project.

This is how IDEs may help in this first step of zoning into specific lines from a bigger stack trace.

Avoid this silly mistake

A lot of people; to be ‘safe’ tend to wrap their code in try-catch blocks while catching the java.lang.Exception class. This is scary; as now all the exceptions the programmer hasn’t even thought about getting ‘caught’ and since the bubbling of that event is stopped, it gets very hard to catch or debug that issue. As there will be no stacktrace for it at all.

I’m talking about a code somewhat like this

public void doSomething() {
    try {
        someService.createSomeThing();
    catch (Exception ex) {
        logger.error("Something went wrong", ex);
    }
}

If the code is like this; and somehow the database goes down, instead of throwing a scary 500 which will prompt everyone to quickly look at it, it will silently handle that error and just provide a log statement as a response. Even while trying to investigate in such a scenario when someone will look into the logs they will only see the classname of the error that happened, and that’s it.

One thing that can be done(not should) is on top of just printing the ex we can also log ex.getStackTrace() to be sure that in such scenarios atleast the whole stacktrace is logged for us to then read and understand the issue.

However, what I think should be done is we should only handle exceptions that we expect; anything else that we didn’t probably means it is something that should fail our app.

As an example; let’s say you have a createUserName method which creates a userName for a specific user. If you know it can only throw two exceptions, UserNameAlreadyExists or UserNameFormatInvalid; then you should only handle those two exceptions and handle them accordingly and not catch java.lang.Exception as that means you will literally catch all exceptions that might happen downstream.

Find out exactly what to google

Let’s say now you’ve gotten comfortable at looking at stacktrace, and you know where to look at and also how to locate lines relevant for you, the next simple step is googling. Learning how to read stacktrace will give you exactly what you need to google to understand the issue.

For example; let’s say you have a big stacktrace, and you locate 3lines of it where your own code is mentioned, and at the topmost entry an exception is thrown that you have never heard of, this is the time to just google the specific name, learn about it and think how your code could have thrown it; as massive hints you now know exactly which lines to look at in exactly which files; which will make this process a bit faster.

Closing thoughts

This was a very basic guide; however every language and framework has a few things slightly different and you might need to fine-tune those while working to be at an advantage; for example with Laravel; whenever a 500 is thrown in a debug-enabled environment, the error page includes all the environment variables that were set and all the request params that came in for the request. So if you know about that, debugging an issue that is related to a wrong or cached environment variable will be a breeze.

Shuvojit's Blog