Unified logging process: How to write a logfile?

3 minute read

Some history

Over the past 20 years, the primary consumer of log data has shifted from humans to machines. Who reads logs with tail -F nowadays? No one! The reason is quite simple: with the multiplication of available software, we simply do not have time to read and process logs with our own eyes.

This is the truth, and everyone working in an IT services company will tell you the exact same thing. At Worldline, engineers still need to read logs, especially when their software behaves in an unexpected manner. With the multiplication of layers, each piece of software uses its own logging process, and therefore a human cannot easily tell how a log is formatted.

“Is this integer ‘203’ a PID, a response code, or a response time?” “Is this response time in milliseconds or seconds?” “Do you have the complete HTTP referrer in the logs?”

If you’re an engineer and have already asked yourself one of these questions, this article may help you.

A legacy issue

As a matter of fact, servers are crunching log data day and night to generate every kind of report and run statistical calculations that help us make decisions.

However, many organizations - Worldline was one of them - have had difficulty turning their growing log data into actionable insights. Why can’t they get value from their log data when all the pieces are available and affordable?

The answer is right there: legacy logging infrastructure wasn’t designed to be “machine-first”, so much effort is wasted trying to make various backend systems understand the log data.

Most existing log formats have weak structure. This is because we, engineers, are excellent at parsing text, and since we were the primary consumers of the log data, there was little motivation for log producers to give log formats a better structure.

The statistics computation problem

At Worldline, we currently use several tools to help us write logs: syslog-ng, syslog, rsyslog, log4j, log4j2, etc. And all those tools obviously use their own log format, especially for the timestamp. Try to guess: which one of these timestamps is currently in production?

01/06/2017 – 14:00:00
20170601-140000
Thu Jun  1 14:00:00 CEST 2017

The answer is “all of them”! Because of that, computing statistics can be very difficult. The process has to use a common log format to produce daily statistics. Therefore, it has to transform the legacy logs into something better.
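To make the problem concrete, here is a minimal sketch of what such a transformation step has to do: try each known legacy timestamp pattern and normalize to ISO 8601. The pattern list and the timezone handling are simplified assumptions for illustration, not the production parser.

```python
from datetime import datetime

# Candidate strptime patterns for the legacy timestamps shown above
# (a sketch - real logs need more patterns and proper timezone handling)
PATTERNS = [
    "%d/%m/%Y - %H:%M:%S",   # 01/06/2017 - 14:00:00
    "%Y%m%d-%H%M%S",         # 20170601-140000
    "%a %b %d %H:%M:%S %Y",  # Thu Jun  1 14:00:00 2017 (tz name stripped)
]

def normalize(ts: str) -> str:
    """Best-effort conversion of a legacy timestamp to ISO 8601."""
    cleaned = ts.replace("\u2013", "-")  # some formats use an en dash
    # strptime's %Z cannot reliably parse names like CEST, so drop
    # all-uppercase alphabetic tokens before matching
    cleaned = " ".join(
        t for t in cleaned.split() if not (t.isalpha() and t.isupper())
    )
    for pattern in PATTERNS:
        try:
            return datetime.strptime(cleaned, pattern).isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {ts!r}")

print(normalize("20170601-140000"))                # 2017-06-01T14:00:00
print(normalize("Thu Jun  1 14:00:00 CEST 2017"))  # 2017-06-01T14:00:00
```

All three production formats above end up as the same ISO 8601 string, which is exactly what a statistics job needs.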

The current solution: designed for the Elastic stack

We use Logstash as a common entry point. However, each log has to be in a specific format: any log which does not follow the rules is excluded. This is what we call the LogFormat.

This unification has several goals:

  • Make logs easier to read, for a RUN team, for instance
  • Automatic statistics
  • KISS (Keep It Simple, Stupid)
  • Create a base that can evolve in the future
  • Default Kibana dashboard with simple metrics

Some rules have been established (not all of them are written here):

  • The separator is a single space
  • Add only necessary information
  • If a piece of information is missing, replace it with “-”
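The rules above can be sketched in a few lines: join the fields with single spaces and substitute “-” for anything missing. The field list here is illustrative; the actual field names and order come from the LogFormat definition.

```python
def format_fields(fields):
    """Join log fields per the LogFormat rules: single-space separator,
    and '-' as the placeholder for any missing value."""
    return " ".join("-" if f in (None, "") else str(f) for f in fields)

# Hypothetical field values, in the spirit of the examples below
line = format_fields(
    ["2015-01-13T14:13:55+01:00", "server1", None, None, "MYAPP", "log", "INFO"]
)
print(line)  # 2015-01-13T14:13:55+01:00 server1 - - MYAPP log INFO
```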

The LogFormat V1 has been deployed with two sub-formats, Log and Access, sharing a common generic part:

[Figure: Generic Part]

The sub-format Log part looks like this:

[Figure: Sub LogPart]

Whereas the sub-format Access part is:

[Figure: SubAccessPart]

And if you need some metrics, the reserved field names are pid, criticity, custom_message, src_ip, src_host, user_id, request, return_code, and so on.

Here are examples of well-formatted logs:

1 2015-01-13T14:13:55+01:00 server1 - - MYAPP log INFO RX="ufapi-d5ac59cc-0d6d-47d4-a3ef-bb888e642293" | custom_json='{ "requestId":"bb888e642293", "action":"GET", "resource":"/", "resourceId":"null", "userId":"test-user1", "workspace":"mywork"}' | mylog='3 ids OK'

1 2014-12-15T15:33:23+01:00 serverX - - LogManagementPortal access info 10.10.10.10 - "POST /elasticsearch/.kibana/__kibanaQueryValidator/_validate/query?explain=true&ignore_unavailable=true HTTP/1.1" 200 - 0.007 174 | upstream_response_time=0.007 | http_referer=http://server3:8080/ | http_user_agent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari

1 2014-12-10T16:05:50+0100 serv32 - - portal access INFO - myuser "GET" 100 a5ad23797c91f465aaca162b5f611f843ae87b6b61ec508292848b413c71b13b 0 0 | API= "REST" | findAll="TypeRPM" | Find all TypeRPM
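Inferring the structure from these examples, the generic part appears to be a fixed sequence of space-separated fields followed by the sub-format payload. The sketch below parses it with a regular expression; the field names are guesses based on the examples and the reserved names above (the two “-” fields may be pid and criticity), so treat them as assumptions rather than the authoritative LogFormat specification.

```python
import re

# Generic part as inferred from the example lines:
# <version> <timestamp> <host> <field4> <field5> <app> <subformat> <level> <rest>
# field4/field5 are shown as "-" in the examples (possibly pid / criticity)
GENERIC = re.compile(
    r"^(?P<version>\S+) (?P<timestamp>\S+) (?P<host>\S+) "
    r"(?P<field4>\S+) (?P<field5>\S+) "
    r"(?P<app>\S+) (?P<subformat>\S+) (?P<level>\S+) (?P<rest>.*)$"
)

line = "1 2015-01-13T14:13:55+01:00 server1 - - MYAPP log INFO mylog='3 ids OK'"
m = GENERIC.match(line)
print(m.group("app"), m.group("level"))  # MYAPP INFO
```

A single regular expression like this is what makes the “one shared mapping” approach possible: every application’s log line yields the same named fields.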

Who can use it?

Everyone! The aim of this LogFormat is to create a common format that can be shared by all applications in any IT services company. If you’re used to Kibana and Elasticsearch mappings, you know how hard it is to create multiple mappings that match all logs. Here, we currently have only one mapping shared by all applications. If developers need a Kibana dashboard, they just have to respect this LogFormat and ship their logs to Logstash instances with Filebeat, and boom! The dashboard is ready.

What are the next steps?

If you need more information about this LogFormat, feel free to contact us. We can provide you with the Logstash template for this LogFormat.


Written by

Christophe Journel

Nintendo addict