
Filebeat for ft_transcendence: Efficient Log Collection

What is Filebeat?

Filebeat is a lightweight shipper for log data. It's designed to efficiently collect, process, and forward logs from various sources to Elasticsearch. For our ft_transcendence project, Filebeat serves as the foundation of our observability pipeline, gathering logs from multiple services.

Current Filebeat Version: 8.17.4 (matching our Elasticsearch version)

Why Filebeat for ft_transcendence?

Filebeat offers several advantages for our project:

  1. Low Resource Footprint: Important for our constrained local environment
  2. Reliable Log Delivery: Includes features like back-pressure sensing and persistent queues
  3. Log Enrichment: Adds metadata to make logs more searchable and analyzable
  4. Pre-built Modules: Ready-to-use configurations for common systems like Nginx and PostgreSQL

Log Sources in ft_transcendence

We're collecting logs from multiple services:

  • Django: Application logs and API access logs
  • Nginx: Web server access and error logs
  • Next.js: Frontend application logs
  • Redis: Cache operation logs
  • PostgreSQL: Database query and transaction logs

Filebeat Configuration

filebeat.yml

# Basic settings
filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: true

# Input configurations
filebeat.inputs:
  - type: container
    enabled: true
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata: ~

  # Custom file input for Django logs
  - type: log
    enabled: true
    paths:
      - /var/log/django/*.log
    fields:
      service: django
    fields_under_root: true
    multiline:
      pattern: '^[[:space:]]+|^Traceback[[:space:]]|^[[:alnum:]]+Error:|^[[:alnum:]]+Exception:'
      negate: false
      match: after

  # Custom file input for Next.js logs
  - type: log
    enabled: true
    paths:
      - /var/log/nextjs/*.log
    fields:
      service: nextjs
    fields_under_root: true
    json:
      keys_under_root: true
      message_key: message

# Module configurations
filebeat.modules:
  - module: nginx
    access:
      enabled: true
      var.paths: ['/var/log/nginx/access.log*']
    error:
      enabled: true
      var.paths: ['/var/log/nginx/error.log*']

  - module: postgresql
    log:
      enabled: true
      var.paths: ['/var/log/postgresql/*.log*']

# Output configuration
output.elasticsearch:
  hosts: ['https://elasticsearch:9200'] # https is required for the ssl settings to apply
  username: '${ELASTIC_USER}'
  password: '${ELASTIC_PASSWORD}'
  ssl.enabled: true
  ssl.verification_mode: 'none' # For development only

  # Index settings
  indices:
    - index: 'nginx-%{+yyyy.MM.dd}'
      when.contains:
        event.module: 'nginx' # modules tag events with event.module, not service
    - index: 'django-%{+yyyy.MM.dd}'
      when.contains:
        service: 'django'
    - index: 'nextjs-%{+yyyy.MM.dd}'
      when.contains:
        service: 'nextjs'
    - index: 'postgresql-%{+yyyy.MM.dd}'
      when.contains:
        event.module: 'postgresql'
    - index: 'redis-%{+yyyy.MM.dd}'
      when.contains:
        service: 'redis'
    - index: 'system-%{+yyyy.MM.dd}'

# Processing pipeline
processors:
  - add_host_metadata: ~
  - add_docker_metadata: ~
  # add_cloud_metadata and add_kubernetes_metadata are omitted: in a local
  # Docker Compose setup they find nothing and only add startup noise
  - add_fields:
      target: ''
      fields:
        project: 'ft_transcendence'
        environment: 'development'
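One caveat about the custom index names above: in Filebeat 8.x, the `index`/`indices` settings are ignored while index lifecycle management is enabled, so ILM must be disabled and the template name and pattern overridden. A sketch (the template name and pattern here are illustrative, not from our final config):

```yaml
# Needed alongside custom index names in Filebeat 8.x
setup.ilm.enabled: false
setup.template.name: 'ft_transcendence'
setup.template.pattern: 'ft_transcendence-*' # widen this to match your index names
```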

Docker Compose Integration

Here's how we've added Filebeat to our Docker Compose stack:

filebeat:
  image: docker.elastic.co/beats/filebeat:8.17.4
  container_name: ft_filebeat
  user: root # To access container logs
  volumes:
    - ./config/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - filebeat_data:/usr/share/filebeat/data
    # Log volume mounts
    - ./logs/nginx:/var/log/nginx:ro
    - ./logs/django:/var/log/django:ro
    - ./logs/nextjs:/var/log/nextjs:ro
    - ./logs/postgresql:/var/log/postgresql:ro
  environment:
    - ELASTIC_USER=elastic
    - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
  networks:
    - elastic
  depends_on:
    - elasticsearch
  restart: unless-stopped
  command: filebeat -e -strict.perms=false

Log Processing Features

Multiline Handling

For logs that span multiple lines (like Python tracebacks), we configure a continuation pattern so an entire traceback is shipped as a single event:

multiline:
  pattern: '^[[:space:]]+|^Traceback[[:space:]]|^[[:alnum:]]+Error:|^[[:alnum:]]+Exception:'
  negate: false
  match: after
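To see how `negate: false` with `match: after` behaves, here is a small Python sketch of the grouping logic. The regex is a Python translation of the POSIX classes above, and the sample log lines are made up:

```python
import re

# Python translation of the Filebeat pattern:
# [[:space:]] -> \s, [[:alnum:]] -> [A-Za-z0-9]
CONTINUATION = re.compile(
    r'^\s+|^Traceback\s|^[A-Za-z0-9]+Error:|^[A-Za-z0-9]+Exception:'
)

def group_events(lines):
    """Mimic negate: false / match: after: a line matching the pattern
    is appended to the previous event; anything else starts a new one."""
    events = []
    for line in lines:
        if CONTINUATION.match(line) and events:
            events[-1] += "\n" + line  # continuation line
        else:
            events.append(line)        # start of a new event
    return events

log = [
    "ERROR Unhandled exception in view",
    "Traceback (most recent call last):",
    '  File "views.py", line 10, in index',
    "ValueError: invalid literal",
    "INFO Request completed",
]
print(len(group_events(log)))  # → 2
```

The five raw lines collapse into two events: the traceback stays attached to the error line that precedes it, and the final `INFO` line starts a fresh event.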

JSON Parsing

For services that output JSON logs (like Next.js), we enable automatic parsing:

json:
  keys_under_root: true
  message_key: message
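Conceptually, `keys_under_root` merges the decoded JSON keys into the top level of the event instead of nesting them under `json.*`, and `message_key` tells Filebeat which field holds the log text. A rough Python sketch (the sample line and metadata field are illustrative):

```python
import json

def decode_json_log(raw_line, message_key="message"):
    """Sketch of keys_under_root + message_key: parse the line and
    promote the JSON keys to the top level of the event."""
    event = {"log.file.path": "/var/log/nextjs/app.log"}  # metadata Filebeat adds
    fields = json.loads(raw_line)
    event.update(fields)  # keys_under_root: merge at the top level
    event["message"] = fields.get(message_key, raw_line)
    return event

line = '{"level": "info", "message": "page rendered", "route": "/game"}'
event = decode_json_log(line)
print(event["level"], event["route"])  # → info /game
```

With nesting disabled like this, `level` and `route` become directly filterable fields in Kibana rather than `json.level` and `json.route`.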

Metadata Enrichment

We add various metadata to make our logs more useful:

  1. Host Information: OS, hostname, architecture
  2. Docker Metadata: Container name, image, labels
  3. Custom Fields: Project name, environment

Log Rotation and Management

To prevent disk space issues in our limited environment, we implement:

  1. Log Rotation: Using Docker log rotation settings
  2. Index Lifecycle Management: Aging out older logs
  3. Compaction: Regular optimization of indices
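Step 1 maps to per-service settings in Docker Compose using the `json-file` driver's rotation options; for example (the size and file count here are illustrative):

```yaml
# Example per-service log rotation in docker-compose
services:
  nginx:
    logging:
      driver: json-file
      options:
        max-size: "10m" # rotate once the current log reaches 10 MB
        max-file: "3"   # keep at most three rotated files per container
```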

Performance Tuning

For optimal Filebeat performance in a resource-constrained environment:

# Add to filebeat.yml for performance
queue.mem:
  events: 1024
  flush.min_events: 512
  flush.timeout: 5s

max_procs: 1 # Limit CPU usage on small machines

Monitoring Filebeat

To ensure Filebeat itself is operating correctly:

  1. Check the status with: docker logs ft_filebeat
  2. Monitor internal metrics through Kibana
  3. Set up alerts for Filebeat failures
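Beyond `docker logs`, the `filebeat` binary ships `test` subcommands that can be run inside the container (assuming the `ft_filebeat` container name from our compose file):

```shell
# Validate the configuration syntax
docker exec ft_filebeat filebeat test config

# Check connectivity and authentication against the Elasticsearch output
docker exec ft_filebeat filebeat test output
```

`test output` is particularly handy for catching TLS and credential problems before wondering why no indices appear in Kibana.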

Next Steps

In the next article, we'll explore how Logstash processes and enriches the logs collected by Filebeat before they reach Elasticsearch.