
How to set up monitoring for your Mastodon instance with Prometheus and Grafana

January 11, 2023

I’m not much of a “server person”. But there’s no escaping servers and the cloud if you are serious about working in technology. Applications and data live in the cloud, and it’s always good to have an understanding of how it works, even if you mostly work on the client side.

The recent surge of interest in Mastodon led me to set up some instances and help operate them–some on my own, and some as part of a group. Such projects are a useful vehicle for learning new things in the context of a real application, and one of the areas I was interested in is monitoring.

I went through the process of setting up monitoring for some Mastodon instances with Prometheus and Grafana. I hadn’t worked with these systems before, so it was a basic crash course for me, which I think I somewhat passed, as I understand the basic concepts now. This post is mainly a set of notes and copy-pasteable instructions for my future self, but there’s nothing secret here, and it may be of value to others too.

This material is largely based on the excellent series of blog posts by IPng Networks: part 1, part 2, part 3. That series assumes a bit more background knowledge and doesn’t cover the basics, like installing Grafana and Prometheus, so my post fills in some of those gaps.

Okay. Let’s dig in.

Mental model of Grafana and Prometheus

Here is a naive explanation of what these are. Prometheus is a tool for collecting metrics, and Grafana displays those metrics on dashboards.

Here’s how I think about them.

Diagram about Prometheus and Grafana

Prometheus has the concept of exporters: they export metrics about a particular application or subsystem from a particular host. One key exporter provided by the Prometheus project is the node exporter. It has nothing to do with Node.js; it means “node” as in a machine instance (which may be truly physical or a VM).

The exporters all run a small web server which exports metrics in a specific text format. There is a list of canonical port allocations for exporters.
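To make that concrete, here is what a single metric looks like in that text format. This is an illustrative sample of one node exporter metric; the value is made up:

# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42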

Next, there is a central “prometheus” agent, which collects and stores metrics from the individual exporters, and exposes yet another small web server. Prometheus also has its own web UI to visualize the metrics, but I won’t look at that since I’ll use Grafana.

Grafana can consume data from many sources, with Prometheus being one (and in this post, the only) source. It visualizes the data as dashboards. It can do many other things, like generate alerts, which I won’t cover in this post.

This was a very quick and naive explanation to help establish a basic mental model. These systems have a lot more depth which is beyond the scope of this basic post.

So with the basic mental model in place, let’s install some software.

Starting point

I assume the following starting point:

- A Mastodon instance running on a single Linux host with systemd, to which you have root (sudo) access.
- Mastodon’s own web server listening on port 3000, fronted by nginx for HTTPS.
- A Debian/Ubuntu-style system with APT, since that’s what I use below.
- Control over your domain’s DNS, so you can add a record like grafana.example.com.

Install Grafana

Let’s get an empty Grafana up and running before we fill it with data. Grafana is present in many APT repositories, but the version that comes from my hosting provider’s repository is very old. Fortunately, Grafana provides its own APT repository. Follow the instructions here. I chose to install the open source version.

Now, you follow the instructions, you start the daemon, the server is running… or is it? How do you know?

Here is one important command you should use throughout this setup process: sudo lsof -nP | grep LISTEN. This shows all applications that are listening on a port. I mentioned above that we’ll be installing a bunch of small web servers, and this is a good way to confirm they are actually running.

When you get to this point, and look for grafana in the list of running servers, you won’t see it at first. What is going on?

By default, both Grafana and Mastodon listen on port 3000. Since Mastodon is already bound to that port, Grafana can’t bind to it, and fails to start.

To confirm this, here is another useful command: sudo journalctl -u grafana-server (replace “grafana-server” with whatever daemon you are interested in). You will likely see something like this.

Jan 11 08:01:35 example grafana-server[584534]: logger=server t=2023-01-11T08:01:35.790401765Z level=error msg="Stopped background service" service=*api.HTTPServer reason="failed to open listener on address 0.0.0.0:3000: listen tcp 0.0.0.0:3000: bind: address already in use"
Jan 11 08:01:35 example grafana-server[584534]: logger=secret.migration t=2023-01-11T08:01:35.796920608Z level=error msg="Stopped secret migration service" service=*migrations.DataSourceSecretMigrationService reason="context canceled"
Jan 11 08:01:35 example grafana-server[584534]: logger=infra.lockservice t=2023-01-11T08:01:35.797571767Z level=error msg="Failed to release the lock" error="context canceled"
Jan 11 08:01:35 example grafana-server[584534]: logger=server t=2023-01-11T08:01:35.798076811Z level=error msg="Server shutdown" error="*api.HTTPServer run error: failed to open listener on address 0.0.0.0:3000: listen tcp 0.0.0.0:3000: bind: address already in use"
Jan 11 08:01:35 example grafana-server[584534]: *api.HTTPServer run error: failed to open listener on address 0.0.0.0:3000: listen tcp 0.0.0.0:3000: bind: address already in use
Jan 11 08:01:35 example systemd[1]: grafana-server.service: Main process exited, code=exited, status=1/FAILURE
Jan 11 08:01:35 example systemd[1]: grafana-server.service: Failed with result 'exit-code'.
Jan 11 08:01:36 example systemd[1]: grafana-server.service: Scheduled restart job, restart counter is at 5.
Jan 11 08:01:36 example systemd[1]: Stopped Grafana instance.
Jan 11 08:01:36 example systemd[1]: grafana-server.service: Start request repeated too quickly.
Jan 11 08:01:36 example systemd[1]: grafana-server.service: Failed with result 'exit-code'.
Jan 11 08:01:36 example systemd[1]: Failed to start Grafana instance.

To fix this, edit /etc/grafana/grafana.ini. Set the port to one where nothing is running already, for example 3100:

# The http port to use
http_port = 3100

Restart the server with sudo systemctl restart grafana-server. You should now see it in the list of servers:

sudo lsof -nP | grep LISTEN

…
grafana-s 584598                           grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584607 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584608 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584609 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584610 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584611 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584612 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584613 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584614 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584615 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)
grafana-s 584598 584616 grafana-s          grafana    9u     IPv6            5587781       0t0        TCP *:3100 (LISTEN)

Install nginx frontend for Grafana

Your Grafana isn’t available from the public Internet. Or at least shouldn’t be. Check that you can’t publicly access http://grafana.example.com:3100. If you can, use some firewall or network configuration on your host or hosting provider to limit access to this port from the public Internet.
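As an illustration, if your host happens to use ufw as its firewall, a rule along these lines would reject incoming connections to the Grafana port. This is just a sketch; your firewall or hosting provider’s network configuration may look entirely different:

# Block outside connections to Grafana's port (local connections are unaffected)
sudo ufw deny 3100/tcp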

A common practice for public servers is to have an nginx frontend which does HTTPS termination and reverse-proxies to the actual application. This is also what Mastodon itself does. We will now do this for Grafana.

I will walk through the nginx host setup in typical steps.

First, update your DNS to point grafana.example.com to the IP of this server.
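How exactly you do this depends on your DNS provider, but conceptually it is just an A (or AAAA) record. For example, with 203.0.113.10 standing in for your server’s address:

grafana.example.com.  300  IN  A  203.0.113.10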

Next, create a file named grafana in /etc/nginx/sites-available, with the following content:

server {
  listen 80;
  listen [::]:80;
  root /var/www/html;
  server_name grafana.example.com;
  index index.html index.htm index.nginx-debian.html;
  location / {
    # First attempt to serve request as file, then
    # as directory, then fall back to displaying a 404.
    try_files $uri $uri/ =404;
  }
}

Link it into /etc/nginx/sites-enabled, test your configuration, and reload nginx, as sketched below. You should then see the default site when you go to http://grafana.example.com.
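A minimal sketch of those steps, assuming the file is named grafana as above:

# Enable the site, verify the configuration, and reload nginx
sudo ln -s /etc/nginx/sites-available/grafana /etc/nginx/sites-enabled/grafana
sudo nginx -t
sudo nginx -s reload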

Now, add HTTPS to the site. The easiest way is to use Let’s Encrypt, and have their certbot do all the work. Run sudo certbot and follow the prompts. This sets up the HTTPS certificate for grafana, as well as automatic certificate rotation (Let’s Encrypt certificates are short-lived and must be rotated automatically). When you’ve done this correctly, you can now go to https://grafana.example.com and see the default site, served correctly over HTTPS without warnings.
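If you prefer to skip some of the prompts, certbot’s nginx plugin can also be pointed at the site directly (assuming certbot was installed with nginx support):

sudo certbot --nginx -d grafana.example.com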

Finally, set up the reverse proxy from nginx. Here’s what I use, and what your /etc/nginx/sites-available/grafana should look like in the end.

server {

  server_name grafana.example.com;

  location / {
    proxy_pass http://localhost:3100;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
  }

  listen [::]:443 ssl ipv6only=on; # managed by Certbot
  listen 443 ssl; # managed by Certbot
  ssl_certificate /etc/letsencrypt/live/grafana.example.com/fullchain.pem; # managed by Certbot
  ssl_certificate_key /etc/letsencrypt/live/grafana.example.com/privkey.pem; # managed by Certbot
  include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
  ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

server {
    if ($host = grafana.example.com) {
      return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80;
    listen [::]:80;

    server_name grafana.example.com;
    return 404; # managed by Certbot
}

Test your configuration: sudo nginx -t. Reload nginx: sudo nginx -s reload.

Launch Grafana, set up admin password and users

Finally you can start to see something in your browser. Go to https://grafana.example.com. You should see Grafana UI.

Grafana

The first login is with user “admin” and password “admin”. It will prompt you to set a secure password. Do that. You can then continue using Grafana as admin, or set up another admin user for yourself. You can also invite more people. We haven’t set up e-mail sending in Grafana at this point, so e-mail invitations won’t work. Just copy the invitation links from the web UI and share them yourself.
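One thing worth knowing: if you ever lose that admin password, it can be reset from the server’s command line with grafana-cli (assuming the default Grafana install layout; the placeholder is yours to fill in):

sudo grafana-cli admin reset-admin-password <new-password>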

Install Prometheus and node-exporter

Install Prometheus from the latest official binary, following these instructions. Ignore the part about “firewall rules opened for accessing Prometheus port 9090”, because we won’t be accessing Prometheus from the public Internet.

You should now see that Prometheus is listening on your server:

sudo lsof -nP | grep LISTEN

…
prometheu 585995                        prometheus    7u     IPv6            5612824       0t0        TCP *:9090 (LISTEN)
prometheu 585995 585996 prometheu       prometheus    7u     IPv6            5612824       0t0        TCP *:9090 (LISTEN)
prometheu 585995 585997 prometheu       prometheus    7u     IPv6            5612824       0t0        TCP *:9090 (LISTEN)
prometheu 585995 585998 prometheu       prometheus    7u     IPv6            5612824       0t0        TCP *:9090 (LISTEN)
prometheu 585995 585999 prometheu       prometheus    7u     IPv6            5612824       0t0        TCP *:9090 (LISTEN)
prometheu 585995 586003 prometheu       prometheus    7u     IPv6            5612824       0t0        TCP *:9090 (LISTEN)

Install node_exporter from the official downloads page. (Replace the URL with the latest version.)

curl -OL https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar xzvf node_exporter-1.5.0.linux-amd64.tar.gz
sudo cp node_exporter-1.5.0.linux-amd64/node_exporter /usr/local/bin
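The service definition below runs node_exporter as the prometheus user. If the Prometheus install instructions did not already create that account on your host, a system user can be added first (a sketch, assuming GNU useradd as found on Debian-style systems):

# Create a locked-down system user for the exporters to run as
sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus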

Create /etc/systemd/system/node_exporter.service with the following content:

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Reload systemd, start the service, and check its status. All should look good:

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl status node_exporter
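Note that start only runs the service until the next reboot. To have systemd bring node_exporter up automatically at boot, you likely also want to enable it:

sudo systemctl enable node_exporter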

You should now see the node metrics being exported. You can manually query the URL with curl and you’ll see a ton of metrics.

$ curl http://localhost:9100/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
…

Next, add the new node_exporter target to the Prometheus configuration. Edit /etc/prometheus/prometheus.yml and add this to scrape_configs:

  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']

Restart prometheus for the new configuration to take effect: sudo systemctl restart prometheus.
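YAML indentation mistakes are easy to make in this file. If promtool shipped alongside your Prometheus binary, you can validate the configuration before restarting:

promtool check config /etc/prometheus/prometheus.yml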

Add your first Grafana dashboard

We have done a lot of work, but are not yet seeing any dashboards. Let’s add our first dashboard to Grafana, visualizing the metrics from the node exporter that you just set up.

Go to https://grafana.example.com. Add Prometheus as a data source. The only thing you need to enter here is the URL: http://localhost:9090, since Prometheus is running on the same host as Grafana.

Add Prometheus as Grafana data source

Add the dashboard. Select “Import dashboard” and enter 1860 as the dashboard ID. This is the official full node_exporter dashboard.

Import dashboard to Grafana

Click “Load”. In the next screen, select Prometheus as data source. Congratulations, you should now see your first dashboard.

Node exporter dashboard in Grafana

Add Mastodon statsd exporter and dashboard

Let’s now add the Mastodon-specific dashboard that IPng Networks describes in part 3 of their Mastodon blog series.

Download a recent binary version from the package releases page.

curl -OL https://github.com/prometheus/statsd_exporter/releases/download/v0.23.0/statsd_exporter-0.23.0.linux-amd64.tar.gz
tar xzvf statsd_exporter-0.23.0.linux-amd64.tar.gz
sudo cp statsd_exporter-0.23.0.linux-amd64/statsd_exporter /usr/local/bin

Install the statsd mapping file provided by IPng Networks:

curl -OL https://ipng.ch/assets/mastodon/statsd-mapping.yaml
sudo cp statsd-mapping.yaml /etc/prometheus

Create /etc/default/statsd_exporter with this content:

ARGS="--statsd.mapping-config=/etc/prometheus/statsd-mapping.yaml"

Create statsd_exporter.service in /etc/systemd/system/ with this content:

[Unit]
Description=Statsd exporter
After=network.target

[Service]
Restart=always
User=prometheus
EnvironmentFile=/etc/default/statsd_exporter
ExecStart=/usr/local/bin/statsd_exporter $ARGS
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

Add this to /etc/prometheus/prometheus.yml:

  - job_name: statsd_exporter
    static_configs:
    - targets: ['localhost:9102']

Add this to your Mastodon .env.production, so that Mastodon starts sending its statsd metrics to the exporter (by default, statsd_exporter listens for statsd traffic on port 9125 and re-exposes it in Prometheus format on port 9102):

STATSD_ADDR=localhost:9125

Restart the daemons:

sudo systemctl daemon-reload
sudo systemctl start statsd_exporter
sudo systemctl restart prometheus
sudo systemctl restart mastodon-sidekiq
sudo systemctl restart mastodon-streaming
sudo systemctl restart mastodon-web

Verify that you see some output from the exporter:

$ curl http://localhost:9102/metrics

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.5989e-05
go_gc_duration_seconds{quantile="0.25"} 3.2811e-05
go_gc_duration_seconds{quantile="0.5"} 5.1658e-05
go_gc_duration_seconds{quantile="0.75"} 6.6586e-05
go_gc_duration_seconds{quantile="1"} 0.000104236
go_gc_duration_seconds_sum 0.000498889
go_gc_duration_seconds_count 9
…

Import the dashboard to Grafana the same way as before. The dashboard ID is 17492.

Grafana Mastodon dashboard

Add more exporters

You can now add more exporters, such as for PostgreSQL, Redis, nginx, and Elasticsearch if you have it enabled. I am not going to provide the details for each exporter here, but you saw the basic pattern above, and it’s the same for all exporters:

- Download the exporter binary and install it, for example to /usr/local/bin.
- Create a systemd service for it, then start and enable it.
- Add a scrape job for it to /etc/prometheus/prometheus.yml, using the exporter’s canonical port, and restart Prometheus.
- Import or build a Grafana dashboard that visualizes its metrics.
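For example, the scrape job for a hypothetical Redis exporter would have exactly the same shape as the ones above, just with a different job name and the exporter’s canonical port (9121 for redis_exporter):

  - job_name: redis_exporter
    static_configs:
      - targets: ['localhost:9121']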

Where to go from here

This was a naive post that only covered the basics of monitoring Mastodon with Grafana and Prometheus. You can do many more things with these systems that I did not cover, like building your own dashboards, setting up alerts, monitoring across multiple hosts, and so on.

In more advanced devops environments, many of the things I did manually here are automated, templated, containerized, and scripted. This is just a basic manual setup with one host.

For more inspiration about Mastodon monitoring (which I may dig into further myself one day to improve my own dashboards), see the wealth of material published by Hachyderm.