Overview of My Grafana Setup

Linux Monitoring

For the past few weeks I have been working extensively on the devices in my LAN. My FreeNAS had crashed after a power outage, which left me with a burnt memory module and a data issue on my main pool that caused a kernel panic on boot. My remote backup solution had not been working for over a year, so I knew that it would also need a complete re-design.

I rebuilt my FreeNAS pools and set up a new remote backup solution using ZFS send (FreeNAS => Ubuntu 20.04), and got a couple of UPSs to protect against future power failures. I also created a better backup structure and jobs for my main devices (will post about that later), and configured additional alerts for device availability and job status. And to tie it all together, I thought: what better than to revamp my Grafana dashboard to show all this new info?

Here’s the final result! I’m quite happy with it.

How Everything is Placed Together

When I first started with Grafana, I was mainly using it with the Zabbix plugin and an imported pfSense dashboard. I had one influx database that was used with pfSense, and a MariaDB for the imported Zabbix data. The dashboard gave me a basic overview of my network data usage (pfSense) as well as status for my LAN devices (Zabbix).

Old dashboard

With the new configuration things got a bit more complex (as expected). I’m now displaying data from 5 different servers, with 2 different OS types (FreeBSD and Linux) and 5 different distributions (pfSense, FreeNAS, Arch, CentOS and Ubuntu). My collector of choice is Telegraf (due to its low resource usage and simplicity), and I have made an effort to stick to it across the different OSs in my design.

The Grafana Server

Collectors

  • CentOS (also the Grafana server)
    • Zabbix => MariaDB
      • Collects LAN statistics
    • telegraf => InfluxDB (DB2)
      • Collects server resources
  • pfSense
    • telegraf => InfluxDB (DB1)
      • Collects server resources, network usage and UPS status
  • Desktop (Arch)
    • telegraf => InfluxDB (DB2)
      • Collects server resources and UPS status
  • FreeNAS
    • telegraf => InfluxDB (DB2)
      • Collects server resources and ZPOOL status/usage
  • Ubuntu
    • telegraf => InfluxDB (DB2)
      • Collects server resources, VPN status, IP address and ZPOOL status/usage
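
If you prefer to see the layout as data, the Telegraf part of the list above can be written as a small lookup table. This is only an illustrative Python sketch: the host labels and the hosts_by_database helper are names I made up for the example, and the Zabbix => MariaDB collector is left out.

```python
from collections import defaultdict

# Telegraf collectors as listed above: host label -> InfluxDB database.
# The host labels are illustrative, not real hostnames.
TELEGRAF_TARGETS = {
    "centos": "DB2",
    "pfsense": "DB1",
    "arch-desktop": "DB2",
    "freenas": "DB2",
    "ubuntu": "DB2",
}


def hosts_by_database(targets):
    """Group host labels by the database they write to."""
    grouped = defaultdict(list)
    for host, database in targets.items():
        grouped[database].append(host)
    return {database: sorted(hosts) for database, hosts in grouped.items()}


print(hosts_by_database(TELEGRAF_TARGETS))
```

Grouping by database makes it obvious that pfSense is the only host writing to DB1.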

How it all looks together

As the proverb says, “A picture is worth a thousand words”. If you are a visual person like me, the diagram below will give you a better understanding of the setup I listed above.

Breaking it Down

Now let’s dive a bit more into the current dashboard looking at what data is being displayed and from what data source.

I have divided the dashboard into 8 sections and labeled them with numbers (they may be small, but they are there).

Section 1: WAN Ping

  • Description: Displays ping status to 4 specific hosts
  • Information displayed:
    • Ping status to Google
    • Ping status to Google DNS
    • Ping status to Cloudflare DNS
    • Ping status to my ISP’s gateway (Bell)
  • Data source: MariaDB (Zabbix)

Section 2: pfSense Network Data

  • Description: Displays network-related data from pfSense
  • Information displayed:
    • Monthly WAN data usage
    • Daily WAN data usage
    • WAN throughput
    • LAN throughput
  • Data source: InfluxDB (DB1)

Section 3: pfSense OS Data

  • Description: Displays system resources for pfSense
  • Information displayed:
    • Status (up/down)
    • Uptime
    • Active users
    • CPU, memory, storage (ZFS) usage
    • UPS information
  • Data source:
    • InfluxDB (DB1)
    • Zabbix plugin

Section 4: FreeNAS

  • Description: Displays system resources for FreeNAS
  • Information displayed:
    • Status (up/down)
    • Plugin status (up/down)
    • Uptime
    • Active users
    • CPU, memory, storage (ZFS) usage
  • Data source:
    • InfluxDB (DB2)
    • Zabbix plugin

Section 5: Ubuntu

  • Description: Displays system resources for Ubuntu
  • Information displayed:
    • Status (up/down)
    • VPN status (up/down)
    • IP address
    • Uptime
    • Active users
    • CPU, memory, storage (ZFS) usage
  • Data source:
    • InfluxDB (DB2)
    • Zabbix plugin

Section 6: CentOS

  • Description: Displays system resources for CentOS
  • Information displayed:
    • Status (up/down)
    • Uptime
    • Active users
    • CPU, memory, storage usage
  • Data source:
    • InfluxDB (DB2)
    • Zabbix plugin

Section 7: Arch

  • Description: Displays system resources for Arch
  • Information displayed:
    • Status (up/down)
    • Uptime
    • Active users
    • CPU, memory, storage usage
    • UPS information
  • Data source:
    • InfluxDB (DB2)
    • Zabbix plugin

Section 8: LAN

  • Description: Displays LAN-related data
  • Information displayed:
    • ICMP ping history for LAN devices
    • Current status of LAN device (up/down)
  • Data source:

Conclusion

While I’m happy with this setup there is still some room for improvement (as there always is). At some point I would like to consolidate the two Influx databases, integrate Unifi Controller and SmartThings data, and possibly add a Plex dashboard as well.

Got any comments or questions? Feel free to leave them in the comment section below.

How to Collect UPS Data on pfSense with Telegraf

pfSense Monitoring

If you are running Grafana at home to monitor your devices, and you also have pfSense running off a UPS (if you don’t, check out my previous article on How to Setup a USB UPS on pfSense), you may want to pull UPS related data from pfSense.

My Grafana pfSense config


Instructions

a. Start by logging into your pfSense, go into “System => Package Manager => Available Packages” and install Telegraf

b. Now login to pfSense via ssh, and create a file in /usr/local/bin/getUpsData.py with the content below

Note: Make sure to change the UPS name in cmd="upsc BackUPSES750"

# https://github.com/sa7mon/ups-telegraf
from __future__ import print_function
import subprocess

cmd="upsc BackUPSES750"
output=""
string_measurements=["battery.charge","ups.status","battery.runtime"]

p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)

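for line in p.stdout.readlines():  # parse each "key: value" line printed by upsc
    line = line.decode("utf-8").rstrip()
    key = line[:line.find(":")]
    value = line[line.find(":")+2:]

    if key in string_measurements:
        # Quote anything that is not a number (e.g. "OL CHRG"),
        # otherwise the result is not valid line protocol
        try:
            float(value)
        except ValueError:
            value = '"' + value + '"'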
        measurement = key + "=" + value
        if output != "":
            measurement = "," + measurement
        output += measurement

output = "ups,ups.name=BackUPSES750 " + output.rstrip()
print(output)

The output data will be as shown below. If you would like to format the output, refer to my instructions on my GitHub Repo - https://github.com/victorbrca/telegraf-plugins/tree/main/UPS

ups,ups.name=BackUPSES750 battery.charge=100,battery.runtime=18405,ups.status="OL"
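
To make the structure of that line easier to read, here is a small Python 3 sketch that splits it into its measurement, tags and fields. It is a simplified illustration written for this exact output (no escaped spaces or commas, no timestamp), not a full line protocol parser, and split_ups_line is a name I picked for the example.

```python
def split_ups_line(line):
    """Split a simple line protocol record into measurement, tags and fields.

    Simplified: assumes no escaped spaces or commas and no timestamp.
    """
    head, field_part = line.split(" ", 1)
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)

    fields = {}
    for item in field_part.split(","):
        key, value = item.split("=", 1)
        if value.startswith('"') and value.endswith('"'):
            fields[key] = value[1:-1]   # quoted value: keep it as a string
        else:
            fields[key] = float(value)  # unquoted value: treat it as a number
    return measurement, tags, fields


sample = 'ups,ups.name=BackUPSES750 battery.charge=100,battery.runtime=18405,ups.status="OL"'
measurement, tags, fields = split_ups_line(sample)
print(measurement, tags, fields)
```

The part before the space carries the measurement name (ups) and the tag (ups.name), and everything after the space is a field, which is what you will later select in Grafana.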

c. Go back to pfSense UI and go into “Services => Telegraf”

d. Configure Telegraf as you usually would, and under “Additional configuration for Telegraf” add the configuration below:

[[inputs.exec]]
  commands = ["python2.7 /usr/local/bin/getUpsData.py"]
  timeout = "5s"
  data_format = "influx"

e. Restart Telegraf and check your InfluxDB for the new data being populated
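
One way to check for the new data without opening Grafana is to query the ups measurement directly. The Python 3 sketch below only builds the URL and sends nothing. It assumes an InfluxDB 1.x server with the HTTP /query endpoint, and the host name and database name are placeholders I made up, so replace them with your own.

```python
from urllib.parse import urlencode


def build_query_url(base_url, database, query):
    """Build an InfluxDB 1.x /query URL (nothing is sent over the network)."""
    params = urlencode({"db": database, "q": query})
    return base_url.rstrip("/") + "/query?" + params


# Hypothetical host and database name; replace them with your own.
url = build_query_url(
    "http://influxdb.example.lan:8086",
    "pfsense",
    'SELECT * FROM "ups" ORDER BY time DESC LIMIT 5',
)
print(url)
```

You can paste the printed URL into a browser or pass it to curl. If authentication is enabled on your InfluxDB you will need to supply credentials as well.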

How to Setup a USB UPS on pfSense

pfSense Hardware

So you need to configure pfSense with a UPS!? Well, good thing this post is called “How to Setup a USB UPS on pfSense”.

a. Start by plugging the USB cable into your pfSense and your UPS

b. Now log in to the pfSense UI and go into “System => Package Manager”

c. Search for ‘nut’ and click on ‘Install’

d. Go to “Services => UPS => UPS Settings”, select “Local USB”, give the UPS a name and click on “Save”

e. Go back to the “UPS Status” page. If you can see your UPS then you are pretty much done. Now all you have to do is configure any additional NUT settings (if needed).


Possible Issues

Troubleshooting 1

If your UPS is not showing in the “UPS Settings” page, log on to pfSense via ssh and run usbconfig. You should be able to see your UPS listed.

[2.4.4-RELEASE][admin@pfSense.localdomain] usbconfig
ugen0.1: <0x8086 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
ugen1.1: <Intel EHCI root HUB> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen0.2: <Smart Smart Wireless Device> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
ugen1.2: <vendor 0x8087 product 0x8000> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
ugen0.3: <American Power Conversion Back-UPS ES 750G FW908.W3 .D USB FWW3> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON (2mA)

Solution:

If you can’t see your UPS with usbconfig, try using another USB cable.

Troubleshooting 2

If you can see the UPS with usbconfig, try to restart the service from command line so you can view error messages on stdout.

Browse to /usr/local/etc/rc.d and manually restart the NUT service with ./nut.sh restart.

[2.4.4-RELEASE][admin@pfSense.localdomain]/usr/local/etc/rc.d: ./nut.sh restart
stopping NUT
starting NUT
Network UPS Tools upsmon 2.7.4
kill: No such process
UPS: BackUPSES750 (master) (power value 1)
Using power down flag file /etc/killpower
Network UPS Tools - UPS driver controller 2.7.4
Network UPS Tools - Generic HID driver 0.41 (2.7.4)
USB communication driver 0.33
No matching HID UPS found
Driver failed to start (exit status=1)
Network UPS Tools upsd 2.7.4
fopen /var/db/nut/upsd.pid: No such file or directory
listening on ::1 port 3493
listening on 127.0.0.1 port 3493
Can't connect to UPS [BackUPSES750] (usbhid-ups-BackUPSES750): No such file or directory

Broadcast Message from admin@pfSense.localdomain                               
        (no tty) at 16:57 EDT...                                               

Communications with UPS BackUPSES750 lost                                      


Broadcast Message from admin@pfSense.localdomain                               
        (no tty) at 16:57 EDT...                                               

UPS BackUPSES750 is unavailable

💡 ERROR Can't connect to UPS [BackUPSES750] (usbhid-ups-BackUPSES750): No such file or directory

Solution:

Restart pfSense

Troubleshooting 3

Try running upsc [ups name].

[2.4.4-RELEASE][admin@pfSense.localdomain] upsc BackUPSES750
Error: Driver not connected

Broadcast Message from admin@pfSense.localdomain                               
        (no tty) at 17:02 EDT...                                               

UPS BackUPSES750 is unavailable

💡 ERROR Error: Driver not connected

Solution:

Restart pfSense

How to Install Telegraf on FreeNAS

Monitoring FreeNAS

Quick and simple instructions on how to install the Telegraf collector agent on FreeNAS.

a. Start by creating a folder for Telegraf on one of your pools. I have mine under /mnt/Volume1/home/system/telegraf

b. Download the Telegraf tar for FreeBSD (here), extract it and copy the binary and the .conf files to the folder we created in the previous step

  • ./telegraf-[version]/usr/bin/telegraf
  • ./telegraf-[version]/etc/telegraf/telegraf.conf

c. Create telegraf.init in the same folder with the content of the script below. Make sure to modify the 2 lines that contain the path (lines 26 and 32 of the script) so they point to your folder:

Line 26: : ${telegraf_conf:="/mnt/Volume1/home/system/telegraf/${name}.conf"}

Line 32: command_args="-crP ${pidfile} /mnt/Volume1/home/system/telegraf/${name} ${telegraf_flags} -config=${telegraf_conf} >> /var/log/telegraf.log 2>&1"

The full script:
#!/bin/sh
# $FreeBSD$

# PROVIDE: telegraf
# REQUIRE: DAEMON NETWORKING
# BEFORE: LOGIN
# KEYWORD: shutdown

# Add the following lines to /etc/rc.conf to enable telegraf:
# telegraf_enable="YES"
#
# telegraf_enable (bool): Set to YES to enable telegraf
# Default: NO
# telegraf_conf (str): telegraf configuration file
# Default: ${PREFIX}/etc/telegraf.conf
# telegraf_flags (str): Extra flags passed to telegraf

. /etc/rc.subr

name="telegraf"
rcvar=telegraf_enable
load_rc_config $name

: ${telegraf_enable:="YES"}
: ${telegraf_flags:="-quiet"}
: ${telegraf_conf:="/mnt/Volume1/home/system/telegraf/${name}.conf"}

# daemon
start_precmd=telegraf_prestart
pidfile="/var/run/${name}.pid"
command=/usr/sbin/daemon
command_args="-crP ${pidfile} /mnt/Volume1/home/system/telegraf/${name} ${telegraf_flags} -config=${telegraf_conf} >> /var/log/telegraf.log 2>&1"

telegraf_prestart() {
# Have to empty rc_flags so they don't get passed to daemon(8)
 rc_flags=""
}

run_rc_command "$1"

d. Edit the configuration file (telegraf.conf) according to your needs

e. Create a link in /usr/local/etc/rc.d for telegraf

ln -s /mnt/Volume1/home/system/telegraf/telegraf.init /usr/local/etc/rc.d/telegraf

f. Start the service

service telegraf start

g. Check the output of the logs to make sure it’s working

# tail /var/log/telegraf.log
2020-10-25T17:34:33Z I! Starting Telegraf 1.16.0

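h. Create an “Init/Shutdown Script” in the FreeNAS UI with the command below, so that the link is recreated and the service is started on boot (change the path to reflect yours)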

ln -s /mnt/Volume1/home/system/telegraf/telegraf.init /usr/local/etc/rc.d/telegraf ; service telegraf start



How to Configure a UPS Between Linux and FreeNAS

Linux Shell Hardware FreeNAS

After having bad memory modules due to power outages at home (on both my FreeNAS and my desktop) I knew I had to invest in a UPS. I will cover the details of the configuration, and as usual, I will try to go straight to the point.

Keep in mind that I’m running Arch on my desktop and FreeNAS-11.3-U1, and that the UPS is connected to my desktop (master) via USB.

Instructions

Hardware

a. Connect the USB cable to your computer

b. Make sure that you can see it with lsusb

➤ lsusb | grep -i ups
Bus 002 Device 008: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS

Desktop

Configuring the UPS

a. Start by installing nut and nut-monitor (if you want a nice GUI to view UPS status)

local/nut 2.7.4-2
    A collection of programs which provide a common interface for monitoring and administering UPS, PDU and SCD
    hardware

local/nut-monitor 2.7.4-3
    GUI to manage devices connected a NUT server

b. Next, run nut-scanner to see if it identifies your UPS. This will give us a starting configuration that we can use for the UPS

➤ sudo nut-scanner -U
Scanning USB bus.
[nutdev1]
	driver = "usbhid-ups"
	port = "auto"
	vendorid = "0764"
	productid = "0501"
	product = "SL Series"
	vendor = "CPS"
	bus = "002"

c. Edit /etc/nut/ups.conf and add the output from the previous step. You can change the device name (between []) to anything you like

Tip: If you only have one UPS, keep the name simple so it’s easy for you to remember and type in the terminal.

[CyberPowerSL700U]
	driver = "usbhid-ups"
	port = "auto"
	vendorid = "0764"
	productid = "0501"
	product = "SL Series"
	vendor = "CPS"
	bus = "002"

d. Start the USB driver

➤ sudo upsdrvctl start

d.1. If you get an error here you might need to setup a udev rule for your UPS. This can be done by creating the file /etc/udev/rules.d/50-ups.rules with the content below

SUBSYSTEM=="usb", ATTR{idVendor}=="XXXX", ATTR{idProduct}=="YYYY", SYMLINK+="ups0", GROUP="nut"

Make sure to add the vendor ID and product ID for your UPS (both lsusb and nut-scanner -U can provide that info). This is what mine looks like:

➤ cat /etc/udev/rules.d/50-ups.rules
SUBSYSTEM=="usb", ATTR{idVendor}=="0764", ATTR{idProduct}=="0501", SYMLINK+="ups0", GROUP="nut"
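
If you would rather not edit the rule by hand, the IDs can be pulled out of the lsusb line and dropped into the same template. This is a small Python 3 sketch under the assumption that your lsusb line has the usual "ID vendor:product" shape. The function names are mine, and the symlink and group defaults simply mirror the rule above.

```python
def ids_from_lsusb(line):
    """Extract (vendor, product) from an lsusb line such as '... ID 0764:0501 ...'."""
    token = line.split("ID ", 1)[1].split()[0]
    vendor_id, product_id = token.split(":")
    return vendor_id, product_id


def udev_rule(vendor_id, product_id, symlink="ups0", group="nut"):
    """Return a udev rule line in the same shape as the one shown above."""
    return (
        'SUBSYSTEM=="usb", '
        'ATTR{{idVendor}}=="{0}", ATTR{{idProduct}}=="{1}", '
        'SYMLINK+="{2}", GROUP="{3}"'
    ).format(vendor_id, product_id, symlink, group)


lsusb_line = "Bus 002 Device 008: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS"
vendor, product = ids_from_lsusb(lsusb_line)
print(udev_rule(vendor, product))
```

The printed line can then be saved as /etc/udev/rules.d/50-ups.rules.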

d.2. After that reload udev rules and try to start the UPS USB driver again

➤ sudo udevadm control --reload-rules && sudo udevadm trigger
➤ sudo upsdrvctl start

e. Make sure that you can view the UPS status with upsc

➤ upsc CyberPowerSL700U
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 20
battery.mfr.date: CPS
battery.runtime: 599
battery.runtime.low: 300
battery.type: PbAcid
battery.voltage: 14.2
battery.voltage.nominal: 12
device.mfr: CPS
device.model: SL Series
device.type: ups
driver.name: usbhid-ups
driver.parameter.bus: 002
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.product: SL Series
driver.parameter.productid: 0501
driver.parameter.synchronous: no
driver.parameter.vendor: CPS
driver.parameter.vendorid: 0764
driver.version: 2.7.4
driver.version.data: CyberPower HID 0.4
driver.version.internal: 0.41
input.transfer.high: 140
input.transfer.low: 96
input.voltage: 115.0
input.voltage.nominal: 0
output.voltage: 115.0
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.load: 44
ups.mfr: CPS
ups.model: SL Series
ups.productid: 0501
ups.realpower.nominal: 375
ups.status: OL
ups.timer.shutdown: -60
ups.timer.start: -60
ups.vendorid: 0764
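
Since upsc prints one "key: value" pair per line, the output is easy to consume from a script. Here is a minimal Python 3 sketch that turns it into a dictionary and converts the remaining run time to minutes. The sample text is a shortened copy of my output above, and parse_upsc is just a name I chose for the example.

```python
def parse_upsc(text):
    """Turn upsc output ('key: value' per line) into a dict of strings."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that do not have the 'key: value' shape
            values[key.strip()] = value.strip()
    return values


sample = """battery.charge: 100
battery.runtime: 599
ups.load: 44
ups.status: OL"""

ups = parse_upsc(sample)
runtime_minutes = int(ups["battery.runtime"]) / 60  # battery.runtime is in seconds
print(ups["ups.status"], ups["battery.charge"] + "%", round(runtime_minutes, 1), "min")
```

In a real script you would feed it the output of upsc with your UPS name instead of the sample text.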

Configuring the NUT Server

We are going to configure it as the master (meaning that it will send the shutdown command to the listening clients when the battery is low).

Info: If you want to know what the shutdown process looks like, take a look at the Official NUT Document: Shutdown design

a. Edit /etc/nut/nut.conf and set the MODE to netserver

MODE=netserver

b. Now let’s configure the listening IP and port. Edit /etc/nut/upsd.conf and set the LISTEN directive as below:

LISTEN 0.0.0.0 3493

c. Configure the users in /etc/nut/upsd.users

Replace my_master_password and my_slave_password with your own passwords

[upsmaster]
  # Administrative user
  password = my_master_password

  # Allow changing values of certain variables in the UPS
  actions = SET

  # Allow setting the "Forced Shutdown" flag in the UPS
  actions = fsd

  # Allow upsmon to log in as the master
  upsmon master

[upsremote]
  password = my_slave_password
  upsmon slave

d. Now let’s tell upsmon to monitor the UPS. Edit /etc/nut/upsmon.conf with the lines below

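Make sure to change the UPS name (ups_name), and to replace [master user] and [master password] with the master user and password from /etc/nut/upsd.users (upsmaster and my_master_password in the example above)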

MONITOR ups_name@localhost 1 [master user] [master password] master

e. Check the status of the NUT server with systemctl status nut-server

f. Enable the NUT server and monitor

➤ sudo systemctl enable --now nut-server.service
➤ sudo systemctl enable --now nut-monitor.service

Timing

It’s important that we understand how the shutdown process works, and how much time each of the slaves takes to power down. At the end of the process the master node will send a shutdown command to the UPS, and the last thing we want is to cut off power to one of our slaves while it is still shutting down.

I have 3 devices connected to my UPS. I have timed their shutdown so I know how much time I need:

  • Desktop - 30s
  • FreeNAS - 60s (without drives)
  • CentOS - 30s

On top of that, I also need to take into consideration the UPS low battery run time (what the low battery threshold is and how long it can run), and what the shutdown delay should be.
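
To put numbers on this, here is a small Python 3 sketch of the arithmetic. It follows the simplified picture used in this section (all devices start shutting down at the same moment, and the UPS starts counting its shutdown delay once the master has finished), not NUT’s exact sequence. The function names are mine, and the delays 20, 60 and 90 are just values to compare.

```python
def power_cut_time(master_shutdown, off_delay):
    """Seconds from the start of the shutdown until the UPS cuts power."""
    return master_shutdown + off_delay


def spare_seconds(shutdown_times, master, off_delay):
    """Spare time between the slowest device powering off and the power cut."""
    cut = power_cut_time(shutdown_times[master], off_delay)
    return cut - max(shutdown_times.values())


# Shutdown times measured above, in seconds.
times = {"desktop": 30, "freenas": 60, "centos": 30}

for delay in (20, 60, 90):
    print(delay, spare_seconds(times, "desktop", delay))
```

A negative result means the power would be cut while the slowest device (FreeNAS in my case) is still shutting down, so the delay needs to be increased.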

Oversimplified chronogram

UPS       |####################### Low Battery Run Time ######################|
          |                             ########### Shutdown Delay ###########|
          |
Master    |######### Shutdown ##########|
(Desktop) |
          |
Slave1    |####################### Shutdown #########################|
(FreeNAS) |
          |
Slave2    |########## Shutdown ##########|
(CentOS)  |
          -------------------------------|---------------------------|--------|
   Time                                 30s                         60s      90s

Let’s get started

a. Start by timing your device with a normal shutdown

b. Now we can change our configuration, and that will vary with your device

Option 1: Use upsrw

You can use upsrw to set the variables, but that sometimes will not work, as it only sets the value on the OS side and doesn’t write it back to the UPS (which is the same for all 3 cases). The variables we are looking for are:

  • battery.charge.low - Threshold in battery percentage at which the UPS should change its status to LB (low battery)
  • battery.runtime.low - Threshold in battery run time (seconds) at which the UPS should change its status to LB (low battery)
  • ups.delay.shutdown - How long the UPS waits after the master has shut down before it cuts off power (the important one we just covered)
  • ups.delay.start - How long the UPS should wait before restoring power. For home use I usually set this to a high value

You can set them with upsrw -s [variable=value] -u [UPS admin user] [ups name]. For example:

upsrw -s ups.delay.shutdown=60 -u upsmaster CyberPowerSL700U

⚠️ WARNING: The main problem with this process is that it does not survive restarts.

Option 2: Use the override directive

You can use the override directive in /etc/nut/ups.conf to override levels reported back from the UPS. Similar to the option above, this will only update the outside world (the master device and not the UPS itself). See UPS.CONF(5) for more info.

Option 3: Use offdelay

This is the method that worked the best for my battery model. I have configured the following 3 variables in /etc/nut/ups.conf.

  • offdelay - How long to wait to cut power - overrides ups.delay.shutdown
  • ondelay - How long to wait before turning on devices again - overrides ups.delay.start
  • lowbatt - Overrides battery.charge.low

Here’s my updated /etc/nut/ups.conf

[CyberPowerSL700U]
	driver = "usbhid-ups"
	port = "auto"
	vendorid = "0764"
	productid = "0501"
	product = "SL Series"
	vendor = "CPS"
	bus = "002"
	offdelay = "90" # How long to wait to cut power - overrides ups.delay.shutdown
	ondelay = "50" # How long to wait before turning on devices again - overrides ups.delay.start
	lowbatt = "30" # Overrides battery.charge.low

c. Restart nut-server.service

➤ sudo systemctl restart nut-server.service

Configuring FreeNAS as the client/slave

a. Login to your FreeNAS and go to Services. Enable the UPS service, set it to start automatically and then click on the edit button

b. Make the following configuration

  • UPS Mode - Slave
  • Identifier - Name for the UPS
  • Remote Host - IP of the master
  • Remote Port - Port
  • Driver - Look for your UPS model. I could not find my model, so I typed in the driver that I’m using on the master host (usbhid-ups) *
  • Port or Hostname - Hostname of the master host

⚠️ WARNING: While this allowed me to save my configuration and the shutdown works, when I access the config page the driver field is blank. This is different than on older versions of FreeNAS where the driver option was only available in Master mode. Beware that it may create problems if you also decide to go this route.

  • Shutdown Mode - I chose ‘UPS reaches low battery’
  • Shutdown Timer - 30 (default)
  • Shutdown Command - I’m using /sbin/shutdown -p now
  • No Communication Warning Time - 60 (default)
  • Monitor User - The slave user you configured in upsd.users on the master
  • Monitor Password - The slave password you configured in upsd.users on the master

Check ‘Send Email Status Updates’ if you’d like to receive emails (also add your email address). Click on save.

c. Check the connection to the UPS with the command below

upsc ups_name@ip:port

d. Check that FreeNAS is actually monitoring the UPS (/var/log/nut/ups.log)

OL - On line (no power failure)

OB - On battery

LB - Low battery
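
If you want to turn those codes into readable text in a script or an alert, a simple lookup is enough. The Python 3 sketch below only knows the three codes listed above. A status can carry more than one flag separated by spaces (for example "OB LB"), and any flag that is not in the table is returned unchanged. describe_status is a name I made up for the example.

```python
STATUS_FLAGS = {
    "OL": "On line (no power failure)",
    "OB": "On battery",
    "LB": "Low battery",
}


def describe_status(status):
    """Expand a status string such as 'OB LB' into readable descriptions."""
    return [STATUS_FLAGS.get(flag, flag) for flag in status.split()]


print(describe_status("OB LB"))
```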

e. Log back in to your master machine and confirm that you can see FreeNAS as a connected client (upsc -c)

Assuming that my FreeNAS IP is 192.168.10.20

$ upsc -c ups_name@localhost
192.168.10.20

Tip: You can also check the status of nut-server.service and it will show you the connected clients

Bonus: Setup CentOS as client/slave

As I mentioned before, I also have a CentOS server that connects to this UPS. So here’s how to configure it.

a. Install nut-client

# yum install -y nut-client

b. Add the monitor line to /etc/ups/upsmon.conf

MONITOR CyberPowerSL700U@10.13.15.200 1 [slave user] [slave pass] slave

c. Enable and start the service

# systemctl enable --now nut-monitor.service

d. Check that you can see the new client in the master (upsc -c [ups name])

# upsc -c CyberPowerSL700U
127.0.0.1
192.168.10.20  # FreeNAS
192.168.10.21  # CentOS
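
If you want to automate this check, you can compare the client list against the IPs you expect to see. This is a minimal Python 3 sketch that works on the text printed by upsc -c. The sample output and the missing_clients helper are illustrative, and the "# FreeNAS" style notes are stripped in case you paste annotated output like mine.

```python
def missing_clients(upsc_output, expected):
    """Return the expected client IPs that do not appear in the upsc -c output."""
    connected = set()
    for line in upsc_output.splitlines():
        address = line.split("#", 1)[0].strip()  # drop notes such as '# FreeNAS'
        if address:
            connected.add(address)
    return sorted(set(expected) - connected)


# Sample output where the CentOS client has not connected yet.
sample = """127.0.0.1
192.168.10.20
"""
print(missing_clients(sample, ["192.168.10.20", "192.168.10.21"]))
```

An empty list means every expected client is connected to the master.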

Testing

Now let’s test to see if it all works. Instead of pulling the cord, we can run upsmon -c fsd on the master (this will save wear and tear on your battery). If we have configured our delay times properly, we should see all our devices powering off, and eventually the UPS as well.

Tip: Because I have quick removable drive bays on my FreeNAS server, I have removed all drives before testing (to be extra careful and to avoid any issues).

➤ sudo upsmon -c fsd