Troubleshooting: Podman OpenArchiver Redis Service Failure


Hey guys! Today, we're diving into a common issue that many of you might encounter: the failure of the podman-openarchiver-redis.service. This can be a real headache, especially when you're relying on this service for your applications. So, let's break down a recent service failure report, figure out what went wrong, and how to fix it.

Understanding the Service Failure Report

First off, let's take a look at the failure report for podman-openarchiver-redis.service on the storage host. This report gives us a ton of crucial information about what happened. Here’s the gist of it:

  • Service: podman-openarchiver-redis.service
  • Host: storage
  • Time: 2025-11-06 22:40:57
  • Failure Count: 1
  • Exit Code: 137 (SIGKILL - killed)

Exit code 137 is a big clue here. It is 128 + 9, which means the process was terminated by signal 9 (SIGKILL). That usually happens when the kernel's OOM killer steps in, or, as we'll see below, when a container runtime force-kills a process that refuses to respond to SIGTERM. This is our starting point for digging deeper.
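
If you want a quick sanity check on that arithmetic, any POSIX-ish shell can decode the signal number for you:

    # 137 = 128 + 9, i.e. the process was killed by signal 9
    kill -l $((137 - 128))   # prints: KILL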

Diving into the Service Status

Let's check the service status (systemctl status podman-openarchiver-redis.service) to get more context. Here's a snippet from the output:

× podman-openarchiver-redis.service
     Loaded: loaded (/etc/systemd/system/podman-openarchiver-redis.service; enabled; preset: ignored)
     Active: failed (Result: exit-code) since Thu 2025-11-06 22:38:56 EST; 2min 0s ago
   Duration: 1d 7h 52min 21.101s
 Invocation: 680ab31c5c1e43c8b132014a5042e33b
    Process: 2874758 ExecStart=/nix/store/b1lrhk245mbmjlbzd28200i3rnspl603-unit-script-podman-openarchiver-redis-start/bin/podman-openarchiver-redis-start (code=exited, status=137)
    Process: 831085 ExecStop=/nix/store/xyps01lhpw19lkh2rdcc8hhgadwpnmy7-unit-script-podman-openarchiver-redis-pre-stop/bin/podman-openarchiver-redis-pre-stop (code=exited, status=0/SUCCESS)
    Process: 831426 ExecStopPost=/nix/store/ssb1zrrrmikc5ik5azvjy8fs3q9aw3am-unit-script-podman-openarchiver-redis-post-stop/bin/podman-openarchiver-redis-post-stop (code=exited, status=0/SUCCESS)
   Main PID: 2874758 (code=exited, status=137)
         IP: 0B in, 2.5K out
         IO: 9.2M read, 2.3M written
   Mem peak: 42.7M (swap: 104K)
        CPU: 5.645s

Nov 06 22:38:53 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:53.159 # Background saving error
Nov 06 22:38:56 storage podman-openarchiver-redis-pre-stop[831089]: time="2025-11-06T22:38:56-05:00" level=warning msg="StopSignal SIGTERM failed to stop container openarchiver-redis in 10 seconds, resorting to SIGKILL"
Nov 06 22:38:56 storage podman[831089]: 2025-11-06 22:38:56.147062731 -0500 EST m=+10.053468496 container died a4527bc94755488648d2a54892298e37e517e40a30a2ccfef0817cc13e80f5a3 (image=docker.io/valkey/valkey:8-alpine, name=openarchiver-redis, PODMAN_SYSTEMD_UNIT=podman-openarchiver-redis.service)
Nov 06 22:38:56 storage podman[831089]: 2025-11-06 22:38:56.405368663 -0500 EST m=+10.311774436 container remove a4527bc94755488648d2a54892298e37e517e40a30a2ccfef0817cc13e80f5a3 (image=docker.io/valkey/valkey:8-alpine, name=openarchiver-redis, PODMAN_SYSTEMD_UNIT=podman-openarchiver-redis.service)
Nov 06 22:38:56 storage podman-openarchiver-redis-pre-stop[831089]: a4527bc94755488648d2a54892298e37e517e40a30a2ccfef0817cc13e80f5a3
Nov 06 22:38:56 storage systemd[1]: podman-openarchiver-redis.service: Main process exited, code=exited, status=137/n/a
Nov 06 22:38:56 storage systemd[1]: podman-openarchiver-redis.service: Failed with result 'exit-code'.
Nov 06 22:38:56 storage systemd[1]: Stopped podman-openarchiver-redis.service.
Nov 06 22:38:56 storage systemd[1]: podman-openarchiver-redis.service: Consumed 5.645s CPU time, 42.7M memory peak, 104K memory swap peak, 9.2M read from disk, 2.3M written to disk, 2.5K outgoing IP traffic.
Nov 06 22:38:56 storage systemd[1]: podman-openarchiver-redis.service: Triggering OnFailure= dependencies.

Key observations:

  • The service failed with exit code 137 (SIGKILL).
  • Podman, invoked by the pre-stop script, resorted to SIGKILL after SIGTERM failed to stop the container within 10 seconds.
  • A "Background saving error" was logged just seconds before the shutdown attempt.
  • The memory peak was a modest 42.7M with only 104K of swap, which makes an out-of-memory kill unlikely.

Examining Recent Logs

To really get to the bottom of this, let’s dive into the logs. Log snippets can reveal specific errors or warnings that led to the failure.

Nov 06 22:38:56 storage systemd[1]: podman-openarchiver-redis.service: Triggering OnFailure= dependencies.
Nov 06 22:38:56 storage systemd[1]: podman-openarchiver-redis.service: Consumed 5.645s CPU time, 42.7M memory peak, 104K memory swap peak, 9.2M read from disk, 2.3M written to disk, 2.5K outgoing IP traffic.
Nov 06 22:38:56 storage systemd[1]: Stopped podman-openarchiver-redis.service.
Nov 06 22:38:56 storage systemd[1]: podman-openarchiver-redis.service: Failed with result 'exit-code'.
Nov 06 22:38:56 storage systemd[1]: podman-openarchiver-redis.service: Main process exited, code=exited, status=137/n/a
Nov 06 22:38:56 storage podman-openarchiver-redis-pre-stop[831089]: a4527bc94755488648d2a54892298e37e517e40a30a2ccfef0817cc13e80f5a3
Nov 06 22:38:56 storage podman[831089]: 2025-11-06 22:38:56.405368663 -0500 EST m=+10.311774436 container remove a4527bc94755488648d2a54892298e37e517e40a30a2ccfef0817cc13e80f5a3 (image=docker.io/valkey/valkey:8-alpine, name=openarchiver-redis, PODMAN_SYSTEMD_UNIT=podman-openarchiver-redis.service)
Nov 06 22:38:56 storage podman[831089]: 2025-11-06 22:38:56.147062731 -0500 EST m=+10.053468496 container died a4527bc94755488648d2a54892298e37e517e40a30a2ccfef0817cc13e80f5a3 (image=docker.io/valkey/valkey:8-alpine, name=openarchiver-redis, PODMAN_SYSTEMD_UNIT=podman-openarchiver-redis.service)
Nov 06 22:38:56 storage podman-openarchiver-redis-pre-stop[831089]: time="2025-11-06T22:38:56-05:00" level=warning msg="StopSignal SIGTERM failed to stop container openarchiver-redis in 10 seconds, resorting to SIGKILL"
Nov 06 22:38:53 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:53.159 # Background saving error
Nov 06 22:38:53 storage openarchiver-redis[2874758]: 14025:C 06 Nov 2025 22:38:53.059 # Failed opening the temp RDB file temp-14025.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:53 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:53.059 * Background saving started by pid 14025
Nov 06 22:38:53 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:53.058 * 1 changes in 3600 seconds. Saving...
Nov 06 22:38:47 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:47.201 # Background saving error
Nov 06 22:38:47 storage openarchiver-redis[2874758]: 14024:C 06 Nov 2025 22:38:47.101 # Failed opening the temp RDB file temp-14024.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:47 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:47.100 * Background saving started by pid 14024
Nov 06 22:38:47 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:47.099 * 1 changes in 3600 seconds. Saving...
Nov 06 22:38:46 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:46.192 # Errors trying to shut down the server. Check the logs for more information.
Nov 06 22:38:46 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:46.192 # Error trying to save the DB, can't exit.
Nov 06 22:38:46 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:46.192 # Failed opening the temp RDB file temp-1.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:46 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:46.192 * Saving the final RDB snapshot before exiting.
Nov 06 22:38:46 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:46.192 * User requested shutdown...
Nov 06 22:38:46 storage openarchiver-redis[2874758]: 1:signal-handler (1762486726) Received SIGTERM scheduling shutdown...
Nov 06 22:38:46 storage systemd[1]: Stopping podman-openarchiver-redis.service...
Nov 06 22:38:41 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:41.148 # Background saving error
Nov 06 22:38:41 storage openarchiver-redis[2874758]: 14023:C 06 Nov 2025 22:38:41.047 # Failed opening the temp RDB file temp-14023.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:41 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:41.047 * Background saving started by pid 14023
Nov 06 22:38:41 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:41.046 * 1 changes in 3600 seconds. Saving...
Nov 06 22:38:35 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:35.192 # Background saving error
Nov 06 22:38:35 storage openarchiver-redis[2874758]: 14022:C 06 Nov 2025 22:38:35.091 # Failed opening the temp RDB file temp-14022.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:35 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:35.091 * Background saving started by pid 14022
Nov 06 22:38:35 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:35.090 * 1 changes in 3600 seconds. Saving...
Nov 06 22:38:29 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:29.132 # Background saving error
Nov 06 22:38:29 storage openarchiver-redis[2874758]: 14021:C 06 Nov 2025 22:38:29.031 # Failed opening the temp RDB file temp-14021.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:29 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:29.031 * Background saving started by pid 14021
Nov 06 22:38:29 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:29.030 * 1 changes in 3600 seconds. Saving...
Nov 06 22:38:23 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:23.178 # Background saving error
Nov 06 22:38:23 storage openarchiver-redis[2874758]: 14020:C 06 Nov 2025 22:38:23.077 # Failed opening the temp RDB file temp-14020.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:23 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:23.077 * Background saving started by pid 14020
Nov 06 22:38:23 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:23.076 * 1 changes in 3600 seconds. Saving...
Nov 06 22:38:17 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:17.122 # Background saving error
Nov 06 22:38:17 storage openarchiver-redis[2874758]: 14019:C 06 Nov 2025 22:38:17.021 # Failed opening the temp RDB file temp-14019.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:17 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:17.021 * Background saving started by pid 14019
Nov 06 22:38:17 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:17.020 * 1 changes in 3600 seconds. Saving...
Nov 06 22:38:11 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:11.168 # Background saving error
Nov 06 22:38:11 storage openarchiver-redis[2874758]: 14018:C 06 Nov 2025 22:38:11.067 # Failed opening the temp RDB file temp-14018.rdb (in server root dir /data) for saving: Permission denied
Nov 06 22:38:11 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:11.067 * Background saving started by pid 14018
Nov 06 22:38:11 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:11.066 * 1 changes in 3600 seconds. Saving...
Nov 06 22:38:05 storage openarchiver-redis[2874758]: 1:M 06 Nov 2025 22:38:05.119 # Background saving error
Nov 06 22:38:05 storage openarchiver-redis[2874758]: 14017:C 06 Nov 2025 22:38:05.019 # Failed opening the temp RDB file temp-14017.rdb (in server root dir /data) for saving: Permission denied

Here’s what we can gather:

  • Repeated "Background saving error" messages, one every few seconds.
  • Every save attempt fails with "Failed opening the temp RDB file ... for saving: Permission denied" in the server root dir /data.
  • On shutdown, Redis received SIGTERM, tried to save a final RDB snapshot, hit the same permission error, logged "Error trying to save the DB, can't exit", and kept running until Podman escalated to SIGKILL after its 10-second timeout.
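
To see how long this has been going on, you can filter the unit's journal for just the relevant lines; the --since window below is only an example, so widen it as needed:

    journalctl -u podman-openarchiver-redis.service --since "2025-11-06 22:00" \
      | grep -iE "permission denied|background saving error"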

Potential Causes and Solutions

Okay, so what’s really going on here? Based on the logs, we can identify a couple of key issues:

1. Permission Issues

The most glaring issue is the Permission denied error every time Redis tries to save its RDB snapshot. The Redis process inside the container cannot write to /data, its configured data directory. Keep in mind that /data is the path inside the container; on the host it corresponds to whatever directory or volume is mounted there. Let's fix this:

  • Solution: Check the permissions of the /data directory. You’ll want to ensure that the user running the Redis container has read and write access. Use commands like ls -l /data to inspect permissions and chown or chmod to adjust them. For example:

    sudo chown -R redis_user:redis_group /data
    sudo chmod -R 775 /data
    

    Replace redis_user and redis_group with the user and group your Redis process actually runs as, and if /data is a bind mount, apply the commands to the host directory that backs it.
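
One caveat: if the container runs under rootless Podman, the container's redis user maps to a shifted UID on the host, so a plain chown of the host directory may not line up. Here's a minimal sketch, assuming a hypothetical host path backing the /data mount and an example UID of 999 (check what actually owns the existing files, or what the image documents, before running the chown):

    # Hypothetical host directory backing the container's /data mount -- adjust this.
    DATA_DIR=/path/to/openarchiver/redis-data

    # See which numeric UID owns the files Redis wrote while it was still healthy.
    ls -ln "$DATA_DIR"

    # For rootless Podman, chown from inside the user namespace so the container-side
    # UID (999 here, purely as an example) maps to the correct host UID.
    podman unshare chown -R 999:999 "$DATA_DIR"

Alternatively, Podman's U volume option (for example -v $DATA_DIR:/data:U) asks Podman to chown the mount to the container user when the container starts, if you would rather handle it in the run command.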

2. Resource Limits (Memory)

Although the memory peak of 42.7M looks perfectly reasonable, exit code 137 always deserves a quick look at resource limits. Here the SIGKILL came from Podman's stop timeout rather than the kernel OOM killer, so memory pressure is an unlikely cause, but it costs little to rule out a cap imposed by the container runtime or the system. Let's investigate this angle.

  • Solution:
    • Check Resource Limits: Review the resource limits set for the Podman container. You can do this by inspecting the Podman service definition or using podman inspect on the container.

      podman inspect <container_name_or_id> | grep -i memory
      
    • Increase Memory Limits: If the memory limit is too low, increase it. You can adjust the memory limits in your Podman service file (e.g., the systemd unit file) or when you run the container.

      podman run -d --memory=128m --name openarchiver-redis <your_image>
      
    • Monitor Memory Usage: Use tools like top, htop, or podman stats to monitor the memory usage of the container over time. This will help you understand if the current limits are sufficient.
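
Since this unit is generated and managed by systemd (the /nix/store paths suggest a NixOS host), a memory cap could also be imposed at the systemd level rather than on the podman command line. Here's a quick sketch for checking and, if needed, raising it; the 256M value is only an example, and on NixOS you would normally make this change in your configuration rather than through a manual drop-in:

    # Show any systemd-level memory limits on the unit.
    systemctl show podman-openarchiver-redis.service -p MemoryMax -p MemoryHigh

    # Raise the cap with a drop-in override (opens an editor for the override file):
    sudo systemctl edit podman-openarchiver-redis.service
    # In the override, add:
    #   [Service]
    #   MemoryMax=256M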

3. Background Saving Issues

The logs show recurring "Background saving error" messages. Redis uses background saving to persist data to disk without blocking the main process. If background saving fails repeatedly, it can lead to instability.

  • Solution:
    • Address Permissions First: Since the primary error is permission-related, fixing the permissions on the /data directory should resolve this issue. If not, further investigate disk space or I/O issues.
    • Check Disk Space: Ensure there is enough free disk space for Redis to create temporary RDB files during the saving process.
    • Monitor I/O: High disk I/O can also cause background saving to fail. Monitor disk I/O using tools like iotop or iostat.
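
To rule out the disk-space angle and double-check the persistence settings, a couple of quick commands help. The host path below is hypothetical (use whatever directory backs the /data mount), and valkey-cli is assumed to be available in the valkey image; redis-cli works the same way:

    # Free space on the filesystem backing the data directory (hypothetical path).
    df -h /path/to/openarchiver/redis-data

    # Once the container is running again, confirm the save directory and snapshot schedule.
    podman exec openarchiver-redis valkey-cli CONFIG GET dir
    podman exec openarchiver-redis valkey-cli CONFIG GET save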

Steps to Resolve the Issue

Alright, let's put together a step-by-step guide to fix this:

  1. Check and Correct Permissions:

    • Use ls -l /data to inspect permissions.
    • Use sudo chown -R redis_user:redis_group /data to change ownership.
    • Use sudo chmod -R 775 /data to adjust permissions.
  2. Review Resource Limits:

    • Use podman inspect <container_name_or_id> | grep -i memory to check memory limits.
  3. Increase Memory Limits (If Necessary):

    • Modify your Podman service file or use podman run -d --memory=128m.
  4. Monitor Memory Usage:

    • Use top, htop, or podman stats to monitor container memory usage.
  5. Restart the Service:

    • After making changes, restart the service using sudo systemctl restart podman-openarchiver-redis.service.
  6. Check Logs Again:

    • Monitor the logs using journalctl -u podman-openarchiver-redis.service -f to ensure the service is running smoothly and there are no new errors.
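
If it helps, here is the whole sequence as one rough shell sketch. It assumes a rootful system service (which the unit above suggests), a hypothetical host data directory, and UID 999 for the container user; verify both before running, and use the podman unshare approach from earlier if your setup is rootless:

    #!/usr/bin/env bash
    # Rough recovery sketch -- adjust DATA_DIR and REDIS_UID to your environment first.
    set -euo pipefail

    DATA_DIR=/path/to/openarchiver/redis-data   # host dir mounted at /data (hypothetical)
    REDIS_UID=999                                # UID the container runs as (verify first)

    # 1. Fix ownership and permissions on the data directory.
    sudo chown -R "${REDIS_UID}:${REDIS_UID}" "$DATA_DIR"
    sudo chmod -R u+rwX,g+rX "$DATA_DIR"

    # 2. Restart the service.
    sudo systemctl restart podman-openarchiver-redis.service

    # 3. Watch the logs for any new "Permission denied" errors.
    journalctl -u podman-openarchiver-redis.service -f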

Conclusion

Service failures like this can be intimidating, but by systematically analyzing the logs and service status, we can pinpoint the root causes and implement effective solutions. In this case, a permission problem on the Redis data directory was the primary culprit, with resource limits worth ruling out as a secondary factor. By addressing these, we can get the podman-openarchiver-redis.service back up and running smoothly.

Remember, troubleshooting is a process. Always start with the logs, understand the error messages, and work through potential solutions methodically. You got this! If you have any questions or run into other issues, don't hesitate to reach out. Happy troubleshooting!