LOG_ENTRY: 2026.03.11 · 20 TELEMETRY_HITS

How a Dead VPN Tunnel Took Down My Deployment Pipeline


Omobayonle Ogundele

MAIN_NODE: DEVOPS_ENGINEER

It was the same night I had just fixed the Harbor IP issue. I was feeling good —
pipeline was green, Harbor was back up, code was pushing. I triggered a fresh
deployment and watched Drone CI kick off the build.

Build step — ✅
Push to Harbor — ✅
Deploy step — ❌

2026/03/10 23:21:34 dial tcp 10.0.0.1:22: i/o timeout

My heart sank. We were so close.


My Setup (Quick Recap)

For context, here's how my deployment infrastructure is wired:

  • Gitea — self-hosted Git server on my local machine
  • Drone CI — self-hosted CI/CD runner, also local
  • Harbor — private Docker registry, also local
  • Oracle Cloud — where the portfolio site actually lives
  • WireGuard — VPN tunnel connecting my local machine to Oracle Cloud

The deploy step works by SSHing into the Oracle Cloud server at its private
WireGuard tunnel IP, 10.0.0.1. Instead of exposing port 22 directly to the
internet, all SSH traffic goes through the encrypted VPN tunnel. It's more
secure and cleaner.

When the tunnel is up, 10.0.0.1 is reachable. When it's down, it's completely
unreachable — like the server doesn't exist.
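A quick way to check which state you're in is to ping the tunnel IP with a short timeout. A dead tunnel hangs rather than refusing the connection, so bound the wait explicitly:

```shell
# One probe, two-second cap (Linux iputils flags): prints "tunnel up" or "tunnel down".
ping -c 1 -W 2 10.0.0.1 >/dev/null 2>&1 && echo "tunnel up" || echo "tunnel down"
```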


The Error

dial tcp 10.0.0.1:22: i/o timeout

An i/o timeout on port 22 usually means one of two things:

  1. The server is down
  2. The network path to the server is broken

I knew the Oracle server was fine — I had just accessed it minutes earlier.
So the network path was broken. And the only thing between Drone and the
Oracle server was the WireGuard tunnel.


Diagnosing the Problem

I SSHed into the Oracle server directly using its public IP to check the
WireGuard status from the server side:

ssh ubuntu@129.146.31.124
sudo wg show

The output told me everything:

interface: wg0
  public key: k4x517u/PCs3CyyxsdoayvlvJTT56xPGLRWyO/dtpnw=
  private key: (hidden)
  listening port: 51820

peer: Gw3AfCezkMigE/jyB6JB/4VJBaZ9b7D7Tf3pTUoyoXY=
  endpoint: 45.222.98.68:46876
  allowed ips: 10.0.0.2/32
  latest handshake: 1 day, 13 hours, 27 minutes, 7 seconds ago
  transfer: 470.41 MiB received, 21.83 MiB sent

1 day, 13 hours, 27 minutes ago.

That's the smoking gun. The latest handshake timestamp shows when the two
WireGuard peers last successfully communicated. Over a day ago meant the tunnel
had been dead since yesterday — probably since my machine restarted or suspended.
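For a sense of scale, that handshake age works out to the following (plain shell arithmetic on the figures above):

```shell
# 1 day, 13 hours, 27 minutes, 7 seconds, expressed in seconds
age=$(( 1*86400 + 13*3600 + 27*60 + 7 ))
echo "$age"   # prints 134827
# WireGuard rekeys roughly every 2 minutes while traffic flows,
# so an age in the hundreds of thousands means the tunnel is long dead.
```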

The Oracle server was sitting there waiting for a connection that was never
coming because my local WireGuard interface had gone down and never came back up.


Why Did the Tunnel Die?

WireGuard is deliberately stateless. It doesn't maintain a persistent session
the way OpenVPN does. When your machine suspends, hibernates, or restarts, the
WireGuard interface goes down, and it won't come back up on its own unless
you've configured something to bring it back.

My setup had PersistentKeepalive set to 25 seconds, which keeps NAT mappings
open during idle periods. But a keepalive only helps while the interface is up;
it can't survive a full system restart or suspend unless systemd brings the
interface back up automatically.
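For reference, that keepalive lives in the client-side config, set per peer. A minimal sketch with placeholder keys (the IPs match my setup above; your paths and addresses will differ):

```ini
# /etc/wireguard/wg0.conf on the local machine
[Interface]
PrivateKey = <local-private-key>
Address    = 10.0.0.2/24

[Peer]
PublicKey  = <server-public-key>
Endpoint   = 129.146.31.124:51820
AllowedIPs = 10.0.0.0/24
# Send a packet every 25 s so NAT mappings stay open while idle;
# note this only helps while the interface itself is up.
PersistentKeepalive = 25
```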


The Fix

On my local machine I restarted WireGuard manually:

sudo wg-quick down wg0
sudo wg-quick up wg0

Then verified the tunnel was back:

sudo wg show
interface: wg0
  public key: Gw3AfCezkMigE/jyB6JB/4VJBaZ9b7D7Tf3pTUoyoXY=
  private key: (hidden)
  listening port: 60712

peer: k4x517u/PCs3CyyxsdoayvlvJTT56xPGLRWyO/dtpnw=
  endpoint: 129.146.31.124:51820
  allowed ips: 10.0.0.0/24
  latest handshake: 9 seconds ago
  transfer: 92 B received, 180 B sent
  persistent keepalive: every 25 seconds

9 seconds ago. Tunnel was alive.

Retriggered the pipeline:

git commit --allow-empty -m "retrigger pipeline"
git push

Build passed. Image deployed. Site was live. ✅


The Permanent Fix

To make sure WireGuard comes back up automatically after every reboot, I
enabled it as a systemd service:

sudo systemctl enable wg-quick@wg0

Now every time the machine starts, systemd brings up the WireGuard interface
automatically before any other services try to use it. The tunnel comes up on
its own, and the pipeline shouldn't fail for this reason again.

Verify it's enabled:

sudo systemctl status wg-quick@wg0

How to Debug This in the Future

If you ever see dial tcp <ip>:22: i/o timeout in a pipeline that uses
WireGuard, here's the exact checklist:

Step 1 — Check if the tunnel is alive:

sudo wg show

Look at latest handshake. If it's more than a few minutes old, the tunnel
is probably dead.

Step 2 — Restart WireGuard:

sudo wg-quick down wg0
sudo wg-quick up wg0

Step 3 — Verify the handshake:

sudo wg show
# latest handshake should be a few seconds ago

Step 4 — Retrigger the pipeline:

git commit --allow-empty -m "retrigger pipeline"
git push

Step 5 — Prevent it happening again:

sudo systemctl enable wg-quick@wg0
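The whole checklist can be folded into a small watchdog script run from cron, so a stale tunnel heals itself. This is a sketch, not something from the original incident; it assumes the interface is named wg0 and that `wg` is on the PATH:

```shell
#!/bin/sh
# wg-watchdog.sh: bounce wg0 when the handshake goes stale.

THRESHOLD=180   # seconds; well past WireGuard's ~120 s rekey interval

# is_stale LAST NOW -> true if the handshake at LAST (epoch seconds)
# is older than THRESHOLD at time NOW; LAST=0 means "never shook hands".
is_stale() {
    [ "$1" -eq 0 ] || [ $(( $2 - $1 )) -gt "$THRESHOLD" ]
}

check_and_restart() {
    # `wg show wg0 latest-handshakes` prints "<peer-pubkey> <epoch-seconds>"
    last=$(sudo wg show wg0 latest-handshakes | awk 'NR==1 {print $2}')
    if is_stale "${last:-0}" "$(date +%s)"; then
        echo "wg0 handshake stale, restarting tunnel"
        sudo wg-quick down wg0
        sudo wg-quick up wg0
    fi
}

if [ "${1:-}" = "run" ]; then
    check_and_restart
fi
```

Install it with something like `*/5 * * * * /usr/local/bin/wg-watchdog.sh run` in crontab, and a dead tunnel fixes itself within five minutes instead of failing a deploy at 11pm.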

Lessons Learned

1. VPN tunnels are not self-healing by default.
WireGuard is lightweight and fast but it doesn't maintain state across
reboots. Always configure it as a systemd service on machines that restart.

2. The latest handshake field is your best debugging tool.
When a WireGuard tunnel is dead, sudo wg show tells you exactly how long
it's been down. No guessing needed.

3. i/o timeout on SSH almost always means a network path problem.
Not a server problem, not a credentials problem — the packets are simply
not getting through. Think about what's between the client and the server.

4. Running CI/CD infrastructure on a laptop is inherently fragile.
Laptops suspend, restart, and lose network connections constantly. Production
CI/CD infrastructure should run on always-on servers. This is fine for
learning and homelab setups but something to keep in mind as you scale.

5. Two failures in one night is a learning opportunity.
In the same session I debugged a DHCP IP change breaking Harbor AND a dead
WireGuard tunnel breaking deployments. Both were network issues. Both taught
me to always think about the network path first when infrastructure stops
responding.


What's Next

I'm planning to move my CI/CD infrastructure off my local machine entirely
and onto dedicated cloud servers. That means:

  • Harbor on a VPS with a fixed IP
  • Drone CI on a VPS
  • No more dependency on my laptop being on and connected

Until then the homelab setup keeps teaching me things you can't learn from
tutorials. Every breakage is a blog post. 🚀


If you found this useful or have a better way to handle WireGuard reliability,
drop a comment below. Always learning.


Omobayonle Ogundele

DevOps Engineer based in Lagos, Nigeria. Building reliable infrastructure and sharing logs from the edge of production.
