
Building a Self-Hosted AI Stack on a $300 Home Server

Nur Ikhwan Idris · 8 min read

I run my own AI stack at home. Local LLM inference, a ChatGPT-like web interface, private search, network-wide ad blocking, monitoring dashboards, and automated workflows — all on a repurposed desktop that cost me about RM1,400 (roughly $300 USD). No monthly subscriptions. No data leaving my network unless I want it to. This post is the full breakdown of how it works, so you can build one too.

Total running cost: electricity. That's it. Compared to $50-100/month for equivalent cloud services (ChatGPT Plus, Cloudflare paid plans, monitoring SaaS), the hardware pays for itself in about three months.

1. Why Self-Host AI?

Four reasons, in order of importance to me:

  • Privacy. My prompts, documents, and search queries stay on my machine. Nothing gets sent to OpenAI, Google, or anyone else unless I explicitly choose to.
  • Cost. ChatGPT Plus is $20/month. Claude Pro is $20/month. GitHub Copilot is $10/month. Add monitoring and cloud storage and you're easily at $50-100/month. A home server is a one-time cost.
  • Control. I pick the models. I pick the versions. I decide when to update. No vendor pulling a model from under me or changing pricing overnight.
  • Learning. Running this stack has taught me more about Docker, networking, DNS, reverse proxies, GPU drivers, and Linux administration than any course could. It's a production environment that I actually use every day.

2. The Hardware

Nothing fancy. This is a repurposed desktop PC — the kind you can find secondhand for cheap. The key specs:

  • GPU: NVIDIA GeForce GTX 1070 with 8GB VRAM — this is the most important component for local LLM inference
  • OS: Ubuntu 24.04 LTS (Server)
  • Storage: Mounted at /mnt/storage/ for all persistent data
  • Hostname: khadam (Arabic for "servant" — seemed fitting)
  • Static IP: 192.168.0.15, locked via netplan so it never drifts

The GTX 1070 is old but still perfectly capable for 7B parameter models. It handles qwen2.5:7b comfortably with room to spare. More on GPU limitations later.
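
The static-IP lock is a small netplan file. A sketch of it, where the interface name (enp3s0) and gateway are assumptions rather than my actual values:

```yaml
# /etc/netplan/01-static.yaml (sketch; interface name and gateway are assumptions)
network:
  version: 2
  ethernets:
    enp3s0:
      dhcp4: false
      addresses: [192.168.0.15/24]
      routes:
        - to: default
          via: 192.168.0.1
      nameservers:
        addresses: [192.168.0.15]   # the server itself answers DNS (AdGuard Home, below)
```

Apply with sudo netplan apply after editing.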


3. The AI Stack

Ollama — LLM Inference Engine

Ollama is the backbone. It manages model downloads, handles GPU offloading, and exposes a local API for inference. Think of it as Docker but for language models — you ollama pull a model and it just works.

# /opt/ollama/docker-compose.yml
services:
  ollama:
    image: ollama/ollama:0.6.2
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - /mnt/storage/ollama:/root/.ollama
    networks:
      - ainet
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

# Shared bridge network so containers in other compose projects can reach
# "ollama" by name (create it once with: docker network create ainet)
networks:
  ainet:
    external: true

Note the 127.0.0.1 bind. This is critical — a plain "11434:11434" mapping publishes the port on every interface, exposing Ollama to your entire network with no authentication. I found this out the hard way during a security audit and locked it down immediately. Only nginx and other local containers should talk to Ollama.
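
Once the container is up, a quick way to sanity-check the API from the host (the model name assumes qwen2.5:7b has already been pulled):

```shell
# Build the request body and sanity-check the JSON before sending it.
PAYLOAD='{"model": "qwen2.5:7b", "prompt": "Why is the sky blue?", "stream": false}'
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# With "stream": false the API returns a single JSON object; the text is in .response
curl -s http://127.0.0.1:11434/api/generate -d "$PAYLOAD" \
  || echo "Ollama not reachable (is the container up?)"
```

This is the same endpoint Open WebUI talks to, so if this works, the chat interface should too.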

Open WebUI — The Chat Interface

Open WebUI gives you a ChatGPT-like web interface that talks to Ollama. It supports multiple models, conversation history, web search integration, and user accounts.

# /opt/openwebui/docker-compose.yml
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:v0.5.20
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "127.0.0.1:8200:8080"
    volumes:
      - /mnt/storage/openwebui:/app/backend/data
    environment:
      # Resolves over the shared "ainet" network (docker network create ainet);
      # without it, separate compose projects can't see each other by name
      - OLLAMA_BASE_URL=http://ollama:11434
    networks:
      - ainet
    extra_hosts:
      - "host.docker.internal:host-gateway"

networks:
  ainet:
    external: true

SearXNG — Private Search

SearXNG is a self-hosted metasearch engine. It aggregates results from Google, Bing, DuckDuckGo, and others without sending your queries to any of them directly — it acts as a proxy. I've integrated it with Open WebUI so the AI can search the web when it needs current information, all through my own search instance.

# /opt/searxng/docker-compose.yml
services:
  searxng:
    image: searxng/searxng:2024.12.23-66d498bd7
    container_name: searxng
    restart: unless-stopped
    ports:
      - "127.0.0.1:8888:8080"
    volumes:
      - /mnt/storage/searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:8888/
    networks:
      - ainet   # shared network, so Open WebUI can query it as http://searxng:8080

networks:
  ainet:
    external: true
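
One detail for the Open WebUI integration: SearXNG only serves machine-readable results if the json format is enabled in its settings. The relevant fragment of the config (everything else stays at defaults):

```yaml
# /mnt/storage/searxng/settings.yml (fragment; other keys left at their defaults)
search:
  formats:
    - html
    - json   # lets clients like Open WebUI request results with ?format=json
```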

4. Docker Setup

Every service runs in Docker. Every service gets its own compose file at /opt/<service>/docker-compose.yml. All persistent data lives on /mnt/storage/. This separation means I can tear down and rebuild any service without affecting others.

The full list of services currently running:

  • AI: Ollama, Open WebUI, SearXNG
  • Productivity: Nextcloud (cloud storage), OnlyOffice (document editing), Immich (photo management), Vaultwarden (password manager)
  • Infrastructure: AdGuard Home (DNS + DHCP), Uptime Kuma (status monitoring), Portainer (container management)
  • Monitoring: Grafana, Prometheus, node_exporter, nvidia_gpu_exporter
  • Automation: n8n (workflow automation), Firecrawl (web scraping)

That's 18+ containers running simultaneously on a machine that cost $300. Docker makes this possible — each service is isolated, reproducible, and independently updatable.


5. Reverse Proxy and Public Access

All services bind to 127.0.0.1 — nothing is directly exposed to the network. Nginx handles all routing based on the subdomain:

# /etc/nginx/conf.d/ai.nurikhwanidris.my.conf
server {
    listen 443 ssl;
    server_name ai.nurikhwanidris.my;

    ssl_certificate     /etc/letsencrypt/live/nurikhwanidris.my/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/nurikhwanidris.my/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8200;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (needed for Open WebUI)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

For public access, I use a Cloudflare Tunnel instead of opening ports on my router. The traffic flow is:

Browser → Cloudflare Edge → cloudflared tunnel → nginx → Docker container

No port forwarding. No dynamic DNS. No exposed IP. Cloudflare handles DDoS protection and SSL termination at the edge, then the tunnel carries traffic securely to my server. The cloudflared daemon runs as a systemd service and reconnects automatically.
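
The tunnel side is a single config file. A sketch, where the tunnel ID and credentials path are placeholders:

```yaml
# /etc/cloudflared/config.yml (sketch; tunnel ID and credentials path are placeholders)
tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json
ingress:
  - hostname: "*.nurikhwanidris.my"
    service: https://localhost:443
    originRequest:
      noTLSVerify: true   # nginx holds the real cert; skip verification on the loopback hop
  - service: http_status:404   # catch-all for hostnames that match no rule
```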

SSL is handled by a wildcard cert for *.nurikhwanidris.my via Let's Encrypt with a Cloudflare DNS challenge — one cert covers every subdomain.


6. Security

Running services from home means security isn't optional. Here's the layered approach:

  • UFW firewall: Default deny incoming. Only ports 22 (SSH, restricted to LAN and Tailscale), 53 (DNS for LAN), 80/443 (nginx), and 41641/udp (Tailscale) are open.
  • 127.0.0.1 binding: Every container binds to localhost only. No service is reachable except through nginx.
  • Cloudflare Tunnel: No ports forwarded on the router. The tunnel is the only path from the internet to my server.
  • Tailscale: Admin services (Grafana, Portainer, AdGuard) are only accessible via Tailscale VPN or the home LAN — they are not exposed through the Cloudflare Tunnel.

# UFW rules summary
$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       192.168.0.0/24
22                         ALLOW       100.64.0.0/10      # Tailscale
53                         ALLOW       192.168.0.0/24      # DNS for LAN
80/tcp                     ALLOW       Anywhere
443/tcp                    ALLOW       Anywhere
41641/udp                  ALLOW       Anywhere            # Tailscale

7. Monitoring

I want to know when things go wrong before they affect me. The monitoring stack is Grafana + Prometheus, with two exporters:

  • node_exporter: CPU, memory, disk, network metrics
  • nvidia_gpu_exporter: GPU temperature, VRAM usage, utilisation, power draw

Prometheus scrapes both exporters every 15 seconds. Grafana dashboards show real-time and historical data. I can see exactly how much VRAM Ollama is using for a given model, which is essential when you're working with 8GB limits.
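
The Prometheus side of that is a short scrape config. A sketch using each exporter's default port (9100 and 9835); adjust if yours differ:

```yaml
# /etc/prometheus/prometheus.yml (sketch; exporter ports are the upstream defaults)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["127.0.0.1:9100"]   # node_exporter
  - job_name: "gpu"
    static_configs:
      - targets: ["127.0.0.1:9835"]   # nvidia_gpu_exporter
```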

Uptime Kuma monitors all public endpoints from inside the network. If any service goes down, I get a notification. It's available at status.nurikhwanidris.my as a public status page.


8. AdGuard Home — Network DNS and DHCP

AdGuard Home does double duty on my network. It's the DNS server for every device in the house (blocking ads and trackers at the DNS level), and it's also the DHCP server — I disabled DHCP on the router entirely.

The key trick is split-horizon DNS. I added a DNS rewrite rule so that *.nurikhwanidris.my resolves to 192.168.0.15 (the server's local IP) when queried from inside the network. This means:

  • From the internet: traffic goes through Cloudflare Tunnel (secure, filtered)
  • From home: traffic goes directly to nginx on the LAN (fast, no round trip to Cloudflare)
  • Tailscale-only services (Grafana, Portainer, AdGuard) work on the home LAN without needing Tailscale connected
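
The rewrite itself is a single rule, added in the AdGuard UI under Filters → DNS rewrites. It lands in AdGuardHome.yaml roughly like this (the exact section has moved between AdGuard Home versions, so treat this as a sketch):

```yaml
# AdGuardHome.yaml (fragment; newer versions keep rewrites under "filtering")
filtering:
  rewrites:
    - domain: "*.nurikhwanidris.my"
      answer: 192.168.0.15
```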

One gotcha: I named the AdGuard subdomain perisai.nurikhwanidris.my instead of adguard.nurikhwanidris.my because having "adguard" in a domain triggers AdGuard's own filter lists. Took me a while to figure out why I couldn't reach my own admin panel.


9. The Automation Layer

The AI stack becomes significantly more powerful when you can automate workflows around it. Two tools handle this:

  • n8n: A self-hosted workflow automation platform (like Zapier, but free and local). I use it to chain together triggers, API calls, AI inference, and notifications. For example: monitor an RSS feed, summarise new articles with Ollama, and send the summary to a Telegram channel.
  • Firecrawl: A self-hosted web scraping engine. It handles JavaScript rendering, extracts clean text from web pages, and feeds it into n8n workflows or directly into Ollama for analysis.

The combination of n8n + Firecrawl + Ollama means I can build AI-powered automation workflows that run entirely on my own hardware. No API rate limits, no per-token costs, no data leaving my network.


10. Lessons Learned (The Hard Way)

Pin your Docker image versions

This is the single most important lesson. I was running Immich (a self-hosted Google Photos alternative) with image: immich/immich-server:release — a floating tag that always pulls the latest version. One morning, a docker compose pull grabbed a major version bump that included a breaking PostgreSQL migration. The database was wiped.

I lost photos. Not all of them (I had backups), but enough to learn the lesson permanently. Now every compose file uses pinned versions:

# WRONG - floating tag, will break eventually
image: immich/immich-server:release

# CORRECT - pinned version, upgrade deliberately
image: immich/immich-server:v1.118.2

GPU memory is a hard ceiling

The GTX 1070 has 8GB VRAM. The gemma2:9b model needs about 7.5GB — it loads, but long conversations eventually cause out-of-memory crashes. The solution is simple: use qwen2.5:7b instead. It's smaller, fits comfortably, and in my experience gives comparable quality for general assistant tasks.

If you're choosing hardware for a home AI server, VRAM is the spec that matters most. 8GB is the minimum for useful models. 16GB (RTX 4060 Ti) opens up 13B models. 24GB (RTX 3090/4090) lets you run practically anything that fits in a single GPU.

Ollama binds to 0.0.0.0 by default

This one surprised me. The Ollama Docker image listens on all interfaces inside the container, and the usual docker run example publishes port 11434 on every host interface too — meaning anyone on your network (or the internet, if you've forwarded ports) can use your GPU for inference with zero authentication. Always publish the port on 127.0.0.1 only.

AdGuard blocks itself

If your AdGuard admin panel domain contains the word "adguard", the filter lists will block it. Use a different name. I went with "perisai" (Malay for "shield").

Stop AdGuard before editing its config

AdGuard writes to its YAML config while running. If you edit the file while the container is up, your changes get overwritten. Always docker stop adguard-app first, edit, then docker start adguard-app.


11. The Cost Breakdown

Here's what this replaces:

  • ChatGPT Plus: $20/month
  • Cloud storage (Google One / iCloud): $3-10/month
  • Password manager (1Password/Bitwarden): $3-5/month
  • Monitoring SaaS (Datadog/New Relic): $15-30/month
  • DNS filtering (NextDNS/Pi-hole cloud): $2-5/month
  • Automation (Zapier): $20-50/month

Conservative estimate: $50-100/month in cloud services, replaced by a one-time $300 hardware investment plus electricity (roughly $5-10/month for a desktop running 24/7). The server paid for itself within the first three months.

The trade-off is time. Setting this up took me several weekends of troubleshooting Docker networking, GPU driver issues, DNS loops, and firewall misconfigurations. But that time investment compounds — every new service I add takes minutes now, not hours, because the infrastructure is already in place.


12. Getting Started

If you want to replicate this, here's the minimum viable setup:

  1. Get a machine with an NVIDIA GPU. A used GTX 1070/1080 is fine. Install Ubuntu Server 24.04 and the NVIDIA Container Toolkit.
  2. Install Docker and the Compose plugin. Create /opt/ollama/docker-compose.yml and /opt/openwebui/docker-compose.yml using the examples above.
  3. Pull a model. Run docker exec ollama ollama pull qwen2.5:7b and you have a local LLM.
  4. Set up nginx. One vhost per service, proxy_pass to the container port.
  5. Lock it down. UFW on, all containers bound to 127.0.0.1, Cloudflare Tunnel for external access.

You can have a working local AI chat interface in under an hour. Everything else — Nextcloud, monitoring, AdGuard, automation — is incremental. Add services as you need them. The beauty of Docker is that each one is isolated and disposable.

The server has been running for over a month now with essentially zero downtime. It handles my daily AI chat, stores my photos, manages my passwords, blocks ads on every device in the house, and monitors itself. All on a $300 machine sitting quietly in the corner.

Questions or want to see the full compose files? Reach out via the contact section.