Self-Hosted AI with Web Search: Complete Build Guide
What We Built, What Broke, and How We Fixed It
March 1, 2026
by SwissLayer · 7 min read

We run a small team that uses AI heavily for research, coding, and day-to-day work. We already had a self-hosted Ollama server running on a dedicated Ubuntu machine with five large language models loaded. Our team connects to it from various locations — browser via Open WebUI, mobile via VPN, and from agent frameworks like OpenClaw running on separate VMs.

The problem was simple: our models had no access to the internet. Ask them about current events, today's prices, recent news — and they would either hallucinate or admit they did not know. We wanted to fix this without giving up control of our infrastructure, without paying per-query API fees, and without routing our queries through third-party services.

What We Built

The goal was:

  • Every client on our network can ask questions that require current information
  • The AI automatically searches the web when needed, without the user having to do anything special
  • All search traffic stays within our infrastructure (no Google API keys, no Bing subscriptions)
  • Authentication so we know who is using what
  • Everything survives a reboot

This guide documents exactly how we built it, what broke along the way, and how we fixed it.

The Architecture

Before we start, here is the full picture of what we are building:

Open WebUI connects directly to:

  • :11434 (Ollama)
  • :8080 (SearXNG) via built-in integration

All other clients connect to :11436 (Middleware) which talks to:

  • :11434 (Ollama, native API)
  • :8080 (SearXNG, web search)

LiteLLM runs on :11435 connecting to :11434 (Ollama) — provides management dashboard, optional for chat.

Key components:

  • Ollama — already running on :11434, serves our five models
  • SearXNG — self-hosted meta search engine, runs in Docker on :8080, internal only
  • Middleware — a Python script we wrote, runs on :11436, handles authentication and the web search tool loop for all non-WebUI clients
  • LiteLLM — runs on :11435, provides a management dashboard and per-user key management
  • Open WebUI — connects directly to Ollama on :11434 and uses its own built-in SearXNG integration

Why not route everything through one component?

Open WebUI already handles Ollama and web search natively, and it handles thinking model responses correctly. There was no benefit to adding an extra layer for it. For other clients (agent frameworks, API callers, mobile apps), the middleware provides a single authenticated endpoint with web search built in.
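To make the routing concrete, here is what an API client's call to the middleware looks like. This is a minimal sketch, assuming the middleware exposes an OpenAI-compatible /v1/chat/completions route with bearer-token auth; the key format and model name are illustrative, not the actual values:

```python
def build_chat_request(api_key: str, model: str, question: str):
    """Headers and body for an OpenAI-compatible chat call to the middleware."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return headers, body

headers, body = build_chat_request("sk-team-key", "qwen3",
                                   "What happened in the news today?")
# POST the JSON-encoded body to http://YOUR_SERVER_IP:11436/v1/chat/completions
```

The client never touches Ollama or SearXNG directly; whether a web search happens is the middleware's decision, driven by the model's tool calls.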

Server Setup

Our server (brain01):

  • OS: Ubuntu 22.04.5 LTS
  • Ollama version: 0.17.2
  • Public IP: static (replace YOUR_SERVER_IP throughout this guide)
  • Models: five models loaded (Qwen3, Qwen3.5, Dolphin Mixtral variants)

Step 1: Install Docker

We need Docker to run SearXNG. Do not use the snap version — it causes permission issues. Use the official Docker repository.

Why we are doing this: SearXNG is distributed as a Docker image. Running it in Docker is the cleanest way to manage it — isolated, easily restartable, and upgradeable without affecting the host system.

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) \
  signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker

Verify the installation:

docker --version
# Expected: Docker version 29.x.x

Step 2: Install and Configure SearXNG

What SearXNG is: SearXNG is a self-hosted meta search engine. It does not maintain its own web index. Instead, it sends your query simultaneously to multiple real search engines (Google, Bing, DuckDuckGo, Wikipedia, and others), collects the results, deduplicates them, and returns a combined response.

Why this matters for privacy: Your team's search queries never get associated with individual users' IP addresses or browser fingerprints. The search engines see traffic from one server, not from five people in different locations.

Launch the container:

sudo mkdir -p /opt/searxng
sudo docker run -d \
  --name searxng \
  --restart always \
  -p 8080:8080 \
  -e SEARXNG_BASE_URL="http://YOUR_SERVER_IP:8080/" \
  searxng/searxng:latest

The --restart always flag means SearXNG will automatically restart if it crashes or if the server reboots.

Enable JSON Format (Critical Step)

The gotcha: out of the box, SearXNG answers JSON requests with 403 Forbidden, because only the html format is enabled in its default settings.yml. Our middleware needs JSON, so we must enable it.

# Copy the config file out of the container
sudo docker cp searxng:/etc/searxng/settings.yml /opt/searxng/settings.yml

# Add json below the existing html entry, preserving its indentation
sudo sed -i 's/^\(\s*\)- html$/\1- html\n\1- json/' /opt/searxng/settings.yml

# Restart with config mounted
sudo docker stop searxng && sudo docker rm searxng
sudo docker run -d --name searxng --restart always \
  -p 8080:8080 \
  -e SEARXNG_BASE_URL="http://YOUR_SERVER_IP:8080/" \
  -v /opt/searxng/settings.yml:/etc/searxng/settings.yml \
  searxng/searxng:latest

Test: curl "http://localhost:8080/search?q=test&format=json"
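Once JSON is enabled, the response is straightforward to consume from code. A minimal sketch of how search results can be flattened into snippets for a model prompt; the field names (results, title, url, content) match SearXNG's JSON output, while the function name is ours:

```python
def extract_snippets(searx_json: dict, limit: int = 5) -> list[str]:
    """Flatten a SearXNG JSON response into 'title - url: content' lines."""
    snippets = []
    for r in searx_json.get("results", [])[:limit]:
        snippets.append(
            f"{r.get('title', '')} - {r.get('url', '')}: {r.get('content', '')}"
        )
    return snippets

# A response shaped like SearXNG's JSON output:
sample = {"results": [{"title": "Test", "url": "https://example.com",
                       "content": "A snippet."}]}
print(extract_snippets(sample))
```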

Steps 3-5: LiteLLM, Middleware, systemd

Install LiteLLM for management, write the Python middleware for tool calling loops, and set up systemd services for auto-restart. The middleware handles authentication, injects web_search tools, executes searches via SearXNG, and returns OpenAI-compatible responses.

Full implementation details, complete Python code (200+ lines), LiteLLM config, and systemd service files are available in the original PDF guide.
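The heart of the middleware is the tool-calling loop. The real code is in the PDF guide; this is a condensed sketch of the control flow only, with illustrative function and field names (chat_fn stands in for the Ollama call, search_fn for the SearXNG query):

```python
import json

def run_tool_loop(chat_fn, search_fn, messages, max_rounds=3):
    """Keep calling the model until it answers without requesting a search.

    chat_fn(messages) -> assistant message dict (may contain tool_calls);
    search_fn(query)  -> search result text from SearXNG.
    """
    for _ in range(max_rounds):
        reply = chat_fn(messages)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]  # model answered directly
        messages.append(reply)       # keep the tool request in history
        for call in calls:
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "content": search_fn(args["query"]),  # SearXNG lookup
            })
    return "Search budget exhausted."
```

The max_rounds cap matters: without it, a model that keeps emitting tool calls would loop forever and hammer SearXNG.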

Step 6: Firewall (UFW)

Critical lesson: UFW processes rules top-to-bottom. Don't add explicit DENY rules for ports before ALLOW rules for trusted IPs — the denies will fire first and block your access.

Correct approach: Add ALLOW rules for trusted IPs, rely on UFW's default deny policy for everything else.

sudo ufw allow from YOUR_VPN_IP
sudo ufw allow from YOUR_HOME_IP
sudo ufw allow 22/tcp
sudo ufw enable

Step 7: Open WebUI Configuration

Configure Open WebUI to connect directly to Ollama (:11434) and SearXNG (:8080). Enable web search in Admin Panel → Settings → Web Search. Use the sparkle icon (✦) in conversations to enable search per-chat.

Troubleshooting: What Went Wrong

1. LiteLLM strips content field from thinking models
Symptom: Empty responses from Qwen3
Fix: Bypass LiteLLM for chat — middleware calls Ollama's /api/chat directly

2. SearXNG 403 on JSON requests
Cause: JSON disabled by default
Fix: Add - json to formats in settings.yml, mount as volume

3. UFW blocks trusted IPs
Cause: Explicit DENY rules before ALLOW rules
Fix: Delete explicit denies, rely on default policy

4. OpenClaw web search doesn't work
Cause: OpenClaw intercepts tool calls instead of passing them through to Ollama
Status: Works for non-search tasks; web search incompatible without toolCallPassthrough config

Final Architecture

Port    Service      Purpose
8080    SearXNG      Meta search (internal only)
11434   Ollama       LLM inference
11435   LiteLLM      Management dashboard
11436   Middleware   Auth + web search for API clients

Client routing:

  • Open WebUI → :11434 (Ollama) + :8080 (SearXNG) — native integration
  • API clients → :11436 (Middleware) — automatic web search with auth
  • OpenClaw → :11434 (Ollama) — no web search (tool conflict)

What's Next

  • HTTPS/TLS termination
  • Per-user usage logging and rate limits
  • OpenClaw tool passthrough investigation
  • Automated model rotation and health checks

Published by SwissLayer — self-hosted infrastructure, no $999 stand required.

Need help setting up self-hosted AI infrastructure? Contact us for dedicated servers, VPS, and AI/ML hosting in Switzerland.