
Self-Hosted Ollama Setup Guide

Run your own LLM server and connect it to your website — complete privacy, zero API costs, unlimited usage.

Overview

Your Own AI Server, Your Own Rules

This guide walks you through deploying Ollama on your own server (GPU VPS or Mac), securing it behind Nginx, and connecting it to a public website via a Cloudflare Worker proxy. Your data stays on your hardware, and your API key never touches the browser.

🔒
Total Privacy
All AI processing happens on your server. No data leaves your infrastructure.
💰
Zero Per-Token Cost
No API fees, no subscriptions. Pay only for server hardware.
🌐
Website-Ready
Secure proxy layer keeps your server safe while serving public web traffic.

Architecture

How It Works

Three layers protect your Ollama server while keeping it accessible from the web.

Website
User's browser
CF Worker
Rate limit + CORS
Nginx
SSL + API key auth
Ollama
localhost:11434
Why three layers?
The browser never sees your server URL or API key. The Cloudflare Worker hides credentials and enforces rate limits. Nginx provides SSL and a second authentication layer. Ollama only listens on localhost.

Before You Begin

Prerequisites

| Requirement | Details |
| --- | --- |
| Server | GPU VPS (RunPod, Lambda, Hetzner) or Mac with Apple Silicon |
| Operating System | Ubuntu 22.04+ (Linux) or macOS 13+ (Mac) |
| RAM | 16 GB minimum, 32 GB+ recommended |
| Domain | A domain pointing to your server (e.g. ollama.example.com) |
| Cloudflare Account | Free tier is sufficient for the Worker proxy |
| SSL Certificate | Let's Encrypt (free) or Cloudflare Origin Certificate |

Step 1

Install Ollama & Pull Models

Linux (VPS)

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.1:8b

# Verify installation
ollama list
```

macOS

```bash
# Install via Homebrew
brew install ollama
# Or download from https://ollama.com/download

# Pull a model
ollama pull llama3.1:8b
```
Note
By default Ollama listens on 127.0.0.1:11434. Since Nginx runs on the same machine, no change is needed. If they're on different hosts, set OLLAMA_HOST=0.0.0.0 in the systemd unit or via launchctl.
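For the split-host case on Linux, one way to set the variable is a systemd drop-in (a sketch; create it with sudo systemctl edit ollama, then restart the service):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

Only do this if a firewall restricts who can reach port 11434; exposing Ollama directly defeats the layered setup described here.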

Step 2

Configure Nginx Reverse Proxy

Nginx provides SSL termination, API key authentication, and rate limiting in front of Ollama.

Install Nginx & Certbot

```bash
# Ubuntu/Debian
sudo apt update && sudo apt install -y nginx certbot python3-certbot-nginx

# macOS
brew install nginx
```

Nginx Configuration

Create /etc/nginx/sites-available/ollama-api:

```nginx
upstream ollama {
    server 127.0.0.1:11434;
}

# Rate limiting: 15 requests per minute per IP
limit_req_zone $binary_remote_addr zone=ollama_limit:10m rate=15r/m;

server {
    listen 80;
    server_name ollama.example.com;

    location /api/chat {
        # API key authentication
        if ($http_x_api_key != "YOUR_SECRET_API_KEY") {
            return 401;
        }

        limit_req zone=ollama_limit burst=5 nodelay;

        proxy_pass http://ollama/api/chat;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 120s;
        proxy_buffering off;
    }

    # Block everything else
    location / {
        return 403;
    }
}
```

Enable & Secure

```bash
# Enable the site
sudo ln -s /etc/nginx/sites-available/ollama-api /etc/nginx/sites-enabled/

# Get SSL certificate
sudo certbot --nginx -d ollama.example.com

# Generate a strong API key
openssl rand -hex 32

# Reload Nginx
sudo nginx -t && sudo systemctl reload nginx
```
Important
Replace YOUR_SECRET_API_KEY with the key generated by openssl rand -hex 32. Save it securely — you'll need it for the Cloudflare Worker in Step 3.
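Before moving on, you can smoke-test the Nginx layer with curl (substitute your domain and the key you generated; the request body shape follows Ollama's /api/chat API):

```bash
curl -s https://ollama.example.com/api/chat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_SECRET_API_KEY" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "Hi"}], "stream": false}'
```

A 401 means the key doesn't match the Nginx config; a JSON reply with a message field means the whole Nginx-to-Ollama chain works.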

Step 3

Deploy Cloudflare Worker Proxy

The Worker sits between your website and the Ollama server. It hides your server URL and API key from the browser and enforces CORS so that only your website's origin can call it; per-IP rate limiting is handled by the Nginx layer from Step 2.

Project Structure

```
workers/ollama-proxy/
├── wrangler.toml
└── src/
    └── index.js
```

wrangler.toml

```toml
name = "ollama-proxy"
main = "src/index.js"
compatibility_date = "2026-03-31"

[vars]
ALLOWED_ORIGIN = "https://your-website.com"
```

Worker Code (src/index.js)

The Worker validates incoming requests and forwards messages to your Ollama server, attaching the API key server-side. This minimal version returns non-streaming responses; the tip in Step 4 covers streaming.

```javascript
export default {
  async fetch(request, env) {
    const origin = env.ALLOWED_ORIGIN;
    const corsHeaders = {
      'Access-Control-Allow-Origin': origin,
      'Access-Control-Allow-Methods': 'POST, OPTIONS',
      'Access-Control-Allow-Headers': 'Content-Type',
    };

    // CORS preflight
    if (request.method === 'OPTIONS') {
      return new Response(null, { status: 204, headers: corsHeaders });
    }

    // Only accept POST requests from the allowed origin
    if (request.method !== 'POST') {
      return new Response('Method Not Allowed', { status: 405 });
    }
    if (request.headers.get('Origin') !== origin) {
      return new Response('Forbidden', { status: 403 });
    }

    // Parse and validate
    const { message, model } = await request.json();
    if (!message || typeof message !== 'string') {
      return new Response('Bad Request', { status: 400, headers: corsHeaders });
    }

    // Forward to Ollama
    const res = await fetch(`${env.OLLAMA_URL}/api/chat`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-API-Key': env.OLLAMA_API_KEY,
      },
      body: JSON.stringify({
        model: model || 'llama3.1:8b',
        messages: [{ role: 'user', content: message }],
        stream: false,
      }),
    });
    if (!res.ok) {
      return new Response('Upstream error', { status: 502, headers: corsHeaders });
    }

    const data = await res.json();
    return Response.json({ reply: data.message.content }, { headers: corsHeaders });
  },
};
```

Deploy the Worker

```bash
cd workers/ollama-proxy

# Set secrets (never put these in code)
npx wrangler secret put OLLAMA_URL      # Enter: https://ollama.example.com
npx wrangler secret put OLLAMA_API_KEY  # Enter: your API key from Step 2

# Deploy
npx wrangler deploy
```
Result
Your Worker is now live at https://ollama-proxy.your-subdomain.workers.dev. Only POST requests from your allowed origin will be accepted.
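You can confirm the deployment from the command line (substitute your Worker URL; the Origin header mirrors what a browser on your site would send):

```bash
curl -s https://ollama-proxy.your-subdomain.workers.dev \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Origin: https://your-website.com" \
  -d '{"message": "What is local AI?"}'
```

A successful call returns a JSON object with a reply field, per the Worker code above.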

Step 4

Frontend Integration

Add a chat widget to your website that sends messages to the Cloudflare Worker.

```javascript
const WORKER_URL = 'https://ollama-proxy.your-subdomain.workers.dev';

async function askAI(message) {
  const res = await fetch(WORKER_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  const data = await res.json();
  return data.reply;
}

// Usage
const reply = await askAI('What is local AI?');
console.log(reply);
```
Tip
For streaming responses, set stream: true in the request body and read the response as a ReadableStream. See the full guide in our GitHub repository for the streaming implementation.
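As a rough sketch of the client side, assuming the Worker is extended to pass stream: true through and forward Ollama's newline-delimited JSON body unchanged (both are assumptions; the Worker shown in Step 3 returns a single JSON object):

```javascript
const WORKER_URL = 'https://ollama-proxy.your-subdomain.workers.dev';

// Split buffered text into complete NDJSON lines. Each complete line is an
// object like {"message":{"content":"..."},"done":false}; the final partial
// line (if any) is returned as leftover to prepend to the next chunk.
function extractTokens(buffer) {
  const lines = buffer.split('\n');
  const leftover = lines.pop(); // may be an incomplete JSON line
  const tokens = lines
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
    .filter((obj) => !obj.done && obj.message)
    .map((obj) => obj.message.content);
  return { tokens, leftover };
}

async function streamAI(message, onToken) {
  const res = await fetch(WORKER_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message, stream: true }),
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { tokens, leftover } = extractTokens(buffer);
    buffer = leftover;
    tokens.forEach(onToken); // e.g. append each token to the chat UI
  }
}
```

Buffering the leftover partial line matters because network chunks do not align with JSON line boundaries.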

Step 5

Keep Ollama Running

1. Linux — systemd

Ollama installs a systemd service by default. Enable it to auto-start on boot:

```bash
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
```
2. macOS — Login Item

Add Ollama.app to System Settings → General → Login Items, or create a launchd plist for headless operation.
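A minimal launchd plist sketch for headless startup, assuming a Homebrew install at /opt/homebrew/bin/ollama (adjust the path and label to your machine). Save it as ~/Library/LaunchAgents/com.ollama.serve.plist and activate it with launchctl load ~/Library/LaunchAgents/com.ollama.serve.plist:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.ollama.serve</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/ollama</string>
    <string>serve</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
</dict>
</plist>
```

KeepAlive tells launchd to restart the process if it exits, which covers the auto-restart role that systemd plays on Linux.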

3. Health Monitoring

Set up a cron job to check Ollama every 5 minutes and auto-restart if it goes down.

```bash
# Add to root's crontab (sudo crontab -e) so the restart needs no password prompt
*/5 * * * * curl -sf http://localhost:11434/api/tags > /dev/null || systemctl restart ollama
```

Model Selection

Recommended Models

| Model | Size | RAM Needed | Best For |
| --- | --- | --- | --- |
| llama3.1:8b | 4.7 GB | 8 GB | General chat, fast responses |
| llama3.1:70b | 40 GB | 48 GB+ | High-quality reasoning |
| mistral:7b | 4.1 GB | 8 GB | Fast, multilingual |
| qwen2.5:7b | 4.4 GB | 8 GB | Chinese + English |
| deepseek-r1:8b | 4.9 GB | 8 GB | Coding, reasoning |
| gemma2:9b | 5.4 GB | 10 GB | Lightweight, general purpose |
For Hong Kong users
We recommend qwen2.5:7b for bilingual Chinese/English support, or llama3.1:8b for general English use. Both run well on 16 GB Apple Silicon Macs.

Infrastructure

Recommended Server Providers

| Provider | GPU | Price | Best For |
| --- | --- | --- | --- |
| RunPod | A40 / A100 | $0.40–$1.50/hr | Burst workloads, prototyping |
| Lambda | A10G / A100 | $0.60–$1.10/hr | Sustained workloads |
| Hetzner | — (CPU) | ~€40/mo | Low-traffic, small models |
| Vast.ai | Various | $0.20+/hr | Budget GPU rental |
| Your Mac | M1–M4 | Electricity only | Demo, privacy-first |

Security

Security Checklist

Ollama on localhost only

Ollama binds to 127.0.0.1 — Nginx handles all external traffic

API key on every request

Nginx rejects requests without a valid X-API-Key header

Secrets in Cloudflare

OLLAMA_URL and OLLAMA_API_KEY stored as Worker secrets — never in code

CORS + Rate Limiting

The Worker only accepts requests from your allowed origin; Nginx caps each IP at 15 requests per minute

SSL/TLS everywhere

HTTPS from browser to Worker to Nginx — no plaintext traffic

Need Help Setting This Up?

We configure self-hosted Ollama setups for businesses across Hong Kong. Book a free assessment and we'll handle everything.

Book Free Assessment