Self-Hosted Ollama Setup Guide
Run your own LLM server and connect it to your website — complete privacy, zero API costs, unlimited usage.
Overview
Your Own AI Server, Your Own Rules
This guide walks you through deploying Ollama on your own server (GPU VPS or Mac), securing it behind Nginx, and connecting it to a public website via a Cloudflare Worker proxy. Your data stays on your hardware, and your API key never touches the browser.
Architecture
How It Works
Three layers protect your Ollama server while keeping it reachable from the web: the browser talks only to a Cloudflare Worker, the Worker forwards requests over HTTPS to Nginx on your server, and Nginx proxies them to Ollama listening on localhost.
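The three layers can be sketched as:

```
[Browser]
    │  HTTPS (your domain only, rate-limited)
    ▼
[Cloudflare Worker]     holds OLLAMA_URL + OLLAMA_API_KEY as secrets
    │  HTTPS + X-API-Key
    ▼
[Nginx reverse proxy]   SSL termination, key check, rate limiting
    │  plain HTTP on localhost only
    ▼
[Ollama]                127.0.0.1:11434
```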
Before You Begin
Prerequisites
| Requirement | Details |
|---|---|
| Server | GPU VPS (RunPod, Lambda, Hetzner) or Mac with Apple Silicon |
| Operating System | Ubuntu 22.04+ (Linux) or macOS 13+ (Mac) |
| RAM | 16 GB minimum, 32 GB+ recommended |
| Domain | A domain pointing to your server (e.g. ollama.example.com) |
| Cloudflare Account | Free tier is sufficient for the Worker proxy |
| SSL Certificate | Let's Encrypt (free) or Cloudflare Origin Certificate |
Step 1
Install Ollama & Pull Models
Linux (VPS)
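On Ubuntu, the official install script sets up the binary and a systemd service. The model name is one suggestion from the table further down; pick one sized for your RAM.

```shell
# Official install script (installs the binary and a systemd service)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model sized for your RAM (see the model table below)
MODEL="llama3.1:8b"
ollama pull "$MODEL"

# Verify the local API responds on the default port
curl -s http://127.0.0.1:11434/api/tags
```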
macOS
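On a Mac you can either install the desktop app from ollama.com/download or use Homebrew; the Homebrew service keeps the server running in the background.

```shell
# Homebrew install (alternatively, download the app from https://ollama.com/download)
brew install ollama
brew services start ollama

# Pull a model; Apple Silicon unified memory handles the 8B models comfortably
MODEL="llama3.1:8b"
ollama pull "$MODEL"
```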
Step 2
Configure Nginx Reverse Proxy
Nginx provides SSL termination, API key authentication, and rate limiting in front of Ollama.
Install Nginx & Certbot
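On Ubuntu 22.04+, install Nginx together with Certbot and its Nginx plugin for free Let's Encrypt certificates:

```shell
# Nginx for the reverse proxy, Certbot for Let's Encrypt certificates
PACKAGES="nginx certbot python3-certbot-nginx"
sudo apt update
sudo apt install -y $PACKAGES
```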
Nginx Configuration
Create /etc/nginx/sites-available/ollama-api:
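A sketch of the site config, assuming the `ollama.example.com` domain from the prerequisites and a placeholder API key you replace with your own long random secret. The `map` and `limit_req_zone` directives belong in the `http` context, which is where files under `sites-available` are included. `proxy_buffering off` is what keeps streamed responses flowing token by token.

```nginx
# Rate limit: 30 requests/min per client IP
limit_req_zone $binary_remote_addr zone=ollama:10m rate=30r/m;

# Map the X-API-Key header to a validity flag
map $http_x_api_key $api_key_valid {
    default 0;
    "your-long-random-api-key" 1;   # replace with your own secret
}

server {
    listen 443 ssl;
    server_name ollama.example.com;

    ssl_certificate     /etc/letsencrypt/live/ollama.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.example.com/privkey.pem;

    location / {
        # Reject requests without a valid API key
        if ($api_key_valid = 0) {
            return 401;
        }

        limit_req zone=ollama burst=10 nodelay;

        proxy_pass http://127.0.0.1:11434;
        proxy_http_version 1.1;
        proxy_set_header Host $host;

        # Required for streaming responses
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
```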
Enable & Secure
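Enable the site, obtain a certificate, and lock the firewall down so only SSH and HTTPS are reachable from outside; Ollama's port 11434 stays internal.

```shell
# Domain from the prerequisites table; replace with your own
DOMAIN="ollama.example.com"

# Enable the site and reload Nginx
sudo ln -s /etc/nginx/sites-available/ollama-api /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Issue a Let's Encrypt certificate
sudo certbot --nginx -d "$DOMAIN"

# Firewall: expose only SSH and HTTPS; port 11434 stays internal
sudo ufw allow OpenSSH
sudo ufw allow 443/tcp
sudo ufw --force enable
```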
Step 3
Deploy Cloudflare Worker Proxy
The Worker sits between your website and the Ollama server. It hides your server URL and API key from the browser, and adds rate limiting and CORS protection.
Project Structure
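A minimal Worker project layout might look like this (the project name is illustrative):

```
ollama-proxy/
├── wrangler.toml      # Worker name, entry point, non-secret vars
└── src/
    └── index.js       # the proxy logic
```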
wrangler.toml
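A minimal config sketch; the Worker name, compatibility date, and allowed origin are placeholders for your own values. Only non-secret settings go here.

```toml
name = "ollama-proxy"
main = "src/index.js"
compatibility_date = "2024-09-01"

# Non-secret config only. OLLAMA_URL and OLLAMA_API_KEY are set
# with `wrangler secret put`, never committed here.
[vars]
ALLOWED_ORIGIN = "https://example.com"
```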
Worker Code (src/index.js)
The Worker validates requests, enforces rate limits, and forwards messages to your Ollama server. It supports both streaming and non-streaming responses.
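A sketch of such a Worker. The in-memory rate limit shown here is per-isolate only, so it is approximate; Cloudflare KV or Durable Objects would be needed for a truly global limit. `OLLAMA_URL` and `OLLAMA_API_KEY` arrive via Worker secrets, `ALLOWED_ORIGIN` via `[vars]`.

```javascript
// Minimal Worker sketch: CORS check, per-IP rate limit, proxy to Ollama.
const hits = new Map(); // ip -> array of request timestamps
const LIMIT = 15;       // requests per minute per IP

function corsHeaders(origin) {
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}

const worker = {
  async fetch(request, env) {
    const origin = request.headers.get("Origin") || "";
    const allowed = origin === env.ALLOWED_ORIGIN;
    const headers = corsHeaders(allowed ? origin : "");

    // CORS preflight
    if (request.method === "OPTIONS") {
      return new Response(null, { status: 204, headers });
    }
    if (!allowed) return new Response("Forbidden", { status: 403 });
    if (request.method !== "POST") {
      return new Response("Method Not Allowed", { status: 405, headers });
    }

    // Sliding-window rate limit, approximate (per isolate)
    const ip = request.headers.get("CF-Connecting-IP") || "unknown";
    const now = Date.now();
    const recent = (hits.get(ip) || []).filter((t) => now - t < 60_000);
    if (recent.length >= LIMIT) {
      return new Response("Too Many Requests", { status: 429, headers });
    }
    recent.push(now);
    hits.set(ip, recent);

    // Forward the chat payload to the Ollama server behind Nginx
    const body = await request.json();
    const upstream = await fetch(`${env.OLLAMA_URL}/api/chat`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": env.OLLAMA_API_KEY,
      },
      body: JSON.stringify({
        model: body.model || "llama3.1:8b",
        messages: body.messages,
        stream: body.stream !== false,
      }),
    });

    // Pass the (possibly streaming) response body straight through
    return new Response(upstream.body, { status: upstream.status, headers });
  },
};

export default worker;
```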
Deploy the Worker
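Store the two secrets with Wrangler (each command prompts for the value), then deploy:

```shell
# Secrets live in Cloudflare, never in wrangler.toml or git
SECRETS="OLLAMA_URL OLLAMA_API_KEY"
npx wrangler secret put OLLAMA_URL       # e.g. https://ollama.example.com
npx wrangler secret put OLLAMA_API_KEY   # the key Nginx expects

# Deploy to Cloudflare's edge
npx wrangler deploy
```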
Step 4
Frontend Integration
Add a chat widget to your website that sends messages to the Cloudflare Worker.
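The core of such a widget is a function that POSTs the conversation to the Worker and consumes Ollama's newline-delimited JSON stream. The Worker URL below is a placeholder for your own deployment; wiring `onToken` into your page's DOM is up to you.

```javascript
// Placeholder: replace with your deployed Worker URL
const WORKER_URL = "https://ollama-proxy.your-account.workers.dev";

// Build the payload the Worker expects
function buildPayload(messages, model = "llama3.1:8b") {
  return { model, messages, stream: true };
}

// Send the conversation; invoke onToken for each streamed text chunk
async function sendChat(messages, onToken) {
  const res = await fetch(WORKER_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload(messages)),
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Ollama streams one JSON object per line
    let nl;
    while ((nl = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (!line) continue;
      const chunk = JSON.parse(line);
      if (chunk.message?.content) onToken(chunk.message.content);
    }
  }
}
```

Usage from your widget would look like `sendChat([{ role: "user", content: "Hi" }], t => output.textContent += t)`.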
Step 5
Keep Ollama Running
Linux — systemd
Ollama installs a systemd service by default. Enable it to auto-start on boot:
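Enabling the bundled unit and confirming it answers:

```shell
SERVICE="ollama"

# The installer registers a systemd unit; enable it to start on boot
sudo systemctl enable --now "$SERVICE"

# Confirm it is active and answering on the default port
systemctl is-active "$SERVICE"
curl -s http://127.0.0.1:11434/api/tags
```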
macOS — Login Item
Add Ollama.app to System Settings → General → Login Items, or create a launchd plist for headless operation.
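For the headless route, a plist sketch along these lines (the label is arbitrary, and the binary path assumes a Homebrew install on Apple Silicon) can be saved to `~/Library/LaunchAgents/com.ollama.serve.plist` and loaded with `launchctl load`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.serve</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```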
Health Monitoring
Set up a cron job to check Ollama every 5 minutes and auto-restart if it goes down.
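A minimal check script, assuming the Linux/systemd setup from above. Save it as, say, `/usr/local/bin/ollama-healthcheck.sh`, make it executable, and add `*/5 * * * * /usr/local/bin/ollama-healthcheck.sh` to root's crontab.

```shell
#!/bin/sh
# Restart Ollama if its local API stops responding
URL="http://127.0.0.1:11434/api/tags"
if ! curl -sf --max-time 10 "$URL" > /dev/null; then
    systemctl restart ollama
    logger -t ollama-healthcheck "Ollama was down, restarted"
fi
```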
Model Selection
Recommended Models
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| llama3.1:8b | 4.7 GB | 8 GB | General chat, fast responses |
| llama3.1:70b | 40 GB | 48 GB+ | High-quality reasoning |
| mistral:7b | 4.1 GB | 8 GB | Fast, multilingual |
| qwen2.5:7b | 4.4 GB | 8 GB | Chinese + English |
| deepseek-r1:8b | 4.9 GB | 8 GB | Coding, reasoning |
| gemma2:9b | 5.4 GB | 10 GB | Lightweight, general purpose |
Infrastructure
Recommended Server Providers
| Provider | GPU | Price | Best For |
|---|---|---|---|
| RunPod | A40 / A100 | $0.40–$1.50/hr | Burst workloads, prototyping |
| Lambda | A10G / A100 | $0.60–$1.10/hr | Sustained workloads |
| Hetzner | — (CPU) | ~€40/mo | Low-traffic, small models |
| Vast.ai | Various | $0.20+/hr | Budget GPU rental |
| Your Mac | M1–M4 | Electricity only | Demo, privacy-first |
Security
Security Checklist
| Check | How |
|---|---|
| Ollama on localhost only | Ollama binds to 127.0.0.1; Nginx handles all external traffic |
| API key on every request | Nginx rejects requests without a valid X-API-Key header |
| Secrets in Cloudflare | OLLAMA_URL and OLLAMA_API_KEY stored as Worker secrets, never in code |
| CORS + rate limiting | Worker only accepts requests from your domain, 15 req/min per IP |
| SSL/TLS everywhere | HTTPS from browser to Worker to Nginx; no plaintext traffic |