Self-Hosted Ollama Setup Guide
Run your own LLM server and connect it to your website — complete privacy, zero API costs, unlimited usage.
Overview
Your Own AI Server, Your Own Rules
This guide walks you through deploying Ollama on your own server (GPU VPS or Mac), securing it behind Nginx, and connecting it to a public website via a Cloudflare Worker proxy. Your data stays on your hardware, and your API key never touches the browser.
Architecture
How It Works
Three layers protect your Ollama server while keeping it reachable from the web: the browser talks only to a Cloudflare Worker, the Worker forwards requests over HTTPS to Nginx on your server, and Nginx proxies them to Ollama listening on localhost.
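The three layers can be sketched as:

```
[Browser]
    │  HTTPS (your domain only, rate-limited)
    ▼
[Cloudflare Worker]     holds OLLAMA_URL + OLLAMA_API_KEY as secrets
    │  HTTPS + X-API-Key
    ▼
[Nginx reverse proxy]   SSL termination, key check, rate limiting
    │  plain HTTP on localhost only
    ▼
[Ollama]                127.0.0.1:11434
```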
Before You Begin
Prerequisites
| Requirement | Details |
|---|---|
| Server | GPU VPS (RunPod, Lambda, Hetzner) or Mac with Apple Silicon |
| Operating System | Ubuntu 22.04+ (Linux) or macOS 13+ (Mac) |
| RAM | 16 GB minimum, 32 GB+ recommended |
| Domain | A domain pointing to your server (e.g. ollama.example.com) |
| Cloudflare Account | Free tier is sufficient for the Worker proxy |
| SSL Certificate | Let's Encrypt (free) or Cloudflare Origin Certificate |
Step 1
Install Ollama & Pull Models
Linux (VPS)
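On Ubuntu, the official install script sets up the binary and a systemd service. The model name is one suggestion from the table further down; pick one sized for your RAM.

```shell
# Official install script (installs the binary and a systemd service)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model sized for your RAM (see the model table below)
MODEL="llama3.1:8b"
ollama pull "$MODEL"

# Verify the local API responds on the default port
curl -s http://127.0.0.1:11434/api/tags
```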
macOS
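On a Mac you can either install the desktop app from ollama.com/download or use Homebrew; the Homebrew service keeps the server running in the background.

```shell
# Homebrew install (alternatively, download the app from https://ollama.com/download)
brew install ollama
brew services start ollama

# Pull a model; Apple Silicon unified memory handles the 8B models comfortably
MODEL="llama3.1:8b"
ollama pull "$MODEL"
```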
Step 2
Configure Nginx Reverse Proxy
Nginx provides SSL termination, API key authentication, and rate limiting in front of Ollama.
Install Nginx & Certbot
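On Ubuntu 22.04+, install Nginx together with Certbot and its Nginx plugin for free Let's Encrypt certificates:

```shell
# Nginx for the reverse proxy, Certbot for Let's Encrypt certificates
PACKAGES="nginx certbot python3-certbot-nginx"
sudo apt update
sudo apt install -y $PACKAGES
```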
Nginx Configuration
Create /etc/nginx/sites-available/ollama-api:
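A sketch of the site config, assuming the `ollama.example.com` domain from the prerequisites and a placeholder API key you replace with your own long random secret. The `map` and `limit_req_zone` directives belong in the `http` context, which is where files under `sites-available` are included. `proxy_buffering off` is what keeps streamed responses flowing token by token.

```nginx
# Rate limit: 30 requests/min per client IP
limit_req_zone $binary_remote_addr zone=ollama:10m rate=30r/m;

# Map the X-API-Key header to a validity flag
map $http_x_api_key $api_key_valid {
    default 0;
    "your-long-random-api-key" 1;   # replace with your own secret
}

server {
    listen 443 ssl;
    server_name ollama.example.com;

    ssl_certificate     /etc/letsencrypt/live/ollama.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.example.com/privkey.pem;

    location / {
        # Reject requests without a valid API key
        if ($api_key_valid = 0) {
            return 401;
        }

        limit_req zone=ollama burst=10 nodelay;

        proxy_pass http://127.0.0.1:11434;
        proxy_http_version 1.1;
        proxy_set_header Host $host;

        # Required for streaming responses
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
```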
Enable & Secure
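Enable the site, obtain a certificate, and lock the firewall down so only SSH and HTTPS are reachable from outside; Ollama's port 11434 stays internal.

```shell
# Domain from the prerequisites table; replace with your own
DOMAIN="ollama.example.com"

# Enable the site and reload Nginx
sudo ln -s /etc/nginx/sites-available/ollama-api /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Issue a Let's Encrypt certificate
sudo certbot --nginx -d "$DOMAIN"

# Firewall: expose only SSH and HTTPS; port 11434 stays internal
sudo ufw allow OpenSSH
sudo ufw allow 443/tcp
sudo ufw --force enable
```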
Step 3
Deploy Cloudflare Worker Proxy
The Worker sits between your website and the Ollama server. It hides your server URL and API key from the browser, and adds rate limiting and CORS protection.
Project Structure
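A minimal Worker project layout might look like this (the project name is illustrative):

```
ollama-proxy/
├── wrangler.toml      # Worker name, entry point, non-secret vars
└── src/
    └── index.js       # the proxy logic
```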
wrangler.toml
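A minimal config sketch; the Worker name, compatibility date, and allowed origin are placeholders for your own values. Only non-secret settings go here.

```toml
name = "ollama-proxy"
main = "src/index.js"
compatibility_date = "2024-09-01"

# Non-secret config only. OLLAMA_URL and OLLAMA_API_KEY are set
# with `wrangler secret put`, never committed here.
[vars]
ALLOWED_ORIGIN = "https://example.com"
```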
Worker Code (src/index.js)
The Worker validates requests, enforces rate limits, and forwards messages to your Ollama server. It supports both streaming and non-streaming responses.
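A sketch of such a Worker. The in-memory rate limit shown here is per-isolate only, so it is approximate; Cloudflare KV or Durable Objects would be needed for a truly global limit. `OLLAMA_URL` and `OLLAMA_API_KEY` arrive via Worker secrets, `ALLOWED_ORIGIN` via `[vars]`.

```javascript
// Minimal Worker sketch: CORS check, per-IP rate limit, proxy to Ollama.
const hits = new Map(); // ip -> array of request timestamps
const LIMIT = 15;       // requests per minute per IP

function corsHeaders(origin) {
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}

const worker = {
  async fetch(request, env) {
    const origin = request.headers.get("Origin") || "";
    const allowed = origin === env.ALLOWED_ORIGIN;
    const headers = corsHeaders(allowed ? origin : "");

    // CORS preflight
    if (request.method === "OPTIONS") {
      return new Response(null, { status: 204, headers });
    }
    if (!allowed) return new Response("Forbidden", { status: 403 });
    if (request.method !== "POST") {
      return new Response("Method Not Allowed", { status: 405, headers });
    }

    // Sliding-window rate limit, approximate (per isolate)
    const ip = request.headers.get("CF-Connecting-IP") || "unknown";
    const now = Date.now();
    const recent = (hits.get(ip) || []).filter((t) => now - t < 60_000);
    if (recent.length >= LIMIT) {
      return new Response("Too Many Requests", { status: 429, headers });
    }
    recent.push(now);
    hits.set(ip, recent);

    // Forward the chat payload to the Ollama server behind Nginx
    const body = await request.json();
    const upstream = await fetch(`${env.OLLAMA_URL}/api/chat`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": env.OLLAMA_API_KEY,
      },
      body: JSON.stringify({
        model: body.model || "llama3.1:8b",
        messages: body.messages,
        stream: body.stream !== false,
      }),
    });

    // Pass the (possibly streaming) response body straight through
    return new Response(upstream.body, { status: upstream.status, headers });
  },
};

export default worker;
```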
Deploy the Worker
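Store the two secrets with Wrangler (each command prompts for the value), then deploy:

```shell
# Secrets live in Cloudflare, never in wrangler.toml or git
SECRETS="OLLAMA_URL OLLAMA_API_KEY"
npx wrangler secret put OLLAMA_URL       # e.g. https://ollama.example.com
npx wrangler secret put OLLAMA_API_KEY   # the key Nginx expects

# Deploy to Cloudflare's edge
npx wrangler deploy
```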
Step 4
Frontend Integration
Add a chat widget to your website that sends messages to the Cloudflare Worker.
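The core of such a widget is a function that POSTs the conversation to the Worker and consumes Ollama's newline-delimited JSON stream. The Worker URL below is a placeholder for your own deployment; wiring `onToken` into your page's DOM is up to you.

```javascript
// Placeholder: replace with your deployed Worker URL
const WORKER_URL = "https://ollama-proxy.your-account.workers.dev";

// Build the payload the Worker expects
function buildPayload(messages, model = "llama3.1:8b") {
  return { model, messages, stream: true };
}

// Send the conversation; invoke onToken for each streamed text chunk
async function sendChat(messages, onToken) {
  const res = await fetch(WORKER_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload(messages)),
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Ollama streams one JSON object per line
    let nl;
    while ((nl = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (!line) continue;
      const chunk = JSON.parse(line);
      if (chunk.message?.content) onToken(chunk.message.content);
    }
  }
}
```

Usage from your widget would look like `sendChat([{ role: "user", content: "Hi" }], t => output.textContent += t)`.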
Step 5
Keep Ollama Running
Linux — systemd
Ollama installs a systemd service by default. Enable it to auto-start on boot:
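Enabling the bundled unit and confirming it answers:

```shell
SERVICE="ollama"

# The installer registers a systemd unit; enable it to start on boot
sudo systemctl enable --now "$SERVICE"

# Confirm it is active and answering on the default port
systemctl is-active "$SERVICE"
curl -s http://127.0.0.1:11434/api/tags
```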
macOS — Login Item
Add Ollama.app to System Settings → General → Login Items, or create a launchd plist for headless operation.
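For the headless route, a plist sketch along these lines (the label is arbitrary, and the binary path assumes a Homebrew install on Apple Silicon) can be saved to `~/Library/LaunchAgents/com.ollama.serve.plist` and loaded with `launchctl load`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.serve</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```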
Health Monitoring
Set up a cron job to check Ollama every 5 minutes and auto-restart if it goes down.
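A minimal check script, assuming the Linux/systemd setup from above. Save it as, say, `/usr/local/bin/ollama-healthcheck.sh`, make it executable, and add `*/5 * * * * /usr/local/bin/ollama-healthcheck.sh` to root's crontab.

```shell
#!/bin/sh
# Restart Ollama if its local API stops responding
URL="http://127.0.0.1:11434/api/tags"
if ! curl -sf --max-time 10 "$URL" > /dev/null; then
    systemctl restart ollama
    logger -t ollama-healthcheck "Ollama was down, restarted"
fi
```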
Model Selection
Recommended Models
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| llama3.1:8b | 4.7 GB | 8 GB | General chat, fast responses |
| llama3.1:70b | 40 GB | 48 GB+ | High-quality reasoning |
| mistral:7b | 4.1 GB | 8 GB | Fast, multilingual |
| qwen2.5:7b | 4.4 GB | 8 GB | Chinese + English |
| deepseek-r1:8b | 4.9 GB | 8 GB | Coding, reasoning |
| gemma2:9b | 5.4 GB | 10 GB | Lightweight, general purpose |
Infrastructure
Recommended Server Providers
| Provider | GPU | Price | Best For |
|---|---|---|---|
| RunPod | A40 / A100 | $0.40–$1.50/hr | Burst workloads, prototyping |
| Lambda | A10G / A100 | $0.60–$1.10/hr | Sustained workloads |
| Hetzner | — (CPU) | ~€40/mo | Low-traffic, small models |
| Vast.ai | Various | $0.20+/hr | Budget GPU rental |
| Your Mac | M1–M4 | Electricity only | Demo, privacy-first |
Security
Security Checklist
| Check | How |
|---|---|
| Ollama on localhost only | Ollama binds to 127.0.0.1; Nginx handles all external traffic |
| API key on every request | Nginx rejects requests without a valid X-API-Key header |
| Secrets in Cloudflare | OLLAMA_URL and OLLAMA_API_KEY stored as Worker secrets, never in code |
| CORS + rate limiting | Worker only accepts requests from your domain, 15 req/min per IP |
| SSL/TLS everywhere | HTTPS from browser to Worker to Nginx; no plaintext traffic |