AI Inference with Ollama, llama.cpp, and vLLM

By GK Marballi · Paperback · USD 26.68

A hands-on guide to running and optimizing open-source LLM inference from local machines to high-throughput production environments.

Ollama · llama.cpp · vLLM · Computers & Technology

Overview

The book explains how to bridge the gap between running a first local model and operating reliable inference services at scale. It covers memory and quantization tradeoffs, batching, hardware choices, and deployment patterns.

By comparing Ollama, llama.cpp, and vLLM in practical contexts, it helps readers decide when each tool fits best, whether they are building chat applications, RAG systems, or production APIs.