Reference — Glossary

The terms, in plain English

The vocabulary of agentic AI, defined the way we actually use it — no hype, no hand-waving.

Agents

Agentic AI: AI systems that don't just answer — they take actions toward a goal: planning steps, calling tools, observing results, and correcting course with limited human input. The shift from a chatbot that talks to a system that does.
related:Agents Tool Use Harness
Agents: A model wrapped in a loop that can use tools, keep state, and pursue a goal over many steps. Where a single prompt gives one answer, an agent decides what to do next, acts, checks the outcome, and repeats until the task is done.
related:Agentic AI Harness Tool Use Memory
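A minimal sketch of that loop in Python, assuming a stand-in `fake_model` function in place of a real LLM call and a single hypothetical `add` tool. None of these names come from a real library; they only illustrate the decide, act, observe, repeat cycle.

```python
from typing import Callable, Optional

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[..., float]] = {
    "add": lambda a, b: a + b,
}

def fake_model(goal: str, observations: list) -> dict:
    """Stand-in for an LLM call: picks the next step from what has been observed so far."""
    if not observations:
        # Nothing observed yet, so request a tool call.
        return {"action": "call_tool", "tool": "add", "args": {"a": 2, "b": 3}}
    # A result is available, so finish with it.
    return {"action": "finish", "answer": observations[-1]}

def run_agent(goal: str, max_steps: int = 5) -> Optional[float]:
    """Decide, act, observe, and repeat until the task is done or the step budget runs out."""
    observations: list = []
    for _ in range(max_steps):
        decision = fake_model(goal, observations)
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["tool"]]
        observations.append(tool(**decision["args"]))
    return None  # step budget exhausted without an answer

print(run_agent("What is 2 + 3?"))  # prints 5
```

The `max_steps` cap is a design choice worth keeping in any real loop: it bounds cost when the model never decides to finish.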
Harness: The scaffolding around a model that makes it an agent: the loop, the tool definitions, the prompt assembly, memory wiring, and safety checks. The model is the engine; the harness is the car built around it. Most of the engineering effort lives here.
related:Agents Tool Use Agentic AI
Multi-Agent Orchestration: Coordinating several specialised agents (a planner, researchers, a reviewer) that work in parallel or hand off to each other. Often more reliable and scalable than asking one agent to do everything, though the coordination itself adds overhead.
related:Agents Agentic AI Harness

Memory: How an agent remembers across turns and sessions. Short-term lives in the context window; durable memory is stored externally (files, a database, a vector store) and recalled when relevant. The reliable path to long-horizon work — cheaper and more auditable than an ever-bigger context.
related:Context Window RAG (Retrieval-Augmented Generation) Vector Database Agents
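As a rough illustration of durable memory, the sketch below stores notes in a JSON file so that a later session can recall them. The `DurableMemory` class and the file name are hypothetical; a production system might use a database or vector store instead.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

class DurableMemory:
    """Hypothetical file-backed store: notes outlive any single session."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def remember(self, key: str, value: str) -> None:
        notes = self._load()
        notes[key] = value
        self.path.write_text(json.dumps(notes))

    def recall(self, key: str) -> Optional[str]:
        return self._load().get(key)

store_path = Path(tempfile.mkdtemp()) / "agent_memory.json"

# Session 1: the fact is written to disk.
DurableMemory(store_path).remember("customer_tier", "enterprise")

# Session 2: a fresh object with no in-memory state still recalls it.
recalled = DurableMemory(store_path).recall("customer_tier")
print(recalled)  # prints enterprise
```

Because the store is a plain file, it can be inspected or diffed, which is one way the auditability mentioned above shows up in practice.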
Context Window: The maximum amount of text (measured in tokens) a model can consider at once — prompt, history, retrieved docs, and the reply. Everything outside it is invisible to the model, which is why memory and retrieval matter.
related:Token Memory RAG (Retrieval-Augmented Generation)
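One common consequence is that a harness has to decide what to drop when history outgrows the window. The sketch below keeps the system prompt plus the most recent messages that fit a budget. It assumes a crude `count_tokens` that counts whitespace-separated words; a real system would use the model's own tokenizer.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(system: str, history: list[str], budget: int) -> list[str]:
    """Keep the system prompt, then as many of the newest messages as the budget allows."""
    remaining = budget - count_tokens(system)
    kept: list[str] = []
    for message in reversed(history):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    return [system] + list(reversed(kept))  # restore chronological order

history = ["first message here", "second message", "third and latest message"]
window = fit_to_window("You are helpful", history, budget=9)
print(window)
# prints ['You are helpful', 'second message', 'third and latest message']
```

The oldest message is dropped here, which is exactly the content the model can no longer see unless memory or retrieval brings it back.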
Token: The unit models read and write — roughly a word-piece (about ¾ of a word in English). Context limits, latency, and pricing are all measured in tokens, so token efficiency is a real engineering lever.
related:LLM (Large Language Model) Context Window Inference
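The ¾-of-a-word rule of thumb can be turned into a quick estimate. This is only a back-of-the-envelope sketch: the ratio is the approximation quoted above, and actual counts depend on the tokenizer and the language.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English; real ratios vary by tokenizer

def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)

sample = "word " * 750  # a 750-word document
print(estimate_tokens(sample))  # prints 1000
```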
Guardrails: The checks that keep an agent inside safe, intended bounds — input/output filters, permission gates, action limits, human-in-the-loop on risky steps. Non-negotiable for enterprise deployment.
related:Agents Harness
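A small sketch of how a permission gate, an action limit, and a human-in-the-loop check might combine. The action names, the limit, and the `approve` callback are all hypothetical and chosen only for illustration.

```python
from typing import Callable

ALLOWED_ACTIONS = {"read_file", "search"}       # safe to run without review
NEEDS_APPROVAL = {"send_email", "delete_file"}  # risky: a human must confirm
MAX_ACTIONS = 3                                 # hard cap on actions per task

def check_action(action: str, actions_so_far: int,
                 approve: Callable[[str], bool]) -> str:
    """Return 'allowed' or a 'blocked: ...' reason for a proposed action."""
    if actions_so_far >= MAX_ACTIONS:
        return "blocked: action limit reached"
    if action in ALLOWED_ACTIONS:
        return "allowed"
    if action in NEEDS_APPROVAL:
        return "allowed" if approve(action) else "blocked: human declined"
    return "blocked: unknown action"  # default deny for anything unlisted

def always_decline(action: str) -> bool:
    return False

print(check_action("search", 0, always_decline))       # prints allowed
print(check_action("delete_file", 0, always_decline))  # prints blocked: human declined
print(check_action("format_disk", 0, always_decline))  # prints blocked: unknown action
```

Denying unknown actions by default means a new tool stays blocked until someone classifies it.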
Hallucination: When a model states something false with full confidence. Inherent to how LLMs work; mitigated — not eliminated — with retrieval, citations, verification steps, and keeping a human in the loop where it counts.
related:RAG (Retrieval-Augmented Generation) Guardrails LLM (Large Language Model)

MCP (Model Context Protocol): An open standard for connecting models to tools and data sources through a common interface, so a capability built once works across many agents and apps. The emerging "USB-C" for plugging context into AI.
related:Tool Use Agents
Embeddings: Numeric vectors that capture the meaning of text, so similar ideas sit close together in space. The backbone of semantic search and retrieval — you compare meaning, not keywords.
related:Vector Database RAG (Retrieval-Augmented Generation)
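"Close together" is usually measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy vectors, not output from a real embedding model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # prints True
```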
Vector Database: A store built to index and search embeddings by similarity (for example pgvector, Pinecone, or Weaviate). It's what lets an agent ask "what do I know that's relevant to this?" and typically get an answer in milliseconds.
related:Embeddings RAG (Retrieval-Augmented Generation) Memory
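Conceptually, a similarity search ranks stored vectors against a query vector. The toy `TinyVectorStore` below does this by brute force over hand-written vectors. It is a hypothetical teaching sketch, not how pgvector, Pinecone, or Weaviate are implemented; production stores rely on approximate nearest-neighbour indexes to stay fast at scale.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """Hypothetical in-memory store: brute-force search over (text, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.items.append((text, vector))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        # Rank every stored item by similarity to the query, highest first.
        ranked = sorted(self.items, key=lambda item: cosine(query, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = TinyVectorStore()
store.add("refund policy", [0.9, 0.1])
store.add("office hours", [0.1, 0.9])
store.add("return shipping", [0.8, 0.3])

print(store.search([1.0, 0.0], k=2))  # prints ['refund policy', 'return shipping']
```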
Quantization: Shrinking a model by storing its weights at lower precision (e.g. 4-bit), so it runs faster and fits smaller hardware with modest quality loss. Key to running capable open models locally and affordably.
related:Local / Open-Weight Models Inference
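The size saving follows from simple arithmetic: weight storage is roughly parameter count times bits per weight. The sketch below assumes a hypothetical 7-billion-parameter model and counts weights only, ignoring activations, the KV cache, and any per-format overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9

params = 7e9  # hypothetical 7-billion-parameter model
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
# prints:
# 16-bit: 14.0 GB
# 8-bit: 7.0 GB
# 4-bit: 3.5 GB
```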
Inference: Running a trained model to get an output — the cost you pay per request, every request, forever. Inference economics (speed, price per token, hardware) often decide what's viable in production.
related:LLM (Large Language Model) Token Quantization
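A quick way to see why inference economics matter is to price a single request and multiply by volume. The prices and request sizes below are placeholders for illustration, not any provider's real rates.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_million: float, price_out_per_million: float) -> float:
    """Cost of one request in dollars, given per-million-token prices."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000

# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
cost = request_cost(2_000, 500, 3.00, 15.00)
monthly = cost * 100_000  # assumed volume of 100,000 requests per month

print(f"per request: ${cost:.4f}")  # prints per request: $0.0135
print(f"per month:   ${monthly:.2f}")  # prints per month:   $1350.00
```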

LLM (Large Language Model): A neural network trained on vast amounts of text to predict the next token. That simple objective, at scale, yields models that write, reason, summarise, and code. The engine underneath nearly every modern AI product.
related:Models Token Inference Context Window
Models: The trained networks themselves — GPT, Claude, Gemini, Llama, Mistral and the rest. They differ in capability, speed, cost, context size, and whether the weights are open or closed. Choosing the right one per task is half of building well.
related:LLM (Large Language Model) Local / Open-Weight Models Inference
Local / Open-Weight Models: Models whose weights you can download and run on your own hardware (via Ollama, vLLM, llama.cpp). Trade some peak capability for privacy, control, predictable cost, and no data leaving your network — often decisive for enterprise.
related:Models Quantization Inference

Tool Use: Giving a model the ability to call functions — search the web, query a database, run code, hit an API — and feed the result back into its reasoning. The bridge between a model that knows things and an agent that can do things.
related:Agents MCP (Model Context Protocol) Harness
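In practice the model emits a structured request, the harness runs the matching function, and the result goes back into the conversation. The sketch below assumes a simple JSON shape with `name` and `arguments` fields and a hypothetical `get_weather` tool; real providers each define their own message formats.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def handle_tool_call(raw_call: str) -> dict:
    """Parse a model-emitted tool call, run it, and package the result for the next model turn."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        return {"role": "tool", "name": name, "content": f"error: unknown tool {name}"}
    result = TOOLS[name](**call["arguments"])
    return {"role": "tool", "name": name, "content": result}

model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(handle_tool_call(model_output))
# prints {'role': 'tool', 'name': 'get_weather', 'content': 'Sunny in Oslo'}
```

Returning an error message instead of raising lets the model see the failure and try something else on its next turn.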
RAG (Retrieval-Augmented Generation): Fetch relevant documents at query time and put them in the context so the model answers from your data, not just its training. The standard way to ground AI in private, current, or domain-specific knowledge and cut hallucination.
related:Vector Database Embeddings Memory Hallucination
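The fetch-then-answer flow can be sketched in a few lines. To stay self-contained, this example scores documents by word overlap as a stand-in for embedding similarity, and the documents and prompt wording are invented for illustration.

```python
DOCS = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available Monday to Friday.",
    "Shipping takes 3 to 5 business days.",
]

def score(query: str, doc: str) -> int:
    """Word-overlap score; a simple stand-in for embedding similarity."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().rstrip(".").split())
    return len(query_words & doc_words)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(DOCS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Place the retrieved text in the context ahead of the question."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take"))
```

The string returned by `build_prompt` is what would be sent to the model, so the answer is grounded in the retrieved text instead of training data alone.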
Fine-Tuning: Further-training a base model on your own examples to bake in a style, format, or task. Powerful but heavier than prompting or RAG — usually the last lever you reach for, not the first.
related:Models LLM (Large Language Model)
Reasoning: A model's ability to "think" before answering, working through intermediate steps internally (chain-of-thought) to handle harder, multi-step problems. Trades extra tokens and latency for accuracy.
related:LLM (Large Language Model) Token