Reference — Glossary
The terms, in plain English
The vocabulary of agentic AI, defined the way we actually use it — no hype, no hand-waving.
Agents
- Agentic AI
- AI systems that don't just answer — they take actions toward a goal: planning steps, calling tools, observing results, and correcting course with limited human input. The shift from a chatbot that talks to a system that does.
- Agents
- A model wrapped in a loop that can use tools, keep state, and pursue a goal over many steps. Where a single prompt gives one answer, an agent decides what to do next, acts, checks the outcome, and repeats until the task is done.
- Harness
- The scaffolding around a model that makes it an agent: the loop, the tool definitions, the prompt assembly, memory wiring, and safety checks. The model is the engine; the harness is the car built around it. Most of the engineering effort lives here.
- Multi-Agent Orchestration
- Coordinating several specialised agents (a planner, researchers, a reviewer) that work in parallel or hand off to each other. Often more reliable and scalable than asking one agent to do everything, though coordination adds cost and complexity of its own.
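The loop described in the Agents and Harness entries above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real framework: `TOOLS`, `fake_model`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stand-in function so the example runs without any API or network access.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}


def fake_model(history: list[str]) -> dict:
    """Stand-in for an LLM call (an assumption): picks the next step from history."""
    if not any(line.startswith("observation:") for line in history):
        # Nothing observed yet, so ask for a tool call.
        return {"action": "add", "input": "2+3"}
    result = history[-1].split(": ", 1)[1]
    return {"action": "finish", "input": f"The answer is {result}."}


def run_agent(goal: str, max_steps: int = 5) -> str:
    """Decide, act, observe, repeat, with a hard cap on the number of steps."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        step = fake_model(history)                           # decide what to do next
        if step["action"] == "finish":
            return step["input"]                             # task is done
        observation = TOOLS[step["action"]](step["input"])   # act by calling a tool
        history.append(f"observation: {observation}")        # keep state for the next turn
    return "stopped: step limit reached"


print(run_agent("add 2 and 3"))
```

Everything except `fake_model` is harness: the loop, the tool registry, the history, and the step cap that keeps a confused model from running forever.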
Core
- Memory
- How an agent remembers across turns and sessions. Short-term lives in the context window; durable memory is stored externally (files, a database, a vector store) and recalled when relevant. The reliable path to long-horizon work — cheaper and more auditable than an ever-bigger context.
- Context Window
- The maximum amount of text (measured in tokens) a model can consider at once — prompt, history, retrieved docs, and the reply. Everything outside it is invisible to the model, which is why memory and retrieval matter.
- Token
- The unit models read and write: roughly a word-piece (about ¾ of a word in English). Context limits and pricing are measured in tokens, and latency grows with token count, so token efficiency is a real engineering lever.
- Guardrails
- The checks that keep an agent inside safe, intended bounds — input/output filters, permission gates, action limits, human-in-the-loop on risky steps. Non-negotiable for enterprise deployment.
- Hallucination
- When a model states something false with full confidence. Inherent to how LLMs work; mitigated — not eliminated — with retrieval, citations, verification steps, and keeping a human in the loop where it counts.
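The Token and Context Window entries above imply some simple arithmetic, sketched here. This is a rough illustration only: the ¾-word rule is an approximation for English, real tokenizers give different counts, and `estimate_tokens` and `fit_to_window` are hypothetical helper names rather than part of any library.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough English rule of thumb; an approximation, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Crude estimate: whitespace-separated words divided by ~0.75 words per token."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fit_to_window(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated token total fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break                        # older messages fall outside the window
        kept.append(message)
        used += cost
    return list(reversed(kept))          # restore chronological order


history = ["one two three", "a b c d e f", "x y z"]
print(estimate_tokens(history[0]))
print(fit_to_window(history, budget=12))
```

Whatever `fit_to_window` drops is invisible to the model on the next call, which is the gap durable memory and retrieval are meant to fill.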
Infra
- MCP (Model Context Protocol)
- An open standard for connecting models to tools and data sources through a common interface, so a capability built once works across many agents and apps. The emerging "USB-C" for plugging context into AI.
- Embeddings
- Numeric vectors that capture the meaning of text, so similar ideas sit close together in space. The backbone of semantic search and retrieval — you compare meaning, not keywords.
- Vector Database
- A store built to index and search embeddings by similarity (for example pgvector, Pinecone, or Weaviate). It's what lets an agent ask "what do I know that's relevant to this?" and get an answer quickly, typically in milliseconds.
- Quantization
- Shrinking a model by storing its weights at lower precision (e.g. 4-bit), so it runs faster and fits smaller hardware with modest quality loss. Key to running capable open models locally and affordably.
- Inference
- Running a trained model to get an output. Unlike training, it is a cost you pay on every request for as long as the product runs. Inference economics (speed, price per token, hardware) often decide what's viable in production.
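The Embeddings and Vector Database entries above boil down to comparing vectors by similarity. The sketch below assumes toy 3-dimensional vectors invented for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and `STORE` and `search` are hypothetical names, not any product's API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings (an assumption for the example).
STORE: dict[str, list[float]] = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.8, 0.2, 0.1],
    "office opening hours": [0.0, 0.1, 0.9],
}


def search(query_vector: list[float], k: int = 2) -> list[str]:
    """Brute-force top-k: score every stored vector and keep the closest k."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in STORE.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


print(search([1.0, 0.0, 0.0]))
```

This linear scan is fine for a handful of vectors. A vector database exists because scanning millions of them per query is too slow, so it maintains an index that finds approximate nearest neighbours instead.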
Models
- LLM (Large Language Model)
- A neural network trained on vast amounts of text to predict the next token. That simple objective, at scale, yields models that write, reason, summarise, and code. The engine underneath nearly every modern AI product.
- Models
- The trained networks themselves — GPT, Claude, Gemini, Llama, Mistral and the rest. They differ in capability, speed, cost, context size, and whether the weights are open or closed. Choosing the right one per task is half of building well.
- Local / Open-Weight Models
- Models whose weights you can download and run on your own hardware (via Ollama, vLLM, llama.cpp). Trade some peak capability for privacy, control, predictable cost, and no data leaving your network — often decisive for enterprise.
Technique
- Tool Use
- Giving a model the ability to call functions — search the web, query a database, run code, hit an API — and feed the result back into its reasoning. The bridge between a model that knows things and an agent that can do things.
- RAG (Retrieval-Augmented Generation)
- Fetch relevant documents at query time and put them in the context so the model answers from your data, not just its training. The standard way to ground AI in private, current, or domain-specific knowledge and cut hallucination.
- Fine-Tuning
- Further training of a base model on your own examples to bake in a style, format, or task. Powerful but heavier than prompting or RAG, so it is usually the last lever you reach for, not the first.
- Reasoning
- A model "thinking" before it answers: working through intermediate steps internally (chain-of-thought) to handle harder, multi-step problems. Trades extra tokens and latency for accuracy.
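The RAG entry above describes two steps: fetch relevant documents, then put them in the context. A minimal sketch follows, assuming a tiny invented document set and simple word overlap as the retriever. Production systems would normally rank by embedding similarity instead, and `DOCS`, `retrieve`, and `build_prompt` are hypothetical names.

```python
# Invented example documents (an assumption); in practice these are your own data.
DOCS = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available on weekdays from 9 to 5.",
    "Shipping is free for orders over 50 euros.",
]


def words(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank documents by shared words with the question (a stand-in for embeddings)."""
    q = words(question)
    ranked = sorted(DOCS, key=lambda doc: len(q & words(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 1) -> str:
    """Assemble the text sent to the model: instructions, retrieved context, question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_prompt("When are refunds issued?"))
```

The model never sees the full document set, only what `retrieve` selects, so retrieval quality sets the ceiling on answer quality.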