KMM Technologies

Reference — Glossary

The terms, in plain English

The vocabulary of agentic AI, defined the way we actually use it — no hype, no hand-waving.

Agents

Agentic AI
AI systems that don't just answer — they take actions toward a goal: planning steps, calling tools, observing results, and correcting course with limited human input. The shift from a chatbot that talks to a system that does.
Agents
A model wrapped in a loop that can use tools, keep state, and pursue a goal over many steps. Where a single prompt gives one answer, an agent decides what to do next, acts, checks the outcome, and repeats until the task is done.
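To make the loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `fake_model` stands in for a real LLM call, the single `increment` tool stands in for real tools, and the "goal" is just a target number.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: int
    value: int = 0
    history: list = field(default_factory=list)


def fake_model(state):
    """Hypothetical stand-in for an LLM call: pick the next action."""
    if state.value >= state.goal:
        return "finish"
    return "increment"


# Hypothetical tool registry; a real agent would expose search, code execution, etc.
TOOLS = {"increment": lambda v: v + 1}


def run_agent(goal, max_steps=20):
    state = AgentState(goal=goal)
    for _ in range(max_steps):                       # hard cap so the loop cannot run forever
        action = fake_model(state)                   # decide what to do next
        if action == "finish":
            break
        state.value = TOOLS[action](state.value)     # act
        state.history.append((action, state.value))  # record the outcome as state
    return state
```

The `max_steps` cap matters as much as the loop itself: without it, a confused model can keep acting indefinitely.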
Harness
The scaffolding around a model that makes it an agent: the loop, the tool definitions, the prompt assembly, memory wiring, and safety checks. The model is the engine; the harness is the car built around it. Most of the engineering effort lives here.
Multi-Agent Orchestration
Coordinating several specialised agents (a planner, researchers, a reviewer) that work in parallel or hand off to each other. Often more reliable and scalable than asking one agent to do everything.

Core

Memory
How an agent remembers across turns and sessions. Short-term memory lives in the context window; durable memory is stored externally (files, a database, a vector store) and recalled when relevant. External memory is usually the more reliable path to long-horizon work: cheaper and more auditable than an ever-bigger context.
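As a rough illustration of durable memory, the sketch below keeps facts in a JSON file so they survive between sessions. The `FileMemory` class and the `customer_tier` key are hypothetical; a production system would more likely use a database or vector store.

```python
import json
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical durable memory: a JSON file that outlives any one session."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def remember(self, key, value):
        data = self._load()
        data[key] = value
        self.path.write_text(json.dumps(data))

    def recall(self, key, default=None):
        return self._load().get(key, default)


# Two separate FileMemory objects simulate two separate sessions.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "memory.json"
    FileMemory(path).remember("customer_tier", "gold")    # session 1 writes
    recalled = FileMemory(path).recall("customer_tier")   # session 2 reads
    missing = FileMemory(path).recall("unknown_key", "n/a")
```

Because the file is plain JSON, anyone can open it and see exactly what the agent "knows", which is the auditability point made above.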
Context Window
The maximum amount of text (measured in tokens) a model can consider at once — prompt, history, retrieved docs, and the reply. Everything outside it is invisible to the model, which is why memory and retrieval matter.
Token
The unit models read and write: roughly a word-piece (about ¾ of a word in English). Context limits and pricing are measured in tokens, and latency grows with token count, so token efficiency is a real engineering lever.
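The ¾-of-a-word rule of thumb is enough for a back-of-envelope check on whether a prompt fits in a context window. The sketch below assumes English text and whitespace-separated words; real token counts come from the model's own tokenizer and will differ. The function names and limits are illustrative.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English only, not a tokenizer


def estimate_tokens(text):
    """Rough token estimate: word count divided by 0.75, rounded up."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(prompt, context_limit, reserved_for_reply):
    """The reply shares the window with the prompt, so budget for both."""
    return estimate_tokens(prompt) + reserved_for_reply <= context_limit
```

For example, a three-word prompt is estimated at four tokens, so it fits a 10-token window only if no more than six tokens are reserved for the reply.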
Guardrails
The checks that keep an agent inside safe, intended bounds — input/output filters, permission gates, action limits, human-in-the-loop on risky steps. Non-negotiable for enterprise deployment.
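A permission gate can be as simple as a function every proposed action must pass through before it runs. This is a sketch under assumed policy: the action names, the limit of three actions per run, and the return strings are all made up for illustration.

```python
# Hypothetical policy for illustration only.
ALLOWED_ACTIONS = {"read_file", "search", "send_email"}
NEEDS_HUMAN_APPROVAL = {"send_email"}   # risky steps wait for a person
MAX_ACTIONS_PER_RUN = 3


def check_action(action, actions_so_far, approved_by_human=False):
    """Return 'allow', 'hold: ...' or 'deny: ...' for a proposed action."""
    if action not in ALLOWED_ACTIONS:
        return "deny: action not permitted"
    if actions_so_far >= MAX_ACTIONS_PER_RUN:
        return "deny: action limit reached"
    if action in NEEDS_HUMAN_APPROVAL and not approved_by_human:
        return "hold: needs human approval"
    return "allow"
```

The design choice worth noting is that the gate sits outside the model: the model can propose anything, but only the harness decides what executes.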
Hallucination
When a model states something false with full confidence. Inherent to how LLMs work; mitigated — not eliminated — with retrieval, citations, verification steps, and keeping a human in the loop where it counts.

Infra

MCP (Model Context Protocol)
An open standard for connecting models to tools and data sources through a common interface, so a capability built once works across many agents and apps. The emerging "USB-C" for plugging context into AI.
Embeddings
Numeric vectors that capture the meaning of text, so similar ideas sit close together in space. The backbone of semantic search and retrieval — you compare meaning, not keywords.
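"Close together in space" is usually measured with cosine similarity. The sketch below uses hand-made three-dimensional toy vectors; real embeddings are produced by a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration, not output from any real model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

cat_vs_kitten = cosine_similarity(cat, kitten)
cat_vs_invoice = cosine_similarity(cat, invoice)
```

Here `cat_vs_kitten` comes out far higher than `cat_vs_invoice`, even though the words "cat" and "kitten" share no keywords.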
Vector Database
A store built to index and search embeddings by similarity (pgvector, Pinecone, Weaviate). It's what lets an agent ask "what do I know that's relevant to this?" and get an answer in milliseconds.
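Stripped to its core, a vector store is "add vectors, then find the nearest ones". The sketch below does this with a plain linear scan over an in-memory list. `TinyVectorStore` is a hypothetical class, not the API of any product named above; real vector databases typically use approximate nearest-neighbour indexes so search stays fast at scale.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store that searches by brute-force similarity."""

    def __init__(self):
        self._items = []  # list of (text, unit-length vector)

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def add(self, text, vector):
        self._items.append((text, self._normalize(vector)))

    def search(self, query_vector, k=2):
        q = self._normalize(query_vector)
        # Dot product of unit vectors equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), text)
                  for text, vec in self._items]
        scored.sort(reverse=True)
        return [text for _, text in scored[:k]]


# Toy two-dimensional vectors invented for illustration.
store = TinyVectorStore()
store.add("refund policy", [0.9, 0.1])
store.add("shipping times", [0.1, 0.9])
store.add("returns process", [0.8, 0.3])
top_two = store.search([1.0, 0.0], k=2)
```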
Quantization
Shrinking a model by storing its weights at lower precision (e.g. 4-bit), so it runs faster and fits smaller hardware with modest quality loss. Key to running capable open models locally and affordably.
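To show where the "modest quality loss" comes from, here is a sketch of simple symmetric quantization to signed 4-bit integers. It is one basic scheme chosen for illustration; production formats are more elaborate (for example, using separate scales per block of weights). The function names and sample weights are made up.

```python
def quantize_4bit(weights):
    """Map floats to integers in [-7, 7] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0   # avoid dividing by zero
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats; the rounding error is the quality loss."""
    return [q * scale for q in quantized]


# Made-up weights for illustration.
original = [0.7, -0.42, 0.11, 0.0]
q, scale = quantize_4bit(original)
restored = dequantize(q, scale)
max_error = max(abs(o - r) for o, r in zip(original, restored))
```

Each weight now needs 4 bits instead of 16 or 32, and the error per weight is bounded by half the scale.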
Inference
Running a trained model to get an output — the cost you pay per request, every request, forever. Inference economics (speed, price per token, hardware) often decide what's viable in production.
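Inference economics is mostly multiplication, which makes it easy to sanity-check before committing to a design. The sketch below assumes per-million-token pricing with separate input and output rates; the traffic figures and prices in the example are hypothetical, not any vendor's actual rates.

```python
def monthly_inference_cost(requests_per_day, input_tokens, output_tokens,
                           price_in_per_million, price_out_per_million,
                           days=30):
    """Estimated spend in the same currency as the per-million prices."""
    # Cost of one request, expressed in millionths of the currency unit.
    per_request_micro = (input_tokens * price_in_per_million
                         + output_tokens * price_out_per_million)
    return per_request_micro * requests_per_day * days / 1_000_000


# Hypothetical workload: 10,000 requests a day, 2,000 tokens in, 500 out,
# at assumed prices of 1.00 and 4.00 per million tokens.
estimate = monthly_inference_cost(10_000, 2_000, 500, 1.0, 4.0)
```

With those assumed numbers the estimate works out to roughly 1,200 per month, and halving the prompt length would cut it by a quarter.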

Models

LLM (Large Language Model)
A neural network trained on vast amounts of text to predict the next token. That simple objective, at scale, yields models that write, reason, summarise, and code. The engine underneath nearly every modern AI product.
Models
The trained networks themselves — GPT, Claude, Gemini, Llama, Mistral and the rest. They differ in capability, speed, cost, context size, and whether the weights are open or closed. Choosing the right one per task is half of building well.
Local / Open-Weight Models
Models whose weights you can download and run on your own hardware (via Ollama, vLLM, llama.cpp). Trade some peak capability for privacy, control, predictable cost, and no data leaving your network — often decisive for enterprise.

Technique

Tool Use
Giving a model the ability to call functions — search the web, query a database, run code, hit an API — and feed the result back into its reasoning. The bridge between a model that knows things and an agent that can do things.
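Mechanically, tool use means the model emits a structured request, the harness runs the matching function, and the result goes back into the conversation. The sketch below assumes the model emits JSON with `name` and `arguments` fields; that shape, the two tools, and the result message format are illustrative assumptions, since each provider defines its own.

```python
import json


def get_weather(city):
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny in {city}"


def add(a, b):
    return a + b


# Registry mapping tool names the model may request to real functions.
TOOLS = {"get_weather": get_weather, "add": add}


def execute_tool_call(raw_call):
    """Parse a model-emitted tool call and build the message to feed back."""
    call = json.loads(raw_call)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        return {"role": "tool", "name": name, "error": "unknown tool"}
    result = TOOLS[name](**args)
    return {"role": "tool", "name": name, "content": result}
```

For instance, `execute_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')` returns a message whose `content` is `5`, which the harness would append to the conversation for the model's next step.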
RAG (Retrieval-Augmented Generation)
Fetch relevant documents at query time and put them in the context so the model answers from your data, not just its training. The standard way to ground AI in private, current, or domain-specific knowledge and cut hallucination.
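The two halves of RAG, retrieve then generate, can be sketched in a few lines. To stay self-contained this example ranks documents by shared words, which is a deliberately crude stand-in for embedding-based retrieval; the prompt wording and sample documents are also made up, and the final model call is left out.

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (stand-in for embeddings)."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(query, documents, k=2):
    """Put the retrieved text in the context so the model answers from it."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Made-up knowledge base for illustration.
docs = [
    "Refunds are issued within 14 days.",
    "Our office is in Leeds.",
    "Shipping takes 3 days.",
]
prompt = build_prompt("how long do refunds take", docs, k=1)
```

The resulting `prompt` contains only the refunds sentence as context, so the model is steered toward your data rather than whatever it absorbed in training.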
Fine-Tuning
Training a base model further on your own examples to bake in a style, format, or task. Powerful but heavier than prompting or RAG, so it is usually the last lever you reach for, not the first.
Reasoning
A model's ability to "think" before answering: working through intermediate steps internally (chain-of-thought) to handle harder, multi-step problems. Trades extra tokens and latency for accuracy.