How I Built a RAG That Doesn't Hallucinate

This article is a guide for non-engineers: for social-science researchers and young activists who want to understand how the „Participatory Intelligent Assistant” actually works — and why it works the way it does.

In short

PIA is a private knowledge base with an AI assistant built on a RAG architecture. Unlike ChatGPT, it answers only from the documents in your own database and cites its sources instead of making things up. Below I explain — without the jargon — how it works: what an embedding is, why we split documents into chunks, and why the Polish language forced some concrete technical decisions.

A problem anyone following the field knows

Imagine you work on children’s and youth participation — or any other topic — as a researcher, as an activist in an organisation, as a member of an advisory team.

Every day new material piles up: a UNICEF report, an academic paper, a government document, some national policy in Czech, an NGO diagnosis, a briefing from the Ombudsman for Children.

Some of it is in Polish, some in English, but most of it is in other languages. Some are PDFs, some are web pages — and on top of that you have your own book or an internal study sitting in a Word file.

There’s no way to handle this by hand — I couldn’t. Reading everything in sequence is inefficient, and Ctrl+F across dozens of files is almost useless. The problem with simple Ctrl+F search is that it finds exactly the word you type. If you’re looking for „teen participation” but the document says „youth councils” or the Polish „rady młodzieżowe” — it finds nothing, even though they’re talking about exactly the same thing.

PIA — short for the Polish Partycypacyjny Inteligentny Asystent (Participatory Intelligent Assistant) — was born out of exactly this need. The name PIA is something I made up as a working title, because when I talk to something I like to give it a name. „Ask PIA” sounds better in my head than „Ask the RAG” — and PIA is precisely a RAG.

I built it for the advisory team to the Ombudsman for Children, where academics and activists with very different backgrounds work side by side; they differ both in their experience of participation itself and in how much they have used AI. When I was appointed to that team, I had only a very general idea of what children’s and youth participation is, and I struggled to fill that gap quickly.

Others on the team, in turn, had the knowledge and the materials — but the challenge was sharing that knowledge and quickly analysing large volumes of data. An example: my sub-team was tasked with analysing 13 legal policies we knew of from EU countries, plus a model UNICEF document, and drawing conclusions about which of those practices we could propose for Poland, and why.

A year ago I took a few people from the team to an introductory AI-for-NGOs training, and it became clear that many of them were hearing words like „RAG”, „embedding” or „LLM” for the first time in their lives. And to use AI well, it helps to understand how and why it works the way it does. This text is exactly about that.

When I had to prepare an internal training in May, I decided it would be better to work on something the team already knows and understands, and to explain all these concepts on top of that.

What PIA is — and what it definitely isn’t

In short: PIA is a private knowledge base with an AI assistant that does three things.

Every day it searches the web for new material on its own (at 7:00 a.m.) and adds it to the database.
It lets you upload sources manually — you drag in a file, and the AI itself extracts the title, authors, date and tags. First it checks whether it already has the document, even if the files were named differently.
It gives you a search engine and a chat like ChatGPT, but better — you ask, in plain language, „what does research say about student participation in high school?”, and the system answers citing specific sources, which you can click and then copy into Word.
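The duplicate check on upload can be illustrated with a content fingerprint. This is only a sketch of the general idea, not PIA's actual mechanism, which the article doesn't describe: the `KnowledgeBase` class and `content_fingerprint` function are hypothetical names, and hashing raw bytes catches only exact copies (a real system might compare extracted text or embeddings instead).

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the file's bytes; the filename plays no role."""
    return hashlib.sha256(data).hexdigest()


class KnowledgeBase:
    """Hypothetical in-memory store that refuses byte-identical duplicates."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # fingerprint -> first filename seen

    def add(self, filename: str, data: bytes) -> bool:
        """Store the file and return True, or return False if the content is already known."""
        fingerprint = content_fingerprint(data)
        if fingerprint in self._seen:
            return False
        self._seen[fingerprint] = filename
        return True


kb = KnowledgeBase()
first = kb.add("unicef_report.pdf", b"same bytes")
second = kb.add("report_final_v2.pdf", b"same bytes")  # same content, different name
third = kb.add("other.pdf", b"different bytes")
print(first, second, third)
```

The second upload is rejected because the content matches, even though the filename differs.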

Figure: PIA’s chat start screen, asking „What do you want to ask?” (the tool’s interface is in Polish). The key promise is already visible here: „I’ll answer based on sources in the knowledge base about children’s and youth participation.”

And here’s the most important distinction, because it determines everything:

PIA is not ChatGPT.

ChatGPT answers based on what it „remembered” from the internet up to a certain date. It doesn’t know your report from last week or the consultation PDF you just uploaded. And when you ask about something like that — it can invent an answer with full confidence: a non-existent author, a made-up date, a fictional report. This isn’t a „bug” — it follows from the nature of these models, which I’ll get to in a moment.

PIA is different: it answers solely on the basis of the documents in your database. If something isn’t there, it says „no data” instead of making things up. That’s the foundation of the whole construction. It also won’t give you a recipe for grilled steak, even though the AI models it’s built on probably do know that recipe.

Figure: the test in practice (Polish interface). Asked for a grilled-steak recipe, PIA doesn’t make things up. It answers „I can’t provide an answer based on the available sources”, because there’s nothing about cooking in the database.

For the record, what PIA also isn’t: it’s not a public search engine (only a narrow, approved group of people has access) and it’s not a peer-review system. It’s a tool for briefing and discovery, not for judging quality — we leave that judgement to a human.

RAG — or, how to keep the AI from making things up

This is where we get to the heart of the matter. The acronym RAG stands for Retrieval-Augmented Generation. It sounds technical, but the idea is simple.

A language model (like Claude or ChatGPT) is a brilliant writer. Whatever you ask it, it will always answer — because that’s how language models work.

To put it more simply: picture an exceptionally eloquent sociology graduate. Asked for an essay on the latest research, they’ll write it fluently and convincingly. The problem is that if they don’t know the sources first-hand, they’ll start inventing authors and dates — because their job is to „sound sensible”, not to „tell the truth”. Hence all the memes about AI hallucinations.

RAG is an instruction: „First take these specific books off the shelf, read the relevant passages, and only then write — and for every sentence, say which book it came from.”

What actually happens when you ask PIA a question:

1. The system understands the question. If you’ve been talking for a few turns and you follow up with „and what about municipalities?”, a cheap, fast model first rewrites that into a standalone query using the conversation context: „youth participation at the municipal level”.
2. The system searches for matching chunks in your database, not by keywords but by meaning (exactly how, I explain in the next section).
3. It builds a „cheat sheet” for the model: it pastes the chunks it found into the prompt, numbered [1], [2], [3]…
4. The model writes the answer solely from that cheat sheet, inserting citation numbers: „The level of real participation in high schools is low [1]. The Youth Diagnosis 2026 indicates that only 23% of students… [1].”
5. The citations become clickable: [1] is a link to the exact source, which you can open and verify.

Figure: the five steps of RAG, from a question to an answer based solely on the retrieved chunks: the question, embedding the question into a vector of 3072 numbers, searching for the closest chunks in the database, pasting them into the prompt as a numbered cheat sheet, and the model’s answer with a clickable footnote on every sentence.
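For readers who want to see what the „cheat sheet” might look like in practice, here is a minimal sketch of the prompt-building step. It is an illustration under assumptions, not PIA's real prompt: the `Chunk` class, the `build_prompt` function, the instruction wording and the two example sources are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source_title: str
    text: str


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Paste the retrieved chunks into the prompt as a numbered list of sources."""
    numbered = "\n\n".join(
        f"[{i}] ({chunk.source_title})\n{chunk.text}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer ONLY from the numbered sources below. "
        "Cite the source number in square brackets after every claim. "
        'If the sources do not contain the answer, reply "no data".\n\n'
        f"SOURCES:\n{numbered}\n\n"
        f"QUESTION: {question}"
    )


# Hypothetical retrieved chunks, invented for illustration.
chunks = [
    Chunk("Hypothetical report A", "Student councils exist in most schools."),
    Chunk("Hypothetical report B", "Few students feel their voice changes decisions."),
]
prompt = build_prompt("How real is student participation?", chunks)
print(prompt)
```

The numbers the model sees here are the same numbers it is asked to cite, which is what lets the interface later turn [1] into a link to the right source.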

Why is this architecture so powerful? For four reasons a researcher will appreciate right away:

Trustworthiness — every statement has a footnote. There’s no „as is widely known”, there’s „[1] indicates”. You can click and judge the source yourself.
Currency — to make PIA aware of a new report, you just add it to the database. There’s no need to „retrain” the model (that would be a hundreds-of-thousands-of-dollars operation).
Staying on topic — ask it for the capital of Argentina, or anything unrelated, and you’ll get „no data in the database”, because I explicitly forbid the model from using knowledge outside your documents.
Auditability — the system logs every question, every answer and every cited source. You can reconstruct exactly what it answered and on what basis.
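The auditability point can be pictured as one log line per exchange. The sketch below assumes a JSON Lines format and invented field names (`timestamp`, `question`, `answer`, `cited_sources`); the article doesn't say how PIA actually stores its logs.

```python
import json
from datetime import datetime, timezone


def audit_record(question: str, answer: str, cited_sources: list[str]) -> str:
    """Serialise one question/answer exchange as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        "cited_sources": cited_sources,
    }
    # ensure_ascii=False keeps Polish characters readable in the log file.
    return json.dumps(record, ensure_ascii=False)


line = audit_record(
    "Co mówią badania o radach młodzieżowych?",
    "Rady młodzieżowe działają w wielu gminach [1].",
    ["Hypothetical report A"],  # invented source title
)
print(line)
```

Appending such lines to a file is enough to reconstruct later what was asked, what was answered and which sources were cited.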

On the hot path, where the quality of the answer matters most, we currently use the most capable model available (Claude Opus 4.8); for cheaper, auxiliary tasks we use lighter models. That too is a deliberate decision: the expensive model where the reader will feel it, the cheap one where they won’t.

Why „search by meaning” — and how that’s even possible

Let’s go back to step 2: how does the system know that „youth councils” and the Polish „rady młodzieżowe” mean the same thing, when they don’t share a single word?

The answer is the embedding: the conversion of every chunk of text into a vector of numbers (in our case, a list of 3072 numbers) in such a way that texts with similar meaning get numerically close vectors, even if the vocabulary is completely different.

An analogy that social scientists will appreciate helps here. Think of a multidimensional Likert scale. In a study of attitudes toward youth democracy you might have, say, 20 items: trust in institutions, sense of agency, attitude toward authority… Each respondent becomes a vector of 20 numbers. Two respondents with similar scores across those 20 dimensions have similar attitudes, even if they answered individual questions differently.

An embedding is exactly the same thing, only with 3072 dimensions instead of 20 — and with one crucial difference: those dimensions weren’t defined by a researcher. The model „learned” them itself from a huge corpus of text. You can’t name them („dimension 47 is formality”), but the distance between two vectors reliably tells you how close two texts are in topic. We turn the question into such a vector too, and look for the chunks whose vectors are closest — and this works across languages as well: an English document can „match” a Polish question.

Figure: an embedding turns a sentence into a vector of 3072 numbers, by analogy to a multidimensional Likert scale with 3072 dimensions; a cosine-similarity scale runs from 0.0 to 1.0. The closer two vectors are, the closer the texts are in topic, regardless of the words used and even of the language.
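To make „closeness of vectors” concrete, here is cosine similarity computed by hand. The vectors are invented toy examples with 4 dimensions instead of 3072; they did not come from any real embedding model and serve only to show the arithmetic.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product divided by the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration (a real embedding has 3072 numbers).
youth_councils = [0.9, 0.8, 0.1, 0.0]
rady_mlodziezowe = [0.8, 0.9, 0.2, 0.1]
steak_recipe = [0.0, 0.1, 0.9, 0.8]

sim_related = cosine_similarity(youth_councils, rady_mlodziezowe)
sim_unrelated = cosine_similarity(youth_councils, steak_recipe)
print(f"related:   {sim_related:.2f}")
print(f"unrelated: {sim_unrelated:.2f}")
```

The two „participation” vectors score close to 1, the steak recipe scores close to 0, and the search simply returns the chunks with the highest scores.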

Why this particular model and 3072 dimensions? Because in testing it handled nuance better than lighter variants, and the cost of computing a single query is negligible. But there’s one catch that forced a concrete engineering decision on me:

Polish text „costs” roughly 1.5–2× more than English.

Figure: tokenisation comparison (a slide from my presentation „AI in the service of participation”, in Polish). English „participation” is 1 token (~4 characters per token); Polish „partycypacja” is 4 tokens (~2–3 characters per token). Hence the lower embedding limit per chunk for Polish (14,000 characters instead of 24,000 for English) and the higher cost.

Models cut text into „tokens” (pieces of words). English splits into tokens economically (~4 characters per token); Polish — because of inflection, diacritics and longer words — much more densely (~2–3 characters per token). The embedding model has a hard limit on how many tokens it accepts at once. That’s why I set, based on my own knowledge, a limit of 14,000 characters per chunk to embed: for Polish that’s a safe margin, so the text fits within the model’s limit and doesn’t get „cut off mid-sentence”. It’s a good example of how the realities of the Polish language force decisions that wouldn’t exist in an English-only project — or would matter less. At the same time I rejected the idea of translating everything into English, embedding that, and then translating back into Polish — because I cared about the language and its nuance.
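The arithmetic behind the character limit can be sketched in a few lines. The characters-per-token figures are the rough ones given above (taking the worst case of 2 for Polish), the 24,000-character English limit comes from the slide, and both function names are hypothetical. The embedding model's actual token limit isn't stated here, so the sketch only estimates token counts and shows one assumed way to avoid cutting mid-sentence.

```python
import math

# Character limits per chunk to embed: 14,000 for Polish, 24,000 for English (from the slide).
MAX_CHARS = {"pl": 14_000, "en": 24_000}

# Rough characters-per-token figures; 2.0 is the worst case quoted for Polish.
CHARS_PER_TOKEN = {"pl": 2.0, "en": 4.0}


def estimated_tokens(num_chars: int, chars_per_token: float) -> int:
    """Estimate how many tokens a text of num_chars characters will use."""
    return math.ceil(num_chars / chars_per_token)


def trim_to_limit(text: str, max_chars: int) -> str:
    """Cut text to max_chars, backing up to the last sentence end if one is found."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_stop = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
    # Fall back to a hard cut when the slice contains no sentence boundary.
    return cut[: last_stop + 1] if last_stop != -1 else cut


for lang in ("pl", "en"):
    print(lang, estimated_tokens(MAX_CHARS[lang], CHARS_PER_TOKEN[lang]))

sample = "Pierwsze zdanie. Drugie zdanie jest dłuższe. Trzecie."
print(trim_to_limit(sample, 30))
```

With these figures, 14,000 Polish characters come to at most about 7,000 tokens and 24,000 English characters to about 6,000, so the two limits cost roughly the same number of tokens.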

Why the AI splits documents the way it does

If we embed chunks of „up to 14,000 characters”, what about longer documents? Documents vary wildly: some are a single page, others are whole books, and our database includes texts of over a million characters. There’s no point trying to turn such a book into a single vector: a vector „about everything” is really „about nothing”. Hence chunking: splitting a document into pieces.

My AI does this hierarchically, on two levels:

Child chunk (~1,000 characters, with a slight overlap at the edges) — this is what we search through. Small, precise, „about one thought”. As a result, the hit is accurate: instead of „this document mentions it somewhere”, we get „this exact paragraph talks about it”.
Parent chunk (~5,000 characters) — the broader context around the matched child. This is what we hand to the model to write the answer.

The analogy: an index card versus a chapter. You search through the cards — they’re small and it’s easy to find the right one among them. But once you find the card, to write a smart answer you read the whole chapter it came from, so you don’t pull a sentence out of context. Small chunks give you search accuracy; large ones give you richness of context — we take both.
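The index-card-and-chapter idea can be sketched as follows. This is a simplified illustration, not PIA's splitter: it cuts on raw character counts, whereas a real splitter would probably respect paragraph and sentence boundaries. The 100-character overlap is an assumed value for „a slight overlap”, and all names are hypothetical.

```python
from dataclasses import dataclass

PARENT_SIZE = 5_000   # ~5,000 characters of context handed to the model
CHILD_SIZE = 1_000    # ~1,000 characters, the unit we search through
CHILD_OVERLAP = 100   # assumed value; the article only says "a slight overlap"


@dataclass
class Child:
    text: str
    parent_id: int  # index of the parent chunk this child was cut from


def split_fixed(text: str, size: int, overlap: int = 0) -> list[str]:
    """Cut text into windows of `size` characters, each starting `size - overlap` after the last."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]


def chunk_document(text: str) -> tuple[list[str], list[Child]]:
    """Split a document into parent chunks, then split every parent into child chunks."""
    parents = split_fixed(text, PARENT_SIZE)
    children = [
        Child(piece, parent_id)
        for parent_id, parent in enumerate(parents)
        for piece in split_fixed(parent, CHILD_SIZE, CHILD_OVERLAP)
    ]
    return parents, children


document = "".join(str(i % 10) for i in range(12_000))  # stand-in for a 12,000-character text
parents, children = chunk_document(document)

# Search happens over the children; the model receives the matched child's parent.
matched = children[7]
context_for_model = parents[matched.parent_id]
print(len(parents), len(children), len(context_for_model))
```

Each child remembers which parent it came from, so a precise hit on a small card can always be expanded to the surrounding chapter.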

On top of that there’s a second search path: classic keyword search (full-text search), but in a version that understands Polish inflection — so that „partycypacją”, „partycypacji” and „partycypacja” are treated as the same word. Standard database tools can’t do this for Polish, so we used a specialised extension. The best results come from combining both methods — semantic and keyword — because each catches something the other misses.
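One common way to combine two result lists is reciprocal rank fusion (RRF). I am naming it here as an example technique only: the text above doesn't say how PIA merges its semantic and keyword results, so treat this as a sketch of one option. The chunk identifiers are invented, and `k = 60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each item scores 1 / (k + rank) per list, and the scores are summed."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)


# Hypothetical results from the two search paths, best match first.
semantic_hits = ["chunk_a", "chunk_b", "chunk_c"]
keyword_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Chunks found by both paths rise to the top, while a chunk found by only one path still stays in the list, which is the „each catches something the other misses” effect.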

What’s in it for you?

If you’re a researcher: PIA works like an automated systematic literature review. The same stages you know — defining queries, an initial screen by title and abstract, full reading, coding/tagging, synthesis — the system does on its own, in ~30 seconds per document instead of weeks. But curation stays on your side: you can reject any entry, archive it, star it as important, fix the tags. That’s a deliberate trade-off: I bet on speed, because the bottleneck isn’t the precision of the metadata — it’s your time spent reading.

Figure: a single source view in PIA (interface in Polish). The AI extracts metadata (authors, publisher, language, word count), tags and a „Why it matters” summary on its own, but the final word belongs to a human: Approve, Reject, fix the tags, add a note.

If you’re a young activist: instead of „I read somewhere that young people want to be asked about school matters”, you get a specific sentence with a source number you can click and show to a decision-maker. An argument backed by a citable source carries incomparably more weight than an impression.

And, to be honest about the limits — because understanding them is also part of using AI well:

No document — no answer. PIA doesn’t know the world beyond your database. That’s an advantage (it doesn’t make things up), but also a limitation (it’s only as good as what you put into it).
A bad question — weak hits. The more specific the question, the more accurate the chunks. A single word, „participation”, will find anything; a full sentence finds what you actually mean.
Watch the citations. If [1] is about a different matter than the sentence it sits next to, that’s a signal the model over-interpreted. Clickable citations exist precisely so you can catch this.
It’s not peer review. PIA helps you find and summarise, but judging a source’s scholarly value is up to you.
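Part of „watching the citations” can be automated, though only the mechanical part. The sketch below, with hypothetical function names, checks whether every citation number in an answer points to a source that was actually supplied. It cannot tell whether the source really supports the sentence; that judgement stays with the reader.

```python
import re


def cited_numbers(answer: str) -> set[int]:
    """Collect every citation number written as [n] in the answer."""
    return {int(number) for number in re.findall(r"\[(\d+)\]", answer)}


def check_citations(answer: str, num_sources: int) -> dict[str, list[int]]:
    """Report citations with no matching source, and sources that were never cited."""
    cited = cited_numbers(answer)
    valid = set(range(1, num_sources + 1))
    return {
        "dangling": sorted(cited - valid),  # cited, but no such source was supplied
        "unused": sorted(valid - cited),    # supplied, but never cited
    }


# Invented answer: three sources were supplied, yet the model cites [4].
answer = "Participation in high schools is low [1]. Councils rarely decide anything [4]."
report = check_citations(answer, num_sources=3)
print(report)
```

A dangling number is a clear warning sign; an empty report means only that the numbering is consistent, not that the answer is right.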

To close

My goal wasn’t to build a „magic box that knows everything”. It was a tool you can trust, because you can see what every sentence rests on — and one that a team with very different backgrounds can understand and use sensibly.

That’s why none of the decisions I described here is accidental: RAG instead of „bare” chat, embedding by meaning, the 14,000-character limit dictated by the specifics of Polish, two-level chunking, combining semantic search with inflectional search. Each is an answer to a concrete question: how to keep the AI from making things up, how to cope with Polish, how to handle both a one-page briefing and a whole book.

Because using AI well isn’t about blind faith in the answer. It’s about understanding how and why it came to be.

On this project I learned a lot myself about building small, specialised AI models with concrete domain knowledge. It’s now fairly easy to deploy an identical system with different knowledge — about sociology, teenagers’ mental-health problems, or anything else. I drew on my background in Polish-language NLP, because without it you couldn’t do tokenisation (splitting text into smaller pieces), lemmatisation and stemming (reducing words to their base form, e.g. „running”, „ran” to „run”) and meaning recognition properly. I hope PIA will also be a genuinely useful tool for the people on the team.
