Created with the help of AI

Category — Understand

Retrieved Source Attribution

provenance, causal reasoning, cognitive accessibility, generative AI

User question

What sources did the system draw from?

Consulting signal

Relevant for any client deploying a RAG-based knowledge system, such as internal search, policy assistants, or support chatbots, where users need to verify answers or trace responses back to source documents.

Overview

Why this pattern exists

A knowledge assistant returns an answer. It sounds right. It's well-structured and specific. A user in a legal, medical, or compliance context acts on it. Later, someone asks: where did that come from? Was it from the organization's documentation, or from the model's general training? Was it current? The interface provides no way to answer any of these questions.

This is the gap that RAG (Retrieval-Augmented Generation) systems create when they aren't paired with attribution design. The architecture retrieves documents at runtime to generate a response, which means there are sources and a retrievable basis for the answer. But if the interface doesn't surface them, users have no way to verify the answer, trace it back, or assess how reliable the retrieval was.

Retrieved Source Attribution makes the retrieval layer visible. It is distinct from Data Provenance & Lineage, which concerns training data: the long-term substrate the model was built on. This pattern concerns what the system looked up for this specific response, and how closely that material actually matched the question. Both matter. They happen at different points in the pipeline and require different design responses.

Design goal

Surface the documents, passages, or records the system retrieved at runtime, so users can verify the basis of a response, assess its reliability, and access primary sources directly.

Usage guidance

When to use

The system retrieves content from a document store, knowledge base, or external index to produce its output
Users need to verify claims or trace answers to primary sources
The domain has high stakes for accuracy (legal, medical, financial, policy)
The system may retrieve from multiple conflicting sources
Users are researchers, analysts, or professionals with a need to cite or audit

When not to use

The system generates purely from model weights with no retrieval step
The sources are confidential, proprietary, or legally restricted from disclosure
Attribution would expose system architecture that creates a security risk
The task is low-stakes and source disclosure would add noise without value (e.g. casual chitchat)

Design

UI primitives

Inline Signal / Marker

Inline citation markers

Superscript numbers or footnote markers anchored to specific claims in the output, not just a list at the bottom. Users can see exactly which sentence came from which source.

Data Visualization / Highlight

Passage highlight

When a user clicks or hovers a citation marker, the retrieved passage is highlighted or shown in context, not just the document title.

Inline Signal / Indicator

Relevance indicator

A visual signal (bar, dot, percentage) showing how closely a retrieved source matched the query. Helps users assess whether the retrieval was a strong match or a loose approximation.

Content Block / Summary

Source count summary

A compact indicator in the interface, such as "Based on 4 sources", that gives a quick sense of whether the answer is well-grounded or based on a single retrieved document.

Contextual Overlay / State Display

No source found state

An explicit design state for when the system generated an answer without retrieving supporting material. This is not a failure state: it's a transparency signal that the response came from model knowledge, not a document.

How to use

Layer the disclosure.

Most users don't need to see all retrieved passages by default. Show a compact source count inline, expand to titles on demand, expand to full passages on further request.
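The three disclosure levels can be sketched as a single rendering function. This is a minimal illustration, not a reference implementation: the `Source` type, the `render_sources` function, and the sample documents are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical record for one retrieved document."""
    title: str
    passage: str


def render_sources(sources: list[Source], level: int) -> list[str]:
    """Return the lines to display at a given disclosure level.

    level 0: compact source count only
    level 1: count plus source titles
    level 2: count, titles, and the retrieved passages
    """
    noun = "source" if len(sources) == 1 else "sources"
    lines = [f"Based on {len(sources)} {noun}"]
    if level >= 1:
        for i, src in enumerate(sources, start=1):
            lines.append(f"[{i}] {src.title}")
            if level >= 2:
                lines.append(f"    {src.passage}")
    return lines


# Sample data is invented for illustration.
sources = [
    Source("Travel Policy", "Economy class is required for short flights."),
    Source("Expense FAQ", "Receipts are required for all claims."),
]
for level in range(3):
    print("\n".join(render_sources(sources, level)))
    print()
```

Keeping the level as an explicit parameter means the default view stays compact, and each expansion is a deliberate user action rather than something the interface decides on its own.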

Anchor citations to claims, not just responses.

A list of sources at the bottom of a response is weak attribution. Inline markers tied to specific sentences are far more useful and honest.
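One way to make claim-level anchoring concrete is to store source references per sentence rather than per response. The sketch below assumes the generation step can supply that mapping, which not every pipeline does; `Claim`, `render_with_markers`, and `uncited_claims` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Claim:
    """One sentence of the response and the sources that support it."""
    text: str
    source_ids: list[int] = field(default_factory=list)


def render_with_markers(claims: list[Claim]) -> str:
    """Join the claims into one response, appending [n] markers per claim."""
    parts = []
    for claim in claims:
        markers = "".join(f"[{sid}]" for sid in claim.source_ids)
        # rstrip() drops the trailing space when a claim has no markers.
        parts.append(f"{claim.text} {markers}".rstrip())
    return " ".join(parts)


def uncited_claims(claims: list[Claim]) -> list[str]:
    """Return the sentences that carry no source reference at all."""
    return [c.text for c in claims if not c.source_ids]


# Sample content is invented for illustration.
response = [
    Claim("Short flights must be booked in economy class.", [1]),
    Claim("Exceptions need manager approval.", [1, 2]),
    Claim("Most requests are approved quickly."),
]
print(render_with_markers(response))
print(uncited_claims(response))
```

A per-claim structure also exposes which sentences have no supporting source, something a flat list at the bottom of the response cannot show.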

Distinguish retrieval strength.

A source retrieved with 0.94 cosine similarity is different from one retrieved at 0.51. That difference should be visible, especially in high-stakes contexts.
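As a rough sketch of how a raw similarity score might become a visible signal: the function below computes cosine similarity between two embedding vectors and maps the score to a label. The thresholds (0.8 and 0.6) and the label wording are assumptions made for illustration; raw scores vary between embedding models, so real cut-offs would need calibration against the deployed retriever.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as no match
    return dot / (norm_a * norm_b)


def strength_label(score, strong=0.8, weak=0.6):
    """Map a similarity score to a user-facing label.

    The default thresholds are hypothetical, not recommended values.
    """
    if score >= strong:
        return "strong match"
    if score >= weak:
        return "partial match"
    return "loose match"


print(strength_label(0.94))  # strong match
print(strength_label(0.51))  # loose match
```

A labeled bucket is often easier for non-technical users to read than a bare decimal, though the numeric score can still be offered on demand.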

Show the "no source" state explicitly.

If a response was generated without retrieval, say so. This is as important as showing sources when they exist: it tells the user the basis has changed.
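A minimal sketch of treating "no source" as a first-class state rather than an empty list: the hypothetical `attribution_banner` function below always returns a message, so the interface never silently omits the attribution area. The message wording follows the examples used elsewhere in this pattern.

```python
def attribution_banner(retrieved_count: int) -> str:
    """Return the attribution line for a response.

    Zero retrieved sources is an explicit state with its own message,
    not a reason to hide the banner.
    """
    if retrieved_count < 0:
        raise ValueError("retrieved_count cannot be negative")
    if retrieved_count == 0:
        return "No sources retrieved: response based on model knowledge."
    noun = "source" if retrieved_count == 1 else "sources"
    return f"Based on {retrieved_count} {noun}"


print(attribution_banner(0))
print(attribution_banner(3))
```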

Don't conflate sources with correctness.

A response can cite a real source and still misrepresent or misquote it. Attribution is not a guarantee of accuracy: pair this pattern with Grounding & Hallucination Indicators when accuracy verification matters.

Use cases

Flow A

Verify a specific claim

1. User receives a response with inline citation markers.
2. User clicks [2] on a specific sentence.
3. Passage panel expands showing the retrieved excerpt, source title, and relevance score.
4. User clicks through to the full original document.

Flow B

Assess overall grounding

1. User sees "Based on 3 sources" in the response header.
2. User opens the source panel.
3. User sees one source is highly relevant, two are marginal.
4. User decides to re-prompt or seek additional verification.

Flow C

No source found

1. User asks a question outside the document corpus.
2. System responds but shows "No sources retrieved: response based on model knowledge."
3. User is prompted to verify independently or provide a document.

Design trade-offs

Transparency vs. cognitive load

Showing all retrieved passages by default overwhelms most users. Default to summary, expand on demand.

Attribution vs. false confidence

Displaying a clean source list can make a response feel more authoritative than it is. Use uncertainty signals alongside attribution.

Source disclosure vs. system security

In some deployments, revealing retrieved documents exposes proprietary document stores. Consider showing document types or categories rather than full paths when disclosure must be limited.
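Limited disclosure can be sketched as a display policy applied to each retrieved document. Everything here is hypothetical: the `RetrievedDoc` fields, the `Disclosure` levels, and the sample path are assumptions for illustration, and a real deployment would derive the policy from its own access rules.

```python
from dataclasses import dataclass
from enum import Enum


class Disclosure(Enum):
    FULL = "full"                    # title and path may be shown
    CATEGORY_ONLY = "category_only"  # only the document type is shown


@dataclass
class RetrievedDoc:
    path: str
    title: str
    category: str


def display_label(doc: RetrievedDoc, disclosure: Disclosure) -> str:
    """Return the text shown to the user for one retrieved document."""
    if disclosure is Disclosure.FULL:
        return f"{doc.title} ({doc.path})"
    # Restricted mode: reveal that a source exists and what kind it is,
    # without exposing its title or location in the document store.
    return f"{doc.category} document"


# Sample document is invented for illustration.
doc = RetrievedDoc(
    path="/hr/policies/leave.pdf",
    title="Leave Policy",
    category="HR policy",
)
print(display_label(doc, Disclosure.FULL))
print(display_label(doc, Disclosure.CATEGORY_ONLY))
```

Even in the restricted mode the user still learns that the answer rests on a retrieved document and what type it is, which preserves part of the attribution signal.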

Connections

Relation to other patterns

Data Provenance & Lineage

Where does this data come from?

Give users visibility into the origin, quality, and appropriateness of the data used in an AI decision, at both the model training level and the specific decision level.

Provenance concerns training data; this pattern concerns runtime retrieval. They address different moments in the AI pipeline.

Grounding & Hallucination Indicators

Can I trust what the system generated?

Give users visible signals about the factual grounding of AI-generated content, distinguishing between responses that are well-supported by retrievable evidence and those that may contain fabricated, outdated, or unverifiable claims.

Attribution shows what was retrieved; Grounding signals whether the output faithfully reflects it. Use together in high-stakes contexts.

Confidence & Uncertainty

How certain is the system?

Surface the AI system's confidence level and uncertainty range in a way that is immediately visible, readable by non-technical users, and calibrated to the actual uncertainty in the output, preventing both over-trust and unnecessary alarm.

Retrieval relevance scores complement output confidence signals.

Inspection Dialogue

Can I ask the system follow-up questions?

Provide an interactive mechanism through which users can ask follow-up questions about a decision or output, probe specific factors, test hypothetical changes, and receive direct answers, without requiring technical knowledge.

A user asking "where did you get that?" is a natural continuation of this pattern.

Sources

Lewis et al. (2020) — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Foundational paper establishing RAG as an architecture. Introduces the retrieval-generation split that makes source attribution both possible and necessary.

Es et al. (2023) — RAGAS: Automated Evaluation of Retrieval Augmented Generation

Introduces faithfulness and context relevance as measurable properties of RAG outputs. Provides the evaluative framing this pattern operationalizes for users.

Gebru et al. (2018) — Datasheets for Datasets

While focused on training data, its framework for documenting data sources, collection methods, and intended use informs how retrieved sources should be disclosed.

Created as a side project by Christian Laesser & AI