The Research Layer

Peter: Iterative Research Over Explicit Knowledge

Foundation models compress knowledge implicitly into opaque weights. Walter builds it explicitly into inspectable graphs. Peter is the research layer that navigates those graphs — not by retrieving text, but by conducting iterative, human-like research: asking, inspecting, identifying gaps, refining, and drilling deeper.

The Research Loop

At the heart of Peter is an iterative research process — not a single query-and-retrieve step, but a loop that mirrors how a skilled human researcher works.

Peter navigates Walter's compiled knowledge graphs with dependency awareness, prerequisite chains, and supporting concepts — refining its understanding with each pass through the loop.

This approach transforms knowledge retrieval from "find similar documents" into "build structured understanding iteratively".

Peter's Research Process

1. Query the knowledge graph
2. Inspect the structure of results
3. Identify gaps and contradictions
4. Refine the question
5. Drill deeper or broaden scope
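
The loop can be sketched in a few lines. Everything here, from the `Finding` shape to the gap heuristic, is an illustrative assumption rather than Peter's actual API:

```python
# A minimal sketch of the research loop, assuming a hypothetical graph client
# exposing query() and contains(). Peter's real interface will differ.
from dataclasses import dataclass, field

@dataclass
class Finding:
    claim: str
    confidence: float
    prerequisites: list[str] = field(default_factory=list)

def research(graph, question: str, max_passes: int = 5) -> list[Finding]:
    findings: list[Finding] = []
    query = question
    for _ in range(max_passes):
        results = graph.query(query)              # 1. query the knowledge graph
        findings.extend(results)                  # 2. inspect the results
        gaps = [p for f in results for p in f.prerequisites
                if not graph.contains(p)]         # 3. identify missing prerequisites
        if not gaps:
            break                                 # no gaps left: understanding is built
        query = gaps[0]                           # 4-5. refine and drill deeper
    return findings
```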

Multi-Signal Knowledge Matching

Peter scores relevance using a hybrid approach that prevents false positives while capturing genuine conceptual relationships.

Hybrid Scoring Weights

Signal                  Weight
Semantic (embeddings)   50%
BM25 (keywords)         30%
Jaccard (entities)      20%
Temporal                × multiplier on the combined score
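
A minimal sketch of how these weights might combine, assuming each signal has already been normalised to [0, 1]; the exact formula is an assumption:

```python
# Weighted blend of three relevance signals, scaled by a temporal multiplier.
# Weights come from the table above; the normalisation is assumed.
SEMANTIC_W, BM25_W, JACCARD_W = 0.50, 0.30, 0.20

def hybrid_score(semantic: float, bm25: float, jaccard: float,
                 temporal: float) -> float:
    base = SEMANTIC_W * semantic + BM25_W * bm25 + JACCARD_W * jaccard
    return base * temporal  # temporal acts as a multiplier, not a fourth weight
```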

Why Multi-Signal Matters

Single-signal retrieval fails in predictable ways:

  • Semantic-only matches conceptually related but technically irrelevant content
  • Keyword-only misses paraphrased or differently-termed knowledge
  • Entity-only over-indexes on surface mentions without conceptual depth

Quality Gates

Peter's hybrid scoring with disqualification thresholds ensures:

  • Minimum semantic relevance required (conceptual gate)
  • Combined score must exceed threshold (quality gate)
  • Below-threshold matches explicitly rejected (no noise)
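
An illustrative version of the two gates; the threshold values are placeholders, not Peter's shipped defaults:

```python
# Disqualification thresholds: a match must clear both gates to be returned.
MIN_SEMANTIC = 0.35   # conceptual gate (placeholder value)
MIN_COMBINED = 0.50   # quality gate (placeholder value)

def passes_gates(semantic: float, combined: float) -> bool:
    if semantic < MIN_SEMANTIC:
        return False                 # conceptually unrelated, however strong the keywords
    return combined >= MIN_COMBINED  # below-threshold matches are rejected outright
```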

Two-Stage Retrieval Pipeline

Peter uses a two-stage retrieval architecture that optimises for both speed and precision:

STAGE 1: Bi-encoder (fast recall) → Top 100 → STAGE 2: Cross-encoder (high precision) → Top 20

Cross-encoders score the query and document together, achieving 15-30% higher precision than bi-encoders alone.
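
A sketch of the pipeline using the open-source sentence-transformers library. The checkpoints named here are common public models, not necessarily the ones Peter runs:

```python
# Stage 1 recalls broadly with a bi-encoder; stage 2 reranks precisely with a
# cross-encoder that scores query and document jointly.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve(query: str, corpus: list[str]) -> list[str]:
    # Stage 1: fast embedding search over the whole corpus -> top 100
    corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
    query_emb = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=100)[0]
    candidates = [corpus[h["corpus_id"]] for h in hits]

    # Stage 2: cross-encoder reranking of the candidates -> top 20
    scores = cross_encoder.predict([(query, doc) for doc in candidates])
    reranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in reranked[:20]]
```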

Temporal Modes

Not all knowledge ages equally. Peter applies three temporal modes to distinguish foundational concepts from time-sensitive developments.

Concept
No decay. Definitions, principles, and foundational standards contribute equally regardless of when compiled.

Event (~70d)
Exponential decay. Regulatory updates, rulings, and announcements lose relevance over time.

Hybrid (~140d)
Slow decay with 0.5 floor. Best practices and evolving interpretations retain at least half their relevance.
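
Read as decay curves, the three modes might look like this. Treating ~70d and ~140d as half-lives is an assumption; the real constants are Peter's internals:

```python
def temporal_weight(mode: str, age_days: float) -> float:
    if mode == "concept":
        return 1.0                                  # no decay, ever
    if mode == "event":
        return 0.5 ** (age_days / 70.0)             # halves roughly every 70 days
    if mode == "hybrid":
        return max(0.5, 0.5 ** (age_days / 140.0))  # slow decay, floored at 0.5
    raise ValueError(f"unknown temporal mode: {mode!r}")
```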

This Means

  • A foundational legal definition contributes equally regardless of compilation date
  • A specific enforcement action from 18 months ago contributes less than one from last month
  • Industry best practices retain at least 50% relevance even as they age
  • The system automatically balances recency against foundational importance

Knowledge Graph Navigation

Walter's knowledge graphs encode explicit relationships between concepts. When Peter researches a specific topic, it automatically surfaces the prerequisite knowledge, supporting concepts, and dependent ideas — providing the full context a researcher needs.

PREREQUISITE: Data Protection Principles
  └── supports → DEPENDENT: GDPR Article 22 Enforcement

PREREQUISITE: AI Ethics Frameworks
  └── supports → BUILDS ON: EU AI Act Implementation

PREREQUISITE: Jurisdictional Reach
  └── depends-on → DEPENDENT: Cross-Border Transfer Rulings

Relationship Type   Meaning
prerequisite        Must be understood before the dependent concept makes sense
supports            Foundational knowledge that reinforces the dependent idea
depends-on          Cannot be fully evaluated without the prerequisite
related             Thematic connection that provides useful context
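
A toy traversal over such a graph, following only the relationship types that gate understanding. The adjacency map is a stand-in for Walter's compiled output:

```python
# Walk prerequisite and depends-on edges to surface what must be read first.
EDGES: dict[str, list[tuple[str, str]]] = {
    "GDPR Article 22 Enforcement": [("prerequisite", "Data Protection Principles")],
    "Data Protection Principles": [("supports", "GDPR Article 22 Enforcement")],
    "Cross-Border Transfer Rulings": [("depends-on", "Jurisdictional Reach")],
}

def prerequisite_chain(concept: str) -> list[str]:
    chain: list[str] = []
    stack = [concept]
    while stack:
        for relation, target in EDGES.get(stack.pop(), []):
            if relation in ("prerequisite", "depends-on") and target not in chain:
                chain.append(target)   # must be understood first...
                stack.append(target)   # ...along with its own prerequisites
    return chain
```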

Gap and Conflict Detection

Peter doesn't just retrieve what's known — it identifies what's missing, where sources disagree, and what needs more evidence.

Gap Identification

  • Detects missing prerequisites in the knowledge graph
  • Flags concepts referenced but never compiled
  • Identifies thin areas where evidence is sparse
  • Surfaces assumptions that lack explicit support
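
One concrete version of the first two checks: concepts referenced as prerequisites but never compiled. The data shapes are assumptions:

```python
# units maps each compiled knowledge unit to the concepts it references.
def find_gaps(units: dict[str, list[str]], compiled: set[str]) -> set[str]:
    referenced = {c for refs in units.values() for c in refs}
    return referenced - compiled   # referenced but never compiled = a gap
```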

Conflict Surfacing

  • Compares claims across multiple compiled sources
  • Flags areas where sources reach different conclusions
  • Distinguishes genuine contradictions from scope differences
  • Prioritises conflicts by impact on the research question

Evidence Assessment

  • Evaluates how well-supported each claim is
  • Counts independent sources per knowledge unit
  • Flags single-source claims in high-stakes areas
  • Tracks confidence levels across the research path

Research Guidance

  • Suggests where to drill deeper based on gaps
  • Recommends broadening when context is missing
  • Proposes follow-up queries to resolve conflicts
  • Builds a map of what's known, unknown, and contested

Contradiction Detection

Knowledge sources often disagree. Peter uses Natural Language Inference (NLI) to detect and surface contradictions rather than hiding them.

How It Works

  • New claims are compared against semantically similar existing claims
  • NLI model classifies pairs as: entailment, contradiction, or neutral
  • Contradictions above confidence threshold are stored with relationship links
  • Both claims remain visible with conflict annotation
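
A sketch of the classification step using the Hugging Face transformers pipeline with a public NLI checkpoint; the model and the 0.8 threshold are stand-ins for Peter's actual choices:

```python
# Classify a claim pair as entailment / contradiction / neutral with an
# off-the-shelf MNLI model, and keep only confident contradictions.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def detect_contradiction(claim_a: str, claim_b: str,
                         threshold: float = 0.8) -> float | None:
    result = nli({"text": claim_a, "text_pair": claim_b})
    if result["label"] == "CONTRADICTION" and result["score"] >= threshold:
        return result["score"]   # store both claims with a conflict link
    return None                  # entailment or neutral: nothing to flag
```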

Why This Matters

  • Traditional systems flatten contradictions or pick one side arbitrarily
  • Research requires seeing where sources disagree
  • Confidence scores indicate strength of contradiction
  • Researchers can investigate and resolve conflicts

Example Contradiction

Claim A: "GDPR Article 22 prohibits automated decision-making"
CONTRADICTION (87% confidence)
Claim B: "GDPR Article 22 permits automated decisions with appropriate safeguards"

Both claims are preserved. The nuance (prohibition vs permitted-with-safeguards) is surfaced, not hidden.

Full Traceability Chain

Every assertion Peter surfaces traces back to sources. No "trust me" — every response is auditable to source documents.

Research Query Response
│
├── Knowledge Unit
│   ├── claim_text: "GDPR Article 22 requires..."
│   ├── confidence: 0.92
│   └── sources[]
│       ├── document: "EU_GDPR_2016_679.pdf"
│       ├── section: "Article 22, Paragraph 1"
│       └── context: "automated individual decision-making"
│
├── Knowledge Context
│   ├── related_units: 47
│   ├── prerequisite_chain: ["data-subject-rights", "lawful-basis"]
│   └── contradictions: ["member-state-derogations"]
│
└── Match Score Breakdown
    ├── semantic_score: 0.82
    ├── bm25_score: 0.71
    ├── jaccard_score: 0.45
    └── temporal_weight: 0.95

How Peter Differs from RAG

RAG answers: "Here are documents that seem related."
Peter answers: "Here's what you need to know, what you need to understand first, where the sources disagree, and exactly where each claim comes from."

Traditional RAG                           Peter
Knowledge implicit in model weights       Knowledge explicit in inspectable graphs
Retrieve similar chunks                   Navigate structured knowledge graphs
Single-stage embedding search             Two-stage: bi-encoder recall → cross-encoder precision
Single-signal matching                    Multi-signal hybrid scoring with disqualification
No temporal awareness                     Temporal modes for different knowledge types
Flat retrieval                            Knowledge graph navigation with prerequisite chains
Contradictions flattened or hidden        NLI-based contradiction detection and surfacing
Direct query embedding                    Query transformation (hypothetical document generation)
One-shot similarity                       Iterative refinement with prerequisite chains
"Here are related documents"              "Here's context, conflicts, and confidence"

Making Papers Think Together

Peter transforms compiled knowledge into navigable understanding.
  • Iterative research loop — Ask, inspect, identify gaps, refine, drill deeper
  • Knowledge graph navigation — Prerequisite chains and dependency awareness
  • Gap identification — Surfaces what's missing and under-evidenced
  • Multi-signal matching — Beyond semantic similarity
  • Temporal awareness — Foundational vs time-sensitive
  • Contradiction detection — NLI-powered conflict surfacing
  • Full traceability — Every claim auditable to source
  • Two-stage retrieval — Bi-encoder recall, cross-encoder precision

Walter compiles the knowledge. Peter makes it think.
Together, they form the basis of a new kind of cognitive infrastructure.
