Step 1: Upload Files & Configure Structure
Database File Structure
Questions File Structure
Step 2: Validation & Configuration
After upload, the app validates both files. Possible statuses: "Validating files...", "✓ Files validated successfully", "✗ Validation failed", or "⚠ Warnings". When a correct answers column is present, it is validated as well ("✓ Correct answers column validated").
Database Columns Preview
| Letter | Name | Description | Unit |
|---|---|---|---|
Questions Preview
| # | Metric | Question |
|---|---|---|
Correct Answers Preview
| # | Metric | Expected Answer |
|---|---|---|
Pipeline Configuration
Configure how the matching pipeline processes questions. Expand each stage to learn what it does and tune its parameters.
Stage 0. Learning Normalization: process the learning rules file
Processes your learning.txt file to normalize rules, detect conflicts, and optimize for matching. Runs once at startup before processing questions.
Stages 1-3. Candidate Retrieval: find potential column matches
Combines BM25 keyword search with semantic embeddings to find candidate database columns. Uses hybrid search with Reciprocal Rank Fusion (RRF) to balance exact keyword matches with semantic similarity.
Candidates by Database Size
Adaptive Weights (Dense vs BM25)
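The hybrid retrieval described above can be sketched as follows. `rrf_fuse` is a hypothetical helper, and both the damping constant `k=60` and the 50/50 weight split are illustrative defaults, not the app's actual settings:

```python
# Sketch of weighted Reciprocal Rank Fusion (RRF) over two rankings of
# database column letters: one from BM25, one from dense embeddings.

def rrf_fuse(bm25_ranked, dense_ranked, k=60, dense_weight=0.5):
    """Fuse two ranked lists of column IDs (best first) into one.

    k: RRF damping constant (60 comes from the original RRF paper).
    dense_weight: adaptive weight for the dense list; BM25 gets the rest.
    """
    scores = {}
    for rank, col in enumerate(bm25_ranked, start=1):
        scores[col] = scores.get(col, 0.0) + (1 - dense_weight) / (k + rank)
    for rank, col in enumerate(dense_ranked, start=1):
        scores[col] = scores.get(col, 0.0) + dense_weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A column ranked well by both retrievers beats one ranked well by only one:
fused = rrf_fuse(["A", "B", "C"], ["B", "D", "A"])  # "B" wins
```

Because RRF works on ranks rather than raw scores, the BM25 and embedding scores never need to be normalized against each other.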
Stage 4. AI Reranking: filter candidates with AI
Uses AI to re-score and filter candidate columns based on question context and classification. Narrows down from K candidates to a smaller set for detailed matching.
Stages 5-6. Main Matching & Certainty: core matching algorithm
Primary GPT-4o chain-of-thought matching that analyzes candidates and selects the best match(es). Calculates certainty score combining embedding similarity and AI confidence.
Certainty Level Thresholds
Certainty score determines the confidence level: Uncertain < Probable < Certain. Below the empty answer threshold, the answer is left empty.
Thresholds must be in order: Empty Answer < Uncertain < Probable
Certainty Score Weights
Controls how certainty score is calculated. Weights must sum to 1.0.
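A minimal sketch of the certainty calculation and level mapping described above. The threshold and weight values here are illustrative, not the app's actual defaults:

```python
# Thresholds must be ascending: empty answer < Uncertain < Probable.
EMPTY_ANSWER_T, UNCERTAIN_T, PROBABLE_T = 0.30, 0.55, 0.80

def certainty_score(embedding_sim, ai_confidence,
                    embedding_weight=0.4, confidence_weight=0.6):
    """Weighted combination of embedding similarity and AI confidence."""
    assert abs(embedding_weight + confidence_weight - 1.0) < 1e-9, \
        "Weights must sum to 1.0"
    return embedding_sim * embedding_weight + ai_confidence * confidence_weight

def certainty_level(score):
    """Map a certainty score to its categorical level."""
    if score < EMPTY_ANSWER_T:
        return "No Data"      # answer is left empty
    if score < UNCERTAIN_T:
        return "Uncertain"
    if score < PROBABLE_T:
        return "Probable"
    return "Certain"

score = certainty_score(0.72, 0.90)   # 0.72*0.4 + 0.90*0.6 = 0.828
level = certainty_level(score)        # "Certain" with these thresholds
```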
Stage 7. Self-Consistency Sampling: verify with multiple samples
When initial confidence is low, generates multiple samples with different temperatures and uses voting to verify the answer. Helps catch unstable matches where AI gives different answers.
Confidence Bonus Settings
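The sampling-and-voting step can be sketched as follows; `sample_fn` stands in for one matching call and its signature is an assumption, as are the temperature values:

```python
from collections import Counter

def self_consistency_vote(sample_fn, temperatures=(0.0, 0.4, 0.8)):
    """Re-run the matcher at several temperatures and take a majority vote.

    sample_fn(temperature) -> answer (e.g. a column letter).
    Returns (winning_answer, agreement_ratio); low agreement signals an
    unstable match that should be flagged for review.
    """
    answers = [sample_fn(t) for t in temperatures]
    (winner, votes), = Counter(answers).most_common(1)
    return winner, votes / len(answers)
```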
Stage 8. Ambiguity Detection: find alternative matches
Detects when multiple database columns could reasonably answer the question. Flags high/medium ambiguity for human review and lists alternative matches.
Stages 9-10. Validation & Retry: verify answer quality
Validates the selected answer for scope/type/semantic mismatches. If validation finds issues and confidence is below threshold, re-runs matching with feedback.
G. API Retry Settings: handle rate limits
Controls retry behavior for API calls that fail with rate-limit or server errors. Uses exponential backoff with jitter to ride out Azure's tokens-per-minute limits, which are enforced over 60-second windows.
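A sketch of exponential backoff with full jitter as described above; the retry count and delay values are illustrative, not the app's configured defaults:

```python
import random
import time

def with_retries(call, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Run call(); on failure, retry with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            # Exponential growth capped at max_delay, with full jitter so
            # parallel workers do not retry in lockstep against the limit.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

Capping the delay at 60 seconds matches the length of the rate-limit window: waiting longer than one window gains nothing.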
G. Embedding & JSON Parsing: technical settings
Controls embedding generation batch size and JSON response parsing/repair behavior.
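A sketch of the kind of JSON repair this setting governs: model responses sometimes wrap JSON in markdown fences or surround it with prose, so strip those before parsing. The heuristics here are illustrative, not the app's exact logic:

```python
import json
import re

def parse_model_json(text):
    """Parse JSON out of a model response, repairing common wrappers."""
    # Strip a ```json ... ``` fence if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span to drop leading/trailing prose.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            return json.loads(text[start:end + 1])
        raise
```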
Step 3: Configure & Run
Run Options
Confidence Thresholds
Cache Options
Progress
Live Logs
Results
| # | Metric | Answer | Match Name | Expected | Expected Name | Certainty | Score | Status |
|---|---|---|---|---|---|---|---|---|
Output File & Logs Documentation
Output Excel File
The downloaded Excel file contains a single sheet named Results. Column headers are in row 1; data starts at row 2 with one row per question.
| Col | Header | Type | Values / Format | Description |
|---|---|---|---|---|
| A | Row ID | Number | e.g. 1, 2, 3 | Original row identifier from the questions file |
| B | Metric | String | Free text | Question metric name (short label) |
| C | Definition | String | Free text | Full question text / definition |
| D | Answer | String | Column letter(s), space-separated | Matched database column letter(s). Empty if no match. Up to 3 columns. |
| E | Match Name | String | Pipe-separated names | Full name(s) of matched column(s) including unit. Empty if no match. |
| F | Verification | String | Free text | Proof text from column descriptions showing why this match is correct |
| G | Explanation | String | Free text | Chain-of-thought reasoning from the AI explaining the matching logic |
| H | Confidence | Number | 0.00 – 1.00 | AI's self-assessed confidence in the match |
| I | Needs Review | String | YES / NO | Human review flag. YES when: confidence < review threshold, samples disagree, ambiguity is MEDIUM/HIGH, or validation finds issues |
| J | Certainty Score | Number | 0.00 – 1.00 | Combined score = (embedding similarity × weight) + (AI confidence × weight) |
| K | Certainty Level | String | Certain / Probable / Uncertain / No Data | Categorical level derived from certainty score thresholds |
| L | Expected Answer | String | Column letter(s) | Optional — expected correct answer (only when validation file provided) |
| M | Expected Name | String | Full name(s) | Optional — full name(s) of expected column(s) (only when validation file provided) |
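A small sketch of post-processing the Results sheet: collect the rows flagged for human review (column I) and sort them by certainty score (column J), worst first. Rows are shown here as plain tuples in column order A..K; in practice you would load them from the Excel file with openpyxl or pandas. The example metric names are made up:

```python
def rows_needing_review(rows):
    """Return review-flagged rows, lowest certainty score first."""
    flagged = [r for r in rows if r[8] == "YES"]   # column I: Needs Review
    return sorted(flagged, key=lambda r: r[9])     # column J: Certainty Score

rows = [
    # (Row ID, Metric, Definition, Answer, Match Name, Verification,
    #  Explanation, Confidence, Needs Review, Certainty Score, Certainty Level)
    (1, "Scope 1 emissions", "...", "D", "Scope 1 emissions (tCO2e)",
     "...", "...", 0.92, "NO", 0.88, "Certain"),
    (2, "Water use", "...", "", "", "...", "...", 0.41, "YES", 0.37, "Uncertain"),
]
review_queue = rows_needing_review(rows)  # only row 2 is flagged
```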
Logs ZIP
The downloaded ZIP contains one timestamped directory with per-question subdirectories:
{YYYY-MM-DD - HH:MM}/
├── run-summary.json
├── Question 01 - 5/
│ ├── ia-calls.json
│ ├── Logs.json
│ ├── 01-QueryExpansion-Prompt.txt
│ ├── 01-QueryExpansion-Response.txt
│ ├── 06-MainMatching-Prompt.txt
│ ├── 06-MainMatching-Response.txt
│ └── ...
├── Question 02 - 6/
│ └── ...
| File | Description |
|---|---|
| run-summary.json | Overall run statistics: question count, total tokens, cost, duration |
| ia-calls.json | Consolidated log of all AI calls for a question: model, tokens, cost, duration, cache status, parsed results |
| Logs.json | Legacy metadata: question info, candidate count, matching attempts with temperature and confidence |
| {NN}-{Stage}-Prompt.txt | Raw prompt text sent to the AI at each pipeline stage |
| {NN}-{Stage}-Response.txt | Raw AI response at each pipeline stage |
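Extracted log directories can be aggregated programmatically, e.g. to total cost across several runs. The JSON field names used below (`questions`, `total_tokens`, `cost_usd`) are assumptions; check the keys in your own run-summary.json:

```python
import json
from pathlib import Path

def summarize_runs(logs_root):
    """Sum run-summary.json stats across all timestamped run directories."""
    totals = {"questions": 0, "total_tokens": 0, "cost_usd": 0.0}
    for summary_path in Path(logs_root).glob("*/run-summary.json"):
        data = json.loads(summary_path.read_text(encoding="utf-8"))
        for key in totals:
            totals[key] += data.get(key, 0)  # missing keys count as zero
    return totals
```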
AI Pipeline Stages
| # | Stage | Description |
|---|---|---|
| 00 | LearningNormalization | Normalize and deduplicate learning rules |
| 01 | QueryExpansion | Expand question for better embedding search |
| 02 | ColumnInjection | Force-include columns from learning rules |
| 03 | PreClassification | Classify question type, scope, ESG category |
| 04 | Embedding | Semantic similarity search |
| 05 | Reranking | AI re-scoring of candidates |
| 06 | MainMatching | Primary GPT-4o column matching |
| 07 | SelfConsistency | Additional samples for verification |
| 08 | AmbiguityDetection | Detect alternative matches |
| 09 | Validation | Verify scope/type alignment |
| 10 | ValidationRetry | Retry with correction if validation fails |
| 11 | ColumnClarification | Disambiguate similar columns |
Accuracy Report
📝 Generated Learning Rules
These rules were generated from mismatched answers. Copy and add to your learning file to improve future matches.