SBPI Prediction Pipeline — Full Stack Execution

April 2, 2026 | ShurAI Internal Technical Report

Session Overview

In a single session, we executed the complete SBPI prediction pipeline from infrastructure recovery through live prediction generation. Six interdependent steps were completed: config deployment, critical source code recovery, knowledge graph expansion, W14 prediction generation, signal weight optimization, and brand intelligence card creation. The pipeline is now fully operational with a dual-config test locked for W14 evaluation.

Key Metrics

Metric	Value	Detail
RDF triples in Oxigraph	4,052	up from 1,672
W14 predictions locked	105	5 methods × 21 companies
Weeks of data loaded	4	W10–W13
Signal weight optimization	66.7%	Optuna TPE best score
Brand intel cards	21	deployed and live
Live sites deployed	4	Cloudflare Pages

Pipeline Steps Completed

1. Infrastructure Recovery

Decompiled the missing sbpi_to_rdf.py from bytecode. Worked around the Oxigraph store lock by loading triples through SPARQL INSERT DATA batches.

2. Knowledge Graph Expansion

Loaded 4 weeks of data (W10–W13), growing the store from 1,672 to 4,052 triples (+142%).

3. Prediction Pipeline Upgrade

Added kg_optimized as 5th prediction method. Dual-config test locked for W14.

4. W14 Predictions Generated

105 predictions locked across 5 methods and 21 companies.

5. Signal Weight Optimization

Optuna TPE optimizer (30 trials) achieved 66.7% appropriate rate on synthetic labels.

6. Brand Intelligence Cards

21 sortable company intelligence cards deployed to sbpi-brand-intel.pages.dev.

Infrastructure Recovery

Two critical infrastructure issues were resolved before the pipeline could execute.

sbpi_to_rdf.py Recovery

The source file for the RDF ETL pipeline was missing. Only a compiled .pyc existed in __pycache__. An agent decompiled the bytecode, recovering all 7 functions and 63 module-level names with exact fidelity. The reconstructed script was verified to generate 1,033 triples from W13 data.

A null-safety patch was applied for archive files with missing previous_composite fields.
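
The effect of the null-safety patch can be pictured with a minimal sketch. This is not the recovered code: the `composite` key and the `safe_delta` helper are hypothetical, and only the `previous_composite` field name comes from this report.

```python
from typing import Optional


def safe_delta(record: dict) -> Optional[float]:
    """Return composite minus previous_composite, or None when either is absent."""
    composite = record.get("composite")           # hypothetical key name
    previous = record.get("previous_composite")   # field named in this report
    if composite is None or previous is None:
        # Archive files may lack the previous score; skip the delta
        # instead of raising a TypeError on None arithmetic.
        return None
    return float(composite) - float(previous)
```

A caller would then omit the delta triple whenever the helper returns None.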

Recovered Functions

# 7 functions recovered from bytecode decompilation (signatures only)
def load_state_file(path): ...
def state_to_graph(state, week_label): ...
def link_weeks(graph, weeks): ...
def validate_graph(graph): ...
def load_to_oxigraph(graph, endpoint): ...
def run_sample_queries(endpoint): ...
def main(): ...
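
One way to cross-check a recovered function list against the original bytecode is to read the code objects straight out of the .pyc file. The sketch below assumes a CPython 3.7+ .pyc (16-byte header) read by the same Python version that compiled it. The helper name `list_pyc_functions` is illustrative and not part of the pipeline.

```python
import marshal
import types
from typing import List

PYC_HEADER_BYTES = 16  # CPython 3.7+: magic, flags, then mtime/size or hash


def list_pyc_functions(pyc_path: str) -> List[str]:
    """Return the names of code objects defined at module level in a .pyc."""
    with open(pyc_path, "rb") as fh:
        fh.read(PYC_HEADER_BYTES)        # skip the header
        module_code = marshal.load(fh)   # top-level (module) code object
    # Functions appear as nested code objects among the module constants.
    # Class bodies, lambdas, and comprehensions show up here as well.
    return [
        const.co_name
        for const in module_code.co_consts
        if isinstance(const, types.CodeType)
    ]
```

Comparing this list with the function names in the reconstructed source gives a quick sanity check that nothing was dropped.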

CLI Interface

# Available command-line options
--current        # Process current week only
--all            # Process all available weeks
--validate       # Run graph validation checks
--serve          # Start local Oxigraph server
--output-turtle  # Export as Turtle format
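
A parser exposing these flags could look like the sketch below. The flag names and descriptions come from the list above. Treating every flag as a boolean switch is an assumption, since the recovered script may accept arguments (for example an output path for `--output-turtle`).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the five documented flags (all assumed boolean)."""
    parser = argparse.ArgumentParser(prog="sbpi_to_rdf.py")
    parser.add_argument("--current", action="store_true",
                        help="Process current week only")
    parser.add_argument("--all", action="store_true",
                        help="Process all available weeks")
    parser.add_argument("--validate", action="store_true",
                        help="Run graph validation checks")
    parser.add_argument("--serve", action="store_true",
                        help="Start local Oxigraph server")
    parser.add_argument("--output-turtle", action="store_true",
                        help="Export as Turtle format")
    return parser
```

With this layout, `build_parser().parse_args(["--all", "--validate"])` sets `all` and `validate` to True and leaves the other flags False.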

Oxigraph Store Lock Resolution

The Oxigraph server (PID 2674) held an exclusive lock on the store, blocking the standard pyoxigraph file-access pattern.

Resolution Path

Generated N-Triples via rdflib from the state files.
The /store POST endpoint accepted the data (HTTP 201) but did not persist it.
SPARQL INSERT DATA batches (50 triples per batch), sent through the HTTP endpoint on port 7878, were the working path and persisted triples reliably.
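
The batching approach can be sketched with the standard library alone. The batch size of 50 and port 7878 come from this report. The `/update` path, the `localhost` host, and all function names are assumptions for illustration, not the pipeline's actual code.

```python
import urllib.request
from typing import Iterable, Iterator, List


def batched(lines: Iterable[str], size: int = 50) -> Iterator[List[str]]:
    """Group non-empty N-Triples lines into batches of at most `size`."""
    batch: List[str] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and N-Triples comments
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_insert_data(triples: List[str]) -> str:
    """Wrap N-Triples statements in a SPARQL 1.1 INSERT DATA update."""
    return "INSERT DATA {\n" + "\n".join(triples) + "\n}"


def post_update(update: str,
                endpoint: str = "http://localhost:7878/update") -> int:
    """POST one update to the (assumed) SPARQL update endpoint."""
    request = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Loading a file then amounts to calling `post_update(build_insert_data(batch))` for each batch. Small batches keep each request short and make a failed batch easy to retry.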

Prediction Pipeline Upgrade

The prediction experiment script was upgraded from 4 methods to 5, adding the Optuna-tuned kg_optimized configuration for a dual-config head-to-head test.

Method Comparison

Method	Description	W14 Predictions
persistence	Predicts no change (delta = 0)	21 STABLE
naive_momentum	Predicts delta = last week's delta	8 UP 8 STABLE 5 DOWN
mean_reversion	Predicts reversion toward tier midpoint	21 UP
kg_default	Original hardcoded thresholds (untuned)	21 STABLE
kg_optimized	Optuna-tuned 12-parameter config	13 UP 8 STABLE
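
The three baseline methods are simple enough to sketch from their descriptions in the table. The function names, the 0.5 direction threshold, and the reversion rate are illustrative assumptions. The experiment script may classify directions differently.

```python
def classify(delta: float, threshold: float = 0.5) -> str:
    """Map a predicted delta to UP, DOWN, or STABLE (threshold is assumed)."""
    if delta > threshold:
        return "UP"
    if delta < -threshold:
        return "DOWN"
    return "STABLE"


def persistence() -> float:
    """Predict no change."""
    return 0.0


def naive_momentum(last_delta: float) -> float:
    """Predict that last week's delta repeats."""
    return last_delta


def mean_reversion(composite: float, tier_midpoint: float,
                   rate: float = 0.1) -> float:
    """Predict a move of `rate` times the gap to the tier midpoint."""
    return rate * (tier_midpoint - composite)
```

Under these definitions persistence always yields STABLE, which matches the 21 STABLE predictions in the table.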

Dual-Config Test

kg_optimized uses the Optuna TPE-optimized parameters from best-config.json (69.9% training accuracy on W10–W12 data). It imports load_best_config() and predict_with_config() from kg_interface_optimizer.py and falls back to the default config if best-config.json is missing.
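
The fallback behavior might be implemented along these lines. This is a sketch, not the code in kg_interface_optimizer.py: the helper name `load_config_or_default` and the flat JSON layout are assumptions, and the default values are the ones listed in this report's parameter table.

```python
import json
from pathlib import Path

# Defaults as listed in this report; the real config has 12 parameters.
DEFAULT_CONFIG = {
    "direction_threshold": 0.5,
    "confidence_base": 0.6,
    "mean_reversion_rate": 0.1,
    "anomaly_contributes": False,
}


def load_config_or_default(path: str = "best-config.json") -> dict:
    """Return defaults overlaid with the tuned values when the file exists."""
    config = dict(DEFAULT_CONFIG)
    config_file = Path(path)
    if config_file.is_file():
        # Assumes a flat JSON object of parameter name to value.
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    return config
```

Overlaying the tuned values on the defaults means that a partially specified best-config.json still produces a complete configuration.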

Key Optimized Parameters

Parameter	Default	Optimized	Change
direction_threshold	0.500	1.295	+159%
confidence_base	0.600	0.443	−26%
mean_reversion_rate	0.100	0.257	+157%
anomaly_contributes	false	true	enabled
divergence_weight	—	0.180	new
tier_proximity_weight	—	0.096	new

Sample Prediction Comparison — Amazon W14

Method	Direction	Delta	Confidence
kg_default	STABLE	0.00	0.50
kg_optimized	UP	+1.99	0.95
naive_momentum	DOWN	−2.60	0.55
mean_reversion	UP	+0.78	0.45

Knowledge Graph Expansion

1,672 Previous Triples

4,052 Current Triples

+142% Growth

Data Loaded — 4 Weeks

Week	Score Records	Status
W10-2026	16	Archive
W11-2026	17	Archive
W12-2026	21	Archive
W13-2026	21	Current

Triple Composition Per Company

Entity	Approx. triples	Properties
Company entity	~10	type, slug, name, vertical, geography, roles
ScoreRecord	~8	type, forCompany, forWeek, composite, previous, delta, tier
DimensionScore	~15	5 dimensions × 3 triples each
Signal	~3–4	type, text, URL
Attestation	~6	type, confidence, source, provenance

SPARQL Verification — W13 Top 5

Rank	Company	Composite Score
1	DramaBox	82.75
2	ReelShort	81.20
3	Disney	77.10
4	iQiYi	67.30
5	JioHotstar	65.40
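
A ranking like the one above could be produced by a query of roughly this shape. The property names (forCompany, forWeek, composite, name) and the ScoreRecord class follow the triple composition listed in this report. The namespace IRI, the week IRI format, and the helper name are placeholders, since the actual vocabulary is not shown here.

```python
def top_n_query(week_iri: str, n: int = 5,
                ns: str = "http://example.org/sbpi#") -> str:
    """Build a SPARQL SELECT for the top-n composite scores of one week."""
    return f"""PREFIX sbpi: <{ns}>
SELECT ?name ?composite WHERE {{
  ?record a sbpi:ScoreRecord ;
          sbpi:forCompany ?company ;
          sbpi:forWeek <{week_iri}> ;
          sbpi:composite ?composite .
  ?company sbpi:name ?name .
}}
ORDER BY DESC(?composite)
LIMIT {int(n)}"""
```

The resulting string can be sent to the store's SPARQL query endpoint with any HTTP client.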

Biggest W13 Movers

Company	Delta
Amazon	+4.05
JioHotstar	+3.15
COL/BeLive	+2.70
Both Worlds	+2.65
GammaTime	+2.35

21 distinct companies confirmed across the knowledge graph via SPARQL query.

Signal Weight Optimization

Track C: Signal Weighting Research Program | Experiment 3

Research Question

What parameters control the threshold between "this signal warrants a mitigation recommendation" and "this signal is expected volatility within a functioning strategy"?

Optimizer Configuration

Optuna TPE Optimizer

30 Trials

66.7% Best Score

75 Synthetic Labels
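
The quantity being maximized can be illustrated with a simplified objective. This sketch assumes each label records a score delta and whether a mitigation recommendation was warranted, and it uses only materiality_threshold as the decision rule. The real optimizer tunes seven parameters, and its label schema is not shown in this report.

```python
from typing import Dict, List


def appropriate_rate(params: Dict[str, float], labels: List[dict]) -> float:
    """Fraction of labels where the flag decision matches the label."""
    if not labels:
        return 0.0
    hits = 0
    for label in labels:
        # Simplified rule: flag when the move meets the materiality threshold.
        flagged = abs(label["delta"]) >= params["materiality_threshold"]
        if flagged == label["should_flag"]:   # hypothetical label field
            hits += 1
    return hits / len(labels)
```

An optimizer such as Optuna would call a function like this once per trial with sampled parameters. If all 75 labels are scored, a 66.7% rate would correspond to 50 matching decisions.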

Optimized Parameters

Parameter	Default	Optimized	Change
materiality_threshold	2.000	2.362	+18%
structural_change_weight	1.500	2.286	+52%
competitor_action_weight	1.300	1.806	+39%
tier_proximity_sensitivity	3.000	5.902	+97%
multi_dimension_threshold	2	2	+0
trajectory_window	4	8	+4 weeks
volatility_dampener	0.300	0.658	+119%
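
The Change column for the continuous parameters is the relative change from the default, rounded to the nearest percent. A short check using the values from the table (the helper name is illustrative):

```python
def pct_change(default: float, optimized: float) -> int:
    """Relative change from default to optimized, in whole percent."""
    return round((optimized - default) / default * 100)


# (default, optimized) pairs copied from the table above.
PARAMS = {
    "materiality_threshold": (2.000, 2.362),
    "structural_change_weight": (1.500, 2.286),
    "competitor_action_weight": (1.300, 1.806),
    "tier_proximity_sensitivity": (3.000, 5.902),
    "volatility_dampener": (0.300, 0.658),
}

changes = {name: pct_change(d, o) for name, (d, o) in PARAMS.items()}
```

This reproduces the reported +18%, +52%, +39%, +97%, and +119%.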

Interpretation

The optimizer learned to raise the materiality threshold (+18%), weight structural changes more heavily (+52%), extend the trajectory lookback window from 4 to 8 weeks, and increase the volatility dampener significantly (+119%) — meaning historically volatile companies get reduced urgency.

Tier proximity sensitivity nearly doubled (+97%), meaning the system becomes much more alert when companies approach tier boundaries.

Trial Distribution

Statistic	appropriate_rate
Mean	62.2%
Std Dev	2.4%
95% CI	61.3% – 63.2%
Best	66.7%

Chart scale: appropriate_rate (maximized), 0% to 100%.
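
As a rough cross-check, a confidence interval for the mean trial score can be approximated from the rounded figures in this report (mean 62.2%, standard deviation 2.4%, 30 trials). The sketch assumes a normal approximation. The method actually used for the reported interval is not stated.

```python
import math
from typing import Tuple


def normal_ci(mean: float, std_dev: float, n: int,
              z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a sample mean."""
    half_width = z * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = normal_ci(62.2, 2.4, 30)
```

With these inputs the interval comes out at roughly 61.3% to 63.1%, close to the reported range. The small difference at the upper end may come from rounding of the inputs or from a different interval method.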

Deployed Deliverables

Four sites were deployed during this session, all on Cloudflare Pages.

Site	URL	Account	Content
W13 Editorial	microco-weekly-editorial-bja-8zm.pages.dev	weareshur	8-tab weekly report with corrected framing
W13 Validation	sbpi-w13-validation-3y2.pages.dev	weareshur	Prediction validation with back-to-editorial nav
Brand Intel Cards	sbpi-brand-intel.pages.dev	weareshur	21 sortable company intelligence cards
Autoresearch Status	sbpi-autoresearch-status.pages.dev	getsteady	13-day pipeline activity report

Editorial Corrections Applied

Changed "KG-augmented is broken, not wrong" to "KG-augmented — first tuning pass pending"
Changed "predicts stable for everything because the optimized config has not been deployed" to "ran with default (untuned) parameters; an optimized config deploys for W14"
Updated all validation report links to new weareshur URLs
Added back-to-editorial navigation on validation report

What Comes Next

The pipeline is operational. The nightly outputs and the W14 evaluation cycle are described below.

Automated Nightly Pipeline

The scheduler runs daily at 6:13 AM and executes:

SPARQL queries against Oxigraph: nightly insights (movers, anomalies, tier proximity alerts)
Prediction experiment: locked predictions for the next week
KG interface optimizer: incremental config improvement as data accumulates

W14 Evaluation Cycle

When W14 actuals are scored:

Evaluate all 105 locked predictions against actuals
Run the first head-to-head, kg_optimized vs kg_default: does the optimized config add signal?
Compute cumulative accuracy across 3 weeks (57 predictions per method)
Update brand intelligence cards with W14 data
Generate W14 editorial report
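
Scoring the locked predictions can be sketched as a per-method direction accuracy. The record layout (`method`, `company`, `direction`) and the function name are assumptions for illustration. The evaluation may also score deltas and confidence.

```python
from collections import defaultdict
from typing import Dict, List


def direction_accuracy(predictions: List[dict],
                       actuals: Dict[str, str]) -> Dict[str, float]:
    """Share of correct direction calls per method."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for prediction in predictions:
        actual = actuals.get(prediction["company"])
        if actual is None:
            continue  # company not scored this week
        totals[prediction["method"]] += 1
        if prediction["direction"] == actual:
            hits[prediction["method"]] += 1
    return {method: hits[method] / totals[method] for method in totals}
```

Comparing the kg_optimized and kg_default entries of the result gives the head-to-head figure. Summing hits and totals across weeks gives the cumulative accuracy.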

Growing the Knowledge Graph

Each week adds roughly 800 or more triples to the store. Current trajectory:

W13
4,052

W14 (est.)
~4,850

W20 (est.)
10,000+

Target
Full automation target
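
The trajectory follows from simple linear growth. The sketch below assumes a constant weekly increment, which is a simplification, and the function name is illustrative.

```python
def project_triples(current: int, per_week: int, weeks_ahead: int) -> int:
    """Linear projection of store size after `weeks_ahead` more weeks."""
    return current + per_week * weeks_ahead


w14 = project_triples(4052, 800, 1)   # one week after W13
w20 = project_triples(4052, 800, 7)   # seven weeks after W13
```

At exactly 800 triples per week this gives 4,852 for W14 and 9,652 for W20. Reaching 10,000 by W20 needs an average of about 850 triples per week.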

The result is domain-specific longitudinal data that general-purpose models lack access to.

Signal Weight Refinement

The 75 synthetic labels bootstrap the system. As human expert labels replace synthetic ones (via the --label interactive interface), the signal weight optimizer converges on validated materiality thresholds. The 66.7% appropriate rate should improve as real expert judgment replaces synthetic labels.

Brand Intelligence Cards

The 21-card deck becomes a weekly deliverable. Each card's intelligence take updates as new data arrives, tracking structural position changes, dimension gaps closing or widening, and tier transition trajectories.