
Project Noesis — Architectural Overview
A Multi‑Run Learning Ecosystem for Symbolic and Structural Market Understanding
Author: Nick Mavrokefalos
Publication date: 20 March 2026
Abstract
Project Noesis is a multi‑run, self‑supervised learning framework designed to discover latent behavioural rules in tabular financial time‑series data. Unlike conventional predictive models, which optimise for accuracy on a fixed supervised task, this system seeks to uncover the structural regularities that govern the data‑generating process itself. The architecture integrates a tabular transformer with relative positional encodings, a metric‑learning rule‑space embedding head, hierarchical clustering for regime identification, and symbolic regression for equation extraction. Each training run begins from fresh random weights, yet contributes persistent structural knowledge in the form of rules, subrules, and symbolic expressions. A meta‑initialisation mechanism based on PCA‑reduced weight‑space clustering introduces evolutionary pressure across runs, guiding the system toward initialisation regions that consistently yield high‑quality structural discovery. The result is a continual, self‑supervised pipeline capable of rediscovering stable regimes, expressing them mathematically, and refining its own inductive biases over time. Project Noesis represents a step toward automated theory formation in financial systems, where understanding—not prediction—is the primary objective.
1. Introduction
Machine learning systems for financial data typically optimise for predictive accuracy under supervised objectives. While effective in narrow contexts, such systems rarely uncover the underlying structure of the data‑generating process. They learn to predict, but not to understand.
Project Noesis explores a different paradigm: a system that attempts to discover the rules of a dataset rather than merely forecast its outputs.
The central hypothesis is that structural understanding precedes prediction. If a model can identify stable behavioural regimes, infer latent relationships, and express them symbolically, then forecasting becomes a natural consequence of comprehension rather than memorisation.
To investigate this hypothesis, Project Noesis introduces a multi‑run, self‑supervised learning ecosystem that accumulates structural knowledge across independent training runs while discarding model weights. The system integrates:
- a tabular transformer backbone
- a rule‑space embedding mechanism
- hierarchical clustering for rule discovery
- symbolic regression for equation extraction
- an evolutionary meta‑initialisation process
Together, these components form a continual, self‑supervised pipeline for uncovering the latent rules governing tabular market data.
2. System Overview
Project Noesis is not a single model but a multi‑run architecture. Each run begins with fresh random weights, trains for a fixed number of epochs, and contributes structural information to a persistent rule memory.
Across runs, the system accumulates:
- rules
- subrules
- symbolic equations
- rediscovery statistics
- weight‑space performance metrics
This creates a feedback loop in which the system gradually converges toward more meaningful representations of the underlying data.
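The persistent rule memory described above can be pictured as a small data structure that outlives any single run. A minimal sketch, assuming illustrative field names (`Rule`, `RuleMemory`, `register`, and the centroid-matching tolerance are not from the original text):

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    """One discovered behavioural rule; field names are illustrative."""
    centroid: list               # centre of the rule cluster in rule space
    member_indices: list         # transitions assigned to this rule
    rediscovery_count: int = 0   # times re-found across independent runs
    confidence: float = 0.0      # global confidence score
    equations: list = field(default_factory=list)  # optional symbolic fits

@dataclass
class RuleMemory:
    """Persistent structural memory shared across runs (weights are discarded)."""
    rules: list = field(default_factory=list)

    def register(self, rule: Rule, match_tol: float = 0.1) -> Rule:
        """Merge a new rule into memory if its centroid matches an existing one."""
        for existing in self.rules:
            dist = sum((a - b) ** 2
                       for a, b in zip(existing.centroid, rule.centroid)) ** 0.5
            if dist < match_tol:
                existing.rediscovery_count += 1
                return existing
        self.rules.append(rule)
        return rule
```

Registering a near-duplicate centroid increments the rediscovery count instead of creating a new rule, which is how rediscovery statistics could accumulate across runs.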
3. Transformer Backbone for Tabular Time‑Series
The core model is a transformer adapted for tabular market data. Each row is treated as a token, and a sequence of rows forms a temporal context window.
Key architectural features include:
- learned feature embeddings for each column
- relative positional encodings to capture temporal relationships
- multi‑head self‑attention to model inter‑row dependencies
- a CLS token to summarise global sequence information
This formulation allows the model to apply language‑model‑style relational reasoning to structured financial data.
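The row-as-token formulation with a relative-position bias can be sketched in a few lines. This is a single-head toy version in NumPy, not the project's implementation; the scalar per-offset bias is one common way to realise relative positional encoding:

```python
import numpy as np

def self_attention_with_relative_bias(rows: np.ndarray,
                                      rel_bias: np.ndarray) -> np.ndarray:
    """Single-head self-attention over a window of embedded rows.

    rows: (T, d) row embeddings, one token per row.
    rel_bias: (2T-1,) learned scalar bias indexed by relative offset j - i.
    """
    T, d = rows.shape
    scores = rows @ rows.T / np.sqrt(d)                       # (T, T) dot-product scores
    offsets = np.arange(T)[None, :] - np.arange(T)[:, None]   # j - i in [-(T-1), T-1]
    scores = scores + rel_bias[offsets + (T - 1)]             # add relative-position bias
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over keys
    return weights @ rows                                     # (T, d) attended output
```

In the full model each row would first pass through per-column feature embeddings, and a CLS token would be prepended before attention.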
4. Multi‑Task Self‑Supervised Learning
The model is trained using four complementary objectives:
4.1 Masked Feature Modelling (MLM)
Randomly masked values must be reconstructed, encouraging the model to learn feature semantics and cross‑column relationships.
4.2 Next‑Row Forecasting
The model predicts the next row in the sequence, learning temporal continuity and market dynamics.
4.3 Next‑Sequence Prediction (NSP)
Given two rows, the model classifies whether the second is the true successor of the first. This teaches the model to distinguish valid from invalid transitions.
4.4 Rule‑Space Embedding
A metric‑learning head embeds transitions into a latent rule vector, forming the basis for rule discovery.
These tasks jointly shape the internal geometry of the model’s representation space.
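The masked-feature objective in 4.1 amounts to hiding random cells and asking the model to reconstruct them. A minimal corruption step might look like this (the zero sentinel and mask ratio are assumptions, not details from the original text):

```python
import numpy as np

def mask_features(batch: np.ndarray, mask_ratio: float = 0.15, seed: int = 0):
    """Randomly mask scalar features for masked-feature reconstruction.

    Returns the corrupted batch, a boolean mask of hidden positions,
    and the original values at those positions (the reconstruction targets).
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(batch.shape) < mask_ratio   # positions to hide
    corrupted = batch.copy()
    corrupted[mask] = 0.0                         # sentinel value for masked cells
    return corrupted, mask, batch[mask]
```

The reconstruction loss would then compare the model's predictions at the masked positions against the returned targets.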
5. Rule Discovery via Clustering
After each epoch, the system extracts rule vectors and performs:
- dimensionality reduction
- clustering in rule space
- identification of stable behavioural regimes
Each discovered rule is represented by:
- a centroid
- member indices
- rediscovery strength
- global confidence
- optional symbolic equations
Rules persist across runs, forming a long‑term structural memory independent of model weights.
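The per-epoch extraction step could be sketched as a reduction followed by clustering. This uses PCA and KMeans for concreteness; the original text names only "dimensionality reduction" and "clustering in rule space", so the specific algorithms and parameters here are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def discover_rules(rule_vectors: np.ndarray,
                   n_components: int = 2, n_rules: int = 3):
    """Reduce rule vectors and cluster them into candidate rules.

    Returns a per-transition rule label and one centroid per discovered
    rule, both in the reduced space.
    """
    reduced = PCA(n_components=n_components).fit_transform(rule_vectors)
    km = KMeans(n_clusters=n_rules, n_init=10, random_state=0).fit(reduced)
    return km.labels_, km.cluster_centers_
```

Each centroid, together with its member indices, would then seed a rule entry in the persistent memory.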
6. Hierarchical Subrule Discovery
Within each rule, a second clustering pass identifies subrules, capturing finer‑grained structure such as:
- conditional behaviours
- local regimes
- micro‑patterns
This hierarchical organisation mirrors human conceptual decomposition of complex systems.
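The second clustering pass can be illustrated by re-clustering one rule's member vectors. Agglomerative clustering is a natural fit for the hierarchical framing, though the original text does not specify the algorithm:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def discover_subrules(member_vectors: np.ndarray,
                      n_subrules: int = 2) -> np.ndarray:
    """Second clustering pass over one rule's member transitions.

    Hierarchical (agglomerative) clustering splits a rule into finer
    subrules; returns a subrule label per member.
    """
    return AgglomerativeClustering(n_clusters=n_subrules).fit_predict(member_vectors)
```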
7. Symbolic Regression for Equation Extraction
For each sufficiently strong rule or subrule, the system applies PySR to fit symbolic equations describing the relationships between selected target features.
Symbolic regression serves two purposes:
- it externalises the model's internal structure in interpretable mathematical form
- equation quality contributes to rule confidence and long‑term stability
This step transforms latent representations into explicit, human‑readable hypotheses about the dataset.
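One way equation quality could feed back into rule confidence is a score that rewards good fit while penalising complexity (PySR reports both a fit metric and an expression complexity). The specific weighting below is a hypothetical scoring rule, not the project's formula:

```python
def equation_confidence(r2_score: float, complexity: int,
                        max_complexity: int = 20) -> float:
    """Hypothetical scoring rule: reward equations that fit well but stay simple.

    r2_score: goodness of fit of the symbolic equation on the rule's members.
    complexity: size of the expression tree, as reported by tools like PySR.
    """
    fit = max(0.0, min(1.0, r2_score))            # clamp fit to [0, 1]
    parsimony = 1.0 - min(complexity, max_complexity) / max_complexity
    return 0.7 * fit + 0.3 * parsimony            # weights are illustrative
```

A score like this would let a clean, simple equation raise a rule's global confidence more than an overfit, baroque one.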
8. Rule‑Guided Learning
Rules influence subsequent training runs through several mechanisms:
- strong rules increase the weight of the rule‑embedding loss
- equations may refine or constrain predictions
- rule similarity affects NSP and forecasting
- the model is nudged toward representations that yield cleaner, more stable equations
This creates a self‑reinforcing structural loop in which the system gradually improves its ability to discover and express rules.
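The first mechanism, scaling the rule-embedding loss by rule strength, might be sketched as follows (the task names, base weights, and linear scaling are assumptions for illustration):

```python
def rule_guided_loss(base_losses: dict, rule_strength: float,
                     base_rule_weight: float = 0.1) -> float:
    """Combine task losses, scaling the rule-embedding term by rule strength.

    base_losses: e.g. {"mlm": ..., "forecast": ..., "nsp": ..., "rule": ...}
    rule_strength: aggregate confidence of currently active rules, in [0, 1].
    """
    weights = {"mlm": 1.0, "forecast": 1.0, "nsp": 1.0,
               "rule": base_rule_weight * (1.0 + rule_strength)}  # stronger rules pull harder
    return sum(weights[name] * loss for name, loss in base_losses.items())
```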
9. Meta‑Initialisation Across Independent Runs
Each run logs:
- its initial weights (flattened)
- its success score (based on rule quality)
Across runs, the system performs:
- PCA on initial weight vectors
- KMeans clustering
- ranking clusters by success score
- sampling new initial weights from the best cluster
This mechanism imposes evolutionary pressure on weight space, allowing the system to converge toward regions that consistently yield high‑quality rule discovery.
Unlike traditional meta‑learning, this process does not retain model weights — only the distribution of successful initialisations.
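The cluster-rank-sample loop above might be sketched like this. The Gaussian resampling from the winning cluster is one plausible realisation of "sampling new initial weights from the best cluster", not necessarily the project's exact method:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def sample_meta_init(init_weights: np.ndarray, scores: np.ndarray,
                     n_clusters: int = 2, n_components: int = 2,
                     seed: int = 0) -> np.ndarray:
    """Sample a new flattened initialisation near the best-scoring weight cluster.

    init_weights: (n_runs, n_params) flattened initial weights from past runs.
    scores: (n_runs,) success scores based on rule quality.
    """
    rng = np.random.default_rng(seed)
    reduced = PCA(n_components=n_components).fit_transform(init_weights)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(reduced)
    # rank clusters by mean success score and keep the best one
    best = max(range(n_clusters), key=lambda c: scores[labels == c].mean())
    members = init_weights[labels == best]
    mean, std = members.mean(axis=0), members.std(axis=0)
    return rng.normal(mean, std + 1e-8)   # perturbed draw from the winning cluster
```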
10. Forecasting with Rule Alignment
During inference, the system:
- generates a candidate next row
- evaluates transition validity via NSP
- aligns predictions with relevant rules
- optionally refines outputs using symbolic equations
Predictions include:
- NSP confidence
- applied rules
- similarity scores
- rule weights
- equation‑based adjustments
This produces explainable, rule‑aligned forecasts.
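The rule-alignment step could be realised as a cosine-similarity match between a candidate transition's rule vector and the stored centroids. The threshold and the "no rule applies" sentinel are assumptions:

```python
import numpy as np

def align_with_rules(transition_vec: np.ndarray, centroids: np.ndarray,
                     min_similarity: float = 0.5):
    """Score a candidate transition against known rule centroids.

    Returns (best_rule_index, similarity); index is -1 when no rule
    passes the similarity threshold.
    """
    norms = np.linalg.norm(centroids, axis=1) * np.linalg.norm(transition_vec)
    sims = centroids @ transition_vec / np.maximum(norms, 1e-12)  # cosine similarity
    best = int(np.argmax(sims))
    if sims[best] < min_similarity:
        return -1, float(sims[best])
    return best, float(sims[best])
```

The returned similarity would be reported alongside the NSP confidence, giving each forecast an explicit, inspectable justification.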
11. Novelty and Contribution
Project Noesis introduces several architectural innovations:
- multi‑run structural learning without weight retention
- persistent rule memory independent of model parameters
- symbolic regression as a structural feedback signal
- evolutionary meta‑initialisation based on weight‑space clustering
- hierarchical rule and subrule discovery
- a relative‑encoding‑driven transformer for tabular transitions
The system behaves less like a predictor and more like a self‑supervised scientific engine, iteratively uncovering the rules that govern the dataset.
12. Indicators of Structural Understanding
The system is considered successful when it consistently produces:
- stable rules rediscovered across independent runs
- high‑confidence subrules
- complex, non‑trivial symbolic equations
- consistent regime structures
- improving success scores
- a convergent weight‑initialisation distribution
At this point, the model has effectively reverse‑engineered the dataset, demonstrating comprehension rather than memorisation.
Project Status and Sponsorship
Project Noesis is an active and evolving research initiative. The system described here represents the current state of an ongoing investigation into self‑supervised rule discovery, symbolic abstraction, and meta‑initialisation dynamics in complex tabular domains. Development continues as new mechanisms for structural stability, rule refinement, and equation extraction are explored, with the long‑term goal of advancing automated theory formation in financial systems.
This research is conducted and sponsored by London Blackbook Ltd, who support the continued exploration of novel architectures that prioritise structural understanding over conventional predictive modelling.
