
Project Noesis — Continuous Structural Discovery and Canonical Law Extraction in Tabular Systems
Author: Nick Mavrokefalos
Date: June 2026
Sponsor: London Blackbook Ltd
Abstract
Project Noesis is a continuous, self-supervised learning architecture designed to discover invariant structural laws from evolving tabular data. Unlike conventional machine learning systems that optimise predictive accuracy on fixed datasets, Noesis treats prediction as secondary evidence for structural validity. Its primary objective is the extraction, refinement, and consolidation of symbolic representations that approximate the underlying data-generating process.
The system operates as a multi-run ecosystem in which each run is independent in parameter space but contributes persistent structural knowledge in the form of rule libraries, subrule hierarchies, and symbolic equations. A central innovation is Global Symbolic Consolidation, where locally discovered symbolic expressions are clustered, merged by shared support, and re-optimised through guided symbolic regression to produce canonical equations representing stable data regimes.
A meta-initialisation mechanism introduces evolutionary pressure across runs by projecting weight-space initialisations with PCA, clustering them, and selecting high-performing regions for future sampling. Combined with continuous ingestion of new tabular rows, Noesis functions as a persistent scientific discovery engine capable of refining its own inductive biases over time.
The system is designed to operate in streaming environments such as robotics, finance, and sensor-driven systems, where data evolves continuously and structural laws must be incrementally discovered rather than statically learned.
1. Introduction
Most machine learning systems are designed to approximate functions for prediction. However, prediction alone does not guarantee understanding of the underlying system.
Project Noesis is built on a different premise:
The goal of intelligence is not prediction, but the discovery of invariant structure.
Noesis attempts to infer the latent laws governing tabular systems by iteratively constructing symbolic representations of observed behaviour. Rather than relying on a single training process, it operates across multiple independent runs, each contributing partial structural hypotheses that accumulate over time.
This transforms learning into a continuous scientific process:
- hypothesis generation (rules and subrules)
- symbolic formulation (equations)
- cross-run consolidation (law refinement)
- persistent memory (structural knowledge base)
2. System Overview
Noesis is not a single model but a persistent learning ecosystem composed of repeated training cycles.
Each cycle produces:
- rule embeddings
- hierarchical clusters of behavioural regimes
- symbolic equations via regression
- evaluation metrics for structural validity
Across cycles, this information is stored in persistent libraries:
- Rule Library
- Meta-Rule (Subrule) Library
- Global Symbolic Library
Importantly, model weights are discarded after each run. Only structural artifacts persist.
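The persistence boundary described above can be sketched as a small Python data structure. This is a minimal sketch under stated assumptions. `StructuralMemory`, `run_cycle`, and `toy_run` are hypothetical names, and a real run would train the transformer rather than return fixed artifacts. It shows that weights live only inside the run function, while rules and equations are appended to a store that outlives it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StructuralMemory:
    """Hypothetical container for the artifacts that persist between runs."""
    rule_library: list = field(default_factory=list)
    subrule_library: list = field(default_factory=list)
    global_symbolic_library: list = field(default_factory=list)


def run_cycle(memory: StructuralMemory,
              train_run: Callable[[StructuralMemory], dict]) -> StructuralMemory:
    # train_run stands in for one full training run: it may read the memory
    # (e.g. for rule-guided biasing) and returns the artifacts it discovered.
    artifacts = train_run(memory)
    memory.rule_library.extend(artifacts.get("rules", []))
    memory.subrule_library.extend(artifacts.get("subrules", []))
    memory.global_symbolic_library.extend(artifacts.get("equations", []))
    # Model weights exist only inside train_run and are never attached to
    # memory, so nothing parametric survives the cycle.
    return memory


def toy_run(memory: StructuralMemory) -> dict:
    weights = [0.1, -0.3, 0.7]  # local to this run; discarded on return
    return {"rules": [f"rule_{len(memory.rule_library)}"],
            "equations": ["y = 2*x0"]}


memory = StructuralMemory()
for _ in range(3):
    run_cycle(memory, toy_run)
print(memory.rule_library)  # ['rule_0', 'rule_1', 'rule_2']
```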
3. Transformer-Based Representation Engine
At the core of each run is a tabular transformer adapted for sequential structured data.
Each row represents a token in a temporal context window.
The model includes:
- feature embeddings per column
- relative positional encodings for temporal structure
- multi-head attention over rows
- a global CLS token for sequence summarisation
This architecture produces latent representations that encode:
- inter-feature relationships
- temporal dynamics
- regime transitions
However, these representations are not the final output of the system. They serve as input for downstream structural discovery.
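The following standard-library Python sketch shows one way rows can become tokens and a CLS query can summarise a window. It is deliberately simplified and is not the Noesis architecture. It uses a linear per-column embedding, a single attention head without query/key/value projections, and a hypothetical scalar relative-position bias (`rel_bias`). A multi-head layer would run several such heads with learned projections and concatenate their outputs.

```python
import math
import random


def embed_rows(rows, col_weights, col_bias):
    """Per-column feature embedding: scalar x_j becomes x_j * w_j + b_j (a
    d-dimensional vector) and the column vectors are summed into one row token."""
    d = len(col_weights[0])
    tokens = []
    for row in rows:
        token = [0.0] * d
        for j, x in enumerate(row):
            for k in range(d):
                token[k] += x * col_weights[j][k] + col_bias[j][k]
        tokens.append(token)
    return tokens


def cls_attention(tokens, cls, rel_bias):
    """One projection-free attention head in which the CLS vector queries the
    row tokens. rel_bias[r] is a scalar added for a row r steps before the end
    of the window (an assumed relative-position scheme)."""
    n, d = len(tokens), len(cls)
    scores = [
        sum(q * t for q, t in zip(cls, tok)) / math.sqrt(d) + rel_bias[n - 1 - i]
        for i, tok in enumerate(tokens)
    ]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    summary = [sum(w * tok[k] for w, tok in zip(weights, tokens)) for k in range(d)]
    return summary, weights


rng = random.Random(0)
n_cols, d, window = 3, 4, 5
W = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(n_cols)]
B = [[0.0] * d for _ in range(n_cols)]
rows = [[rng.random() for _ in range(n_cols)] for _ in range(window)]
cls = [rng.gauss(0, 1) for _ in range(d)]
rel_bias = [0.5 / (1 + r) for r in range(window)]  # assumed: favour recent rows
summary, attn = cls_attention(embed_rows(rows, W, B), cls, rel_bias)
```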
4. Multi-Task Self-Supervised Learning
Each run is trained using multiple complementary objectives:
4.1 Masked Feature Modelling
Random feature masking forces the model to infer missing values from context, encouraging it to learn structural dependencies between features.
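Below is a minimal masking routine. It assumes per-cell Bernoulli masking and computes the loss only on masked columns. `MASK`, `mask_features`, and `masked_mse` are illustrative names, not taken from the Noesis code.

```python
import random

MASK = None  # hypothetical sentinel; a real model would use a learned [MASK] embedding


def mask_features(row, mask_prob, rng):
    """Return (masked_row, targets), where targets maps column index -> true value."""
    masked, targets = [], {}
    for j, x in enumerate(row):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[j] = x
        else:
            masked.append(x)
    if not targets:  # guarantee at least one prediction target per row
        j = rng.randrange(len(row))
        masked[j] = MASK
        targets[j] = row[j]
    return masked, targets


def masked_mse(predicted, targets):
    """Mean squared error computed only on the masked columns."""
    return sum((predicted[j] - v) ** 2 for j, v in targets.items()) / len(targets)


row = [1.0, 2.0, 3.0, 4.0]
masked, targets = mask_features(row, mask_prob=0.25, rng=random.Random(0))
```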
4.2 Next-Row Prediction
The system learns temporal continuity by predicting the next row in a sequence.
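Next-row prediction reduces to building (context window, next row) pairs. The sketch below assumes a fixed window length. It scores a naive baseline that repeats the last row, using mean squared error. Both choices are illustrative.

```python
def next_row_pairs(rows, window):
    """Slide a context window over the table; each target is the row that follows it."""
    return [(rows[i:i + window], rows[i + window]) for i in range(len(rows) - window)]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


rows = [[float(t), 2.0 * t] for t in range(6)]
pairs = next_row_pairs(rows, window=3)
# A naive "repeat the last row" baseline, scored on the first pair.
context, target = pairs[0]
baseline_error = mse(context[-1], target)
```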
4.3 Transition Validity Classification
A binary classification objective predicts whether a row transition is valid under the learned dynamics.
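The document does not specify how invalid transitions are obtained. This sketch assumes one common option: pair each row with a row from a different, non-successor time step and label the pair invalid. True consecutive pairs are labelled valid. `transition_examples` is a hypothetical helper.

```python
import random


def transition_examples(rows, rng, negatives_per_positive=1):
    """Consecutive (row_t, row_t+1) pairs are labelled valid (1); pairs whose
    second row comes from any other, non-successor step are labelled invalid (0).
    This negative-sampling scheme is an assumption."""
    n = len(rows)
    if n < 3:
        raise ValueError("need at least 3 rows to draw negatives")
    examples = []
    for t in range(n - 1):
        examples.append((rows[t], rows[t + 1], 1))
        for _ in range(negatives_per_positive):
            s = rng.randrange(n)
            while s in (t, t + 1):  # reject the true successor and the row itself
                s = rng.randrange(n)
            examples.append((rows[t], rows[s], 0))
    return examples


rows = [[float(i)] for i in range(10)]
examples = transition_examples(rows, random.Random(1))
```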
4.4 Rule-Space Embedding
A metric-learning head embeds transitions into a latent rule space, forming the basis for clustering and symbolic extraction.
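The metric-learning loss is not specified. A triplet margin loss is one standard choice, and this sketch assumes it. How positives and negatives are chosen (for example, by temporal proximity or from pseudo-labels of a previous clustering pass) is also an assumption and is left open here.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: an anchor transition should sit closer to a transition
    from the same regime (positive) than to one from another regime (negative)
    by at least `margin` in the rule space."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```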
5. Rule Discovery and Hierarchical Structure
After each training epoch, Noesis extracts rule embeddings and performs clustering in latent rule space.
This yields:
- behavioural regimes (rules)
- sub-regimes (subrules)
- stability measures based on rediscovery frequency
- confidence scores derived from consistency across runs
Rules are stored persistently across runs and do not depend on model weights.
This forms a structural memory system independent of parameter instantiation.
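Rediscovery-based stability can be sketched by matching each run's cluster centroids to stored rules. The sketch assumes rules are identified by centroids in a shared embedding space and that a fixed distance tolerance decides a match. `Rule`, `update_rule_library`, `stability`, and `tol` are hypothetical names. In practice, embeddings from independently initialised runs may need alignment before such distances are meaningful.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Rule:
    centroid: list
    first_seen_run: int
    seen_in: set = field(default_factory=set)


def update_rule_library(library, new_centroids, run_id, tol=0.5):
    """Match each cluster centroid from the current run to the nearest stored
    rule. Matches within `tol` count as rediscoveries; the rest become new rules."""
    for c in new_centroids:
        nearest = min(library, key=lambda r: math.dist(c, r.centroid), default=None)
        if nearest is not None and math.dist(c, nearest.centroid) <= tol:
            nearest.seen_in.add(run_id)
        else:
            library.append(Rule(centroid=list(c), first_seen_run=run_id, seen_in={run_id}))
    return library


def stability(rule, current_run):
    """Share of runs since the rule first appeared in which it was found."""
    return len(rule.seen_in) / (current_run - rule.first_seen_run + 1)


library = []
update_rule_library(library, [[0.0, 0.0], [5.0, 5.0]], run_id=0)
update_rule_library(library, [[0.1, 0.0], [9.0, 9.0]], run_id=1)
update_rule_library(library, [[0.0, 0.2]], run_id=2)
```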
6. Hierarchical Subrule Formation
Within each rule cluster, secondary clustering identifies finer structure.
Subrules capture:
- conditional behaviours
- local regime variations
- micro-patterns within larger regimes
This produces a hierarchical decomposition:
system → rule → subrule → symbolic expression
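The rule → subrule decomposition can be sketched as two levels of k-means. All embeddings are first clustered into rules, and then each rule's members are re-clustered. The choice of k-means, the deterministic farthest-first seeding, and the names below are assumptions. The document does not state which clustering algorithm Noesis uses.

```python
import math


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_hierarchy(embeddings, k_rules, k_sub):
    """Cluster embeddings into rules, then re-cluster each rule's members into
    subrules. Returns {rule_id: {subrule_id: [row indices]}}."""
    hierarchy = {}
    rule_labels = kmeans(embeddings, k_rules)
    for r in range(k_rules):
        idx = [i for i, lab in enumerate(rule_labels) if lab == r]
        if len(idx) < k_sub:  # too small to split further
            hierarchy[r] = {0: idx}
            continue
        sub_labels = kmeans([embeddings[i] for i in idx], k_sub)
        hierarchy[r] = {}
        for i, s in zip(idx, sub_labels):
            hierarchy[r].setdefault(s, []).append(i)
    return hierarchy
```

Deterministic seeding avoids a failure mode of purely random initialisation, where a regime can be split along an irrelevant axis. k-means++ is a common randomised alternative.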
7. Symbolic Regression and Equation Extraction
For each sufficiently stable rule or subrule, symbolic regression (PySR) is used to extract equations that approximate observed relationships.
These equations represent:
- candidate structural laws
- interpretable functional relationships
- compressed representations of regime dynamics
Each equation is stored with:
- feature set
- confidence score
- support indices
- rediscovery statistics
However, these equations are not final laws. They are hypotheses subject to global refinement.
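Below is one possible record layout for a stored equation, covering the four fields listed above. Field names, types, and the rediscovery counter are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class EquationRecord:
    """One symbolic hypothesis as stored in the library (field names are illustrative)."""
    expression: str                # equation string returned by symbolic regression
    target: str                    # column the equation predicts
    features: frozenset            # feature set the equation may use
    confidence: float              # confidence score at extraction time
    support_indices: frozenset     # table rows the equation was fitted on
    source: str                    # e.g. "rule:3" or "subrule:3.1"
    runs_seen: set = field(default_factory=set)

    @property
    def rediscoveries(self) -> int:
        """Runs after the first in which this equation was found again."""
        return max(0, len(self.runs_seen) - 1)


eq = EquationRecord("2.0*x0 + x1", "y", frozenset({"x0", "x1"}), 0.82,
                    frozenset(range(100)), "rule:3", {0})
eq.runs_seen.update({2, 5})
```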
8. Global Symbolic Consolidation (Core Innovation)
A central component of Noesis is Global Symbolic Consolidation, which transforms local symbolic hypotheses into canonical laws.
This process includes:
8.1 Equation Aggregation
All equations from rules and subrules are collected into a unified symbolic space.
8.2 Fingerprint-Based Clustering
Equations are embedded using structural fingerprints capturing:
- operator distributions
- variable interactions
- tree depth
- functional composition
Clustering identifies families of structurally similar equations.
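A structural fingerprint can be computed by parsing an equation string with Python's `ast` module. In this sketch, operator counts stand in for operator distributions. Variable pairs that meet inside a product stand in for variable interactions, and the mix of operators and functions stands in for functional composition. The Jaccard-based `similarity` and the greedy `group_families` step are assumed stand-ins for whatever clustering Noesis actually applies.

```python
import ast
from collections import Counter


def _depth(node):
    """Depth of an expression tree, ignoring AST bookkeeping nodes."""
    if isinstance(node, ast.BinOp):
        return 1 + max(_depth(node.left), _depth(node.right))
    if isinstance(node, ast.UnaryOp):
        return 1 + _depth(node.operand)
    if isinstance(node, ast.Call):
        return 1 + max((_depth(a) for a in node.args), default=0)
    return 1  # variable or constant leaf


def fingerprint(expr: str) -> dict:
    """Structural fingerprint: operator/function counts, variables, pairs of
    variables that meet inside a product, and tree depth. Constants are ignored."""
    tree = ast.parse(expr, mode="eval")
    ops, interactions = Counter(), set()
    funcs = {n.func.id for n in ast.walk(tree)
             if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
    variables = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)} - funcs
    for node in ast.walk(tree):
        if isinstance(node, (ast.BinOp, ast.UnaryOp)):
            ops[type(node.op).__name__] += 1
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            ops[node.func.id] += 1
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            names = sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)} - funcs)
            interactions |= {frozenset((a, b)) for i, a in enumerate(names) for b in names[i + 1:]}
    return {"ops": ops, "variables": variables,
            "interactions": interactions, "depth": _depth(tree.body)}


def similarity(fa, fb):
    """Mean of Jaccard overlaps plus a depth term (an arbitrary weighting)."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0
    return (jaccard(set(fa["ops"]), set(fb["ops"]))
            + jaccard(fa["variables"], fb["variables"])
            + jaccard(fa["interactions"], fb["interactions"])
            + 1.0 / (1.0 + abs(fa["depth"] - fb["depth"]))) / 4


def group_families(expressions, threshold=0.75):
    """Greedy grouping: join the first family whose representative is similar enough."""
    families = []
    for expr in expressions:
        fp = fingerprint(expr)
        for fam in families:
            if similarity(fp, fam["fp"]) >= threshold:
                fam["members"].append(expr)
                break
        else:
            families.append({"fp": fp, "members": [expr]})
    return families
```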
8.3 Support Merging
For each equation family:
- member indices are merged
- feature sets are unified
- training data is expanded
This creates a richer empirical basis than any individual rule.
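Support merging amounts to set unions followed by re-gathering rows. The dictionary layout of family members and the row-dict `table` are assumptions made for the sketch.

```python
def merge_family_support(members):
    """Union the row supports and feature sets of every equation in a family.
    Each member is a dict with 'support' (row indices) and 'features' (column names)."""
    support, features = set(), set()
    for m in members:
        support |= set(m["support"])
        features |= set(m["features"])
    return sorted(support), sorted(features)


def family_training_data(table, support, features, target):
    """Assemble the expanded training set: X over the unified features, y from target."""
    X = [[table[i][f] for f in features] for i in support]
    y = [table[i][target] for i in support]
    return X, y


members = [{"support": [0, 1, 2], "features": ["x0", "x1"]},
           {"support": [2, 3], "features": ["x1", "x2"]}]
table = [{"x0": i, "x1": 2 * i, "x2": 3 * i, "y": i + 1} for i in range(5)]
support, features = merge_family_support(members)
X, y = family_training_data(table, support, features, "y")
```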
8.4 Guided Symbolic Re-Regression
Each family undergoes re-optimisation using guided symbolic regression.
Key properties:
- restricted operator set from family
- feature constraints from union of supports
- multi-seed robustness search
- warm-start bias from family equations
This step produces refined candidate laws grounded in broader evidence.
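The guided re-regression step can be prepared as a set of constrained search configurations, one per seed. The keys `binary_operators`, `unary_operators`, and `random_state` mirror PySR's `PySRRegressor` arguments. The `features` key is a hypothetical column selection for `X`. PySR itself is not called here. The warm-start bias is omitted because seeding a search from family equations depends on the backend. `most_robust` shows one assumed way to pick a result across seeds.

```python
from collections import defaultdict
from statistics import mean

# Map AST operator names (as counted by a fingerprinting step) to PySR-style symbols.
BINARY_SYMBOLS = {"Add": "+", "Sub": "-", "Mult": "*", "Div": "/", "Pow": "^"}
AST_SIGNS = {"USub", "UAdd"}  # unary signs are dropped; '-' already covers negation


def guided_search_plan(family_op_counts, union_features, seeds=(0, 1, 2)):
    """One constrained search configuration per seed: only operators already seen
    in the family are allowed and only the union of member features is offered."""
    ops = set()
    for counts in family_op_counts:
        ops |= set(counts)
    binary = sorted(BINARY_SYMBOLS[o] for o in ops if o in BINARY_SYMBOLS)
    unary = sorted(o for o in ops if o not in BINARY_SYMBOLS and o not in AST_SIGNS)
    return [{"binary_operators": binary, "unary_operators": unary,
             "features": sorted(union_features), "random_state": s}
            for s in seeds]


def most_robust(results):
    """results: (expression, loss) pairs, one per seed. Prefer expressions that
    more seeds converge on, breaking ties by the lowest mean loss."""
    by_expr = defaultdict(list)
    for expr, loss in results:
        by_expr[expr].append(loss)
    return min(by_expr, key=lambda e: (-len(by_expr[e]), mean(by_expr[e])))
```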
8.5 Global Evaluation
Each candidate equation is evaluated using:
- MAE / RMSE / median error
- coverage of valid predictions
- monotonicity analysis
- sensitivity gradients
- outlier robustness
A calibrated confidence score is computed from error structure and stability.
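The evaluation criteria can be sketched for any candidate given as a Python callable. All definitions here are assumptions:
- Coverage counts rows where the equation returns a finite value.
- Sensitivity and monotonicity use finite differences.
- Robustness is the share of errors within three times the median error.
- The confidence formula is illustrative, not the calibrated score the text refers to.

```python
import math
from statistics import median, pstdev


def safe_call(f, row):
    """Evaluate a candidate equation, mapping domain errors and non-finite values to NaN."""
    try:
        value = float(f(row))
    except (ValueError, ZeroDivisionError, OverflowError):
        return math.nan
    return value if math.isfinite(value) else math.nan


def evaluate(f, X, y, eps=1e-4):
    preds = [safe_call(f, row) for row in X]
    pairs = [(p, t) for p, t in zip(preds, y) if not math.isnan(p)]
    coverage = len(pairs) / len(y)  # share of rows with a valid prediction
    if not pairs:
        return {"coverage": 0.0, "confidence": 0.0}
    errs = [abs(p - t) for p, t in pairs]
    mae = sum(errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    med = median(errs)
    # Outlier robustness: share of errors within 3x the median error.
    robustness = sum(e <= 3 * med for e in errs) / len(errs)
    # Finite-difference slopes of f with respect to each feature, per row.
    slopes = []
    for j in range(len(X[0])):
        col = []
        for row in X:
            bumped = list(row)
            bumped[j] += eps
            a, b = safe_call(f, row), safe_call(f, bumped)
            if not (math.isnan(a) or math.isnan(b)):
                col.append((b - a) / eps)
        slopes.append(col)
    sensitivity = [sum(map(abs, s)) / len(s) if s else math.nan for s in slopes]
    # Monotonicity: fraction of slopes sharing the dominant sign.
    monotonicity = [max(sum(v > 0 for v in s), sum(v < 0 for v in s)) / len(s)
                    if s else math.nan for s in slopes]
    # Illustrative (uncalibrated) confidence: penalise error relative to the spread of y.
    scale = pstdev(y) or 1.0
    confidence = coverage * robustness * math.exp(-rmse / scale)
    return {"mae": mae, "rmse": rmse, "median_error": med, "coverage": coverage,
            "robustness": robustness, "sensitivity": sensitivity,
            "monotonicity": monotonicity, "confidence": confidence}
```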
8.6 Canonical Equation Selection
A multi-criteria scoring function selects the candidate that best balances:
- high confidence
- broad coverage
- low complexity
- strong robustness
The result is a canonical equation per target, forming the current best approximation of a structural law.
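One way to realise the multi-criteria selection is a weighted sum in which complexity is converted to simplicity. The weights, the complexity cap, and the candidate dictionaries are assumptions for illustration.

```python
def canonical_score(c, weights=(0.4, 0.3, 0.15, 0.15), max_complexity=30):
    """Weighted multi-criteria score; the weights and complexity cap are assumptions.
    Complexity is turned into simplicity so that every term is 'higher is better'."""
    w_conf, w_cov, w_simp, w_rob = weights
    simplicity = 1.0 - min(c["complexity"], max_complexity) / max_complexity
    return (w_conf * c["confidence"] + w_cov * c["coverage"]
            + w_simp * simplicity + w_rob * c["robustness"])


def select_canonical(candidates):
    """Keep the highest-scoring candidate equation for each target."""
    best = {}
    for c in candidates:
        t = c["target"]
        if t not in best or canonical_score(c) > canonical_score(best[t]):
            best[t] = c
    return best


candidates = [
    {"expr": "A", "target": "y", "confidence": 0.90, "coverage": 1.0, "complexity": 25, "robustness": 0.9},
    {"expr": "B", "target": "y", "confidence": 0.85, "coverage": 1.0, "complexity": 5, "robustness": 0.9},
    {"expr": "C", "target": "z", "confidence": 0.60, "coverage": 0.8, "complexity": 9, "robustness": 0.7},
]
canonical = select_canonical(candidates)
```

In this example, the slightly less confident but much simpler candidate B wins for target `y`. That is the trade-off a multi-criteria score is meant to express.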
9. Rule-Guided Learning Feedback Loop
Discovered rules influence future learning through:
- weighting of rule embedding loss
- constraint biasing in prediction tasks
- reinforcement of stable regimes
- selection pressure on symbolic consistency
This creates a feedback loop between:
representation → rule discovery → symbolic extraction → learning bias
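The first feedback channel, weighting of the rule-embedding loss, could look like the following. The linear scaling by mean rule stability, the `max_boost` cap, and the loss names are assumptions. The document does not give the weighting formula.

```python
def rule_loss_weight(base, stabilities, max_boost=2.0):
    """Scale the rule-embedding loss weight between base (no stable rules) and
    base * max_boost (all rules fully stable). The linear form is an assumption."""
    mean_stability = sum(stabilities) / len(stabilities) if stabilities else 0.0
    return base * (1.0 + (max_boost - 1.0) * mean_stability)


def total_loss(losses, base_rule_weight, stabilities):
    """losses holds the per-objective terms from Section 4 (names are illustrative)."""
    lam = rule_loss_weight(base_rule_weight, stabilities)
    return losses["masked"] + losses["next_row"] + losses["validity"] + lam * losses["rule"]
```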
10. Continuous Streaming Operation
Noesis is designed for continuously evolving datasets.
In streaming mode:
- new rows are appended in real time
- structural memory is preserved
- symbolic libraries are incrementally updated
- multiple full training runs are executed cyclically
- system restarts without losing discovered structure
This enables application in:
- robotics sensor streams
- financial tick data
- industrial telemetry
- adaptive control systems
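A streaming driver needs little more than an append-only row buffer, a cycle trigger, and a persisted library. The sketch below is hypothetical: `StreamingEngine`, the JSON file format, and the `run_fn` signature are assumptions. Only the structural library survives a restart; the raw row buffer does not.

```python
import json
import os


class StreamingEngine:
    """Hypothetical driver: rows are appended as they arrive, a full run is
    triggered every `rows_per_cycle` new rows, and the structural library is
    written to disk so a restarted process resumes with everything found so far."""

    def __init__(self, library_path, rows_per_cycle, run_fn):
        self.path = library_path
        self.rows_per_cycle = rows_per_cycle
        self.run_fn = run_fn  # stand-in for one full training run
        self.rows = []
        self.pending = 0
        self.library = self._load()

    def _load(self):
        if os.path.exists(self.path):
            with open(self.path) as fh:
                return json.load(fh)
        return {"rules": [], "equations": []}

    def _save(self):
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(self.library, fh)
        os.replace(tmp, self.path)  # swap in the new file in one step

    def ingest(self, row):
        self.rows.append(row)
        self.pending += 1
        if self.pending >= self.rows_per_cycle:
            found = self.run_fn(self.rows, self.library)
            for key in ("rules", "equations"):
                for item in found.get(key, []):
                    if item not in self.library[key]:  # incremental, de-duplicated update
                        self.library[key].append(item)
            self._save()
            self.pending = 0
```

Writing to a temporary file and then calling `os.replace` (atomic on POSIX) avoids leaving a half-written library if the process stops mid-save.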
11. Meta-Initialisation Across Runs
Each independent run records:
- initial weight configuration (flattened vector)
- resulting structural discovery score
Across runs:
- PCA is applied to weight initialisations
- KMeans clusters successful initialisation regions
- high-performing clusters are reinforced
- new initialisations are sampled from these regions
This introduces evolutionary pressure in weight space, improving structural discovery efficiency over time.
Importantly, no trained model weights are preserved; only initialisation vectors and structural statistics carry over between runs.
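The meta-initialisation loop can be sketched with the standard library alone. The recorded initialisation vectors are projected with PCA (via power iteration) and clustered with k-means. New initialisations are then sampled around the cluster with the best mean discovery score. The Gaussian sampling scheme, `min_std`, and the function names are assumptions. At realistic weight dimensions, a numerical library's SVD and k-means would replace these loops.

```python
import math
import random


def pca_project(vectors, n_components, rng, iters=200):
    """Project vectors onto their leading principal components using power
    iteration with deflation (adequate for a few dozen runs)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centred = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    residual = [list(x) for x in centred]
    components = []
    for _ in range(n_components):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            scores = [sum(x[j] * w[j] for j in range(d)) for x in residual]  # X w
            w = [sum(scores[i] * residual[i][j] for i in range(n)) for j in range(d)]  # X^T X w
            norm = math.sqrt(sum(c * c for c in w)) or 1.0
            w = [c / norm for c in w]
        components.append(w)
        scores = [sum(x[j] * w[j] for j in range(d)) for x in residual]
        residual = [[x[j] - scores[i] * w[j] for j in range(d)] for i, x in enumerate(residual)]
    return [[sum(x[j] * w[j] for j in range(d)) for w in components] for x in centred]


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    cents = [list(points[0])]
    while len(cents) < k:
        cents.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in cents))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sample_from_best_region(inits, scores, k, rng, n_samples=1, min_std=0.01):
    """Cluster past initialisations in PCA space, pick the cluster with the best
    mean structural-discovery score, and sample new initialisations around it."""
    labels = kmeans(pca_project(inits, 2, rng), k)

    def mean_score(c):
        s = [sc for sc, lab in zip(scores, labels) if lab == c]
        return sum(s) / len(s) if s else -math.inf

    best = max(range(k), key=mean_score)
    members = [v for v, lab in zip(inits, labels) if lab == best]
    d = len(inits[0])
    mu = [sum(m[j] for m in members) / len(members) for j in range(d)]
    sd = [max(min_std, math.sqrt(sum((m[j] - mu[j]) ** 2 for m in members) / len(members)))
          for j in range(d)]
    samples = [[rng.gauss(mu[j], sd[j]) for j in range(d)] for _ in range(n_samples)]
    return best, members, samples
```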
12. Inference and Rule-Aligned Prediction
During inference, predictions are generated using:
- transformer output
- rule alignment scoring
- symbolic equation correction
- transition validity filtering
Each prediction includes:
- confidence from rule space
- applicable rules/subrules
- symbolic adjustment factors
- validity classification
This results in explainable, structure-aware predictions.
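The document does not specify how the four inference signals combine. This sketch makes three assumptions:
- Alignment is `exp(-distance)` to a rule centroid.
- The transformer output is blended with the best-aligned rule's equation in proportion to alignment times rule confidence.
- The prediction is withheld when the validity probability falls below a threshold.

All names are hypothetical.

```python
import math


def rule_aligned_prediction(row, transformer_pred, embedding, rules,
                            validity_prob, threshold=0.5):
    """Blend the transformer's prediction with the equation of the best-aligned rule.
    Alignment = exp(-distance to the rule centroid) is an assumed scoring rule."""
    scored = [(math.exp(-math.dist(embedding, r["centroid"])), r) for r in rules]
    alignment, rule = max(scored, key=lambda s: s[0])
    symbolic = rule["equation"](row)
    weight = alignment * rule["confidence"]  # symbolic adjustment factor in [0, 1]
    prediction = (1.0 - weight) * transformer_pred + weight * symbolic
    valid = validity_prob >= threshold  # transition validity filter
    return {"prediction": prediction if valid else None, "confidence": alignment,
            "rule": rule["name"], "symbolic_weight": weight, "valid": valid}


rules = [
    {"name": "r0", "centroid": [0.0, 0.0], "confidence": 1.0, "equation": lambda r: 2 * r[0]},
    {"name": "r1", "centroid": [5.0, 5.0], "confidence": 0.5, "equation": lambda r: -r[0]},
]
```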
13. Failure as Structural Evidence
A key principle of Noesis is that failure is informative.
When no stable global equation emerges:
- rule fragmentation persists
- confidence remains low
- symbolic families remain unstable
This is interpreted as evidence that the available features do not sufficiently represent the underlying system.
Thus, failure indicates missing structure, not simply poor performance.
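The three failure signals can be checked mechanically. Two choices here are assumptions made for this sketch: the thresholds, and the rule that fragmentation means the rule count keeps growing across at least three runs.

```python
def structural_gap_report(rule_counts, best_confidence, family_stabilities,
                          conf_threshold=0.6, stability_threshold=0.5):
    """Flag the three failure signals. rule_counts is the number of distinct
    rules after each run; the thresholds and the 'keeps growing' test are assumptions."""
    fragmenting = len(rule_counts) >= 3 and all(
        later > earlier for earlier, later in zip(rule_counts, rule_counts[1:]))
    signals = {
        "fragmentation": fragmenting,
        "low_confidence": best_confidence < conf_threshold,
        "unstable_families": bool(family_stabilities)
        and max(family_stabilities) < stability_threshold,
    }
    # All three together are read as missing structure in the feature set.
    return signals, all(signals.values())
```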
14. Scientific Interpretation
Noesis behaves less like a predictive model and more like a computational scientific system.
It:
- generates hypotheses (rules)
- refines them (subrules)
- expresses them mathematically (symbolic regression)
- validates them globally (consolidation)
- retains only stable laws
Over time, it converges toward a structured approximation of the underlying system dynamics.
15. Novel Contributions
Project Noesis introduces:
- persistent multi-run structural learning independent of model weights
- hierarchical rule and subrule discovery
- symbolic regression as a structural feedback mechanism
- global symbolic consolidation into canonical laws
- evolutionary meta-initialisation via weight-space clustering
- continuous streaming adaptation for tabular systems
- failure-as-information learning paradigm
16. Conclusion
Project Noesis represents a shift from predictive modelling toward structural discovery.
Instead of learning a function that fits data, Noesis constructs a persistent, evolving set of symbolic hypotheses about the data-generating process itself.
Over time, it aims to converge on:
a library of canonical equations that approximate the invariant laws governing complex tabular systems.
In this sense, Noesis is not a model, but an evolving scientific engine for automated theory formation.
Project Status and Sponsorship
Project Noesis is an active and continuously evolving research system for structural discovery in tabular data. The framework implements a multi-run, self-supervised learning process in which behavioural regimes, symbolic equations, and higher-order structural relationships are incrementally extracted, consolidated, and refined over time. The current system extends beyond local rule discovery to include global symbolic consolidation, enabling the formation of canonical equations derived from aggregated evidence across clustered symbolic families.
Development is ongoing, with continued improvements to hierarchical rule formation, symbolic regression stability, and cross-run meta-initialisation dynamics that guide the system toward regions of parameter space that consistently yield high-quality structural hypotheses. The long-term objective is the automated discovery of invariant laws governing complex, potentially streaming tabular systems, where data is continuously updated and structural models must evolve accordingly.
This research is conducted and sponsored by London Blackbook Ltd, supporting the development of systems that prioritise structural understanding and law discovery over conventional predictive optimisation.
