R API (tensorsl)
tsl-r/ (the R package tensorsl) is the R analogue of tsl-py: it wraps the Rust core through
extendr. It exposes a small S3 fit/predict interface whose hyperparameters mirror the Python
TSLRegressor, so a model fit in R with the same data and seed reproduces the Python results.
It also provides a native ggplot2 interpretability layer.
The package has three layers:
- Fit & inspect: tsl(), predict(), print(), and tsl_components() (the fitted glass-box structure).
- Compute layer: tsl_*() functions that reconstruct the interpretability quantities (partial dependence, tilt, backbone, importance, local explanations) exactly from the fitted components and return tidy data frames.
- Plot layer: plot_*() functions that render those quantities as ggplots in a flat theme, plus the one-verb autoplot() entry point. They mirror the Python tensorsl.plot helpers; the Plotting reference shows the rendered figures.
Input contract
tsl() and predict() take a numeric matrix (rows = observations, columns =
features); a data frame is coerced via as.matrix(). NA/NaN/Inf are rejected.
Column names become the feature names used for labelling.
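The contract above can be checked before calling tsl() or predict(). The following is a minimal base-R sketch; as_tsl_matrix() is a hypothetical helper, not part of tensorsl, and the package's own validation and error messages may differ.

```r
# Hypothetical helper (not part of tensorsl): mimic the documented input
# contract before handing data to tsl() or predict().
as_tsl_matrix <- function(x) {
  if (is.data.frame(x)) x <- as.matrix(x)   # the coercion the docs describe
  if (!is.matrix(x) || !is.numeric(x)) {
    stop("x must be a numeric matrix or a data frame of numeric columns")
  }
  if (!all(is.finite(x))) {                 # is.finite() is FALSE for NA, NaN, Inf, -Inf
    stop("x contains NA, NaN, or Inf values")
  }
  x
}

df <- data.frame(a = c(0.1, 0.5, 0.9), b = c(1, 2, 3))
m <- as_tsl_matrix(df)
colnames(m)                                 # these names would label the features

# A matrix containing NA is rejected.
bad <- tryCatch(as_tsl_matrix(cbind(a = c(1, NA))), error = function(e) "rejected")
```

Checking up front gives an R-level error message before any data crosses into the Rust core.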
Not portable across sessions
A fitted tsl object holds an external pointer into Rust, so it cannot be round-tripped
through saveRDS() / readRDS(). Refit in the new session (fits are deterministic given
seed). Serialisation is a documented follow-up.
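One way to work within this limitation is to persist the inputs to the fit rather than the fitted object. The sketch below illustrates that idea only: the recipe list and the temporary file are hypothetical, and the refit runs only where tensorsl is installed.

```r
# Sketch: save what is needed to refit, not the fitted object itself.
set.seed(1)
x <- matrix(runif(200), ncol = 2, dimnames = list(NULL, c("a", "b")))
y <- sin(2 * pi * x[, "a"]) + x[, "b"]

recipe <- list(x = x, y = y, epochs = 5L, seed = 42L)  # arguments documented for tsl()
path <- tempfile(fileext = ".rds")
saveRDS(recipe, path)

# ... later, in a fresh session ...
recipe2 <- readRDS(path)
if (requireNamespace("tensorsl", quietly = TRUE)) {
  fit <- do.call(tensorsl::tsl, recipe2)   # refit; deterministic given the seed
}
```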
Fit & inspect
tsl()
```r
tsl(x, y,
    epochs = 10L, n_trees = 10L, n_iter = 10L, decay = 1.0,
    split_try = 10L, colsample_bytree = 0.8, alpha = 0.0,
    complexity_penalty = 0.0, min_split_loss = 0.0, min_interval_samples = 1L,
    refinement_strategy = "l2", prior_sample_size = 0.0, update_clamp = Inf,
    tilt_tau = 0.01, tilt_rho = 0.0, split_strategy = "random",
    top_k = 10L, must_fill_all_k = TRUE, similarity_threshold = 0.0,
    bagged = FALSE, seed = 42L, verbosity = 1L)
```
Fit a boosted TSL model. The flat hyperparameters map onto the same Rust builders as the Python API; see the Hyperparameters reference for tuning guidance.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | matrix | required | training features (rows = observations, columns = features) |
| y | numeric | required | training targets, one per row of x |
| epochs | integer | 10 | number of boosting rounds (stages) |
| n_trees | integer | 10 | bagged grid tensors per stage |
| n_iter | integer | 10 | split budget per grid |
| decay | numeric | 1.0 | multiply n_iter by this after epoch 1 |
| split_try | integer | 10 | candidate split positions per feature ("random" strategy) |
| colsample_bytree | numeric | 0.8 | fraction of features sampled per tree |
| alpha | numeric | 0.0 | refinement regularisation strength |
| complexity_penalty | numeric | 0.0 | penalty discouraging extra splits |
| min_split_loss | numeric | 0.0 | minimum loss reduction to accept a split |
| min_interval_samples | integer | 1 | minimum observations per interval |
| refinement_strategy | character | "l2" | "l2" or "huber" |
| prior_sample_size | numeric | 0.0 | parent-anchoring strength (advanced; 0.0 = off) |
| update_clamp | numeric | Inf | update-magnitude cap (advanced; Inf = off) |
| tilt_tau | numeric | 0.01 | \(\ell_2\) coupling between the \(u_+\) and \(u_-\) tilts |
| tilt_rho | numeric | 0.0 | \(\ell_1\) coupling on \((u_+ - u_-)\) |
| split_strategy | character | "random" | "random", "best_split", or "top_k" |
| top_k | integer | 10 | (for "top_k") candidate pool size |
| must_fill_all_k | logical | TRUE | (for "top_k") require all \(k\) slots |
| similarity_threshold | numeric | 0.0 | bag trim \(\xi\) (0 keeps all) |
| bagged | logical | FALSE | accepted for parity with the Python API; has no effect |
| seed | integer | 42 | RNG seed (fits are deterministic) |
| verbosity | integer | 1 | log level: 0 off, 1 info, 2 debug, 3 trace |
Returns
An object of class "tsl": a list carrying the fitted-model pointer and training
diagnostics.
| Field | Type | Description |
|---|---|---|
| ptr | externalptr | handle to the fitted Rust model |
| err | numeric | final training error |
| residuals | numeric | training residuals |
| y_hat | numeric | training predictions |
| feature_names | character \| NULL | column names of x |
| n_features, n_obs | integer | training shape |
| x_background | matrix | a copy of the training design matrix, retained so the plotting functions can marginalise over it without it being passed again |
| call | call | the matched call |
predict()
Predict from a fitted model. newdata must be a numeric matrix with the same number of
columns as the training data; the prediction is the sum of all stage predictions.
| Parameter | Type | Default | Description |
|---|---|---|---|
| object | tsl | required | a fitted model from tsl() |
| newdata | matrix | required | features to predict (same columns as training) |
Returns — a numeric vector of predictions, one per row of newdata.
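A hedged end-to-end sketch of fitting and predicting with the arguments documented above. The simulated data are illustrative only, and the tensorsl calls are guarded so the block still runs where the package is not installed.

```r
# Simulated regression data (illustrative only).
set.seed(1)
x <- matrix(runif(400), ncol = 2, dimnames = list(NULL, c("a", "b")))
y <- sin(2 * pi * x[, "a"]) * x[, "b"] + rnorm(200, sd = 0.05)

if (requireNamespace("tensorsl", quietly = TRUE)) {
  library(tensorsl)
  # Hyperparameter names and types follow the table above; values are arbitrary.
  fit <- tsl(x, y, epochs = 5L, n_trees = 10L, seed = 42L, verbosity = 0L)
  print(fit)                                      # one-line summary
  y_new <- predict(fit, x[1:5, , drop = FALSE])   # one prediction per row
  fit$err                                         # final training error
}
```

Note the drop = FALSE when subsetting rows: predict() is documented as taking a matrix, and a single-row subset would otherwise collapse to a vector.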
print()
A one-line summary of the fitted model: feature count, training rows, and training error.
Returns x invisibly.
tsl_components()
Extract the fitted glass-box structure in two-tensor form — one entry per boosting stage. Each component's prediction is the ordered difference \(\lambda_+ \prod_j b_j e^{d_j} - \lambda_- \prod_j b_j e^{-d_j}\) (see backbone and exponential tilt).
Returns — a list with one element per stage; each stage is a list with:
| Field | Type | Description |
|---|---|---|
| scaling_plus, scaling_minus | numeric | the stage's OLS coefficients on the \(+\) and \(-\) branches (the only place scaling is applied) |
| candidate_indices | integer | 1-based indices of the bagged trees kept in the stage |
| combined_grid_tensor | list | the aggregated representative component |
| grid_tensors | list | the per-tree bag it was aggregated from |
Each grid tensor is itself a list of per-feature splits, backbone_values (\(b \ge 0\)),
tilt_values (\(d \in \mathbb{R}\)), and observation_counts, plus the branch scalars
lambda_plus, lambda_minus and the legacy scaling (ignored in two-tensor mode).
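To make the two-tensor form concrete, the sketch below evaluates \(\lambda_+ \prod_j b_j e^{d_j} - \lambda_- \prod_j b_j e^{-d_j}\) for a single component at one query point. All numbers are hypothetical rather than taken from a fitted model, and the stage-level OLS scalings are left out.

```r
# Hypothetical per-feature values at one query point (not from a real fit).
b <- c(1.2, 0.8, 1.5)       # backbone values b_j >= 0
d <- c(0.3, -0.1, 0.05)     # tilt values d_j (any real number)
lambda_plus  <- 0.9         # branch scalars
lambda_minus <- 0.7

f_plus  <- lambda_plus  * prod(b * exp(d))    # lambda_+ * prod_j b_j e^{d_j}
f_minus <- lambda_minus * prod(b * exp(-d))   # lambda_- * prod_j b_j e^{-d_j}
component_value <- f_plus - f_minus

# Equivalent factorisation: the backbone product times a tilt-only term.
alt <- prod(b) * (lambda_plus * exp(sum(d)) - lambda_minus * exp(-sum(d)))
```

The factorised form shows why the backbone acts as a shared magnitude while the summed tilt decides how that magnitude splits between the two branches.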
Interpretability: compute layer
These pure functions reconstruct each interpretability quantity from the fitted components
and return tidy data frames; the plot_*() functions build on them, and power users can
consume the data directly. Because TSL is separable the reconstructions are exact, not
sampled approximations. The model-native PD math is derived in
Partial dependence.
Common parameters
Most compute functions share these; the per-function notes below list only the distinctive ones.
| Parameter | Type | Default | Description |
|---|---|---|---|
| object | tsl | required | a fitted model from tsl() |
| X | matrix \| NULL | NULL | background matrix to marginalise over (default: the training data retained by tsl()) |
| features | character \| integer \| NULL | NULL | features to include, by name or 1-based index (default: all) |
| feature_x, feature_y | character \| integer | required | the two features for 2D functions |
| stages | integer \| NULL | NULL | stages to include, 1-based (default: all) |
| grid_points | integer | 200/100/50 | evaluation resolution (the default depends on the function) |
| scale | character | "raw" | "raw" (prediction scale) or "component" (\(\hat{m}\)-space shapes) |
tsl_pd()
```r
tsl_pd(object, X = NULL, features = NULL, grid_points = 200L, stages = NULL,
       scale = "raw") -> data.frame
```
First-order partial dependence per feature and stage: the positive branch pos, the
signed-negative branch neg, and their sum net. Summing net over stages gives the
total effect.
Returns — a data frame with columns feature, feature_idx, stage, stage_idx,
x, pos, neg, net, c_plus, c_minus, backbone.
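The relationship between pos, neg, net, and the total effect can be illustrated on a small hand-made frame. The data below are a hypothetical stand-in for tsl_pd() output that uses only a subset of the documented columns, with stage stored as an integer for simplicity.

```r
# Hypothetical stand-in for tsl_pd() output: two stages, one feature, three grid points.
pd <- data.frame(
  feature = "a",
  stage   = rep(c(1L, 2L), each = 3),
  x       = rep(c(0, 0.5, 1), times = 2),
  pos     = c(0.40, 0.55, 0.70, 0.10, 0.12, 0.15),
  neg     = c(-0.30, -0.25, -0.20, -0.05, -0.08, -0.02)  # already signed negative
)
pd$net <- pd$pos + pd$neg          # net is the sum of the two signed branches

# Total first-order effect: sum net over stages at each grid point.
total <- aggregate(net ~ feature + x, data = pd, FUN = sum)
total
```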
tsl_pd_2d()
Per-stage signed 2D partial dependence over a feature pair, evaluated on a
grid_points × grid_points mesh in long form.
Returns — a data frame with columns x, y, stage, value (the signed 2D PD). The
grid vectors are attached as attributes x_vals/y_vals, the feature names as
feature_x/feature_y.
tsl_backbone_2d()
```r
tsl_backbone_2d(object, X = NULL, feature_x, feature_y, grid_points = 100L,
                stages = NULL) -> data.frame
```
The per-stage backbone product
\(b_x(x)\,b_y(y)\) (unsigned magnitude) and the signed 2D partial dependence, stacked in one
long frame with a panel factor ("backbone product" / "2D partial dependence"). Grid
vectors and feature names are attached as attributes.
tsl_ice()
Individual Conditional Expectation curves for one feature: each sampled observation's prediction as the feature is swept over its range, holding the observation's other features fixed.
| Parameter | Type | Default | Description |
|---|---|---|---|
| feature | character \| integer | required | the feature to vary |
| n_ice | integer | 50 | observations sampled for the ICE lines |
| seed | integer | 0 | seed for the observation sample |
Returns — a list with ice (data frame ice_id, x, y) and pd (data frame x,
y, the average curve).
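The ICE construction described above can be sketched in base R with a toy predictor standing in for predict(fit, newdata). Everything here is illustrative: toy_predict() is hypothetical, and the averaged curve is taken over the sampled lines, which may or may not match how tsl_ice() computes its pd element.

```r
# Toy predictor standing in for predict(fit, newdata); purely illustrative.
toy_predict <- function(m) sin(2 * pi * m[, 1]) * m[, 2]

set.seed(0)
X <- matrix(runif(200), ncol = 2)            # background data (100 rows)
feature <- 1L                                # the feature to vary
ids  <- sample(nrow(X), 5)                   # observations sampled for the ICE lines
grid <- seq(min(X[, feature]), max(X[, feature]), length.out = 25)

ice <- do.call(rbind, lapply(seq_along(ids), function(k) {
  rows <- X[rep(ids[k], length(grid)), , drop = FALSE]  # replicate one observation
  rows[, feature] <- grid                                # sweep the chosen feature
  data.frame(ice_id = k, x = grid, y = toy_predict(rows))
}))

# Average curve: pointwise mean of the ICE lines.
pd <- aggregate(y ~ x, data = ice, FUN = mean)
```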
tsl_tilt()
The piecewise-constant tilt \(d_j(x)\) (the signed direction) per feature and stage.
Returns — a data frame with columns feature, stage, x, d.
tsl_tilt_2d()
```r
tsl_tilt_2d(object, X = NULL, feature_x, feature_y, grid_points = 100L,
            stages = NULL) -> data.frame
```
Per-stage tilt product \(d_x(x)\,d_y(y)\) on a mesh, in long form.
Returns — a data frame with columns x, y, stage, value; grid vectors and feature
names attached as attributes.
tsl_tilt_diagnostics()
```r
tsl_tilt_diagnostics(object, X = NULL, features = NULL, grid_points = 200L,
                     stages = NULL) -> data.frame
```
Four diagnostic curves per feature and stage — \(\tanh d\), \(b\tanh d\), \(\tanh(d - \bar d)\), and \(b\tanh(d - \bar d)\), where \(b\) is the backbone and \(d\) the tilt. See Backbone–tilt reconstruction.
Returns — a data frame with columns feature, stage, x, curve, value.
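As a rough illustration of the four curves, the sketch below computes them from hypothetical backbone and tilt values for one feature and stage. How \(\bar d\) is averaged is not specified here, so the unweighted mean over the grid is an assumption, and the curve labels are made up for the example.

```r
# Hypothetical piecewise-constant values on four intervals.
b <- c(0.8, 1.0, 1.4, 1.1)        # backbone
d <- c(-0.6, -0.1, 0.2, 0.9)      # tilt
d_bar <- mean(d)                  # ASSUMPTION: unweighted mean over the grid

labels <- c("tanh(d)", "b*tanh(d)", "tanh(d - dbar)", "b*tanh(d - dbar)")
curves <- data.frame(
  x     = rep(seq_along(d), times = 4),
  curve = rep(labels, each = length(d)),
  value = c(tanh(d), b * tanh(d), tanh(d - d_bar), b * tanh(d - d_bar))
)

# tanh(d) is the normalised gap between the two exponential branches.
gap <- (exp(d) - exp(-d)) / (exp(d) + exp(-d))
```

Because tanh is bounded in (-1, 1), the unscaled curves show direction only, while multiplying by b restores the magnitude.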
tsl_importance()
Per-stage and aggregated feature importance. Backbone importance is \(\mathrm{Var}[\log b_j]\) (how strongly feature \(j\) gates); tilt importance is \(\mathrm{Var}[d_j]\) (how strongly it steers). Stages are weighted by their share of prediction energy; the combined score is \(I_j = I_j^b + \gamma\, I_j^d\).
| Parameter | Type | Default | Description |
|---|---|---|---|
| gamma | numeric | 1 | weight on the tilt term in the combined score |
Returns — a list with per_stage (data frame feature, stage, backbone, tilt),
global (data frame feature, backbone, tilt, combined), stage_weights (data frame
stage, weight), and gamma.
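For a single stage, the two importance terms and the combined score \(I_j = I_j^b + \gamma\, I_j^d\) can be sketched as below. The backbone and tilt values are hypothetical, the use of the population variance is an assumption, and the stage weighting by prediction energy is omitted.

```r
# Hypothetical backbone/tilt values of two features at five background rows.
b <- cbind(a = c(0.5, 1.0, 2.0, 1.5, 0.8), b = c(1.0, 1.1, 0.9, 1.0, 1.05))
d <- cbind(a = c(-0.4, 0.1, 0.6, 0.3, -0.2), b = c(0.02, -0.01, 0.00, 0.01, -0.02))
gamma <- 1

pop_var <- function(v) mean((v - mean(v))^2)     # ASSUMPTION: population variance
imp_backbone <- apply(log(b), 2, pop_var)        # Var[log b_j]: gating strength
imp_tilt     <- apply(d, 2, pop_var)             # Var[d_j]: steering strength
combined     <- imp_backbone + gamma * imp_tilt  # I_j = I_j^b + gamma * I_j^d
```

In this toy data feature a varies much more than feature b on both terms, so it receives the larger combined score.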
tsl_local()
Decompose the prediction for one point into per-stage positive and negative branch contributions (summing to the prediction), per-feature backbone and tilt values, and the stage intercepts that absorb the OLS scalings.
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | numeric | required | a vector of length n_features (or a one-row matrix) |
Returns — a list with stage_contributions, f_plus_contributions,
f_minus_contributions, backbone_magnitudes, tilt_sums, feature_backbone and
feature_tilt (matrices, stages × features), intercept_backbone, intercept_tilt,
total_prediction, a tidy stages data frame (stage, fpos, fneg, net), and
feature_names.
tsl_plot_data()
Recover the data frame a plot_*() function was built from (attached as the "tsl_data"
attribute), so the plot can be rebuilt or extended. Returns NULL if absent.
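The attribute mechanism can be illustrated in base R. The object below is a stand-in, not a real ggplot, and the direct attr() lookup only approximates what tsl_plot_data() is described as doing.

```r
# A stand-in object; a real plot_*() result would be a ggplot carrying the same attribute.
p <- structure(list(), class = "toy_plot")
attr(p, "tsl_data") <- data.frame(x = 1:3, value = c(0.2, 0.5, 0.1))

recovered <- attr(p, "tsl_data", exact = TRUE)      # the attached data frame
absent    <- attr(list(), "tsl_data", exact = TRUE) # NULL when nothing is attached
```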
Interpretability: plot layer
Each plot_*() function reconstructs its data through the compute layer and assembles a
ggplot in the flat theme, attaching the computed data as the "tsl_data" attribute (recover
it with tsl_plot_data()). They share the
common parameters above. The rendered figures are shown on the
Plotting reference.
plot_first_order_pd()
```r
plot_first_order_pd(object, X = NULL, features = NULL, grid_points = 200L,
                    stages = NULL, scale = "raw", show_backbone_overlay = TRUE,
                    show_data_density = FALSE) -> ggplot
```
The per-stage, per-feature first-order PD as a faceted grid: the positive branch PD+
(orange) and negative branch PD- (blue), both on the positive scale, with the signed gap
between them shaded. That gap is the net effect \(\mathrm{PD}_+ - \mathrm{PD}_-\).
| Parameter | Type | Default | Description |
|---|---|---|---|
| show_backbone_overlay | logical | TRUE | overlay the \(\sqrt{C_+ C_-}\,b\) backbone as a dotted line |
| show_data_density | logical | FALSE | add a bottom rug from a sample of the background rows |
pd_difference_plot()
```r
pd_difference_plot(object, X = NULL, features = NULL, grid_points = 200L,
                   stages = NULL, show_backbone_overlay = TRUE,
                   show_data_density = FALSE) -> ggplot
```
The same faceted grid as plot_first_order_pd(), framed as the signed contribution: the
shaded gap between the PD+ and PD- curves is the stage's net effect, with the per-feature
backbone optionally overlaid as a dotted line. The workhorse 1D interpretation plot.
plot_2d_pd()
Per-stage signed 2D partial dependence over a feature pair, drawn as a tiled heatmap with the blue–orange diverging fill. Each panel is rescaled to its own 98th-percentile range so weaker stages stay legible.
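One plausible reading of the per-panel rescaling is sketched below. The exact rule used by plot_2d_pd() is not spelled out here, so taking a symmetric limit from the 98th percentile of absolute values is an assumption, and rescale_panel() is a hypothetical helper.

```r
set.seed(3)
# Hypothetical PD values for a strong and a weak stage.
panels <- list(stage1 = rnorm(400, sd = 1), stage2 = rnorm(400, sd = 0.05))

rescale_panel <- function(v, prob = 0.98) {
  lim <- unname(quantile(abs(v), prob))   # ASSUMPTION: symmetric limit from |v|
  pmax(pmin(v, lim), -lim) / lim          # clamp outliers, then map to [-1, 1]
}

scaled <- lapply(panels, rescale_panel)
sapply(scaled, range)                     # both panels now span the full colour range
```

Using a percentile instead of the maximum keeps a few extreme cells from washing out the rest of the panel.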
plot_ice()
ICE curves for one feature (faint indigo, one per sampled observation) with the
partial-dependence mean overlaid as a bold dark line. See tsl_ice() for the
distinctive parameters.
plot_tilt_1d()
The per-feature, per-stage tilt \(d_j(x)\) as step curves, one panel per (stage, feature)
cell; the fill colour carries the sign (orange positive, blue negative).
plot_2d_tilt()
The per-stage tilt product \(d_x(x)\,d_y(y)\) as a diverging heatmap. Each panel is rescaled symmetrically to its own range, so the colour is anchored at zero and its sign reads the same way in every stage.
plot_tilt_diagnostics()
```r
plot_tilt_diagnostics(object, X = NULL, features = NULL, grid_points = 200L,
                      stages = NULL) -> ggplot
```
The four tilt diagnostic curves per feature and stage: the curves run across the columns, features down the rows, one coloured line per stage in each panel.
plot_2d_backbone()
```r
plot_2d_backbone(object, X = NULL, feature_x, feature_y, grid_points = 100L,
                 stages = NULL) -> patchwork
```
Two stacked rows of per-stage heatmaps over a feature pair: the unsigned backbone product \(b_x(x)\,b_y(y)\) on a sequential indigo ramp, and the signed 2D partial dependence on a diverging blue–orange ramp — the generic "spatial backbone" plot. Composed with patchwork; without it, the two component ggplots are returned as a list.
plot_feature_importance()
A six-panel report: per-stage backbone and tilt importance (heatmaps), a stage-weight histogram, and global bars for tilt, combined, and backbone importance. Composed with patchwork; without it, the six component ggplots are returned as a named list.
| Parameter | Type | Default | Description |
|---|---|---|---|
| gamma | numeric | 1 | weight on the tilt term in the combined score |
plot_local_interpretation()
A per-point "backbone × tilt" decomposition: each query point becomes a row of three panels
sharing one stage ordering (by absolute net contribution) — a signed stage-contribution
waterfall, a per-stage backbone-share marimekko, and per-stage signed tilts. Composed with
patchwork; without it, the per-point tsl_local() results are returned.
| Parameter | Type | Default | Description |
|---|---|---|---|
| points | numeric \| list \| matrix | required | one point (length n_features), a list of such vectors, or a matrix with one point per row |
| titles | character \| NULL | NULL | per-point titles (default "Point 1", "Point 2", …) |
| top_k_features | integer | 3 | tilt axes kept per stage in the signed-tilt panel |
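The three accepted shapes of points can be normalised to a list of vectors along the following lines. This is only a sketch of the idea: as_point_list() is a hypothetical helper, and the package's internal handling may differ.

```r
# Hypothetical helper: turn a vector, list, or matrix into a list of points.
as_point_list <- function(points) {
  if (is.matrix(points)) {
    lapply(seq_len(nrow(points)), function(i) points[i, ])  # one point per row
  } else if (is.list(points)) {
    points                                                   # already a list of vectors
  } else {
    list(points)                                             # a single numeric vector
  }
}

pts_from_matrix <- as_point_list(rbind(c(0.1, 0.2), c(0.7, 0.9)))
pts_from_vector <- as_point_list(c(0.1, 0.2))
titles <- paste("Point", seq_along(pts_from_matrix))         # default-style titles
```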
Component plots
Raw univariate step components of the fitted grid tensors — each feature's curve is the per-interval mean factor \(m = b\cosh(d)\) over the finite interior intervals.
```r
plot_grid_tensor_components(grid_tensor, axis = NULL, feature_names = NULL) -> ggplot
plot_combined_grid_tensors(object, axis = NULL) -> ggplot
plot_epoch_components(object, epoch) -> ggplot
```
| Function | Draws |
|---|---|
| plot_grid_tensor_components | one grid tensor's components, e.g. tsl_components(fit)[[1]]$combined_grid_tensor; pass axis to restrict to one feature |
| plot_combined_grid_tensors | each stage's combined grid-tensor components, one facet per stage |
| plot_epoch_components | every tree in a stage's bag (one facet per feature), shaded by total scale \(\lambda_+ + \lambda_-\), with the combined component overlaid in ink. epoch is the 1-based stage index |
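The mean factor \(m = b\cosh(d)\) plotted by these functions can be read as the average of the two branch factors \(b e^{d}\) and \(b e^{-d}\). The sketch below checks that identity on hypothetical interval values; interpreting "mean" as the average of the two branches is an assumption.

```r
# Hypothetical per-interval values for one feature.
b <- c(0.8, 1.0, 1.4)     # backbone
d <- c(-0.5, 0.0, 0.7)    # tilt

plus  <- b * exp(d)       # per-interval factor on the + branch
minus <- b * exp(-d)      # per-interval factor on the - branch
m <- (plus + minus) / 2   # their average

all.equal(m, b * cosh(d))
```

Where the tilt is zero the two branches coincide and m reduces to the backbone itself.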
One-verb entry point
autoplot()
A convenience wrapper dispatching to the plot_*() functions by type, so every diagnostic
is reachable from one verb. Extra arguments are forwarded to the underlying function (e.g.
feature_x/feature_y for the 2D plots, feature for "ice", points for "local").
| Parameter | Type | Default | Description |
|---|---|---|---|
| type | character | "pd" | "pd", "pd_difference", "pd_2d", "ice", "tilt", "tilt_2d", "tilt_diagnostics", "backbone_2d", "importance", "local", or "components" |
| ... | - | - | forwarded to the dispatched plot_*() function |
```r
library(ggplot2)
# Assumes fit is a model from tsl() trained on a matrix with columns "a" and "b".
autoplot(fit, type = "pd", features = "a")
autoplot(fit, type = "tilt_2d", feature_x = "a", feature_y = "b")
```
Returns — a ggplot (or a patchwork object for the composite types).
Theme & scales
The flat aesthetic — white panels, hairline borders, a faint grid, muted monospace labels, and the indigo / blue–orange palette — is exported for custom figures.
theme_flat()
The minimal ggplot2 theme behind every tensorsl diagnostic, composable with +. Pair it with
the scale_*_tsl() family for the matching colour ramps.
scale_*_tsl()
```r
scale_fill_tsl_backbone(name = "backbone", ...)                   # sequential indigo (magnitude)
scale_fill_tsl_tilt(name = "tilt", ...)                           # sequential orange (tilt magnitude)
scale_fill_tsl_diverging(name = "value", limits = c(-1, 1), ...)  # blue–orange, anchored at 0
scale_colour_tsl(name = NULL, ...)                                # calm categorical cycle (per stage)
```
Fill and colour scales matching theme_flat(). scale_fill_tsl_diverging()
squishes values outside its symmetric limits to the ends.
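Squishing means out-of-range values are clamped to the nearest limit instead of being dropped. The base-R helper below is a hypothetical illustration of that behaviour; the guarded ggplot2 part shows how the exported theme and scale might be combined in a custom figure, and runs only where both packages are installed.

```r
# Hypothetical helper mirroring the squish behaviour described above.
squish_to <- function(v, limits = c(-1, 1)) pmin(pmax(v, limits[1]), limits[2])
squished <- squish_to(c(-2.5, -0.3, 0, 0.8, 4))

if (requireNamespace("tensorsl", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  grid <- expand.grid(x = 1:10, y = 1:10)
  grid$value <- 1.5 * sin(grid$x / 2) * cos(grid$y / 3)   # some values fall outside [-1, 1]
  p <- ggplot(grid, aes(x, y, fill = value)) +
    geom_tile() +
    tensorsl::theme_flat() +
    tensorsl::scale_fill_tsl_diverging(limits = c(-1, 1))
}
```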
Installation & development
tensorsl lives in the tsl-r/ subdirectory of the repo and compiles the Rust core as a static
library at build time. The core is pure Rust and links no system numerical libraries, so the
only prerequisite is a Rust toolchain (rustc >= 1.80, from rustup.rs).
```r
# pak (owner/repo/subdir):
pak::pak("jyliuu/TSL/tsl-r")
# remotes / devtools:
remotes::install_github("jyliuu/TSL", subdir = "tsl-r")
```
The core is pinned as a git dependency that cargo fetches during the build, so no
separate checkout is required. For development against the working-tree core and the wrapper
regeneration workflow, see the package README and the california vignette.