Notation

This page collects the symbols used throughout the Under the hood section. TSL solves a supervised regression problem: estimate the conditional mean \(m(\mathbf{x}) = \mathbb{E}[Y \mid X = \mathbf{x}]\) from i.i.d. observations \(\mathcal{D}_n = \{(y^{(i)}, \mathbf{x}^{(i)})\}_{i=1}^n \subseteq \mathbb{R}\times\mathcal{X}\), where the feature support \(\mathcal{X}\subseteq\mathbb{R}^p\) may be non-rectangular.

Indices and data

Symbol	Meaning
\(p\)	number of features
\(n\)	number of training samples
\([k]\)	the set \(\{1,\dots,k\}\)
\(j \in [p]\)	feature / coordinate index
\(\ell, k \in [R]\)	stage indices
\(x_j\) / \(X_j\)	realized value / random variable for coordinate \(j\)
\(\mathbf{x},\,\mathbf{x}^{(i)}\)	generic feature vector / feature vector of the \(i\)-th sample
\(X_{(-j)}\)	random subvector over all coordinates except \(j\)
\(y^{(i)}\) / \(Y\)	response (sample \(i\)) / response random variable
\(m(\mathbf{x})\)	true conditional mean \(\mathbb{E}[Y\mid X=\mathbf{x}]\)
\(\hat{m}(\mathbf{x})\)	the fitted TSL estimator

Model

Symbol	Meaning
\(R\)	number of stages (separation rank \(\le 2R\))
\(\hat{m}^{(\ell)}(\mathbf{x})\)	stage-\(\ell\) predictor, \(\hat{m}_{+}^{(\ell)} - \hat{m}_{-}^{(\ell)}\)
\(\hat{m}_{\pm,j}^{(\ell)}(x_j)\)	positive univariate factor for feature \(j\), sign branch \(\pm\) (\(>0\))
\(\hat{m}_{\pm}^{(\ell)}(\mathbf{x})\)	scaled signed branch \(\lambda_{\pm}^{(\ell)}\prod_j \hat{m}_{\pm,j}^{(\ell)}(x_j) = \lambda_{\pm}^{(\ell)}\,\tilde{m}_{\pm}^{(\ell)}\ (\ge 0)\)
\(\lambda_{+}^{(\ell)},\lambda_{-}^{(\ell)}\)	non-negative stage scalars
\(b_j^{(\ell)}(x_j)\)	backbone (magnitude / activity gate), \(>0\)
\(d_j^{(\ell)}(x_j)\)	tilt (signed log-imbalance), \(\in\mathbb{R}\)
\(b^{(\ell)}(\mathbf{x})\)	stage backbone \(b_0^{(\ell)}\prod_j b_j^{(\ell)}(x_j)\)
\(d^{(\ell)}(\mathbf{x})\)	stage tilt \(d_0^{(\ell)} + \sum_j d_j^{(\ell)}(x_j)\)
\(b_0^{(\ell)},\,d_0^{(\ell)}\)	stage-level backbone scale / tilt intercept (absorb \(\lambda_\pm\))
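
To make the branch notation concrete, here is a minimal sketch of a single stage evaluated at one point. It uses only the relations in the table above (\(\hat{m}_{\pm}^{(\ell)} = \lambda_{\pm}^{(\ell)}\prod_j \hat{m}_{\pm,j}^{(\ell)}(x_j)\) and \(\hat{m}^{(\ell)} = \hat{m}_{+}^{(\ell)} - \hat{m}_{-}^{(\ell)}\)). The function name `stage_predictor` and the toy numbers are hypothetical and are not part of the TSL API.

```python
import numpy as np

def stage_predictor(factors_plus, factors_minus, lam_plus, lam_minus):
    """Evaluate one stage at a single point x (illustrative helper only).

    factors_plus[j] / factors_minus[j] are the already-evaluated positive
    univariate factors m_{+,j}(x_j) and m_{-,j}(x_j).
    """
    m_plus = lam_plus * np.prod(factors_plus)     # scaled "+" branch, >= 0
    m_minus = lam_minus * np.prod(factors_minus)  # scaled "-" branch, >= 0
    return m_plus - m_minus, m_plus, m_minus

# Toy values for p = 2 features (made up for illustration).
pred, m_plus, m_minus = stage_predictor(
    factors_plus=[2.0, 0.5], factors_minus=[1.0, 4.0],
    lam_plus=1.5, lam_minus=0.5,
)
print(m_plus, m_minus, pred)  # 1.5 2.0 -0.5
```

Both branches are non-negative, yet the stage prediction is negative because the "\(-\)" branch is the larger of the two.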

Grid tensor & fitting

Symbol	Meaning
\(\mathcal{G}^{(\ell)}\)	a grid tensor: interval partitions + per-bin values
\(\mathcal{I}_j^{(\ell)}\)	interval partition of axis \(j\); \(L_j^{(\ell)} = \#\mathcal{I}_j^{(\ell)}\)
\(\hat{v}_{\pm,j,I}^{(\ell)}\)	value of \(\hat{m}_{\pm,j}^{(\ell)}\) on interval \(I\) (\(>0\))
\(\mathbbm{1}_I(x_j)\)	indicator that \(x_j\) falls in interval \(I\)
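\(R_i^{(\ell-1)}\)	outer residual of sample \(i\) entering stage \(\ell\) (target minus the predictions of the previous stages); not to be confused with the stage count \(R\)
\(r_i\)	within-stage residual of sample \(i\) (outer residual minus the current stage's prediction)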
\(\tilde{m}_{\pm}^{(\ell)},\ \tilde{m}_{\pm}^{(i)}\)	unscaled product \(\prod_j \hat{m}_{\pm,j}^{(\ell)}(x_j)\) (per-sample at \(\mathbf{x}^{(i)}\)); the scaled branch is \(\hat{m}_{\pm}^{(\ell)} = \lambda_{\pm}^{(\ell)}\,\tilde{m}_{\pm}^{(\ell)}\)
\(w_i\)	stabilizing weight (\(=1\) for L2; Huber-dependent otherwise)
\(u_\pm^S,\ \hat{u}_\pm^S\)	multiplicative bin update on region \(S\), and its delta \(\hat{u}_\pm^S = u_\pm^S - 1\)
\(\alpha\)	ridge regularization on the bin update
\(\tau,\rho\)	tilt-coupling regularizers (implementation; see Fitting)
\(v_{\min},v_{\max}\)	clamp bounds keeping updates positive/stable
\(S_{11},S_{22},S_{12},t_1,t_2\)	sufficient statistics of the \(2\times2\) split system
\(\Delta_S,\ \Delta_{\text{split}}\)	per-region / total error reduction from a split
\(n_{\text{grids}}\)	number of bagged grids per stage (`n_trees`)
\(\xi\)	similarity trim threshold; keep the top \(K=\lceil(1-\xi)\,n_{\text{grids}}\rceil\) grids
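
Two rows of this table can be illustrated with a short sketch: a piecewise-constant factor written as \(\sum_{I} \hat{v}_{\pm,j,I}^{(\ell)}\,\mathbb{1}_I(x_j)\), and the trim count \(K\). The helper names, the left-closed interval convention, and the numbers are assumptions made for the example; the actual grid data structures in TSL may differ.

```python
import math
import numpy as np

def eval_factor(x_j, edges, values):
    """Piecewise-constant factor: picks v_I for the interval I containing x_j.

    edges  : sorted interior breakpoints of the partition (length L_j - 1)
    values : per-interval values v_I (length L_j, all > 0)
    Intervals are treated as left-closed [e_k, e_{k+1}) -- an assumption.
    """
    idx = np.searchsorted(edges, x_j, side="right")  # interval index per x_j
    return np.asarray(values)[idx]

def trim_count(xi, n_grids):
    """K = ceil((1 - xi) * n_grids), the number of grids kept after trimming."""
    return math.ceil((1.0 - xi) * n_grids)

edges = [0.0, 1.0]          # three intervals: (-inf, 0), [0, 1), [1, inf)
values = [0.5, 2.0, 1.25]   # made-up positive bin values
out = eval_factor(np.array([-3.0, 0.0, 0.7, 1.0]), edges, values)
print(out)                  # [0.5  2.   2.   1.25]

K = trim_count(xi=0.25, n_grids=10)
print(K)                    # 8
```

With \(\xi = 0.25\) and \(n_{\text{grids}} = 10\), \((1-\xi)\,n_{\text{grids}} = 7.5\), so the ceiling keeps \(K = 8\) grids.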

Interpretation

Symbol	Meaning
\(\mathrm{PD}_j(x_j)\)	partial dependence \(\mathbb{E}_{X_{(-j)}}[m(x_j, X_{(-j)})]\)
\(\mathrm{PD}_{\pm,j}^{(\ell)}(x_j)\)	signed-branch partial dependence of stage \(\ell\)
\(C^{(\ell)}_{\pm,j}\)	per-stage/feature/branch PD constant \(c^{(\ell)}_{\pm,j}\lambda_\pm^{(\ell)}\)
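
As a rough illustration of the partial-dependence definition, the sketch below uses the standard plug-in estimator: fix coordinate \(j\) at a grid value, keep the observed \(X_{(-j)}\), and average the predictions. This is a generic estimator shown under that assumption, not necessarily how TSL computes its PD curves; `partial_dependence` and the toy predictor are hypothetical.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Empirical PD_j: average prediction with coordinate j fixed at each grid value."""
    X = np.asarray(X, dtype=float)
    pd = np.empty(len(grid))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, j] = value            # fix x_j; X_(-j) stays as observed
        pd[k] = predict(X_mod).mean()  # sample average replaces the expectation
    return pd

def toy_predict(X):
    """Toy multiplicative predictor f(x) = x_0 * x_1 (illustration only)."""
    return X[:, 0] * X[:, 1]

X = np.array([[1.0, 2.0], [3.0, 4.0]])
pd_vals = partial_dependence(toy_predict, X, j=0, grid=[0.0, 1.0, 2.0])
print(pd_vals)  # [0. 3. 6.]
```

For a product-form predictor the curve is the univariate factor in \(x_j\) times a constant (here the sample mean of \(x_1\), which is 3). That is the general shape of the per-stage, per-feature, per-branch constants listed above, although their exact definition is the library's.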

Branch sign convention

The "\(+\)" and "\(-\)" labels do not denote positive and negative values: both branches are non-negative (\(\hat{m}_{\pm}^{(\ell)} \ge 0\)). Signed effects arise only from the difference \(\hat{m}^{(\ell)} = \hat{m}_{+}^{(\ell)} - \hat{m}_{-}^{(\ell)}\). See The model.