Notation

This page collects the symbols used throughout the Under the hood section. TSL solves a supervised regression problem: estimate the conditional mean \(m(\mathbf{x}) = \mathbb{E}[Y \mid X = \mathbf{x}]\) from i.i.d. observations \(\mathcal{D}_n = \{(y^{(i)}, \mathbf{x}^{(i)})\}_{i=1}^n \subseteq \mathbb{R}\times\mathcal{X}\), where the feature support \(\mathcal{X}\subseteq\mathbb{R}^p\) may be non-rectangular.

Indices and data

| Symbol | Meaning |
| --- | --- |
| \(p\) | number of features |
| \(n\) | number of training samples |
| \([k]\) | the set \(\{1,\dots,k\}\) |
| \(j \in [p]\) | feature / coordinate index |
| \(\ell, k \in [R]\) | stage indices |
| \(x_j\) / \(X_j\) | realized value / random variable for coordinate \(j\) |
| \(\mathbf{x},\,\mathbf{x}^{(i)}\) | feature vector / feature vector of the \(i\)-th sample |
| \(X_{(-j)}\) | random subvector over all coordinates except \(j\) |
| \(y^{(i)}\) / \(Y\) | response of sample \(i\) / response random variable |
| \(m(\mathbf{x})\) | true conditional mean \(\mathbb{E}[Y\mid X=\mathbf{x}]\) |
| \(\hat{m}(\mathbf{x})\) | the fitted TSL estimator |

Model

| Symbol | Meaning |
| --- | --- |
| \(R\) | number of stages (separation rank \(\le 2R\)) |
| \(\hat{m}^{(\ell)}(\mathbf{x})\) | stage-\(\ell\) predictor, \(\hat{m}_{+}^{(\ell)} - \hat{m}_{-}^{(\ell)}\) |
| \(\hat{m}_{\pm,j}^{(\ell)}(x_j)\) | positive univariate factor for feature \(j\), sign branch \(\pm\) (\(>0\)) |
| \(\hat{m}_{\pm}^{(\ell)}(\mathbf{x})\) | scaled signed branch \(\lambda_{\pm}^{(\ell)}\prod_j \hat{m}_{\pm,j}^{(\ell)}(x_j) = \lambda_{\pm}^{(\ell)}\,\tilde{m}_{\pm}^{(\ell)}\ (\ge 0)\) |
| \(\lambda_{+}^{(\ell)},\lambda_{-}^{(\ell)}\) | non-negative stage scalars |
| \(b_j^{(\ell)}(x_j)\) | backbone (magnitude / activity gate), \(>0\) |
| \(d_j^{(\ell)}(x_j)\) | tilt (signed log-imbalance), \(\in\mathbb{R}\) |
| \(b^{(\ell)}(\mathbf{x})\) | stage backbone \(b_0^{(\ell)}\prod_j b_j^{(\ell)}(x_j)\) |
| \(d^{(\ell)}(\mathbf{x})\) | stage tilt \(d_0^{(\ell)} + \sum_j d_j^{(\ell)}(x_j)\) |
| \(b_0^{(\ell)},\,d_0^{(\ell)}\) | stage-level backbone scale / tilt intercept (absorb \(\lambda_\pm\)) |
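
As an illustration of how the symbols above combine, the following minimal sketch evaluates one stage predictor \(\hat{m}^{(\ell)} = \hat{m}_{+}^{(\ell)} - \hat{m}_{-}^{(\ell)}\) from per-feature factor values. The function name `stage_predictor`, its argument layout, and the numbers are hypothetical and not part of the TSL API; only the formula is taken from the table.

```python
import numpy as np

def stage_predictor(factors_pos, factors_neg, lam_pos, lam_neg):
    """Combine per-feature factor values into one stage prediction.

    factors_pos, factors_neg: arrays of shape (n, p) with the strictly
    positive univariate factor values for the "+" and "-" branches.
    lam_pos, lam_neg: non-negative stage scalars.
    (Hypothetical helper for illustration; not the library's interface.)
    """
    factors_pos = np.asarray(factors_pos, dtype=float)
    factors_neg = np.asarray(factors_neg, dtype=float)
    if np.any(factors_pos <= 0) or np.any(factors_neg <= 0):
        raise ValueError("univariate factors must be strictly positive")
    if lam_pos < 0 or lam_neg < 0:
        raise ValueError("stage scalars must be non-negative")
    # Scaled branches: lambda times the product over features.
    branch_pos = lam_pos * np.prod(factors_pos, axis=1)
    branch_neg = lam_neg * np.prod(factors_neg, axis=1)
    # The stage prediction is the difference of the two branches.
    return branch_pos - branch_neg, branch_pos, branch_neg

# Two samples, two features (made-up values).
pos = [[2.0, 0.5], [1.0, 1.0]]
neg = [[1.0, 3.0], [0.5, 1.0]]
pred, bp, bn = stage_predictor(pos, neg, lam_pos=1.0, lam_neg=0.5)
print(pred)
```

Both branches stay non-negative, yet the first sample receives a negative prediction and the second a positive one, because the sign comes only from the difference.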

Grid tensor & fitting

| Symbol | Meaning |
| --- | --- |
| \(\mathcal{G}^{(\ell)}\) | a grid tensor: interval partitions + per-bin values |
| \(\mathcal{I}_j^{(\ell)}\) | interval partition of axis \(j\); \(L_j^{(\ell)} = \#\mathcal{I}_j^{(\ell)}\) |
| \(\hat{v}_{\pm,j,I}^{(\ell)}\) | value of \(\hat{m}_{\pm,j}^{(\ell)}\) on interval \(I\) (\(>0\)) |
| \(\mathbbm{1}_I(x_j)\) | indicator that \(x_j\) falls in interval \(I\) |
| \(R_i^{(\ell-1)}\) | outer residual at stage \(\ell\) (target minus previous stages) |
| \(r_i\) | within-stage residual (outer residual minus current stage) |
| \(\tilde{m}_{\pm}^{(\ell)},\ \tilde{m}_{\pm}^{(i)}\) | unscaled product \(\prod_j \hat{m}_{\pm,j}^{(\ell)}(x_j)\), and its value at sample \(\mathbf{x}^{(i)}\); the scaled branch is \(\hat{m}_{\pm}^{(\ell)} = \lambda_{\pm}^{(\ell)}\,\tilde{m}_{\pm}^{(\ell)}\) |
| \(w_i\) | stabilizing weight (\(=1\) for L2; Huber-dependent otherwise) |
| \(u_\pm^S,\ \hat{u}_\pm^S\) | multiplicative bin update on region \(S\), and its delta \(\hat{u}_\pm^S = u_\pm^S - 1\) |
| \(\alpha\) | ridge regularization on the bin update |
| \(\tau,\rho\) | tilt-coupling regularizers (implementation; see Fitting) |
| \(v_{\min},v_{\max}\) | clamp bounds keeping updates positive/stable |
| \(S_{11},S_{22},S_{12},t_1,t_2\) | sufficient statistics of the \(2\times2\) split system |
| \(\Delta_S,\ \Delta_{\text{split}}\) | per-region / total error reduction from a split |
| \(n_{\text{grids}}\) | number of bagged grids per stage (`n_trees`) |
| \(\xi\) | similarity trim threshold; keep top \(K=\lceil(1-\xi)n_{\text{grids}}\rceil\) |
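
To make a few of these definitions concrete, here is a minimal sketch of three of them: evaluating a piecewise-constant factor \(\sum_I \hat{v}_I\,\mathbbm{1}_I(x_j)\), forming the outer and within-stage residuals, and computing the trim count \(K\). All function names are hypothetical. The half-open interval convention and the assumption that stages combine additively (suggested by "target minus previous stages") are choices made for this sketch, not statements about the implementation.

```python
import math
import numpy as np

def eval_factor(x, edges, values):
    """Evaluate a piecewise-constant factor: the value of the bin holding x.

    edges: increasing interior breakpoints of the partition (length L - 1).
    values: positive per-interval values (length L).
    Intervals are treated as half-open [a, b); this is an assumed convention.
    """
    edges = np.asarray(edges, dtype=float)
    values = np.asarray(values, dtype=float)
    if values.size != edges.size + 1:
        raise ValueError("need exactly one value per interval")
    idx = np.searchsorted(edges, x, side="right")  # bin index per entry of x
    return values[idx]

def residuals(y, stage_preds, ell):
    """Outer and within-stage residuals for the 1-based stage index ell.

    stage_preds: array of shape (R, n) with per-stage predictions,
    assumed here to combine additively.
    """
    y = np.asarray(y, dtype=float)
    stage_preds = np.asarray(stage_preds, dtype=float)
    outer = y - stage_preds[: ell - 1].sum(axis=0)  # target minus previous stages
    within = outer - stage_preds[ell - 1]           # minus the current stage
    return outer, within

def n_kept(xi, n_grids):
    """Number of grids kept after trimming: K = ceil((1 - xi) * n_grids)."""
    return math.ceil((1.0 - xi) * n_grids)

vals = eval_factor(np.array([-1.0, 0.0, 0.7, 2.5]),
                   edges=[0.0, 1.0], values=[0.5, 1.0, 2.0])
outer, within = residuals([1.0, 2.0], [[0.5, 1.0], [0.25, 0.5]], ell=2)
K = n_kept(0.25, 8)
print(vals, outer, within, K)
```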

Interpretation

| Symbol | Meaning |
| --- | --- |
| \(\mathrm{PD}_j(x_j)\) | partial dependence \(\mathbb{E}_{X_{(-j)}}[m(x_j, X_{(-j)})]\) |
| \(\mathrm{PD}_{\pm,j}^{(\ell)}(x_j)\) | signed-branch partial dependence of stage \(\ell\) |
| \(C^{(\ell)}_{\pm,j}\) | per-stage/feature/branch PD constant \(c^{(\ell)}_{\pm,j}\lambda_\pm^{(\ell)}\) |
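
The partial dependence above is a population quantity. A common empirical stand-in replaces \(m\) by a fitted predictor and the expectation over \(X_{(-j)}\) by an average over the training rows. The sketch below shows that generic estimator. The names `partial_dependence` and `toy_model` are hypothetical, and this is not claimed to be how TSL computes its PD curves.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Empirical PD_j: fix coordinate j at each grid value and average
    the predictions over the observed values of the other coordinates."""
    X = np.asarray(X, dtype=float)
    pd = np.empty(len(grid))
    for k, g in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, j] = g          # overwrite coordinate j for every row
        pd[k] = predict(X_mod).mean()
    return pd

def toy_model(X):
    # Product-form toy predictor (illustrative only).
    return X[:, 0] * X[:, 1]

X = np.array([[1.0, 2.0], [3.0, 4.0]])
pd0 = partial_dependence(toy_model, X, j=0, grid=[0.0, 1.0, 2.0])
print(pd0)
```

For this product-form toy model the average over the other coordinate collapses into a constant (here the mean of the second column, 3) multiplying the factor for \(x_0\). That suggests why a single per-stage, per-feature, per-branch constant such as \(C^{(\ell)}_{\pm,j}\) can summarize a signed-branch PD.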

Branch sign convention

The "\(+\)" and "\(-\)" labels name the two branches of a stage, not the sign of their values: both branch products are non-negative (\(\hat{m}_{\pm}^{(\ell)} \ge 0\)). Signed effects arise only from the difference \(\hat{m}_{+}^{(\ell)} - \hat{m}_{-}^{(\ell)}\). See The model.