# Lasso on Vertically Partitioned Data

Implementing Lasso on vertically partitioned data in a naive fashion provides some information on the issues with vertically partitioned data.

Consider a dataset with response $$y$$ where the predictor matrix $$X$$ is vertically partitioned among three sites.

$X = [X_1, X_2, X_3]$

where the combined $$X$$ is $$n\times p$$, each $$X_i$$ is $$n\times p_k$$, for $$k= 1, 2, 3$$, and $$\sum_{i=1}^3 p_k = p.$$

We wish to fit a lasso model:

$\begin{array}{ll} \underset{\beta}{\mbox{minimize}} & ||y - X\beta||_2^2 + \lambda ||\beta||_1. \end{array}$

For purposes of illustration, we generate a dataset.

set.seed(129)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
X1 <- X[, 1:2]
X2 <- X[, 3, drop = FALSE]
X3 <- X[, 4:5]
beta_true <- c(5, 0, 0, 2, 2)
y <- X %*% beta_true + rnorm(n)

## The Aggregated Fit

If the data were not vertically partitioned, the fit would be straightforward for a specified $$\lambda = 2$$ say. We would just solve the optimization problem, the primal problem.

suppressWarnings(suppressMessages(library(CVXR)))

beta <- Variable(p)
lambda <- 2
p_obj <- sum_squares(y - X %*% beta) + lambda / 2 * p_norm(beta, 1)
p_prob <- Problem(Minimize(p_obj))
p_result <- solve(p_prob)

The resulting value of the primal objective is 109.5436324 and the fitted estimate is

(beta_primal <- p_result$getValue(beta)) ## [,1] ## [1,] 4.937721027 ## [2,] -0.007471594 ## [3,] 0.094597544 ## [4,] 2.065937851 ## [5,] 2.130020709 ## The Lasso Dual The dual problem for lasso for that $$\lambda$$ is $\begin{array}{ll} \underset{u}{\mbox{minimize}} & ||y - u||_2^2 \end{array} \mbox{ subject to } ||X^Tu||_{\infty} \leq \lambda.$ In the above $$u$$ is an $$n$$-vector of parameters and the constraint is $X^Tu = \left[\begin{array}{l} X_1^Tu\\ X_2^Tu\\ X_3^Tu \end{array}\right]$ where each $$X_k^Tu$$ is $$p_k\times n$$. It follows that $|X^Tu|_{\infty} = \max\{||X_1^Tu||_{\infty}, ||X_2^Tu||_{\infty}, ||X_3^Tu||_{\infty}\}$ Thus, if each site $$k$$ provides $$||X_k^Tu||_{\infty}$$ for a given $$u$$, the constraint can be computed in a distributed fashion by a master performing the optimization. So the dual is solvable in a distributed fashion. Indeed, we can compute this to check. u <- Variable(n) d_obj <- 0.5 * sum_squares(y - u) ##d_constraint <- list(p_norm(t(X) %*% u) <= lambda) d_constraint <- list( max(p_norm(t(X1) %*% u, Inf), p_norm(t(X2) %*% u, Inf), p_norm(t(X3) %*% u, Inf)) <= lambda ) d_prob <- Problem(Minimize(d_obj), d_constraint) d_result <- solve(d_prob) uVal <- d_result$getValue(u)
## Print a few values out of the 100
head(uVal)
##             [,1]
## [1,] -1.05733012
## [2,] -0.32031376
## [3,]  0.57115415
## [4,] -1.28355101
## [5,] -1.97281480
## [6,]  0.09022386

So far, so good.

## The Catch

Now that we have solved the dual problem, we need to recover the solution to the original primal problem. The correspondence between the primal solution $$\hat{\beta}$$ and dual solution $$u$$ is:

$X\hat{\beta} = y - u.$

The solution is, of course,

$\hat{\beta} = (X^TX)^{-1}X^T(y- u).$

This is where we get killed because the $$X^TX$$ matrix involves cross-site terms like $$X_i^TX_j$$:

If we ignored that fact, we can check that we do get the right solution.

XtX <- t(X) %*% X
beta_dual<- solve(XtX) %*% t(X) %*% (y - uVal)
cbind(beta_dual, beta_primal)
##               [,1]         [,2]
## [1,]  4.919405e+00  4.937721027
## [2,] -9.091532e-06 -0.007471594
## [3,]  7.857126e-02  0.094597544
## [4,]  2.047664e+00  2.065937851
## [5,]  2.110122e+00  2.130020709