Lasso on Vertically Partitioned Data
Implementing Lasso on vertically partitioned data in a naive fashion provides some information on the issues with vertically partitioned data.
Consider a dataset with response y where the predictor matrix X is vertically partitioned among three sites.
X=[X1,X2,X3]
where the combined X is n×p, each Xi is n×pk, for k=1,2,3, and ∑3i=1pk=p.
We wish to fit a lasso model:
minimizeβ||y−Xβ||22+λ||β||1.
For purposes of illustration, we generate a dataset.
set.seed(129)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
X1 <- X[, 1:2]
X2 <- X[, 3, drop = FALSE]
X3 <- X[, 4:5]
beta_true <- c(5, 0, 0, 2, 2)
y <- X %*% beta_true + rnorm(n)
The Aggregated Fit
If the data were not vertically partitioned, the fit would be straightforward for a specified λ=2 say. We would just solve the optimization problem, the primal problem.
suppressWarnings(suppressMessages(library(CVXR)))
beta <- Variable(p)
lambda <- 2
p_obj <- sum_squares(y - X %*% beta) + lambda / 2 * p_norm(beta, 1)
p_prob <- Problem(Minimize(p_obj))
p_result <- solve(p_prob)
The resulting value of the primal objective is 109.5436324 and the fitted estimate is
(beta_primal <- p_result$getValue(beta))
## [,1]
## [1,] 4.937721027
## [2,] -0.007471594
## [3,] 0.094597544
## [4,] 2.065937851
## [5,] 2.130020709
The Lasso Dual
The dual problem for lasso for that λ is
minimizeu||y−u||22 subject to ||XTu||∞≤λ.
In the above u is an n-vector of parameters and the constraint is
XTu=[XT1uXT2uXT3u]
where each XTku is pk×n. It follows that
|XTu|∞=max{||XT1u||∞,||XT2u||∞,||XT3u||∞}
Thus, if each site k provides ||XTku||∞ for a given u, the constraint can be computed in a distributed fashion by a master performing the optimization.
So the dual is solvable in a distributed fashion.
Indeed, we can compute this to check.
u <- Variable(n)
d_obj <- 0.5 * sum_squares(y - u)
##d_constraint <- list(p_norm(t(X) %*% u) <= lambda)
d_constraint <- list(
max(p_norm(t(X1) %*% u, Inf),
p_norm(t(X2) %*% u, Inf),
p_norm(t(X3) %*% u, Inf)) <= lambda
)
d_prob <- Problem(Minimize(d_obj), d_constraint)
d_result <- solve(d_prob)
uVal <- d_result$getValue(u)
## Print a few values out of the 100
head(uVal)
## [,1]
## [1,] -1.05733012
## [2,] -0.32031376
## [3,] 0.57115415
## [4,] -1.28355101
## [5,] -1.97281480
## [6,] 0.09022386
So far, so good.
The Catch
Now that we have solved the dual problem, we need to recover the solution to the original primal problem. The correspondence between the primal solution ˆβ and dual solution u is:
Xˆβ=y−u.
The solution is, of course,
ˆβ=(XTX)−1XT(y−u).
This is where we get killed because the XTX matrix involves cross-site terms like XTiXj:
If we ignored that fact, we can check that we do get the right solution.
XtX <- t(X) %*% X
beta_dual<- solve(XtX) %*% t(X) %*% (y - uVal)
cbind(beta_dual, beta_primal)
## [,1] [,2]
## [1,] 4.919405e+00 4.937721027
## [2,] -9.091532e-06 -0.007471594
## [3,] 7.857126e-02 0.094597544
## [4,] 2.047664e+00 2.065937851
## [5,] 2.110122e+00 2.130020709