# Alternating randomized least squares for CP Decomposition

The function cp_arls computes an estimate of the best rank-R CP model of a tensor X using alternating randomized least-squares algorithm. The input X must be a (dense) tensor. The output CP model is a ktensor. The CP-ARLS method is described in the following reference:

• C. Battaglino, G. Ballard, T. G. Kolda. A Practical Randomized CP Tensor Decomposition. SIAM Journal on Matrix Analysis and Applications 39(2):876-901, 2018. https://doi.org/10.1137/17M1112303

## Set up a sample problem

We set up an especially difficult and somewhat large sample problem that has high collinearity (0.9) and 1% noise. This is an example where the randomized method will generally outperform the standard method.

```sz = [200 300 400];
R = 5;
ns = 0.01;
coll = 0.9;

info = create_problem('Size', sz, 'Num_Factors', R, 'Noise', ns, ...
'Factor_Generator', @(m,n) matrandcong(m,n,coll), ...
'Lambda_Generator', @ones);

% Extract data and solution
X = info.Data;
M_true = info.Soln;
```

## Running the CP-ARLS method

Running the method is essentially the same as using CP-ALS, feed the data matrix and the desired rank. Note that the iteration is of the form NxN which is the number of epochs x the number of iterations per epoch. The default number of iterations per epoch is 50. At the end of each epoch, we check the convergence criteria. Because this is a randomized method, we do not achieve strict decrease in the objective function. Instead, we look at the number of epochs without improvement (newi) and exit when this crosses the predefined tolerance (newitol), which defaults to 5. It is important to note that the fit values that are reported are approximate, so this is why it is denoted by f~ rather than just f.

```tic
[M1, ~, out1] = cp_arls(X,R);
time1 = toc;
scr1 = score(M1,M_true);
fprintf('\n*** Results for CP-ARLS (with mixing) ***\n');
fprintf('Time (secs): %.3f\n', time1)
fprintf('Score (max=1): %.3f\n', scr1);
```
```CP-ARLS (with mixing):
Iter 10x50: f~ = 9.895982e-01 newi = 3
Iter 12x50: f~ = 9.895715e-01 newi = 5

*** Results for CP-ARLS (with mixing) ***
Time (secs): 7.320
Score (max=1): 0.953
```

## Speed things up by skipping the initial mixing

The default behavior is to mix the data in each mode using an FFT and diagonal random +/-1 matrix. This may add substantial preprocessing time, though it helps to ensure that the method converges. Oftentimes, such as with randomly-generated data, the mixing is not necessary.

```tic
[M2, ~, out2] = cp_arls(X,R,'mix',false);
time2 = toc;
scr2 = score(M2,M_true);

fprintf('\n*** Results for CP-ARLS (no mix) ***\n');
fprintf('Time (secs): %.3f\n', time2)
fprintf('Score (max=1): %.3f\n', scr2);
```
```CP_ARLS (without mixing):
Iter 10x50: f~ = 9.891749e-01 newi = 0
Iter 20x50: f~ = 9.891475e-01 newi = 5

*** Results for CP-ARLS (no mix) ***
Time (secs): 8.054
Score (max=1): 0.977
```

## Comparing with CP-ALS

CP-ALS may be somewhat faster, especially since this is a relatively small problem, but it usually will not achieve as good of an answer in terms of the score.

```tic;
[M3, ~, out3] = cp_als(X,R,'maxiters',500,'printitn',10);
time3 = toc;
scr3 = score(M3,M_true);
fprintf('\n*** Results for CP-ALS ***\n');
fprintf('Time (secs): %.3f\n', time3)
fprintf('Score (max=1): %.3f\n', scr3);
```
```CP_ALS:
Iter 10: f = 9.623077e-01 f-delta = 2.8e-04
Iter 20: f = 9.643623e-01 f-delta = 1.9e-04
Iter 30: f = 9.661441e-01 f-delta = 1.4e-04
Iter 33: f = 9.664695e-01 f-delta = 9.5e-05
Final f = 9.664695e-01

*** Results for CP-ALS ***
Time (secs): 2.669
Score (max=1): 0.327
```

## How well does the approximate fit do?

It is possible to check the accuracy of the fit computation by having the code compute the true fit and the final solution, enabled by the truefit option.

```[M4,~,out4] = cp_arls(X,R,'truefit',true);
```
```CP-ARLS (with mixing):
Iter 10x50: f~ = 9.896803e-01 newi = 1
Iter 16x50: f~ = 9.896567e-01 newi = 5
Final fit = 9.896384e-01 Final estimated fit = 9.896909e-01
```

## Varying epoch size

It is possible to vary that number of iterations per epoch. Fewer iterations means that more time is spent checking for convergence and it may also be harder to detect as an single iteration can have some fluctuation and we are actually looking for the overall trend. In contrast, too many iterations means that the method won't realize when it has converged and may spend too much time computing.

```tic
M = cp_arls(X,R,'epoch',1,'newitol',20);
toc
fprintf('Score: %.4f\n',score(M,M_true));
```
```CP-ARLS (with mixing):
Iter 10x1: f~ = 9.578138e-01 newi = 0
Iter 20x1: f~ = 9.628324e-01 newi = 0
Iter 30x1: f~ = 9.700016e-01 newi = 0
Iter 40x1: f~ = 9.725053e-01 newi = 4
Iter 50x1: f~ = 9.735807e-01 newi = 5
Iter 60x1: f~ = 9.736604e-01 newi = 2
Iter 70x1: f~ = 9.740389e-01 newi = 0
Iter 80x1: f~ = 9.742339e-01 newi = 7
Iter 90x1: f~ = 9.738148e-01 newi = 17
Iter 100x1: f~ = 9.740003e-01 newi = 9
Iter 110x1: f~ = 9.744334e-01 newi = 19
Iter 111x1: f~ = 9.741795e-01 newi = 20
Elapsed time is 2.011118 seconds.
Score: 0.5006
```
```tic
M = cp_arls(X,R,'epoch',200,'newitol',3,'printitn',2);
toc
fprintf('Score: %.4f\n',score(M,M_true));
```
```CP-ARLS (with mixing):
Iter  2x200: f~ = 9.893987e-01 newi = 0
Iter  4x200: f~ = 9.894291e-01 newi = 1
Iter  6x200: f~ = 9.894693e-01 newi = 1
Iter  8x200: f~ = 9.894298e-01 newi = 1
Iter 10x200: f~ = 9.894309e-01 newi = 3
Elapsed time is 19.795041 seconds.
Score: 0.9874
```

## Set up another sample problem

We set up another problem with 10% noise, but no collinearity.

```sz = [200 300 400];
R = 5;
ns = 0.10;

info = create_problem('Size', sz, 'Num_Factors', R, 'Noise', ns, ...
'Factor_Generator', @rand,'Lambda_Generator', @ones);

% Extract data and solution
X = info.Data;
M_true = info.Soln;
```

## Terminating once a desired fit is achieved

If we know the noise level is 10%, we would expect a fit of 0.90 at best. So, we can set a threshold that is close to that and terminate as soon as we achieve that accuracy. Since detecting convergence is hard for a randomized method, this can lead to speed ups. However, if the fit is not high enough, the accuracy may suffer consequently.

```M = cp_arls(X,R,'newitol',20,'fitthresh',0.895,'truefit',true);
fprintf('Score: %.4f\n',score(M,M_true));
```
```CP-ARLS (with mixing):
Iter  1x50: f~ = 8.977503e-01 newi = 0
Final fit = 8.974316e-01 Final estimated fit = 8.977503e-01
Score: 0.9828
```

## Changing the number of function evaluation samples

The function evaluation is approximate and based on sampling the number of entries specified by nsampfit. If this is too small, the samples will not be accurate enough. If this is too large, the computation will take too long. The default is , which should generally be sufficient. It may sometimes be possible to use smaller values. The same sampled entries are used for every convergence check --- we do not resample to check other entries.

```M = cp_arls(X,R,'truefit',true,'nsampfit',100);
fprintf('Score: %.4f\n',score(M,M_true));
```
```CP-ARLS (with mixing):
Iter  9x50: f~ = 9.077972e-01 newi = 5
Final fit = 8.977454e-01 Final estimated fit = 9.143556e-01
Score: 0.9691
```

## Change the number of sampled rows in least squares solve

The default number of sampled rows for the least squares solves is ceil(10*R*log2®). This seemed to work well in most tests, but this can be varied higher or lower. For R=5, this means we sample 117 rows per solve. The rows are different for every least squares problem. Let's see what happens if we reduce this to 10.

```M = cp_arls(X,R,'truefit',true,'nsamplsq',10);
fprintf('Score: %.4f\n',score(M,M_true));
```
```CP-ARLS (with mixing):
Iter 10x50: f~ = 7.111503e-01 newi = 2
Iter 16x50: f~ = 6.587896e-01 newi = 5
Final fit = 7.639470e-01 Final estimated fit = 7.608317e-01
Score: 0.3965
```

What if we use 25?

```M = cp_arls(X,R,'truefit',true,'nsamplsq',25);
fprintf('Score: %.4f\n',score(M,M_true));
```
```CP-ARLS (with mixing):
Iter 10x50: f~ = 8.768886e-01 newi = 4
Iter 11x50: f~ = 8.756291e-01 newi = 5
Final fit = 8.816613e-01 Final estimated fit = 8.810186e-01
Score: 0.8536
```