solving matrix equations

In this manual you will learn how to solve different kinds of equations involving matrices and vectors.

Given a matrix A and a vector y, the types of equations we like to solve are of the form y=A·x, where A and y are given. The task is to find the vector x. The first two subsections show how to deal with this equation with no constraints on the solution vector x. Section 4 will show how to deal with the situation when the solution vector x is constrained.

In the equation above, the matrix and the vectors have to conform. This means that the number of rows of A should equal the size of the y vector and the size of the solution vector x will always equal the number of colums of A.

Note: In Praat scripting we don't distinguish in notation between a vector and its transpose.

1. Matrix A is square

In this case an exact solution for x is possible because if A is "well behaved" we can calculate its inverse and the solution will be x = A−1·y. The function solve# (a##, y#) is for this type of problem.

The following example shows an exactly solvable system because A is a square matrix and "well behaved":

    a## = {{0, 2, 0, 1},
    ... {2, 2, 3, 2},
    ... {4,-3, 0, 1},
    ... {6, 1,-6,-5}}
    y# = {0,-2,-7,6}
    x# = solve# (a##, y#)
    writeInfoLine: x#

My info window shows:

-0.49999999999999967 1.000000000000001 0.3333333333333344 -2.0000000000000027

As a check we can calculate the norm of the difference between y and A·x, which should be zero for a perfect solution.

    appendInfoLine: norm (y# - mul# (a##, x#))

My info window shows 1.0777744118960794e-14, which is almost zero (it is not exactly zero due to rounding errors because real numbers cannot be represented exactly in a computer.

2. Matrix A has more rows than columns

If the matrix has more rows than colums an exact solution is generally not possible and the best thing we can do is to find a solution vector x that minimizes the squared distance between the vectors y and A·x. The problem now can be posed as: find the vector x that minimizes ||yA·x||2. This is a regression problem which can be solved using singular value decomposition.

The following example shows a 5×2 matrix A. The solution therefore is a vector with two elements.

    a## = {{-0.84,-0.184},
    ... { 0.09, 1.26},
    ... { 0.62,-0.20},
    ... {-1.48,-1.03},
    ... { 1.29, 0.03}}
    y# = {-0.19, -0.90, -1.53, -2.30, 0.58}
    x# = solve# (a##, y#)
    writeInfoLine: x#

My info window shows:

0.5881230621928452 0.21643159712237164

We can calculate the norm of the difference:

    appendInfoLine: norm (y# - mul# (a##, x#))

My info window shows:

2.556992185298919

This means that no other vector x can be found that has ||yA·x|| < 2.556992185298919!

3. Matrix A has more columns than rows

If the number of colums is larger than the number of rows the system in general is underdetermined, i.e. we can not determine a complete solution vector.

4. The x vector is constrained

If there are additional constraints on the vector x we can handle three different cases:

4.1 Constraints on the squared norm of the solution x

These can be expressed in the following form: minimize ||yA·x||2 + α(xxδ) for some α > 0 and δ ≥ 0. Here the constraint on the squared norm xx of x is enforced by adding a penalty function with weighing factor α.

The function to use is solveWeaklyConstrained# (a##, y#, alpha, delta)

The function is called "weakly constrained" because the penalty function prohibits a relatively large departure of xx from δ. If we let α go to infinity we have a constrained least squares regression problem of minimizing ||yA·x||2 subject to xx = δ. The algorithm we have implemented is described by Ten Berge (1991).

4.1.1 An example from the cited paper with an 6×2 matrix A

    a## = {{ 4, 1, 0.5},
    ... { 4,-1,-0.5},
    ... {-4, 1,-0.5},
    ... {-4,-1, 0.5},
    ... { 2, 0, 0},
    ... {-2, 0, 0}}
    y# = {-1,-1,-1,-1,-1,1}
    alpha = 6.0
    delta = 2 / 3
    x# = solveWeaklyConstrained# (a##, y#, alpha, delta)
    writeInfoLine: x#

My info window shows:

-0.0563380281690141 -3.341667830688472e-17 0.7616819283108669

4.1.2 The solution of the regression without the constraint

No constraints are involved if we set α = 0

    x# = solveWeaklyConstrained# (a##, y#, 0.0, delta)
    writeInfoLine: x#

My info window shows:

-0.05555555555555557 -5.696494054485392e-17 3.0458443711512004e-16

The same solution would have appeared if we had used the following code:

    x# = solve# (a##, y#)
    writeInfoLine: x#

4.1.3. To enforce a solution where the norm of the solution equals one

We choose a very large value for α and set δ to 1.0.

    x# = solveWeaklyConstrained# (a##, y#, 1e100, 1.0)
    writeInfoLine: x#

My info window shows:

-0.05633802816901411 -3.341667830688472e-17 0.9984117520251988

4.2. Constraint of nonnegativity of the solution

Here we constrain the elements of the solution to be nonnegative. The problem is stated as minimize ||yA·x||2 where all xi ≥ 0.

This problem can be solved by an iterative alternating least squares method as described by Borg & Groenen (1997). The function to use is solveNonnegative# (a##, y# [, x#], maximumNumberOfIterations, tolerance, infoLevel)

The parameters have the following meaning:

a##, y#
the A matrix and the y vector.
[, x#]
the optional vector to start the iterations. If not given the procedure starts with the zero vector.
maximumNumberOfIterations
the maximum number of iterations that should be run if the tolerance criterion is not met.
tolerance
is a criterion value that is needed to decide when to stop the iterations. If d(n) is the squared approximation error in the n-th step of the iteration, i.e. d(n) = ||yA·x(n)||2, where x(n) is the approximation of x at step n, then the iteration stops if either d(n) == 0 or
|d(n) - d(n-1)| / ||y||2 < tolerance.
infoLevel
determines the amount of information that is written to the info window during iterations. No info is written if the value is zero.

As an example consider:

    a## = {{-4, 2, 2},
           { 2, 4, 2},
           { 1, 1, 1},
           { 2,-1, 3}}
    y# = {1,2,1,3}
    result# = solveNonnegative# (a##, y#, 100, 1e-17, 0)
    writeInfoLine: result#

My info window shows:

0.17687074830212887 0 0.8594104308385341

The same a## and y# with extra output and only three iterations:

    result# = solveNonnegative# (a##, y#, 3, 1e-17, 2)
    writeInfoLine: result#

Now my info window shows:

Iteration: 1, error: 2.7083144168962345

Iteration: 2, error: 0.22835187182198863

Iteration: 3, error: 0.019415103584264275

Number of iterations: 3; Minimum: 0.019415103584264275

0.18686771227425772 0.0063553925295192215 0.8542134965490019

The solution is not reached after only 3 iterations. We use the found solution to start a new iteration:

    result# = solveNonnegative# (a##, y#, result#, 100, 1e-17, 1)
    writeInfoLine: result#

My info window shows

Number of iterations: 6; Minimum: 0.011337868480725613

0.17687074830212887 0 0.8594104308385341

The final solution has been reached after 6 extra iterations.

4.3 Constraints on the sparseness of the solution

As we have seen above, if the number of columns is larger than the number of rows then a unique solution does not exist in general because the number of unknowns is larger than the number of equations. However, there is an exception:

If the matrix A has some special properties and the solution vector has to be sparse, i.e. most of its values are zero, then we can find the x that minimizes ||yA·x||2.

In general an iterative procedure is needed for the minimization. We have implemented one based on iterative hard thresholding which is described by Blumensath & Davies (2010).

solveSparse# (a##, y# [, x#], maximumNumberOfNonzeros, maximumNumberOfIterations, tolerance, infoLevel)

The parameters have the following meaning:

a##, y#
the A matrix and the y vector.
[, x#]
the optional vector to start the iterations. If not given the procedure starts with the zero vector.
maximumNumberOfNonzeros
the maximum number of nonzero elements in the solution vector, i.e. the sparsity.
maximumNumberOfIterations
the maximum number of iterations that should be run if the tolerance criterion is not met.
tolerance
is a criterion value that is needed to decide when to stop the iterations. If d(n) is the squared approximation error in the n-th step of the iteration, i.e. d(n) = ||yA·x(n)||2, where x(n) is the approximation of x at step n, then the iteration stops if
|d(n) − d(n−1)| / ||y||2 < tolerance.
infoLevel
determines the amount of information that is written to the info window during iterations. No info is written if the value is zero.

In the following example we first construct a sparse vector x, with random numbers between 0.1 and 10, and a random Gaussian matrix A. From these two we construct our y. We then solve the sparse system and compare its solution xs to the constructed x.

    nrow = 100
    ncol = 1000
    x# = zero# (ncol)
    for i to size (x#)
       x# [i] = if randomUniform (0,1) < 0.005 then randomUniform (0.1, 10) else 0.0 fi
    endfor
    # On average in x# 5 out of 1000 will be unequal zero.
    a# = randomGauss## (nrow, ncol, 0.0, 1.0 / nrow)
    y# = mul# (a##, x#)
    maximumNumberOfNonzeros = 10
    maximumNumberOfIterations = 200
    tolerance = 1e-17
    info = 0 ; no info
    xs# = solveSparse# (a##, y#, maximumNumberOfNonzeros, maximumNumberOfIterations, tolerance, info)
    # We have found the solution now check
    dif# = x# - xs#
    inner = inner (dif#, dif#)
    writeInfoLine: if inner > 1e-7 then "error" else "OK" endif

My info window shows: OK.

Links to this page


© djmw 20230801