
In this manual you will learn how to solve different kinds of equations involving matrices and vectors.
Given a matrix A and a vector y, the types of equations we would like to solve are of the form y = A·x, where A and y are given. The task is to find the vector x. The first subsections below show how to deal with this equation when there are no constraints on the solution vector x; the later subsections show how to deal with the situation when the solution vector x is constrained.
In the equation above, the matrix and the vectors have to conform. This means that the number of rows of A should equal the size of the y vector, and the size of the solution vector x will always equal the number of columns of A.
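For example, with a matrix a## and a vector y# already defined in a script, a minimal conformability check could read as follows (this check is only an illustration, it is not part of any of the solver functions described below):
assert numberOfRows (a##) = size (y#)
The solution vector returned by the solvers will then have numberOfColumns (a##) elements.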
Note: In Praat scripting we don't distinguish in notation between a vector and its transpose.
In this case an exact solution for x is possible because if A is "well behaved" we can calculate its inverse and the solution will be x = A^{−1}·y. The function solve# (a##, y#) is for this type of problem.
The following example shows an exactly solvable system because A is a square matrix and "well behaved":
a## = {{0, 2, 0, 1},
... {2, 2, 3, 2},
... {4, -3, 0, 1},
... {6, 1, -6, -5}}
y# = {0, -2, -7, 6}
x# = solve# (a##, y#)
writeInfoLine: x#
My info window shows:
-0.49999999999999967 1.000000000000001 0.3333333333333344 -2.0000000000000027
As a check we can calculate the norm of the difference between y and A·x, which should be zero for a perfect solution.
appendInfoLine: norm (y# - mul# (a##, x#))
My info window shows 1.0777744118960794e-14, which is almost zero (it is not exactly zero because of rounding errors: real numbers cannot always be represented exactly in a computer).
If the matrix has more rows than columns an exact solution is generally not possible and the best thing we can do is to find a solution vector x that minimizes the squared distance between the vectors y and A·x. The problem now can be posed as: find the vector x that minimizes ||y − A·x||^{2}. This is a regression problem, which can be solved using singular value decomposition.
The following example shows a 5×2 matrix A. The solution therefore is a vector with two elements.
a## = {{-0.84, -0.184},
... { 0.09, 1.26},
... { 0.62, -0.20},
... {-1.48, -1.03},
... { 1.29, 0.03}}
y# = {-0.19, -0.90, -1.53, -2.30, 0.58}
x# = solve# (a##, y#)
writeInfoLine: x#
My info window shows:
0.5881230621928452 0.21643159712237164
We can calculate the norm of the difference:
appendInfoLine: norm (y# - mul# (a##, x#))
My info window shows:
2.556992185298919
This means that no other vector x can be found that has ||y − A·x|| < 2.556992185298919!
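We can make this minimum plausible with a small extra experiment (an illustration only, not part of the original example; the vector xp# is ours): perturb one element of the solution and recompute the norm, which can then only get larger.
xp# = x#
xp# [1] = xp# [1] + 0.01
appendInfoLine: norm (y# - mul# (a##, xp#))
Whatever small perturbation we try, the printed value will always be larger than 2.556992185298919.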
If the number of columns is larger than the number of rows the system in general is underdetermined, i.e. we cannot determine a complete solution vector.
If there are additional constraints on the vector x we can handle three different cases: a weak constraint on the norm of the solution, the constraint that all elements of the solution are nonnegative, or the constraint that the solution is sparse.
The first case can be expressed in the following form: minimize ||y − A·x||^{2} + α(x′x − δ)^{2} for some α > 0 and δ ≥ 0. Here the constraint on the squared norm x′x of x is enforced by adding a penalty function with weighting factor α.
The function to use is solveWeaklyConstrained# (a##, y#, alpha, delta).
The function is called "weakly constrained" because the penalty function prohibits a relatively large departure of x′x from δ. If we let α go to infinity we have a constrained least squares regression problem of minimizing ||y − A·x||^{2} subject to x′x = δ. The algorithm we have implemented is described by Ten Berge (1991).
a## = {{ 4, 1, 0.5},
... { 4, -1, -0.5},
... {-4, 1, -0.5},
... {-4, -1, 0.5},
... { 2, 0, 0},
... {-2, 0, 0}}
y# = {1, 1, 1, 1, 1, -1}
alpha = 6.0
delta = 2 / 3
x# = solveWeaklyConstrained# (a##, y#, alpha, delta)
writeInfoLine: x#
My info window shows:
0.0563380281690141 3.341667830688472e-17 0.7616819283108669
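Note that the constraint is only weak: the squared norm x′x of this solution does not have to be exactly equal to δ. We can inspect it (an extra check, not part of the original example):
appendInfoLine: inner (x#, x#)
This should print a value of approximately 0.583, somewhat below δ = 2/3.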
No constraints are involved if we set α = 0:
x# = solveWeaklyConstrained# (a##, y#, 0.0, delta)
writeInfoLine: x#
My info window shows:
0.05555555555555557 5.696494054485392e-17 3.0458443711512004e-16
The same solution would have appeared if we had used the following code:
x# = solve# (a##, y#)
writeInfoLine: x#
We choose a very large value for α and set δ to 1.0.
x# = solveWeaklyConstrained# (a##, y#, 1e100, 1.0)
writeInfoLine: x#
My info window shows:
0.05633802816901411 3.341667830688472e-17 0.9984117520251988
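Because α is so large, the squared norm of this solution should now be almost exactly equal to δ = 1.0. We can check this (an extra illustration):
appendInfoLine: inner (x#, x#)
This should print a value very close to 1.0.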
In the second case we constrain the elements of the solution to be nonnegative. The problem is stated as: minimize ||y − A·x||^{2}, where all x_{i} ≥ 0.
This problem can be solved by an iterative alternating least squares method as described by Borg & Groenen (1997). The function to use is solveNonnegative# (a##, y# [, x#], maximumNumberOfIterations, tolerance, infoLevel).
The parameters have the following meaning:
a##, y#: the matrix A and the vector y.
[, x#]: the optional start value for the iteration.
maximumNumberOfIterations: the maximum number of iterations that will be performed if convergence is not reached earlier.
tolerance: a criterion value used to decide when the iteration has converged.
infoLevel: determines the amount of information that is written to the info window during the iteration; no information is written if the value is zero.
As an example consider:
a## = {{-4, 2, 2},
... { 2, 4, 2},
... { 1, 1, 1},
... { 2, -1, 3}}
y# = {1,2,1,3}
result# = solveNonnegative# (a##, y#, 100, 1e-17, 0)
writeInfoLine: result#
My info window shows:
0.17687074830212887 0 0.8594104308385341
The same a## and y# with extra output and only three iterations:
result# = solveNonnegative# (a##, y#, 3, 1e-17, 2)
writeInfoLine: result#
Now my info window shows:
Iteration: 1, error: 2.7083144168962345
Iteration: 2, error: 0.22835187182198863
Iteration: 3, error: 0.019415103584264275
Number of iterations: 3; Minimum: 0.019415103584264275
0.18686771227425772 0.0063553925295192215 0.8542134965490019
The solution has not been reached after only 3 iterations. We use the solution found so far as the start value for a new round of iterations:
result# = solveNonnegative# (a##, y#, result#, 100, 1e-17, 1)
writeInfoLine: result#
My info window shows:
Number of iterations: 6; Minimum: 0.011337868480725613
0.17687074830212887 0 0.8594104308385341
The final solution has been reached after 6 extra iterations.
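The reported minimum is the squared distance ||y − A·x||^{2}. We can verify it ourselves (an extra check, not part of the original run; the vector r# is ours):
r# = y# - mul# (a##, result#)
appendInfoLine: inner (r#, r#)
This should print a value very close to 0.011337868480725613.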
As we have seen above, if the number of columns is larger than the number of rows then a unique solution does not exist in general because the number of unknowns is larger than the number of equations. However, there is an exception:
If the matrix A has some special properties and the solution vector has to be sparse, i.e. most of its values are zero, then we can find the x that minimizes ||y − A·x||^{2}.
In general an iterative procedure is needed for the minimization. We have implemented one based on iterative hard thresholding, which is described by Blumensath & Davies (2010).
The function to use is solveSparse# (a##, y# [, x#], maximumNumberOfNonzeros, maximumNumberOfIterations, tolerance, infoLevel).
The parameters have the following meaning:
a##, y#: the matrix A and the vector y.
[, x#]: the optional start value for the iteration.
maximumNumberOfNonzeros: the maximum number of nonzero elements allowed in the solution vector, i.e. its sparsity.
maximumNumberOfIterations: the maximum number of iterations that will be performed if convergence is not reached earlier.
tolerance: a criterion value used to decide when the iteration has converged.
infoLevel: determines the amount of information that is written to the info window during the iteration; no information is written if the value is zero.
In the following example we first construct a sparse vector x#, with random values between 0.1 and 10, and a random Gaussian matrix A. From these two we construct our y#. We then solve the sparse system and compare its solution xs# to the constructed x#.
nrow = 100
ncol = 1000
x# = zero# (ncol)
for i to size (x#)
x# [i] = if randomUniform (0,1) < 0.005 then randomUniform (0.1, 10) else 0.0 fi
endfor
# On average 5 out of the 1000 elements of x# will be nonzero.
a## = randomGauss## (nrow, ncol, 0.0, 1.0 / nrow)
y# = mul# (a##, x#)
maximumNumberOfNonzeros = 10
maximumNumberOfIterations = 200
tolerance = 1e-17
info = 0 ; no info
xs# = solveSparse# (a##, y#, maximumNumberOfNonzeros, maximumNumberOfIterations, tolerance, info)
# We have found the solution; now check it.
dif# = x# - xs#
inner = inner (dif#, dif#)
writeInfoLine: if inner > 1e-7 then "error" else "OK" endif
My info window shows: OK.
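To see that the recovered solution is indeed sparse, we can also count its nonzero elements (an extra check; the variable name nonzeros is ours):
nonzeros = 0
for i to size (xs#)
   nonzeros = nonzeros + ( if xs# [i] <> 0 then 1 else 0 fi )
endfor
appendInfoLine: "Number of nonzeros: ", nonzeros
The printed count should not exceed maximumNumberOfNonzeros.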
© djmw 20230801