a robust linear regression method, first proposed by Theil (1950). The slope of the regression line is estimated as the median of the slopes of the lines through all pairs of points in the data set. Because the number of pairs increases quadratically with the number of data points, we have implemented a less computationally intensive procedure, the incomplete Theil regression. In the incomplete method we first split the data set of N data points (x_i, y_i), i = 1..N, into two equal sets of size N/2 and then calculate N/2 slopes as
m_i = (y_{N/2+i} - y_i) / (x_{N/2+i} - x_i), for i = 1..N/2.
The regression slope m is calculated as the median of these N/2 values m_i.
Given the slope m, the offset b is calculated as the median of the N values b_i = y_i - m·x_i.
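The incomplete procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by the program: the function name incomplete_theil is hypothetical, the data are used in the order given (no sorting), for odd N the last point is left out of the slope step, and pairs with equal x values are skipped.

```python
from statistics import median


def incomplete_theil(x, y):
    """Sketch of incomplete Theil regression; returns (slope, offset)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    half = n // 2  # assumption: for odd n the last point is not paired

    # Pair point i of the first half with point half + i of the second half.
    # Assumption: pairs with identical x are skipped to avoid division by zero.
    slopes = [
        (y[half + i] - y[i]) / (x[half + i] - x[i])
        for i in range(half)
        if x[half + i] != x[i]
    ]
    if not slopes:
        raise ValueError("no pair of points with distinct x values")

    # Slope: median of the (at most) N/2 pairwise slopes.
    m = median(slopes)

    # Offset: median of the N residual intercepts y_i - m * x_i.
    b = median(yi - m * xi for xi, yi in zip(x, y))
    return m, b


# Usage: points on y = 2x + 1 with one gross outlier at x = 4.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 5, 7, 100, 11]
print(incomplete_theil(x, y))  # (2.0, 1.0)
```

In this small example the outlier affects only one of the three slopes and one of the six intercepts, so both medians are unchanged.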
The Theil regression has a breakdown point of 29.3%, which means that it can tolerate arbitrary corruption of up to 29.3% of the input data points without the estimate becoming arbitrarily wrong.
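The 29.3% figure can be motivated by a short calculation for the complete pairwise-slope estimator. The symbol ε (the fraction of corrupted points) is introduced here for illustration only. If a fraction ε of the points is corrupted, roughly (1 - ε)² of all pairs consist of two uncorrupted points, and the median slope stays bounded as long as those pairs form at least half of all pairs:

```latex
(1-\varepsilon)^2 \ge \tfrac{1}{2}
\quad\Longleftrightarrow\quad
\varepsilon \le 1 - \tfrac{1}{\sqrt{2}} \approx 0.293
```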
© djmw, July 10, 2013