Polyline
Simplification

by Dan Sunday

Overview
Vertex Reduction
Pseudo-Code: Vertex Reduction
Douglas-Peucker Algorithm
Pseudo-Code: Douglas-Peucker
Convex Hull Speed-Up
Implementation
poly_simplify()
References

Often a polyline has too much resolution for an application, such as visual displays of geographic map boundaries or detailed animated figures in games or movies.  That is, the points on the polylines representing the object boundaries are too close together for the resolution of the application.  For example, in a computer display, successive vertices of the polyline may be displayed at the same screen pixel so that successive edge segments start, stay at, and end at the same displayed point.  The whole polyline may even have all its vertices mapped to the same pixel, so that it appears simply as a single point in the display! 

For the sake of efficiency, one doesn't want to draw all these degenerate lines, since drawing a single point would be enough. To achieve this, one wants to reduce the vertices and edges of the polyline to essential ones that suffice for the resolution of one's application.  There are several algorithms for doing this.  In effect, these algorithms approximate a high resolution polyline with a smaller low resolution reduced polyline having fewer vertices.  This also speeds up subsequent algorithms, such as area fill or intersection, that may applied to the reduced polyline or polygon.


Overview

We will consider several different algorithms for reducing the points in a polyline to produce a simplified polyline that approximates the original within a specified tolerance.  Most of these algorithms work in any dimension since they only depend on computing the distance between points and lines.  Thus, they can be applied to arbitrary 2D or 3D curves that have been approximated by a polyline, for example, by sampling a parametric curve at regular small intervals.

The first simplification algorithm, vertex reduction, is a fast O(n) algorithm.  It is the fastest and least complicated algorithm, but gives the coarsest result.  However, it can be used as a preprocessing stage before applying other algorithms.  This results in an faster combined algorithm since vertex reduction can significantly decrease the number of vertices that remain for input to other simplification algorithms.

The second algorithm is the classical Douglas-Peucker (DP) approximation algorithm that is used extensively for both computer graphics and geographic information systems.  There are two variants of this algorithm, the original O(n2) method [Douglas & Peucker, 1973] and a more recent O(n log n) one [Hershberger & Snoeyink, 1992].  Unfortunately, as is often the case, the faster algorithm is more complicated to implement.  Additionally, it is not as general, and only works for simple 2D planar polylines, and not in higher dimensions.

Finally, we combine these algorithms, vertex reduction followed by DP approximation, in our C++ code implementation of poly_simplify() as a fast practical high-quality polyline simplification algorithm.


Vertex Reduction

In vertex reduction, successive vertices that are clustered too closely are reduced to a single vertex.  For example, if a polyline is being drawn in a computer display, successive vertices may be drawn at the same pixel if they are closer than some fixed application tolerance.  In a large range geographic map display, two vertices of a boundary polyline may be separated by as much as a mile (or more), and still be displayed at the same pixel; and the edge segments joining them are also being drawn at this pixel.  One would like to discard the redundant vertices so that successive vertices are separated several pixels, and edge segments are not just points.

Vertex reduction is the brute-force algorithm for polyline simplification.  For this algorithm, a polyline vertex is discarded when its distance from a prior initial vertex is less than some minimum tolerance ε > 0.   Specifically,  after fixing an initial vertex V0, successive vertices Vi are tested and rejected if they are less than ε away from V0. But, when a vertex is found that is further away than ε, then it is accepted as part of the new simplified polyline, and it also becomes the new initial vertex for further simplification of the polyline.  Thus, the resulting edge segments between accepted vertices are larger than the ε tolerance. This procedure is easily visualized as follows:

Pseudo-Code: Vertex Reduction

Here is the straightforward pseudo-code for this algorithm:

Input: tol = the approximation tolerance
     
  L = {V0,V1,...,Vn-1} is any n-vertex polyline

Set start = 0;
Set k = 0;
Set W0 = V0;
for each vertex Vi (i=1,n-1)
{
    if Vi is within tol from Vstart
    then ignore it, and continue with the next vertex

    Vi is further than tol away from Vstart
    so add it as a new vertex of the reduced polyline
    Increment k++;
    Set Wk = Vi;
    Set start = i; as the new initial vertex
}

Output:
W = {W0,W1,...,Wk-1} = the k-vertex simplified polyline

This is a fast O(n) algorithm.  It should be implemented comparing squares of distances with the squared tolerance to avoid expensive square root calculations.  This algorithm is the first preprocessing stage in our implementation for poly_simplify() given below.


Douglas-Peucker Algorithm

Whereas vertex reduction uses closeness of vertices as a rejection criterion, the Douglas-Peucker (DP) algorithm uses the closeness of a vertex to an edge segment.  This algorithm works from the top down by starting with a crude initial guess at a simplified polyline, namely the single edge joining the first and last vertices of the polyline.   Then the remaining vertices are tested for closeness to that edge.  If there are vertices further than a specified tolerance, ε > 0, away from the edge, then the vertex furthest from it is added the simplification.  This creates a new guess for the simplified polyline.  Using recursion, this process continues for each edge of the current guess until all vertices of the original polyline are within tolerance of the simplification.

More specifically, in the DP algorithm, the two extreme endpoints of a polyline are connected with a straight line as the initial rough approximation of the polyline.  Then, how well it approximates the whole polyline is determined by computing the distances from all intermediate polyline vertices to that (finite) line segment.  If all these distances are less than the specified tolerance ε, then the approximation is good, the endpoints are retained, and the other vertices are eliminated.  However, if any of these distances exceeds the ε tolerance, then the approximation is not good enough.  In this case, we choose the point that is furthest away as a new vertex subdividing the original polyline into two (shorter) polylines, as illustrated in the following diagram.

This procedure is repeated recursively on these two shorter polylines.  If at any time, all of the intermediate distances are less than the ε threshold, then all the intermediate points are eliminated.  The routine continues until all possible points have been eliminated.  Successive stages of this process are shown in the following example.


 

Pseudo-Code: Douglas-Peucker

Here is pseudo-code for this algorithm:

Input:  tol = the approximation tolerance
      
  L = {V0,V1,...,Vn-1} is any n vertex polyline

// Mark vertices that will be in the simplified polyline
Initially Mark V0 and V
n

// Recursively simplify by selecting vertex furthest away
simplify(tol,
L, 0, n);

// Copy Marked vertices to the output simplified polyline
for (i=m=0; i<=n; i++) {
    if (Vi is marked) { add it to the output polyline
W
        Wm = Vi;
        m++;
    }
}

Output:
W = {W0,W1,...,Wm-1} = the simplified m-vertex polyline

//-------------------------------------------------------------
// This is the recursive simplification routine
simplify(tol,
L, j, k)
{
    if (k
£ j+1) there is nothing to simplify
        return immediately
    otherwise
    test distance of intermediate vertices from segment Vj to Vk

    Put Sjk  = the segment from Vj to Vk
    Put maxd = 0 is the distance of farthest vertex from Sjk
    Put maxi = j is the index of the vertex farthest from Sjk

    for each intermediate vertex Vi (i=j+1,k-1)
    {
        Put dv = distance from Vi to segment Sjk
        if (dv <= maxd) then Vi is not farther away, so
            continue to the next vertex
        otherwise Vi is a new max vertex
        Put maxd = d and maxi = i to remember the farthest vertex
    }
    if (maxd > tol) a vertex is further than tol from Sjk
    {
        // split the polyline at the farthest vertex
        Mark Vmaxi as part of the simplified polyline
        and recursively simplify the two subpolylines
        simplify(tol,
L, j, maxi);
        simplify(tol,
L, maxi, k);
    }
}

This algorithm is O(nm) worst case time, and O(n log m) expected time, where m is the size of the simplified polyline.  Note that this is an output dependent algorithm, and will be very fast when m is small.  This algorithm is implemented as the second stage of our poly_simplify() routine given below.  The first stage of our implementation does vertex reduction on the polyline before invoking the DP algorithm.  This results in a fast high-quality polyline approximation/simplification algorithm.

Convex Hull Speed-Up

[Hershberger & Snoeyink, 1992] describe an interesting improvement for speeding up the Douglas-Peucker algorithm, making it a worst case O(n log n) time algorithm.  They do this by speeding up the selection of the farthest intermediate vertex from the segment Sij = ViVj. This is achieved by noting that the farthest vertex must be on the convex hull of the polyline chain from Vi to Vj.  Then, they compute this hull in O(n) time using Melkman's algorithm for the hull of a simple polyline (see Algorithm 15), and find the vertex farthest from Sij using an O(log n) binary search on the hull vertices (see Algorithm 13).  They next show how to reuse the hull information when a polyline chain is split at the farthest vertex.  The details for putting the whole algorithm together are a bit delicate, but in the final analysis it is shown to have worst case O(n log n) time.

There are a few caveats about this algorithm.  First, it depends on using the Melkman O(n) time hull algorithm, and thus the polyline must be simple, although this is often true in many applications.  Second, since this algorithm depends on a 2D convex hull algorithm, it only applies to planar polylines, whereas the VR and DP algorithms can be used for any dimension. Next, this algorithm is more complicated than the original DP algorithm, and is more difficult to code and debug.  Fortunately, the [Hershberger & Snoeyink, 1992] paper has an Appendix with a complete "C" code implementation.  Finally, although they improve on the worst case behavior, the original DP algorithm is usually O(n log m) and can do better in the best cases where m is small.  So, altogether its a stalemate over which algorithm is best, and using the easier to code, more general DP algorithm is preferred by us.


Implementation

Here is a sample "C++" implementation of this algorithm. 

// Copyright 2002, softSurfer (www.softsurfer.com)
// This code may be freely used and modified for any purpose
// providing that this copyright notice is included with it.
// SoftSurfer makes no warranty for this code, and cannot be held
// liable for any real or imagined damage resulting from its use.
// Users of this code must verify correctness for their application.

// Assume that classes are already given for the objects:
//    Point and Vector with
//        coordinates {float x, y, z;}    // as many as are needed
//        operators for:
//            == to test equality
//            != to test inequality
//            (Vector)0 = (0,0,0)         (null vector)
//            Point  = Point ± Vector
//            Vector = Point - Point
//            Vector = Vector ± Vector
//            Vector = Scalar * Vector    (scalar product)
//            Vector = Vector * Vector    (cross product)
//    Segment with defining endpoints {Point P0, P1;}
//===================================================================

// dot product (3D) which allows vector operations in arguments
#define dot(u,v)   ((u).x * (v).x + (u).y * (v).y + (u).z * (v).z)
#define norm2(v)   dot(v,v)        // norm2 = squared length of vector
#define norm(v)    sqrt(norm2(v))  // norm = length of vector
#define d2(u,v)    norm2(u-v)      // distance squared = norm2 of difference
#define d(u,v)     norm(u-v)       // distance = norm of difference

// poly_simplify():
//    Input:  tol = approximation tolerance
//            V[] = polyline array of vertex points
//            n   = the number of points in V[]
//    Output: sV[]= simplified polyline vertices (max is n)
//    Return: m   = the number of points in sV[]
int
poly_simplify( float tol, Point* V, int n, Point* sV )
{
    int    i, k, m, pv;            // misc counters
    float  tol2 = tol * tol;       // tolerance squared
    Point* vt = new Point[n];      // vertex buffer
    int*   mk = new int[n] = {0};  // marker buffer

    // STAGE 1.  Vertex Reduction within tolerance of prior vertex cluster
    vt[0] = V[0];              // start at the beginning
    for (i=k=1, pv=0; i<n; i++) {
        if (d2(V[i], V[pv]) < tol2)
            continue;
        vt[k++] = V[i];
        pv = i;
    }
    if (pv < n-1)
        vt[k++] = V[n-1];      // finish at the end

    // STAGE 2.  Douglas-Peucker polyline simplification
    mk[0] = mk[k-1] = 1;       // mark the first and last vertices
    simplifyDP( tol, vt, 0, k-1, mk );

    // copy marked vertices to the output simplified polyline
    for (i=m=0; i<k; i++) {
        if (mk[i])
            sV[m++] = vt[i];
    }
    delete vt;
    delete mk;
    return m;         // m vertices in simplified polyline
}

// simplifyDP():
//  This is the Douglas-Peucker recursive simplification routine
//  It just marks vertices that are part of the simplified polyline
//  for approximating the polyline subchain v[j] to v[k].
//    Input:  tol = approximation tolerance
//            v[] = polyline array of vertex points
//            j,k = indices for the subchain v[j] to v[k]
//    Output: mk[] = array of markers matching vertex array v[]
void
simplifyDP( float tol, Point* v, int j, int k, int* mk )
{
    if (k <= j+1) // there is nothing to simplify
        return;

    // check for adequate approximation by segment S from v[j] to v[k]
    int     maxi = j;          // index of vertex farthest from S
    float   maxd2 = 0;         // distance squared of farthest vertex
    float   tol2 = tol * tol;  // tolerance squared
    Segment S = {v[j], v[k]};  // segment from v[j] to v[k]
    Vector  u = S.P1 - S.P0;   // segment direction vector
    double  cu = dot(u,u);     // segment length squared

    // test each vertex v[i] for max distance from S
    // compute using the Feb 2001 Algorithm's dist_Point_to_Segment()
    // Note: this works in any dimension (2D, 3D, ...)
    Vector  w;
    Point   Pb;                // base of perpendicular from v[i] to S
    double  b, cw, dv2;        // dv2 = distance v[i] to S squared

    for (int i=j+1; i<k; i++)
    {
        // compute distance squared
        w = v[i] - S.P0;
        cw = dot(w,u);
        if ( cw <= 0 )
            dv2 = d2(v[i], S.P0);
        else if ( cu <= cw )
            dv2 = d2(v[i], S.P1);
        else {
            b = cw / cu;
            Pb = S.P0 + b * u;
            dv2 = d2(v[i], Pb);
        }
        // test with current max distance squared
        if (dv2 <= maxd2)
            continue;
        // v[i] is a new max vertex
        maxi = i;
        maxd2 = dv2;
    }
    if (maxd2 > tol2)        // error is worse than the tolerance
    {
        // split the polyline at the farthest vertex from S
        mk[maxi] = 1;      // mark v[maxi] for the simplified polyline
        // recursively simplify the two subpolylines at v[maxi]
        simplifyDP( tol, v, j, maxi, mk );  // polyline v[j] to v[maxi]
        simplifyDP( tol, v, maxi, k, mk );  // polyline v[maxi] to v[k]
    }
    // else the approximation is OK, so ignore intermediate vertices
    return;
}
//===================================================================
 


References

David Douglas & Thomas Peucker, "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature", The Canadian Cartographer 10(2), 112-122  (1973)

John Hershberger & Jack Snoeyink, "Speeding Up the Douglas-Peucker Line-Simplification Algorithm", Proc 5th Symp on Data Handling, 134-143 (1992).  UBC Tech Report available online from NEC ResearchIndex.

 

Copyright © 2001-2006 softSurfer.  All rights reserved.
Email comments and suggestions to
feedback@softsurfer.com