Saturday, April 21, 2012

Classification of LiDAR data

Both the professional software available for classifying LiDAR data and the literature on feature extraction and classification suggest that the process follows a certain hierarchy. Before extracting the features present on the terrain through classification, we should recall that a typical terrain contains ground, buildings, trees, roads and vehicles. Errors in capturing the data often result in outliers being present in the dataset as well.

One typically begins by eliminating outliers. Outliers are points that lie isolated within a sphere of radius $\varepsilon$. To find them, each point in the dataset is taken as the center of a sphere of radius $\varepsilon$, and the number of points within the sphere is counted. If that number is less than $n$, the point is labelled as an outlier. Both $\varepsilon$ and $n$ are thresholds provided by the user of the software or the classification algorithm.
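The counting rule above can be sketched in a few lines of Python. This is only a minimal brute-force illustration, not how any particular package implements it. The function name `find_outliers` is made up for this post, and I am assuming that the point at the center of the sphere is not counted as its own neighbour.

```python
import math

def find_outliers(points, eps, n):
    """Return indices of points with fewer than n other points within distance eps."""
    outliers = []
    for i, p in enumerate(points):
        # Count the other points inside the sphere of radius eps centred on p.
        # Assumption: the centre point itself is not counted.
        neighbours = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps
        )
        if neighbours < n:
            outliers.append(i)
    return outliers

# Four points close together and one far above them.
cloud = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (1.0, 1.0, 0.2),
    (0.5, 0.5, 50.0),
]
print(find_outliers(cloud, eps=2.0, n=2))  # [4]
```

The double loop costs time proportional to the square of the number of points, so for a real point cloud one would use a spatial index (a k-d tree or a grid) to look up neighbours instead.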

The popular software TerraSolid (which runs on the CAD platform MicroStation) also has a concept of finding low-lying points, in addition to the first step of finding outliers. This routine is part of the utility named TerraScan. To decide whether a point is low-lying, it considers a cylindrical neighbourhood of the point, where the axis of the cylinder is the z-direction. If, for a given threshold $h$ that forms the height of the cylinder, the cylinder contains no other point, then the point is labelled as low-lying.
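Here is a rough sketch of one way to read the cylinder test. It is my own interpretation and not TerraScan's documented algorithm. I assume the cylinder rises from the candidate point to a height $h$ above it, that it has a user-supplied horizontal `radius` (a parameter not mentioned above), and that a point only counts as low-lying when it has at least one horizontal neighbour and all of those neighbours lie more than $h$ above it. Without the last condition, the highest point of every neighbourhood would be flagged too.

```python
import math

def find_low_points(points, radius, h):
    """Return indices of points lying more than h below all their horizontal neighbours."""
    low = []
    for i, (x, y, z) in enumerate(points):
        # Heights of the other points inside the vertical cylinder of the given
        # radius around (x, y), ignoring height for the moment.
        neighbour_heights = [
            qz for j, (qx, qy, qz) in enumerate(points)
            if j != i and math.hypot(qx - x, qy - y) <= radius
        ]
        # Low-lying (assumed reading): there are neighbours, and none of them
        # falls at or below z + h, i.e. the cylinder of height h is empty.
        if neighbour_heights and min(neighbour_heights) > z + h:
            low.append(i)
    return low

# Four returns near z = 10 and one spurious return far below them.
cloud = [
    (0.0, 0.0, 10.0),
    (1.0, 0.0, 10.1),
    (0.0, 1.0, 9.9),
    (1.0, 1.0, 10.0),
    (0.5, 0.5, 4.0),
]
print(find_low_points(cloud, radius=2.0, h=3.0))  # [4]
```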

The next step, after the classification of the outliers and low-lying points, is to classify the ground points. Since 1999, several algorithms have been proposed for labelling the ground points. Sithole and Vosselman (2004) have provided an interesting review of the algorithms for ground classification. In their paper, the authors review and test the performance of several algorithms on the ISPRS test dataset. It has been reported later in the literature that these algorithms were not suitable for all terrains, and therefore some interesting additional algorithms have been developed. The issues with ground point classification have been reviewed and addressed in a paper by Meng, Currit and Zhao (2010). The process of classifying the ground points is referred to in the literature as “filtering”. In TerraSolid, in addition to the slope of the ground, information regarding the longest edge of a building is also sought. This information is required in order to avoid classifying a very long building as a ground cluster.
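One plausible way to see why the longest building edge matters is the following sketch. It is an assumption on my part and not a description of TerraSolid's internals. If the area is divided into square cells that are larger than the longest building, no cell can be covered entirely by a roof, so the lowest point in each cell is a reasonable starting guess ("seed") for the ground. The function name `ground_seeds` and the sample coordinates are made up for illustration, and the outliers and low-lying points are assumed to have been removed already.

```python
import math

def ground_seeds(points, cell_size):
    """Return the index of the lowest point in each square grid cell."""
    lowest = {}  # (column, row) of a cell -> index of its lowest point so far
    for i, (x, y, z) in enumerate(points):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in lowest or z < points[lowest[cell]][2]:
            lowest[cell] = i
    return sorted(lowest.values())

# Two ground returns (indices 0 and 3) and two roof returns (indices 1 and 2).
cloud = [
    (1.0, 1.0, 100.0),
    (5.0, 5.0, 108.0),
    (8.0, 3.0, 108.2),
    (12.0, 2.0, 100.5),
]
print(ground_seeds(cloud, cell_size=10.0))  # [0, 3]
print(ground_seeds(cloud, cell_size=4.0))   # [0, 1, 2, 3]
```

With the smaller cell size the roof returns end up alone in their cells and are wrongly picked as seeds, which is the kind of mistake the building-size information is meant to prevent.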

After the ground points have been “filtered out”, there are trees and buildings to be detected. Some of the urban areas do not even contain trees, but some do! Sometimes the trees are so close to the buildings that, if the intensity information is not used, the tree points and the building points appear to be in the same cluster.

There could be multiple strategies for building extraction from the unclassified datasets. The TerraSolid approach is to first classify the low-vegetation and high-vegetation points just by their height from the ground. The remaining points are then classified as buildings using TerraSolid's own set of algorithms. Although this sounds pretty crude, it does help. The buildings can then be reconstructed into a CAD model, as TerraSolid sits on a MicroStation environment. We shall deal with building extraction from LiDAR data in a separate post.
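Classifying by height from the ground can be sketched as below. This is a simplified illustration and not TerraSolid's routine. The ground height under a point is taken from the horizontally nearest ground point, whereas a real tool would interpolate a ground surface. The thresholds `min_height` and `split_height`, the label names and the sample coordinates are all hypothetical.

```python
import math

def height_above_ground(point, ground_points):
    """Height of a point above the horizontally nearest ground point."""
    x, y, z = point
    nearest = min(ground_points, key=lambda g: math.hypot(g[0] - x, g[1] - y))
    return z - nearest[2]

def classify_vegetation(points, ground_points, min_height=0.3, split_height=2.0):
    """Label each point by its height above ground (thresholds are hypothetical)."""
    labels = []
    for p in points:
        h = height_above_ground(p, ground_points)
        if h < min_height:
            labels.append("near-ground")
        elif h < split_height:
            labels.append("low vegetation")
        else:
            labels.append("high vegetation")
    return labels

ground = [(0.0, 0.0, 100.0), (10.0, 0.0, 101.0)]
above = [(1.0, 0.0, 100.1), (1.0, 1.0, 101.0), (9.0, 0.0, 109.0)]
print(classify_vegetation(above, ground))
# ['near-ground', 'low vegetation', 'high vegetation']
```

Height alone cannot tell a roof from a tree canopy, which is why a further step is needed to pick out the buildings.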

Road points could be classified from the ground points themselves. Researchers have reported the use of intensity values from LiDAR data to separate the road points from the other points. However, the problem becomes different when bridges and flyovers have to be detected. Apart from the intensity values, the height values also need to be used.
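The simplest form of intensity-based separation is a range test on the points already labelled as ground, as in the sketch below. The intensity range that corresponds to a road surface depends on the sensor and the surface material, so the bounds and the sample values here are purely hypothetical and would have to be calibrated on real data. The function name `road_candidates` is also made up.

```python
def road_candidates(ground_points, intensity_min, intensity_max):
    """Return indices of ground points whose intensity lies in the given range."""
    return [
        i for i, (x, y, z, intensity) in enumerate(ground_points)
        if intensity_min <= intensity <= intensity_max
    ]

# Ground points as (x, y, z, intensity); the intensity values are invented.
ground = [
    (0.0, 0.0, 100.0, 12.0),
    (1.0, 0.0, 100.0, 14.0),
    (2.0, 0.0, 100.1, 55.0),
    (3.0, 0.0, 100.1, 60.0),
]
print(road_candidates(ground, intensity_min=5.0, intensity_max=20.0))  # [0, 1]
```

A test like this only works on points at ground level. For bridges and flyovers, which are raised above the ground, the height values would have to enter the rule as well.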

Trees could be detected using a template matching procedure. A botanist usually knows the shape of a tree. The fact that LiDAR can capture multiple returns from the different storeys of a tree comes in handy here. Tree templates are available for purchase as RPC (Rich Photorealistic Content) models. However, this database is pretty limited. There is a research opportunity to create these RPC models by scanning different forests. Indian biodiversity is pretty high, and an initiative to create tree models for the different species of trees found in India (at least) would be an excellent direction for research and development.
  1. Sithole and Vosselman (2004), Experimental comparison of filter algorithms for bare-earth extraction from airborne laser scanning point clouds, doi:10.1016/j.isprsjprs.2004.05.004
  2. Meng, Currit and Zhao (2010), Ground Filtering Algorithms for Airborne LiDAR Data: A Review of Critical Issues, doi:10.3390/rs2030833


  1. Another concept which I think is complementary to classification is the segmentation of the LiDAR data.
    Also, the accuracy of most of the classification methods out there is 85%. So none of them has achieved total automation without any human intervention.
    - Arpan

  2. Sure, I will also talk about segmentation in a separate post.

  3. Hi Suddhasheel

    Thanks for the precise and helpful explanation.
    I am a student working on LiDAR data to extract cars from the scene.
    I am able to perform classification using TerraScan and the cars are visible, but I am not able to extract them. I was thinking of generating a convex hull around them.
    Whatever the method, my purpose is to confirm their recognition as cars, not just visually.
    Can you please share your opinion on how I can achieve this? Does any tool exist in TerraScan, MicroStation, ArcGIS etc. for this purpose?