RSGISLib Clumps Classification Utilities

The steps for undertaking a classification using clumps are:

  • Image segmentation to generate clumps

  • Populate attributes to clumps

  • Generate training and populate to clumps

  • Train the classifier

  • Apply the classifier

  • Collapse to generate a classification.

If you have undertaken an image segmentation and want to use those segments for a classification using RSGISLib then you need to use the image clumps representation. This is described in the paper below:

Clewley, D., Bunting, P., Shepherd, J., Gillingham, S., Flood, N., Dymond, J., Lucas, R., Armston, J., Moghaddam, M. (2014). A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 6(7), 6111-6135. https://dx.doi.org/10.3390/rs6076111

Commonly, the Shepherd et al. (2019) segmentation is used, via the following function:

from rsgislib.segmentation import segutils

input_img = "S2_UVD_27sept_27700_sub.kea"
clumps_img = "s2_uvd_27sept_clumps.kea"
tmp_path = "./tmp"
segutils.runShepherdSegmentation(input_img, clumps_img, tmpath=tmp_path, numClusters=60, minPxls=100, distThres=100, sampling=100, kmMaxIter=200)

Shepherd, J., Bunting, P., Dymond, J. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing 11(6), 658. https://dx.doi.org/10.3390/rs11060658

To populate the clumps (i.e., segments or objects) with the attribute information used for the classification you need to use the functions within the rsgislib.rastergis module, for example:

import rsgislib.rastergis

# Populate with all statistics (min, max, mean, standard deviation)
bandinfo = []
bandinfo.append(rsgislib.rastergis.BandAttStats(band=1, minField='BlueMin', maxField='BlueMax', meanField='BlueMean', stdDevField='BlueStdev'))
bandinfo.append(rsgislib.rastergis.BandAttStats(band=2, minField='GrnMin', maxField='GrnMax', meanField='GrnMean', stdDevField='GrnStdev'))
bandinfo.append(rsgislib.rastergis.BandAttStats(band=3, minField='RedMin', maxField='RedMax', meanField='RedMean', stdDevField='RedStdev'))
bandinfo.append(rsgislib.rastergis.BandAttStats(band=4, minField='RE1Min', maxField='RE1Max', meanField='RE1Mean', stdDevField='RE1Stdev'))
rsgislib.rastergis.populateRATWithStats(input_img, clumps_img, bandinfo)

# Populate with just mean statistic
bandinfo = []
bandinfo.append(rsgislib.rastergis.BandAttStats(band=1, meanField='BlueMean'))
bandinfo.append(rsgislib.rastergis.BandAttStats(band=2, meanField='GrnMean'))
bandinfo.append(rsgislib.rastergis.BandAttStats(band=3, meanField='RedMean'))
bandinfo.append(rsgislib.rastergis.BandAttStats(band=4, meanField='RE1Mean'))
rsgislib.rastergis.populateRATWithStats(input_img, clumps_img, bandinfo)
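
To make concrete what the populated columns hold, the following is a minimal pure-Python sketch of per-clump statistics for a single band. It is not how RSGISLib computes them; the function clump_band_stats, the flat pixel lists and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def clump_band_stats(clump_ids, band_values):
    """Summarise one band per clump: min, max, mean and standard deviation.

    clump_ids and band_values are flat, equal-length sequences holding, for
    each pixel, its clump ID and its value in the band being summarised.
    """
    if len(clump_ids) != len(band_values):
        raise ValueError("clump_ids and band_values must have the same length")

    # Group the pixel values by the clump they fall in.
    grouped = defaultdict(list)
    for clump_id, value in zip(clump_ids, band_values):
        grouped[clump_id].append(value)

    stats = {}
    for clump_id, values in grouped.items():
        mean = sum(values) / len(values)
        # Population standard deviation; an assumption of this sketch.
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        stats[clump_id] = {
            "min": min(values),
            "max": max(values),
            "mean": mean,
            "stddev": math.sqrt(variance),
        }
    return stats


# Six pixels belonging to two clumps (IDs 1 and 2).
pixel_clumps = [1, 1, 1, 2, 2, 2]
pixel_blue = [10.0, 12.0, 14.0, 30.0, 30.0, 30.0]
blue_stats = clump_band_stats(pixel_clumps, pixel_blue)
print(blue_stats[1]["mean"], blue_stats[2]["stddev"])
```

Each key of the inner dictionary corresponds to one RAT column for that band (for example 'BlueMin', 'BlueMax', 'BlueMean' and 'BlueStdev'), with one value per clump.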

To train the classifier you need to create a column within the clump raster attribute table (RAT) specifying the class of each clump used for training. Training is often provided as vector layers, and a rastergis helper function can be used to populate the clumps from those layers:

import rsgislib.rastergis

classes_dict = dict()
classes_dict['Mangroves'] = [1, 'Mangroves.shp']
classes_dict['Other'] = [2, 'Other.shp']
tmp_path = './tmp'
classes_int_col_in = 'ClassInt'
classes_name_col = 'ClassStr'
rsgislib.rastergis.populateClumpsWithClassTraining(clumps_img, classes_dict, tmp_path, classes_int_col_in, classes_name_col)

Populate RAT Training

rsgislib.classification.classratutils.populate_clumps_with_class_training(clumps_img: str, classes_info: list, tmp_dir: str, classes_int_col: str, classes_name_col: str, rat_band: int = 1)

A function to populate a clumps file with training from a series of vector layers (one per class).

Parameters:
  • clumps_img – input clumps file.

  • classes_info – A list of rsgislib.classification.ClassVecSamplesInfoObj objects. Note, the file_h5 variable is not needed in this function.

  • tmp_dir – File path (which needs to exist) where files can temporarily be written.

  • classes_int_col – Output column name for integer values representing each class.

  • classes_name_col – Output column name for string class names.

  • rat_band – The band within the input image the RAT is associated with.
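
The effect of this step on the RAT can be pictured with a small pure-Python sketch: each clump overlapped by a class's training layer receives that class's integer ID and name. This only illustrates the resulting columns and is not the RSGISLib implementation; the function build_training_columns, the use of 0 and an empty string for untrained clumps, and the rule that a later class overwrites an earlier one are all assumptions of the sketch.

```python
def build_training_columns(n_clumps, class_clumps):
    """Build integer and string class columns for a RAT with n_clumps rows.

    class_clumps maps a class name to (class_id, iterable of clump IDs that
    the class's training vector layer overlaps). Clumps without training keep
    0 and an empty string, a convention assumed by this sketch.
    """
    class_int = [0] * n_clumps
    class_str = [""] * n_clumps
    for class_name, (class_id, clump_ids) in class_clumps.items():
        for clump_id in clump_ids:
            # If two classes claim the same clump, the later one wins here.
            class_int[clump_id] = class_id
            class_str[clump_id] = class_name
    return class_int, class_str


# Hypothetical overlaps: clumps 2 and 5 are mangroves, clump 3 is other.
training = {
    "Mangroves": (1, [2, 5]),
    "Other": (2, [3]),
}
class_int_col, class_str_col = build_training_columns(7, training)
print(class_int_col)
print(class_str_col)
```

The two returned lists play the role of the classes_int_col and classes_name_col columns, with one entry per clump.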

Extract Data for Training

rsgislib.classification.classratutils.extract_rat_col_data(clumps_img: str, cols: list, sel_col: str, sel_col_val: str, out_h5_file: str, datatype: int = None, rat_band: int = 1)

A function which extracts column values to be used as training, testing or validation sets for building a classifier. The data will be saved within an HDF5 file. Note, this function reads each whole column into memory and then subsets it, so very large RATs might cause memory problems.

Parameters:
  • clumps_img – The input clumps file with the associated RAT.

  • cols – A list of the columns to be exported. Note, they will be stored in the HDF5 file in the order specified here, so this order is important and needs to be maintained whenever the data is used later (e.g., when applying a classifier trained using this data).

  • sel_col – The column within the RAT specifying which rows will be exported.

  • sel_col_val – The value in sel_col which indicates which rows are to be exported.

  • out_h5_file – The output HDF5 file the data will be exported to.

  • datatype – The data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default), the output data type will be float32.

  • rat_band – The band within the input image the RAT is associated with.
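
The row selection and the importance of column order can be sketched in pure Python. This is an illustration only, assuming the RAT has already been read into a dictionary of equal-length lists; the function extract_selected_rows is hypothetical, and the sketch writes no HDF5 file.

```python
def extract_selected_rows(rat_columns, cols, sel_col, sel_col_val):
    """Return the rows where rat_columns[sel_col] equals sel_col_val.

    Each returned row lists the values of cols in the order given, so the
    order of cols fixes the order of the features.
    """
    selector = rat_columns[sel_col]
    rows = []
    for i, flag in enumerate(selector):
        if flag == sel_col_val:
            rows.append([rat_columns[c][i] for c in cols])
    return rows


# A four-clump RAT held in memory as whole columns (hypothetical values).
rat = {
    "BlueMean": [0.0, 41.0, 55.0, 38.0],
    "GrnMean": [0.0, 60.0, 72.0, 58.0],
    "ClassStr": ["", "Mangroves", "Other", "Mangroves"],
}
feature_cols = ["BlueMean", "GrnMean"]
mangrove_rows = extract_selected_rows(rat, feature_cols, "ClassStr", "Mangroves")
print(mangrove_rows)
```

Passing the same columns in a different order yields the same rows with their values swapped, which is why the list given as cols must be reused unchanged when the trained classifier is later applied.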