RSGISLib Classification Module

The classification module provides classification functionality within RSGISLib.

The classification module provides functions which allow classifiers to be applied to image data, either on a per-pixel basis or following an image segmentation, in which case the resulting segments/clumps/objects are classified.

The classification functions are available within a number of sub-modules for interfacing with different libraries and methods:

The rsgislib.classification module itself provides functions for handling training data, undertaking accuracy assessments and other useful utilities, as described below.

Training Data

rsgislib.classification.get_class_training_data(imgBandInfo, classVecSampleInfo, tmpdir, sub_sample=None, refImg=None)

A function to extract training data for vector regions for a given input image set.

Parameters
  • imgBandInfo – A list of rsgislib.imageutils.ImageBandInfo objects to define the images and bands of interest.

  • classVecSampleInfo – A list of rsgislib.classification.ClassVecSamplesInfoObj objects to define the training regions.

  • tmpdir – A directory for temporary outputs created during the processing.

  • sub_sample – If not None, an integer giving the number of samples to be randomly selected from those available, which balances the number of samples used for the classification.

  • refImg – A reference image which defines the area of interest, pixel size etc. for the processing. If None then an image will be generated using the input images but the tmpdir needs to be defined.

Returns

dictionary of ClassSimpleInfoObj objects.
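The effect of sub_sample can be pictured as drawing a fixed-size random selection from the samples of each class. The following is a minimal pure-Python sketch of that idea and not RSGISLib's implementation; the helper balance_samples, its rand_seed argument and the sample values are hypothetical and exist only for illustration.

```python
import random


def balance_samples(samples_by_class, sub_sample, rand_seed=42):
    """Randomly keep at most `sub_sample` samples per class.

    Hypothetical helper for illustration; it is not part of RSGISLib.
    """
    rng = random.Random(rand_seed)
    balanced = {}
    for class_name, samples in samples_by_class.items():
        if sub_sample is None or len(samples) <= sub_sample:
            # Nothing to trim: keep every sample for this class.
            balanced[class_name] = list(samples)
        else:
            # Draw a random subset without replacement.
            balanced[class_name] = rng.sample(samples, sub_sample)
    return balanced


# Made-up values standing in for extracted training samples.
samples = {"water": list(range(500)), "forest": list(range(120))}
balanced = balance_samples(samples, sub_sample=100)
print({name: len(values) for name, values in balanced.items()})
```

With sub_sample=100 both classes end up with 100 samples, so a class with many training pixels no longer dominates one with few.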

rsgislib.classification.split_sample_train_valid_test(input_sample_h5_file, train_h5_file, valid_h5_file, test_h5_file, test_sample, valid_sample, train_sample=None, rand_seed=42, datatype=None)

A function to split an HDF5 samples file (from rsgislib.imageutils.extractZoneImageBandValues2HDF) into three files (i.e., training, validation and testing).

Parameters
  • input_sample_h5_file – Input HDF5 file, probably from rsgislib.imageutils.extractZoneImageBandValues2HDF.

  • train_h5_file – Output file with the training data samples. If train_sample=None, this holds the samples remaining after the testing and validation samples have been removed.

  • valid_h5_file – Output file with the validation data samples.

  • test_h5_file – Output file with the testing data samples.

  • test_sample – The size of the testing sample to be taken.

  • valid_sample – The size of the validation sample to be taken.

  • train_sample – The size of the training sample to be taken. If None then the remaining samples are returned.

  • rand_seed – The random seed to be used to randomly select the sub-samples.

  • datatype – The data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
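The relationship between test_sample, valid_sample, train_sample and rand_seed can be sketched on sample indices alone. This is an illustrative pure-Python version under the assumption that samples are shuffled once and then sliced; the helper split_indices is hypothetical and RSGISLib may select the samples differently.

```python
import random


def split_indices(n_samples, test_sample, valid_sample,
                  train_sample=None, rand_seed=42):
    """Split sample indices into (train, valid, test) lists.

    Hypothetical helper for illustration; it is not part of RSGISLib.
    """
    if test_sample + valid_sample > n_samples:
        raise ValueError("test_sample + valid_sample exceeds available samples")

    # Shuffle once with a fixed seed so the split is reproducible.
    indices = list(range(n_samples))
    random.Random(rand_seed).shuffle(indices)

    test_idx = indices[:test_sample]
    valid_idx = indices[test_sample:test_sample + valid_sample]
    remaining = indices[test_sample + valid_sample:]

    if train_sample is None:
        # All samples left over become the training set.
        train_idx = remaining
    else:
        if train_sample > len(remaining):
            raise ValueError("train_sample exceeds the remaining samples")
        train_idx = remaining[:train_sample]
    return train_idx, valid_idx, test_idx


train_idx, valid_idx, test_idx = split_indices(
    1000, test_sample=200, valid_sample=100
)
print(len(train_idx), len(valid_idx), len(test_idx))  # 700 100 200
```

Because the three sets are slices of one shuffled list, no sample can appear in more than one of them.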

rsgislib.classification.get_class_training_chips_data(imgBandInfo, classVecSampleInfo, chip_h_size, tmpdir, refImg=None)

A function to extract training chips (windows/regions) for vector regions for a given input image set.

Parameters
  • imgBandInfo – A list of rsgislib.imageutils.ImageBandInfo objects to define the images and bands of interest.

  • classVecSampleInfo – A list of rsgislib.classification.ClassVecSamplesInfoObj objects to define the training regions.

  • chip_h_size – Half the chip size to be extracted (e.g., a value of 10 produces 21x21 image chips: 10 pixels either side of the pixel of interest).

  • tmpdir – A directory for temporary outputs created during the processing.

  • refImg – A reference image which defines the area of interest, pixel size etc. for the processing. If None then an image will be generated using the input images but the tmpdir needs to be defined.

Returns

dictionary of ClassSimpleInfoObj objects.
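The chip_h_size parameter implies a chip width of 2 * chip_h_size + 1 pixels. The sketch below illustrates that arithmetic on a small 2D grid; the helpers chip_size and extract_chip are hypothetical, and skipping chips that overlap the image edge is an assumption rather than documented RSGISLib behaviour.

```python
def chip_size(chip_h_size):
    """Full chip width/height: the centre pixel plus chip_h_size either side."""
    return 2 * chip_h_size + 1


def extract_chip(image, row, col, chip_h_size):
    """Return the square window centred on (row, col).

    Returns None if the window would extend beyond the image
    (an assumption made for this illustration).
    """
    n_rows, n_cols = len(image), len(image[0])
    if (row - chip_h_size < 0 or col - chip_h_size < 0
            or row + chip_h_size >= n_rows or col + chip_h_size >= n_cols):
        return None
    return [
        image_row[col - chip_h_size:col + chip_h_size + 1]
        for image_row in image[row - chip_h_size:row + chip_h_size + 1]
    ]


# A 10x10 grid whose values encode their own position (row * 10 + col).
image = [[r * 10 + c for c in range(10)] for r in range(10)]
chip = extract_chip(image, row=5, col=5, chip_h_size=2)

print(chip_size(10))            # 21
print(len(chip), len(chip[0]))  # 5 5
```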

rsgislib.classification.split_chip_sample_train_valid_test(input_sample_h5_file, train_h5_file, valid_h5_file, test_h5_file, test_sample, valid_sample, train_sample=None, rand_seed=42, datatype=None)

A function to split a chip HDF5 samples file (from rsgislib.imageutils.extractChipZoneImageBandValues2HDF) into three files (i.e., training, validation and testing).

Parameters
  • input_sample_h5_file – Input HDF5 file, probably from rsgislib.imageutils.extractChipZoneImageBandValues2HDF.

  • train_h5_file – Output file with the training data samples. If train_sample=None, this holds the samples remaining after the testing and validation samples have been removed.

  • valid_h5_file – Output file with the validation data samples.

  • test_h5_file – Output file with the testing data samples.

  • test_sample – The size of the testing sample to be taken.

  • valid_sample – The size of the validation sample to be taken.

  • train_sample – The size of the training sample to be taken. If None then the remaining samples are returned.

  • rand_seed – The random seed to be used to randomly select the sub-samples.

  • datatype – The data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.

rsgislib.classification.get_num_samples(input_sample_h5_file)

A function to return the number of samples within the input HDF5 file.

Parameters

  • input_sample_h5_file – Input HDF5 file, probably from rsgislib.imageutils.extractZoneImageBandValues2HDF.

Returns

The number of samples in the HDF5 file.

Utilities

rsgislib.classification.collapseClasses(inputimage, outputimage, gdalformat, classColumn, classIntCol)

Collapses an attribute table with a large number of classified clumps (segments) to an attribute table with a single row per class (i.e., a classification rather than a segmentation).

Parameters
  • inputImage – is a string containing the name and path of the input file with attribute table.

  • outputImage – is a string containing the name and path of the output file.

  • gdalformat – is a string with the output image format for the GDAL driver.

  • classColumn – is a string with the name of the column with the class names - internally this will be treated as a string column even if a numerical column is specified.

  • classIntCol – is a string specifying the name of a column with the integer class representation. This is an optional parameter but, if specified, the integer representation of the classes will be preserved.
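Conceptually, collapsing goes from one row per clump to one row per class. The sketch below shows that reduction on plain dictionaries; the helper collapse_classes and the column names are hypothetical, and the numbering used when no integer column is given (1-based, in order of first appearance) is an assumption, not documented RSGISLib behaviour.

```python
def collapse_classes(clump_rows, class_column, class_int_col=None):
    """Collapse per-clump rows into a {class name: class ID} mapping.

    Hypothetical helper for illustration; it is not part of RSGISLib.
    """
    collapsed = {}
    for row in clump_rows:
        # Class names are handled as strings, even for numeric columns.
        class_name = str(row[class_column])
        if class_name in collapsed:
            continue  # this class already has its single row
        if class_int_col is not None:
            # Keep the integer ID supplied in the table.
            collapsed[class_name] = row[class_int_col]
        else:
            # Assumption: number classes 1..n in order of first appearance.
            collapsed[class_name] = len(collapsed) + 1
    return collapsed


clumps = [
    {"class": "water", "class_id": 3},
    {"class": "forest", "class_id": 7},
    {"class": "water", "class_id": 3},
    {"class": "urban", "class_id": 1},
]
print(collapse_classes(clumps, "class", "class_id"))
# {'water': 3, 'forest': 7, 'urban': 1}
print(collapse_classes(clumps, "class"))
# {'water': 1, 'forest': 2, 'urban': 3}
```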

rsgislib.classification.colour3bands(inputimage, outputimage, gdalformat)

Generates a 3 band colour image from the colour table in the input file.

Parameters
  • inputImage – is a string containing the name and path of the input file with attribute table.

  • outputImage – is a string containing the name and path of the output file.

  • gdalformat – is a string with the output image format for the GDAL driver.
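The idea of turning a colour table into three bands can be sketched as a per-pixel lookup from class value to (red, green, blue). This is an illustration on nested lists only; the helper colour_table_to_bands and the example colour table are hypothetical and say nothing about how RSGISLib reads or writes the image.

```python
def colour_table_to_bands(class_image, colour_table):
    """Return (red, green, blue) bands for a 2D grid of class pixel values.

    Hypothetical helper for illustration; it is not part of RSGISLib.
    """
    bands = []
    for channel in range(3):
        # Look up one colour channel for every pixel value in the grid.
        bands.append([
            [colour_table[value][channel] for value in row]
            for row in class_image
        ])
    return tuple(bands)


class_image = [[1, 1, 2],
               [2, 3, 3]]
colour_table = {1: (0, 0, 255), 2: (0, 128, 0), 3: (200, 200, 200)}

red, green, blue = colour_table_to_bands(class_image, colour_table)
print(red)   # [[0, 0, 0], [0, 200, 200]]
print(blue)  # [[255, 255, 0], [0, 200, 200]]
```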

Accuracy Assessment

rsgislib.classification.generateRandomAccuracyPts(inputImage, outputShp, classImgCol, classImgVecCol, classRefVecCol, numPts, seed, force)

Generates a set of random points for accuracy assessment.

Parameters
  • inputImage – is a string containing the name and path of the input image with attribute table.

  • outputShp – is a string containing the name and path of the output shapefile.

  • classImgCol – is a string specifying the name of the column in the image file containing the class names.

  • classImgVecCol – is a string specifying the output column in the shapefile for the classified class names.

  • classRefVecCol – is a string specifying an output column in the shapefile which can be used in the accuracy assessment for the reference data.

  • numPts – is an int specifying the total number of points which should be created.

  • seed – is an int specifying the seed for the random number generator. (Optional: Default 10)

  • force – is a bool, specifying whether to force removal of the output vector if it exists. (Optional: Default False)

rsgislib.classification.generateStratifiedRandomAccuracyPts(inputImage, outputShp, classImgCol, classImgVecCol, classRefVecCol, numPts, seed, force, usePxlLst)

Generates a set of stratified random points for accuracy assessment.

Parameters
  • inputImage – is a string containing the name and path of the input image with attribute table.

  • outputShp – is a string containing the name and path of the output shapefile.

  • classImgCol – is a string specifying the name of the column in the image file containing the class names.

  • classImgVecCol – is a string specifying the output column in the shapefile for the classified class names.

  • classRefVecCol – is a string specifying an output column in the shapefile which can be used in the accuracy assessment for the reference data.

  • numPts – is an int specifying the number of points for each class which should be created.

  • seed – is an int specifying the seed for the random number generator. (Optional: Default 10)

  • force – is a bool, specifying whether to force removal of the output vector if it exists. (Optional: Default False)

  • usePxlLst – is a bool, if there are only a small number of pixels then creating a list of all the pixel locations will speed up processing. (Optional: Default False)
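The practical difference between the random and stratified variants is the meaning of numPts: a total for the whole image versus a count per class. The sketch below contrasts the two on a toy list of pixels; both helpers are hypothetical pure-Python illustrations and do not reproduce RSGISLib's sampling code.

```python
import random
from collections import Counter


def random_points(pixels, num_pts, seed=10):
    """Draw `num_pts` pixels in total, ignoring class membership."""
    return random.Random(seed).sample(pixels, num_pts)


def stratified_random_points(pixels, num_pts, seed=10):
    """Draw up to `num_pts` pixels from each class."""
    rng = random.Random(seed)
    by_class = {}
    for pixel in pixels:
        # pixel is (row, col, class_name); group by class name.
        by_class.setdefault(pixel[2], []).append(pixel)
    points = []
    for class_name in sorted(by_class):
        members = by_class[class_name]
        points.extend(rng.sample(members, min(num_pts, len(members))))
    return points


# Toy 20x20 image: a narrow strip of water, the rest forest.
pixels = [(row, col, "water" if col < 2 else "forest")
          for row in range(20) for col in range(20)]

simple = random_points(pixels, num_pts=20)
stratified = stratified_random_points(pixels, num_pts=10)
print(Counter(p[2] for p in simple))
print(Counter(p[2] for p in stratified))
```

In this toy example the stratified draw returns 10 points for each class, whereas the simple random draw tends to favour the much larger forest class, which is why stratification is often preferred when some classes are rare.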

rsgislib.classification.generateTransectAccuracyPts(inputImage, inputLinesShp, outputPtsShp, classImgCol, classImgVecCol, classRefVecCol, lineStep, force=False)

A tool for converting a set of lines into point transects and populating them with the information needed to undertake an accuracy assessment.

Parameters
  • inputImage – is a string specifying the input image file with classification.

  • inputLinesShp – is a string specifying the input lines shapefile path.

  • outputPtsShp – is a string specifying the output points shapefile path.

  • classImgCol – is a string specifying the name of the column in the image file containing the class names.

  • classImgVecCol – is a string specifying the output column in the shapefile for the classified class names.

  • classRefVecCol – is an optional string specifying an output column in the shapefile which can be used in the accuracy assessment for the reference data.

  • lineStep – is a double specifying the step along the lines between the points

  • force – is an optional boolean specifying whether the output shapefile should be deleted if it already exists (if True it will be deleted; default is False).
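The role of lineStep can be illustrated by placing points at a fixed spacing along a polyline. This is a geometric sketch only: the helper points_along_line is hypothetical, and starting the transect at the first vertex is an assumption rather than documented RSGISLib behaviour.

```python
import math


def points_along_line(vertices, line_step):
    """Return points every `line_step` map units along a polyline.

    Hypothetical helper for illustration; it is not part of RSGISLib.
    The first vertex is always included (an assumption).
    """
    if line_step <= 0:
        raise ValueError("line_step must be positive")

    points = [vertices[0]]
    next_dist = line_step  # distance along the line of the next point
    travelled = 0.0        # length of the segments already walked
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        while seg_len > 0 and next_dist <= travelled + seg_len:
            # Fraction of the way along the current segment.
            t = (next_dist - travelled) / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_dist += line_step
        travelled += seg_len
    return points


line = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
pts = points_along_line(line, 25.0)
print(len(pts))  # 7
print(pts[5])    # (100.0, 25.0)
```

The spacing is measured along the line, so it carries on around the corner at (100, 0) rather than restarting on each segment.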

rsgislib.classification.popClassInfoAccuracyPts(inputImage, inputShp, classImgCol, classImgVecCol, classRefVecCol)

Populates an existing set of points with the class information from the input image, for use in an accuracy assessment.

Parameters
  • inputImage – is a string containing the name and path of the input image with attribute table.

  • inputShp – is a string containing the name and path of the input shapefile.

  • classImgCol – is a string specifying the name of the column in the image file containing the class names.

  • classImgVecCol – is a string specifying the output column in the shapefile for the classified class names.

  • classRefVecCol – is an optional string specifying an output column in the shapefile which can be used in the accuracy assessment for the reference data.
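Once each point carries both a classified class name and a reference class name, an accuracy assessment reduces to comparing the two columns. The sketch below computes confusion counts and overall accuracy from plain dictionaries; the helpers and the column names "ClassName" and "RefClass" are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confusion_counts(points, class_col, ref_col):
    """Count points for each (reference class, classified class) pair."""
    return Counter((pt[ref_col], pt[class_col]) for pt in points)


def overall_accuracy(points, class_col, ref_col):
    """Fraction of points whose classified class matches the reference."""
    if not points:
        raise ValueError("no points to assess")
    correct = sum(1 for pt in points if pt[class_col] == pt[ref_col])
    return correct / len(points)


# Made-up attribute rows standing in for the points shapefile.
points = [
    {"ClassName": "water", "RefClass": "water"},
    {"ClassName": "forest", "RefClass": "forest"},
    {"ClassName": "forest", "RefClass": "water"},
    {"ClassName": "urban", "RefClass": "urban"},
]

print(overall_accuracy(points, "ClassName", "RefClass"))  # 0.75
print(confusion_counts(points, "ClassName", "RefClass")[("water", "forest")])  # 1
```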

Classification Utility Classes

class rsgislib.classification.ClassSimpleInfoObj(id=None, fileH5=None, red=None, green=None, blue=None)

This is a class to store the information associated with the classification.

Parameters
  • id – Output pixel value for this class

  • fileH5 – hdf5 file (from rsgislib.imageutils.extractZoneImageBandValues2HDF) with the training data for the class

  • red – Red colour for visualisation (0-255)

  • green – Green colour for visualisation (0-255)

  • blue – Blue colour for visualisation (0-255)

class rsgislib.classification.ClassInfoObj(id=None, out_id=None, trainfileH5=None, testfileH5=None, validfileH5=None, red=None, green=None, blue=None)

This is a class to store the information associated with the classification.

Parameters
  • id – Internal unique ID value for this class (must start 0 and be consecutive between the classes)

  • out_id – External unique ID for the class, which will be used as the output image pixel value; can be any integer.

  • trainfileH5 – hdf5 file (from rsgislib.imageutils.extractZoneImageBandValues2HDF) with the training data for the class

  • testfileH5 – hdf5 file (from rsgislib.imageutils.extractZoneImageBandValues2HDF) with the testing data for the class

  • validfileH5 – hdf5 file (from rsgislib.imageutils.extractZoneImageBandValues2HDF) with the validation data for the class

  • red – Red colour for visualisation (0-255)

  • green – Green colour for visualisation (0-255)

  • blue – Blue colour for visualisation (0-255)
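The constraint on id (starting at 0 and consecutive) alongside a free-form out_id can be checked before running a classifier. The sketch below uses a small stand-in dataclass with a subset of the fields listed above rather than the real ClassInfoObj; the stand-in, the helper check_class_ids and the dictionary keyed by class name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClassInfo:
    """Stand-in with a subset of ClassInfoObj's fields (not the RSGISLib class)."""
    id: int
    out_id: int
    red: int = 0
    green: int = 0
    blue: int = 0


def check_class_ids(class_info):
    """Return True if ids are 0..n-1 and out_ids are unique."""
    ids = sorted(info.id for info in class_info.values())
    out_ids = [info.out_id for info in class_info.values()]
    ids_ok = ids == list(range(len(ids)))
    out_ids_ok = len(set(out_ids)) == len(out_ids)
    return ids_ok and out_ids_ok


# Hypothetical classes: internal ids are 0, 1, 2; output values are arbitrary.
classes = {
    "water": ClassInfo(id=0, out_id=10, blue=255),
    "forest": ClassInfo(id=1, out_id=20, green=128),
    "urban": ClassInfo(id=2, out_id=35, red=200, green=200, blue=200),
}
print(check_class_ids(classes))  # True

classes_with_gap = dict(classes, urban=ClassInfo(id=3, out_id=35))
print(check_class_ids(classes_with_gap))  # False
```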

class rsgislib.classification.ClassVecSamplesInfoObj(id=None, classname=None, vecfile=None, veclyr=None, fileH5=None)

This is a class to store the information associated with the classification vector training regions.

Parameters
  • id – Unique ID for the class (will probably be the pixel value for this class)

  • classname – Unique name for the class.

  • vecfile – A vector file path with the training samples

  • veclyr – The vector layer name within the vecfile for the training samples.

  • fileH5 – A file path for a HDF5 file where the pixel values for these samples will be stored.

class rsgislib.classification.SamplesInfoObj(className=None, classID=None, maskImg=None, maskPxlVal=None, outSampImgFile=None, numSamps=None, samplesH5File=None, red=None, green=None, blue=None)

This is a class to store the information associated with the classification.

Parameters
  • className – The name of the class

  • classID – Is the classification numeric ID (i.e., output pixel value)

  • maskImg – The input image mask from which samples are taken

  • maskPxlVal – The pixel value within the mask for the class

  • outSampImgFile – Temporary file which will store the sampled pixels.

  • numSamps – The number of samples required.

  • samplesH5File – File location for the HDF5 file with the input image values for training.

  • red – for visualisation red value.

  • green – for visualisation green value.

  • blue – for visualisation blue value.
