RSGISLib Image Segmentation Module

The segmentation module contains the segmentation functionality for RSGISLib.

Segmentation requires a number of steps. Most users should use the runShepherdSegmentation helper function, which runs all the steps required to generate a segmentation:

Example:

from rsgislib.segmentation import segutils

segutils.runShepherdSegmentation(inImage,
                                 outputClumps,
                                 tmpath='./',
                                 numClusters=60,
                                 minPxls=100,
                                 distThres=100,
                                 sampling=100, kmMaxIter=200)

Where ‘inImage’ is the input image (optionally masked and stretched) and ‘outputClumps’ is the output clumps file.

More information about the segmentation method is available in the following paper:

Daniel Clewley, Peter Bunting, James Shepherd, Sam Gillingham, Neil Flood, John Dymond, Richard Lucas, John Armston and Mahta Moghaddam. 2014. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing. Volume 6, Pages 6111-6135. http://www.mdpi.com/2072-4292/6/7/6111

rsgislib.segmentation.RMSmallClumpsStepwise(inputimage, clumpsimage, outputimage, gdalformat, stretchstatsavail, stretchstatsfile, storemean, processinmemory, minclumpsize, specThreshold)

Deprecated: is now ‘rmSmallClumpsStepwise’ (note starts with lower case ‘rm’)

rsgislib.segmentation.UnionOfClumps(outputimage, gdalformat, inputimagepaths, nodata)

Deprecated: is now ‘unionOfClumps’ (note starts with lower case ‘u’)

Utilities

rsgislib.segmentation.segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200, processInMem=False, saveProcessStats=False, imgStretchStats='', kMeansCentres='', imgStatsJSONFile='')

Utility function to call the segmentation algorithm of Shepherd et al. (2014).

Where:

  • inputImg is a string containing the name of the input file.
  • outputClumps is a string containing the name of the output clump file.
  • outputMeanImg is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
  • tmpath is a file path for intermediate files (default is current directory).
  • gdalformat is a string containing the GDAL format for the output file (default = KEA).
  • noStats is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
  • noStretch is a bool which specifies that the input image bands should not be stretched (default = False).
  • noDelete is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
  • numClusters is an int which specifies the number of clusters within the KMeans clustering (default = 60).
  • minPxls is an int which specifies the minimum number of pixels within a segment (default = 100).
  • distThres specifies the distance threshold for joining the segments (default = 100, set to a large number to turn off this option).
  • bands is an array providing a subset of image bands to use (default is None to use all bands).
  • sampling is an int which specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
  • kmMaxIter is an int which specifies the maximum number of iterations for KMeans (default = 200).
  • processInMem is a bool which specifies that, where functions allow it, processing is performed in memory rather than on disk (default = False).
  • saveProcessStats is a bool which specifies that the image stretch stats and the kMeans centre stats should be saved along with a header.
  • imgStretchStats is a string providing the file name and path for the image stretch stats (Output).
  • kMeansCentres is a string providing the file name and path for the KMeans clusters centres (don’t include file extension; .gmtxt will be added to the end) (Output).
  • imgStatsJSONFile is a string providing the name and path of a JSON file storing the image spatial extent and imgStretchStats and kMeansCentres file paths for use by other commands (Output).

Example:

from rsgislib.segmentation import segutils

inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'

segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg, minPxls=100)

rsgislib.segmentation.tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='segtmp', tileWidth=2000, tileHeight=2000, validDataThreshold=0.3, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200)

Utility function to call the segmentation algorithm of Shepherd et al. (2014) using the tiled process outlined in Clewley et al (2015).

  • inputImage is a string containing the name of the input file.
  • clumpsImage is a string containing the name of the output clump file.
  • tmpDIR is a file path for intermediate files (default is to create a directory ‘segtmp’). If the path does not currently exist then it will be created and deleted afterwards.
  • tileWidth is an int specifying the width of the tiles used for processing (Default 2000).
  • tileHeight is an int specifying the height of the tiles used for processing (Default 2000).
  • validDataThreshold is a float (value between 0 and 1) specifying the proportion of valid image pixels (i.e., not the no data value of zero) required within a tile. Tiles failing to meet this threshold are merged with ones which do (Default 0.3).
  • numClusters is an int which specifies the number of clusters within the KMeans clustering (default = 60).
  • minPxls is an int which specifies the minimum number of pixels within a segment (default = 100).
  • distThres specifies the distance threshold for joining the segments (default = 100, set to a large number to turn off this option).
  • bands is an array providing a subset of image bands to use (default is None to use all bands).
  • sampling is an int which specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
  • kmMaxIter is an int which specifies the maximum number of iterations for KMeans (Default 200).
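
The validDataThreshold test can be pictured with a small sketch. This is an illustration only and not RSGISLib's implementation: the helper name valid_fraction is hypothetical, and it assumes a single-band tile held as a list of rows with 0 as the no data value.

```python
def valid_fraction(tile, nodata=0):
    """Return the fraction of pixels in `tile` that are not `nodata`."""
    n_total = sum(len(row) for row in tile)
    if n_total == 0:
        return 0.0
    n_valid = sum(1 for row in tile for px in row if px != nodata)
    return n_valid / n_total


# A mostly empty tile: only 2 of its 12 pixels hold data.
tile = [
    [0, 0, 0, 0],
    [0, 0, 7, 9],
    [0, 0, 0, 0],
]
frac = valid_fraction(tile)
# Under the default threshold of 0.3 this tile would not be processed alone.
needs_merging = frac < 0.3
```

Here the tile has about 17% valid pixels, so under this reading it would be merged with a neighbouring tile rather than segmented on its own.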

Example:

from rsgislib.segmentation import tiledsegsingle

inputImage = 'LS5TM_20110428_sref_submask_osgb.kea'
clumpsImage = 'LS5TM_20110428_sref_submask_osgb_clumps.kea'

tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='./rsgislibsegtmp', tileWidth=2000, tileHeight=2000, validDataThreshold=0.3, numClusters=60, minPxls=100, distThres=100, bands=[4,5,3], sampling=100, kmMaxIter=200)

rsgislib.segmentation.segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, minPxls=100, distThres=100, bands=None, processInMem=False)

Utility function to call the segmentation algorithm of Shepherd et al. (2014) using pre-calculated stretch stats and KMeans cluster centres.

Where:

  • inputImg is a string containing the name of the input file.
  • outputClumps is a string containing the name of the output clump file.
  • kMeansCentres is a string providing the file name and path for the KMeans clusters centres (Input)
  • imgStretchStats is a string providing the file name and path for the image stretch stats (Input - not required if noStretch=True)
  • outputMeanImg is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
  • tmpath is a file path for intermediate files (default is current directory).
  • gdalformat is a string containing the GDAL format for the output file (default = KEA).
  • noStats is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
  • noStretch is a bool which specifies that the input image bands should not be stretched (default = False).
  • noDelete is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
  • minPxls is an int which specifies the minimum number of pixels within a segment (default = 100).
  • distThres specifies the distance threshold for joining the segments (default = 100, set to a large number to turn off this option).
  • bands is an array providing a subset of image bands to use (default is None to use all bands).
  • processInMem is a bool which specifies that, where functions allow it, processing is performed in memory rather than on disk (default = False).

Example:

from rsgislib.segmentation import segutils

inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'
kMeansCentres = 'jers1palsar_stack_kcentres.gmtxt'
imgStretchStats = 'jers1palsar_stack_stchstats.txt'

segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg, minPxls=100)

rsgislib.segmentation.segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=100, minPxlsStart=10, minPxlsStep=5, numOfMinPxlsSteps=20, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Utility function to call the segmentation algorithm of Shepherd et al. (2014) and to test a range of minimum object sizes (minPxls).

Where:

  • inputImg is a string containing the name of the input file.
  • outputClumpsBase is a string containing the base name of the output clump files.
  • outStatsFile is a string containing the name of the output CSV file with the image segmentation stats.
  • outputMeanImgBase is the base name of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
  • tmpath is a file path for intermediate files (default is current directory).
  • gdalformat is a string containing the GDAL format for the output file (default is KEA).
  • noStats is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
  • noStretch is a bool which specifies that the input image bands should not be stretched (default = False).
  • noDelete is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
  • numClusters is an int which specifies the number of clusters within the KMeans clustering process (default = 100).
  • minPxlsStart is an int which specifies the minimum number of pixels within a segment at the start of processing (default = 10).
  • minPxlsStep is an int which specifies the increment applied to the minimum number of pixels within a segment at each step (default = 5).
  • numOfMinPxlsSteps is an int which specifies the number of steps (i.e., tests) which are performed (default = 20).
  • distThres specifies the distance threshold for joining the segments (default is a very large value, which turns off this option).
  • bands is an array providing a subset of image bands to use (default is None to use all bands).
  • sampling is an int which specifies the subsampling of the image for the data used within the KMeans (1 == no subsampling; default is 100).
  • kmMaxIter is an int which specifies the maximum number of iterations for KMeans (default = 200).
  • minNormV is a float (default = None).
  • maxNormV (default = None).
  • minNormMI (default = None).
  • maxNormMI (default = None).
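
As a rough guide to which sizes a given combination of minPxlsStart, minPxlsStep and numOfMinPxlsSteps would test, the arithmetic can be sketched as below. The rule start + i * step is an assumption about how the steps are applied, not something confirmed by the RSGISLib source, and sweep_values is a hypothetical helper name.

```python
def sweep_values(start, step, n_steps):
    """Values tested under the ASSUMED rule: start + i * step, i = 0..n_steps-1."""
    return [start + i * step for i in range(n_steps)]


# minPxlsStart=5, minPxlsStep=5, numOfMinPxlsSteps=20
min_pxls_tested = sweep_values(5, 5, 20)
first, last = min_pxls_tested[0], min_pxls_tested[-1]
```

Under this assumption the call above would cover minimum object sizes from 5 to 100 pixels.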

Example:

from rsgislib.segmentation import segutils


inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_MinPxl'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_MinPxlMean'
tmpath='./OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsMinPxl.csv'

# Will test minimum number of pixels within an object from 5 to 100 with intervals of 5.
segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=outputMeanImgBase, tmpath=tmpath, noStretch=True, numClusters=100, minPxlsStart=5, minPxlsStep=5, numOfMinPxlsSteps=20, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

rsgislib.segmentation.segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClustersStart=10, numClustersStep=10, numOfClustersSteps=10, minPxls=10, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, processInMem=False, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Utility function to call the segmentation algorithm of Shepherd et al. (2014) and to test a range of ‘k’ (number of clusters) within the KMeans.

Where:

  • inputImg is a string containing the name of the input file.
  • outputClumpsBase is a string containing the base name of the output clump files.
  • outStatsFile is a string containing the name of the output CSV file with the image segmentation stats.
  • outputMeanImgBase is the base name of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
  • tmpath is a file path for intermediate files (default is current directory).
  • gdalformat is a string containing the GDAL format for the output file (default is KEA).
  • noStats is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
  • noStretch is a bool which specifies that the input image bands should not be stretched (default = False).
  • noDelete is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
  • numClustersStart is an int which specifies the number of clusters within the KMeans clustering at the start of the process (default = 10).
  • numClustersStep is an int which specifies the number of clusters within the KMeans clustering added with each step (default = 10).
  • numOfClustersSteps is an int which specifies the number of steps (i.e., tests) which are performed (default = 10).
  • minPxls is an int which specifies the minimum number of pixels within a segment (default = 10).
  • distThres specifies the distance threshold for joining the segments (default is a very large value, which turns off this option).
  • bands is an array providing a subset of image bands to use (default is None to use all bands).
  • sampling is an int which specifies the subsampling of the image for the data used within the KMeans (1 == no subsampling; default is 100).
  • kmMaxIter is an int which specifies the maximum number of iterations for KMeans (default = 200).
  • processInMem is a bool which specifies that, where functions allow it, processing is performed in memory rather than on disk (default = False).
  • minNormV is a float (default = None).
  • maxNormV (default = None).
  • minNormMI (default = None).
  • maxNormMI (default = None).

Example:

from rsgislib.segmentation import segutils


inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_Clumps'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_ClumpsMean'
tmpath='./OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsClumps.csv'

# Will test numbers of clusters from 10 to 200 with intervals of 10.
segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=outputMeanImgBase, tmpath=tmpath, noStretch=True, numClustersStart=10, numClustersStep=10, numOfClustersSteps=20, minPxls=50, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Clump

rsgislib.segmentation.clump(inputimage, outputimage, gdalformat, processinmemory, nodata, addPxlVal2Rat)

A function which clumps an input image (of int pixel data type) to identify connected independent sets of pixels.

Where:

  • inputimage is a string containing the name of the input file
  • outputimage is a string containing the name of the output file
  • gdalformat is a string containing the GDAL format for the output file - eg ‘KEA’
  • processinmemory is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
  • nodata is None or float
  • addPxlVal2Rat is a boolean specifying whether the pixel value (from inputimage) should be added as a RAT.
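
To illustrate what clumping produces, the sketch below labels connected regions of equal value in a small integer grid. It is a pure-Python illustration, not the RSGISLib implementation: the function name clump_grid is hypothetical, and the use of 4-connectivity and of 0 as the no data value are assumptions made for the example.

```python
from collections import deque


def clump_grid(grid, nodata=0):
    """Label 4-connected regions of equal value; nodata pixels stay 0."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 1
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == nodata or labels[r][c] != 0:
                continue
            # Flood fill outwards from this unlabelled pixel.
            value = grid[r][c]
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == 0
                            and grid[ny][nx] == value):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return labels


classified = [
    [1, 1, 2],
    [0, 1, 2],
    [2, 0, 2],
]
clumps = clump_grid(classified)
```

The two separate regions of class 2 receive different clump IDs (2 and 3), which is the point of clumping: each connected set of pixels becomes its own object.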

rsgislib.segmentation.tiledclump.performClumpingSingleThread(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, imgFormat='KEA')

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

  • inputImage - the input image to be clumped.
  • clumpsImage - the output clumped image.
  • tmpDIR - the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist it will be created and then deleted afterwards.
  • width - int for width of the image tiles used for processing (Default = 2000).
  • height - int for height of the image tiles used for processing (Default = 2000).
  • imgFormat - string with the GDAL image format for the output image (Default = KEA). NOTE: KEA is used as the intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.performClumpingMultiProcess(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, imgFormat='KEA', nCores=-1)

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

  • inputImage - the input image to be clumped.
  • clumpsImage - the output clumped image.
  • tmpDIR - the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist it will be created and then deleted afterwards.
  • width - int for width of the image tiles used for processing (Default = 2000).
  • height - int for height of the image tiles used for processing (Default = 2000).
  • imgFormat - string with the GDAL image format for the output image (Default = KEA). NOTE: KEA is used as the intermediate format internally and therefore needs to be available.
  • nCores - int specifying the number of cores to be used for clumping (Default = -1).

rsgislib.segmentation.tiledclump.performUnionClumpingSingleThread(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, imgFormat='KEA')

Clump the input image and union it with the reference image, using a tiled processing chain which allows large images to be clumped more quickly.

  • inputImage - the input image to be clumped.
  • refImg - the reference image which the union is undertaken with (typically an existing classification)
  • clumpsImage - the output clumped image.
  • tmpDIR - the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist it will be created and then deleted afterwards.
  • width - int for width of the image tiles used for processing (Default = 2000).
  • height - int for height of the image tiles used for processing (Default = 2000).
  • imgFormat - string with the GDAL image format for the output image (Default = KEA). NOTE: KEA is used as the intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.performUnionClumpingMultiProcess(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, imgFormat='KEA', nCores=-1)

Clump the input image and union it with the reference image, using a tiled processing chain which allows large images to be clumped more quickly.

  • inputImage - the input image to be clumped.
  • refImg - the reference image which the union is undertaken with (typically an existing classification)
  • clumpsImage - the output clumped image.
  • tmpDIR - the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist it will be created and then deleted afterwards.
  • width - int for width of the image tiles used for processing (Default = 2000).
  • height - int for height of the image tiles used for processing (Default = 2000).
  • imgFormat - string with the GDAL image format for the output image (Default = KEA). NOTE: KEA is used as the intermediate format internally and therefore needs to be available.
  • nCores - int specifying the number of cores to be used for clumping (Default = -1).
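
The width and height parameters control how the image is cut into tiles. A minimal sketch of that partitioning is shown below. It assumes simple non-overlapping tiles and does not show how clumps crossing tile boundaries are joined; tile_extents is a hypothetical helper and RSGISLib's own tiling may differ.

```python
def tile_extents(img_width, img_height, tile_width=2000, tile_height=2000):
    """Yield (x_off, y_off, x_size, y_size) windows covering the image."""
    for y_off in range(0, img_height, tile_height):
        for x_off in range(0, img_width, tile_width):
            # Edge tiles are clipped to the image extent.
            x_size = min(tile_width, img_width - x_off)
            y_size = min(tile_height, img_height - y_off)
            yield x_off, y_off, x_size, y_size


# A 4500 x 3000 pixel image with the default 2000 x 2000 tiles.
tiles = list(tile_extents(4500, 3000))
```

This gives a 3 x 2 arrangement of tiles, with the right-hand column 500 pixels wide and the bottom row 1000 pixels high.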

Label

rsgislib.segmentation.labelPixelsFromClusterCentres(inputimage, outputimage, clustercenters, ignorezeros, gdalformat)

Labels image pixels with the ID of the nearest cluster centre.

Where:

  • inputimage is a string containing the name of the input file
  • outputimage is a string containing the name of the output file
  • clustercenters is a string containing the name of the cluster centre file
  • ignorezeros is a bool
  • gdalformat is a string containing the GDAL format for the output file - eg ‘KEA’
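
Labelling by nearest cluster centre can be sketched as follows. This is an illustration of the idea only: it assumes Euclidean distance in band space and IDs numbered from 1, neither of which is stated above, and label_pixels is a hypothetical helper rather than part of RSGISLib.

```python
def label_pixels(pixels, centres):
    """Return, for each pixel, the 1-based index of its nearest centre."""
    labels = []
    for px in pixels:
        # Squared Euclidean distance from this pixel to every centre.
        dists = [
            sum((p - c) ** 2 for p, c in zip(px, centre))
            for centre in centres
        ]
        labels.append(dists.index(min(dists)) + 1)
    return labels


# Three cluster centres and three pixels, each with two bands.
centres = [(10.0, 10.0), (50.0, 60.0), (200.0, 180.0)]
pixels = [(12.0, 9.0), (190.0, 170.0), (45.0, 70.0)]
labels = label_pixels(pixels, centres)
```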

rsgislib.segmentation.relabelClumps(inputimage, outputimage, gdalformat, processinmemory)

Relabel clumps

Where:

  • inputimage is a string containing the name of the input file
  • outputimage is a string containing the name of the output file
  • gdalformat is a string containing the GDAL format for the output file - eg ‘KEA’
  • processinmemory is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
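
A minimal sketch of relabelling is given below, on the assumption that relabelling means renumbering clumps so that their IDs are consecutive from 1 with 0 kept as no data. That reading is not spelled out above, and relabel_consecutive is a hypothetical helper.

```python
def relabel_consecutive(clumps, nodata=0):
    """Renumber clump IDs to 1..N in order of first appearance."""
    mapping = {}
    out = []
    for row in clumps:
        new_row = []
        for cid in row:
            if cid == nodata:
                new_row.append(nodata)
                continue
            if cid not in mapping:
                mapping[cid] = len(mapping) + 1
            new_row.append(mapping[cid])
        out.append(new_row)
    return out


# IDs 4, 9 and 17 have gaps, e.g. after other clumps were removed.
relabelled = relabel_consecutive([[4, 4, 9], [0, 9, 17]])
```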

Elimination

rsgislib.segmentation.eliminateSinglePixels(inputimage, clumpsimage, outputimage, tempfile, gdalformat, processinmemory, ignorezeros)

Eliminates single pixels

Where:

  • inputimage is a string containing the name of the input file
  • clumpsimage is a string containing the name of the clump file
  • outputimage is a string containing the name of the output file
  • tempfile is a string containing the name of the temporary file to use
  • gdalformat is a string containing the GDAL format for the output file - eg ‘KEA’
  • processinmemory is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
  • ignorezeros is a bool

rsgislib.segmentation.rmSmallClumps(clumpsImage, outputImage, threshold, gdalformat)

A function to remove small clumps and set them with a value of 0 (i.e., no data)

Where:

  • clumpsImage is a string containing the name of the input clumps file - note that a column called ‘Histogram’ is required.
  • outputImage is a string containing the name of the output clumps file
  • threshold is a float containing the area threshold (in pixels)
  • gdalformat is a string defining the format of the output image.
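
The effect of the threshold can be sketched on a small grid of clump IDs. The sketch counts pixels per clump directly rather than reading a ‘Histogram’ column, and it assumes clumps with fewer pixels than the threshold are the ones removed; whether the comparison is strict is an assumption. remove_small_clumps is a hypothetical helper.

```python
from collections import Counter


def remove_small_clumps(clumps, threshold):
    """Set clumps with fewer than `threshold` pixels to 0 (no data)."""
    sizes = Counter(cid for row in clumps for cid in row if cid != 0)
    return [
        [cid if cid != 0 and sizes[cid] >= threshold else 0 for cid in row]
        for row in clumps
    ]


clumps_in = [
    [1, 1, 2],
    [1, 3, 3],
    [1, 3, 3],
]
# Clump 2 covers a single pixel, so it is dropped with a threshold of 2.
clumps_out = remove_small_clumps(clumps_in, threshold=2)
```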

rsgislib.segmentation.rmSmallClumpsStepwise(inputimage, clumpsimage, outputimage, gdalformat, stretchstatsavail, stretchstatsfile, storemean, processinmemory, minclumpsize, specThreshold)

Eliminates clumps smaller than a given size from the scene. Small clumps are combined with their spectrally closest neighbouring clump in a stepwise fashion, unless the spectral distance exceeds the threshold.

Where:

  • inputimage is a string containing the name of the input file
  • clumpsimage is a string containing the name of the clump file
  • outputimage is a string containing the name of the output file
  • gdalformat is a string containing the GDAL format for the output file - eg ‘KEA’
  • stretchstatsavail is a bool
  • stretchstatsfile is a string containing the name of the stretch stats file
  • storemean is a bool
  • processinmemory is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
  • minclumpsize is an unsigned integer providing the minimum size for clumps.
  • specThreshold is a float providing the maximum (Euclidean distance) spectral separation for which to merge clumps. Set to a large value to ignore spectral separation and always merge.
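
The stepwise merging idea can be sketched on a simplified region adjacency structure. This is a loose illustration under stated assumptions, not the RSGISLib algorithm: clumps are described by dictionaries of sizes, band means and neighbours (all hypothetical inputs), sizes are processed from 1 up to minclumpsize - 1, and merged means are size-weighted.

```python
import math


def merge_small_clumps(sizes, means, neighbours, min_size, spec_threshold):
    """Return {original clump id: surviving clump id} after stepwise merging."""
    sizes = dict(sizes)
    means = {k: tuple(v) for k, v in means.items()}
    neighbours = {k: set(v) for k, v in neighbours.items()}
    merged_into = {k: k for k in sizes}

    for step_size in range(1, min_size):
        for cid in sorted(sizes):
            if cid not in sizes or sizes[cid] > step_size or not neighbours[cid]:
                continue
            # Spectrally closest neighbour (Euclidean distance between means).
            target = min(neighbours[cid],
                         key=lambda n: math.dist(means[cid], means[n]))
            if math.dist(means[cid], means[target]) >= spec_threshold:
                continue  # too different: leave the small clump in place
            # Merge cid into target: size-weighted mean, combined adjacency.
            total = sizes[cid] + sizes[target]
            means[target] = tuple(
                (a * sizes[cid] + b * sizes[target]) / total
                for a, b in zip(means[cid], means[target])
            )
            sizes[target] = total
            for n in neighbours[cid] - {target}:
                neighbours[n].discard(cid)
                neighbours[n].add(target)
                neighbours[target].add(n)
            neighbours[target].discard(cid)
            for k, v in merged_into.items():
                if v == cid:
                    merged_into[k] = target
            del sizes[cid], means[cid], neighbours[cid]
    return merged_into


result = merge_small_clumps(
    sizes={1: 500, 2: 3, 3: 400, 4: 2},
    means={1: (10.0,), 2: (12.0,), 3: (90.0,), 4: (250.0,)},
    neighbours={1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}},
    min_size=5,
    spec_threshold=50.0,
)
```

In this example clump 2 is merged into clump 1, its closest neighbour, while clump 4 is left alone because its only neighbour is further away than the spectral threshold.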

Join / Union

rsgislib.segmentation.unionOfClumps(outputimage, gdalformat, inputimagepaths, nodata, addPxlVals2Rat)

The function takes the union of clumps images - combining them so all lines from all clumps are preserved in the new outputted clumps image.

Where:

  • outputimage is a string containing the name of the output file
  • gdalformat is a string containing the GDAL format for the output file - eg ‘KEA’
  • inputimagepaths is a list of input image paths
  • nodata is None or float
  • addPxlVals2Rat is a boolean specifying whether the pixel values (from inputimagepaths) should be added as a RAT; column names have prefix ‘ClumpVal_’ with index starting at 1 for each variable.
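
The union can be pictured as giving every distinct combination of input clump IDs its own output ID. The sketch below works on small in-memory grids and ignores spatial connectivity. Treating a pixel as no data only when it is no data in every input is an assumption, and union_grids is a hypothetical helper.

```python
def union_grids(*clump_images, nodata=0):
    """Assign a new ID to each distinct combination of input clump IDs."""
    rows, cols = len(clump_images[0]), len(clump_images[0][0])
    combos = {}
    out = [[nodata] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            key = tuple(img[r][c] for img in clump_images)
            if all(v == nodata for v in key):
                continue
            if key not in combos:
                combos[key] = len(combos) + 1
            out[r][c] = combos[key]
    return out


# One image split left/right, the other split top/bottom.
a = [[1, 1, 2, 2],
     [1, 1, 2, 2]]
b = [[1, 1, 1, 1],
     [2, 2, 2, 2]]
union = union_grids(a, b)
```

The result has four clumps, so the boundaries from both inputs are preserved.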

Visualisation

rsgislib.segmentation.meanImage(inputImage, inputClumps, outputImage, gdalformat, datatype)

A function to generate an image in which each clump is given its mean value. Primarily for visualisation and evaluating segmentation.

Where:

  • inputImage is a string containing the name of the input image file from which the mean is taken.
  • inputClumps is a string containing the name of the input clumps file
  • outputImage is a string containing the name of the output image.
  • gdalformat is a string defining the format of the output image.
  • datatype is an int containing one of the values from rsgislib.TYPE_*
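
What the mean image contains can be sketched for a single band held in memory. This is an illustration only: it handles one band, does not treat clump 0 specially, and mean_per_clump is a hypothetical helper.

```python
from collections import defaultdict


def mean_per_clump(band, clumps):
    """Replace each pixel with the mean of `band` over its clump."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for band_row, clump_row in zip(band, clumps):
        for value, cid in zip(band_row, clump_row):
            totals[cid] += value
            counts[cid] += 1
    return [
        [totals[cid] / counts[cid] for cid in clump_row]
        for clump_row in clumps
    ]


band = [[10, 20, 50],
        [30, 40, 70]]
clump_ids = [[1, 1, 2],
             [1, 1, 2]]
means = mean_per_clump(band, clump_ids)
```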

Tiles

rsgislib.segmentation.mergeSegmentationTiles(outputimage, bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)

Merge body clumps from tile segmentations into outputfile

Where:

  • outputimage is a string containing the name of the output file
  • bordermaskimage is a string containing the name of the border mask file
  • tileboundary is an unsigned integer containing the tile boundary pixel value
  • tileoverlap is an unsigned integer containing the tile overlap pixel value
  • tilebody is an unsigned integer containing the tile body pixel value
  • colsname is a string containing the name of the object id column
  • inputimagepaths is a list of input image paths

rsgislib.segmentation.tiledclump.clumpImgFunc(imgs)

Clump an image, with values provided as an array, for use within a multiprocessing Pool.

rsgislib.segmentation.tiledclump.unionClumpImgFunc(imgs)

Union clump an image, with values provided as an array, for use within a multiprocessing Pool.

Other

rsgislib.segmentation.generateRegularGrid(inputImage, outputClumps, gdalformat, numXPxls, numYPxls, offset)

A function to generate a clumps image made up of a regular grid, where each grid cell of numXPxls by numYPxls pixels forms a clump.

Where:

  • inputImage is a string containing the name of the input image file specifying the dimensions of the output image.
  • outputClumps is a string containing the name and path of the output clumps image
  • gdalformat is a string defining the format of the output image.
  • numXPxls is the size of the grid cells in the X axis in pixel units.
  • numYPxls is the size of the grid cells in the Y axis in pixel units.
  • offset is a boolean specifying whether the grid should be offset, i.e., whether it starts at the half-way point of numXPxls and numYPxls (Default is false; optional)
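
The layout of such a grid can be sketched in plain Python. This is only an illustration of the idea: the function name regular_grid_ids, the ID numbering (row by row, starting at 1) and the way the offset is applied are assumptions, not the behaviour of generateRegularGrid itself.

```python
def regular_grid_ids(n_rows, n_cols, num_x_pxls, num_y_pxls, offset=False):
    """Return a 2D list of clump IDs laid out as a regular grid.

    Hypothetical helper: the numbering scheme and the offset handling are
    assumptions made for illustration only.
    """
    # With an offset, the grid origin is shifted by half a cell on each axis.
    x_shift = num_x_pxls // 2 if offset else 0
    y_shift = num_y_pxls // 2 if offset else 0
    # Number of grid cells needed to cover one image row (rounded up).
    cells_per_row = (n_cols + x_shift + num_x_pxls - 1) // num_x_pxls
    grid = []
    for row in range(n_rows):
        cell_row = (row + y_shift) // num_y_pxls
        grid.append([
            cell_row * cells_per_row + (col + x_shift) // num_x_pxls + 1
            for col in range(n_cols)
        ])
    return grid


# A 4 x 6 pixel image split into cells of 3 pixels (X) by 2 pixels (Y).
grid = regular_grid_ids(n_rows=4, n_cols=6, num_x_pxls=3, num_y_pxls=2)
offset_grid = regular_grid_ids(4, 6, 3, 2, offset=True)
```

Without the offset the image is covered by four whole cells. With the offset the first cell on each axis is only half size, so more cells are needed to cover the same image.
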
rsgislib.segmentation.dropSelectedClumps(clumpsImage, outputClumps, gdalFormat, selectClumpsCol)

A function to drop the selected clumps from the segmentation.

Where:

  • clumpsImage is a string containing the filepath for the input clumps image.
  • outputClumps is a string containing the name and path of the output clumps image
  • gdalFormat is a string defining the format of the output image.
  • selectClumpsCol is a string defining the binary column for defining the segments to be dropped (1 == selected clumps).
rsgislib.segmentation.findTileBordersMask(bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)

Mask tile borders

Where:

  • bordermaskimage is a string containing the name of the border mask file
  • tileboundary is an unsigned integer containing the tile boundary pixel value
  • tileoverlap is an unsigned integer containing the tile overlap pixel value
  • tilebody is an unsigned integer containing the tile body pixel value
  • colsname is a string containing the name of the object id column
  • inputimagepaths is a list of input clump image paths
rsgislib.segmentation.includeRegionsInClumps(clumpsImage, regionsImage, outputClumps, gdalFormat)

A function to include a set of clumped regions within an existing clumps (i.e., segmentation) image. NOTE. You should run the relabelClumps function on the output of this command before using further.

Where:

  • clumpsImage is a string containing the filepath for the input clumps image.
  • regionsImage is a string containing the filepath for the input regions image.
  • outputClumps is a string containing the name and path of the output clumps image
  • gdalFormat is a string defining the format of the output image.
rsgislib.segmentation.mergeClumpImages(inputimagepaths, outputimage, mergeRATs)

Merge all clumps from the tile segmentations into a single output file.

Where:

  • inputimagepaths is a list of input image paths
  • outputimage is a string containing the name of the output file
  • mergeRATs is a boolean specifying whether the image RATs are to be merged (Default: false; Optional)
rsgislib.segmentation.mergeEquivClumps(clumpsImage, outputClumps, gdalFormat, valClumpsCols)

A function to merge neighbouring clumps which have the same value - for example when merging across tile boundaries.

Where:

  • clumpsImage is a string containing the filepath for the input clumps image.
  • outputClumps is a string containing the name and path of the output clumps image
  • gdalFormat is a string defining the format of the output image.
  • valClumpsCols is a list of strings naming the column(s) whose value(s) are used to define equivalence (typically these might be the original pixel values when clumping through tiling).
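
The idea of merging equivalent neighbours can be sketched with a small union-find over a grid of clump IDs. This is a simplified illustration, not the RSGISLib implementation: the function merge_equivalent_clumps, the use of 4-connectivity, a single value per clump and 0 as background are all assumptions.

```python
def merge_equivalent_clumps(clumps, clump_values):
    """Merge 4-connected neighbouring clumps that share the same value.

    clumps: 2D list of clump IDs (0 is treated as background, an assumption).
    clump_values: dict mapping clump ID to the value used to test equivalence.
    """
    parent = {cid: cid for row in clumps for cid in row if cid != 0}

    def find(cid):
        # Follow parent links to the representative clump, compressing the path.
        while parent[cid] != cid:
            parent[cid] = parent[parent[cid]]
            cid = parent[cid]
        return cid

    n_rows, n_cols = len(clumps), len(clumps[0])
    for r in range(n_rows):
        for c in range(n_cols):
            a = clumps[r][c]
            # Checking the right and lower neighbours covers every adjacent pair.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if a == 0 or rr >= n_rows or cc >= n_cols:
                    continue
                b = clumps[rr][cc]
                if b != 0 and a != b and clump_values[a] == clump_values[b]:
                    ra, rb = find(a), find(b)
                    # Keep the smaller ID as the representative.
                    parent[max(ra, rb)] = min(ra, rb)

    return [[find(cid) if cid != 0 else 0 for cid in row] for row in clumps]


# Two tiles side by side: clumps 1 and 3 meet at the tile boundary and share a value.
clumps = [[1, 1, 3, 3],
          [2, 2, 4, 4]]
values = {1: 5, 2: 7, 3: 5, 4: 9}
merged = merge_equivalent_clumps(clumps, values)
```

In this sketch the merged clumps keep the smallest of the original IDs, so the IDs are no longer consecutive afterwards.
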
rsgislib.segmentation.mergeSegments2Neighbours(clumpsImage, spectralImage, outputClumps, gdalFormat, selectedClumpsCol, noDataClumpsCol)

A function to merge selected clumps with their neighbours based on colour (spectral) distance, where clumps identified as no data are ignored.

Where:

  • clumpsImage is a string containing the filepath for the input clumps image.
  • spectralImage is a string containing the filepath for the input image used to define ‘distance’.
  • outputClumps is a string containing the name and path of the output clumps image
  • gdalFormat is a string defining the format of the output image.
  • selectedClumpsCol is a string defining the binary column for defining the segments to be merged (1 == selected clumps).
  • noDataClumpsCol is a string defining the binary column for defining the segments to be ignored as no data (1 == no-data clumps).
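
The selection of a merge target by spectral distance can be sketched as follows. The helper nearest_neighbour_merges, its inputs (per-clump mean spectra and an adjacency dictionary) and the use of Euclidean distance are assumptions made for illustration; the function above works directly on the clumps image and its attribute table columns.

```python
import math


def nearest_neighbour_merges(mean_spectra, neighbours, selected, no_data):
    """For each selected clump, pick the spectrally closest valid neighbour.

    mean_spectra: dict of clump ID -> tuple of mean band values.
    neighbours:   dict of clump ID -> iterable of adjacent clump IDs.
    selected:     set of clump IDs flagged for merging.
    no_data:      set of clump IDs flagged as no data (never a merge target).
    Returns a dict of selected clump ID -> clump ID to merge into.
    """
    merges = {}
    for cid in sorted(selected):
        best_id, best_dist = None, math.inf
        for nid in neighbours.get(cid, ()):
            if nid in no_data:
                continue  # no-data clumps are ignored
            # Euclidean distance between the mean spectra of the two clumps.
            dist = math.dist(mean_spectra[cid], mean_spectra[nid])
            if dist < best_dist:
                best_id, best_dist = nid, dist
        if best_id is not None:
            merges[cid] = best_id
    return merges


mean_spectra = {1: (10, 10), 2: (12, 11), 3: (50, 60), 4: (10, 11)}
neighbours = {1: [2, 3, 4]}
merges = nearest_neighbour_merges(mean_spectra, neighbours,
                                  selected={1}, no_data={4})
```

Clump 4 is spectrally the closest neighbour of clump 1 in this example, but it is flagged as no data, so clump 2 is chosen instead.
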
rsgislib.segmentation.pxlGrowRegions(clumpsImage, valsImage, outputImage, gdalFormat, muParseCriteria, varNameBandPairs)

A function to grow clumps (regions), pixel by pixel, into neighbouring pixels for which the muparser criteria evaluated on valsImage is true.

Where:

  • clumpsImage is a string containing the filepath for the input clumps image.
  • valsImage is a string containing the file path for the values (criteria) image.
  • outputImage is a string containing the name and path of the output clumps image
  • gdalFormat is a string defining the format of the output image.
  • muParseCriteria is a string with a muparser expression defining the criteria (e.g., b1 < 20?1:0). The expression output must be 0 or 1 (1 for True).
  • varNameBandPairs is a list of pairs specifying the variable name (in the muparser expression) and the band number to which it refers in valsImage (note band numbers start at 1).

Example:

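import collections
import rsgislib.segmentation

# Pair each muparser variable name with its band in valsImage (bands start at 1).
varBandPair = collections.namedtuple('VarBandPair', ['varName', 'bandIndex'])
varBandPairSeq = list()
varBandPairSeq.append(varBandPair(varName='b1', bandIndex=1))
# The criteria evaluates to 1 (True) where band 1 is greater than 1000.
muParseCriteria = 'b1 > 1000?1:0'
rsgislib.segmentation.pxlGrowRegions(tmpInitClearSkyRegionsFinal, tmpCloudsImgDist2CloudsNoData, tmpClearSkyRegionsGrow, 'KEA', muParseCriteria, varBandPairSeq)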
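
The growing step itself can be pictured with a small plain-Python sketch. It is not the RSGISLib implementation: the function grow_regions, the use of 4-connectivity, 0 as the unlabelled value and a Python callable standing in for the muparser expression are all assumptions.

```python
from collections import deque


def grow_regions(clumps, vals, criterion):
    """Grow labelled regions into unlabelled neighbours that meet a criterion.

    clumps:    2D list of region IDs (0 means unlabelled, an assumption).
    vals:      2D list of values, the same shape as clumps.
    criterion: callable returning True where a pixel may be grown into;
               it stands in for the muparser expression.
    """
    out = [row[:] for row in clumps]  # leave the input untouched
    n_rows, n_cols = len(out), len(out[0])
    # Start from every pixel that already belongs to a region.
    queue = deque((r, c) for r in range(n_rows) for c in range(n_cols)
                  if out[r][c] != 0)
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= rr < n_rows and 0 <= cc < n_cols
            if inside and out[rr][cc] == 0 and criterion(vals[rr][cc]):
                out[rr][cc] = out[r][c]  # the pixel joins the neighbouring region
                queue.append((rr, cc))
    return out


clumps = [[1, 0, 0, 0],
          [0, 0, 0, 2]]
vals = [[0, 1500, 500, 1200],
        [1100, 1300, 400, 0]]
# Equivalent in spirit to the muparser expression 'b1 > 1000?1:0'.
grown = grow_regions(clumps, vals, lambda b1: b1 > 1000)
```

Pixels with values of 1000 or less are never grown into, so the column of low values keeps the two regions apart.
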