RSGISLib Image Segmentation Module

The segmentation module contains the segmentation functionality for RSGISLib.

A number of steps are required to produce a segmentation. For most users it is recommended to use the runShepherdSegmentation helper function, which runs all the required steps to generate a segmentation:

Example:

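from rsgislib.segmentation import segutils

inImage = 'input_image.kea'         # placeholder: input image (optionally masked and stretched)
outputClumps = 'output_clumps.kea'  # placeholder: output clumps file

segutils.runShepherdSegmentation(inImage,
                                 outputClumps,
                                 tmpath='./',
                                 numClusters=60,
                                 minPxls=100,
                                 distThres=100,
                                 sampling=100, kmMaxIter=200)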

Where ‘inImage’ is the input image (optionally masked and stretched) and ‘outputClumps’ is the output clumps file.

More information about the segmentation method is available in the following paper:

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

For the wider system of data analysis using segments see the following paper:

Daniel Clewley, Peter Bunting, James Shepherd, Sam Gillingham, Neil Flood, John Dymond, Richard Lucas, John Armston and Mahta Moghaddam. 2014. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing. Volume 6, Pages 6111-6135. http://www.mdpi.com/2072-4292/6/7/6111

Utilities

rsgislib.segmentation.segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200, processInMem=False, saveProcessStats=False, imgStretchStats='', kMeansCentres='', imgStatsJSONFile='')

Utility function to call the segmentation algorithm of Shepherd et al. (2019).

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

Where:

Parameters

inputImg – is a string containing the name of the input file.
outputClumps – is a string containing the name of the output clump file.
outputMeanImg – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
numClusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100; set to a large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
kmMaxIter – is an int specifying the maximum number of iterations for the KMeans (default = 200).
processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk (default = False).
saveProcessStats – is a bool which specifies that the image stretch stats and the kMeans centre stats should be saved along with a header.
imgStretchStats – is a string providing the file name and path for the image stretch stats (Output).
kMeansCentres – is a string providing the file name and path for the KMeans clusters centres (don’t include file extension; .gmtxt will be added to the end) (Output).
imgStatsJSONFile – is a string providing the name and path of a JSON file storing the image spatial extent and imgStretchStats and kMeansCentres file paths for use by other commands (Output).
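
The numClusters, kmMaxIter and sampling parameters control the KMeans step that seeds the segmentation. As a rough illustration only, the pure-Python sketch below subsamples the pixels by taking every `sampling`-th one and runs plain (Lloyd's) k-means on them. It is not RSGISLib's implementation: the helper names (`subsample`, `kmeans`), the every-nth-pixel sampling scheme and the seeded random initialisation from distinct pixel values are all assumptions made for the example.

```python
import math
import random


def subsample(pixels, sampling):
    """Keep every `sampling`-th pixel (sampling=1 keeps all of them).

    Assumption: mirrors the documented meaning of `sampling` (1 == no subsampling);
    the selection scheme used by RSGISLib may differ.
    """
    return pixels[::sampling]


def kmeans(pixels, num_clusters, max_iter, seed=42):
    """Plain Lloyd's k-means on a list of pixel vectors (tuples of band values)."""
    rng = random.Random(seed)
    # Initialise from distinct pixel values so no two centres start identical.
    centres = rng.sample(sorted(set(pixels)), num_clusters)
    for _ in range(max_iter):
        # Assign each pixel to its nearest centre (Euclidean distance).
        groups = [[] for _ in centres]
        for p in pixels:
            nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of its members; empty clusters keep their centre.
        new_centres = [
            tuple(sum(vals) / len(g) for vals in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
        if new_centres == centres:
            break  # converged before reaching max_iter
        centres = new_centres
    return centres


if __name__ == "__main__":
    pixels = [(10, 12), (11, 13), (200, 210), (205, 199)] * 50  # 200 two-band pixels
    sample = subsample(pixels, 3)  # every 3rd pixel is used to train the KMeans
    print(kmeans(sample, num_clusters=2, max_iter=200))
```

Larger `sampling` values train on fewer pixels, trading some accuracy in the cluster centres for speed on large images.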

Example:

from rsgislib.segmentation import segutils

inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'

segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg, minPxls=100)

rsgislib.segmentation.tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='segtmp', tileWidth=2000, tileHeight=2000, validDataThreshold=0.3, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) using the tiled process outlined in Clewley et al. (2015).

Parameters

inputImage – is a string containing the name of the input file.
clumpsImage – is a string containing the name of the output clump file.
tmpDIR – is a file path for intermediate files (default is to create a directory ‘segtmp’). If the path does not currently exist, it will be created and deleted afterwards.
tileWidth – is an int specifying the width of the tiles used for processing (default = 2000).
tileHeight – is an int specifying the height of the tiles used for processing (default = 2000).
validDataThreshold – is a float (between 0 and 1) specifying the minimum proportion of valid image pixels (i.e., pixels not equal to the no-data value of zero) within a tile. Tiles failing to meet this threshold are merged with tiles which do (default = 0.3).
numClusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100; set to a large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
kmMaxIter – is an int specifying the maximum number of iterations for the KMeans (default = 200).
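
To make the tileWidth, tileHeight and validDataThreshold parameters concrete, the sketch below (a hypothetical illustration, not RSGISLib code) splits an image held as a list of rows into tiles, clips the edge tiles, and flags tiles whose fraction of non-zero pixels falls below the threshold. How RSGISLib actually merges the failing tiles with neighbouring ones is not shown, and treating a fraction equal to the threshold as passing is an assumption.

```python
def tile_grid(img_width, img_height, tile_width, tile_height):
    """Yield (x_off, y_off, width, height) for each tile; edge tiles are clipped."""
    for y in range(0, img_height, tile_height):
        for x in range(0, img_width, tile_width):
            yield x, y, min(tile_width, img_width - x), min(tile_height, img_height - y)


def valid_fraction(img, x, y, w, h, no_data=0):
    """Fraction of pixels in the tile that are not the no-data value (zero)."""
    vals = [img[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return sum(v != no_data for v in vals) / len(vals)


def classify_tiles(img, tile_width, tile_height, valid_data_threshold=0.3):
    """Split tiles into those passing the threshold and those that would need merging."""
    height, width = len(img), len(img[0])
    keep, merge = [], []
    for tile in tile_grid(width, height, tile_width, tile_height):
        if valid_fraction(img, *tile) >= valid_data_threshold:
            keep.append(tile)
        else:
            merge.append(tile)
    return keep, merge


if __name__ == "__main__":
    # 6 x 4 image whose right half is no data (zero).
    img = [[1, 1, 1, 0, 0, 0] for _ in range(4)]
    print(classify_tiles(img, 3, 2))
```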

Example:

from rsgislib.segmentation import tiledsegsingle

inputImage = 'LS5TM_20110428_sref_submask_osgb.kea'
clumpsImage = 'LS5TM_20110428_sref_submask_osgb_clumps.kea'

tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='./rsgislibsegtmp', tileWidth=2000, tileHeight=2000, validDataThreshold=0.3, numClusters=60, minPxls=100, distThres=100, bands=[4,5,3], sampling=100, kmMaxIter=200)

rsgislib.segmentation.segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, minPxls=100, distThres=100, bands=None, processInMem=False)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) using pre-calculated stretch stats and KMeans cluster centres.

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

Where:

Parameters

inputImg – is a string containing the name of the input file.
outputClumps – is a string containing the name of the output clump file.
kMeansCentres – is a string providing the file name and path for the KMeans clusters centres (Input)
imgStretchStats – is a string providing the file name and path for the image stretch stats (Input - not required if noStretch=True)
outputMeanImg – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100; set to a large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk (default = False).
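
The point of pre-calculated stretch stats is that every image is rescaled identically, so the same KMeans centres stay meaningful across images. The sketch below assumes a simple per-band linear (min, max) stretch to the range 0 to 255; the contents and format of the imgStretchStats file, and the exact stretch RSGISLib applies, are not described here, so treat `apply_linear_stretch` as a hypothetical illustration.

```python
def apply_linear_stretch(pixel, band_stats, out_min=0, out_max=255):
    """Rescale each band value using pre-computed (min, max) stats, clamped to the output range.

    Hypothetical helper: assumes a linear min-max stretch, which may not match
    the stretch actually recorded in an RSGISLib imgStretchStats file.
    """
    out = []
    for value, (b_min, b_max) in zip(pixel, band_stats):
        scaled = (value - b_min) / (b_max - b_min) * (out_max - out_min) + out_min
        out.append(round(min(max(scaled, out_min), out_max)))
    return tuple(out)


if __name__ == "__main__":
    stats = [(0, 100), (50, 150)]  # one (min, max) pair per band, reused for every image
    print(apply_linear_stretch((25, 75), stats))
```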

Example:

from rsgislib.segmentation import segutils

inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'
kMeansCentres = 'jers1palsar_stack_kcentres.gmtxt'
imgStretchStats = 'jers1palsar_stack_stchstats.txt'

segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg, minPxls=100)

rsgislib.segmentation.segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=100, minPxlsStart=10, minPxlsStep=5, numOfMinPxlsSteps=20, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) and to test a range of minimum object sizes (minPxls).

Where:

Parameters

inputImg – is a string containing the name of the input file.
outputClumpsBase – is a string containing the base name of the output clump files.
outStatsFile – is a string containing the name of the output CSV file with the image segmentation stats.
outputMeanImgBase – is the base name of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
numClusters – is an int which specifies the number of clusters within the KMeans clustering process (default = 100).
minPxlsStart – is an int which specifies the minimum number of pixels within a segment at the start of processing (default = 10).
minPxlsStep – is an int which specifies the increment in the minimum number of pixels within a segment at each step (default = 5).
numOfMinPxlsSteps – is an int which specifies the number of steps (i.e., tests) which are performed (default = 20).
distThres – specifies the distance threshold for joining the segments (default = 1000000, a very large value which turns off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
kmMaxIter – is an int specifying the maximum number of iterations for the KMeans (default = 200).
minNormV – is a float (default = None).
maxNormV – (default = None).
minNormMI – (default = None).
maxNormMI – (default = None).
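
The three minPxls* parameters define the series of minimum object sizes that are tested. Assuming each step adds minPxlsStep to the previous value, starting from minPxlsStart (the exact ordering used by RSGISLib is not stated here), the tested values can be listed with the hypothetical helper below; the same arithmetic applies to numClustersStart, numClustersStep and numOfClustersSteps in runShepherdSegmentationTestNumClumps.

```python
def min_pxls_sequence(min_pxls_start, min_pxls_step, num_of_min_pxls_steps):
    """List the tested values, assuming value_i = start + i * step for i in 0..steps-1."""
    return [min_pxls_start + i * min_pxls_step for i in range(num_of_min_pxls_steps)]


if __name__ == "__main__":
    # Function defaults: minPxlsStart=10, minPxlsStep=5, numOfMinPxlsSteps=20.
    print(min_pxls_sequence(10, 5, 20))
```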

Example:

from rsgislib.segmentation import segutils

inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_MinPxl'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_MinPxlMean'
tmpath='./OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsMinPxl.csv'

# Will test the minimum number of pixels within an object from 5 to 100 at intervals of 5 (20 steps).
segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=outputMeanImgBase, tmpath=tmpath, noStretch=True, numClusters=100, minPxlsStart=5, minPxlsStep=5, numOfMinPxlsSteps=20, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

rsgislib.segmentation.segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClustersStart=10, numClustersStep=10, numOfClustersSteps=10, minPxls=10, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, processInMem=False, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) and to test a range of ‘k’ (the number of clusters) within the KMeans.

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

Where:

Parameters

inputImg – is a string containing the name of the input file.
outputClumpsBase – is a string containing the base name of the output clump files.
outStatsFile – is a string containing the name of the output CSV file with the image segmentation stats.
outputMeanImgBase – is the base name of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
numClustersStart – is an int which specifies the number of clusters within the KMeans clustering at the start of the process (default = 10).
numClustersStep – is an int which specifies the number of clusters added to the KMeans clustering at each step (default = 10).
numOfClustersSteps – is an int which specifies the number of steps (i.e., tests) which are performed (default = 10).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 10).
distThres – specifies the distance threshold for joining the segments (default = 1000000, a very large value which turns off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
kmMaxIter – is an int specifying the maximum number of iterations for the KMeans (default = 200).
processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk (default = False).
minNormV – is a float (default = None).
maxNormV – (default = None).
minNormMI – (default = None).
maxNormMI – (default = None).

Example:

from rsgislib.segmentation import segutils


inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_Clumps'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_ClumpsMean'
tmpath='./OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsClumps.csv'

# Will test numbers of clusters (k) from 10 to 200 at intervals of 10.
segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=outputMeanImgBase, tmpath=tmpath, noStretch=True, numClustersStart=10, numClustersStep=10, numOfClustersSteps=20, minPxls=50, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Clump

rsgislib.segmentation.clump(inputimage, outputimage, gdalformat, processinmemory, nodata, addPxlVal2Rat)

A function which clumps an input image (of int pixel data type) to identify connected independent sets of pixels.

Where:

Parameters

inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
nodata – is None or float
addPxlVal2Rat – is a boolean specifying whether the pixel value (from inputimage) should be added as a RAT.
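
Clumping assigns a unique ID to each connected set of pixels sharing the same value. The sketch below is a breadth-first flood fill over a list-of-rows image, assuming 4-connectivity (the connectivity used by RSGISLib is not stated here); pixels equal to `nodata` are left as 0, and the returned `pixel_values` mapping plays the role that addPxlVal2Rat gives the RAT. It illustrates the idea only and is not the library's implementation.

```python
from collections import deque


def clump(image, nodata=None):
    """Label 4-connected regions of equal pixel value with sequential IDs (1, 2, ...).

    Pixels equal to `nodata` get clump ID 0. Returns (clumps, pixel_values), where
    pixel_values maps each clump ID to the input value it was formed from.
    """
    rows, cols = len(image), len(image[0])
    clumps = [[0] * cols for _ in range(rows)]
    pixel_values = {}
    next_id = 1
    for r0 in range(rows):
        for c0 in range(cols):
            if clumps[r0][c0] or image[r0][c0] == nodata:
                continue  # already labelled, or no data
            value = image[r0][c0]
            clumps[r0][c0] = next_id
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not clumps[nr][nc] and image[nr][nc] == value):
                        clumps[nr][nc] = next_id
                        queue.append((nr, nc))
            pixel_values[next_id] = value
            next_id += 1
    return clumps, pixel_values


if __name__ == "__main__":
    # The two regions of value 5 are not 4-connected, so they become separate clumps.
    print(clump([[5, 5, 7], [0, 5, 7], [5, 0, 7]], nodata=0))
```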

rsgislib.segmentation.tiledclump.performClumpingSingleThread(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters

inputImage – the input image to be clumped.
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). Note: KEA is used as the intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.performClumpingMultiProcess(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters

inputImage – the input image to be clumped.
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). Note: KEA is used as the intermediate format internally and therefore needs to be available.
nCores – is an int specifying the number of cores to be used for clumping processing.

rsgislib.segmentation.tiledclump.performUnionClumpingSingleThread(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')

Clump the input image and take its union with the reference image, using a tiled processing chain allowing large images to be clumped more quickly.

Parameters

inputImage – the input image to be clumped.
refImg – the reference image which the union is undertaken with (typically an existing classification)
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). Note: KEA is used as the intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.performUnionClumpingMultiProcess(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)

Clump the input image and take its union with the reference image, using a tiled processing chain allowing large images to be clumped more quickly.

Parameters

inputImage – the input image to be clumped.
refImg – the reference image which the union is undertaken with (typically an existing classification)
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). Note: KEA is used as the intermediate format internally and therefore needs to be available.
nCores – is an int specifying the number of cores to be used for clumping processing.

Label

rsgislib.segmentation.labelPixelsFromClusterCentres(inputimage, outputimage, clustercenters, ignorezeros, gdalformat)

Labels image pixels with the ID of the nearest cluster centre.

Where:

Parameters

inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
clustercenters – is a string containing the name of the cluster centre file
ignorezeros – is a bool specifying whether zero-valued pixels should be ignored
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
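
Labelling pixels from cluster centres amounts to a nearest-centre assignment. In the sketch below, which is illustrative rather than RSGISLib's code, centres are passed as a list of tuples instead of being read from the cluster centre file, IDs start at 1, distances are squared Euclidean, and ignorezeros is interpreted as giving pixels whose bands are all zero a label of 0; each of these is an assumption.

```python
def label_pixels(image, centres, ignore_zeros=True):
    """Label each pixel with the 1-based ID of its nearest cluster centre.

    `image` is a list of rows of per-band tuples; `centres` is a list of tuples.
    With ignore_zeros=True (assumed meaning), all-zero pixels are labelled 0.
    """
    labels = []
    for row in image:
        out_row = []
        for pixel in row:
            if ignore_zeros and all(v == 0 for v in pixel):
                out_row.append(0)
                continue
            # Squared Euclidean distance gives the same nearest centre as Euclidean.
            dists = [sum((p - c) ** 2 for p, c in zip(pixel, centre)) for centre in centres]
            out_row.append(dists.index(min(dists)) + 1)
        labels.append(out_row)
    return labels


if __name__ == "__main__":
    centres = [(10, 10), (100, 100)]
    print(label_pixels([[(0, 0), (12, 9)], [(90, 95), (60, 60)]], centres))
```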

rsgislib.segmentation.relabelClumps(inputimage, outputimage, gdalformat, processinmemory)

Relabel clumps

Where:

Parameters

inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).

Elimination

rsgislib.segmentation.eliminateSinglePixels(inputimage, clumpsimage, outputimage, tempfile, gdalformat, processinmemory, ignorezeros)

Eliminates single pixels

Where:

Parameters

inputimage – is a string containing the name of the input file
clumpsimage – is a string containing the name of the clump file
outputimage – is a string containing the name of the output file
tempfile – is a string containing the name of the temporary file to use
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
ignorezeros – is a bool specifying whether zero-valued pixels should be ignored

rsgislib.segmentation.rmSmallClumps(clumpsImage, outputImage, threshold, gdalformat)

A function to remove small clumps by setting them to a value of 0 (i.e., no data).

Where:

Parameters

clumpsImage – is a string containing the name of the input clumps file. Note: the RAT must contain a column called ‘Histogram’.
outputImage – is a string containing the name of the output clumps file
threshold – is a float containing the area threshold (in pixels)
gdalformat – is a string defining the format of the output image.
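
The removal step reduces to counting pixels per clump (the role played by the ‘Histogram’ column) and zeroing the clumps under the threshold. In this hypothetical sketch, clumps are a list of rows of IDs, 0 is treated as no data, and clumps with a count strictly below the threshold are removed; whether RSGISLib uses a strict or inclusive comparison is an assumption.

```python
from collections import Counter


def rm_small_clumps(clumps, threshold):
    """Set clumps whose pixel count is below `threshold` to 0 (no data)."""
    # Pixel count per clump, i.e. the equivalent of the RAT 'Histogram' column.
    histogram = Counter(v for row in clumps for v in row if v != 0)
    return [[v if v != 0 and histogram[v] >= threshold else 0 for v in row]
            for row in clumps]


if __name__ == "__main__":
    print(rm_small_clumps([[1, 1, 2], [1, 3, 3], [0, 3, 3]], threshold=2))
```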

rsgislib.segmentation.rmSmallClumpsStepwise(inputimage, clumpsimage, outputimage, gdalformat, stretchstatsavail, stretchstatsfile, storemean, processinmemory, minclumpsize, specThreshold)

Eliminates clumps smaller than a given size from the scene. Small clumps are combined with their spectrally closest neighbouring clump in a stepwise fashion, unless the spectral distance exceeds the threshold (specThreshold).

Where:

Parameters

inputimage – is a string containing the name of the input file
clumpsimage – is a string containing the name of the clump file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
stretchstatsavail – is a bool specifying whether stretch statistics are available (in stretchstatsfile)
stretchstatsfile – is a string containing the name of the stretch stats file
storemean – is a bool
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
minclumpsize – is an unsigned integer providing the minimum size for clumps.
specThreshold – is a float providing the maximum (Euclidean distance) spectral separation for which to merge clumps. Set to a large value to ignore spectral separation and always merge.
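
The stepwise elimination can be pictured on a graph of clumps: the smallest undersized clump that has a neighbour within specThreshold is merged into its spectrally closest neighbour, the merged clump's mean is updated, and the process repeats. The sketch below works on per-clump sizes, band means and adjacency sets rather than on images, uses the Euclidean distance between clump means, and orders merges by clump size; these data structures and the ordering are assumptions for illustration, not a description of RSGISLib's internals.

```python
import math


def eliminate_stepwise(sizes, means, neighbours, min_clump_size, spec_threshold):
    """Merge undersized clumps into their spectrally closest neighbour, smallest first.

    sizes: {clump_id: pixel count}; means: {clump_id: tuple of band means};
    neighbours: {clump_id: set of adjacent clump ids}. All three are modified in place.
    Returns {eliminated_id: id it was merged into} (one hop; merges can chain).
    """
    merged_into = {}
    while True:
        # Undersized clumps that have at least one neighbour within spec_threshold.
        candidates = []
        for cid in sizes:
            if sizes[cid] >= min_clump_size:
                continue
            dists = [(math.dist(means[cid], means[n]), n) for n in neighbours[cid]]
            dists = [d for d in dists if d[0] <= spec_threshold]
            if dists:
                candidates.append((sizes[cid], cid, min(dists)[1]))
        if not candidates:
            return merged_into
        _, small, target = min(candidates)  # process the smallest clump first
        # Size-weighted mean of the merged clump.
        total = sizes[small] + sizes[target]
        means[target] = tuple(
            (a * sizes[small] + b * sizes[target]) / total
            for a, b in zip(means[small], means[target])
        )
        sizes[target] = total
        # Rewire adjacency: the small clump's neighbours now touch the target.
        for n in neighbours.pop(small):
            neighbours[n].discard(small)
            if n != target:
                neighbours[n].add(target)
                neighbours[target].add(n)
        del sizes[small], means[small]
        merged_into[small] = target


if __name__ == "__main__":
    sizes = {1: 50, 2: 3, 3: 40}
    means = {1: (10.0,), 2: (12.0,), 3: (100.0,)}
    neighbours = {1: {2}, 2: {1, 3}, 3: {2}}
    print(eliminate_stepwise(sizes, means, neighbours, min_clump_size=10, spec_threshold=20.0))
```

Setting `spec_threshold` to a very large value makes every undersized clump merge, matching the documented way of ignoring spectral separation.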

Join / Union

rsgislib.segmentation.unionOfClumps(outputimage, gdalformat, inputimagepaths, nodata, addPxlVals2Rat)

This function takes the union of the clumps images, combining them so that all clump boundaries from all inputs are preserved in the new output clumps image.

Where:

Parameters

outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
inputimagepaths – is a list of input image paths
nodata – is None or float
addPxlVals2Rat – is a boolean specifying whether the pixel values (from inputimagepaths) should be added as a RAT; column names have prefix ‘ClumpVal_’ with index starting at 1 for each variable.
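
Conceptually, the union gives every distinct combination of input clump IDs its own output ID, so a boundary present in any input survives in the output. The hypothetical sketch below does exactly that for images held as lists of rows. It does not split spatially separate regions that share the same combination (a connected-component relabelling would be needed for that), and its treatment of `nodata` (only pixels that are no data in every input are set to 0) is an assumption.

```python
def union_of_clumps(clump_images, nodata=None):
    """Give every distinct combination of input clump IDs its own output ID.

    Returns (union, combos): combos maps each output ID to the tuple of input
    values, analogous to the 'ClumpVal_' RAT columns described above.
    """
    rows, cols = len(clump_images[0]), len(clump_images[0][0])
    ids, combos = {}, {}
    union = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            key = tuple(img[r][c] for img in clump_images)
            if nodata is not None and all(v == nodata for v in key):
                continue  # no data in every input: leave as 0
            if key not in ids:
                ids[key] = len(ids) + 1
                combos[ids[key]] = key
            union[r][c] = ids[key]
    return union, combos


if __name__ == "__main__":
    a = [[1, 1, 2], [1, 1, 2]]
    b = [[5, 6, 6], [5, 6, 6]]
    print(union_of_clumps([a, b]))
```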

Visualisation

rsgislib.segmentation.meanImage(inputImage, inputClumps, outputImage, gdalformat, datatype)

A function to generate an image with the mean value for each clump. Primarily used for visualising and evaluating a segmentation.

Where:

Parameters

inputImage – is a string containing the name of the input image file from which the mean is taken.
inputClumps – is a string containing the name of the input clumps file
outputImage – is a string containing the name of the output image.
gdalformat – is a string defining the format of the output image.
datatype – is an int containing one of the values from rsgislib.TYPE_* (the output image data type).
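
The mean image is a per-clump average written back to every pixel of the clump. This sketch, for illustration only, assumes an image held as rows of per-band tuples, a matching clumps array, and clump 0 meaning no data; conversion to the requested `datatype` is left out.

```python
def mean_image(image, clumps):
    """Replace each pixel with the per-band mean of its clump; clump 0 (no data) becomes all zeros."""
    n_bands = len(image[0][0])
    sums, counts = {}, {}
    # Accumulate per-band sums and pixel counts for every clump.
    for img_row, clump_row in zip(image, clumps):
        for pixel, cid in zip(img_row, clump_row):
            if cid == 0:
                continue
            acc = sums.setdefault(cid, [0.0] * n_bands)
            for b, v in enumerate(pixel):
                acc[b] += v
            counts[cid] = counts.get(cid, 0) + 1
    means = {cid: tuple(s / counts[cid] for s in acc) for cid, acc in sums.items()}
    zero = tuple([0.0] * n_bands)
    # Write each clump's mean back to all of its pixels.
    return [[means.get(cid, zero) for cid in clump_row] for clump_row in clumps]


if __name__ == "__main__":
    image = [[(1, 10), (3, 20)], [(5, 0), (7, 7)]]
    print(mean_image(image, [[1, 1], [2, 0]]))
```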

Tiles

rsgislib.segmentation.mergeSegmentationTiles(outputimage, bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)

Merges the body clumps from the tile segmentations into the output file.

Where:

Parameters

outputimage – is a string containing the name of the output file
bordermaskimage – is a string containing the name of the border mask file
tileboundary – is an unsigned integer containing the tile boundary pixel value
tileoverlap – is an unsigned integer containing the tile overlap pixel value
tilebody – is an unsigned integer containing the tile body pixel value
colsname – is a string containing the name of the object id column
inputimagepaths – is a list of input image paths

rsgislib.segmentation.tiledclump.clumpImgFunc(imgs)

Clump an image with values provided as an array, for use within a multiprocessing Pool.

rsgislib.segmentation.tiledclump.unionClumpImgFunc(imgs)

Union clump an image with the values provided as an array, for use within a multiprocessing Pool.

scikit-image

rsgislib.segmentation.skimgseg.performFelsenszwalbSegmentation(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, scale=1, sigma=0.8, min_size=20)

A function to perform the Felzenszwalb segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters

inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temporary directory used to output PCA files.
calcStats – calculate image pixel statistics, histogram and image pyramids. Note: if you are not using KEA, the output format must support raster attribute tables (RATs) for this option, as the histogram and colour table are written to the RAT.
usePCA – if the input file does not have 1 or 3 image bands, PCA can be used to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA; must be either 1 or 3.
scale – scikit-image Felzenszwalb parameter: ‘Free parameter. Higher means larger clusters.’
sigma – scikit-image Felzenszwalb parameter: ‘Width of Gaussian kernel used in preprocessing.’
min_size – scikit-image Felzenszwalb parameter: ‘Minimum component size. Enforced using postprocessing.’
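
The scale and min_size parameters come from the graph-based segmentation of Felzenszwalb and Huttenlocher, which scikit-image implements. The following pure-Python sketch illustrates the merge criterion on a single-band image stored as a list of lists. It assumes 4-connected pixels, the absolute intensity difference as the edge weight and no Gaussian pre-smoothing (so sigma is omitted); scikit-image's own implementation differs in details such as connectivity and multi-band handling, and felzenszwalb_gray is a hypothetical name.

```python
def felzenszwalb_gray(img, scale=1.0, min_size=1):
    """Graph-based segmentation of a single-band image given as a list of lists."""
    rows, cols = len(img), len(img[0])
    parent = list(range(rows * cols))
    size = [1] * (rows * cols)
    internal = [0.0] * (rows * cols)  # largest edge weight merged into each component

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b, w):
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a
        size[a] += size[b]
        internal[a] = max(internal[a], internal[b], w)

    # 4-connected edges weighted by the absolute intensity difference.
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((abs(img[r][c] - img[r][c + 1]), i, i + 1))
            if r + 1 < rows:
                edges.append((abs(img[r][c] - img[r + 1][c]), i, i + cols))
    edges.sort()

    # Merge only if the edge is no heavier than both components'
    # internal difference plus scale / component size.
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and w <= min(internal[ra] + scale / size[ra],
                                 internal[rb] + scale / size[rb]):
            union(ra, rb, w)

    # Post-processing: absorb components smaller than min_size.
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and (size[ra] < min_size or size[rb] < min_size):
            union(ra, rb, w)

    # Relabel component roots as consecutive integers starting at 1.
    lookup = {}
    return [[lookup.setdefault(find(r * cols + c), len(lookup) + 1)
             for c in range(cols)] for r in range(rows)]


img = [[10, 10, 50, 50],
       [10, 10, 50, 50]]
print(felzenszwalb_gray(img, scale=1))     # [[1, 1, 2, 2], [1, 1, 2, 2]]
print(felzenszwalb_gray(img, scale=1000))  # [[1, 1, 1, 1], [1, 1, 1, 1]]
```

A larger scale raises the tolerance scale / size that an edge must stay under, so larger segments survive; min_size only acts afterwards, absorbing small segments into a neighbour.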

rsgislib.segmentation.skimgseg.performQuickshiftSegmentation(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, pcaPxlSample=100, ratio=1.0, kernel_size=5, max_dist=10, sigma=0, convert2lab=True, random_seed=42)

A function to perform the quickshift segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters

inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temporary directory used to output PCA files.
calcStats – calculate image pixel statistics, histogram and image pyramids. Note: if you are not using KEA, the output format must support raster attribute tables (RATs) for this option, as the histogram and colour table are written to the RAT.
usePCA – if the input file does not have 3 image bands, PCA can be used to reduce the number of image bands.
ratio – scikit-image Quickshift parameter: ‘Balances color-space proximity and image-space proximity. Higher values give more weight to color-space. (between 0 and 1)’
kernel_size – scikit-image Quickshift parameter: ‘Width of Gaussian kernel used in smoothing the sample density. Higher means fewer clusters.’
max_dist – scikit-image Quickshift parameter: ‘Cut-off point for data distances. Higher means fewer clusters.’
sigma – scikit-image Quickshift parameter: ‘Width for Gaussian smoothing as preprocessing. Zero means no smoothing.’
convert2lab – scikit-image Quickshift parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. For this purpose, the input is assumed to be RGB.’
random_seed – scikit-image Quickshift parameter: ‘Random seed used for breaking ties.’
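
Quickshift links each pixel to the nearest pixel with a higher estimated density, and max_dist cuts links that are too long, so each remaining tree becomes one segment. The sketch below follows that description for a single-band image; quickshift_gray is a hypothetical, simplified version assuming a Gaussian density estimate over a window of ceil(3 * kernel_size) pixels, with no sigma smoothing, no Lab conversion and no random tie-breaking.

```python
import math


def quickshift_gray(img, ratio=1.0, kernel_size=1.0, max_dist=2.0):
    """Quickshift-style mode seeking on a single-band image (list of lists)."""
    rows, cols = len(img), len(img[0])
    win = int(math.ceil(3 * kernel_size))  # search window half-width in pixels

    def dist2(r0, c0, r1, c1):
        # Squared distance in the joint (ratio * intensity, row, column) space.
        dv = ratio * (img[r0][c0] - img[r1][c1])
        return dv * dv + (r0 - r1) ** 2 + (c0 - c1) ** 2

    def window(r, c):
        for rr in range(max(0, r - win), min(rows, r + win + 1)):
            for cc in range(max(0, c - win), min(cols, c + win + 1)):
                yield rr, cc

    # Parzen density estimate with a Gaussian kernel of width kernel_size.
    density = [[sum(math.exp(-dist2(r, c, rr, cc) / (2 * kernel_size ** 2))
                    for rr, cc in window(r, c))
                for c in range(cols)] for r in range(rows)]

    # Link each pixel to its nearest neighbour with a strictly higher density.
    parent = {}
    for r in range(rows):
        for c in range(cols):
            best, best_d = (r, c), math.inf
            for rr, cc in window(r, c):
                d = dist2(r, c, rr, cc)
                if density[rr][cc] > density[r][c] and d < best_d:
                    best, best_d = (rr, cc), d
            # Links longer than max_dist are cut, making the pixel a tree root.
            parent[(r, c)] = best if math.sqrt(best_d) <= max_dist else (r, c)

    def root(p):
        while parent[p] != p:
            p = parent[p]
        return p

    # Every tree becomes one segment, numbered from 1.
    lookup = {}
    return [[lookup.setdefault(root((r, c)), len(lookup) + 1) for c in range(cols)]
            for r in range(rows)]


sig = [[0, 0, 0, 10, 10, 10]]
print(quickshift_gray(sig))                # [[1, 1, 1, 2, 2, 2]]
print(quickshift_gray(sig, max_dist=0.5))  # [[1, 2, 3, 4, 5, 6]]
```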

rsgislib.segmentation.skimgseg.performRandomWalkerSegmentation(inputImg, markersImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, beta=130, mode='bf', tol=0.001, spacing=None)

A function to perform the random walker segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters

inputImg – input image file.
markersImg – input markers image file - markers must be uniquely numbered.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temporary directory used to output PCA files.
calcStats – calculate image pixel statistics, histogram and image pyramids. Note: if you are not using KEA, the output format must support raster attribute tables (RATs) for this option, as the histogram and colour table are written to the RAT.
usePCA – if the input file does not have 1 or 3 image bands, PCA can be used to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA; must be either 1 or 3.
beta – scikit-image random_walker parameter: ‘Penalization coefficient for the random walker motion (the greater beta, the more difficult the diffusion).’
mode – scikit-image random_walker parameter: ‘Mode for solving the linear system in the random walker algorithm. Available options {‘cg_mg’, ‘cg’, ‘bf’}.’
* ‘bf’ (brute force): an LU factorization of the Laplacian is computed. This is fast for small images (<1024x1024), but very slow and memory-intensive for large images (e.g., 3-D volumes).
* ‘cg’ (conjugate gradient): the linear system is solved iteratively using the Conjugate Gradient method from scipy.sparse.linalg. This is less memory-consuming than the brute force method for large images, but it is quite slow.
* ‘cg_mg’ (conjugate gradient with multigrid preconditioner): a preconditioner is computed using a multigrid solver, then the solution is computed with the Conjugate Gradient method. This mode requires that the pyamg module (http://pyamg.org/) is installed. For images of size > 512x512, this is the recommended (fastest) mode.
tol – scikit-image random_walker parameter: ‘Tolerance to achieve when solving the linear system in ‘cg’ and ‘cg_mg’ modes.’
spacing – scikit-image random_walker parameter: ‘Spacing between voxels in each spatial dimension. If None, then the spacing between pixels/voxels in each dimension is assumed 1.’
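
The random walker treats the image as a graph whose edge weights fall off with intensity difference (controlled by beta), fixes the probabilities at the markers, and solves a linear system for the probability that a walker starting at each unlabelled pixel first reaches each marker label. A minimal 1-D sketch with stated simplifications: weights are exp(-beta * difference²) without the data-dependent scaling scikit-image applies, and the system is solved with Gauss-Seidel iterations until the largest update falls below tol, rather than with the ‘bf’, ‘cg’ or ‘cg_mg’ solvers. random_walker_1d is a hypothetical name; 0 marks unlabelled pixels.

```python
import math


def random_walker_1d(signal, markers, beta=130.0, tol=1e-3, max_iter=10000):
    """Assign each unlabelled (0) sample the marker label it most likely reaches first."""
    n = len(signal)
    # Edge weights between neighbours: similar values give weights near 1.
    w = [math.exp(-beta * (signal[i] - signal[i + 1]) ** 2) for i in range(n - 1)]
    labels = sorted({m for m in markers if m > 0})
    probs = {}
    for lab in labels:
        # Boundary conditions: 1 on this label's markers, 0 on all other markers.
        p = [1.0 if m == lab else 0.0 for m in markers]
        for _ in range(max_iter):
            max_change = 0.0
            for i in range(n):
                if markers[i]:
                    continue  # marker samples keep their fixed probability
                num = den = 0.0
                if i > 0:
                    num += w[i - 1] * p[i - 1]
                    den += w[i - 1]
                if i < n - 1:
                    num += w[i] * p[i + 1]
                    den += w[i]
                new = num / den if den > 0 else p[i]
                max_change = max(max_change, abs(new - p[i]))
                p[i] = new
            if max_change < tol:
                break
        probs[lab] = p
    # Each sample takes the label with the highest probability.
    return [max(labels, key=lambda lab: probs[lab][i]) for i in range(n)]


signal = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
markers = [1, 0, 0, 0, 0, 2]
print(random_walker_1d(signal, markers))  # [1, 1, 1, 2, 2, 2]
```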

rsgislib.segmentation.skimgseg.performSlicSegmentation(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, n_segments=100, compactness=10.0, max_iter=10, sigma=0, spacing=None, convert2lab=None, enforce_connectivity=True, min_size_factor=0.5, max_size_factor=3, slic_zero=False)

A function to perform the SLIC segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters

inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temporary directory used to output PCA files.
calcStats – calculate image pixel statistics, histogram and image pyramids. Note: if you are not using KEA, the output format must support raster attribute tables (RATs) for this option, as the histogram and colour table are written to the RAT.
usePCA – if the input file does not have 1 or 3 image bands, PCA can be used to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA; must be either 1 or 3.
n_segments – scikit-image Slic parameter: ‘The (approximate) number of labels in the segmented output image.’
compactness – scikit-image Slic parameter: ‘Balances color proximity and space proximity. Higher values give more weight to space proximity, making superpixel shapes more square/cubic. In SLICO mode, this is the initial compactness. This parameter depends strongly on image contrast and on the shapes of objects in the image. We recommend exploring possible values on a log scale, e.g., 0.01, 0.1, 1, 10, 100, before refining around a chosen value.’
max_iter – scikit-image Slic parameter: ‘Maximum number of iterations of k-means.’
sigma – scikit-image Slic parameter: ‘Width of Gaussian smoothing kernel for pre-processing for each dimension of the image. The same sigma is applied to each dimension in case of a scalar value. Zero means no smoothing. Note, that sigma is automatically scaled if it is scalar and a manual voxel spacing is provided (see Notes section).’ (The Notes section referred to is in the scikit-image slic documentation.)
spacing – scikit-image Slic parameter: ‘The voxel spacing along each image dimension. By default, slic assumes uniform spacing (same voxel resolution along z, y and x). This parameter controls the weights of the distances along z, y, and x during k-means clustering.’
convert2lab – scikit-image Slic parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. The input image must be RGB. Highly recommended.’
enforce_connectivity – scikit-image Slic parameter: ‘Whether the generated segments are connected or not’
min_size_factor – scikit-image Slic parameter: ‘Proportion of the minimum segment size to be removed with respect to the supposed segment size “depth*width*height/n_segments”’
max_size_factor – scikit-image Slic parameter: ‘Proportion of the maximum connected segment size. A value of 3 works in most of the cases.’
slic_zero – scikit-image Slic parameter: ‘Run SLIC-zero, the zero-parameter mode of SLIC.’
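
The min_size_factor and max_size_factor descriptions are relative to a "supposed" segment size of depth*width*height/n_segments. A small worked example of that arithmetic for a hypothetical 1000 x 800 pixel image (depth = 1) with the default parameters; it assumes max_size_factor scales the same supposed segment size, and slic_size_limits is a hypothetical helper.

```python
def slic_size_limits(width, height, n_segments=100, min_size_factor=0.5,
                     max_size_factor=3, depth=1):
    """Return (supposed, min_size, max_size) segment sizes in pixels."""
    supposed = depth * width * height / n_segments
    # Segments smaller than min_size are removed; max_size caps connected segments.
    return supposed, int(min_size_factor * supposed), int(max_size_factor * supposed)


print(slic_size_limits(1000, 800))  # (8000.0, 4000, 24000)
```

Doubling n_segments halves all three sizes, which is why min_size_factor and max_size_factor rarely need changing when only the number of segments is tuned.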

rsgislib.segmentation.skimgseg.performWatershedSegmentation(inputImg, markersImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, compactness=0, watershed_line=False)

A function to perform the watershed segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters

inputImg – input image file.
markersImg – input markers image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temporary directory used to output PCA files.
calcStats – calculate image pixel statistics, histogram and image pyramids. Note: if you are not using KEA, the output format must support raster attribute tables (RATs) for this option, as the histogram and colour table are written to the RAT.
usePCA – if the input file does not have 1 or 3 image bands, PCA can be used to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA; must be either 1 or 3.
compactness – scikit-image Watershed parameter: ‘Use compact watershed with given compactness parameter. Higher values result in more regularly-shaped watershed basins; Peer Neubert & Peter Protzel (2014). Compact Watershed and Preemptive SLIC: On Improving Trade-offs of Superpixel Segmentation Algorithms. ICPR 2014’
watershed_line – scikit-image Watershed parameter: ‘If watershed_line is True, a one-pixel wide line separates the regions obtained by the watershed algorithm. The line has the label 0.’
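
Marker-based watershed can be pictured as flooding the image (often a gradient image) from the markers in order of increasing pixel value, each flooded pixel taking the label of the region that reaches it first. A minimal priority-queue sketch of that idea, assuming 4-connectivity and a single-band image with 0 for unlabelled marker pixels; it omits compactness and watershed_line, and marker_watershed is a hypothetical name.

```python
import heapq


def marker_watershed(img, markers):
    """Flood a single-band image from labelled markers (0 = unlabelled)."""
    rows, cols = len(img), len(img[0])
    labels = [row[:] for row in markers]
    heap, counter = [], 0  # counter breaks ties in first-in, first-out order
    for r in range(rows):
        for c in range(cols):
            if markers[r][c]:
                heapq.heappush(heap, (img[r][c], counter, r, c))
                counter += 1
    # Always expand from the lowest pixel on the flooding front.
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (img[nr][nc], counter, nr, nc))
                counter += 1
    return labels


img = [[1, 2, 3, 9, 3, 2, 1]]
markers = [[1, 0, 0, 0, 0, 0, 2]]
print(marker_watershed(img, markers))  # [[1, 1, 1, 1, 2, 2, 2]]
```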

Other

rsgislib.segmentation.generateRegularGrid(inputImage, outputClumps, gdalformat, numXPxls, numYPxls, offset)

A function to generate a clumps image made up of a regular grid of rectangular cells, with the image dimensions taken from the input image.

Where:

Parameters

inputImage – is a string containing the name of the input image file specifying the dimensions of the output image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
numXPxls – is the size of the grid cells in the X axis in pixel units.
numYPxls – is the size of the grid cells in the Y axis in pixel units.
offset – is a boolean specifying whether the grid should be offset, i.e., whether it starts at the half-way point of numXPxls and numYPxls (Default is false; optional).
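
A pure-Python sketch of what a regular grid of clumps looks like, labelling each numXPxls x numYPxls cell with a unique ID starting at 1. The function regular_grid is hypothetical, and the offset handling (cell boundaries shifted by half a cell, leaving truncated cells along the top and left edges) is an assumption based on the offset description, not RSGISLib's confirmed behaviour.

```python
def regular_grid(width, height, num_x_pxls, num_y_pxls, offset=False):
    """Return a list-of-lists clump image of rectangular cells labelled from 1."""
    # With offset, the first cell boundary falls half a cell in (assumption).
    sx = num_x_pxls - num_x_pxls // 2 if offset else 0
    sy = num_y_pxls - num_y_pxls // 2 if offset else 0
    n_cols = (width - 1 + sx) // num_x_pxls + 1  # grid cells per row
    return [[((y + sy) // num_y_pxls) * n_cols + (x + sx) // num_x_pxls + 1
             for x in range(width)] for y in range(height)]


print(regular_grid(4, 2, 2, 2))               # [[1, 1, 2, 2], [1, 1, 2, 2]]
print(regular_grid(4, 1, 2, 2, offset=True))  # [[1, 2, 2, 3]]
```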

rsgislib.segmentation.dropSelectedClumps(clumpsImage, outputClumps, gdalformat)

A function to drop the selected clumps from the segmentation.

Where:

Parameters

clumpsImage – is a string containing the filepath for the input clumps image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
selectClumpsCol – is a string defining the binary column for defining the segments to be dropped (1 == selected clumps).

rsgislib.segmentation.findTileBordersMask(bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)

Mask tile borders

Where:

Parameters

bordermaskimage – is a string containing the name of the border mask file
tileboundary – is an unsigned integer containing the tile boundary pixel value
tileoverlap – is an unsigned integer containing the tile overlap pixel value
tilebody – is an unsigned integer containing the tile body pixel value
colsname – is a string containing the name of the object id column
inputimagepaths – is a list of input clump image paths

rsgislib.segmentation.includeRegionsInClumps(clumpsImage, regionsImage, outputClumps, gdalformat)

A function to include a set of clumped regions within an existing clumps (i.e., segmentation) image. Note: you should run the relabelClumps function on the output of this command before using it further.

Where:

Parameters

clumpsImage – is a string containing the filepath for the input clumps image.
regionsImage – is a string containing the filepath for the input regions image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
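
Including regions can leave gaps in the sequence of clump IDs (for example, when a clump is entirely covered by a region), which is presumably why relabelling is recommended. The sketch below shows the general idea on an in-memory array: IDs are renumbered 1..N in ascending order, with 0 treated as no data. relabel_clumps is a hypothetical stand-in; RSGISLib's relabelClumps works on image files and may order the new IDs differently.

```python
def relabel_clumps(clumps, no_data=0):
    """Renumber clump IDs to 1..N in ascending order of their original ID."""
    ids = sorted({v for row in clumps for v in row if v != no_data})
    lookup = {old: new for new, old in enumerate(ids, start=1)}
    # no_data is not in lookup, so it is returned unchanged.
    return [[lookup.get(v, no_data) for v in row] for row in clumps]


print(relabel_clumps([[5, 5, 0], [12, 7, 7]]))  # [[1, 1, 0], [3, 2, 2]]
```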

rsgislib.segmentation.mergeClumpImages(inputimagepaths, outputimage, mergeRATs)

Merge all clumps from tiled segmentations into the output file.

Where:

Parameters

inputimagepaths – is a list of input image paths
outputimage – is a string containing the name of the output file
mergeRATs – is a boolean specifying whether the image RATs are to be merged (Default: false; optional).

rsgislib.segmentation.mergeEquivClumps(clumpsImage, outputClumps, gdalformat, valClumpsCols)

A function to merge neighbouring clumps which have the same value - for example when merging across tile boundaries.

Where:

Parameters

clumpsImage – is a string containing the filepath for the input clumps image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
valClumpsCols – is a list of strings defining the value(s) used to define equivalence (typically it might be the original pixel values when clumping through tiling).
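
The merge can be thought of as a union-find over adjacent clumps: two touching clumps are joined whenever their values in the equivalence columns are identical, such as the two halves of a feature split by a tile boundary. A minimal in-memory sketch, assuming 4-connectivity and a dictionary mapping clump ID to a tuple of column values (standing in for the RAT); the merged clump keeps the smallest ID, and merge_equiv_clumps is a hypothetical name.

```python
def merge_equiv_clumps(clumps, clump_vals):
    """Merge 4-connected neighbouring clumps that share the same value tuple."""
    parent = {cid: cid for cid in clump_vals}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    rows, cols = len(clumps), len(clumps[0])
    for r in range(rows):
        for c in range(cols):
            a = clumps[r][c]
            # Checking right and down neighbours visits every adjacent pair once.
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols:
                    b = clumps[nr][nc]
                    if a != b and clump_vals[a] == clump_vals[b]:
                        ra, rb = find(a), find(b)
                        if ra != rb:
                            parent[max(ra, rb)] = min(ra, rb)
    return [[find(v) for v in row] for row in clumps]


clumps = [[1, 1, 2, 2],
          [3, 3, 4, 4]]
vals = {1: (10,), 2: (10,), 3: (20,), 4: (30,)}
print(merge_equiv_clumps(clumps, vals))  # [[1, 1, 1, 1], [3, 3, 4, 4]]
```

The output IDs are no longer consecutive (2 disappears), so relabelling may be needed afterwards.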

rsgislib.segmentation.mergeSegments2Neighbours(clumpsImage, spectralImage, outputClumps, gdalformat, selectedClumpsCol, noDataClumpsCol)

A function to merge selected clumps with their neighbours based on colour (spectral) distance; clumps identified as no data are ignored.

Where:

Parameters

clumpsImage – is a string containing the filepath for the input clumps image.
spectralImage – is a string containing the filepath for the input image used to define ‘distance’.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
selectedClumpsCol – is a string defining the binary column for defining the segments to be merged (1 == selected clumps).
noDataClumpsCol – is a string defining the binary column for defining the segments to be ignored as no data (1 == no-data clumps).
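
A simplified sketch of the selection logic described above: each selected clump is relabelled to whichever adjacent clump, neither selected nor flagged as no data, has the closest mean spectral value (Euclidean distance). It is a single-pass, in-memory illustration with hypothetical names and 4-connectivity, where the selected and no_data sets stand in for the two binary columns; RSGISLib may iterate and update clump statistics as merges happen.

```python
import math


def merge_to_nearest_neighbour(clumps, means, selected, no_data):
    """Relabel each selected clump to its spectrally closest eligible neighbour."""
    rows, cols = len(clumps), len(clumps[0])
    # Build the clump adjacency graph (4-connectivity).
    neighbours = {}
    for r in range(rows):
        for c in range(cols):
            a = clumps[r][c]
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols and clumps[nr][nc] != a:
                    b = clumps[nr][nc]
                    neighbours.setdefault(a, set()).add(b)
                    neighbours.setdefault(b, set()).add(a)
    target = {}
    for cid in selected:
        if cid in no_data:
            continue  # no-data clumps are never merged
        # Candidates must be neither selected nor flagged as no data.
        candidates = [n for n in neighbours.get(cid, ())
                      if n not in selected and n not in no_data]
        if candidates:
            target[cid] = min(candidates,
                              key=lambda n: math.dist(means[cid], means[n]))
    return [[target.get(v, v) for v in row] for row in clumps]


clumps = [[1, 1, 2],
          [3, 3, 2],
          [4, 4, 4]]
means = {1: (10.0,), 2: (12.0,), 3: (50.0,), 4: (11.0,)}
merged = merge_to_nearest_neighbour(clumps, means, selected={2}, no_data={4})
print(merged)  # [[1, 1, 1], [3, 3, 1], [4, 4, 4]]
```

Clump 4 is spectrally closest to clump 2 but is ignored as no data, so clump 2 joins clump 1.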

rsgislib.segmentation.pxlGrowRegions(clumpsImage, valsImage, outputImage, gdalformat, muParseCriteria, varNameBandPairs)

A function to grow the regions (clumps) in the input clumps image into neighbouring pixels which satisfy the criteria expression evaluated on the values image.

Where:

Parameters

clumpsImage – is a string containing the filepath for the input clumps image.
valsImage – is a string containing the file path for the values (criteria) image.
outputImage – is a string containing the name and path of the output clumps image.
gdalformat – is a string defining the format of the output image.
muParseCriteria – is a string containing a muparser criteria expression (e.g., b1 < 20?1:0). The expression must evaluate to 0 or 1 (1 for True).
varNameBandPairs – is a list of pairs specifying the variable name (in the muparser expression) and the band number to which it refers in valsImage (note band numbers start at 1).

Example:

import collections
import rsgislib.segmentation

# Pair the muparser variable name 'b1' with band 1 of the values image.
varBandPair = collections.namedtuple('VarBandPair', ['varName', 'bandIndex'])
varBandPairSeq = [varBandPair(varName='b1', bandIndex=1)]
muParseCriteria = 'b1 > 1000?1:0'
# Hypothetical file paths for the input clumps, values image and output image.
rsgislib.segmentation.pxlGrowRegions('clear_sky_regions.kea', 'dist2clouds.kea', 'clear_sky_regions_grow.kea', 'KEA', muParseCriteria, varBandPairSeq)
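
The criteria expression in the example above can be mirrored in pure Python to illustrate region growing: starting from the existing clumps, 4-connected unlabelled (0) pixels whose value satisfies the criterion take on the neighbouring clump's ID, and growth continues until no more pixels qualify. This is a hypothetical sketch (pxl_grow_regions, with a Python callable standing in for the muparser expression), assuming 0 marks unlabelled pixels; it is not RSGISLib's implementation.

```python
from collections import deque


def pxl_grow_regions(clumps, vals, criteria):
    """Grow non-zero clumps into 4-connected zero pixels where criteria(value) is True."""
    rows, cols = len(clumps), len(clumps[0])
    out = [row[:] for row in clumps]
    # Breadth-first growth from every pixel that already belongs to a clump.
    queue = deque((r, c) for r in range(rows) for c in range(cols) if out[r][c])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == 0
                    and criteria(vals[nr][nc])):
                out[nr][nc] = out[r][c]
                queue.append((nr, nc))
    return out


# Python equivalent of the muparser criteria 'b1 > 1000?1:0'.
grown = pxl_grow_regions([[1, 0, 0, 0]], [[0, 1500, 1200, 500]], lambda b1: b1 > 1000)
print(grown)  # [[1, 1, 1, 0]]
```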
