RSGISLib Image Segmentation Module
The segmentation module contains the segmentation functionality for RSGISLib.
Segmentation requires a number of steps. For most users it is recommended to use the runShepherdSegmentation helper function, which runs all the required steps to generate a segmentation:
from rsgislib.segmentation import segutils
segutils.runShepherdSegmentation(inImage, outputClumps, sampling=100, kmMaxIter=200)
Where ‘inImage’ is the input image (optionally masked and stretched) and ‘outputClumps’ is the output clumps file.
More information about the segmentation method is available in the following paper:
Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658.
For the wider system of data analysis using segments see the following paper:
Daniel Clewley, Peter Bunting, James Shepherd, Sam Gillingham, Neil Flood, John Dymond, Richard Lucas, John Armston and Mahta Moghaddam. 2014. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing. Volume 6, Pages 6111-6135.
rsgislib.segmentation.segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200, processInMem=False, saveProcessStats=False, imgStretchStats='', kMeansCentres='', imgStatsJSONFile='')
Utility function to call the segmentation algorithm of Shepherd et al. (2019).
- Parameters
inputImg – is a string containing the name of the input file.
outputClumps – is a string containing the name of the output clump file.
outputMeanImg – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
numClusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specify the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
kmMaxIter – maximum iterations for KMeans (default = 200).
processInMem – is a bool which specifies that, where functions allow it, processing is performed in memory rather than on disk (default = False).
saveProcessStats – is a bool which specifies that the image stretch stats and the kMeans centre stats should be saved along with a header.
imgStretchStats – is a string providing the file name and path for the image stretch stats (Output).
kMeansCentres – is a string providing the file name and path for the KMeans clusters centres (don’t include file extension; .gmtxt will be added to the end) (Output).
imgStatsJSONFile – is a string providing the name and path of a JSON file storing the image spatial extent and imgStretchStats and kMeansCentres file paths for use by other commands (Output).
from rsgislib.segmentation import segutils

inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'
segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg, minPxls=100)
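The effect of the sampling parameter can be pictured with a minimal sketch in plain Python. It assumes the subsampling is a regular stride over the pixel stream (every sampling-th pixel is kept), which may differ from what RSGISLib does internally; the subsample helper is hypothetical and is not part of the library.

```python
def subsample(pixels, sampling=100):
    """Keep every `sampling`-th pixel; sampling=1 keeps all pixels.

    Assumption: a regular stride stands in for the library's subsampling.
    """
    if sampling < 1:
        raise ValueError("sampling must be >= 1")
    return pixels[::sampling]


pixels = list(range(1000))                 # stand-in for 1000 pixel values
sample = subsample(pixels, sampling=100)   # data that would reach the KMeans
print(len(sample))
```

A larger sampling value reduces the data passed to the clustering, trading some accuracy of the cluster centres for speed.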
rsgislib.segmentation.tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='segtmp', tileWidth=2000, tileHeight=2000, validDataThreshold=0.3, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200)
Utility function to call the segmentation algorithm of Shepherd et al. (2019) using the tiled process outlined in Clewley et al. (2014).
- Parameters
inputImage – is a string containing the name of the input file.
clumpsImage – is a string containing the name of the output clump file.
tmpDIR – is a file path for intermediate files (default is to create a directory ‘segtmp’). If the path does not currently exist, it will be created and then deleted afterwards.
tileWidth – is an int specifying the width of the tiles used for processing (Default 2000)
tileHeight – is an int specifying the height of the tiles used for processing (Default 2000)
validDataThreshold – is a float (between 0 and 1) specifying the minimum proportion of valid image pixels (i.e., not a no data value of zero) required within a tile. Tiles failing to meet this threshold are merged with ones which do (Default 0.3).
numClusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specify the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
kmMaxIter – maximum iterations for KMeans (Default 200).
from rsgislib.segmentation import tiledsegsingle

inputImage = 'LS5TM_20110428_sref_submask_osgb.kea'
clumpsImage = 'LS5TM_20110428_sref_submask_osgb_clumps.kea'
tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='./rsgislibsegtmp',
                                        tileWidth=2000, tileHeight=2000, validDataThreshold=0.3,
                                        numClusters=60, minPxls=100, distThres=100, bands=[4,5,3],
                                        sampling=100, kmMaxIter=200)
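To illustrate what validDataThreshold measures, the sketch below computes the proportion of valid (non-zero) pixels in a small tile and compares it with the threshold. The helper names valid_fraction and needs_merging are hypothetical, and the strict "less than" comparison is an assumption rather than the library's documented behaviour.

```python
def valid_fraction(tile, nodata=0):
    """Proportion of pixels in a 2-D tile (list of rows) that are not no-data."""
    total = sum(len(row) for row in tile)
    valid = sum(1 for row in tile for v in row if v != nodata)
    return valid / total if total else 0.0


def needs_merging(tile, valid_data_threshold=0.3):
    """True if the tile holds too little valid data to be processed alone.

    Assumption: a tile fails when its valid fraction is below the threshold.
    """
    return valid_fraction(tile) < valid_data_threshold


tile = [
    [0, 0, 0, 0],
    [0, 0, 7, 9],
    [0, 0, 0, 0],
]
print(valid_fraction(tile))   # 2 valid pixels out of 12
print(needs_merging(tile))
```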
rsgislib.segmentation.segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, minPxls=100, distThres=100, bands=None, processInMem=False)
Utility function to call the segmentation algorithm of Shepherd et al. (2019) using pre-calculated stretch stats and KMeans cluster centres.
- Parameters
inputImg – is a string containing the name of the input file.
outputClumps – is a string containing the name of the output clump file.
kMeansCentres – is a string providing the file name and path for the KMeans clusters centres (Input)
imgStretchStats – is a string providing the file name and path for the image stretch stats (Input - not required if noStretch=True)
outputMeanImg – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specify the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
processInMem – is a bool which specifies that, where functions allow it, processing is performed in memory rather than on disk (default = False).
from rsgislib.segmentation import segutils

inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'
kMeansCentres = 'jers1palsar_stack_kcentres.gmtxt'
imgStretchStats = 'jers1palsar_stack_stchstats.txt'
segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg, minPxls=100)
rsgislib.segmentation.segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=100, minPxlsStart=10, minPxlsStep=5, numOfMinPxlsSteps=20, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)
Utility function to call the segmentation algorithm of Shepherd et al. (2019) and to test a range of minimum segment sizes (minPxls).
- Parameters
inputImg – is a string containing the name of the input file
outputClumpsBase – is a string containing the base name of the output clump files
outStatsFile – is a string containing the name of the output CSV file with the image segmentation stats
outputMeanImgBase – is the base name of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default is KEA)
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images.
noStretch – is a bool which specifies that the input image bands should not be stretched.
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed.
numClusters – is an int which specifies the number of clusters within the KMeans clustering process
minPxlsStart – is an int which specifies the minimum number of pixels within a segment at the start of processing.
minPxlsStep – is an int which specifies the increment in the minimum number of pixels within a segment at each step.
numOfMinPxlsSteps – is an int which specifies the number of steps (i.e., tests) which are performed.
distThres – specifies the distance threshold for joining the segments (default is a very large value which turns off this option.).
bands – is an array providing a subset of image bands to use (default is None to use all bands)
sampling – specify the subsampling of the image for the data used within the KMeans (1 == no subsampling; default is 100)
kmMaxIter – maximum iterations for KMeans.
minNormV – is a floating point value (default = None).
maxNormV – default is None.
minNormMI – default is None.
maxNormMI – default is None.
from rsgislib.segmentation import segutils

inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_MinPxl'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_MinPxlMean'
tmpath = './OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsMinPxl.csv'
# Will test minimum number of pixels within an object from 5 to 100 with intervals of 5.
segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile,
                                               outputMeanImgBase=outputMeanImgBase, tmpath=tmpath,
                                               noStretch=True, numClusters=100, minPxlsStart=5,
                                               minPxlsStep=5, numOfMinPxlsSteps=20, minNormV=None,
                                               maxNormV=None, minNormMI=None, maxNormMI=None)
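The values covered by a start/step/number-of-steps sweep can be listed with a short sketch. It assumes each test uses start + i * step for i from 0 to the number of steps minus one, which matches the range stated in the example comment but is not confirmed against the library source; sweep_values is a hypothetical helper.

```python
def sweep_values(start, step, num_steps):
    """Values tested by a sweep, assuming value_i = start + i * step."""
    return [start + i * step for i in range(num_steps)]


# Same settings as the call above: minPxlsStart=5, minPxlsStep=5, numOfMinPxlsSteps=20
min_pxls_tested = sweep_values(start=5, step=5, num_steps=20)
print(min_pxls_tested[0], min_pxls_tested[-1], len(min_pxls_tested))
```

The same arithmetic applies to numClustersStart, numClustersStep and numOfClustersSteps when testing a range of cluster counts.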
rsgislib.segmentation.segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClustersStart=10, numClustersStep=10, numOfClustersSteps=10, minPxls=10, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, processInMem=False, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)
Utility function to call the segmentation algorithm of Shepherd et al. (2019) and to test a range of ‘k’ within the kMeans.
- Parameters
inputImg – is a string containing the name of the input file
outputClumpsBase – is a string containing the base name of the output clump files
outStatsFile – is a string containing the name of the output CSV file with the image segmentation stats
outputMeanImgBase – is the base name of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default is KEA)
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images.
noStretch – is a bool which specifies that the input image bands should not be stretched.
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed.
numClustersStart – is an int which specifies the number of clusters within the KMeans clustering to start the process
numClustersStep – is an int which specifies the number of clusters within the KMeans clustering added with each step
numOfClustersSteps – is an int which specifies the number of steps (i.e., tests) which are performed.
minPxls – is an int which specifies the minimum number of pixels within a segment.
distThres – specifies the distance threshold for joining the segments (default is a very large value which turns off this option.).
bands – is an array providing a subset of image bands to use (default is None to use all bands)
sampling – specify the subsampling of the image for the data used within the KMeans (1 == no subsampling; default is 100)
kmMaxIter – maximum iterations for KMeans.
processInMem – is a bool which specifies that, where functions allow it, processing is performed in memory rather than on disk.
minNormV – is a floating point value (default = None).
maxNormV – default is None.
minNormMI – default is None.
maxNormMI – default is None.
from rsgislib.segmentation import segutils

inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_Clumps'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_ClumpsMean'
tmpath = './OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsClumps.csv'
# Will test clump values from 10 to 200 with intervals of 10.
segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile,
                                              outputMeanImgBase=outputMeanImgBase, tmpath=tmpath,
                                              noStretch=True, numClustersStart=10, numClustersStep=10,
                                              numOfClustersSteps=20, minPxls=50, minNormV=None,
                                              maxNormV=None, minNormMI=None, maxNormMI=None)
(inputimage, outputimage, gdalformat, processinmemory, nodata, addPxlVal2Rat)
A function which clumps an input image (of int pixel data type) to identify connected independent sets of pixels.
- Parameters
inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
nodata – is None or float
addPxlVal2Rat – is a boolean specifying whether the pixel value (from inputimage) should be added as a RAT.
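The idea of clumping (giving each connected group of equal-valued pixels its own ID) can be sketched in plain Python as follows. This is an illustration only: it assumes 4-connectivity and that no-data pixels receive the ID 0, neither of which is confirmed by the text above, and the clump function here is a stand-in rather than the library call.

```python
from collections import deque


def clump(image, nodata=0):
    """Label connected regions of equal pixel value (4-connectivity assumed)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 = unlabelled / no-data
    next_id = 1
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == nodata or labels[r][c] != 0:
                continue
            value = image[r][c]
            labels[r][c] = next_id
            queue = deque([(r, c)])
            # Breadth-first flood fill over neighbours with the same value
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == 0
                            and image[ny][nx] == value):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return labels


image = [
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [3, 3, 1, 1],
]
clumps = clump(image)
for row in clumps:
    print(row)
```

Note that the two groups of pixels with value 1 are not connected, so they receive different clump IDs.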
(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')
Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.
- Parameters
inputImage – the input image to be clumped.
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
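How an image might be divided using the width and height parameters can be sketched as below. The sketch assumes a regular grid of non-overlapping tiles with smaller tiles at the right and bottom edges; the actual tiling used by RSGISLib may differ (for example by using overlaps), and tile_windows is a hypothetical helper.

```python
def tile_windows(img_width, img_height, width=2000, height=2000):
    """Return (x_offset, y_offset, x_size, y_size) for each tile of a regular grid."""
    windows = []
    for y in range(0, img_height, height):
        for x in range(0, img_width, width):
            x_size = min(width, img_width - x)    # edge tiles may be narrower
            y_size = min(height, img_height - y)  # edge tiles may be shorter
            windows.append((x, y, x_size, y_size))
    return windows


windows = tile_windows(5000, 3000)
print(len(windows))
print(windows[-1])
```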
(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)
Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.
- Parameters
inputImage – the input image to be clumped.
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
nCores – is an int specifying the number of cores to be used for clumping processing.
(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')
Clump the input image and union it with the reference image, using a tiled processing chain allowing large images to be clumped more quickly.
- Parameters
inputImage – the input image to be clumped.
refImg – the reference image which the union is undertaken with (typically an existing classification)
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)
Clump the input image and union it with the reference image, using a tiled processing chain allowing large images to be clumped more quickly.
- Parameters
inputImage – the input image to be clumped.
refImg – the reference image which the union is undertaken with (typically an existing classification)
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
nCores – is an int specifying the number of cores to be used for clumping processing.
(inputimage, outputimage, clustercenters, ignorezeros, gdalformat)
Labels image pixels with the ID of the nearest cluster centre.
- Parameters
inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
clustercenters – is a string containing the name of the cluster centre file
ignorezeros – is a bool
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
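Labelling pixels with the ID of the nearest cluster centre can be sketched as below. The sketch works on in-memory tuples rather than image files, and makes three assumptions that are not stated above: distance is Euclidean, IDs are 1-based in the order of the centres, and with ignore_zeros an all-zero pixel receives the label 0. The label_pixels function is hypothetical.

```python
import math


def label_pixels(pixels, centres, ignore_zeros=True):
    """Assign each pixel the 1-based index of its nearest cluster centre."""
    labels = []
    for px in pixels:
        if ignore_zeros and all(v == 0 for v in px):
            labels.append(0)          # assumption: all-zero pixels stay unlabelled
            continue
        dists = [math.dist(px, centre) for centre in centres]
        labels.append(dists.index(min(dists)) + 1)
    return labels


centres = [(10.0, 10.0), (50.0, 60.0), (200.0, 180.0)]   # two-band cluster centres
pixels = [(12, 9), (0, 0), (190, 170), (48, 70)]
labels = label_pixels(pixels, centres)
print(labels)
```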
(inputimage, outputimage, gdalformat, processinmemory)
Relabel clumps.
- Parameters
inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
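One common reason to relabel clumps is to make the IDs consecutive after some clumps have been merged or removed. The sketch below shows that idea only; it assumes 0 is kept as background and that new IDs are assigned in order of first appearance, which may not match the ordering the library uses. The relabel_clumps helper is hypothetical.

```python
def relabel_clumps(clumps):
    """Map clump IDs to 1..N in order of first appearance, keeping 0 as background."""
    mapping = {}
    relabelled = []
    for row in clumps:
        new_row = []
        for value in row:
            if value == 0:
                new_row.append(0)
                continue
            if value not in mapping:
                mapping[value] = len(mapping) + 1
            new_row.append(mapping[value])
        relabelled.append(new_row)
    return relabelled


clumps = [
    [7, 7, 0],
    [12, 7, 3],
    [12, 3, 3],
]
print(relabel_clumps(clumps))
```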
(inputimage, clumpsimage, outputimage, tempfile, gdalformat, processinmemory, ignorezeros)
Eliminates single pixels.
- Parameters
inputimage – is a string containing the name of the input file
clumpsimage – is a string containing the name of the clump file
outputimage – is a string containing the name of the output file
tempfile – is a string containing the name of the temporary file to use
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
ignorezeros – is a bool
(clumpsImage, outputImage, threshold, gdalformat)
A function to remove small clumps and set them to a value of 0 (i.e., no data).
- Parameters
clumpsImage – is a string containing the name of the input clumps file - note that a column called ‘Histogram’ is expected.
outputImage – is a string containing the name of the output clumps file
threshold – is a float containing the area threshold (in pixels)
gdalformat – is a string defining the format of the output image.
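The removal of small clumps can be sketched with a pixel-count histogram, analogous to the ‘Histogram’ column. The sketch assumes clumps with fewer pixels than the threshold are removed (whether the comparison is strict is not stated above) and works on nested lists rather than image files; remove_small_clumps is a hypothetical helper.

```python
from collections import Counter


def remove_small_clumps(clumps, threshold):
    """Set clumps with fewer than `threshold` pixels to 0 (no data)."""
    histogram = Counter(v for row in clumps for v in row if v != 0)
    return [
        [v if v != 0 and histogram[v] >= threshold else 0 for v in row]
        for row in clumps
    ]


clumps = [
    [1, 1, 2],
    [1, 1, 3],
    [4, 4, 3],
]
print(remove_small_clumps(clumps, threshold=2))
```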
(inputimage, clumpsimage, outputimage, gdalformat, stretchstatsavail, stretchstatsfile, storemean, processinmemory, minclumpsize, specThreshold)
Eliminate clumps smaller than a given size from the scene. Small clumps are combined with their spectrally closest neighbouring clump in a stepwise fashion, unless the spectral distance exceeds the threshold.
- Parameters
inputimage – is a string containing the name of the input file
clumpsimage – is a string containing the name of the clump file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
stretchstatsavail – is a bool
stretchstatsfile – is a string containing the name of the stretch stats file
storemean – is a bool
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
minclumpsize – is an unsigned integer providing the minimum size for clumps.
specThreshold – is a float providing the maximum (Euclidean distance) spectral separation for which to merge clumps. Set to a large value to ignore spectral separation and always merge.
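The elimination idea can be sketched on a tiny single-band example. This is a simplified illustration under stated assumptions, not the library's algorithm: it uses 4-connectivity, a single band (so spectral distance is an absolute difference of clump means), processes the smallest clump first, and never revisits a clump that had no neighbour within the threshold. The function eliminate_small_clumps is hypothetical.

```python
def eliminate_small_clumps(values, clumps, min_size, spec_threshold):
    """Merge clumps smaller than min_size into their spectrally closest neighbour."""
    clumps = [row[:] for row in clumps]           # work on a copy
    rows, cols = len(clumps), len(clumps[0])
    skipped = set()                               # small clumps that could not be merged

    while True:
        # Collect pixel positions and mean value for every clump
        members = {}
        for r in range(rows):
            for c in range(cols):
                members.setdefault(clumps[r][c], []).append((r, c))
        means = {cid: sum(values[r][c] for r, c in px) / len(px)
                 for cid, px in members.items()}

        small = [cid for cid, px in members.items()
                 if len(px) < min_size and cid not in skipped]
        if not small:
            return clumps
        cid = min(small, key=lambda k: (len(members[k]), k))   # smallest first

        # Find neighbouring clumps (4-connectivity assumed)
        neighbours = set()
        for r, c in members[cid]:
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and clumps[nr][nc] != cid:
                    neighbours.add(clumps[nr][nc])

        best = min(sorted(neighbours),
                   key=lambda n: abs(means[n] - means[cid]), default=None)
        if best is None or abs(means[best] - means[cid]) > spec_threshold:
            skipped.add(cid)                      # too different: leave as it is
            continue
        for r, c in members[cid]:
            clumps[r][c] = best                   # merge into the closest neighbour


values = [
    [10, 10, 11, 50],
    [10, 12, 20, 50],
    [10, 10, 50, 50],
]
clumps = [
    [1, 1, 1, 2],
    [1, 1, 3, 2],
    [1, 1, 2, 2],
]
merged = eliminate_small_clumps(values, clumps, min_size=2, spec_threshold=100)
for row in merged:
    print(row)
```

With a small spec_threshold (for example 5) the single-pixel clump is left untouched, because its mean differs from both neighbours by more than the threshold.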
Join / Union
(outputimage, gdalformat, inputimagepaths, nodata, addPxlVals2Rat)
The function takes the union of clumps images, combining them so that all lines from all clumps are preserved in the new output clumps image.
- Parameters
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
inputimagepaths – is a list of input image paths
nodata – is None or float
addPxlVals2Rat – is a boolean specifying whether the pixel values (from inputimagepaths) should be added as a RAT; column names have prefix ‘ClumpVal_’ with index starting at 1 for each variable.
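A union of clumps images can be pictured as assigning a new ID to every distinct combination of input clump values, so that a boundary present in any input survives in the output. The sketch below shows this on nested lists; it assumes a pixel is treated as no data only when every input is no data, and it does not check spatial connectivity, so it should be read as an illustration rather than the library's behaviour. The union_of_clumps helper is hypothetical.

```python
def union_of_clumps(clump_images, nodata=0):
    """Give each distinct combination of input clump values its own output ID."""
    rows, cols = len(clump_images[0]), len(clump_images[0][0])
    ids = {}                                      # (val_1, val_2, ...) -> new clump ID
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            key = tuple(img[r][c] for img in clump_images)
            if all(v == nodata for v in key):
                continue                          # assumption: stays no data
            if key not in ids:
                ids[key] = len(ids) + 1
            out[r][c] = ids[key]
    return out, ids


a = [[1, 1], [2, 2]]    # split into top and bottom
b = [[1, 2], [1, 2]]    # split into left and right
union, ids = union_of_clumps([a, b])
print(union)
print(ids)
```

The ids mapping retains the original value from each input for every output clump, which is the kind of information the ‘ClumpVal_’ columns are described as holding.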
(inputImage, inputClumps, outputImage, gdalformat, datatype)
A function to generate an image with the mean value for each clump. Primarily for visualisation and evaluating segmentation.
- Parameters
inputImage – is a string containing the name of the input image file from which the mean is taken.
inputClumps – is a string containing the name of the input clumps file
outputImage – is a string containing the name of the output image.
gdalformat – is a string defining the format of the output image.
datatype – is an int containing one of the values from rsgislib.TYPE_*
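The mean image idea can be sketched for a single band held in nested lists: every pixel is replaced by the mean of the input values within its clump. The mean_image function below is a hypothetical stand-in that ignores file handling, output data types and multiple bands.

```python
def mean_image(values, clumps):
    """Replace each pixel with the mean input value of the clump it belongs to."""
    sums, counts = {}, {}
    for value_row, clump_row in zip(values, clumps):
        for value, cid in zip(value_row, clump_row):
            sums[cid] = sums.get(cid, 0.0) + value
            counts[cid] = counts.get(cid, 0) + 1
    means = {cid: sums[cid] / counts[cid] for cid in sums}
    return [[means[cid] for cid in clump_row] for clump_row in clumps]


values = [
    [10, 20, 90],
    [30, 40, 70],
]
clumps = [
    [1, 1, 2],
    [1, 1, 2],
]
print(mean_image(values, clumps))
```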
(outputimage, bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)
Merge body clumps from tile segmentations into the output file.
- Parameters
outputimage – is a string containing the name of the output file
bordermaskimage – is a string containing the name of the border mask file
tileboundary – is an unsigned integer containing the tile boundary pixel value
tileoverlap – is an unsigned integer containing the tile overlap pixel value
tilebody – is an unsigned integer containing the tile body pixel value
colsname – is a string containing the name of the object id column
inputimagepaths – is a list of input image paths
(imgs)
Clump an image with values provided as an array, for use within a multiprocessing Pool.
(imgs)
Union clump an image with values provided as an array, for use within a multiprocessing Pool.
(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, scale=1, sigma=0.8, min_size=20)
A function to perform the Felsenszwalb segmentation algorithm from the scikit-image library.
- Parameters
inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
nPCABands – the number of principal component outputs from the PCA - needs to be either 1 or 3.
scale – scikit-image Felsenszwalb parameter: ‘Free parameter. Higher means larger clusters.’
sigma – scikit-image Felsenszwalb parameter: ‘Width of Gaussian kernel used in preprocessing.’
min_size – scikit-image Felsenszwalb parameter: ‘Minimum component size. Enforced using postprocessing.’
(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, pcaPxlSample=100, ratio=1.0, kernel_size=5, max_dist=10, sigma=0, convert2lab=True, random_seed=42)
A function to perform the quickshift segmentation algorithm from the scikit-image library.
- Parameters
inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 3 image bands in the input file then you can use PCA to reduce the number of image bands.
ratio – scikit-image Quickshift parameter: ‘Balances color-space proximity and image-space proximity. Higher values give more weight to color-space. (between 0 and 1)’
kernel_size – scikit-image Quickshift parameter: ‘Width of Gaussian kernel used in smoothing the sample density. Higher means fewer clusters.’
max_dist – scikit-image Quickshift parameter: ‘Cut-off point for data distances. Higher means fewer clusters.’
sigma – scikit-image Quickshift parameter: ‘Width for Gaussian smoothing as preprocessing. Zero means no smoothing.’
convert2lab – scikit-image Quickshift parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. For this purpose, the input is assumed to be RGB.’
random_seed – scikit-image Quickshift parameter: ‘Random seed used for breaking ties.’
(inputImg, markersImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, beta=130, mode='bf', tol=0.001, spacing=None)¶ A function to perform the random walker segmentation algorithm from the scikit-image library.
- Parameters
inputImg – input image file.
markersImg – input markers image file - markers must be uniquely numbered.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.
beta – scikit-image random_walker parameter: ‘Penalization coefficient for the random walker motion (the greater beta, the more difficult the diffusion).’
mode – scikit-image random_walker parameter: ‘Mode for solving the linear system in the random walker algorithm. Available options {‘cg_mg’, ‘cg’, ‘bf’}.’
* ‘bf’ (brute force): an LU factorization of the Laplacian is computed. This is fast for small images (<1024x1024), but very slow and memory-intensive for large images (e.g., 3-D volumes).
* ‘cg’ (conjugate gradient): the linear system is solved iteratively using the Conjugate Gradient method from scipy.sparse.linalg. This is less memory-consuming than the brute force method for large images, but it is quite slow.
* ‘cg_mg’ (conjugate gradient with multigrid preconditioner): a preconditioner is computed using a multigrid solver, then the solution is computed with the Conjugate Gradient method. This mode requires that the pyamg module is installed. For images of size > 512x512, this is the recommended (fastest) mode.
tol – scikit-image random_walker parameter: ‘Tolerance to achieve when solving the linear system, in ‘cg’ and ‘cg_mg’ modes.’
spacing – scikit-image random_walker parameter: ‘Spacing between voxels in each spatial dimension. If None, then the spacing between pixels/voxels in each dimension is assumed 1.’
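The guidance quoted for the mode parameter can be turned into a rough rule of thumb. The function below is a hypothetical helper, not part of RSGISLib or scikit-image; the thresholds are taken from the quoted text, and the order in which they are applied is an assumption.

```python
def suggest_random_walker_mode(width, height, pyamg_available):
    """Suggest a value for 'mode' from the image size (hypothetical heuristic)."""
    n_pixels = width * height
    if pyamg_available and n_pixels > 512 * 512:
        # Recommended (fastest) mode for larger images, but it needs pyamg.
        return "cg_mg"
    if n_pixels < 1024 * 1024:
        # Brute force is fast for small images.
        return "bf"
    # Large image without pyamg: slower, but less memory-hungry than 'bf'.
    return "cg"


print(suggest_random_walker_mode(256, 256, pyamg_available=False))
print(suggest_random_walker_mode(2000, 2000, pyamg_available=True))
```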
(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, n_segments=100, compactness=10.0, max_iter=10, sigma=0, spacing=None, convert2lab=None, enforce_connectivity=True, min_size_factor=0.5, max_size_factor=3, slic_zero=False)¶ A function to perform the SLIC segmentation algorithm from the scikit-image library.
- Parameters
inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.
n_segments – scikit-image Slic parameter: ‘The (approximate) number of labels in the segmented output image.’
compactness – scikit-image Slic parameter: ‘Balances color proximity and space proximity. Higher values give more weight to space proximity, making superpixel shapes more square/cubic. In SLICO mode, this is the initial compactness. This parameter depends strongly on image contrast and on the shapes of objects in the image. We recommend exploring possible values on a log scale, e.g., 0.01, 0.1, 1, 10, 100, before refining around a chosen value.’
max_iter – scikit-image Slic parameter: ‘Maximum number of iterations of k-means.’
sigma – scikit-image Slic parameter: ‘Width of Gaussian smoothing kernel for pre-processing for each dimension of the image. The same sigma is applied to each dimension in case of a scalar value. Zero means no smoothing. Note, that sigma is automatically scaled if it is scalar and a manual voxel spacing is provided (see Notes section).’
spacing – scikit-image Slic parameter: ‘The voxel spacing along each image dimension. By default, slic assumes uniform spacing (same voxel resolution along z, y and x). This parameter controls the weights of the distances along z, y, and x during k-means clustering.’
convert2lab – scikit-image Slic parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. The input image must be RGB. Highly recommended.’
enforce_connectivity – scikit-image Slic parameter: ‘Whether the generated segments are connected or not’
min_size_factor – scikit-image Slic parameter: ‘Proportion of the minimum segment size to be removed with respect to the supposed segment size “depth*width*height/n_segments”’
max_size_factor – scikit-image Slic parameter: ‘Proportion of the maximum connected segment size. A value of 3 works in most of the cases.’
slic_zero – scikit-image Slic parameter: ‘Run SLIC-zero, the zero-parameter mode of SLIC.’
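Two of the SLIC parameters lend themselves to a quick calculation before running a segmentation: the log-scale search suggested for compactness, and the minimum segment size implied by min_size_factor. The sketch below uses hypothetical helper names and assumes a 2-D image, so the supposed segment size is width*height/n_segments.

```python
def compactness_candidates(low_exp=-2, high_exp=2):
    """Compactness values on a log scale, e.g. 0.01, 0.1, 1, 10, 100."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def min_segment_size(width, height, n_segments, min_size_factor=0.5):
    """Segment size (in pixels) below which segments are removed."""
    # Supposed segment size for a 2-D image (assumption: depth is 1).
    supposed_size = width * height / n_segments
    return min_size_factor * supposed_size


print(compactness_candidates())
print(min_segment_size(1000, 1000, n_segments=100))
```

Each candidate compactness could then be tried in turn, refining around whichever value gives the most useful segments.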
(inputImg, markersImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, compactness=0, watershed_line=False)¶ A function to perform the watershed segmentation algorithm from the scikit-image library.
- Parameters
inputImg – input image file.
markersImg – input markers image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.
compactness – scikit-image Watershed parameter: ‘Use compact watershed with given compactness parameter. Higher values result in more regularly-shaped watershed basins; Peer Neubert & Peter Protzel (2014). Compact Watershed and Preemptive SLIC: On Improving Trade-offs of Superpixel Segmentation Algorithms. ICPR 2014’
watershed_line – scikit-image Watershed parameter: ‘If watershed_line is True, a one-pixel wide line separates the regions obtained by the watershed algorithm. The line has the label 0.’
(inputImage, outputClumps, gdalformat, numXPxls, numYPxls, offset)¶ A function to generate a clumps image made up of a regular grid of cells, with the cell size specified in pixels.
- Parameters
inputImage – is a string containing the name of the input image file specifying the dimensions of the output image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
numXPxls – is the size of the grid cells in the X axis in pixel units.
numYPxls – is the size of the grid cells in the Y axis in pixel units.
offset – is a boolean specifying whether the grid should be offset, i.e., whether it starts at the half-way point of numXPxls and numYPxls (Default is False; optional)
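The effect of numXPxls, numYPxls and offset can be illustrated on a small in-memory grid. This is a minimal sketch, not the RSGISLib implementation: the function name is hypothetical, and the row-major numbering of cells from 1 and the use of integer half-cell shifts are assumptions.

```python
def regular_grid_clumps(width, height, num_x_pxls, num_y_pxls, offset=False):
    """Return a 2-D list of clump IDs forming a regular grid (hypothetical helper)."""
    # With offset=True the grid starts half a cell in, so the first
    # row and column of cells are half-sized.
    x_shift = num_x_pxls // 2 if offset else 0
    y_shift = num_y_pxls // 2 if offset else 0
    # Number of grid columns needed to cover the (shifted) image width.
    n_cols = (width + x_shift + num_x_pxls - 1) // num_x_pxls
    grid = []
    for y in range(height):
        row_idx = (y + y_shift) // num_y_pxls
        row = []
        for x in range(width):
            col_idx = (x + x_shift) // num_x_pxls
            # Assumption: IDs are assigned row-major, starting at 1.
            row.append(row_idx * n_cols + col_idx + 1)
        grid.append(row)
    return grid


for line in regular_grid_clumps(4, 4, 2, 2):
    print(line)
```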
(clumpsImage, outputClumps, gdalformat, selectClumpsCol)¶ A function to drop the selected clumps from the segmentation.
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
selectClumpsCol – is a string defining the binary column for defining the segments to be dropped (1 == selected clumps).
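The role of the binary selection column can be shown with plain Python structures. In this sketch the clumps image is a 2-D list and the column is a dictionary from clump ID to 0 or 1; the function name is hypothetical, and writing 0 for dropped clumps is an assumption about what dropping means.

```python
def drop_selected_clumps(clumps, select_col):
    """Set every pixel of a selected clump (flag == 1) to 0 (hypothetical helper)."""
    return [
        [0 if select_col.get(cid, 0) == 1 else cid for cid in row]
        for row in clumps
    ]


clumps = [[1, 1, 2],
          [3, 2, 2]]
select_col = {1: 0, 2: 1, 3: 0}  # clump 2 is selected for removal
print(drop_selected_clumps(clumps, select_col))
```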
(bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)¶ Mask tile borders
- Parameters
bordermaskimage – is a string containing the name of the border mask file
tileboundary – is an unsigned integer containing the tile boundary pixel value
tileoverlap – is an unsigned integer containing the tile overlap pixel value
tilebody – is an unsigned integer containing the tile body pixel value
colsname – is a string containing the name of the object id column
inputimagepaths – is a list of input clump image paths
(clumpsImage, regionsImage, outputClumps, gdalformat)¶ A function to include a set of clumped regions within an existing clumps (i.e., segmentation) image. Note: you should run the relabelClumps function on the output of this command before using it further.
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
regionsImage – is a string containing the filepath for the input regions image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
(inputimagepaths, outputimage, mergeRATs)¶ Merge all clumps from tile segmentations into outputfile
- Parameters
inputimagepaths – is a list of input image paths
outputimage – is a string containing the name of the output file
mergeRATs – is a boolean specifying whether the image RATs are to be merged (Default: False; optional)
(clumpsImage, outputClumps, gdalformat, valClumpsCols)¶ A function to merge neighbouring clumps which have the same value - for example when merging across tile boundaries.
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
valClumpsCols – is a list of strings defining the value(s) used to define equivalence (typically it might be the original pixel values when clumping through tiling).
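Merging neighbouring clumps that share the same value can be sketched with a union-find structure. This is an illustration under stated assumptions rather than the RSGISLib implementation: the function name is hypothetical, clumps are a 2-D list with 0 as background, the values are held in a dictionary keyed by clump ID, and neighbours are taken to be 4-connected.

```python
def merge_equivalent_clumps(clumps, values):
    """Merge touching clumps whose values are equal (hypothetical helper)."""
    parent = {}

    def find(c):
        # Find the representative ID of c, shortening the path as we go.
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as the representative.
            parent[max(ra, rb)] = min(ra, rb)

    height, width = len(clumps), len(clumps[0])
    for y in range(height):
        for x in range(width):
            a = clumps[y][x]
            if a == 0:
                continue
            # Checking the right and lower neighbours covers every 4-connected pair.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width:
                    b = clumps[ny][nx]
                    if b != 0 and b != a and values[a] == values[b]:
                        union(a, b)
    return [[find(c) if c != 0 else 0 for c in row] for row in clumps]


clumps = [[1, 1, 2],
          [3, 3, 2],
          [3, 4, 4]]
values = {1: (10,), 2: (10,), 3: (20,), 4: (10,)}
print(merge_equivalent_clumps(clumps, values))
```

The merged IDs are no longer consecutive, so a relabelling step would normally follow.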
(clumpsImage, spectralImage, outputClumps, gdalformat, selectedClumpsCol, noDataClumpsCol)¶ A function to merge selected clumps with their neighbours based on colour (spectral) distance, where clumps identified as no data are ignored.
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
spectralImage – is a string containing the filepath for the input image used to define ‘distance’.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
selectedClumpsCol – is a string defining the binary column for defining the segments to be merged (1 == selected clumps).
noDataClumpsCol – is a string defining the binary column for defining the segments to be ignored as no data (1 == no-data clumps).
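The idea of merging a selected clump into its spectrally closest neighbour, while ignoring no-data clumps, can be sketched as follows. All names here are hypothetical, and several details are assumptions: Euclidean distance between per-clump mean spectra, 4-connected neighbours, a single pass, and selected clumps never being used as merge targets.

```python
import math


def merge_selected_to_neighbours(clumps, mean_spectra, selected, no_data):
    """Relabel each selected clump with its closest valid neighbour (sketch)."""
    height, width = len(clumps), len(clumps[0])
    # Build the clump adjacency from right and lower pixel neighbours.
    neighbours = {}
    for y in range(height):
        for x in range(width):
            a = clumps[y][x]
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width:
                    b = clumps[ny][nx]
                    if a != b:
                        neighbours.setdefault(a, set()).add(b)
                        neighbours.setdefault(b, set()).add(a)
    target = {}
    for cid in selected:
        # No-data clumps and other selected clumps are not valid targets.
        candidates = [n for n in neighbours.get(cid, ())
                      if n not in no_data and n not in selected]
        if candidates:
            target[cid] = min(
                candidates,
                key=lambda n: math.dist(mean_spectra[cid], mean_spectra[n]),
            )
    return [[target.get(c, c) for c in row] for row in clumps]


clumps = [[1, 1, 2, 3, 3],
          [4, 4, 2, 3, 3]]
mean_spectra = {1: (10.0, 10.0), 2: (12.0, 11.0), 3: (30.0, 30.0), 4: (12.0, 11.0)}
print(merge_selected_to_neighbours(clumps, mean_spectra, selected={2}, no_data={4}))
```

In this example clump 4 has the same spectrum as clump 2 but is flagged as no data, so clump 2 is merged into clump 1 instead.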
(clumpsImage, valsImage, outputImage, gdalformat, muParseCriteria, varNameBandPairs)¶ A function to grow the regions in a clumps image, pixel by pixel, into neighbouring pixels that meet a muparser criteria evaluated on the values image.
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
valsImage – is a string containing the file path for the values (criteria) image.
outputImage – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
muParseCriteria – is a string with a muparser criteria (e.g., b1 < 20?1:0). Expression output must be 0 or 1 (1 for True).
varNameBandPairs – is a list of pairs specifying the variable name (in the muparser expression) and the band number to which it refers in valsImage (note band numbers start at 1).
import collections
import rsgislib.segmentation
# Pair each muparser variable name with the band of valsImage it refers to.
varBandPair = collections.namedtuple('VarBandPair', ['varName', 'bandIndex'])
varBandPairSeq = list()
varBandPairSeq.append(varBandPair(varName='b1', bandIndex=1))
muParseCriteria = 'b1 > 1000?1:0'
# The first three arguments are file paths that must be defined beforehand.
rsgislib.segmentation.pxlGrowRegions(tmpInitClearSkyRegionsFinal, tmpCloudsImgDist2CloudsNoData, tmpClearSkyRegionsGrow, 'KEA', muParseCriteria, varBandPairSeq)
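Pixel-based region growing of this kind can be illustrated without any imagery. The sketch below is not the RSGISLib implementation: the function name is hypothetical, a Python callable stands in for the muparser expression, unassigned pixels are assumed to have the value 0 in the clumps grid, and growth is assumed to proceed breadth-first through 4-connected neighbours.

```python
from collections import deque


def grow_regions(clumps, values, criteria):
    """Grow labelled regions into unlabelled pixels that meet the criteria."""
    height, width = len(clumps), len(clumps[0])
    out = [row[:] for row in clumps]
    # Start from every pixel that already belongs to a region.
    queue = deque((y, x) for y in range(height) for x in range(width)
                  if out[y][x] != 0)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width and out[ny][nx] == 0:
                # The criteria must return 1 (True) or 0 (False).
                if criteria(values[ny][nx]) == 1:
                    out[ny][nx] = out[y][x]
                    queue.append((ny, nx))
    return out


clumps = [[1, 0, 0, 0],
          [0, 0, 0, 2]]
values = [[5, 1500, 1200, 900],
          [800, 2000, 500, 3000]]
# Python stand-in for the muparser expression 'b1 > 1000?1:0'.
grown = grow_regions(clumps, values, lambda b1: 1 if b1 > 1000 else 0)
print(grown)
```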