RSGISLib Image Segmentation Module
The segmentation module contains the segmentation functionality for RSGISLib.
Segmentation requires a number of steps. Most users should use the runShepherdSegmentation helper function, which runs all the steps required to generate a segmentation:
Example:
from rsgislib.segmentation import segutils
segutils.runShepherdSegmentation(inImage,
                                 outputClumps,
                                 tmpath='./',
                                 numClusters=60,
                                 minPxls=100,
                                 distThres=100,
                                 sampling=100, kmMaxIter=200)
Where ‘inImage’ is the input image (optionally masked and stretched) and ‘outputClumps’ is the output clumps file.
More information about the segmentation method is available in the following paper:
Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658
For the wider system of data analysis using segments see the following paper:
Daniel Clewley, Peter Bunting, James Shepherd, Sam Gillingham, Neil Flood, John Dymond, Richard Lucas, John Armston and Mahta Moghaddam. 2014. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing. Volume 6, Pages 6111-6135. http://www.mdpi.com/2072-4292/6/7/6111
Utilities
- rsgislib.segmentation.segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200, processInMem=False, saveProcessStats=False, imgStretchStats='', kMeansCentres='', imgStatsJSONFile='')
Utility function to call the segmentation algorithm of Shepherd et al. (2019).
Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658
Where:
- Parameters
inputImg – is a string containing the name of the input file.
outputClumps – is a string containing the name of the output clump file.
outputMeanImg – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
numClusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
kmMaxIter – is the maximum number of iterations for KMeans.
processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk.
saveProcessStats – is a bool which specifies that the image stretch stats and the kMeans centre stats should be saved along with a header.
imgStretchStats – is a string providing the file name and path for the image stretch stats (Output).
kMeansCentres – is a string providing the file name and path for the KMeans clusters centres (don’t include file extension; .gmtxt will be added to the end) (Output).
imgStatsJSONFile – is a string providing the name and path of a JSON file storing the image spatial extent and imgStretchStats and kMeansCentres file paths for use by other commands (Output).
Example:
from rsgislib.segmentation import segutils
inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'
segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg, minPxls=100)
- rsgislib.segmentation.tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='segtmp', tileWidth=2000, tileHeight=2000, validDataThreshold=0.3, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200)
Utility function to call the segmentation algorithm of Shepherd et al. (2019) using the tiled process outlined in Clewley et al. (2014).
- Parameters
inputImage – is a string containing the name of the input file.
clumpsImage – is a string containing the name of the output clump file.
tmpDIR – is a file path for intermediate files (default is to create a directory ‘segtmp’). If the path does not currently exist then it will be created and deleted afterwards.
tileWidth – is an int specifying the width of the tiles used for processing (Default 2000).
tileHeight – is an int specifying the height of the tiles used for processing (Default 2000).
validDataThreshold – is a float (value between 0 - 1) specifying the proportion of pixels within a tile which must be valid (i.e., not the no data value of zero). Tiles failing to meet this threshold are merged with ones which do (Default 0.3).
numClusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
kmMaxIter – is the maximum number of iterations for KMeans (Default 200).
Example:
from rsgislib.segmentation import tiledsegsingle
inputImage = 'LS5TM_20110428_sref_submask_osgb.kea'
clumpsImage = 'LS5TM_20110428_sref_submask_osgb_clumps.kea'
tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='./rsgislibsegtmp',
                                        tileWidth=2000, tileHeight=2000, validDataThreshold=0.3,
                                        numClusters=60, minPxls=100, distThres=100,
                                        bands=[4,5,3], sampling=100, kmMaxIter=200)
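The tileWidth, tileHeight and validDataThreshold parameters can be illustrated with a small pure-Python sketch. This is not the RSGISLib implementation: the helper names tile_windows and valid_fraction are hypothetical, and the layout of tiles (a regular grid starting at the top-left corner, with no overlap) is an assumption made for illustration only.

```python
def tile_windows(img_width, img_height, tile_width=2000, tile_height=2000):
    """Yield (x_off, y_off, width, height) windows on a regular grid.

    Assumption: tiles start at the top-left corner and edge tiles are clipped.
    """
    for y_off in range(0, img_height, tile_height):
        for x_off in range(0, img_width, tile_width):
            yield (x_off, y_off,
                   min(tile_width, img_width - x_off),
                   min(tile_height, img_height - y_off))


def valid_fraction(tile, nodata=0):
    """Proportion of pixels in a tile (a list of rows) not equal to nodata."""
    total = sum(len(row) for row in tile)
    valid = sum(1 for row in tile for value in row if value != nodata)
    return valid / total if total else 0.0


# A 4500 x 3000 pixel image split into 2000 x 2000 tiles.
windows = list(tile_windows(4500, 3000))
print(len(windows), windows[-1])

# A tiny tile where 3 of 8 pixels are valid; compare with the 0.3 threshold.
tile = [[0, 0, 5, 7],
        [0, 0, 0, 9]]
print(valid_fraction(tile) >= 0.3)
```

In this sketch a tile whose valid fraction falls below the threshold would be a candidate for merging with a neighbouring tile; how that merge is carried out is left to the library.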
- rsgislib.segmentation.segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, minPxls=100, distThres=100, bands=None, processInMem=False)
Utility function to call the segmentation algorithm of Shepherd et al. (2019) using pre-calculated stretch stats and KMeans cluster centres.
Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658
Where:
- Parameters
inputImg – is a string containing the name of the input file.
outputClumps – is a string containing the name of the output clump file.
kMeansCentres – is a string providing the file name and path for the KMeans clusters centres (Input)
imgStretchStats – is a string providing the file name and path for the image stretch stats (Input - not required if noStretch=True)
outputMeanImg – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).
noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk.
Example:
from rsgislib.segmentation import segutils
inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'
kMeansCentres = 'jers1palsar_stack_kcentres.gmtxt'
imgStretchStats = 'jers1palsar_stack_stchstats.txt'
segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres,
                                              imgStretchStats, outputMeanImg, minPxls=100)
- rsgislib.segmentation.segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=100, minPxlsStart=10, minPxlsStep=5, numOfMinPxlsSteps=20, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)
Utility function to call the segmentation algorithm of Shepherd et al. (2019) and to test a range of minimum object sizes (minPxls).
Where:
- Parameters
inputImg – is a string containing the name of the input file.
outputClumpsBase – is a string containing the base name of the output clump files.
outStatsFile – is a string containing the name of the output CSV file with the image segmentation stats.
outputMeanImgBase – is the base name of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default is KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images.
noStretch – is a bool which specifies that the input image bands should not be stretched.
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed.
numClusters – is an int which specifies the number of clusters within the KMeans clustering process.
minPxlsStart – is an int which specifies the minimum number of pixels within a segment at the start of processing.
minPxlsStep – is an int which specifies the increment in the minimum number of pixels within a segment at each step.
numOfMinPxlsSteps – is an int which specifies the number of steps (i.e., tests) which are performed.
distThres – specifies the distance threshold for joining the segments (default is a very large value which turns off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (1 == no subsampling; default is 100).
kmMaxIter – is the maximum number of iterations for KMeans.
minNormV – is a float (default is None).
maxNormV – default is None.
minNormMI – default is None.
maxNormMI – default is None.
Example:
from rsgislib.segmentation import segutils
inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_MinPxl'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_MinPxlMean'
tmpath='./OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsMinPxl.csv'
# Will test minimum number of pixels within an object from 5 to 100 with intervals of 5.
segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile,
                                               outputMeanImgBase=outputMeanImgBase, tmpath=tmpath,
                                               noStretch=True, numClusters=100, minPxlsStart=5,
                                               minPxlsStep=5, numOfMinPxlsSteps=20, minNormV=None,
                                               maxNormV=None, minNormMI=None, maxNormMI=None)
- rsgislib.segmentation.segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClustersStart=10, numClustersStep=10, numOfClustersSteps=10, minPxls=10, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, processInMem=False, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)
Utility function to call the segmentation algorithm of Shepherd et al. (2019) and to test a range of ‘k’ within the KMeans.
Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658
Where:
- Parameters
inputImg – is a string containing the name of the input file.
outputClumpsBase – is a string containing the base name of the output clump files.
outStatsFile – is a string containing the name of the output CSV file with the image segmentation stats.
outputMeanImgBase – is the base name of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmpath – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default is KEA).
noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images.
noStretch – is a bool which specifies that the input image bands should not be stretched.
noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed.
numClustersStart – is an int which specifies the number of clusters within the KMeans clustering to start the process.
numClustersStep – is an int which specifies the number of clusters within the KMeans clustering added with each step.
numOfClustersSteps – is an int which specifies the number of steps (i.e., tests) which are performed.
minPxls – is an int which specifies the minimum number of pixels within a segment.
distThres – specifies the distance threshold for joining the segments (default is a very large value which turns off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specifies the subsampling of the image for the data used within the KMeans (1 == no subsampling; default is 100).
kmMaxIter – is the maximum number of iterations for KMeans.
processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk.
minNormV – is a float (default is None).
maxNormV – default is None.
minNormMI – default is None.
maxNormMI – default is None.
Example:
from rsgislib.segmentation import segutils
inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_Clumps'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_ClumpsMean'
tmpath='./OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsClumps.csv'
# Will test clump values from 10 to 200 with intervals of 10.
segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile,
                                              outputMeanImgBase=outputMeanImgBase, tmpath=tmpath,
                                              noStretch=True, numClustersStart=10, numClustersStep=10,
                                              numOfClustersSteps=20, minPxls=50, minNormV=None,
                                              maxNormV=None, minNormMI=None, maxNormMI=None)
Clump
- rsgislib.segmentation.clump(inputimage, outputimage, gdalformat, processinmemory, nodata, addPxlVal2Rat)
A function which clumps an input image (of int pixel data type) to identify connected independent sets of pixels.
Where:
- Parameters
inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
nodata – is None or float
addPxlVal2Rat – is a boolean specifying whether the pixel value (from inputimage) should be added as a RAT.
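To make "connected independent sets of pixels" concrete, the following is a minimal pure-Python sketch of clumping on a small grid. It is not the RSGISLib implementation: clump_image is a hypothetical helper, the use of 4-connectivity is an assumption, and output IDs are simply assigned in scan order.

```python
from collections import deque


def clump_image(image, nodata=None):
    """Label connected regions of equal pixel value (4-connectivity assumed).

    `image` is a list of rows of ints. Pixels equal to `nodata` keep ID 0.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    next_id = 1
    for r in range(rows):
        for c in range(cols):
            if out[r][c] != 0 or image[r][c] == nodata:
                continue
            value = image[r][c]
            out[r][c] = next_id
            queue = deque([(r, c)])
            # Breadth-first flood fill over neighbours with the same value.
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and out[ny][nx] == 0
                            and image[ny][nx] == value):
                        out[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return out


image = [[1, 1, 2],
         [2, 1, 2],
         [2, 2, 1]]
for row in clump_image(image):
    print(row)
```

Note that the two regions with value 2 and the two regions with value 1 receive different IDs because they are not spatially connected.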
- rsgislib.segmentation.tiledclump.performClumpingSingleThread(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')
Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.
- Parameters
inputImage – the input image to be clumped.
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
- rsgislib.segmentation.tiledclump.performClumpingMultiProcess(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)
Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.
- Parameters
inputImage – the input image to be clumped.
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
nCores – is an int specifying the number of cores to be used for clumping processing.
- rsgislib.segmentation.tiledclump.performUnionClumpingSingleThread(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')
Clump the input image and union it with the reference image, using a tiled processing chain allowing large images to be clumped more quickly.
- Parameters
inputImage – the input image to be clumped.
refImg – the reference image which the union is undertaken with (typically an existing classification)
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
- rsgislib.segmentation.tiledclump.performUnionClumpingMultiProcess(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)
Clump the input image and union it with the reference image, using a tiled processing chain allowing large images to be clumped more quickly.
- Parameters
inputImage – the input image to be clumped.
refImg – the reference image which the union is undertaken with (typically an existing classification)
clumpsImage – the output clumped image.
tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
nCores – is an int specifying the number of cores to be used for clumping processing.
Label
- rsgislib.segmentation.labelPixelsFromClusterCentres(inputimage, outputimage, clustercenters, ignorezeros, gdalformat)
Labels image pixels with the ID of the nearest cluster centre.
Where:
- Parameters
inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
clustercentres – is a string containing the name of the cluster centre file
ignorezeros – is a bool
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
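Nearest-centre labelling can be sketched in a few lines of pure Python. This is an illustration only, not the library's code: label_pixels is a hypothetical helper, and the choice of Euclidean distance, 1-based cluster IDs and the treatment of all-zero pixels as unlabelled (ID 0) when ignoring zeros are all assumptions.

```python
import math


def label_pixels(pixels, centres, ignore_zeros=True):
    """Return, for each pixel (a tuple of band values), the ID of the nearest centre.

    Assumptions: Euclidean distance, IDs start at 1, all-zero pixels get ID 0
    when ignore_zeros is True.
    """
    labels = []
    for px in pixels:
        if ignore_zeros and all(v == 0 for v in px):
            labels.append(0)
            continue
        dists = [math.dist(px, centre) for centre in centres]
        labels.append(dists.index(min(dists)) + 1)
    return labels


centres = [(10, 10), (100, 100)]
pixels = [(12, 9), (90, 110), (0, 0)]
print(label_pixels(pixels, centres))
```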
- rsgislib.segmentation.relabelClumps(inputimage, outputimage, gdalformat, processinmemory)
Relabel clumps.
Where:
- Parameters
inputimage – is a string containing the name of the input file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
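The description above does not say how clumps are relabelled. One common purpose of a relabelling step is to make clump IDs consecutive after some clumps have been merged or removed; the sketch below illustrates that idea under this assumption. relabel_consecutive is a hypothetical helper and the order in which new IDs are assigned (scan order) is also an assumption.

```python
def relabel_consecutive(clumps):
    """Map clump IDs to 1, 2, 3, ... in scan order, keeping 0 as 0."""
    mapping = {0: 0}
    out = []
    for row in clumps:
        new_row = []
        for value in row:
            if value not in mapping:
                # len(mapping) counts the reserved 0 entry, so new IDs start at 1.
                mapping[value] = len(mapping)
            new_row.append(mapping[value])
        out.append(new_row)
    return out


clumps = [[4, 4, 9],
          [0, 9, 17]]
print(relabel_consecutive(clumps))
```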
Elimination
- rsgislib.segmentation.eliminateSinglePixels(inputimage, clumpsimage, outputimage, tempfile, gdalformat, processinmemory, ignorezeros)
Eliminates single pixels.
Where:
- Parameters
inputimage – is a string containing the name of the input file
clumpsimage – is a string containing the name of the clump file
outputimage – is a string containing the name of the output file
tempfile – is a string containing the name of the temporary file to use
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
ignorezeros – is a bool
- rsgislib.segmentation.rmSmallClumps(clumpsImage, outputImage, threshold, gdalformat)
A function to remove small clumps and set them with a value of 0 (i.e., no data).
Where:
- Parameters
clumpsImage – is a string containing the name of the input clumps file - note that a column called ‘Histogram’ is expected.
outputImage – is a string containing the name of the output clumps file
threshold – is a float containing the area threshold (in pixels)
gdalformat – is a string defining the format of the output image.
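The effect of removing small clumps can be shown on a tiny grid. This is a sketch, not the RSGISLib code: remove_small_clumps is a hypothetical helper, pixel counts are computed directly rather than read from a ‘Histogram’ column, and treating clumps with fewer pixels than the threshold (strictly less) as small is an assumption.

```python
from collections import Counter


def remove_small_clumps(clumps, threshold):
    """Set clumps with fewer than `threshold` pixels to 0 (no data)."""
    counts = Counter(value for row in clumps for value in row)
    return [[value if counts[value] >= threshold else 0 for value in row]
            for row in clumps]


clumps = [[1, 1, 2],
          [1, 3, 3],
          [1, 3, 3]]
for row in remove_small_clumps(clumps, threshold=2):
    print(row)
```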
- rsgislib.segmentation.rmSmallClumpsStepwise(inputimage, clumpsimage, outputimage, gdalformat, stretchstatsavail, stretchstatsfile, storemean, processinmemory, minclumpsize, specThreshold)
Eliminates clumps smaller than a given size from the scene. Small clumps are combined with their spectrally closest neighbouring clump in a stepwise fashion, unless the spectral distance exceeds the threshold.
Where:
- Parameters
inputimage – is a string containing the name of the input file
clumpsimage – is a string containing the name of the clump file
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
stretchstatsavail – is a bool
stretchstatsfile – is a string containing the name of the stretch stats file
storemean – is a bool
processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
minclumpsize – is an unsigned integer providing the minimum size for clumps.
specThreshold – is a float providing the maximum (Euclidean distance) spectral separation for which to merge clumps. Set to a large value to ignore spectral separation and always merge.
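The stepwise elimination described above can be sketched on a simplified clump graph rather than on raster data. Everything here is illustrative: merge_small_clumps is a hypothetical helper, clumps are represented by dictionaries of sizes, mean spectra and neighbour sets, the merged mean is a size-weighted average, and a merge is rejected when the distance is greater than or equal to the threshold. The library's actual data structures and tie-breaking rules may differ.

```python
import math


def merge_small_clumps(sizes, means, neighbours, min_size, spec_thres):
    """Merge clumps below min_size into their spectrally closest neighbour.

    sizes: {id: pixel count}; means: {id: tuple of band means};
    neighbours: {id: set of adjacent ids}. Sizes are processed from
    smallest upwards. Returns ({merged id: target id}, final sizes).
    """
    sizes, means = dict(sizes), dict(means)
    neighbours = {k: set(v) for k, v in neighbours.items()}
    merged_into = {}
    for size in range(1, min_size):
        for cid in sorted(k for k, n in sizes.items() if n == size):
            # Skip clumps already merged or grown during this pass.
            if sizes.get(cid) != size or not neighbours[cid]:
                continue
            target = min(sorted(neighbours[cid]),
                         key=lambda n: math.dist(means[cid], means[n]))
            if math.dist(means[cid], means[target]) >= spec_thres:
                continue  # too different spectrally, leave the clump as it is
            total = sizes[cid] + sizes[target]
            means[target] = tuple((a * sizes[cid] + b * sizes[target]) / total
                                  for a, b in zip(means[cid], means[target]))
            sizes[target] = total
            # Hand the merged clump's neighbours over to the target.
            for n in neighbours.pop(cid):
                neighbours[n].discard(cid)
                if n != target:
                    neighbours[n].add(target)
                    neighbours[target].add(n)
            del sizes[cid], means[cid]
            merged_into[cid] = target
    return merged_into, sizes


sizes = {1: 50, 2: 3, 3: 40, 4: 2}
means = {1: (10.0,), 2: (12.0,), 3: (90.0,), 4: (200.0,)}
neighbours = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
merged, final_sizes = merge_small_clumps(sizes, means, neighbours,
                                         min_size=10, spec_thres=50.0)
print(merged, final_sizes)
```

In this example clump 2 joins clump 1, its closest neighbour, while clump 4 stays below the minimum size because its only neighbour is further away than the threshold.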
Join / Union
- rsgislib.segmentation.unionOfClumps(outputimage, gdalformat, inputimagepaths, nodata, addPxlVals2Rat)
The function takes the union of clumps images, combining them so that all lines from all clumps are preserved in the output clumps image.
Where:
- Parameters
outputimage – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
inputimagepaths – is a list of input image paths
nodata – is None or float
addPxlVals2Rat – is a boolean specifying whether the pixel values (from inputimagepaths) should be added as a RAT; column names have prefix ‘ClumpVal_’ with index starting at 1 for each variable.
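A union of clumps images can be pictured as giving each distinct combination of input clump values its own output ID. The sketch below shows that idea for in-memory grids. It is an assumption-laden illustration: union_of_clumps is a hypothetical helper, no data handling is omitted, and regions sharing a combination of values are not split when spatially disconnected, which the library may handle differently.

```python
def union_of_clumps(images):
    """Combine clump grids so every unique combination of values gets one ID.

    Returns the output grid and a {combination: id} lookup, which plays the
    role of the per-input value columns in an attribute table.
    """
    rows, cols = len(images[0]), len(images[0][0])
    ids = {}
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            key = tuple(img[r][c] for img in images)
            if key not in ids:
                ids[key] = len(ids) + 1
            out[r][c] = ids[key]
    return out, ids


a = [[1, 1],
     [2, 2]]
b = [[1, 2],
     [1, 2]]
union, lookup = union_of_clumps([a, b])
print(union)
print(lookup)
```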
Visualisation
- rsgislib.segmentation.meanImage(inputImage, inputClumps, outputImage, gdalformat, datatype)
A function to generate an image with the mean value for each clump. Primarily for visualisation and evaluating segmentation.
Where:
- Parameters
inputImage – is a string containing the name of the input image file from which the mean is taken.
inputClumps – is a string containing the name of the input clumps file
outputImage – is a string containing the name of the output image.
gdalformat – is a string defining the format of the output image.
datatype – is an int containing one of the values from rsgislib.TYPE_*
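The mean image is simple to picture: every pixel is replaced by the mean of the input values within its clump. The following single-band sketch uses plain lists; mean_image is a hypothetical helper, not the RSGISLib function, and real images would be processed per band.

```python
from collections import defaultdict


def mean_image(image, clumps):
    """Replace each pixel with the mean input value of its clump (single band)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for img_row, clump_row in zip(image, clumps):
        for value, cid in zip(img_row, clump_row):
            sums[cid] += value
            counts[cid] += 1
    return [[sums[cid] / counts[cid] for cid in row] for row in clumps]


image = [[10, 20],
         [30, 50]]
clumps = [[1, 1],
          [2, 2]]
print(mean_image(image, clumps))
```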
Tiles
- rsgislib.segmentation.mergeSegmentationTiles(outputimage, bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)
Merge body clumps from tile segmentations into the output file.
Where:
- Parameters
outputimage – is a string containing the name of the output file
bordermaskimage – is a string containing the name of the border mask file
tileboundary – is an unsigned integer containing the tile boundary pixel value
tileoverlap – is an unsigned integer containing the tile overlap pixel value
tilebody – is an unsigned integer containing the tile body pixel value
colsname – is a string containing the name of the object id column
inputimagepaths – is a list of input image paths
- rsgislib.segmentation.tiledclump.clumpImgFunc(imgs)
Clump an image with values provided as an array for use within a multiprocessing Pool.
- rsgislib.segmentation.tiledclump.unionClumpImgFunc(imgs)
Union clump an image with values provided as an array for use within a multiprocessing Pool.
scikit-image¶
-
rsgislib.segmentation.skimgseg.
performFelsenszwalbSegmentation
(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, scale=1, sigma=0.8, min_size=20)¶ A function to perform the Felzenszwalb segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).
- Parameters
inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.
scale – scikit-image Felzenszwalb parameter: ‘Free parameter. Higher means larger clusters.’
sigma – scikit-image Felzenszwalb parameter: ‘Width of Gaussian kernel used in preprocessing.’
min_size – scikit-image Felzenszwalb parameter: ‘Minimum component size. Enforced using postprocessing.’
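The usePCA rule described above (PCA is only needed when the input does not have 1 or 3 bands) can be sketched as a small helper that assembles keyword arguments. The helper felzenszwalb_kwargs and the file names in the commented call are hypothetical; only the parameter names are taken from the signature documented above.

```python
def felzenszwalb_kwargs(n_bands, scale=1, sigma=0.8, min_size=20):
    """Build keyword arguments for performFelsenszwalbSegmentation.

    Hypothetical helper: enables usePCA only when the input image does
    not already have 1 or 3 bands, as described in the parameter notes.
    """
    use_pca = n_bands not in (1, 3)
    kwargs = {
        'gdalformat': 'KEA',
        'usePCA': use_pca,
        'scale': scale,
        'sigma': sigma,
        'min_size': min_size,
    }
    if use_pca:
        # nPCABands needs to be either 1 or 3.
        kwargs['nPCABands'] = 3
    return kwargs


print(felzenszwalb_kwargs(6)['usePCA'])  # True: a 6 band image is reduced with PCA
print(felzenszwalb_kwargs(3)['usePCA'])  # False: 3 bands can be used directly

# With RSGISLib installed, the call might then look like this
# (file names are placeholders):
# from rsgislib.segmentation import skimgseg
# skimgseg.performFelsenszwalbSegmentation('in_img.kea', 'out_segs.kea',
#                                          **felzenszwalb_kwargs(6))
```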
-
rsgislib.segmentation.skimgseg.
performQuickshiftSegmentation
(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, pcaPxlSample=100, ratio=1.0, kernel_size=5, max_dist=10, sigma=0, convert2lab=True, random_seed=42)¶ A function to perform the quickshift segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).
- Parameters
inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 3 image bands in the input file then you can use PCA to reduce the number of image bands.
ratio – scikit-image Quickshift parameter: ‘Balances color-space proximity and image-space proximity. Higher values give more weight to color-space. (between 0 and 1)’
kernel_size – scikit-image Quickshift parameter: ‘Width of Gaussian kernel used in smoothing the sample density. Higher means fewer clusters.’
max_dist – scikit-image Quickshift parameter: ‘Cut-off point for data distances. Higher means fewer clusters.’
sigma – scikit-image Quickshift parameter: ‘Width for Gaussian smoothing as preprocessing. Zero means no smoothing.’
convert2lab – scikit-image Quickshift parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. For this purpose, the input is assumed to be RGB.’
random_seed – scikit-image Quickshift parameter: ‘Random seed used for breaking ties.’
-
rsgislib.segmentation.skimgseg.
performRandomWalkerSegmentation
(inputImg, markersImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, beta=130, mode='bf', tol=0.001, spacing=None)¶ A function to perform the random walker segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).
- Parameters
inputImg – input image file.
markersImg – input markers image file - markers must be uniquely numbered.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.
beta – scikit-image random_walker parameter: ‘Penalization coefficient for the random walker motion (the greater beta, the more difficult the diffusion).’
mode – scikit-image random_walker parameter: ‘Mode for solving the linear system in the random walker algorithm. Available options {‘cg_mg’, ‘cg’, ‘bf’}.’
* ‘bf’ (brute force): an LU factorization of the Laplacian is computed. This is fast for small images (<1024x1024), but very slow and memory-intensive for large images (e.g., 3-D volumes).
* ‘cg’ (conjugate gradient): the linear system is solved iteratively using the Conjugate Gradient method from scipy.sparse.linalg. This is less memory-consuming than the brute force method for large images, but it is quite slow.
* ‘cg_mg’ (conjugate gradient with multigrid preconditioner): a preconditioner is computed using a multigrid solver, then the solution is computed with the Conjugate Gradient method. This mode requires that the pyamg module (http://pyamg.org/) is installed. For images of size > 512x512, this is the recommended (fastest) mode.
tol – scikit-image random_walker parameter: ‘Tolerance to achieve when solving the linear system, in ‘cg’ and ‘cg_mg’ modes.’
spacing – scikit-image random_walker parameter: ‘Spacing between voxels in each spatial dimension. If None, then the spacing between pixels/voxels in each dimension is assumed 1.’
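The guidance on the mode parameter can be turned into a simple selection rule. The sketch below is one possible reading of that guidance, not part of RSGISLib or scikit-image: the helper choose_random_walker_mode is hypothetical, and comparing total pixel counts against the quoted 512x512 and 1024x1024 sizes is an assumption.

```python
import importlib.util


def choose_random_walker_mode(x_size, y_size):
    """Suggest a value for the 'mode' parameter from the image size.

    Hypothetical helper based on the parameter description: 'bf' for
    small images, 'cg_mg' for larger images when pyamg is installed,
    otherwise 'cg' for images too large for brute force.
    """
    n_pxls = x_size * y_size
    if n_pxls <= 512 * 512:
        return 'bf'
    if importlib.util.find_spec('pyamg') is not None:
        # cg_mg is the recommended mode above 512x512 but needs pyamg.
        return 'cg_mg'
    if n_pxls < 1024 * 1024:
        return 'bf'
    return 'cg'


print(choose_random_walker_mode(256, 256))    # bf
print(choose_random_walker_mode(4000, 4000))  # cg_mg if pyamg is installed, else cg
```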
-
rsgislib.segmentation.skimgseg.
performSlicSegmentation
(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, n_segments=100, compactness=10.0, max_iter=10, sigma=0, spacing=None, convert2lab=None, enforce_connectivity=True, min_size_factor=0.5, max_size_factor=3, slic_zero=False)¶ A function to perform the slic segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).
- Parameters
inputImg – input image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.
n_segments – scikit-image Slic parameter: ‘The (approximate) number of labels in the segmented output image.’
compactness – scikit-image Slic parameter: ‘Balances color proximity and space proximity. Higher values give more weight to space proximity, making superpixel shapes more square/cubic. In SLICO mode, this is the initial compactness. This parameter depends strongly on image contrast and on the shapes of objects in the image. We recommend exploring possible values on a log scale, e.g., 0.01, 0.1, 1, 10, 100, before refining around a chosen value.’
max_iter – scikit-image Slic parameter: ‘Maximum number of iterations of k-means.’
sigma – scikit-image Slic parameter: ‘Width of Gaussian smoothing kernel for pre-processing for each dimension of the image. The same sigma is applied to each dimension in case of a scalar value. Zero means no smoothing. Note, that sigma is automatically scaled if it is scalar and a manual voxel spacing is provided (see Notes section).’
spacing – scikit-image Slic parameter: ‘The voxel spacing along each image dimension. By default, slic assumes uniform spacing (same voxel resolution along z, y and x). This parameter controls the weights of the distances along z, y, and x during k-means clustering.’
convert2lab – scikit-image Slic parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. The input image must be RGB. Highly recommended.’
enforce_connectivity – scikit-image Slic parameter: ‘Whether the generated segments are connected or not’
min_size_factor – scikit-image Slic parameter: ‘Proportion of the minimum segment size to be removed with respect to the supposed segment size “depth*width*height/n_segments”’
max_size_factor – scikit-image Slic parameter: ‘Proportion of the maximum connected segment size. A value of 3 works in most of the cases.’
slic_zero – scikit-image Slic parameter: ‘Run SLIC-zero, the zero-parameter mode of SLIC.’
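Two of the notes above lend themselves to a short worked example: exploring compactness on a log scale, and relating min_size_factor to the supposed segment size. The helpers below are hypothetical and only illustrate the arithmetic; they assume a 2D image (depth of 1), so the supposed segment size is width*height/n_segments.

```python
def compactness_candidates(low_exp=-2, high_exp=2):
    """Compactness values on a log scale: 0.01, 0.1, 1, 10, 100."""
    return [10.0 ** exp for exp in range(low_exp, high_exp + 1)]


def slic_segment_sizes(x_size, y_size, n_segments=100, min_size_factor=0.5):
    """Return (supposed segment size, minimum segment size) in pixels.

    Assumes a 2D image, i.e. a depth of 1.
    """
    supposed = (x_size * y_size) / n_segments
    return supposed, min_size_factor * supposed


print(compactness_candidates())
print(slic_segment_sizes(1000, 1000))  # (10000.0, 5000.0)
```

For a 1000 x 1000 pixel image with the default n_segments=100, each segment is expected to cover roughly 10000 pixels, so with min_size_factor=0.5 segments smaller than about 5000 pixels would be candidates for removal.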
-
rsgislib.segmentation.skimgseg.
performWatershedSegmentation
(inputImg, markersImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, compactness=0, watershed_line=False)¶ A function to perform the watershed segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).
- Parameters
inputImg – input image file.
markersImg – input markers image file.
outputImg – output image file.
gdalformat – output image file format.
tmpDIR – temp DIR used to output PCA files
calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.
compactness – scikit-image Watershed parameter: ‘Use compact watershed with given compactness parameter. Higher values result in more regularly-shaped watershed basins; Peer Neubert & Peter Protzel (2014). Compact Watershed and Preemptive SLIC: On Improving Trade-offs of Superpixel Segmentation Algorithms. ICPR 2014’
watershed_line – scikit-image Watershed parameter: ‘If watershed_line is True, a one-pixel wide line separates the regions obtained by the watershed algorithm. The line has the label 0.’
Other¶
-
rsgislib.segmentation.
generateRegularGrid
(inputImage, outputClumps, gdalformat, numXPxls, numYPxls, offset)¶ A function to generate a clumps image in which the clumps form a regular grid, with the grid cell size defined in pixels.
Where:
- Parameters
inputImage – is a string containing the name of the input image file specifying the dimensions of the output image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
numXPxls – is the size of the grid cells in the X axis in pixel units.
numYPxls – is the size of the grid cells in the Y axis in pixel units.
offset – is a boolean specifying whether the grid should be offset, i.e., starts at the halfway point of numXPxls and numYPxls (Default is false; optional)
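The effect of numXPxls, numYPxls and offset can be illustrated with a small pure-Python sketch. The function regular_grid_labels is hypothetical and is not the RSGISLib implementation; in particular, treating offset as a shift of half a cell in each axis and numbering the cells row by row from 1 are assumptions.

```python
def regular_grid_labels(x_size, y_size, num_x_pxls, num_y_pxls, offset=False):
    """Label each pixel with the ID of the regular grid cell it falls in.

    Hypothetical illustration: cell IDs start at 1 and increase row by
    row; with offset=True the grid is shifted by half a cell.
    """
    x_shift = num_x_pxls // 2 if offset else 0
    y_shift = num_y_pxls // 2 if offset else 0
    # Number of grid columns needed to cover the (shifted) image width.
    n_cols = (x_size + x_shift + num_x_pxls - 1) // num_x_pxls
    labels = []
    for y in range(y_size):
        row_idx = (y + y_shift) // num_y_pxls
        labels.append([row_idx * n_cols + (x + x_shift) // num_x_pxls + 1
                       for x in range(x_size)])
    return labels


for row in regular_grid_labels(4, 4, 2, 2):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]

print(regular_grid_labels(4, 4, 2, 2, offset=True)[0])  # [1, 2, 2, 3]
```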
-
rsgislib.segmentation.
dropSelectedClumps
(clumpsImage, outputClumps, gdalformat, selectClumpsCol)¶ A function to drop the selected clumps from the segmentation.
Where:
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
selectClumpsCol – is a string defining the binary column for defining the clumps to be dropped (1 == selected clumps).
-
rsgislib.segmentation.
findTileBordersMask
(bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)¶ Mask tile borders
Where:
- Parameters
bordermaskimage – is a string containing the name of the border mask file
tileboundary – is an unsigned integer containing the tile boundary pixel value
tileoverlap – is an unsigned integer containing the tile overlap pixel value
tilebody – is an unsigned integer containing the tile body pixel value
colsname – is a string containing the name of the object id column
inputimagepaths – is a list of input clump image paths
-
rsgislib.segmentation.
includeRegionsInClumps
(clumpsImage, regionsImage, outputClumps, gdalformat)¶ A function to include a set of clumped regions within an existing clumps (i.e., segmentation) image. NOTE. You should run the relabelClumps function on the output of this command before using further.
Where:
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
regionsImage – is a string containing the filepath for the input regions image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
-
rsgislib.segmentation.
mergeClumpImages
(inputimagepaths, outputimage, mergeRATs)¶ Merge all clumps from tile segmentations into outputfile
Where:
- Parameters
inputimagepaths – is a list of input image paths
outputimage – is a string containing the name of the output file
mergeRATs – is a boolean specifying whether the image RATs are to be merged (Default: false; Optional)
-
rsgislib.segmentation.
mergeEquivClumps
(clumpsImage, outputClumps, gdalformat, valClumpsCols)¶ A function to merge neighbouring clumps which have the same value - for example when merging across tile boundaries.
Where:
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
valClumpsCols – is a list of strings defining the column(s) containing the value(s) used to define equivalence (typically it might be the original pixel values when clumping through tiling).
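The idea of merging neighbouring clumps which share the same value can be sketched with a small union-find over a clump grid. This is a conceptual illustration in pure Python, not the RSGISLib implementation: merge_equiv_clumps, the 4-connected neighbourhood and the single value per clump are all assumptions.

```python
def merge_equiv_clumps(clumps, clump_vals):
    """Merge 4-connected neighbouring clumps which have the same value.

    clumps is a 2D list of clump IDs and clump_vals maps each clump ID
    to its value. Merged clumps take the smallest ID in the group.
    """
    parent = {cid: cid for row in clumps for cid in row}

    def find(cid):
        # Follow parent links to the representative ID (path halving).
        while parent[cid] != cid:
            parent[cid] = parent[parent[cid]]
            cid = parent[cid]
        return cid

    n_rows, n_cols = len(clumps), len(clumps[0])
    for r in range(n_rows):
        for c in range(n_cols):
            a = clumps[r][c]
            # Compare with the pixel to the right and the pixel below.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < n_rows and cc < n_cols:
                    b = clumps[rr][cc]
                    if a != b and clump_vals[a] == clump_vals[b]:
                        root_a, root_b = find(a), find(b)
                        if root_a != root_b:
                            parent[max(root_a, root_b)] = min(root_a, root_b)
    return [[find(cid) for cid in row] for row in clumps]


# Two tiles side by side: clumps 1 and 3 on the left, 2 and 4 on the right.
clumps = [[1, 1, 2, 2],
          [3, 3, 4, 4]]
vals = {1: 10, 2: 10, 3: 20, 4: 10}
print(merge_equiv_clumps(clumps, vals))
# [[1, 1, 1, 1], [3, 3, 1, 1]]
```

In this sketch the merged output no longer has consecutive IDs (2 and 4 are unused), so a relabelling step would normally follow.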
-
rsgislib.segmentation.
mergeSegments2Neighbours
(clumpsImage, spectralImage, outputClumps, gdalformat, selectedClumpsCol, noDataClumpsCol)¶ A function to merge selected clumps with their neighbours based on colour (spectral) distance, where clumps identified as no data are ignored.
Where:
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
spectralImage – is a string containing the filepath for the input image used to define ‘distance’.
outputClumps – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
selectedClumpsCol – is a string defining the binary column for defining the segments to be merged (1 == selected clumps).
noDataClumpsCol – is a string defining the binary column for defining the segments to be ignored as no data (1 == no-data clumps).
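The selection of a merge target by spectral distance can be illustrated as follows. This is a sketch under stated assumptions rather than the library's method: the helper closest_neighbour is hypothetical, and it assumes the distance is the Euclidean distance between per-clump mean spectra.

```python
import math


def closest_neighbour(clump_id, neighbours, mean_spectra, no_data):
    """Return the neighbouring clump spectrally closest to clump_id.

    neighbours maps a clump ID to a list of adjacent clump IDs,
    mean_spectra maps a clump ID to a tuple of mean band values and
    no_data is a set of clump IDs to ignore. Returns None if there is
    no valid neighbour.
    """
    best_id, best_dist = None, math.inf
    for nid in neighbours[clump_id]:
        if nid in no_data:
            continue  # clumps flagged as no data are never merge targets
        dist = math.dist(mean_spectra[clump_id], mean_spectra[nid])
        if dist < best_dist:
            best_id, best_dist = nid, dist
    return best_id


mean_spectra = {1: (100, 100), 2: (110, 105), 3: (300, 300), 4: (101, 101)}
neighbours = {1: [2, 3, 4]}
# Clump 4 is spectrally closest but is flagged as no data, so 2 is chosen.
print(closest_neighbour(1, neighbours, mean_spectra, no_data={4}))  # 2
```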
-
rsgislib.segmentation.
pxlGrowRegions
(clumpsImage, valsImage, outputImage, gdalformat, muParseCriteria, varNameBandPairs)¶ A function to grow the input clumps (regions) pixel by pixel into neighbouring pixels which meet the muparser criteria evaluated on valsImage.
Where:
- Parameters
clumpsImage – is a string containing the filepath for the input clumps image.
valsImage – is a string containing the file path for the values (criteria) image.
outputImage – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
muParseCriteria – is a string with a muparser expression defining the criteria (e.g., b1 < 20?1:0). Expression output must be 0 or 1 (1 for True).
varNameBandPairs – is a list of pairs specifying the variable name (in the muparser expression) and the band number to which it refers in valsImage (note band numbers start at 1).
Example:
import collections
import rsgislib.segmentation

# tmpInitClearSkyRegionsFinal, tmpCloudsImgDist2CloudsNoData and tmpClearSkyRegionsGrow are image file paths defined elsewhere.
varBandPair = collections.namedtuple('VarBandPair', ['varName', 'bandIndex'])
varBandPairSeq = list()
varBandPairSeq.append(varBandPair(varName='b1', bandIndex=1))
muParseCriteria = 'b1 > 1000?1:0'
rsgislib.segmentation.pxlGrowRegions(tmpInitClearSkyRegionsFinal, tmpCloudsImgDist2CloudsNoData, tmpClearSkyRegionsGrow, 'KEA', muParseCriteria, varBandPairSeq)
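When the criteria refer to several bands, the list of variable and band pairs can be generated rather than appended one by one. The helper make_var_band_pairs below is hypothetical, and naming the variables b1, b2 and so on is simply a convention matching the example above; the multi-band expression assumes muparser's logical and ternary operators.

```python
import collections

# Same named tuple layout as used in the example above.
VarBandPair = collections.namedtuple('VarBandPair', ['varName', 'bandIndex'])


def make_var_band_pairs(n_bands):
    """Create one (varName, bandIndex) pair per band: b1 -> 1, b2 -> 2, ...

    Band numbers start at 1.
    """
    return [VarBandPair(varName='b{}'.format(idx), bandIndex=idx)
            for idx in range(1, n_bands + 1)]


pairs = make_var_band_pairs(3)
print(pairs[0])    # VarBandPair(varName='b1', bandIndex=1)
print(len(pairs))  # 3

# A criteria string using two of the variables; the output must be 0 or 1.
mu_parse_criteria = '(b1 > 1000) && (b3 < 500)?1:0'
```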