RSGISLib Image Segmentation Module

Commands to perform a series of segmentations on input imagery

Utilities

rsgislib.segmentation.shepherdseg.run_shepherd_segmentation(input_img, out_clumps_img, out_mean_img=None, tmp_dir='.', gdalformat='KEA', calc_stats=True, no_stretch=False, no_delete=False, num_clusters=60, min_n_pxls=100, dist_thres=100, bands=None, sampling=100, km_max_iter=200, process_in_mem=False, save_process_stats=False, img_stretch_stats='', kmeans_centres='', img_stats_json_file='')

Utility function to call the segmentation algorithm of Shepherd et al. (2019).

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

Parameters:

input_img – is a string containing the name of the input file.
out_clumps_img – is a string containing the name of the output clump file.
out_mean_img – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmp_dir – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
calc_stats – is a bool which specifies that image statistics and pyramids should be built for the output images (default = True)
no_stretch – is a bool which specifies that the input image bands should not be stretched (default = False).
no_delete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
num_clusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).
min_n_pxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
dist_thres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specify the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
km_max_iter – maximum iterations for KMeans.
process_in_mem – is a bool; where functions allow it, processing is performed in memory rather than on disk.
save_process_stats – is a bool which specifies that the image stretch stats and the kMeans centre stats should be saved along with a header.
img_stretch_stats – is a string providing the file name and path for the image stretch stats (Output).
kmeans_centres – is a string providing the file name and path for the KMeans clusters centres (don’t include file extension; .gmtxt will be added to the end) (Output).
img_stats_json_file – is a string providing the name and path of a JSON file storing the image spatial extent and img_stretch_stats and kmeans_centres file paths for use by other commands (Output).

from rsgislib.segmentation import shepherdseg

input_img = 'jers1palsar_stack.kea'
out_clumps_img = 'jers1palsar_stack_clumps_elim_final.kea'
out_mean_img = 'jers1palsar_stack_clumps_elim_final_mean.kea'

shepherdseg.run_shepherd_segmentation(input_img, out_clumps_img,
                                      out_mean_img, min_n_pxls=100)

rsgislib.segmentation.tiledsegsingle.perform_tiled_segmentation(input_img, clumps_img, tmp_dir='segtmp', tile_width=2000, tile_height=2000, valid_data_threshold=0.3, num_clusters=60, min_pxls=100, dist_thres=100, bands=None, sampling=100, km_max_iter=200)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) using the tiled process outlined in Clewley et al. (2015).

Parameters:

input_img – is a string containing the name of the input file.
clumps_img – is a string containing the name of the output clump file.
tmp_dir – is a file path for intermediate files (default is to create a directory ‘segtmp’). If the path does not currently exist, it will be created and then deleted afterwards.
tile_width – is an int specifying the width of the tiles used for processing (Default 2000)
tile_height – is an int specifying the height of the tiles used for processing (Default 2000)
valid_data_threshold – is a float (value between 0 and 1) specifying the proportion of valid image pixels (i.e., not the no data value of zero) that a tile must contain. Tiles failing to meet this threshold are merged with ones which do (Default 0.3).
num_clusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).
min_pxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
dist_thres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specify the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
km_max_iter – maximum iterations for KMeans (Default 200).

from rsgislib.segmentation import tiledsegsingle

input_img = 'LS5TM_20110428_sref_submask_osgb.kea'
clumps_img = 'LS5TM_20110428_sref_submask_osgb_clumps.kea'

# Keyword names follow the function signature above.
tiledsegsingle.perform_tiled_segmentation(input_img, clumps_img,
                                          tmp_dir='rsgislibsegtmp',
                                          tile_width=2000, tile_height=2000,
                                          valid_data_threshold=0.3,
                                          num_clusters=60, min_pxls=100,
                                          dist_thres=100, bands=[4,5,3],
                                          sampling=100, km_max_iter=200)
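To make the tile_width, tile_height and valid_data_threshold parameters more concrete, the following is a minimal pure-Python sketch of how an image might be cut into tiles and how the valid-data fraction of a tile could be measured. It is not RSGISLib's implementation: the helper names tile_windows and valid_fraction are hypothetical, and the sketch ignores any overlap between tiles that the library may use.

```python
def tile_windows(img_width, img_height, tile_width=2000, tile_height=2000):
    """Yield (x_off, y_off, width, height) windows covering the image.

    Hypothetical helper: edge tiles are simply truncated at the image border.
    """
    for y_off in range(0, img_height, tile_height):
        for x_off in range(0, img_width, tile_width):
            yield (
                x_off,
                y_off,
                min(tile_width, img_width - x_off),
                min(tile_height, img_height - y_off),
            )


def valid_fraction(tile_pixels, no_data_val=0):
    """Return the fraction of pixels in a tile that are not no data."""
    n_pxls = len(tile_pixels)
    if n_pxls == 0:
        return 0.0
    return sum(1 for val in tile_pixels if val != no_data_val) / n_pxls


windows = list(tile_windows(5000, 3000))
# Assumption: a tile passes when its valid fraction is at least the threshold.
tile_ok = valid_fraction([0, 0, 5, 7]) >= 0.3
```

A 5000 x 3000 pixel image with the default 2000 pixel tiles gives a 3 x 2 grid of windows, with narrower tiles along the right and bottom edges.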

rsgislib.segmentation.shepherdseg.run_shepherd_segmentation_pre_calcd_stats(input_img, out_clumps_img, kmeans_centres, img_stretch_stats, out_mean_img=None, tmp_dir='.', gdalformat='KEA', calc_stats=True, no_stretch=False, no_delete=False, min_n_pxls=100, dist_thres=100, bands=None, process_in_mem=False)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) using pre-calculated stretch stats and KMeans cluster centres.

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

Parameters:

input_img – is a string containing the name of the input file.
out_clumps_img – is a string containing the name of the output clump file.
kmeans_centres – is a string providing the file name and path for the KMeans clusters centres (Input)
img_stretch_stats – is a string providing the file name and path for the image stretch stats (Input - not required if no_stretch=True)
out_mean_img – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.
tmp_dir – is a file path for intermediate files (default is current directory).
gdalformat – is a string containing the GDAL format for the output file (default = KEA).
calc_stats – is a bool which specifies that image statistics and pyramids should be built for the output images (default = True)
no_stretch – is a bool which specifies that the input image bands should not be stretched (default = False).
no_delete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).
min_n_pxls – is an int which specifies the minimum number of pixels within a segment (default = 100).
dist_thres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).
bands – is an array providing a subset of image bands to use (default is None to use all bands).
sampling – specify the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).
process_in_mem – is a bool; where functions allow it, processing is performed in memory rather than on disk.

from rsgislib.segmentation import shepherdseg

input_img = 'jers1palsar_stack.kea'
out_clumps_img = 'jers1palsar_stack_clumps_elim_final.kea'
out_mean_img = 'jers1palsar_stack_clumps_elim_final_mean.kea'
kmeans_centres = 'jers1palsar_stack_kcentres.gmtxt'
img_stretch_stats = 'jers1palsar_stack_stchstats.txt'

shepherdseg.run_shepherd_segmentation_pre_calcd_stats(input_img, out_clumps_img,
                                                      kmeans_centres,
                                                      img_stretch_stats,
                                                      out_mean_img,
                                                      min_n_pxls=100)
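One plausible way to obtain the pre-calculated inputs is to run run_shepherd_segmentation on a reference image with save_process_stats=True and then reuse the saved files for other images. The sketch below assumes that workflow; the helper names stats_paths and segment_scenes_with_shared_stats and the output naming scheme are hypothetical. Note that, per the parameter descriptions, the KMeans centres path is given without an extension when written, and '.gmtxt' is appended by the library.

```python
def stats_paths(base_name):
    """Build the stats file names shared between the two functions."""
    centres_base = f"{base_name}_kcentres"  # written without a file extension
    return {
        "kmeans_centres_out": centres_base,
        "kmeans_centres_in": centres_base + ".gmtxt",  # extension added on output
        "img_stretch_stats": f"{base_name}_stchstats.txt",
    }


def segment_scenes_with_shared_stats(ref_img, other_imgs, base_name):
    """Segment ref_img, saving its stats, then reuse them for other_imgs."""
    # Imported lazily so the naming helper can be used without RSGISLib.
    from rsgislib.segmentation import shepherdseg

    paths = stats_paths(base_name)
    shepherdseg.run_shepherd_segmentation(
        ref_img,
        ref_img.replace(".kea", "_clumps.kea"),  # hypothetical output name
        save_process_stats=True,
        img_stretch_stats=paths["img_stretch_stats"],
        kmeans_centres=paths["kmeans_centres_out"],
    )
    for img in other_imgs:
        shepherdseg.run_shepherd_segmentation_pre_calcd_stats(
            img,
            img.replace(".kea", "_clumps.kea"),  # hypothetical output name
            paths["kmeans_centres_in"],
            paths["img_stretch_stats"],
        )


paths = stats_paths("jers1palsar_stack")
```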

Clump

rsgislib.segmentation.clump(input_img, output_img, gdalformat, in_memory, no_data_val, add_to_rat)

A function which clumps an input image (of int pixel data type) to identify connected independent sets of pixels.

Parameters:

input_img – is a string containing the name of the input file
output_img – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
in_memory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
no_data_val – is None or a float specifying the no data value.
add_to_rat – is a boolean specifying whether the pixel value (from input_img) should be added as a RAT (Column Name: PixelVal).
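As an illustration of what clumping means, the following pure-Python sketch labels connected groups of equal-valued pixels in a small grid. It is a conceptual example only, not the library's code: the function name clump_sketch is hypothetical, and 4-connectivity and output IDs starting at 1 (with 0 for no data) are assumptions.

```python
from collections import deque


def clump_sketch(grid, no_data_val=None):
    """Give each connected region of equal pixel values a unique ID."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    next_id = 1
    for r in range(rows):
        for c in range(cols):
            # Skip pixels already labelled or flagged as no data.
            if out[r][c] != 0 or grid[r][c] == no_data_val:
                continue
            val = grid[r][c]
            out[r][c] = next_id
            queue = deque([(r, c)])
            # Breadth-first flood fill over 4-connected neighbours (assumed).
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and out[ny][nx] == 0
                        and grid[ny][nx] == val
                    ):
                        out[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return out


classes = [
    [1, 1, 2],
    [2, 1, 2],
    [2, 2, 1],
]
clumps = clump_sketch(classes)
```

Here the two separate regions of class 2 receive different clump IDs, as does the isolated class 1 pixel in the bottom-right corner.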

rsgislib.segmentation.tiledclump.perform_clumping_single_thread(input_img, clumps_img, tmp_dir='tmp', width=2000, height=2000, gdalformat='KEA')

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters:

input_img – the input image to be clumped.
clumps_img – the output clumped image.
tmp_dir – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.perform_clumping_multi_process(input_img, clumps_img, tmp_dir='tmp', width=2000, height=2000, gdalformat='KEA', n_cores=-1)

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters:

input_img – the input image to be clumped.
clumps_img – the output clumped image.
tmp_dir – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
n_cores – is an int specifying the number of cores to be used for clumping processing.

Label

rsgislib.segmentation.label_pixels_from_cluster_centres(input_img, output_img, cluster_centres_file, ignore_zeros, gdalformat)

Labels image pixels with the ID of the nearest cluster centre.

Parameters:

input_img – is a string containing the name of the input file
output_img – is a string containing the name of the output file
cluster_centres_file – is a string containing the name of the cluster centre file
ignore_zeros – is a bool specifying whether pixels with a value of zero are ignored.
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
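Conceptually, labelling by nearest cluster centre can be sketched as below. This is an illustrative pure-Python version, not the library's implementation: the function name label_pixels_sketch is hypothetical, and the use of Euclidean distance, 1-based IDs, and treating a pixel as "zero" only when all bands are zero are assumptions.

```python
import math


def label_pixels_sketch(pixels, centres, ignore_zeros=True):
    """Return the 1-based index of the nearest centre for each pixel."""
    labels = []
    for px in pixels:
        # Assumption: an all-zero pixel is left unlabelled (ID 0).
        if ignore_zeros and all(val == 0 for val in px):
            labels.append(0)
            continue
        dists = [math.dist(px, centre) for centre in centres]
        labels.append(dists.index(min(dists)) + 1)
    return labels


centres = [(10, 10), (100, 100)]
pixels = [(12, 9), (90, 110), (0, 0)]
labels = label_pixels_sketch(pixels, centres)
```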

rsgislib.segmentation.relabel_clumps(input_img, output_img, gdalformat, in_memory)

Relabel clumps so that the numbering is consecutive, without gaps.

Parameters:

input_img – is a string containing the name of the input file
output_img – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
in_memory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
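The effect of relabelling can be illustrated with a small pure-Python sketch. The function name relabel_sketch is hypothetical, and keeping 0 as no data and assigning new IDs in order of first appearance are assumptions rather than documented behaviour.

```python
def relabel_sketch(grid):
    """Map clump IDs to 1..N in order of first appearance, keeping 0 as is."""
    mapping = {0: 0}  # assumption: 0 is no data and is never renumbered
    out = []
    for row in grid:
        new_row = []
        for val in row:
            if val not in mapping:
                mapping[val] = len(mapping)  # next unused consecutive ID
            new_row.append(mapping[val])
        out.append(new_row)
    return out


relabelled = relabel_sketch([[5, 5, 0], [9, 2, 2]])
```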

Elimination

rsgislib.segmentation.eliminate_single_pixels(input_img, clumps_img, output_img, tmp_img, gdalformat, in_memory, ignorezeros)

Eliminates single pixels

Parameters:

input_img – is a string containing the name of the input file
clumps_img – is a string containing the name of the clump file
output_img – is a string containing the name of the output file
tmp_img – is a string containing the name of the temporary file to use
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
in_memory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
ignore_zeros – is a bool

rsgislib.segmentation.rm_small_clumps(clumps_img, output_img, area_threshold, gdalformat)

A function to remove small clumps and set them with a value of 0 (i.e., no data)

Parameters:

clumps_img – is a string containing the name of the input clumps file. Note that a column called ‘Histogram’ is required.
output_img – is a string containing the name of the output clumps file
area_threshold – is a float containing the area threshold (in pixels)
gdalformat – is a string defining the format of the output image.
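A minimal sketch of the idea, in pure Python: count the pixels in each clump (the role played by the ‘Histogram’ column) and set clumps below the threshold to 0. The function name rm_small_sketch is hypothetical, and whether a clump exactly at the threshold is kept is an assumption.

```python
from collections import Counter


def rm_small_sketch(grid, area_threshold):
    """Set clumps with fewer than area_threshold pixels to 0 (no data)."""
    histogram = Counter(val for row in grid for val in row)
    # Assumption: clumps with exactly area_threshold pixels are kept.
    return [
        [val if histogram[val] >= area_threshold else 0 for val in row]
        for row in grid
    ]


filtered = rm_small_sketch([[1, 1, 2], [1, 3, 3]], area_threshold=2)
```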

rsgislib.segmentation.rm_small_clumps_stepwise(input_img, clumps_img, output_img, gdalformat, use_stch_stats, stch_stats_file, store_mean, in_memory, min_clump_size, pxl_val_thres)

Eliminate clumps smaller than a given size from the scene. Small clumps are combined with their spectrally closest neighbouring clump in a stepwise fashion, unless the spectral distance exceeds the threshold.

Parameters:

input_img – is a string containing the name of the input file
clumps_img – is a string containing the name of the clump file
output_img – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
use_stch_stats – is a bool
stch_stats_file – is a string containing the name of the stretch stats file
store_mean – is a bool
in_memory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
min_clump_size – is an unsigned integer providing the minimum size for clumps.
pxl_val_thres – is a float providing the maximum (Euclidean distance) spectral separation for which to merge clumps. Set to a large value to ignore spectral separation and always merge.
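The stepwise merging described above can be sketched on a simple clump graph. This is a simplified illustration under stated assumptions, not the library's algorithm: the function name merge_small_sketch and the dictionary-based inputs (sizes, means, neighbours) are hypothetical, merged means are size-weighted, and a merge is assumed to happen only when the distance is strictly below pxl_val_thres.

```python
import math


def merge_small_sketch(sizes, means, neighbours, min_clump_size, pxl_val_thres):
    """Merge clumps smaller than min_clump_size into their closest neighbour."""
    sizes = dict(sizes)
    means = dict(means)
    neighbours = {cid: set(adj) for cid, adj in neighbours.items()}
    merged_into = {}
    # Work upwards through the sizes so the smallest clumps are removed first.
    for step in range(1, min_clump_size):
        for cid in sorted(sizes):  # snapshot of the IDs present at this step
            if cid not in sizes or sizes[cid] > step or not neighbours[cid]:
                continue
            target = min(
                neighbours[cid], key=lambda n: math.dist(means[cid], means[n])
            )
            if math.dist(means[cid], means[target]) >= pxl_val_thres:
                continue  # too spectrally different, leave the clump as is
            total = sizes[cid] + sizes[target]
            means[target] = tuple(
                (a * sizes[cid] + b * sizes[target]) / total
                for a, b in zip(means[cid], means[target])
            )
            sizes[target] = total
            # Re-wire the adjacency so the target inherits the neighbours.
            for n in neighbours[cid]:
                neighbours[n].discard(cid)
                if n != target:
                    neighbours[n].add(target)
                    neighbours[target].add(n)
            del sizes[cid], means[cid], neighbours[cid]
            merged_into[cid] = target
    return sizes, merged_into


sizes = {1: 50, 2: 2, 3: 40, 4: 1}
means = {1: (10.0,), 2: (12.0,), 3: (100.0,), 4: (95.0,)}
neighbours = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
new_sizes, merged = merge_small_sketch(sizes, means, neighbours, 5, 20.0)
```

In this example the single-pixel clump 4 is absorbed by clump 3 in the first step, and the two-pixel clump 2 joins clump 1 (its spectrally closest neighbour) in the second. With a very small threshold nothing is merged.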

Join / Union

rsgislib.segmentation.union_of_clumps(input_imgs, output_img, gdalformat, no_data_val, add_to_rat)

The function takes the union of clumps images, combining them so all lines from all clumps are preserved in the new outputted clumps image.

Parameters:

input_imgs – is a list of input image paths
output_img – is a string containing the name of the output file
gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
no_data_val – is None or float
add_to_rat – is a boolean specifying whether the pixel values (from input_imgs) should be added as a RAT; column names have prefix ‘ClumpVal_’ with index starting at 1 for each variable.
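A rough way to picture the union is as an overlay in which every distinct combination of input clump values becomes its own output ID, so that boundaries from all inputs are preserved. The sketch below shows only that overlay step and is not the library's implementation: the function name union_sketch is hypothetical, it does not re-clump spatially separated regions that share a combination, and treating pixels that are no data in every input as 0 is an assumption.

```python
def union_sketch(grids, no_data_val=0):
    """Overlay label grids; each unique value combination gets a new ID."""
    rows, cols = len(grids[0]), len(grids[0][0])
    ids = {}
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            key = tuple(grid[r][c] for grid in grids)
            if all(val == no_data_val for val in key):
                continue  # assumption: no data everywhere stays 0
            out[r][c] = ids.setdefault(key, len(ids) + 1)
    return out, ids


clumps_a = [[1, 1], [2, 2]]  # split top / bottom
clumps_b = [[1, 2], [1, 2]]  # split left / right
union, combos = union_sketch([clumps_a, clumps_b])
```

The combos dictionary plays a similar role to the ‘ClumpVal_’ columns: it records which input values each output clump came from.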

rsgislib.segmentation.tiledclump.perform_union_clumping_single_thread(input_img, in_ref_img, clumps_img, tmp_dir='tmp', width=2000, height=2000, gdalformat='KEA')

Clump the input image and union it with the reference image, using a tiled processing chain that allows large images to be clumped more quickly.

Parameters:

input_img – the input image to be clumped.
in_ref_img – the reference image which the union is undertaken with (typically an existing classification)
clumps_img – the output clumped image.
tmp_dir – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.perform_union_clumping_multi_process(input_img, in_ref_img, clumps_img, tmp_dir='tmp', width=2000, height=2000, gdalformat='KEA', n_cores=-1)

Clump the input image and union it with the reference image, using a tiled processing chain that allows large images to be clumped more quickly.

Parameters:

input_img – the input image to be clumped.
in_ref_img – the reference image which the union is undertaken with (typically an existing classification)
clumps_img – the output clumped image.
tmp_dir – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.
width – int for width of the image tiles used for processing (Default = 2000).
height – int for height of the image tiles used for processing (Default = 2000).
gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.
n_cores – is an int specifying the number of cores to be used for clumping processing.

Visualisation

rsgislib.segmentation.mean_image(input_img, clumps_img, output_img, gdalformat, datatype)

A function to generate an image with the mean value for each clump. Primarily for visualisation and evaluating segmentation.

Parameters:

input_img – is a string containing the name of the input image file from which the mean is taken.
clumps_img – is a string containing the name of the input clumps file
output_img – is a string containing the name of the output image.
gdalformat – is a string defining the format of the output image.
datatype – is an int containing one of the values from rsgislib.TYPE_*
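The output of a mean image can be pictured with a single-band, pure-Python sketch: every pixel is replaced by the mean of the input values within its clump. The function name mean_image_sketch is hypothetical and the example is illustrative only.

```python
from collections import defaultdict


def mean_image_sketch(values, clumps):
    """Replace each pixel with the mean input value of its clump."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for val_row, clump_row in zip(values, clumps):
        for val, cid in zip(val_row, clump_row):
            sums[cid] += val
            counts[cid] += 1
    return [[sums[cid] / counts[cid] for cid in clump_row] for clump_row in clumps]


values = [[1, 3], [10, 20]]
clumps = [[1, 1], [2, 2]]
mean_img = mean_image_sketch(values, clumps)
```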

Tiles

rsgislib.segmentation.merge_segmentation_tiles(input_imgs, output_img, border_msk_img, tile_boundary, tile_overlap, tile_body, col_name)

Merge body clumps from tile segmentations into output file

Parameters:

input_imgs – is a list of input image paths
output_img – is a string containing the name of the output file
border_msk_img – is a string containing the name of the border mask file
tile_boundary – is an unsigned integer containing the tile boundary pixel value
tile_overlap – is an unsigned integer containing the tile overlap pixel value
tile_body – is an unsigned integer containing the tile body pixel value
col_name – is a string containing the name of the object id column

scikit-image

rsgislib.segmentation.skimgseg.perform_felsenszwalb_segmentation(input_img, output_img, gdalformat='KEA', no_data_val=0, tmp_dir='tmp', calc_stats=True, use_pca=False, n_pca_bands=3, pca_pxl_sample=100, scale=1, sigma=0.8, min_size=20)

A function to perform the Felsenszwalb segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters:

input_img – input image file.
output_img – output image file.
gdalformat – output image file format.
tmp_dir – temp DIR used to output PCA files
calc_stats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
use_pca – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
n_pca_bands – the number of principal component outputs from the PCA - needs to be either 1 or 3.
scale – scikit-image Felsenszwalb parameter: ‘Free parameter. Higher means larger clusters.’
sigma – scikit-image Felsenszwalb parameter: ‘Width of Gaussian kernel used in preprocessing.’
min_size – scikit-image Felsenszwalb parameter: ‘Minimum component size. Enforced using postprocessing.’

rsgislib.segmentation.skimgseg.perform_quickshift_segmentation(input_img, output_img, gdalformat='KEA', no_data_val=0, tmp_dir='tmp', calc_stats=True, use_pca=False, pca_pxl_sample=100, ratio=1.0, kernel_size=5, max_dist=10, sigma=0, convert_to_lab=True, random_seed=42)

A function to perform the quickshift segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters:

input_img – input image file.
output_img – output image file.
gdalformat – output image file format.
tmp_dir – temp DIR used to output PCA files
calc_stats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
use_pca – if there are not 3 image bands in the input file then you can use PCA to reduce the number of image bands.
ratio – scikit-image Quickshift parameter: ‘Balances color-space proximity and image-space proximity. Higher values give more weight to color-space. (between 0 and 1)’
kernel_size – scikit-image Quickshift parameter: ‘Width of Gaussian kernel used in smoothing the sample density. Higher means fewer clusters.’
max_dist – scikit-image Quickshift parameter: ‘Cut-off point for data distances. Higher means fewer clusters.’
sigma – scikit-image Quickshift parameter: ‘Width for Gaussian smoothing as preprocessing. Zero means no smoothing.’
convert_to_lab – scikit-image Quickshift parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. For this purpose, the input is assumed to be RGB.’
random_seed – scikit-image Quickshift parameter: ‘Random seed used for breaking ties.’

rsgislib.segmentation.skimgseg.perform_random_walker_segmentation(input_img, in_markers_img, output_img, gdalformat='KEA', no_data_val=0, tmp_dir='tmp', calc_stats=True, use_pca=False, n_pca_bands=3, pca_pxl_sample=100, beta=130, mode='bf', tol=0.001, spacing=None)

A function to perform the random walker segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters:

input_img – input image file.
in_markers_img – input markers image file - markers must be uniquely numbered.
output_img – output image file.
gdalformat – output image file format.
tmp_dir – temp DIR used to output PCA files
calc_stats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
use_pca – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
n_pca_bands – the number of principal component outputs from the PCA - needs to be either 1 or 3.
beta – scikit-image random_walker parameter: ‘Penalization coefficient for the random walker motion (the greater beta, the more difficult the diffusion).’
mode – scikit-image random_walker parameter: ‘Mode for solving the linear system in the random walker algorithm. Available options {‘cg_mg’, ‘cg’, ‘bf’}.’
- ‘bf’ (brute force): an LU factorization of the Laplacian is computed. This is fast for small images (<1024x1024), but very slow and memory-intensive for large images (e.g., 3-D volumes).
- ‘cg’ (conjugate gradient): the linear system is solved iteratively using the Conjugate Gradient method from scipy.sparse.linalg. This is less memory-consuming than the brute force method for large images, but it is quite slow.
- ‘cg_mg’ (conjugate gradient with multigrid preconditioner): a preconditioner is computed using a multigrid solver, then the solution is computed with the Conjugate Gradient method. This mode requires that the pyamg module (http://pyamg.org/) is installed. For images of size > 512x512, this is the recommended (fastest) mode.
tol – scikit-image random_walker parameter: ‘Tolerance to achieve when solving the linear system, in ‘cg’ and ‘cg_mg’ modes.’
spacing – scikit-image random_walker parameter: ‘Spacing between voxels in each spatial dimension. If None, then the spacing between pixels/voxels in each dimension is assumed 1.’

rsgislib.segmentation.skimgseg.perform_slic_segmentation(input_img, output_img, gdalformat='KEA', no_data_val=0, tmp_dir='tmp', calc_stats=True, use_pca=False, n_pca_bands=3, pca_pxl_sample=100, n_segments=100, compactness=10.0, max_iter=10, sigma=0, spacing=None, convert_to_lab=None, enforce_connectivity=True, min_size_factor=0.5, max_size_factor=3, slic_zero=False)

A function to perform the slic segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters:

input_img – input image file.
output_img – output image file.
gdalformat – output image file format.
tmp_dir – temp DIR used to output PCA files
calc_stats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
use_pca – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
n_pca_bands – the number of principal component outputs from the PCA - needs to be either 1 or 3.
n_segments – scikit-image Slic parameter: ‘The (approximate) number of labels in the segmented output image.’
compactness – scikit-image Slic parameter: ‘Balances color proximity and space proximity. Higher values give more weight to space proximity, making superpixel shapes more square/cubic. In SLICO mode, this is the initial compactness. This parameter depends strongly on image contrast and on the shapes of objects in the image. We recommend exploring possible values on a log scale, e.g., 0.01, 0.1, 1, 10, 100, before refining around a chosen value.’
max_iter – scikit-image Slic parameter: ‘Maximum number of iterations of k-means.’
sigma – scikit-image Slic parameter: ‘Width of Gaussian smoothing kernel for pre-processing for each dimension of the image. The same sigma is applied to each dimension in case of a scalar value. Zero means no smoothing. Note, that sigma is automatically scaled if it is scalar and a manual voxel spacing is provided (see Notes section).’
spacing – scikit-image Slic parameter: ‘The voxel spacing along each image dimension. By default, slic assumes uniform spacing (same voxel resolution along z, y and x). This parameter controls the weights of the distances along z, y, and x during k-means clustering.’
convert_to_lab – scikit-image Slic parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. The input image must be RGB. Highly recommended.’
enforce_connectivity – scikit-image Slic parameter: ‘Whether the generated segments are connected or not’
min_size_factor – scikit-image Slic parameter: ‘Proportion of the minimum segment size to be removed with respect to the supposed segment size depth*width*height/n_segments’
max_size_factor – scikit-image Slic parameter: ‘Proportion of the maximum connected segment size. A value of 3 works in most of the cases.’
slic_zero – scikit-image Slic parameter: ‘Run SLIC-zero, the zero-parameter mode of SLIC.’

rsgislib.segmentation.skimgseg.perform_watershed_segmentation(input_img, in_markers_img, output_img, gdalformat='KEA', no_data_val=0, tmp_dir='tmp', calc_stats=True, use_pca=False, n_pca_bands=3, pca_pxl_sample=100, compactness=0, watershed_line=False)

A function to perform the watershed segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters:

input_img – input image file.
in_markers_img – input markers image file.
output_img – output image file.
gdalformat – output image file format.
tmp_dir – temp DIR used to output PCA files
calc_stats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.
use_pca – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.
n_pca_bands – the number of principal component outputs from the PCA - needs to be either 1 or 3.
compactness – scikit-image Watershed parameter: ‘Use compact watershed with given compactness parameter. Higher values result in more regularly-shaped watershed basins; Peer Neubert & Peter Protzel (2014). Compact Watershed and Preemptive SLIC: On Improving Trade-offs of Superpixel Segmentation Algorithms. ICPR 2014’
watershed_line – scikit-image Watershed parameter: ‘If watershed_line is True, a one-pixel wide line separates the regions obtained by the watershed algorithm. The line has the label 0.’

Other

rsgislib.segmentation.generate_regular_grid(input_img, output_img, gdalformat, num_x_pxls, num_y_pxls, offset)

A function to generate a clumps image of regular grid cells, with dimensions taken from the input image.

Parameters:

input_img – is a string containing the name of the input image file specifying the dimensions of the output image.
output_img – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
num_x_pxls – is the size of the grid cells in the X axis in pixel units.
num_y_pxls – is the size of the grid cells in the Y axis in pixel units.
offset – is a boolean specifying whether the grid should be offset, i.e., starts at the half-way point of num_x_pxls and num_y_pxls (Default is False; optional)
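As a conceptual illustration of a regular grid of clumps, the pure-Python sketch below assigns each pixel the ID of the grid cell it falls in. The function name regular_grid_sketch is hypothetical, IDs numbered row by row from 1 are an assumption, and the offset option is not modelled.

```python
def regular_grid_sketch(img_width, img_height, num_x_pxls, num_y_pxls):
    """Return a grid of cell IDs for cells of num_x_pxls by num_y_pxls."""
    cells_per_row = -(-img_width // num_x_pxls)  # ceiling division
    return [
        [
            (y // num_y_pxls) * cells_per_row + (x // num_x_pxls) + 1
            for x in range(img_width)
        ]
        for y in range(img_height)
    ]


grid = regular_grid_sketch(5, 4, 2, 2)
```

For a 5 x 4 pixel image and 2 x 2 pixel cells this produces six cells, with the right-hand column of cells only one pixel wide.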

rsgislib.segmentation.drop_selected_clumps(clumps_img, output_img, gdalformat, sel_clumps_col)

A function to drop the selected clumps from the segmentation.

Parameters:

clumps_img – is a string containing the filepath for the input clumps image.
output_img – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
sel_clumps_col – is a string defining the binary column for defining the segments to be dropped (1 == selected clumps).

rsgislib.segmentation.find_tile_borders_mask(input_imgs, border_msk_img, tile_boundary, tile_overlap, tile_body, col_name)

Mask tile borders

Parameters:

input_imgs – is a list of input clump image paths
border_msk_img – is a string containing the name of the border mask file
tile_boundary – is an unsigned integer containing the tile boundary pixel value
tile_overlap – is an unsigned integer containing the tile overlap pixel value
tile_body – is an unsigned integer containing the tile body pixel value
col_name – is a string containing the name of the object id column

rsgislib.segmentation.include_regions_in_clumps(clumps_img, regions_img, output_img, gdalformat)

A function to include a set of clumped regions within an existing clumps (i.e., segmentation) image. NOTE. You should run the relabel_clumps function on the output of this command before using further.

Parameters:

clumps_img – is a string containing the filepath for the input clumps image.
regions_img – is a string containing the filepath for the input regions image.
output_img – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.

rsgislib.segmentation.merge_clump_images(input_imgs, output_img, merge_rats)

Merge all clumps from tile segmentations into output file

Parameters:

input_imgs – is a list of input image paths
output_img – is a string containing the name of the output file
merge_rats – is a boolean specifying whether the image RATs are to be merged (Default: False; Optional)

rsgislib.segmentation.merge_equiv_clumps(clumps_img, output_img, gdalformat, val_columns)

A function to merge neighbouring clumps which have the same value - for example when merging across tile boundaries.

Parameters:

clumps_img – is a string containing the filepath for the input clumps image.
output_img – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
val_columns – is a list of strings defining the value(s) used to define equivalence (typically it might be the original pixel values when clumping through tiling).

rsgislib.segmentation.merge_segments_to_neighbours(clumps_img, input_vals_img, output_img, gdalformat, sel_clumps_col, no_data_clumps_col)

A function to merge some selected clumps with the neighbours based on colour (spectral) distance where clumps identified as no data are ignored.

Parameters:

clumps_img – is a string containing the filepath for the input clumps image.
input_vals_img – is a string containing the filepath for the input image used to define ‘distance’.
output_img – is a string containing the name and path of the output clumps image
gdalformat – is a string defining the format of the output image.
sel_clumps_col – is a string defining the binary column for defining the segments to be merged (1 == selected clumps).
no_data_clumps_col – is a string defining the binary column for defining the segments to be ignored as no data (1 == no-data clumps).