RSGISLib Zonal Stats Module

For undertaking a pixel-in-polygon analysis you need to consider the size of the polygons with respect to the size of the pixels being intersected.

Where the pixels are small relative to the polygons, so that every polygon contains at least one pixel, the best function to use is:

  • rsgislib.zonalstats.calc_zonal_band_stats

If the pixels are large relative to the polygons, use the following function, which samples the image at the polygon centroid:

  • rsgislib.zonalstats.calc_zonal_poly_pts_band_stats

If the pixel size is somewhere in between, or the polygons vary in size so that it is not certain every polygon will contain a pixel, the following function first attempts to intersect the polygon with the pixels and, if no pixel falls within the polygon, uses the centroid instead:

  • rsgislib.zonalstats.calc_zonal_band_stats_test_poly_pts
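The guidance above can be sketched as a small decision helper. This is only a rough heuristic and not part of RSGISLib: the name `choose_zonal_function` and the area ratio of 4 are assumptions made for illustration, and a polygon larger than a pixel is still not guaranteed to contain a pixel centre.

```python
def choose_zonal_function(pixel_area, polygon_areas, ratio=4.0):
    """Suggest an rsgislib.zonalstats function name (hypothetical helper).

    ratio is an assumed safety margin: a polygon is treated as 'large'
    when its area is at least `ratio` times the pixel area.
    """
    if not polygon_areas:
        raise ValueError("polygon_areas must not be empty")
    large = [area >= ratio * pixel_area for area in polygon_areas]
    if all(large):
        # Every polygon should contain at least one pixel.
        return "calc_zonal_band_stats"
    if not any(large):
        # Pixels dominate: sample at the polygon centroid.
        return "calc_zonal_poly_pts_band_stats"
    # Mixed sizes: try pixels first, fall back to the centroid.
    return "calc_zonal_band_stats_test_poly_pts"


# 10 m pixels (100 m^2) against polygons of mixed size.
print(choose_zonal_function(100.0, [50.0, 2500.0, 90000.0]))
```

When in doubt, the mixed-size option is the conservative choice, at the cost of an extra test per polygon.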

Points

These functions extract values from an image for a set of vector points.

rsgislib.zonalstats.ext_point_band_values_file(vec_file, vec_lyr, input_img, img_band, min_thres, max_thres, out_no_data_val, out_field, reproj_vec=False, vec_def_epsg=None)

A function which extracts point values for an input vector file for a particular image band.

Parameters
  • vec_file – input vector file

  • vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.

  • input_img – the values image

  • img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined the no data value of the band will be ignored.

  • min_thres – a lower threshold for values which will be included in the stats calculation.

  • max_thres – an upper threshold for values which will be included in the stats calculation.

  • out_no_data_val – output no data value if no valid pixels are within the polygon.

  • out_field – the name of the field in the vector layer where the pixel values will be written.

  • reproj_vec – boolean to specify whether the vector layer should be reprojected on the fly during processing if the projections are different. Default: False, to ensure reprojection is the user's intention.

  • vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the input vector layer.
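As a usage sketch, the keyword names below follow the signature given above. The file paths, layer name, field name, threshold values and the use of an integer EPSG code are all assumptions for illustration and should be replaced with values that suit your data.

```python
POINT_ARGS = {
    "vec_file": "./Vectors/sample_points.gpkg",  # hypothetical path
    "vec_lyr": "sample_points",                  # hypothetical layer
    "input_img": "./Rasters/reflectance.kea",    # hypothetical image
    "img_band": 1,
    "min_thres": 0,           # example thresholds only
    "max_thres": 10000,
    "out_no_data_val": -9999,
    "out_field": "b1_val",
    "reproj_vec": True,       # opt in to on-the-fly reprojection
    "vec_def_epsg": 4326,     # assumes the layer is in WGS 84
}


def extract_point_values(args=POINT_ARGS):
    """Write the band value under each point to the out_field attribute."""
    # Imported lazily so the sketch loads without RSGISLib installed.
    import rsgislib.zonalstats

    rsgislib.zonalstats.ext_point_band_values_file(**args)
```

Setting reproj_vec to True is a deliberate choice here; leaving it at the default of False is safer when you are not sure the two projections should differ.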

rsgislib.zonalstats.ext_point_band_values(vec_lyr_obj, input_img, img_band, min_thres, max_thres, out_no_data_val, out_field, reproj_vec=False, vec_def_epsg=None)

A function which extracts point values for an input vector layer for a particular image band.

Parameters
  • vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the stats will be written.

  • input_img – the values image

  • img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined the no data value of the band will be ignored.

  • min_thres – a lower threshold for values which will be included in the stats calculation.

  • max_thres – an upper threshold for values which will be included in the stats calculation.

  • out_no_data_val – output no data value if no valid pixels are within the polygon.

  • out_field – the name of the field in the vector layer where the pixel values will be written.

  • reproj_vec – boolean to specify whether the vector layer should be reprojected on the fly during processing if the projections are different. Default: False, to ensure reprojection is the user's intention.

  • vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the input vector layer.

Polygons

rsgislib.zonalstats.calc_zonal_band_stats_file(vec_file, vec_lyr, input_img, img_band, min_thres, max_thres, out_no_data_val, min_field=None, max_field=None, mean_field=None, stddev_field=None, sum_field=None, count_field=None, mode_field=None, median_field=None, vec_def_epsg=None)

A function which calculates zonal statistics for a particular image band. If you know that the pixels in the values image are small with respect to the polygons then use this function.

Parameters
  • vec_file – input vector file

  • vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.

  • input_img – the values image

  • img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined the no data value of the band will be ignored.

  • min_thres – a lower threshold for values which will be included in the stats calculation.

  • max_thres – an upper threshold for values which will be included in the stats calculation.

  • out_no_data_val – output no data value if no valid pixels are within the polygon.

  • min_field – the name of the field for the min value (None or not specified to be ignored).

  • max_field – the name of the field for the max value (None or not specified to be ignored).

  • mean_field – the name of the field for the mean value (None or not specified to be ignored).

  • stddev_field – the name of the field for the standard deviation value (None or not specified to be ignored).

  • sum_field – the name of the field for the sum value (None or not specified to be ignored).

  • count_field – the name of the field for the count (of number of pixels) value (None or not specified to be ignored).

  • mode_field – the name of the field for the mode value (None or not specified to be ignored).

  • median_field – the name of the field for the median value (None or not specified to be ignored).

  • vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the input vector layer.
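To illustrate how min_thres, max_thres and out_no_data_val interact, the following pure-Python sketch summarises the pixel values of a single polygon. It is not RSGISLib's implementation: the helper `summarise_zone`, the inclusive thresholds, the population standard deviation and the count of 0 for an empty zone are all assumptions.

```python
import statistics


def summarise_zone(values, min_thres, max_thres, out_no_data_val):
    """Summarise the pixel values of one polygon (illustrative only)."""
    # Assumption: both thresholds are treated as inclusive bounds.
    valid = [v for v in values if min_thres <= v <= max_thres]
    if not valid:
        # No usable pixels: every statistic gets the no data value.
        names = ("min", "max", "mean", "stddev", "sum", "median")
        stats = {name: out_no_data_val for name in names}
        stats["count"] = 0  # assumption: an empty zone has a count of 0
        return stats
    return {
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.fmean(valid),
        "stddev": statistics.pstdev(valid),
        "sum": sum(valid),
        "median": statistics.median(valid),
        "count": len(valid),
    }


# 9999 lies above max_thres, so it is excluded from every statistic.
print(summarise_zone([12, 15, 9999, 18], 0, 1000, -999))
```

Choosing thresholds that bracket the valid data range is a simple way to keep fill values out of the statistics.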

rsgislib.zonalstats.calc_zonal_band_stats(vec_lyr_obj, input_img, img_band, min_thres, max_thres, out_no_data_val, min_field=None, max_field=None, mean_field=None, stddev_field=None, sum_field=None, count_field=None, mode_field=None, median_field=None, vec_def_epsg=None)

A function which calculates zonal statistics for a particular image band. If you know that the pixels in the values image are small with respect to the polygons then use this function.

Parameters
  • vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the stats will be written.

  • input_img – the values image

  • img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined the no data value of the band will be ignored.

  • min_thres – a lower threshold for values which will be included in the stats calculation.

  • max_thres – an upper threshold for values which will be included in the stats calculation.

  • out_no_data_val – output no data value if no valid pixels are within the polygon.

  • min_field – the name of the field for the min value (None or not specified to be ignored).

  • max_field – the name of the field for the max value (None or not specified to be ignored).

  • mean_field – the name of the field for the mean value (None or not specified to be ignored).

  • stddev_field – the name of the field for the standard deviation value (None or not specified to be ignored).

  • sum_field – the name of the field for the sum value (None or not specified to be ignored).

  • count_field – the name of the field for the count (of number of pixels) value (None or not specified to be ignored).

  • mode_field – the name of the field for the mode value (None or not specified to be ignored).

  • median_field – the name of the field for the median value (None or not specified to be ignored).

  • vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the input vector layer.
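The layer-object variant expects an OGR layer that is open for writing. A minimal sketch, assuming the GDAL Python bindings are installed, is shown below; the helper name, the field names and the threshold values are hypothetical. Field names are kept to 10 characters or fewer because the ESRI Shapefile format truncates longer names.

```python
# Hypothetical output field names (<= 10 characters for Shapefiles).
FIELD_NAMES = {"mean": "b1_mean", "stddev": "b1_stddev"}


def add_band_mean_and_stddev(vec_file, vec_lyr, input_img, img_band=1):
    """Write per-polygon mean and standard deviation to a vector layer."""
    # Imported here so the sketch can be loaded without GDAL or
    # RSGISLib installed.
    from osgeo import gdal
    import rsgislib.zonalstats

    # Open the dataset in update mode so new fields can be written.
    vec_ds = gdal.OpenEx(vec_file, gdal.OF_VECTOR | gdal.OF_UPDATE)
    if vec_ds is None:
        raise RuntimeError(f"Could not open {vec_file}")
    vec_lyr_obj = vec_ds.GetLayerByName(vec_lyr)
    if vec_lyr_obj is None:
        raise RuntimeError(f"Layer {vec_lyr} not found in {vec_file}")

    rsgislib.zonalstats.calc_zonal_band_stats(
        vec_lyr_obj,
        input_img,
        img_band,
        min_thres=0,          # example thresholds only
        max_thres=10000,
        out_no_data_val=-9999,
        mean_field=FIELD_NAMES["mean"],
        stddev_field=FIELD_NAMES["stddev"],
    )
    # Dereference the layer and dataset so GDAL flushes edits to disk.
    vec_lyr_obj = None
    vec_ds = None
```

Statistics whose field argument is left as None are skipped, so only the two requested fields are written.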

rsgislib.zonalstats.calc_zonal_poly_pts_band_stats_file(vec_file, vec_lyr, input_img, img_band, out_field, vec_def_epsg=None)

A function which extracts zonal stats for a polygon using the polygon centroid. This is useful when the image resolution is low relative to the size of the polygons.

Parameters
  • vec_file – input vector file

  • vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.

  • input_img – the values image

  • img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined the no data value of the band will be ignored.

  • out_field – output field name within the vector layer.

  • vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the input vector layer.

rsgislib.zonalstats.calc_zonal_poly_pts_band_stats(vec_lyr_obj, input_img, img_band, out_field, vec_def_epsg=None)

A function which extracts zonal stats for a polygon using the polygon centroid. This is useful when the image resolution is low relative to the size of the polygons.

Parameters
  • vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the stats will be written.

  • input_img – the values image

  • img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined the no data value of the band will be ignored.

  • out_field – output field name within the vector layer.

  • vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the input vector layer.
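The idea behind centroid sampling can be sketched in pure Python: compute the polygon centroid, convert it to a pixel row and column, and read that pixel. This is an illustration of the concept rather than RSGISLib's code; both helper names are hypothetical and a north-up GDAL-style geotransform is assumed.

```python
import math


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross              # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3.0 * area2), cy / (3.0 * area2)


def world_to_pixel(x, y, geotransform):
    """Map a coordinate to (row, col) for a north-up GDAL geotransform."""
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((y - origin_y) / pixel_h)  # pixel_h is negative
    return row, col


image = [[7, 8], [9, 10]]                       # 2 x 2 image, 30 m pixels
geotransform = (100.0, 30.0, 0.0, 200.0, 0.0, -30.0)
ring = [(135, 165), (145, 165), (145, 155), (135, 155)]  # 10 m square

x, y = polygon_centroid(ring)
row, col = world_to_pixel(x, y, geotransform)
print(image[row][col])
```

The 10 m square is far smaller than a 30 m pixel, which is the situation where a single centroid sample is a reasonable stand-in for the polygon.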

rsgislib.zonalstats.calc_zonal_band_stats_test_poly_pts_file(vec_file, vec_lyr, input_img, img_band, min_thres, max_thres, out_no_data_val, percentile=None, percentile_field=None, min_field=None, max_field=None, mean_field=None, stddev_field=None, sum_field=None, count_field=None, mode_field=None, median_field=None, vec_def_epsg=None)

A function which calculates zonal statistics for a particular image band. This function tests whether one or more pixels have been found within the polygon and, if not, the centroid is used to find a value for the polygon.

If you are unsure as to whether the pixels are small enough to be contained within all the polygons then use this function.

Parameters
  • vec_file – input vector file

  • vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.

  • input_img – the values image

  • img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined the no data value of the band will be ignored.

  • min_thres – a lower threshold for values which will be included in the stats calculation.

  • max_thres – an upper threshold for values which will be included in the stats calculation.

  • out_no_data_val – output no data value if no valid pixels are within the polygon.

  • percentile – the percentile value to calculate (value between 0 and 100 inclusive).

  • percentile_field – the name of the field for the percentile value (None or not specified to be ignored).

  • min_field – the name of the field for the min value (None or not specified to be ignored).

  • max_field – the name of the field for the max value (None or not specified to be ignored).

  • mean_field – the name of the field for the mean value (None or not specified to be ignored).

  • stddev_field – the name of the field for the standard deviation value (None or not specified to be ignored).

  • sum_field – the name of the field for the sum value (None or not specified to be ignored).

  • count_field – the name of the field for the count (of number of pixels) value (None or not specified to be ignored).

  • mode_field – the name of the field for the mode value (None or not specified to be ignored).

  • median_field – the name of the field for the median value (None or not specified to be ignored).

  • vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the input vector layer.

rsgislib.zonalstats.calc_zonal_band_stats_test_poly_pts(vec_lyr_obj, input_img, img_band, min_thres, max_thres, out_no_data_val, percentile=None, percentile_field=None, min_field=None, max_field=None, mean_field=None, stddev_field=None, sum_field=None, count_field=None, mode_field=None, median_field=None, vec_def_epsg=None)

A function which calculates zonal statistics for a particular image band. This function tests whether one or more pixels have been found within the polygon and, if not, the centroid is used to find a value for the polygon.

If you are unsure as to whether the pixels are small enough to be contained within all the polygons then use this function.

Parameters
  • vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the stats will be written.

  • input_img – the values image

  • img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined the no data value of the band will be ignored.

  • min_thres – a lower threshold for values which will be included in the stats calculation.

  • max_thres – an upper threshold for values which will be included in the stats calculation.

  • out_no_data_val – output no data value if no valid pixels are within the polygon.

  • percentile – the percentile value to calculate (value between 0 and 100 inclusive).

  • percentile_field – the name of the field for the percentile value (None or not specified to be ignored).

  • min_field – the name of the field for the min value (None or not specified to be ignored).

  • max_field – the name of the field for the max value (None or not specified to be ignored).

  • mean_field – the name of the field for the mean value (None or not specified to be ignored).

  • stddev_field – the name of the field for the standard deviation value (None or not specified to be ignored).

  • sum_field – the name of the field for the sum value (None or not specified to be ignored).

  • count_field – the name of the field for the count (of number of pixels) value (None or not specified to be ignored).

  • mode_field – the name of the field for the mode value (None or not specified to be ignored).

  • median_field – the name of the field for the median value (None or not specified to be ignored).

  • vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the input vector layer.
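The test-then-fall-back behaviour described for these functions can be sketched as follows. To keep the example short, zones are simplified to axis-aligned rectangles and a pixel counts as inside when its centre lies in the rectangle; both simplifications, and the helper name, are assumptions and do not reflect RSGISLib's own implementation.

```python
def zone_values_with_fallback(image, geotransform, bbox):
    """Pixel values for an axis-aligned zone, falling back to its centre.

    image is a list of rows; bbox is (min_x, min_y, max_x, max_y) and is
    assumed to lie inside the image extent.
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    min_x, min_y, max_x, max_y = bbox
    values = []
    for row, line in enumerate(image):
        for col, value in enumerate(line):
            # Coordinates of the pixel centre.
            x = origin_x + (col + 0.5) * pixel_w
            y = origin_y + (row + 0.5) * pixel_h
            if min_x <= x <= max_x and min_y <= y <= max_y:
                values.append(value)
    if values:
        return values, "pixels"
    # No pixel centre inside the zone: sample the pixel under its centre.
    col = int(((min_x + max_x) / 2 - origin_x) // pixel_w)
    row = int(((min_y + max_y) / 2 - origin_y) // pixel_h)
    return [image[row][col]], "centroid"


image = [[1, 2], [3, 4]]                        # 2 x 2 image, 10 m pixels
geotransform = (0.0, 10.0, 0.0, 20.0, 0.0, -10.0)
print(zone_values_with_fallback(image, geotransform, (0, 0, 20, 12)))
print(zone_values_with_fallback(image, geotransform, (11, 11, 13, 13)))
```

The first zone covers the bottom row of pixel centres, while the second is a 2 m square that contains no pixel centre and so falls back to the single pixel beneath it.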

Extracting Pixels to HDF5

rsgislib.zonalstats.image_zone_to_hdf(input_img, vec_file, vec_lyr, out_h5_file, no_prj_warn=False, pxl_in_poly_method=METHOD_POLYCONTAINSPIXELCENTER)

Extract all the pixel values for regions to a HDF5 file (1 column for each image band).

Parameters
  • input_img – is a string containing the name of the input image.

  • vec_file – is a string containing the input vector file path.

  • vec_lyr – is a string containing the name of the input vector layer.

  • out_h5_file – is a string containing the name of the output HDF5 file.

  • no_prj_warn – is a bool, specifying whether to skip printing a warning if the vector and image have different projections.

  • pxl_in_poly_method – is the method for determining if a pixel is included within a polygon, of type rsgislib.zonalstats.METHOD_*.

from rsgislib import zonalstats
input_img = './Rasters/injune_p142_casi_sub_utm.kea'
vec_file = './Vectors/injune_p142_crowns_utm.shp'
vec_lyr = 'injune_p142_crowns_utm'
out_h5_file = './TestOutputs/InjuneP142.hdf'
zonalstats.image_zone_to_hdf(input_img, vec_file, vec_lyr, out_h5_file, True, zonalstats.METHOD_POLYCONTAINSPIXELCENTER)

rsgislib.zonalstats.extract_zone_img_values_to_hdf(input_img, in_msk_img, out_h5_file, mask_val, datatype)

Extract all the pixel values for raster regions to a HDF5 file (1 column for each image band).

Parameters
  • input_img – is a string containing the name and path of the input file

  • in_msk_img – is a string containing the name and path of the input image mask file; the mask file must have only 1 image band.

  • out_h5_file – is a string containing the name and path of the output HDF5 file

  • mask_val – is a float containing the value of the pixel within the mask for which values are to be extracted

  • datatype – is a rsgislib.TYPE_* value providing the data type of the output image.

rsgislib.zonalstats.extract_zone_img_band_values_to_hdf(in_img_info, in_msk_img, out_h5_file, mask_val, datatype)

Extract all the pixel values for raster regions to a HDF5 file (1 column for each image band). Multiple input rasters can be provided and the bands to be extracted selected.

Parameters
  • in_img_info – is a list of rsgislib.imageutils.ImageBandInfo objects with the file names and list of image bands within that file to be extracted.

  • in_msk_img – is a string containing the name and path of the input image mask file; the mask file must have only 1 image band.

  • out_h5_file – is a string containing the name and path of the output HDF5 file

  • mask_val – is a float containing the value of the pixel within the mask for which values are to be extracted

  • datatype – is a rsgislib.TYPE_* value providing the data type of the output image.

import rsgislib.zonalstats
import rsgislib.imageutils
fileInfo = []
fileInfo.append(rsgislib.imageutils.ImageBandInfo('InputImg1.kea', 'Image1', [1,3,4]))
fileInfo.append(rsgislib.imageutils.ImageBandInfo('InputImg2.kea', 'Image2', [2]))
# datatype is a required argument; TYPE_32FLOAT is used here as an example.
rsgislib.zonalstats.extract_zone_img_band_values_to_hdf(fileInfo, 'ClassMask.kea', 'ForestRefl.h5', 1.0, rsgislib.TYPE_32FLOAT)

rsgislib.zonalstats.random_sample_hdf5_file(in_h5_file, out_h5_file, sample, rnd_seed, datatype)

A function which randomly samples a HDF5 file of extracted values.

Parameters
  • in_h5_file – is a string with the path to the input file.

  • out_h5_file – is a string with the path to the output file.

  • sample – is an integer with the number of values to be sampled from the input file.

  • rnd_seed – is an integer which seeds the random number generator.

  • datatype – is a rsgislib.TYPE_* value providing the data type of the output image.

rsgislib.zonalstats.split_sample_hdf5_file(in_h5_file, out_h5_p1_file, out_h5_p2_file, sample, rnd_seed, datatype)

A function which splits the samples in a HDF5 file of extracted values into two output files.

Parameters
  • in_h5_file – is a string with the path to the input file.

  • out_h5_p1_file – is a string with the path to the first output file.

  • out_h5_p2_file – is a string with the path to the second output file.

  • sample – is an integer with the number of values to be sampled from the input file.

  • rnd_seed – is an integer which seeds the random number generator.

  • datatype – is a rsgislib.TYPE_* value providing the data type of the output image.

rsgislib.zonalstats.merge_extracted_hdf5_data(h5_files, out_h5_file, datatype=None)

A function to merge a list of HDF files (e.g., from rsgislib.zonalstats.extract_zone_img_band_values_to_hdf) with the same number of variables (i.e., columns) into a single file. For example, if class training regions have been sourced from multiple images.

Parameters
  • h5_files – a list of input files.

  • out_h5_file – the output file.

  • datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.

import rsgislib.zonalstats
inTrainSamples = ['MSS_CloudTrain1.h5', 'MSS_CloudTrain2.h5',
                  'MSS_CloudTrain3.h5']
cloudTrainSamples = 'LandsatMSS_CloudTrainingSamples.h5'
rsgislib.zonalstats.merge_extracted_hdf5_data(inTrainSamples, cloudTrainSamples)

Extracting Image Chips to HDF5

rsgislib.zonalstats.extract_chip_zone_image_band_values_to_hdf(input_image_info, image_mask, mask_value, chip_size, output_hdf, rotate_chips=None, datatype=None)

A function which extracts a chip/window of image pixel values. The expectation is that this is used to train a classifier (see deep learning functions in classification) but it could be used to extract image ‘chips’ for other purposes.

Parameters
  • input_image_info – is a list of rsgislib.imageutils.ImageBandInfo objects specifying the input images and bands

  • image_mask – is a single band input image to specify the regions of interest

  • mask_value – is the pixel value within image_mask to specify the region of interest

  • chip_size – is the chip size.

  • output_hdf – is the output HDF5 file. If it already exists then it is overwritten.

  • rotate_chips – specify whether you wish to have the image chips rotated during extraction to increase the number of samples. Default is None and will therefore be ignored. Otherwise, provide a list of rotation angles in degrees (e.g., [30, 60, 90, 120, 180])

  • datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.

rsgislib.zonalstats.split_sample_chip_hdf5_file(input_h5_file, sample_h5_file, remain_h5_file, sample_size, rnd_seed, datatype=None)

A function to split the HDF5 outputs from the rsgislib.zonalstats.extract_chip_zone_image_band_values_to_hdf function into two sets by taking a random set with the defined sample size from the input file, saving the sample and the remainder to output HDF5 files.

Parameters
  • input_h5_file – The input HDF5 file to be split.

  • sample_h5_file – The output HDF5 file with the sample outputted.

  • remain_h5_file – The output HDF5 file with the remainder outputted.

  • sample_size – An integer specifying the size of the sample to be taken.

  • rnd_seed – An integer specifying the seed for the random number generator, allowing the same ‘random’ sample to be taken.

  • datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
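The effect of sample_size and rnd_seed can be illustrated with a pure-Python sketch that splits a list of samples into a random subset and the remainder. The helper `split_rows` is hypothetical and uses Python's own random number generator, so it will not reproduce the exact samples RSGISLib selects; it only shows why a fixed seed makes the split repeatable.

```python
import random


def split_rows(rows, sample_size, rnd_seed):
    """Split rows into a random sample and the remainder (illustrative)."""
    if sample_size > len(rows):
        raise ValueError("sample_size is larger than the number of rows")
    rng = random.Random(rnd_seed)  # private generator with a fixed seed
    chosen = set(rng.sample(range(len(rows)), sample_size))
    sample = [row for i, row in enumerate(rows) if i in chosen]
    remain = [row for i, row in enumerate(rows) if i not in chosen]
    return sample, remain


rows = list(range(10))
first = split_rows(rows, 3, rnd_seed=42)
second = split_rows(rows, 3, rnd_seed=42)
print(first == second)  # the same seed gives the same split
```

Keeping the seed alongside the outputs makes it possible to regenerate the same training and testing sets later.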

rsgislib.zonalstats.merge_extracted_hdf5_chip_data(h5_files, out_h5_file, datatype=None)

A function to merge a list of HDF files (e.g., from rsgislib.zonalstats.extract_chip_zone_image_band_values_to_hdf) with the same number of variables (i.e., image bands) and chip size into a single file. For example, if class training regions have been sourced from multiple images.

Parameters
  • h5_files – a list of input files.

  • out_h5_file – the output file.

  • datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.

import rsgislib.zonalstats
inTrainSamples = ['MSS_CloudTrain1.h5', 'MSS_CloudTrain2.h5',
                  'MSS_CloudTrain3.h5']
cloudTrainSamples = 'LandsatMSS_CloudTrainingSamples.h5'
rsgislib.zonalstats.merge_extracted_hdf5_chip_data(inTrainSamples,
                                                   cloudTrainSamples)

rsgislib.zonalstats.extract_ref_chip_zone_image_band_values_to_hdf(input_image_info, ref_img, ref_img_band, image_mask, mask_value, chip_size, output_hdf, rotate_chips=None, datatype=None)

A function which extracts a chip/window of image pixel values. The expectation is that this is used to train a classifier (see deep learning functions in classification) but it could be used to extract image ‘chips’ for other purposes.

Parameters
  • input_image_info – is a list of rsgislib.imageutils.ImageBandInfo objects specifying the input images and bands

  • ref_img – is an image file (same pixel size and projection as the other input images) which is used as the class training

  • ref_img_band – is the image band in the reference image to be used (only a single reference band can be used).

  • image_mask – is a single band input image to specify the regions of interest

  • mask_value – is the pixel value within image_mask to specify the region of interest

  • chip_size – is the chip size.

  • output_hdf – is the output HDF5 file. If it already exists then it is overwritten.

  • rotate_chips – specify whether you wish to have the image chips rotated during extraction to increase the number of samples. Default is None and will therefore be ignored. Otherwise, provide a list of rotation angles in degrees (e.g., [30, 60, 90, 120, 180])

  • datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.

rsgislib.zonalstats.split_sample_ref_chip_hdf5_file(input_h5_file, sample_h5_file, remain_h5_file, sample_size, rnd_seed, datatype=None)

A function to split the HDF5 outputs from the rsgislib.zonalstats.extract_ref_chip_zone_image_band_values_to_hdf function into two sets by taking a random set with the defined sample size from the input file, saving the sample and the remainder to output HDF5 files.

Parameters
  • input_h5_file – The input HDF5 file to be split.

  • sample_h5_file – The output HDF5 file with the sample outputted.

  • remain_h5_file – The output HDF5 file with the remainder outputted.

  • sample_size – An integer specifying the size of the sample to be taken.

  • rnd_seed – An integer specifying the seed for the random number generator, allowing the same ‘random’ sample to be taken.

  • datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.

rsgislib.zonalstats.merge_extracted_hdf5_chip_ref_data(h5_files, out_h5_file, datatype=None)

A function to merge a list of HDF files (e.g., from rsgislib.zonalstats.extract_ref_chip_zone_image_band_values_to_hdf) with the same number of variables (i.e., image bands) and chip size into a single file. For example, if class training regions have been sourced from multiple images.

Parameters
  • h5_files – a list of input files.

  • out_h5_file – the output file.

  • datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.

import rsgislib.zonalstats
inTrainSamples = ['MSS_CloudTrain1.h5', 'MSS_CloudTrain2.h5',
                  'MSS_CloudTrain3.h5']
cloudTrainSamples = 'LandsatMSS_CloudTrainingSamples.h5'
rsgislib.zonalstats.merge_extracted_hdf5_chip_ref_data(inTrainSamples,
                                                       cloudTrainSamples)

HDF Utilities

rsgislib.zonalstats.msk_h5_smpls_to_finite_values(input_h5, output_h5, datatype=None, lower_limit=None, upper_limit=None)

A function to remove values from a HDF5 sample file which are not finite. Upper and lower values can also be specified.

Parameters
  • input_h5 – Input HDF5 file.

  • output_h5 – Output HDF5 file.

  • datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.

  • lower_limit – Optional lower value threshold (if None then not used).

  • upper_limit – Optional upper value threshold (if None then not used).
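What this kind of masking does can be shown with a pure-Python sketch over a list of samples. It is an illustration only: the helper name is hypothetical, and it assumes that a whole sample is dropped when any of its values is non-finite or falls outside the limits, with the limits themselves treated as valid values.

```python
import math


def keep_finite_rows(rows, lower_limit=None, upper_limit=None):
    """Drop samples containing non-finite or out-of-range values."""
    kept = []
    for row in rows:
        if not all(math.isfinite(v) for v in row):
            continue  # NaN or +/-inf present
        if lower_limit is not None and min(row) < lower_limit:
            continue
        if upper_limit is not None and max(row) > upper_limit:
            continue
        kept.append(row)
    return kept


samples = [
    [0.1, 0.2],
    [float("nan"), 0.3],
    [0.5, float("inf")],
    [2.0, 0.4],
]
print(keep_finite_rows(samples))
print(keep_finite_rows(samples, lower_limit=0.0, upper_limit=1.0))
```

Filtering before training avoids classifiers failing on NaN or infinite inputs.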

rsgislib.zonalstats.get_hdf5_data(h5_files: List[str]) -> numpy.array

A function to get the data from a list of HDF files (e.g., from rsgislib.zonalstats.extract_zone_img_band_values_to_hdf).

Parameters
  • h5_files – a list of input files.

Returns

numpy array with the data or None if there is no data to return.

rsgislib.zonalstats.get_var_from_hdf5_data(h5_files: List[str], var_idx: int = 0) -> numpy.array

A function to get the data for a specific variable from a list of HDF files (e.g., from rsgislib.zonalstats.extract_zone_img_band_values_to_hdf).

Parameters
  • h5_files – a list of input files.

  • var_idx – the index for the variable of interest. Note that array indexing starts at 0, so image band 2 is index 1, and so on.

Returns

numpy array with the data or None if there is no data to return.
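When only some bands were extracted, or bands came from several images, working out var_idx takes a little bookkeeping. The sketch below assumes that columns are written in the order the images and bands were listed at extraction time; that ordering, and the helper `column_index`, are assumptions rather than documented behaviour, so check the result against your own files.

```python
def column_index(band_layout, image_name, band):
    """Column index of one band, assuming columns follow the list order.

    band_layout is a list of (image_name, [bands]) pairs mirroring the
    ImageBandInfo objects used for extraction.
    """
    offset = 0
    for name, bands in band_layout:
        if name == image_name:
            return offset + bands.index(band)
        offset += len(bands)
    raise KeyError(image_name)


layout = [("Image1", [1, 3, 4]), ("Image2", [2])]
print(column_index(layout, "Image1", 3))  # 1
print(column_index(layout, "Image2", 2))  # 3
```

The returned value could then be passed as var_idx to get_var_from_hdf5_data.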