RSGISLib Zonal Stats Module
For a pixel-in-polygon analysis, you need to consider the size of the polygons relative to the size of the pixels being intersected.
Where the pixels are small relative to the polygons, so that there is at least one pixel within each polygon, the best function to use is:
rsgislib.zonalstats.calc_zonal_band_stats
If the pixels are large relative to the polygons, use the following function, which intersects the polygon centroid:
rsgislib.zonalstats.calc_zonal_poly_pts_band_stats
If the pixel size is in between, and/or the polygons vary in size such that it is not certain that every polygon will contain a pixel, use the following function. It first attempts to intersect the polygon with the pixels and, if there is no pixel within the polygon, uses the centroid instead.
rsgislib.zonalstats.calc_zonal_band_stats_test_poly_pts
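The choice between these three functions comes down to whether any pixel falls inside each polygon. The pure-Python sketch below illustrates that logic without calling RSGISLib. It assumes a north-up grid with square pixels of size `px` and top-left corner `(x0, y0)`, treats a pixel as inside when its centre passes a ray-casting point-in-polygon test, and otherwise falls back to the pixel under the polygon centroid (approximated here by the mean of the vertices). The helper names are hypothetical.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    poly is a list of (x, y) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count crossings of a horizontal ray running to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixels_for_polygon(poly, x0, y0, px, n_rows, n_cols):
    """Return (strategy, [(row, col), ...]) for one polygon on a north-up grid."""
    hits = []
    for row in range(n_rows):
        for col in range(n_cols):
            # Centre of pixel (row, col); y decreases downwards.
            cx = x0 + (col + 0.5) * px
            cy = y0 - (row + 0.5) * px
            if point_in_polygon(cx, cy, poly):
                hits.append((row, col))
    if hits:
        return "pixels", hits
    # No pixel centre inside: fall back to the pixel containing the centroid
    # (vertex mean used here as a simplification).
    mx = sum(p[0] for p in poly) / len(poly)
    my = sum(p[1] for p in poly) / len(poly)
    col = int((mx - x0) // px)
    row = int((y0 - my) // px)
    return "centroid", [(row, col)]
```

For example, on a 1-unit grid a 4 x 4 square collects 16 pixel centres, whereas a 0.3 x 0.3 square lying between pixel centres falls back to the single pixel under its centroid.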
Points
These functions extract values from an image for a set of vector points.
- rsgislib.zonalstats.ext_point_band_values_file(vec_file: str, vec_lyr: str, input_img: str, img_band: int, min_thres: float, max_thres: float, out_no_data_val: float, out_field: str, reproj_vec: bool = False, vec_def_epsg: int = None)
A function which extracts point values for an input vector file for a particular image band.
- Parameters:
vec_file – input vector file
vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band from which values will be extracted. If defined, the no data value of the band will be ignored.
min_thres – a lower threshold for values which will be included in the stats calculation.
max_thres – an upper threshold for values which will be included in the stats calculation.
out_no_data_val – output no data value used if there is no valid pixel value at the point.
out_field – the name of the field in the vector layer where the pixel values will be written.
reproj_vec – boolean to specify whether the vector layer should be reprojected on the fly during processing if the projections are different. Default: False, to ensure it is the user's intention.
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
- rsgislib.zonalstats.ext_point_band_values(vec_lyr_obj: Layer, input_img: str, img_band: int, min_thres: float, max_thres: float, out_no_data_val: float, out_field: str, reproj_vec: bool = False, vec_def_epsg: int = None)
A function which extracts point values from an input vector layer for a particular image band.
- Parameters:
vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the values will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band from which values will be extracted. If defined, the no data value of the band will be ignored.
min_thres – a lower threshold for values which will be included in the stats calculation.
max_thres – an upper threshold for values which will be included in the stats calculation.
out_no_data_val – output no data value used if there is no valid pixel value at the point.
out_field – the name of the field in the vector layer where the pixel values will be written.
reproj_vec – boolean to specify whether the vector layer should be reprojected on the fly during processing if the projections are different. Default: False, to ensure it is the user's intention.
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
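As a rough illustration of what point extraction involves, the sketch below maps a point to a pixel using a GDAL-style geotransform `(x_origin, pixel_width, 0, y_origin, 0, pixel_height)` (assumed north-up, with no rotation terms) and then applies `min_thres`, `max_thres` and `out_no_data_val` as the parameter descriptions suggest. Treating the thresholds as inclusive is an assumption, and `extract_point_value` is a hypothetical helper working on an in-memory band, not part of RSGISLib.

```python
def extract_point_value(band, geo_transform, x, y,
                        min_thres, max_thres, out_no_data_val, img_no_data=None):
    """Return the value of band (a list of rows) at map coordinate (x, y)."""
    # pixel_height is negative for a north-up image.
    x_origin, px_w, _, y_origin, _, px_h = geo_transform
    col = int((x - x_origin) // px_w)
    row = int((y - y_origin) // px_h)
    if not (0 <= row < len(band) and 0 <= col < len(band[0])):
        return out_no_data_val  # point falls outside the image
    value = band[row][col]
    if img_no_data is not None and value == img_no_data:
        return out_no_data_val  # the band's own no data value is ignored
    if not (min_thres <= value <= max_thres):
        return out_no_data_val  # assumed inclusive thresholds
    return value
```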
Polygons
- rsgislib.zonalstats.calc_zonal_band_stats_file(vec_file: str, vec_lyr: str, input_img: str, img_band: int, min_thres: float, max_thres: float, out_no_data_val: float, min_field: str = None, max_field: str = None, mean_field: str = None, stddev_field: str = None, sum_field: str = None, count_field: str = None, mode_field: str = None, median_field: str = None, vec_def_epsg: int = None)
A function which calculates zonal statistics for a particular image band. If you know that the pixels in the values image are small relative to the polygons, use this function.
- Parameters:
vec_file – input vector file
vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined, the no data value of the band will be ignored.
min_thres – a lower threshold for values which will be included in the stats calculation.
max_thres – an upper threshold for values which will be included in the stats calculation.
out_no_data_val – output no data value if no valid pixels are within the polygon.
min_field – the name of the field for the min value (None or not specified to be ignored).
max_field – the name of the field for the max value (None or not specified to be ignored).
mean_field – the name of the field for the mean value (None or not specified to be ignored).
stddev_field – the name of the field for the standard deviation value (None or not specified to be ignored).
sum_field – the name of the field for the sum value (None or not specified to be ignored).
count_field – the name of the field for the count (of number of pixels) value (None or not specified to be ignored).
mode_field – the name of the field for the mode value (None or not specified to be ignored).
median_field – the name of the field for the median value (None or not specified to be ignored).
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
- rsgislib.zonalstats.calc_zonal_band_stats(vec_lyr_obj: Layer, input_img: str, img_band: int, min_thres: float, max_thres: float, out_no_data_val: float, min_field: str = None, max_field: str = None, mean_field: str = None, stddev_field: str = None, sum_field: str = None, count_field: str = None, mode_field: str = None, median_field: str = None, vec_def_epsg: int = None)
A function which calculates zonal statistics for a particular image band. If you know that the pixels in the values image are small relative to the polygons, use this function.
- Parameters:
vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined, the no data value of the band will be ignored.
min_thres – a lower threshold for values which will be included in the stats calculation.
max_thres – an upper threshold for values which will be included in the stats calculation.
out_no_data_val – output no data value if no valid pixels are within the polygon.
min_field – the name of the field for the min value (None or not specified to be ignored).
max_field – the name of the field for the max value (None or not specified to be ignored).
mean_field – the name of the field for the mean value (None or not specified to be ignored).
stddev_field – the name of the field for the standard deviation value (None or not specified to be ignored).
sum_field – the name of the field for the sum value (None or not specified to be ignored).
count_field – the name of the field for the count (of number of pixels) value (None or not specified to be ignored).
mode_field – the name of the field for the mode value (None or not specified to be ignored).
median_field – the name of the field for the median value (None or not specified to be ignored).
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
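The statistics controlled by the `*_field` parameters are standard descriptive statistics over the pixel values that pass `min_thres` and `max_thres`. The pure-Python sketch below shows one way to compute them for a single polygon's pixel values. It assumes inclusive thresholds and a population standard deviation, neither of which is stated above, and `zonal_stats` is a hypothetical helper rather than an RSGISLib function.

```python
import statistics


def zonal_stats(values, min_thres, max_thres, out_no_data_val):
    """Summarise one polygon's pixel values; keys mirror the *_field parameters."""
    valid = [v for v in values if min_thres <= v <= max_thres]
    keys = ["min", "max", "mean", "stddev", "sum", "count", "mode", "median"]
    if not valid:
        # No valid pixels within the polygon: use the output no data value.
        return dict.fromkeys(keys, out_no_data_val)
    return {
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.fmean(valid),
        "stddev": statistics.pstdev(valid),  # population standard deviation (assumed)
        "sum": sum(valid),
        "count": len(valid),
        "mode": statistics.mode(valid),
        "median": statistics.median(valid),
    }
```

Only the statistics whose field names are given would be written to the vector layer; computing all of them here just keeps the sketch short.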
- rsgislib.zonalstats.calc_zonal_poly_pts_band_stats_file(vec_file: str, vec_lyr: str, input_img: str, img_band: int, out_field: str, vec_def_epsg: int = None)
A function which extracts zonal stats for a polygon using the polygon centroid. This is useful when intersecting an image whose resolution is low relative to the size of the polygons.
- Parameters:
vec_file – input vector file
vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined, the no data value of the band will be ignored.
out_field – output field name within the vector layer.
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
- rsgislib.zonalstats.calc_zonal_poly_pts_band_stats(vec_lyr_obj: Layer, input_img: str, img_band: int, out_field: str, vec_def_epsg: int = None)
A function which extracts zonal stats for a polygon using the polygon centroid. This is useful when intersecting an image whose resolution is low relative to the size of the polygons.
- Parameters:
vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined, the no data value of the band will be ignored.
out_field – output field name within the vector layer.
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
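The centroid-based functions take a single value from the pixel under each polygon's centroid. For a simple (non-self-intersecting) polygon, the area-weighted centroid follows from the shoelace formula. The sketch below is a pure-Python illustration only; it assumes nothing about how RSGISLib computes the centroid internally, and `polygon_centroid` is a hypothetical helper.

```python
def polygon_centroid(poly):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    a2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Centroid = sum / (6 * area), and 6 * area == 3 * a2.
    return cx / (3.0 * a2), cy / (3.0 * a2)
```

Note that for a strongly concave polygon (for example a C-shape) the centroid can fall outside the polygon, so the sampled pixel need not lie within it.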
- rsgislib.zonalstats.calc_zonal_band_stats_test_poly_pts_file(vec_file: str, vec_lyr: str, input_img: str, img_band: int, min_thres: float, max_thres: float, out_no_data_val: float, percentile: float = None, percentile_field: str = None, min_field: str = None, max_field: str = None, mean_field: str = None, stddev_field: str = None, sum_field: str = None, count_field: str = None, mode_field: str = None, median_field: str = None, vec_def_epsg: int = None)
A function which calculates zonal statistics for a particular image band. It tests whether one or more pixels have been found within the polygon and, if not, uses the centroid to find a value for the polygon.
If you are unsure whether the pixels are small enough to be contained within all the polygons, use this function.
- Parameters:
vec_file – input vector file
vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined, the no data value of the band will be ignored.
min_thres – a lower threshold for values which will be included in the stats calculation.
max_thres – an upper threshold for values which will be included in the stats calculation.
out_no_data_val – output no data value if no valid pixels are within the polygon.
percentile – the percentile value to calculate (value between 0 and 100 inclusive).
percentile_field – the name of the field for the percentile value (None or not specified to be ignored).
min_field – the name of the field for the min value (None or not specified to be ignored).
max_field – the name of the field for the max value (None or not specified to be ignored).
mean_field – the name of the field for the mean value (None or not specified to be ignored).
stddev_field – the name of the field for the standard deviation value (None or not specified to be ignored).
sum_field – the name of the field for the sum value (None or not specified to be ignored).
count_field – the name of the field for the count (of number of pixels) value (None or not specified to be ignored).
mode_field – the name of the field for the mode value (None or not specified to be ignored).
median_field – the name of the field for the median value (None or not specified to be ignored).
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
- rsgislib.zonalstats.calc_zonal_band_stats_test_poly_pts(vec_lyr_obj: Layer, input_img: str, img_band: int, min_thres: float, max_thres: float, out_no_data_val: float, percentile: float = None, percentile_field: str = None, min_field: str = None, max_field: str = None, mean_field: str = None, stddev_field: str = None, sum_field: str = None, count_field: str = None, mode_field: str = None, median_field: str = None, vec_def_epsg: int = None)
A function which calculates zonal statistics for a particular image band. It tests whether one or more pixels have been found within the polygon and, if not, uses the centroid to find a value for the polygon.
If you are unsure whether the pixels are small enough to be contained within all the polygons, use this function.
- Parameters:
vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band for which the stats will be calculated. If defined, the no data value of the band will be ignored.
min_thres – a lower threshold for values which will be included in the stats calculation.
max_thres – an upper threshold for values which will be included in the stats calculation.
out_no_data_val – output no data value if no valid pixels are within the polygon.
percentile – the percentile value to calculate (value between 0 and 100 inclusive).
percentile_field – the name of the field for the percentile value (None or not specified to be ignored).
min_field – the name of the field for the min value (None or not specified to be ignored).
max_field – the name of the field for the max value (None or not specified to be ignored).
mean_field – the name of the field for the mean value (None or not specified to be ignored).
stddev_field – the name of the field for the standard deviation value (None or not specified to be ignored).
sum_field – the name of the field for the sum value (None or not specified to be ignored).
count_field – the name of the field for the count (of number of pixels) value (None or not specified to be ignored).
mode_field – the name of the field for the mode value (None or not specified to be ignored).
median_field – the name of the field for the median value (None or not specified to be ignored).
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
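The `percentile` parameter (0 to 100 inclusive) selects one order statistic of the valid pixel values. The interpolation rule RSGISLib uses is not stated here; the sketch below assumes linear interpolation between the closest ranks, which is the default method of `numpy.percentile`, and the `percentile` helper is hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("no values")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100 inclusive")
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0  # fractional rank
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac
```

With this rule, q = 0 returns the minimum, q = 100 the maximum and q = 50 the median.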
Extracting Pixels to HDF5
- rsgislib.zonalstats.image_zone_to_hdf(input_img, vec_file, vec_lyr, out_h5_file, no_prj_warn=False, pxl_in_poly_method=METHOD_POLYCONTAINSPIXELCENTER)
Extract all the pixel values for regions to a HDF5 file (1 column for each image band).
- Parameters:
input_img – is a string containing the name of the input image.
vec_file – is a string containing the input vector file path.
vec_lyr – is a string containing the name of the input vector layer.
out_h5_file – is a string containing the name of the output HDF5 file.
no_prj_warn – is a bool specifying whether to skip printing a warning if the vector and image have different projections.
pxl_in_poly_method – is the method for determining whether a pixel is included within a polygon, of type rsgislib.zonalstats.METHOD_*.
from rsgislib import zonalstats
input_img = './Rasters/injune_p142_casi_sub_utm.kea'
vec_file = './Vectors/injune_p142_crowns_utm.shp'
vec_lyr = 'injune_p142_crowns_utm'
out_h5_file = './TestOutputs/InjuneP142.hdf'
# True = no_prj_warn (skip the projection warning); pixels are included
# where the polygon contains the pixel centre.
zonalstats.image_zone_to_hdf(input_img, vec_file, vec_lyr, out_h5_file, True, zonalstats.METHOD_POLYCONTAINSPIXELCENTER)
- rsgislib.zonalstats.extract_zone_img_values_to_hdf(input_img, in_msk_img, out_h5_file, mask_val, datatype)
Extract all the pixel values for raster regions to a HDF5 file (1 column for each image band).
- Parameters:
input_img – is a string containing the name and path of the input file
in_msk_img – is a string containing the name and path of the input image mask file; the mask file must have only 1 image band.
out_h5_file – is a string containing the name and path of the output HDF5 file
mask_val – is a float containing the value of the pixel within the mask for which values are to be extracted
datatype – is a rsgislib.TYPE_* value providing the data type of the output HDF5 file.
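Conceptually, mask-based extraction keeps every pixel whose mask value equals `mask_val` and writes one row per pixel with one column per image band. The pure-Python sketch below illustrates that layout on in-memory lists; it does not write HDF5, and `extract_masked_rows` is a hypothetical name.

```python
def extract_masked_rows(bands, mask, mask_val):
    """bands: list of 2D lists (one per band); mask: 2D list. Returns samples x bands."""
    rows = []
    for r, mask_row in enumerate(mask):
        for c, m in enumerate(mask_row):
            if m == mask_val:
                # One sample per masked pixel, one column per image band.
                rows.append([band[r][c] for band in bands])
    return rows
```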
- rsgislib.zonalstats.extract_zone_img_band_values_to_hdf(in_img_info, in_msk_img, out_h5_file, mask_val, datatype)
Extract all the pixel values for raster regions to a HDF5 file (1 column for each image band). Multiple input rasters can be provided, and the bands to be extracted can be selected.
- Parameters:
in_img_info – is a list of rsgislib.imageutils.ImageBandInfo objects with the file names and the list of image bands within each file to be extracted.
in_msk_img – is a string containing the name and path of the input image mask file; the mask file must have only 1 image band.
out_h5_file – is a string containing the name and path of the output HDF5 file
mask_val – is a float containing the value of the pixel within the mask for which values are to be extracted
datatype – is a rsgislib.TYPE_* value providing the data type of the output HDF5 file.
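import rsgislib
import rsgislib.imageutils
import rsgislib.zonalstats
# Bands 1, 3 and 4 from the first image and band 2 from the second image.
fileInfo = []
fileInfo.append(rsgislib.imageutils.ImageBandInfo('InputImg1.kea', 'Image1', [1, 3, 4]))
fileInfo.append(rsgislib.imageutils.ImageBandInfo('InputImg2.kea', 'Image2', [2]))
# Extract the pixels where ClassMask.kea == 1, written as 32-bit floats.
rsgislib.zonalstats.extract_zone_img_band_values_to_hdf(fileInfo, 'ClassMask.kea', 'ForestRefl.h5', 1.0, rsgislib.TYPE_32FLOAT)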
- rsgislib.zonalstats.extract_zone_band_values_to_h5_file(vec_file: str, vec_lyr: str, input_img: str, img_band: int, out_img_base: str, zone_unq_id_field: str, vec_def_epsg: int = None)
A function which exports the pixels of the selected band intersecting each vector feature to a separate HDF5 file per feature. The input vector layer requires a column with a unique ID, which is inserted into the output file name.
- Parameters:
vec_file – input vector file
vec_lyr – input vector layer within the input file which specifies the features and where the output stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band whose pixels will be exported. If defined, the no data value of the band will be ignored.
out_img_base – The base path and file name for the output files. For example, /path/to/file/something would produce output files such as /path/to/file/something_1.h5
zone_unq_id_field – The column name within the vector layer with the unique ID which will be added to the output file name.
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
- rsgislib.zonalstats.extract_zone_band_values_to_h5(vec_lyr_obj: Layer, input_img: str, img_band: int, out_img_base: str, zone_unq_id_field: str, vec_def_epsg: int = None)
A function which exports the pixels of the selected band intersecting each vector feature to a separate HDF5 file per feature. The input vector layer requires a column with a unique ID, which is inserted into the output file name.
- Parameters:
vec_lyr_obj – OGR vector layer object containing the geometries being processed and to which the stats will be written.
input_img – the values image
img_band – the index (starting at 1) of the image band whose pixels will be exported.
out_img_base – The base path and file name for the output files. For example, /path/to/file/something would produce output files such as /path/to/file/something_1.h5
zone_unq_id_field – The column name within the vector layer with the unique ID which will be added to the output file name.
vec_def_epsg – an EPSG code can be specified for the vector layer if the projection is not well defined within the inputted vector layer.
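Following the `out_img_base` example, each feature's pixels go to a file named from the base path and that feature's value in `zone_unq_id_field`. The sketch below reproduces that naming rule, assuming the ID is appended with an underscore and a `.h5` extension exactly as in the example path; `zone_output_paths` is a hypothetical helper. It also shows why the IDs must be unique: duplicate IDs would map two features to the same file name.

```python
def zone_output_paths(out_img_base, zone_ids):
    """Build one output path per feature, e.g. '/path/to/file/something_1.h5'."""
    paths = [f"{out_img_base}_{zid}.h5" for zid in zone_ids]
    if len(set(paths)) != len(paths):
        raise ValueError("zone IDs must be unique, otherwise output file names collide")
    return paths
```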
- rsgislib.zonalstats.random_sample_hdf5_file(in_h5_file, out_h5_file, sample, rnd_seed, datatype)
A function which randomly samples a HDF5 file of extracted values.
- Parameters:
in_h5_file – is a string with the path to the input file.
out_h5_file – is a string with the path to the output file.
sample – is an integer with the number of values to be sampled from the input file.
rnd_seed – is an integer which seeds the random number generator.
datatype – is a rsgislib.TYPE_* value providing the data type of the output HDF5 file.
- rsgislib.zonalstats.split_sample_hdf5_file(in_h5_file, out_h5_p1_file, out_h5_p2_file, sample, rnd_seed, datatype)
A function which splits a HDF5 file of extracted values into two output files by random sampling.
- Parameters:
in_h5_file – is a string with the path to the input file.
out_h5_p1_file – is a string with the path to the first output file.
out_h5_p2_file – is a string with the path to the second output file.
sample – is an integer with the number of values to be sampled from the input file.
rnd_seed – is an integer which seeds the random number generator.
datatype – is a rsgislib.TYPE_* value providing the data type of the output HDF5 files.
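Both sampling functions draw a reproducible random subset of rows: the same `rnd_seed` gives the same selection. The pure-Python sketch below mirrors that idea for a split on an in-memory list of rows, assuming the first output holds the `sample` rows and the second holds the remainder. It uses Python's `random` module, which is not necessarily the generator RSGISLib uses, so the rows selected will differ from the library's; `split_sample` is a hypothetical helper.

```python
import random


def split_sample(rows, sample, rnd_seed):
    """Split rows into (sampled, remainder), reproducibly for a given seed."""
    if sample > len(rows):
        raise ValueError("sample is larger than the number of rows")
    rng = random.Random(rnd_seed)  # seeded generator, independent of global state
    idx = set(rng.sample(range(len(rows)), sample))
    sampled = [row for i, row in enumerate(rows) if i in idx]
    remainder = [row for i, row in enumerate(rows) if i not in idx]
    return sampled, remainder
```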
- rsgislib.zonalstats.merge_extracted_hdf5_data(h5_files: List[str], out_h5_file: str, datatype: int = None)
A function to merge a list of HDF5 files (e.g., from rsgislib.zonalstats.extract_zone_img_band_values_to_hdf) with the same number of variables (i.e., columns) into a single file, for example, if class training regions have been sourced from multiple images.
- Parameters:
h5_files – a list of input files.
out_h5_file – the output file.
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
import rsgislib.zonalstats
inTrainSamples = ['MSS_CloudTrain1.h5', 'MSS_CloudTrain2.h5', 'MSS_CloudTrain3.h5']
cloudTrainSamples = 'LandsatMSS_CloudTrainingSamples.h5'
rsgislib.zonalstats.merge_extracted_hdf5_data(inTrainSamples, cloudTrainSamples)
- rsgislib.zonalstats.merge_extracted_hdf5_vars_data(h5_files: List[str], out_h5_file: str, datatype: int = None)
A function to merge a list of HDF5 files (e.g., from rsgislib.zonalstats.extract_zone_img_band_values_to_hdf) with the same number of features (i.e., rows) but different variables into a single file, for example, if class training regions have been sourced from multiple images (e.g., months).
- Parameters:
h5_files – a list of input files.
out_h5_file – the output file.
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
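The two merge functions differ in the axis along which they combine data: merge_extracted_hdf5_data appends rows (samples) from files with the same columns, whereas merge_extracted_hdf5_vars_data appends columns (variables) from files with the same rows. Using in-memory lists of rows as a stand-in for the HDF5 datasets (a simplification; no HDF5 I/O is shown, and both helper names are hypothetical), the difference is:

```python
def merge_rows(datasets):
    """Stack samples vertically; every dataset must have the same number of columns."""
    n_cols = {len(row) for data in datasets for row in data}
    if len(n_cols) > 1:
        raise ValueError("all inputs must have the same number of variables")
    return [row for data in datasets for row in data]


def merge_vars(datasets):
    """Join variables horizontally; every dataset must have the same number of rows."""
    if len({len(data) for data in datasets}) > 1:
        raise ValueError("all inputs must have the same number of samples")
    # Concatenate the i-th row of every dataset into one longer row.
    return [[v for row in rows for v in row] for rows in zip(*datasets)]
```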
Extracting Image Chips to HDF5
- rsgislib.zonalstats.extract_chip_zone_image_band_values_to_hdf(input_image_info: List[ImageBandInfo], image_mask: str, mask_value: int, chip_size: int, output_hdf: str, rotate_chips: List[float] = None, datatype: int = None)
A function which extracts a chip/window of image pixel values. The expectation is that this is used to train a classifier (see deep learning functions in classification) but it could be used to extract image ‘chips’ for other purposes.
- Parameters:
input_image_info – is a list of rsgislib.imageutils.ImageBandInfo objects specifying the input images and bands
image_mask – is a single band input image to specify the regions of interest
mask_value – is the pixel value within image_mask that specifies the region of interest
chip_size – is the chip size (in pixels).
output_hdf – is the output HDF5 file. If it already exists, it is overwritten.
rotate_chips – specify whether you wish to have the image chips rotated during extraction to increase the number of samples. Default is None and will therefore be ignored. Otherwise, provide a list of rotation angles in degrees (e.g., [30, 60, 90, 120, 180])
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
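Each chip is a `chip_size` x `chip_size` window of band values taken around a pixel in the region of interest. The sketch below assumes the window is centred on each mask pixel and that pixels too close to the image edge to give a full window are skipped; how RSGISLib actually places and filters chips, and how it applies `rotate_chips`, is not described here, so treat this as an illustration only. `extract_chips` is a hypothetical helper operating on in-memory lists.

```python
def extract_chips(bands, mask, mask_value, chip_size):
    """Return a list of chips, each indexed as chip[band][row][col]."""
    n_rows, n_cols = len(mask), len(mask[0])
    half = chip_size // 2
    chips = []
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c] != mask_value:
                continue
            r0, c0 = r - half, c - half  # top-left corner of the window
            if r0 < 0 or c0 < 0 or r0 + chip_size > n_rows or c0 + chip_size > n_cols:
                continue  # assumed: skip windows that extend past the image edge
            chips.append([[row[c0:c0 + chip_size] for row in band[r0:r0 + chip_size]]
                          for band in bands])
    return chips
```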
- rsgislib.zonalstats.split_sample_chip_hdf5_file(input_h5_file: str, sample_h5_file: str, remain_h5_file: str, sample_size: int, rnd_seed: int, datatype: int = None)
A function to split the HDF5 outputs from the rsgislib.zonalstats.extract_chip_zone_image_band_values_to_hdf function into two sets by taking a random set with the defined sample size from the input file, saving the sample and the remainder to output HDF5 files.
- Parameters:
input_h5_file – The input HDF5 file to be split.
sample_h5_file – The output HDF5 file for the sample.
remain_h5_file – The output HDF5 file for the remainder.
sample_size – An integer specifying the size of the sample to be taken.
rnd_seed – An integer specifying the seed for the random number generator, allowing the same ‘random’ sample to be taken.
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
- rsgislib.zonalstats.merge_extracted_hdf5_chip_data(h5_files: List[str], out_h5_file: str, datatype: int = None)
A function to merge a list of HDF files (e.g., from rsgislib.zonalstats.extract_chip_zone_image_band_values_to_hdf) with the same number of variables (i.e., image bands) and chip size into a single file. For example, if class training regions have been sourced from multiple images.
- Parameters:
h5_files – a list of input files.
out_h5_file – the output file.
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
import rsgislib.zonalstats
inTrainSamples = ['MSS_CloudTrain1.h5', 'MSS_CloudTrain2.h5', 'MSS_CloudTrain3.h5']
cloudTrainSamples = 'LandsatMSS_CloudTrainingSamples.h5'
rsgislib.zonalstats.merge_extracted_hdf5_chip_data(inTrainSamples, cloudTrainSamples)
- rsgislib.zonalstats.extract_ref_chip_zone_image_band_values_to_hdf(input_image_info: List[ImageBandInfo], ref_img: str, ref_img_band: int, image_mask: str, mask_value: int, chip_size: int, output_hdf: str, rotate_chips: List[float] = None, datatype: int = None)
A function which extracts a chip/window of image pixel values. The expectation is that this is used to train a classifier (see deep learning functions in classification) but it could be used to extract image ‘chips’ for other purposes.
- Parameters:
input_image_info – is a list of rsgislib.imageutils.ImageBandInfo objects specifying the input images and bands
ref_img – is an image file (same pixel size and projection as the other input images) which provides the class training (reference) values.
ref_img_band – is the image band in the reference image to be used (only a single reference band can be used).
image_mask – is a single band input image to specify the regions of interest
mask_value – is the pixel value within image_mask that specifies the region of interest
chip_size – is the chip size (in pixels).
output_hdf – is the output HDF5 file. If it already exists, it is overwritten.
rotate_chips – specify whether you wish to have the image chips rotated during extraction to increase the number of samples. Default is None and will therefore be ignored. Otherwise, provide a list of rotation angles in degrees (e.g., [30, 60, 90, 120, 180])
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
- rsgislib.zonalstats.split_sample_ref_chip_hdf5_file(input_h5_file: str, sample_h5_file: str, remain_h5_file: str, sample_size: int, rnd_seed: int, datatype: int = None)
A function to split the HDF5 outputs from the rsgislib.zonalstats.extract_ref_chip_zone_image_band_values_to_hdf function into two sets by taking a random set with the defined sample size from the input file, saving the sample and the remainder to output HDF5 files.
- Parameters:
input_h5_file – The input HDF5 file to be split.
sample_h5_file – The output HDF5 file for the sample.
remain_h5_file – The output HDF5 file for the remainder.
sample_size – An integer specifying the size of the sample to be taken.
rnd_seed – An integer specifying the seed for the random number generator, allowing the same ‘random’ sample to be taken.
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
- rsgislib.zonalstats.merge_extracted_hdf5_chip_ref_data(h5_files: List[str], out_h5_file: str, datatype: int = None)
A function to merge a list of HDF files (e.g., from rsgislib.zonalstats.extract_ref_chip_zone_image_band_values_to_hdf) with the same number of variables (i.e., image bands) and chip size into a single file. For example, if class training regions have been sourced from multiple images.
- Parameters:
h5_files – a list of input files.
out_h5_file – the output file.
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
import rsgislib.zonalstats
inTrainSamples = ['MSS_CloudTrain1.h5', 'MSS_CloudTrain2.h5', 'MSS_CloudTrain3.h5']
cloudTrainSamples = 'LandsatMSS_CloudTrainingSamples.h5'
rsgislib.zonalstats.merge_extracted_hdf5_chip_ref_data(inTrainSamples, cloudTrainSamples)
HDF Utilities
- rsgislib.zonalstats.msk_h5_smpls_to_finite_values(in_h5_file: str, out_h5_file: str, datatype: int = None, lower_limit: float = None, upper_limit: float = None, limits_all_vars: bool = True)
A function to remove samples (rows) containing non-finite values from a HDF5 sample file. Upper and lower limits can also be specified.
- Parameters:
in_h5_file – Input HDF5 file.
out_h5_file – Output HDF5 file.
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
lower_limit – Optional lower value threshold (if None then not used).
upper_limit – Optional upper value threshold (if None then not used).
limits_all_vars – If upper or lower thresholds are specified, specify whether all or any of the variables need to be within the thresholds.
- rsgislib.zonalstats.filter_h5_smpls_var_range(in_h5_file: str, out_h5_file: str, var_idx: int, lower_limit: float = None, upper_limit: float = None, datatype: int = None)
A function which filters the data in the HDF5 file using the values of one variable. Rows where the value of the specified variable is not within the specified range are removed. Note, you must specify at least one of lower_limit or upper_limit, but both can be specified.
- Parameters:
in_h5_file – Input HDF5 file.
out_h5_file – Output HDF5 file.
var_idx – The index of the variable to be used for filtering. Note, index numbering starts at 0.
lower_limit – Optional lower value threshold (if None then not used).
upper_limit – Optional upper value threshold (if None then not used).
datatype – is the data type used for the output HDF5 file (e.g., rsgislib.TYPE_32FLOAT). If None (default) then the output data type will be float32.
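Both filters above operate row by row: a sample is kept or dropped as a whole. The pure-Python sketch below illustrates the two behaviours on in-memory lists of rows. The reading of `limits_all_vars` (True meaning every variable must lie within the limits, False meaning at least one must) and the inclusive comparisons are assumptions based on the parameter descriptions, and the function names are hypothetical.

```python
import math


def within(v, lower, upper):
    """Inclusive range test; a limit of None is not applied."""
    return (lower is None or v >= lower) and (upper is None or v <= upper)


def mask_finite(rows, lower=None, upper=None, limits_all_vars=True):
    """Keep rows whose values are all finite and which satisfy the optional limits."""
    out = []
    for row in rows:
        if not all(math.isfinite(v) for v in row):
            continue  # drop rows containing NaN or +/-inf
        if lower is not None or upper is not None:
            checks = [within(v, lower, upper) for v in row]
            if not (all(checks) if limits_all_vars else any(checks)):
                continue
        out.append(row)
    return out


def filter_var_range(rows, var_idx, lower=None, upper=None):
    """Keep rows where the variable at var_idx (0-based) is within the range."""
    if lower is None and upper is None:
        raise ValueError("specify at least one of lower or upper")
    return [row for row in rows if within(row[var_idx], lower, upper)]
```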
- rsgislib.zonalstats.get_hdf5_data(h5_files: List[str]) -> array
A function to get the data from a list of HDF5 files (e.g., from rsgislib.zonalstats.extract_zone_img_band_values_to_hdf).
- Parameters:
h5_files – a list of input files.
- Returns:
numpy array with the data, or None if there is no data to return.
- rsgislib.zonalstats.write_data_to_h5(data_arr: array, out_h5_file: str, datatype: int = 9)
A function which writes the data array to a HDF5 file.
- Parameters:
data_arr – Numpy array - shape: samples x variables
out_h5_file – the output hdf5 file path
datatype – the output data type (a rsgislib.TYPE_* value)
- rsgislib.zonalstats.get_var_from_hdf5_data(h5_files: List[str], var_idx: int = 0) -> array
A function to get the data for a specific variable from a list of HDF5 files (e.g., from rsgislib.zonalstats.extract_zone_img_band_values_to_hdf).
- Parameters:
h5_files – a list of input files.
var_idx – the index for the variable of interest. Note, array indexing starts at 0, so image band 2 is index 1.
- Returns:
numpy array with the data, or None if there is no data to return.