RSGISLib Check Datasets Tools

Images

rsgislib.tools.checkdatasets.run_check_gdal_image_file(input_img: str, check_bands: bool = True, n_bands: int = 0, chk_proj: bool = False, epsg_code: int = 0, read_img: bool = False, smpl_n_pxls: int = 10, calc_chk_sum: bool = False, max_file_size: int = None, rm_err: bool = False, print_err: bool = True, timeout: int = 4)

A function which checks a GDAL compatible image file using the check_gdal_image_file function, run through a multiprocessing object so that errors which would otherwise crash Python are caught and the Python environment keeps running.

You probably want to call this function rather than calling check_gdal_image_file directly.

Parameters:
  • input_img – the file path to the GDAL image file.

  • check_bands – boolean specifying whether individual image bands should be opened and checked (Default: True)

  • n_bands – int specifying the number of expected image bands. Ignored if 0; Default is 0.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code of the expected projection. An error is raised if the image is not in that projection.

  • read_img – boolean specifying whether to try reading some pixel values from the image. This option reads smpl_n_pxls (e.g., 10) random pixel values from a randomly selected band.

  • smpl_n_pxls – the number of pixel values to be randomly selected (default = 10). Sampling more values increases the runtime.

  • calc_chk_sum – boolean specifying whether a checksum should be calculated for each band to check validity

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether the file is OK (i.e., passed tests) or not.
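As a usage sketch, the helper below calls run_check_gdal_image_file with a stricter set of options than the defaults. The helper name image_is_valid, its check_fn argument (added only so the helper can be exercised without RSGISLib installed) and the 30 second timeout are assumptions made for illustration; the keyword arguments are those listed in the signature above.

```python
def image_is_valid(img_path, n_bands=0, epsg_code=0, check_fn=None):
    """Return True if the image passes a fairly strict set of checks."""
    if check_fn is None:
        # Imported lazily so the helper can be defined without RSGISLib.
        from rsgislib.tools import checkdatasets

        check_fn = checkdatasets.run_check_gdal_image_file

    return check_fn(
        img_path,
        check_bands=True,
        n_bands=n_bands,  # 0 means the band count is not checked
        chk_proj=True,  # require a projection to be defined
        epsg_code=epsg_code,
        read_img=True,  # sample some pixel values as a read test
        smpl_n_pxls=10,
        rm_err=False,  # this sketch never deletes files
        print_err=True,
        timeout=30,  # assumed value for illustration, not a documented recommendation
    )
```

Keeping rm_err=False while experimenting means a misconfigured check (for example, a wrong expected band count) cannot delete valid data.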

rsgislib.tools.checkdatasets.run_check_gdal_image_files(input_imgs: list, check_bands: bool = True, n_bands: int = 0, chk_proj: bool = False, epsg_code: int = 0, read_img: bool = False, smpl_n_pxls: int = 10, calc_chk_sum: bool = False, max_file_size: int = None, rm_err: bool = False, print_err: bool = True, print_file_names: bool = False, timeout: int = 4)

A function which checks a list of GDAL compatible image files using the check_gdal_image_file function, run through a multiprocessing object so that errors which would otherwise crash Python are caught and the Python environment keeps running.

You probably want to call this function rather than calling check_gdal_image_file directly.

Parameters:
  • input_imgs – a list of input images.

  • check_bands – boolean specifying whether individual image bands should be opened and checked (Default: True)

  • n_bands – int specifying the number of expected image bands. Ignored if 0; Default is 0.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code of the expected projection. An error is raised if the image is not in that projection.

  • read_img – boolean specifying whether to try reading some pixel values from the image. This option reads smpl_n_pxls (e.g., 10) random pixel values from a randomly selected band.

  • smpl_n_pxls – the number of pixel values to be randomly selected (default = 10). Sampling more values increases the runtime.

  • calc_chk_sum – boolean specifying whether a checksum should be calculated for each band to check validity

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • print_file_names – boolean specifying whether to print the name of each file before it is tested.

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether all the files are OK (i.e., passed tests) or not.
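Because run_check_gdal_image_files returns a single boolean for the whole list, it does not identify which files failed. One possible approach, sketched here, is to call the single-file function for each path and sort the paths into two lists. The helper name split_ok_failed and its check_fn argument are hypothetical and exist only for this illustration.

```python
def split_ok_failed(img_paths, check_fn=None, **check_kwargs):
    """Return (ok_paths, failed_paths) by checking each image individually."""
    if check_fn is None:
        # Imported lazily so the helper can be defined without RSGISLib.
        from rsgislib.tools import checkdatasets

        check_fn = checkdatasets.run_check_gdal_image_file

    ok_paths, failed_paths = [], []
    for path in img_paths:
        # Extra keyword arguments (e.g., n_bands, timeout) are passed through.
        if check_fn(path, **check_kwargs):
            ok_paths.append(path)
        else:
            failed_paths.append(path)
    return ok_paths, failed_paths
```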

rsgislib.tools.checkdatasets.check_gdal_image_file(input_img: str, check_bands: bool = True, n_bands: int = 0, chk_proj: bool = False, epsg_code: int = 0, read_img: bool = False, smpl_n_pxls: int = 10, calc_chk_sum: bool = False, max_file_size: int = None)

A function which checks a GDAL compatible image file and returns an error message if appropriate.

Parameters:
  • input_img – the file path to the GDAL image file.

  • check_bands – boolean specifying whether individual image bands should be opened and checked (Default: True)

  • n_bands – int specifying the number of expected image bands. Ignored if 0; Default is 0.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code of the expected projection. An error is raised if the image is not in that projection.

  • read_img – boolean specifying whether to try reading some pixel values from the image. This option reads smpl_n_pxls (e.g., 10) random pixel values from a randomly selected band.

  • smpl_n_pxls – the number of pixel values to be randomly selected (default = 10). Sampling more values increases the runtime.

  • calc_chk_sum – boolean specifying whether a checksum should be calculated for each band to check validity

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

Returns:

boolean (True: file OK; False: error found), string (error message if required, otherwise an empty string)
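The (boolean, string) return value can be unpacked to build a report of failures. The sketch below assumes only the return shape documented above; the helper name collect_image_errors and the check_fn argument are hypothetical. Note that check_gdal_image_file is the unwrapped function, so this loop has no protection against a file that crashes the interpreter.

```python
def collect_image_errors(img_paths, check_fn=None, **check_kwargs):
    """Map each failing image path to its error message."""
    if check_fn is None:
        # Imported lazily so the helper can be defined without RSGISLib.
        from rsgislib.tools import checkdatasets

        check_fn = checkdatasets.check_gdal_image_file

    errors = {}
    for path in img_paths:
        # Documented return value: (file_ok, error message or empty string).
        file_ok, err_msg = check_fn(path, **check_kwargs)
        if not file_ok:
            errors[path] = err_msg
    return errors
```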

rsgislib.tools.checkdatasets.cmp_to_ref_imgs(ref_imgs: List[str], input_img_dir: str, input_img_ext: str, rm_errs: bool = False, output_file: str = None) → Dict

A utility which checks images against a reference image (i.e., that the projection, number of pixels and coordinates match). Note: the reference image file name is assumed to be contained within the names of the images in the input directory (input_img_dir). If multiple images match the reference image file name, they will all be checked.

Parameters:
  • ref_imgs – List of reference image paths.

  • input_img_dir – Input image directory, containing the images to be checked

  • input_img_ext – The image extension to be checked (e.g., tif, kea)

  • rm_errs – Boolean specifying whether to delete the file if an error is found

  • output_file – optional output report with list of images checked, not checked and errors.
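The file name matching rule described above can be illustrated with a small standalone sketch. This is not the library's implementation: the function name match_inputs_to_refs is hypothetical, and stripping the extension from the reference file name before matching is an assumption.

```python
import os


def match_inputs_to_refs(ref_imgs, input_names):
    """Group input images under each reference whose base name they contain."""
    matches = {}
    for ref_img in ref_imgs:
        # Assumption: the reference name is compared without its extension.
        ref_base = os.path.splitext(os.path.basename(ref_img))[0]
        matches[ref_img] = [
            name for name in input_names if ref_base in os.path.basename(name)
        ]
    return matches
```

Under this rule a reference named tile_01.kea would be paired with both tile_01_ndvi.tif and tile_01_mask.tif, which is the "multiple images" case mentioned in the description.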

Vectors

rsgislib.tools.checkdatasets.run_check_gdal_vector_file(vec_file: str, chk_proj: bool = True, epsg_code: int = 0, max_file_size: int = None, rm_err: bool = False, print_err: bool = True, multi_file: bool = False, timeout: int = 4)

A function which checks a GDAL compatible vector file using the check_gdal_vector_file function, run through a multiprocessing object so that errors which would otherwise crash Python are caught and the Python environment keeps running.

You probably want to call this function rather than calling check_gdal_vector_file directly.

Parameters:
  • vec_file – the file path to the GDAL vector file.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code of the expected projection. An error is raised if the vector file is not in that projection.

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • multi_file – if True (Default: False) then remove files with the same basename. Useful for ESRI Shapefiles which are made up of multiple files.

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether the file is OK (i.e., tests passed) or not.
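Since multi_file affects every file sharing a basename, it can be useful to preview which files that covers before enabling it together with rm_err. The sketch below is a standalone illustration, not RSGISLib code; the function name files_with_same_basename is hypothetical, and "same basename" is assumed to mean the same path with the final extension removed.

```python
import glob
import os


def files_with_same_basename(vec_file):
    """List files next to vec_file that share its name up to the extension."""
    base = os.path.splitext(vec_file)[0]
    # glob.escape guards against special characters such as [ ] in the path.
    return sorted(glob.glob(glob.escape(base) + ".*"))
```

For a Shapefile such as roads.shp this would typically list the .shp, .shx, .dbf and .prj components found in the same directory.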

rsgislib.tools.checkdatasets.run_check_gdal_vector_files(vec_files: list, chk_proj: bool = True, epsg_code: int = 0, max_file_size: int = None, rm_err: bool = False, print_err: bool = True, multi_file: bool = False, print_file_names: bool = False, timeout: int = 4)

A function which checks a list of GDAL compatible vector files using the check_gdal_vector_file function, run through a multiprocessing object so that errors which would otherwise crash Python are caught and the Python environment keeps running.

You probably want to call this function rather than calling check_gdal_vector_file directly.

Parameters:
  • vec_files – list of input file paths.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code of the expected projection. An error is raised if the vector file is not in that projection.

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • multi_file – if True (Default: False) then remove files with the same basename. Useful for ESRI Shapefiles which are made up of multiple files.

  • print_file_names – boolean specifying whether to print the name of each file before it is tested.

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether all the files are OK (i.e., tests passed) or not.
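A typical use is to check every vector file in a directory. The sketch below builds the list with glob and passes it to run_check_gdal_vector_files using keyword arguments from the signature above. The helper name check_vector_dir, the default "*.gpkg" pattern and the check_fn argument (present so the helper can be tried without RSGISLib installed) are assumptions for illustration.

```python
import glob
import os


def check_vector_dir(vec_dir, pattern="*.gpkg", epsg_code=0, check_fn=None):
    """Check all vector files in vec_dir matching pattern; return (all_ok, files)."""
    vec_files = sorted(glob.glob(os.path.join(vec_dir, pattern)))
    if check_fn is None:
        # Imported lazily so the helper can be defined without RSGISLib.
        from rsgislib.tools import checkdatasets

        check_fn = checkdatasets.run_check_gdal_vector_files

    all_ok = check_fn(
        vec_files,
        chk_proj=True,
        epsg_code=epsg_code,
        rm_err=False,  # report problems only; do not delete anything
        print_err=True,
        print_file_names=True,  # show progress through the list
        timeout=4,
    )
    return all_ok, vec_files
```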

rsgislib.tools.checkdatasets.check_gdal_vector_file(vec_file: str, chk_proj: bool = True, epsg_code: int = 0, max_file_size: int = None)

A function which checks a GDAL compatible vector file and returns an error message if appropriate.

Parameters:
  • vec_file – the file path to the GDAL vector file.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code of the expected projection. An error is raised if the vector file is not in that projection.

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

Returns:

boolean (True: file OK; False: Error found), string (error message if required otherwise empty string)

HDF5 Files

rsgislib.tools.checkdatasets.run_check_hdf5_file(input_file: str, rm_err: bool = False, print_err: bool = True, timeout: int = 4)

A function which checks an HDF5 file using the check_hdf5_file function, run through a multiprocessing object so that errors which would otherwise crash Python are caught and the Python environment keeps running.

You probably want to call this function rather than calling check_hdf5_file directly.

Parameters:
  • input_file – the file path to the HDF5 file.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether the file is OK (i.e., tests passed) or not.
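The isolation idea shared by the run_check_* wrappers can be illustrated with the standard library alone. The sketch below is not RSGISLib's implementation: it launches a child interpreter with subprocess rather than using a multiprocessing object, and the function name run_isolated is hypothetical. It shows the general principle that a crash or hang in the child is reported as a failed check rather than taking down the caller.

```python
import subprocess
import sys


def run_isolated(code, timeout=4.0):
    """Run a Python snippet in a child interpreter.

    Returns True only if the child exits cleanly within the timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after exceeding the timeout: treat as a failure.
        return False
    # A non-zero (or signal-induced negative) return code means the child failed.
    return result.returncode == 0
```

A snippet that exits abnormally, or one that sleeps past the timeout, yields False while the calling interpreter carries on, which is the behaviour the timeout parameter above relies on.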

rsgislib.tools.checkdatasets.run_check_hdf5_files(input_files: list, rm_err: bool = False, print_err: bool = True, print_file_names: bool = False, timeout: int = 4)

A function which checks a list of HDF5 files using the check_hdf5_file function, run through a multiprocessing object so that errors which would otherwise crash Python are caught and the Python environment keeps running.

You probably want to call this function rather than calling check_hdf5_file directly.

Parameters:
  • input_files – a list of input HDF5 file paths.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • print_file_names – boolean specifying whether to print the name of each file before it is tested.

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether all the files are OK (i.e., tests passed) or not.

rsgislib.tools.checkdatasets.check_hdf5_file(input_file: str)

A function which checks whether an HDF5 file is valid.

Parameters:
  • input_file – the file path to the input file.

Returns:

boolean (True: file is valid; False: file is not valid)
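The documentation does not say what check_hdf5_file tests internally. As a cheap pre-filter that could be run before it, the sketch below looks for the 8-byte HDF5 format signature at the start of the file. The function name looks_like_hdf5 is hypothetical, and the sketch only inspects offset 0, so a valid file with a user block in front of the signature would be rejected.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(file_path):
    """Return True if the file starts with the HDF5 format signature."""
    try:
        with open(file_path, "rb") as f:
            return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
    except OSError:
        # Missing or unreadable files are treated as not valid.
        return False
```

Passing this test does not show that the file is complete or uncorrupted; it only rules out files that are clearly not HDF5.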