RSGISLib Check Datasets Tools

Images

rsgislib.tools.checkdatasets.run_check_gdal_image_file(input_img: str, check_bands: bool = True, n_bands: int = 0, chk_proj: bool = False, epsg_code: int = 0, read_img: bool = False, smpl_n_pxls: int = 10, calc_chk_sum: bool = False, max_file_size: int = None, rm_err: bool = False, print_err: bool = True, timeout: int = 4)

A function which checks a GDAL-compatible image file using the check_gdal_image_file function. A multiprocessing object is used to catch errors which can crash Python, so checking can continue without crashing the Python environment.

You probably want to call this function rather than calling check_gdal_image_file directly.

Parameters:
  • input_img – the file path to the GDAL image file.

  • check_bands – boolean specifying whether individual image bands should be opened and checked (Default: True)

  • n_bands – int specifying the number of expected image bands. Ignored if 0; Default is 0.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code for the projection. An error is raised if the image is not in that projection.

  • read_img – boolean specifying whether to try reading some pixel values from the image. This option reads smpl_n_pxls (e.g., 10) random pixel values from a randomly selected band.

  • smpl_n_pxls – the number of pixel values to be randomly selected (Default: 10). Sampling more values increases the runtime.

  • calc_chk_sum – boolean specifying whether a checksum should be calculated for each band to check validity

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether the file is OK (i.e., tests passed) or not.
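A minimal usage sketch follows. The parameter names come from the signature above; the helper name image_is_ok, the expected band count, the EPSG code (27700, British National Grid) and the 10 second timeout are illustrative assumptions. The checker argument is also hypothetical and exists only so the helper can be exercised where RSGISLib is not installed.

```python
from typing import Callable, Optional


def image_is_ok(img_path: str, checker: Optional[Callable[..., bool]] = None) -> bool:
    """Return True if the image passes the selected checks.

    `checker` defaults to run_check_gdal_image_file; it is a parameter
    only so that this sketch can be tested without RSGISLib installed.
    """
    if checker is None:
        # Imported lazily so the module loads even where RSGISLib is absent.
        import rsgislib.tools.checkdatasets

        checker = rsgislib.tools.checkdatasets.run_check_gdal_image_file

    return checker(
        img_path,
        check_bands=True,
        n_bands=3,  # hypothetical: a 3-band image is expected
        chk_proj=True,
        epsg_code=27700,  # hypothetical: British National Grid
        read_img=True,
        smpl_n_pxls=10,
        timeout=10,  # hypothetical: more generous than the default of 4 seconds
    )
```

Calling image_is_ok("scene.kea") with RSGISLib installed would run the checks on a hypothetical file named scene.kea.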

rsgislib.tools.checkdatasets.run_check_gdal_image_files(input_imgs: list, check_bands: bool = True, n_bands: int = 0, chk_proj: bool = False, epsg_code: int = 0, read_img: bool = False, smpl_n_pxls: int = 10, calc_chk_sum: bool = False, max_file_size: int = None, rm_err: bool = False, print_err: bool = True, print_file_names: bool = False, timeout: int = 4)

A function which checks a list of GDAL-compatible image files using the check_gdal_image_file function. A multiprocessing object is used to catch errors which can crash Python, so checking can continue without crashing the Python environment.

You probably want to call this function rather than calling check_gdal_image_file directly.

Parameters:
  • input_imgs – a list of input images.

  • check_bands – boolean specifying whether individual image bands should be opened and checked (Default: True)

  • n_bands – int specifying the number of expected image bands. Ignored if 0; Default is 0.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code for the projection. An error is raised if the image is not in that projection.

  • read_img – boolean specifying whether to try reading some pixel values from the image. This option reads smpl_n_pxls (e.g., 10) random pixel values from a randomly selected band.

  • smpl_n_pxls – the number of pixel values to be randomly selected (Default: 10). Sampling more values increases the runtime.

  • calc_chk_sum – boolean specifying whether a checksum should be calculated for each band to check validity

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • print_file_names – print the name of each file before it is tested.

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether all the files are OK (i.e., tests passed) or not.
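As a sketch of batch use, the helper below gathers the images in a directory and passes the list to run_check_gdal_image_files. The helper name check_image_dir, the "*.tif" pattern, the chosen options and the checker argument are assumptions made for illustration; only the RSGISLib function name and its keyword names are taken from the signature above.

```python
import glob
import os
from typing import Callable, Optional


def check_image_dir(
    img_dir: str,
    pattern: str = "*.tif",
    checker: Optional[Callable[..., bool]] = None,
) -> bool:
    """Check every image in img_dir that matches pattern.

    `checker` defaults to run_check_gdal_image_files; it is a parameter
    only so that this sketch can be tested without RSGISLib installed.
    """
    # Sort so that files are checked (and printed) in a stable order.
    img_paths = sorted(glob.glob(os.path.join(glob.escape(img_dir), pattern)))

    if checker is None:
        # Imported lazily so the module loads even where RSGISLib is absent.
        import rsgislib.tools.checkdatasets

        checker = rsgislib.tools.checkdatasets.run_check_gdal_image_files

    return checker(
        img_paths,
        chk_proj=True,
        print_file_names=True,
        timeout=10,  # hypothetical value
    )
```

The result is a single boolean for the whole list, so it works as a pass/fail gate. To find out which file failed, check the files one at a time.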

rsgislib.tools.checkdatasets.check_gdal_image_file(input_img: str, check_bands: bool = True, n_bands: int = 0, chk_proj: bool = False, epsg_code: int = 0, read_img: bool = False, smpl_n_pxls: int = 10, calc_chk_sum: bool = False, max_file_size: int = None)

A function which checks a GDAL compatible image file and returns an error message if appropriate.

Parameters:
  • input_img – the file path to the GDAL image file.

  • check_bands – boolean specifying whether individual image bands should be opened and checked (Default: True)

  • n_bands – int specifying the number of expected image bands. Ignored if 0; Default is 0.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code for the projection. An error is raised if the image is not in that projection.

  • read_img – boolean specifying whether to try reading some pixel values from the image. This option reads smpl_n_pxls (e.g., 10) random pixel values from a randomly selected band.

  • smpl_n_pxls – the number of pixel values to be randomly selected (Default: 10). Sampling more values increases the runtime.

  • calc_chk_sum – boolean specifying whether a checksum should be calculated for each band to check validity

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

Returns:

boolean (True: file OK; False: error found), string (error message if required, otherwise an empty string)
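Because this function returns a (boolean, string) pair, the error message can be kept for each failing file. The sketch below shows one way to collect those messages. The helper name collect_image_errors is hypothetical, and check_fn is expected to be check_gdal_image_file or any callable with the same return shape. Note that calling check_gdal_image_file directly gives none of the crash protection provided by the run_check_* functions.

```python
from typing import Callable, Dict, Iterable, Tuple


def collect_image_errors(
    img_paths: Iterable[str],
    check_fn: Callable[..., Tuple[bool, str]],
    **check_kwargs,
) -> Dict[str, str]:
    """Map each failing image path to the error message returned for it."""
    errors: Dict[str, str] = {}
    for img_path in img_paths:
        # Unpack the (file_ok, error_message) pair described under Returns.
        file_ok, err_msg = check_fn(img_path, **check_kwargs)
        if not file_ok:
            errors[img_path] = err_msg
    return errors
```

With RSGISLib installed, a call might look like collect_image_errors(paths, rsgislib.tools.checkdatasets.check_gdal_image_file, chk_proj=True), where paths is a list of image file paths.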

Vectors

rsgislib.tools.checkdatasets.run_check_gdal_vector_file(vec_file: str, chk_proj: bool = True, epsg_code: int = 0, max_file_size: int = None, rm_err: bool = False, print_err: bool = True, multi_file: bool = False, timeout: int = 4)

A function which checks a GDAL-compatible vector file using the check_gdal_vector_file function. A multiprocessing object is used to catch errors which can crash Python, so checking can continue without crashing the Python environment.

You probably want to call this function rather than calling check_gdal_vector_file directly.

Parameters:
  • vec_file – the file path to the GDAL vector file.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code for the projection. An error is raised if the vector file is not in that projection.

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • multi_file – if True (Default: False) then remove files with the same basename. Useful for ESRI Shapefiles which are made up of multiple files.

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether the file is OK (i.e., tests passed) or not.
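The multi_file option refers to files that share a basename, as the components of an ESRI Shapefile do (.shp, .shx, .dbf, .prj and so on). The sketch below illustrates one reading of "same basename", using only the standard library. It is not RSGISLib's implementation, and the helper name sibling_files is hypothetical.

```python
import glob
import os
from typing import List


def sibling_files(vec_file: str) -> List[str]:
    """List the files that share vec_file's path minus its final extension."""
    base_path = os.path.splitext(vec_file)[0]
    # glob.escape protects any wildcard characters in the path itself.
    return sorted(glob.glob(glob.escape(base_path) + ".*"))
```

For a hypothetical roads.shp this would list roads.dbf, roads.prj, roads.shp and roads.shx if they exist in the same directory, which is the group of files one would expect to be treated as a single dataset.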

rsgislib.tools.checkdatasets.run_check_gdal_vector_files(vec_files: list, chk_proj: bool = True, epsg_code: int = 0, max_file_size: int = None, rm_err: bool = False, print_err: bool = True, multi_file: bool = False, print_file_names: bool = False, timeout: int = 4)

A function which checks a list of GDAL-compatible vector files using the check_gdal_vector_file function. A multiprocessing object is used to catch errors which can crash Python, so checking can continue without crashing the Python environment.

You probably want to call this function rather than calling check_gdal_vector_file directly.

Parameters:
  • vec_files – list of input file paths.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code for the projection. An error is raised if the vector file is not in that projection.

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • multi_file – if True (Default: False) then remove files with the same basename. Useful for ESRI Shapefiles which are made up of multiple files.

  • print_file_names – print the name of each file before it is tested.

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether all the files are OK (i.e., tests passed) or not.

rsgislib.tools.checkdatasets.check_gdal_vector_file(vec_file: str, chk_proj: bool = True, epsg_code: int = 0, max_file_size: int = None)

A function which checks a GDAL compatible vector file and returns an error message if appropriate.

Parameters:
  • vec_file – the file path to the GDAL vector file.

  • chk_proj – boolean specifying whether to check that the projection has been defined.

  • epsg_code – int for the EPSG code for the projection. An error is raised if the vector file is not in that projection.

  • max_file_size – int specifying the maximum file size for the input file. If None then ignored.

Returns:

boolean (True: file OK; False: Error found), string (error message if required otherwise empty string)

HDF5 Files

rsgislib.tools.checkdatasets.run_check_hdf5_file(input_file: str, rm_err: bool = False, print_err: bool = True, timeout: int = 4)

A function which checks an HDF5 file using the check_hdf5_file function. A multiprocessing object is used to catch errors which can crash Python, so checking can continue without crashing the Python environment.

You probably want to call this function rather than calling check_hdf5_file directly.

Parameters:
  • input_file – the file path to the HDF5 file.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether the file is OK (i.e., tests passed) or not.
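The run_check_* functions on this page share one idea: the check runs somewhere that can crash or hang without taking the calling interpreter with it, and a timeout bounds how long it may take. The sketch below illustrates that general idea only. It uses a child interpreter started with subprocess rather than the multiprocessing object RSGISLib uses, and the helper name run_isolated is hypothetical.

```python
import subprocess
import sys


def run_isolated(py_code: str, timeout: float = 4.0) -> bool:
    """Run py_code in a child Python interpreter.

    Returns True only if the child exits cleanly within `timeout` seconds.
    A crash, a non-zero exit or a timeout all count as failure.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", py_code],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False
    return result.returncode == 0
```

A child that aborts, for example run_isolated("import os; os.abort()"), is reported as a failure while the parent keeps running. This is the behaviour the timeout and crash handling described above are intended to provide.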

rsgislib.tools.checkdatasets.run_check_hdf5_files(input_files: list, rm_err: bool = False, print_err: bool = True, print_file_names: bool = False, timeout: int = 4)

A function which checks a list of HDF5 files using the check_hdf5_file function. A multiprocessing object is used to catch errors which can crash Python, so checking can continue without crashing the Python environment.

You probably want to call this function rather than calling check_hdf5_file directly.

Parameters:
  • input_files – a list of input HDF5 file paths.

  • rm_err – boolean specifying whether to delete the file if an error is found

  • print_err – print any errors associated with the file to the console

  • print_file_names – print the name of each file before it is tested.

  • timeout – a timeout in seconds (Default = 4) for the tests to be undertaken.

Returns:

boolean specifying whether all the files are OK (i.e., tests passed) or not.

rsgislib.tools.checkdatasets.check_hdf5_file(input_file: str)

A function which checks whether an HDF5 file is valid.

Parameters:
  • input_file – the file path to the input file.

Returns:

boolean (True: file is valid; False: file is not valid)
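How check_hdf5_file decides validity is not described here. As a cheap pre-filter that needs no HDF5 library, the sketch below looks for the 8-byte HDF5 format signature, which the file format places at byte offset 0 or at 512, 1024, 2048 and so on. The helper name has_hdf5_signature is hypothetical. Finding the signature does not prove the file is readable, so this is not a replacement for the function above.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(file_path: str) -> bool:
    """Return True if the HDF5 format signature is found at a valid offset."""
    with open(file_path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            chunk = f.read(len(HDF5_SIGNATURE))
            if len(chunk) < len(HDF5_SIGNATURE):
                # Ran past the end of the file without finding the signature.
                return False
            if chunk == HDF5_SIGNATURE:
                return True
            # Candidate offsets are 0, 512, 1024, 2048, ... (doubling each time).
            offset = 512 if offset == 0 else offset * 2
```

A file that fails this test can be rejected without opening it as HDF5. A file that passes still needs the full check.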