RSGISLib File Tools

Naming

rsgislib.tools.filetools.get_file_basename(input_file: str, check_valid: bool = False, n_comps: int = 0, rm_n_exts: int = 0) str

Uses os.path module to return file basename (i.e., path and extension removed)

Parameters
  • input_file – string for the input file name and path

  • check_valid – if True then the resulting basename will be checked for punctuation characters (other than underscores) and spaces; punctuation will be removed and spaces changed to underscores. (Default = False)

  • n_comps – if > 0 then the resulting basename will be split using underscores and the returned basename will be defined using the n_comps components split by underscores.

  • rm_n_exts – used where an input file has more than one extension (e.g., tar.gz) and only n extensions should be removed. Default: 0, which will remove all extensions, calculated based on the number of full stops (.) within the file name. If a value of 1 was provided for filename.tar.gz then the returned output would be filename.tar.

Returns

basename for file
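The effect of rm_n_exts can be illustrated with a small standard-library sketch. The helper name basename_sketch is hypothetical and this is not the RSGISLib implementation; it only mirrors the behaviour described above for the filename.tar.gz case and ignores check_valid and n_comps.

```python
import os


def basename_sketch(input_file, rm_n_exts=0):
    """Hypothetical helper: strip the directory and extension(s) from a path."""
    name = os.path.basename(input_file)
    parts = name.split(".")
    # 0 (or a value covering every extension) removes everything
    # after the first full stop.
    if rm_n_exts <= 0 or rm_n_exts >= len(parts):
        return parts[0]
    # Otherwise drop only the last rm_n_exts extensions.
    return ".".join(parts[:-rm_n_exts])


print(basename_sketch("in/dir/filename.tar.gz"))     # filename
print(basename_sketch("in/dir/filename.tar.gz", 1))  # filename.tar
```

File names that begin with a full stop (hidden files) are not handled specially in this sketch.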

rsgislib.tools.filetools.get_dir_name(input_file: str) str

A function which returns just the name of the directory of the input path (file or directory) without the rest of the path.

Parameters

input_file – string for the input path (file or directory) name and path

Returns

directory name

rsgislib.tools.filetools.split_path_all(input_path: str) List[str]

A function which splits all the components within a file path into a list of components rather than the os.path.split function which just splits the last item.

Parameters

input_path – the input file path.

Returns

a list of the file path components.
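The difference from os.path.split can be shown with a short sketch that applies os.path.split repeatedly. The helper name split_all_sketch is hypothetical, and the exact output of the library function (for example, how a leading root is represented) may differ.

```python
import os


def split_all_sketch(input_path):
    """Hypothetical helper: split a path into all of its components."""
    parts = []
    path = input_path
    while True:
        head, tail = os.path.split(path)
        if head == path:
            # Reached the root of an absolute path (or an empty path).
            parts.insert(0, head)
            break
        if tail == path:
            # Reached the first component of a relative path.
            parts.insert(0, tail)
            break
        parts.insert(0, tail)
        path = head
    return parts


print(os.path.split("in/dir/img.kea"))     # only the last item is split off
print(split_all_sketch("in/dir/img.kea"))  # ['in', 'dir', 'img.kea']
```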

Searching

rsgislib.tools.filetools.find_file(dir_path: str, file_search: str) str

Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s).

Parameters
  • dir_path – string for the input directory path

  • file_search – string with a * wildcard for the file being searched for.

Returns

string with the path to the file

import rsgislib.tools.filetools
file_path = rsgislib.tools.filetools.find_file("in/dir", "*N15W093*.tif")

rsgislib.tools.filetools.find_file_none(dir_path: str, file_search: str) Union[None, str]

Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s). Returns None if not found.

Parameters
  • dir_path – string for the input directory path

  • file_search – string with a * wildcard for the file being searched for.

Returns

string with the path to the file, or None if not found
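A single-file wildcard search of this kind can be sketched with glob alone. The helper name find_single_file_sketch is hypothetical, and treating anything other than exactly one match as "not found" is an assumption of this sketch rather than documented library behaviour.

```python
import glob
import os
import tempfile


def find_single_file_sketch(dir_path, file_search):
    """Hypothetical helper: return the only match for a wildcard, else None."""
    matches = glob.glob(os.path.join(dir_path, file_search))
    # Assumption: zero matches or several matches both give None.
    if len(matches) == 1:
        return matches[0]
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "srtm_N15W093_dem.tif"), "w").close()
    print(find_single_file_sketch(tmp_dir, "*N15W093*.tif"))
    print(find_single_file_sketch(tmp_dir, "*N99W999*.tif"))  # None
```

Because glob expands the pattern against the file system, any path it returns refers to a file that exists at the time of the search.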

import rsgislib.tools.filetools
file_path = rsgislib.tools.filetools.find_file_none("in/dir", "*N15W093*.tif")
if file_path is not None:
    print(file_path)

rsgislib.tools.filetools.find_files_ext(dir_path: str, ending: str) dict

Find all the files within a directory structure with a specific file ending. The files are returned as a dictionary using the file name as the dictionary key. This means you cannot have files with the same name within the structure.

Parameters
  • dir_path – the base directory path within which to search.

  • ending – the file ending (e.g., .txt, or txt or .kea, kea).

Returns

dict with file name as key
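The reason file names must be unique follows from using the name as the dictionary key. A minimal sketch with os.walk makes this visible; the helper name find_files_ext_sketch is hypothetical, and raising an error on a duplicate name is an assumption of the sketch, not a statement about what the library does.

```python
import os
import tempfile


def find_files_ext_sketch(dir_path, ending):
    """Hypothetical helper: map file name -> full path for a file ending."""
    # Accept both "tif" and ".tif".
    if not ending.startswith("."):
        ending = "." + ending
    found = {}
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            if name.endswith(ending):
                if name in found:
                    # Assumption: a repeated name is reported as an error,
                    # since the second path would overwrite the first.
                    raise ValueError(f"Duplicate file name: {name}")
                found[name] = os.path.join(root, name)
    return found


with tempfile.TemporaryDirectory() as tmp_dir:
    os.makedirs(os.path.join(tmp_dir, "sub"))
    open(os.path.join(tmp_dir, "sub", "img.tif"), "w").close()
    open(os.path.join(tmp_dir, "notes.txt"), "w").close()
    print(list(find_files_ext_sketch(tmp_dir, "tif")))  # ['img.tif']
```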

import rsgislib.tools.filetools
file_paths = rsgislib.tools.filetools.find_files_ext("in/dir", ".tif")

rsgislib.tools.filetools.find_files_mpaths_ext(dir_paths: list, ending: str) dict

Find all the files within a list of input directories and the structure beneath with a specific file ending. The files are returned as a dictionary using the file name as the dictionary key. This means you cannot have files with the same name within the structure.

Parameters
  • dir_paths – a list of base directory paths within which to search.

  • ending – the file ending (e.g., .txt, or txt or .kea, kea).

Returns

dict with file name as key

import rsgislib.tools.filetools
dir_paths = ["in/dir", "test/dir", "img/files"]
file_paths = rsgislib.tools.filetools.find_files_mpaths_ext(dir_paths, ".tif")

rsgislib.tools.filetools.find_first_file(dir_path: str, file_search: str, rtn_except: bool = True) str

Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s).

Parameters
  • dir_path – The directory within which to search; note that the search will continue within sub-directories of the base directory until a file meeting the search criteria is found.

  • file_search – The file search string for the file name, which must contain a wildcard character (i.e., *).

  • rtn_except – if True then an exception will be raised if no file or multiple files are found (default). If False then None will be returned rather than an exception raised.

Returns

The file found (or None if rtn_except=False)

import rsgislib.tools.filetools
file_path = rsgislib.tools.filetools.find_first_file("in/dir", "*N15W093*.tif")

rsgislib.tools.filetools.get_files_mod_time(file_lst: list, dt_before: Optional[datetime.datetime] = None, dt_after: Optional[datetime.datetime] = None) list

A function which subsets a list of files based on datetime of last modification. The function also does a check as to whether a file exists, files which don’t exist will be ignored.

Parameters
  • file_lst – The list of file paths, represented as strings.

  • dt_before – a datetime object with a date/time where files modified before this will be returned

  • dt_after – a datetime object with a date/time where files modified after this will be returned
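The filtering described above can be sketched with os.path.getmtime. The helper name files_mod_time_sketch is hypothetical; requiring both limits to hold when both are given, and comparing against naive local datetimes, are assumptions of this sketch.

```python
import datetime
import os
import tempfile


def files_mod_time_sketch(file_lst, dt_before=None, dt_after=None):
    """Hypothetical helper: subset files by last modification time."""
    out_files = []
    for file_path in file_lst:
        # Files which do not exist are ignored.
        if not os.path.exists(file_path):
            continue
        mod_time = datetime.datetime.fromtimestamp(os.path.getmtime(file_path))
        if dt_before is not None and not mod_time < dt_before:
            continue
        if dt_after is not None and not mod_time > dt_after:
            continue
        out_files.append(file_path)
    return out_files


with tempfile.TemporaryDirectory() as tmp_dir:
    old_file = os.path.join(tmp_dir, "old.tif")
    open(old_file, "w").close()
    # Set the modification time to 1 January 2020 for the demonstration.
    stamp = datetime.datetime(2020, 1, 1).timestamp()
    os.utime(old_file, (stamp, stamp))
    cutoff = datetime.datetime(year=2020, month=12, day=25, hour=12, minute=30)
    print(files_mod_time_sketch([old_file, "does/not/exist.tif"], dt_before=cutoff))
```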

Example:

import glob
import datetime
import rsgislib.tools.filetools

input_files = glob.glob("in/dir/*.tif")
dt_before = datetime.datetime(year=2020, month=12, day=25, hour=12, minute=30)
file_paths = rsgislib.tools.filetools.get_files_mod_time(input_files, dt_before=dt_before)

rsgislib.tools.filetools.find_files_size_limits(dir_path: str, file_search: str, min_size: int = 0, max_size: Optional[int] = None) list

Search for files with a path using glob. Therefore, the file paths returned are true paths. Within the file_search provide the file names with ‘*’ as wildcard(s).

Parameters
  • dir_path – string for the input directory path

  • file_search – string with a * wildcard for the file being searched for.

  • min_size – the minimum file size in bytes (default is 0)

  • max_size – the maximum file size in bytes, if None (default) then ignored.

Returns

list of strings with the paths to the files

Example:

import rsgislib.tools.filetools
file_paths = rsgislib.tools.filetools.find_files_size_limits(
    "in/dir", "*N15W093*.tif", 0, 100000
)

rsgislib.tools.filetools.get_dir_list(dir_path: str, inc_hidden: bool = False) list

Function which gets the list of directories within the specified path.

Parameters
  • dir_path – file path to search within

  • inc_hidden – boolean specifying whether hidden files should be included (default=False)

Returns

list of directory paths

Example:

import rsgislib.tools.filetools
files = rsgislib.tools.filetools.get_dir_list("in/dir")

Archives

rsgislib.tools.filetools.create_directory_archive(in_dir: str, out_arch: str, arch_format: str) str

A function which creates an archive from an input directory. This function uses subprocess to call the appropriate command line function.

Please note that this function has similar functionality to shutil.make_archive, which I would recommend you use; however, I found it sometimes produces an error, so I provided this function, which uses the terminal commands, as a drop-in replacement.

Parameters
  • in_dir – The input directory path for which the archive will be created.

  • out_arch – The output archive file path and name. Note this should not include an extension as this will be added automatically.

  • arch_format – The format for the archive. The options are: zip, tar, gztar, bztar, xztar

Returns

a string with the full file path and name, including the file extension.
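For comparison, the standard-library shutil.make_archive mentioned above accepts the same format names and also appends the extension to the output name. The sketch below uses only shutil, not RSGISLib; whether create_directory_archive stores the directory itself or only its contents is not stated here, so the root_dir choice is an assumption.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small input directory to archive.
    in_dir = os.path.join(tmp_dir, "data")
    os.makedirs(in_dir)
    with open(os.path.join(in_dir, "readme.txt"), "w") as f:
        f.write("example")

    # The output name is given without an extension; it is added automatically.
    out_arch = os.path.join(tmp_dir, "data_archive")
    arch_file = shutil.make_archive(out_arch, "gztar", root_dir=in_dir)
    arch_name = os.path.basename(arch_file)
    print(arch_name)  # data_archive.tar.gz
```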

rsgislib.tools.filetools.create_targz_arch(out_arch_file: str, file_list: list, base_path: Optional[str] = None)

A function which can be used to create a tar.gz file containing the list of input files. If you wish to remove some of the directory structure from the file paths provided then a single base_path can be provided and will be removed from the file paths in the archive.

Parameters
  • out_arch_file – the output tar.gz file path

  • file_list – the list of files to be added to the archive.

  • base_path – the base path which will be removed from all the input files. Note, this means all the input files must have the same base path. Optional: Default is None (i.e., ignored).
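The effect of base_path can be sketched with the standard-library tarfile module, where the name stored in the archive is set through arcname. The helper name create_targz_sketch is hypothetical and this is not the library's code.

```python
import os
import tarfile
import tempfile


def create_targz_sketch(out_arch_file, file_list, base_path=None):
    """Hypothetical helper: write files to a tar.gz, optionally trimming paths."""
    with tarfile.open(out_arch_file, "w:gz") as tar:
        for file_path in file_list:
            if base_path is not None:
                # Store the path relative to base_path inside the archive.
                arc_name = os.path.relpath(file_path, base_path)
            else:
                arc_name = file_path
            tar.add(file_path, arcname=arc_name)


with tempfile.TemporaryDirectory() as tmp_dir:
    img_dir = os.path.join(tmp_dir, "base", "imgs")
    os.makedirs(img_dir)
    img_file = os.path.join(img_dir, "a.txt")
    open(img_file, "w").close()
    out_file = os.path.join(tmp_dir, "out.tar.gz")
    create_targz_sketch(out_file, [img_file], base_path=os.path.join(tmp_dir, "base"))
    with tarfile.open(out_file) as tar:
        print(tar.getnames())  # ['imgs/a.txt']
```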

rsgislib.tools.filetools.untar_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str

A function which extracts data from a tar file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.

Parameters
  • in_file – The input archive file.

  • out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).

  • gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)

  • verbose – If True (default: False) then more user feedback will be printed to the console.

Returns

output directory where data was extracted to.

rsgislib.tools.filetools.untar_gz_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str

A function which extracts data from a tar.gz file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.

Parameters
  • in_file – The input archive file.

  • out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).

  • gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)

  • verbose – If True (default: False) then more user feedback will be printed to the console.

Returns

output directory where data was extracted to.

rsgislib.tools.filetools.unzip_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str

A function which extracts data from a zip file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.

Parameters
  • in_file – The input archive file.

  • out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).

  • gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)

  • verbose – If True (default: False) then more user feedback will be printed to the console.

Returns

output directory where data was extracted to.
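The gen_arch_dir behaviour shared by the extraction functions can be sketched with the standard-library zipfile module. The helper name unzip_sketch is hypothetical, and deriving the new directory name by removing a single extension is an assumption of the sketch.

```python
import os
import tempfile
import zipfile


def unzip_sketch(in_file, out_dir, gen_arch_dir=True):
    """Hypothetical helper: extract a zip file, optionally into its own directory."""
    if gen_arch_dir:
        # Assumption: the new directory is named after the archive file.
        arch_name = os.path.splitext(os.path.basename(in_file))[0]
        out_dir = os.path.join(out_dir, arch_name)
        os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(in_file) as zip_obj:
        zip_obj.extractall(out_dir)
    return out_dir


with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, "scene.zip")
    with zipfile.ZipFile(zip_path, "w") as zip_obj:
        zip_obj.writestr("a.txt", "example")
    extracted_dir = unzip_sketch(zip_path, tmp_dir)
    print(os.path.basename(extracted_dir))  # scene
```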

rsgislib.tools.filetools.untar_bz_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str

A function which extracts data from a tar.bz file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.

Parameters
  • in_file – The input archive file.

  • out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).

  • gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)

  • verbose – If True (default: False) then more user feedback will be printed to the console.

Returns

output directory where data was extracted to.

File Info

rsgislib.tools.filetools.file_is_hidden(dir_path: str) bool

A function to test whether a file or folder is ‘hidden’ or not on the file system. Should be cross-platform between Linux/UNIX and Windows.

Parameters

dir_path – input file path to be tested

Returns

boolean (True = hidden)

Example:

import rsgislib.tools.filetools
if rsgislib.tools.filetools.file_is_hidden("in/dir/img.kea"):
    print("File is hidden")

rsgislib.tools.filetools.get_file_size(file_path: str, unit: str = 'bytes') float

A function which returns the file size of a file in the specified unit.

Units:
  • bytes

  • kb – kilobytes (bytes / 1024)

  • mb – megabytes (bytes / 1024^2)

  • gb – gigabytes (bytes / 1024^3)

  • tb – terabytes (bytes / 1024^4)

Parameters
  • file_path – the path to the file for which the size is to be calculated.

  • unit – the unit for the file size. Options: bytes, kb, mb, gb, tb

Returns

float for the file size.
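The unit definitions above amount to dividing the size in bytes by a power of 1024. A minimal sketch using os.path.getsize follows; the helper name file_size_sketch and the UNIT_POWERS table are hypothetical names introduced for illustration.

```python
import os
import tempfile

# Power of 1024 used for each unit, following the definitions above.
UNIT_POWERS = {"bytes": 0, "kb": 1, "mb": 2, "gb": 3, "tb": 4}


def file_size_sketch(file_path, unit="bytes"):
    """Hypothetical helper: return the size of a file in the given unit."""
    n_bytes = os.path.getsize(file_path)
    return n_bytes / (1024 ** UNIT_POWERS[unit.lower()])


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = os.path.join(tmp_dir, "demo.bin")
    with open(demo_file, "wb") as f:
        f.write(b"\0" * 2048)
    print(file_size_sketch(demo_file))        # 2048.0
    print(file_size_sketch(demo_file, "kb"))  # 2.0
```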

Sorting

rsgislib.tools.filetools.sort_imgs_to_dirs_utm(input_imgs_dir: str, file_search_str: str, out_base_dir: str)

A function which will sort a series of input image files which are projected using the UTM system into individual directories per UTM zone. Please note that the input files are moved on your system!

Parameters
  • input_imgs_dir – directory where the input files are to be found.

  • file_search_str – the wildcard search string to find files within the input directory (e.g., “in_dir/*.kea”).

  • out_base_dir – the output directory where the UTM folders will be created and the files moved.

Deleting

rsgislib.tools.filetools.delete_file_with_basename(input_file: str, print_rms=True)

Function to delete all the files which have a path and base name defined in the input_file attribute.

Parameters
  • input_file – string for the input file name and path

  • print_rms – print the files being deleted (Default: True)

rsgislib.tools.filetools.delete_file_silent(input_file: str) bool

A function which can be used in place of os.remove to delete a file. It checks whether the file exists and only calls os.remove if it does; it also catches any Exceptions from os.remove and just returns a boolean indicating whether the input_file has been removed.

Parameters

input_file – input file path for the file which is to be removed.

Returns

boolean (True: File was removed or did not exist. False: os.remove threw an Exception so assume file was not removed)
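The behaviour described here can be sketched in a few lines. The helper name delete_silent_sketch is hypothetical and this is an illustration of the described logic, not the library's source.

```python
import os
import tempfile


def delete_silent_sketch(input_file):
    """Hypothetical helper: remove a file without raising an exception."""
    # A file which does not exist counts as already removed.
    if not os.path.exists(input_file):
        return True
    try:
        os.remove(input_file)
    except Exception:
        # Any failure is reported through the return value only.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = os.path.join(tmp_dir, "demo.txt")
    open(demo_file, "w").close()
    print(delete_silent_sketch(demo_file))  # True, file removed
    print(delete_silent_sketch(demo_file))  # True, file did not exist
```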

Lock Files

rsgislib.tools.filetools.get_file_lock(input_file: str, sleep_period: int = 1, wait_iters: int = 120, use_except: bool = False) bool

A function which gets a lock on a file.

The lock file will be a unix hidden file (i.e., starts with a .) and it will have .lok added to the end. E.g., for input file hello_world.txt the lock file will be .hello_world.txt.lok. The contents of the lock file will be the time and date of creation.

Using the default parameters (sleep 1 second and wait 120 iterations) if the lock isn’t available it will be retried every second for 120 seconds (i.e., 2 mins).

Parameters
  • input_file – The input file for which the lock will be created.

  • sleep_period – time in seconds to sleep for, if the lock isn’t available. (Default=1 second)

  • wait_iters – the number of iterations to wait for before giving up. (Default=120)

  • use_except – Boolean. If True then an exception will be thrown if the lock is not available. If False (default) False will be returned if the lock is not successful.

Returns

boolean. True: lock was successfully gained. False: lock was not gained.
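The lock file naming and the retry loop can be sketched as below. The helper names are hypothetical; creating the lock file with os.O_EXCL and writing the time in ISO format are choices made for this sketch, and the library may create the file and format its contents differently.

```python
import datetime
import os
import tempfile
import time


def lock_path_sketch(input_file):
    """Hypothetical helper: hidden lock file name, e.g. .hello_world.txt.lok"""
    dir_path, file_name = os.path.split(input_file)
    return os.path.join(dir_path, "." + file_name + ".lok")


def get_lock_sketch(input_file, sleep_period=1, wait_iters=120):
    """Hypothetical helper: try to create the lock file, retrying if it exists."""
    lock_file = lock_path_sketch(input_file)
    for _ in range(wait_iters):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(sleep_period)
            continue
        with os.fdopen(fd, "w") as f:
            # Assumption: the creation time is stored as an ISO string.
            f.write(datetime.datetime.now().isoformat())
        return True
    return False


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "hello_world.txt")
    print(os.path.basename(lock_path_sketch(target)))  # .hello_world.txt.lok
    print(get_lock_sketch(target, sleep_period=0, wait_iters=2))  # True
    print(get_lock_sketch(target, sleep_period=0, wait_iters=2))  # False
```

With the default values the loop would wait up to 120 seconds before returning False.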

rsgislib.tools.filetools.release_file_lock(input_file: str)

A function which releases a lock file for the input file.

Parameters

input_file – The input file for which the lock will be released.

rsgislib.tools.filetools.clean_file_locks(dir_path: str, timeout: int = 3600)

A function which cleans up any remaining lock files (e.g., if an application has crashed). The timeout time will be compared with the time written within the file.

Parameters
  • dir_path – the file path to search for lock files (i.e., “.*.lok”)

  • timeout – the time (in seconds) for the timeout. Default: 3600 (1 hour)

File Hash

rsgislib.tools.filetools.create_sha1_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA1 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA1 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA1 hash string of the file.
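The role of block_size, which is common to all the hash functions in this section, can be sketched with the standard-library hashlib module: the file is read one block at a time so that large files need not be held in memory. The helper name file_hash_sketch is hypothetical, and returning the hexadecimal digest is an assumption about what "hash string" means here.

```python
import hashlib
import os
import tempfile


def file_hash_sketch(input_file, algorithm="sha1", block_size=4096):
    """Hypothetical helper: hash a file by reading it in fixed-size blocks."""
    hasher = hashlib.new(algorithm)
    with open(input_file, "rb") as f:
        # Read until an empty bytes object signals the end of the file.
        for block in iter(lambda: f.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = os.path.join(tmp_dir, "demo.bin")
    with open(demo_file, "wb") as f:
        f.write(b"abc" * 5000)
    print(file_hash_sketch(demo_file, "sha1"))
    print(file_hash_sketch(demo_file, "sha256", block_size=1024))
```

The block size changes only how the file is read, not the resulting hash.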

rsgislib.tools.filetools.create_sha224_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA224 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA224 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA224 hash string of the file.

rsgislib.tools.filetools.create_sha256_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA256 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA256 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA256 hash string of the file.

rsgislib.tools.filetools.create_sha384_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA384 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA384 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA384 hash string of the file.

rsgislib.tools.filetools.create_sha512_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA512 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA512 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA512 hash string of the file.

rsgislib.tools.filetools.create_md5_hash(input_file: str, block_size: int = 4096) str

A function which calculates the MD5 hash string of the input file.

Parameters
  • input_file – the input file for which the MD5 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

MD5 hash string of the file.

rsgislib.tools.filetools.create_blake2b_hash(input_file: str, block_size: int = 4096) str

A function which calculates the Blake2B hash string of the input file.

Parameters
  • input_file – the input file for which the Blake2B hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

Blake2B hash string of the file.

rsgislib.tools.filetools.create_blake2s_hash(input_file: str, block_size: int = 4096) str

A function which calculates the Blake2S hash string of the input file.

Parameters
  • input_file – the input file for which the Blake2S hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

Blake2S hash string of the file.

rsgislib.tools.filetools.create_sha3_224_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA3_224 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA3_224 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA3_224 hash string of the file.

rsgislib.tools.filetools.create_sha3_256_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA3_256 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA3_256 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA3_256 hash string of the file.

rsgislib.tools.filetools.create_sha3_384_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA3_384 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA3_384 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA3_384 hash string of the file.

rsgislib.tools.filetools.create_sha3_512_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA3_512 hash string of the input file.

Parameters
  • input_file – the input file for which the SHA3_512 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns

SHA3_512 hash string of the file.

Other

rsgislib.tools.filetools.convert_file_size_units(in_size: int, in_unit: str, out_unit: str) float

A function which converts between file size units

Parameters
  • in_size – input file size

  • in_unit – the input unit for the file size. Options: bytes, kb, mb, gb, tb

  • out_unit – the output unit for the file size. Options: bytes, kb, mb, gb, tb

Returns

float for the output file size