RSGISLib File Tools

Naming

rsgislib.tools.filetools.get_file_basename(input_file: str, check_valid: bool = False, n_comps: int = 0, rm_n_exts: int = 0) str

Uses the os.path module to return the file basename (i.e., with the path and extension removed).

Parameters:
  • input_file – string for the input file name and path

  • check_valid – if True then the resulting basename will be checked for punctuation characters (other than underscores) and spaces; punctuation will be removed and spaces changed to underscores. (Default = False)

  • n_comps – if > 0 then the resulting basename will be split using underscores and the returned basename will be defined using n_comps of the components split by underscores.

  • rm_n_exts – used where an input file has more than one extension (e.g., tar.gz) and only n extensions should be removed. Default: 0, which will remove all extensions, calculated based on the number of full stops (.) within the file name. If a value of 1 was provided for filename.tar.gz then the returned output would be filename.tar.

Returns:

basename for file
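
Example (a minimal sketch of the behaviour described above, using only os.path. The helper basename_sketch is hypothetical and is not the RSGISLib implementation; it assumes that n_comps keeps the leading components, and it omits the check_valid option):

```python
import os


def basename_sketch(input_file, n_comps=0, rm_n_exts=0):
    """Hypothetical stand-in mirroring the documented options."""
    name = os.path.basename(input_file)  # drop the directory part
    if rm_n_exts > 0:
        # Remove only the requested number of trailing extensions.
        for _ in range(rm_n_exts):
            name = os.path.splitext(name)[0]
    else:
        # Default: drop everything after the first full stop.
        name = name.split(".")[0]
    if n_comps > 0:
        # Assumption: keep the first n_comps underscore-separated parts.
        name = "_".join(name.split("_")[:n_comps])
    return name


print(basename_sketch("in/dir/img_2020_b1.tar.gz"))
print(basename_sketch("in/dir/img_2020_b1.tar.gz", rm_n_exts=1))
print(basename_sketch("in/dir/img_2020_b1.tar.gz", n_comps=2))
```

The three calls illustrate the default, the rm_n_exts=1 case from the parameter description, and trimming the name to two underscore-separated components.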

rsgislib.tools.filetools.get_dir_name(input_file: str) str

A function which returns just the name of the directory of the input path (file or directory) without the rest of the path.

Parameters:

input_file – string for the input path (file or directory) name and path

Returns:

directory name

rsgislib.tools.filetools.split_path_all(input_path: str) List[str]

A function which splits a file path into a list of all its components, unlike the os.path.split function, which only splits off the last item.

Parameters:

input_path – the input file path.

Returns:

a list of the file path components.
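
Example (a sketch of how a path can be split into all of its components by calling os.path.split repeatedly. The helper split_all_sketch is hypothetical; whether RSGISLib keeps the root component in the same way is an assumption):

```python
import os


def split_all_sketch(input_path):
    """Hypothetical helper: split a path into every component."""
    parts = []
    path = input_path
    while True:
        head, tail = os.path.split(path)
        if head == path:
            # Reached the root of an absolute path (e.g., "/").
            parts.insert(0, head)
            break
        if tail == path:
            # Reached the first component of a relative path.
            parts.insert(0, tail)
            break
        parts.insert(0, tail)
        path = head
    return parts


print(os.path.split("in/dir/img.kea"))    # only the last item is split off
print(split_all_sketch("in/dir/img.kea"))  # every component is separated
```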

Searching

rsgislib.tools.filetools.find_file(dir_path: str, file_search: str) str

Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s).

Parameters:
  • dir_path – string for the input directory path

  • file_search – string with a * wildcard for the file being searched for.

Returns:

string with the path to the file

import rsgislib.tools.filetools
file_path = rsgislib.tools.filetools.find_file("in/dir", "*N15W093*.tif")

rsgislib.tools.filetools.find_file_none(dir_path: str, file_search: str) None | str

Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s). Returns None if not found.

Parameters:
  • dir_path – string for the input directory path

  • file_search – string with a * wildcard for the file being searched for.

Returns:

string with the path to the file, or None if the file is not found

import rsgislib.tools.filetools
file_path = rsgislib.tools.filetools.find_file_none("in/dir", "*N15W093*.tif")
if file_path is not None:
    print(file_path)

rsgislib.tools.filetools.find_files_ext(dir_path: str, ending: str) dict

Find all the files within a directory structure with a specific file ending. The files are returned as a dictionary using the file name as the dictionary key. This means you cannot have files with the same name within the structure.

Parameters:
  • dir_path – the base directory path within which to search.

  • ending – the file ending (e.g., .txt, or txt or .kea, kea).

Returns:

dict with file name as key

import rsgislib.tools.filetools
file_paths = rsgislib.tools.filetools.find_files_ext("in/dir", ".tif")

rsgislib.tools.filetools.find_files_mpaths_ext(dir_paths: list, ending: str) dict

Find all the files within a list of input directories and the structure beneath with a specific file ending. The files are returned as a dictionary using the file name as the dictionary key. This means you cannot have files with the same name within the structure.

Parameters:
  • dir_paths – a list of base directory paths within which to search.

  • ending – the file ending (e.g., .txt, or txt or .kea, kea).

Returns:

dict with file name as key

import rsgislib.tools.filetools
dir_paths = ["in/dir", "test/dir", "img/files"]
file_paths = rsgislib.tools.filetools.find_files_mpaths_ext(dir_paths, ".tif")

rsgislib.tools.filetools.find_first_file(dir_path: str, file_search: str, rtn_except: bool = True) str

Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s).

Parameters:
  • dir_path – The directory within which to search. Note that the search will continue into sub-directories of the base directory until a file meeting the search criteria is found.

  • file_search – The file search string in the file name and must contain a wild character (i.e., *).

  • rtn_except – if True then an exception will be raised if no file or multiple files are found (default). If False then None will be returned rather than an exception raised.

Returns:

The file found (or None if rtn_except=False)

import rsgislib.tools.filetools
file_paths = rsgislib.tools.filetools.find_first_file("in/dir", "*N15W093*.tif")

rsgislib.tools.filetools.get_files_mod_time(file_lst: list, dt_before: datetime = None, dt_after: datetime = None) list

A function which subsets a list of files based on datetime of last modification. The function also does a check as to whether a file exists, files which don’t exist will be ignored.

Parameters:
  • file_lst – The list of file paths, represented as strings.

  • dt_before – a datetime object with a date/time where files modified before this will be returned

  • dt_after – a datetime object with a date/time where files modified after this will be returned

Example:

import glob
import datetime
import rsgislib.tools.filetools

input_files = glob.glob("in/dir/*.tif")
dt_before = datetime.datetime(year=2020, month=12, day=25, hour=12, minute=30)
file_path = rsgislib.tools.filetools.get_files_mod_time(input_files, dt_before)

rsgislib.tools.filetools.find_files_size_limits(dir_path: str, file_search: str, min_size: int = 0, max_size: int = None) list

Search for files with a path using glob. Therefore, the file paths returned are true paths. Within the file_search provide the file names with ‘*’ as wildcard(s).

Parameters:
  • dir_path – string for the input directory path

  • file_search – string with a * wildcard for the file being searched for.

  • min_size – the minimum file size in bytes (default is 0)

  • max_size – the maximum file size in bytes, if None (default) then ignored.

Returns:

list of strings with the paths to the files

Example:

import rsgislib.tools.filetools
file_paths = rsgislib.tools.filetools.find_files_size_limits("in/dir",
                                                             "*N15W093*.tif",
                                                             0, 100000)

rsgislib.tools.filetools.get_dir_list(dir_path: str, inc_hidden: bool = False) list

Function which gets the list of directories within the specified path.

Parameters:
  • dir_path – file path to search within

  • inc_hidden – boolean specifying whether hidden files should be included (default=False)

Returns:

list of directory paths

Example:

import rsgislib.tools.filetools
files = rsgislib.tools.filetools.get_dir_list("in/dir")

Archives

rsgislib.tools.filetools.create_directory_archive(in_dir: str, out_arch: str, arch_format: str) str

A function which creates an archive from an input directory. This function uses subprocess to call the appropriate command line function.

Please note that this function has similar functionality to shutil.make_archive, and I would recommend you use that. However, I found it sometimes produces an error, so I provided this function, which uses the terminal commands, as a drop-in replacement.

Parameters:
  • in_dir – The input directory path for which the archive will be created.

  • out_arch – The output archive file path and name. Note this should not include an extension as this will be added automatically.

  • arch_format – The format for the archive. The options are: zip, tar, gztar, bztar, xztar

Returns:

a string with the full file path and name, including the file extension.
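
Example (a sketch of the shutil.make_archive alternative mentioned above, using the same format names. The helper archive_dir_sketch, the demo function and the temporary file names are hypothetical; storing the directory itself inside the archive is an assumption about the desired layout):

```python
import os
import shutil
import tempfile


def archive_dir_sketch(in_dir, out_arch, arch_format):
    """Archive in_dir; out_arch is given without an extension."""
    in_dir = os.path.abspath(in_dir)
    # root_dir/base_dir make the archive hold the directory itself,
    # not the full absolute path to it.
    return shutil.make_archive(
        out_arch,
        arch_format,
        root_dir=os.path.dirname(in_dir),
        base_dir=os.path.basename(in_dir),
    )


def demo_archive():
    """Build a small directory, archive it and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        in_dir = os.path.join(tmp, "scene")
        os.makedirs(in_dir)
        with open(os.path.join(in_dir, "readme.txt"), "w") as f:
            f.write("example")
        arch = archive_dir_sketch(in_dir, os.path.join(tmp, "scene_arch"), "gztar")
        return os.path.basename(arch), os.path.exists(arch)


print(demo_archive())
```

As with the function documented above, the extension is added automatically, so the returned path ends in .tar.gz for the gztar format.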

rsgislib.tools.filetools.create_targz_arch(out_arch_file: str, file_list: list, base_path: str = None)

A function which can be used to create a tar.gz file containing the list of input files. If you wish to remove some of the directory structure from the file paths provided then a single base_path can be provided and will be removed from the file paths in the archive.

Parameters:
  • out_arch_file – the output tar.gz file path

  • file_list – the list of files to be added to the archive.

  • base_path – the base path which will be removed from all the input files. Note, this means all the input files must have the same base path. Optional: Default is None (i.e., ignored).
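
Example (a sketch of how a base path can be stripped from archive member names with the standard tarfile module. The helpers targz_sketch and demo_targz are hypothetical, and using os.path.relpath for the stripping is an assumption rather than the RSGISLib implementation):

```python
import os
import tarfile
import tempfile


def targz_sketch(out_arch_file, file_list, base_path=None):
    """Write file_list to a tar.gz, optionally removing base_path."""
    with tarfile.open(out_arch_file, "w:gz") as tar:
        for file_path in file_list:
            if base_path is not None:
                # Member name is the file path relative to base_path.
                arcname = os.path.relpath(file_path, base_path)
            else:
                arcname = file_path
            tar.add(file_path, arcname=arcname)


def demo_targz():
    """Archive two files and list the member names that were stored."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = os.path.join(tmp, "data", "tiles")
        os.makedirs(data_dir)
        files = []
        for name in ("a.txt", "b.txt"):
            path = os.path.join(data_dir, name)
            with open(path, "w") as f:
                f.write(name)
            files.append(path)
        arch = os.path.join(tmp, "out.tar.gz")
        targz_sketch(arch, files, base_path=os.path.join(tmp, "data"))
        with tarfile.open(arch, "r:gz") as tar:
            return sorted(tar.getnames())


print(demo_targz())
```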

rsgislib.tools.filetools.untar_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str

A function which extracts data from a tar file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.

Parameters:
  • in_file – The input archive file.

  • out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).

  • gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)

  • verbose – If True (default: False) then more user feedback will be printed to the console.

Returns:

output directory where data was extracted to.
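
Example (a sketch of the gen_arch_dir behaviour using the standard tarfile module. The helpers untar_sketch and demo_untar are hypothetical; naming the new directory after the archive with its extensions removed is an assumption, as is the use of the "data" extraction filter where the running Python provides it):

```python
import os
import tarfile
import tempfile


def untar_sketch(in_file, out_dir, gen_arch_dir=True):
    """Extract in_file into out_dir, optionally in a new sub-directory."""
    if gen_arch_dir:
        # Assumption: sub-directory named after the archive file.
        arch_name = os.path.basename(in_file).split(".")[0]
        out_dir = os.path.join(out_dir, arch_name)
        os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(in_file, "r:*") as tar:  # "r:*" detects compression
        # Use the safer extraction filter when this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(out_dir, **kwargs)
    return out_dir


def demo_untar():
    """Create a small tar file, extract it and report where it went."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.txt")
        with open(src, "w") as f:
            f.write("hello")
        arch = os.path.join(tmp, "scene.tar")
        with tarfile.open(arch, "w") as tar:
            tar.add(src, arcname="hello.txt")
        out = untar_sketch(arch, tmp)
        extracted = os.path.exists(os.path.join(out, "hello.txt"))
        return os.path.basename(out), extracted


print(demo_untar())
```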

rsgislib.tools.filetools.untar_gz_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str

A function which extracts data from a tar.gz file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.

Parameters:
  • in_file – The input archive file.

  • out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).

  • gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)

  • verbose – If True (default: False) then more user feedback will be printed to the console.

Returns:

output directory where data was extracted to.

rsgislib.tools.filetools.unzip_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str

A function which extracts data from a zip file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.

Parameters:
  • in_file – The input archive file.

  • out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).

  • gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)

  • verbose – If True (default: False) then more user feedback will be printed to the console.

Returns:

output directory where data was extracted to.

rsgislib.tools.filetools.untar_bz_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str

A function which extracts data from a tar.bz file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.

Parameters:
  • in_file – The input archive file.

  • out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).

  • gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)

  • verbose – If True (default: False) then more user feedback will be printed to the console.

Returns:

output directory where data was extracted to.

File Info

rsgislib.tools.filetools.file_is_hidden(dir_path: str) bool

A function to test whether a file or folder is ‘hidden’ or not on the file system. Should be cross-platform between Linux/UNIX and Windows.

Parameters:

dir_path – input file path to be tested

Returns:

boolean (True = hidden)

Example:

import rsgislib.tools.filetools
if rsgislib.tools.filetools.file_is_hidden("in/dir/img.kea"):
    print("File is hidden")

rsgislib.tools.filetools.get_file_size(file_path: str, unit: str = 'bytes') float

A function which returns the file size of a file in the specified unit.

Units:
  • bytes

  • kb – kilobytes (bytes / 1024)

  • mb – megabytes (bytes / 1024^2)

  • gb – gigabytes (bytes / 1024^3)

  • tb – terabytes (bytes / 1024^4)

Parameters:
  • file_path – the path to the file for which the size is to be calculated.

  • unit – the unit for the file size. Options: bytes, kb, mb, gb, tb

Returns:

float for the file size.
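
Example (a sketch of the unit arithmetic listed above, applied to os.path.getsize. The helper names and the UNIT_POWERS table are hypothetical and only illustrate the 1024-based factors):

```python
import os
import tempfile

# Power of 1024 by which the size in bytes is divided for each unit.
UNIT_POWERS = {"bytes": 0, "kb": 1, "mb": 2, "gb": 3, "tb": 4}


def file_size_sketch(file_path, unit="bytes"):
    """Return the file size as a float in the requested unit."""
    return os.path.getsize(file_path) / (1024 ** UNIT_POWERS[unit])


def demo_file_size():
    """Write a 2048-byte file and report its size in kilobytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blob.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * 2048)
        return file_size_sketch(path, "kb")


print(demo_file_size())
```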

Sorting

rsgislib.tools.filetools.sort_imgs_to_dirs_utm(input_imgs_dir: str, file_search_str: str, out_base_dir: str)

A function which will sort a series of input image files which are projected using the UTM system into individual directories per UTM zone. Please note that the input files are moved on your system!!

Parameters:
  • input_imgs_dir – directory where the input files are to be found.

  • file_search_str – the wildcard search string to find files within the input directory (e.g., “in_dir/*.kea”).

  • out_base_dir – the output directory where the UTM folders will be created and the files copied.

rsgislib.tools.filetools.natural_sort_file_names(in_file_lst: List[str]) List[str]

A function which performs a natural sort of a list of files. For example, if you start file names with dates (YYYYMMDD) then this function will return the list of file names in date order (earliest first).

Parameters:

in_file_lst – the input list of file paths. The get_file_basename function is used to extract the file name which is used for the sort.

Returns:

the sorted list of names.
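
Example (a sketch of one common way to implement a natural sort, in which runs of digits are compared as numbers. The helper natural_key is hypothetical and uses os.path.basename rather than get_file_basename; the exact ordering rules used by RSGISLib may differ):

```python
import os
import re


def natural_key(file_path):
    """Sort key that compares digit runs numerically."""
    name = os.path.basename(file_path)
    # re.split with a capture group alternates text and digit tokens,
    # so odd positions always hold the digit runs.
    tokens = re.split(r"(\d+)", name)
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(tokens)]


files = ["in/img_10.kea", "in/img_2.kea", "in/img_1.kea"]
print(sorted(files))                   # plain string sort
print(sorted(files, key=natural_key))  # natural sort
```

A plain string sort places img_10 before img_2, whereas the natural key orders the files 1, 2, 10.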

rsgislib.tools.filetools.sort_file_by_datetime(in_file_lst: List[str]) List[str]

A function which sorts a list of files based on the time each file was last modified. The list will be outputted in ascending order (i.e., oldest to newest). The Python function os.path.getmtime is used to access the modification time for each file.

Parameters:

in_file_lst – the input list of file paths, which need to be accessible.

Returns:

the sorted list of names.
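
Example (a sketch of sorting by modification time with os.path.getmtime as the sort key. The helper names and file names are hypothetical, and os.utime is used only to give the demo files known modification times):

```python
import os
import tempfile


def sort_by_mtime_sketch(in_file_lst):
    """Return the files ordered from oldest to newest modification."""
    return sorted(in_file_lst, key=os.path.getmtime)


def demo_sort_by_mtime():
    """Create two files with fixed modification times and sort them."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, mtime in (("new.txt", 2_000_000_000), ("old.txt", 1_000_000_000)):
            path = os.path.join(tmp, name)
            open(path, "w").close()
            os.utime(path, (mtime, mtime))  # set access and modified times
            paths.append(path)
        return [os.path.basename(p) for p in sort_by_mtime_sketch(paths)]


print(demo_sort_by_mtime())
```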

Deleting

rsgislib.tools.filetools.delete_file_with_basename(input_file: str, print_rms=True)

Function to delete all the files which have a path and base name defined in the input_file attribute.

Parameters:
  • input_file – string for the input file name and path

  • print_rms – print the files being deleted (Default: True)

rsgislib.tools.filetools.delete_file_silent(input_file: str) bool

A function which can be used in place of os.remove to delete a file. It checks whether the file exists and only calls os.remove if it does. It also catches any exceptions from os.remove and just returns a boolean indicating whether the input_file has been removed.

Parameters:

input_file – input file path for the file which is to be removed.

Returns:

boolean (True: file was removed or did not exist. False: os.remove threw an exception, so assume the file was not removed)
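
Example (a sketch of the logic described above. The helpers delete_silent_sketch and demo_delete are hypothetical and are not the RSGISLib source):

```python
import os
import tempfile


def delete_silent_sketch(input_file):
    """Remove a file without raising; report whether it is gone."""
    if not os.path.exists(input_file):
        return True  # nothing to remove
    try:
        os.remove(input_file)
    except Exception:
        return False  # removal failed, assume the file is still there
    return True


def demo_delete():
    """Delete the same file twice to show both True cases."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scratch.txt")
        open(path, "w").close()
        first = delete_silent_sketch(path)   # file exists and is removed
        second = delete_silent_sketch(path)  # file no longer exists
        return first, second, os.path.exists(path)


print(demo_delete())
```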

rsgislib.tools.filetools.rm_files_size_gt(file_path: str, file_srch: str, min_size: int, rm_file: bool = False, rm_use_basename: bool = False)

A function which removes all the files from the search path which are greater than the specified size.

Note, the file_path and file_srch will be merged with os.path.join, e.g., file_path="/hello/world", file_srch="*.txt" would result in "/hello/world/*.txt". Wildcard characters can be put in both parts if needed.

Parameters:
  • file_path – The directory within which the files will be searched for.

  • file_srch – The search string (must have a wild card ‘*’ for glob).

  • min_size – the minimum valid size, above this size the files will be deleted. In bytes.

  • rm_file – If True then files will be deleted; if False then a list of ‘rm file’ commands will be produced rather than the files actually being deleted. (default: False)

  • rm_use_basename – If True then all files with the same base name (i.e., same name but different file extension) within the same directory will also be deleted. Useful if you have file formats which have multiple files. (default: False)
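
Example (a sketch of the search step only: the directory and search string are joined with os.path.join, passed to glob, and filtered by size. The helper files_larger_than is hypothetical and deliberately deletes nothing):

```python
import glob
import os
import tempfile


def files_larger_than(file_path, file_srch, min_size):
    """List the files matching the search that exceed min_size bytes."""
    pattern = os.path.join(file_path, file_srch)
    return sorted(f for f in glob.glob(pattern) if os.path.getsize(f) > min_size)


def demo_files_larger_than():
    """Create one small and one large file and list the large one."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, n_bytes in (("small.txt", 10), ("large.txt", 5000)):
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(b"x" * n_bytes)
        found = files_larger_than(tmp, "*.txt", 1000)
        return [os.path.basename(f) for f in found]


print(demo_files_larger_than())
```

Listing the candidates first, as with the default rm_file=False, is a cautious way to check a search string before anything is removed.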

Lock Files

rsgislib.tools.filetools.get_file_lock(input_file: str, sleep_period: int = 1, wait_iters: int = 120, use_except: bool = False) bool

A function which gets a lock on a file.

The lock file will be a unix hidden file (i.e., starts with a .) and it will have .lok added to the end. E.g., for input file hello_world.txt the lock file will be .hello_world.txt.lok. The contents of the lock file will be the time and date of creation.

Using the default parameters (sleep 1 second and wait 120 iterations) if the lock isn’t available it will be retried every second for 120 seconds (i.e., 2 mins).

Parameters:
  • input_file – The input file for which the lock will be created.

  • sleep_period – time in seconds to sleep for, if the lock isn’t available. (Default=1 second)

  • wait_iters – the number of iterations to wait for before giving up. (Default=120)

  • use_except – Boolean. If True then an exception will be thrown if the lock is not available. If False (default) then False will be returned if the lock is not gained.

Returns:

boolean. True: lock was successfully gained. False: lock was not gained.
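
Example (a sketch of the lock file naming and retry loop described above. All helper names are hypothetical; placing the lock file beside the input file, creating it atomically with os.O_EXCL, and writing an ISO-formatted timestamp are assumptions rather than the RSGISLib implementation):

```python
import datetime
import os
import tempfile
import time


def lock_path_for(input_file):
    """Hidden lock file name: '.' + file name + '.lok'."""
    dir_path, name = os.path.split(os.path.abspath(input_file))
    return os.path.join(dir_path, "." + name + ".lok")


def get_lock_sketch(input_file, sleep_period=1, wait_iters=120):
    """Try to create the lock file, retrying while it already exists."""
    lock_file = lock_path_for(input_file)
    for _ in range(wait_iters):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(sleep_period)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(datetime.datetime.now().isoformat())
        return True
    return False


def release_lock_sketch(input_file):
    """Remove the lock file if it is present."""
    lock_file = lock_path_for(input_file)
    if os.path.exists(lock_file):
        os.remove(lock_file)


def demo_lock():
    """Gain a lock, fail to gain it again, release it, then regain it."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "hello_world.txt")
        first = get_lock_sketch(target)
        second = get_lock_sketch(target, sleep_period=0, wait_iters=2)
        release_lock_sketch(target)
        third = get_lock_sketch(target)
        release_lock_sketch(target)
        return os.path.basename(lock_path_for(target)), first, second, third


print(demo_lock())
```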

rsgislib.tools.filetools.release_file_lock(input_file: str)

A function which releases a lock file for the input file.

Parameters:

input_file – The input file for which the lock will be released.

rsgislib.tools.filetools.clean_file_locks(dir_path: str, timeout: int = 3600)

A function which cleans up any remaining lock files (e.g., if an application has crashed). The timeout time will be compared with the time written within the file.

Parameters:
  • dir_path – the file path to search for lock files (i.e., “.*.lok”)

  • timeout – the time (in seconds) for the timeout. Default: 3600 (1 hour)

File Hash

rsgislib.tools.filetools.create_sha1_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA1 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA1 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA1 hash string of the file.

rsgislib.tools.filetools.create_sha224_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA224 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA224 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA224 hash string of the file.

rsgislib.tools.filetools.create_sha256_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA256 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA256 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA256 hash string of the file.

rsgislib.tools.filetools.create_sha384_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA384 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA384 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA384 hash string of the file.

rsgislib.tools.filetools.create_sha512_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA512 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA512 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA512 hash string of the file.

rsgislib.tools.filetools.create_md5_hash(input_file: str, block_size: int = 4096) str

A function which calculates the MD5 hash string of the input file.

Parameters:
  • input_file – the input file for which the MD5 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

MD5 hash string of the file.

rsgislib.tools.filetools.create_blake2b_hash(input_file: str, block_size: int = 4096) str

A function which calculates the Blake2B hash string of the input file.

Parameters:
  • input_file – the input file for which the Blake2B hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

Blake2B hash string of the file.

rsgislib.tools.filetools.create_blake2s_hash(input_file: str, block_size: int = 4096) str

A function which calculates the Blake2S hash string of the input file.

Parameters:
  • input_file – the input file for which the Blake2S hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

Blake2S hash string of the file.

rsgislib.tools.filetools.create_sha3_224_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA3_224 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA3_224 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA3_224 hash string of the file.

rsgislib.tools.filetools.create_sha3_256_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA3_256 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA3_256 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA3_256 hash string of the file.

rsgislib.tools.filetools.create_sha3_384_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA3_384 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA3_384 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA3_384 hash string of the file.

rsgislib.tools.filetools.create_sha3_512_hash(input_file: str, block_size: int = 4096) str

A function which calculates the SHA3_512 hash string of the input file.

Parameters:
  • input_file – the input file for which the SHA3_512 hash string will be found.

  • block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4kb)

Returns:

SHA3_512 hash string of the file.
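
Example (a sketch of how a file can be hashed in fixed-size blocks with the standard hashlib module, which is the pattern the block_size parameter of the functions above suggests. The helper file_hash_sketch is hypothetical, and returning the hexadecimal digest is an assumption about the string format):

```python
import hashlib
import os
import tempfile


def file_hash_sketch(input_file, algorithm="sha256", block_size=4096):
    """Hash a file block by block so it is never fully held in memory."""
    hasher = hashlib.new(algorithm)
    with open(input_file, "rb") as f:
        # iter() with a sentinel keeps reading until an empty block.
        for block in iter(lambda: f.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def demo_hash():
    """Check the block-wise hash against hashing the bytes in one go."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        payload = b"rsgislib" * 2000  # larger than one 4096-byte block
        with open(path, "wb") as f:
            f.write(payload)
        return file_hash_sketch(path) == hashlib.sha256(payload).hexdigest()


print(demo_hash())
```

Reading in blocks gives the same digest as hashing the whole content at once, while keeping memory use independent of the file size.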

Other

rsgislib.tools.filetools.convert_file_size_units(in_size: int, in_unit: str, out_unit: str) float

A function which converts between file size units

Parameters:
  • in_size – input file size

  • in_unit – the input unit for the file size. Options: bytes, kb, mb, gb, tb

  • out_unit – the output unit for the file size. Options: bytes, kb, mb, gb, tb

Returns:

float for the output file size