RSGISLib File Tools
Naming
- rsgislib.tools.filetools.get_file_basename(input_file: str, check_valid: bool = False, n_comps: int = 0, rm_n_exts: int = 0) str
Uses os.path module to return file basename (i.e., path and extension removed)
- Parameters:
input_file – string for the input file name and path
check_valid – if True then the resulting basename will be checked for punctuation characters (other than underscores) and spaces; punctuation will be removed and spaces changed to underscores. (Default = False)
n_comps – if > 0 then the resulting basename will be split using underscores and the returned basename will be defined using n_comps of the underscore-separated components.
rm_n_exts – used where an input file has more than one extension (e.g., tar.gz) and only n extensions should be removed. Default: 0, which removes all extensions, calculated from the number of full stops (.) within the file name. If a value of 1 is provided for filename.tar.gz then the returned output is filename.tar.
- Returns:
basename for file
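The rm_n_exts behaviour described above can be illustrated with a small standard-library sketch. This is not RSGISLib's implementation; the function name basename_sketch is hypothetical and only the extension handling is covered (check_valid and n_comps are left out).

```python
import os


def basename_sketch(input_file: str, rm_n_exts: int = 0) -> str:
    """Hypothetical stand-in showing how rm_n_exts could behave."""
    name = os.path.basename(input_file)  # drop the directory part
    parts = name.split(".")
    if len(parts) == 1:
        return name  # no extension to remove
    if rm_n_exts <= 0 or rm_n_exts >= len(parts):
        return parts[0]  # default: remove every extension
    # Remove only the last rm_n_exts extensions.
    return ".".join(parts[: len(parts) - rm_n_exts])


all_removed = basename_sketch("/data/filename.tar.gz")
one_removed = basename_sketch("/data/filename.tar.gz", rm_n_exts=1)
print(all_removed, one_removed)
```

With the default of 0 every extension is stripped, while a value of 1 keeps the inner .tar, matching the filename.tar.gz case in the parameter description.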
- rsgislib.tools.filetools.get_dir_name(input_file: str) str
A function which returns just the name of the directory of the input path (file or directory) without the rest of the path.
- Parameters:
input_file – string for the input path (file or directory) name and path
- Returns:
directory name
- rsgislib.tools.filetools.split_path_all(input_path: str) List[str]
A function which splits all the components within a file path into a list of components rather than the os.path.split function which just splits the last item.
- Parameters:
input_path – the input file path.
- Returns:
a list of the file path components.
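To show the difference from os.path.split, here is a minimal sketch that repeatedly applies os.path.split until nothing is left. The name split_all_sketch is hypothetical, and details such as normalising the path first or keeping the root as its own component are assumptions of this sketch rather than documented behaviour.

```python
import os
from typing import List


def split_all_sketch(input_path: str) -> List[str]:
    """Split a path into all of its components (illustrative only)."""
    parts: List[str] = []
    head = os.path.normpath(input_path)  # removes any trailing separator
    while True:
        head, tail = os.path.split(head)
        if tail:
            parts.insert(0, tail)
        else:
            # Nothing left to split; keep a root such as "/" if present.
            if head:
                parts.insert(0, head)
            break
    return parts


single_split = os.path.split(os.path.join("in", "dir", "img.kea"))
all_parts = split_all_sketch(os.path.join("in", "dir", "img.kea"))
print(single_split, all_parts)
```

os.path.split only separates the final item from the rest, whereas the loop yields one list entry per component.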
Searching
- rsgislib.tools.filetools.find_file(dir_path: str, file_search: str) str
Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s).
- Parameters:
dir_path – string for the input directory path
file_search – string with a * wildcard for the file being searched for.
- Returns:
string with the path to the file
import rsgislib.tools.filetools
file_path = rsgislib.tools.filetools.find_file("in/dir", "*N15W093*.tif")
- rsgislib.tools.filetools.find_file_none(dir_path: str, file_search: str) None | str
Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s). Returns None if the file is not found.
- Parameters:
dir_path – string for the input directory path
file_search – string with a * wildcard for the file being searched for.
- Returns:
string with the path to the file, or None if no file is found
import rsgislib.tools.filetools
file_path = rsgislib.tools.filetools.find_file_none("in/dir", "*N15W093*.tif")
if file_path is not None:
    print(file_path)
- rsgislib.tools.filetools.find_files_ext(dir_path: str, ending: str) dict
Find all the files within a directory structure with a specific file ending. The files are returned as a dictionary using the file name as the dictionary key. This means you cannot have files with the same name within the structure.
- Parameters:
dir_path – the base directory path within which to search.
ending – the file ending (e.g., .txt, or txt or .kea, kea).
- Returns:
dict with file name as key
import rsgislib.tools.filetools
file_paths = rsgislib.tools.filetools.find_files_ext("in/dir", ".tif")
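The name-keyed dictionary described above, and the reason duplicate file names are a problem, can be sketched with os.walk. The function find_files_ext_sketch is hypothetical; raising an error on a duplicate name, and keeping the extension in the key, are choices made for this sketch and may differ from what RSGISLib does.

```python
import os
import tempfile
from typing import Dict


def find_files_ext_sketch(dir_path: str, ending: str) -> Dict[str, str]:
    """Walk dir_path and map file name -> full path (illustrative only)."""
    found: Dict[str, str] = {}
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            # endswith accepts both ".tif" and "tif" style endings.
            if name.endswith(ending):
                if name in found:
                    # A second file with the same name would overwrite the key.
                    raise ValueError(f"Duplicate file name: {name}")
                found[name] = os.path.join(root, name)
    return found


# Small demonstration in a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp_dir:
    os.makedirs(os.path.join(tmp_dir, "sub"))
    for rel in ("a.tif", os.path.join("sub", "b.tif"), "c.txt"):
        open(os.path.join(tmp_dir, rel), "w").close()
    found_files = find_files_ext_sketch(tmp_dir, ".tif")
    tif_names = sorted(found_files)
    b_in_sub = found_files["b.tif"] == os.path.join(tmp_dir, "sub", "b.tif")

print(tif_names)
```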
- rsgislib.tools.filetools.find_files_mpaths_ext(dir_paths: list, ending: str) dict
Find all the files within a list of input directories, and the structure beneath them, with a specific file ending. The files are returned as a dictionary using the file name as the dictionary key. This means you cannot have files with the same name within the structure.
- Parameters:
dir_paths – a list of base directory paths within which to search.
ending – the file ending (e.g., .txt, or txt or .kea, kea).
- Returns:
dict with file name as key
import rsgislib.tools.filetools
dir_paths = ["in/dir", "test/dir", "img/files"]
file_paths = rsgislib.tools.filetools.find_files_mpaths_ext(dir_paths, ".tif")
- rsgislib.tools.filetools.find_first_file(dir_path: str, file_search: str, rtn_except: bool = True) str
Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s).
- Parameters:
dir_path – The directory within which to search. Note that the search will continue through the sub-directories of the base directory until a file meeting the search criteria is found.
file_search – The search string for the file name, which must contain a wildcard character (i.e., *).
rtn_except – if True then an exception will be raised if no file or multiple files are found (default). If False then None will be returned rather than an exception raised.
- Returns:
The file found (or None if rtn_except=False)
import rsgislib.tools.filetools
file_paths = rsgislib.tools.filetools.find_first_file("in/dir", "*N15W093*.tif")
- rsgislib.tools.filetools.get_files_mod_time(file_lst: list, dt_before: datetime = None, dt_after: datetime = None) list
A function which subsets a list of files based on datetime of last modification. The function also does a check as to whether a file exists, files which don’t exist will be ignored.
- Parameters:
file_lst – The list of file paths, represented as strings.
dt_before – a datetime object with a date/time where files modified before this will be returned
dt_after – a datetime object with a date/time where files modified after this will be returned
Example:
import glob
import datetime
import rsgislib.tools.filetools
input_files = glob.glob("in/dir/*.tif")
dt_before = datetime.datetime(year=2020, month=12, day=25, hour=12, minute=30)
file_path = rsgislib.tools.filetools.get_files_mod_time(input_files, dt_before)
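The filtering described above (skip missing files, keep those modified before and/or after a given datetime) can be sketched with os.path.getmtime. The function files_mod_time_sketch is hypothetical, and how the two limits combine when both are given (here: both must hold) is an assumption of this sketch.

```python
import datetime
import os
import tempfile
from typing import List, Optional


def files_mod_time_sketch(
    file_lst: List[str],
    dt_before: Optional[datetime.datetime] = None,
    dt_after: Optional[datetime.datetime] = None,
) -> List[str]:
    """Subset file_lst by last-modified time (illustrative only)."""
    out = []
    for path in file_lst:
        if not os.path.exists(path):
            continue  # files which don't exist are ignored
        mod = datetime.datetime.fromtimestamp(os.path.getmtime(path))
        if dt_before is not None and not mod < dt_before:
            continue
        if dt_after is not None and not mod > dt_after:
            continue
        out.append(path)
    return out


# Demonstration: two files with modification times set explicitly.
with tempfile.TemporaryDirectory() as tmp_dir:
    old_file = os.path.join(tmp_dir, "old.tif")
    new_file = os.path.join(tmp_dir, "new.tif")
    missing = os.path.join(tmp_dir, "missing.tif")
    for path, mod in (
        (old_file, datetime.datetime(2020, 1, 1)),
        (new_file, datetime.datetime(2021, 6, 1)),
    ):
        open(path, "w").close()
        os.utime(path, (mod.timestamp(), mod.timestamp()))
    cutoff = datetime.datetime(2020, 12, 25, 12, 30)
    files = [old_file, new_file, missing]
    older_names = [
        os.path.basename(p) for p in files_mod_time_sketch(files, dt_before=cutoff)
    ]
    newer_names = [
        os.path.basename(p) for p in files_mod_time_sketch(files, dt_after=cutoff)
    ]

print(older_names, newer_names)
```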
- rsgislib.tools.filetools.find_files_size_limits(dir_path: str, file_search: str, min_size: int = 0, max_size: int = None) list
Search for files with a path using glob. Therefore, the file paths returned are true paths. Within the file_search provide the file names with ‘*’ as wildcard(s).
- Parameters:
dir_path – string for the input directory path
file_search – string with a * wildcard for the file being searched for.
min_size – the minimum file size in bytes (default is 0)
max_size – the maximum file size in bytes, if None (default) then ignored.
- Returns:
list of paths to the files found
Example:
import rsgislib.tools.filetools
file_paths = rsgislib.tools.filetools.find_files_size_limits("in/dir", "*N15W093*.tif", 0, 100000)
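A size-limited glob search of this kind can be sketched with glob and os.path.getsize. The function files_size_limits_sketch is hypothetical, and treating both limits as inclusive is an assumption; the documentation does not say whether the bounds are inclusive.

```python
import glob
import os
import tempfile
from typing import List, Optional


def files_size_limits_sketch(
    dir_path: str, file_search: str, min_size: int = 0, max_size: Optional[int] = None
) -> List[str]:
    """Return glob matches whose size in bytes lies within the limits."""
    out = []
    for path in sorted(glob.glob(os.path.join(dir_path, file_search))):
        size = os.path.getsize(path)
        if size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue  # max_size of None means no upper limit
        out.append(path)
    return out


# Demonstration: one 10 byte file and one 1000 byte file.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, n_bytes in (("small.tif", 10), ("large.tif", 1000)):
        with open(os.path.join(tmp_dir, name), "wb") as f:
            f.write(b"0" * n_bytes)
    limited = files_size_limits_sketch(tmp_dir, "*.tif", 0, 100)
    limited_names = [os.path.basename(p) for p in limited]
    unlimited_count = len(files_size_limits_sketch(tmp_dir, "*.tif"))

print(limited_names, unlimited_count)
```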
- rsgislib.tools.filetools.get_dir_list(dir_path: str, inc_hidden: bool = False) list
Function which gets the list of directories within the specified path.
- Parameters:
dir_path – file path to search within
inc_hidden – boolean specifying whether hidden files should be included (default=False)
- Returns:
list of directory paths
Example:
import rsgislib.tools.filetools
files = rsgislib.tools.filetools.get_dir_list("in/dir")
Archives
- rsgislib.tools.filetools.create_directory_archive(in_dir: str, out_arch: str, arch_format: str) str
A function which creates an archive from an input directory. This function uses subprocess to call the appropriate command line function.
Please note that this function has similar functionality to shutil.make_archive, and I would recommend you use that. However, I found it sometimes produces an error, so I provided this function, which uses the terminal functions, as a drop-in replacement.
- Parameters:
in_dir – The input directory path for which the archive will be created.
out_arch – The output archive file path and name. Note this should not include an extension as this will be added automatically.
arch_format – The format for the archive. The options are: zip, tar, gztar, bztar, xztar
- Returns:
a string with the full file path and name, including the file extension.
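For comparison, this is how the standard-library shutil.make_archive mentioned above is called. It follows the same convention of taking the output name without an extension and returning the full archive path. The directory and file names below are made up for the demonstration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small input directory to archive.
    in_dir = os.path.join(tmp_dir, "in_dir")
    os.makedirs(in_dir)
    with open(os.path.join(in_dir, "readme.txt"), "w") as f:
        f.write("hello")

    # The output name has no extension; make_archive appends it.
    out_arch = os.path.join(tmp_dir, "my_archive")
    arch_path = shutil.make_archive(out_arch, "gztar", root_dir=in_dir)

    arch_name = os.path.basename(arch_path)
    arch_exists = os.path.isfile(arch_path)

print(arch_name)
```

The format names accepted by shutil.make_archive (zip, tar, gztar, bztar, xztar) are the same as the arch_format options listed above.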
- rsgislib.tools.filetools.create_targz_arch(out_arch_file: str, file_list: list, base_path: str = None)
A function which can be used to create a tar.gz file containing the list of input files. If you wish to remove some of the directory structure from the file paths provided then a single base_path can be given and it will be removed from the file paths in the archive.
- Parameters:
out_arch_file – the output tar.gz file path
file_list – the list of files to be added to the archive.
base_path – the base path which will be removed from all the input files. Note, this means all the input files must have the same base path. Optional: Default is None (i.e., ignored).
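The effect of base_path can be sketched with the standard tarfile module, using the arcname argument to store each file under a path relative to the base. The function create_targz_sketch is hypothetical and is only one way to get the described result; it is not taken from the RSGISLib source.

```python
import os
import tarfile
import tempfile
from typing import List, Optional


def create_targz_sketch(
    out_arch_file: str, file_list: List[str], base_path: Optional[str] = None
) -> None:
    """Write file_list to a tar.gz, optionally stripping base_path."""
    with tarfile.open(out_arch_file, "w:gz") as tar:
        for file_path in file_list:
            arcname = file_path
            if base_path is not None:
                # Store the file relative to the common base path.
                arcname = os.path.relpath(file_path, base_path)
            tar.add(file_path, arcname=arcname)


# Demonstration: archive <tmp>/data/sub/a.txt with <tmp>/data as the base.
with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = os.path.join(tmp_dir, "data")
    os.makedirs(os.path.join(data_dir, "sub"))
    in_file = os.path.join(data_dir, "sub", "a.txt")
    with open(in_file, "w") as f:
        f.write("hello")
    arch_file = os.path.join(tmp_dir, "out.tar.gz")
    create_targz_sketch(arch_file, [in_file], base_path=data_dir)
    with tarfile.open(arch_file, "r:gz") as tar:
        member_names = tar.getnames()

print(member_names)
```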
- rsgislib.tools.filetools.untar_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str
A function which extracts data from a tar file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.
- Parameters:
in_file – The input archive file.
out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).
gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)
verbose – If True (default: False) then more user feedback will be printed to the console.
- Returns:
output directory where data was extracted to.
- rsgislib.tools.filetools.untar_gz_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str
A function which extracts data from a tar.gz file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.
- Parameters:
in_file – The input archive file.
out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).
gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)
verbose – If True (default: False) then more user feedback will be printed to the console.
- Returns:
output directory where data was extracted to.
- rsgislib.tools.filetools.unzip_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str
A function which extracts data from a zip file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.
- Parameters:
in_file – The input archive file.
out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).
gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)
verbose – If True (default: False) then more user feedback will be printed to the console.
- Returns:
output directory where data was extracted to.
- rsgislib.tools.filetools.untar_bz_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str
A function which extracts data from a tar.bz file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.
- Parameters:
in_file – The input archive file.
out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within the out_dir).
gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)
verbose – If True (default: False) then more user feedback will be printed to the console.
- Returns:
output directory where data was extracted to.
File Info
- rsgislib.tools.filetools.file_is_hidden(dir_path: str) bool
A function to test whether a file or folder is ‘hidden’ or not on the file system. Should be cross-platform between Linux/UNIX and Windows.
- Parameters:
dir_path – input file path to be tested
- Returns:
boolean (True = hidden)
Example:
import rsgislib.tools.filetools
if rsgislib.tools.filetools.file_is_hidden("in/dir/img.kea"):
    print("File is hidden")
- rsgislib.tools.filetools.get_file_size(file_path: str, unit: str = 'bytes') float
A function which returns the file size of a file in the specified unit.
Units:
* bytes
* kb - kilobytes (bytes / 1024)
* mb - megabytes (bytes / 1024^2)
* gb - gigabytes (bytes / 1024^3)
* tb - terabytes (bytes / 1024^4)
- Parameters:
file_path – the path to the file for which the size is to be calculated.
unit – the unit for the file size. Options: bytes, kb, mb, gb, tb
- Returns:
float for the file size.
Sorting
- rsgislib.tools.filetools.sort_imgs_to_dirs_utm(input_imgs_dir: str, file_search_str: str, out_base_dir: str)
A function which will sort a series of input image files which are projected using the UTM system into individual directories per UTM zone. Please note that the input files are moved on your system!!
- Parameters:
input_imgs_dir – directory where the input files are to be found.
file_search_str – the wildcard search string to find files within the input directory (e.g., “in_dir/*.kea”).
out_base_dir – the output directory where the UTM folders will be created and the files moved.
- rsgislib.tools.filetools.natural_sort_file_names(in_file_lst: List[str]) List[str]
A function which performs a natural sort of a list of files. For example, if you start file names with dates (YYYYMMDD) then this function will return the list of file names in date order (earliest first).
- Parameters:
in_file_lst – the input list of file paths. The get_file_basename function is used to extract the file name which is used for the sort.
- Returns:
the sorted list of names.
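A natural sort compares runs of digits as numbers rather than character by character. The sketch below shows one common way to do this with re.split; the helper natural_key is hypothetical, uses os.path.basename rather than get_file_basename, and is not taken from the RSGISLib source.

```python
import os
import re
from typing import List, Union


def natural_key(file_path: str) -> List[Union[int, str]]:
    """Build a sort key where digit runs compare numerically."""
    name = os.path.basename(file_path)
    # Splitting on a captured group alternates text and digit tokens,
    # so odd positions are always the digit runs.
    tokens = re.split(r"(\d+)", name)
    return [int(tok) if idx % 2 else tok.lower() for idx, tok in enumerate(tokens)]


names = ["img_10.tif", "img_2.tif", "img_1.tif"]
plain_sorted = sorted(names)
natural_sorted = sorted(names, key=natural_key)
print(plain_sorted)
print(natural_sorted)
```

A plain string sort places img_10.tif before img_2.tif, while the natural key orders the files 1, 2, 10.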
- rsgislib.tools.filetools.sort_file_by_datetime(in_file_lst: List[str]) List[str]
A function which sorts a list of files based on the time each file was last modified. The list is returned in ascending order (i.e., oldest to newest). The Python function os.path.getmtime is used to get the modification time of each file.
- Parameters:
in_file_lst – the input list of file paths, which need to be accessible.
- Returns:
the sorted list of names.
Deleting
- rsgislib.tools.filetools.delete_file_with_basename(input_file: str, print_rms=True)
Function to delete all the files which have a path and base name defined in the input_file attribute.
- Parameters:
input_file – string for the input file name and path
print_rms – print the files being deleted (Default: True)
- rsgislib.tools.filetools.delete_file_silent(input_file: str) bool
A function which can be used in place of os.remove to delete a file. It checks that the file exists and only calls os.remove if it does. It also catches any Exceptions from os.remove and just returns a boolean indicating whether the input_file has been removed.
- Parameters:
input_file – input file path for the file which is to be removed.
- Returns:
boolean (True: File was removed or did not exist. False: os.remove threw an Exception so assume file was not removed)
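The behaviour described above maps onto a short try/except wrapper around os.remove. The function delete_silent_sketch is a hypothetical illustration written from the description, not a copy of the RSGISLib code.

```python
import os
import tempfile


def delete_silent_sketch(input_file: str) -> bool:
    """Remove input_file if present; never raise (illustrative only)."""
    try:
        if os.path.exists(input_file):
            os.remove(input_file)
        return True  # removed, or there was nothing to remove
    except Exception:
        return False  # os.remove failed, so assume the file remains


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "scratch.txt")
    open(target, "w").close()
    removed = delete_silent_sketch(target)
    still_there = os.path.exists(target)
    removed_again = delete_silent_sketch(target)  # already gone
    # os.remove cannot delete a directory, so this reports False.
    removed_dir = delete_silent_sketch(tmp_dir)

print(removed, still_there, removed_again, removed_dir)
```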
- rsgislib.tools.filetools.rm_files_size_gt(file_path: str, file_srch: str, min_size: int, rm_file: bool = False, rm_use_basename: bool = False)
A function which removes all the files from the search path which are greater than the specified size.
Note, the file_path and file_srch will be merged with os.path.join, e.g., file_path="/hello/world", file_srch="*.txt" would result in "/hello/world/*.txt". Wildcard characters can be put in both parts if needed.
- Parameters:
file_path – The directory within which the files will be searched for.
file_srch – The search string (must have a wild card ‘*’ for glob).
min_size – the minimum valid size, above this size the files will be deleted. In bytes.
rm_file – If True then files will be deleted; if False then a list of ‘rm file’ commands will be produced rather than the files actually being deleted. (default: False)
rm_use_basename – If True then all files with the same base name (i.e., same name but different file extension) within the same directory will also be deleted. Useful if you have file formats which have multiple files. (default: False)
Lock Files
- rsgislib.tools.filetools.get_file_lock(input_file: str, sleep_period: int = 1, wait_iters: int = 120, use_except: bool = False) bool
A function which gets a lock on a file.
The lock file will be a unix hidden file (i.e., starts with a .) and it will have .lok added to the end. E.g., for input file hello_world.txt the lock file will be .hello_world.txt.lok. The contents of the lock file will be the time and date of creation.
Using the default parameters (sleep 1 second and wait 120 iterations) if the lock isn’t available it will be retried every second for 120 seconds (i.e., 2 mins).
- Parameters:
input_file – The input file for which the lock will be created.
sleep_period – time in seconds to sleep for, if the lock isn’t available. (Default=1 second)
wait_iters – the number of iterations to wait for before giving up. (Default=120)
use_except – Boolean. If True then an exception will be thrown if the lock is not available. If False (default) False will be returned if the lock is not successful.
- Returns:
boolean. True: lock was successfully gained. False: lock was not gained.
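The lock-file naming and retry loop described above can be sketched as follows. The helper names lock_path_for and get_lock_sketch are hypothetical, and creating the lock file atomically with os.O_EXCL is a choice made for this sketch; how RSGISLib creates the file is not stated here.

```python
import datetime
import os
import tempfile
import time


def lock_path_for(input_file: str) -> str:
    """hello_world.txt -> .hello_world.txt.lok in the same directory."""
    dir_path, name = os.path.split(input_file)
    return os.path.join(dir_path, f".{name}.lok")


def get_lock_sketch(input_file: str, sleep_period: int = 1, wait_iters: int = 120) -> bool:
    """Try to create the lock file, retrying wait_iters times."""
    lock_file = lock_path_for(input_file)
    for _ in range(wait_iters):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(sleep_period)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(datetime.datetime.now().isoformat())  # creation time
        return True
    return False


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "hello_world.txt")
    lock_name = os.path.basename(lock_path_for(target))
    first = get_lock_sketch(target)
    # A second attempt fails while the lock file is still present.
    second = get_lock_sketch(target, sleep_period=0, wait_iters=2)
    os.remove(lock_path_for(target))  # releasing the lock
    third = get_lock_sketch(target, sleep_period=0, wait_iters=2)

print(lock_name, first, second, third)
```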
- rsgislib.tools.filetools.release_file_lock(input_file: str)
A function which releases a lock file for the input file.
- Parameters:
input_file – The input file for which the lock will be released.
- rsgislib.tools.filetools.clean_file_locks(dir_path: str, timeout: int = 3600)
A function which cleans up any remaining lock files (e.g., if an application has crashed). The timeout time will be compared with the time written within the file.
- Parameters:
dir_path – the file path to search for lock files (i.e., “.*.lok”)
timeout – the time (in seconds) for the timeout. Default: 3600 (1 hour)
File Hash
- rsgislib.tools.filetools.create_sha1_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA1 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA1 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA1 hash string of the file.
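Reading a file in fixed-size blocks and feeding each block to the hash object keeps memory use constant regardless of file size. The sketch below uses the standard hashlib module; the function hash_file_sketch and its algorithm parameter are hypothetical and simply illustrate the block_size idea shared by the hash functions in this section.

```python
import hashlib
import os
import tempfile


def hash_file_sketch(input_file: str, algorithm: str = "sha1", block_size: int = 4096) -> str:
    """Hash a file block by block and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with open(input_file, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: f.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    data = b"rsgislib" * 2000  # 16000 bytes, so several blocks are read
    test_file = os.path.join(tmp_dir, "data.bin")
    with open(test_file, "wb") as f:
        f.write(data)
    digest = hash_file_sketch(test_file, "sha1")
    digest_256 = hash_file_sketch(test_file, "sha256", block_size=1024)
    # Hashing the whole byte string at once gives the same result.
    expected = hashlib.sha1(data).hexdigest()
    expected_256 = hashlib.sha256(data).hexdigest()

print(digest)
```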
- rsgislib.tools.filetools.create_sha224_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA224 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA224 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA224 hash string of the file.
- rsgislib.tools.filetools.create_sha256_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA256 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA256 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA256 hash string of the file.
- rsgislib.tools.filetools.create_sha384_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA384 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA384 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA384 hash string of the file.
- rsgislib.tools.filetools.create_sha512_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA512 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA512 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA512 hash string of the file.
- rsgislib.tools.filetools.create_md5_hash(input_file: str, block_size: int = 4096) str
A function which calculates the MD5 hash string of the input file.
- Parameters:
input_file – the input file for which the MD5 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
MD5 hash string of the file.
- rsgislib.tools.filetools.create_blake2b_hash(input_file: str, block_size: int = 4096) str
A function which calculates the Blake2B hash string of the input file.
- Parameters:
input_file – the input file for which the Blake2B hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
Blake2B hash string of the file.
- rsgislib.tools.filetools.create_blake2s_hash(input_file: str, block_size: int = 4096) str
A function which calculates the Blake2S hash string of the input file.
- Parameters:
input_file – the input file for which the Blake2S hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
Blake2S hash string of the file.
- rsgislib.tools.filetools.create_sha3_224_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA3_224 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA3_224 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA3_224 hash string of the file.
- rsgislib.tools.filetools.create_sha3_256_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA3_256 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA3_256 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA3_256 hash string of the file.
- rsgislib.tools.filetools.create_sha3_384_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA3_384 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA3_384 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA3_384 hash string of the file.
- rsgislib.tools.filetools.create_sha3_512_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA3_512 hash string of the input file.
- Parameters:
input_file – the input file for which the SHA3_512 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 KB)
- Returns:
SHA3_512 hash string of the file.
Other
- rsgislib.tools.filetools.convert_file_size_units(in_size: int, in_unit: str, out_unit: str) float
A function which converts between file size units
- Parameters:
in_size – input file size
in_unit – the input unit for the file size. Options: bytes, kb, mb, gb, tb
out_unit – the output unit for the file size. Options: bytes, kb, mb, gb, tb
- Returns:
float for the output file size
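Assuming the conversion uses the same 1024-based factors listed for get_file_size, the arithmetic can be sketched by converting to bytes first and then to the output unit. The function convert_size_sketch and the UNIT_POWERS table are hypothetical names used only for this illustration.

```python
# Power of 1024 for each supported unit (assumed to match get_file_size).
UNIT_POWERS = {"bytes": 0, "kb": 1, "mb": 2, "gb": 3, "tb": 4}


def convert_size_sketch(in_size: int, in_unit: str, out_unit: str) -> float:
    """Convert a file size between units via bytes (illustrative only)."""
    in_bytes = in_size * 1024 ** UNIT_POWERS[in_unit.lower()]
    return in_bytes / 1024 ** UNIT_POWERS[out_unit.lower()]


bytes_to_kb = convert_size_sketch(2048, "bytes", "kb")
gb_to_mb = convert_size_sketch(1, "gb", "mb")
print(bytes_to_kb, gb_to_mb)
```

For example, 2048 bytes is 2048 / 1024 = 2 kb, and 1 gb is 1024^3 / 1024^2 = 1024 mb.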