RSGISLib File Tools
Naming
- rsgislib.tools.filetools.get_file_basename(input_file: str, check_valid: bool = False, n_comps: int = 0, rm_n_exts: int = 0) str
Uses os.path module to return file basename (i.e., path and extension removed)
- Parameters
input_file – string for the input file name and path
check_valid – if True then the resulting basename will be checked for punctuation characters (other than underscores) and spaces; punctuation will be removed and spaces changed to underscores. (Default = False)
n_comps – if > 0 then the resulting basename will be split using underscores and the returned basename will be defined using the n_comps components split by underscores.
rm_n_exts – used where an input file has more than one extension (e.g., tar.gz) and only n extensions should be removed. Default: 0, which will remove all extensions, calculated based on the number of full-stops (.) within the file name. If a value of 1 was provided for filename.tar.gz then the returned output would be filename.tar.
- Returns
basename for file
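The interaction between rm_n_exts and n_comps is easiest to see on a concrete file name. The following is a minimal standard-library sketch of the behaviour described above, not RSGISLib's implementation: `basename_sketch` is a hypothetical helper, and keeping the leading n_comps components is an assumption.

```python
import os


def basename_sketch(input_file: str, n_comps: int = 0, rm_n_exts: int = 0) -> str:
    """Illustrative only: mimics the documented behaviour, not the library code."""
    name = os.path.basename(input_file)
    parts = name.split(".")
    if rm_n_exts > 0:
        # Remove only the last rm_n_exts extensions, always keeping the stem.
        n_keep = max(1, len(parts) - rm_n_exts)
    else:
        # Default: everything after the first full stop is treated as extension.
        n_keep = 1
    base = ".".join(parts[:n_keep])
    if n_comps > 0:
        # Assumption: the leading n_comps underscore-separated components are kept.
        base = "_".join(base.split("_")[:n_comps])
    return base


all_exts_removed = basename_sketch("/data/sen2_20200101_t30uvd.tar.gz")
one_ext_removed = basename_sketch("/data/sen2_20200101_t30uvd.tar.gz", rm_n_exts=1)
two_comps = basename_sketch("/data/sen2_20200101_t30uvd.tar.gz", n_comps=2)
print(all_exts_removed)  # sen2_20200101_t30uvd
print(one_ext_removed)   # sen2_20200101_t30uvd.tar
print(two_comps)         # sen2_20200101
```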
- rsgislib.tools.filetools.get_dir_name(input_file: str) str
A function which returns just the name of the directory of the input path (file or directory) without the rest of the path.
- Parameters
input_file – string for the input path (file or directory) name and path
- Returns
directory name
- rsgislib.tools.filetools.split_path_all(input_path: str) List[str]
A function which splits a file path into a list of all its components, unlike the os.path.split function, which only splits off the last item.
- Parameters
input_path – the input file path.
- Returns
a list of the file path components.
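To illustrate the difference from a single split, here is a small standard-library sketch. It uses posixpath and pathlib.PurePosixPath so the result is the same on any platform; whether split_path_all also returns the root separator as a component is an assumption and not stated above.

```python
import posixpath
from pathlib import PurePosixPath

path = "/data/sen2/2020/img.kea"

# A single split only separates the last item from the rest of the path.
last_only = posixpath.split(path)

# Splitting every component gives one list entry per path element.
all_parts = list(PurePosixPath(path).parts)

print(last_only)  # ('/data/sen2/2020', 'img.kea')
print(all_parts)  # ['/', 'data', 'sen2', '2020', 'img.kea']
```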
Searching
- rsgislib.tools.filetools.find_file(dir_path: str, file_search: str) str
Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s).
- Parameters
dir_path – string for the input directory path
file_search – string with a * wildcard for the file being searched for.
- Returns
string with the path to the file
    import rsgislib.tools.filetools

    file_path = rsgislib.tools.filetools.find_file("in/dir", "*N15W093*.tif")
- rsgislib.tools.filetools.find_file_none(dir_path: str, file_search: str) Union[None, str]
Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s). Returns None if the file is not found.
- Parameters
dir_path – string for the input directory path
file_search – string with a * wildcard for the file being searched for.
- Returns
string with the path to the file, or None if no file was found
    import rsgislib.tools.filetools

    file_path = rsgislib.tools.filetools.find_file_none("in/dir", "*N15W093*.tif")
    if file_path is not None:
        print(file_path)
- rsgislib.tools.filetools.find_files_ext(dir_path: str, ending: str) dict
Find all the files within a directory structure with a specific file ending. The files are returned as a dictionary using the file name as the dictionary key. This means you cannot have files with the same name within the structure.
- Parameters
dir_path – the base directory path within which to search.
ending – the file ending (e.g., .txt, or txt or .kea, kea).
- Returns
dict with file name as key
    import rsgislib.tools.filetools

    file_paths = rsgislib.tools.filetools.find_files_ext("in/dir", ".tif")
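The restriction on duplicate file names follows from keying the result by file name. The sketch below shows the idea with os.walk; it is not the library's code. `find_files_ext_sketch` is a hypothetical helper, and both the dictionary values (full paths) and the exception raised on a duplicate name are assumptions.

```python
import os
import tempfile


def find_files_ext_sketch(dir_path: str, ending: str) -> dict:
    """Illustrative only: walk a directory tree and key matches by file name."""
    # Accept the ending with or without a leading full stop.
    if not ending.startswith("."):
        ending = "." + ending
    found = {}
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            if name.endswith(ending):
                if name in found:
                    # Assumption: a repeated name is an error, as keys must be unique.
                    raise ValueError(f"Duplicate file name: {name}")
                found[name] = os.path.join(root, name)
    return found


with tempfile.TemporaryDirectory() as tmp_dir:
    os.makedirs(os.path.join(tmp_dir, "sub"))
    for rel in ("a.tif", os.path.join("sub", "b.tif"), "notes.txt"):
        open(os.path.join(tmp_dir, rel), "w").close()
    tif_files = find_files_ext_sketch(tmp_dir, "tif")
    print(sorted(tif_files))  # ['a.tif', 'b.tif']
```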
- rsgislib.tools.filetools.find_files_mpaths_ext(dir_paths: list, ending: str) dict
Find all the files within a list of input directories, and the structure beneath them, with a specific file ending. The files are returned as a dictionary using the file name as the dictionary key. This means you cannot have files with the same name within the structure.
- Parameters
dir_paths – a list of base directory paths within which to search.
ending – the file ending (e.g., .txt, or txt or .kea, kea).
- Returns
dict with file name as key
    import rsgislib.tools.filetools

    dir_paths = ["in/dir", "test/dir", "img/files"]
    file_paths = rsgislib.tools.filetools.find_files_mpaths_ext(dir_paths, ".tif")
- rsgislib.tools.filetools.find_first_file(dir_path: str, file_search: str, rtn_except: bool = True) str
Search for a single file with a path using glob. Therefore, the file path returned is a true path. Within the file_search provide the file name with ‘*’ as wildcard(s).
- Parameters
dir_path – The directory within which to search. Note that the search will continue within sub-directories of the base directory until a file meeting the search criteria is found.
file_search – The search string for the file name, which must contain a wildcard character (i.e., *).
rtn_except – if True then an exception will be raised if no file or multiple files are found (default). If False then None will be returned rather than an exception raised.
- Returns
The file found (or None if rtn_except=False)
    import rsgislib.tools.filetools

    file_paths = rsgislib.tools.filetools.find_first_file("in/dir", "*N15W093*.tif")
- rsgislib.tools.filetools.get_files_mod_time(file_lst: list, dt_before: Optional[datetime.datetime] = None, dt_after: Optional[datetime.datetime] = None) list
A function which subsets a list of files based on the datetime of last modification. The function also checks whether each file exists; files which don’t exist will be ignored.
- Parameters
file_lst – The list of file paths, represented as strings.
dt_before – a datetime object with a date/time where files modified before this will be returned
dt_after – a datetime object with a date/time where files modified after this will be returned
Example:
    import glob
    import datetime
    import rsgislib.tools.filetools

    input_files = glob.glob("in/dir/*.tif")
    dt_before = datetime.datetime(year=2020, month=12, day=25, hour=12, minute=30)
    file_path = rsgislib.tools.filetools.get_files_mod_time(input_files, dt_before)
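The filtering described above can be sketched with os.path.getmtime. This is an illustration using only the standard library, not the library's implementation: `files_mod_time_sketch` is a hypothetical helper, and requiring both limits to hold when both are supplied is an assumption.

```python
import datetime
import os
import tempfile


def files_mod_time_sketch(file_lst, dt_before=None, dt_after=None):
    """Illustrative stand-in for the filtering described above."""
    kept = []
    for path in file_lst:
        if not os.path.exists(path):
            continue  # files which do not exist are ignored
        mod_time = datetime.datetime.fromtimestamp(os.path.getmtime(path))
        # Assumption: when both limits are given, both must be satisfied.
        if dt_before is not None and mod_time >= dt_before:
            continue
        if dt_after is not None and mod_time <= dt_after:
            continue
        kept.append(path)
    return kept


with tempfile.TemporaryDirectory() as tmp_dir:
    old_file = os.path.join(tmp_dir, "old.tif")
    open(old_file, "w").close()
    # Set the modification time to 1 January 2020 (local time).
    stamp = datetime.datetime(2020, 1, 1).timestamp()
    os.utime(old_file, (stamp, stamp))
    missing_file = os.path.join(tmp_dir, "missing.tif")

    cutoff = datetime.datetime(year=2020, month=12, day=25, hour=12, minute=30)
    before = files_mod_time_sketch([old_file, missing_file], dt_before=cutoff)
    after = files_mod_time_sketch([old_file, missing_file], dt_after=cutoff)
    print(len(before), len(after))  # 1 0
```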
- rsgislib.tools.filetools.find_files_size_limits(dir_path: str, file_search: str, min_size: int = 0, max_size: Optional[int] = None) list
Search for files with a path using glob. Therefore, the file paths returned are true paths. Within the file_search provide the file names with ‘*’ as wildcard(s).
- Parameters
dir_path – string for the input directory path
file_search – string with a * wildcard for the file being searched for.
min_size – the minimum file size in bytes (default is 0)
max_size – the maximum file size in bytes, if None (default) then ignored.
- Returns
list of paths to the files found
Example:
    import rsgislib.tools.filetools

    file_paths = rsgislib.tools.filetools.find_files_size_limits("in/dir", "*N15W093*.tif", 0, 100000)
- rsgislib.tools.filetools.get_dir_list(dir_path: str, inc_hidden: bool = False) list
Function which gets the list of directories within the specified path.
- Parameters
dir_path – file path to search within
inc_hidden – boolean specifying whether hidden files should be included (default=False)
- Returns
list of directory paths
Example:
    import rsgislib.tools.filetools

    files = rsgislib.tools.filetools.get_dir_list("in/dir")
Archives
- rsgislib.tools.filetools.create_directory_archive(in_dir: str, out_arch: str, arch_format: str) str
A function which creates an archive from an input directory. This function uses subprocess to call the appropriate command line function.
Please note that this function has similar functionality to shutil.make_archive, which I would recommend you use. However, I found that it sometimes produces an error, so I provided this function, which uses the terminal commands, as a drop-in replacement.
- Parameters
in_dir – The input directory path for which the archive will be created.
out_arch – The output archive file path and name. Note this should not include an extension as this will be added automatically.
arch_format – The format for the archive. The options are: zip, tar, gztar, bztar, xztar
- Returns
a string with the full file path and name, including the file extension.
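Since shutil.make_archive is the recommended alternative, a short standard-library example may help. As with out_arch above, the base name is passed without an extension and the returned path includes it. The directory and file names used here are placeholders.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small input directory to archive.
    in_dir = os.path.join(tmp_dir, "in_dir")
    os.makedirs(in_dir)
    with open(os.path.join(in_dir, "readme.txt"), "w") as f:
        f.write("example\n")

    # The base name has no extension; shutil appends it based on the format.
    out_arch = os.path.join(tmp_dir, "in_dir_archive")
    arch_path = shutil.make_archive(out_arch, "gztar", root_dir=in_dir)

    arch_name = os.path.basename(arch_path)
    arch_exists = os.path.exists(arch_path)
    print(arch_name)  # in_dir_archive.tar.gz
```

The format names accepted by shutil.make_archive (zip, tar, gztar, bztar, xztar) are the same as the arch_format options listed above.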
- rsgislib.tools.filetools.create_targz_arch(out_arch_file: str, file_list: list, base_path: Optional[str] = None)
A function which can be used to create a tar.gz file containing the list of input files. If you wish to remove some of the directory structure from the file paths provided then a single base_path can be given, which will be removed from the file paths in the archive.
- Parameters
out_arch_file – the output tar.gz file path
file_list – the list of files to be added to the archive.
base_path – the base path which will be removed from all the input files. Note, this means all the input files must have the same base path. Optional: Default is None (i.e., ignored).
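The effect of base_path on the names stored in the archive can be sketched with the standard tarfile module. This is an illustration under stated assumptions, not the library's implementation: `create_targz_sketch` is a hypothetical helper, and stripping base_path via os.path.relpath is one possible way to do it.

```python
import os
import tarfile
import tempfile


def create_targz_sketch(out_arch_file, file_list, base_path=None):
    """Illustrative only: write files to a tar.gz, optionally stripping base_path."""
    with tarfile.open(out_arch_file, "w:gz") as tar:
        for path in file_list:
            # Assumption: base_path is removed by storing a relative member name.
            arcname = os.path.relpath(path, base_path) if base_path else path
            tar.add(path, arcname=arcname)


with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = os.path.join(tmp_dir, "data", "scene1")
    os.makedirs(data_dir)
    img = os.path.join(data_dir, "img.txt")
    with open(img, "w") as f:
        f.write("pixels\n")

    arch = os.path.join(tmp_dir, "scene.tar.gz")
    create_targz_sketch(arch, [img], base_path=os.path.join(tmp_dir, "data"))

    with tarfile.open(arch, "r:gz") as tar:
        members = tar.getnames()
    print(members)  # ['scene1/img.txt']
```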
- rsgislib.tools.filetools.untar_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str
A function which extracts data from a tar file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.
- Parameters
in_file – The input archive file.
out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within out_dir).
gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)
verbose – If True (default: False) then more user feedback will be printed to the console.
- Returns
output directory where data was extracted to.
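The role of gen_arch_dir can be sketched with the standard tarfile module. This is not the library's implementation: `untar_sketch` is a hypothetical helper, and naming the new directory after the archive file with its extensions removed is an assumption.

```python
import os
import tarfile
import tempfile


def untar_sketch(in_file, out_dir, gen_arch_dir=True):
    """Illustrative only: extract a tar file, optionally into a named sub-directory."""
    if gen_arch_dir:
        # Assumption: the new directory is named after the archive minus extensions.
        arch_name = os.path.basename(in_file).split(".")[0]
        out_dir = os.path.join(out_dir, arch_name)
        os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(in_file, "r") as tar:
        tar.extractall(out_dir)
    return out_dir


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "img.txt")
    with open(src, "w") as f:
        f.write("pixels\n")
    arch = os.path.join(tmp_dir, "scene.tar")
    with tarfile.open(arch, "w") as tar:
        tar.add(src, arcname="img.txt")

    out_base = os.path.join(tmp_dir, "out")
    os.makedirs(out_base)  # the output directory must already exist
    extracted_to = untar_sketch(arch, out_base)
    rel_out = os.path.relpath(extracted_to, tmp_dir)
    extracted_ok = os.path.exists(os.path.join(extracted_to, "img.txt"))
    print(rel_out, extracted_ok)
```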
- rsgislib.tools.filetools.untar_gz_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str
A function which extracts data from a tar.gz file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.
- Parameters
in_file – The input archive file.
out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within out_dir).
gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)
verbose – If True (default: False) then more user feedback will be printed to the console.
- Returns
output directory where data was extracted to.
- rsgislib.tools.filetools.unzip_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str
A function which extracts data from a zip file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.
- Parameters
in_file – The input archive file.
out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within out_dir).
gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)
verbose – If True (default: False) then more user feedback will be printed to the console.
- Returns
output directory where data was extracted to.
- rsgislib.tools.filetools.untar_bz_file(in_file: str, out_dir: str, gen_arch_dir: bool = True, verbose: bool = False) str
A function which extracts data from a tar.bz file into the specified output directory. Optionally, an output directory of the same name as the archive file can be created for the output files.
- Parameters
in_file – The input archive file.
out_dir – The output directory, which must exist (if gen_arch_dir=True then a new directory will be created within out_dir).
gen_arch_dir – Create a new directory with the same name as the input file where the output files will be extracted to. (Default: True)
verbose – If True (default: False) then more user feedback will be printed to the console.
- Returns
output directory where data was extracted to.
File Info
- rsgislib.tools.filetools.file_is_hidden(dir_path: str) bool
A function to test whether a file or folder is ‘hidden’ or not on the file system. Should be cross-platform between Linux/UNIX and Windows.
- Parameters
dir_path – input file path to be tested
- Returns
boolean (True = hidden)
Example:
    import rsgislib.tools.filetools

    if rsgislib.tools.filetools.file_is_hidden("in/dir/img.kea"):
        print("File is hidden")
- rsgislib.tools.filetools.get_file_size(file_path: str, unit: str = 'bytes') float
A function which returns the file size of a file in the specified unit.
Units:
* bytes
* kb - kilobytes (bytes / 1024)
* mb - megabytes (bytes / 1024^2)
* gb - gigabytes (bytes / 1024^3)
* tb - terabytes (bytes / 1024^4)
- Parameters
file_path – the path to the file for which the size is to be calculated.
unit – the unit for the file size. Options: bytes, kb, mb, gb, tb
- Returns
float for the file size.
Sorting
- rsgislib.tools.filetools.sort_imgs_to_dirs_utm(input_imgs_dir: str, file_search_str: str, out_base_dir: str)
A function which will sort a series of input image files which are projected using the UTM system into individual directories per UTM zone. Please note that the input files are moved on your system!
- Parameters
input_imgs_dir – directory where the input files are to be found.
file_search_str – the wildcard search string to find files within the input directory (e.g., “in_dir/*.kea”).
out_base_dir – the output directory where the UTM folders will be created and the files moved.
Deleting
- rsgislib.tools.filetools.delete_file_with_basename(input_file: str, print_rms=True)
Function to delete all the files which have the path and base name defined by the input_file argument.
- Parameters
input_file – string for the input file name and path
print_rms – print the files being deleted (Default: True)
- rsgislib.tools.filetools.delete_file_silent(input_file: str) bool
A function which can be used in place of os.remove to delete a file. It checks whether the file exists and only calls os.remove if it does. It also catches any exceptions from os.remove and simply returns a boolean indicating whether the input_file has been removed.
- Parameters
input_file – input file path for the file which is to be removed.
- Returns
boolean (True: file was removed or did not exist. False: os.remove threw an exception, so assume the file was not removed)
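The behaviour described for delete_file_silent amounts to an existence check plus a guarded os.remove. A minimal sketch of that pattern follows; `delete_file_silent_sketch` is a hypothetical helper written from the description above, not the library's code.

```python
import os
import tempfile


def delete_file_silent_sketch(input_file: str) -> bool:
    """Illustrative only: remove a file without raising, reporting success."""
    if not os.path.exists(input_file):
        return True  # nothing to remove counts as success
    try:
        os.remove(input_file)
    except Exception:
        return False  # os.remove failed, so assume the file is still there
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_file = os.path.join(tmp_dir, "scratch.txt")
    open(tmp_file, "w").close()
    removed = delete_file_silent_sketch(tmp_file)        # existed and was removed
    removed_again = delete_file_silent_sketch(tmp_file)  # already gone, still True
    # os.remove cannot delete a directory, so the exception is caught.
    removed_dir = delete_file_silent_sketch(tmp_dir)
    print(removed, removed_again, removed_dir)  # True True False
```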
Lock Files
- rsgislib.tools.filetools.get_file_lock(input_file: str, sleep_period: int = 1, wait_iters: int = 120, use_except: bool = False) bool
A function which gets a lock on a file.
The lock file will be a unix hidden file (i.e., starts with a .) and it will have .lok added to the end. E.g., for input file hello_world.txt the lock file will be .hello_world.txt.lok. The contents of the lock file will be the time and date of creation.
Using the default parameters (sleep 1 second and wait 120 iterations) if the lock isn’t available it will be retried every second for 120 seconds (i.e., 2 mins).
- Parameters
input_file – The input file for which the lock will be created.
sleep_period – time in seconds to sleep for, if the lock isn’t available. (Default=1 second)
wait_iters – the number of iterations to wait for before giving up. (Default=120)
use_except – Boolean. If True then an exception will be thrown if the lock is not available. If False (default) False will be returned if the lock is not successful.
- Returns
boolean. True: lock was successfully gained. False: lock was not gained.
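The naming rule and retry loop described above can be sketched as follows. This is an illustration, not the library's implementation: the helper names are hypothetical, and the use of exclusive file creation (mode "x") and an ISO-formatted timestamp are assumptions about how such a lock could be written.

```python
import datetime
import os
import tempfile
import time


def lock_path_for(input_file: str) -> str:
    """Lock file name: leading '.' plus the file name plus '.lok'."""
    dir_path, file_name = os.path.split(input_file)
    return os.path.join(dir_path, f".{file_name}.lok")


def get_lock_sketch(input_file, sleep_period=1, wait_iters=120):
    lock_file = lock_path_for(input_file)
    for _ in range(wait_iters):
        try:
            # Assumption: exclusive creation so only one process can succeed.
            with open(lock_file, "x") as f:
                f.write(datetime.datetime.now().isoformat())
            return True
        except FileExistsError:
            time.sleep(sleep_period)  # lock is held, wait and retry
    return False


def release_lock_sketch(input_file):
    lock_file = lock_path_for(input_file)
    if os.path.exists(lock_file):
        os.remove(lock_file)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "hello_world.txt")
    first = get_lock_sketch(target)
    # A second attempt fails while the lock is held (short wait for the demo).
    second = get_lock_sketch(target, sleep_period=0.01, wait_iters=3)
    lock_name = os.path.basename(lock_path_for(target))
    release_lock_sketch(target)
    third = get_lock_sketch(target)
    release_lock_sketch(target)
    print(lock_name, first, second, third)
```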
- rsgislib.tools.filetools.release_file_lock(input_file: str)
A function which releases a lock file for the input file.
- Parameters
input_file – The input file for which the lock will be released.
- rsgislib.tools.filetools.clean_file_locks(dir_path: str, timeout: int = 3600)
A function which cleans up any remaining lock files (e.g., if an application has crashed). The timeout will be compared with the time written within each lock file.
- Parameters
dir_path – the file path to search for lock files (i.e., “.*.lok”)
timeout – the time (in seconds) for the timeout. Default: 3600 (1 hour)
File Hash
- rsgislib.tools.filetools.create_sha1_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA1 hash string of the input file.
- Parameters
input_file – the input file for which the SHA1 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA1 hash string of the file.
- rsgislib.tools.filetools.create_sha224_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA224 hash string of the input file.
- Parameters
input_file – the input file for which the SHA224 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA224 hash string of the file.
- rsgislib.tools.filetools.create_sha256_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA256 hash string of the input file.
- Parameters
input_file – the input file for which the SHA256 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA256 hash string of the file.
- rsgislib.tools.filetools.create_sha384_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA384 hash string of the input file.
- Parameters
input_file – the input file for which the SHA384 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA384 hash string of the file.
- rsgislib.tools.filetools.create_sha512_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA512 hash string of the input file.
- Parameters
input_file – the input file for which the SHA512 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA512 hash string of the file.
- rsgislib.tools.filetools.create_md5_hash(input_file: str, block_size: int = 4096) str
A function which calculates the MD5 hash string of the input file.
- Parameters
input_file – the input file for which the MD5 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
MD5 hash string of the file.
- rsgislib.tools.filetools.create_blake2b_hash(input_file: str, block_size: int = 4096) str
A function which calculates the Blake2B hash string of the input file.
- Parameters
input_file – the input file for which the Blake2B hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
Blake2B hash string of the file.
- rsgislib.tools.filetools.create_blake2s_hash(input_file: str, block_size: int = 4096) str
A function which calculates the Blake2S hash string of the input file.
- Parameters
input_file – the input file for which the Blake2S hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
Blake2S hash string of the file.
- rsgislib.tools.filetools.create_sha3_224_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA3_224 hash string of the input file.
- Parameters
input_file – the input file for which the SHA3_224 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA3_224 hash string of the file.
- rsgislib.tools.filetools.create_sha3_256_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA3_256 hash string of the input file.
- Parameters
input_file – the input file for which the SHA3_256 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA3_256 hash string of the file.
- rsgislib.tools.filetools.create_sha3_384_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA3_384 hash string of the input file.
- Parameters
input_file – the input file for which the SHA3_384 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA3_384 hash string of the file.
- rsgislib.tools.filetools.create_sha3_512_hash(input_file: str, block_size: int = 4096) str
A function which calculates the SHA3_512 hash string of the input file.
- Parameters
input_file – the input file for which the SHA3_512 hash string will be found.
block_size – the size, in bytes, of the blocks in which the file is read (default 4096; i.e., 4 kb)
- Returns
SHA3_512 hash string of the file.
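All of the hash functions above share the same block_size parameter, which suggests the file is read in fixed-size blocks rather than loaded whole. The standard hashlib pattern for this is sketched below. `file_hash_sketch` is a hypothetical helper, and returning the hexadecimal digest is an assumption about the form of the hash string.

```python
import hashlib
import os
import tempfile


def file_hash_sketch(input_file, algorithm="sha256", block_size=4096):
    """Illustrative only: hash a file by feeding it to hashlib block by block."""
    hasher = hashlib.new(algorithm)
    with open(input_file, "rb") as f:
        # Read until an empty bytes object signals the end of the file.
        for block in iter(lambda: f.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    data = b"rsgislib" * 1250  # 10,000 bytes, so several 4096-byte blocks
    data_file = os.path.join(tmp_dir, "data.bin")
    with open(data_file, "wb") as f:
        f.write(data)

    blockwise = file_hash_sketch(data_file, "sha256")
    in_one_go = hashlib.sha256(data).hexdigest()
    print(blockwise == in_one_go)  # True
```

Reading in blocks keeps memory use bounded by block_size regardless of the file size, and the digest is identical to hashing the whole content at once.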
Other
- rsgislib.tools.filetools.convert_file_size_units(in_size: int, in_unit: str, out_unit: str) float
A function which converts between file size units
- Parameters
in_size – input file size
in_unit – the input unit for the file size. Options: bytes, kb, mb, gb, tb
out_unit – the output unit for the file size. Options: bytes, kb, mb, gb, tb
- Returns
float for the output file size
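A conversion between these units can be sketched by going through bytes. The sketch assumes the same powers of 1024 listed for get_file_size. `convert_size_sketch` is a hypothetical helper, not the library's implementation.

```python
# Power of 1024 for each unit, matching the unit definitions given for get_file_size.
_UNIT_POWERS = {"bytes": 0, "kb": 1, "mb": 2, "gb": 3, "tb": 4}


def convert_size_sketch(in_size, in_unit, out_unit):
    """Illustrative only: convert a file size between units via bytes."""
    n_bytes = in_size * 1024 ** _UNIT_POWERS[in_unit.lower()]
    return n_bytes / 1024 ** _UNIT_POWERS[out_unit.lower()]


print(convert_size_sketch(2048, "bytes", "kb"))  # 2.0
print(convert_size_sketch(1.5, "gb", "mb"))      # 1536.0
```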