RSGISLib Data Sources
This module has functions to help with searching for and downloading data. Tutorials showing how to use these functions are also available.
USGS Earth Explorer
- rsgislib.dataaccess.usgs_m2m.usgs_login(username: str = None, password: str = None) -> str
A function to login to the USGS m2m service.
- Parameters:
username – Your username for USGS EarthExplorer. If the RSGIS_USGS_USER environment variable is specified then the username will be read from there if None is passed (Default: None)
password – Your password for USGS EarthExplorer. If the RSGIS_USGS_PASS environment variable is specified then the password will be read from there if None is passed (Default: None)
- Returns:
the API key for the USGS session.
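The credential fallback described above can be sketched as follows. This is a minimal illustration of the documented behaviour, not the library's implementation: `resolve_usgs_credentials` and its `env` argument are hypothetical, and only the two environment variable names come from the documentation.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_usgs_credentials(
    username: Optional[str] = None,
    password: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: explicit arguments win, otherwise fall back to
    the RSGIS_USGS_USER / RSGIS_USGS_PASS environment variables."""
    env = os.environ if env is None else env
    if username is None:
        username = env.get("RSGIS_USGS_USER")
    if password is None:
        password = env.get("RSGIS_USGS_PASS")
    if not username or not password:
        raise ValueError("USGS username and password could not be resolved")
    return username, password
```

The resolved values could then be passed to usgs_login, with the returned API key handed to usgs_logout once the session is finished.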
- rsgislib.dataaccess.usgs_m2m.usgs_logout(api_key: str)
Log out of the USGS m2m system using the api_key created at login.
- Parameters:
api_key – The API key created at login to authenticate.
- rsgislib.dataaccess.usgs_m2m.can_user_dwnld(api_key: str) -> bool
Checks whether the user logged in with the api_key has permission to download data.
- Parameters:
api_key – The API key created at login to authenticate.
- Returns:
boolean - True if the user has permission.
- rsgislib.dataaccess.usgs_m2m.can_user_order(api_key: str) -> bool
Checks whether the user logged in with the api_key has permission to order data.
- Parameters:
api_key – The API key created at login to authenticate.
- Returns:
boolean - True if the user has permission.
- rsgislib.dataaccess.usgs_m2m.get_wrs_pt(api_key: str, row: int, path: int, grid_version: int = 2) -> (float, float)
Get a point for the WRS row/path which can be used for a query.
- Parameters:
api_key – The API key created at login to authenticate.
row – integer for row
path – integer for path
grid_version – Whether the row/path is WRS1 or WRS2. Default: WRS2.
- Returns:
longitude, latitude
- rsgislib.dataaccess.usgs_m2m.get_wrs_bbox(api_key: str, row: int, path: int, grid_version: int = 2) -> (float, float, float, float)
Get a bbox for the WRS row/path which can be used for a query.
- Parameters:
api_key – The API key created at login to authenticate.
row – integer for row
path – integer for path
grid_version – Whether the row/path is WRS1 or WRS2. Default: WRS2.
- Returns:
BBOX in lon/lat (x_min, x_max, y_min, y_max)
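Note that the bounding box order used here, (x_min, x_max, y_min, y_max), differs from the (x_min, y_min, x_max, y_max) order that GeoJSON and many other tools expect. The sketch below shows one way to reorder a box and to derive a centre point for a point query. Both helper names are hypothetical, and the code assumes the box does not cross the antimeridian.

```python
from typing import Sequence, Tuple


def rsgis_bbox_to_bounds(bbox: Sequence[float]) -> Tuple[float, float, float, float]:
    """Reorder (x_min, x_max, y_min, y_max) to (x_min, y_min, x_max, y_max)."""
    x_min, x_max, y_min, y_max = bbox
    if x_min > x_max or y_min > y_max:
        raise ValueError("bbox minimum values must not exceed the maximum values")
    return (x_min, y_min, x_max, y_max)


def bbox_centre(bbox: Sequence[float]) -> Tuple[float, float]:
    """Return the (x, y) centre of an (x_min, x_max, y_min, y_max) bbox."""
    x_min, x_max, y_min, y_max = bbox
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
```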
- rsgislib.dataaccess.usgs_m2m.usgs_search(dataset: str, api_key: str, start_date: datetime = None, end_date: datetime = None, cloud_min: int = 0, cloud_max: int = None, pt: List = None, bbox: List = None, poly_geom: str = None, months: List[int] = None, full_meta: bool = False, max_n_rslts: int = None, start_n: int = None)
A function to search for Landsat imagery from the USGS.
- Parameters:
dataset – The name of the dataset to query.
api_key – The API key created at login to authenticate.
start_date – Start date as a datetime object. (Earlier date)
end_date – End date as a datetime object. (Later date)
cloud_min – Minimum cloud cover (Default: 0)
cloud_max – Maximum cloud cover.
bbox – (MinX, MaxX, MinY, MaxY)
pt – (X, Y)
poly_geom – NOT IMPLEMENTED YET!
months – List of months as ints (1-12) you want to limit the search for.
full_meta – Full metadata returned (Default: False)
max_n_rslts – The maximum number of scenes to be returned. This cannot be larger than 100; if more than 100 scenes are needed, use the get_all_usgs_search function.
start_n – The scene number to start the data retrieval from. Note: you probably want to use the get_all_usgs_search function rather than setting this parameter directly.
- Returns:
List of scenes found and Dict of meta-data for the number of scenes available.
- rsgislib.dataaccess.usgs_m2m.get_all_usgs_search(dataset: str, api_key: str, max_n_rslts: int = 1000, start_date: datetime = None, end_date: datetime = None, cloud_min: int = 0, cloud_max: int = None, pt: List = None, bbox: List = None, poly_geom: str = None, months: List[int] = None, full_meta: bool = False) -> List
Uses the usgs_search function to retrieve multiple ‘pages’ of search results. If you need more than 100 scenes, you can use this function to undertake the multiple queries required and merge the results into a single list.
- Parameters:
dataset – The name of the dataset to query.
api_key – The API key created at login to authenticate.
max_n_rslts – The maximum number of scenes you want returned.
start_date – Start date as a datetime object. (Earlier date)
end_date – End date as a datetime object. (Later date)
cloud_min – Minimum cloud cover (Default: 0)
cloud_max – Maximum cloud cover.
bbox – (MinX, MaxX, MinY, MaxY)
pt – (X, Y)
poly_geom – NOT IMPLEMENTED YET!
months – List of months as ints (1-12) you want to limit the search for.
full_meta – Full metadata returned (Default: False)
- Returns:
List of scenes found through the query.
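The paging described above can be pictured with the sketch below. It is not the library's implementation: `fetch_page` is a hypothetical stand-in for a call such as usgs_search with its max_n_rslts and start_n arguments, and it is assumed to return only the list of scenes for that page. Whether start_n counts from 1, as assumed by `first_n` here, should be checked against the service.

```python
from typing import Callable, List


def collect_pages(
    fetch_page: Callable[[int, int], List[dict]],
    max_n_rslts: int = 1000,
    page_size: int = 100,
    first_n: int = 1,  # assumption: scene numbering starts at 1
) -> List[dict]:
    """Call fetch_page(start_n, n) repeatedly and merge the pages into one list."""
    scenes: List[dict] = []
    start_n = first_n
    while len(scenes) < max_n_rslts:
        n_wanted = min(page_size, max_n_rslts - len(scenes))
        page = fetch_page(start_n, n_wanted)
        scenes.extend(page)
        if len(page) < n_wanted:
            break  # a short page means there are no more results
        start_n += len(page)
    return scenes
```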
- rsgislib.dataaccess.usgs_m2m.get_download_ids(scns, bulk=False)
A function for extracting a list of display and entity IDs from a list of scenes as returned by a search query.
- Parameters:
scns – a list of the scenes
bulk – If True then only scenes available for bulk download will be returned.
- Returns:
List of display IDs, List of Entity IDs
- rsgislib.dataaccess.usgs_m2m.create_scene_list(api_key: str, dataset: str, scn_ent_ids: List[str], lst_name: str, lst_period: str = 'P1W') -> int
A function which creates a list of scenes on the system which could be downloaded.
ISO 8601 duration format: P(n)Y(n)M(n)DT(n)H(n)M(n)S or P(n)W
- Where:
P is the duration designator (referred to as “period”), and is always placed at the beginning of the duration.
Y is the year designator that follows the value for the number of years.
M is the month designator that follows the value for the number of months.
W is the week designator that follows the value for the number of weeks.
D is the day designator that follows the value for the number of days.
T is the time designator that precedes the time components.
H is the hour designator that follows the value for the number of hours.
M is the minute designator that follows the value for the number of minutes.
S is the second designator that follows the value for the number of seconds.
For example: “P3Y6M4DT12H30M5S” = a duration of three years, six months, four days, twelve hours, thirty minutes, and five seconds.
- Parameters:
api_key – The API key created at login to authenticate user.
dataset – name of the dataset
scn_ent_ids – list of entity IDs
lst_name – a name for the list - can be anything you want but should be meaningful to you.
lst_period – Period the list will exist for in ISO 8601 duration format. Default is P1W (i.e., 1 week).
- Returns:
Number of scenes added.
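As an illustration of the duration format used by lst_period, the sketch below builds an ISO 8601 duration string from its components. `iso8601_duration` is a hypothetical helper and not part of RSGISLib. It treats the week form as standalone, which is the conservative reading of the standard.

```python
def iso8601_duration(
    years: int = 0,
    months: int = 0,
    weeks: int = 0,
    days: int = 0,
    hours: int = 0,
    minutes: int = 0,
    seconds: int = 0,
) -> str:
    """Build an ISO 8601 duration string such as 'P1W' or 'P3Y6M4DT12H30M5S'."""
    parts = (years, months, weeks, days, hours, minutes, seconds)
    if any(p < 0 for p in parts):
        raise ValueError("duration components must not be negative")
    if weeks:
        # The week form (PnW) is treated here as standalone.
        if any((years, months, days, hours, minutes, seconds)):
            raise ValueError("weeks cannot be combined with other components")
        return f"P{weeks}W"
    date_part = "".join(
        f"{value}{designator}"
        for value, designator in ((years, "Y"), (months, "M"), (days, "D"))
        if value
    )
    time_part = "".join(
        f"{value}{designator}"
        for value, designator in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    if not date_part and not time_part:
        raise ValueError("at least one component must be non-zero")
    # The T designator is only written when there are time components.
    return "P" + date_part + ("T" + time_part if time_part else "")
```

For instance, `iso8601_duration(weeks=1)` gives the default `P1W`, and `iso8601_duration(days=10)` gives `P10D`.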
- rsgislib.dataaccess.usgs_m2m.remove_scene_list(api_key: str, lst_name: str)
A function to remove a scene list from the system.
- Parameters:
api_key – The API key created at login to authenticate user.
lst_name – a name for the list. Defined by create_scene_list.
- rsgislib.dataaccess.usgs_m2m.check_dwnld_opts(api_key: str, lst_name: str, dataset: str, dwnld_filetype: str = 'bundle', rm_lst: bool = True) -> List[Dict[str, str]]
- Parameters:
api_key – The API key created at login to authenticate user.
lst_name – A name for the list - Defined by create_scene_list.
dataset – name of the dataset
dwnld_filetype – What you want to download. Options: bundle, band or all. Default is bundle, which will be a tar.gz with all the files for the scene.
rm_lst – bool specifying whether the list should be deleted once the processing has finished.
- Returns:
returns a list of dicts with the entityId and productId.
- rsgislib.dataaccess.usgs_m2m.request_downloads(api_key: str, dwlds_lst: List[Dict[str, str]], dwnld_label: str)
A function to request download URLs for a download list (from check_dwnld_opts)
- Parameters:
api_key – The API key created at login to authenticate user.
dwlds_lst – The available download list created and returned by check_dwnld_opts.
dwnld_label – A name for the download - can be anything you want but should be meaningful to you.
- Returns:
a dict of download IDs and URLs which are ready to be downloaded, and a list of download IDs for scenes which are being prepared for download.
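The functions in this section are typically chained together. The sketch below shows one possible ordering, using only the signatures documented above. `request_scene_downloads`, the default list name and the `m2m` argument (which allows a stand-in module to be supplied) are hypothetical, and the dataset name is left to the caller. It relies on the default rm_lst=True of check_dwnld_opts to remove the scene list afterwards.

```python
from typing import List, Optional


def request_scene_downloads(
    dataset: str,
    bbox: List[float],
    lst_name: str = "rsgis_example_list",  # hypothetical list name
    m2m=None,
    username: Optional[str] = None,
    password: Optional[str] = None,
):
    """Search, build a scene list and request download URLs in one session."""
    if m2m is None:
        # Imported lazily so the sketch can be read without RSGISLib installed.
        import rsgislib.dataaccess.usgs_m2m as m2m

    api_key = m2m.usgs_login(username, password)
    try:
        if not m2m.can_user_dwnld(api_key):
            raise PermissionError("This USGS account cannot download data")
        scns = m2m.get_all_usgs_search(dataset, api_key, max_n_rslts=200, bbox=bbox)
        _display_ids, entity_ids = m2m.get_download_ids(scns)
        m2m.create_scene_list(api_key, dataset, entity_ids, lst_name)
        dwlds_lst = m2m.check_dwnld_opts(api_key, lst_name, dataset)
        return m2m.request_downloads(api_key, dwlds_lst, lst_name)
    finally:
        # Always release the session, even if a step above fails.
        m2m.usgs_logout(api_key)
```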
NASA Common Metadata Repository
- rsgislib.dataaccess.nasa_cmr.get_prods_info(prod_short_name: str) -> List[Dict]
A function which returns information for a product available from the CMR.
Available products can be found here: https://earthdata.nasa.gov/eosdis/science-system-description/eosdis-standard-products
- Parameters:
prod_short_name – The name of the product you are interested in.
- Returns:
A list of products (probably different versions).
- rsgislib.dataaccess.nasa_cmr.check_prod_version_avail(prod_short_name: str, version: str) -> bool
A function which checks if a version is available.
- Parameters:
prod_short_name – the product short name for the product of interest.
version – the version of the product to be retrieved.
- Returns:
Boolean specifying whether the version is available.
- rsgislib.dataaccess.nasa_cmr.get_max_prod_version(prod_short_name: str) -> str
A function which attempts to find the highest (latest) version for a product.
- Parameters:
prod_short_name – the product short name for the product of interest.
- Returns:
string representation of the highest version.
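Because versions are handled as strings, a plain string comparison can pick the wrong one (for example, "9" sorts after "10"). The sketch below shows one way to compare version strings numerically. It is an illustration only, not how get_max_prod_version is implemented, and it assumes versions are made of digits optionally separated by dots (such as "002" or "6.1").

```python
from typing import Iterable, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '006' or '6.1' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def highest_version(versions: Iterable[str]) -> str:
    """Return the version string that is numerically highest."""
    return max(versions, key=version_key)
```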
- rsgislib.dataaccess.nasa_cmr.find_granules(prod_short_name: str, version: str, only_dnwld: bool = True, bbox: List[float] = None, pt: List[float] = None, start_date: datetime = None, end_date: datetime = None, cloud_min: int = 0, cloud_max: int = None, sort_date: bool = True, sort_desc: bool = True, page_size: int = 100, page_num: int = 1, other_params: Dict[str, str] = None) -> List[Dict]
A function which will find granules from the CMR system for the product of interest using the search parameters provided.
See the CMR search API documentation: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#granule-search-by-parameters
- Parameters:
prod_short_name – the product short name for the product of interest.
version – the version of the product to be retrieved.
only_dnwld – If True (default) then only downloadable granules are returned.
bbox – (MinX, MaxX, MinY, MaxY)
pt – (X, Y)
start_date – Start date as a datetime object. (Earlier date)
end_date – End date as a datetime object. (Later date)
cloud_min – Minimum cloud cover (Default: 0)
cloud_max – Maximum cloud cover.
sort_date – Sort the response by the acquisition date
sort_desc – Sort order. If True (default) the results are sorted in descending order (newest first); if False, in ascending order (oldest first).
page_size – The number of records to be returned by a single query as a ‘page’.
page_num – The page number to be retrieved allowing results greater than the number which will fit on a single page to be retrieved.
other_params – A dict of other parameters where the key is the search parameter name and the value is the value to search with.
- Returns:
A list of dictionaries, one for each granule found.
- rsgislib.dataaccess.nasa_cmr.find_all_granules(prod_short_name: str, version: str, only_dnwld: bool = True, bbox: List[float] = None, pt: List[float] = None, start_date: datetime = None, end_date: datetime = None, cloud_min: int = 0, cloud_max: int = None, sort_date: bool = True, sort_desc: bool = True, page_size: int = 100, max_n_pages: int = 100, other_params: Dict[str, str] = None) -> List[Dict]
A function which finds granules from the CMR system for the product of interest using the search parameters provided. It uses the find_granules function but iterates through all the available pages, returning all the available granules rather than just a single page.
- Parameters:
prod_short_name – the product short name for the product of interest.
version – the version of the product to be retrieved.
only_dnwld – If True (default) then only downloadable granules are returned.
bbox – (MinX, MaxX, MinY, MaxY)
pt – (X, Y)
start_date – Start date as a datetime object. (Earlier date)
end_date – End date as a datetime object. (Later date)
cloud_min – Minimum cloud cover (Default: 0)
cloud_max – Maximum cloud cover.
sort_date – Sort the response by the acquisition date
sort_desc – Sort order. If True (default) the results are sorted in descending order (newest first); if False, in ascending order (oldest first).
page_size – The number of records to be returned by a single query as a ‘page’. (Default: 100)
max_n_pages – the maximum number of pages returned (Default: 100)
other_params – A dict of other parameters where the key is the search parameter name and the value is the value to search with.
- Returns:
A list of dictionaries, one for each granule found.
- rsgislib.dataaccess.nasa_cmr.get_total_file_size(granule_lst: List[Dict]) -> float
A function which uses the list of granules to sum the total file size of the granules in the list. The file size units are whatever has been used for the product, but this usually seems to be megabytes (MB).
- Parameters:
granule_lst – List of granules from find_granules or find_all_granules
- Returns:
float for the total file size.
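The summation can be pictured with the sketch below. It is an illustration under an assumption, not the library's code: each granule dictionary is assumed to carry its size under a `granule_size` key (as in CMR JSON responses), possibly as a numeric string. `sum_granule_sizes` is a hypothetical name.

```python
from typing import Dict, List


def sum_granule_sizes(granule_lst: List[Dict], size_key: str = "granule_size") -> float:
    """Sum the size field of each granule, skipping granules without one."""
    total = 0.0
    for granule in granule_lst:
        size = granule.get(size_key)
        if size is not None:
            total += float(size)  # sizes may arrive as strings
    return total
```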
- rsgislib.dataaccess.nasa_cmr.cmr_download_file_http(input_url: str, out_file_path: str, username: str, password: str, no_except: bool = True) -> bool
- Parameters:
input_url – The input remote URL to be downloaded.
out_file_path – the local file path and file name
username – the username for the server
password – the password for the server
- Returns:
boolean as to whether the file was successfully downloaded or not.
- rsgislib.dataaccess.nasa_cmr.create_cmr_dwnld_db(db_json: str, granule_lst: List[Dict], dwnld_file_mime_type: str) -> List[str]
A function which iterates through a granule list and builds a JSON database of files to be downloaded. The database can then be used to keep track of which files have been successfully downloaded, allowing those which haven’t downloaded to be tried again.
For the mime types, print a few items from the granule list to the terminal (e.g., pprint.pprint(granule_lst)) and check what the mime type is for the download you want. For GEDI data it is probably application/x-hdfeos.
- Parameters:
db_json – The file path for the database JSON file.
granule_lst – List of granules from find_granules or find_all_granules
dwnld_file_mime_type – (e.g., application/x-hdfeos, application/x-hdf)
- Returns:
a List of producer_granule_id values for the granules where a URL could not be found.
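Rather than reading the printed granule list by eye, the available mime types can be tallied. The sketch below assumes that each granule dictionary has a `links` list whose entries may carry a `type` key, as in CMR JSON responses. That structure is an assumption to check against your own granule list, and `count_link_mime_types` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def count_link_mime_types(granule_lst: List[Dict]) -> Counter:
    """Count the mime types advertised across all links of all granules."""
    counts: Counter = Counter()
    for granule in granule_lst:
        for link in granule.get("links", []):
            mime = link.get("type")
            if mime:
                counts[mime] += 1
    return counts
```

The most common data type in the result is a reasonable first candidate for dwnld_file_mime_type.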
- rsgislib.dataaccess.nasa_cmr.download_granules_use_dwnld_db(db_json: str, out_path: str, user_pass_file: str, use_wget: bool = False)
A function which can use the JSON database built by create_cmr_dwnld_db to batch download a set of files, keeping track of those which were successfully downloaded and those that were unsuccessful.
- Parameters:
db_json – file path for the JSON db file.
out_path – the output path where data should be downloaded to.
user_pass_file – path to an encoded (base64) username/password file
use_wget – boolean as to whether to use wget to download files or a pure python function. (Default: False - i.e., pure python).
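Taken together, the CMR functions support a find, record, download workflow. The sketch below chains them using only the signatures documented above. `download_cmr_product` and its `cmr` argument (which allows a stand-in module to be supplied) are hypothetical, and the default mime type is the GEDI value mentioned earlier rather than a general default.

```python
from typing import List


def download_cmr_product(
    prod_short_name: str,
    bbox: List[float],
    db_json: str,
    out_path: str,
    user_pass_file: str,
    dwnld_file_mime_type: str = "application/x-hdfeos",
    cmr=None,
) -> List[str]:
    """Find granules for the latest product version and download them."""
    if cmr is None:
        # Imported lazily so the sketch can be read without RSGISLib installed.
        import rsgislib.dataaccess.nasa_cmr as cmr

    version = cmr.get_max_prod_version(prod_short_name)
    granules = cmr.find_all_granules(prod_short_name, version, bbox=bbox)
    # Granules without a matching URL are reported back to the caller.
    missing = cmr.create_cmr_dwnld_db(db_json, granules, dwnld_file_mime_type)
    cmr.download_granules_use_dwnld_db(db_json, out_path, user_pass_file)
    return missing
```

Because progress is recorded in the JSON database, the download step can be run again later to retry files that did not complete.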
Copernicus Data Space Ecosystem
Note
See Copernicus OData API for constants.
- rsgislib.dataaccess.copernicus_odata.get_access_token(username: str = None, password: str = None) -> str
A function to get the access token from the Copernicus Data Space Ecosystem.
- Parameters:
username – Your username for the Copernicus Data Space Ecosystem. If the RSGIS_COP_USER environment variable is specified then the username will be read from there if None is passed (Default: None)
password – Your password for the Copernicus Data Space Ecosystem. If the RSGIS_COP_PASS environment variable is specified then the password will be read from there if None is passed (Default: None)
- Returns:
the access token for the Copernicus Data Space Ecosystem.
- rsgislib.dataaccess.copernicus_odata.query_scn(scn_name: str) -> Dict
A function which queries for a single scene using the scene name, for example: S2B_MSIL2A_20240602T112119_N0510_R037_T30UVD_20240602T125034.SAFE
- Parameters:
scn_name – name of the scene to be found
- Returns:
dictionary of the information for the scene
- rsgislib.dataaccess.copernicus_odata.query_scn_lst(sensor: int, bbox: Tuple[float, float, float, float] | List[float], start_date: datetime = None, end_date: datetime = None, cloud_cover: float = None, orbit_dir: int = None, product_type: int = None, order_by: int = 1, max_n_rslts: int = 25, start_n: int = None) -> List[Dict]
A function which uses the Copernicus Data Space Ecosystem OData API to find lists of scenes using criteria to filter the return list of scenes.
- Parameters:
sensor – The sensor / product being searched for (e.g., RSGIS_ODATA_SEN1 or RSGIS_ODATA_SEN2)
bbox – is a bbox (xMin, xMax, yMin, yMax) in EPSG:4326
start_date – a datetime object representing the start date (i.e., earlier date)
end_date – a datetime object representing the end date (i.e., later date)
cloud_cover – value between 0-100 where scenes with cloud cover less than the threshold will be returned
orbit_dir – The orbit direction ascending (RSGIS_ODATA_ORBIT_DIR_ASC) or descending (RSGIS_ODATA_ORBIT_DIR_DESC)
product_type – The product type specified as RSGIS_ODATA_PROD_TYPE* (e.g., RSGIS_ODATA_PROD_TYPE_S1_SLC, RSGIS_ODATA_PROD_TYPE_S1_GRD, RSGIS_ODATA_PROD_TYPE_S2_MSI_1C, RSGIS_ODATA_PROD_TYPE_S2_MSI_2A)
order_by – Order by date either ascending (RSGIS_ODATA_ORDERBY_ASC) or descending (RSGIS_ODATA_ORDERBY_DESC)
max_n_rslts – Maximum number of scenes that will be returned (default: 25)
start_n – An offset skipping the first n scenes. Can be used in combination with max_n_rslts to query in ‘pages’
- Returns:
returns list of dictionaries containing all scene information
- rsgislib.dataaccess.copernicus_odata.download_scn(access_token: str, scn_info: Dict, out_path: str)
A function which downloads a single scene to the out_path. Note, during the download the file will be given the extension .incomplete until the download is complete, when it will be renamed. If available, the MD5 checksum of the file will be checked.
- Parameters:
access_token – The access token used to download the scene. Use the get_access_token function to generate the access token.
scn_info – A dictionary with the scene information from query_scn or query_scn_lst functions.
out_path – The output path where the scene will be saved.
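The .incomplete naming and checksum behaviour described above follows a common download pattern, sketched below in generic form. This is not the code used by download_scn: `write_then_rename` is a hypothetical helper that takes the downloaded bytes as an iterable of chunks so that it can be run without network access.

```python
import hashlib
import os
from typing import Iterable, Optional


def write_then_rename(
    chunks: Iterable[bytes], out_file: str, expected_md5: Optional[str] = None
) -> str:
    """Write chunks to '<out_file>.incomplete', verify MD5 if given, then rename."""
    tmp_file = out_file + ".incomplete"
    md5 = hashlib.md5()
    with open(tmp_file, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            md5.update(chunk)
    if expected_md5 is not None and md5.hexdigest() != expected_md5.lower():
        # Remove the partial file so a failed download is never mistaken for a good one.
        os.remove(tmp_file)
        raise IOError(f"MD5 mismatch for {out_file}")
    os.replace(tmp_file, out_file)
    return out_file
```

With this pattern, any file in the output directory that lacks the .incomplete extension has been fully written and, where a checksum was supplied, verified.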
- rsgislib.dataaccess.copernicus_odata.download_scns(access_token: str, scns_info: List[Dict], out_path: str, no_except: bool = True)
A function which loops through a list of scenes and downloads the datasets using the download_scn function. Option to print exceptions rather than stopping so all available scenes are downloaded.
- Parameters:
access_token – The access token used to download the scenes. Use the get_access_token function to generate the access token.
scns_info – A list of dictionaries with the scene information from query_scn or query_scn_lst functions.
out_path – The output path where the scene will be saved.
no_except – If True (Default) then exceptions are printed rather than raised, so the remaining scenes are still downloaded.
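A query followed by a batch download might be combined as in the sketch below, which uses only the signatures documented in this section. `download_scenes_for_bbox` and its `odata` argument (which allows a stand-in module to be supplied) are hypothetical. The `sensor` value is expected to be one of the RSGIS_ODATA_* constants and is left to the caller.

```python
from datetime import datetime
from typing import Dict, List


def download_scenes_for_bbox(
    sensor: int,
    bbox: List[float],
    start_date: datetime,
    end_date: datetime,
    out_path: str,
    cloud_cover: float = 20.0,
    odata=None,
) -> List[Dict]:
    """Query scenes for a bbox and date range, then download what was found."""
    if odata is None:
        # Imported lazily so the sketch can be read without RSGISLib installed.
        import rsgislib.dataaccess.copernicus_odata as odata

    scns = odata.query_scn_lst(
        sensor=sensor,
        bbox=bbox,
        start_date=start_date,
        end_date=end_date,
        cloud_cover=cloud_cover,
        max_n_rslts=25,
    )
    if scns:
        # A token is only requested once there is something to download.
        access_token = odata.get_access_token()
        odata.download_scns(access_token, scns, out_path, no_except=True)
    return scns
```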
- rsgislib.dataaccess.copernicus_odata.get_sensor_collection_name(sensor: int) -> str
A function which returns the name of a sensor collection for the RSGIS_ODATA_* sensor specified. This function is primarily used internally by the functions in this module.
- Parameters:
sensor – RSGIS_ODATA_* sensor
- Returns:
ODATA string for the sensor
Planet Data
Note
See Planet Data API for constants.
- rsgislib.dataaccess.planet_data_api.planet_auth(username: str = None, password: str = None, api_key: str = None)
A function to authenticate with Planet.
- Parameters:
username – Your username for Planet. If the RSGIS_PLANET_USER environment variable is specified then the username will be read from there if None is passed (Default: None)
password – Your password for Planet. If the RSGIS_PLANET_PASS environment variable is specified then the password will be read from there if None is passed (Default: None)
api_key – Your API key for the Planet API. If the RSGIS_PLANET_API_KEY environment variable is specified then the API key will be read from there. Alternatively, the PL_API_KEY environment variable can be specified. (Default: None)
- Returns:
returns a planet.Auth object
- rsgislib.dataaccess.planet_data_api.run_search_planet_items(planet_auth, item_type: int, bbox: Tuple[float, float, float, float] | List[float], start_date: datetime = None, end_date: datetime = None, cloud_cover: float = None, sun_elevation_min: float = None, sun_elevation_max: float = None, view_angle_min: float = None, view_angle_max: float = None, max_n_rslts: int = 25) -> List[Dict]
A function which searches the Planet API to find scenes/items.
- Parameters:
planet_auth – A planet.Auth object which can be created using the rsgislib.dataaccess.planet_data_api.planet_auth function.
item_type – The type of item to be downloaded (RSGIS_PLANET_ITEM_*)
bbox – is a bbox (xMin, xMax, yMin, yMax) in EPSG:4326 defining the region of interest.
start_date – a datetime object representing the start date (i.e., earlier date)
end_date – a datetime object representing the end date (i.e., later date)
cloud_cover – value between 0-100 where scenes with cloud cover less than the threshold will be returned. If None (default) then ignored.
sun_elevation_min – the minimum solar elevation (in degrees). If None (default) then ignored.
sun_elevation_max – the maximum solar elevation (in degrees). If None (default) then ignored.
view_angle_min – the minimum view angle (in degrees). If None (default) then ignored.
view_angle_max – the maximum view angle (in degrees). If None (default) then ignored.
max_n_rslts – The maximum number of results to return.
- Returns:
A list of dictionaries containing all scene items.
- rsgislib.dataaccess.planet_data_api.run_create_planet_order(planet_auth, order_name: str, items: List[Dict], item_type: int, bundle_type: int, email_notification: bool = True) -> Dict
A function which creates an order for a list of items.
- Parameters:
planet_auth – A planet.Auth object which can be created using the rsgislib.dataaccess.planet_data_api.planet_auth function.
order_name – A name for the new order.
items – A list of dictionaries containing the items to be created.
item_type – The type of item to be included in the order (RSGIS_PLANET_ITEM_*)
bundle_type – The bundle type for the order (RSGIS_PLANET_BUNDLE_*)
email_notification – Boolean specifying whether you will receive an email notification when the order is ready to download. (Default: True)
- Returns:
dict of information for the created order (including the order id)
- rsgislib.dataaccess.planet_data_api.run_download_planet_order(planet_auth, order_id: str, out_file_path: str, overwrite: bool = False) -> List
A function which downloads an order which has been processed and is ready to download. If the order is not ready to download then an exception will be raised.
- Parameters:
planet_auth – A planet.Auth object which can be created using the rsgislib.dataaccess.planet_data_api.planet_auth function.
order_id – The order ID (not name) of the order to be downloaded.
out_file_path – the output directory where the order will be downloaded.
overwrite – Specify whether downloads should overwrite existing files. (Default = False)
- Returns:
list of downloaded file paths.
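Searching and ordering can be combined as in the sketch below, which uses only the signatures documented in this section. `order_planet_scenes` and its `pl_api` argument (which allows a stand-in module to be supplied) are hypothetical. The `item_type` and `bundle_type` values are expected to be RSGIS_PLANET_ITEM_* and RSGIS_PLANET_BUNDLE_* constants and are left to the caller.

```python
from datetime import datetime
from typing import Dict, List, Optional


def order_planet_scenes(
    order_name: str,
    bbox: List[float],
    start_date: datetime,
    end_date: datetime,
    item_type: int,
    bundle_type: int,
    cloud_cover: float = 10.0,
    pl_api=None,
) -> Optional[Dict]:
    """Search for items and, if any are found, place an order for them."""
    if pl_api is None:
        # Imported lazily so the sketch can be read without RSGISLib installed.
        import rsgislib.dataaccess.planet_data_api as pl_api

    auth = pl_api.planet_auth()  # credentials are read from the environment
    items = pl_api.run_search_planet_items(
        auth,
        item_type,
        bbox,
        start_date=start_date,
        end_date=end_date,
        cloud_cover=cloud_cover,
    )
    if not items:
        return None  # nothing matched, so no order is created
    return pl_api.run_create_planet_order(auth, order_name, items, item_type, bundle_type)
```

Orders take time to process, so the download is a separate, later step: once the order is ready, the order ID from the returned dictionary can be passed to run_download_planet_order.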
- rsgislib.dataaccess.planet_data_api.run_get_planet_orders(planet_auth) -> List[Dict]
A function which gets a list of all the current Planet orders.
- Parameters:
planet_auth – A planet.Auth object which can be created using the rsgislib.dataaccess.planet_data_api.planet_auth function.
- Returns:
List of orders
- rsgislib.dataaccess.planet_data_api.run_cancel_planet_orders(planet_auth, order_ids: List[str] = None) -> Dict
A function which cancels Planet orders.
- Parameters:
planet_auth – A planet.Auth object which can be created using the rsgislib.dataaccess.planet_data_api.planet_auth function.
order_ids – optional list of order IDs to be cancelled. If None (Default) then all are cancelled.
- Returns:
Dictionary of orders
- rsgislib.dataaccess.planet_data_api.run_download_and_validate_item(planet_auth, item_type: int, item_id: str, asset_type: int, out_file_path: str, overwrite: bool = False)
A function which can be used to download a single item from Planet.
- Parameters:
planet_auth – A planet.Auth object which can be created using the rsgislib.dataaccess.planet_data_api.planet_auth function.
item_type – The type of item to be downloaded (RSGIS_PLANET_ITEM_*)
item_id – The unique id for the item to be downloaded.
asset_type – The type of asset type to be downloaded (RSGIS_PLANET_ASSET_*)
out_file_path – The output file path to download the file to.
overwrite – Boolean specifying whether an existing file should be overwritten.
- rsgislib.dataaccess.planet_data_api.get_item_type_str(item_type: int) -> str
Get the string representation of a given item type (sensor).
- Parameters:
item_type – RSGIS_PLANET_ITEM_* value
- Returns:
string representation of a given item type
- rsgislib.dataaccess.planet_data_api.get_asset_type_str(asset_type: int) -> str
A function to get the string representation of a given asset type.
- Parameters:
asset_type – RSGIS_PLANET_ASSET_* value
- Returns:
string representation of a given asset type
- rsgislib.dataaccess.planet_data_api.get_bundle_type_str(bundle_type: int) -> str
A function to get the string representation of a given bundle type.
- Parameters:
bundle_type – RSGIS_PLANET_BUNDLE_* value
- Returns:
string representation of a given bundle type