RSGISLib Regression Module

This module contains functions for fitting and applying regression models to image data.

General

rsgislib.regression.get_regression_stats(ref_data, pred_data, n_vars=1)

A function which calculates a set of accuracy metrics using a set of reference and predicted values.

If n_vars == 1, then ref_data and pred_data can be flat arrays.

Parameters:
  • ref_data – a numpy array of n x m (m = n_vars) with the reference values.

  • pred_data – a numpy array of n x m (m = n_vars) with the predicted values.

  • n_vars – the number of variables to be used.

Returns:

list of dicts
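
To illustrate the accepted input shapes and the "list of dicts" return, here is a minimal sketch. It is not RSGISLib's implementation: the helper name sketch_regression_stats and the choice of metrics (RMSE, bias, r2) are assumptions made for illustration, and the metrics actually returned by get_regression_stats may differ.

```python
import numpy as np


def sketch_regression_stats(ref_data, pred_data, n_vars=1):
    """Hypothetical stand-in: returns one dict of metrics per variable."""
    ref = np.asarray(ref_data, dtype=float)
    pred = np.asarray(pred_data, dtype=float)
    if n_vars == 1:
        # Flat arrays are accepted for a single variable; reshape to n x 1.
        ref = ref.reshape(-1, 1)
        pred = pred.reshape(-1, 1)
    if ref.shape != pred.shape or ref.shape[1] != n_vars:
        raise ValueError("ref_data and pred_data must both be n x n_vars")

    stats = []
    for i in range(n_vars):
        resid = pred[:, i] - ref[:, i]
        ss_res = float(np.sum(resid ** 2))
        ss_tot = float(np.sum((ref[:, i] - ref[:, i].mean()) ** 2))
        stats.append(
            {
                "rmse": float(np.sqrt(np.mean(resid ** 2))),
                "bias": float(np.mean(resid)),
                "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
            }
        )
    return stats


ref_vals = np.array([1.0, 2.0, 3.0, 4.0])
pred_vals = np.array([1.5, 2.5, 3.5, 4.5])
metrics = sketch_regression_stats(ref_vals, pred_vals, n_vars=1)
```

With n_vars > 1, each column of the n x m arrays is treated as a separate variable and the list holds one dict per column.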

Using Scikit-Learn

rsgislib.regression.regresssklearn.perform_kfold_fit(skl_regrs_obj, x, y, n_splits=5, repeats=1, shuffle=False, data_scaler=None)

A function which performs a k-fold fitting of a regression model to a dataset to estimate the output model variance in the relevant summary metrics.

Parameters:
  • skl_regrs_obj – A scikit learn regression object.

  • x – The independent variables used to estimate y.

  • y – The dependent variable(s) for which the regression will be carried out.

  • n_splits – number of splits to perform

  • repeats – the number of times to repeat each split.

  • shuffle – if not using repeats and shuffle=True then the data will be shuffled before splitting.

  • data_scaler – optional data scaler from scikit learn

Returns:

acc_metrics, residuals
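
The roles of n_splits, repeats and shuffle can be sketched with scikit-learn's own splitters. This is a sketch under stated assumptions rather than the library's code: sketch_kfold_fit is a hypothetical helper, RMSE is an assumed metric, the fixed random_state is only there to make the example repeatable, and the data_scaler step is omitted because how RSGISLib applies it is not described here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, RepeatedKFold


def sketch_kfold_fit(skl_regrs_obj, x, y, n_splits=5, repeats=1, shuffle=False):
    """Hypothetical stand-in: per-fold RMSE and residuals."""
    if repeats > 1:
        # Repeated k-fold reshuffles the data on every repeat.
        splitter = RepeatedKFold(
            n_splits=n_splits, n_repeats=repeats, random_state=42
        )
    else:
        # random_state may only be set when shuffle=True.
        splitter = KFold(
            n_splits=n_splits,
            shuffle=shuffle,
            random_state=42 if shuffle else None,
        )

    acc_metrics = []
    residuals = []
    for train_idx, test_idx in splitter.split(x):
        skl_regrs_obj.fit(x[train_idx], y[train_idx])
        y_pred = skl_regrs_obj.predict(x[test_idx])
        rmse = float(np.sqrt(mean_squared_error(y[test_idx], y_pred)))
        acc_metrics.append({"rmse": rmse})
        residuals.append(y_pred - y[test_idx])
    return acc_metrics, residuals


rng = np.random.default_rng(0)
x = rng.uniform(size=(50, 2))
y = 3.0 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(scale=0.01, size=50)
acc_metrics, residuals = sketch_kfold_fit(LinearRegression(), x, y, n_splits=5)
```

The spread of the per-fold metrics gives a rough indication of how sensitive the model is to the particular training subset.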

rsgislib.regression.regresssklearn.perform_search_param_opt(opt_params_file, x, y, skl_srch_obj=None, data_scaler=None)

A function which performs a parameter optimisation using an instance of a scikit-learn BaseSearchCV (i.e., either GridSearchCV or RandomizedSearchCV).

Parameters:
  • opt_params_file – output JSON file where the optimal parameters will be written.

  • x – The independent variables used to estimate y.

  • y – The dependent variable(s) for which the regression will be carried out.

  • skl_srch_obj – a scikit-learn BaseSearchCV (i.e., either GridSearchCV or RandomizedSearchCV) instance.

  • data_scaler – optional data scaler from scikit learn

Returns:

the optimal scikit learn regression object.
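
The workflow this function describes (fit a search object, write the best parameters to JSON, return the best model) can be sketched directly with scikit-learn. The synthetic data, the Ridge estimator, the parameter grid and the temporary file path are all illustrative assumptions; the exact JSON layout written by perform_search_param_opt is not documented here and may differ.

```python
import json
import os
import tempfile

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic training data (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(size=(40, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.01, size=40)

# A BaseSearchCV instance, as would be passed via skl_srch_obj.
search = GridSearchCV(Ridge(), {"alpha": [0.001, 0.1, 10.0]}, cv=4)
search.fit(x, y)

# Write the best parameters to a JSON file (assumed layout: a flat dict).
opt_params_file = os.path.join(tempfile.mkdtemp(), "opt_params.json")
with open(opt_params_file, "w") as f:
    json.dump(search.best_params_, f)

# The refitted estimator with the best parameters.
best_mdl = search.best_estimator_
```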

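rsgislib.regression.regresssklearn.apply_regress_sklearn_mdl(regrs_mdl, n_out_vars, predictor_img, predictor_img_bands, vld_msk_img, vld_msk_val, out_img, gdalformat='KEA', out_band_names=None, calc_stats=True, out_no_date_val=0.0)

A function which applies a trained scikit-learn regression model to an input image, writing the predicted values to an output image.

Parameters: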
  • regrs_mdl – the scikit-learn model

  • n_out_vars – the number of output variables (i.e., image bands)

  • predictor_img – the input image file providing the independent predictor variables used as the inputs to the model.

  • predictor_img_bands – list of the image bands used. Ensure this is in the same order as the variables used to train the model.

  • vld_msk_img – An input image file defining where the model should be applied

  • vld_msk_val – the pixel value within the vld_msk_img specifying the pixels to which the model should be applied.

  • out_img – the output image file path.

  • gdalformat – output image file format (Default: KEA)

  • out_band_names – Optional list of band names for the output image. If None (Default) then no band names will be defined.

  • calc_stats – boolean specifying whether image statistics and pyramids should be built (default: True)

  • out_no_date_val – Output no data value to be used within the image file.
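
Going by the parameter descriptions above, the model is applied only to pixels where the mask equals vld_msk_val, and the remaining pixels receive the no data value. The sketch below mimics that logic on in-memory NumPy arrays for a single output variable. It involves no image I/O and is not the library's implementation; the arrays, their shapes and the LinearRegression model are hypothetical stand-ins for the image files and trained model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a simple model on two predictor variables (illustrative only).
rng = np.random.default_rng(0)
train_x = rng.uniform(size=(30, 2))
train_y = 2.0 * train_x[:, 0] + train_x[:, 1]
regrs_mdl = LinearRegression().fit(train_x, train_y)

# Hypothetical in-memory stand-ins for the predictor and mask images.
predictor_arr = rng.uniform(size=(2, 4, 5))  # (bands, rows, cols)
vld_msk_arr = np.zeros((4, 5), dtype=np.uint8)
vld_msk_arr[1:3, 1:4] = 1
vld_msk_val = 1
out_no_data_val = 0.0

# Start with an output filled with the no data value.
out_arr = np.full((4, 5), out_no_data_val, dtype=np.float32)

# Select valid pixels; rows are pixels, columns are bands in training order.
valid = vld_msk_arr == vld_msk_val
pxl_vals = predictor_arr[:, valid].T
out_arr[valid] = regrs_mdl.predict(pxl_vals)
```

The band order in pxl_vals must match the variable order used for training, which is why predictor_img_bands needs to list the bands in that same order.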

rsgislib.regression.regresssklearn.create_search_obj(regrs_obj, regrs_params, n_runs=250, n_cv=5, n_cores=1)

A function which creates a scikit-learn search object (i.e., GridSearchCV or RandomizedSearchCV) to be used within the perform_search_param_opt function to identify the optimal parameters for the algorithm.

By default a grid search, which tries all parameter combinations, is used. If that would require more than n_runs runs then a randomised search is used instead.

Parameters:
  • regrs_obj – The scikit-learn object (e.g., ExtraTreesRegressor)

  • regrs_params – a dict of the parameters to search. i.e., provided by get_XX_obj_params functions (e.g., get_et_obj_params)

  • n_runs – The maximum number of runs to perform. If the required number of runs is > n_runs then RandomizedSearchCV is used.

  • n_cv – the number of cross-validations to use for the analysis

  • n_cores – the number of cores to use.

Returns:

the instance of a scikit-learn BaseSearchCV (i.e., either GridSearchCV or RandomizedSearchCV)
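
The choice between the two search types can be sketched as follows. This is an assumption-laden illustration, not the library's code: sketch_create_search_obj is a hypothetical helper, counting combinations with ParameterGrid is one plausible way to obtain the required number of runs, and mapping n_runs to n_iter and n_cores to n_jobs is assumed.

```python
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import (
    GridSearchCV,
    ParameterGrid,
    RandomizedSearchCV,
)


def sketch_create_search_obj(regrs_obj, regrs_params, n_runs=250, n_cv=5, n_cores=1):
    """Hypothetical stand-in: grid search unless it needs more than n_runs runs."""
    # Number of parameter combinations an exhaustive grid search would try.
    n_combos = len(ParameterGrid(regrs_params))
    if n_combos > n_runs:
        # Too many combinations: sample n_runs of them at random instead.
        return RandomizedSearchCV(
            regrs_obj, regrs_params, n_iter=n_runs, cv=n_cv, n_jobs=n_cores
        )
    return GridSearchCV(regrs_obj, regrs_params, cv=n_cv, n_jobs=n_cores)


# 3 x 4 = 12 combinations (illustrative parameter values).
params = {"n_estimators": [10, 50, 100], "max_depth": [2, 4, 8, None]}
grid_srch = sketch_create_search_obj(ExtraTreesRegressor(), params, n_runs=250)
rand_srch = sketch_create_search_obj(ExtraTreesRegressor(), params, n_runs=5)
```

Here grid_srch is a GridSearchCV because 12 combinations fit within 250 runs, while rand_srch is a RandomizedSearchCV because 12 exceeds the limit of 5.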

rsgislib.regression.regresssklearn.get_ann_obj_params(n_predictors)

Get an artificial neural network (ANN) object and parameters.

Returns:

set [ANN Object, ANN Parameters Dict, Boolean as to whether data needs scaling]

rsgislib.regression.regresssklearn.get_en_obj_params(n_predictors)

Get an ElasticNet object and parameters.

Returns:

set [EN Object, EN Parameters Dict, Boolean as to whether data needs scaling]

rsgislib.regression.regresssklearn.get_knn_obj_params(n_predictors)

Get a KNeighborsRegressor object and parameters.

Returns:

set [KNN Object, KNN Parameters Dict, Boolean as to whether data needs scaling]

rsgislib.regression.regresssklearn.get_kr_obj_params(n_predictors)

Get a KernelRidge object and parameters.

Returns:

set [KR Object, KR Parameters Dict, Boolean as to whether data needs scaling]

rsgislib.regression.regresssklearn.get_et_obj_params(n_predictors)

Get an ExtraTreesRegressor object and parameters.

Returns:

set [ET Object, ET Parameters Dict, Boolean as to whether data needs scaling]

rsgislib.regression.regresssklearn.get_pls_obj_params(n_predictors)

Get a PLSRegression object and parameters.

Returns:

set [PLS Object, PLS Parameters Dict, Boolean as to whether data needs scaling]