Benchmarking regression models under spatial heterogeneity

This repository accompanies our GIScience publication "Benchmarking regression models under spatial heterogeneity" (see the citation below). The code base provides 1) the script for reproducing our experiments on synthetic data, 2) the script for reproducing our benchmarking experiments on several real datasets, and 3) an open-source Python implementation of spatial Random Forests. Each part is described below.

Installation

The required packages and our sprf package can be installed via pip in editable mode in a virtual environment with the following commands:

git clone https://github.com/mie-lab/spatial_rf_python.git
cd spatial_rf_python
python -m venv env
source env/bin/activate
pip install -e .

1) Experiments on synthetic datasets

To reproduce our analysis on synthetic data, run:

python scripts/synthetic_tests.py

All results will be saved in a single csv file named synthetic_data_results.csv.

2) Benchmarking on real datasets

We use five public datasets to validate our results and to benchmark different algorithms. The datasets are provided as csv files in the data folder. They include:

A plants dataset
A deforestation dataset
A mortality rate dataset from here

Please cite these sources if reusing their data.

Our benchmarking code is provided both as a notebook and as a script. To reproduce the experiments from the paper, run:

python scripts/benchmarks.py

The results will be saved as csv files in a folder named outputs.

3) Spatial Random Forest implementation in Python

This repository further provides Python implementations of Spatial Random Forests. Different approaches have been proposed in the literature; here, we focus on the one by Georganos et al., termed Geographical Random Forests. We implement their approach, but since training one random forest per sample is very inefficient, we additionally implement a more efficient variant (which we simply call Spatial Random Forests): instead of training one random forest per sample, we train a fixed number of random forests on spatially distinct sets of points. The prediction is then a weighted average of the tree-wise predictions, weighted by the distance of the test sample from the center of each tree (see figure below).
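The weighting step described above can be sketched as follows. This is a minimal illustration and not the sprf implementation: the inverse-distance weighting, the function name distance_weighted_prediction, and the eps parameter are assumptions chosen for the example, and the package may use a different weighting function.

```python
import math


def distance_weighted_prediction(test_coord, centers, model_predictions, eps=1e-9):
    """Combine per-model predictions for one test sample.

    Hypothetical helper: weights are the inverse of the Euclidean distance
    between the test sample and the spatial center of each model.
    """
    # Distance from the test sample to the center of each model's training points
    distances = [math.dist(test_coord, center) for center in centers]
    # Closer centers receive larger weights; eps avoids division by zero
    weights = [1.0 / (d + eps) for d in distances]
    total = sum(weights)
    # Weighted average of the individual predictions
    return sum(w * p for w, p in zip(weights, model_predictions)) / total


# Two models centered at different locations, each with its own prediction
centers = [(0.0, 0.0), (10.0, 0.0)]
preds = [1.0, 3.0]

# A sample halfway between the centers weights both models equally
midpoint = distance_weighted_prediction((5.0, 0.0), centers, preds)
# A sample close to the first center is dominated by the first model
near_first = distance_weighted_prediction((1.0, 0.0), centers, preds)
```

With this weighting, a test sample located near the center of one model receives a prediction close to that model's output, while a sample between centers receives a blend.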

Usage

We demonstrate the usage of the spatial Random Forests in the demonstration notebook.

The usage is analogous to other scikit-learn models, except that the coordinates must also be given as input.

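from sprf import SpatialRandomForest

# train_x: features, train_y: targets, train_coords: coordinates of each training sample
spatial_rf = SpatialRandomForest()
spatial_rf.fit(train_x, train_y, train_coords)
# Prediction also requires the coordinates of the test samples
test_pred = spatial_rf.predict(test_x, test_coords)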

To train a Geographical Random Forest as proposed by Georganos et al., use the corresponding class, which works in the same way:

from sprf import GeographicalRandomForest
geo_rf = GeographicalRandomForest()
geo_rf.fit(train_x, train_y, train_coords)
test_pred = geo_rf.predict(test_x, test_coords)

Citation

If you use our work, please cite our paper with the following bibtex entry:

@inproceedings{wiedemann2023benchmarking,
  title={Benchmarking regression models under spatial heterogeneity},
  author={Wiedemann, Nina and Martin, Henry and Westerholt, René},
  booktitle={12th International Conference on Geographic Information Science (GIScience 2023)},
  year={2023},
}
