Spatial processing

ela.spatial module

class ela.spatial.DepthsRounding(depth_from_col='Depth From (m)', depth_to_col='Depth To (m)')

Bases: object

Helper class to round lithology record classes to the nearest metre of depth

depth_from_col

Column name storing “from depth” information

Type

str

depth_to_col

Column name storing “to depth” information

Type

str

assess_num_collapsed(df, func=<function round_>)

How many records would be removed from the lithology records if rounding from/to depths

Parameters
  • df (pandas data frame) – bore lithology data

  • func (callable) – rounding function callable with a signature similar to np.round

Returns

the number of entries that would be collapsed from/to depths

Return type

(int)

round_to_metre_depths(df, func=<function round_>, remove_collapsed=False)

Round lithology record classes to the nearest metre of depth

Parameters
  • df (pandas data frame) – bore lithology data

  • func (callable) – rounding function callable with a signature similar to np.round

  • remove_collapsed (bool) – should entries where depths from and to are equal be removed from the result

Returns

a data frame of similar structure as the input but without entries less than a metre resolution.

Return type

(pandas dataframe)

class ela.spatial.GeospatialDataFrameColumnNames(easting_col='Easting', northing_col='Northing', depth_from_ahd_col='Depth From (AHD)', depth_to_ahd_col='Depth To (AHD)')

Bases: object

Operations on data frames with 3D spatial information of lithology logs. The purpose of this class is to adapt ‘pyela’ operations to different data without requiring renaming columns.

easting_col

name of the data frame column for easting

Type

str

northing_col

name of the data frame column for northing

Type

str

depth_from_ahd_col

name of the data frame column for the height of the top of the soil column (ahd stands for for australian height datum, but not restricted)

Type

str

depth_to_ahd_col

name of the data frame column for the height of the bottom of the soil column (ahd stands for for australian height datum, but not restricted)

Type

str

class_probability_estimates_depth(df, column_name, slice_depth, n_neighbours, mesh_grid, func_training_set=None)

Subset data frame with entries at a specified AHD coordinate

Parameters
  • df (pandas data frame) – bore lithology data

  • column_name (str) – name of the column with string information to use to strip entries with missing lithology information

  • slice_depth (float) – AHD coordinate at which to slice the data frame for lithology observations

  • n_neighbours (int) – number of nearest neighbours

  • mesh_grid (tuple) – coordinate matrices to interpolate over (numpy.meshgrid)

  • func_training_set (callable) – a function to processing the training set (e.g. completing dummy with dummy classes, other not present in the trainining set)

Returns

a list of numpy arrays, shaped like the meshgrid.

class_probability_estimates_depth_bbox(df, column_name, slice_depth, n_neighbours, geo_pd, grid_res=100, func_training_set=None)

Interpolate over a volume the probability of each lithology class

Parameters
  • df (pandas data frame) – bore lithology data, spatially georeferenced

  • column_name (str) – name of the column with numeric codes for lithology classes

  • z_ahd_coords (iterable of int) – datum heights at which to interpolate. Must be equal to the length of the last dimension of the volume

  • n_neighbours (int) – number of nearest neighbours

  • mesh_grid (tuple) – coordinate matrices to interpolate over (numpy.meshgrid)

Returns

numpy array, predicted values over the grid.

extract_bore_class_num(bore_log_df, column_name)

Gets the columns easting, northing, primary lithology class number, AHD depth ‘from’ and ‘to’ from a bore data log

Parameters
  • bore_log_df (pandas data frame) – bore lithology data

  • column_name (str) – name of the column of interest e.g. lithology descriptions or classes

get_knn_model(df, column_name, slice_depth, n_neighbours)

Train a K-nearest neighbours model for a given plane

Parameters
  • df (data frame) –

  • column_name (str) –

  • slice_depth (numeric) –

  • n_neighbours (int) –

Returns

trained classifier.

Return type

KNeighborsClassifier

get_lithology_classes_probabilities(lithologies, shape, df, column_name, z_ahd_coords, n_neighbours, mesh_grid)

Interpolate over a volume the probability of each lithology class

Parameters
  • shape (tuple of ints) – shape of the output volumes to interpolate over

  • df (pandas data frame) – bore lithology data, spatially georeferenced

  • column_name (str) – name of the column with numeric codes for lithology classes

  • z_ahd_coords (iterable of int) – datum heights at which to interpolate. Must be equal to the length of the last dimension of the volume

  • n_neighbours (int) – number of nearest neighbours

  • mesh_grid (tuple) – coordinate matrices to interpolate over (numpy.meshgrid)

Returns

numpy array, predicted values over the grid.

get_lithology_observations_for_depth(df, slice_depth, column_name)

Subset data frame with entries at a specified AHD coordinate, and with valid lithology information.

Args:

df (pandas data frame): bore lithology data slice_depth (float): AHD coordinate at which to slice the data frame for lithology observations column_name (str): name of the column with string information to use to strip entries with missing lithology information

Returns:

a (view of a) data frame; a subset of the input data frame, entries intersecting with the specified slice depth

interpolate_lithologydata_slice_depth(df, column_name, slice_depth, n_neighbours, mesh_grid)

Interpolate lithology data

Parameters
  • df (pandas data frame) – bore lithology data

  • slice_depth (float) – AHD coordinate at which to slice the data frame for lithology observations

  • n_neighbours (int) – number of nearest neighbours

  • mesh_grid (tuple) – coordinate matrices to interpolate over (numpy.meshgrid)

Returns

numpy array, predicted values over the grid.

interpolate_lithologydata_slice_depth_bbox(df, column_name, slice_depth, n_neighbours, geo_pd, grid_res=100)

Interpolate lithology data

Parameters
  • df (pandas data frame) – bore lithology data

  • slice_depth (float) – AHD coordinate at which to slice the data frame for lithology observations

  • n_neighbours (int) – number of nearest neighbours

  • geo_pd (geopandas df) – vector of spatial data from which to get the bounds of interest (bounding box)

  • grid_res (int) – grid resolution in m for x and y.

Returns

numpy array, predicted values over the grid.

lithologydata_slice_depth(df, slice_depth)

Subset data frame with entries at a specified AHD coordinate

Args:

df (pandas data frame): bore lithology data slice_depth (float): AHD coordinate at which to slice the data frame for lithology observations

Returns:

a (view of a) data frame, a subset of the input data frame, entries intersecting with the specified slice depth

make_training_set(observations, column_name)

Create a training set suitable for machine learning by e.g. scikit-learn out of a georeferenced data frame.

Parameters
  • observations (pandas data frame) – bore lithology data

  • column_name (str) – name of the column of interest e.g. lithology descriptions or classes

class ela.spatial.GridInterpolation(easting_col='Easting', northing_col='Northing', depth_from_ahd_col='Depth From (AHD)', depth_to_ahd_col='Depth To (AHD)')

Bases: object

Operations interpolating over a grid using a trained model The purpose of this class is to adapt ‘pyela’ operations to different data without requiring renaming columns.

dfcn
Type

GeospatialDataFrameColumnNames

northing_col

name of the data frame column for northing

Type

str

depth_from_ahd_col

name of the data frame column for the height of the top of the soil column (ahd stands for for australian height datum, but not restricted)

Type

str

depth_to_ahd_col

name of the data frame column for the height of the bottom of the soil column (ahd stands for for australian height datum, but not restricted)

Type

str

get_knn_model(df, column_name, slice_depth, n_neighbours)

Train a K-nearest neighbours model for a given plane

Args:

df (pandas data frame): bore lithology data column_name (str): name of the column with string information to use to strip entries with missing lithology information slice_depth (float): AHD coordinate at which to slice the data frame for lithology observations n_neighbours (int): number of nearest neighbours

Returns

trained classifier.

Return type

KNeighborsClassifier

get_lithology_observations_for_depth(df, slice_depth, column_name)

Subset data frame with entries at a specified AHD coordinate, and with valid lithology information.

Args:

df (pandas data frame): bore lithology data slice_depth (float): AHD coordinate at which to slice the data frame for lithology observations column_name (str): name of the column with string information to use to strip entries with missing lithology information

Returns:

a (view of a) data frame; a subset of the input data frame, entries intersecting with the specified slice depth

interpolate_lithologydata_slice_depth(df, column_name, slice_depth, n_neighbours, mesh_grid)

Interpolate lithology data

Parameters
  • df (pandas data frame) – bore lithology data

  • column_name (str) – name of the column with string information to use to strip entries with missing lithology information

  • slice_depth (float) – AHD coordinate at which to slice the data frame for lithology observations

  • n_neighbours (int) – number of nearest neighbours

  • mesh_grid (tuple) – coordinate matrices to interpolate over (numpy.meshgrid)

Returns

numpy array, predicted values over the grid.

interpolate_volume(volume, df, column_name, z_ahd_coords, n_neighbours, mesh_grid)

Interpolate lithology data over a volume

Parameters
  • volume (ndarray) – 3d volume to fill with interpolated lithologies

  • df (pandas data frame) – bore lithology data

  • column_name (str) – name of the column with numeric codes for lithology classes

  • z_ahd_coords (iterable of int) – datum heights at which to interpolate. Must be equal to the length of the last dimension of the volume

  • n_neighbours (int) – number of nearest neighbours

  • mesh_grid (tuple) – coordinate matrices to interpolate over (numpy.meshgrid)

Returns

numpy array, predicted values over the grid.

make_training_set(observations, column_name)

Create a training set from a set of geolocated observations

Parameters
  • observations (pandas data frame) – bore lithology data with geocoordinates

  • column_name (str) – name of the column with string information to use to strip entries with missing lithology information

Returns

observations and predictors (geolocation).

Return type

(tuple)

class ela.spatial.HeightDatumConverter(dem_raster, easting_col='Easting', northing_col='Northing', depth_from_ahd_col='Depth From (AHD)', depth_to_ahd_col='Depth To (AHD)')

Bases: object

dfcn

Adapter to a dataframe column names for depth/height information.

Type

GeospatialDataFrameColumnNames

dem_raster

DEM raster

Type

rasterio dataset

data_grid

band 1 data in the raster

Type

??

add_height(lithology_df, depth_from_col='Depth From (m)', depth_to_col='Depth To (m)', depth_from_ahd_col='Depth From (AHD)', depth_to_ahd_col='Depth To (AHD)', easting_col='Easting', northing_col='Northing', drop_na=False)

Helper class to round lithology record classes to the nearest metre of depth

Parameters
  • lithology_df

  • depth_from_col (str) – Column name storing “from depth” information

  • depth_to_col (str) – Column name storing “to depth” information

  • depth_from_ahd_col (str) – Column name datum height information

  • depth_to_ahd_col (str) – Column name datum height information

  • easting_col (str) – Column name storing easting information

  • northing_col (str) – Column name storing northing information

  • drop_na (bool) – If true, entries with nissing datum heights are dropped

Returns

Return type

(pandas.DataFrame)

raster_drill_df(df)

Gets the DEM values for the easting/northing in a data frame

Parameters

df (pandas.DataFrame) – data frame of records with easting/northing information in columns compatible with this object

Returns

the datum height values at the specified points

Return type

(1D numpy array)

raster_value_at(easting, northing)

Raster value for a given easting/northing

Parameters
  • easting (float) – easting value

  • northing (float) – northing value

Returns

the value of the band at the specified point

Return type

(float)

class ela.spatial.SliceOperation(dem_array_zeroes_infill, z_index_for_ahd)

Bases: object

Helper class to perform slicing operations on a 3D volume.

dem_array_zeroes_infill

An array, DEM for the grid at the same x/y resolution as the volume to be sliced.

Type

2D array

z_index_for_ahd

bijection from a z index in the volume to its AHD height

Type

callable

static arithmetic_average(slices)
static arithmetic_average_int(slices)
from_ahd_to_depth_below_ground_level(volume, from_depth, to_depth)

Slice a volume at every meter between two depths below ground level

Parameters
  • volume (3D array) – volume of interest

  • from_depth (numeric) – top level depth below ground

  • to_depth (numeric) – bottom level depth below ground. from_depth is less than to_depth

Returns

volume values between the two depths below ground.

Return type

(3D Array)

reduce_slices_at_depths(volume, from_depth, to_depth, reduce_func)

Slice a volume at every meter between two depths below ground level, and reduce the resulting ‘vertical’ values to a single statistic.

Parameters
  • volume (3D array) – volume of interest

  • from_depth (numeric) – top level depth below ground

  • to_depth (numeric) – bottom level depth below ground. from_depth is less than to_depth

  • reduce_func (callable) – A function that takes in a list of 2D grids and returns one 2D grid

Returns

reduced values (e.g. averaged) for each grid geolocation between the two depths below ground.

Return type

(2D Array)

ela.spatial.average_slices(slices)

Gets the average values over numeric slices

Parameters

slices (list of 2D np arrays) – slices to average

ela.spatial.burn_volume(volume, surface_raster, height_to_z, below=False, ignore_nan=False, inclusive=False)

Burn out parts of a xyz volume given a surface, below or above the intersection of the volume with the surface

Volume

volume to modify

Type

3D numpy

Surface_raster

AHD coordinate at which to slice the data frame for lithology observations

Type

2D numpy

Height_to_z

Number of neighbors to pass to KNeighborsClassifier

Type

a function to convert the surface raster value (height for a DEM) to a corresponding z-index in the volume.

Below

should the part below or above be burnt out

Type

bool

Ignore_nan

If the surface to burn from has NaNs, should it mask out the whole corresponding cells in the volume

Type

bool

Inclusive

is the cell in the volume cut by the surface included in the burning out (i.e. set to NaN) or its value kept?

Type

bool

ela.spatial.burn_volume_func(func_below, func_above, volume, surface_raster, height_to_z, below=False, ignore_nan=False, inclusive=False)

Reusable function, not for end user. Process parts of a xyz volume given a surface, below or above the intersection of the volume with the surface

ela.spatial.create_meshgrid(geo_pd, grid_res)

Create a 2D meshgrid to be used with numpy for vectorized operations and Mayavi visualisation.

Parameters
  • geo_pd (geopandas) – shape from which we can get the bounding box as a basis for the extend of the meshgrid

  • grid_res (numeric) – x and y resolution of the grid we create

Returns

2-D coordinate arrays for vectorized evaluations of 2-D scalar/vector fields.

The arrays are ordered such that the first dimension relates to the X coordinate and the second the Y coordinate. This is done such that the 2D coordinate arrays work as-is with visualisation with Mayavi without unnecessary transpose operations.

Return type

(list of 2 2dim numpy.ndarray)

ela.spatial.create_meshgrid_cartesian(x_min, x_max, y_min, y_max, grid_res)

Create a 2D meshgrid to be used with numpy for vectorized operations and Mayavi visualisation.

Parameters
  • x_min (numeric) – lower x coordinate

  • x_max (numeric) – upper x coordinate

  • y_min (numeric) – lower y coordinate

  • y_max (numeric) – upper y coordinate

  • grid_res (numeric) – x and y resolution of the grid we create

Returns

2-D coordinate arrays for vectorized evaluations of 2-D scalar/vector fields.

The arrays are ordered such that the first dimension relates to the X coordinate and the second the Y coordinate. This is done such that the 2D coordinate arrays work as-is with visualisation with Mayavi without unnecessary transpose operations.

Return type

(list of 2 2dim numpy.ndarray)

ela.spatial.get_bbox(geo_pd)

Gets the bounding box of a geopandas dataframe

Parameters

geo_pd (geopandas) – shape from which we can get the bounding box as a basis for the extend of the meshgrid

Returns

bbox (x_min, y_min, x_max, y_max)

Return type

(tuple of floats)

ela.spatial.get_coords_from_gpd_shape(shp, colname='geometry', out_colnames=None)

Gets a dataframe of x/y geolocations out of a geopandas object

Parameters
  • shp (GeoDataFrame) – a geopandas GeoDataFrame

  • colname (str) – name of columns that has the geometry in the geopandas shapefile

  • out_colnames (list of str) – names of the added geolocation columns in the output data frame

Returns

a two column data frames, coordinates of all the points in the geodataframe

Return type

(DataFrame)

Example

>>> from ela.doc.sampledata import *
>>> from ela.spatial import get_coords_from_gpd_shape
>>> dem ,bore_loc, litho_logs = sample_data()
>>> df = litho_logs[[DEPTH_FROM_COL, DEPTH_TO_COL,TOP_ELEV_COL,BOTTOM_ELEV_COL,LITHO_DESC_COL,HYDRO_CODE_COL]]
>>> geoloc = get_coords_from_gpd_shape(bore_loc, colname='geometry', out_colnames=[EASTING_COL, NORTHING_COL])
>>> geoloc[HYDRO_CODE_COL] = bore_loc[HYDRO_CODE_COL]
>>> geoloc.head()
>>> # With this data frame we can perform two operations in one go: subsetting the lithology records to only the 640 bores of interest, and adding to the result the x/y geolocations to the data frame.
>>> df = pd.merge(df, geoloc, how='inner', on=HYDRO_CODE_COL, sort=False, copy=True, indicator=False, validate=None)
ela.spatial.get_unique_coordinates(easting, northing)

Gets the unique set of geolocated points

Parameters
  • easting (iterable of floats) – easting values

  • northing (iterable of floats) – northing values

Returns

two dimensional nparray

Return type

(numpy array)

ela.spatial.interpolate_over_meshgrid(predicting_algorithm, mesh_grid)

Interpolate lithology data

Parameters
  • predicting_algorithm (algorithm with a predict method.) – trained algorithm such as the K Nearest Neighbours in scikit (KNN)

  • mesh_grid (tuple) – coordinate matrices to interpolate over (numpy.meshgrid)

Returns

numpy array, predicted values over the grid.

ela.spatial.pad_training_set_functor(classes)

Create a function that pads a training set so that all possible classes are present

ela.spatial.read_raster_value(dem, band_1, easting, northing)

Read a value in a raster grid given easting/northing

Parameters
  • dem (rasterio dataset) – dem

  • band_1 (numpy array) – the band 1 from the DEM

  • easting (float) – easting value

  • northing (float) – northing value

Returns

the value of the band at the specified point

Return type

(float)

ela.spatial.set_at_surface_boundary(volume, surface_raster, height_to_z, value=0.0, ignore_nan=False)

Burn out parts of a xyz volume given a surface, below or above the intersection of the volume with the surface

Volume

volume to modify

Type

3D numpy

Surface_raster

AHD coordinate at which to slice the data frame for lithology observations

Type

2D numpy

Height_to_z

Number of neighbors to pass to KNeighborsClassifier

Type

a function to convert the surface raster value (height for a DEM) to a corresponding z-index in the volume.

Below

should the part below or above be burnt out

Type

bool

Ignore_nan

If the surface to burn from has NaNs, should it mask out the whole corresponding cells in the volume

Type

bool

Inclusive

is the cell in the volume cut by the surface included in the burning out (i.e. set to NaN) or its value kept?

Type

bool

ela.spatial.slice_volume(volume, slice_surface, height_to_z)

read values in a volume along a slicing surface

Parameters
  • volume (3D ndarray) – volume to drill

  • slice_surface (2D numpy array) – array of elevations

  • height_to_z (callable) – converter from datum height to z index

Returns

the value in the volume at the specified point

Return type

(float)

ela.spatial.surface_array(raster, x_min, y_min, x_max, y_max, grid_res)

Creates an isometric 2D grid from a raster, effectively resampling.

This resampling is done to facilitate downstream operations on gridded interpolated volumes.

Args: raster (rasterio dataset): raster to sample from, typically DEM elevations

x_min (numeric): lower x coordinate x_max (numeric): upper x coordinate y_min (numeric): lower y coordinate y_max (numeric): upper y coordinate grid_res (numeric): x and y resolution of the grid we create

Returns

grid of raster values, typically DEM elevations

Return type

(2dim numpy.ndarray)

ela.spatial.volume_value_at(volume, slice_surface, height_to_z, x, y)

read a volume value at an x/y/z location.

Parameters
  • volume (3D ndarray) – volume to drill

  • slice_surface (2D numpy array) – array of elevations

  • height_to_z (callable) – converter from datum height to z index

  • x (float) – easting index

  • y (float) – northing index

Returns

the value in the volume at the specified point

Return type

(float)

ela.spatial.vstacked_points(xx, yy)
ela.spatial.z_index_for_ahd_functor(a=1, b=50)

Creates a linear univariate function for height to index conversions

Parameters
  • a (float) – slope - defaults to one

  • b (float) – height offset

Returns

univariate linear function

Return type

(callable)