-
Notifications
You must be signed in to change notification settings - Fork 78
Description
Hi,
This is the first time I'm rising a possible issue of this library, so I apologize if this is not the right way to communicate it. I'm also not an expert in the library, and my knowledge field is nowhere near geostatistics, but I have been involved in a project where I'm using the variogram estimation and covarianze model fitting to those variograms a lot, so I have taken a look at the source code now and then.
When trying to compute an isotropic variogram using a field and a mask to that field (where True values are values to exclude from the passed field), i always get a IndexError, either by passing the mask using the mask argument, or making the field using masked_field = ma.array(data.flatten(), mask=mask.flatten()). Taking a look at the source code i have noticed a potential issue.
I think that pnt_cnt is calculated before the mask is applied, so it doesn't reflect the reduced size of the field after the mask. This leads to an out-of-bounds error when sampling, since the function doesn't update pnt_cnt post-masking. When the user passes a sampling_size argument, sampling indices are generated, which refer to the original size, not the masked size. Later, when the indices are used to sample the field (field = field[:, sampled_idx]), they exceed the available range of the smaller masked array, resulting in the out-of-bounds error.
I will attach the function that raises this exception for me:
def estimate_variogram_isotropic_3d(
data: np.ndarray,
bins: np.ndarray,
mask: np.ndarray,
sampling_size: int = 2000,
sampling_seed: int = 19920516,
) -> Tuple[np.ndarray, np.ndarray]:
"""
Estimate the isotropic variogram from 3D data using gstools.vario_estimate,
passing the exclusion mask directly.
Parameters
----------
data : np.ndarray
3D array containing the volume data.
bins : np.ndarray
1D array defining the bin edges for variogram estimation.
mask : np.ndarray
3D boolean exclusion mask (True indicates voxels to exclude).
sampling_size : int, optional
Number of voxel pairs to sample.
sampling_seed : int, optional
Seed for random sampling.
Returns
-------
bin_centers : np.ndarray
Centers of the distance bins.
gamma : np.ndarray
Estimated variogram values.
"""
# Generate a grid of coordinates for the full volume.
x = np.arange(data.shape[0])
y = np.arange(data.shape[1])
z = np.arange(data.shape[2])
pos_x, pos_y, pos_z = np.meshgrid(x, y, z, indexing="ij")
pos_all = np.vstack(
(
pos_x.flatten(),
pos_y.flatten(),
pos_z.flatten(),
)
)
# Flatten the field and the exclusion mask.
field = data.flatten()
mask_flat = mask.flatten()
print(f"Total number of voxels: {data.size}")
# Call gs.vario_estimate passing the mask.
bin_centers, gamma = gs.vario_estimate(
pos=pos_all,
field=field,
mask=mask_flat,
bin_edges=bins,
mesh_type="unstructured",
sampling_size=sampling_size,
sampling_seed=sampling_seed,
)
return bin_centers, gammaAnd a possible solution from the user point-of-view, which would be to apply the mask yourself to the position and field arrays. In this approach, we compute the valid (i.e. unmasked) indices and then pass only those points. This ensures that the number of points used in the sampling step (pnt_cnt) exactly matches the number of unmasked points.
def estimate_variogram_isotropic_3d(
data: np.ndarray,
bins: np.ndarray,
mask: np.ndarray,
sampling_size: int = 2000,
sampling_seed: int = 19920516,
) -> Tuple[np.ndarray, np.ndarray]:
"""
Estimate the isotropic variogram from 3D data using gstools.vario_estimate.
Instead of relying on the mask argument (which causes a mismatch in point counts),
this function pre-selects the unmasked (valid) points and passes only those.
Parameters
----------
data : np.ndarray
3D volume data.
bins : np.ndarray
1D array of bin edges for the variogram.
mask : np.ndarray
3D boolean exclusion mask (True indicates voxels to exclude).
sampling_size : int, optional
Number of voxel pairs to sample.
sampling_seed : int, optional
Seed for random sampling.
Returns
-------
bin_centers : np.ndarray
Bin centers.
gamma : np.ndarray
Estimated variogram values.
"""
# Generate the full grid of coordinates.
x = np.arange(data.shape[0])
y = np.arange(data.shape[1])
z = np.arange(data.shape[2])
pos_x, pos_y, pos_z = np.meshgrid(x, y, z, indexing="ij")
pos_all = np.vstack((pos_x.flatten(), pos_y.flatten(), pos_z.flatten()))
# Create the full field and mask.
full_field = data.flatten()
full_mask = mask.flatten()
# Select only the unmasked (valid) indices.
valid_idx = np.flatnonzero(~full_mask)
pos_valid = pos_all[:, valid_idx]
field_valid = np.atleast_2d(full_field[valid_idx])
# Now call gstools without a mask (or with np.ma.nomask) because we've pre-filtered.
bin_centers, gamma = gs.vario_estimate(
pos=pos_valid,
field=field_valid,
mask=np.ma.nomask,
bin_edges=bins,
mesh_type="unstructured",
sampling_size=sampling_size,
sampling_seed=sampling_seed,
)
return bin_centers, gammaI hope this helps, and i also hope I'm doing something wrong. Thank you for reading.