Skip to content

Possible unexpected behaviour of gs.vario_estimate with masked fields #377

@MarioPasc

Description

@MarioPasc

Hi,

This is the first time I'm rising a possible issue of this library, so I apologize if this is not the right way to communicate it. I'm also not an expert in the library, and my knowledge field is nowhere near geostatistics, but I have been involved in a project where I'm using the variogram estimation and covarianze model fitting to those variograms a lot, so I have taken a look at the source code now and then.

When trying to compute an isotropic variogram using a field and a mask to that field (where True values are values to exclude from the passed field), i always get a IndexError, either by passing the mask using the mask argument, or making the field using masked_field = ma.array(data.flatten(), mask=mask.flatten()). Taking a look at the source code i have noticed a potential issue.

I think that pnt_cnt is calculated before the mask is applied, so it doesn't reflect the reduced size of the field after the mask. This leads to an out-of-bounds error when sampling, since the function doesn't update pnt_cnt post-masking. When the user passes a sampling_size argument, sampling indices are generated, which refer to the original size, not the masked size. Later, when the indices are used to sample the field (field = field[:, sampled_idx]), they exceed the available range of the smaller masked array, resulting in the out-of-bounds error.

I will attach the function that raises this exception for me:

def estimate_variogram_isotropic_3d(
    data: np.ndarray,
    bins: np.ndarray,
    mask: np.ndarray,
    sampling_size: int = 2000,
    sampling_seed: int = 19920516,
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Estimate the isotropic variogram from 3D data using gstools.vario_estimate,
    passing the exclusion mask directly.

    Parameters
    ----------
    data : np.ndarray
        3D array containing the volume data.
    bins : np.ndarray
        1D array defining the bin edges for variogram estimation.
    mask : np.ndarray
        3D boolean exclusion mask (True indicates voxels to exclude).
    sampling_size : int, optional
        Number of voxel pairs to sample.
    sampling_seed : int, optional
        Seed for random sampling.

    Returns
    -------
    bin_centers : np.ndarray
        Centers of the distance bins.
    gamma : np.ndarray
        Estimated variogram values.
    """
    # Generate a grid of coordinates for the full volume.
    x = np.arange(data.shape[0])
    y = np.arange(data.shape[1])
    z = np.arange(data.shape[2])
    pos_x, pos_y, pos_z = np.meshgrid(x, y, z, indexing="ij")
    pos_all = np.vstack(
        (
            pos_x.flatten(),
            pos_y.flatten(),
            pos_z.flatten(),
        )
    )

    # Flatten the field and the exclusion mask.
    field = data.flatten()
    mask_flat = mask.flatten()

    print(f"Total number of voxels: {data.size}")

    # Call gs.vario_estimate passing the mask.
    bin_centers, gamma = gs.vario_estimate(
        pos=pos_all,
        field=field,
        mask=mask_flat,
        bin_edges=bins,
        mesh_type="unstructured",
        sampling_size=sampling_size,
        sampling_seed=sampling_seed,
    )
    return bin_centers, gamma

And a possible solution from the user point-of-view, which would be to apply the mask yourself to the position and field arrays. In this approach, we compute the valid (i.e. unmasked) indices and then pass only those points. This ensures that the number of points used in the sampling step (pnt_cnt) exactly matches the number of unmasked points.

def estimate_variogram_isotropic_3d(
    data: np.ndarray,
    bins: np.ndarray,
    mask: np.ndarray,
    sampling_size: int = 2000,
    sampling_seed: int = 19920516,
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Estimate the isotropic variogram from 3D data using gstools.vario_estimate.
    Instead of relying on the mask argument (which causes a mismatch in point counts),
    this function pre-selects the unmasked (valid) points and passes only those.
    
    Parameters
    ----------
    data : np.ndarray
        3D volume data.
    bins : np.ndarray
        1D array of bin edges for the variogram.
    mask : np.ndarray
        3D boolean exclusion mask (True indicates voxels to exclude).
    sampling_size : int, optional
        Number of voxel pairs to sample.
    sampling_seed : int, optional
        Seed for random sampling.
    
    Returns
    -------
    bin_centers : np.ndarray
        Bin centers.
    gamma : np.ndarray
        Estimated variogram values.
    """
    # Generate the full grid of coordinates.
    x = np.arange(data.shape[0])
    y = np.arange(data.shape[1])
    z = np.arange(data.shape[2])
    pos_x, pos_y, pos_z = np.meshgrid(x, y, z, indexing="ij")
    pos_all = np.vstack((pos_x.flatten(), pos_y.flatten(), pos_z.flatten()))
    
    # Create the full field and mask.
    full_field = data.flatten()
    full_mask = mask.flatten()
    
    # Select only the unmasked (valid) indices.
    valid_idx = np.flatnonzero(~full_mask)
    pos_valid = pos_all[:, valid_idx]
    field_valid = np.atleast_2d(full_field[valid_idx])
    
    # Now call gstools without a mask (or with np.ma.nomask) because we've pre-filtered.
    bin_centers, gamma = gs.vario_estimate(
        pos=pos_valid,
        field=field_valid,
        mask=np.ma.nomask,
        bin_edges=bins,
        mesh_type="unstructured",
        sampling_size=sampling_size,
        sampling_seed=sampling_seed,
    )
    return bin_centers, gamma

I hope this helps, and i also hope I'm doing something wrong. Thank you for reading.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions