boltzkit.utils.dataset

Classes

Dataset

Dataset container for samples drawn from a Boltzmann distribution.

class boltzkit.utils.dataset.Dataset

Bases: object

Dataset container for samples drawn from a Boltzmann distribution.

The class stores samples together with thermodynamic quantities such as energies, log-probabilities, scores, or forces. Missing quantities are computed lazily from the available ones using the thermodynamic relations between energy, probability, and forces.

The following relationships are assumed:

log p(x) = -E(x) / (k_B T)

score(x) = ∇_x log p(x)

force(x) = -∇_x E(x)

which implies

score(x) = force(x) / (k_B T)
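The implied relation can be checked numerically. The sketch below uses plain NumPy rather than boltzkit, and assumes a harmonic toy potential E(x) = 0.5 |x|^2 chosen purely for illustration. It compares a finite-difference gradient of log p(x) with force(x) / (k_B T).

```python
import numpy as np

kB_T = 2.5  # illustrative thermal energy, arbitrary units


def energy(x):
    # Toy harmonic potential (an assumption for this example): E(x) = 0.5 * |x|^2
    return 0.5 * np.sum(x**2, axis=-1)


def log_prob(x):
    # log p(x) = -E(x) / (k_B T)
    return -energy(x) / kB_T


def force(x):
    # Analytic force of the toy potential: -grad E(x) = -x
    return -x


def finite_difference_grad(f, x, eps=1e-6):
    # Central differences of a per-sample scalar function, one coordinate at a time
    grad = np.zeros_like(x)
    for i in range(x.shape[-1]):
        step = np.zeros(x.shape[-1])
        step[i] = eps
        grad[:, i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # shape (batch, d)

score_numeric = finite_difference_grad(log_prob, x)  # grad_x log p(x)
score_from_force = force(x) / kB_T                   # force(x) / (k_B T)
```

The two arrays should agree to within finite-difference error, which is what allows either scores or forces to be stored and the other derived.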

Internally, all vector quantities are represented in flattened form with shape (batch, d). Inputs with atomic coordinate shape (batch, n_atoms, 3) are automatically reshaped to this representation.

The class performs consistency checks on batch dimensions and ensures that incompatible quantities (e.g., energies and log-probabilities, or scores and forces) are not provided simultaneously.
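A minimal sketch of what such checks might look like is given below. The helper `check_inputs` is hypothetical and not part of boltzkit, and the choice of `ValueError` is an assumption; the library may report these conditions differently.

```python
import numpy as np


def check_inputs(*, samples=None, log_probs=None, energies=None,
                 scores=None, forces=None):
    """Hypothetical validation helper (not the boltzkit implementation)."""
    # Energies and log-probabilities determine each other, so accept only one
    if log_probs is not None and energies is not None:
        raise ValueError("Provide either energies or log_probs, not both.")
    # The same holds for scores and forces
    if scores is not None and forces is not None:
        raise ValueError("Provide either scores or forces, not both.")

    provided = {
        "samples": samples, "log_probs": log_probs, "energies": energies,
        "scores": scores, "forces": forces,
    }
    # Collect the first dimension (batch size) of every array that was given
    batch_sizes = {name: np.shape(arr)[0]
                   for name, arr in provided.items() if arr is not None}
    if len(set(batch_sizes.values())) > 1:
        raise ValueError(f"Inconsistent batch sizes: {batch_sizes}")
    # Return the common batch size, or 0 if nothing was provided
    return next(iter(batch_sizes.values()), 0)


batch = check_inputs(samples=np.zeros((5, 6)), energies=np.zeros(5))
```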

__init__(kB_T: float, *, samples: ndarray | None = None, log_probs: ndarray | None = None, energies: ndarray | None = None, scores: ndarray | None = None, forces: ndarray | None = None)

This class stores samples together with associated energies, log-probabilities, scores, or forces. Related quantities are computed lazily when accessed.

The following physical relationships are assumed:

  • log p(x) = -E(x) / (k_B T)

  • score(x) = ∇_x log p(x)

  • force(x) = -∇_x E(x)

which implies

  • score(x) = force(x) / (k_B T)

Either energies or log-probabilities may be provided, but not both. Likewise, either scores or forces may be provided.

Parameters:
  • kB_T (float) – Thermal energy (Boltzmann constant times temperature). Must be positive.

  • samples (ndarray, optional) – Sample configurations with shape (batch, d) or (batch, n_atoms, 3).

  • log_probs (ndarray, optional) – Log-probabilities of samples with shape (batch,) or (batch, 1).

  • energies (ndarray, optional) – Energies of samples with shape (batch,) or (batch, 1).

  • scores (ndarray, optional) – Score function values ∇_x log p(x) with shape (batch, d) or (batch, n_atoms, 3).

  • forces (ndarray, optional) – Forces -∇_x E(x) with shape (batch, d) or (batch, n_atoms, 3).

Notes

  • All provided arrays must share the same first dimension (batch size).

  • Arrays with shape (batch, n_atoms, 3) are flattened internally to shape (batch, n_atoms * 3).

  • Arrays with shape (batch, 1) are converted to shape (batch,).

  • Missing quantities are computed on demand from the available ones.
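The shape conventions in these notes can be sketched with plain NumPy. The helpers `flatten_vector` and `flatten_scalar` are hypothetical names introduced for illustration; they show one way to obtain the documented shapes and are not the library's own code.

```python
import numpy as np


def flatten_vector(arr):
    """Hypothetical helper: bring a vector quantity to shape (batch, d)."""
    arr = np.asarray(arr)
    if arr.ndim == 3 and arr.shape[2] == 3:
        # (batch, n_atoms, 3) -> (batch, n_atoms * 3)
        return arr.reshape(arr.shape[0], -1)
    if arr.ndim == 2:
        return arr
    raise ValueError(f"Unsupported vector shape: {arr.shape}")


def flatten_scalar(arr):
    """Hypothetical helper: bring a per-sample scalar to shape (batch,)."""
    arr = np.asarray(arr)
    if arr.ndim == 2 and arr.shape[1] == 1:
        # (batch, 1) -> (batch,)
        return arr[:, 0]
    if arr.ndim == 1:
        return arr
    raise ValueError(f"Unsupported scalar shape: {arr.shape}")


coords = np.zeros((8, 22, 3))   # 8 samples of a 22-atom system
energies = np.zeros((8, 1))

flat_coords = flatten_vector(coords)
flat_energies = flatten_scalar(energies)
```

Here `flat_coords` has shape (8, 66) and `flat_energies` has shape (8,), matching the internal representation described above.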

property size

get_samples(length: int = -1) → ndarray | None

Sample configurations.

Returns:

Array with shape (batch, d), or None if no samples were provided.

Return type:

ndarray or None

get_log_probs(length: int = -1) → ndarray | None

Log-probabilities of the samples.

If log-probabilities were not provided during initialization but energies were given, they are computed as

log p(x) = -E(x) / (k_B T)

Returns:

Array with shape (batch,), or None if neither energies nor log-probabilities are available.

Return type:

ndarray or None

get_energies(length: int = -1) → ndarray | None

Energies of the samples.

If energies were not provided during initialization but log-probabilities were given, they are computed as

E(x) = -log p(x) * (k_B T)

Returns:

Array with shape (batch,), or None if neither energies nor log-probabilities are available.

Return type:

ndarray or None
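The conversion between energies and log-probabilities is a single rescaling by k_B T. The following NumPy sketch uses made-up values and does not call boltzkit. It shows the round trip from energies to log-probabilities and back.

```python
import numpy as np

kB_T = 0.6                             # illustrative thermal energy
energies = np.array([0.0, 1.2, 3.0])   # shape (batch,)

# log p(x) = -E(x) / (k_B T)
log_probs = -energies / kB_T

# E(x) = -log p(x) * (k_B T) recovers the original energies
energies_back = -log_probs * kB_T
```

Because the two quantities determine each other once k_B T is fixed, storing one of them is sufficient.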

get_scores(length: int = -1) → ndarray | None

Score function values ∇_x log p(x).

If scores were not provided but forces were given, they are computed as

score(x) = force(x) / (k_B T)

Returns:

Array with shape (batch, d), or None if neither scores nor forces are available.

Return type:

ndarray or None

get_forces(length: int = -1) → ndarray | None

Forces acting on the samples.

Forces are defined as

force(x) = -∇_x E(x)

If forces were not provided but scores were given, they are computed as

force(x) = score(x) * (k_B T)

Returns:

Array with shape (batch, d), or None if neither forces nor scores are available.

Return type:

ndarray or None
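Scores and forces are related in the same way, with the factor k_B T applied elementwise to arrays of shape (batch, d). The sketch below uses illustrative values and plain NumPy, not the boltzkit API.

```python
import numpy as np

kB_T = 2.0
forces = np.array([[1.0, -2.0, 0.5],
                   [0.0, 4.0, -1.0]])  # shape (batch, d)

# score(x) = force(x) / (k_B T)
scores = forces / kB_T

# force(x) = score(x) * (k_B T) recovers the original forces
forces_back = scores * kB_T
```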

scale_domain(length_scale: float) → Dataset

Creates a new dataset with a rescaled spatial domain.

The transformation rescales the coordinate samples and all related differential quantities consistently according to the change of variables:

\[x_{scaled} = \frac{x}{L}\]

where \(L\) is the provided length_scale.

Under this transformation:

  • Samples/coordinates are divided by length_scale

  • Scores are multiplied by length_scale

  • Forces are multiplied by length_scale

Log probabilities, energies, and thermodynamic quantities are left unchanged.

Parameters:
  • length_scale (float) – Spatial scaling factor \(L\) used to rescale the dataset domain. A value greater than 1 contracts the coordinate representation, while a value smaller than 1 expands it.

Returns:

A new dataset instance with rescaled samples, scores, and forces.

Return type:

Dataset

This operation does NOT modify the dataset in-place.
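The rescaling rules follow from the chain rule: with y = x / L, a gradient with respect to y equals L times the gradient with respect to x. The sketch below applies the documented rules to raw NumPy arrays through a hypothetical helper `scale_arrays` (not part of boltzkit) and checks them against a harmonic toy potential, which is an assumption made for illustration.

```python
import numpy as np


def scale_arrays(length_scale, samples, scores, forces):
    """Hypothetical helper applying the documented rules to raw arrays.

    Returns new arrays; the inputs are left untouched.
    """
    return (samples / length_scale,
            scores * length_scale,
            forces * length_scale)


kB_T = 1.0
L = 10.0  # e.g. converting to a unit that is ten times larger

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # samples in the original units
forces = -x                   # toy potential E(x) = 0.5 * |x|^2
scores = forces / kB_T

x_s, scores_s, forces_s = scale_arrays(L, x, scores, forces)

# In scaled coordinates y = x / L the toy energy is 0.5 * L^2 * |y|^2,
# so the force with respect to y is -L^2 * y
expected_forces = -(L**2) * x_s
```

For this potential, `forces_s` matches `expected_forces`, and the relation score = force / (k_B T) continues to hold after rescaling.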