boltzkit.utils.dataset

Classes

Dataset

Dataset container for samples drawn from a Boltzmann distribution.

class boltzkit.utils.dataset.Dataset[source]

Bases: object

Dataset container for samples drawn from a Boltzmann distribution.

The class stores samples together with thermodynamic quantities such as energies, log-probabilities, scores, or forces. Missing quantities are computed lazily from the available ones using the thermodynamic relations between energy, probability, and forces.

The following relationships are assumed:

log p(x) = -E(x) / (k_B T)
score(x) = ∇_x log p(x)
force(x) = -∇_x E(x)

which implies

score(x) = force(x) / (k_B T)
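
As a quick sanity check of the relations above, the following sketch verifies score(x) = force(x) / (k_B T) numerically for a toy harmonic potential. The potential, the spring constant `k`, and the helper `numerical_grad` are illustrative assumptions and are not part of boltzkit.

```python
import numpy as np

kB_T = 2.5  # thermal energy in arbitrary units (illustrative value)
k = 3.0     # spring constant of a toy harmonic potential (assumption)

def energy(x):
    # E(x) = 0.5 * k * |x|^2 per sample; x has shape (batch, d)
    return 0.5 * k * np.sum(x**2, axis=1)

def log_prob(x):
    # log p(x) = -E(x) / (k_B T), as assumed by the class
    return -energy(x) / kB_T

def numerical_grad(f, x, eps=1e-6):
    # Central finite differences of a per-sample scalar function f
    grad = np.zeros_like(x)
    for j in range(x.shape[1]):
        step = np.zeros_like(x)
        step[:, j] = eps
        grad[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))

force = -numerical_grad(energy, x)    # force(x) = -grad_x E(x)
score = numerical_grad(log_prob, x)   # score(x) = grad_x log p(x)

# The two quantities differ only by the factor 1 / (k_B T).
print(np.allclose(score, force / kB_T, atol=1e-6))
```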

Internally, all vector quantities are represented in flattened form with shape (batch, d). Inputs with atomic coordinate shape (batch, n_atoms, 3) are automatically reshaped to this representation.

The class performs consistency checks on batch dimensions and ensures that incompatible quantities (e.g., energies and log-probabilities, or scores and forces) are not provided simultaneously.

Either energies or log-probabilities may be provided, but not both. Likewise, either scores or forces may be provided.
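
The mutual-exclusion rule can be pictured with a small validator. This is a minimal sketch under stated assumptions: `check_inputs` is a hypothetical helper, and the use of `ValueError` is a guess, since the exception type raised by `Dataset` is not documented here.

```python
import numpy as np

def check_inputs(kB_T, energies=None, log_probs=None, scores=None, forces=None):
    """Hypothetical validator mirroring the documented constraints."""
    if kB_T <= 0:
        raise ValueError("kB_T must be positive")
    if energies is not None and log_probs is not None:
        raise ValueError("provide either energies or log_probs, not both")
    if scores is not None and forces is not None:
        raise ValueError("provide either scores or forces, not both")

energies = np.array([0.1, 0.4])

# Energies alone are accepted.
check_inputs(1.0, energies=energies)

# Energies together with log-probabilities are rejected.
try:
    check_inputs(1.0, energies=energies, log_probs=-energies)
    rejected = False
except ValueError:
    rejected = True
```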

Parameters:

kB_T (float) – Thermal energy (Boltzmann constant times temperature). Must be positive.
samples (ndarray, optional) – Sample configurations with shape (batch, d) or (batch, n_atoms, 3).
log_probs (ndarray, optional) – Log-probabilities of samples with shape (batch,) or (batch, 1).
energies (ndarray, optional) – Energies of samples with shape (batch,) or (batch, 1).
scores (ndarray, optional) – Score function values ∇_x log p(x) with shape (batch, d) or (batch, n_atoms, 3).
forces (ndarray, optional) – Forces -∇_x E(x) with shape (batch, d) or (batch, n_atoms, 3).

Notes

All provided arrays must share the same first dimension (batch size).
Arrays with shape (batch, n_atoms, 3) are flattened internally to shape (batch, n_atoms * 3).
Arrays with shape (batch, 1) are converted to shape (batch,).
Missing quantities are computed on demand from the available ones.
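
The shape conventions in these notes can be sketched with plain NumPy. The helpers `normalize_vector_quantity` and `normalize_scalar_quantity` are hypothetical names introduced for illustration; they are not boltzkit functions and the library may handle edge cases differently.

```python
import numpy as np

def normalize_vector_quantity(arr):
    # (batch, n_atoms, 3) -> (batch, n_atoms * 3); (batch, d) is kept as is
    arr = np.asarray(arr)
    if arr.ndim == 3 and arr.shape[-1] == 3:
        return arr.reshape(arr.shape[0], -1)
    if arr.ndim == 2:
        return arr
    raise ValueError(f"unexpected shape {arr.shape}")

def normalize_scalar_quantity(arr):
    # (batch, 1) -> (batch,); (batch,) is kept as is
    arr = np.asarray(arr)
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr[:, 0]
    if arr.ndim == 1:
        return arr
    raise ValueError(f"unexpected shape {arr.shape}")

samples = normalize_vector_quantity(np.zeros((5, 22, 3)))
energies = normalize_scalar_quantity(np.zeros((5, 1)))

# All arrays must share the same batch size (first dimension).
if samples.shape[0] != energies.shape[0]:
    raise ValueError("batch sizes differ")

print(samples.shape, energies.shape)  # (5, 66) (5,)
```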

property size

get_samples(length: int = -1) → ndarray | None[source]

Sample configurations.

Returns: Array with shape (batch, d), or None if no samples were provided.
Return type: ndarray or None

get_log_probs(length: int = -1) → ndarray | None[source]

Log-probabilities of the samples.

If log-probabilities were not provided during initialization but energies were given, they are computed as

log p(x) = -E(x) / (k_B T)

Returns: Array with shape (batch,), or None if neither energies nor log-probabilities are available.
Return type: ndarray or None

get_energies(length: int = -1) → ndarray | None[source]

Energies of the samples.

If energies were not provided during initialization but log-probabilities were given, they are computed as

E(x) = -log p(x) * (k_B T)

Returns: Array with shape (batch,), or None if neither energies nor log-probabilities are available.
Return type: ndarray or None
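
The lazy conversion between energies and log-probabilities can be pictured with a small stand-in class. `LazyScalars` is a hypothetical illustration, not the boltzkit implementation; in particular, caching the derived array after the first access is an assumption.

```python
import numpy as np

class LazyScalars:
    """Hypothetical stand-in that derives one scalar quantity from the other."""

    def __init__(self, kB_T, energies=None, log_probs=None):
        self.kB_T = kB_T
        self._energies = energies
        self._log_probs = log_probs

    def get_log_probs(self):
        # log p(x) = -E(x) / (k_B T), computed only when first requested
        if self._log_probs is None and self._energies is not None:
            self._log_probs = -self._energies / self.kB_T
        return self._log_probs

    def get_energies(self):
        # E(x) = -log p(x) * (k_B T), computed only when first requested
        if self._energies is None and self._log_probs is not None:
            self._energies = -self._log_probs * self.kB_T
        return self._energies

ds = LazyScalars(kB_T=0.5, energies=np.array([0.0, 1.0, 2.5]))
log_probs = ds.get_log_probs()

# Starting from the derived log-probabilities recovers the energies.
ds_back = LazyScalars(kB_T=0.5, log_probs=log_probs)
energies_back = ds_back.get_energies()

# With neither quantity available, both getters return None.
empty = LazyScalars(kB_T=0.5)
```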

get_scores(length: int = -1) → ndarray | None[source]

Score function values ∇_x log p(x).

If scores were not provided but forces were given, they are computed as

score(x) = force(x) / (k_B T)

Returns: Array with shape (batch, d), or None if neither scores nor forces are available.
Return type: ndarray or None

get_forces(length: int = -1) → ndarray | None[source]

Forces acting on the samples.

Forces are defined as

force(x) = -∇_x E(x)

If forces were not provided but scores were given, they are computed as

force(x) = score(x) * (k_B T)

Returns: Array with shape (batch, d), or None if neither forces nor scores are available.
Return type: ndarray or None
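
The score and force conversions are elementwise rescalings by k_B T. A minimal NumPy sketch with made-up values, independent of the boltzkit implementation:

```python
import numpy as np

kB_T = 2.0  # illustrative thermal energy

# Forces for two samples in flattened form, shape (batch, d)
forces = np.array([[1.0, -2.0, 0.5],
                   [0.0, 4.0, -1.0]])

scores = forces / kB_T        # score(x) = force(x) / (k_B T)
forces_back = scores * kB_T   # force(x) = score(x) * (k_B T)

print(scores)
```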

scale_domain(length_scale: float) → Dataset[source]

Creates a new dataset with a rescaled spatial domain.

The transformation rescales the coordinate samples and all related differential quantities consistently according to the change of variables:

\[x_{\text{scaled}} = \frac{x}{L}\]

where \(L\) is the provided length_scale.

Under this transformation:

Samples/coordinates are divided by length_scale

Scores are multiplied by length_scale

Forces are multiplied by length_scale

Log probabilities, energies, and thermodynamic quantities are left unchanged.

Parameters:

length_scale (float) – Spatial scaling factor \(L\) used to rescale the dataset domain. A value greater than 1 contracts the coordinate representation, while a value smaller than 1 expands it.

Returns: A new dataset instance with rescaled samples, scores, and forces.
Return type: Dataset

This operation does NOT modify the dataset in-place.
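
The scaling rules follow from the chain rule: with x_scaled = x / L, a gradient with respect to x_scaled equals L times the gradient with respect to x. The sketch below checks this on a toy harmonic potential using plain NumPy arrays. It does not call `scale_domain`; the potential and the constants are illustrative assumptions.

```python
import numpy as np

kB_T, k, L = 1.5, 2.0, 10.0  # illustrative constants

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 6))
x_original = x.copy()

# Toy harmonic potential E(x) = 0.5 * k * |x|^2
forces = -k * x
scores = forces / kB_T

# Documented rules: divide samples by L, multiply scores and forces by L.
x_scaled = x / L
forces_scaled = forces * L
scores_scaled = scores * L

# In scaled coordinates E'(x') = E(L x') = 0.5 * k * L**2 * |x'|^2,
# so the analytic force is -k * L**2 * x'.
expected_forces = -k * L**2 * x_scaled

print(np.allclose(forces_scaled, expected_forces))
print(np.allclose(scores_scaled, expected_forces / kB_T))

# New arrays are created; the original samples are left untouched.
print(np.array_equal(x, x_original))
```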