boltzkit.utils.dataset
Classes
Dataset – Dataset container for samples drawn from a Boltzmann distribution.
- class boltzkit.utils.dataset.Dataset
Bases: object
Dataset container for samples drawn from a Boltzmann distribution.
The class stores samples together with thermodynamic quantities such as energies, log-probabilities, scores, or forces. Missing quantities are computed lazily from the available ones using the thermodynamic relations between energy, probability, and forces.
The following relationships are assumed (log p(x) is taken up to its normalization constant):
log p(x) = -E(x) / (k_B T)
score(x) = ∇_x log p(x)
force(x) = -∇_x E(x)
which implies
score(x) = force(x) / (k_B T)
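The implied relation is a one-line consequence of the first two definitions. Writing the normalized Boltzmann density as p(x) = exp(-E(x) / (k_B T)) / Z, with Z the partition function (introduced here only for this derivation), the constant log Z drops out under the gradient:

\[
\nabla_x \log p(x) = \nabla_x\!\left(-\frac{E(x)}{k_B T} - \log Z\right) = -\frac{\nabla_x E(x)}{k_B T} = \frac{\mathrm{force}(x)}{k_B T}
\]

so score(x) = force(x) / (k_B T) holds whether or not log p(x) is normalized.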
Internally, all vector quantities are represented in flattened form with shape (batch, d). Inputs with atomic coordinate shape (batch, n_atoms, 3) are automatically reshaped to this representation.
The class performs consistency checks on batch dimensions and ensures that redundant quantities (energies and log-probabilities, or scores and forces) are not provided simultaneously.
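The validation rules above can be sketched in a few lines of NumPy. This is a minimal, hypothetical helper (`check_inputs` is not part of boltzkit and is not its implementation); it only mirrors the documented rules: each redundant pair is mutually exclusive, and every provided array must share the same leading batch dimension.

```python
import numpy as np


def check_inputs(samples=None, log_probs=None, energies=None, scores=None, forces=None):
    """Hypothetical validation mirroring the documented rules (not boltzkit's code).

    Returns the common batch size, or 0 if nothing was provided.
    """
    # Energies and log-probabilities carry the same information (log p = -E / kB_T).
    if energies is not None and log_probs is not None:
        raise ValueError("provide either energies or log_probs, not both")
    # Scores and forces are likewise redundant (score = force / kB_T).
    if scores is not None and forces is not None:
        raise ValueError("provide either scores or forces, not both")
    provided = [a for a in (samples, log_probs, energies, scores, forces) if a is not None]
    # Collect the leading (batch) dimension of every provided array.
    batch_sizes = {np.asarray(a).shape[0] for a in provided}
    if len(batch_sizes) > 1:
        raise ValueError(f"inconsistent batch sizes: {sorted(batch_sizes)}")
    return batch_sizes.pop() if batch_sizes else 0
```

For example, `check_inputs(samples=np.zeros((4, 6)), energies=np.zeros(4))` returns 4, while passing both `energies` and `log_probs` raises a `ValueError`.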
- __init__(kB_T: float, *, samples: ndarray | None = None, log_probs: ndarray | None = None, energies: ndarray | None = None, scores: ndarray | None = None, forces: ndarray | None = None)
This class stores samples together with associated energies, log-probabilities, scores, or forces. Related quantities are computed lazily when accessed.
The following physical relationships are assumed:
log p(x) = -E(x) / (k_B T)
score(x) = ∇_x log p(x)
force(x) = -∇_x E(x)
which implies
score(x) = force(x) / (k_B T)
Either energies or log-probabilities may be provided, but not both. Likewise, either scores or forces may be provided, but not both.
- Parameters:
kB_T (float) – Thermal energy (Boltzmann constant times temperature). Must be positive.
samples (ndarray, optional) – Sample configurations with shape (batch, d) or (batch, n_atoms, 3).
log_probs (ndarray, optional) – Log-probabilities of samples with shape (batch,) or (batch, 1).
energies (ndarray, optional) – Energies of samples with shape (batch,) or (batch, 1).
scores (ndarray, optional) – Score function values ∇_x log p(x) with shape (batch, d) or (batch, n_atoms, 3).
forces (ndarray, optional) – Forces -∇_x E(x) with shape (batch, d) or (batch, n_atoms, 3).
Notes
All provided arrays must share the same first dimension (batch size).
Arrays with shape (batch, n_atoms, 3) are flattened internally to shape (batch, n_atoms * 3).
Arrays with shape (batch, 1) are converted to shape (batch,).
Missing quantities are computed on demand from the available ones.
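The shape conventions in these notes can be illustrated with two small NumPy helpers. Both names are hypothetical, and the sketch assumes a C-order (row-major) flattening, so coordinate k of atom i lands in column 3 * i + k; the library's actual internal ordering is not stated here.

```python
import numpy as np


def flatten_vectors(arr):
    """Hypothetical helper: (batch, n_atoms, 3) -> (batch, n_atoms * 3); (batch, d) passes through."""
    arr = np.asarray(arr)
    if arr.ndim == 3 and arr.shape[-1] == 3:
        # C-order reshape (assumed): coordinate k of atom i lands at column 3 * i + k.
        return arr.reshape(arr.shape[0], -1)
    if arr.ndim == 2:
        return arr
    raise ValueError(f"expected (batch, d) or (batch, n_atoms, 3), got {arr.shape}")


def flatten_scalars(arr):
    """Hypothetical helper: (batch, 1) -> (batch,); (batch,) passes through."""
    arr = np.asarray(arr)
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr[:, 0]
    if arr.ndim == 1:
        return arr
    raise ValueError(f"expected (batch,) or (batch, 1), got {arr.shape}")


coords = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)  # (batch=2, n_atoms=4, 3)
flat = flatten_vectors(coords)                            # (2, 12)
energies = flatten_scalars(np.array([[1.0], [2.0]]))      # (2,)
```

Because the reshape is lossless, `flat.reshape(2, 4, 3)` recovers the original atomic layout.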
- property size
- get_samples(length: int = -1) → ndarray | None
Sample configurations.
- Returns:
Array with shape (batch, d), or None if no samples were provided.
- Return type:
ndarray or None
- get_log_probs(length: int = -1) → ndarray | None
Log-probabilities of the samples.
If log-probabilities were not provided during initialization but energies were given, they are computed as
log p(x) = -E(x) / (k_B T)
- Returns:
Array with shape (batch,), or None if neither energies nor log-probabilities are available.
- Return type:
ndarray or None
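The lazy conversion between energies and log-probabilities can be sketched as follows. `LazyEnergyView` is a hypothetical class, not boltzkit's implementation; caching the converted array after the first request is a design choice of this sketch, not something the documentation guarantees.

```python
import numpy as np


class LazyEnergyView:
    """Hypothetical sketch of lazy log-prob/energy conversion (not boltzkit's code)."""

    def __init__(self, kB_T, energies=None, log_probs=None):
        if kB_T <= 0:
            raise ValueError("kB_T must be positive")
        if energies is not None and log_probs is not None:
            raise ValueError("provide either energies or log_probs, not both")
        self.kB_T = kB_T
        self._energies = energies
        self._log_probs = log_probs

    def get_log_probs(self):
        # Compute log p(x) = -E(x) / kB_T on first request, then cache the result.
        if self._log_probs is None and self._energies is not None:
            self._log_probs = -np.asarray(self._energies) / self.kB_T
        return self._log_probs  # None if neither quantity was given

    def get_energies(self):
        # Compute E(x) = -log p(x) * kB_T on first request, then cache the result.
        if self._energies is None and self._log_probs is not None:
            self._energies = -np.asarray(self._log_probs) * self.kB_T
        return self._energies  # None if neither quantity was given


view = LazyEnergyView(2.0, energies=np.array([1.0, 4.0]))
log_probs = view.get_log_probs()  # computed here, from energies
```

With kB_T = 2.0, energies [1.0, 4.0] map to log-probabilities [-0.5, -2.0].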
- get_energies(length: int = -1) → ndarray | None
Energies of the samples.
If energies were not provided during initialization but log-probabilities were given, they are computed as
E(x) = -log p(x) * (k_B T)
- Returns:
Array with shape (batch,), or None if neither energies nor log-probabilities are available.
- Return type:
ndarray or None
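Because the conversion E(x) = -log p(x) * (k_B T) has no normalization term, passing *normalized* log-probabilities (for example, from a density model) yields energies shifted by the constant k_B T log Z. The sketch below checks this for a 1D harmonic potential, where log Z has a closed form; the potential and all constants are illustrative assumptions.

```python
import numpy as np

kB_T = 2.5   # thermal energy (illustrative value)
k = 10.0     # spring constant of E(x) = 0.5 * k * x**2 (illustrative)
x = np.linspace(-1.0, 1.0, 5)

energies = 0.5 * k * x**2
# The Boltzmann density of this potential is Gaussian with variance kB_T / k,
# so its partition function is Z = sqrt(2 * pi * kB_T / k).
log_Z = 0.5 * np.log(2.0 * np.pi * kB_T / k)
normalized_log_probs = -energies / kB_T - log_Z

# Apply the documented conversion E(x) = -log p(x) * kB_T.
recovered = -normalized_log_probs * kB_T
offset = recovered - energies  # constant kB_T * log_Z for every sample
```

A constant offset leaves forces and scores unaffected, since both are gradients.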
- get_scores(length: int = -1) → ndarray | None
Score function values ∇_x log p(x).
If scores were not provided but forces were given, they are computed as
score(x) = force(x) / (k_B T)
- Returns:
Array with shape (batch, d), or None if neither scores nor forces are available.
- Return type:
ndarray or None
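The relation score(x) = force(x) / (k_B T) can be checked numerically against a finite-difference gradient of log p(x). This sketch assumes an isotropic harmonic potential E(x) = 0.5 * k * ||x||^2 in flattened (batch, d) layout; the potential and constants are illustrative only.

```python
import numpy as np

kB_T = 1.5
k = 4.0  # E(x) = 0.5 * k * ||x||^2 (illustrative potential)


def energy(x):
    return 0.5 * k * np.sum(x**2, axis=-1)


def log_prob(x):
    return -energy(x) / kB_T  # unnormalized, as in the documented relation


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 6))   # (batch, d)
forces = -k * x               # analytic force = -grad E
scores = forces / kB_T        # documented conversion

# Central finite differences of log p along each coordinate.
eps = 1e-5
fd = np.zeros_like(x)
for j in range(x.shape[1]):
    step = np.zeros(x.shape[1])
    step[j] = eps
    fd[:, j] = (log_prob(x + step) - log_prob(x - step)) / (2 * eps)
```

For a quadratic log-density the central difference is exact up to rounding, so `scores` and `fd` agree closely.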
- get_forces(length: int = -1) → ndarray | None
Forces acting on the samples.
Forces are defined as
force(x) = -∇_x E(x)
If forces were not provided but scores were given, they are computed as
force(x) = score(x) * (k_B T)
- Returns:
Array with shape (batch, d), or None if neither forces nor scores are available.
- Return type:
ndarray or None
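Forces recovered from scores should match -∇_x E(x) in the flattened layout. This sketch uses a single harmonic bond between two atoms (the potential, constants, and helper names are illustrative assumptions). It starts from atomic coordinates of shape (batch, 2, 3), flattens them to (batch, 6), applies force = score * (k_B T), and compares the result with a finite-difference gradient of the energy.

```python
import numpy as np

kB_T = 0.8
kb, r0 = 50.0, 1.0  # bond stiffness and rest length (illustrative)


def bond_energy(flat):
    # flat has shape (batch, 6): atom 0 in columns 0:3, atom 1 in columns 3:6
    pos = flat.reshape(flat.shape[0], 2, 3)
    r = np.linalg.norm(pos[:, 1] - pos[:, 0], axis=-1)
    return 0.5 * kb * (r - r0) ** 2


def bond_scores(flat):
    # Analytic score = -grad E / kB_T, returned in flattened (batch, 6) layout.
    pos = flat.reshape(flat.shape[0], 2, 3)
    diff = pos[:, 1] - pos[:, 0]
    r = np.linalg.norm(diff, axis=-1, keepdims=True)
    grad_atom1 = kb * (r - r0) * diff / r                # dE/dx1
    grad = np.stack([-grad_atom1, grad_atom1], axis=1)   # (batch, 2, 3)
    return (-grad / kB_T).reshape(flat.shape[0], -1)


rng = np.random.default_rng(1)
coords = rng.normal(size=(4, 2, 3))   # (batch, n_atoms, 3)
flat = coords.reshape(4, -1)          # (batch, d) with d = 6
forces = bond_scores(flat) * kB_T     # documented: force = score * kB_T

# Central finite differences of the energy in the flattened layout.
eps = 1e-6
fd_grad = np.zeros_like(flat)
for j in range(flat.shape[1]):
    step = np.zeros(flat.shape[1])
    step[j] = eps
    fd_grad[:, j] = (bond_energy(flat + step) - bond_energy(flat - step)) / (2 * eps)
```

`forces` should agree with `-fd_grad` to within finite-difference error.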
- scale_domain(length_scale: float) → Dataset
Creates a new dataset with a rescaled spatial domain.
The transformation rescales the coordinate samples and all related differential quantities consistently according to the change of variables:
\[x_{\text{scaled}} = \frac{x}{L}\]
where \(L\) is the provided length_scale. Under this transformation:
- Samples/coordinates are divided by length_scale.
- Scores are multiplied by length_scale.
- Forces are multiplied by length_scale.
Log probabilities, energies, and thermodynamic quantities are left unchanged.
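The multiplication rules follow from the chain rule. Write \(y = x / L\), so that \(x = L y\). The energy in scaled coordinates is then \(\tilde{E}(y) = E(L y)\), and the unnormalized log-density is \(\log \tilde{p}(y) = \log p(L y)\). The notation \(\tilde{E}, \tilde{p}\) is introduced here only for illustration. Differentiating with respect to \(y\) gives:

\[
-\nabla_y \tilde{E}(y) = -L\,(\nabla_x E)(L y) = L\,\mathrm{force}(x), \qquad
\nabla_y \log \tilde{p}(y) = L\,(\nabla_x \log p)(L y) = L\,\mathrm{score}(x).
\]

For a normalized density, the change of variables adds only the constant \(d \log L\) to \(\log \tilde{p}\). This constant does not affect scores and is consistent with leaving log-probabilities unchanged up to normalization.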
- Parameters:
length_scale (float) – Spatial scaling factor \(L\) used to rescale the dataset domain. A value greater than 1 contracts the coordinate representation, while a value smaller than 1 expands it.
- Returns:
A new dataset instance with rescaled samples, scores, and forces.
- Return type:
Dataset
This operation does NOT modify the dataset in-place.
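The rescaling rules can be sketched at the array level and verified numerically. `scale_arrays` is a hypothetical stand-in for scale_domain, not boltzkit's implementation. The check assumes a harmonic potential E(x) = 0.5 * k * ||x||^2 (illustrative). It confirms that the rescaled scores equal the gradient of the unnormalized log-density log p(L y) with respect to the scaled coordinates y, and that the inputs are left untouched.

```python
import numpy as np


def scale_arrays(length_scale, samples, scores=None, forces=None):
    """Hypothetical array-level version of the documented scale_domain rules."""
    # Arithmetic creates new arrays; the inputs are not modified in place.
    out_samples = samples / length_scale
    out_scores = None if scores is None else scores * length_scale
    out_forces = None if forces is None else forces * length_scale
    return out_samples, out_scores, out_forces


kB_T, k, L = 1.0, 3.0, 2.5
rng = np.random.default_rng(2)
x = rng.normal(size=(5, 4))
scores_x = -k * x / kB_T  # score of E(x) = 0.5 * k * ||x||^2

y, scores_y, _ = scale_arrays(L, x, scores=scores_x)


def log_prob_scaled(y):
    # Unnormalized log-density expressed in scaled coordinates: log p(L * y)
    return -0.5 * k * np.sum((L * y) ** 2, axis=-1) / kB_T


# Central finite differences with respect to the scaled coordinates.
eps = 1e-5
fd = np.zeros_like(y)
for j in range(y.shape[1]):
    step = np.zeros(y.shape[1])
    step[j] = eps
    fd[:, j] = (log_prob_scaled(y + step) - log_prob_scaled(y - step)) / (2 * eps)
```

Multiplying the scores by L, rather than dividing, is what makes `scores_y` match `fd`.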