Mixture of Gaussians
Overview
The boltzkit.targets.gaussian_mixture.DiagonalGaussianMixture class defines a Gaussian mixture model (GMM) with diagonal covariance matrices.
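Each component is a multivariate normal distribution with independent dimensions:
\[ \mathcal{N}(x \mid \mu_k, \Sigma_k) = \prod_{d=1}^{D} \mathcal{N}\left(x_d \mid \mu_{k,d}, \sigma_{k,d}^2\right), \qquad \Sigma_k = \mathrm{diag}\left(\sigma_{k,1}^2, \dots, \sigma_{k,D}^2\right) \]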
The mixture weights are parameterized via logits and normalized internally using a log-softmax.
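To make the weight parameterization concrete, here is a minimal NumPy sketch of a log-softmax. The helper name `log_softmax` is chosen for illustration and is not taken from boltzkit; the library's internal implementation may differ.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis (illustrative helper)."""
    logits = np.asarray(logits, dtype=float)
    # Subtracting the maximum avoids overflow in exp without changing the result.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

logits = np.array([0.4, 0.6])
log_weights = log_softmax(logits)
weights = np.exp(log_weights)  # normalized mixture weights, sum to 1
print(weights)
```

Logits are unnormalized log-weights, so `[0.4, 0.6]` yields weights of roughly 0.45 and 0.55 rather than 0.4 and 0.6. To obtain specific weights, pass their logarithms as logits.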
Instantiation
A mixture model can be created directly from parameters:
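import numpy as np
from boltzkit.targets import DiagonalGaussianMixture

# Component means, shape (K, D) with K=2 components in D=2 dimensions
means = np.array([
    [-2.0, -2.0],
    [ 2.0,  2.0],
])
# Per-dimension standard deviations, shape (K, D), strictly positive
diag_stds = np.array([
    [0.8, 0.8],
    [0.5, 0.9],
])
# Unnormalized log-weights, shape (K,); normalized internally
logits = np.array([0.4, 0.6])

target = DiagonalGaussianMixture(means, diag_stds, logits)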
Structure
means: shape (K, D)
diag_stds: shape (K, D), strictly positive
logits: shape (K,) (normalized internally)
Factory constructors
Isotropic mixture
All components share the same standard deviation:
target = DiagonalGaussianMixture.create_isotropic(
    means=means,
    std=1.0,
    logits=logits
)
Uniform isotropic mixture (e.g., GMM40)
Means are sampled uniformly in a range and weights are equal:
target = DiagonalGaussianMixture.create_isotropic_uniform(
    std=1.0,
    n_components=40,
    dim=2,
    mean_range=(-40.0, 40.0),
    seed=0
)
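As a rough illustration of what such a constructor has to produce, the sketch below builds the three parameter arrays with plain NumPy. The helper `uniform_isotropic_params` is hypothetical and not part of boltzkit. It also assumes NumPy's default generator, so it should not be expected to reproduce the means that `create_isotropic_uniform` draws for the same seed.

```python
import numpy as np

def uniform_isotropic_params(std, n_components, dim, mean_range, seed):
    """Hypothetical helper: build (means, diag_stds, logits) for a uniform isotropic GMM."""
    rng = np.random.default_rng(seed)
    low, high = mean_range
    # Means drawn independently and uniformly from [low, high) in every dimension.
    means = rng.uniform(low, high, size=(n_components, dim))
    # Isotropic: the same standard deviation for every component and dimension.
    diag_stds = np.full((n_components, dim), float(std))
    # Equal logits give equal mixture weights 1 / n_components after normalization.
    logits = np.zeros(n_components)
    return means, diag_stds, logits

means, diag_stds, logits = uniform_isotropic_params(
    std=1.0, n_components=40, dim=2, mean_range=(-40.0, 40.0), seed=0
)
print(means.shape, diag_stds.shape, logits.shape)
```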
Predefined GMM (toy benchmark)
A standard test configuration:
target = DiagonalGaussianMixture.create_gmm40()
Sampling
Samples can be drawn directly from the mixture:
samples = target.sample(n_samples=1000, seed=0)
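Sampling from a mixture is usually done in two steps: pick a component according to the mixture weights, then draw from that component's Gaussian. The NumPy sketch below shows this ancestral sampling scheme. The function `sample_diag_gmm` is an illustrative stand-in, not boltzkit's implementation, and will not reproduce the samples returned by `target.sample` for the same seed.

```python
import numpy as np

def sample_diag_gmm(means, diag_stds, logits, n_samples, seed=0):
    """Ancestral sampling from a diagonal Gaussian mixture (illustrative sketch)."""
    means = np.asarray(means, dtype=float)
    diag_stds = np.asarray(diag_stds, dtype=float)
    logits = np.asarray(logits, dtype=float)
    rng = np.random.default_rng(seed)
    # Normalize logits to mixture weights (softmax computed in log space).
    weights = np.exp(logits - np.logaddexp.reduce(logits))
    # Step 1: choose a component index for every sample.
    ks = rng.choice(len(weights), size=n_samples, p=weights)
    # Step 2: scale and shift standard normal noise per dimension.
    eps = rng.standard_normal((n_samples, means.shape[1]))
    return means[ks] + diag_stds[ks] * eps

means = np.array([[-2.0, -2.0], [2.0, 2.0]])
diag_stds = np.array([[0.8, 0.8], [0.5, 0.9]])
logits = np.array([0.4, 0.6])
samples = sample_diag_gmm(means, diag_stds, logits, n_samples=1000, seed=0)
print(samples.shape)
```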
Dataset generation
A deterministic synthetic dataset can be generated via:
dataset = target.load_dataset(
    type="val",
    length=1000,
    # + optional arguments to automatically include log_probs and/or scores
)
Properties:
Deterministic: a fixed (type, seed) pair defines the entire infinite sequence of samples
Reproducible across calls
Supports train, val, and test splits
Returns the first length samples of the sequence; increasing length preserves the existing prefix and appends additional samples. Samples are generated procedurally, so nothing is cached.
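One way to obtain a prefix-preserving sequence without caching is to generate samples in fixed-size blocks, each with its own seed derived from the split, the user seed, and the block index. The sketch below illustrates that idea with standard normal noise as a stand-in for mixture samples. The names `BLOCK`, `SPLIT_IDS`, and `procedural_normal` are hypothetical, and this is not necessarily how boltzkit implements `load_dataset`.

```python
import numpy as np

BLOCK = 256  # samples generated per block (arbitrary choice for this sketch)
SPLIT_IDS = {"train": 0, "val": 1, "test": 2}

def procedural_normal(split, seed, length, dim):
    """Return the first `length` rows of an infinite, reproducible sequence."""
    n_blocks = -(-length // BLOCK)  # ceiling division
    blocks = []
    for b in range(n_blocks):
        # Each block depends only on (seed, split, block index), never on `length`.
        rng = np.random.default_rng([seed, SPLIT_IDS[split], b])
        blocks.append(rng.standard_normal((BLOCK, dim)))
    if not blocks:
        return np.empty((0, dim))
    return np.concatenate(blocks)[:length]

short = procedural_normal("val", seed=0, length=100, dim=2)
long = procedural_normal("val", seed=0, length=1000, dim=2)
print(np.array_equal(short, long[:100]))
```

Because every block is regenerated from its own seed, requesting a longer dataset recomputes the same leading blocks and appends new ones, and different splits never share a random stream.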
Interpretation
The model defines:
\[ p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \]
where:
\(\pi_k\) are normalized mixture weights
\(\Sigma_k\) are diagonal covariance matrices
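For reference, the log-density of a diagonal Gaussian mixture, a weighted sum of component densities, can be evaluated stably in log space as a log-sum-exp over components of the log-weight plus the component log-density. The NumPy sketch below assumes the parameter layout described under Structure. The function `diag_gmm_log_prob` is illustrative and not part of boltzkit.

```python
import numpy as np

def diag_gmm_log_prob(x, means, diag_stds, logits):
    """log p(x) of a diagonal GMM for x of shape (N, D); returns shape (N,)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    means = np.asarray(means, dtype=float)
    diag_stds = np.asarray(diag_stds, dtype=float)
    logits = np.asarray(logits, dtype=float)
    dim = means.shape[1]
    # Log mixture weights via log-softmax.
    log_w = logits - np.logaddexp.reduce(logits)
    # Standardized residuals for every (sample, component) pair: shape (N, K, D).
    z = (x[:, None, :] - means[None, :, :]) / diag_stds[None, :, :]
    # Independent dimensions: the component log-density is a sum over dimensions.
    log_comp = (
        -0.5 * (z ** 2).sum(axis=-1)
        - np.log(diag_stds).sum(axis=-1)
        - 0.5 * dim * np.log(2.0 * np.pi)
    )
    # Log-sum-exp over the K components.
    return np.logaddexp.reduce(log_w[None, :] + log_comp, axis=1)

means = np.array([[-2.0, -2.0], [2.0, 2.0]])
diag_stds = np.array([[0.8, 0.8], [0.5, 0.9]])
logits = np.array([0.4, 0.6])
log_p = diag_gmm_log_prob(np.array([[0.0, 0.0], [2.0, 2.0]]), means, diag_stds, logits)
print(log_p)
```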