boltzkit.utils.cached_repo

Module Attributes

Content

str: write as text; bytes: write as binary; Callable: creates the file at the given path

Functions

create_cached_repo(uri[, local_repos_dir, ...])

Creates a CachedRepo object from the given URI (Uniform Resource Identifier).

normalize_path(path)

strip_repo_prefix(full_path, repo_root)

Returns full_path relative to the given repo root.

Classes

CachedRepo

Abstract base class representing a cached repository.

HuggingfaceRepo

LocalRepo

VirtualRepo

Creates a cache directory from in-memory content, i.e., the cache directory is not backed by a directory or repository.

boltzkit.utils.cached_repo.strip_repo_prefix(full_path: str, repo_root: str) → str

Returns full_path relative to the given repo root.
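The page does not describe the behaviour beyond this one line. The following is a minimal sketch of the idea, assuming POSIX-style paths and a full_path that lies under repo_root. The name strip_repo_prefix_sketch is hypothetical and this is not the library's implementation.

```python
from pathlib import PurePosixPath


def strip_repo_prefix_sketch(full_path: str, repo_root: str) -> str:
    """Return full_path expressed relative to repo_root (POSIX-style)."""
    # relative_to raises ValueError when full_path is not inside repo_root.
    return PurePosixPath(full_path).relative_to(PurePosixPath(repo_root)).as_posix()


relative = strip_repo_prefix_sketch("/cache/my_repo/data/train.csv", "/cache/my_repo")
print(relative)  # data/train.csv
```

Whether the real function raises, or returns the input unchanged, for paths outside the repo root is not stated here.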

class boltzkit.utils.cached_repo.CachedRepo

Bases: ABC

Abstract base class representing a cached repository.

A CachedRepo provides a unified interface for interacting with remote repositories (e.g., Hugging Face datasets or local directories) while caching files locally for efficient repeated access.

remote_uri

The URI or path of the remote repository.

Type:

str

local_path

The local directory where files are cached.

Type:

Path

__init__(remote_uri: str, local_repo_path: Path, lazy_load: bool)

Initialize a CachedRepo instance.

Parameters:
  • remote_uri (str) – The remote repository URI or path.

  • local_repo_path (Path) – Local path where cached files will be stored.

  • lazy_load (bool) – If True, files are loaded on demand; if False, all files are loaded immediately.
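The difference between lazy and eager loading can be illustrated with a small stand-in class. This is a sketch only: ToyRepo is hypothetical, uses a plain directory as the "remote", and merely mirrors the method names listed on this page. It is not how boltzkit implements caching.

```python
import shutil
import tempfile
from pathlib import Path


class ToyRepo:
    """Hypothetical stand-in for a cached repo; not part of boltzkit."""

    def __init__(self, remote_dir: Path, local_repo_path: Path, lazy_load: bool):
        self.remote_dir = remote_dir
        self.local_path = local_repo_path
        self.local_path.mkdir(parents=True, exist_ok=True)
        if not lazy_load:
            # Eager mode: copy every remote file up front.
            self.load_all_files()

    def list_remote_files(self) -> list[str]:
        return sorted(
            p.relative_to(self.remote_dir).as_posix()
            for p in self.remote_dir.rglob("*")
            if p.is_file()
        )

    def load_file(self, relative_fpath: str) -> Path:
        target = self.local_path / relative_fpath
        if not target.exists():
            # Cache miss: fetch from the "remote" and store it locally.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(self.remote_dir / relative_fpath, target)
        return target

    def load_all_files(self) -> None:
        for rel in self.list_remote_files():
            self.load_file(rel)


def cached_names(repo: ToyRepo) -> list[str]:
    """Names of the files currently present in the repo's local cache."""
    return sorted(p.name for p in repo.local_path.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "remote"
    (remote / "sub").mkdir(parents=True)
    (remote / "a.txt").write_text("a")
    (remote / "sub" / "b.txt").write_text("b")

    lazy = ToyRepo(remote, Path(tmp) / "lazy_cache", lazy_load=True)
    eager = ToyRepo(remote, Path(tmp) / "eager_cache", lazy_load=False)

    lazy_before = cached_names(lazy)    # nothing cached yet
    eager_before = cached_names(eager)  # everything cached at construction
    lazy.load_file("sub/b.txt")
    lazy_after = cached_names(lazy)     # only the requested file
```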

post_init()

abstractmethod load_file(relative_fpath: str) → Path

try_load_file(relative_fpath: str | None) → Path | None

abstractmethod load_all_files() → None

abstractmethod list_remote_files() → list[str]

find_file(regex: str) → list[str]

Return all remote files matching the given regex pattern.

Parameters:

regex (str) – Regular expression to match against file paths.

Returns:

List of matching file paths (repo-relative).

Return type:

List[str]
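A minimal sketch of this kind of filtering over a list of repo-relative paths. The helper find_file_sketch is hypothetical, and the use of re.search (match anywhere in the path) is an assumption; the library may anchor the pattern differently.

```python
import re


def find_file_sketch(remote_files: list[str], regex: str) -> list[str]:
    """Return the paths from remote_files that match regex."""
    pattern = re.compile(regex)
    # Assumption: a match anywhere in the path counts (re.search).
    return [path for path in remote_files if pattern.search(path)]


files = ["config.json", "data/train.npz", "data/test.npz", "README.md"]
matches = find_file_sketch(files, r"\.npz$")
print(matches)  # ['data/train.npz', 'data/test.npz']
```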

property config: dict[str, Any]

abstractmethod classmethod match_uri(uri: str) → bool

abstractmethod classmethod get_name_from_uri(uri: str) → str

property remote_uri: str

property local_path: Path

get_cached_key_value_store()

class boltzkit.utils.cached_repo.HuggingfaceRepo

Bases: CachedRepo

__init__(remote_uri, local_repo_path, lazy_load)

Initialize a CachedRepo instance.

Parameters:
  • remote_uri (str) – The remote repository URI or path.

  • local_repo_path (Path) – Local path where cached files will be stored.

  • lazy_load (bool) – If True, files are loaded on demand; if False, all files are loaded immediately.

load_file(relative_fpath)

load_all_files()

list_remote_files()

classmethod match_uri(uri)

classmethod get_name_from_uri(uri)

class boltzkit.utils.cached_repo.LocalRepo

Bases: CachedRepo

__init__(remote_uri, local_repo_path, lazy_load)

Initialize a CachedRepo instance.

Parameters:
  • remote_uri (str) – The remote repository URI or path.

  • local_repo_path (Path) – Local path where cached files will be stored.

  • lazy_load (bool) – If True, files are loaded on demand; if False, all files are loaded immediately.

load_file(relative_fpath)

load_all_files()

list_remote_files()

classmethod match_uri(uri)

classmethod get_name_from_uri(uri)

boltzkit.utils.cached_repo.Content

Content that can be written to a file: str is written as text, bytes is written as binary, and a Callable receives the target path and creates the file there.

alias of str | bytes | Callable[[Path], None]
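The three cases of this alias could be handled with a simple type dispatch. The sketch below is an assumption about how such content might be written to disk; ContentSketch and write_content are hypothetical names, and UTF-8 is an assumed text encoding.

```python
import tempfile
from pathlib import Path
from typing import Callable, Union

# Mirrors the documented alias; defined locally so the sketch is self-contained.
ContentSketch = Union[str, bytes, Callable[[Path], None]]


def write_content(path: Path, content: ContentSketch) -> None:
    """Materialize content at path according to its type."""
    path.parent.mkdir(parents=True, exist_ok=True)
    if isinstance(content, str):
        path.write_text(content, encoding="utf-8")  # text mode
    elif isinstance(content, bytes):
        path.write_bytes(content)                   # binary mode
    elif callable(content):
        content(path)                               # callable creates the file itself
    else:
        raise TypeError(f"Unsupported content type: {type(content).__name__}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_content(root / "notes.txt", "hello")
    write_content(root / "blob.bin", b"\x00\x01")
    write_content(root / "made/by_callable.txt", lambda p: p.write_text("generated"))

    text_result = (root / "notes.txt").read_text(encoding="utf-8")
    bytes_result = (root / "blob.bin").read_bytes()
    callable_result = (root / "made/by_callable.txt").read_text()
```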

boltzkit.utils.cached_repo.normalize_path(path: str | PurePosixPath) → str

class boltzkit.utils.cached_repo.VirtualRepo

Bases: CachedRepo

Creates a cache directory from in-memory content, i.e., the cache directory is not backed by a directory or repository.

__init__(remote_uri, local_repo_path, lazy_load, file_tree: dict[str, str | bytes | Callable[[Path], None]])

remote_uri must have the format ‘virtual://<name>’, e.g., ‘virtual://foo’, which will create a cache dir with the name ‘virtual_foo’.
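The mapping from URI to cache directory name described above can be sketched as follows. The helper names are hypothetical, and raising ValueError on a malformed URI is an assumption; only the ‘virtual://foo’ to ‘virtual_foo’ mapping comes from this page.

```python
VIRTUAL_PREFIX = "virtual://"


def is_virtual_uri(uri: str) -> bool:
    """True if uri has the form 'virtual://<name>' with a non-empty name."""
    return uri.startswith(VIRTUAL_PREFIX) and len(uri) > len(VIRTUAL_PREFIX)


def cache_dir_name(uri: str) -> str:
    """Derive the cache directory name, e.g. 'virtual://foo' -> 'virtual_foo'."""
    if not is_virtual_uri(uri):
        # Assumption: malformed URIs are rejected; the library may behave differently.
        raise ValueError(f"Expected 'virtual://<name>', got {uri!r}")
    return "virtual_" + uri[len(VIRTUAL_PREFIX):]


print(cache_dir_name("virtual://foo"))  # virtual_foo
```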

load_file(relative_fpath)

load_all_files()

list_remote_files()

classmethod match_uri(uri)

classmethod get_name_from_uri(uri)

boltzkit.utils.cached_repo.create_cached_repo(uri: str, local_repos_dir: Path = PosixPath('target_cache'), lazy_load: bool = True, **kwargs)

Creates a CachedRepo object from the given URI (Uniform Resource Identifier). The type of the CachedRepo is automatically determined by the given URI.
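One plausible way to determine the type from the URI is to ask each candidate class whether it matches, using a match_uri classmethod like the one listed for each repo class. The sketch below is an assumption about that dispatch, not the library's code: VirtualKind, PathKind, and pick_repo_class are hypothetical, and the rule that a plain path contains no "://" is invented for illustration.

```python
class VirtualKind:
    """Hypothetical stand-in for a repo type selected by 'virtual://' URIs."""

    @classmethod
    def match_uri(cls, uri: str) -> bool:
        return uri.startswith("virtual://")


class PathKind:
    """Hypothetical stand-in for a repo type selected by plain filesystem paths."""

    @classmethod
    def match_uri(cls, uri: str) -> bool:
        # Assumption for this sketch: anything without a scheme is a path.
        return "://" not in uri


def pick_repo_class(uri: str, candidates: list[type]) -> type:
    """Return the first candidate whose match_uri accepts the URI."""
    for cls in candidates:
        if cls.match_uri(uri):
            return cls
    raise ValueError(f"No repo type matches URI: {uri!r}")


KINDS = [VirtualKind, PathKind]
print(pick_repo_class("virtual://foo", KINDS).__name__)  # VirtualKind
print(pick_repo_class("/data/my_repo", KINDS).__name__)  # PathKind
```

In this sketch the order of the candidates matters whenever more than one match_uri could accept the same URI.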