
Base Provider

Base package for the data provider layer of the library.

This module provides the base classes and exceptions for the data provider layer implementations.

Description

Productions/subproductions can be accessed through a data provider. From data providers, job entries can be accessed, created, updated, and deleted. From job entries, data entries can be accessed, created, updated, and deleted. From data entries, data can be read/written.

E.g.: Production 00001234/0000:
    - Job n°00000042:
        - Data log_data_0001.xml
        - Data job.info
        ...
    - Job 00001200:
        - Data log_data_0042.xml
        - Data log.txt
        - Data stdout.xml
        ...

Dict providers are used to store Zstandard dictionaries, used for both compression and decompression. Dict entries are used to access dictionary data, identified by a unique name. The name of a dictionary is the part common to all related files, i.e. the filenames without their numbers.

E.g.: - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.
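The exact filename-to-dictionary-name mapping is implementation-defined; a minimal sketch of the idea, assuming digits are simply masked with `x` (the function name here is illustrative, not the library's API), could look like:

```python
import re

def filename_to_dictname(filename: str) -> str:
    """Hypothetical sketch: mask every digit so numbered files share one dictionary name."""
    return re.sub(r"\d", "x", filename)

# Files differing only by their numbers map to the same dictionary name:
print(filename_to_dictname("log_data_0001.xml"))  # log_data_xxxx.xml
print(filename_to_dictname("log_data_0042.xml"))  # log_data_xxxx.xml
```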

When a provider is read-only, it is not possible to create, update, or delete anything. When a data provider is compressed (assigned to a dictionary provider), underlying data is compressed with Zstandard and the dictionary provider contains the dictionaries, used for both compression and decompression.

Some data provider implementations may not implement the related dict provider, supporting only the old "uncompressed" format. On the other hand, some implementations may only support access to the new "compressed" format (Zstandard). Some implementations may only support the read-only mode.

The "compressed" status, indicating whether the underlying data are compressed in Zstandard or stored in a raw uncompressed format, is only an indication. Whatever its type, a data provider always reads/writes data as-is without any processing, assuming data are already provided in the correct compressed/uncompressed state.
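The provider → job → data layering described above can be sketched with a toy in-memory model (class names like `MemoryProvider` are illustrative stand-ins, not the library's classes):

```python
# Toy in-memory sketch of the provider -> job -> data layering (illustrative only).
class MemoryJob:
    def __init__(self, job_id: int) -> None:
        self.job_id = job_id
        self.data: dict[str, bytes] = {}  # data name -> raw bytes

class MemoryProvider:
    def __init__(self) -> None:
        self.jobs: dict[int, MemoryJob] = {}  # job id -> job entry

    def create(self, job_id: int) -> MemoryJob:
        # Real providers raise JobExistsError / ReadOnlyError; the toy just upserts.
        return self.jobs.setdefault(job_id, MemoryJob(job_id))

# Navigate the hierarchy: provider -> job entry -> data entry -> bytes.
provider = MemoryProvider()
job = provider.create(42)
job.data["log_data_0001.xml"] = b"<log/>"
print(provider.jobs[42].data["log_data_0001.xml"])  # b'<log/>'
```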

Classes:

Name Description
- DataProvider

Base class for data provider implementations.

- DictProvider

Base class for Zstandard dictionary provider implementations.

- DataEntry

Base class for data entry objects.

- DictEntry

Base class for Zstandard dictionary entry objects.

- JobEntry

Base class for job entry objects.

- JobInfo

Base class for job information objects.

Raises:

Type Description
- DataExistsError

Exception raised when data already exists in the provider.

- DataNotExistsError

Exception raised when data does not exist in the provider.

- DictExistsError

Exception raised when a dictionary already exists in the provider.

- DictInvalidError

Exception raised when a dictionary is invalid.

- DictNotExistsError

Exception raised when a dictionary does not exist in the provider.

- JobExistsError

Exception raised when a job already exists in the provider.

- JobNotExistsError

Exception raised when a job does not exist in the provider.

- ReadOnlyError

Exception raised when a provider is read-only.
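A typical pattern when writing through a possibly read-only provider is to catch these errors explicitly. A sketch with stand-in classes (the real exceptions live in the library's base exceptions module; `ToyProvider` is hypothetical):

```python
class ReadOnlyError(Exception):
    """Stand-in for the library's exception, raised when a provider is read-only."""

class ToyProvider:
    def __init__(self, *, readonly: bool) -> None:
        self.readonly = readonly

    def create(self, job: int) -> int:
        # Mirrors the guard used by the real create(): refuse writes when read-only.
        if self.readonly:
            raise ReadOnlyError("The provider is read-only")
        return job

provider = ToyProvider(readonly=True)
try:
    provider.create(42)
except ReadOnlyError as e:
    print(e)  # The provider is read-only
```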

DataEntry

Bases: NamedEntry, ABC

Base data entry class, providing an abstraction layer for data access / IO operations.

This represents a data entry through a data provider; the entry's data can be read/written through file-like objects obtained from the reader/writer methods.

Source code in src/lhcbdirac_log/providers/base/accessors.py
class DataEntry(NamedEntry, ABC):
    """Base data entry class, providing an abstraction layer for data access / IO operations.

    This represents a data entry through a data provider;
    the entry's data can be read/written through file-like objects obtained from the reader/writer methods.
    """

    __slots__ = (
        "_compressed",
        "_job",
    )

    def __init__(self, name: str, job: int, *, compressed: bool, readonly: bool) -> None:
        """[Internal] Initialize the data entry.

        Args:
            name: the data name
            job: the job id
            compressed: indicate that the underlying data is compressed (in Zstandard)
            readonly: indicate whether the data is read-only or not

        Notes:
            - instantiation alone has no effects on the provider, the data will be created on first write
        """
        self._job = job
        self._compressed = compressed
        super().__init__(name, readonly=readonly)

    @property
    @final
    def compressed(self) -> bool:
        """Check if the underlying data is compressed or not (in Zstandard).

        Returns:
            True if the underlying data is compressed, False otherwise
        """
        return self._compressed

    @property
    @final
    def size(self) -> int:
        """Get the stored data size.

        This is the size of the stored data, i.e. the same as the data read from the reader.

        Returns:
            the data size, or 0 if the entry does not exist

        Notes:
            - zero-size may indicate that the data exists but is empty
            - compressed size for compressed entry, uncompressed size for uncompressed entry
        """
        return self._size() or 0

    @property
    @final
    def exists(self) -> bool:
        """Check if the data exists.

        Returns:
            True if the data exists, False otherwise
        """
        return self._size() is not None

    @property
    @final
    def job(self) -> int:
        """Get the job id.

        Returns:
            the job id
        """
        return self._job

    @property
    @override
    @final
    def dict_name(self) -> str:
        """Get the dict name.

        Returns:
            the dict name
        """
        return self.filename_to_dictname(self._name)

    @abstractmethod
    def _reader(self) -> BinaryIO:
        """[Internal] Get a data reader as a file-like object.

        Returns:
            a data reader

        Notes:
            - the caller is responsible for the reader's lifecycle
            - each call returns a new reader
            - may not support concurrent readers and / or writers
        """

    @final
    def reader(self) -> BinaryIO:
        """Get a data reader as a file-like object.

        Returns:
            a data reader

        Raises:
            DataNotExistsError: if the data does not exist

        Notes:
            - the caller is responsible for the reader's lifecycle
            - each call returns a new reader
            - may not support concurrent readers and / or writers
        """
        if not self.exists:
            raise DataNotExistsError(self._name)

        return self._reader()

    @abstractmethod
    def _writer(self) -> BinaryIO:
        """[Internal] Get a data writer as a file-like object.

        Returns:
            a data writer

        Notes:
            - the caller is responsible for the writer's lifecycle
            - each call returns a new writer
            - may not support concurrent readers and / or writers
        """

    @final
    def writer(self) -> BinaryIO:
        """Get a data writer as a file-like object.

        Returns:
            a data writer

        Raises:
            ReadOnlyError: if the data is read-only

        Notes:
            - the caller is responsible for the writer's lifecycle
            - each call returns a new writer
            - may not support concurrent readers and / or writers
        """
        if self._readonly:
            msg = f"Data '{self._name}' is read-only"
            raise ReadOnlyError(msg)

        return self._writer()

    @abstractmethod
    def _size(self) -> int | None:
        """[Internal] Get the stored data size.

        Returns:
            the stored data size or None if the data does not exist
        """

    @abstractmethod
    def _delete(self) -> None:
        """[Internal] Delete the data.

        Raises:
            DataNotExistsError: if the data does not exist
        """

    @final
    def delete(self) -> None:
        """Delete the data.

        Raises:
            DataNotExistsError: if the data does not exist
            ReadOnlyError: if the data is read-only
        """
        if self._readonly:
            msg = f"Data '{self._name}' is read-only"
            raise ReadOnlyError(msg)

        self._delete()
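With a concrete subclass, the flow is: open a writer, write the (already compressed or uncompressed) bytes, close it, then open a reader. A minimal in-memory stand-in mimicking the DataEntry contract (hypothetical, not a provided implementation) could be:

```python
from io import BytesIO

class MemoryDataEntry:
    """Toy stand-in mimicking the DataEntry contract (not the real base class)."""

    def __init__(self, name: str, job: int) -> None:
        self.name, self.job = name, job
        self._buf: bytes | None = None  # None means "does not exist"

    @property
    def exists(self) -> bool:
        return self._buf is not None

    @property
    def size(self) -> int:
        return len(self._buf) if self._buf is not None else 0

    def writer(self) -> BytesIO:
        entry = self

        class _Writer(BytesIO):
            def close(self) -> None:  # persist on close, as a real writer would
                entry._buf = self.getvalue()
                super().close()

        return _Writer()

    def reader(self) -> BytesIO:
        if self._buf is None:
            raise FileNotFoundError(self.name)  # the real code raises DataNotExistsError
        return BytesIO(self._buf)

entry = MemoryDataEntry("log.txt", job=42)
with entry.writer() as w:      # data is created on first write
    w.write(b"hello")
with entry.reader() as r:      # each call returns a new reader
    print(r.read())  # b'hello'
print(entry.size)  # 5
```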

compressed: bool property

Check if the underlying data is compressed or not (in Zstandard).

Returns:

Type Description
bool

True if the underlying data is compressed, False otherwise

dict_name: str property

Get the dict name.

Returns:

Type Description
str

the dict name

exists: bool property

Check if the data exists.

Returns:

Type Description
bool

True if the data exists, False otherwise

job: int property

Get the job id.

Returns:

Type Description
int

the job id

size: int property

Get the stored data size.

This is the size of the stored data, i.e. the same as the data read from the reader.

Returns:

Type Description
int

the data size, or 0 if the entry does not exist

Notes
  • zero-size may indicate that the data exists but is empty
  • compressed size for compressed entry, uncompressed size for uncompressed entry

__init__(name, job, *, compressed, readonly)

[Internal] Initialize the data entry.

Parameters:

Name Type Description Default
name str

the data name

required
job int

the job id

required
compressed bool

indicate that the underlying data is compressed (in Zstandard)

required
readonly bool

indicate whether the data is read-only or not

required
Notes
  • instantiation alone has no effects on the provider, the data will be created on first write
Source code in src/lhcbdirac_log/providers/base/accessors.py
def __init__(self, name: str, job: int, *, compressed: bool, readonly: bool) -> None:
    """[Internal] Initialize the data entry.

    Args:
        name: the data name
        job: the job id
        compressed: indicate that the underlying data is compressed (in Zstandard)
        readonly: indicate whether the data is read-only or not

    Notes:
        - instantiation alone has no effects on the provider, the data will be created on first write
    """
    self._job = job
    self._compressed = compressed
    super().__init__(name, readonly=readonly)

delete()

Delete the data.

Raises:

Type Description
DataNotExistsError

if the data does not exist

ReadOnlyError

if the data is read-only

Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def delete(self) -> None:
    """Delete the data.

    Raises:
        DataNotExistsError: if the data does not exist
        ReadOnlyError: if the data is read-only
    """
    if self._readonly:
        msg = f"Data '{self._name}' is read-only"
        raise ReadOnlyError(msg)

    self._delete()

reader()

Get a data reader as a file-like object.

Returns:

Type Description
BinaryIO

a data reader

Raises:

Type Description
DataNotExistsError

if the data does not exist

Notes
  • the caller is responsible for the reader's lifecycle
  • each call returns a new reader
  • may not support concurrent readers and / or writers
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def reader(self) -> BinaryIO:
    """Get a data reader as a file-like object.

    Returns:
        a data reader

    Raises:
        DataNotExistsError: if the data does not exist

    Notes:
        - the caller is responsible for the reader's lifecycle
        - each call returns a new reader
        - may not support concurrent readers and / or writers
    """
    if not self.exists:
        raise DataNotExistsError(self._name)

    return self._reader()

writer()

Get a data writer as a file-like object.

Returns:

Type Description
BinaryIO

a data writer

Raises:

Type Description
ReadOnlyError

if the data is read-only

Notes
  • the caller is responsible for the writer's lifecycle
  • each call returns a new writer
  • may not support concurrent readers and / or writers
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def writer(self) -> BinaryIO:
    """Get a data writer as a file-like object.

    Returns:
        a data writer

    Raises:
        ReadOnlyError: if the data is read-only

    Notes:
        - the caller is responsible for the writer's lifecycle
        - each call returns a new writer
        - may not support concurrent readers and / or writers
    """
    if self._readonly:
        msg = f"Data '{self._name}' is read-only"
        raise ReadOnlyError(msg)

    return self._writer()

DataExistsError

Bases: Exception

Raised when data already exists.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DataExistsError(Exception):
    """Raised when data already exists."""

DataNotExistsError

Bases: Exception

Raised when data does not exist.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DataNotExistsError(Exception):
    """Raised when data does not exist."""

DataProvider

Bases: Provider[J], ABC

Base class for data provider implementations, providing an abstraction layer for data management.

Productions/subproductions can be accessed through a data provider. From data providers, job entries can be accessed, created, updated, and deleted. From job entries, data entries can be accessed, created, updated, and deleted. From data entries, data can be read/written.

E.g.: Production 00001234/0000:
    - Job n°00000042:
        - Data log_data_0001.xml
        - Data job.info
        ...
    - Job 00001200:
        - Data log_data_0042.xml
        - Data log.txt
        - Data stdout.xml
        ...

Dict providers are used to store Zstandard dictionaries, used for both compression and decompression. Dict entries are used to access dictionary data, identified by a unique name. The name of a dictionary is the part common to all related files, i.e. the filenames without their numbers.

E.g.: - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.

When a provider is read-only, it is not possible to create, update, or delete anything. When a data provider is compressed (assigned to a dictionary provider), underlying data is compressed with Zstandard and the dictionary provider contains the dictionaries, used for both compression and decompression.

Some data provider implementations may not implement the related dict provider, supporting only the old "uncompressed" format. On the other hand, some implementations may only support access to the new "compressed" format (Zstandard). Some implementations may only support the read-only mode.

The "compressed" status, indicating whether the underlying data are compressed in Zstandard or stored in a raw uncompressed format, is only an indication. Whatever its type, a data provider always reads/writes data as-is without any processing, assuming data are already provided in the correct compressed/uncompressed state.

Source code in src/lhcbdirac_log/providers/base/providers.py
class DataProvider[J: JobEntry](Provider[J], ABC):
    """Base class for data provider implementations, providing an abstraction layer for data management.

    Productions/subproductions can be accessed through a data provider.
    From data providers, job entries can be accessed, created, updated, and deleted.
    From job entries, data entries can be accessed, created, updated, and deleted.
    From data entries, data can be read/written.

    E.g.:
        Production 00001234/0000:
            - Job n°00000042:
                - Data log_data_0001.xml
                - Data job.info
                  ...
            - Job 00001200:
                - Data log_data_0042.xml
                - Data log.txt
                - Data stdout.xml
                ...

    Dict providers are used to store Zstandard dictionaries, used for both compression and decompression.
    Dict entries are used to access dictionary data, identified by a unique name.
    The name of a dictionary is the part common to all related files, i.e. the filenames without their numbers.

    E.g.:
        - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.

    When a provider is read-only, it is not possible to create, update, or delete anything.
    When a data provider is compressed (assigned to a dictionary provider),
    underlying data is compressed with Zstandard and the dictionary provider contains the dictionaries,
    used for both compression and decompression.

    Some data provider implementations may not implement the related dict provider,
    supporting only the old "uncompressed" format. On the other hand, some implementations may only support access to
    the new "compressed" format (Zstandard).
    Some implementations may only support the read-only mode.

    The "compressed" status, indicating whether the underlying data are compressed in Zstandard or stored in a raw
    uncompressed format, is only an indication. Whatever its type, a data provider always reads/writes data as-is
    without any processing, assuming data are already provided in the correct compressed/uncompressed state.
    """

    __slots__ = ("_dict_provider",)

    def __init__(self, dict_provider: DictProvider | None = None, *, readonly: bool) -> None:
        """[Internal] Initialize the provider.

        Args:
            dict_provider: the dict provider associated with the data (default is None); specifying this implies that the provided data are compressed.
                           Some providers may not support this, or may support only dict providers from the same implementation.
            readonly: indicate whether the provider is read-only or not
        """
        super().__init__(readonly=readonly)
        self._dict_provider = dict_provider

    @property
    @final
    def compressed(self) -> bool:
        """Check if the underlying data are compressed or not (in Zstandard).

        Returns:
            True if the data are compressed, False otherwise

        Notes:
            - True implies that the `dict_provider` is not None
            - False implies that the `dict_provider` is None
        """
        return self._dict_provider is not None

    @property
    @final
    def dict_provider(self) -> DictProvider | None:
        """Get the linked dict provider or None.

        Returns:
            the dict provider or None
        """
        return self._dict_provider

    @abstractmethod
    def _get(self, job: int, *, create: bool = False) -> J:
        """[Internal] Get a job entry.

        Args:
            job: the job id
            create: if True, create the job if it does not exist (default is False)

        Returns:
            the job entry

        Raises:
            JobNotExistsError: if the job does not exist and create is False
        """

    @final
    def get(self, job: int, *, create: bool = False) -> J:
        """Get a job entry.

        Args:
            job: the job id
            create: if True, create the job if it does not exist (default is False)

        Returns:
            the job entry

        Raises:
            JobNotExistsError: if the job does not exist and create is False
            ReadOnlyError: if the provider is read-only and create is True
        """
        if create and self.readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        return self._get(job, create=create)

    @abstractmethod
    def _create(self, job: int, *, exists_ok: bool = False) -> J:
        """[Internal] Create a job entry.

        Args:
            job: the job id
            exists_ok: if True, ignore the error if the job already exists (default is False)

        Returns:
            the job entry

        Raises:
            JobExistsError: if the job already exists and exists_ok is False
        """

    @final
    def create(self, job: int, *, exists_ok: bool = False) -> J:
        """Create a job entry.

        Args:
            job: the job id
            exists_ok: if True, ignore the error if the job already exists (default is False)

        Returns:
            the job entry

        Raises:
            JobExistsError: if the job already exists and exists_ok is False
            ReadOnlyError: if the provider is read-only
        """
        if self.readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        return self._create(job, exists_ok=exists_ok)

    @abstractmethod
    def _delete(self, job: int, *, force: bool = False) -> None:
        """[Internal] Delete a job.

        Args:
            job: the job id
            force: if True, delete the job even if it is not empty (default is False)

        Raises:
            JobNotExistsError: if the job does not exist
            DataExistsError: if the job is not empty and force is False
        """

    @final
    def delete(self, job: int, *, force: bool = False) -> None:
        """Delete a job.

        Args:
            job: the job id
            force: if True, delete the job even if it is not empty (default is False)

        Raises:
            JobNotExistsError: if the job does not exist
            DataExistsError: if the job is not empty and force is False
            ReadOnlyError: if the provider is read-only
        """
        if self.readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        self._delete(job, force=force)

    @final
    def clear(self, *, force: bool = False) -> None:
        """Clear all the jobs.

        Args:
            force: if True, delete the jobs even if they are not empty (default is False)

        Raises:
            DataExistsError: if a job is not empty and force is False
            ReadOnlyError: if the provider is read-only
        """
        if self.readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        for j in self.jobs():
            self.delete(j, force=force)

    @abstractmethod
    def jobs(self) -> Generator[int, None, None]:
        """Get all existing jobs id.

        Returns:
            a generator of the jobs id
        """

    @override
    @property
    def size(self) -> int:
        """Get the size of all jobs.

        Returns:
            the whole job size

        Notes:
            - see JobEntry.job_size for more details
        """
        return sum(i.job_size for i in self)

    @property
    def data_size(self) -> int:
        """Get all stored data size.

        Returns:
            the whole data size

        Notes:
            - see JobEntry.data_size for more details
        """
        return sum(i.data_size for i in self)

    @final
    def __iter__(self) -> Iterator[J]:
        """Iterate over all the jobs entries.

        Returns:
            an iterator of the jobs entries
        """
        return (self.get(i) for i in self.jobs())

    @final
    def __getitem__(self, job: int) -> J:
        """Get a job entry.

        Args:
            job: the job id

        Returns:
            the job entry

        Raises:
            JobNotExistsError: if the job does not exist

        Notes:
            - same as: provider.get(job)
        """
        return self.get(job)

    @override
    def __len__(self) -> int:
        """Get the number of jobs.

        Returns:
            the number of jobs
        """
        return sum(1 for _ in self.jobs())

    @final
    def transfer(self, target: DataProvider, *, limit: int = 0) -> int:
        """Transfer (copy) all jobs data to another provider.

        Args:
            target: the target provider
            limit: the maximum number of jobs to transfer (default is 0, meaning all)

        Returns:
            the number of transferred jobs

        Raises:
            ValueError: if the target provider is not of the same format (`compressed` value is different)
            JobExistsError: if copied jobs already exist in the target provider
            DictExistsError: if copied dicts already exist in the target provider
            ReadOnlyError: if the target provider is read-only

        Notes:
            - nothing is transferred if the target provider is the same as the source provider
            - if the providers are compressed (Zstandard format), the dict provider is also transferred, first
            - dict provider transfer is only performed if the target dict provider is not read-only
        """
        if self is target:
            return 0

        if self.compressed != target.compressed:
            msg = "The data provider compression mode must be the same"
            raise ValueError(msg)

        if target.readonly:
            msg = "The target provider is read-only"
            raise ReadOnlyError(msg)

        if self.compressed and not target.dict_provider.readonly:
            self.dict_provider.transfer(target.dict_provider)

        n = -1
        for n, job in enumerate(self):
            tjob = target.create(job.job, exists_ok=False)

            if tjob.compressed:
                tjob.update_info(job.info)

            for i in job:
                dst = tjob.create(i.name, exists_ok=False)

                with i.reader() as r, dst.writer() as w:
                    copyfileobj(r, w)

            if limit and n >= limit - 1:
                break

        return n + 1
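At its core, `transfer` is a copy loop over jobs and their entries, streaming each entry with `copyfileobj`. Reduced to toy in-memory providers (plain dicts standing in for the library's classes, purely illustrative), the pattern is:

```python
from io import BytesIO
from shutil import copyfileobj

# Toy providers: {job_id: {data_name: bytes}} (illustrative, not the real classes).
source = {42: {"log.txt": b"abc"}, 1200: {"stdout.xml": b"<out/>"}}
target: dict[int, dict[str, bytes]] = {}

transferred = 0
for job_id, entries in source.items():
    target[job_id] = {}  # create the job in the target (JobExistsError territory in real code)
    for name, payload in entries.items():
        src, dst = BytesIO(payload), BytesIO()
        copyfileobj(src, dst)  # stream copy, as DataProvider.transfer does
        target[job_id][name] = dst.getvalue()
    transferred += 1

print(transferred)  # 2
```

A `limit` parameter, as in the real method, would simply break out of the outer loop once enough jobs have been copied.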

compressed: bool property

Check if the underlying data are compressed or not (in Zstandard).

Returns:

Type Description
bool

True if the data are compressed, False otherwise

Notes
  • True implies that the dict_provider is not None
  • False implies that the dict_provider is None

data_size: int property

Get all stored data size.

Returns:

Type Description
int

the whole data size

Notes
  • see JobEntry.data_size for more details

dict_provider: DictProvider | None property

Get the linked dict provider or None.

Returns:

Type Description
DictProvider | None

the dict provider or None

size: int property

Get the size of all jobs.

Returns:

Type Description
int

the whole job size

Notes
  • see JobEntry.job_size for more details

__getitem__(job)

Get a job entry.

Parameters:

Name Type Description Default
job int

the job id

required

Returns:

Type Description
J

the job entry

Raises:

Type Description
JobNotExistsError

if the job does not exist

Notes
  • same as: provider.get(job)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __getitem__(self, job: int) -> J:
    """Get a job entry.

    Args:
        job: the job id

    Returns:
        the job entry

    Raises:
        JobNotExistsError: if the job does not exist

    Notes:
        - same as: provider.get(job)
    """
    return self.get(job)

__init__(dict_provider=None, *, readonly)

[Internal] Initialize the provider.

Parameters:

Name Type Description Default
dict_provider DictProvider | None

the dict provider associated with the data (default is None); specifying this implies that the provided data are compressed. Some providers may not support this, or may support only dict providers from the same implementation.

None
readonly bool

indicate whether the provider is read-only or not

required
Source code in src/lhcbdirac_log/providers/base/providers.py
def __init__(self, dict_provider: DictProvider | None = None, *, readonly: bool) -> None:
    """[Internal] Initialize the provider.

    Args:
        dict_provider: the dict provider associated with the data (default is None); specifying this implies that the provided data are compressed.
                       Some providers may not support this, or may support only dict providers from the same implementation.
        readonly: indicate whether the provider is read-only or not
    """
    super().__init__(readonly=readonly)
    self._dict_provider = dict_provider

__iter__()

Iterate over all the jobs entries.

Returns:

Type Description
Iterator[J]

an iterator of the jobs entries

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __iter__(self) -> Iterator[J]:
    """Iterate over all the jobs entries.

    Returns:
        an iterator of the jobs entries
    """
    return (self.get(i) for i in self.jobs())

__len__()

Get the number of jobs.

Returns:

Type Description
int

the number of jobs

Source code in src/lhcbdirac_log/providers/base/providers.py
@override
def __len__(self) -> int:
    """Get the number of jobs.

    Returns:
        the number of jobs
    """
    return sum(1 for _ in self.jobs())

clear(*, force=False)

Clear all the jobs.

Parameters:

Name Type Description Default
force bool

if True, delete the jobs even if they are not empty (default is False)

False

Raises:

Type Description
DataExistsError

if a job is not empty and force is False

ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def clear(self, *, force: bool = False) -> None:
    """Clear all the jobs.

    Args:
        force: if True, delete the jobs even if they are not empty (default is False)

    Raises:
        DataExistsError: if a job is not empty and force is False
        ReadOnlyError: if the provider is read-only
    """
    if self.readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    for j in self.jobs():
        self.delete(j, force=force)

create(job, *, exists_ok=False)

Create a job entry.

Parameters:

Name Type Description Default
job int

the job id

required
exists_ok bool

if True, ignore the error if the job already exists (default is False)

False

Returns:

Type Description
J

the job entry

Raises:

Type Description
JobExistsError

if the job already exists and exists_ok is False

ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def create(self, job: int, *, exists_ok: bool = False) -> J:
    """Create a job entry.

    Args:
        job: the job id
        exists_ok: if True, ignore the error if the job already exists (default is False)

    Returns:
        the job entry

    Raises:
        JobExistsError: if the job already exists and exists_ok is False
        ReadOnlyError: if the provider is read-only
    """
    if self.readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    return self._create(job, exists_ok=exists_ok)

delete(job, *, force=False)

Delete a job.

Parameters:

Name Type Description Default
job int

the job id

required
force bool

if True, delete the job even if it is not empty (default is False)

False

Raises:

Type Description
JobNotExistsError

if the job does not exist

DataExistsError

if the job is not empty and force is False

ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def delete(self, job: int, *, force: bool = False) -> None:
    """Delete a job.

    Args:
        job: the job id
        force: if True, delete the job even if it is not empty (default is False)

    Raises:
        JobNotExistsError: if the job does not exist
        DataExistsError: if the job is not empty and force is False
        ReadOnlyError: if the provider is read-only
    """
    if self.readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    self._delete(job, force=force)

get(job, *, create=False)

Get a job entry.

Parameters:

Name Type Description Default
job int

the job id

required
create bool

if True, create the job if it does not exist (default is False)

False

Returns:

Type Description
J

the job entry

Raises:

Type Description
JobNotExistsError

if the job does not exist and create is False

ReadOnlyError

if the provider is read-only and create is True

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def get(self, job: int, *, create: bool = False) -> J:
    """Get a job entry.

    Args:
        job: the job id
        create: if True, create the job if it does not exist (default is False)

    Returns:
        the job entry

    Raises:
        JobNotExistsError: if the job does not exist and create is False
        ReadOnlyError: if the provider is read-only and create is True
    """
    if create and self.readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    return self._get(job, create=create)
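Note that `get` only checks the read-only flag when `create=True`; plain reads always work on a read-only provider. A dict-backed sketch of that interaction (illustrative names; `PermissionError` and `KeyError` stand in for `ReadOnlyError` and `JobNotExistsError`):

```python
def get_job(jobs: dict[int, dict], job: int, *, create: bool = False, readonly: bool = False) -> dict:
    """Sketch of DataProvider.get(): the read-only check only triggers
    when creation is requested, mirroring the real method."""
    if create and readonly:
        raise PermissionError("The provider is read-only")
    if job not in jobs:
        if not create:
            raise KeyError(job)  # stands in for JobNotExistsError
        jobs[job] = {}           # stands in for _create()
    return jobs[job]
```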

jobs() abstractmethod

Get all existing job ids.

Returns:

Type Description
Generator[int, None, None]

a generator of the job ids

Source code in src/lhcbdirac_log/providers/base/providers.py
@abstractmethod
def jobs(self) -> Generator[int, None, None]:
    """Get all existing job ids.

    Returns:
        a generator of the job ids
    """

transfer(target, *, limit=0)

Transfer (copy) all jobs data to another provider.

Parameters:

Name Type Description Default
target DataProvider

the target provider

required
limit int

the maximum number of jobs to transfer (default is 0, meaning all)

0

Returns:

Type Description
int

the number of transferred jobs

Raises:

Type Description
ValueError

if the target provider is not of the same format (compressed value is different)

JobExistsError

if copied jobs already exist in the target provider

DictExistsError

if copied dicts already exist in the target provider

ReadOnlyError

if the target provider is read-only

Notes
  • nothing is transferred if the target provider is the same as the source provider
  • if the providers are compressed (Zstandard format), the dict provider is also transferred, first
  • dict provider transfer is only performed if the target dict provider is not read-only
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def transfer(self, target: DataProvider, *, limit: int = 0) -> int:
    """Transfer (copy) all jobs data to another provider.

    Args:
        target: the target provider
        limit: the maximum number of jobs to transfer (default is 0, meaning all)

    Returns:
        the number of transferred jobs

    Raises:
        ValueError: if the target provider is not of the same format (`compressed` value is different)
        JobExistsError: if copied jobs already exist in the target provider
        DictExistsError: if copied dicts already exist in the target provider
        ReadOnlyError: if the target provider is read-only

    Notes:
        - nothing is transferred if the target provider is the same as the source provider
        - if the providers are compressed (Zstandard format), the dict provider is also transferred, first
        - dict provider transfer is only performed if the target dict provider is not read-only
    """
    if self is target:
        return 0

    if self.compressed != target.compressed:
        msg = "The data provider compression mode must be the same"
        raise ValueError(msg)

    if target.readonly:
        msg = "The target provider is read-only"
        raise ReadOnlyError(msg)

    if self.compressed and not target.dict_provider.readonly:
        self.dict_provider.transfer(target.dict_provider)

    n = -1
    for n, job in enumerate(self):
        tjob = target.create(job.job, exists_ok=False)

        if tjob.compressed:
            tjob.update_info(job.info)

        for i in job:
            dst = tjob.create(i.name, exists_ok=False)

            with i.reader() as r, dst.writer() as w:
                copyfileobj(r, w)

        if limit and n >= limit - 1:
            break

    return n + 1
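The `n = -1` / `enumerate` idiom used by `transfer` is worth noting: the sentinel makes an empty source return 0, and `limit` stops the loop after exactly `limit` items. A self-contained sketch of just that counting pattern (list-based, illustrative names only):

```python
def copy_limited(source, sink, *, limit: int = 0) -> int:
    """Mimics transfer()'s counting: the -1 sentinel makes an empty
    source yield 0; limit=0 means copy everything."""
    n = -1
    for n, item in enumerate(source):
        sink.append(item)
        if limit and n >= limit - 1:
            break
    return n + 1
```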

DictEntry

Bases: NamedEntry, ABC

Base dict entry class, providing an abstraction layer for dict access.

This represents a dict entry, an instance implies the dict exists and can be accessed (except after manual delete).

The delete operation must be done through the associated provider, as dict entry instances are considered read-only and must not be used after a delete operation.

Source code in src/lhcbdirac_log/providers/base/accessors.py
class DictEntry(NamedEntry, ABC):
    """Base dict entry class, providing an abstraction layer for dict access.

    This represents a dict entry, an instance implies the dict exists and can be accessed (except after manual delete).

    The delete operation must be done through the associated provider,
    as dict entry instances are considered read-only and must not be used after a delete operation.
    """

    __slots__ = (
        "_config",
        "_data",
        "_dict",
        "_zstd_id",
    )

    def __init__(self, name: str, config: Config, data: bytes | None = None, zstd_id: int | None = None) -> None:
        """[Internal] Initialize the dict entry.

        Args:
            name: the dict name
            config: the configuration to use for precomputing the dictionary
            data: the dict data (create a new dict if not None)
            zstd_id: the zstd dictionary id (None for unknown)
        """
        super().__init__(name, readonly=True)
        self._data = data
        self._dict: ZstdCompressionDict | None = None
        self._config = config
        self._zstd_id = zstd_id

        if self._data is not None:
            self._save()

    @property
    @override
    @final
    def dict_name(self) -> str:
        """Get the dict name.

        Returns:
            the dict name

        Notes:
            - alias to the dict entry name
        """
        return self._name

    @property
    @final
    def data(self) -> bytes:
        """Get the dict data.

        Returns:
            the dict data

        Raises:
        DictNotExistsError: if the dict does not exist (may never be raised under normal usage)

        Notes:
            - the data is loaded on first access (lazy loading)
        """
        if self._data is None:
            self._data = self._load_data()

        return self._data

    @property
    @abstractmethod
    def size(self) -> int:
        """Get the dict size.

        Returns:
            the dict size or 0 if the dict does not exist
        """

    @property
    @abstractmethod
    def exists(self) -> bool:
        """Check if the dict exists.

        Returns:
            True if the dict exists, False otherwise
        """

    @property
    @final
    def dict(self) -> ZstdCompressionDict:
    """Get the zstd-dict object (precomputed for shared usage).

        Returns:
            the zstd-dict object

        Raises:
        DictNotExistsError: if the dict does not exist (may never be raised under normal usage)
        """
        if self._dict is None:
            self._load_dict()

        return self._dict

    @property
    @final
    def zstd_id(self) -> int:
        """Get the zstd dictionary id.

        Returns:
            the zstd dictionary id

        Raises:
        DictNotExistsError: if the dict does not exist (may never be raised under normal usage)
        """
        if self._zstd_id is None:
            self._zstd_id = self.dict.dict_id()

        return self._zstd_id

    @property
    @final
    def is_loaded(self) -> bool:
        """Check if the dict is loaded.

        Returns:
            True if the dict is loaded, False otherwise
        """
        return self._data is not None

    @final
    def _load_dict(self) -> None:
        """[Internal] Load the dict from data."""
        self._dict = ZstdCompressionDict(self.data, DICT_TYPE_FULLDICT)
        self._dict.precompute_compress(compression_params=self._config.params)

    @abstractmethod
    def _load_data(self) -> bytes:
        """[Internal] Get the dict's data.

        Returns:
            the dict's data

        Raises:
            DictNotExistsError: if the dict does not exist
        """

    @abstractmethod
    def _save(self) -> None:
        """[Internal] Save the dict data / create the dict entry.

        Notes:
            - the behavior is undefined if the dict already exists (may raise an error or overwrite)
        """
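The `data` property above uses lazy loading: the backing bytes are fetched once on first access and cached afterwards. That pattern can be illustrated standalone; the loader and class name here are hypothetical, not part of the library:

```python
class LazyBytes:
    """Minimal mimic of DictEntry's data property: the backing loader
    runs at most once, on first access."""

    def __init__(self, loader):
        self._loader = loader
        self._data: bytes | None = None

    @property
    def data(self) -> bytes:
        if self._data is None:
            self._data = self._loader()  # stands in for _load_data()
        return self._data

    @property
    def is_loaded(self) -> bool:
        return self._data is not None
```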

data: bytes property

Get the dict data.

Returns:

Type Description
bytes

the dict data

Raises:

Type Description
DictNotExistsError

if the dict does not exist (may never be raised under normal usage)

Notes
  • the data is loaded on first access (lazy loading)

dict: ZstdCompressionDict property

Get the zstd-dict object (precomputed for shared usage).

Returns:

Type Description
ZstdCompressionDict

the zstd-dict object

Raises:

Type Description
DictNotExistsError

if the dict does not exist (may never be raised under normal usage)

dict_name: str property

Get the dict name.

Returns:

Type Description
str

the dict name

Notes
  • alias to the dict entry name

exists: bool abstractmethod property

Check if the dict exists.

Returns:

Type Description
bool

True if the dict exists, False otherwise

is_loaded: bool property

Check if the dict is loaded.

Returns:

Type Description
bool

True if the dict is loaded, False otherwise

size: int abstractmethod property

Get the dict size.

Returns:

Type Description
int

the dict size or 0 if the dict does not exist

zstd_id: int property

Get the zstd dictionary id.

Returns:

Type Description
int

the zstd dictionary id

Raises:

Type Description
DictNotExistsError

if the dict does not exist (may never be raised under normal usage)

__init__(name, config, data=None, zstd_id=None)

[Internal] Initialize the dict entry.

Parameters:

Name Type Description Default
name str

the dict name

required
config Config

the configuration to use for precomputing the dictionary

required
data bytes | None

the dict data (create a new dict if not None)

None
zstd_id int | None

the zstd dictionary id (None for unknown)

None
Source code in src/lhcbdirac_log/providers/base/accessors.py
def __init__(self, name: str, config: Config, data: bytes | None = None, zstd_id: int | None = None) -> None:
    """[Internal] Initialize the dict entry.

    Args:
        name: the dict name
        config: the configuration to use for precomputing the dictionary
        data: the dict data (create a new dict if not None)
        zstd_id: the zstd dictionary id (None for unknown)
    """
    super().__init__(name, readonly=True)
    self._data = data
    self._dict: ZstdCompressionDict | None = None
    self._config = config
    self._zstd_id = zstd_id

    if self._data is not None:
        self._save()

DictExistsError

Bases: Exception

Raised when a dict already exists.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DictExistsError(Exception):
    """Raised when a dict already exists."""

DictInvalidError

Bases: Exception

Raised when an invalid dict is requested.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DictInvalidError(Exception):
    """Raised when an invalid dict is requested."""

DictNotExistsError

Bases: Exception

Raised when a dict does not exist.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DictNotExistsError(Exception):
    """Raised when a dict does not exist."""

DictProvider

Bases: Provider[E], ABC

Base class for Zstandard dictionary provider implementations, providing an abstraction layer for dictionary management.

Dict providers are used to store Zstandard dictionaries, used for both compression and decompression.

The provider manages the dictionaries and provides access to them. Dict entries are used to access dictionary data, identified by a unique name. The name of a dictionary is the common part between all related files = the filenames without numbers.

E.g.: - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.

Dictionaries are loaded on demand and are marked as missing if they do not exist. This marking can be used to avoid trying to load them again if they are not found.

Dictionaries whose training failed are marked as invalid. Unlike missing-dictionary marks, invalid-dictionary marks can be persistent, depending on the implementation.

Source code in src/lhcbdirac_log/providers/base/providers.py
class DictProvider[E: DictEntry](Provider[E], ABC):
    """Base class for Zstandard dictionary provider implementations, providing an abstraction layer for dictionary management.

    Dict providers are used to store Zstandard dictionaries, used for both compression and decompression.

    The provider manages the dictionaries and provides access to them.
    Dict entries are used to access dictionary data, identified by a unique name.
    The name of a dictionary is the common part between all related files = the filenames without numbers.

    E.g.:
        - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.

    Dictionaries are loaded on demand and are marked as missing if they do not exist.
    This marking can be used to avoid trying to load them again if they are not found.

    Dictionaries whose training failed are marked as invalid.
    Unlike missing-dictionary marks, invalid-dictionary marks can be persistent, depending on the implementation.
    """

    __slots__ = (
        "_config",
        "_dicts",
        "_invalid",
        "_missing",
    )

    def __init__(self, config: Config, *, readonly: bool) -> None:
        """[Internal] Initialize the provider.

        Args:
            config: the configuration to use for precomputing the dictionaries
            readonly: indicate whether the provider is read-only or not
        """
        super().__init__(readonly=readonly)
        self._dicts = dict[str, E]()
        self._invalid = self._load_invalid()
        self._missing = set[str]()
        self._config = config

    def _load_invalid(self) -> set[str]:
        """[Internal] Load the invalid dicts names.

        Returns:
            the invalid dicts names

        Notes:
            - default implementation returns an empty set
            - must be implemented if invalid dicts are saved
        """
        return set()

    def _mark_invalid(self, name: str) -> None:
        """[Internal] Mark a dict as invalid.

        Args:
            name: the dict name

        Notes:
            - default implementation does nothing
            - must be implemented if invalid dicts are saved
        """

    @property
    def config(self) -> Config:
        """Get the configuration.

        Returns:
            the configuration
        """
        return self._config

    @final
    def is_invalid(self, name: str) -> bool:
        """Check if the dict is marked as invalid.

        Args:
            name: the dict name

        Returns:
            True if the dict is invalid, False otherwise

        Notes:
            - Invalid dicts are dicts that failed to train
            - These dicts may be marked only when a training was attempted
        """
        return name in self._invalid

    @final
    def is_missing(self, name: str) -> bool:
        """Check if the dict is marked as missing.

        Args:
            name: the dict name

        Returns:
            True if the dict is missing, False otherwise

        Notes:
            - Missing dicts are dicts that failed to load
        - These dicts are marked only when loading was attempted
            - Must not be confused with !is_loaded
        """
        return name in self._missing

    @final
    def is_loaded(self, name: str) -> bool:
        """Check if the dict is loaded (exists in cache).

        Args:
            name: the dict name

        Returns:
            True if the dict is loaded, False otherwise

        Notes:
            - same as: 'dict_name' in provider (__contains__)
        """
        return name in self

    @final
    def mark_invalid(self, name: str) -> None:
        """Mark a dict as invalid.

        Args:
            name: the dict name

        Raises:
            DictExistsError: if the dict is already loaded

        Notes:
        - unload the dict if it is loaded but no longer exists (may never happen under normal usage)
        """
        if self.is_loaded(name):
            if self[name].exists:
                raise DictExistsError(name)
            del self._dicts[name]

        if self.is_missing(name):
            self._missing.discard(name)

        self._mark_invalid(name)
        self._invalid.add(name)

    @final
    def mark_missing(self, name: str) -> None:
        """Mark a dict as missing.

        Args:
            name: the dict name

        Raises:
            DictExistsError: if the dict is already loaded
            DictInvalidError: if the dict is already marked as invalid

        Notes:
        - unload the dict if it is loaded but does not exist (may never happen under normal usage)
        """
        if self.is_loaded(name):
            if self[name].exists:
                raise DictExistsError(name)
            del self._dicts[name]

        if self.is_invalid(name):
            raise DictInvalidError(name)

        self._missing.add(name)

    @override
    @property
    def size(self) -> int:
        """Get the total size of all dicts data.

        Returns:
            the total size
        """
        return sum(self[i].size for i in self)

    @abstractmethod
    def _load(self, name: str) -> E:
        """[Internal] Load a dict.

        Args:
            name: the dict name

        Raises:
            DictNotExistsError: if the dict does not exist
        """

    @abstractmethod
    def _add(self, name: str, data: bytes, zstd_id: int) -> E:
        """[Internal] Create a new dict entry.

        Args:
            name: the dict name
            data: the dict data
            zstd_id: the zstd dictionary id

        Raises:
            DictExistsError: if the dict already exists
        """

    @final
    def add(self, name: str, data: bytes, zstd_id: int, *, load: bool = True) -> E:
        """Add a new dict entry from data.

        Args:
            name: the dict name
            data: the dict data
            zstd_id: the zstd dictionary id
            load: if True, keep the dict loaded (default is True)

        Returns:
            the dict entry

        Raises:
            DictExistsError: if the dict already exists
            ReadOnlyError: if the provider is read-only
        """
        if self._readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        d = self._add(name, data, zstd_id)
        if load:
            self._dicts[name] = d

        self._missing.discard(name)  # ensure valid state
        self._invalid.discard(name)  # ensure valid state
        return d

    @abstractmethod
    def _delete(self, name: str) -> None:
        """[Internal] Delete a dict entry.

        Args:
            name: the dict name to delete

        Raises:
            DictNotExistsError: if the dict does not exist
        """

    @final
    def delete(self, name: str) -> None:
        """Delete a dict entry.

        Args:
            name: the dict name to delete

        Raises:
            DictNotExistsError: if the dict does not exist
            ReadOnlyError: if the provider is read-only

        Notes:
            - may or may not check if data are linked to the dict before deletion
            - if the dict is loaded, it is obviously considered unloaded
            - instances of the related dict entry, if still accessible, may not be used nor trusted anymore
            - same as: del provider[dict_name]
        """
        if self._readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        self._delete(name)
        self._dicts.pop(name, None)  # delete it from the loaded dicts if loaded

    @final
    def clear(self) -> None:
        """Clear all the existing dicts.

        Raises:
            ReadOnlyError: if the provider is read-only
        """
        if self._readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        for i in self:
            del self[i]

    @final
    def transfer(self, target: DictProvider | None) -> int:
        """Transfer (copy) all dicts to another provider.

        Args:
            target: the target provider

        Returns:
            the number of transferred dicts

        Raises:
            ValueError: if the target provider is not specified (None)
            DictExistsError: if copied dicts already exist in the target provider
            ReadOnlyError: if the target provider is read-only

        Notes:
            - Nothing is transferred if the target provider is the same as the source provider
        """
        if self is target:
            return 0

        if target is None:
            msg = "The target provider must be specified"
            raise ValueError(msg)

        if target.readonly:
            msg = "The target provider is read-only"
            raise ReadOnlyError(msg)

        for i in self._invalid:
            target.mark_invalid(i)

        n = -1
        for n, i in enumerate(self):
            if (d := self.get(i)) is not None:
                target.add(i, d.data, d.zstd_id, load=False)

        return n + 1

    @final
    def _iter_loaded(self) -> Generator[str, None, None]:
        """[Internal] Get the loaded dicts names.

        Returns:
            a generator of the loaded dicts names
        """
        yield from self._dicts.keys()

    @abstractmethod
    def _iter_all(self) -> Generator[str, None, None]:
        """[Internal] Get all the dicts names (loaded and non-loaded/loadable).

        Returns:
            a generator of all the dicts names
        """

    @final
    def iter(self, *, loaded_only: bool = False) -> Generator[str, None, None]:
        """Get the dicts names.

        Args:
            loaded_only: if True, only return the loaded dict names, otherwise return all loadable names (default is False)

        Returns:
            a generator of the dicts names
        """
        return self._iter_loaded() if loaded_only else self._iter_all()

    @final
    def get(
        self,
        name: str,
        default: E | None = None,
        *,
        invalid_ok: bool = False,
        missing_ok: bool = True,
    ) -> E | None:
        """Get the dict.

        Args:
            name: the dict name to get
            default: the default value returned on ignored errors (default is None)
            invalid_ok: if True, ignores invalid dict error and returns the default value (default is False)
            missing_ok: if True, ignores missing dict error and returns the default value (default is True)

        Returns:
            the dict entry or the default value

        Raises:
            DictInvalidError: if the dict is invalid and invalid_ok is False
            DictNotExistsError: if the dict does not exist and missing_ok is False

        Notes:
            - if invalid_ok and missing_ok are both False, then it is equivalent to provider[dict_name] (__getitem__)
            - if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
            - getting the same dict multiple times will return the same object (cached)
        """
        try:
            return self[name]

        except DictInvalidError:
            if not invalid_ok:
                raise

        except DictNotExistsError:
            if not missing_ok:
                raise

        return default

    @final
    def __getitem__(self, name: str) -> E:
        """Get the dict.

        Args:
            name: the dict name to get

        Returns:
            the dict entry

        Raises:
            DictNotExistsError: if the dict does not exist
            DictInvalidError: if the dict is invalid

        Notes:
            - if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
            - getting the same dict multiple times will return the same object (cached)
            - same as: provider.get(dict_name, invalid_ok=False, missing_ok=False)
        """
        if self.is_invalid(name):  # only the provider can know if it is invalid
            raise DictInvalidError(name)

        if self.is_missing(name):  # avoid trying to (re)load if we know it is missing
            raise DictNotExistsError(name)

        if (d := self._dicts.get(name)) is None:  # not already loaded
            try:
                d = self._dicts[name] = self._load(name)  # try to load it
            except Exception as err:
                self._missing.add(name)  # mark it as missing
                raise DictNotExistsError(name) from err

        return d

    @final
    def __delitem__(self, name: str) -> None:
        """Delete a dict entry.

        Args:
            name: the dict name to delete

        Raises:
            DictNotExistsError: if the dict does not exist
            ReadOnlyError: if the provider is read-only

        Notes:
            - may or may not check if data are linked to the dict before deletion
            - if the dict is loaded, it is obviously considered unloaded
            - instances of the related dict entry, if still accessible, may not be used nor trusted anymore
            - same as: provider.delete(dict_name)
        """
        self.delete(name)

    @final
    def __contains__(self, name: str) -> bool:
        """Check if the dict is loaded (exists in cache).

        Args:
            name: the dict name

        Returns:
            True if the dict is loaded, False otherwise

        Notes:
            - same as: provider.is_loaded(dict_name)
        """
        return name in self._dicts

    @final
    def __iter__(self) -> Iterator[str]:
        """Get all the dicts names.

        Returns:
            an iterator of all the dicts names

        Notes:
            - same as: provider.iter()
        """
        return self.iter()

    @final
    def __len__(self) -> int:
        """Get the number of existing dicts.

        Returns:
            the number of existing dicts
        """
        return sum(1 for _ in self)
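The negative-caching behaviour of `__getitem__` above (a failed load marks the name as missing, so later lookups fail fast without hitting the backend again) can be sketched with plain exceptions; `KeyError` stands in for `DictNotExistsError` and all names here are illustrative:

```python
class NegativeCache:
    """Mimics DictProvider.__getitem__: successful loads are cached,
    failed loads are remembered in a 'missing' set and never retried."""

    def __init__(self, load):
        self._load = load
        self._cache: dict[str, bytes] = {}
        self._missing: set[str] = set()

    def __getitem__(self, name: str) -> bytes:
        if name in self._missing:        # fail fast, no backend call
            raise KeyError(name)
        if (d := self._cache.get(name)) is None:
            try:
                d = self._cache[name] = self._load(name)
            except Exception as err:
                self._missing.add(name)  # mark as missing
                raise KeyError(name) from err
        return d
```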

config: Config property

Get the configuration.

Returns:

Type Description
Config

the configuration

size: int property

Get the total size of all dicts data.

Returns:

Type Description
int

the total size

__contains__(name)

Check if the dict is loaded (exists in cache).

Parameters:

Name Type Description Default
name str

the dict name

required

Returns:

Type Description
bool

True if the dict is loaded, False otherwise

Notes
  • same as: provider.is_loaded(dict_name)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __contains__(self, name: str) -> bool:
    """Check if the dict is loaded (exists in cache).

    Args:
        name: the dict name

    Returns:
        True if the dict is loaded, False otherwise

    Notes:
        - same as: provider.is_loaded(dict_name)
    """
    return name in self._dicts

__delitem__(name)

Delete a dict entry.

Parameters:

Name Type Description Default
name str

the dict name to delete

required

Raises:

Type Description
DictNotExistsError

if the dict does not exist

ReadOnlyError

if the provider is read-only

Notes
  • may or may not check if data are linked to the dict before deletion
  • if the dict was loaded, it becomes unloaded
  • instances of the related dict entry, if still accessible, must no longer be used or trusted
  • same as: provider.delete(dict_name)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __delitem__(self, name: str) -> None:
    """Delete a dict entry.

    Args:
        name: the dict name to delete

    Raises:
        DictNotExistsError: if the dict does not exist
        ReadOnlyError: if the provider is read-only

    Notes:
        - may or may not check if data are linked to the dict before deletion
        - if the dict was loaded, it becomes unloaded
        - instances of the related dict entry, if still accessible, must no longer be used or trusted
        - same as: provider.delete(dict_name)
    """
    self.delete(name)

__getitem__(name)

Get the dict.

Parameters:

Name Type Description Default
name str

the dict name to get

required

Returns:

Type Description
E

the dict entry

Raises:

Type Description
DictNotExistsError

if the dict does not exist

DictInvalidError

if the dict is invalid

Notes
  • if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
  • getting the same dict multiple times will return the same object (cached)
  • same as: provider.get(dict_name, invalid_ok=False, missing_ok=False)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __getitem__(self, name: str) -> E:
    """Get the dict.

    Args:
        name: the dict name to get

    Returns:
        the dict entry

    Raises:
        DictNotExistsError: if the dict does not exist
        DictInvalidError: if the dict is invalid

    Notes:
        - if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
        - getting the same dict multiple times will return the same object (cached)
        - same as: provider.get(dict_name, invalid_ok=False, missing_ok=False)
    """
    if self.is_invalid(name):  # only the provider can know if it is invalid
        raise DictInvalidError(name)

    if self.is_missing(name):  # avoid trying to (re)load if we know it is missing
        raise DictNotExistsError(name)

    if (d := self._dicts.get(name)) is None:  # not already loaded
        try:
            d = self._dicts[name] = self._load(name)  # try to load it
        except Exception as err:
            self._missing.add(name)  # mark it as missing
            raise DictNotExistsError(name) from err

    return d
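
The lookup logic above combines a cache of loaded entries with a set of names whose load already failed, so a missing dict is never re-loaded on later calls. A stand-alone sketch of that pattern (`TinyDictCache` and its `loader` callable are hypothetical, not the library class; `KeyError` stands in for `DictNotExistsError`):

```python
# Minimal sketch of __getitem__'s cache/missing-set pattern
# (hypothetical stand-in, not the library class itself).
class TinyDictCache:
    def __init__(self, loader):
        self._loader = loader    # callable: name -> data, raises on failure
        self._dicts = {}         # loaded entries (the cache)
        self._missing = set()    # names whose load already failed

    def __getitem__(self, name):
        if name in self._missing:  # known missing: fail fast, no reload
            raise KeyError(name)
        if (d := self._dicts.get(name)) is None:  # not loaded yet
            try:
                d = self._dicts[name] = self._loader(name)
            except Exception as err:
                self._missing.add(name)  # remember the failure
                raise KeyError(name) from err
        return d
```

A successful load is cached, so repeated access returns the same object; a failed load is remembered, so the backend is only queried once per missing name.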

__init__(config, *, readonly)

[Internal] Initialize the provider.

Parameters:

Name Type Description Default
config Config

the configuration to use for precomputing the dictionaries

required
readonly bool

indicate whether the provider is read-only or not

required
Source code in src/lhcbdirac_log/providers/base/providers.py
def __init__(self, config: Config, *, readonly: bool) -> None:
    """[Internal] Initialize the provider.

    Args:
        config: the configuration to use for precomputing the dictionaries
        readonly: indicate whether the provider is read-only or not
    """
    super().__init__(readonly=readonly)
    self._dicts = dict[str, E]()
    self._invalid = self._load_invalid()
    self._missing = set[str]()
    self._config = config

__iter__()

Get all the dicts names.

Returns:

Type Description
Iterator[str]

an iterator of all the dicts names

Notes
  • same as: provider.iter(False)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __iter__(self) -> Iterator[str]:
    """Get all the dicts names.

    Returns:
        an iterator of all the dicts names

    Notes:
        - same as: provider.iter(False)
    """
    return self.iter()

__len__()

Get the number of existing dicts.

Returns:

Type Description
int

the number of existing dicts

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __len__(self) -> int:
    """Get the number of existing dicts.

    Returns:
        the number of existing dicts
    """
    return sum(1 for _ in self)

add(name, data, zstd_id, *, load=True)

Add a new dict entry from data.

Parameters:

Name Type Description Default
name str

the dict name

required
data bytes

the dict data

required
zstd_id int

the zstd dictionary id

required
load bool

if True, keep the dict loaded (default is True)

True

Returns:

Type Description
E

the dict entry

Raises:

Type Description
DictExistsError

if the dict already exists

ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def add(self, name: str, data: bytes, zstd_id: int, *, load: bool = True) -> E:
    """Add a new dict entry from data.

    Args:
        name: the dict name
        data: the dict data
        zstd_id: the zstd dictionary id
        load: if True, keep the dict loaded (default is True)

    Returns:
        the dict entry

    Raises:
        DictExistsError: if the dict already exists
        ReadOnlyError: if the provider is read-only
    """
    if self._readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    d = self._add(name, data, zstd_id)
    if load:
        self._dicts[name] = d

    self._missing.discard(name)  # ensure valid state
    self._invalid.discard(name)  # ensure valid state
    return d
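
Besides storing the entry, `add` reconciles the provider's bookkeeping: a newly added dict must not stay flagged as missing or invalid. The idea, sketched with plain sets (illustrative only, not the class):

```python
# Sketch of add()'s state reconciliation (illustrative sets, not the class).
dicts, missing, invalid = {}, {"d1"}, {"d2"}

def add(name, data, *, load=True):
    if load:
        dicts[name] = data   # keep it loaded only when requested
    missing.discard(name)    # a previous failed load no longer applies
    invalid.discard(name)    # a previous failed training no longer applies
    return data
```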

clear()

Clear all the existing dicts.

Raises:

Type Description
ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def clear(self) -> None:
    """Clear all the existing dicts.

    Raises:
        ReadOnlyError: if the provider is read-only
    """
    if self._readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    for i in self:
        del self[i]

delete(name)

Delete a dict entry.

Parameters:

Name Type Description Default
name str

the dict name to delete

required

Raises:

Type Description
DictNotExistsError

if the dict does not exist

ReadOnlyError

if the provider is read-only

Notes
  • may or may not check if data are linked to the dict before deletion
  • if the dict was loaded, it becomes unloaded
  • instances of the related dict entry, if still accessible, must no longer be used or trusted
  • same as: del provider[dict_name]
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def delete(self, name: str) -> None:
    """Delete a dict entry.

    Args:
        name: the dict name to delete

    Raises:
        DictNotExistsError: if the dict does not exist
        ReadOnlyError: if the provider is read-only

    Notes:
        - may or may not check if data are linked to the dict before deletion
        - if the dict was loaded, it becomes unloaded
        - instances of the related dict entry, if still accessible, must no longer be used or trusted
        - same as: del provider[dict_name]
    """
    if self._readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    self._delete(name)
    self._dicts.pop(name, None)  # delete it from the loaded dicts if loaded

get(name, default=None, *, invalid_ok=False, missing_ok=True)

Get the dict.

Parameters:

Name Type Description Default
name str

the dict name to get

required
default E | None

the default value returned on ignored errors (default is None)

None
invalid_ok bool

if True, ignores invalid dict error and returns the default value (default is False)

False
missing_ok bool

if True, ignores missing dict error and returns the default value (default is True)

True

Returns:

Type Description
E | None

the dict entry or the default value

Raises:

Type Description
DictInvalidError

if the dict is invalid and invalid_ok is False

DictNotExistsError

if the dict does not exist and missing_ok is False

Notes
  • if invalid_ok and missing_ok are both False, then it is equivalent to provider[dict_name] (__getitem__)
  • if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
  • getting the same dict multiple times will return the same object (cached)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def get(
    self,
    name: str,
    default: E | None = None,
    *,
    invalid_ok: bool = False,
    missing_ok: bool = True,
) -> E | None:
    """Get the dict.

    Args:
        name: the dict name to get
        default: the default value returned on ignored errors (default is None)
        invalid_ok: if True, ignores invalid dict error and returns the default value (default is False)
        missing_ok: if True, ignores missing dict error and returns the default value (default is True)

    Returns:
        the dict entry or the default value

    Raises:
        DictInvalidError: if the dict is invalid and invalid_ok is False
        DictNotExistsError: if the dict does not exist and missing_ok is False

    Notes:
        - if invalid_ok and missing_ok are both False, then it is equivalent to provider[dict_name] (__getitem__)
        - if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
        - getting the same dict multiple times will return the same object (cached)
    """
    try:
        return self[name]

    except DictInvalidError:
        if not invalid_ok:
            raise

    except DictNotExistsError:
        if not missing_ok:
            raise

    return default
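
`get` is a thin tolerance wrapper around item access: each flag decides whether the corresponding error is swallowed (returning the default) or re-raised. A stand-alone sketch, with stand-in exception classes for the library's own:

```python
# Sketch of get()'s tolerance flags around item access
# (exception classes are stand-ins for the library's own).
class DictInvalidError(Exception): ...
class DictNotExistsError(Exception): ...

def tolerant_get(lookup, name, default=None, *, invalid_ok=False, missing_ok=True):
    try:
        return lookup(name)      # equivalent of provider[name]
    except DictInvalidError:
        if not invalid_ok:       # invalid dicts raise by default
            raise
    except DictNotExistsError:
        if not missing_ok:       # missing dicts are silent by default
            raise
    return default
```

With both flags False this degenerates to plain item access, matching the note above.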

is_invalid(name)

Check if the dict is marked as invalid.

Parameters:

Name Type Description Default
name str

the dict name

required

Returns:

Type Description
bool

True if the dict is invalid, False otherwise

Notes
  • Invalid dicts are dicts that failed to train
  • These dicts may be marked only when a training was attempted
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def is_invalid(self, name: str) -> bool:
    """Check if the dict is marked as invalid.

    Args:
        name: the dict name

    Returns:
        True if the dict is invalid, False otherwise

    Notes:
        - Invalid dicts are dicts that failed to train
        - These dicts may be marked only when a training was attempted
    """
    return name in self._invalid

is_loaded(name)

Check if the dict is loaded (exists in cache).

Parameters:

Name Type Description Default
name str

the dict name

required

Returns:

Type Description
bool

True if the dict is loaded, False otherwise

Notes
  • same as: 'dict_name' in provider (__contains__)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def is_loaded(self, name: str) -> bool:
    """Check if the dict is loaded (exists in cache).

    Args:
        name: the dict name

    Returns:
        True if the dict is loaded, False otherwise

    Notes:
        - same as: 'dict_name' in provider (__contains__)
    """
    return name in self

is_missing(name)

Check if the dict is marked as missing.

Parameters:

Name Type Description Default
name str

the dict name

required

Returns:

Type Description
bool

True if the dict is missing, False otherwise

Notes
  • Missing dicts are dicts that failed to load
  • These dicts are marked only when loading was attempted
  • Must not be confused with the negation of is_loaded
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def is_missing(self, name: str) -> bool:
    """Check if the dict is marked as missing.

    Args:
        name: the dict name

    Returns:
        True if the dict is missing, False otherwise

    Notes:
        - Missing dicts are dicts that failed to load
        - These dicts are marked only when loading was attempted
        - Must not be confused with the negation of is_loaded
    """
    return name in self._missing
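
`is_loaded` and `is_missing` track independent markers: a dict can be loaded (in cache), missing (a load was attempted and failed), or neither (never accessed). A minimal illustration with plain sets (illustrative names only):

```python
# Loaded vs missing are independent markers (illustrative only).
loaded = {"log_data_xxxx.xml"}   # dicts currently in cache
missing = {"stdout_xxxx.xml"}    # dicts whose load already failed

def is_loaded(name):  return name in loaded
def is_missing(name): return name in missing

# "not loaded" does not imply "missing": an untouched dict is neither.
```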

iter(*, loaded_only=False)

Get the dicts names.

Parameters:

Name Type Description Default
loaded_only bool

if True, only returns the loaded dicts names, otherwise returns all loadable (default False)

False

Returns:

Type Description
Generator[str, None, None]

a generator of the dicts names

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def iter(self, *, loaded_only: bool = False) -> Generator[str, None, None]:
    """Get the dicts names.

    Args:
        loaded_only: if True, only returns the loaded dicts names, otherwise returns all loadable (default False)

    Returns:
        a generator of the dicts names
    """
    return self._iter_loaded() if loaded_only else self._iter_all()

mark_invalid(name)

Mark a dict as invalid.

Parameters:

Name Type Description Default
name str

the dict name

required

Raises:

Type Description
DictExistsError

if the dict is already loaded

Notes
  • unloads the dict if it is loaded but no longer exists (should never happen under normal usage)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def mark_invalid(self, name: str) -> None:
    """Mark a dict as invalid.

    Args:
        name: the dict name

    Raises:
        DictExistsError: if the dict is already loaded

    Notes:
        - unloads the dict if it is loaded but no longer exists (should never happen under normal usage)
    """
    if self.is_loaded(name):
        if self[name].exists:
            raise DictExistsError(name)
        del self._dicts[name]

    if self.is_missing(name):
        self._missing.discard(name)

    self._mark_invalid(name)
    self._invalid.add(name)

mark_missing(name)

Mark a dict as missing.

Parameters:

Name Type Description Default
name str

the dict name

required

Raises:

Type Description
DictExistsError

if the dict is already loaded

DictInvalidError

if the dict is already marked as invalid

Notes
  • unloads the dict if it is loaded but no longer exists (should never happen under normal usage)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def mark_missing(self, name: str) -> None:
    """Mark a dict as missing.

    Args:
        name: the dict name

    Raises:
        DictExistsError: if the dict is already loaded
        DictInvalidError: if the dict is already marked as invalid

    Notes:
        - unloads the dict if it is loaded but no longer exists (should never happen under normal usage)
    """
    if self.is_loaded(name):
        if self[name].exists:
            raise DictExistsError(name)
        del self._dicts[name]

    if self.is_invalid(name):
        raise DictInvalidError(name)

    self._missing.add(name)

transfer(target)

Transfer (copy) all dicts to another provider.

Parameters:

Name Type Description Default
target DictProvider | None

the target provider

required

Returns:

Type Description
int

the number of transferred dicts

Raises:

Type Description
ValueError

if the target provider is not specified (None)

DictExistsError

if a copied dict already exists in the target provider

ReadOnlyError

if the target provider is read-only

Notes
  • Nothing is transferred if the target provider is the same as the source provider
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def transfer(self, target: DictProvider | None) -> int:
    """Transfer (copy) all dicts to another provider.

    Args:
        target: the target provider

    Returns:
        the number of transferred dicts

    Raises:
        ValueError: if the target provider is not specified (None)
        DictExistsError: if a copied dict already exists in the target provider
        ReadOnlyError: if the target provider is read-only

    Notes:
        - Nothing is transferred if the target provider is the same as the source provider
    """
    if self is target:
        return 0

    if target is None:
        msg = "The target provider must be specified"
        raise ValueError(msg)

    if target.readonly:
        msg = "The target provider is read-only"
        raise ReadOnlyError(msg)

    for i in self._invalid:
        target.mark_invalid(i)

    n = -1
    for n, i in enumerate(self):
        if (d := self.get(i)) is not None:
            target.add(i, d.data, d.zstd_id, load=False)

    return n + 1
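
The `n = -1` / `enumerate` idiom in the loop above counts the iterated names even when the iterable is empty, since `n` keeps its sentinel value and `n + 1` is 0. A stand-alone sketch of the counting idiom (`copy_all` and `copy_one` are illustrative names):

```python
# Counting idiom used by transfer(): n + 1 equals the number of
# names iterated, including zero (illustrative reimplementation).
def copy_all(names, copy_one):
    n = -1
    for n, name in enumerate(names):
        copy_one(name)
    return n + 1  # 0 when names is empty
```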

JobEntry

Bases: Entry, ABC

Base job entry class, providing an abstraction layer for job access.

This represents a job entry, managing the job's data entries. Job metadata can be accessed on uncompressed jobs. For compressed jobs, the implementation must handle metadata saving and loading.

Source code in src/lhcbdirac_log/providers/base/accessors.py
class JobEntry[E: DataEntry](Entry, ABC):
    """Base job entry class, provide an abstraction layer for job access.

    This represents a job entry, managing the job's data entries.
    Job metadata can be accessed on uncompressed jobs.
    For compressed jobs, the implementation must handle metadata saving and loading.
    """

    __slots__ = (
        "_compressed",
        "_job",
        "_info",
    )

    def __init__(self, job: int, *, compressed: bool, readonly: bool) -> None:
        """[Internal] Initialize the job entry.

        Args:
            job: the job id
            compressed: indicate whether the underlying data is compressed or not (in Zstandard)
            readonly: indicate whether the job is read-only or not
        """
        super().__init__(readonly=readonly)

        self._info = None
        self._job = job
        self._compressed = compressed

    @property
    @final
    def job(self) -> int:
        """Get the job id.

        Returns:
            the job id
        """
        return self._job

    @property
    @final
    def compressed(self) -> bool:
        """Check if the underlying data is compressed or not (in Zstandard).

        Returns:
            True if the underlying data is compressed, False otherwise
        """
        return self._compressed

    @property
    def empty(self) -> bool:
        """Check if the job is empty.

        Returns:
            True if the job is empty, False otherwise
        """
        return not any(True for _ in self.files())

    @property
    def info(self) -> JobInfo:
        """Get the job's metadata.

        Returns:
            the job's metadata

        Notes:
            - call update_info to save changes if any
        """
        if self._info is None:
            self._info = self._load_info()

        return self._info

    @staticmethod
    def _readlines(f: BinaryIO, lines: list[bytes], n: int = 1024) -> bool:
        """[Internal] Read lines from a file.

        Args:
            f: the file to read from
            lines: the list to store the lines in
            n: the number of bytes to read at once

        Returns:
            False if EOF is reached, True otherwise
        """
        read = True
        while len(lines) <= 1 and read:
            data = f.read(n)

            if len(data) < n:  # EOF
                read = False

            st, *end = data.splitlines()
            if not lines or lines[-1].endswith(b"\n"):
                lines.append(st)
            else:
                lines[-1] += st
            lines.extend(end)

        return read

    @staticmethod
    def _sub_id(x: str) -> int:
        """[Internal] Get the sub id from the file name.

        Args:
            x: the file name

        Returns:
            the sub id
        """
        try:
            return int(x.rsplit("_", 1)[-1][:-4])
        except ValueError:  # pragma: no cover
            return 0

    def _get_dirac_id(self) -> int | None:
        """[Internal] Get the job's DIRAC ID."""
        try:
            with self.get("job.info").reader() as f:
                read = True
                lines: list[bytes] = []

                while read:
                    read = self._readlines(f, lines)

                    while len(lines) > read:
                        line = lines.pop(0)

                        if line.startswith(b"/JobID"):
                            return int(line.split(b"=")[1].strip())
        except (DataNotExistsError, ValueError):
            pass

    def _get_success(self) -> bool | None:
        """[Internal] Get the job's success status.

        Returns:
            the job's success status
        """
        try:
            file = max((i for i in self.files() if i.startswith("summary") and i.endswith(".xml")), key=self._sub_id)

            with self.get(file).reader() as f:
                read = True
                lines = []

                while read:
                    read = self._readlines(f, lines)

                    while len(lines) > read:
                        line = lines.pop(0)

                        try:
                            return line[line.index(b"<success>") + 9] in b"Tt"
                        except ValueError:
                            continue

        except (DataNotExistsError, ValueError):
            pass
        return None

    def _load_info(self) -> JobInfo:
        """[Internal] Load the job's metadata.

        Returns:
            the job's metadata
        """
        info = JobInfo(None, None)

        if not self._compressed:
            info.dirac_id = self._get_dirac_id()
            info.success = self._get_success()

        return info

    @abstractmethod
    def _update_info(self) -> None:
        """[Internal] Update the job's metadata.

        Notes:
            - implementation not handling saving metadata can leave this method empty
        """

    @final
    def update_info(self, info: JobInfo | None = None) -> None:
        """Update the job's metadata, for compressed job.

        Args:
            info: new metadata to copy from and save (or None to save the current metadata)

        Raises:
            RuntimeError: if called on non-compressed job
            ReadOnlyError: if the job is read-only

        Notes:
            - for a non-compressed job, the metadata is loaded directly from the job's data
            - for a compressed job, the metadata is not directly accessible, and so must be saved by the provider (if support is intended)
        """
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        if not self._compressed:
            msg = f"Job '{self._job}' is not compressed"
            raise RuntimeError(msg)

        if info is None:
            if self._info is None:
                self._info = self._load_info()
        elif self._info is None:  # copy the info
            self._info = JobInfo(info.dirac_id, info.success)
        else:  # transfer the info
            self._info.dirac_id = info.dirac_id
            self._info.success = info.success

        self._update_info()

    @abstractmethod
    def _get(self, name: str, *, create: bool = False) -> E:
        """[Internal] Get a data entry.

        Args:
            name: the data name
            create: if True, create the data if it does not exist (default is False)

        Returns:
            the data entry

        Raises:
            DataNotExistsError: if the data does not exist and create is False

        Notes:
            - the entry, if newly created, will not exist until data is written
        """

    @final
    def get(self, name: str, *, create: bool = False) -> E:
        """Get a data entry.

        Args:
            name: the data name
            create: if True, create the data if it does not exist (default is False)

        Returns:
            the data entry

        Raises:
            DataNotExistsError: if the data does not exist and create is False
            ReadOnlyError: if the job is read-only and create is True

        Notes:
            - the entry, if newly created, will not exist until data is written
        """
        if create and self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        return self._get(name, create=create)

    @abstractmethod
    def _create(self, name: str, *, exists_ok: bool = False) -> E:
        """[Internal] Create a data entry.

        Args:
            name: the data name
            exists_ok: if True, ignore the error if the data already exists (default is False)

        Returns:
            the data entry

        Raises:
            DataExistsError: if the data already exists and exists_ok is False

        Notes:
            - the entry, if newly created, will not exist until data is written
        """

    @final
    def create(self, name: str, *, exists_ok: bool = False) -> E:
        """Create a data entry.

        Args:
            name: the data name
            exists_ok: if True, ignore the error if the data already exists (default is False)

        Returns:
            the data entry

        Raises:
            DataExistsError: if the data already exists and exists_ok is False
            ReadOnlyError: if the job is read-only

        Notes:
            - the entry, if newly created, will not exist until data is written
        """
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        return self._create(name, exists_ok=exists_ok)

    def delete(self, name: str) -> None:
        """Delete a data entry.

        Args:
            name: the data name

        Raises:
            DataNotExistsError: if the data does not exist
            ReadOnlyError: if the job is read-only
        """
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        self.get(name).delete()

    def clear(self) -> None:
        """Clear all data entries.

        Raises:
            ReadOnlyError: if the job is read-only
        """
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        for i in self:
            i.delete()

    @abstractmethod
    def files(self) -> Generator[str, None, None]:
        """Get a generator of the job's files.

        Returns:
            a generator of the job's file names
        """

    @property
    def data_size(self) -> int:
        """Get all stored data size.

        Returns:
            the sum of all the job's data sizes

        Notes:
            - see DataEntry.size for more details
        """
        return sum(i.size for i in self)

    @property
    def job_size(self) -> int:
        """Get the stored job size.

        Returns:
            the stored job size

        Notes:
            - may include additional overheads compared to data_size
            - may represent better the 'on-disk' size
            - may not be necessarily upper or lower than data_size
        """
        return self.data_size

    @final
    def __iter__(self) -> Iterator[E]:
        """Iterate over all the job's data entries.

        Returns:
            a generator of the job's data entries
        """
        return (self.get(i) for i in self.files())

    def __len__(self) -> int:
        """Get the number of data entries.

        Returns:
            the number of data entries
        """
        return sum(1 for _ in self.files())
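
The private parsing helpers in the listing above can be sketched as stand-alone functions: chunked line extraction (the idea behind `_readlines`, here simplified with an explicit carry for partial lines rather than the library's exact implementation), the numeric-suffix rule of `_sub_id`, and the `<success>` byte check from `_get_success`:

```python
import io

# Simplified stand-alone sketches of the private parsing helpers
# (not the library's exact implementations).

def read_lines_chunked(f, n=1024):
    """Yield complete lines from a binary stream, reading n bytes at a time."""
    carry = b""
    while True:
        data = f.read(n)
        if not data:                 # EOF: flush the trailing partial line
            if carry:
                yield carry
            return
        carry += data
        *whole, carry = carry.split(b"\n")  # last piece may be incomplete
        yield from whole

def sub_id(filename):
    """Numeric suffix before a 4-char extension, e.g. 'summary_0003.xml' -> 3."""
    try:
        return int(filename.rsplit("_", 1)[-1][:-4])
    except ValueError:
        return 0

def parse_success(line):
    """Byte right after the 9-character b'<success>' tag: 'T'/'t' means True."""
    try:
        return line[line.index(b"<success>") + 9] in b"Tt"
    except ValueError:
        return None  # tag not present on this line
```

`_get_dirac_id` applies the same chunked scan to `job.info` looking for a `/JobID = ...` line, and `_get_success` applies it to the highest-numbered `summary_*.xml` file.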

compressed: bool property

Check if the underlying data is compressed or not (in Zstandard).

Returns:

Type Description
bool

True if the underlying data is compressed, False otherwise

data_size: int property

Get all stored data size.

Returns:

Type Description
int

the sum of all the job's data sizes

Notes
  • see DataEntry.size for more details

empty: bool property

Check if the job is empty.

Returns:

Type Description
bool

True if the job is empty, False otherwise

info: JobInfo property

Get the job's metadata.

Returns:

Type Description
JobInfo

the job's metadata

Notes
  • call update_info to save changes if any

job: int property

Get the job id.

Returns:

Type Description
int

the job id

job_size: int property

Get the stored job size.

Returns:

Type Description
int

the stored job size

Notes
  • may include additional overheads compared to data_size
  • may represent better the 'on-disk' size
  • may not be necessarily upper or lower than data_size

__init__(job, *, compressed, readonly)

[Internal] Initialize the job entry.

Parameters:

Name Type Description Default
job int

the job id

required
compressed bool

indicate whether the underlying data is compressed or not (in Zstandard)

required
readonly bool

indicate whether the job is read-only or not

required
Source code in src/lhcbdirac_log/providers/base/accessors.py
def __init__(self, job: int, *, compressed: bool, readonly: bool) -> None:
    """[Internal] Initialize the job entry.

    Args:
        job: the job id
        compressed: indicate whether the underlying data is compressed or not (in Zstandard)
        readonly: indicate whether the job is read-only or not
    """
    super().__init__(readonly=readonly)

    self._info = None
    self._job = job
    self._compressed = compressed

__iter__()

Iterate over all the job's data entries.

Returns:

Type Description
Iterator[E]

a generator of the job's data entries

Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def __iter__(self) -> Iterator[E]:
    """Iterate over all the job's data entries.

    Returns:
        a generator of the job's data entries
    """
    return (self.get(i) for i in self.files())

__len__()

Get the number of data entries.

Returns:

Type Description
int

the number of data entries

Source code in src/lhcbdirac_log/providers/base/accessors.py
def __len__(self) -> int:
    """Get the number of data entries.

    Returns:
        the number of data entries
    """
    return sum(1 for _ in self.files())

clear()

Clear all data entries.

Raises:

Type Description
ReadOnlyError

if the job is read-only

Source code in src/lhcbdirac_log/providers/base/accessors.py
def clear(self) -> None:
    """Clear all data entries.

    Raises:
        ReadOnlyError: if the job is read-only
    """
    if self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    for i in self:
        i.delete()
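The read-only guard used by `clear()` (and the other mutating methods) can be sketched with a hypothetical in-memory job; the job id `42` and the set of entries are illustrative only:

```python
class ReadOnlyError(Exception):
    """Raised when a write operation is attempted on a read-only object."""


class ToyJob:
    """Hypothetical job entry showing the read-only guard of clear()."""

    def __init__(self, *, readonly: bool) -> None:
        self._job = 42  # illustrative job id
        self._readonly = readonly
        self._entries = {"log.txt", "stdout.xml"}

    def clear(self) -> None:
        # same guard pattern as the library: check, then mutate
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)
        self._entries.clear()


writable = ToyJob(readonly=False)
writable.clear()
print(len(writable._entries))  # → 0

try:
    ToyJob(readonly=True).clear()
except ReadOnlyError as e:
    print(e)  # → Job '42' is read-only
```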

create(name, *, exists_ok=False)

Create a data entry.

Parameters:

Name Type Description Default
name str

the data name

required
exists_ok bool

if True, ignore the error if the data already exists (default is False)

False

Returns:

Type Description
E

the data entry

Raises:

Type Description
DataExistsError

if the data already exists and exists_ok is False

ReadOnlyError

if the job is read-only

Notes
  • the entry, if newly created, will not exist until data is written
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def create(self, name: str, *, exists_ok: bool = False) -> E:
    """Create a data entry.

    Args:
        name: the data name
        exists_ok: if True, ignore the error if the data already exists (default is False)

    Returns:
        the data entry

    Raises:
        DataExistsError: if the data already exists and exists_ok is False
        ReadOnlyError: if the job is read-only

    Notes:
        - the entry, if newly created, will not exist until data is written
    """
    if self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    return self._create(name, exists_ok=exists_ok)
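The `exists_ok` semantics can be sketched with a hypothetical dict-backed job (the dict backing and the `str` return value are stand-ins; the real `_create` returns a data entry object):

```python
class DataExistsError(Exception):
    """Raised when a data entry already exists."""


class ToyJob:
    """Hypothetical dict-backed job illustrating create()'s exists_ok flag."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}

    def create(self, name: str, *, exists_ok: bool = False) -> str:
        if name in self._entries and not exists_ok:
            raise DataExistsError(name)
        # keep any existing payload; register an empty one otherwise
        self._entries.setdefault(name, b"")
        return name  # a real provider would return a data entry object


job = ToyJob()
job.create("log.txt")
job.create("log.txt", exists_ok=True)  # silently returns the existing entry

try:
    job.create("log.txt")
except DataExistsError as e:
    print(f"already exists: {e}")  # → already exists: log.txt
```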

delete(name)

Delete a data entry.

Parameters:

Name Type Description Default
name str

the data name

required

Raises:

Type Description
DataNotExistsError

if the data does not exist

ReadOnlyError

if the job is read-only

Source code in src/lhcbdirac_log/providers/base/accessors.py
def delete(self, name: str) -> None:
    """Delete a data entry.

    Args:
        name: the data name

    Raises:
        DataNotExistsError: if the data does not exist
        ReadOnlyError: if the job is read-only
    """
    if self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    self.get(name).delete()

files() abstractmethod

Get a generator of the job's files.

Returns:

Type Description
Generator[str, None, None]

a generator of the job's file names

Source code in src/lhcbdirac_log/providers/base/accessors.py
@abstractmethod
def files(self) -> Generator[str, None, None]:
    """Get a generator of the job's files.

    Returns:
        a generator of the job's file names
    """

get(name, *, create=False)

Get a data entry.

Parameters:

Name Type Description Default
name str

the data name

required
create bool

if True, create the data if it does not exist (default is False)

False

Returns:

Type Description
E

the data entry

Raises:

Type Description
DataNotExistsError

if the data does not exist and create is False

ReadOnlyError

if the job is read-only and create is True

Notes
  • the entry, if newly created, will not exist until data is written
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def get(self, name: str, *, create: bool = False) -> E:
    """Get a data entry.

    Args:
        name: the data name
        create: if True, create the data if it does not exist (default is False)

    Returns:
        the data entry

    Raises:
        DataNotExistsError: if the data does not exist and create is False
        ReadOnlyError: if the job is read-only and create is True

    Notes:
        - the entry, if newly created, will not exist until data is written
    """
    if create and self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    return self._get(name, create=create)
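The interplay between `DataNotExistsError` and the `create` flag can be sketched with a hypothetical dict-backed job (again, the `str` return value stands in for a real data entry object):

```python
class DataNotExistsError(Exception):
    """Raised when a data entry does not exist."""


class ToyJob:
    """Hypothetical dict-backed job illustrating get()'s create flag."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}

    def get(self, name: str, *, create: bool = False) -> str:
        if name not in self._entries:
            if not create:
                raise DataNotExistsError(name)
            self._entries[name] = b""  # newly created, still empty
        return name  # a real provider would return a data entry object


job = ToyJob()
try:
    job.get("log.txt")
except DataNotExistsError:
    print("missing")  # → missing

print(job.get("log.txt", create=True))  # → log.txt
```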

update_info(info=None)

Update the job's metadata, for compressed jobs.

Parameters:

Name Type Description Default
info JobInfo | None

new metadata to copy from and save (or None to save the current metadata)

None

Raises:

Type Description
RuntimeError

if called on a non-compressed job

ReadOnlyError

if the job is read-only

Notes
  • for a non-compressed job, the metadata is loaded directly from the job's data
  • for a compressed job, the metadata is not otherwise accessible, so it must be saved by the provider (if support is intended)
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def update_info(self, info: JobInfo | None = None) -> None:
    """Update the job's metadata, for compressed job.

    Args:
        info: new metadata to copy from and save (or None to save the current metadata)

    Raises:
        RuntimeError: if called on a non-compressed job
        ReadOnlyError: if the job is read-only

    Notes:
        - for a non-compressed job, the metadata is loaded directly from the job's data
        - for a compressed job, the metadata is not otherwise accessible, so it must be saved by the provider (if support is intended)
    """
    if self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    if not self._compressed:
        msg = f"Job '{self._job}' is not compressed"
        raise RuntimeError(msg)

    if info is None:
        if self._info is None:
            self._info = self._load_info()
    elif self._info is None:  # copy the info
        self._info = JobInfo(info.dirac_id, info.success)
    else:  # transfer the info
        self._info.dirac_id = info.dirac_id
        self._info.success = info.success

    self._update_info()

JobExistsError

Bases: Exception

Raised when a job already exists.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class JobExistsError(Exception):
    """Raised when a job already exists."""

JobInfo dataclass

Metadata for a job entry.

Unset attributes are set to None.

Attributes:

Name Type Description
dirac_id int | None

the job dirac id

success bool | None

the job success status

Source code in src/lhcbdirac_log/providers/base/accessors.py
@dataclass
class JobInfo:
    """Metadata for a job entry.

    Unset attributes are set to None.

    Attributes:
        dirac_id: the job dirac id
        success: the job success status
    """

    __slots__ = (
        "dirac_id",
        "success",
    )

    dirac_id: int | None
    success: bool | None

JobNotExistsError

Bases: Exception

Raised when a job does not exist.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class JobNotExistsError(Exception):
    """Raised when a job does not exist."""

ReadOnlyError

Bases: Exception

Raised when a write operation is attempted on a read-only object.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class ReadOnlyError(Exception):
    """Raised when a write operation is attempted on a read-only object."""