
Base Provider

Base package for the data provider layer of the library.

This module provides the base classes and exceptions for the data provider layer implementations.

Description

Productions/subproductions can be accessed through a data provider. From data providers, job entries can be accessed, created, updated, and deleted. From job entries, data entries can be accessed, created, updated, and deleted. From data entries, data can be read/written.

E.g.: Production 00001234/0000:
    - Job n°00000042:
        - Data log_data_0001.xml
        - Data job.info
        ...
    - Job 00001200:
        - Data log_data_0042.xml
        - Data log.txt
        - Data stdout.xml
        ...

Dict providers are used to store Zstandard dictionaries, used for both compression and decompression. Dict entries are used to access dictionary data, identified by a unique name. The name of a dictionary is the part common to all related files, i.e. the filenames without their numbers.

E.g.: - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.
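The exact filename-to-dictionary-name mapping is implementation-defined; a minimal sketch of the idea, assuming digits are simply masked with `x` (the function name here is illustrative, not the library's API), could look like:

```python
import re

def filename_to_dictname(filename: str) -> str:
    """Hypothetical sketch: mask every digit so numbered files share one dictionary name."""
    return re.sub(r"\d", "x", filename)

# Files differing only by their numbers map to the same dictionary name:
print(filename_to_dictname("log_data_0001.xml"))  # log_data_xxxx.xml
print(filename_to_dictname("log_data_0042.xml"))  # log_data_xxxx.xml
```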

When a provider is read-only, it is not possible to create, update, or delete anything. When a data provider is compressed (assigned to a dictionary provider), underlying data is compressed with Zstandard and the dictionary provider contains the dictionaries, used for both compression and decompression.

Some data provider implementations may not implement the related dict provider, supporting only the old "uncompressed" format. On the other hand, some implementations may only support access to the new "compressed" format (Zstandard). Some implementations may only support the read-only mode.

The "compressed" status, indicating whether the underlying data are compressed in Zstandard or stored in a raw uncompressed format, is only an indication. Whatever its type, a data provider always reads/writes data as-is without any processing, assuming data are already provided in the correct compressed/uncompressed state.
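The provider → job → data layering described above can be sketched with a toy in-memory model (class names like `MemoryProvider` are illustrative stand-ins, not the library's classes):

```python
# Toy in-memory sketch of the provider -> job -> data layering (illustrative only).
class MemoryJob:
    def __init__(self, job_id: int) -> None:
        self.job_id = job_id
        self.data: dict[str, bytes] = {}  # data name -> raw bytes

class MemoryProvider:
    def __init__(self) -> None:
        self.jobs: dict[int, MemoryJob] = {}  # job id -> job entry

    def create(self, job_id: int) -> MemoryJob:
        # Real providers raise JobExistsError / ReadOnlyError; the toy just upserts.
        return self.jobs.setdefault(job_id, MemoryJob(job_id))

# Navigate the hierarchy: provider -> job entry -> data entry -> bytes.
provider = MemoryProvider()
job = provider.create(42)
job.data["log_data_0001.xml"] = b"<log/>"
print(provider.jobs[42].data["log_data_0001.xml"])  # b'<log/>'
```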

Classes:

Name Description
- DataProvider

Base class for data provider implementations.

- DictProvider

Base class for Zstandard dictionary provider implementations.

- DataEntry

Base class for data entry objects.

- DictEntry

Base class for Zstandard dictionary entry objects.

- JobEntry

Base class for job entry objects.

- JobInfo

Base class for job information objects.

Raises:

Type Description
- DataExistsError

Exception raised when data already exists in the provider.

- DataNotExistsError

Exception raised when data does not exist in the provider.

- DictExistsError

Exception raised when a dictionary already exists in the provider.

- DictInvalidError

Exception raised when a dictionary is invalid.

- DictNotExistsError

Exception raised when a dictionary does not exist in the provider.

- JobExistsError

Exception raised when a job already exists in the provider.

- JobNotExistsError

Exception raised when a job does not exist in the provider.

- ReadOnlyError

Exception raised when a provider is read-only.
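A typical pattern when writing through a possibly read-only provider is to catch these errors explicitly. A sketch with stand-in classes (the real exceptions live in the library's base exceptions module; `ToyProvider` is hypothetical):

```python
class ReadOnlyError(Exception):
    """Stand-in for the library's exception, raised when a provider is read-only."""

class ToyProvider:
    def __init__(self, *, readonly: bool) -> None:
        self.readonly = readonly

    def create(self, job: int) -> int:
        # Mirrors the guard used by the real create(): refuse writes when read-only.
        if self.readonly:
            raise ReadOnlyError("The provider is read-only")
        return job

provider = ToyProvider(readonly=True)
try:
    provider.create(42)
except ReadOnlyError as e:
    print(e)  # The provider is read-only
```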

DataEntry

Bases: NamedEntry, ABC

Base data entry class, providing an abstraction layer for data access / IO operations.

This represents a data entry through a data provider; the entry's data can be read/written through file-like objects obtained from the reader/writer methods.

Source code in src/lhcbdirac_log/providers/base/accessors.py
class DataEntry(NamedEntry, ABC):
    """Base data entry class, providing an abstraction layer for data access / IO operations.

    This represents a data entry through a data provider;
    the entry's data can be read/written through file-like objects obtained from the reader/writer methods.
    """

    __slots__ = (
        "_compressed",
        "_job",
    )

    def __init__(self, name: str, job: int, *, compressed: bool, readonly: bool) -> None:
        """[Internal] Initialize the data entry.

        Args:
            name: the data name
            job: the job id
            compressed: indicate that the underlying data is compressed (in Zstandard)
            readonly: indicate whether the data is read-only or not

        Notes:
            - instantiation alone has no effects on the provider, the data will be created on first write
        """
        self._job = job
        self._compressed = compressed
        super().__init__(name, readonly=readonly)

    @property
    @final
    def compressed(self) -> bool:
        """Check if the underlying data is compressed or not (in Zstandard).

        Returns:
            True if the underlying data is compressed, False otherwise
        """
        return self._compressed

    @property
    @final
    def size(self) -> int:
        """Get the stored data size.

        This is the size of the stored data, i.e. the same as the data read from the reader.

        Returns:
            the data size, or 0 if the entry does not exist

        Notes:
            - zero-size may indicate that the data exists but is empty
            - compressed size for compressed entry, uncompressed size for uncompressed entry
        """
        return self._size() or 0

    @property
    @final
    def exists(self) -> bool:
        """Check if the data exists.

        Returns:
            True if the data exists, False otherwise
        """
        return self._size() is not None

    @property
    @final
    def job(self) -> int:
        """Get the job id.

        Returns:
            the job id
        """
        return self._job

    @property
    @override
    @final
    def dict_name(self) -> str:
        """Get the dict name.

        Returns:
            the dict name
        """
        return self.filename_to_dictname(self._name)

    @abstractmethod
    def _reader(self) -> BinaryIO:
        """[Internal] Get a data reader as a file-like object.

        Returns:
            a data reader

        Notes:
            - the caller is responsible for the reader's lifecycle
            - each call returns a new reader
            - may not support concurrent readers and / or writers
        """

    @final
    def reader(self) -> BinaryIO:
        """Get a data reader as a file-like object.

        Returns:
            a data reader

        Raises:
            DataNotExistsError: if the data does not exist

        Notes:
            - the caller is responsible for the reader's lifecycle
            - each call returns a new reader
            - may not support concurrent readers and / or writers
        """
        if not self.exists:
            raise DataNotExistsError(self._name)

        return self._reader()

    @abstractmethod
    def _writer(self) -> BinaryIO:
        """[Internal] Get a data writer as a file-like object.

        Returns:
            a data writer

        Notes:
            - the caller is responsible for the writer's lifecycle
            - each call returns a new writer
            - may not support concurrent readers and / or writers
        """

    @final
    def writer(self) -> BinaryIO:
        """Get a data writer as a file-like object.

        Returns:
            a data writer

        Raises:
            ReadOnlyError: if the data is read-only

        Notes:
            - the caller is responsible for the writer's lifecycle
            - each call returns a new writer
            - may not support concurrent readers and / or writers
        """
        if self._readonly:
            msg = f"Data '{self._name}' is read-only"
            raise ReadOnlyError(msg)

        return self._writer()

    @abstractmethod
    def _size(self) -> int | None:
        """[Internal] Get the stored data size.

        Returns:
            the stored data size or None if the data does not exist
        """

    @abstractmethod
    def _delete(self) -> None:
        """[Internal] Delete the data.

        Raises:
            DataNotExistsError: if the data does not exist
        """

    @final
    def delete(self) -> None:
        """Delete the data.

        Raises:
            DataNotExistsError: if the data does not exist
            ReadOnlyError: if the data is read-only
        """
        if self._readonly:
            msg = f"Data '{self._name}' is read-only"
            raise ReadOnlyError(msg)

        self._delete()
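With a concrete subclass, the flow is: open a writer, write the (already compressed or uncompressed) bytes, close it, then open a reader. A minimal in-memory stand-in mimicking the DataEntry contract (hypothetical, not a provided implementation) could be:

```python
from io import BytesIO

class MemoryDataEntry:
    """Toy stand-in mimicking the DataEntry contract (not the real base class)."""

    def __init__(self, name: str, job: int) -> None:
        self.name, self.job = name, job
        self._buf: bytes | None = None  # None means "does not exist"

    @property
    def exists(self) -> bool:
        return self._buf is not None

    @property
    def size(self) -> int:
        return len(self._buf) if self._buf is not None else 0

    def writer(self) -> BytesIO:
        entry = self

        class _Writer(BytesIO):
            def close(self) -> None:  # persist on close, as a real writer would
                entry._buf = self.getvalue()
                super().close()

        return _Writer()

    def reader(self) -> BytesIO:
        if self._buf is None:
            raise FileNotFoundError(self.name)  # the real code raises DataNotExistsError
        return BytesIO(self._buf)

entry = MemoryDataEntry("log.txt", job=42)
with entry.writer() as w:      # data is created on first write
    w.write(b"hello")
with entry.reader() as r:      # each call returns a new reader
    print(r.read())  # b'hello'
print(entry.size)  # 5
```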

compressed: bool property

Check if the underlying data is compressed or not (in Zstandard).

Returns:

Type Description
bool

True if the underlying data is compressed, False otherwise

dict_name: str property

Get the dict name.

Returns:

Type Description
str

the dict name

exists: bool property

Check if the data exists.

Returns:

Type Description
bool

True if the data exists, False otherwise

job: int property

Get the job id.

Returns:

Type Description
int

the job id

size: int property

Get the stored data size.

This is the size of the stored data, i.e. the same as the data read from the reader.

Returns:

Type Description
int

the data size, or 0 if the entry does not exist

Notes
  • zero-size may indicate that the data exists but is empty
  • compressed size for compressed entry, uncompressed size for uncompressed entry

__init__(name, job, *, compressed, readonly)

[Internal] Initialize the data entry.

Parameters:

Name Type Description Default
name str

the data name

required
job int

the job id

required
compressed bool

indicate that the underlying data is compressed (in Zstandard)

required
readonly bool

indicate whether the data is read-only or not

required
Notes
  • instantiation alone has no effects on the provider, the data will be created on first write
Source code in src/lhcbdirac_log/providers/base/accessors.py
def __init__(self, name: str, job: int, *, compressed: bool, readonly: bool) -> None:
    """[Internal] Initialize the data entry.

    Args:
        name: the data name
        job: the job id
        compressed: indicate that the underlying data is compressed (in Zstandard)
        readonly: indicate whether the data is read-only or not

    Notes:
        - instantiation alone has no effects on the provider, the data will be created on first write
    """
    self._job = job
    self._compressed = compressed
    super().__init__(name, readonly=readonly)

delete()

Delete the data.

Raises:

Type Description
DataNotExistsError

if the data does not exist

ReadOnlyError

if the data is read-only

Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def delete(self) -> None:
    """Delete the data.

    Raises:
        DataNotExistsError: if the data does not exist
        ReadOnlyError: if the data is read-only
    """
    if self._readonly:
        msg = f"Data '{self._name}' is read-only"
        raise ReadOnlyError(msg)

    self._delete()

reader()

Get a data reader as a file-like object.

Returns:

Type Description
BinaryIO

a data reader

Raises:

Type Description
DataNotExistsError

if the data does not exist

Notes
  • the caller is responsible for the reader's lifecycle
  • each call returns a new reader
  • may not support concurrent readers and / or writers
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def reader(self) -> BinaryIO:
    """Get a data reader as a file-like object.

    Returns:
        a data reader

    Raises:
        DataNotExistsError: if the data does not exist

    Notes:
        - the caller is responsible for the reader's lifecycle
        - each call returns a new reader
        - may not support concurrent readers and / or writers
    """
    if not self.exists:
        raise DataNotExistsError(self._name)

    return self._reader()

writer()

Get a data writer as a file-like object.

Returns:

Type Description
BinaryIO

a data writer

Raises:

Type Description
ReadOnlyError

if the data is read-only

Notes
  • the caller is responsible for the writer's lifecycle
  • each call returns a new writer
  • may not support concurrent readers and / or writers
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def writer(self) -> BinaryIO:
    """Get a data writer as a file-like object.

    Returns:
        a data writer

    Raises:
        ReadOnlyError: if the data is read-only

    Notes:
        - the caller is responsible for the writer's lifecycle
        - each call returns a new writer
        - may not support concurrent readers and / or writers
    """
    if self._readonly:
        msg = f"Data '{self._name}' is read-only"
        raise ReadOnlyError(msg)

    return self._writer()

DataExistsError

Bases: Exception

Raised when data already exists.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DataExistsError(Exception):
    """Raised when data already exists."""

DataNotExistsError

Bases: Exception

Raised when data does not exist.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DataNotExistsError(Exception):
    """Raised when data does not exist."""

DataProvider

Bases: Provider[J], ABC

Base class for data provider implementations, providing an abstraction layer for data management.

Productions/subproductions can be accessed through a data provider. From data providers, job entries can be accessed, created, updated, and deleted. From job entries, data entries can be accessed, created, updated, and deleted. From data entries, data can be read/written.

E.g.: Production 00001234/0000:
    - Job n°00000042:
        - Data log_data_0001.xml
        - Data job.info
        ...
    - Job 00001200:
        - Data log_data_0042.xml
        - Data log.txt
        - Data stdout.xml
        ...

Dict providers are used to store Zstandard dictionaries, used for both compression and decompression. Dict entries are used to access dictionary data, identified by a unique name. The name of a dictionary is the part common to all related files, i.e. the filenames without their numbers.

E.g.: - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.

When a provider is read-only, it is not possible to create, update, or delete anything. When a data provider is compressed (assigned to a dictionary provider), underlying data is compressed with Zstandard and the dictionary provider contains the dictionaries, used for both compression and decompression.

Some data provider implementations may not implement the related dict provider, supporting only the old "uncompressed" format. On the other hand, some implementations may only support access to the new "compressed" format (Zstandard). Some implementations may only support the read-only mode.

The "compressed" status, indicating whether the underlying data are compressed in Zstandard or stored in a raw uncompressed format, is only an indication. Whatever its type, a data provider always reads/writes data as-is without any processing, assuming data are already provided in the correct compressed/uncompressed state.

Source code in src/lhcbdirac_log/providers/base/providers.py
class DataProvider[J: JobEntry](Provider[J], ABC):
    """Base class for data provider implementations, providing an abstraction layer for data management.

    Productions/subproductions can be accessed through a data provider.
    From data providers, job entries can be accessed, created, updated, and deleted.
    From job entries, data entries can be accessed, created, updated, and deleted.
    From data entries, data can be read/written.

    E.g.:
        Production 00001234/0000:
            - Job n°00000042:
                - Data log_data_0001.xml
                - Data job.info
                  ...
            - Job 00001200:
                - Data log_data_0042.xml
                - Data log.txt
                - Data stdout.xml
                ...

    Dict providers are used to store Zstandard dictionaries, used for both compression and decompression.
    Dict entries are used to access dictionary data, identified by a unique name.
    The name of a dictionary is the part common to all related files, i.e. the filenames without their numbers.

    E.g.:
        - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.

    When a provider is read-only, it is not possible to create, update, or delete anything.
    When a data provider is compressed (assigned to a dictionary provider),
    underlying data is compressed with Zstandard and the dictionary provider contains the dictionaries,
    used for both compression and decompression.

    Some data provider implementations may not implement the related dict provider,
    supporting only the old "uncompressed" format. On the other hand, some implementations may only support access to
    the new "compressed" format (Zstandard).
    Some implementations may only support the read-only mode.

    The "compressed" status, indicating whether the underlying data are compressed in Zstandard or stored in a raw
    uncompressed format, is only an indication. Whatever its type, a data provider always reads/writes data as-is
    without any processing, assuming data are already provided in the correct compressed/uncompressed state.
    """

    __slots__ = ("_dict_provider",)

    def __init__(self, dict_provider: DictProvider | None = None, *, readonly: bool) -> None:
        """[Internal] Initialize the provider.

        Args:
            dict_provider: the dict provider associated with the data (default is None); specifying this implies that the provided data are compressed.
                           Some providers may not support this, or may support only dict providers from the same implementation.
            readonly: indicate whether the provider is read-only or not
        """
        super().__init__(readonly=readonly)
        self._dict_provider = dict_provider

    @property
    @final
    def compressed(self) -> bool:
        """Check if the underlying data are compressed or not (in Zstandard).

        Returns:
            True if the data are compressed, False otherwise

        Notes:
            - True implies that the `dict_provider` is not None
            - False implies that the `dict_provider` is None
        """
        return self._dict_provider is not None

    @property
    @final
    def dict_provider(self) -> DictProvider | None:
        """Get the linked dict provider or None.

        Returns:
            the dict provider or None
        """
        return self._dict_provider

    @abstractmethod
    def _get(self, job: int, *, create: bool = False) -> J:
        """[Internal] Get a job entry.

        Args:
            job: the job id
            create: if True, create the job if it does not exist (default is False)

        Returns:
            the job entry

        Raises:
            JobNotExistsError: if the job does not exist and create is False
        """

    @final
    def get(self, job: int, *, create: bool = False) -> J:
        """Get a job entry.

        Args:
            job: the job id
            create: if True, create the job if it does not exist (default is False)

        Returns:
            the job entry

        Raises:
            JobNotExistsError: if the job does not exist and create is False
            ReadOnlyError: if the provider is read-only and create is True
        """
        if create and self.readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        return self._get(job, create=create)

    @abstractmethod
    def _create(self, job: int, *, exists_ok: bool = False) -> J:
        """[Internal] Create a job entry.

        Args:
            job: the job id
            exists_ok: if True, ignore the error if the job already exists (default is False)

        Returns:
            the job entry

        Raises:
            JobExistsError: if the job already exists and exists_ok is False
        """

    @final
    def create(self, job: int, *, exists_ok: bool = False) -> J:
        """Create a job entry.

        Args:
            job: the job id
            exists_ok: if True, ignore the error if the job already exists (default is False)

        Returns:
            the job entry

        Raises:
            JobExistsError: if the job already exists and exists_ok is False
            ReadOnlyError: if the provider is read-only
        """
        if self.readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        return self._create(job, exists_ok=exists_ok)

    @abstractmethod
    def _delete(self, job: int, *, force: bool = False) -> None:
        """[Internal] Delete a job.

        Args:
            job: the job id
            force: if True, delete the job even if it is not empty (default is False)

        Raises:
            JobNotExistsError: if the job does not exist
            DataExistsError: if the job is not empty and force is False
        """

    @final
    def delete(self, job: int, *, force: bool = False) -> None:
        """Delete a job.

        Args:
            job: the job id
            force: if True, delete the job even if it is not empty (default is False)

        Raises:
            JobNotExistsError: if the job does not exist
            DataExistsError: if the job is not empty and force is False
            ReadOnlyError: if the provider is read-only
        """
        if self.readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        self._delete(job, force=force)

    @final
    def clear(self, *, force: bool = False) -> None:
        """Clear all the jobs.

        Args:
            force: if True, delete the jobs even if they are not empty (default is False)

        Raises:
            DataExistsError: if a job is not empty and force is False
            ReadOnlyError: if the provider is read-only
        """
        if self.readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        for j in self.jobs():
            self.delete(j, force=force)

    @abstractmethod
    def jobs(self) -> Generator[int, None, None]:
        """Get all existing jobs id.

        Returns:
            a generator of the jobs id
        """

    @override
    @property
    def size(self) -> int:
        """Get the size of all jobs.

        Returns:
            the whole job size

        Notes:
            - see JobEntry.job_size for more details
        """
        return sum(i.job_size for i in self)

    @property
    def data_size(self) -> int:
        """Get all stored data size.

        Returns:
            the whole data size

        Notes:
            - see JobEntry.data_size for more details
        """
        return sum(i.data_size for i in self)

    @final
    def __iter__(self) -> Iterator[J]:
        """Iterate over all the jobs entries.

        Returns:
            an iterator of the jobs entries
        """
        return (self.get(i) for i in self.jobs())

    @final
    def __getitem__(self, job: int) -> J:
        """Get a job entry.

        Args:
            job: the job id

        Returns:
            the job entry

        Raises:
            JobNotExistsError: if the job does not exist

        Notes:
            - same as: provider.get(job)
        """
        return self.get(job)

    @override
    def __len__(self) -> int:
        """Get the number of jobs.

        Returns:
            the number of jobs
        """
        return sum(1 for _ in self.jobs())

    @final
    def transfer(self, target: DataProvider, *, limit: int = 0) -> int:
        """Transfer (copy) all jobs data to another provider.

        Args:
            target: the target provider
            limit: the maximum number of jobs to transfer (default is 0, meaning all)

        Returns:
            the number of transferred jobs

        Raises:
            ValueError: if the target provider is not of the same format (`compressed` value is different)
            JobExistsError: if copied jobs already exist in the target provider
            DictExistsError: if copied dicts already exist in the target provider
            ReadOnlyError: if the target provider is read-only

        Notes:
            - nothing is transferred if the target provider is the same as the source provider
            - if the providers are compressed (Zstandard format), the dict provider is also transferred, first
            - dict provider transfer is only performed if the target dict provider is not read-only
        """
        if self is target:
            return 0

        if self.compressed != target.compressed:
            msg = "The data provider compression mode must be the same"
            raise ValueError(msg)

        if target.readonly:
            msg = "The target provider is read-only"
            raise ReadOnlyError(msg)

        if self.compressed and not target.dict_provider.readonly:
            self.dict_provider.transfer(target.dict_provider)

        n = -1
        for n, job in enumerate(self):
            tjob = target.create(job.job, exists_ok=False)

            if tjob.compressed:
                tjob.update_info(job.info)

            for i in job:
                dst = tjob.create(i.name, exists_ok=False)

                with i.reader() as r, dst.writer() as w:
                    copyfileobj(r, w)

            if limit and n >= limit - 1:
                break

        return n + 1
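At its core, `transfer` is a copy loop over jobs and their entries, streaming each entry with `copyfileobj`. Reduced to toy in-memory providers (plain dicts standing in for the library's classes, purely illustrative), the pattern is:

```python
from io import BytesIO
from shutil import copyfileobj

# Toy providers: {job_id: {data_name: bytes}} (illustrative, not the real classes).
source = {42: {"log.txt": b"abc"}, 1200: {"stdout.xml": b"<out/>"}}
target: dict[int, dict[str, bytes]] = {}

transferred = 0
for job_id, entries in source.items():
    target[job_id] = {}  # create the job in the target (JobExistsError territory in real code)
    for name, payload in entries.items():
        src, dst = BytesIO(payload), BytesIO()
        copyfileobj(src, dst)  # stream copy, as DataProvider.transfer does
        target[job_id][name] = dst.getvalue()
    transferred += 1

print(transferred)  # 2
```

A `limit` parameter, as in the real method, would simply break out of the outer loop once enough jobs have been copied.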

compressed: bool property

Check if the underlying data are compressed or not (in Zstandard).

Returns:

Type Description
bool

True if the data are compressed, False otherwise

Notes
  • True implies that the dict_provider is not None
  • False implies that the dict_provider is None

data_size: int property

Get all stored data size.

Returns:

Type Description
int

the whole data size

Notes
  • see JobEntry.data_size for more details

dict_provider: DictProvider | None property

Get the linked dict provider or None.

Returns:

Type Description
DictProvider | None

the dict provider or None

size: int property

Get the size of all jobs.

Returns:

Type Description
int

the whole job size

Notes
  • see JobEntry.job_size for more details

__getitem__(job)

Get a job entry.

Parameters:

Name Type Description Default
job int

the job id

required

Returns:

Type Description
J

the job entry

Raises:

Type Description
JobNotExistsError

if the job does not exist

Notes
  • same as: provider.get(job)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __getitem__(self, job: int) -> J:
    """Get a job entry.

    Args:
        job: the job id

    Returns:
        the job entry

    Raises:
        JobNotExistsError: if the job does not exist

    Notes:
        - same as: provider.get(job)
    """
    return self.get(job)

__init__(dict_provider=None, *, readonly)

[Internal] Initialize the provider.

Parameters:

Name Type Description Default
dict_provider DictProvider | None

the dict provider associated with the data (default is None); specifying this implies that the provided data are compressed. Some providers may not support this, or may support only dict providers from the same implementation.

None
readonly bool

indicate whether the provider is read-only or not

required
Source code in src/lhcbdirac_log/providers/base/providers.py
def __init__(self, dict_provider: DictProvider | None = None, *, readonly: bool) -> None:
    """[Internal] Initialize the provider.

    Args:
        dict_provider: the dict provider associated with the data (default is None); specifying this implies that the provided data are compressed.
                       Some providers may not support this, or may support only dict providers from the same implementation.
        readonly: indicate whether the provider is read-only or not
    """
    super().__init__(readonly=readonly)
    self._dict_provider = dict_provider

__iter__()

Iterate over all the jobs entries.

Returns:

Type Description
Iterator[J]

an iterator of the jobs entries

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __iter__(self) -> Iterator[J]:
    """Iterate over all the jobs entries.

    Returns:
        an iterator of the jobs entries
    """
    return (self.get(i) for i in self.jobs())

__len__()

Get the number of jobs.

Returns:

Type Description
int

the number of jobs

Source code in src/lhcbdirac_log/providers/base/providers.py
@override
def __len__(self) -> int:
    """Get the number of jobs.

    Returns:
        the number of jobs
    """
    return sum(1 for _ in self.jobs())

clear(*, force=False)

Clear all the jobs.

Parameters:

Name Type Description Default
force bool

if True, delete the jobs even if they are not empty (default is False)

False

Raises:

Type Description
DataExistsError

if a job is not empty and force is False

ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def clear(self, *, force: bool = False) -> None:
    """Clear all the jobs.

    Args:
        force: if True, delete the jobs even if they are not empty (default is False)

    Raises:
        DataExistsError: if a job is not empty and force is False
        ReadOnlyError: if the provider is read-only
    """
    if self.readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    for j in self.jobs():
        self.delete(j, force=force)

create(job, *, exists_ok=False)

Create a job entry.

Parameters:

Name Type Description Default
job int

the job id

required
exists_ok bool

if True, ignore the error if the job already exists (default is False)

False

Returns:

Type Description
J

the job entry

Raises:

Type Description
JobExistsError

if the job already exists and exists_ok is False

ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def create(self, job: int, *, exists_ok: bool = False) -> J:
    """Create a job entry.

    Args:
        job: the job id
        exists_ok: if True, ignore the error if the job already exists (default is False)

    Returns:
        the job entry

    Raises:
        JobExistsError: if the job already exists and exists_ok is False
        ReadOnlyError: if the provider is read-only
    """
    if self.readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    return self._create(job, exists_ok=exists_ok)

delete(job, *, force=False)

Delete a job.

Parameters:

Name Type Description Default
job int

the job id

required
force bool

if True, delete the job even if it is not empty (default is False)

False

Raises:

Type Description
JobNotExistsError

if the job does not exist

DataExistsError

if the job is not empty and force is False

ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def delete(self, job: int, *, force: bool = False) -> None:
    """Delete a job.

    Args:
        job: the job id
        force: if True, delete the job even if it is not empty (default is False)

    Raises:
        JobNotExistsError: if the job does not exist
        DataExistsError: if the job is not empty and force is False
        ReadOnlyError: if the provider is read-only
    """
    if self.readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    self._delete(job, force=force)

get(job, *, create=False)

Get a job entry.

Parameters:

Name Type Description Default
job int

the job id

required
create bool

if True, create the job if it does not exist (default is False)

False

Returns:

Type Description
J

the job entry

Raises:

Type Description
JobNotExistsError

if the job does not exist and create is False

ReadOnlyError

if the provider is read-only and create is True

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def get(self, job: int, *, create: bool = False) -> J:
    """Get a job entry.

    Args:
        job: the job id
        create: if True, create the job if it does not exist (default is False)

    Returns:
        the job entry

    Raises:
        JobNotExistsError: if the job does not exist and create is False
        ReadOnlyError: if the provider is read-only and create is True
    """
    if create and self.readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    return self._get(job, create=create)
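Note that `get` only checks the read-only flag when `create=True`; plain reads always work on a read-only provider. A dict-backed sketch of that interaction (illustrative names; `PermissionError` and `KeyError` stand in for `ReadOnlyError` and `JobNotExistsError`):

```python
def get_job(jobs: dict[int, dict], job: int, *, create: bool = False, readonly: bool = False) -> dict:
    """Sketch of DataProvider.get(): the read-only check only triggers
    when creation is requested, mirroring the real method."""
    if create and readonly:
        raise PermissionError("The provider is read-only")
    if job not in jobs:
        if not create:
            raise KeyError(job)  # stands in for JobNotExistsError
        jobs[job] = {}           # stands in for _create()
    return jobs[job]
```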

jobs() abstractmethod

Get all existing job ids.

Returns:

Type Description
Generator[int, None, None]

a generator of the job ids

Source code in src/lhcbdirac_log/providers/base/providers.py
@abstractmethod
def jobs(self) -> Generator[int, None, None]:
    """Get all existing job ids.

    Returns:
        a generator of the job ids
    """

transfer(target, *, limit=0)

Transfer (copy) all jobs data to another provider.

Parameters:

Name Type Description Default
target DataProvider

the target provider

required
limit int

the maximum number of jobs to transfer (default is 0, meaning all)

0

Returns:

Type Description
int

the number of transferred jobs

Raises:

Type Description
ValueError

if the target provider is not of the same format (compressed value is different)

JobExistsError

if copied jobs already exist in the target provider

DictExistsError

if copied dicts already exist in the target provider

ReadOnlyError

if the target provider is read-only

Notes
  • nothing is transferred if the target provider is the same as the source provider
  • if the providers are compressed (Zstandard format), the dict provider is also transferred, first
  • dict provider transfer is only performed if the target dict provider is not read-only
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def transfer(self, target: DataProvider, *, limit: int = 0) -> int:
    """Transfer (copy) all jobs data to another provider.

    Args:
        target: the target provider
        limit: the maximum number of jobs to transfer (default is 0, meaning all)

    Returns:
        the number of transferred jobs

    Raises:
        ValueError: if the target provider is not of the same format (`compressed` value is different)
        JobExistsError: if copied jobs already exist in the target provider
        DictExistsError: if copied dicts already exist in the target provider
        ReadOnlyError: if the target provider is read-only

    Notes:
        - nothing is transferred if the target provider is the same as the source provider
        - if the providers are compressed (Zstandard format), the dict provider is also transferred, first
        - dict provider transfer is only performed if the target dict provider is not read-only
    """
    if self is target:
        return 0

    if self.compressed != target.compressed:
        msg = "The data provider compression mode must be the same"
        raise ValueError(msg)

    if target.readonly:
        msg = "The target provider is read-only"
        raise ReadOnlyError(msg)

    if self.compressed and not target.dict_provider.readonly:
        self.dict_provider.transfer(target.dict_provider)

    n = -1
    for n, job in enumerate(self):
        tjob = target.create(job.job, exists_ok=False)

        if tjob.compressed:
            tjob.update_info(job.info)

        for i in job:
            dst = tjob.create(i.name, exists_ok=False)

            with i.reader() as r, dst.writer() as w:
                copyfileobj(r, w)

        if limit and n >= limit - 1:
            break

    return n + 1
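The `n = -1` / `enumerate` idiom used by `transfer` is worth noting: the sentinel makes an empty source return 0, and `limit` stops the loop after exactly `limit` items. A self-contained sketch of just that counting pattern (list-based, illustrative names only):

```python
def copy_limited(source, sink, *, limit: int = 0) -> int:
    """Mimics transfer()'s counting: the -1 sentinel makes an empty
    source yield 0; limit=0 means copy everything."""
    n = -1
    for n, item in enumerate(source):
        sink.append(item)
        if limit and n >= limit - 1:
            break
    return n + 1
```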

DictEntry

Bases: NamedEntry, ABC

Base dict entry class, providing an abstraction layer for dict access.

This represents a dict entry, an instance implies the dict exists and can be accessed (except after manual delete).

The delete operation must be done through the associated provider, as dict entry instances are considered read-only and must not be used after a delete operation.

Source code in src/lhcbdirac_log/providers/base/accessors.py
class DictEntry(NamedEntry, ABC):
    """Base dict entry class, providing an abstraction layer for dict access.

    This represents a dict entry, an instance implies the dict exists and can be accessed (except after manual delete).

    The delete operation must be done through the associated provider,
    as dict entry instances are considered read-only and must not be used after a delete operation.
    """

    __slots__ = (
        "_config",
        "_data",
        "_dict",
        "_zstd_id",
    )

    def __init__(self, name: str, config: Config, data: bytes | None = None, zstd_id: int | None = None) -> None:
        """[Internal] Initialize the dict entry.

        Args:
            name: the dict name
            config: the configuration to use for precomputing the dictionary
            data: the dict data (create a new dict if not None)
            zstd_id: the zstd dictionary id (None for unknown)
        """
        super().__init__(name, readonly=True)
        self._data = data
        self._dict: ZstdCompressionDict | None = None
        self._config = config
        self._zstd_id = zstd_id

        if self._data is not None:
            self._save()

    @property
    @override
    @final
    def dict_name(self) -> str:
        """Get the dict name.

        Returns:
            the dict name

        Notes:
            - alias to the dict entry name
        """
        return self._name

    @property
    @final
    def data(self) -> bytes:
        """Get the dict data.

        Returns:
            the dict data

        Raises:
        DictNotExistsError: if the dict does not exist (may never be raised under normal usage)

        Notes:
            - the data is loaded on first access (lazy loading)
        """
        if self._data is None:
            self._data = self._load_data()

        return self._data

    @property
    @abstractmethod
    def size(self) -> int:
        """Get the dict size.

        Returns:
            the dict size or 0 if the dict does not exist
        """

    @property
    @abstractmethod
    def exists(self) -> bool:
        """Check if the dict exists.

        Returns:
            True if the dict exists, False otherwise
        """

    @property
    @final
    def dict(self) -> ZstdCompressionDict:
    """Get the zstd-dict object (precomputed for shared usage).

        Returns:
            the zstd-dict object

        Raises:
        DictNotExistsError: if the dict does not exist (may never be raised under normal usage)
        """
        if self._dict is None:
            self._load_dict()

        return self._dict

    @property
    @final
    def zstd_id(self) -> int:
        """Get the zstd dictionary id.

        Returns:
            the zstd dictionary id

        Raises:
        DictNotExistsError: if the dict does not exist (may never be raised under normal usage)
        """
        if self._zstd_id is None:
            self._zstd_id = self.dict.dict_id()

        return self._zstd_id

    @property
    @final
    def is_loaded(self) -> bool:
        """Check if the dict is loaded.

        Returns:
            True if the dict is loaded, False otherwise
        """
        return self._data is not None

    @final
    def _load_dict(self) -> None:
        """[Internal] Load the dict from data."""
        self._dict = ZstdCompressionDict(self.data, DICT_TYPE_FULLDICT)
        self._dict.precompute_compress(compression_params=self._config.params)

    @abstractmethod
    def _load_data(self) -> bytes:
        """[Internal] Get the dict's data.

        Returns:
            the dict's data

        Raises:
            DictNotExistsError: if the dict does not exist
        """

    @abstractmethod
    def _save(self) -> None:
        """[Internal] Save the dict data / create the dict entry.

        Notes:
            - the behavior is undefined if the dict already exists (may raise an error or overwrite)
        """
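The `data` property above uses lazy loading: the backing bytes are fetched once on first access and cached afterwards. That pattern can be illustrated standalone; the loader and class name here are hypothetical, not part of the library:

```python
class LazyBytes:
    """Minimal mimic of DictEntry's data property: the backing loader
    runs at most once, on first access."""

    def __init__(self, loader):
        self._loader = loader
        self._data: bytes | None = None

    @property
    def data(self) -> bytes:
        if self._data is None:
            self._data = self._loader()  # stands in for _load_data()
        return self._data

    @property
    def is_loaded(self) -> bool:
        return self._data is not None
```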

data: bytes property

Get the dict data.

Returns:

Type Description
bytes

the dict data

Raises:

Type Description
DictNotExistsError

if the dict does not exist (may never be raised under normal usage)

Notes
  • the data is loaded on first access (lazy loading)

dict: ZstdCompressionDict property

Get the zstd-dict object (precomputed for shared usage).

Returns:

Type Description
ZstdCompressionDict

the zstd-dict object

Raises:

Type Description
DictNotExistsError

if the dict does not exist (may never be raised under normal usage)

dict_name: str property

Get the dict name.

Returns:

Type Description
str

the dict name

Notes
  • alias to the dict entry name

exists: bool abstractmethod property

Check if the dict exists.

Returns:

Type Description
bool

True if the dict exists, False otherwise

is_loaded: bool property

Check if the dict is loaded.

Returns:

Type Description
bool

True if the dict is loaded, False otherwise

size: int abstractmethod property

Get the dict size.

Returns:

Type Description
int

the dict size or 0 if the dict does not exist

zstd_id: int property

Get the zstd dictionary id.

Returns:

Type Description
int

the zstd dictionary id

Raises:

Type Description
DictNotExistsError

if the dict does not exist (may never be raised under normal usage)

__init__(name, config, data=None, zstd_id=None)

[Internal] Initialize the dict entry.

Parameters:

Name Type Description Default
name str

the dict name

required
config Config

the configuration to use for precomputing the dictionary

required
data bytes | None

the dict data (create a new dict if not None)

None
zstd_id int | None

the zstd dictionary id (None for unknown)

None
Source code in src/lhcbdirac_log/providers/base/accessors.py
def __init__(self, name: str, config: Config, data: bytes | None = None, zstd_id: int | None = None) -> None:
    """[Internal] Initialize the dict entry.

    Args:
        name: the dict name
        config: the configuration to use for precomputing the dictionary
        data: the dict data (create a new dict if not None)
        zstd_id: the zstd dictionary id (None for unknown)
    """
    super().__init__(name, readonly=True)
    self._data = data
    self._dict: ZstdCompressionDict | None = None
    self._config = config
    self._zstd_id = zstd_id

    if self._data is not None:
        self._save()

DictExistsError

Bases: Exception

Raised when a dict already exists.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DictExistsError(Exception):
    """Raised when a dict already exists."""

DictInvalidError

Bases: Exception

Raised when an invalid dict is requested.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DictInvalidError(Exception):
    """Raised when an invalid dict is requested."""

DictNotExistsError

Bases: Exception

Raised when a dict does not exist.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class DictNotExistsError(Exception):
    """Raised when a dict does not exist."""

DictProvider

Bases: Provider[E], ABC

Base class for Zstandard dictionary provider implementations, providing an abstraction layer for dictionary management.

Dict providers are used to store Zstandard dictionaries, used for both compression and decompression.

The provider manages the dictionaries and provides access to them. Dict entries are used to access dictionary data, identified by a unique name. The name of a dictionary is the common part between all related files = the filenames without numbers.

E.g.: - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.

Dictionaries are loaded on demand and are marked as missing if they do not exist. This marking can be used to avoid trying to load them again if they are not found.

Dictionaries whose training failed are marked as invalid. Unlike missing-dictionary marks, invalid-dictionary marks can be persistent, depending on the implementation.

Source code in src/lhcbdirac_log/providers/base/providers.py
class DictProvider[E: DictEntry](Provider[E], ABC):
    """Base class for Zstandard dictionary provider implementations, providing an abstraction layer for dictionary management.

    Dict providers are used to store Zstandard dictionaries, used for both compression and decompression.

    The provider manages the dictionaries and provides access to them.
    Dict entries are used to access dictionary data, identified by a unique name.
    The name of a dictionary is the common part between all related files = the filenames without numbers.

    E.g.:
        - Both log_data_0001.xml and log_data_0042.xml are compressed with the same dictionary, named log_data_xxxx.xml.

    Dictionaries are loaded on demand and are marked as missing if they do not exist.
    This marking can be used to avoid trying to load them again if they are not found.

    Dictionaries whose training failed are marked as invalid.
    Unlike missing-dictionary marks, invalid-dictionary marks can be persistent, depending on the implementation.
    """

    __slots__ = (
        "_config",
        "_dicts",
        "_invalid",
        "_missing",
    )

    def __init__(self, config: Config, *, readonly: bool) -> None:
        """[Internal] Initialize the provider.

        Args:
            config: the configuration to use for precomputing the dictionaries
            readonly: indicate whether the provider is read-only or not
        """
        super().__init__(readonly=readonly)
        self._dicts = dict[str, E]()
        self._invalid = self._load_invalid()
        self._missing = set[str]()
        self._config = config

    def _load_invalid(self) -> set[str]:
        """[Internal] Load the invalid dicts names.

        Returns:
            the invalid dicts names

        Notes:
            - default implementation returns an empty set
            - must be implemented if invalid dicts are saved
        """
        return set()

    def _mark_invalid(self, name: str) -> None:
        """[Internal] Mark a dict as invalid.

        Args:
            name: the dict name

        Notes:
            - default implementation does nothing
            - must be implemented if invalid dicts are saved
        """

    @property
    def config(self) -> Config:
        """Get the configuration.

        Returns:
            the configuration
        """
        return self._config

    @final
    def is_invalid(self, name: str) -> bool:
        """Check if the dict is marked as invalid.

        Args:
            name: the dict name

        Returns:
            True if the dict is invalid, False otherwise

        Notes:
            - Invalid dicts are dicts that failed to train
            - These dicts may be marked only when a training was attempted
        """
        return name in self._invalid

    @final
    def is_missing(self, name: str) -> bool:
        """Check if the dict is marked as missing.

        Args:
            name: the dict name

        Returns:
            True if the dict is missing, False otherwise

        Notes:
            - Missing dicts are dicts that failed to load
        - These dicts are marked only when loading was attempted
            - Must not be confused with !is_loaded
        """
        return name in self._missing

    @final
    def is_loaded(self, name: str) -> bool:
        """Check if the dict is loaded (exists in cache).

        Args:
            name: the dict name

        Returns:
            True if the dict is loaded, False otherwise

        Notes:
            - same as: 'dict_name' in provider (__contains__)
        """
        return name in self

    @final
    def mark_invalid(self, name: str) -> None:
        """Mark a dict as invalid.

        Args:
            name: the dict name

        Raises:
            DictExistsError: if the dict is already loaded

        Notes:
        - unload the dict if it is loaded but no longer exists (may never happen under normal usage)
        """
        if self.is_loaded(name):
            if self[name].exists:
                raise DictExistsError(name)
            del self._dicts[name]

        if self.is_missing(name):
            self._missing.discard(name)

        self._mark_invalid(name)
        self._invalid.add(name)

    @final
    def mark_missing(self, name: str) -> None:
        """Mark a dict as missing.

        Args:
            name: the dict name

        Raises:
            DictExistsError: if the dict is already loaded
            DictInvalidError: if the dict is already marked as invalid

        Notes:
        - unload the dict if it is loaded but does not exist (may never happen under normal usage)
        """
        if self.is_loaded(name):
            if self[name].exists:
                raise DictExistsError(name)
            del self._dicts[name]

        if self.is_invalid(name):
            raise DictInvalidError(name)

        self._missing.add(name)

    @override
    @property
    def size(self) -> int:
        """Get the total size of all dicts data.

        Returns:
            the total size
        """
        return sum(self[i].size for i in self)

    @abstractmethod
    def _load(self, name: str) -> E:
        """[Internal] Load a dict.

        Args:
            name: the dict name

        Raises:
            DictNotExistsError: if the dict does not exist
        """

    @abstractmethod
    def _add(self, name: str, data: bytes, zstd_id: int) -> E:
        """[Internal] Create a new dict entry.

        Args:
            name: the dict name
            data: the dict data
            zstd_id: the zstd dictionary id

        Raises:
            DictExistsError: if the dict already exists
        """

    @final
    def add(self, name: str, data: bytes, zstd_id: int, *, load: bool = True) -> E:
        """Add a new dict entry from data.

        Args:
            name: the dict name
            data: the dict data
            zstd_id: the zstd dictionary id
            load: if True, keep the dict loaded (default is True)

        Returns:
            the dict entry

        Raises:
            DictExistsError: if the dict already exists
            ReadOnlyError: if the provider is read-only
        """
        if self._readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        d = self._add(name, data, zstd_id)
        if load:
            self._dicts[name] = d

        self._missing.discard(name)  # ensure valid state
        self._invalid.discard(name)  # ensure valid state
        return d

    @abstractmethod
    def _delete(self, name: str) -> None:
        """[Internal] Delete a dict entry.

        Args:
            name: the dict name to delete

        Raises:
            DictNotExistsError: if the dict does not exist
        """

    @final
    def delete(self, name: str) -> None:
        """Delete a dict entry.

        Args:
            name: the dict name to delete

        Raises:
            DictNotExistsError: if the dict does not exist
            ReadOnlyError: if the provider is read-only

        Notes:
            - may or may not check if data are linked to the dict before deletion
            - if the dict is loaded, it is obviously considered unloaded
            - instances of the related dict entry, if still accessible, may not be used nor trusted anymore
            - same as: del provider[dict_name]
        """
        if self._readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        self._delete(name)
        self._dicts.pop(name, None)  # delete it from the loaded dicts if loaded

    @final
    def clear(self) -> None:
        """Clear all the existing dicts.

        Raises:
            ReadOnlyError: if the provider is read-only
        """
        if self._readonly:
            msg = "The provider is read-only"
            raise ReadOnlyError(msg)

        for i in self:
            del self[i]

    @final
    def transfer(self, target: DictProvider | None) -> int:
        """Transfer (copy) all dicts to another provider.

        Args:
            target: the target provider

        Returns:
            the number of transferred dicts

        Raises:
            ValueError: if the target provider is not specified (None)
            DictExistsError: if copied dicts already exist in the target provider
            ReadOnlyError: if the target provider is read-only

        Notes:
            - Nothing is transferred if the target provider is the same as the source provider
        """
        if self is target:
            return 0

        if target is None:
            msg = "The target provider must be specified"
            raise ValueError(msg)

        if target.readonly:
            msg = "The target provider is read-only"
            raise ReadOnlyError(msg)

        for i in self._invalid:
            target.mark_invalid(i)

        n = -1
        for n, i in enumerate(self):
            if (d := self.get(i)) is not None:
                target.add(i, d.data, d.zstd_id, load=False)

        return n + 1

    @final
    def _iter_loaded(self) -> Generator[str, None, None]:
        """[Internal] Get the loaded dicts names.

        Returns:
            a generator of the loaded dicts names
        """
        yield from self._dicts.keys()

    @abstractmethod
    def _iter_all(self) -> Generator[str, None, None]:
        """[Internal] Get all the dicts names (loaded and non-loaded/loadable).

        Returns:
            a generator of all the dicts names
        """

    @final
    def iter(self, *, loaded_only: bool = False) -> Generator[str, None, None]:
        """Get the dicts names.

        Args:
            loaded_only: if True, only return the loaded dict names, otherwise return all loadable names (default is False)

        Returns:
            a generator of the dicts names
        """
        return self._iter_loaded() if loaded_only else self._iter_all()

    @final
    def get(
        self,
        name: str,
        default: E | None = None,
        *,
        invalid_ok: bool = False,
        missing_ok: bool = True,
    ) -> E | None:
        """Get the dict.

        Args:
            name: the dict name to get
            default: the default value returned on ignored errors (default is None)
            invalid_ok: if True, ignores invalid dict error and returns the default value (default is False)
            missing_ok: if True, ignores missing dict error and returns the default value (default is True)

        Returns:
            the dict entry or the default value

        Raises:
            DictInvalidError: if the dict is invalid and invalid_ok is False
            DictNotExistsError: if the dict does not exist and missing_ok is False

        Notes:
            - if invalid_ok and missing_ok are both False, then it is equivalent to provider[dict_name] (__getitem__)
            - if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
            - getting the same dict multiple times will return the same object (cached)
        """
        try:
            return self[name]

        except DictInvalidError:
            if not invalid_ok:
                raise

        except DictNotExistsError:
            if not missing_ok:
                raise

        return default

    @final
    def __getitem__(self, name: str) -> E:
        """Get the dict.

        Args:
            name: the dict name to get

        Returns:
            the dict entry

        Raises:
            DictNotExistsError: if the dict does not exist
            DictInvalidError: if the dict is invalid

        Notes:
            - if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
            - getting the same dict multiple times will return the same object (cached)
            - same as: provider.get(dict_name, invalid_ok=False, missing_ok=False)
        """
        if self.is_invalid(name):  # only the provider can know if it is invalid
            raise DictInvalidError(name)

        if self.is_missing(name):  # avoid trying to (re)load if we know it is missing
            raise DictNotExistsError(name)

        if (d := self._dicts.get(name)) is None:  # not already loaded
            try:
                d = self._dicts[name] = self._load(name)  # try to load it
            except Exception as err:
                self._missing.add(name)  # mark it as missing
                raise DictNotExistsError(name) from err

        return d

    @final
    def __delitem__(self, name: str) -> None:
        """Delete a dict entry.

        Args:
            name: the dict name to delete

        Raises:
            DictNotExistsError: if the dict does not exist
            ReadOnlyError: if the provider is read-only

        Notes:
            - may or may not check if data are linked to the dict before deletion
            - if the dict is loaded, it is obviously considered unloaded
            - instances of the related dict entry, if still accessible, may not be used nor trusted anymore
            - same as: provider.delete(dict_name)
        """
        self.delete(name)

    @final
    def __contains__(self, name: str) -> bool:
        """Check if the dict is loaded (exists in cache).

        Args:
            name: the dict name

        Returns:
            True if the dict is loaded, False otherwise

        Notes:
            - same as: provider.is_loaded(dict_name)
        """
        return name in self._dicts

    @final
    def __iter__(self) -> Iterator[str]:
        """Get all the dicts names.

        Returns:
            an iterator of all the dicts names

        Notes:
            - same as: provider.iter()
        """
        return self.iter()

    @final
    def __len__(self) -> int:
        """Get the number of existing dicts.

        Returns:
            the number of existing dicts
        """
        return sum(1 for _ in self)
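The negative-caching behaviour of `__getitem__` above (a failed load marks the name as missing, so later lookups fail fast without hitting the backend again) can be sketched with plain exceptions; `KeyError` stands in for `DictNotExistsError` and all names here are illustrative:

```python
class NegativeCache:
    """Mimics DictProvider.__getitem__: successful loads are cached,
    failed loads are remembered in a 'missing' set and never retried."""

    def __init__(self, load):
        self._load = load
        self._cache: dict[str, bytes] = {}
        self._missing: set[str] = set()

    def __getitem__(self, name: str) -> bytes:
        if name in self._missing:        # fail fast, no backend call
            raise KeyError(name)
        if (d := self._cache.get(name)) is None:
            try:
                d = self._cache[name] = self._load(name)
            except Exception as err:
                self._missing.add(name)  # mark as missing
                raise KeyError(name) from err
        return d
```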

config: Config property

Get the configuration.

Returns:

Type Description
Config

the configuration

size: int property

Get the total size of all dicts data.

Returns:

Type Description
int

the total size

__contains__(name)

Check if the dict is loaded (exists in cache).

Parameters:

Name Type Description Default
name str

the dict name

required

Returns:

Type Description
bool

True if the dict is loaded, False otherwise

Notes
  • same as: provider.is_loaded(dict_name)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __contains__(self, name: str) -> bool:
    """Check if the dict is loaded (exists in cache).

    Args:
        name: the dict name

    Returns:
        True if the dict is loaded, False otherwise

    Notes:
        - same as: provider.is_loaded(dict_name)
    """
    return name in self._dicts

__delitem__(name)

Delete a dict entry.

Parameters:

Name Type Description Default
name str

the dict name to delete

required

Raises:

Type Description
DictNotExistsError

if the dict does not exist

ReadOnlyError

if the provider is read-only

Notes
  • may or may not check if data are linked to the dict before deletion
  • if the dict was loaded, it becomes unloaded
  • instances of the related dict entry, if still accessible, must no longer be used or trusted
  • same as: provider.delete(dict_name)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __delitem__(self, name: str) -> None:
    """Delete a dict entry.

    Args:
        name: the dict name to delete

    Raises:
        DictNotExistsError: if the dict does not exist
        ReadOnlyError: if the provider is read-only

    Notes:
        - may or may not check if data are linked to the dict before deletion
        - if the dict was loaded, it becomes unloaded
        - instances of the related dict entry, if still accessible, must no longer be used or trusted
        - same as: provider.delete(dict_name)
    """
    self.delete(name)

__getitem__(name)

Get the dict.

Parameters:

Name Type Description Default
name str

the dict name to get

required

Returns:

Type Description
E

the dict entry

Raises:

Type Description
DictNotExistsError

if the dict does not exist

DictInvalidError

if the dict is invalid

Notes
  • if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
  • getting the same dict multiple times will return the same object (cached)
  • same as: provider.get(dict_name, invalid_ok=False, missing_ok=False)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __getitem__(self, name: str) -> E:
    """Get the dict.

    Args:
        name: the dict name to get

    Returns:
        the dict entry

    Raises:
        DictNotExistsError: if the dict does not exist
        DictInvalidError: if the dict is invalid

    Notes:
        - if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
        - getting the same dict multiple times will return the same object (cached)
        - same as: provider.get(dict_name, invalid_ok=False, missing_ok=False)
    """
    if self.is_invalid(name):  # only the provider can know if it is invalid
        raise DictInvalidError(name)

    if self.is_missing(name):  # avoid trying to (re)load if we know it is missing
        raise DictNotExistsError(name)

    if (d := self._dicts.get(name)) is None:  # not already loaded
        try:
            d = self._dicts[name] = self._load(name)  # try to load it
        except Exception as err:
            self._missing.add(name)  # mark it as missing
            raise DictNotExistsError(name) from err

    return d
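
The lookup logic above combines a cache of loaded entries with a set of names whose load already failed, so a missing dict is never re-loaded on later calls. A stand-alone sketch of that pattern (`TinyDictCache` and its `loader` callable are hypothetical, not the library class; `KeyError` stands in for `DictNotExistsError`):

```python
# Minimal sketch of __getitem__'s cache/missing-set pattern
# (hypothetical stand-in, not the library class itself).
class TinyDictCache:
    def __init__(self, loader):
        self._loader = loader    # callable: name -> data, raises on failure
        self._dicts = {}         # loaded entries (the cache)
        self._missing = set()    # names whose load already failed

    def __getitem__(self, name):
        if name in self._missing:  # known missing: fail fast, no reload
            raise KeyError(name)
        if (d := self._dicts.get(name)) is None:  # not loaded yet
            try:
                d = self._dicts[name] = self._loader(name)
            except Exception as err:
                self._missing.add(name)  # remember the failure
                raise KeyError(name) from err
        return d
```

A successful load is cached, so repeated access returns the same object; a failed load is remembered, so the backend is only queried once per missing name.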

__init__(config, *, readonly)

[Internal] Initialize the provider.

Parameters:

Name Type Description Default
config Config

the configuration to use for precomputing the dictionaries

required
readonly bool

indicate whether the provider is read-only or not

required
Source code in src/lhcbdirac_log/providers/base/providers.py
def __init__(self, config: Config, *, readonly: bool) -> None:
    """[Internal] Initialize the provider.

    Args:
        config: the configuration to use for precomputing the dictionaries
        readonly: indicate whether the provider is read-only or not
    """
    super().__init__(readonly=readonly)
    self._dicts = dict[str, E]()
    self._invalid = self._load_invalid()
    self._missing = set[str]()
    self._config = config

__iter__()

Get all the dicts names.

Returns:

Type Description
Iterator[str]

an iterator of all the dicts names

Notes
  • same as: provider.iter(False)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __iter__(self) -> Iterator[str]:
    """Get all the dicts names.

    Returns:
        an iterator of all the dicts names

    Notes:
        - same as: provider.iter(False)
    """
    return self.iter()

__len__()

Get the number of existing dicts.

Returns:

Type Description
int

the number of existing dicts

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def __len__(self) -> int:
    """Get the number of existing dicts.

    Returns:
        the number of existing dicts
    """
    return sum(1 for _ in self)

add(name, data, zstd_id, *, load=True)

Add a new dict entry from data.

Parameters:

Name Type Description Default
name str

the dict name

required
data bytes

the dict data

required
zstd_id int

the zstd dictionary id

required
load bool

if True, keep the dict loaded (default is True)

True

Returns:

Type Description
E

the dict entry

Raises:

Type Description
DictExistsError

if the dict already exists

ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def add(self, name: str, data: bytes, zstd_id: int, *, load: bool = True) -> E:
    """Add a new dict entry from data.

    Args:
        name: the dict name
        data: the dict data
        zstd_id: the zstd dictionary id
        load: if True, keep the dict loaded (default is True)

    Returns:
        the dict entry

    Raises:
        DictExistsError: if the dict already exists
        ReadOnlyError: if the provider is read-only
    """
    if self._readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    d = self._add(name, data, zstd_id)
    if load:
        self._dicts[name] = d

    self._missing.discard(name)  # ensure valid state
    self._invalid.discard(name)  # ensure valid state
    return d
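
Besides storing the entry, `add` reconciles the provider's bookkeeping: a newly added dict must not stay flagged as missing or invalid. The idea, sketched with plain sets (illustrative only, not the class):

```python
# Sketch of add()'s state reconciliation (illustrative sets, not the class).
dicts, missing, invalid = {}, {"d1"}, {"d2"}

def add(name, data, *, load=True):
    if load:
        dicts[name] = data   # keep it loaded only when requested
    missing.discard(name)    # a previous failed load no longer applies
    invalid.discard(name)    # a previous failed training no longer applies
    return data
```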

clear()

Clear all the existing dicts.

Raises:

Type Description
ReadOnlyError

if the provider is read-only

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def clear(self) -> None:
    """Clear all the existing dicts.

    Raises:
        ReadOnlyError: if the provider is read-only
    """
    if self._readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    for i in self:
        del self[i]

delete(name)

Delete a dict entry.

Parameters:

Name Type Description Default
name str

the dict name to delete

required

Raises:

Type Description
DictNotExistsError

if the dict does not exist

ReadOnlyError

if the provider is read-only

Notes
  • may or may not check if data are linked to the dict before deletion
  • if the dict was loaded, it becomes unloaded
  • instances of the related dict entry, if still accessible, must no longer be used or trusted
  • same as: del provider[dict_name]
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def delete(self, name: str) -> None:
    """Delete a dict entry.

    Args:
        name: the dict name to delete

    Raises:
        DictNotExistsError: if the dict does not exist
        ReadOnlyError: if the provider is read-only

    Notes:
        - may or may not check if data are linked to the dict before deletion
        - if the dict was loaded, it becomes unloaded
        - instances of the related dict entry, if still accessible, must no longer be used or trusted
        - same as: del provider[dict_name]
    """
    if self._readonly:
        msg = "The provider is read-only"
        raise ReadOnlyError(msg)

    self._delete(name)
    self._dicts.pop(name, None)  # delete it from the loaded dicts if loaded

get(name, default=None, *, invalid_ok=False, missing_ok=True)

Get the dict.

Parameters:

Name Type Description Default
name str

the dict name to get

required
default E | None

the default value returned on ignored errors (default is None)

None
invalid_ok bool

if True, ignores invalid dict error and returns the default value (default is False)

False
missing_ok bool

if True, ignores missing dict error and returns the default value (default is True)

True

Returns:

Type Description
E | None

the dict entry or the default value

Raises:

Type Description
DictInvalidError

if the dict is invalid and invalid_ok is False

DictNotExistsError

if the dict does not exist and missing_ok is False

Notes
  • if invalid_ok and missing_ok are both False, then it is equivalent to provider[dict_name] (__getitem__)
  • if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
  • getting the same dict multiple times will return the same object (cached)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def get(
    self,
    name: str,
    default: E | None = None,
    *,
    invalid_ok: bool = False,
    missing_ok: bool = True,
) -> E | None:
    """Get the dict.

    Args:
        name: the dict name to get
        default: the default value returned on ignored errors (default is None)
        invalid_ok: if True, ignores invalid dict error and returns the default value (default is False)
        missing_ok: if True, ignores missing dict error and returns the default value (default is True)

    Returns:
        the dict entry or the default value

    Raises:
        DictInvalidError: if the dict is invalid and invalid_ok is False
        DictNotExistsError: if the dict does not exist and missing_ok is False

    Notes:
        - if invalid_ok and missing_ok are both False, then it is equivalent to provider[dict_name] (__getitem__)
        - if not found, the dict is marked as missing before raising (avoiding trying to load it again on next call)
        - getting the same dict multiple times will return the same object (cached)
    """
    try:
        return self[name]

    except DictInvalidError:
        if not invalid_ok:
            raise

    except DictNotExistsError:
        if not missing_ok:
            raise

    return default
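
`get` is a thin tolerance wrapper around item access: each flag decides whether the corresponding error is swallowed (returning the default) or re-raised. A stand-alone sketch, with stand-in exception classes for the library's own:

```python
# Sketch of get()'s tolerance flags around item access
# (exception classes are stand-ins for the library's own).
class DictInvalidError(Exception): ...
class DictNotExistsError(Exception): ...

def tolerant_get(lookup, name, default=None, *, invalid_ok=False, missing_ok=True):
    try:
        return lookup(name)      # equivalent of provider[name]
    except DictInvalidError:
        if not invalid_ok:       # invalid dicts raise by default
            raise
    except DictNotExistsError:
        if not missing_ok:       # missing dicts are silent by default
            raise
    return default
```

With both flags False this degenerates to plain item access, matching the note above.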

is_invalid(name)

Check if the dict is marked as invalid.

Parameters:

Name Type Description Default
name str

the dict name

required

Returns:

Type Description
bool

True if the dict is invalid, False otherwise

Notes
  • Invalid dicts are dicts that failed to train
  • These dicts may be marked only when a training was attempted
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def is_invalid(self, name: str) -> bool:
    """Check if the dict is marked as invalid.

    Args:
        name: the dict name

    Returns:
        True if the dict is invalid, False otherwise

    Notes:
        - Invalid dicts are dicts that failed to train
        - These dicts may be marked only when a training was attempted
    """
    return name in self._invalid

is_loaded(name)

Check if the dict is loaded (exists in cache).

Parameters:

Name Type Description Default
name str

the dict name

required

Returns:

Type Description
bool

True if the dict is loaded, False otherwise

Notes
  • same as: 'dict_name' in provider (__contains__)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def is_loaded(self, name: str) -> bool:
    """Check if the dict is loaded (exists in cache).

    Args:
        name: the dict name

    Returns:
        True if the dict is loaded, False otherwise

    Notes:
        - same as: 'dict_name' in provider (__contains__)
    """
    return name in self

is_missing(name)

Check if the dict is marked as missing.

Parameters:

Name Type Description Default
name str

the dict name

required

Returns:

Type Description
bool

True if the dict is missing, False otherwise

Notes
  • Missing dicts are dicts that failed to load
  • These dicts are marked only when loading was attempted
  • Must not be confused with the negation of is_loaded
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def is_missing(self, name: str) -> bool:
    """Check if the dict is marked as missing.

    Args:
        name: the dict name

    Returns:
        True if the dict is missing, False otherwise

    Notes:
        - Missing dicts are dicts that failed to load
        - These dicts are marked only when loading was attempted
        - Must not be confused with the negation of is_loaded
    """
    return name in self._missing
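
`is_loaded` and `is_missing` track independent markers: a dict can be loaded (in cache), missing (a load was attempted and failed), or neither (never accessed). A minimal illustration with plain sets (illustrative names only):

```python
# Loaded vs missing are independent markers (illustrative only).
loaded = {"log_data_xxxx.xml"}   # dicts currently in cache
missing = {"stdout_xxxx.xml"}    # dicts whose load already failed

def is_loaded(name):  return name in loaded
def is_missing(name): return name in missing

# "not loaded" does not imply "missing": an untouched dict is neither.
```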

iter(*, loaded_only=False)

Get the dicts names.

Parameters:

Name Type Description Default
loaded_only bool

if True, only returns the loaded dicts names, otherwise returns all loadable (default False)

False

Returns:

Type Description
Generator[str, None, None]

a generator of the dicts names

Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def iter(self, *, loaded_only: bool = False) -> Generator[str, None, None]:
    """Get the dicts names.

    Args:
        loaded_only: if True, only returns the loaded dicts names, otherwise returns all loadable (default False)

    Returns:
        a generator of the dicts names
    """
    return self._iter_loaded() if loaded_only else self._iter_all()

mark_invalid(name)

Mark a dict as invalid.

Parameters:

Name Type Description Default
name str

the dict name

required

Raises:

Type Description
DictExistsError

if the dict is already loaded

Notes
  • unloads the dict if it is loaded but no longer exists (should never happen under normal usage)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def mark_invalid(self, name: str) -> None:
    """Mark a dict as invalid.

    Args:
        name: the dict name

    Raises:
        DictExistsError: if the dict is already loaded

    Notes:
        - unloads the dict if it is loaded but no longer exists (should never happen under normal usage)
    """
    if self.is_loaded(name):
        if self[name].exists:
            raise DictExistsError(name)
        del self._dicts[name]

    if self.is_missing(name):
        self._missing.discard(name)

    self._mark_invalid(name)
    self._invalid.add(name)

mark_missing(name)

Mark a dict as missing.

Parameters:

Name Type Description Default
name str

the dict name

required

Raises:

Type Description
DictExistsError

if the dict is already loaded

DictInvalidError

if the dict is already marked as invalid

Notes
  • unloads the dict if it is loaded but no longer exists (should never happen under normal usage)
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def mark_missing(self, name: str) -> None:
    """Mark a dict as missing.

    Args:
        name: the dict name

    Raises:
        DictExistsError: if the dict is already loaded
        DictInvalidError: if the dict is already marked as invalid

    Notes:
        - unloads the dict if it is loaded but no longer exists (should never happen under normal usage)
    """
    if self.is_loaded(name):
        if self[name].exists:
            raise DictExistsError(name)
        del self._dicts[name]

    if self.is_invalid(name):
        raise DictInvalidError(name)

    self._missing.add(name)

transfer(target)

Transfer (copy) all dicts to another provider.

Parameters:

Name Type Description Default
target DictProvider | None

the target provider

required

Returns:

Type Description
int

the number of transferred dicts

Raises:

Type Description
ValueError

if the target provider is not specified (None)

DictExistsError

if a copied dict already exists in the target provider

ReadOnlyError

if the target provider is read-only

Notes
  • Nothing is transferred if the target provider is the same as the source provider
Source code in src/lhcbdirac_log/providers/base/providers.py
@final
def transfer(self, target: DictProvider | None) -> int:
    """Transfer (copy) all dicts to another provider.

    Args:
        target: the target provider

    Returns:
        the number of transferred dicts

    Raises:
        ValueError: if the target provider is not specified (None)
        DictExistsError: if a copied dict already exists in the target provider
        ReadOnlyError: if the target provider is read-only

    Notes:
        - Nothing is transferred if the target provider is the same as the source provider
    """
    if self is target:
        return 0

    if target is None:
        msg = "The target provider must be specified"
        raise ValueError(msg)

    if target.readonly:
        msg = "The target provider is read-only"
        raise ReadOnlyError(msg)

    for i in self._invalid:
        target.mark_invalid(i)

    n = -1
    for n, i in enumerate(self):
        if (d := self.get(i)) is not None:
            target.add(i, d.data, d.zstd_id, load=False)

    return n + 1
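
The `n = -1` / `enumerate` idiom in the loop above counts the iterated names even when the iterable is empty, since `n` keeps its sentinel value and `n + 1` is 0. A stand-alone sketch of the counting idiom (`copy_all` and `copy_one` are illustrative names):

```python
# Counting idiom used by transfer(): n + 1 equals the number of
# names iterated, including zero (illustrative reimplementation).
def copy_all(names, copy_one):
    n = -1
    for n, name in enumerate(names):
        copy_one(name)
    return n + 1  # 0 when names is empty
```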

JobEntry

Bases: Entry, ABC

Base job entry class, providing an abstraction layer for job access.

This represents a job entry, managing the job's data entries. Job metadata can be accessed on uncompressed jobs. For compressed jobs, the implementation must handle metadata saving and loading.

Source code in src/lhcbdirac_log/providers/base/accessors.py
class JobEntry[E: DataEntry](Entry, ABC):
    """Base job entry class, provide an abstraction layer for job access.

    This represents a job entry, managing the job's data entries.
    Job metadata can be accessed on uncompressed jobs.
    For compressed jobs, the implementation must handle metadata saving and loading.
    """

    __slots__ = (
        "_compressed",
        "_job",
        "_info",
    )

    def __init__(self, job: int, *, compressed: bool, readonly: bool) -> None:
        """[Internal] Initialize the job entry.

        Args:
            job: the job id
            compressed: indicate whether the underlying data is compressed or not (in Zstandard)
            readonly: indicate whether the job is read-only or not
        """
        super().__init__(readonly=readonly)

        self._info = None
        self._job = job
        self._compressed = compressed

    @property
    @final
    def job(self) -> int:
        """Get the job id.

        Returns:
            the job id
        """
        return self._job

    @property
    @final
    def compressed(self) -> bool:
        """Check if the underlying data is compressed or not (in Zstandard).

        Returns:
            True if the underlying data is compressed, False otherwise
        """
        return self._compressed

    @property
    def empty(self) -> bool:
        """Check if the job is empty.

        Returns:
            True if the job is empty, False otherwise
        """
        return not any(True for _ in self.files())

    @property
    def info(self) -> JobInfo:
        """Get the job's metadata.

        Returns:
            the job's metadata

        Notes:
            - call update_info to save changes if any
        """
        if self._info is None:
            self._info = self._load_info()

        return self._info

    @staticmethod
    def _readlines(f: BinaryIO, lines: list[bytes], n: int = 1024) -> bool:
        """[Internal] Read lines from a file.

        Args:
            f: the file to read from
            lines: the list to store the lines in
            n: the number of bytes to read at once

        Returns:
            False if EOF is reached, True otherwise
        """
        read = True
        while len(lines) <= 1 and read:
            data = f.read(n)

            if len(data) < n:  # EOF
                read = False

            st, *end = data.splitlines()
            if not lines or lines[-1].endswith(b"\n"):
                lines.append(st)
            else:
                lines[-1] += st
            lines.extend(end)

        return read

    @staticmethod
    def _sub_id(x: str) -> int:
        """[Internal] Get the sub id from the file name.

        Args:
            x: the file name

        Returns:
            the sub id
        """
        try:
            return int(x.rsplit("_", 1)[-1][:-4])
        except ValueError:  # pragma: no cover
            return 0

    def _get_dirac_id(self) -> int | None:
        """[Internal] Get the job's DIRAC ID."""
        try:
            with self.get("job.info").reader() as f:
                read = True
                lines: list[bytes] = []

                while read:
                    read = self._readlines(f, lines)

                    while len(lines) > read:
                        line = lines.pop(0)

                        if line.startswith(b"/JobID"):
                            return int(line.split(b"=")[1].strip())
        except (DataNotExistsError, ValueError):
            pass

    def _get_success(self) -> bool | None:
        """[Internal] Get the job's success status.

        Returns:
            the job's success status
        """
        try:
            file = max((i for i in self.files() if i.startswith("summary") and i.endswith(".xml")), key=self._sub_id)

            with self.get(file).reader() as f:
                read = True
                lines = []

                while read:
                    read = self._readlines(f, lines)

                    while len(lines) > read:
                        line = lines.pop(0)

                        try:
                            return line[line.index(b"<success>") + 9] in b"Tt"
                        except ValueError:
                            continue

        except (DataNotExistsError, ValueError):
            pass
        return None

    def _load_info(self) -> JobInfo:
        """[Internal] Load the job's metadata.

        Returns:
            the job's metadata
        """
        info = JobInfo(None, None)

        if not self._compressed:
            info.dirac_id = self._get_dirac_id()
            info.success = self._get_success()

        return info

    @abstractmethod
    def _update_info(self) -> None:
        """[Internal] Update the job's metadata.

        Notes:
            - implementation not handling saving metadata can leave this method empty
        """

    @final
    def update_info(self, info: JobInfo | None = None) -> None:
        """Update the job's metadata, for compressed job.

        Args:
            info: new metadata to copy from and save (or None to save the current metadata)

        Raises:
            RuntimeError: if called on non-compressed job
            ReadOnlyError: if the job is read-only

        Notes:
            - for a non-compressed job, the metadata is loaded directly from the job's data
            - for a compressed job, the metadata is not directly accessible, and so must be saved by the provider (if support is intended)
        """
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        if not self._compressed:
            msg = f"Job '{self._job}' is not compressed"
            raise RuntimeError(msg)

        if info is None:
            if self._info is None:
                self._info = self._load_info()
        elif self._info is None:  # copy the info
            self._info = JobInfo(info.dirac_id, info.success)
        else:  # transfer the info
            self._info.dirac_id = info.dirac_id
            self._info.success = info.success

        self._update_info()

    @abstractmethod
    def _get(self, name: str, *, create: bool = False) -> E:
        """[Internal] Get a data entry.

        Args:
            name: the data name
            create: if True, create the data if it does not exist (default is False)

        Returns:
            the data entry

        Raises:
            DataNotExistsError: if the data does not exist and create is False

        Notes:
            - the entry, if newly created, will not exist until data is written
        """

    @final
    def get(self, name: str, *, create: bool = False) -> E:
        """Get a data entry.

        Args:
            name: the data name
            create: if True, create the data if it does not exist (default is False)

        Returns:
            the data entry

        Raises:
            DataNotExistsError: if the data does not exist and create is False
            ReadOnlyError: if the job is read-only and create is True

        Notes:
            - the entry, if newly created, will not exist until data is written
        """
        if create and self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        return self._get(name, create=create)

    @abstractmethod
    def _create(self, name: str, *, exists_ok: bool = False) -> E:
        """[Internal] Create a data entry.

        Args:
            name: the data name
            exists_ok: if True, ignore the error if the data already exists (default is False)

        Returns:
            the data entry

        Raises:
            DataExistsError: if the data already exists and exists_ok is False

        Notes:
            - the entry, if newly created, will not exist until data is written
        """

    @final
    def create(self, name: str, *, exists_ok: bool = False) -> E:
        """Create a data entry.

        Args:
            name: the data name
            exists_ok: if True, ignore the error if the data already exists (default is False)

        Returns:
            the data entry

        Raises:
            DataExistsError: if the data already exists and exists_ok is False
            ReadOnlyError: if the job is read-only

        Notes:
            - the entry, if newly created, will not exist until data is written
        """
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        return self._create(name, exists_ok=exists_ok)

    def delete(self, name: str) -> None:
        """Delete a data entry.

        Args:
            name: the data name

        Raises:
            DataNotExistsError: if the data does not exist
            ReadOnlyError: if the job is read-only
        """
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        self.get(name).delete()

    def clear(self) -> None:
        """Clear all data entries.

        Raises:
            ReadOnlyError: if the job is read-only
        """
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)

        for i in self:
            i.delete()

    @abstractmethod
    def files(self) -> Generator[str, None, None]:
        """Get a generator of the job's files.

        Returns:
            a generator of the job's file names
        """

    @property
    def data_size(self) -> int:
        """Get all stored data size.

        Returns:
            the sum of all the job's data sizes

        Notes:
            - see DataEntry.size for more details
        """
        return sum(i.size for i in self)

    @property
    def job_size(self) -> int:
        """Get the stored job size.

        Returns:
            the stored job size

        Notes:
            - may include additional overheads compared to data_size
            - may represent better the 'on-disk' size
            - may not be necessarily upper or lower than data_size
        """
        return self.data_size

    @final
    def __iter__(self) -> Iterator[E]:
        """Iterate over all the job's data entries.

        Returns:
            a generator of the job's data entries
        """
        return (self.get(i) for i in self.files())

    def __len__(self) -> int:
        """Get the number of data entries.

        Returns:
            the number of data entries
        """
        return sum(1 for _ in self.files())
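
The private parsing helpers in the listing above can be sketched as stand-alone functions: chunked line extraction (the idea behind `_readlines`, here simplified with an explicit carry for partial lines rather than the library's exact implementation), the numeric-suffix rule of `_sub_id`, and the `<success>` byte check from `_get_success`:

```python
import io

# Simplified stand-alone sketches of the private parsing helpers
# (not the library's exact implementations).

def read_lines_chunked(f, n=1024):
    """Yield complete lines from a binary stream, reading n bytes at a time."""
    carry = b""
    while True:
        data = f.read(n)
        if not data:                 # EOF: flush the trailing partial line
            if carry:
                yield carry
            return
        carry += data
        *whole, carry = carry.split(b"\n")  # last piece may be incomplete
        yield from whole

def sub_id(filename):
    """Numeric suffix before a 4-char extension, e.g. 'summary_0003.xml' -> 3."""
    try:
        return int(filename.rsplit("_", 1)[-1][:-4])
    except ValueError:
        return 0

def parse_success(line):
    """Byte right after the 9-character b'<success>' tag: 'T'/'t' means True."""
    try:
        return line[line.index(b"<success>") + 9] in b"Tt"
    except ValueError:
        return None  # tag not present on this line
```

`_get_dirac_id` applies the same chunked scan to `job.info` looking for a `/JobID = ...` line, and `_get_success` applies it to the highest-numbered `summary_*.xml` file.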

compressed: bool property

Check if the underlying data is compressed or not (in Zstandard).

Returns:

Type Description
bool

True if the underlying data is compressed, False otherwise

data_size: int property

Get all stored data size.

Returns:

Type Description
int

the sum of all the job's data sizes

Notes
  • see DataEntry.size for more details

empty: bool property

Check if the job is empty.

Returns:

Type Description
bool

True if the job is empty, False otherwise

info: JobInfo property

Get the job's metadata.

Returns:

Type Description
JobInfo

the job's metadata

Notes
  • call update_info to save changes if any

job: int property

Get the job id.

Returns:

Type Description
int

the job id

job_size: int property

Get the stored job size.

Returns:

Type Description
int

the stored job size

Notes
  • may include additional overheads compared to data_size
  • may represent better the 'on-disk' size
  • may not be necessarily upper or lower than data_size

__init__(job, *, compressed, readonly)

[Internal] Initialize the job entry.

Parameters:

Name Type Description Default
job int

the job id

required
compressed bool

indicate whether the underlying data is compressed or not (in Zstandard)

required
readonly bool

indicate whether the job is read-only or not

required
Source code in src/lhcbdirac_log/providers/base/accessors.py
def __init__(self, job: int, *, compressed: bool, readonly: bool) -> None:
    """[Internal] Initialize the job entry.

    Args:
        job: the job id
        compressed: indicate whether the underlying data is compressed or not (in Zstandard)
        readonly: indicate whether the job is read-only or not
    """
    super().__init__(readonly=readonly)

    self._info = None
    self._job = job
    self._compressed = compressed

__iter__()

Iterate over all the job's data entries.

Returns:

Type Description
Iterator[E]

a generator of the job's data entries

Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def __iter__(self) -> Iterator[E]:
    """Iterate over all the job's data entries.

    Returns:
        a generator of the job's data entries
    """
    return (self.get(i) for i in self.files())

__len__()

Get the number of data entries.

Returns:

Type Description
int

the number of data entries

Source code in src/lhcbdirac_log/providers/base/accessors.py
def __len__(self) -> int:
    """Get the number of data entries.

    Returns:
        the number of data entries
    """
    return sum(1 for _ in self.files())

clear()

Clear all data entries.

Raises:

Type Description
ReadOnlyError

if the job is read-only

Source code in src/lhcbdirac_log/providers/base/accessors.py
def clear(self) -> None:
    """Clear all data entries.

    Raises:
        ReadOnlyError: if the job is read-only
    """
    if self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    for i in self:
        i.delete()
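The read-only guard used by `clear()` (and the other mutating methods) can be sketched with a hypothetical in-memory job; the job id `42` and the set of entries are illustrative only:

```python
class ReadOnlyError(Exception):
    """Raised when a write operation is attempted on a read-only object."""


class ToyJob:
    """Hypothetical job entry showing the read-only guard of clear()."""

    def __init__(self, *, readonly: bool) -> None:
        self._job = 42  # illustrative job id
        self._readonly = readonly
        self._entries = {"log.txt", "stdout.xml"}

    def clear(self) -> None:
        # same guard pattern as the library: check, then mutate
        if self._readonly:
            msg = f"Job '{self._job}' is read-only"
            raise ReadOnlyError(msg)
        self._entries.clear()


writable = ToyJob(readonly=False)
writable.clear()
print(len(writable._entries))  # → 0

try:
    ToyJob(readonly=True).clear()
except ReadOnlyError as e:
    print(e)  # → Job '42' is read-only
```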

create(name, *, exists_ok=False)

Create a data entry.

Parameters:

Name Type Description Default
name str

the data name

required
exists_ok bool

if True, ignore the error if the data already exists (default is False)

False

Returns:

Type Description
E

the data entry

Raises:

Type Description
DataExistsError

if the data already exists and exists_ok is False

ReadOnlyError

if the job is read-only

Notes
  • the entry, if newly created, will not exist until data is written
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def create(self, name: str, *, exists_ok: bool = False) -> E:
    """Create a data entry.

    Args:
        name: the data name
        exists_ok: if True, ignore the error if the data already exists (default is False)

    Returns:
        the data entry

    Raises:
        DataExistsError: if the data already exists and exists_ok is False
        ReadOnlyError: if the job is read-only

    Notes:
        - the entry, if newly created, will not exist until data is written
    """
    if self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    return self._create(name, exists_ok=exists_ok)
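The `exists_ok` semantics can be sketched with a hypothetical dict-backed job (the dict backing and the `str` return value are stand-ins; the real `_create` returns a data entry object):

```python
class DataExistsError(Exception):
    """Raised when a data entry already exists."""


class ToyJob:
    """Hypothetical dict-backed job illustrating create()'s exists_ok flag."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}

    def create(self, name: str, *, exists_ok: bool = False) -> str:
        if name in self._entries and not exists_ok:
            raise DataExistsError(name)
        # keep any existing payload; register an empty one otherwise
        self._entries.setdefault(name, b"")
        return name  # a real provider would return a data entry object


job = ToyJob()
job.create("log.txt")
job.create("log.txt", exists_ok=True)  # silently returns the existing entry

try:
    job.create("log.txt")
except DataExistsError as e:
    print(f"already exists: {e}")  # → already exists: log.txt
```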

delete(name)

Delete a data entry.

Parameters:

Name Type Description Default
name str

the data name

required

Raises:

Type Description
DataNotExistsError

if the data does not exist

ReadOnlyError

if the job is read-only

Source code in src/lhcbdirac_log/providers/base/accessors.py
def delete(self, name: str) -> None:
    """Delete a data entry.

    Args:
        name: the data name

    Raises:
        DataNotExistsError: if the data does not exist
        ReadOnlyError: if the job is read-only
    """
    if self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    self.get(name).delete()

files() abstractmethod

Get a generator of the job's files.

Returns:

Type Description
Generator[str, None, None]

a generator of the job's file names

Source code in src/lhcbdirac_log/providers/base/accessors.py
@abstractmethod
def files(self) -> Generator[str, None, None]:
    """Get a generator of the job's files.

    Returns:
        a generator of the job's file names
    """

get(name, *, create=False)

Get a data entry.

Parameters:

Name Type Description Default
name str

the data name

required
create bool

if True, create the data if it does not exist (default is False)

False

Returns:

Type Description
E

the data entry

Raises:

Type Description
DataNotExistsError

if the data does not exist and create is False

ReadOnlyError

if the job is read-only and create is True

Notes
  • the entry, if newly created, will not exist until data is written
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def get(self, name: str, *, create: bool = False) -> E:
    """Get a data entry.

    Args:
        name: the data name
        create: if True, create the data if it does not exist (default is False)

    Returns:
        the data entry

    Raises:
        DataNotExistsError: if the data does not exist and create is False
        ReadOnlyError: if the job is read-only and create is True

    Notes:
        - the entry, if newly created, will not exist until data is written
    """
    if create and self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    return self._get(name, create=create)
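The interplay between `DataNotExistsError` and the `create` flag can be sketched with a hypothetical dict-backed job (again, the `str` return value stands in for a real data entry object):

```python
class DataNotExistsError(Exception):
    """Raised when a data entry does not exist."""


class ToyJob:
    """Hypothetical dict-backed job illustrating get()'s create flag."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}

    def get(self, name: str, *, create: bool = False) -> str:
        if name not in self._entries:
            if not create:
                raise DataNotExistsError(name)
            self._entries[name] = b""  # newly created, still empty
        return name  # a real provider would return a data entry object


job = ToyJob()
try:
    job.get("log.txt")
except DataNotExistsError:
    print("missing")  # → missing

print(job.get("log.txt", create=True))  # → log.txt
```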

update_info(info=None)

Update the job's metadata, for compressed jobs.

Parameters:

Name Type Description Default
info JobInfo | None

new metadata to copy from and save (or None to save the current metadata)

None

Raises:

Type Description
RuntimeError

if called on a non-compressed job

ReadOnlyError

if the job is read-only

Notes
  • for a non-compressed job, the metadata is loaded directly from the job's data
  • for a compressed job, the metadata is not otherwise accessible, so it must be saved by the provider (if support is intended)
Source code in src/lhcbdirac_log/providers/base/accessors.py
@final
def update_info(self, info: JobInfo | None = None) -> None:
    """Update the job's metadata, for compressed job.

    Args:
        info: new metadata to copy from and save (or None to save the current metadata)

    Raises:
        RuntimeError: if called on a non-compressed job
        ReadOnlyError: if the job is read-only

    Notes:
        - for a non-compressed job, the metadata is loaded directly from the job's data
        - for a compressed job, the metadata is not otherwise accessible, so it must be saved by the provider (if support is intended)
    """
    if self._readonly:
        msg = f"Job '{self._job}' is read-only"
        raise ReadOnlyError(msg)

    if not self._compressed:
        msg = f"Job '{self._job}' is not compressed"
        raise RuntimeError(msg)

    if info is None:
        if self._info is None:
            self._info = self._load_info()
    elif self._info is None:  # copy the info
        self._info = JobInfo(info.dirac_id, info.success)
    else:  # transfer the info
        self._info.dirac_id = info.dirac_id
        self._info.success = info.success

    self._update_info()

JobExistsError

Bases: Exception

Raised when a job already exists.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class JobExistsError(Exception):
    """Raised when a job already exists."""

JobInfo dataclass

Metadata for a job entry.

Unset attributes are set to None.

Attributes:

Name Type Description
dirac_id int | None

the job dirac id

success bool | None

the job success status

Source code in src/lhcbdirac_log/providers/base/accessors.py
@dataclass
class JobInfo:
    """Metadata for a job entry.

    Unset attributes are set to None.

    Attributes:
        dirac_id: the job dirac id
        success: the job success status
    """

    __slots__ = (
        "dirac_id",
        "success",
    )

    dirac_id: int | None
    success: bool | None

JobNotExistsError

Bases: Exception

Raised when a job does not exist.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class JobNotExistsError(Exception):
    """Raised when a job does not exist."""

ReadOnlyError

Bases: Exception

Raised when a write operation is attempted on a read-only object.

Source code in src/lhcbdirac_log/providers/base/exceptions.py
class ReadOnlyError(Exception):
    """Raised when a write operation is attempted on a read-only object."""