
efro.dataclassio package

Submodules

efro.dataclassio.extras module

Extra rarely-needed functionality related to dataclasses.

class efro.dataclassio.extras.DataclassDiff(obj1: Any, obj2: Any)[source]

Bases: object

Wraps dataclass_diff() in an object for efficiency.

It is preferable to pass this to logging calls instead of the final diff string since the diff will never be generated if the associated logging level is not being emitted.
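
For illustration, a minimal sketch of this pattern (the Config dataclass here is hypothetical):

import logging
from dataclasses import dataclass

from efro.dataclassio import ioprepped
from efro.dataclassio.extras import DataclassDiff

@ioprepped
@dataclass
class Config:  # Hypothetical example type.
    name: str = 'default'
    count: int = 0

old = Config()
new = Config(count=5)

# Passing the DataclassDiff wrapper (rather than a pre-built string) means
# the diff is only computed if DEBUG-level logging is actually emitted.
logging.debug('Config changed: %s', DataclassDiff(old, new))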

efro.dataclassio.extras.dataclass_diff(obj1: Any, obj2: Any) → str[source]

Generate a string showing differences between two dataclass instances.

Both must be of the exact same type.

Module contents

Functionality for importing, exporting, and validating dataclasses.

This allows complex nested dataclasses to be flattened to json-compatible data and restored from said data. It also gracefully handles and preserves unrecognized attribute data, allowing older clients to interact with newer data formats in a nondestructive manner.
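
For orientation, a minimal round-trip sketch using the functions documented below (the Settings type is hypothetical):

from dataclasses import dataclass, field

from efro.dataclassio import dataclass_from_dict, dataclass_to_dict, ioprepped

@ioprepped
@dataclass
class Settings:  # Hypothetical example type.
    volume: float = 1.0
    tags: list[str] = field(default_factory=list)

# Flatten to json-compatible data and restore it losslessly.
out = dataclass_to_dict(Settings(volume=0.5, tags=['a', 'b']))
restored = dataclass_from_dict(Settings, out)
assert restored.volume == 0.5 and restored.tags == ['a', 'b']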

class efro.dataclassio.Codec(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Specifies the data format being exported to or imported from.

FIRESTORE = 'firestore'
JSON = 'json'

class efro.dataclassio.DataclassFieldLookup(cls: type[T])

Bases: Generic[T]

Get info about nested dataclass fields in a type-safe way.

path(callback: Callable[[T], Any]) → str[source]

Look up a path on child dataclass fields.

Example

DataclassFieldLookup(MyType).path(lambda obj: obj.foo.bar)

The above example will return the string ‘foo.bar’ or something like ‘f.b’ if the dataclasses have custom storage names set. It will also be static-type-checked, triggering an error if MyType.foo.bar is not a valid path. Note, however, that the callback technically allows any return value, but only nested dataclasses and their fields will succeed.

paths(callback: Callable[[T], list[Any]]) → list[str][source]

Look up multiple paths on child dataclass fields.

Functionality is identical to path() but for multiple paths at once.

Example

DataclassFieldLookup(MyType).paths(lambda obj: [obj.foo, obj.bar])
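
A small sketch tying both calls together; the types and storage names here are hypothetical:

from dataclasses import dataclass, field
from typing import Annotated

from efro.dataclassio import DataclassFieldLookup, IOAttrs, ioprepped

@ioprepped
@dataclass
class Inner:  # Hypothetical nested type.
    bar: Annotated[int, IOAttrs('b')] = 0

@ioprepped
@dataclass
class MyType:  # Hypothetical top-level type.
    foo: Annotated[Inner, IOAttrs('f')] = field(default_factory=Inner)
    flag: bool = False

lookup = DataclassFieldLookup(MyType)
print(lookup.path(lambda obj: obj.foo.bar))           # 'f.b' (storage names used)
print(lookup.paths(lambda obj: [obj.foo, obj.flag]))  # ['f', 'flag']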

class efro.dataclassio.IOAttrs(storagename: str | None = None, *, store_default: bool = True, whole_days: bool = False, whole_hours: bool = False, whole_minutes: bool = False, soft_default: Any = <efro.dataclassio._base.IOAttrs._MissingType object>, soft_default_factory: Callable[[], Any] | _MissingType = <efro.dataclassio._base.IOAttrs._MissingType object>)

Bases: object

For specifying io behavior in annotations.

‘storagename’, if passed, is the name used when storing to json/etc.

‘store_default’ can be set to False to avoid writing values when equal to the default value. Note that this requires the dataclass field to define a default or default_factory or for its IOAttrs to define a soft_default value.

‘whole_days’, if True, requires datetime values to be exactly on day boundaries (see efro.util.utc_today()).

‘whole_hours’, if True, requires datetime values to lie exactly on hour boundaries (see efro.util.utc_this_hour()).

‘whole_minutes’, if True, requires datetime values to lie exactly on minute boundaries (see efro.util.utc_this_minute()).

‘soft_default’, if passed, injects a default value into dataclass instantiation when the field is not present in the input data. This allows dataclasses to add new non-optional fields while gracefully ‘upgrading’ old data. Note that when a soft_default is present it will take precedence over field defaults when determining whether to store a value for a field with store_default=False (since the soft_default value is what we’ll get when reading that same data back in when the field is omitted).

‘soft_default_factory’ is similar to ‘default_factory’ in dataclass fields; it should be used instead of ‘soft_default’ for mutable types such as lists to prevent a single default object from unintentionally changing over time.
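
A sketch of how these options are typically attached to fields via typing.Annotated; the Profile type and its storage names are hypothetical:

from dataclasses import dataclass, field
from datetime import datetime
from typing import Annotated

from efro.dataclassio import IOAttrs, ioprepped
from efro.util import utc_this_hour

@ioprepped
@dataclass
class Profile:  # Hypothetical example type.
    # Newer non-optional field; old input data lacking it gets [] injected.
    tags: Annotated[list[str], IOAttrs('t', soft_default_factory=list)]

    # Stored under the short key 'n' instead of 'name'.
    name: Annotated[str, IOAttrs('n')] = ''

    # Omitted from output while it still equals the default value.
    score: Annotated[int, IOAttrs('s', store_default=False)] = 0

    # Required to lie exactly on an hour boundary.
    updated: Annotated[datetime, IOAttrs('u', whole_hours=True)] = field(
        default_factory=utc_this_hour
    )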

MISSING = <efro.dataclassio._base.IOAttrs._MissingType object>
soft_default: Any = <efro.dataclassio._base.IOAttrs._MissingType object>
soft_default_factory: Callable[[], Any] | _MissingType = <efro.dataclassio._base.IOAttrs._MissingType object>
storagename: str | None = None
store_default: bool = True
validate_datetime(value: datetime, fieldpath: str) → None[source]

Ensure a datetime value meets our value requirements.

validate_for_field(cls: type, field: Field) → None[source]

Ensure the IOAttrs instance is ok to use with the provided field.

whole_days: bool = False
whole_hours: bool = False
whole_minutes: bool = False

class efro.dataclassio.IOExtendedData

Bases: object

A class that data types can inherit from for extra functionality.

did_input() → None[source]

Called on a class instance after it has been created from data.

Can be useful to correct values from the db, etc., in their type-safe form.

classmethod handle_input_error(exc: Exception) → Self | None[source]

Called when an error occurs during input decoding.

This allows a type to optionally return substitute data to be used in place of the failed decode. If it returns None, the original exception is re-raised.

It is generally a bad idea to apply catch-alls such as this, as it can lead to silent data loss. This should only be used in specific cases such as user settings where an occasional reset is harmless and is preferable to keeping all contained enums and other values backward compatible indefinitely.

classmethod will_input(data: dict) → None[source]

Called on raw data before a class instance is created from it.

Can be overridden to migrate old data formats to new, etc.

will_output() → None[source]

Called before data is sent to an outputter.

Can be overridden to validate or filter data before sending it on its way.
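
A sketch of a typical migration/validation pattern using these hooks (the AppSettings type and its old 'vol' key are hypothetical):

from __future__ import annotations

from dataclasses import dataclass

from efro.dataclassio import IOExtendedData, ioprepped

@ioprepped
@dataclass
class AppSettings(IOExtendedData):  # Hypothetical example type.
    volume: float = 1.0

    @classmethod
    def will_input(cls, data: dict) -> None:
        # Migrate a hypothetical old key name in place before decoding.
        if 'vol' in data and 'volume' not in data:
            data['volume'] = data.pop('vol')

    def did_input(self) -> None:
        # Clamp values coming from possibly-stale stored data.
        self.volume = min(max(self.volume, 0.0), 1.0)

    @classmethod
    def handle_input_error(cls, exc: Exception) -> AppSettings | None:
        # Fall back to defaults on decode errors; acceptable here only
        # because these are simple user settings (see the warning above).
        return cls()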

class efro.dataclassio.IOMultiType

Bases: Generic[EnumT]

A base class for types that can map to multiple dataclass types.

This enables usage of high level base classes (for example a ‘Message’ type) in annotations, with dataclassio automatically serializing & deserializing dataclass subclasses based on their type (‘MessagePing’, ‘MessageChat’, etc.)

Standard usage involves creating a class inheriting from this one to act as a ‘registry’, and then creating dataclass classes that inherit from that registry class. Dataclassio will then do the right thing when that registry class is used in type annotations.

See tests/test_efro/test_dataclassio.py for examples.

classmethod get_type(type_id: EnumT) → type[Self][source]

Return a specific subclass given a type-id.

classmethod get_type_id() → EnumT[source]

Return the type-id for this subclass.

classmethod get_type_id_storage_name() → str[source]

Return the key used to store type id in serialized data.

The default is an obscure value so that it does not conflict with members of individual type attrs, but in some cases one might prefer to serialize it to something simpler like ‘type’ by overriding this call. One just needs to make sure that no encompassed types serialize anything to ‘type’ themselves.

classmethod get_type_id_type() → type[EnumT][source]

Return the Enum type this class uses as its type-id.
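
A minimal registry sketch consistent with the classmethods above (all names here are hypothetical; see the tests mentioned above for authoritative examples):

from dataclasses import dataclass, field
from enum import Enum

from efro.dataclassio import IOMultiType, ioprepped

class ToolID(Enum):  # Hypothetical type-id enum.
    HAMMER = 'hammer'
    WRENCH = 'wrench'

class Tool(IOMultiType[ToolID]):
    # Hypothetical 'registry' base class usable in annotations.

    @classmethod
    def get_type(cls, type_id: ToolID) -> type['Tool']:
        # Map each type-id to its concrete dataclass.
        if type_id is ToolID.HAMMER:
            return Hammer
        if type_id is ToolID.WRENCH:
            return Wrench
        raise ValueError(f'Unhandled type-id: {type_id}')

    @classmethod
    def get_type_id_type(cls) -> type[ToolID]:
        return ToolID

@ioprepped
@dataclass
class Hammer(Tool):
    weight: float = 1.0

    @classmethod
    def get_type_id(cls) -> ToolID:
        return ToolID.HAMMER

@ioprepped
@dataclass
class Wrench(Tool):
    size: int = 10

    @classmethod
    def get_type_id(cls) -> ToolID:
        return ToolID.WRENCH

@ioprepped
@dataclass
class Toolbox:  # Hypothetical container; the registry class appears in annotations.
    tools: list[Tool] = field(default_factory=list)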

class efro.dataclassio.JsonStyle(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Different style types for json.

FAST = 'fast'
PRETTY = 'pretty'
SORTED = 'sorted'

efro.dataclassio.dataclass_from_dict(cls: type[T], values: dict, *, codec: Codec = Codec.JSON, coerce_to_float: bool = True, allow_unknown_attrs: bool = True, discard_unknown_attrs: bool = False) → T

Given a dict, return a dataclass of a given type.

The dict must be formatted to match the specified codec (generally json-friendly object types). This means that sequence values such as tuples or sets should be passed as lists, enums should be passed as their associated values, nested dataclasses should be passed as dicts, etc.

All values are checked to ensure their types/values are valid.

Data for attributes of type Any will be checked to ensure they match types supported directly by json. This does not include types such as tuples which are implicitly translated by Python’s json module (as this would break the ability to do a lossless round-trip with data).

If coerce_to_float is True, int values passed for float typed fields will be converted to float values. Otherwise, a TypeError is raised.

If allow_unknown_attrs is False, AttributeErrors will be raised for attributes present in the dict but not on the data class. Otherwise, they will be preserved as part of the instance and included if it is exported back to a dict, unless discard_unknown_attrs is True, in which case they will simply be discarded.
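
A short sketch of these behaviors (the Point type is hypothetical):

from dataclasses import dataclass

from efro.dataclassio import dataclass_from_dict, dataclass_to_dict, ioprepped

@ioprepped
@dataclass
class Point:  # Hypothetical example type.
    x: float = 0.0
    y: float = 0.0

# int values are coerced to float for float-typed fields by default.
pt = dataclass_from_dict(Point, {'x': 1, 'y': 2.5})

# Unknown attrs are preserved by default and included again on export...
pt2 = dataclass_from_dict(Point, {'x': 1.0, 'y': 2.0, 'extra': 'hi'})
assert dataclass_to_dict(pt2).get('extra') == 'hi'

# ...or can be discarded explicitly.
pt3 = dataclass_from_dict(
    Point, {'x': 1.0, 'y': 2.0, 'extra': 'hi'}, discard_unknown_attrs=True
)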

efro.dataclassio.dataclass_from_json(cls: type[T], json_str: str, coerce_to_float: bool = True, allow_unknown_attrs: bool = True, discard_unknown_attrs: bool = False) → T

Return a dataclass instance given a json string.

Basically dataclass_from_dict(json.loads(…))

efro.dataclassio.dataclass_hash(obj: Any, coerce_to_float: bool = True) → str

Calculate a hash for the provided dataclass.

Basically this emits json for the dataclass (with keys sorted to keep things deterministic) and hashes the resulting string.
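
A quick sketch of using this for change detection (the Prefs type is hypothetical):

from dataclasses import dataclass

from efro.dataclassio import dataclass_hash, ioprepped

@ioprepped
@dataclass
class Prefs:  # Hypothetical example type.
    theme: str = 'dark'

# Equal values produce equal hashes, so this can be used to cheaply
# detect whether stored data actually needs rewriting.
assert dataclass_hash(Prefs()) == dataclass_hash(Prefs(theme='dark'))
assert dataclass_hash(Prefs()) != dataclass_hash(Prefs(theme='light'))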

efro.dataclassio.dataclass_to_dict(obj: Any, codec: Codec = Codec.JSON, coerce_to_float: bool = True, discard_extra_attrs: bool = False) → dict

Given a dataclass object, return a json-friendly dict.

All values will be checked to ensure they match the types specified on fields. Note that a limited set of types and data configurations is supported.

Values with type Any will be checked to ensure they match types supported directly by json. This does not include types such as tuples which are implicitly translated by Python’s json module (as this would break the ability to do a lossless round-trip with data).

If coerce_to_float is True, integer values present on float typed fields will be converted to float in the dict output. If False, a TypeError will be triggered.

efro.dataclassio.dataclass_to_json(obj: Any, coerce_to_float: bool = True, pretty: bool = False, sort_keys: bool | None = None) → str

Utility function; return a json string from a dataclass instance.

Basically json.dumps(dataclass_to_dict(…)). By default, keys are sorted for pretty output and not otherwise, but this can be overridden by supplying a value for the ‘sort_keys’ arg.
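
A quick sketch of the output options (the Msg type is hypothetical):

from dataclasses import dataclass

from efro.dataclassio import dataclass_to_json, ioprepped

@ioprepped
@dataclass
class Msg:  # Hypothetical example type.
    body: str = 'hello'
    count: int = 1

compact = dataclass_to_json(Msg())                 # single line; keys unsorted
pretty = dataclass_to_json(Msg(), pretty=True)     # indented; keys sorted by default
stable = dataclass_to_json(Msg(), sort_keys=True)  # compact but deterministic key order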

efro.dataclassio.dataclass_validate(obj: Any, coerce_to_float: bool = True, codec: Codec = Codec.JSON, discard_extra_attrs: bool = False) → None

Ensure that values in a dataclass instance are the correct types.

efro.dataclassio.ioprep(cls: type, globalns: dict | None = None) → None

Prep a dataclass type for use with this module’s functionality.

Prepping ensures that all types contained in a data class as well as the usage of said types are supported by this module and pre-builds necessary constructs needed for encoding/decoding/etc.

Prepping will happen on-the-fly as needed, but a warning will be emitted in such cases, as it is better to explicitly prep all used types early in a process to ensure any invalid types or configuration are caught immediately.

Prepping a dataclass involves evaluating its type annotations, which, as of PEP 563, are stored simply as strings. This evaluation is done with localns set to the class dict (so that types defined in the class can be used) and globalns set to the containing module’s dict. It is possible to override globalns for special cases such as when prepping happens as part of an exec’ed string instead of within a module.

efro.dataclassio.ioprepped(cls: type[T]) → type[T]

Class decorator for easily prepping a dataclass at definition time.

Note that in some cases it may not be possible to prep a dataclass immediately (such as when its type annotations refer to forward-declared types). In these cases, ioprep() should be explicitly called for the class as soon as possible; ideally at module import time to expose any errors as early as possible in execution.

efro.dataclassio.is_ioprepped_dataclass(obj: Any) → bool

Return whether the obj is an ioprepped dataclass type or instance.

efro.dataclassio.will_ioprep(cls: type[T]) → type[T]

Class decorator hinting that we will prep a class later.

In some cases (such as recursive types) we cannot use the @ioprepped decorator and must instead call ioprep() explicitly later. However, some of our custom pylint checking behaves differently when the @ioprepped decorator is present, in that case requiring type annotations to be present and not simply forward-declared under an “if TYPE_CHECKING” block (since they are used at runtime).

The @will_ioprep decorator triggers the same pylint behavior differences as @ioprepped (which are necessary for the later ioprep() call to work correctly) but without actually running any prep itself.
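
A minimal sketch of this deferred-prep pattern for a recursive type (the TreeNode type is hypothetical):

from __future__ import annotations

from dataclasses import dataclass, field

from efro.dataclassio import ioprep, will_ioprep

@will_ioprep
@dataclass
class TreeNode:  # Hypothetical recursive type; @ioprepped cannot be used here.
    value: int = 0
    children: list[TreeNode] = field(default_factory=list)

# Prep explicitly once the class exists; ideally right at module import
# time so any invalid types or configuration are caught early.
ioprep(TreeNode)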