Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport 8.x] Use Python types to declare document fields #1847

Merged
merged 1 commit into from
Jun 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
180 changes: 174 additions & 6 deletions docs/persistence.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ layer for your application.
For more comprehensive examples have a look at the examples_ directory in the
repository.

.. _examples: https://github.com/elastic/elasticsearch-dsl-py/tree/master/examples
.. _examples: https://github.com/elastic/elasticsearch-dsl-py/tree/main/examples

.. _doc_type:

Expand Down Expand Up @@ -66,14 +66,14 @@ settings in elasticsearch (see :ref:`life-cycle` for details).
Data types
~~~~~~~~~~

The ``Document`` instances should be using native python types like
The ``Document`` instances use native python types like ``str`` and
``datetime``. In case of ``Object`` or ``Nested`` fields an instance of the
``InnerDoc`` subclass should be used just like in the ``add_comment`` method in
the above example where we are creating an instance of the ``Comment`` class.
``InnerDoc`` subclass is used, as in the ``add_comment`` method in the above
example where we are creating an instance of the ``Comment`` class.

There are some specific types that were created as part of this library to make
working with specific field types easier, for example the ``Range`` object used
in any of the `range fields
working with some field types easier, for example the ``Range`` object used in
any of the `range fields
<https://www.elastic.co/guide/en/elasticsearch/reference/current/range.html>`_:

.. code:: python
Expand Down Expand Up @@ -103,6 +103,174 @@ in any of the `range fields
# empty range is unbounded
Range().lower # None, False

Python Type Hints
~~~~~~~~~~~~~~~~~

Document fields can be defined using standard Python type hints if desired.
Here are some simple examples:

.. code:: python

from typing import Optional

class Post(Document):
title: str # same as title = Text(required=True)
created_at: Optional[datetime] # same as created_at = Date(required=False)
published: bool # same as published = Boolean(required=True)

It is important to note that when using ``Field`` subclasses such as ``Text``,
``Date`` and ``Boolean``, they must be given in the right-side of an assignment,
as shown in examples above. Using these classes as type hints will result in
errors.

Python types are mapped to their corresponding field type according to the
following table:

.. list-table:: Python type to DSL field mappings
:header-rows: 1

* - Python type
- DSL field
* - ``str``
- ``Text(required=True)``
* - ``bool``
- ``Boolean(required=True)``
* - ``int``
- ``Integer(required=True)``
* - ``float``
- ``Float(required=True)``
* - ``bytes``
- ``Binary(required=True)``
* - ``datetime``
- ``Date(required=True)``
* - ``date``
- ``Date(format="yyyy-MM-dd", required=True)``

To type a field as optional, the standard ``Optional`` modifier from the Python
``typing`` package can be used. The ``List`` modifier can be added to a field
to convert it to an array, similar to using the ``multi=True`` argument on the
field object.

.. code:: python

from typing import Optional, List

class MyDoc(Document):
pub_date: Optional[datetime] # same as pub_date = Date()
authors: List[str] # same as authors = Text(multi=True, required=True)
comments: Optional[List[str]] # same as comments = Text(multi=True)

A field can also be given a type hint of an ``InnerDoc`` subclass, in which
case it becomes an ``Object`` field of that class. When the ``InnerDoc``
subclass is wrapped with ``List``, a ``Nested`` field is created instead.

.. code:: python

from typing import List

class Address(InnerDoc):
...

class Comment(InnerDoc):
...

class Post(Document):
address: Address # same as address = Object(Address, required=True)
comments: List[Comment] # same as comments = Nested(Comment, required=True)

Unfortunately it is impossible to have Python type hints that uniquely
identify every possible Elasticsearch field type. To choose a field type that
is different than the ones in the table above, the field instance can be added
explicitly as a right-side assignment in the field declaration. The next
example creates a field that is typed as ``Optional[str]``, but is mapped to
``Keyword`` instead of ``Text``:

.. code:: python

class MyDocument(Document):
category: Optional[str] = Keyword()

This form can also be used when additional options need to be given to
initialize the field, such as when using custom analyzer settings or changing
the ``required`` default:

.. code:: python

class Comment(InnerDoc):
content: str = Text(analyzer='snowball', required=True)

When using type hints as above, subclasses of ``Document`` and ``InnerDoc``
inherit some of the behaviors associated with Python dataclasses, as defined by
`PEP 681 <https://peps.python.org/pep-0681/>`_ and the
`dataclass_transform decorator <https://typing.readthedocs.io/en/latest/spec/dataclasses.html#dataclass-transform>`_.
To add per-field dataclass options such as ``default`` or ``default_factory``,
the ``mapped_field()`` wrapper can be used on the right side of a typed field
declaration:

.. code:: python

class MyDocument(Document):
title: str = mapped_field(default="no title")
created_at: datetime = mapped_field(default_factory=datetime.now)
published: bool = mapped_field(default=False)
category: str = mapped_field(Keyword(required=True), default="general")

When using the ``mapped_field()`` wrapper function, an explicit field type
instance can be passed as a first positional argument, as the ``category``
field does in the example above.

Static type checkers such as `mypy <https://mypy-lang.org/>`_ and
`pyright <https://github.com/microsoft/pyright>`_ can use the type hints and
the dataclass-specific options added to the ``mapped_field()`` function to
improve type inference and provide better real-time suggestions in IDEs.

One situation in which type checkers can't infer the correct type is when
using fields as class attributes. Consider the following example:

.. code:: python

class MyDocument(Document):
title: str

doc = MyDocument()
# doc.title is typed as "str" (correct)
# MyDocument.title is also typed as "str" (incorrect)

To help type checkers correctly identify class attributes as such, the ``M``
generic must be used as a wrapper to the type hint, as shown in the next
examples:

.. code:: python

from elasticsearch_dsl import M

class MyDocument(Document):
title: M[str]
created_at: M[datetime] = mapped_field(default_factory=datetime.now)

doc = MyDocument()
# doc.title is typed as "str"
# doc.created_at is typed as "datetime"
# MyDocument.title is typed as "InstrumentedField"
# MyDocument.created_at is typed as "InstrumentedField"

Note that the ``M`` type hint does not provide any runtime behavior and its use
is not required, but it can be useful to eliminate spurious type errors in IDEs
or type checking builds.

The ``InstrumentedField`` objects returned when fields are accessed as class
attributes are proxies for the field instances that can be used anywhere a
field needs to be referenced, such as when specifying sort options in a
``Search`` object:

.. code:: python

# sort by creation date descending, and title ascending
s = MyDocument.search().sort(-MyDocument.created_at, MyDocument.title)

When specifying sorting order, the ``+`` and ``-`` unary operators can be used
on the class field attributes to indicate ascending and descending order.

Note on dates
~~~~~~~~~~~~~

Expand Down
4 changes: 3 additions & 1 deletion elasticsearch_dsl/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
from .aggs import A
from .analysis import analyzer, char_filter, normalizer, token_filter, tokenizer
from .document import AsyncDocument, Document
from .document_base import InnerDoc, MetaField
from .document_base import InnerDoc, M, MetaField, mapped_field
from .exceptions import (
ElasticsearchDslException,
IllegalOperation,
Expand Down Expand Up @@ -148,6 +148,7 @@
"Keyword",
"Long",
"LongRange",
"M",
"Mapping",
"MetaField",
"MultiSearch",
Expand Down Expand Up @@ -178,6 +179,7 @@
"char_filter",
"connections",
"construct_field",
"mapped_field",
"normalizer",
"token_filter",
"tokenizer",
Expand Down
4 changes: 3 additions & 1 deletion elasticsearch_dsl/_async/document.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,11 @@
import collections.abc

from elasticsearch.exceptions import NotFoundError, RequestError
from typing_extensions import dataclass_transform

from .._async.index import AsyncIndex
from ..async_connections import get_connection
from ..document_base import DocumentBase, DocumentMeta
from ..document_base import DocumentBase, DocumentMeta, mapped_field
from ..exceptions import IllegalOperation
from ..utils import DOC_META_FIELDS, META_FIELDS, merge
from .search import AsyncSearch
Expand Down Expand Up @@ -62,6 +63,7 @@ def construct_index(cls, opts, bases):
return i


@dataclass_transform(field_specifiers=(mapped_field,))
class AsyncDocument(DocumentBase, metaclass=AsyncIndexMeta):
"""
Model-like class for persisting documents in elasticsearch.
Expand Down
4 changes: 3 additions & 1 deletion elasticsearch_dsl/_sync/document.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,11 @@
import collections.abc

from elasticsearch.exceptions import NotFoundError, RequestError
from typing_extensions import dataclass_transform

from .._sync.index import Index
from ..connections import get_connection
from ..document_base import DocumentBase, DocumentMeta
from ..document_base import DocumentBase, DocumentMeta, mapped_field
from ..exceptions import IllegalOperation
from ..utils import DOC_META_FIELDS, META_FIELDS, merge
from .search import Search
Expand Down Expand Up @@ -60,6 +61,7 @@ def construct_index(cls, opts, bases):
return i


@dataclass_transform(field_specifiers=(mapped_field,))
class Document(DocumentBase, metaclass=IndexMeta):
"""
Model-like class for persisting documents in elasticsearch.
Expand Down
Loading
Loading