Commit

run pre-commit, fix errs and spelling

rkingsbury committed Jul 31, 2023
1 parent 081c500 commit a3b90a7

Showing 76 changed files with 685 additions and 1,560 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/testing.yml
@@ -29,7 +29,7 @@ jobs:
run: |
pip install pre-commit
pre-commit run
test:
needs: lint
services:
4 changes: 2 additions & 2 deletions docs/concepts.md
@@ -12,7 +12,7 @@ s2 -- Builder 3-->s4(Store 4)

## Store

A major challenge in building scalable data piplines is dealing with all the different types of data sources out there. Maggma's `Store` class provides a consistent, unified interface for querying data from arbitrary data
A major challenge in building scalable data pipelines is dealing with all the different types of data sources out there. Maggma's `Store` class provides a consistent, unified interface for querying data from arbitrary data
sources. It was originally built around MongoDB, so its interface closely resembles `PyMongo` syntax. However,
Maggma makes it possible to use that same syntax to query other types of databases, such as Amazon S3, GridFS, or even files on disk.
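
As a rough sketch of what that unified interface looks like in practice (the store choice and data below are illustrative, not taken from this commit):

```python
from maggma.stores import MemoryStore

store = MemoryStore(key="task_id")
store.connect()

# documents are plain dicts, identified by the store's key field
store.update([{"task_id": 1, "energy": -1.2}, {"task_id": 2, "energy": 0.8}])

# the same MongoDB-style query syntax works regardless of the backend
for doc in store.query(criteria={"energy": {"$lt": 0}}):
    print(doc["task_id"])
```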

@@ -34,4 +34,4 @@ Both `get_items` and `update_targets` can perform IO (input/output) to the data

Another challenge in building complex data-transformation codes is keeping track of all the settings necessary to make some output database. One bad solution is to hard-code these settings, but then any modification is difficult to keep track of.

Maggma solves this by putting the configuration with the pipeline definition in JSON or YAML files. This is done using the `MSONable` pattern, which requires that any Maggma object (the databases and transformation steps) can convert itself to a python dictionary with it's configuration parameters in a process called serialization. These dictionaries can then be converted back to the origianl Maggma object without having to know what class it belonged. `MSONable` does this by injecting in `@class` and `@module` keys that tell it where to find the original python code for that Maggma object.
Maggma solves this by putting the configuration with the pipeline definition in JSON or YAML files. This is done using the `MSONable` pattern, which requires that any Maggma object (the databases and transformation steps) can convert itself to a python dictionary with its configuration parameters in a process called serialization. These dictionaries can then be converted back to the original Maggma object without having to know what class it belonged to. `MSONable` does this by injecting `@class` and `@module` keys that tell it where to find the original python code for that Maggma object.
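
A minimal sketch of that round trip, using an invented class name (any `MSONable` Maggma object behaves this way):

```python
from monty.json import MSONable, MontyDecoder

class MultiplyBuilder(MSONable):  # hypothetical stand-in for a real builder
    def __init__(self, multiplier: int = 3):
        self.multiplier = multiplier

d = MultiplyBuilder(multiplier=5).as_dict()
# d carries "@module" and "@class" alongside {"multiplier": 5}, so the
# object can be rebuilt without knowing its class ahead of time
rebuilt = MontyDecoder().process_decoded(d)
assert rebuilt.multiplier == 5
```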
2 changes: 1 addition & 1 deletion docs/getting_started/advanced_builder.md
@@ -42,4 +42,4 @@ Since `maggma` is designed around Mongo style data sources and sinks, building i
`maggma` implements templates for builders that have many of these advanced features listed above:

- [MapBuilder](map_builder.md) Creates one-to-one document mapping of items in the source Store to the transformed documents in the target Store.
- [GroupBuilder](group_builder.md) Creates many-to-one document mapping of items in the source Store to transformed documents in the traget Store
- [GroupBuilder](group_builder.md) Creates many-to-one document mapping of items in the source Store to transformed documents in the target Store
6 changes: 3 additions & 3 deletions docs/getting_started/group_builder.md
@@ -56,7 +56,7 @@ class ResupplyBuilder(GroupBuilder):
super().__init__(source=inventory, target=resupply, grouping_properties=["type"], **kwargs)
```

Note that unlike the previous `MapBuilder` example, we didn't call the source and target stores as such. Providing more usefull names is a good idea in writing builders to make it clearer what the underlying data should look like.
Note that unlike the previous `MapBuilder` example, we didn't call the source and target stores as such. Providing more useful names is a good idea in writing builders to make it clearer what the underlying data should look like.

`GroupBuilder` inherits from `MapBuilder`, so it has the same configuration parameters.

@@ -65,7 +65,7 @@ Note that unlike the previous `MapBuilder` example, we didn't call the source an
- store_process_timeout: adds the process time into the target document for profiling
- retry_failed: retries running the process function on previously failed documents

One parameter that doens't work in `GroupBuilder` is `delete_orphans`, since the Many-to-One relationshop makes determining orphaned documents very difficult.
One parameter that doesn't work in `GroupBuilder` is `delete_orphans`, since the Many-to-One relationship makes determining orphaned documents very difficult.

Finally, let's get to the hard part, which is running our function. We do this by defining `unary_function`.

@@ -81,4 +81,4 @@ Finally let's get to the hard part which is running our function. We do this by
return {"resupply": resupply}
```

Just as in `MapBuilder`, we're not returning all the extra information typically kept in the originally item. Normally, we would have to write code that copies over the source `key` and convert it to the target `key`. Same goes for the `last_updated_field`. `GroupBuilder` takes care of this, while also recording errors, processing time, and the Builder version.`GroupBuilder` also keeps a plural version of the `source.key` field, so in this example, all the `name` values wil be put together and kept in `names`
Just as in `MapBuilder`, we're not returning all the extra information typically kept in the original item. Normally, we would have to write code that copies over the source `key` and converts it to the target `key`. The same goes for the `last_updated_field`. `GroupBuilder` takes care of this, while also recording errors, processing time, and the Builder version. `GroupBuilder` also keeps a plural version of the `source.key` field, so in this example, all the `name` values will be put together and kept in `names`.
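
For context, the full `unary_function` for this example might look roughly like the sketch below; the `quantity` and `minimum` fields are assumptions about the inventory schema, not part of this commit:

```python
from typing import Dict, List

def unary_function(self, items: List[Dict]) -> Dict:
    """Compute a resupply order for one group of inventory items (sketch)."""
    resupply = {}
    for item in items:
        # assumed schema: each item records how many are on hand and the floor
        if item["quantity"] <= item["minimum"]:
            resupply[item["name"]] = item["minimum"] * 2
    return {"resupply": resupply}
```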
4 changes: 2 additions & 2 deletions docs/getting_started/running_builders.md
@@ -15,7 +15,7 @@ my_builder = MultiplyBuilder(source_store,target_store,multiplier=3)
my_builder.run()
```

A better way to run this builder would be to use the `mrun` command line tool. Since evrything in `maggma` is MSONable, we can use `monty` to dump the builders into a JSON file:
A better way to run this builder would be to use the `mrun` command line tool. Since everything in `maggma` is MSONable, we can use `monty` to dump the builders into a JSON file:

``` python
from monty.serialization import dumpfn
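# assumed completion of the collapsed lines: the builder and stores come
# from the MultiplyBuilder example shown earlier in this file
my_builder = MultiplyBuilder(source_store, target_store, multiplier=3)
dumpfn(my_builder, "my_builder.json")
```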
@@ -29,7 +29,7 @@ Then we can run the builder using `mrun`:
``` shell
mrun my_builder.json
```

`mrun` has a number of usefull options:
`mrun` has a number of useful options:

``` shell
mrun --help
```
4 changes: 2 additions & 2 deletions docs/getting_started/simple_builder.md
@@ -52,7 +52,7 @@ The `__init__` for a builder can have any set of parameters. Generally, you want

Python type annotations provide a really nice way of documenting the types we expect and enabling later type checking using `mypy`. We defined the type for `source` and `target` as `Store` since we only care that it implements that pattern. How exactly these `Store`s operate doesn't concern us here.

Note that the `__init__` arguments: `source`, `target`, `multiplier`, and `kwargs` get saved as attributess:
Note that the `__init__` arguments: `source`, `target`, `multiplier`, and `kwargs` get saved as attributes:

``` python
self.source = source
self.target = target
self.multiplier = multiplier
self.kwargs = kwargs
```

@@ -243,4 +243,4 @@ Then we can define a prechunk method that modifies the `Builder` dict in place t
}
```

When distributed processing runs, it will modify the `Builder` dictionary in place by the prechunk dictionary. In this case, each builder distribute to a worker will get a modified `query` parameter that only runs on a subset of all posible keys.
When distributed processing runs, it will modify the `Builder` dictionary in place using the prechunk dictionary. In this case, each builder distributed to a worker will get a modified `query` parameter that only runs on a subset of all possible keys.
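
A sketch of such a `prechunk`, assuming the `maggma.utils.grouper` helper and the `Store.newer_in` method, and a builder whose only per-worker state is its `query`:

```python
from math import ceil
from typing import Dict, Iterator

from maggma.utils import grouper

def prechunk(self, number_splits: int) -> Iterator[Dict]:
    """Partition source keys so each worker queries only its own subset (sketch)."""
    keys = self.source.newer_in(self.target)
    chunk_size = ceil(len(keys) / number_splits)
    for split in grouper(keys, chunk_size):
        yield {"query": {self.source.key: {"$in": list(split)}}}
```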
2 changes: 1 addition & 1 deletion docs/getting_started/stores.md
@@ -11,7 +11,7 @@ Current working and tested `Store` include:
- `MongoStore`: interfaces to a MongoDB Collection
- `MemoryStore`: just a Store that exists temporarily in memory
- `JSONStore`: builds a MemoryStore and then populates it with the contents of the given JSON files
- `FileStore`: query and add metadata to files stored on disk as if they were in a databsae
- `FileStore`: query and add metadata to files stored on disk as if they were in a database
- `GridFSStore`: interfaces to GridFS collection in MongoDB
- `S3Store`: provides an interface to an S3 Bucket either on AWS or self-hosted solutions ([additional documentation](advanced_stores.md))
- `ConcatStore`: concatenates several Stores together so they look like one Store
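
As a quick sketch, instantiating a couple of the stores above (the connection details are placeholders):

```python
from maggma.stores import JSONStore, MongoStore

# placeholder database and file names
tasks = MongoStore(database="my_db", collection_name="tasks", host="localhost", port=27017)
backup = JSONStore("backup.json")

tasks.connect()
backup.connect()
```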
6 changes: 3 additions & 3 deletions docs/getting_started/using_file_store.md
@@ -80,7 +80,7 @@ and for associating custom metadata (See ["Adding Metadata"](#adding-metadata) b
## Connecting and querying

As with any `Store`, you have to `connect()` before you can query any data from a `FileStore`. After that, you can use `query_one()` to examine a single document or
`query()` to return an interator of matching documents. For example, let's print the
`query()` to return an iterator of matching documents. For example, let's print the
parent directory of each of the files named "input.in" in our example `FileStore`:
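
A sketch of that query, assuming a `FileStore` rooted at a placeholder path and records that carry `name` and `parent` fields:

```python
from maggma.stores import FileStore

fs = FileStore("/path/to/my/data")  # placeholder path
fs.connect()

for doc in fs.query({"name": "input.in"}):
    print(doc["parent"])
```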

@@ -142,7 +142,7 @@ fs.add_metadata({"name":"input.in"}, {"tags":["preliminary"]})

### Automatic metadata

You can even define a function to automatically crate metadata from file or directory names. For example, if you prefix all your files with datestamps (e.g., '2022-05-07_experiment.csv'), you can write a simple string parsing function to
You can even define a function to automatically create metadata from file or directory names. For example, if you prefix all your files with datestamps (e.g., '2022-05-07_experiment.csv'), you can write a simple string parsing function to
extract information from any key in a `FileStore` record and pass the function as an argument to `add_metadata`.

For example, to extract the date from files named like '2022-05-07_experiment.csv':
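
Here is a hedged sketch of such a parser; the `auto_data` keyword argument and the record fields used are assumptions, not confirmed by this commit:

```python
from datetime import datetime

def get_date_from_name(doc):
    # assumed schema: the record's "name" field starts with YYYY-MM-DD
    date_str = doc["name"].split("_")[0]
    return {"date": datetime.strptime(date_str, "%Y-%m-%d")}

# assumed keyword: apply the parser to every record in the FileStore
fs.add_metadata(auto_data=get_date_from_name)
```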
@@ -195,7 +195,7 @@ maggma.core.store.StoreError: (StoreError(...), 'Warning! This command is about
Now that you can access your files on disk via a `FileStore`, it's time to write a `Builder` to read and process the data (see [Writing a Builder](simple_builder.md)).
Keep in mind that `get_items` will return documents like the one shown in (#creating-the-filestore). You can then use `process_items` to

- Create strucured data from the `contents`
- Create structured data from the `contents`
- Open the file for reading using a custom piece of code
- etc.

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -97,4 +97,5 @@ explicit_package_bases = true
no_implicit_optional = false

[tool.codespell]
ignore-words-list = "ot"
ignore-words-list = "ot,nin"
skip = 'docs/CHANGELOG.md,tests/test_files/*'
1 change: 0 additions & 1 deletion src/maggma/__init__.py
@@ -1,4 +1,3 @@
# coding: utf-8
""" Primary Maggma module """
from pkg_resources import DistributionNotFound, get_distribution

6 changes: 3 additions & 3 deletions src/maggma/api/API.py
@@ -23,8 +23,8 @@ def __init__(
version: str = "v0.0.0",
debug: bool = False,
heartbeat_meta: Optional[Dict] = None,
description: str = None,
tags_meta: List[Dict] = None,
description: Optional[str] = None,
tags_meta: Optional[List[Dict]] = None,
):
"""
Args:
@@ -33,7 +33,7 @@ def __init__(
version: the version for this API
debug: turns debug on in FastAPI
heartbeat_meta: dictionary of additional metadata to include in the heartbeat response
description: decription of the API to be used in the generated docs
description: description of the API to be used in the generated docs
tags_meta: descriptions of tags to be used in the generated docs
"""
self.title = title
15 changes: 4 additions & 11 deletions src/maggma/api/models.py
@@ -20,18 +20,15 @@ class Meta(BaseModel):

api_version: str = Field(
__version__,
description="a string containing the version of the Materials API "
"implementation, e.g. v0.9.5",
description="a string containing the version of the Materials API implementation, e.g. v0.9.5",
)

time_stamp: datetime = Field(
description="a string containing the date and time at which the query was executed",
default_factory=datetime.utcnow,
)

total_doc: Optional[int] = Field(
None, description="the total number of documents available for this query", ge=0
)
total_doc: Optional[int] = Field(None, description="the total number of documents available for this query", ge=0)

class Config:
extra = "allow"
@@ -56,9 +53,7 @@ class Response(GenericModel, Generic[DataT]):
"""

data: Optional[List[DataT]] = Field(None, description="List of returned data")
errors: Optional[List[Error]] = Field(
None, description="Any errors on processing this query"
)
errors: Optional[List[Error]] = Field(None, description="Any errors on processing this query")
meta: Optional[Meta] = Field(None, description="Extra information for the query")

@validator("errors", always=True)
@@ -92,8 +87,6 @@ class S3URLDoc(BaseModel):
description="Pre-signed download URL",
)

requested_datetime: datetime = Field(
..., description="Datetime for when URL was requested"
)
requested_datetime: datetime = Field(..., description="Datetime for when URL was requested")

expiry_datetime: datetime = Field(..., description="Expiry datetime of the URL")
2 changes: 1 addition & 1 deletion src/maggma/api/query_operator/core.py
@@ -8,7 +8,7 @@

class QueryOperator(MSONable, metaclass=ABCMeta):
"""
Base Query Operator class for defining powerfull query language
Base Query Operator class for defining powerful query language
in the Materials API
"""

48 changes: 10 additions & 38 deletions src/maggma/api/query_operator/dynamic.py
@@ -26,9 +26,7 @@ def __init__(
self.excluded_fields = excluded_fields

all_fields: Dict[str, ModelField] = model.__fields__
param_fields = fields or list(
set(all_fields.keys()) - set(excluded_fields or [])
)
param_fields = fields or list(set(all_fields.keys()) - set(excluded_fields or []))

# Convert the fields into operator tuples
ops = [
@@ -49,9 +47,7 @@ def query(**kwargs) -> STORE_PARAMS:
try:
criteria.append(self.mapping[k](v))
except KeyError:
raise KeyError(
f"Cannot find key {k} in current query to database mapping"
)
raise KeyError(f"Cannot find key {k} in current query to database mapping")

final_crit = {}
for entry in criteria:
@@ -74,26 +70,22 @@ def query(**kwargs) -> STORE_PARAMS:
for op in ops
]

setattr(query, "__signature__", inspect.Signature(signatures))
query.__signature__ = inspect.Signature(signatures)

self.query = query # type: ignore

def query(self):
"Stub query function for abstract class"
pass

@abstractmethod
def field_to_operator(
self, name: str, field: ModelField
) -> List[Tuple[str, Any, Query, Callable[..., Dict]]]:
def field_to_operator(self, name: str, field: ModelField) -> List[Tuple[str, Any, Query, Callable[..., Dict]]]:
"""
Converts a PyDantic ModelField into a Tuple with the
- query param name,
- query param type
- FastAPI Query object,
- and callable to convert the value into a query dict
"""
pass

@classmethod
def from_dict(cls, d):
Expand All @@ -115,9 +107,7 @@ def as_dict(self) -> Dict:
class NumericQuery(DynamicQueryOperator):
"Query Operator to enable searching on numeric fields"

def field_to_operator(
self, name: str, field: ModelField
) -> List[Tuple[str, Any, Query, Callable[..., Dict]]]:
def field_to_operator(self, name: str, field: ModelField) -> List[Tuple[str, Any, Query, Callable[..., Dict]]]:
"""
Converts a PyDantic ModelField into a Tuple with the
query_param name,
@@ -181,11 +171,7 @@ def field_to_operator(
default=None,
description=f"Query for {title} being any of these values. Provide a comma separated list.",
),
lambda val: {
f"{field.name}": {
"$in": [int(entry.strip()) for entry in val.split(",")]
}
},
lambda val: {f"{field.name}": {"$in": [int(entry.strip()) for entry in val.split(",")]}},
),
(
f"{field.name}_neq_any",
Expand All @@ -195,11 +181,7 @@ def field_to_operator(
description=f"Query for {title} being not any of these values. \
Provide a comma separated list.",
),
lambda val: {
f"{field.name}": {
"$nin": [int(entry.strip()) for entry in val.split(",")]
}
},
lambda val: {f"{field.name}": {"$nin": [int(entry.strip()) for entry in val.split(",")]}},
),
]
)
@@ -210,9 +192,7 @@ def field_to_operator(
class StringQueryOperator(DynamicQueryOperator):
"Query Operator to enable searching on numeric fields"

def field_to_operator(
self, name: str, field: ModelField
) -> List[Tuple[str, Any, Query, Callable[..., Dict]]]:
def field_to_operator(self, name: str, field: ModelField) -> List[Tuple[str, Any, Query, Callable[..., Dict]]]:
"""
Converts a PyDantic ModelField into a Tuple with the
query_param name,
@@ -253,11 +233,7 @@ def field_to_operator(
default=None,
description=f"Query for {title} being any of these values. Provide a comma separated list.",
),
lambda val: {
f"{field.name}": {
"$in": [entry.strip() for entry in val.split(",")]
}
},
lambda val: {f"{field.name}": {"$in": [entry.strip() for entry in val.split(",")]}},
),
(
f"{field.name}_neq_any",
Expand All @@ -266,11 +242,7 @@ def field_to_operator(
default=None,
description=f"Query for {title} being not any of these values. Provide a comma separated list",
),
lambda val: {
f"{field.name}": {
"$nin": [entry.strip() for entry in val.split(",")]
}
},
lambda val: {f"{field.name}": {"$nin": [entry.strip() for entry in val.split(",")]}},
),
]

4 changes: 1 addition & 3 deletions src/maggma/api/query_operator/pagination.py
@@ -35,8 +35,7 @@ def query(
),
_limit: int = Query(
default_limit,
description="Max number of entries to return in a single query."
f" Limited to {max_limit}.",
description=f"Max number of entries to return in a single query. Limited to {max_limit}.",
),
) -> STORE_PARAMS:
"""
@@ -82,7 +81,6 @@ def query(

def query(self):
"Stub query function for abstract class"
pass

def meta(self) -> Dict:
"""