Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/add unique feature arg in faker factory #820

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

arthurHamon2
Copy link

Since Faker 4.9.0 version, they have added support to generate unique values.
This is really helpful when dealing with unique constraint on a field generated by factory boy. The test can be flaky if you use faker without the "unique" feature on an ORM field with an unique constraint.

The usage with factory boy is simple as this:

class UserFactory(fatory.Factory):
            class Meta:
                model = User

            arrival = factory.Faker(
                'date_between_dates',
                date_start=datetime.date(2020, 1, 1),
                date_end=datetime.date(2020, 5, 31),
                unique=True  # The generated date is guaranteed to be unique inside the test execution.
            )

The unique keyword can be passed on every faker providers. If true the faker object passes through the Faker Unique Proxy, making sure the generated value has not been already generated before. Not that the default unique keyword value is false.

It's my first PR on this repository, be kind :)
Also I have been using factory boy for almost 5 years and it saves me so much boilerplate in my tests, thanks to every creators, maintainers of this wonderful project 🎉

@arthurHamon2 arthurHamon2 force-pushed the feat/add-unique-feature-arg-in-faker-factory branch from ba8f9ef to f77d357 Compare November 18, 2020 16:07
Copy link
Member

@francoisfreitag francoisfreitag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch ⭐! I believe some users will be interested by this feature.

Please add an entry to the release note to inform users.

Can a hook be added to reset the unique store (fake.unique.clear())? Similar in spirit to reset_sequence.


There’s an issue with the use of _FAKER_REGISTRY by factory_boy. Several factories share the same underlying faker instance, causing them to share their values list. I expect the following test to pass:

class SharedTest(unittest.TestCase):
     def test_faker_shared_faker_instance(self):
         class Foo:
             def __init__(self, val):
                 pass

         class Bar:
             def __init__(self, val):
                 pass

         class Factory1(factory.Factory):
             val = factory.Faker('pyint', min_value=1, max_value=1, unique=True)

             class Meta:
                 model = Foo

         class Factory2(factory.Factory):
             val = factory.Faker('pyint', min_value=1, max_value=1, unique=True)

             class Meta:
                 model = Bar

         Factory1.build()
         Factory2.build()

It currently fails:

ERROR: test_faker_shared_faker_instance (tests.test_faker.RealFakerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/freitafr/dev/factory_boy/tests/test_faker.py", line 237, in test_faker_shared_faker_instance
    Factory2.build()
  File "/home/freitafr/dev/factory_boy/factory/base.py", line 510, in build
    return cls._generate(enums.BUILD_STRATEGY, kwargs)
  File "/home/freitafr/dev/factory_boy/factory/base.py", line 464, in _generate
    return step.build()
  File "/home/freitafr/dev/factory_boy/factory/builder.py", line 276, in build
    step.resolve(pre)
  File "/home/freitafr/dev/factory_boy/factory/builder.py", line 217, in resolve
    self.attributes[field_name] = getattr(self.stub, field_name)
  File "/home/freitafr/dev/factory_boy/factory/builder.py", line 379, in __getattr__
    value = value.evaluate(
  File "/home/freitafr/dev/factory_boy/factory/declarations.py", line 356, in evaluate
    return self.generate(extra)
  File "/home/freitafr/dev/factory_boy/factory/faker.py", line 51, in generate
    return subfaker.format(self.provider, **params)
  File "/home/freitafr/dev/factory_boy/venv/lib/python3.8/site-packages/faker/proxy.py", line 282, in wrapper
    raise UniquenessException("Got duplicated values after {0:,} iterations.".format(_UNIQUE_ATTEMPTS))
faker.exceptions.UniquenessException: Got duplicated values after 1,000 iterations.

Second commit has a typo: unqiue instead of unique.

docs/reference.rst Outdated Show resolved Hide resolved
docs/reference.rst Outdated Show resolved Hide resolved
docs/reference.rst Show resolved Hide resolved
@rbarrois
Copy link
Member

I concur with @francoisfreitag's review: the idea is amazing!
However, its relation with factory_boy's factory classes should be clarified.

Basically, as a user:

  • If I specifiy factory.Faker('foo', unique=True), I expect values to be unique within this factory instance
  • If I add a factory.Faker('username', unique=True) to UserFactory, then subclass it as class AdminFactory(UserFactory), is the uniqueness kept across both UserFactory and AdminFactory?

As a side note: for this reason, I find it easier to discuss new features as an issue first, and to focus on the technical implementation once the overall design is clear 😉

@arthurHamon2 arthurHamon2 force-pushed the feat/add-unique-feature-arg-in-faker-factory branch 2 times, most recently from 9148a08 to de6bcaf Compare November 23, 2020 13:59
@arthurHamon2
Copy link
Author

Thanks for your reviews. Indeed I did not notice that factory boy is using a global Faker object across all factories.
I've addressed all your points @francoisfreitag but the clear feature. As the unique store is related to a factory instance, we want to be able to clear the store on the factory level, right ?
@rbarrois You're right, I though it was a minor change and did not fully understand how it impacted factory boy. To make things clearer I would say that :

  • A global Faker object is created for "non" unique use cases
  • A local faker object is created by local type at the factory (instance) level (the uniqueness is then guaranteed across all objects created by the factory)
  • On inheritance, if the child factory does not override the unique faker declaration, the child factory shares the same unique store as the parent declaration. If the declaration is redefined, then it won't use the same unique store.

If you are okay with these assumptions @rbarrois I write them in the documentation.

@arthurHamon2 arthurHamon2 force-pushed the feat/add-unique-feature-arg-in-faker-factory branch from de6bcaf to 4345ffa Compare November 23, 2020 17:06
@arthurHamon2
Copy link
Author

@francoisfreitag I just added a commit that clear the unique store : 0aaefde
Is it what you had in mind when you talked about Can a hook be added to reset the unique store (fake.unique.clear())? Similar in spirit to reset_sequence. ?

@rbarrois
Copy link
Member

@arthurHamon2 Yep, that's a great idea:

  • A global Faker;
  • A per-factory Faker (with unique=True) for factories;
  • For inheritance, I think you could rely on factory._meta._get_counter_reference() (whose name could be changed) to decide whether to reuse a parent factory's unique faker or not;

I've got two points to dig into:

  • Synchronising the random seeds across all Faker setups;
  • Managing the initialisation of Faker instances — setup time, sharing providers.

By the way, this is no small task you've taken up, leading deep into factory_boy's internals, congrats!
Let us know where you need help with the code — if you're interested in an in-depth introduction to the library's internals, I could even arrange that over visio in the next few days.

@arthurHamon2
Copy link
Author

@rbarrois an in-depth introduction over visio would be much appreciated, any time you want, let me know.
I guess we can discuss about my last commit, trying to setup a per-factory unique faker store.
I'm also working on an old abandoned feature : #822
We could chat about this too :)

@rbarrois
Copy link
Member

rbarrois commented Dec 3, 2020

@arthurHamon2 Awesome :) It might be easier getting in touch over email, my email address is in the project's setup.cfg 😛

@x-yuri
Copy link

x-yuri commented Feb 10, 2021

I don't know about you, but sounds like a lot (?) of extra complexity: per-factory fakers, syncing random generators (reproducible randomness), sharing providers, documenting it. And on the part of the users as well. They might not consider whether uniqueness gets inherited or not, if handled by factory-boy. They might not even need it. The general solution is apparently much more complex.

Maybe we should rather give user access to the global fakers? A way to iterate over them (to reset the seen values)? Maybe a way to override the faker call (for subclassing)? Or a way to obtain the faker that would be otherwise used (for sequences, if possible, to not bother with the locale)? The way I handle it now.

@matthijskooijman
Copy link

I needed factoryboy to generate unique values and found this PR, any chance of making progress here?

Reading back the discussion: How important is it it have uniqueness limited to particular factories and/or attributes? If unique values are generated to be globally unique, they will also be unique within a particlar factory and/or attribute.

So then the question is: How important is it that duplicate values can be generated when using the same faker for different factories/attributes? IMHO that is really not essential: If it is important for a test that values are duplicated, then the test should be written to ensure the duplication, not rely on randomly generated duplication?

Or are there compelling performance arguments to want to limit the scope of uniqueness?

@kwist-sgr
Copy link

I needed factoryboy to generate unique values and found this PR, any chance of making progress here?

I'm usign the following solution:

import factory


class UniqueFaker(factory.Faker):

    @classmethod
    def _get_faker(cls, locale=None):
        return super()._get_faker(locale=locale).unique

usage

class UserFactory(factory.Factory):
    class Meta:
        model = User

    name = UniqueFaker('name')

@tgross35
Copy link

tgross35 commented Jun 13, 2022

Alternative in-class solution (in case the _get_faker API changes)

active = factory.LazyFunction(lambda: fake.unique.company())

Adding my 2¢ on the per-factory uniqueness option: it really doesn't seem necessary, as these are the possible cases:

  1. Need all values unique, across all factories
  2. Need all values unique within each factory, don't care about cross-factory uniqueness
  3. Don't care about uniqueness
  4. Need duplicate values between factories
  5. Need duplicate values within factories

The only out of the box solution currently supported is case 3. Adding a simple kwarg factory.Faker(..., unique=True) (as I believe this PR covers) immediately works for the first three cases. Additionally, it's easy.

Cases 4 and 5 are not currently supported - and adding per-factory uniqueness would not cover them. If testing with duplicate values is required, that's important to make explicit. Enforcing duplication is much more complex and variable than enforcing uniqueness, so it's best left to per-implementation LazyFunction or factory-calling functions.

Per-factory uniqueness only enables the case of "unique within factories, but occasionally and not repeatably allow duplicates across factories" which doesn't seem very useful to me (though I would welcome counterexamples).

So, what would be required at this point to get this PR ready to go?

@arthurHamon2 are you still around and interested in this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants