-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Wilson Score Confidence Interval Strategy #567
Conversation
@rdsharma26 Hi can you help review this PR. |
object ConfidenceIntervalStrategy { | ||
val defaultConfidence = 0.95 | ||
|
||
case class ConfidenceInterval(lowerBound: Double, upperBound: Double) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently also calculate upperBound for these ConfidenceInterval. At the moment we don't actually make use of the upperBound though
Thank you for the PR! @zeotuan One point I would like to discuss is making the Wilson Score Confidence Interval Strategy the new default. Could this potentially break backwards compatibility in terms of behavior? If so, should we stick to the Wald Interval Strategy as the default and update the documentation so that users of Deequ can choose which one they want? |
I think making Wilson Score Confidence Interval default right now might introduce some surprising changes to existing data quality pipelines. I will add example usage documentation for this. |
This seems like a safe approach. Could we configure the default to be the Wald strategy in the following line?
The build also failed due to:
|
@@ -23,16 +23,17 @@ import com.amazon.deequ.metrics.DistributionValue | |||
import com.amazon.deequ.profiles.ColumnProfile | |||
import com.amazon.deequ.suggestions.ConstraintSuggestion | |||
import com.amazon.deequ.suggestions.ConstraintSuggestionWithValue | |||
import com.amazon.deequ.suggestions.rules.FractionalCategoricalRangeRule.defaultIntervalStrategy | |||
import com.amazon.deequ.suggestions.rules.interval.{ConfidenceIntervalStrategy, WilsonScoreIntervalStrategy} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we avoid grouped imports and use one import per line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, do we prefer separate import or single import but with each on a single line
import com.amazon.deequ.suggestions.rules.interval.ConfidenceIntervalStrategy
import com.amazon.deequ.suggestions.rules.interval.WilsonScoreIntervalStrategy
or
import com.amazon.deequ.suggestions.rules.interval{
ConfidenceIntervalStrategy,
WilsonScoreIntervalStrategy
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The former. It helps with automatic resolution of merge conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
Left a comment and a unit test needs fixing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @zeotuan for your continued contribution to Deequ!
Fixes #563
Description of changes:
RetainCompletenessRules
andFractionalCategoricalRangeRule
to accept and configure ConfidenceIntervalStrategy parameterBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.