Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor logging, introduce JDBC #260

Merged
merged 10 commits into from
Jun 6, 2023
Merged

Refactor logging, introduce JDBC #260

merged 10 commits into from
Jun 6, 2023

Conversation

jphui
Copy link
Contributor

@jphui jphui commented May 27, 2023

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

<refactor / feat> : refactor logging to be more concise and consistent, introduce JDBC query for cross checking with Ebean results

Suggestions from @dnbaker surrounding logging conventions in #255 have been incorporated. In addition, a JDBC query has been introduced to be used in cross-checking against Ebean's results to determine if the duplicity issue arises from the Ebean layer.

NOTE that JDBC is not added as a 4th Find Methodology; it's intentionally made to be run in parallel in order to allow cross-checking.

The Big Idea

The end result of this PR is that preceding the existing logging we will be able to see the results that JDBC "would have gotten" ... because if BOTH JDBC and Ebean return null then we can be assured that Ebean is not the cuprit!

// TODO(@jphui) added for job-gms duplicity debug, throwaway afterwards

// JDBC sanity check: should MATCH Ebean's results
if (log.isDebugEnabled() && "com.linkedin.dataJob.azkaban.AzkabanFlowInfo".equals(aspectName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comment in the other PR: #265. What is the difference between them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change in response to @dnbaker 's suggested changes to use a single SOT for the Aspect name.

The difference:

  • ModelUtils.getAspectName() = .getCanonicalName() = "get the fully qualified class name"
  • .getSimpleName() = "get only the suffix" like 'AzkabanFlowInfo'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry what I mean is the what are the difference between these two PRs #265 and this one? should I move my comments in #265 to here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move them here to expedite just saw this sorry

return _server.find(EbeanMetadataAspect.class, key);
}

// TODO(@jphui) added for job-gms duplicity debug, throwaway afterwards

// JDBC sanity check: should MATCH Ebean's results
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about wrapping this into a method and put it inside the FindMethodology block. Then this can be enabled from the conflg file

Copy link
Contributor Author

@jphui jphui Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yangyangv2 The JDBC functionality is intentionally run in parallel alongside the Ebean find() methods. As it is right now, we want them to run simultaneously to cross check results.

Edited the PR description to be more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then how about moving this logic to line:506, so you don't have to run the same if-check again?

}
}
// Encouraged to run AFTER query execution based on getGeneratedSql() documentation
if (log.isDebugEnabled() && "com.linkedin.dataJob.azkaban.AzkabanFlowInfo".equals(aspectName)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using aspectClass.getSimpleName() to compare with "AzkabanFlowInfo" for performance reason.

how about moving this log into the the method block, so we can control from the config file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this out of the block because both DIRECT_SQL and QUERY_BUILDER will run the same logging lines.

Copy link
Contributor

@yangyangv2 yangyangv2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yangyangv2 yangyangv2 merged commit 6273d6e into master Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants