Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

view_sum cache for courses is miscalculated in CourseCacheManager #5911

Open
gabina opened this issue Jul 26, 2024 · 1 comment
Open

view_sum cache for courses is miscalculated in CourseCacheManager #5911

gabina opened this issue Jul 26, 2024 · 1 comment

Comments

@gabina
Copy link
Member

gabina commented Jul 26, 2024

What is happening?

There is a pretty counter-intuitive behavior related to ActiveRecord distinct and sum, which makes the view_sum cache incorrect. See issue #33082 in rails repo for more details on how this actually works.

The main method for CourseCacheManager class is update_cache, which updates the different caches.
The view_sum cache is updated through the following private method:

  def update_view_sum
    @course.view_sum = @course.articles_courses.tracked.live.sum(:view_count)
  end

This method is associated to the following SQL query:

SELECT SUM(DISTINCT `articles_courses`.`view_count`) FROM `articles_courses` INNER JOIN `articles` ON `articles`.`id` = `articles_courses`.`article_id` WHERE `articles_courses`.`course_id` = 16 AND `articles_courses`.`tracked` = TRUE AND `articles`.`deleted` = FALSE

This means that if the value of view_count is the same for different article courses, the repeated values are ignored, generating an incorrect calculation of the total sum of views.

This is caused by a not very intuitive behavior when using distinct and sum in ActiveRecord (note that tracked and live articles courses scopes use distinct).

To Reproduce

Steps to reproduce the behavior:

  1. Have a course downloaded locally with several articles courses, and make the view_count for those articles courses not unique. Example:
    image
  1. Move the update_view_sum private method out of the private methods section so that you can call directly
  2. Open a rails console
  3. Run the following code:
course = Course.find(16)
ccm = CourseCacheManager.new(course)
ccm.update_view_sum

view_sum = 0
course.articles_courses.tracked.live.each { |ac| view_sum+= ac.view_count }
  1. See that course.view_sum is 4049 (doesn't take the repeated 220 into account), while view_sum is 4269 (4049+220).

Expected behavior

view_sum field for Course should have the sum of view_count for all tracked live articles courses, no matter if the view_count value is not unique .

The SQL query should be:
SELECT DISTINCT `articles_courses`.* FROM `articles_courses` INNER JOIN `articles` ON `articles`.`id` = `articles_courses`.`article_id` WHERE `articles_courses`.`course_id` = 16 AND `articles_courses`.`tracked` = TRUE AND `articles`.`deleted` = FALSE

One option is to use the following definition, but I' m not sure if this could be less performant.

  def update_view_sum
    @course.view_sum = @course.articles_courses.tracked.live.sum(&:view_count)
  end

Additional context

It is possible that this same behavior is causing problems in other parts of the code. We should review all the code when we fix this.

@ragesoss
Copy link
Member

Wow, nice find.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants