[BUG] SMP prvSelectHighestPriorityTask adds current task to the front of the ready list if pxIndex points to the head of the list #990

Closed
gemarcano opened this issue Feb 13, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@gemarcano

gemarcano commented Feb 13, 2024

Describe the bug
The new SMP implementation of prvSelectHighestPriorityTask uses vListInsertEnd to insert the current TCB at the end of the ready task list. However, vListInsertEnd doesn't actually insert an element at the end of a list; it only places it so that it is the last element returned by repeated calls to listGET_OWNER_OF_NEXT_ENTRY before the sequence starts repeating. Effectively, vListInsertEnd inserts the node right before the list's current pxIndex node.
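
To make the "inserts before pxIndex" behavior concrete, here is a minimal sketch using a simplified stand-in for the kernel list. This is not the FreeRTOS source: `Node`, `List`, `list_init`, `list_insert_end`, and `list_order` are hypothetical names, with `end` playing the role of xListEnd and `index` the role of pxIndex.

```c
#include <stddef.h>

/* Simplified model of a kernel-style list: a circular doubly linked list
 * with an end marker and a walking index. All names are hypothetical. */
typedef struct Node {
    struct Node *next;
    struct Node *prev;
    char name;
} Node;

typedef struct {
    Node end;     /* stands in for xListEnd */
    Node *index;  /* stands in for pxIndex */
} List;

static void list_init(List *l) {
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.name = '\0';
    l->index = &l->end;
}

/* Link the new node immediately before the index node. */
static void list_insert_end(List *l, Node *n) {
    Node *idx = l->index;
    n->next = idx;
    n->prev = idx->prev;
    idx->prev->next = n;
    idx->prev = n;
}

/* Write node names into out in physical order, starting at the head. */
static void list_order(const List *l, char *out) {
    for (const Node *n = l->end.next; n != &l->end; n = n->next) {
        *out++ = n->name;
    }
    *out = '\0';
}
```

In this model, if the list holds A, B, C and `index` points at B, calling `list_insert_end` with a new node X leaves the physical order as A, X, B, C. The node only lands at the physical tail while `index` still points at the end marker.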

In testing with a personal project and stepping through in a debugger, the pxIndex of the ready list at first seems to be the tail element of the list (the one before the xListEnd element). Over time, however, as tasks are removed from and added to the ready list, the pxIndex element appears to migrate to the head of the list. Once it reaches the head of the list, vListInsertEnd actually ends up inserting the current task's TCB node at the front of the ready list!
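
The effect can be reproduced in a toy model. The sketch below is not the kernel's scheduler: `Node`, `List`, `list_init`, `list_insert_end`, `list_remove`, and `pick_and_requeue` are hypothetical names, and the "scheduler" simply picks the head node and requeues it. It assumes that removing a node moves the index only when the index points at the removed node.

```c
#include <stddef.h>

/* Toy circular list with an end marker and an index (hypothetical names). */
typedef struct Node {
    struct Node *next;
    struct Node *prev;
    char name;
} Node;

typedef struct {
    Node end;     /* stands in for xListEnd */
    Node *index;  /* stands in for pxIndex */
} List;

static void list_init(List *l) {
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.name = '\0';
    l->index = &l->end;
}

/* Insert immediately before the index node. */
static void list_insert_end(List *l, Node *n) {
    Node *idx = l->index;
    n->next = idx;
    n->prev = idx->prev;
    idx->prev->next = n;
    idx->prev = n;
}

/* Unlink a node; move the index back only if it pointed at that node. */
static void list_remove(List *l, Node *n) {
    n->next->prev = n->prev;
    n->prev->next = n->next;
    if (l->index == n) {
        l->index = n->prev;
    }
}

/* Toy scheduler step: take the head node, then requeue it "at the end".
 * Assumes the list is not empty. */
static char pick_and_requeue(List *l) {
    Node *picked = l->end.next;
    list_remove(l, picked);
    list_insert_end(l, picked);
    return picked->name;
}
```

With the list holding X, A, B and `index` at the end marker, three calls to `pick_and_requeue` rotate through X, A, B. With `index` pointing at A instead, removing X makes A the head, X is reinserted right before A, and every call picks X again.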

The fix would be to use listGET_OWNER_OF_NEXT_ENTRY to iterate through the list, instead of starting from the head element.
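
A rough sketch of what index-based iteration looks like, again on a toy list rather than the kernel's: `list_next` is a hypothetical helper modeled on the advance-and-skip-the-end-marker behavior of listGET_OWNER_OF_NEXT_ENTRY, and the other names are likewise invented for illustration. It is not the change that was eventually merged.

```c
#include <stddef.h>

/* Toy circular list with an end marker and an index (hypothetical names). */
typedef struct Node {
    struct Node *next;
    struct Node *prev;
    char name;
} Node;

typedef struct {
    Node end;     /* stands in for xListEnd */
    Node *index;  /* stands in for pxIndex */
} List;

static void list_init(List *l) {
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.name = '\0';
    l->index = &l->end;
}

/* Insert immediately before the index node. */
static void list_insert_end(List *l, Node *n) {
    Node *idx = l->index;
    n->next = idx;
    n->prev = idx->prev;
    idx->prev->next = n;
    idx->prev = n;
}

/* Advance the index by one node, skipping the end marker, and return it.
 * Assumes the list is not empty. */
static Node *list_next(List *l) {
    l->index = l->index->next;
    if (l->index == &l->end) {
        l->index = l->index->next;
    }
    return l->index;
}
```

Because the walk starts from wherever `index` currently is, a node added with `list_insert_end` is reached only after every node that was already ahead of it in the cycle, no matter where `index` sits relative to the head.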

Target

  • Development board: Raspberry Pi Pico W (rp2040)
  • Instruction Set Architecture: ARM Cortex-M0+
  • IDE and version: pico-sdk, ninja, vim, cmake, crossdev generated arm-none-eabi-gcc
  • Toolchain and version: arm-none-eabi-gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113

Host

  • Host OS: Gentoo Linux
  • Version: Unstable (rolling release), kernel version 6.7.0

To Reproduce
I don't have a generic reproducer, since it strongly depends on scheduler and task interaction. Even reproducing it on my device is almost like trying to reproduce a race condition, and any slowdown from gdb conditionals renders the issue impossible to reproduce.

My project is set up to mock a USB HID device using TinyUSB. I have a task dedicated to USB handling, a CLI task, a task mocking controller input, and a watchdog task. Dumping the list of active tasks shows that a few other tasks are also running in the background alongside mine:

Tasks active: 7
  task name: prb_cli
  task name: usb
  task name: IDLE1
  task name: IDLE0
  task name: prb_watchdog
  task name: controller
  task name: Tmr Svc

I configured all 4 of my tasks to have a core affinity so they only use core 2.

I triggered the issue by repeatedly asking the CLI task to print my debug status info, which calls uxTaskGetSystemState to get the system state. It can take anywhere from seconds to almost a minute of me spamming requests (as a human, typing s and Enter to trigger the CLI output) to trigger the bug.

What I observe is that once the bug is triggered, the scheduler consistently schedules the current task, starving all others. This makes sense if pxIndex is the head node of the ready list: the current task node gets added before the pxIndex node, becoming the new head node.

Expected behavior
No resource starvation on the core the bug triggers in.

Screenshots
N/A

Additional context

See this FreeRTOS forum post for a discussion and all of my findings about the issue.

I can open a pull request with an attempted fix, but I have no idea how to go about preparing unit tests and coverage, or how to do proper regression testing with FreeRTOS.

@gemarcano gemarcano added the bug Something isn't working label Feb 13, 2024
@rawalexe
Member

Thank you for the bug report. We are looking into the problem.

@chinglee-iot chinglee-iot self-assigned this Feb 19, 2024
@chinglee-iot
Member

PR #1000, which addresses this issue, has been merged. Thank you for creating this issue.
