Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

fix: memory leaks and deletion #6

Closed
wants to merge 7 commits into from
Closed

fix: memory leaks and deletion #6

wants to merge 7 commits into from

Conversation

eeberhard
Copy link
Member

@eeberhard eeberhard commented Jan 26, 2024

Description

This PR addresses the issue by implementing the fixes in bbenligiray#31 and bbenligiray#32. I took the liberty of patching other occurrences of bad deletion too.

Review guidelines

Estimated Time of Review: 5 minutes

Checklist before merging:

  • Confirm that the relevant changelog(s) are up-to-date in case of any user-facing changes
  • Tested on robot

@eeberhard eeberhard linked an issue Jan 28, 2024 that may be closed by this pull request
@JasperTan97
Copy link

So currently, I am getting

Thread 41 "state_engine" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff7cff9640 (LWP 278)]
0x00007ffff72c847e in free () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) backtrace
#0  0x00007ffff72c847e in free () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fff0c12ea9f in stag::EdgeMap::~EdgeMap (this=0x7fff2804f6d0, __in_chrg=<optimized out>)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/include/stag/ED/EdgeMap.h:42
#2  0x00007fff0c12db16 in stag::EDInterface::~EDInterface (this=0x7fff4400a398, 
    __in_chrg=<optimized out>)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/EDInterface.cpp:31
#3  0x00007fff0c6017de in stag::Stag::~Stag (this=0x7fff4400a390, __in_chrg=<optimized out>)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/include/stag/Stag.h:14
#4  0x00007fff0c601802 in __gnu_cxx::new_allocator<stag::Stag>::destroy<stag::Stag> (
    this=0x7fff4400a390, __p=0x7fff4400a390) at /usr/include/c++/11/ext/new_allocator.h:168
#5  0x00007fff0c6016b5 in std::allocator_traits<std::allocator<stag::Stag> >::destroy<stag::Stag> (
    __a=..., __p=0x7fff4400a390) at /usr/include/c++/11/bits/alloc_traits.h:535
#6  0x00007fff0c601571 in std::_Sp_counted_ptr_inplace<stag::Stag, std::allocator<stag::Stag>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7fff4400a380) at /usr/include/c++/11/bits/shared_ptr_base.h:528
#7  0x000055555602be2f in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (
    this=0x7fff4400a380) at /usr/include/c++/11/bits/shared_ptr_base.h:168
#8  0x0000555556029fbd in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (
    this=0x7fff209c1ec8, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#9  0x00007fff0c5fc8ea in std::__shared_ptr<stag::Stag, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (
    this=0x7fff209c1ec0, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#10 0x00007fff0c5fc90a in std::shared_ptr<stag::Stag>::~shared_ptr (this=0x7fff209c1ec0, 
    __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr.h:122
#11 0x00007fff0c601440 in pose_detection::detectors::STagDetector::~STagDetector (this=0x7fff209c1dd0, 
    __in_chrg=<optimized out>)
    at /home/ros2/ws/src/pose_detection/include/pose_detection/detectors/STagDetector.hpp:17
#12 0x00007fff0cbe2b85 in __gnu_cxx::new_allocator<pose_detection::detectors::STagDetector>::destroy<pose_detection::detectors::STagDetector> (this=0x7fff209c1dd0, __p=0x7fff209c1dd0)

When reading through the code, there is no explicit destructors for Stag.h stag object. Also, these objects do not explicitly define which should be deleted first.

From reading the code, Stag object creates EDInterface which has a pointer EdgeMap object. Hence, we want to delete EDInterface before EdgeMap. But as seen in the GDB console, the EdgeMap object gets deleted first, probably due to the order of inheritance, which has a dangling pointer. Phind (AI chatbot) says that free could mean double free or dangling pointer.

@eeberhard
Copy link
Member Author

we want to delete EDInterface before EdgeMap. But as seen in the GDB console, the EdgeMap object gets deleted first, probably due to the order of inheritance

The stack trace shows us the call history in reverse order. It's really just behaving as we expect.

// #2  0x00007fff0c12db16 in stag::EDInterface::~EDInterface
EDInterface::~EDInterface() {
    if (edLines != NULL)
        delete edLines;
    if (edgeMap != NULL)
        delete edgeMap; // <-- Go to destructor of EdgeMap
}

// #1  0x00007fff0c12ea9f in stag::EdgeMap::~EdgeMap
  ~EdgeMap(){
    delete[] edgeImg; // <-- Go to implementation of delete, which internally calls `free()`
    delete[] pixels;
    delete[] segments;
  } //end-~EdgeMap

// #0  0x00007ffff72c847e in free () from /lib/x86_64-linux-gnu/libc.so.6
! segmentation fault

When reading through the code, there is no explicit destructors for Stag.h stag object. Also, these objects do not explicitly define which should be deleted first.

The destruction sequence itself is not really the problem; default destructors just go in the opposite order of construction, and call the respective destructors of each property in turn as we see in the stack trace above.

The issue as before is that the memory has already been de-allocated somewhere.

It would seem that the check if (edgeMap != NULL) delete edgeMap should prevent this behavior, but actually delete doesn't set the pointer to NULL, it just deallocates the corresponding memory. This keeps delete atomic and optimized, but makes if !NULL check unreliable. A hack fix would be to explicitly set the pointer to NULL everywhere it is deleted, so that this if check always succeeds. This still involves finding all occurrences where the edgemap might be deleted, at which point we can also check for double deletion cases.

Otherwise, the less efficient workaround is just not to delete anything in the destructor of EdgeMap. This invites some memory leak but certainly avoids a double free error.

@JasperTan97
Copy link

JasperTan97 commented Jan 29, 2024

Thread 45 "state_engine" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff667fc640 (LWP 692)]
stag::QuadDetector::groupLines (this=0x7fff3000ffe0, image=..., edInterface=0x7fff3000ffa8) at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/QuadDetector.cpp:120
120	/home/ros2/ws/build/pose_detection/_deps/stag-src/src/QuadDetector.cpp: No such file or directory.
(gdb) backtrace
#0  stag::QuadDetector::groupLines (this=0x7fff3000ffe0, image=..., edInterface=0x7fff3000ffa8)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/QuadDetector.cpp:120
#1  0x00007fff15138678 in stag::QuadDetector::detectQuads (this=0x7fff3000ffe0, image=..., 
    edInterface=0x7fff3000ffa8)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/QuadDetector.cpp:30
#2  0x00007fff151411fd in stag::Stag::detectMarkers (this=0x7fff3000ffa0, inImage=...)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/Stag.cpp:27
Thread 40 "state_engine" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff757fa640 (LWP 553)]
0x00007fff14fc9cbf in stag::EDInterface::correctLineDirection (this=0x7fff3c2b47a8, image=..., ls=...) at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/EDInterface.cpp:57
57 /home/ros2/ws/build/pose_detection/_deps/stag-src/src/EDInterface.cpp: No such file or directory.
(gdb) backtrace
#0  0x00007fff14fc9cbf in stag::EDInterface::correctLineDirection (this=0x7fff3c2b47a8, image=..., 
    ls=...) at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/EDInterface.cpp:57
#1  0x00007fff14fde1ef in stag::QuadDetector::groupLines (this=0x7fff3c2b47e0, image=..., 
    edInterface=0x7fff3c2b47a8)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/QuadDetector.cpp:134
#2  0x00007fff14fdd678 in stag::QuadDetector::detectQuads (this=0x7fff3c2b47e0, image=..., 
    edInterface=0x7fff3c2b47a8)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/QuadDetector.cpp:30
#3  0x00007fff14fe61fd in stag::Stag::detectMarkers (this=0x7fff3c2b47a0, inImage=...)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/Stag.cpp:27
Thread 46 "state_engine" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff61ffb640 (LWP 424)]
0x00007fff8856ef29 in cv::invert(cv::_InputArray const&, cv::_OutputArray const&, int) () from /ws/install/pose_detection/lib/../apt/lib/x86_64-linux-gnu/libopencv_core.so.4.5d
(gdb) backtrace
#0  0x00007fff8856ef29 in cv::invert(cv::_InputArray const&, cv::_OutputArray const&, int) ()
   from /ws/install/pose_detection/lib/../apt/lib/x86_64-linux-gnu/libopencv_core.so.4.5d
#1  0x00007fff885c691a in ?? ()
   from /ws/install/pose_detection/lib/../apt/lib/x86_64-linux-gnu/libopencv_core.so.4.5d
#2  0x00007fff10e2d8a6 in cv::MatExpr::operator cv::Mat (this=0x7fff61ff6480)
    at /home/ros2/ws/install/pose_detection/apt/usr/include/opencv4/opencv2/core/mat.inl.hpp:3078
#3  0x00007fff1042e09a in stag::PoseRefiner::refineMarkerPose (this=0x7fff3c00cc28, 
    edInterface=0x7fff3c00cb68, marker=...)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/PoseRefiner.cpp:69
#4  0x00007fff1043e4b0 in stag::Stag::detectMarkers (this=0x7fff3c00cb60, inImage=...)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/Stag.cpp:53

So after implementing the mutex, I am getting random segmentation fault problems, all stemming from stag::Stag::detectMarkers . Do you think it's because there are other parts in that function that's looking for the deleted edInterface object and that's a dangling pointer? It's basically the stag detector trying to run a function while being destructed too? So if we move the mutex here it might solve the problem?

If that's true, then the mutex is somehow no longer needed in EDInterface because there won't be processes running there.

@JasperTan97
Copy link

free(): invalid pointer

Thread 43 "state_engine" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff677fe640 (LWP 533)]
0x00007ffff72b9a7c in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) backtrace
#0  0x00007ffff72b9a7c in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7265476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff724b7f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff72ac6f6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff72c3d7c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007ffff72c5ac4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007ffff72c84d3 in free () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007fff14507d17 in stag::EDLines::~EDLines (this=0x7fff342056a0, __in_chrg=<optimized out>)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/include/stag/ED/EDLines.h:57
#8  0x00007fff14506c05 in stag::EDInterface::~EDInterface (this=0x7fff500019f8, 
    __in_chrg=<optimized out>)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/EDInterface.cpp:31
#9  0x00007fff14523270 in stag::Stag::~Stag (this=0x7fff500019f0, __in_chrg=<optimized out>)
    at /home/ros2/ws/build/pose_detection/_deps/stag-src/src/Stag.cpp:26
#10 0x00007fff149d7b34 in __gnu_cxx::new_allocator<stag::Stag>::destroy<stag::Stag> (
    this=0x7fff500019f0, __p=0x7fff500019f0) at /usr/include/c++/11/ext/new_allocator.h:168
#11 0x00007fff149d7aff in std::allocator_traits<std::allocator<stag::Stag> >::destroy<stag::Stag> (
    __a=..., __p=0x7fff500019f0) at /usr/include/c++/11/bits/alloc_traits.h:535
#12 0x00007fff149d79bb in std::_Sp_counted_ptr_inplace<stag::Stag, std::allocator<stag::Stag>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7fff500019e0) at /usr/include/c++/11/bits/shared_ptr_base.h:528
#13 0x000055555602be2f in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (
    this=0x7fff500019e0) at /usr/include/c++/11/bits/shared_ptr_base.h:168
#14 0x0000555556029fbd in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (
    this=0x7fff189b3218, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#15 0x00007fff149d300a in std::__shared_ptr<stag::Stag, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (
    this=0x7fff189b3210, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#16 0x00007fff149d302a in std::shared_ptr<stag::Stag>::~shared_ptr (this=0x7fff189b3210, 

Alright, after locking all unsafe functions during destruction, it really seems that the problem has returned to these arrays, where they are not NULL but deleted somewhere, and they also have unsafe functions that could be running during destruction. Furthermore, the entire Stag code is too complicated to code trace without spending so much time on it, and adding mutexes or pointer = NULL is a little silly. I will see how our pose detection uses stag and maybe we can decide what's the next course of action

@JasperTan97
Copy link

JasperTan97 commented Jan 30, 2024

bool STagDetector::configure() {
  stag_ = std::make_shared<stag::Stag>(library_->get_value(), error_correction_->get_value(), false);
  return BaseMarkerDetector::configure();
}

std::map<int, std::vector<cv::Point2d>> STagDetector::detect_markers(const cv::Mat& image) {
  cv::Mat gray;
  cv::cvtColor(image, gray, cv::COLOR_RGB2GRAY);
  this->stag_->detectMarkers(gray);
  annotated_image_ = this->stag_->drawMarkers();

  std::map<int, std::vector<cv::Point2d>> markers;
  for (const auto& marker: this->stag_->getMarkerList()) {
    markers.insert_or_assign(marker.id, marker.corners);
  }
  return markers;
}

So I am looking at the part of the code that invokes the Stag class found here, and it uses 3 functions.

  1. detectMarkers which was originally part of the code
  2. drawMarkers which was resolved in Draw markers #2. We find that it's also added here
  3. getMarkerList which was also resolved in Draw markers #2, which is here as well.

And after, there's some clean up with seperating headers and cpp file, as well as making sure all the installation and compilation scripts are working.

We could try to update our Stag library to the latest version, and it would be easier to update down the line too (maybe). Let me know what you think.

@JasperTan97
Copy link

This is closed to facilitate #7

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leaks in stag library
2 participants