Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime error when copying to constant device memory with HIP #912

Open
gzagaris opened this issue Sep 24, 2024 · 2 comments
Open

Runtime error when copying to constant device memory with HIP #912

gzagaris opened this issue Sep 24, 2024 · 2 comments

Comments

@gzagaris
Copy link
Member

Describe the bug

Copying data from the host to constant device memory using HIP throws a runtime error. The code works fine with CUDA , but, Umpire throws the following runtime error when compiled with HIP:

C++ exception with description "! Umpire runtime_error [.../Umpire/src/umpire/op/HipCopyOperation.cpp:31]: hipMemcpy( dest_ptr = 0xe3e480, src_ptr = 0x52fb90, length = 4096) failed with error: invalid argument

To Reproduce

Here is a code snippet that reproduces this behavior:

auto& rm = umpire::ResourceManager::getInstance();
auto const_allocator = rm.getAllocator("DEVICE_CONST");

static constexpr int N = 4;
static constexpr int BYTESIZE = N * sizeof(int);
static constexpr int TEST_VAL = 42;

auto host_allocator = rm.getAllocator("HOST");
int* HOST_DATA = static_cast< int* >(host_allocator.allocate(BYTESIZE));
for ( int i = 0; i < N; ++i ) {
    HOST_DATA[ i ] = TEST_VAL;
}

int* A_d = static_cast< int* >(const_allocator.allocate(BYTESIZE));
EXPECT_TRUE(A_d != nullptr);
rm.copy(A_d, HOST_DATA, BYTESIZE); // <------------- RUNTIME ERROR THROWN HERE!

I am compiling Umpire with -DENABLE_HIP=On -DUMPIRE_ENABLE_DEVICE_CONST=On.

Am I missing anything?

Expected behavior

I would have expected this to work and not throw a runtime error.

Compilers & Libraries (please complete the following information):

  • Compiler & version: amdclang-16.0.0
  • ROCM version: v5.6.0
@gzagaris
Copy link
Member Author

One more thing to add to this. I did look into the implementation a bit and I did not see any calls to hipGetSymbolAddress() in the HipConstantMemoryResource.cpp, which I would have expected and it may be related to the issue that I am seeing.

@gzagaris
Copy link
Member Author

gzagaris commented Sep 24, 2024

Following a similar approach as in CudaConstantMemoryResource.cu, I made the following changes to HipConstantMemoryResource.cpp:

diff --git a/src/umpire/resource/HipConstantMemoryResource.cpp b/src/umpire/resource/HipConstantMemoryResource.cpp
index 65c5f72c..d00103a2 100644
--- a/src/umpire/resource/HipConstantMemoryResource.cpp
+++ b/src/umpire/resource/HipConstantMemoryResource.cpp
@@ -25,7 +25,8 @@ HipConstantMemoryResource::HipConstantMemoryResource(const std::string& name, in
       m_highwatermark{0},
       m_platform{Platform::hip},
       m_offset{0},
-      m_ptr{s_umpire_internal_device_constant_memory}
+      m_ptr{nullptr},
+      m_initialized{false}
 {
 }

@@ -33,6 +34,16 @@ void* HipConstantMemoryResource::allocate(std::size_t bytes)
 {
   std::lock_guard<std::mutex> lock{m_mutex};

+  if (!m_initialized) {
+    hipError_t error = ::hipGetSymbolAddress((void**)&m_ptr, s_umpire_internal_device_constant_memory);
+
+    if (error != hipSuccess) {
+      UMPIRE_ERROR(runtime_error, umpire::fmt::format("hipGetSymbolAddress failed with error: {}", ::hipGetErrorString(error)));
+    }
+
+    m_initialized = true;
+  }
+
   char* ptr{static_cast<char*>(m_ptr) + m_offset};
   m_offset += bytes;

diff --git a/src/umpire/resource/HipConstantMemoryResource.hpp b/src/umpire/resource/HipConstantMemoryResource.hpp
index d7afac23..5e32f45f 100644
--- a/src/umpire/resource/HipConstantMemoryResource.hpp
+++ b/src/umpire/resource/HipConstantMemoryResource.hpp
@@ -39,7 +39,8 @@ class HipConstantMemoryResource : public MemoryResource {

   std::size_t m_offset;
   void* m_ptr;
-
+  bool m_initialized;
+
   std::mutex m_mutex;
 };

I didn't have any luck with that though, hipGetSymbolAddress, now throws the following runtime error:

terminate called after throwing an instance of 'umpire::runtime_error'
45:   what():  ! Umpire runtime_error [.../Umpire/src/umpire/resource/HipConstantMemoryResource.cpp:41]: hipGetSymbolAddress failed with error: invalid device symbol

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant