9 cuda tests failed #586

Open
yys123456 opened this issue Jul 16, 2022 · 9 comments

@yys123456

I tried to build Terra from source using CMake, and everything went well until I ran the Terra tests: 9 tests whose names start with cuda failed, although some other tests starting with cuda, such as cudaprintf, passed.
[screenshot]
My environment: CUDA 11.7, Visual Studio 17 2022, GTX 1050 Ti, clang+llvm-11.1.0-x86_64-windows-msvc17

@yys123456 changed the title from "9 testcases stating with cuda failed" to "9 tests stating with cuda failed" Jul 16, 2022
@yys123456 changed the title from "9 tests stating with cuda failed" to "9 tests starting with cuda failed" Jul 16, 2022
@yys123456 changed the title from "9 tests starting with cuda failed" to "9 cuda tests failed" Jul 16, 2022
@yys123456
Author

The commands used to build from source:

  1. cmake -DCMAKE_INSTALL_PREFIX=./../install .. -DTERRA_ENABLE_CUDA=ON -G "Visual Studio 17 2022"
D:\terra\build>cmake -DCMAKE_INSTALL_PREFIX=./../install .. -DTERRA_ENABLE_CUDA=ON -G "Visual Studio 17 2022"
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19043.
-- The C compiler identification is MSVC 19.32.31332.0
-- The CXX compiler identification is MSVC 19.32.31332.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Clang libraries: D:/LLVM/lib/clangFrontend.lib;D:/LLVM/lib/clangDriver.lib;D:/LLVM/lib/clangSerialization.lib;D:/LLVM/lib/clangCodeGen.lib;D:/LLVM/lib/clangParse.lib;D:/LLVM/lib/clangSema.lib;D:/LLVM/lib/clangAnalysis.lib;D:/LLVM/lib/clangEdit.lib;D:/LLVM/lib/clangAST.lib;D:/LLVM/lib/clangASTMatchers.lib;D:/LLVM/lib/clangLex.lib;D:/LLVM/lib/clangBasic.lib
-- Found Clang: D:/LLVM/include
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.37.1.windows.1")
-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7 (found version "11.7")
-- Using Lua: LuaJIT commit 50936d784474747b4569d988767f1b5bab8bb6d0
-- Configuring done
-- Generating done
-- Build files have been written to: D:/terra/build
  2. cmake --build . --target INSTALL --config Release

[screenshot]

@elliottslaughter
Member

Hi @yys123456,

I don't have a Windows dev box, so I'm going to need you to take the lead on fixing this. There may be a couple of other users hanging around the issue tracker who use Windows, but I don't know how many of them use CUDA on Windows.

What I can tell you is that CUDA on Linux is tested regularly. So whatever is going on here is specific to either (a) Windows, (b) GTX1050ti, or (c) something particular to your dev machine.

The parts of the build you've shown so far look fine. I think the next step would be to look at the specific tests and see how they're failing.

P.S. If you don't mind, it would be nice to copy-and-paste the screenshots as text instead of images. Thanks.

@sssphil

sssphil commented Jun 1, 2023

Hi @elliottslaughter, I've run into a similar situation on Ubuntu 20.04 with CUDA 11.7, except that the list contains cudaoo.t rather than cudaprintf.t:

=================
= FAILING tests
=================
cudatest.t
cudashared.t
cudatex.t
cudaoffline.t
cudaoo.t
cudaaggregate.t
cudaatomic.t
cudaagg.t
cudaglobal.t
=================

I compiled LLVM from the GitHub release 16.0.4 along with Clang and Polly, and I turned off the CMake flags following the instructions in this repo.

Running some of the test files alone gives this:

$ ../../../bin/terra cudaatomic.t 
<buffer>:1:10: fatal error: 'cuda_runtime.h' file not found
#include "cuda_runtime.h"
         ^~~~~~~~~~~~~~~~
compilation of included c code failed

stack traceback:
	[C]: in function 'registercfile'
	...syang/workspace/dynamicfusion/terra_src/src/terralib.lua:3529: in function 'includecstring'
	cudaatomic.t:22: in main chunk

Could it be a problem with how I compiled LLVM? I tried a precompiled release of LLVM 13 from its repo, and as I recall the tests all passed. But I was having problems with some old optimization code, so I'm compiling everything together.

@elliottslaughter
Member

Where is your CUDA installed to? It's probably just missing the correct path. E.g., if your CUDA is installed to /usr/local/cuda-11, you could set:

export CUDA_HOME=/usr/local/cuda-11
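
For anyone hitting the same 'cuda_runtime.h' file not found error, a quick check along these lines may help confirm whether the header actually sits under the directory CUDA_HOME points to. This is only a sketch: check_cuda_header is a hypothetical helper name, and /usr/local/cuda is an assumed fallback prefix, not something Terra defines.

```shell
# Hypothetical helper: report whether cuda_runtime.h exists under a CUDA prefix.
# Returns 0 when the header is found, 1 otherwise.
check_cuda_header() {
  prefix="$1"
  if [ -f "$prefix/include/cuda_runtime.h" ]; then
    echo "found: $prefix/include/cuda_runtime.h"
    return 0
  else
    echo "missing: $prefix/include/cuda_runtime.h"
    return 1
  fi
}

# Check the current CUDA_HOME, falling back to an assumed default location.
check_cuda_header "${CUDA_HOME:-/usr/local/cuda}" \
  || echo "set CUDA_HOME to the directory that contains include/cuda_runtime.h"
```

If the header is reported missing, the CUDA toolkit is probably installed under a different prefix than the one being checked.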

@sssphil

sssphil commented Jun 5, 2023

Thanks for the reply! I followed the instructions on NVIDIA's website, and running whereis CUDA shows /usr/local/cuda/. I set CUDA_HOME and tried again (and also rebuilt with the variable set), but it still shows the same error.

@elliottslaughter
Member

Try setting INCLUDE_PATH=$CUDA_HOME/include and see if that changes anything.
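
As a rough sketch of that suggestion in a shell session: the fallback to /usr/local/cuda is an assumption taken from the whereis output mentioned earlier in the thread, and the test path in the comment is the one used in the failing run above.

```shell
# Use the existing CUDA_HOME if set; otherwise assume /usr/local/cuda.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Point INCLUDE_PATH at the CUDA headers, as suggested above.
export INCLUDE_PATH="$CUDA_HOME/include"
echo "INCLUDE_PATH=$INCLUDE_PATH"

# Then rerun one of the failing tests from the tests directory, e.g.:
#   ../../../bin/terra cudaatomic.t
```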

@sssphil

sssphil commented Jun 5, 2023

Thanks for the help! Now all the tests pass. I'm wondering whether I missed a step while compiling from source, or whether I'm supposed to set these environment variables whenever I use Terra?

@elliottslaughter
Member

No, it is not expected that you should need to set these variables. Something is going wrong.

Please go to src/terralib.lua at line 4345 and add the following debug prints:

print("CUDA_HOME", os.getenv("CUDA_HOME"))
print("terra.cudahome", terra.cudahome)
for k,v in pairs(terra.cudalibpaths) do
  print("terra.cudalibpaths", k, v)
end

Note that due to build glitches, you may need to make clean && make to see these changes take effect.

@sssphil

sssphil commented Jun 8, 2023

Hi @elliottslaughter, somehow I can't reproduce the problem anymore. I've been uninstalling and reinstalling CUDA, tried multiple versions, and then reverted to 11.7. Maybe it has something to do with how CUDA was installed?
Here is the debug output from running cudatest.t, but I guess everything should be normal now.

CUDA_HOME	nil
terra.cudahome	/usr/local/cuda
terra.cudalibpaths	nvvm	/usr/local/cuda/nvvm/lib64/libnvvm.so
terra.cudalibpaths	runtime	/usr/local/cuda/lib64/libcudart.so
terra.cudalibpaths	driver	libcuda.so
