Would'a, Could'a, Should'a. Cuda!
From SC08 Student Competition
Teaser Description
- At least this problem description didn't mention 'gouda' in the title; that would have been cheesy. To prepare for this problem, familiarize yourself with the basics of CUDA programming. A wonderful walkthrough can be found at Dr. Dobb's Journal: http://www.ddj.com/cpp/207200659
Description
- How do CPU-bound applications compare with GPGPU applications in terms of performance? How can you tell where the bottlenecks are? In this problem you are to compare matrix-matrix multiply performance on the CPU with matrix-matrix multiply performance on the GPU. Start with the basic matrix-matrix multiply code found in each team member's "CUDA-Problem" directory in the competition environment. Using this code as the base, extend it to perform the identical matrix-matrix multiply task on the system's GPGPUs, and compare the results.
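As a point of reference for the CPU side of the comparison, a minimal sketch in C might look like the following. It is not the code supplied in the "CUDA-Problem" directory: the function names matmul_cpu and time_matmul_cpu, the row-major flat-array layout, and the use of clock() for timing are all assumptions made for illustration.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical CPU baseline: C = A * B for square n x n matrices
   stored row-major in flat arrays (element (i, j) is at i * n + j). */
void matmul_cpu(const float *a, const float *b, float *c, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical timing helper: average CPU seconds per multiply,
   measured with clock() over `reps` repetitions. */
double time_matmul_cpu(const float *a, const float *b, float *c,
                       int n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        matmul_cpu(a, b, c, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

A single 200x200 multiply finishes quickly on a CPU, so averaging over several repetitions helps keep the measurement above the timer's resolution. Whatever timer is chosen, the CPU and GPU runs should be measured in a comparable way, and it is worth stating whether host-to-device transfer time is included in the GPU figure.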
Technology
- Required tools.
  - CUDA SDK and included project codes.
  - nvcc compiler for CUDA, available on cuda.gig1.loc in the pool of resources within the computational environment.
  - gcc and Intel compilers
  - ssh (to get over to the cuda.gig1.loc system)
- Suggested tools.
  - Debugging tools.
Logistics
- There is only one CUDA-equipped host available in the competition environment. That host is cuda.gig1.loc. This host is also directly accessible from the Internet via cuda.littlefe.net. This system is equipped with four GPU cards: two NVidia 8800 GTS's and two NVidia 8600 GTS's.
- Leverage the matrix-multiplication codes provided to you in the NVidia SDK under /opt/cuda-sdk/projects.
- Keep your tests small. Limit the size of your matrices to less than 200x200 when testing.
- Using a packaged BLAS (Basic Linear Algebra Subroutines) library on the GPUs is acceptable, provided a packaged BLAS library is also used on the CPU. If BLAS libraries are used, appropriate citation must be provided in the comments of your code or in a supplemental text file accompanying your solutions.
- Many articles and how-tos exist on matrix multiplication in CUDA. You are free to base your solution on any how-to or article that you find, provided that you fully cite the original article. For example, Michael Wolfe, Compiler Engineer at The Portland Group, Inc., has published a rather good walk-through of the process.
Grading: What is to be turned in on your team's USB drive under a directory named "CUDA":
- The code (as a plain text file) with proper citations in-lined as comments
- Timing results for the 200x200 matrix-matrix multiplication run (as a plain text file). These results must contain
  - Timings for both the CPU-bound and the GPGPU-bound matrix multiplication tasks for the 200x200 (square) matrix multiplication task
- A "sanity check" routine following the GPGPU calculation to check for errors in your GPGPU results
- Your code must give appropriate attribution to any work by original authors, including code used from the (suggested) CUDA SDK and any BLAS library
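One possible shape for the "sanity check" routine is sketched below in C, under the assumption that the CPU result is kept as a reference and the GPU result has already been copied back to host memory. The name count_mismatches and the tolerance rule are hypothetical choices, not requirements of the problem. The check uses a tolerance rather than exact equality because single-precision results computed on a GPU may round differently from the CPU's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sanity check: count the elements where the GPU result
   differs from the CPU reference by more than tol * max(1, |ref|).
   Both arrays hold `count` floats (count = n * n for an n x n matrix). */
size_t count_mismatches(const float *ref, const float *gpu,
                        size_t count, float tol)
{
    assert(ref != NULL && gpu != NULL);
    size_t bad = 0;
    for (size_t i = 0; i < count; i++) {
        float diff = gpu[i] - ref[i];
        if (diff < 0.0f)
            diff = -diff;
        float scale = ref[i] < 0.0f ? -ref[i] : ref[i];
        if (scale < 1.0f)
            scale = 1.0f;
        /* Written as !(diff <= ...) so that NaN values count as errors. */
        if (!(diff <= tol * scale))
            bad++;
    }
    return bad;
}
```

A return value of zero means every element agreed within the tolerance. A nonzero count could be written to the timing results file alongside the tolerance used, so that the graders can see how the check was applied.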
What the graders will be looking for.
- Code that compiles without errors
- Code that returns correct results
- Timing results appropriate for the task at hand
- First-author codes (not derived from other sources) will be worth 30% of the score.
- Submissions derived from uncited sources will receive no points, and may put the team at risk of being disqualified from the competition (the judges' ruling is final)
