labs.beatcraft.com
CUDA

Overview

This article explains how to install Ubuntu 14.04 on a PC equipped with a Tesla K20c.
Then, CUDA 6.5 is installed on the Ubuntu PC.

Hardware Specifications

This is the list of hardware specifications.

  • CPU: Core i7-3770 (3.4GHz, 4-core/8-thread)
  • Memory: 32GB (DDR3 PC3-12800, 8GB x4)
  • HDD: 1TB (SATA, 7200rpm)
  • GPU: ETSK20-5GER (NVIDIA Tesla K20c, for CUDA)
  • GPU: GF-GT730-LE1GHD/D5 (NVIDIA GeForce GT730, for Video)

In the CUDA5/CentOS6.4 article, Primary Display was set to On Board (Intel GPU) in the UEFI settings, since the ETSK20-5GER has no display outputs (D-SUB, DVI, or HDMI). However, when the NVIDIA GPU driver and the NVIDIA OpenGL library are installed together as parts of the NVIDIA toolkit, the GUI is not displayed correctly if a non-NVIDIA GPU is selected for the monitor.
In this article, the machine is used as an Ubuntu Desktop with a GUI rather than as a headless Ubuntu Server, so a GF-GT730-LE1GHD/D5 board, which carries an NVIDIA GeForce GT730, is added to the system and assigned to the screen output.

Installing Ubuntu 14.04

Ubuntu 14.04 LTS Desktop (64-bit) is installed with the configuration below.

  • Language: US
  • Keyboard: English (or whatever you use)
  • Partitions of HDD: Whole Region (Default setting)
  • Network: DHCP

Select nomodeset in the boot options of the Ubuntu install DVD.
Do not use Nouveau, the open source NVIDIA GPU driver, when installing Ubuntu.

Post-Installation Configuration of Ubuntu 14.04

After the installation of Ubuntu is completed, the configurations shown below are applied.

Update Ubuntu 14.04

Bring Ubuntu up to date by installing the newest packages and applications.

$ sudo apt-get update
$ sudo apt-get dist-upgrade

After the update is completed, make sure that the system restarts correctly with the updated kernel.

Disabling Nouveau

To install and use only NVIDIA's GPU driver, prevent the Nouveau driver from being loaded by the system. Create the file /etc/modprobe.d/blacklist-nouveau.conf with the contents shown below.

blacklist nouveau
options nouveau modeset=0

Regenerate the kernel initramfs to make the new configuration effective.

$ sudo update-initramfs -u
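
The file creation and the initramfs update can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name nouveau_blacklist_conf is hypothetical, and it only prints the two configuration lines so that the result can be inspected before anything under /etc is touched.

```shell
# Hypothetical helper: print the contents of blacklist-nouveau.conf to stdout.
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Show what would be written. Nothing is changed on the system here.
nouveau_blacklist_conf

# To apply it for real (requires root), the output can be piped to tee:
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
```

Printing to stdout first keeps the privileged step down to a single tee command.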

Reboot the system and make sure that the Nouveau driver is NOT loaded. If the Nouveau driver is NOT loaded, LightDM and GNOME start up in a low resolution.
Also, run lsmod and check that nouveau does not appear among the loaded modules.
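
The lsmod check can be written as a small shell test. This is a sketch under the assumption that lsmod prints the module name in the first column, as it normally does. The function name nouveau_listed is hypothetical.

```shell
# Hypothetical helper: succeed if lsmod-style input on stdin has a line
# whose first column is exactly "nouveau".
nouveau_listed() {
    grep -q '^nouveau[[:space:]]'
}

# Run the check only where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_listed; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
fi
```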

Installing Packages

Once the installation of Ubuntu 14.04 is completed, all packages required for CUDA are already installed. The additional packages below are installed to make the system more convenient to operate.

$ sudo apt-get install vim lv ssh nautilus-open-terminal build-essential

Installing CUDA 6.5

As of CUDA 6.5, installing CUDA on Ubuntu has become much easier than before. Because Ubuntu has officially become a supported distribution, a repository of deb packages is provided.

  • Installing Package Manager
    To install the package manager, please follow the directions listed at this page.
  • Downloading the packages from CUDA Download page
    Please download the deb package (cuda-repo-ubuntu1404_6.5-14_amd64.deb) for Ubuntu 14.04.
    https://developer.nvidia.com/cuda-downloads
    Then, install the package with the command below.
    $ sudo dpkg -i cuda-repo-ubuntu1404_6.5-14_amd64.deb
    Since this package only adds NVIDIA's repository to the apt sources list, refresh the package index and install CUDA with the commands below.
    $ sudo apt-get update
    $ sudo apt-get install cuda
    This completes the installation of CUDA 6.5.
    After the system is restarted, the Ubuntu desktop GUI is displayed at a finer resolution than before, since NVIDIA's GPU driver is now in effect.
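
After the restart, a quick check can confirm that the NVIDIA driver is in place. The sketch below is not from the original procedure. The helper name report_tool is hypothetical, and the check assumes that the driver package puts nvidia-smi on the PATH and exposes /proc/driver/nvidia/version while the kernel module is loaded.

```shell
# Hypothetical helper: report whether a command is available on PATH.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

report_tool nvidia-smi

# The loaded NVIDIA kernel module reports its version in this file.
if [ -r /proc/driver/nvidia/version ]; then
    cat /proc/driver/nvidia/version
else
    echo "NVIDIA kernel module does not appear to be loaded"
fi
```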

Post-Installation Configuration of CUDA 6.5

Configuring environment

CUDA 6.5 is installed under /usr/local/cuda-6.5/, so the environment variables must be adjusted to make the executables and libraries stored under this directory accessible.

Add the lines shown below to the end of the .bashrc file.

export PATH=/usr/local/cuda-6.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH

The change to the environment variables takes effect when Terminal is reopened.
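
Whether the new settings are active in the current shell can be verified as follows. This is a minimal sketch. The helper name in_path_list is hypothetical, and the directories are the ones used in this article.

```shell
# Hypothetical helper: succeed if directory $1 is one of the entries of
# the colon-separated list $2.
in_path_list() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if in_path_list /usr/local/cuda-6.5/bin "$PATH"; then
    echo "PATH contains the CUDA bin directory"
else
    echo "PATH does not contain the CUDA bin directory"
fi

if in_path_list /usr/local/cuda-6.5/lib64 "${LD_LIBRARY_PATH:-}"; then
    echo "LD_LIBRARY_PATH contains the CUDA lib64 directory"
else
    echo "LD_LIBRARY_PATH does not contain the CUDA lib64 directory"
fi
```

Wrapping both the entry and the list in colons makes the match exact, so /usr/local/cuda-6.5/bin is not confused with a longer path that merely starts with it.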

Copying CUDA Samples

The CUDA samples are placed under /usr/local/cuda-6.5/samples/. In this directory, the user cannot write to the sample files without root privileges. To solve this issue, the samples are copied to the user's home directory, where the user has write permission.

$ cuda-install-samples-6.5.sh ~

The command above copies the CUDA samples to /home/{user}/NVIDIA_CUDA-6.5_Samples/.

Build and Execute Samples

With the commands below, move to the directory where the samples have been copied and build them.

$ cd ~/NVIDIA_CUDA-6.5_Samples
$ make

The samples are built under the NVIDIA_CUDA-6.5_Samples directory. The executables are copied to ~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release/.

$ cd bin/x86_64/linux/release
beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ls
alignedTypes              cudaDecodeGL          matrixMul                scan                       simpleTemplates
asyncAPI                  cudaOpenMP            matrixMulCUBLAS          segmentationTreeThrust     simpleTexture
bandwidthTest             cuHook                matrixMulDrv             shfl_scan                  simpleTexture3D
batchCUBLAS               dct8x8                matrixMulDynlinkJIT      simpleAssert               simpleTextureDrv
bicubicTexture            deviceQuery           matrixMul_kernel64.ptx   simpleAtomicIntrinsics     simpleTexture_kernel64.ptx
bilateralFilter           deviceQueryDrv        MC_EstimatePiInlineP     simpleCallback             simpleVoteIntrinsics
bindlessTexture           dwtHaar1D             MC_EstimatePiInlineQ     simpleCubemapTexture       simpleZeroCopy
binomialOptions           dxtc                  MC_EstimatePiP           simpleCUBLAS               smokeParticles
BlackScholes              eigenvalues           MC_EstimatePiQ           simpleCUDA2GL              SobelFilter
boxFilter                 fastWalshTransform    MC_SingleAsianOptionP    simpleCUFFT                SobolQRNG
boxFilterNPP              FDTD3d                mergeSort                simpleCUFFT_2d_MGPU        sortingNetworks
cdpAdvancedQuicksort      fluidsGL              MersenneTwisterGP11213   simpleCUFFT_callback       stereoDisparity
cdpBezierTessellation     freeImageInteropNPP   MonteCarloMultiGPU       simpleCUFFT_MGPU           StreamPriorities
cdpLUDecomposition        FunctionPointers      nbody                    simpleDevLibCUBLAS         template
cdpQuadtree               grabcutNPP            newdelete                simpleGL                   template_runtime
cdpSimplePrint            histEqualizationNPP   NV12ToARGB_drvapi64.ptx  simpleHyperQ               threadFenceReduction
cdpSimpleQuicksort        histogram             oceanFFT                 simpleIPC                  threadMigration
clock                     HSOpticalFlow         p2pBandwidthLatencyTest  simpleLayeredTexture       threadMigration_kernel64.ptx
concurrentKernels         imageDenoising        particles                simpleMultiCopy            transpose
conjugateGradient         imageSegmentationNPP  postProcessGL            simpleMultiGPU             UnifiedMemoryStreams
conjugateGradientPrecond  inlinePTX             ptxjit                   simpleOccupancy            vectorAdd
conjugateGradientUM       interval              quasirandomGenerator     simpleP2P                  vectorAddDrv
convolutionFFT2D          jpegNPP               radixSortThrust          simplePitchLinearTexture   vectorAdd_kernel64.ptx
convolutionSeparable      libcuhook.so.1        randomFog                simplePrintf               volumeFiltering
convolutionTexture        lineOfSight           recursiveGaussian        simpleSeparateCompilation  volumeRender
cppIntegration            Mandelbrot            reduction                simpleStreams
cppOverload               marchingCubes         scalarProd               simpleSurfaceWrite

  • Running the Binaries
    Following the directions listed here, execute deviceQuery. If the CUDA samples are built correctly, you obtain the same result as listed below.
beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Tesla K20c"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 4800 MBytes (5032706048 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Clock rate:                                706 MHz (0.71 GHz)
  Memory Clock rate:                             2600 Mhz
  Memory Bus Width:                              320-bit
  L2 Cache Size:                                 1310720 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > 

Device 1: "GeForce GT 730"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 1023 MBytes (1073020928 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Clock rate:                                954 MHz (0.95 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           2 / 0
  Compute Mode:
      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla K20c (GPU0) -> GeForce GT 730 (GPU1) : No
> Peer access from GeForce GT 730 (GPU1) -> Tesla K20c (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 2, Device0 = Tesla K20c, Device1 = GeForce GT 730
Result = PASS
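
The key facts in this output can be pulled out with standard tools. The sketch below is an illustration only and assumes the deviceQuery output format shown above. Both function names are hypothetical. The second function recomputes the total core count as multiprocessors times cores per multiprocessor.

```shell
# Hypothetical helper: print "<index> <name>" for every device listed in
# deviceQuery output read from stdin.
list_cuda_devices() {
    sed -n 's|^Device \([0-9][0-9]*\): "\(.*\)"$|\1 \2|p'
}

# Hypothetical helper: multiply the multiprocessor count by the cores per
# multiprocessor for every device in deviceQuery output read from stdin.
total_cuda_cores() {
    sed -n 's|^ *( *\([0-9][0-9]*\)) Multiprocessors, ( *\([0-9][0-9]*\)) CUDA Cores/MP:.*|\1 \2|p' |
    while read -r mp cores; do
        echo $((mp * cores))
    done
}

# Run against the real binary only when it exists in the current directory.
if [ -x ./deviceQuery ]; then
    ./deviceQuery | list_cuda_devices
    ./deviceQuery | total_cuda_cores
fi
```

For the output listed above, this gives 13 x 192 = 2496 cores for the Tesla K20c and 2 x 192 = 384 cores for the GeForce GT 730, which matches the totals that deviceQuery reports.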

Executing bandwidthTest produces the results shown below.

beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Tesla K20c
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6577.3

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6545.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     147234.3

Result = PASS
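
To compare runs, the three bandwidth figures can be extracted from this output. This is a minimal sketch that assumes the quick-mode output format shown above. The function name bandwidth_values is hypothetical.

```shell
# Hypothetical helper: print the bandwidth column (MB/s) of every result
# row in bandwidthTest output read from stdin. A result row has exactly
# two fields: an integer transfer size and a decimal bandwidth.
bandwidth_values() {
    awk 'NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+\.[0-9]+$/ { print $2 }'
}

# Run against the real binary only when it exists in the current directory.
if [ -x ./bandwidthTest ]; then
    ./bandwidthTest | bandwidth_values
fi
```

The values are printed in the order of the report: host to device, device to host, then device to device.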

Revision History

  • 2015/02/16 Initial upload of this article

Last-modified: 2015-02-16 (Mon) 10:15:28