#Contents

*Overview [#be483fbd]
>
This article explains how to install Ubuntu 14.04 on a PC equipped with a Tesla K20c.~
CUDA 6.5 is then installed on that Ubuntu PC.~

*Hardware Specifications [#e9a8a6d0]
>
The hardware specifications are listed below.~
- CPU: Core i7 3770 (3.4GHz, 4-core/8-thread)
- Memory: 32GB (DDR3-12800 8GB x4)
- HDD: 1TB (SATA, 7200rpm)
- GPU: ETSK20-5GER (NVIDIA Tesla K20c, for CUDA)
- GPU: GF-GT730-LE1GHD/D5 (NVIDIA GeForce GT730, for Video)

>
In the article [[CUDA5/CentOS6.4]], ''Primary Display'' was set to ''On Board (Intel GPU)'' in the UEFI settings because ETSK20-5GER has no display outputs (D-SUB, DVI, or HDMI). However, when the NVIDIA GPU driver and the NVIDIA OpenGL library are installed together as parts of the NVIDIA toolkit, the GUI is not displayed correctly if a non-NVIDIA GPU drives the monitor.~
In this article, the machine is used not as a headless Ubuntu Server but as an Ubuntu Desktop with a GUI. Therefore GF-GT730-LE1GHD/D5, a board with an NVIDIA GeForce GT730, is added to the system and assigned to the screen output.~

*Installing Ubuntu 14.04 [#c57313e2]
> The 64-bit version of Ubuntu 14.04 LTS Desktop is installed with the configuration below.
- Language: US
- Keyboard: English (or whatever you use)
- Partitions of HDD: Whole Region (Default setting)
- Network: DHCP

>
Select ''nomodeset'' in the boot options of the Ubuntu installation DVD.~
Do not use ''Nouveau'', the open source NVIDIA GPU driver, while installing Ubuntu.~

* Post-Installation Configuration of Ubuntu 14.04 [#ae1c26d1]
>
After the installation of Ubuntu is completed, apply the configuration shown below.

** Update Ubuntu 14.04 [#i4612129]
>
Bring Ubuntu up to date by installing the newest packages and applications.
 $ sudo apt-get update
 $ sudo apt-get dist-upgrade
After the update is completed, restart the system and make sure that it boots with the updated kernel.
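
One way to confirm that the machine is running the updated kernel is to compare the output of ''uname -r'' with the newest kernel image in /boot. The following is only a minimal sketch: the function names are hypothetical, and it assumes that kernel images are named vmlinuz-<version> and that GNU ''sort -V'' is available.

```shell
#!/bin/sh
# Hypothetical helper: print the highest kernel version that has a vmlinuz
# image in the given directory (default /boot). Relies on GNU sort -V.
newest_installed_kernel() {
    boot_dir="${1:-/boot}"
    for image in "$boot_dir"/vmlinuz-*; do
        [ -e "$image" ] || continue          # skip the unexpanded pattern
        basename "$image" | sed 's/^vmlinuz-//'
    done | sort -V | tail -n 1
}

# Hypothetical helper: succeed if the running kernel is the newest installed one.
kernel_is_current() {
    [ "$(uname -r)" = "$(newest_installed_kernel "$@")" ]
}
```

After rebooting, running kernel_is_current with no arguments and checking its exit status is enough. A non-zero status suggests that the system is still booting an older kernel.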

** Making Nouveau Ineffective [#xdb14ad4]
>
To install and use only NVIDIA's GPU driver, prevent the Nouveau driver from being loaded by the system. Create the file ''/etc/modprobe.d/blacklist-nouveau.conf'' with the contents shown below.
 blacklist nouveau
 options nouveau modeset=0
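
The file can also be created from the command line instead of an editor. The following is a minimal sketch, not part of the original procedure: the function name write_nouveau_blacklist is hypothetical, and the target path is passed as an argument so that it can be tried on a scratch file first. The option is written as modeset=0 with no spaces around the equals sign, which is the form modprobe expects.

```shell
#!/bin/sh
# Hypothetical helper: write the two blacklist lines to the file given as $1,
# replacing any previous contents of that file.
write_nouveau_blacklist() {
    target="$1"
    printf '%s\n' \
        'blacklist nouveau' \
        'options nouveau modeset=0' > "$target"
}
```

Called as root with /etc/modprobe.d/blacklist-nouveau.conf as the argument, it produces the file described above.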

>
Regenerate the kernel initramfs so that the new configuration takes effect.
 $ sudo update-initramfs -u
>
Reboot the system and make sure that the Nouveau driver is NOT loaded. If the Nouveau driver is not loaded, LightDM and GNOME start up in a low resolution.~
Also use ''lsmod'' to check that ''nouveau'' does not appear among the loaded modules.
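
The ''lsmod'' check can be scripted. The sketch below reads /proc/modules, which is the listing that lsmod formats, and looks for a module whose name is exactly nouveau. The function names are hypothetical, and the file argument exists only so that the check can be tried against a saved listing.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a module named "nouveau" appears
# in a /proc/modules-style listing. The listing defaults to /proc/modules.
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    # The first field of each line is the module name.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }' "$modules_file"
}

# Hypothetical helper: print a one-line report of the check above.
report_nouveau() {
    if nouveau_loaded "$@"; then
        echo "nouveau is loaded"
    else
        echo "nouveau is not loaded"
    fi
}
```

Running report_nouveau with no arguments after the reboot should print "nouveau is not loaded" if the blacklist is in effect.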

** Installing Packages [#m8a6056d]
>
When the installation of Ubuntu 14.04 is completed, all packages required for CUDA are already installed. To make the system easier to operate, the additional packages below are installed.
 $ sudo apt-get install vim lv ssh nautilus-open-terminal build-essential

* Installing CUDA 6.5 [#f84ce7b3]
>
Since CUDA 6.5, installing CUDA on Ubuntu has become much easier than before. Because Ubuntu has officially become a supported distribution, a repository of deb packages is provided.~
~
- Installing with the Package Manager
Follow the package manager installation directions listed on [[this page>http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#ubuntu-installation]].
~
- Downloading the package from the CUDA Download page
Download the deb package (cuda-repo-ubuntu1404_6.5-14_amd64.deb) for Ubuntu 14.04.~
[[https://developer.nvidia.com/cuda-downloads]]~
Then install the package with the command below.~
 $ sudo dpkg -i cuda-repo-ubuntu1404_6.5-14_amd64.deb
This package only adds NVIDIA's repository to the apt sources list, so refresh the package index and install CUDA with the commands below.
 $ sudo apt-get update
 $ sudo apt-get install cuda
This completes the installation of CUDA 6.5.~
After the system is restarted, the GUI of the Ubuntu desktop is displayed at a finer resolution than before because NVIDIA's GPU driver is now in effect.
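
A quick sanity check after the installation is to confirm that the toolkit directory contains the compiler and the library directory. The sketch below is an illustration only: the function name cuda_tree_ok is hypothetical, and it checks nothing beyond the presence of an executable bin/nvcc and a lib64 directory under the given root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the CUDA toolkit tree looks complete.
# It only checks for an executable bin/nvcc and a lib64 directory.
cuda_tree_ok() {
    cuda_root="${1:-/usr/local/cuda-6.5}"
    [ -x "$cuda_root/bin/nvcc" ] && [ -d "$cuda_root/lib64" ]
}
```

For example, running cuda_tree_ok && echo "toolkit found" prints the message only when both items exist under /usr/local/cuda-6.5.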

* Post-Installation Configuration of CUDA 6.5 [#v2155fbf]
** Configuring the Environment [#p2259550]
>
CUDA 6.5 is installed under /usr/local/cuda-6.5/, so adjust the environment variables so that the executables and libraries stored under this directory can be found.~
~
Add the lines shown below to the end of the .bashrc file.~
 export PATH=/usr/local/cuda-6.5/bin:$PATH
 export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH
The change in the environment variables takes effect when Terminal is reopened.
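
If the setup is scripted, appending these lines blindly would duplicate them each time the script runs. The following is a minimal sketch of an idempotent append. The function name add_cuda_env and its arguments are hypothetical, and the CUDA root defaults to the directory used in this article.

```shell
#!/bin/sh
# Hypothetical helper: append the two CUDA export lines to a bashrc-style
# file only if they are not already present, so repeated runs add nothing.
add_cuda_env() {
    rc_file="$1"
    cuda_root="${2:-/usr/local/cuda-6.5}"
    touch "$rc_file"
    for line in \
        "export PATH=$cuda_root/bin:\$PATH" \
        "export LD_LIBRARY_PATH=$cuda_root/lib64:\$LD_LIBRARY_PATH"
    do
        # -x: match the whole line, -F: treat the pattern as a fixed string.
        grep -qxF -- "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
    done
}
```

Calling add_cuda_env ~/.bashrc writes the same two export lines as the manual edit, and calling it again leaves the file unchanged.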

** Copying CUDA Samples [#udf970bf]
>
The CUDA samples are placed under /usr/local/cuda-6.5/samples/. A user cannot write to the sample files in this directory without root privileges. To solve this issue, the samples are copied to the user's home directory, where the user has write permission.~
 $ cuda-install-samples-6.5.sh ~
This command copies the CUDA samples to /home/{user}/NVIDIA_CUDA-6.5_Samples/.

** Build and Execute Samples [#xeef0cbc]
>
Move to the directory where the samples were copied and build them with the commands below.~
 $ cd ~/NVIDIA_CUDA-6.5_Samples
 $ make

>
The samples are built under the NVIDIA_CUDA-6.5_Samples directory. The executables are placed in ''~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release/''.~
 $ cd bin/x86_64/linux/release
 beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ls
 alignedTypes              cudaDecodeGL          matrixMul                scan                       simpleTemplates
 asyncAPI                  cudaOpenMP            matrixMulCUBLAS          segmentationTreeThrust     simpleTexture
 bandwidthTest             cuHook                matrixMulDrv             shfl_scan                  simpleTexture3D
 batchCUBLAS               dct8x8                matrixMulDynlinkJIT      simpleAssert               simpleTextureDrv
 bicubicTexture            deviceQuery           matrixMul_kernel64.ptx   simpleAtomicIntrinsics     simpleTexture_kernel64.ptx
 bilateralFilter           deviceQueryDrv        MC_EstimatePiInlineP     simpleCallback             simpleVoteIntrinsics
 bindlessTexture           dwtHaar1D             MC_EstimatePiInlineQ     simpleCubemapTexture       simpleZeroCopy
 binomialOptions           dxtc                  MC_EstimatePiP           simpleCUBLAS               smokeParticles
 BlackScholes              eigenvalues           MC_EstimatePiQ           simpleCUDA2GL              SobelFilter
 boxFilter                 fastWalshTransform    MC_SingleAsianOptionP    simpleCUFFT                SobolQRNG
 boxFilterNPP              FDTD3d                mergeSort                simpleCUFFT_2d_MGPU        sortingNetworks
 cdpAdvancedQuicksort      fluidsGL              MersenneTwisterGP11213   simpleCUFFT_callback       stereoDisparity
 cdpBezierTessellation     freeImageInteropNPP   MonteCarloMultiGPU       simpleCUFFT_MGPU           StreamPriorities
 cdpLUDecomposition        FunctionPointers      nbody                    simpleDevLibCUBLAS         template
 cdpQuadtree               grabcutNPP            newdelete                simpleGL                   template_runtime
 cdpSimplePrint            histEqualizationNPP   NV12ToARGB_drvapi64.ptx  simpleHyperQ               threadFenceReduction
 cdpSimpleQuicksort        histogram             oceanFFT                 simpleIPC                  threadMigration
 clock                     HSOpticalFlow         p2pBandwidthLatencyTest  simpleLayeredTexture       threadMigration_kernel64.ptx
 concurrentKernels         imageDenoising        particles                simpleMultiCopy            transpose
 conjugateGradient         imageSegmentationNPP  postProcessGL            simpleMultiGPU             UnifiedMemoryStreams
 conjugateGradientPrecond  inlinePTX             ptxjit                   simpleOccupancy            vectorAdd
 conjugateGradientUM       interval              quasirandomGenerator     simpleP2P                  vectorAddDrv
 convolutionFFT2D          jpegNPP               radixSortThrust          simplePitchLinearTexture   vectorAdd_kernel64.ptx
 convolutionSeparable      libcuhook.so.1        randomFog                simplePrintf               volumeFiltering
 convolutionTexture        lineOfSight           recursiveGaussian        simpleSeparateCompilation  volumeRender
 cppIntegration            Mandelbrot            reduction                simpleStreams
 cppOverload               marchingCubes         scalarProd               simpleSurfaceWrite

>
- Running the Binaries
Following the directions listed [[here>http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#running-binaries]], execute deviceQuery. If the CUDA samples were built correctly, the output resembles the result listed below.~

>
 beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ./deviceQuery
 ./deviceQuery Starting...

 
  CUDA Device Query (Runtime API) version (CUDART static linking)
 
 Detected 2 CUDA Capable device(s)
 
 Device 0: "Tesla K20c"
   CUDA Driver Version / Runtime Version          6.5 / 6.5
   CUDA Capability Major/Minor version number:    3.5
   Total amount of global memory:                 4800 MBytes (5032706048 bytes)
   (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
   GPU Clock rate:                                706 MHz (0.71 GHz)
   Memory Clock rate:                             2600 Mhz
   Memory Bus Width:                              320-bit
   L2 Cache Size:                                 1310720 bytes
   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
   Total amount of constant memory:               65536 bytes
   Total amount of shared memory per block:       49152 bytes
   Total number of registers available per block: 65536
   Warp size:                                     32
   Maximum number of threads per multiprocessor:  2048
   Maximum number of threads per block:           1024
   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
   Maximum memory pitch:                          2147483647 bytes
   Texture alignment:                             512 bytes
   Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
   Run time limit on kernels:                     No
   Integrated GPU sharing Host Memory:            No
   Support host page-locked memory mapping:       Yes
   Alignment requirement for Surfaces:            Yes
   Device has ECC support:                        Enabled
   Device supports Unified Addressing (UVA):      Yes
   Device PCI Bus ID / PCI location ID:           1 / 0
   Compute Mode:
      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > 
 
 Device 1: "GeForce GT 730"
   CUDA Driver Version / Runtime Version          6.5 / 6.5
   CUDA Capability Major/Minor version number:    3.5
   Total amount of global memory:                 1023 MBytes (1073020928 bytes)
   ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
   GPU Clock rate:                                954 MHz (0.95 GHz)
   Memory Clock rate:                             2505 Mhz
   Memory Bus Width:                              64-bit
   L2 Cache Size:                                 524288 bytes
   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
   Total amount of constant memory:               65536 bytes
   Total amount of shared memory per block:       49152 bytes
   Total number of registers available per block: 65536
   Warp size:                                     32
   Maximum number of threads per multiprocessor:  2048
   Maximum number of threads per block:           1024
   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
   Maximum memory pitch:                          2147483647 bytes
   Texture alignment:                             512 bytes
   Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
   Run time limit on kernels:                     Yes
   Integrated GPU sharing Host Memory:            No
   Support host page-locked memory mapping:       Yes
   Alignment requirement for Surfaces:            Yes
   Device has ECC support:                        Disabled
   Device supports Unified Addressing (UVA):      Yes
   Device PCI Bus ID / PCI location ID:           2 / 0
   Compute Mode:
       < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
 > Peer access from Tesla K20c (GPU0) -> GeForce GT 730 (GPU1) : No
 > Peer access from GeForce GT 730 (GPU1) -> Tesla K20c (GPU0) : No
 
 deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 2, Device0 = Tesla K20c, Device1 = GeForce GT 730
 Result = PASS
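
The long deviceQuery output can be reduced to the few lines that matter for verification. The sketch below is an illustration that assumes the output format shown above. The function name summarize_device_query is hypothetical. It prints the detected device count and each device name, and its exit status is zero only when a "Result = PASS" line is present.

```shell
#!/bin/sh
# Hypothetical helper: read deviceQuery output on stdin, print the number of
# detected devices and each device name, and fail unless "Result = PASS"
# appears in the input.
summarize_device_query() {
    awk '
        /^ *Detected [0-9]+ CUDA Capable device/ { print "devices: " $2 }
        /^ *Device [0-9]+: "/ {
            name = $0
            sub(/^[^"]*"/, "", name)   # drop everything up to the opening quote
            sub(/".*$/, "", name)      # drop the closing quote and the rest
            print "name: " name
        }
        /^ *Result = PASS/ { pass = 1 }
        END { exit !pass }
    '
}
```

It is meant to be used as ./deviceQuery | summarize_device_query. For the run above it would report two devices, Tesla K20c and GeForce GT 730.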

>
Executing ''bandwidthTest'' gives the results shown below.

>
 beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ./bandwidthTest
 [CUDA Bandwidth Test] - Starting...
 Running on...
 
  Device 0: Tesla K20c
  Quick Mode
 
  Host to Device Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s)
    33554432                     6577.3
 
  Device to Host Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s)
    33554432                     6545.8
 
  Device to Device Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s)
    33554432                     147234.3
 
 Result = PASS
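
The three bandwidth figures are easier to compare when expressed in GB/s. The following sketch assumes the output format shown above and converts by dividing the MB/s value by 1000, which is an assumption about the unit convention. The function name summarize_bandwidth is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read bandwidthTest output on stdin and print each
# transfer direction with its bandwidth converted from MB/s to GB/s
# (dividing by 1000, an assumed unit convention).
summarize_bandwidth() {
    awk '
        / Bandwidth, [0-9]+ Device/ {
            # "Host to Device Bandwidth, 1 Device(s)" becomes "Host to Device"
            label = $0
            sub(/^ */, "", label)
            sub(/ Bandwidth,.*$/, "", label)
        }
        # Data rows have exactly two numeric fields: size in bytes and MB/s.
        NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ {
            printf "%s: %.2f GB/s\n", label, $2 / 1000
        }
    '
}
```

Used as ./bandwidthTest | summarize_bandwidth, it prints one line per transfer direction. For the run above, the host-to-device figure of 6577.3 MB/s becomes 6.58 GB/s.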

* Revision History [#e2d6f93d]
>
- 2015/02/16 This article was initially uploaded
