#Contents
*Overview [#be483fbd]
>
This article explains how to install Ubuntu 14.04 on a PC equipped with a Tesla K20c.~
CUDA 6.5 is then installed on the Ubuntu PC.~
*Hardware Specifications [#e9a8a6d0]
>
The hardware specifications are listed below.~
- CPU: Core i7 3770 (3.4GHz, 4-core/8-thread)
- Memory: 32GB (DDR3-12800 8GB x4)
- HDD: 1TB (SATA, 7200rpm)
- GPU: ETSK20-5GER (NVIDIA Tesla K20c, for CUDA)
- GPU: GF-GT730-LE1GHD/D5 (NVIDIA GeForce GT730, for Video)
>
In the article [[CUDA5/CentOS6.4]], ''Primary Display'' was set to ''On Board (Intel GPU)'' in UEFI because the ETSK20-5GER has no display outputs (D-SUB, DVI, or HDMI). However, when the NVIDIA GPU driver and the NVIDIA OpenGL library are installed together as parts of the NVIDIA toolkit, the GUI is not displayed correctly if a non-NVIDIA GPU drives the monitor.~
Because this machine is used as an Ubuntu Desktop with a GUI rather than as a headless Ubuntu Server, a GF-GT730-LE1GHD/D5 board, which carries an NVIDIA GeForce GT730, is added to the system and assigned to screen output.~
*Installing Ubuntu 14.04 [#c57313e2]
> Ubuntu 14.04 LTS Desktop (64-bit) is installed with the configuration below.
- Language: US
- Keyboard: English (or whatever you use)
- Partitions of HDD: Whole Region (Default setting)
- Network: DHCP
>
Select ''nomodeset'' as a boot option of the Ubuntu install DVD.~
Do not use ''Nouveau'', the open source NVIDIA GPU driver, when installing Ubuntu.~
* Post-Installation Configuration of Ubuntu 14.04 [#ae1c26d1]
>
After the installation of Ubuntu is completed, apply the configurations shown below.
** Update Ubuntu 14.04 [#i4612129]
>
Bring Ubuntu up to date by installing the newest packages and applications.
$ sudo apt-get update
$ sudo apt-get dist-upgrade
After the update is completed, make sure that the system restarts correctly with the updated kernel.
** Making Nouveau Ineffective [#xdb14ad4]
>
To make sure that only NVIDIA's GPU driver is installed and used, prevent the Nouveau driver from being loaded by the system. Create the file ''/etc/modprobe.d/blacklist-nouveau.conf'' with the contents shown below.
blacklist nouveau
options nouveau modeset=0
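As an illustration only, the blacklist file could also be written from a script. The sketch below is an assumption-laden example, not part of the original procedure: ''write_nouveau_blacklist'' is a hypothetical helper name, the option is written as ''modeset=0'' without spaces around the equals sign, and the demonstration writes into a scratch directory so that no root privilege is needed.

```shell
#!/bin/sh
# Sketch: write the Nouveau blacklist file into a modprobe.d-style directory.
# write_nouveau_blacklist is a hypothetical helper name (not an Ubuntu or CUDA tool).
write_nouveau_blacklist() {
    target_dir="${1:-/etc/modprobe.d}"   # defaults to the real system directory
    # The two lines below are the file contents given in this article.
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
        > "$target_dir/blacklist-nouveau.conf"
}

# Dry run against a scratch directory instead of /etc/modprobe.d.
scratch_dir="$(mktemp -d)"
write_nouveau_blacklist "$scratch_dir"
cat "$scratch_dir/blacklist-nouveau.conf"
```

To write the real file, the function would have to run with root privilege and without the scratch directory argument.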
>
Regenerate the ''kernel initramfs'' so that the new configuration takes effect.
$ sudo update-initramfs -u
>
Reboot the system and make sure that the Nouveau driver is NOT loaded. If the Nouveau driver is not loaded, LightDM and GNOME start up in low resolution.~
Also, use ''lsmod'' to check that ''nouveau'' is not among the loaded video drivers.
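The ''lsmod'' check can be sketched as a small filter over the module listing. This is a minimal example under stated assumptions: ''module_listed'' is a hypothetical helper name, and the listing below is made-up sample text (the sizes and usage counts are not from a real machine) standing in for real ''lsmod'' output.

```shell
#!/bin/sh
# Sketch: report whether a kernel module appears in lsmod-style output.
# module_listed is a hypothetical helper; it reads the listing from stdin.
module_listed() {
    # The first column of lsmod output is the module name; line 1 is the header.
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Illustrative listing with made-up sizes, standing in for real lsmod output.
sample_listing='Module                  Size  Used by
nvidia              10000000  40
drm                   300000  2 nvidia'

if printf '%s\n' "$sample_listing" | module_listed nouveau; then
    status="nouveau is loaded"
else
    status="nouveau is not loaded"
fi
echo "$status"
```

On a real system, the same function would be fed with ''lsmod | module_listed nouveau''.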
** Installing Packages [#m8a6056d]
>
Once Ubuntu 14.04 is installed, all packages required for CUDA are already in place. To make the system easier to operate, some additional packages are installed.
$ sudo apt-get install vim lv ssh nautilus-open-terminal build-essential
* Installing CUDA 6.5 [#f84ce7b3]
>
Since CUDA 6.5, installing CUDA on Ubuntu has become much easier than before. Because Ubuntu is now an officially supported distribution, a repository of deb packages is provided.~
~
- Installing with the Package Manager
To install with the package manager, please follow the directions listed on [[this page>http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#ubuntu-installation]].
~
- Downloading the packages from CUDA Download page
Please download the deb package (cuda-repo-ubuntu1404_6.5-14_amd64.deb) for Ubuntu 14.04.~
[[https://developer.nvidia.com/cuda-downloads]]~
Then, install the package with the command below.~
$ sudo dpkg -i cuda-repo-ubuntu1404_6.5-14_amd64.deb
Because this package only adds NVIDIA's repository to the apt sources list, update the package index and install CUDA with the commands below.
$ sudo apt-get update
$ sudo apt-get install cuda
This completes the installation of CUDA 6.5.~
After the system is restarted, the Ubuntu desktop GUI is displayed at a finer resolution than before because NVIDIA's GPU driver is now in effect.
* Post-Installation Configuration of CUDA 6.5 [#v2155fbf]
** Configuring the Environment [#p2259550]
>
Because CUDA 6.5 is installed under /usr/local/cuda-6.5/, set the environment variables so that the executables and libraries stored under this directory can be found.~
~
Add the lines shown below to the end of the .bashrc file.~
export PATH=/usr/local/cuda-6.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH
The changes to the environment variables take effect when Terminal is reopened.
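If the .bashrc edit is scripted, it helps to avoid adding the same lines twice. The following is a minimal sketch of that idea, not the method used in this article: ''append_once'' is a hypothetical helper name, and the demonstration writes to a scratch file rather than the real ~/.bashrc.

```shell
#!/bin/sh
# Sketch: append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name.
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string, -q is silent.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Dry run against a scratch file instead of the real ~/.bashrc.
# Single quotes keep $PATH and $LD_LIBRARY_PATH literal, as they must be in the file.
rc_file="$(mktemp)"
for attempt in 1 2; do
    append_once "$rc_file" 'export PATH=/usr/local/cuda-6.5/bin:$PATH'
    append_once "$rc_file" 'export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH'
done
cat "$rc_file"
```

Although the loop runs twice, the scratch file ends up with each export line only once.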
** Copying CUDA Samples [#udf970bf]
>
The CUDA samples are placed under /usr/local/cuda-6.5/samples/. Without root privilege, the user cannot write to the sample files in this directory. To solve this issue, copy the samples to the user's home directory, where the user has write permission.~
$ cuda-install-samples-6.5.sh ~
With the command above, the CUDA samples are copied to /home/{user}/NVIDIA_CUDA-6.5_Samples/.
** Build and Execute Samples [#xeef0cbc]
>
Using the commands below, move to the directory where the samples were copied and build them.~
$ cd ~/NVIDIA_CUDA-6.5_Samples
$ make
>
The samples are built under the NVIDIA_CUDA-6.5_Samples directory. The sample executables are placed in ''~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release/''.~
$ cd bin/x86_64/linux/release
beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ls
alignedTypes cudaDecodeGL matrixMul scan simpleTemplates
asyncAPI cudaOpenMP matrixMulCUBLAS segmentationTreeThrust simpleTexture
bandwidthTest cuHook matrixMulDrv shfl_scan simpleTexture3D
batchCUBLAS dct8x8 matrixMulDynlinkJIT simpleAssert simpleTextureDrv
bicubicTexture deviceQuery matrixMul_kernel64.ptx simpleAtomicIntrinsics simpleTexture_kernel64.ptx
bilateralFilter deviceQueryDrv MC_EstimatePiInlineP simpleCallback simpleVoteIntrinsics
bindlessTexture dwtHaar1D MC_EstimatePiInlineQ simpleCubemapTexture simpleZeroCopy
binomialOptions dxtc MC_EstimatePiP simpleCUBLAS smokeParticles
BlackScholes eigenvalues MC_EstimatePiQ simpleCUDA2GL SobelFilter
boxFilter fastWalshTransform MC_SingleAsianOptionP simpleCUFFT SobolQRNG
boxFilterNPP FDTD3d mergeSort simpleCUFFT_2d_MGPU sortingNetworks
cdpAdvancedQuicksort fluidsGL MersenneTwisterGP11213 simpleCUFFT_callback stereoDisparity
cdpBezierTessellation freeImageInteropNPP MonteCarloMultiGPU simpleCUFFT_MGPU StreamPriorities
cdpLUDecomposition FunctionPointers nbody simpleDevLibCUBLAS template
cdpQuadtree grabcutNPP newdelete simpleGL template_runtime
cdpSimplePrint histEqualizationNPP NV12ToARGB_drvapi64.ptx simpleHyperQ threadFenceReduction
cdpSimpleQuicksort histogram oceanFFT simpleIPC threadMigration
clock HSOpticalFlow p2pBandwidthLatencyTest simpleLayeredTexture threadMigration_kernel64.ptx
concurrentKernels imageDenoising particles simpleMultiCopy transpose
conjugateGradient imageSegmentationNPP postProcessGL simpleMultiGPU UnifiedMemoryStreams
conjugateGradientPrecond inlinePTX ptxjit simpleOccupancy vectorAdd
conjugateGradientUM interval quasirandomGenerator simpleP2P vectorAddDrv
convolutionFFT2D jpegNPP radixSortThrust simplePitchLinearTexture vectorAdd_kernel64.ptx
convolutionSeparable libcuhook.so.1 randomFog simplePrintf volumeFiltering
convolutionTexture lineOfSight recursiveGaussian simpleSeparateCompilation volumeRender
cppIntegration Mandelbrot reduction simpleStreams
cppOverload marchingCubes scalarProd simpleSurfaceWrite
>
- Running the Binaries
Following the directions listed [[here>http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#running-binaries]], execute deviceQuery. If the CUDA samples are built correctly, the output matches the result listed below.~
>
beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "Tesla K20c"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 4800 MBytes (5032706048 bytes)
(13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
GPU Clock rate: 706 MHz (0.71 GHz)
Memory Clock rate: 2600 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 1310720 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "GeForce GT 730"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 1023 MBytes (1073020928 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 954 MHz (0.95 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla K20c (GPU0) -> GeForce GT 730 (GPU1) : No
> Peer access from GeForce GT 730 (GPU1) -> Tesla K20c (GPU0) : No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 2, Device0 = Tesla K20c, Device1 = GeForce GT 730
Result = PASS
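When the check is repeated often, the two lines that matter in the output above (the device count and the final result) can be extracted automatically. The sketch below is only an example: ''summarize_device_query'' is a hypothetical helper name, and it is run here on an excerpt of the output shown above rather than on the real binary.

```shell
#!/bin/sh
# Sketch: summarise deviceQuery output read from stdin.
# summarize_device_query is a hypothetical helper name.
summarize_device_query() {
    awk '
        $1 == "Detected" && $3 == "CUDA" { count = $2 }    # "Detected 2 CUDA Capable device(s)"
        $1 == "Result"   && $2 == "="    { result = $3 }   # "Result = PASS"
        END {
            printf "devices=%s result=%s\n", count, result
            exit (result == "PASS" ? 0 : 1)                # non-zero status unless PASS
        }'
}

# Excerpt of the output shown above, used here instead of running the binary.
sample_output='Detected 2 CUDA Capable device(s)
Device 0: "Tesla K20c"
Device 1: "GeForce GT 730"
Result = PASS'

summary="$(printf '%s\n' "$sample_output" | summarize_device_query)"
echo "$summary"
```

On the machine itself, the function would be used as ''./deviceQuery | summarize_device_query'', and its exit status tells a calling script whether the result was PASS.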
>
Executing ''bandwidthTest'' gives the results shown below.
>
beat@tesla:~/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: Tesla K20c
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6577.3
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6545.8
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 147234.3
Result = PASS
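To compare runs, the three bandwidth figures can be pulled out of the output above into one line per transfer direction. This is a minimal sketch under assumptions: ''bandwidth_summary'' is a hypothetical helper name, and it relies on the quick-mode layout shown above, where each section heading is followed by a row holding a transfer size and a bandwidth value.

```shell
#!/bin/sh
# Sketch: extract the bandwidth figures from bandwidthTest output (stdin).
# bandwidth_summary is a hypothetical helper name.
bandwidth_summary() {
    awk '
        # Section heading, e.g. "Host to Device Bandwidth, 1 Device(s)"
        /Bandwidth, [0-9]+ Device/ { label = $1 " " $2 " " $3 }
        # Data row: transfer size in bytes, then bandwidth in MB/s
        NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ {
            printf "%s: %s MB/s\n", label, $2
        }'
}

# The bandwidth sections of the output shown above.
sample_output='Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6577.3
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6545.8
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 147234.3'

summary="$(printf '%s\n' "$sample_output" | bandwidth_summary)"
echo "$summary"
```

On the machine itself, the function would be used as ''./bandwidthTest | bandwidth_summary''.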
* Revision History [#e2d6f93d]
>
- 2015/02/16 Initial upload of this article