Linux Follies: nvidia

2023-05-29

Linus Tech Tips takes a look at the Nvidia Grace CPU and the Hopper GPU

Nvidia announced a new ARM-based CPU, Grace, some time ago. Here, Linus Tech Tips takes a look at it at COMPUTEX Taipei 2023. The design is similar to Apple silicon in that the memory is packaged alongside the CPU rather than sitting on separate DIMMs. Unlike Apple, Nvidia splits out the GPU, which is connected via NVLink.

2022-01-12

Nvidia acquires Bright Computing

Nvidia solidifies its reach into HPC by acquiring Bright Computing. More coverage at HPCwire. This follows its acquisition of Mellanox, which was completed in April 2020.

2021-08-24

US DoE (Argonne) to acquire AMD+Nvidia supercomputer as testbed for delayed Intel-based exascale supercomputer

From Reuters: The Nvidia and AMD machine, to be called Polaris, will not be a replacement for the Intel-based Aurora machine slated for the Argonne National Lab near Chicago, which was poised to be the nation's fastest computer when announced in 2019.

Instead, Polaris, which will come online this year, will be a test machine for Argonne to start readying its software for the Intel machine, the people familiar with the matter said.

2014-07-01

Using the NVIDIA Python plugin for Ganglia monitoring under Bright Cluster Manager

The GitHub repo for Ganglia gmond Python plugins contains a plugin for monitoring NVIDIA GPUs. The plugin presumes that the NVIDIA Deployment Kit, which contains NVML (the NVIDIA Management Library), is installed by the normal means into the usual places. If you are using Bright Cluster Manager, you would have used Bright's cuda60/tdk to do the installation, which means the libnvidia-ml.so library is not in one of the standard library directories. To fix this, modify the /etc/init.d/gmond init script. Near the top, set LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=/cm/local/apps/cuda/libs/current/lib64
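Note that the line above replaces whatever LD_LIBRARY_PATH already held. If that matters on your system, something like the following sketch prepends the directory instead, and only if it is not already listed. The function name prepend_ld_path is my own invention for illustration; the directory is the one from this post.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH unless it is
# already listed, so existing entries are preserved.
prepend_ld_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Directory taken from this post; adjust if your Bright install differs.
prepend_ld_path /cm/local/apps/cuda/libs/current/lib64
```

Calling it twice with the same directory leaves the variable unchanged the second time.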

The modifications to Ganglia Web, however, are out of date. I will make another post once I figure out how to modify Ganglia Web to display the NVIDIA metrics.

UPDATE: Well, turns out there seems to be no need to modify the Ganglia Web installation. Under the host view, there is a tab for "gpu metrics" which shows 22 available metrics.

2012-05-21

NVIDIA Nsight Eclipse Edition

One of the new products announced along with CUDA 5 at the recent GPU Technology Conference was NVIDIA Nsight Eclipse Edition, which runs on Linux and Mac OS X. Previously, the only IDE available was Nsight Visual Studio Edition, which ran only on Windows.

I attended the demo talk for Nsight Eclipse, and it seemed a well thought out product. It gives access to all running threads on all cores, optimization suggestions, a debugging interface, and so on, plus the usual Eclipse features like refactoring, build, and version control.

Nsight Eclipse Edition is distributed as a pre-built binary, i.e. you can't just point an existing Eclipse at a new software source. You also have to be in the registered developer program to get access to the download.

Once you install the CUDA Toolkit, say in CUDAHOME=/usr/local/cuda, the nsight executable is in ${CUDAHOME}/libnsight.
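As a small convenience, here is a sketch of a shell function that looks for the launcher under a given toolkit root. The function name nsight_path is hypothetical, and it assumes the launcher is a file called nsight directly inside libnsight, as described above.

```shell
# Hypothetical helper: print the path of the nsight launcher under a CUDA
# Toolkit root, or return non-zero if it is not there.
nsight_path() {
    cuda_home="${1:-/usr/local/cuda}"
    candidate="${cuda_home}/libnsight/nsight"
    if [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        echo "nsight not found under ${cuda_home}/libnsight" >&2
        return 1
    fi
}

# Example use (commented out so nothing is launched by accident):
#   "$(nsight_path "$CUDAHOME")" &
```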

2012-04-02

Using NVIDIA drivers in Fedora 16

UPDATE 2 (2012-08-23): The Nvidia installer now makes use of DKMS, which causes any kernel updates to rebuild the Nvidia kernel module. So, no need to go through this rigamarole at every kernel update.
UPDATE: There was a typo in my lspci command line. Should have been VGA and not CGA.

Installing the latest NVIDIA drivers under Fedora (or really, any distribution) is a little roundabout. Here is how I do it, which is a mix and match of several howtos on the net. Part of the reason the process is a little complicated is the use of the open source Nouveau drivers: these have to be removed before NVIDIA's drivers can be installed.

The canonical reference for all things Linux+NVIDIA is if-not-true-then-false.com. Their write-up on NVIDIA and Fedora 16 gives directions to use the RPMFusion repositories, which provide non-free software (including the NVIDIA drivers).

We will follow their instructions for removing Nouveau, but install NVIDIA drivers downloaded from NVIDIA themselves.

$ sudo -i
# yum install gcc kernel-devel
# yum update kernel* selinux-policy
# reboot

To remove nouveau, build a new initramfs image:

# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
# dracut /boot/initramfs-$(uname -r).img $(uname -r)
# reboot

Next, edit the file /etc/default/grub. To the line that defines GRUB_CMDLINE_LINUX, append the following:

rdblacklist=nouveau nouveau.modeset=0

Mine looks like:

GRUB_CMDLINE_LINUX="rd.md=0 rd.lvm.lv=vg_johnny/lv_swap rd.dm=0 KEYTABLE=us quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 LANG=en_US.UTF-8 rdblacklist=nouveau nouveau.modeset=0"
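If you would rather not edit the file by hand, the edit can be sketched with sed. This is only an illustration under a few assumptions: the function name add_nouveau_blacklist is made up, GNU sed is available (for -i), and GRUB_CMDLINE_LINUX is a single double-quoted line as in my example above. It keeps a .bak copy and does nothing if the parameters are already there.

```shell
# Hypothetical helper: append the Nouveau-disabling parameters to the
# GRUB_CMDLINE_LINUX line of a grub defaults file, unless already present.
add_nouveau_blacklist() {
    file="$1"
    params="rdblacklist=nouveau nouveau.modeset=0"
    if grep -q '^GRUB_CMDLINE_LINUX=.*rdblacklist=nouveau' "$file"; then
        return 0                      # already configured
    fi
    cp "$file" "${file}.bak"          # keep a backup of the original
    # Insert the parameters just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 ${params}\"/" "$file"
}

# Example use, as root (commented out so nothing is changed by accident):
#   add_nouveau_blacklist /etc/default/grub
```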

Then, make the grub2 config:

# grub2-mkconfig -o /boot/grub2/grub.cfg

Next, find the model number of your GPU card, and find the appropriate driver from NVIDIA:

$ lspci | grep VGA

Mine shows:

01:00.0 VGA compatible controller: nVidia Corporation G92 [GeForce 9800 GT] (rev a2)
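If you only want the model name to search for on NVIDIA's site, the bracketed part can be pulled out with sed. This is a rough sketch: gpu_model is a made-up name, and it assumes plain lspci output like mine above, where the model is the last thing in square brackets on the VGA line.

```shell
# Hypothetical helper: read lspci output on stdin and print the text inside
# the last pair of square brackets on each VGA controller line.
gpu_model() {
    grep 'VGA' | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example use (commented out; needs lspci from pciutils):
#   lspci | gpu_model
```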

Then, go to NVIDIA's Linux driver page and pick the appropriate version. For me, it was Linux x86_64/AMD64/EM64T, version 295.33; the driver installer is a file named NVIDIA-Linux-x86_64-295.33.run.

The installer will build a kernel module, but to do so, you must be in runlevel 3 (i.e. no GUI, but with networking):

# telinit 3

You will drop down to the console prompt. Log in as root, and then do:

# sh NVIDIA-Linux-x86_64-295.33.run

and answer the prompts along the way. You should be able to just do "telinit 5" to get back the GUI login, but I usually just reboot.

~~Now, whenever the kernel is updated, you will have to rebuild the kernel module by repeating the last step.~~

At the final step of the Nvidia installation, you will be asked whether you want to enable DKMS, which rebuilds the Nvidia kernel module automatically on kernel updates. Say "yes".
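Once you are back in the GUI, a quick sanity check is to see which kernel module is actually loaded. The sketch below is my own addition, not part of the installer: driver_state is a hypothetical name, and it simply scans lsmod output for modules named exactly nvidia or nouveau. It reports "nouveau" if Nouveau is loaded at all.

```shell
# Hypothetical helper: read lsmod output on stdin and report which graphics
# kernel module is loaded: "nvidia", "nouveau", or "none".
driver_state() {
    awk '$1 == "nvidia"  { nv = 1 }
         $1 == "nouveau" { no = 1 }
         END {
             if (no)      print "nouveau"
             else if (nv) print "nvidia"
             else         print "none"
         }'
}

# Example use (commented out):
#   lsmod | driver_state
```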