CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
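As a minimal sketch of that programming model (not drawn from any of the repositories listed below; the saxpy kernel name, array size, and launch configuration are illustrative assumptions), a CUDA program defines a kernel that runs across many GPU threads and launches it over a grid of thread blocks:

// Minimal CUDA example: one GPU thread updates one vector element.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];              // one element per thread
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));       // unified memory, visible to CPU and GPU
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y); // grid of 256-thread blocks
    cudaDeviceSynchronize();                        // wait for the kernel to finish

    printf("y[0] = %f\n", y[0]);                    // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}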
A high-throughput and memory-efficient inference and serving engine for LLMs
Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
CEED Library: Code for Efficient Extensible Discretizations
CUDA C++ Core Libraries
HPC solver for nonlinear optimization problems
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane
Containers for machine learning
DaCe - Data Centric Parallel Programming
A model-independent chemistry module for atmosphere models
ALIEN is a CUDA-powered artificial life simulation program.
A retargetable MLIR-based machine learning compiler and runtime toolkit.
CUDA-based path-tracing renderer for offline and real-time rendering
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
OneDiff: An out-of-the-box acceleration library for diffusion models.
Created by NVIDIA. Released June 23, 2007.