CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
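As a minimal sketch of that programming model (not drawn from any of the repositories listed below; the saxpy kernel name, array size, and launch configuration are illustrative assumptions), a CUDA program defines a kernel that runs across many GPU threads and launches it over a grid of thread blocks:

// Minimal CUDA example: one GPU thread updates one vector element.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];              // one element per thread
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));       // unified memory, visible to CPU and GPU
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y); // grid of 256-thread blocks
    cudaDeviceSynchronize();                        // wait for the kernel to finish

    printf("y[0] = %f\n", y[0]);                    // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}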
A high-throughput and memory-efficient inference and serving engine for LLMs
Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
CEED Library: Code for Efficient Extensible Discretizations
CUDA C++ Core Libraries
HPC solver for nonlinear optimization problems
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane
Containers for machine learning
DaCe - Data Centric Parallel Programming
A model-independent chemistry module for atmosphere models
ALIEN is a CUDA-powered artificial life simulation program.
A retargetable MLIR-based machine learning compiler and runtime toolkit.
CUDA-based path-tracing renderer for offline and real-time rendering
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
OneDiff: An out-of-the-box acceleration library for diffusion models.
Created by NVIDIA. Released June 23, 2007.