flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
LLM training in simple, raw C/CUDA
CUDA Library Samples
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
WholeGraph - large scale Graph Neural Networks
cuVS - a library for vector search and clustering on the GPU
CUDA accelerated rasterization of gaussian splatting
NCCL Tests
Tile primitives for speedy kernels
RCCL Performance Benchmark Tests
Causal depthwise conv1d in CUDA, with a PyTorch interface
A massively parallel, optimal functional runtime in Rust