What is NVIDIA L4?
2026-04-10
De Novo Cloud Expert
NVIDIA L4 is a data-center GPU accelerator based on the Ada Lovelace architecture, optimized for energy-efficient AI inference, video processing, computer vision, and graphics virtualization in data centers and edge computing environments. Key specifications include 24 GB of GDDR6 memory with ECC, 300 GB/s of memory bandwidth, a 72 W TDP, and a low-profile, single-slot PCIe Gen4 x16 form factor with passive cooling, making it well suited to high-density server deployments.
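The memory figures above set a hard ceiling on batch-1 LLM decoding, which is typically memory-bandwidth bound: each generated token requires reading the full set of weights. A minimal sketch of that back-of-envelope calculation, using the L4's stated 300 GB/s and 24 GB (the 7 GB model size is a hypothetical example, roughly a 7B-parameter model quantized to 8-bit):

```python
# Bandwidth-bound ceiling on batch-1 decode speed for an L4-class card,
# assuming every weight byte is read once per generated token.

MEM_BANDWIDTH_GB_S = 300   # L4 memory bandwidth (GB/s), from the spec above
VRAM_GB = 24               # L4 memory capacity (GB)

def max_decode_tokens_per_s(model_size_gb: float,
                            bandwidth_gb_s: float = MEM_BANDWIDTH_GB_S) -> float:
    """Upper bound: tokens/s <= bandwidth / bytes of weights read per token."""
    if model_size_gb > VRAM_GB:
        raise ValueError("model does not fit in VRAM")
    return bandwidth_gb_s / model_size_gb

# Hypothetical ~7 GB of quantized weights:
print(round(max_decode_tokens_per_s(7.0), 1))  # ~42.9 tokens/s ceiling
```

Real throughput lands below this ceiling (KV-cache reads, activations, and kernel overheads all consume bandwidth), but the estimate shows why quantization to FP8/INT8 pays off twice on this card: it fits larger models into 24 GB and raises the tokens-per-second ceiling.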
The fourth-generation Tensor Cores in the NVIDIA L4 deliver up to 485 TFLOPS in FP8 (and 485 TOPS in INT8) with sparsity, and 242 TFLOPS in FP16 / BF16, providing roughly a 2.5–4x inference performance improvement over the previous-generation T4. The NVIDIA L4 is used for LLM inference, generative AI, streaming video analytics, real-time transcoding, and vGPU workloads. It supports CUDA, TensorRT, CV-CUDA, and other NVIDIA software stacks, delivering high performance per watt and scalability for cloud and edge services handling thousands of concurrent requests.
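The performance-per-watt claim can be sanity-checked with simple arithmetic. A sketch comparing L4 against its predecessor T4, under the assumption that the T4's public spec-sheet figures are 130 INT8 TOPS (dense) at a 70 W TDP; note the L4's 485 TOPS figure includes structured sparsity, so the ratio is optimistic for the L4:

```python
# Back-of-envelope perf-per-watt comparison: L4 vs. T4 (predecessor).
# T4 figures are assumptions from public spec sheets, not from this article.

L4 = {"int8_tops": 485, "tdp_w": 72}   # with sparsity
T4 = {"int8_tops": 130, "tdp_w": 70}   # dense (assumed)

def tops_per_watt(card: dict) -> float:
    """Peak INT8 throughput divided by TDP."""
    return card["int8_tops"] / card["tdp_w"]

gain = tops_per_watt(L4) / tops_per_watt(T4)
print(f"L4: {tops_per_watt(L4):.2f} TOPS/W, "
      f"T4: {tops_per_watt(T4):.2f} TOPS/W, "
      f"gain ~{gain:.1f}x")
```

Under these assumptions the efficiency gain lands around 3.6x, consistent with the 2.5–4x range quoted above; real workloads will vary with precision, batch size, and how much of the sparse peak they can actually exploit.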