Friday, October 20, 2023

News: Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM is now available.

Today, NVIDIA announces the public release of TensorRT-LLM to accelerate and optimize inference performance for the latest LLMs on NVIDIA GPUs. This open-source library is now available for free on the /NVIDIA/TensorRT-LLM GitHub repo and as part of the NVIDIA NeMo framework.
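As a quick illustration of what using the library looks like, here is a minimal sketch based on the high-level Python API documented in the TensorRT-LLM repo; the model name and sampling settings are illustrative assumptions, not part of the announcement, and the exact API surface may differ between releases.

```python
# Minimal sketch: text generation with TensorRT-LLM's high-level Python API.
# Assumes TensorRT-LLM is installed on a machine with a supported NVIDIA GPU;
# the model name and sampling values below are illustrative, not prescriptive.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds an optimized TensorRT engine for the model on first use.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    prompts = ["What is TensorRT-LLM?"]
    sampling = SamplingParams(temperature=0.8, max_tokens=64)

    # Run batched inference on the compiled engine and print the completions.
    for output in llm.generate(prompts, sampling):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```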