Unified memory vs. discrete VRAM
Jetson modules use a unified memory pool shared between the CPU, GPU, and OS — there is no dedicated VRAM. On an 8 GB Orin Nano, the OS and runtime consume roughly 1.5–2 GB before inference begins, so memory planning for edge AI deployments must account for this overhead. Discrete GPU cards (e.g. a desktop RTX) keep model data in separate VRAM; on Jetson, every byte the GPU touches comes out of the same pool the CPU and OS share.
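As a rough planning aid, the budget arithmetic above can be sketched as a simple fit check. The 8 GB capacity and ~2 GB overhead come from the text; the example model and activation sizes are hypothetical.

```python
def fits_in_memory(weights_gb: float, activations_gb: float,
                   total_gb: float = 8.0, overhead_gb: float = 2.0) -> bool:
    """Check whether a model fits on a unified-memory module.

    On Jetson, weights, activations, and CPU-side buffers all draw from
    the same pool, so the OS/runtime overhead is subtracted up front.
    total_gb=8.0 matches an 8 GB Orin Nano; overhead_gb=2.0 is the worst
    case of the ~1.5-2 GB figure quoted above.
    """
    return weights_gb + activations_gb <= total_gb - overhead_gb

# Hypothetical examples: a 3 GB model with 1 GB of activations fits in
# the remaining ~6 GB; a 5 GB model with 2 GB of activations does not.
print(fits_in_memory(3.0, 1.0))  # True
print(fits_in_memory(5.0, 2.0))  # False
```

On a discrete GPU the same check would apply only to VRAM; on unified memory it bounds the entire system.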
Quantization and activation memory
INT8 quantization cuts weight storage 4× versus FP32, but activation memory — which scales with input resolution and batch size — is typically kept in FP16 even in INT8 networks. Memory sizing for INT8 models at 640×640 or higher resolution therefore still requires careful accounting of activation buffers, not just weight size.
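A minimal sketch of that accounting, assuming 1 byte per INT8 weight, 4 bytes per FP32 weight, and 2 bytes per FP16 activation element. The 25 M-parameter model and the tensor shapes are hypothetical examples, not measurements of any specific network.

```python
def weight_mem_mb(n_params: int, bytes_per_weight: int = 1) -> float:
    """Weight storage in MiB; 1 byte/param for INT8, 4 for FP32."""
    return n_params * bytes_per_weight / 2**20

def activation_mem_mb(batch: int, channels: int, height: int, width: int,
                      bytes_per_elem: int = 2) -> float:
    """Size of one FP16 activation tensor in MiB.

    Scales linearly with batch size and with each spatial dimension,
    so doubling input resolution quadruples this figure. A real
    planner sums the peak set of concurrently live tensors.
    """
    return batch * channels * height * width * bytes_per_elem / 2**20

# Hypothetical 25 M-parameter detector: INT8 weights are 4x smaller...
print(weight_mem_mb(25_000_000, 1))  # ~23.8 MiB
print(weight_mem_mb(25_000_000, 4))  # ~95.4 MiB
# ...but a single 640x640x3 FP16 input tensor is already ~2.3 MiB,
# and intermediate feature maps are typically much larger.
print(activation_mem_mb(1, 3, 640, 640))
```

The point of the sketch is the ratio: weight savings from INT8 are fixed, while activation cost grows with resolution and batch, so the activation term dominates planning at high input sizes.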
Related tools
Module Power Calculator — size PSU and thermal budget alongside memory.
Inference Throughput Estimator — estimate FPS and latency once memory fit is confirmed.
Full Deployment Planner — combine memory, power, and throughput into an end-to-end edge AI BOM.