// Decision Engine 09

Estimate inference memory for your edge AI workload

Memory planning for edge AI deployments determines whether a model fits on-device without swapping or OOM errors. This tool calculates VRAM and system RAM requirements across quantization levels (FP32, FP16, INT8) and hardware platforms, covering NVIDIA Jetson's unified memory constraints, Google Coral's on-chip SRAM limits, and Hailo-8 / 8L on-chip buffers.

// Hardware context

01
AI Hardware Platform
// Select your primary hardware vendor
02
Hardware Module
// Specific module within the platform
03
Runtime / Framework
// Inference engine — affects runtime overhead allocation

// Model configuration

05
Model Family
// Primary inference model architecture
06
Model Variant
// Specific model size within the family
08
Precision
// Quantization level — directly affects memory footprint
09
Batch Size
// Concurrent inference batch — affects activation memory
1
2
4
8
16
32
// Larger batches increase activation memory proportionally
10
Concurrent Streams
// Number of camera or video streams processed in parallel
1
2
4
8
16
32
11
Input Resolution
// Frame resolution — affects activation memory scaling
224×224
320×320
416×416
640×640
1280×720
1920×1080

Memory Planning for Edge AI Deployments

Unified memory vs. discrete VRAM

Jetson modules use a unified memory pool shared between the CPU, GPU, and OS; there is no dedicated VRAM. On an 8 GB Orin Nano, the OS and runtime consume roughly 1.5–2 GB before inference begins, so any memory plan must account for that overhead. Discrete GPU cards (e.g. a desktop RTX) maintain separate VRAM, but Jetson has no such separation.
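The subtraction above can be sketched as a one-line helper. This is a minimal illustration, not part of the tool's actual calculation; the ~2 GB default overhead is taken from the estimate in the paragraph and varies by JetPack version and desktop configuration.

```python
def usable_inference_memory(total_gb: float, os_overhead_gb: float = 2.0) -> float:
    """On unified-memory Jetson modules the CPU, GPU, and OS share one pool.

    Returns the memory realistically available to inference after
    subtracting OS/runtime overhead (~1.5-2 GB on an Orin Nano).
    """
    return max(total_gb - os_overhead_gb, 0.0)

# An "8 GB" Orin Nano leaves only ~6 GB for weights and activations.
print(usable_inference_memory(8.0))  # 6.0
```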

Quantization and activation memory

INT8 quantization reduces weight storage 4× versus FP32, but activation memory — which scales with input resolution and batch size — is computed in FP16 even in INT8 networks. This means VRAM and RAM sizing for INT8 models at 640×640 or higher resolution still requires careful accounting of activation buffers, not just weight size.
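A rough sketch of this two-term accounting, assuming weights are stored at the chosen precision while activations stay in FP16 and scale with batch size. The parameter and activation counts passed in are hypothetical placeholders; real activation footprints depend on the network architecture, resolution, and runtime.

```python
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}

def estimate_model_memory_gb(params_millions: float,
                             precision: str,
                             activation_elements_millions: float,
                             batch_size: int = 1) -> dict:
    """Weights shrink with quantization; activations are computed in FP16
    even in INT8 networks, so that term does not shrink 4x."""
    weight_gb = params_millions * 1e6 * BYTES_PER_ELEMENT[precision] / 1e9
    # Activations scale with batch size (and, in practice, resolution).
    act_gb = (activation_elements_millions * 1e6 * batch_size
              * BYTES_PER_ELEMENT["fp16"] / 1e9)
    return {"weights_gb": round(weight_gb, 3),
            "activations_gb": round(act_gb, 3),
            "total_gb": round(weight_gb + act_gb, 3)}

# Hypothetical 25M-parameter detector, 150M activation elements, batch 4:
# INT8 cuts the weight term 4x vs FP32, but the activation term is unchanged.
print(estimate_model_memory_gb(25, "int8", 150, 4))
print(estimate_model_memory_gb(25, "fp32", 150, 4))
```

For large inputs at batch 4 the activation term dominates both totals, which is why INT8's end-to-end memory saving lands well short of 4×.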

Related tools

Module Power Calculator — size PSU and thermal budget alongside memory.
Inference Throughput Estimator — estimate FPS and latency once memory fit is confirmed.
Full Deployment Planner — combine memory, power, and throughput into an end-to-end edge AI BOM.

FAQ
What is unified memory on Jetson?

Jetson modules use a unified memory architecture — there is no separate VRAM. CPU processes, the OS, and GPU inference all share the same physical memory pool. This means your 8 GB Orin Nano isn't 8 GB dedicated to inference; the OS alone uses ~1.5–2 GB.

Why does INT8 not reduce memory 4×?

INT8 reduces weight storage 4× vs FP32. But activation memory — the largest component at high resolutions — is computed in FP16 even in INT8 networks. Runtime activation memory reduction is ~2×, not 4×.

What is TensorRT build workspace?

During trtexec export, TensorRT allocates 1–4 GB of temporary workspace for kernel selection and layer fusion. This is a one-time cost at build time — it doesn't consume memory during inference.

How accurate are these estimates?

±30% for activations and runtime overhead. Weights are exact (calculated from verified parameter counts). Always validate with jtop or tegrastats on device before finalising memory specifications.