// Decision Engine 08

Estimate inference throughput for your edge AI workload

Estimate real-world inference throughput for vision models on edge AI hardware. Configure runtime, precision, batch size, and concurrent streams to compare FPS, per-image latency, and maximum stream capacity. Supports NVIDIA Jetson, Google Coral, and Hailo platforms.
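If you want to reproduce an estimate programmatically, the knobs described above map naturally onto a small configuration record. A hypothetical sketch (all field names and option strings are assumptions, not the tool's actual export format):

```python
from dataclasses import dataclass

@dataclass
class EstimatorConfig:
    """One estimator run — field names are illustrative, not the tool's schema."""
    platform: str                            # e.g. "jetson" | "coral" | "hailo"
    module: str                              # specific accelerator, e.g. "orin-nx-16gb"
    power_mode: str                          # operating power envelope
    runtime: str                             # e.g. "tensorrt" | "pytorch" | "onnxruntime"
    model_family: str                        # primary model architecture
    model_variant: str                       # size within the family
    precision: str                           # e.g. "fp16" | "int8"
    batch_size: int = 1                      # 1 = real-time, higher = throughput
    streams: int = 1                         # concurrent camera/video streams
    resolution: tuple[int, int] = (640, 640) # input frame size
```

Capturing the run as one record like this makes it easy to sweep batch size or precision and compare the resulting FPS and latency figures side by side.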

// Hardware context

01
AI Hardware Platform
// Select your primary hardware vendor
Loading hardware catalog…
02
Module / Accelerator
// Specific hardware module within the platform
// Select a platform first
03
Power Mode
// Operating power envelope
// Select a module first
04
Runtime / Framework
// Inference engine — availability depends on hardware

// Model configuration

06
Model Family
// Primary inference model architecture
07
Model Variant
// Specific model size within the family
// Select a model family first
08
Precision
// Model precision affects throughput and memory directly
09
Batch Size
// Concurrent inference batch — 1 = real-time, higher = throughput
1
2
4
8
16
// Larger batches improve throughput but increase latency
10
Concurrent Streams
// Number of camera or video streams processed in parallel
1
2
4
8
16
32
11
Input Resolution
// Frame resolution — affects compute and activation memory
224×224
320×320
416×416
640×640
1280×720
// Select hardware, runtime, and model to continue
Quick Results
Estimated FPS
Latency / batch
Latency / image
Accelerator util.
Multi-Stream Capacity
FPS per stream
Max streams @ 30fps
Max streams @ 15fps
Total FPS (all streams)
Planning Notes
Configure inputs to see planning recommendations.
Assumptions
Configure the system to see detailed assumptions.
// RELATED TOOLS
→ Tool 07: Module Power Calculator
→ Tool 06: Full Deployment Planner
JSON EXPORT
{ }
FAQ
What inference engines does this tool support?

The estimator supports NVIDIA TensorRT, PyTorch, ONNX Runtime, Google Coral Edge TPU SDK, and Hailo Runtime. Runtime availability depends on the selected hardware platform — unsupported runtimes are disabled in the selector.

What is estimated FPS?

Estimated frames per second — how many inference passes the hardware can complete per second for the selected model, precision, and runtime. Higher FPS is better for real-time inference.

What is the difference between latency/batch and latency/image?

Latency/batch is the time to process a full batch of frames. Latency/image divides that by batch size — the per-frame processing time. For real-time streaming, latency/image is the relevant metric.
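Under these definitions, the Quick Results and Multi-Stream Capacity figures all follow from a single per-batch latency by a few divisions. A minimal sketch, assuming fully pipelined execution (function and key names are hypothetical):

```python
def throughput_metrics(latency_batch_ms: float, batch_size: int) -> dict:
    """Derive FPS, per-image latency, and stream capacity from one
    measured (or estimated) per-batch latency."""
    latency_image_ms = latency_batch_ms / batch_size  # per-frame processing time
    fps = 1000.0 / latency_image_ms                   # total frames per second
    return {
        "fps": fps,
        "latency_image_ms": latency_image_ms,
        "max_streams_30fps": int(fps // 30),  # streams sustainable at 30 fps each
        "max_streams_15fps": int(fps // 15),  # streams sustainable at 15 fps each
    }

# Example: a batch of 8 frames completing in 50 ms
m = throughput_metrics(50.0, 8)
# → 160 FPS total, 6.25 ms per image, 5 streams @ 30 fps, 10 @ 15 fps
```

Real pipelines add capture, preprocessing, and postprocessing overhead on top of the inference latency, so treat these derived numbers as upper bounds.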

What does the confidence score mean?

High (90%): exact published vendor benchmark. Medium (65%): interpolated from GFLOPs across known variants. Low (40%): theoretical TOPS heuristic with no benchmark data. Always validate Low-confidence estimates on device.
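The three tiers amount to a mapping from estimate provenance to a score. A sketch of that mapping (tier names and percentages are from the FAQ above; the function shape itself is an assumption):

```python
def confidence(has_vendor_benchmark: bool, has_gflops_interpolation: bool) -> tuple[str, int]:
    """Map how an estimate was derived to the tool's confidence tiers."""
    if has_vendor_benchmark:
        return ("High", 90)    # exact published vendor benchmark
    if has_gflops_interpolation:
        return ("Medium", 65)  # interpolated from GFLOPs across known variants
    return ("Low", 40)         # theoretical TOPS heuristic, no benchmark data
```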

Why is TensorRT so much faster than PyTorch?

TensorRT performs layer fusion, precision calibration, and kernel auto-tuning at build time — extracting 1.5–2.5× more throughput than vanilla PyTorch inference on Jetson hardware. The build step (trtexec) takes minutes but runs only once per model, precision, and target device.

What is DLA (Deep Learning Accelerator)?

DLA is a fixed-function neural network processor on Jetson Orin NX and AGX Orin (2 DLA cores each). It runs supported layers alongside the GPU, freeing GPU headroom for other tasks. Not all YOLO11 ops are DLA-compatible; unsupported layers fall back to GPU automatically.

How accurate are these estimates?

Benchmark-backed estimates typically land within ±10–15% of real measured throughput under similar conditions. GFLOPs-interpolated estimates are rougher (Medium confidence, 65%), and theoretical-TOPS estimates are planning-only (±30–50%). Always measure on target hardware before finalising a deployment design.
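The wide ±30–50% band on the theoretical-TOPS path reflects how crude that heuristic is. One plausible form of it, as a sketch (the `efficiency` factor and unit handling are assumptions, not the tool's actual formula):

```python
def tops_heuristic_fps(tops: float, gflops_per_frame: float, efficiency: float = 0.3) -> float:
    """Planning-only FPS estimate from peak accelerator TOPS.

    `efficiency` is the assumed fraction of peak compute actually sustained;
    real utilisation varies widely with model, precision, and runtime, which
    is why this path carries Low confidence.
    """
    # Treat 1 TOPS as roughly 1000 GFLOP/s of usable compute at peak
    # (unit conventions differ between INT8 ops and FP operations).
    sustained_gflops = tops * 1000.0 * efficiency
    return sustained_gflops / gflops_per_frame

# Example: a 100-TOPS part running a 100-GFLOP-per-frame model at 30% efficiency
fps = tops_heuristic_fps(100.0, 100.0)  # → 300.0
```

Changing `efficiency` from 0.2 to 0.5 swings the answer by 2.5×, which is exactly the spread the accuracy note above warns about.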