// Decision Engine 08

Estimate inference throughput for your edge AI workload

Estimate real-world inference throughput for vision models on edge AI hardware. Configure runtime, precision, batch size, and concurrent streams to compare FPS, per-image latency, and maximum stream capacity. Supports NVIDIA Jetson, Google Coral, and Hailo platforms.
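If you want to reproduce an estimate programmatically, the knobs described above map naturally onto a small configuration record. A hypothetical sketch (all field names and option strings are assumptions, not the tool's actual export format):

```python
from dataclasses import dataclass

@dataclass
class EstimatorConfig:
    """One estimator run — field names are illustrative, not the tool's schema."""
    platform: str                            # e.g. "jetson" | "coral" | "hailo"
    module: str                              # specific accelerator, e.g. "orin-nx-16gb"
    power_mode: str                          # operating power envelope
    runtime: str                             # e.g. "tensorrt" | "pytorch" | "onnxruntime"
    model_family: str                        # primary model architecture
    model_variant: str                       # size within the family
    precision: str                           # e.g. "fp16" | "int8"
    batch_size: int = 1                      # 1 = real-time, higher = throughput
    streams: int = 1                         # concurrent camera/video streams
    resolution: tuple[int, int] = (640, 640) # input frame size
```

Capturing the run as one record like this makes it easy to sweep batch size or precision and compare the resulting FPS and latency figures side by side.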

// Hardware context

01
AI Hardware Platform
// Select your primary hardware vendor
Loading hardware catalog…
02
Module / Accelerator
// Specific hardware module within the platform
// Select a platform first
03
Power Mode
// Operating power envelope
// Select a module first
04
Runtime / Framework
// Inference engine — availability depends on hardware

// Model configuration

06
Model Family
// Primary inference model architecture
07
Model Variant
// Specific model size within the family
// Select a model family first
08
Precision
// Model precision affects throughput and memory directly
09
Batch Size
// Concurrent inference batch — 1 = real-time, higher = throughput
1
2
4
8
16
// Larger batches improve throughput but increase latency
10
Concurrent Streams
// Number of camera or video streams processed in parallel
1
2
4
8
16
32
11
Input Resolution
// Frame resolution — affects compute and activation memory
224×224
320×320
416×416
640×640
1280×720
// Select hardware, runtime, and model to continue
Quick Results
Estimated FPS
Latency / batch
Latency / image
Accelerator util.
Multi-Stream Capacity
FPS per stream
Max streams @ 30fps
Max streams @ 15fps
Total FPS (all streams)
Planning Notes
Configure inputs to see planning recommendations.
Assumptions
Configure the system to see detailed assumptions.
// RELATED TOOLS
→ Tool 07: Module Power Calculator
→ Tool 06: Full Deployment Planner
JSON EXPORT
{ }
FAQ
What inference engines does this tool support?

The estimator supports NVIDIA TensorRT, PyTorch, ONNX Runtime, Google Coral Edge TPU SDK, and Hailo Runtime. Runtime availability depends on the selected hardware platform — unsupported runtimes are disabled in the selector.

What is estimated FPS?

Estimated frames per second — how many inference passes the hardware can complete per second for the selected model, precision, and runtime. Higher FPS is better for real-time inference.

What is the difference between latency/batch and latency/image?

Latency/batch is the time to process a full batch of frames. Latency/image divides that by batch size — the per-frame processing time. For real-time streaming, latency/image is the relevant metric.
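Under these definitions, the Quick Results and Multi-Stream Capacity figures all follow from a single per-batch latency by a few divisions. A minimal sketch, assuming fully pipelined execution (function and key names are hypothetical):

```python
def throughput_metrics(latency_batch_ms: float, batch_size: int) -> dict:
    """Derive FPS, per-image latency, and stream capacity from one
    measured (or estimated) per-batch latency."""
    latency_image_ms = latency_batch_ms / batch_size  # per-frame processing time
    fps = 1000.0 / latency_image_ms                   # total frames per second
    return {
        "fps": fps,
        "latency_image_ms": latency_image_ms,
        "max_streams_30fps": int(fps // 30),  # streams sustainable at 30 fps each
        "max_streams_15fps": int(fps // 15),  # streams sustainable at 15 fps each
    }

# Example: a batch of 8 frames completing in 50 ms
m = throughput_metrics(50.0, 8)
# → 160 FPS total, 6.25 ms per image, 5 streams @ 30 fps, 10 @ 15 fps
```

Real pipelines add capture, preprocessing, and postprocessing overhead on top of the inference latency, so treat these derived numbers as upper bounds.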

What does the confidence score mean?

High (90%): exact published vendor benchmark. Medium (65%): interpolated from GFLOPs across known variants. Low (40%): theoretical TOPS heuristic with no benchmark data. Always validate Low-confidence estimates on device.
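The three tiers amount to a mapping from estimate provenance to a score. A sketch of that mapping (tier names and percentages are from the FAQ above; the function shape itself is an assumption):

```python
def confidence(has_vendor_benchmark: bool, has_gflops_interpolation: bool) -> tuple[str, int]:
    """Map how an estimate was derived to the tool's confidence tiers."""
    if has_vendor_benchmark:
        return ("High", 90)    # exact published vendor benchmark
    if has_gflops_interpolation:
        return ("Medium", 65)  # interpolated from GFLOPs across known variants
    return ("Low", 40)         # theoretical TOPS heuristic, no benchmark data
```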

Why is TensorRT so much faster than PyTorch?

TensorRT performs layer fusion, precision calibration, and kernel auto-tuning at build time — extracting 1.5–2.5× more throughput than vanilla PyTorch inference on Jetson hardware. The build step (trtexec) takes minutes but runs only once per model, precision, and target device.

What is DLA (Deep Learning Accelerator)?

DLA is a fixed-function neural network processor on Jetson Orin NX and AGX Orin (2 DLA cores each). It runs supported layers alongside the GPU, freeing GPU headroom for other tasks. Not all YOLO11 ops are DLA-compatible; unsupported layers fall back to GPU automatically.

How accurate are these estimates?

Benchmark-backed estimates typically land within ±10–15% of real measured throughput under similar conditions. GFLOPs-interpolated estimates are rougher (Medium confidence, 65%), and theoretical-TOPS estimates are planning-only (±30–50%). Always measure on target hardware before finalising a deployment design.
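The wide ±30–50% band on the theoretical-TOPS path reflects how crude that heuristic is. One plausible form of it, as a sketch (the `efficiency` factor and unit handling are assumptions, not the tool's actual formula):

```python
def tops_heuristic_fps(tops: float, gflops_per_frame: float, efficiency: float = 0.3) -> float:
    """Planning-only FPS estimate from peak accelerator TOPS.

    `efficiency` is the assumed fraction of peak compute actually sustained;
    real utilisation varies widely with model, precision, and runtime, which
    is why this path carries Low confidence.
    """
    # Treat 1 TOPS as roughly 1000 GFLOP/s of usable compute at peak
    # (unit conventions differ between INT8 ops and FP operations).
    sustained_gflops = tops * 1000.0 * efficiency
    return sustained_gflops / gflops_per_frame

# Example: a 100-TOPS part running a 100-GFLOP-per-frame model at 30% efficiency
fps = tops_heuristic_fps(100.0, 100.0)  # → 300.0
```

Changing `efficiency` from 0.2 to 0.5 swings the answer by 2.5×, which is exactly the spread the accuracy note above warns about.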