What inference engines does this tool support?
The estimator supports NVIDIA TensorRT, PyTorch, ONNX Runtime, Google Coral Edge TPU SDK, and Hailo Runtime. Runtime availability depends on the selected hardware platform — unsupported runtimes are disabled in the selector.
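As an illustrative sketch only (the platform and runtime identifiers below are hypothetical, not the tool's actual names), the selector behaviour amounts to a lookup table:

```python
# Hypothetical platform -> runtime table; identifiers are illustrative only.
SUPPORTED_RUNTIMES = {
    "jetson_orin_nx": {"tensorrt", "pytorch", "onnxruntime"},
    "coral_dev_board": {"edgetpu"},
    "hailo8": {"hailort", "onnxruntime"},
}

def available_runtimes(platform: str) -> set[str]:
    """Runtimes the selector leaves enabled for a platform (empty if unknown)."""
    return SUPPORTED_RUNTIMES.get(platform, set())
```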
What is estimated FPS?
Estimated frames per second — how many inference passes the hardware can complete per second for the selected model, precision, and runtime. Higher FPS is better for real-time inference.
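Concretely, FPS is the reciprocal of per-image latency. A minimal sketch (the function name is ours, not the tool's):

```python
def estimated_fps(latency_ms_per_image: float) -> float:
    """Inference passes per second, given per-image latency in milliseconds."""
    return 1000.0 / latency_ms_per_image

# e.g. 8 ms per image -> 125 FPS
```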
What is the difference between latency/batch and latency/image?
Latency/batch is the time to process a full batch of frames. Latency/image divides that by batch size — the per-frame processing time. For real-time streaming, latency/image is the relevant metric.
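The relationship above can be sketched in a few lines (names are illustrative):

```python
def latency_per_image_ms(latency_per_batch_ms: float, batch_size: int) -> float:
    """Per-frame latency: batch latency divided by batch size."""
    return latency_per_batch_ms / batch_size

# A batch of 8 frames processed in 40 ms -> 5 ms per image (200 FPS equivalent).
```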
What does the confidence score mean?
High (90%): exact published vendor benchmark. Medium (65%): interpolated from GFLOPs across known variants. Low (40%): theoretical TOPS heuristic with no benchmark data. Always validate Low-confidence estimates on device.
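A hedged sketch of how these tiers might be encoded (the source labels and the validation threshold are our assumptions, not the estimator's internals):

```python
# Estimate provenance -> confidence score, mirroring the tiers above.
CONFIDENCE = {
    "vendor_benchmark": 0.90,     # High: exact published benchmark
    "gflops_interpolated": 0.65,  # Medium: interpolated across known variants
    "tops_heuristic": 0.40,       # Low: theoretical TOPS, no benchmark data
}

def needs_on_device_validation(source: str) -> bool:
    """Low-confidence estimates should always be validated on device."""
    return CONFIDENCE.get(source, 0.0) <= 0.40
```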
Why is TensorRT so much faster than PyTorch?
TensorRT performs layer fusion, precision calibration, and kernel auto-tuning at build time, typically yielding 1.5–2.5× the throughput of vanilla PyTorch inference on Jetson hardware. The build step (trtexec) takes minutes but runs only once.
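The one-time build can be scripted around trtexec's standard --onnx, --saveEngine, and --fp16 options; the file names below are placeholders:

```python
def trtexec_build_cmd(onnx_path: str, engine_path: str, fp16: bool = True) -> list[str]:
    """Assemble the one-time trtexec engine-build command."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")
    return cmd

# Run once on the target device (takes minutes; the engine is then reused), e.g.:
#   import subprocess
#   subprocess.run(trtexec_build_cmd("yolo11n.onnx", "yolo11n.engine"), check=True)
```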
What is DLA (Deep Learning Accelerator)?
DLA is a fixed-function neural network processor on Jetson Orin NX and AGX Orin (2 DLA cores each). It runs supported layers alongside the GPU, freeing GPU headroom for other tasks. Not all YOLO11 ops are DLA-compatible; unsupported layers fall back to GPU automatically.
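DLA targeting and GPU fallback are configured at engine-build time. A sketch using trtexec's standard --useDLACore and --allowGPUFallback flags (paths are placeholders):

```python
def trtexec_dla_cmd(onnx_path: str, engine_path: str, dla_core: int = 0) -> list[str]:
    """trtexec command targeting a DLA core, with unsupported layers
    falling back to the GPU. DLA executes FP16/INT8 only, hence --fp16."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--useDLACore={dla_core}",
        "--allowGPUFallback",
        "--fp16",
    ]
```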
How accurate are these estimates?
Benchmark-backed estimates fall within ±10–15% of real measured throughput under similar conditions. GFLOPs-interpolated estimates are roughly 65% reliable; theoretical TOPS heuristics are planning-only (±30–50% error). Always measure on target hardware before finalising a deployment design.