Inference Speed Calculator Online - Free

Calculate AI model inference speed in tokens per second across different hardware configurations. Compare LLM throughput on GPU, TPU, and CPU with batch size optimization. Essential for AI deployment planning, cost estimation, and production serving infrastructure.

Inference Speed

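Speed = (Memory Bandwidth / Model Size) × Utilization

Model Size (GB) = Parameters (billions) × Bits per Weight / 8

This estimate assumes decoding is memory-bound: generating each token requires reading all model weights from memory once.

Variables:

Speed: Decoding speed for a single request (tokens/second)
BW: Memory bandwidth (GB/s)
Size: Model weights in memory (GB)
Bits: Bits per weight (FP16=16, INT8=8)
Util: Utilization (0.6-0.9)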
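
The estimate can be sketched in Python as follows. This is a minimal sketch, not the calculator's actual implementation. It assumes decoding is memory-bandwidth-bound, so every generated token requires reading all model weights once and the bandwidth is divided by the weight footprint (parameters × bits per weight / 8). The function and parameter names are hypothetical.

```python
def estimate_tokens_per_second(params_billion, bits_per_weight,
                               bandwidth_gb_s, utilization):
    """Bandwidth-bound decode estimate for a single request (batch size 1)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Weight footprint in GB: billions of parameters x bytes per parameter.
    weights_gb = params_billion * bits_per_weight / 8
    # Assumption: each generated token streams all weights from memory once.
    return bandwidth_gb_s / weights_gb * utilization


# Llama 3 8B in FP16 on a 1008 GB/s GPU at 80% utilization.
speed = estimate_tokens_per_second(8, 16, 1008, 0.8)
print(f"{speed:.1f} tokens/s")
```

Because the weight footprint scales with both parameter count and precision, the same function covers larger models and other quantization levels by changing the first two arguments.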

How to Use

1. Select Model: Choose model and quantization (FP16, INT8, etc.).
2. Select GPU: Choose the GPU being used.
3. Calculate: Get tokens per second.

Examples

Llama 3 8B on RTX 4090

Problem:

Llama 3 8B in FP16 (about 16 GB of weights), BW = 1008 GB/s, Util = 0.8. What is the speed?

Solution:

1. Speed = (1008 / 16) × 0.8 = 50.4
2. Speed ≈ 50 tokens/s

Result: ≈ 50 tokens/second

Under these assumptions, an RTX 4090 can run Llama 3 8B in FP16 at about 50 tokens per second for a single request.
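
Since the utilization figure is itself an estimate, it can help to see how the result moves across the stated 0.6-0.9 range. The sketch below reuses the numbers from this example. The variable names are hypothetical, and the 16 GB figure assumes 8 billion parameters at 2 bytes each.

```python
bandwidth_gb_s = 1008  # RTX 4090 memory bandwidth used in the example
weights_gb = 16        # assumption: 8 billion parameters x 2 bytes (FP16)

# Estimated single-request decode speed at each utilization level.
speeds = {}
for utilization in (0.6, 0.7, 0.8, 0.9):
    speeds[utilization] = bandwidth_gb_s / weights_gb * utilization
    print(f"Util={utilization:.1f}: {speeds[utilization]:.1f} tokens/s")
```

The estimate spans roughly 38 to 57 tokens per second over that range, so the 50 tokens/s figure is best read as a midpoint rather than a guarantee.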

Frequently Asked Questions

Does batch size matter?

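Larger batches increase total throughput, because the weights read from memory at each step are shared by every request in the batch. Each individual request can wait longer for its tokens, so responsiveness drops.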
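
The trade-off can be illustrated with a simple roofline-style sketch. It rests on several assumptions that are not part of this calculator: a decode step takes the larger of the time to stream the weights once and the time to do the arithmetic for the whole batch, arithmetic costs roughly 2 FLOPs per parameter per token, and KV-cache traffic and the utilization factor are ignored. The function name is hypothetical, and `peak_tflops=100.0` is an illustrative placeholder, not a measured GPU specification.

```python
def batch_throughput(batch_size, params_billion=8, bits_per_weight=16,
                     bandwidth_gb_s=1008, peak_tflops=100.0):
    """Return (total tokens/s, tokens/s per request) for one decode step."""
    # Time to stream all weights once; shared by every request in the batch.
    memory_s = (params_billion * bits_per_weight / 8) / bandwidth_gb_s
    # Roughly 2 FLOPs per parameter per token; peak_tflops is a placeholder.
    compute_s = batch_size * 2 * params_billion * 1e9 / (peak_tflops * 1e12)
    # The step is limited by whichever resource takes longer.
    step_s = max(memory_s, compute_s)
    return batch_size / step_s, 1 / step_s


for batch in (1, 32, 256):
    total, per_request = batch_throughput(batch)
    print(f"batch={batch}: {total:.0f} total tokens/s, "
          f"{per_request:.1f} tokens/s per request")
```

In this simplified model, total throughput grows with batch size, while per-request speed stays flat until the step becomes compute-bound and then falls. Real serving systems also add queueing and KV-cache costs, which this sketch leaves out.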

FP16 vs INT8?

INT8 halves the bytes read per weight, so memory-bound decoding is up to about 2× faster than FP16, at the cost of a slight drop in accuracy.
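
The 2× figure follows directly from the bandwidth-bound estimate, as the sketch below shows for an 8B model. It assumes decoding is purely memory-bound and ignores dequantization overhead, so the ratio is an upper bound rather than a measured speedup. The function name and default values are hypothetical and taken from the RTX 4090 example.

```python
def decode_speed(params_billion, bits_per_weight,
                 bandwidth_gb_s=1008, utilization=0.8):
    """Bandwidth-bound single-request decode estimate in tokens/s."""
    # Weight footprint in GB: billions of parameters x bytes per parameter.
    weights_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / weights_gb * utilization


fp16 = decode_speed(8, 16)
int8 = decode_speed(8, 8)
print(f"FP16: {fp16:.1f} tokens/s")
print(f"INT8: {int8:.1f} tokens/s")
print(f"Ratio: {int8 / fp16:.1f}x")
```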
