Quantize#

Quantize operation converts a f32 tensor to a quantized (u8/s8) tensor. It supports both per-tensor and per-channel asymmetric linear quantization. Output data type is specified in output tensor data type. Rounding mode is library-implementation defined.

For per-tensor quantization:

\[\dst_{i} = round(\src_{i} / scale + zp)\]

For per-channel quantization, taking channel axis = 1 as an example:

\[\dst_{\cdots,i,\cdots,\cdots} = round(\src_{\cdots,i,\cdots,\cdots} / scale_i + zp_i), i \in {[0, ic-1]}\]

where \(ic\) is the number of channels.

Operation Attributes#

Attribute

Name

Description

Value Type

Supported

Values

Required or

Optional

qtype

Specifies which de-quantization type is used

string

per_tensor (default), per_channel

Optional

axis

Specifies dimension on which per-channel de-quantization is applied

s64

A s64 value in the range of [-r, r-1] where r = rank(src), 1 by default

Optional

scales

Scalings applied on the src data

f32

A f32 list (only contain one element if qtype is per_tensor)

Required

zps

Offset values that maps to float zero

s64

A s64 list (only contain one element if qtype is per_tensor)

Required

Execution Arguments#

The inputs and outputs must be provided according to the below index order when constructing an operation.

Inputs#

Index

Argument Name

Required or Optional

0

src

Required

Outputs#

Index

Argument Name

Required or Optional

0

dst

Required

Supported Data Types#

Quantize operation supports the following data type combinations.

Src

Dst

f32

s8, u8

@note This operation is to support int8 quantization model.