ue8m0 data format

published: August 21, 2025 updated: August 24, 2025

ue8m0 is an 8-bit unsigned floating-point data format with 8 exponent bits and 0 mantissa bits, designed for microscaling applications in ai workloads.

technical specification

format characteristics

  • 8 exponent bits enabling power-of-two values from 2^-127 to 2^127
  • 0 mantissa bits, so every value is a pure power-of-two scaling factor
  • unsigned format with no sign bit
  • nan represented as 0xff, no infinity support
  • must be used in packed format as ue8m0x2

the format was documented in nvidia’s ptx isa version 9.0 (august 2025).1

binary structure

standard fp8 (e5m2): [S][EEEEE][MM]
standard fp8 (e4m3): [S][EEEE][MMM]
ue8m0 format:        [EEEEEEEE]
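since the format must be used in pairs (ue8m0x2), the packing amounts to plain byte manipulation. a minimal sketch, assuming the first value occupies the low byte (the actual lane order is not taken from the ptx documentation):

```python
def pack_ue8m0x2(lo, hi):
    """pack two ue8m0 bytes into one 16-bit ue8m0x2 value (lo in bits 0-7)"""
    return (hi & 0xff) << 8 | (lo & 0xff)

def unpack_ue8m0x2(packed):
    """split a 16-bit ue8m0x2 value back into two ue8m0 bytes"""
    return packed & 0xff, (packed >> 8) & 0xff
```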

microscaling applications

ue8m0 serves as a shared scaling factor for blocks of 32 elements in lower precision formats:2

  • mxfp8: 8-bit values with ue8m0 scale
  • mxfp6: 6-bit values with ue8m0 scale
  • mxfp4: 4-bit values with ue8m0 scale

this approach reduces memory bandwidth by up to 75% while maintaining acceptable accuracy for inference.3
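the block-scaling idea behind the mx formats can be sketched in a few lines. the rounding choice below (floor of log2 of the block maximum) is an assumption for illustration, not the ocp spec's exact algorithm, and the narrow-format rounding of the element values themselves is omitted:

```python
import math

def block_scale(values):
    """pick a shared power-of-two scale for a block of values"""
    amax = max(abs(v) for v in values)
    # shared scale: largest power of two not exceeding the block maximum
    exponent = math.floor(math.log2(amax)) if amax > 0 else -127
    scale = 2.0 ** exponent
    # elements are stored relative to the scale (8/6/4-bit rounding omitted)
    scaled = [v / scale for v in values]
    return scale, scaled

# a shortened block for illustration; real mx blocks hold 32 elements
scale, scaled = block_scale([0.5, -3.0, 1.25, 0.125])
restored = [s * scale for s in scaled]
```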

for more on mxfp4 formats and gpt model requirements, see gpt-oss mxfp4 requirements.

hardware benefits

the zero-mantissa design simplifies hardware:

  • eliminates multiplication circuits
  • reduces silicon area
  • optimizes for scaling operations
  • aligns with 32-element tensor tiles
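the first point can be made concrete: applying a power-of-two scale to a float32 never needs a multiplier, only an integer add on the exponent field. a minimal sketch that ignores overflow into inf/nan and subnormal inputs:

```python
import struct

def scale_by_pow2(x, exponent):
    """multiply a normal float32 by 2**exponent via exponent-field addition"""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits += exponent << 23  # the float32 exponent sits in bits 23-30
    (result,) = struct.unpack("<f", struct.pack("<I", bits))
    return result

scale_by_pow2(1.5, 3)  # same result as 1.5 * 2**3 == 12.0
```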

platform support

nvidia gpus

support introduced in ptx isa 9.0 (august 2025):1

  • supported: gb200 and newer
  • not supported: h100, h20, a100, older architectures

deepseek models

deepseek v3.1 (august 21, 2025) was trained using ue8m0 fp8 scale data format.4

deepseek explicitly stated that the ue8m0 fp8 scale is “designed for the upcoming next-generation domestically produced chips,” indicating optimization for hardware from chinese manufacturers such as cambricon and moore threads.6

implementation

import math

def ue8m0_encode(value):
    """encode a positive float as a ue8m0 byte"""
    if math.isnan(value):
        return 0xff  # nan is the all-ones bit pattern
    if value <= 0:
        raise ValueError("ue8m0 is unsigned and has no zero encoding")

    # round to the nearest representable power of two, then bias by 127
    exponent = round(math.log2(value))
    return max(0, min(254, exponent + 127))

def ue8m0_decode(byte_value):
    """decode a ue8m0 byte to its power-of-two value"""
    if byte_value == 0xff:
        return float('nan')

    # every other byte encodes 2^(byte - 127); 0x00 decodes to 2^-127
    return 2.0 ** (byte_value - 127)

precision considerations

error tolerance increases from 1e-5 (standard fp8) to approximately 7e-4 with ue8m0.5 this makes it suitable primarily for inference rather than training.

the precision trade-off reflects strategic priorities: optimizing for inference workloads on domestically manufactured chips rather than pursuing training performance parity with nvidia hardware.

strategic significance

chinese hardware optimization

ue8m0 represents a technical differentiation strategy aligned with china’s ai hardware constraints:7

  • manufacturing reality: smic’s 7nm process limitations through 2026 drive efficiency-first optimizations
  • hardware simplification: eliminates mantissa multiplication circuits, reducing silicon area requirements
  • memory efficiency: up to 75% reduction in bandwidth usage enables competitive performance on domestic chips
  • inference focus: targets 90% of deployed ai workloads rather than training infrastructure

the format enables chinese companies to deploy large-scale inference using model weights developed domestically or obtained through other channels, decoupling inference capability from training dependency on us hardware.

market impact

announcement of deepseek v3.1 with ue8m0 optimization sparked significant market response in august 2025:8

  • cambricon shares surged 20%, hitting the daily trading limit
  • chinese ai chip stocks rallied on hardware-software co-optimization news
  • investors interpreted ue8m0 as breakthrough enabling domestic chip competitiveness
  • market positioned development as strategic response to us export controls

technical ecosystem

ue8m0 fits into the broader china ai hardware decoupling strategy alongside:

  • moore threads musa: cuda-compatible platform with potential ue8m0 support
  • cambricon mlu series: inference-optimized chips benefiting from memory efficiency
  • huawei ascend: datacenter processors with microscaling format compatibility
  • smic manufacturing: 7nm domestic production enabling format-specific optimizations

references

[1] nvidia. (2025, august 1). parallel thread execution isa version 9.0.

[2] darvish rouhani, b., et al. (2025, may 30). microscaling data formats for deep learning.

[3] autogpt. (2025, august 21). deepseek launches new model with domestic chips.

[4] deepseek ai. (2025, august 21). deepseek-v3.1 model card.

[5] deepseek ai. (2025). ue8m0 features regression issue #240.

[6] reuters. chinese ai startup deepseek releases upgraded model with domestic chip support.

[7] medium. deepseek’s ue8m0 fp8: how it impacts scaling of ai models.

[8] yahoo finance. tech war: deepseek’s ue8m0 fp8 format sparks chinese chip rally.
