ue8m0 data format

published: August 21, 2025 updated: August 24, 2025

ue8m0 is an 8-bit unsigned floating-point data format with 8 exponent bits and 0 mantissa bits, designed for microscaling applications in ai workloads.

technical specification

format characteristics

  • 8 exponent bits enabling power-of-two values from 2^-127 to 2^127
  • 0 mantissa bits, so every value is a pure power-of-two scaling factor
  • unsigned format with no sign bit
  • nan represented as 0xff, no infinity support
  • must be used in packed format as ue8m0x2

the format was documented in nvidia’s ptx isa version 9.0 (august 2025).1

binary structure

standard fp8 (e5m2): [S][EEEEE][MM]
standard fp8 (e4m3): [S][EEEE][MMM]
ue8m0 format:        [EEEEEEEE]
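since the format must be used in pairs (ue8m0x2), the packing amounts to plain byte manipulation. a minimal sketch, assuming the first value occupies the low byte (the actual lane order is not taken from the ptx documentation):

```python
def pack_ue8m0x2(lo, hi):
    """pack two ue8m0 bytes into one 16-bit ue8m0x2 value (lo in bits 0-7)"""
    return (hi & 0xff) << 8 | (lo & 0xff)

def unpack_ue8m0x2(packed):
    """split a 16-bit ue8m0x2 value back into two ue8m0 bytes"""
    return packed & 0xff, (packed >> 8) & 0xff
```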

microscaling applications

ue8m0 serves as a shared scaling factor for blocks of 32 elements in lower precision formats:2

  • mxfp8: 8-bit values with ue8m0 scale
  • mxfp6: 6-bit values with ue8m0 scale
  • mxfp4: 4-bit values with ue8m0 scale

this approach reduces memory bandwidth by up to 75% while maintaining acceptable accuracy for inference.3
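the block-scaling idea behind the mx formats can be sketched in a few lines. the rounding choice below (floor of log2 of the block maximum) is an assumption for illustration, not the ocp spec's exact algorithm, and the narrow-format rounding of the element values themselves is omitted:

```python
import math

def block_scale(values):
    """pick a shared power-of-two scale for a block of values"""
    amax = max(abs(v) for v in values)
    # shared scale: largest power of two not exceeding the block maximum
    exponent = math.floor(math.log2(amax)) if amax > 0 else -127
    scale = 2.0 ** exponent
    # elements are stored relative to the scale (8/6/4-bit rounding omitted)
    scaled = [v / scale for v in values]
    return scale, scaled

# a shortened block for illustration; real mx blocks hold 32 elements
scale, scaled = block_scale([0.5, -3.0, 1.25, 0.125])
restored = [s * scale for s in scaled]
```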

for more on mxfp4 formats and gpt model requirements, see gpt-oss mxfp4 requirements.

hardware benefits

the zero-mantissa design simplifies hardware:

  • eliminates multiplication circuits
  • reduces silicon area
  • optimizes for scaling operations
  • aligns with 32-element tensor tiles
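the first point can be made concrete: applying a power-of-two scale to a float32 never needs a multiplier, only an integer add on the exponent field. a minimal sketch that ignores overflow into inf/nan and subnormal inputs:

```python
import struct

def scale_by_pow2(x, exponent):
    """multiply a normal float32 by 2**exponent via exponent-field addition"""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits += exponent << 23  # the float32 exponent sits in bits 23-30
    (result,) = struct.unpack("<f", struct.pack("<I", bits))
    return result

scale_by_pow2(1.5, 3)  # same result as 1.5 * 2**3 == 12.0
```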

platform support

nvidia gpus

support introduced in ptx isa 9.0 (august 2025):1

  • supported: gb200 and newer
  • not supported: h100, h20, a100, older architectures

deepseek models

deepseek v3.1 (august 21, 2025) was trained using ue8m0 fp8 scale data format.4

deepseek explicitly stated that the ue8m0 fp8 scale is “designed for the upcoming next-generation domestically produced chips,” indicating optimization for hardware from chinese manufacturers such as cambricon and moore threads.6

implementation

import math

def ue8m0_encode(value):
    """encode a positive float as a ue8m0 byte"""
    if math.isnan(value):
        return 0xff  # nan is the all-ones bit pattern
    if value <= 0:
        raise ValueError("ue8m0 is unsigned and has no zero encoding")

    # round to the nearest representable power of two, then bias by 127
    exponent = round(math.log2(value))
    return max(0, min(254, exponent + 127))

def ue8m0_decode(byte_value):
    """decode a ue8m0 byte to its power-of-two value"""
    if byte_value == 0xff:
        return float('nan')

    # every other byte encodes 2^(byte - 127); 0x00 decodes to 2^-127
    return 2.0 ** (byte_value - 127)

precision considerations

error tolerance increases from 1e-5 (standard fp8) to approximately 7e-4 with ue8m0.5 this makes it suitable primarily for inference rather than training.

the precision trade-off reflects strategic priorities: optimizing for inference workloads on domestically manufactured chips rather than pursuing training performance parity with nvidia hardware.

strategic significance

chinese hardware optimization

ue8m0 represents a technical differentiation strategy aligned with china’s ai hardware constraints:7

  • manufacturing reality: smic’s 7nm process limitations through 2026 drive efficiency-first optimizations
  • hardware simplification: eliminates mantissa multiplication circuits, reducing silicon area requirements
  • memory efficiency: up to 75% reduction in bandwidth usage enables competitive performance on domestic chips
  • inference focus: targets 90% of deployed ai workloads rather than training infrastructure

the format enables chinese companies to deploy large-scale inference using model weights developed domestically or obtained through other channels, decoupling inference capability from training dependency on us hardware.

market impact

announcement of deepseek v3.1 with ue8m0 optimization sparked significant market response in august 2025:8

  • cambricon shares surged 20%, hitting the daily trading limit
  • chinese ai chip stocks rallied on hardware-software co-optimization news
  • investors interpreted ue8m0 as breakthrough enabling domestic chip competitiveness
  • market positioned development as strategic response to us export controls

technical ecosystem

ue8m0 fits into the broader china ai hardware decoupling strategy alongside:

  • moore threads musa: cuda-compatible platform with potential ue8m0 support
  • cambricon mlu series: inference-optimized chips benefiting from memory efficiency
  • huawei ascend: datacenter processors with microscaling format compatibility
  • smic manufacturing: 7nm domestic production enabling format-specific optimizations

references

[1] nvidia. (2025, august 1). parallel thread execution isa version 9.0.

[2] darvish rouhani, b., et al. (2025, may 30). microscaling data formats for deep learning.

[3] autogpt. (2025, august 21). deepseek launches new model with domestic chips.

[4] deepseek ai. (2025, august 21). deepseek-v3.1 model card.

[5] deepseek ai. (2025). ue8m0 features regression issue #240.

[6] reuters. chinese ai startup deepseek releases upgraded model with domestic chip support.

[7] medium. deepseek’s ue8m0 fp8: how it impacts scaling of ai models.

[8] yahoo finance. tech war: deepseek’s ue8m0 fp8 format sparks chinese chip rally.
