vllm.model_executor.layers.quantization.tpu_int8
Int8TpuConfig
  Bases: QuantizationConfig
Int8 Quantization Config class for TPU Backend.
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
__init__
 __init__(activation_scheme: str = 'none') -> None
from_config classmethod
 from_config(config: dict[str, Any]) -> Int8TpuConfig
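The two signatures above suggest the usual pattern in which a classmethod maps a parsed configuration dictionary onto constructor arguments. The following is a minimal sketch of that pattern, not the vLLM implementation: `Int8TpuConfigSketch` is a hypothetical stand-in that mirrors only the documented signatures, and both the `"activation_scheme"` dictionary key and the `"dynamic"` value are assumptions made for illustration.

```python
from typing import Any


class Int8TpuConfigSketch:
    """Hypothetical stand-in; it does not subclass QuantizationConfig."""

    def __init__(self, activation_scheme: str = "none") -> None:
        self.activation_scheme = activation_scheme

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "Int8TpuConfigSketch":
        # Assumption: the dictionary key matches the constructor parameter
        # name, and a missing key falls back to the constructor default.
        return cls(activation_scheme=config.get("activation_scheme", "none"))


default_cfg = Int8TpuConfigSketch()
loaded_cfg = Int8TpuConfigSketch.from_config({"activation_scheme": "dynamic"})
print(default_cfg.activation_scheme)  # none
print(loaded_cfg.activation_scheme)   # dynamic
```

The real class may validate the scheme or require the key to be present; consult the source file listed above for the actual behavior.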
get_config_filenames staticmethod
get_name
 get_name() -> QuantizationMethods
get_quant_method
 get_quant_method(
    layer: Module, prefix: str
) -> Optional[TPUInt8LinearMethod]
TPUInt8LinearMethod
  Bases: LinearMethodBase
Int8 Linear method for TPU Quant.
__init__
 __init__(quant_config: Int8TpuConfig)
_quantize_weight
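This page does not show the body of `_quantize_weight`, so the following is only an illustration of one common way to quantize a weight matrix to int8: symmetric quantization with one scale per output row. It is a sketch under that assumption, written in plain Python on nested lists rather than tensors; the names `quantize_rows_int8` and `dequantize_rows` are hypothetical and not part of vLLM.

```python
def quantize_rows_int8(weight, eps=1e-5):
    """Symmetric per-row int8 quantization.

    Returns (q_rows, scales), where each row is approximated by q * scale.
    """
    q_rows, scales = [], []
    for row in weight:
        # Clamp the row maximum from below so an all-zero row
        # does not produce a zero scale.
        max_abs = max(max(abs(v) for v in row), eps)
        scale = max_abs / 127.0
        # Round to the nearest integer and clamp to the int8 range.
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows, scales):
    """Reconstruct approximate float rows from int8 values and row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
q_rows, scales = quantize_rows_int8(weight)
restored = dequantize_rows(q_rows, scales)
```

With this scheme the largest-magnitude entry of each row maps to ±127, and every reconstructed value differs from the original by at most half of that row's scale.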
apply
create_weights
 create_weights(
    layer: Module,
    input_size_per_partition: int,
    output_partition_sizes: list[int],
    input_size: int,
    output_size: int,
    params_dtype: dtype,
    **extra_weight_attrs,
)
process_weights_after_loading
 process_weights_after_loading(layer: Module) -> None