vllm.model_executor.layers.quantization.kernels.mixed_precision
 Modules:
| Name | Description |
|---|---|
| MPLinearKernel | |
| allspark | |
| bitblas | |
| conch | |
| cutlass | |
| dynamic_4bit | |
| exllama | |
| machete | |
| marlin | |
_POSSIBLE_KERNELS  module-attribute
 _POSSIBLE_KERNELS: list[type[MPLinearKernel]] = [
    CutlassW4A8LinearKernel,
    MacheteLinearKernel,
    AllSparkLinearKernel,
    MarlinLinearKernel,
    Dynamic4bitLinearKernel,
    BitBLASLinearKernel,
    ConchLinearKernel,
    ExllamaLinearKernel,
]
choose_mp_linear_kernel
 choose_mp_linear_kernel(
    config: MPLinearLayerConfig,
    compute_capability: int | None = None,
) -> type[MPLinearKernel]
Choose an MPLinearKernel that can implement the given config for the given compute capability. Attempts to choose the best kernel in terms of performance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | MPLinearLayerConfig | Description of the linear layer to be implemented. | required |
| compute_capability | Optional[int] | The compute capability of the target device, if None uses | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If no kernel can implement the given config. |
Returns:
| Type | Description |
|---|---|
| type[MPLinearKernel] | Chosen kernel. |
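
The selection described above can be illustrated with a small, self-contained sketch. It assumes, without confirmation from this page, that candidates are tried in list order and that the first one able to implement the config is returned, so that list order acts as the performance ranking. `ToyLayerConfig`, `ToyKernel`, `FastKernel`, `PortableKernel`, `min_capability`, `supported_bits`, and `choose_kernel` are hypothetical stand-ins made up for illustration; they are not vLLM's classes or attributes.

```python
from dataclasses import dataclass


@dataclass
class ToyLayerConfig:
    """Hypothetical stand-in for MPLinearLayerConfig."""
    weight_bits: int


class ToyKernel:
    """Hypothetical stand-in for MPLinearKernel."""
    min_capability = 0                      # lowest compute capability supported
    supported_bits: tuple[int, ...] = ()    # weight widths this kernel handles

    @classmethod
    def can_implement(cls, config):
        # Return (ok, reason) so the caller can report why a kernel was skipped.
        if config.weight_bits in cls.supported_bits:
            return True, None
        return False, f"{config.weight_bits}-bit weights not supported"


class FastKernel(ToyKernel):
    min_capability = 90
    supported_bits = (4,)


class PortableKernel(ToyKernel):
    min_capability = 60
    supported_bits = (4, 8)


# Assumption: order encodes preference, fastest candidate first.
POSSIBLE_KERNELS = [FastKernel, PortableKernel]


def choose_kernel(config, compute_capability):
    """Return the first kernel class that supports both device and config."""
    failures = []
    for kernel in POSSIBLE_KERNELS:
        if kernel.min_capability > compute_capability:
            failures.append(
                f"{kernel.__name__}: requires capability {kernel.min_capability}"
            )
            continue
        ok, reason = kernel.can_implement(config)
        if ok:
            return kernel
        failures.append(f"{kernel.__name__}: {reason}")
    # Mirrors the documented contract: ValueError when nothing fits.
    raise ValueError(
        "No kernel can implement the given config:\n" + "\n".join(failures)
    )
```

In this sketch, `choose_kernel(ToyLayerConfig(weight_bits=4), 80)` skips `FastKernel` because the device capability is too low and returns `PortableKernel`. The function returns the class itself, not an instance, which matches the documented `type[MPLinearKernel]` return type.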