vllm.model_executor.layers.rotary_embedding.common ¶
_flashinfer_rotary_embedding ¶

```python
_flashinfer_rotary_embedding(
    positions: Tensor,
    query: Tensor,
    key: Tensor,
    head_size: int,
    cos_sin_cache: Tensor,
    is_neox: bool,
) -> None
```
Custom op wrapper for flashinfer's rotary embedding.
This is an in-place operation that modifies query and key tensors directly.
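Since the kernel itself lives in FlashInfer, the following is a minimal pure-PyTorch sketch of the op's in-place semantics, not the FlashInfer implementation. The flattened `[num_tokens, num_heads * head_size]` query/key layout and the `[cos | sin]` layout of `cos_sin_cache` are assumptions inferred from the signature:

```python
import torch

def reference_rope_inplace(
    positions: torch.Tensor,      # [num_tokens], int64 position ids (assumed)
    query: torch.Tensor,          # [num_tokens, num_heads * head_size], contiguous (assumed)
    key: torch.Tensor,            # [num_tokens, num_kv_heads * head_size], contiguous (assumed)
    head_size: int,
    cos_sin_cache: torch.Tensor,  # [max_position, head_size]; assumed [cos | sin] halves
    is_neox: bool = True,
) -> None:
    num_tokens = positions.shape[0]
    # Gather per-token cos/sin: each [num_tokens, head_size // 2].
    cos, sin = cos_sin_cache[positions].chunk(2, dim=-1)
    for t in (query, key):
        x = t.view(num_tokens, -1, head_size)   # view shares storage with t
        if is_neox:
            x1, x2 = x.chunk(2, dim=-1)         # Neox: rotate contiguous halves
        else:
            x1, x2 = x[..., ::2], x[..., 1::2]  # GPT-J: rotate interleaved pairs
        c, s = cos.unsqueeze(1), sin.unsqueeze(1)  # broadcast over heads
        o1 = x1 * c - x2 * s
        o2 = x2 * c + x1 * s
        if is_neox:
            x.copy_(torch.cat((o1, o2), dim=-1))
        else:
            x.copy_(torch.stack((o1, o2), dim=-1).flatten(-2))
    # query and key are modified in place; nothing is returned.
```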
_flashinfer_rotary_embedding_fake ¶

Fake (meta) counterpart of `_flashinfer_rotary_embedding`, registered with the custom op so that shape inference and tracing (e.g. under `torch.compile`) can run without executing the kernel; since the real op works in place and returns `None`, the fake performs no computation.
     apply_rotary_emb_dispatch ¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | `[num_tokens, num_heads, head_size]` | required |
| `cos` | `Tensor` | `[num_tokens, head_size // 2]` | required |
| `sin` | `Tensor` | `[num_tokens, head_size // 2]` | required |
| `is_neox_style` | `bool` | Whether to use the Neox-style or GPT-J-style rotary positional embeddings. | required |
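A usage sketch with the shapes from the table above (assuming the module is importable in your environment and that the function returns the rotated tensor with `x`'s shape; the random `cos`/`sin` stand in for a real position-frequency cache):

```python
import torch
from vllm.model_executor.layers.rotary_embedding.common import apply_rotary_emb_dispatch

num_tokens, num_heads, head_size = 4, 8, 64
x = torch.randn(num_tokens, num_heads, head_size)
# Random values are only for shape checking; real cos/sin come from a
# precomputed per-position frequency cache.
cos = torch.randn(num_tokens, head_size // 2)
sin = torch.randn(num_tokens, head_size // 2)

out = apply_rotary_emb_dispatch(x, cos, sin, is_neox_style=True)
assert out.shape == x.shape
```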
   apply_rotary_emb_torch ¶
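A pure-PyTorch rotary application consistent with the shapes documented above for `apply_rotary_emb_dispatch`; a sketch of the technique, not necessarily the exact vLLM code (the `_sketch` name is hypothetical):

```python
import torch

def apply_rotary_emb_torch_sketch(
    x: torch.Tensor,          # [num_tokens, num_heads, head_size]
    cos: torch.Tensor,        # [num_tokens, head_size // 2]
    sin: torch.Tensor,        # [num_tokens, head_size // 2]
    is_neox_style: bool,
) -> torch.Tensor:
    cos = cos.unsqueeze(-2).to(x.dtype)  # broadcast over the heads dimension
    sin = sin.unsqueeze(-2).to(x.dtype)
    if is_neox_style:
        x1, x2 = torch.chunk(x, 2, dim=-1)      # rotate contiguous halves
    else:
        x1, x2 = x[..., ::2], x[..., 1::2]      # rotate interleaved pairs
    o1 = x1 * cos - x2 * sin
    o2 = x2 * cos + x1 * sin
    if is_neox_style:
        return torch.cat((o1, o2), dim=-1)
    return torch.stack((o1, o2), dim=-1).flatten(-2)
```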
dispatch_rotary_emb_function (cached) ¶

```python
dispatch_rotary_emb_function(
    default: Callable[..., Tensor] | None = None,
) -> Callable[..., Tensor]
```
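A hedged usage sketch: the result is cached, and `default` supplies the fallback callable when no accelerated kernel applies (the selection logic is not shown on this page, and the returned callable's exact keyword arguments depend on the backend chosen):

```python
from vllm.model_executor.layers.rotary_embedding.common import (
    apply_rotary_emb_torch,
    dispatch_rotary_emb_function,
)

# Resolve the rotary kernel once; fall back to the pure-PyTorch
# implementation when no accelerated backend is available.
rotary_fn = dispatch_rotary_emb_function(default=apply_rotary_emb_torch)
```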
   rotate_gptj ¶
     rotate_neox ¶
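Both helpers implement the "rotate half" step of RoPE in the two common layouts. A sketch of the standard conventions (illustrative; the `_sketch` names are not vLLM APIs):

```python
import torch

def rotate_neox_sketch(x: torch.Tensor) -> torch.Tensor:
    # Neox style: split the last dim into two contiguous halves and swap
    # them with a sign flip on the second half.
    x1, x2 = torch.chunk(x, 2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rotate_gptj_sketch(x: torch.Tensor) -> torch.Tensor:
    # GPT-J style: pair up interleaved even/odd elements instead of halves.
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)
```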
yarn_find_correction_dim ¶

```python
yarn_find_correction_dim(
    num_rotations: int,
    dim: int,
    base: float = 10000,
    max_position_embeddings: int = 2048,
) -> float
```
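In the YaRN scheme, this helper inverts RoPE's per-dimension rotation count to find the dimension index that completes `num_rotations` full rotations over `max_position_embeddings` positions. A sketch of the standard formula from the YaRN paper (the exact vLLM body is not shown on this page):

```python
import math

def yarn_find_correction_dim_sketch(num_rotations, dim, base=10000.0,
                                    max_position_embeddings=2048):
    # RoPE dimension d has angular frequency base**(-2d/dim), so it completes
    # max_position_embeddings / (2*pi * base**(2d/dim)) rotations over the
    # context. Solving that expression for d gives:
    return (dim * math.log(max_position_embeddings /
                           (num_rotations * 2 * math.pi))) / (2 * math.log(base))
```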
yarn_find_correction_range ¶

```python
yarn_find_correction_range(
    low_rot: int,
    high_rot: int,
    dim: int,
    base: float = 10000,
    max_position_embeddings: int = 2048,
) -> tuple[int, int]
```
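Typically this floors/ceils the correction dimensions for `low_rot` and `high_rot` and clamps them to valid indices in `[0, dim - 1]`, as in this standalone sketch (the `_sketch` name is hypothetical):

```python
import math

def yarn_find_correction_range_sketch(low_rot, high_rot, dim, base=10000.0,
                                      max_position_embeddings=2048):
    def correction_dim(num_rotations):
        # Same inversion as yarn_find_correction_dim above.
        return (dim * math.log(max_position_embeddings /
                               (num_rotations * 2 * math.pi))) / (2 * math.log(base))
    # Round outward to integer dimension indices, then clamp to [0, dim - 1].
    low = math.floor(correction_dim(low_rot))
    high = math.ceil(correction_dim(high_rot))
    return max(low, 0), min(high, dim - 1)
```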