vLLM SamplingParams parameters

vLLM deployment example


from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="facebook/opt-125m")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Parameter list


n: Number of output sequences to return for the given prompt.
best_of: Number of output sequences that are generated from the prompt.
    From these `best_of` sequences, the top `n` sequences are returned.
    `best_of` must be greater than or equal to `n`. This is treated as
    the beam width when `use_beam_search` is True. By default, `best_of`
    is set to `n`.
presence_penalty: Float that penalizes new tokens based on whether they
    appear in the generated text so far. Values > 0 encourage the model
    to use new tokens, while values < 0 encourage the model to repeat
    tokens.
frequency_penalty: Float that penalizes new tokens based on their
    frequency in the generated text so far. Values > 0 encourage the
    model to use new tokens, while values < 0 encourage the model to
    repeat tokens.
repetition_penalty: Float that penalizes new tokens based on whether
    they appear in the prompt and the generated text so far. Values > 1
    encourage the model to use new tokens, while values < 1 encourage
    the model to repeat tokens.
temperature: Float that controls the randomness of the sampling. Lower
    values make the model more deterministic, while higher values make
    the model more random. Zero means greedy sampling.
top_p: Float that controls the cumulative probability of the top tokens
    to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
top_k: Integer that controls the number of top tokens to consider. Set
    to -1 to consider all tokens.
min_p: Float that represents the minimum probability for a token to be
    considered, relative to the probability of the most likely token.
    Must be in [0, 1]. Set to 0 to disable this.
use_beam_search: Whether to use beam search instead of sampling.
length_penalty: Float that penalizes sequences based on their length.
    Used in beam search.
early_stopping: Controls the stopping condition for beam search. It
    accepts the following values: `True`, where the generation stops as
    soon as there are `best_of` complete candidates; `False`, where a
    heuristic is applied and the generation stops when it is very
    unlikely to find better candidates; `"never"`, where the beam search
    procedure only stops when there cannot be better candidates
    (canonical beam search algorithm).
stop: List of strings that stop the generation when they are generated.
    The returned output will not contain the stop strings.
stop_token_ids: List of tokens that stop the generation when they are
    generated. The returned output will contain the stop tokens unless
    the stop tokens are special tokens.
include_stop_str_in_output: Whether to include the stop strings in output
    text. Defaults to False.
ignore_eos: Whether to ignore the EOS token and continue generating
    tokens after the EOS token is generated.
max_tokens: Maximum number of tokens to generate per output sequence.
logprobs: Number of log probabilities to return per output token.
    Note that the implementation follows the OpenAI API: the returned
    result includes the log probabilities of the `logprobs` most likely
    tokens, as well as the chosen tokens. The API will always return the
    log probability of the sampled token, so there may be up to
    `logprobs + 1` elements in the response.
prompt_logprobs: Number of log probabilities to return per prompt token.
skip_special_tokens: Whether to skip special tokens in the output.
spaces_between_special_tokens: Whether to add spaces between special
    tokens in the output. Defaults to True.
logits_processors: List of functions that modify logits based on
    previously generated tokens.
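
The three penalty parameters are easier to compare side by side in code. Below is a minimal pure-Python sketch, not vLLM's actual implementation: the function name `apply_penalties` and its list-based signature are hypothetical. It assumes the commonly used formulation in which `repetition_penalty` rescales the logits of tokens seen in the prompt or the output, while `frequency_penalty` and `presence_penalty` are subtracted for tokens seen in the output only.

```python
from collections import Counter


def apply_penalties(logits, prompt_ids, output_ids,
                    presence_penalty=0.0, frequency_penalty=0.0,
                    repetition_penalty=1.0):
    """Return a new list of logits with the three penalties applied.

    Hypothetical helper for illustration; `logits` is indexed by token id.
    """
    penalized = list(logits)

    # repetition_penalty looks at the prompt AND the generated text.
    # Positive logits are divided and negative logits are multiplied, so a
    # value > 1 always makes an already seen token less likely.
    for token_id in set(prompt_ids) | set(output_ids):
        if penalized[token_id] > 0:
            penalized[token_id] /= repetition_penalty
        else:
            penalized[token_id] *= repetition_penalty

    # frequency_penalty grows with the number of occurrences in the output;
    # presence_penalty is a flat amount applied once per seen token.
    for token_id, count in Counter(output_ids).items():
        penalized[token_id] -= frequency_penalty * count
        penalized[token_id] -= presence_penalty

    return penalized


if __name__ == "__main__":
    logits = [2.0, 1.0, -1.0, 0.5]
    print(apply_penalties(logits, prompt_ids=[0], output_ids=[1, 1, 2],
                          presence_penalty=0.5, frequency_penalty=0.25,
                          repetition_penalty=2.0))
```

With the default values (0.0, 0.0, 1.0) the logits come back unchanged, which matches the description of these parameters as optional adjustments.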

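How `temperature`, `top_k`, `top_p` and `min_p` narrow the set of candidate tokens can also be sketched in a few lines. This is an illustrative pure-Python version under stated assumptions, not the code vLLM runs: `softmax` and `filter_candidates` are hypothetical helpers, `temperature` must be greater than 0 here (the greedy case of 0 is not handled), and the order in which the filters are applied is an assumption of this sketch.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_candidates(probs, top_k=-1, top_p=1.0, min_p=0.0):
    """Return the token ids that survive top_k, top_p and min_p filtering."""
    # Sort token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k most likely tokens (-1 keeps all of them).
    if top_k != -1:
        order = order[:top_k]

    # top_p: keep the smallest prefix whose renormalized cumulative
    # probability reaches top_p.
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for token_id in order:
        kept.append(token_id)
        cumulative += probs[token_id] / mass
        if cumulative >= top_p:
            break

    # min_p: drop tokens whose probability is below min_p times the
    # probability of the most likely token.
    threshold = min_p * probs[order[0]]
    return [t for t in kept if probs[t] >= threshold]


if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125, 0.125]
    print(filter_candidates(probs, top_k=3))
    print(filter_candidates(probs, top_p=0.75))
    print(filter_candidates(probs, min_p=0.5))
```

Sampling then draws the next token from the surviving ids only, so tighter settings trade diversity for predictability.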