Returns a response for the chat conversation.
A list of messages comprising the conversation so far.
The maximum number of tokens that can be generated in the chat completion.
Sequences at which the API will stop generating further tokens. A maximum of 4 sequences is supported. Defaults to none.
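The effect of stop sequences can be illustrated client-side. The following is a minimal sketch, not the service's implementation: `truncate_at_stop` is a hypothetical helper, and it assumes the matched stop sequence itself is excluded from the returned text, which the description above does not state.

```python
from typing import Sequence

MAX_STOP_SEQUENCES = 4  # limit taken from the description above


def truncate_at_stop(text: str, stop: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence.

    Assumption: the stop sequence is not included in the result.
    """
    if len(stop) > MAX_STOP_SEQUENCES:
        raise ValueError("at most 4 stop sequences are supported")

    cut = len(text)  # default: keep the whole text
    for seq in stop:
        if not seq:
            continue  # ignore empty strings, which would match at index 0
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


result = truncate_at_stop("Hello world\nUser: hi", ["\nUser:", "END"])
```

With no stop sequences (the default), the text is returned unchanged.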
If set, partial message deltas will be sent. Defaults to false.
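When partial deltas are sent, the client typically reassembles them into the full message. The sketch below simulates this under a simplifying assumption: each delta is a plain string fragment. The real delta format is not described here, and `fake_stream` and `collect_deltas` are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator


def fake_stream(text: str, chunk_size: int = 4) -> Iterator[str]:
    """Simulate a streamed response by yielding fixed-size fragments."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def collect_deltas(deltas: Iterable[str]) -> str:
    """Concatenate partial deltas, in arrival order, into one message."""
    parts = []
    for delta in deltas:
        parts.append(delta)  # a real client might also render each delta here
    return "".join(parts)


full_message = collect_deltas(fake_stream("The quick brown fox"))
```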
The sampling temperature to use. The value can be between 0 and 2 and defaults to 1.0. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic.
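To see why lower temperatures give more deterministic output, consider how temperature is commonly applied: logits are divided by the temperature before the softmax. This is a sketch of that standard technique, not a description of any particular backend. Treating a temperature of 0 as greedy selection is an assumption of this example.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumption: temperature 0 means greedy decoding (all mass on the argmax).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
default_probs = softmax_with_temperature(logits, 1.0)
focused_probs = softmax_with_temperature(logits, 0.2)
```

In this example the most likely token receives a larger share of the probability at temperature 0.2 than at 1.0, so sampling picks it more consistently.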
An alternative to sampling with temperature, called nucleus sampling, in which the model considers only the tokens that make up the top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% of the probability mass are considered. For openai: Between 0 and
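Nucleus sampling can be sketched as a filter over a token distribution: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches top_p, and renormalize. The function name `top_p_filter` and the toy distribution are hypothetical, and the sketch performs no range validation because the accepted range is not fully stated above.

```python
from typing import Dict


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete

    total = sum(kept.values())
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / total for token, p in kept.items()}


distribution = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
narrow = top_p_filter(distribution, 0.1)
wide = top_p_filter(distribution, 0.9)
```

With top_p set to 0.1, only the single most likely token survives in this toy distribution; a larger value admits more candidates before sampling.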
The response for the generated chat completion.
Text response generated for the chat.