The OpenAI API gives users access to OpenAI's large language models (LLMs) and harnesses the power of generative artificial intelligence. It helps users produce more dependable and controlled output from LLMs, both by designing input prompts effectively and by using the models' hyperparameters to regulate how deterministic the output is.
The OpenAI API provides users with two main text completion endpoints: chat/completions and completions. The chat/completions endpoint allows users to generate text output from a single prompt or create a multi-prompt chat interaction. The completions endpoint, on the other hand, generates a text completion from a single input prompt.
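As a sketch of how each endpoint is called, the following uses the legacy openai Python client (pre-1.0, matching the models discussed here); the prompt text and max_tokens value are illustrative choices, not requirements:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key

# chat/completions: accepts a list of role-tagged messages
chat_response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain tokens in one sentence."}],
)
print(chat_response["choices"][0]["message"]["content"])

# completions: accepts a single prompt string
text_response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain tokens in one sentence.",
    max_tokens=60,
)
print(text_response["choices"][0]["text"])
```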
In addition to these text completion endpoints, the OpenAI API offers other endpoints that serve different purposes:

- models: lists the available models and provides information about their owners and permissions.
- images: enables users to create and modify images using a text prompt.
- audio: generates text based on a given audio file.
- moderations: analyzes text to determine whether it violates OpenAI's content policy. This is particularly useful when developers want to verify that a user's input complies with the policy before passing it to the API.

The OpenAI API provides access to various large language models, including GPT-4, GPT-3.5, and text-davinci-003. Newer models, such as GPT-3.5 and GPT-4, are accessed through the chat/completions endpoint, while older models like text-davinci-003 are accessible via the completions endpoint.
One important factor in the capability of these models is the number of parameters they possess. Parameters are the tuned values used by the neural network inside a language model; they are what enable the model to process a given input and generate the desired output.
chat/completions Endpoint

When using the OpenAI API's chat/completions endpoint, the response encapsulates the completion inside a message parameter. The message object contains the completion's role, specified as assistant, and the content, which holds the actual completion text.
This message object can be combined with the user role in the input prompt to build few-shot prompts for future inputs: the completion response is incorporated into the ongoing conversation, allowing for a more interactive and dynamic use of the API.
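A minimal sketch of this pattern, again using the legacy openai Python client: the assistant message returned by one call is appended to the message list for the next turn (the prompts here are illustrative):

```python
import openai

openai.api_key = "YOUR_API_KEY"

messages = [{"role": "user", "content": "Suggest a name for a coffee shop."}]
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)

# The completion arrives under the assistant role; append it verbatim so the
# next request carries the full conversation history.
messages.append(response["choices"][0]["message"])
messages.append({"role": "user", "content": "Now make it pun-based."})

followup = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(followup["choices"][0]["message"]["content"])
```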
The responses from the chat/completions and completions endpoints of the OpenAI API share common data that is important for handling the output correctly:

- id: a unique identifier associated with each completion.
- object: indicates the specific endpoint used for the completion (chat/completions or completions).
- created: the timestamp indicating when the completion was generated.
- model: the model used for the completion.
- choices: the list of completions that were generated. This list can contain multiple completion outputs.
- usage: information about token usage, including the number of tokens used for the input, the output, and in total.
- index: the index of the completion in the list of responses. This is particularly relevant for the completions endpoint, as multiple completions can be requested.
- finish_reason: specifies why the completion process terminated, with possible values such as stop, length, function_call, content_filter, or null.

Understanding this common data is crucial for correctly handling and interpreting the output received from the OpenAI API.
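For illustration, here is how these shared fields might be read from a chat/completions response with the legacy Python client (the prompt is a placeholder):

```python
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],
)

print(response["id"])       # unique identifier for this completion
print(response["object"])   # object type, e.g. "chat.completion"
print(response["created"])  # Unix timestamp of generation
print(response["model"])    # model that produced the completion
print(response["usage"])    # prompt_tokens, completion_tokens, total_tokens

for choice in response["choices"]:
    print(choice["index"], choice["finish_reason"])
    print(choice["message"]["content"])
```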
presence_penalty Hyperparameter

The OpenAI API includes a hyperparameter known as presence_penalty that adjusts the probability of output tokens based on whether they have already appeared in the generated output. Tokens already present in the output are assigned a lower probability under the influence of the presence_penalty.
Increasing the presence_penalty makes the output more likely to branch into new topics related to the initial one, since tokens already chosen for the output become less likely to be repeated. This is particularly useful in research or brainstorming scenarios, where the goal is output that surfaces related information without redundantly reusing common terms.
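A hedged sketch of setting presence_penalty on a request (the value 1.5 and the prompt are illustrative; the documented range is -2.0 to 2.0):

```python
import openai

openai.api_key = "YOUR_API_KEY"

# A higher presence_penalty nudges the model toward tokens that have
# not yet appeared in the output, encouraging branching into new topics.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Brainstorm uses for drone delivery."}],
    presence_penalty=1.5,
)
print(response["choices"][0]["message"]["content"])
```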
chat/completions Endpoint Input Prompt

The chat/completions endpoint of the OpenAI API supports four different roles when submitting an input prompt. These roles provide additional control and flexibility in defining chat behavior:

- system: used once to guide the chatbot's output toward specific topics or themes, influencing the overall behavior of the conversation.
- user: can be used on its own as a single prompt, similar to the behavior of the completions endpoint, or paired with the assistant role to define chat behavior. The latter enables few-shot prompting, where users provide chat examples that specify how the chat should respond in future conversations.
- assistant: used together with the user role to provide example replies to preceding user inputs. It aids few-shot prompting, letting users guide the chatbot's responses with specific chat examples.
- function: an advanced role that allows users to format the output according to specified function parameters, providing more precise control over the output generated by the API.

By combining system prompts, user prompts, assistant prompts, and function parameters, users gain finer control over the generated text output. Prompt engineering remains important, but these additional input options make the chat/completions endpoint a powerful choice for text completion tasks.
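A minimal few-shot sketch combining the system, user, and assistant roles (the persona and example reviews are invented for illustration):

```python
import openai

openai.api_key = "YOUR_API_KEY"

messages = [
    # system: sets the overall behavior once, up front
    {"role": "system", "content": "You are a terse movie critic."},
    # user/assistant pair: a few-shot example of the desired style
    {"role": "user", "content": "Review: The Matrix"},
    {"role": "assistant", "content": "Mind-bending action. 9/10."},
    # the new user prompt the model should answer in the same style
    {"role": "user", "content": "Review: Inception"},
]
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])
```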
temperature Hyperparameter

The OpenAI API includes a hyperparameter known as temperature that affects the computation of token probabilities when the large language model generates output. The temperature value ranges from 0 to 2, with lower values making the output more deterministic and higher values making it more random.

When the temperature is set to a lower value, the probability distribution over tokens becomes narrower and taller, so a few tokens have significantly higher probabilities than the rest. When the temperature is set to a higher value, the distribution becomes flatter, and token probabilities move closer to one another.
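The effect can be illustrated with a temperature-scaled softmax over a toy set of logits; this standalone sketch demonstrates the idea and is not part of the API:

```python
import math

def softmax(logits, temperature):
    # Divide logits by the temperature before normalizing: a low temperature
    # sharpens the distribution, a high temperature flattens it.
    scaled = [logit / temperature for logit in logits]
    total = sum(math.exp(s) for s in scaled)
    return [math.exp(s) / total for s in scaled]

logits = [2.0, 1.0, 0.5]
print(softmax(logits, 0.2))  # peaked: one token dominates (~0.99)
print(softmax(logits, 2.0))  # flat: probabilities move closer together
```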
top_p Hyperparameter

The OpenAI API includes a hyperparameter known as top_p that determines the portion of highest-probability tokens the model samples from. The top_p value ranges from 0 to 1, where lower values increase determinism and higher values increase randomness.

When top_p is set to a lower value, the model chooses tokens from a smaller pool covering only the highest-probability candidates, which narrows the selection range and yields more deterministic output. Conversely, when top_p is set to a higher value, the model selects from a larger pool of tokens spanning a wider range of probabilities, which expands the selection and introduces more randomness.
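Conceptually, this is nucleus sampling. The following toy sketch (not the API's internal implementation) shows how a top_p cutoff shrinks or grows the candidate pool:

```python
import random

def nucleus_sample(token_probs, top_p):
    # Rank tokens by descending probability and keep the smallest set whose
    # cumulative probability reaches top_p (the "nucleus").
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize within the nucleus and sample from it.
    total = sum(p for _, p in nucleus)
    return random.choices([t for t, _ in nucleus],
                          weights=[p / total for _, p in nucleus])[0]

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "newt": 0.05}
print(nucleus_sample(probs, 0.5))   # only "cat" can be chosen
print(nucleus_sample(probs, 0.95))  # "cat", "dog", or "fish"
```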
frequency_penalty Hyperparameter

The OpenAI API includes a hyperparameter called frequency_penalty that adjusts the probability of output tokens based on how frequently they have occurred in the generated output. The more often a token has already appeared, the greater the probability reduction it receives.

Increasing the frequency_penalty decreases the likelihood of repeated words and phrases in the output. This is particularly useful when variety in the output is desired while maintaining focus on the topic at hand.
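OpenAI's documentation describes both penalties as additive adjustments to a token's logit: the frequency penalty scales with how many times the token has appeared, while the presence penalty applies once if it has appeared at all. A toy sketch of that adjustment (for illustration, not the API itself):

```python
def penalized_logit(logit, count, frequency_penalty, presence_penalty):
    # frequency_penalty scales with how often the token has appeared;
    # presence_penalty applies once if the token has appeared at all.
    seen = 1 if count > 0 else 0
    return logit - count * frequency_penalty - seen * presence_penalty

# A token seen three times is pushed down harder than one seen once.
print(penalized_logit(2.0, 3, 0.5, 0.5))  # 2.0 - 1.5 - 0.5 = 0.0
print(penalized_logit(2.0, 1, 0.5, 0.5))  # 2.0 - 0.5 - 0.5 = 1.0
print(penalized_logit(2.0, 0, 0.5, 0.5))  # unchanged: 2.0
```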
Large language models process text by breaking it down into units called tokens, which can represent whole words or fragments of words. As a rough guideline, about 750 words correspond to roughly 1,000 tokens.
It is important to note that when using the OpenAI API, costs are based on the number of tokens processed by each model: you are charged per 1,000 tokens. In terms of cost-effectiveness, the gpt-3.5-turbo model, accessed through the chat/completions endpoint, is considered the most economical option.
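To estimate costs before sending a request, token counts can be checked locally with OpenAI's tiktoken library; the price constant below is an assumption for illustration, not a current rate:

```python
import tiktoken  # OpenAI's tokenizer library: pip install tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "Large language models process text as tokens, not words."
tokens = encoding.encode(text)

# Pricing is per 1,000 tokens; this rate is an illustrative assumption.
ASSUMED_PRICE_PER_1K = 0.002
print(f"{len(tokens)} tokens -> ${len(tokens) / 1000 * ASSUMED_PRICE_PER_1K:.6f}")
```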
Prompt engineering is the process of crafting input prompts specifically designed to draw the desired, optimal output from large language models. Its effectiveness relies on input prompts that are both descriptive and token-efficient, whether that means creating a single input prompt for either endpoint or employing few-shot prompting with the chat/completions endpoint.
By employing these prompt engineering strategies, users can enhance the performance of large language models and obtain more reliable and targeted outputs.