OpenAI Security Best Practices

Work in Progress

This section is a work in progress: It will probably drastically change in the upcoming days.

Best Practice	Description
Adversarial testing	Test the model over a wide range of inputs and user behaviors Red-Team the model to ensure it's robust against adversarial input Does it wander off topic? Can someone redirect the feature via prompt injections (e.g. "ignore the previous instructions and do this instead"?)
Human in the loop (HITL)	Have a human review outputs before they are used in practice Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs
Prompt engineering	Providing additional context to the model (such as by giving a few high-quality examples of desired behavior prior to the new input) can make it easier to steer model outputs in desired directions This reduces the chance of producing undesired content, even if a user tries to produce it
Know your customer (KYC)	Users should generally need to register and log-in to access your service Sending end-user IDs (hashed username or email address, to avoid sharing PII) in API requests can be a useful to help monitor and detect abuse: `user="<id>"`
Constrain user input and limit output tokens	Limiting the amount of text a user can input into the prompt helps avoid prompt injection Narrowing the ranges of inputs or outputs, especially drawn from trusted sources, reduces the extent of misuse Allowing user inputs through validated dropdown fields can be more secure than allowing open-ended text inputs Limiting the number of output tokens helps reduce the chance of misuse Returning outputs from a validated set of materials on the backend, where possible, can be safer than returning novel generated content Example: routing a customer query to the best-matching existing customer support article, rather than attempting to answer the query from-scratch
Be transparent with users	Communicate limitations Evaluate the performance of the API on a wide range of potential inputs in order to identify cases where the API's performance might drop Consider your customer base and the range of inputs that they will be using, and ensure their expectations are calibrated appropriately Allow users to report issues
Apply Moderation	OpenAI's Moderation API is free-to-use and can help reduce the frequency of unsafe content in completions

OpenAI Safety best practices ↩