Security Best Practices
Work in Progress
This section is a work in progress:
It will probably drastically change in the upcoming days.
Best Practice |
Description |
Adversarial testing |
- Test the model over a wide range of inputs and user behaviors
- Red-Team the model to ensure it's robust against adversarial input
- Does it wander off topic?
- Can someone redirect the feature via prompt injections (e.g. "ignore the previous instructions and do this instead"?)
|
Human in the loop (HITL) |
- Have a human review outputs before they are used in practice
- Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs
|
Prompt engineering |
- Providing additional context to the model (such as by giving a few high-quality examples of desired behavior prior to the new input) can make it easier to steer model outputs in desired directions
- This reduces the chance of producing undesired content, even if a user tries to produce it
|
Know your customer (KYC) |
- Users should generally need to register and log-in to access your service
- Sending end-user IDs (hashed username or email address, to avoid sharing PII) in API requests can be a useful to help monitor and detect abuse:
user="<id>"
|
Constrain user input and limit output tokens |
- Limiting the amount of text a user can input into the prompt helps avoid prompt injection
- Narrowing the ranges of inputs or outputs, especially drawn from trusted sources, reduces the extent of misuse
- Allowing user inputs through validated dropdown fields can be more secure than allowing open-ended text inputs
- Limiting the number of output tokens helps reduce the chance of misuse
- Returning outputs from a validated set of materials on the backend, where possible, can be safer than returning novel generated content
- Example: routing a customer query to the best-matching existing customer support article, rather than attempting to answer the query from-scratch
|
Be transparent with users |
- Communicate limitations
- Evaluate the performance of the API on a wide range of potential inputs in order to identify cases where the API's performance might drop
- Consider your customer base and the range of inputs that they will be using, and ensure their expectations are calibrated appropriately
- Allow users to report issues
|
Apply Moderation |
- OpenAI's Moderation API is free-to-use and can help reduce the frequency of unsafe content in completions
|