Security Best Practices

Work in Progress

This section is a work in progress: It will probably drastically change in the upcoming days.

Best Practice Description
Adversarial testing
  • Test the model over a wide range of inputs and user behaviors
  • Red-Team the model to ensure it's robust against adversarial input
    • Does it wander off topic?
    • Can someone redirect the feature via prompt injections (e.g. "ignore the previous instructions and do this instead"?)
Human in the loop (HITL)
  • Have a human review outputs before they are used in practice
  • Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs
Prompt engineering
  • Providing additional context to the model (such as by giving a few high-quality examples of desired behavior prior to the new input) can make it easier to steer model outputs in desired directions
  • This reduces the chance of producing undesired content, even if a user tries to produce it
Know your customer (KYC)
  • Users should generally need to register and log-in to access your service
  • Sending end-user IDs (hashed username or email address, to avoid sharing PII) in API requests can be a useful to help monitor and detect abuse: user="<id>"
Constrain user input and limit output tokens
  • Limiting the amount of text a user can input into the prompt helps avoid prompt injection
    • Narrowing the ranges of inputs or outputs, especially drawn from trusted sources, reduces the extent of misuse
    • Allowing user inputs through validated dropdown fields can be more secure than allowing open-ended text inputs
  • Limiting the number of output tokens helps reduce the chance of misuse
    • Returning outputs from a validated set of materials on the backend, where possible, can be safer than returning novel generated content
    • Example: routing a customer query to the best-matching existing customer support article, rather than attempting to answer the query from-scratch
Be transparent with users
  • Communicate limitations
    • Evaluate the performance of the API on a wide range of potential inputs in order to identify cases where the API's performance might drop
    • Consider your customer base and the range of inputs that they will be using, and ensure their expectations are calibrated appropriately
  • Allow users to report issues
Apply Moderation
  • OpenAI's Moderation API is free-to-use and can help reduce the frequency of unsafe content in completions