Nov 6, 2023

Privacy in AI Systems

AI systems make privacy questions sharper because data can influence behavior in less obvious ways. Source documents, prompts, logs, embeddings, feedback, and training examples all deserve attention.

I try to start with minimization. Does the system need this data? Does the model need to see raw text, or would a derived representation work? Should this content be retrievable for every user? How long should prompts and responses be retained?

Permissions are especially important in retrieval systems. If the underlying data access rules are unclear, the AI interface can become a very fast way to expose mistakes.

Good AI product engineering needs privacy engineering close to the architecture, not only at launch review.

Privacy Across the AI Lifecycle

AI systems create privacy questions at multiple stages. Ingestion decides what knowledge enters the system. Indexing decides how it is represented. Retrieval decides what can be surfaced. Prompting decides what context is sent to a model. Logging decides what is kept after the interaction.

Each stage needs boundaries. A document that is safe for one employee may not be safe for another. A prompt log that is useful for debugging may contain sensitive user input. A feedback example that improves evaluation may not be appropriate for long-term retention.

I have become especially careful about observability data. Logs are necessary for operating AI products, but they can quietly become a second data platform. They need retention, access controls, redaction where appropriate, and clear purpose.

The privacy mindset I trust most is not “collect everything and lock it down later.” It is “collect what we can justify, protect it by default, and make downstream use explainable.” That is how AI systems earn room to improve without becoming reckless.

Designing for Least Surprise

A useful privacy test is whether the system would surprise a reasonable user or internal stakeholder. Would they expect this document to be retrievable in this context? Would they expect prompts to be logged? Would they expect feedback to be reused for evaluation?

Least surprise is not a replacement for policy, but it is a good design instinct. It pushes teams to be clear about data use, retention, and access. It also highlights places where product UX and infrastructure policy need to align.

AI systems make this alignment important because the interface can feel flexible. Users may type sensitive information into a box because the product invites natural language. The platform needs guardrails around what happens next.

Privacy engineering in AI is not only about preventing bad outcomes. It is about making the system’s data behavior understandable enough that users and teams can trust it.