Using Private Data for AI Initiatives
No one knows exactly how much private or proprietary data exists, but by all accounts it vastly outweighs the public data that has gone into the most popular LLMs. To use this data for tasks like training models or powering co-pilots, you need clear rights, and that is true whether the data is identified, tokenized, or anonymized. Once you know why you're allowed to take an action, you also need to ensure that every actor, especially in agentic environments where agents train other agents, is behaving correctly.
One example blueprint, in which Tranquil Data is deployed in AWS to ensure that fine-tuning is done with known, compliant sets of data.
Ensure You Have the Contracts and Consents
Any AI initiative starts with data. The better the data going in, the better the experience coming out. Organizations everywhere are trying to green-light AI initiatives with their internal data, but there's a massive risk of misusing that data. Tranquil Data ensures that as you stage data for training or fine-tuning (as in the blueprint on the left), only the right data is included. For instance, if you're training a model to help with customer support, or to act as a co-pilot for specific care journeys, then you want to include, respectively, experience with a product or knowledge about a health condition. You don't want to include a user's call history or health data unless a Privacy Policy or individual Consent specifically allows it, and you don't want to include knowledge from a business partner unless the terms of your MSA explicitly allow AI purposes. With Tranquil Data's redaction and transform capabilities, your AI initiatives start from a compliant foundation.
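To make that gate concrete, here is a minimal Python sketch of purpose-based filtering before fine-tuning. The Record fields, the category names, and the allowed_for_training / stage_training_set helpers are all hypothetical; in a real deployment Tranquil Data evaluates policy and consent state directly rather than relying on flags carried on each record.

```python
from dataclasses import dataclass, field

# Hypothetical record shape for illustration only: real deployments read
# policy and consent state from Tranquil Data, not from per-record flags.
@dataclass
class Record:
    subject_id: str
    category: str                      # e.g. "product_experience", "call_history", "health_data"
    source: str                        # e.g. "internal", "partner:acme"
    consented_purposes: set = field(default_factory=set)   # purposes the individual consented to
    contract_purposes: set = field(default_factory=set)    # purposes the governing MSA allows

def allowed_for_training(record: Record, purpose: str = "ai_fine_tuning") -> bool:
    """Include a record only when the asserted purpose is covered by consent or contract."""
    if record.source.startswith("partner:"):
        # Partner data needs explicit AI-purpose language in the MSA.
        return purpose in record.contract_purposes
    if record.category in {"call_history", "health_data"}:
        # Sensitive personal data needs an individual consent or policy grant.
        return purpose in record.consented_purposes
    # General product or support knowledge with no personal or partner constraint.
    return True

def stage_training_set(records: list[Record], purpose: str = "ai_fine_tuning") -> list[Record]:
    """Redact disallowed records before they ever reach the fine-tuning pipeline."""
    return [r for r in records if allowed_for_training(r, purpose)]
```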
Know the Context of Each Model
It’s not enough to ensure that the right data goes into a training process. If you want to be responsible, then you need to track each model and know its manifest: what purpose was asserted to allow data into the process, what categories of data were included, and which users, B2B contracts, or locations were inputs to the model. More to the point, it’s the Legal team, not the technical team, that needs this level of detail before it can approve any AI initiative built on the model. Tranquil Data uses its audit trail and change data capture capabilities to provide a simple dashboard with this detail. Now there’s a clear view of what actually went into training and, as a result, an easy way to decide where the model can or cannot be applied and which sets of data it should be able to access and share.
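As an illustration of what such a manifest might capture, the sketch below assembles one from audit-trail events. The AuditEvent and ModelManifest shapes and the build_manifest helper are assumptions made for this example, not Tranquil Data's actual data model or API.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical audit-trail event and manifest shapes; the real dashboard is
# driven by Tranquil Data's audit trail and change data capture, not this code.
@dataclass(frozen=True)
class AuditEvent:
    purpose: str          # purpose asserted when the data entered the training flow
    category: str         # e.g. "product_experience", "aggregate_usage"
    subject_id: str       # the user the record belongs to, if any
    contract_id: str      # governing B2B contract, if any
    location: str         # jurisdiction the data originated from

@dataclass
class ModelManifest:
    model_id: str
    purposes: set[str]
    categories: set[str]
    subjects: set[str]
    contracts: set[str]
    locations: set[str]

def build_manifest(model_id: str, events: Iterable[AuditEvent]) -> ModelManifest:
    """Summarize what actually went into training so Legal can review and approve it."""
    manifest = ModelManifest(model_id, set(), set(), set(), set(), set())
    for e in events:
        manifest.purposes.add(e.purpose)
        manifest.categories.add(e.category)
        if e.subject_id:
            manifest.subjects.add(e.subject_id)
        if e.contract_id:
            manifest.contracts.add(e.contract_id)
        if e.location:
            manifest.locations.add(e.location)
    return manifest
```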
A dashboard view of the purpose asserted to fine-tune a model, and the categories of data that were used in the process.
Fine-tuning results in multiple agents running on a model, where each agent has a specific purpose it is allowed to assert; the support agent can also impersonate a user to personalize the engagement.
Give Agents the Appropriate Purpose
Each agent plays a specific role in your organization. Depending on that role, the agent should have access to different sets of data for different reasons. With Tranquil Data this can be automated by taking the purpose from a training flow and giving an agent only that ability, or by giving prompt engineers the ability to assert different purposes based on context. For instance, your operational data may be helpful to marketing as long as identifying or sensitive data isn’t included. An agent supporting marketing gets a security token scoped to a single purpose that permits access to aggregate data only. Separately, another agent may be supporting a call center. That agent should only have access to site-wide knowledge, except when a specific customer is talking with a bot or is on the phone with support, at which point the agent should be able to impersonate that customer and access their data. These two contexts are captured as separate flows in the audit trail, so that later it’s possible to show that agents only accessed personal data on behalf of a given individual.
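A rough sketch of how purpose-scoped tokens and the corresponding audit entries might look follows. The PurposeToken shape and the marketing_token, support_token, and record_access helpers are hypothetical stand-ins for whatever token format and audit schema a real deployment issues.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical token and audit shapes: this illustrates purpose-scoped access,
# not the actual tokens Tranquil Data issues.
@dataclass(frozen=True)
class PurposeToken:
    agent_id: str
    purpose: str                      # e.g. "marketing_aggregate", "customer_support"
    on_behalf_of: str | None = None   # set only while impersonating a specific customer

def marketing_token(agent_id: str) -> PurposeToken:
    # Marketing agent: aggregate, de-identified data only.
    return PurposeToken(agent_id, "marketing_aggregate")

def support_token(agent_id: str, customer_id: str | None = None) -> PurposeToken:
    # Call-center agent: site-wide knowledge by default; customer data only
    # while a specific customer is on the line.
    return PurposeToken(agent_id, "customer_support", on_behalf_of=customer_id)

def record_access(token: PurposeToken, resource: str) -> dict:
    """Emit an audit entry tying each access to the purpose (and person) it served."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "agent": token.agent_id,
        "purpose": token.purpose,
        "on_behalf_of": token.on_behalf_of,
        "resource": resource,
    }
```

In this sketch, the marketing agent's token never carries a customer identity, while the support agent's token does only for the duration of a live interaction, which is what lets the two flows be distinguished later in the audit trail.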