How to anonymize text using ChatGPT

Creation date: 28/09/2023 14:15    Updated: 28/09/2023 14:15    anonymizing
One handy use case with ThinkAutomation and ChatGPT is data anonymization.

Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.

This can be useful when you need to record incoming emails, database updates, API requests, documents etc in a database - but ensure the data contains no personal information first.

In ThinkAutomation, you can use the ChatGPT Automation action to do this.

1. Add a ChatGPT action to your automation.
2. Select the 'Ask ChatGPT To Respond To A Prompt' operation.
3. Set the 'Prompt' text to:


Please anonymize the following text:

---
%Msg_Body%


Where %Msg_Body% is the ThinkAutomation variable containing the text you need to anonymize (in this case it would be the incoming message body plain text).

4. You do not need to set the 'Conversation Id' - since this is a one-time operation and not part of a 'conversation'.
5. Select the variable to receive the result from the 'Assign ChatGPT Response To' list.

The result will have any names, contact information, dates, locations, emails etc replaced with markers, such as '[Name]', '[Company]' etc.

You can then use the variable containing the result further in your Automation (eg: Save it to a database, CRM, use on an outgoing email etc).

You can adjust the instruction in the prompt text to fine tune anonymizing, for example:


Please anonymize the following text, leave web addresses and email addresses unchanged:

---
%Msg_Body%


Note: You need to ensure the text you send is below the token-limit for the OpenAI model you are using. The default is gpt-3.5-turbo-16k - which has a limit of 16384 tokens (which is roughly 75k - but this includes the response, so the text you send needs to be less than half of this to allow space for the response). One way around this would be to split the text into chucks and make multiple calls - then recombine afterwards.