Fine Tuning ChatGPT on your Intercom support chat history
An experiment: Fine tuning a ChatGPT model on your customer support chat history to provide automatic replies or suggestions.
Introduction & Goal
Fine-tuning a model has gotten significantly easier with GPT-3 compared to GPT-2. All it really takes nowadays is preparing a data set of prompts and completions, and OpenAI will fine-tune a model for you. I wanted to dig into how to fine-tune a model myself, so I decided to pick up a little side project.
At Magicul.io we use Intercom for our support chat and we've been doing so for the last 3 years. That means that all the replies we've manually sent out over the years are stored in our Intercom support chat history. Perfect material to train an AI model on, I was thinking…
Well, apparently I'm not the only one who had that idea. Intercom itself is working on its own AI chatbot called "Fin". Powered by GPT-4, it looks very promising. Apparently GPT-3.5 had too many issues and wasn't reliable enough to give direct answers to customers (read more about it here: https://www.intercom.com/blog/announcing-intercoms-new-ai-chatbot/). That being said, "Fin" isn't currently even available to the public; it doesn't seem to be quite ready yet. You can sign up for a wait list, but even after the public release I bet Intercom is going to make this a paid feature (duh).
So my idea was the following:
- Train my own model on all the support chat answers we've given manually over the last 3 years
- Don't reply to customers automatically, but rather give highly accurate suggestions
- Ideally this will reduce support workload and make it easier to onboard new support agents
So I started to dig into it…
Requirements
- Have a big chat history in Intercom: In order to fine-tune a model you need data, a lot of data… the more the better. So if you're reading this article with the plan to implement this yourself, you need a sizable data set.
- A paid OpenAI account: Since we'll be using OpenAI's API you'll need a paid account. A quick heads up: fine-tuning a model isn't cheap; I've spent about $6-10 per fine-tuning job.
Exporting your Intercom Chat History
Unfortunately there's no easy way to export and download your Intercom chat history. I was really hoping it would be as easy as signing into the admin panel and hitting a download button.
After looking into this for a while I found a few GitHub repositories. Some of them came close to what I had in mind:
- https://github.com/TheArtling/intercom-conversations-export
- A Python script that exports your Intercom conversations as txt files
- https://github.com/toch/intercom-export
- A Ruby script to export your Intercom data
I decided to give the Python script a try first. After fiddling around with it for a while I managed to export our existing conversations as txt files. Great!
There was only a small problem: In order to fine-tune the model later, we need the data as a JSONL file (yes, you read correctly, not JSON, but JSONL, see here: https://platform.openai.com/docs/guides/fine-tuning and https://jsonlines.org/). Additionally the data has to be in a prompt/completion format, meaning that we have to prepare the data further and potentially cut out the subsequent conversation parts after the initial customer question was answered.
Hereās an example of the structure that we need to fine tune the model later:
{"prompt": "Do you offer a free trial?", "completion": "No, but we have a 30 day money back guarantee, so you can go ahead and purchase one of our plans and always get your money back afterwards if you're not happy."}
{"prompt": "Do you share any data with 3rd parties?", "completion": "No, we don't share any data with any 3rd party. Our terms and conditions state that by using our service you solely grant us the right to access the data for the purpose of the conversion and nothing else."}
Given that we need a different output format and also potentially have to clean the exported Intercom data, I decided to write my own JavaScript script to export all conversations from Intercom.
I've uploaded the Intercom export script here: https://github.com/kgoedecke/intercom-exporter-node
Credit to @TheArtling, because my codebase is heavily influenced by their project.
A few words about what this script does. This is important, since the data we need is not exactly what comes back from the API:
- It retrieves all conversations from your Intercom account
- It then filters for only conversations that are customer initiated
- It then removes all replies except for the VERY FIRST one; this is critical since we don't want the whole conversation
- It then filters out empty replies and only keeps replies which were written by an admin/support agent
- Last step: It removes all HTML tags from the reply
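Sketched in JavaScript, the filtering steps above might look roughly like this. The function and field names here are illustrative, not the exact ones from the repository, though the fields follow the shape of Intercom's Conversations API:

```javascript
// Hypothetical sketch: turn one Intercom conversation into a
// prompt/completion training pair, or null if it doesn't qualify.
function toTrainingPair(conversation) {
  // Keep only conversations initiated by the customer
  if (conversation.source.delivered_as !== "customer_initiated") return null;

  // Take the very first non-empty admin reply and drop the rest of the thread
  const firstAdminReply = conversation.conversation_parts.find(
    (part) => part.author.type === "admin" && part.body
  );
  if (!firstAdminReply) return null;

  // Strip HTML tags from both the question and the reply
  const stripHtml = (html) => html.replace(/<[^>]*>/g, "").trim();

  return {
    prompt: stripHtml(conversation.source.body),
    completion: stripHtml(firstAdminReply.body),
  };
}
```

Pairs that come back as `null` (agent-initiated threads, conversations with no admin reply) simply get skipped when writing the JSONL file.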
Let's go ahead and see what this nifty script can do for us in practice.
In order to get started you need to get an Intercom API access token. Head over to the Intercom developer hub and get your token (see: https://developers.intercom.com/building-apps/docs/authentication-types#access-tokens).
Once you've got it, simply clone the Intercom Conversation Export repository from GitHub:
git clone https://github.com/kgoedecke/intercom-exporter-node
Afterwards copy the `.env.example` file to `.env` and change the values in it. For example your `.env` file should look like this:
INTERCOM_TOKEN = '2LDJwyV9S7HsKxAxLx7b2Yv3H0nSeuh='
OUTPUT_PATH = '/Users/kevin/Downloads'
Install all dependencies with `yarn` and then you can go ahead and simply run `yarn start` to get the conversation export started. This should print out something like the following:
yarn start
yarn run v1.22.19
$ node src/index.js
Next page URL: https://api.intercom.io/conversations?starting_after=WzE2ODE5MjY5NjkwMDAsMTYyNTAzODAwMjMzMzg4LDJd
Retrieving 2 of 485
Next page URL: https://api.intercom.io/conversations?starting_after=WzE2ODE4NDM3MjIwMDAsMTYyNTAzODAwMjMzNDc1LDNd
Retrieving 3 of 485
Once it has finished you'll find a file called `output.jsonl` in the directory you've specified in your `.env` file. Go ahead and open it to check what's in it.
Here are a few lines of what our file looks like:
{"prompt":"Hey wondering if your program works for Keynote to Figma too?","completion":"Hi there,\n\nThanks for reaching out. Unfortunately, we don't support Keynote to Figma conversion. Nonetheless, you can try our Powerpoint to Figma converter, since Keynote offers an option to export files as PowerPoint.\n\nI hope this information is helpful. Please don't hesitate to contact us if you have any further queries or concerns.\n\nBest regards,\nLokman"}
{"prompt":"Hi, the canva to figma converter appears to still be down. Could someone please convert this manually for me? Thanks","completion":"Hi there,\n\nSorry for the trouble. Yes, unfortunately, our Canva to Figma production server is down. We will convert your file locally and send you the file as soon as possible.\n\nBest regards,\nLokman"}
{"prompt":"Hi Support,I recently purchased a subscription at Magicul and already have two issues. Maybe you can assist.1. How do I download the converted file?I took the following steps:I uploaded a Canva file, then paid for the subscription, and logged in.Clicked Files.I see the copy (Download your converted files here) and see the converted file below.I see the option to Convert (again) but not download.2. It seems as though the same file was converted twice, so now I have one less credit. Thanks for your help.","completion":"Hi there,\n\nSorry for the inconvenience. Would you please send me the link to your Canva design? It seems the file did not convert due to a bug.\n\nLet me know if you have any questions.\n\nRegards,\nLokman"}
{"prompt":"can i get a trial of one file one artboard to see how well magicul converts from XD to Figma?","completion":"Hi there, \n\nYour converted file is ready. Please click here to download your file.\n\nLet us know if there's any issue.\n\nRegards,\nLokman"}
Cleaning your Chat History Data
Just by looking at the data that we've exported from Intercom it's already quite obvious that we'll have to do some cleaning here.
Here are some of the issues with the raw data:
- Data may contain sensitive information that should not be fed into ChatGPT
- Incomplete answers should be removed
- For now our export only contains the very first reply; if the support agent just replied with "Hello, how can I help?", that's not really useful as training data
All those things could potentially be implemented in the Intercom conversation export script. To keep this tutorial short, I've manually cleaned our dataset by removing sensitive data and useless replies.
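If you'd rather automate the cleanup, a rough pass over the exported JSONL could look like the sketch below. The word-count threshold and the email regex are arbitrary choices of mine, not part of the export script:

```javascript
// Hypothetical cleaning pass over the exported JSONL text:
// drop useless replies and redact email addresses.
function cleanJsonl(text) {
  return text
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line))
    // Drop pairs where the reply is too short to be useful
    // (e.g. "Hello, how can I help?")
    .filter(({ completion }) => completion.split(/\s+/).length > 8)
    // Redact email addresses so they never reach the fine-tuning job
    .map(({ prompt, completion }) => ({
      prompt,
      completion: completion.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[email]"),
    }))
    .map((pair) => JSON.stringify(pair))
    .join("\n");
}
```

Read `output.jsonl`, run it through `cleanJsonl`, and write the result back out before moving on to the fine-tuning step.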
Fine Tuning the ChatGPT model
So let's fine-tune our model with the data we've exported from Intercom.
In order to do so we'll be using the OpenAI Command Line Interface (CLI).
I would recommend you follow my tutorial on how to get the OpenAI CLI set up on your computer: https://kevingoedecke.com/2023/04/14/how-to-install-openai-cli-on-macos/
To verify that you've got it properly set up, simply run the following command:
openai --help
This should give you back the set of available commands from the OpenAI CLI.
The first thing we'll have to do is prepare the JSONL data. The OpenAI CLI has a command that takes care of that:
openai tools fine_tunes.prepare_data -f /Users/kevin/Downloads/output.jsonl
In my case this prompted me with the following:
openai tools fine_tunes.prepare_data -f /Users/kevin/Downloads/output.jsonl
Analyzing...
- Your file contains 1648 prompt-completion pairs
- Your data does not contain a common separator at the end of your prompts. Having a separator string appended to the end of the prompt makes it clearer to the fine-tuned model where the completion should begin. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples. If you intend to do open-ended generation, then you should leave the prompts empty
- Your data does not contain a common ending at the end of your completions. Having a common ending string appended to the end of the completion makes it clearer to the fine-tuned model where the completion should end. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples.
- The completion should start with a whitespace character (` `). This tends to produce better results due to the tokenization we use. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details
Based on the analysis we will perform the following actions:
- [Recommended] Add a suffix separator `\n\n###\n\n` to all prompts [Y/n]: Y
- [Recommended] Add a suffix ending ` END` to all completions [Y/n]: Y
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: Y
Say yes (type "Y") to all optimization suggestions.
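For clarity, here is what those three accepted suggestions do to a single pair, sketched in JavaScript (the CLI performs these transformations for you; this helper is just for illustration):

```javascript
// What prepare_data's three recommended changes do to one pair.
function applyPreparation({ prompt, completion }) {
  return {
    // Suffix separator so the model knows where the completion begins
    prompt: prompt + "\n\n###\n\n",
    // Leading whitespace (better tokenization) + common ending marker
    completion: " " + completion + " END",
  };
}
```

The same separator and ending come back later: when querying the fine-tuned model, append `\n\n###\n\n` to your prompt and use ` END` as the stop sequence.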
Afterwards we're ready to properly train our model with our exported Intercom chat history.
Execute the following (Note: make sure to use the prepared JSONL file from the previous step):
openai api fine_tunes.create -t "/Users/kevin/Downloads/output_prepared (1).jsonl"
This might take a while depending on how big your data set is.
After a bit of waiting around I got prompted with this:
$ openai api fine_tunes.create -t "/Users/kevin/Downloads/output_prepared (1).jsonl" -m davinci
Upload progress: 100%
Uploaded file from /tmp/output.jsonl: file-XXX
Created fine-tune: ft-XXX
Streaming events until fine-tuning is complete...
(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-04-18 14:58:17] Created fine-tune: ft-XXX
[2023-04-18 14:59:16] Fine-tune costs $6.94
As you can see it cost me $6.94 to fine tune my ChatGPT model with my existing Intercom records. This is obviously not cheap. The price highly depends on how many records you feed into the fine tuning and also which underlying model you use. I used `davinci`, but according to other posts out there you might be able to achieve similar results using a cheaper model like `ada`.
Alright, so now that we've finally managed to fine-tune our model, let's give it a spin!
Results – How well does it work?
You can now go ahead and try out your fine-tuned model by sending some prompts to the API via the CLI. To my surprise I was also able to see my fine-tuned model in the ChatGPT playground (https://platform.openai.com/playground).
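If you'd rather call the model from code than from the CLI, the request could be built like the sketch below. `buildCompletionRequest` is my own helper name and `ft-your-model` is a placeholder; the important bits are appending the same `\n\n###\n\n` separator that `prepare_data` added to the prompts, and stopping at ` END`:

```javascript
// Build the payload for OpenAI's completions endpoint for a
// fine-tuned model. Sending it requires your API key (see usage below).
function buildCompletionRequest(modelId, question) {
  return {
    model: modelId,
    // Same separator the prepare_data step appended to every prompt
    prompt: question + "\n\n###\n\n",
    max_tokens: 200,
    // Stop at the common ending so the completion doesn't ramble on
    stop: [" END"],
  };
}

// Example usage (Node 18+ ships fetch built in):
// fetch("https://api.openai.com/v1/completions", {
//   method: "POST",
//   headers: {
//     "Content-Type": "application/json",
//     Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
//   },
//   body: JSON.stringify(
//     buildCompletionRequest("ft-your-model", "Do you offer a free trial?")
//   ),
// }).then((r) => r.json()).then((d) => console.log(d.choices[0].text));
```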
Let's feed it some questions, shall we…
Most of our customer service requests are one of the following:
- Someone wants to know if we offer a free trial
- Someone has questions about one of our plans
- Someone has a problem with a file conversion (we offer a design file converter)
Here are some of the results after training ChatGPT.
It was interesting to see that ChatGPT tried to become one of our customer support agents (Lokman), and it was even funnier to see it giving away a 50% discount on the first payment. HAHA! Am I really OK with this?
Here's another very impressive one:
It seems to have managed to pick up which exact file formats our design file converter supports and in which plan it is included. Kudos ChatGPT, not bad at all!
This one is really impressive too, since it actually picked up very specific things about our business.
Generally speaking I was very impressed with the results. That being said it also gave some very bad answers. I think this is partly because of the input data on which we fine tuned the model.
Have a look at this one:
Here it pretty much replied to the customer with another question. Very odd.
Cost
Obviously cost is a big factor in all this. I ended up spending about $7 to fine tune our model. This can get significantly more expensive if your data set is bigger.
The requests afterwards follow the standard OpenAI ChatGPT pricing. As of writing this for ChatGPT 3.5 this is $0.002 / 1K tokens.
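As a quick sanity check on per-request cost, using the price quoted above (double-check OpenAI's pricing page, since fine-tuned model usage is billed differently from the base models):

```javascript
// Back-of-the-envelope cost per request: tokens used (prompt +
// completion) divided by 1000, times the per-1K-token price.
// The default price here is the GPT-3.5 figure quoted above and
// may not match what your fine-tuned model actually bills.
function estimateRequestCost(totalTokens, pricePer1kTokens = 0.002) {
  return (totalTokens / 1000) * pricePer1kTokens;
}
```

So a request using 1,500 tokens in total would come out to roughly $0.003 at that rate.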
Where to take it from here?
There is so much room for improvement here. I think if taken more seriously it would easily be possible to improve the fine-tuned model by cleaning up the input data further. It was very impressive to see how well ChatGPT can be trained on a data set that isn't gigantic.
Additionally it's possible to filter out certain replies using sentiment checks or other techniques.
Another interesting thing to look into would be searching through existing answers and rewriting them using ChatGPT. An approach here could be to turn all answers into embeddings, store them in a vector storage, and then, upon a customer inquiry, search the vector storage for similar answers. The results could then be passed into a ChatGPT prompt.
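As a tiny sketch of that embeddings idea: once every stored answer has a vector, finding candidates for a new inquiry is just a nearest-neighbor search. The helper below uses plain arrays and cosine similarity; in practice the vectors would come from an embeddings API and live in a real vector store:

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k stored answers most similar to the query vector.
function topMatches(queryVec, answers, k = 3) {
  return answers
    .map(({ text, vector }) => ({ text, score: cosineSimilarity(queryVec, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The texts of the top matches could then be pasted into a ChatGPT prompt as context for drafting the reply.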
I don't think a fine-tuned ChatGPT bot would be able to directly serve as a customer support agent, but I definitely think it would make a lot of sense to have it give suggestions on how to reply to customers.
Open Source Code / Repos
You can find the code for the Intercom exporter here: https://github.com/kgoedecke/intercom-exporter-node