The furor surrounding ChatGPT remains at a fever pitch as the ins and outs of the AI chatbot’s potential continue to make headlines. One issue that has caught the attention of many in the security field is whether the technology’s ingestion of sensitive business data puts organizations at risk. There is some fear that if one inputs sensitive information — quarterly reports, materials for an internal presentation, sales numbers, or the like — and asks ChatGPT to write text around it, that anyone could gain information on that company simply by asking ChatGPT about it later.
On March 22, OpenAI CEO Sam Altman confirmed reports of a ChatGPT glitch that allowed some users to see the titles of other users’ conversations. On March 20, users began to see conversations appear in their history that they said they hadn’t had with the chatbot. Altman said the company feels “awful” but the “significant” error has now been fixed.
“We had a significant issue in ChatGPT due to a bug in an open-source library, for which a fix has now been released and we have just finished validating. A small percentage of users were able to see the titles of other users’ conversation history,” Altman said.
The implications of chatbots remembering and learning from user input could be far-reaching: Imagine working on an internal presentation that contained new corporate data revealing a corporate problem to be discussed at a board meeting. Letting that proprietary information out into the wild could undermine stock price, consumer attitudes, and client confidence. Even worse, a legal item on the agenda being leaked could expose a company to real liability. But could any of these things actually happen just from things put into a chatbot?
ChatGPT doesn’t store users’ input data — does it?
The UK’s National Cyber Security Centre (NCSC) shared further insight on the matter in March, stating that ChatGPT and other large language models (LLMs) do not currently add information automatically from queries to models for others to query. That is, including information in a query will not result in that potentially private data being incorporated into the LLM. “However, the query will be visible to the organization providing the LLM (so in the case of ChatGPT, to OpenAI),” it wrote.
“Those queries are stored and will almost certainly be used for developing the LLM service or model at some point. This could mean that the LLM provider (or its partners/contractors) are able to read queries and may incorporate them in some way into future versions,” it added. Another risk, which increases as more organizations produce and use LLMs, is that queries stored online may be hacked, leaked, or accidentally made publicly accessible, the NCSC wrote.
Ultimately, there is genuine cause for concern regarding sensitive business data being inputted into and used by ChatGPT.
Likely risks of inputting sensitive data to ChatGPT
LLMs exhibit an emergent behavior called in-context learning. During a session, as the model receives inputs, it can become conditioned to perform tasks based upon the context contained within those inputs. This is likely the phenomenon people are referring to when they worry about information leakage. However, it is not possible for information from one user’s session to leak to another’s. Another concern is that prompts entered into the ChatGPT interface will be collected and used in future training data.
Although it’s valid to be concerned that chatbots will ingest and then regurgitate sensitive information, a new model would need to be trained in order to incorporate that data. Training LLMs is an expensive and lengthy procedure, and he says he would be surprised if a model were trained on data collected by ChatGPT in the near future. If a new model is eventually created that includes collected ChatGPT prompts, our fears turn to membership inference attacks. Such attacks have the potential to expose credit card numbers or personal information that were in the training data. However, no membership inference attacks have been demonstrated against the LLMs powering ChatGPT and other similar systems. That means it’s extremely unlikely that future models would be susceptible to membership inference attacks, though it’s possible that the database containing saved prompts could be hacked or leaked.
Third-party linkages to AI could expose data
Issues are most likely to arise from external providers who do not explicitly state their privacy policies, so using them with otherwise secure tools and platforms can put any data that would be private at risk. SaaS platforms such as Slack and Microsoft Teams have clear data and processing boundaries and a low risk of data being exposed to third parties. However, these clear lines can quickly become blurred if the services are augmented with third-party add-ons or bots that need to interact with users, irrespective of whether they are linked to AI. In the absence of a clear explicit statement where the third-party processor guarantees that the information will not leak, you must assume it is no longer private.
Aside from sensitive data being shared by regular users, companies should also be aware of prompt injection attacks that could reveal previous instructions provided by developers when tuning the tool or make it ignore previously programmed directives. Recent examples include Twitter pranksters changing the bot’s behavior and issues with Bing Chat, where researchers found a way to make ChatGPT disclose previous instructions likely written by Microsoft that should be hidden.
To see the original full article, click here.