Data leakage is a cybersecurity concern as old as humanity itself. It is the unintentional or intentional transfer of restricted information into the wrong hands. Such information includes company secrets and Personally Identifiable Information (PII) protected under regulations like HIPAA and GDPR. Organizations in every industry, regardless of size, deal with this problem, and according to Cymulate's Annual Usage Report, the many attempts to limit or block data leakage have been unsuccessful. ChatGPT, OpenAI's generative AI platform, adds a new channel for leakage: it offers a human-like interface that can answer complex questions and learn from its interactions.
However, this technology poses a threat to sensitive and confidential data, because users feed the system information every day, and some of it may be PII or company-confidential data shared by users unaware of the risk. While OpenAI warns against sharing sensitive or confidential data, it is difficult to prevent users from accidentally violating laws, regulations, and corporate policies. The AI may ingest and learn from this information, and could potentially draw on it when answering unrelated questions from other users. OpenAI is the leading vendor of this type of AI, but it is not the only organization offering such a platform. The result is a number of data lakes built from countless queries and sources, much of which was never intended to be shared with the world.
Despite OpenAI's warning, it is almost impossible to prevent users from accidentally or deliberately sharing controlled information. Several countries, including Italy and Syria, have prohibited or restricted ChatGPT's use, in part due to OpenAI's inability to keep users from sharing controlled information and the difficulty of enforcing regulations like GDPR. Organizations concerned about data leakage through ChatGPT must therefore limit their own exposure to accidental or deliberate data sharing. Technical options are limited, and implementing them can be difficult and costly. The most direct technical control is to block all known ChatGPT websites on company networks with a firewall or proxy that filters by domain and/or IP, though this approach is not perfect.
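The domain-filtering idea can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular firewall or proxy product: the `BLOCKED_DOMAINS` set and the function names are assumptions made for this example, and the two domains listed are illustrative entries rather than a complete list.

```python
from urllib.parse import urlsplit

# Illustrative entries only (an assumption for this sketch); a real
# blocklist must be maintained and kept current by the organization.
BLOCKED_DOMAINS = {"chat.openai.com", "chatgpt.com"}


def is_blocked(hostname, blocked=BLOCKED_DOMAINS):
    """Return True if the hostname or any of its parent domains is blocked."""
    host = hostname.strip().rstrip(".").lower()
    labels = host.split(".")
    # Test the full name, then each parent domain: a.b.c -> b.c -> c
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def is_url_blocked(url, blocked=BLOCKED_DOMAINS):
    """Extract the hostname from a URL and apply the blocklist to it."""
    hostname = urlsplit(url).hostname
    return hostname is not None and is_blocked(hostname, blocked)
```

Matching on whole domain labels rather than substrings means that a subdomain of a blocked domain is also blocked, while an unrelated domain that merely contains the same characters is not. A filter like this only covers the domains it knows about, which is one reason the approach is imperfect.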
Advanced Data Loss Prevention (DLP) and Cloud Access Security Broker (CASB) systems may be of limited help. Since the channel in question is effectively a chat box, monitoring all user communications from corporate networks to the outside world presents several challenges, such as inspecting traffic inside TLS-encrypted (SSL) website sessions and handling potential privacy issues. The most accessible way to control data leakage is user education and guidance, though its effectiveness depends on user compliance. Clear guidance on the impact of sharing data with AI interfaces like ChatGPT can go a long way toward stemming inadvertent leaks via what looks like a harmless chatbot. Organizations should consult technical and legal advisors to stay within both the law and the limits of the technology, and should continuously test their controls with platforms like Cymulate to confirm that they work as intended.
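To make the DLP idea concrete, the sketch below scans a piece of outbound text for a few common PII patterns. It is a simplified illustration under stated assumptions, not how any specific DLP or CASB product works: the pattern set, the labels, and the function names are hypothetical, and a check like this can only run where the text is visible in plaintext, such as on the endpoint or in a proxy that inspects TLS traffic.

```python
import re

# Hypothetical pattern set for this sketch; real DLP rules are far broader.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(candidate):
    """Apply the Luhn checksum to the digits found in a candidate string."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return the set of pattern labels that appear in the text."""
    hits = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Digit runs are only reported when they pass the Luhn check,
            # which filters out many order numbers and similar IDs.
            if label == "card_number" and not luhn_valid(match.group()):
                continue
            hits.add(label)
    return hits
```

For example, `find_sensitive("Contact jane.doe@example.com about SSN 123-45-6789")` returns a set containing `"email"` and `"us_ssn"`. Pattern matching of this kind produces both false positives and false negatives, which is part of why such tools offer only limited help and why user education remains important.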