Generative Artificial Intelligence Models Leak Private Data

Adoption of ChatGPT has increased greatly in the months since the release of its fourth-generation version and, right now, more than 100 million users are signed up to the program.

This has been made possible by the platform's aggregation of over 300 billion items of text and other data, scraped from online sources such as articles, posts, websites, journals and books.

Although OpenAI has developed and trained the ChatGPT model to operate within parameters intended to deliver useful output, analysts of the model say that this data is gathered without discrimination between fact and fiction, and without regard to copyright status or data privacy.

Now, researchers from Northwestern University have published a study in which they explain how they could use keywords to trick ChatGPT into releasing training data that was not meant to be disclosed.

Although OpenAI has taken steps to protect privacy, everyday chats and postings leave a massive pool of data, much of it personal and not intended for widespread distribution. Generative AI platforms such as ChatGPT are built by data scientists through a training process in which the program, in its initial, unformed state, is exposed to billions of bytes of text, some of it from public Internet sources and some from published books.

The fundamental function of training is to make the program reproduce anything that it is given access to, using what is essentially a compression technique. Once trained, a program could reproduce parts of its training data when only a very small amount of data is submitted as an enquiry, prompting the relevant response.
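To make the idea of memorisation concrete, here is a deliberately tiny sketch. It is not how ChatGPT is built: it is a toy word-lookup model, and the names `train`, `complete` and the sample sentence are invented for illustration. It only shows how a short prompt can pull a longer stretch of training text back out of a model that has stored it.

```python
from collections import defaultdict

def train(text, order=3):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    table = defaultdict(list)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])].append(words[i + order])
    return table

def complete(table, prompt, order=3, max_words=20):
    """Extend the prompt using the first continuation seen in training."""
    out = prompt.split()
    for _ in range(max_words):
        key = tuple(out[-order:])
        if key not in table:
            break  # nothing memorised for this context
        out.append(table[key][0])
    return " ".join(out)

# Hypothetical training text containing a made-up contact detail.
corpus = "please contact Jane Example at jane@example.com for a quote today"
model = train(corpus)

# A three-word prompt is enough to recover the whole sentence.
print(complete(model, "please contact Jane"))
```

Real language models generalise far more than this lookup table does, but the researchers' point is that some training text can still be recovered word for word.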

The researchers said that they were able to extract over 10,000 unique verbatim memorised training examples using only $200 worth of queries to ChatGPT, adding: “Our extrapolation to larger budgets suggests that dedicated adversaries could extract far more data.” Indeed, they found that they could obtain names, phone numbers, and addresses of individuals and companies by feeding ChatGPT absurd commands that forced a malfunction.
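As a rough illustration of how extracted text might be screened for contact details, the sketch below scans a string for email addresses and US-style phone numbers. The patterns, the function name `find_contact_details` and the sample text are assumptions made for this example; the article does not describe the researchers' own tooling, and simple patterns like these will miss many real formats.

```python
import re

# Simplified patterns, chosen for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}")

def find_contact_details(text):
    """Return any email addresses and phone numbers found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }

# Made-up model output; the firm, number and address are fictional.
sample = ("poem poem poem Call Example Law LLP on (555) 010-0199 "
          "or mail info@example.com.")
result = find_contact_details(sample)
print(result)
```

A screen of this kind only flags candidates. Confirming that a flagged string really came from the training data would need a separate comparison against known source text.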

For example, the researchers requested that ChatGPT repeat the word “poem” ad infinitum, which forced the model to reach beyond its training procedures and “fall back on its original language modelling objective” and tap into restricted details in its training data. They also reached a similar result by requesting infinite repetition of the word “company,” and managed to retrieve the email address and phone number of an American law firm.
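The repeated-word behaviour described above can be pictured as a response that starts with the requested word and then drifts into unrelated text. The sketch below is a hypothetical helper, `split_at_divergence`, that finds where a response stops repeating the word and returns whatever follows. The sample response is invented, and no call to any real chatbot service is made.

```python
def split_at_divergence(output, word):
    """Return (repeat_count, tail): how many times `word` was repeated
    at the start of `output`, and the text that follows the repeats."""
    tokens = output.split()
    count = 0
    for tok in tokens:
        # Ignore surrounding punctuation and letter case when comparing.
        if tok.strip(".,!?;:\"'").lower() != word.lower():
            break
        count += 1
    return count, " ".join(tokens[count:])

# Invented response that repeats the word and then wanders off.
response = "poem poem poem, Poem Contact us at the office"
repeats, tail = split_at_divergence(response, "poem")
print(repeats)  # number of leading repeats
print(tail)     # text produced after the model stopped repeating
```

The tail is the part worth inspecting, since that is where, according to the researchers, memorised training text surfaced.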

In response to potential unauthorised data disclosures, some companies placed restrictions on employee usage of large language models earlier this year. Rising concerns about data breaches caused OpenAI to add a feature that turns off chat history, adding a layer of protection for sensitive data. The problem is that such data is still retained for 30 days before being permanently deleted.

In conclusion, the researchers termed their findings “worrying” and said their report should serve as “a cautionary tale for those training future models,” warning that users “should not train and deploy LLMs for any privacy-sensitive applications without extreme safeguards.”
