Generative Artificial Intelligence Models Leak Private Data

Adoption of ChatGPT has greatly increased in the months since the release of its fourth-generation version, and more than 100 million users are now signed up to the program.

This has been made possible by the platform's aggregation of over 300 billion items of text and other data, scraped from online sources like articles, posts, websites, journals and books.

Although OpenAI has developed and trained the ChatGPT model to operate within parameters intended to deliver useful output, analysts of the model say that this data is gathered without discriminating between fact and fiction and without regard to copyright status or data privacy.

Now, researchers from Northwestern University have published a study in which they explain how they could use keywords to trick ChatGPT into releasing training data that was not meant to be disclosed.

Although OpenAI has taken steps to protect privacy, everyday chats and postings leave a massive pool of data, much of it personal and not intended for widespread distribution. Generative AI platforms such as ChatGPT are built by data scientists through a training process in which the program, in its initial, unformed state, is subjected to billions of bytes of text, some of it from public Internet sources and some from published books.

The fundamental function of training is to make the program able to reproduce anything it is given access to, using what is essentially a compression technique. Once trained, a program could reproduce its training data when prompted by an enquiry containing only a very small amount of data.
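The idea that a short prompt can pull back stored training text can be illustrated with a toy example. The sketch below is only an analogy and is not how ChatGPT works: it assumes a tiny lookup-table "model" that remembers which word followed each pair of words in its training text. The function names `train` and `complete`, the context size `k`, and the sample sentence are all invented for illustration.

```python
def train(text, k=2):
    """Build a lookup table mapping each k-word context to the word that followed it."""
    words = text.split()
    table = {}
    for i in range(len(words) - k):
        context = tuple(words[i:i + k])
        # Keep the first continuation seen for each context.
        table.setdefault(context, words[i + k])
    return table


def complete(table, prompt, k=2, max_words=50):
    """Extend the prompt one word at a time using the memorised table."""
    out = prompt.split()
    for _ in range(max_words):
        next_word = table.get(tuple(out[-k:]))
        if next_word is None:
            break  # context never seen in training, so stop
        out.append(next_word)
    return " ".join(out)


# Hypothetical training text containing a fictional phone number.
training_text = "call jane doe on 555 0100 before noon"
table = train(training_text)

# Two words of prompt are enough to regenerate the whole training sentence.
print(complete(table, "call jane"))
```

Here the two-word enquiry "call jane" is enough for the toy model to reproduce the entire sentence it was trained on, including the number. Real language models are far more complex, but the example shows why memorised text can resurface from a very small prompt.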

The researchers said that they were able to extract over 10,000 unique verbatim memorised training examples using only $200 worth of queries to ChatGPT, adding: “Our extrapolation to larger budgets suggests that dedicated adversaries could extract far more data.” Indeed, they found that they could obtain names, phone numbers, and addresses of individuals and companies by feeding ChatGPT absurd commands that forced a malfunction.
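One way to decide whether a piece of model output is a "verbatim" copy is to look for long runs of words that also appear in a reference collection of known text. The sketch below is a minimal illustration of that idea and is not the researchers' own tooling: the function names, the word-level splitting, and the window length of 8 words are assumptions chosen to keep the example small.

```python
def word_ngrams(text, n):
    """Return the set of all runs of n consecutive words in the text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_verbatim_overlaps(output, reference_docs, n=8):
    """List every n-word window of the output that appears word for word in a reference document."""
    reference_ngrams = set()
    for doc in reference_docs:
        reference_ngrams |= word_ngrams(doc, n)

    words = output.split()
    hits = []
    for i in range(len(words) - n + 1):
        window = tuple(words[i:i + n])
        if window in reference_ngrams:
            hits.append(" ".join(window))
    return hits


# Hypothetical reference text and model output, for illustration only.
reference = ["the quick brown fox jumps over the lazy dog near the river bank"]
model_output = "model said the quick brown fox jumps over the lazy dog today"

for hit in find_verbatim_overlaps(model_output, reference):
    print("verbatim match:", hit)
```

A longer window reduces accidental matches on common phrases, while a shorter one catches smaller copied fragments. A check at real scale would need a far larger reference collection and a more efficient index than an in-memory set.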

For example, the researchers requested that ChatGPT repeat the word “poem” ad infinitum, which forced the model to reach beyond its training procedures and “fall back on its original language modelling objective” and tap into restricted details in its training data. They also reached a similar result by requesting infinite repetition of the word “company,” and managed to retrieve the email address and phone number of an American law firm.
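The behaviour described above, where the model repeats a word for a while and then drifts into other text, can be inspected with a simple check on a saved response. The sketch below is an illustrative assumption rather than the study's method: `split_divergence` finds the point where a response stops repeating the requested word, and `find_emails` scans the remaining text with a deliberately simple email pattern. Both function names and the sample response are invented.

```python
import re

# Deliberately simple pattern; it will not match every valid email address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def split_divergence(response, word):
    """Return (index, tail): the position of the first token that is not the
    repeated word, and the text from that token onwards."""
    tokens = response.split()
    for i, token in enumerate(tokens):
        # Strip punctuation so "poem," still counts as a repetition.
        cleaned = re.sub(r"\W+", "", token).lower()
        if cleaned != word.lower():
            return i, " ".join(tokens[i:])
    return len(tokens), ""  # the response never diverged


def find_emails(text):
    """Return all substrings of the text that look like email addresses."""
    return EMAIL_PATTERN.findall(text)


# Hypothetical response, with a made-up address on a reserved example domain.
response = "poem poem, poem Poem Contact us at info@example.com today"
index, tail = split_divergence(response, "poem")
print(index, tail)
print(find_emails(tail))
```

Anything found in the tail would still need to be compared against known text before it could be called memorised training data, since a model can also invent plausible-looking contact details.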

In response to potential unauthorised data disclosures, some companies placed restrictions on employee usage of large language models earlier this year. Rising concerns about data breaches caused OpenAI to add a feature that turns off chat history, adding a layer of protection to sensitive data. The problem is that such data is still retained for 30 days before being permanently deleted.

In conclusion, the researchers termed their findings “worrying” and said their report should serve as “a cautionary tale for those training future models,” warning that users “should not train and deploy LLMs for any privacy-sensitive applications without extreme safeguards.”

Sources: Northwestern Univ, SearchEngine Journal, I-HLS, ZDNet, TechXplore, New Scientist, Wired, Science Direct, Business Insider

Image: DeepMind
