AI needs to be taught how to forget information. Here’s why!
You must have heard this at some point: once something is on the internet, it can never truly be deleted. A picture in which you looked disheveled or an inappropriate tweet someone you know posted is now forever part of your digital footprint. Even if you delete it, there is no guarantee that someone hasn’t taken a screenshot or reposted it to some corner of the internet.
Artificial intelligence (AI) tools of all kinds are typically trained on data sourced from the internet, and as we discussed above, that data may include things the average person wouldn’t be proud to share. If the idea of an AI knowing everything about you sounds terrifying, don’t worry: machine unlearning is here to the rescue. Machine unlearning is an area of computer science that seeks to induce selective amnesia in AI tools, so that they can forget specific people or chunks of information without adversely affecting the tool’s functionality. Let’s take a closer look at why machine unlearning matters and why it has been such a challenge to achieve.
Why do machines need to be trained to forget?
One of the biggest reasons AI needs to unlearn information is the growing demand for the “right to be forgotten”: the ability to have information about you removed from the internet under certain circumstances. This is especially relevant in industries such as healthcare, where sensitive personal information is fed to AI systems during training.
Some efforts are already being made to protect people’s right to be forgotten. For instance, in December 2022, the European Union gave its citizens the right to have false information about them removed from Google and other search engines.
The need to have sensitive information deleted isn’t limited to individuals; companies and government bodies require it as well. If an AI tool is hacked, the sensitive data used to train it could be leaked and misused, with disastrous consequences for the company behind the AI.
As a result, regulators are pushing companies to be vigilant about the information held by their AI systems. For example, in 2021, the UK’s data regulator warned companies that their AI systems could be subject to data-deletion orders. That same year, the U.S. Federal Trade Commission (FTC) ordered the cloud photo storage app Ever to delete both user data and any algorithms trained on that data.
The challenges of machine unlearning
Now that we know why machine unlearning is important, the next question is: how can machines be made to unlearn? If you have ever tried to deliberately forget something, you know how hard that is, and it is just as hard for a machine. Once a piece of information is fed to an AI during training, its influence is spread diffusely across the model’s parameters; there is no single place where it resides, so it cannot simply be located and deleted.
Moreover, we don’t know exactly how any specific data point affects the model as a whole, so its influence cannot be surgically subtracted after the fact. As a result, if a system is required to forget some information, data scientists typically have to delete the offending data and rebuild the AI from scratch.
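To see why this is so expensive, here is a minimal numpy sketch of the naive approach. The model, data, and function names are illustrative, not any particular system: a simple least-squares model stands in for the AI, and “unlearning” a record means dropping it and retraining on everything that remains.

```python
import numpy as np

def train(X, y):
    # Ordinary least squares: the "model" is just the fitted weight vector.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def naive_unlearn(X, y, forget_idx):
    # The only guaranteed way to remove a record's influence is to drop it
    # and retrain from scratch on all the remaining data. For a large model,
    # this full retraining step is what makes forgetting so costly.
    keep = np.setdiff1d(np.arange(len(y)), forget_idx)
    return train(X[keep], y[keep])

# Toy dataset: 100 records, 3 features, known true weights plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w_full = train(X, y)
w_after = naive_unlearn(X, y, forget_idx=[0, 1, 2])  # forget three records
```

Note that `naive_unlearn` touches every remaining record even though only three were forgotten; with millions of records and days of training, that cost becomes prohibitive.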
Efforts towards machine unlearning
Even though the process has its fair share of challenges, there have been attempts to build machine learning systems capable of forgetting information. For instance, in 2015, researchers Yinzhi Cao and Junfeng Yang, who originally coined the term “machine unlearning”, devised an approach to remove a machine learning algorithm’s dependency on individual pieces of training data, and it yielded positive results on four machine learning systems.
Similarly, in 2019, researchers from the University of Toronto in Canada and the University of Wisconsin-Madison in the U.S. proposed segregating the data fed to a machine-learning system into multiple disjoint parts, with each part processed separately. If a data point has to be deleted later, only the part that contains it needs to be retrained, leaving the others undisturbed.
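The sharding idea above can be sketched in a few lines of numpy. This is a simplified illustration of the general technique, not the researchers’ actual implementation: the shard count, the least-squares “models”, and the averaging step are all assumptions made for the sake of a runnable toy.

```python
import numpy as np

N_SHARDS = 4  # hypothetical number of data partitions

def train(X, y):
    # Per-shard model: ordinary least-squares weights.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def train_sharded(X, y):
    # Split the records into disjoint shards and train one model per shard.
    idx = np.array_split(np.arange(len(y)), N_SHARDS)
    models = [train(X[i], y[i]) for i in idx]
    return idx, models

def forget(X, y, idx, models, record):
    # Only the shard containing the record is retrained (on ~1/N of the
    # data); the other shards and their models stay untouched.
    for s, shard in enumerate(idx):
        if record in shard:
            keep = shard[shard != record]
            idx[s] = keep
            models[s] = train(X[keep], y[keep])
    return idx, models

def predict(X, models):
    # Aggregate the shard models by averaging their predictions.
    return np.mean([X @ w for w in models], axis=0)

# Toy usage: train on 100 noiseless records, then unlearn record 42.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
idx, models = train_sharded(X, y)
idx, models = forget(X, y, idx, models, record=42)
```

The design trade-off is that retraining after a deletion now costs roughly 1/N of a full retrain, at the price of each model seeing only a fraction of the data.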
However, these approaches are still in their nascent stages, and neither addresses the question of how to decide what should be remembered and what must be forgotten. It may therefore take some time before machine unlearning becomes commonplace. Until then, users of AI systems, whether individuals or companies, must shoulder the responsibility for what data they share with these systems in the first place.
Header image courtesy of Envato