Selective forgetting can help AI learn better

The original version of this story appeared in Quanta Magazine.

A team of computer scientists has created a nimbler, more flexible type of machine learning model. The trick: it must periodically forget what it knows. And while this new approach won’t replace the huge models that underpin the biggest applications, it could reveal more about how these programs understand language.

The new research marks “a significant advance in the field,” said Jea Kwon, an AI engineer at the Institute for Basic Science in South Korea.

The AI language engines used today are mostly powered by artificial neural networks. Each “neuron” in the network is a mathematical function that receives signals from other neurons, performs a calculation, and sends signals onward through multiple layers of neurons. Initially the flow of information is more or less random, but through training, the flow of information between neurons improves as the network adapts to the training data. If an AI researcher wanted to create a bilingual model, for example, he or she would train the model with a large stack of text from both languages, which would adjust the connections between neurons in a way that links words in one language with their equivalents in the other.
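To make this concrete, here is a minimal sketch, not the researchers’ code, of the kind of architecture the article describes: an embedding layer that turns tokens into vectors, deeper layers that pass signals onward, and an output layer that maps back to the vocabulary. The class name, layer sizes, and structure are illustrative assumptions.

```python
# Illustrative sketch only -- names and sizes are assumptions, not the
# model from the research described in this article.
import torch
import torch.nn as nn

class TinyLanguageModel(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 64):
        super().__init__()
        # First layer: token embeddings -- the part later "forgotten."
        self.embedding = nn.Embedding(vocab_size, d_model)
        # Deeper layers: where more abstract, language-general
        # patterns are thought to live.
        self.body = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
            nn.ReLU(),
        )
        # Output layer: maps back to scores over the vocabulary.
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids)  # signals enter at the embedding layer
        x = self.body(x)               # flow through the deeper layers
        return self.head(x)            # logits over the vocabulary
```

Training such a model on text from both languages would nudge all of these weights, which is what makes the resulting connections expensive to obtain and awkward to change after the fact.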

But this training process requires a lot of computing power. If the model doesn’t work very well, or if the user’s needs change later, it is hard to adapt. “Say you have a model that has 100 languages, but imagine one language you want isn’t covered,” said Mikel Artetxe, a co-author of the new research and founder of the AI startup Reka. “You could start over from scratch, but that’s not ideal.”

Artetxe and his colleagues tried to get around these limitations. A few years ago, Artetxe and others trained a neural network on one language, then erased what it knew about the building blocks of words, called tokens. These are stored in the first layer of the neural network, called the embedding layer. They left all the other layers of the model alone. After erasing the tokens of the first language, they retrained the model on a second language, which populated the embedding layer with new tokens from that language.
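In code, that forgetting step might look something like the sketch below, which builds on the hypothetical TinyLanguageModel above: re-initialize the embedding layer for the new language’s vocabulary, freeze the deeper layers (one reading of “left alone”), and retrain only the new parameters on second-language text. The function name, the replaced output head, and the training loop are assumptions for illustration, not the authors’ method.

```python
# Hedged sketch of the "forget and retrain" idea; details are assumptions.
import torch
import torch.nn as nn

def forget_and_retrain(model: TinyLanguageModel,
                       new_vocab_size: int,
                       new_lang_batches,  # iterable of (inputs, targets)
                       steps: int = 1000) -> TinyLanguageModel:
    d_model = model.embedding.embedding_dim
    # Erase the first language's tokens: fresh, randomly initialized
    # embeddings for the new language's vocabulary.
    model.embedding = nn.Embedding(new_vocab_size, d_model)
    # Assumption: the output layer is also swapped for the new vocabulary.
    model.head = nn.Linear(d_model, new_vocab_size)
    # Leave the deeper layers alone: freeze their weights.
    for p in model.body.parameters():
        p.requires_grad = False
    # Retrain only the new parameters on second-language text.
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _, (inputs, targets) in zip(range(steps), new_lang_batches):
        opt.zero_grad()
        logits = model(inputs)
        loss = loss_fn(logits.view(-1, new_vocab_size), targets.view(-1))
        loss.backward()
        opt.step()
    return model
```

The design choice the sketch highlights is the asymmetry the article describes: only the first layer is rebuilt, while everything the deeper layers learned is carried over unchanged.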

Even though the model contained incompatible information, the retraining worked: the model was able to learn and process the new language. The researchers hypothesized that while the embedding layer stored information specific to the words used in a language, the deeper levels of the network stored more abstract information about the concepts behind human languages, which then helped the model learn the second language.
