Machine learning by intuition

Beyond explicit training, toward genuine learning

The process of using machine learning to edit images began with human volunteers viewing artificially generated images of human faces. Each participant was asked to perform tasks related to the image while their brain responses were recorded via electroencephalography (EEG). The tasks involved identifying a feature such as “smiling” or “old”. The researchers then looked in the EEG recordings for a characteristic spike in activity called P300, which occurs roughly 300 milliseconds after a stimulus.
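To make the idea of looking for a P300 spike concrete, here is a minimal sketch in Python. It is not the researchers' pipeline: the function name `p300_score`, the 250 to 500 millisecond scoring window, the sampling rate, and the synthetic data are all assumptions chosen for illustration. The sketch averages EEG epochs time-locked to a stimulus and compares the mean amplitude after the stimulus with the baseline before it.

```python
import numpy as np

def p300_score(epoch, sfreq, tmin=-0.2, window=(0.25, 0.5)):
    """Mean amplitude in a post-stimulus window minus the pre-stimulus baseline.

    epoch  : array whose last axis is time, sampled at sfreq Hz
    tmin   : time (s) of the first sample relative to the stimulus (assumed)
    window : interval (s) in which a P300-like deflection is expected (assumed)
    """
    times = tmin + np.arange(epoch.shape[-1]) / sfreq
    baseline = epoch[..., times < 0].mean(axis=-1)
    in_window = (times >= window[0]) & (times <= window[1])
    return epoch[..., in_window].mean(axis=-1) - baseline

# Synthetic demonstration: 40 noisy trials per condition, 1 s long each.
rng = np.random.default_rng(0)
sfreq = 250  # Hz, hypothetical sampling rate
times = -0.2 + np.arange(250) / sfreq

# A positive bump peaking 300 ms after the stimulus stands in for the P300.
bump = 5.0 * np.exp(-((times - 0.3) ** 2) / (2 * 0.05 ** 2))
target = bump + rng.normal(0.0, 1.0, size=(40, times.size))
nontarget = rng.normal(0.0, 1.0, size=(40, times.size))

# Averaging across trials suppresses noise that is not time-locked.
target_score = p300_score(target.mean(axis=0), sfreq)
nontarget_score = p300_score(nontarget.mean(axis=0), sfreq)
print(target_score > nontarget_score)
```

In this toy setup, images that match the task produce a clearly higher score than images that do not, which is the kind of contrast Ruotsalo describes in the quote that follows.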

As Tuukka Ruotsalo, professor of computer science at the University of Helsinki and one of the authors of the paper, explained, “By using this [P300], we can then understand when something on the screen evoked a stronger effect than something else.”

The image a user saw and the P300 spike data are the input for two models: one that deciphers the brain signal and the feature of interest, and a second called a generative adversarial network (GAN), a powerful tool for generating unique images from a source database and a concept already used to combat scientific image fraud. Together, the two models learn to perform an editing task based on what the user’s brain reacted to. When given a new image of a face, they can then transform features like the smile or hair color.
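One common way to edit images with a GAN is to move an image's latent vector along a direction associated with a feature. The sketch below assumes, for illustration only, that such a direction can be estimated as the difference between the average latent vector of images the brain reacted to and the average of those it did not. The article does not confirm that this is the authors' exact procedure, and the names `edit_direction` and `apply_edit`, the 8-dimensional latent space, and the simulated labels are all hypothetical. No real GAN is included; a trained generator would turn the edited vector back into an image.

```python
import numpy as np

def edit_direction(latents, relevant):
    """Mean latent of brain-relevant images minus mean latent of the rest."""
    latents = np.asarray(latents, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    return latents[relevant].mean(axis=0) - latents[~relevant].mean(axis=0)

def apply_edit(z, direction, strength=1.0):
    """Shift a latent vector along the unit-normalized edit direction."""
    return z + strength * direction / np.linalg.norm(direction)

rng = np.random.default_rng(1)
dim = 8  # hypothetical latent size; real GANs use far more dimensions
latents = rng.normal(size=(200, dim))

# Pretend latent axis 0 encodes the feature of interest (say, "smiling").
has_feature = latents[:, 0] > 0

# Brain-derived labels are noisy, so flip about 20% of them at random.
flipped = rng.random(200) < 0.2
relevant = has_feature ^ flipped

direction = edit_direction(latents, relevant)

# Edit a new, unseen latent vector; a generator would render the result.
z_new = rng.normal(size=dim)
z_edited = apply_edit(z_new, direction, strength=2.0)
print(int(np.argmax(np.abs(direction))))
```

Even with noisy labels, the estimated direction is dominated by the axis that encodes the feature, which hints at why a weak per-image brain signal can still add up to a usable editing instruction.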

The truly unique and exciting aspect of this method is that the machine is not trained explicitly to carry out the task. “The important thing is that the model itself doesn’t know anything about these tasks,” said Ruotsalo. The models learn, based on brain activity alone, what the task is. “These two models negotiate what it is that the humans react to and then they gain an understanding, in this case in the image space, to be able to do these transformations,” explained Ruotsalo.

Previous techniques train models to carry out specific functions, like moving a cursor, or to recognize features based on manually annotated databases. Allowing the computer to learn for itself, based solely on the brain’s natural reaction, vastly improves the adaptability of the model and extends its potential application to anything a human reacts to. “You could think of being able to pick up certain words or music features, or anything,” said Ruotsalo.

A two-way street

While we can learn to apply this technique to a variety of applications, at the same time the machine is learning a great deal about us, which raises some important ethical considerations.

“I think we have to be very careful on bringing in these new signals to applications where they might be misused,” said Ruotsalo. Online life is already heavily monitored and adding more data, such as brain activity, to the ever-growing database of behaviors we exhibit online further removes our privacy if we let it. “I think it’s a broader discussion of how we allow and what we consent to be done with these signals that can be recorded from us,” added Ruotsalo.

The natural response of our brain to stimuli we deem relevant or interesting is a powerful signal to harness. “We maybe wouldn’t like these signals to be used for advertising or other things,” said Ruotsalo. “I don’t want to see a world like that.”

The goal of this work, though, was to demonstrate both the potential and the pitfalls. “We really want to demonstrate what’s possible, but at the same time, raise awareness that this technology is there,” he said.

This means thinking about the policies to curb misuse. “We, as academics, should explore the possibilities, but at the same time demonstrate that this can be done so it also calls for policies and ethical guidelines on how it can be used,” said Ruotsalo.

Reference: T. Ruotsalo et al., ‘Brain-Supervised Image Editing’, CVPR (2022)