Always wanting more makes humans better learners

Pursuit of happiness vs. survival

The researchers ran computer simulations based on reinforcement learning, a method in which learning is driven by rewards, using virtual robots as the learners. Within a video game-like setup, the robots received rewards that reinforced desired behaviors and penalties that discouraged others. This format mimics human behavior: we are driven to take actions that bring us closer to benefits and to avoid those that lead to harm.
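The reward-driven learning described above can be illustrated with a minimal sketch. It uses tabular Q-learning, one common reinforcement learning algorithm; the article does not say which algorithm the study used, so the choice of Q-learning, the function names, and the parameters `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration.

```python
import random


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (illustrative, not the study's code).

    q maps (state, action) pairs to estimated long-run value.
    A positive reward raises the estimate for the action just taken;
    a negative reward lowers it.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: mostly exploit the best-known action,
    occasionally explore a random one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

In this sketch, repeated updates make rewarded actions more likely to be chosen and penalized actions less likely, which is the basic loop the article describes.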

In this experiment, the virtual robot’s reward depended on its performance in comparison to both other robots and its own prior expectations. The robots were placed in grid worlds with food, poison, and other impediments, such as sinkholes or walls. Robots that collected the most food within a set time, without stumbling into obstacles, were considered successful.
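A reward that depends on prior expectations and on comparison could look something like the sketch below. The article does not give the study's formula, so the weighted-sum form, the weights, and the names `subjective_reward` and `update_expectation` are hypothetical and chosen only to make the idea concrete.

```python
def subjective_reward(objective, expectation, reference,
                      w_objective=1.0, w_expect=0.5, w_compare=0.5):
    """Hypothetical reward mixing three signals (not the study's formula).

    objective:   what the robot actually got (e.g. food collected)
    expectation: what the robot had come to expect from past experience
    reference:   the standard it compares itself to (e.g. another robot)
    """
    return (w_objective * objective
            + w_expect * (objective - expectation)   # surprise vs. expectation
            + w_compare * (objective - reference))   # gap vs. comparison standard


def update_expectation(expectation, objective, rate=0.1):
    """Habituation as a running average: the expectation drifts toward
    recent outcomes, so a constant payoff feels less rewarding over time."""
    return expectation + rate * (objective - expectation)
```

With these hypothetical weights, a robot that collects 1 unit of food while expecting 1 and comparing itself to a peer that collected 3 receives a subjective reward of 0, even though its objective outcome was positive.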

Robots that quickly learned their environments and could find the food no matter where they were placed were considered good performers. Since these were simulations, the researchers could tweak the virtual environments without much effort. For instance, the robot’s behavior could be tested in environments with either generous or scarce amounts of food.

“Across several different environments, we found that the agent whose reward depended on prior expectations and comparisons was more ‘unhappy’ but paradoxically, it performed significantly better compared to the agent whose reward function didn’t depend on these features,” said Dubey.

When robots used comparison, they were more likely to explore their surroundings. In environments with fewer food options, or where the grid world changed randomly, prior expectations proved helpful. But as performance improved, discontent grew.

Beyond a certain point, comparisons had a detrimental effect on performance. “In our experiments, if an agent constantly compared itself to very lofty standards, then it was very unhappy and performed very poorly,” said Dubey. Constant comparison and a plethora of similar options led to dissatisfaction, negatively impacting the robot’s performance. The researchers suggest that limiting options may make decision making easier, even raising happiness levels.

Whether these findings translate to the more complex real world remains to be seen. Moreover, the study did not assess how other states, such as guilt, jealousy, boredom, and anxiety, interact with happiness.

Since habituation and comparison are helpful in learning and exploration, they may explain why humans repeatedly crave an upgrade. “This might explain our modern obsession with growth at all costs and why our consumption levels have increased so dramatically and are not showing any signs of slowing down,” said Dubey.