Machine Learning From Crowds: A Systematic Review Of Its Applications

by WIREs Authors | Oct 30, 2018

Crowdsourcing opens the door to a wide variety of machine learning problems that were previously infeasible, allowing us to obtain relatively low-cost labeled data in a short amount of time.

In their WIREs Data Mining and Knowledge Discovery review, authors Rodrigo, Aledo, and Gámez analyze a large number of applications dealing with crowdsourced data in different fields, such as Bioinformatics, Computer Vision, and Natural Language Processing. With the recent appearance of crowdsourcing platforms such as Amazon Mechanical Turk, many machine learning practitioners have expressed interest in using them to increase the efficiency and scope of their work. Problems that were too expensive to tackle with traditional methods become more affordable, while problems that were previously infeasible become tractable.

In this way, crowdsourcing opens the door to solving a wide variety of problems that were previously infeasible in the field of machine learning, allowing scholars to obtain relatively low-cost labeled data in a short amount of time. However, the use of crowdsourcing for data acquisition presents important challenges for machine learning, such as how to identify the most capable contributors, or the most challenging examples, so that resources can be allocated more efficiently. These problems become increasingly important as datasets grow, because more contributors are needed and fewer resources are available for each data acquisition task.
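
To make the contributor-quality challenge concrete, here is a minimal sketch in Python. It is not taken from the review: it assumes simple majority voting as the aggregation rule, and the names `crowd_labels`, `majority_vote`, and `annotator_agreement`, along with the toy data, are hypothetical. It combines redundant crowd labels into a consensus label per item and then measures how often each contributor agrees with that consensus, which can serve as a rough proxy for contributor capability.

```python
from collections import Counter, defaultdict


def majority_vote(labels_by_item):
    """Pick the most frequent label for each item.

    Ties are broken in favor of the label that was seen first.
    """
    return {
        item: Counter(votes.values()).most_common(1)[0][0]
        for item, votes in labels_by_item.items()
    }


def annotator_agreement(labels_by_item, consensus):
    """Fraction of each annotator's labels that match the consensus."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item, votes in labels_by_item.items():
        for annotator, label in votes.items():
            totals[annotator] += 1
            hits[annotator] += int(label == consensus[item])
    return {annotator: hits[annotator] / totals[annotator] for annotator in totals}


# Hypothetical toy data: item -> {annotator -> label}
crowd_labels = {
    "img1": {"ann_a": "cat", "ann_b": "cat", "ann_c": "dog"},
    "img2": {"ann_a": "dog", "ann_b": "dog", "ann_c": "dog"},
    "img3": {"ann_a": "cat", "ann_b": "dog", "ann_c": "dog"},
}

consensus = majority_vote(crowd_labels)
agreement = annotator_agreement(crowd_labels, consensus)
```

Agreement with the majority is only a heuristic, since the majority itself can be wrong. Approaches that model each contributor's reliability explicitly are a common alternative.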

In the review, the authors survey several problems that have been tackled using crowdsourcing. They focus mainly on the techniques used in these applications and on how crowds are employed in each one. They also analyze interest in the field, in terms of the growing number of applications and the platforms most commonly used to collect data for them. Finally, they outline future lines of research addressing the most common needs that arise when developing solutions that learn from crowdsourced data.

Kindly contributed by the Authors.