

Why it is important to include all kinds of knowledge in data labeling

Charlie Child
Apr 30, 2021



If you’ve read about data labeling, you’ve probably assumed that data labelers fit into a ‘blue-collar’ category of work. Headline articles tend to focus on the mass data labeling taking place across the globe, particularly in countries that are less economically developed. Data labeling is often regarded as a menial task performed at scale. It helps machines develop a basic understanding of the world, such as the recognition of bicycles, cars, or pedestrians. While this may be what data labeling looks like today, it is unlikely to be its future. We must not ignore the different gradations of knowledge, or the possibility that specialized knowledge could enable deeper, more precise machine learning.

Until now, there have been few means of accessing different qualities of knowledge for data labeling and human feedback. Although there are many examples of specialists, such as doctors, labeling data, there is not yet an effective platform to manage such tasks. In this article, we explore how HUMAN Protocol can help Requesters to access all segments of the knowledge pyramid.


The knowledge pyramid

For clarity, we will refer to the knowledge pyramid, which rises from data at its base, through information and knowledge, to wisdom at its peak. Traditional views of data labeling focus on the bottom two sections of the pyramid. Data in this context is defined as:

“Symbols that represent properties of objects, events and their environments. They are products of observation.”

Data leads naturally into information, which relates to the ‘who, what, where, when, and how many’ of a field. These two categories map cleanly onto a regular captcha application: human observation (data) leads to a label (information).

Moving up the pyramid

Moving up the pyramid is not a case of finding better knowledge. As the shape of a pyramid suggests, it is a case of finding more rarefied categories of knowledge, held by fewer people within a populace.

So far, the focus for machine intelligence has been to create a common understanding of the everyday world: the ability to recognize objects and patterns. Other factors have reinforced this focus: the limits of data labeling platforms, and the commercial priorities of early machine-learning applications, such as advertising and improved automation (including driverless cars).

What we see further up the pyramid is specialized knowledge of more complex, or at least rarer, forms of meaning. Such knowledge could belong to a physician who knows what a polyp looks like on a colonoscopy.

Wisdom implies a non-linear capacity for judgement. While it may seem out of tune with the linearity of machine learning, wisdom can also be simplified: it is the consequence of deep knowledge, experience, and rationality, all of which are qualities that can be mapped onto machine learning. So while we cannot have a ‘wise’ machine, by accessing the wisest labelers, a machine will implicitly benefit from their observations, even if they are seemingly non-linear. In other words, a machine can reflect wise learnings.

Changing the landscape of work

HUMAN Protocol allows for all kinds of people to be paid to answer all kinds of questions. By helping to revolutionize the gig economy, and by freeing gig workers from dependency on centralized services such as Uber and Deliveroo, we can change the landscape of work. For more on how HUMAN Protocol intends to achieve this, and to provide greater freedom, opportunity, and choice to gig workers, read our recent article on the gig economy. 

Not only does HUMAN Protocol offer work with fewer constraints (you only need a laptop to label data), but we also believe that increased access to the knowledge pyramid helps gig workers to have the full extent of their value represented within global labor marketplaces. The tokenization of all kinds of labor also means that different kinds of value can be acknowledged, including specialized knowledge and expertise.

Diversity in workforces helps marketplaces grow. HUMAN Protocol supports the growth of global workforces that represent each section of the knowledge pyramid. The two go hand-in-hand: more diverse tasks to be done, and a more diverse workforce to do them. And to enable a new generation of AI and ML technologies, we must be able to supply machines with a more detailed, rarefied understanding of the world, both of its surface and of its deeper levels.

For the latest updates on HUMAN Protocol, follow us on Twitter or join our community Telegram channel.

Legal Disclaimer

The HUMAN Protocol Foundation makes no representation, warranty, or undertaking, express or implied, as to the accuracy, reliability, completeness, or reasonableness of the information contained here. Any assumptions, opinions, and estimations expressed constitute the HUMAN Protocol Foundation’s judgment as of the time of publishing and are subject to change without notice. Any projection contained within the information presented here is based on a number of assumptions, and there can be no guarantee that any projected outcomes will be achieved.

Guest post