Democratizing data: why HUMAN Protocol is important to the world

HUMAN Blog
Fundamentals
Charlie Child
Feb 11, 2021

Software is simply the encoding of human thought — Chris Dixon

You may not realize it, but every time you interact with an Artificial Intelligence product — such as photo tagging suggestions — you are benefitting from the unseen labor of hundreds of thousands, or even millions, of data labellers.

Traditionally, AI products are created by Google, Apple, Facebook, and Amazon (GAFA), and the data they use to train their products is consolidated within the company silos. GAFA create centralized services for billions of users, and harvest the generated data to train AI products. For other Machine Learning practitioners, it is difficult to access sufficient data for both training and quality control.

HUMAN Protocol democratizes data. Anyone can request data work from a diverse selection of distributed labor pools, amounting to hundreds of millions of workers worldwide. It is the largest accessible labor market in the world, and it is all run by software.

We need to be careful about the data we use to train our machines. If the data does not reflect a diverse and representative population, the algorithms trained on it, and the AI products they power, can produce unintended consequences.

In this article we will look at how HUMAN Protocol, through its decentralized, permissionless, and open platform, offers a solution.

Why do companies need more data?

The more people involved in a labelling project, the stronger the consensus. Likewise, an AI application that learns from ten million data points is more likely to produce a better product than one that learns from only one million. This is because ML is a form of pattern recognition: the more data it has, the more accurately it can identify patterns.

Therefore, when it comes to data, quantity is a quality of its own.
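The link between the number of labellers and the strength of the consensus can be made concrete with a small calculation. The sketch below is an idealization that does not come from HUMAN Protocol itself: it assumes every labeller is independently correct with the same probability (the hypothetical parameter p_correct), and the function name majority_correct_probability is invented for illustration.

```python
from math import comb


def majority_correct_probability(n_labellers: int, p_correct: float) -> float:
    """Probability that a strict majority of labellers picks the right label.

    Illustrative assumptions: each labeller is correct with the same
    probability p_correct, and labellers make their errors independently.
    """
    if n_labellers < 1 or n_labellers % 2 == 0:
        raise ValueError("use an odd, positive number of labellers to avoid ties")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")

    # Smallest number of correct votes that forms a strict majority.
    needed = n_labellers // 2 + 1

    # Sum the binomial probabilities of every outcome with a correct majority.
    return sum(
        comb(n_labellers, k)
        * p_correct ** k
        * (1.0 - p_correct) ** (n_labellers - k)
        for k in range(needed, n_labellers + 1)
    )


if __name__ == "__main__":
    # Compare pools of different sizes when each labeller is right 70% of the time.
    for n in (1, 3, 5, 11, 51):
        print(n, round(majority_correct_probability(n, 0.7), 4))
```

Under these assumptions, adding labellers only helps when each one is right more often than not. If p_correct falls below 0.5, for example because the whole pool shares the same blind spot, a larger pool makes the majority more confidently wrong, which is why quantity alone does not remove bias.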

Data labelling is when a human associates a label with a raw piece of data. For example, a doctor receives a request to label the cancerous growths on images of skin. The doctor would mark most growths accurately, but might overlook a malignant mole. These ‘labelled’ images are now ready for ML: a model trained on them can be fed other images of skin and spot cancerous growths.

Tackling bias

What if the doctor works with predominantly white patients, and is therefore not so astute at recognising growths on patients of other ethnicities? That bias is imprinted on the data, which is then fed into the author’s algorithm. Now, the ‘progressive’ AI product does not recognise skin cancer on non-white skin. It is worth remembering that:

An author is only as good as their data.

Of course, we may hope these issues will be ironed out through a cycle of trial, iteration, and improvement. But there is no guarantee of improvement. In fact, if data continues to be limited in quality and quantity, there is a risk of exacerbating biases, as AI models continue to work on historical data. The sooner we take hold of these data problems, the better. There is an industry-wide risk of negligence, as cited in the North Carolina Medical Journal:

“High-profile examples of harmful or inadequate performance will bring extra scrutiny on the whole field and may retard the further development of even more robust AI systems.”

HUMAN Protocol increases the number of labellers, so that any prejudice (and there will inevitably be prejudice in an imperfect society) is mitigated by the algorithm. A prejudiced label is more likely to be an outlier, and therefore ignored, because a broader consensus can be found in the data.
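One simple way to picture how an outlier label gets ignored is a majority vote over several labellers' answers for the same item. The sketch below is a generic illustration, not HUMAN Protocol's actual aggregation method; the function consensus_label and the min_agreement threshold are hypothetical names chosen for this example.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.5):
    """Return (label, agreement) for the most common vote.

    The label is None when the share of labellers backing the top answer
    does not exceed min_agreement (a hypothetical threshold), so a tie or
    a weak plurality is flagged instead of being accepted.
    """
    if not votes:
        raise ValueError("votes must not be empty")

    # Count how many labellers chose each label and take the most frequent.
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)

    if agreement <= min_agreement:
        return None, agreement
    return label, agreement


# Five labellers review the same skin image; one disagrees with the rest.
votes = ["malignant", "malignant", "benign", "malignant", "malignant"]
label, agreement = consensus_label(votes)
print(label, agreement)
```

In this sketch the single dissenting vote does not change the outcome, and an item with no clear majority returns None so it can be sent for further review rather than silently accepted.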

Consensus can mitigate bias, but only if the participants form a diverse cross-section of society. This is the context in which HUMAN’s access to the largest distributed labor pool in the world offers a genuine solution.

HUMAN is not a silver bullet to fix all the problems facing AI. It does, however, provide an open, transparent, and equitable foundation that gives companies the choice and incentive to improve their data practices. Only then can we hope to create a fairer future for everyone.

For the latest updates on HUMAN Protocol, follow us on Twitter or join our community Telegram channel.

Legal Disclaimer

The HUMAN Protocol Foundation makes no representation, warranty, or undertaking, express or implied, as to the accuracy, reliability, completeness, or reasonableness of the information contained here. Any assumptions, opinions, and estimations expressed constitute the HUMAN Protocol Foundation’s judgment as of the time of publishing and are subject to change without notice. Any projection contained within the information presented here is based on a number of assumptions, and there can be no guarantee that any projected outcomes will be achieved.

Guest post