Democratizing data: why HUMAN Protocol is important to the world
Software is simply the encoding of human thought — Chris Dixon
You may not realize it, but every time you interact with an Artificial Intelligence product — such as photo tagging suggestions — you are benefitting from the unseen labor of hundreds of thousands, or even millions, of data labellers.
Traditionally, AI products are created by Google, Apple, Facebook, and Amazon (GAFA), and the data they use to train their products is consolidated within the company silos. GAFA create centralized services for billions of users, and harvest the generated data to train AI products. For other Machine Learning practitioners, it is difficult to access sufficient data for both training and quality control.
HUMAN Protocol democratizes data. Anyone can request data work from a diverse selection of distributed labor pools, amounting to hundreds of millions of workers worldwide. It is the largest accessible labor market in the world, and it is all run by software.
We need to be careful about the data we use to train our machines. If the data does not represent a diverse and appropriate population, the algorithms trained on it, and the AI products they power, can produce unintended consequences.
In this article we will look at how HUMAN Protocol, through its decentralized, permissionless, and open platform, offers a solution.
The more people who contribute to the labelling, the more reliable the consensus. Likewise, an AI application learning from ten million data points is more likely to produce a good product than one using only one million, all else being equal. This is because ML is a form of pattern recognition; the more data it has, the more accurately it can learn the pattern.
Therefore, when it comes to data, quantity is a quality of its own.
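To make the point about consensus concrete, here is a minimal sketch of how the reliability of a majority vote grows with the number of labellers. It rests on idealized assumptions that are not taken from HUMAN Protocol itself: the task is a yes/no label, every labeller is independently correct with the same probability, and the function name `majority_accuracy` is purely illustrative.

```python
from math import comb


def majority_accuracy(n_labellers: int, p_correct: float) -> float:
    """Probability that a strict majority of labellers gives the right answer.

    Assumes a binary label and independent labellers who are each correct
    with probability p_correct (an idealization, not a measured figure).
    """
    if n_labellers < 1 or n_labellers % 2 == 0:
        raise ValueError("use an odd, positive number of labellers to avoid ties")
    needed = n_labellers // 2 + 1  # smallest number of votes that forms a majority
    return sum(
        comb(n_labellers, k) * p_correct**k * (1 - p_correct) ** (n_labellers - k)
        for k in range(needed, n_labellers + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-labeller accuracy of 70%.
    for n in (1, 3, 5, 11, 51):
        print(n, round(majority_accuracy(n, 0.7), 4))
```

Under these assumptions, the pooled answer becomes more reliable as labellers are added, provided each labeller is right more often than wrong. If the typical labeller is wrong more often than right, adding more of them makes the consensus worse, which is why the make-up of the pool matters as much as its size.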
Data labelling is when a human creates an association between a word and a raw piece of data. For example, a doctor receives a request to label the cancerous growths on images of skin. The doctor would correctly mark some growths, and possibly overlook a malignant mole. These ‘labelled’ images are now ready for ML; a machine trained on them can be fed other images of skin and spot cancerous growths.
What if the doctor works with predominantly white patients, and is therefore less adept at recognising growths on patients of other ethnicities? That bias is imprinted on the data, which is then fed into the algorithm. Now the ‘progressive’ AI product does not recognise skin cancer on non-white skin. It is worth remembering that:
An algorithm is only as good as its data.
Of course, we may hope these issues will be ironed out through a cycle of trial, iteration, and improvement. But there is no guarantee of improvement. In fact, if data continues to be limited in quality and quantity, there is a risk of exacerbating biases, as AI models continue to work on historical data. The sooner we address these data problems, the better. Negligence carries a risk for the whole industry, as noted in the North Carolina Medical Journal:
“High-profile examples of harmful or inadequate performance will bring extra scrutiny on the whole field and may retard the further development of even more robust AI systems.”
HUMAN Protocol increases the number of labellers, so that any individual prejudice (and there will inevitably be prejudice in an imperfect society) can be mitigated when the labels are aggregated. A prejudiced label is more likely to be an outlier, and therefore outvoted, because a broader consensus can be found in the data.
Consensus can mitigate bias, but only if the participants form a diverse cross-section of society. This is the context in which HUMAN’s access to the largest distributed labor pool in the world offers a genuine solution.
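As an illustration of how a consensus can absorb an outlier, here is a minimal sketch of majority-vote label aggregation. It is not HUMAN Protocol's actual mechanism; the function `consensus_label`, its `min_agreement` threshold, and the example labels are all hypothetical.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.5):
    """Return (label, agreement) for the most common vote.

    The label is None when the top vote's share does not exceed
    min_agreement, meaning no clear consensus was reached.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)  # share of labellers backing the top label
    if agreement <= min_agreement:
        return None, agreement
    return label, agreement


if __name__ == "__main__":
    # Five hypothetical labellers look at the same skin image; one disagrees.
    votes = ["malignant", "malignant", "benign", "malignant", "malignant"]
    print(consensus_label(votes))
```

The single dissenting label is outvoted here. Note the limit of the approach: if most labellers share the same blind spot, the majority vote simply reproduces it, which is why the diversity of the pool is the deciding factor.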
HUMAN is not a silver bullet to fix all the problems facing AI. It does, however, provide an open, transparent, and equitable foundation that gives companies the choice and incentive to improve their data practices. Only then can we hope to create a fairer future for everyone.
The HUMAN Protocol Foundation makes no representation, warranty, or undertaking, express or implied, as to the accuracy, reliability, completeness, or reasonableness of the information contained here. Any assumptions, opinions, and estimations expressed constitute the HUMAN Protocol Foundation’s judgment as of the time of publishing and are subject to change without notice. Any projection contained within the information presented here is based on a number of assumptions, and there can be no guarantee that any projected outcomes will be achieved.