The Data Oath

Inclusive Data

Inclusivity is about representing as many kinds of people as possible in our data and models, across genders, races, ethnicities, groups, preferences, and other factors. It’s about ensuring that the models we create reflect the real world and the users we are meant to serve through our work. It’s about diversity. It’s about working every day, whether as a company, a developer, or an executive, to include more people in your data and models.

Inclusivity is closely related to Equality in the Five Foundations of Ethical AI. Equality deals with how intelligent systems treat users in relation to other users, groups, genders, races, etc. If a user receives recommendations, suggestions, or outcomes that are out of line with those other groups receive, the system is biased and needs to be retrained, redesigned, or taken offline permanently. It follows that for a system to be equitable in its treatment of users, it has to be trained on diverse and inclusive data that represents the users and communities it seeks to serve.

So how do we ensure that the data and models we are using are inclusive? In much the same way that we verify whether a system is equitable. Systems should be designed to be Transparent & Explainable so that the suggestions, recommendations, and outcomes can be clearly understood and tested for bias. Transparency also helps identify what information is missing from the system or which parts of the models are causing suboptimal outputs.
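One simple way to test outcomes for bias is to compare how often each group receives a favorable outcome. The sketch below is only an illustration under stated assumptions: the field names (`group`, `approved`), the sample records, and the choice of a lowest-to-highest rate ratio as the measure are all hypothetical, and none of them is a method prescribed by The Data Oath. Real audits typically combine several measures.

```python
from collections import defaultdict

def outcome_rates(records, group_key, outcome_key):
    """Return the share of favorable outcomes for each group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}

def rate_ratio(rates):
    """Ratio of the lowest group rate to the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favorable outcome, so rates are equal
    return min(rates.values()) / highest

# Made-up records for illustration only; "group" and "approved" are
# hypothetical field names, not part of any real system.
records = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "A", "approved": True},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rates = outcome_rates(records, "group", "approved")
ratio = rate_ratio(rates)
print(rates)  # per-group share of favorable outcomes
print(ratio)  # values well below 1.0 suggest the groups are treated differently
```

A low ratio does not explain why the groups differ. It only flags where the transparency and explainability described above need to be put to work.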

Systems should be designed with the users’ quality of life and equality of treatment and outcomes at the center of the design process. Can the user gain access to their data and contest incorrect information? Can a user tell a company about other features that should be added to their data to make it more representative? Is the system designed from day one to evolve and learn to better serve society? We think any system that interacts with human users should be.

We also believe that there should be a registry of all the possible features that could make a data set more inclusive of everyone with regard to gender, race, ethnicity, education, socioeconomic status, and the myriad other factors that can be used to create a more inclusive data ecosystem. This is one of the projects we would like all of you to help us with at The Data Oath: help us start to define what should be included to make a data set inclusive when dealing with human users.

An inclusive data set primer could be used to help companies understand whether users or groups are being left out of their data set. If users or groups were found to be missing, companies and development teams would know where to start collecting more information and including these users and groups in their data sets and models.
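To make the idea concrete, here is a minimal sketch of how such a check might work, assuming the registry can be expressed as a mapping from each feature to the categories a data set is expected to cover. Everything in it is hypothetical: the `coverage_gaps` function, the placeholder features and categories, the sample records, and the `min_share` threshold are illustrative choices, not a proposed standard.

```python
from collections import Counter

def coverage_gaps(records, registry, min_share=0.05):
    """Compare a data set against a registry of expected categories.

    For each feature, report categories that never appear ("missing") and
    categories that appear in less than `min_share` of the records
    ("underrepresented"). The threshold is an arbitrary illustrative value.
    """
    gaps = {}
    total = len(records)
    for feature, expected in registry.items():
        counts = Counter(row.get(feature) for row in records)
        missing = [value for value in expected if counts[value] == 0]
        low = [
            value for value in expected
            if counts[value] > 0 and counts[value] / total < min_share
        ]
        if missing or low:
            gaps[feature] = {"missing": missing, "underrepresented": low}
    return gaps

# Hypothetical registry with placeholder features and categories.
registry = {
    "age_band": ["18-29", "30-49", "50-64", "65+"],
    "language": ["en", "es", "other"],
}

# Made-up data set of 20 records for illustration only.
records = (
    [{"age_band": "18-29", "language": "en"}] * 12
    + [{"age_band": "30-49", "language": "en"}] * 7
    + [{"age_band": "50-64", "language": "es"}] * 1
)

gaps = coverage_gaps(records, registry, min_share=0.10)
for feature, report in gaps.items():
    print(feature, report)
```

The report tells a team where to start collecting more information. Deciding which features and categories belong in the registry is the harder, shared work described above.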

Ultimately, inclusivity has to be worked on by everyone, from users to developers to company executives trying to balance making a profit with doing the right thing. We believe the Five Foundations of Ethical AI, as laid out here at The Data Oath, will give teams a useful resource for asking the right questions, finding the right answers, and serving all of us better.