The Big (Data) Gender Bias
By Rebecca Sewall with the Creative Development Lab Team
May 4, 2021
Development implementers are using “big data” from cellphones and remittances to track the flow of refugees and migrants. Through this data, they can model the scope and scale of a “problem” and provide real-time information that enables program design teams to adapt interventions. This data may even provide the details that are needed for evidence-based development decision-making.
All of this is good; however, we must remember that such data is usually generated from a sub-set of the population. For example, women are not equally represented in these data samples because they are significantly less likely to be carrying mobile phones, be connected to the internet or active on social media. Therefore, the use of this data automatically excludes significant proportions of the population – not only women, but the elderly, the poor, certain ethnic groups – the list goes on. At this stage in global digital transformation, big data suffers from representation or sample bias and should be used with extreme caution to inform development decision-making.
Big data – particularly passive data such as social media and cellphone metadata or call detail records – is one of the primary model inputs or training data for Artificial Intelligence (AI) and Machine Learning (ML).
Fortunately, USAID has begun a much-needed discussion on gender bias in AI and its effect on development initiatives. Identifying the insidious ways gender bias creeps into the AI lifecycle and correcting this bias is crucial to ensure that future programming promotes inclusive development. This blog focuses on getting AI’s underlying data right.
Gender Bias in Development Data
Development actors are also producers of data sets via project data collection efforts. USAID now requires all project data be entered into its Development Data Library in “machine-readable” formats. One may logically assume that this data may be used in AI/ML applications. This would be a good thing if the data going into the AI applications were free from bias. However, development data has too long been plagued by gender bias.
The use of samples where men are disproportionately represented is so prevalent in our project-level data sets that many of us do not even notice it. There are practical reasons why this happens.
First, it is easier to reach men. In many developing countries, women are less present in public spaces – whether it be the result of their “time poverty” (they are much more likely to be at home, performing household chores, caring for children, the sick and the elderly) or due to gender norms. Second, women tend to be more protected. Social norms often restrict male–female interaction and make families hesitant to welcome interviewers into the home. These norms also make it more likely that a male will be present if an interview takes place, and much more likely that women will be less forthcoming in discussions where men are present. In larger group settings, women are more likely to be deferential to men.
Whether data is collected through traditional means or drawn from big data sources, the bias in favor of men in sampling data is often left unchallenged. We justify using this data by telling ourselves that this is the best we can get. Thus, we provide space for gender bias to impact the way we interpret the data. Biased data leads to programming that meets the needs of a few (usually men) and reinforces and propagates existing gender inequalities, all while giving the illusion of advancing development objectives.
Addressing Gender Bias in Using Big Data
Closing the digital divide may one day make big data less vulnerable to interpretive bias. In the meantime, it is our responsibility to first acknowledge the limitations of big data in capturing on-the-ground reality and in reflecting the behavior of women and others who are less likely to be carrying mobile phones or using the internet.
We then need to avoid making programming decisions based solely on data that we know is biased in favor of a sub-group of the population. These are hardly big tech solutions, but if successfully implemented, they would minimize the potential for gender bias to creep into the way we use big data and lead to more inclusive programming decisions.
Correcting for the overrepresentation of men in data sets should be done as a matter of data quality and best practice, just as it would be if there were other types of distortions in the data. The dataset needs to be reviewed to ensure that it is representative – in that it has equal numbers of males and females. If it is skewed in favor of men, then the project must collect more data from women to correct for the underrepresentation of women.
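As a rough illustration of the review described above, the following Python sketch counts male and female records in a sample and reports how many more records from women would be needed to reach equal numbers. The labels "male" and "female", the function name and the 5 percent tolerance are assumptions made for this example, not part of any USAID standard.

```python
from collections import Counter

def review_gender_balance(sexes, tolerance=0.05):
    """Summarize the male/female split of a sample.

    `sexes` is a list of "male"/"female" labels (hypothetical coding).
    `tolerance` is an illustrative threshold for calling a sample balanced.
    """
    counts = Counter(sexes)
    women = counts.get("female", 0)
    men = counts.get("male", 0)
    total = women + men
    share_women = women / total if total else 0.0
    return {
        "women": women,
        "men": men,
        "share_women": share_women,
        # Balanced if the share of women is within `tolerance` of one half.
        "balanced": total > 0 and abs(share_women - 0.5) <= tolerance,
        # Extra records from women needed to match the number of men.
        "more_women_needed": max(men - women, 0),
    }

# Made-up sample: 70 records from men, 30 from women.
summary = review_gender_balance(["male"] * 70 + ["female"] * 30)
print(summary)
```

A check like this only flags the imbalance; closing the gap still depends on going back to the field and reaching the women who were missed.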
Taking positive steps
Today, we have an increasingly granular understanding of the extent of the digital divide and digital access. We could apply this information to develop statistical pointers to enable data collectors or publishers to adjust for sampling bias in specific countries.
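One possible form such an adjustment could take is post-stratification weighting, a standard survey technique in which each group's records are weighted by the ratio of its population share to its sample share. The sketch below is an assumption about how this might look, not a method prescribed here; the function name and all figures are invented for illustration.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per group: population share / sample share.

    `sample_counts` maps group -> number of records in the sample.
    `population_shares` maps group -> share of the real population (sums to 1).
    Both inputs are hypothetical; real shares would come from census or
    digital-access statistics for the country in question.
    """
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        if count == 0:
            # No weight can stand in for a group that was never sampled.
            raise ValueError(f"no records for group {group!r}")
        weights[group] = population_shares[group] / (count / total)
    return weights

# Made-up example: a sample that is 70% men drawn from a 50/50 population.
weights = poststratification_weights(
    {"male": 70, "female": 30},
    {"male": 0.5, "female": 0.5},
)
print(weights)
```

Weighting of this kind only rescales the women who do appear in the data. It cannot speak for women who are absent from the sample altogether, which is why it complements rather than replaces additional data collection.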
If nothing else, we should require that all datasets carry warning labels that clearly state what proportion of men and women are represented in the sample. The warning would caution those who use the data about its limitations and make it harder for development practitioners to “unconsciously” take data from a sub-set of the population and extrapolate the findings to the entire community.
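A warning label of this kind could be generated automatically when a dataset is published. The short Python sketch below shows one way to do it; the function name and the wording of the label are hypothetical and would need to be agreed by the data publisher.

```python
def coverage_label(n_men, n_women):
    """Build a plain-text warning stating the gender split of a sample."""
    total = n_men + n_women
    if total == 0:
        raise ValueError("cannot label an empty sample")
    pct_men = round(100 * n_men / total)
    pct_women = 100 - pct_men  # keeps the two figures summing to 100
    return (
        f"WARNING: sample is {pct_men}% men and {pct_women}% women "
        f"(n={total}). Findings may not generalize to the full population."
    )

# Made-up counts for illustration.
print(coverage_label(70, 30))
```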
In scenarios where data publishers and users are known to each other, there is another simple way to expose bias: to ensure that all parties – those who collect data and those who use it – develop a shared understanding of the data, its limitations and what conclusions can be drawn from it.
In my experience as a Gender and Inclusion Advisor, this exercise has several benefits, the most important of which is interrupting “business as usual” and the tendency to rely on biased data. This creates space for project teams to acknowledge the need to develop programming that is responsive to the needs and interests of those communities that are rendered invisible when using such data. Without such a process, project teams often do not recognize the need to consider women or gender at all – especially if they don’t consider it a “gender” project.
We also recommend the collaborative production of a Compendium on Gender Bias in AI for Development to expose where and how gender bias manifests and discuss the potential negative outcomes in development programming sectors. This can be helpful to prompt discussion and raise awareness of gender bias in data for development decision-making.
We do not have all the answers, but we do know that the more we unpack and analyze data sources and inputs into AI/ML models, build in processes that force us to collectively explore how we interpret them, and use them carefully to inform our decision-making, the more equitable our outcomes will be.
Rebecca Sewall is the Senior Advisor, Gender and Inclusion. Members of the Creative Development Lab, which promotes and channels new strategies to address development challenges through science, technology and media, contributed to this blog.