Tackling Gender Bias Using Analytics

Using SQLite and Arctype visualizations to explore a US Bureau of Labor Statistics dataset with the goal of learning more about the gender wage gap.

4 months ago   •   8 min read

By Igor Bobriakov
Table of contents

Introduction

Chances are, when you hear "0.77 cents on the dollar," an increasingly contentious subject— the gender pay gap—comes to mind.  Today, we will explore a US Bureau of Labor Statistics dataset containing information about America's workforce (as of Jan. 2015) to answer the following questions:

  • Which industries are most and least affected?
  • Is the gender wage gap the same across all industries?
  • How much does an average American salary vary by gender?

To answer these questions, we'll be importing the dataset mentioned above (available on Kaggle) into an Arctype SQLite database, where we'll use queries and dashboards to manipulate and interpret the data.

Exploring the dataset

The dataset contains 558 instances (or rows) and seven attributes (or columns). Before moving on to statistics, we’ll need to load our dataset into Arctype. Arctype is an easy-to-use SQL client for PostgreSQL, MySQL, and SQLite. With Arctype, you can create tables and visualize the dataset without the hassle of coding.

To import a CSV file as a table, simply select ‘Tables’ within any workspace, and click the upload button, located with the other table controls:

arctype import CSV to table button
Simply click 'Import CSV' and follow the prompts!

Follow this tutorial for more information.

After successfully importing the CSV data, a new tab containing the table will appear in your workspace. The Wage Gap dataset looks something like this:

gender gap data table import with null values
With Arctype, importing CSV data is easier than ever!

Removing NULL Values from Our SQLite Dataset

Before we move on to data analysis, and to avoid discrepancies while performing a mathematical operation, we need to check our data for any null values, which can be performed using a variation of this query formula:

DELETE FROM table_name WHERE condition;

Let’s plug in our values to delete the rows where either m_weekly or f_weekly is 'Na'

DELETE FROM wageinfo WHERE 'Na' in (m_weekly, f_weekly)

Adding Calculated Columns to an SQLite Table

We now have 142 rows left—let’s add some new columns to our table that calculate basic statistics for our male-female comparison:

  1. m_share - Ratio of male workers to the total number of workers
  2. f_share - Ratio of female workers to the total number of workers
  3. non_weighted_all_weekly - Non-weighted average of the weekly income of the two genders
  4. gap - Difference between the weekly income of male and female workers
  5. ratio - Proportion of weekly income of female to male workers
  6. ratio_of_workers - Overall ratio of female to male workers

Adding new columns in SQLite is fairly simple.  Just plug in your table name, desired column name, data type, and a default value (if you want):

ALTER TABLE table_name
ADD column_name data_type default_value(optional);

We will substitute the above query with the following values:

ALTER TABLE wageinfo ADD m_share FLOAT NULL;
ALTER TABLE wageinfo ADD f_share FLOAT NULL;
ALTER TABLE wageinfo ADD non_weighted_all_weekly FLOAT NULL;
ALTER TABLE wageinfo ADD gap INT NULL;
ALTER TABLE wageinfo ADD ratio FLOAT NULL;
ALTER TABLE wageinfo ADD ratio_of_workers FLOAT NULL;

Note that in SQLite, we can add only one column at a time. The repetition is, therefore, necessary to add the remaining columns.

We have initially assigned a default value of NULL to all the new columns. Now, let’s populate these columns as per their formulas:

UPDATE wageinfo 
	SET m_share = CAST(m_workers AS FLOAT)/all_workers;
UPDATE wageinfo 
	SET f_share = CAST(f_workers AS FLOAT)/all_workers;
UPDATE wageinfo 
	SET non_weighted_all_weekly = CAST(m_weekly AS FLOAT)+f_weekly/2;
UPDATE wageinfo 
	SET gap = m_weekly - f_weekly;
UPDATE wageinfo 
	SET ratio = CAST(f_weekly AS FLOAT)/m_weekly;
UPDATE wageinfo 
	SET ratio_of_workers = CAST(f_workers AS FLOAT)/m_workers;

The CAST as FLOAT command is important because division would result in values less than 1. Since the default type for these columns is INTEGER or NUMBER, the new column would have 0 as their final rounded value.

Visualizing Statistics Using Arctype Charts

Now, it's time to interpret our dataset using visualizations! Let's start by looking at occupations with the most equal and unequal incomes based on gender.

Comparing The Most Equal vs. Unequal Fields with Horizontal Bar Charts

The fields with a ratio closer to 1 can be seen as more equal fields, while a ratio value closer to 0 indicates a substantial discrepancy Using the SQL query below, we can sort the data in descending order to get the top 10 occupations with the highest proportion of female to male income:

SELECT
	LOWER(wageinfo.occupation) AS occupation, 
	wageinfo.ratio 
FROM 
	wageinfo
ORDER BY 
	wageinfo.ratio DESC 
LIMIT 
	10;
The fast and easy-to-use SQL client for developers and teams

The output should look something like this:

Similarly, to get the most unbalanced fields, we can simply reverse the sorting order and get the top 10 results.

SELECT
	LOWER(wageinfo.occupation) AS occupation, 
 	wageinfo.ratio 
FROM 
	wageinfo
ORDER BY 
	wageinfo.ratio ASC 
LIMIT 
	10;

To compare the ratios side by side, let's create a couple of horizontal bar charts displaying these ratios. After running the query, select 'Chart' from the options at the top of the query window. A list of chart types will appear on the right side of the app:

arctype chart types menu
Arctype offers a variety of visualization options

Select the 'Horizontal Bar Chart' option and fill in the labels for X-axis and Y-axis. The resulting charts should look like this:

arctype most equal fields for men and women horizontal bar graph
The most equal fields in terms of wages

...and this:

arctype most unequal fields for men and women horizontal bar graph
Fields where women get paid substantially less

You can save these charts by clicking 'Add these charts to Dashboard' at the top-right.

More Horizontal Bar Graphs: Visualizing Female Representation in Different Occupations

To retrieve the fields with the largest share of women, we will sort the data by f_share in descending order. For the fields with the smallest share of women, reverse the sorting order and extract the top 10:

SELECT
  LOWER(wageinfo.occupation) AS occupation,
  wageinfo.f_share * 100 AS f_share
FROM
  wageinfo
ORDER BY
  wageinfo.f_share DESC (or ASC for smallest share)
LIMIT
  10;

Note: we have multiplied f_share by 100 in order to output values as percentages rather than decimals.

We will use a horizontal bar chart for this query as well. Following the same procedure, and the final charts should look like this:

arctype horizontal bar graph highest female worker representation by occupation
Fields with the highest percentage of female workers

...and this:

arctype horizontal bar graph lowest female worker representation by occupation
These results match our overall observations from everyday life

The fields in which women have the smallest share are not necessarily the fields with the most income. Therefore, looking at the problem with gender as a driving force—from this angle, at least—can be ambiguous and possibly misleading.

We can, however, draw a comparison between the share of both men and women in the most as well as least paying fields.

Using Stacked Horizontal Bar Charts to Visualize Gender Breakdown in High-Income Occupations

We will now extract both m_share and f_share in a single query and compare them for each occupation using a query like this:

SELECT
  LOWER(wageinfo.occupation),
  wageinfo.non_weighted_all_weekly,
  wageinfo.f_share * 100 AS '% Female',
  wageinfo.m_share * 100 AS '% Male'
from
  wageinfo
order by
  wageinfo.non_weighted_all_weekly DESC (ASC for lowest incomes)
LIMIT
  10;

Select the option of a Stacked Horizontal Bar Chart with f_share and m_share on the X-axis and occupation on the Y-axis. The resulting graphs are as follows:

arctype horizontal stacked bar graph gender breakdown in high-income occupations
Gender bias in the workplace becomes more visible with this graph
arctype horizontal stacked bar graph gender breakdown in low-income occupations
Almost an inverse of the chart above!

We can see that the most paying fields are dominantly male-driven. The contrast is especially stark for occupations such as Chief Executives, Engineers, etc. While the situation is better in the least paying fields, in which women tend to represent a larger portion of the workforce, we can still notice that fields like Agriculture are heavily male-driven.  More importantly, however, female representation is (on the whole) higher as income decreases, which would indicate that the wage gap is in fact real.

Comparing Average Incomes for Men and Women Using Multivariable Bar Graphs

We can also compare the average wages of men and women working in the same field against the overall average weekly income. This will give us some sense of whether or not the wages actually differ by gender.

First, let's extract the most paying fields sorted by the average weekly income in descending order:

SELECT
  LOWER(wageinfo.occupation),
  wageinfo.m_weekly AS 'Male',
  wageinfo.f_weekly AS 'Female',
  wageinfo.all_weekly AS 'Overall'
FROM
  wageinfo
ORDER BY
  wageinfo.all_weekly DESC
LIMIT
  10;

Select Bar Chart from the Chart Menu. Set occupation as X-label while plotting Female, Male, and Overall on Y-axis. We should get the following output:

arctype multivariable bar graph comparing men and women's weekly income
Women earn anywhere from 'slightly less' to 'substantially less' than average in each of the 10 highest paying occupations.

Conclusion

In this article, we explored the wage gap in America by comparing meaningful statistics such as weekly income, shares, etc. Using SQL and Arctype to conduct analysis, we've determined the most- and least- affected fields (see ‘Most Equal vs. Unequal Fields'). The corresponding bar graph shows that occupations like Wholesale and Healthcare enjoy a fair distribution of female to male workers. On the other hand, fields such as legal and finance are substantially more skewed.

However, our analysis also indicates that occupations with considerable gender imbalances—both predominantly female and male—do not necessarily correlate with the highest- and lowest-paying jobs (see ‘Female Representation in Different Occupations’).

Therefore, we instead looked into the gender breakdown by occupation, which helped us gain insight into whether the wage gap is equal across all industries. The stacked bar charts here (see 'Gender Breakdown in High- and Low-Income Occupations') clearly show a male-dominated atmosphere, especially in higher-paying fields. On the other hand, fields with lower wages such as cashiers and housekeeping are primarily female-driven. While the wage gap differs by industry, most of them still skew substantially towards males.

To answer the last question—regarding how income varies with gender—we extracted the overall average weekly income for each occupation, and then compared it with the male and female weekly income (see 'Comparing Average Incomes'). We could not help but notice that the female income is less than the average income in most—if not all—categories. Inevitably, male income exceeded that threshold by quite a margin.

The data here is clear: in a world where women comprise 49% of the population, we still have to work on inclusivity and our perception of gender—especially in the workforce.

JOIN the Arctype Newsletter
Programming stories, tutorials, and database tips every 2 weeks

Spread the word

Keep reading