This week was the first time I participated in the data visualization community #MakeoverMonday. No stunning visualizations from me. Rather, some general steps towards my “data literacy” and Tableau competency goals.
As I began working with the data, the first thing I noticed was how little value the MakeoverMonday African heat-map visualization provides. The map would have been helpful if it showed a strong “by region” distribution of teenage pregnancy (i.e., similar rates in contiguous countries) and was paired with other visualizations confirming the reasonable assumption that obstetric fistula incidence is correlated with teen pregnancy. For exploratory (as opposed to explanatory) data visualization, the map does have some value in confirming that there is no such regional clustering in teen pregnancy.
To orient myself to the dataset and research objective, I reviewed the Operation Fistula website, the MakeoverMonday Data Dictionary, the International Women’s Day documentation (PDF), and the Global Fistula Map (linked in the PDF), which illustrates the incidence of fistula repair surgery by location.
From the PDF, I crafted the research questions as:
- To discover any predictive features for classification as a “fistula country” (either positive or negative). For example, the PDF mentions ‘Extreme Gender Inequality’ as a suspected driver.
- To understand any other correlation between the different metrics.
From the PDF, I learned that the source World Bank dataset is found here. I briefly reviewed the information on this page and concluded that I do not need anything from this source and can focus on the MakeoverMonday dataset alone. I also don’t expect to use the Global Fistula Map, since it maps the incidence of restorative surgery, whereas the research questions relate to the incidence of fistula in the population. A high number of surgeries might indicate a high incidence of fistula; however, it could instead indicate a high standard of medical care, whereby restorative surgery is available whenever fistula occurs.
From the Data Dictionary, I organized some potential correlations to ‘is fistula country’ by theme:
- Country prosperity:
  - GDP. Note: this would need to be converted to ‘per capita’ to be useful, but the dataset itself does not include country population to facilitate this.
  - Teenage pregnancy rate
- Quality of healthcare:
  - Percentage of births attended by skilled staff
  - Health expenditure as a percentage of GDP
  - Hospital beds per 1000 people
  - Risk of maternal death
  - Nurses and midwives per 1000 people
- Gender inequality:
  - Female genital mutilation prevalence
  - Female life expectancy compared to male life expectancy
  - Incidence of physical or sexual violence
  - Incidence of girls being married by the age of 15
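I did not get as far as testing these candidates, but as a rough sketch of how one might screen them once the data is loaded: a Pearson correlation between a metric and a 0/1 ‘is fistula country’ flag (sometimes called a point-biserial correlation) gives a quick first signal. The function name and the sample numbers below are hypothetical, chosen only to illustrate the calculation, and rows with a missing metric are simply skipped.

```python
import math

def flag_correlation(flags, values):
    """Pearson correlation between a 0/1 flag and a numeric metric.

    Pairs where the metric is None (missing) are skipped.
    Returns None when fewer than two pairs remain or either side is constant.
    """
    pairs = [(f, v) for f, v in zip(flags, values) if v is not None]
    if len(pairs) < 2:
        return None
    mean_f = sum(f for f, _ in pairs) / len(pairs)
    mean_v = sum(v for _, v in pairs) / len(pairs)
    cov = sum((f - mean_f) * (v - mean_v) for f, v in pairs)
    var_f = sum((f - mean_f) ** 2 for f, _ in pairs)
    var_v = sum((v - mean_v) ** 2 for _, v in pairs)
    if var_f == 0 or var_v == 0:
        return None
    return cov / math.sqrt(var_f * var_v)

# Made-up placeholder values, NOT taken from the MakeoverMonday dataset.
is_fistula_country = [1, 1, 0, 0, 1]
births_attended_pct = [10.0, 20.0, 30.0, 40.0, None]  # last value is missing

r = flag_correlation(is_fistula_country, births_attended_pct)
print(round(r, 4))
```

A strongly negative value here would only suggest that the metric is worth a closer look; with columns as sparse as the ones in this dataset, any such number should be read with caution.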
Next, I downloaded the MakeoverMonday dataset (CSV file) and saved it in Excel file format. A quick inspection of the data highlighted that:
- There are 11792 data rows
- The date range is 1960 to 2017
- There are 216 countries
- There are no null values for ‘country’, ‘date’, or ‘fistula country’; however, the remaining columns are sparse, with many rows having null values. For example, there are 9366 null cells in the ‘births attended by skilled staff’ column and 11459 null cells in the ‘pregnancy’ column.
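I did this inspection by hand in Excel, but the same checks could be sketched in a few lines of Python. The column names and the handful of rows below are made-up placeholders (not the real MakeoverMonday data), and an empty cell is treated as a null.

```python
import csv
import io

# Made-up placeholder rows, NOT real data; column names are assumptions.
SAMPLE_CSV = """country,date,fistula_country,births_attended,maternal_death_risk
Algeria,1990,0,,0.9
Algeria,2000,0,92.0,0.6
Bhutan,1990,0,,3.1
Chad,1990,1,,5.5
Chad,2000,1,16.3,
"""

def summarize(rows):
    """Return row count, date range, country count, and empty cells per column."""
    years = [int(r["date"]) for r in rows]
    nulls = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}
    return {
        "rows": len(rows),
        "date_range": (min(years), max(years)),
        "countries": len({r["country"] for r in rows}),
        "nulls": nulls,
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
print(summary)
```

To run this against the real file, the `io.StringIO(SAMPLE_CSV)` argument would be swapped for an opened file handle, with the column names adjusted to match the actual headers.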
Two Excel tips I leveraged for the analysis above and subsequent data massaging:
- To select all rows in an Excel column, hold down Ctrl and Shift, then press down-arrow.
- To copy visible cells only after hiding some cells using Excel Hide functionality: follow these instructions.
For further data analysis, I decided to first pursue ‘risk of maternal death’, given its nearly complete data for 1990 through 2017. After completing some preprocessing in Excel and narrowing the countries to an illustrative few, I created this visualization in Tableau Public for exploratory analysis.
- To reduce clutter while still illustrating risk trends, I chose to group data by decade.
- I considered using a simple bar chart with ‘is fistula country’ shown in a contrasting color compared to ‘is not a fistula country’; however, I thought it would be easier to observe this distinction by using a zero line and plotting “is” as positive and “is not” as negative.
- To facilitate filtering out countries with very low maternal death, I created a Calculated Field and Filter for the absolute value of maternal death.
Using this simple visualization for the sample data, it is easy to see immediately that Algeria, Bolivia, and Bhutan are outliers: they have a comparatively high risk of maternal death and yet are not classified as ‘is fistula country’. If I had more time to devote to this MakeoverMonday challenge, I would want to explore this observation further with secondary research.
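For anyone who would rather prototype outside Tableau, the three chart decisions described above (grouping by decade, plotting “is” as positive and “is not” as negative, and filtering on absolute value) can be sketched in plain Python. This is only an approximation of what I built: the records are made-up placeholders, the function names are hypothetical, and averaging within each decade is my assumption about how the grouping aggregates.

```python
from collections import defaultdict

# Made-up placeholder records, NOT real data:
# (country, year, is_fistula_country, maternal_death_risk)
RECORDS = [
    ("Country A", 1991, True, 4.0),
    ("Country A", 1998, True, 2.0),
    ("Country A", 2005, True, 1.0),
    ("Country B", 1993, False, 3.0),
    ("Country B", 1997, False, 1.0),
    ("Country C", 1995, False, 0.2),
]

def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1998 -> 1990."""
    return (year // 10) * 10

def signed_decade_means(records, min_abs=0.0):
    """Average risk per (country, decade), signed by fistula classification.

    Fistula countries stay positive and non-fistula countries become negative,
    so the two groups fall on opposite sides of a zero line. Groups whose
    absolute mean is below min_abs are dropped (assumed mean aggregation).
    """
    groups = defaultdict(list)
    for country, year, is_fistula, risk in records:
        sign = 1.0 if is_fistula else -1.0
        groups[(country, decade_of(year))].append(sign * risk)
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return {key: m for key, m in means.items() if abs(m) >= min_abs}

result = signed_decade_means(RECORDS, min_abs=0.5)
for (country, decade), value in sorted(result.items()):
    print(country, decade, value)
```

The `min_abs` threshold plays the role of the absolute-value filter: it removes low-risk countries on both sides of the zero line with a single cutoff.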
A final note regarding my learning from this exercise: Being inspired by Cole Nussbaumer Knaflic’s chart esthetics in her Storytelling With