The Equal Employment Opportunity Commission has unveiled a new data visualization project. It presents enforcement data in what the commission calls a simple, comprehensible and visually appealing way. For more on the visualizations and how the commission built them, the EEOC’s chief data officer, Dr. Chris Haffer, spoke to Federal Drive with Tom Temin.
Tom Temin: Dr. Haffer, good to have you on.
Dr. Chris Haffer: Thank you for having me.
Tom Temin: Give us the big picture here. One doesn’t normally think of EEOC as a data agency, but it sounds like you really are. Give us a sense of the data that you’re dealing with and the stakeholders you’re trying to boost up here.
Dr. Chris Haffer: Sure. First, let me just provide a little bit of context. The recently enacted Foundations for Evidence-Based Policymaking Act encouraged all federal agencies to begin to pay more attention to the data that they have, realizing that there was just a treasure trove of data within federal agencies that had never thought of themselves as data-related agencies before. So what we’re attempting to do at EEOC is really to open the treasure trove of data that we have, and make those data publicly available to facilitate transparency, as well as to allow the public to use the data for statistical evidence-building purposes.
Tom Temin: For many years, EEOC has published data that’s in the form of, I guess, spreadsheets, basic textual presentations of data. But apparently, I guess the feeling is that didn’t get the most out of the data that’s possible to get.
Dr. Chris Haffer: That’s right. Publishing data in tables is something that was good, but not great. Using these modern analytical techniques and these modern data query and visualization tools allows us to put out more data than we ever have before in a way that enhances the confidentiality protections for employers and individual employees.
Tom Temin: And give us a sense of the types of data that are there, because I was interested to find that there is a lot of demographic information on employers. And you don’t present that as a comprehensive look at the employment picture for the United States. But it’s still one of the best ways of at least getting a clue into the demographics of employment.
Dr. Chris Haffer: That’s right. Every year the EEOC collects information from approximately 90,000 private sector employers on the demographic makeup of their workforce. We collect information on race, ethnicity, gender, and job category counts for every employer in the country. And this tool allows individual members of the public to actually dig in by geographic location — we’ve got it down to the county level — as well as by industry, and take a look at what’s happening both within a given year and across years. For this first release of our tool, we chose to include two years of data, 2017 and 2018. And we’re going to include more years going forward, as well as data from prior years of the collection. We’re also exploring ways to include other types of data that EEOC collects, including aggregate information on charge filings, as well as the data we collect from our federal sector investigations.
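The kind of slicing Dr. Haffer describes — counts broken out by year, location, industry and demographic group — can be sketched as a simple aggregation. The column names and figures below are invented for illustration; they are not the actual EEO-1 schema or real EEOC data.

```python
# Hypothetical sketch of the slicing the EEOC Explore tool supports:
# aggregate employee counts by year and county for a demographic group.
# All column names and numbers here are illustrative assumptions.
import pandas as pd

records = pd.DataFrame({
    "year":     [2017, 2017, 2018, 2018],
    "county":   ["Santa Clara", "Fulton", "Santa Clara", "Fulton"],
    "industry": ["Tech", "Tech", "Tech", "Tech"],
    "gender":   ["F", "F", "F", "F"],
    "count":    [120, 80, 130, 95],
})

# Compare the same group across locations, within and across years.
by_location = (records
               .groupby(["year", "county"], as_index=False)["count"]
               .sum())
print(by_location)
```

A user comparing, say, the tech sector in Silicon Valley versus Atlanta would simply pick different grouping keys or filters rather than waiting on an agency analyst to run the tabulation.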
Tom Temin: For many years, in Excel and the spreadsheet programs that preceded it, you could produce simple charts and graphs of the same data. They were not great looking, but you could put them in pretty colors and circles and bars and three dimensions. What does visualization do that goes beyond that? And what do you have to do to the data so that it can be visualized in the new ways that you mean by visualization?
Dr. Chris Haffer: Sure. In the past, when federal agencies produced information — tables, graphs, and charts — it all depended on certain decisions made by the staff who were creating those charts and graphs. It depended on what they thought was important information to depict. Using this EEOC Explore tool, which is based on the Tableau framework, individual members of the public who are interested in slicing and dicing the data however they want can specify the way that they want to use and look at the data. So they’re no longer dependent on individuals, faceless bureaucrats who are making these choices. Data users themselves can now choose how they want to see the data.
Tom Temin: Give us a sample scenario of how this might work for someone, a type of stakeholder and what they might be looking for and what they can now produce with the data published there.
Dr. Chris Haffer: Sure. There’s oftentimes a lot of interest in comparing employment sectors across the country and looking at the demographic makeup of the workforce, say in certain tech areas like Silicon Valley versus the Atlanta area versus the Boston area. And this tool will allow individual members of the public who are interested in comparing the demographic makeup of the tech sector in different locations across the country the ability to actually drill down and take a look and obtain the information they need about the types of employees that are being employed at these tech sector firms in different parts of the country.
Tom Temin: And what about people that might use data visualizations to come to conclusions that may or may not be supportable by the facts surrounding the data? Because often people take outcomes as evidence of intent, when that’s not always necessarily the case.
Dr. Chris Haffer: That’s a great point. I think that in any effort where a federal agency is making data more transparent, it really is important that the individual data user realize that correlation is not causality. And while they may find evidence of some type of relationship, it’s incumbent upon them to dig down and find out what is going on that led them to ask the question that they’re asking.
Tom Temin: Sure, it’s the old everyone-who-ever-died-was-raised-on-mother’s-milk type of question that comes up in these kinds of issues. But give us a sense of the process that you and your staff had to go through to prepare the data, or otherwise stage it, such that it could be used in visualizations. Because as you say, for many years you could just download PDFs or download spreadsheets and you were off to the races.
Dr. Chris Haffer: Yeah, we’re fortunate in that we have grown our staff of statisticians and data scientists, many of whom came to us with experience from other federal statistical agencies in preparing data sets, testing data sets, and ensuring that the data sets provided valuable insights while at the same time protecting the confidentiality of individuals and individual employers. And it was an 18-month developmental process, where we needed to take the raw data, standardize it to ensure it was the highest quality data we had available, and then drill down and employ various statistical disclosure limitation techniques and methods to ensure that the data were protected as robustly as possible.
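Statistical disclosure limitation covers a family of methods; one of the simplest, primary cell suppression, can be sketched as follows. The threshold and the data here are illustrative assumptions, not the EEOC’s actual rules or figures.

```python
# Minimal sketch of primary cell suppression: any aggregate cell whose
# count falls below a minimum threshold is withheld before release, so
# that very small groups cannot be identified from the published table.
# The threshold value is an assumption for this example.
MIN_CELL = 3  # assumed minimum publishable cell size

def suppress_small_cells(cells, threshold=MIN_CELL):
    """Replace counts below the threshold with None (i.e., suppressed)."""
    return {key: (count if count >= threshold else None)
            for key, count in cells.items()}

cells = {
    ("County A", "Group 1"): 42,
    ("County A", "Group 2"): 2,   # too small: could identify individuals
    ("County B", "Group 1"): 15,
}
released = suppress_small_cells(cells)
```

Production systems typically go further (for example, secondary suppression so a withheld cell cannot be recovered by subtraction from row and column totals), but the basic idea is the same: trade a little detail for robust confidentiality.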
Tom Temin: What’s your lesson learned, overall here in this project for other federal agencies that might want to goose up the usefulness and utility of the data products they have?
Dr. Chris Haffer: The lesson learned that’s applicable, I think, to other federal agencies — especially small federal agencies working with limited resources — is that they really need to take a look at the data requests that are coming in from the media and from FOIA, and think about how they could create public use files that will help meet the needs of those data requesters. Because ultimately, putting a tool like this into the hands of the public should really help to reduce the number of FOIA requests you’re receiving, and reduce the number of requests coming from the media through your communications office. So instead of having to produce one-off data runs by an analyst who may already be overburdened, whose plate is overflowing, you have put a self-service tool into the hands of the requesters themselves and allowed them to cut the data however they would like.
Tom Temin: So in some ways, it makes the operations of the agency itself a little bit more efficient.
Dr. Chris Haffer: Absolutely. That was one of the primary goals when we realized we needed to do something like this: we were receiving dozens of requests every month for different slices and dices of the EEO-1 data through various channels at the agency. And we realized that investing the time and effort in producing this data visualization and query tool was going to pay substantial dividends once it was released.
Tom Temin: And I imagine it can also fold back and help the EEOC — help the commissioners themselves — understand what’s happening in the agency, and maybe, as the Data Act also I think intends, help to drive the agency mission in some manner.
Dr. Chris Haffer: That’s right. Very similar to the question you raised earlier about, what should one conclude if one finds evidence of some type of relationship. This is a tool that can be used for generating questions, and I think it’s a very valuable tool, as you pointed out, for the commissioners to be able to take a look at different segments of the population, different segments of different industries, and to start to pull the curtain back, if you will, and just enable some initial glances into what’s happening with employment trends in those industries. And then ask those follow up questions as to what might be explaining some of the trends that they’re observing.
Tom Temin: And will you be monitoring the take up of the tool that you’ve just introduced?
Dr. Chris Haffer: We definitely will. We’ve provided a mechanism for users to give us feedback. And so far — this is 24 hours after launch — things are looking really good in terms of having designed a tool that meets user needs.
Tom Temin: Dr. Chris Haffer is chief data officer at the Equal Employment Opportunity Commission. Thanks so much for joining me.
Dr. Chris Haffer: Thank you.