Invisible in data: The lack of LGBTQ data collection

Although government surveys collect significant amounts of data about the U.S. population, these surveys have historically excluded demographic questions about the LGBTQ population. As a result, official population data may misrepresent the LGBTQ community, resulting in biased or inefficient policies. This article is the first in a series called Invisible in Data, which focuses on citizen engagement and the LGBTQ community. The series will consider policies such as voter identification and election law, voter engagement, bipartisan LGBTQ policymakers and policies, and LGBTQ diversity in the workforce.

The peanut farmer who loved

Heartfelt stories translate into currency for advocacy and policymaking. On December 12, 2017, a historic special election took place for an Alabama seat in the United States Senate. Leading up to election day, stories both good and bad surfaced about candidates Doug Jones and Roy Moore. One story, however, felt particularly inspiring for lesbian, gay, bisexual, transgender, and queer (LGBTQ) folk. In response to an anti-LGBTQ comment made by Moore, peanut farmer Nathan Mathis protested by sharing the story of his lesbian daughter, who died by suicide, a serious problem in the LGBTQ community according to some of the best available data. In a moving video that went viral, Mathis upbraided Moore’s comment that “all gay people are perverts, abominations,” stating that “we don’t need someone like that representing us in Washington.” Moore would go on to lose to Democratic nominee Jones. Jones’ narrow victory came after a 25-year Republican political monopoly of the Alabama delegation to the United States Senate. Indeed, stories such as Mathis’ shape politics daily, but policy and politics consist of more than storytelling.

An invisible people

Although policymaking ultimately rests upon empirical inquiry, an overwhelming number of population surveys omit questions relating to sexual orientation and gender identity or expression (SOGIE). The result: a dearth of understanding about the experiences of the LGBTQ community and how gender identity and sexuality influence other demographics such as race, ability, or geopolitics.

The American Community Survey (ACS) only indirectly collects data for one SOGIE variable: marital status. ACS provides same-sex married couples the opportunity to identify as a same-sex married couple. ACS asks for “head of household information first, coding sex of the head of household, and then asks for information on all other persons in the household, coding sex for each of these persons. For the other members of the household, the form asks how they are related to the head of household.” Thus, if the head of household reports being female and identifies their spouse as female, ACS codes that couple as a same-sex married couple.
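The coding rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as this passage describes it, not the Census Bureau's actual processing code; the `Person` record, its field names, and the `is_same_sex_married_couple` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    """One non-head household member as recorded on the form (hypothetical structure)."""
    sex: str           # recorded sex, e.g. "female" or "male"
    relationship: str  # relationship to the head of household, e.g. "spouse"


def is_same_sex_married_couple(head_sex: str, members: List[Person]) -> bool:
    """Infer a same-sex married couple from recorded sex and relationship alone."""
    return any(
        m.relationship == "spouse" and m.sex == head_sex
        for m in members
    )


household = [Person(sex="female", relationship="spouse"),
             Person(sex="male", relationship="child")]
print(is_same_sex_married_couple("female", household))  # True
print(is_same_sex_married_couple("male", household))    # False
```

Nothing in this rule asks about sexual orientation or gender identity. The classification is inferred entirely from two sex fields and one relationship field, so anyone who is not listed as the head's spouse is never classified by it.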

The Center for American Progress (CAP) explains that ACS provides some insight into same-sex couple households, but it does not cover discrimination in the workplace, housing, or public accommodations. Furthermore, the coding practices for ACS ignore the ways in which transgender folk identify along with their sexuality. This leads to a conflation of gender and sexuality as well as an outright disregard for gender diversity. The National Center for Transgender Equality (NCTE) writes that without data collection inclusive of transgender people, unemployment rates, income and poverty, drug and alcohol abuse, suicide, and all other data regularly measured in the general population go unobserved. Furthermore, the Human Rights Campaign released a statement that expresses concern beyond surveys within the United States, stating, “There is currently no country in the world that gathers comprehensive data about the welfare, empowerment, rights enjoyment, and equality of lesbian and bisexual women, and transgender and intersex persons. At most, we are mentioned in HIV prevention statistics, hate crime numbers, or one-off suicide studies.”

Why does it matter?

The lack of population data collection creates serious statistical problems for researchers and policymakers. When analyses exclude SOGIE variables, statistical bias in the form of confounding undermines the foundation of policies. The academic literature on how to define a confounding variable settles on the following as the most appropriate statistical meaning: “A pre-exposure covariate C is a confounder for the effect of A on Y if it is a member of some minimally sufficient adjustment set.” For example, sexual orientation (C) is a confounder when it influences both A and Y, where A is a variable such as race that is itself modeled as influencing an outcome such as voting (Y). When the statistical model excludes C, that confounding goes unadjusted. Figure 1 models this relationship. Ultimately, an omitted confounder biases the results of statistical models because its influence on both A and Y is unaccounted for.

Figure 1: Confounder Model
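A small worked example may make the bias concrete. The sketch below uses an entirely made-up population of 1,000 people, with A, C, and Y coded as simple 0/1 indicators as in the definition above; none of the counts or rates come from a real survey. It compares the estimated effect of A on Y when C is left out with the estimate obtained by comparing people within each level of C.

```python
# Hypothetical population cells: (C, A, number of people, share with outcome Y).
# Within each level of C, the true effect of A on Y is +0.10.
cells = [
    (0, 0, 600, 0.50),
    (0, 1, 200, 0.60),
    (1, 0,  50, 0.20),
    (1, 1, 150, 0.30),
]


def outcome_rate(rows):
    """Share with outcome Y among the people in the given cells."""
    total = sum(n for _, _, n, _ in rows)
    return sum(n * y for _, _, n, y in rows) / total


def naive_effect(cells):
    """Difference in Y between A=1 and A=0, ignoring C (C omitted from the model)."""
    exposed = [r for r in cells if r[1] == 1]
    unexposed = [r for r in cells if r[1] == 0]
    return outcome_rate(exposed) - outcome_rate(unexposed)


def adjusted_effect(cells):
    """Difference in Y between A=1 and A=0 within each level of C,
    averaged using each level's share of the population."""
    population = sum(n for _, _, n, _ in cells)
    effect = 0.0
    for level in sorted({c for c, _, _, _ in cells}):
        stratum = [r for r in cells if r[0] == level]
        weight = sum(n for _, _, n, _ in stratum) / population
        effect += weight * naive_effect(stratum)
    return effect


print(f"C omitted:  {naive_effect(cells):+.3f}")
print(f"C included: {adjusted_effect(cells):+.3f}")
```

With these invented numbers, the estimate that omits C comes out slightly negative (about -0.005) even though the effect of A within each level of C is +0.10. Because C is related to both A and Y, leaving it out nearly erases the effect and flips its sign.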

Moreover, membership associations and nonprofits such as the American Association of Retired Persons (AARP) or NCTE conduct research using originally collected data, but these data come from nonrandom samples, limiting conclusions to those samples. For example, the 2015 United States Transgender Survey, the largest survey of transgender and gender non-conforming people in history, surveyed nearly 28,000 people. But it only enables researchers to draw conclusions about those 28,000 respondents. Considering that the best available statistics estimate over 1 million transgender people exist in the United States (out of the roughly 10 million LGBTQ people in the country), these 28,000 respondents hardly tell the full story of the community and likely understate the severity of the negative outcomes many transgender people face today. Furthermore, AARP produced a monumental study of over 85,000 members, of whom approximately 1,700 reported being LGBTQ. Unfortunately, this sample is nonrandom as well, limiting the scope of conclusions to AARP members, who do not reflect the entirety of the roughly 10 million LGBTQ people (about 4 to 5 percent of people) in the United States, let alone the elderly population.

The problems with these types of data have solutions, but these solutions often come under unnecessary attack. For example, the Trump Administration removed the only LGBTQ question in the National Survey of Older Americans Act Participants, a national survey of senior citizens who receive Title III services, such as meals, in-home support, and caregiver services funded under the Older Americans Act. This omission limits researchers’ ability to identify the ways in which LGBTQ elders face inequality with regard to these services. Researchers face constant uncertainty about whether public officials will arbitrarily remove these types of questions or fail to consider them altogether. In the end, effective governance and efficient policymaking require data collected more accurately, consistently, and fairly.

Ways to fix the data

Including SOGIE survey questions allows policymakers to better target populations that need additional services and use resources more efficiently.

CAP published a column that provides information about LGBTQ data collection. CAP recommends collecting, at minimum, the following information from LGBTQ+ respondents:

  • Sexual orientation/attraction/identity
  • Sex assigned at birth
  • Gender identity/expression
  • Transgender status
  • Relationship status
  • Preferred name
  • Gender pronoun

Notably, CAP possesses some of the best LGBTQ data available today. In 2017, CAP commissioned a survey of 1,864 individuals about their experiences with health insurance and health care. Among the respondents, slightly less than half identified as lesbian, gay, bisexual, and/or transgender. Respondents came from all income ranges and varied across factors such as race, ethnicity, education, geography, disability status, and age. This quality-of-life survey used oversampling and weighting: it sampled a small population (LGBTQ people, in this case) at a far higher rate than its share of the general population, then applied weights so that the results are not distorted by that overrepresentation. Nevertheless, fielding small random surveys and applying weights makes for inefficient use of resources when large population surveys could insert SOGIE questions instead.
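To illustrate how weighting corrects for oversampling, here is a minimal sketch using simple post-stratification-style weights. It is not CAP's actual methodology: the sample sizes, outcome rates, and the 4.5 percent population share are all assumptions chosen for illustration (the share falls within the 4 to 5 percent range cited earlier in this article), and the function names are hypothetical.

```python
# Hypothetical oversampled survey: group -> (respondents, share reporting an outcome).
sample = {
    "lgbtq":     (450, 0.30),
    "non_lgbtq": (550, 0.10),
}

# Assumed population shares (illustrative only).
population_share = {"lgbtq": 0.045, "non_lgbtq": 0.955}


def unweighted_rate(sample):
    """Outcome rate treating every respondent equally."""
    total = sum(n for n, _ in sample.values())
    return sum(n * rate for n, rate in sample.values()) / total


def design_weights(sample, population_share):
    """Weight per respondent = group's population share / group's sample share."""
    total = sum(n for n, _ in sample.values())
    return {g: population_share[g] / (n / total) for g, (n, _) in sample.items()}


def weighted_rate(sample, weights):
    """Outcome rate after weighting each group back to its population share."""
    num = sum(weights[g] * n * rate for g, (n, rate) in sample.items())
    den = sum(weights[g] * n for g, (n, _) in sample.items())
    return num / den


weights = design_weights(sample, population_share)
print(f"unweighted: {unweighted_rate(sample):.3f}")
print(f"weighted:   {weighted_rate(sample, weights):.3f}")
```

In this invented example the unweighted rate is 0.190, while the weighted rate is 0.109. Oversampling yields enough LGBTQ respondents to analyze that group on its own, and the weights keep the overall estimate from being dominated by a group that makes up only a small share of the population.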

Mining in the closet

To demonstrate the negative research consequences of insufficient LGBTQ data collection, the Invisible in Data series explores research questions and tests the reliability of models that do not include sexual orientation variables. We conduct the tests by including both sexual orientation and interaction variables in the statistical models. However, no publicly accessible population survey includes both sexual orientation and transgender status; therefore, although this investigation will still be insightful, it will remain incomplete.
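For readers unfamiliar with interaction variables, the sketch below shows what one captures in the simplest possible case: a linear model with two 0/1 indicators, where the coefficients can be read directly off the four group means. It is an assumption-laden illustration, not the series' actual models; the indicator names `A` (some demographic variable) and `S` (sexual orientation) and every number are hypothetical.

```python
# Hypothetical mean outcome Y for each combination of a demographic
# indicator A and a sexual orientation indicator S (both coded 0/1).
cell_means = {
    (0, 0): 0.60,
    (1, 0): 0.50,
    (0, 1): 0.55,
    (1, 1): 0.30,
}


def saturated_coefficients(means):
    """Coefficients of Y = b0 + b_a*A + b_s*S + b_as*(A*S) for binary A and S.

    With two binary predictors and all four cells present, these
    coefficients reproduce the four cell means exactly.
    """
    b0 = means[(0, 0)]
    b_a = means[(1, 0)] - means[(0, 0)]
    b_s = means[(0, 1)] - means[(0, 0)]
    b_as = means[(1, 1)] - means[(1, 0)] - means[(0, 1)] + means[(0, 0)]
    return {"intercept": b0, "A": b_a, "S": b_s, "A:S": b_as}


def predict(coef, a, s):
    """Predicted mean of Y for a given combination of A and S."""
    return coef["intercept"] + coef["A"] * a + coef["S"] * s + coef["A:S"] * a * s


coef = saturated_coefficients(cell_means)
for name, value in coef.items():
    print(f"{name:>9}: {value:+.2f}")
```

A model without S would report a single association between A and Y for everyone. Here the interaction coefficient of -0.15 shows that the association differs by group: -0.10 when S = 0 and -0.25 when S = 1. That difference is invisible in any dataset that never asks about S.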

The General Social Survey (GSS) serves as a dataset to test our hypothesis about bias against LGBTQ individuals stemming from a lack of SOGIE data. According to its website, GSS has gathered data on United States society to identify trends and constants in attitudes, behaviors, and attributes since 1972. It contains a standard core of demographic, behavioral, and attitudinal questions. GSS started collecting data on sexual orientation in the mid-2000s, collecting sexual orientation data for about 100 out of 1,500 to 2,000 respondents each time. (While GSS may prove useful for testing the hypothesis of this series, it should still serve as a call for better LGBTQ data collection among larger population surveys, such as ACS, the Census, and the Current Population Survey.)

Whether voting or checking a box on a survey, all people deserve representation in data – for an inclusive society and an efficient one.

***

Header photo from www.humanengineers.com

Charlie is a member of the Georgetown Public Policy Review as well as Director of Research and Publications for the McCourt School’s LGBTQ+ Policy Initiative. Originally from South Carolina, Charlie earned a BA from Coastal Carolina University. Charlie’s research typically considers the role of technology and the shareholder economy in economic inequality, the policy and economics of LGBTQ+ productivity, and labor market programs for disadvantaged populations.