Gendering the research landscape. Gender distribution among the contributors to the TEI Conference and Members’ Meeting 2016

The talk “Gendering the research landscape. Gender distribution among the contributors to the TEI Conference and Members’ Meeting 2016” (authored with Peter Andorfer) was presented at the Women’s History in the Digital World conference at Maynooth University.

Location: Maynooth University, Ireland
Date: Jul 6-7, 2017


The TEI Conference and Members’ Meeting 2016 took place at the Austrian Academy of Sciences in Vienna and was hosted by the Austrian Centre for Digital Humanities. As hosts, we took an interest in the gender distribution among the contributors to the conference. However, as we had not collected information on the gender the conference contributors identified with, we were confronted with a lack of data.

As we had encoded the conference abstracts in TEI XML and tagged the authors’ forenames in these files, we decided to deduce the contributors’ genders from their forenames or, to be more precise, to deduce the gender with which each forename is most commonly associated. We did not want to make assumptions about the gender of the contributors themselves, but rather to analyze the gender of their names.
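The extraction step can be illustrated with a minimal Python sketch. It assumes that forenames are tagged as `forename` elements inside `author` elements in the TEI namespace; the sample document, the element nesting, and the function name `extract_forenames` are hypothetical and may not match the actual conference files.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the prefix "tei" is only a local shorthand.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample abstract; the real files may be structured differently.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample abstract</title>
        <author>
          <persName><forename>Ada</forename><surname>Example</surname></persName>
        </author>
        <author>
          <persName><forename>Jan</forename><surname>Sample</surname></persName>
        </author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_forenames(tei_xml: str) -> list[str]:
    """Collect the text of every forename element found inside an author element."""
    root = ET.fromstring(tei_xml)
    forenames = []
    for author in root.iterfind(".//tei:author", TEI_NS):
        for forename in author.iterfind(".//tei:forename", TEI_NS):
            if forename.text and forename.text.strip():
                forenames.append(forename.text.strip())
    return forenames


forenames = extract_forenames(SAMPLE_TEI)
print(forenames)
```

Only the tagged forenames are read, so nothing about the contributors beyond their names enters the analysis.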

Finding out the gender most commonly associated with a forename is possible when empirical evidence is available, for example lists of personal names that include information on gender. We started with such a list provided by Mark Kantrowitz, which is used, e.g., in the Natural Language Toolkit. Unfortunately, this list comes with no documentation of how names were matched to genders. For this reason, we looked for another resource and found a web service to “determine the gender of a first name”. The website states that its data was assembled by scraping social network profiles, where people can declare their gender themselves. This was what we had been looking for, so we compared our list of authors’ forenames to the dataset.
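Comparing forenames to a name list amounts to a simple lookup, sketched below in Python. The dictionary `NAME_LIST` is a tiny hypothetical stand-in for a real resource, and the function names are illustrative; this is not the interface of the web service mentioned above.

```python
# Hypothetical excerpt of a name list with gender information.
# A real list would contain many thousands of entries.
NAME_LIST = {
    "ada": "female",
    "maria": "female",
    "peter": "male",
}


def gender_of_name(forename, name_list=NAME_LIST):
    """Return the gender recorded for a forename, or None if it is not listed."""
    return name_list.get(forename.strip().casefold())


def match_forenames(forenames, name_list=NAME_LIST):
    """Split forenames into matched names (name -> gender) and undetermined names."""
    matched, undetermined = {}, []
    for forename in forenames:
        gender = gender_of_name(forename, name_list)
        if gender is None:
            undetermined.append(forename)
        else:
            matched[forename] = gender
    return matched, undetermined


matched, undetermined = match_forenames(["Ada", "Peter", "Kiyonori"])
print(matched)
print(undetermined)
```

Names that are absent from the list are kept apart as undetermined instead of being guessed, so they can be checked against another resource later.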

This comparison did not yield a fully meaningful result: out of 124 names, 22 were left undetermined. We therefore manually compared the still ungendered names to a second database. This website sources its data from the “2001 and 2011 UK Census Data, together with multiple online sources and contributions from our 2m website visitors” and assigns gender accordingly. What was particularly appealing about this database was that it also considers the possibility of unisex names: “If we see just one instance of a name appearing as both male and female, we categorise it as unisex.” As use of this database is free only for up to 20 names or for manual name searches, we decided to look up the 22 names manually. This left us with statistically relevant results and three ungendered names: Kiyonori, Sünna, and Tetsuei.
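The unisex rule quoted above, followed by a tally of the categories, might be sketched as follows. The observations are invented sample data and the function names are hypothetical; the sketch only illustrates the rule as quoted, not how the database is actually implemented.

```python
from collections import Counter, defaultdict


def categorise_names(observations):
    """Map each name to 'female', 'male', or 'unisex'.

    A name observed with both genders, even once, is categorised as unisex.
    """
    seen = defaultdict(set)
    for forename, gender in observations:
        seen[forename.casefold()].add(gender)
    return {
        name: "unisex" if len(genders) > 1 else next(iter(genders))
        for name, genders in seen.items()
    }


def tally(forenames, categories):
    """Count forenames per category; unknown names count as 'undetermined'."""
    return Counter(
        categories.get(forename.casefold(), "undetermined")
        for forename in forenames
    )


# Hypothetical observations of (forename, declared gender) pairs.
OBSERVATIONS = [
    ("Maria", "female"),
    ("Peter", "male"),
    ("Kim", "female"),
    ("Kim", "male"),
    ("Maria", "female"),
]

categories = categorise_names(OBSERVATIONS)
counts = tally(["Maria", "Peter", "Kim", "Tetsuei"], categories)
print(categories)
print(counts)
```

Keeping "unisex" and "undetermined" as separate categories avoids forcing every name into a binary, which matters for the interpretation of the resulting distribution.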

In the proposed paper, we will not only reveal the outcome of our gender check, but also discuss its implications for the gendered research landscape in the field of digital editing, the theoretical basis of our technical approach, and the difficulties of considering complex theories such as Judith Butler’s concept of gender performativity and agency when modelling and evaluating data.

See the gender distribution analysis

See the slides