Estimating Gender Equality
Modeling
Assume we have data about job application procedures \(P_i\), with \(i \leqslant N \in \mathbb{N}\), where for each procedure we know the number of male applicants \(M_i\), the number of female applicants \(F_i\), and whether a male applicant was offered the job, \(P_i = 0\), or a female applicant, i.e., \(P_i = 1\). Now, we wish to model the influence of a gender bias \(\rho \in [0,1]\), where \(\rho = 0\) means absolute bias towards men, so \(P_i = 0\) for all \(i\), and \(\rho = 1\) would mean absolute bias in favor of female applicants. The way \(\rho\) works is that if we have exactly two applicants, one male and one female, we select the male applicant with probability \((1-\rho)\) and the female applicant with probability \(\rho\). Based on this and the assumption that all considered job applicants are equally qualified, we can model the density \(f_i\) of \(P_i\) as $$ f_i(n, \rho) = \delta_0(n) \cdot \frac{(1-\rho) M_i}{(1-\rho)M_i + \rho F_i} + \delta_1(n) \cdot \frac{\rho F_i}{(1-\rho)M_i + \rho F_i}, $$ which essentially generalizes the above two-applicant scenario to an arbitrary number of applicants.

Now, since we have real-world data from the past, i.e., \(P_i = x_i\) together with concrete values for \(M_i\) and \(F_i\), we can turn things around and instead estimate \(\rho\) from these cases. Since we have a precise statistical model for our observations, we can use the maximum likelihood approach over the joint density of all observations. Under the assumption that the job application procedures are mutually independent, we have that $$ f(\rho \vert x_1, \dots, x_N) = \prod\limits_{i=1}^N f_i(\rho \vert x_i), $$ which after applying a \(\log\) on both sides and negating gives the negative log-likelihood $$ \lambda(\rho \vert x_1, \dots, x_N) = -\sum\limits_{i=1}^N \log f_i(\rho \vert x_i). $$ The value \(\rho^\ast\) for which \( \lambda(\cdot \vert x_1, \dots, x_N) \) is minimized (equivalently, for which the likelihood is maximized) is called the maximum likelihood estimate.
Implementation
Simulation
Due to the lack of (publishable) data, we also have to simulate some cases so that we can try out our estimator. Additionally, we will see later that we need the simulation to estimate the estimator’s distribution. First, in order to encode the cases, we simply store a triple of integers \((F_i, M_i, x_i) \in \mathbb{N} \times \mathbb{N} \times \{0,1\}\). A simple code snippet for generating those cases might look like:
Here, we draw the number of applicants from a uniform distribution with a certain maximum, and the number of female applicants from a binomial distribution whose number of trials is given by the number of applicants. This allows us to model the fact that applicants are not evenly distributed over the two main genders (as is the case for many kinds of professions).
Now, we need the implementation of settle_case, which reads as
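A possible sketch; the probability of a female hire follows directly from the density \(f_i\) above (the random generator is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng()

def settle_case(females, males, rho):
    """Return 1 if a female applicant is offered the job, 0 for a male one."""
    # probability of a female hire, taken from the density f_i
    p_female = rho * females / (rho * females + (1 - rho) * males)
    return int(rng.random() < p_female)
```

For \(\rho = 1/2\) this reduces to picking one of the \(F_i + M_i\) applicants uniformly at random.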
This skews the choice between women and men according to the bias \(\rho\), as explained above.
Estimation
In order to find the optimal value \(\rho^\ast\), we need to be able to evaluate \(\lambda\) given \(\rho\) and some case data. This can be done as follows:
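One possible vectorized implementation over arrays of case data (the function name and argument layout are my choices):

```python
import numpy as np

def neg_log_likelihood(rho, females, males, outcomes):
    """Negative log-likelihood lambda(rho | x_1, ..., x_N) over arrays of cases."""
    p_female = rho * females / (rho * females + (1 - rho) * males)
    # pick the log-probability matching each observed outcome
    return -np.sum(np.where(outcomes == 1, np.log(p_female), np.log(1 - p_female)))
```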
Note the use of np.where, which selects elementwise between the two log-terms depending on the observed outcome.
In order to unlock gradient-based optimization of \(\lambda\), we also need the derivative with respect to \(\rho\). We skip the tedious math details and just implement it as:
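One possible implementation, mirroring the argument layout of the likelihood function above:

```python
import numpy as np

def neg_log_likelihood_grad(rho, females, males, outcomes):
    """Derivative of the negative log-likelihood with respect to rho."""
    denom = rho * females + (1 - rho) * males
    # per-case derivative of log f_i:
    #   x_i/rho - (1 - x_i)/(1 - rho) - (F_i - M_i)/denom
    per_case = np.where(outcomes == 1, 1 / rho, -1 / (1 - rho)) - (females - males) / denom
    return -np.sum(per_case)
```

A quick finite-difference check against \(\lambda\) is a cheap way to validate the sign conventions.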
Next, we wish to implement the process of maximum likelihood estimation. For this we need to curry both functions above to only depend on \(\rho\). Then, we sample \(\lambda\) on a grid of values for \(\rho\) in order to deal with possible non-convexities, and use the minimizer across those samples as an initialization for a gradient-based method, like scipy.optimize.minimize.
Finally, we need to take care of application procedures where only one of the genders was present. These cases give us no information about \(\rho\) and need to be disregarded; if no informative cases remain, the estimate is returned as np.nan.
A possible implementation looks like:
This already allows us to generate one instance of \(\rho^\ast\) based on a collection of specific application procedures.
Testing
However, how do we interpret this value? Of course, we assume it should be close to \(1/2\), but how close exactly? Remember, inherently the maximum likelihood estimate is a random variable, as the estimator’s output depends on random data, i.e., the outcomes of our application processes. Where do we draw the line and make the decision that the hiring process is generally skewed towards one gender?
For this, we can design a hypothesis test, where in our case we have that \(H_0: \rho=1/2\). Under this hypothesis, we can now simulate the distribution of the maximum likelihood estimate \(\rho^\ast\). To this end, we repeat the process of generating data to obtain an empirical approximation of the distribution of \(\rho^\ast\) from those multiple realizations. From this empirical distribution, we get empirical quantiles, and we can for instance design empirical rejection thresholds \(\rho_1<\rho_2\) such that $$ P(\rho^\ast < \rho_1) = P(\rho^\ast > \rho_2) \approx 0.025, $$ which controls our false-rejection probability at \(0.05\).
Doing this in code basically uses everything from above and can be done as:
This code produces something like the picture below:

Discussion
As we can see, when analyzing “only” 40 cases, the distribution of \(\rho^\ast\) is pretty wide, and even values that intuitively look “concerning”, had they come out of our statistics, could not justify the rejection of \(H_0\). At least not with the tools we have provided here. Note that asymptotically the variance of \(\rho^\ast\) most likely scales with \(1/N\), so narrowing down the rejection interval for \(H_0\) requires significantly more data. But then we might be tracking the company’s hiring process across a long time frame, and we might for instance run into the issue that \(\rho\) is no longer constant.
Tags: #Tech