How is candidate gender determined in Datapeople's reports?

Datapeople uses a proprietary algorithm to probabilistically estimate gender based on a candidate's first name. The model is trained on large public datasets in which individuals self-identified their gender. These datasets are global and extend well beyond names found in the United States.
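
To make the idea concrete, here is a minimal sketch of name-based probabilistic inference. The names, probabilities, and the `NAME_PROBABILITIES` table are illustrative assumptions, not Datapeople's actual model, which is proprietary and more sophisticated.

```python
# Minimal sketch of name-based gender inference (illustrative only).
# In practice, P(female | first name) would be estimated from large
# public datasets of self-reported gender, not hard-coded like this.
NAME_PROBABILITIES = {
    "maria": 0.98,
    "james": 0.02,
    "alex": 0.45,  # ambiguous names carry intermediate probabilities
}

def p_female(first_name: str):
    """Return the estimated probability that a candidate is female,
    or None if the name is not in the table."""
    return NAME_PROBABILITIES.get(first_name.strip().lower())
```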


How accurate is the Datapeople model?

No model is ever going to have perfect accuracy, but we come pretty close! Our gender data is 90%+ accurate when we compare it against candidate self-reports. That level of accuracy is sufficient to diagnose systemic issues in recruiting processes.
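
As a rough illustration of how accuracy against self-reports can be measured, here is a toy sketch. The `records` data and the 0.5 decision threshold are hypothetical assumptions, not Datapeople's actual evaluation setup.

```python
# Sketch: measuring model accuracy against candidate self-reports.
# Each record pairs a model probability with a self-reported label.
records = [
    {"p_female": 0.98, "self_report": "female"},
    {"p_female": 0.02, "self_report": "male"},
    {"p_female": 0.45, "self_report": "female"},
]

def accuracy(records, threshold=0.5):
    """Fraction of records where the thresholded prediction
    matches the candidate's self-report."""
    correct = sum(
        1 for r in records
        if (r["p_female"] >= threshold) == (r["self_report"] == "female")
    )
    return correct / len(records)

print(f"{accuracy(records):.0%}")  # 67% on this toy sample
```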


Is your model accurate on non-English names?

Yes, the model we use is trained on large public datasets in which individuals have self-reported their gender and first names. These datasets are global and include non-English names.


How does this report consider the fact that some candidates identify outside the male/female binary?

Datapeople supports inclusivity and fairness for candidates and employees of all gender expressions and identities, but we display and report only on data we can responsibly and accurately provide.

This is also why we report on candidate gender only in the aggregate, never for individuals. We cannot tell you whether a specific candidate is male or female because we cannot be certain that any one person's gender identity corresponds to their first name. We can, however, say that in aggregate your pipeline is likely to comprise 47% women.
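
To sketch why aggregates are trustworthy even when individual calls are not: summing per-candidate probabilities gives the expected number of women in the pipeline, and dividing by the pipeline size gives the expected share. All values below are illustrative, not real candidate data.

```python
# Sketch: aggregating per-candidate probabilities into an expected share.
pipeline_probs = [0.98, 0.02, 0.45, 0.80, 0.10]  # hypothetical P(female) values

expected_women = sum(pipeline_probs)            # expected count of women
share = expected_women / len(pipeline_probs)    # expected proportion
print(f"Pipeline is likely to comprise {share:.0%} women")  # 47% here
```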


Why does Datapeople model gender representation? 

Candidate self-reports are the gold standard for identity data; after all, they reflect how candidates see themselves. However, collection of candidate identity data in an applicant tracking system (ATS) can be limited in ways that make the data difficult to trust. This creates a significant blind spot when too few candidates in your ATS have demographic data associated with them. We recognize that our model provides only a binary view of gender, but the benefit of gender inference modeling is that it yields reasonable predictions for the whole candidate pool, not just the roughly 50% who chose to disclose their gender through your demographic surveys.
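
Here is a toy sketch of that coverage gap, using a hypothetical ATS export in which only some candidates completed the demographic survey but nearly all have a first name on file.

```python
# Sketch: survey coverage vs. name coverage in a hypothetical ATS export.
candidates = [
    {"first_name": "Maria", "self_report": "female"},
    {"first_name": "James", "self_report": None},  # declined the survey
    {"first_name": "Alex",  "self_report": None},  # declined the survey
]

survey_coverage = sum(c["self_report"] is not None for c in candidates) / len(candidates)
name_coverage = sum(bool(c["first_name"]) for c in candidates) / len(candidates)
print(f"Self-reports cover {survey_coverage:.0%} of candidates; names cover {name_coverage:.0%}")
```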


Why is Datapeople’s gender data only offered in aggregate?

The goal of gender representation reporting is to measure whether your processes contain biases or bottlenecks that disproportionately impact one demographic group. Datapeople does not provide gender predictions for individuals, nor does it predict how an individual self-identifies, for example as non-binary. We cannot tell you that "this is a female candidate" because we cannot be certain that a person's gender identity corresponds to their first name. We can say that, in aggregate, your pipeline is likely to comprise 47% women and that women are likely to account for 52% of hires.
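
As an illustrative sketch, comparing the aggregate share of women at two funnel stages is how disproportionate drop-off can be surfaced. The probabilities below are hypothetical, not real pipeline data.

```python
# Sketch: comparing aggregate representation across funnel stages.
applied = [0.98, 0.02, 0.45, 0.80, 0.10]  # hypothetical P(female), all applicants
hired   = [0.98, 0.06]                    # hypothetical P(female), hires

def share_women(probs):
    """Expected proportion of women, given per-candidate probabilities."""
    return sum(probs) / len(probs)

print(f"Applicants: {share_women(applied):.0%} women")  # 47%
print(f"Hires:      {share_women(hired):.0%} women")    # 52%
```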


Why don't you use the "gender" field filled out by the applicant?

Data integrity and accuracy are of the utmost importance to us at Datapeople. Because of this, we choose not to use applicants' gender self-reports: they are notoriously sparse and incomplete, and would not provide an accurate summary of the gender breakdown in your hiring funnel. We get a more accurate result with our global probability model, which looks at first names rather than relying on incomplete, unavailable, or missing data from an ATS.
