Machine Learning Tool to Detect At-Risk Interviews

Client

Year

2023

Location

Technologies

GitHub, Jupyter Notebook, PyOD, Python, Survey Solutions

As part of efforts by the World Bank and the Joint Data Centre to strengthen data systems and fill data gaps, the World Bank sought a machine learning tool to detect irregularities or potential falsification in interview data collected through Survey Solutions.

rowsquared developed RISSK, an open-source package that analyzes unmodified Survey Solutions export files from any survey to identify interviews potentially containing interviewer misconduct.

The tool is generic, privacy-focused, and easily integrable into any data quality monitoring process.

To establish a solid foundation, we first conducted a comprehensive review of academic literature and similar efforts.

We extracted a wide range of generic features, such as interview timing and answering sequences, from the paradata and microdata in Survey Solutions export files.
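To make the idea of timing and sequence features concrete, here is a minimal sketch that aggregates paradata-style events into one row of features per interview. It is an illustration under stated assumptions, not RISSK's actual feature code: the field names `interview__id`, `order`, `event`, and `timestamp_utc` are modeled on the Survey Solutions paradata export, while the `variable` field, the function `extract_features`, and the four feature names are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def extract_features(events):
    """Aggregate paradata-style events into one feature dict per interview."""
    by_interview = defaultdict(list)
    for event in events:
        by_interview[event["interview__id"]].append(event)

    features = {}
    for interview_id, interview_events in by_interview.items():
        # Keep only answer events, in the order they were recorded.
        answers = sorted(
            (e for e in interview_events if e["event"] == "AnswerSet"),
            key=lambda e: e["order"],
        )
        times = [datetime.fromisoformat(e["timestamp_utc"]) for e in answers]
        # Seconds elapsed between consecutive answers.
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        variables = [e["variable"] for e in answers]
        features[interview_id] = {
            "n_answers": len(answers),
            "duration_s": (times[-1] - times[0]).total_seconds() if times else 0.0,
            "median_gap_s": statistics.median(gaps) if gaps else 0.0,
            # A variable answered more than once means the answer was changed.
            "n_answer_changes": len(variables) - len(set(variables)),
        }
    return features


# Made-up events for illustration only ("variable" is an assumed, pre-parsed field).
events = [
    {"interview__id": "a1", "order": 1, "event": "AnswerSet",
     "timestamp_utc": "2023-05-01T09:00:00", "variable": "q1"},
    {"interview__id": "a1", "order": 2, "event": "AnswerSet",
     "timestamp_utc": "2023-05-01T09:00:30", "variable": "q2"},
    {"interview__id": "a1", "order": 3, "event": "AnswerSet",
     "timestamp_utc": "2023-05-01T09:01:30", "variable": "q1"},
    {"interview__id": "b2", "order": 1, "event": "AnswerSet",
     "timestamp_utc": "2023-05-01T10:00:00", "variable": "q1"},
    {"interview__id": "b2", "order": 2, "event": "Completed",
     "timestamp_utc": "2023-05-01T10:00:05", "variable": None},
]

features = extract_features(events)
```

Because the sketch relies only on event order and timestamps, it does not depend on the content of any particular questionnaire, which is the sense in which such features are generic.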

Using machine learning algorithms, we detected anomalies in each feature and calculated a score per feature, then combined these scores into a single, easy-to-interpret unit risk score for each interview.
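The score-then-combine step can be sketched as follows. This is a deliberately simplified, dependency-free illustration: although PyOD is listed among the project's technologies, the sketch does not use it and swaps in a median/MAD robust z-score as the per-feature anomaly score. The averaging and 0 to 100 rescaling are likewise assumptions chosen for readability, not RISSK's documented method, and the names `robust_scores` and `unit_risk_scores` are hypothetical.

```python
import statistics


def robust_scores(values):
    """Per-value anomaly score: absolute deviation from the median, in MAD units."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = mad or 1.0  # fall back to 1.0 when more than half the values are identical
    return [abs(v - med) / scale for v in values]


def unit_risk_scores(features):
    """Average the per-feature scores and rescale them to 0-100 per interview."""
    ids = list(features)
    names = sorted(next(iter(features.values())))
    per_feature = {
        name: robust_scores([features[i][name] for i in ids]) for name in names
    }
    raw = [
        statistics.mean([per_feature[name][k] for name in names])
        for k in range(len(ids))
    ]
    low, high = min(raw), max(raw)
    span = (high - low) or 1.0  # avoid dividing by zero if all raw scores match
    return {i: round(100 * (r - low) / span, 1) for i, r in zip(ids, raw)}


# Made-up feature values for illustration only.
features = {
    "a1": {"duration_s": 1800, "n_answer_changes": 4},
    "b2": {"duration_s": 1750, "n_answer_changes": 5},
    "c3": {"duration_s": 1900, "n_answer_changes": 3},
    "d4": {"duration_s": 1820, "n_answer_changes": 4},
    "e5": {"duration_s": 300, "n_answer_changes": 0},
}

scores = unit_risk_scores(features)
riskiest = max(scores, key=scores.get)
```

In this toy data, interview `e5` is much shorter than the others and contains no answer changes, so it receives the highest combined score. A single number per interview is what makes the output easy to sort and review.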

To rigorously test RISSK’s effectiveness in identifying high-risk interviews, we conducted a survey experiment using both real and artificially created “fake” interviews.
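When the fake interviews are known in advance, one common way to summarize such an experiment is to check how many of the highest-scoring interviews are fakes. The sketch below computes that share (precision at k). It is an assumed evaluation metric shown for illustration, not necessarily the one used in the experiment, and the scores and IDs are made up rather than taken from any result.

```python
def precision_at_k(scores, fake_ids, k):
    """Share of the k highest-scoring interviews that are known fakes."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of interviews")
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = ranked[:k]
    return sum(1 for interview_id in top_k if interview_id in fake_ids) / k


# Made-up risk scores: "r*" are real interviews, "f*" are artificial ones.
scores = {"r1": 12.0, "r2": 30.5, "r3": 8.0, "f1": 91.0, "f2": 64.0, "f3": 25.0}
fake_ids = {"f1", "f2", "f3"}

top2 = precision_at_k(scores, fake_ids, 2)
top4 = precision_at_k(scores, fake_ids, 4)
```

A value close to 1 for small k would mean that reviewers who check only the top of the ranking mostly encounter fabricated interviews.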

RISSK has successfully identified interviewer misconduct in our surveys.

You can find the package and more details on GitHub.