Machine Learning Tool to Detect At-Risk Interviews
In line with efforts by the World Bank and the Joint Data Centre to strengthen data systems and fill data gaps, the World Bank sought a machine learning tool to detect irregularities or potential falsification in interview data collected through Survey Solutions.
rowsquared developed RISSK, an open-source package that analyzes unmodified Survey Solutions export files from any survey to identify interviews potentially containing interviewer misconduct.
The tool is generic, privacy-focused, and easily integrable into any data quality monitoring process.
To establish a solid foundation, we first conducted a comprehensive review of academic literature and similar efforts.
From the paradata and microdata in Survey Solutions export files, we extracted a wide range of generic features, such as interview timing and answering sequences.
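As a rough illustration of what a timing feature can look like, the sketch below derives a few per-interview values from a list of paradata-like events. It is not RISSK's code: the record layout (the keys interview_id, event, timestamp), the event name "AnswerSet" as used here, and the function name extract_features are assumptions made for the example, and the features RISSK actually computes are documented in the package itself.

```python
from collections import defaultdict
from datetime import datetime


def extract_features(events):
    """Derive simple timing features per interview.

    `events` is an iterable of dicts with the hypothetical keys
    "interview_id", "event" and "timestamp" (ISO 8601 string).
    """
    grouped = defaultdict(list)
    for row in events:
        grouped[row["interview_id"]].append(row)

    features = {}
    for interview_id, rows in grouped.items():
        times = sorted(datetime.fromisoformat(r["timestamp"]) for r in rows)
        # Count answer events; the event label is an assumption of this sketch.
        n_answers = sum(1 for r in rows if r["event"] == "AnswerSet")
        # Elapsed time between the first and last recorded event.
        duration = (times[-1] - times[0]).total_seconds()
        features[interview_id] = {
            "duration_seconds": duration,
            "n_answers": n_answers,
            # Undefined when no answers were recorded.
            "seconds_per_answer": duration / n_answers if n_answers else None,
        }
    return features
```

Each interview ends up with one row of features, which is the shape an interview-level anomaly score needs as input.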
Using machine learning algorithms, we detected anomalies in individual features and calculated scores, which we then combined into a single, easy-to-interpret unit risk score at the interview level.
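The idea of scoring each feature separately and then combining the results can be sketched as follows. This is a simplified stand-in, not the algorithms RISSK uses: it swaps in a robust z-score (distance from the median, scaled by the median absolute deviation) as the per-feature anomaly measure and averages the capped values into a 0 to 100 score. The function names, the cap of 5, and the equal weighting of features are all assumptions of the example.

```python
from statistics import median


def robust_z_scores(values):
    """Absolute robust z-score of each value (median/MAD based)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against: treat every value as unremarkable.
        return [0.0 for _ in values]
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return [abs(v - med) / (1.4826 * mad) for v in values]


def unit_risk_scores(feature_table, cap=5.0):
    """Combine per-feature anomaly scores into one 0-100 score per interview.

    `feature_table` maps interview id -> {feature name: numeric value};
    this sketch assumes every interview has every feature.
    """
    ids = list(feature_table)
    names = sorted({name for feats in feature_table.values() for name in feats})
    if not names:
        return {i: 0.0 for i in ids}
    totals = {i: 0.0 for i in ids}
    for name in names:
        column = [feature_table[i][name] for i in ids]
        for i, z in zip(ids, robust_z_scores(column)):
            # Cap each feature so a single extreme value cannot dominate.
            totals[i] += min(z, cap) / cap
    return {i: 100.0 * totals[i] / len(names) for i in ids}
```

Under these assumptions, an interview that is far from the typical value on many features receives a score near 100, while one that is typical on every feature scores near 0.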
To rigorously test RISSK’s effectiveness in identifying high-risk interviews, we conducted a survey experiment using both real and artificially created “fake” interviews.
RISSK has successfully identified interviewer misconduct in our surveys.
You can find the package and more details on GitHub.