Machine Learning Assisted Occupation Classification
Following the 2022 census, the Maldives Bureau of Statistics had to classify 242,000 occupation descriptions into ISCO (International Standard Classification of Occupations) and ISIC (International Standard Industrial Classification of All Economic Activities) codes. To address this, rowsquared developed a Machine Learning Assisted Classification (MLAC) system as part of our comprehensive census support.
We first trained a natural language processing (NLP) model on pre-classified observations from the 2019 Household Income and Expenditure Survey (HIES). We then applied this model to the census data to predict the ISCO and ISIC codes at all four levels of each classification.
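To illustrate what "all four levels" means for an occupation code, here is a minimal sketch that assumes the ISCO-08 convention, in which each level is a one-digit-longer prefix of the four-digit unit group code. The function name `isco_levels` and the level keys are hypothetical and chosen for this example only. ISIC would need a separate mapping, since its top level is a lettered section rather than a digit prefix.

```python
def isco_levels(code: str) -> dict[str, str]:
    """Split a four-digit ISCO code into its four hierarchical levels.

    Assumes ISCO-08 style codes, where each level is a prefix of the next.
    """
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a four-digit code, got {code!r}")
    names = ("major_group", "sub_major_group", "minor_group", "unit_group")
    # Level k is simply the first k digits of the unit group code.
    return {name: code[:k] for k, name in enumerate(names, start=1)}


levels = isco_levels("2221")
print(levels)
```

A model that predicts the full four-digit code therefore implies a prediction at every coarser level, which makes it possible to accept a confident two-digit prediction even when the fourth digit is uncertain.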
A key challenge was that the occupation descriptions were mixed-language, often written in non-standardized Roman spellings of Dhivehi and English, and varied in quality. Our system was designed to handle this variation in spelling and language.
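One common way to make a text classifier tolerant of spelling variation is to compare character n-grams rather than whole words. The sketch below is not the MLAC model itself, whose architecture is not described here. It is a toy nearest-neighbour classifier over character trigrams, with made-up training examples and codes used purely for illustration.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lower-cased, whitespace-normalized string."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict(description: str, training: list[tuple[str, str]]) -> tuple[str, float]:
    """Return the code of the most similar training description and its score."""
    query = char_ngrams(description)
    scored = [(cosine(query, char_ngrams(text)), code) for text, code in training]
    score, code = max(scored)
    return code, score


# Toy training data (hypothetical): description -> four-digit code.
TRAINING = [
    ("primary school teacher", "2341"),
    ("taxi driver", "8322"),
    ("shop sales assistant", "5223"),
]

print(predict("primry scool techer", TRAINING))
print(predict("taxy drivr", TRAINING))
```

Because misspelled variants still share most of their trigrams with the standard spelling, they land close to the right training example. The similarity score can also serve as a rough confidence signal for routing low-scoring entries to human review.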
To ensure accuracy, we developed a user-friendly online labeling tool for human reviewers. The tool displayed key information for each entry and allowed reviewers to confirm or correct the predicted codes. It also included an interface for reviewing previously classified descriptions, so that similar entries were coded consistently.
For descriptions that human reviewers were able to classify, our model's predictions matched the reviewers' codes at a high rate. More importantly, our approach led to significant improvements in classification consistency, speed, and resource efficiency compared to traditional manual coding methods.
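A match rate of this kind can be computed separately for each of the four levels by comparing code prefixes. The sketch below shows the arithmetic on made-up pairs of predicted and human-assigned codes. The data and the function name `match_rates` are hypothetical and do not reflect the project's actual results.

```python
def match_rates(pairs: list[tuple[str, str]]) -> dict[int, float]:
    """Share of predictions that agree with the human code at each level (1 to 4).

    Level k counts a match when the first k digits of both codes are equal.
    """
    if not pairs:
        return {}
    matches = {k: 0 for k in range(1, 5)}
    for predicted, human in pairs:
        for k in matches:
            if predicted[:k] == human[:k]:
                matches[k] += 1
    return {k: count / len(pairs) for k, count in matches.items()}


# Toy (predicted, human) pairs, invented for illustration only.
PAIRS = [
    ("2341", "2341"),  # agrees at all four levels
    ("2341", "2342"),  # agrees down to level 3
    ("8322", "8331"),  # agrees down to level 2
    ("5223", "9112"),  # disagrees at every level
]

for level, rate in match_rates(PAIRS).items():
    print(f"level {level}: {rate:.0%}")
```

Reporting the rate per level makes it clear where disagreements concentrate: a prediction that misses the unit group but matches the minor group is a much smaller error than one that lands in the wrong major group.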
The project showcased rowsquared’s expertise in applying advanced data science techniques to practical challenges in survey and census operations. It also highlighted our ability to develop scalable solutions that can be adapted for various classification needs across different contexts and languages.