SOEPpapers 1230, 29 pp.
Daniel Graeber, Lorenz Meister, Carsten Schröder, Sabine Zinn
2025
Download (PDF, 477 KB)
Machine learning is increasingly used in social science research, especially for prediction. However, its results are sometimes less straightforward to interpret than those of classic regression models. In this paper, we address this trade-off in two ways. First, we compare the predictive performance of random forests and logit regressions in analyzing labor market vulnerabilities during the COVID-19 pandemic. Second, we use a global surrogate model to improve our understanding of the complex dynamics involved. Our study shows that, especially in the presence of non-linearities and feature interactions, random forests outperform logit regressions in both predictive accuracy and interpretability. They also yield policy-relevant insights into the vulnerable groups affected by labor market disruptions.
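The abstract names a global surrogate model as its interpretability tool. The general idea is to fit a simple, readable model (here, a shallow decision tree) to the *predictions* of the black-box model rather than to the true outcomes. You then report fidelity, meaning the share of cases on which the surrogate reproduces the black box.

The sketch below illustrates that idea only. It is not the authors' code, data, or specification. Several parts are hypothetical assumptions:

* The binary features `young`, `low_wage`, `can_telework` and `hospitality`.
* The hand-written rule `black_box`, which stands in for a fitted random forest and contains a feature interaction.
* The helper functions, which are a minimal greedy Gini tree in plain Python.

```python
import itertools
from collections import Counter

# Hypothetical binary features; not the variables used in the paper.
FEATURES = ["young", "low_wage", "can_telework", "hospitality"]

def black_box(x):
    # Hypothetical stand-in for a fitted random forest: a worker is flagged as
    # vulnerable via interactions (hospitality AND no telework, or young AND low wage).
    return int((x["hospitality"] == 1 and x["can_telework"] == 0)
               or (x["young"] == 1 and x["low_wage"] == 1))

def gini(labels):
    # Gini impurity of a list of 0/1 labels.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_tree(rows, labels, depth, max_depth):
    # Greedy depth-limited tree on binary features; a leaf is a 0/1 label,
    # an internal node is a tuple (feature, subtree_if_0, subtree_if_1).
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in FEATURES:
        left = [i for i, r in enumerate(rows) if r[f] == 0]
        right = [i for i, r in enumerate(rows) if r[f] == 1]
        if not left or not right:
            continue  # split would not separate anything
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority(labels)
    _, f, left, right = best
    return (f,
            fit_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            fit_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth))

def predict(tree, x):
    # Walk down the tree until a leaf label is reached.
    while isinstance(tree, tuple):
        f, if_zero, if_one = tree
        tree = if_one if x[f] == 1 else if_zero
    return tree

def fidelity(tree, rows, targets):
    # Share of rows where the surrogate agrees with the black-box prediction.
    return sum(predict(tree, r) == t for r, t in zip(rows, targets)) / len(rows)

# All 16 combinations of the four binary features.
rows = [dict(zip(FEATURES, bits)) for bits in itertools.product([0, 1], repeat=len(FEATURES))]
# The surrogate is trained on the black box's output, not on observed outcomes.
bb_preds = [black_box(r) for r in rows]

for max_depth in range(1, 5):
    surrogate = fit_tree(rows, bb_preds, 0, max_depth)
    print(f"max_depth={max_depth}: fidelity={fidelity(surrogate, rows, bb_preds):.2f}")
```

The depth limit controls the trade-off between readability and fidelity. A one-split stump cannot reproduce a rule that depends on interactions. A deeper tree can match the black box more closely, at the cost of being harder to read. In applied work, a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally replace the hand-written tree.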
JEL Classification: C45; C53; C25; J08; I18; C83; J21
Keywords: Machine learning, interpretability, labor market, random forests