This online seminar will be held via Cisco Webex. You will receive the login details with the invitation to the talk.
Abstract: Large-scale personal data collection for the purpose of personalized predictions has been driven by high expectations of efficiency gains in many business and policy settings. Yet quantifying the trade-off between the costs of linking disconnected silos of personal data, for example in the form of privacy concerns or potential consumer exploitation, and data-enabled welfare improvements has moved to the center of the debate on regulation and policy intervention in digital markets. We investigate the benefits of data combination in a typical health care provision context, the diagnostic prediction of urinary tract infections. We draw on rich linked personal data from Denmark to train a state-of-the-art machine learning algorithm on various data combinations. Based on the algorithm's predictions, we quantify the incremental gains in prediction quality and prediction-based counterfactual policy outcomes due to data combination. We find that combining personal non-health data with health data significantly improves prediction accuracy across a range of measures. Policy outcomes, measured as reductions in antibiotic prescribing while keeping the number of correctly treated bacterial infections fixed, range between 0.56 and 10.22 percent.
Joint work with Michael Allan Ribers and Hannes Ullrich (DIW Berlin and University of Copenhagen).