Seminar - Statistics and Risk Management

Consequences of model misspecifications in synthetic data analysis

by Presidential Research Professor Bimal Sinha, Department of Math/Stat, UMBC & Center for Statistical Research and Methodology (CSRM), US Census Bureau

Date: 5 December 2016

Time: 14:00

Venue: Seminar Room (Sala de Seminários), Building VII (Edifício VII), FCT-UNL

Abstract: 

Creating appropriate synthetic data to hide sensitive features of an original data set, and analyzing those data validly, are important tasks in many government agencies, including the US Census Bureau. Synthetic data generation is often model based, and the assumed model may well differ from the true underlying model that yielded the observed data. In this talk I will first explain how singly imputed synthetic data can be created and analyzed. Next I will explain some consequences of model misspecification at three stages of data analysis: the model that generated the original data, the imputer's assumption about the model, and the data user's view of the model. I will illustrate with a multiple linear regression model applied to a publicly available real data set.

Funded through project UID/MAT/00297/2013.