Conducting a Survey: New Forms of Data in Survey Research
Utrecht University
Camilla Salvatore
Summary
This document discusses conducting surveys and the use of new forms of data, such as digital trace data, in survey research. It examines the transition from traditionally designed data collection methods to more organic forms of data, along with specific examples and potential implications for research.
Full Transcript
WEEK 7
Conducting a Survey: new forms of data in survey research
Camilla Salvatore

Intro

Survey research
Groves, 2011: The Three Survey Eras
○ Era 1 (1930–1960), Era of Invention: area probability sampling; face-to-face & mail surveys
○ Era 2 (1960–1990), Era of Expansion: random digit dialing (RDD) probability sampling; computer-assisted telephone surveys
○ Era 3 (1990–present?), “Designed Data” Supplemented by “Organic Data”: non-probability sampling; computer-assisted (online) surveys; surveys linked to big data sources
Driver across the eras: changes in society & technology

RECAP: nonprobability sample surveys
○ Online non-prob panels: database of potential respondents who declare they will cooperate in future data collection if selected
○ Quota sampling: specific characteristics (quotas) of the population are represented proportionally within the sample
○ River sampling: website visitors are invited to take a survey immediately (e.g. via pop-up windows)
○ Recruitment via targeted advertisement: e.g. Facebook, Twitter, etc.
Problems: based on volunteers; part of the population might be systematically excluded

Two types of data: digital trace data

New data sources: digital trace data
How to collect these data?
○ Directly from the web: API, web scraping, etc.
○ From survey participants:
   - Data donation: web browsing; Strava and other running apps; Facebook, Google, etc.
   - Designed big data and smart surveys: data collected within or after a survey (apps, browsers)

Example: Data donation
Boeschoten et al. 2020, https://doi.org/10.48550/arXiv.2011.09851

Introducing “design” to digital trace data
Example with sensor data

Privacy concerns
Participants might have concerns about potential risks related to sensor data:
○ Data streams could be intercepted by an unauthorized party
○ Connecting multiple streams of data could re-identify previously anonymous users
○ Information could be used to impact credit, employment, or insurability
Higher privacy concerns correlate with lower willingness to participate (Keusch, Struminskaya, et al. 2019; Revilla et al. 2019; Struminskaya et al. 2021; Wenz et al. 2019)

Privacy concerns: hypothetical willingness from the LISS Panel
[Figure: Hypothetical willingness to share sensor data. Panels: Share GPS location, Share video, Share photo house, Share photo self. x-axis: def. yes, prob. yes, prob. no, def. no; y-axis: percent. Randomized order of sensor measurements. n = 2,678 Dutch smartphone users. Source: Keusch, Struminskaya, Kreuter, Weichbold (2021)]

CBS Consent Survey: Willingness
[Figure: Willingness to share sensor data, bar chart in percent. Categories: GPS, Video, Photo house, Photo receipt, Photo self. Bar values as extracted: 66.5, 15.7, 12.4, 18.7, 14.5. n = 1,883 Dutch smartphone & tablet users. Source: Struminskaya et al. (2018)]

CBS Consent Survey: Actual participation
[Figure: Actual sharing of sensor data conditional on willingness, bar chart in percent. Categories: GPS, Video, Photo house, Photo receipt, Photo self. Bar values as extracted: 68.6, 100.0, 100.0, 100.0, 100.0. n = 1,883 smartphone & tablet users]

Examples of new forms of data

Smartphone sensors
NFC, Bluetooth, air humidity sensor, thermometer, proximity sensor, Wi-Fi, GPS, microphone, cellular network, light sensor, fingerprint sensor, camera, barometer, compass, accelerometer, pedometer, gyroscope
The sensors are grouped by what they capture: proximity, location, ambience, physical activity.

More on sensors
Examples of use of sensor data: acceleration data (smartphones)
○ SurveyMotion (JavaScript-based): total acceleration, completion behavior
○ Fitness tasks: squats
Taken from Höhne & Schlosser (2019) and Elevelt, Höhne, & Blom (2019)

More on sensors
Examples of use of sensor data: acceleration data (wearables)
○ Wrist-worn GENEActiv
○ Axivity AX3 at upper thigh
○ (Total) physical activity
UK Millennium Cohort Study: Gilbert & Calderwood (2018)
SHARE: Scherpenzeel, Angleys, & Weiss (2018)

More on sensors
Examples of use of sensor data: purchases
Several organizations are using cameras to scan receipts (Understanding Society, official statistics)
○ Work by Wenz, Jäckle in Understanding Society (UK): https://www.understandingsociety.ac.uk/research/publications/subject/Information+And+Communication+Technologies
○ Work by Rodenburg, Schouten, Struminskaya (NL, CBS)

App data
The Tabi app (with Statistics Netherlands)
○ Travel mode and history
○ Open source: https://gitlab.com/tabi/tabi-app

Some examples of studies using sensors
Topic | Type of sensor | Reference
Social networks | GPS, encrypted call logs, proximity | Eagle et al. 2009
Spatial segregation | GPS | Palmer et al. 2013
Urbanicity | GPS, camera (photos) | Saad-Sulonen 2008; Jones et al. 2011
Mobility | GPS | Wiehe et al. 2008
Mobility | GPS | Giannotti & Rinzivillo 2014
Mobility | GPS, accelerometer (app) | Geurs et al. 2015
Physical activity | GPS (app) | Althoff et al. 2017
Physical activity | wearables: GPS + accelerometer | Rosli et al. 2013
Physical activity | wearables: accelerometers | Scherpenzeel 2017
Special populations: hard-to-reach | GPS, call logs | Sugie 2016
Special populations: small children | camera (photos) | Plowman & Stevenson 2012
Special populations: older minorities | GPS, camera (photos) | Fritz et al. 2017
Media use | passive measurement apps (meter) | Boase & Ling 2013
Online behavior | passive measurement apps (meter) | Revilla et al. 2016

Some examples of studies collecting biomeasures
Survey | Country | Type of data
The LISS Panel | NL | Blood and saliva samples
The Survey of Health, Ageing and Retirement in Europe (SHARE) | DE | Dried blood spots, lung strength, grip strength, walking speed, chair stand
Health and Retirement Study (HRS) | USA | Dried blood spots, saliva, breathing test, grip strength, blood pressure, timed walk, pulse
National Social Life, Health, and Aging Project (NSHAP) | USA | Waist & hip circumference, blood pressure, pulse, timed get-up-and-go, timed walk, saliva, blood spots, urine, vaginal swabs
The National Health and Nutrition Examination Survey (NHANES) | USA | Blood pressure, extensive anthropometrics, vision, hearing, oral health, timed walk, balance test, hair sample & more
The Irish Longitudinal Study of Ageing (TILDA) | IE | Blood pressure, heart rate, pulse, grip strength, timed get-up-and-go, walking speed, hip & waist circumference, visual acuity, contrast sensitivity, bone density & more (at health centers & respondent's home)

Some examples of studies using linkage to administrative data
Survey | Country | Type of data
The Longitudinal Internet Studies for the Social Science (LISS) | NL | Various Statistics Netherlands (CBS) records (old age pension benefits, savings, health)
The Survey of Health, Ageing and Retirement in Europe (SHARE) | DE | Social security data
Labour Market and Social Security (PASS) | DE | Social security data
Health and Retirement Study (HRS) | USA | Social security, Medicare claims
Panel Study of Income Dynamics (PSID) | USA | Social security, Medicare claims
Understanding Society – The UK Household Longitudinal Study | UK | Tax and benefit records, education, health records

Break

Do you trust the conclusions of these studies?
Consider the following examples:
Case 1: A researcher wants to estimate the percentage of individuals (18-80 years old) in Germany who are worried about climate change. They decide to analyze tweets written in German that contain words related to climate change. They conclude that 57% of Germans are worried.
Case 2: A researcher wants to study social isolation among young people (15-18 years old) and decides to advertise a survey on Instagram. The result is that 23% of young people are suffering from social isolation.

Shall we use social media as a replacement for official statistics?
○ Sentiment in Twitter data replicates consumer confidence and presidential job approval from surveys
○ The social media (SM) Job-Loss index aligns with the Department of Labor’s Initial Claims for Unemployment Insurance

Shall we use social media as a replacement for official statistics?
Initially: good alignment between measures, good candidate for replacement
But then:
○ Conrad et al. (2015) replicated O’Connor et al. (2010): degradation of the relationship between SM and traditional indexes after 2011
○ The SM Job Loss index (Antenucci et al., 2014) began to diverge from the actual unemployment claims starting in mid-2014

Shall we use social media as a replacement for official statistics?
○ Conrad et al. (2021), after several experiments on the original O’Connor et al. (2010) analysis, conclude that the relationship between the data was “more than a chance occurrence”
○ Micro-decisions in the analysis can strongly affect the results
○ Pasek et al. (2018) argue that at the current time SM data may “only be fit for purpose in replacing survey data under very limited conditions”

And what about non-probability sample surveys?

What could be the reasons for these inferential "failures"?
Some examples:
○ Measurement problems
○ Selection problems

What could be the reasons for these "failures"? A sampling perspective
○ Selection: from the population to the sample
○ Inference: from the sample back to the population
○ Sample in Dutch: steekproef
○ Analogies: coffee beans, soup
○ Stirring and tasting: randomization / random selection

Probability vs nonprobability sample surveys
Probability samples (PS) | Non-probability samples (NPS)
Allow inferences to the general population | Drawing inference is hard or not possible
High data quality | More affordable, timely, convenient, new aspects of phenomena
Rely on sampling theory | No unified inferential framework
Design/model-based inference | Unknown selection mechanism: self-selection → selection bias (SB)
Falling response rates, time-consuming, expensive | Diverse: NPS surveys, digital traces

Probability surveys vs digital trace data
 | Surveys | Digital trace data
PRO | “Designed” data: collected for research purposes | Large N
PRO | Researcher control over content | Possibly lower measurement error because data are not self-reported
PRO | Large number of covariates | Possibly more granular (time and space)
PRO | Detailed documentation of the data generating process |
CONS | High nonresponse | “Organic” data: collected for purposes other than research
CONS | Small N | No control over content
CONS | Measurement error (recall, social desirability) | Limited number of covariates
CONS | | No / little documentation
CONS | | Access issues (missingness & coverage)
CONS | | Other types of measurement error
(based on Baker 2018, Groves 2011, Sakshaug 2015, Salganik 2018)

Probability samples are the gold standard, but they are expensive and suffer from low response rates.
NPS surveys and digital traces are more convenient and sometimes more detailed.
What can we do if none of the sources is perfect?
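The selection problem behind Case 1 and Case 2 can be made concrete with a small simulation. The sketch below is not part of the lecture: the population, the 30% share of young people, the attitude rates, and the participation probabilities are all invented for illustration. It compares a simple random sample, a self-selected sample, and the same self-selected sample reweighted to known population shares of one auxiliary variable (post-stratification, used here as one simple example of an adjustment).

```python
import random


def simulate(seed=7, n_pop=100_000, n_srs=2_000):
    """Toy comparison of a probability sample and a self-selected sample.

    All numbers are hypothetical assumptions, not estimates from any study.
    """
    rng = random.Random(seed)

    # Hypothetical population: 30% young; young people are more often worried.
    population = []
    for _ in range(n_pop):
        young = rng.random() < 0.30
        worried = rng.random() < (0.70 if young else 0.40)
        population.append((young, worried))
    true_share = sum(w for _, w in population) / n_pop

    # Probability sample: every unit has the same known inclusion probability.
    srs = rng.sample(population, n_srs)
    srs_est = sum(w for _, w in srs) / n_srs

    # Self-selected sample: young people are assumed to be ten times more
    # likely to end up in the data (e.g. platform users, volunteers).
    volunteers = [
        (y, w) for y, w in population
        if rng.random() < (0.10 if y else 0.01)
    ]
    naive_est = sum(w for _, w in volunteers) / len(volunteers)

    # Post-stratification: group means from the volunteers, weighted by the
    # population share of each group (assumed known, e.g. from a register).
    pop_young = sum(y for y, _ in population) / n_pop
    young_resp = [w for y, w in volunteers if y]
    old_resp = [w for y, w in volunteers if not y]
    mean_young = sum(young_resp) / len(young_resp)
    mean_old = sum(old_resp) / len(old_resp)
    post_est = pop_young * mean_young + (1 - pop_young) * mean_old

    return {
        "true": true_share,
        "srs": srs_est,
        "self_selected": naive_est,
        "post_stratified": post_est,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>16}: {value:.3f}")
```

In this toy setup the reweighting works only because participation depends on age alone and the population age shares are known. If participation also depended on the attitude itself within each age group, weighting by age would leave bias in place.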
Combining data sources might be the solution

Two principles of Data Integration (DI)
1. DI is statistics and purpose specific
2. DI is a puzzle (also in terms of data quality)

How do we integrate data? Why data integration?
○ Finite population and analytic inference
○ Structured and unstructured data
○ Variables are available in all sources or only in some
○ One source is used as a supplement or to correct for selection bias
○ Combining different sources to improve measurement

A quality perspective
Total Survey Error … for digital traces

Exercise

(New) forms of behavioral data: an error perspective
Let us consider this example: can we study physical activity using passive data?
Althoff, T., Hicks, J. L., King, A. C., Delp, S. L., & Leskovec, J. (2017). Large-scale physical activity data reveal worldwide activity inequality. Nature, 547(7663), 336-339.

(New) forms of behavioral data: an error perspective
GROUP EXERCISE
Which errors should we consider?
[Figure: Total Survey Error framework (Groves et al. 2004). Measurement side: Construct, Measurement, Response, Edited Response. Representation side: Target Population, Sampling Frame, Sample, Respondents, Postsurvey Adjustments. Both sides lead to the Survey Statistic. Error sources in the framework: validity, measurement error, processing error, coverage error, sampling error, nonresponse error, adjustment error.]

Your solution (group 1):
○ Coverage error: not everyone has an Apple phone (and so does not have the app)
○ Measurement error: people report weight and height themselves and can lie about this
○ Standard deviation is very high

Your solution (group 2):
○ Only iPhone users are included
○ App users might not be representative of the general population
○ Do steps represent physical activity? (e.g., lifting weights, biking)
○ Is each country proportionally represented?
○ Is the technology working properly? (accelerometer, IP addresses)
○ Self-report error for demographic information
○ No information on nonresponse
○ Non-recorded days

(New) forms of behavioral data: an error perspective
Which errors should we consider? (some ideas)
○ Coverage error: only iPhone users
○ Sampling error: only volunteers
○ Measurement error: device malfunction / out of power
○ Nonresponse error: only those who downloaded the app
○ Adjustment error: population characteristics known?
(Framework: Groves et al. 2004)

(New) forms of behavioral data: additional reading
Struminskaya, B., Lugtig, P., Toepoel, V., Schouten, B., Giesen, D., Dolmans, R. (2021). Sharing data collected with smartphone sensors: Willingness, participation, and non-participation bias. Public Opinion Quarterly 85, 423-462. https://doi.org/10.1093/poq/nfab025
Struminskaya, B., Toepoel, V., Lugtig, P., Haan, M., Luiten, A., Schouten, B. (2020). Understanding willingness to share smartphone-sensor data. Public Opinion Quarterly 84(3), 725-759. https://doi.org/10.1093/poq/nfaa044
Struminskaya, B., Lugtig, P., Keusch, F., Höhne, J. (2020). Augmenting Surveys with Data from Sensors and Apps: Opportunities and Challenges. Social Science Computer Review, Special Issue ‘Using Mobile Apps and Sensors in Surveys’. https://doi.org/10.1177/0894439320979951
Struminskaya, B. & Keusch, F. (2020). From web surveys to mobile web to apps, sensors, and digital traces. Survey Methods: Insights from the Field, Special Issue ‘Advancements in Online and Mobile Survey Methods’. https://doi.org/10.13094/SMIF-2020-00015

Next week: in-class quiz, similar in structure to the exam