Methods for combining probability and nonprobability samples under unknown overlaps

Savitsky, Terrance D.; Williams, Matthew R.; Gershunskaya, Julie; Beresovsky, Vladislav

doi:10.59170/stattrans-2023-061

Article details

Journal

Statistics in Transition new series

2023 | 24 | 5 | 1-34

Authors

Terrance D. Savitsky, Matthew R. Williams, Julie Gershunskaya, Vladislav Beresovsky

Content

Abstracts

EN

Nonprobability (convenience) samples are increasingly sought as a way to increase the effective sample size, and thereby reduce the estimation variance, for one or more population variables of interest that are estimated from a randomized survey (reference) sample. Estimating a population quantity from a convenience sample alone will typically result in bias, since the distribution of the variables of interest in the convenience sample differs from the population distribution. A recent set of approaches estimates inclusion probabilities for convenience sample units by specifying reference sample-weighted pseudo-likelihoods. As its main result, this paper introduces a novel approach that derives the propensity score for the observed sample as a function of the inclusion probabilities for the reference and convenience samples. This approach allows a likelihood to be specified directly for the observed sample, as opposed to an approximate or pseudo-likelihood. We construct a Bayesian hierarchical formulation that simultaneously estimates the sample propensity scores and the convenience sample inclusion probabilities. We use a Monte Carlo simulation study to compare our likelihood-based results with the pseudo-likelihood-based approaches considered in the literature.
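The abstract does not spell out the functional form of the propensity score, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the two samples are drawn independently and stacked into one observed sample (a unit selected into both appears twice), in which case the chance that a stacked record comes from the convenience sample is pi_c / (pi_c + pi_r). The function names and the numeric values are hypothetical.

```python
def stacked_propensity(pi_c, pi_r):
    """Probability that a record in the stacked sample came from the
    convenience sample, given inclusion probabilities pi_c (convenience)
    and pi_r (reference). Assumed form, for illustration only."""
    return pi_c / (pi_c + pi_r)


def convenience_inclusion(pi_z, pi_r):
    """Invert the relation above: recover pi_c from the stacked-sample
    propensity pi_z and a known reference inclusion probability pi_r."""
    return pi_r * pi_z / (1.0 - pi_z)


# Hypothetical values: the reference design includes a unit with
# probability 0.02, the convenience mechanism with probability 0.06.
pi_r = 0.02
pi_c = 0.06

pi_z = stacked_propensity(pi_c, pi_r)
recovered = convenience_inclusion(pi_z, pi_r)
```

Under this assumed form, a model fitted to the stacked-sample indicator yields an estimate of pi_z, and the known reference inclusion probabilities then convert it into an estimate of the convenience sample inclusion probability.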

Keywords

EN

Survey sampling; Nonprobability sampling; Data combining; Inclusion probabilities; Exact sample likelihood; Bayesian hierarchical modeling

Publisher

Główny Urząd Statystyczny

Journal

Statistics in Transition new series

Year

2023

Volume

24

Issue

5

Pages

1-34

Dates

published

2023

Contributors

author

Terrance D. Savitsky

Office of Survey Methods Research, U.S. Bureau of Labor Statistics

https://orcid.org/0000-0003-1843-3106

author

Matthew R. Williams

RTI International

https://orcid.org/0000-0001-8894-1240

author

Julie Gershunskaya

OEUS Statistical Methods Division, U.S. Bureau of Labor Statistics

https://orcid.org/0000-0002-0096-186X

author

Vladislav Beresovsky

Office of Survey Methods Research, U.S. Bureau of Labor Statistics

https://orcid.org/0009-0002-8375-5195

References

Beaumont, J.-F., (2020). Are probability surveys bound to disappear for the production of official statistics? Survey Methodology, 46, 1–28.
Beresovsky, V., (2019). On application of a response propensity model to estimation from web samples. https://www.researchgate.net/publication/333915871_On_application_of_a_response_propensity_model_to_estimation_from_web_samples.
Bhattacharya, A., D. Pati, and Y. Yang, (2019). Bayesian fractional posteriors. The Annals of Statistics, 47(1), 39–66.
Binder, D. A., (1996). Taylor linearization for single phase and two phase samples: A cookbook approach. Survey Methodology, 17–26.
Carvalho, C. M., N. G. Polson, and J. G. Scott (2009, 16–18 Apr). Handling sparsity via the horseshoe. In D. van Dyk and M. Welling (Eds.), Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, Volume 5 of Proceedings of Machine Learning Research, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, pp. 73–80. PMLR.
Chen, Y., P. Li, and C. Wu, (2020). Doubly robust inference with nonprobability survey samples. Journal of the American Statistical Association, 115(532), 2011–2021.
DiSogra, C., C. Cobb, E. Chan, and J. M. Dennis (2011). Calibrating nonprobability internet samples with probability samples using early adopter characteristics. JSM Proceedings, Survey Research Methods Section, Alexandria, VA: American Statistical Association, pp. 4501–4515.
Elliott, M. R., (2009). Combining data from probability and non-probability samples using pseudo-weights. Survey Practice 2, 813–845.
Elliott, M. R. and R. Valliant, (2017). Inference for Nonprobability Samples. Statistical Science, 32(2), 249–264.
Gelman, A., D. Lee, and J. Guo, (2015). Stan: A probabilistic programming language for Bayesian inference and optimization. In press. Journal of Educational and Behavioral Statistics.
Johnson, N. G., M. R. Williams, and E. C. Riordan, (2021). Generalized nonlinear models can solve the prediction problem for data from species-stratified use-availability designs. Diversity and Distributions, 27(11), 2077–2092.
Lancaster, T. and G. Imbens, (1996). Case-control studies with contaminated controls. Journal of Econometrics, 71(1-2), 145–160.
Leon-Novelo, L. G. and T. D. Savitsky, (2019). Fully Bayesian estimation under informative sampling. Electronic Journal of Statistics, 13(1), 1608–1645.
Reiter, J. P. and T. E. Raghunathan, (2007). The multiple adaptations of multiple imputation. Journal of the American Statistical Association, 102(480), 1462–1471.
Tillé, Y. and A. Matei, (2021). sampling: Survey Sampling. R package version 2.9.
Valliant, R., (2020). Comparing alternatives for estimation from nonprobability samples. Journal of Survey Statistics and Methodology, 8(2), 231–263.
Valliant, R. and J. A. Dever, (2011). Estimating propensity adjustments for volunteer web surveys. Sociological Methods and Research, 40, 105–137.
Wang, L., R. Valliant, and Y. Li, (2021). Adjusted logistic propensity weighting methods for population inference using nonprobability volunteer-based epidemiologic cohorts. Stat Med., 40(4), 5237–5250.
Williams, M. R. and T. D. Savitsky, (2021). Uncertainty Estimation for Pseudo-Bayesian Inference Under Complex Sampling. International Statistical Review, 89(1), 72–107.
Wu, C., (2022). Statistical inference with non-probability survey samples. Survey Methodology, 48(2), 283–311.

Identifiers

DOI

10.59170/stattrans-2023-061

Biblioteka Nauki

31342142

YADDA identifier

bwmeta1.element.ojs-doi-10_59170_stattrans-2023-061
