2023 | 24 | 5 | 1-34

Article title

Methods for combining probability and nonprobability samples under unknown overlaps

Abstracts

EN
Nonprobability (convenience) samples are increasingly sought to increase the effective sample size and thereby reduce the variance of estimates of one or more population variables of interest that would otherwise be estimated from a randomized survey (reference) sample alone. Estimates of population quantities derived from a convenience sample will typically be biased, because the distribution of the variables of interest in the convenience sample differs from their population distribution. A recent set of approaches estimates inclusion probabilities for convenience sample units by specifying reference-sample-weighted pseudo likelihoods. As our main result, this paper introduces a novel approach that derives the propensity score for the observed sample as a function of the inclusion probabilities for the reference and convenience samples. Our approach allows a likelihood to be specified directly for the observed sample, as opposed to an approximate or pseudo likelihood. We construct a Bayesian hierarchical formulation that simultaneously estimates the sample propensity scores and the convenience sample inclusion probabilities. We use a Monte Carlo simulation study to compare our likelihood-based results with the pseudo-likelihood-based approaches considered in the literature.
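For readers unfamiliar with the pseudo-likelihood approaches the abstract contrasts against, the Python sketch below illustrates one of them: the reference-weighted pseudo log-likelihood of Chen, Li and Wu (2020) with a logistic model for convenience-sample inclusion probabilities, followed by an inverse-probability-weighted (Hájek) mean. It is not the authors' likelihood-based Bayesian method. The simulated population, the logistic selection model, the random seed and the function names (`logistic`, `pseudo_loglik`, `fit_convenience_propensity`) are hypothetical choices made for illustration; only NumPy is assumed.

```python
import numpy as np


def logistic(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def pseudo_loglik(theta, x_c, x_r, d_r):
    """Reference-weighted pseudo log-likelihood (Chen, Li and Wu 2020 form)
    for a logistic model of convenience-sample inclusion probabilities:
    sum_{i in S_c} x_i'theta - sum_{j in S_R} d_j * log(1 + exp(x_j'theta))."""
    return np.sum(x_c @ theta) - d_r @ np.logaddexp(0.0, x_r @ theta)


def fit_convenience_propensity(x_c, x_r, d_r, max_iter=100, tol=1e-8):
    """Maximize the pseudo log-likelihood by Newton-Raphson with step halving."""
    theta = np.zeros(x_c.shape[1])
    for _ in range(max_iter):
        p_r = logistic(x_r @ theta)
        score = x_c.sum(axis=0) - x_r.T @ (d_r * p_r)
        hessian = -(x_r * (d_r * p_r * (1.0 - p_r))[:, None]).T @ x_r
        step = np.linalg.solve(hessian, score)  # ascent direction is -step
        current = pseudo_loglik(theta, x_c, x_r, d_r)
        t = 1.0
        while t > 1e-8 and pseudo_loglik(theta - t * step, x_c, x_r, d_r) < current:
            t *= 0.5  # halve the step until the objective does not decrease
        theta = theta - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return theta


# Hypothetical simulated population; all settings below are illustrative.
rng = np.random.default_rng(2023)
N = 20_000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(scale=0.5, size=N)
X = np.column_stack([np.ones(N), x])

# Convenience sample: Poisson sampling whose inclusion probability rises with x.
in_c = rng.random(N) < logistic(-3.0 + 0.8 * x)

# Reference sample: simple random sample without replacement, weights N / n_r.
n_r = 1_000
idx_r = rng.choice(N, size=n_r, replace=False)
d_r = np.full(n_r, N / n_r)

theta_hat = fit_convenience_propensity(X[in_c], X[idx_r], d_r)
pi_c_hat = logistic(X[in_c] @ theta_hat)

true_mean = y.mean()
naive_mean = y[in_c].mean()  # ignores selection, so biased upward here
ipw_mean = np.sum(y[in_c] / pi_c_hat) / np.sum(1.0 / pi_c_hat)  # Hajek estimator
print("theta_hat:", theta_hat)
print("true / naive / IPW mean:", true_mean, naive_mean, ipw_mean)
```

Under these assumed settings, units with larger x (and hence larger y) are over-represented in the convenience sample, so the naive mean is pulled upward, while the weighted estimate should land much closer to the population mean; the exact numbers depend on the seed.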

Year

2023

Volume

24

Issue

5

Pages

1-34

Dates

published
2023

Contributors

  • Office of Survey Methods Research, U.S. Bureau of Labor Statistics
  • RTI International
  • OEUS Statistical Methods Division, U.S. Bureau of Labor Statistics
  • Office of Survey Methods Research, U.S. Bureau of Labor Statistics

References

  • Beaumont, J.-F., (2020). Are probability surveys bound to disappear for the production of official statistics? Survey Methodology, 46, 1–28.
  • Beresovsky, V., (2019). On application of a response propensity model to estimation from web samples. https://www.researchgate.net/publication/333915871_On_application_of_a_response_propensity_model_to_estimation_from_web_samples.
  • Bhattacharya, A., D. Pati, and Y. Yang, (2019). Bayesian fractional posteriors. The Annals of Statistics, 47(1), 39–66.
  • Binder, D. A., (1996). Taylor linearization for single phase and two phase samples: A cookbook approach. Survey Methodology, 17–26.
  • Carvalho, C. M., N. G. Polson, and J. G. Scott, (2009, 16–18 Apr). Handling sparsity via the horseshoe. In D. van Dyk and M. Welling (Eds.), Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, Volume 5 of Proceedings of Machine Learning Research, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, pp. 73–80. PMLR.
  • Chen, Y., P. Li, and C. Wu, (2020). Doubly robust inference with nonprobability survey samples. Journal of the American Statistical Association, 115(532), 2011–2021.
  • DiSogra, C., C. Cobb, E. Chan, and J. M. Dennis, (2011). Calibrating nonprobability internet samples with probability samples using early adopter characteristics. JSM Proceedings, Survey Research Methods Section, Alexandria, VA: American Statistical Association, pp. 4501–4515.
  • Elliott, M. R., (2009). Combining data from probability and non-probability samples using pseudo-weights. Survey Practice 2, 813–845.
  • Elliott, M. R. and R. Valliant, (2017). Inference for nonprobability samples. Statistical Science, 32(2), 249–264.
  • Gelman, A., D. Lee, and J. Guo, (2015). Stan: A probabilistic programming language for Bayesian inference and optimization. Journal of Educational and Behavioral Statistics.
  • Johnson, N. G., M. R. Williams, and E. C. Riordan, (2021). Generalized nonlinear models can solve the prediction problem for data from species-stratified use-availability designs. Diversity and Distributions, 27(11), 2077–2092.
  • Lancaster, T. and G. Imbens, (1996). Case-control studies with contaminated controls. Journal of Econometrics, 71(1–2), 145–160.
  • Leon-Novelo, L. G. and T. D. Savitsky, (2019). Fully Bayesian estimation under informative sampling. Electronic Journal of Statistics, 13(1), 1608–1645.
  • Reiter, J. P. and T. E. Raghunathan, (2007). The multiple adaptations of multiple imputation. Journal of the American Statistical Association, 102(480), 1462–1471.
  • Tillé, Y. and A. Matei, (2021). sampling: Survey Sampling. R package version 2.9.
  • Valliant, R., (2020). Comparing alternatives for estimation from nonprobability samples. Journal of Survey Statistics and Methodology, 8(2), 231–263.
  • Valliant, R. and J. A. Dever, (2011). Estimating propensity adjustments for volunteer web surveys. Sociological Methods and Research, 40, 105–137.
  • Wang, L., R. Valliant, and Y. Li, (2021). Adjusted logistic propensity weighting methods for population inference using nonprobability volunteer-based epidemiologic cohorts. Statistics in Medicine, 40(24), 5237–5250.
  • Williams, M. R. and T. D. Savitsky, (2021). Uncertainty estimation for pseudo-Bayesian inference under complex sampling. International Statistical Review, 89(1), 72–107.
  • Wu, C., (2022). Statistical inference with non-probability survey samples. Survey Methodology, 48(2), 283–311.

Identifiers

Biblioteka Nauki
31342142

YADDA identifier

bwmeta1.element.ojs-doi-10_59170_stattrans-2023-061