2015 | 3 (49) | 9-19

Article title

The use of data mining models in solving the problem of imbalanced classes based on the example of an online marketing campaign

Content

Title variants

PL
Wykorzystanie modeli data mining w rozwiązywaniu problemu niezrównoważonych klas na przykładzie kampanii marketingowych w Internecie

Languages of publication

EN

Abstracts

EN
When building predictive models in analytical CRM, researchers often encounter the problem of imbalanced classes (a skewed distribution of the dependent variable): the number of observations belonging to one category of the dependent variable is much lower than the number belonging to the other category. The problem arises in areas such as churn analysis, customer acquisition models, and cross-selling and up-selling models. The first purpose of the paper is to present a predictive model built to predict the response of Internet users to banner advertising. The dataset used in the study came from an online social network that offers advertisers banner campaigns targeting its users. The advertising campaign, run for a cosmetics company in the autumn of 2010, was targeted mainly at young women. Each user of the service was described by 115 independent variables, of which 3 were demographic (sex, age, education) and the remaining 112 described the user's online activity. The problem of imbalanced classes arose during model building because few users clicked on the banner ad: of 81,000 cases, only 207 were positive reactions to the banner, approximately 0.25% of all cases. Two popular data mining tools were used in the study: C&RT decision trees and Random Forest. The second goal of the paper is to compare the performance of the predictive models based on these two tools.
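
The arithmetic behind the imbalance can be sketched in a few lines of Python. The two counts come from the abstract; the function names and the "always predict no click" baseline are illustrative assumptions, not part of the paper's method. The sketch shows why overall accuracy is a poor yardstick when positives are this rare.

```python
# Counts reported in the abstract; all names below are hypothetical.
N_CASES = 81_000     # users in the dataset
N_POSITIVE = 207     # users who clicked on the banner


def positive_rate(n_positive: int, n_cases: int) -> float:
    """Share of the minority (positive) class among all cases."""
    return n_positive / n_cases


def majority_baseline(n_positive: int, n_cases: int) -> dict:
    """Metrics of an assumed model that always predicts 'no click'."""
    true_negatives = n_cases - n_positive
    return {
        "accuracy": true_negatives / n_cases,  # every non-clicker is "correct"
        "recall": 0.0,                         # no clicker is ever identified
    }


rate = positive_rate(N_POSITIVE, N_CASES)
baseline = majority_baseline(N_POSITIVE, N_CASES)

print(f"positive rate:     {rate:.4%}")
print(f"baseline accuracy: {baseline['accuracy']:.4%}")
print(f"baseline recall:   {baseline['recall']:.0%}")
```

A model that never predicts a click would score above 99.7% accuracy on these counts while finding none of the 207 responders. On data this skewed, measures that focus on the positive class, such as recall, say more than accuracy does.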

Identifiers

YADDA identifier

bwmeta1.element.desklight-85202662-77fe-473a-9661-4813310e6e14