The selection of areas for case study research in socio-economic geography with the application of k-means clustering

Warchalska-Troll, Agata; Warchalski, Tomasz

doi:10.5604/01.3001.0015.7717

Article details

Journal

Wiadomości Statystyczne. The Polish Statistician

2022 | 67 | 2 | 1-20

Article title

The selection of areas for case study research in socio-economic geography with the application of k-means clustering

Authors

Agata Warchalska-Troll, Tomasz Warchalski

Title variants

PL

Wybór obszarów do studiów przypadku w geografii społeczno-ekonomicznej z zastosowaniem metody grupowania k-średnich

Abstracts

PL

Znane w statystyce techniki grupowania są rzadko wykorzystywane przez geografów do wyboru obszaru badań. Celem analiz opisanych w artykule było sprawdzenie możliwości zastosowania metody podziału k-średnich do wyboru jednostek przestrzennych (w tym przypadku gmin) do studiów przypadku. Dokonano tego poprzez rozwiązanie problemu metodycznego polegającego na optymalnym wyznaczeniu gmin do pogłębionych badań nad relacją między ochroną przyrody a rozwojem lokalnym i regionalnym w polskich Karpatach. Szczególną uwagę zwrócono na określenie odpowiedniej liczby skupień za pomocą metody łokcia (ang. elbow method) oraz statystyki pseudo-F (wskaźnika Calińskiego-Harabasza). Dane wykorzystane w analizach pochodziły z Głównego Urzędu Statystycznego i obejmowały okres 1999–2012. W rezultacie kilkustopniowej procedury wytypowano gminy: Cisna, Lipinki, Ochotnica Dolna, Sękowa, Szczawnica i Zawoja. Opisany w artykule przykład pokazuje, że metoda k-średnich, pomimo pewnych słabości, może być przydatna do tworzenia klasyfikacji i typologii prowadzących do wyboru obszarów do studiów przypadku ze względu na jej użyteczność oraz dostępność w oprogramowaniu typu open source. Zarazem jednak – z uwagi na stopień złożoności społeczno-ekonomicznych cech obszarów – zastosowanie tej metody w geografii społeczno-ekonomicznej może wymagać wsparcia interpretacji jej wyników analizą dodatkowych źródeł informacji oraz wiedzą ekspercką.

EN

Grouping techniques known from statistics are rarely used by geographers to select a research area. The aim of the paper is to examine the potential of the k-means clustering (partitioning) method for selecting spatial units (here: gminas, i.e. the lowest-level administrative units in Poland) for case studies in socio-economic geography. We explored this topic by solving a practical problem: the optimal designation of gminas for in-depth research on the interaction between nature protection and local and regional development in the Polish Carpathians. Particular attention was devoted to determining an appropriate number of clusters by means of the elbow method and the pseudo-F statistic (the Calinski-Harabasz index). The data for the analysis were mostly provided by Statistics Poland and covered the period 1999–2012. The multi-stage procedure resulted in the selection of the following gminas: Cisna, Lipinki, Ochotnica Dolna, Sękowa, Szczawnica and Zawoja. The example described in the paper demonstrates that the k-means technique, despite certain deficiencies, may prove useful for creating classifications and typologies leading to the selection of case study sites, as it is relatively time-effective, intuitive and available in open-source software. At the same time, due to the complexity of the socio-economic characteristics of the areas, applying this method in socio-economic geography may require supporting the interpretation of its results with an analysis of additional data sources and expert knowledge.

Keywords

EN

case study; k-means partitioning; elbow method; pseudo-F statistic; Calinski-Harabasz index

PL

studium przypadku; grupowanie metodą k-średnich; metoda łokcia; statystyka pseudo-F; wskaźnik Calińskiego-Harabasza

Publisher

Główny Urząd Statystyczny

Journal

Wiadomości Statystyczne. The Polish Statistician

Year

2022

Volume

67

Issue

2

Pages

1-20

Dates

published

2022

Contributors

author

Agata Warchalska-Troll

Instytut Rozwoju Miast i Regionów; Uniwersytet Jagielloński w Krakowie, Instytut Geografii i Gospodarki Przestrzennej / Institute of Urban and Regional Development; Jagiellonian University in Krakow, Institute of Geography and Spatial Management

https://orcid.org/0000-0003-1314-3206

author

Tomasz Warchalski

https://orcid.org/0000-0002-2894-2265

References

Babbie, E. (2007). Badania społeczne w praktyce. Wydawnictwo Naukowe PWN.
Bayisa, F. L., Ådahl, M., Rydén, P., & Cronie, O. (2020). Large-scale modelling and forecasting of ambulance calls in northern Sweden using spatio-temporal log-Gaussian Cox processes. Spatial Statistics, 39, 1-22. https://doi.org/10.1016/j.spasta.2020.100471.
Bole, D., Kozina, J., & Tiran, J. (2019). The variety of industrial towns in Slovenia: a typology of their economic performance. Bulletin of Geography. Socio-economic Series, 46(46), 71-83. http://doi.org/10.2478/bog-2019-0035.
Brauksa, I. (2013). Use of Cluster Analysis in Exploring Economic Indicator Differences among Regions: The Case of Latvia. Journal of Economics, Business and Management, 1(1), 42-45. http://doi.org/10.7763/JOEBM.2013.V1.10.
Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3(1), 1-27.
Crone, T. M. (2005). An alternative definition of economic regions in the United States based on similarities in state business cycles. The Review of Economics and Statistics, 87(4), 617-626. https://doi.org/10.1162/003465305775098224.
Dawidowicz, D. (2020). Ocena sytuacji finansowej gmin z wykorzystaniem metody k-średnich. Wiadomości Statystyczne. The Polish Statistician, 65(7), 26-46. https://doi.org/10.5604/01.3001.0014.3284.
ESRI. (n.d.). Grouping Analysis (Spatial Statistics) 8. Retrieved June 24, 2021, from https://pro.arcgis.com/en/pro-app/2.8/tool-reference/spatial-statistics/grouping-analysis.htm.
Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster Analysis (5th edition). John Wiley & Sons. https://doi.org/10.1002/9780470977811.
Gao, P., & Kupfer, J. A. (2018). Capitalizing on a wealth of spatial information: Improving biogeographic regionalization through the use of spatial clustering. Applied Geography, 99, 98-108. https://doi.org/10.1016/j.apgeog.2018.08.002.
Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100-108. https://doi.org/10.2307/2346830.
Kodinariya, T. M., & Makwana, P. R. (2013). Review on determining number of Cluster in K-Means Clustering. International Journal of Advance Research in Computer Science and Management Studies, 1(6), 90-95.
Kong, W., Wang, Y., Dai, H., Zhao, L., & Wang, C. (2021). Analysis of energy consumption structure based on K-means clustering algorithm. E3S Web of Conferences, 267, 1-5. https://doi.org/10.1051/e3sconf/202126701054.
Kraszewska, B. (2016). Wykorzystanie analizy skupień w ocenie zróżnicowania zagrożenia ubóstwem w podregionach Polski. Wiadomości Statystyczne. The Polish Statistician, 61(5), 17-36. https://doi.org/10.5604/01.3001.0014.0993.
Larose, D. T., & Larose, C. D. (2014). Discovering Knowledge in Data: An Introduction to Data Mining (2nd edition). John Wiley & Sons. https://doi.org/10.1002/9781118874059.
Li, X., Wang, L., & Liu, S. (2016). Geographical Analysis of Community Resilience to Seismic Hazard in Southwest China. International Journal of Disaster Risk Science, 7(3), 257-276. https://doi.org/10.1007/s13753-016-0091-8.
Lloyd, S. P. (1982). Least Squares Quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129-137. https://doi.org/10.1109/TIT.1982.1056489.
MacQueen, J. B. (1967). Some Methods for classification and Analysis of Multivariate Observations. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability (pp. 281-297). University of California Press. https://projecteuclid.org/ebooks/berkeley-symposium-on-mathematical-statistics-and-probability/Some-methods-for-classification-and-analysis-of-multivariate-observations/chapter/Some-methods-for-classification-and-analysis-of-multivariate-observations/bsmsp/1200512992.
Malinowski, M. (2016). Potencjał ludzki a efektywność ekonomiczna przedsiębiorstw - wykorzystanie metod taksonomicznych w ujęciu regionalnym. Studia Regionalne i Lokalne, (2), 87-109. https://doi.org/10.7366/1509499526405.
Migdał-Najman, K. (2011). Ocena jakości wyników grupowania - przegląd bibliografii. Przegląd Statystyczny, 58(3-4), 281-299.
Mikuš, R., Máliková, L., & Lauko, V. (2016). An introductory study of perceptual marginality in Slovakia. Bulletin of Geography. Socio-economic Series, (34), 47-62. http://dx.doi.org/10.1515/bog-2016-0034.
Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159-179. https://doi.org/10.1007/BF02294245.
Nicholson, D., Vanli, O. A., Jung, S., & Ozguven, E. E. (2019). A spatial regression and clustering method for developing place-specific social vulnerability indices using census and social media data. International Journal of Disaster Risk Reduction, 38, 101-224. https://doi.org/10.1016/j.ijdrr.2019.101224.
Novotná, M., Šlehoferová, M., & Matušková, A. (2016). Evaluation of spatial differentiation in the Pilsen region from a socio-economic perspective. Bulletin of Geography. Socio-economic Series, (34), 73-90. https://doi.org/10.1515/bog-2016-0036.
Peeples, M. A. (2011). R Script for K-Means Cluster Analysis. Retrieved May 27, 2021, from http://www.mattpeeples.net/kmeans.html.
R Core Team. (n.d.). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Retrieved August 30, 2020, from https://www.R-project.org/.
Steinhaus, H. (1957). Sur la division des corps matériels en parties. Bulletin L'Académie Polonaise des Sciences, 4(12), 801-804. http://www.laurent-duval.eu/Documents/Steinhaus_H_1956_j-bull-acad-polon-sci_division_cmp-k-means.pdf.
Stukalo, N., & Simakhova, A. (2018). Global parameters of social economy clustering. Problems and Perspectives in Management, 16(1), 36-47. https://doi.org/10.21511/ppm.16(1).2018.04.
Taylor, L. (2016). Case Study Methodology. In N. Clifford, M. Cope, T. Gillespie & S. French (Eds.), Key Methods in Geography (3rd edition; pp. 581-595). SAGE Publications.
Thorndike, R. L. (1953). Who belongs in the family? Psychometrika, 18(4), 267-276. https://doi.org/10.1007/BF02289263.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63(2), 411-423. https://doi.org/10.1111/1467-9868.00293.
Warchalska-Troll, A. (2018). Natura 2000 sites in the Polish Carpathians vs local development: inevitable conflict? eco.mont Journal of Mountain Protected Areas Research and Management, 10(2), 50-58. https://doi.org/10.1553/eco.mont-10-2s50.
Warchalska-Troll, A. (2019). Do Economic Opportunities Offered by National Parks Affect Social Perceptions of Parks? A Study from the Polish Carpathians. Mountain Research and Development, 39(1), 37-46. https://doi.org/10.1659/MRD-JOURNAL-D-18-00055.1.
Zhang, Y., Moges, S., & Block, P. (2016). Optimal Cluster Analysis for Objective Regionalization of Seasonal Precipitation in Regions of High Spatial-Temporal Variability: Application to Western Ethiopia. Journal of Climate, 29(10), 3697-3717. https://doi.org/10.1175/JCLI-D-15-0582.1.

Identifiers

DOI

10.5604/01.3001.0015.7717

Biblioteka Nauki

1984996

YADDA identifier

bwmeta1.element.ojs-doi-10_5604_01_3001_0015_7717
