2021 | 43 | 251-269

Article title

The use of web-scraped data to analyze the dynamics of footwear prices

Languages of publication

EN

Abstracts

EN
Aim/purpose – Web scraping is a technique for automatically extracting data from websites. With the rise of online shopping, it makes it possible to collect information about the prices of goods sold by retailers such as supermarkets and internet shops. This study examines the possibility of using web-scraped data from one clothing store. It aims to compare known price index formulas as applied to web-scraped data and to verify their sensitivity to the choice of data filter type.
Design/methodology/approach – The author uses price data scraped from one of the biggest online shops in Poland. The data were obtained as part of the eCPI (electronic Consumer Price Index) project conducted by the National Bank of Poland. Three types of products were selected for the analysis (female ballerinas, male shoes, and male oxfords) in order to compare their prices over a period of more than one year. Six price indexes were calculated: the Jevons and Dutot indexes together with their chained and GEKS (an acronym formed from the names of its creators, Gini–Éltető–Köves–Szulc) versions. In addition to the analysis of the full data set, the author introduced filters to remove outliers.
Findings – Clothing and footwear are considered one of the most difficult groups of goods for measuring price changes because of high product churn, which undermines the use of the traditional Jevons and Dutot indexes. It is possible to use chained and GEKS indexes instead, but these indexes are fairly sensitive to large price changes. As observed for both product groups, the GEKS and chained versions of the indexes gave different results, which suggests that, although they yield promising results, they may be better suited to other COICOP (Classification of Individual Consumption by Purpose) groups.
Research implications/limitations – The findings showed that the use of filters did not significantly reduce the difference between price indexes based on the GEKS and chain formulas.
Originality/value/contribution – The use of web-scraped data is a fairly new topic in the literature. Research on the applicability of different price indexes provides useful insights for the future use of these data by statistical offices.
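
The index formulas named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the toy prices, product ids, and function names are hypothetical; each bilateral index is computed only on products present in both compared periods; and the GEKS version uses the whole sample as its window, with the chosen bilateral formula (Jevons or Dutot) standing in for the Fisher index of the classical GEKS definition.

```python
from math import exp, log, prod

# Hypothetical toy data: prices[t] maps product id -> price in period t.
# Product "c" disappears after period 0 and "d" appears in period 1,
# mimicking the product churn described in the abstract.
prices = [
    {"a": 100.0, "b": 200.0, "c": 50.0},
    {"a": 110.0, "b": 190.0, "d": 80.0},
    {"a": 121.0, "b": 190.0, "d": 88.0},
]


def matched(p0, p1):
    """Ids of products priced in both periods (assumed non-empty)."""
    return sorted(set(p0) & set(p1))


def jevons(p0, p1):
    """Unweighted geometric mean of price relatives on matched products."""
    ids = matched(p0, p1)
    return exp(sum(log(p1[i] / p0[i]) for i in ids) / len(ids))


def dutot(p0, p1):
    """Ratio of arithmetic mean prices on matched products."""
    ids = matched(p0, p1)
    return sum(p1[i] for i in ids) / sum(p0[i] for i in ids)


def chained(prices, t, bilateral):
    """Product of period-on-period bilateral indexes from period 0 to t."""
    result = 1.0
    for s in range(1, t + 1):
        result *= bilateral(prices[s - 1], prices[s])
    return result


def geks(prices, t, bilateral):
    """Geometric mean, over every link period k, of the index 0 -> k -> t."""
    n = len(prices)
    links = (
        bilateral(prices[0], prices[k]) * bilateral(prices[k], prices[t])
        for k in range(n)
    )
    return prod(links) ** (1.0 / n)


if __name__ == "__main__":
    last = len(prices) - 1
    for name, formula in (("Jevons", jevons), ("Dutot", dutot)):
        print(name, "direct :", formula(prices[0], prices[last]))
        print(name, "chained:", chained(prices, last, formula))
        print(name, "GEKS   :", geks(prices, last, formula))
```

When every product is available in every period, the direct, chained, and GEKS variants of each formula coincide; they diverge only once the matched sets change between periods, which is the situation the abstract describes for footwear.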

Year

2021

Volume

43

Pages

251-269

Contributors

author
  • Department of Statistical Methods, Faculty of Economics and Sociology, University of Lodz, Łódź, Poland

Identifiers

ISSN
1732-1948

YADDA identifier

bwmeta1.element.cejsh-fe222515-12cd-4001-b4f4-3c91bc435b1f