Search results

Applications of Google Trends as a Data Source for Statistical Models

Lenart K.

Acta Universitatis Lodziensis. Folia Oeconomica

2024

vol. 3

issue 368

69-81

As technology advances, the number of potential data sources that can serve as an alternative to traditional surveys keeps growing. One example is search popularity data, made available in real time through Google Trends. Such data make it possible to study behaviour, attitudes and public opinion in society, or to forecast economic phenomena. The advantages of search popularity data are their immediate availability and low acquisition cost. It also matters that Google Trends allows direct study of the behaviour of Internet users, and not just of their declarations, as in a survey. This can make a difference when respondents consider one of the answers to be more morally correct. Using Google Trends nevertheless requires an accurate selection of the search topics and terms included in the study, and an awareness that the research sample is limited to users of the Google search engine. The paper presents the advantages and disadvantages of Google Trends and verifies its usefulness as a data source, especially in periods of increased market volatility.

Comparison of Machine Learning and Statistical Approaches of Detecting Anomalies Using a Simulation Study

Lenart K.

Econometrics. Ekonometria. Advances in Applied Data Analytics

2024

vol. 28

issue 4

23-31

Aim: An anomaly is an observation or a group of observations that is unusual for a given dataset. Anomaly detection has many applications, not only as a data preparation step for further analysis but also as a way of detecting credit card fraud, network intrusions and much more. There are various methods of anomaly detection. Two groups of methods in particular have been developed independently: statistical methods and machine learning algorithms. These groups are rarely compared. While statistical methods rely on formulating a measure of how abnormal an observation is, supervised machine learning makes it possible to use data on both typical observations and previously identified anomalies. The aim of this paper was to compare the two approaches by means of a simulation study. Methodology: The simulation study used data generated with copula functions. To generate different types of anomalies, the parameters and forms of the marginal distributions of the variables were modified. The effectiveness of each method was evaluated using measures of classification accuracy. Results: While the accuracy of the statistical methods depended on correctly predicting the percentage of anomalies that would occur in the data, the recall of the machine learning algorithms was significantly lower when the changes in the values of the marginal distribution parameters were smaller. Implications and recommendations: For the statistical methods included in the study, knowledge of the distribution of the variables was crucial, while the supervised machine learning algorithms required a training dataset. Unlike the machine learning algorithms, the statistical methods achieved similar accuracy even when the changes in the values of the marginal distribution parameters were smaller.
Originality/value: The two approaches to anomaly detection presented in the paper are not often compared. They are usually used by two separate groups of researchers: statisticians, and machine learning or data science specialists.
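The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the Gaussian copula, the normal and exponential margins, the mean shift used to create anomalies, the Mahalanobis-type abnormality score and all parameter values are assumptions chosen only for illustration. The sketch flags the most abnormal observations according to an expected share of anomalies and reports recall and precision.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps probabilities strictly inside (0, 1)


def clamp(u):
    return min(max(u, EPS), 1.0 - EPS)


def copula_sample(n, rho, rng):
    """Draw n pairs (u1, u2) from a bivariate Gaussian copula (assumed family)."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((clamp(STD_NORMAL.cdf(z1)), clamp(STD_NORMAL.cdf(z2))))
    return pairs


def to_margins(pairs, mean, rate):
    """Map copula draws to a normal margin (sd 1) and an exponential margin."""
    normal = NormalDist(mean, 1.0)
    return [(normal.inv_cdf(u1), -math.log(1.0 - u2) / rate) for u1, u2 in pairs]


def abnormality(point, mean, rate, rho):
    """Squared Mahalanobis distance of the normal scores under the typical model."""
    x, y = point
    z1 = x - mean
    z2 = STD_NORMAL.inv_cdf(clamp(1.0 - math.exp(-rate * y)))
    return (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho ** 2)


def run_simulation(expected_share, seed=0):
    """Flag the top expected_share of observations; return recall, precision, count."""
    rng = random.Random(seed)
    rho, mean, rate = 0.5, 0.0, 1.0  # hypothetical "typical" parameters
    typical = to_margins(copula_sample(950, rho, rng), mean, rate)
    # Anomalies: same copula, but the mean of the normal margin is shifted.
    anomalies = to_margins(copula_sample(50, rho, rng), mean + 4.0, rate)
    data = [(p, False) for p in typical] + [(p, True) for p in anomalies]
    ranked = sorted(
        data, key=lambda item: abnormality(item[0], mean, rate, rho), reverse=True
    )
    flagged = ranked[: round(expected_share * len(data))]
    true_positives = sum(is_anomaly for _, is_anomaly in flagged)
    return true_positives / len(anomalies), true_positives / len(flagged), len(flagged)


if __name__ == "__main__":
    for share in (0.05, 0.02):
        recall, precision, n_flagged = run_simulation(share)
        print(f"share={share}: flagged={n_flagged}, "
              f"recall={recall:.2f}, precision={precision:.2f}")
```

In this sketch the true share of anomalies is 5%. Calling `run_simulation(0.02)` underestimates that share, so at most 20 of the 50 anomalies can be flagged and recall cannot exceed 0.4, which mirrors the abstract's point that the statistical approach depends on predicting the percentage of anomalies accurately.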
