Full-text resources of CEJSH and other databases are now available in the new Library of Science.
Visit https://bibliotekanauki.pl

Results found: 24

Search results

in the keywords:  stylometry
The paper analyses a sample of military language from a stylometric perspective. The corpus is the chronicle of the 8th Czech Armed Forces Guard Company, which operated at the Bagram Air Field base (BAF). We work on the assumptions that the corpus will show (A) a prominent presence of military slang; (B) a high proportion of abbreviations; (C) frequent linguistic devices expressing the mutuality and collectiveness of the soldiers’ enterprise. The texts were subjected to keyword and collocation analyses; these revealed several of their stylistic features (such as the use of English-based expressions, protocol-like language, or idiosyncratic collocations), which testify to the multifaceted character of the military chronicle genre.
The object of this paper is a quantitative study of sequential structures in the medieval Czech chronicle Dalimilova Kronika. The author analyses style changes in the chronicle and tries to answer some questions concerning its authorship. Another topic discussed in this paper is the relationship between orality and literacy at the threshold of the Middle Ages in Europe. The paper applies a philological approach combined with quantitative tools, including trend analysis and time series modeling.
An open stylometric system based on multilevel text analysis (Polish title: Otwarty system stylometryczny wykorzystujący wielopoziomową analizę języka)

Stylometric techniques are usually applied to a limited number of typical tasks, such as authorship attribution, genre analysis, or gender studies. However, they could be applied to many tasks beyond this canonical set, if only stylometric methods and tools were more accessible to users from different areas of the humanities and social sciences. This paper presents a general idea, followed by a fully functional prototype, of an open stylometric system whose wide use is made possible by two aspects: technical flexibility and research flexibility. The system relies on a server installation combined with a web-based user interface. This frees the user from the necessity of installing any additional software. At the same time, the system offers a variety of ways in which the input texts can be analysed: they include not only the usual lexical level, but also deep-level linguistic features. This enables a range of possible applications, from typical stylometric tasks to the semantic analysis of text documents. The internal architecture of the system relies on several well-known software packages: a collection of language tools (for text pre-processing), Stylo (for stylometric analysis) and Cluto (for text clustering). The paper presents: (1) the idea behind the system from the user’s perspective; (2) the architecture of the system, with a focus on data processing; (3) features for text description; (4) the use of analytical systems such as Stylo and Cluto. The presentation is illustrated with example applications.
In audiovisual translation, stylometry can be used to measure formal-aesthetic fidelity. We present a corpus-based measure of syntactic complexity as a feature of language style. The methodology considers hierarchical dimensions of syntactic complexity, using syllable counting and dependency parsing. The test material consists of dialogues of several characters from the TV show “Two and a Half Men”. The results show that characters do not differ syntactically among themselves as much as might be expected, and that, despite a general tendency to level differences even more in translation, the changes in syntactic complexity between the original and translation depend mostly on the respective character-feature combination.
This article discusses the automatic extraction of relevant words from sets of texts. The author briefly presents three methods that extract words from a corpus on the basis of their frequency, or that find words whose occurrence next to each other is not random. He first focuses on keyword analysis, then discusses the Zeta method developed by John Burrows and Hugh Craig, and finally covers topic modelling, which has recently become very popular and consists in finding clusters of words that co-occur in similar contexts. Topic modelling was originally intended for quick content search in large collections of documents. Using 100 Polish novels, the article shows how this method can be applied in linguistic studies.
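To illustrate the keyword analysis mentioned in this abstract, here is a minimal Python sketch of one widely used keyness statistic, log-likelihood (G2), which compares a word's frequency in a target corpus with its frequency in a reference corpus. The abstract does not say which statistic the author used, so the choice of G2, the function names and the token-list input format are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness score (assumed statistic, not taken from the article).

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: size of the target corpus in tokens
    d: size of the reference corpus in tokens
    """
    # Expected frequencies if the word were spread evenly over both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target, reference, top_n=10):
    """Rank words over-represented in `target` (hypothetical helper).

    Both arguments are lists of tokens; tokenisation is left to the caller.
    """
    if not target or not reference:
        raise ValueError("both corpora need at least one token")
    t_counts, r_counts = Counter(target), Counter(reference)
    c, d = len(target), len(reference)
    scored = []
    for word, a in t_counts.items():
        b = r_counts[word]
        # Keep only words relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Calling `keywords(novel_tokens, reference_tokens)` with two token lists returns the highest-scoring candidate keywords; a word with the same relative frequency in both corpora scores 0.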
This paper is dedicated to the construction of a small cluster corpus of Polish texts from the period 1830–1918. It presents the assumptions behind the corpus, its micro- and macro-structure, its stylistic, regional and authorial diversity, and the way it is made available. Its potential applications are illustrated with orthographic, inflectional, and syntactic studies.
The purpose of the article is to compare selected features of the style of utterances of professors and students in an oral exam as a communication situation. The research material consists of 25 recordings of oral exams (9 examiners with 32 students). They come from a corpus collected as part of GeWiss, a research project on spoken scientific language. The texts were divided into two subcorpora: E (examiners) and S (students). Corpus linguistics methods were used in the analysis. Several characteristic features of the scientific and official styles were compared: numerous structures of the type proszę + infinitive; nominal structures (nominal style); extensive hypotaxis. The analysis showed numerous stylistic similarities between the examined subcorpora. None of the texts in the subcorpora is strongly nominal in style. A clear difference between the subcorpora is the presence of structures with the word proszę: it appears in the utterances of examiners, while in the utterances of students it is almost non-existent. The distribution of cohesive devices also differs between the subcorpora (in both, parataxis is more common than hypotaxis, but it is implemented differently), and so do the lists of the one hundred most frequent lexemes; these differences make it possible to distinguish the texts with tools for automatic style similarity analysis.
The article deals with the question of the authorship of the twelfth-century Chronica Polonorum (or Gesta principum Polonorum [The Deeds of the Princes of the Poles]), also known as The Polish Chronicle. It seeks to verify the hypothesis, recently reproposed by Tomasz Jasiński, that the author was of Venetian origin. The hypothesis is based mainly on the textual similarities observed between the Translatio Sancti Nicolai, by an author referred to as the ‘Monk of Lido’ (Monachus Littorensis), and the Chronica. The attribution attempt put forth by M. Eder is based upon stylometric methods that measure the frequencies of the most frequent words in the texts under study (mainly conjunctions, prepositions, pronouns, and particles), which are subsequently subjected to cluster analysis, multidimensional scaling, or principal components analysis. The experiment has demonstrated a strong resemblance between the Translatio Sancti Nicolai and the Polish Chronicle, which may be regarded as a substantial argument in support of the Venetian background hypothesis.
The dynamic increase in user-generated content on the web presents significant challenges in protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also limiting the spread of unethical conduct. However, designing automated detection models for such offensive content remains complex, particularly in languages with limited publicly available data. To address this issue, we collaborate with the Wykop.pl web service to fine-tune a model using genuine content that was removed in the moderation process. In this paper, we focus on the Polish language and discuss the notion of datasets and annotation frameworks, and then present our stylometric analysis of Wykop.pl content, which aims to identify morpho-syntactic structures that are commonly used in cyberbullying and hate speech. In doing so, we hope to contribute to the ongoing discussion on offensive language and hate speech in sociolinguistic studies, emphasizing the need to consider user-generated online content.
Background: To recognize the authors of texts using statistical tools, one first needs to decide which features to use as author characteristics, and then extract these features from the texts. The features extracted are mostly counts of so-called function words. Objectives: The extracted data are processed further and compressed into a smaller number of features, in such a way that the compressed data still discriminate effectively between authors. The resulting feature space has lower dimensionality than the text itself. Methods/Approach: In this paper, data collected by counting words and characters in around a thousand paragraphs of each sample book underwent a principal component analysis performed using neural networks. Once the analysis was complete, the first principal component was used to distinguish the books written by a given author. Results: The results show that every author leaves a unique signature in written text that can be discovered by analyzing counts of short words per paragraph. Conclusions: In this article we have demonstrated that authorship can be traced by applying principal component analysis to counts of short words per paragraph. The methodology could be used for other purposes, such as fraud detection in auditing.
Henryk Sienkiewicz’s novel "Quo Vadis" made its way into Italy at the end of the 19th century through the efforts of the Neapolitan translator Federigo Verdinois. The first part of this paper outlines the history of the popularity of "Quo Vadis" by focusing on the operations of Milanese publishers that made the Polish novel part of their offer in a variety of ways (as translations, adaptations, reworkings, plagiarisms, etc.). Bibliometric methods are used to establish why so many publishing houses decided to publish Henryk Sienkiewicz’s Roman romance. The analysis of the bibliometric data of the published translations helped assess and describe the extent and the character of the popularity that the novel garnered among Milanese publishers in the first part of the twentieth century. The second part of the paper relates the findings of a multi-method quantitative study of the same material. The number of word tokens was compared between the original and the translations. The lexical richness of the texts under study was compared by means of the moving average type-token ratio (MATTR). Sentence lengths were also compared, as was the distribution of sentence lengths treated as time series. Two different programmes (WCopyFind and Tracer) yielded very similar results on the degree of similarity of five-word phrases in pairs of translations, which was determined in network analysis.
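The moving average type-token ratio (MATTR) named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the window size of 50 tokens, the function name and the fallback for very short texts are assumptions, and tokenisation is left to the caller.

```python
def mattr(tokens, window=50):
    """Moving average type-token ratio over a sliding window of tokens.

    The window size is an assumed default; the abstract does not state
    which value the study used.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(tokens)
    if n == 0:
        return 0.0
    if n < window:
        # Text shorter than one window: fall back to the plain TTR.
        return len(set(tokens)) / n

    # Token counts inside the current window.
    counts = {}
    for tok in tokens[:window]:
        counts[tok] = counts.get(tok, 0) + 1
    total = len(counts) / window

    # Slide the window one token at a time, updating counts incrementally.
    for i in range(window, n):
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        entering = tokens[i]
        counts[entering] = counts.get(entering, 0) + 1
        total += len(counts) / window

    # Average the type-token ratio over all window positions.
    return total / (n - window + 1)
```

Because every window has the same length, MATTR values can be compared between texts of different sizes, which is the usual reason for preferring it to the plain type-token ratio when an original is set against translations of varying length.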
This text is a review of the book "Reading beyond the female: The relationship between perception of author gender and literary quality" by the Dutch researcher Cornelia Koolen. The book addresses the relations between the gender of the author, the evaluation of the literary quality of their work, and the actual features of the texts, and thus belongs to the larger trend of research on gender stereotypes in language and literature. Its innovative use of quantitative methods of text analysis also gives it an important place in the literature on stylometric methodology and makes it an interdisciplinary work.
Jeremiah Curtin translated most works by Poland’s first literary Nobel Prize winner, Henryk Sienkiewicz. He was helped in this life-long task by his wife, Alma Cardell Curtin. It was also Alma who, after her husband’s death, produced the lengthy Memoirs that she steadfastly ascribed to her husband for his, rather than her, greater glory. This article investigates the possible textual influence Alma might have had on other works by her husband, including his travelogues, ethnographic and mythological studies, and the translations themselves. Lacking traditional authorial evidence, this study relies on stylometric methods that compare most-frequent-word usage by means of cluster analysis of z-scores. There is much in this statistics-based authorial attribution to show that Alma Cardell Curtin significantly affected at least two other original works of her husband and, possibly, at least two of his translations.
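The comparison of most frequent words through z-scores mentioned in this abstract can be sketched as follows. This is a hedged Python illustration, not the study's procedure: it computes z-scored relative frequencies and a Burrows-style Delta distance between two texts, which is one common input to cluster analysis; the clustering step itself is omitted. The function names, the default of 100 words and the use of the population standard deviation are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def mfw_zscores(texts, top_n=100):
    """Z-scores of the most frequent words (MFW) for each text.

    texts: dict mapping a text name to its list of tokens.
    Returns (words, z) where z maps each name to a list of z-scores
    aligned with `words`.
    """
    if any(len(toks) == 0 for toks in texts.values()):
        raise ValueError("every text needs at least one token")

    # Pick the MFW list from the pooled corpus.
    pooled = Counter()
    for toks in texts.values():
        pooled.update(toks)
    words = [w for w, _ in pooled.most_common(top_n)]

    # Relative frequency of each MFW in each text.
    rel = {}
    for name, toks in texts.items():
        counts = Counter(toks)
        rel[name] = [counts[w] / len(toks) for w in words]

    # Standardise each word's frequencies across texts.
    z = {name: [] for name in texts}
    for j in range(len(words)):
        column = [rel[name][j] for name in texts]
        mu, sd = mean(column), pstdev(column)
        for name in texts:
            z[name].append((rel[name][j] - mu) / sd if sd > 0 else 0.0)
    return words, z


def delta(z_a, z_b):
    """Burrows-style Delta: mean absolute difference of z-scores."""
    return mean(abs(x - y) for x, y in zip(z_a, z_b))
```

A matrix of pairwise `delta` values for all texts could then be passed to any hierarchical clustering routine; smaller values indicate more similar usage of the most frequent words.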
In this paper the author discusses the use of natural language processing (NLP) methods in Polish academic research. The analysis covers three fields: sociology, political science and literary studies. Works by researchers such as Marek Troszyński, Paweł Matuszewski and Maciej Eder are discussed. As a result of the analysis, the author outlines the most important methodological aspects of using NLP: contexts, opportunities and risks. Finally, the author indicates areas for further research in which NLP methods could be beneficial.
This article presents the results of a quantitative analysis of the most frequent words in a corpus of over 2,500 Polish texts: Polish literature from the fourteenth to the twenty-first century, and Polish translations from English, French, Russian and (to a lesser degree) other languages. The corpus reveals a strong signal of literary genre and of source language. The results also point to a distinct stylometric specificity of the Polish translations of Shakespeare, and to their very strong stylometric resemblance to Polish romantic and neoromantic drama.
Kategoria stylu w badaniach metaleksykograficznych

The paper discusses the category of style and the validity of its application to dictionaries. The author begins by considering the question of textuality of dictionaries and whether it is possible to analyse them in line with other texts. By invoking concrete examples, she indicates works which use diverse means of expression to convey the same content, which suggests that the category of style may also be applied in the field of lexicography. The author attempts to determine what it would involve to identify style in a dictionary, which components of style may come into play here, and examines the possibilities of observation and interpretation of selected components. Finally, the paper discusses the usefulness of automatized stylometric tools in stylistic research into dictionaries.
The study investigates the possible identity of Vladimír Vašek (= Petr Bezruč), author of Silesian Songs and The Blue Underwing, and Pavel Hrzánský, who authored Poems: Opus no. 5, a book of verse bearing some similarities to the future development of Vašek’s poetic self. The research is carried out via a novel authorship attribution method based on the investigation of numbers and numerals. This new investigation is complemented by the standardly employed MFC and MFW analyses. All the inquiries corroborate that Vašek’s authorship of Hrzánský’s poems is implausible. If the Hrzánský−Bezruč link is to be maintained in liter
This article proposes ways to analyse the content and metadata of biographical interviews using statistical methods. The basis for this series of stylometric experiments was a specially created corpus exceeding 1.2 million lexical units in size and composed of texts extracted from selected biographical interviews from the Oral History Archive, the History Meeting House, and the KARTA Centre, available on the website www.relacjebiograficzne.pl. The research was based on the content of biographical interviews with forty-one people assigned to three thematic categories: ‘Warsaw,’ ‘the village,’ and ‘gentry.’ The main goal of the experiments was to determine which linguistic factors differentiate speakers and which features (gender, place of origin, age, length of speech, or topic) can influence this classification. The research was carried out using the methods of quantitative linguistics, and its conclusions indicate the direction of further work on the stylometry of spoken language.