DYLEN: Diachronic Dynamics of Lexical Networks
- Written by: Elisabeth Königshofer and Katharina Wünsche
- Published on: 17 February 2023
- Tagged with: Data management
Learning outcomes
After completing this resource, you will
- understand the purpose of DYLEN,
- be able to read a visualisation created in DYLEN,
- know how to carry out an ego network analysis with DYLEN,
- be able to generate a general network analysis.
Introduction
DYLEN stands for Diachronic Dynamics of Lexical Networks (Baumann et al. 2019). It is an interactive visualisation tool (Yim et al. 2022) that the project team of the same name created to provide insights into lexical change in Austrian German during the 21st century. It helps lexicographers and linguists analyse how Austrian German lexemes develop over time. DYLEN is open source and free of charge.
DYLEN enables lexical network research on large-scale authentic language data taken from two Austrian German corpora: the Austrian Media Corpus (amc; Ransmayr et al. 2017) and the Corpus of Austrian Parliamentary Records (ParlAT; Wissik and Pirker 2018).
DYLEN provides three network options:
- Ego network,
- General network (party),
- General network (speaker),
and two additional components:
- Node metrics comparison,
- Time series analysis.
The following comic provides a visual summary of this article and illustrates the key features of the DYLEN tool.
Networks
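Diachronic networks are derived from the texts in amc and ParlAT with the help of word embeddings. In natural language processing (NLP), word embeddings are numerical vector representations of words; words that occur in similar contexts receive similar vectors.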
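To make the idea of word embeddings more concrete, here is a minimal sketch of how the similarity of two word vectors is commonly measured with cosine similarity. The three-dimensional vectors and the name `toy_embeddings` are invented for illustration; they are not taken from DYLEN, and real embedding models typically use hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: made-up values, not DYLEN data).
toy_embeddings = {
    "Geld": [0.9, 0.1, 0.0],
    "Euro": [0.8, 0.2, 0.1],
    "Wetter": [0.0, 0.1, 0.9],
}

sim_money_euro = cosine_similarity(toy_embeddings["Geld"], toy_embeddings["Euro"])
sim_money_weather = cosine_similarity(toy_embeddings["Geld"], toy_embeddings["Wetter"])

# 'Geld' is closer to 'Euro' than to 'Wetter' in this toy vector space.
print(sim_money_euro > sim_money_weather)  # prints True
```

In this toy example, 'Geld' and 'Euro' point in almost the same direction, so they would count as close semantic neighbours, while 'Wetter' would not.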
The user interface is intuitive. Every search starts with choosing the network you would like to generate: either an ego network or a general network (party or speaker). In each network type, you can analyse various parameters of a single entity or compare two entities.
Ego network
The ego network visualises the 50 most closely related semantic neighbours of a target word. Connected words are semantic neighbours that share some aspects of meaning with the target word; some can even substitute for it in a particular context. Note that the network does not show the target word itself, because that would make the visualisation illegible. The semantic neighbours are classified by part of speech (POS), e.g. nouns, proper nouns and verbs.
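The selection of semantic neighbours can be sketched as follows. This is only an illustration under stated assumptions: the function `ego_network`, the similarity scores and the edge threshold `min_edge_similarity` are hypothetical, and the tutorial does not describe how DYLEN itself decides which neighbours are connected.

```python
from itertools import combinations


def ego_network(target, similarity, k=50, min_edge_similarity=0.5):
    """Return the k nearest neighbours of `target` and the edges among them.

    `similarity` maps frozenset({word_a, word_b}) to a similarity score.
    The target word itself is left out of the resulting network.
    """
    def sim(a, b):
        return similarity.get(frozenset((a, b)), 0.0)

    candidates = {word for pair in similarity for word in pair} - {target}
    neighbours = sorted(candidates, key=lambda w: sim(target, w), reverse=True)[:k]
    # Assumption: two neighbours are linked if they are similar enough to each other.
    edges = [
        (a, b)
        for a, b in combinations(neighbours, 2)
        if sim(a, b) >= min_edge_similarity
    ]
    return neighbours, edges


# Hypothetical similarity scores (made-up values, not DYLEN output).
similarity = {
    frozenset(("Geld", "Euro")): 0.9,
    frozenset(("Geld", "Budget")): 0.8,
    frozenset(("Geld", "Wetter")): 0.1,
    frozenset(("Euro", "Budget")): 0.7,
    frozenset(("Euro", "Wetter")): 0.05,
    frozenset(("Budget", "Wetter")): 0.02,
}

neighbours, edges = ego_network("Geld", similarity, k=2)
print(neighbours)  # ['Euro', 'Budget']
print(edges)       # [('Euro', 'Budget')]
```

With `k=50`, the same function would keep the 50 most similar words, matching the size of the ego network described above.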
Instructions:
In the input field in the left sidebar, you can
- select a corpus (i.e., amc or ParlAT),
- select a subcorpus (e.g., a specific newspaper),
- type a target word (e.g., ‘Geld’),
- and finally click Visualise.
Understanding the visualisation
Once you have clicked Visualise, DYLEN generates your network. Let us stick with our ‘Geld’ (money) example.
In the visualisation, the semantic neighbours are represented by nodes, which you can drag apart to get a better overview. Node size indicates frequency: the bigger the node, the more frequently the word is used in the corpus. You can click on a node to highlight its connections. The colours represent different parts of speech, and you can change them to your preference.
Time Series Analysis
The Time Series Analysis lets you compare two words over time; the comparison can be relative to the first year, the last year or the previous year.
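The three comparison modes can be illustrated with a short sketch. It assumes that the comparison is a proportional change against the chosen reference year; the tutorial does not state the exact formula DYLEN uses, and the function `relative_change` and the yearly values are hypothetical.

```python
def relative_change(series, baseline="first"):
    """Return the proportional change per year against a reference year.

    `series` maps year -> metric value (reference values must be non-zero).
    `baseline` is 'first', 'last' or 'previous'.
    """
    years = sorted(series)
    changes = {}
    for index, year in enumerate(years):
        if baseline == "first":
            reference = series[years[0]]
        elif baseline == "last":
            reference = series[years[-1]]
        elif baseline == "previous":
            if index == 0:
                continue  # the first year has no previous year to compare with
            reference = series[years[index - 1]]
        else:
            raise ValueError(f"unknown baseline: {baseline!r}")
        changes[year] = (series[year] - reference) / reference
    return changes


# Hypothetical yearly values for one word (made-up numbers).
frequency = {2010: 100.0, 2011: 150.0, 2012: 75.0}

print(relative_change(frequency, "first"))     # {2010: 0.0, 2011: 0.5, 2012: -0.25}
print(relative_change(frequency, "previous"))  # {2011: 0.5, 2012: -0.5}
```

Running the function for two words and plotting both results side by side would give a comparison in the spirit of the one described above.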
Metrics and Node Metrics Comparison
In addition, you can use the sliders to select the metrics shown in the parallel coordinates plot. Each line in the plot represents one word, and each vertical axis shows the values of one metric. You can display all words or only selected words in the node metrics comparison. Clicking a line lets you inspect that word's value for each metric.
General networks
General networks represent the speeches of a particular politician or political party in the Austrian Parliament. These networks are larger than ego networks and therefore need additional filters to keep the visualisation legible. With general networks, you can explore frequent lexemes used by particular political parties (general network (party)) or individual politicians (general network (speaker)).
Instructions:
In the input field in the left sidebar, you can
- select a party,
- select a speaker (only for general network (speaker)),
- (optional, but recommended) use the Node filter to
  - select a metric (e.g. degree centrality),
  - adjust the percentage of nodes to be displayed,
- and finally click Visualise.
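The effect of the Node filter can be sketched as follows, assuming that it ranks nodes by the selected metric and keeps the top percentage. The sketch uses the standard definition of degree centrality (a node's number of connections divided by the number of other nodes); the functions `degree_centrality` and `filter_top_nodes` and the example edges are hypothetical and do not come from DYLEN.

```python
import math


def degree_centrality(edges):
    """Return degree centrality per node: degree divided by (node count - 1)."""
    nodes = {node for edge in edges for node in edge}
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    denominator = max(len(nodes) - 1, 1)
    return {node: count / denominator for node, count in degree.items()}


def filter_top_nodes(edges, percentage):
    """Keep the given percentage of nodes with the highest degree centrality."""
    centrality = degree_centrality(edges)
    keep_count = max(1, math.ceil(len(centrality) * percentage / 100))
    # Sort by centrality (descending); break ties alphabetically for stable output.
    ranked = sorted(centrality, key=lambda node: (-centrality[node], node))
    kept = set(ranked[:keep_count])
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges


# Hypothetical network of co-occurring words (made-up example).
edges = [
    ("Budget", "Steuer"),
    ("Budget", "Euro"),
    ("Budget", "Reform"),
    ("Steuer", "Euro"),
]

kept, kept_edges = filter_top_nodes(edges, percentage=50)
print(sorted(kept))  # ['Budget', 'Euro']
print(kept_edges)    # [('Budget', 'Euro')]
```

Lowering the percentage removes weakly connected words first, which is why filtering makes large general networks easier to read.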
Node Metrics Comparison
The general network analysis also supports node metrics comparison, with the same metrics as in the ego network. When you compare two parties or speakers, each is shown in a different colour. You can also have DYLEN display the node metrics as a colour-coded table.
Time Series Analysis
In the general network analysis, you can trace the development of speakers or parties over time, just as the ego network traces individual words. You can view the results as a graph on a timeline or as a table with selected metrics and values. All analysis options are explained in more detail on the DYLEN website, under the technical details in the Time Series Analysis tab.
References
- Baumann, Andreas, Julia Neidhardt, and Tanja Wissik. 2019. DYLEN: Diachronic Dynamics of Lexical Networks. In Proceedings of the Poster Session of the 2nd Conference on Language, Data and Knowledge (LDK-PS 2019), ed. Thierry Declerck and John P. McCrae, 2402:24–28. CEUR Workshop Proceedings. Leipzig, Germany: CEUR.
- Ransmayr, Jutta, Karlheinz Mörth, and Matej Ďurčo. 2017. AMC (Austrian Media Corpus) – Korpusbasierte Forschungen zum österreichischen Deutsch. In Digitale Methoden der Korpusforschung in Österreich (Veröffentlichungen zur Linguistik und Kommunikationsforschung 30), ed. C. Resch and W. U. Dressler, 27–38. Wien: Verlag der Österreichischen Akademie der Wissenschaften.
- Wissik, Tanja, and Hannes Pirker. 2018. ParlAT beta Corpus of Austrian Parliamentary Records. In LREC2018 Workshop ParlaCLARIN: Creating and Using Parliamentary Corpora, in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), ed. Darja Fišer, Maria Eskevich, and Franciska de Jong. Miyazaki: European Language Resources Association.
- Yim, Seung-bin, Katharina Wünsche, Asil Cetin, Julia Neidhardt, Andreas Baumann, and Tanja Wissik. 2022. Visualizing Parliamentary Speeches as Networks: the DYLEN Tool. In Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference, ed. Darja Fišer, Maria Eskevich, Jakob Lenardič, and Franciska de Jong, 56–60. Marseille, France: European Language Resources Association.