WG2 has started Digilex, a platform for sharing tips, raising questions and discussing methods of digitizing print dictionaries. See the latest blog entries below; for more, visit the website https://digilex.hypotheses.org.

RSS WG2 blog

  • An XML Version of Turkish Dictionary January 22, 2018
    In order to anchor international or multinational lexicographic projects on existing Turkish dictionaries, we should have a common understanding of the way we make reference resources available, as is the case for the digital version of our Turkish dictionary project. Although work has been done on digitizing Turkish dictionaries, both old Turkish dictionaries […]
    Emrah Özcan
  • Creating a prototype for lexicographic entries for spoken German December 14, 2017
    Since dictionaries are mostly based on written language data, creating a dictionary of spoken language requires new types of lexicographic descriptions and an elaborate microstructure. When analyzing spoken language material, a remarkable part of lexicographic work consists in analyzing interactional contributions of one or more speakers, and focusing on the lexicalized units used for organizing […]
    Dolores Batinic
  • Simple XSLT experiment: building up a concordance dictionary from a corpus December 10, 2017
    During the lexical master class in Berlin, we worked together with Simonetta Battista and Ellert Johannsson on the Dictionary of Old Norse Prose. One of the little exercises we did was to use XSLT to generate lexical entries automatically from an existing annotated textual corpus of Old Norse. The corpus The […]
    Laurent Romary
  • GROBID Dictionaries: Experiments with the General Basque Dictionary (OEH) PDF December 7, 2017
    GROBID Dictionaries is a tool for structuring dictionaries (conversion from PDF format to TEI XML), with a supervised machine learning approach (CRF models). Details are explained in (Khemakhem et al. 2017), which is also the paper to cite in relation to GROBID Dictionaries. At LexMC, I took part in the GROBID Dictionaries tutorial and […]
    David Lindemann
  • From a Legacy Dictionary to New Lexica: An Alternative Time-Machine to Discover Neologisms March 28, 2016
    The history of Greek lexicography has many examples of exceptional and quite “crazy” pioneer researchers, amateur lexicographers, and linguist authors who travel all over the country to collect data for their lexica and dictionaries. Nikos Kazantzakis travelled most areas of Greece to collect “beautiful” words for his works, especially for his epic saga, Odyssey – as a […]
    Athanasios Karasimos
  • Digitised and Born-Digital in One Application: Dutch Historical Dictionaries Online March 28, 2016
    Dutch historical language has been described in four separate comprehensive dictionaries: the Woordenboek der Nederlandsche Taal (WNT, Dictionary of the Dutch Language, 1500–1976), the Middelnederlandsch Woordenboek (MNW, Dictionary of Middle Dutch, ca. 1250–1550), the Vroegmiddelnederlands Woordenboek (VMNW, Dictionary of Early Middle Dutch, 1200–1300), and the Oudnederlands Woordenboek (ONW, Dictionary of Old Dutch, ca. 500–1200). […]
    Katrien Depuydt
  • Digitising 150 Years of the Swiss German Dictionary March 26, 2016
    The scholars of the Swiss German Dictionary (Schweizerisches Idiotikon) have collected more than 15000 pages of highly concentrated information over the last 150 years. When we began retro-digitising the dictionary a few years ago, we were unsure if we were up to the task of dealing with such a massive amount of data. Of course, […]
    Tobias Roth
  • A New Life for an Old Dictionary: Notes on Digitizing the Dictionary of Russia March 23, 2016
    I was a PhD student just finishing my thesis when my doctoral studies advisor, Professor Viktor Kabakchi, a grey-haired Russian scholar with many years of university teaching experience, invited me to take part in updating his brainchild – The Dictionary of Russia. The first print edition of this dictionary about Russian cultural terms in English, […]
    Kseniya Egorova
  • How Can I OCR My Dictionary? March 8, 2016
    One way to digitise a dictionary is using Optical Character Recognition or OCR. But is OCR feasible at all for my dictionary? And if so, which OCR program should I use, trainable or omnifont? And how about the workflow: should I train the OCR engine or not? And, finally, what should be the output format […]
    Jesse de Does
  • Legacy Dictionaries Reloaded: Why Should We Bother? January 12, 2016
    The closest I’ve ever come to glimpsing hell was several years ago, reading an article in the New York Times, entitled “Justices Turning More Frequently to Dictionary, and Not Just for Big Words.” The article cited the example of a certain Chief Justice John G. Roberts Jr. who had apparently parsed the meaning of a federal […]
    Toma Tasovac