The Chorasmian Dictionary Project: Materials & Goals

Producing a dictionary of the Chorasmian language has been a goal within Iranian philology for about seven decades. Two major figures in the field, Henning and MacKenzie, dedicated a significant part of their careers to that goal, and both were unable to finish their work or make much of it available to the public before their deaths. Fortunately, the materials left behind by both scholars provide a solid foundation for attaining this goal, and perhaps even all the necessary building blocks. In the first part of this post, I give an overview of these existing materials; in the second, I discuss the approach to compiling and organizing them to produce an accessible online dictionary of Chorasmian.

Part 1: Existing unpublished sources

To compile a (non-etymological) dictionary, the following materials are the primary foundation:

  • A draft printout of completed entries ʾ- through γw- (almost 1900 entries), together with a slightly older version of the corresponding digital file (MacKenzie)
  • An older and less complete set of draft entries, ʾ- through n-, on paper (MacKenzie)
  • Printed and digital versions of the concordance of the entire lexicon, organized alphabetically (MacKenzie)
  • Computer files containing the Chorasmian words and phrases in each source text (MacKenzie)
  • Handwritten index cards of the entire lexicon (MacKenzie, Henning)

The most recent work seems to be the printout of ʾ- through γw-, probably dating to mid-2001, a few months before MacKenzie’s death. His computer file for ʾ- through γwxy- differs very slightly from the printout, mostly in the format of lemmas, and seems to be a slightly older version. This draft of almost the whole of alif through ghayn, about 1900 entries, is essentially complete and can be used verbatim as the basis for the dictionary (but will be reformatted, see below). These entries probably constitute 30-40% of the entire dictionary, especially given that a disproportionate number of forms begin with alif (almost 1,000!).


Towards a Chorasmian Dictionary, Part 3: MacKenzie

The next major stage in the history of attempts to create a Chorasmian dictionary, and the second to be interrupted by death, is that of David N. MacKenzie (1926-2001). Initially a specialist in Pashto and Kurdish, MacKenzie moved into Old and Middle Iranian as a student of Henning at SOAS in the 1950s. As a faculty member of SOAS, by the mid-1960s, he had evidently become interested in Chorasmian (despite his own admission (1971:1) that he “never studied any appreciable amount of Khwarezmian” with Henning) and had begun studying the available material in earnest. Already in 1968 he had prepared a preliminary edition of the Qunyat al-Munya and its Chorasmian phrases, but this project was paused for several decades until he was able to obtain access to better manuscripts of the text from Soviet archives.

MacKenzie’s work towards a Chorasmian dictionary also seems to have begun in the late 1960s, even before he received the fragment of Henning’s dictionary which he edited and published in 1971. Indeed, it was his intimate familiarity with Chorasmian and its structure that enabled him to publish his extensive, critical reviews of Benzing’s work between 1970 and 1972. Although MacKenzie seems to have produced several kinds of preparatory material for a dictionary over the decades, he—like Henning—never published any of it during his lifetime, despite devoting much time to the project after his retirement in 1994. A glossary to his edition of the Qunya (1990) is thus his only published lexical material.

An entry from the glossary to MacKenzie’s edition of the Qunyat al-Munya (1990:103)

Once MacKenzie passed away and his Nachlass was made available to scholars interested in finishing some of his unfinished projects, some insight could be obtained into the fairly significant progress he had actually made on Chorasmian. As Desmond Durkin-Meisterernst first reported in English and German papers on Chorasmian lexicology (both 2005), MacKenzie had, sometime in 2001, printed out a close-to-final draft of his finished entries. According to my count, these total 1888 completed entries covering the letters from alif to most of ghayn (breaking down as follows: ʾalif 950, ʿayn 47, B/b 320, β 72, C/c 113, č 81, D/d 59, δ 90, F/f 65, G 1, Γ/γ 90). MacKenzie also had other kinds of materials in various states, including basic lists of all attested words, concordances to the texts in which they occur, handwritten/typed drafts of entries up to the letter n, and his own card catalog of the complete lexicon. It is clear as well that all his work on a dictionary was based on his own editions of all the existing Chorasmian material in Arabic script: besides the Qunya (1990), he also published corrected decipherments of the Yatīmat al-dahr family of manuscripts (1996).
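For anyone who wants to check the tally, the per-letter counts quoted above can be summed in a few lines of Python. This is only a sketch of the arithmetic: the numbers are the ones given in the breakdown, and the variable names are my own, not anything from MacKenzie's files.

```python
# Per-letter counts of completed entries, as quoted in the breakdown above.
# The dictionary name and keys are illustrative labels only.
entries_per_letter = {
    "ʾalif": 950,
    "ʿayn": 47,
    "B/b": 320,
    "β": 72,
    "C/c": 113,
    "č": 81,
    "D/d": 59,
    "δ": 90,
    "F/f": 65,
    "G": 1,
    "Γ/γ": 90,
}

# Sum of all completed entries in the printout.
total = sum(entries_per_letter.values())

# Share of the completed entries that begin with alif.
alif_share = entries_per_letter["ʾalif"] / total

print(total)                 # 1888
print(f"{alif_share:.1%}")   # 50.3%
```

So alif alone accounts for roughly half of the completed entries.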

It seems that MacKenzie envisaged his dictionary as having two main parts: the complete glossary with definitions and attestations, and a part with etymological discussion. The latter part, which he separated out from the former, probably in order to finish it more quickly, does not seem to be preserved (or perhaps was never really drafted). It is also unknown how much longer MacKenzie thought he needed to draft the remaining entries; it probably would have required a few more years. However, it is obvious from the existing near-final draft that he had essentially all of the Chorasmian material gathered and organized, and only needed to write it up into the format he chose for the lemmas. MacKenzie therefore got much further than Henning did, and his material was even more thorough, only to be bested by time as well.

MacKenzie’s dictionary format largely follows that created by Henning decades earlier, although with a slightly more thorough transliteration system, more extensive attestations of course, and a rather complex system of abbreviations which was probably designed to shorten the overall length of the work in print.

A sample page from the printout of MacKenzie’s most recent draft of the Chorasmian dictionary (mid-2001), showing his handwritten annotations.

From the sample page above, one can see just how much more extensive the full dictionary is than previously published materials such as Henning’s Fragment or the glossary to MacKenzie’s Qunya. These are the materials (in the first place, the printed draft of the completed entries) which form the basis for the Chorasmian dictionary project. The existing materials as well as the project’s methods will be surveyed in the next blogpost in this series.

Sifting for Chorasmian

Finding the Chorasmian glosses in Arabic manuscripts is not easy work. One has to closely examine numerous copies of the same work in order to find out whether some Chorasmian marginalia or short notes are present on a few folios, or not at all. While Yusuf Aǧa 5010, the main source of lexical material, has rather clearly written glosses throughout, other copies of the Muqaddimat al-Adab have only a few, marginal Chorasmian words. Given just how many copies of that text exist, one doesn’t envy the scholars of previous generations who took it upon themselves to sift through thousands of folios in order to see if a given copy had any Chorasmian or not.

With British Library Add. MS 7429, a partial copy of the Muqaddima, David N. MacKenzie got rather lucky. There are Chorasmian glosses on the very first page, and only on that page, though tucked under and beside the more copious Persian glosses. One does wonder how far MacKenzie searched through the ms. before deciding there was no more Chorasmian to be found. Perhaps in a nod to that medieval scribe who wrote the Chorasmian, MacKenzie helpfully tucked his own announcement of the discovery of the glosses in this ms. and his edition thereof into a page in the middle of one of his reviews of Benzing’s Sprachmaterial (MacKenzie 1971: 524-525 “The Khwarezmian Glossary IV”). It was thus a while till I myself noticed this additional manuscript. A colophon for the section on particles (ḥurūf) dates Add. MS 7429 to Dhu ’l-Qa‘da 760 AH, or Oct. of 1359 CE—a date contemporary with the production of several other Chorasmian manuscripts.

Given that the glosses are so few, and the manuscript has been shared publicly by the BL, it will furnish a nice illustration of how Chorasmian glosses to the Muqaddima actually look. This particular ms. is glossed throughout in Persian and also has a few pages with glosses in what was then called “Eastern Turkish”, that is, Khwarezmian Turkic or early Chagatay.

BL Add. 7429 fol. 1v, first page of the Muqaddimat al-Adab. Chorasmian glosses marked by numbers in green.

Towards a Chorasmian Dictionary, Part 2: Benzing

Approximately concurrent with Henning’s late work on the Chorasmian lexicon in the 1960s, another effort was underway back in Germany. The German Turkologist (and Nazi cryptanalyst) Johannes Benzing (1913-2001) had begun editing the Chorasmian material preserved in the Muqaddimat al-Adab, primarily based on the manuscript Yusuf Aǧa 5010 held in Konya and published in facsimile by Togan in 1951. Benzing was not a scholar of Iranian languages, but was primarily interested in Chorasmian as a possible substrate language to the Turkic variety that became current in the region from the 13th century on (known as Khwarezmian Turkic or pre-Chagatay). His collection of this material published in 1968 (Das Chwaresmische Sprachmaterial einer Handschrift der “Muqaddimat al-Adab” von Zamaxšari) therefore ended up beset by problems.

The work is first of all something of an eclectic edition. Benzing combines his idiosyncratic transliteration of the Chorasmian glosses in the Konya manuscript with the Arabic and Persian equivalents and Latin translations as presented in the 1843 edition of the Muqaddima by Johannes G. Wetzstein (1815-1905). Benzing leans in the direction of a text edition, listing each word or form as it appears in order in the manuscript, rather than gathering them into an alphabetized grouping. This means that, as a guide to the Chorasmian lexicon, the edition is rather hard to use. Moreover, as a guide to a single manuscript, its eclectic arrangement, with ms. pages mixed with the Wetzstein entry numbers, means that it is confusing to peruse.

But the main problem was that Benzing made little effort to interpret the spellings of Chorasmian words, many of which were partially or totally unpointed. He used a complicated transliteration scheme to try to indicate which letters should have been pointed a certain way and which remained unclear. Lacking knowledge of Iranian languages and philology, he wasn’t able to resolve these problems through comparison and etymology, and ended up misrepresenting numerous words, many of which would have been decipherable, behind their unpointed Arabic skeletons, to someone with knowledge of Old or Middle Iranian. This is precisely what MacKenzie pointed out in a series of five (rather brutal) review articles on the book that he published from 1970 to 1972. His appraisal of the work can more or less be summed up by his statement that “Benzing can be faulted for many misreadings and, worse, wrong generalizations from them” (1970:541). These reviews are so extensive that reference to Chorasmian words preserved in the Muqaddima is simply not possible without them.


Towards a Chorasmian Dictionary, Part 1: Henning

Credit for efforts to create a comprehensive, or at least extensive, glossary of the Chorasmian language is due, in the first place, to those medieval scribes who thought it useful to insert Chorasmian translations into copies of the Arabic Muqaddimat al-Adab (more on that, and on them, in due course!).

In modern times, the idea of compiling all the attested Chorasmian materials into a dictionary goes back to Walter B. Henning, the pioneer, together with Zeki Velidi Togan, of Chorasmian studies. Having begun working on Chorasmian in the 1930s, Henning had lost interest for a while due to the plagiarism of A.A. Freiman and did not publish anything on the language until the 1950s (Henning 1955:423).

It is unclear when Henning began returning to Chorasmian matters, but by the mid-1950s he had organized the words and phrases attested in the known manuscripts of the Muqaddima and Qunya (all discovered by Togan) into a card catalogue. At a 1954 conference, he announced “I have now compiled nearly a complete glossary which I hope to publish in the near future” (Henning 1956:43). Other obligations and projects intervened to delay the conversion of this card index into publishable dictionary form, however, and it was only in the year or so before his death in Jan. 1967 that Henning began to draft full entries.

Henning’s handwritten draft of a Chorasmian dictionary (p. 16)

At the time of his passing, approximately 260 entries had been completed, by hand as was his norm, filling 108 pages. These entries were based mainly on two manuscripts of the Muqaddima: the substantial glosses in Yusuf Aǧa MS 5010 (Togan 1951), and Hacı Beşir Aǧa MS 648 (at that time unpublished), as well as on some then-unknown manuscripts of the Qunya available to him.

Owing to the nature of the language, in which hundreds of words begin with a vowel written by means of the letter alif ʾ, these 260 entries only take us from ʾ- up to ʾkw-. Substantial as it is, this amounts to barely halfway through the first letter of the alphabet! The pages were sent after Henning’s death to David MacKenzie, who by then had established himself as the major authority on the language. The latter apparently had no access to other materials from Henning. MacKenzie published the entries with essentially no modification, adding only about 140 brief additional entries where warranted by cross-references in the completed part, as A Fragment of a Khwarezmian Dictionary in 1971. The handwritten pages were eventually returned to Henning’s Nachlass.

Besides the initial work in collecting, deciphering, and etymologizing Chorasmian words (though much of this was expanded and improved upon by MacKenzie later), one important thing about the Fragment is that Henning, perhaps inadvertently, established the lemma format which MacKenzie’s subsequent work on the Chorasmian dictionary would adhere to exactly. We’ll explain what we think are the downsides of this format in another blogpost; suffice it to say for now that it is not easy to follow for someone unfamiliar with Chorasmian literature.

Entry for ʾβʾry- ‘to forgive’, from Henning’s Fragment (1971:9)

Here ends Henning’s part in the story of a Chorasmian dictionary. Although his card catalogue is still extant, and may have some interesting notes and etymologies, it does not seem to have been used by any subsequent scholars. Moreover, the correct understanding of Chorasmian grammar and historical linguistics was developed much, much further in the following decades by Henning’s student David MacKenzie. It was he who took up the project of finishing a Chorasmian dictionary, laboring over it for the next three decades.

Some 10th-century Arabic Reports on Chorasmian

Ibn Faḍlān, judging the way Chorasmian sounds

Ibn Faḍlān (d. 960), Risāla

وهم أوحش الناس كلاماً وطبعاً كلامهم أَشبه شيء بصياح الزرازير وبها قرية على يوم يقال لها أَردكو أَهلها يقال لهم أُلكردلية كلامهم أَشبه شيءٍ بنقيق الضفادع

“[The Khwarizmians] are the most barbarous of people, both in speech and customs. Their language sounds like the cries of starlings. In their country there is a village one day’s journey away called Ardakuwa, whose inhabitants are known as Kardaliyya, and their speech sounds like the croaking of frogs.”

What Ibn Faḍlān gives as Ardakuwa (اردكو) may refer to the town al-Maqdisi calls Ardh-Khiva, located to the southeast of the main settlement of Kath (or may not; I haven’t seen it explained anywhere, though). Why the inhabitants of this Ardakuwa should be called “Kardaliyya” is not clear; perhaps Ibn Faḍlān has mixed in a reference to the inhabitants of a different town, Kurdar, in the northeast of Khwarizm?

Source: Ibn Faḍlān, Risālat Ibn Faḍlān, edited by Sāmī Dahhān (Damascus, 1959), p. 82

Ibn Ḥawqal (d. ca. 978), Kitāb ṣūrat al-arḍ

وهم أكثر أهل خراسان انتشارا وسفرا، وليس بخراسان مدينة [كبيرة] إلّا وفيها من أهل خوارزم جمع كثير، ولسان أهلها مفرد بلغتهم وليس بخراسان لسان على لغتهم

"[the Khwarizmians] are the most widespread and widely-traveld people of Khurāsān, and there is no city in Khurāsān that does not have a large group of Khwarizmians, and their language is unique to them, no other like it is spoken in Khurāsān"

Source: Ibn Ḥawqal, Kitāb ṣūrat al-arḍ, edited by J. H. Kramers (Leiden, 1938), vol. 2, pp. 477–478, 481–482.

al-Maqdisī (d. 991), Aḥsan al-taqāsīm fī ma‘rifat al-aqālīm

لسان اهل خوارزم لا يُفهم
“The language of the people of Khwarizm cannot be understood”

In other words, Chorasmian was to al-Maqdisī not similar to Persian, many different varieties of which he mentions in his work.

Source: Basil Anthony Collins, translator, The Best Divisions for Knowledge of the Regions (Reading, 1994), p. 272; ed. M. F. de Goeje, Bibliotheca Geographorum Arabicorum, Vol. 3 (Leiden, 1877, rev. 1906)

The Discovery of Chorasmian

While the early years of the 20th century saw a massive boom in the number of Middle Iranian sources available to the world, the Chorasmian language remained almost unknown to modern scholars. That an Iranian language particular to the region had existed, of course, was no surprise: the polymath al-Biruni cited a few Chorasmian terms in some of his works, and al-Biruni’s works had been the subject of major European scholarly publications since the 1870s. But no Chorasmian sources were discovered at Silk Road sites such as Turfan, for example, which yielded such riches in Sogdian, Parthian, Middle Persian, and Khotanese. So, no actual sources in Chorasmian were known to that first modern generation of specialists in Middle Iranian.

The fact that original Chorasmian texts are now available to scholars, and have been studied since the mid-20th century, is due entirely to the efforts of one person: the Bashkir revolutionary and Turkologist Ahmed Zeki Validi Togan (1890-1970).

https://upload.wikimedia.org/wikipedia/commons/0/08/%D0%92%D0%B0%D0%BB%D0%B8%D0%B4%D0%B8.jpg
Әхмәтзәки Әхмәтшаһ улы Вәлиди, hero of Chorasmologists

Being “Biruni” in Khwarizm

In the 10th and 11th centuries, a number of authors who wrote in Arabic were aware of the existence of a language particular to the region of Khwarizm. Abu Rayḥān al-Bīrūnī is the most famous and knowledgeable of these authors, of course, being a native speaker of the Chorasmian language. Travellers and scholars such as Ibn Faḍlān and al-Maqdisi also reported on this language and gave their judgements on the way it sounded to them, but did not record any examples of it.

Al-Biruni, or should we say, ī anbīcak

‘Abd al-Karīm al-Sam‘ānī (d. 1166), however, seems to have actually known something of the Chorasmian language. Originating from Merv in today’s Turkmenistan, not all that far from Khwarizm, he may have actually heard the language being spoken. In his biographical dictionary known as the Kitāb al-Ansāb (ed. Hyderabad, 1962, Vol. II, p. 353), he reports:

البيروني بفتح الباء الموحدة وسكون الياء آخر الحروف وضم الراء بعدها الواو وفي آخرها النون هذه النسبة الى خارج خوارزم فان بها من يكون من خارج البلد ولا يكون من نفسها يقال له فلان بيروني ست ويقال بلغتهم انبيژك ست والمشهور بهذه النسبة أبو ريحان المنجم البيروني
“al-Bayrūnī (with fatḥ/kasr of the 'b', sukūn of the 'y', ḍamm of the 'r', followed by 'w' and 'n'): This surname (nisba) refers to the outer parts of Khwarizm, for in that country if someone is from outside the town and not from the town itself they say of him fulān bayrūnī-st (فلان بيروني ست), but in their own language they say anbīcak yitti (read: انبيڅك يت). The person most well-known by this surname is the astronomer Abū Rayḥān al-Bayrūnī.”

Chorasmian leftovers in 17th-century Khivaq

The Persian dictionary Burhān-i Qāṭiʿ, compiled by Muḥammad Ḥusayn b. Khalaf al-Tabrīzī in 1062/1651-2 in Ḥaydarābād, mentions the use of two very particular terms in the language of the town of Khivaq in Khwarizm (ed. Muḥammad Muʿīn (Tehran, 1330-5 s./1951-6), Vol. II, 1183, also available here):

سوپ - به ضم اول و سكون ثانى و باى فارسى به زبان خيوق كه يكى از الكاى خوارزم است آب را گويند همچنان كه پكند با باى فارسى و كاف بر وزن سمند نان را و سوپ و پكند آب و نان است
sūp, with ḍamm on the first (letter) and sukūn on the second and a Persian 'b', in the language of Khivaq, which is one of the provinces of Khwarezm, they call “water” so, just as they call “bread” pakand. So sūp o pakand is 'water and bread'.

The Edinburgh manuscript of al-Biruni’s Chronology

The Edinburgh manuscript (Or. ms. 161) of al-Biruni’s Chronology (الاثار الباقية عن القرون الخالية) is not only one of the earliest witnesses of the work, with a date of 1307 CE, but is also probably the most beautiful, bearing numerous illustrations and tables rendered in fine calligraphy. It is now digitized in its entirety and available to view online through the University of Edinburgh’s library. For some lectures concerning the most recent work on the Chronology, particularly by François de Blois, see this page, part of a major UCL project on ancient and medieval calendars. For more on the Edinburgh manuscript itself and the context of its production, see “The Edinburgh Biruni Manuscript: A Mirror of Its Time?” by Robert Hillenbrand.

Al-Biruni, of course, cites a number of terms in Chorasmian, his native tongue, as well as in Sogdian, another Middle Iranian language he was very familiar with. Due to the high quality of the Edinburgh manuscript, the Chorasmian and Sogdian terms are quite clear. Here, I thought it worth showing two folios in particular.
