AHL AL-ḎIMMA VS. AHL AL-KITĀB IN ISLAM: SYNONYMS OR DIFFERENT TERMS?

This paper is inspired by the dichotomy in the characteristics of the terms ahl al-ḏimma ‘people under protection’ designating the religious groups under the protection of an Islamic administration, and ahl al-kitāb ‘people of the Book’ used for the religious groups who have a revealed scripture (the Old or New Testaments, the Avesta) in the modern literature on Islam, where the prevailing narrative sees them as two separate terms whereas some studies point to their near synonymy. Our study is based on the behaviour of the two concepts in works of classical Arabic literature (based on the CLAUDia historical corpus of Arabic). On the grounds of the collocations connected with the two concepts, the data supports the thesis of two distinct terms with rather little contextual overlap, where ahl al-ḏimma is used mainly for practical and formal aspects of the life of the above-described people within Islamic society, and ahl al-kitāb serves as a designation of the representatives of opposing religions in theological debates.

The designation ahl al-ḏimma is used in Islamic civilization for those people who for some reason enjoy the protection of the ruling Islamic authorities. The word ḏimma means protection, and ahl is a common way of describing a group of people with some common attribute. 1 It is generally considered a "medieval construct by the Muslim state to provide protection for minorities" 2 and one should add that in order to qualify for such a construct, one had to fulfil certain prerequisites, such as being a member of the ahl al-kitāb.
1 Thus, one can find ahl al-ʕilm 'the people of science, scientists', ahl al-Dimašq 'people of Damascus, the inhabitants of Damascus', ahl al-janna 'people of Paradise, those who (will) enter Paradise', ahl al-luġa 'people of language, linguists', etc.
The term ahl al-kitāb, 'people of the Book', stands for those religious minorities that have a canonical book. It covers primarily monotheistic religions (Jews and Christians, with the Old and New Testaments), but is applicable to Zoroastrians (with the Avesta), too. In the Quran itself (Q 22:17), four religions are named. While the major religions (Judaism, Christianity) are clearly distinguishable, the Sabians (aṣ-Ṣābi'ūn) and (most probably) Zoroastrians (al-Maǧūs) are only vaguely mentioned in the Quran, and their real status remains unclear. 3 It can also be expected that the attitudes of the majority society differed in respect of these religious groups.
Both concepts are clearly meant for the same or almost the same groups of people. Only those individuals who belonged to ahl al-kitāb could become part of ahl al-ḏimma, usually by means of a contract (ʕahd al-ḏimma). This means that the overlap between the two must have been considerable.
The question of the different usage of these two concepts then naturally arises. A frequent trend in the Western literature, however, keeps these two designations distinct. Even the Encyclopaedia of Islam offers two entries on ḏimma, the first one concentrating on the historical part, the other on the legal aspects of the concept. 4 A somewhat tighter connection between the two concepts can be seen in the first article, but even here, this connection is not overemphasized. This holds also for the ahl al-kitāb, where one can find a reference to the legal aspect of these people under ḏimma (and ǧizya, the tax paid by ahl al-kitāb). 5 Though other such examples can be added, most studies keep the question of the relation between the two concepts apart, as they deal only with the aspects of one of the two concepts. On the other hand, there are also examples that do not consider the borders between the two concepts as clear cut, and stress the fact that both concepts deal with the same people. Here, probably the best example is in 6 where the two phrases are labelled as nearly synonymous in post-Quranic times.
This dichotomy led us to the question of how these concepts are treated in classical Arabic literature. We will be interested in the usage of the two terms in the literature, especially in their collocational behaviour and their mutual relation.
For the purpose of this study, we used the historical corpus of Arabic CLAUDia (Corpus Linguae Arabicae Diachronicum). This corpus contains approx. 420 million words and covers practically the whole time span of writing in Arabic, from the 7th century until the middle of the 20th century. The corpus consists of approx. 2,000 works, which are included in their entirety. From the chronological point of view, the distribution corresponds very well to the common image of the history of Arabic literature, with a gradual increase in the number of titles from the beginning until the peak in the 14th century, followed by a decline, especially in the 17th and 18th centuries, and a subsequent revival in the 19th century. As such, it should offer a sufficient sample of Arabic literary production to judge the usage and possible overlap of the two concepts. It should be borne in mind that the character of Arabic literature, especially in the earlier periods, is rather Islam-oriented, which should help in the investigation of the two terms, clearly connected with the teaching and life of Islamic society.
While ahl al-kitāb is very common in the Quran (31 times), ahl al-ḏimma is a secondary development. It is not attested in the Quran itself; the root ḏmm appears there in different meanings (Q 9:8, 9:10, 17:18, 17:22, 26:54, 68:49). The concept of ahl al-ḏimma is also secondary in our data, the earliest attestation coming from Ibn Sirin (d. 110/728), but the phrase is obviously already established there and its usage points to a commonly understood concept. 7 Both collocations can be considered very prominent and the frequency of their occurrence in the texts is high. They certainly cannot match such phrases as rasūlu ´llāhi ('messenger of God', an epithet of Muhammad), whose total approaches 1.3 million occurrences in our corpus, but this is a central concept in Islamic society. Should we consider only the phrases formed with ahl, then ahl al-kitāb (24,380x) is second in frequency after ahl al-ʕilm ('people of science, scientists', 33,135x), and ahl al-ḏimma (6,893x) stands between ahl al-bayt ('people of the House', mostly 'relatives of Muhammad', 7,138x) and ahl miṣr ('people of Egypt, Egyptians', 6,667x). When compared with individual words, ahl al-kitāb has a frequency of occurrence similar to mamlaka 'kingdom' (24,378x) or ṭibb 'medicine' (24,281x); in the case of ahl al-ḏimma, we find words like iǧtihād 'effort' (6,849x) or ǧamād 'solid; inorganic body' (6,865x). It is clear that both concepts were among the central ones in Arabo-Islamic society.
The distribution of both terms on a chronological axis is given in Table I. The table lists, century by century, the absolute number of occurrences of each term, the number of occurrences per million words (which makes the actual frequency in texts easier to compare), and the number of authors who actually employ the terms. The number of authors makes it possible to decide whether most of the usages come from one big treatise on the topic or whether the usage is common to many authors. The table demonstrates that the number of authors who used the two terms was high: the term ahl al-kitāb was used by 901 authors (excluding 35 titles of periodicals) and ahl al-ḏimma by 596 authors (excluding 32 titles of periodicals). The list of authors who used these terms most often is available in the supplement.
Table I clearly shows that both terms were very popular throughout the history of Islam. In terms of frequency per million words, ahl al-ḏimma was relatively most common in the 8th century CE; its frequency then decreased in the 10th century and increased again in the 11th century, but in the remaining period it oscillated between 20 and 10, i.e. at lower levels than before. The other term, ahl al-kitāb, displays a slightly different curve, with rather high peaks in the 8th, 14th, and especially 20th centuries. This is illustrated in Fig. I, where the different peaks are clearly visible. Such a difference already points to a difference in the usage of the two terms.
[...] of other words, in a similar context. The difference in the context can be used for distinguishing the different meanings of homonymous or polysemous words. 8 Such approaches can use various types of methods, based on psychological or cognitive experiments, a general taxonomy of terms, or the meaning derived from the context. 9
As we have a corpus of Arabic at our disposal, we can use the context in real texts as the basis of our research. In order to investigate the behaviour of the two phrases, we will analyze the immediate context of the terms, looking at the two words neighbouring each term on either side. The technique based on n-grams and collocational windows will be used. 10 This type of approach can be described as quantitative, as it does not take into consideration e.g. syntactic boundaries.
The length of the n-gram can play an important role. Bigrams show only the behaviour of immediate collocations, such as minor changes, but fail to find a phrase such as notify minor changes, for which the minimum of a trigram is necessary. Should the n-gram be too big, it can mislead us as many of the words may appear in a different syntactic domain, which would work in favour of more frequent words.
8 KLEIN, D. E., MURPHY, G. L. The Representation of Polysemous Words. In Journal of Memory and Language, 2001, Vol. 45, No. 2, pp. 259−282. The authors clearly show the role of context variation for a decision on the particular meaning of a word in a given context. BERGLER, S. The semantics of collocational patterns for reporting verbs. In EACL '91: Proceedings of the fifth conference on European chapter of the Association for Computational Linguistics, April 1991, pp. 216−221: an example of how the context can be used for determining grammatical information (distinction of verbs in a text). This points to the importance of contextual information for a variety of research questions.
9 A useful overview can be found e.g. in XIAO, R., McENERY, T. Collocation, Semantic Prosody, and Near Synonymy: A Cross-Linguistic Perspective. In Applied Linguistics, 2006, Vol. 27, No. 1, pp. 103−129.
10 GABLASOVA, D., BREZINA, V., McENERY, T. Collocations in Corpus-Based Language Learning Research: Identifying, Comparing, and Interpreting the Evidence. In Language Learning, 2017, Vol. 67, No. S1, pp. 155−179: an example of the way of defining the word windows and how these can be used in language learning based on the exploitation of a linguistic corpus.
We have decided to work with a window showing five elements at a time. This means that we have at our disposal a window consisting of five elements taken directly from the text, i.e. two words before the item, then the item under investigation itself, and two more words after the item. As we are already working with bigrams (both terms consist of two words), in reality we are using a hexagram (a six-word sequence), but the context on each side consists of two words only. 11 Such an extent should be able to capture collocations that are loose enough, without allowing the most frequent words to skew the picture too much. 12
It should also be noted that we are not trying to establish a type of synonymy that could (at least in most cases) be used for a lexicographical definition of the terms, but rather to map the usage of the two terms on the basis of their contexts. One has to remember, too, that the two terms belong rather to the area of free collocation, which means that we can find both of them in a wide range of contexts.
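The window described above can be illustrated with a minimal Python sketch. It assumes that the text is already transliterated and can be split on whitespace; the function name context_windows and the parameter span are illustrative only and are not taken from the CLAUDia tooling. The sample sentence is the one quoted in note 11.

```python
def context_windows(tokens, phrase, span=2):
    """Collect every occurrence of `phrase` with `span` tokens on each side.

    `tokens` and `phrase` are lists of strings. Windows are cut purely by
    position, so syntactic boundaries are ignored.
    """
    n = len(phrase)
    windows = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            left = tokens[max(0, i - span):i]      # up to `span` words before
            right = tokens[i + n:i + n + span]     # up to `span` words after
            windows.append(left + phrase + right)
    return windows


# Whitespace tokenization is an assumption made for this sketch.
sentence = "man aslama min ahl al-ḏimma aʕqalu-hum fī bayt al-māl".split()
windows = context_windows(sentence, ["ahl", "al-ḏimma"])
print(" ".join(windows[0]))
```

For a two-word term and span=2 each window holds at most six tokens, which corresponds to the hexagram mentioned above; windows at the very beginning or end of a text are simply shorter.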
That is why we will concentrate on the quantitative aspects of the question and compare the words that collocate with the two concepts most frequently. Should the two concepts be in a relation of synonymy or near synonymy, they should share the same neighbourhood, at least to a considerable extent. Lack of such overlap is generally interpreted as lack of synonymy.
For this purpose, the sets of items neighbouring both phrases were investigated. Each set had to be cleaned of unnecessary information (grammatical words, stop words), 13 and the subsets of the most frequent items in the two sets were then compared. As the phrases differ in frequency, we expressed each collocation as a percentage of all occurrences of the respective phrase, which normalizes the collocations and makes the two sets comparable. The lowest co-occurrence was set arbitrarily at 1.7 per cent, i.e. a collocation of the concept with a certain word must account for at least 1.7 per cent of all occurrences of the concept. From this sample, the percentages are derived.
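The normalization and the cut-off can be sketched as follows. This is only an illustration of the arithmetic: the function name, the stop-word set and the collocate counts are invented for the example, and only the total of 6,893 occurrences of ahl al-ḏimma and the 1.7 per cent threshold come from the text. The sketch counts surface forms, whereas the figures in this study group collocates by root.

```python
def collocate_percentages(collocate_counts, phrase_total, stopwords, min_pct=1.7):
    """Express collocate counts as a percentage of all occurrences of the phrase.

    Stop words are dropped, and so is any collocate whose share falls
    below `min_pct` per cent.
    """
    shares = {}
    for word, count in collocate_counts.items():
        if word in stopwords:
            continue
        pct = 100.0 * count / phrase_total
        if pct >= min_pct:
            shares[word] = round(pct, 2)
    return shares


# Hypothetical counts, invented for illustration only.
counts = {"ǧizya": 250, "min": 3000, "šahāda": 400, "bayt": 50}
result = collocate_percentages(counts, phrase_total=6893, stopwords={"min"})
print(result)
```

With a total of 6,893 occurrences, the 1.7 per cent threshold corresponds to roughly 117 co-occurrences, so the invented collocate with 50 hits is discarded while the other two content words are kept.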
The results are summarized in Figures II and III. In Figure II, the disjunct contexts are given; the contexts are ordered according to their connection with one of the two concepts, i.e. according to the difference in usage of the respective concepts. Only the collocates most relevant to the respective concepts are included in the graph. The results comparing the collocations disjunct to both concepts show that those terms most common with one of the concepts are not so common with the other one. 14 These types of collocations can be grouped together quite reasonably:
• ahl al-ḏimma collocations: the concentration of terms such as 'testimony' (root šhd), 'property' (mwl), 'authority' (ḥkm), 'allowed' (ǧwz), 'tax' (ǧzy), 'tithe' (ʕšr), 'law' (ḥqq), 'protection' (ḏmm), 'business' (tǧr) points to a rather formal group of terms, connected with life (both private and public) in Islamic societies and the regulations governing it; it is these people who are subject to authority, have to pay taxes and tithes, obey laws and regulations, and with whom one can do business;
• ahl al-kitāb collocations: concepts like 'believe' ('mn), 'unbelief' (kfr), 'polytheism' (šrk), 'knowledge' (ʕlm), 'book' (ktb), and 'debate' (ǧdl) can be interpreted as terms connected with theological questions concerning the (hierarchical) coexistence of the religions in question and the extent of unbelief or polytheism of those religions; with these people one can debate questions of faith and religion.
Figure III then compares the contexts in which both terms overlap.
11 To give a clear example, in a sentence man aslama min ahl al-ḏimma aʕqalu-hum fī bayt al-māl ('those who converted to Islam from ahl al-ḏimma are the most clever in the House of Wealth') we get the chunk aslama min ahl al-ḏimma aʕqalu-hum fī.
12 In Arabic, it is impossible to avoid Allāh, which is the most frequent content word in practically all textual collections, and so its collocation with any lexeme will be naturally high. However, other frequent words such as muslim will certainly appear.
13 It makes no sense here to investigate the relation of the term ahl al-ḏimma with e.g. prepositions. The example in note 11, however, shows that the percentage of grammatical words or stop words can be rather high, close to 50 per cent. These instances were omitted from our analysis.
14 As pointed out above, words like Allāh will occur with most of the lexemes in Arabic due to their high frequency. In our set, this also clearly concerns words on the base of the root slm, such as muslim, islām, etc.

Figure III. Overlapping collocations of both concepts
In the case of overlapping contexts, the frequencies of occurrences are much lower: we are dealing with the range between 0.45 and 2.41, i.e. within a range which would be excluded in the case of disjunct contexts. The number of contexts where the two concepts meet is also much lower than in the case of the disjunct category (6 to 21). The two concepts overlap at 'marriage 1' (zwǧ), 'controversy' (ḫlf), 'truth' (ṣdq), 'marriage 2' (nkḥ), 'Christian' (nṣr) and 'Hadith' (ḥdṯ).
In this case, it is certainly much more difficult to group these terms according to the axis mentioned above, i.e. life and its regulations in Islamic society as opposed to theological disputations. One can successfully argue for both poles in all the items given here. On the other hand, the low frequency of the co-occurrences points to the marginal character of these collocations.
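The ordering of collocates by the difference in usage, as applied to Figure II, can be sketched in the same way. The function name and all percentages below are hypothetical and serve only to show the principle; how the boundary between disjunct and overlapping collocates was drawn in practice is not reproduced here.

```python
def order_by_difference(pct_a, pct_b):
    """Rank collocates by (share with term A) minus (share with term B).

    Collocates typical of term A come first, those typical of term B last;
    values near zero indicate collocates shared by both terms.
    """
    words = set(pct_a) | set(pct_b)
    diffs = {w: pct_a.get(w, 0.0) - pct_b.get(w, 0.0) for w in words}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical percentages per root, invented for illustration only.
pct_dimma = {"šhd": 5.8, "ǧzy": 3.6, "zwǧ": 1.2}
pct_kitab = {"kfr": 4.1, "ǧdl": 2.0, "zwǧ": 1.0}
ranked = order_by_difference(pct_dimma, pct_kitab)
for root, diff in ranked:
    print(root, round(diff, 2))
```

In such a ranking the two ends of the list correspond to the disjunct contexts, while roots with a difference close to zero, such as zwǧ in the invented data, are candidates for the overlapping group.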

Conclusion
Our data seems not to support the thesis of near-synonymy. On the contrary, based on purely quantitative analysis, it seems that the two concepts, ahl al-ḏimma and ahl al-kitāb, exhibit only a small degree of overlap (unlike synonyms) and act in discourse as two rather distinct terms. The first is used as a label for groups of people whose lives within Islamic society are subject to certain regulations; the second is mostly used to designate practically the same group of people in theological disputations among the coexisting religions.