The Architecture of Words

Generating a vocabulary for an invented language is a stupendous task. For basic functionality, a natural language probably needs about 5,000 to 10,000 words; a language that’s fully capable of dealing with all sorts of specialized and technical topics may have upwards of 100,000 words.

Naturally, when I started work on neo-Khuzdul, I did not have such a complete vocabulary. Nor, after many years, does such a vocabulary exist today. What I created instead was something like what Tolkien had done with his Elvish languages; instead of making a dictionary of thousands of words, he created a system by which new vocabulary items could be generated based on existing words and “roots” — the sounds which carry the fundamental meaning of each word.

But Khuzdul was very different in structure from Elvish. In this, as with the phonology of Khuzdul, Tolkien had left clues that I was bound to follow. In The Lord of the Rings itself, Tolkien had had little to say about Khuzdul as a language; only that it was a “strange tongue, changed little by the years,” and “a tongue of lore rather than a cradle-speech” which few had succeeded in learning. We recall that, at the West-gate of Moria, Gandalf speculates that it will be unnecessary for him to ask Gimli for “words of the secret dwarf-tongue that they teach to none.” Obviously, even Gandalf is not a master of Khuzdul!

A vital clue about the nature of Khuzdul came from a text that is not really concerned with Khuzdul or the Dwarves at all. This is “Lowdham’s Report on the Adunaic Language” (in Sauron Defeated, pp. 413–440). This text purports to be a description, by the fictional character Alwin Arundel Lowdham, of the languages of Númenor, an “Atlantis” of a distant, semimythical past, which he has been able to view by entering into the experiences of his remote ancestors, who may include Elendil of Númenor! The machinery by which all this is justified is extremely complex, and is described in The Notion Club Papers (also in Sauron Defeated); it also raises all sorts of intriguing issues which are beyond the scope of this particular discussion. Suffice it to say that Lowdham provides some elementary grammatical information about the primary language of Númenor, Adûnaic (or, as he spells it, Adunaic), and contrasts it with the Elvish or “Nimrian” languages, nimir being the Adunaic word for “elf.”

Adunaic, Lowdham speculates,

came under some different influence [than Elvish]. This influence I call Khazadian; because I have received a good many echoes of a curious tongue, also connected with what we should call the West of the Old World, that is associated with the name Khazad. Now this resembles Adunaic phonetically, and it seems also in some points of vocabulary and structure; but it is precisely at the points where Adunaic most differs from Avallonian [sc. Quenya] that it approaches nearest to Khazadian.

Lowdham does not identify Khazadian specifically as a language of Dwarves, doubtless because he does not know; his psychic information is largely focused on language-details, and occasionally visions of manuscripts, with other aspects of the visualized culture being scant or absent. But there is no doubt that, within Tolkien’s mythology, Khazad refers to the Dwarves and that Khazadian is Khuzdul.

This is all well and good, but one would like to know more precisely in what points of structure “Khazadian” resembles Adunaic. Lowdham happily comes through:

The majority of the word-bases of Adunaic were triconsonantal. This structure is somewhat reminiscent of Semitic; and in this point Adunaic shows affinity with Khazadian rather than Nimrian.

No more is said about “Khazadian” in this text, but this is enough. It echoes, somewhat obliquely, a comment made by Tolkien in a letter to Naomi Mitchison: that the Dwarves were “like Jews… speaking the languages of the country [i.e., of whatever country they happen to be living in], but with an accent due to their own private tongue.” It’s difficult to reconcile this with Tolkien’s statement that Khuzdul was “a tongue of lore rather than a cradle-speech”; and in reality it’s more likely that the pronunciation of Hebrew was influenced by “the languages of the country” than the other way around, and that whatever accents the Jews of Tolkien’s acquaintance may have had came from Yiddish rather than from Hebrew.

But to return to the main point: it seemed evident that Tolkien intended Khuzdul to be somewhat Semitic in structure, particularly as regarded the system of roots. The Semitic language family is a large but fairly tightly-knit group of languages found mostly in the Middle East. Its representatives with the most speakers today are the Arabic languages (descended from Classical Arabic), modern Israeli Hebrew, and some but not all of the languages of Ethiopia and Eritrea. Extinct varieties include the Akkadian languages of Mesopotamia, among them those of Assyria and Babylonia; Phœnician, spoken along the coasts of what are now Syria, Lebanon, and northern Israel, and its descendant, the Punic of Carthage; and Aramaic, spoken originally in Syria but later throughout the Middle East. Aramaic is not quite extinct; some descendant dialects are still spoken in a few villages, though more than a century of upheaval has not been kind to them, and they are now on the edge of extinction.

What these languages have in common is a peculiar structure, in which basic meaning is carried by a group of consonants (normally three, but sometimes one, two, or four) which are then modified by the addition of prefixes, suffixes, infixes, doubling of consonants (normally the second), and, most notably, the insertion or deletion of vowels between these consonants. For instance, in Arabic the three consonants k-l-m carry the notion of “speaking” or “speech”. From this root are derived (among others) the verbs kallama “address”, kâlama “converse”, takallama “utter”, the nouns kalimah “word, speech”, kalâm “expression”, mukâlama “discussion”, takallum “talk”, and the adjectives kalâmî “pertaining to speech”, tiklâm “eloquent”, and mutakallim “speaking”. The sequence k-l-m (which, for convenience’s sake, I’ll henceforth write in capital letters without dashes, thus: KLM) is the “root”, which in Arabic is called jidhr and in Hebrew shoresh.

A standard set of affixes or pattern of vowels can be applied to many different roots. These patterns (called wazn in Arabic and binyan in Hebrew) can indicate the part of speech, the person, number, mood, or tense of the verb, the comparative or superlative forms of the adjective, and so forth. For instance, in Arabic, the adjectives meaning “big” and “near” have the pattern CaCîC, where C=one of the consonants of the root: kabîr, qarîb. The superlatives of these same adjectives have the pattern aCCaC: akbar “biggest”, aqrab “nearest”.
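The root-and-pattern idea can be made concrete with a very small sketch. Here a pattern is written with the digits 1, 2, 3 standing for the first, second, and third root consonant. This digit notation is my own assumption for the example: it is equivalent to CaCîC, but it can also express the doubled middle consonant of kallama. Every other character in the pattern is copied through unchanged. This is a toy model of plain interdigitation, not a description of real Arabic morphophonology, which has many further rules.

```python
def apply_pattern(root, pattern):
    """Interdigitate a consonantal root with a vowel pattern.

    Hypothetical notation: in `pattern`, the digits 1, 2, 3 stand for the
    first, second and third root consonant (so CaCîC is written "1a2î3",
    and a doubled middle consonant is "22"). Every other character is
    copied through unchanged.
    """
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


KLM = ("k", "l", "m")  # the root for "speaking, speech"
for pattern, gloss in [("1a22a3a", "address"), ("1â2a3a", "converse"),
                       ("ta1a22a3a", "utter"), ("1a2â3", "expression"),
                       ("mu1â2a3a", "discussion"), ("muta1a22i3", "speaking")]:
    print(apply_pattern(KLM, pattern), gloss)

# The same two patterns, CaCîC and aCCaC, applied to two different roots.
for root in [("k", "b", "r"), ("q", "r", "b")]:
    print(apply_pattern(root, "1a2î3"), apply_pattern(root, "a12a3"))
```

The second loop reproduces kabîr/akbar and qarîb/aqrab: one pattern, many roots.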

Can we verify that Khuzdul has this kind of construction, in general, if not in detail? The word Khuzdul itself is evidently related to Khazâd “dwarves”, the prefixed form Khazad- in “Khazad-dûm” (“Mansion of the Dwarves”) and probably also Nulukkhizdîn*, a Dwarvish name for Nargothrond (where Nuluk probably = Narog, the name of the river on which Nargothrond was built). Each of these shows the same root KhZD (remember that kh is a single consonantal sound in Khuzdul) with a variety of vowel patterns and suffixes: CaCaC, CaCâC, CuCCul, CiCCîn. The ending -ul in Khuzdul is probably the same as that seen in Mazarbul and Fundinul; in the latter case it is appended to a name of Mannish origin. We also have the word Rukhs “orc”, plural Rakhâs “orcs” in The War of the Jewels, p. 391, which shows that patterns recur across roots: Rakhâs has the same pattern as Khazâd, but with the root RKhS. If the patterns are consistent, then most likely the singular of Khazâd is Khuzd, which in turn explains Khuzd-ul, basically equivalent to Dwarf-ish.
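The reasoning about KhZD can be mimicked mechanically. The sketch below assumes a simple Roman-letter tokenizer in which kh, th, and sh each count as one consonant. It matches a word against a C-and-vowel template to pull out its root, then reuses the templates to rebuild Rakhâs and the conjectured singular Khuzd. The helper names (segments, extract_root, fill) are hypothetical and exist only for this illustration.

```python
# Digraphs that spell a single consonant (assumption about the transcription).
DIGRAPHS = ("kh", "th", "sh")
VOWELS = set("aeiouâêîû")


def segments(word):
    """Split a Roman-letter word into sounds, reading digraphs as one unit."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        out.append(word[i:i + step])
        i += step
    return out


def extract_root(word, pattern):
    """Return the root if `word` fits `pattern` (C = root consonant), else None."""
    segs = segments(word)
    if len(segs) != len(pattern):
        return None
    root = []
    for seg, slot in zip(segs, pattern):
        if slot == "C":
            if seg in VOWELS:
                return None
            root.append(seg)
        elif seg != slot:
            return None
    return tuple(root)


def fill(pattern, root):
    """Put the root consonants into the C slots of a pattern, in order."""
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else slot for slot in pattern)


# The four KhZD forms should all yield the same root.
for word, pattern in [("Khazad", "CaCaC"), ("Khazâd", "CaCâC"),
                      ("Khuzdul", "CuCCul"), ("khizdîn", "CiCCîn")]:
    print(word, pattern, extract_root(word, pattern))

# Rukhs : Rakhâs :: ? : Khazâd, solved by reusing the singular pattern.
singular, plural = "CuCC", "CaCâC"
print(fill(plural, extract_root("Rukhs", singular)))    # rakhâs
print(fill(singular, extract_root("Khazâd", plural)))   # khuzd
```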

Assuming a Semitic style of construction, generating a Khuzdul vocabulary was therefore — in principle — as simple as producing a lot of triliteral roots and a suitable set of patterns, like (but not identical to) those found in real Semitic languages. In actual application, things were a little more complicated.
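A first, deliberately naive version of that “in principle” procedure might look like the following sketch. It draws triliteral roots at random from a consonant list and crosses them with a handful of patterns. The consonant list is taken from the inventory in “Language Creation 101”, and the patterns are the four attested in the KhZD forms above. The rule that a root never repeats a consonant is an assumption made only to keep the sketch simple. Assigning meanings, and weeding out unlovely or ambiguous forms, is left out entirely.

```python
import itertools
import random

# Consonant inventory as listed in "Language Creation 101" (ʔ = glottal stop).
CONSONANTS = ["b", "d", "g", "t", "th", "k", "kh", "ʔ",
              "f", "s", "sh", "z", "h", "m", "n", "l", "r"]

# Patterns attested in the KhZD forms; C marks a root consonant.
PATTERNS = ["CaCaC", "CaCâC", "CuCCul", "CiCCîn"]


def random_roots(n, seed=0):
    """Draw n distinct triliteral roots.

    Assumption for this sketch only: no consonant repeats within a root.
    """
    rng = random.Random(seed)
    roots = set()
    while len(roots) < n:
        roots.add(tuple(rng.sample(CONSONANTS, 3)))
    return sorted(roots)


def fill(pattern, root):
    """Put the root consonants into the C slots of a pattern, in order."""
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else slot for slot in pattern)


def vocabulary(roots, patterns):
    """Map every (root, pattern) pair to the word it produces."""
    return {(root, p): fill(p, root) for root, p in itertools.product(roots, patterns)}


for (root, pattern), word in vocabulary(random_roots(3, seed=42), PATTERNS).items():
    print("-".join(root).upper(), pattern, word)
```

Even this toy version shows one of the complications: a root with t followed by h, placed in adjacent slots (as in CuCCul), yields a spelled th that would be read as the aspirate.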

*Misspelled in The Silmarillion; see The War of the Jewels, p. 180.

Runes in The Hobbit film

Bethany writes to ask:

I have a question about the dwarf-runes used in The Hobbit: An Unexpected Journey. I have noticed that in some instances (e.g. Thorin’s key) Anglo-Saxon futhark runes are used, and in others Tolkien’s own Angerthas runes are used. Is there a reason that both kinds are used alongside each other?


This is a problem peculiar to the film, and results from different people working on different aspects of the film at the same time without necessarily being aware of what the others were doing. Although only the Anglo-Saxon runes are used in the book of The Hobbit, I recommended at an early stage (with what I thought were plausible arguments) that the Angerthas runes should be used throughout the film of The Hobbit, to maintain continuity with the previously filmed The Lord of the Rings and avoid questions of this kind being raised! I even, for instance, retransliterated the text of Thror’s map out of the English fuþorc and into the Angerthas of Erebor (still using English, but different rune forms). At some point, without my being aware of it, the decision was made to go back to the fuþorc in some instances. But where there was something entirely original to the film, like Dwalin’s axes (to which I gave neo-Khuzdul names and corresponding inscriptions in Angerthas), there was no original to go back to, and so there the Angerthas remain.

The result is precisely the mixture which I was trying to avoid! But now that it’s part of the film universe, I think we can at least guess at an in-universe interpretation — where the original inscription is in Khuzdul, the Angerthas are used, but where it was in Westron or another non-Khuzdul (most likely Mannish) language, we see the fuþorc used instead. In which case we have to suppose that, even on such “secret” items as the map and key, the Dwarves preferred not to use Khuzdul in writing. The use of Khuzdul writing on Dwalin’s axes might be seen (and this is just a guess that I came up with while writing this) as a sort of charm, if Khuzdul (which is a language made by a Vala, after all) is thought to have greater magical potency than other languages.

But in general I don’t expect there will be any way of telling which artifacts are going to bear inscriptions in Angerthas and which will use fuþorc; it depends on decisions that I didn’t take part in.

Language Creation 101

Working on creating an extended version of Khuzdul for The Lord of the Rings was a different challenge, both from translating into Quenya and Sindarin, and from simply inventing a new language out of the blue. In the first case, I had a large vocabulary and a general grammatical framework. In the second, I could do pretty much as I pleased. Khuzdul, however, though it had almost no indications of grammar and a very small vocabulary, nonetheless had a definite and distinctive sound and feel to it. It was going to be my task, before all else, to determine what that feel was and replicate it.

The first part of the job was relatively simple: to determine what the sounds of Khuzdul were, and the constraints upon the way those sounds could be organized. Simply by going through a list of all the Khuzdul words and names, I found the following sounds:

Vowels, short: i, e, a, o, u

Vowels, long: î, ê, â, û

Diphthongs: ai

Consonants, stops: b, d, g, t, th, k, kh, ʔ (glottal stop)

Consonants, fricatives: f, s, sh (i.e., /ʃ/), z, h

Nasals: m, n

Liquids and glides: l, r
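The survey just described can be reproduced in miniature. The sketch below uses hypothetical helper names, segments and survey. It splits a sample of the attested words cited in these posts into sounds, reading kh, th, sh, and the diphthong ai as single units (an assumption about the Roman spelling), and tallies them against the inventory above. The sample is only a handful of words, not the full corpus.

```python
from collections import Counter

# Multi-letter spellings read as single sounds (assumption).
UNITS = ("kh", "th", "sh", "ai")

VOWELS = {"i", "e", "a", "o", "u", "î", "ê", "â", "û", "ai"}
CONSONANTS = {"b", "d", "g", "t", "th", "k", "kh", "ʔ",
              "f", "s", "sh", "z", "h", "m", "n", "l", "r"}


def segments(word):
    """Split a Roman-letter word into sounds, matching digraphs first."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in UNITS else 1
        out.append(word[i:i + step])
        i += step
    return out


def survey(words):
    """Count how often each sound occurs across a list of words."""
    counts = Counter()
    for w in words:
        counts.update(segments(w))
    return counts


SAMPLE = ["Khuzdul", "Khazâd", "uzbad", "iglishmêk", "aglâb", "Mazarbul",
          "rukhs", "Nargûn", "Tharkûn", "Gabilgathol", "Azanulbizar",
          "Bundushathûr", "felak", "gundu"]

counts = survey(SAMPLE)
print("vowels:", sorted(s for s in counts if s in VOWELS))
print("consonants:", sorted(s for s in counts if s in CONSONANTS))
print("outside the inventory:", sorted(set(counts) - VOWELS - CONSONANTS))
```

On this sample every sound falls inside the inventory, and o turns up exactly once, in Gabilgathol.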

Of course I knew that, given the small size of the Khuzdul corpus (the total number of words, names, and phrases in the language), these might not be all the sounds there were in the language, but each sound was repeated often enough that it seemed to be pretty characteristic of the language. I decided that I wasn’t going to go outside this set of sounds without a very good reason.

The list of letters of the Angerthas, the runic alphabet said to be used by the Dwarves, which appears in Appendix E of The Return of the King, seems at first glance to point to a much larger variety of sounds. On closer consideration of its history and use, however, it turns out to be unreliable as a guide to Khuzdul sounds. The Dwarvish Angerthas is a fairly superficial remodelling of an earlier Elvish Angerthas, first used for writing Sindarin (which explains, for instance, the existence of vowel signs for ü and ö), and then supplemented by additional letters to write sounds found in other languages; these might include Khuzdul, but would also include Quenya, the Nandorin Elvish languages, and Mannish languages. And since the Dwarves always lived in contact with Men and Elves, they might have retained those letters for the same reason: to write their neighbours’ languages. The only sound that the description of the Angerthas specifically identifies as part of Khuzdul is ʔ, “the clear or glottal beginning of a word with an initial vowel that appeared in Khuzdul.” I took this to mean that when a Khuzdul word appears to begin with a vowel, like uzbad, iglishmêk, or azanulbizar, it actually begins with ʔ, and that this sound would or could be written in Khuzdul writing with certh (rune) #35. (However, the word uzbad on Balin’s tomb does not actually begin with this symbol.)

Appendix E’s pronunciation guide also pointed toward another distinctive characteristic of Khuzdul. It says that Dwarvish did not “possess the sounds represented… by th and ch (kh)” (meaning the sounds of English thick and German ach, represented in IPA by /θ/ and /x/), and that the written combinations of Roman letters th and kh actually represented aspirates, that is, IPA /tʰ/ and /kʰ/. The absence of /pʰ/ was notable, as was the presence of /f/ (as in felak-gundu, the Khuzdul name of the Elvish king Finrod Felagund). I will return to my interpretation of these facts later.
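Taken together, the two readings of Appendix E (an initial ʔ before apparently vowel-initial words, and th, kh as aspirates) amount to a simple spelling-to-sound conversion. The sketch below renders a Roman-letter Khuzdul word in broad IPA. The values for sh and for the circumflexed vowels (read as long) come from the inventory above. The function name to_ipa and the table are illustrative assumptions, not a settled phonology.

```python
# Multi-letter spellings read as single sounds (assumption).
UNITS = ("kh", "th", "sh", "ai")

# Broad IPA values: th and kh as aspirates per Appendix E; circumflex = long.
IPA = {"kh": "kʰ", "th": "tʰ", "sh": "ʃ",
       "â": "aː", "ê": "eː", "î": "iː", "û": "uː"}

VOWEL_LETTERS = set("aeiouâêîû")


def to_ipa(word):
    """Convert a Roman-letter Khuzdul word to a broad IPA transcription."""
    word = word.lower()
    segs, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in UNITS else 1
        segs.append(word[i:i + step])
        i += step
    # An apparently vowel-initial word really begins with a glottal stop.
    if segs and segs[0][0] in VOWEL_LETTERS:
        segs.insert(0, "ʔ")
    return "/" + "".join(IPA.get(s, s) for s in segs) + "/"


for w in ["uzbad", "iglishmêk", "Khazâd", "Tharkûn", "felak"]:
    print(w, to_ipa(w))
```

The letters e and a are left as /e/ and /a/ here; the possibility, discussed below, that they were really /æ/ and /ɑ/ would be a one-line change to the table.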

The vowel system seemed fairly straightforward, with one exception: the extreme rarity of o (and the total absence of ô). The vowel o only appears in the name Gabilgathol, Khuzdul for Belegost/Mickleburg, presumably also meaning “Great Fortress”. I felt that using a lot of o’s in my Khuzdul words would make them look very different from Tolkien’s, so I decided to avoid o where possible (though, as the language developed, an ô eventually got in through the back door!). A system of just i, e, a, u looks somewhat “unbalanced” when charted out: /i/ has /u/ as its back counterpart, but /e/ has no back counterpart at all. Of course, these symbols are, in principle, derived from equating the Khuzdul vowel-signs with their Elvish counterparts, and it’s quite possible that the Dwarvish values ascribed to these symbols were somewhat different. For instance, e might actually represent /æ/, a low front vowel, and a might represent /ɑ/, a low back vowel, in which case the symmetry would be complete. On the other hand, many natural languages do have asymmetrical vowel systems; so I decided not to worry too much about it.

The other question was about possible combinations of sounds. This was easy: there seemed to be no limitations. There was gl in aglâb, ʃm in iglishmêk, zb in uzbad, zd in Khuzdul, kʰs in rukhs, rb in Mazarbul, rbh in Sharbhund, rg in Nargûn, rk in bark and Tharkûn, nd in Bundushathûr, and various other combinations in what were more obviously compound words, like lg in Gabilgathol or lb in Azanulbizar. It seemed that wherever two consonants came together, they remained without change — and this, of course, made the task of construction considerably easier. It also gave the resulting words a very distinctive sound, less mellifluous, perhaps, than the Elvish languages, but more powerful.
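The cluster survey can also be automated. Assuming a simple Roman-letter tokenizer (kh, th, sh, and ai each treated as one unit), the sketch below lists every run of two or more consonants in a word; words are given without hyphens. Because no assimilation was found, the function needs no rewriting rules: a compound such as Gabilgathol simply exposes lg at the join. The names segments and clusters are hypothetical.

```python
# Multi-letter spellings read as single sounds (assumption).
UNITS = ("kh", "th", "sh", "ai")
VOWEL_LETTERS = set("aeiouâêîû")


def segments(word):
    """Split a Roman-letter word into sounds, reading digraphs as one unit."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in UNITS else 1
        out.append(word[i:i + step])
        i += step
    return out


def clusters(word):
    """List every run of two or more adjacent consonants, as spelled."""
    found, run = [], []
    for seg in segments(word) + ["a"]:  # a sentinel vowel flushes a final run
        if seg[0] in VOWEL_LETTERS:
            if len(run) >= 2:
                found.append("".join(run))
            run = []
        else:
            run.append(seg)
    return found


for word in ["aglâb", "iglishmêk", "uzbad", "Khuzdul", "rukhs", "Mazarbul",
             "Nargûn", "Tharkûn", "bark", "Bundushathûr", "Gabilgathol", "Azanulbizar"]:
    print(word, clusters(word))
```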

But sounds were only the raw materials, the bricks and mortar of a building. In order to start building phrases and sentences, I needed words; and to get words, I needed an architectural plan, a way of bringing those sounds together in consistent shapes that would do two things at the same time: first, carry meaning; second, look and sound like Khuzdul. With this in mind, I proceeded to step two.

Getting started

Hi! This is the first post on the Midgardsmal blog. I’m starting this blog because I know there are a lot of questions about my linguistic work on Tolkien’s languages, particularly in connection with the movies made by Peter Jackson. Instead of trying to write the same answers to a lot of different people, I thought it would be better to put some of these answers out where they can be publicly viewed.

Creating languages to supplement the work of one of the best known language creators in the world is a daunting task. It might have been too daunting if I’d ever thought about it in those terms when I started out. Actually, I kind of got sucked into it gradually.

When I worked on Quenya and Sindarin translations for The Lord of the Rings, over a decade ago, I had a fair-sized vocabulary to start with, and a general grammatical scheme. I tried to stick as closely as possible to what was known, and though I had to improvise at some points, it was less a question of invention than of extending or elaborating along known lines. To use an artistic metaphor, it was like retouching a mural from which some flakes of paint have fallen — from the existing lines and colors, it’s usually not too hard to guess what went in the gaps, though of course you can never be 100% sure.

When I was asked to come up with some Dwarvish-language lines and lyrics for The Lord of the Rings, I initially balked. It wasn’t my first experience with constructing Khuzdul — I had invented some names for the Middle-earth Role Playing Game several years earlier — but that had been with the understanding that I was, in a sense, contributing to a new world, related to Tolkien’s but not quite the same. This felt a bit different. I pointed out that the amount of written Khuzdul could fit on a couple of pages (this is still basically true) and that almost nothing was known about its structure. I said that whatever I wrote in it would be largely a new invention, and that I wasn’t going to pass it off as Tolkien’s own work. I got the go-ahead anyway, and plunged in.