The transcriber converts text into Musa. For now, it only works with American English text written in the Roman alphabet, and it only produces Musa with defective punctuation. If you need full Musa punctuation, you will have to edit the output by hand. You will also have to transcribe by hand any words that are not in the dictionary.

The main challenge in transcription is distinguishing heteronyms (words that are spelled alike but pronounced differently, like read or project). Here, you have two choices: you can let the transcriber choose one of the heteronyms on its own, or you can tell it which one you want. However, this transcriber only knows 71 of the most common heteronyms; there are many more. We also let you choose between dialectal variants: for example, I pronounce caught to rhyme with thought, but CMUdict says it's more common to rhyme it with cot.
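As a rough illustration of the two choices, here is a minimal Python sketch. It assumes CMUdict-style lines in which alternative pronunciations carry a numbered suffix such as `(1)`; the sample entries, the `load` and `pronounce` helpers, and the idea of selecting a variant by index are all hypothetical and are not the transcriber's actual code or data. The output is ARPAbet phones, not Musa.

```python
import re

# Hand-written sample in a CMUdict-like layout (illustrative only).
SAMPLE = """\
READ  R EH1 D
READ(1)  R IY1 D
CAUGHT  K AA1 T
CAUGHT(1)  K AO1 T
COT  K AA1 T
"""

def load(text):
    """Group every pronunciation variant under its lowercased spelling."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        head, phones = line.split(None, 1)
        word = re.sub(r"\(\d+\)$", "", head).lower()  # drop a "(1)" suffix
        entries.setdefault(word, []).append(phones.split())
    return entries

def pronounce(entries, word, choice=None):
    """Return one variant: the first listed by default, or the one requested."""
    variants = entries.get(word.lower())
    if variants is None:
        return None  # not in the dictionary
    return variants[0 if choice is None else choice]

ENTRIES = load(SAMPLE)
print(pronounce(ENTRIES, "read"))       # automatic: first listed variant
print(pronounce(ENTRIES, "read", 1))    # explicit: the other reading
print(pronounce(ENTRIES, "caught", 1))  # dialectal variant chosen by hand
```

Letting the first listed variant win is only one possible default; a real tool could just as well use context to guess.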

This transcriber uses the Carnegie Mellon University Pronouncing Dictionary (CMUdict) of North American English, which has over 134,000 entries, including many proper names. Words that are not found in the dictionary are left untranscribed, but numerals and punctuation are converted (using defective punctuation). One-syllable grammar words are assumed to be unstressed, but in your text some of these may be stressed (What is THAT? What IS that? WHAT is THAT???), so please check the text after transcription.
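The lookup behaviour described above can be sketched as follows. This is a simplified illustration, not the transcriber's implementation: the tiny `LEXICON`, the `GRAMMAR_WORDS` set, and the bracketed ARPAbet output are assumptions made for the example, and punctuation and numerals are simply passed through rather than converted.

```python
import re

# Tiny hand-written lexicon in ARPAbet (illustrative only).
LEXICON = {
    "what": ["W", "AH1", "T"],
    "is": ["IH1", "Z"],
    "that": ["DH", "AE1", "T"],
    "music": ["M", "Y", "UW1", "Z", "IH0", "K"],
}
# Hypothetical list of grammar words to treat as unstressed.
GRAMMAR_WORDS = {"what", "is", "that"}

def destress(phones):
    """Replace a primary or secondary stress digit with 0."""
    return [re.sub(r"[12]$", "0", p) for p in phones]

def transcribe(text):
    out = []
    # Split into runs of letters/apostrophes and runs of everything else.
    for token in re.findall(r"[A-Za-z']+|[^A-Za-z']+", text):
        phones = LEXICON.get(token.lower())
        if phones is None:
            out.append(token)  # unknown words and non-words pass through
            continue
        syllables = sum(p[-1].isdigit() for p in phones)  # one digit per vowel
        if token.lower() in GRAMMAR_WORDS and syllables == 1:
            phones = destress(phones)
        out.append("[" + " ".join(phones) + "]")
    return "".join(out)

print(transcribe("What is that zyzzyva music?"))
```

Because every one-syllable grammar word is destressed unconditionally, an emphatic THAT comes out unstressed too, which is why the result still needs checking by hand.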

CMUdict distinguishes between mid schwa and close schwi, for instance in Lisa's versus leases. But in many cases, CMUdict uses a schwa where I would use a schwi, for example in words like bullet, bottle, bottom, and button. Based on Flemming's work, almost all of these schwas should be raised to schwis. The main cases where schwa is not raised are: word-initially, as in about; word-finally, as in sofa (including plural sofas and possessive sofa's); in the prefixes un- and up-; before another vowel, as in extraordinary; and at the end of a syllable within a word, as in alphabet. But the exceptions are too numerous and too diverse to be corrected algorithmically, and yet schwa/schwi occurs too often in English for us to ask you each time. So until a better solution appears, we give you two choices: accept the CMUdict readings, or raise all schwas to schwis except in the cases mentioned above.
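The second choice, raising schwas except in the listed cases, might look roughly like the Python sketch below. Several things here are assumptions rather than the transcriber's actual logic: schwa is taken to be ARPAbet `AH0` and schwi `IH0`; CMUdict has no syllable boundaries, so "end of a syllable within a word" is approximated as "followed by exactly one consonant and then a vowel"; and the prefixes un- and up- are only covered at the start of a word, where the initial-schwa rule already applies. The function name `raise_schwas` is hypothetical.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

def is_vowel(phone):
    """True for an ARPAbet vowel, with or without a stress digit."""
    return phone.rstrip("012") in VOWELS

def raise_schwas(spelling, phones):
    """Raise AH0 to IH0 except in the protected positions (approximate)."""
    word = spelling.lower()
    out = list(phones)
    last = len(phones) - 1
    for i, p in enumerate(phones):
        if p != "AH0":
            continue
        if i == 0 or i == last:
            continue  # initial (about) or final (sofa)
        if i == last - 1 and phones[last] == "Z" and word.endswith(("as", "a's")):
            continue  # final schwa plus plural or possessive (sofas, sofa's)
        if is_vowel(phones[i + 1]):
            continue  # before another vowel (extraordinary)
        if i + 2 <= last and is_vowel(phones[i + 2]):
            continue  # heuristic for a syllable-final schwa (alphabet)
        out[i] = "IH0"
    return out

print(raise_schwas("bullet", ["B", "UH1", "L", "AH0", "T"]))
print(raise_schwas("alphabet", ["AE1", "L", "F", "AH0", "B", "EH2", "T"]))
```

The single-consonant heuristic will misjudge some words, which is in line with the point that the exceptions resist a purely algorithmic treatment.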

The dictionary entries have been adjusted to account for aspiration of unvoiced plosives at the beginning of words and stressed syllables, and darkening of L after vowels. T and d are spelled as flaps between a stressed vowel and a reduced vowel, but not across word boundaries or despite an intervening r l m n.
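For the simplest flapping case only, a t or d standing directly between a stressed vowel and a reduced vowel inside one word, a sketch could look like this. It assumes that "stressed" means an ARPAbet stress digit of 1 or 2 and "reduced" means a stress digit of 0, and it writes the flap as `DX`; both the function name and these choices are illustrative. Intervening consonants and word boundaries are deliberately left out.

```python
def flap(phones):
    """Mark T/D as a flap (DX) between a stressed and a reduced vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        stressed_before = phones[i - 1][-1] in "12"  # vowel with stress 1 or 2
        reduced_after = phones[i + 1][-1] == "0"     # vowel with stress 0
        if phones[i] in ("T", "D") and stressed_before and reduced_after:
            out[i] = "DX"
    return out

print(flap(["W", "AO1", "T", "ER0"]))   # water: t is flapped
print(flap(["AH0", "T", "AE1", "K"]))   # attack: t is left alone
```

Since the function works on one word's phones at a time, a t at the end of one word never sees the vowel that begins the next, so nothing is flapped across a word boundary.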

Here is the dictionary itself, as a text file. It contains over 134,000 English words and names, along with their Musa transcriptions. This is the same data used in the Transcriber; note that the names are pronounced as in North America, which is often different from how they are pronounced in their country of origin.

