Digital Musa

Encoding

From a human point of view, Musa text is a sequence of half letters, but from a digital point of view, it is a sequence of full letters. In other words, the Musa encoding is essentially Alphabet gait. Not only is it complete and compact, but even unsophisticated rendering engines will produce legible output.

Musa is encoded in the Private Use Area (PUA) of Unicode, from E000 to E2FF. This makes Musa compatible with Unicode (and with the SIL PUA) without being part of it. Since characters can't be removed or redefined once they're encoded in Unicode, it won't be appropriate to encode Musa directly in Unicode until Musa stops evolving.

The entire E2xx page, from E200 to E2FF, is reserved for Musa markup: codes that control how Musa is displayed. We'll tell you about markup on the next page.

The first two rows of the E0xx page (E000-E01F) are reserved for future use. The first Musa code point, E000, has a special meaning as the end-of-text character: it tells computers that there's no more text to display or transmit. The ASCII equivalent is ETX (0003), while in the C language, it's NUL (0000).
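Because E000 plays roughly the role that NUL plays in C, a program reading Musa text can stop at the first E000. Here is a minimal Python sketch of that idea; the function name `musa_text_until_end` is hypothetical, and it assumes the text has already been decoded into a Python `str`.

```python
MUSA_END_OF_TEXT = "\ue000"  # E000, the Musa end-of-text character


def musa_text_until_end(data: str) -> str:
    """Return the text before the first Musa end-of-text character.

    Everything from the first E000 onward is ignored, the way C ignores
    whatever follows a string's NUL (0000) terminator. If no E000 is
    present, the whole string is returned unchanged.
    """
    end = data.find(MUSA_END_OF_TEXT)
    return data if end == -1 else data[:end]
```

For example, `musa_text_until_end("\ue110\ue111\ue000\ue112")` keeps only the two code points before E000.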

The next two rows (E020-E03F) encode Musa shapes, not letters. This collection includes a few shape variants so that the bare shapes can be used as half letters, as keycaps, as shapes on blocks and tiles, and in other ways.
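The page layout can be summarized as a small classifier. This is a minimal Python sketch, assuming the ranges stated on this page: E000 end-of-text, E001-E01F reserved, E020-E03F shapes, the rest of the E0xx and E1xx pages letters, and E200-E2FF markup. The function name `musa_region` and its return labels are hypothetical. It doesn't know which letter code points are actually assigned, so unused ("tofu") code points inside the letter range are still reported as letters.

```python
def musa_region(cp: int) -> str:
    """Classify a code point by the Musa layout in the Private Use Area."""
    if cp == 0xE000:
        return "end-of-text"
    if 0xE001 <= cp <= 0xE01F:
        return "reserved"      # rest of the first two rows of E0xx
    if 0xE020 <= cp <= 0xE03F:
        return "shape"         # bare shapes and their variants
    if 0xE040 <= cp <= 0xE1FF:
        return "letter"        # may include unused code points
    if 0xE200 <= cp <= 0xE2FF:
        return "markup"        # the whole E2xx page
    return "not Musa"
```

For example, `musa_region(0xE110)` returns "letter" (the Musa n), while `musa_region(0x0041)` returns "not Musa".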

The rest of the two Musa pages (E040-E1FF) encodes letters, not shapes. Here is the complete set of code points, with a box of "tofu" in the unused code points:

[Code chart of Musa letters: rows E00_ through E1F_, columns _0 through _F; legend: ASCII, Space, Enter, Tab, Escape.]

The hexadecimal labels at the left of each row and the top of each column add up to the code point of the letter in each cell. For instance, the Musa n, in row E11_ and column _0, is at code point E110.
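The chart lookup is simple arithmetic: read a row label such as E11_ as E110 and add the column digit. A minimal Python sketch with hypothetical function names, assuming row labels of the form `E11_` and column labels of the form `_0` as in the chart:

```python
def chart_codepoint(row: str, column: str) -> int:
    """Add a row label such as 'E11_' (read as E110) to a column label such as '_0'."""
    if len(row) != 4 or not row.endswith("_") or len(column) != 2 or not column.startswith("_"):
        raise ValueError("expected labels like 'E11_' and '_0'")
    row_base = int(row[:3], 16) * 16       # 'E11_' -> 0xE110
    return row_base + int(column[1], 16)   # '_0'   -> + 0


def chart_cell(cp: int) -> tuple[str, str]:
    """Inverse of chart_codepoint: the (row, column) labels for a code point."""
    digits = f"{cp:04X}"                   # 0xE110 -> 'E110'
    return digits[:3] + "_", "_" + digits[3]
```

So `chart_codepoint("E11_", "_0")` gives 0xE110, and `chart_cell(0xE17E)` gives the row and column of the Musa logo.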

The double-wide Musa logo is at E17E, as if it were spelled by its two components. The Musa colon is at E155, as if it were spelled by two circles.

The Musa dot letter is encoded at E040, separately from the normal Unicode space at 0020. The rule is that the normal space is used between Musa text and other text, while the dot is used within Musa text. Because the dot isn't a space, it defeats the end-of-line algorithms designed for other scripts, so lines of Musa text in Alphabet gait come out right-justified. The other gaits may have to leave an extra space or two at the right side of a line.
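One way a text-entry tool might apply the space-versus-dot rule is to convert only the spaces that sit inside Musa text. This is a minimal Python sketch, not the Academy's tooling: the function name `use_musa_dots` is hypothetical, and it assumes that "Musa text" means code points in the letter range E040-E1FF, a simplification that ignores markup codes.

```python
import re

MUSA_DOT = "\ue040"  # the Musa dot letter

# Assumption: "Musa text" = code points E040-E1FF (the letter range, dot included).
# A space with such a code point immediately on both sides counts as inside Musa
# text; every other space is a boundary and stays U+0020.
_SPACE_INSIDE_MUSA = re.compile("(?<=[\ue040-\ue1ff]) (?=[\ue040-\ue1ff])")


def use_musa_dots(text: str) -> str:
    """Replace each space that lies inside Musa text with the Musa dot letter."""
    return _SPACE_INSIDE_MUSA.sub(MUSA_DOT, text)
```

In `"\ue110 \ue111 abc"`, only the space between the two Musa letters becomes E040; the space before `abc` stays a normal space because it separates Musa text from other text.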

Digital Gaits

In Musa, the gaits are implemented using OpenType advanced typographic features, which substitute or reposition glyphs in specified contexts. For example, in Kana gait, a consonant followed by a vowel is replaced by the corresponding kana. The feature set is rich enough for everything Musa needs, mostly ligatures and contextual alternates.
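To show what a ligature substitution does, here is a toy Python simulation of the Kana gait rule. It is not OpenType code: in a real font, this would be a ligature substitution lookup in the GSUB table, applied by the rendering engine. The glyph names ("k", "a", "ka", ...) and the function name `apply_ligatures` are hypothetical; each Musa font defines its own glyphs.

```python
# Hypothetical glyph names: the real Musa consonant, vowel and kana glyphs
# are defined by each font, not by this table.
KANA_LIGATURES = {
    ("k", "a"): "ka",
    ("k", "i"): "ki",
    ("s", "a"): "sa",
}


def apply_ligatures(glyphs: list[str], ligatures: dict[tuple[str, str], str]) -> list[str]:
    """Replace consonant+vowel pairs left to right, like a two-glyph ligature lookup."""
    out: list[str] = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in ligatures:
            out.append(ligatures[pair])  # the pair becomes one kana glyph
            i += 2
        else:
            out.append(glyphs[i])        # no kana for this glyph: keep it as is
            i += 1
    return out
```

For example, `["k", "a", "n", "s", "a"]` becomes `["ka", "n", "sa"]`: the lone consonant has no kana, so it stays in its Alphabet form, while the stored letters themselves never change.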

Since gaits are implemented as fonts, text needs no special treatment during entry, transmission or storage. Musa text can be searched and sorted without regard for gait, and foreign words that can't be written in the gait of the surrounding text will appear in Alphabet gait. On the next page, we'll explain how Musa Markup lets you embed the gait in the text without changing the letters.

Musa fonts share a common naming format: a font name, the word Musa and a gait keyword (as in Dushan Musa Alphabet or Zhou Musa Fangzi), followed by a style (Regular, Bold, Italic, ...). The possible gait keywords are:

Domains

The conventional top-level domain for sites written entirely in Musa will be .musa or the single Musa letter at E17E (the Musa logo). However, there isn't a Musa top-level domain yet, so for now your site could be musa.mysite.com or mysite.com/musa, for example.

Letter Spacing

We mentioned on the Principles page that spacing is left to the font designer, but some basic ideas are common to all Musa fonts. Musa is written on a grid of square cells, each the size of a Musa vowel. Tall letters are twice as tall and fill a domino (a rectangle one cell wide and two cells tall). The height of this domino is the named size of the font; for example, a 12-point font has cells 6 points x 6 points. A line of Musa text comes pretty close to the height of a line of text of the same font size in other scripts.

But the letters don't completely fill the cell or domino: there is spacing around them (like a CSS margin). The amount of space between letters depends on the font, with half of the gap belonging to each of the two neighboring cells or dominoes. Adjacent letters are in adjacent cells. Musa is usually monospace, with fixed-width letters. The Break and the Long mark have whitespace on both sides.
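The grid arithmetic can be sketched in a few lines of Python. This assumes Alphabet gait in a monospace font, where every letter (including the dot) advances exactly one cell; the class and method names are hypothetical. Because each letter's spacing lives inside its own cell or domino, the font's letter spacing doesn't change where the cells fall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MusaGrid:
    font_size_pt: float  # the named size is the height of a domino (2 cells)

    @property
    def cell_pt(self) -> float:
        """Side of one square cell: half the named font size."""
        return self.font_size_pt / 2

    @property
    def domino_pt(self) -> tuple[float, float]:
        """(width, height) of a domino: one cell wide, two cells tall."""
        return (self.cell_pt, 2 * self.cell_pt)

    def letter_x(self, index: int) -> float:
        """Left edge of the index-th letter (0-based), one cell per letter."""
        return index * self.cell_pt
```

With `MusaGrid(12)`, cells are 6 points square, dominoes are 6 x 12 points, and the fourth letter on a line (`letter_x(3)`) starts 18 points from the left edge.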

In Alphabet gait, ascenders and descenders extend half a cell beyond other letters, into the space between lines of text. Latin descenders hang below the baseline to a similar extent. However, in most Latin fonts, ascenders only reach the height of capital letters, which is the height of normal Musa letters; Musa ascenders go higher. As a result, Musa text needs more space between lines than Latin text. Here's a diagram (the black grid shows the borders between cells):

In Kana gait, the kana fill 2x2 cells.

In Fangzi gait, the fangzi fill 3x3 cells, with double margin all around, but centered on the same centerline as other gaits.

Abjad spacing depends on the font (vowels may be written inside the consonants or above them), but the line of consonants is always two cells tall.

The space between words is always one cell wide, and lines of text are 3 cells apart, center to center, in every gait except Fangzi, where they're 4 cells apart.
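Putting the spacing rules together as a worked example: with the named size equal to two cells, a 12-point font has a 6-point word space and an 18-point line pitch (24 points in Fangzi gait), and its kana and fangzi blocks are 12 and 18 points square. A minimal Python sketch, assuming gaits are identified by plain strings such as "Kana" and "Fangzi" (the constant and function names are hypothetical):

```python
# Line pitch (center to center) in cells: 3 for every gait except Fangzi.
LINE_PITCH_CELLS = {"Fangzi": 4}
DEFAULT_LINE_PITCH_CELLS = 3
WORD_SPACE_CELLS = 1

# Footprint (width, height) in cells of one block in the block-based gaits.
BLOCK_CELLS = {"Kana": (2, 2), "Fangzi": (3, 3)}


def cell_pt(font_size_pt: float) -> float:
    return font_size_pt / 2  # the named size is one domino, i.e. two cells


def line_pitch_pt(font_size_pt: float, gait: str) -> float:
    return LINE_PITCH_CELLS.get(gait, DEFAULT_LINE_PITCH_CELLS) * cell_pt(font_size_pt)


def word_space_pt(font_size_pt: float) -> float:
    return WORD_SPACE_CELLS * cell_pt(font_size_pt)


def block_size_pt(font_size_pt: float, gait: str) -> tuple[float, float]:
    width, height = BLOCK_CELLS[gait]  # KeyError for gaits without blocks
    return (width * cell_pt(font_size_pt), height * cell_pt(font_size_pt))
```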




© 2002-2021 The Musa Academy musa@musa.bet 15jul21