Write Arabic Text in Pico8 – ArSpr

Every variation of every character in the font. 150+ glyphs.

Read this article in Arabic

ArSpr is a set of 2 utility functions that let you write Arabic text in Pico8. You input Latin text and it transliterates it to Arabic. I made it because I want to make localized Pico8 games, and to practice writing my own implementation of Right-to-Left text and Arabic Shaping.

You can get it from the Pico8 forum:
https://www.lexaloffle.com/bbs/?tid=35164

Or you can download the following p8.png cart and import it to Pico8:

arspr.p8.png

Usage is easy, but slightly different from the regular print() function in Pico8: you have to initialize a variable to hold the text data, then draw it. Here’s an example:

function _init()
  text1=create_ar_spr([string])
end

function _draw()
  cls()
  draw_ar(text1, [x], [y])
end
Usage example

The text has 3 colors. One for the letters, one for the dots, and one for the tashkeel. We can easily change colors like this:

function _init()
  text1=create_ar_spr([string])
end

function _draw()
  cls()
  ar_col1=[1-16] --letters color
  ar_col2=[1-16] --dots color
  ar_col3=[1-16] --tashkeel color

  draw_ar(text1, [x], [y])
end
Changing colors example

[Reading past this point is optional]


TECHNICAL INFO ABOUT AR_SPR

This project was inspired by Tiny Text, which uses a similar method to encode a Pico8 font.

ArSpr’s specifications:
2 sprites
510 out of 8192 tokens
~30% of compressed space

No dropped frames when filling the screen

VISUAL DESIGN

Each character is at most 8 pixels tall. The width is variable, but most characters are 5 pixels wide, including an empty 1-pixel space on the right of Initial and Isolated glyphs.

I decided early on that I wasn’t going to make a monospaced font, because some letters need more space to be legible, such as ض, which is as wide as 2 typical letters when pixelated.
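One way to picture variable-width, right-to-left layout is a pen that starts at the right edge and moves left by each glyph's width. The sketch below is plain Lua and is not ArSpr's actual code: the function name layout_rtl, the starting x, and the widths (5 for a typical letter, 9 for a wide one, 0 for a tashkeel) are all made up for illustration.

```lua
-- Hypothetical helper: compute the left edge of each glyph in a
-- right-to-left line. `advances` lists glyph widths in reading order;
-- `right_x` is the x coordinate where the line starts (its right edge).
function layout_rtl(advances, right_x)
  local xs = {}
  local pen = right_x
  for i, adv in ipairs(advances) do
    pen = pen - adv   -- move the pen left by this glyph's width
    xs[i] = pen       -- the glyph would be drawn with its left edge here
  end
  return xs
end

-- A typical letter, a zero-width tashkeel, then a wide letter.
local xs = layout_rtl({5, 0, 9}, 100)
print(xs[1], xs[2], xs[3])
```

With these made-up widths, the zero-width entry gets the same x as the letter before it, and the wide letter simply pushes the pen further left.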

FONT ENCODING AND PARSING

The compressed font is made up of 2 sprites total:

Basically, we draw text on screen by copying and pasting chunk areas from the sprite map.

The glyph on the left can be broken up to create any of the glyphs on the right

The chunk data is encoded in a few long strings, because Pico8 doesn’t count the contents of a string toward the token limit. Let’s take a look at one of them. This one contains all the tashkeel characters:

_ar_tash = ",^__121__,/__121_6,~_112__1,▥_22111_+__32____,❎_2211__+__221__1+__321___,🐱_2211__,ˇ_2211y_+__2211__,⬇️_2211_6,✽_2211y6+__2211_6"

The string _ar_tash holds the crop and draw locations for every chunk. Each character in the string is decoded to a letter or a number. The format is “[character], [character advance], [sprite X pos], [sprite Y pos], [sprite width], [sprite height], [draw offset X], [draw offset Y], [end chunk or add a new chunk to same character]” (each bracket in the list stands for a single character in the string, and an underscore means zero or null).

Let’s take an example:

"✽_2211y6+__2211_6,"

That translates to “The letter ✽ will advance the line 0 pixels. It has a chunk at (x=2,y=2), size (1,1), drawn at offset (-2, 6) and another chunk (2, 2), size (1, 1), drawn at offset (0, 6).”
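To make the format concrete, here is a minimal decoding sketch in plain Lua (desktop Lua, not Pico8's reduced string API, and not ArSpr's actual parser). The names field_value and parse_entry are hypothetical. The only mappings taken from this article are underscore = 0, digits = themselves, and y = -2; any other letter is undocumented here, so the sketch raises an error instead of guessing.

```lua
-- Decode one field character into a number.
function field_value(c)
  if c == "_" then return 0 end            -- underscore is zero/null
  local n = tonumber(c)
  if n then return n end                   -- digits stand for themselves
  if c == "y" then return -2 end           -- the one letter documented above
  error("undocumented field character: " .. c)
end

-- Parse one entry such as "✽_2211y6+__2211_6" (no surrounding commas).
-- Chunks are separated by "+". The last 7 bytes of each chunk are the
-- numeric fields; whatever precedes them is the character label, which
-- may be several bytes long in UTF-8.
function parse_entry(entry)
  local label, chunks = nil, {}
  for part in entry:gmatch("[^+]+") do
    local fields = part:sub(-7)
    if not label then label = part:sub(1, -8) end
    local v = {}
    for i = 1, 7 do v[i] = field_value(fields:sub(i, i)) end
    chunks[#chunks + 1] = {
      advance = v[1],
      sx = v[2], sy = v[3],   -- crop position on the sprite
      sw = v[4], sh = v[5],   -- crop size
      ox = v[6], oy = v[7],   -- draw offset
    }
  end
  return label, chunks
end

local label, chunks = parse_entry("✽_2211y6+__2211_6")
print(label, #chunks, chunks[1].ox, chunks[1].oy, chunks[2].ox, chunks[2].oy)
```

For the example entry this yields two chunks, the first with offset (-2, 6) and the second with offset (0, 6), matching the description above.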

Arabic Shaping

Letters have 4 glyph variations (initial, medial, final, isolated). Arabic Shaping is when we pick the needed glyph based on A) the position of the letter in the word and B) the “Joining Type” of the letter.

In the alphabet, there are 2 Joining Types:

1- Some letters can connect to both the previous and the following letter, and we call them Dual-Joining. They have all 4 glyph variations (initial, medial, final, isolated), such as ب
ببب ب – بـ ـبـ ـب ب

2- Other letters can only join the previous letter and we call them Previous-Joining. They have 2 glyph variations (final, isolated), such as د
ددد بدددد – ـد د

Full list of Arabic letters split into the 2 groups.

Other characters, such as spaces, numbers, and punctuation, do not join with letters and so only have a single, isolated glyph.

Tashkeels (diacritics) are completely ignored by the Arabic Shaper, and they have zero width so they can be drawn in the same position as a letter. They only have a single glyph each.

My Arabic Shaping implementation is a simple case-switch (pseudo code):


function _get_font(previousChar,currentChar,nextChar)

  if currentChar is tashkeel
    return _ar_tashkeel_glyph

  elseif currentChar is (space, number, punctuation)
    return _ar_nonletter_glyph

  elseif previousChar is any except dual-joining letter
  and currentChar is dual-joining letter
  and nextChar is any except (space, number, punctuation)
    return _ar_init_glyph

  elseif previousChar is dual-joining
  and currentChar is dual-joining
  and nextChar is any except (space, number, punctuation)
    return _ar_medial_glyph

  elseif previousChar is dual-joining
  and currentChar is dual-joining
  and nextChar is (space, number, punctuation)
    return _ar_final_glyph

  elseif previousChar is dual-joining letter
  and currentChar is previous-joining letter
    return _ar_final_glyph

  -- all previous checks failed.
  else
    return _ar_isolated_glyph
  end
end
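The pseudo code can be turned into a small runnable sketch. This is plain Lua and only an illustration, not the code that ships in ArSpr: the names class_of, get_form, neighbour and shape are hypothetical, the letter tables cover just a few characters, and two behaviours are assumptions of the sketch: the start and end of the text are treated like a space, and tashkeel are skipped when looking up the previous and next character.

```lua
-- Joining classes for a handful of characters (a subset, not the full alphabet).
local DUAL = { ["ب"] = true, ["ت"] = true, ["س"] = true }  -- dual-joining
local PREV = { ["د"] = true, ["ا"] = true, ["ر"] = true }  -- previous-joining
local TASHKEEL = { ["\217\142"] = true }                   -- fatha, U+064E as UTF-8 bytes

-- Classify one character. nil (text boundary) counts as a non-letter (assumption).
function class_of(c)
  if c == nil then return "other" end
  if TASHKEEL[c] then return "tashkeel" end
  if DUAL[c] then return "dual" end
  if PREV[c] then return "prev" end
  return "other"  -- space, number, punctuation, anything unknown
end

-- Same branches, in the same order, as the pseudo code above.
function get_form(prev, cur, nxt)
  local p, c, n = class_of(prev), class_of(cur), class_of(nxt)
  if c == "tashkeel" then return "tashkeel" end
  if c == "other" then return "nonletter" end
  if c == "dual" then
    if p ~= "dual" and n ~= "other" then return "initial" end
    if p == "dual" and n ~= "other" then return "medial" end
    if p == "dual" and n == "other" then return "final" end
  elseif p == "dual" then  -- current letter is previous-joining
    return "final"
  end
  return "isolated"
end

-- Nearest neighbour in direction `step` (-1 or 1), skipping tashkeel (assumption).
function neighbour(chars, i, step)
  local j = i + step
  while chars[j] and TASHKEEL[chars[j]] do j = j + step end
  return chars[j]
end

-- `chars` is a list of characters in reading order; returns one form name per entry.
function shape(chars)
  local forms = {}
  for i, c in ipairs(chars) do
    forms[i] = get_form(neighbour(chars, i, -1), c, neighbour(chars, i, 1))
  end
  return forms
end

print(table.concat(shape({ "ب", "ب", "ب" }), " "))
```

For the three-letter input ببب the sketch picks the initial, medial and final forms, in that order.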

For a different implementation of Arabic Shaping you can check out MiniBidi by Ahmad Khalifa.

Possible Improvements

  • We can eliminate duplicate chunk data in the strings by letting characters reference common chunks.
  • We can try using caaz’s sprite compression technique.
  • In an earlier version, I had a single function, printAr(str,x,y), that didn’t require initializing strings first; it cached each string the first time it was drawn. It was neat to have everything in one function! But I removed it to save tokens. It would be nice to bring it back…