I’d assumed that emoji were all organized into a contiguous Unicode codepoint range, but this is very much not the case!
There are more than a thousand different ranges containing emoji.
The Unicode Consortium makes the complete list available as a file, emoji-data.txt.
Here are a few lines:
25FB..25FE ; Emoji # E0.6 [4] (◻️..◾) white medium square..black medium-small square
2600..2601 ; Emoji # E0.6 [2] (☀️..☁️) sun..cloud
2602..2603 ; Emoji # E0.7 [2] (☂️..☃️) umbrella..snowman
2604 ; Emoji # E1.0 [1] (☄️) comet
I wanted to convert this list into the form U+25FB-25FE,U+2600-2601,... for use with the kitty terminal’s symbol_map configuration option.
I wrote some shell to convert it into that format.
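A minimal sketch of this kind of conversion, not necessarily the exact script used here. It defines a hypothetical function, to_symbol_map_ranges, that reads emoji-data.txt on stdin and prints one comma-separated line. Two assumptions are baked in: only lines whose property is exactly Emoji are kept (the file also lists Emoji_Presentation, Extended_Pictographic and other properties, which largely overlap), and a single codepoint is written as U+2604 with no range part. It is worth checking the result against kitty's documentation before relying on it.

```shell
#!/bin/sh
# Sketch: turn emoji-data.txt (on stdin) into "U+25FB-25FE,U+2600-2601,U+2604".
# to_symbol_map_ranges is a hypothetical name, not part of any tool.
to_symbol_map_ranges() {
    awk -F';' '
        # Skip comment lines and blank lines.
        /^[ \t]*#/ { next }
        /^[ \t]*$/ { next }
        {
            # Field 1 is "XXXX" or "XXXX..YYYY", padded with spaces.
            range = $1
            gsub(/[ \t]/, "", range)
            sub(/\.\./, "-", range)

            # Field 2 is the property name followed by a "#" comment.
            split($2, rest, "#")
            prop = rest[1]
            gsub(/[ \t]/, "", prop)

            # Assumption: keep only the plain Emoji property.
            if (prop != "Emoji") next

            out = out sep "U+" range
            sep = ","
        }
        END { if (out != "") print out }
    '
}
```

Usage would look like to_symbol_map_ranges < emoji-data.txt, with the output pasted in front of the font name on the symbol_map line.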
Some pretty big caveats:
- emoji-data.txt also contains ASCII codepoints like # (0x23), * (0x2a), and 0-9 (0x30-0x39)! Depending on your use case you might want to remove these.
- One rendered emoji can be composed from multiple codepoints. For example, the emoji 🏋️‍♂️ (man lifting weights) is composed from three codepoints: person lifting weights + zero width joiner + male sign. All three codepoints are listed separately in emoji-data.txt.
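For the first caveat, one possible way to drop the ASCII entries is sketched below. The function name drop_ascii_codepoints is hypothetical. It takes a comma-separated list in the U+25FB-25FE,U+2600-2601,... form on stdin and assumes codepoints are written as zero-padded four-digit hex, as they are in emoji-data.txt, so anything below U+0080 starts with 00 followed by a digit from 0 to 7.

```shell
#!/bin/sh
# Sketch: remove entries that start below U+0080 from a comma-separated list.
# drop_ascii_codepoints is a hypothetical name; the input format is assumed
# to be zero-padded four-digit hex such as U+0023 or U+0030-0039.
drop_ascii_codepoints() {
    tr ',' '\n' | awk '
        # A single ASCII codepoint, for example U+0023.
        /^U[+]00[0-7][0-9A-Fa-f]$/ { next }
        # A range that starts at an ASCII codepoint, for example U+0030-0039.
        /^U[+]00[0-7][0-9A-Fa-f]-/ { next }
        # Everything else is kept and joined back with commas.
        { out = out sep $0; sep = "," }
        END { if (out != "") print out }
    '
}
```

Filtering on the textual prefix avoids hex arithmetic, which keeps the sketch within plain POSIX awk. A range that starts in ASCII but ends outside it would be dropped whole.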