kakasi (0.1.0)
Installation
[registry]
default = "forgejo"
[registries.forgejo]
index = "sparse+ " # Sparse index
# index = " " # Git
[net]
git-fetch-with-cli = true
cargo add kakasi@0.1.0
About this package
kakasi
kakasi is a Rust library that transliterates hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet).
It was ported from the pykakasi library, which is itself a port of the original kakasi library written in C.
Usage
Transliterate:
let res = kakasi::convert("こんにちは世界!");
assert_eq!(res.hiragana, "こんにちはせかい!");
assert_eq!(res.romaji, "konnichiha sekai!");
Check if a string contains Japanese characters:
use kakasi::IsJapanese;
assert_eq!(kakasi::is_japanese("Abc"), IsJapanese::False);
assert_eq!(kakasi::is_japanese("日本"), IsJapanese::Maybe);
assert_eq!(kakasi::is_japanese("ラスト"), IsJapanese::True);
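The three-way result can be understood with a small standalone sketch. It is not the library's implementation: the `Guess` enum, the helper names and the exact Unicode ranges are assumptions chosen for illustration. The idea is that kana are unique to Japanese, so any kana yields a definite answer, while text made up only of CJK ideographs could equally be Chinese and so yields only "maybe".

```rust
/// Hypothetical stand-in for the three-way result shown above.
#[derive(Debug, PartialEq, Eq)]
enum Guess {
    False,
    Maybe,
    True,
}

/// Hiragana (U+3040..U+309F) and katakana (U+30A0..U+30FF) blocks.
fn is_kana(c: char) -> bool {
    matches!(c, '\u{3040}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}')
}

/// CJK Unified Ideographs block, shared by Japanese and Chinese.
fn is_cjk_ideograph(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}')
}

/// Assumed heuristic: kana => True, ideographs only => Maybe, otherwise False.
fn guess_japanese(text: &str) -> Guess {
    let mut saw_ideograph = false;
    for c in text.chars() {
        if is_kana(c) {
            return Guess::True;
        }
        if is_cjk_ideograph(c) {
            saw_ideograph = true;
        }
    }
    if saw_ideograph {
        Guess::Maybe
    } else {
        Guess::False
    }
}

fn main() {
    assert_eq!(guess_japanese("Abc"), Guess::False);
    assert_eq!(guess_japanese("日本"), Guess::Maybe);
    assert_eq!(guess_japanese("ラスト"), Guess::True);
}
```

Under these assumptions the sketch reproduces the three results above; the real function may inspect more Unicode blocks or weigh characters differently.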
CLI
$ cargo install kakasi
## Convert to romaji
$ kakasi こんにちは世界!
konnichiha sekai!
## Convert to hiragana
$ kakasi -k こんにちは世界!
こんにちはせかい!
## Read from file
$ kakasi -f rust_article.txt
## Read from STDIN
$ echo "こんにちは世界!" | kakasi
Performance
CPU: AMD Ryzen 7 5700G
Text | Conversion time | Speed |
---|---|---|
Sentence (161 B) | 7.0911 µs | 22.70 MB/s |
Rust wikipedia article (31705 B) | 1.5055 ms | 21.06 MB/s |
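The Speed column follows from dividing the input size by the conversion time. The sketch below recomputes it, assuming 1 MB = 1,000,000 bytes (this matches the published figures); the function name `throughput_mb_per_s` is hypothetical and not part of the library.

```rust
/// Throughput in MB/s, assuming 1 MB = 1_000_000 bytes.
fn throughput_mb_per_s(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / seconds / 1_000_000.0
}

fn main() {
    // Sizes and timings copied from the table above.
    let sentence = throughput_mb_per_s(161, 7.0911e-6);
    let article = throughput_mb_per_s(31_705, 1.5055e-3);

    println!("sentence: {sentence:.2} MB/s"); // prints "sentence: 22.70 MB/s"
    println!("article:  {article:.2} MB/s"); // prints "article:  21.06 MB/s"
}
```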
CLI comparison
Time to convert a 100 KB test file using the CLI:
Library | Time | Speed |
---|---|---|
kakasi (Rust) | 7.4 ms | 13.5 MB/s |
kakasi (C) | 33.5 ms | 2.99 MB/s |
pykakasi (Python) | 810.6 ms | 0.123 MB/s |
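To put the CLI timings in relative terms, the ratios can be computed directly from the table: the Rust CLI comes out roughly 4.5 times faster than the C version and roughly 110 times faster than pykakasi. A minimal sketch of that arithmetic (the `speedup` helper is a hypothetical name used only here):

```rust
/// How many times faster `fast_ms` is compared with `slow_ms`.
fn speedup(slow_ms: f64, fast_ms: f64) -> f64 {
    slow_ms / fast_ms
}

fn main() {
    // Timings in milliseconds, copied from the table above.
    let rust = 7.4;
    let c = 33.5;
    let python = 810.6;

    println!("vs. kakasi (C): {:.1}x", speedup(c, rust)); // prints "vs. kakasi (C): 4.5x"
    println!("vs. pykakasi:   {:.1}x", speedup(python, rust)); // prints "vs. pykakasi:   109.5x"
}
```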
CLI performance was measured with hyperfine, using the following test commands:
hyperfine --warmup 3 'cat 100K.txt | kakasi-rs'
hyperfine --warmup 3 'cat 100K.txt | kakasi -i utf-8 -Ka -Ha -Ja -Sa -s'
hyperfine --warmup 3 'cat 100K.txt | python bin/kakasi -Ka -Ha -Ja -Sa -s'
License
kakasi is published under the GNU GPL-3.0 license.
The Kakasi dictionaries (files: codegen/dict/kakasidict.utf8, codegen/dict/itajidict.utf8, codegen/dict/hepburn.utf8) were taken from the pykakasi project, published under the GNU GPL-3.0 license.
pykakasi
Copyright (C) 2010-2021 Hiroshi Miura and contributors (see AUTHORS)
The dictionaries originate from the kakasi project, published under the GNU GPL-2.0 license.
original kakasi
Copyright (C) 1992 1993 1994
Hironobu Takahashi (takahasi@tiny.or.jp),
Masahiko Sato (masahiko@sato.riec.tohoku.ac.jp),
Yukiyoshi Kameyama, Miki Inooka, Akihiko Sasaki, Dai Ando, Junichi Okukawa,
Katsushi Sato and Nobuhiro Yamagishi
For testing, I included a copy of the Japanese Wikipedia article on Rust (tests/rust_article.txt). The article is published under the Creative Commons Attribution-ShareAlike License 3.0.
Dependencies
ID | Version |
---|---|
byteorder | ^1.4.3 |
phf | ^0.11.1 |
phf_shared | ^0.11.1 |
unicode-normalization | ^0.1.22 |
criterion | ^0.4.0 |
proptest | ^1.0.0 |
rstest | ^0.16.0 |