Specification of ETL-9

ETL-9G

Format

Fixed-length records without control sequences
8 bits per byte, 8199 bytes per record
Big endian
File formats and sample script
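The fixed-length layout above can be read with a short loop. The following is a minimal sketch, not the official sample script. The field layout in RECORD is an assumption: it is chosen to match the 14-value metadata tuples shown in the samples and the 8199-byte record length, with a 128x127 image at 4 bits per pixel. The names iter_records and unpack_4bit are hypothetical helpers.

```python
import struct

RECORD_SIZE = 8199  # bytes per record, as stated in the format description

# Assumed layout (not stated in this section): big-endian metadata fields that
# match the 14-value sample tuples, 34 padding bytes, a 4-bit 128x127 image
# (8128 bytes), and 7 trailing padding bytes.
RECORD = struct.Struct(">2H8sI4B4H2B34x8128s7x")
assert RECORD.size == RECORD_SIZE


def iter_records(stream):
    """Yield (metadata_tuple, image_bytes) for each fixed-length record."""
    while True:
        chunk = stream.read(RECORD_SIZE)
        if len(chunk) < RECORD_SIZE:
            break  # end of file (or a truncated trailing record)
        fields = RECORD.unpack(chunk)
        meta = list(fields[:-1])
        meta[2] = meta[2].decode("ascii")  # reading, e.g. 'A.TSUGU '
        yield tuple(meta), fields[-1]


def unpack_4bit(image_bytes):
    """Split each byte into two gray levels (0-15), high nibble first (assumed)."""
    pixels = []
    for b in image_bytes:
        pixels.append(b >> 4)
        pixels.append(b & 0x0F)
    return pixels
```

Typical use would be to open a file such as ETL9G_01 in binary mode and pass the file object to iter_records.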

Files

One data set contains 3036 characters written by one writer; each file holds 4 data sets, hence 12144 = 4 * 3036 records per file
20 sheets per writer: sheets 1-20 belong to the first writer, sheets 21-40 to the second writer, and so on

filename	# records	# categories	# data sets	data set indices	# sheets
ETL9G_01	12144	3036	4	1-4	80
ETL9G_02	12144	3036	4	5-8	80
⋮	⋮	⋮	⋮	⋮	⋮
ETL9G_50	12144	3036	4	197-200	80
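The file table follows a simple rule: 4 data sets per file, numbered consecutively. A minimal sketch of that arithmetic is given here; the function name etl9g_file_for_data_set is hypothetical.

```python
RECORDS_PER_DATA_SET = 3036
DATA_SETS_PER_FILE = 4
RECORDS_PER_FILE = DATA_SETS_PER_FILE * RECORDS_PER_DATA_SET  # 12144


def etl9g_file_for_data_set(index):
    """Return the ETL-9G filename that holds data set `index` (1-200)."""
    if not 1 <= index <= 200:
        raise ValueError("data set index must be in 1..200")
    file_no = (index - 1) // DATA_SETS_PER_FILE + 1
    return f"ETL9G_{file_no:02d}"


print(etl9g_file_for_data_set(5))    # ETL9G_02
print(etl9g_file_for_data_set(200))  # ETL9G_50
```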

Samples

filename	record	metadata and JIS code in hex
ETL9G_01	1	(1, 12321, 'A.TSUGU ', 1, 0, 0, 0, 0, 0, 0, 8212, 8310, 0, 0) 0x3021
ETL9G_11	101	(1, 12580, 'IN.HIBI ', 101, 0, 0, 0, 0, 0, 0, 8212, 8311, 4, 6) 0x3124
ETL9G_21	201	(2, 12839, 'OU.OKI ', 49, 0, 0, 0, 0, 0, 0, 8212, 8406, 0, 3) 0x3227
ETL9G_31	301	(2, 13101, 'KAI.BAI ', 301, 0, 0, 0, 0, 0, 0, 8212, 8405, 4, 9) 0x332d
ETL9G_41	401	(3, 13360, 'KAN.MA ', 401, 0, 0, 0, 0, 0, 0, 8212, 8403, 0, 6) 0x3430
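The second value in each tuple is the JIS code in decimal, and the trailing hex value is the same number. As a small sketch, the code can be turned into a Unicode character through Python's euc_jp codec, which represents a JIS X 0208 character by setting the high bit of both bytes. The function name jis_to_char is hypothetical.

```python
def jis_to_char(code):
    """Decode a 2-byte JIS X 0208 code such as 0x3021 into a Unicode character."""
    hi, lo = code >> 8, code & 0xFF
    # EUC-JP stores a JIS X 0208 character with the high bit set on each byte.
    return bytes([hi | 0x80, lo | 0x80]).decode("euc_jp")


print(hex(12321), jis_to_char(12321))  # 0x3021 亜
```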

ETL-9B

ETL-9B is generated from ETL-9G by binarization. The threshold is T = λ∙h + (1-λ)∙μ, where h is Otsu’s threshold [4] and μ is the average of all intensity levels in ETL-9G [5]. For ETL-9B, λ=0.4 [1][2].
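The threshold formula can be illustrated with a short pure-Python sketch. Several points here are assumptions rather than facts from the references: μ is taken as the mean gray level of the image being binarized, the image has 16 gray levels (0-15), and pixels strictly above T are treated as ink. The function names are hypothetical, and the Otsu routine is a textbook between-class-variance search, not necessarily the exact variant used to build ETL-9B.

```python
def otsu_threshold(hist):
    """Return the level t maximizing between-class variance (class 0 is <= t)."""
    total = sum(hist)
    sum_all = sum(level * n for level, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in class 0
    sum0 = 0    # sum of gray levels in class 0
    for t, n in enumerate(hist):
        w0 += n
        sum0 += t * n
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def blended_threshold(pixels, lam=0.4, levels=16):
    """Compute T = lam * h + (1 - lam) * mu for a flat list of gray levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    h = otsu_threshold(hist)
    mu = sum(pixels) / len(pixels)  # assumed: mean gray level of this image
    return lam * h + (1 - lam) * mu


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 (assumed ink) and the rest to 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

With λ=0.4 the result sits closer to the mean than to Otsu's threshold, since the mean carries weight 0.6.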

Format

File formats and sample script

Files

One data set contains 3036 characters written by one writer; each file holds 40 data sets, hence 121440 = 40 * 3036 records per file
20 sheets per writer: sheets 1-20 belong to the first writer, sheets 21-40 to the second writer, and so on
The first record of each file is a dummy record filled with zeros
The last data set (3036 records) of ETL9B_5 is the set of model characters presented to the examinees

filename	# records	# data sets	data set index	# sheets
ETL9B_1	121440	40	1-40	800
ETL9B_2	121440	40	41-80	800
ETL9B_3	121440	40	81-120	800
ETL9B_4	121440	40	121-160	800
ETL9B_5	121440+3036	40+1	161-200	800+20
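Because each ETL-9B file starts with a zero-filled dummy record, a reader has to skip one record before the data begins. The sketch below shows one way to do that. The record layout is an assumption, since this section does not give the ETL-9B format: 576-byte big-endian records holding the sheet number, the JIS code, a 4-character reading (matching the 3-value sample tuples), a 64x63 image at 1 bit per pixel, and padding. The name iter_etl9b is hypothetical.

```python
import struct

# Assumed layout (not given in this section): sheet number, JIS code,
# 4-character reading, 504 image bytes (64x63 at 1 bit per pixel), 64 padding bytes.
RECORD = struct.Struct(">2H4s504s64x")


def iter_etl9b(stream):
    """Yield (sheet, jis_code, reading, image_bytes), skipping the dummy record."""
    stream.read(RECORD.size)  # record 0 is the zero-filled dummy record
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of file
        sheet, jis_code, reading, image = RECORD.unpack(chunk)
        yield sheet, jis_code, reading.decode("ascii"), image
```

Under these assumptions, the first tuple yielded from ETL9B_1 would correspond to record index 1 in the samples.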

Samples

filename	record index (dummy record as 0)	metadata and JIS code in hex
ETL9B_1	1	(1, 9250, 'A.HI') 0x2422
ETL9B_2	100	(801, 12349, 'AYA.') 0x303d
ETL9B_3	200	(1601, 12611, 'EI.A') 0x3143
ETL9B_4	300	(2402, 12873, 'KA.Y') 0x3249
ETL9B_5	400	(3203, 13135, 'KAKU') 0x334f

References

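[1] T. Saito, H. Yamada, K. Yamamoto: "The JIS Level 1 Handwritten Kanji Database ETL9 and Its Analysis" (in Japanese), Trans. IECE Japan (D), Special Issue on Image Processing, Vol.J68-D, No.4, pp.757–764 (1985-04).
[2] T. Saito, H. Yamada, K. Yamamoto: "Analysis of Handwritten Character Databases (VIII): Evaluation of the JIS Level 1 Handwritten Kanji Database ETL9 by the Directional Pattern Matching Method" (in Japanese), Bulletin of the Electrotechnical Laboratory, Vol.49, No.7, pp.487–525 (1985-07).
[3] T. Saito, K. Yamamoto, H. Yamada: "Analysis of Handwritten Character Databases (IX): On the Database ETL9 and Its Model Characters" (in Japanese), Bulletin of the Electrotechnical Laboratory, Vol.50, No.4, pp.259–263 (1986-04).
[4] N. Otsu: "An Automatic Threshold Selection Method Based on Discriminant and Least Squares Criteria" (in Japanese), Trans. IECE Japan (D), Vol.63-D, No.4, pp.349–356 (1980-04).
[5] T. Saito, H. Yamada: "An Improvement of the Discriminant Threshold Selection Method" (in Japanese), Transactions of the Information Processing Society of Japan, Vol.22, No.6, pp.596–599 (1981-11).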