Database Details


ETL1

Background of ETL1 Development

ETL1 was created as part of the Agency of Industrial Science and Technology’s large-scale project, “Research and Development of Pattern Information Processing System,” which began in 1971. The database contains 99 handwritten character categories: alphanumeric characters, special symbols, and Katakana. The OCR sheets and the observation system were developed jointly by the Electrotechnical Laboratory and Fujitsu Limited, and the observations were made on the TOSBAC-3400 computer then installed in the Graphic Processing Laboratory of the Electrotechnical Laboratory. For this database, human observers evaluated the individual characters, and values indicating their quality were included in the ID information attached to the observed patterns.

Observation Specifications

  • OCR Sheet Specifications
    • Handwriting Character Reading Paper : B5 size, 90kg OCR paper (1 type)
    • Dropout color : No.26 violet 50% screen (DNP)
    • Character frame : width 5mm, height 7mm
    • Character frame pitch: 7.62 mm (width), 12.7 mm (height)
    • Number of character frames: 10 x 12 = 120
  • Characters (99 in total)
    • Numerals : 10
    • Upper case alphabet : 26
    • Special characters : 12
    • Katakana : 51
  • OCR sheet collection
  • Observation equipment
    • Input device : Flying Spot Scanner (FSS) (Flying Spot Scanning Tube 5CNP16) (Photomultiplier Tube 7696)
    • Sampling interval : 0.133mm x 0.133mm
    • Spot size : 0.1333mm
    • Density level : 16 (4bit)
    • Number of sampling points : 72 x 76 (later changed to the central 64 x 63)
  • Database creation
    • Location : National Electrotechnical Laboratory
    • Computer used : TOSBAC-3400/41 (program: FSSTOMT)
    • Creation Date : September 1973
    • Observation Period : September – December 1973 (~January – March)
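
The change from the 72 x 76 grid to the central 64 x 63 region noted under "Observation equipment" can be pictured as a center crop. The sketch below is only an illustration under assumptions that the source does not state: that the figures are given as width x height, that an image is held as a list of rows, and that the odd 13-row margin is split with the smaller half on top. The function name `center_crop` is hypothetical.

```python
def center_crop(image, out_w, out_h):
    """Return the central out_w x out_h region of a row-major image.

    `image` is a list of equal-length rows. When a margin is odd, the
    smaller half goes to the top/left (an assumption, not from the source).
    """
    in_h = len(image)
    in_w = len(image[0])
    if out_w > in_w or out_h > in_h:
        raise ValueError("crop size exceeds image size")
    x0 = (in_w - out_w) // 2  # left margin
    y0 = (in_h - out_h) // 2  # top margin
    return [row[x0:x0 + out_w] for row in image[y0:y0 + out_h]]


# Synthetic 72 x 76 image with 16 density levels (0..15), for illustration only.
full = [[(y * 72 + x) % 16 for x in range(72)] for y in range(76)]
cropped = center_crop(full, 64, 63)
print(len(cropped[0]), "x", len(cropped))
```

With these assumptions the crop starts 4 columns in from the left and 6 rows down from the top.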

ETL1 Database Specifications


ETL2

Background of ETL2

ETL2 was created at the same time as ETL1, as part of the same large-scale project, through a collaboration among the Electrotechnical Laboratory, Tokyo Shibaura Electric Co., Ltd., and the Mainichi Newspapers. The data were observed with TOSPICS and edited with the TOSBAC-5600, both at Tokyo Shibaura Electric Co., Ltd.

Observation Specifications

  • OCR Sheet Specifications
    • OCR sheet : B4 size, 90kg OCR paper for continuous use
    • Character size : Newspaper type 8 point (Letterpress), Patent and Public Relations type 9 point (Offset printing)
  • Target characters (Total 2,184 characters) (CO-59 Code)
    • Hiragana, Katakana, alphanumeric characters, symbols, Kanji
  • OCR sheet collection
    • Data collection : Dai Nippon Printing Co., Mainichi Newspapers
    • Total number of samples : 52,796
  • Observation device
    • Input device : ITV camera and scanner 240×240
    • Sampling interval : 54μm x 54μm
    • Spot diameter : 54μm
    • Density level : 64 (6bit)
    • Number of sampling points : 60 x 60 = 3,600 pixels
  • Database Creation
    • Location : Toshiba Research Laboratory
    • Computer used : TOSBAC-40C TOSPICS system (program: )
    • Creation Date : October, 1973
    • Observation period : October 1973
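
A 60 x 60 image at 6 bits per pixel holds 3,600 x 6 = 21,600 bits, or 2,700 bytes if the pixels are packed without padding. The sketch below shows one way such a stream could be packed and unpacked. The MSB-first, padding-free layout and the function names are assumptions made for illustration; they are not taken from the ETL2 record format.

```python
def packed_size(count, bits):
    """Bytes needed to store `count` values of `bits` bits each, tightly packed."""
    return (count * bits + 7) // 8


def pack_pixels(values, bits):
    """Pack unsigned `bits`-wide values MSB-first into bytes (assumed layout)."""
    acc, nbits, out = 0, 0, bytearray()
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        acc = (acc << bits) | v
        nbits += bits
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)


def unpack_pixels(data, bits, count):
    """Inverse of pack_pixels: read `count` values of `bits` bits each."""
    acc, nbits, out = 0, 0, []
    mask = (1 << bits) - 1
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= bits and len(out) < count:
            nbits -= bits
            out.append((acc >> nbits) & mask)
        acc &= (1 << nbits) - 1
    if len(out) != count:
        raise ValueError("not enough data for the requested pixel count")
    return out


# Synthetic 60 x 60 image with 64 density levels (0..63), for illustration only.
pixels = [i % 64 for i in range(60 * 60)]
packed = pack_pixels(pixels, 6)
print(len(packed), packed_size(3600, 6))
```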

ETL2 Database Contents

The CO-59 code (the “Rokusha” six-company newspaper agreement code) was established in 1959 by six Japanese newspaper publishers; its code table is the “Kan Telefax Code and Character Arrangement Table.”


ETL3

Background of ETL3 Creation

ETL3 is a database created in 1974 through a collaboration between the Electrotechnical Laboratory and Hitachi, Ltd. The OCR sheets were collected by Hitachi, Ltd., and the TOSBAC-3400 observation system of the Electrotechnical Laboratory was used for the observation.

Observation Specifications

  • OCR sheet specification (same OCR sheet as ETL1 is used)
    • Handwriting character reading paper : B5 size, 90kg OCR paper (1 type)
    • Dropout color : No.26 violet 50% screen (DNP)
    • Character frame : width 5mm, height 7mm
    • Character frame pitch: 7.62 mm (width), 12.7 mm (height)
    • Number of character frames: 10 x 12 = 120
  • Characters (48 characters in total)
    • Numerals : 10
    • Upper case alphabet : 26
    • Special characters: 12
  • OCR sheet collection
  • Observation equipment
    • Input device : Flying Spot Scanner (FSS) (Flying Spot Scanning Tube 5CNP16) (Photomultiplier Tube 7696)
    • Sampling interval : 0.133mm x 0.133mm
    • Spot size : 0.1333mm
    • Density level : 16 (4bit)
    • Number of sampling points : 72 x 76 = 5,472 pixels
  • Database Creation
    • Location : National Electrotechnical Laboratory
    • Computer used : TOSBAC-3400/41 (program: FSSTOMT)
    • Creation Date : April 1974
    • Observation Period : April 1974

ETL3 Database Specifications


ETL4

Background of ETL4 Creation

ETL4 is a database created in 1974 by collecting OCR sheets at Nagoya University and observing them with the TOSBAC-3400 observation system at the Electrotechnical Laboratory. Writers filled in 51 hiragana characters using sample characters as a reference.

Observation Specifications

  • OCR sheet specification (same OCR sheet as ETL1 is used)
    • Handwriting character reading paper : B5 size, 90kg OCR paper (1 type)
    • Dropout color : No.26 violet 50% screen (DNP)
    • Character frame : width 5mm, height 7mm
    • Character frame pitch: 7.62 mm (width), 12.7 mm (height)
    • Number of character frames: 10 x 12 = 120
  • Characters (Total 51 characters)
    • Hiragana : 51
  • OCR sheet collection
    • Collection location: Nagoya University
    • Restrictions on entry : Writing method specified
    • Number of scribes : 120
    • Total sample size : 6,120
  • Observation equipment
    • Input device : Flying Spot Scanner (FSS) (Flying Spot Scanning Tube 5CNP16) (Photomultiplier Tube 7696)
    • Sampling interval : 0.133mm x 0.133mm
    • Spot size : 0.1333mm
    • Density level : 16 (4bit)
    • Number of sampling points : 72 x 76 = 5,472 pixels
  • Database Creation
    • Location : National Electrotechnical Laboratory
    • Computer used : TOSBAC-3400/41 (program: FSSTOMT)
    • Creation Date : December 1974
    • Observation Period : December 1974

ETL4 Database Specifications


ETL5

Background of ETL5 Creation

ETL5 is a database created in 1974 by collecting OCR sheets at Fujitsu and observing them with the TOSBAC-3400 observation system at the Electrotechnical Laboratory. Writers filled in 51 katakana characters using sample characters as a reference.

Observation Specifications

  • OCR sheet specification (same OCR sheet as ETL1 is used)
    • Handwriting character reading paper : B5 size, 90kg OCR paper (1 type)
    • Dropout color : No.26 violet 50% screen (DNP)
    • Character frame : width 5mm, height 7mm
    • Character frame pitch: 7.62 mm (width), 12.7 mm (height)
    • Number of character frames: 10 x 12 = 120
  • Characters (Total 51 characters)
    • Katakana : 51
  • OCR sheet collection
  • Observation equipment
    • Input device : Flying Spot Scanner (FSS) (Flying Spot Scanning Tube 5CNP16) (Photomultiplier Tube 7696)
    • Sampling interval : 0.1mm x 0.1mm
    • Spot size : 0.1mm
    • Density level : 16 (4bit)
    • Number of sampling points : 72 x 76 = 5,472 pixels
  • Database Creation
    • Location : National Electrotechnical Laboratory
    • Computer used : TOSBAC-3400/41 (program: FSSTOMT)
    • Creation Date : February 1975
    • Observation Period : February 1975

ETL5 Database Specifications


ETL6

History of ETL6 Creation

As a project of the Agency of Industrial Science and Technology and the Japan Electronic Industry Development Association, a committee specializing in handwritten characters for OCR was established in 1974. In 1976, draft character standards were created for a total of 114 handwritten OCR characters, including katakana, alphanumerics, and symbols. ETL6 is the data observed with the TOSBAC-40C observation system at the Electrotechnical Laboratory. JIS C 6254-1979 (Katakana) and JIS C 6255-1979 (Numerals), which are now Japanese Industrial Standards, were created based on this draft character standard.

Observation Specifications

  • OCR Sheet Specifications
    • OCR Data Collection Paper : A4 size, 100kg OCR paper (Tokushu Paper Mfg. Co., Ltd.)
    • Dropout color : No.114 Reddish Orange 50% screen (Dai Nippon Printing)
    • Character frame : 5mm (width), 6mm (height)
    • Character frame pitch: 6.35 mm (width), 12.7 mm (height)
    • Number of character frames: 26 x 17 = 442
  • Characters (114 characters in total)
    • Numerals : 10
    • Upper case letters : 26
    • Katakana : 46
    • Special characters : 32
  • OCR sheet collection
  • Observation device
    • Input device: photoconductive imaging tube (VIDICON) (Toshiba Calnicon)
    • Filter : Transmission limit wavelength 620nm (JIS B 7113 R-62)
    • Sampling interval : approx. 0.11mm x 0.11mm
    • Density level : 16 (4bit)
    • Number of sampling points : 64 x 63 = 4,032 pixels
  • Database creation
    • Location : Electrotechnical Laboratory
    • Computer used : TOSBAC-40C (program : VIDSYS)
    • Creation Date : December 1976
    • Observation Period : December 1976 – May 1977

ETL6 Database Specifications


ETL7

Background of the creation of ETL7

ETL7 contains data observed in 1977 with the TOSBAC-40C observation system at the Electrotechnical Laboratory, from OCR sheets on which 175 people (OCR users, manufacturers, universities, and government offices) wrote 48 hiragana characters.

Observation Specifications

  • OCR sheet specifications (Large, Small 2 types)
    • Handwritten OCR paper : A4 size, 100kg OCR paper (Tokushu Paper Mfg. Co., Ltd.)
    • Dropout color : No.114 Reddish orange 50% screen (Dai Nippon Printing)
    • Character frame : Large 6mm wide, 7.2mm high / Small 5mm wide, 6.0mm high
    • Character frame pitch : Large 8.47mm (W) x 11.0mm (H) / Small 6.35mm (W) x 12.7mm (H)
    • Number of character frames: Large 20 x 20 = 400 / Small 26 x 17 = 442
  • Characters (Total 48 characters)
    • Hiragana : 46
    • Dakuten and handakuten : 2
  • OCR sheet collection
  • Observation equipment
    • Input device : Photoconductive imaging tube (VIDICON)
    • Filter : Transmission limit wavelength 620nm (JIS B 7113 R-62)
    • Sampling interval : Large 0.13mm x 0.13mm / Small 0.11mm x 0.11mm
    • Density level : 16 (4bit)
    • Number of sampling points : 64 x 63 = 4,032 pixels
  • Database Creation
    • Location : Electrotechnical Laboratory
    • Computer used : TOSBAC-40C (program: )
    • Creation Date : August 1977
    • Observation Period : August 1977
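
The two sampling intervals make sense when the scanned area is compared with the character frame: the large sheet uses a coarser interval so that the same 64 x 63 grid still covers its larger frame. The sketch below works through that arithmetic. It assumes that 64 is the horizontal count and 63 the vertical count, which the list above does not state; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SheetScan:
    """Frame size and sampling parameters for one ETL7 sheet type."""
    name: str
    frame_w_mm: float
    frame_h_mm: float
    interval_mm: float
    cols: int = 64  # assumed horizontal sample count
    rows: int = 63  # assumed vertical sample count

    def footprint_mm(self):
        """Scanned area (width, height) in millimetres."""
        return (self.cols * self.interval_mm, self.rows * self.interval_mm)

    def margin_mm(self):
        """How far the scanned area extends beyond the character frame."""
        w, h = self.footprint_mm()
        return (w - self.frame_w_mm, h - self.frame_h_mm)


LARGE = SheetScan("large", frame_w_mm=6.0, frame_h_mm=7.2, interval_mm=0.13)
SMALL = SheetScan("small", frame_w_mm=5.0, frame_h_mm=6.0, interval_mm=0.11)

for scan in (LARGE, SMALL):
    w, h = scan.footprint_mm()
    mw, mh = scan.margin_mm()
    print(f"{scan.name}: footprint {w:.2f} x {h:.2f} mm, "
          f"margin {mw:.2f} x {mh:.2f} mm")
```

Under these assumptions the large sheet is scanned over about 8.32 mm x 8.19 mm and the small sheet over about 7.04 mm x 6.93 mm, so both grids extend beyond the character frame in each direction.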

ETL7 Database Specifications


ETL8 (ETL8G, ETL8B)

Background of ETL8 creation

ETL8 is a database of OCR sheets collected in 1980 from 1,600 people including OCR users, manufacturers, Nagoya University, and others by the Handwriting Specialist Committee for OCR of the Japan Electronic Industry Development Association. The database contains 881 educational kanji characters and 75 hiragana characters.

Observation Specifications

  • OCR Sheet Specifications
    • OCR data collection paper : A4 size, 83kg OCR paper (special paper) (10 types)
    • Dropout color : No.114 Reddish Orange 50% screen (Dai Nippon Printing)
    • Character frame : 10mm (width), 10mm (height)
    • Character frame pitch: 12.7 mm (width), 15.24 mm (height)
    • Number of character frames: 13 x 16 = 208
  • Characters covered (956 characters in total)
    • Kanji for education : 881 (According to the Cabinet Notification No. 1, 1948, “Kanji for Today’s Use”)
    • Hiragana : 75
  • OCR sheet collection
    • Restrictions on entry: Specified in the “Request for filling in the handwritten Kanji OCR sheet”
    • Number of scribes : 1,600 in total
    • Total sample size: 152,960
    • Data collection: OCR Handwriting Character Expert Committee, Japan Electronic Industry Development Association, Nagoya University
  • Observation device
    • Input device : 128×1 point photodiode array sensor (ADC 6bit) (semiconductor array RL-128EC by Reticon)
    • Sampling interval : 0.108mm x 0.1016mm
    • Density level : 16 (4bit <– 6bit)
    • Number of sampling points : 128 x 127 = 16,256 pixels
  • Database Creation
    • Location : Electrotechnical Laboratory
    • Computer used : TOSBAC-40C (program: )
    • Creation Date : February 1980
    • Observation Period : February 1980 – ?
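
The density entry "16 (4bit <– 6bit)" says that the 6-bit ADC output (64 levels) is stored with 16 levels. The source does not say how the reduction was done. The sketch below shows the simplest possibility, dropping the two least significant bits, purely as an illustration. The function name `requantize` is hypothetical, and the actual ETL8 procedure may have differed.

```python
def requantize(values, src_bits=6, dst_bits=4):
    """Reduce unsigned samples from src_bits to dst_bits by discarding low bits.

    This is plain truncation (an assumed method, not confirmed by the source):
    each group of 2**(src_bits - dst_bits) input levels maps to one output level.
    """
    shift = src_bits - dst_bits
    if shift < 0:
        raise ValueError("dst_bits must not exceed src_bits")
    limit = 1 << src_bits
    out = []
    for v in values:
        if not 0 <= v < limit:
            raise ValueError(f"value {v} is outside the {src_bits}-bit range")
        out.append(v >> shift)
    return out


# 6-bit levels 0..63 collapse onto 4-bit levels 0..15, four inputs per output.
levels = requantize(range(64))
print(sorted(set(levels)))
```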

ETL8 Database Specifications


ETL9 (ETL9G, ETL9B)

History of ETL9 Creation

ETL9 is a database collected by the Japanese Information Processing Standardization Research Committee, Specialist Committee C, established in 1980 by the Japan Electronic Industry Development Association on behalf of the Agency of Industrial Science and Technology (AIST). The database is based on observations made by the TOSBAC-40C observation system at the Electrotechnical Laboratory.

Observation Specifications

  • OCR sheet specifications
    • OCR data collection paper : A4 size, kg OCR paper (special paper) (20 types)
    • Dropout Color : No.114 Reddish Orange 50% screen (Dai Nippon Printing)
    • Character frame : width 8mm, height 9mm
    • Character frame pitch : 10 mm (W) x 12 mm (H)
    • Number of character frames: 16 x 20 = 320
  • Target characters (total 3,036 characters)
    • JIS first level Kanji : 2,965 (JIS X 0208)(JIS C 6226-83)
    • Hiragana : 71
  • OCR sheet collection
    • Restrictions on entry : Specified in “Writing Instructions”
    • Number of scribes : 4,000 in total
    • Total sample size : 607,200
  • Observation device
    • Input device : 128×1 photodiode array sensor (ADC 6bit) (Semiconductor array Reticon RL-128EC)
    • Sampling interval : 0.108mm x 0.1016mm
    • Density level : 16 (4bit <– 6bit)
    • Number of sampling points : 128 x 127 = 16,256 pixels
  • Database Creation
    • Location : Electrotechnical Laboratory
    • Computer used : TOSBAC-40C (program: )
    • Creation Date : March 1984
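
The collection figures above are consistent with each writer filling in one of the 20 sheet types: 4,000 / 20 = 200 writers per character, and 3,036 x 200 = 607,200 samples. The sketch below checks that arithmetic. The reading that each writer completed exactly one sheet type, and that each character appears once across the sheet types, is an inference from the totals and not something the source states. The function names are hypothetical.

```python
def samples_per_character(writers, sheet_types):
    """Writers per character, assuming writers are split evenly over sheet types."""
    per_type, remainder = divmod(writers, sheet_types)
    if remainder:
        raise ValueError("writers do not divide evenly over sheet types")
    return per_type


def total_samples(characters, writers, sheet_types):
    """Total sample count under the one-sheet-type-per-writer assumption."""
    return characters * samples_per_character(writers, sheet_types)


# ETL9 figures from the list above.
print(total_samples(characters=3036, writers=4000, sheet_types=20))

# The same reading also reproduces the ETL8 total (956 characters,
# 1,600 writers, 10 sheet types, 152,960 samples).
print(total_samples(characters=956, writers=1600, sheet_types=10))
```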

ETL9 Database Specifications


References

  • [A] H.YAMADA, S.MORI: “An Analysis of the Hand-Printed Character Data Base. I,” Bulletin of the Electrotechnical Laboratory, Vol.39, No.8, pp.580-599 (1975-08) (in Japanese).
  • [B] H.YAMADA, S.MORI: “An Analysis of Hand-Printed Character Data Base. II,” Bulletin of the Electrotechnical Laboratory, Vol.40, No.6, pp.513-529 (1976-06) (in Japanese).
  • [C] T.SAITO, H.YAMADA, S.MORI: “An Analysis of Hand-Printed Character Data Base. III,” Bulletin of the Electrotechnical Laboratory, Vol.42, No.5, pp.385-434 (1978-05) (in Japanese).
  • [D] S.MORI, K.YAMAMOTO, H.YAMADA, T.SAITO: “On A Handprinted KYOIKU-KANJI Character Data Base,” Bulletin of the Electrotechnical Laboratory, Vol.43, Nos.11&12, pp.752-773 (1979-11&12) (in Japanese).
  • [Otsu] N.OTSU: “A Threshold Selection Method from Gray-Level Histograms,” IEEE Transactions on Systems, Man, and Cybernetics, Vol.9, No.1, pp.62-66 (1979-01).
  • [E] T.SAITO, H.YAMADA, S.MORI: “An Analysis of Handprinted Character Data Base. IV. – Statistics of KYOIKU-KANJI Characters -,” Bulletin of the Electrotechnical Laboratory, Vol.44, No.4, pp.219-251 (1980-04) (in Japanese).
  • [F] T.SAITO, H.YAMADA, K.YAMAMOTO, S.MORI: “On A Handprinted KANJI Data Base – KYOIKU-KANJI -,” 1981 National Convention Record, The Institute of Electronics and Communication Engineers, 1385 (1981-04) (in Japanese).
  • [G] T.SAITO, H.YAMADA, K.YAMAMOTO, S.MORI: “An Analysis of Handprinted Character Data Base. V. – Evaluation of KYOIKU-KANJI Characters by Pattern Matching Approach -,” Bulletin of the Electrotechnical Laboratory, Vol.45, Nos.1&2, pp.49-77 (1981-01&02) (in Japanese).
  • [H] T.SAITO, H.YAMADA, K.YAMAMOTO, R.OKA, M.YASUDA, T.SAKAKURA, H.SONE: “An Intuitive Analysis of Handprinted KYOIKU-KANJI Characters,” 1982 National Convention Record, The Institute of Electronics and Communication Engineers, 1342 (1982-03) (in Japanese).
  • [I] T.SAITO, H.YAMADA, K.YAMAMOTO: “An Analysis of Handprinted Chinese Characters by Directional Pattern Matching Approach,” Transactions of The Institute of Electronics and Communication Engineers, Vol.J65-D, No.5, pp.550-557 (1982-05) (in Japanese).
  • [J] T.SAITO, H.YAMADA, K.YAMAMOTO: “An Analysis of Handprinted Character Data Base. VI. – An Analysis of KYOIKU KANJI Characters by Directional Pattern Matching Approach -,” Bulletin of the Electrotechnical Laboratory, Vol.46, No.12, pp.695-725 (1982-12) (in Japanese).
  • [K] T.SAITO, H.YAMADA, K.YAMAMOTO, M.YASUDA: “An Analysis of Handprinted Character Data Base. VII. – An Intuitive Analysis of Handprinted KYOIKU KANJI Characters -,” Bulletin of the Electrotechnical Laboratory, Vol.47, No.4, pp.261-275 (1983-04) (in Japanese).
  • [L] T.SAITO, H.YAMADA, K.YAMAMOTO: “On the Data Base ETL9 of Handprinted Characters in JIS Chinese Characters and Its Analysis,” Transactions of The Institute of Electronics and Communication Engineers, Vol.J68-D, No.4, pp.757-764 (1985-04) (in Japanese).
  • [M] T.SAITO, H.YAMADA, K.YAMAMOTO: “An Analysis of Handprinted Character Data Base. VIII. – An Estimation of the Data Base ETL9 of Handprinted Characters in JIS Chinese Characters by Directional Pattern Matching Approach -,” Bulletin of the Electrotechnical Laboratory, Vol.49, No.7, pp.487-525 (1985-07) (in Japanese).
  • [N] T.SAITO, K.YAMAMOTO, H.YAMADA: “An Analysis of Handprinted Character Data Base. IX. – On the Data Base ETL9 and its Model Patterns -,” Bulletin of the Electrotechnical Laboratory, Vol.50, No.4, pp.259-263 (1986-04) (in Japanese).
  • [O] K.YAMAMOTO, T.SAITO, H.YAMADA: “An Analysis of the Data Base ETL9 of Handprinted Characters in KANJI Characters by Human examination,” 1985 National Convention Record, The Institute of Electronics and Communication Engineers, S4-2, pp.303-304 (1985-11) (in Japanese).