The ETL Character Database

The ETL Character Database contains about 1.2 million character images of handwritten and printed alphanumeric characters, symbols, hiraganas, katakanas, educational kanjis, JIS level 1 kanjis, etc. The images were collected for character recognition research between 1973 and 1984 at the Electrotechnical Laboratory (now the National Institute of Advanced Industrial Science and Technology), in cooperation with the Japan Electronics Industry Development Association (now the Japan Electronics and Information Technology Industries Association), universities, and private research institutes. They were compiled into nine datasets (ETL1 through ETL9). The database was previously distributed by mail on magnetic tapes and CD-Rs, but since April 2011 it has been available for download over the Internet.

The “ETL Character Database” is a collection of standardized data designed to enable the comparison of offline character recognition algorithms. It consists of scanned images of OCR sheets filled out by writers or, in the case of ETL2, sheets with printed Kanji characters. The ETL1 through ETL9 datasets contain multi-level grayscale data; ETL8 and ETL9 are also provided in binary versions (ETL8B and ETL9B). The size of each character pattern depends on the dataset: 60×60, 64×63, 72×76, or 128×127 pixels. Each character pattern is assigned an ID that contains the corresponding ground truth code. A single character pattern together with its ID information constitutes one record, and multiple records are grouped into a single file.
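
The pattern dimensions, together with the bit depths listed in the summary, determine how many bytes an image occupies if its pixels are stored tightly packed. The sketch below is illustrative only: it assumes tight packing with the most significant bits first, which this page does not state, and the function names `packed_image_bytes` and `unpack_pixels` are hypothetical. The actual record layout of each file format should be checked against its own specification.

```python
def packed_image_bytes(width: int, height: int, bits: int) -> int:
    """Bytes needed for width*height pixels at `bits` bits each, tightly packed."""
    total_bits = width * height * bits
    return (total_bits + 7) // 8  # round up to a whole byte


def unpack_pixels(data: bytes, width: int, height: int, bits: int) -> list[list[int]]:
    """Unpack tightly packed pixels into a list of rows.

    Assumption (not confirmed by the source): pixels are stored row by row,
    most significant bits first, with no padding between pixels or rows.
    """
    expected = packed_image_bytes(width, height, bits)
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")

    mask = (1 << bits) - 1
    pixels: list[int] = []
    acc = 0    # bit accumulator
    nbits = 0  # number of not-yet-consumed bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        # Emit as many whole pixels as the accumulator currently holds.
        while nbits >= bits and len(pixels) < width * height:
            nbits -= bits
            pixels.append((acc >> nbits) & mask)
        acc &= (1 << nbits) - 1  # keep only the leftover low bits

    return [pixels[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    for w, h, b in [(64, 63, 4), (60, 60, 6), (72, 76, 4), (128, 127, 4), (64, 63, 1)]:
        print(f"{w}x{h}x{b}: {packed_image_bytes(w, h, b)} bytes")
```

Under that packing assumption, a 64×63 image at 4 bits per pixel occupies 2016 bytes, and a 128×127 image at 4 bits per pixel occupies 8128 bytes.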

Summary of the Datasets

  • ETL1
    • free handwriting
    • categories: 99 (10 numerals, 26 uppercase letters, 12 symbols, 51 katakanas)
    • number of writers: 1445
    • number of samples: 141319
    • image width×height×bits: 64×63×4
    • date of compilation: September 1973
    • number of files, file format: 13, M-type
  • ETL2
    • printed
    • categories: 2184 (kanjis, hiraganas, katakanas, alphanumerics, symbols)
    • two fonts (Mincho, Gothic)
    • number of samples: 52796
    • image width×height×bits: 60×60×6
    • date of compilation: October 1973
    • number of files, file format: 5, K-type
  • ETL3
    • regular handwriting
    • categories: 48 (10 numerals, 26 uppercase letters, 12 symbols)
    • number of writers: 200
    • number of samples: 9600
    • image width×height×bits: 72×76×4
    • date of compilation: April 1974
    • number of files, file format: 2, C-type
  • ETL4
    • free handwriting
    • categories: 51 hiraganas
    • number of writers: 120
    • number of samples: 6120
    • image width×height×bits: 72×76×4
    • date of compilation: December 1974
    • number of files, file format: 1, C-type
  • ETL5
    • regular handwriting
    • categories: 51 katakanas
    • number of writers: 104 (twice for each character)
    • number of samples: 10608
    • image width×height×bits: 72×76×4
    • date of compilation: February 1975
    • number of files, file format: 1, C-type
  • ETL6
    • regular handwriting
    • categories: 114 (46 katakanas, 10 numerals, 26 uppercase letters, 32 symbols)
    • number of writers: 1383
    • number of samples: 157662
    • image width×height×bits: 64×63×4
    • date of compilation: December 1976
    • number of files, file format: 12, M-type
  • ETL7L, ETL7S
    • regular handwriting
    • categories: 48 (46 hiraganas, dakuten, handakuten)
    • number of writers: 175 in two sizes (large and small)
    • number of samples: 16800
    • image width×height×bits: 64×63×4
    • date of compilation: August 1977
    • number of files, file format: 2 for each, M-type
  • ETL8 (ETL8G, ETL8B)
    • handwriting
    • categories: 956 (881 kanjis, 75 hiraganas)
    • number of writers: 1600
    • number of samples: 152960
    • ETL8G
      • image width×height×bits: 128×127×4
      • date of compilation: February 1980
      • number of files, file format: 32, G-type
    • ETL8B2
      • image width×height×bits: 64×63×1
      • date of compilation: July 1981
      • number of files, file format: 3, B-type
  • ETL9 (ETL9G, ETL9B)
    • handwriting
    • categories: 3036 (2965 kanjis, 71 hiraganas)
    • number of writers: 4000
    • number of samples: 607200
    • ETL9G
      • image width×height×bits: 128×127×4
      • date of compilation: March 1984
      • number of files, file format: 50, G-type
    • ETL9B
      • image width×height×bits: 64×63×1
      • date of compilation: August 1984
      • number of files, file format: 5, B-type
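
The figures in the summary can be cross-checked with simple arithmetic. The sketch below copies the category and sample counts from the list above, verifies that each category breakdown adds up to its stated total, and divides samples by categories. The names `TOTALS`, `BREAKDOWNS`, `breakdown_mismatches`, and `samples_per_category` are hypothetical, and the quotient is only arithmetic on the listed numbers, not a documented property of the datasets.

```python
# (categories, samples) as listed in the summary.
TOTALS = {
    "ETL1": (99, 141319),
    "ETL2": (2184, 52796),
    "ETL3": (48, 9600),
    "ETL4": (51, 6120),
    "ETL5": (51, 10608),
    "ETL6": (114, 157662),
    "ETL7": (48, 16800),
    "ETL8": (956, 152960),
    "ETL9": (3036, 607200),
}

# Category breakdowns, for the datasets whose entry itemizes them.
BREAKDOWNS = {
    "ETL1": [10, 26, 12, 51],   # numerals, uppercase, symbols, katakanas
    "ETL3": [10, 26, 12],       # numerals, uppercase, symbols
    "ETL6": [46, 10, 26, 32],   # katakanas, numerals, uppercase, symbols
    "ETL7": [46, 1, 1],         # hiraganas, dakuten, handakuten
    "ETL8": [881, 75],          # kanjis, hiraganas
    "ETL9": [2965, 71],         # kanjis, hiraganas
}


def breakdown_mismatches() -> list[str]:
    """Names of datasets whose itemized categories do not sum to the total."""
    return [
        name
        for name, parts in BREAKDOWNS.items()
        if sum(parts) != TOTALS[name][0]
    ]


def samples_per_category() -> dict[str, tuple[int, int]]:
    """Map each dataset to (quotient, remainder) of samples / categories."""
    return {
        name: divmod(samples, categories)
        for name, (categories, samples) in TOTALS.items()
    }


if __name__ == "__main__":
    print("breakdown mismatches:", breakdown_mismatches())
    for name, (quotient, remainder) in samples_per_category().items():
        print(f"{name}: {quotient} samples per category, remainder {remainder}")
```

With the figures as listed, every breakdown sums to its stated total, and the sample count divides evenly by the category count for ETL3 through ETL9 but not for ETL1 or ETL2.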

† In the “ETL Character Database”, “regular handwriting” means that the writer was shown a sample character, and “free handwriting” means that the writer was not. The distinction is not strict, however, because it is sometimes difficult to indicate where each character should be written (as with the Katakana in ETL1) without showing its shape. The “free handwriting” collection forms have sample characters printed at the top of the character entry box, which makes it easier to attach the correct ground truth code to each character sample.

‡ As a rule, each writer wrote each character only once. The exception is ETL7, where each sheet contains two instances of the same Hiragana characters.
