Specification of ETL-1

Data Collection

OCR Sheet

  • Sheet: B5, 90kg per 1000 sheets
  • Dropout color: No.26 Violet 50% Screen (DNP)
  • Frame size: 5mm x 7mm
  • Pitch: 7.62mm x 12.7mm
  • Number of frames: 10 x 12 = 120

Characters

  • Numeric: 10 (0–9)
  • Capital Roman alphabet: 26 (A–Z)
  • Special: 12 (¥+-*/=()・,␣’)
  • Katakana: 51 (ア–ン)
  • Total: 99

Writers

Scanning System

  • Scanner: Flying Spot Scanner (FSS) with a Flying Spot Cathode Ray Tube 5CNP16 and a Photomultiplier Tube 7696
  • Interval: 0.133mm x 0.133mm
  • Spot size: 0.1333mm
  • Intensity levels: 16 (4bit)
  • Number of pixels: 72 x 76, cut out to 64 x 63

Compiling

  • Place: Electrotechnical Laboratory (ETL)
  • Joint work of ETL and Fujitsu for design of OCR sheet and scanning system
  • Computer: TOSBAC-3400/41
  • Software: FSSTOMT
  • Date of Collection: Sept. 1973
  • Date of Scanning: Sept. 1973–Mar. 1974
  • Quality evaluation by human
  • 141319 samples in total

Format

  • M-Type Data Format (ETL1, ETL6, ETL7)
  • Fixed Record Length without Control Words
  • Logical record length is 2052 bytes (1 byte = 8 bits)
  • Big endian
  • Format of a record:
    Byte Range  # of Bytes  Type     Contents
    1-2         2           Integer  Data Index (>=1)
    3-4         2           ASCII    Character Name (e.g. “0”, “A”, “$”, “KA”)
    5-6         2           Integer  Sheet Index (>=1)
    7           1           Binary   Character Code (JIS X 0201)
    8           1           Binary   Character Code (EBCDIC)
    9           1           Integer  Quality of Character Image (0:clean, 1, 2, 3)
    10          1           Integer  Quality of Character Group (0:clean, 1, 2)
    11          1           Integer  Gender of Writer (1:male, 2:female) (JIS X 0303)
    12          1           Integer  Age of Writer
    13-16       4           Integer  Serial Data Index (>=1)
    17-18       2           Integer  Industry Classification Code (JIS X 0403)
    19-20       2           Integer  Occupation Classification Code (JIS X 0404)
    21-22       2           Integer  Date of Sheet Gathering (19)YYMM
    23-24       2           Integer  Date of Scan (19)YYMM
    25          1           Integer  Y Coordinate of Scan Position on Sheet (>=1)
    26          1           Integer  X Coordinate of Scan Position on Sheet (>=1)
    27          1           Integer  Minimum Intensity Level (0-255)
    28          1           Integer  Maximum Scanned Level (0-255)
    29-30       2           Integer  (undefined)
    31-32       2           Integer  (undefined)
    33-2048     2016        Packed   16 Gray Level (4bit/pixel) Image Data. 64 (X-axis size) * 63 (Y-axis size) = 4032 pixels.
    2049-2052   4           Integer  (uncertain)
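
The record layout above can be expressed as a `struct` format string. The following is a minimal sketch in Python 3, not code shipped with the dataset: the names `RECORD_FORMAT`, `Record` and `parse_record`, and all field names, are illustrative. The two undefined fields and the uncertain trailing four bytes are skipped rather than interpreted.

```python
import struct
from collections import namedtuple

RECORD_SIZE = 2052  # logical record length in bytes

# '>' selects big endian with no alignment padding.
# H = 2-byte integer, B = 1-byte integer, I = 4-byte integer,
# 2s / 2016s = raw bytes, 4x = four bytes skipped.
RECORD_FORMAT = ">H2sH6BI4H4B4x2016s4x"

# Field names are illustrative; they follow the table row by row.
Record = namedtuple("Record", [
    "data_index", "char_name", "sheet_index", "jis_code", "ebcdic_code",
    "image_quality", "group_quality", "gender", "age", "serial_index",
    "industry_code", "occupation_code", "gather_date", "scan_date",
    "y_pos", "x_pos", "min_level", "max_level", "image",
])


def parse_record(buf):
    """Split one 2052-byte record into named fields."""
    if len(buf) != RECORD_SIZE:
        raise ValueError("expected %d bytes, got %d" % (RECORD_SIZE, len(buf)))
    fields = list(struct.unpack(RECORD_FORMAT, buf))
    fields[1] = fields[1].decode("ascii")  # Character Name is 2 ASCII bytes
    return Record(*fields)
```

`struct.calcsize(RECORD_FORMAT)` evaluates to 2052, which matches the logical record length. The `image` field is left as the 2016 packed bytes.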

Contents

Contents of files:

Filename  Categories        # Categories  Sheets     # Sheets  # Records  Note
ETL1C-01  01234567          8             1001-2960  1445      11560
ETL1C-02  89ABCDEF          8             1001-2960  1445      11560
ETL1C-03  GHIJKLMN          8             1001-2960  1445      11560
ETL1C-04  OPQRSTUV          8             1001-2960  1445      11560
ETL1C-05  WXYZ¥+-*          8             1001-2960  1445      11560
ETL1C-06  /=()・,␣’         8             1001-2960  1445      11560
ETL1C-07  アイウエオカキク  8             1001-2960  1411      11288
ETL1C-08  ケコサシスセソタ  8             1001-2960  1411      11288
ETL1C-09  チツテトナニヌネ  8             1001-2960  1411      11287      ナ (NA) on Sheet 2672 is missing
ETL1C-10  ノハヒフヘホマミ  8             1001-2960  1411      11288
ETL1C-11  ムメモヤイユエヨ  8             1001-2960  1411      11288
ETL1C-12  ラリルレロワヰウ  8             1001-2960  1411      11287      リ (RI) on Sheet 2708 is missing
ETL1C-13  ヱヲン            3             1001-2960  1411      4233
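
The record counts in this table can be cross-checked against the stated total of 141319 samples and against the fixed record length of 2052 bytes. The sketch below is only an illustration: `RECORDS_PER_FILE`, `expected_size` and `records_in_file` are hypothetical names, and the byte sizes are derived from the table rather than measured from the distributed files.

```python
RECORD_SIZE = 2052  # bytes per record, from the Format section

# (filename, number of records) copied from the table of file contents.
RECORDS_PER_FILE = [
    ("ETL1C-01", 11560), ("ETL1C-02", 11560), ("ETL1C-03", 11560),
    ("ETL1C-04", 11560), ("ETL1C-05", 11560), ("ETL1C-06", 11560),
    ("ETL1C-07", 11288), ("ETL1C-08", 11288), ("ETL1C-09", 11287),
    ("ETL1C-10", 11288), ("ETL1C-11", 11288), ("ETL1C-12", 11287),
    ("ETL1C-13", 4233),
]


def expected_size(records):
    """Size in bytes of a file holding `records` fixed-length records."""
    return records * RECORD_SIZE


def records_in_file(size_in_bytes):
    """Number of records in a file of the given size; rejects partial records."""
    records, remainder = divmod(size_in_bytes, RECORD_SIZE)
    if remainder:
        raise ValueError("size is not a multiple of %d bytes" % RECORD_SIZE)
    return records


total = sum(count for _, count in RECORDS_PER_FILE)
print(total)  # 141319, the total number of samples
```

Comparing `records_in_file(os.path.getsize(path))` with the table is a quick way to detect a truncated download.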

List of available sheets:

1001-1026 1028-1149 1151-1243
1301-1306 1308-1316 1318-1355 1357 1360-1391 1393-1436 1438-1453 1455-1459 1461-1491
1501-1525 1527-1658 1660-1663 1665 1667-1695
1701-1766
1801-1837 1839-1884
2001-2019 2021-2025 2027-2153
2201-2391
2501-2696
2701-2744
2801-2802 *2803-2812 2813 *2814 2815-2817 *2818-2840
2901-2960

*: Katakana characters are missing
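
The sheet list can be expanded programmatically and checked against the sheet counts in the table of file contents (1445 sheets overall, 1411 with Katakana). This is a minimal sketch; `SHEETS` and `parse_sheets` are illustrative names, and the list is copied from above with `*` marking sheets whose Katakana characters are missing.

```python
SHEETS = """
1001-1026 1028-1149 1151-1243 1301-1306 1308-1316 1318-1355 1357
1360-1391 1393-1436 1438-1453 1455-1459 1461-1491 1501-1525 1527-1658
1660-1663 1665 1667-1695 1701-1766 1801-1837 1839-1884 2001-2019
2021-2025 2027-2153 2201-2391 2501-2696 2701-2744 2801-2802 *2803-2812
2813 *2814 2815-2817 *2818-2840 2901-2960
"""


def parse_sheets(text):
    """Return (all_sheets, sheets_without_katakana) as sets of sheet indices."""
    all_sheets, no_katakana = set(), set()
    for token in text.split():
        starred = token.startswith("*")
        first, _, last = token.lstrip("*").partition("-")
        # A token without '-' is a single sheet, so it is its own upper bound.
        numbers = range(int(first), int(last or first) + 1)
        all_sheets.update(numbers)
        if starred:
            no_katakana.update(numbers)
    return all_sheets, no_katakana


all_sheets, no_katakana = parse_sheets(SHEETS)
print(len(all_sheets))                # 1445 sheets in total
print(len(all_sheets - no_katakana))  # 1411 sheets with Katakana
```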

Samples

Metadata and images of randomly drawn records from some files. Images are shown as 0:black, 15:white.

Filename  Metadata                                                                          Image
ETL1C-01  (1282, '0 ', 2677, 48, 240, 0, 0, 1, 24, 1282, 3552, 42, 7308, 7401, 1, 0, 0, 0)  [image 1282267730]
ETL1C-03  (1319, 'G ', 2718, 71, 199, 0, 0, 1, 18, 1319, 3552, 42, 7308, 7402, 2, 6, 0, 0)  [image 1319271847]
ETL1C-04  (164, 'O ', 1166, 79, 214, 0, 0, 1, 30, 164, 9711, 121, 7309, 7311, 3, 4, 0, 0)   [image 16411664f]
ETL1C-05  (943, 'W ', 2229, 87, 230, 1, 0, 0, 0, 943, 3552, 42, 7308, 7312, 4, 2, 0, 0)     [image 943222957]
ETL1C-07  (317, ' A', 1381, 177, 129, 0, 0, 1, 32, 317, 5021, 151, 7309, 7310, 7, 0, 0, 0)  [image 3171381b1]

Python code for extracting a record from a file (Python 2.7.5):
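
A minimal sketch of such an extraction is given here for Python 3 rather than 2.7.5, using only the standard library. It seeks to a record by its position in the file, expands the packed 4-bit image into 63 rows of 64 gray levels, and writes the result as a plain-text PGM file. The function names are illustrative, and two details are assumptions not stated in this specification: that the high nibble of each byte is the left-hand pixel, and that pixels are stored row by row.

```python
RECORD_SIZE = 2052        # logical record length in bytes
IMAGE_OFFSET = 32         # image data starts at byte 33 (1-based)
IMAGE_BYTES = 2016        # 64 * 63 pixels at 4 bits per pixel
WIDTH, HEIGHT = 64, 63


def read_record(path, index):
    """Return the raw bytes of the record at 0-based position `index`."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        record = f.read(RECORD_SIZE)
    if len(record) != RECORD_SIZE:
        raise ValueError("record %d is not present in %s" % (index, path))
    return record


def unpack_pixels(record):
    """Expand the packed image into HEIGHT rows of WIDTH levels (0-15)."""
    packed = record[IMAGE_OFFSET:IMAGE_OFFSET + IMAGE_BYTES]
    levels = []
    for byte in packed:
        levels.append(byte >> 4)    # assumed: high nibble is the left pixel
        levels.append(byte & 0x0F)  # low nibble is the pixel to its right
    # Assumed: row-major order, one row of WIDTH pixels after another.
    return [levels[y * WIDTH:(y + 1) * WIDTH] for y in range(HEIGHT)]


def write_pgm(rows, path):
    """Write the gray levels as a plain PGM image (0 = black, 15 = white)."""
    with open(path, "w", encoding="ascii") as f:
        f.write("P2\n%d %d\n15\n" % (WIDTH, HEIGHT))
        for row in rows:
            f.write(" ".join(str(level) for level in row) + "\n")
```

For example, `write_pgm(unpack_pixels(read_record("ETL1C-01", 100)), "sample.pgm")` would export the 101st record of the first file, assuming the file is in the working directory under that name.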

References

  • 山田博三、森俊二: “手書文字データベースの解析(I)”, 「電総研彙報」, Vol.39, No.8, pp.580–599 (1975-08).
  • 電総研、富士通: “手書文字データ・バンク外部仕様書” (1973-09).