Specification of ETL-1

Data Collection

OCR Sheet

  • Sheet: B5, 90kg per 1000 sheets
  • Dropout color: No.26 Violet 50% Screen(DNP)
  • Frame size: 5mm x 7mm
  • Pitch: 7.62mm x 12.7mm
  • Number of frames: 10 x 12 = 120

Characters

  • Numeric: 10 (0–9)
  • Capital Roman alphabet: 26 (A–Z)
  • Special: 12 (\+-*/=()・,?’)
  • Katakana: 51 (ア–ン)
  • Total: 99

Scanning System

  • Scanner: Flying Spot Scanner (FSS) with a Flying Spot Cathode Ray Tube 5CNP16 and a Photomultiplier Tube 7696
  • Interval: 0.133mm x 0.133mm
  • Spot size: 0.1333mm
  • Intensity levels: 16 (4bit)
  • Number of pixels: 72 x 76, cut out to 64 x 63
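
The scanning figures above determine both the packed size of one image and the physical area it covers. A small sketch of that arithmetic follows; the variable names are illustrative only, and the sampling interval is assumed to apply equally in both directions.

```python
# Arithmetic implied by the scanning parameters listed above.
# Names are illustrative; only the numbers come from the specification.
WIDTH, HEIGHT = 64, 63       # pixels per image after cut-out
BITS_PER_PIXEL = 4           # 16 intensity levels
INTERVAL_MM = 0.133          # sampling interval, assumed equal in X and Y

pixels = WIDTH * HEIGHT
packed_bytes = pixels * BITS_PER_PIXEL // 8
extent_mm = (WIDTH * INTERVAL_MM, HEIGHT * INTERVAL_MM)

print("pixels per image: %d" % pixels)
print("packed bytes per image: %d" % packed_bytes)
print("scanned area: %.3f mm x %.3f mm" % extent_mm)
```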

Compiling

  • Place: Electrotechnical Laboratory (ETL)
  • Joint work of ETL and Fujitsu for design of OCR sheet and scanning system
  • Computer: TOSBAC-3400/41
  • Software: FSSTOMT
  • Date of Collection: Sept. 1973
  • Date of Scanning: Sept. 1973–Mar. 1974
  • Quality evaluation: performed by humans
  • 141319 samples in total

Format

  • M-Type Data Format (ETL1, ETL6, ETL7)
  • Fixed Record Length without Control Words
  • Logical record length is 2052 bytes (1 byte = 8 bits)
  • Big endian
  • Format of a record:

    Byte Range   # of Bytes  Type     Contents
    1-2          2           Integer  Data Index (>= 1)
    3-4          2           ASCII    Character Name (e.g. “0”, “A”, “$”, “KA”)
    5-6          2           Integer  Sheet Index (>= 1)
    7            1           Binary   Character Code (JIS X 0201)
    8            1           Binary   Character Code (EBCDIC)
    9            1           Integer  Quality of Character Image (0: clean, 1, 2, 3)
    10           1           Integer  Quality of Character Group (0: clean, 1, 2)
    11           1           Integer  Gender of Writer (1: male, 2: female) (JIS X 0303)
    12           1           Integer  Age of Writer
    13-16        4           Integer  Serial Data Index (>= 1)
    17-18        2           Integer  Industry Classification Code (JIS X 0403)
    19-20        2           Integer  Occupation Classification Code (JIS X 0404)
    21-22        2           Integer  Date of Sheet Gathering, (19)YYMM
    23-24        2           Integer  Date of Scan, (19)YYMM
    25           1           Integer  Y Coordinate of Scan Position on Sheet (>= 1)
    26           1           Integer  X Coordinate of Scan Position on Sheet (>= 1)
    27           1           Integer  Minimum Intensity Level (0–255)
    28           1           Integer  Maximum Scanned Level (0–255)
    29-30        2           Integer  (undefined)
    31-32        2           Integer  (undefined)
    33-2048      2016        Packed   16 Gray Level (4 bits/pixel) Image Data; 64 (X-axis size) * 63 (Y-axis size) = 4032 pixels
    2049-2052    4           Integer  (uncertain)
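
The record layout above can be turned into a small parser. The sketch below is one possible reading of the table, not an official decoder: the field names, `parse_record`, and `unpack_pixels` are hypothetical labels introduced here, and the nibble order (high nibble first) is an assumption. The undefined bytes 29-32 and the trailing bytes 2049-2052 are skipped.

```python
import struct
from collections import namedtuple

RECORD_SIZE = 2052
# 28 bytes of described fields, 4 undefined bytes, 2016 image bytes, 4 trailing bytes.
RECORD_FORMAT = '>H2sH6BI4H4B4x2016s4x'

# Field names are hypothetical labels chosen for this sketch.
Record = namedtuple('Record', [
    'data_index', 'char_name', 'sheet_index', 'jis_code', 'ebcdic_code',
    'image_quality', 'group_quality', 'gender', 'age', 'serial_index',
    'industry_code', 'occupation_code', 'gather_date', 'scan_date',
    'scan_y', 'scan_x', 'min_level', 'max_level', 'image',
])


def parse_record(buf):
    """Unpack one 2052-byte record into a Record tuple."""
    if len(buf) != RECORD_SIZE:
        raise ValueError('expected %d bytes, got %d' % (RECORD_SIZE, len(buf)))
    return Record(*struct.unpack(RECORD_FORMAT, buf))


def unpack_pixels(image_bytes, width=64, height=63):
    """Expand packed 4-bit pixels into rows of integers in the range 0-15.

    Assumption: the high nibble of each byte holds the left-hand pixel.
    """
    flat = []
    for b in bytearray(image_bytes):
        flat.append(b >> 4)
        flat.append(b & 0x0F)
    return [flat[y * width:(y + 1) * width] for y in range(height)]
```

Checking `struct.calcsize(RECORD_FORMAT)` against 2052 is a quick way to confirm that the format string and the table agree.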

Contents

Contents of files:

Filename  Categories        # Categories  Sheets     # Sheets  # Records  Note
ETL1C-01  01234567          8             1001-2960  1445      11560
ETL1C-02  89ABCDEF          8             1001-2960  1445      11560
ETL1C-03  GHIJKLMN          8             1001-2960  1445      11560
ETL1C-04  OPQRSTUV          8             1001-2960  1445      11560
ETL1C-05  WXYZ\+-*          8             1001-2960  1445      11560
ETL1C-06  /=()・,?’          8             1001-2960  1445      11560
ETL1C-07  アイウエオカキク  8             1001-2960  1411      11288
ETL1C-08  ケコサシスセソタ  8             1001-2960  1411      11288
ETL1C-09  チツテトナニヌネ  8             1001-2960  1411      11287      ナ(NA) on Sheet 2672 is missing
ETL1C-10  ノハヒフヘホマミ  8             1001-2960  1411      11288
ETL1C-11  ムメモヤイユエヨ  8             1001-2960  1411      11288
ETL1C-12  ラリルレロワヰウ  8             1001-2960  1411      11287      リ(RI) on Sheet 2708 is missing
ETL1C-13  ヱヲン            3             1001-2960  1411      4233
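
The record counts in the table follow from categories × sheets, minus the two missing records noted for ETL1C-09 and ETL1C-12. A short consistency check, with the numbers copied from the table and variable names chosen for illustration:

```python
# (file name, number of categories, number of sheets, number of records)
FILES = [
    ('ETL1C-01', 8, 1445, 11560),
    ('ETL1C-02', 8, 1445, 11560),
    ('ETL1C-03', 8, 1445, 11560),
    ('ETL1C-04', 8, 1445, 11560),
    ('ETL1C-05', 8, 1445, 11560),
    ('ETL1C-06', 8, 1445, 11560),
    ('ETL1C-07', 8, 1411, 11288),
    ('ETL1C-08', 8, 1411, 11288),
    ('ETL1C-09', 8, 1411, 11287),
    ('ETL1C-10', 8, 1411, 11288),
    ('ETL1C-11', 8, 1411, 11288),
    ('ETL1C-12', 8, 1411, 11287),
    ('ETL1C-13', 3, 1411, 4233),
]
# One record is missing from each of these files, per the notes in the table.
MISSING = {'ETL1C-09': 1, 'ETL1C-12': 1}

for name, n_categories, n_sheets, n_records in FILES:
    expected = n_categories * n_sheets - MISSING.get(name, 0)
    assert expected == n_records, name

total_records = sum(row[3] for row in FILES)
total_categories = sum(row[1] for row in FILES)
print("records: %d, categories: %d" % (total_records, total_categories))
```

The totals agree with the 141319 samples and 99 characters stated earlier in the specification.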

List of available sheets:

1001-1026 1028-1149 1151-1243 1301-1306 1308-1316 1318-1355 1357 1360-1391 1393-1436 1438-1453 1455-1459 1461-1491 1501-1525 1527-1658 1660-1663 1665 1667-1695 1701-1766 1801-1837 1839-1884 2001-2019 2021-2025 2027-2153 2201-2391 2501-2696 2701-2744 2801-2802 *2803-2812 2813 *2814 2815-2817 *2818-2840 2901-2960

*: Katakana characters are missing
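
The sheet list can be expanded programmatically to check it against the sheet counts given for the files (1445 sheets overall, 1411 with Katakana). A minimal sketch; `expand` is a hypothetical helper written for this example:

```python
SHEETS = (
    "1001-1026 1028-1149 1151-1243 1301-1306 1308-1316 1318-1355 1357 "
    "1360-1391 1393-1436 1438-1453 1455-1459 1461-1491 1501-1525 1527-1658 "
    "1660-1663 1665 1667-1695 1701-1766 1801-1837 1839-1884 2001-2019 "
    "2021-2025 2027-2153 2201-2391 2501-2696 2701-2744 2801-2802 *2803-2812 "
    "2813 *2814 2815-2817 *2818-2840 2901-2960"
)


def expand(token):
    """Return (sheet numbers, starred flag) for a token such as '*2803-2812'."""
    starred = token.startswith('*')
    first, _, last = token.lstrip('*').partition('-')
    last = last or first          # a single sheet such as '1357'
    return list(range(int(first), int(last) + 1)), starred


all_sheets = []
katakana_sheets = []              # sheets not marked with '*'
for token in SHEETS.split():
    numbers, starred = expand(token)
    all_sheets.extend(numbers)
    if not starred:
        katakana_sheets.extend(numbers)

print("all sheets: %d" % len(all_sheets))
print("sheets with Katakana: %d" % len(katakana_sheets))
```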

Samples

Metadata and images of records drawn at random from several files. In the images, pixel value 0 is black and 15 is white.

Filename  Metadata                                                                                Image
ETL1C-01  (1282, '0 ', 2677, 48, 240, 0, 0, 1, 24, 1282, 3552, 42, 7308, 7401, 1, 0, 0, 0)        1282267730
ETL1C-03  (1319, 'G ', 2718, 71, 199, 0, 0, 1, 18, 1319, 3552, 42, 7308, 7402, 2, 6, 0, 0)        1319271847
ETL1C-04  (164, 'O ', 1166, 79, 214, 0, 0, 1, 30, 164, 9711, 121, 7309, 7311, 3, 4, 0, 0)         16411664f
ETL1C-05  (943, 'W ', 2229, 87, 230, 1, 0, 0, 0, 943, 3552, 42, 7308, 7312, 4, 2, 0, 0)           943222957
ETL1C-07  (317, ' A', 1381, 177, 129, 0, 0, 1, 32, 317, 5021, 151, 7309, 7310, 7, 0, 0, 0)        3171381b1
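
The last value in each sample row appears to be the image file name stem, built from the data index, the sheet index, and the JIS X 0201 code in hexadecimal. The sketch below reproduces that stem and decodes two metadata fields. The helper names are hypothetical, and using Python's `shift_jis` codec for single-byte JIS X 0201 codes is an assumption made for convenience (it returns half-width Katakana for codes 0xA1-0xDF).

```python
def image_stem(data_index, sheet_index, jis_code):
    """Concatenate data index, sheet index, and the JIS code in hex."""
    return "{:1d}{:4d}{:2x}".format(data_index, sheet_index, jis_code)


def jis_x0201_char(code):
    """Decode a single-byte code via the shift_jis codec (an assumption)."""
    return bytes(bytearray([code])).decode('shift_jis')


def year_month(yymm):
    """Split a (19)YYMM value such as 7308 into (year, month)."""
    return 1900 + yymm // 100, yymm % 100


# Values taken from the ETL1C-01 and ETL1C-07 sample rows.
print(image_stem(1282, 2677, 48))
print(image_stem(317, 1381, 177))
print(jis_x0201_char(48))
print(year_month(7308))
```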

Python code for extracting a record from a file (Python 2.7.5):

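import struct
from PIL import Image, ImageEnhance

RECORD_SIZE = 2052  # logical record length in bytes

filename = 'ETL1/ETL1C_01'
skip = 100  # number of records to skip before the one to extract
with open(filename, 'rb') as f:
    f.seek(skip * RECORD_SIZE)
    s = f.read(RECORD_SIZE)
    # Big-endian header fields, 2016 bytes of packed image data, 4 trailing bytes skipped.
    r = struct.unpack('>H2sH6BI4H4B4x2016s4x', s)
    # Decode the packed 4-bit pixels into a 64 x 63 image.
    iF = Image.frombytes('F', (64, 63), r[18], 'bit', 4)
    # Convert to 8-bit grayscale ('L'), which ImageEnhance handles reliably;
    # pixel values are still in the range 0-15 at this point.
    iL = iF.convert('L')
    # File name: data index, sheet index, JIS X 0201 code in hex.
    fn = "{:1d}{:4d}{:2x}.png".format(r[0], r[2], r[3])
    # Multiply brightness by 16 so that levels 0-15 become 0-240 and are visible.
    enhancer = ImageEnhance.Brightness(iL)
    iE = enhancer.enhance(16)
    iE.save(fn, 'PNG')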