Summary
ETL4 is a dataset of handwritten images of 51 Hiragana characters. The images were made from OCR sheets collected at Nagoya University and scanned at the Electrotechnical Laboratory with the TOSBAC-3400 scanning system in fiscal year 1974 (Showa 49).
Data Collection
OCR Sheet (same as ETL1)
- Sheet: B5, 90kg per 1000 sheets
- Dropout color: No. 26 Violet, 50% Screen (DNP)
- Frame size (width x height): 5mm x 7mm
- Frame pitch (width x height): 7.62mm x 12.7mm
- Number of frames: 10 x 12 = 120
Characters
- Hiragana: 51(あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやいゆえよらりるれろわゐうゑをん)
Data Collection
- Location: Nagoya University
- Instructions: 手書文字読取用紙記入上のお願い
- Templates: given
- Number of writers: 120
- Number of samples: 6,120
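The character list and the sample count can be cross-checked with a few lines of Python. This is only a sanity-check sketch: it assumes that each writer filled in every one of the 51 positions exactly once, which is consistent with the stated total but is not spelled out above. The variable names are illustrative.

```python
from collections import Counter

# Character list copied verbatim from the "Characters" entry.
HIRAGANA = "あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやいゆえよらりるれろわゐうゑをん"
NUM_WRITERS = 120

positions = len(HIRAGANA)                 # entries in the list, repeats included
counts = Counter(HIRAGANA)
distinct = len(counts)                    # unique characters
repeated = sorted(ch for ch, n in counts.items() if n > 1)

# Assumption: one sample per writer per list position.
expected_samples = NUM_WRITERS * positions

print(positions, distinct, repeated, expected_samples)
```

The list has 51 entries because い, う and え each appear twice (once in the あ row and again in the や and わ rows), so anyone building a label set should decide whether to treat the repeats as separate classes.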
Scanning System
- Scanner: Flying Spot Scanner (FSS) with a Flying Spot Cathode Ray Tube 5CNP16 and a Photomultiplier Tube 7696
- Interval: 0.133mm x 0.133mm
- Spot diameter: 0.1333mm
- Intensity levels: 16 (4bit)
- Number of pixels: 72 x 76 = 5,472 pixels
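As a rough illustration of what these scanner figures imply, the sketch below derives the physical size of the scanned window and the sampling density from the interval and the pixel counts. It uses only the numbers listed above; the variable names are illustrative, and the conversion to samples per inch is added here for convenience and is not part of the original specification.

```python
INTERVAL_MM = 0.133            # sampling interval, same in both axes
WIDTH_PX, HEIGHT_PX = 72, 76   # sampling points per axis
LEVELS = 16                    # intensity levels

# Physical extent covered by one character image.
width_mm = WIDTH_PX * INTERVAL_MM
height_mm = HEIGHT_PX * INTERVAL_MM

# Sampling density expressed in samples per inch (1 inch = 25.4 mm).
samples_per_inch = 25.4 / INTERVAL_MM

# Bits needed to store one pixel with the given number of levels.
bits_per_pixel = (LEVELS - 1).bit_length()

print(f"{width_mm:.3f} mm x {height_mm:.3f} mm")
print(f"{samples_per_inch:.1f} samples per inch, {bits_per_pixel} bits per pixel")
```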
Compilation
- Location: Electrotechnical Laboratory (ETL)
- Computer: TOSBAC-3400/41
- Software: FSSTOMT
- Date of Compilation: Dec. 1974
- Date of Scanning: Dec. 1974
Format
- C-Type Data Format (ETL3, ETL4, ETL5)
- Fixed Record Length without Control Words
- Logical record length is 3936 bytes (6 bits / byte) or 2952 octets (1 octet = 8 bits)
- Big endian
- File formats and sample script
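The record arithmetic above can be turned into a minimal reader sketch. It is not the official sample script. It assumes that the 72 x 76 image at 4 bits per pixel occupies the final 2,736 octets of each record, that the remaining 216 octets are header, and that pixels are packed two per octet with the high nibble first. None of these layout details is stated in this section, so check them against the format description before relying on them. The helper names `iter_records` and `unpack_image` are hypothetical.

```python
from typing import Iterator, List

RECORD_SIXBIT_BYTES = 3936                           # logical record length, 6-bit bytes
RECORD_OCTETS = RECORD_SIXBIT_BYTES * 6 // 8         # 2952 octets
WIDTH, HEIGHT, BITS_PER_PIXEL = 72, 76, 4
IMAGE_OCTETS = WIDTH * HEIGHT * BITS_PER_PIXEL // 8  # 2736 octets
HEADER_OCTETS = RECORD_OCTETS - IMAGE_OCTETS         # 216 octets (assumed to come first)


def iter_records(data: bytes) -> Iterator[bytes]:
    """Split a buffer into fixed-length records (no control words)."""
    if len(data) % RECORD_OCTETS:
        raise ValueError("buffer length is not a multiple of the record length")
    for offset in range(0, len(data), RECORD_OCTETS):
        yield data[offset:offset + RECORD_OCTETS]


def unpack_image(record: bytes) -> List[List[int]]:
    """Return HEIGHT rows of WIDTH pixel values in the range 0..15.

    Assumes the image is the tail of the record and that each octet
    holds two pixels, high nibble first.
    """
    packed = record[HEADER_OCTETS:]
    pixels: List[int] = []
    for octet in packed:
        pixels.append(octet >> 4)
        pixels.append(octet & 0x0F)
    return [pixels[r * WIDTH:(r + 1) * WIDTH] for r in range(HEIGHT)]


# Demonstration on synthetic data: a zero header followed by 0x1F octets.
fake_record = bytes(HEADER_OCTETS) + bytes([0x1F]) * IMAGE_OCTETS
image = unpack_image(next(iter_records(fake_record)))
print(len(image), len(image[0]), image[0][:4])
```

To try it on real data, read the dataset file in binary mode and pass the resulting bytes to `iter_records`. A full file should yield 6,120 records if the layout assumptions hold.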
Sample
Record 0 (metadata only):
- Serial Data Number: 500100
- Serial Sheet Number: 5001
- JIS Code: 0xb1
- EBCDIC Code: 0x81
- 4 Character Code: H A
- Evaluation of Individual Character Image: 0
- Evaluation of Character Group: 0
- Sample Position Y on Sheet: 1
- Sample Position X on Sheet: 0
- Male-Female Code: 1
- Age of Writer: 23
- Industry Classification Code: 9144
- Occupation Classification Code: 11
- Sheet Gathering Date: 741202
- Scanning Date: 741216
- Number of X-Axis Sampling Points: 72
- Number of Y-Axis Sampling Points: 76
- Number of Levels of Pixel: 16
- Magnification of Scanning Lens: 133
- Serial Data Number (old): 0
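The two date fields in the sample record (741202 and 741216) look like YYMMDD values, which would match the December 1974 dates given under Compilation. The sketch below decodes them on that assumption. The encoding is inferred from the values, not documented here, and `parse_yymmdd` is a hypothetical helper.

```python
from datetime import date


def parse_yymmdd(code: int, century: int = 1900) -> date:
    """Interpret a six-digit integer as YYMMDD (assumed encoding)."""
    yy, rest = divmod(code, 10_000)
    mm, dd = divmod(rest, 100)
    return date(century + yy, mm, dd)


gathered = parse_yymmdd(741202)   # Sheet Gathering Date from the sample record
scanned = parse_yymmdd(741216)    # Scanning Date from the sample record

print(gathered.isoformat(), scanned.isoformat(), (scanned - gathered).days)
```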