Summary
ETL7 was compiled from OCR sheets on which 175 people (OCR users, people from companies and universities, and public officials) wrote 48 Hiragana characters. The sheets were scanned with a TOSBAC-40C system at the Electrotechnical Laboratory in 1977.
Data Collection
OCR Sheet
- OCR Sheet: A4, 100 kg per 1000 sheets (custom order)
- Dropout color: No. 114 Reddish Orange 50% Screen (DNP)
- Frame size (width x height): Large 6.0 mm x 7.2 mm / Small 5.0 mm x 6.0 mm
- Frame pitch (width x height): Large 8.47 mm x 11.0 mm / Small 6.35 mm x 12.7 mm
- Number of frames: Large 20 x 20 = 400 / Small 26 x 17 = 442
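As a quick check on the sheet geometry, the frame pitch and frame counts above imply the approximate footprint of each frame grid. The sketch below assumes that the pitch is the centre-to-centre spacing and that the frame counts are given as columns x rows; neither is stated explicitly above. It compares the result with A4 (210 mm x 297 mm). All names are illustrative.

```python
# Frame grid footprint implied by the pitch and frame counts listed above.
# Assumptions: pitch is centre-to-centre spacing, counts are columns x rows.
A4_MM = (210.0, 297.0)  # A4 width x height in mm

LAYOUTS = {
    # name: ((pitch_w, pitch_h) in mm, (columns, rows))
    "large": ((8.47, 11.0), (20, 20)),
    "small": ((6.35, 12.7), (26, 17)),
}

def grid_footprint(pitch, counts):
    """Return the approximate (width, height) in mm covered by the frame grid."""
    return (round(pitch[0] * counts[0], 2), round(pitch[1] * counts[1], 2))

def fits_a4(footprint):
    """True if the footprint fits inside an A4 sheet in portrait orientation."""
    return footprint[0] <= A4_MM[0] and footprint[1] <= A4_MM[1]

for name, (pitch, counts) in LAYOUTS.items():
    footprint = grid_footprint(pitch, counts)
    frames = counts[0] * counts[1]
    print(f"{name}: {frames} frames, {footprint[0]} mm x {footprint[1]} mm, "
          f"fits A4: {fits_a4(footprint)}")
```

Under these assumptions both grids fit within a portrait A4 sheet with margins to spare.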
Characters
- Hiragana: 46 (あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをん)
- Dakuten (voiced mark: “゛”)
- Handakuten (semi-voiced mark: “゜”)
Data Collection
- Instructions: 手書文字読取用紙記入上のお願い ("Requests concerning filling in the handwritten character reading sheet")
- Number of writers: 175
- Number of samples: 16,800 (Large 8,400 / Small 8,400)
Scanning System
- Scanner: VIDICON
- Filter: sharp cut filter 620 nm (JIS B 7113 R-62)
- Interval: Large 0.13 mm x 0.13 mm / Small 0.11 mm x 0.11 mm
- Intensity levels: 16 (4 bits)
- Number of pixels: 64 x 63 = 4,032 pixels
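The figures above imply that one character image occupies 64 x 63 x 4 bits = 2,016 bytes. A minimal sketch of unpacking such a buffer follows. It assumes two pixels per byte with the high nibble first and rows stored top to bottom, which the list above does not state, and `unpack_4bit_image` is a hypothetical helper name.

```python
WIDTH, HEIGHT, BITS = 64, 63, 4
IMAGE_BYTES = WIDTH * HEIGHT * BITS // 8  # 2016 bytes per image

def unpack_4bit_image(buf, width=WIDTH, height=HEIGHT):
    """Expand packed 4-bit pixels into a list of rows of ints in 0..15.

    Assumption: the high nibble is the first pixel of each pair and
    rows are stored top to bottom.
    """
    expected = width * height // 2
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    pixels = []
    for byte in buf:
        pixels.append(byte >> 4)    # first pixel of the pair
        pixels.append(byte & 0x0F)  # second pixel of the pair
    return [pixels[r * width:(r + 1) * width] for r in range(height)]

# Synthetic buffer for illustration: every byte is 0xF0,
# so pixels alternate between level 15 and level 0.
demo = unpack_4bit_image(bytes([0xF0]) * IMAGE_BYTES)
print(len(demo), len(demo[0]), demo[0][:4])
```

Pure Python keeps the sketch dependency-free; for bulk processing an array library would be the more practical choice.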
Compilation
- Scanning Location: Electrotechnical Laboratory (ETL)
- Computer: TOSBAC-40C
- Compilation Date: Aug. 1977
- Scanning Date: Aug. 1977
Contents
Format
- The data format is the same as ETL1. Please refer to the ETL1 documentation for sample code.
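For orientation, here is a hedged sketch of walking such a file record by record. It assumes the ETL1 layout as commonly documented: fixed 2,052-byte records with the 2,016-byte image starting at byte offset 32. Neither figure appears in this section, so check both against the ETL1 specification before relying on them. `iter_records`, `image_bytes` and the constants are hypothetical names.

```python
import io

RECORD_SIZE = 2052   # assumed ETL1 record length in bytes (not stated in this section)
IMAGE_OFFSET = 32    # assumed offset of the image data within a record
IMAGE_BYTES = 2016   # 64 x 63 pixels at 4 bits per pixel

def iter_records(stream, record_size=RECORD_SIZE):
    """Yield successive fixed-size records from a binary stream."""
    while True:
        record = stream.read(record_size)
        if not record:
            return
        if len(record) != record_size:
            raise ValueError("truncated record at end of file")
        yield record

def image_bytes(record):
    """Slice the packed image out of one record, under the assumed layout."""
    return record[IMAGE_OFFSET:IMAGE_OFFSET + IMAGE_BYTES]

# Demonstration on synthetic data; for a real file, pass open("ETL7LC_1", "rb").
fake_file = io.BytesIO(bytes(RECORD_SIZE * 3))
records = list(iter_records(fake_file))
print(len(records), len(image_bytes(records[0])))
```

Reading fixed-size chunks from a stream avoids loading a whole file into memory and fails loudly if the file length is not a multiple of the assumed record size.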
Files
| Filename | Records | Contents |
|---|---|---|
| ETL7LC_1 | 9,600 | 48 characters (あ to ん, ゛ and ゜) written in large frames, 200 records per character |
| ETL7LC_2 | 7,200 | 48 characters (あ to ん, ゛ and ゜) written in large frames, 150 records per character |
| ETL7SC_1 | 9,600 | 48 characters (あ to ん, ゛ and ゜) written in small frames, 200 records per character |
| ETL7SC_2 | 7,200 | 48 characters (あ to ん, ゛ and ゜) written in small frames, 150 records per character |
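The record counts in the table follow from 48 characters times the records per character. A small check using only the numbers in the table is sketched below; the dictionary and function names are illustrative.

```python
CHARACTERS = 48  # 46 hiragana plus dakuten and handakuten

# filename: (records listed in the table, records per character)
FILES = {
    "ETL7LC_1": (9600, 200),
    "ETL7LC_2": (7200, 150),
    "ETL7SC_1": (9600, 200),
    "ETL7SC_2": (7200, 150),
}

def total_records(prefix):
    """Sum the listed record counts of all files whose name starts with prefix."""
    return sum(count for name, (count, _) in FILES.items() if name.startswith(prefix))

# Each listed count should equal 48 characters x records per character.
for name, (count, per_char) in FILES.items():
    assert count == CHARACTERS * per_char, name

print("large:", total_records("ETL7LC"), "small:", total_records("ETL7SC"))
```

Each frame size therefore totals 16,800 records across its two files, or 350 records per character.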