Specification of ETL7

Summary

ETL7 was compiled from OCR sheets containing 48 handwritten Hiragana characters (46 Hiragana plus the dakuten and handakuten marks). The sheets were written by 175 people (OCR users, members of companies and universities, and public officials) and were scanned with the TOSBAC-40C system at the Electrotechnical Laboratory in 1977.


Data Collection

OCR Sheet

  • OCR Sheet: A4, 100 kg per 1000 sheets (custom order)
  • Dropout color: No. 114 Reddish Orange 50% Screen (DNP)
  • Frame size (width x height): Large 6.0 mm x 7.2 mm / Small 5.0 mm x 6.0 mm
  • Frame pitch (width x height): Large 8.47 mm x 11.0 mm / Small 6.35 mm x 12.7 mm
  • Number of frames: Large 20 x 20 = 400 / Small 26 x 17 = 442
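As a quick consistency check on the figures above, the following Python sketch recomputes the frame counts and the area spanned by each frame grid. It assumes that the frame counts are given as columns x rows (matching the width x height order of the other entries) and that the sheet is standard A4 (210 mm x 297 mm); the names used are illustrative and not part of the specification.

```python
# Figures copied from the OCR Sheet list; all names here are illustrative.
A4_MM = (210.0, 297.0)  # assumption: standard A4 width x height in mm

LAYOUTS = {
    # pitch is (width, height) in mm; grid is assumed to be (columns, rows)
    "large": {"pitch": (8.47, 11.0), "grid": (20, 20)},
    "small": {"pitch": (6.35, 12.7), "grid": (26, 17)},
}

def frame_count(layout):
    """Number of frames on one sheet."""
    cols, rows = layout["grid"]
    return cols * rows

def grid_extent_mm(layout):
    """Upper bound on the grid area: pitch times number of frames per axis."""
    (pitch_w, pitch_h), (cols, rows) = layout["pitch"], layout["grid"]
    return (pitch_w * cols, pitch_h * rows)

for name, layout in LAYOUTS.items():
    width, height = grid_extent_mm(layout)
    fits = width <= A4_MM[0] and height <= A4_MM[1]
    print(f"{name}: {frame_count(layout)} frames, "
          f"about {width:.1f} mm x {height:.1f} mm, fits A4: {fits}")
```

Under the columns x rows reading, both grids stay within the A4 sheet, which is one reason to prefer that interpretation.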

Characters

  • Hiragana: 46 (あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをん)
  • Dakuten (voiced mark: “゛”)
  • Handakuten (semi-voiced mark: “゜”)
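For downstream use, the 48 classes listed above can be collected into a label table. The sketch below is one possible arrangement: the index order (Hiragana in the listed order, then the two marks) and the use of the spacing code points U+309B and U+309C for the marks are assumptions, not something the specification defines.

```python
# The 46 Hiragana exactly as listed in the specification.
HIRAGANA = "あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをん"

# Assumption: the two marks are represented by their spacing forms.
MARKS = ["\u309b", "\u309c"]  # dakuten, handakuten

# Assumed class order: Hiragana first, then the marks.
LABELS = list(HIRAGANA) + MARKS
LABEL_TO_INDEX = {ch: i for i, ch in enumerate(LABELS)}

print(len(LABELS))
```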

Data Collection

Scanning System

  • Scanner: VIDICON
  • Filter: sharp cut filter 620 nm (JIS B 7113 R-62)
  • Interval: Large 0.13 mm x 0.13 mm / Small 0.11 mm x 0.11 mm
  • Intensity levels: 16 (4 bits)
  • Number of pixels: 64 x 63 = 4,032 pixels
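The figures above imply 4,032 pixels at 4 bits each, that is 2,016 bytes per image if two pixels are packed into one byte. The sketch below unpacks such a buffer into rows of intensity values from 0 to 15. The packing order (high nibble first) and the row-major layout are assumptions; this section only states the pixel count and the bit depth.

```python
WIDTH, HEIGHT, BITS = 64, 63, 4
PACKED_BYTES = WIDTH * HEIGHT * BITS // 8  # 4,032 pixels * 4 bits = 2,016 bytes

def unpack_4bit(buf, width=WIDTH, height=HEIGHT):
    """Unpack a 4-bit packed image into a list of rows of ints in 0..15.

    Assumes two pixels per byte, high nibble first, rows stored top to bottom.
    """
    if len(buf) * 2 != width * height:
        raise ValueError(f"expected {width * height // 2} bytes, got {len(buf)}")
    pixels = []
    for byte in buf:
        pixels.append(byte >> 4)    # first pixel: high nibble
        pixels.append(byte & 0x0F)  # second pixel: low nibble
    return [pixels[r * width:(r + 1) * width] for r in range(height)]

# Synthetic example: first byte 0x1F, all remaining pixels zero.
rows = unpack_4bit(bytes([0x1F]) + bytes(PACKED_BYTES - 1))
print(len(rows), len(rows[0]), rows[0][:2])
```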

Compilation

  • Scanning Location: Electrotechnical Laboratory (ETL)
  • Computer: TOSBAC-40C
  • Compilation Date: Aug. 1977
  • Scanning Date: Aug. 1977

Contents

Format

  • The data format is the same as ETL1. Please refer to the ETL1 specification for the record layout and sample code.
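The ETL1 layout is not reproduced in this document, so the following is only a rough sketch of a record reader. It assumes that an ETL1-style record is 2,052 bytes, big-endian, with a 32-byte header, a 2,016-byte packed image, and 4 trailing bytes; the format string and the field positions below are assumptions that should be checked against the ETL1 specification before use.

```python
import io
import struct

# Assumed ETL1-style record layout (2,052 bytes, big-endian).
# Verify every field against the ETL1 specification before relying on it.
RECORD = struct.Struct(">H2sH6BI4H4B4x2016s4x")

def read_records(stream):
    """Yield (char_code, jis_code, image_bytes) for each fixed-size record."""
    while True:
        raw = stream.read(RECORD.size)
        if not raw:
            return
        if len(raw) != RECORD.size:
            raise ValueError("truncated record")
        fields = RECORD.unpack(raw)
        # Assumed positions: fields[1] = 2-byte character code,
        # fields[3] = JIS code, fields[-1] = packed 4-bit image.
        yield fields[1].decode("ascii", errors="replace"), fields[3], fields[-1]

# In-memory demonstration with two synthetic records (not real ETL7 data).
demo = RECORD.pack(1, b" A", 1, 0xB1, 0, 0, 0, 0, 0, 1,
                   0, 0, 0, 0, 0, 0, 0, 0, bytes(2016))
records = list(read_records(io.BytesIO(demo * 2)))
print(len(records), repr(records[0][0]), hex(records[0][1]))
```

With a real file, the same generator could be fed an object returned by open("ETL7LC_1", "rb"), provided the assumed layout holds.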

Files

filename    # records   contents
ETL7LC_1    9600        48 characters (あ to ん, ゛, ゜) written in large frames, 200 records per character
ETL7LC_2    7200        48 characters (あ to ん, ゛, ゜) written in large frames, 150 records per character
ETL7SC_1    9600        48 characters (あ to ん, ゛, ゜) written in small frames, 200 records per character
ETL7SC_2    7200        48 characters (あ to ん, ゛, ゜) written in small frames, 150 records per character
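The record counts in the table follow from 48 characters times the records per character. The sketch below checks that arithmetic and derives the total, along with an expected file size. The 2,052-byte record size is an assumption carried over from the ETL1 format and is not stated in this document.

```python
NUM_CHARACTERS = 48
RECORD_BYTES = 2052  # assumption: ETL1-style record size, not stated here

# filename -> (records listed in the table, records per character)
FILES = {
    "ETL7LC_1": (9600, 200),
    "ETL7LC_2": (7200, 150),
    "ETL7SC_1": (9600, 200),
    "ETL7SC_2": (7200, 150),
}

def expected_size(name):
    """Expected file size in bytes under the assumed record size."""
    return FILES[name][0] * RECORD_BYTES

for name, (records, per_char) in FILES.items():
    consistent = records == NUM_CHARACTERS * per_char
    print(f"{name}: {records} records, consistent: {consistent}, "
          f"expected size: {expected_size(name)} bytes")

total_records = sum(records for records, _ in FILES.values())
print("total records:", total_records)
```

Comparing expected_size() with the actual size on disk is a cheap way to detect a truncated download or a wrong record-size assumption.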