All data is stored in binary format, which is derived from the magnetic tape on which the data was originally recorded. All records in a file have the same fixed length, with no control sequences between them. The storage unit is 8 bits per byte, except for ETL2-5, whose storage unit is 6 bits per byte. Bit order is big endian. There are 7 different formats, with several character codecs depending on the dataset:
- M-type (ETL1, ETL6, ETL7), codecs: JIS X 0201, extended EBCDIC
- K-type (ETL2), codecs: CO-59, T56
- C-type (ETL3, ETL4, ETL5), codecs: JIS X 0201, extended EBCDIC, T56
- B-type (ETL8B), codecs: JIS X 0208
- G-type (ETL8G), codecs: JIS X 0208
- B-type (ETL9B), codecs: JIS X 0208
- G-type (ETL9G), codecs: JIS X 0208
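The fixed-length record layout and the 6-bit storage unit described above can be illustrated with a minimal Python sketch. The function names `iter_records` and `unpack_6bit` are hypothetical, and the record length is left as a parameter because it depends on the format and is not stated here; this is not the code of the provided unpacking script.

```python
from typing import Iterator, List


def iter_records(path: str, record_size: int) -> Iterator[bytes]:
    """Yield consecutive fixed-length records from a binary file.

    record_size is an assumption supplied by the caller; take it from the
    description of the specific format (M, K, C, B or G type).
    """
    with open(path, "rb") as f:
        while True:
            record = f.read(record_size)
            if len(record) < record_size:
                break  # end of file, or a truncated trailing record
            yield record


def unpack_6bit(data: bytes) -> List[int]:
    """Split a byte string into 6-bit values, most significant bit first."""
    total_bits = len(data) * 8
    n_units = total_bits // 6
    bits = int.from_bytes(data, "big")
    bits >>= total_bits - n_units * 6  # drop trailing bits that do not fill a unit
    return [(bits >> (6 * (n_units - 1 - i))) & 0x3F for i in range(n_units)]
```

For example, the three bytes `0x04 0x20 0xC4` hold the four 6-bit values 1, 2, 3 and 4. Because there are no control sequences between records, reading a file amounts to slicing it at multiples of the record length.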
For convenience in browsing the contents, a sample Python script for unpacking is provided.
Download unpack.zip and extract its two files (unpack.py and euc_co59.dat) into the same folder. Install the necessary dependencies, then run the script, for example for the file ETL1/ETL1C_01:
python3 unpack.py ETL1/ETL1C_01
This script includes functions for extracting images and metadata with character code conversion. The command shown above should produce several sets of 50-by-40 tiled images (ETL1/ETL1C_01_??.png), character arrays corresponding to the images (ETL1/ETL1C_01_??.txt), and a CSV file of the metadata (ETL1/ETL1C_01.csv) in the same folder as the specified file (ETL1/ETL1C_01). The field names are kept as in the original description of the formats. This script is written in and tested with Python 3.7.
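As an illustration of what character code conversion involves for the JIS X 0208 and JIS X 0201 codecs, the following sketch maps a numeric character code to a Unicode character using Python's built-in `euc_jp` and `shift_jis` codecs. The function names are hypothetical, and the sketch assumes the code has already been read from a record as an integer; it is not necessarily how the provided script performs the conversion, and it does not cover CO-59, T56 or extended EBCDIC.

```python
def decode_jis_x_0208(code: int) -> str:
    """Convert a 16-bit JIS X 0208 code (e.g. 0x2422) to a Unicode character."""
    high, low = code >> 8, code & 0xFF
    # EUC-JP represents a JIS X 0208 character by setting the top bit of each byte.
    return bytes([high | 0x80, low | 0x80]).decode("euc_jp")


def decode_jis_x_0201(code: int) -> str:
    """Convert an 8-bit JIS X 0201 code to a Unicode character.

    Shift_JIS keeps the JIS X 0201 katakana range (0xA1-0xDF) as single bytes.
    Caveat: JIS X 0201 defines 0x5C as the yen sign and 0x7E as the overline,
    which a generic codec may map to the ASCII characters instead.
    """
    return bytes([code]).decode("shift_jis")


if __name__ == "__main__":
    print(decode_jis_x_0208(0x2422))  # hiragana "a"
    print(decode_jis_x_0201(0xB1))    # half-width katakana "a"
```

Relying on the standard codecs avoids shipping a lookup table for these two character sets; a codec that Python does not provide would need its own mapping table.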
There is a user project called etlcdb-image-extractor.