File Formats
All data is stored in binary format. The format is derived from the magnetic tape on which the data was originally recorded. All records in a file have the same fixed length, with no control sequences between them. The storage unit is 8 bits per byte, except for ETL2-5, whose storage unit is 6 bits per byte. Bit order is big-endian. There are 7 different formats, with several character codecs depending on the dataset.
- M-type (ETL1, ETL6, ETL7), codecs: JIS X 0201, extended EBCDIC
- K-type (ETL2), codecs: CO-59, T56
- C-type (ETL3, ETL4, ETL5), codecs: JIS X 0201, extended EBCDIC, T56
- B-type (ETL8B), codecs: JIS X 0208
- G-type (ETL8G), codecs: JIS X 0208
- B-type (ETL9B), codecs: JIS X 0208
- G-type (ETL9G), codecs: JIS X 0208
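The fixed-length record layout and the 6-bit storage unit described above can be handled with a few lines of Python. The following is a minimal sketch and is not taken from the provided unpack.py: the helper names iter_records and unpack_6bit are hypothetical, the record size is left as a parameter because it depends on the format, and the 6-bit units are assumed to be packed back to back, most significant bit first.

```python
import io


def iter_records(stream, record_size):
    """Yield consecutive fixed-length records from a binary stream."""
    while True:
        record = stream.read(record_size)
        if len(record) < record_size:
            break  # end of data, or a truncated trailing record
        yield record


def unpack_6bit(data):
    """Split a bytes object into 6-bit units, most significant bit first.

    Assumption: units are packed contiguously; leftover bits at the end
    that do not fill a whole 6-bit unit are discarded.
    """
    total_bits = len(data) * 8
    count = total_bits // 6
    bits = int.from_bytes(data, "big") >> (total_bits - count * 6)
    return [(bits >> (6 * (count - 1 - i))) & 0x3F for i in range(count)]


# Small in-memory demonstration with made-up data.
demo = io.BytesIO(b"abcdefgh")
print(list(iter_records(demo, 3)))
print(unpack_6bit(bytes([0b00000100, 0b00100000, 0b11000100])))
```

For a real file, the stream would be opened in binary mode (for example with open(path, "rb")) and record_size taken from the description of the corresponding format.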
Sample Script
To make the contents easier to browse, a sample Python script for unpacking the files is provided.
Download this file (unpack.zip) and extract the two files (unpack.py, euc_co59.dat) into the same folder. Install the necessary dependencies, then run the script, for example on the file ETL1/ETL1C_01:
python3 unpack.py ETL1/ETL1C_01
This script includes functions for extracting images and metadata with character code conversion. The command shown above should produce several sets of 50-by-40 tiled images (ETL1/ETL1C_01_??.png), character arrays corresponding to the images (ETL1/ETL1C_01_??.txt), and a CSV file of the metadata (ETL1/ETL1C_01.csv) in the same folder as the specified file (ETL1/ETL1C_01). The field names follow the original descriptions of the formats. The script is written in and tested with Python 3.7.
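As a rough illustration of how the generated metadata might be inspected afterwards, the sketch below loads a CSV with Python's standard csv module. The function name load_metadata and the sample column names are hypothetical. The sketch assumes the CSV starts with a header row, and the real column names come from the format descriptions.

```python
import csv
import io


def load_metadata(stream):
    """Read a CSV with a header row; return (field names, list of row dicts)."""
    reader = csv.DictReader(stream)
    rows = list(reader)
    return reader.fieldnames, rows


# In-memory stand-in for a metadata file; these columns are made up.
sample = io.StringIO("Field A,Field B\n1,x\n2,y\n")
fields, rows = load_metadata(sample)
print(fields)
print(len(rows))
```

For an actual output file, the stream could be opened with open("ETL1/ETL1C_01.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and should be checked against the file the script produces.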
There is a user project called etlcdb-image-extractor.