File Formats and Sample Script

File Formats

All data is stored in binary format. The format is derived from the magnetic tapes on which the data was originally recorded. All records in a file have the same fixed length, with no control sequences between them. The storage unit is 8 bits per byte, except for ETL2 through ETL5, whose storage unit is 6 bits per byte. The bit order is big-endian. There are 7 different formats, with several character codecs depending on the dataset.

M-type (ETL1, ETL6, ETL7), codecs: JIS X 0201, extended EBCDIC
K-type (ETL2), codecs: CO-59, T56
C-type (ETL3, ETL4, ETL5), codecs: JIS X 0201, extended EBCDIC, T56
B-type (ETL8B), codecs: JIS X 0208
G-type (ETL8G), codecs: JIS X 0208
B-type (ETL9B), codecs: JIS X 0208
G-type (ETL9G), codecs: JIS X 0208
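
As a rough illustration of the layout described above, the following Python sketch walks through a file in fixed-length records and splits a byte string into 6-bit units. It is not taken from unpack.py: the names iter_records and unpack_6bit are hypothetical, the record size must be looked up in the format description of the dataset in question, and the sketch assumes that "big-endian bit order" means the most significant bit of each unit comes first.

```python
from typing import Iterator, List


def iter_records(path: str, record_size: int) -> Iterator[bytes]:
    """Yield consecutive fixed-length records from a binary file.

    record_size is an assumption supplied by the caller; it differs
    between datasets and is not defined in this sketch.
    """
    with open(path, "rb") as f:
        while True:
            record = f.read(record_size)
            if len(record) < record_size:
                break  # end of file; a trailing partial record is ignored
            yield record


def unpack_6bit(data: bytes) -> List[int]:
    """Split a byte string into 6-bit values, most significant bit first.

    Leftover bits at the end (fewer than 6) are dropped.
    """
    n_bits = len(data) * 8
    as_int = int.from_bytes(data, "big")
    n_units = n_bits // 6
    first_shift = n_bits - 6
    return [(as_int >> (first_shift - 6 * i)) & 0x3F for i in range(n_units)]
```

For instance, unpack_6bit(bytes([0x04, 0x20, 0xC4])) splits the 24 bits into the four 6-bit values 1, 2, 3 and 4. Mapping such values to characters would still require the codec tables (for example CO-59 or T56), which this sketch does not cover.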

Sample Script

To make browsing the contents easier, a sample Python script for unpacking the files is provided.

Download this file (unpack.zip) and extract the two files (unpack.py, euc_co59.dat) into the same folder. Install the necessary dependencies, then run the script. For example, for the file ETL1/ETL1C_01:
python3 unpack.py ETL1/ETL1C_01

This script includes functions for extracting images and metadata with character code conversion. Running the command shown above should produce several sets of 50-by-40 tiled images (ETL1/ETL1C_01_??.png), character arrays corresponding to the images (ETL1/ETL1C_01_??.txt), and a CSV file of the metadata (ETL1/ETL1C_01.csv) in the same folder as the specified file (ETL1/ETL1C_01). The field names are kept as they appear in the original format descriptions. The script is written in and tested with Python 3.7.
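
Once the script has run, the metadata CSV can be inspected with the Python standard library. The sketch below is only an illustration: load_metadata is a hypothetical helper, and it assumes that the CSV starts with a header row of field names and is encoded as UTF-8, neither of which is stated here. Adjust the encoding argument if the file turns out to use something else.

```python
import csv
from typing import Dict, List


def load_metadata(csv_path: str, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Read a metadata CSV into a list of dicts keyed by the header row.

    Assumes the first row holds the field names (an assumption, not
    something confirmed by the dataset documentation).
    """
    with open(csv_path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    rows = load_metadata("ETL1/ETL1C_01.csv")
    print(len(rows), "records")
    if rows:
        print("fields:", list(rows[0].keys()))
```

Reading every value as a string keeps the sketch independent of the actual field names and types; numeric fields can be converted afterwards as needed.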

There is also a user-contributed project called etlcdb-image-extractor.