Reads a table from a .gel file.

Parameters:
The GelFile Reader action allows you to read a table from a “.gel” file. The “.gel” file format is used to store table data, and this action can handle multiple files containing similar structures. You can specify the filename using relative paths, wildcards, or JavaScript expressions to dynamically locate the file.
You can connect a table containing (many) filenames to the input pin of the GelFile Reader. Typically, this input table is computed using the fileListFromObsDate action. ETL then reads all the corresponding “.gel” files one after the other (this is more or less equivalent to the Append action). There is one limitation: all the “.gel” files read in this way must have exactly the same meta-data.
By default, when a user leaves the editor, cache files are removed. In Global Parameters, there’s a toggle to keep these files, but note they can become very large. The editor also includes a Clean cache button to remove them manually.
NOTE:
You can drag & drop a .gel file from your local machine into an ETL Pipeline window: this directly creates the corresponding ReadGel action inside the ETL pipeline.
Here is an example:

A “.gel” file contains:
The data inside a “.gel” file is compressed using a block-based compression algorithm. This means that the process used to read a “.gel” file follows these steps:

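The block-based read process can be sketched as follows. This is a minimal illustration only (not the actual “.gel” implementation), assuming a hypothetical layout where each data block is stored as a 4-byte length header followed by a zlib-compressed payload:

```python
import io
import struct
import zlib

def write_blocks(stream, blocks):
    # Write each data block as: 4-byte little-endian length + zlib-compressed payload.
    for block in blocks:
        payload = zlib.compress(block)
        stream.write(struct.pack("<I", len(payload)))
        stream.write(payload)

def read_blocks(stream):
    # Read the file one compressed block at a time and decompress it.
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of file
        (size,) = struct.unpack("<I", header)
        yield zlib.decompress(stream.read(size))

buf = io.BytesIO()
write_blocks(buf, [b"row1;row2", b"row3;row4"])
buf.seek(0)
print([b.decode() for b in read_blocks(buf)])  # ['row1;row2', 'row3;row4']
```

The key point is that decompression happens one block at a time: only the current block needs to be held in RAM, which is why each open file costs roughly one block size of memory.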
You may sometimes be forced to open a large number of “.gel” files simultaneously (for example, when using the mergeSortInput action). Keep in mind that each opened “.gel” file uses around 1 MB of RAM by default (and even more if you set the “Read Buffer” parameter greater than one). Thus, to open one thousand “.gel” files simultaneously, you need at least 1 GB of RAM. This is already a lot of RAM on a small 32-bit server and might lead to crashes.
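The RAM footprint described above can be estimated as files × buffers × block size. The helper below is purely illustrative (the 1 MB default block size comes from the text; the assumption that each extra Read Buffer holds one more block per file is inferred from the paragraph above):

```python
def estimated_ram_mb(n_files, read_buffers=1, block_size_mb=1):
    # Each opened .gel file keeps `read_buffers` blocks of `block_size_mb` MB in RAM.
    return n_files * read_buffers * block_size_mb

print(estimated_ram_mb(1000))                   # 1000 files, defaults -> 1000 MB (~1 GB)
print(estimated_ram_mb(1000, read_buffers=2))   # doubling the buffers doubles the RAM
```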
Using a block size larger than 1 MB slightly improves both the compression ratio and the reading and writing speed.
The algorithm used to read a “.gel” file is the following:
In terms of speed, the above algorithm used to read the “.gel” file is not very efficient because it uses synchronous (i.e. blocking) I/O. That is, when the actions connected to the output of the GelFileReader action request more rows (i.e. when they have consumed all the rows from the current data block), the data transformation stops while ETL:
Only after these steps are completed does the transformation process resume and provide the next rows.
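The blocking behaviour can be sketched like this (a simplified illustration, not ETL's actual code): every request for a new block triggers the read and decompression work inline, so the consumer stalls for the duration of that work:

```python
import time

def blocking_reader(blocks):
    # Synchronous I/O: each time the consumer exhausts a block,
    # it must wait for the read + decompress work to finish inline.
    for block in blocks:
        time.sleep(0.01)   # simulate disk/network read + decompression
        yield from block   # only now does the consumer receive its rows

rows = list(blocking_reader([[1, 2], [3, 4]]))
print(rows)  # [1, 2, 3, 4]
```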
A better approach would be to use an asynchronous (i.e. non-blocking) I/O algorithm.
When you set the “Number of Read Buffers” parameter of the GelFileReader action to “2”, ETL switches to an asynchronous (non-blocking) I/O mode that uses multiple threads in parallel.
In this mode, all extraction, validation, and decompression tasks are handled continuously in a background thread. As a result, when the actions connected to the output of the GelFileReader action request more rows (e.g. from the next data block), those rows are already available in RAM, ready to be consumed. The transformation process no longer has to wait for the data to be prepared.
The background thread continuously produces new rows in advance. These rows are stored in one of the available Read Buffers. The number of Read Buffers can be configured here:

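The asynchronous mode can be sketched with a background thread and a bounded queue playing the role of the N Read Buffers (an illustration of the double-buffering technique, not ETL's internals):

```python
import queue
import threading

def async_reader(blocks, n_buffers=2):
    # The bounded queue acts as the pool of Read Buffers: the background
    # thread keeps it full while the consumer drains already-prepared rows.
    buffers = queue.Queue(maxsize=n_buffers)
    SENTINEL = object()

    def producer():
        for block in blocks:
            buffers.put(block)   # blocks only when all N buffers are full
        buffers.put(SENTINEL)    # signal end of file

    threading.Thread(target=producer, daemon=True).start()
    while True:
        block = buffers.get()
        if block is SENTINEL:
            return
        yield from block         # rows are already available in RAM

rows = list(async_reader([[1, 2], [3, 4], [5, 6]], n_buffers=3))
print(rows)  # [1, 2, 3, 4, 5, 6]
```

Because the producer thread refills the buffers while the consumer works, a momentary stall on the producer side (e.g. a slow disk read) does not immediately stall the consumer.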
When the parameter “Number of Read Buffers” is:
“1”: ETL uses the synchronous (blocking) I/O algorithm described above.
“2”: ETL switches to the asynchronous (non-blocking) I/O mode: a background thread fills one buffer while the transformation consumes the other.
“3 and higher”: ETL preloads even more data blocks in advance, which helps absorb temporary slowdowns of the data source.
A large value for the “Number of Read Buffers” parameter helps compensate for brief drops in network performance.
For example, if the remote drive stops responding momentarily due to a network issue, the actions connected to the output of the GelFileReader action can continue processing using the rows already preloaded in the N Read Buffers (with N set to a large number).
Once the network resumes, the background thread quickly refills the N buffers. This way, even if another network slowdown occurs, the data transformation can continue without interruption.
Setting a high value for the number of Read Buffers enables ETL to maintain high-speed data transformations, even over unstable network connections.
