TPC-H benchmark raw tables.

Parameters:
- Benchmark: Select the type of benchmark data to generate:
• TPC-H
• TPC-H conversion tables
• Sort benchmark
- Scale Factor (for TPC-H): Defines the data volume in GB (e.g., 1 GB).
- Table(s) to generate: Select the specific table(s) to generate from the benchmark (e.g., Customer, Orders).
- Optimize tables for processing: Toggle to optimize the table layout for downstream processing.
- Reference for dates in Elapsed Time format: Set a reference date (in yyyyMMdd hh:mm:ss format) from which elapsed times are computed.
- Conversion table to generate (for TPC-H conversion tables): Choose a specific conversion table (e.g., C_MKT_SEGMENT).
- First Record Number (for Sort benchmark): Starting record number (default: 0).
- Number of Records to generate (for Sort benchmark): How many records to generate (default: 1,000,000).
- Generate the table for the Sort Benchmark: Confirms generation of the Sort benchmark table.
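The reference date parameter can be illustrated with a small sketch. It assumes that elapsed times are whole seconds counted from the reference date and that `hh` denotes a 24-hour clock; this page confirms neither, and `elapsed_seconds` is a hypothetical helper, not part of the action.

```python
from datetime import datetime

# Assumption: "yyyyMMdd hh:mm:ss" is read with a 24-hour clock.
REFERENCE_FORMAT = "%Y%m%d %H:%M:%S"


def elapsed_seconds(reference: str, moment: datetime) -> int:
    """Hypothetical helper: whole seconds from the reference date to `moment`.

    Assumes elapsed time is measured in seconds; the action may use another unit.
    """
    ref = datetime.strptime(reference, REFERENCE_FORMAT)
    return int((moment - ref).total_seconds())


if __name__ == "__main__":
    # 1992-01-01 is the earliest order date in TPC-H data.
    print(elapsed_seconds("19920101 00:00:00", datetime(1992, 1, 2, 6, 30)))
```

Choosing a reference date at or before the earliest date in the data keeps every elapsed value non-negative.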
Generates TPC-H benchmark raw tables and other benchmark datasets for testing and performance evaluation.
This action supports the following benchmark types:
- TPC-H
- TPC-H conversion tables
- Sort benchmark
It is typically used for:
- Simulating large datasets for pipeline testing
- Benchmarking sorting, aggregation, and ETL performance
- Generating reproducible test cases for development or documentation
Example Workflow:
- Generate synthetic benchmark data with GenBenchmarkData.
- Sort the generated data using the Sort action.
- Pass the sorted data to the next processing step.
Use Cases:
- Benchmarking ETL tools with large TPC-H datasets.
- Testing database performance with synthetic customer/order data.
- Generating Sort Benchmark datasets for stress testing.
- Creating sample datasets for pipeline documentation.
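The generate, sort, and hand-off workflow can be mimicked in plain Python to show the data flow. This is a minimal sketch: `gen_benchmark_data`, `sort_action`, and `next_step` are hypothetical stand-ins for GenBenchmarkData, the Sort action, and the downstream step, and the random four-column rows only imitate the shape of the Sort benchmark output.

```python
import random
from typing import List, Tuple

# Four integer columns, mirroring ID_1..ID_4 in the Sort benchmark output.
Record = Tuple[int, int, int, int]


def gen_benchmark_data(count: int, seed: int = 0) -> List[Record]:
    """Hypothetical stand-in for GenBenchmarkData: rows of four random 31-bit integers."""
    rng = random.Random(seed)  # fixed seed -> reproducible rows
    return [
        (rng.randrange(2**31), rng.randrange(2**31), rng.randrange(2**31), rng.randrange(2**31))
        for _ in range(count)
    ]


def sort_action(rows: List[Record], column: int = 0) -> List[Record]:
    """Hypothetical stand-in for the Sort action: ascending order on one column."""
    return sorted(rows, key=lambda row: row[column])


def next_step(rows: List[Record]) -> None:
    """Hypothetical stand-in for downstream processing: print the first rows."""
    for row in rows[:3]:
        print(row)


if __name__ == "__main__":
    data = gen_benchmark_data(1_000)
    next_step(sort_action(data))
```

The fixed seed reflects the "reproducible test cases" use case: rerunning with the same parameters yields identical input for the sort step.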
Notes:
- The output structure and data size depend on the selected benchmark and parameters.
- Large TPC-H datasets may require significant disk space and memory.
- Sort Benchmark generates purely numeric data for sorting tests.
- Execution time and memory usage may vary with data volume.
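Since data volume drives disk space, memory, and execution time, it can help to estimate row counts before choosing a scale factor. The sketch below uses the base table cardinalities from the TPC-H specification; it counts rows, not bytes, and the action's actual output (for example with table optimization enabled) may differ.

```python
# Base row counts at scale factor 1, from the TPC-H specification.
# LINEITEM varies because each order gets 1 to 7 line items (about 6,001,215 at SF 1).
ROWS_PER_SF = {
    "SUPPLIER": 10_000,
    "PART": 200_000,
    "PARTSUPP": 800_000,
    "CUSTOMER": 150_000,
    "ORDERS": 1_500_000,
    "LINEITEM": 6_000_000,  # approximate
}
# NATION and REGION have a fixed size at every scale factor.
FIXED_ROWS = {"NATION": 25, "REGION": 5}


def estimated_rows(scale_factor: float) -> dict:
    """Approximate row count per TPC-H table for a given scale factor."""
    counts = {table: round(base * scale_factor) for table, base in ROWS_PER_SF.items()}
    counts.update(FIXED_ROWS)
    return counts


if __name__ == "__main__":
    for sf in (1, 10, 100):
        rows = estimated_rows(sf)
        print(f"SF {sf}: {sum(rows.values()):,} rows, LINEITEM ~{rows['LINEITEM']:,}")
```

Every table except NATION and REGION grows linearly with the scale factor, so moving from SF 1 to SF 100 multiplies most of the data by 100.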
TPC-H benchmark: generates TPC-H-compliant datasets at a configurable scale factor.
Example Configuration:
- Benchmark: TPC-H
- Scale Factor: 1 GB
- Table: Customer
- Optimize Tables: Enabled
Example Output:
| C_CUSTKEY | C_NAME | C_ADDRESS | C_NATIONKEY | C_PHONE |
|---|---|---|---|---|
| 1 | Customer#000000001 | Sample Address | 15 | 25-989-741-2988 |
| ... | ... | ... | ... | ... |
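The values in the example row follow TPC-H field rules: C_NAME is "Customer#" plus the key zero-padded to nine digits, and the C_PHONE country code is the nation key plus 10 (nation 15 gives "25-"). The sketch below applies those rules to build the five columns shown; `customer_row` is a hypothetical helper, and its random values (including the alphanumeric address stand-in) will not match dbgen or this action's output.

```python
import random
import string


def customer_row(custkey: int, rng: random.Random) -> tuple:
    """Build the first five CUSTOMER columns following TPC-H field rules.

    Hypothetical helper, not dbgen: the random values will not match real output.
    """
    nationkey = rng.randint(0, 24)  # 25 nations, keys 0..24
    name = f"Customer#{custkey:09d}"  # key zero-padded to 9 digits
    # Stand-in for the spec's random text: 10 to 40 alphanumeric characters.
    address = "".join(rng.choices(string.ascii_letters + string.digits, k=rng.randint(10, 40)))
    # Country code = nation key + 10, then three local-number groups.
    phone = f"{nationkey + 10}-{rng.randint(100, 999)}-{rng.randint(100, 999)}-{rng.randint(1000, 9999)}"
    return (custkey, name, address, nationkey, phone)


if __name__ == "__main__":
    rng = random.Random(7)
    for key in range(1, 4):
        print(customer_row(key, rng))
```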
TPC-H conversion tables: generates conversion datasets for TPC-H benchmarks.
Example Configuration:
- Benchmark: TPC-H conversion tables
- Conversion Table: P_BRAND
Example Output:
| P_BRAND | P_BRAND_INDEX |
|---|---|
| Brand#11 | 11 |
| Brand#12 | 12 |
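In TPC-H, P_BRAND takes the form "Brand#MN", with M (the manufacturer digit) and N each ranging from 1 to 5, giving 25 brands. The sketch below rebuilds a P_BRAND conversion table under the assumption, taken only from the example rows, that P_BRAND_INDEX is the numeric MN suffix; `p_brand_conversion_table` is a hypothetical helper, and the action's actual index scheme may differ.

```python
def p_brand_conversion_table() -> list:
    """Hypothetical rebuild of the P_BRAND conversion table.

    TPC-H brands are "Brand#MN", with M and N each in 1..5.
    Assumption: P_BRAND_INDEX is the numeric MN suffix, as in the example rows.
    """
    rows = []
    for m in range(1, 6):
        for n in range(1, 6):
            suffix = f"{m}{n}"
            rows.append((f"Brand#{suffix}", int(suffix)))
    return rows


if __name__ == "__main__":
    for brand, index in p_brand_conversion_table()[:3]:
        print(brand, index)
```

Mapping a string category to an integer index like this lets downstream steps join or sort on a compact numeric key instead of the text value.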
Sort benchmark: generates numeric datasets for sorting performance tests.
Example Configuration:
- Benchmark: Sort benchmark
- First Record Number: 0
- Number of Records to Generate: 1,000,000
Example Output:
| ID_1 | ID_2 | ID_3 | ID_4 |
|---|---|---|---|
| 1248423239 | 1918990674 | 1230005512 | 6325099 |
| ... | ... | ... | ... |
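The First Record Number and Number of Records parameters describe a record range. One way such a range can be made reproducible is to derive every record only from its own record number, so separate runs over adjacent ranges concatenate into the same data as a single run. The sketch below assumes that design and 31-bit values like those in the example; it is not the action's actual generator, and `sort_benchmark_records` is a hypothetical name.

```python
import random
from typing import Iterator, Tuple


def sort_benchmark_records(first: int = 0, count: int = 1_000_000) -> Iterator[Tuple[int, int, int, int]]:
    """Yield `count` records starting at record number `first`.

    Assumption: each record depends only on its own record number, so any
    range can be generated independently and ranges concatenate seamlessly.
    """
    for record_number in range(first, first + count):
        rng = random.Random(record_number)  # per-record seed -> reproducible
        a, b, c, d = (rng.randrange(2**31) for _ in range(4))
        yield (a, b, c, d)


if __name__ == "__main__":
    # Two halves generated separately equal one run over the whole range.
    whole = list(sort_benchmark_records(0, 10))
    halves = list(sort_benchmark_records(0, 5)) + list(sort_benchmark_records(5, 5))
    print(whole == halves)  # True
```

Seeding a fresh generator per record keeps ranges independent, which suits splitting a large dataset across parallel jobs, but it is slow in pure Python for millions of records; a counter-based hash would be a faster choice for the same property.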
