Point of Interest Cinema is a sample pipeline that demonstrates ETL transformation capabilities. It reads textual data from a CSV file, transforms it (property renaming, string replacement, creation of new properties, ...) and writes the result to another CSV file.
Here is the graph designed for the Point of Interest Cinema example:
Now let's examine the graph steps and their associated actions:
- readCSV : first step of the graph. Reads the dataset CSV file (cinema.csv) and loads the data.
- ColumnRename : renames the osm_id property to osmid.
- Calculator : transformation that creates several final properties:
  - COORDONNEES_GEO : computed property based on latitude (Y) and longitude (X). The output format of this property is "Y_X".
  - NAME : generated from the original name property. If the name of a cinema is empty in the original dataset, the property is filled with the single word "cinema".
  - MARQUE : string replacement in the marque property.
  - OPENING_HOURS : string replacement in the opening_hours property.
  - URL : builds a variable containing the URL of the web service used for reverse geocoding. In our case we use a local geocoding service, and the url variable has the value "http://ip:8080/reverse?format=json&lon="//X//"&lat="//Y". Note that X and Y, which represent longitude and latitude in the original dataset, are passed as parameters to get the address of the cinema.
  - MOTS_CLES : creates this property and fills it with the keywords 'cinema' and 'cinéma'.
- InsertKey : generates the CINEMA_ID property. This ID identifies each cinema but is not present in the original dataset, so it is generated as a simple incremental value from 1 to the number of records.
- Calculator : this processing step assigns a marking to records.
- nominatimGeocoding : calls the Nominatim reverse geocoding service. Passes X and Y as parameters and extracts from the JSON result
- Calculator : creates the adress_name property by concatenating the num and street values from the Nominatim output.
- ontology-poi : defines the ontology mapping. Maps original dataset properties (ref_cnc, marque), temporary variables () and computed variables (COORDONNEES) to properties in the final ontology.
- writeCSV : generates the output CSV file from the ontology-poi output.
- InlineTable : declares two variables, C1 containing poi-cinema.csv (the graph output file) and C2 containing POI-CINEMA (the ontology name), and passes them to the next action.
- ingestAndStoreFiles : uses the previously defined variables C1 and C2 as input parameters to upload the poi-cinema.csv output file and assign it to the ontology name defined in C2 (POI-CINEMA).
- RunToFinishLine : dedicated action used to finalize a graph.
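
To make the row-level steps above more concrete, here is a minimal Python sketch of what ColumnRename, the first Calculator and InsertKey roughly do to each record. It is only an illustration outside the graph tool, not the tool's actual implementation: the `transform` function, the sample rows and the ";" separator used for MOTS_CLES are assumptions, and the MARQUE and OPENING_HOURS replacements are left out because their rules are not given here.

```python
import csv
import io


def transform(rows):
    """Apply ColumnRename, Calculator and InsertKey to an iterable of dict rows."""
    out = []
    # InsertKey: incremental CINEMA_ID from 1 to the number of records.
    for cinema_id, row in enumerate(rows, start=1):
        rec = dict(row)
        # ColumnRename: osm_id -> osmid
        rec["osmid"] = rec.pop("osm_id", "")
        x, y = rec["X"], rec["Y"]  # X = longitude, Y = latitude
        # Calculator: computed properties
        rec["COORDONNEES_GEO"] = f"{y}_{x}"  # "Y_X" format
        rec["NAME"] = (rec.get("name") or "").strip() or "cinema"
        rec["URL"] = f"http://ip:8080/reverse?format=json&lon={x}&lat={y}"
        rec["MOTS_CLES"] = "cinema;cinéma"  # separator is an assumption
        rec["CINEMA_ID"] = cinema_id
        out.append(rec)
    return out


# Hypothetical sample standing in for cinema.csv (values are made up).
sample = io.StringIO(
    "osm_id,name,X,Y\n"
    "101,Le Rex,2.3477,48.8705\n"
    "102,,2.3522,48.8566\n"
)
records = transform(csv.DictReader(sample))
for rec in records:
    print(rec["CINEMA_ID"], rec["NAME"], rec["COORDONNEES_GEO"], rec["URL"])
```

The second sample row has an empty name, so it exercises the NAME fallback to "cinema". The "ip" host in the URL is kept as the placeholder used in the description above.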