CSV lookup filter plugin for Embulk
An Embulk filter plugin for Lookup Transformation with CSV
Configuration
- csv_lookup: Required attributes for the LookUp Filter Plugin -
- mapping_from: (Name of columns to be matched with table 2 columns) (required)
- Name of column-1: column name-1 from input file
- Name of column-2: column name-2 from input file etc ...
- mapping_to: (Name of columns to be matched with table 1 columns) (required)
- Name of column-1: column name-1 from input file
- Name of column-2: column name-2 from input file
- new_columns: (New generated column names) (required)
- Name-1,Type-1: Any Name, Type of the name (name: country_name, type: string)
- Name-2,Type-2: Any Name, Type of the name (name: country_address, type: string) etc ... ## Example - columns
- mapping_from: (Name of columns to be matched with table 2 columns) (required)
Input1 for table 1 is as follows :-
year country_code country_name literacy_rate
1990 1 India 80%
1993 2 USA 83%
1997 3 JAPAN
1999 4 China 72%
2000 5 Ukraine 68%
2002 6 Italy 79%
2004 7 UK 75%
2011 8 NULL 42%
Input2 for table 2 is as follows :-
id country_population country_address country_GDP
1 11.3 India 1.67
2 18.2 USA 16.72
3 30 JAPAN 5.00
4 4 China 9.33
5 57 Ukraine 1.08
6 63 Italy 2.068
7 17 UK 2.49
8 28 UAE 1.18
Note: country_population is calculated in Billion and country_GDP is calculated in $USD Trillion
As shown in yaml below, columns mentioned in mapping_from will be mapped with columns mentioned in mapping_to
ie:
country_code : id
country_name : country_address
After successful mapping an Output.csv file containing the columns mentioned in new_columns will be generated
Output File generated :-
year country_code country_name literacy_rate country_GDP country_population
1990 1 India 80% 1.67 11.3
1993 2 USA 83% 16.72 18.2
1997 3 JAPAN 5.00 30
1999 4 China 72% 9.33 4
2000 5 Ukraine 68% 1.08 57
2002 6 Italy 79% 2.068 63
2004 7 UK 75% 2.49 17
2011 8 NULL 42%
- type: csv_lookup
mapping_from:
- country_code
- country_name
mapping_to:
- id
- country_address
new_columns:
- { name: country_GDP, type: string }
- { name: country_population, type: string }
Notes:
- mapping_from attribute should be in same order as mentioned in input file.
Development
Run example:
$ ./gradlew package
$ embulk run -I ./lib seed.yml
Deployment Steps:
Install ruby in your machine
$ gem install gemcutter (For windows OS)
$ ./gradlew gemPush
$ gem build NameOfYourPlugins (example: embulk-filter-csv_lookup)
$ gem push embulk-filter-csv_lookup-0.1.0.gem (You will get this name after running above command)
Release gem:
$ ./gradlew gemPush