InfoObjects is a consulting company that helps enterprises transform how and where they run applications and infrastructure. From strategy to implementation to ongoing managed services, InfoObjects creates tailored cloud solutions for enterprises at all stages of the cloud journey.

CSV lookup filter plugin for Embulk

An Embulk filter plugin that performs a lookup transformation against a CSV file.

Configuration

  • csv_lookup: Required attributes for the lookup filter plugin:
    • filters:
      • type: Name of the lookup type (required)
    • mapping_from: Names of the input file (table 1) columns to be matched with the table 2 columns (required)
      • Name of column-1: column name-1 from input file
      • Name of column-2: column name-2 from input file, etc.
    • mapping_to: Names of the lookup file (table 2) columns to be matched with the table 1 columns (required)
      • Name of column-1: column name-1 from lookup file
      • Name of column-2: column name-2 from lookup file, etc.
    • new_columns: Names and types of the newly generated columns (required)
      • Name-1, Type-1: any name and its type (name: country_name, type: string)
      • Name-2, Type-2: any name and its type (name: country_address, type: string), etc.
    • path_of_lookup_file: Path of the lookup file (required)

Example - columns

Input 1 (table 1) is as follows:

    year               country_code                 country_name            literacy_rate

    1990                    1                          India                       80%
    1993                    2                           USA                        83%
    1997                    3                          JAPAN                        
    1999                    4                          China                       72%
    2000                    5                         Ukraine                      68%
    2002                    6                          Italy                       79%
    2004                    7                            UK                        75%
    2011                    8                           NULL                       42%

Input 2 (table 2, the lookup file) is as follows:

    id               country_population                        country_address               country_GDP

    1                       11.3                                    India                       1.67
    2                       18.2                                     USA                        16.72
    3                       30                                      JAPAN                       5.00
    4                       4                                       China                       9.33
    5                       57                                     Ukraine                      1.08
    6                       63                                      Italy                       2.068
    7                       17                                       UK                         2.49
    8                       28                                       UAE                        1.18                            


    Note: country_population is given in billions and country_GDP in trillions of US dollars.

As shown in the YAML below, each column listed in mapping_from is matched against the column at the same position in mapping_to, i.e.:

country_code : id
country_name : country_address

After a successful mapping, an Output.csv file is generated that contains the input columns plus the columns listed in new_columns.

Generated output file:

    year               country_code                 country_name              literacy_rate                 country_GDP                   country_population

    1990                    1                          India                       80%                         1.67                                11.3
    1993                    2                           USA                        83%                         16.72                               18.2
    1997                    3                          JAPAN                                                   5.00                                30
    1999                    4                          China                       72%                         9.33                                4
    2000                    5                         Ukraine                      68%                         1.08                                57
    2002                    6                          Italy                       79%                         2.068                               63
    2004                    7                            UK                        75%                         2.49                                17
    2011                    8                           NULL                       42%

Filter configuration (YAML):

    - type: csv_lookup
      mapping_from:
      - country_code
      - country_name
      mapping_to:
      - id
      - country_address
      new_columns:
      - { name: country_GDP, type: string }
      - { name: country_population, type: string }
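
The lookup behaviour described above can be sketched in a few lines of Python. This is only an illustration of the matching rule as it appears from the example tables, not the plugin's actual implementation: the function name `csv_lookup` and the in-memory row lists are hypothetical, the rows are a three-row subset of the tables above, and the NULL country name is modelled as `None`.

```python
# Hypothetical in-memory stand-ins for a subset of table 1 and table 2.
input_rows = [
    {"year": "1990", "country_code": "1", "country_name": "India", "literacy_rate": "80%"},
    {"year": "1997", "country_code": "3", "country_name": "JAPAN", "literacy_rate": ""},
    {"year": "2011", "country_code": "8", "country_name": None, "literacy_rate": "42%"},
]
lookup_rows = [
    {"id": "1", "country_population": "11.3", "country_address": "India", "country_GDP": "1.67"},
    {"id": "3", "country_population": "30", "country_address": "JAPAN", "country_GDP": "5.00"},
    {"id": "8", "country_population": "28", "country_address": "UAE", "country_GDP": "1.18"},
]

mapping_from = ["country_code", "country_name"]
mapping_to = ["id", "country_address"]
new_columns = ["country_GDP", "country_population"]


def csv_lookup(input_rows, lookup_rows, mapping_from, mapping_to, new_columns):
    """Append new_columns to each input row when every paired key column matches."""
    # Index the lookup table by the tuple of its mapping_to values.
    index = {tuple(row[col] for col in mapping_to): row for row in lookup_rows}
    output = []
    for row in input_rows:
        # Build the key from the input row, in the same order as mapping_to.
        key = tuple(row[col] for col in mapping_from)
        match = index.get(key)
        enriched = dict(row)
        for col in new_columns:
            # Rows without a match still get the new columns, left empty.
            enriched[col] = match[col] if match else ""
        output.append(enriched)
    return output


result = csv_lookup(input_rows, lookup_rows, mapping_from, mapping_to, new_columns)
for row in result:
    print(row)
```

In this sketch the 2011 row keeps empty country_GDP and country_population values because its (country_code, country_name) pair has no counterpart in the lookup rows, which mirrors the last row of the output table.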

Notes:

  1. The columns in the mapping_from attribute should be listed in the same order as they appear in the input file.
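
A quick way to check this ordering before running the filter might look like the following. The helper `follows_input_order` is a hypothetical pre-flight check written for this README, not something the plugin is known to provide; the column list is taken from table 1 above.

```python
def follows_input_order(mapping_from, input_columns):
    """Return True if mapping_from lists columns in the order the input file has them."""
    # list.index raises ValueError if a mapped column is missing from the input file.
    positions = [input_columns.index(name) for name in mapping_from]
    return positions == sorted(positions)


input_columns = ["year", "country_code", "country_name", "literacy_rate"]

ok = follows_input_order(["country_code", "country_name"], input_columns)
swapped = follows_input_order(["country_name", "country_code"], input_columns)
print(ok, swapped)
```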

Development

Run example:

$ ./gradlew package
$ embulk run -I ./lib seed.yml

Deployment Steps:

Install Ruby on your machine.
$ gem install gemcutter   # on Windows only

$ ./gradlew gemPush
$ gem build NameOfYourPlugins   # example: embulk-filter-csv_lookup
$ gem push embulk-filter-csv_lookup-0.1.0.gem   # the file name is printed by the previous command

Release gem:

$ ./gradlew gemPush

Licensing

InfoObjects license (MIT License)