Goodot (LCM CLI)


What is it?

A toolset for lifecycle management (LCM) of your project, specifically

  • CLI for doing typical tasks
  • Best practices for creating applications

Installation

CLI

Currently the biggest piece of functionality is the CLI. It allows you to perform several typical tasks in your application. If you would like to see more, please open an issue on GitHub.

As part of the gem installation, the goodot executable is installed on your machine. Running goodot shows the list of commands you can use. We will walk through them briefly here to give you an overview.

Login parameters

In the same vein as the GoodData SDK, you either have to specify the login parameters on the command line

goodot -U [email protected] -P SECRET -s https://customer-domain.intgdc.com -d organization command

or you can rely on the authentication information stored by the GoodData SDK. In the examples below the login information is omitted for brevity.

Commands

clients
list

Prints a table of the clients in your application.

goodot clients list
remove

Allows you to remove a particular client.

goodot clients remove --client CLIENT_ID

Be careful: this also removes the associated project. If you would like to keep the project, use goodot clients reset first.

reset

Allows you to disassociate a particular client's project from the client's definition. This means that during the next "provision clients" phase (which you can invoke with the goodot app spin-up-projects command) a fresh project will be created for that client.

goodot clients reset --client CLIENT_ID

By default the project is just removed from the client's definition. If you also want to delete the project, use:

goodot clients reset --client CLIENT_ID --delete-project
segments
list

Prints a table of the segments in your application.

goodot segments list

The response might look like this

+-----------+----------------------------------+
| Id        | Master_project_id                |
+-----------+----------------------------------+
| segment_1 | j87dughbcn2ybfjqua1i373j33sxy2pn |
| segment_2 | d7b70gzooqhijd3ozvoe6m2clp1620fg |
| segment_3 | al5v4rdvt8bhuoo84gp2ooize5kj9ahh |
+-----------+----------------------------------+

Sometimes output suited for machine processing is useful; this command can also produce JSON.

The response might look like this

[
  {
    "name": "segment_1",
    "master_project_id": "j87dughbcn2ybfjqua1i373j33sxy2pn"
  },
  {
    "name": "segment_2",
    "master_project_id": "d7b70gzooqhijd3ozvoe6m2clp1620fg"
  },
  {
    "name": "segment_3",
    "master_project_id": "al5v4rdvt8bhuoo84gp2ooize5kj9ahh"
  }
]
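
A minimal sketch of consuming that JSON from Ruby, using only the standard library (here the output is embedded as a string; in practice you would read it from a file or a pipe, and the exact switch that enables JSON output may differ in your goodot version):

```ruby
require 'json'

# Sample of the JSON emitted by `goodot segments list` in JSON output mode.
raw = <<~JSON
  [
    { "name": "segment_1", "master_project_id": "j87dughbcn2ybfjqua1i373j33sxy2pn" },
    { "name": "segment_2", "master_project_id": "d7b70gzooqhijd3ozvoe6m2clp1620fg" }
  ]
JSON

segments = JSON.parse(raw)
# Build a name => master_project_id lookup for further processing.
masters = segments.map { |s| [s['name'], s['master_project_id']] }.to_h
puts masters['segment_1']
```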
remove

Allows you to remove a particular segment.

goodot segments remove --segment SEGMENT_ID

By default it expects the segment to be empty, i.e. to have no clients associated; if that is not the case it will fail. You can opt in to cascade-deleting all the clients in the segment with the --cascade switch.

goodot segments remove --segment SEGMENT_ID --cascade

This will remove the clients and delete their respective projects.

masters
list

Prints a table of the master projects in your application along with their respective segment IDs.

goodot masters list

The response might look like this

+----------------------------------+-----------------+------------+
| Pid                              | Title           | Segment_id |
+----------------------------------+-----------------+------------+
| t4ygciqls3nonmi25qtc2hj9t6joi3w7 | HR Demo Project | segment_1  |
| lys4vtz363h5f2eq1ioglkjtimu2ze3c | HR Demo Project | segment_2  |
| yebbqafxmsb9eg4axgu3prunfrdh66r6 | Test project    | segment_3  |
+----------------------------------+-----------------+------------+
app
create-segment

Allows you to add a segment. Useful for creating segments by hand. You have to provide a project that will serve as the master for the segment.

goodot app create-segment -s SEGMENT_ID -p PROJECT_ID
add-client

Allows you to add a client to a particular segment. Useful for populating segments by hand.

goodot app add-client -c CLIENT_ID -s SEGMENT_ID

You can also add a client with a specific project:

goodot app add-client -c CLIENT_ID -s SEGMENT_ID -p PROJECT_PID
spin-up-projects

With the goodot app add-client command you create clients in a segment, but by default they will not have a project assigned. This command spins up a new project for each such client. The project is spun up from a release; the release is created by goodot app synchronize.

goodot app spin-up-projects
synchronize

Synchronize will do two things.

  • It will package up the current master project for each segment and save it. This package is used as the snapshot for spinning up all new projects via spin-up-projects.
  • It will try to synchronize existing projects that are not up to date.

    goodot app synchronize

You can also specify a segment to perform the release and sync in just that particular segment.

goodot app synchronize -s SEGMENT_ID
export-association

This will export the association between segments and clients for backup or documentation purposes.

It will export

  • segment id
  • client id
  • project id

It either prints to STDOUT

goodot app export-association

or into a file if specified.

goodot app export-association -f file.json

Sometimes it is useful to omit the project IDs. You can omit them from the output with:

goodot app export-association --without-project
import-association

This will import the association stored in a file. This accepts the format which is output by goodot app export-association.

goodot app import-association -f file.json

The segment and id keys are required; the project key is optional. On the API, if you specify project, it is set to that value; if you do not, it is kept as it is. The file is interpreted declaratively: the result will be exactly what the file specifies, and superfluous clients will be deleted, including their projects.
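
Based on the keys described above, an association file might look like this (the client IDs are made up, and whether project takes a PID or a full URI is an assumption here; the safest way to see the exact format is to generate a file with goodot app export-association first):

```json
[
  { "segment": "basic_segment", "id": "acme", "project": "sjpi817nns6o8sszbsb6ohshevbtrbfe" },
  { "segment": "basic_segment", "id": "hearst" }
]
```

The second entry omits project, so during import the client's current project (if any) is kept as is.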

Lifecycle Tutorial

Let's walk through a sample implementation. We will use Ruby SDK and LCM SDK for all the steps.

Definitions

Before we start, let's define and explain some terms.

Segment

This is a term we use for a group of customers that share the same model, the same reports and the same ETL. This does not mean they share the exact same instance of each; rather, each gets its own respective copy. We typically say "Customer X is in the 'Basic' segment".

Client

This is another word for a customer. At GoodData a client (in the context of LCM) represents a relationship between a specific segment and a specific project (possibly not yet created). Again, we can typically say something like "Client 'acme_production' is in the 'Basic' segment".

Master project

Each segment has a master project associated with it. This is where each client gets its model/ETL/reports from.

Master-Segment

While a master project is defined in LCM as just a project that is associated with a segment as its master, I would like to extend the definition a bit. The extension is here to give you the

  • Ability to identify a Master project which is no longer associated with any segment (API does not keep history of that)
  • Ability to revert to a particular version in case something goes wrong
  • Ability to keep track of the versions

If you use the LCM tooling, all the low-level stuff is handled for you almost transparently. If, however, you wish to do it yourself but would still like to use the LCM CLI for inspecting things, this is the interface it expects you to fulfill.

A master project has to have several metadata properties defined. You can easily define those using the SDK by following this example. The properties are

"GD_LCM_TYPE" = "master_project"
"GD_LCM_VERSION" = "x.y.z" # for example '1.0.0'
"GD_LCM_SEGMENT" = "id_of_the_segment"

Hello world

This is the proverbial hello world of LCM applications. Let's assume we have an application with 2 segments: Basic and Premium. These 2 have different models and different reports (represented very symbolically in this example). We would like to demonstrate the following in the hello world example

  • there are 2 different segments (basic, premium) each having unique model and different reports
  • each project has users synced automatically from data
  • each project has data permissions synced automatically from data
  • we will show how to perform update of ETL
  • we will show how to perform update of model
  • we will show how to perform addition of new dashboards
  • we will show how to perform a rollback to an earlier version of the application

Prerequisites

We assume that we have

  • A whitelabeled GoodData organization (contact support if you do not have one)
  • ETL processes ready for data acquisition and data manipulation. I will call these "downloader" and "transformation". I will show only how to deploy or update them, not how to create them, since that is beyond the scope of this tutorial
  • We will refer to the organization name as 'organization_name'; this will differ in your particular case
  • We will refer to the token as 'token'. Again, in your particular case this will differ.

Code

If you want to follow along, all the code can be found on GitHub: https://github.com/fluke777/lcm_example

Definition of segment masters

First we need to set up the application structure. We have to define 2 projects that will serve as masters for our segments' clients and then set up the segments themselves. The segment masters serve several purposes.

  • They are the blueprint of the model. Every client project in that particular segment will receive the same model.
  • Similarly, they serve as masters for the ETLs. Every client project will get the same ETLs (with some slight parametrization) as the master project.
  • The master contains the dashboards. The clients' projects will receive those dashboards and all the dependent reports and metrics.

Basic segment's master

You can find the code for the definition of the basic segment in the example repository. The flow is pretty simple. We define a model, then we create the processes and their schedules. When everything is ok, we define the actual segment.

Premium segment's master

Let's repeat this for the premium segment. The code is almost identical, the only difference being a different model. Again, this is represented very symbolically here by having a model with two datasets instead of one. In real situations the models might be very different.

ETL

The ETL is compliant with the blueprint and typically consists of three pieces executed in this order.

  • load of data into the project
  • creation of Data permissions
  • addition of users into the project

The flow is illustrated in this picture

ETL flow

Creation of segments

You can create the segments by running

ruby initial_deployment.rb

This will create the segments along with their master projects.

Again you can verify everything went fine by inspecting the application

goodot -U DOMAIN_OWNER_LOGIN -P PASSWORD -s ORGANIZATION_URL -d ORGANIZATION_NAME masters list

The result will look something like this

+----------------------------------+-----------------------+-----------------+---------+
| PID                              | Title                 | Segment ID      | Version |
+----------------------------------+-----------------------+-----------------+---------+
| zgetgfidjtcz5h0e2e5fafjl4y1zve2k | Basic Segment 1.0.0   | basic_segment   | 1.0.0   |
| eun7nh25r0vx1f3bd8omtlsvna3tqplv | Premium Segment 1.0.0 | premium_segment | 1.0.0   |
+----------------------------------+-----------------------+-----------------+---------+

Notice we still do not have any clients defined. You can verify this by running

goodot -U DOMAIN_OWNER_LOGIN -P PASSWORD -s ORGANIZATION_URL -d ORGANIZATION_NAME clients list

Service project

The service project is the one place per application where the following happens. This is not a strict rule, but it is typical, since many things need to happen only once per application and it is beneficial if they happen only once and from the same place.

  • the data downloads happen here for all projects
  • the data preparation/mangling happen here
  • the organization's users get created and/or updated here

The service project is just a special project we create. It is created in a very similar way to the master projects we showed previously.

ruby inital_deploy_service_project.rb

This project is currently not in any segment. You have to remember its PID.

Processes

Let me describe what the processes here do and why they are in this order. The high-level picture looks like this

ETL in service project

Downloader/Data acquisition

This makes sure we have the data in the place where we need it. If the customer loads data directly into ADS or project-specific storage, this step might be superfluous. It might be that you are downloading the data from some API; this is the step where that would happen.

Data transformation

Data does not necessarily arrive in a shape suitable for consumption by subsequent steps. This is the place to slice, dice and mangle the data into the appropriate shape and put it in the right places. Again, this might not be needed.

Segment/client association

Part of the data load is information about which clients are in each segment. Here we are talking about creating or removing clients; moving them between segments is currently not covered (TODO: VERIFY). The association is handled by the Association brick. Briefly, what it does is

  • makes sure old clients are discarded
  • makes sure new clients are provisioned. When provisioned
    • the association is added
  • when deprovisioned
    • the association is removed
    • the client project is discarded

Take note that the association is expected to be defined declaratively. What should be provisioned is determined by comparing against the current state. You can find more details in the Association brick docs.
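
A hypothetical declarative input for the Association brick could look like the following CSV (the column names are illustrative assumptions, not the brick's documented format; consult the Association brick docs for the exact contract):

```
client_id,segment_id
acme,basic_segment
hearst,basic_segment
level_up,premium_segment
```

On each run the brick would compare such a file against the current state: clients present in the file but not on the platform get provisioned, and clients present on the platform but missing from the file get deprovisioned.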

Project provisioning

The previous step only creates the association between a segment and a client. It says "this client is in this segment". That still does not mean a project has been created for the client. The project is created in this step. This means

  • the model, ETL and reports are cloned from the released master project
  • the schedules are enabled

Organization users synchronization

This makes sure the users are created and updated in your organization. It only adds information about the users into GD, i.e. creates accounts for them. The actual assignment happens in the respective projects. You can read more about this brick here. It is the same brick as the one used to add users to projects; here it is used in the mode 'add_to_organization'.

Client ETL kick off

The ETLs in the client projects are usually not scheduled at all. The reason is that they should run only when the data are prepared. There are a couple of strategies to achieve this, but this one is probably the least error prone.

Since the platform does not provide a way to declaratively define schedule triggers based on the successful finish of a schedule in a different project, we use [this brick](https://github.com/gooddata/app_store/tree/master/apps/schedule_execution_brick) to achieve something similar. Take note that when we deployed the ETLs to the clients we marked them with a tag. This brick will search for all the schedules marked with that tag and execute them.

Making initial release

Part of the previous script was also a release. This means several things.

  • Release is something like a marker in time
  • Every new project spun up will get the model + ETL + dashboards that were in the segment master at the time of the release
  • All current projects will be updated to the currently released version

Also note that we prepared the whole master upfront from scratch and only then created the actual segments and released them. We will use a similar approach when we update the segments. Specifically, we will always follow one simple rule:

  • We will never update a master that is live.

A segment master is treated as an immutable thing. This helps us make sure that we will always be able to go back to a certain version of a master if something goes wrong. We will get into more detail on how to update a master (for example with new reports) later.

Recap

Now we have everything working. Namely

  • we have segments defined and released
  • we have automatic processes in place that will provision new users, data permissions and new clients based on incoming data
  • we have service project that contains processes to update users, provision new clients and kick off the ETL in clients

Try it out

Let's try it out. We have prepared a file that will upload some test data into the appropriate places. Run

ruby service_data_load.rb SERVICE_PROJECT_PID

This will upload 2 files: one that defines several clients to be spun up, and one with two test users to be added to the domain. Now go to the service project in the browser (the URI should look like https://DOMAIN/admin/disc/#/projects/SERVICE_PROJECT_PID) and execute the first schedule in the pipeline (that should be the downloader). After everything finishes, you should be able to verify that it went fine as follows.

There should be additional clients

goodot -U DOMAIN_OWNER_LOGIN -P PASSWORD -s ORGANIZATION_URL -d ORGANIZATION_NAME clients list

The result should look like this

+------------+----------------------------------------------+------------------------------------------------+
| Client Id  | Segment URI                                  | Project URI                                    |
+------------+----------------------------------------------+------------------------------------------------+
| acme       | /gdc/domains/mustangs/segments/basic_segment | /gdc/projects/sjpi817nns6o8sszbsb6ohshevbtrbfe |
| hearst     | /gdc/domains/mustangs/segments/basic_segment | /gdc/projects/c50pp64hh5m08v78118hiy2ehmruur0v |
| level_up   | /gdc/domains/mustangs/segments/basic_segment | /gdc/projects/jmfwto12m0bvmd8vi6x4f9gbxf5l17hc |
| mastercard | /gdc/domains/mustangs/segments/basic_segment | /gdc/projects/if5k4ohvxyha8zdxe74j4rt5ggw93lst |
+------------+----------------------------------------------+------------------------------------------------+

There should also be users in your domain. You can verify this by jacking in to your application

goodot -U DOMAIN_OWNER_LOGIN -P PASSWORD -s ORGANIZATION_URL -d ORGANIZATION_NAME jack-in

and execute this

domain.users.map(&:login)

This should return something like

["[email protected]", "[email protected]", "[email protected]"]

Exit by executing

exit

Now you should have your client projects spun up with ETLs, models and everything. Let's try to run one by hand. First you need data for the processes. Run

ruby client_data_load.rb CLIENT_ID

For example

ruby client_data_load.rb hearst

This will upload files for users and filters. The data load does not do much in this example apart from outputting a message into the log. After the script finishes, go to the project in the browser and run the load schedule. This will execute all the processes in that project. Once done, we expect 2 things to have happened.

  • have some users inside a project
  • have data permissions defined for them

You can verify the first one like this. Jack in

goodot -U DOMAIN_OWNER_LOGIN -P PASSWORD -s ORGANIZATION_URL -d ORGANIZATION_NAME jack-in

and execute

domain.clients('hearst').project.users.map { |u| [u.login, u.role.identifier] }

The second one can be verified by running

domain.clients('hearst').project.data_permissions.pmap { |f| [f.uri, f.pretty_expression] }

What's next?

Now that we have the basics up and running, we would like to cover a couple of additional typical scenarios that will occur.

  • How to release an updated version of ETL
  • How to release new dashboards
  • How to introduce new model changes

Releasing new changes

The process is basically the same as the one we used during the initial release. Since all the changes we described are codified in the master of a particular segment, the steps are very similar for all the cases.

  • We will clone the current master project
  • We will update the master project. This means
    • Changing the model
    • Adding new dashboards
    • Changing the ETL
  • We bump the version of the segments
  • We swap in the new master project for the old one
  • We make a release
  • We update all the existing projects
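
The "bump the version" step above refers to the GD_LCM_VERSION property described in the Master-Segment section. A tiny illustrative helper for computing the next version string might look like this (not part of goodot; which component you bump for a given release is your own convention):

```ruby
# Bump an 'x.y.z' version string by the given component.
def bump_version(version, part = :minor)
  major, minor, patch = version.split('.').map(&:to_i)
  case part
  when :major then "#{major + 1}.0.0"
  when :minor then "#{major}.#{minor + 1}.0"
  when :patch then "#{major}.#{minor}.#{patch + 1}"
  else raise ArgumentError, "unknown part: #{part}"
  end
end

puts bump_version('1.0.0')           # => 1.1.0
puts bump_version('1.1.0', :major)   # => 2.0.0
```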

The following picture illustrates the process

ETL in service project

There are a couple of very important things to notice.

  • We never changed the old master so if there is a need we can always downgrade back.
  • Since we swap the masters in an instant, there is no risk that some project might have been created from the old version and some from the new version.

Unfortunately, not everything is perfect. There are a couple of things that have to be considered during the design of the application or of a particular release. Let's discuss them briefly here.

The problem of change

There are a couple of innate problems with any change introduced into a project, namely "what is the source of truth" and "what happens if old data clashes with the new changes". Let's walk through typical examples of changes to a project and how they affect the complexity of an application.

Model changes

There are basically two scenarios here. In scenario one, you as the application provider are completely responsible for all the reports and dashboards provided. The customer has no ability to create new content. In such a case everything should be fine and you are able to introduce arbitrary changes into the model. This case is prevalent in scenarios where customers access GD through some other application and typically do not have access to ad hoc capabilities.

The second scenario is that end users do have access to ad hoc content creation capabilities. When they create new content, it depends on a model or metric that you provided. If you change that in a future version, you risk either changing the meaning of their reports (which is very dangerous) or breaking them outright. The solution is either to not change anything and only add new stuff (which cannot be done indefinitely, because it adds clutter to the project) or to migrate the reports in some way, which generally requires manual work.

ETL/Model changes

Here the problem arises from the fact that different versions of the application might need different data. You need to make sure that changes to several parts of the application happen at the same time, which might be difficult to do; this should be thought about in the initial design. The worst-case scenario is having to shut down the ETLs and enable them only after the release. This is difficult since it requires you to "stop the world", which is inherently hard.

Example of a migration

When migrating to a new version we need to ensure several things happen. To emphasize again: we want to uphold our rule of not touching a live master project in any way.

  • We need to create a new master project. The changes we want to make are
    • adding a report
    • adding a field
    • changing the ETL
  • We need to tag it appropriately
  • We need to make new release
  • We need to update existing projects

There is a small helper method that helps you do that. We will illustrate two ways to use it. Imagine we want to do the following things in a migration:

  • Change the project model: let's add a field
  • Change the upload process, since we need to fill the new field with data
  • Add a new dashboard that takes advantage of that new field

Code based migration

The migration is fleshed out in a single file. Let's pay attention to this particular piece of code:

release(domain, '2.0.0', auth_token: TOKEN, service_project: service_project) do |release|
  release.with_segment('basic_segment') do |segment, new_master_project|
    blueprint = new_master_project.blueprint
    blueprint.datasets('dataset.departments').change do |d|
      d.add_fact('fact.departments.number', title: 'NUMBER')
    end
    new_master_project.update_from_blueprint(blueprint)
    redeploy_or_create_process(new_master_project, './scripts/load/2.0.0', name: 'load')
  end
end

The release method goes over all the segments, takes their masters, clones them and hands them to you to adjust as you like. This is accomplished by the release.with_segment('basic_segment') call. We are saying that we want to change that particular master, and we get a new master that we can change. If a particular segment's master is not requested, it is just cloned and the old version is kept around; this keeps all the masters on the same version so you can reason about migrations easily. Once all the updates finish successfully, the new masters are taken, tagged, and swapped in as the new segments' masters. Then the release is performed. From this point on, any newly spun-up client will receive the data from these new masters. We also have to update the existing projects, which happens automatically as part of the release method.

Hand based migration

While the previous example is nice because it is all done in code, this is not a typical flow for many changes. Usually the new reports are created by people, sometimes over several weeks. We wanted to support this scenario as well, so there is another way to use the migration: you can opt out of using the proposed clone. This is done by using a different method.

release(domain, '2.0.0', auth_token: TOKEN, service_project: service_project) do |release|
  release.provide_segment('basic_segment') do |segment|
    client.projects('PID_OF_YOUR_PROJECT')
  end
end

In such a case the framework does not create a clone for you. The project you return is used (given it has all the appropriate tags as defined above). It is your responsibility to provide a master project that complies with the expectations.

In this flow it is typical that you want to start from a clone of the current version, so you can just tweak it here and there. There is a method in goodot that can help you with this.

(TODO add goodot command)