Talend Data Stewardship Console

A review of the new Talend Data Stewardship Console 6.3.1

Introduction

Talend’s Data Stewardship Console is a web-based application which is part of Talend’s commercial data quality product suite (see https://www.talend.com/products/data-quality/).

This is how Talend documentation describes this product:

Talend Data Stewardship is a comprehensive tool you can use to manage data assets. It organizes data interactions whenever human intervention is required, for example, to do arbitration, review, merge or cleanse data.

Main Concepts

Three basic concepts are key to working with this product:

✓ Tasks
✓ Data models
✓ Campaigns

The user interface presents these three concepts as the main entries of its navigation:

[Screenshot: navigation]

The following sections give a basic introduction to these concepts.

Tasks

Tasks can be assigned to users and have a type and a state.

Table 1 shows the main task types supported by the Talend Stewardship Console.

Table 1. Data Stewardship Tasks

Task | Description
Arbitration | This type of task is based on a question and a data set. Typically, the data steward needs to filter records meeting a specific criterion. The steward classifies the data manually.
Resolution | The steward repairs the content of fields in a data set or adds missing data to empty fields.
Merging | The steward merges several potential duplicate records into one single record.

In the following screenshot you can see merging tasks assigned to a user:

[Screenshot: tasks_ui]

Data Models

To execute the tasks described above, you need to declare the structure of your data: the data model. The data model is a container of attributes with specific properties.

The user interface of the Talend Stewardship Console lists the attribute names on the left side and shows the properties of each attribute on the right side.

[Screenshot: data_model_edit]

Each attribute can be associated with an enumeration of values or a validation pattern.
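To make the idea of a flat data model with constrained attributes concrete, here is a minimal Python sketch. It is not Talend's implementation or API: the `Attribute` class, the `invalid_fields` helper and the example attribute names are all hypothetical and only illustrate how an enumeration or a validation pattern could be checked against a record.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    """One attribute of a flat data model (hypothetical structure)."""
    name: str
    allowed_values: Optional[frozenset] = None  # enumeration constraint, if any
    pattern: Optional[str] = None               # validation regex, if any

    def is_valid(self, value: str) -> bool:
        # An attribute without constraints accepts any value.
        if self.allowed_values is not None and value not in self.allowed_values:
            return False
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            return False
        return True


def invalid_fields(model, record):
    """Return the names of attributes whose value violates a constraint."""
    return [a.name for a in model if not a.is_valid(record.get(a.name, ""))]


# Illustrative model: attribute names and constraints are made up.
customer_model = [
    Attribute("country", allowed_values=frozenset({"DE", "FR", "CH"})),
    Attribute("zip_code", pattern=r"\d{4,5}"),
    Attribute("name"),
]

record = {"country": "UK", "zip_code": "8001", "name": "Ada"}
print(invalid_fields(customer_model, record))
```

A record that fails such a check is the kind of data a resolution task would ask a steward to repair.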

Note: It appears that a user cannot create complex data types (data types which contain subtypes, as in XML schemas), which would be needed to support hierarchical data models. All models you can work with in the Talend Stewardship Console are flat.

Extending Data Types

You can extend the set of semantic data types: create a semantic type based on a regular expression in the Talend Dictionary Service and add it to the list of recognized data types in Talend Data Stewardship.
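As a rough illustration of what a regular-expression-based semantic type does, the following Python sketch assigns a type to a column when most of its values match the type's pattern. The type names, the patterns and the 80% threshold are assumptions made for this example; they are not taken from the Talend Dictionary Service.

```python
import re

# Hypothetical semantic types; names and patterns are illustrative only.
SEMANTIC_TYPES = {
    "SWISS_ZIP": re.compile(r"[1-9]\d{3}"),
    "ISO_DATE": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def detect_semantic_type(values, threshold=0.8):
    """Return the type whose pattern matches the largest share of values.

    Returns None when no type reaches the threshold (an assumed cut-off).
    """
    values = [v for v in values if v]  # ignore empty cells
    if not values:
        return None
    best_name, best_ratio = None, 0.0
    for name, pattern in SEMANTIC_TYPES.items():
        ratio = sum(1 for v in values if pattern.fullmatch(v)) / len(values)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= threshold else None


print(detect_semantic_type(["8001", "3000", "4051"]))
print(detect_semantic_type(["abc", "8001"]))
```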

Campaigns

All tasks are contained in campaigns, which you can think of as “data stewardship projects”. A campaign has one of the three types listed in Table 1:

✓ Arbitration
✓ Merging
✓ Resolution

[Screenshot: add_campaign]

A campaign contains a set of owners and stewards.

[Screenshot: campaign_roles]

Each campaign is associated with a single data model and a workflow. In a workflow you can assign roles to each of the workflow steps.

[Screenshot: campaign_workflows]
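The relationships described in this section (a campaign has a type, owners, stewards, exactly one data model and a workflow whose steps carry roles) can be summarised in a small Python sketch. The class names, field names, role names and step names below are hypothetical and chosen for illustration; they do not reflect how the console stores campaigns.

```python
from dataclasses import dataclass, field
from enum import Enum


class CampaignType(Enum):
    ARBITRATION = "arbitration"
    MERGING = "merging"
    RESOLUTION = "resolution"


@dataclass
class WorkflowStep:
    name: str
    roles: frozenset  # roles allowed to work on tasks in this step


@dataclass
class Campaign:
    name: str
    campaign_type: CampaignType
    data_model: str  # exactly one data model per campaign
    owners: set = field(default_factory=set)
    stewards: set = field(default_factory=set)
    workflow: list = field(default_factory=list)  # ordered WorkflowStep objects

    def may_act(self, role: str, step_name: str) -> bool:
        """Return True if the role is associated with the named workflow step."""
        for step in self.workflow:
            if step.name == step_name:
                return role in step.roles
        raise KeyError(f"unknown workflow step: {step_name}")


# Illustrative campaign; all names, roles and steps are made up.
dedup = Campaign(
    name="Customer deduplication",
    campaign_type=CampaignType.MERGING,
    data_model="customer",
    owners={"olivia"},
    stewards={"sam", "tina"},
    workflow=[
        WorkflowStep("to_merge", frozenset({"steward"})),
        WorkflowStep("to_validate", frozenset({"senior_steward"})),
    ],
)

print(dedup.may_act("steward", "to_merge"))
print(dedup.may_act("steward", "to_validate"))
```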

Data Ingestion

So far we have seen that the Data Stewardship Console lets you perform operations on imported data. However, the web application itself does not seem to offer any mechanism for importing data. It appears that data can only be imported via Talend Data Integration jobs.

Talend Data Fabric 6.3.1 provides three new components which allow users to send, retrieve and delete data in the Talend Data Stewardship database:

tDataStewardshipTaskOutput

This component is used to create new tasks in an existing campaign.

tDataStewardshipTaskInput

This component is used to read the data rows of the tasks in a campaign, filtered by assigned user, state and priority.

tDataStewardshipTaskDelete

This component is used to delete tasks in a specific campaign.
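The kind of selection that the input component performs (tasks of a campaign restricted by assignee, state and priority) can be pictured with the following Python sketch. It does not call any Talend component or API; the `Task` class, the state names and the integer priority scale are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Task:
    task_id: str
    assignee: Optional[str]
    state: str       # state names below are hypothetical
    priority: int    # assumed scale: higher number means more urgent
    record: dict


def select_tasks(tasks: Iterable[Task], assignee=None, state=None,
                 min_priority=None) -> List[Task]:
    """Keep only the tasks that satisfy every filter that was given."""
    selected = []
    for task in tasks:
        if assignee is not None and task.assignee != assignee:
            continue
        if state is not None and task.state != state:
            continue
        if min_priority is not None and task.priority < min_priority:
            continue
        selected.append(task)
    return selected


# Illustrative tasks; identifiers, users and states are made up.
tasks = [
    Task("t1", "sam", "new", 2, {"name": "Jon Smith"}),
    Task("t2", "sam", "resolved", 1, {"name": "Ada Lovelace"}),
    Task("t3", "tina", "resolved", 3, {"name": "Alice Miller"}),
]

for task in select_tasks(tasks, assignee="sam", state="resolved"):
    print(task.task_id, task.record)
```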

 

Note: Talend DI jobs which use the old components tStewardshipTaskInput, tStewardshipTaskOutput and tStewardshipTaskDelete will not work with the new Stewardship Console.

Example of a Talend DI job ingesting tasks

To import tasks for a merging campaign, you use the tMatchGroup component together with the tDataStewardshipTaskOutput component. The following job generates a set of rows, compares the rows with each other and sends those that have a low matching score to the Stewardship Console:

[Screenshot: example_tDataStewardshipTaskOutput]
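The logic of such a job can be approximated in a few lines of Python. This is only a sketch of the idea, not the algorithm used by tMatchGroup: it scores pairs of rows with the standard library's `difflib.SequenceMatcher` and assumes that "low matching score" means pairs which look similar but stay below a confidence threshold. Both threshold values and the sample rows are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def match_score(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suspect_pairs(rows, match_threshold=0.7, confident_threshold=0.97):
    """Return pairs that look like duplicates but are not confident matches.

    Pairs at or above confident_threshold could be merged automatically;
    pairs below match_threshold are treated as distinct records. Only the
    pairs in between would be handed to a steward as merging tasks.
    """
    suspects = []
    for left, right in combinations(rows, 2):
        score = match_score(left, right)
        if match_threshold <= score < confident_threshold:
            suspects.append((left, right, score))
    return suspects


# Illustrative input rows.
rows = ["Jon Smith", "John Smith", "Alice Miller", "Ada Lovelace", "Ada Lovelace"]

for left, right, score in suspect_pairs(rows):
    print(f"{left!r} ~ {right!r}: {score:.2f}")
```

Comparing every pair is quadratic in the number of rows, which is fine for a sketch but would need a blocking step on real data volumes.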

Under the Hood

Talend’s Data Stewardship Console has changed a lot in version 6.3.1. It now uses MongoDB to store data, a REST interface defined with Swagger, and Kafka queues. The application also seems to have been written with Spring Boot on the server side and with the React framework on the client side; the application server used is Tomcat.

It also seems that the Stewardship Console uses the credentials created in TAC (Talend Administration Center).

It is a complete rewrite of the old Stewardship Console, which used MySQL and SOAP interfaces.

 

MongoDB

The MongoDB database (a NoSQL document database) contains the following collections:

 

  • campaignActions
  • campaigns
  • comments
  • data-events
  • participants
  • schemaReferences
  • schemas
  • tasks
  • userData

The core data entities (schemas, tasks and campaigns) are each stored in the MongoDB collection of the same name.

 

Conclusion

The Talend Stewardship Console is a powerful tool which allows data stewards to correct, filter and merge records. In 6.3.1 Talend has created a new web application with a completely new UI and three new Studio components. The application was thoroughly revamped:

 

  • The stewardship work is now much better organised, using campaigns and roles.
  • The data model must now be defined before you can start your work.

On the technology side a lot has changed: the back end is now MongoDB, the UI is React-based, and messaging is done via Kafka.

As far as we understand it, the Talend Stewardship Console does not support hierarchical schemas, which are needed for complex types, nor does it offer the option of working with dynamic schemas.
