Talend Spark Kudu Components
Three components have been created that allow Talend users to build Big Data Batch jobs with Spark using Kudu: tKuduConfiguration, tKuduInput and tKuduOutput.
Together, these components connect to Kudu and read from and write to Kudu tables.
The tKuduConfiguration component sets up the connection details for the Kudu server. At present, only the servers and ports to connect to can be configured.
The other two components, tKuduInput and tKuduOutput, use this component internally and cannot operate without it.
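As an illustration of the servers-and-ports setting, Kudu clients generally accept the master addresses as a comma-separated list of host:port pairs. The sketch below shows one way such a list could be assembled. The class name KuduMasterAddresses and its join method are hypothetical and not part of the Talend components; how tKuduConfiguration builds its connection string internally is not documented here. The fallback port 7051 is Kudu's default master RPC port.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Talend components.
class KuduMasterAddresses {
    // Kudu's default master RPC port, used when no port is given.
    static final int DEFAULT_MASTER_PORT = 7051;

    // Builds "host1:port1,host2:port2" from a host -> port mapping.
    // A null port falls back to DEFAULT_MASTER_PORT.
    static String join(Map<String, Integer> servers) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : servers.entrySet()) {
            String host = entry.getKey() == null ? "" : entry.getKey().trim();
            if (host.isEmpty()) {
                throw new IllegalArgumentException("empty host name");
            }
            int port = entry.getValue() == null ? DEFAULT_MASTER_PORT : entry.getValue();
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("invalid port: " + port);
            }
            parts.add(host + ":" + port);
        }
        return String.join(",", parts);
    }
}
```

Passing a LinkedHashMap keeps the servers in the order in which they were entered.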
The tKuduInput component reads data from a specified Kudu table as an RDD. It can either scan the whole table or filter its rows using Kudu predicates.
The supported predicates are:
3. GREATER EQUALS
5. LESS EQUALS
6. IS NULL
7. IS NOT NULL
It also allows a limit to be set on the number of output rows.
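The filtering and limit behaviour described above can be modelled in plain Java. The following is a minimal sketch of the semantics only: it does not use the Kudu client or the Talend components, and the names PredicateScanSketch, Op, matches and scan are hypothetical. It covers only the operators named in the list above and assumes, in line with Kudu's behaviour, that comparison predicates never match NULL values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative model of predicate filtering with a row limit.
// Not the Kudu API; all names here are hypothetical.
class PredicateScanSketch {
    enum Op { GREATER_EQUALS, LESS_EQUALS, IS_NULL, IS_NOT_NULL }

    // Returns true if the column value satisfies the predicate.
    // Comparison operators never match a null value.
    static boolean matches(Long value, Op op, Long operand) {
        switch (op) {
            case IS_NULL:
                return value == null;
            case IS_NOT_NULL:
                return value != null;
            case GREATER_EQUALS:
                return value != null && value >= requireOperand(operand);
            case LESS_EQUALS:
                return value != null && value <= requireOperand(operand);
            default:
                throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    private static long requireOperand(Long operand) {
        if (operand == null) {
            throw new IllegalArgumentException("comparison needs an operand");
        }
        return operand;
    }

    // Scans rows in order, keeps those matching the predicate on the given
    // column, and stops once 'limit' rows have been collected.
    static List<Map<String, Long>> scan(List<Map<String, Long>> rows, String column,
                                        Op op, Long operand, int limit) {
        List<Map<String, Long>> result = new ArrayList<>();
        for (Map<String, Long> row : rows) {
            if (result.size() >= limit) {
                break;
            }
            if (matches(row.get(column), op, operand)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

In this model the limit is applied after filtering, so it caps the number of rows returned rather than the number of rows examined.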
Simple Example Job
This job uses two configuration components, which set up a connection to HDFS (tHDFSConfiguration_1) and a connection to the Kudu server (tKuduConfiguration_1). The tKuduInput_1 component is then used to read data from a Kudu table.
The output is then consumed by a tLogRow component, although any other Spark batch component could consume it instead.
The tKuduOutput component supports the following functionality:
1. Create (or re-create) a table with two types of partitioning on the primary key columns:
a. Range partitions (currently limited to a single column)
b. Hash partitions
2. Perform writing operations on a table:
a. Insert – insert new records
b. Delete – delete existing records
c. Update – update existing records
d. Upsert – update the record or, if it does not exist, insert it as a new record
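The difference between the four write operations can be illustrated with a small in-memory model keyed by primary key. This is a sketch of the semantics only, not the Kudu client or the component's implementation; the names WriteOperationSketch, Operation and apply are hypothetical. It assumes that an insert on an existing key, and an update or delete on a missing key, are rejected; how tKuduOutput reports such rejected rows is not stated in this document.

```java
import java.util.Map;

// Illustrative model of insert / delete / update / upsert on a keyed table.
// Not the Kudu API; all names here are hypothetical.
class WriteOperationSketch {
    enum Operation { INSERT, DELETE, UPDATE, UPSERT }

    // Applies one operation to the table and returns true if it took effect.
    // Returns false when the operation is rejected (see assumptions above).
    static boolean apply(Map<Integer, String> table, Operation op, int key, String value) {
        boolean exists = table.containsKey(key);
        switch (op) {
            case INSERT:
                if (exists) {
                    return false; // key already present
                }
                table.put(key, value);
                return true;
            case DELETE:
                if (!exists) {
                    return false; // nothing to delete
                }
                table.remove(key);
                return true;
            case UPDATE:
                if (!exists) {
                    return false; // nothing to update
                }
                table.put(key, value);
                return true;
            case UPSERT:
                table.put(key, value); // update if present, insert otherwise
                return true;
            default:
                throw new IllegalArgumentException("unsupported operation: " + op);
        }
    }
}
```

Upsert is the only operation in this model that never depends on whether the key already exists, which makes it the natural choice for jobs that may be re-run on the same input.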
The components have some known limitations. The most important is the lack of support for the Date data type: Talend internally uses java.util.Date in its Avro records, and this cannot be converted to the Kudu timestamp data type.
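One possible way around the Date limitation, offered here only as an untested suggestion, is to carry date values through the job as 64-bit integers and store them in an integer column. The sketch below converts between java.util.Date and microseconds since the Unix epoch, which is the unit used by Kudu's own timestamp type. The class DateWorkaroundSketch and its methods are hypothetical, and whether this fits a given job depends on the table schema.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical workaround sketch; not part of the Talend components.
class DateWorkaroundSketch {
    // Converts a Date to microseconds since the Unix epoch (null stays null).
    static Long toEpochMicros(Date date) {
        return date == null ? null : TimeUnit.MILLISECONDS.toMicros(date.getTime());
    }

    // Converts microseconds since the Unix epoch back to a Date.
    // Sub-millisecond precision is dropped, rounding towards negative infinity.
    static Date fromEpochMicros(Long micros) {
        return micros == null ? null : new Date(Math.floorDiv(micros, 1000L));
    }
}
```

Because java.util.Date only has millisecond precision, a value converted to microseconds and back is unchanged.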
These components have been tested with Talend 6.4.1, but they should also work with Talend 6.5.1.
These components can be installed in Talend Studio as follows:
1. Unpack the provided zip file into a local folder ()
2. Start Talend Studio
3. Go to Menu “Window” -> “Preferences”
4. Type “Component” in the search field at the top left of the dialog
5. Click on “Components” in the tree on the left panel
6. Enter the path to the folder from step 1 in the field labeled “User component folder:”
7. Click on “Talend Component Designer” in the tree on the left panel
8. Enter the path to the folder from step 1 in the field labeled “Component project:”
9. Click the “OK” button