
Create DB

Create tables ..


Workshop - Create DB

If you work with databases, one of your main objectives will be to extract, load and transform your data. Steel Wheels has several data sources that must be loaded into a database so the data can be discovered, cleansed, conformed, enriched and validated for reporting.

In this guided Lab, you will:

  • Connect to a database

  • Create a database table

  • Configure the CSV file input and Table output steps


Create a new Transformation

Any one of these actions opens a new Transformation tab for you to begin designing your transformation:

  • Clicking File > New > Transformation

  • Using the CTRL-N hot key

Load sales data from a CSV file into the STG_SALES_DATA table.

Modify the VARCHAR lengths to prevent values from being truncated.

Follow the steps outlined below:

CSV File input

This step reads data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want, such as pipes, tabs or semicolons; you are not constrained to commas. Internal processing allows this step to read data quickly. The options for this step are a subset of the Text File Input step.
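For example, reading a pipe-delimited file only requires changing the Delimiter setting. An illustrative (made-up) three-column file:

    ORDERNUMBER|PRODUCTCODE|PRICEEACH
    10107|S10_1678|95.70
    10121|S18_2325|81.35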

This step has fewer overall options than the general Text File Input step, but it offers a few advantages over it:

NIO

Native system calls for reading the file mean faster performance, but reading is currently limited to local files only; there is no VFS support.

Parallel running

If you configure this step to run in multiple copies or in clustered mode, and you enable parallel running, each copy reads a separate block of a single file. This lets you distribute the file reading across several threads, or even several slave nodes in a clustered transformation.

Lazy conversion

If you will be reading many fields from the file, and many of those fields will not be manipulated but merely passed through the transformation to land in some other text file or a database, lazy conversion can prevent Kettle from performing unnecessary work on those fields, such as converting them into string, date or number objects.

  1. Drag the CSV file input step onto the canvas.

  2. Open the CSV file input properties dialog box.

  • Stepname: csvi-sales_data

  • Filename: ${Internal.Entry.Current.Directory}\sales_data.csv

  • Delimiter: , (comma)

  • Lazy Conversion: unchecked

  • Header row present: checked

The CSV file input step parses a sample of records to set the field metadata. Because the VARCHAR lengths are derived from this sample and then mapped to a database table, you need to be careful to avoid truncation; for example, if the sampled rows only contain values up to 20 characters, a later 35-character value would be truncated unless you widen the length.

  3. Ensure the following details are configured, as outlined below.

  4. Click on the Get Fields button.

  5. Click OK.

Table output

The Table Output step allows you to load data into a database table; it is equivalent to the DML operation INSERT. The step provides configuration options for the target table, plus a number of housekeeping and performance-related options such as Commit Size and Use batch update for inserts.
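Under the hood, each incoming row becomes a parameterized INSERT against the target table. A minimal sketch, assuming illustrative column names rather than this lab's exact field list:

    -- One prepared statement, executed per row; with "Use batch update for
    -- inserts" enabled, rows are grouped into batches and committed every
    -- Commit Size rows.
    INSERT INTO STG_SALES_DATA (ORDERNUMBER, PRODUCTCODE, QUANTITYORDERED, PRICEEACH)
    VALUES (?, ?, ?, ?);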

  1. Drag the Table Output step onto the canvas.

  2. Open the Table Output properties dialog box. Ensure the following details are configured, as outlined below.

  3. Click on the Database fields tab.

  4. Click on the Get Fields button. Ensure the mappings between Table fields and Stream fields are correct.

  5. Click on the SQL button. The generated DDL should look similar to the sketch after this list.

  6. Click Execute.

  7. Click OK to close all windows.
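Clicking SQL opens the SQL editor containing the generated DDL. A sketch of what it may look like, assuming illustrative column names; the VARCHAR lengths come from the sampled metadata, so widen any that look too tight:

    -- Generated DDL (illustrative); lengths widened, e.g. VARCHAR(20) -> VARCHAR(64),
    -- to prevent later rows from being truncated.
    CREATE TABLE STG_SALES_DATA
    (
      ORDERNUMBER INTEGER
    , PRODUCTCODE VARCHAR(64)
    , QUANTITYORDERED INTEGER
    , PRICEEACH NUMERIC(10, 2)
    , CUSTOMERNAME VARCHAR(64)
    );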

RUN

At some stage you will need a database management tool to check the results. Tables are usually created by executing external SQL scripts, which, if required, can be set within the scope of the Project.

  1. Click the Run button in the Canvas Toolbar.

  2. Check that the table has been created in the sampledata database.
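To verify from your database management tool, a quick check like the one below will do (syntax may vary slightly by database):

    -- Confirm the table exists and the sales rows were loaded.
    SELECT COUNT(*) AS ROW_COUNT
    FROM STG_SALES_DATA;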

Load orders data from a CSV file into the STG_ORDERS_MERGED table.

Modify the VARCHAR lengths to prevent values from being truncated.

Follow the steps outlined below:

  1. Drag the CSV file input step onto the canvas.

  2. Open the CSV file input properties dialog box. Ensure the following details are configured, as outlined below.

  3. Drag the Table Output step onto the canvas.

  4. Open the Table Output properties dialog box. Ensure the following details are configured, as outlined below.

  5. Click on the Database fields tab.

  6. Click on the Get Fields button.

  7. Click on the SQL button.

  8. Click Execute.

  9. Click OK to close all windows.

  10. Click the Run button in the Canvas Toolbar.

  11. Check that the table has been created in the sampledata database.

(Screenshots: Create databases; Load sales data: CSV file input, Table output options and fields, SQL editor, STG_SALES_DATA; Load orders data: path to orders.txt, Set Content, Get Fields, Table output options and fields, SQL editor, STG_ORDERS_MERGED.)