Pentaho Data Integration

Flat Files

How Pentaho Data Integration handles flat files.


Last updated 1 month ago

Workshops

Files are the most basic way to store data, yet they remain widely used and come in several formats: fixed width, comma-separated values (CSV), spreadsheets, and even free-format files. Pentaho Data Integration can read data from all of these file types.

TXT & CSV Files

Some of the Orders data that Steel Wheels processes is in text format. In this guided demo, you will flatten the list, create capture groups, replace text, and finally format the order_value.

In this demonstration, you will format the text file input so that it can be loaded into a database table, using the following steps:

  • Text File Input

  • Flattener

  • RegEx Evaluation

  • Replace in String

  • Select values
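To make the intent of these steps concrete, here is a minimal Python sketch of the same sequence (flatten, capture groups, replace, format) outside of PDI. It is not how the PDI steps are implemented. The sample lines, the three-lines-per-order layout, and the helper names `flatten`, `parse_record` and `format_order_value` are assumptions made for illustration only.

```python
import re

# Hypothetical input: each order is spread over three consecutive lines.
raw_lines = [
    "Order: 10100",
    "Customer: Customer A, Co.",
    "Value: $1,234.50",
    "Order: 10101",
    "Customer: Customer B",
    "Value: $987.00",
]

def flatten(lines, group_size):
    """Group consecutive lines into one record (the flattening idea)."""
    return [lines[i:i + group_size] for i in range(0, len(lines), group_size)]

# Two capture groups: the label before the colon and the value after it.
LINE_PATTERN = re.compile(r"^(\w+):\s*(.+)$")

def parse_record(record_lines):
    """Turn the lines of one record into a dict keyed by lower-cased label."""
    record = {}
    for line in record_lines:
        match = LINE_PATTERN.match(line)
        if match:
            label, value = match.groups()
            record[label.lower()] = value
    return record

def format_order_value(text):
    """Remove the currency symbol and thousands separators, then convert."""
    cleaned = text.replace("$", "").replace(",", "")
    return round(float(cleaned), 2)

orders = []
for chunk in flatten(raw_lines, 3):
    rec = parse_record(chunk)
    # Keep only the fields wanted downstream (the "select values" idea).
    orders.append({
        "order_id": int(rec["order"]),
        "customer": rec["customer"],
        "order_value": format_order_value(rec["value"]),
    })
```

After this runs, `orders` holds one dictionary per order with a numeric `order_value`, which is roughly the shape a database table load would expect.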

TXT & CSV

Steel Wheels wants to send out a survey to its customers, based on a list of questions.

In this demonstration, you will configure a text file survey using the following steps:

  • Get System Info

  • User Defined Java Expression

  • Data Grid

  • Append

  • Text File Output
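As a rough analogy for this workflow, the Python sketch below combines a date value (similar in spirit to Get System Info), a static list of questions (similar to a Data Grid), appends them in order, and writes the result as text. The survey wording, the header layout, and the function names `build_survey` and `write_survey` are hypothetical and not taken from the workshop.

```python
import io
from datetime import date

def build_survey(questions, survey_date):
    """Place a dated header before the numbered questions (append in order)."""
    header = [f"Customer Survey - {survey_date.isoformat()}", ""]
    body = [f"{n}. {q}" for n, q in enumerate(questions, start=1)]
    return header + body

def write_survey(lines, out):
    """Write one line per row to any text stream (a file or an in-memory buffer)."""
    for line in lines:
        out.write(line + "\n")

# Hypothetical static questions, standing in for rows typed into a grid.
questions = [
    "How satisfied are you with your last order?",
    "Would you recommend us to a colleague?",
]

# A fixed date keeps the example reproducible; date.today() would be the
# closer analogue of reading the system date at run time.
buffer = io.StringIO()
write_survey(build_survey(questions, date(2024, 1, 15)), buffer)
survey_text = buffer.getvalue()
```

Passing an open file object instead of the `io.StringIO` buffer writes the same lines to disk.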

Excel

Steel Wheels wishes to automate its Half-Yearly Sales and Expenses Report in Excel. The ETL process has been broken down into several workflows; the final one writes data to an Excel template once the preceding workflows have completed.

In this demonstration, you will populate the source data sheet of an Excel workbook template:

  • Excel Writer

  • CSV File Input

  • Block Step

XML

Steel Wheels has some data sources in XML format. This guided demonstration illustrates the three data source options for retrieving XML data.

In this demonstration, you will retrieve and format XML data:

  • Data Grid
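For readers who want to see what "retrieve and format XML data" amounts to, the following Python sketch parses a small XML document into flat rows using the standard library. The XML layout, element names, and the `read_orders` function are invented for illustration. They do not represent the Steel Wheels data or the PDI XML steps.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML source; the real workshop data may be structured differently.
XML_TEXT = """<orders>
  <order id="10100">
    <customer>Customer A</customer>
    <value>1234.50</value>
  </order>
  <order id="10101">
    <customer>Customer B</customer>
    <value>987.00</value>
  </order>
</orders>"""

def read_orders(xml_text):
    """Return one flat dict per <order> element, with typed fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):
        rows.append({
            "order_id": int(order.get("id")),               # attribute value
            "customer": order.findtext("customer"),          # child element text
            "order_value": float(order.findtext("value")),   # text converted to a number
        })
    return rows

rows = read_orders(XML_TEXT)
```

Each `<order>` element becomes one row, which mirrors the general idea of looping over a repeating node and mapping its attributes and children to fields.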

JSON

Steel Wheels has several JSON data sources. In this guided demonstration, you will create a simple workflow to extract the required reporting dataset.

In this demonstration, you will retrieve and format JSON data:

  • JSON Input
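Extracting a reporting dataset from JSON usually means flattening nested objects into rows. The Python sketch below shows that idea with the standard `json` module. The document structure, the field names, and the `extract_rows` function are assumptions for illustration and are not the workshop's actual data.

```python
import json

# Hypothetical JSON source: orders with nested order lines.
JSON_TEXT = """
{
  "orders": [
    {"id": 10100, "customer": "Customer A",
     "lines": [{"product": "P-001", "quantity": 30, "price": 10.0},
               {"product": "P-002", "quantity": 5, "price": 20.0}]},
    {"id": 10101, "customer": "Customer B",
     "lines": [{"product": "P-003", "quantity": 2, "price": 7.5}]}
  ]
}
"""

def extract_rows(json_text):
    """Flatten nested order lines into one row per line item."""
    data = json.loads(json_text)
    rows = []
    for order in data["orders"]:
        for line in order["lines"]:
            rows.append({
                "order_id": order["id"],
                "customer": order["customer"],
                "product": line["product"],
                "line_total": line["quantity"] * line["price"],
            })
    return rows

report_rows = extract_rows(JSON_TEXT)
```

The order-level fields are repeated on every line-item row, a common shape for a reporting dataset.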

RSS

The good old days of bulletin boards!

Many websites publish RSS feeds, which can be used to update news feeds, stock prices, sports scores, and so on.

In this demonstration, you will configure the following step:

  • RSS

This section is currently being updated for the next version.
