Pentaho Data Integration

JSON

JSON (JavaScript Object Notation) is a lightweight data interchange format that's easy for humans to read and write, and simple for machines to parse and generate. It uses a text-based structure with key-value pairs and arrays to represent data. JSON is language-independent and widely used for transmitting data in web applications.

To extract key-value pairs from a JSON object such as the one below in Pentaho Data Integration, you would typically use the "JSON Input" step.

{
  "customer": {
    "id": 1001,
    "name": "John Doe",
    "email": "john.doe@example.com",
    "active": true
  }
}

In the JSON Input step, each data stream field is defined by its name, JSON Path and data type.

| Name | Path | Type |
| --- | --- | --- |
| id | $.customer.id | Integer |
| name | $.customer.name | String |
| email | $.customer.email | String |
| active | $.customer.active | Boolean |

Here's a brief explanation of the JSON Path notation used:

$ represents the root of the JSON document

.customer navigates to the "customer" object

.id, .name, .email, and .active access the respective fields within the "customer" object
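The path notation above can be tried outside PDI with a minimal Python sketch. It assumes only simple dot-separated paths (no arrays, wildcards or filters) and is not how the JSON Input step is implemented; `resolve_path` is a hypothetical helper written for this illustration.

```python
import json

# The sample customer document from this page.
SAMPLE = """
{
  "customer": {
    "id": 1001,
    "name": "John Doe",
    "email": "john.doe@example.com",
    "active": true
  }
}
"""


def resolve_path(document, path):
    """Resolve a simple dot-separated path such as '$.customer.id'.

    Hypothetical helper: supports only '$' followed by object keys.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$' (the document root)")
    current = document
    # Drop the leading '$', then walk down one key at a time.
    for key in path[1:].split("."):
        if key == "":
            continue  # the empty piece before the first '.'
        current = current[key]
    return current


document = json.loads(SAMPLE)

# Field name -> JSON Path, mirroring the field definitions above.
fields = {
    "id": "$.customer.id",
    "name": "$.customer.name",
    "email": "$.customer.email",
    "active": "$.customer.active",
}

row = {name: resolve_path(document, path) for name, path in fields.items()}
print(row)
```

Each entry in `fields` plays the role of one field definition: a field name paired with the path that locates its value.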

Workshops

Pentaho Data Integration offers several specialized steps for working with JSON data in your ETL processes.

The JSON Input step reads JSON data from files or fields, supporting complex nested structures and JSON Path expressions for precise data extraction. It handles arrays and provides options for managing missing values.
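As a rough illustration of what handling arrays and missing values means, the sketch below turns each element of an array into one row and substitutes a default when a key is absent. The `orders` document and the `get_or_default` helper are hypothetical and are not taken from the workshop files or from PDI itself; in JSON Path notation the same fields would typically be addressed with paths such as `$.orders[*].order_id`.

```python
import json

# Hypothetical sample document, used only for this illustration.
SAMPLE = """
{
  "orders": [
    {"order_id": 1, "customer": {"name": "John Doe"}, "total": 25.5},
    {"order_id": 2, "customer": {"name": "Jane Roe"}}
  ]
}
"""


def get_or_default(obj, keys, default=None):
    """Follow a list of keys into nested objects; return default if any key is missing."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


document = json.loads(SAMPLE)

# One output row per array element; a missing key becomes None.
rows = [
    {
        "order_id": get_or_default(order, ["order_id"]),
        "customer_name": get_or_default(order, ["customer", "name"]),
        "total": get_or_default(order, ["total"]),
    }
    for order in document["orders"]
]

for row in rows:
    print(row)
```

The second order has no `total` key, so its row carries `None` in that field instead of stopping the run with an error.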

JSON Output converts your transformation data into JSON format, with control over formatting, file output options, and the ability to create both objects and arrays.
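The difference between writing objects and writing an array can be sketched in Python with the standard `json` module. This shows only the shape of the output, not the JSON Output step's behaviour; the second row and the `customers` key are hypothetical.

```python
import json

# Rows as they might arrive from a transformation.
# The second row is hypothetical illustration data.
rows = [
    {"id": 1001, "name": "John Doe", "active": True},
    {"id": 1002, "name": "Jane Roe", "active": False},
]

# Option 1: one JSON object per row, one per line.
object_lines = [json.dumps(row) for row in rows]
for line in object_lines:
    print(line)

# Option 2: all rows in a single array under one key, pretty-printed.
as_array = json.dumps({"customers": rows}, indent=2)
print(as_array)
```

Option 1 suits consumers that read one record at a time, while Option 2 produces a single document that must be parsed as a whole.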

The REST Client step connects with REST APIs that typically use JSON, handling authentication, headers, and processing the returned JSON responses for further transformation.

Common workflows include API integration, JSON file processing, and complex JSON transformations, often using these steps in combination for effective data handling.

Read JSON

In this workshop, our standard customer orders file is in JSON format. We will simply onboard the required fields into our data stream.


Last updated 1 month ago
