Pentaho Data Integration

XML

Data exchange & storage ..


Last updated 1 month ago

XML & XPath

XML (eXtensible Markup Language) is a versatile format for structuring and storing data, widely used for configuration files and for exchanging data between systems. XPath (XML Path Language) is a W3C-standard query language for navigating an XML document and extracting specific data from it.

XPath treats an XML document as a tree of nodes and provides expressions for selecting nodes from that tree. There are several node types; element, attribute, and text nodes are the most common. For example, document and order are nodes in the sample file.
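To make the "tree of nodes" idea concrete, the sketch below parses a small XML string with Python's standard xml.etree.ElementTree module and walks the tree, reporting element, attribute, and text nodes. The sample content and the describe() helper are hypothetical illustrations, not the document's sample file. Note that ElementTree stores attributes and text on the element object rather than as separate nodes, so it only approximates the XPath data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the page's own sample file is not reproduced here.
SAMPLE = """<document>
  <order id="1001">
    <customer>Acme</customer>
    <total>250.00</total>
  </order>
</document>"""

root = ET.fromstring(SAMPLE)


def describe(element, depth=0):
    """Recursively list element, attribute, and text nodes with indentation."""
    indent = "  " * depth
    lines = [f"{indent}element: {element.tag}"]
    # Attribute nodes belong to the element but are not its children in XPath.
    for name, value in element.attrib.items():
        lines.append(f"{indent}  attribute: {name}={value}")
    # Skip whitespace-only text (indentation between tags).
    text = (element.text or "").strip()
    if text:
        lines.append(f"{indent}  text: {text}")
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


print("\n".join(describe(root)))
```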

Nodes are related to one another. Every node except the root has a parent, and a node can have zero or more children, as well as siblings, ancestors, and descendants, depending on its position in the hierarchy. To select nodes, you write a path expression: either an absolute path that starts at the root (/), or a relative path evaluated from the current (context) node.
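The relationships above can be explored with the limited XPath subset supported by Python's xml.etree.ElementTree. This is a sketch with hypothetical sample data: ElementTree evaluates paths relative to an element (the context node), so a full-XPath absolute path such as /document/order/item is written here as a relative path from the root element. ElementTree also keeps no parent pointers, so the sketch builds a child-to-parent map to reach parents and siblings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration only.
SAMPLE = """<document>
  <order id="1001">
    <item sku="A1">Widget</item>
    <item sku="B2">Gadget</item>
  </order>
  <order id="1002">
    <item sku="C3">Gizmo</item>
  </order>
</document>"""

root = ET.fromstring(SAMPLE)  # context node: the <document> element

# Children of the context node (relative path, one step).
order_ids = [o.get("id") for o in root.findall("order")]

# Descendants at any depth: ".//" abbreviates the descendant-or-self axis.
all_skus = [i.get("sku") for i in root.findall(".//item")]

# Attribute predicate followed by a child step.
first_order_items = [i.text for i in root.findall("order[@id='1001']/item")]

# Parent: ElementTree has no parent pointers, so map each child to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}
gizmo = root.find(".//item[@sku='C3']")
gizmo_parent_id = parent_of[gizmo].get("id")

# Siblings: the other children of the same parent.
widget = root.find(".//item[@sku='A1']")
siblings = [s.text for s in parent_of[widget] if s is not widget]

print(order_ids, all_skus, first_order_items, gizmo_parent_id, siblings)
```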

Workshops

The "Get Data from XML" step in Pentaho Data Integration extracts data from XML files using XPath expressions. It converts hierarchical XML structures into tabular formats suitable for database loading and downstream processing.
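The idea of flattening a hierarchy into rows can be sketched outside PDI: one path selects the repeating "loop" node, and each output field is read with a path relative to that node. The sketch below is not PDI's implementation; LOOP_PATH, FIELDS, and xml_to_rows() are hypothetical names, and every value comes back as a string, whereas the real step converts types according to its field settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration only.
SAMPLE = """<document>
  <order id="1001"><customer>Acme</customer><total>250.00</total></order>
  <order id="1002"><customer>Globex</customer><total>75.50</total></order>
</document>"""

# Stands in for a loop XPath such as /document/order (relative to the root here).
LOOP_PATH = "order"

# Output field name -> path relative to each loop node.
FIELDS = {"order_id": "@id", "customer": "customer", "total": "total"}


def xml_to_rows(xml_text, loop_path, fields):
    """Return one dict (row) per loop node, with one entry per field."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(loop_path):
        row = {}
        for name, path in fields.items():
            if path.startswith("@"):
                # ElementTree's find() cannot select attributes, so read them directly.
                row[name] = node.get(path[1:])
            else:
                child = node.find(path)
                row[name] = child.text if child is not None else None
        rows.append(row)
    return rows


for row in xml_to_rows(SAMPLE, LOOP_PATH, FIELDS):
    print(row)
```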

The step is particularly useful when integrating systems that exchange data as XML, because it lets ETL developers precisely target specific elements within complex documents. It also supports parameter (variable) substitution, so the same transformation can be adapted to different XML sources without editing the step.
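PDI resolves ${VARIABLE} references in step settings at runtime. The sketch below only mimics that behaviour with Python's string.Template, which happens to use the same ${NAME} syntax; it is not how PDI implements substitution. The settings keys, variable names, and values are hypothetical.

```python
from string import Template

# Hypothetical step settings containing ${...} variable references.
settings = {
    "filename": "${INPUT_DIR}/orders_${RUN_DATE}.xml",
    "loop_xpath": "/document/${ENTITY}",
}


def resolve(settings, variables):
    """Substitute ${NAME} references; raises KeyError if a variable is undefined."""
    return {key: Template(value).substitute(variables) for key, value in settings.items()}


# Hypothetical values, e.g. supplied as parameters when the transformation runs.
resolved = resolve(settings, {"INPUT_DIR": "/data/in", "RUN_DATE": "2024-01-31", "ENTITY": "order"})
print(resolved)
```

Using substitute() rather than safe_substitute() makes a missing variable fail loudly instead of leaving a literal ${...} in the file name.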

Read XML


XPath Tutorial
Link to XPath tutorial
XPath
Parse XML