Pentaho Data Integration

Hitachi Content Platform

Object Storage

Last updated 1 year ago

This workshop refers to the PDI/PBA + PDO/PDC + HCP Skytap environment.

Hitachi Content Platform (HCP) is an object-storage solution designed for efficient and secure data management. It allows organizations to store, protect, and retrieve vast amounts of unstructured data with ease.

HCP integrates seamlessly with various applications and provides advanced features such as data deduplication, compression, and encryption. Its scalable architecture and robust governance capabilities make it suitable for both on-premises and cloud environments, ensuring data integrity and accessibility.

HCP stores objects in a repository. Each object permanently associates the data that HCP receives (for example, a document, an image, or a movie) with information about that data, called metadata.

In PDI, you can query the metadata to locate and access HCP objects. An HCP object consists of a read-only file, a unique URL, system metadata properties, and custom metadata annotations.
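
As a rough mental model, the four parts of an HCP object described above can be sketched as a small Python data structure. This is only an illustration: the class name `HcpObject`, its field names, the metadata keys, and the helper `find_by_metadata` are all hypothetical and are not part of PDI or HCP. Custom annotations are simplified here to name/value pairs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HcpObject:
    """Hypothetical model of an HCP object (not a PDI or HCP API)."""
    url: str                      # unique URL that identifies the object
    data: bytes                   # the read-only file content
    system_metadata: dict[str, str] = field(default_factory=dict)
    custom_annotations: dict[str, str] = field(default_factory=dict)


def find_by_metadata(objects, key, value):
    """Return the objects whose system or custom metadata has key == value."""
    return [
        obj for obj in objects
        if obj.system_metadata.get(key) == value
        or obj.custom_annotations.get(key) == value
    ]


# Illustrative data only; the URLs and metadata keys are made up.
repository = [
    HcpObject("https://example.invalid/rest/report.pdf", b"%PDF",
              {"type": "document"}, {"department": "finance"}),
    HcpObject("https://example.invalid/rest/logo.png", b"\x89PNG",
              {"type": "image"}, {"department": "marketing"}),
]

finance_objects = find_by_metadata(repository, "department", "finance")
```

The point of the sketch is that the lookup runs against the metadata, not the file content, which mirrors how objects are located by querying metadata.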

A VFS (Virtual File System) connection allows you to integrate and manage different storage systems within PDI, abstracting the complexities of underlying protocols. It provides a unified interface to access a variety of storage backends like Amazon S3, Azure Data Lake, Google Cloud Storage, and more.
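
To illustrate the idea of a unified interface, the sketch below builds and splits a path that addresses a file through a named VFS connection. It assumes the `pvfs://<connection name>/<path>` form used by recent PDI releases; check the form against your PDI version. The connection name `hcp_lab` and the file path are hypothetical, and the helper functions are not part of PDI.

```python
from urllib.parse import urlsplit


def build_pvfs_url(connection_name: str, path: str) -> str:
    """Join a VFS connection name and a path into a pvfs:// URL (assumed form)."""
    return f"pvfs://{connection_name}/{path.lstrip('/')}"


def split_pvfs_url(url: str) -> tuple[str, str]:
    """Split a pvfs:// URL into (connection name, path inside the connection)."""
    parts = urlsplit(url)
    if parts.scheme != "pvfs":
        raise ValueError(f"not a pvfs URL: {url!r}")
    return parts.netloc, parts.path.lstrip("/")


# Hypothetical connection name and object path.
url = build_pvfs_url("hcp_lab", "/sales/orders.csv")
connection, path = split_pvfs_url(url)
```

Under this assumption, a transformation refers only to the connection name, so the same path style applies whichever storage backend the connection points to.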

Create a VFS connection

Perform the following steps to create a VFS connection in PDI:

  1. Start the PDI client (Spoon) and create a new transformation or job.

  2. In the View tab of the Explorer pane, right-click on the VFS Connections folder, and then click New. The New VFS connection dialog box opens.

  3. In the Connection name field, enter a name that uniquely describes this connection. The name can contain spaces, but it cannot include special characters, such as #, $, and %.

  4. In the Connection type field, select one of the following types:

     Amazon S3 / MinIO / HCP (Default):

    • Simple Storage Service (S3) accesses the resources on Amazon Web Services.

    • MinIO accesses data objects on an Amazon S3-compatible storage server.

    • HCP connections use the S3 protocol to access Hitachi Content Platform. See Access to HCP REST for more information.
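
The naming rule in step 3 can be checked before you type a name into the dialog box. The sketch below is a minimal illustration, not PDI's own validation: the function `validate_connection_name` is hypothetical, and it rejects only the three characters named in the step (#, $, and %), because the full list of disallowed characters is not given here.

```python
# Characters named in step 3; PDI may reject others as well (assumption).
FORBIDDEN_CHARS = frozenset("#$%")


def validate_connection_name(name, existing_names=()):
    """Return a list of problems with a proposed VFS connection name."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains special characters: " + " ".join(bad))
    if name in existing_names:
        problems.append("name is already in use")
    return problems


# Spaces are allowed; special characters and duplicates are reported.
ok = validate_connection_name("HCP Lab Connection")
bad = validate_connection_name("HCP#1 $prod", existing_names=["HCP#1 $prod"])
```

An empty list means the name passes these checks; each string in a non-empty list describes one reason the name would need to change.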

Access to HCP REST
HCP Solution
Metadata