Pentaho Data Integration

Object Stores

Introduction

Object storage systems are designed to handle large amounts of unstructured data such as documents, images, videos, and backups. They store data as objects in a flat namespace rather than as files in a directory hierarchy. Each object contains the data itself, its metadata, and a unique identifier.
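The object model described above can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular product is implemented: `StoredObject` and `ToyObjectStore` are hypothetical names, the store lives in memory, and deriving the identifier from a SHA-256 hash of the content is an assumption made for the example (real systems often use user-supplied keys or generated IDs).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the payload, its metadata, and a unique identifier."""
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory store with a flat namespace (no directory tree)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        # Assumption for this sketch: the identifier is a hash of the content.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = StoredObject(object_id, data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        # Lookup is by identifier only; there is no path to walk.
        return self._objects[object_id]


store = ToyObjectStore()
oid = store.put(b"quarterly report", content_type="text/plain", owner="finance")
obj = store.get(oid)
print(obj.object_id[:12], obj.metadata)
```

The point of the sketch is that retrieval needs only the identifier, and the metadata travels with the object rather than living in a separate file system structure.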

MinIO is an open-source object storage solution that is compatible with the Amazon S3 API. It is particularly popular for private cloud deployments and can run on-premises or in any cloud environment. MinIO is built for high-performance workloads and is often deployed on Kubernetes alongside containerized applications.

Amazon S3 (Simple Storage Service) is the de facto standard for cloud object storage. It offers virtually unlimited scalability, is designed for 99.999999999% (11 nines) durability, and integrates extensively with other AWS services. It provides different storage classes (such as Standard, Infrequent Access, and Glacier) to optimize costs based on access patterns.

Hitachi Content Platform (HCP) is an enterprise-grade object storage system that focuses on data governance, compliance, and security. It offers advanced features such as data classification, retention policies, and WORM (Write Once, Read Many) capabilities. HCP can be deployed on-premises or in hybrid cloud configurations and supports multiple access protocols, including an S3-compatible API.

Workshops

The Virtual File System (VFS) is an abstraction layer that provides a unified interface for accessing different types of file systems and file storage. It creates a consistent programming interface that hides the specific details of the underlying storage mechanisms.

VFS allows applications to access files across various storage types—local disks, network locations, cloud storage, archives, FTP servers, SFTP sites, HTTP resources, and more—using a single, consistent API. This eliminates the need to implement separate code for each storage type.

Key benefits include location transparency (uniform access regardless of physical location), protocol independence (same operations across different protocols), and enhanced functionality (metadata access, caching, security controls). VFS implementations are common in operating systems (Linux VFS), programming frameworks (Apache Commons VFS), and data processing tools (Pentaho Data Integration's VFS support).
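To make the idea of a single interface over several storage types concrete, here is a minimal Python sketch. It is not the Apache Commons VFS or Pentaho API: `ToyVFS`, `LocalBackend`, and `MemoryBackend` are hypothetical names, and the `mem://` scheme is invented here as a stand-in for a remote store. The caller uses the same `read` call regardless of where the bytes live.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import url2pathname


class LocalBackend:
    """Reads from the local disk for file:// URIs."""

    def read(self, parts):
        with open(url2pathname(parts.path), "rb") as handle:
            return handle.read()


class MemoryBackend:
    """Stand-in for a remote store; 'mem' is a made-up scheme for this sketch."""

    def __init__(self, blobs):
        self._blobs = blobs

    def read(self, parts):
        return self._blobs[(parts.netloc, parts.path)]


class ToyVFS:
    """Dispatches each URI to the backend registered for its scheme."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme, backend):
        self._backends[scheme] = backend

    def read(self, uri):
        parts = urlsplit(uri)
        return self._backends[parts.scheme].read(parts)


with tempfile.TemporaryDirectory() as tmp:
    local_path = Path(tmp) / "sample.txt"
    local_path.write_bytes(b"from disk")

    vfs = ToyVFS()
    vfs.register("file", LocalBackend())
    vfs.register("mem", MemoryBackend({("bucket", "/sample.txt"): b"from memory"}))

    # Same call, two different storage types.
    local_bytes = vfs.read(local_path.as_uri())
    remote_bytes = vfs.read("mem://bucket/sample.txt")

print(local_bytes, remote_bytes)
```

Adding support for another storage type in this design means registering one more backend; the calling code does not change, which is the protocol independence described above.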

In tools like Pentaho, VFS enables seamless reading and writing of data across diverse storage systems using a standardized URI-based path notation, significantly simplifying data integration workflows involving multiple storage technologies.
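As a rough illustration of URI-based path notation, the sketch below splits a few VFS-style paths into scheme, authority, and path segments using Python's standard library. The bucket, connection, and file names are placeholders. The `pvfs://` form, with a connection name followed by the path, is shown as an assumption about how a named VFS connection is typically addressed in Pentaho; check the product documentation for the exact syntax in your version.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Break a VFS-style URI into its scheme, authority, and path segments."""
    parts = urlsplit(uri)
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "segments": segments,
    }


# Placeholder URIs; none of these names come from a real environment.
examples = [
    "file:///tmp/sales.csv",
    "s3://example-bucket/raw/sales.csv",
    "pvfs://example-connection/example-bucket/raw/sales.csv",  # assumed form
]

for uri in examples:
    print(describe(uri))
```

Only the scheme and authority differ between the three paths; the remainder reads like an ordinary file path, which is what lets a transformation switch storage back ends by changing a single string.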

MinIO

This workshop shows how to configure a VFS connection to a MinIO object store.

