Pentaho Data Integration

MinIO

Access an S3-type object store through a VFS connection.

Workshop - MinIO

MinIO is a high-performance, Kubernetes-native object storage system designed for cloud-native applications. Built from the ground up to be compatible with Amazon S3, MinIO offers a lightweight yet powerful alternative for organizations looking to deploy object storage in their own infrastructure.

At its core, MinIO provides high-performance distributed object storage. MinIO states that it can handle millions of operations per second and store petabytes of data while maintaining sub-millisecond latency. It attributes this performance to a simplified architecture that eliminates complex dependencies and is optimized for modern hardware.

One of MinIO's key strengths is its versatility. It can be deployed virtually anywhere: on bare-metal servers and in public, private, and edge cloud environments. Organizations particularly value its integration with Kubernetes, which makes it a good fit for containerized environments. Because MinIO is open source, it also offers the transparency and flexibility that many enterprises require for their data infrastructure.

The following section is for reference only.

Samba has been installed and configured on both Windows & Linux servers.

Setup

  1. Update system packages.

sudo apt update && sudo apt upgrade -y
  2. Create a MinIO directory.

cd
mkdir -p ~/MinIO/data
ls -al
  3. Create a docker-compose.yml file.

cd ~/MinIO
nano docker-compose.yml
  4. Add the following:

services:
  minio:
    image: minio/minio:latest
    container_name: minio
    restart: always
    ports:
      - "9000:9000"
      - "9002:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"
volumes:
  minio_data:
  5. Run MinIO.

cd ~/MinIO
docker-compose up -d
  6. Access the MinIO Console at http://localhost:9002:

Username: minioadmin

Password: minioadmin

The Console port has been changed to prevent conflicts: container port 9001 is published on host port 9002. The S3 API remains on port 9000.
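The setup steps above can also be scripted. This is a minimal sketch that assumes bash and curl on the host. `write_minio_compose` and `wait_for_minio` are hypothetical helper names. The health check assumes MinIO's `/minio/health/live` liveness endpoint on the S3 API port (9000).

```shell
#!/usr/bin/env bash
# Sketch only: scripts the Setup steps. Function names are hypothetical helpers.

# Writes the docker-compose.yml shown above into the given directory
# (default ~/MinIO) and creates the data folder, without opening nano.
write_minio_compose() {
  local dir="${1:-$HOME/MinIO}"
  mkdir -p "$dir/data"
  # Quoted 'EOF' stops the shell from expanding anything inside the YAML.
  cat > "$dir/docker-compose.yml" <<'EOF'
services:
  minio:
    image: minio/minio:latest
    container_name: minio
    restart: always
    ports:
      - "9000:9000"
      - "9002:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"
volumes:
  minio_data:
EOF
}

# Polls MinIO's liveness endpoint (assumed: /minio/health/live on port 9000).
# Returns 0 as soon as it answers, 1 after the given number of attempts.
wait_for_minio() {
  local url="${1:-http://localhost:9000}"
  local attempts="${2:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP errors as failure; -s: quiet; the body is discarded.
    if curl -fs -o /dev/null --max-time 2 "$url/minio/health/live"; then
      echo "MinIO is up at $url"
      return 0
    fi
    sleep 1
  done
  echo "MinIO did not respond at $url after $attempts attempts" >&2
  return 1
}

# Usage:
#   write_minio_compose
#   (cd ~/MinIO && docker-compose up -d) && wait_for_minio
#   # then open the Console at http://localhost:9002
```

Waiting on the liveness endpoint before opening the Console avoids login errors while the container is still starting.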


MinIO CLI

The AWS Command Line Interface (CLI) seamlessly integrates with MinIO, providing a consistent experience for managing S3-compatible object storage. By specifying MinIO's endpoint URL with the --endpoint-url parameter, users can leverage familiar AWS S3 commands for operations like creating buckets, uploading files, and managing permissions.

  1. Install AWS CLI.

sudo apt install awscli -y
  2. Configure AWS CLI for MinIO.

aws configure --profile minio

Enter the following values at the prompts:

Setting              Value
Access Key ID        minioadmin
Secret Access Key    minioadmin
Region               us-east-1
Output format        json

  3. Create a MinIO bucket.

aws --endpoint-url http://localhost:9000 --profile minio s3 mb s3://my-bucket
  4. Upload a file.

echo "Hello Minio" > test.txt
aws --endpoint-url http://localhost:9000 --profile minio s3 cp test.txt s3://my-bucket/
  5. List files.

aws --endpoint-url http://localhost:9000 --profile minio s3 ls s3://my-bucket/
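The AWS CLI steps above can be scripted so that the profile is set without interactive prompts and the endpoint is not retyped for every command. This is a minimal sketch. `configure_minio_profile`, `s3minio`, and the `MINIO_ENDPOINT` variable are hypothetical names. `aws configure set` is a standard AWS CLI subcommand.

```shell
#!/usr/bin/env bash
# Sketch only: scripts the AWS CLI steps. Helper names are hypothetical.

# Non-interactive equivalent of answering the `aws configure --profile minio`
# prompts; each call stores one value in the AWS CLI's own config files.
configure_minio_profile() {
  local profile="${1:-minio}"
  aws configure set aws_access_key_id minioadmin --profile "$profile"
  aws configure set aws_secret_access_key minioadmin --profile "$profile"
  aws configure set region us-east-1 --profile "$profile"
  aws configure set output json --profile "$profile"
}

# Wrapper that always adds the MinIO endpoint and profile to `aws s3`.
# MINIO_ENDPOINT (hypothetical) defaults to the endpoint used above.
MINIO_ENDPOINT="${MINIO_ENDPOINT:-http://localhost:9000}"
s3minio() {
  aws --endpoint-url "$MINIO_ENDPOINT" --profile minio s3 "$@"
}

# Usage:
#   configure_minio_profile
#   s3minio mb s3://my-bucket
#   echo "Hello Minio" > test.txt && s3minio cp test.txt s3://my-bucket/
#   s3minio ls s3://my-bucket/
```

Using `aws configure set` rather than appending to the `~/.aws` files by hand means rerunning the function updates the existing values instead of adding duplicate profile sections.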

Virtual File Systems

PDI allows you to establish connections to most Virtual File Systems (VFS) through VFS connections. These connections store the necessary properties to access specific file systems, eliminating the need to repeatedly enter configuration details.

Once you've added a VFS connection in PDI, you can reference it whenever you need to work with files or folders on that Virtual File System. This streamlines your workflow by allowing you to reuse connection information across multiple steps.

For instance, if you're working with Hitachi Content Platform (HCP), you can create a single VFS connection and then use it throughout all HCP transformation steps. This approach saves time and ensures consistency by removing the need to re-enter credentials or access information for each data operation.

  1. Create a new transformation.

  2. Click the 'View' tab.

  3. Right-click VFS Connections and select New.

  4. Enter the following details:

Setting               Value
Connection Name       MinIO:my-bucket
Connection Type       Amazon S3/Minio/HCP
Description           Connection to my-bucket
S3 Connection Type    Minio/HCP
Access Key            minioadmin
Secret Key            minioadmin
Endpoint              http://localhost:9000
Signature Version     AWSS3V4SignerType
Root Folder Path      /

The Endpoint is the S3 API port (9000), not the Console port (9002).

  5. Test the connection.
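If the connection test fails, you can check from the shell whether the same endpoint, keys, and bucket work outside PDI. This is a minimal sketch that assumes the `minio` AWS CLI profile from the MinIO CLI section. `check_minio_bucket` is a hypothetical helper. `aws s3api head-bucket` is a standard AWS CLI call that exits non-zero when the bucket is missing or access is denied.

```shell
#!/usr/bin/env bash
# Sketch only: checks the VFS connection's endpoint/keys/bucket outside PDI.
# check_minio_bucket is a hypothetical helper; assumes the `minio` CLI profile.
check_minio_bucket() {
  local bucket="${1:-my-bucket}"
  local endpoint="${2:-http://localhost:9000}"
  # head-bucket exits non-zero if the bucket is missing or access is denied.
  if aws --endpoint-url "$endpoint" --profile minio s3api head-bucket \
      --bucket "$bucket" >/dev/null 2>&1; then
    echo "OK: $bucket reachable at $endpoint"
  else
    echo "FAIL: cannot reach $bucket at $endpoint (check endpoint, keys, bucket)" >&2
    return 1
  fi
}

# Usage:
#   check_minio_bucket my-bucket http://localhost:9000
```

If this check succeeds but the PDI test still fails, the problem is more likely in the VFS connection settings than in MinIO itself.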
