Pentaho Data Integration

SMB

File sharing ..



Workshop - SMB/CIFS

The Server Message Block (SMB) protocol is a network file sharing protocol that allows applications on a computer to read and write files and to request services from server programs in a computer network. The SMB protocol can be used on top of TCP/IP or other network protocols.

The objectives of this workshop are to:

  • install & configure a basic Samba server.

  • share user home directories and provide read-write anonymous access to a selected directory.

Samba

Samba is a suite of applications that implements the Server Message Block (SMB) protocol. Many operating systems, including Microsoft Windows, use the SMB protocol for client-server networking. Samba enables Linux / Unix machines to communicate with Windows machines in a network. Samba is open source software.

The following section is for reference only.

Samba has been installed and configured on both Windows & Linux servers.

  1. Double-click the Samba executable file that you downloaded in Step 1.

  2. In the Samba installation wizard, click Next to proceed.

  3. Accept the license agreement and click Next.

  4. Choose the destination folder where you want to install Samba and click Next.

  5. Choose the components that you want to install (we recommend selecting all components) and click Next.

  6. Configure the Samba Server settings (e.g., global workgroup, security mode, etc.) as per your requirements and click Next.

  7. Choose a password for the administrative account that you will use to manage the Samba server and click Next.

  8. Review the installation summary and click Next to start the installation process.

  9. After the installation is complete, click Finish to close the installer.


  1. Open the Samba configuration file (smb.conf) located in the installation directory.

  2. Edit the smb.conf file to configure the necessary settings (e.g., share directories, users and groups, etc.). You can refer to the Samba documentation for more information on configuring smb.conf.

  3. Save the changes to the smb.conf file.


  1. Press the Windows key + R to open the Run dialog box.

  2. Type services.msc and click OK to open the Services Manager window.

  3. Find the Samba service (named "SMB" or "Samba") in the list of services.

  4. Right-click the Samba service and choose Start.

  5. Wait for a few seconds until the Samba service starts.


  1. Open the File Explorer on your Windows 10 system.

  2. In the address bar, type \\localhost or \\[your-computer-name] and press Enter.

  3. You will be prompted to enter your Samba username and password. Enter the credentials that you specified in Step 2.

  4. You should now see a list of shared directories on your Samba server. Double-click the directories to access their contents.


Installed on Ubuntu 22.04.

  1. Ensure all installed packages are up to date.

sudo apt update && sudo apt upgrade -y
  2. Install the Samba server.

sudo apt install tasksel
sudo tasksel install samba-server
  3. Make a backup copy of the existing configuration file and create a new /etc/samba/smb.conf with the comment lines and blank lines stripped out.

sudo cp /etc/samba/smb.conf /etc/samba/smb.conf_backup
sudo bash -c 'grep -v -E "^#|^;" /etc/samba/smb.conf_backup | grep . > /etc/samba/smb.conf'
  4. Add the pentaho user to Samba. Any user on the Samba user list must also exist in the /etc/passwd file.

sudo smbpasswd -a pentaho
New SMB password: password
Retype new SMB password: password
Added user pentaho.
  5. Open the configuration file to add the home directory and public shares.

sudo nano /etc/samba/smb.conf
  6. Copy & paste the following at the bottom of the file - private home & public access.

[homes]
   comment = Home Directories
   browseable = yes
   read only = no
   create mask = 0700
   directory mask = 0700
   valid users = %S
[public]
   comment = public anonymous access
   path = /var/samba/
   browseable = yes
   create mask = 0660
   directory mask = 0771
   writable = yes
   guest ok = yes
  7. Save.

CTRL + O
Enter
CTRL + X
  8. Create the directory that backs the public share and change its access permissions.

sudo mkdir /var/samba
sudo chmod 777 /var/samba/
  9. Restart the Samba server.

sudo systemctl restart smbd
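
The grep filter used above to create the new smb.conf can be tried on a scratch file first, to see exactly what it keeps. The sketch below is illustrative only: the sample file contents and the temporary directory are assumptions, and nothing under /etc/samba is touched.

```shell
# Work in a throwaway directory so the real /etc/samba/smb.conf is untouched.
workdir=$(mktemp -d)

# Hypothetical sample: a '#' comment, a ';' comment, a blank line, and
# three lines of real configuration.
cat > "$workdir/smb.conf_backup" <<'EOF'
# Sample configuration with the comment styles found in smb.conf
; a disabled option
[global]
   workgroup = WORKGROUP

   server string = scratch copy
EOF

# Same pipeline as in the workshop: drop lines that start with '#' or ';',
# then drop empty lines ('grep .' keeps only lines with at least one character).
grep -v -E "^#|^;" "$workdir/smb.conf_backup" | grep . > "$workdir/smb.conf"

cat "$workdir/smb.conf"
```

Only the section header and the two settings remain. Note that the pattern matches comment markers at the start of a line only, so an indented comment would be kept.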

SMB Server

The Samba server is set up with a shareable public directory, /var/samba/, that can be accessed anonymously.

It also shares the home directory of the 'pentaho' user, /home/pentaho. You need to be a registered user with a password to access this directory.

  1. Let’s create some test files.

touch /var/samba/public-share 
touch /home/pentaho/home-share
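
The test files above are created locally, so the create mask and directory mask values from the share definitions do not apply to them. For files and directories created through a share, Samba combines the requested permission bits with the mask using a bitwise AND, so any bit missing from the mask is removed. A minimal sketch of that arithmetic follows; the helper name apply_mask and the requested modes 0744 and 0777 are assumptions chosen for illustration, not values taken from this setup.

```shell
# apply_mask REQUESTED MASK
# Hypothetical helper: print REQUESTED AND MASK as a four-digit octal mode.
apply_mask() {
    printf '%04o\n' "$(( $1 & $2 ))"
}

apply_mask 0744 0660   # file created in [public]      -> 0640
apply_mask 0777 0771   # directory created in [public] -> 0771
apply_mask 0744 0700   # file created in [homes]       -> 0700
```

With these masks, files in the public share never get execute bits or any permissions for other users, while home directory content stays private to its owner.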

Public

A 'public' directory that can be accessed from any machine.

  1. In File Explorer, select: + Other Locations.

  2. Enter the following connection details:

smb://pentaho.local/public/
  3. Connect as: Anonymous.

  4. You should see the public-share file.

Registered

Only registered users can access the /home/pentaho directory.

  1. In File Explorer, select: + Other Locations.

  2. Enter the following connection details:

smb://pentaho.local/homes/
  3. Connect as: Registered User.

Username: pentaho

Domain: WORKGROUP

Password: password

  4. You should see the home-share file in the pentaho home directory.
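
Both connections can also be checked from a terminal with smbclient, the command-line client from the Samba suite. This is a sketch under assumptions: the smbclient package is installed, the server is reachable as pentaho.local, and the wrapper function smb is a hypothetical helper that only prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical wrapper: print the command by default; run it only when
# DRY_RUN=0 is set and smbclient is actually installed.
smb() {
    if [ "${DRY_RUN:-1}" = "0" ] && command -v smbclient >/dev/null 2>&1; then
        smbclient "$@"
    else
        echo "smbclient $*"
    fi
}

smb -L //pentaho.local -N                    # list the shares without a password
smb //pentaho.local/public -N -c ls          # anonymous listing of the public share
smb //pentaho.local/homes -U pentaho -c ls   # registered user; prompts for the password
```

Run with DRY_RUN=0 against the workshop server, the public listing should include public-share and the homes listing should include home-share.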

Pentaho Data Integration

Pentaho Data Integration uses a Virtual File System (VFS) as an abstraction layer that exposes different filesystems through a common interface.

  1. Download the CIFS driver (JCIFS) library.

  2. Copy the JCIFS JAR file into the Pentaho Data Integration "lib" folder.

Pentaho Data Integration ships with jcifs-1.3.3.jar.

If you wish to replace the current driver, first rename it: jcifs-1.3.3.jar -> jcifs-1.3.3.jar.bak

# Optional: back up the bundled driver (assumed to be in the lib folder)
mv ~/Pentaho/design-tools/data-integration/lib/jcifs-1.3.3.jar ~/Pentaho/design-tools/data-integration/lib/jcifs-1.3.3.jar.bak
# Copy the downloaded driver into the PDI lib folder
cd ~/Downloads
cp jcifs-[version].jar ~/Pentaho/design-tools/data-integration/lib
  3. Start Pentaho Data Integration.

cd ~/Pentaho/design-tools/data-integration
./spoon.sh
  4. Create a new Transformation.

  5. Click on the 'View' tab.

  6. Highlight 'VFS Connections' and select 'New'.

  7. Configure with the following details:

  8. Click 'Test'.


Transformation - SMB File Retrieval

Let's create a simple Transformation to onboard data via an SMB VFS connection.

  1. Create the following transformation (tr_SMB_File_Retrieval):

  2. Double-click on Text file input > File tab.

  3. Click on Browse and ensure you select:

VFS Connections > SMB > Pentaho/design-tools/data-integration/samples/transformations/files/sales_data.csv

  4. Add the path.

  5. Click on the Content tab & configure with the following settings:

  6. Click on the Fields tab & click on 'Get Fields'.

  7. Preview the rows.

  8. Click OK.

Add further steps to format or rename some fields, then write the output as a .txt file in the same directory as your Transformation.

In PDI, you can add a VFS connection and then reference that connection whenever you want to access files or folders on your Virtual File System.
Link to latest jcifs: jcifs 2.1.39 javadoc (org.codelibs)