Pentaho Data Integration

Enrich Data

Enhance the quality of the data ..

Introduction

Data enrichment is a value-adding process in which external data from multiple sources is added to an existing data set to enhance its quality and richness. The enriched data gives the customer more information about the product or service.

A common data enrichment process could, for example, correct likely misspellings or typographical errors in a database using precision algorithms. Following this logic, data enrichment could also add information to simple data tables.
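As a rough illustration of the misspelling correction described above, the following Python sketch snaps a value to the closest entry in a reference list using the standard library's `difflib.get_close_matches`. This is not a PDI step; the reference list `KNOWN_CITIES`, the function name `correct_city`, and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical reference list of valid values (an assumption for this sketch).
KNOWN_CITIES = ["London", "Manchester", "Birmingham", "Liverpool"]


def correct_city(value, reference=KNOWN_CITIES, cutoff=0.8):
    """Return the closest reference value, or the input unchanged if
    nothing in the reference list is similar enough."""
    matches = get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value


if __name__ == "__main__":
    for raw in ["Manchestr", "Birmigham", "Paris"]:
        print(raw, "->", correct_city(raw))
```

A value with no sufficiently similar reference entry is left untouched rather than forced onto a wrong match; the cutoff controls how aggressive the correction is and would need tuning against real data.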

Data enrichment can also work by extrapolating data. Through methodologies such as fuzzy logic, engineers can derive more from a given raw data set. These and similar projects can all be described as data enrichment activities.

Numerous data enhancement options are available, including:

  • Telephone & Fax numbers

  • Additional contact names

  • Residential or Business location

  • Standard Industrial Classification (SIC) codes & ‘Market Sector’ codes

  • Number of employees

  • Small office / home office (SOHO) status

  • Household income / age

  • Credit score

  • Financial information

  • Adding valuable geographic information and mapping (GIS) such as location analysis, distance calculations, spatial analysis, natural boundary analysis, and more

  • Enhancing data by classifying, segmenting, and aggregating customer data using advanced statistical methodologies such as factor, cluster, and conjoint analysis
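
To make the geographic option in the list above more concrete, here is a minimal Python sketch that enriches customer records with a great-circle distance to a fixed location, using the haversine formula. It is an illustration only, not a PDI transformation; the field names (`lat`, `lon`, `distance_km`), the function names, and the mean Earth radius of 6371 km are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points
    given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_distance(customers, ref_lat, ref_lon):
    """Return new records with a 'distance_km' field added.
    The input records are not modified."""
    return [
        {**c, "distance_km": haversine_km(c["lat"], c["lon"], ref_lat, ref_lon)}
        for c in customers
    ]


if __name__ == "__main__":
    sample = [{"name": "A", "lat": 0.0, "lon": 1.0}]  # made-up record
    print(add_distance(sample, 0.0, 0.0))
```

The derived `distance_km` column is the enrichment: it did not exist in the source records but can now drive location analysis or segmentation.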


