GenAI
Generative artificial intelligence (GenAI) can create certain types of images, text, videos, and other media in response to prompts.
So how does it work?
Users send Messages to the Thread, which the Assistant then processes.
The framework uses a Thread to maintain the context of a conversation.
Each interaction is added to the Thread as a Message.
Assistants can work with uploaded files, analyzing and referencing them in responses.
The framework maintains state across interactions, allowing for complex, multi-turn conversations.
The Assistant generates responses based on the conversation history and its capabilities.
Responses are generated asynchronously, allowing for handling of long-running tasks.
Assistants can produce various types of output, including text, code, or structured data.
Developers can fine-tune the Assistant's behavior through detailed instructions and model selection.
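To make the Thread / Message / Run flow concrete, here is a minimal sketch using Java's built-in HttpClient against the OpenAI Assistants endpoints. This is not the Pentaho plugin's code: the assistant ID is a placeholder, the JSON parsing is deliberately naive, and the exact payloads should be checked against the current OpenAI API reference.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AssistantFlowSketch {
    static final String API_KEY = System.getenv("OPENAI_API_KEY"); // assumed to be set
    static final HttpClient CLIENT = HttpClient.newHttpClient();

    static String post(String url, String jsonBody) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Bearer " + API_KEY)
                .header("Content-Type", "application/json")
                .header("OpenAI-Beta", "assistants=v2")   // Assistants API is a beta endpoint
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // 1. Create a Thread to hold the conversation context.
        String thread = post("https://api.openai.com/v1/threads", "{}");
        // Naive id extraction for the sketch; real code would use a JSON library.
        String threadId = thread.replaceFirst("(?s).*?\"id\"\\s*:\\s*\"([^\"]+)\".*", "$1");

        // 2. Add the user's Message to the Thread.
        post("https://api.openai.com/v1/threads/" + threadId + "/messages",
             "{\"role\": \"user\", \"content\": \"Summarise the attached report.\"}");

        // 3. Run the Assistant over the Thread. The run completes asynchronously,
        //    so a real client polls the run status before reading the reply.
        post("https://api.openai.com/v1/threads/" + threadId + "/runs",
             "{\"assistant_id\": \"asst_XXXX\"}");   // hypothetical assistant ID
    }
}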
The HTML Parser is a utility plugin for Pentaho Data Integration (PDI) that extracts desired text from HTML or XML files. It is useful for cleaning data for natural language processing tasks such as sentiment analysis and SEO keyword analysis.
• Accepts input from both data streams and files
• Supports parsing using Xpath expressions or CSS selectors
• Can process single files or multiple inputs from a stream
• Compatible with local and virtual file systems
The plugin utilizes jsoup, a Java library that simplifies working with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, and XPath selectors.
The step is located in the Input folder.
XPath (XML Path Language) is a query language for selecting nodes from an XML or HTML document. While Jsoup doesn't natively support XPath, we can use a combination of Jsoup and Java's built-in XPath capabilities to achieve this.
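As an illustration of that combination (a sketch, not the plugin's actual implementation), the snippet below parses HTML with jsoup, converts it to a W3C DOM with jsoup's W3CDom helper, and then evaluates an XPath expression with Java's javax.xml.xpath package:

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

public class XPathSketch {
    public static void main(String[] args) throws Exception {
        String html = "<div class='content'><p>Hello</p><p>World</p></div>";

        // Parse the (possibly messy) HTML with jsoup, then convert it to a W3C DOM
        // so the standard javax.xml.xpath engine can query it. Namespace awareness
        // is turned off so un-prefixed expressions like //div match.
        Document dom = new W3CDom().namespaceAware(false).fromJsoup(Jsoup.parse(html));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList paragraphs = (NodeList) xpath.evaluate(
                "//div[@class='content']/p", dom, XPathConstants.NODESET);

        for (int i = 0; i < paragraphs.getLength(); i++) {
            System.out.println(paragraphs.item(i).getTextContent());
        }
    }
}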
Here's an overview of some common XPath syntax:
/ - Selects from the root node
// - Selects nodes anywhere in the document
. - Selects the current node
.. - Selects the parent of the current node
@ - Selects attributes
[] - Used for predicates (conditions)
Some examples:
//div - Selects all div elements in the document
//div[@class='content'] - Selects all div elements with class 'content'
//h1/text() - Selects the text content of all h1 elements
//div[@class='content']/p - Selects all p elements that are direct children of div elements with class 'content'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>XPath Example Page</title>
</head>
<body>
<header>
<h1 id="main-title">Welcome to Our Website</h1>
<nav>
<ul>
<li><a href="#home">Home</a></li>
<li><a href="#about">About</a></li>
<li><a href="#contact">Contact</a></li>
</ul>
</nav>
</header>
<main>
<section id="featured-articles">
<h2>Featured Articles</h2>
<article>
<h3>Article 1</h3>
<p class="content">This is the content of article 1.</p>
<span class="author">By John Doe</span>
</article>
<article>
<h3>Article 2</h3>
<p class="content">This is the content of article 2.</p>
<span class="author">By Jane Smith</span>
</article>
</section>
<section id="latest-news">
<h2>Latest News</h2>
<ul>
<li>News item 1</li>
<li>News item 2</li>
<li>News item 3</li>
</ul>
</section>
</main>
<footer>
<p>© 2024 Our Website. All rights reserved.</p>
</footer>
</body>
</html>
The data source is referenced by a file path.
Open the following transformation:
C:/Projects/genai/html/HTML Parser - Xpath.ktr
~/Projects/genai/html/HTML Parser - Xpath.ktr
Double-click on the hp: html and configure with the following settings:
Leaving the Xpath field blank will result in all tags being removed and all the content being returned.
RUN and preview the results.
These XPath queries will help you navigate and extract specific content from homepage.html:
Select the main title:
//h1[@id='main-title']
Select all navigation links:
//nav//a
Select all article titles (h3 elements within articles):
//article/h3
Select all paragraph content within articles:
//article/p[@class='content']
Select all author names:
//span[@class='author']
Select the latest news items:
//section[@id='latest-news']//li
Select the footer text:
//footer/p/text()
Select all section titles (h2 elements that are direct children of section elements):
//section/h2
Select the second article:
(//article)[2]
Select all elements with a class attribute:
//*[@class]
The data source is referenced as a file path in a data stream field.
Enable the hop between: dg: filepath from stream -> hp: parse html xpath.
Disable the hops between: Data Grid -> hp: parse html xpath, and dg: html from stream -> hp: parse html xpath.
Double-click on the hp: html and configure with the following settings:
RUN and preview the results.
The data source is referenced as <html> in a data stream field.
Pentaho's data streams often use binary fields to handle various types of data, including large text objects like HTML. By using a binary field, you ensure that the entire HTML content is treated as a single, uninterpreted chunk of raw bytes within the Pentaho pipeline.
Storing the HTML as binary data allows you to pass the raw content through various steps in your Pentaho transformation without Pentaho trying to interpret or modify the HTML prematurely.
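Conceptually, all a parsing step has to do is turn those raw bytes back into text at the last moment. A minimal sketch of that idea (the byte array and sample content are made up for illustration):

import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BinaryHtmlSketch {
    public static void main(String[] args) {
        // In PDI the binary field arrives as a byte[]; here we fake one for illustration.
        byte[] htmlBinary = "<p class='content'>Hello from a binary field</p>"
                .getBytes(StandardCharsets.UTF_8);

        // Decode the raw bytes back into text only at the point of parsing,
        // so nothing upstream has had a chance to reinterpret or truncate the HTML.
        String html = new String(htmlBinary, StandardCharsets.UTF_8);
        Document doc = Jsoup.parse(html);
        System.out.println(doc.select("p.content").text());
    }
}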
Enable the hop between: dg: html from stream -> hp: parse html xpath.
Disable the hops between: dg: filepath from stream -> hp: parse html xpath, and Data Grid -> hp: parse html xpath.
Double-click on the hp: html and configure with the following settings:
RUN and preview the results.
CSS selectors are powerful tools for targeting specific HTML elements, and they're used not only for styling but also for selecting elements when extracting data from HTML documents.
Below are some examples of the syntax used to extract HTML snippets:
a) Element Selector:
p /* Selects all <p> elements */
div /* Selects all <div> elements */
b) Class Selector:
.highlight /* Selects elements with class="highlight" */
p.highlight /* Selects <p> elements with class="highlight" */
c) ID Selector:
#header /* Selects the element with id="header" */
d) Universal Selector:
* /* Selects all elements */
a) Descendant Selector (space):
div p /* Selects all <p> elements inside <div> elements */
b) Child Selector (>):
ul > li /* Selects all <li> elements that are direct children of <ul> */
c) Adjacent Sibling Selector (+):
h1 + p /* Selects the first <p> element immediately after an <h1> */
d) General Sibling Selector (~):
h1 ~ p /* Selects all <p> elements that are siblings of <h1> */
a) [attribute]:
[type] /* Selects elements with a type attribute */
b) [attribute="value"]:
[type="text"] /* Selects elements with type="text" */
c) [attribute~="value"]:
[class~="highlight"] /* Selects elements with class containing "highlight" as a whole word */
d) [attribute^="value"]:
[href^="https"] /* Selects elements with href starting with "https" */
e) [attribute$="value"]:
[href$=".pdf"] /* Selects elements with href ending with ".pdf" */
f) [attribute*="value"]:
[href*="example"] /* Selects elements with href containing "example" */
a:first-child /* Selects every <a> element that is the first child of its parent */
p:last-child /* Selects every <p> element that is the last child of its parent */
li:nth-child(2n) /* Selects every even <li> element */
input:not(:checked) /* Selects all unchecked input elements */
div.highlight, p.important /* Selects <div> elements with class "highlight" and <p> elements with class "important" */
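Because the plugin is backed by jsoup, these selectors behave the same way as jsoup's select() method. A small standalone sketch, with an inline HTML fragment standing in for the sample page below:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CssSelectorSketch {
    public static void main(String[] args) {
        String html =
            "<div class='product'><h3>Smartphone X</h3><span class='price'>$999</span></div>" +
            "<div class='product'><h3>Laptop Pro</h3><span class='price'>$1499</span></div>";

        Document doc = Jsoup.parse(html);

        // The same selector syntax documented above drives jsoup's select() method.
        for (Element product : doc.select("div.product")) {
            String title = product.select("h3").text();
            String price = product.select(".price").text();
            System.out.println(title + " -> " + price);
        }
    }
}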
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>TechGadgets - Your Electronics Store</title>
</head>
<body>
<header id="main-header">
<h1>TechGadgets</h1>
<nav>
<ul>
<li><a href="#home">Home</a></li>
<li><a href="#products">Products</a></li>
<li><a href="#about">About</a></li>
<li><a href="#contact">Contact</a></li>
</ul>
</nav>
</header>
<main>
<section id="featured-products">
<h2>Featured Products</h2>
<div class="product">
<h3>Smartphone X</h3>
<p class="description">The latest smartphone with advanced features.</p>
<span class="price">$999</span>
</div>
<div class="product">
<h3>Laptop Pro</h3>
<p class="description">Powerful laptop for professionals.</p>
<span class="price">$1499</span>
</div>
</section>
<section id="about">
<h2>About Us</h2>
<p>TechGadgets is your one-stop shop for all electronics needs.</p>
</section>
<section id="newsletter">
<h2>Subscribe to Our Newsletter</h2>
<form>
<input type="email" name="email" placeholder="Enter your email">
<button type="submit">Subscribe</button>
</form>
</section>
</main>
<footer>
<p>© 2024 TechGadgets. All rights reserved.</p>
</footer>
</body>
</html>
The data source is referenced by a file path.
Open the following transformation:
C:/Projects/genai/html/HTML Parser - CSS.ktr
~/Projects/genai/html/HTML Parser - CSS.ktr
Double-click on the hp: html and configure with the following settings:
Leaving the CSS field blank will result in all tags being removed and all the content being returned.
RUN and preview the results.
These CSS queries will help you navigate and extract specific content from landingpage.html:
Select the main title:
h1
Select all navigation links:
nav a
Select all product titles:
.product h3
Select all product descriptions:
.product .description
Select all product prices:
.product .price
Select the "About Us" section:
#about
Select the newsletter form:
#newsletter form
Select all section titles (h2 elements):
main h2
Select the footer text:
footer p
Select all elements with a class of "product":
.product
The data source is referenced as a file path in a data stream field.
Enable the hop between: dg: filepath from stream -> hp: parse html css.
Disable the hops between: Data grid -> hp: parse html css, and dg: html from stream -> hp: parse html css.
Double-click on the hp: html and configure with the following settings:
RUN and preview the results.
The data source is referenced as <html> in a data stream field.
Pentaho's data streams often use binary fields to handle various types of data, including large text objects like HTML. By using a binary field, you ensure that the entire HTML content is treated as a single, uninterpreted chunk of raw bytes within the Pentaho pipeline.
Storing the HTML as binary data allows you to pass the raw content through various steps in your Pentaho transformation without Pentaho trying to interpret or modify the HTML prematurely.
Enable the hop between: dg: html from stream -> hp: parse html css.
Disable the hops between: dg: filepath from stream -> hp: parse html css, and Data grid -> hp: parse html css.
Double-click on the hp: html and configure with the following settings:
RUN and preview the results.
Apache Tika is a content analysis toolkit that extracts text, metadata, and language from a variety of file formats. It's commonly used in data processing to prepare data for further analysis.
• Supports a wide range of document formats including PDFs, Word documents, and HTML files.
• Extracts metadata such as author, title, creation date, and language.
• Can be integrated into larger data processing pipelines for automated content extraction.
• Facilitates full-text search indexing and content classification.
The step is located in the Input folder.
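Under the hood this kind of detection and extraction is what Apache Tika's facade class provides. A minimal standalone sketch (assuming the tika-core and tika-parsers dependencies are on the classpath; report.docx is a hypothetical input file):

import java.io.File;
import org.apache.tika.Tika;

public class TikaSketch {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        File file = new File("report.docx");   // hypothetical input document

        // Detect the MIME type, then extract the plain text regardless of format.
        String mimeType = tika.detect(file);
        String text = tika.parseToString(file);

        System.out.println("Detected type: " + mimeType);
        System.out.println(text);
    }
}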
The data source is a Word document.
The document type is referenced in a data stream field.
The old oak tree stood sentinel at the edge of the meadow, its gnarled branches reaching skyward like ancient fingers grasping at clouds. Generations had passed beneath its sprawling canopy, each leaving whispered secrets in its bark. A gentle breeze rustled through its leaves, carrying the scent of wildflowers and distant rain. Nearby, a babbling brook wound its way through moss-covered stones, its crystalline waters reflecting the dappled sunlight filtering through the forest canopy.
A family of deer cautiously approached the water's edge, their ears twitching at every sound. In the distance, a woodpecker's rhythmic tapping echoed through the trees, nature's own percussion. As the sun began its slow descent, the meadow came alive with the soft glow of fireflies, their bioluminescent dance a magical display against the deepening twilight. A lone owl hooted softly, heralding the arrival of night and all its mysterious inhabitants. The air grew cooler, and dew began to form on blades of grass, each droplet a miniature world reflecting the stars above.
In this timeless moment, the boundary between earth and sky seemed to blur, and one could almost believe in the old tales of fairies and woodland spirits. As darkness settled fully over the land, the oak tree stood as it always had, a silent guardian of the forest's secrets, its roots deep in the earth, its crown brushing the heavens.
Open the following transformation:
C:/Projects/genai/tika/Read Unstructured Document- Word Doc.ktr
~/Projects/genai/tika/Read Unstructured Document- Word Doc.ktr
Double-click on the Read Unstructured Document step and configure with the following settings:
RUN and preview the results.
The data source is a password protected PDF document.
The document type is referenced in a data stream field.
Open the following transformation:
C:/Projects/genai/tika/Read Unstructured Document- Password PDF.ktr
~/Projects/genai/tika/Read Unstructured Document- Password PDF.ktr
Double-click on the Read Unstructured Document step and configure with the following settings:
RUN and preview the results.
Multiple documents are referenced as the data source.
The Javascript step is used to add the PDF password.
Open the following transformation:
C:/Projects/genai/tika/Read Unstructured Document- Stream Multiple Files.ktr
~/Projects/genai/tika/Read Unstructured Document- Stream Multiple Files.ktr
Double-click on the Javascript: Add password column step.
A data stream field: filepass is associated with the password: qweasd
Double-click on the Read Unstructured Document step and configure with the following settings:
RUN and preview the results.
Base64 is a binary-to-text encoding scheme that represents binary data in an ASCII string format. It's widely used for transmitting data over media that are designed for textual data.
The BASE64 step is located in the Transform folder.
Consider the string "Hi\n", where the \n represents a newline. The first step in the encoding process is to obtain the binary representation of each ASCII character. This can be done by looking up the values in an ASCII-to-binary conversion table.
ASCII uses 8 bits to represent individual characters, but Base64 uses 6 bits. Therefore, the binary needs to be broken up into 6-bit chunks.
Finally, these 6-bit values can be converted into the appropriate printable character by using a Base64 table.
Since Base64 uses 24-bit sequences, padding is needed when the original binary cannot be divided into a 24-bit sequence. You have probably seen this type of padding before represented by printed equal signs (=). For example, Hi without a newline is represented by only two 8-bit ASCII characters (for a total of 16 bits). Padding is removed by the Base64 encoding schema when data is decoded.
Base64 is not necessarily used to protect information. Its advantage is that it can convert almost any sequence of bytes into human-readable ASCII.
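A quick way to verify the worked example above is Java's built-in encoder: "Hi\n" encodes to SGkK, while "Hi" without the newline encodes to SGk= with one character of padding.

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Sketch {
    public static void main(String[] args) {
        Base64.Encoder encoder = Base64.getEncoder();

        // "Hi\n" is 3 bytes = 24 bits, so it maps cleanly to 4 Base64 characters: SGkK
        System.out.println(encoder.encodeToString("Hi\n".getBytes(StandardCharsets.US_ASCII)));

        // "Hi" is only 2 bytes = 16 bits, so the output is padded with '=': SGk=
        System.out.println(encoder.encodeToString("Hi".getBytes(StandardCharsets.US_ASCII)));

        // Decoding strips the padding and returns the original bytes.
        byte[] decoded = Base64.getDecoder().decode("SGk=");
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints "Hi"
    }
}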
Open the following transformation:
C:/Projects/genai/base64/Base64 Encode.ktr
~/Projects/genai/base64/Base64.ktr
Double-click on the Base64 Encoder step and configure with the following settings:
Check the Select values step.
RUN and preview the results.
The Data grid holds the RAW Text that will be encoded.
Enable the hop between: Data grid - Raw Text Input -> Base64 Encoder.
Disable the hops between: Data grid -> Base64 Encoder, and Get file names images -> Base64 Encoder.
Double-click on the Data grid - Raw Text Input step and open the Data tab.
Check the Select values step.
RUN and preview the results.
Encode multiple files.
Enable the hop between: Get file names - Images -> Base64 Encoder
Disable the hops between: Data grid - Raw Text Input -> Base64 Encoder, and Data grid -> Base64 Encoder.
Double-click on the Get File names - Images step.
Double-click on the Base64 Encoder and configure with the following settings:
Check the Select values step.
RUN and preview the results.
Pentaho GenAI is an extension of Pentaho Data Integration that incorporates generative AI capabilities. It aims to enhance data workflow processes by leveraging large language models and other AI technologies.
OpenAI released an API platform that enables the creation of 'assistants' that can perform a wide range of tasks:
Natural Language Querying: Users can ask questions or provide prompts to large language models (LLMs) like OpenAI and Azure OpenAI, allowing for natural language interaction with data and systems.
Document Analysis: The plugin supports attaching documents for LLMs to process, enabling users to analyze and extract insights from text files and related documents.
Sentiment Analysis: The plugin can, for example, be used to determine the sentiment of text data, such as tweets.
Log Analysis: Process and analyze log files, potentially for troubleshooting or identifying patterns.
Structured Data Generation: The plugin supports generating responses in both text and JSON formats, allowing for the creation of structured data from natural language inputs.
Data Extraction and Transformation: The plugin can be used within Pentaho Data Integration (PDI) workflows, assisting in extracting and transforming data as part of larger ETL processes.
Question Answering: The plugin supports using document embeddings to efficiently answer multiple questions about a document(s), making it useful for information retrieval and FAQ-style applications.
Prompt Engineering: Users can create structured templates and use PDI environment variables for dynamic prompt generation, allowing for flexible and customizable interactions with LLMs.
Moderation and Content Filtering: The plugin includes options for response moderation, which can be used to filter / flag potentially harmful or inappropriate content.
Let's start exploring some simple chat scenarios:
• Enter the prompt directly.
• Pass the prompt and 'role' in data stream fields.
• Configure the step to use your own OpenAI account details.
The step is located in the AI folder.
Enable the hop between: Data Grid -> AI Chat.
Disable the hop between: User Input -> AI Chat.
Open the following transformation:
C:/Projects/genai/ai chat/.ktr
Double-click on the AI Chat step and configure with the following settings:
Run Instruction for LLM
Role-playing with Large Language Models (LLMs), such as ChatGPT, is an emerging field that explores the interaction between AI and creative, narrative-driven experiences. It leverages the advanced capabilities of LLMs to simulate human-like dialogue and human behavior within a role-playing context.
This process enables the AI to engage in dynamic conversations, mimic various characters, and respond to user inputs in a manner that aligns with the character's predefined traits and narrative context, drawing on the model's training over large corpora of text. Used this way, role-playing improves performance on tasks that require specific skills or knowledge, such as acting as a historian or providing historical facts and analyses.
Click on the Model tab.
The temperature value ranges from 0 to 2, with lower values indicating greater determinism and higher values indicating more randomness.
The moderations endpoint is a tool you can use to check whether text is potentially harmful. Developers can use it to identify content that might be harmful and take action, for instance by filtering it.
RUN and preview the result.
You can send multiple questions to ChatGPT.
However, there are limits to the number of requests an LLM can accept. The response will fail if the rate limit is reached.
Disable the hop between: Data Grid -> AI Chat.
Enable the hop between: User Input -> AI Chat.
Double-click on the User Input step and open the Data tab.
Double-click on the AI Chat step and configure with the following settings:
RUN and preview the result.
The 'pipeline' configuration is the same as the previous scenario.
You will need to enter your own OpenAI key.
Double-click anywhere on the canvas to configure the Parameters.
Enter your own OpenAI Key.
Double-click on the AI Chat step and configure with the following settings.
RUN and preview the result - it should be the same as in the previous scenario.
RAG (Retrieval-Augmented Generation) is a technique in artificial intelligence that combines information retrieval with text generation. It's particularly useful for keeping AI systems updated without constant retraining and for providing responses grounded in specific, retrievable facts.
So .. in the data folder you will find a story about Charlie, the happy-go-lucky carrot who, with his friends, lives in Veggeville ..
If you don't attach the document, the response will ask for more information as it can't place Charlie in any context.
Enable the hop between: Data Grid -> AI Chat.
Double-click on the AI Chat step and configure with the following settings:
Click on the Embedding tab.
An embedding model converts words, phrases, or other data into numerical vectors. These vectors serve as a bridge between raw data and machine learning algorithms by creating meaningful, computable representations of complex information.
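To see why those vectors matter for retrieval, all a vector store needs is a similarity measure. The sketch below computes cosine similarity over tiny made-up vectors - the same kind of comparison a RAG lookup performs to find the chunks most relevant to a question:

public class CosineSimilaritySketch {

    // Cosine similarity: values near 1.0 mean the vectors point the same way,
    // values near 0 mean the texts they represent are unrelated.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Tiny made-up vectors standing in for real embeddings,
        // which have hundreds or thousands of dimensions.
        double[] queryEmbedding = {0.9, 0.1, 0.3};
        double[] charlieChunk   = {0.8, 0.2, 0.4};   // chunk about Charlie the carrot
        double[] unrelatedChunk = {0.1, 0.9, 0.0};

        System.out.println("Charlie chunk:   " + cosine(queryEmbedding, charlieChunk));
        System.out.println("Unrelated chunk: " + cosine(queryEmbedding, unrelatedChunk));
    }
}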
RUN and preview the result.
Let's change the embedding store mode to WRITE to persist the results as a file -
openai-embedding-store.json.
This file is particularly useful in RAG (Retrieval-Augmented Generation) systems, where quick access to embeddings is crucial for efficient information retrieval.
Double-click on the AI Chat step and then on the Embedding tab.
Configure with the following settings.
RUN & check that the embedding has been stored.
Embedding - READ
Now that a Veggeville embedding has been created, we can ask questions leveraging the vector store: openai-embedding-store.json
Double-click on the AI Chat step and configure with the following settings:
Ensure you select the Attach Document(s) option - this enables the embedding options.
Click on the Embedding tab and configure with the following settings:
RUN and preview the result.
A prompt is essentially the input given to an AI model to elicit a desired output or behavior. It can range from simple questions to complex instructions or examples.
Prompt engineering is the art and science of crafting these inputs to optimize the AI's performance for specific tasks. This involves carefully selecting words, providing context, and structuring the prompt to guide the model towards producing the most accurate, relevant, and useful responses.
Double-click anywhere on the canvas to set the parameters.
Double-click on the Chat AI step and configure with the following settings:
RUN and preview the result.
With a little prompt engineering, the response can populate a 'template'.
Double-click on the Chat AI step and configure with the following settings:
Create a recipe for a ${DISHTYPE} with the following ingredients: ${INGREDIENTS}.
Structure your answer in the following way:
Recipe name: ...
Description: ...
Preparation time: ...
Required ingredients:
- ...
- ...
Instructions:
- ...
- ...
Respond in JSON format.
RUN and preview the result.
Let's run through some Use Cases:
Sentiment Analysis - Determine the sentiment of a tweet: Positive, Neutral, Negative.
Log Analysis - Analyze multiple log files for any errors. The errors are hopefully resolved by AI Chat with the results written to a CSV file.
This use case analyzes multiple log files to identify errors, then uses AI Chat to suggest a resolution. The generated result is in JSON format and the processed output is stored as a CSV file.
Open the following transformation:
C:/Projects/genai/aichat/Usecase - Log Analysis.ktr
~/Projects/genai/aichat/Usecase - Log Analysis.ktr
The Prompt has been engineered to analyze the log files and identify errors. The errors are then resolved by the AI Chat step.
Double-click on the AI Chat step to view the settings.
Analyze the log file from the stream and identify the issue. Once the issue is identified, respond with possible resolutions to fix the issue. Include the date (IssueDate) of when the issue occurred.
If no issue is found, then respond as "No Issues found" and Resolution as "No Resolution suggested. Log Looks fine".
Reply the answer in the below JSON template:
{
"Issue" : "...",
"IssueDate": "...",
"Resolution" : "..."
}
Determines the location of the data source:
File - Browse and enter the path to the data source
Stream - the data (or reference) is being passed in a data stream field. In this workshop the paths to the log files are being passed from the previous step in the filename data stream field.
Click on the Model tab.
The temperature value ranges from 0 to 2, with lower values indicating greater determinism and higher values indicating more randomness.
The moderations endpoint is a tool you can use to check whether text is potentially harmful. Developers can use it to identify content that might be harmful and take action, for instance by filtering it.
Click on the Embedding tab.
Enter your embedding model and choose whether to create and persist the embeddings in a file or keep the default In-Memory store.
Take a look at the RAG workshop.
Click on the Response tab.
The response is held as a JSON object in the result field.
The response JSON object needs to be parsed to create our data stream fields.
Double-click on the Process generated JSON result step.
Click on the Fields tab.
Take a look at the result field (preview data in AI Chat step) to determine the structure of the JSON object / array.
This should reflect the structure of the prompt template.
{
"Issue": "Errors initializing Table output step and executing query job due to Simba driver limitations.",
"IssueDate": "2023-09-07",
"Resolution": "Use the GBQ Bulk Loader step instead of the regular Table output step to create the table and handle data inserts."
}
...
$ - Root object; $ returns the whole JSON structure
. - Child operator; it's used to access different levels of the JSON structure ($..Issue returns the Issue)
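If you want to experiment with these expressions outside PDI, the Jayway JsonPath library evaluates the same syntax. A sketch assuming the json-path dependency; the JSON mirrors the generated result shown above:

import com.jayway.jsonpath.JsonPath;
import java.util.List;

public class JsonPathSketch {
    public static void main(String[] args) {
        String json = "{"
                + "\"Issue\": \"Errors initializing Table output step\","
                + "\"IssueDate\": \"2023-09-07\","
                + "\"Resolution\": \"Use the GBQ Bulk Loader step instead.\""
                + "}";

        // $ is the root object; $..Issue searches for Issue anywhere in the structure.
        List<String> issues = JsonPath.read(json, "$..Issue");
        String issueDate = JsonPath.read(json, "$.IssueDate");

        System.out.println(issues.get(0));
        System.out.println(issueDate);
    }
}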
RUN and preview result - Output - log-analysis.csv
The sentiment_analysis function analyzes the overall sentiment of the discussion. It considers the tone, the emotions conveyed by the language used, and the context in which words and phrases are used.
Double-click on the AI Chat step.
Again, a bit of prompt engineering passes the tweets from the Tweet data stream field into the prompt.
RUN and preview result - Text file output.
One of the most common problems for large, high-growth businesses is dealing with increasing volumes and varieties of financial data - more specifically, extracting the data from PDF documents such as quarterly reports, balance sheets, bank statements and cash flow statements.
Without a solution to handle these data extraction tasks at scale, operations quickly become error-prone and time-consuming. This is why a growing number of organisations are now implementing AI data extraction tools.
In this use case we're going to extract sales data from PDF reports, using Pentaho Data Integration.
Review the main steps of the transformation:
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
The previous step - Get file names - returns the paths to the PDFs.
Double-click on the Read Unstructured Document step to view settings.
Based on the filenames passed in the filename field, the PDF contents are extracted and associated with the pdf_file_contents data stream field.
Double-click on the AI Chat step to view the settings.
Under the Message tab, a templated prompt gives the model the instructions to extract the sales data from the pdf_file_contents data stream field, which holds the extracted PDF text.
Click on the Model tab.
Based on parameters set in the transformation properties, the OpenAI model, API Key and Temperature are set.
Click on the Embedding tab.
Based on parameters set in the transformation properties, the Embedding model is set.
Click on the Response tab.
The Response is returned as a JSON object associated with the generated_response data stream field.
This is where we have to put our thinking caps on ..
In the generated_response data stream field, the sales data for each report is returned as a JSON object containing the SaleYear and SaleMonth plus a SalesPerformanceByProduct array, with the following fields for each product:
ProductCategory
UnitSold
Revenue
This will have to be a two-stage process.
Stage 1 - extract the SaleYear & SaleMonth.
Record  SaleYear  SaleMonth
1       2024      August
2       2024      July
Stage 2 - iterate through the SalesPerformanceByProduct array for each Stage 1 record.
So .. on the first iteration SaleYear = 2024 SaleMonth = August
ProductCategory      UnitSold  Revenue
Eco-Gear             1,500     $150,000
Smart Home Devices   1,200     $180,000
Fitness Equipment    960       $96,000
Accessories          1,200     $74,000
This is repeated for Record 2 ..
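A sketch of the same two-stage idea in code, again using the Jayway JsonPath library against the generated JSON shown below (not the transformation's actual implementation): stage 1 reads SaleYear and SaleMonth, stage 2 iterates the SalesPerformanceByProduct array and denormalises each product row against them.

import com.jayway.jsonpath.JsonPath;
import java.util.List;
import java.util.Map;

public class SalesFlattenSketch {
    public static void main(String[] args) {
        String json = "{ \"SaleYear\": \"2024\", \"SaleMonth\": \"August\","
                + "  \"SalesPerformanceByProduct\": ["
                + "    { \"ProductCategory\": \"Eco-Gear\", \"UnitSold\": \"1,500\", \"Revenue\": \"$150,000\" },"
                + "    { \"ProductCategory\": \"Smart Home Devices\", \"UnitSold\": \"1,200\", \"Revenue\": \"$180,000\" }"
                + "  ] }";

        // Stage 1: pull the year and month that apply to every product row.
        String saleYear = JsonPath.read(json, "$.SaleYear");
        String saleMonth = JsonPath.read(json, "$.SaleMonth");

        // Stage 2: iterate the product array and emit one flattened row per product.
        List<Map<String, String>> products = JsonPath.read(json, "$.SalesPerformanceByProduct[*]");
        for (Map<String, String> p : products) {
            System.out.println(saleYear + " | " + saleMonth + " | "
                    + p.get("ProductCategory") + " | " + p.get("UnitSold") + " | " + p.get("Revenue"));
        }
    }
}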
Conclusion
August 2024 was a positive month for Acme Corporation, marked by notable growth in sales and
customer retention. However, addressing regional disparities and capitalizing on successful
product lines will be crucial for sustaining this growth momentum in the coming months.
{
"SaleYear": "2024",
"SaleMonth": "August",
"SalesPerformanceByProduct": [
{
"ProductCategory": "Eco-Gear",
"UnitSold": "1,500",
"Revenue": "$150,000"
},
{
"ProductCategory": "Smart Home Devices",
"UnitSold": "1,200",
"Revenue": "$180,000"
},
{
"ProductCategory": "Fitness Equipment",
"UnitSold": "960",
"Revenue": "$96,000"
},
{
"ProductCategory": "Accessories",
"UnitSold": "1,200",
"Revenue": "$74,000"
}
]
}
Conclusion
July 2024 was a solid month for Acme Corporation, characterized by successful product launches
and moderate sales growth. While the overall performance was positive, addressing regional
disparities and improving competitive positioning in certain product categories will be essential for
sustaining growth in the coming months.
{
"SaleYear": "2024",
"SaleMonth": "July",
"SalesPerformanceByProduct": [
{
"ProductCategory": "Eco-Gear",
"UnitSold": "1,250",
"Revenue": "$125,000"
},
{
"ProductCategory": "Smart Home Devices",
"UnitSold": "1,100",
"Revenue": "$165,000"
},
{
"ProductCategory": "Fitness Equipment",
"UnitSold": "900",
"Revenue": "$105,000"
},
{
"ProductCategory": "Accessories",
"UnitSold": "1,250",
"Revenue": "$60,000"
}
]
}
...
It would be interesting to give this a go using the Hierarchical Data Type (HDT) EE plugin.
The Select values step can perform all the following actions on fields in the data stream:
Select values
Remove values
Rename values
Change data types
Configure length and precision of values
Double-click on the Select values step.
RUN and preview the result.
The Microsoft Excel Writer step writes incoming rows from PDI out to an MS Excel file and supports both the .xls and .xlsx file formats.
The .xls files use a binary format which is better suited for simple content, while the .xlsx files use the Open XML format which works well with templates since it can better preserve charts and miscellaneous objects.
Double-click on Write the Sales Forecast to Excel.
A 'Sales Report AI Generated' file is created with an .xlsx extension.
If the file exists, it is replaced with a new output file.
The data set is written to the active sheet - Sheet 1.
Click on the Content tab.
The data set is written beginning at cell A1.
Any existing cells are overwritten.
Header cells are written.
Get Fields - retrieves the data stream fields.
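For context, the sketch below shows the same kind of write using Apache POI, a common Java library for .xlsx files - header cells in row 1 starting at A1, then one data row per record. It mirrors the settings above but is not the step's actual code:

import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelWriteSketch {
    public static void main(String[] args) throws Exception {
        try (XSSFWorkbook workbook = new XSSFWorkbook();
             FileOutputStream out = new FileOutputStream("Sales Report AI Generated.xlsx")) {

            Sheet sheet = workbook.createSheet("Sheet 1");

            // Header cells written starting at A1, matching the step configuration above.
            Row header = sheet.createRow(0);
            String[] fields = {"SaleYear", "SaleMonth", "ProductCategory", "UnitSold", "Revenue"};
            for (int i = 0; i < fields.length; i++) {
                header.createCell(i).setCellValue(fields[i]);
            }

            // One data row as an example; PDI would append a row per incoming record.
            Row row = sheet.createRow(1);
            row.createCell(0).setCellValue("2024");
            row.createCell(1).setCellValue("August");
            row.createCell(2).setCellValue("Eco-Gear");
            row.createCell(3).setCellValue("1,500");
            row.createCell(4).setCellValue("$150,000");

            workbook.write(out);
        }
    }
}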
RUN and open the file:
~/Projects/genai/Use Case - Analyzing Financial Reports/data/Sales Report AI Generated.xlsx