A Detailed Guide to Kettle's User Defined Java Class Step (Part 2)
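The "User Defined Java Class" step in Kettle, also known as the UDJC step, has been available since version 4.0. It is very powerful: you can write arbitrary Java code in it without a noticeable cost in performance. This article shows how to use the step through examples covering different scenarios. Because there is a lot of material, it is split into three parts for easier reading; please read all of them. The sample code can be downloaded here.
If you have not read the first part yet, please start there.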
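Using Step Parameters
If you have written some code and want to make it more generic, step parameters are the tool for the job. In this example we pass in a regular expression and a field name. The step checks whether the field named by the parameter matches the regular expression, and writes 1 to the result field if it does and 0 otherwise.
The code is as follows: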
import java.util.regex.Pattern;

private Pattern p = null;
private FieldHelper fieldToTest = null;
private FieldHelper outputField = null;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        // no more input rows
        setOutputDone();
        return false;
    }

    // prepare regex and field helpers (only once, on the first row)
    if (first) {
        first = false;
        String regexString = getParameter("regex");
        p = Pattern.compile(regexString);
        fieldToTest = get(Fields.In, getParameter("test_field"));
        outputField = get(Fields.Out, "result");
    }

    // make room in the row for the output field
    r = createOutputRow(r, data.outputRowMeta.size());

    // Get the value from an input field
    String test_value = fieldToTest.getString(r);

    // test for match and write result (a null value counts as no match)
    if (test_value != null && p.matcher(test_value).matches()) {
        outputField.setValue(r, Long.valueOf(1));
    }
    else {
        outputField.setValue(r, Long.valueOf(0));
    }

    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
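The getParameter() method returns the value of a parameter as defined in the step's UI. The parameter value may itself be a Kettle variable, and passing variables in through parameters is the usual way to use variables in this step. The variables can also be located by searching the step's XML by hand.
The sample transformation is parameter.ktr.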
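To see what the match test in this step does independently of Kettle, the following plain-Java sketch reproduces only the flag logic. The names RegexFlagSketch and matchFlag are hypothetical and not part of the sample transformation, and treating a null value as "no match" is an assumption of the sketch. The point it illustrates is that Matcher.matches() succeeds only when the whole value matches the pattern, so a value that merely contains a match yields 0.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the Kettle sample files.
final class RegexFlagSketch {

    // Returns 1 if the whole value matches the pattern, otherwise 0.
    // A null value is treated as "no match" (an assumption of this sketch).
    static long matchFlag(Pattern pattern, String value) {
        if (value == null) {
            return 0L;
        }
        return pattern.matcher(value).matches() ? 1L : 0L;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(matchFlag(digits, "12345"));  // whole value is digits: prints 1
        System.out.println(matchFlag(digits, "abc123")); // only part is digits: prints 0
        System.out.println(matchFlag(digits, null));     // null value: prints 0
    }
}
```

If a partial match should count as well, Matcher.find() could be used in place of matches().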
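Using Info Steps
Sometimes a step has to combine several input steps and give them different roles, as the Stream Lookup step does. Info steps supply the lookup data, and their rows are not returned by getRow(). They are easy to use in a UDJC step: declare them on the Info steps tab of the step's UI, then read their rows with getRowFrom().
In the sample transformation, an info step delivers a set of regular expressions, and a field in the main stream is tested against them. If any expression matches, the result field is set to 1; if none matches, it is set to 0. The expression that matched is written to an additional output field.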
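The matching rule described above (the first matching expression wins, and its text is reported) can be tried in plain Java before wiring it into a transformation. This is only a sketch: FirstMatchSketch, firstMatch, and the two sample expressions are hypothetical and do not come from the sample transformation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the Kettle sample files.
final class FirstMatchSketch {

    // Returns the text of the first pattern that matches the whole value,
    // or null if no pattern matches.
    static String firstMatch(List<Pattern> patterns, String value) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(value).matches()) {
                return pattern.pattern();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Compile the expressions once, up front.
        List<Pattern> patterns = new ArrayList<>();
        patterns.add(Pattern.compile("[a-z]+"));
        patterns.add(Pattern.compile("\\d+"));

        String matchedBy = firstMatch(patterns, "2024");
        long result = (matchedBy != null) ? 1L : 0L;
        System.out.println(result + " " + matchedBy);
    }
}
```

Compiling the patterns once and reusing them for every value mirrors what the step does on its first row.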
The code for the step is as follows:
import java.util.regex.Pattern;
import java.util.*;

private FieldHelper resultField = null;
private FieldHelper matchField = null;
private FieldHelper outputField = null;
private FieldHelper inputField = null;
private ArrayList patterns = new ArrayList(20);
private ArrayList expressions = new ArrayList(20);

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // prepare regex and field helpers
    if (first) {
        first = false;

        // get the input and output fields
        resultField = get(Fields.Out, "result");
        matchField = get(Fields.Out, "matched_by");
        inputField = get(Fields.In, "value");

        // get all rows from the info stream and compile the regex field to patterns
        FieldHelper regexField = get(Fields.Info, "regex");
        RowSet infoStream = findInfoRowSet("expressions");
        Object[] infoRow = null;
        while ((infoRow = getRowFrom(infoStream)) != null) {
            String regexString = regexField.getString(infoRow);
            expressions.add(regexString);
            patterns.add(Pattern.compile(regexString));
        }
    }

    // get the value of the field to check
    String value = inputField.getString(r);

    // check if any pattern matches; the first match wins
    int matchFound = 0;
    String matchExpression = null;
    for (int i = 0; i < patterns.size(); i++) {
        if (((Pattern) patterns.get(i)).matcher(value).matches()) {
            matchFound = 1;
            matchExpression = (String) expressions.get(i);
            break;
        }
    }

    // write result to stream
    r = createOutputRow(r, data.outputRowMeta.size());
    resultField.setValue(r, Long.valueOf(matchFound));
    matchField.setValue(r, matchExpression);

    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}

The findInfoRowSet() method returns the row set of the input step registered under the given name on the Info steps tab of the UDJC step. Reading a row from that row set differs from reading one from the main stream: call getRowFrom() and name the row set explicitly.
The sample transformation is info_steps.ktr.

Using Target Steps
A UDJC step sometimes needs to send rows to different target steps. This is done by calling putRowTo() and passing the target row set as an argument. All possible target steps must be defined on the Target steps tab of the step's UI. The example below distributes rows randomly between two target steps.
The findTargetRowSet() method returns the row set of a target step defined in the UI, which is then passed to putRowTo(). The sample transformation is target_steps.ktr.
The code is as follows:

import java.util.regex.Pattern;
import java.util.*;

private RowSet lowProbStream = null;
private RowSet highProbStream = null;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // look up the target row sets (only once, on the first row)
    if (first) {
        first = false;
        lowProbStream = findTargetRowSet("low_probability");
        highProbStream = findTargetRowSet("high_probability");
    }

    // Send the row on to one of the target steps.
    if (Math.random() < ...) {    // the threshold value is missing from the source text
        putRowTo(data.outputRowMeta, r, lowProbStream);
    }
    else {
        putRowTo(data.outputRowMeta, r, highProbStream);
    }
    return true;
}

For more, see the third part.
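The routing decision in the target-steps example can also be tried outside Kettle. The sketch below is a stand-in built on assumptions: two lists play the role of the target row sets, and the class name, the method name, and the 0.2 threshold used in main are all hypothetical, since the threshold of the original example is not given in the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; the lists stand in for Kettle target row sets.
final class RandomRoutingSketch {

    // Sends each row to lowProb with the given probability, otherwise to highProb.
    static void route(List<String> rows, double lowProbability, Random random,
                      List<String> lowProb, List<String> highProb) {
        for (String row : rows) {
            if (random.nextDouble() < lowProbability) {
                lowProb.add(row);
            } else {
                highProb.add(row);
            }
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("row-" + i);
        }
        List<String> lowProb = new ArrayList<>();
        List<String> highProb = new ArrayList<>();

        // 0.2 is an arbitrary threshold chosen for this sketch.
        route(rows, 0.2, new Random(), lowProb, highProb);

        System.out.println("low_probability:  " + lowProb.size());
        System.out.println("high_probability: " + highProb.size());
    }
}
```

Every row ends up in exactly one of the two lists, just as each call to putRowTo() in the step delivers the row to exactly one target.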
