Study on Oracle Sequence & Data Stage Sequence
INTRODUCTION:
Sequence number generation is used to load the
surrogate keys in the dimension table. This document presents details about the
procedure used to generate sequence numbers in datastage. It also presents
details about calling oracle sequences through datastage. Comparison of Oracle
Sequence & Data Stage Sequence by using inbuilt routine is also made.
Generating Sequence By Using Data Stage
- KeyMgtGetNextValue:
Data Stage has a SDK Sequences hash file, which resides in the
Projects folder. It consists of two fields. One field is the sequence name and
the other is the next value of the sequence.
KeyMgtGetNextValue
is the sequence generator used in data stage.
This routine uses a SDK Sequences hash file
to store the next available number. It reads the number once, and then
increments and stores back the next value into the same file.
Here is the example explains the usage of this routine.
There is a source table
(product_source) in oracle, which has product_id, product_desc, and
product_cost as its attributes, which is used to load the data into target
table (product_target) with product_key as its primary key.
Ø Save
this job and compile in the designer. It can be validated and run on the
director.
Ø The
following output is obtained. Product_target table is loaded as:
2. Keymgtnextvalueconcurrent:
This routine is also used to generate sequence numbers. The
difference between the both is as follows:
Ø
KeyMgtNextValue
routine will only work when the sequence is being accessed by one process at a
time. So even if we change the name of the sequences the routine considers it
as one sequence only.
Ø
KeyMgtNextValueConcurrent
routine uses locking to allow multiple processes to access the same sequence.
So if we use two different sequence names for different processes, two sets of
sequence numbers are generated.
Behavior of KeyMgtNextValue & KeyMgtNextValueConcurrent Routines with Different Names as an Argument
KeyMgtNextValue
Routine with different names
in the same mapping.
The Outputs obtained are: product_target1
product_target2:
When the same job is run again, the output obtained is: product_target1
product_target2
KeyMgtGetNextValueConcurrent with different names in the same mapping.
The output for this job is: Product_target1 is loaded as:
Product_target2 is loaded as:
When the same job is run again, the output obtained is: product_target1
product_target2:
Behavior of KeyMgtNextValue & KeyMgtNextValueConcurrent Routines with Same Name as an Argument
1.
KeyMgtNextValue
with same sequence names
The Outputs obtained are: product_target1

product_target2:

When the same job is run again, the output obtained is: product_target1:

product_target2

2. KeyMgtNextValueConcurrent with same sequence
names.

The Outputs obtained are: product_target1

product_target2:

When the same job is run again, the output obtained is: product_target1

product_target2

Behavior of KeyMgtNextValue & KeyMgtNextValueConcurrent Routines in the same mapping with different names

The output for this job is: product_target1 is loaded as:

Product_target2
is loaded as:

When the same job is run again, the output obtained is: product_target1

product_target2:

Behavior of KeyMgtNextValue & KeyMgtNextValueConcurrent Routines in the same mapping with Same Names

The Outputs obtained are:
product_target1

product_target2:

When the same job is run again, the output obtained is: product_target1

product_target2

Procedure For Resetting Of Data Stage Sequence
Resetting of Data Stage Sequence for each run of Data Stage Job
UPDATE SDKSequences USING DICT VOC SET F1 = '1' WHERE @ID = 'seq name';
Here ‘seq name’ is
the name of the sequence that has to be reset. This command can be given by
using a command stage or invoke it as an afterstage or beforestage subroutine
by selecting execTCL option in the
Designer
This Command resets
the sequence at every run of the Data Stage Job
Here, KeyMgtNextValueConcurrent routine is invoked using
stage variables in the transformer stage and the sequence created using the
routine is reset.

By going on to stage properties of the stage variable
(stagevar1), the command to reset the sequence s1 can be given.

Resetting of Data Stage Sequence for each cycle
It is also possible to have sequence numbers generated in
cycles by changing the condition in the command.
UPDATE SDKSequences
USING DICT VOC
SET F1 = '1' WHERE @ID = 'seq name' AND F1=’max value +1’;
Note: SDK Sequences
is a hash file, which resides in the Projects folder. It consists of two
fields. One field is the sequence name and the other is the nextvalue of the
sequence. So checking this value and altering will aid cyclic sequence
generation.
DICT and VOC are files, which reside in the Engine. VOC file consists of details about how
datastage executes the commands or functions. DICT
file consists of all the information about the VOC
file.
Generating Sequence By Using Oracle Sequence in Data Stage
Generation of
sequence can also be done by calling oracle sequence.
There is a source table
(product_source) in oracle, which has product_id, product_desc, and
product_cost as its attributes, which is used to load the data into target
table (product_target) with product_key as its primary key.
Ø Insert
a TRANSFORMER stage in between source and the target and map all the columns
from source to the target except the product_key in the target
Ø Insert
an OCI stage along with the source which should also be connected to the
transformer stage. This is the stage, which is going to generate the sequence
which is used as a lookup. (This stage is seqkeygen)

Ø Create
a sequence in oracle. (Create sequence
Minvalue <>
Maxvalue
<>
Start with <>
Increment by <>
Cache <>;
Ø Select
user-defined sql in the seqkeygen stage. Define the sql as “select . Nextval from dual . Create a
column with integer as sql type.
Note: Select
the table name as dual in the General tab.

Ø
Match the columns in the transformer stage. Save
the job and compile.
Ø
Validate and run in the Run Director.
Advantages of using Oracle Sequences:
1. While
creating oracle sequence min value, max value, increment can be set by the
user. When the source has greater number of records, partitioning is done to
load it in the target. In this case we can set a minimum value from which the
number generation has to start.
2. There
is a option in oracle sequence by which it can be decided as to
how many values can be cached for faster access of the sequence. This improves
the performance to a great extent.
Disadvantages of using Oracle Sequences:
- One disadvantage while using oracle sequence is that if loading in the target gets aborted due to any unexpectable reason the numbers are not exactly in sequence. Say we have given cache as 100. The operation has aborted while writing the 50th record, the next time the 51st record is loaded the sequence number would be 101 and not 51.
- One more disadvantage in oracle sequence is that if DataStage Sequence is used it may not mesh with the database sequences, particularly if there is another source inserting rows, or generating database sequence numbers. This is because the sequence generator is housed deep within the repository.
Advantages of using DataStage Sequences:
- Performance is good when datastage sequence is used because the details of the sequence name and the last value is stored in a hash file thus accessing time is less. In oracle sequence every time it becomes necessary to communicate with the database which increases job run time.
Statistical details by using
DataStage sequence and Oracle Sequence
SNO
|
Procedure used to generate sequence numbers
|
Time taken to generate 50000 numbers
|
1.
|
Usage of KeyMgtNextValueConcurrent
|
|
2.
|
Usage of KeyMgtNextValue
|
4min and 11sec
|
3.
|
Usage of Oracle Sequence
|
7min and 30sec
|

4. Creating sequence numbers by calling stored procedure in Oracle:
Sequence can also be generated by calling stored procedure from datastage. Check for the illustration below where sequence numbers are generated by invoking the procedure.The procedure created in oracle is
Create or replace procedure
callseq(flag in varchar2) is
a number(4);
begin
if flag='y' then
for i in 1..5 loop
select s1.nextval into a from dual;
insert into seqtable values(a);
end loop;
end if;
end callseq;
Ø Import
the stored procedure in the manager by choosing Import -> Table definitions
-> stored procedure definitions.
Ø Create
a mapping with an odbc stage as source with another odbc stage as target and a
transformer stage in the middle.
Ø Import
a table in the source, which gives the value for the input parameter for the
stored procedure.
Ø In
the target stage chose ‘call stored procedure’
in the update action and the stored procedure name in the dropdown list in general tab in inputs tab.

Ø The
columns tab consists of details about the in parameter of the procedure.

Ø Now
this job loads the table ‘seqtable”.
Procedure executes for each row.
Reset Oracle
Sequence:
It is also possible to reset the sequence in oracle.
The following procedure will reset the sequence to any value that is
required.
CREATE OR REPLACE PROCEDURE
reset_sequence (sequencename IN VARCHAR2
,startvalue in integer) as
curr_val INTEGER;
i number;
string varchar2(25);
BEGIN
EXECUTE IMMEDIATE 'alter sequence '
||sequencename||' MINVALUE 0';
EXECUTE IMMEDIATE 'SELECT '
||sequencename ||'.nextval FROM dual' INTO curr_val;
dbms_output.put_line('currval'||curr_val);
curr_val := curr_val - startvalue +
1;
if curr_val < 0
then
string := ' increment by ';
curr_val:= abs(curr_val);
else
string := ' increment by -';
end if;
dbms_output.put_line('string'||string);
EXECUTE IMMEDIATE 'alter sequence '
||sequencename||string||curr_val ;
EXECUTE IMMEDIATE 'SELECT '
||sequencename ||'.nextval FROM dual' INTO curr_val;
EXECUTE IMMEDIATE 'alter sequence '
||sequencename||' increment by 1';
END reset_sequence;
No comments:
Post a Comment