Sunday, January 5, 2014

Study on Oracle Sequence & Data Stage Sequence



Study on Oracle Sequence & Data Stage Sequence



INTRODUCTION:


Sequence number generation is used to load the surrogate keys in the dimension table. This document presents details about the procedure used to generate sequence numbers in datastage. It also presents details about calling oracle sequences through datastage. Comparison of Oracle Sequence & Data Stage Sequence by using inbuilt routine is also made.
     

Generating Sequence By Using Data Stage


  1. KeyMgtGetNextValue:  

Data Stage has a SDK Sequences hash file, which resides in the Projects folder. It consists of two fields. One field is the sequence name and the other is the next value of the sequence.

KeyMgtGetNextValue is the sequence generator used in data stage.  This routine uses a SDK Sequences hash file to store the next available number. It reads the number once, and then increments and stores back the next value into the same file.

Here is the example explains the usage of this routine.

There is a source table (product_source) in oracle, which has product_id, product_desc, and product_cost as its attributes, which is used to load the data into target table (product_target) with product_key as its primary key.
Ø    Import the source and the target table definitions by using the Data Stage manager.
Ø    Insert an ODBC stage in the Data Stage Designer and load the column definitions from source table i.e.,  (product_source)
Ø    Insert an ODBC stage in the Data Stage Designer and load the structure of the target table (product_target).
Ø    Insert a TRANSFORMER stage in between source and the target and map all the columns from source to the target except the product_key in the target.
Ø  In the product_key column in the transformer stage right click in the Derivation and select the KeyMgtGetNextValue () Transform. Specify sequence name as an argument



Ø  Save this job and compile in the designer. It can be validated and run on the director.

Ø  The following output is obtained.  Product_target table is loaded as:

 
2.   Keymgtnextvalueconcurrent:
                                
                     This routine is also used to generate sequence numbers. The difference between the both is as follows:

Ø  KeyMgtNextValue routine will only work when the sequence is being accessed by one process at a time. So even if we change the name of the sequences the routine considers it as one sequence only.
Ø  KeyMgtNextValueConcurrent routine uses locking to allow multiple processes to access the same sequence. So if we use two different sequence names for different processes, two sets of sequence numbers are generated.


Behavior of KeyMgtNextValue & KeyMgtNextValueConcurrent Routines with Different Names as an Argument

 KeyMgtNextValue Routine with different names in the same mapping.

 


The Outputs obtained are: product_target1


product_target2:



When the same job is run again, the output obtained is:  product_target1





product_target2



KeyMgtGetNextValueConcurrent with different names in the same mapping.





The output for this job is: Product_target1 is loaded as:


Product_target2 is loaded as:


When the same job is run again, the output obtained is:  product_target1


product_target2:


Behavior of KeyMgtNextValue & KeyMgtNextValueConcurrent Routines with Same Name as an Argument

1.     KeyMgtNextValue with same sequence names



The Outputs obtained are: product_target1







product_target2:


When the same job is run again, the output obtained is:  product_target1:


product_target2



2.  KeyMgtNextValueConcurrent with same sequence names.



The Outputs obtained are: product_target1

product_target2:


When the same job is run again, the output obtained is:  product_target1


product_target2




Behavior of KeyMgtNextValue & KeyMgtNextValueConcurrent Routines in the same mapping with different names


The output for this job is: product_target1 is loaded as:


Product_target2 is loaded as:


When the same job is run again, the output obtained is:  product_target1


product_target2:

Behavior of KeyMgtNextValue & KeyMgtNextValueConcurrent Routines in the same mapping with Same Names




The Outputs obtained are:   product_target1

product_target2:


When the same job is run again, the output obtained is:  product_target1


product_target2



Procedure For Resetting Of Data Stage Sequence


Resetting of Data Stage Sequence for each run of Data Stage Job


UPDATE SDKSequences USING DICT VOC SET F1 = '1' WHERE @ID = 'seq name';

Here ‘seq name’ is the name of the sequence that has to be reset. This command can be given by using a command stage or invoke it as an afterstage or beforestage subroutine by selecting execTCL option in the Designer
This Command resets the sequence at every run of the Data Stage Job

Here, KeyMgtNextValueConcurrent routine is invoked using stage variables in the transformer stage and the sequence created using the routine is reset.


By going on to stage properties of the stage variable (stagevar1), the command to reset the sequence s1 can be given.

Resetting of Data Stage Sequence for each cycle


It is also possible to have sequence numbers generated in cycles by changing the condition in the command.

UPDATE SDKSequences USING DICT VOC SET F1 = '1' WHERE @ID = 'seq name' AND F1=’max value +1’;

This command resets the sequence if it reaches the maximum value i.e. if sequence value reaches maximum (10000) it restarts from initial value i.e. 1

Note: SDK Sequences is a hash file, which resides in the Projects folder. It consists of two fields. One field is the sequence name and the other is the nextvalue of the sequence. So checking this value and altering will aid cyclic sequence generation.

DICT and VOC are files, which reside in the Engine. VOC file consists of details about how datastage executes the commands or functions. DICT file consists of all the information about the VOC file.

Generating Sequence By Using Oracle Sequence in Data Stage


                      Generation of sequence can also be done by calling oracle sequence.

There is a source table (product_source) in oracle, which has product_id, product_desc, and product_cost as its attributes, which is used to load the data into target table (product_target) with product_key as its primary key.
Ø    Import the source and the target table definitions by using the Data Stage manager.
Ø    Insert an OCI stage in the Data Stage Designer and load the column definitions from source table i.e.,  (product_source)
Ø    Insert an OCI stage in the Data Stage Designer and load the structure of the target table (product_target).
Ø  Insert a TRANSFORMER stage in between source and the target and map all the columns from source to the target except the product_key in the target
Ø  Insert an OCI stage along with the source which should also be connected to the transformer stage. This is the stage, which is going to generate the sequence which is used as a lookup. (This stage is seqkeygen)



           
Ø  Create a sequence in oracle. (Create sequence
                                                              Minvalue <>
                                                               Maxvalue <>
                                                               Start with <>
                                                                Increment by <>
                                                                Cache <>;
Ø  Select user-defined sql in the seqkeygen stage. Define the sql as “select . Nextval from dual. Create a column with integer as sql type.
Note:  Select the table name as dual in the General tab.



                             

Ø Match the columns in the transformer stage. Save the job and compile.
Ø Validate and run in the Run Director.

Advantages of using Oracle Sequences:


1.       While creating oracle sequence min value, max value, increment can be set by the user. When the source has greater number of records, partitioning is done to load it in the target. In this case we can set a minimum value from which the number generation has to start.
2.       There is a option in oracle sequence by which it can be decided as to how many values can be cached for faster access of the sequence. This improves the performance to a great extent.

Disadvantages of using Oracle Sequences:


  1. One disadvantage while using oracle sequence is that if loading in the target gets aborted due to any unexpectable reason the numbers are not exactly in sequence. Say we have given cache as 100. The operation has aborted while writing the 50th record, the next time the 51st record is loaded the sequence number would be 101 and not 51.
  2. One more disadvantage in oracle sequence is that if DataStage Sequence is used it may not mesh with the database sequences, particularly if there is another source inserting rows, or generating database sequence numbers.  This is because the sequence generator is housed deep within the repository.

Advantages of using DataStage Sequences:


  1. Performance is good when datastage sequence is used because the details of the sequence name and the last value is stored in a hash file thus accessing time is less. In oracle sequence every time it becomes necessary to communicate with the database which increases job run time.

Statistical details by using DataStage sequence and Oracle Sequence
SNO
Procedure used to generate sequence numbers
Time taken to generate 50000 numbers
1.
Usage of KeyMgtNextValueConcurrent
4min and 11sec
2.
Usage of KeyMgtNextValue
4min and 11sec
3.
Usage of Oracle Sequence
7min and 30sec


4. Creating sequence numbers by calling stored procedure in Oracle:

                  Sequence can also be generated by calling stored procedure from datastage.  Check for the illustration below where sequence numbers are generated by invoking the procedure.
The procedure created in oracle is
Create or replace procedure callseq(flag in varchar2) is
a number(4);
begin
     if flag='y' then
        for i in 1..5 loop
           select s1.nextval into a from dual;
           insert into seqtable values(a);
        end loop;
     end if;
end callseq;

Ø  Import the stored procedure in the manager by choosing Import -> Table definitions -> stored procedure definitions.
Ø  Create a mapping with an odbc stage as source with another odbc stage as target and a transformer stage in the middle.
Ø  Import a table in the source, which gives the value for the input parameter for the stored procedure.
Ø  In the target stage chose ‘call stored procedure’ in the update action and the stored procedure name in the dropdown list in general tab in inputs tab.



Ø  The columns tab consists of details about the in parameter of the procedure.

Ø  Now this job loads the table ‘seqtable”.

Procedure executes for each row.


   Reset Oracle Sequence:

                     It is also possible to reset the sequence in oracle.

                    The following procedure will reset the sequence to any value that is required.

CREATE OR REPLACE PROCEDURE reset_sequence (sequencename IN VARCHAR2
,startvalue in integer) as
curr_val INTEGER;
i number;
string varchar2(25);
BEGIN
EXECUTE IMMEDIATE 'alter sequence ' ||sequencename||' MINVALUE 0';
EXECUTE IMMEDIATE 'SELECT ' ||sequencename ||'.nextval FROM dual' INTO curr_val;
dbms_output.put_line('currval'||curr_val);
curr_val := curr_val - startvalue + 1;
if curr_val < 0
then
string := ' increment by ';
curr_val:= abs(curr_val);
else
string := ' increment by -';
end if;
dbms_output.put_line('string'||string);
EXECUTE IMMEDIATE 'alter sequence ' ||sequencename||string||curr_val ;
EXECUTE IMMEDIATE 'SELECT ' ||sequencename ||'.nextval FROM dual' INTO curr_val;
EXECUTE IMMEDIATE 'alter sequence ' ||sequencename||' increment by 1';
END reset_sequence;