Friday, May 11th, 2012

Using partition by to write alternative SQL to get effdt and effseq items

Paritioning data is really useful for grouping data prior to data manipulation (as opposed to group by which does grouping after manipulation the data). This is because group by is grouping resulting data while partition by is actually partitioning  the data and returning the appropriate partitioned data. Simply said; a normal query is run and the results are retrieves by the Database, then Analytics are applied to the results and the Analytic function columns are computed.

This can be very usefull in peoplesoft when retrieving MAX(effdt) and MAX(effseq). Normally we would write a SQL like this to get the MAX(effdt) and MAX(effseq) for each EMPLID and EMPL_RCD.


SELECT A.EMPLID, A.EMPL_RCD, A.EFFDT, A.EFFSEQ
FROM PS_JOB A
WHERE ( A.EFFDT =
(SELECT MAX(A_ED.EFFDT) FROM PS_JOB A_ED
WHERE A.EMPLID = A_ED.EMPLID
AND A.EMPL_RCD = A_ED.EMPL_RCD
AND A_ED.EFFDT <= sysdate)
AND A.EFFSEQ =
(SELECT MAX(A_ES.EFFSEQ) FROM PS_JOB A_ES
WHERE A.EMPLID = A_ES.EMPLID
AND A.EMPL_RCD = A_ES.EMPL_RCD
AND A.EFFDT = A_ES.EFFDT) )

As we can see in the explain plan, a lot is going on in the tables just to get the MAX(effdt) and MAX(effseq).

normal sql explain plan

In the next example I have use the Partition by function to get the MAX(effdt) and MAX(effseq) It partitions job by EMPLID and EMPL_RCD, and returns the maximum dated row for each partition.


SELECT J.EMPLID,
J.EMPL_RCD,
J.EFFDT,
J.EFFSEQ
FROM
(SELECT EMPLID,
EMPL_RCD,
EFFDT,
EFFSEQ,
MAX (EFFDT) over (partition BY EMPLID, EMPL_RCD) AS MAX_JOB_EFFDT,
MAX (EFFSEQ) over (partition BY emplid, EMPL_RCD, EFFDT) AS MAX_JOB_EFFSEQ
FROM PS_JOB
WHERE effdt <= SYSDATE) J
WHERE J.EFFDT = J.MAX_JOB_EFFDT
AND j.EFFSEQ = J.MAX_JOB_EFFSEQ;

In this explain plan, a lot less is going on but it returns the exact same data.

Partitioned SQL

Also you might notice that the CPU Cost is much lower. This doesn’t have to be the case with all partition by data. Because we are using analytic functions, when processing this on 300,000 rows in job the cost might be much higher then the cost of a normal query. The big difference will be in the execution time. For example; when executing the normal query this might cost 2.500 seconds, while in analythis this will be below 1.000 seconds. This is something to keep in mind.

Analytic functions  sometimes cost more CPU when processig large amounts of rows, but execution time will also be much, much faster.

 Viewed 881 times by 425 visitors


Category: Technical
You can follow any responses to this entry through the RSS 2.0 feed. You can skip to the end and leave a response. Pinging is currently not allowed.

2 Responses (last comment shown first)

May 28, 2012
Ganesh
avatar

Good Thought and analysis of the Explain Plan

I think the results will not be same – with above query.

Where EFFDT <= SYSDATE – Condition needs to added to get the same result.

[Reply to this comment]

avatar

Wijnand Gritter Reply:

Thanks for noticing my writing mistake. I have edited the post.

[Reply to this comment]