Blog

Blog Post: Avoiding reinventing the wheel – use MULTISET EXCEPT to get set relative complement
Allgemein

Blog Post: Avoiding reinventing the wheel – use MULTISET EXCEPT to get set relative complement

This post was originally published on this site

Today in an application I stumbled upon a PL/SQL function created to return those elements of one nested table that did not exist in another nested table. Not a huge function, but still a bit of work and some thinking that had been done some while ago to compare the elements and create the desired output nested table.

The trouble is, that this functionality already natively exists in PL/SQL (and SQL) – the developer had actually reinvented the wheel. In set theory this is called a relative complement and PL/SQL and SQL has operator MULTISET EXCEPT to perform this set operation.

So just as a warning to all that you should really be aware of the built-in functionality of any programming language, I’ll show you this function and how I replaced it.

First I create the nested table type along with a procedure that simply dumps the content of that type to the server output.

create or replace type num_tab_type as table of number
/

create or replace procedure dump_num_tab( p_nt num_tab_type )
is
begin
if p_nt is null then
dbms_output.put_line('NULL');
else
dbms_output.put_line('Count: '||p_nt.count);
for i in 1..p_nt.count loop
dbms_output.put_line(lpad(i,5)||': '||p_nt(i));
end loop;
end if;
end dump_num_tab;
/

Then I have the function as I found it (renamed identifiers and reformatted for this demo, but functionality exactly as I found it.) The code loops through the elements of the two nested tables comparing elements individually to each other – lots of work.

create or replace function num_tab_complement ( p_nt_1 num_tab_type, p_nt_2 num_tab_type) 
return num_tab_type
is
v_retval num_tab_type;
v_retcnt number := 0;
v_present boolean;
begin
v_retval := num_tab_type();
if p_nt_1 is not null then
for i in 1..p_nt_1.count loop

v_present := false;
if p_nt_2 is not null then
for j in 1..p_nt_2.count loop
if p_nt_1(i) = p_nt_2(j) then
v_present := true;
exit;
end if;
end loop;
end if;

if not v_present then
v_retval.extend(1);
v_retcnt := v_retcnt + 1;
v_retval(v_retcnt) := p_nt_1(i);
end if;

end loop;
end if;

return v_retval;
end num_tab_complement;
/

Testing the function with some various cases of two different nested tables.

declare
nt1 num_tab_type;
nt2 num_tab_type;
nt3 num_tab_type;
begin
dbms_output.put_line('-- case 1 --');
nt1 := num_tab_type(1,2,3);
nt2 := num_tab_type(2,4);
nt3 := num_tab_complement(nt1, nt2);
dump_num_tab(nt3);

dbms_output.put_line('-- case 2 --');
nt1 := num_tab_type(1,2,3);
nt2 := num_tab_type(4,3,2,1);
nt3 := num_tab_complement(nt1, nt2);
dump_num_tab(nt3);

dbms_output.put_line('-- case 3 --');
nt1 := num_tab_type(1,2,3);
nt2 := num_tab_type();
nt3 := num_tab_complement(nt1, nt2);
dump_num_tab(nt3);

dbms_output.put_line('-- case 4 --');
nt1 := num_tab_type(1,2,3);
nt2 := NULL;
nt3 := num_tab_complement(nt1, nt2);
dump_num_tab(nt3);

dbms_output.put_line('-- case 5 --');
nt1 := num_tab_type();
nt2 := num_tab_type(2,4);
nt3 := num_tab_complement(nt1, nt2);
dump_num_tab(nt3);

dbms_output.put_line('-- case 6 --');
nt1 := NULL;
nt2 := num_tab_type(2,4);
nt3 := num_tab_complement(nt1, nt2);
dump_num_tab(nt3);
end;
/

The implementation chosen by the original developer was to treat NULL valued nested tables as if they were empty nested tables, as can be seen in the output. This is not standard, but the application presumably expects the code to work this way.

-- case 1 --
Count: 2
1: 1
2: 3
-- case 2 --
Count: 0
-- case 3 --
Count: 3
1: 1
2: 2
3: 3
-- case 4 --
Count: 3
1: 1
2: 2
3: 3
-- case 5 --
Count: 0
-- case 6 --
Count: 0

So now I turn to the built-in functionality MULTISET EXCEPT for getting a relative complement of two sets.

I could in principle avoid the function completely and just use MULTISET EXCEPT those places where the function is called. This would normally be my preference.

Except that in this case the original function did a non-standard handling of NULL values, so to make certain the application does not break, I’ll just replace the inner workings of the function with a little CASE structure to handle the NULL special cases and otherwise just return the MULTISET EXCEPT of the two sets.

create or replace function num_tab_complement ( p_nt_1 num_tab_type, p_nt_2 num_tab_type) 
return num_tab_type
is
begin
return
case
when p_nt_1 is null then
num_tab_type()
when p_nt_2 is null then
p_nt_1
else
p_nt_1 multiset except all p_nt_2
end;
end num_tab_complement;
/

And testing the new function with the exact same test anonymous block gives the same output, so the replacement will not break the application, just make it faster.

-- case 1 --
Count: 2
1: 1
2: 3
-- case 2 --
Count: 0
-- case 3 --
Count: 3
1: 1
2: 2
3: 3
-- case 4 --
Count: 3
1: 1
2: 2
3: 3
-- case 5 --
Count: 0
-- case 6 --
Count: 0

A couple of notes:

  • I used MULTISET EXCEPT ALL (ALL is the default, the alternative is DISTINCT.) This does not entirely match the original function. In the original, if there are duplicates of a value in p_nt_1, either that value will be in the output nested table the same number of occurrences as in p_nt_1 (if the value does not exist in p_nt_2) or that value will not be in the output nested table at all (if the value exists minimum once in p_nt_2.) So here the replacement function does not entirely emulate the original, and I can only do this because I know there will never be duplicates in the nested tables in this application. If there had been duplicates, I would question whether the original function was a correct implementation and find out whether ALL or DISTINCT would be the correct replacement.
  • Using the MULTISET instead of the Do-It-Yourself code has many advantages like for example speed. But a very important advantage is that I do not have to care about whether the nested tables are dense or sparse. Of course I could have rewritten the original DIY code to handle sparse nested tables, but it is just so much easier when I don’t have to worry about that at all and just let the database figure it out.

Morale: Be lazy. Use the built-in functionality. It is only in very rare very special cases that it just might be possible for you to DIY something better and faster than what the database can do natively 😉

The code shown can be tried out on LiveSQL here: https://livesql.oracle.com/apex/livesql/s/fmmlw2stbi30fcbzf5pg3ud0n