Indexing NULL table column values for fast SQL performance

One problem with all relational databases is having the optional ability to index on a NULL column.  By default, relational databases ignore NULL values -- because the relational model says that NULL means "not present" .  Hence, Oracle indexes will not include NULL values.  


For example, this index definition would not index on "open positions", new employee positions that are stored with a NULL employee name:

create index
    emp_ename_idx
on
   emp
   (ename)
;

Whenever a SQL query asks for the open position employee slots "where ename is NULL", there will be no index entries for NULLS in emp_name_idx and Oracle would perform an unnecessary large-table full-table scan.

Execution Plan
---------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=6)
1 0 TABLE ACCESS (FULL) OF 'EMP' (Cost=1 Card=1 Bytes=6)


To get around the optimization of SQL queries that choose NULL column values, we can create a function-based index using the null value built-in SQL function to index only on the NULL columns. 
Note that the "null value" (NVL) function replaces NULL values with the character string "null', a real value that can participate in an index:

-- create an Function based index on ename column with NULL values

create index
    emp_null_ename_idx
on
   emp
   (nvl(ename,'null'));

analyze index emp_null_ename_idx compute statistics;


You can also do this techniques with NULL numeric values.  This syntax replaces NULL values with a zero:
 
-- create an FBI on emp_nbr column with NULL values
create index
    emp_null_emp_nbr_idx
on
   emp
   (nvl(ename,o));

analyze index emp_null_ename_idx compute statistics;


Now we can use the index and greatly improve the speed of any queries that require access to the NULL columns.  Note that we must make one of two changes:

    1 - Add a hint to force the index
    2 - Change the WHERE predicate to match the function

Here is an example of using an index on NULL column values:

-- insert a NULL row

insert into emp (empno) values (999);
set autotrace traceonly explain;

-- test the index access (change predicate to use FBI)

select /*+ index(emp_null_ename_idx) */
   ename
from
   emp e
where
   nvl(ename,'null') = 'null'
;


it can be done without changing query as well......

SQL> select count(1) from t where n is null;

 COUNT(1)
----------
      334

Execution Plan
---------------------------------------------------------
  0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=3)
  1    0   SORT (AGGREGATE)
  2    1     TABLE ACCESS (FULL) OF 'T' (Cost=3 Card=334 Bytes=1002)


SQL> create index tind on t(n, 1); ----> here 1 is just any arbitary value.

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'t',cascade=>true);

PL/SQL procedure successfully completed.

SQL> select count(1) from t where n is null;

 COUNT(1)
----------
      334

Execution Plan
----------------------------------------------------------
  0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=4)
  1    0   SORT (AGGREGATE)
  2    1     INDEX (RANGE SCAN) OF 'TIND' (NON-UNIQUE) (Cost=2 Card=3
         34 Bytes=1336)



Post a Comment

And that's all there is to it!

If anyone has any other questions or requests for future How To posts, you can either ask them in the comments or email me. Please don't feel shy at all!

I'm certainly not an expert, but I'll try my hardest to explain what I do know and research what I don't know.

Previous Post Next Post