How to identify and remove special characters in a column of a table in Oracle

Try checking the exact character codes:

SELECT DUMP(addr, 1016)
FROM nz_bharat_fiber_leads
WHERE lead_id = 'LEAD-20260505-142621-EF41';

The Output is 

Typ=1 Len=81 CharacterSet=US7ASCII: 4c,41,49,4a,55,20,56,41,52,4b,45,59,2c,53,54,2e,20,41,52,4e,4f,4c,44,27,53,20,48,49,47,48,20,53,43,48,4f,4f,4c,20,26,20,4a,55,4e,49,4f,52,20,43,4f,4c,4c,45,47,45,20,b8,b7,20,b0,20,b0,20,b8,20,b9,b0,b7,b0,20,34,30,30,30,39,33,20,49,6e,64,69,61


Since your bad data contains characters with ASCII values greater than 127 (B0, B7, B8, B9), you can identify all affected rows in several ways.

1. Find rows containing any non-ASCII characters

This is the most useful query:

SELECT lead_id,
       addr
FROM nz_bharat_fiber_leads
WHERE REGEXP_LIKE(addr, '[^ -~]');

This finds characters outside the printable ASCII range (32-126).


2. Find rows containing specific garbage bytes

Since your dump showed:

B0 B7 B8 B9

use:

SELECT lead_id,
       addr
FROM nz_bharat_fiber_leads
WHERE INSTR(addr, CHR(176)) > 0
   OR INSTR(addr, CHR(183)) > 0
   OR INSTR(addr, CHR(184)) > 0
   OR INSTR(addr, CHR(185)) > 0;

3. Count affected rows

SELECT COUNT(*)
FROM nz_bharat_fiber_leads
WHERE REGEXP_LIKE(addr, '[^ -~]');

4. Find the exact offending character positions

SELECT lead_id,
       addr,
       DUMP(addr,1016) dump_value
FROM nz_bharat_fiber_leads
WHERE REGEXP_LIKE(addr, '[^ -~]');

5. Preview cleaned addresses before updating

SELECT lead_id,
       addr old_addr,
       REPLACE(
         REPLACE(
           REPLACE(
             REPLACE(addr, CHR(184), ''),
           CHR(183), ''),
         CHR(176), ''),
       CHR(185), '') new_addr
FROM nz_bharat_fiber_leads
WHERE REGEXP_LIKE(addr, '[^ -~]');

6. Bulk cleanup of all affected rows

After verifying:

UPDATE nz_bharat_fiber_leads
SET addr =
    REPLACE(
      REPLACE(
        REPLACE(
          REPLACE(addr, CHR(184), ''),
        CHR(183), ''),
      CHR(176), ''),
    CHR(185), '')
WHERE REGEXP_LIKE(addr, '[^ -~]');

COMMIT;

Recommended first step

Run:

SELECT COUNT(*)
FROM nz_bharat_fiber_leads
WHERE REGEXP_LIKE(addr, '[^ -~]');

to see how many rows are affected before attempting a bulk update. This will also help estimate the cleanup effort.


if the unwanted character is actually U+FFFD (�):

UPDATE nz_bharat_fiber_leads

SET addr = REPLACE(addr, UNISTR('\FFFD'), '')
WHERE lead_id = 'LEAD-20260505-142621-EF41';

COMMIT;

However, if the characters are still visible after the update, they may not be U+FFFD. Try checking the exact character codes:

SELECT addr,
       DUMP(addr, 1016)
FROM nz_bharat_fiber_leads
WHERE lead_id = 'LEAD-20260505-142621-EF41';

A more aggressive cleanup approach is:

UPDATE nz_bharat_fiber_leads
SET addr =
       REGEXP_REPLACE(
           addr,
           '[^[:alnum:][:space:],./()&''-]',
           ''
       )
WHERE lead_id = 'LEAD-20260505-142621-EF41';

COMMIT;

This will keep:

  • Letters (A-Z, a-z)

  • Numbers (0-9)

  • Spaces

  • Common address symbols: , . / ( ) & ' -

Example:

Before:

LAIJU VARKEY,ST. ARNOLD'S HIGH SCHOOL & JUNIOR COLLEGE �� � � � ���� 400093 India

After:

LAIJU VARKEY,ST. ARNOLD'S HIGH SCHOOL & JUNIOR COLLEGE 400093 India

If this still doesn't remove the characters, please share the output of:

SELECT DUMP(addr, 1016)
FROM nz_bharat_fiber_leads
WHERE lead_id = 'LEAD-20260505-142621-EF41';

and  identify the exact bytes that need to be removed.


If the address contains replacement characters like:

LAIJU VARKEY,ST. ARNOLD'S HIGH SCHOOL & JUNIOR COLLEGE ��������������� ��������� ������������������ ������������ ��������������� ������������������������������ 400093 India

you can remove all non-printable/special characters using Oracle regular expressions.

Preview the cleaned value

SELECT lead_id,
       address,
       REGEXP_REPLACE(address, '[^[:print:]]', '') AS cleaned_address
FROM nz_bharat_fiber_leads
WHERE lead_id = 1234;

Update the address

UPDATE nz_bharat_fiber_leads
SET address = REGEXP_REPLACE(address, '[^[:print:]]', '')
WHERE lead_id = 1234;

COMMIT;

If the address contains Unicode replacement character (�)

Sometimes the symbol displayed as is actually Unicode character U+FFFD. In that case:

UPDATE nz_bharat_fiber_leads
SET address = REPLACE(address, UNISTR('\FFFD'), '')
WHERE lead_id = 1234;

COMMIT;

Remove all characters except letters, numbers and common address symbols

This is often the safest approach:

UPDATE nz_bharat_fiber_leads
SET address =
       REGEXP_REPLACE(
           address,
           '[^[:alnum:][:space:],./()&''-]',
           ''
       )
WHERE lead_id = 1234;

COMMIT;

Example Result

Before:

LAIJU VARKEY,ST. ARNOLD'S HIGH SCHOOL & JUNIOR COLLEGE ��������������� ��������� ������������������ ������������ ��������������� ������������������������������ 400093 India

After:

LAIJU VARKEY,ST. ARNOLD'S HIGH SCHOOL & JUNIOR COLLEGE 400093 India

To determine exactly which character is stored, run:

SELECT DUMP(address, 1016)
FROM nz_bharat_fiber_leads
WHERE lead_id = 1234;


Post a Comment

And that's all there is to it!

If anyone has any other questions or requests for future How To posts, you can either ask them in the comments or email me. Please don't feel shy at all!

I'm certainly not an expert, but I'll try my hardest to explain what I do know and research what I don't know.

Previous Post Next Post