Removing non-printable / ASCII characters oracle
🔹 1. Your Current Approach (Valid but Heavy)
REGEXP_REPLACE(column, '[^[:print:]]', '')
✔ Removes all non-printable characters
❌ Expensive for large datasets (regex on every column)
🔹 2. Optimized Approach (Better Performance)
If your main issue is control characters like newline, tab, carriage return, you can replace them explicitly:
REPLACE(REPLACE(REPLACE(column, CHR(10), ''), CHR(13), ''), CHR(9), '')
👉 Faster than REGEXP
Example:
REPLACE(REPLACE(REPLACE(PHONE_NO, CHR(10), ''), CHR(13), ''), CHR(9), '') PHONE_NO
🔹 3. Create a Reusable Function (Best Practice)
Instead of repeating everywhere, create a function:
CREATE OR REPLACE FUNCTION CLEAN_TEXT(p_text VARCHAR2)
RETURN VARCHAR2 IS
BEGIN
RETURN REGEXP_REPLACE(p_text, '[^[:print:]]', '');
END;
/
Then your query becomes:
SELECT
CLEAN_TEXT(TO_CHAR(SLNO)) SLNO,
CLEAN_TEXT(PHONE_NO) PHONE_NO,
CLEAN_TEXT(SERVICE_TYPE) SERVICE_TYPE,
CLEAN_TEXT(SERVICE_SUB_TYPE) SERVICE_SUB_TYPE,
CLEAN_TEXT(SERVICE_OPER_STATUS) SERVICE_OPER_STATUS,
CLEAN_TEXT(CUST_ACCNT_NO) CUST_ACCNT_NO,
CLEAN_TEXT(BILL_ACCNT_NO) BILL_ACCNT_NO,
CLEAN_TEXT(BILL_ACCNT_TYPE) BILL_ACCNT_TYPE,
CLEAN_TEXT(BILL_ACCNT_SUB_TYPE) BILL_ACCNT_SUB_TYPE,
CLEAN_TEXT(MOBILE_NO) MOBILE_NO,
CLEAN_TEXT(EMAIL_ID) EMAIL_ID,
CLEAN_TEXT(OLT_IP) OLT_IP,
CLEAN_TEXT(CUSTOMER_NAME) CUSTOMER_NAME,
CLEAN_TEXT(ADDR_1) ADDR_1,
CLEAN_TEXT(ADDR_2) ADDR_2,
CLEAN_TEXT(DISTRICT) DISTRICT,
CLEAN_TEXT(PIN_CODE) PIN_CODE,
CLEAN_TEXT(SSA) SSA,
CLEAN_TEXT(CIRCLE) CIRCLE,
CLEAN_TEXT(ZONE) ZONE,
CLEAN_TEXT(BBM_USERNAME) BBM_USERNAME,
CLEAN_TEXT(BBM_NAME) BBM_NAME,
CLEAN_TEXT(BBM_MOBILE_NO) BBM_MOBILE_NO,
CLEAN_TEXT(TYPE_OF_ORGANISTAION) TYPE_OF_ORGANISTAION,
CLEAN_TEXT(MINISTRY) MINISTRY,
CLEAN_TEXT(DEPARTMENT) DEPARTMENT,
CLEAN_TEXT(ORGANISATION) ORGANISATION,
CLEAN_TEXT(PROJECT) PROJECT,
CLEAN_TEXT(REMARKS) REMARKS,
CLEAN_TEXT(TO_CHAR(UPDATED_ON)) UPDATED_ON,
CLEAN_TEXT(UPDATED_BY) UPDATED_BY
FROM INFRA_ADMIN.GOVT_CONNECTIONS;
✔ Cleaner
✔ Maintainable
✔ Easier to modify logic later
🔹 4. If Data Fix is One-Time (Recommended)
Instead of cleaning every query, fix data permanently:
UPDATE INFRA_ADMIN.GOVT_CONNECTIONS
SET PHONE_NO = REGEXP_REPLACE(PHONE_NO, '[^[:print:]]', '');
👉 Do this for all required columns once
You want to update only when non-printable characters exist, to avoid unnecessary updates
✔ Correct Conditional UPDATE
UPDATE INFRA_ADMIN.GOVT_CONNECTIONS
SET PHONE_NO = REGEXP_REPLACE(PHONE_NO, '[^[:print:]]', '')
WHERE REGEXP_LIKE(PHONE_NO, '[^[:print:]]');
🔍 Explanation
REGEXP_REPLACE(...)→ removes non-printable charactersREGEXP_LIKE(... '[^[:print:]]')→ ensures update happens only if such characters exist
🔹 If column can have NULLs (safe version)
UPDATE INFRA_ADMIN.GOVT_CONNECTIONS
SET PHONE_NO = REGEXP_REPLACE(PHONE_NO, '[^[:print:]]', '')
WHERE PHONE_NO IS NOT NULL
AND REGEXP_LIKE(PHONE_NO, '[^[:print:]]');
🔹 To check before update (recommended)
SELECT PHONE_NO
FROM INFRA_ADMIN.GOVT_CONNECTIONS
WHERE REGEXP_LIKE(PHONE_NO, '[^[:print:]]');
🔹 For multiple columns (pattern)
UPDATE INFRA_ADMIN.GOVT_CONNECTIONS
SET PHONE_NO = REGEXP_REPLACE(PHONE_NO, '[^[:print:]]', ''),
EMAIL_ID = REGEXP_REPLACE(EMAIL_ID, '[^[:print:]]', '')
WHERE REGEXP_LIKE(PHONE_NO, '[^[:print:]]')
OR REGEXP_LIKE(EMAIL_ID, '[^[:print:]]');
⚡ Performance Tip (Important for large tables)
Regex is costly. If mostly dealing with:
newline →
CHR(10)carriage return →
CHR(13)tab →
CHR(9)
Use:
UPDATE INFRA_ADMIN.GOVT_CONNECTIONS
SET PHONE_NO = REPLACE(REPLACE(REPLACE(PHONE_NO, CHR(10), ''), CHR(13), ''), CHR(9), '')
WHERE PHONE_NO LIKE '%' || CHR(10) || '%'
OR PHONE_NO LIKE '%' || CHR(13) || '%'
OR PHONE_NO LIKE '%' || CHR(9) || '%';