Quantcast
Channel: Lalit Kumar B
Viewing all articles
Browse latest Browse all 44

UTL_MATCH string comparision technique in Oracle

$
0
0

The UTL_MATCH package facilitates matching two records. It comes very handy for data entry validation and string matching.

Setup

DROP TABLE t_utl_match;

CREATE TABLE t_utl_match (
id   NUMBER,
col1 VARCHAR2(15),
col2 VARCHAR2(15))
);

INSERT INTO t_utl_match VALUES (1, 'Dunningham', 'Cunnigham');
INSERT INTO t_utl_match VALUES (2, 'Abroms', 'Abrams');
INSERT INTO t_utl_match VALUES (3, 'Lampley', 'Campley');
INSERT INTO t_utl_match VALUES (4, 'Marhta', 'Martha');
INSERT INTO t_utl_match VALUES (5, 'Jonathon', 'Jonathan');
INSERT INTO t_utl_match VALUES (6, 'Jeraldine', 'Geraldine');
INSERT INTO t_utl_match VALUES (7, 'Bat Man', 'Cat Woman');
COMMIT;

UTL_MATCH Subprograms

  • EDIT_DISTANCE Function

a.k.a. Levenshtein Distance, calculates the number of changes required to transform string-1 into string-2

For example,

SQL> SELECT id,
  2         text1,
  3         text2,
  4         UTL_MATCH.edit_distance(text1, text2) AS ed
  5  FROM   t_utl_match
  6  ORDER BY id;

        ID TEXT1           TEXT2                   ED
---------- --------------- --------------- ----------
         1 Dunningham      Cunnigham                2
         2 Abroms          Abrams                   1
         3 Lampley         Campley                  1
         4 Marhta          Martha                   2
         5 Jonathon        Jonathan                 1
         6 Jeraldine       Geraldine                1
         7 Bat Man         Cat Woman                4

7 rows selected.

SQL>
  • EDIT_DISTANCE_SIMILARITY Function

Calculates the number of changes required to transform string-1 into string-2, returning a value between 0 (no match) and 100 (perfect match)

For example,


SQL> SELECT id,
2         text1,
3         text2,
4         UTL_MATCH.edit_distance_similarity(text1, text2) AS eds
5  FROM   t_utl_match
6  ORDER BY id;

ID TEXT1           TEXT2                  EDS
---------- --------------- --------------- ----------
1 Dunningham      Cunnigham               80
2 Abroms          Abrams                  84
3 Lampley         Campley                 86
4 Marhta          Martha                  67
5 Jonathon        Jonathan                88
6 Jeraldine       Geraldine               89
7 Bat Man         Cat Woman               56

7 rows selected.

SQL>
  • JARO_WINKLER Function

Calculates the measure of agreement between string-1 and string-2

For example,


SQL> SELECT id,
2         text1,
3         text2,
4         UTL_MATCH.jaro_winkler(text1, text2) AS jw
5  FROM   t_utl_match
6  ORDER BY id;

ID TEXT1           TEXT2                   JW
---------- --------------- --------------- ----------
1 Dunningham      Cunnigham       8.963E-001
2 Abroms          Abrams          9.222E-001
3 Lampley         Campley         9.048E-001
4 Marhta          Martha          9.611E-001
5 Jonathon        Jonathan          9.5E-001
6 Jeraldine       Geraldine       9.259E-001
7 Bat Man         Cat Woman       7.566E-001

7 rows selected.

SQL>
  • JARO_WINKLER_SIMILARITY Function

Calculates the measure of agreement between string-1 and string-2, returning a value between 0 (no match) and 100 (perfect match)

For example.

DROP TABLE t_utl_match;

SQL> SELECT id,
2         text1,
3         text2,
4         UTL_MATCH.jaro_winkler_similarity(text1, text2) AS jws
5  FROM   t_utl_match
6  ORDER BY id;

ID TEXT1           TEXT2                  JWS
---------- --------------- --------------- ----------
1 Dunningham      Cunnigham               89
2 Abroms          Abrams                  92
3 Lampley         Campley                 90
4 Marhta          Martha                  96
5 Jonathon        Jonathan                95
6 Jeraldine       Geraldine               92
7 Bat Man         Cat Woman               75

7 rows selected.

SQL>

See the documentation for more details about UTL_MATCH.

Hope it helps!


Filed under: Oracle Delimited String manipulation Tagged: compare strings, data entry, edit_distance, edit_distance_similarility, jaro_winkler, jaro_winkler_similarity, match strings, string difference, string similarity, utl_match

Viewing all articles
Browse latest Browse all 44

Trending Articles