The UTL_MATCH package facilitates matching two records. It comes very handy for data entry validation and string matching.
Setup
DROP TABLE t_utl_match;
CREATE TABLE t_utl_match (
id NUMBER,
col1 VARCHAR2(15),
col2 VARCHAR2(15))
);
INSERT INTO t_utl_match VALUES (1, 'Dunningham', 'Cunnigham');
INSERT INTO t_utl_match VALUES (2, 'Abroms', 'Abrams');
INSERT INTO t_utl_match VALUES (3, 'Lampley', 'Campley');
INSERT INTO t_utl_match VALUES (4, 'Marhta', 'Martha');
INSERT INTO t_utl_match VALUES (5, 'Jonathon', 'Jonathan');
INSERT INTO t_utl_match VALUES (6, 'Jeraldine', 'Geraldine');
INSERT INTO t_utl_match VALUES (7, 'Bat Man', 'Cat Woman');
COMMIT;
UTL_MATCH Subprograms
- EDIT_DISTANCE Function
a.k.a. Levenshtein Distance, calculates the number of changes required to transform string-1 into string-2
For example,
SQL> SELECT id,
2 text1,
3 text2,
4 UTL_MATCH.edit_distance(text1, text2) AS ed
5 FROM t_utl_match
6 ORDER BY id;
ID TEXT1 TEXT2 ED
---------- --------------- --------------- ----------
1 Dunningham Cunnigham 2
2 Abroms Abrams 1
3 Lampley Campley 1
4 Marhta Martha 2
5 Jonathon Jonathan 1
6 Jeraldine Geraldine 1
7 Bat Man Cat Woman 4
7 rows selected.
SQL>
- EDIT_DISTANCE_SIMILARITY Function
Calculates the number of changes required to transform string-1 into string-2, returning a value between 0 (no match) and 100 (perfect match)
For example,
SQL> SELECT id,
2 text1,
3 text2,
4 UTL_MATCH.edit_distance_similarity(text1, text2) AS eds
5 FROM t_utl_match
6 ORDER BY id;
ID TEXT1 TEXT2 EDS
---------- --------------- --------------- ----------
1 Dunningham Cunnigham 80
2 Abroms Abrams 84
3 Lampley Campley 86
4 Marhta Martha 67
5 Jonathon Jonathan 88
6 Jeraldine Geraldine 89
7 Bat Man Cat Woman 56
7 rows selected.
SQL>
- JARO_WINKLER Function
Calculates the measure of agreement between string-1 and string-2
For example,
SQL> SELECT id,
2 text1,
3 text2,
4 UTL_MATCH.jaro_winkler(text1, text2) AS jw
5 FROM t_utl_match
6 ORDER BY id;
ID TEXT1 TEXT2 JW
---------- --------------- --------------- ----------
1 Dunningham Cunnigham 8.963E-001
2 Abroms Abrams 9.222E-001
3 Lampley Campley 9.048E-001
4 Marhta Martha 9.611E-001
5 Jonathon Jonathan 9.5E-001
6 Jeraldine Geraldine 9.259E-001
7 Bat Man Cat Woman 7.566E-001
7 rows selected.
SQL>
- JARO_WINKLER_SIMILARITY Function
Calculates the measure of agreement between string-1 and string-2, returning a value between 0 (no match) and 100 (perfect match)
For example.
DROP TABLE t_utl_match;
SQL> SELECT id,
2 text1,
3 text2,
4 UTL_MATCH.jaro_winkler_similarity(text1, text2) AS jws
5 FROM t_utl_match
6 ORDER BY id;
ID TEXT1 TEXT2 JWS
---------- --------------- --------------- ----------
1 Dunningham Cunnigham 89
2 Abroms Abrams 92
3 Lampley Campley 90
4 Marhta Martha 96
5 Jonathon Jonathan 95
6 Jeraldine Geraldine 92
7 Bat Man Cat Woman 75
7 rows selected.
SQL>
See the documentation for more details about UTL_MATCH.
Hope it helps!
Filed under: Oracle Delimited String manipulation Tagged: compare strings, data entry, edit_distance, edit_distance_similarility, jaro_winkler, jaro_winkler_similarity, match strings, string difference, string similarity, utl_match
