Tanel published a great post a while ago about Oracle’s sql_id and hash values in Oracle 10g+. I wanted to compute sql_id and hash values directly from SQL statements for our Hedgehog product. In my first tests, the MD5 value I generated from the SQL statement did not match the MD5 value Oracle stores in X$KGLOB.KGLNAHSV. After a short discussion with Tanel, it turned out that Oracle appends a NUL byte ('\0') to the statement before calculating the MD5.

Here is a test and some code in Python:

SYS> select 'Slavik' from dual;
'SLAVI
------
Slavik
SYS> select kglnahsv, kglnahsh from x$kglob where kglnaobj =
'select ''Slavik'' from dual';
KGLNAHSV                KGLNAHSH
--------------------------------- ----------
7a483e90555ab4ad24e190abe3e7775d  3823597405
7a483e90555ab4ad24e190abe3e7775d  3823597405

SYS> select sql_id, hash_value, old_hash_value from v$sql where sql_text =
'select ''Slavik'' from dual';

SQL_ID        HASH_VALUE OLD_HASH_VALUE
------------- ---------- --------------
29schpgjyfxux 3823597405     3501236764

So, first, let's check that our MD5 matches:
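>>> import hashlib
>>> import math
>>> import struct
>>> stmt = "select 'Slavik' from dual"
>>> # Hash the statement bytes plus a trailing NUL (assumes ASCII/UTF-8 statement text)
>>> d = hashlib.md5(stmt.encode('utf-8') + b'\x00').digest()
>>> # Read the digest as four little-endian 32-bit words; the last one is the hash value
>>> struct.unpack('<IIII', d)[3]
3823597405
>>> h = ''
>>> for i in struct.unpack('<IIII', d):
...     h += '%08x' % i  # zero-pad each word so leading zeros are not dropped
...
>>> h
'7a483e90555ab4ad24e190abe3e7775d'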

Good, everything matches!
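
One detail worth spelling out: the hex string in KGLNAHSV is not the usual MD5 hexdigest. Judging from the output above, each group of four digest bytes is read as a little-endian 32-bit word and then printed as hex. Here is a minimal sketch of that conversion. The function names `words_to_hex` and `kglnahsv_hex` are mine, and the `encoding` parameter is an assumption (the statements here are plain ASCII).

```python
import hashlib
import struct

def words_to_hex(digest):
    """Format a 16-byte MD5 digest the way the KGLNAHSV output above looks."""
    # Four little-endian 32-bit words, each printed as 8 zero-padded hex digits.
    return "".join("%08x" % w for w in struct.unpack("<IIII", digest))

def kglnahsv_hex(stmt, encoding="utf-8"):
    """Hash the statement plus a trailing NUL byte and format the result."""
    # Assumption: the statement text is hashed as bytes in `encoding`.
    digest = hashlib.md5(stmt.encode(encoding) + b"\x00").digest()
    return words_to_hex(digest)
```

Calling `kglnahsv_hex(stmt)` for the statement above should give the same string as the REPL session, while `hashlib.md5(...).hexdigest()` would give the bytes of each word in the opposite order.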

Now, let's create some utility functions:
ALPHABET = '0123456789abcdfghjkmnpqrstuvwxyz'

def sqlid_2_hash(sqlid):
  # Decode the base-32 sql_id; the hash value is its low 32 bits
  n = 0
  for ch in sqlid:
    n = n * 32 + ALPHABET.index(ch)
  return n % (2 ** 32)

def stmt_2_sqlid(stmt):
  # Assumes the statement text is ASCII/UTF-8
  h = hashlib.md5(stmt.encode('utf-8') + b'\x00').digest()
  (d1, d2, msb, lsb) = struct.unpack('<IIII', h)
  sqln = msb * (2 ** 32) + lsb
  sqlid = ''
  # A sql_id is always 13 characters, so emit every digit, leading zeros included
  for i in range(13):
    sqlid = ALPHABET[(sqln // (32 ** i)) % 32] + sqlid
  return sqlid

def stmt_2_hash(stmt):
  # The hash value is the last little-endian 32-bit word of the digest
  return struct.unpack('<IIII', hashlib.md5(stmt.encode('utf-8') + b'\x00').digest())[3]

Let's try them...
>>> stmt_2_hash(stmt)
3823597405
>>> stmt_2_sqlid(stmt)
'29schpgjyfxux'
>>> sqlid_2_hash(stmt_2_sqlid(stmt))
3823597405

Well, it all works. Now, to the real programming…