哈希

问题:动态查找

思想:通过计算的方法查找元素

应用:字典问题

  • 符号表\(:=\set{<\text{name, attribute}>}\)
  • 用于比较字典中的字符串

ADT:A set of name-attribute pairs, where the names are unique

操作:

  • Sym Tab Create(TableSize)
  • Boolean Isln(symtab, name)
  • Attribute Find(symtab, name)
  • Sym TabInsert(symtab,name, attr)
  • Sym TabDelete(symtab,name)

概念:

image.png|300

  • 哈希函数:f ( x ) = position of x in ht[ ] (i.e. the index of the bucket that contains x )
  • T := total number of distinct possible values for x
  • n := total number of identifiers in ht[ ]
  • identifier density := \(n/T\)
  • loading density \(\lambda\) := \(n/(sb)\)

问题:

  • 哈希碰撞:A collision occurs when we hash two nonidentical identifiers into the same bucket
    • 解决:设定哈希缓冲区
  • 溢出:An overflow occurs when we hash a new identifier into a full bucket.

当不存在溢出和碰撞时,\(T_{search}=T_{insert}=T_{delete}=O(1)\)

哈希函数设计:

  • 整数:
    • 求余法:\(f(x)=x\%m\) 其中\(m\)为素数
    • 平方取中:将\(x\)平方,取中间的位数(中间位数受各数字影响较大)
    • 折叠法:将数字分块,每部分按正反顺序排列后相加
    • 数字分析法
  • 字符串:
    • \(f(x)=(\sum x_i)\%\)TableSize 问题:利用率低
    • \(f(x)=(x_0+x_1\times27+x_2\times27^2)\%\)TableSize 问题:统计结果表明前三位组合少
    • \(f(x)=(\sum x_{N-i-1}\times32^i)\%\)TableSize
      • 使用32的幂可用位运算加速运算
      • 实现:
        Index Hash3( const char *x, int TableSize )
        {
            unsigned  int  HashVal = 0;
        /* 1*/  while( *x != '\0' )
        /* 2*/       HashVal = ( HashVal << 5 ) + *x++;
        /* 3*/  return HashVal % TableSize;
        }
        

        标准哈希函数:Probability\((f(x)=i)=1/b\)

链地址法:使用链表存储哈希值相同的元素

struct  ListNode;
typedef  struct  ListNode  *Position;
struct  HashTbl;
typedef  struct  HashTbl  *HashTable;
struct  ListNode {
    ElementType  Element;
    Position  Next;
};
typedef  Position  List;
/* List *TheList will be an array of lists, allocated later */
/* The lists use headers (for simplicity), */
/* though this wastes space */
struct  HashTbl {
    int  TableSize;
    List  *TheLists;
};
初始化:
HashTable  InitializeTable( int TableSize )
{   HashTable  H;
    int  i;
    if ( TableSize < MinTableSize ) {
        Error( "Table size too small" );  return NULL;  
    }
    H = malloc( sizeof( struct HashTbl ) );  /* Allocate table */
    if ( H == NULL )    FatalError( "Out of space!!!" );
    H->TableSize = NextPrime( TableSize );  /* Better be prime */
    H->TheLists = malloc( sizeof( List ) * H->TableSize );  /*Array of lists*/
    if ( H->TheLists == NULL )   FatalError( "Out of space!!!" );
    for( i = 0; i < H->TableSize; i++ ) {   /* Allocate list headers */
    H->TheLists[ i ] = malloc( sizeof( struct ListNode ) ); /* Slow! */
    if ( H->TheLists[ i ] == NULL )  FatalError( "Out of space!!!" );
    else    H->TheLists[ i ]->Next = NULL;
    }
    return  H;
}
查找:
Position  Find ( ElementType Key, HashTable H )
{
    Position P;
    List L;
    L = H->TheLists[ Hash( Key, H->TableSize ) ];
    P = L->Next;
    while( P != NULL && P->Element != Key )  /* Probably need strcmp */
    P = P->Next;
    return P;
}
插入:
void  Insert ( ElementType Key, HashTable H )
{
    Position   Pos, NewCell;
    List  L;
    Pos = Find( Key, H );
    if ( Pos == NULL ) {   /* Key is not found, then insert */
    NewCell = malloc( sizeof( struct ListNode ) );
    if ( NewCell == NULL )     FatalError( "Out of space!!!" );
    else {
         L = H->TheLists[ Hash( Key, H->TableSize ) ];
         NewCell->Next = L->Next;
         NewCell->Element = Key; /* Probably need strcpy! */
         L->Next = NewCell;
    }
  }
}

开放地址法:

Algorithm: insert key into an array of hash table
{
    index = hash(key);
    initialize i = 0 ------ the counter of probing;
    while ( collision at index ) {
    index = ( hash(key) + f(i) ) % TableSize;
    if ( table is full )    break;
    else    i ++;
    }
    if ( table is full )
    ERROR (No space left);
    else
    insert key at index;
}
删除处理:打标记

平均成功查找次数:枚举,求平均(每个元素都去找一遍,求平均)

平均失败查找次数:

  • 分类计数:以哈希值为分类依据,查找次数——查找多少次能断定该元素不存在(例:若哈希函数为\(x\%M\),讨论余数为\(0,\cdots,M-1\)每一类需要的次数)

线性探测:\(f(i)=i\)

问题:初始聚集,any key that hashes into the cluster will add to the cluster after several attempts to resolve the collision.

平方探测:\(f(i)=i^2\)

定理:若(1)哈希表元素一半不到 (2)表大小为素数,则任意插入都能找到空位

证明:只要证前\(\lfloor\)表大小/2\(\rfloor\) 次探测位置都不同,用反证法证明

实现:

查找:

Position  Find ( ElementType Key, HashTable H )
{   Position  CurrentPos;
    int  CollisionNum;
    CollisionNum = 0;
    CurrentPos = Hash( Key, H->TableSize );
    while( H->TheCells[ CurrentPos ].Info != Empty &&
    H->TheCells[ CurrentPos ].Element != Key ) {
    CurrentPos += 2 * ++CollisionNum - 1; // (i+1)^2=i^2+(2i+1) 2i+1=2(i+1)-1
    if ( CurrentPos >= H->TableSize )  CurrentPos -= H->TableSize;
    }
    return CurrentPos;
}
插入:
void  Insert ( ElementType Key, HashTable H )
{
    Position  Pos;
    Pos = Find( Key, H );
    if ( H->TheCells[ Pos ].Info != Legitimate ) { /* OK to insert here */
    H->TheCells[ Pos ].Info = Legitimate;
    H->TheCells[ Pos ].Element = Key; /* Probably need strcpy */
    }
}
变种:\(f(i)=\pm i\)

公共避冲区法:开一块空间专门处理冲突

双哈希法:\(f(i)=i+\text{hash}_2(x)\)

取法:\(\text{hash}_2(x)=R-x\%R\)

rehashing:将表扩大一倍,素数扩大一倍后找最近的素数

扩增条件:装载率超过一半