|
HashString
Submitted by |
If you are in a scenario similar to this one:
// Declaration of a hash_map
std::hash_map<std::string, int map;
...
// Insertion of data into hash
map["SNES"] = 10;
map["N64"] = 20;
map["PS"] = 30;
map["PS2"] = 40;
map["XBOX"] = 50;
...
// Reading
int p = map.find("XBOX")-second; |
Look at the last line. This apparently ingenuous line hides a
serious overhead. Having declared the hash_map as <std::string, int>,
when you pass an array of chars to the hash map, it will perform an
implicit conversion from char * to string. So, in that line you are
creating a temporal string, an later copying into it.
The overhead due to the creation of the temporal string, varies
with the std::string implementation. Some implementation will allocate
memory, an memcpy. VC7 implementation of string, doesn't allocate memory
if the size is less than 16 characters. But you have the copy...
This overhead can be avoided with something like this:
static const std::string __XBOX = "XBOX";
int p = map.find(__XBOX)-second; |
This solution doesn't seem to be clean. And you have the
size-overhead of those statics.
A better solution may be using a class like this to avoid the
overhead described:
// HashString is supposed to be used as a key in a HashTable.
// It is used like a normal string but with the added possibility of create it
// as a pointer to an external array of chars without committing copy.
class HashString
{
public:
// Create HashString from a zero terminated array of chars
HashString(const Char *szName): m_sString(szName), m_szPtr(0) {}
// Create HashString from another HashString
HashString(const HashString &hs): m_sString(hs.m_sString), m_szPtr(0)
{
assert(hs.m_szPtr == 0);
}
enum DontCopyEnum
{
DontCopy
};
// Create HashString from a zero terminated array of chars. The creation
// will only perform a copy of the pointer. This constructor must only be
// used for temporal objects.
// \param szName zero terminated array of chars
// \param dontCopy
HashString(const Char *szName, DontCopyEnum dontCopy): m_szPtr(szName)
{
}
// Copy operator
HashString &operator=(const HashString &hs)
{
assert(hs.m_szPtr == 0);
assert(m_szPtr == 0);
m_sString = hs.m_sString;
}
// Less operator
bool operator<(const HashString &hs) const
{
return _tcscmp(GetPtr(), hs.GetPtr()) < 0;
}
// Hash function
// This algorithm (k=33) was first reported by dan bernstein many years ago in
// comp.lang.c. another version of this algorithm (now favored by bernstein) uses
// xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33
// (why it works better than many other constants, prime or not) has never been
// adequately explained.
operator size_t() const
{
UInt32 iHash = 5381;
Int32 c;
for(UInt32 i = 0; i < _tcslen(GetPtr()); i++)
{
c = GetPtr()[i];
iHash = ((iHash << 5) + iHash) + c; // hash * 33 + c
}
return iHash;
}
private:
const Char *GetPtr() const
{
if(m_szPtr) return m_szPtr;
else return m_sString.c_str();
}
String m_sString;
const Char *m_szPtr;
}; |
HashString can be used this way
// Declaration of a hash_map
std::hash_map<HashString, int map;
...
// Insertion of data into hash
map["SNES"] = 10;
map["N64"] = 20;
map["PS"] = 30;
map["PS2"] = 40;
map["XBOX"] = 50;
...
// Reading
int p = map.find(HashString("XBOX", HashString::DontCopy))-second; |
In a single loop that only reads from the hash the speedup is
over 50% using this HashString. Of course this numbers depends on the
string implementation.
So If you have a bottleneck reading hashtables (std::map,
std::hash_map, ... MFCs hashtables) an implementation like this may help
you.
|
The zip file viewer built into the Developer Toolbox made use
of the zlib library, as well as the zlibdll source additions.
|