This section of the archives stores flipcode's complete Developer Toolbox collection, featuring a variety of mini-articles and source code contributions from our readers.

 

  HashString
  Submitted by



If you are in a scenario similar to this one:

	// Declaration of a hash_map
	std::hash_map<std::string, int map;

...

// Insertion of data into hash map["SNES"] = 10; map["N64"] = 20; map["PS"] = 30; map["PS2"] = 40; map["XBOX"] = 50;

...

// Reading int p = map.find("XBOX")-second;

Look at the last line. This apparently ingenuous line hides a serious overhead. Having declared the hash_map as <std::string, int>, when you pass an array of chars to the hash map, it will perform an implicit conversion from char * to string. So, in that line you are creating a temporal string, an later copying into it.

The overhead due to the creation of the temporal string, varies with the std::string implementation. Some implementation will allocate memory, an memcpy. VC7 implementation of string, doesn't allocate memory if the size is less than 16 characters. But you have the copy...

This overhead can be avoided with something like this:

	static const std::string __XBOX = "XBOX";   
	int p = map.find(__XBOX)-second; 


This solution doesn't seem to be clean. And you have the size-overhead of those statics. A better solution may be using a class like this to avoid the overhead described:

// HashString is supposed to be used as a key in a HashTable. 
// It is used like a normal string but with the added possibility of create it 
// as a pointer to an external array of chars without committing copy.

class HashString
{
public:

// Create HashString from a zero terminated array of chars HashString(const Char *szName): m_sString(szName), m_szPtr(0) {}

// Create HashString from another HashString HashString(const HashString &hs): m_sString(hs.m_sString), m_szPtr(0) { assert(hs.m_szPtr == 0); }

enum DontCopyEnum { DontCopy };

// Create HashString from a zero terminated array of chars. The creation // will only perform a copy of the pointer. This constructor must only be // used for temporal objects. // \param szName zero terminated array of chars // \param dontCopy HashString(const Char *szName, DontCopyEnum dontCopy): m_szPtr(szName) { }

// Copy operator HashString &operator=(const HashString &hs) { assert(hs.m_szPtr == 0); assert(m_szPtr == 0);

m_sString = hs.m_sString; }

// Less operator bool operator<(const HashString &hs) const { return _tcscmp(GetPtr(), hs.GetPtr()) < 0; }

// Hash function // This algorithm (k=33) was first reported by dan bernstein many years ago in // comp.lang.c. another version of this algorithm (now favored by bernstein) uses // xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 // (why it works better than many other constants, prime or not) has never been // adequately explained. operator size_t() const { UInt32 iHash = 5381; Int32 c;

for(UInt32 i = 0; i < _tcslen(GetPtr()); i++) { c = GetPtr()[i]; iHash = ((iHash << 5) + iHash) + c; // hash * 33 + c }

return iHash; }

private:

const Char *GetPtr() const { if(m_szPtr) return m_szPtr; else return m_sString.c_str(); }

String m_sString; const Char *m_szPtr; };

HashString can be used this way

	// Declaration of a hash_map
	std::hash_map<HashString, int map;

...

// Insertion of data into hash map["SNES"] = 10; map["N64"] = 20; map["PS"] = 30; map["PS2"] = 40; map["XBOX"] = 50;

...

// Reading int p = map.find(HashString("XBOX", HashString::DontCopy))-second;



In a single loop that only reads from the hash the speedup is over 50% using this HashString. Of course this numbers depends on the string implementation.

So If you have a bottleneck reading hashtables (std::map, std::hash_map, ... MFCs hashtables) an implementation like this may help you.

The zip file viewer built into the Developer Toolbox made use of the zlib library, as well as the zlibdll source additions.

 

Copyright 1999-2008 (C) FLIPCODE.COM and/or the original content author(s). All rights reserved.
Please read our Terms, Conditions, and Privacy information.