|
Main
Date: 20 Feb 2008 17:23:37
From: dkraft
Subject: hash table size
|
Hi,

How large should the hash table be? The larger the better? Or is 1536 MB wasting hash memory? And is 512 MB enough?

Thanks
Dieter
|
|
|
Date: 20 Feb 2008 11:10:06
From: Guest
Subject: Re: hash table size
|
Basically, the larger the better, provided it all fits into physical memory with no chance of paging out to disk. It's not going to fit into the L2 cache anyway....

Basically, run a simple test... With whatever test suite you have, try it with various sizes. The odds are good you'll see that the larger it is, the better the program performs, with the improvements slowing down as the table size gets larger.

If you have the memory and it's not being used for anything else, then why not use it? You have nothing to lose by doing so, because a larger hash table will never make a program play worse (unless there are bugs in the program, and a few pathological test cases).

Incidentally, this newsgroup is no longer suitable for discussions of computer chess. Too much spam, Sam Sloan, USCF, etc. etc. garbage. Most chess programmers have moved over to:

http://www.talkchess.com/forum/viewforum.php?start=0&f=7&topic_view=flat

(They have several forums, but that's the programmers' forum.)

"dkraft" <[email protected]> wrote in message news:[email protected]...
> Hi,
>
> How large should the hash table be?
> The larger the better?
> Or is 1536 MB wasting hash memory?
> And is 512 MB enough?
>
> Thanks
> Dieter
|
| |
Date: 20 Feb 2008 19:25:26
From: Anders Thulin
Subject: Re: hash table size
|
Guest wrote:
> Basically, the larger the better, provided it all fits into physical memory
> with no chance of paging out to disk. It's not going to fit into the L2
> cache anyway....

I don't quite agree: every hash table entry imposes some administrative overhead that offsets the benefit of having a hash table. If that overhead gets too large, you lose.

Just where the cutoff is depends on how the hash table was implemented. Testing can be used to detect the cutoff, but it will probably take a lot of careful testing to discover.

So: ask the software writers. They should have an idea whether there is a sweet spot or not.

--
Anders Thulin   anders*thulin.name   http://www.anders.thulin.name/
|
| | |
Date: 20 Feb 2008 18:08:17
From: Guest
Subject: Re: hash table size
|
"Anders Thulin" <[email protected] > wrote in message news:G8%[email protected]... > Guest wrote: >> Basically, the larger the better provided it all fits into physical >> memory >> with no chance of paging out to disk. It's not going to fit into the L2 >> cache anyway.... > > I don't quite agree: every hash table entry imposes some administrative > overhead that offsets the benefit of having a hash table. If that > overhead > gets too large, you lose. The only overhead is: HashIndex = Hash % TableSize; That's constant regardless of table size. And most tables are implented one of four ways. 1) Check one entry. 2) Check two entries. 3) Check one hash line, which may contain two or more entries. 4) One small table and one large table. All four cases behave the same regardless of table size. The only additional overhead would be an increase in the number of TLB entries. (The cpu internal entries that convert addresses into physical memory addresses.) But since there are already a lot of them and they aren't going to be able to stay in the L2 cache, every hash check is nearly guaranteed to be a full cost random memory access. So unless your hash table is *very* small (like small enough to fit into L2 or only slightly larger), there is no extra overhead when increasing the size, as long as you stay within physical memory. Now, if you try to clear the hash, then a larger hash will take more time. But that's a one time thing before each search, and most people don't actually clear the hash. Instead other things are done so that old entries are very unlikely to match for a new search. > Just where the cutoff is depends on how the hash table was implemented. > Testing can be used to detect the cut-off, but it will probably take a lot > of careful testing to discover. > > So: ask the software writers. They should have an idea if there > is a sweet spot or not. I am a writer. Not a major program writer, no. But I have written a few chess programs over the years. And this question has come up repeatedly over the years. And the answer is always the same by the major chess program writers. (Hyatt, etc. etc.) People who have done these kinds of tests and always get the same basic answer. As to whether there is a 'sweet spot', that would suggest the most benefit for the amount of memory consumed. Like 75% at 16m, 95% at 32m and 96% at 64m. In that case, the 'sweet spot' would be 95% at 32m. The most 'bang for the buck'. Finding that 'sweet spot' would require testing. Which I did suggest he do. But although going to 64m would cost you an extra 32m for only 1% improvement would make it unatractive, it's still not a bad setting if you have the extra memory to spare. Except for the occasional random 'bad' position that might hurt from a larger hash, the majority of time, for normal positions, more memory is better.
|
| | | |
Date: 21 Feb 2008 16:49:59
From: Anders Thulin
Subject: Re: hash table size
|
Guest wrote:
> The only overhead is:
>
> HashIndex = Hash % TableSize;

and later:

> All four cases behave the same regardless of table size.

No, there's more to it. You show it yourself when you say:

> 1) Check one entry.
> 2) Check two entries.

no further overhead here, true, except that the second may increase code space, and so cause other kinds of overhead. If it's implemented as a 1..2 loop, there is obviously counting overhead.

> 3) Check one hash line, which may contain two or more entries.

and here you have (I presume) a bucket search, which may be more or less complex depending on how you implement it. I don't see how you can claim no overhead in this situation.

> 4) One small table and one large table.

But why stop there? The table is not static -- you don't only do searches. You also have to insert new stuff, and weed out old. In the single-entry case it's easy, but in a bucket hash it tends to get hairy: a fast search usually means a slow insert.

Now all that adds up: code has to execute to search the table, code has to execute to change the table. As long as that code executes faster than the code the hash replaces, all is fine. But does it? If the hash table is poorly implemented, you may turn out to have a bucket hash with linear searches of the buckets, and one that resizes by increasing bucket chain length only. I've seen those in actual use -- and turned out by professional programmers, too.

It doesn't matter one whit what theory says if the programmer didn't do a good job implementing it.

In this particular case, we don't have a single clue about what code base the question applies to. I assume it applies to a code base, of course -- if the question is purely theoretical, one of the standard books on search algorithms will probably be the best source for a reply.

--
Anders Thulin   anders*thulin.name   http://www.anders.thulin.name/
|
| | | | |
Date: 21 Feb 2008 21:08:15
From: David Richerby
Subject: Re: hash table size
|
Anders Thulin <[email protected]> wrote:
> Guest wrote:
>> The only overhead is:
>>
>> HashIndex = Hash % TableSize;
>
> and later:
>
>> All four cases behave the same regardless of table size.
>
> No, there's more to it. You show it yourself when you say:
>
>> 1) Check one entry.
>> 2) Check two entries.
>
> no further overhead here, true, except that the second may increase
> code space, and so cause other kinds of overhead. If it's
> implemented as a 1..2 loop, there is obviously counting overhead.

Yes, but that's insignificant and independent of table size.

>> 3) Check one hash line, which may contain two or more entries.
>
> and here you have (I presume) a bucket search, which may be more or
> less complex depending on how you implement it. I don't see how you
> can claim no overhead in this situation.

Because the buckets are of constant size, independent of the size of the hash table.

>> 4) One small table and one large table.
>
> But why stop there? The table is not static -- you don't only
> do searches. You also have to insert new stuff, and weed out old.

Insertion overwrites old data; it's not deleted `manually'. The hash table is operating as a cache, not a dictionary: if the data you want isn't in its `primary' location, you look in a constant number of other places for it and then give up, deleting the stalest or shallowest data you have and replacing it with the new evaluation.

> Now all that adds up: code has to execute to search the table,
> code has to execute to change the table.

Yes, but the speed of this code is completely independent of the size of the hash table.

> If the hash table is poorly implemented

... then you've already lost. Any code might have been badly written and behave stupidly as a result. Yes, it's possible that the hash table was written by an idiot but, if it was, the engine will probably be weak for other reasons, too.

In a well-written engine, the optimal hash table size really is `as big as you can have it without swapping.'


Dave.

--
David Richerby                       Disgusting Atlas (TM): it's like
www.chiark.greenend.org.uk/~davidr/  a map of the world but it'll turn your stomach!
|
| | | | | |
Date: 22 Feb 2008 02:18:37
From: Andy Walker
Subject: Re: hash table size
|
In article <l5A*[email protected]>,
David Richerby <[email protected]> wrote:
> In a well-written engine, the optimal hash table size really is `as
> big as you can have it without swapping.'

Whoa, it's not *quite* as simple as that.

(a) Even in a well-written engine, you may want/need to "clear out" the table from time to time. Initialising several terabytes may take a little while even on a modern machine ....

(b) Some of the standard operations on a hash table involve indexing and finding remainders modulo the table size. These may be more efficient if the table size is a power of 2, and if the relevant numbers fit into [in C terms] ints rather than longs.

(c) If you manage to lock down all, or most, of the storage of a machine, you may perhaps be unpopular, in a multi-user environment, with other users of the machine. *Your* program may not be swapping, but *theirs* are! [Even on a home PC, the other users may be *you*, trying to run other things, such as the engine that this program is playing against.]

--
Andy Walker
Nottingham
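In C terms, the indexing difference in (b) amounts to the following (illustrative names only, not from any particular engine):

    #include <stdint.h>
    #include <stddef.h>

    /* Power-of-two table size: index with a mask (tableSize == 1 << k). */
    size_t IndexPow2(uint64_t hash, size_t tableSize) { return (size_t)(hash & (tableSize - 1)); }

    /* Arbitrary table size: index with a modulo, i.e. a divide. */
    size_t IndexAny(uint64_t hash, size_t tableSize)  { return (size_t)(hash % tableSize); }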
|
| | | | | | |
Date: 21 Feb 2008 21:24:56
From: Guest
Subject: Re: hash table size
|
"Andy Walker" <[email protected] > wrote in message news:[email protected]... > In article <l5A*[email protected]>, > David Richerby <[email protected]> wrote: >>In a well-written engine, the optimal hash table size really is `as >>big as you can have it without swapping.' > > Whoa, it's not *quite* as simple as that. > > (a) Even in a well-written engine, you may want/need to "clear > out" the table from time to time. Initialising several terabytes > may take a little while even on a modern machine .... Actually, current programs don't do that. They leave the hash alone. They don't even initialize it when the program is first run. Either they actually use the data from the previous search, or they invalidate it at the time of access... There are several ways to invalidate at access... 1) Change all the hash keys so you get a new random value. This isn't a prefered choice because many people use the hash as a way to quickly check for repetition. 2) The hash entry contains some sort of counter as part of the 'lock'. 3) At time of access into the hash, XOR in an additional key that changes for every search. This way old entries from the previous search (or uninitialized hash memory) will very very rarely be considered valid. The odds are so remote that you are more likely to encounter a normal false hash hit during the search. And if you actually check the returned move as valid (not everybody does because it takes time and they feel the odds are acceptible), then the odds are so remote that you are accessing old data as current data that you can totally ignore it. You have a better chance of getting random bit flip errors in the memory or in the cpu. As for terabytes... well, not too many people have access to machines with terabytes of physical memory.... > (b) Some of the standard operations on a hash table involve > indexing and finding remainders modulo the table size. These may > be more efficient if the table size is a power of 2, and if the > relevant numbers fit into [in C terms] ints rather than longs. It is true that doing an AND is faster than a modulo. But the time difference isn't really all that significant considering how long main memory is going to take to access. And the savings can greatly outweigh the few extra cycles a modulo would take. But yes, many people do prefer a power of two size. And as for the 'ints' vs. 'longs'... Sorry. That was true in the days of 16 bit compilers, but these days on 32 bit systems, both int & long will be 32 bits. And on a 32 bit system, 32 bits will be enough to access all of physical memory. (With the exception of certain hacks that Intel did for some servers to increase memory by paging it.) So there will be no need for 64 bit 'long long' which would involve a more complicated modulo. And on 64 bit systems (like the powerful chess programs all use and even many hobbiests use), int's and long's are both going to be 64 bits. (Well, actually that needs to be qualified a bit, since the C99 standard is a little vague on how to handle all 8, 16, 32, 64 and 128 bit data sizes that can be done on a 64 bit system.) And no 64 bit system is going to have more than 2^64 bytes of memory, so there's no need to get into 128 bit 'long long'. > (c) If you manage to lock down all, or most, of the storage of > a machine, you may perhaps be unpopular, in a multi-user environment, Well DUH. However, on a multi-user environment you are likely to have memory limitations imposed so you can't use more than your fair share. 
In fact, you may only be given a couple meg of physical memory and all the rest of your data will be paged in & out as needed. Definetly not a good thing for large hash tables. If you are on a system without that restriction, then it's probably not a problem for other reasons, such as verbal agreements between users, etc. etc. And if you are on a system without that restriction and you don't have agreements etc. and you do it anyway and cause problems... (shrug) Tough. That's your problem when the sysadmin catches you. > with other users of the machine. *Your* program may not be swapping, > but *theirs* are! [Even on a home PC, the other users may be *you*, > trying to run other things, such as the engine that this program is > playing against.] No. If you are runing a chess program for a reason (such as a tournament), then you know not to use it for other things. If you intend to use it for other things, then just don't tell it to use all your available memory for the hash. (Although your other programs will reduce the cpu power available to the chess program, which may cost it a full ply of search.) And if you do want to run some other program and a chess program with massive hash tables at the same time.... (shrug) Get a second computer. It's not like they are expensive. A second computer can be bought for $400 that's more than good enough for basic usage, web surfing, etc. For just a few dollars more, you could get a low end laptop instead.
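Method 3 comes down to a couple of XORs around the normal probe/store, something like this (a sketch only, untested, with made-up names; real entries also pack in a move, bound type, etc.):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint64_t lock;      /* position key XORed with the current search key */
        int16_t  depth;
        int16_t  score;
    } HashEntry;

    static HashEntry *HashTable;
    static size_t     TableSize;
    static uint64_t   SearchKey;   /* set to a fresh random value before each search */

    void StoreHash(uint64_t posKey, HashEntry e)
    {
        e.lock = posKey ^ SearchKey;               /* stamp with this search's key */
        HashTable[posKey % TableSize] = e;
    }

    HashEntry *ProbeHash(uint64_t posKey)
    {
        HashEntry *e = &HashTable[posKey % TableSize];
        /* entries written during an earlier search (different SearchKey), or
           uninitialized memory, will almost never pass this test */
        return (e->lock == (posKey ^ SearchKey)) ? e : NULL;
    }

The extra cost is one 64 bit XOR per access, which disappears next to the main-memory latency of the probe itself.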
|
| | | | | | | |
Date: 24 Feb 2008 01:23:09
From: Andy Walker
Subject: Re: hash table size
|
In article <[email protected]>, Guest <[email protected]> wrote:
>> (a) Even in a well-written engine, you may want/need to "clear
>> out" the table from time to time. Initialising several terabytes
>> may take a little while even on a modern machine ....
> Actually, current programs don't do that. They leave the hash alone. They
> don't even initialize it when the program is first run.

I don't have access to the source code of the commercial programs, so you seem to have an advantage in this respect. But:

> Either they actually use the data from the previous search, or they
> invalidate it at the time of access...
> There are several ways to invalidate at access... [...]

OK. But they all involve trade-offs, including extra computation and/or extra data stored in the table; which way the trade-offs balance would seem to depend on depth of search, expected number of reads vs writes, relative speed of memory, etc., and it would be at least a slight surprise if they balanced the same way for all engines on all hardware for all time.

>> (b) Some of the standard operations on a hash table involve
>> indexing and finding remainders modulo the table size. [...]
> It is true that doing an AND is faster than a modulo. But the time
> difference isn't really all that significant considering how long main
> memory is going to take to access. [...]

OK, but DR's contention was that swapping was the *only* reason not to have the table as large as possible. Presumably, it is very likely that a table with [eg] 2^25 entries is better than one with 2^25 + 1?

> And as for 'ints' vs. 'longs'... Sorry. That was true in the days of 16
> bit compilers, but these days on 32 bit systems, both int & long will be
> 32 bits. And on a 32 bit system, 32 bits will be enough to access all of
> physical memory. [...]

I was recently using a system with 32-bit ints and 64-bit longs and pointers; I didn't think it particularly unusual. But perhaps it was.

>> (c) If you manage to lock down all, or most, of the storage of
>> a machine, you may perhaps be unpopular, in a multi-user environment,
> Well, DUH.
> However, in a multi-user environment you are likely to have memory
> limitations imposed so you can't use more than your fair share. In fact,
> you may only be given a couple of megs of physical memory, and all the rest
> of your data will be paged in & out as needed. Definitely not a good thing
> for large hash tables.

I don't think I've used a system like *that* since the days when 2M of physical memory was more than "elephantine"! ...

> [...]
> And if you are on a system without that restriction and you don't have
> agreements etc. and you do it anyway and cause problems... (shrug) Tough.
> That's your problem when the sysadmin catches you.

... And ever since those days I've *been* the sysadmin, at least for the machine on my desk. But not really the point:

> If you are running a chess program for a reason (such as a tournament), then
> you know not to use it for other things.

OK.

> If you intend to use it for other things, then just don't tell it to use all
> your available memory for the hash. (Although your other programs will
> reduce the cpu power available to the chess program, which may cost it a
> full ply of search.)

Then we're back to contention with DR's claim that larger is better until swapping occurs. Personally, I've just acquired my first "Vista" machine [no flowers, by request], with its annoying habits of firing off all sorts of stuff to and from M$ without any notice, suddenly deciding to re-arrange its discs, telling me at awkward moments to insert such-and-such disc to do dumping, and so on. But even my Unix box is liable to receive e-mail, or to start "cron" jobs. None of these things seem likely to occupy that much CPU, but they do load up some surprisingly large programs.

> And if you do want to run some other program and a chess program with
> massive hash tables at the same time.... (shrug) Get a second computer.
> It's not like they are expensive. A second computer can be bought for $400
> that's more than good enough for basic usage, web surfing, etc. For just a
> few dollars more, you could get a low-end laptop instead.

Perhaps that's a left- vs right-pondian viewpoint. The machines you can get for that sort of price over here really aren't "good enough" at all. If they were merely last year's low-end models reduced to clear, that might be different, but we only seem to get a teensy discount for that. Instead, the very cheap ones are totally inadequate. YMMV.

As it happens, Prof Kraft [the OP] really gave us no idea what hardware or software he was asking about, if any [eg, perhaps he is writing his own program, or perhaps he needs background info for an academic paper], so all bets are off. Your initial advice ["Basically, run a simple test..."] to him is sound, of course. My only concern was that one should not be quite so confident that "bigger is better" in all [non-swapping] cases.

--
Andy Walker
Nottingham
|
| | | | | | | | |
Date: 24 Feb 2008 12:22:31
From: Guest
Subject: Re: hash table size
|
"Andy Walker" <[email protected] > wrote in message news:[email protected]... > In article <[email protected]>, Guest <[email protected]> > wrote: >>> (a) Even in a well-written engine, you may want/need to "clear >>> out" the table from time to time. Initialising several terabytes >>> may take a little while even on a modern machine .... >>Actually, current programs don't do that. They leave the hash alone. >>They >>don't even initialize it when the program is first run. > > I don't have access to the source code of the commercial > programs, so you seem to have an advantage in this respect. But: I don't really either. But the methods have been widely discussed for years. (Well, I do have the same kind of access that everybody else has. The early Fruit & Toga, Strelka (the reverse enginered Rybka), Crafty, and so on. However I generally make a point not to look at them. I much prefer people talking about chess programming ideas and what they do rather than actually looking at the code itself.) Sure, you can clear out the memory. That'll work, and for most memory sizes it wont take an excessive amount of time. Or you can leave it in and use the results for the next search. That's been done by programs too. The current methods of invalidating at time of access have really only become used in the last few years, when processors got so much faster than memories. In the older days, probably right up to the P3 days, processor speeds were closer to main memory, and memory sizes were small enough that directly clearing it wasn't a problem. I don't think even the mainframe programs worried about it. (Although I can't say for sure, since there weren't too many mainframe chess programs left after the mid-80s.) Right off hand, I don't know when the 'invalidate at access' ideas actually came about. >>Either they actually use the data from the previous search, or they >>invalidate it at the time of access... >>There are several ways to invalidate at access... [...] > > OK. But they all involve trade-offs, inc extra computation > and/or extra data stored in the table; which way the trade-offs Just an extra 64 bit XOR. Just a couple cycles per hash check. And considering you'll probably be spending 200 to 400 cycles waiting for a random access to main memory, it's neglible. Some people prefer to use a few extra bits in the hash entry to hold a search ID number. It depends on how much info you try to fit into a hash entry and what cache line size you expect. Since you don't want a hash entry (or group of entries) to span more than one cache line, you may have a few extra bits to spare for a unique search ID. > balance would seem to depend on depth of search, expected number > of reads vs writes, relative speed of memory, etc., and it would > be at least a slight surprise if they balanced the same way for > all engines on all hardware for all time. That is true. There are lots of ways to do things in a chess program. And as I said above, it's only been relatively recent that cpu's became so much faster than main memory, so in the 'old days' it just didn't matter. >>> (b) Some of the standard operations on a hash table involve >>> indexing and finding remainders modulo the table size. [...] > >>It is true that doing an AND is faster than a modulo. But the time >>difference isn't really all that significant considering how long main >>memory is going to take to access. [...] > > OK, but DR's contention was that swapping was the *only* > reason not to have the table as large as possible. 
Presumably, It is the main reason. But not the only one. > it is very likely that a table with [eg] 2^25 entries is better > than one with 2^25 + 1? Not really any great reason for it to be. There would only be a few reasons it might. 1) The extra size of the hash table. Not too likely one more entry is going to matter! Even doubling a large cache is only going to help a few ratings points once you get past a reasonable size. (I don't have any current numbers for that.) 2) The odd size of the hash table might spread out the entries more or less uniformly. Some truth to that, but for chess hashes, which use random Zobrist keys, it's not really a major issue. Probably wouldn't hurt, though. 3) It being easier to make a hash index out of the 64 bit hash value. Working with powers of two (or +1 or -1 of 2^x) is certainly more convenient. But not really a major problem. True, a 64 bit divide is likely to take around 64 cycles. Some cpu's will be faster and some slower. But slow. Some may even be "dead dog" slow. But you (or the compiler) can probably schedule things so you can do some other stuff while waiting for the divide to finish. If you are on a system where you know it has a slow divide and you still want an odd size hash, all you have to do is multiply by the reciprocal. Since you'll be working with one size hash for the whole program run, it'll be quick & easy to set up. And since we are doing this for an index rather than a math answer, you wont even need to do a 'fix-up' to get the exact right answer. It being off by one will be tolerable since it will be consistantly off by one for that index. (But if you want to do the fix-up, it'd take only another few cycles. Still much faster than a divide.) Do any current programs do it that way.... (shrug) I have no idea. I haven't heard anybody actually say. But it's a simple tweak so I would certainly assume at least some do if they want to use odd size hashes. If you have 4g of memory, the OS and the program is going to use some, so you'd be left with about 1.5g of unused memory if you used only powers of two.. Or they may just go ahead and do a power of two but do a multi-bucket hash entry. They may be able to fit 3 entries per cache line, so their cache size would actually be 3*2^x. There's a lot of room for doing things different ways. It depends a lot on personal preference and what you know about your target hardware. >>And as for the 'ints' vs. 'longs'... Sorry. That was true in the days of >>16 >>bit compilers, but these days on 32 bit systems, both int & long will be >>32 >>bits. And on a 32 bit system, 32 bits will be enough to access all of >>physical memory. [...] > > I was recently using a system with 32-bit ints and 64-bit > longs and pointers; I didn't think it particularly unusual. But > perhaps it was. I'm not sure which you are meaning. 1) An unusual system where the cpu really does work with 32 bit ints but extended pointers. Well... there are quite a few non-x86 systems around. And not all of them do things the same way. Admittedly I am more familiar with regular x86 style consumer processors, rather than RISC, mainframes, workstations, etc. And the C89 & C99 standards provide quite a bit of flexibility in how big things can be. You can indeed have a 32 bit system with a 64 bit address space. The C standard doesn't really care what the system requires or even what the compiler writer chooses to do (61 bit integers and 47 bit pointers on an 8 bit micro, for example.) 
2) Simply an extended 32 bit cpu, where the pointers have been hacked in the cpu to be longer. Somewhat like the segment registers of the old 8088 or the stuff Intel put into its servers before they were ready to release 64 bit cpus. I haven't used those Intel servers, but I don't think you can access the extra memory directly. You have to go through some hoops to acess the extra memory. Would the cost be excessive for that increased hash size... Don't know. That'll depend on the chess program & the cpu. However, the extra cost for each access is not going to be large. It'd probably be swamped by the slow memory access times. If main memory often takes 300 cycles for a random access, then what is an extra 4-10 cycles? Your program is still at a dead stop waiting for the hash entry to be returned. 3) A real 64 bit cpu where the compiler just happens to use 32 bits for an int but 64 for longs and pointers. In this case, it doesn't matter. It's still all 64 bit hardware. It's not like a 32 bit cpu working with 64 bit math. The only issues you have to deal with are the unspecified & inconsistant sizes of int & long. The C standard doesn't really care as long as sizeof(short) <= sizeof(int) <= sizeof(long). You could have all of them be 64 bits, if you wanted. It's so hopelessly screwed up that the C99 standard gave up and created brand new data types that you should use whenever size is important. I wouldn't be surprised if the next C standard actually started depricating the use of 'int'. So it is certainly possible you have a 64 bit system where the C compiler does 32 bit ints and 64 bit longs. That's just the compiler. The hardware is still in 64 bit mode, so it's not an issue. You will likely have issues if you switch compilers, though. That's why it's better to use the new data types that C99 specified. I generally never use 'int' for anything except general vars, loop indexes, etc. where there is never any possibility that it would even aproach 32 bits. (Yeah, I am kind of gambling that my programs will never be ported to a system that has less than 32 bit integers. Which is probably safe....) But I guess I shouldn't have done a blanket statement like that. (grin) I guess I sit corrected.... Now, having said all of that.... Hashes are going to be at least 64 bits. Nothing smaller is going to be acceptable. 32 bits stopped being tolerable back in the 70s or early 80s, I guess. 48 bits by the early 80s. At 64 bits, the false hash hit rate on current processors is low enough that it's not a major concern (provided the occasional false hash hit doesn't crash your engine. That's why some authors always verify the suggested move from the hash.) So you aren't really going to be getting into 'int' vs. 'long' anyway. About the only issue would be whether or not your cpu has 64 bit modulo or not. If it doesn't, then a power of two might be a better choice else you might have to do a slower 'software' based modulo. You can do a full cache line of hash buckets per entry. That way you get the advantage of using more memory than a simple power of two while at the same time keeping the hash- >index conversion quick & simple. If you have terabytes of memory, then you probably have a 64 bit cpu, with a full 64 bit ALU with a full 64 bit divide (and mul) instruction. >>> (c) If you manage to lock down all, or most, of the storage of >>> a machine, you may perhaps be unpopular, in a multi-user environment, >>Well DUH. 
>>However, on a multi-user environment you are likely to have memory >>limitations imposed so you can't use more than your fair share. In fact, >>you may only be given a couple meg of physical memory and all the rest of >>your data will be paged in & out as needed. Definetly not a good thing >>for >>large hash tables. > > I don't think I've used a system like *that* since the days when > 2M of physical memory was more than "elephantine"! ... Well, a couple meg of memory was just an example. But no multi-user environment today is going to let one user hog all the resources of the system. I don't think even the most primative multi-user systems in use today lets a user get away with that. They wouldn't be much of an OS if they did. If you are on a 1g system with 8 other people, no reasonable multi-user OS is going to let one user have 900meg of that and starve the rest of the users. Not without elevating their privileges. The amount of physical memory each user gets is likely to be somewhat variable, changing depending on the load etc. But the user is likely to only be 'guarnateed' a small amount of memory. Everything else will be paged to disk as needed and you can't really count on any particular amount of memory for a hash. So it wouldn't be a good idea for a user to even try to do a big hash. It just wouldn't work well, the program would run slowly and it'd play terrible because stuff would be paging to & from disk. >>[...] >>And if you are on a system without that restriction and you don't have >>agreements etc. and you do it anyway and cause problems... (shrug) Tough. >>That's your problem when the sysadmin catches you. > > ... And ever since those days I've *been* the sysadmin, at > least for the machine on my desk. But not really the point: >>If you are runing a chess program for a reason (such as a tournament), >>then >>you know not to use it for other things. > > OK. > >>If you intend to use it for other things, then just don't tell it to use >>all >>your available memory for the hash. (Although your other programs will >>reduce the cpu power available to the chess program, which may cost it a >>full ply of search.) > > Then we're back to contention with DR's claim that the larger > the better until swapping occurs. Personally, I've just acquired my > first "Vista" machine [no flowers, by request], with its annoying I got my first Vista system (a laptop) a few months ago.... Can't say I like it. > habits of firing off all sorts of stuff to and from M$ without any > notice, suddenly deciding to re-arrange its discs, telling me at You can turn off the background defrag. Along with much of the other background & scheduled stuff. > awkward moments to insert such-and-such disc to do dumping, and so Never had Vista ask me for a disk. > on. But even my Unix box is liable to receive e-mail, or to start > "cron" jobs. None of these things seem likely to occupy that much > CPU, but they do load up some surprisingly large programs. Right. On a highly variable system like that.... well, you are screwed, basically. That's just the nature of having a system that isn't suitable for running a processor intensive, memory hungry program. About the best you can do is only use a few meg and hope that'll be enough and that the OS wont page out even that small amount. Or you live with the variability. But on a system that is fairly stable. With no sudden background task etc. (like my XP box, where very little happens without me doing it), you can get pretty consistant memory & processor availability. 
>>And if you do want to run some other program and a chess program with >>massive hash tables at the same time.... (shrug) Get a second computer. >>It's not like they are expensive. A second computer can be bought for >>$400 >>that's more than good enough for basic usage, web surfing, etc. For just >>a >>few dollars more, you could get a low end laptop instead. > > Perhaps that's a left- vs right-pondian viewpoint. The machines > you can get for that sort of price over here really aren't "good enough" I don't know where you are. I'm in the U.S. A low end system with a 1.8ghz AMD x2, 1g and 250g drive would be around $400. A little shopping around would probably improve that somewhat. Not a massive cost. Eating out at a fast food resturant 15 times. Eating at a regular resturant 10 times. 10-15 tanks of gas for the car. Most people (and even families) can blow that much money in a month or two and never even notice it. The US, Canada, & UK should have roughly comparable average household incomes. > at all. If they were merely last year's low-end models reduced to clear, > that might be different, but we only seem to get a teensy discount for > that. Instead, the very cheap ones are totally inadequate. YMMV. Depends on what you are wanting to do. The 'average' person is going to surf the web, check email, do some instant chatting, watch a few youtube videos, etc. A more advanced person might do a little programming or video or photo editing, or a game or two. No serious number crunching or heavy 3d game playing. (Yes, they may like heavy 3d games, but they probably aren't going to be playing one in the short time the main system is occupied.) For that, most low end systems *are* good enough to be a secondary computer. That's why they are a secondary system, rather than a second main system. Most low end systems have more computing power than many of the more expensive systems of just a couple years ago. We just expect more from them and the OS (Vista) takes up more of the power. I have two computers at my desk. My laptop and a desktop. Before I got the laptop, I had two desktop systems hooked up to a KVM switch. My main system and a slower old secondary system. The second one usually didn't get used unless I was doing something on the main one that I didn't want to interrupt, or I was wanting to do a long term program that would run several days and I didn't want to tie up the main system. (As a side note, I've read that a lot of chess programmers actually use a small collection of old or cheap computers to do their chess program tuning test games. They can get a couple cheap low end computers and do more test games that way than if they spent the same amount on a high end system.) > As it happens, Prof Kraft [the OP] really gave us no idea what I have no idea who the author was. I guess you recognised him, though. > hardware or software he was asking about, if any [eg, perhaps he is Since he was only talking about using up to 1.5g of hash memory, it obviously was either a small desktop system or a laptop. A high end desktop would have had more memory. Perhaps up to 4g. A server would have had more. A mainframe or workstation would have had more. So it probably was just a plain desktop or laptop. > writing his own program, or perhaps he needs background info for an > academic paper] so all bets are off. Your initial advice ["Basically, If he's needing background info for a paper, then he's in the wrong newsgroup! I don't think he wants to hear much about Sam Sloan etc. etc... 
He should head over to TalkChess.com That's where the chess programmers hang out these days. > run a simple test..."] to him is sound, of course. My only concern > was that one should not be quite so confident that "bigger is better" > in all [non-swapping] cases. Maybe a blanket statement is a little unwise. Just because of the few situations and systems and such where issues might arise. But it's not too far off though. Close enough that you could safely say something like: "Using as much memory as what's available without paging is nearly always a good idea." Just the addition of the word 'nearly' covers the few cases where it wouldn't. > > -- > Andy Walker > Nottingham
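For what it's worth, the divide can also be avoided for an odd table size without an explicit reciprocal: take the high 64 bits of hash * TableSize ("multiply-and-shift" range reduction). That is not the same index the modulo would give, but for random Zobrist keys it is spread just as uniformly. A sketch only (using the GCC/Clang 128-bit type), not taken from any real engine:

    #include <stdint.h>

    /* Map a 64-bit hash onto 0 .. tableSize-1 with one multiply:
       the high half of the 128-bit product hash * tableSize. */
    static inline uint64_t IndexAnySize(uint64_t hash, uint64_t tableSize)
    {
        return (uint64_t)(((unsigned __int128)hash * tableSize) >> 64);
    }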
|
| | | | | |
Date: 21 Feb 2008 15:36:15
From: Guest
Subject: Re: hash table size
|
"David Richerby" <[email protected] > wrote in message news:l5A*[email protected]... > Anders Thulin <[email protected]> wrote: >>> 4) One small table and one large table. >> >> But why stop there? The table is not static -- you don't only >> do searches. You also have to insert new stuff, and weed out old. > > Insertion overwrites old data; it's not deleted `manually'. The hash > table is operating as a cache, not a dictionary: if the data you want You pointing out that a chess hash table is being done as a cache rather than a dictionary, and him saying that a table isn't static, actually explains a few things about his message. I bet he's thinking along the line of a a linked list or tree or b-tree or some sort of data structure that gets updated dynamically. Maybe even a structure who's size and organization can change during the search. Stuff gets inserted, removed, re-organized, searched, etc. Data structures and algorithms that are simply not used in a chess hash table. Anders, a chess hash table is done as a fixed sized table. It's a straight array with no other structure. It's not a linked list or a tree or a b-tree or any other 'fancy' data structure. Just a basic linear vector array. To access a hash entry, you do: HashEntry = HashTable [ HashIndex % HashSize]; if (HashEntry.Lock == MAGIC(HashIndex)) return HashEntry; return NULL; To save it, you do: OldEntry = HashTable [ HashIndex % HashSize]; if (OldEntry.Draft >= NewEntry.Draft) HashTable[HashIndex % HashSize] = NewEntry; And so on. If you do a hash entry that actually contains two or more (a fixed number) entries, then the code will be slightly more complicted, but still fairly simple and a fixed cost regardless of the size of the table. Other modifications to the hash table will change things slightly, but again, they will be of a fixed cost that is independant of the size of the table. It didn't occur to me until now that you might be thinking of too high a level of data structures. Since this is a computer chess newsgroup, most people in here are already familar with the basic algorithms, and I just assumed you were.
|
| | | | |
Date: 21 Feb 2008 11:57:48
From: Guest
Subject: Re: hash table size
|
"Anders Thulin" <[email protected] > wrote in message news:[email protected]... > Guest wrote: > >> The only overhead is: >> >> HashIndex = Hash % TableSize; > > and later: > > > All four cases behave the same regardless of table size. > > No, there's more to it. You show it yourself when you say: > >> 1) Check one entry. > > 2) Check two entries. > > no further overhead here, true, except that the second may > increase code space, and so cause other kinds of overhead. > If it's implemented as a 1..2 loop, there is obviously > counting overhead. Say what??? A loop, if you did it that way, would only cost a few cycles. Say 10 cycles. An access to main memory, which is essentially a random access at full cost, will cost at least 200 cycles or even up to 400 cycles. Yeah, main memory is very expensive! Is an extra 10 cycles really significant? And that has *nothing* to do whether you are using 32m, 128m or even 1.5g of memory. That's an implementation detail of how you choose to implement the hash table. It will be a constant and will not be influenced by how big the hash table is. Which is what the original question was.... Which was whether he should go ahead and use 1536m or stay with 512m. >> 3) Check one hash line, which may contain two or more entries. > > and here you have (I presume) a bucket search, which may be more Why a bucket search?? Due to cache line sizes (the only reason to do this kind of hash organization), you'll only have 2 or almost certainly no more than 4 entries. No need for any fancy search when a straight comparison will do fine. Don't go trying to make things more complex than you need to. > or less complex depending on how you implement it. I don't see how > you can claim no overhead in this situation. I'm not claiming no overhead for how the hash table is implemented. I'm claiming there would be no additional overhead (or penalty) if you increase your cache size from 512m to 1.5g, which is what the original question was. It's even what the thread subject says... "hash table size". Not "hash table implementation cost". Go read the original message. I'll save you the trouble and quote it here... *** How large should the hash table size be? The large the better? Or is 1536 MB wasting hash memory? And is 512 MB enough? *** Again, his question wasn't about how to implement it. Whether one method was better than the other or easier or whatever. It was how much memory was enough. Whether using his whole 1.5g available memory was a waste or whether he should cut it back to just 512m. > >> 4) One small table and one large table. > > But why stop there? The table is not static -- you don't only > do searches. You also have to insert new stuff, and weed out old. > In the single entry case, it's easy, but in the bucket hash it > tends to get hairy: a fast search usually means a slow insert. Which will be a constant cost regardless of how big the hash is. Which is what the original question was. > Now all that adds up: code has to execute to search the table, > code hash to execute to change the table. As long as that code > executes faster than the code the hash replaces, all is fine. But > does it? If the hash table is poorly implemented, you may turn You are talking more about bugs etc. than what the original question was about. > out to have a bucket hash with linear searches of the buckets, > and one that resizes by increasing bucket chain length only. > I've seen those in actual use -- and turned out by professional > programmers, too. 
> > It doesn't matter one whit what theory says, if the programmer > didn't do a good job implementing it. And how is that at all relevant to the original question? > In this particular case, we don't have a single clue to what code > base the question applies to. I assume it applies to code base, Assuming his hash table isn't out right broken, and assuming he hasn't gone out of his way to implement it in an idiotic maner, then the cost of the accessing the hash entry in main memory will be larger than all the other mainenance costs. So, as long as his hash table manages to work, and do the job it's designed for, it will be saving move generation costs, reducing the search bounds, which makes the tree smaller which cuts out evals and move gens and q-searches, etc... So it really all comes down to: 1) Asuming his table isn't broken. 2) Asuming his table wasn't deliberately designed to be inefficient at large memory sizes 3) the implementation details will effect the performance of the hash table (ie: hit rate) 4) the cost of actually maintaining the table will be constant (for whatever method he chooses) regardless of table size. 5) then it all comes down to his original question... How about that. Even after going through your totally off subject stuff, it all still comes down to his original question... Whether it was a waste of memory to use his full 1.5g of available memory or just cut it down to 512m. Unless your next message actually talks about the original subject (which is also what the subject header says), I wont be replying to any more of your trolling. If you want to start a brand new thread that talks about implementation ideas and methods and the overheads involved with each, and so on, then I might be willing to read and participate in that discussion. Although actually, it would be much more convenient to do this over at the TalkChess forum. That's where the major chess programmers are. That way you could get the feedback from dozens of chess programmers with 10 or 20 or more years experience. I don't go there every day, but I'm sure there are many others that would be willing to immediately discuss implementation issues. But if you keep trying to change the meaning of the original question in this thread.... bye. > of course -- if the question is purely theoretical, one of the > standard books on search algorithms will probably be the best source for > a reply. > > -- > Anders Thulin anders*thulin.name http://www.anders.thulin.name/
|
| | |
Date: 20 Feb 2008 20:42:13
From: Gian-Carlo Pascutto
Subject: Re: hash table size
|
Anders Thulin wrote:
> I don't quite agree: every hash table entry imposes some
> administrative overhead that offsets the benefit of having a hash
> table.

What would that administrative overhead be?

--
GCP
|
|