Monday, January 31, 2011

Evaluation evaluation

Last time I thought about the evaluation part of the engine. This part is responsible for returning an absolute score for a certain position. Here all the chess specific knowledge is contained. The engine must be able to see whether a certain position is in favor for white or black, so it knows which position it should avoid and which positions it should play towards to.

Easy evaluation terms are material based. The side with more material has an advantage. The bigger the material difference the higher the advantage. But even material becomes difficult when you try to fine tune that.

Example question: What is better, a queen or two rooks ?

The answer depends on the other material on the board, if there are a lot of pawns on the board, the queen might be superior because the rook paths are blocked. If there are only a few paws on the board the rooks gain a lot of strength.

A queen that is not opposed by a bishop pair is stronger than one were the opponent still possess the bishop pair.

So to tune the evaluation is probably the must difficult part and also the part the engine gets its personality from. What focus I give on certain terms will determine the engine play, so it is quite an interesting subject.

For the programming I maintain a header file with all the evaluation terms and scores associated. This is very time consuming to maintain especially if you want to have a nice code formatting.

I started to maintain the evaluation terms in a Excel and wrote a Visual Basic macro that exports the terms into c++ code. This is really helpful.


Monday, January 24, 2011

Including binary chess endgame tables into ice executable

In order the include basic functionality into the new c++ based ice chess engine I moved to basic endgame play. Basic endgame means that the engine should understand when a 3 man endgame is a draw, when it is a win and how to play it correctly. The most accurate way is to make endgame tables for KPK, KRK and KQK available to the engine.

From my former work with endgame table for the old mace engine I calculated all those tables and it was time to use them for ice also.

Because this knowledge is so essential and the engine relies on having it available I decided to include the tables into the binary executable and not to distribute them as separate files along with the engine.

To do that I added a function to my table base generator that generates c++ header files that include the contents of the table.The generated file looked like this.

#ifndef KPK
#define KPK

| Chess Endgame Table Base Data for Endgame Type : KPK        |
| created by table base creator 1.0                           |
| (c) by Thomas Petzke                                        |
| 23.01.2011 18:18                                            |

const int KPK_TBL_SIZE = 79865;
const short KPK_TBL[KPK_TBL_SIZE] = {

I estimated that the executable growths by a few 100k in size when this tables are included. To my surprise the gnu c compiler added almost 2MB to the size of the executable, so much more than necessary for the raw data. Obviously he aligns the array content for faster access so each element takes 8 bytes (instead of the 2 bytes of a short).

As speed is not really a problem here, the array is read only once when the program initializes, so when this takes a few ms longer no one cares, I tried to shrink the size of the executable.

I packed 4 shorts into an 64bit integer, but the size remained the same. Also using pre processor directives did not help. So I minimized the array size by not including illegal positions and easy to recognize draws in it. This helped a bit but the executable is still larger than it should be.

As it is more a cosmetic problem I decided to live with it for the moment. At least it works correctly :-)

White moves and mates in 23 

iCE 0.1 build 511 [2011.1.24]
position fen 8/8/8/8/8/6k1/4P3/2K5 w - - 0 1
go depth 1
info depth 1 seldepth 0 time 0 nodes 7 pv c1c2  nps 6999 score mate 23 hashfull 0 tbhits 7
bestmove c1c2 

Thursday, January 6, 2011

Solving the FINE70 position

So after fixing the last bug (see last post) the engine solves to FINE70 position correctly again. Here is the engine protocol about it for reference

iCE 0.1 build 333 [2011.1.6]
go 26 ply at FINE70 = 8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - -
info depth 1 seldepth 1 time 0 nodes 6 pv a1b1  nps 5999 score cp 97
info depth 2 seldepth 2 time 0 nodes 17 pv a1b1 a7b6  nps 16999 score cp 97
info depth 3 seldepth 3 time 0 nodes 68 pv a1b2 a7b6 b2c3  nps 67999 score cp 99
info depth 4 seldepth 4 time 0 nodes 95 pv a1b2 a7b6 b2c3 b6c7  nps 94999 score cp 99
info depth 5 seldepth 5 time 0 nodes 178 pv a1b2 a7b6 b2c3 b6c7 c3c4  nps 177999 score cp 99
info depth 6 seldepth 6 time 0 nodes 216 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7d7  nps 215999 score cp 99
info depth 7 seldepth 7 time 0 nodes 310 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7d7 c4d3 nps 309999 score cp 99
info depth 8 seldepth 9 time 0 nodes 413 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 b6c7 nps 412999 score cp 99
info depth 9 seldepth 9 time 0 nodes 477 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 b6c7 d3c4  nps 476999 score cp 99
info depth 10 seldepth 11 time 0 nodes 907 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 nps 906999 score cp 99
info depth 11 seldepth 11 time 0 nodes 764 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 nps 763999 score cp 99
info depth 12 seldepth 13 time 0 nodes 1252 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 nps 1251999 score cp 99
info depth 13 seldepth 13 time 0 nodes 1337 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 nps 1336999 score cp 99
info depth 14 seldepth 15 time 16 nodes 1790 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 nps 111874 score cp 99
info depth 15 seldepth 16 time 0 nodes 1771 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 nps 1770999 score cp 99
info depth 16 seldepth 18 time 15 nodes 2284 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 nps 152266 score cp 99
info depth 17 seldepth 18 time 16 nodes 2582 pv a1b2 a7b6 b2c3 b6c7 c3c4 c7b6 c4d3 nps 161374 score cp 99
info depth 18 seldepth 21 time 62 nodes 20832 pv a1b2 a7a8 b2c3 a8b7 c3c4 b7b6 c4d3 nps 336000 score cp 99
info depth 19 seldepth 20 time 0 nodes 2508 pv a1b2 a7a8 b2c3 a8b7 c3c4 b7b6 c4d3 nps 2507999 score cp 99
info depth 20 seldepth 22 time 0 nodes 3554 pv a1b2 a7a8 b2c3 a8b7 c3c4 b7b6 c4d3 nps 3553999 score cp 99
info depth 21 seldepth 24 time 16 nodes 4393 pv a1b2 a7a8 b2c3 a8b7 c3c4 b7b6 c4d3 nps 274562 score cp 99
info depth 22 seldepth 25 time 31 nodes 5572 pv a1b2 a7a8 b2c3 a8b7 c3c4 b7b6 c4d3 nps 179741 score cp 99
info depth 23 seldepth 26 time 0 nodes 6317 pv a1b2 a7a8 b2c3 a8b7 c3c4 b7b6 c4d3 b6c7 nps 6316999 score cp 99
info depth 24 seldepth 28 time 16 nodes 7638 pv a1b2 a7a8 b2c3 a8b7 c3c4 b7b6 c4d3 b6c7 nps 477374 score cp 99
info depth 25 seldepth 30 time 78 nodes 72402 pv a1b1 a7b7 b1c1 b7c8 c1d2 c8d7 d2c3 d7c7 nps 928230 score cp 199
info depth 26 seldepth 31 time 63 nodes 45329 pv a1b1 a7b7 b1c1 b7c8 c1d2 c8d7 d2c3 d7c7 nps 719507 score cp 199
bestmove a1b1

There is always one more bug

Today I ran into an extremely annoying bug. I found it when I was doing a stress test against my transposition table. I do that once in a while to see that recent changes did not break anything.

Usually I run a ply 26 search against the position known as FINE70 (or the Lasker-Reichhelm Position)
8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - -
The only winning moves is Kb1!. All other moves lead to draw. With a correct implemented transposition table the position is solved in about 1 sec or less, without a transposition table or with errors in it it takes much longer or is not solved at all.

The last test revealed that the ice engine was not able to solve it anymore. It ran quite fast through the plys but at ply 26 it showed Kb2 as best move.

So the fun began, trying to find an error in the transposition table. I found several bugs where I used & in a logical expression where it should read && but this did not solve the problem.

The last thing I added was considering piece square values in the evaluation and indeed when I removed that code the position was solved. But the code is really simple, I could not imagine I did anything totally wrong here, I debugged it anyway but to no result. It worked as it should just when it added a small value to the score less than 1/10 of a pawn the position was not solved anymore.
At the end I turned the hash table off and let the engine run without it (it calculated over an hour) and to my surprise the position wasn't solved either. So obviously it was not a hash table problem. So I went to the PVS search, turned off extensions, quiescence search, LMR, zero window search in PVS, everything I could think off. Trying to debug a 26 ply search in a recursive algorithm is really mind boggeling. I wasn't able to spot any error, with every feature alone or all together turned off, the position was still not solved.

I finally started to collect the principle variations at ply 25 to see what happens to the winning move line and to my surprise at ply 25 no principle variation with an exact score that contained the winning move Kb1 was encountered. It seemed that this move and so no line starting with it was existing in the search tree, which was quite odd. But this revealed the most likely source of the error to me. It had to be in the code where the root position is handled. Here for every legal move in the root position a PVS search is performed. (The root node is outside the normal search, because at the root I need more than the score of the position, here I need the best move also).

It showed that I did not run the search for all moves at the root, I skipped the last one, the code was
for (i=0; i < ml->moveCnt -1 ; i++)
It did not show as a problem before because the move list is usually well ordered and the last move will very unlikely turn out as the best. Adding the piece square evaluation to the the score changed the order for the moves in the list, Kb2 was considered better than Kb1 on lower plys (king closer in the center), this changed the order of the moves in the root position and Kb1 was moved to the end for searches from ply 4 onwards. Here it was not considered anymore because of the -1 bug.

So my head hurts from recent head banging but I'm glad I found that nasty little bug. It really bothers me knowing to have a bug somewhere in the code and no be able to find it.

So this one is fixed now, but I already encountered the next one. There seems to be a problem with the zero window search in PVS. There is always one more bug ...

Monday, January 3, 2011

Pros and Cons of MS Visual C++ 2010 Express

Since I was feeling that my Free Pascal based mACE engine executes to slowly I started the iCE engine which is using C++ as programming language. When I came to choose the actual compiler I decided to use MS Visual C++ 2010 Express Edition.

After a few weeks of coding with it here is my very personal opinion about its Pros and Cons

  1. It's free. (I don't want to spend money for a little spare time project)
  2. It has a pleasing ergonomic usage view and feeling. (MS always knew how to make things look nice)
  3. It allows a fast start into programming. Just type a little "hello world" program into the editor and click the run button.
  4. It has an extremely good on the fly syntax checking, called IntelliSense. While you type your code it checks it for syntax correctness. Things like type mismatches, missing brackets, misspelled type names, missing ; or missing includes are spotted and marked (like in MS Word) while you type. This is really impressing and gives you a huge boost in coding performance. When you build your project your code is already pretty good and your are not flooded with countless error messages because of a missing closing bracket somewhere.
  5. It has a powerful integrated debugger, it gives you an auto pane where it automatically displays the variables and values of the just used scope. It is also able to interpret complex statements like (ESquare)((ml->moves[i] & MASK_FROM_SQ) >> SHIFT_FROM_SQ) and it can show you the contents of complex types behind pointers. Free Pascal just showed the memory address of the object which is quite useless and you had to write code to map your objects into local variables as the debugger was not able to display the object directly. 
  6. It supports the INTEL inline assembler style, the syntax is a little different from the one in Free Pascal but basically comparable. And you can start a assembler block with a _asm directive and just write assembler code between { }. When you see the GNU compiler assembler mess with assembler statements being C++ strings in an awkward AT&T style you will see what I mean. I would love to see the architectural decision that made the GNU guys decide to use a programming language from a Telco provider rather than from a processor manufacturer.  (I know the -masm=intel option in GNU, but the assembler syntax is still not the same as in Free Pascal or Visual C++).
  7. In Warning Level 4 it gives you good hints where you might have coded dirty causing you problems later on. It will spot things like if (value = 0), which is in almost all cases an error or usage of not initialized variables (useful when you have a lot of branches in your code and in one of the branches you access the variable without its initialization first).
This is a long list of Pros, longer than I expected, now to the Cons

  1. The Express Edition does not include a code profiler that allows you to find not optimized parts of your code, or just to give you information what code is spend the most time on, so you know where to optimize first. The Professional Edition shall include one (not verified by me) but this edition is not free so for my little non commercial project not affordable.
  2. The generated executable includes dependencies on some dll's so they don't run on other computers unless a redistributable framework is installed on them. There might be a code generation option for a runtime library like "Multithreaded-Debug-DLL" where the dependency  might not exist (not tested so far), but the pure existence of a default that generates an executable that does not run on another computer is annoying. 
  3. The generated code does not seem to be optimal. I use seem, because I might not have found all tweaks to make it faster. VC++ comes with 2 code generation defaults, a DEBUG Release and a RELEASE release. the RELEASE option generates code that is about twice as fast as the DEBUG option, so I assume a lot of optimization is turned on here. But the resulting code was still much slower than the code generated by my Free Pascal engine. Usually this is not a problem as computers today are so fast they run even unoptimized code fast enough, but when you program a chess engine execution speed is one of your main concerns (and it was the reason to start with a C++ engine at the beginning).
The last Con is real concern to me and for this little chess engine project. When the final code is slow it doesn't matter how nice and painless it was to write it. I decided to test another C++ compiler on my source code just to see whether I made mistakes in porting it from Pascal to C++.

I used g++ (the GNU C++ compiler) on my source. I had to make a few source code adjustemenst (like I defined my own INFINITY constant, VC++ was fine with that, g++ not, even with an own namespace). Most hassle went into the inline assembler functions for bitboard operations (finding the least signinficat bit) which did just not compile in g++ even with -masm=intel. I replaced the assembler code with C++ code like

#ifdef _MSC_VER
            ; use bsf to get the least significant bit
      if (B==0) return 64; else return __builtin_ctzll(B);

So it finally compiled. To my surprise the almost identical source code generated a much faster executable with g++. I did not expect such a huge difference.

Just to give an impression of the difference

go depth 11

info depth 10 seldepth 29 time 42125 nodes 7924921 pv b4b7 f7b7 h5g6 h7g6 d8g8 g6f5 g8g4 f5e5 g4h5 f3f5 f2f4 h2f4 h5e2 d1e2 a4e4 d5e4 d3d4  nps 188128 score mate 9 hashfull 190 tbhits 0

info depth 11 seldepth 24 time 135391 nodes 38553528 pv b4b7 f7b7 h5g6 h7g6 d8g8 g6f5 g8g4 f5e5 g4h5 f3f5 f2f4 h2f4 h5e2 d1e2 a4e4 d5e4 d3d4  nps 284756 score mate 9 hashfull 132 tbhits 0
bestmove b4b7

go depth 11

info depth 10 seldepth 29 time 15203 nodes 7924921 pv b4b7 f7b7 h5g6 h7g6 d8g8 g6f5 g8g4 f5e5 g4h5 f3f5 f2f4 h2f4 h5e2 d1e2 a4e4 d5e4 d3d4  nps 521273 score mate 9 hashfull 190 tbhits 0

info depth 11 seldepth 24 time 54906 nodes 38553528 pv b4b7 f7b7 h5g6 h7g6 d8g8g6f5 g8g4 f5e5 g4h5 f3f5 f2f4 h2f4 h5e2 d1e2 a4e4 d5e4 d3d4  nps 702173 score mate 9 hashfull 132 tbhits 0
bestmove b4b7

So the g++ code processed 702.173 nodes per second where the VC++ only processed 284.756 nodes.

My personal outcome:
I will continue to write and debug the code with Visual C++ but for the release code generation I will use the GNU compiler instead.

Saturday, January 1, 2011

C++ and backward for loops

Although I'm programming for many years now sometimes I just make simple coding errors. Just recently I used a statement like
unsigned in i;
for (i=ml->moveCnt; i>=0; i--)
    // do something

It was meant to apply some actions for all moves in a list of moves. In reality this loop will never terminate except for not handled exceptions (like list index out of range). The reason is of course that the loop variable is an unsigned int, so the exit condition is never reached (i will always be greater or equal to 0).

The correct way of coding that would be

unsigned int i;
for (i=ml->moveCnt; i --> 0; )
  // do something

The "i --> 0" looks also very cool, but somehow those loop statements are a bit harder to read. So I use signed int and classical loop statements wherever possible.

Interestingly enough the C++ compiler I use does not warn about the first incorrect statement, even in the highest warning level. It warns however when it encounters a while (true) statement. But then it suggests to use a never ending for loop ( for (;;) )instead. Probably this is the reason why never ending for loops don't raise a compiler warning in Visual C++.