The first and most obvious question is: why Z80 and ZX Spectrum? Well, if you analize it, it is a clear mistake, but take into account that I was not planning to write any chess game. For a rough comparison Z80 instruction set is around 1268 instructions, while x86 processor would have around 3683 instructions available. If you look at the registers, using 16, 32 or 64 bye registers becomes an immediate advantage. In the best case, you could have the whole board stored in a 64-byte register. x86 family is much more flexible on both aspects. The amount of addressable memory is noy key in this case. Therefore, it would seem more appropriate to choose a more modern processor, like Olivier Poudade or Oscar Toledo did for their chess implementations. Later, looking at it more positively, I thought that going for the ZX Spectrum would be a nice tribute to ZX81 1K Chess.
The heart fothe program: although the code does not do any recursive tree search, it has a piece of code that implements what I call 1.5 ply depth. 1st ply covers all computer moves, and 2nd ply would mean that all potential responses from the human side can be analized too. 1st ply is covered by the code, and in parallel, the program generates a bitboard with all squares attacked by the opponent (it does not know which opponent's piece is attacking the potential destination square, though). Therefore, it's able to take all those attacked squares into consideration. As I see it, it sits somewhere between 1 and 2 ply search, so that's why I call it 1.5 plies depth.
If you ask me, the most difficult part to write in such short space is the candidate move list generation (genlis in my code). The algorithm is very tricky and it's very difficult to cover all possibilities. I still have a small issue with attacked squares and pawn moves, as you can see in the code.
How it works
At program start, the screen is set to be used. And then, the program is basically a simple loop, where many sections are shared for both human and computer moves.
Common part: common code for both sides after producing a move from any side,
- Common part: At the beginning of the loop, the program generates a list with all legal moves for current white pieces and it decides who's turn it is.
- Sides move: Depending on the turn,
- White (human): it refreshes the board, requests for human input in coordinate format (i.e. e2e4), converts it to a computer usable format (source square, and target square) and (optionally) checks the move validity against the legal moves list.
- Black (computer): it takes the list of legal moves generated in the common part and evaluates the position for each move based on several criteria. I would not call it AI as it is very simple and it does not do any tree search, it just evaluates the position after each possible move, but it does its job. After lots of different tests I ended up implementing these criteria below, very simple, but able to give some direction to the computer play.
This approach, where only the moved piece impact is evaluated -ignoring the rest or the armies- and not the board as a whole has clear evaluation drawbacks, mainly uncovering main pieces to the enemy. This is particularly bad in the check case. The algorythm is able to find an unattacked square for the king, but it is unable to block checks by placing a piece in the middle, and the program will illegally move another piece while the king is under check.
Each potential move, and by extension the whole position, is valued through a byte value, which is divided into two. The high nibble (half byte) stores the materials value of the move, and the low nibble stores the target square weight. Therefore, a nice capture is always better than the best target square with no capture. Valuation is based on:
- Target square piece value, if any.
- Origin piece value, considering if the current square is attacked and/or the target square is defended by the enemy. Placing your piece in a square defended by the other side is heavily penalized, unless the capture makes the sacrifice worth it.
- Target square weight: center columns are more valued, particularly the central ones (C to E). I avoid promoting F column to avoid opening and prevent attacks through F7, as the computer is very bad at escaping checks. Also, farther ranks into the human side are also promoted. With a little tweaking of the square weight values, it is possible to make the computer open using a rational looking sequence. Current piece square is not considered for the square weighting. This could result in a suboptimal move in some cases, as the piece may move to a worse square than its current one, but still the best among all candidates.
- After either black or white move selection, the move is actually updated on the board.
- The board is reversed so that in the next loop iteration, the turn switches to the other side and the moving side is always white. This is particularly important for the valid move generation code which, with this approach, is always run for white pieces.
- Back to the beginning of the loop.
Disclaimer: this section requires some assembly skills for detailed explanations. However, the concepts are mostly explained using non-programming terms.
There are some Z80 features not needed for my chess implementation, like interrupts or In/Out port usage, so there is not much I can bring in on that area. However, I have several interesting findings:
- Board implementation: there are tons of literature on the topic over the internet, but this article at chessprogramming.org is particularly good. Most works out there are focused on execution speed, and not so much on simplicity and memory requirements. After some consideration, I discarded bitmaps, as the memory requirements were much higher, and ended up writing the initial coding using a 8x8 board. Soon I found the famous out of board detection problem and the 0x88 solution. I went through several board variants where the 0x88 out of board detection worked (12x10, 12x8) and ended up using a 16x8 board, where half of the squares are never addressed or used, but the code to run thorugh the board squares was much simpler. It's a pity I did not fully understand all these articles before I went thorugh the learning process the hard way. Only later I found that all my findings had been found long ago by someone else.
- Micro-paging: one of my main finds during this development. I have placed (almost) every data structure in different 256-byte pages, mainly at the beginning of each page. This allows me to hop from one data area to another just by modifying H register, since HL is used for data addressing. Furthermore, if the program follows a linear execution sequence, data pages can be arranged in a way that just by increasing/decreasing H register by one (INC H, DEC H, only 1 byte!), we can jump to the required data page at each point.
- Data scattering: in regular programming, data is placed in memory without taking into much consideration whether access to these data is simple or not. In my case, I have placed data structures in memory areas where addressing them is simple. In other words, data location is adapted to the code and not the opposite.
- Long instructions: I avoided using long instructions as much as possible (Z80 has 1-byte to 4-byte long instructions). I totally avoid using 4-byte instructions, and reduce 3-byte instructions to the minimum. This approach has several aspects:
- Avoid IX and IY registers extended instructions (2 to 4 bytes long). It is hard sometimes to avoid it as in some cases all registers are busy. On the other hand, using the stack to save and release busy registers temporarily and recover them afterwards is typically lighter.
- Avoid temporary variables (memory positions). Only registers are used in this program. For instance, R/W operations of memory positions require at least 5 bytes (LD HL,XX; LD A,(HL), LD (HL),A), where R/W of a register takes only 2 bytes.
- Avoid procedure calls, as CALL + RET set requires 4 bytes.
- Avoid absolute jumps (JP, 3 bytes) are avoided as much as possible. As the code is so small relative jumps (JR, 2 bytes) are possible in almost all cases.
- Coding optimization: some operations have an alternative way to be implemented with a shorter instruction. For instance, the most famous one: if you need to set A=0, the standard instruction would be LD A, 0 (2 bytes). There is another method you can use in most situations, XOR A (1 byte). There are many other optimization tricks, most of them covered in these two sites:
Z80 heaven - Optimization
brandonw.net - Z80 Optimization
- Numeric values: avoid using numeric values when loading registers or addressing. Very often a usable value is available in another register. This saves 1 byte per instruction.
- Discarded ideas: I played with many ideas that did not turn our right, for whateever reason. Some of them:
- I played with the idea of having a 2nd stack for different purposes, but it showed too complex.
- I also started looking at recursive code for position analysis, but again it was going too far in size.
- I also painted the board in different ways, but the method I use seems to be the shortest.
Basic stuff I missed in Z80 assembler
- Nibble (half byte) swap instruction, as it exists in Gameboy Z80 variant (see SWAP instruction). Besides the obvious nibble operations, it would help in shifting/rotating bytes. The equivalent operation with rotate/shift instruction requires 4-bytes, and it can be only done over A register.
- Arithmetic operations are only possible on A register. Addition, subtraction, and logical operations make necessary to release A all the time. I know this is part of the concept of this processor family, but it makes the code unnecessarily longer.
- DJNZ uses the wrong register. DJNZ is a super nice idea, but the register chosen should have been a low half of the 16 bit registers (C, E or L), and not B. This would have enabled the chosen 16 bit register suitable run thorugh arrays/tables.
- HL operations. having some operations only available on HL . It would have been very handy to have them with at least another register pair (i.e. LD B, (DE)).