
Zugzwang

Lightweight CLI for converting the metadata of PGN files from Lichess' game database into a database file. It uses lazy file enumerators so that large PGN files containing millions of chess games can be processed incrementally rather than loaded into memory all at once.

PGN files

PGN (Portable Game Notation) files are plain-text files that record chess games along with their metadata.

Lichess PGN format

The extensive game database provided by Lichess (over 368 million games as of July 2018) uses the following format for each game within its PGN files.

This format is essentially the same as the standard PGN format, with a few differences, particularly in the game record and some of the metadata fields.

[Event "Rated Bullet tournament https://lichess.org/tournament/yc1WW2Ox"]
[Site "https://lichess.org/PpwPOZMq"]
[White "Abbot"]
[Black "Costello"]
[Result "0-1"]
[UTCDate "2017.04.01"]
[UTCTime "11:32:01"]
[WhiteElo "2100"]
[BlackElo "2000"]
[WhiteRatingDiff "-4"]
[BlackRatingDiff "+1"]
[WhiteTitle "FM"]
[ECO "B30"]
[Opening "Sicilian Defense: Old Sicilian"]
[TimeControl "300+0"]
[Termination "Time forfeit"]

1. e4 { [%eval 0.17] [%clk 0:00:30] } 1... c5 { [%eval 0.19] [%clk 0:00:30] }
2. Nf3 { [%eval 0.25] [%clk 0:00:29] } 2... Nc6 { [%eval 0.33] [%clk 0:00:30] }
3. Bc4 { [%eval -0.13] [%clk 0:00:28] } 3... e6 { [%eval -0.04] [%clk 0:00:30] }
4. c3 { [%eval -0.4] [%clk 0:00:27] } 4... b5? { [%eval 1.18] [%clk 0:00:30] }
5. Bb3?! { [%eval 0.21] [%clk 0:00:26] } 5... c4 { [%eval 0.32] [%clk 0:00:29] }
6. Bc2 { [%eval 0.2] [%clk 0:00:25] } 6... a5 { [%eval 0.6] [%clk 0:00:29] }
7. d4 { [%eval 0.29] [%clk 0:00:23] } 7... cxd3 { [%eval 0.6] [%clk 0:00:27] }
8. Qxd3 { [%eval 0.12] [%clk 0:00:22] } 8... Nf6 { [%eval 0.52] [%clk 0:00:26] }
9. e5 { [%eval 0.39] [%clk 0:00:21] } 9... Nd5 { [%eval 0.45] [%clk 0:00:25] }
10. Bg5?! { [%eval -0.44] [%clk 0:00:18] } 10... Qc7 { [%eval -0.12] [%clk 0:00:23] }
11. Nbd2?? { [%eval -3.15] [%clk 0:00:14] } 11... h6 { [%eval -2.99] [%clk 0:00:23] }
12. Bh4 { [%eval -3.0] [%clk 0:00:11] } 12... Ba6? { [%eval -0.12] [%clk 0:00:23] }
13. b3?? { [%eval -4.14] [%clk 0:00:02] } 13... Nf4? { [%eval -2.73] [%clk 0:00:21] } 0-1

Each game is divided into two sections:

Metadata

This section contains the metadata about the chess game.

In the example game above, the metadata section is the part that looks like:

[Event "Rated Bullet tournament https://lichess.org/tournament/yc1WW2Ox"]
[Site "https://lichess.org/PpwPOZMq"]
[White "Abbot"]
...
[TimeControl "300+0"]
[Termination "Time forfeit"]

This is the only section the CLI is concerned with: for each game, it produces one database record whose fields are that game's metadata fields.
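
As an illustration of turning one metadata section into a record, here is a minimal sketch in plain Ruby. It assumes each metadata line is a standard PGN tag pair of the form [Name "value"], where a literal quote or backslash inside the value is escaped as \" or \\. The names TAG_PAIR and parse_metadata are hypothetical and are not part of zugzwang's API.

```ruby
# Matches one PGN tag pair such as [White "Abbot"], allowing the escaped
# characters \" and \\ inside the quoted value.
TAG_PAIR = /\A\[(\w+)\s+"((?:[^"\\]|\\.)*)"\]\z/

# Hypothetical helper (not part of zugzwang's API): converts the tag-pair
# lines of one game's metadata section into a Hash of field => value.
def parse_metadata(lines)
  lines.each_with_object({}) do |line, record|
    match = TAG_PAIR.match(line.strip)
    next unless match # skip anything that is not a tag pair

    # Undo PGN escaping: \" becomes " and \\ becomes \.
    record[match[1]] = match[2].gsub(/\\(["\\])/, '\1')
  end
end

metadata = [
  '[Site "https://lichess.org/PpwPOZMq"]',
  '[White "Abbot"]',
  '[Black "Costello"]',
  '[Result "0-1"]'
]

record = parse_metadata(metadata)
puts record['White']        # prints Abbot
puts record.keys.join(', ') # prints Site, White, Black, Result
```

Each key of the resulting Hash would then correspond to one column of the game's database record.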

Game record

This section contains the actual record of the chess game: the moves, given in Standard Algebraic Notation (SAN), along with the time remaining on the player's clock after each move ([%clk]).

Additionally, some games include a Stockfish evaluation after each move ([%eval]).

In the example game above, the game record is the part that looks like:

1. e4 { [%eval 0.17] [%clk 0:00:30] } 1... c5 { [%eval 0.19] [%clk 0:00:30] }
2. Nf3 { [%eval 0.25] [%clk 0:00:29] } 2... Nc6 { [%eval 0.33] [%clk 0:00:30] }
3. Bc4 { [%eval -0.13] [%clk 0:00:28] } 3... e6 { [%eval -0.04] [%clk 0:00:30] }
...
12. Bh4 { [%eval -3.0] [%clk 0:00:11] } 12... Ba6? { [%eval -0.12] [%clk 0:00:23] }
13. b3?? { [%eval -4.14] [%clk 0:00:02] } 13... Nf4? { [%eval -2.73] [%clk 0:00:21] } 0-1

Since this CLI is not concerned with individual moves or position analysis, this data is discarded during the parsing process.
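
To show how the game record can be skipped while the file is read lazily, here is a minimal sketch. It assumes games are laid out as in the example above (tag-pair lines, a blank line, then the game record) and that only metadata lines start with [. The name each_game_metadata is hypothetical; this is not the gem's actual implementation.

```ruby
require 'stringio'

# Hypothetical helper (not zugzwang's actual code): yields the tag-pair lines
# of each game one at a time, discarding the game record. Without a block it
# returns an Enumerator, so callers can stop early or chain #lazy.
def each_game_metadata(io)
  return enum_for(__method__, io) unless block_given?

  tags = []
  io.each_line do |line|
    line = line.strip
    if line.start_with?('[')
      tags << line # still inside the metadata section
    elsif !line.empty? && !tags.empty?
      # The first line of the game record ends the metadata section.
      yield tags
      tags = []
    end
    # Blank lines and the rest of the game record are simply skipped.
  end
  yield tags unless tags.empty? # a final game with no game record
end

pgn = StringIO.new(<<~PGN)
  [Event "Game one"]
  [White "Abbot"]

  1. e4 c5 0-1

  [Event "Game two"]
  [White "Costello"]

  1. d4 d5 1/2-1/2
PGN

# Iteration stops as soon as the first game's metadata is complete.
tag_count = each_game_metadata(pgn).lazy.map(&:size).first
puts tag_count # prints 2
```

With a real file, the same helper could be driven with File.open('games.pgn') { |f| each_game_metadata(f).each { |tags| ... } }, so only one game's metadata is held in memory at a time.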

Field format changes

To fit SQL data-type formats (and to make querying more efficient), the following PGN metadata fields are stored in a different format in the generated database file:

Field            | Format in PGN file   | Format in database
UTCDate          | YYYY.MM.DD           | YYYY-MM-DD
WhiteRatingDiff  | +11 or -11 (example) | 11 or -11 (integer, no + sign for positive values)
BlackRatingDiff  | +5 or -5 (example)   | 5 or -5 (integer, no + sign for positive values)
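
The conversions in this table can be sketched in a few lines of Ruby. The helper names convert_utc_date and convert_rating_diff are hypothetical, and validating the date with Date.strptime is an assumption of this sketch rather than documented behaviour of the CLI.

```ruby
require 'date'

# UTCDate: "2017.04.01" -> "2017-04-01". Date.strptime also rejects
# impossible dates such as "2017.02.30" by raising an error.
def convert_utc_date(value)
  Date.strptime(value, '%Y.%m.%d').iso8601
end

# WhiteRatingDiff / BlackRatingDiff: "+1" -> 1, "-4" -> -4.
# Base 10 is passed explicitly so a zero-padded value like "+08" is not
# parsed as an (invalid) octal literal.
def convert_rating_diff(value)
  Integer(value, 10)
end

puts convert_utc_date('2017.04.01') # prints 2017-04-01
puts convert_rating_diff('+1')      # prints 1
puts convert_rating_diff('-4')      # prints -4
```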

Installation

To install the CLI:

$ gem install zugzwang

Usage

Once you have installed the CLI, you can use it with the command:

$ zugzwang

The only command available is create.

$ zugzwang create [DATABASE] *[ITEMS] --extension, --extension=EXTENSION
# Converts the metadata of PGN files from Lichess' game database to a database file.

Arguments and flags

  • DATABASE - The path to the newly generated database file.

    DEFAULT: lichess

    Examples:

      • games - Generates a database file called games (with the extension specified by the flag) in the current directory.
      • path/to/db/games - Generates a database file called games (with the extension specified by the flag) in the path/to/db directory, which the user is prompted to create if it does not exist.
  • *[ITEMS] - The PGN files to parse (or directories to search recursively for PGN files), separated by spaces.

    Patterns are accepted, and an argument is assumed to be a directory if it has no extension.

    Examples:

      • test.pgn games/game1.pgn games/game2.pgn games/game3.pgn - Specifies individual files.
      • test.pgn games - Reads the test.pgn file, then recursively searches the games directory for PGN files.
      • test.pgn games/*.pgn - Like the previous command, but using a pattern. Note that games/*.pgn only matches files directly inside games, not in its subdirectories.
      • . - Recursively searches the current directory for PGN files.
  • --extension - The file extension to be given to the database.

    DEFAULT: sql

    The file extension should be given without the leading dot (.).

    Expected file extensions are sql, sqlite, sqlite3 and db, but it is possible to override this restriction and try to generate a database with any extension, provided it is compatible with Sequel's database class constructor.

    Ideally, the extension should be specified at the end of the command.

    Examples:

      • --extension=sqlite3 - Extension specifier with an explicit equals sign.
      • --extension sqlite3 - Extension specifier without an explicit equals sign.
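
The *[ITEMS] rules above (no extension means a directory to search recursively; otherwise a file name or pattern) can be sketched with Ruby's Dir.glob. The helper resolve_items is hypothetical, and silently dropping paths that match nothing is an assumption of this sketch, not documented CLI behaviour.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not zugzwang's actual code): an argument without an
# extension is treated as a directory and searched recursively for .pgn
# files; anything else is used as a file name or glob pattern. Paths are
# expanded so duplicates (e.g. from "games" and "games/*.pgn") are removed.
def resolve_items(items)
  items.flat_map do |item|
    if File.extname(item).empty?
      Dir.glob(File.join(item, '**', '*.pgn'))
    else
      Dir.glob(item)
    end
  end.map { |path| File.expand_path(path) }.uniq
end

# Demo in a throwaway directory containing test.pgn and a games/ tree.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'games', 'old'))
  %w[test.pgn games/a.pgn games/old/b.pgn games/notes.txt].each do |name|
    File.write(File.join(dir, name), '')
  end

  files = resolve_items([File.join(dir, 'test.pgn'), File.join(dir, 'games')])
  # Lists the three .pgn files; games/notes.txt is ignored.
  puts files.map { |path| path.delete_prefix("#{dir}/") }.sort
end
```

When a pattern such as games/*.pgn is left unquoted, the shell usually expands it before the CLI sees it; Dir.glob only matters when the pattern reaches the program unexpanded.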

Examples

$ zugzwang create lichess/db/2018-05 games-2018-05.pgn --extension sqlite3
  • Creates a database file at lichess/db/2018-05.sqlite3, populating it with data from the file games-2018-05.pgn.
$ zugzwang create database games/*.pgn --extension sql
  • Creates a database file at database.sql, populating it with data from the .pgn files located directly within the games directory.
$ zugzwang create games/1 . --extension db
  • Creates a database file at games/1.db, populating it with data from .pgn files found by recursively searching the current directory.
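
The output paths in these examples follow from joining DATABASE and --extension with a dot. A minimal sketch of that rule, using the documented defaults, is shown below. The names EXPECTED_EXTENSIONS, database_path and missing_parent_directory? are hypothetical; since this README does not describe how the extension restriction is overridden, the sketch only prints a warning for unexpected extensions.

```ruby
EXPECTED_EXTENSIONS = %w[sql sqlite sqlite3 db].freeze

# Hypothetical helper (not zugzwang's API): joins DATABASE and --extension
# into the output path, defaulting to "lichess" and "sql".
def database_path(database = 'lichess', extension = 'sql')
  unless EXPECTED_EXTENSIONS.include?(extension)
    warn "Unexpected extension '#{extension}', continuing anyway"
  end
  "#{database}.#{extension}"
end

# True when the parent directory is missing, i.e. when the user would need
# to be prompted before it is created.
def missing_parent_directory?(path)
  !File.directory?(File.dirname(path))
end

puts database_path('lichess/db/2018-05', 'sqlite3') # prints lichess/db/2018-05.sqlite3
puts database_path('database', 'sql')               # prints database.sql
puts database_path('games/1', 'db')                 # prints games/1.db
puts database_path                                  # prints lichess.sql
```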

TODO

  • [ ] Add Postgres support
  • [ ] Add multithreaded PGN file parsing
  • [ ] Add specs/testing

Development

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/eonu/zugzwang.