Zugzwang
Lightweight CLI for converting the metadata of PGN files from Lichess' game database to a database file, using lazy file enumerators for optimizing the conversion of large PGN files with millions of chess games.
PGN files
PGN (Portable Game Notation) files are plain-text files that record chess games along with their metadata.
Lichess PGN format
The extensive game database provided by Lichess (over 368 million games as of July 2018) uses the following format for each game within its PGN files.
[Event "Rated Bullet tournament https://lichess.org/tournament/yc1WW2Ox"]
[Site "https://lichess.org/PpwPOZMq"]
[White "Abbot"]
[Black "Costello"]
[Result "0-1"]
[UTCDate "2017.04.01"]
[UTCTime "11:32:01"]
[WhiteElo "2100"]
[BlackElo "2000"]
[WhiteRatingDiff "-4"]
[BlackRatingDiff "+1"]
[WhiteTitle "FM"]
[ECO "B30"]
[Opening "Sicilian Defense: Old Sicilian"]
[TimeControl "300+0"]
[Termination "Time forfeit"]
1. e4 { [%eval 0.17] [%clk 0:00:30] } 1... c5 { [%eval 0.19] [%clk 0:00:30] }
2. Nf3 { [%eval 0.25] [%clk 0:00:29] } 2... Nc6 { [%eval 0.33] [%clk 0:00:30] }
3. Bc4 { [%eval -0.13] [%clk 0:00:28] } 3... e6 { [%eval -0.04] [%clk 0:00:30] }
4. c3 { [%eval -0.4] [%clk 0:00:27] } 4... b5? { [%eval 1.18] [%clk 0:00:30] }
5. Bb3?! { [%eval 0.21] [%clk 0:00:26] } 5... c4 { [%eval 0.32] [%clk 0:00:29] }
6. Bc2 { [%eval 0.2] [%clk 0:00:25] } 6... a5 { [%eval 0.6] [%clk 0:00:29] }
7. d4 { [%eval 0.29] [%clk 0:00:23] } 7... cxd3 { [%eval 0.6] [%clk 0:00:27] }
8. Qxd3 { [%eval 0.12] [%clk 0:00:22] } 8... Nf6 { [%eval 0.52] [%clk 0:00:26] }
9. e5 { [%eval 0.39] [%clk 0:00:21] } 9... Nd5 { [%eval 0.45] [%clk 0:00:25] }
10. Bg5?! { [%eval -0.44] [%clk 0:00:18] } 10... Qc7 { [%eval -0.12] [%clk 0:00:23] }
11. Nbd2?? { [%eval -3.15] [%clk 0:00:14] } 11... h6 { [%eval -2.99] [%clk 0:00:23] }
12. Bh4 { [%eval -3.0] [%clk 0:00:11] } 12... Ba6? { [%eval -0.12] [%clk 0:00:23] }
13. b3?? { [%eval -4.14] [%clk 0:00:02] } 13... Nf4? { [%eval -2.73] [%clk 0:00:21] } 0-1
The data is divided into two sections:
Metadata
This section contains the metadata about the chess game.
In the example game above, the metadata section is the part that looks like:
[Event "Rated Bullet tournament https://lichess.org/tournament/yc1WW2Ox"]
[Site "https://lichess.org/PpwPOZMq"]
[White "Abbot"]
...
[TimeControl "300+0"]
[Termination "Time forfeit"]
This is the section that the CLI is concerned with: it produces one database record per game, with fields drawn from that game's metadata.
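As an illustration of how a metadata line maps to a record field, here is a minimal Ruby sketch. The `TAG_PAIR` pattern and the `parse_tags` helper are hypothetical names chosen for this example and are not taken from zugzwang's source; the actual parser may work differently.

```ruby
# Hypothetical helper (not zugzwang's API): turn PGN tag-pair lines such as
# [White "Abbot"] into a Hash of field name => value.
TAG_PAIR = /\A\[(\w+)\s+"(.*)"\]\s*\z/

def parse_tags(lines)
  lines.each_with_object({}) do |line, tags|
    match = TAG_PAIR.match(line)
    tags[match[1]] = match[2] if match # lines that are not tag pairs are skipped
  end
end

metadata = parse_tags([
  '[White "Abbot"]',
  '[Black "Costello"]',
  '[Result "0-1"]',
  '[WhiteElo "2100"]'
])
p metadata
```

Each key of the resulting Hash corresponds to one candidate field of the database record for that game.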
PGN
This section contains the actual record of the chess game, with the moves given in algebraic chess notation.
In the example game above, the PGN section is the part that looks like:
1. e4 { [%eval 0.17] [%clk 0:00:30] } 1... c5 { [%eval 0.19] [%clk 0:00:30] }
2. Nf3 { [%eval 0.25] [%clk 0:00:29] } 2... Nc6 { [%eval 0.33] [%clk 0:00:30] }
3. Bc4 { [%eval -0.13] [%clk 0:00:28] } 3... e6 { [%eval -0.04] [%clk 0:00:30] }
...
12. Bh4 { [%eval -3.0] [%clk 0:00:11] } 12... Ba6? { [%eval -0.12] [%clk 0:00:23] }
13. b3?? { [%eval -4.14] [%clk 0:00:02] } 13... Nf4? { [%eval -2.73] [%clk 0:00:21] } 0-1
Since this CLI is not concerned with individual moves or position analysis, this data is discarded during the parsing process.
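The following is a rough sketch of how a large PGN file could be read lazily, keeping the metadata and skipping the moves. It is not zugzwang's actual implementation: the `game_metadata` method and `TAG` pattern are hypothetical, and the sketch assumes every game has a non-empty moves section (at minimum the result token), so that a metadata line following a moves line marks the start of the next game.

```ruby
require 'tempfile'

TAG = /\A\[(\w+)\s+"(.*)"\]\s*\z/

# Hypothetical sketch: lazily yield one metadata Hash per game.
# File.foreach reads one line at a time, so the file is never fully loaded.
def game_metadata(path)
  Enumerator.new do |yielder|
    tags = {}
    in_moves = false
    File.foreach(path) do |line|
      if (match = TAG.match(line))
        if in_moves # a tag line after moves starts the next game
          yielder << tags
          tags = {}
          in_moves = false
        end
        tags[match[1]] = match[2]
      elsif !line.strip.empty?
        in_moves = true # a moves line: noted, but its content is discarded
      end
    end
    yielder << tags unless tags.empty?
  end.lazy
end

sample = <<~PGN
  [White "Abbot"]
  [Result "0-1"]

  1. e4 { [%eval 0.17] } 1... c5 0-1

  [White "Costello"]
  [Result "1-0"]

  1. d4 d5 1-0
PGN

results = Tempfile.create(['games', '.pgn']) do |file|
  file.write(sample)
  file.flush
  game_metadata(file.path).first(2) # only reads as far as it needs to
end
p results
```

Because the enumerator is lazy, a call such as `first(2)` stops reading once two games have been produced, which is what keeps memory use flat on files with millions of games.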
Installation
To install the CLI:
$ gem install zugzwang
Usage
Once you have installed the CLI, you can use it with the command:
$ zugzwang
The only command available is create.
$ zugzwang create [DATABASE] *[ITEMS] --extension, --extension=EXTENSION
# Converts the metadata of PGN files from Lichess' game database to a database file.
Arguments and flags
DATABASE
- The path to the newly generated database file.
DEFAULT: lichess
Examples:
games
- Generates a database file called games (with extension specified by flag) in the current directory.
path/to/db/games
- Generates a database file called games (with extension specified by flag) in the path/to/db directory, which the user is prompted to create if it does not exist.
*[ITEMS]
- The PGN files to parse (or directory to recursively search for PGN files), separated by spaces.
Patterns are accepted, and an argument is assumed to be a directory if no extension is given.
Examples:
test.pgn games/game1.pgn games/game2.pgn games/game3.pgn
- Specifying individual files.
test.pgn games
- Reads the test.pgn file, then searches the games directory recursively for any pgn files.
test.pgn games/*.pgn
- Does the same as the previous command, but using patterns.
.
- Does the same as the previous command by recursively searching the current directory.
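The resolution rule described above (no extension means a directory to search recursively, otherwise a file path or pattern) can be sketched in Ruby as follows. The `pgn_files` helper is a hypothetical name for illustration only, and the real CLI may resolve items differently.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not zugzwang's API): expand ITEMS into PGN file paths.
def pgn_files(items)
  items.flat_map do |item|
    if File.extname(item).empty?
      # No extension: assume a directory and search it recursively.
      Dir.glob(File.join(item, '**', '*.pgn'))
    else
      # Otherwise treat the item as a file path or glob pattern.
      Dir.glob(item)
    end
  end.uniq.sort
end

# Build a small throwaway directory tree to try the helper on.
from_directory, from_pattern = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    FileUtils.mkdir_p(File.join('games', 'deep'))
    FileUtils.touch(['test.pgn', 'games/a.pgn', 'games/deep/b.pgn', 'games/notes.txt'])
    [pgn_files(%w[test.pgn games]), pgn_files(['test.pgn', 'games/*.pgn'])]
  end
end
p from_directory
p from_pattern
```

In this sketch the directory form also picks up files in nested subdirectories, whereas the pattern games/*.pgn only matches files directly inside games.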
--extension
- The file extension to be given to the database.
DEFAULT: sql
The file extension should be given without the preceding dot (.).
Expected file extensions are sql, sqlite, sqlite3 and db, but it is possible to override this restriction and try to generate a database with any extension, provided it is compatible with Sequel's database class constructor.
Ideally, the extension should be specified at the end of the command.
Examples:
--extension=sqlite3
- Extension specifier with explicit equals sign.
--extension sqlite3
- Extension specifier without explicit equals sign.
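To make the relationship between DATABASE and --extension concrete, here is a small Ruby sketch of how the output path could be composed and checked against the expected extensions. The names `EXPECTED_EXTENSIONS`, `database_path` and `expected_extension?` are hypothetical and only mirror the defaults documented above; they are not taken from zugzwang's source.

```ruby
# Hypothetical sketch mirroring the documented defaults (lichess, sql).
EXPECTED_EXTENSIONS = %w[sql sqlite sqlite3 db].freeze

# Join the database path and the extension (given without a leading dot).
def database_path(database = 'lichess', extension = 'sql')
  "#{database}.#{extension}"
end

# True when the extension is one of the expected ones; anything else would
# need the user to explicitly override the restriction.
def expected_extension?(extension)
  EXPECTED_EXTENSIONS.include?(extension)
end

puts database_path
puts database_path('lichess/db/2018-05', 'sqlite3')
puts expected_extension?('sqlite3')
puts expected_extension?('txt')
```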
Examples
$ zugzwang create lichess/db/2018-05 games-2018-05.pgn --extension sqlite3
- Creates a database file at lichess/db/2018-05.sqlite3, populating it with data from file games-2018-05.pgn.
$ zugzwang create database games/*.pgn --extension sql
- Creates a database file at database.sql, populating it with data from .pgn files located within the games directory.
$ zugzwang create games/1 . --extension db
- Creates a database file at games/1.db, populating it with data from .pgn files found from recursively searching the current directory.
TODO
- [ ] Add Postgres support
- [ ] Add multithreaded PGN file parsing
- [ ] Add specs/testing
Development
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/eonu/zugzwang.