rsssf - tools 'n' scripts for RSSSF (Rec.Sport.Soccer Statistics Foundation) archive data

What's the Rec.Sport.Soccer Statistics Foundation (RSSSF)?

The RSSSF collects football (soccer) league tables, match results and more from all over the world and publishes them online in plain text.

Example:

Round 1
[May 25]
Vasco da Gama   1-0 Portuguesa
 [Carlos Tenório 47']
Vitória         2-2 Internacional
 [Maxi Biancucchi 2', Gabriel Paulista 11'; Diego Forlán 29', Fred 63']
Corinthians     1-1 Botafogo
 [Paulinho 73'; Rafael Marques 24']
[May 26]
Grêmio          2-0 Náutico         [played in Caxias do Sul-RS]
 [Zé Roberto 15', Elano 70']
Ponte Preta     0-2 São Paulo
 [Lúcio 9', Jádson 44'p]
Criciúma        3-1 Bahia
 [Matheus Ferraz 45'+1', Lins 46', João Vítor 82'; Diones 72']
Santos          0-0 Flamengo        [played in Brasília-DF]
Fluminense      2-1 Atlético/PR     [played in Macaé-RJ]
 [Rafael Sóbis 15'p, Samuel 53'; Manoel 28']
Cruzeiro        5-0 Goiás
 [Diego Souza 5', Bruno Rodrigo 30', Nílton 40',79', Borges 42']
Coritiba        2-1 Atlético/MG
 [Deivid 53', Arthur 90'+1'; Diego Tardelli 51']
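To make the format concrete, here is a minimal Ruby sketch that parses the two line types shown above. A match line is `home  score-score  away  [optional note]`. The bracketed goal line after it lists home scorers, then `;`, then away scorers, with `45'+1'` for stoppage time and a trailing `p` for penalties. The helpers `parse_match_line`, `parse_goals_line` and `parse_scorers` are hypothetical and not part of the rsssf gem. The regular expressions are assumptions based only on this sample and will not cover every RSSSF page.

```ruby
# Minimal sketch (hypothetical helpers, not the rsssf gem's API): parse the
# match and goal lines of an RSSSF plain-text round listing.

# "Team A   1-0 Team B   [optional note]"
MATCH_LINE = /\A(?<team1>.+?)\s+(?<score1>\d+)-(?<score2>\d+)\s+(?<team2>.+?)(?:\s+\[(?<note>[^\]]*)\])?\s*\z/

# Returns a hash for a match line, or nil for any other line (dates, goals, ...).
def parse_match_line(line)
  m = MATCH_LINE.match(line.strip)
  return nil if m.nil?

  { team1: m[:team1], team2: m[:team2],
    score: [m[:score1].to_i, m[:score2].to_i],
    note:  m[:note] }
end

# " [Scorer 12', Other 45'+1'; Away Scorer 9'p]"  (home goals ; away goals)
def parse_goals_line(line)
  body = line.strip.delete_prefix("[").delete_suffix("]")
  home, away = body.split(";", 2)
  { home: parse_scorers(home), away: parse_scorers(away) }
end

# One hash per goal; a player with several minutes ("40',79'") yields several goals.
def parse_scorers(text)
  # Scorers are separated by ", " while multiple minutes use a bare comma.
  text.to_s.strip.split(/,\s+/).flat_map do |entry|
    name, minutes = entry.match(/\A(.+?)\s+(\d.*)\z/)&.captures
    next [] if name.nil?

    minutes.scan(/(\d+)'(?:\+(\d+)')?(p)?/).map do |min, extra, pen|
      { player: name, minute: min.to_i, stoppage: extra&.to_i, penalty: !pen.nil? }
    end
  end
end
```

Keeping match lines and goal lines in separate functions mirrors the layout of the listing, where each goal line belongs to the match line directly above it.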

Find out more about the Rec.Sport.Soccer Statistics Foundation (RSSSF) »

Usage

Working with Pages

To fetch a page from the web, use:

require 'rsssf'
page = RsssfPage.from_url( 'http://www.rsssf.com/tablese/eng2015.html')

Note: The RsssfPageFetcher converts the RSSSF archive page from hypertext (HTML) to plain text. For example:

<hr>
<a href="#premier">Premier League</A><br>
<a href="#cups">Cup Tournaments</A><br>
<a href="#champ">Championship</A><br>
<a href="#first">Division 1</A><br>
<a href="#second">Division 2</A><br>
<a href="#conf">Conference</A>
<hr>
<h4><a name="premier">Premier League</A></h4>
<pre>
Final Table:

 1.Chelsea                 38  26  9  3  73-32  87  Champions
 2.Manchester City         38  24  7  7  83-38  79
 3.Arsenal                 38  22  9  7  71-36  75
...

becomes:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
‹Premier League›
‹Cup Tournaments›
‹Championship›
‹Division 1›
‹Division 2›
‹Conference›
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

#### Premier League


Final Table:

 1.Chelsea                 38  26  9  3  73-32  87  Champions
 2.Manchester City         38  24  7  7  83-38  79
 3.Arsenal                 38  22  9  7  71-36  75
...
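The conversion rules illustrated above can be approximated with a few regex substitutions. The sketch below is an assumption about how such a conversion might work, not the actual RsssfPageFetcher code. `<hr>` becomes the `=-=-=` rule line, `<a href>` links become `‹text›`, `<h4><a name>` headings become `#### ` headings, and `<br>`, `<pre>` and any remaining tags are dropped. The helper `html_to_text` is hypothetical. HTML entities such as `&amp;` are not decoded, and the blank-line layout may differ from the gem's output.

```ruby
# Minimal sketch (an assumption, not RsssfPageFetcher's actual code):
# approximate the HTML-to-text rules shown above with regex substitutions.
HR_LINE = "=-=-=-=-=-=-=-=-=-=-=-=-=-=-="

def html_to_text(html)
  text = html.dup
  # <hr> rule lines become a =-=-= separator line
  text.gsub!(%r{<hr\s*/?>}i, HR_LINE)
  # <h4><a name="...">Title</A></h4> becomes "#### Title" (one # per heading level)
  text.gsub!(%r{<h(\d)>\s*<a\s+name="[^"]*">(.*?)</a>\s*</h\1>}im) do
    "#" * $1.to_i + " " + $2.strip
  end
  # <a href="...">Text</A> becomes ‹Text›
  text.gsub!(%r{<a\s+href="[^"]*">(.*?)</a>}im, '‹\1›')
  # drop line-break tags and the <pre> / </pre> markers
  text.gsub!(%r{<br\s*/?>}i, "")
  text.gsub!(%r{</?pre>}i, "")
  # drop any other leftover tags
  text.gsub!(/<[^>]+>/, "")
  text
end
```

Headings are matched before links so that `<a name>` anchors are consumed as part of the heading rather than left behind as stray tags.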

Working with Repos

To fetch pages for many seasons in a batch, set up and use a repo.

Step 1: List all archive pages

In tables/config.yml, list all archive pages to fetch. Example:

2010-11: tablese/eng2011.html
2011-12: tablese/eng2012.html
2012-13: tablese/eng2013.html
2013-14: tablese/eng2014.html
2014-15: tablese/eng2015.html
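A repo presumably reads this mapping of season keys to page paths. Here is a sketch under stated assumptions. The `load_page_jobs` helper and `PageJob` struct are hypothetical. The base URL is taken from the `from_url` example. Saving each page as `<basename>.txt` mirrors the file names in the summary report further below. With those assumptions, Ruby's standard YAML library can turn the config into a fetch list:

```ruby
require "yaml"

# Assumption: archive pages live under this base URL (as in the from_url example).
RSSSF_BASE_URL = "http://www.rsssf.com/"

# Hypothetical record: season key, full page URL, local plain-text file name.
PageJob = Struct.new(:season, :url, :txt_name)

def load_page_jobs(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  config.map do |season, path|
    PageJob.new(season.to_s,                                 # e.g. "2014-15"
                RSSSF_BASE_URL + path.to_s,                  # full URL to fetch
                File.basename(path.to_s, ".html") + ".txt")  # e.g. "eng2015.txt"
  end
end
```

Keys such as `2014-15` are not valid YAML dates or numbers, so they load as plain strings; the `to_s` call is only a safeguard.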

Step 2: Fetch all archive pages

Use:

repo = RsssfRepo.new( './eng-england', title: 'England (and Wales)' )
repo.fetch_pages

Bonus: To create a summary of all fetched pages (e.g. authors, last updated, sections), use:

repo.make_pages_report

Example - tables/README.md:

football.db RSSSF Archive Data Summary for England (and Wales)

Last Update: 2015-11-26 18:22:22 +0200

| Season | File | Authors | Last Updated | Lines (Chars) | Sections |
|---|---|---|---|---|---|
| 2014-15 | eng2015.txt | Ian King and Karel Stokkermans | 4 Jun 2015 | 1249 (34138) | Premier League, Cup Tournaments, Championship, Division 1, Division 2, Conference |
| 2013-14 | eng2014.txt | Ian King and Karel Stokkermans | 5 Feb 2015 | 1254 (34294) | Premier League, Cup Tournaments, Championship, Division 1, Division 2, Conference |
| 2012-13 | eng2013.txt | Karel Stokkermans | 5 Feb 2015 | 1269 (34531) | Premiership, Cup Tournaments, Championship, Division 1, Division 2, Conference |
| 2011-12 | eng2012.txt | Karel Stokkermans | 5 Feb 2015 | 691 (21925) | Premiership, Cup Tournaments, Championship, Division 1, Division 2, Conference |
| 2010-11 | eng2011.txt | Ian King, Karel Stokkermans and Jan Schoenmakers | 5 Feb 2015 | 959 (37393) | Premiership, Cup Tournaments, Championship, Division 1, Division 2, Conference |
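Report rows like these can be sketched as a Markdown table generator. `PageSummary` and `render_report` are illustrative names, not the gem's API. In this hypothetical version, sections come from the `#### ` headings of a converted page, Lines is the line count and Chars is the character count of the text. How the real report counts characters, and where it extracts authors and the last-updated date from, are assumptions left to the caller.

```ruby
# Hypothetical summary of one converted page (not the gem's make_pages_report).
PageSummary = Struct.new(:season, :file, :authors, :last_updated, :text) do
  # Section names are taken from "#### Heading" lines of the converted text.
  def sections
    text.lines.grep(/\A#### /).map { |l| l.delete_prefix("#### ").strip }
  end

  # One Markdown table row: Lines (Chars) uses line and character counts.
  def to_row
    cells = [season, file, authors, last_updated,
             "#{text.lines.size} (#{text.size})", sections.join(", ")]
    "| #{cells.join(' | ')} |"
  end
end

def render_report(title, summaries)
  out = "#{title}\n\n".dup
  out << "| Season | File | Authors | Last Updated | Lines (Chars) | Sections |\n"
  out << "|---|---|---|---|---|---|\n"
  summaries.each { |s| out << s.to_row << "\n" }
  out
end
```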

That's it.

Preparing Archive Pages for SQL Database Imports (e.g. football.db)

To import match schedules (fixtures and results) and more using the football.db machinery, prepare "simple" single-league (or cup) pages with the standings tables etc. stripped out. For example, to break out the Premier League and FA Cup from the eng2015.txt archive page, use:

page = RsssfPage.from_url( 'http://www.rsssf.com/tablese/eng2015.html')

schedule = page.find_schedule( header: 'Premier League')   ## returns RsssfSchedule obj
schedule.save( './1-premierleague.txt' )

schedule = page.find_schedule( header: 'FA Cup', cup: true )
schedule.save( './facup.txt' )
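To illustrate what breaking out a single league might involve, the following sketch finds a section by its `#### ` heading in a converted page. It then drops standings rows with a simple heuristic: lines like ` 1.Chelsea  38 ...` and the `Final Table:` label. `find_section` and `strip_standings` are hypothetical helpers, not the implementation behind `find_schedule`, and the `cup: true` option is not modeled.

```ruby
# Hypothetical: return the body of the "#### <header>" section of a converted
# page, or nil if no such heading exists.
def find_section(text, header)
  body = nil
  text.each_line do |line|
    if line.start_with?("#### ")
      break if body   # the next section starts: stop collecting
      body = +"" if line.delete_prefix("#### ").strip == header
    elsif body
      body << line
    end
  end
  body
end

# Heuristic (an assumption): drop standings rows such as " 1.Chelsea  38  26 ..."
# and the "Final Table:" label, keeping rounds, dates, matches and scorers.
def strip_standings(section)
  section.lines.reject { |l| l =~ /\A\s*\d+\.\S/ || l.strip == "Final Table:" }.join
end
```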

Install

Just install the gem:

$ gem install rsssf

RSSSF Datasets

See the rsssf GitHub org for pre-processed, ready-to-import datasets. Prepared repos include:

  • eng-england - rsssf archive data for England - Premier League, Championship, FA Cup etc.
  • de-deutschland - rsssf archive data for Germany (Deutschland) - Deutsche Bundesliga, 2. Bundesliga, 3. Liga, DFB Pokal etc.
  • es-espana - rsssf archive data for España (Spain) - Primera División / La Liga, Copa del Rey, etc.
  • at-austria - rsssf archive data for Austria (Österreich) - Österr. Bundesliga, Erste Liga, ÖFB Pokal etc.
  • br-brazil - rsssf archive data for Brazil (Brasil) - Campeonato Brasileiro Série A / Brasileirão etc.
  • and more

License

The rsssf scripts are dedicated to the public domain. Use them as you please with no restrictions whatsoever.

Questions? Comments?

Send them along to the Open Sports & Friends Forum/Mailing List. Thanks!