NewspaperWorksFixtures
This project contains very little code; it provides a version-controlled set of fixture files to facilitate testing ingest workflows for newspaper_works, and digitized newspaper content workflows in general.
Installation
Add NewspaperWorksFixtures to your Gemfile, preferably only as a test or
development dependency:
group :development, :test do
  gem 'newspaper_works_fixtures'
end
Then run bundle install.
Once the gem is installed, you can access the fixture paths by calling methods on the base
NewspaperWorksFixtures class, for example:
2.4.1 :001 > NewspaperWorksFixtures.file_fixtures
=> "/path/to/newspaper_works_fixtures/spec/fixtures/files"
Contents
NDNP 'local' batch
# /path/to/gem/spec/fixtures/files/ndnp/batch_local
NewspaperWorksFixtures.ndnp_local_batch
A small batch of newspaper objects intended to mock vendor-provided digitization deliverables (page-level objects, no article segmentation) that conform to the Library of Congress NDNP specifications. (The data here is from the University of Utah.)
This batch includes 1 title, 1 reel, and 1 issue with 2 pages. Each scan has a TIFF, JP2, PDF, and ALTO XML file.
2 image scans; 74 MB
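Because each scan in this batch is expected to have a TIFF, JP2, PDF, and ALTO XML file, a test can check that no derivative is missing. The sketch below is a hypothetical helper (`missing_scan_files` is not part of this gem), assuming that all files for one scan share a basename (e.g. `0001.tif`, `0001.jp2`) and differ only by extension. Basenames with no image file, such as an issue-level METS XML, are not treated as scans.

```ruby
require 'find'

# Extensions that mark a basename as an image scan (".tiff" is normalized to "tif").
IMAGE_EXTS = %w[tif jp2 pdf].freeze

# Hypothetical helper: returns { "<dir>/<basename>" => [missing extensions] }
# for every scan under +dir+ that lacks one of the +required+ extensions.
# Assumption: a scan's files share a basename and live in the same directory.
def missing_scan_files(dir, required: %w[tif jp2 pdf xml])
  by_scan = Hash.new { |h, k| h[k] = [] }
  Find.find(dir) do |path|
    next unless File.file?(path)

    ext = File.extname(path).delete_prefix('.').downcase
    ext = 'tif' if ext == 'tiff'
    key = File.join(File.dirname(path), File.basename(path, '.*'))
    by_scan[key] << ext
  end

  by_scan.each_with_object({}) do |(scan, exts), missing|
    next if (exts & IMAGE_EXTS).empty? # not a scan, e.g. an issue METS file
    gaps = required - exts
    missing[scan] = gaps unless gaps.empty?
  end
end
```

In a spec, `missing_scan_files(NewspaperWorksFixtures.ndnp_local_batch)` should return an empty hash if every scan is complete. For a batch without TIFFs, such as the ChronAm batch, pass `required: %w[jp2 pdf xml]`.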
NDNP ChronAm batch
# /path/to/gem/spec/fixtures/files/ndnp/batch_test_ver01
NewspaperWorksFixtures.ndnp_chronam_batch
A small batch of newspaper objects that mimics the BagIt-formatted batches of scanned newspapers
found on the Library of Congress Chronicling America data/batches site.
(The data here is from batch_curiv_jojoba_ver01; page-level objects, no
article segmentation.)
This batch includes multiple titles, reels, target files, issues, and pages. Each scan has a JP2, PDF, and ALTO XML file (no TIFF). All of the corresponding BagIt and METS files are included as well.
11 image scans; 149 MB
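Since this batch ships with its BagIt files, a test can confirm that the payload still matches the bag's manifests. This is a minimal sketch of a BagIt fixity check, not a function provided by this gem: `bag_fixity_errors` is a hypothetical name, and it assumes you pass the bag root (the directory containing `bagit.txt`). It follows the manifest format from the BagIt specification (RFC 8493), where each `manifest-<algorithm>.txt` line is `<checksum> <path relative to the bag root>`. It supports only md5, sha1, sha256, and sha512, and ignores tag manifests and percent-encoded file names.

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'

# Manifest algorithm names mapped to Ruby digest classes (assumed subset).
DIGESTS = {
  'md5' => Digest::MD5,
  'sha1' => Digest::SHA1,
  'sha256' => Digest::SHA256,
  'sha512' => Digest::SHA512
}.freeze

# Hypothetical helper: returns [[relative_path, problem], ...] for every
# payload file that is missing or whose checksum does not match.
# An empty array means all listed files verified.
def bag_fixity_errors(bag_root)
  errors = []
  Dir.glob(File.join(bag_root, 'manifest-*.txt')).sort.each do |manifest|
    algorithm = File.basename(manifest, '.txt').delete_prefix('manifest-')
    digest = DIGESTS.fetch(algorithm) { raise ArgumentError, "unsupported algorithm: #{algorithm}" }

    File.foreach(manifest) do |line|
      # Split on the first run of whitespace so paths may contain spaces.
      expected, relpath = line.chomp.split(/\s+/, 2)
      next if relpath.nil? # skip blank lines

      full = File.join(bag_root, relpath)
      if !File.file?(full)
        errors << [relpath, 'missing']
      elsif digest.file(full).hexdigest != expected.downcase
        errors << [relpath, "#{algorithm} mismatch"]
      end
    end
  end
  errors
end
```

For example, `bag_fixity_errors(NewspaperWorksFixtures.ndnp_chronam_batch)` should return `[]` if the batch directory is itself the bag root and its contents are intact; otherwise, pass the directory that contains `bagit.txt`.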
Deseret News article segmented batch
# /path/to/gem/spec/fixtures/files/article_segmented/batch_deseret_news
NewspaperWorksFixtures.article_segmented_batch_deseret_news
This batch includes one title, one issue, nine pages, and articles. Each page and each article has a PDF and an ALTO XML file (no TIFF).
Article-segmented files: 19 PDF, 19 XML/DTD; 3.9 MB
Page-level files: 9 PDF (5.6 MB), 9 XML/DTD (3.8 MB)
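The file counts and sizes listed for each batch can drift if fixtures are added or replaced. A minimal sketch for recomputing them follows; `extension_summary` and `format_mb` are hypothetical helpers, not part of this gem. The README does not state whether its MB figures are decimal or binary, so `format_mb` assumes 1 MB = 1024 × 1024 bytes.

```ruby
require 'find'

# Hypothetical helper: tally regular files under +dir+ by lowercased
# extension, returning { "pdf" => { count: <n>, bytes: <total> }, ... }.
def extension_summary(dir)
  summary = Hash.new { |h, ext| h[ext] = { count: 0, bytes: 0 } }
  Find.find(dir) do |path|
    next unless File.file?(path)

    ext = File.extname(path).delete_prefix('.').downcase
    ext = '(none)' if ext.empty?
    summary[ext][:count] += 1
    summary[ext][:bytes] += File.size(path)
  end
  summary
end

# Format a byte count as binary megabytes (assumption: 1 MB = 1024 * 1024 bytes).
def format_mb(bytes)
  format('%.1f MB', bytes / (1024.0 * 1024))
end
```

For instance, iterating over `extension_summary(NewspaperWorksFixtures.article_segmented_batch_deseret_news)` and printing each count with `format_mb` yields lines comparable to the ones above, split by extension rather than by page or article level.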
Topaz Times article segmented batch
# /path/to/gem/spec/fixtures/files/article_segmented/batch_topaz_times
NewspaperWorksFixtures.article_segmented_batch_topaz_times
This batch includes an issue, pages, and articles. Each page has a PDF, a TIFF, an ALTO XML file, and an articles XML file.
Article-segmented files: 30 PDF, 30 TIF; 1.1 MB
Page-level files: 4 PDF, 4 TIF; 876 KB
Credits
This gem was developed in a collaboration between the University of Utah's J. Willard Marriott Library and Boston Public Library, as part of the "Newspapers in Samvera" project, funded by a grant from the Institute of Museum and Library Services.