ArticleFixtureGen

This Gem implements a utility, article_fixture_gen, which generates blog-style article/post fixture data for use in development testing. Initially, the output is in YAML format.

Why another article-data generator? This optionally generates "marker tag pairs" in the articles' body markup. These in turn are used to simulate user action affecting data; see below for a more complete description. This feature was the original justification for developing the utility.


Contents

  1. Installation
  2. Usage
  3. Options
    1. Command-Line Options
    2. Configuration-File Options
  4. What Are Marker Tag Pairs?
    1. Single-Marker Tag Pair
    2. Paired-Marker Tag Pair
  5. Article Fixture-Data Format
  6. Development
    1. Running Tests
    2. Feasible Future Features
  7. Contributing
    1. Process
    2. Notes on Contributing
  8. License

Installation

Add this line to your application's Gemfile:

gem 'article_fixture_gen'

And then execute:

$ bundle

Or install it yourself as:

$ gem install article_fixture_gen

Usage

From the command line, running

$ article_fixture_gen

with no existing configuration as described below, will produce YAML output, sent to standard output, for five articles that can then be loaded to create test-fixture entities in your application development.

Options

Each of these options may be specified either from the command line or from a configuration file. Configuration specified on the command line overrides conflicting settings specified in a configuration file; in other words, the effective value of a particular option is determined based on

  1. the specified default value, which may be overridden by
  2. a value specified in a configuration file, which may be overridden by
  3. a value specified on the command line.
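
The precedence order above can be pictured as a layered merge. The following is only a minimal sketch, not the Gem's actual implementation; the names DEFAULTS, deep_merge and effective_options are hypothetical, and only two of the documented options are shown.

```ruby
# Hypothetical illustration of option precedence; not the Gem's own code.
DEFAULTS = {
  article_count: 5,
  para_count: { max: 10, min: 2 }
}.freeze

# Recursively merge `overrides` into `base`; values in `overrides` win.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Apply the layers in order: defaults, then config file, then command line.
def effective_options(config_file: {}, command_line: {})
  [config_file, command_line].reduce(DEFAULTS) do |options, layer|
    deep_merge(options, layer)
  end
end

options = effective_options(
  config_file: { article_count: 8, para_count: { max: 6 } },
  command_line: { article_count: 3 }
)
puts options[:article_count]    # command-line value wins over the file
puts options[:para_count][:max] # taken from the configuration file
puts options[:para_count][:min] # untouched default
```

A nested merge is used here because the configuration file groups the min/max settings under a shared key; a flat merge would discard the default :min when only :max is overridden.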

Specifying an invalid value for any setting in a configuration file and/or on the command line will cause the invalid value to be ignored.

For any option which is one of a pair of min/max options, such as --para-count-min and --para-count-max, if the effective value of the *-min setting is greater than the effective value of the *-max setting, then execution will be aborted with a suitable error message. Specifying the same effective value for both will, of course, function as a fixed value, eliminating randomised variation. When the *-min and *-max settings have different values, then the value used at any given time will be a random value between the minimum and maximum values, inclusive.
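
As a rough sketch of the min/max behaviour just described (the helper name random_count is hypothetical, and raising ArgumentError merely stands in for the utility aborting with an error message):

```ruby
# Hypothetical helper: choose a count between min and max, inclusive.
def random_count(min, max, name: 'count')
  if min > max
    # The real utility aborts with an error message; raising is an
    # assumption made so this sketch can be exercised safely.
    raise ArgumentError, "#{name}: minimum #{min} exceeds maximum #{max}"
  end

  rand(min..max) # an inclusive range, so both endpoints are possible
end

puts random_count(2, 10, name: 'para-count') # some value from 2 to 10
puts random_count(4, 4, name: 'sent-count')  # equal limits give a fixed value
```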

Command-Line Options

--article-count

Specifies the number of article fixtures to generate. Valid values are positive integers; the default is 5 (equivalent to --article-count 5). Not specifying the option, or specifying an invalid value, will use the default.

--config

Specifies the filename of a configuration file containing settings for this utility. If no configuration file is specified and a file named afg-config.yml exists in the current directory, that file will be used.

--generate-config

Specifies the filename of a configuration file to which all currently-effective option settings will be written. If the filename specified here is the same as that specified for the --config option above, then the file will be written to only after any existing settings are read. This allows the user to, for example, set the one or two options that she knows she cares about, generate a full configuration file with all settings, and experiment from there.

--para-count-max

Specifies the maximum number of (HTML) paragraphs in each generated fixture. This defaults to 10 (equivalent to --para-count-max 10) and, if specified, must be at least 1.

--para-count-min

Specifies the minimum number of (HTML) paragraphs in each generated fixture. This defaults to 2 (equivalent to --para-count-min 2) and, if specified, must be at least 1.

--pmtp-count

Specifies the number of paired-marker tag pairs to generate. This defaults to 2 (equivalent to --pmtp-count 2) and, if specified, may be any integer greater than or equal to 0. (Note that a combination of short content (paragraph/sentence count) with a high pMTP count is inadvisable, as each pMTP must contain one or more words and each pMTP must not overlap another.)

--pmtp-text

Specifies the text portion of paired-marker tag pairs' id attribute values. This defaults to paired-mtp (thus equivalent to specifying --pmtp-text paired-mtp) and, if specified, may be any non-empty contiguous text string made up of alphabetic characters optionally interspersed with hyphen (-) characters, so long as neither the first nor last characters are hyphens.

--sent-count-max

Specifies the maximum number of sentences in each paragraph within each generated fixture. This defaults to 8 (equivalent to --sent-count-max 8) and, if specified, must be at least 1.

--sent-count-min

Specifies the minimum number of sentences in each paragraph within each generated fixture. This defaults to 1 (equivalent to --sent-count-min 1), which is the minimum allowed value.

--smtp-count

Specifies the number of single-marker tag pairs to generate within the content. This defaults to 4 (equivalent to --smtp-count 4) and, if specified, may be any integer greater than or equal to 0. A relatively high sMTP/word count ratio doesn't have the same impact that it does for pMTPs, since it's permissible to have multiple immediately adjacent single-marker tag pairs.

--smtp-text

Specifies the text portion of single-marker tag pairs' id attribute values. This defaults to single-mtp (thus equivalent to specifying --smtp-text single-mtp) and, if specified, may be any non-empty contiguous text string made up of alphabetic characters optionally interspersed with hyphen (-) characters, so long as neither the first nor last characters are hyphens.
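
The rule for valid --pmtp-text and --smtp-text values can be expressed as a regular expression. This is a sketch only; the constant and method names are hypothetical, and it assumes that "alphabetic" means ASCII letters and that consecutive hyphens are acceptable, neither of which is stated above.

```ruby
# Hypothetical validator for the text portion of an MTP id attribute.
# Assumes ASCII letters only, and allows consecutive interior hyphens.
MTP_TEXT_PATTERN = /\A[A-Za-z](?:[A-Za-z-]*[A-Za-z])?\z/

def valid_mtp_text?(text)
  MTP_TEXT_PATTERN.match?(text)
end

%w[paired-mtp single-mtp x -leading trailing- has1digit].each do |candidate|
  puts "#{candidate}: #{valid_mtp_text?(candidate)}"
end
```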

Configuration-File Options

Option settings may be defined in a YAML-formatted configuration file and loaded into the utility via the --config command-line option. An example configuration file which simply lists all default settings is shown below:

---
:article_count: 5
:para_count:
  :max: 10
  :min: 2
:pmtp_count: 2
:pmtp_text: paired-mtp
:sent_count:
  :max: 8
  :min: 1
:smtp_count: 4
:smtp_text: single-mtp
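
Because the keys in this file are written with a leading colon, Ruby's YAML library reads them as Symbols. The sketch below shows one way such a file might be read from your own code using only the standard library; it is not a description of how the utility itself loads the file, and the CONFIG_TEXT name is hypothetical.

```ruby
require 'yaml'

# An abbreviated copy of the example configuration, inlined for illustration.
CONFIG_TEXT = <<~CONFIG
  ---
  :article_count: 5
  :para_count:
    :max: 10
    :min: 2
  :smtp_text: single-mtp
CONFIG

# safe_load rejects Symbols unless they are explicitly permitted.
config = YAML.safe_load(CONFIG_TEXT, permitted_classes: [Symbol])

puts config[:article_count]
puts config.dig(:para_count, :max)
puts config[:smtp_text]
```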

What Are Marker Tag Pairs?

A marker tag pair, or MTP, is a specifically-formatted <a></a> tag pair, where

  1. no content separates the opening and closing tags;
  2. the opening tag has an href attribute whose value is the empty string; and
  3. the opening tag has an id attribute, whose value is formatted differently based on whether it is a single-marker or paired-marker tag pair, but in any case will start with a sequence of text characters (alphabetic or hyphens), followed by a hyphen and a UUID.

Single-Marker Tag Pair

A single-marker tag pair (sMTP) is, logically enough, one <a></a> tag pair, where the id attribute of the opening tag follows the format TEXT-uuid, where TEXT can be any sequence of alphabetic characters, possibly intermixed with hyphen (-) characters. The uuid placeholder represents a UUID.
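
For illustration, an sMTP of this shape could be built as follows. The method name single_mtp is hypothetical, and the attribute order (href before id) is an assumption; the description above fixes only which attributes are present.

```ruby
require 'securerandom'

# Hypothetical builder for a single-marker tag pair.
# Attribute order is assumed; only the attributes themselves are documented.
def single_mtp(text = 'single-mtp')
  %(<a href="" id="#{text}-#{SecureRandom.uuid}"></a>)
end

puts single_mtp
puts single_mtp('highlight')
```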

Paired-Marker Tag Pair

A paired-marker tag pair (pMTP) is a sequence of two <a></a> tag pairs, where the id attribute of each opening tag follows the format TEXT-uuid-begin for the first of the two tag pairs, and TEXT-uuid-end for the second. The TEXT and uuid placeholders follow the same conventions as for sMTPs, with the restriction that both tag pairs' TEXT and uuid placeholders must be identical, and the -begin MTP must occur before the -end MTP.
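
One possible way to locate a pMTP and the content it encloses is a regular expression with back-references, which enforces the rule that TEXT and uuid must be identical in both halves. This is a sketch under assumptions: the constant names and the mtp_tag helper are hypothetical, the attribute order is assumed, and a real application would more likely use an HTML parser.

```ruby
# Lowercase hexadecimal UUID, as produced by most generators (an assumption).
UUID_PATTERN = /[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}/

# Hypothetical helper: the source text for one marker tag with a given id.
def mtp_tag(id_pattern)
  %(<a href="" id="#{id_pattern}"></a>)
end

# The -end tag must repeat the same TEXT and uuid captured from the -begin tag.
PMTP_PATTERN = Regexp.new(
  mtp_tag("(?<text>[A-Za-z-]+?)-(?<uuid>#{UUID_PATTERN.source})-begin") +
  '(?<enclosed>.*?)' +
  mtp_tag('\k<text>-\k<uuid>-end'),
  Regexp::MULTILINE
)

uuid = '123e4567-e89b-12d3-a456-426614174000'
body = %(<p>Some <a href="" id="paired-mtp-#{uuid}-begin"></a>marked text) +
       %(<a href="" id="paired-mtp-#{uuid}-end"></a> here.</p>)

match = PMTP_PATTERN.match(body)
puts match[:enclosed] if match
```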

Article Fixture-Data Format

Each conceptual Article has the following attributes, included in the YAML output:

  1. A :title, which is a simple text string of between three and ten (random) words;
  2. An :author_name, which is a text string, containing a randomly-generated given and "family" name following simple Western convention;
  3. A :body, which is a string of (valid, well-formed) HTML (as XHTML/HTML5) content comprising one or more <p></p> tag pairs with randomly generated text inside the paragraphs. Optional settings may cause additional HTML to be inserted into the paragraphs' text; however, each is still guaranteed to be valid and well-formed. Note that multiple paragraphs are not enclosed within a containing <div> or other tag pair; if the HTML tool you use (such as Ox) chokes on such multiple elements without a shared containing element, you'll need additional processing once the YAML has been parsed;
  4. An array of :keywords, which are zero to at most ten random short strings of one or more words each. There is no guarantee that any two generated Article entries will share any keywords, but no keywords will be repeated within any given Article;
  5. An :image_url, which is a string in the form http://www.example.com/image_NUM.FMT, where NUM is text for a one- to four-digit number (1 to 9999 inclusive) and FMT is either png or jpg. Both components are randomly selected each time;
  6. A :created_at timestamp, which is a YAML-formatted timestamp with sub-second (formatted as picosecond) resolution.

Your Article may not need all of these fields; they were used for different applications within our organisation. Future releases are expected to support filtering the then-supported attributes.
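
The following sketch shows how such output might be consumed. It rests on several assumptions that are not confirmed above: that the top level of the YAML is a sequence of mappings with Symbol keys (mirroring the configuration file), and that NUM carries no leading zeros. The sample data is hand-written, omits :created_at for brevity, and the constant names are hypothetical.

```ruby
require 'yaml'

# Hand-written sample in an ASSUMED layout; not actual output of the utility.
SAMPLE_FIXTURES = <<~FIXTURES
  ---
  - :title: Three Random Words
    :author_name: Jane Doe
    :body: "<p>First paragraph.</p><p>Second paragraph.</p>"
    :keywords:
    - alpha
    - beta gamma
    :image_url: http://www.example.com/image_42.png
FIXTURES

# Matches the documented URL form; assumes NUM has no leading zeros.
IMAGE_URL_PATTERN = %r{\Ahttp://www\.example\.com/image_([1-9]\d{0,3})\.(png|jpg)\z}

articles = YAML.safe_load(SAMPLE_FIXTURES, permitted_classes: [Symbol])

# Give each body a single containing element for parsers that require one.
wrapped_bodies = articles.map { |article| "<div>#{article[:body]}</div>" }

articles.each_with_index do |article, index|
  puts article[:title]
  puts IMAGE_URL_PATTERN.match?(article[:image_url])
  puts wrapped_bodies[index]
end
```

Wrapping the body in a single element is one form the "additional processing" mentioned in item 3 might take.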

Development

After checking out the repo, run bin/setup to install dependencies (which as of now must already be installed on your local system). Then, run bin/rake test to run the tests, or bin/rake to run tests and, if tests are successful, further static-analysis tools (RuboCop, Flay, Flog, and Reek).

To install your build of this Gem onto your local machine, run bin/rake install. We recommend that you uninstall any previously-installed "official" Gem to increase your confidence that your tests are running against your build. You should then be able to run the article_fixture_gen command-line tool.

Running Tests

Running tests other than command-line tests (within the test/exe/article_fixture_gen/ directory) works just as you would expect for individual MiniTest::Spec test scripts; you can run a command line such as ruby test/article_fixture_gen/data/article_test.rb to run a single test-spec file. Also, running rake and rake test works just as you'd expect for running the complete set of tests.

To run individual command-line tests, two changes are necessary:

  1. Install your current version of the Gem in your local Gem repo with the command line rake install;
  2. After successfully installing, run a single test prefixing the command line with bundle exec, as in bundle exec ruby test/exe/article_fixture_gen/version_test.rb.

If anyone knows how to streamline this, to write the command-line tests so that they can be run from the command line without bundle exec as is done for all other tests, please submit a pull request!

Feasible Future Features

Suppressing Field Output

The attributes specified above may not be necessary in all uses of the utility. To suppress output of undesired fields, new options may be supported; for example, --no-image-url would suppress the :image_url attribute.

Other Ideas?

Do you see something we missed that you'd find useful? Open an issue and PR and let's have a chat about it!

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/jdickey/article_fixture_gen. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Process

If you wish to submit a new feature to the Gem, please open an issue to discuss your idea with the maintainer and other interested community members. Issue threads are a great place to thrash out the details of what you're trying to accomplish and how your work would affect other code and/or community members. If you need help with something, or aren't sure how to choose between different ideas to accomplish some detail of what you're setting out to do, this is the place to discuss that. There is no such thing as a stupid question that you don't know the answer to (once you've researched in your search engine of choice, of course; please do respect people's time and attention).

The processes for proposing a new feature or a fix to an open bug-report issue are very similar:

  1. Make sure that you have forked this Gem's repository on GitHub to your own GitHub account. (If you don't yet have a GitHub account, join; it's free.)
  2. If you're proposing a new feature, open an issue as suggested above. If you're addressing an existing issue, thanks; you don't need to open a new one.
  3. Clone your copy of the repo to your local development system.
  4. Create a new Git branch for your work. It's best to give it a reasonably short name that's suggestive of what you're specifically trying to accomplish.
    1. If you're adding a new feature, please use a descriptive name for your feature branch. output-as-fooml is a much more descriptive name than my-branch when you're working on adding the ability to output FooML instead of YAML, for example.
    2. If you're fixing an existing issue, rather than adding a new feature that you've opened an issue to propose, then use the branch name issue-nnnn, where nnnn is the issue number on GitHub. For example, if you're working on Issue #4972, then you'd ordinarily name your branch issue-4972.
    3. Do not work on your copy of the master branch! Any pull request (see below) that you later submit for changes you've made on master will be rejected, and you will be asked to submit your proposed changes on a branch that branches from a commit on the upstream master branch.
  5. Now write great (tests and) code!
  6. As soon as you have something to show, even if it's not complete yet (but it passes what tests you have), push your branch to your forked repo on GitHub and open a new pull request ("PR") for your branch compared to master on the upstream repository. That lets the maintainer and other community members review your code and tests, comment, help out, and so on.
  7. Continuing with your pull request, it's usually better if you make small, incremental changes in each commit in a sequence. We (endeavour to) practice behaviour-driven development: write tests for the simplest thing that could possibly work; see the tests fail; then make them pass, commit, and go on to the next simplest thing. Don't get hung up on lots of refactoring until you have code that does everything you want it to do; once you have a legitimately complete green bar, that's the time to apply SOLID principles and patterns to DRY things up. Better to have (temporary) duplication than choose the wrong abstraction.

Notes on Contributing

Don't be discouraged if it takes several commits to complete your work and then several more to get everybody agreeing that it's complete and well done. ("Useful" and "worth adding" should have been settled at the issue stage, before you started working on your PR.) That's because…

When pull requests are merged into the master branch, they are squashed so that all changes are applied to master in a single commit. This means that, even if you have a dozen or more commits in your PR where you've been very incremental, and even changed direction once or twice, what matters is the final result; not what it took to get there.

Once your PR has been merged, it's a good idea to pull the upstream master branch to your development system (git pull upstream master) and then push it to your fork (git push origin master). (What? You don't have an upstream remote as shown by git remote -v? Run the command git remote add upstream https://github.com/jdickey/article_fixture_gen.git from your local development directory, and now you do.)

License

The Gem is available as open source under the terms of the MIT License.