
Implements a mechanism to parse and use IEEE publication identifiers.

Historic identifier patterns

For historical reasons, there are at least two major series of identifier patterns: old (type I) and new (type II). This implementation attempts to support both.

Use cases to support

  • analyze a type I identifier pattern

  • parse a type II identifier into components

  • generate a filename from the components, similar to the type I pattern

Elements of the PubID


Name: Institute of Electrical and Electronics Engineers

Abbrev: IEEE


Report number

{number} - is a set of one or more digits and optional letters


{part} - is a set of digits and optional letters; starts with a digit; any letters come at the end; optional


{subpart} - is a set of digits and optional letters; optional, many subparts are possible


{year} - is a set of 4 digits; optional
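
The element shapes above can be checked with small regular-expression fragments. This is a minimal sketch, not the library's implementation: the names `ELEMENT_PATTERNS` and `is_valid` are hypothetical, and placing the letters after the digits in {number} and {subpart} is an assumption (the description only states this for {part}).

```python
import re

# Hypothetical per-element patterns following the descriptions above.
# Assumption: letters in {number} and {subpart} follow the digits.
ELEMENT_PATTERNS = {
    "number": re.compile(r"\d+[A-Za-z]*"),   # one or more digits, optional letters
    "part": re.compile(r"\d+[A-Za-z]*"),     # starts with a digit, letters only at the end
    "subpart": re.compile(r"\d+[A-Za-z]*"),  # digits and optional letters
    "year": re.compile(r"\d{4}"),            # exactly four digits
}


def is_valid(element: str, value: str) -> bool:
    """Return True when the whole value matches the shape of the named element."""
    return ELEMENT_PATTERNS[element].fullmatch(value) is not None
```

For example, `is_valid("part", "11ax")` is accepted, while `is_valid("part", "ax11")` is rejected because a part must start with a digit.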

Corrigendum & Amendment

{cor} - is a corrigendum or an amendment with the pattern Cor {cornum}-{year} or Amd {cornum}:{year}, where cornum is a set of digits; optional
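
A minimal sketch of how the two {cor} forms might be recognized. The names `COR_RE` and `parse_cor` are hypothetical, and the returned dictionary layout is an assumption made for illustration only.

```python
import re
from typing import Optional

# Two alternatives: "Cor {cornum}-{year}" and "Amd {cornum}:{year}".
# Python does not allow duplicate group names, so each form gets its own groups.
COR_RE = re.compile(
    r"Cor (?P<cor_num>\d+)-(?P<cor_year>\d{4})"
    r"|Amd (?P<amd_num>\d+):(?P<amd_year>\d{4})"
)


def parse_cor(text: str) -> Optional[dict]:
    """Find the first corrigendum or amendment in text, or return None."""
    m = COR_RE.search(text)
    if m is None:
        return None
    if m.group("cor_num") is not None:
        return {"kind": "Cor", "cornum": m.group("cor_num"), "year": m.group("cor_year")}
    return {"kind": "Amd", "cornum": m.group("amd_num"), "year": m.group("amd_year")}
```

Note that the separator differs between the two forms, so a string such as `Cor 1:2008` is not matched by this sketch.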

Type I pattern

{publisher} {type} {series} {number}{part}.{subpart}{year} {edition}/{conform}/{correction}
  • {publisher} IEEE

  • {type} one of the values: Standard, Std, Draft, Draft Standard, Draft Supplement *

  • {series} one of the values: ISO/IEC, ISO/IEC/IEEE *

  • {number} set of digits optionally prefixed with uppercase letter and optionally suffixed with letter

  • {part} from 1 to 2 digits prefixed with . or - and optionally suffixed with up to 4 letters *

  • {subpart} 1 digit optionally suffixed with a letter *

  • {year} 4 digits prefixed with -, :, ` - `, or a space *

  • {edition} prefix Edition followed by a reference in brackets, or prefix First edition followed by a date in the format YYYY-MM-DD *

  • {conform} prefix Conformance followed by 2 digits, a dash, and a 4-digit year *

  • {correction} prefix Cor optionally followed by a space, or prefix Amd followed by ., then 1 to 2 digits, a dash, and a 4-digit year *

(*) - optional

An identifier can be composed of two other identifiers separated by a space. Only the first identifier needs to contain the publisher; for the second it is optional.
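
As a rough illustration of how these element rules combine, the sketch below covers only {publisher}, {type}, {number}, {part}, {subpart}, and {year}. It is a reduced, hypothetical pattern (`TYPE_I_RE` and `parse_type_i` are names invented here), not the full expression used to parse the dataset; {series}, {edition}, {conform}, {correction}, and compound identifiers are left out.

```python
import re
from typing import Optional

# Reduced type I pattern. Longer {type} alternatives come first so that
# "Draft Standard" is not cut short at "Draft".
TYPE_I_RE = re.compile(
    r"""
    (?P<publisher>IEEE)
    (?:\s(?P<type>Draft\sStandard|Draft\sSupplement|Standard|Std|Draft))?
    \s(?P<number>[A-Z]?\d+[A-Za-z]?)          # optional uppercase prefix, optional letter suffix
    (?:[.-](?P<part>\d{1,2}[A-Za-z]{0,4}))?   # 1-2 digits after . or -, up to 4 letters
    (?:\.(?P<subpart>\d[A-Za-z]?))?           # 1 digit, optional letter
    (?:(?:\s-\s|[-:\s])(?P<year>\d{4}))?      # 4 digits after -, :, " - ", or a space
    """,
    re.VERBOSE,
)


def parse_type_i(identifier: str) -> Optional[dict]:
    """Return the matched elements, or None if the whole string does not fit."""
    m = TYPE_I_RE.fullmatch(identifier)
    return m.groupdict() if m else None
```

Matching the whole string with `fullmatch` lets the engine backtrack when a year could be mistaken for a part: in `IEEE Std 1003-2001` the `-2001` suffix ends up as {year} rather than as a two-digit {part}.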

The following regular expression parses 100% of the identifiers in the type I dataset:


Parsing PubID elements from type II identifiers

To parse PubID elements from type II identifiers, we can use a regular expression:


This regular expression covers 99% of the identifiers in the type II bibxml-ieee dataset.

File name generator

For type I identifiers, file names are generated by replacing the symbols /, \, ,, ', ", (, ), and space with a single replacement symbol. Sequences of consecutive replacement symbols should be squeezed into one.
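
The type I rule can be sketched as follows. The replacement symbol is not named in the text, so the hypothetical `type_i_filename` helper takes it as a parameter; the default `_` is only an illustrative assumption, and the whitespace character is taken to be an ordinary space.

```python
import re

# Characters to replace: / \ , ' " ( ) and space.
# The trailing + already collapses a run of them into one replacement.
_UNSAFE = re.compile(r"""[/\\,'"() ]+""")


def type_i_filename(identifier: str, replacement: str = "_") -> str:
    """Replace unsafe characters, then squeeze repeated replacement symbols."""
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally.
    name = _UNSAFE.sub(lambda _m: replacement, identifier)
    # Squeeze runs that arise when the input already contains the symbol.
    squeeze = re.compile(f"(?:{re.escape(replacement)}){{2,}}")
    return squeeze.sub(lambda _m: replacement, name)
```

With the assumed default, `type_i_filename("IEEE Std 802.11-2016")` gives `IEEE_Std_802.11-2016`; dots and dashes are untouched because they are not in the list of replaced symbols.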

For type II identifiers, the PubID elements must first be parsed and then joined in the following order:
