Class: NVDFeedScraper

Inherits:
Object
Includes:
NvdFeedApi
Defined in:
lib/nvd_feed_api.rb

Overview

This class parses the NVD website to retrieve information about the data feeds.

Examples:

Initialize an NVDFeedScraper object, get the feeds, and inspect them:

scraper = NVDFeedScraper.new
scraper.scrap
scraper.available_feeds
scraper.feeds
scraper.feeds("CVE-2007")
cve2007, cve2015 = scraper.feeds("CVE-2007", "CVE-2015")

Defined Under Namespace

Classes: Feed, Meta

Constant Summary

URL =

The NVD URL where the data feeds are located.

'https://nvd.nist.gov/vuln/data-feeds'.freeze

Constants included from NvdFeedApi

NvdFeedApi::VERSION

Instance Method Summary

Constructor Details

#initialize ⇒ NVDFeedScraper

Initialize the scraper.



# File 'lib/nvd_feed_api.rb', line 350

def initialize
  @url = URL
  @feeds = nil
end

Instance Method Details

#available_cves ⇒ Array<String>

Return a list with the names of all available CVEs in the yearly feeds (CVE-Modified and CVE-Recent are skipped). Can only be called after #scrap.

Returns:

  • (Array<String>)

    List with the names of all available CVEs. May return tens of thousands of CVEs.



# File 'lib/nvd_feed_api.rb', line 598

def available_cves
  cve_names = []
  feed_names = available_feeds
  feed_names.delete('CVE-Modified')
  feed_names.delete('CVE-Recent')
  feed_names.each do |feed_name|
    f = feeds(feed_name)
    f.json_pull
    # merge removing duplicates
    cve_names |= f.available_cves
  end
  return cve_names
end
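
The merge step in the listing above relies on Array#|, which returns the union of two arrays with duplicates removed. Here is a minimal sketch of that step in isolation. The helper name merge_cve_names and the CVE lists are made up for illustration and are not part of the library.

```ruby
# Hypothetical helper (not part of nvd_feed_api): merge several CVE name
# lists into one list without duplicates.
# `lists` stands in for the per-feed results of Feed#available_cves.
def merge_cve_names(lists)
  # Array#| keeps the first occurrence of each element and drops repeats.
  lists.reduce([]) { |all, names| all | names }
end

# Illustrative data only, not real feed contents.
merged = merge_cve_names([
  ['CVE-2016-0001', 'CVE-2016-0002'],
  ['CVE-2016-0002', 'CVE-2017-0001']
])
puts merged.inspect
```

Because every yearly feed is pulled before the merge, the real method can be slow on first use.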

#available_feeds ⇒ Array<String>

Return a list with the names of all available feeds. The returned feed names can be used as arguments for the #feeds method. Can only be called after #scrap.

Examples:

scraper.available_feeds # => ["CVE-Modified", "CVE-Recent", "CVE-2017", "CVE-2016", "CVE-2015", "CVE-2014", "CVE-2013", "CVE-2012", "CVE-2011", "CVE-2010", "CVE-2009", "CVE-2008", "CVE-2007", "CVE-2006", "CVE-2005", "CVE-2004", "CVE-2003", "CVE-2002"]

Returns:

  • (Array<String>)

    List with the name of all available feeds.



# File 'lib/nvd_feed_api.rb', line 438

def available_feeds
  raise 'call scrap method before using available_feeds method' if @feeds.nil?
  feed_names = []
  @feeds.each do |feed| # feed is an object
    feed_names.push(feed.name)
  end
  feed_names
end
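
The "call #scrap first" requirement comes from the nil guard at the top of the method. The sketch below reproduces that guard and the name collection with a stand-in Struct, so it runs without network access. FeedStub and feed_names are hypothetical names introduced for this example only.

```ruby
# Stand-in for NVDFeedScraper::Feed; only the name matters here.
FeedStub = Struct.new(:name)

# Hypothetical helper mirroring the guard and the name collection above.
# `feeds` plays the role of the scraper's @feeds instance variable.
def feed_names(feeds)
  raise 'call scrap method before using available_feeds method' if feeds.nil?

  feeds.map(&:name)
end

puts feed_names([FeedStub.new('CVE-Recent'), FeedStub.new('CVE-2017')]).inspect

begin
  feed_names(nil) # simulates calling available_feeds before scrap
rescue RuntimeError => e
  puts "error: #{e.message}"
end
```

The error is a plain RuntimeError, so callers that want to recover need to rescue it explicitly.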

#cve(cve) ⇒ Hash #cve(cve_arr) ⇒ Array #cve(cve, *) ⇒ Array

TODO:

Implement a CVE class instead of returning a Hash. Results may not be in the same order as provided.

Note:

#scrap must be called before using this method.

Search for one or more CVEs in the yearly feeds.

Examples:

s = NVDFeedScraper.new
s.scrap
s.cve("CVE-2014-0002", "cve-2014-0001")

Overloads:

  • #cve(cve) ⇒ Hash

    One CVE.

    Parameters:

    • cve (String)

      CVE ID, case insensitive.

    Returns:

    • (Hash)

      a Ruby Hash corresponding to the CVE.

  • #cve(cve_arr) ⇒ Array

    An array of CVEs.

    Parameters:

    • cve_arr (Array<String>)

      Array of CVE IDs, case insensitive.

    Returns:

    • (Array)

      an Array of CVEs; each CVE is a Ruby Hash. May not be in the same order as provided.

  • #cve(cve, *) ⇒ Array

    Multiple CVEs.

    Parameters:

    • cve (String)

      CVE ID, case insensitive.

    • * (String)

      As many CVE IDs as you want.

    Returns:

    • (Array)

      an Array of CVEs; each CVE is a Ruby Hash.




# File 'lib/nvd_feed_api.rb', line 469

def cve(*arg_cve)
  return_value = nil
  raise 'no argument provided, 1 or more expected' if arg_cve.empty?
  if arg_cve.length == 1
    if arg_cve[0].is_a?(String)
      raise 'bad CVE name' unless /^CVE-[0-9]{4}-[0-9]{4,}$/i.match?(arg_cve[0])
      year = /^CVE-([0-9]{4})-[0-9]{4,}$/i.match(arg_cve[0]).captures[0]
      matched_feed = nil
      feed_names = available_feeds
      feed_names.delete('CVE-Modified')
      feed_names.delete('CVE-Recent')
      feed_names.each do |feed|
        if /#{year}/.match?(feed)
          matched_feed = feed
          break
        end
      end
      # CVE-2002 feed (the 1st one) contains CVE from 1999 to 2002
      matched_feed = 'CVE-2002' if matched_feed.nil? && ('1999'..'2001').to_a.include?(year)
      raise "bad CVE year in #{arg_cve}" if matched_feed.nil?
      f = feeds(matched_feed)
      f.json_pull
      return_value = f.cve(arg_cve[0])
    elsif arg_cve[0].is_a?(Array)
      raise 'one of the provided arguments is not a String' unless arg_cve[0].all? { |x| x.is_a?(String) }
      raise 'bad CVE name' unless arg_cve[0].all? { |x| /^CVE-[0-9]{4}-[0-9]{4,}$/i.match?(x) }
      return_value = []
      # Sorting CVE can allow us to parse quicker
      # Upcase to be sure include? works
      cves_to_find = arg_cve[0].map(&:upcase).sort
      feeds_to_match = Set[]
      cves_to_find.each do |cve|
        feeds_to_match.add?(/^(CVE-[0-9]{4})-[0-9]{4,}$/i.match(cve).captures[0])
      end
      feed_names = available_feeds.to_set
      feed_names.delete('CVE-Modified')
      feed_names.delete('CVE-Recent')
      # CVE-2002 feed (the 1st one) contains CVE from 1999 to 2002
      virtual_feeds = ['CVE-1999', 'CVE-2000', 'CVE-2001']
      # So virtually add those feed...
      feed_names.merge(virtual_feeds)
      raise 'unexisting CVE year was provided in some CVE' unless feeds_to_match.subset?(feed_names)
      matched_feeds = feeds_to_match.intersection(feed_names)
      # and now that the intersection is done remove those virtual feeds and add CVE-2002 instead if needed
      unless matched_feeds.intersection(virtual_feeds.to_set).empty?
        matched_feeds.subtract(virtual_feeds)
        matched_feeds.add('CVE-2002')
      end
      feeds_arr = feeds(matched_feeds.to_a)
      feeds_arr.each do |feed|
        feed.json_pull
        cves_obj = feed.cve(cves_to_find.select { |cve| cve.include?(feed.name) })
        if cves_obj.is_a?(Hash)
          return_value.push(cves_obj)
        elsif cves_obj.is_a?(Array)
          return_value.push(*cves_obj)
        else
          raise 'cve() method of the feed instance returns wrong value'
        end
      end
    else
      raise "the provided argument (#{arg_cve[0]}) is neither a String nor an Array"
    end
  else
    # Overloading a list of arguments as one array argument
    return_value = cve(arg_cve)
  end
  return return_value
end
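
The single-String branch above boils down to mapping a CVE ID to the yearly feed that should contain it, with the special case that CVEs from 1999 to 2001 live in the CVE-2002 feed. The sketch below isolates that mapping. It is a simplification, not the library's code: feed_for and CVE_ID are hypothetical names, it raises ArgumentError where the listing raises a plain RuntimeError, and it builds the feed name directly instead of scanning the feed list with a regex.

```ruby
# Same pattern as in the listing, anchored with \A and \z instead of ^ and $.
CVE_ID = /\ACVE-([0-9]{4})-[0-9]{4,}\z/i

# Hypothetical helper: return the yearly feed name expected to hold `cve_id`.
# `feed_names` stands in for the result of #available_feeds.
def feed_for(cve_id, feed_names)
  match = CVE_ID.match(cve_id)
  raise ArgumentError, 'bad CVE name' if match.nil?

  year = match[1]
  # The CVE-2002 feed also contains the CVEs from 1999 to 2001.
  return 'CVE-2002' if ('1999'..'2001').cover?(year)

  name = "CVE-#{year}"
  raise ArgumentError, "bad CVE year in #{cve_id}" unless feed_names.include?(name)

  name
end

names = ['CVE-Modified', 'CVE-Recent', 'CVE-2017', 'CVE-2014', 'CVE-2002']
puts feed_for('cve-2014-0001', names) # prints CVE-2014
puts feed_for('CVE-1999-0001', names) # prints CVE-2002
```

Since the ID match is case insensitive, "cve-2014-0001" and "CVE-2014-0001" resolve to the same feed.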

#feeds ⇒ Array<Feed> #feeds(feed) ⇒ Feed #feeds(feed_arr) ⇒ Array<Feed> #feeds(feed, *) ⇒ Array<Feed>

Return feeds. Can only be called after #scrap.

Examples:

scraper.feeds # => all feeds
scraper.feeds('CVE-2010') # => return only CVE-2010 feed
scraper.feeds("CVE-2005", "CVE-2002") # => return CVE-2005 and CVE-2002 feeds

Overloads:

  • #feeds ⇒ Array<Feed>

    All the feeds.

    Returns:

    • (Array<Feed>)

      Attributes of all feeds. It’s an array of Feed objects.

  • #feeds(feed) ⇒ Feed

    One feed.

    Parameters:

    • feed (String)

      Feed name as written on the NVD website. Names can be obtained with #available_feeds.

    Returns:

    • (Feed)

      Attributes of one feed. It’s a Feed object.

  • #feeds(feed_arr) ⇒ Array<Feed>

    An array of feeds.

    Parameters:

    • feed_arr (Array<String>)

      An array of feed names as written on the NVD website. Names can be obtained with #available_feeds.

    Returns:

    • (Array<Feed>)

      Attributes of the feeds. It’s an array of Feed objects.

  • #feeds(feed, *) ⇒ Array<Feed>

    Multiple feeds.

    Parameters:

    • feed (String)

      Feed name as written on the NVD website. Names can be obtained with #available_feeds.

    • * (String)

      As many feeds as you want.

    Returns:

    • (Array<Feed>)

      Attributes of the feeds. It’s an array of Feed objects.




# File 'lib/nvd_feed_api.rb', line 396

def feeds(*arg_feeds)
  raise 'call scrap method before using feeds method' if @feeds.nil?
  return_value = nil
  if arg_feeds.empty?
    return_value = @feeds
  elsif arg_feeds.length == 1
    if arg_feeds[0].is_a?(String)
      @feeds.each do |feed| # feed is an object
        return_value = feed if arg_feeds.include?(feed.name)
      end
      # if nothing found return nil
    elsif arg_feeds[0].is_a?(Array)
      raise 'one of the provided arguments is not a String' unless arg_feeds[0].all? { |x| x.is_a?(String) }
      # Sorting CVE can allow us to parse quicker
      # Upcase to be sure include? works
      # Does not use map(&:upcase) to preserve CVE-Recent and CVE-Modified
      feeds_to_find = arg_feeds[0].map { |x| x[0..2].upcase.concat(x[3..x.size]) }.sort
      matched_feeds = []
      @feeds.each do |feed| # feed is an object
        if feeds_to_find.include?(feed.name)
          matched_feeds.push(feed)
          feeds_to_find.delete(feed.name)
        elsif feeds_to_find.empty?
          break
        end
      end
      return_value = matched_feeds
      raise "#{feeds_to_find.join(', ')} are unexisting feeds" unless feeds_to_find.empty?
    else
      raise "the provided argument (#{arg_feeds[0]}) is neither a String nor an Array"
    end
  else
    # Overloading a list of arguments as one array argument
    return_value = feeds(arg_feeds)
  end
  return return_value
end
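
In the Array branch above, feed names are normalized by upcasing only the first three characters, so that "CVE-Recent" and "CVE-Modified" keep their mixed case. A small sketch of that normalization follows. normalize_feed_name is a hypothetical helper name, and unlike the listing it tolerates names shorter than three characters.

```ruby
# Hypothetical helper: upcase only the "CVE" prefix so that "CVE-Recent"
# and "CVE-Modified" keep their mixed case.
def normalize_feed_name(name)
  # name[3..-1] is nil for names shorter than 3 characters, hence to_s.
  name[0..2].upcase + name[3..-1].to_s
end

puts normalize_feed_name('cve-2010')   # prints CVE-2010
puts normalize_feed_name('cve-Recent') # prints CVE-Recent
```

Note that, in the listing, only the Array and multi-argument overloads apply this normalization; the single-String overload compares the name as given and returns nil when nothing matches.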

#scrap ⇒ Integer

Note:

#scrap needs to be called only once, but it can be called again to refresh the feed list if the NVD feed page has changed.

Scrape (parse) the website to get the feeds and fill the #feeds attribute.

Returns:

  • (Integer)

    0 when there is no error.



# File 'lib/nvd_feed_api.rb', line 358

def scrap
  uri = URI(@url)
  html = Net::HTTP.get(uri)

  doc = Nokogiri::HTML(html)
  @feeds = []
  doc.css('h3#JSON_FEED ~ div.row:first-of-type table.xml-feed-table > tbody > tr[data-testid*=desc]').each do |tr|
    name = tr.css('td')[0].text
    updated = tr.css('td')[1].text
    meta = tr.css('td')[2].css('> a').attr('href').value
    gz = tr.css('+ tr > td > a').attr('href').value
    zip = tr.css('+ tr + tr > td > a').attr('href').value
    @feeds.push(Feed.new(name, updated, meta, gz, zip))
  end
end

#update_feeds(feed) ⇒ Boolean #update_feeds(feed_arr) ⇒ Array<Boolean> #update_feeds(feed, *) ⇒ Array<Boolean>

Update the feeds.

Examples:

s = NVDFeedScraper.new
s.scrap
f2015, f2017 = s.feeds("CVE-2015", "CVE-2017")
s.update_feeds(f2015, f2017) # => [false, false]

Overloads:

  • #update_feeds(feed) ⇒ Boolean

    One feed.

    Parameters:

    • feed (Feed)

      feed object to update.

    Returns:

    • (Boolean)

      true if the feed was updated, false if it wasn’t.

  • #update_feeds(feed_arr) ⇒ Array<Boolean>

    An array of feeds.

    Parameters:

    • feed_arr (Array<Feed>)

      array of feed objects to update.

    Returns:

    • (Array<Boolean>)

      One Boolean per feed: true if the feed was updated, false if it wasn’t.

  • #update_feeds(feed, *) ⇒ Array<Boolean>

    Multiple feeds.

    Parameters:

    • feed (Feed)

      feed object to update.

    • * (Feed)

      As many feed objects as you want.

    Returns:

    • (Array<Boolean>)

      One Boolean per feed: true if the feed was updated, false if it wasn’t.



# File 'lib/nvd_feed_api.rb', line 558

def update_feeds(*arg_feed)
  return_value = false
  raise 'no argument provided, 1 or more expected' if arg_feed.empty?
  scrap
  if arg_feed.length == 1
    if arg_feed[0].is_a?(Feed)
      new_feed = feeds(arg_feed[0].name)
      # update attributes
      if arg_feed[0].updated != new_feed.updated
        arg_feed[0].name = new_feed.name
        arg_feed[0].updated = new_feed.updated
        arg_feed[0].meta_url = new_feed.meta_url
        arg_feed[0].gz_url = new_feed.gz_url
        arg_feed[0].zip_url = new_feed.zip_url
        # update if @meta was set
        arg_feed[0].meta_pull unless arg_feed[0].meta.nil?
        # update if @json_file was set
        arg_feed[0].json_pull unless arg_feed[0].json_file.nil?
        return_value = true
      end
    elsif arg_feed[0].is_a?(Array)
      return_value = []
      arg_feed[0].each do |f|
        res = update_feeds(f)
        puts "#{f} not found" if res.nil?
        return_value.push(res)
      end
    else
      raise "the provided argument #{arg_feed[0]} is not a Feed or an Array"
    end
  else
    # Overloading a list of arguments as one array argument
    return_value = update_feeds(arg_feed)
  end
  return return_value
end
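
The core of the single-Feed branch above is a comparison of the "updated" attribute followed by an attribute copy. The sketch below shows that decision with a stand-in Struct so it can run offline. FeedInfo and refresh! are hypothetical names, the attribute values are placeholders, and the re-download of meta and JSON files done by the real method is left out.

```ruby
# Stand-in for NVDFeedScraper::Feed with only the attributes that
# #update_feeds copies.
FeedInfo = Struct.new(:name, :updated, :meta_url, :gz_url, :zip_url)

# Hypothetical helper: copy every attribute from `fresh` into `old` when the
# "updated" value differs. Returns true if a copy happened, false otherwise.
def refresh!(old, fresh)
  return false if old.updated == fresh.updated

  old.members.each { |attr| old[attr] = fresh[attr] }
  true
end

# Placeholder values, not real NVD timestamps or URLs.
old   = FeedInfo.new('CVE-2017', 'timestamp-1', 'meta-1', 'gz-1', 'zip-1')
same  = old.dup
fresh = FeedInfo.new('CVE-2017', 'timestamp-2', 'meta-2', 'gz-2', 'zip-2')

puts refresh!(old, same)  # prints false
puts refresh!(old, fresh) # prints true
puts old.updated          # prints timestamp-2
```

Mutating the caller's object in place, as the listing does, means references to a Feed held elsewhere see the refreshed attributes without being reassigned.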