Class: LinkedinData
- Inherits: Object
- Defined in: lib/linkedindata.rb
Instance Method Summary
- #examine(page) ⇒ Object
  Examines a search page.
- #getData ⇒ Object
  Gets all data and returns it in JSON.
- #initialize(input, todegree) ⇒ LinkedinData (constructor)
  A new instance of LinkedinData.
- #relScore(data) ⇒ Object
  Adds a score to each profile based on the number of times it appears in “people also viewed”.
- #scrape(url, curhops) ⇒ Object
  Scrapes a profile.
- #search ⇒ Object
  Searches for profiles on Google.
- #showAllKeys(data) ⇒ Object
  Ensures every key that appears in any item appears in each item (even if nil).
Constructor Details
#initialize(input, todegree) ⇒ LinkedinData
Returns a new instance of LinkedinData.
# File 'lib/linkedindata.rb', line 12
def initialize(input, todegree)
  @input = input
  @output = Array.new
  @startindex = 10
  @numhops = todegree
end
Instance Method Details
#examine(page) ⇒ Object
Examines a search page
# File 'lib/linkedindata.rb', line 30
def examine(page)
  # Separate getting profile links and going to next page
  # Method for getting links to all result pages
  # Different method for getting all profile links on page and scraping (split to new thread for this)
  # Has own output set, merge into full one at end (make sure threadsafe)
  # Have own input and output
  page.links.each do |link|
    if (link.href.include? "linkedin.com") && (!link.href.include? "webcache") && (!link.href.include? "site:linkedin.com/pub+")
      saveurl = link.href.split("?q=")

      if saveurl[1]
        url = saveurl[1].split("&")
        begin
          scrape(url[0], 0)
        rescue
        end
      end
    end

    # Find the link to the next page and go to it
    if (link.href.include? "&sa=N") && (link.href.include? "&start=")
      url1 = link.href.split("&start=")
      url2 = url1[1].split("&sa=N")

      if url2[0].to_i == @startindex
        sleep(rand(30..90))
        @startindex += 10
        agent = Mechanize.new
        examine(agent.get("http://google.com" + link.href))
      end
    end
  end
end
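The two string manipulations in examine can be tried standalone: extracting the profile URL from a Google redirect link, and the result offset from a pagination link. A minimal sketch (both URLs below are made-up examples):

```ruby
# Hypothetical Google redirect link of the kind examine sees in search results.
href = "/url?q=http://www.linkedin.com/pub/jane-doe/1/2b3/4a5&sa=U&ved=abc"

# Profile extraction: take what follows "?q=" and drop the trailing parameters.
saveurl = href.split("?q=")
url = saveurl[1].split("&")
profile_url = url[0]

# Pagination: pull the result offset out of a hypothetical "next page" link.
nav = "/search?q=site:linkedin.com/pub+jane&start=10&sa=N"
start_index = nav.split("&start=")[1].split("&sa=N")[0].to_i
```

examine compares that offset against @startindex before sleeping and following the link, which keeps it from revisiting result pages.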
#getData ⇒ Object
Gets all data and returns it in JSON.
# File 'lib/linkedindata.rb', line 135
def getData
  search

  # Get related profiles
  @numhops.times do
    @output.each do |o|
      if o[:degree] < @numhops
        if o[:related_people]
          o[:related_people].each do |i|
            if @output.select { |obj| obj[:name] == i[:name] }.empty?
              scrape(i[:url], o[:degree]+1)
            end
          end
        end
      end
    end
  end

  formatted_json = JSON.pretty_generate(relScore(showAllKeys(@output)))
  return formatted_json
end
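The guard in getData's hop loop, which prevents re-scraping someone already collected, can be exercised on plain hashes. A sketch with made-up names (no scraping involved):

```ruby
output = [
  { :name => "Jane Doe", :degree => 0 },
  { :name => "John Roe", :degree => 1 }
]

seen     = { :name => "Jane Doe",  :url => "http://example.com/jane" }
newcomer = { :name => "Alice Poe", :url => "http://example.com/alice" }

# Mirrors getData's check: scrape only when nobody in output has this name.
skip_seen       = !output.select { |obj| obj[:name] == seen[:name] }.empty?
scrape_newcomer =  output.select { |obj| obj[:name] == newcomer[:name] }.empty?
```

Note that matching on :name alone means two distinct people with the same name would be deduplicated together; matching on the profile URL would be stricter.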
#relScore(data) ⇒ Object
Adds a score to each profile based on the number of times it appears in “people also viewed”.
# File 'lib/linkedindata.rb', line 105
def relScore(data)
  # Make list of profiles
  profiles = Hash.new
  data.each do |d|
    profiles[d["profile_url"]] = 0
  end

  # Get score for each profile
  data.each do |i|
    if i["related_people"]
      i["related_people"].each do |p|
        if profiles[p["url"]]
          # Each appearance is worth 2 / (degree * 2), except at degree 0, where it is worth 2
          degree_divide = i["degree"] == 0 ? 1 : i["degree"]*2
          profiles[p["url"]] += (2.0/degree_divide)
        end
      end
    end
  end

  # Merge scores back into dataset
  data.each do |m|
    m.merge!(:score => profiles[m["profile_url"]])
  end

  return data
end
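The scoring rule can be checked in isolation: an appearance in a degree-d profile's “people also viewed” list contributes 2.0 / (d * 2), except at degree 0, which contributes 2.0. A condensed sketch of the same loop over made-up data:

```ruby
data = [
  { "profile_url" => "http://example.com/a", "degree" => 0,
    "related_people" => [{ "url" => "http://example.com/b" }] },
  { "profile_url" => "http://example.com/b", "degree" => 1,
    "related_people" => [{ "url" => "http://example.com/a" }] }
]

profiles = Hash.new
data.each { |d| profiles[d["profile_url"]] = 0 }

data.each do |i|
  (i["related_people"] || []).each do |p|
    next unless profiles[p["url"]]
    degree_divide = i["degree"] == 0 ? 1 : i["degree"] * 2
    profiles[p["url"]] += (2.0 / degree_divide)
  end
end
# b is viewed from a degree-0 profile (worth 2.0); a from a degree-1 profile (worth 1.0)
```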
#scrape(url, curhops) ⇒ Object
Scrapes a profile.
# File 'lib/linkedindata.rb', line 66
def scrape(url, curhops)
  # Download profile and rescue on error
  begin
    url.gsub!("https", "http")
    profile = Linkedin::Profile.get_profile(url)
  rescue
  end

  # Parse profile if returned
  if profile
    p = ParseProfile.new(profile, url, curhops)
    @output.concat(p.parse)
  end
end
#search ⇒ Object
Searches for profiles on Google
# File 'lib/linkedindata.rb', line 20
def search
  agent = Mechanize.new
  agent.user_agent_alias = 'Linux Firefox'
  gform = agent.get("http://google.com").form("f")
  gform.q = "site:linkedin.com/pub " + @input
  page = agent.submit(gform, gform.buttons.first)
  examine(page)
end
#showAllKeys(data) ⇒ Object
Ensures every key that appears in any item appears in each item (even if nil).
# File 'lib/linkedindata.rb', line 82
def showAllKeys(data)
  # Get all keys
  fields = Set.new
  data.map { |o| fields.merge(o.keys) }

  # Make sure all items have all keys
  datarr = Array.new
  data.each do |d|
    temphash = Hash.new
    fields.each do |f|
      if !d[f]
        temphash[f] = nil
      else
        temphash[f] = d[f]
      end
    end
    datarr.push(temphash)
  end

  return datarr
end
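The normalization can be demonstrated on plain hashes; this condensed sketch (made-up records) produces the same shape of result as showAllKeys:

```ruby
require 'set'

data = [
  { "name" => "Jane", "title" => "Engineer" },
  { "name" => "John" }
]

# Collect every key that appears in any record.
fields = Set.new
data.each { |o| fields.merge(o.keys) }

# Rebuild each record so it carries every key, filling gaps with nil.
normalized = data.map do |d|
  fields.each_with_object({}) { |f, h| h[f] = d[f] }
end
```

Having a uniform key set in every record keeps the final JSON from JSON.pretty_generate consistent across profiles.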