web scraping - Scrape a Google Chart script with Scraperwiki (Python)

- July 15, 2010

I'm getting into scraping with Scraperwiki in Python. I have already figured out how to scrape tables from a page, run the scraper every month, and save the results on top of each other. Pretty cool.

Now I want to scrape a page with information on Android versions and run the script monthly. In particular, I want the table with the version, codename, API level, and distribution. It's not easy.

The table is loaded through a wrapper div. Is there a way to scrape this information? I can't find a solution.

Plan B is to scrape the visualisation. All I really need is the codename and percentage, so that would be sufficient. This information can be found in the HTML, in a Google Chart script.

[Image: Google Chart API script]

But I can't find this information in the 'souped' HTML. I have a public scraper on Scraperwiki. You can edit it to make it work.

Can anyone explain how I can approach this problem? A working scraper with comments on what's going on would be awesome.

This is a difficult case because, as Kisamoto mentioned, the data is inside embedded JavaScript and not in a separate JSON file as you would expect. It is possible with BeautifulSoup, but it involves some ugly string processing:

    import json

    # `soup` is the BeautifulSoup object for the downloaded page
    # the script tag follows the last <p style="clear:both"> on the page
    last_paragraph = soup.find_all('p', style='clear:both')[-1]
    script_tag = last_paragraph.next_sibling.next_sibling
    script_text = script_tag.text

    # collect every line up to the one that mentions screen_data
    lines = script_text.split('\n')
    data_text = ''
    for line in lines:
        if 'screen_data' in line:
            break
        data_text = data_text + line

    data_text = data_text.replace('var version_data =', '')
    # delete the semicolon at the end
    data_text = data_text[:-1]

    data = json.loads(data_text)
    data = data[0]
    print(data['data'])

Output:

[{u'perc': u'0.1', u'api': 4, u'name': u'donut'}, ... ]
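If the line-by-line string handling above feels too fragile, one possible alternative is to locate the variable assignment with a regular expression and let the `json` module parse the value that follows it. The sketch below is only an illustration under some assumptions: `SAMPLE_SCRIPT` is a made-up stand-in for the real script text (the second entry and the chart URL are invented), the helper names `extract_js_value` and `codename_percentages` are hypothetical, and it only works as long as the JavaScript literal happens to be valid JSON, which the approach above also relies on.

```python
import json
import re

# Made-up stand-in for the text of the inline <script> tag.
# Only the "donut" entry mirrors the output shown above; the rest is invented.
SAMPLE_SCRIPT = """
var version_data =
[
  {
    "chart": "//chart.example/placeholder",
    "data": [
      {"api": 4, "name": "donut", "perc": "0.1"},
      {"api": 99, "name": "example", "perc": "12.5"}
    ]
  }
];
var screen_data = [];
"""


def extract_js_value(script_text, var_name):
    """Parse the JSON-compatible value assigned to `var_name` (hypothetical helper)."""
    # Match "var <name> =" plus any whitespace, so the match ends at the value itself.
    match = re.search(r'var\s+' + re.escape(var_name) + r'\s*=\s*', script_text)
    if match is None:
        raise ValueError('variable %r not found' % var_name)
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the semicolon, the next variable, ...).
    value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return value


def codename_percentages(version_data):
    """Map each codename to its percentage as a float (hypothetical helper)."""
    return {row['name']: float(row['perc']) for row in version_data[0]['data']}


versions = extract_js_value(SAMPLE_SCRIPT, 'version_data')
print(codename_percentages(versions))
```

In a real scraper, `script_text` from the snippet above would take the place of `SAMPLE_SCRIPT`. Because `raw_decode` stops at the end of the first JSON value, there is no need to search for `screen_data` or to trim the trailing semicolon by hand.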
