web scraping - Scrape a Google Chart script with ScraperWiki (Python)

- July 15, 2010

I'm getting started with scraping using ScraperWiki in Python. I've figured out how to scrape tables from a page, run the scraper every month, and save the results on top of each other. Pretty cool.

Now I want to scrape a page with information on Android versions and run the script monthly. In particular, I want the table with version, codename, API level, and distribution. It's not easy.

The table is called with a wrapper div. Is there a way to scrape this information? I can't find a solution.

Plan B is to scrape the visualisation. All I need is the codename and the percentage, so that would be sufficient. This information can be found in the HTML, inside the Google Chart script.

Google Chart API script

But I can't find this information in the 'souped' HTML. I have a public scraper on here. You can edit it to make it work.

Can anyone explain how I can approach this problem? A working scraper with comments on what's going on would be awesome.

This is a difficult case because, as kisamoto mentioned, the data is inside embedded JavaScript and not in a separate JSON file as you might expect. It is possible with BeautifulSoup, but it involves some ugly string processing:

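import json

# `soup` is the BeautifulSoup object for the fetched page.
# The chart script follows the last <p style="clear:both"> on the page.
last_paragraph = soup.find_all('p', style='clear:both')[-1]
script_tag = last_paragraph.next_sibling.next_sibling
script_text = script_tag.text

# Collect the lines of the version_data assignment, stopping at screen_data.
lines = script_text.split('\n')
data_text = ''
for line in lines:
    if 'screen_data' in line:
        break
    data_text = data_text + line

data_text = data_text.replace('var version_data =', '')
# delete the semicolon at the end
data_text = data_text[:-1]

data = json.loads(data_text)
data = data[0]
print(data['data'])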

Output:

[{u'perc': u'0.1', u'api': 4, u'name': u'donut'}, ... ]
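If the line-by-line slicing feels too fragile, a slightly more robust variant of the same idea is sketched below. It locates the `var version_data =` assignment with a regular expression and lets `json.JSONDecoder.raw_decode` parse just the one value that follows, so the trailing semicolon and the later `screen_data` variable need no special handling. The helper name `extract_js_var` and the `SAMPLE_SCRIPT` text are made up for illustration; only the first data row mirrors the output above. It also assumes the JavaScript literal is valid JSON, which the `json.loads` call above already relies on.

```python
import json
import re


def extract_js_var(script_text, var_name):
    """Return the JSON value assigned to `var <var_name> = ...` in a script."""
    match = re.search(r'var\s+' + re.escape(var_name) + r'\s*=\s*', script_text)
    if match is None:
        raise ValueError('variable %r not found' % var_name)
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the semicolon, other variables, ...).
    value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return value


# Made-up stand-in for the page's script text; only the first entry
# mirrors the output shown above, the second is a placeholder.
SAMPLE_SCRIPT = """
var version_data =
[
  {
    "data": [
      {"api": 4, "name": "donut", "perc": "0.1"},
      {"api": 99, "name": "placeholder", "perc": "1.0"}
    ]
  }
];
var screen_data = [];
"""

version_data = extract_js_var(SAMPLE_SCRIPT, 'version_data')
rows = version_data[0]['data']
for row in rows:
    print(row['name'], row['perc'])
```

In a real scraper, `script_text` from the snippet above would take the place of `SAMPLE_SCRIPT`. If the page ever switches to a literal that is not strict JSON (single quotes, unquoted keys), this approach would fail and a JavaScript-aware parser would be needed.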
