web scraping - Scrape a Google Chart script with ScraperWiki (Python)
I'm getting into scraping with ScraperWiki in Python. I have figured out how to scrape tables from a page, run the scraper every month, and save the results on top of each other. Pretty cool.
Now I want to scrape a page with information on Android versions and run the script monthly. In particular, I want the table with version, codename, API, and distribution. It's not easy.
The table sits in a div called wrapper. Is there a way to scrape this information? I can't find a solution.
Plan B is to scrape the visualisation. All I need is the codename and the percentage; that's sufficient. This information can be found in the HTML, in a Google Chart script.
But I can't find this information in the 'souped' HTML. I have a public scraper on here. You can edit it to make it work.
Can anyone explain how I can approach this problem? A working scraper with comments on what's going on would be awesome.
This is a difficult case because, as kisamoto mentioned, the data is inside embedded JavaScript and not in a separate JSON file as you might expect. It is possible with BeautifulSoup, but it involves some ugly string processing:
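    import json

    # the chart script comes after the last <p style="clear:both"> paragraph
    last_paragraph = soup.find_all('p', style='clear:both')[-1]
    script_tag = last_paragraph.next_sibling.next_sibling
    script_text = script_tag.text

    # collect the lines of the version_data assignment, stopping at screen_data
    lines = script_text.split('\n')
    data_text = ''
    for line in lines:
        if 'screen_data' in line:
            break
        data_text = data_text + line

    # strip the JavaScript assignment so that only the JSON array is left
    data_text = data_text.replace('var version_data =', '')
    # delete the semicolon at the end
    data_text = data_text[:-1]

    data = json.loads(data_text)
    data = data[0]
    print(data['data'])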
output:
[{u'perc': u'0.1', u'api': 4, u'name': u'donut'}, ... ]
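If the line-by-line slicing feels too brittle, the same idea can be sketched with the standard library's JSON decoder doing the work of finding where the array ends. This is only a rough sketch under a few assumptions: SAMPLE_SCRIPT is a made-up stand-in for the text of the script tag (only the donut row comes from the output above, the "chart" value is a placeholder), extract_js_array is a hypothetical helper name, and the value assigned to the variable has to be valid JSON, which the json.loads call above suggests it is.

```python
import json

# Made-up stand-in for script_tag.text; only the donut row is taken from
# the real output, the "chart" value is a placeholder.
SAMPLE_SCRIPT = '''
var version_data =
[
  {
    "chart": "placeholder",
    "data": [
      {"api": 4, "name": "donut", "perc": "0.1"}
    ]
  }
];
var screen_data = {};
'''


def extract_js_array(script_text, var_name):
    """Return the JSON value assigned to `var <var_name> =` in script_text.

    Hypothetical helper: assumes the assigned value is valid JSON.
    """
    marker = 'var %s =' % var_name
    start = script_text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so step over it by hand
    while script_text[start].isspace():
        start += 1
    # raw_decode parses one JSON value and ignores whatever follows it,
    # so the trailing semicolon and later variables need no special handling
    value, _end = json.JSONDecoder().raw_decode(script_text, start)
    return value


rows = extract_js_array(SAMPLE_SCRIPT, 'version_data')[0]['data']
# keep just the codename and the percentage, as asked for in the question
pairs = [(row['name'], float(row['perc'])) for row in rows]
print(pairs)
```

In the real scraper you would pass script_tag.text instead of SAMPLE_SCRIPT. The upside of this variant is that it does not depend on screen_data coming right after version_data, or on the semicolon being the very last character.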