C# - Parsing results from HtmlAgilityPack
I'm trying to parse a Yahoo Finance page for a list of stock symbols and company names. The URL I'm using is: http://uk.finance.yahoo.com/q/cp?s=%5eftse
The code I'm using is:
    HtmlAgilityPack.HtmlDocument page = new HtmlWeb().Load("http://uk.finance.yahoo.com/q/cp?s=%5eftse");
    var titles = page.DocumentNode.SelectNodes("//td[@class='yfnc_tabledata1']");
    // Returns the titles on the home page of the site in an array.
    foreach (var title in titles)
    {
        txtLog.AppendText(title.InnerHtml + System.Environment.NewLine);
    }
The txtLog.AppendText line is just me testing. The code correctly gets each td node that has the class yfnc_tabledata1. Inside the foreach loop I need to parse title to grab the symbol and company name from the following HTML:
<b><a href="/q?s=glen.l">glen.l</a></b> glencore xstrat <b>343.95</b> <nobr><small>3 may 16:35</small></nobr> <img width="10" height="14" style="margin-right:-2px;" border="0" src="http://l.yimg.com/os/mit/media/m/base/images/transparent-1093278.png" class="pos_arrow" alt="up"> <b style="color:#008800;">12.80</b> <b style="color:#008800;"> (3.87%)</b> 68,086,160
Is it possible to parse the results of an already parsed document? I'm a little unsure where to start.
You need to continue the XPath extraction work from where you are. There are many possibilities. The difficulty is that the yfnc_tabledata1 nodes are all at the same level. Here is how you can do it (in a console app example that dumps the list of symbols and companies):
    HtmlAgilityPack.HtmlDocument page = new HtmlWeb().Load("http://uk.finance.yahoo.com/q/cp?s=%5eftse");

    // Get the symbols directly under the first td element: recursively search
    // for an a element that has an href attribute under this td.
    var symbols = page.DocumentNode.SelectNodes("//td[@class='yfnc_tabledata1']//a[@href]");
    foreach (var symbol in symbols)
    {
        // From the current a element, go up two levels and get the next td element.
        var company = symbol.SelectSingleNode("../../following-sibling::td").InnerText.Trim();
        Console.WriteLine(symbol.InnerText + ": " + company);
    }
More on XPath axes here: XPath Axes
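As a variation on the approach above, here is a minimal sketch that runs offline against a hard-coded row and guards against missing nodes, since SelectNodes returns null instead of an empty collection when nothing matches. The SymbolParser class, its Parse method, and the sample markup are hypothetical names and data made up for illustration; the row layout (symbol link in one td, company name in the next td) is an assumption based on the HTML quoted in the question and may not match the live page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SymbolParser
{
    // Hypothetical helper: returns (symbol, company) pairs found in the given HTML.
    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//td[@class='yfnc_tabledata1']//a[@href]");
        if (links == null)
        {
            return result;
        }

        foreach (var link in links)
        {
            // Climb from the a element to its nearest enclosing td,
            // then take the first td that follows it in the same row.
            var companyCell = link.SelectSingleNode("ancestor::td[1]/following-sibling::td[1]");
            if (companyCell == null)
            {
                continue; // No company cell next to this symbol; skip it.
            }

            result.Add(new KeyValuePair<string, string>(
                link.InnerText.Trim(),
                companyCell.InnerText.Trim()));
        }

        return result;
    }

    public static void Main()
    {
        // Assumed row layout, modelled on the HTML in the question.
        const string sample =
            "<table><tr>" +
            "<td class='yfnc_tabledata1'><b><a href='/q?s=glen.l'>glen.l</a></b></td>" +
            "<td class='yfnc_tabledata1'>glencore xstrat</td>" +
            "<td class='yfnc_tabledata1'><b>343.95</b></td>" +
            "</tr></table>";

        foreach (var pair in Parse(sample))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

Using ancestor::td[1] instead of ../.. means the lookup does not depend on how many wrapper elements (such as b) sit between the link and its cell, which may make it a little more tolerant of markup changes.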