Convert a string that contains HTML to sentences and also keep the separator using JavaScript
I have this string, which contains HTML:
first sentence. here <a href="http://google.com">google</a> link in second sentence! third sentence might contain image <img src="http://link.to.image.com/hello.png" /> , ends !? last sentence looks <b>this</b>??
I want to split the string into sentences (an array), keeping both the HTML and the separators, like this:
[0] = first sentence.
[1] = here <a href="http://google.com">google</a> link in second sentence!
[2] = third sentence might contain image <img src="http://link.to.image.com/hello.png" /> , ends !?
[3] = last sentence looks <b>this</b>??
Can anybody suggest a way to do this, please? Maybe using a regex and match?
This is close to what I'm after, but it does not handle the HTML bits: JavaScript split regular expression keep delimiter
The easy part is the parsing; you can do that by wrapping an element around the string. Splitting the sentences is more intricate; this is my first stab at it:
    var s = 'first sentence. here <a href="http://google.com">google.</a> link in second sentence! third sentence might contain image <img src="http://link.to.image.com/hello.png" /> , ends !? last sentence looks <b>this</b>??';

    // Let the browser parse the HTML by wrapping it in a detached element.
    var wrapper = document.createElement('div');
    wrapper.innerHTML = s;

    var sentences = [],
        buffer = [],
        re = /[^.!?]+[.!?]+/g,
        match;

    [].forEach.call(wrapper.childNodes, function (node) {
      if (node.nodeType == 1) {
        buffer.push(node.outerHTML); // element node: save its HTML
      } else if (node.nodeType == 3) {
        var str = node.textContent;
        // text node: shift complete sentences off the front
        while ((match = re.exec(str)) !== null) {
          sentences.push(buffer.join('') + match[0]);
          buffer = [];
          str = str.slice(re.lastIndex + 1); // also skips the space after the sentence
          re.lastIndex = 0; // reset regexp
        }
        buffer.push(str); // unfinished remainder waits for the next node
      }
    });

    // Flush whatever is left as the last sentence.
    if (buffer.length) {
      sentences.push(buffer.join(''));
    }

    console.log(sentences);
Every node that's either an element or an unfinished sentence gets added to a buffer until a full sentence is found; the buffered content is then prepended to that sentence and the result is pushed onto the array.
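If no DOM is available (for example outside the browser), the same buffering idea can be sketched with regular expressions alone. This is only a rough illustration, not the approach above: splitHtmlSentences is a hypothetical helper name, and it assumes simple, well-formed markup with no ">" inside attribute values and no stray "<" in the text. Text inside an element is never split, which mimics how outerHTML keeps an element in one piece.

```javascript
// Hypothetical helper (an assumption, not part of the original answer):
// splits an HTML string into sentences without using the DOM.
function splitHtmlSentences(html) {
  // Tokenize into tags ("<...>") and the runs of text between them.
  var tokens = html.match(/<[^>]+>|[^<]+/g) || [];
  var voidTag = /^<(?:img|br|hr|input)\b/i; // assumed list of void elements
  var sentences = [];
  var buffer = '';
  var depth = 0; // how many elements are currently open

  tokens.forEach(function (token) {
    if (token.charAt(0) === '<') {
      buffer += token; // tags are kept verbatim and never end a sentence
      if (token.charAt(1) === '/') {
        depth = Math.max(0, depth - 1); // closing tag
      } else if (token.charAt(1) !== '!' && !/\/>$/.test(token) && !voidTag.test(token)) {
        depth += 1; // opening tag of a non-void element
      }
      return;
    }
    if (depth > 0) {
      buffer += token; // text inside an element is not split
      return;
    }
    // Top-level text: cut after each run of sentence terminators.
    var re = /[^.!?]*[.!?]+/g;
    var last = 0;
    var match;
    while ((match = re.exec(token)) !== null) {
      sentences.push((buffer + match[0]).trim());
      buffer = '';
      last = re.lastIndex;
    }
    buffer += token.slice(last); // unfinished remainder waits for more input
  });

  if (buffer.trim()) {
    sentences.push(buffer.trim()); // flush the last, unterminated piece
  }
  return sentences;
}

var sample = 'first sentence. here <a href="http://google.com">google</a> link in second sentence! third sentence might contain image <img src="http://link.to.image.com/hello.png" /> , ends !? last sentence looks <b>this</b>??';
console.log(splitHtmlSentences(sample));
```

For the sample string from the question this should give the four entries listed there. Unlike the DOM version it trims the whitespace around each sentence, and it will misbehave on markup that breaks the assumptions above, so the DOM approach is the safer choice in a browser.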