clojure - How do I write an Enlive selector to return "clusters" of tags?
I'm writing Clojure code using Enlive to process a set of XML documents. They're in an XML format that borrows heavily from HTML but adds some custom tags, and my job is to convert them to real HTML. The custom tag that's bothering me right now is <tab>, which is being used in all kinds of places it shouldn't be. For example, it's used to make lists that should have been made with <ol> and <li>. Here's an example of the kind of thing I'm encountering:
<p class="normal">some text</p>
<p class="listwithtabs">(a)<tab />first list item</p>
<p class="listwithtabs">(b)<tab />second list item</p>
<p class="listwithtabs">(c)<tab />third list item</p>
<p class="normal">some more text</p>
<p class="anotherlist">1.<tab />another list</p>
<p class="anotherlist">2.<tab />with two items this time</p>
<p class="normal">some final text</p>
I want to turn that into:
<p class="normal">some text</p>
<ol type="a">
  <li class="listwithtabs">first list item</li>
  <li class="listwithtabs">second list item</li>
  <li class="listwithtabs">third list item</li>
</ol>
<p class="normal">some more text</p>
<ol type="1">
  <li class="anotherlist">another list</li>
  <li class="anotherlist">with two items this time</li>
</ol>
<p class="normal">some final text</p>
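As a rough sketch of the per-cluster rewrite (assuming a cluster of list paragraphs has already been found), the plain clojure.core code below turns one group of Enlive-style <p> nodes into an <ol>. It only assumes Enlive's node shape (elements are maps with :tag, :attrs and :content; text nodes are strings). The helper names tab?, p->li, list-type and cluster->ol are hypothetical, and guessing the type attribute from the first letter or digit of the marker is an assumption that only covers markers like "(a)" and "1.".

```clojure
;; Sketch only: plain clojure.core, no Enlive required. Nodes follow Enlive's
;; shape: elements are maps with :tag, :attrs and :content; text is a string.

(defn tab?
  "True if node is a <tab> element."
  [node]
  (and (map? node) (= :tab (:tag node))))

(defn p->li
  "Hypothetical helper: turns <p>(a)<tab/>text</p> into <li>text</li> by
  dropping the marker and the first <tab>; :attrs (the class) is kept."
  [p]
  (assoc p
         :tag :li
         :content (rest (drop-while (complement tab?) (:content p)))))

(defn list-type
  "Hypothetical helper: guesses the <ol> type from the marker text before
  the first <tab>: \"(a)\" gives \"a\", \"1.\" gives \"1\", otherwise nil."
  [p]
  (let [marker (first (:content p))]
    (when (string? marker)
      (re-find #"[A-Za-z0-9]" marker))))

(defn cluster->ol
  "Hypothetical helper: wraps one cluster of list paragraphs in an <ol>."
  [ps]
  {:tag :ol
   :attrs (if-let [t (list-type (first ps))] {:type t} {})
   :content (map p->li ps)})
```

The :attrs map is carried over unchanged by p->li, which is why each <li> keeps the class of the <p> it came from, matching the desired output.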
To do this, I need to select the <p> elements that contain <tab> descendants (easy with Enlive selectors), and then somehow cluster them according to the natural groupings they had in the original XML documents (much harder).
I've looked through the documents and determined that I can't rely on the class attribute: sometimes these <p>-that-should-be-<li> elements have the same class as the <p> elements around them, and sometimes there are two successive groups of <p>-that-should-be-<li> elements with the same class as each other (i.e., as if both clusters in the example I posted had the class listwithtabs). The one thing I think I can rely on is that there are never two different lists without at least one non-list element separating them. In other words, any cluster of successive <p> elements that all have the property "has at least one <tab> element as a descendant" is part of the same list.
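That rule maps fairly directly onto clojure.core's partition-by: split the parent's children into runs that agree on "has a <tab> descendant" and keep the runs where the answer is yes. A minimal sketch, assuming Enlive's map shape for nodes and assuming that only elements (not text nodes such as the whitespace between <p>s) count as separators; has-tab? and tab-clusters are hypothetical names, and has-tab? only approximates what (e/has [:tab]) tests.

```clojure
;; Sketch in plain clojure.core (no Enlive needed); nodes use Enlive's shape.

(defn has-tab?
  "True if node is an element with a <tab> element anywhere below it,
  roughly what the selector step (e/has [:tab]) tests."
  [node]
  (and (map? node)
       (boolean (some #(or (= :tab (:tag %)) (has-tab? %))
                      (filter map? (:content node))))))

(defn tab-clusters
  "Hypothetical helper: splits a parent's children into runs of consecutive
  has-tab? elements and returns only those runs, as a vector of vectors.
  Text nodes are dropped first, so only elements can separate two lists."
  [children]
  (->> children
       (filter map?)
       (partition-by has-tab?)
       (filter #(has-tab? (first %)))
       (mapv vec)))
```

In Enlive the children would be the :content of the common parent (for example the body). Since Enlive transformations are plain functions from a node to a node, a function that rewrites that parent's :content is one possible place to call something like this.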
With that in mind, I did some experimenting at the REPL, with Enlive loaded under the namespace alias e (that is, (require '[net.cgrand.enlive-html :as e]) should be assumed to be in effect for the rest of this question). It's easy to write a selector that picks out the elements I want: (e/select snippet [(e/has [:tab])]) returns a list (well, a lazy sequence) of five elements. But what I want is a list of lists: the first three elements, then the last two. Something vaguely like this (pardon the non-standard indentation):
[ [{:tag :p, :content (... "first list item" ...)}
   {:tag :p, :content (... "second list item" ...)}
   {:tag :p, :content (... "third list item" ...)} ] ; three items in the first list
  [{:tag :p, :content (... "another list" ...)}
   {:tag :p, :content (... "with two items this time" ...)} ] ; two items in the second list
]
I was able to create the following selectors:
(def first-of-tab-group [(e/has [:tab]) (e/left (complement (e/has [:tab])))])
(def rest-of-tab-group [(e/has [:tab]) (e/left (e/has [:tab]))])
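Outside of Enlive, the intent of these two selectors can be written as a fold over the sibling list: pair each element with its left neighbor, open a new group on a first-of-tab-group match, and append to the newest group on a rest-of-tab-group match. This plain clojure.core sketch does that; group-by-starts is a hypothetical name, and treating only element siblings as neighbors is an assumption about what "left" should mean here.

```clojure
;; Sketch in plain clojure.core; nodes use Enlive's map shape.

(defn has-tab?
  "True if node is an element with a <tab> element anywhere below it."
  [node]
  (and (map? node)
       (boolean (some #(or (= :tab (:tag %)) (has-tab? %))
                      (filter map? (:content node))))))

(defn group-by-starts
  "Hypothetical helper mirroring the two selectors. Each element is paired
  with its left element sibling: a has-tab? element whose left sibling is
  not has-tab? opens a new group (first-of-tab-group); one whose left
  sibling is has-tab? joins the newest group (rest-of-tab-group); any other
  element is skipped. Text nodes are ignored entirely."
  [children]
  (let [elems (filter map? children)]
    (reduce (fn [groups [left node]]
              (cond
                (not (has-tab? node)) groups
                (has-tab? left) (conj (pop groups) (conj (peek groups) node))
                :else (conj groups [node])))
            []
            (map vector (cons nil elems) elems))))
```

The first element is paired with nil, which has-tab? treats as "no tab", so a list at the very start of the parent still opens a group.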
But then I'm stuck. I'd like to do something like (e/select snippet [[(e/start-at first-of-tab-group) (e/take-while rest-of-tab-group)]]), but as far as I know Enlive doesn't have functions like start-at and take-while.
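While Enlive may not offer start-at and take-while as selector steps, clojure.core has sequence functions that behave that way on a plain list of sibling nodes: drop-while can play the role of start-at, and take-while (or split-with, which returns both halves at once) the role of take-while. A sketch under the same node-shape assumptions, working on the parent's element children rather than inside a selector; first-tab-cluster and all-tab-clusters are hypothetical names.

```clojure
;; Sketch in plain clojure.core; nodes use Enlive's map shape.

(defn has-tab?
  "True if node is an element with a <tab> element anywhere below it."
  [node]
  (and (map? node)
       (boolean (some #(or (= :tab (:tag %)) (has-tab? %))
                      (filter map? (:content node))))))

(defn first-tab-cluster
  "Hypothetical helper: the first run of consecutive has-tab? elements."
  [children]
  (->> (filter map? children)
       (drop-while (complement has-tab?))  ; plays the role of start-at
       (take-while has-tab?)))             ; plays the role of take-while

(defn all-tab-clusters
  "Hypothetical helper: repeats the same step on whatever follows each
  cluster, returning every cluster as a vector."
  [children]
  (loop [xs (filter map? children), acc []]
    (let [xs (drop-while (complement has-tab?) xs)]
      (if (empty? xs)
        acc
        (let [[cluster more] (split-with has-tab? xs)]
          (recur more (conj acc (vec cluster))))))))
```

Because split-with hands back the remainder after each cluster, the loop resumes just past it, so later clusters with identical properties are found separately instead of being merged into the first one.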
It feels like I'm close but missing one final key step. How do I take that last step? How do I select a "cluster" of elements that match some rules, while omitting other elements that match the same rules but aren't part of the first "cluster"?