python - Is there a better way to do csv/namedtuple with urlopen?
Using the namedtuple documentation example as my template in Python 3.3, I have the following code to download a CSV and turn it into a series of namedtuple subclass instances:
    from collections import namedtuple
    from csv import reader
    from urllib.request import urlopen

    SecurityType = namedtuple('SecurityType', 'sector, name')
    url = 'http://bsym.bloomberg.com/sym/pages/security_type.csv'
    for sec in map(SecurityType._make, reader(urlopen(url))):
        print(sec)
This raises the following exception:
    Traceback (most recent call last):
      File "scrap.py", line 9, in <module>
        for sec in map(SecurityType._make, reader(urlopen(url))):
    _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
I know the issue is that urlopen is returning bytes and not strings, and that I need to decode the output at some point. Here's how I'm doing it now, using StringIO:
    from collections import namedtuple
    from csv import reader
    from urllib.request import urlopen
    import io

    SecurityType = namedtuple('SecurityType', 'sector, name')
    url = 'http://bsym.bloomberg.com/sym/pages/security_type.csv'
    reader_input = io.StringIO(urlopen(url).read().decode('utf-8'))
    for sec in map(SecurityType._make, reader(reader_input)):
        print(sec)
This smells funny because I'm iterating over the bytes buffer, decoding, rebuffering, and then iterating over the new string buffer. Is there a more Pythonic way to do this without two iterations?
Use io.TextIOWrapper() to decode the urllib response:
    reader_input = io.TextIOWrapper(urlopen(url), encoding='utf8', newline='')
Now csv.reader is passed the exact same interface it would get when opening a regular file on the filesystem in text mode.
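To see the wrapping step in isolation, here is a minimal offline sketch. It assumes an io.BytesIO object as a stand-in for the HTTP response (any binary file-like object should behave the same way), and the sample rows are made up for illustration rather than taken from the real file.

```python
import csv
import io
from collections import namedtuple

SecurityType = namedtuple('SecurityType', 'sector, name')

# Stand-in for the urlopen() response: a binary stream of CSV data.
# The rows below are invented for this sketch.
raw = io.BytesIO(b'Sector A,Type 1\r\nSector B,"Type 2, with comma"\r\n')

# TextIOWrapper decodes the bytes incrementally as csv.reader asks for lines.
# newline='' leaves line endings untouched so the csv module can handle them.
text = io.TextIOWrapper(raw, encoding='utf-8', newline='')

records = [SecurityType._make(row) for row in csv.reader(text)]
print(records)
```

The whole body is never read and decoded up front; the wrapper pulls from the underlying stream as the reader iterates.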
With the TextIOWrapper change, your example URL works for me on Python 3.3.1:
    >>> for sec in map(SecurityType._make, reader(reader_input)):
    ...     print(sec)
    ...
    SecurityType(sector='market sector', name='security type')
    SecurityType(sector='comdty', name='calendar spread option')
    SecurityType(sector='comdty', name='financial commodity future.')
    SecurityType(sector='comdty', name='financial commodity generic.')
    SecurityType(sector='comdty', name='financial commodity option.')
    ...
    SecurityType(sector='muni', name='zero coupon, oid')
    SecurityType(sector='pfd', name='private')
    SecurityType(sector='pfd', name='public')
    SecurityType(sector='', name='')
    SecurityType(sector='', name='')
    SecurityType(sector='', name='')
    SecurityType(sector='', name='')
    SecurityType(sector='', name='')
    SecurityType(sector='', name='')
    SecurityType(sector='', name='')
    SecurityType(sector='', name='')
    SecurityType(sector='', name='')
The last lines appear to yield empty tuples; the original indeed has lines with nothing more than a comma on them.
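If those comma-only lines are unwanted, one option is to filter them out before building the tuples. This is only a sketch: the helper name non_empty is hypothetical, and the sample data is made up to mimic a file with trailing comma-only lines.

```python
import csv
import io
from collections import namedtuple

SecurityType = namedtuple('SecurityType', 'sector, name')

# Invented sample: one real record followed by two lines holding only a comma.
sample = io.StringIO('Sector A,Type 1\n,\n,\n')

def non_empty(rows):
    """Yield only rows that contain at least one non-blank field."""
    for row in rows:
        if any(field.strip() for field in row):
            yield row

records = [SecurityType._make(row) for row in non_empty(csv.reader(sample))]
print(records)
```

The generator keeps the pipeline lazy, so it can sit between reader() and map() without buffering the file.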