Python: Use UnicodeWriter to write Unicode to CSV

The Python documentation has the following code example for writing Unicode to a CSV file. I think it mentions there that this is necessary because the csv module can't handle Unicode strings directly.

import csv
import codecs
import cStringIO


class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

I’m writing multiple files, and for simplicity I’ve just included a piece of code to demonstrate how I use the above class in my code:

def write(self):
    """
    Outputs the dataset to a csv.
    """
    f = codecs.open(self.filename, 'a')
    writer = UnicodeWriter(f)
    #with open(self.filename, 'a', encoding='utf-8') as f:
    if self.headers and not self.written:
        writer.writerow(self.headers)
        self.written = True
    for record in self.records[self.last_written:]:
        print record
        writer.writerow(record)
    self.last_written = len(self.records)
    f.close()

This is a method inside the dataset class that prepares the dataset before writing the CSV. Previously I used writer = csv.writer(f), but due to a codec error I changed my code to use the UnicodeWriter class.

But my problem is that when I open the CSV file, I see the following:

some_header
B,r,ë,k,ò,w,n,i,k,_,b,s
B,r,ë,k,ò,w,n,i,k,_,c,s
B,r,ë,k,ò,w,n,i,k,_,c,s,b
B,r,ë,k,ò,w,n,i,k,_,d,e
B,r,ë,k,ò,w,n,i,k,_,d,e,-,1
B,r,ë,k,ò,w,n,i,k,_,d,e,-,2
B,r,ë,k,ò,w,n,i,k,_,d,e,-,3
B,r,ë,k,ò,w,n,i,k,_,d,e,-,4
B,r,ë,k,ò,w,n,i,k,_,d,e,-,5
B,r,ë,k,ò,w,n,i,k,_,d,e,-,M
B,r,ë,k,ò,w,n,i,k,_,e,n
B,r,ë,k,ò,w,n,i,k,_,e,n,-,1
B,r,ë,k,ò,w,n,i,k,_,e,n,-,2

The lines should actually be something like Brëkòwnik_de-1; I can’t really see what’s going on.

To give a basic understanding of how the data is generated, I’ll add the following line:
title = unicode(row_page_title['page_title'], 'utf-8')

Solution

This symptom points to passing a string to a function/method that expects a list or tuple.

The writerows method requires a list of lists, and writerow requires a list (or tuple) containing the field values. Since you give it a string, and a string behaves like a list of characters when iterated over, you get a CSV with one character per column.
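
You can see this character-by-character iteration directly in an interactive session (the string here is just one of the example values from your output):

>>> list(u"Brëkòwnik_de-1")
[u'B', u'r', u'\xeb', u'k', u'\xf2', u'w', u'n', u'i', u'k', u'_', u'd', u'e', u'-', u'1']

Each of those characters becomes its own field when the string is handed to writerow.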

If your CSV has only one column, you should use writer.writerow([data]) instead of writer.writerow(data). Some people may question whether you really need the csv module for a single column, but the module handles records with tricky content (embedded CR/LF and the like), so yes, it’s still a good idea.
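
Here is a minimal sketch (the file name and sample value are just placeholders) contrasting the two calls, using the UnicodeWriter class from above:

record = u"Brëkòwnik_de-1"

f = open('demo.csv', 'wb')
writer = UnicodeWriter(f)

# Wrong: writerow iterates over the string, so each character becomes a column
writer.writerow(record)     # -> B,r,ë,k,ò,w,n,i,k,_,d,e,-,1

# Right: wrap the value in a list so the whole string is a single field
writer.writerow([record])   # -> Brëkòwnik_de-1

f.close()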
