“What do you mean there’s line breaks in the address?” said SQLLDR

I had a large-ish CSV to load and a problem: line breaks inside some of the delimited fields.

Like these two records:

one, two, "three beans", four
five, six, "seven
beans", "eight wonderful beans"

SQL Loader simply won’t handle this, as plenty of sad forum posts attest. The file needs pre-processing and here is a little python script to do it, adapted from Jmoreland91’s solution on Stack Overflow.

import sys, csv, os
 
def hrtstrip (inputfile,outputfile,newtext):
    print("Input file " + inputfile)
    print("Output file " + outputfile)
    with open(inputfile, "r") as input:
       with open(outputfile, "w") as output:
          w = csv.writer(output, delimiter=',', quotechar='"', 
quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
          for record in csv.reader(input):
             w.writerow(tuple(s.replace("\n", newtext) for s in record))
    print("All done")

Thanks to Jmoreland91 for this. If you use it, give him an updoot.

edit – Jason Bucata (@tech31842) tweeted me another StackOverflow with a number of scripts in assorted languages: http://stackoverflow.com/questions/33994244/how-to-remove-newlines-inside-csv-cells-using-regex-terminal-tools

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.