I had a large-ish CSV to load and a problem: line breaks inside some of the delimited fields.
Like these two records:
one, two, "three beans", four five, six, "seven beans", "eight wonderful beans"
SQL Loader simply won’t handle this, as plenty of sad forum posts attest. The file needs pre-processing and here is a little python script to do it, adapted from Jmoreland91’s solution on Stack Overflow.
import sys, csv, os def hrtstrip (inputfile,outputfile,newtext): print("Input file " + inputfile) print("Output file " + outputfile) with open(inputfile, "r") as input: with open(outputfile, "w") as output: w = csv.writer(output, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n') for record in csv.reader(input): w.writerow(tuple(s.replace("\n", newtext) for s in record)) print("All done")
Thanks to Jmoreland91 for this. If you use it, give him an updoot.
edit – Jason Bucata (@tech31842) tweeted me another StackOverflow with a number of scripts in assorted languages: http://stackoverflow.com/questions/33994244/how-to-remove-newlines-inside-csv-cells-using-regex-terminal-tools