Last Updated on September 9, 2023 by Christopher G Mendla
I needed to validate CSV files that were being imported into a Rails app. The validation was simple: Look for blank entries, ensure columns used for associations (IDs) were integers and check that certain other entries were numeric.
The first thing I checked was to make sure that the header of the CSV file matched what we expected. The simplest way to do this was to set an ‘expected header’ and then check the header on the CSV.
An example
Here is a stripped down example for a hypothetical employee list. In the controller, I would have :
expected_header = ["EmployeeNo", "First", "Last", "Bonus"]
csv_error = true if !helpers.check_header(expected_header,csv_file)
A helper
def check_header(expected_header,csv_file)
header = CSV.open(csv_file, 'r') { |csv| csv.first }
valid_csv = true
for i in (0..header.size-1)
if header[i].downcase != expected_header[i].downcase
valid_csv = false
end
end
if !valid_csv
$csv_error = "Header:
#{header}
Expected Header:
#{expected_header} "
end
return valid_csv
end
Notes
- We have already set csv_file in the controller.
- We are grabbing the first line of the csv which should be the header.
- We walk through each header. .downcase is used because there were some differences in case from different sources.
- we compare the expected header with the header and return valid_csv if it matches
if !csv_error
CSV.foreach(csv_file, :headers => true) do |row|
csv_error = true if !helpers.check_employee_csv(row)
end
def check_lists_csv(row)
ok = true
ok = false unless data_is_present(row, 0, "EmployeeID)
ok = false unless data_is_integer(row, 0, "EmployeeID)
ok = false unless data_is_present(row, 1, "First")
ok = false unless data_is_present(row, 2, "Last")
ok = false unless data_is_present(row, 3, "Bonus)
ok = false unless_data_is_numeric(row, 3, "Bonus")
return ok
end
- row is passed from the controller and contains the line of data from the csv
- the second parameter is the index of the column. The column numbers begin with 0
- the third parameter is the name of the column for the error message.
def data_is_numeric(row, column_number, column_name)
data_numeric = true
if row[column_number.to_i].to_s.match(/A[+-]?d+?(.d+)?Z/) == nil
$csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER} >> #{row} >> #{column_name} is not numeric
"
data_numeric = true
end
return data_numeric
end
def data_is_integer(row, column_number, column_name)
data_integer = true
if row[column_number.to_i].to_s.match(/^(0|[1-9][0-9]*)$/) == nil
$csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER} >> #{row} >> #{column_name} is not an integer
"
data_integer = true
end
return data_integer
end
def data_is_present(row, column_number, column_name)
data_present = true
if row[column_number.to_i].blank?
$csv_error = $csv_error + "Error in row #{$INPUT_LINE_NUMBER} >> #{row} >> #{column_name} is blank
"
data_present = false
end
return data_present
end
If the csv validations pass, then we process the csv
if !csv_error
CSV.foreach(csv_file, :headers => true) do |row|
update_focus(row)
end
file = "Employee"
else
redirect_to file_error_path and return
$csv_error = $csv_error + "Your CSV was NOT uploaded Please fix the error(s) and retry"
end
There was an error in the file upload
The file you were attempting to upload was not replaced
<%= $csv_error.html_safe %>
Results
This works and requires about 3 seconds to process about 40k of records.
My first attempt was taking about 8 minutes to process 40k of records. The problem was that I was trying to address variations in case with something like
ok = false unless data_is_present(row, "EmployeeID") || data_is_present(row, "Employeeid")
We had CSVs with both versions of the header. For some reason, that was taking a lot of time to process. By using the index number of the header, it avoided whatever was causing the slow processing.
Note – This code is from a couple of years back and there are deviations from best practices such as redundantly returning the value for a method.
Olá amigo, estou tentando validar um CSV, e acabei achando seu post e gostaria de saber como posso migrar esse seu código para Ruby puro. É um desafio que tenho que realizar mas como sou novo no Ruby e também na área de de programação principalmente back-end, estou perdido. Se houver a possibilidade me socorrer ficarei muito grato.