Sanitizing inconsistent user input
While working on a client project, I noticed that users had started using a field intended to store only a single value to store multiple values. When recording a property’s Parking Type they were entering values like “Open Lot & Street Parking” or “Carport/Subterranean Parking.”
Additionally, because I had allowed users to enter this data as text, I found variations in capitalization and pluralization for what should have been a single value, i.e. ‘Garage’, ‘garages’, ‘garage’.
I decided the best solution was to clean up the existing values and limit the future user input to a checkbox collection of approved values.
Converting string field to a Postgresql array
Because users needed to record multiple values, I first converted the parking_type field to a Postgresql array.
I created a new migration with the following command and migrated the database.
Sanitizing existing values
The using option in the migration would split entries with multiple values separated by commas. In my case, however, users were not using commas. Instead, they favored slashes and ampersands.
I prefer to split these values with a ruby script (which I understand well) rather than relying on the Postgres specific using method (which I understand less well). I wrote the following script and stuck it in a rake task.
Running this rake task took care of most of the cleanup, leaving only a few errant values (misspellings or disallowed categories) that needed to be changed manually in the console.
To allow users to record multiple values, I needed to make a small change to the risks_controller. The parking_type field in the strong params needs to be set to an array to allow multiple values.
… to this.
Finally, I need to amend the form, forcing users to choose from approved values. I will allow users to choose from any of the now sanitized values currently in the database.
Now users can select multiple values from an approved list keeping the data squeaky clean!