A common issue with on-demand CSV report generation delivered via email is running up against maximum attachment size limits. For example, Gmail has a maximum attachment size of 25 MB, a limit that some of Dutchie's larger dispensary partners' reporting needs blow right past. Since much of our backend infrastructure is written in Rails, ActiveStorage was the natural choice when we decided to migrate our reporting products from email attachments to streaming downloads from S3. At Dutchie, our engineering team follows certain agile practices like Fibonacci-based story pointing, so when we initially sized the feature as a 5-point story, we really did not expect it to turn into a 13-pointer and a full bottle of bourbon. Let me tell you the story...
Rails likes to tout single-click installs and configuration for each of its components, but there are hidden gotchas with little documentation on how to fix them. I ran into all of them during this migration.
UUIDs
Our stack uses UUIDs as the default ID for our database records, and we adjusted the generator defaults for all of our models long before we began this setup. I ran the Rails generator for creating ActiveStorage scaffolding but did not notice that it was not set to use UUIDs. This produced inconsistent file attachments that would randomly fail with no identifiable cause. I searched for a solution for hours and reached out to my network of Rubyists, which led me to this post.
Three things were wrong with our migration. First, the primary key of the table itself was not a UUID. Our setup uses id: :uuid, default: -> { "uuid_generate_v4()" }, which should have been picked up by our ID generator modification. Second and third, the default Active Storage table references also needed modification: neither t.references :record nor t.references :blob was declared with type: :uuid, which also meant the constraints were not set up correctly. This was resolved by explicitly setting the type for each and re-running the migrations.
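For reference, here is a minimal sketch of what the corrected migration might look like. It assumes the Rails 5.2 shape of the generated Active Storage migration and a PostgreSQL database with the uuid-ossp extension enabled (needed for uuid_generate_v4()); the version tag and column list are assumptions, so compare against the file your own generator produced rather than copying this verbatim.

```ruby
# db/migrate/<timestamp>_create_active_storage_tables.rb
# Sketch only: the Rails version (5.2) and column list are assumptions.
class CreateActiveStorageTables < ActiveRecord::Migration[5.2]
  def change
    # Fix 1: make the table's own primary key a UUID.
    create_table :active_storage_blobs, id: :uuid, default: -> { "uuid_generate_v4()" } do |t|
      t.string   :key,        null: false
      t.string   :filename,   null: false
      t.string   :content_type
      t.text     :metadata
      t.bigint   :byte_size,  null: false
      t.string   :checksum,   null: false
      t.datetime :created_at, null: false

      t.index [:key], unique: true
    end

    create_table :active_storage_attachments, id: :uuid, default: -> { "uuid_generate_v4()" } do |t|
      t.string :name, null: false
      # Fixes 2 and 3: both references must be UUIDs to match the keys they point at.
      t.references :record, null: false, polymorphic: true, index: false, type: :uuid
      t.references :blob,   null: false, type: :uuid

      t.datetime :created_at, null: false

      t.index [:record_type, :record_id, :name, :blob_id],
              name: "index_active_storage_attachments_uniqueness", unique: true
      t.foreign_key :active_storage_blobs, column: :blob_id
    end
  end
end
```

If the original integer-keyed tables already exist, they have to be rolled back or dropped before this version is run.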
Analyzers
Next, I had to battle a confusing analyzer error we were receiving when attaching files:
NameError: uninitialized constant #<Class:0x00007f9050304818>::Analyzable
Did you mean? ActiveStorage::Analyzer
Rails comes preloaded with three analyzers: one for images, one for videos, and a null analyzer. These analyzers are supposed to populate the Active Storage tables with file metadata, and the null analyzer is supposed to record no metadata for content types that cannot be identified, meaning files that are not images or videos. In our case, CSVs have no explicit analyzer and should have defaulted to the NullAnalyzer, but that did not appear to work correctly. To fix this, I set up our own monkey-patched null analyzer based on the Rails one and disabled the defaults for two reasons. First, they are apparently unstable, and second, I wanted a specific design around all file data. The result was an active_storage.rb initializer that looks like this:
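class OurNullAnalyzer < ActiveStorage::Analyzer
  # Accept every blob, whatever its content type.
  def self.accept?(blob)
    true
  end

  # Report that analysis does not need a background job.
  def self.analyze_later?
    false
  end

  # Record no metadata.
  def metadata
    {}
  end
end

# Drop the default analyzers and register only ours.
Rails.application.config.active_storage.analyzers.clear
Rails.application.config.active_storage.analyzers.append(OurNullAnalyzer)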
Setting this up led me down another rabbit hole: our job configurations.
Job configurations
We use a number of Sidekiq queues that process various jobs at different priorities. While debugging why our setup was not executing the analysis job, I discovered we were not running the two Active Storage-related queues: active_storage_purge and active_storage_analysis. These queues were not picked up by default and needed to be worked into our queue configuration so that deleted assets are purged and the metadata analysis actually runs. Setting config.active_job.queue_adapter = :inline in your Rails development environment will help with debugging, since jobs then run immediately instead of being queued.
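One way to catch a missing queue before it bites is a small check that reads the Sidekiq config and confirms both Active Storage queues are listed. The sketch below is plain Ruby and makes a few assumptions: the YAML stands in for a hypothetical config/sidekiq.yml, the critical/default/low queue names are placeholders, and missing_queues is a helper invented for this example rather than anything Sidekiq or Rails provides.

```ruby
require "yaml"

# Hypothetical contents of config/sidekiq.yml. The first three queue names are
# placeholders; the last two are the Active Storage queues named above.
SIDEKIQ_YML = <<~YML
  :concurrency: 5
  :queues:
    - critical
    - default
    - low
    - active_storage_analysis
    - active_storage_purge
YML

REQUIRED_QUEUES = %w[active_storage_analysis active_storage_purge].freeze

# Returns the required queues that the given Sidekiq YAML does not list.
def missing_queues(yaml_text)
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  # Entries may be plain names or [name, weight] pairs.
  configured = Array(config[:queues]).map { |entry| Array(entry).first.to_s }
  REQUIRED_QUEUES - configured
end

missing = missing_queues(SIDEKIQ_YML)
puts missing.empty? ? "all Active Storage queues configured" : "missing: #{missing.join(', ')}"
```

In a real project the YAML text would come from File.read on the actual config file, and the check could run as part of the test suite.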
StringIO
Finally, if you are attaching generated PDFs or CSVs, you will need to take advantage of StringIO inputs. It may look something like this: dispensary_export.file.attach(io: StringIO.new(csv_data), filename: "Orders.csv", content_type: "text/csv"), where csv_data is the output from the CSV class. After going through this implementation, I would not generate and export CSVs any other way. This setup lets you serve downloads directly and more securely from S3 and offload that traffic from your servers.
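To make the csv_data part concrete, here is a rough sketch of building the CSV in memory with Ruby's CSV class and wrapping it in a StringIO. The order rows, the column names, and the build_csv helper are all made up for illustration; only the attach arguments mirror the call shown above.

```ruby
require "csv"
require "stringio"

# Hypothetical order rows; in a real report these would come from the database.
ORDERS = [
  { id: "a1", total: "42.50", status: "completed" },
  { id: "b2", total: "17.00", status: "refunded" }
].freeze

# Builds the CSV entirely in memory and returns it as a String.
def build_csv(rows)
  CSV.generate do |csv|
    csv << %w[id total status]
    rows.each { |row| csv << row.values_at(:id, :total, :status) }
  end
end

csv_data = build_csv(ORDERS)

# The arguments handed to attach when the file never touches disk.
attachment_args = {
  io: StringIO.new(csv_data),
  filename: "Orders.csv",
  content_type: "text/csv"
}

# Inside the Rails app: dispensary_export.file.attach(attachment_args)
puts csv_data
```

Because the IO object is read once during upload, it should be handed to attach without being read first.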