I'm 小本 from kickflow's Product Development Division.

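kickflow has been using RLS and activerecord-tenant-level-security since February 2024. RLS is essential for building a secure SaaS, but it is a somewhat complex feature, and new team members are often confused by it. This post gives an overview of the RLS concept and how kickflow uses it.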

What is kickflow?

kickflow is a SaaS workflow product for businesses.

kickflow.com

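A workflow product handles sensitive information such as salaries and contracts, so data mix-ups like "a user at tenant company A was shown data belonging to another tenant, company B" must be prevented at all costs.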

The server side runs on Heroku / Ruby on Rails, and the database is the PostgreSQL service provided by Heroku (see our post on kickflow's tech stack for details).

What is RLS (Row-Level Security)?

It is a PostgreSQL feature.

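In addition to the SQL-standard privilege system available through GRANT, tables can have row security policies that restrict, on a per-user basis, which rows can be returned by normal queries or inserted, updated, or deleted by data modification commands. https://www.postgresql.jp/document/16/html/ddl-rowsecurity.html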

With RLS, a multi-tenant SaaS built on a single database and a single schema can enforce isolation such as "users of tenant A can access only tenant A's data" at the database level.

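Although the documentation says "on a per-user basis", in practice it is easier to think of RLS as a feature that forcibly adds a condition to every query (for example, to the WHERE clause of a SELECT).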

For example, activerecord-tenant-level-security, the Ruby library described below, sets up a policy like the following.

-- Add a row security policy to the table
-- USING: condition applied to existing rows (SELECT, UPDATE, DELETE)
-- WITH CHECK: condition that new or modified rows must satisfy (INSERT, UPDATE)
CREATE POLICY tenant_policy ON users
    AS PERMISSIVE
    FOR ALL
    TO PUBLIC
    USING (tenant_id = NULLIF(current_setting('tenant_level_security.tenant_id'), '')::uuid)
    WITH CHECK (tenant_id = NULLIF(current_setting('tenant_level_security.tenant_id'), '')::uuid)       
;

-- Enable row-level security on the table
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

-- Apply row-level security to the table owner as well (owners bypass it by default)
ALTER TABLE users FORCE ROW LEVEL SECURITY;
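The "forcibly added condition" can be made concrete with a small plain-Ruby model of the USING expression above. It is a sketch under stated assumptions, not how PostgreSQL evaluates policies internally: `Row`, `nullif_empty`, `sql_equal?`, and `visible_rows` are hypothetical names, and SQL's NULL comparison rule is imitated by treating any comparison with nil as false. The takeaway is that an empty or unset tenant ID hides every row rather than exposing every row.

```ruby
# Plain-Ruby model (hypothetical helpers, no PostgreSQL involved) of:
#   USING (tenant_id = NULLIF(current_setting('tenant_level_security.tenant_id'), '')::uuid)

Row = Struct.new(:id, :tenant_id)

# NULLIF(value, '') yields NULL (nil here) when value is the empty string.
def nullif_empty(value)
  value == "" ? nil : value
end

# SQL equality: a comparison involving NULL is never true, so the row is filtered out.
def sql_equal?(a, b)
  return false if a.nil? || b.nil?

  a == b
end

# Rows for which the USING expression evaluates to true.
def visible_rows(rows, setting)
  tenant_id = nullif_empty(setting)
  rows.select { |row| sql_equal?(row.tenant_id, tenant_id) }
end

rows = [Row.new(1, "tenant-a"), Row.new(2, "tenant-b"), Row.new(3, "tenant-a")]

visible_rows(rows, "tenant-a").map(&:id) # => [1, 3]
visible_rows(rows, "").map(&:id)         # => [] (no tenant set, nothing visible)
```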

How row-level security gets bypassed

The superuser role (postgres in the output below) has the Bypass RLS attribute, so it ignores RLS and can read every record.

                                        List of roles
            Role name            |                         Attributes
---------------------------------+------------------------------------------------------------
 postgres                        | Superuser, Create role, Create DB, Replication, Bypass RLS

Conversely, connecting as the superuser silently disables RLS, so the application needs a separate, ordinary user.

In addition, as described later, Heroku does not let you grant the Bypass RLS attribute to a user, which causes its own problems.

What are run-time configuration parameters?

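A PostgreSQL feature: variables whose values last for the duration of a session. Their original purpose is changing server settings, but you can also define arbitrary custom parameters, as long as the name contains a dot (for example, myapp.hoge).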

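-- Set a value (custom parameter names need a prefix such as "myapp.")
SET myapp.hoge = 'fuga';

-- Read the value
SHOW myapp.hoge;

-- Reference the value inside a query
SELECT current_setting('myapp.hoge');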
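The session-scoped behavior, and the difference between `current_setting(name)` and `current_setting(name, true)` (the `missing_ok` form used later in this post), can be modeled in plain Ruby. `FakeSession` is a hypothetical stand-in, not a database client: each instance keeps its own parameter table, an unknown parameter raises an error named after PostgreSQL's `undefined_object` condition, and `missing_ok` returns nil instead. We model `SET ... TO DEFAULT` on an already-set custom parameter as leaving an empty string, which is why the policy wraps the value in `NULLIF(..., '')`.

```ruby
# Hypothetical in-memory model of PostgreSQL session parameters; not a real client.
class FakeSession
  class UndefinedObject < StandardError; end

  def initialize
    @params = {} # each session has its own table; nothing is shared between sessions
  end

  # SET name = 'value' (custom parameter names must contain a dot)
  def set(name, value)
    unless name.include?(".")
      raise UndefinedObject, %(unrecognized configuration parameter "#{name}")
    end
    @params[name] = value
  end

  # SET name TO DEFAULT, modeled only for a parameter already set in this session:
  # the parameter stays defined, but its value becomes the empty string.
  def reset(name)
    @params[name] = "" if @params.key?(name)
  end

  # current_setting(name [, missing_ok])
  def current_setting(name, missing_ok = false)
    return @params[name] if @params.key?(name)
    return nil if missing_ok

    raise UndefinedObject, %(unrecognized configuration parameter "#{name}")
  end
end

a = FakeSession.new
b = FakeSession.new
a.set("tenant_level_security.tenant_id", "tenant-a")

a.current_setting("tenant_level_security.tenant_id")       # => "tenant-a"
b.current_setting("tenant_level_security.tenant_id", true) # => nil (session b never ran the SET)
```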

What is activerecord-tenant-level-security?

activerecord-tenant-level-security is a library (gem) for using RLS from Rails.

`create_policy(table)`

A method for migration files. It adds a row-level security policy like the one shown above to a table.

# Usage
class CreateFoos < ActiveRecord::Migration[7.1]
  def change
    create_table :foos, id: :uuid do |t|
      # テーブル定義
    end

    create_policy :foos
  end
end

`TenantLevelSecurity.switch!(tenant_id)`

Sets the run-time parameter tenant_level_security.tenant_id to the given tenant ID.

# Usage

User.all # => [] (returns nothing while no tenant is set)

TenantLevelSecurity.switch!(tenant_id)

# From here on, the tenant's rows are accessible.
User.all # => [...] (the tenant's users are returned)

module TenantLevelSecurity
  class << self
    def switch!(tenant_id)
      switch_with_connection!(ActiveRecord::Base.connection, tenant_id)
    end

    def switch_with_connection!(conn, tenant_id)
      conn.clear_query_cache

      if tenant_id.present?
        conn.execute("SET tenant_level_security.tenant_id = '#{tenant_id}'")
      else
        conn.execute('SET tenant_level_security.tenant_id TO DEFAULT')
      end
    end
  end
end

`TenantLevelSecurity.current_session_tenant_id`

Returns the tenant ID currently in effect.

Its implementation asks PostgreSQL for the value of tenant_level_security.tenant_id on every call (a SHOW query), so it is more expensive than it looks.

module TenantLevelSecurity
  class << self
    def current_session_tenant_id
      ActiveRecord::Base.connection.execute('SHOW tenant_level_security.tenant_id').getvalue(0, 0)
    rescue ActiveRecord::StatementInvalid => e
      return nil if e.cause.kind_of? PG::UndefinedObject
      raise
    end
  end
end

`TenantLevelSecurity.with(tenant_id)`

Calls TenantLevelSecurity.switch!(tenant_id), runs the block, and restores the previous tenant_id when the block exits.

# Example usage
TenantLevelSecurity.with(tenant_id) do
  # Inside this block, only records of the tenant_id tenant are accessible.
end

# After the block exits, tenant_id returns to its previous value (unset, in this example).

kickflow rarely uses it. It calls current_session_tenant_id internally, so it is also more expensive than it looks.

module TenantLevelSecurity
  class << self
    def with(tenant_id)
      old_tenant_id = current_session_tenant_id
      return yield if old_tenant_id == tenant_id
      begin
        switch! tenant_id
        yield
      ensure
        switch! old_tenant_id
      end
    end
  end
end
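Because `with` restores the previous tenant in an `ensure` clause, the tenant is put back even when the block raises. The sketch below reproduces that control flow with a hypothetical in-memory `FakeTenantContext` instead of a PostgreSQL session; it also counts reads of `current_session_tenant_id`, each of which would be a `SHOW` round trip in the real gem.

```ruby
# Hypothetical stand-in for TenantLevelSecurity, backed by an instance variable
# instead of a PostgreSQL session parameter.
class FakeTenantContext
  attr_reader :reads

  def initialize
    @tenant_id = nil
    @reads = 0
  end

  def switch!(tenant_id)
    @tenant_id = tenant_id
  end

  # In the real gem this is a SHOW query, i.e. one database round trip per call.
  def current_session_tenant_id
    @reads += 1
    @tenant_id
  end

  # Same control flow as TenantLevelSecurity.with
  def with(tenant_id)
    old_tenant_id = current_session_tenant_id
    return yield if old_tenant_id == tenant_id

    begin
      switch!(tenant_id)
      yield
    ensure
      switch!(old_tenant_id) # runs even when the block raises
    end
  end
end

ctx = FakeTenantContext.new
begin
  ctx.with("tenant-a") { raise "boom" }
rescue RuntimeError
  # the error propagates, but the previous tenant has already been restored
end
ctx.current_session_tenant_id # => nil
```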

What is activerecord-multi-tenant?

activerecord-multi-tenant is another library (gem) for implementing multi-tenancy safely in Rails.

It automatically adds a tenant_id condition to queries issued through ActiveRecord, which prevents accidental access to other tenants' data.

users = User.all # fetches users of all tenants

MultiTenant.with(tenant) do
  users = User.all # fetches only the users of tenant
end

kickflow uses activerecord-tenant-level-security and activerecord-multi-tenant together.

Differences between activerecord-tenant-level-security and activerecord-multi-tenant

The two play similar roles, but differ as follows.

activerecord-tenant-level-security

Defense at the database level
Running a query without calling TenantLevelSecurity.switch! returns nothing
TenantLevelSecurity.switch! takes a tenant_id

activerecord-multi-tenant

Defense at the application level
Running a query without calling MultiTenant.with returns records of all tenants (when RLS is disabled)
MultiTenant.with takes a Tenant object
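The most important difference is what happens when nothing has been set. In the following plain-Ruby sketch, `db_level` and `app_level` are hypothetical simplifications of the two gems operating on an in-memory array: the RLS-style filter always applies its condition, so an unset tenant matches nothing, while the activerecord-multi-tenant-style filter adds its condition only when a tenant is set.

```ruby
# Hypothetical simplifications of the two defense layers, on in-memory records.
RECORDS = [
  { id: 1, tenant_id: "tenant-a" },
  { id: 2, tenant_id: "tenant-b" }
].freeze

# RLS-style: the condition is always applied; an unset tenant matches nothing.
db_level = lambda do |records, tenant_id|
  records.select { |r| !tenant_id.nil? && r[:tenant_id] == tenant_id }
end

# activerecord-multi-tenant-style: the condition is added only when a tenant is set.
app_level = lambda do |records, tenant_id|
  tenant_id.nil? ? records : records.select { |r| r[:tenant_id] == tenant_id }
end

db_level.call(RECORDS, nil).size  # => 0
app_level.call(RECORDS, nil).size # => 2
```

This is the reason for combining them: when the application-level scope is forgotten, the database-level policy still limits what comes back.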

Using activerecord-multi-tenant

Because activerecord-multi-tenant defends at the application level, it is configured on model classes rather than in the database: add a call to the multi_tenant method.

class User < ApplicationRecord
  multi_tenant :tenant # declares that this model is scoped by activerecord-multi-tenant
  
  ...
end

You can switch tenants with MultiTenant.current_tenant=.

MultiTenant.current_tenant = tenant

# From here on, tenant is selected

MultiTenant.current_tenant returns the current tenant, although we rarely use it.

tenant = MultiTenant.current_tenant

MultiTenant.with(tenant) selects the tenant only inside the block.

users = User.all # fetches users of all tenants

MultiTenant.with(tenant) do
  users = User.all # fetches only the users of tenant
end

users = User.all # after leaving the block, users of all tenants are fetched again

Integrating RLS into a Rails application in practice

Changing the PgBouncer configuration

PgBouncer is a proxy server for PostgreSQL. It pools connections to reduce the load on PostgreSQL.


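Run-time parameters keep their values per session, so PgBouncer must also be switched to session mode. In transaction mode, a server connection goes back to the pool at the end of each transaction and can be handed to a different client, so a tenant ID set with SET may be missing from the next transaction or, worse, visible to another client.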

On Heroku, PgBouncer can be installed either on the server side or on the client side, but session mode is available only with the client-side installation.

PgBouncer Configuration | Heroku Dev Center
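Why transaction mode breaks this can be shown with a toy connection pool in plain Ruby. This is a hypothetical simulation, not PgBouncer's actual implementation: `ServerConn` carries its own session parameters, and in `:transaction` mode a connection returns to the pool after every transaction, so a value set by one client is still there when the next client borrows the same connection.

```ruby
# Toy model only; real PgBouncer is far more involved.
ServerConn = Struct.new(:id, :params)

class ToyPool
  def initialize(size, mode)
    @free = Array.new(size) { |i| ServerConn.new(i, {}) }
    @mode = mode # :session or :transaction
    @pinned = {} # session mode: client => the server connection it keeps
  end

  # Runs one transaction for `client` and yields the server connection it got.
  def transaction(client)
    conn = @pinned[client] || @free.shift
    raise "pool exhausted" if conn.nil?

    begin
      yield conn
    ensure
      if @mode == :session
        @pinned[client] = conn # stays with this client until it disconnects
      else
        @free.unshift(conn)    # released at the end of the transaction; the next client may get it
      end
    end
  end
end

key = "tenant_level_security.tenant_id"

tx_pool = ToyPool.new(2, :transaction)
tx_pool.transaction(:client_a) { |c| c.params[key] = "tenant-a" }
seen_in_tx_mode = nil
tx_pool.transaction(:client_b) { |c| seen_in_tx_mode = c.params[key] }

session_pool = ToyPool.new(2, :session)
session_pool.transaction(:client_a) { |c| c.params[key] = "tenant-a" }
seen_in_session_mode = nil
session_pool.transaction(:client_b) { |c| seen_in_session_mode = c.params[key] }

seen_in_tx_mode      # => "tenant-a" (client B inherited client A's tenant)
seen_in_session_mode # => nil (client B has its own server connection)
```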

Deciding which tables get a row security policy

As a rule, every table with a tenant_id column gets a row-level security policy, and tables without a tenant_id column do not.

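Tables needed to determine the tenant ID do not get one either. For example, the access token table has a tenant_id column, but we use the access token to identify the tenant in the first place. At that point no tenant ID is set yet, so with a policy enabled the lookup would return nothing; row-level security must therefore stay disabled on that table.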
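The rule can be written as a small filter. `tables_needing_policy` and `TENANT_LOOKUP_TABLES` are hypothetical, not part of activerecord-tenant-level-security; the input is a hash from table name to column names, which in a Rails app could be built from `ActiveRecord::Base.connection.tables` and `ActiveRecord::Base.connection.columns(table)`. The table names below are illustrative.

```ruby
# Hypothetical list of tables used to *find* the tenant; they must stay readable
# before TenantLevelSecurity.switch! runs, so they never get a policy.
TENANT_LOOKUP_TABLES = %w[access_tokens].freeze

# Returns the tables that should get create_policy: those with a tenant_id column,
# minus the lookup tables.
def tables_needing_policy(columns_by_table, excluded: TENANT_LOOKUP_TABLES)
  columns_by_table
    .select { |_table, columns| columns.include?("tenant_id") }
    .keys
    .reject { |table| excluded.include?(table) }
    .sort
end

schema = {
  "users"         => %w[id tenant_id name],
  "requests"      => %w[id tenant_id title],
  "access_tokens" => %w[id tenant_id token],
  "tenants"       => %w[id public_id name]
}

tables_needing_policy(schema) # => ["requests", "users"]
```

The resulting list could then drive a migration that calls `create_policy` once per table.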

Adding row security policies to tables

activerecord-tenant-level-security provides a create_policy(table_name) method for migrations.

When we introduced RLS, we added row-level security policies to the existing tables like this.

class CreatePoliciesOnTables < ActiveRecord::Migration[7.1]
  def change
    create_policy :foo
    create_policy :bar
    create_policy :baz

    # ...then call create_policy for every other table that needs it
  end
end

Whenever you add a new table, you also need to call create_policy.

class CreateFoos < ActiveRecord::Migration[7.1]
  def change
    # Table definition
    create_table(:foos) do |t|
      ...
    end
    
    # After defining a new table, also set up its row-level security policy with create_policy
    create_policy :foos
  end
end

Rails code changes (adding TenantLevelSecurity calls)

Add the TenantLevelSecurity call to a shared entry point such as ApplicationController.

class ApplicationController < ActionController::API
  set_current_tenant_through_filter # provided by activerecord-multi-tenant
  before_action :authenticate_request_by_access_token!

  private

  def authenticate_request_by_access_token!
    access_token = ... # get the access token from the HTTP headers, etc.
    TenantLevelSecurity.switch!(access_token.tenant_id) # set the tenant ID for RLS first

    tenant = Tenant.find(access_token.tenant_id)
    MultiTenant.current_tenant = tenant # then set the tenant for activerecord-multi-tenant
  end
end

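There is not necessarily just one shared entry point that needs the TenantLevelSecurity call. For example, the API behind the user-facing screens and the internal API use different authentication mechanisms, so we added TenantLevelSecurity.switch! to each.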

Call Tenant.find after TenantLevelSecurity.switch!

The Tenant model usually has other models associated with it via has_many or has_one.

class Tenant < ApplicationRecord
  has_one :foo
  has_many :bars
  has_many :bazs
  ...
end

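If you load the Tenant object before calling TenantLevelSecurity.switch!, its associations come back empty. The associated models are subject to RLS too, so their records cannot be read before switch! has set the tenant.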

# Bad
tenant_id = ... # get the tenant ID from the access token

tenant = Tenant.find(tenant_id)
TenantLevelSecurity.switch!(tenant.id)
MultiTenant.with(tenant) do
  # we want to read the tenant's associated foo

  tenant.foo # => foo is always empty (!?)
end

# Good
tenant_id = ... # get the tenant ID from the access token

TenantLevelSecurity.switch!(tenant_id) # tenant is not loaded yet, so pass the ID
tenant = Tenant.find(tenant_id) # call find after switch!
MultiTenant.with(tenant) do
  # we want to read the tenant's associated foo

  tenant.foo # => loaded correctly
end

Sidekiq (ActiveJob, ActionMailer)

kickflow uses Sidekiq for background jobs. Email delivery (ActionMailer) also runs on Sidekiq via ActiveJob.

Adding TenantLevelSecurity.switch!

Sidekiq jobs run on servers separate from the Rails web application, so the tenant set during a request does not carry over; each job needs its own TenantLevelSecurity call.

class AdminMailer < ApplicationMailer
  around_action :set_current_tenant

  private

  def set_current_tenant(&block)
    @tenant = params[:tenant] # passed in via AdminMailer.with(tenant: ...)
    raise "Tenant must not be null" if @tenant.blank?

    TenantLevelSecurity.switch!(@tenant.id)
    MultiTenant.with(@tenant, &block)
  end
end

Calling TenantLevelSecurity.switch! in middleware

activerecord-tenant-level-security provides middleware for Sidekiq.

The middleware saves the enqueuing side's tenant ID as an extra job parameter and calls TenantLevelSecurity automatically when the job runs, so individual jobs no longer need to call TenantLevelSecurity themselves.

However, this middleware has a problem described later, so kickflow does not use it.

module TenantLevelSecurity
  module Sidekiq
    module Middleware
      class Client
        def call(worker_class, job, queue, redis_pool)
          tenant_id = TenantLevelSecurity.current_session_tenant_id
          if tenant_id.present?
            job['tenant_level_security'] ||= { id: tenant_id }
          end

          yield
        end
      end

      class Server
        def call(worker, job, queue)
          if job.key?('tenant_level_security')
            TenantLevelSecurity.with(job['tenant_level_security']['id']) do
              yield
            end
          else
            yield
          end
        end
      end
    end
  end
end

https://github.com/kufu/activerecord-tenant-level-security/blob/master/lib/activerecord-tenant-level-security/sidekiq.rb
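The mechanism, carrying the tenant ID from the enqueuing process to the worker inside the job hash, can be traced without Sidekiq or Redis. This sketch keeps the gem's `tenant_level_security` key but replaces the database session with a hypothetical `ProcessTenant` object, and round-trips the job through JSON as a queue would. After serialization the keys are strings, which is why the server side reads `['id']` even though the client writes `{ id: ... }`.

```ruby
require "json"

# Hypothetical per-process holder standing in for the session's tenant parameter.
class ProcessTenant
  attr_accessor :tenant_id
end

# Enqueue side (like Middleware::Client): copy the current tenant into the job hash.
def client_middleware(job, current)
  job["tenant_level_security"] ||= { "id" => current.tenant_id } if current.tenant_id
  JSON.generate(job) # this string is what travels through the queue
end

# Worker side (like Middleware::Server): use the stored tenant while the job runs,
# then restore whatever was set before.
def server_middleware(payload, current)
  saved = current.tenant_id
  job = JSON.parse(payload)
  current.tenant_id = job["tenant_level_security"]["id"] if job.key?("tenant_level_security")
  yield job
ensure
  current.tenant_id = saved
end

web = ProcessTenant.new
web.tenant_id = "tenant-a"
worker = ProcessTenant.new # separate process: starts with no tenant

payload = client_middleware({ "class" => "UserReminderJob", "args" => [42] }, web)
seen = nil
server_middleware(payload, worker) { |_job| seen = worker.tenant_id }

seen             # => "tenant-a"
worker.tenant_id # => nil (restored after the job)
```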

Testing

Because the change affects the whole Rails application, in principle every feature has to be tested.

In practice, rather than enabling row-level security policies on every table at once, we enabled them gradually, starting with a subset of tables.

We also added code to the unit tests (RSpec) so that RLS-related bugs are caught.

# spec/rails_helper.rb

RSpec.configure do |config|

  ...

  # Hook so that every test starts with a tenant already set in TenantLevelSecurity.
  config.around do |example|
    Tenant.delete_all
    
    tenant1 = create(:tenant)
    TenantLevelSecurity.with(tenant1.id) do
      MultiTenant.with(tenant1) do
        example.run
      end
    end
    
    Tenant.delete_all
  end
end

Problems and workarounds

Accessing rows across all tenants

Sometimes we need to read rows from all tenants, for example:

ETL services
The pg_dump command
Backup jobs
Listing all tenants in the internal admin screen

We tried querying each tenant separately and concatenating the results, as below, but it was not fast enough. Besides, ETL services and pg_dump do not let us change their queries.

users = []
Tenant.all.ids.each do |tenant_id|
  TenantLevelSecurity.with(tenant_id) do
    users.push(*User.where(...).to_a)
  end
end

Ideally, such cases would use a user with the Bypass RLS attribute to disable RLS. However, Heroku Postgres does not let you freely add attributes to users.

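So kickflow modifies the row-level security policy as follows: the tenant_id comparison is skipped when the current user is a specific user (kickflow_read), or when the run-time parameter tenant_level_security.unsafe is set to a predetermined value ('UNSAFE').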

CREATE POLICY tenant_policy ON #{table_name}
  AS PERMISSIVE
  FOR ALL
  TO PUBLIC
  USING (
    CURRENT_USER = 'kickflow_read' OR
    current_setting('tenant_level_security.unsafe', true) = 'UNSAFE' OR
    tenant_id = NULLIF(current_setting('tenant_level_security.tenant_id', true), '')::#{tenant_id_data_type}
  )
  WITH CHECK (
    CURRENT_USER = 'kickflow_read' OR
    current_setting('tenant_level_security.unsafe', true) = 'UNSAFE' OR
    tenant_id = NULLIF(current_setting('tenant_level_security.tenant_id', true), '')::#{tenant_id_data_type}
  )
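Read as a predicate, the modified policy lets a row through when any of the three branches is true. The following plain-Ruby rendering is a hypothetical illustration, not code that PostgreSQL runs: `current_user` and `settings` stand in for `CURRENT_USER` and the session's run-time parameters, `app_user` is an invented role name, and a missing parameter is nil, matching `current_setting(..., true)`.

```ruby
# Plain-Ruby reading of the modified USING / WITH CHECK expression (hypothetical helper).
def row_allowed?(row_tenant_id, current_user:, settings:)
  return true if current_user == "kickflow_read"                      # specific user
  return true if settings["tenant_level_security.unsafe"] == "UNSAFE" # explicit opt-out

  tenant_id = settings["tenant_level_security.tenant_id"]             # nil when unset (missing_ok)
  tenant_id = nil if tenant_id == ""                                  # NULLIF(..., '')
  !tenant_id.nil? && row_tenant_id == tenant_id                       # NULL never matches
end

app = { "tenant_level_security.tenant_id" => "tenant-a" }

row_allowed?("tenant-a", current_user: "app_user", settings: app) # => true
row_allowed?("tenant-b", current_user: "app_user", settings: app) # => false
row_allowed?("tenant-b", current_user: "app_user",
             settings: app.merge("tenant_level_security.unsafe" => "UNSAFE")) # => true
row_allowed?("tenant-b", current_user: "kickflow_read", settings: {}) # => true
```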

To use this policy, we monkey-patch TenantLevelSecurity.
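kickflow's monkey patch itself is not shown in this post. Purely as a hypothetical illustration, a block helper that turns the UNSAFE flag on and reliably turns it off again might look like the following. `with_unsafe_tenant_access` is an invented name, and `conn` is any object responding to `execute(sql)` (ActiveRecord connections do); a recording stub is used so the sketch runs without a database.

```ruby
# Hypothetical helper, not kickflow's actual monkey patch.
def with_unsafe_tenant_access(conn)
  conn.execute("SET tenant_level_security.unsafe = 'UNSAFE'")
  yield
ensure
  # Always clear the flag so the pooled connection is not left unrestricted.
  conn.execute("SET tenant_level_security.unsafe TO DEFAULT")
end

# Recording stub standing in for ActiveRecord::Base.connection.
class RecordingConn
  attr_reader :log

  def initialize
    @log = []
  end

  def execute(sql)
    @log << sql
  end
end

conn = RecordingConn.new
with_unsafe_tenant_access(conn) { :export_all_users }
conn.log
# => ["SET tenant_level_security.unsafe = 'UNSAFE'",
#     "SET tenant_level_security.unsafe TO DEFAULT"]
```

Resetting in `ensure` matters because the connection goes back to the pool afterward; a flag left on would disable tenant isolation for whatever runs next on it.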

N+1 problem when enqueuing jobs

kickflow has a feature that sends reminders to users at a fixed time. After we introduced RLS, PostgreSQL load spiked whenever reminders were sent.

The cause is that activerecord-tenant-level-security's client middleware asks PostgreSQL for the current tenant_id via current_session_tenant_id every time a job is enqueued.

class Client
  def call(worker_class, job, queue, redis_pool)
    tenant_id = TenantLevelSecurity.current_session_tenant_id
    if tenant_id.present?
      job['tenant_level_security'] ||= { id: tenant_id }
    end

    yield
  end
end

# Enqueue a reminder job for each user

user_ids.each do |user_id|
  # With activerecord-tenant-level-security's middleware as is,
  # current_session_tenant_id runs once per user.
  UserReminderJob.perform_async(user_id)
end

kickflow replaced the middleware with its own version so that the tenant_id can be passed explicitly as a job option, as shown below.

class Client
  include Sidekiq::ClientMiddleware

  def call(worker_class, job, queue, redis_pool)
    unless job.key?("tenant_level_security")
      # Look up the current tenant ID only when the enqueuing code did not specify one.
      tenant_id = TenantLevelSecurity.current_session_tenant_id
      if tenant_id.present?
        job["tenant_level_security"] = { id: tenant_id }
      end
    end

    yield
  end
end

user_ids.each do |user_id|
  # Pass the tenant ID explicitly to avoid the N+1 lookups
  UserReminderJob.set({ tenant_level_security: { id: tenant_id } }).perform_async(user_id)
end
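The effect of the change can be checked by counting lookups. In this database-free sketch, `LookupCounter#current_session_tenant_id` is a hypothetical stand-in for the `SHOW` query, and `client_call` follows the same branching as the replacement `Client#call` above. We assume the enqueuing code already knows the tenant ID; in the sketch it is read once up front.

```ruby
# Hypothetical stand-in: each call represents one SHOW round trip to PostgreSQL.
class LookupCounter
  attr_reader :calls

  def initialize(tenant_id)
    @tenant_id = tenant_id
    @calls = 0
  end

  def current_session_tenant_id
    @calls += 1
    @tenant_id
  end
end

# Same branching as the replacement Client#call: query only when the job
# does not already carry a tenant ID.
def client_call(job, tls)
  unless job.key?("tenant_level_security")
    tenant_id = tls.current_session_tenant_id
    job["tenant_level_security"] = { id: tenant_id } unless tenant_id.nil? || tenant_id.empty?
  end
  job
end

user_ids = (1..1_000).to_a

# Jobs enqueued without a tenant ID: every enqueue asks the database.
implicit = LookupCounter.new("tenant-a")
user_ids.each { |id| client_call({ "args" => [id] }, implicit) }

# Jobs enqueued with an explicit tenant ID: no per-job lookup.
explicit = LookupCounter.new("tenant-a")
tenant_id = explicit.current_session_tenant_id # one lookup up front
user_ids.each do |id|
  client_call({ "args" => [id], "tenant_level_security" => { id: tenant_id } }, explicit)
end

implicit.calls # => 1000
explicit.calls # => 1
```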

Miscellaneous

We keep helper methods around for development and investigation.

$ heroku run rails console

> Tenant.switch_tenant!("tokugawa") # sets both TenantLevelSecurity.switch! and MultiTenant.current_tenant

class Tenant < ApplicationRecord
  # Helper for rails console
  def self.switch_tenant!(public_id)
    tenant = Tenant.find_by!(public_id:)
    MultiTenant.current_tenant = tenant
    TenantLevelSecurity.switch!(tenant.id)
  end
end

We are hiring!

kickflow is an "overwhelmingly easy-to-use" cloud workflow service that solves operation and maintenance challenges.