Induction and rfl

A subtle detail in the use of induction and rfl

There is a subtlety in the description of the induction tactic in Lean.

Assuming x is a variable in the local context with an inductive type, induction x applies induction on x to the main goal… If the type of an element in the local context depends on x, that element is reverted and reintroduced afterward, so that the inductive hypothesis incorporates that hypothesis as well.

The important part is that the element is reverted and reintroduced afterward. If we happen to have a hypothesis h : P x in the local context, then when applying induction x, h will be reverted. That means we will need to provide a new proof of h in order to use the induction hypothesis. This is true regardless of whether h is something we still need or not.

Consider this example, where we define a function partialSum that recursively adds 1 while walking through Fin n.

import Mathlib.Data.Fin.Basic
import Mathlib.Tactic

/-- Recursively add `1` while walking through `Fin n`. -/
def partialSum {n : } : Fin n  
  | 0, _        => 0
  | k+1, hk =>
      let hk'    : k < n := Nat.lt_of_succ_lt hk
      let prev   : Fin n := k, hk'
      (partialSum prev) + 1

We can prove that partialSum is strictly increasing. To do that, we first prove that it increases between two consecutive elements of Fin n.

lemma partialSum_step_lt {n k : } (hk : k.succ < n) :
    partialSum k    , Nat.lt_of_succ_lt hk <
    partialSum k.succ, hk := by
  simp [partialSum, hk, Nat.lt_of_succ_lt]

Now we can prove that partialSum is strictly increasing as a generalization of the previous lemma. That is, we want to prove that partialSum ⟨k, hk⟩ < partialSum ⟨m, hm⟩ for k < m. We do this by induction on d = (m - k) - 1, i.e., one less than the distance between k and m.

lemma partialSum_lt_of_lt
    {n k m : } (hk : k < n) (hm : m < n)
    (hkm : (k, hk : Fin n) < m, hm) :
    partialSum k, hk < partialSum m, hm := by
  -- turn the `Fin` inequality into one on `ℕ`
  have h_nat : k < m := hkm
  obtain d, rfl := Nat.exists_eq_add_of_lt h_nat
  induction d with
  | zero =>
      have hks : k.succ < n := by simpa using hm
      simpa using partialSum_step_lt hks
  | succ d' ih =>
      have h_mid : k + d' + 1 < n := by linarith
      have hkm' : (k, hk : Fin n) < k + d'.succ, h_mid := by simp
      have : k < k + d'.succ := by
        apply Nat.lt_add_of_pos_right
        exact Nat.succ_pos d'
      have h : partialSum k, hk <
                partialSum k + d'.succ, h_mid := ih h_mid hkm' this
      have h : partialSum k + d'.succ, h_mid <
                partialSum (k + d'.succ).succ, by simpa using hm :=
        by simpa using partialSum_step_lt hm
      exact lt_trans h h

The line to focus on is the one where we obtain the value d:

    obtain d, rfl := Nat.exists_eq_add_of_lt h_nat

This is a standard way of extracting a value from an inequality. It assigns d to be m - k - 1, and because of the rfl, it replaces all occurrences of m with k + d + 1 in both the goal and the local context.

Concretely, we now have:

case intro
n k : ℕ
hk : k < n
d : ℕ
hm : k + d + 1 < n
hkm : ⟨k, hk⟩ < ⟨k + d + 1, hm⟩
h_nat : k < k + d + 1
⊢ partialSum ⟨k, hk⟩ < partialSum ⟨k + d + 1, hm⟩

So, in the induction step, any hypothesis containing d will be reverted:

ih : ∀ (hm : k + d' + 1 < n), ⟨k, hk⟩ < ⟨k + d' + 1, hm⟩ → k < k + d' + 1 → partialSum ⟨k, hk⟩ < partialSum ⟨k + d' + 1, hm⟩

Therefore, to use the induction hypothesis, we must provide new proofs for hm in the induction step, and variants of the hkm and h_nat hypotheses. Which is exactly what we do in the following two lines:

    have h_mid : k + d' + 1 < n := by linarith
    have hkm' : (k, hk : Fin n) < k + d'.succ, h_mid := by simp
    have : k < k + d'.succ := by ...

Performing these operations is a bit cumbersome, as it adds boilerplate to the proof. However, it can be simplified by observing that neither hkm nor h_nat are needed after obtaining d, and we can remove them from the local context using clear.

A simpler version of the proof would be:

lemma partialSum_lt_of_lt_new
    {n k m : } (hk : k < n) (hm : m < n)
    (hkm : (k, hk : Fin n) < m, hm) :
    partialSum k, hk < partialSum m, hm := by
  -- turn the `Fin` inequality into one on `ℕ`
  have h_nat : k < m := hkm
  obtain d, rfl := Nat.exists_eq_add_of_lt h_nat
  clear hkm h_nat
  induction d with
  | zero =>
      have hks : k.succ < n := by simpa using hm
      simpa using partialSum_step_lt hks
  | succ d' ih =>
      have h_mid : k + d'.succ < n := by linarith
      have h : partialSum k, hk <
                partialSum k + d'.succ, h_mid := ih h_mid
      have h : partialSum k + d'.succ, h_mid <
                partialSum (k + d'.succ).succ, by simpa using hm :=
        by simpa using partialSum_step_lt hm
      exact lt_trans h h

In particular, to access the ih hypothesis, we now only need to provide h_mid, which is much simpler than before.

At this point, I wonder whether it would make sense to add an option to induction to specify which hypotheses we want to be reverted.