*What it does*
* Replaces the linear compression term with \(u(I)=I^2\) → the objective becomes strictly convex, so each β has a unique optimum.
* Adds a small entropy penalty (ε ≈ 0.05) to keep p(z|x) stochastic until β is large.
* Uses an implicit ODE (`dq/dβ = -H⁻¹ ∇_q L`) as a predictor, then a Newton-style corrector → follows the optimum path smoothly (a minimal sketch of the loop is below this list).
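
For intuition, here is a minimal NumPy sketch of a predictor-corrector continuation loop of this kind. It is my own illustration, not the repo's `stable_continuation_ib.py`: `grad_L`/`hess_L` are toy stand-ins for the gradient and Hessian of the convexified objective w.r.t. the encoder parameters, and the predictor uses the Davidenko form dq/dβ = -H⁻¹ ∂_β(∇_q L), which is how I read the ODE above.

```python
import numpy as np

def grad_L(q, beta):
    # Toy stand-in: gradient of L(q, beta) = 0.5*||q||^2 - beta * c.q,
    # so the true optimum path is q*(beta) = beta * c.
    c = np.array([1.0, -0.5, 0.25])
    return q - beta * c

def hess_L(q, beta):
    # Toy stand-in: Hessian (identity here). In the real problem this is the
    # Hessian of the convex surrogate w.r.t. the encoder parameters.
    return np.eye(q.size)

def continuation(q0, betas, newton_iters=5, fd_eps=1e-4):
    """Track the optimum q*(beta) along an increasing beta schedule."""
    q = np.asarray(q0, dtype=float).copy()
    path = []
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # Predictor: Euler step on dq/dbeta = -H^{-1} d(grad_q L)/dbeta,
        # with the beta-derivative of the gradient taken by finite differences.
        H = hess_L(q, b_prev)
        dgrad_dbeta = (grad_L(q, b_prev + fd_eps) - grad_L(q, b_prev)) / fd_eps
        q = q + (b_next - b_prev) * np.linalg.solve(H, -dgrad_dbeta)
        # Corrector: a few Newton steps at the new beta to land back on the path.
        for _ in range(newton_iters):
            q = q - np.linalg.solve(hess_L(q, b_next), grad_L(q, b_next))
        path.append((b_next, q.copy()))
    return path

betas = np.linspace(0.1, 5.0, 50)
path = continuation(np.zeros(3), betas)
print(path[-1])  # (beta, q*) at the end of the schedule; here q* = beta * c
```

On the toy quadratic the corrector converges in one step; the point is only the predictor-corrector structure, not the objective itself.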
*How to run*
```bash
git clone https://github.com/farukalpay/information-bottleneck-beta-op...
cd information-bottleneck-beta-optimization/code_v3_Stable_Continuation
pip install -r requirements.txt # numpy, scipy, jax optional
python stable_continuation_ib.py
```

* Figures land in `ib_plots/` (info-plane, Hessian eigenvalues, encoder heatmaps).
* Total runtime ≈ 3 s with NumPy, < 1 s with JAX on GPU.
*Why 2 × 2 and 8 × 8?*
I wanted minimal cases where standard IB shows hard jumps; the convex version keeps the same asymptotic optimum but removes the discontinuity.
*Looking for feedback on*
1. Extending the continuation trick to Variational IB / deep encoders.
2. Any theoretical caveats of the convex surrogate at very high β.
3. Real datasets where phase-jump-free IB would be handy.
Code is MIT, paper is CC-BY-4.0. Feel free to fork / reuse—stars and PRs welcome!