I'm a solo researcher (and second-year law student) building tools at the intersection of information theory and control systems for AI/ML. Inspired by Claude Shannon's work at Bell Labs, I created the Shannon Control Unit (SCU): cruise control for neural network training.
SCU senses the information ratio during training and auto-adjusts λ via a PI controller, so information is introduced steadily and efficiently.
The mechanism dynamically maintains a target Shannon Information Ratio, S = ParamBPT / (DataBPT + ParamBPT), where BPT is bits per token.
No more manual hyperparameter tuning: it self-regulates λ for stability under data drift and faster generalization.
Core formula: adjust λ via λ_new = λ · exp(-(Kp·error + Ki·I)), where error is the gap between the measured S and its target and I is the running integral of that error.
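In code, one controller step looks roughly like this. The gain values and the sign convention (error = target minus measurement, the usual PI setup) are my assumptions for illustration, not necessarily what ships in the repo:

```python
import math

def scu_step(lam, data_bpt, param_bpt, s_target, integral, kp=0.8, ki=0.1):
    """One SCU controller step (sketch; gains and sign convention are illustrative)."""
    s = param_bpt / (data_bpt + param_bpt)          # Shannon Information Ratio
    error = s_target - s                            # setpoint minus measurement (assumed convention)
    integral += error                               # accumulated integral term
    lam *= math.exp(-(kp * error + ki * integral))  # multiplicative PI update from the post
    return lam, integral
```

With this convention, when S runs above target, λ grows and squeezes ParamBPT back down (assuming λ weights the parameter-bits term in the objective).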
An ablation shows the adaptive PI controller outperforms a fixed λ by up to 1.8% BPT.
Validated on Llama-3.2:
- 1B: -15.6% perplexity (15.14 → 12.78), -6.2% BPT
- 3B: -12.6% perplexity (3.56 → 3.11), -10.6% BPT
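To make the "adaptive vs. fixed λ" comparison concrete, here's a self-contained toy simulation of the controller pulling S toward a target. The 5% target, the gains, and the toy response model (ParamBPT shrinking as λ grows) are all illustrative assumptions, not measured values:

```python
import math

S_TARGET = 0.05     # illustrative target ratio; the post doesn't state the value used
KP, KI = 5.0, 0.1   # illustrative gains
lam, integral = 1.0, 0.0
data_bpt = 3.5      # toy stand-in for measured data bits/token

for step in range(20):
    param_bpt = 0.4 / lam                   # toy response: a stronger penalty shrinks param bits
    s = param_bpt / (data_bpt + param_bpt)  # measured info-ratio
    error = S_TARGET - s
    integral += error
    lam *= math.exp(-(KP * error + KI * integral))
    print(f"step {step:2d}  S={s:.4f}  lambda={lam:.3f}")
```

A fixed-λ baseline is the same loop with the `lam *= ...` line deleted; S then just sits wherever the initial λ puts it.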
It's open-source under AGPL-3.0 (for those who want to build on it while sharing improvements back). Implemented as LoRA adapters via PEFT/Transformers; load them on Meta's base models.
Quick start:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the Llama-3.2-1B base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# Apply the SCU-trained LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, "hunterbown/shannon-control-unit")
```
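Once the adapter is loaded, a quick generation sanity check (the prompt here is arbitrary):

```python
# Continues from the quick start above
inputs = tokenizer("Information theory tells us that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```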
Try the Colab demo: https://colab.research.google.com/github/Hmbown/shannon-cont...
HF space: https://huggingface.co/hunterbown/shannon-control-unit
X thread for more context: https://x.com/huntermbown/status/1963802419785039878
DMs are open for feedback or 7B+ scale partners; happy to offer a 2-week trial to replicate the results.
What do you think: does this generalize beyond 3B? Going from 1B to 3B meant re-discovering the new model's natural operating point, so I suspect each model has a natural equilibrium ratio at which it trains most efficiently with this method.