RNN with hierarchical attention
The paper "Hierarchical Reasoning Model" was recently released. It shows that a recurrent neural network can be used for language modeling. The big thing missing is that it still relies on RoPE and the transformer architecture, so handling large context windows with high precision remains limited. The authors also mention that HRM is Turing complete, and while it is much closer to being Turing complete, I would argue that to be fully Turing complete the system should also be able to use unbounded memory. But it is very hard to train a model end to end for that, since it has to make complex decisions about what to store.

This article describes a model where infinite memory is not solved; I imagine that functionality can be bolted on later with RL using context-space thinking. The underlying end-to-end trained model, however, will do recurrent thinking over very big context windows. We also have to design performance into training: when the model is released to end users, it should work fast and e...