
Showing posts from July, 2025

RNN with hierarchical attention

The paper "Hierarchical Reasoning Model" has recently been released. It shows that a recurrent neural network can be used for an LLM. The big thing missing is that they still rely on RoPE and the transformer architecture, so handling large context windows with high precision is still limited. They also mention that HRM is Turing complete, and while it is much closer to being Turing complete, I would argue that to be fully Turing complete the system should also be able to use infinite memory. But it is very hard to have an end-to-end trained model for that, since it has to make complex decisions. This article will describe a model where infinite memory is not solved, and I imagine that functionality can be bolted on with RL using context space thinking. But the underlying end-to-end trained model will have recurrent thinking on very big context windows. We also have to take performance into account during training. When releasing the model to the end users, it should work fast and e...
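
To make the "recurrent thinking on very big context windows" point concrete, here is a minimal sketch (my own toy example, not the HRM architecture or the model proposed here): a fixed-size recurrent state is carried across an arbitrarily long token stream, so memory does not grow with context length the way full RoPE-based attention does. All names and sizes are assumptions.

```python
# Toy illustration (hypothetical, simplified): a fixed-size recurrent state is
# updated token by token, so the cost of "reading" the context is independent
# of how long that context is.
import torch
import torch.nn as nn

class RecurrentContextReader(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.cell = nn.GRUCell(d_model, d_model)  # stand-in for a richer recurrent block

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (seq_len,) -- the state stays O(d_model) no matter the length
        state = torch.zeros(1, self.cell.hidden_size)
        for emb in self.embed(token_ids):          # walk over the long context
            state = self.cell(emb.unsqueeze(0), state)
        return state                               # fixed-size summary for downstream thinking

reader = RecurrentContextReader()
summary = reader(torch.randint(0, 32000, (10_000,)))  # a context well beyond a typical window
print(summary.shape)  # torch.Size([1, 512])
```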

RNN LLM with small models

A lot of avenues have been explored, like RWKV and its descendants. This proposal is very similar, but adds some additional elements to make the recurrent part shine. The main takeaway is that instead of having a lot of transformer blocks, we just have one transformer block for entering thinking mode, one for exiting, and the RNN block for doing the thinking. We have different sizes of the RNN thinking block, and use RL as the last step in the training to unlock it. The end result would be a Turing-complete LLM, but that is too hard to train. So this proposal is a middle ground, where the thinking transformer, due to its recurrent nature, will be able to generalize to skills and knowledge not in the training set. We still use MoE, and the key idea in the algorithm is that we remove the typical 30-60 transformer blocks and rely on even more mixture of experts. Why is it powerful? Instead of having a fixed number of steps, where each transformer block gets closer and closer to a solution, it has the ability...
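
A hedged sketch of how I picture the shape of this (all module names, sizes, and the toy mixture of experts are my own assumptions, not a reference implementation): one transformer block to enter thinking mode, a recurrent block that is applied a variable number of steps instead of a fixed stack of 30-60 blocks, and one transformer block to exit.

```python
# Sketch under stated assumptions: enter block -> recurrent "thinking" block
# applied n_steps times (depth chosen at runtime) -> exit block. The MoE here
# is a dense toy version purely for illustration.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)               # (..., n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)    # (..., d_model, n_experts)
        return (outs * weights.unsqueeze(-2)).sum(dim=-1)

class RecurrentThinkerLM(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.enter = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.exit = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.think = ToyMoE(d_model)   # the recurrent thinking block, reused every step

    def forward(self, x, n_steps: int):
        h = self.enter(x)              # one block to enter thinking mode
        for _ in range(n_steps):       # depth is a runtime choice, not an architecture constant
            h = h + self.think(h)      # residual recurrent refinement
        return self.exit(h)            # one block to exit thinking mode

model = RecurrentThinkerLM()
x = torch.randn(1, 32, 256)                              # (batch, seq, d_model)
easy = model(x, n_steps=2)                               # same weights,
hard = model(x, n_steps=16)                              # different thinking depth
```

The point of the sketch is only the control flow: the same weights are iterated more or fewer times depending on the problem, which is where the generalization beyond a fixed-depth stack is supposed to come from.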

Zero day super intelligence

Current state: Grok 4 has been released and improves synthetic benchmarks by a large margin. But when using it for actual coding, it falls short. When I am coding, I want fast and correct responses that change only the minimal amount in an existing large code base. Gemini Pro and Claude are on par there, and Grok 4 brings no improvement for that use case. One of the reasons it does well on benchmarks is its prompt engineering with an agent specialization framework:

- Analysis Agent focuses on data interpretation
- Synthesis Agent combines multiple perspectives
- Verification Agent cross-checks reasoning accuracy
- Communication Agent translates findings coherently

While this is interesting, it doesn't really solve the benchmarks completely; it just enhances the capabilities of the underlying LLM, but is still limited by the underlying structure.

What if we could have an LLM that we could just train, and it would get 100% accuracy on all benchmarks, and would be ...
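
For illustration, this is roughly how such an agent specialization pipeline could be wired. The call_llm stub and the prompts are placeholders of my own; this is just the general pattern, not Grok 4's actual framework.

```python
# Hypothetical sketch of an agent specialization pipeline: each role gets its
# own system prompt and sees the previous agent's output. `call_llm` is a
# placeholder; wire it to whatever model API you actually use.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"[{system_prompt[:20]}...] response to: {user_prompt[:40]}"

AGENTS = {
    "analysis": "You are an Analysis Agent. Focus on interpreting the data.",
    "synthesis": "You are a Synthesis Agent. Combine multiple perspectives.",
    "verification": "You are a Verification Agent. Cross-check the reasoning.",
    "communication": "You are a Communication Agent. Present the findings coherently.",
}

def run_pipeline(task: str) -> str:
    context = task
    for role, system_prompt in AGENTS.items():
        context = call_llm(system_prompt, context)   # chain the specialized agents
    return context

print(run_pipeline("Estimate the impact of feature X on latency."))
```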