Posts

ASI - Experiments

Problem: coding on large code bases

I have been doing a lot of vibe coding, and current models deteriorate in quality when the problem gets too large and too detailed. This makes complete sense if you know how current transformers work, and is well described here: https://research.trychroma.com/context-rot

The LLM simply glosses over details, and when it then rewrites some code it forgets some instructions or previous details in the code, and you as a human have to keep adding these things back. It is possible for the framework to do patches instead, but the problem is the same: the LLM is not smart enough to produce a good patch.

I have a lot of experience with this, and if it were solved my life as a coder would be much easier. So I assume it is not a solved problem.

This page is a description of an LLM that should be able to handle large code bases, have deep knowledge of existing algorithms, and be smart enough to solve the problem asked. I have made some s...

RNN with hierarchical attention

The paper "Hierarchical Reasoning Model" has recently been released. It shows that a recurrent neural network can be used for an LLM. The big thing missing is that they still rely on RoPE and the transformer architecture, so handling large context windows with high precision is still limited. They also mention that HRM is Turing complete, and while it is much closer to being Turing complete, I would argue that to be fully Turing complete the system should also be able to use infinite memory. But it is very hard to have an end-to-end trained model for that, since it has to make complex decisions. This article will describe a model where infinite memory is not solved; I imagine that functionality can be bolted on with RL using context-space thinking. But the underlying end-to-end trained model will have recurrent thinking on very big context windows.

We also have to design performance into training. When releasing the model to the end users, it should work fast and e...
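The core property of recurrent thinking on big context windows can be sketched in a few lines. This is a minimal, hypothetical illustration (random weights standing in for trained blocks, names invented here): the whole context is folded into a fixed-size state, one chunk at a time, so memory cost does not grow with context length the way full attention does.

```python
import numpy as np

def recurrent_context(chunks, d=16, seed=0):
    """Hypothetical sketch: fold an arbitrarily long context into a
    fixed-size recurrent state. Per-chunk cost is constant, unlike
    attention, whose cost grows with the context length."""
    rng = np.random.default_rng(seed)
    W_h = rng.normal(scale=0.1, size=(d, d))  # recurrent weights (random stand-ins)
    W_x = rng.normal(scale=0.1, size=(d, d))  # input weights (random stand-ins)
    h = np.zeros(d)
    for x in chunks:                      # one update per context chunk
        h = np.tanh(W_h @ h + W_x @ x)    # state stays size d forever
    return h                              # fixed-size summary of the whole context
```

Whether 100 or 100,000 chunks are fed in, the state `h` keeps the same size; the open question the post raises is how to make such a state precise enough for code.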

RNN LLM with small models

A lot of avenues have been explored, like RWKV and its descendants. This proposal is very similar, but adds some additional elements to make the recurrent part shine. The main takeaway is that instead of having a lot of transformer blocks, we have just one transformer block for entering thinking mode, one for exiting it, and an RNN block for doing the thinking. We have different sizes of the RNN thinking block, and use RL as the last step in training to unlock it. The end result would be a Turing-complete LLM, but that is too hard to train. So this proposal is a middle ground, where the thinking block, due to its recurrent nature, will be able to generalize to skills and knowledge not in the training set. We still use MoE, and the key idea in the algorithm is that we remove the typical 30-60 transformer blocks and rely on even more mixture of experts. Why is it powerful? Instead of having a fixed number of steps, where each transformer block gets closer and closer to a solution, it has the ability...
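The enter/think/exit structure described above can be sketched as follows. This is a toy sketch under stated assumptions, not the proposal's implementation: random matrices stand in for the trained enter, thinking, and exit blocks, and the point is only that the thinking depth is chosen at runtime instead of being fixed by a 30-60 layer stack.

```python
import numpy as np

class ThinkingRNN:
    """Hypothetical sketch: one block to enter thinking mode, one shared
    recurrent block applied a variable number of times, one block to exit.
    All weights are random stand-ins for trained layers."""

    def __init__(self, d=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enter = rng.normal(scale=0.1, size=(d, d))  # replaces the first transformer block
        self.W_think = rng.normal(scale=0.1, size=(d, d))  # the single recurrent thinking block
        self.W_exit = rng.normal(scale=0.1, size=(d, d))   # replaces the last transformer block

    def forward(self, x, steps):
        h = np.tanh(self.W_enter @ x)     # enter thinking mode
        for _ in range(steps):            # depth chosen per problem, not fixed at 30-60
            h = np.tanh(self.W_think @ h)
        return self.W_exit @ h            # exit thinking mode
```

The contrast with a standard transformer is that `steps` is an argument: an easy problem can use few iterations, a hard one many, with the same weights.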

Zero day super intelligence

Current state: Grok 4 has been released and improves synthetic benchmarks by a large margin. But when using it for actual coding it falls through. When I am coding I want fast and correct responses that change only the minimal amount in an existing large code base. Gemini Pro and Claude are on par, and Grok 4 is no improvement for that use case. One of the reasons it does well on benchmarks is its prompt engineering with an agent specialization framework:

- Analysis Agent focuses on data interpretation
- Synthesis Agent combines multiple perspectives
- Verification Agent cross-checks reasoning accuracy
- Communication Agent translates findings coherently

While this is interesting, it doesn't really solve the benchmarks completely; it just enhances the capabilities of the underlying LLM, which is still limited by the underlying structure.

What if we could have an LLM that we could just train, and it would get 100% accuracy on all benchmarks, and would be ...
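The four-role framework above is essentially a fixed pipeline over a shared state. A minimal sketch, with all agent bodies invented here as trivial stand-ins (a real system would call an LLM at each stage):

```python
def analysis_agent(state):
    # data interpretation: dedupe and order the raw observations
    state["facts"] = sorted(set(state["data"]))
    return state

def synthesis_agent(state):
    # combine multiple perspectives into one draft
    state["draft"] = " and ".join(state["facts"])
    return state

def verification_agent(state):
    # cross-check: every fact must survive into the draft
    state["verified"] = all(f in state["draft"] for f in state["facts"])
    return state

def communication_agent(state):
    # translate findings into the final report
    status = "verified" if state["verified"] else "unverified"
    state["report"] = f"{state['draft']} ({status})"
    return state

def run_pipeline(data):
    """Hypothetical sketch of the four-role agent chain."""
    state = {"data": data}
    for agent in (analysis_agent, synthesis_agent,
                  verification_agent, communication_agent):
        state = agent(state)
    return state["report"]
```

This also shows the limitation noted in the post: each stage can only rearrange and check what the underlying model produced; the pipeline adds no capability of its own.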

How to create super intelligence

With the advent of thinking models, or test-time compute as it is called, AI has reached a milestone. It could probably be called AGI, and continuing to refine this path can give very strong models. But it is still not very close to the theoretical maximum of how good these models can get. To get a little closer, here are six ingredients for achieving super intelligence. The underlying reason why this will probably work can be found in this paper: https://arxiv.org/html/2305.17026v3, "How Powerful are Decoder-Only Transformer Neural Models?" If the transformer architecture is Turing complete (meaning it can run like a computer program, and thus solve any problem), then we can probably also assume that we can have the architecture exhibit Turing-like properties, running not as a program but as reasoning. In other words, when the number of layers and parameters rises, we can probably simulate something that is more complex than a circuit (and we already know we can simulate a circui...

Strawberry I-don't-know, and an agent implementation

Reinforcement with 'i-don't-know'

OpenAI has just released Strawberry/o1, and we are now very close to AGI. It is pretty much using the same technique that I have already outlined in a previous blog post: create synthetic data with trains of thought, where each thought is a Q/A. This turns the problem into a reinforcement problem, where the evaluation of each answer can be scored. One crucial thing they unfortunately didn't do was to mark answers that are wrong as 'i-don't-know'. The algorithm would then be to train with a small neural net in the first round, find all answers that are wrong, and then train again with a bigger neural net. Scoring-wise, a wrong answer is worse than saying 'i-don't-know', which in turn is worse than the correct answer. Since the final neural net is bigger than the one originally trained, it should have a high chance of answering 'i-don't-know' when it needs to.

Agent

Having a strong LLM is just one part of AGI. The AGI also needs to be able t...
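The scoring order and the two-round relabeling step can be sketched directly. The reward values below are hypothetical (only their ordering matters, as the post specifies), and `round_one_predict` stands in for the small first-round model:

```python
WRONG, IDK, CORRECT = -1.0, 0.0, 1.0  # hypothetical values; only wrong < idk < correct matters

def reward(answer, truth):
    """Score one answer: wrong < 'i-don't-know' < correct."""
    if answer == "i-don't-know":
        return IDK
    return CORRECT if answer == truth else WRONG

def relabel_for_round_two(dataset, round_one_predict):
    """Second-round targets: every answer the small round-one model got
    wrong becomes 'i-don't-know', so the bigger round-two model is
    trained to abstain exactly where the small model failed."""
    return [(q, a if round_one_predict(q) == a else "i-don't-know")
            for q, a in dataset]
```

Under this scheme the bigger model is never rewarded for confidently repeating a mistake the small model made; abstaining always scores higher than being wrong.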