Posts

Auto generalization

My current strategy is two-legged. The first leg is to find papers with benchmarks that current LLMs struggle with, and show that my Turing-complete LLM can solve them. The second is to show that it can also solve challenges that current LLMs are good at. One of those things is that LLMs are really good at answering questions when they have enough knowledge of a problem. If we ask ChatGPT how it has knowledge about cities, it answers this: "I rely on knowledge encoded from geographic and linguistic data up to my training cutoff (mid-2024), which includes global place name databases such as Geonames, Wikidata, national census gazetteers, and academic or development sources (e.g., UN and World Bank geographic datasets). So when you mentioned “Boulma,” I recognized it as matching entries from Burkina Faso, where several small villages with that name appear in those official datasets. I didn’t search the web — it’s based on general geographic knowledge and structured data I was traine...

ASI framework strategy

The previous experiments have shown that a path to some of the critical components for ASI is possible: https://hardai-omnia.blogspot.com/2025/08/asi-experiments.html. But this page tries to better explain the overarching framework, since many pieces are still missing in the implementation. I am open to ideas about what to train next. To be precise about what we are describing: an LLM that works exactly like a normal LLM, except that it is extremely good at retaining details and using them from a very large context window. And when training, it will optimize both for correctness and performance. But it is limited just like a normal LLM, and should be put into an agentic framework. So agentic frameworks that do RAG, coding, connect to databases and so on are still very much needed. But this LLM will use whatever you put in the context window in the best way possible. We should probably also touch briefly on how AI companies will make money in the future. There is a race to the bott...
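To make the division of labor concrete, here is a minimal sketch of an agentic loop around such an LLM, under the assumption that the LLM itself only consumes whatever the framework places in its context window. All names (call_llm, retrieve_documents, agent_loop) are hypothetical placeholders, not a real API:

```python
def call_llm(context: str) -> str:
    # Placeholder for the underlying LLM; here it just returns a canned
    # decision so the loop is runnable.
    if "relevant docs:" in context:
        return "ANSWER: done"
    return "ACTION: retrieve"

def retrieve_documents(query: str) -> str:
    # Placeholder RAG step (database lookup, code search, web search, ...).
    return f"relevant docs: results for '{query}'"

def agent_loop(task: str, max_steps: int = 5) -> str:
    context = f"task: {task}"
    for _ in range(max_steps):
        reply = call_llm(context)
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        # The framework, not the LLM, executes the tool and appends the
        # result to the context window for the next pass.
        context += "\n" + retrieve_documents(task)
    return "no answer within step budget"
```

The point of the sketch is the separation: the framework handles tools and retrieval, while the LLM's only job is to exploit everything in the context window as well as possible.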

ASI - Experiments

Problem - coding on large code bases. I have been doing a lot of vibe coding, and current models deteriorate in quality when the problem gets too large and too detailed. This makes complete sense if you know how current transformers work, and it is well described here: https://research.trychroma.com/context-rot. The LLM simply glosses over details, and when it then rewrites some code, it forgets some instructions or previous details in the code, and you as a human have to keep adding these things back. It is possible for the framework to do patches, but the problem is the same: the LLM is not smart enough to do a good patch. I have a lot of experience with this, and if it were solved, my life as a coder would be much easier. So I assume it is not a solved problem. So this page is a description of an LLM that should be able to handle large code bases, have deep knowledge of the algorithms that exist, and be smart enough to solve the problem asked. I have made some s...

RNN with hierarchical attention

The paper "Hierarchical Reasoning Model" has recently been released. It shows that a recurrent neural network can be used as an LLM. The big thing missing is that they still rely on RoPE and the transformer architecture, so handling large context windows with high precision is still limited. They also mention that HRM is Turing complete, and while it is much closer to being Turing complete, I would argue that to be fully Turing complete the system should also be able to use infinite memory. But it is very hard to have an end-to-end trained model for that, since it has to make complex decisions. This article will describe a model where infinite memory is not solved, and I imagine that functionality can be bolted on with RL using context-space thinking. But the underlying end-to-end trained model will have recurrent thinking on very big context windows. We also have to have performance built into the training: when releasing the model to end users, it should work fast and e...
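The infinite-memory objection can be illustrated with a toy recurrence: however long the input stream, a recurrent state of fixed width can only hold a constant number of values, so memory is bounded rather than Turing-complete. This is an illustrative sketch only, not the HRM architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # fixed state width, independent of context length

W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))  # state transition
W_x = rng.normal(scale=0.1, size=(HIDDEN,))         # input projection

def think(tokens: list) -> np.ndarray:
    """Fold an arbitrarily long token stream into a fixed-size state."""
    h = np.zeros(HIDDEN)
    for x in tokens:
        h = np.tanh(W_h @ h + W_x * x)  # one recurrent thinking step
    return h

short = think([0.1, 0.2])
long = think(list(np.linspace(0.0, 1.0, 10_000)))
# Both states have identical shape regardless of how much was read,
# which is exactly the bounded-memory limitation discussed above.
```

Bolting external memory on top (e.g. via RL over context-space actions, as suggested above) is what would lift this fixed-width bottleneck.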

RNN LLM with small models

A lot of avenues have been explored, like RWKV and its descendants. This proposal is very similar, but adds some additional elements to make the recurrent part shine. The main takeaway is that instead of having a lot of transformer blocks, we just have one transformer block for entering thinking mode, one for exiting, and the RNN block for doing the thinking. We have different sizes of the RNN thinking block, and use RL as the last step in the training to unlock it. The end result is a Turing-complete LLM, but that is too hard to train. So this proposal is a middle ground, where the thinking transformer, due to its recurrent nature, will be able to generalize to skills and knowledge not in the training set. We still use MoE, and the key idea in the algorithm is that we remove the typical 30-60 transformer blocks and rely on even more mixture of experts. Why is it powerful? Instead of having a fixed number of steps, where each transformer block gets closer and closer to a solution, it has the ability...
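The enter/think/exit layout above can be sketched as a tiny forward pass: one entry transform, a variable number of passes through a single recurrent thinking block (stopping when the state settles, rather than after a fixed 30-60 block stack), and one exit transform. The weights and the convergence rule are illustrative assumptions, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16  # model width

W_enter = rng.normal(scale=0.1, size=(D, D))  # stands in for the entry transformer block
W_think = rng.normal(scale=0.1, size=(D, D))  # the single recurrent thinking block
W_exit = rng.normal(scale=0.1, size=(D, D))   # stands in for the exit transformer block

def forward(x: np.ndarray, max_steps: int = 50, tol: float = 1e-4):
    h = np.tanh(W_enter @ x)                  # enter thinking mode
    step = 0
    for step in range(1, max_steps + 1):
        h_next = np.tanh(W_think @ h)         # one recurrent refinement step
        done = np.linalg.norm(h_next - h) < tol
        h = h_next
        if done:                              # stop when thinking has converged
            break
    return np.tanh(W_exit @ h), step          # exit thinking mode

out, steps_used = forward(rng.normal(size=D))
```

The variable step count is the whole point: easy inputs exit early, hard inputs spend more compute in the same small block, instead of every input paying for a fixed-depth stack.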

Zero day super intelligence

Current state: Grok 4 has been released and improves synthetic benchmarks by a large margin. But when using it for actual coding, it falls through. When I am coding, I want fast and correct responses that change only the minimal amount in an existing large code base. Gemini Pro and Claude are on par, and Grok 4 is no improvement for that use case. One of the reasons it does well on benchmarks is its prompt engineering with an agent specialization framework:
- Analysis Agent focuses on data interpretation
- Synthesis Agent combines multiple perspectives
- Verification Agent cross-checks reasoning accuracy
- Communication Agent translates findings coherently
While this is interesting, it doesn't really solve the benchmarks completely; it just enhances the capabilities of the underlying LLM, but it is still limited by the underlying structure. What if we could have an LLM that we could just train, and it would get 100% accuracy on all benchmarks, and would be ...
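The agent specialization framework above amounts to a pipeline where each stage consumes the previous stage's output. Here is a sketch with each agent reduced to a placeholder function; the agent names come from the post, while the function bodies are illustrative only (in practice every call would be backed by the underlying LLM):

```python
def analysis_agent(data: str) -> str:
    return f"analysis({data})"            # data interpretation

def synthesis_agent(analysis: str) -> str:
    return f"synthesis({analysis})"       # combine perspectives

def verification_agent(synthesis: str) -> str:
    return f"verified({synthesis})"       # cross-check reasoning

def communication_agent(verified: str) -> str:
    return f"report: {verified}"          # translate findings coherently

def pipeline(data: str) -> str:
    # Stages are chained; the structure is fixed, which is why the
    # pipeline can only enhance, not replace, the underlying LLM.
    return communication_agent(
        verification_agent(synthesis_agent(analysis_agent(data)))
    )

print(pipeline("raw benchmark data"))
# → report: verified(synthesis(analysis(raw benchmark data)))
```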

How to create super intelligence

With the advent of thinking models, or test-time compute as it is called, AI has reached a milestone. It could probably be called AGI, and continuing to refine this path can give very strong models. But it is still not very close to the theoretical maximum of how good these models can get. To get a little closer, here are six ingredients for achieving super intelligence. The underlying reason why this will probably work can be found in this paper: https://arxiv.org/html/2305.17026v3 "How Powerful are Decoder-Only Transformer Neural Models?" If the transformer architecture is Turing complete (meaning it can run like a computer program, and thus solve any problem), then we can probably also assume that the architecture can exhibit Turing-like properties, running not as a program but as reasoning. In other words, when the number of layers and parameters rises, we can probably simulate something that is more complex than a circuit (and we already know we can simulate a circui...