Recurrent neural networks (RNNs) are a class of neural network architecture widely used in deep learning. Unlike traditional feed-forward networks, RNNs maintain a hidden state that acts as a memory of what has been computed so far. In other words, they use information from previous inputs to influence the output they produce.
RNNs are called “recurrent” because they apply the same operation to each element of a sequence, with each result depending on the previous computations. RNNs still underpin smart technologies such as Apple's Siri and Google Translate.
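To make the idea of recurrence concrete, here is a minimal sketch of a vanilla (Elman-style) RNN cell in Python with NumPy. The dimensions, weights, and sequence are illustrative only and do not come from any particular model.

import numpy as np

# Illustrative toy dimensions.
input_size, hidden_size = 8, 16
rng = np.random.default_rng(0)

# Parameters of a single vanilla RNN cell, shared across all time steps.
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Apply the same transformation at every time step; the hidden
    state h carries information about everything seen so far."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Process a sequence of 5 input vectors, reusing the same weights each step.
sequence = rng.standard_normal((5, input_size))
h = np.zeros(hidden_size)        # the network's "memory" starts empty
for x_t in sequence:
    h = rnn_step(x_t, h)         # each output depends on all previous inputs
print(h.shape)                   # (16,) -- the memory stays the same size regardless of sequence length

Note that the loop must run step by step: each hidden state depends on the previous one, which is exactly why this style of model is hard to parallelize during training.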
However, the advent of transformer-based models such as ChatGPT has changed the natural language processing (NLP) landscape. While transformers revolutionized NLP tasks, their memory and computational cost grow quadratically with sequence length, demanding far more resources.
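The difference in scaling is easy to see with a back-of-the-envelope comparison; the numbers below are illustrative and not taken from any specific model.

# Self-attention materializes an n x n score matrix per head, so memory grows
# quadratically with sequence length n, while an RNN keeps a fixed-size state.
hidden_size = 1_024                    # illustrative model width
for n in (1_024, 4_096, 16_384):
    attention_scores = n * n           # entries in one attention matrix (quadratic in n)
    rnn_state = hidden_size            # entries in the RNN's memory (independent of n)
    print(f"n={n:>6}: attention ~{attention_scores:>12,} values, RNN state {rnn_state:,} values")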
Enter RWKV
Now, a new open source project, RWKV, offers a promising answer to the GPU power conundrum. The project, supported by the Linux Foundation, aims to dramatically reduce the computing requirements for GPT-class large language models (LLMs), potentially by up to 100 times.
RNNs scale linearly in memory and compute, but they struggle to match the performance of transformers because their sequential nature limits parallelization and scalability during training. This is where RWKV comes into play.
RWKV, or Receptance Weighted Key Value, is a novel model architecture that combines the parallelizable training of transformers with the efficient inference of RNNs. The result? A model that requires far fewer resources (VRAM, CPU, GPU, etc.) to run and train while maintaining high-quality performance. It also scales linearly with context length and is generally better trained on languages other than English.
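The sketch below gives a rough sense of how RWKV's recurrent inference mode works, in NumPy. It loosely follows the published RWKV-4 time-mixing ("WKV") recurrence in simplified form, ignoring the token-shift, channel-mixing, and numerical-stability details of the real implementation; all names, sizes, and parameter values are illustrative assumptions.

import numpy as np

# Simplified per-channel WKV recurrence. At each step the model keeps only
# two running accumulators (a, b) per channel, so inference cost and memory
# are constant per token -- like an RNN -- no matter how long the context is.
d = 4                                     # illustrative channel count
rng = np.random.default_rng(0)

w = np.full(d, 0.5)                       # per-channel decay (made-up value; learned in the real model)
u = np.zeros(d)                           # per-channel "bonus" for the current token (learned in the real model)

def wkv_step(k_t, v_t, a, b):
    """One recurrent step: emit a weighted average of past values,
    then fold the current (k, v) into the decayed running state."""
    wkv = (a + np.exp(u + k_t) * v_t) / (b + np.exp(u + k_t))
    a = np.exp(-w) * a + np.exp(k_t) * v_t    # decayed numerator state
    b = np.exp(-w) * b + np.exp(k_t)          # decayed denominator state
    return wkv, a, b

# Run over a toy sequence of keys and values.
T = 6
keys = rng.standard_normal((T, d)) * 0.1
values = rng.standard_normal((T, d))
a = np.zeros(d)
b = np.zeros(d)
for t in range(T):
    out, a, b = wkv_step(keys[t], values[t], a, b)
print(out.shape, a.shape)                 # (4,) (4,) -- state size never grows with context length

During training, the same computation can be expressed as a sum over past positions and evaluated for all time steps at once, which is what gives RWKV its transformer-like parallelism.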
Despite these promising features, the RWKV model is not without its challenges. It is sensitive to prompt formatting and weaker on tasks that require looking back over long spans of context. However, these issues are being addressed, and the potential benefits of the model far outweigh its current limitations.
The implications of the RWKV project are profound. Instead of needing 100 GPUs to train an LLM, an RWKV model could deliver similar results with fewer than 10. This not only makes the technology more accessible but also opens up possibilities for future advances.