This paper proposes NEFTune, a simple trick: adding random noise to embedding vectors during training, which improves the outcome of instruction fine-tuning by a large margin. If you are using the SFT trainer from Hugging Face, you can apply this trick by adding a single line of code!
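For intuition, here is a minimal sketch of the mechanism itself (function name and shapes are illustrative, not the authors' code): NEFTune samples uniform noise and scales it by alpha / sqrt(L * d), where L is the sequence length and d the embedding dimension.

```python
import torch

def neftune_noise(embeddings: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Add NEFTune noise to token embeddings (training time only).

    Noise is sampled from Uniform(-1, 1) and scaled by alpha / sqrt(L * d),
    where L is the sequence length and d the embedding dimension.
    """
    seq_len, dim = embeddings.shape[-2], embeddings.shape[-1]
    scale = alpha / (seq_len * dim) ** 0.5
    noise = torch.empty_like(embeddings).uniform_(-scale, scale)
    return embeddings + noise

# Toy usage: a batch of 2 sequences, 16 tokens, 512-dim embeddings
noisy = neftune_noise(torch.randn(2, 16, 512), alpha=5.0)
```

In TRL, the advertised one-liner is passing `neftune_noise_alpha=5` when setting up the `SFTTrainer` (its exact location, trainer argument vs. `SFTConfig`, varies across versions).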
This paper proposes the knowledge prefix adapter (KoPA), an approach that injects pretrained structural embeddings of knowledge-graph entities and relations into the LLM's input as a prefix, thereby enhancing the structure-aware reasoning ability of the model.
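A rough sketch of the idea under my reading of the paper (class name and dimensions are hypothetical, not the authors' code): structural embeddings of a triple are projected into the LLM's token-embedding space and prepended as virtual prefix tokens.

```python
import torch
import torch.nn as nn

class KnowledgePrefixAdapter(nn.Module):
    """Illustrative KoPA-style adapter: maps pretrained KG structural
    embeddings into the LLM token-embedding space as a prefix."""

    def __init__(self, kg_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(kg_dim, llm_dim)

    def forward(self, head: torch.Tensor, rel: torch.Tensor,
                tail: torch.Tensor, token_embeds: torch.Tensor) -> torch.Tensor:
        # Stack the triple's structural embeddings: (batch, 3, kg_dim)
        triple = torch.stack([head, rel, tail], dim=1)
        prefix = self.proj(triple)  # (batch, 3, llm_dim)
        # Prepend the knowledge prefix to the text token embeddings
        return torch.cat([prefix, token_embeds], dim=1)

# Toy usage: 128-dim KG embeddings, 512-dim LLM embeddings, 10 text tokens
adapter = KnowledgePrefixAdapter(kg_dim=128, llm_dim=512)
h, r, t = torch.randn(2, 128), torch.randn(2, 128), torch.randn(2, 128)
out = adapter(h, r, t, torch.randn(2, 10, 512))  # -> (2, 13, 512)
```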
This paper introduces LLM-Planner, which uses LLMs as high-level planners for embodied agents, letting them generate plans and dynamically re-plan based on the current environment. Experiments on the ALFRED dataset show that, using less than 0.5% of the paired training data, LLM-Planner achieves performance competitive with recent baselines trained on the full training set.
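To make the control flow concrete, here is a hypothetical sketch of such a grounded re-planning loop (all function names are placeholders; this is not the paper's code):

```python
from typing import Callable, List

def grounded_replanning(
    instruction: str,
    observe_objects: Callable[[], List[str]],
    llm_generate_plan: Callable[[str, List[str]], List[str]],
    execute_subgoal: Callable[[str], bool],
    max_replans: int = 3,
) -> bool:
    """Illustrative loop: ask the LLM for a high-level plan conditioned on
    the instruction and the objects currently observed in the environment,
    and re-plan with the updated observation whenever a subgoal fails."""
    for _ in range(max_replans + 1):
        plan = llm_generate_plan(instruction, observe_objects())
        if all(execute_subgoal(step) for step in plan):
            return True  # every subgoal succeeded
    return False  # gave up after max_replans attempts

# Toy usage with stub components standing in for the LLM and the agent
done = grounded_replanning(
    "put a mug in the microwave",
    observe_objects=lambda: ["mug", "microwave"],
    llm_generate_plan=lambda instr, objs: [f"goto {o}" for o in objs],
    execute_subgoal=lambda step: True,
)
```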
This note includes insights from Yann LeCun, one of the pioneers of deep learning. In his talk, he discussed the limitations of current machine learning methods, including self-supervised learning. He emphasized the need for objective-driven AI and introduced a modular cognitive architecture built around a world model. He also presented the Joint-Embedding Predictive Architecture (JEPA), a new approach in this direction.
The paper proposes a new decoding method for open-ended text generation, called contrastive decoding (CD), which generates fluent, coherent, and informative text by searching for continuations that maximize the gap between the log-likelihoods of a large expert model and a small amateur model.
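A minimal sketch of one greedy decoding step, assuming both models' logits for the next token are available (the adaptive plausibility constraint follows the paper; the function itself is illustrative):

```python
import math
import torch

def contrastive_decoding_step(expert_logits: torch.Tensor,
                              amateur_logits: torch.Tensor,
                              alpha: float = 0.1) -> torch.Tensor:
    """One greedy CD step: score tokens by the expert-minus-amateur
    log-probability gap, restricted to an adaptive plausibility set
    (tokens the expert assigns at least alpha * its max probability)."""
    expert_logp = torch.log_softmax(expert_logits, dim=-1)
    amateur_logp = torch.log_softmax(amateur_logits, dim=-1)
    cutoff = expert_logp.max(dim=-1, keepdim=True).values + math.log(alpha)
    scores = (expert_logp - amateur_logp).masked_fill(
        expert_logp < cutoff, float("-inf"))
    return scores.argmax(dim=-1)

# Toy usage over a 100-token vocabulary
next_token = contrastive_decoding_step(torch.randn(1, 100), torch.randn(1, 100))
```

The plausibility cutoff is what keeps CD from rewarding tokens that are merely unlikely under the amateur; without it, the expert-minus-amateur objective can favor implausible tokens.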