This paper proposes NEFTune, a simple trick: adding random noise to embedding vectors during training, which improves the outcome of instruction fine-tuning by a large margin. If you are using the SFT trainer from Hugging Face, you can apply this trick by adding a single line of code!
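For intuition, here is a minimal sketch of the mechanism itself (function name and shapes are illustrative, not the authors' code): NEFTune samples uniform noise and scales it by alpha / sqrt(L * d), where L is the sequence length and d the embedding dimension.

```python
import torch

def neftune_noise(embeddings: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Add NEFTune noise to token embeddings (training time only).

    Noise is sampled from Uniform(-1, 1) and scaled by alpha / sqrt(L * d),
    where L is the sequence length and d the embedding dimension.
    """
    seq_len, dim = embeddings.shape[-2], embeddings.shape[-1]
    scale = alpha / (seq_len * dim) ** 0.5
    noise = torch.empty_like(embeddings).uniform_(-scale, scale)
    return embeddings + noise

# Toy usage: a batch of 2 sequences, 16 tokens, 512-dim embeddings
noisy = neftune_noise(torch.randn(2, 16, 512), alpha=5.0)
```

In TRL, the advertised one-liner is passing `neftune_noise_alpha=5` when setting up the `SFTTrainer` (its exact location, trainer argument vs. `SFTConfig`, varies across versions).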
This paper proposes the knowledge prefix adapter (KoPA), an approach that injects pretrained structural embeddings of knowledge-graph entities and relations into the LLM's input as a prefix, thereby enhancing the structure-aware reasoning ability of the model.
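A rough sketch of the idea under my reading of the paper (class name and dimensions are hypothetical, not the authors' code): structural embeddings of a triple are projected into the LLM's token-embedding space and prepended as virtual prefix tokens.

```python
import torch
import torch.nn as nn

class KnowledgePrefixAdapter(nn.Module):
    """Illustrative KoPA-style adapter: maps pretrained KG structural
    embeddings into the LLM token-embedding space as a prefix."""

    def __init__(self, kg_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(kg_dim, llm_dim)

    def forward(self, head: torch.Tensor, rel: torch.Tensor,
                tail: torch.Tensor, token_embeds: torch.Tensor) -> torch.Tensor:
        # Stack the triple's structural embeddings: (batch, 3, kg_dim)
        triple = torch.stack([head, rel, tail], dim=1)
        prefix = self.proj(triple)  # (batch, 3, llm_dim)
        # Prepend the knowledge prefix to the text token embeddings
        return torch.cat([prefix, token_embeds], dim=1)

# Toy usage: 128-dim KG embeddings, 512-dim LLM embeddings, 10 text tokens
adapter = KnowledgePrefixAdapter(kg_dim=128, llm_dim=512)
h, r, t = torch.randn(2, 128), torch.randn(2, 128), torch.randn(2, 128)
out = adapter(h, r, t, torch.randn(2, 10, 512))  # -> (2, 13, 512)
```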
This paper introduces LLM-Planner, which uses LLMs as high-level planners for embodied agents, letting them generate plans and dynamically re-plan based on the current environment. Experiments on the ALFRED dataset show that, using less than 0.5% of the paired training data, LLM-Planner achieves performance competitive with recent baselines trained on the full training set.
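To make the control flow concrete, here is a hypothetical sketch of such a grounded re-planning loop (all function names are placeholders; this is not the paper's code):

```python
from typing import Callable, List

def grounded_replanning(
    instruction: str,
    observe_objects: Callable[[], List[str]],
    llm_generate_plan: Callable[[str, List[str]], List[str]],
    execute_subgoal: Callable[[str], bool],
    max_replans: int = 3,
) -> bool:
    """Illustrative loop: ask the LLM for a high-level plan conditioned on
    the instruction and the objects currently observed in the environment,
    and re-plan with the updated observation whenever a subgoal fails."""
    for _ in range(max_replans + 1):
        plan = llm_generate_plan(instruction, observe_objects())
        if all(execute_subgoal(step) for step in plan):
            return True  # every subgoal succeeded
    return False  # gave up after max_replans attempts

# Toy usage with stub components standing in for the LLM and the agent
done = grounded_replanning(
    "put a mug in the microwave",
    observe_objects=lambda: ["mug", "microwave"],
    llm_generate_plan=lambda instr, objs: [f"goto {o}" for o in objs],
    execute_subgoal=lambda step: True,
)
```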
This note includes insights from Yann LeCun, one of the pioneers of deep learning. In his talk, he discussed the limitations of current machine learning methods, including self-supervised learning. He emphasized the need for objective-driven AI and introduced a modular cognitive architecture built around a world model. He also presented the Joint-Embedding Predictive Architecture (JEPA), a new approach in this direction.
The paper proposes a new decoding method for open-ended text generation, called contrastive decoding (CD), which generates fluent, coherent, and informative text by searching for continuations that maximize the gap between the log-likelihoods of a large expert model and a small amateur model.
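A minimal sketch of one greedy decoding step, assuming both models' logits for the next token are available (the adaptive plausibility constraint follows the paper; the function itself is illustrative):

```python
import math
import torch

def contrastive_decoding_step(expert_logits: torch.Tensor,
                              amateur_logits: torch.Tensor,
                              alpha: float = 0.1) -> torch.Tensor:
    """One greedy CD step: score tokens by the expert-minus-amateur
    log-probability gap, restricted to an adaptive plausibility set
    (tokens the expert assigns at least alpha * its max probability)."""
    expert_logp = torch.log_softmax(expert_logits, dim=-1)
    amateur_logp = torch.log_softmax(amateur_logits, dim=-1)
    cutoff = expert_logp.max(dim=-1, keepdim=True).values + math.log(alpha)
    scores = (expert_logp - amateur_logp).masked_fill(
        expert_logp < cutoff, float("-inf"))
    return scores.argmax(dim=-1)

# Toy usage over a 100-token vocabulary
next_token = contrastive_decoding_step(torch.randn(1, 100), torch.randn(1, 100))
```

The plausibility cutoff is what keeps CD from rewarding tokens that are merely unlikely under the amateur; without it, the expert-minus-amateur objective can favor implausible tokens.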