Large Language Models

Training & Fine-tuning

A work claiming that not all tokens in a corpus are equally important for LLM training:

Lin, Zhenghao, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, et al. “Not All Tokens Are What You Need for Pretraining,” 2024. https://openreview.net/forum?id=0NMzBwqaAJ.
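
The core idea, roughly, is that the pretraining loss need not treat every token equally. The sketch below shows one illustrative way to do that in a standard language-modeling loss: score tokens by their per-token loss and back-propagate only through the highest-scoring fraction. This selection rule is an assumption for illustration, not necessarily the paper's exact criterion, which is more involved.

import torch
import torch.nn.functional as F

def selective_lm_loss(logits, targets, keep_ratio=0.6):
    # Per-token cross-entropy, shape (batch, seq_len).
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    # Keep only the keep_ratio fraction of tokens with the highest loss;
    # the remaining tokens contribute nothing to the gradient.
    k = max(1, int(keep_ratio * per_token.numel()))
    threshold = per_token.flatten().topk(k).values.min()
    mask = (per_token >= threshold).float()
    return (per_token * mask).sum() / mask.sum()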

A novel LoRA framework:

Tian, Chunlin, Zhan Shi, Zhijiang Guo, Li Li, and Cheng-zhong Xu. “HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning,” 2024. https://openreview.net/forum?id=qEpi8uWX3N.
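
Judging from the title, the asymmetry is between a shared low-rank down-projection and several up-projection "heads". The sketch below is a guess at that structure (a single A matrix, multiple B matrices mixed by a simple router); the routing and initialization details are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class AsymmetricLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, num_heads=3):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the base weight stays frozen
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)       # shared down-projection
        self.B = nn.Parameter(torch.zeros(num_heads, rank, d_out))  # per-head up-projections
        self.router = nn.Linear(d_in, num_heads)                    # mixes the heads per input

    def forward(self, x):
        h = x @ self.A                                 # (..., rank)
        gates = torch.softmax(self.router(x), dim=-1)  # (..., num_heads)
        delta = torch.einsum("...r,hro,...h->...o", h, self.B, gates)
        return self.base(x) + delta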

Prompting & RAG

A few works on the Chain-of-Thought framework for LLMs:

Chen, Qiguang, Libo Qin, Jiaqi Wang, Jingxuan Zhou, and Wanxiang Che. “Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought,” 2024. https://openreview.net/forum?id=pC44UMwy2v.

Xue, Shangzi, Zhenya Huang, Jiayu Liu, Xin Lin, Yuting Ning, Binbin Jin, Xin Li, and Qi Liu. “Decompose, Analyze and Rethink: Solving Intricate Problems with Human-like Reasoning Cycle,” 2024. https://openreview.net/forum?id=NPKZF1WDjZ.
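
For readers who have not used the underlying technique: a chain-of-thought prompt simply asks the model to produce intermediate reasoning before its final answer, as in the minimal zero-shot example below (the question and expected completion are illustrative).

question = (
    "A shop sells pencils in packs of 12. If Dana buys 5 packs "
    "and gives away 17 pencils, how many does she have left?"
)
# Appending a step-by-step cue turns a plain question into a CoT prompt.
cot_prompt = question + "\nLet's think step by step."
print(cot_prompt)
# A typical completion reasons first (5 * 12 = 60; 60 - 17 = 43)
# and only then states the final answer: 43.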

Adaptation

Papers on how to align and adapt LLMs to specific requirements:

Ji, Jiaming, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi Qiu, Juntao Dai, and Yaodong Yang. “Aligner: Efficient Alignment by Learning to Correct,” 2024. https://openreview.net/forum?id=kq166jACVP.
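
As the title suggests, the idea is to learn a small corrector that rewrites the answer of an arbitrary upstream model rather than fine-tuning the upstream model itself. The sketch below only shows that plug-in structure; upstream_generate and corrector_generate are hypothetical stand-ins for whatever models are actually used.

def aligned_answer(query, upstream_generate, corrector_generate):
    # 1) Get a draft answer from the (unchanged) upstream model.
    draft = upstream_generate(query)
    # 2) A small corrector conditions on the query and the draft
    #    and emits an improved answer.
    return corrector_generate(
        "Query: " + query + "\nDraft answer: " + draft + "\nCorrected answer:"
    )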

Analysis

A systematic analysis of LLMs’ zero-shot capability on tasks not present in the training set:

He, Tianyu, Darshil Doshi, Aritra Das, and Andrey Gromov. “Learning to Grok: Emergence of in-Context Learning and Skill Composition in Modular Arithmetic Tasks,” 2024. https://openreview.net/forum?id=aVh9KRZdRk.
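
To make the setting concrete, one episode of an in-context modular arithmetic task might look like the snippet below: a hidden linear map modulo a prime is shown through a few input/output pairs, and the model must predict its value on a new input. The exact task family is an assumption based on the title.

import random
p = 97
a, b = random.randrange(p), random.randrange(p)   # hidden "skill": f(x) = (a*x + b) mod p
xs = random.sample(range(p), 4)
examples = [(x, (a * x + b) % p) for x in xs[:3]]
query = xs[3]
prompt = " ".join(f"{x}->{y}" for x, y in examples) + f" {query}->"
target = (a * query + b) % p
print(prompt, target)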

An interesting work analyzing LLMs’ understanding of human humor:

Hu, Zhe, Tuo Liang, Jing Li, Yiren Lu, Yunlai Zhou, Yiran Qiao, Jing Ma, and Yu Yin. “Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions,” 2024. https://openreview.net/forum?id=bCMpdaQCNW.

Optimization

A few works on how to optimize and compress LLMs:

Malinovskii, Vladimir, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Pavlovich Burlachenko, Kai Yi, Dan Alistarh, and Peter Richtárik. “PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression,” 2024. https://openreview.net/forum?id=YvA8UF0I37.

Sun, Yutao, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, and Furu Wei. “You Only Cache Once: Decoder-Decoder Architectures for Language Models,” 2024. https://openreview.net/forum?id=25Ioxw576r.

Lin, Haokun, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, and Ying Wei. “DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs,” 2024. https://openreview.net/forum?id=mp8u2Pcmqz.
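
As background for the quantization entries above, the recurring difficulty is that a few outlier activations stretch the quantization range and destroy precision for everything else; the tiny example below shows that baseline problem with plain symmetric int8 quantization (it is not the dual-transformation method of the paper).

import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0                 # one outlier sets the scale
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale                                 # dequantized values

activations = np.array([0.02, -0.03, 0.05, 0.01, 12.0])   # one large outlier
print(quantize_int8(activations))
# The small entries collapse to 0 or a single ~0.094-sized step,
# while the outlier alone is represented accurately.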

Computer Vision

Image Generation

A novel framework for generating high-resolution images by starting from a low-resolution image and gradually improving its resolution in an auto-regressive style:

Tian, Keyu, Yi Jiang, Zehuan Yuan, Bingyue Peng, and Liwei Wang. “Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction,” 2024. https://openreview.net/forum?id=gojL67CfS8.
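
Structurally, generation proceeds from the coarsest token map to the finest, with each new scale predicted conditioned on all coarser ones. The loop below is a sketch of that next-scale autoregressive order only; transformer and decode_tokens are placeholder interfaces, not the paper's API.

def next_scale_generation(transformer, decode_tokens, scales=(1, 2, 4, 8, 16)):
    context = []                       # token maps generated so far, coarse to fine
    for s in scales:
        # Predict the s x s token map given everything generated so far.
        tokens_s = transformer(context, target_size=s)
        context.append(tokens_s)
    return decode_tokens(context[-1])  # decode the finest token map to an image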

A work focusing on improving the sampling quality of diffusion models, especially when the number of sampling timesteps is small:

Yoon, Sangwoong, Himchan Hwang, Dohyun Kwon, Yung-Kyun Noh, and Frank C. Park. “Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models,” 2024. https://openreview.net/forum?id=V0oJaLqY4E.

A solution for automatically selecting the optimal adapter for conditional generation:

Luo, Michael, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Russ Salakhutdinov, and Ion Stoica. “Stylus: Automatic Adapter Selection for Diffusion Models,” 2024. https://openreview.net/forum?id=3Odq2tGSpp.

An alternative guidance framework to classifier-free guidance:

Karras, Tero, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, and Samuli Laine. “Guiding a Diffusion Model with a Bad Version of Itself,” 2024. https://openreview.net/forum?id=bg6fVPVs3s.
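
For context, classifier-free guidance extrapolates the conditional prediction away from an unconditional one; the paper's title suggests keeping the same extrapolation but using a deliberately weakened version of the model as the reference. The two functions below contrast the standard formula with that swap (the second is an illustration of the idea, not a verbatim reproduction of the paper).

def cfg(denoise_cond, denoise_uncond, x, c, w):
    d_c = denoise_cond(x, c)
    d_u = denoise_uncond(x)
    return d_u + w * (d_c - d_u)       # standard classifier-free guidance

def bad_model_guidance(denoise_good, denoise_bad, x, c, w):
    d_good = denoise_good(x, c)
    d_bad = denoise_bad(x, c)          # same conditioning, weaker model
    return d_bad + w * (d_good - d_bad)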

A paper diving into the specific problem of unconditional generation:

Li, Tianhong, Dina Katabi, and Kaiming He. “Return of Unconditional Generation: A Self-Supervised Representation Generation Method,” 2024. https://openreview.net/forum?id=clTa4JFBML.

A work on improving the sampling efficiency of diffusion models:

Yin, Tianwei, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T. Freeman. “Improved Distribution Matching Distillation for Fast Image Synthesis,” 2024. https://openreview.net/forum?id=tQukGCDaNT.

Multi-modal

A survey-style paper on multi-modal LLMs:

Tong, Shengbang, Ellis L. Brown II, Penghao Wu, Sanghyun Woo, Adithya Jairam Iyer, Sai Charitha Akula, Shusheng Yang, et al. “Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs,” 2024. https://openreview.net/forum?id=Vi8AepAXGy.

NN Foundations

Diffusion Models

A new framework for diffusion models with high training and sampling efficiency:

Terpin, Antonio, Nicolas Lanzetti, Martín Gadea, and Florian Dörfler. “Learning Diffusion at Lightspeed,” 2024. https://openreview.net/forum?id=y10avdRFNK.

A work discussing the use of diffusion models as a representation-learning framework:

Xu, Zhengrui, Guan’an Wang, Xiaowen Huang, and Jitao Sang. “DenoiseRep: Denoising Model for Representation Learning,” 2024. https://openreview.net/forum?id=OycU0bAus6.

ML Foundations

Model Training

A work focusing on efficiently optimizing neural networks whose losses contain high-dimensional and high-order differential operators:

Shi, Zekun, Zheyuan Hu, Min Lin, and Kenji Kawaguchi. “Stochastic Taylor Derivative Estimator: Efficient Amortization for Arbitrary Differential Operators,” 2024. https://openreview.net/forum?id=J2wI2rCG2u.
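
To see why such losses are expensive with standard tools: reverse-mode autodiff composes one derivative order per pass, so a k-th order operator in the loss needs k nested passes through the network. The snippet shows a second derivative computed this way on a toy function; the paper's stochastic Taylor estimator is what amortizes this cost, and is not reproduced here.

import torch

def u(x):
    return torch.sin(x)   # stand-in for a neural-network solution u(x)

x = torch.tensor(1.0, requires_grad=True)
du = torch.autograd.grad(u(x), x, create_graph=True)[0]   # first derivative
d2u = torch.autograd.grad(du, x)[0]                        # second derivative (nested pass)
residual = d2u + u(x)    # e.g. a 1-D residual of the form u'' + u
print(d2u.item(), residual.item())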

AI4Science

A work on multi-conditional molecular generation with graph diffusion transformers:

Liu, Gang, Jiaxin Xu, Tengfei Luo, and Meng Jiang. “Graph Diffusion Transformers for Multi-Conditional Molecular Generation,” 2024. https://openreview.net/forum?id=cfrDLD1wfO.

Spatiotemporal

A continuous multi-task learning framework for spatio-temporal data:

Yi, Zhongchao, Zhengyang Zhou, Qihe Huang, Yanjiang Chen, Liheng Yu, Xu Wang, and Yang Wang. “Get Rid of Isolation: A Continuous Multi-Task Spatio-Temporal Learning Framework,” 2024. https://openreview.net/forum?id=tnh4LK72yj.