Jiancheng Dong

Greetings! I am a first-year Ph.D. student at the City University of Hong Kong, where I am fortunate to be advised by Prof. Xiangyu Zhao. Previously, I received my bachelor's degree from the School of Artificial Intelligence at Nanjing University. My research interests lie primarily in LLMs, with a particular focus on long-sequence compression and packing.

Currently, I am also a Scientist Intern at Baidu Inc., Search Science, working closely with Lixin Su, Shuaiqiang Wang, and Dawei Yin.

Email  /  GitHub  /  CV


Selected Publications

For the full publication list, please refer to my Google Scholar.


How to Utilize Complementary Vision-Text Information for 2D Structure Understanding


Jiancheng Dong, Pengyue Jia, Derong Xu, Jiawei Cheng, Jingyu Peng, Chao Zhang, Bowen Liu, Xin Sun, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiangyu Zhao
In submission, 2026

I developed DiVA-Former, a lightweight framework for compressing long structured contexts in LLMs. It uses visual tokens as dynamic queries to distill lengthy serialized table text into compact digest vectors, reducing context length and computational cost while preserving both structural cues and fine-grained textual information.


Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs


Jiancheng Dong, Pengyue Jia, Jingyu Peng, Maolin Wang, Yuhao Wang, Lixin Su, Xin Sun, Shuaiqiang Wang, Dawei Yin, Xiangyu Zhao
In submission, 2025
arXiv

I developed Behavior-Equivalent Token, a context-compression method that learns a single token to replace long system prompts while preserving their behavioral effect on downstream tasks. This significantly reduces prompt overhead, improves inference efficiency, and frees up more of the context budget for user inputs in LLM applications.


Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs


Jiancheng Dong, Lei Jiang, Wei Jin, Lu Cheng
NAACL (Main Conference), 2025
arXiv

I developed Threshold Filtering Packing, a packing strategy for supervised fine-tuning that groups related yet diverse samples within each training pack. The method reduces cross-sample interference during packed training and improves training efficiency and data organization, which is especially valuable for long-sequence, efficiency-sensitive large-model training.




Internship

These are my internship experiences.


Baidu Inc., Search Science


Research Intern
5/2025 -- 2/2026
Advisors: Lixin Su, Shuaiqiang Wang, and Dawei Yin

Focused on context compression: developed Behavior-Equivalent Token to replace complex long prompts with behaviorally equivalent single tokens, retaining 98% of performance at compression ratios up to 3000×. Extended this line of work to vision-text compression: leveraged the complementarity between textual and visual modalities to develop DiVA-Former, which improves on the pure-text baseline by 23.9%.


University of Illinois Chicago, Responsible and Reliable AI Lab


Research Assistant
3/2024 -- 9/2024
Advisors: Prof. Lu Cheng and Wei Jin

Developed the Threshold Filtering Packing method to reduce cross-contamination between packed sequences, enhancing contextual learning and improving the utilization of training data. Conducted research on Trustworthy AI and LLM fairness, investigating data-centric optimization strategies to mitigate demographic bias amplification during LLM alignment.


Nanjing University, Natural Language Processing Group


Research Intern
4/2023 -- 1/2024
Advisors: Prof. Xinyu Dai and Prof. Zhen Wu

Explored data selection strategies for instruction tuning to optimize sample relevance and diversity, and conducted preliminary experiments on methods for improving data efficiency.


Design and source code from Jon Barron's website