LLM Fine-Tuning
Purpose, Data Preparation, Code, and Comparative Results
Which LLM base model was used for fine-tuning?
What was the original purpose of the fine-tuning? For example, in which area or domain was the model expected to perform better?
What data needed to be prepared?
Related code
Any experimental results comparing performance with and without fine-tuning
1. LLM - Detect AI Generated Text
Purpose
Identify which essay was written by a large language model.
Data
The dataset comprises student-written essays and essays generated by LLMs from the same prompts. Synthetic data was also incorporated to augment the dataset, and metadata such as prompt name, holistic essay score, ELL (English Language Learner) status, and grade level was appended. Augmentations were applied to familiarize models with common attacks and obfuscations used against LLM-content detection systems. These augmentations include:
Spelling correction
Character deletion, insertion, and swapping
Synonym replacement
Introduction of obfuscations
Back translation
Random capitalization
Sentence swapping
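A few of the augmentations above can be sketched in plain Python. This is an illustrative sketch, not the actual augmentation pipeline; the function names and the naive sentence splitting are assumptions:

```python
import random


def swap_characters(text: str, rng: random.Random) -> str:
    """Character swapping: transpose two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def random_capitalization(text: str, rng: random.Random, p: float = 0.1) -> str:
    """Random capitalization: upper-case each character independently with probability p."""
    return "".join(c.upper() if rng.random() < p else c for c in text)


def swap_sentences(text: str, rng: random.Random) -> str:
    """Sentence swapping: shuffle sentence order (naive split on '. ')."""
    sentences = text.split(". ")
    rng.shuffle(sentences)
    return ". ".join(sentences)


if __name__ == "__main__":
    rng = random.Random(0)
    essay = "The sky is blue. Water is wet. Essays are hard"
    print(swap_characters(essay, rng))
    print(random_capitalization(essay, rng))
    print(swap_sentences(essay, rng))
```

Each function perturbs the surface form while preserving most of the content, which is the point of these augmentations: a detector trained on such variants is harder to evade with trivial edits.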
Metrics
Submissions were evaluated on the area under the ROC curve (AUC) between the predicted probability and the observed target.
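ROC AUC has a convenient rank-statistic interpretation: it is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half. A minimal dependency-free sketch (illustrative only, not the competition's scoring code, which would typically use `sklearn.metrics.roc_auc_score`):

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U formulation:
    the fraction of (positive, negative) pairs where the positive
    is scored higher, counting ties as 0.5. O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 3 of the 4 (positive, negative) pairs are correctly ordered.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A score of 0.5 corresponds to random guessing and 1.0 to a perfect ranking; note that AUC depends only on the ordering of the predicted probabilities, not their calibration.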
Base Models
DeBERTa
Mistral 7B
LLM Fine Tuning Tools