Warning! The directory is not yet complete and will be amended until the beginning of the term.
136041 SE Open Source Language Models (2024W)
Continuous assessment of course work
Labels
Registration/Deregistration
Note: The time of your registration within the registration period has no effect on the allocation of places (no first come, first served).
- Registration is open from Fr 06.09.2024 08:00 to Th 26.09.2024 23:59
- Deregistration possible until Th 31.10.2024 23:59
Details
max. 25 participants
Language: English
Lecturers
Classes (iCal) - next class is marked with N
- Tuesday 01.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 08.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 15.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 22.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 29.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 05.11. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 12.11. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 19.11. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 26.11. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 03.12. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 10.12. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 17.12. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 07.01. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 14.01. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- N Tuesday 21.01. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 28.01. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
Information
Aims, contents and method of the course
Assessment and permitted materials
Participants will have to present one topic from the list in the seminar; the presentation should be roughly 25 minutes long (hard limits: min. 20 minutes, max. 30 minutes) and is followed by a Q&A session and discussion. Participants will also have to submit a written report (deadline and exact requirements TBD) describing the main contents of the presented paper(s) - see the list of recommended papers below - and putting them in a wider context.
Minimum requirements and assessment criteria
Your presentation will account for 45% of the grade, participation in discussions for 10%, and the written report for 45%.
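As a minimal sketch of how these weights combine into a final score, the following Python snippet applies the stated 45/10/45 split; the 0-100 scale and the example component scores are assumptions for illustration only and are not taken from the course description.

# Minimal sketch: weighted final score from the stated 45/10/45 split.
# Assumption (not from the course page): each component is scored on a 0-100 scale.
WEIGHTS = {"presentation": 0.45, "discussion": 0.10, "report": 0.45}

def final_score(scores: dict[str, float]) -> float:
    """Return the weighted sum of component scores."""
    return sum(weight * scores[part] for part, weight in WEIGHTS.items())

# Hypothetical example scores, for illustration only.
print(final_score({"presentation": 90, "discussion": 80, "report": 85}))  # 86.75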
Examination topics
Reading list
---
A. Foundational work on transformer language models
- Vaswani, Ashish, et al. "Attention is all you need." 2017
- Brown, Tom B., et al. "Language models are few-shot learners." 2020
- Holtzman, Ari, et al. "The curious case of neural text degeneration." 2019
- Wei, Jason, et al. "Finetuned language models are zero-shot learners." 2021
- Ouyang, Long, et al. "Training language models to follow instructions with human feedback." 2022
---
B. Evaluation and Analysis of LLMs
- Hendrycks, Dan, et al. "Measuring massive multitask language understanding." 2020, and Wang, Yubo, et al. "MMLU-Pro: A more robust and challenging multi-task language understanding benchmark." 2024
- Zheng, Lianmin, et al. "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena." 2023
- Biderman, Stella, et al. "Pythia: A suite for analyzing large language models across training and scaling." 2023
- Schaeffer, Rylan, Brando Miranda, and Sanmi Koyejo. "Are emergent abilities of large language models a mirage?" 2024
---
C. Open Source LLMs, Training, and Corpora
- Zhang, Susan, et al. "OPT: Open pre-trained transformer language models." 2022
- Le Scao, Teven, et al. "BLOOM: A 176B-parameter open-access multilingual language model." 2023
- Groeneveld, Dirk, et al. "OLMo: Accelerating the science of language models." 2024
- Soldaini, Luca, et al. "Dolma: An open corpus of three trillion tokens for language model pretraining research." 2024
- Wang, Yizhong, et al. "How far can camels go? Exploring the state of instruction tuning on open resources." 2023, and Ivison, Hamish, et al. "Camels in a changing climate: Enhancing LM adaptation with Tulu 2." 2023
- Wang, Yizhong, et al. "Self-Instruct: Aligning language models with self-generated instructions." 2022
- Peng, Baolin, et al. "Instruction tuning with GPT-4." 2023
- Üstün, Ahmet, et al. "Aya model: An instruction finetuned open-access multilingual language model." 2024
- Singh, Shivalika, et al. "Aya dataset: An open-access collection for multilingual instruction tuning." 2024
- Rafailov, Rafael, et al. "Direct preference optimization: Your language model is secretly a reward model." 2024
- Frantar, Elias, et al. "GPTQ: Accurate post-training quantization for generative pre-trained transformers." 2022
- Jiang, Albert Q., et al. "Mixtral of experts." 2024
- Shen, Yongliang, et al. "HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face." 2024
- Li, Raymond, et al. "StarCoder: may the source be with you!" 2023
---
D. Legal Aspects, Copyright, and Transparency
- Lemley, Mark A., and Bryan Casey. "Fair learning." 2020
- NYT vs. OpenAI: complaint by The New York Times and response by OpenAI (blog post, legal response):
  https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf
  https://openai.com/index/openai-and-journalism/
  https://www.courtlistener.com/docket/68117049/52/the-new-york-times-company-v-microsoft-corporation/
- Strowel, Alain. Study on copyright and new technologies: copyright data management and artificial intelligence. 2022
- Jernite, Yacine, et al. "Data governance in the age of large-scale data-driven language technology." 2022
- Bommasani, Rishi, et al. "The foundation model transparency index." 2023
Association in the course directory
S-DH Cluster I: Language and Literature
Last modified: Mo 23.09.2024 14:46