136041 SE Open Source Language Models (2024W)
Continuous assessment course
Registration/Deregistration
Note: The time of your registration within the registration period has no effect on the allocation of places (no "first come, first served").
- Registration from Fri 06.09.2024 08:00 to Thu 26.09.2024 23:59
- Deregistration until Thu 31.10.2024 23:59
Details
max. 25 participants
Language: English
Lecturers
Dates (iCal) - the next date is marked with N
- Tuesday 01.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 08.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 15.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 22.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 29.10. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 05.11. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 12.11. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 19.11. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 26.11. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 03.12. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 10.12. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 17.12. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 07.01. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 14.01. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- N Tuesday 21.01. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
- Tuesday 28.01. 09:45 - 11:15 Hörsaal 2 Hauptgebäude, Tiefparterre Stiege 5 Hof 3
Information
Aims, contents and method of the course
Assessment and permitted materials
Participants will present one topic from the list below in the seminar; the presentation should be roughly 25 minutes long (hard limits: minimum 20 minutes, maximum 30 minutes). The presentation is followed by a Q&A session and discussion. Participants will also submit a written report (deadline and exact requirements TBD) describing the main contents of the presented paper(s) - see the list of recommended papers below - and putting them in a wider context.
Minimum requirements and assessment criteria
Your presentation will account for 45% of the grade, participation in discussions for 10%, and the written report for 45%.
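To make the weighting concrete, the short sketch below shows how the three components combine into a weighted total. It is purely illustrative: the 45/10/45 weights come from the paragraph above, while the 0-100 scale and the example scores are made-up assumptions, not part of the official grading procedure.

# Illustrative sketch only: the 45/10/45 weights are from the course
# description; the 0-100 scale and example scores are hypothetical.
WEIGHTS = {"presentation": 0.45, "discussion": 0.10, "report": 0.45}

def final_score(scores):
    """Combine per-component scores (each on a 0-100 scale) into a weighted total."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

# Hypothetical example: 80 for the presentation, 100 for discussion, 70 for the report.
print(final_score({"presentation": 80, "discussion": 100, "report": 70}))  # 77.5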
Examination topics
Your presentation will account for 45% of the grade, participation in discussions for 10%, and the written report for 45%.
Literature
---
A. Foundational work on transformer language models
- Vaswani, A. "Attention is all you need." 2017
- Brown, Tom B. "Language models are few-shot learners." 2020
- Holtzman, Ari, et al. "The curious case of neural text degeneration." 2019
- Wei, Jason, et al. "Finetuned language models are zero-shot learners." 2021
- Ouyang, Long, et al. "Training language models to follow instructions with human feedback." 2022
---
B. Evaluation and Analysis of LLMs
- Hendrycks, Dan, et al. "Measuring massive multitask language understanding." 2020, and Wang, Yubo, et al. "Mmlu-pro: A more robust and challenging multi-task language understanding benchmark." 2024
- Zheng, Lianmin, et al. "Judging llm-as-a-judge with mt-bench and chatbot arena." 2023
- Biderman, Stella, et al. "Pythia: A suite for analyzing large language models across training and scaling." 2023
- Schaeffer, Rylan, Brando Miranda, and Sanmi Koyejo. "Are emergent abilities of large language models a mirage?" 2024
---
C. Open Source LLMs, Training, and Corpora
- Zhang, Susan, et al. "Opt: Open pre-trained transformer language models." 2022
- Le Scao, Teven, et al. "Bloom: A 176b-parameter open-access multilingual language model." 2023
- Groeneveld, Dirk, et al. "Olmo: Accelerating the science of language models." 2024
- Soldaini, Luca, et al. "Dolma: An open corpus of three trillion tokens for language model pretraining research." 2024
- Wang, Yizhong, et al. "How far can camels go? Exploring the state of instruction tuning on open resources." 2023, and Ivison, Hamish, et al. "Camels in a changing climate: Enhancing lm adaptation with tulu 2." 2023
- Wang, Yizhong, et al. "Self-instruct: Aligning language models with self-generated instructions." 2022
- Peng, Baolin, et al. "Instruction tuning with gpt-4." 2023
- Üstün, Ahmet, et al. "Aya model: An instruction finetuned open-access multilingual language model." 2024
- Singh, Shivalika, et al. "Aya dataset: An open-access collection for multilingual instruction tuning." 2024
- Rafailov, Rafael, et al. "Direct preference optimization: Your language model is secretly a reward model." 2024
- Frantar, Elias, et al. "Gptq: Accurate post-training quantization for generative pre-trained transformers." 2022
- Jiang, Albert Q., et al. "Mixtral of experts." 2024
- Shen, Yongliang, et al. "Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face." 2024
- Li, Raymond, et al. "Starcoder: may the source be with you!" 2023
---
D. Legal Aspects, Copyright, and Transparency
- Lemley, Mark A., and Bryan Casey. "Fair learning." 2020
- NYT vs. OpenAI: complaint by the New York Times and response by OpenAI (blog post, legal response): https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf , https://openai.com/index/openai-and-journalism/ , https://www.courtlistener.com/docket/68117049/52/the-new-york-times-company-v-microsoft-corporation/
- Strowel, Alain. Study on copyright and new technologies: copyright data management and artificial intelligence. 2022
- Jernite, Yacine, et al. "Data governance in the age of large-scale data-driven language technology." 2022
- Bommasani, Rishi, et al. "The foundation model transparency index." 2023
Association in the course directory
S-DH Cluster I: Language and Literature
Last modified: Mon 23.09.2024 14:46