PVG0045
Introduction to Python for data science
In the big data era, programming skills are needed to handle datasets efficiently. Researchers in the natural sciences routinely work with large datasets, which makes analysis a demanding task. SLU PhD students often have to analyze datasets that call for skills beyond traditional tools like Excel. In addition, datasets frequently have to be converted between formats before they can be analyzed by specialized software; for big data, this can only be done programmatically. Python is currently the most popular programming language for data science. This popularity owes much to the Pandas library, which greatly facilitates a wide range of operations needed for data analysis, such as transforming data formats, combining data stored in different files, and producing summaries that support data quality checks and interpretation. Moreover, Python's extensive graphics ecosystem, including the Seaborn library, offers many possibilities for constructing informative graphs, both to aid data interpretation and for publication.
The course consists of morning lectures followed by practical exercises. The Jupyter interactive development environment (www.jupyter.org) will be used throughout the course. Basic Python syntax is introduced first, after which students gradually build core data science skills. In particular, students will be introduced to the Pandas library and practice data manipulation and aggregation techniques on large datasets. Finally, students will gain experience in producing informative graphs using the Seaborn library or a similar library.
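As a rough illustration of the kind of Pandas operations named in the course description (combining data from different files, changing data format, and summarizing), the sketch below may help. It is not actual course material: the animal IDs, breeds, weights, and column names are invented for the example, and in-memory text stands in for two separate CSV files.

```python
import io

import pandas as pd

# Hypothetical data standing in for two separate files
# (all names and values are invented for illustration).
weights_csv = io.StringIO(
    "animal_id,week1,week2\n"
    "A1,40.0,42.5\n"
    "A2,38.0,39.0\n"
    "A3,45.0,47.0\n"
)
breeds_csv = io.StringIO(
    "animal_id,breed\n"
    "A1,Holstein\n"
    "A2,Jersey\n"
    "A3,Holstein\n"
)

weights = pd.read_csv(weights_csv)
breeds = pd.read_csv(breeds_csv)

# Combine the two sources on their shared key column.
merged = weights.merge(breeds, on="animal_id", how="left")

# Transform from wide format (one column per week)
# to long format (one row per animal and week).
long_format = merged.melt(
    id_vars=["animal_id", "breed"],
    value_vars=["week1", "week2"],
    var_name="week",
    value_name="weight_kg",
)

# Summarize: mean weight and number of measurements per breed.
summary = long_format.groupby("breed")["weight_kg"].agg(["mean", "count"])
print(summary)
```

The long format produced by `melt` is also the layout that plotting libraries such as Seaborn generally expect, which is one reason this reshaping step tends to come before plotting.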
Expected study time
Total: 54 hours
Own study prior to course: 10 hours
Lectures: 14 hours
Computer assignments: 30 hours
Syllabus and other information
Syllabus
PVG0045 Introduction to Python for data science, 2.0 Credits
Subjects
Animal Science
Education cycle
Postgraduate level
Grading scale
Pass / Failed
Prior knowledge
Admitted to a PhD or residency program in biology, medicine, nursing, veterinary medicine, animal science, food science, nutrition or similar topics. No prior programming experience is required.
Objectives
After completing this course, the students should be able to:
• Explain basic Python syntax
• Write simple functions in Python
• Transform data in various formats using the Pandas library
• Combine data sources stored in different files using the Pandas library
• Produce insightful summaries of datasets using the Pandas library
• Produce publication-quality plots using the Seaborn library
Content
In the big data era, programming skills are needed to handle datasets efficiently. Researchers in the natural sciences routinely work with large datasets, which makes analysis a demanding task. SLU PhD students often have to analyze datasets that call for skills beyond traditional tools like Excel. In addition, datasets frequently have to be converted between formats before they can be analyzed by specialized software; for big data, this can only be done programmatically. Python is currently the most popular programming language for data science. This popularity owes much to the Pandas library, which greatly facilitates a wide range of operations needed for data analysis, such as transforming data formats, combining data stored in different files, and producing summaries that support data quality checks and interpretation. Moreover, Python's extensive graphics ecosystem, including the Seaborn library, offers many possibilities for constructing informative graphs, both to aid data interpretation and for publication.
The course consists of morning lectures followed by practical exercises. The Jupyter interactive development environment (www.jupyter.org) will be used throughout the course. Basic Python syntax is introduced first, after which students gradually build core data science skills. In particular, students will be introduced to the Pandas library and practice data manipulation and aggregation techniques on large datasets. Finally, students will gain experience in producing informative graphs using the Seaborn library or a similar library.
Expected study time
Total: 54 hours
Own study prior to course: 10 hours
Lectures: 14 hours
Computer assignments: 30 hours
Additional information
The course is given as distance learning via Zoom or a similar platform. It consists of five full-day meetings made up of lectures and computer exercises. In addition, students are expected to do individual work before the start of the course and between meetings.
Responsible department
Department of Animal Breeding and Genetics