Programming for Data Science

Module Introduction

This module is designed to give those with little or no programming experience a firm foundation in programming for data analysis and AI systems, recognising a diversity of backgrounds. The module will also fully stretch those with substantial prior programming experience to extend their programming and system-building knowledge through self-learning supported by on-line courseware.

We begin with some introductory information concerning the framework within which the module is taught.

After reading this you should be aware of:

  • the overall structure of the module
  • the software you will be using and how to install it on your own machine
  • the types of formative and assessed work you will need to complete
  • how and how not to use external information sources

Module Structure

The course content will be organised as follows:

  • Introduction. The introductory material will include the following:
    • Setting up the required software to use Jupyter notebook
    • Overview of how data operates in programming
    • Fundamentals of Python programming:
      • Data types and operators
      • Control logic
  • Programs, Information and Data
    • Defining functions
    • Working with more complex and structured data
    • Algorithms for analysing and manipulating data
  • Data Transformations and analysis
    • More complex data manipulation (using pandas)
  • Data Acquisition and Visualisation
    • Getting data from online resources
    • Concepts and tools for lucid graphical presentation of results (using matplotlib)
  • Programming for Artificial Intelligence
    • Illustrations of programming ideas and algorithms used in AI systems
  • Future Directions for Programming

Python 3 and Jupyter Installation

The material and support given as part of this module assumes that you have a working installation of Anaconda Python (which includes Jupyter) running on a suitably powerful computer. Installation of this software should in most cases be very straightforward. We may be able to help you with some issues that may arise with your computer and with the software installation. However, such issues are beyond the scope of what we support in this module. For help with general computer or software issues you should go to the University's IT Service.

Anaconda

Anaconda is a very widely used and reliable distribution of Python, and it should be easy to install on any of the major operating systems. All the information and downloads you should need are available from the Anaconda web pages:

  • Anaconda Individual Installation
    This is where we suggest you go to install Anaconda Python. Versions are available for each of the major operating systems: Windows, MacOS and Linux (you should also be able to install on a ChromeBook, since Chrome OS is a version of Linux).
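
Once the installation has finished, a quick check along the following lines can confirm which Python you are running. This is only a sketch: the function name describe_python is our own, and the test for a conda-meta folder is a heuristic assumption about how conda-managed installations are laid out, not an official check.

```python
import os
import sys


def describe_python():
    """Return basic facts about the Python interpreter running this code."""
    return {
        # Version of the running interpreter, e.g. "3.11.4"
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # The module assumes Python 3
        "is_python3": sys.version_info.major == 3,
        # Heuristic (assumption): conda-managed installations usually
        # contain a 'conda-meta' folder inside the installation prefix.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


info = describe_python()
print(info)
```

If looks_like_conda is reported as False, you may be running a different Python installation from the one that Anaconda provided.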

Jupyter

Once you have installed Anaconda Python, your system should also be ready to run Jupyter notebook. The way to start the Jupyter notebook server program will vary depending on the operating system you are using.

  • On Linux systems one would typically start Jupyter as follows:
    • open a terminal window (for entering system commands),
    • change folder to the folder that will be the start folder for Jupyter,
    • start Jupyter with the command:
      jupyter notebook &

      Notes: The & is just used to run the command in the background, so the terminal can still be used to enter other commands, while the notebook server is running.
      One can also select the browser that will be used by the notebook server by adding an option to the command. For example:
      jupyter notebook --browser=firefox &
  • On Windows systems, after installing Jupyter you should be able to start it by clicking on the Windows 'Start Menu' button and typing jupyter notebook into the search box. (You may want to create a launch icon on your task bar to start Jupyter more quickly.)
    The first time you run Jupyter you may be prompted to choose which browser you wish to use and which folder you want to be your Jupyter home directory.

  • On MacOS systems the procedure for starting Jupyter is similar to that for Linux.

  • Further instructions regarding starting Jupyter on all the popular operating systems can be found in the Jupyter Notebook Quick Start Guide.
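
If Jupyter does not start, it can help to check whether the jupyter command is visible to your system at all. The sketch below does this from Python and also assembles the same command that was shown for Linux. The helper name build_jupyter_command is our own invention for illustration. The line that would actually launch the server is left as a comment, so running the code will not start anything.

```python
import shutil


def build_jupyter_command(browser=None):
    """Build the command (as a list of words) for starting Jupyter notebook."""
    command = ["jupyter", "notebook"]
    if browser is not None:
        # Same option as in the Linux example: --browser=firefox
        command.append("--browser=" + browser)
    return command


# shutil.which returns the full path of a command, or None if it is not found
jupyter_path = shutil.which("jupyter")
print("jupyter found at:", jupyter_path)

print(build_jupyter_command())
print(build_jupyter_command("firefox"))

# To start the server from Python without waiting for it to finish
# (similar in spirit to the trailing & on Linux), one could run:
#   import subprocess
#   subprocess.Popen(build_jupyter_command("firefox"))
```

If the path is printed as None, the most likely explanation is that Anaconda has not been added to your system's search path, or that you are using a terminal that was opened before the installation finished.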

Additional Python Packages

If you install the Anaconda Python distribution, this comes with all the modules that you will require for the first three units of this module and for the first two assessments. Later in the module and for your final assessment work, or because of your personal interests, you may want to use additional packages that do not come with the standard Anaconda installation. To install or update packages for Anaconda you can use the conda program. For example, the scikit-learn package can be installed with:

conda install scikit-learn

For certain packages you may need to use the pip command, which is a slightly more general installation tool for Python packages.
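
Before installing anything, you may want to find out which packages are already available. The following sketch uses the standard importlib library to test whether a package can be imported. The function name is_installed and the list of names checked are just examples. Note that a package's import name can differ from its installation name: scikit-learn is imported as sklearn.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a module or package with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Example names only; the output depends on what is installed on your machine
for name in ["json", "pandas", "matplotlib", "sklearn"]:
    status = "available" if is_installed(name) else "missing"
    print(name, status)
```

Any package reported as missing can then be installed with conda (or pip) as described above.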

Note.

In Python, the code in any single Python file is called a module. But most often, when we talk about modules, we are thinking of standard code files that a programmer makes use of in their own program by use of the import command. The term package refers to a group of modules that are distributed together and provide functionality for a set of related programming tasks.
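
The distinction can be seen in a short example using only the standard library. Here statistics is a module and email is a package containing several modules, one of which is email.utils. The check for a __path__ attribute is a common way of recognising a package, shown here purely for illustration.

```python
import statistics   # a module from the standard library
import email        # a package: a group of related modules
import email.utils  # one of the modules inside the email package

# Use a function defined in the statistics module
print(statistics.mean([1, 2, 3, 4]))   # 2.5

# Packages have a __path__ attribute; plain modules do not
print(hasattr(statistics, "__path__"))  # False
print(hasattr(email, "__path__"))       # True

# A module inside a package is named with a dot
print(email.utils.__name__)             # email.utils
```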

Assessed Work

There will be 4 pieces of assessed work:

  No.  Title                         Type                                   Weight
  1    Basic Python Coding           Coding                                 30%
  2    Data and Algorithms           Coding                                 30%
  3    Data Analysis Group Project   Code with Report (from 1-3 Students)   30%
  4    End of Module Test            Online Test                            10%
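
To see how these weightings might combine, here is a small sketch. It assumes that each piece of work is marked out of 100 and that the overall mark is the weighted average of the four marks. That is our assumption for illustration, not a statement of the official marking rules. The marks used are invented.

```python
# Weightings (in percent) taken from the list of assessed work
WEIGHTS = {
    "Basic Python Coding": 30,
    "Data and Algorithms": 30,
    "Data Analysis Group Project": 30,
    "End of Module Test": 10,
}


def overall_mark(marks, weights=WEIGHTS):
    """Weighted average of marks out of 100 (assumed combination rule)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights should add up to 100")
    return sum(marks[name] * weights[name] for name in weights) / 100


# Invented marks, purely for illustration
example_marks = {
    "Basic Python Coding": 70,
    "Data and Algorithms": 60,
    "Data Analysis Group Project": 65,
    "End of Module Test": 80,
}

print(overall_mark(example_marks))  # 66.5
```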

Before working on and before submitting your assignments, please take note of the following conditions:

  • Importing Modules. In code written for assessment, you should take care to follow instructions regarding importing of modules. In most cases you should not import modules other than those you are explicitly instructed to use.
  • All coursework should be submitted to Gradescope. A link to enable you to go to Gradescope and see the module's assignments will be created in Minerva. When you select the module you will be presented with an interface for submitting your files.
  • Make sure you submit the correct files and follow the instructions that have been given regarding the format and content of files required for that assignment. If you do not submit in the correct format, you
  • You must follow the rules regarding the use of external resources, which are given in the following section.
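
As an aid to following the condition on importing modules, you could check which modules your code imports before you submit it. The sketch below uses the standard ast library to list the imports in a piece of Python source code. The function name imported_modules, the ALLOWED set and the sample code are all hypothetical. The modules actually permitted will be stated in each assignment, and this is not the check used in marking.

```python
import ast


def imported_modules(source):
    """Return the set of top-level module names imported by some Python code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import os.path" counts as importing "os"
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # e.g. "from pandas import DataFrame" counts as importing "pandas"
            found.add(node.module.split(".")[0])
    return found


# Hypothetical list of permitted modules for an imaginary assignment
ALLOWED = {"math", "pandas"}

# Hypothetical submission code, given here as a string
sample = "import math\nimport os.path\nfrom pandas import DataFrame\n"

not_allowed = imported_modules(sample) - ALLOWED
print(sorted(not_allowed))  # ['os']
```

To check a real file, you would read its contents into a string and pass that string to imported_modules.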

Use of External Resources

This module is intended to provide an introduction to programming in Python and to the use of programming in Data Science and Artificial Intelligence. The module gives a general overview of key concepts as well as a selective illustration of a variety of programming techniques. In terms of its high-level content, the material is intended to be self-contained, although it will include many pointers to external resources that you are encouraged to investigate to broaden the scope of your knowledge.

But you should be aware that the material provided is definitely not self-contained in terms of the detailed information about Python, Jupyter and various Python packages that we shall be using in the module. In order to complete the programming assignments and achieve the learning objectives you will need to extensively consult external resources which provide documentation and examples of the specific details of the Python language and its use in conjunction with Jupyter and various library packages. Fortunately, the internet provides many excellent resources that provide all the information you will need. In fact several types of useful resource are available.

Documentation

Nearly all the detailed information you may need can be found on one of the following documentation sites:

These resources provide a huge amount of information in a rather dense way. Bear in mind that you rarely need to read right through large amounts of documentation. Instead you need to develop the skill of scanning these resources for relevant parts and then carefully reading the details you need, and perhaps other pieces of information that this leads you to.

Tutorials

You will probably benefit from viewing some of the many tutorial websites and videos that can be easily found on the internet, which present information in an accessible and motivational form. We will not make specific recommendations, but simply searching for something like "python jupyter tutorial" will turn up a lot of options. Obviously these will vary in quality, style and level, and different people have different preferences and needs. If you find that a tutorial is not appropriate for you, try another until you find one that suits you.

Help Forums

Another useful kind of external resource is help forums such as StackOverflow. Such websites provide facilities for users to post questions and answers, which are stored for the benefit of other users. This kind of resource can be extremely helpful for programmers, not only in gaining specific answers, but also in being able to see a variety of potential solutions and opinions and to gain insight into different approaches and perspectives.

But please note the following strict limitations of use of help forums or other help services in relation to the module:

  • Do not post to an external forum, blog or website, any query or information that directly references any part of any exercise or assignment that is provided to you as part of this module.

  • Any query concerning a technical issue relating to this module should in the first instance be posted to the forums provided to support this module. These forums should enable you to resolve nearly all technical issues that arise in relation to the module.

In view of these important restrictions, it is expected that, in relation to your studies, help forums should generally be used only to search for existing solutions, not to post queries. Having said that, if you do have questions relating to technical issues that cannot be resolved by module staff or fellow students, and you take care not to refer directly to the assessment materials provided by the module, you may and should make use of external forums.