Data Fundamentals
This is the first video of a series I’ll be uploading in the coming weeks about the courses I took as part of my master’s degree here at the University of Glasgow.
The first course I want to talk to you about is Data Fundamentals. Basically, this course was a refresher of some of the things you learned if you did an engineering degree, not to mention a computer science degree. In my opinion, it was also the perfect introduction to Data Science, with basic but very interesting material that prepared us for the challenges ahead.
I think it is worth mentioning that this was by far my favourite course, not only because of the topics we covered but also because of the lecturer, John Williamson; the way he explained things to us was simply amazing. Anyway, here at the university, courses are divided into lectures. Usually there is one topic per lecture, and that is exactly how I’ll be describing them to you.
Starting with lecture 1:
Lecture 1: We saw how to work with NumPy, a practical library for dealing with numerical data in the form of arrays (vectors, matrices and tensors) using Python, and how to transform those arrays by flipping, slicing, transposing and many other operations. We also took a brief look at the concept of vectorized computing, which applies a single operation to many data elements at once, improving the performance of our code by taking full advantage of our hardware.
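To give you a flavour of what that looked like in practice, here is a small NumPy sketch of my own (not taken from the course materials) showing a few array transformations and a vectorized operation:

```python
import numpy as np

# A small 2D array (matrix) to play with
x = np.arange(12).reshape(3, 4)

flipped = np.flip(x, axis=0)   # flip the rows (upside down)
sliced = x[:2, 1:3]            # slice: first two rows, columns 1 and 2
transposed = x.T               # swap rows and columns

# Vectorized computing: one operation applied to every element at once,
# instead of an explicit Python loop
scaled = x * 2.0 + 1.0
print(flipped, sliced, transposed, scaled, sep="\n")
```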
In Lecture 2, we learned how arrays are stored efficiently in a computer, and we saw how floats are represented in IEEE 754 notation, which helped us understand the most common errors that can occur when working with them. Finally, we were introduced to the concept of higher rank tensors and how we can work with them efficiently using vectorized computing.
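As an illustration of the kind of floating point pitfalls this lecture covered (again, my own sketch rather than the course code):

```python
import numpy as np

# Floats are stored in IEEE 754 binary format, so many decimal
# fractions cannot be represented exactly
print(0.1 + 0.2 == 0.3)            # False: classic rounding error
print(np.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead

# Limits of the float64 format
info = np.finfo(np.float64)
print(info.eps, info.max, info.tiny)

# A higher rank tensor: a 3D array of shape (2, 3, 4), which vectorized
# operations handle just as easily as a matrix
t = np.ones((2, 3, 4))
print((t * 5).sum(axis=-1).shape)  # (2, 3)
```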
In Lecture 3 we were shown the basics of scientific visualisation using matplotlib, a Python library for plotting data, and learned the “language” of graphics: what a stat, a scale, a guide, a geom, a layer and a facet are. We also saw which plots make more sense for different types of data, including which colours we can use to communicate our findings correctly.
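Here is a minimal matplotlib sketch of the kind of plot we practised with (the data and labels are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# Some synthetic data to plot
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")       # the geom: a line layer
ax.set_xlabel("x")                  # guides: axis labels and a legend
ax.set_ylabel("sin(x)")
ax.legend()
ax.set_title("A simple line plot")
plt.show()
```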
The fourth lecture was an introduction to linear algebra: its concepts and how we can apply them using Python and NumPy. This was something of an extension of what we saw in the first lecture, but now with linear algebra in mind. Here we reviewed the basic operations that can be performed on vectors, the concept of a norm, and more complex vector operations. By the end of this lecture we had also covered matrices, the operations defined on them, and some useful properties we can use to simplify our computations.
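A few of those vector and matrix operations in NumPy, as a quick sketch of my own:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

dot = a @ b                       # inner (dot) product
norm = np.linalg.norm(a)          # Euclidean (L2) norm
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # angle between vectors

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

product = A @ B                   # matrix multiplication
inverse = np.linalg.inv(A)        # matrix inverse (A is invertible here)
print(dot, norm, cosine, product, inverse, sep="\n")
```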
The fifth lecture started with a refresher on how graphs can be represented using matrices, whether they are directed or undirected, weighted or unweighted. After that, we went straight back to linear algebra, where concepts like eigenvalues and eigenvectors appeared, and we saw how they relate to Principal Component Analysis. Staying with matrix decompositions, we also reviewed the Singular Value Decomposition, a way to break a matrix into simpler forms that allow efficient computation.
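A small sketch of how eigendecomposition and SVD look in NumPy (a simple PCA-style example of my own, not the course’s code):

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 points in 2D with some correlation between the columns
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
X = X - X.mean(axis=0)                   # centre the data

# PCA via the eigendecomposition of the covariance matrix
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric
print("principal directions:\n", eigvecs)

# The same directions can be obtained from the SVD of the data matrix
U, S, Vt = np.linalg.svd(X, full_matrices=False)
print("right singular vectors:\n", Vt)
```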
In Lecture 6, we started with the core concept that machine learning relies on: optimization. Optimization is the task of finding the optimal settings for a process in order to improve it. We saw the main parts of an optimization problem:
- Parameters
- Objective function
We also saw how constraints can be imposed on our parameter selection to make the optimization search more realistic. We focused mainly on iterative optimization, guided by heuristics that try to improve the performance of the underlying algorithms.
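As a toy illustration of those ideas (parameters, an objective function, a constraint and an iterative, heuristic search), here is a simple random-search sketch of my own:

```python
import numpy as np

rng = np.random.default_rng(42)

def objective(theta):
    # Objective function: something we want to minimise
    return np.sum((theta - 3.0) ** 2)

# Parameters: start from a random guess, constrained to the box [-5, 5]
theta = rng.uniform(-5, 5, size=2)
best = objective(theta)

for _ in range(1000):
    # Heuristic: propose a small random perturbation of the current parameters
    candidate = np.clip(theta + rng.normal(scale=0.1, size=2), -5, 5)
    value = objective(candidate)
    if value < best:        # keep the candidate only if it improves the objective
        theta, best = candidate, value

print(theta, best)   # should end up close to (3, 3)
```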
Lecture 7 was also about optimization concepts, including the basics of how a neural network works and how derivatives can help us find the optimal parameters using the gradient descent algorithm. We also saw what differentiable programming is and how we can use it to perform optimization.
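And a minimal gradient descent sketch on the same kind of objective, with the gradient written by hand (a differentiable programming framework would compute it for us):

```python
import numpy as np

def objective(theta):
    return np.sum((theta - 3.0) ** 2)

def gradient(theta):
    # Derivative of the objective with respect to each parameter
    return 2.0 * (theta - 3.0)

theta = np.array([-4.0, 5.0])
learning_rate = 0.1

for _ in range(100):
    # Step downhill: move against the gradient
    theta = theta - learning_rate * gradient(theta)

print(theta, objective(theta))  # converges towards (3, 3)
```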
In Lecture 8 the topic was probability, starting from the basics, including the concepts of probability mass functions and probability density functions. We also reviewed what joint, marginal and conditional probabilities are, the basis for Bayes’ Rule, another very important concept, and how we can work with probabilities without running into numerical issues.
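A tiny sketch of Bayes’ Rule, plus one common way to avoid numerical underflow by working in log space (my own example, with invented numbers):

```python
import numpy as np

# Bayes' Rule: P(disease | positive test)
p_disease = 0.01                  # prior
p_pos_given_disease = 0.95        # likelihood
p_pos_given_healthy = 0.05

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))   # marginal (evidence)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)        # roughly 0.16

# Numerical issues: multiplying many small probabilities underflows to 0,
# so we add log probabilities instead
probs = np.full(1000, 0.01)
print(np.prod(probs))             # 0.0 (underflow)
print(np.sum(np.log(probs)))      # a finite log probability
```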
In lab number eight, which corresponds to this lecture, we learned what a stochastic process is and practised with a specific kind of such process: the Markov process.
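A minimal Markov chain simulation in the spirit of that lab (the states and transition probabilities here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["sunny", "rainy"]
# Transition matrix: row i gives the probabilities of moving from state i
transition = np.array([[0.8, 0.2],    # sunny -> sunny, sunny -> rainy
                       [0.4, 0.6]])   # rainy -> sunny, rainy -> rainy

state = 0   # start in "sunny"
trajectory = []
for _ in range(10):
    # The next state depends only on the current state (the Markov property)
    state = rng.choice(len(states), p=transition[state])
    trajectory.append(states[state])

print(trajectory)
```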
Lecture 9 was also about probability; we saw the concept of expectation, then some statistics, and finally what Bayesian inference is and how Monte Carlo approaches are useful when dealing with it.
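A quick sketch of estimating an expectation with a Monte Carlo approach (my own example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Expectation of f(X) = X**2 where X ~ Normal(0, 1).
# The exact answer is the variance, i.e. 1.
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)
monte_carlo_estimate = np.mean(samples ** 2)
print(monte_carlo_estimate)   # close to 1, and it improves with more samples
```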
To conclude the course, in lecture number 10 we saw a bit of digital signal processing and time series: what sampling is, its relationship with the Nyquist limit, and how aliasing can appear if we don’t respect this limit. The last section of this lecture was about convolution and the Fourier transform.
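And a last sketch touching on those signal processing ideas: sampling a sine wave, smoothing it with a convolution, and looking at its spectrum with the FFT (my own toy example):

```python
import numpy as np

fs = 100                                    # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)                 # one second of samples
# A 5 Hz sine wave: well below the Nyquist limit of fs / 2 = 50 Hz,
# so no aliasing occurs
signal = np.sin(2 * np.pi * 5 * t)

# Convolution with a short averaging window smooths the signal
window = np.ones(5) / 5
smoothed = np.convolve(signal, window, mode="same")

# The Fourier transform reveals the frequency content
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
print(freqs[np.argmax(spectrum)])           # 5.0, the dominant frequency
```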
Of course, we covered many more topics and went deeper into almost all of them; trying to summarize hours of lectures and labs in a short video is not easy, so if you want to know more about something I mentioned here, please leave it in the comments.
So, that’s it for me, at least for now. I hope you liked this video, and if you did, please give it a like. There are more videos coming about the remaining courses, which I’m creating as I review the lecture notes in preparation for the exams; I think the next one is going to be about machine learning.