Hi everyone, my name is Vy Hoa Phun and I am a Machine Learning (ML) enthusiast who loves to understand the fundamental concepts of ML algorithms clearly and from many different perspectives. I hold a Bachelor's degree in Computer Science from Ho Chi Minh University of Technology (\(\text{HCMUT}\)) in Vietnam and a Master's degree in Data Science from the University of Queensland (\(\text{UQ}\)) in Australia.
During my undergraduate studies, I took courses across many CompSci fields, such as Computer Networks, Graphics, Ops Research, and Artificial Intelligence (AI). In my final year at \(\text{HCMUT}\), I worked on a thesis project called Developing Smart Traffic Light Control System, in which several ML algorithms, such as Genetic Algorithm (GA) and Reinforcement Learning with a Neural Network approach, were applied to search for the optimal signal timing. The scope of the project was limited to a single isolated signalized intersection. Fortunately, the project also helped me win a 2nd Prize at the Science Research Symposium 2018 held by the Office for International Study Program (\(\text{OISP}\)). For more information about my thesis, please refer to here.
It is worth mentioning that in my final year at \(\text{HCMUT}\), I took an ML course that covered many ML techniques from a probabilistic view. I am also a big fan of calculus, linear algebra, and probability & statistics, so this subject captured my attention more than any other and inspired me to learn as much as I could. That is why I decided to pursue a Data Science degree: not only to get a job relevant to my field of interest, but also to make my knowledge more well-rounded.
During my post-grad time at uni, I dived into many more DataSci courses: Pattern Recognition, Mathematical Statistics, ML, Data Mining, Ops Research, Social Media Analytics, Data Analytics at Scale, and Database Principle. The Pattern Recognition course mainly covered different types of Deep Learning models for analysing image data. Mathematical Statistics covered hypothesis testing, Maximum Likelihood Estimation, Maximum A Posteriori, properties of continuous distributions often used in Bayesian Statistics, Markov Chain Monte Carlo (MCMC) sampling methods, and a bit about Generalized Linear Models. Social Media Analytics covered many traditional techniques and ML methods for working with graph data, such as Spectral Clustering, PageRank, Random Walk, and Modularity Maximization. Data Analytics at Scale gave me the gist of Big Data technologies like Hadoop and Spark, and of how an algorithm can leverage these technologies to execute at scale. In Ops Research, I learnt to solve various Linear Programming, Integer Linear Programming, and Dynamic Programming problems. The ML course covered a wide range of topics, including Linear Regression, PCA, LDA, Decision Tree, SVM, Multi Layer Perceptron, and some ensemble techniques, as well as methods to make an ML model more robust to outliers. In Data Mining, I learnt techniques related to anomaly detection, association rule mining, and text mining. The Database Principle course provided the knowledge to design a Relational database, along with some practice in using SQL to query data. In my last sem at UQ, I completed a capstone project that proposed a solution for preserving user privacy at scale when using recommender systems, a so-called Secure Recommender System.
To be exact, the data to be protected is the implicit feedback of users that is used to train the recommender system, for example the interactions between users and items when the recommender system is specialized for an ecommerce platform. In each communication, a subset of users requests/submits a particular set of item embeddings. To protect the implicit feedback in this setting, I used a Private Set Union (PSU) algorithm together with the well-known Secure Aggregation Protocol, which prevents the server from learning which user truly interacted with which item/item embedding, even if the server knows which item embeddings a user requested in a communication. However, this alone is not sufficient to preserve user privacy: the server can still observe how often each user requests each item embedding and use that to infer the implicit feedback data, because a user who has interacted with an item is more likely to request the corresponding item embedding from the server. To overcome this, I proposed a method that randomizes users' requests by solving a linear optimization problem, so that the request frequency of any item associated with a user becomes indistinguishable. For more detail about the project, please refer to here.
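To give a feel for the Secure Aggregation idea mentioned above, here is a minimal sketch of its core trick, pairwise masking: every pair of users shares a random mask, one of them adds it and the other subtracts it, so the masks cancel in the sum and the server only learns the aggregate. This is a toy illustration and not the code from my project. The function names, the modulus, and the use of a single seeded random generator (standing in for the pairwise key agreement of the real protocol) are assumptions made for the example, and it ignores user dropouts and the PSU step entirely.

```python
import random

MODULUS = 2**32  # assumed modulus; all arithmetic is done modulo this value


def pairwise_masks(num_users, dim, seed=0):
    """Create one random mask vector per unordered pair of users.

    Assumption: a shared seeded RNG stands in for the pairwise key
    agreement that a real Secure Aggregation protocol would use.
    """
    rng = random.Random(seed)
    masks = {}
    for i in range(num_users):
        for j in range(i + 1, num_users):
            masks[(i, j)] = [rng.randrange(MODULUS) for _ in range(dim)]
    return masks


def mask_update(user, update, num_users, masks):
    """Hide one user's vector: add the masks shared with higher-indexed
    users and subtract the masks shared with lower-indexed users."""
    masked = list(update)
    for other in range(num_users):
        if other == user:
            continue
        pair = (min(user, other), max(user, other))
        sign = 1 if user < other else -1
        masked = [(m + sign * p) % MODULUS for m, p in zip(masked, masks[pair])]
    return masked


def aggregate(masked_updates):
    """Server side: sum the masked vectors; the pairwise masks cancel."""
    dim = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) % MODULUS for k in range(dim)]


# Three hypothetical users, each holding a small integer vector.
updates = [[1, 0, 2], [0, 3, 1], [4, 1, 0]]
masks = pairwise_masks(len(updates), dim=3)
masked = [mask_update(i, u, len(updates), masks) for i, u in enumerate(updates)]
total = aggregate(masked)
print(total)  # [5, 4, 3], the element-wise sum of the original vectors
```

Each individual masked vector looks random to the server, yet the sum is exact. Note that this hides the contents of what users submit, not which embeddings they request or how often, which is why the request randomization step is still needed.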