AI & Data Engineering
Advance your career into the world of AI and Data Science
Launched in July 2018 and updated in 2019, the course offers two tracks: data science and business analytics. For the first two months, you take the fundamental sessions. In the following month, you take sessions specific to the track you choose. You can also take sessions from both tracks.
The course is taught by experienced data scientists and machine learning experts from top tech companies. The faculty-to-student ratio is 1:5. The course is tailored to industry demand for artificial intelligence and data science positions. 20+ instructors help you master the most cutting-edge skills in data science.
The companion coursework immerses you in recent and relevant trends in the data science world: user stickiness analysis, text clustering, Spark program development, and deep learning.
Students who took the course have received offers in the technology, finance, and consulting industries, including data scientist, machine learning engineer, data analyst, and business analyst positions.
4 weeks of separate course tracks tailored to your career path and interview requirements. The tracks are connected, with the option to take both, so you can explore multiple opportunities at the same time.
Focuses on developing your business sense, with emphasis on business analytics, mathematical statistics, case studies, A/B testing, and other necessary skills. Boosts your SQL and Python proficiency to get you ready for business analyst positions.
Gives you in-depth training in cutting-edge technologies such as distributed systems and deep learning. With a higher standard in coding, we will prepare you for data scientist positions through 4+ machine learning projects.
Stream computing provides real-time data to business intelligence and artificial intelligence systems, so that users can obtain up-to-date information either directly through statements or indirectly through algorithmic models.
This project is based on mobile game data. It guides you through computing real-time scores and ranking lists with Flink SQL and mastering the basic principles of stream computing.
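To give a feel for the score-and-ranking logic in this project, here is a minimal sketch in plain Python rather than Flink SQL. The event format `(player, points)`, the function name `leaderboard`, and the sample data are assumptions made for illustration; they are not taken from the course material.

```python
from collections import defaultdict

def leaderboard(events, top_n=3):
    """Yield the current top-N ranking after each incoming score event."""
    totals = defaultdict(int)
    for player, points in events:
        totals[player] += points
        # Rank by total score (descending), then by name so ties are stable.
        ranking = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
        yield ranking[:top_n]

# Hypothetical stream of (player, points) events.
events = [("ann", 10), ("bob", 15), ("ann", 7), ("cat", 5)]
snapshots = list(leaderboard(events, top_n=2))
print(snapshots[-1])
```

Each event updates a running total and emits a fresh ranking, which is the same "continuously updated result" idea that a streaming SQL query expresses declaratively.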
Topic modeling is a frequently used method that helps in discovering hidden topics, annotating documents, and organizing, searching and summarizing texts.
You will cluster unlabeled text documents into groups and discover latent semantic structures using Python. You will learn to preprocess text by tokenizing, stemming, and removing stop words, and to extract features using the term frequency-inverse document frequency (TF-IDF) approach. Using these features, you will train unsupervised learning models and learn to visualize model training results.
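The preprocessing and TF-IDF steps can be sketched with the standard library alone. This is a simplified illustration, assuming whitespace tokenization, a tiny hand-picked stop word set, and no stemming; the function names and sample documents are hypothetical, and a real project would more likely use an NLP library.

```python
import math
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "is", "of", "and"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stock prices rose",
]
vectors = tf_idf(docs)
```

Terms that appear in fewer documents receive higher weights, so in the first document "mouse" outweighs "cat". The resulting vectors are the kind of input a clustering model would then consume.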
With big data and machine learning, data scientists can now understand users better. Learning how to use Spark ML to process large-scale natural language data will help you get more interview opportunities.
You will build an ML model in PySpark to identify users' preferences for YouTube videos. You will design a reasonable metric to evaluate the proposed model, clean users' comments with Spark ML NLP techniques, and build a supervised model to classify the comments. You will also need to handle imbalanced data and missing labels. You will use AutoML techniques to speed up tuning. Finally, you will produce a business report and propose ways to increase user engagement with YouTube.
Big data analysis is an essential skill for data scientists. A data scientist needs to build an entire pipeline that includes data collection, data cleaning, and data modeling.
This project is based on crime data from the San Francisco area. It guides students through establishing a data analysis workflow that includes data collection, cleaning, storage, and analysis. By analyzing and modeling the crime and weather data, students build a model to predict possible crime events.
Recommendation systems are among the most valuable products at companies such as Google, Facebook, Airbnb, and Uber, as well as at startups. The ability to design and build a recommendation system is one of the most attractive capabilities for a data scientist.
This project guides you through building a recommendation system for big data. Using Netflix movie rating data, you will cover the full path from machine learning algorithms to system implementation. You will learn to build Spark machine learning pipelines, tune collaborative filtering models automatically, and apply the resulting model to the Netflix movie rating data.
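The idea behind collaborative filtering can be sketched in plain Python. The example below uses matrix factorization trained with stochastic gradient descent, which is a stand-in chosen for brevity and not the alternating least squares (ALS) approach that Spark ML provides. The function names, hyperparameters, and the tiny ratings list are all assumptions for illustration.

```python
import random

def train_mf(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random initial factors for every user and item.
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for k in range(n_factors):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, user, item):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))

def rmse(P, Q, ratings):
    """Root mean squared error over the given ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Hypothetical (user, movie, rating) data.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 3),
    ("u2", "m1", 4), ("u2", "m3", 1),
    ("u3", "m2", 4), ("u3", "m3", 2),
]
P, Q = train_mf(ratings)
print(round(predict(P, Q, "u1", "m3"), 2))  # a rating u1 never gave
```

Once trained, the model can score user and movie pairs that never appeared in the data, which is what makes recommendations possible. The hyperparameters `n_factors`, `lr`, and `reg` are the kind of settings that automatic tuning would search over.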
In online advertising, click-through rate (CTR) is a very important indicator for evaluating the effectiveness of ads. CTR prediction estimates the likelihood of clicks for each ad and is widely used in sponsored search and real-time bidding. CTR questions often come up in data science interviews.
This project is based on users' daily click-through data and involves three main stages: ETL, OLAP and statistical analysis, and machine learning modeling and forecasting. Spark DataFrames are used for preprocessing, Spark SQL for big data analysis and statistical modeling, and Spark ML pipelines for classification and regression models. We will also introduce the principles of XGBoost, optimization, and related topics.
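As a small illustration of the statistical analysis stage, the sketch below computes CTR per ad from a click log in plain Python. The log format `(ad_id, clicked)` and the additive smoothing with `prior_clicks` and `prior_impressions` are assumptions added here for illustration; the project itself uses Spark for this work.

```python
from collections import defaultdict

def ctr_by_ad(log, prior_clicks=1.0, prior_impressions=20.0):
    """Return raw and smoothed CTR per ad from (ad_id, clicked) records."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for ad_id, clicked in log:
        impressions[ad_id] += 1
        clicks[ad_id] += int(clicked)
    return {
        ad: {
            # Raw CTR: clicks divided by impressions.
            "raw": clicks[ad] / impressions[ad],
            # Smoothed CTR: pulls ads with few impressions toward a prior rate.
            "smoothed": (clicks[ad] + prior_clicks)
            / (impressions[ad] + prior_impressions),
        }
        for ad in impressions
    }

# Hypothetical click log: ad_a shown four times, ad_b shown once.
log = [
    ("ad_a", True), ("ad_a", False), ("ad_a", False), ("ad_a", False),
    ("ad_b", True),
]
stats = ctr_by_ad(log)
print(stats["ad_a"]["raw"], stats["ad_b"]["raw"])
```

The smoothed value shows why raw CTR can mislead: an ad with one impression and one click has a raw CTR of 100%, yet that single observation says little about its true rate.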
With the rapid development of deep learning technology, more and more Internet companies are beginning to use deep learning to build recommendation systems. Compared with traditional recommendation systems, deep learning enables end-to-end learning.
This project is based on the autoencoder (encoder-decoder) deep learning model. It uses IMDb movie data as training data and TensorFlow to build the model. User and movie features are extracted through the model, which is then used to recommend movies automatically.
Stream data processing is often described as the next generation of computation. Analyzing data as it streams reduces the workload that comes from first landing the data in storage. Real-time streaming analysis, processing, and modeling are standout skills when seeking a job at a top technology company.
This project is based on Twitter's streaming data and guides you through building a complete stream processing pipeline. It uses a Kafka-based workflow for data redistribution, then Spark Streaming and Spark Structured Streaming to clean and analyze the stream. Finally, it enables you to build an offline text analysis model with Spark ML to identify user sentiment from streaming Twitter data.
Image classification is one of the most important tasks in computer vision, and major IT companies have applied it at large scale. Convolutional neural networks (CNNs) have yielded very good results on ImageNet, a large image classification dataset.
Based on a CNN with weights pretrained on ImageNet, we use TensorFlow and transfer learning to fine-tune on a user-defined dataset and build a deep learning model for car image classification and related image search.
Time series data is very common in daily life. It is a collection of observations measured at equal time intervals. Examples include the annual sales volume of apparel companies, stock prices, the annual precipitation of a city, average monthly temperature, and variation in the PM2.5 index. Time series analysis therefore applies to many real-life problems.
This project is based on the deep learning model LSTM. Students will learn the principle of LSTM models and related technologies for analyzing time series data. This project uses NASDAQ stock data as the training data, and teaches students to build a deep learning model via TensorFlow, which later can be used to predict stock price variation and stock market index.
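Before a sequence model such as an LSTM can be trained, the series is usually cut into fixed-length input windows, each paired with the next value as the target. Here is a minimal sketch of that step in plain Python; the function name `make_windows`, the window length, and the price values are hypothetical.

```python
def make_windows(series, window):
    """Split a series into (input window, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets

# Hypothetical daily closing prices.
prices = [10.0, 10.5, 11.0, 10.8, 11.2]
X, y = make_windows(prices, window=3)
print(X)  # [[10.0, 10.5, 11.0], [10.5, 11.0, 10.8]]
print(y)  # [10.8, 11.2]
```

Each row of `X` is one training example and the matching entry in `y` is the value the model learns to predict from it.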
In many commercial fields, such as retail, manufacturing, and the medical industry, accurate forecasting of product demand is closely tied to corporate income. Forecasts that are too high lead to higher storage costs and shorter product life; forecasts that are too conservative lead to stock shortages, which weaken consumers' willingness to buy and damage the brand image. For large enterprises in the transition period of Industry 4.0, how to use the information in big data to forecast product demand has therefore become an important issue.
In this project, we will analyze and process historical sales and product data from several well-known traditional enterprises and build models to predict future demand for new and existing products. Along the way, we will gain a better understanding of the supply chain and related job opportunities at traditional companies in the era of big data.
In various industries, such as finance, e-commerce, and resource sharing, there are all kinds of hidden fraudulent activities that result in direct financial loss. It is a huge challenge for these companies to pinpoint rare fraudulent activities and minimize financial loss while maintaining a good user experience. In this project, we will analyze e-commerce transaction data, study the insights and patterns, and build a machine learning solution that gives actionable business recommendations for deployment.
With the advancement of computer technology, it is now easier to dig out hidden information from seemingly unrelated data. For example, in the eighteenth century, stock prices fluctuated with the ships coming and going, because merchants brought the latest news as well as cargo. Other studies have found that company executives' visits to the White House can predict the future direction of the company's stock. In this project, we will follow the same line of thinking and analyze the relationship between New York taxis and the stock market. Does the seemingly complicated New York traffic hide interesting information?
In this homework, students will use everything they have learned to explore the data, including defining an appropriate business problem, asking reasonable questions, summarizing the data with the right metrics, selecting reasonable statistical models, and verifying their conjectures.
In 2017, global retail e-commerce turnover reached 2.290 trillion US dollars, accounting for 10.1% of total retail sales, and is expected to reach 4.479 trillion US dollars by 2021. 2018 was the year of the online and offline retail revolution: "Future Retail" took root and flourished.
In this project, students will analyze the sales volume and product information of a well-known e-commerce website, learn systematically about personalized design, attracting new customers, encouraging repeat purchases, and optimizing marketing channels, and then build a sales forecast model for web products.
20+ instructors to help you master the most cutting-edge skills in data science and achieve your career goals.
Our team consists of senior data scientists, machine learning engineers, and business analysts from Google, Facebook, McKinsey & Company, and Hortonworks. You will also receive hands-on guidance from Apache Spark/Hadoop contributors and committee members.
You will learn the fundamentals of data science, including Python basics, linear data structures and search algorithms, and traditional machine learning models.
Frequency: 1 month, 5 sessions/week, 2-3 hrs/session
Introduction of Data Science
Fundamentals of Probability & Linear Regression
[Coding] Python Basics 1 variable and syntax
[Coding] Python Basics 2 function and class
Logistic Regression I
[Coding] Python Basics 3 base data structure
[Coding] Python Binary Search
Logistic Regression II & Model Evaluation
[Coding] Python Array Basic Sorting
[Coding] Python LinkedList and Recursion I
[Coding] Python LinkedList and Recursion I (cont.)
[Coding] Python Practice
[Coding] Python Advanced Sorting and Practice
[Coding] Python Review
Data Manipulation in Python 1
You will learn Python, data structures, and algorithms, improve your coding skills, and strengthen your knowledge of mathematical statistics and probability.
Frequency: 3 weeks, 5 sessions/week, 2-3 hrs/session
[Coding] Python Queue and Stack
Data Manipulation in Python 2
[Coding] Python Review
[Coding] Exam 1
Machine Learning Project 1 - Customer Churn Prediction
[Coding] Python Binary Tree
Machine Learning Project 2 - NLP and Topic Modeling
[Coding] Recursion II - recursion on tree
[Coding] Python Practice
Introduction to statistics
[Coding] Python Binary Search Tree
[Coding] Python review
Hypothesis testing 1
[Coding] Python Heap
Hypothesis testing 2
A/B testing 1
[Coding] Python Review
A/B testing 2
[Coding] Python Hashtable
Inference in regression
[Coding] String I
[Coding] Recursion III DFS
[Coding] Recursion III DFS cont
[Coding] Exam 2
You will study typical online assessments and take part in resume review sessions.
Frequency: 1 week, 5 sessions/week, 2-3 hrs/session
[Coding] Probability, Sampling, Randomization
Resume and interview preparation
Career guide: BA vs DS
Online Assessment - deep dive 1
Online Assessment - deep dive 2
Through 4+ Case Studies and Data Challenges, you will enhance your business analytics, case studies, SQL and Python skills and get ready for business analyst positions.
Frequency: 1 month, 4 sessions/week, 2-3 hrs/session
BA track introduction (& mock interview) & final project presentation
eCommerce deep dive 1: System design
[Coding-for-BA] Queue, Stack
eCommerce deep dive 2: Data driven marketing
eCommerce deep dive 3: Data lab
Case study deep dive 1
Case study deep dive 2
Case study deep dive 3
Data visualization in Tableau
[Coding-for-BA] String practice
Data visualization in Python
Anomaly Detection 1
Anomaly Detection 2
Anomaly Detection 3
Supply chain data 1
Supply chain data 2
Mock interview session 1
Review of BA/DA track
Course opens soon. Follow us to stay informed.