Companies are in urgent need of skilled Big Data analysts. With data being collected and stored faster than ever, the demand for such professionals continues to grow.
This course provides hands-on exposure to the Big Data phenomenon. Among the several Big Data and related technologies, such as Hadoop, MapReduce, Apache Pig, Hive, Flume, Sqoop, ZooKeeper, Oozie, Spark, Cassandra, and MongoDB, specific focus is given to the Hadoop ecosystem and the MapReduce framework with Python and PySpark.
Course Target Audience:
- Learners who want to earn online through freelancing
- The course is intended for a general audience including business people, managers, system architects/engineers, business/systems analysts, and software developers.
Course Learning Objectives (CLOs), or what this course will bring to you:
After successful completion of this course, learners will be able to:
- Explain the fundamentals of Big Data
- Define the role of traditional ETL (Extract, Transform, and Load) in Big Data
- Implement the MapReduce framework using Python
- Use orchestration to manage Big Data processing
- Practice cluster computing on Amazon Web Services (AWS)
- Set up a PySpark environment along with other required software packages
- Use Hadoop to process and analyze Big Data
- Handle errors and failures encountered in Big Data processing on compute clusters
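As a small taste of the MapReduce objective above, here is a minimal sketch of the map/shuffle/reduce pattern in plain Python (no Hadoop cluster required). The function names and sample text are illustrative only, not part of the course material.

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs for each word in a line of text.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Group values by key, mimicking Hadoop's shuffle-and-sort step.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the grouped counts for each word.
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data needs big tools", "data beats opinions"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
# counts["big"] == 2 and counts["data"] == 2
```

On a real Hadoop or PySpark cluster, the map and reduce functions run in parallel across many machines, but the logical pattern is the same.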
Course Contents:
1. Big Data and its core concepts
2. Python concepts for Big Data processing
3. ETL's application and implementation in a Big Data scenario
4. The MapReduce framework for processing and analyzing Big Data
5. Orchestration in a Big Data context
6. MapReduce combined with orchestration
7. Amazon Web Services (AWS) for cloud computing in Big Data processing
8. Anaconda and Jupyter Notebook installation
9. Hadoop to process and analyze Big Data
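To give a flavor of the ETL topic above, here is a minimal extract-transform-load sketch in plain Python. The record fields and the in-memory source and destination are made up for illustration; a real pipeline would read from files or databases.

```python
import csv
import io

# Extract: read raw CSV records (from an in-memory string for illustration).
raw = "name,score\nana,91\nbo,78\ncy,not_a_number\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: cast scores to int, dropping malformed records.
clean = []
for row in rows:
    try:
        clean.append({"name": row["name"], "score": int(row["score"])})
    except ValueError:
        continue  # skip records that fail validation

# Load: append into a destination list; a real pipeline would write to a database.
destination = []
destination.extend(clean)
# destination holds the two valid records (ana and bo)
```

The same extract/transform/load stages appear in Big Data tooling such as Sqoop and Flume, only at much larger scale and with distributed storage as the destination.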