Hadoop Programming with Java for Big Data Solutions
Learning Style: Virtual Classroom
Course Duration: 4 Days
About this course:
The availability of large data sets presents new opportunities and challenges to organizations of all sizes. In this course, you will implement a strategy for developing Hadoop jobs and extracting business value from large and varied data sets. This Apache Hadoop development training is essential for programmers who want to augment their programming skills to use Hadoop for a variety of big data solutions.
The average salary for a Hadoop Developer is $139,000 per year.
After completing this course, students will be able to:
- Write, customize, and deploy Java MapReduce jobs to summarize data
- Develop Hive and Pig queries to simplify data analysis
- Test and debug jobs using MRUnit
- Monitor task execution and cluster health
This course is intended for:
- Big Data Engineers
Prerequisites:
- Java experience through a Java introductory course, or at least six months of Java programming experience
Virtual Instructor-Led Outline
Introduction to Hadoop
- Identifying the business benefits of Hadoop
- Surveying the Hadoop ecosystem
- Selecting a suitable distribution
Parallelizing Program Execution
Meeting the challenges of parallel programming
- Investigating parallelizable challenges: algorithms, data and information exchange
- Estimating the storage and complexity of Big Data
Parallel programming with MapReduce
- Dividing and conquering large-scale problems
- Uncovering jobs suitable for MapReduce
- Solving typical business problems
Implementing Real-World MapReduce Jobs
Applying the Hadoop MapReduce paradigm
- Configuring the development environment
- Exploring the Hadoop distribution
- Creating the components of MapReduce jobs
- Introducing the Hadoop daemons
- Analyzing the stages of MapReduce processing: splitting, mapping, shuffling and reducing
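The four stages listed above can be illustrated without a cluster. The following is a minimal plain-Java simulation of a word-count job, assuming each string in the input list stands in for one input split. It deliberately avoids the Hadoop API (no Mapper, Reducer, or Job classes), so it shows the data flow only, not how a real job is configured or submitted. The class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of split -> map -> shuffle -> reduce for word count.
// It does NOT use the Hadoop API; it only mirrors the data flow between stages.
final class MapReduceStagesSketch {

    // Map stage: emit (word, 1) for every token in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : split.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle stage: group all emitted values by key. The TreeMap keeps keys sorted,
    // mimicking the sort the framework performs before the reduce stage.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce stage: sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // Each string in `splits` plays the role of one input split and is mapped independently.
    static Map<String, Integer> run(List<String> splits) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String split : splits) {
            mapped.addAll(map(split));
        }
        return reduce(shuffle(mapped));
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("the quick fox", "the lazy dog", "The end");
        System.out.println(run(splits));
    }
}
```

Running main prints {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}. In a real Hadoop job the framework performs the shuffle and sort between your map and reduce code; here a TreeMap stands in for that step.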
Building complex MapReduce jobs
- Selecting and employing multiple mappers and reducers
- Leveraging built-in mappers, reducers and partitioners
- Analyzing time series data with secondary sort
- Streaming tasks through various programming languages
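Secondary sort is the pattern of sorting records by a composite key while grouping them by only part of that key, so each reduce call sees its values in a chosen order (for time series data, in timestamp order). The sketch below simulates that idea in plain Java with hypothetical sensor readings. In an actual Hadoop job the same roles are typically played by a composite key class, a partitioner on the natural key, and sort and grouping comparators, none of which are shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java illustration of the secondary-sort pattern (not the Hadoop API).
final class SecondarySortSketch {

    // Composite key (sensor = natural key, timestamp = secondary field) plus the value.
    static final class Reading {
        final String sensor;
        final long timestamp;
        final double value;

        Reading(String sensor, long timestamp, double value) {
            this.sensor = sensor;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Sort order: natural key first, then timestamp (the role of a sort comparator).
    static final Comparator<Reading> SORT_ORDER =
            Comparator.comparing((Reading r) -> r.sensor).thenComparingLong(r -> r.timestamp);

    // Grouping: only the natural key decides which group (reduce call) a record joins,
    // so each group's values come out already ordered by timestamp.
    static Map<String, List<Double>> group(List<Reading> readings) {
        List<Reading> sorted = new ArrayList<>(readings);
        sorted.sort(SORT_ORDER);
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (Reading r : sorted) {
            groups.computeIfAbsent(r.sensor, k -> new ArrayList<>()).add(r.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Reading> input = Arrays.asList(
                new Reading("s2", 30L, 7.0),
                new Reading("s1", 20L, 2.0),
                new Reading("s1", 10L, 1.0),
                new Reading("s2", 10L, 5.0));
        System.out.println(group(input));
    }
}
```

With the sample input, main prints {s1=[1.0, 2.0], s2=[5.0, 7.0]}: readings are grouped per sensor and arrive in timestamp order even though the input was unordered.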
Solving common data manipulation problems
- Executing algorithms: parallel sorts, joins and searches
- Analyzing log files, social media data and e-mails
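One of the joins mentioned above is commonly implemented as a reduce-side (repartition) join: mappers key every record by the join attribute and tag it with its source, and the reducer pairs up records from the two sources that share a key. The following plain-Java sketch simulates that flow for hypothetical users and orders data; it illustrates the technique under those assumptions and is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of a reduce-side (repartition) join; not the Hadoop API.
final class ReduceSideJoinSketch {

    // A value tagged with the input it came from.
    static final class Tagged {
        final String source;  // "U" = users, "O" = orders (hypothetical tags)
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    // "Map" and "shuffle": key every record by the join key, tag it, and group by key.
    static Map<String, List<Tagged>> shuffle(String[][] users, String[][] orders) {
        Map<String, List<Tagged>> byKey = new TreeMap<>();
        for (String[] u : users) {
            byKey.computeIfAbsent(u[0], k -> new ArrayList<>()).add(new Tagged("U", u[1]));
        }
        for (String[] o : orders) {
            byKey.computeIfAbsent(o[0], k -> new ArrayList<>()).add(new Tagged("O", o[1]));
        }
        return byKey;
    }

    // "Reduce": for each key, pair every user record with every order record (inner join).
    static List<String> join(String[][] users, String[][] orders) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle(users, orders).entrySet()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                if (t.source.equals("U")) {
                    left.add(t.payload);
                } else {
                    right.add(t.payload);
                }
            }
            for (String l : left) {
                for (String r : right) {
                    out.add(e.getKey() + "," + l + "," + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] users = {{"u1", "Ana"}, {"u2", "Bo"}};
        String[][] orders = {{"u1", "book"}, {"u1", "pen"}, {"u3", "lamp"}};
        System.out.println(join(users, orders));
    }
}
```

Here main prints [u1,Ana,book, u1,Ana,pen]. Keys present in only one input (u2, u3) produce no output, which gives inner-join semantics. Buffering both sides per key in memory, as this sketch does, can become a concern when a single key has very many records.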
Implementing partitioners and comparators
- Identifying network-bound, CPU-bound and disk I/O-bound parallel algorithms
- Dividing the workload efficiently using partitioners
- Controlling grouping and sort order with comparators
- Collecting metrics with counters
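By default, Hadoop assigns each map output key to a reducer with HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below reproduces that formula in plain Java for String keys and tallies how many records each reducer would receive, a rough stand-in for what counters let you observe in a real job. The class and method names are hypothetical and no Hadoop classes are used. A real job hashes the Writable key (for example Text), whose hashCode differs from String.hashCode, so the exact partition numbers here will not match a cluster run.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of hash partitioning. It mirrors the formula used by
// Hadoop's default HashPartitioner but does not depend on any Hadoop classes.
final class PartitionerSketch {

    // Mask off the sign bit so a negative hashCode() still maps to a valid partition.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count the records each reducer would receive (a stand-in for a job counter).
    static Map<Integer, Integer> loadPerReducer(List<String> keys, int numReduceTasks) {
        Map<Integer, Integer> load = new TreeMap<>();
        for (String k : keys) {
            load.merge(hashPartition(k, numReduceTasks), 1, Integer::sum);
        }
        return load;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "apple", "date");
        System.out.println(loadPerReducer(keys, 3));
    }
}
```

Masking the sign bit matters because hashCode() can be negative, and Java's % operator would then return a negative partition number. Because the partition depends only on the key, every record with the same key reaches the same reducer; skewed keys show up as one reducer receiving a disproportionate share of records.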
Persisting Big Data with Distributed Data Stores
Making the case for distributed data
- Achieving high-performance data throughput
- Recovering from media failure through redundancy
Interfacing with Hadoop Distributed File System (HDFS)
- Breaking down the structure and organization of HDFS
- Loading raw data and retrieving results
- Reading and writing data programmatically
- Manipulating Hadoop SequenceFile types
- Sharing reference data with DistributedCache
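HDFS stores files as fixed-size blocks and replicates each block across DataNodes, which is how it survives media failure. A quick estimate of block count and raw disk usage follows directly from those two settings. The sketch below assumes the common defaults of a 128 MB block size and a replication factor of 3 (set in real clusters through dfs.blocksize and dfs.replication); the file size is hypothetical.

```java
// Hypothetical back-of-the-envelope HDFS storage estimate (plain Java, no Hadoop classes).
// Assumes 128 MB blocks and replication factor 3; real clusters may configure
// dfs.blocksize and dfs.replication differently.
final class HdfsStorageEstimate {

    static final long MB = 1024L * 1024L;

    // Number of blocks a file occupies; the last block may be only partially filled.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw disk consumed across the cluster: each block is stored `replication` times,
    // and a partial final block uses only its actual size on disk.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 1000L * MB;  // hypothetical 1000 MB input file
        long block = 128L * MB;
        System.out.println(blockCount(file, block) + " blocks");  // 8 blocks (7 full + 1 partial)
        System.out.println(rawBytes(file, 3) / MB + " MB raw");   // 3000 MB raw
    }
}
```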
Structuring data with HBase
- Migrating from structured to unstructured storage
- Applying NoSQL concepts with schema on read
- Connecting to HBase from MapReduce jobs
- Comparing HBase to other types of NoSQL data stores
Simplifying Data Analysis with Query Languages
Unleashing the power of SQL with Hive
- Structuring databases, tables, views and partitions
- Integrating MapReduce jobs with Hive queries
- Querying with HiveQL
- Accessing Hive servers through JDBC
- Extending HiveQL with User-Defined Functions (UDF)
Executing workflows with Pig
- Developing Pig Latin scripts to consolidate workflows
- Integrating Pig queries with Java
- Interacting with data through the Grunt console
- Extending Pig with User-Defined Functions (UDF)
Managing and Deploying Big Data Solutions
Testing and debugging Hadoop code
- Logging significant events for auditing and debugging
- Debugging in local mode
- Validating requirements with MRUnit
Deploying, monitoring and tuning performance
- Deploying to a production cluster
- Optimizing performance with administrative tools
- Monitoring job execution through web user interfaces
Frequently Asked Questions About Virtual Instructor-Led Courses
I can't connect to my class. What are my options?
The link to the class is available once you log in to your dashboard. If you cannot see it, please contact our support team at 1-855-800-8240, and they will be happy to email you the direct link or provide the dial-in number.
I can't attend my class. Can I reschedule?
Yes, you can reschedule your class. Please contact your Sales representative, who will arrange this for you. If you have forgotten their name, feel free to contact our support team at email@example.com or 1-855-800-8240.
Will I get my certificate upon completion?
Yes. Upon completion of the course, your certificate will be available in your course as a Trophy icon that you can download. If you do not see it, contact firstname.lastname@example.org with the following details so the team can email you the certificate: Class Name, Class Date, Account Rep, and Your Email.
I cannot connect to my lab. Help!
Your lab is accessible at the bottom of your course page. Click the button labeled "LAB" to launch it. Please note that some classes do not require a lab. You can confirm with our support team by calling 1-855-800-8240 or by emailing email@example.com. You can also check with your Instructor, or with the Associate Instructor if your class includes one.
What is my access code for Skillpipe?
Not all classes include or require Skillpipe. If your class includes it, please check your email, as you should have received an access code from firstname.lastname@example.org. If you do not find it in your inbox, please check your Spam / Junk folder. For further assistance, call support at 1-855-800-8240 or email email@example.com.
I don't have audio. I can't hear the instructor.
Make sure you are using a headset that is compatible with your laptop or computer. If you don't have a headset, you can use your laptop's built-in speakers. Otherwise, you can use the dial-in option by calling the dial-in number provided in the class joining email. You may also contact the support team at 1-855-800-8240 or firstname.lastname@example.org for the dial-in numbers associated with your training.
How can I reach student support?
Support can be reached by phone at 1-855-800-8240, by email at email@example.com, or through the chat button on our website. Support office hours are 8 AM to 5 PM CST, Monday to Friday. Concerns raised after office hours will be addressed the following business day.