Flume and Sqoop for Ingesting Big Data
Import data : Flume and Sqoop play a special role in the Hadoop ecosystem. They transport data from sources that hold or produce it, such as local file systems, HTTP, MySQL and Twitter, to data stores such as HDFS, HBase and Hive. Both tools come with built-in functionality that shields users from the complexity of moving data between these systems.
Flume: Flume Agents can transport data produced by a streaming application to data stores like HDFS and HBase.
Sqoop: Use Sqoop to bulk import data from a traditional RDBMS into Hadoop storage such as HDFS or Hive.
Practical implementations for a variety of sources and data stores :
- Sources : Twitter, MySQL, Spooling Directory, HTTP
- Sinks : HDFS, HBase, Hive
Flume features :
Flume Agents, Flume Events, Event bucketing, Channel selectors, Interceptors
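To make the agent idea concrete, here is a minimal sketch, not taken from the course material, that renders a Flume agent properties file wiring one Spooling Directory source through a memory channel into an HDFS sink. The component names (`src1`, `ch1`, `sink1`), the agent name and the paths are hypothetical placeholders. The `%Y-%m-%d` escape in the sink path illustrates event bucketing by date, assuming the sink is allowed to use the local timestamp.

```python
def render_flume_agent(agent: str, spool_dir: str, hdfs_path: str) -> str:
    """Render a minimal one-source, one-channel, one-sink Flume agent config."""
    # Hypothetical component names; any identifiers would work.
    src, ch, sink = "src1", "ch1", "sink1"
    props = [
        # Declare the components that belong to this agent.
        (f"{agent}.sources", src),
        (f"{agent}.channels", ch),
        (f"{agent}.sinks", sink),
        # Spooling Directory source: picks up files dropped into spool_dir.
        (f"{agent}.sources.{src}.type", "spooldir"),
        (f"{agent}.sources.{src}.spoolDir", spool_dir),
        (f"{agent}.sources.{src}.channels", ch),  # a source may feed several channels
        # In-memory channel buffering events between source and sink.
        (f"{agent}.channels.{ch}.type", "memory"),
        # HDFS sink: escape sequences in the path bucket events by date.
        (f"{agent}.sinks.{sink}.type", "hdfs"),
        (f"{agent}.sinks.{sink}.hdfs.path", hdfs_path),
        (f"{agent}.sinks.{sink}.hdfs.useLocalTimeStamp", "true"),
        (f"{agent}.sinks.{sink}.channel", ch),  # a sink drains exactly one channel
    ]
    return "\n".join(f"{key} = {value}" for key, value in props) + "\n"


if __name__ == "__main__":
    # Assumed example values, for illustration only.
    print(render_flume_agent("agent1", "/tmp/spool", "/flume/events/%Y-%m-%d"), end="")
```

The rendered text can be saved to a file and passed to the `flume-ng agent` command through its configuration-file option. Interceptors and channel selectors would be added as further properties on the source.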
Sqoop features :
Sqoop import from MySQL, Incremental imports using Sqoop Jobs
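As a rough illustration of an incremental import wrapped in a Sqoop job, the sketch below assembles the command line as an argument list. It is an assumption-laden example rather than the course's own code: the job name, JDBC URL, user, password file, table, target directory and check column are all hypothetical values.

```python
import shlex


def sqoop_incremental_job(job_name, jdbc_url, username, password_file,
                          table, target_dir, check_column, last_value=0):
    """Build the argument list for a saved Sqoop job doing an append-mode import."""
    return [
        "sqoop", "job", "--create", job_name,
        "--", "import",                       # everything after "--" is the saved tool invocation
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",            # import only rows newer than the last run
        "--check-column", check_column,       # column used to detect new rows
        "--last-value", str(last_value),      # starting point for the first run
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    cmd = sqoop_incremental_job(
        job_name="orders_import",
        jdbc_url="jdbc:mysql://localhost/shop",
        username="etl",
        password_file="/user/etl/.mysql.pw",
        table="orders",
        target_dir="/data/orders",
        check_column="id",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Once created, a saved job is run with `sqoop job --exec` followed by the job name. Sqoop records the new last value after each run, so repeated executions import only the rows added since the previous one.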
Who should attend :
- Engineers building an application with HDFS/HBase/Hive as the data store
- Engineers who want to port data from legacy data stores to HDFS
Prerequisites :
- Knowledge of HDFS is a prerequisite for the course
- The HBase and Hive examples assume a basic understanding of the HBase and Hive shells
- Most of the examples need HDFS to run, so you will need a working HDFS installation
Self-Paced Learning Outline
- You, This Course and Us
- Why do we need Flume and Sqoop?
Learning Style : Self-Paced Learning
Course Duration : 2 Hours