使用Sqoop在HDFS和RDBMS之间导数据
SQOOP是一款开源的工具,主要用于在HADOOP与传统的数据库间进行数据的传递,下面从SQOOP用户手册上摘录一段描述
SQOOP是一款开源的工具,主要用于在Hadoop与传统的数据库间进行数据的传递,,下面从SQOOP用户手册上摘录一段描述
Sqoopis a tool designed to transfer data between Hadoop and relational databases.You can use Sqoop to import data from a relational database management system(RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System(HDFS),transform the data in Hadoop MapReduce, and then export the data backinto an RDBMS.
SQOOP是Cloudera公司开源的一款在HDFS以及数据库之间传输数据的软件。内部通过JDBC连接HADOOP以及数据库,因此从理论上来讲,只要是支持JDBC的数据库,SQOOP都可以兼容。并且,SQOOP不仅能把数据以文件的形式导入到HDFS上,还可以直接导入数据到HBASE或者HIVE中。
下面是一些性能测试数据,仅供参考:
表名:tb_keywords
行数:11628209
数据文件大小:1.4G
HDFS –> DB
DB -> HDFS
SQOOP
428s
166s
HDFSFILEDB
209s
105s
从结果上来看,以FILE作为中转方式性能是要高于SQOOP的。原因如下:
1、 本质上SQOOP使用的是JDBC,效率不会比MYSQL自带的到导入\导出工具效率高
2、 以导入数据到DB为例,SQOOP的设计思想是分阶段提交,也就是说假设一个表有1K行,那么它会先读出100行(默认值),然后插入,提交,再读取100行……如此往复
即便如此,SQOOP也是有优势的,比如说使用的便利性,任务执行的容错性等。在一些测试环境中如果需要的话可以考虑把它拿来作为一个工具使用。
下面是一些操作记录
[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh export.sh
Fri Sep 23 20:15:47 CST 2011
11/09/23 20:15:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:15:48 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:15:48 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:15:49 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:15:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.jar
11/09/23 20:15:49 INFO mapreduce.ExportJobBase: Beginning export of tb_keyword_data_201104
11/09/23 20:15:49 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO mapred.JobClient: Running job: job_201109211521_0012
11/09/23 20:15:50 INFO mapred.JobClient: map 0% reduce 0%
11/09/23 20:16:04 INFO mapred.JobClient: map 1% reduce 0%
11/09/23 20:16:10 INFO mapred.JobClient: map 2% reduce 0%
11/09/23 20:16:13 INFO mapred.JobClient: map 3% reduce 0%
11/09/23 20:16:19 INFO mapred.JobClient: map 4% reduce 0%
11/09/23 20:16:22 INFO mapred.JobClient: map 5% reduce 0%
11/09/23 20:16:25 INFO mapred.JobClient: map 6% reduce 0%
11/09/23 20:16:31 INFO mapred.JobClient: map 7% reduce 0%
11/09/23 20:16:34 INFO mapred.JobClient: map 8% reduce 0%
11/09/23 20:16:41 INFO mapred.JobClient: map 9% reduce 0%
11/09/23 20:16:44 INFO mapred.JobClient: map 10% reduce 0%
11/09/23 20:16:50 INFO mapred.JobClient: map 11% reduce 0%
11/09/23 20:16:53 INFO mapred.JobClient: map 12% reduce 0%
11/09/23 20:16:56 INFO mapred.JobClient: map 13% reduce 0%
11/09/23 20:17:02 INFO mapred.JobClient: map 14% reduce 0%
11/09/23 20:17:05 INFO mapred.JobClient: map 15% reduce 0%
11/09/23 20:17:11 INFO mapred.JobClient: map 16% reduce 0%
11/09/23 20:17:14 INFO mapred.JobClient: map 17% reduce 0%
11/09/23 20:17:17 INFO mapred.JobClient: map 18% reduce 0%
11/09/23 20:17:23 INFO mapred.JobClient: map 19% reduce 0%
11/09/23 20:17:25 INFO mapred.JobClient: map 20% reduce 0%
11/09/23 20:17:28 INFO mapred.JobClient: map 21% reduce 0%
11/09/23 20:17:34 INFO mapred.JobClient: map 22% reduce 0%
11/09/23 20:17:37 INFO mapred.JobClient: map 23% reduce 0%
11/09/23 20:17:43 INFO mapred.JobClient: map 24% reduce 0%
11/09/23 20:17:46 INFO mapred.JobClient: map 25% reduce 0%
11/09/23 20:17:49 INFO mapred.JobClient: map 26% reduce 0%
11/09/23 20:17:55 INFO mapred.JobClient: map 27% reduce 0%
11/09/23 20:17:58 INFO mapred.JobClient: map 28% reduce 0%
11/09/23 20:18:04 INFO mapred.JobClient: map 29% reduce 0%
11/09/23 20:18:07 INFO mapred.JobClient: map 30% reduce 0%
11/09/23 20:18:10 INFO mapred.JobClient: map 31% reduce 0%
11/09/23 20:18:16 INFO mapred.JobClient: map 32% reduce 0%
11/09/23 20:18:19 INFO mapred.JobClient: map 33% reduce 0%
11/09/23 20:18:25 INFO mapred.JobClient: map 34% reduce 0%
11/09/23 20:18:28 INFO mapred.JobClient: map 35% reduce 0%
11/09/23 20:18:31 INFO mapred.JobClient: map 36% reduce 0%
11/09/23 20:18:37 INFO mapred.JobClient: map 37% reduce 0%
11/09/23 20:18:40 INFO mapred.JobClient: map 38% reduce 0%
11/09/23 20:18:46 INFO mapred.JobClient: map 39% reduce 0%
11/09/23 20:18:49 INFO mapred.JobClient: map 40% reduce 0%
11/09/23 20:18:52 INFO mapred.JobClient: map 41% reduce 0%
11/09/23 20:18:58 INFO mapred.JobClient: map 42% reduce 0%
11/09/23 20:19:01 INFO mapred.JobClient: map 43% reduce 0%
11/09/23 20:19:04 INFO mapred.JobClient: map 44% reduce 0%
11/09/23 20:19:10 INFO mapred.JobClient: map 45% reduce 0%
11/09/23 20:19:13 INFO mapred.JobClient: map 46% reduce 0%
11/09/23 20:19:19 INFO mapred.JobClient: map 47% reduce 0%
11/09/23 20:19:22 INFO mapred.JobClient: map 48% reduce 0%
11/09/23 20:19:25 INFO mapred.JobClient: map 49% reduce 0%
11/09/23 20:19:34 INFO mapred.JobClient: map 50% reduce 0%
11/09/23 20:19:37 INFO mapred.JobClient: map 52% reduce 0%
11/09/23 20:19:40 INFO mapred.JobClient: map 53% reduce 0%
11/09/23 20:19:43 INFO mapred.JobClient: map 54% reduce 0%
11/09/23 20:19:46 INFO mapred.JobClient: map 55% reduce 0%
11/09/23 20:19:49 INFO mapred.JobClient: map 56% reduce 0%
11/09/23 20:19:52 INFO mapred.JobClient: map 57% reduce 0%
11/09/23 20:19:55 INFO mapred.JobClient: map 58% reduce 0%
11/09/23 20:19:58 INFO mapred.JobClient: map 59% reduce 0%
11/09/23 20:20:01 INFO mapred.JobClient: map 60% reduce 0%
11/09/23 20:20:04 INFO mapred.JobClient: map 62% reduce 0%
11/09/23 20:20:07 INFO mapred.JobClient: map 63% reduce 0%
11/09/23 20:20:10 INFO mapred.JobClient: map 64% reduce 0%
11/09/23 20:20:13 INFO mapred.JobClient: map 65% reduce 0%
11/09/23 20:20:16 INFO mapred.JobClient: map 66% reduce 0%
11/09/23 20:20:19 INFO mapred.JobClient: map 67% reduce 0%
11/09/23 20:20:22 INFO mapred.JobClient: map 68% reduce 0%
11/09/23 20:20:25 INFO mapred.JobClient: map 69% reduce 0%
11/09/23 20:20:28 INFO mapred.JobClient: map 70% reduce 0%
11/09/23 20:20:31 INFO mapred.JobClient: map 72% reduce 0%
11/09/23 20:20:34 INFO mapred.JobClient: map 73% reduce 0%
11/09/23 20:20:37 INFO mapred.JobClient: map 74% reduce 0%
11/09/23 20:20:40 INFO mapred.JobClient: map 75% reduce 0%
11/09/23 20:20:43 INFO mapred.JobClient: map 76% reduce 0%
11/09/23 20:20:46 INFO mapred.JobClient: map 77% reduce 0%
11/09/23 20:20:49 INFO mapred.JobClient: map 78% reduce 0%
11/09/23 20:20:52 INFO mapred.JobClient: map 80% reduce 0%
11/09/23 20:20:55 INFO mapred.JobClient: map 81% reduce 0%
11/09/23 20:20:58 INFO mapred.JobClient: map 82% reduce 0%
11/09/23 20:21:01 INFO mapred.JobClient: map 83% reduce 0%
11/09/23 20:21:04 INFO mapred.JobClient: map 84% reduce 0%
11/09/23 20:21:07 INFO mapred.JobClient: map 85% reduce 0%
11/09/23 20:21:10 INFO mapred.JobClient: map 86% reduce 0%
11/09/23 20:21:13 INFO mapred.JobClient: map 87% reduce 0%
11/09/23 20:21:22 INFO mapred.JobClient: map 88% reduce 0%
11/09/23 20:21:28 INFO mapred.JobClient: map 89% reduce 0%
11/09/23 20:21:37 INFO mapred.JobClient: map 90% reduce 0%
11/09/23 20:21:47 INFO mapred.JobClient: map 91% reduce 0%
11/09/23 20:21:53 INFO mapred.JobClient: map 92% reduce 0%
11/09/23 20:22:02 INFO mapred.JobClient: map 93% reduce 0%
11/09/23 20:22:11 INFO mapred.JobClient: map 94% reduce 0%
11/09/23 20:22:17 INFO mapred.JobClient: map 95% reduce 0%
11/09/23 20:22:26 INFO mapred.JobClient: map 96% reduce 0%
11/09/23 20:22:32 INFO mapred.JobClient: map 97% reduce 0%
11/09/23 20:22:41 INFO mapred.JobClient: map 98% reduce 0%
11/09/23 20:22:47 INFO mapred.JobClient: map 99% reduce 0%
11/09/23 20:22:53 INFO mapred.JobClient: map 100% reduce 0%
11/09/23 20:22:55 INFO mapred.JobClient: Job complete: job_201109211521_0012
11/09/23 20:22:55 INFO mapred.JobClient: Counters: 6
11/09/23 20:22:55 INFO mapred.JobClient: Job Counters
11/09/23 20:22:55 INFO mapred.JobClient: Launched map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient: Data-local map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient: FileSystemCounters
11/09/23 20:22:55 INFO mapred.JobClient: HDFS_BYTES_READ=1392402240
11/09/23 20:22:55 INFO mapred.JobClient: Map-Reduce Framework
11/09/23 20:22:55 INFO mapred.JobClient: Map input records=11628209
11/09/23 20:22:55 INFO mapred.JobClient: Spilled Records=0
11/09/23 20:22:55 INFO mapred.JobClient: Map output records=11628209
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Transferred 1.2968 GB in 425.642 seconds (3.1198 MB/sec)
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Exported 11628209 records.
Fri Sep 23 20:22:55 CST 2011
###############
[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh import.sh
Fri Sep 23 20:40:33 CST 2011
11/09/23 20:40:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:40:33 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:40:33 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:40:34 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:40:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.jar
11/09/23 20:40:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
11/09/23 20:40:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
11/09/23 20:40:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
11/09/23 20:40:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/23 20:40:34 INFO mapreduce.ImportJobBase: Beginning import of tb_keyword_data_201104
11/09/23 20:40:34 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:40 INFO mapred.JobClient: Running job: job_201109211521_0014
11/09/23 20:40:41 INFO mapred.JobClient: map 0% reduce 0%
11/09/23 20:40:54 INFO mapred.JobClient: map 25% reduce 0%
11/09/23 20:40:57 INFO mapred.JobClient: map 50% reduce 0%
11/09/23 20:41:36 INFO mapred.JobClient: map 75% reduce 0%
11/09/23 20:42:00 INFO mapred.JobClient: map 100% reduce 0%
11/09/23 20:43:19 INFO mapred.JobClient: Job complete: job_201109211521_0014
11/09/23 20:43:19 INFO mapred.JobClient: Counters: 5
11/09/23 20:43:19 INFO mapred.JobClient: Job Counters
11/09/23 20:43:19 INFO mapred.JobClient: Launched map tasks=4
11/09/23 20:43:19 INFO mapred.JobClient: FileSystemCounters
11/09/23 20:43:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1601269219
11/09/23 20:43:19 INFO mapred.JobClient: Map-Reduce Framework
11/09/23 20:43:19 INFO mapred.JobClient: Map input records=11628209
11/09/23 20:43:19 INFO mapred.JobClient: Spilled Records=0
11/09/23 20:43:19 INFO mapred.JobClient: Map output records=11628209
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Transferred 1.4913 GB in 165.0126 seconds (9.2544 MB/sec)
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Retrieved 11628209 records.
Fri Sep 23 20:43:19 CST 2011
import.sh和export.sh中的主要命令如下
/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop import --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --split-by winfo_id --target-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''
/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop export --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --export-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''

热AI工具

Undresser.AI Undress
人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover
用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool
免费脱衣服图片

Clothoff.io
AI脱衣机

Video Face Swap
使用我们完全免费的人工智能换脸工具轻松在任何视频中换脸!

热门文章

热工具

记事本++7.3.1
好用且免费的代码编辑器

SublimeText3汉化版
中文版,非常好用

禅工作室 13.0.1
功能强大的PHP集成开发环境

Dreamweaver CS6
视觉化网页开发工具

SublimeText3 Mac版
神级代码编辑软件(SublimeText3)

MySQL在Web应用中的主要作用是存储和管理数据。1.MySQL高效处理用户信息、产品目录和交易记录等数据。2.通过SQL查询,开发者能从数据库提取信息生成动态内容。3.MySQL基于客户端-服务器模型工作,确保查询速度可接受。

InnoDB使用redologs和undologs确保数据一致性和可靠性。1.redologs记录数据页修改,确保崩溃恢复和事务持久性。2.undologs记录数据原始值,支持事务回滚和MVCC。

MySQL在数据库和编程中的地位非常重要,它是一个开源的关系型数据库管理系统,广泛应用于各种应用场景。1)MySQL提供高效的数据存储、组织和检索功能,支持Web、移动和企业级系统。2)它使用客户端-服务器架构,支持多种存储引擎和索引优化。3)基本用法包括创建表和插入数据,高级用法涉及多表JOIN和复杂查询。4)常见问题如SQL语法错误和性能问题可以通过EXPLAIN命令和慢查询日志调试。5)性能优化方法包括合理使用索引、优化查询和使用缓存,最佳实践包括使用事务和PreparedStatemen

MySQL与其他编程语言相比,主要用于存储和管理数据,而其他语言如Python、Java、C 则用于逻辑处理和应用开发。 MySQL以其高性能、可扩展性和跨平台支持着称,适合数据管理需求,而其他语言在各自领域如数据分析、企业应用和系统编程中各有优势。

MySQL适合小型和大型企业。1)小型企业可使用MySQL进行基本数据管理,如存储客户信息。2)大型企业可利用MySQL处理海量数据和复杂业务逻辑,优化查询性能和事务处理。

MySQL索引基数对查询性能有显着影响:1.高基数索引能更有效地缩小数据范围,提高查询效率;2.低基数索引可能导致全表扫描,降低查询性能;3.在联合索引中,应将高基数列放在前面以优化查询。

MySQL的基本操作包括创建数据库、表格,及使用SQL进行数据的CRUD操作。1.创建数据库:CREATEDATABASEmy_first_db;2.创建表格:CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY,titleVARCHAR(100)NOTNULL,authorVARCHAR(100)NOTNULL,published_yearINT);3.插入数据:INSERTINTObooks(title,author,published_year)VA

MySQL适合Web应用和内容管理系统,因其开源、高性能和易用性而受欢迎。1)与PostgreSQL相比,MySQL在简单查询和高并发读操作上表现更好。2)相较Oracle,MySQL因开源和低成本更受中小企业青睐。3)对比MicrosoftSQLServer,MySQL更适合跨平台应用。4)与MongoDB不同,MySQL更适用于结构化数据和事务处理。
