Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建
目录结构 Hadoop集群(CDH4)实践之 (0) 前言 Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建 Hadoop集群(CDH4)实践之 (2) HBaseZookeeper搭建 Hadoop集群(CDH4)实践之 (3) Hive搭建 Hadoop集群(CHD4)实践之 (4) Oozie搭建 Hadoop集群(CHD4)实践之 (5) Sqoop安
目录结构
Hadoop集群(CDH4)实践之 (0) 前言
Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建
Hadoop集群(CDH4)实践之 (2) HBase&Zookeeper搭建
Hadoop集群(CDH4)实践之 (3) Hive搭建
Hadoop集群(CHD4)实践之 (4) Oozie搭建
Hadoop集群(CHD4)实践之 (5) Sqoop安装
本文内容
Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建
参考资料
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/CDH4-Installation-Guide.html
环境准备
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230 内存10G
- namenode
hadoop- secondarynamenode: 172.17.20.234 内存10G
- secondarybackupnamenode,jobtracker
hadoop-node-1: 172.17.20.231 内存10G
- datanode,tasktracker
hadoop-node-2: 172.17.20.232 内存10G
- datanode,tasktracker
hadoop-node-3: 172.17.20.233 内存10G
- datanode,tasktracker
对以上角色做一些简单的介绍:
namenode - 整个HDFS的命名空间管理服务
secondarynamenode - 可以看做是namenode的冗余服务
jobtracker - 并行计算的job管理服务
datanode - HDFS的节点服务
tasktracker - 并行计算的job执行服务
本文定义的规范,避免在配置多台服务器上产生理解上的混乱:
所有直接以 $ 开头,没有跟随主机名的命令,都代表需要在所有的服务器上执行,除非后面有单独的//开头或在标题说明。
1. 选择最好的安装包
为了更方便和更规范的部署Hadoop集群,我们采用Cloudera的集成包。
因为Cloudera对Hadoop相关的系统做了很多优化,避免了很多因各个系统间版本不符产生的很多Bug。
这也是很多资深Hadoop管理员所推荐的。
https://ccp.cloudera.com/display/DOC/Documentation/
2. 安装Java环境
由于整个Hadoop项目主要是通过Java开发完成的,因此需要JVM的支持。
登陆www.oracle.com(需要创建一个ID),从以下地址下载一个64位的JDK,如jdk-7u45-linux-x64.rpm
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
$ sudo rpm -ivh jdk-7u45-linux-x64.rpm
$ sudo vim /etc/profile.d/java.sh
export JAVA_HOME=/usr/java/jdk1.7.0_45 export JRE_HOME=$JAVA_HOME/jre export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
$ sudo chmod +x /etc/profile.d/java.sh
$ source /etc/profile
3. 配置Hadoop安装源
$ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
$ cd /etc/yum.repos.d/
$ sudo wget http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/cloudera-cdh4.repo
4. 安装Hadoop相关套件,选择MRv1的框架支持
$ sudo yum install hadoop-hdfs-namenode //仅在hadoop-master上安装
$ sudo yum install hadoop-hdfs-secondarynamenode //仅在hadoop-secondary上安装
$ sudo yum install hadoop-0.20-mapreduce-jobtracker //仅在hadoop-secondary上安装
$ sudo yum install hadoop-hdfs-datanode //仅在hadoop-node上安装
$ sudo yum install hadoop-0.20-mapreduce-tasktracker //仅在hadoop-node上安装
$ sudo yum install hadoop-client
5. 创建Hadoop配置文件
$ sudo cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster
6. 激活新的配置文件
$ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
$ cd /etc/hadoop/conf
7. 添加hosts记录并修改对应的主机名
$ sudo vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 172.17.20.230 hadoop-master 172.17.20.234 hadoop-secondary 172.17.20.231 hadoop-node-1 172.17.20.232 hadoop-node-2 172.17.20.233 hadoop-node-3
8. 安装LZO支持
$ cd /etc/yum.repos.d
$ sudo wget http://archive.cloudera.com/gplextras/redhat/6/x86_64/gplextras/cloudera-gplextras4.repo
$ sudo yum install hadoop-lzo-cdh4
9. 配置hadoop/conf下的文件
$ sudo vim /etc/hadoop/conf/masters
hadoop-master
$ sudo vim /etc/hadoop/conf/slaves
hadoop-node-1 hadoop-node-2 hadoop-node-3
10. 创建hadoop的HDFS目录
$ sudo mkdir -p /data/{1,2,3,4}/mapred/local
$ sudo chown -R mapred:hadoop /data/{1,2,3,4}/mapred/local
$ sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/ns /data/{1,2,3,4}/dfs/dn
$ sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/ns /data/{1,2,3,4}/dfs/dn
$ sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/ns /data/{1,2,3,4}/dfs/dn
$ sudo mkdir /data/tmp
$ sudo chmod 1777 /data/tmp
11. 配置core-site.xml
$ sudo vim /etc/hadoop/conf/core-site.xml
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="http://heylinux.com/archives/configuration.xsl"?> fs.defaultFS hdfs://hadoop-master:8020 hadoop.tmp.dir /data/tmp/hadoop-${user.name} hadoop.proxyuser.oozie.hosts * hadoop.proxyuser.oozie.groups * hadoop.proxyuser.hive.hosts * hadoop.proxyuser.hive.groups *
12. 配置hdfs-site.xml
$ sudo vim /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="http://heylinux.com/archives/configuration.xsl"?> dfs.namenode.name.dir /data/1/dfs/nn,/nfsmount/dfs/nn dfs.namenode.http-address hadoop-master:50070 fs.namenode.checkpoint.period 3600 fs.namenode.checkpoint.dir /data/1/dfs/ns dfs.namenode.secondary.http-address hadoop-secondary:50090 dfs.replication 3 dfs.permissions.superusergroup supergroup dfs.datanode.data.dir /data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn dfs.datanode.max.xcievers 4096
13. 配置mapred-site.xml
$ sudo vim /etc/hadoop/conf/mapred-site.xml
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="http://heylinux.com/archives/configuration.xsl"?> mapred.job.tracker hadoop-secondary:8021 mapred.local.dir /data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local
14. 格式化HDFS分布式文件系统
$ sudo -u hdfs hadoop namenode -format //仅在hadoop-master上执行一次
15. 启动Hadoop进程
在hadoop-master上启动namenode
$ sudo /etc/init.d//etc/init.d/hadoop-hdfs-namenode start
在hadoop-secondary上启动secondarynamenode,jobtracker
$ sudo /etc/init.d/hadoop-hdfs-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-mapreduce-jobtracker start
在hadoop-node上启动datanode,tasktracker
$ sudo /etc/init.d/hadoop-hdfs-datanode start
$ sudo /etc/init.d/hadoop-0.20-mapreduce-tasktracker start
16. 创建mapred.system.dir以及/tmp HDFS目录
以下HDFS操作仅需在任意一台主机上执行一次
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
$ sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
$ sudo -u hdfs hadoop fs -ls -R /
$ sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system
17. 配置HADOOP_MAPRED_HOME
$ sudo vim /etc/profile.d/hadoop.sh
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
$ source /etc/profile
18. 查看整个集群的状态
通过网页进行查看:http://hadoop-master:50070
19. 至此,Hadoop(HDFS)的搭建就已经完成。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

As an email manager application, Microsoft Outlook allows us to schedule events and appointments. It enables us to stay organized by providing tools to create, manage and track these activities (also called events) in the Outlook application. However, sometimes unwanted events are added to the calendar in Outlook, which creates confusion for users and spams the calendar. In this article, we will explore various scenarios and steps that can help us prevent Outlook from automatically adding events to my calendar. Outlook Events – A brief overview Outlook events serve multiple purposes and have many useful features as follows: Calendar Integration: In Outlook

Scenario description for nodes to completely evacuate from ProxmoxVE and rejoin the cluster. When a node in the ProxmoxVE cluster is damaged and cannot be repaired quickly, the faulty node needs to be kicked out of the cluster cleanly and the residual information must be cleaned up. Otherwise, new nodes using the IP address used by the faulty node will not be able to join the cluster normally; similarly, after the faulty node that has separated from the cluster is repaired, although it has nothing to do with the cluster, it will not be able to access the web management of this single node. In the background, information about other nodes in the original ProxmoxVE cluster will appear, which is very annoying. Evict nodes from the cluster. If ProxmoxVE is a Ceph hyper-converged cluster, you need to log in to any node in the cluster (except the node you want to delete) on the host system Debian, and run the command

Dream Weaver CMS Station Group Practice Sharing In recent years, with the rapid development of the Internet, website construction has become more and more important. When building multiple websites, site group technology has become a very effective method. Among the many website construction tools, Dreamweaver CMS has become the first choice of many website enthusiasts due to its flexibility and ease of use. This article will share some practical experience about Dreamweaver CMS station group, as well as some specific code examples, hoping to provide some help to readers who are exploring station group technology. 1. What is Dreamweaver CMS station group? Dream Weaver CMS

PHP Coding Practices: Refusal to Use Alternatives to Goto Statements In recent years, with the continuous updating and iteration of programming languages, programmers have begun to pay more attention to coding specifications and best practices. In PHP programming, the goto statement has existed as a control flow statement for a long time, but in practical applications it often leads to a decrease in the readability and maintainability of the code. This article will share some alternatives to help developers refuse to use goto statements and improve code quality. 1. Why refuse to use goto statement? First, let's think about why

Golang is a powerful and efficient programming language that is widely used to build web services and applications. In network services, traffic management is a crucial part. It can help us control and optimize data transmission on the network and ensure the stability and performance of services. This article will introduce the best practices for traffic management using Golang and provide specific code examples. 1. Use Golang’s net package for basic traffic management. Golang’s net package provides a way to handle network data.

Select the style of the catalog in Word, and it will be automatically generated after the operation is completed. Analysis 1. Go to Word on your computer and click to import. 2After entering, click on the file directory. 3 Then select the style of the directory. 4. After the operation is completed, you can see that the file directory is automatically generated. Supplement: The table of contents of the summary/notes article is automatically generated, including first-level headings, second-level headings and third-level headings, usually no more than third-level headings.

Players can collect different materials to build buildings when playing in the Mistlock Kingdom. Many players want to know whether to build buildings in the wild. Buildings cannot be built in the wild in the Mistlock Kingdom. They must be within the scope of the altar. . Can buildings be built in the wild in Mistlock Kingdom? Answer: No. 1. Buildings cannot be built in the wild areas of the Mist Lock Kingdom. 2. The building must be built within the scope of the altar. 3. Players can place the Spirit Fire Altar by themselves, but once they leave the range, they will not be able to construct buildings. 4. We can also directly dig a hole in the mountain as our home, so we don’t need to consume building materials. 5. There is a comfort mechanism in the buildings built by players themselves, that is to say, the better the interior, the higher the comfort. 6. High comfort will bring attribute bonuses to players, such as

The mobile version of WeChat Reading App is a very good reading software. This software provides a lot of books. You can read them anytime, anywhere with just one click to search and read them online. All of them are officially authorized and different types of books are neatly arranged. Sort and enjoy a comfortable and relaxing reading atmosphere. Switch the reading modes of different scenarios, update the latest book chapters continuously every day, support online login from multiple devices, and batch download to the bookshelf. You can read it with or without the Internet, so that everyone can discover more knowledge from it. Now the editor details it online Promote the method of viewing the catalog for WeChat reading partners. 1. Open the book you want to view the catalog and click in the middle of the book. 2. Click the three lines icon in the lower left corner. 3. In the pop-up window, view the book catalog
