配置apache HIVE元数据DB为PostgreSQL
本文出处:http://amutu.com/blog/2013/06/hive-metastore-db-postgresql/ HIVE 的元数据默认使用 derby 作为存储 DB , derby 作为轻量级的 DB ,在开发、测试过程中使用比较方便,但是在实际的生产环境中,还需要考虑易用性、容灾、稳定性以及各种监控、运
本文出处:http://amutu.com/blog/2013/06/hive-metastore-db-postgresql/
HIVE的元数据默认使用derby作为存储DB,derby作为轻量级的DB,在开发、测试过程中使用比较方便,但是在实际的生产环境中,还需要考虑易用性、容灾、稳定性以及各种监控、运维工具等,这些都是derby缺乏的。MySQL和PostgreSQL是两个比较常用的开源数据库系统,在生产环境中比较多的用来替换derby。配置MySQL在网上的文章比较多,这里不再赘述,本文主要描述配置HIVE元数据DB为PostgreSQL的方法。
HIVE版本:HIVE 0.7-snapshot,HIVE 0.8-snapshot
步骤1:在PG中为元数据增加用户的DB
首先在PostgreSQL中为HIVE的元数据建立帐号和DB。
--以管理员身份登入PG:
psql postgres -U postgres
--创建用户hive_user:
Create user hive_user;
--创建DB metastore_db,owner为hive_user:
Create database metastore_db with owner=hive_user;
--设置hive_user的密码:
/password hive_user
完成以上步骤以后,还要确保PostgreSQL的pg_hba.conf中的配置允许HIVE所在的机器ip可以访问PG。
步骤2:下载PG的JDBC驱动
在HIVE_HOME目录下创建auxlib目录:
mkdir auxlib
此时HIVE_HOME目录中应该有bin,lib,auxlib,conf等目录。
下载PG的JDBC驱动
Wget http://jdbc.postgresql.org/download/postgresql-9.0-801.jdbc4.jar
将下载到的postgresql-9.0-801.jdbc4.jar放到auxlib中。
步骤3:修改HIVE配置文件
在HIVE_HOME中新建hive-site.xml 文件,内容如下,蓝色字体按照PG server的相关信息进行修改。
步骤4:初始化元数据表
元数据库metastore中默认没有表,当HIVE第一次使用某个表的时候,如果发现该表不存在就会自动创建。对derby和mysql,这个过程没有问题,因此derby和mysql作为元数据库不需要这一步。
PostgreSQL在初始化的时候,会遇到一些问题,导致PG数据库死锁。例如执行以下HIVE语句:
>Create table kv (key,int,value string) partitioned by (ds string);
OK
>Alter table kv add partition (ds = '20110101');
执行这一句的时候,HIVE会一直停在这。
查看PG数据库,发现有两个连接在进行事务操作,其中一个是:
此时处于事务中空闲,另外一个是:
ALTER TABLE "PARTITIONS" ADD CONSTRAINT "PARTITIONS_FK1" FOREIGN KEY ("SD_ID") REFERENCES "SDS" ("SD_ID") INITIALLY DEFERRED
处于等待状态。
进一步查看日志,发现大致的过程是这样的:
HIVE发起Alter table kv add partition (ds = '20110101')语句,此时DataNucleus接口发起第一个isolation为SERIALIZABLE的事务,锁定了TBLS等元数据表。在这个的事务进行过程中,DataNucleu发现PARTITIONS等表没有,则要自动创建。于是又发起了另外一个isolation为SERIALIZABLE的事务,第一个事务变为
类似的情况出现在:
>create test(key int);
OK
>drop table test;
当drop table时会去drop它的index,而此时没有index元数据表,它去键,然后产生死锁。
有三种方法可以解决这个死锁问题:
第一种方法:
使用PG的pg_terminate_backend()将第一个事务结束掉,这样可以保证第二个事务完成下去,将元数据表键成功。
第二种方法:
使HIVE将创建元数据表的过程和向元数据表中添加数据的过程分离:
>Create table kv (key,int,value string) partitioned by (ds string);
OK
>show partitions kv;
OK
>Alter table kv add partition (ds = '20110101');
OK
执行以上语句时就不会发生死锁,因为在执行show partitions kv语句时,它是只读语句,不会加锁。当这个语句发现PARTITIONS等表不在时,创建这些表不会发生死锁。
同样对于index表,使用
>Show index on kv;
可以将IDXS表建好。
第三种方法:
使用DataNucleu提供的SchemaTool,将HIVE的metastore/src/model/package.jdo文件作为输入,这个工具可以自动创建元数据中的表。具体的使用方法见:
http://www.datanucleus.org/products/accessplatform_2_0/rdbms/schematool.html
小结
本文给出了使用PostgreSQL作为HIVE元数据DB的配置方法,以及遇到的死锁问题的解决办法,希望对使用HIVE和PostgreSQL的朋友有帮助。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











To set up a CGI directory in Apache, you need to perform the following steps: Create a CGI directory such as "cgi-bin", and grant Apache write permissions. Add the "ScriptAlias" directive block in the Apache configuration file to map the CGI directory to the "/cgi-bin" URL. Restart Apache.

There are 3 ways to view the version on the Apache server: via the command line (apachectl -v or apache2ctl -v), check the server status page (http://<server IP or domain name>/server-status), or view the Apache configuration file (ServerVersion: Apache/<version number>).

When the Apache 80 port is occupied, the solution is as follows: find out the process that occupies the port and close it. Check the firewall settings to make sure Apache is not blocked. If the above method does not work, please reconfigure Apache to use a different port. Restart the Apache service.

Apache connects to a database requires the following steps: Install the database driver. Configure the web.xml file to create a connection pool. Create a JDBC data source and specify the connection settings. Use the JDBC API to access the database from Java code, including getting connections, creating statements, binding parameters, executing queries or updates, and processing results.

How to view the Apache version? Start the Apache server: Use sudo service apache2 start to start the server. View version number: Use one of the following methods to view version: Command line: Run the apache2 -v command. Server Status Page: Access the default port of the Apache server (usually 80) in a web browser, and the version information is displayed at the bottom of the page.

How to configure Zend in Apache? The steps to configure Zend Framework in an Apache Web Server are as follows: Install Zend Framework and extract it into the Web Server directory. Create a .htaccess file. Create the Zend application directory and add the index.php file. Configure the Zend application (application.ini). Restart the Apache Web server.

Apache cannot start because the following reasons may be: Configuration file syntax error. Conflict with other application ports. Permissions issue. Out of memory. Process deadlock. Daemon failure. SELinux permissions issues. Firewall problem. Software conflict.

To delete an extra ServerName directive from Apache, you can take the following steps: Identify and delete the extra ServerName directive. Restart Apache to make the changes take effect. Check the configuration file to verify changes. Test the server to make sure the problem is resolved.
