在Eclipse中运行Nutch2.3
参考http://wiki.apache.org/nutch/RunNutchInEclipse 一、环境准备 1、下载nutch2.3源代码 wget http://mirror.bit.edu.cn/apache/nutch/2.3/apache-nutch-2.3-src.tar.gz 或者下载正在开发中的最新版本 svn co https://svn.apache.org/repos/asf/nutch/bra
参考http://wiki.apache.org/nutch/RunNutchInEclipse
一、环境准备
1、下载nutch2.3源代码
wget http://mirror.bit.edu.cn/apache/nutch/2.3/apache-nutch-2.3-src.tar.gz
svn co https://svn.apache.org/repos/asf/nutch/branches/2.x
2、选择使用的数据库类型,以hbase为例
在conf/nutch-site.xml中增加以下属性:
<property> <name>storage.data.store.class</name> <value>org.apache.gora.hbase.store.HBaseStore</value> <description>Default class for storing data</description> </property>
3、在ivy/ivy.xml中增加与hbase相关的依赖项,此项本已存在,但被注释掉,将注释去掉即可
<dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default” />注意,rev=0.5对应hbase0.94,rev=0.3对应hbase0.90.4
4、在nutch.xml中增加以下3个属性
<property> <name>http.agent.name</name> <value>My Nutch Spider</value> </property> <property> <name>http.robots.agents</name> <value>none</value> </property> <property> <name>plugin.folders</name> <value>/Users/liaoliuqing/0_Search/1_Nutch/1_Official/apache-nutch-2.3/build/plugins</value> </property>其中plugin.folders的值为$NUTCH_HOME/build/plugins
5、执行ant eclipse
二、导入project
1、导入project
三、运行程序
1、Run as ----> Run configuration,选择project与主类
2、填写参数
/Users/liaoliuqing/Downloads/seed.txt
-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
3、点击run,输出结果如下:
InjectorJob: starting at 2015-01-28 16:27:43
InjectorJob: Injecting urlDir: /Users/liaoliuqing/Downloads/seed.txt
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2015-01-28 16:27:47, elapsed: 00:00:04
注意,在运行程序前,本机需要先启动hbase。
4、查看hbase中的数据
hbase(main):003:0> scan 'webpage' ROW COLUMN+CELL com.163.www:http/ column=f:fi, timestamp=1422433667377, value=\x00'\x8D\x00 com.163.www:http/ column=f:ts, timestamp=1422433667377, value=\x00\x00\x01K/\xA7:\x14 com.163.www:http/ column=mk:_injmrk_, timestamp=1422433667377, value=y com.163.www:http/ column=mk:dist, timestamp=1422433667377, value=0 com.163.www:http/ column=mtdt:_csh_, timestamp=1422433667377, value=?\x80\x00\x00 com.163.www:http/ column=s:s, timestamp=1422433667377, value=?\x80\x00\x00 1 row(s) in 0.2970 seconds

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

How to set background color in Eclipse? Eclipse is a popular integrated development environment (IDE) among developers and can be used for development in a variety of programming languages. It is very powerful and flexible, and you can customize the appearance of the interface and editor through settings. This article will introduce how to set the background color in Eclipse and provide specific code examples. 1. Change the editor background color. Open Eclipse and enter the "Windows" menu. Select "Preferences". Navigate on the left

How to execute .sh file in Linux system? In Linux systems, a .sh file is a file called a Shell script, which is used to execute a series of commands. Executing .sh files is a very common operation. This article will introduce how to execute .sh files in Linux systems and provide specific code examples. Method 1: Use an absolute path to execute a .sh file. To execute a .sh file in a Linux system, you can use an absolute path to specify the location of the file. The following are the specific steps: Open the terminal

PyCharm is a very popular Python integrated development environment (IDE). It provides a wealth of functions and tools to make Python development more efficient and convenient. This article will introduce you to the basic operation methods of PyCharm and provide specific code examples to help readers quickly get started and become proficient in operating the tool. 1. Download and install PyCharm First, we need to go to the PyCharm official website (https://www.jetbrains.com/pyc

Professional guidance: Expert advice and steps for installing the Lombok plug-in in Eclipse, specific code examples are required Summary: Lombok is a Java library that simplifies the writing of Java code through annotations and provides some powerful tools. This article will introduce readers to the steps of how to install and configure the Lombok plug-in in Eclipse, and provide some specific code examples so that readers can better understand and use the Lombok plug-in. Download the Lombok plug-in first, we need

The solution to Eclipse code running problems is revealed: it helps you eliminate various code running errors and requires specific code examples. Introduction: Eclipse is a commonly used integrated development environment (IDE) and is widely used in Java development. Although Eclipse has powerful functions and a friendly user interface, it is inevitable to encounter various running problems when writing and debugging code. This article will reveal some common Eclipse code running problems and provide solutions. Please note that in order to better help readers understand, this

How to customize shortcut key settings in Eclipse? As a developer, mastering shortcut keys is one of the keys to improving efficiency when coding in Eclipse. As a powerful integrated development environment, Eclipse not only provides many default shortcut keys, but also allows users to customize them according to their own preferences. This article will introduce how to customize shortcut key settings in Eclipse and give specific code examples. Open Eclipse First, open Eclipse and enter

Teach you step by step how to change the background color in Eclipse, specific code examples are required Eclipse is a very popular integrated development environment (IDE) that is often used to write and debug Java projects. By default, the background color of Eclipse is white, but some users may wish to change the background color to suit their preference or to reduce eye strain. This article will teach you step by step how to change the background color in Eclipse and provide specific code examples. Step 1: Open Eclipse First

Why can't win7 run exe files? When using the Windows7 operating system, many users may encounter a common problem, that is, they cannot run exe files. exe files are common executable files in Windows operating systems. They are usually used to install and run various applications. However, some users may find that when they try to run the exe file, the system does not respond or gives an error message. There are many reasons for this problem. Below are some common causes and corresponding solutions:
