
A PHP program for scraping suppliers from the 'IC Trading Network'

Jul 13, 2016 pm 05:48 PM

<?php
/**
 * Main program for scraping supplier data from the "IC Trading Network" (IC 交易网)
 * author Lee.
 * Last modify $Date: 2012-2-6 10:44:32$
 * Note: this program runs in GB2312 because the "IC Trading Network" website is
 * encoded in GB2312, and the database must use the same encoding.
 */
class ic {
 private $key; // part number being searched
 private $pageNum; // number of result pages

 /**
  * Entry point
  */
 public function go($key) {
  $this->key = $key;
  $this->pageNum = $this->getPageNum();
  $this->getInfo();
 }

 /**
  * Walk every result page and store the suppliers found on it
  * @return void
  */
 private function getInfo() {
  // A single loop covers both the one-page and the multi-page case;
  // nothing runs when no page count was found.
  for ($i=1; $i<=$this->pageNum; $i++) {
   $arr = $this->shopUrlMatchReArr($this->getContent($i));
   $this->isAddSuccess($arr);
  }
 }
 
 /**
  * Store each supplier and print whether the insert succeeded
  * @param array $arr supplier shop URLs
  * @return void
  */
 private function isAddSuccess($arr) {
  foreach ($arr as $k=>$v) {
   if ($this->execAdd($this->getInfoByShopUrl($v))) {
    echo 'Add Success!!';
   } else {
    echo 'Add Failed!!'; // also printed when the company is already stored
   }
  }
 }

 /**
  * Insert one supplier into the database
  * @param array $infoArr
  * @return bool true when the row was inserted, false when it was skipped or the query failed
  */
 private function execAdd($infoArr) {
  $mysqli = $this->getDb();
  if (!empty($infoArr['company'])) {
   if (!$this->isExists($mysqli, $infoArr)) {
    // Escape every scraped value before it is interpolated into the SQL string
    $e = array_map(array($mysqli, 'real_escape_string'), $infoArr);
    $num = $mysqli->query("INSERT INTO ic(company,address,phone,mobile,fax,zip,person,qq,msn,email,website,regDate,shopUrl) VALUES ('{$e['company']}','{$e['address']}','{$e['phone']}','{$e['mobile']}','{$e['fax']}','{$e['zip']}','{$e['person']}','{$e['qq']}','{$e['msn']}','{$e['email']}','{$e['website']}','{$e['regDate']}','{$e['shopUrl']}')");
    return $num;
   } else {
    return false; # the company is already stored
   }
  } else {
   return false;
  }
 }

 /**
  * Connect to the database
  * @return mysqli
  */
 private function getDb() {
  $mysqli = new mysqli('localhost', 'root', '1715544', 'weiku');
  $mysqli->set_charset('gb2312'); // sets the connection charset and keeps real_escape_string charset-aware
  return $mysqli;
 }

 /**
  * Check whether the company already exists
  * @param mysqli $mysqli
  * @param array $infoArr
  * @return bool
  */
 private function isExists($mysqli, $infoArr) {
  $company = $mysqli->real_escape_string($infoArr['company']);
  $result = $mysqli->query("SELECT company FROM ic WHERE company = '{$company}'");
  // A failed query returns false and is treated as "not found"
  return $result !== false && $result->num_rows > 0;
 }

 /**
  * Format string
  * @param string $str
  * @return string
  */
 private function formatString($str) {
  return trim($str);
 }

 /**
  * Grab the supplier information from a shop page
  * @param string $url
  * @return array
  */
 private function getInfoByShopUrl($url) {
  $re = $this->getUrlInfo($url);
  if (stristr($re, '')) $re = preg_replace('/.*/Usi', '', $re);
  preg_match_all('/(.+)/Usi', $re, $companyArr);
  preg_match_all('/地址:(.*)/Usi', $re, $addressArr);
  preg_match_all('/电话:(.*)/Usi', $re, $phoneArr);
  preg_match_all('/手机:(.*)/Usi', $re, $mobileArr);
  preg_match_all('/传真:(.*)/Usi', $re, $faxArr);
  preg_match_all('/邮编:(.*)/Usi', $re, $zipArr);
  preg_match_all('/联系人:(.*)/Usi', $re, $personArr);
  preg_match_all('/QQ:(.*)/Usi', $re, $qqArr);
  preg_match_all('/MSN:(.*)/Usi', $re, $msnArr);
  preg_match_all('/Email:(.*)/Usi', $re, $emailArr);
  preg_match_all('/网址:(.*)/Usi', $re, $websiteArr);
  preg_match_all('/注册日期:(.*)/Usi', $re, $regDateArr);
  $infoArr = array(
   'company'=>$this->formatString($companyArr[1][0]),
   'address'=>$this->formatString($addressArr[1][0]),
   'phone'=>$this->formatString($phoneArr[1][0]),
   'mobile'=>$this->formatString($mobileArr[1][0]),
   'fax'=>$this->formatString($faxArr[1][0]),
   'zip'=>$this->formatString($zipArr[1][0]),
   'person'=>$this->formatString($personArr[1][0]),
   'qq'=>$this->formatString($qqArr[1][0]),
   'msn'=>$this->formatString($msnArr[1][0]),
   'email'=>$this->formatString($emailArr[1][0]),
   'website'=>$this->stripATags($this->formatString($websiteArr[1][0])),
   'regDate'=>$this->formatString($regDateArr[1][0]),
   'shopUrl'=>$url
  );
  return $infoArr;
 }

 /**
  * Get the supplier URL array from a results page
  * @param string $re page HTML
  * @return array
  */
 private function shopUrlMatchReArr($re) {
  preg_match_all('/.+/Usi', $re, $arr);
  $arr = $this->formatUrlArr(array_unique($arr[1]));
  return $arr;
 }
 
 /**
  * Keep only the URLs that contain http://
  * @param array $arr
  * @return array
  */
 private function formatUrlArr($arr) {
  $newArr = array();
  foreach ($arr as $key=>$value) {
   if ($this->isExistsHttp($value)) {
    $newArr[$key] = $value;
   }
  }
  return $newArr;
 }
 
 /**
  * Format QQ or MSN accounts as a space-separated string
  * @param string $str
  * @param string $e account type, 'QQ' or 'MSN'
  * @return string
  */
 private function formatQqMsn($str, $e='QQ') {
  if (empty($str)) return '';
  preg_match_all('/alt="'.$e.':(.+)"/Usi', $str, $arr);
  if (count($arr[1])==1) return $arr[1][0];
  $newStr = null;
  foreach ($arr[1] as $value) {
   $newStr .= $value . ' ';
  }
  return rtrim($newStr, ' ');
 }

 /**
  * Remove the A tag from the URL, keeping the link text
  * @param string $site
  * @return string
  */
 private function stripATags($site) {
  // Assumed pattern: replace <a ...>text</a> with its text
  $site = preg_replace('/<a\b[^>]*>(.*)<\/a>/Usi', '$1', $site);
  return $site;
 }

 /**
  * Check whether the URL contains http://
  * @param string $url
  * @return bool
  */
 private function isExistsHttp($url) {
  return stristr($url, 'http://') !== false;
 }
 
 /**
  * Get the HTML of one search results page
  * @param int $page
  * @return string
  */
 private function getContent($page=1) {
  $re = file_get_contents($this->getUrl($this->key, $page));
  return $re;
 }
 
 /**
  * Get the number of result pages
  * @return int 0 when the page count is not found
  */
 private function getPageNum() {
  $re = $this->getContent();
  preg_match_all('/共(.+)页/Usi', $re, $arr);
  return isset($arr[1][0]) ? (int) trim(strip_tags($arr[1][0])) : 0;
 }

 /**
  * Build the search URL
  * @param string $str part number
  * @param int $page page number
  * @return string
  */
 private function getUrl($str, $page=1) {
  $str = urlencode($str); // part numbers such as 'OM8373PS/N3/A' contain reserved characters
  return "http://www.ic.net.cn/partsearch/searchinstock.asp?newtype=1&area=&Page={$page}&partnumber={$str}&mfg=&DateCode=&QTY=&PRICE=&Exact=&orderby=inputdate&qty_filter=50&usertype2=1&pack=";
 }

 /**
  * Get the content of an arbitrary URL
  * @param string $url
  * @return string
  */
 private function getUrlInfo($url) {
  $re = file_get_contents($url);
  return $re;
 }
}

/*
How the program works: use the IC search on the "IC Trading Network" (IC 交易网) to search by part number, then scrape the supplier information from the results.

Database structure
CREATE TABLE `ic` (
 `id` mediumint(8) unsigned NOT NULL auto_increment,
 `company` varchar(500) NOT NULL,
 `address` varchar(500) default NULL,
 `phone` varchar(500) default NULL,
 `mobile` varchar(500) default NULL,
 `fax` varchar(300) default NULL,
 `zip` varchar(300) default NULL,
 `person` varchar(500) default NULL,
 `qq` varchar(300) default NULL,
 `msn` varchar(300) default NULL,
 `email` varchar(500) default NULL,
 `website` varchar(300) default NULL,
 `regDate` varchar(500) default NULL,
 `shopUrl` varchar(500) default NULL,
 PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=gb2312
*/

$i = new ic();
$arr = array_unique(array('MAX3232', 'AML8613', 'MT6225A', 'OM8373PS/N3/A', 'PT7313', 'MAX8212ESA', 'TL431', 'S3C2440', 'TMS320F2812PGFA', 'PCM1704', 'AN6717', 'CA3162E', 'CA3161E', 'LM393N', 'DS18B20', 'SHT10', 'AML8613', 'AN6717', 'LM393N', 'CA3161E', 'CA3162E', 'PCM1704', 'STK392-040', 'K1667', 'MAX232', 'STM32F103', 'LM358'));
foreach ($arr as $v) {
 $i->go($v);
}
?>
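
The patterns in `getInfoByShopUrl()` appear in the listing without the HTML tags that would delimit each field, so they cannot be run as printed. As a minimal sketch of the same label-based extraction, the helper below assumes a hypothetical markup in which each value follows its label (for example `地址:`) and ends at the next tag. `extractField()` and the sample HTML are illustrative only and are not taken from the real site; the sketch also assumes the page text is UTF-8 rather than GB2312.

```php
<?php
/**
 * Return the text that follows "$label:" up to the next HTML tag.
 * Hypothetical helper; the markup it expects is an assumption.
 */
function extractField(string $html, string $label): string
{
    // Accept both the ASCII colon and the full-width colon (:).
    $pattern = '/' . preg_quote($label, '/') . '\s*[::]\s*([^<]*)/u';
    if (preg_match($pattern, $html, $m) === 1) {
        return trim($m[1]);
    }
    return ''; // label not present on this page
}

// Sample markup invented for this example.
$html = '<li>地址:深圳市福田区</li><li>电话: 0755-12345678 </li><li>QQ:10000</li>';

$info = array(
    'address' => extractField($html, '地址'),
    'phone'   => extractField($html, '电话'),
    'qq'      => extractField($html, 'QQ'),
    'fax'     => extractField($html, '传真'), // absent in the sample, yields ''
);
print_r($info);
```

Returning an empty string for a missing label keeps the result array complete, which matches how the listing always fills every key before inserting.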


Excerpted from Lee.'s column
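
`execAdd()` builds its INSERT by placing scraped values directly inside the SQL string. A common alternative is a mysqli prepared statement, where the values are bound separately from the SQL text. The sketch below shows one way that could look; it is not the author's method. `IC_COLUMNS`, `buildInsertSql()` and `insertSupplier()` are hypothetical names, and the column list simply mirrors the one used in the listing's INSERT.

```php
<?php
/** Columns written by the scraper, in INSERT order. */
const IC_COLUMNS = array(
    'company', 'address', 'phone', 'mobile', 'fax', 'zip', 'person',
    'qq', 'msn', 'email', 'website', 'regDate', 'shopUrl',
);

/** Build "INSERT INTO ic (...) VALUES (?,?,...)" with one placeholder per column. */
function buildInsertSql(array $columns): string
{
    $placeholders = implode(',', array_fill(0, count($columns), '?'));
    return 'INSERT INTO ic (' . implode(',', $columns) . ') VALUES (' . $placeholders . ')';
}

/**
 * Insert one supplier row; returns true when exactly one row was written.
 * Assumes $db is an open mysqli connection to a database containing `ic`.
 */
function insertSupplier(mysqli $db, array $info): bool
{
    $stmt = $db->prepare(buildInsertSql(IC_COLUMNS));
    if ($stmt === false) {
        return false;
    }
    // Pick values in column order; missing fields become empty strings.
    $values = array();
    foreach (IC_COLUMNS as $column) {
        $values[] = isset($info[$column]) ? (string) $info[$column] : '';
    }
    // Every parameter is bound as a string ("s").
    $stmt->bind_param(str_repeat('s', count($values)), ...$values);
    $ok = $stmt->execute() && $stmt->affected_rows === 1;
    $stmt->close();
    return $ok;
}

echo buildInsertSql(IC_COLUMNS), PHP_EOL;
```

Inside the class, `insertSupplier($this->getDb(), $infoArr)` could stand in for the interpolated query, with the existing `isExists()` check kept in front of it.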
