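Scraping exercise (1): fetching primary schools nationwide with PHP (data from Tencent Pengyou)
Note: Tencent Pengyou has since been redesigned, so some parameters must be looked up and adjusted yourself!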
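Before the New Year I needed primary-school data for a province. After analyzing the primary-school selector on Pengyou, I found that the data could be pulled from it.
Take all the primary schools in Yizhang County, Chenzhou City, Hunan Province as an example.
The page requests this address:
http://api.pengyou.com/json.php?cb=__i_3&mod=school&act=selector&schooltype=6&country=0&province=43&district=431022&g_tk=1964222334
The response is JSON wrapped in a JSONP callback:
document.domain = "pengyou.com"; __i_3({"code":0,"subcode":0,"......});
Once parsed, the payload turns out to contain every primary school in Yizhang County.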
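Because the response is not bare JSON, the callback wrapper has to be removed before json_decode can read it. The following is a minimal sketch of that step. The helper name decodeJsonp and the sample body are hypothetical (the real payload is elided above), and the sketch assumes the JSON sits between the first "(" and the last ")" of the response.

```php
<?php
// Hypothetical helper: strip a JSONP wrapper such as
//   document.domain = "pengyou.com"; __i_3({...});
// and decode the JSON object inside it. Returns null on failure.
function decodeJsonp(string $body): ?array
{
    $start = strpos($body, '(');   // opening parenthesis of the callback
    $end   = strrpos($body, ')');  // closing parenthesis of the callback
    if ($start === false || $end === false || $end < $start) {
        return null;
    }
    $json = substr($body, $start + 1, $end - $start - 1);
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

// Made-up sample body for illustration only.
$sample  = 'document.domain = "pengyou.com"; __i_3({"code":0,"subcode":0,"result":"demo"});';
$decoded = decodeJsonp($sample);
```

Omitting the cb parameter from the request, as the full script later does, may avoid the wrapper entirely, but that depends on the server.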
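The parameters break down as follows:
schooltype = 6 means primary school
country = 0 means China
province = 43 means Hunan Province
district = 431022 means Yizhang County
g_tk = 1964222334 is unclear, probably a random number
With these parameters you can fetch the primary schools of any district yourself.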
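As an illustration of how these parameters fit together, here is a small sketch that assembles the request URL. The function name buildSchoolUrl is hypothetical, and the g_tk value is simply the one copied from the request above; whether the server still accepts it is not known.

```php
<?php
// Hypothetical helper: build the school-selector URL from the
// parameters described above (schooltype 6 = primary school, country 0 = China).
function buildSchoolUrl(int $province, int $district, string $gTk, int $schoolType = 6, int $country = 0): string
{
    $query = http_build_query([
        'mod'        => 'school',
        'act'        => 'selector',
        'schooltype' => $schoolType,
        'country'    => $country,
        'province'   => $province,
        'district'   => $district,
        'g_tk'       => $gTk,
    ], '', '&');
    return 'http://api.pengyou.com/json.php?' . $query;
}

// Yizhang County (431022) in Hunan Province (43).
$url = buildSchoolUrl(43, 431022, '1964222334');
```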
To get all the counties of Chenzhou City, Hunan Province: http://api.pengyou.com/json.php?cb=__i_6&mod=getdistrict&cityid=4310&district_obj_name=_distinct&g_tk=271354436
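In the examples so far, the IDs nest by prefix: province 43, city 4310, district 431022. Assuming that pattern holds for every six-digit district code (an inference from these examples, not something the API documents), the parent IDs can be read straight off a district code. The helper name splitDistrictCode is hypothetical.

```php
<?php
// Hypothetical helper: split a 6-digit district code into its
// province (2 digits) and city (4 digits) prefixes.
// Assumes the prefix pattern seen in 43 / 4310 / 431022 holds in general.
function splitDistrictCode(string $district): array
{
    if (!preg_match('/^\d{6}$/', $district)) {
        throw new InvalidArgumentException('expected a 6-digit district code');
    }
    return [
        'province' => substr($district, 0, 2),
        'city'     => substr($district, 0, 4),
        'district' => $district,
    ];
}

$parts = splitDistrictCode('431022');
```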
Fetching schools requires the province and district values, but I could not find any network request that supplies them. Searching the page itself showed that the province values come from
http://cn.qzonestyle.gtimg.cn/campus/js/locations.js
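The province and city IDs can be pulled out of that file with regular expressions. The sketch below runs on a made-up sample string. Its shape (location_array[ID]='name' and sublocation_array[province][city]='name') is inferred from the patterns used in the script later in this post, not copied from the real file, and the function name parseLocations is hypothetical.

```php
<?php
// Made-up sample in the shape the regexes expect; the real
// locations.js may differ.
$js = ";location_array[43]='湖南';sublocation_array[43][4310]='郴州';sublocation_array[43][4301]='长沙'"
    . ";location_array[44]='广东';sublocation_array[44][4401]='广州'";

// Hypothetical helper: returns [provinceId => ['name' => ..., 'sub' => [cityId => name]]].
function parseLocations(string $js): array
{
    $result = [];
    // Two-digit province IDs and their names.
    preg_match_all("/;location_array\[(\d{2})\]='([^']+)'/", $js, $provinces, PREG_SET_ORDER);
    foreach ($provinces as [, $pid, $name]) {
        $result[$pid] = ['name' => $name, 'sub' => []];
        // City IDs (four or more digits) listed under this province.
        $pattern = "/;sublocation_array\[" . preg_quote($pid, '/') . "\]\[(\d{4,})\]='([^']+)'/";
        preg_match_all($pattern, $js, $cities, PREG_SET_ORDER);
        foreach ($cities as [, $cid, $cityName]) {
            $result[$pid]['sub'][$cid] = $cityName;
        }
    }
    return $result;
}

$locations = parseLocations($js);
```

Note that PHP converts the numeric string keys to integers, so the result is indexed by 43 and 4310 rather than by "43" and "4310".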
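Problems to solve:
1. Extracting the province and city IDs from locations.js requires regular expressions.
2. County IDs have to be looked up from the city ID.
3. When fetching schools with file_get_contents, a suitable User-Agent must be sent through a configured stream context, otherwise no data comes back.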
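The third point can be sketched as follows. This is a minimal example of sending a User-Agent through a stream context, not the exact header set the site required. The function name buildContext and the User-Agent string are placeholders, and the sketch builds the context without making a network request.

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends a
// User-Agent header along with a few common request headers.
function buildContext(string $userAgent, int $timeout = 5)
{
    $headers = [
        'User-Agent: ' . $userAgent,
        'Accept: text/javascript, application/javascript, */*',
        'Accept-Language: zh-cn,zh;q=0.5',
        'Connection: close',
    ];
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeout,
            // Header lines are separated by CRLF.
            'header'  => implode("\r\n", $headers) . "\r\n",
        ],
    ]);
}

$context = buildContext('Mozilla/5.0 (compatible; demo)');
$options = stream_context_get_options($context);

// Usage (performs a network request, so it is not executed here):
// $body = file_get_contents($url, false, $context);
```

Building the header string from an array with implode makes it harder to forget a "\r\n" separator between two header lines.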
The corresponding code follows:
<?php
header("Content-type:text/html; charset=utf-8");
set_time_limit(0);

// locations.js is read from a local copy of the file
$js_data = @file_get_contents("locations.js");

// Extract the province IDs and names
preg_match_all("/;location_array\[([0-9]{2})?\]='([^']+)?'/", $js_data, $locations);
$datas = array();
if (array_filter($locations[1]) && array_filter($locations[2])) {
    foreach ($locations[1] as $key => $val) {
        // Extract the city IDs and names under this province
        preg_match_all("/;sublocation_array\[" . $val . "\]\[([0-9]{4,})\]='([^']+)?'/", $js_data, $matches);
        $datas[$val]['name'] = $locations[2][$key];
        foreach ($matches[1] as $k => $v) {
            $datas[$val]['sub'][$v] = $matches[2][$k];
        }
    }
}

function getDatas($url)
{
    $getPageSetting = array(
        'http' => array(
            'timeout' => 5,
            'method' => 'GET',
            'protocol_version' => '1.1',
            'header' => "User-Agent: Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7\r\n" .
                // "Referer: http://......php\r\n" . // full URL of the previously visited page; absent when the URL is typed straight into the address bar
                "Host: isdspeed.qq.com\r\n" . // can be omitted; a wrong value here causes "failed to open stream: HTTP request failed!"
                "Accept-Language: zh-cn,zh;q=0.5\r\n" .
                "Accept-Encoding: gzip, deflate\r\n" .
                "Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3\r\n" .
                "Content-Type: application/x-www-form-urlencoded\r\n" . // the original was missing this line's trailing \r\n
                "Accept: text/javascript, application/javascript, */*\r\n" .
                "Connection: keep-alive\r\n\r\n"
        )
    );
    // $getHtml = file_get_contents($url, FALSE, stream_context_create($getPageSetting));
    // Tencent Pengyou has been redesigned, so plain file_get_contents is used instead
    $getHtml = file_get_contents($url);
    return $getHtml;
}

/**
 * Create a folder, including any missing parent folders
 * @param string $path folder path
 */
function createFolder($path)
{
    if (!file_exists($path)) {
        createFolder(dirname($path));
        mkdir($path, 0777);
    }
}

$areas = array();
// Fetch the primary schools for every province, city and county
foreach ($datas as $pid => $rows) {
    foreach ($rows as $k => $v) {
        if ($k == 'sub') {
            foreach ($v as $cid => $city) {
                // Look up the counties of this city
                $cityUrl = "http://api.pengyou.com/json.php?mod=getdistrict&cityid=" . $cid . "&district_obj_name=_distinct&g_tk=1523170442";
                $result = getDatas($cityUrl);
                $districtIds = json_decode($result, true);
                $areas[$pid][$cid] = $districtIds['result']['district_arr'];
                $district_arr = $districtIds['result']['district_arr'];
                if (!is_array($district_arr)) {
                    continue; // request failed or returned no counties
                }
                foreach ($district_arr as $did => $district) {
                    // Look up the primary schools of this county
                    $url = "http://api.pengyou.com/json.php?&mod=school&act=selector&schooltype=6&country=0&province=" . $pid . "&district=" . $did . "&g_tk=1523170442";
                    $schools = getDatas($url);
                    $schools = json_decode($schools, true);
                    // Strip the HTML and put one school per line
                    $school_data = str_replace("·", "\r\n", strip_tags($schools['result']));
                    // Folder and file names are converted to GBK for the local file system
                    $dirs = "school/" . iconv('utf-8', 'gbk', $rows['name']) . "/" . iconv('utf-8', 'gbk', $city);
                    createFolder($dirs);
                    @file_put_contents($dirs . '/' . iconv('utf-8', 'gbk', $district) . '.txt', $school_data);
                }
            }
        }
    }
}
echo '<pre>';
print_r($areas);
