
PHP decompression sometimes fails

Jun 23, 2016, 01:20 PM

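I'm scraping data from a website whose responses come back with chunked transfer encoding and gzip compression. The server identifies itself as IIS.

Decoding the chunked encoding works fine, but decompressing the gzip document occasionally fails, which breaks extraction of the next set of request links.

After roughly 10 decompressions, one of them fails.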

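Here is the data before decompression:

[screenshot]

And after decompression:

[screenshot]

Clearly, decompression failed on the last batch.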

These are the three approaches I have tried:

    private function _deCompressData()
    {
        if ($this->is_gzip) {
            // Approach 1: skip the 10-byte gzip header and raw-inflate the rest
            $this->response_body = gzinflate(substr($this->response_body, 10));

            // Approach 2: built-in gzdecode(), falling back to mygzdecode()
            // if ($temp = gzdecode($this->response_body)) {
            //     $this->response_body = $temp;
            // } else {
            //     $this->response_body = $this->mygzdecode($this->response_body);
            // }

            // Approach 3: the PHP-level decoder on its own
            // $this->response_body = $this->mygzdecode($this->response_body);

            // $this->response_body = gzdecode($this->response_body);
        }
    }
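The first approach, `gzinflate(substr($this->response_body, 10))`, skips a fixed 10 bytes. That equals the gzip header length only when the FLG byte (offset 3) has none of the optional-field bits set; FEXTRA, FNAME, FCOMMENT and FHCRC (RFC 1952) each make the header longer. A minimal sketch of computing the real header length, assuming the input is a single gzip member; `gzip_header_length` is a hypothetical helper, not a PHP function:

```php
<?php
// Hypothetical helper (not part of PHP): byte length of a gzip member's header
// per RFC 1952, or false if the data is not gzip or the header is cut short.
function gzip_header_length($gz)
{
    $len = strlen($gz);
    if ($len < 10 || substr($gz, 0, 2) !== "\x1f\x8b") {
        return false;
    }
    $flags = ord($gz[3]);            // FLG byte
    $pos = 10;                       // fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS
    if ($flags & 4) {                // FEXTRA: 2-byte little-endian XLEN, then XLEN bytes
        if ($pos + 2 > $len) {
            return false;
        }
        $xlen = unpack('v', substr($gz, $pos, 2));
        $pos += 2 + $xlen[1];
        if ($pos > $len) {
            return false;
        }
    }
    foreach (array(8, 16) as $bit) { // FNAME, then FCOMMENT: zero-terminated strings
        if ($flags & $bit) {
            $nul = strpos($gz, "\0", $pos);
            if ($nul === false) {
                return false;
            }
            $pos = $nul + 1;
        }
    }
    if ($flags & 2) {                // FHCRC: low 16 bits of the header's CRC32
        $pos += 2;
    }
    return $pos <= $len ? $pos : false;
}
```

With it, approach 1 becomes `gzinflate(substr($body, $n))`, where `$n` is the returned length after checking it is not `false`. Like approach 1, this still ignores the CRC32/ISIZE trailer, so a truncated body is only detected when the cut reaches into the deflate data itself.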


Here is the mygzdecode function:

    /**
     * @desc Custom gzip decoder (RFC 1952)
     */
    function mygzdecode($data, &$filename = '', &$error = '', $maxlength = null)
    {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
            $error = "Not in GZIP format.";
            return null;  // Not GZIP format (See RFC 1952)
        }
        $method = ord(substr($data, 2, 1));  // Compression method
        $flags = ord(substr($data, 3, 1));   // Flags
        if (($flags & 31) != $flags) {       // parentheses needed: != binds tighter than &
            $error = "Reserved bits not allowed.";
            return null;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data, 4, 4));
        $mtime = $mtime[1];
        $xfl = substr($data, 8, 1);
        $os = substr($data, 9, 1);
        $headerlen = 10;
        $extralen = 0;
        $extra = "";
        if ($flags & 4) {
            // 2-byte length prefixed EXTRA data in header (XLEN starts at offset 10)
            if ($len - $headerlen - 2 < 8) {
                return false;  // invalid
            }
            $extralen = unpack("v", substr($data, 10, 2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                return false;  // invalid
            }
            $extra = substr($data, 12, $extralen);
            $headerlen += 2 + $extralen;
        }
        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // C-style string file NAME data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // invalid
            }
            $filenamelen = strpos(substr($data, $headerlen), chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false; // invalid
            }
            $filename = substr($data, $headerlen, $filenamelen);
            $headerlen += $filenamelen + 1;
        }
        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // C-style string COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                return false;    // invalid
            }
            $commentlen = strpos(substr($data, $headerlen), chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false;    // Invalid header format
            }
            $comment = substr($data, $headerlen, $commentlen);
            $headerlen += $commentlen + 1;
        }
        $headercrc = "";
        if ($flags & 2) {
            // 2-bytes (lowest order) of CRC32 on header present
            if ($len - $headerlen - 2 < 8) {
                return false;    // invalid
            }
            $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data, $headerlen, 2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                $error = "Header checksum failed.";
                return false;    // Bad header CRC
            }
            $headerlen += 2;
        }
        // GZIP FOOTER
        $datacrc = unpack("V", substr($data, -8, 4));
        $datacrc = sprintf('%u', $datacrc[1] & 0xFFFFFFFF);
        $isize = unpack("V", substr($data, -4));
        $isize = $isize[1];
        // decompression:
        $bodylen = $len - $headerlen - 8;
        if ($bodylen < 1) {
            // IMPLEMENTATION BUG!
            return null;
        }
        $body = substr($data, $headerlen, $bodylen);
        $data = "";
        if ($bodylen > 0) {
            switch ($method) {
                case 8:
                    // Currently the only supported compression method (0 = no limit):
                    $data = gzinflate($body, (int) $maxlength);
                    break;
                default:
                    $error = "Unknown compression method.";
                    return false;
            }
        }  // zero-byte body content is allowed
        // Verify CRC32 and length
        $crc = sprintf("%u", crc32($data));
        $crcOK = $crc == $datacrc;
        $lenOK = $isize == strlen($data);
        if (!$lenOK || !$crcOK) {
            $error = ( $lenOK ? '' : 'Length check FAILED. ') . ( $crcOK ? '' : 'Checksum FAILED.');
            return false;
        }
        return $data;
    }
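A note on the reserved-bits check: written as `$flags & 31 != $flags`, it does not do what it appears to, because in PHP `!=` binds more tightly than `&`. The expression is evaluated as `$flags & (31 != $flags)`, which masks `$flags` with a boolean. A minimal demonstration, using a made-up FLG value with reserved bit 5 set:

```php
<?php
// FLG byte with reserved bit 5 (0x20) set; RFC 1952 requires bits 5-7 to be zero.
$flags = 0x20;

// Parsed as $flags & (31 != $flags), i.e. 0x20 & true, i.e. 0x20 & 1, which is 0.
$unparenthesized = (bool) ($flags & 31 != $flags);

// Intended test: does any set bit fall outside the five defined flag bits?
$parenthesized = ($flags & 31) != $flags;

// Equivalent test, stated directly as a mask over the reserved bits.
$reservedSet = ($flags & 0xE0) !== 0;

var_dump($unparenthesized, $parenthesized, $reservedSet);
// prints bool(false), bool(true), bool(true), one per line
```

So without parentheses, a header with reserved bits set slips through the check instead of being rejected.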



In other words, when decompressing responses one after another, some of the decompressions fail.


Replies (Solutions)

PHP already provides a gzdecode function.
If your PHP version is really old and lacks gzdecode,
the PHP-level equivalent is:

    if (!function_exists('gzdecode')) {  // gzdecode() is built in since PHP 5.4
        function gzdecode($data) {
            $len = strlen($data);
            if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
                return $data;  // Not GZIP format (See RFC 1952)
            }
            $method = ord(substr($data, 2, 1));  // Compression method
            $flags  = ord(substr($data, 3, 1));  // Flags
            if (($flags & 31) != $flags) {
                // Reserved bits are set -- NOT ALLOWED by RFC 1952
                return $data;
            }
            // NOTE: $mtime may be negative (PHP integer limitations)
            $mtime = unpack("V", substr($data, 4, 4));
            $mtime = $mtime[1];
            $xfl   = substr($data, 8, 1);
            $os    = substr($data, 9, 1);
            $headerlen = 10;
            $extralen  = 0;
            $extra     = "";
            if ($flags & 4) {
                // 2-byte length prefixed EXTRA data in header
                if ($len - $headerlen - 2 < 8) {
                    return false;    // Invalid format
                }
                $extralen = unpack("v", substr($data, 10, 2));
                $extralen = $extralen[1];
                if ($len - $headerlen - 2 - $extralen < 8) {
                    return false;    // Invalid format
                }
                $extra = substr($data, 12, $extralen);
                $headerlen += 2 + $extralen;
            }
            $filenamelen = 0;
            $filename = "";
            if ($flags & 8) {
                // C-style string file NAME data in header
                if ($len - $headerlen - 1 < 8) {
                    return false;    // Invalid format
                }
                $filenamelen = strpos(substr($data, $headerlen), chr(0));
                if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                    return false;    // Invalid format
                }
                $filename = substr($data, $headerlen, $filenamelen);
                $headerlen += $filenamelen + 1;
            }
            $commentlen = 0;
            $comment = "";
            if ($flags & 16) {
                // C-style string COMMENT data in header
                if ($len - $headerlen - 1 < 8) {
                    return false;    // Invalid format
                }
                $commentlen = strpos(substr($data, $headerlen), chr(0));
                if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                    return false;    // Invalid header format
                }
                $comment = substr($data, $headerlen, $commentlen);
                $headerlen += $commentlen + 1;
            }
            $headercrc = "";
            if ($flags & 2) {
                // FHCRC (0x02): 2 bytes (lowest order) of CRC32 on header present
                if ($len - $headerlen - 2 < 8) {
                    return false;    // Invalid format
                }
                $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
                $headercrc = unpack("v", substr($data, $headerlen, 2));
                $headercrc = $headercrc[1];
                if ($headercrc != $calccrc) {
                    return false;    // Bad header CRC
                }
                $headerlen += 2;
            }
            // GZIP FOOTER - these may be negative due to PHP's integer limitations
            $datacrc = unpack("V", substr($data, -8, 4));
            $datacrc = $datacrc[1];
            $isize = unpack("V", substr($data, -4));
            $isize = $isize[1];
            // Perform the decompression:
            $bodylen = $len - $headerlen - 8;
            if ($bodylen < 1) {
                // This should never happen - IMPLEMENTATION BUG!
                return null;
            }
            $body = substr($data, $headerlen, $bodylen);
            $data = "";
            if ($bodylen > 0) {
                switch ($method) {
                    case 8:
                        // Currently the only supported compression method:
                        $data = gzinflate($body);
                        break;
                    default:
                        // Unknown compression method
                        return false;
                }
            } else {
                // I'm not sure if zero-byte body content is allowed.
                // Allow it for now...  Do nothing...
            }
            // Verify decompressed size and CRC32:
            // NOTE: This may fail with large data sizes depending on how
            //       PHP's integer limitations affect strlen() since $isize
            //       may be negative for large sizes.
            if ($isize != strlen($data) || crc32($data) != $datacrc) {
                // Bad format!  Length or CRC doesn't match!
                return false;
            }
            return $data;
        }
    }

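Compare the two yourself and see whether you copied it wrong.

Since the function returns false when the length or CRC32 check fails, you should check its return value before moving on to the next step.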
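Following that advice, the decompression step can report failure instead of overwriting the body with `false`. A minimal sketch, assuming PHP 5.4 or later (where `gzdecode()` is built in); the function name `decompress_body` and its `$error` out-parameter are illustrative, not taken from the original code:

```php
<?php
// Hypothetical stand-in for _deCompressData(): returns the decoded body, or
// false with a reason in $error, so the caller can retry or stop instead of
// carrying on with a false/empty response body.
function decompress_body($body, &$error = null)
{
    $error = null;
    // 18 bytes = 10-byte minimal header + 8-byte CRC32/ISIZE trailer
    if (strlen($body) < 18 || substr($body, 0, 2) !== "\x1f\x8b") {
        $error = 'not gzip data';
        return false;
    }
    // gzdecode() checks the CRC32 and ISIZE trailer and returns false on
    // corrupt or truncated input; @ hides the warning because the return
    // value is checked explicitly below.
    $plain = @gzdecode($body);
    if ($plain === false) {
        $error = 'gzip data corrupt or truncated';
        return false;
    }
    return $plain;
}
```

In the crawler, the next set of links would then only be extracted when this returns a string.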

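I'm on PHP 5.6.

    gzinflate(substr($this->response_body, 10));

    gzdecode($this->response_body);

    mygzdecode($this->response_body);

All three methods work, but they all run into the same problem: when decompressing responses one after another, some decompressions fail.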
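That all three methods fail on the same responses is consistent with the input being damaged rather than any one decoder being wrong. A quick way to check this is to truncate a known-good gzip body and feed it to the built-in functions; the payload below is made up for the demonstration:

```php
<?php
// Made-up payload that does not compress to almost nothing,
// so the gzip body is large enough to cut in half meaningfully.
$plain = implode(',', range(1, 5000));
$gz = gzencode($plain);

// Simulate a response whose second half was lost in transit.
$truncated = substr($gz, 0, (int) (strlen($gz) / 2));

// The complete body decodes both ways...
var_dump(gzdecode($gz) === $plain);               // bool(true)
var_dump(gzinflate(substr($gz, 10)) === $plain);  // bool(true)

// ...and the truncated one fails both ways (@ hides zlib's warnings).
var_dump(@gzdecode($truncated));                  // bool(false)
var_dump(@gzinflate(substr($truncated, 10)));     // bool(false)
```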


Happy New Year, guru!

OK.

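Errors in data transmitted over a network are unavoidable, although they are not frequent.
Requesting the data again usually fixes it.

The main thing is to have a fault-tolerance strategy.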
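One way to build that fault tolerance is to treat a failed decode like a failed request and re-fetch a bounded number of times. A minimal sketch: `$fetch` is a hypothetical callable standing in for whatever performs the HTTP request and returns the raw (already de-chunked) gzip body, and the attempt limit and delay are arbitrary defaults:

```php
<?php
// Calls $fetch() up to $maxAttempts times until its gzip body decodes cleanly.
// Returns the decoded body, or false if every attempt failed.
function fetch_gzip_with_retry($fetch, $maxAttempts = 3, $delayMs = 500)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $raw = call_user_func($fetch);
        if (is_string($raw)) {
            // gzdecode() verifies the CRC32/ISIZE trailer, so a truncated
            // or corrupted body comes back as false.
            $plain = @gzdecode($raw);
            if ($plain !== false) {
                return $plain;
            }
        }
        if ($attempt < $maxAttempts && $delayMs > 0) {
            usleep($delayMs * 1000); // brief pause before re-requesting
        }
    }
    return false;
}
```

Returning `false` after the last attempt, rather than throwing, lets the crawler mark that URL as failed and continue with the rest of its queue.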

    // Verify CRC32 and length
    $crc = sprintf("%u", crc32($data));
    $crcOK = $crc == $datacrc;
    $lenOK = $isize == strlen($data);
    if (!$lenOK || !$crcOK) {
        $this->status = ( $lenOK ? '' : 'Length check FAILED. ') . ( $crcOK ? '' : 'Checksum FAILED.');
        return false;
    }
    return $data;

Found it: this is the check that fails...


Requesting http://www.cnu.cc/works/111706 gives:
Length check FAILED. Checksum FAILED.

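Right... this part really does need strengthening. I only implemented connection resets and never verified that the received data was complete.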
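For a chunked response, one completeness check comes almost for free: a complete body ends with a zero-size last chunk (RFC 7230, section 4.1). A minimal sketch of a decoder that returns `false` when the stream stops early, so the request can be retried before any gzip decoding is attempted. It assumes `$raw` is the message body with the headers already stripped, and it ignores chunk extensions and trailer fields; `dechunk` is a hypothetical name, not the original poster's decoder:

```php
<?php
// Decodes an HTTP/1.1 chunked body. Returns the payload, or false when the
// stream is malformed or ends before the terminating zero-size chunk.
function dechunk($raw)
{
    $out = '';
    $pos = 0;
    $len = strlen($raw);
    while (true) {
        $eol = strpos($raw, "\r\n", $pos);
        if ($eol === false) {
            return false;                 // cut off inside a chunk-size line
        }
        $sizeLine = substr($raw, $pos, $eol - $pos);
        $semi = strpos($sizeLine, ';');
        if ($semi !== false) {
            $sizeLine = substr($sizeLine, 0, $semi);  // drop chunk extensions
        }
        $sizeLine = trim($sizeLine);
        if (!preg_match('/^[0-9A-Fa-f]+$/', $sizeLine)) {
            return false;                 // not a hex chunk size
        }
        $size = hexdec($sizeLine);
        $pos = $eol + 2;
        if ($size == 0) {
            return $out;                  // last chunk reached: body is complete
        }
        if ($pos + $size + 2 > $len) {
            return false;                 // cut off inside chunk data
        }
        if (substr($raw, $pos + $size, 2) !== "\r\n") {
            return false;                 // chunk data not followed by CRLF
        }
        $out .= substr($raw, $pos, $size);
        $pos += $size + 2;
    }
}
```

With this, the order for the responses described in the question would be: de-chunk, check for `false`, then gunzip and check again. A body that was cut off mid-transfer is caught at the first step.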

OK, that fixed it: I scraped continuously for 10 minutes without a single problem... THX, you're the best!

Something went wrong during transmission, so part of the data was lost and decompression failed.

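Add the files to be decompressed to a queue. Every 5 to 10 seconds, check whether each file has changed; if it hasn't, decompress it. If decompression fails, mark the file and move on to the next one.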
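That queue approach can be sketched as follows, assuming the compressed responses are written to files first. The function name, the return shape, and the single settle interval are illustrative choices, not part of the original suggestion:

```php
<?php
// For each queued file: if its size and mtime are unchanged after
// $settleSeconds, try to decompress it; otherwise leave it for a later pass.
// Returns path => decoded string, false (failed, marked), or null (still changing).
function process_gzip_queue(array $paths, $settleSeconds = 5)
{
    $before = array();
    foreach ($paths as $path) {
        clearstatcache(true, $path);  // avoid PHP's cached stat() results
        $before[$path] = array(filesize($path), filemtime($path));
    }
    if ($settleSeconds > 0) {
        sleep($settleSeconds);
    }
    $results = array();
    foreach ($paths as $path) {
        clearstatcache(true, $path);
        if (array(filesize($path), filemtime($path)) !== $before[$path]) {
            $results[$path] = null;   // still being written; retry next pass
            continue;
        }
        // false marks a failed file; the loop simply moves on to the next one
        $results[$path] = @gzdecode(file_get_contents($path));
    }
    return $results;
}
```

Because `sleep()` blocks, a long-running crawler would more likely call this once per pass from its main loop and re-queue the `null` entries.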
