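My initial implementation
Because the strings embedded in each log line are complex and unpredictable, I decided to extract only the contents of the double-quoted parts. The code is as follows: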
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldExtractor {
    public static void main(String[] args) {
        String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";

        // A quote, then any run of word, whitespace, or punctuation characters
        // other than a quote (the && is a character-class intersection), then a quote.
        Pattern p = Pattern.compile("\"[\\w\\s\\p{Punct}&&[^\"]]*\"");

        List<String> lines = new ArrayList<String>();
        lines.add(text1);
        lines.add(text2);
        lines.add(text3);
        lines.add(text4);

        for (String str : lines) {
            System.out.println("****************************************");
            Matcher matcher = p.matcher(str);
            // Print every double-quoted segment found in the line.
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }
}
The output is as follows:
****************************************
"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1"
"-"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
"192.168.222.251"
****************************************
"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1"
"-"
"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30"
"192.168.222.35"
****************************************
"GET /favicon.ico HTTP/1.1"
"-"
"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30"
"192.168.222.35"
****************************************
"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1"
"-"
"Mozilla/4.0"
"101.226.62.82"
That gave me the quoted strings. The remaining parts of each line I then cut out with substring().
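The substring() step is not shown in the original code. As a rough sketch of what it could look like, the example below walks the quoted matches and uses their start() and end() offsets to slice out whatever lies between them. The class name UnquotedParts, the helper names, the simplified pattern "[^"]*" and the shortened sample line are all assumptions made for illustration, not the code I actually wrote at the time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the pieces of a log line that are NOT inside double quotes.
class UnquotedParts {
    // One double-quoted segment; [^"]* is a simpler stand-in for the character class used above.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    /** Returns the trimmed, non-empty pieces of the line that sit outside double quotes. */
    public static List<String> unquotedParts(String line) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        int cursor = 0; // index just past the previous quoted segment
        while (m.find()) {
            addIfNotBlank(parts, line.substring(cursor, m.start()));
            cursor = m.end();
        }
        addIfNotBlank(parts, line.substring(cursor)); // tail after the last quote
        return parts;
    }

    private static void addIfNotBlank(List<String> parts, String piece) {
        String trimmed = piece.trim();
        if (!trimmed.isEmpty()) {
            parts.add(trimmed);
        }
    }

    public static void main(String[] args) {
        // Shortened sample modeled on the log lines above (user agent abbreviated).
        String line = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/4.0\" 0.040 0.040 \"192.168.222.35\"";
        for (String part : unquotedParts(line)) {
            System.out.println(part);
        }
    }
}
```

Each returned piece (for example the leading address and timestamp, or the status code and byte count) would still need to be split further, which is part of why this approach gets complicated quickly.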
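Clearly this is not best practice. When my team lead looked at it later, he said there was no need for it to be so complicated, and sure enough, within a few minutes he had written me a new regular expression.
Improved implementation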
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogParser {
    public static void main(String[] args) {
        String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";

        // One capturing group per field of the log line, 14 groups in total.
        Pattern p = Pattern.compile(
                "^([\\d.]+) (\\S+) (\\S+) \\[(.+)\\] \"(GET|POST|DELETE|PUT|HEAD) (\\S+) (\\S+)\" (\\d+) (\\d+) \"(\\S+)\" \"(.+)\" (\\S+) (\\S+) \"([\\d.]+)\"");

        List<String> lines = new ArrayList<String>();
        lines.add(text1);
        lines.add(text2);
        lines.add(text3);
        lines.add(text4);

        for (String line : lines) {
            System.out.println("****************************************");
            Matcher matcher = p.matcher(line);
            if (matcher.find()) {
                System.out.print(matcher.group(4) + " ");  // timestamp inside [ ]
                System.out.print(matcher.group(5) + " ");  // HTTP method
                System.out.print(matcher.group(6) + " ");  // request path
                System.out.print(matcher.group(8) + " ");  // status code
                System.out.println(matcher.group(14));     // quoted IP at the end of the line
            }
        }
    }
}
Output:
****************************************
05/Nov/2015:15:06:34 +0800 GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 200 192.168.222.251
****************************************
05/Nov/2015:15:24:40 +0800 GET /accounts/54fd0571e4b055a0030461fb 200 192.168.222.35
****************************************
05/Nov/2015:15:24:40 +0800 GET /favicon.ico 404 192.168.222.35
****************************************
05/Nov/2015:23:55:11 +0800 POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 200 101.226.62.82
Now I can pick out whichever part I want simply by asking for the corresponding group.
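Counting up to group 14 by hand is easy to get wrong. One possible refinement, sketched here as a suggestion rather than as part of the original solution, is to use Java's named capturing groups (available since Java 7) so that fields are fetched by name. The class name NamedGroupLogParser and the group names time, method, path, status and quotedIp are assumptions chosen for illustration; the fields that were not printed above are left unnamed and uncaptured.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the improved parser that looks fields up by name instead of by index.
class NamedGroupLogParser {
    // Same shape as the 14-group pattern, but only the five printed fields are captured.
    private static final Pattern LOG_LINE = Pattern.compile(
            "^[\\d.]+ \\S+ \\S+ \\[(?<time>.+)\\] "
            + "\"(?<method>GET|POST|DELETE|PUT|HEAD) (?<path>\\S+) \\S+\" "
            + "(?<status>\\d+) \\d+ \"\\S+\" \".+\" \\S+ \\S+ "
            + "\"(?<quotedIp>[\\d.]+)\"");

    private static final String[] FIELD_NAMES = {"time", "method", "path", "status", "quotedIp"};

    /** Returns the named fields in pattern order, or an empty map if the line does not match. */
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LOG_LINE.matcher(line);
        if (m.find()) {
            for (String name : FIELD_NAMES) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // text3 from the example above.
        String line = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        Map<String, String> fields = parse(line);
        // Prints: 05/Nov/2015:15:24:40 +0800 GET /favicon.ico 404 192.168.222.35
        System.out.println(String.join(" ", fields.values()));
    }
}
```

If the log format later gains or loses a field, the lookups by name keep working as long as the named groups themselves are still in the pattern, whereas numeric indexes would all shift.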