


Example of using C# to obtain the HTML source code of a web page
I'm working on a project recently, and one of the functions is to get the source code of a web page based on a URL address. In ASP.NET (C#), there seem to be many ways to obtain the source code of a web page. I just made a simple WebClient, which is very simple and easy. But a very annoying problem came out later, and that was the garbled Chinese characters.
After careful study, Chinese web pages are nothing more than GB2312 and UTF-8. So we have the following code:
/// <summary> /// 根据网址的URL,获取源代码HTML /// </summary> /// <param name="url"></param> /// <returns></returns> public static string GetHtmlByUrl(string url) { using (WebClient wc = new WebClient()) { try { wc.UseDefaultCredentials = true; wc.Proxy = new WebProxy(); wc.Proxy.Credentials = CredentialCache.DefaultCredentials; wc.Credentials = System.Net.CredentialCache.DefaultCredentials; byte[] bt = wc.DownloadData(url); string txt = System.Text.Encoding.GetEncoding("GB2312").GetString(bt); switch (GetCharset(txt).ToUpper()) { case "UTF-8": txt = System.Text.Encoding.UTF8.GetString(bt); break; case "UNICODE": txt = System.Text.Encoding.Unicode.GetString(bt); break; default: break; } return txt; } catch (Exception ex) { return null; } } }
To explain a little bit, WebClient is used here to create a wc object (the naming is a bit awkward). Then call the DownloadData method of the wc object, pass in the URL value, and return a byte array. By default, GB2312 is used to read this byte array and convert it into a string. Find the characteristic characters of the encoding format of the web page from the string of the web page source code, such as finding information such as charset="utf-8", to determine the encoding format of the current web page.
The GetCharset function is used to obtain the encoding format of the current web page. The specific code is as follows:
/// <summary> /// 从HTML中获取获取charset /// </summary> /// <param name="html"></param> /// <returns></returns> public static string GetCharset(string html) { string charset = ""; Regex regCharset = new Regex(@"content=[""'].*\s*charset\b\s*=\s*""?(?<charset>[^""']*)", RegexOptions.IgnoreCase); if (regCharset.IsMatch(html)) { charset = regCharset.Match(html).Groups["charset"].Value; } if (charset.Equals("")) { regCharset = new Regex(@"<\s*meta\s*charset\s*=\s*[""']?(?<charset>[^""']*)", RegexOptions.IgnoreCase); if (regCharset.IsMatch(html)) { charset = regCharset.Match(html).Groups["charset"].Value; } } return charset; }
For more related articles on examples of using C# to obtain the HTML source code of web pages, please pay attention to the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











C#.NET is still important because it provides powerful tools and libraries that support multiple application development. 1) C# combines .NET framework to make development efficient and convenient. 2) C#'s type safety and garbage collection mechanism enhance its advantages. 3) .NET provides a cross-platform running environment and rich APIs, improving development flexibility.

C#.NETisversatileforbothwebanddesktopdevelopment.1)Forweb,useASP.NETfordynamicapplications.2)Fordesktop,employWindowsFormsorWPFforrichinterfaces.3)UseXamarinforcross-platformdevelopment,enablingcodesharingacrossWindows,macOS,Linux,andmobiledevices.

C# is widely used in enterprise-level applications, game development, mobile applications and web development. 1) In enterprise-level applications, C# is often used for ASP.NETCore to develop WebAPI. 2) In game development, C# is combined with the Unity engine to realize role control and other functions. 3) C# supports polymorphism and asynchronous programming to improve code flexibility and application performance.

How to deploy a C# .NET app to Azure or AWS? The answer is to use AzureAppService and AWSElasticBeanstalk. 1. On Azure, automate deployment using AzureAppService and AzurePipelines. 2. On AWS, use Amazon ElasticBeanstalk and AWSLambda to implement deployment and serverless compute.

C# and .NET adapt to the needs of emerging technologies through continuous updates and optimizations. 1) C# 9.0 and .NET5 introduce record type and performance optimization. 2) .NETCore enhances cloud native and containerized support. 3) ASP.NETCore integrates with modern web technologies. 4) ML.NET supports machine learning and artificial intelligence. 5) Asynchronous programming and best practices improve performance.

C# and .NET runtime work closely together to empower developers to efficient, powerful and cross-platform development capabilities. 1) C# is a type-safe and object-oriented programming language designed to integrate seamlessly with the .NET framework. 2) The .NET runtime manages the execution of C# code, provides garbage collection, type safety and other services, and ensures efficient and cross-platform operation.

The relationship between C# and .NET is inseparable, but they are not the same thing. C# is a programming language, while .NET is a development platform. C# is used to write code, compile into .NET's intermediate language (IL), and executed by the .NET runtime (CLR).

To start C#.NET development, you need to: 1. Understand the basic knowledge of C# and the core concepts of the .NET framework; 2. Master the basic concepts of variables, data types, control structures, functions and classes; 3. Learn advanced features of C#, such as LINQ and asynchronous programming; 4. Be familiar with debugging techniques and performance optimization methods for common errors. With these steps, you can gradually penetrate the world of C#.NET and write efficient applications.
