I am downloading pages from the web using a variety of methods, including this:
Code:
Dim URLString As String = "http://www.example.com/page.htm"
Dim MyWebClient As Net.WebClient = New Net.WebClient()
Dim HTML As String = MyWebClient.DownloadString(URLString)
No matter which method is used for downloading, the HTML quite often contains characters that are encoded somehow: garbled sequences appear where characters such as é and º should be.
If I save the text to a file, e.g.:
Code:
My.Computer.FileSystem.WriteAllText(filePath, HTML, False, System.Text.Encoding.Default)
...and then load it again, some of the issues are dealt with (e.g. the garbled sequences turn back into é and º).
Is there a way I can do this conversion without saving to a file and reloading? (or hard-coding conversions as I find them!)
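The save-and-reload trick most likely works because `DownloadString` decoded UTF-8 bytes with the wrong code page (`WebClient.Encoding` defaults to `Encoding.Default`, which on .NET Framework is the system ANSI code page). Writing with `Encoding.Default` restores the original bytes, and reading them back as UTF-8 decodes them properly. A minimal in-memory sketch of the same round trip, assuming the page is UTF-8 and the ANSI code page is Windows-1252, might look like this (`RepairMojibake` is a hypothetical name):

```vb
Imports System.Text

Module MojibakeRepair
    ' Hypothetical helper: undoes "UTF-8 bytes decoded as Windows-1252" in memory,
    ' mirroring a save with Encoding.Default followed by a reload as UTF-8.
    ' Assumption: the whole string was mis-decoded with code page 1252.
    Function RepairMojibake(garbled As String) As String
        Dim ansi As Encoding = Encoding.GetEncoding(1252)
        ' Re-encode with the wrong encoding to recover the original bytes...
        Dim originalBytes As Byte() = ansi.GetBytes(garbled)
        ' ...then decode those bytes with the right one.
        Return Encoding.UTF8.GetString(originalBytes)
    End Function
End Module
```

For example, `RepairMojibake("caf" & ChrW(&HC3) & ChrW(&HA9))` turns the garbled "cafÃ©" into "café". Only apply this to text you know was mis-decoded as a whole, since correctly decoded non-ASCII text would itself be garbled by it. On .NET Core and .NET 5+, code page 1252 is unavailable until `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` has been called.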
I've tried several things with no luck, including this:
Code:
' Encode the string as UTF-8 bytes, then decode those same bytes as UTF-8
Dim encodedBytes As Byte() = System.Text.Encoding.UTF8.GetBytes(HTML)
Dim decodedString As String = System.Text.Encoding.UTF8.GetString(encodedBytes)
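The attempt above encodes the string as UTF-8 and then decodes the same bytes as UTF-8, so it can only ever return the string it was given; once `DownloadString` has decoded the page with the wrong encoding, a same-encoding round trip cannot undo it. If the site is known to be UTF-8, setting `MyWebClient.Encoding = System.Text.Encoding.UTF8` before calling `DownloadString` may be enough. A more general sketch downloads the raw bytes and decodes them with the charset named in the `Content-Type` response header, falling back to UTF-8 (an assumption); it does not look at `<meta charset>` tags inside the page. `DownloadBytesAsText` and `DecodeWithCharset` are hypothetical names:

```vb
Imports System
Imports System.Net
Imports System.Text

Module PageDownloader
    ' Hypothetical helper: downloads raw bytes and decodes them using the
    ' charset from the Content-Type response header.
    Function DownloadBytesAsText(url As String) As String
        Using client As New WebClient()
            Dim data As Byte() = client.DownloadData(url)
            Return DecodeWithCharset(data, client.ResponseHeaders(HttpResponseHeader.ContentType))
        End Using
    End Function

    ' Picks an encoding from a header value such as "text/html; charset=ISO-8859-1".
    ' Falls back to UTF-8 (an assumption) when no usable charset is present.
    Function DecodeWithCharset(data As Byte(), contentType As String) As String
        Dim enc As Encoding = Encoding.UTF8
        If contentType IsNot Nothing Then
            For Each part As String In contentType.Split(";"c)
                Dim trimmed As String = part.Trim()
                If trimmed.StartsWith("charset=", StringComparison.OrdinalIgnoreCase) Then
                    ' Strip optional quotes around the charset name.
                    Dim name As String = trimmed.Substring("charset=".Length).Trim(""""c, " "c)
                    Try
                        enc = Encoding.GetEncoding(name)
                    Catch ex As ArgumentException
                        ' Unknown or unsupported charset name: keep the UTF-8 fallback.
                    End Try
                End If
            Next
        End If
        Return enc.GetString(data)
    End Function
End Module
```

Usage: `Dim HTML As String = DownloadBytesAsText(URLString)`. If the odd characters turn out to be HTML entities such as `&eacute;` rather than mis-decoded bytes, `System.Net.WebUtility.HtmlDecode(HTML)` converts those instead.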
If possible, I'd also like to convert accented characters to their 'simple' equivalents (e.g. e and O instead of é and Ø).
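One common way to get the 'simple' characters, sketched here under the assumption that the text has already been decoded correctly, is to decompose it with Unicode normalization form D (which splits é into e plus a combining acute accent) and drop the combining marks. Some letters, including Ø, have no decomposition, so they need an explicit mapping; the two-entry map below is illustrative only. `RemoveAccents` is a hypothetical name:

```vb
Imports System.Globalization
Imports System.Text

Module AccentStripper
    ' Hypothetical helper: replaces accented letters with their base letters.
    Function RemoveAccents(text As String) As String
        ' FormD splits "é" into "e" + U+0301 (combining acute accent).
        Dim decomposed As String = text.Normalize(NormalizationForm.FormD)
        Dim sb As New StringBuilder(decomposed.Length)
        For Each ch As Char In decomposed
            ' Keep everything except combining marks.
            If CharUnicodeInfo.GetUnicodeCategory(ch) <> UnicodeCategory.NonSpacingMark Then
                sb.Append(ch)
            End If
        Next
        ' Ø/ø have no canonical decomposition, so map them by hand (incomplete list).
        sb.Replace(ChrW(&HD8), "O"c)
        sb.Replace(ChrW(&HF8), "o"c)
        Return sb.ToString().Normalize(NormalizationForm.FormC)
    End Function
End Module
```

Other letters without a decomposition (for example Æ or ß) would need their own entries, and this does nothing for text that is still garbled from a wrong decode.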