Channel: VBForums - Visual Basic .NET

VS 2008 Decoding characters obtained from web pages

I am downloading pages from the web using a variety of methods, including this:
Code:

          Dim URLString As String = "http://www.example.com/page.htm"
          Dim MyWebClient As New Net.WebClient()
          Dim HTML As String = MyWebClient.DownloadString(URLString)

No matter which method I use for downloading, the HTML quite often contains characters that are encoded somehow, such as Ã© and Âº and â€˜
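
One possible cause, offered as an assumption rather than a diagnosis: the pages are UTF-8, but `WebClient` is decoding them with the system ANSI code page, which is the default value of its `Encoding` property. A minimal sketch, assuming the target page really is UTF-8 (the module name is only illustrative):

```vbnet
Imports System.Net
Imports System.Text

Module DownloadSketch
    Sub Main()
        Dim URLString As String = "http://www.example.com/page.htm"
        Dim MyWebClient As New WebClient()

        ' Assumption: the page is UTF-8. Without this line WebClient
        ' falls back to the system default (ANSI) encoding.
        MyWebClient.Encoding = Encoding.UTF8

        Dim HTML As String = MyWebClient.DownloadString(URLString)
        Console.WriteLine(HTML.Length)
    End Sub
End Module
```

If the pages come from many sites with different encodings, a single hard-coded encoding will not suit all of them, so this only helps where the assumption holds.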

If I save the text to a file, e.g.:
Code:

My.Computer.FileSystem.WriteAllText(filePath, HTML, False, System.Text.Encoding.Default)
...some of the issues are dealt with (e.g. Ã© becomes é, Âº becomes º, and â€˜ becomes ‘)

Is there a way I can do this conversion without saving to a file and reloading? (or hard-coding conversions as I find them!)

I've tried several things with no luck, including this:
Code:

    Dim encodedBytes As Byte() = System.Text.Encoding.UTF8.GetBytes(HTML)
    Dim decodedString As String = System.Text.Encoding.UTF8.GetString(encodedBytes)
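
The attempt above encodes and decodes with the same encoding, so it simply returns the original string. What the save-and-reload round trip appears to do is write the text as ANSI bytes and read those bytes back as UTF-8. A rough in-memory sketch of that idea, assuming the text was UTF-8 that had been decoded as Windows-1252 (`FixMojibake` is a hypothetical helper name, and code page 1252 is an assumption about the system default):

```vbnet
Imports System.Text

Module RedecodeSketch
    ' Hypothetical helper: undo a UTF-8-read-as-Windows-1252 mix-up.
    Function FixMojibake(ByVal garbled As String) As String
        ' Assumption: the wrong decoder was Windows-1252.
        Dim ansi As Encoding = Encoding.GetEncoding(1252)
        ' Recover the original bytes from the mis-decoded text...
        Dim rawBytes As Byte() = ansi.GetBytes(garbled)
        ' ...and decode them as UTF-8 this time.
        Return Encoding.UTF8.GetString(rawBytes)
    End Function

    Sub Main()
        ' "Caf" followed by the two characters that a mis-decoded é turns into.
        Dim garbled As String = "Caf" & ChrW(&HC3) & ChrW(&HA9)
        Console.WriteLine(FixMojibake(garbled)) ' the string "Café"
    End Sub
End Module
```

This should only be applied to text known to be garbled in this way: a string that was decoded correctly in the first place would be damaged by it.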



If possible, I'd also like to convert accented characters to their 'simple' equivalents (e.g. instead of é and Ø I'd like to get e and O)
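
For the accent question, one common approach is Unicode normalization: decompose each character into a base letter plus combining marks, then drop the marks. A minimal sketch under that assumption (`StripAccents` is a hypothetical helper name). Note that Ø has no canonical decomposition, so letters like it need an explicit mapping:

```vbnet
Imports System.Globalization
Imports System.Text

Module AccentSketch
    ' Hypothetical helper: remove diacritics via Unicode decomposition.
    Function StripAccents(ByVal input As String) As String
        ' FormD splits é into "e" plus a combining acute accent.
        Dim decomposed As String = input.Normalize(NormalizationForm.FormD)
        Dim sb As New StringBuilder(decomposed.Length)
        For Each c As Char In decomposed
            ' Keep everything except the combining marks.
            If CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.NonSpacingMark Then
                sb.Append(c)
            End If
        Next
        ' Ø (U+00D8) and ø (U+00F8) do not decompose, so map them by hand.
        sb.Replace(ChrW(&HD8), "O"c).Replace(ChrW(&HF8), "o"c)
        Return sb.ToString().Normalize(NormalizationForm.FormC)
    End Function

    Sub Main()
        ' Input is "café Ø", built with ChrW to avoid source-file encoding issues.
        Dim sample As String = "caf" & ChrW(&HE9) & " " & ChrW(&HD8)
        Console.WriteLine(StripAccents(sample)) ' the string "cafe O"
    End Sub
End Module
```

Other letters without a decomposition (for example ß or Æ) would need their own entries in the hand-written mapping.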
