Unicode in Visual Studio.NET 2003

It all began a couple of weeks ago when I worked on a Spanish site. I didn't expect a little paragraph of straightforward markup to cause this much trouble and help me understand the <globalization> section of web.config better. What was supposed to look like this:

[Image: correct rendering of Spanish text]

Rendered as:

[Image: wrong rendering of Spanish text]

Back then I let it go, but in the back of my mind I knew it was wrong. All along I've read and heard that text in .NET was Unicode by default.

Part I

In my C++ days I had to add a #define UNICODE directive to have the compiler import the right set of libraries for wide-character (Unicode) text manipulation and use LPWSTR or LPTSTR declarations of string pointers. When allocating memory you had to really watch it if it was for Unicode. It was a major pain.

.NET makes it a lot easier to develop applications with internationalization and localization in mind. This ease of handling Unicode was one of the selling points for me when I was first introduced to .NET. According to Jeffrey Richter:

In the CLR, all characters are represented as 16-bit Unicode code values and strings are composed of 16-bit Unicode code values. This makes working with characters and strings easy at run time.

This is where we need to talk about encoding. Quoting Jeffrey Richter further:

At times, however, you want to save strings to a file and transmit them over a network. If the strings consist mostly of characters readable by English-speaking people, then saving or transmitting a set of 16-bit values isn't very efficient because half of the bytes written would contain zeros. Instead, it would be more efficient to encode the 16-bit value into a compressed array of bytes and then decode the array of bytes back into an array of 16-bit values.
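Richter's point about wasted zero bytes is easy to check for yourself. Here is a minimal sketch (in Python, purely for brevity; the sample string is my own assumption, not something from the book) that compares the size of the same text in UTF-16 and UTF-8:

```python
# Hypothetical sample text; any mostly-ASCII string shows the same effect.
text = "Hello, world"

utf16 = text.encode("utf-16-le")  # two bytes per character here, no BOM
utf8 = text.encode("utf-8")       # one byte per ASCII character

# For ASCII characters every second byte of the UTF-16 form is zero.
zero_bytes = utf16.count(0)

print(len(text), len(utf16), len(utf8), zero_bytes)
```

For text like this, half of the UTF-16 bytes are zeros, which is exactly the inefficiency the quote describes.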

Knowing this I thought it was strange that Spanish characters came out all garbled in Firefox, Opera and Internet Explorer/Win.

All text processors I used over the years prepended file contents with a "Unicode signature". In geek parlance it's known as the Byte Order Mark (BOM). In a nutshell, the BOM gives a text processor a hint about which UTF format the file is encoded in (UTF-16, UTF-8, UTF-7, etc.). The main purpose of the BOM is to define the ordering of bytes in a text stream, so a UTF-8 encoded stream, which has only one byte order, doesn't strictly need one (the Byte Order Mark (BOM) FAQ explains why). Serious text processors nevertheless store it to avoid ambiguity.
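To make the idea of a signature concrete, here is a rough sketch of how a tool might sniff a BOM. It is written in Python for brevity; the function name sniff_bom is hypothetical, and real text processors may check more signatures than these:

```python
import codecs

# Signatures are checked longest-first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom(b"\xef\xbb\xbfhola"))  # file saved with the UTF-8 signature
print(sniff_bom(b"hola"))              # no signature: the reader has to guess
```

Without a signature the function returns None, and whatever reads the file must fall back on some other rule to pick an encoding.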

Since text in .NET is all Unicode by default, my senses were telling me something was wrong with the format of my source file itself. There were no visible screw-ups because it was just plain HTML, so I decided to look at it in HEX. To my surprise the ASPX page had no "Unicode signature"!

This led me to mighty Google newsgroups where I found this advice: go to the File menu and select Advanced Save Options.

Note: you must be editing a source file to have this option.

Whoa! You can select different encodings. By default, UTF-8 without signature is selected:

[Image: Advanced Save Options dialog]

I saved the file with UTF-8 with signature instead, requested my Spanish page and everything looked correct this time around. I also noticed that as long as I had the page open in Visual Studio.NET it would save with the Unicode signature. But if I closed and reopened it, any memory of my previous selection was lost, and I was back to saving it as UTF-8 without signature without realizing it!
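If you want to see what the two save options put on disk, this small sketch shows the bytes side by side. It is in Python for brevity, and it assumes that Python's "utf-8-sig" codec produces the same bytes as the "UTF-8 with signature" option; the sample word is my own:

```python
text = "Español"  # sample text; any non-ASCII content would do

without_signature = text.encode("utf-8")
with_signature = text.encode("utf-8-sig")  # prepends the bytes EF BB BF

print(without_signature.hex())
print(with_signature.hex())
```

The only difference is the three leading bytes EF BB BF; the text itself is encoded identically in both cases.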

Now, here's what I don't understand. Why are there two options: with and without BOM? I hope it's not about saving 3 bytes because these savings are ridiculous. Since this feature made it this far into VS.NET somebody must've given much thought to it and there's gotta be a reason. I'm very curious what this reason is. You use Unicode to play safe and have an ability to display pretty much any character in the world, so why this signature or no-signature saga? I say signature for UTF-8 it is. Always. And if there's a strong reason to keep both options, I advocate the one with the signature. Without the signature VS.NET fools itself and doesn't realize there are international characters in the source.

Here's another thought. Targeting only the US market, as big as it is, is short-term thinking. Locking yourself to a local codepage is narrow-minded thinking. You never know where your code ends up or who you hire to work with it. Stick to Unicode.

<digression>In one of my CS classes at BYU there was a guy who was surprised to find out people wrote Pascal code in English all around the globe! On a different note, when I worked with Sony developers they sent us some C++ code with comments all in Japanese. I can read 3 languages but Japanese isn't among them (I'm working on it. This language fascinates me). We never deciphered the comments. This is to back up my point that you never know where your code lands later in time.</digression>

The thought that saving reverts to UTF-8 without signature didn't feel right. I started digging deeper. Incidentally, I found a peculiar attribute of the <globalization> tag in web.config. The attribute is fileEncoding. MSDN defines it as follows:

Specifies the default encoding for .aspx, .asmx, and .asax file parsing. Unicode and UTF-8 files saved with the byte order mark prefix will be automatically recognized regardless of the value of fileEncoding.

"Automatically recognized" and "regardless" felt good. Thus I modified my web.config to contain this <globalization> element:

<globalization 
   requestEncoding="utf-8" 
   responseEncoding="utf-8"
   fileEncoding="utf-8" />

By the way, when you create a brand new web project both the request and response encodings are set to UTF-8 as shown above.

The setting of fileEncoding seemed to fix my problem. I saved my Spanish page with and without the BOM, and both times it came out just right in web browsers. fileEncoding seems to tell the page parser to treat a page as Unicode no matter what, which I welcome.

Then I started thinking, "How does it do it? How is this setting enforced?" Armed with Reflector I found a class in the System.Web.Configuration namespace called GlobalizationConfig. The class is marked as internal and therefore is not documented on MSDN. Its LoadValuesFromConfigurationXml method reads the values of fileEncoding, requestEncoding, responseEncoding, culture and uiCulture from web.config and initializes properties with corresponding names.

[Image: a diagram of the GlobalizationConfig class]

Tracing further the FileEncoding property I arrived at ReaderFromFile method found in System.Web.UI.Util:

internal static TextReader ReaderFromFile(
       string filename, HttpContext context, string configPath)
{
 TextReader reader1;
 GlobalizationConfig config1;
 Encoding encoding1 = null;

 if (context != null)
 {
  if (configPath == null)
  {
   config1 = ((GlobalizationConfig) context.GetConfig(
              "system.web/globalization"));
  }
  else
  {
   config1 = ((GlobalizationConfig) context.GetConfig(
              "system.web/globalization", configPath));
  }

  if (config1 != null)
  { encoding1 = config1.FileEncoding; }
 }

 if (encoding1 == null)
 { encoding1 = Encoding.Default; }

 try
 {
  // The third argument (true) is detectEncodingFromByteOrderMarks:
  // a BOM, if present, overrides encoding1.
  reader1 = new StreamReader(filename, encoding1, true, 4096);
 }
 catch (UnauthorizedAccessException)
 { ... }

 return reader1;
}

Does system.web/globalization look familiar? As you can see, an instance of StreamReader is created with a certain encoding. If you specify no file encoding in web.config a default one is used. What does it default to, though?

internal static Encoding CreateDefaultEncoding()
{
 int num1 = Win32Native.GetACP();

 if (num1 == 1252)
 { return new CodePageEncoding(num1); }

 return Encoding.GetEncoding(num1);
}

GetACP is an old Windows API function which "retrieves the current ANSI code-page identifier for the system". You can find the CreateDefaultEncoding method in System.Text.Encoding.

Let's recap what we've learned. web.config contains an important section, <globalization>, with an attribute, fileEncoding, that controls the encoding in which source files are read. If you work with Unicode and have (or might have) international characters in your ASPX pages, setting the fileEncoding attribute to "utf-8" seems to be a good idea. Otherwise, pages saved without a BOM will be processed according to the current ANSI code-page settings.
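The whole decision chain can be imitated in a few lines. This is only a sketch in Python of the behavior described above, not the actual ASP.NET code: read_page and its parameters are hypothetical names, and the ANSI code page is assumed to be 1252, as on a Western European or US Windows box.

```python
import codecs

def read_page(data: bytes, file_encoding=None, ansi_code_page="cp1252"):
    """Decode page bytes: a BOM wins, then fileEncoding, then the ANSI code page."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    # No signature: use the configured encoding, or fall back to the
    # system default, much like Encoding.Default does.
    return data.decode(file_encoding or ansi_code_page)

page = "Español"
no_bom = page.encode("utf-8")
with_bom = page.encode("utf-8-sig")

print(read_page(no_bom))                         # EspaÃ±ol (garbled)
print(read_page(no_bom, file_encoding="utf-8"))  # Español
print(read_page(with_bom))                       # Español
```

The first call reproduces the garbling: the two UTF-8 bytes of "ñ" are read as two separate code page 1252 characters. Either a signature or fileEncoding="utf-8" fixes it.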

Part II

While we're on this subject, let's also talk about the GlobalizationConfig class and response/request encodings. The subject is important enough that I decided to cover it here. Members of the <globalization> section of web.config affect the encoding of HTTP responses.

The HttpResponse class has a property, ContentEncoding, which participates in construction of correct HTTP headers.

A response should contain the Content-Type header which is of utmost importance. A typical response of an ASP.NET page is shown below:

HTTP/1.x 200 OK
Date: Sun, 18 Jul 2004 05:06:38 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 4262

If you request an RSS feed you see something along these lines:

HTTP/1.x 200 OK
...
Content-Type: text/xml
...

The Content-Type header tells the browser what it is receiving. In ASP.NET this header is built by the GenerateResponseHeaders method of System.Web.HttpResponse:

private ArrayList GenerateResponseHeaders(bool forCache)
{
 ...
 text2 = this._contentType;

 if ((this._contentType.IndexOf("charset=") < 0) && 
     (this._customCharSet || ((this._httpWriter != null) && 
     this._httpWriter.ResponseEncodingUsed)))
 {
  text3 = this.Charset;

  if (text3.Length > 0)
  { text2 = this._contentType + "; charset=" + text3; }

 }
 ...
}

Two important points here: what are this._contentType and this.Charset that are used to build the Content-Type header? The class has a public property, ContentType, which, I'm sure, most of you have set more than once via HttpContext.Response.ContentType="...". It is pre-initialized to "text/html" in the class constructor:

public HttpResponse(TextWriter writer)
{
 this._statusCode = 200;
 this._bufferOutput = true;
 this._contentType = "text/html";
  ...
}

The other significant half, Charset, is a public property of the same class which gets its value from... content encoding!

public string get_Charset()
{
 if (this._charSet == null)
 { this._charSet = this.ContentEncoding.WebName; }

 return this._charSet;
}

//---------------------------------------------------
public Encoding get_ContentEncoding()
{
 GlobalizationConfig config1;

 if (this._encoding == null)
 {
   config1 = ((GlobalizationConfig) this._context.GetLKGConfig(
               "system.web/globalization"));

  if (config1 != null)
  { this._encoding = config1.ResponseEncoding; }

  if (this._encoding == null)
  { this._encoding = Encoding.Default; }
 }
 return this._encoding;
}

See the GlobalizationConfig class we've talked about? It's a small world, after all. As you can see, response encoding is read from the <globalization> section. If you omit declaring responseEncoding you're pretty much taking chances because a default one will be used for you.
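Put together, the header logic boils down to something like the following. This is a simplified Python sketch of the decompiled code above, not the real implementation: the function and parameter names are hypothetical stand-ins for the private fields of HttpResponse.

```python
def content_type_header(content_type="text/html", charset="utf-8",
                        custom_charset=False, response_encoding_used=True):
    """Append '; charset=...' unless the content type already carries one."""
    if "charset=" not in content_type and (custom_charset or response_encoding_used):
        if charset:
            return content_type + "; charset=" + charset
    return content_type

# Defaults: "text/html" from the constructor, charset from responseEncoding.
print(content_type_header())
# A charset you put into ContentType yourself is left alone.
print(content_type_header("text/xml; charset=iso-8859-1"))
```

In other words, the charset comes from responseEncoding unless you have already spelled one out in the content type.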

Some people, myself included, use the http-equiv="content-type" meta tag. In the course of this research I learned that this meta tag has little bearing on the outcome because ASP.NET will always set a response encoding (yours or a default one) in the HTTP Content-Type header, and browsers give the HTTP header precedence over the meta tag. The http-equiv meta tag is more of a fallback hint to the browser, so you can omit it. Also, it causes problems in old versions of Netscape.
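Here is a rough Python sketch of that precedence, assuming the browser simply prefers a charset from the HTTP header and only then looks at the meta tag. The function names are hypothetical, and real browsers apply more rules than this:

```python
from email.message import Message

def header_charset(content_type):
    """Extract the charset parameter from a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def effective_charset(content_type, meta_charset=None):
    """The HTTP header wins; the meta tag is only a fallback."""
    return header_charset(content_type) or meta_charset

print(effective_charset("text/html; charset=utf-8", meta_charset="iso-8859-1"))
print(effective_charset("text/xml", meta_charset="iso-8859-1"))
```

With a charset in the header the meta tag is ignored; it only matters when the header carries none.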

Conclusion

If you are still awake and reading this, congratulations! You made it! Encoding is no easy subject, and I hope this post shed some light on it. I do not claim to be an authority on Unicode; what I covered here was my research in the face of a strange bug. Pay attention to the <globalization> section because it is a very important one even though its purpose is documented rather poorly.

Comments

Comment permalink 1 john k |
"
Whoa! You can select different encodings. By default, UTF-8 without signature is selected:
"

How do you get UTF-8 as default? I have Western European as default and I would like to have UTF-8.
Comment permalink 2 john k |
Ok, so my problem is that if I have the line:

<%@ Page language="c#" debug="true" culture="fi-FI" contentType="text/html; charset=utf-8" %>

in the beginning of the aspx page Visual Studio will save the page in utf-8.

But if I do not have the @Page directive on the page (all other pages on the site include the file with the page directive) the default is Western European.

I would like to save all files in UTF-8 always. Does anyone have a solution for this?
Comment permalink 3 Milan Negovan |
I haven't been able to find a way to have VS.NET save files as UTF-8 all the time. :( Gotta pick the encoding by hand all the time.
Comment permalink 4 Gary K |
The reason the UTF-8 BOM is optional is because it's unnecessary and it breaks backwards compatibility.

- It's unnecessary because a Byte Order Mark is not required for a byte-ordered format. UTF-16 needed the BOM to handle big-endian and little-endian byte ordering of the 16-bit characters, but this is completely useless for UTF-8 files.

- It breaks backward compatibility because UTF-8 is supposed to be backwards compatible with 7-bit ASCII encodings. So if you stick with 7-bit ASCII, your UTF-8 and ASCII files will be identical. This is really useful if you want most of your command line tools to continue working as expected.

For example, take a UTF-8 file and sort it. If it has a BOM, the first line won't be sorted properly (unless the tool has added special support for UTF-8 - and most haven't/won't). Try cat'ing 2 files together and now you have 3 junk bytes in the middle of your file.

The only advantage of the BOM on UTF-8 files is that you can quickly identify the file as a UTF-8 file. This is nice, but it's really metadata that shouldn't be encoded *in* the file data stream.
Comment permalink 5 john |
that's great research but i still don't understand which file encoding to choose for non-english pages.
which one.... UTF-8 or UTF-8 BOM ....

i have a feeling that UTF-8 BOM is much better but i'm still not sure.

because Gary K stated that UTF-8 is bad. i still don't understand why it's bad. any thoughts.
Comment permalink 6 Milan Negovan |
UTF-8 is not bad. In fact, it's pretty much the only "common denominator" these days. Gary says that the BOM signature breaks things, so I'd go with UTF-8 and save without BOM.

I also set fileEncoding="utf-8" in web.config to be on the safe side.
Comment permalink 7 john |
Milan thanks so much for making things clear to me..... i think the information in this page + the comments are much better than MS Documentation.....

BUT we still need to FIND out how to force VS to use UTF-8 when creating source files or new project.

because i dislike saving files manually every time i create new page....

any idea??
Comment permalink 8 Milan Negovan |
I don't know if you can. I always had to save files with advanced save options (as explained in the post) when I had "foreign" (read "non English") characters.
Comment permalink 9 KjellSJ's blog |
Linked
Comment permalink 10 Mikil |
Hi, need advice for the following scenario:

A text file with a script is uploaded to the portal that is displayed as link. Upon clicking the link, instead of opening the text file, a cookie value is displayed in the pop out alert message box.

Btw the script within the text file is alert('Test Script \n cookie value ' + document.cookie)

However, the other types of uploaded MS documents can be accessed in their native format. But the text file instead of opening as a text file, executes the script.

For this I am thinking of changing file encode to iso-8859-1. Will this solve the problem?
Comment permalink 11 Milan Negovan |
Mikil, I don't see the problem. I don't think you can execute a script from a text file simply by clicking a link.

Also, I don't see any problems with changing its encodings. Why should it make a difference?
Comment permalink 12 Haleh |
I set the encoding of my "aspx" pages, and also set the "web.config" as you described, but I still have a big problem: the page has some text boxes into which users can enter text. When they push the "submit" button, this text must be inserted into some SQL Server tables. Unfortunately, some fields are inserted correctly, and some of them are inserted as '?' characters!!!

When I test my pages locally, everything goes right, but when I publish the pages on the host, the problem occurs! I'd appreciate it if you could help me.
Comment permalink 13 Milan Negovan |
Do you store your text as ntext/nvarchar in SQL Server? If not, you need to.

Remember to also put an N before literal strings, for example: N'something', to tell SQL Server you're submitting Unicode text.
Comment permalink 14 Dan |
I have a similar problem to the one described above. I am attempting to store and display the word Español, with the tilde over the n. For some reason, when I submit the page, I believe it simply removes the ñ character from the string. I am not sure it happens before submission (insert), but I have good evidence to think that is the probable cause. In the SQL Server 2000 table, the word is stored as "Espaol", with the ñ simply omitted. I have tried to save the page encoded as UTF-8 with BOM & w/o BOM, and Unicode 1200. I have added the globalization tag to my web.config file as follows:

<globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8" />

Still, the ñ is simply omitted before getting stored in the DB table.

Any ideas as to why this is??? Any help would be greatly appreciated!

-Dan
Comment permalink 15 Josh Hawley |
Is this the same as the problem i'm having using delimiters in strings?

I attempt to use ALT+240 (≡) (a character that looks like = with a third line, in case it doesn't show in the post) as my delimiter character, but the designer keeps changing it to a normal = sign. I am working with Windows apps, but this seems like it could be the same problem.
Comment permalink 16 Alex |
As to internationalisation VS.NET sucks.
The reason ?
In order to support Extended Latin 1 (A) characters
I used "ı" for "dotless i"

whenever I made a change in Visual Design all my HTML numeric entity codes are converted to equivalent usual chars.
I did not want Visual Studio to Convert those chars to equivalent chars on ISO-charset domain but THE Client Browsers Viewing the page(s). For what I was ready to beat the developers to death for
their stupidish behaviour of program and logical capabilities for which attribute I can prise them.

Regards
Comment permalink 17 Alex |
For the post above,
"ı" should be "& # 305 ;" without spaces between.
The site renders "& # 305 ;" as Latin Extended Character and displays
equivalent "char"
Comment permalink 18 David Mediavilla |
I asked in microsoft.public.dotnet.framework.webservices about how .NET deletes accented characters when I use:
RequestEncoding= System.Text.Encoding.GetEncoding( "ISO-8859-1");
in my web service client.
If you know the answer, please answer there.
Comment permalink 19 Boler Guo |
to john :
i dislike saving files manually every time i create new page....
i dislike too~

test chinese character:
能看到吗?
Comment permalink 20 Edwin |
Wonderful. This article helped to solve my problem with ASP.NET encoding. Thanks!!!
Comment permalink 21 sachin |
Problem of saving unicode in sqldatabase server

I have the problem in an .aspx i have a textarea where i give input
in chinese language then i want to take this data in the next page
and save in database so that i could take that data from the database as and when required . When i am trying to do so the
database is storing the collected chinese language in ?(questionmark) form
suggest me how the problem can be solved
Comment permalink 22 Milan Negovan |
Sachin, I believe it's off the topic of this post, but the first thing to look for is whether your database field is declared as ntext or nvarchar.
Comment permalink 23 wonderdelight |
Guys article was great plus your postings.

Just a general question.

If I want to, for example, ensure my website is multilingual, what are the main things I need to do and consider when creating the database? I mean it's .NET, UTF-8 set within the web.config file, and within the database, as per old development, I have a locale code stored (for example en-gb, en-us, etc.) for different text versions for display menus and currency displays.

How do I determine what the default locale is via the browser, for example? This way, if I can find out when loading the default page that it's en-gb, for example, then when calling the stored procs I can display text in nvarchar fields etc. in the required language.
Comment permalink 24 MrD |
I have an aspx page, i run it on IIS 6.0 and when I have the word Búsqueda it renders Bsqueda; it doesn't render the "ú" to the html code.

Do you know what setting i need to change on my server?

The same page is working ok over IIS 5.0....
Comment permalink 25 Dodly |
Hi, first let me thank the author for writing this article. That's exactly my problem, and I found no MSDN documentation about it.... (not such that could help). However, I still have problems with these damn globalization issues... I have a website that collects data from a textbox in Hebrew, the data then goes to a MS ACCESS database...
I made the changes in the web.config as described above.... When data is submitted it is written as blank lines in the DB, this does not occur when I try it on my computer however, on the host it does....
The hebrew letters appear just fine on the browser when I type something into the textbox, but something goes wrong because blank fields are stored at the db file.
Desperate for some tips concerning the issue.....SOS....

Thanks !
Comment permalink 26 Milan Negovan |
Dodly, I haven't touched Access in a long while... In SQL Server I'd use nvarchar/ntext over varchar/text to preserve encoding. If there is a similar correlation in Access, try the Unicode-aware data type.
Comment permalink 27 Milan Negovan |
MrD, see my previous comment. I assume you've modified your web.config and you serve your pages with the right content type (described in this post), right?
Comment permalink 28 Dodly |
Yes, I have modified web.config as described in the article. The language appears just right at the site, the only problem is when getting the data and trying to store it...
Is it possible that the server I'm using just isn't capable of dealing with Hebrew letters ?
Comment permalink 29 Milan Negovan |
I assume you're using IIS so it's unlikely that it can't handle Hebrew characters. Sounds like something about encoding happens on its way to the database. If you've got some code you can email me, I can take a look at it.
Comment permalink 30 Anthony |
Hmm,
I'm struggling to render Japanese text in an aspx web page that contains both Kanji & western script. I changed the globalization attributes as suggested but nothing has changed.
Did I miss something? Is all you need to do is this?:

<globalization
requestEncoding="utf-8"
responseEncoding="utf-8"
fileEncoding="utf-8" />

Is it different for Japanese text?

Hope you can help

Ant
Comment permalink 31 Milan Negovan |
Anthony, you didn't mention what exactly was wrong: does the text look garbled on your screen? Do your visitors experience the same problem?

Have you installed add-ons for East Asian languages? If not, go to Control Panel | Regional and Language Options, switch to the Languages tab and check "Install files for East Asian languages." You will need a Windows CD because these files take about 200Mb on your drive and are not installed with Western distros of Windows by default.

If it's something else, please explain what kind of trouble you are having with kanji.

The settings I mentioned in the post are pretty much all you need to handle Japanese characters (hiragana, katakana and kanji) on the ASP.NET end.
Comment permalink 32 Anthony |
Hi Milan,
Thanks very much for your blog. Actually I've worked it out. I was trying to paste garbage from the old pages html into the html of the aspx page, hoping the page would decode it into Kanji (Didn't work). When I pasted Kanji directly into the html & saved as unicode, it worked fine. My source control doesn't like the unicode but apart from that, it works perfectly.
Many thanks!
Comment permalink 33 Low Suan Bee |
Hi

Hopefully you can help me with this:

I am trying to save some Japanese characters into Oracle using vb.net, but the information saved always turns into junk. The system did not return any errors.

[Code snippet deleted]
Comment permalink 34 Milan Negovan |
I'm afraid this question is for a discussion forum. Please take a look at ASP.NET Forums.
Comment permalink 35 david rmz |
hi, i have a problem with my webservice, why does vs.net always put utf-8 in my webservices? i want to change that to ISO-8859-1 and i just can't do it, i tried to set the globalization variables but it didn't work so my question is what can i do to change the encoding value in a webservice?

thanks 4 ur help...
Comment permalink 36 hongboxm |
I don't know how to store Chinese characters in a SQL database as Unicode and output them correctly as Chinese characters
May someone tell me?
This site seem can do that
Try:input: 中文
output:中文
Should we setup something in Sql2000 server?
Comment permalink 37 Milan Negovan |
See this comment.
Comment permalink 38 Ramesh |
Hi,

I read the article, it is very nice.

Is there any .net solution for validation controls to display alerts in spanish, when the localization is selected to spanish or to English.

Thanks,

Ramesh
Comment permalink 39 Milan Negovan |
Ramesh, validation controls will display error messages in any language as long as you set their error message properties in that language.
Comment permalink 40 Yiwen Chung |
Thanks for the info shared on this page. I used Visual Web Developer (VWD) 2005 Express. I had to open the aspx pages in notepad, and used "Save As.." option to force the file to be saved in utf-8 format. They were all saved in ANSI format using VWD. After this, all Chinese characters were displayed correctly.
Comment permalink 41 Will |
Some one mentioned how to store kanji in sql2000. This is how you do it... Make sure that your string that you are storing has a "N" in front of it. I am not sure what language you are using to write the interface, but your nonquery string would be something like this:
INSERT into tblMyTable (strLoginName) VALUES( N'kanjicharsinhere')

BTW, make sure the datatype in the sql table is of type NChar, not varchar.

Sorry to reply so late.... I just saw this site. :o)
Comment permalink 42 Ran |
Nice article, however many times even though I have in my web.config the following:
fileEncoding="utf-8"
requestEncoding="utf-8"
responseEncoding="utf-8"

..and I do: advanced save options >> 'save as utf-8 with signature', some time later after I have closed and re-opened the files it is no longer utf-8

Any ideas as to why? and how to overcome what seems like a bug?
Thanks
Comment permalink 43 rima |
i am facing the same problem please any help ,

thanks ,

please any one send me on this mail
pleaseeeeeee
Comment permalink 44 Todd |
This was great. If anyone wonders why asian characters look fine when the page has a .htm extension, then turn to nonsense if converted to .aspx, this is your answer.
Comment permalink 45 gang |
hi
i tried many times to show the chinese word in asp.net 2.0.
i set up file web.config file to

and


<globalization requestEncoding="gb2312" responseEncoding="gb2312" />
.......

but i can't show chinese word on the web page.
how can i do it, who can help me please.
thanks so much!
Comment permalink 46 paulh |
This has been driving me crazy for two days - you are my hero.
Comment permalink 47 Ben |
/!\ TO ALL THOSE ASKING HOW TO CHANGE DEFAULT ENCODING WHILE CREATING NEW FILE IN VISUAL STUDIO /!\

Go to C:\Program Files\Microsoft Visual Studio .NET 2003\VC#\CSharpProjectItems (directory may vary depending on which VS version you have, and which project's language you target), open the template you need to change (I needed to change JScript.js) in Visual Studio (or any other text editor I suppose?), and save it back to the encoding you want.
=> this will affect files you create while in a project, using Project > Add new item... (or right-clicking on the project tree > Add > Add new item...) (or Ctrl+Shift+A) (i think i said it all didn't I?)

You can also go to C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE\NewScriptItems (or NewFileItems, depends what u wanna do), and do the same (open/save with encoding).
=> this will affect files you create using File > New > File...

It drove me mad for months, so, hope this helps...
Comment permalink 48 Milan Negovan |
Thank you, Ben!
Comment permalink 49 babiak |
ok, Thanks Ben, but what about Visual Web Developer? I am opening files with encoding, choosing "utf with no signature", but when I am closing files, it's saving automatically as "utf with signature". I have to change this manually, save file with encoding and again choose "utf without signature " :(

How to do automatically opening and saving files as "utf without encoding" ?
Comment permalink 50 anonymous email |
How does it work with Visual Studio 2005. Everything was working fine until we created german pages and stored them as WESTERN EUROPEAN CODEPAGE 1252. Now any new created page takes this default value.

Its driving me nuts!
Comment permalink 51 amit |
This is very good, but how can we change the whole site to a specific language like Hindi on the click of a button? If possible, please give an answer with an example.
Comment permalink 52 bhadreshmail@yahoo.com |
When we insert very large data in other then english language then display like????? or some boxes

then if we want to store large data (content of 4-5 pages) in sqlserver then what should we do.
Comment permalink 53 Jayakumar |
Hi Guys,

When you try to insert text in another language, SQL 2000/2005 will store it as '??????'.
You need to convert the text to Unicode and then insert the values into the DB, and when you want to retrieve it from the DB you need to convert it again.

I am providing 2 methods to convert 'text from unicode' and 'unicode to text' which really work!! You are free to use this code in your program.

// UnicodeToText: converts Unicode text to text in the given code page
//   strSrc    = string that needs to be converted
//   iCodePage = code page of that language

public string UnicodeToText(string strSrc, int iCodePage)
{
 if (iCodePage == -1)
  return strSrc;

 byte[] bySrc = Encoding.Unicode.GetBytes(strSrc);
 byte[] byAscii = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(iCodePage), bySrc);
 string strAscii = Encoding.GetEncoding(1252).GetString(byAscii);

 return strAscii;
}

// TextToUnicode: converts text in the given code page back to Unicode

public static string TextToUnicode(string strSrc, EPM_Codepage iCodepage)
{
 byte[] bySrc = Encoding.GetEncoding(1252).GetBytes(strSrc);
 byte[] byUni = Encoding.Convert(Encoding.GetEncoding((int)iCodepage),
                                 Encoding.Unicode, bySrc);

 string strUni = Encoding.Unicode.GetString(byUni);

 return strUni;
}

I hope this will help lots of guys who are having issues with storing Unicode to the DB.

Cheers
Jayakumar
Comment permalink 54 Milan Negovan |
Jayakumar, great tip! Indeed, you need to insert Unicode text (and all .NET strings are Unicode by default) in an ntext/nchar/nvarchar field, otherwise you'll get nothing but garbled text back.
Comment permalink 55 kieu ngoc dung |
My program uses ASP.NET 2003 and SQL Server 2000. I use the LIKE command to search for Unicode (Vietnamese) characters, but it doesn't find them. Please help me. Thank you!
Comment permalink 56 Deepak |
Great work Jaya... I was looking the same to store chinese characters and the code really works great!!!
Comment permalink 57 Geetha Kiran |
Thank you jayakumar... u r code worked in our case..............
thanks a lot..............
Comment permalink 58 lisa |
Thank you. This was extremely helpful.
Comment permalink 59 Jo |
Hi, this is all great info, thanks, but still can't get my page to work, sorry!
On my page the user enters a string into a textbox which is then used to search names in SQL Server database (the names may have any accented character in them). The db field is an nvarchar and working fine for sql level searches. I've added the globalization as utf-8 into my web.config. But still no joy, any accented character entered by the user is stripped out before the query is passed to SQL server - help please - what am i missing?
Comment permalink 60 Jayakumar |
If you post your piece of code, i can look at it.

Thanks
Jayakumar
Comment permalink 61 Jo |
Thanks for responding so quickly - actually I've now got it working, but only by opening the page in notepad in order to save it with unicode encoding - is there no way around this?
