Editing a CSV file that contains Chinese characters in Excel

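I have a CSV file that contains Chinese characters. When I open it in Excel by double-clicking, the Chinese characters are not garbled. But if I edit the data and then Save or Save As a CSV file, the Chinese characters are garbled the next time I open the file. How can I solve this in Excel?
(I know it can be solved with Python, but I would like to try solving it in Excel.)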

Click the icon in the upper-left corner of Excel and create a new workbook. In the new sheet, choose Data → From Text and select the CSV file to import. In the Text Import Wizard, choose Delimited and click Next; select Comma as the delimiter and click Next; then click Finish, and click OK in the Import Data dialog. Opening the CSV file this way avoids garbled characters.


Reply to 2# s20012797


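That does solve the garbled-text problem, but it is not very convenient because it takes 2 to 3 extra steps. When I used Python to delete a column and then saved the result as a CSV file, opening it in Excel by double-clicking showed no garbled characters either. Really strange.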
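
The post does not show the Python script itself. As a minimal sketch only, assuming the standard `csv` module and a hypothetical helper named `drop_column`, the following removes one column and writes the result with the `utf-8-sig` codec, which puts a UTF-8 BOM at the start of the file. Whether the poster's script did the same is not known.

```python
import csv

def drop_column(src_path, dst_path, column_index):
    """Copy a CSV file without one column, writing UTF-8 with a BOM.

    Hypothetical helper for illustration; not the poster's actual script.
    """
    # "utf-8-sig" on read skips a leading BOM if there is one,
    # and reads plain UTF-8 unchanged otherwise.
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        rows = list(csv.reader(src))

    # "utf-8-sig" on write emits the bytes EF BB BF before the data.
    with open(dst_path, "w", newline="", encoding="utf-8-sig") as dst:
        writer = csv.writer(dst)
        for row in rows:
            writer.writerow(row[:column_index] + row[column_index + 1:])
```

For example, `drop_column("in.csv", "out.csv", 1)` would copy `in.csv` to `out.csv` without its second column; both file names are placeholders.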


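I have a CSV file that contains Chinese characters. When I open it in Excel by double-clicking, the Chinese characters are not garbled. If I edit the data and then save or s ...
bongbong3481 posted on 2023-11-3 19:20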


Check whether the file is being saved as UTF-8 with a BOM (look at the encoding options when saving).
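
One way to make this check outside Excel is to look at the first three bytes of the file. A small sketch, with `has_utf8_bom` as a hypothetical helper name:

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        # codecs.BOM_UTF8 is the constant b"\xef\xbb\xbf".
        return f.read(3) == codecs.BOM_UTF8
```

Running it on the file before and after Excel saves it would show whether the save step dropped the BOM.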


Reply to 4# nissin

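If you hadn't mentioned it, I would have assumed a CSV file was just an ordinary text file. It turns out there is also this thing called a BOM.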


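Reply to nissin

If you hadn't mentioned it, I would have assumed a CSV file was just an ordinary text file. It turns out there is also this thing called a BOM ...
bongbong3481 posted on 2023/11/11 19:34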


Unicode, also known in Chinese as "unified code" or "universal code", is an industry standard in computing that covers character sets, encoding schemes, and more. Unicode was created to overcome the limitations of traditional character encoding schemes. It assigns a single, unique code point to each character in each language, to meet the needs of cross-language and cross-platform text conversion and processing.
If the various legacy text encodings are like regional dialects, then Unicode is a common language developed jointly by countries around the world.
In this common language there are no more encoding conflicts, and content in any language can be displayed on the same screen. This is the biggest benefit of Unicode. The original design assumed that 2 bytes (65,536 values) would be enough to hold most text in all the world's languages, but that turned out to be too few, and Unicode now defines code points up to U+10FFFF.
The related international standard, ISO/IEC 10646, was originally titled "Universal Multiple-Octet Coded Character Set", abbreviated UCS.
UCS-2 is a fixed 2-byte encoding that covers only U+0000 to U+FFFF, and UCS-4, which uses 4 bytes per character, was defined so that 2 bytes would not become a limit. In practice, UTF-8 and UTF-16 are the encodings most commonly used today.
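
Whether 2 bytes are enough can be checked directly. The sketch below uses U+20000 (a CJK ideograph above U+FFFF, chosen here only as an example) and compares it with the character "中"; the helper name `encoded_lengths` is made up for illustration.

```python
def encoded_lengths(ch):
    """Return the number of bytes one character needs in UTF-16 and UTF-32."""
    return {
        "utf-16": len(ch.encode("utf-16-be")),  # "-be" variant adds no BOM
        "utf-32": len(ch.encode("utf-32-be")),
    }

bmp_char = "中"             # U+4E2D, fits in 2 bytes
astral_char = "\U00020000"  # U+20000, does not fit in 2 bytes

print(encoded_lengths(bmp_char))     # {'utf-16': 2, 'utf-32': 4}
print(encoded_lengths(astral_char))  # {'utf-16': 4, 'utf-32': 4}
```

The second character takes 4 bytes in UTF-16 because it is stored as a pair of 2-byte units (a surrogate pair).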

In UCS there is a character called "Zero Width No-Break Space", whose code point is FEFF. The byte-swapped value FFFE is not a valid character in UCS, so it should never appear in actual transmission. The UCS specification recommends transmitting the character "Zero Width No-Break Space" before the byte stream. That way, if the receiver sees FE FF, the byte stream is big-endian; if it sees FF FE, the byte stream is little-endian. For this reason the character "Zero Width No-Break Space" is also called the BOM (byte order mark).
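
The byte-order check can be sketched in a few lines. The function name `detect_utf16_byte_order` is hypothetical, and the sketch assumes the data is UTF-16; it does not try to tell UTF-16 apart from UTF-32, whose little-endian BOM also begins with FF FE.

```python
import codecs

def detect_utf16_byte_order(data):
    """Guess the byte order of UTF-16 data from its first two bytes."""
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "little-endian"
    return None  # no BOM, so the byte order is not marked

# U+FEFF encodes to FE FF or FF FE depending on the byte order.
print("\ufeff".encode("utf-16-be").hex(" "))  # fe ff
print("\ufeff".encode("utf-16-le").hex(" "))  # ff fe
```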

UTF-8 (8-bit Universal Character Set/Unicode Transformation Format) is a variable-length character encoding for Unicode. It can represent any character in the Unicode standard, and it encodes the 128 ASCII characters as the same single bytes that ASCII uses, so existing software that processes ASCII text can continue to be used with few or no modifications. As a result, it has gradually become the preferred encoding for email, web pages, and other applications that store or transmit text.
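
The variable length is easy to see by encoding a few characters. In this sketch the sample characters and the helper name `utf8_bytes` are arbitrary choices for illustration.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of one character as a hex string."""
    return ch.encode("utf-8").hex(" ")

print(utf8_bytes("A"))           # 41
print(utf8_bytes("é"))           # c3 a9
print(utf8_bytes("中"))          # e4 b8 ad
print(utf8_bytes("\U00020000"))  # f0 a0 80 80

# Pure ASCII text produces the same bytes in ASCII and in UTF-8.
print("Excel".encode("ascii") == "Excel".encode("utf-8"))  # True
```

A Chinese character such as "中" takes 3 bytes, which is why a reader that guesses the wrong encoding shows several garbled symbols per character.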

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to signal the encoding. The UTF-8 encoding of the character "Zero Width No-Break Space" is EF BB BF, so if the receiver gets a byte stream that starts with EF BB BF, it knows the stream is UTF-8 encoded. Many Windows programs use a BOM to mark the encoding of text files.
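
In Python this can be confirmed with the standard `codecs` constants and the `utf-8-sig` codec, which removes a leading BOM when decoding. A short sketch with made-up sample text:

```python
import codecs

# U+FEFF encoded as UTF-8 gives the three BOM bytes.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex(" "))  # ef bb bf

data = codecs.BOM_UTF8 + "中文".encode("utf-8")

# Plain "utf-8" keeps the BOM as an invisible first character.
kept = data.decode("utf-8")
# "utf-8-sig" strips it.
stripped = data.decode("utf-8-sig")

print(len(kept), len(stripped))  # 3 2
```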

Software such as the Notepad that ships with Windows has traditionally inserted a UTF-8 BOM at the beginning of the file when saving it as UTF-8. Editors such as Notepad use the BOM to recognize that a file is UTF-8 encoded (although, even without a UTF-8 BOM, Notepad can often identify UTF-8 correctly by other means).
What happens if there is no BOM header at the beginning of a UTF-8 encoded string?

If there is no BOM at the beginning of UTF-8 encoded text, the software may not be able to determine the encoding, which can lead to the text being displayed or processed incorrectly. Some software uses heuristics, or simply assumes UTF-8, when no BOM is present. The BOM is optional in UTF-8, but for files that will be opened by software that relies on it, such as a CSV file opened in Excel by double-clicking, including a BOM helps ensure the text is handled correctly.
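
The garbling itself can be reproduced by decoding UTF-8 bytes with the wrong codec. The sketch below assumes Windows-1252 (`cp1252`) as the wrong guess purely for illustration; the encoding a program actually falls back to depends on the program and the system locale.

```python
text = "中文"
raw = text.encode("utf-8")      # e4 b8 ad e6 96 87

# Decoding with the wrong codec does not fail here; it silently
# produces six unrelated characters instead of two Chinese ones.
garbled = raw.decode("cp1252")
print(ascii(garbled))
print(len(text), len(garbled))  # 2 6

# The bytes are undamaged, so reversing the wrong step recovers the text.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == text)         # True
```

This also shows why the data is usually recoverable: the file content is intact and only the interpretation is wrong.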

Unicode patents
