This post was last edited by xp3000 on 2023-1-8 08:33
I have some txt files that contain a lot of redundant content that needs to be deleted, for example:
PS:……
PS2:……
PS3:……
In the end the files need to be saved in ANSI encoding.
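A minimal sketch of the cleanup step described above, assuming every unwanted note sits on its own line and starts with PS, an optional number, and a colon (full-width : or ASCII :). The helper name stripPsLines and the exact pattern are assumptions for illustration and may need adjusting to the real files.

```javascript
// Hypothetical helper: removes whole lines such as "PS:...", "PS2:...", "PS3:...".
// Assumption: a note never spans more than one line.
function stripPsLines(text) {
  // With the m flag, ^ anchors at the start of every line. The trailing group
  // also consumes the line break, so no empty line is left behind.
  const psLine = /^[ \t]*PS\d*[ \t]*[::].*(?:\r?\n|$)/gm;
  return text.replace(psLine, "");
}

const sample = "Chapter 1\r\nPS:note\r\nSome text\r\nPS2:another note\r\nPS3:last note";
console.log(JSON.stringify(stripPsLines(sample)));
```

Saving the result as ANSI is a separate step. Node's built-in encodings cover latin1 but not Chinese code pages such as GBK, so that conversion would likely need an extra library.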
I found this by searching, but I don't know how to use it:
function detectEncoding(str) {
    // str must hold the raw file bytes, one byte per character (a "binary string"),
    // because a BOM is a byte sequence that is only visible before decoding.
    const startsWith = (...sig) => sig.every((byte, i) => str.charCodeAt(i) === byte);

    // Check the UTF-32 BOMs first: the UTF-32LE BOM (FF FE 00 00) begins with
    // the UTF-16LE BOM (FF FE), so testing UTF-16 first would hide it
    if (startsWith(0x00, 0x00, 0xFE, 0xFF)) {
        return 'UTF-32BE';
    } else if (startsWith(0xFF, 0xFE, 0x00, 0x00)) {
        return 'UTF-32LE';
    }

    // Check the remaining BOMs at the start of the data
    if (startsWith(0xFE, 0xFF)) {
        return 'UTF-16BE';
    } else if (startsWith(0xFF, 0xFE)) {
        return 'UTF-16LE';
    } else if (startsWith(0xEF, 0xBB, 0xBF)) {
        return 'UTF-8BOM';
    }

    // If none of the patterns above is found, assume ASCII or ANSI.
    // UTF-8 without a BOM also ends up here.
    return 'ANSI/ASCII';
}
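The function above compares the first characters against BOM byte values, which only works while the string still holds the file's raw bytes. A small sketch of the difference, assuming Node.js and its Buffer class; the four sample bytes are made up for illustration.

```javascript
// Four sample bytes: the UTF-8 BOM (EF BB BF) followed by the letter "A".
const bytes = Buffer.from([0xEF, 0xBB, 0xBF, 0x41]);

// "latin1" maps every byte to one character with the same code,
// so the BOM bytes stay visible to charCodeAt().
const binaryString = bytes.toString("latin1");
const rawCodes = [0, 1, 2].map((i) => binaryString.charCodeAt(i));

// Decoding as UTF-8 collapses the three BOM bytes into the single
// character U+FEFF, so a byte-by-byte comparison no longer matches.
const decoded = bytes.toString("utf8");
const decodedFirst = decoded.charCodeAt(0);

console.log(rawCodes.map((c) => c.toString(16)).join(" ")); // ef bb bf
console.log(decodedFirst.toString(16));                     // feff
```

In practice this means reading the file as bytes, for example with fs.readFileSync(path, "latin1") where path is whatever file is being checked, before handing the result to a BOM check.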
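To work out the character encoding of text in JavaScript, you can use the .charCodeAt() method to get the numeric value of each character and then apply some logic to the values that are returned. Note that .charCodeAt() returns the UTF-16 code units (0 to 65535) of an already decoded string, so checks about a file's encoding only make sense when the string holds the file's raw bytes, one byte per character.

For example, to check whether a string contains only characters that fit into a single byte (values up to 255, as in the Latin-1 flavour of ANSI), you could use the following approach:
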
function isANSI(str) {
    // str is an ordinary (decoded) string. Only the single-byte range is tested;
    // multi-byte code pages such as GBK, which Windows also calls ANSI, are not covered.
    for (let i = 0; i < str.length; i++) {
        // If any character has a value greater than 255,
        // the string does not fit into a single-byte encoding
        if (str.charCodeAt(i) > 255) return false;
    }

    // All characters have a value less than or equal to 255,
    // so every character fits into one byte
    return true;
}
To determine if a string of raw bytes (one byte per character) is valid UTF-8, you could use the following approach:

function isUTF8(str) {
    // str must hold raw bytes, one byte per character. This is a simplified
    // structural check: it does not reject every overlong or surrogate form.
    for (let i = 0; i < str.length; i++) {
        let c = str.charCodeAt(i);

        // Code points 0-127 are encoded as a single byte (0xxxxxxx)
        if (c >= 0 && c <= 127) continue;

        // Work out from the lead byte how many continuation bytes follow
        let extra;
        if (c >= 0xC2 && c <= 0xDF) {
            // Code points 128-2047 are encoded as two bytes
            extra = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            // Code points 2048-65535 are encoded as three bytes
            extra = 2;
        } else if (c >= 0xF0 && c <= 0xF4) {
            // Code points 65536-1114111 are encoded as four bytes
            extra = 3;
        } else {
            // Not a valid UTF-8 lead byte, so the data is not UTF-8
            return false;
        }

        // Each continuation byte must have the form 10xxxxxx (0x80-0xBF)
        for (let k = 1; k <= extra; k++) {
            const next = str.charCodeAt(i + k);
            if (!(next >= 0x80 && next <= 0xBF)) return false;
        }
        i += extra;
    }

    // Every byte belongs to a well-formed one- to four-byte sequence
    return true;
}
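As a cross-check for a hand-written validator like the one above, the standard TextDecoder API can report invalid UTF-8 when it is created with fatal: true. This is a sketch of that alternative technique, not the approach from the pasted snippet; the helper name isValidUtf8 is an assumption, and it takes a Uint8Array rather than a string.

```javascript
// Hypothetical helper: strict UTF-8 validation through the built-in TextDecoder.
// With { fatal: true }, decode() throws a TypeError on malformed input
// instead of silently inserting U+FFFD replacement characters.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

const utf8Bytes = new Uint8Array([0xE4, 0xB8, 0xAD]); // "中" encoded as UTF-8
const gbkBytes = new Uint8Array([0xD6, 0xD0]);        // "中" encoded as GBK
console.log(isValidUtf8(utf8Bytes), isValidUtf8(gbkBytes)); // true false
```

A file that passes this check is not guaranteed to be UTF-8 (pure ASCII passes too), but a file that fails it is certainly not valid UTF-8, which is often enough to tell UTF-8 apart from an ANSI code page.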
To determine if the data is encoded in UTF-8 with a BOM (Byte Order Mark), you could use the following approach:

function isUTF8BOM(str) {
    // str must hold raw bytes, one byte per character.
    // The BOM for UTF-8 is the byte sequence EF BB BF
    if (str.charCodeAt(0) === 0xEF && str.charCodeAt(1) === 0xBB && str.charCodeAt(2) === 0xBF) {
        // If the first three bytes match the BOM for UTF-8,
        // check if the rest of the data is encoded in UTF-8
        return isUTF8(str.substring(3));
    }

    // If the first three bytes do not match the BOM for UTF-8,
    // the data is not UTF-8 with a BOM (it may still be UTF-8 without one)
    return false;
}