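I'm getting ready to look for a job, so lately I've been reviewing how to write Java code.
When I got to the part on encoding conversion, I found that all my Chinese text had turned into 锟斤拷锟斤拷锟...
After changing String str="锟斤拷锟斤拷锟?"; to String str="你好";
the output still contained a 锟斤拷锟.
I had half forgotten how this comes about (and all my comments had turned into 锟斤拷锟斤拷锟 as well),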
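so I searched around and found this article:
锟斤拷? Why does converting between UTF-8 and GBK produce garbled text?
Its explanation:
Keyword: variable-length encoding
I suppose I must have accidentally opened the file as UTF-8, and after that
it could not be changed back.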
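Why Chinese text turns into garbage when a file is opened as UTF-8:
the file was originally encoded in GBK, where ASCII characters take one byte and every Chinese character takes exactly two bytes, whereas UTF-8 is variable-length and uses three bytes for common Chinese characters.
When the file is opened as UTF-8, English letters, digits and other ASCII characters are unchanged, because both encodings store them as the same single byte, but the Chinese characters are misread
and show up as garbled text.
Likewise, when a string encoded in GBK is decoded as UTF-8,
the data can be destroyed because UTF-8 is variable-length: GBK byte pairs are typically not valid UTF-8 sequences, so the decoder replaces them with the replacement character U+FFFD (�).
Even if the result is encoded as UTF-8 again, it is no longer the original string,
so decoding those bytes as GBK cannot give back the original GBK byte sequence. This is also where 锟斤拷 comes from: U+FFFD is EF BF BD in UTF-8, and two of them in a row, read as GBK, display as 锟斤拷.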
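To make the byte-length difference concrete, here is a minimal sketch (the class and method names are hypothetical) that compares how ASCII text and a Chinese character are encoded in GBK and UTF-8. It assumes the JDK ships the GBK charset, which standard JDK builds do, and that the source file is compiled as UTF-8 (javac -encoding UTF-8).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteLengthDemo {
    // GBK is assumed to be available (it is part of standard JDK builds)
    static final Charset GBK = Charset.forName("GBK");

    // True if the string encodes to exactly the same bytes in GBK and UTF-8
    static boolean sameBytes(String s) {
        return Arrays.equals(s.getBytes(GBK), s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // ASCII letters and digits: one byte each, identical in both encodings
        System.out.println(sameBytes("abc123"));                          // true
        // A Chinese character: 2 bytes in GBK, 3 bytes in UTF-8
        System.out.println("你".getBytes(GBK).length);                    // 2
        System.out.println("你".getBytes(StandardCharsets.UTF_8).length); // 3
        System.out.println(sameBytes("你好"));                            // false
    }
}
```

This is why opening a GBK file as UTF-8 leaves English text intact while the Chinese text breaks.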
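A minimal sketch of why this conversion is lossy (class and method names are hypothetical; GBK availability and a UTF-8-compiled source file are assumed): decoding GBK bytes as UTF-8 inserts U+FFFD, re-encoding does not restore the original bytes, and two U+FFFD characters read as GBK give the famous 锟斤拷.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecodeDemo {
    static final Charset GBK = Charset.forName("GBK");

    // The mistake: GBK bytes decoded as UTF-8.
    // Invalid UTF-8 sequences are silently replaced with U+FFFD here.
    static String decodeGbkAsUtf8(String original) {
        return new String(original.getBytes(GBK), StandardCharsets.UTF_8);
    }

    // Attempted undo: encode the garbled string as UTF-8 again
    static byte[] roundTrip(String original) {
        return decodeGbkAsUtf8(original).getBytes(StandardCharsets.UTF_8);
    }

    // U+FFFD is EF BF BD in UTF-8; two of them decoded as GBK become "锟斤拷"
    static String replacementAsGbk() {
        byte[] utf8 = "\uFFFD\uFFFD".getBytes(StandardCharsets.UTF_8);
        return new String(utf8, GBK);
    }

    public static void main(String[] args) {
        byte[] original = "你好".getBytes(GBK);
        byte[] back = roundTrip("你好");
        System.out.println(Arrays.toString(original));
        System.out.println(Arrays.toString(back));
        System.out.println(Arrays.equals(original, back)); // false: original bytes are gone
        System.out.println(replacementAsGbk());            // 锟斤拷
    }
}
```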
Below are my programs.
The garbled program:
@Test
public void test01() throws Exception {
String str="锟斤拷锟斤拷锟?";
byte[] bs = str.getBytes();//锟斤拷锟斤拷锟斤拷默锟斤拷GBK
System.out.println("bs:"+Arrays.toString(bs));
byte[] bsGBK = str.getBytes("GBK");//锟斤拷锟斤拷锟街达拷小写
System.out.println("bsGBK:"+Arrays.toString(bsGBK));
byte[] bsUTF8 = str.getBytes("UTf-8");//锟斤拷锟斤拷锟街达拷小写
System.out.println("bsUTF8:"+Arrays.toString(bsUTF8));
String s1=new String(bsUTF8);// 默锟较憋拷锟诫方式-锟斤拷锟斤拷锟斤拷默锟斤拷为GBK
System.out.println("s1:"+s1);
String s2=new String(bsUTF8,"GBK");//UTF-8锟侥凤拷锟斤拷锟斤拷GBK锟斤拷锟斤拷
System.out.println("s2:"+s2);
String s3=new String(bsGBK,"UTF-8");//GBK锟侥凤拷锟斤拷锟斤拷UTF-8锟斤拷锟斤拷
System.out.println("s3:"+s3);
String s6=new String(bsGBK);//GBK锟侥凤拷锟斤拷锟斤拷UTF-8锟斤拷锟斤拷
System.out.println("s6:"+s6);
//锟斤拷锟斤拷锟斤拷耄篏BK锟斤拷锟斤拷锟経TF-8锟斤拷锟斤拷
//1.UTF-8锟侥凤拷锟斤拷锟斤拷GBK锟斤拷锟斤拷锟紾BK锟斤拷锟斤拷锟経TF-8锟斤拷锟斤拷
byte[] ns2=s2.getBytes("GBK");
String s4=new String(ns2,"UTF-8");
System.out.println("s4:"+s4);
byte[] s4GBK = s4.getBytes("GBK");
System.out.println("s4GBK:"+Arrays.toString(s4GBK));
//1.GBK锟侥凤拷锟斤拷锟斤拷UTF-8锟斤拷锟斤拷锟経TF-8锟斤拷锟斤拷锟紾BK锟斤拷锟斤拷
byte[] ns3=s3.getBytes("UTF-8");
String s5=new String(ns3,"GBK");
System.out.println("s5:"+s5);
// byte[] s5GBK = s5.getBytes("GBK");
// System.out.println(Arrays.toString(s5GBK));
}
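Calls such as str.getBytes() and new String(bsUTF8) do not name a charset, so their results depend on the platform default charset. A minimal sketch for checking that default on a given machine (the class and method names are hypothetical). As far as I know, since JDK 18 (JEP 400) the default is UTF-8 on all platforms, while older JDKs on Chinese-locale Windows typically default to GBK.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // getBytes() with no argument uses the platform default charset
    static boolean usesDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println("defaultCharset: " + Charset.defaultCharset());
        System.out.println("file.encoding:  " + System.getProperty("file.encoding"));
        System.out.println(usesDefault("你好")); // true
    }
}
```

Passing an explicit charset (for example StandardCharsets.UTF_8) avoids depending on this setting.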
The program as I reconstructed it from my own understanding:
import java.util.Arrays;
import org.junit.Test; // JUnit 4 assumed

public class EncodingTest { // class name is illustrative

    @Test
    public void test01() throws Exception {
        String str = "你好";
        byte[] bs = str.getBytes(); // platform default charset: UTF-8 at this point
        System.out.println("bs:" + Arrays.toString(bs));
        byte[] bsGBK = str.getBytes("GBK"); // GBK encoding
        System.out.println("bsGBK:" + Arrays.toString(bsGBK));
        byte[] bsUTF8 = str.getBytes("UTF-8"); // UTF-8 encoding
        System.out.println("bsUTF8:" + Arrays.toString(bsUTF8));
        String s1 = new String(bsUTF8); // decode with the platform default charset
        System.out.println("s1:" + s1);
        String s2 = new String(bsUTF8, "GBK"); // UTF-8 bytes -> decoded as GBK
        System.out.println("s2:" + s2);
        String s3 = new String(bsGBK, "UTF-8"); // GBK bytes -> decoded as UTF-8
        System.out.println("s3:" + s3);
        String s6 = new String(bsGBK); // decode with the platform default charset
        System.out.println("s6:" + s6);
        // Recovery 1: (UTF-8 encode -> GBK decode) undone by GBK encode -> UTF-8 decode
        byte[] ns2 = s2.getBytes("GBK");
        String s4 = new String(ns2, "UTF-8");
        System.out.println("s4:" + s4);
        byte[] s4GBK = s4.getBytes("GBK");
        System.out.println("s4GBK:" + Arrays.toString(s4GBK)); // check: same as bsGBK
        // Recovery 2: (GBK encode -> UTF-8 decode) undone by UTF-8 encode -> GBK decode
        byte[] ns3 = s3.getBytes("UTF-8");
        String s5 = new String(ns3, "GBK");
        System.out.println("s5:" + s5); // result: 锟斤拷锟�
        // Reason: UTF-8 is variable-length; decoding the GBK bytes as UTF-8 (s3) already
        // replaced the invalid sequences with U+FFFD, so the original bytes are lost
        // byte[] s5GBK = s5.getBytes("GBK");
        // System.out.println(Arrays.toString(s5GBK));
    }
}
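Recovery 1 works for 你好 because the mistaken GBK decode of its six UTF-8 bytes happens to be lossless: the bytes split into three pairs that GBK can decode and re-encode unchanged. That is not guaranteed. A minimal sketch (class and method names are hypothetical) showing that with an odd number of bytes, such as the single character 你 (three UTF-8 bytes), the leftover last byte is replaced during the GBK decode and the round trip fails:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RecoveryDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Recovery 1 from the program above, applied to an arbitrary string:
    // UTF-8 bytes wrongly decoded as GBK, then GBK encode -> UTF-8 decode
    static String recover(String original) {
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), GBK);
        return new String(garbled.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 6 UTF-8 bytes -> 3 GBK pairs: recovered as 你好
        System.out.println(recover("你好"));
        // 3 UTF-8 bytes -> the trailing byte is lost, so this does not print 你
        System.out.println(recover("你"));
    }
}
```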