Updated February 8, 2001

UTF-8

I needed to understand UTF-8 while working with iCalendar, so these are my notes on what I looked up.

Overview

UTF stands for UCS Transformation Format. UTF-8 is one of the character encodings based on ISO/IEC 10646 (Unicode). It is written as a sequence of 8-bit octets, and the number of octets per character is variable. It can represent every character up to UCS-4.
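As a small illustration of the variable length, the sketch below uses Python's built-in "utf-8" codec to list the octets of a few characters. The helper name utf8_octets and the sample characters are my own choices for illustration, not something taken from the specifications.

```python
def utf8_octets(ch):
    """Return the UTF-8 octets of one character as a list of integers."""
    return list(ch.encode("utf-8"))


# Illustrative samples, one for each sequence length from 1 to 4 octets.
samples = ["A", "\u00e9", "\u3042", "\U00010000"]

for ch in samples:
    octets = utf8_octets(ch)
    hex_form = " ".join(f"{b:02X}" for b in octets)
    print(f"U+{ord(ch):04X}: {len(octets)} octet(s): {hex_form}")
```

For example, HIRAGANA LETTER A (U+3042) comes out as the three octets E3 81 82, while "A" stays a single octet.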

Other encodings in the same family are UTF-7, which uses 7-bit units and is intended for mail, UTF-16, which uses 16-bit units, and UTF-32, which is still being drafted.
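To compare the members of this family side by side, here is a minimal sketch that encodes one short string with Python's built-in codecs. The codec names "utf-7", "utf-8", "utf-16-be" and "utf-32-be" are Python's; the helper encode_all and the sample text are assumptions made only for this example.

```python
# Codec names as Python spells them; the "-be" variants write no byte order mark.
ENCODINGS = ("utf-7", "utf-8", "utf-16-be", "utf-32-be")


def encode_all(text):
    """Encode text with each codec in ENCODINGS and return a dict of bytes."""
    return {name: text.encode(name) for name in ENCODINGS}


# Sample text: "A" followed by HIRAGANA LETTER A (U+3042).
text = "A\u3042"

for name, data in encode_all(text).items():
    hex_form = " ".join(f"{b:02X}" for b in data)
    print(f"{name:10} {len(data)} octet(s): {hex_form}")
```

The UTF-7 output contains only 7-bit values, which is what makes it usable in mail that is not 8-bit clean.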

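The characters that Unicode implementations such as Windows currently cover are at the UCS-2 level (U+0000-U+FFFF), so UTF-8 handles them comfortably, using at most three octets per character.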

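For language communities that mostly use ASCII (0x00 to 0x7F), such as English, UTF-8 can be handled almost exactly like plain ASCII and also saves memory compared with Unicode (UCS-2 or UCS-4). For cultures such as Japanese, where conversion is unavoidable anyway, it is one more tedious conversion step to deal with. In memory, a character that takes 2 bytes in UCS-2 will, I expect, often grow to 3 bytes in UTF-8.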
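The size trade-off can be checked with a short sketch. It assumes Python's "utf-16-le" codec as a stand-in for UCS-2 (the two agree for characters in U+0000-U+FFFF); the function name encoded_sizes and the sample strings are illustrative only.

```python
def encoded_sizes(text):
    """Return the byte count of text in a UCS-2-like form and in UTF-8."""
    return {
        # UTF-16 matches UCS-2 for characters inside U+0000..U+FFFF.
        "UCS-2": len(text.encode("utf-16-le")),
        "UTF-8": len(text.encode("utf-8")),
    }


english = "calendar"                # 8 ASCII characters
japanese = "\u65e5\u672c\u8a9e"     # 3 kanji, the word for "Japanese language"

print("english ", encoded_sizes(english))
print("japanese", encoded_sizes(japanese))
```

The ASCII string takes 16 bytes as UCS-2 and 8 as UTF-8, while the three kanji take 6 bytes as UCS-2 and 9 as UTF-8.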

Specification documents

Specification

I reorganized the conversion mapping described in the RFC and the Unicode specification into a single table, in a form that I could digest myself.

UTF-8 Byte Sequences

UCS                                                                    UTF-8
       Code Points            Scalar Value                             1st Byte  2nd Byte  3rd Byte  4th Byte  5th Byte  6th Byte
UCS-2  U+0000..U+007F         0000 0000 0yyy xxxx                      00..7F
                                                                       0yyyxxxx
       U+0080..U+07FF         0000 0zzz yyyy xxxx                      C2..DF    80..BF
                                                                       110zzzyy  10yyxxxx
       U+0800..U+0FFF         0000 1zzz yyyy xxxx                      E0        A0..BF    80..BF
                                                                       11100000  101zzzyy  10yyxxxx
       U+1000..U+D7FF,        uuuu zzzz yyyy xxxx                      E1..EF    80..BF    80..BF
       U+E000..U+FFFF *                                                1110uuuu  10zzzzyy  10yyxxxx
UCS-4  U+10000..U+3FFFF       00vv uuuu zzzz yyyy xxxx                 F0        90..BF    80..BF    80..BF
                                                                       11110000  10vvuuuu  10zzzzyy  10yyxxxx
       U+40000..U+FFFFF       vvvv uuuu zzzz yyyy xxxx                 F1..F3    80..BF    80..BF    80..BF
                                                                       111100vv  10vvuuuu  10zzzzyy  10yyxxxx
       U+100000..U+10FFFF     0001 0000 uuuu zzzz yyyy xxxx            F4        80..8F    80..BF    80..BF
                                                                       11110100  1000uuuu  10zzzzyy  10yyxxxx
       U+110000..U+1FFFFF     0001 vvvv uuuu zzzz yyyy xxxx            F4..F7    80..BF ** 80..BF    80..BF
                                                                       111101vv  10vvuuuu  10zzzzyy  10yyxxxx
       U+200000..U+3FFFFFF    00ss wwww vvvv uuuu zzzz yyyy xxxx       F8..FB    80..BF ** 80..BF    80..BF    80..BF
                                                                       111110ss  10wwwwvv  10vvuuuu  10zzzzyy  10yyxxxx
       U+4000000..U+7FFFFFFF  0ttt ssss wwww vvvv uuuu zzzz yyyy xxxx  FC..FD    80..BF ** 80..BF    80..BF    80..BF    80..BF
                                                                       1111110t  10ttssss  10wwwwvv  10vvuuuu  10zzzzyy  10yyxxxx

* U+D800..U+DFFF: surrogate code points. They are excluded, so after ED the 2nd byte is limited to 80..9F.
** The 2nd byte is narrower after the lowest 1st byte of the row: 90..BF after F4 (80..8F belongs to the row above), 88..BF after F8, and 84..BF after FC.
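The bit layout in the table above can be sketched as a small encoder. This is only my reading of the table under stated assumptions, not code from the RFC or the Unicode specification: the function name utf8_encode is hypothetical, and raising ValueError for surrogates and out-of-range values is a choice made for this example.

```python
def utf8_encode(code_point):
    """Encode one code point as a list of octets, using 1 to 6 octets."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("code point outside U+0000..U+7FFFFFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption for this sketch: surrogate code points are rejected.
        raise ValueError("surrogate code points are not encoded")
    if code_point <= 0x7F:
        return [code_point]  # 0yyyxxxx, identical to ASCII

    # (upper limit of the range, number of octets, marker bits of the 1st byte)
    ranges = (
        (0x7FF, 2, 0xC0),       # 110.....
        (0xFFFF, 3, 0xE0),      # 1110....
        (0x1FFFFF, 4, 0xF0),    # 11110...
        (0x3FFFFFF, 5, 0xF8),   # 111110..
        (0x7FFFFFFF, 6, 0xFC),  # 1111110.
    )
    for limit, length, lead in ranges:
        if code_point <= limit:
            break

    octets = []
    # Peel off 6 bits at a time for the trailing 10xxxxxx octets,
    # starting from the least significant end.
    for _ in range(length - 1):
        octets.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # Whatever bits remain go into the 1st byte after its marker bits.
    octets.append(lead | code_point)
    octets.reverse()
    return octets


for cp in (0x41, 0x3042, 0x10FFFF, 0x200000):
    print(f"U+{cp:04X}:", " ".join(f"{b:02X}" for b in utf8_encode(cp)))
```

Python's built-in codec only accepts code points up to U+10FFFF, so the result can be compared against chr(cp).encode("utf-8") for the 1- to 4-octet forms, while the 5- and 6-octet forms can only be checked against the table itself.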

Related resources

RFCs

Unicode.org

Other


Copyright (c) 2001 NOMURA Mahito <[email protected]>
Created January 2001