Strings, bytes, runes and characters in Go
Last Updated: 05 Feb, 2025
In Go, strings are sequences of bytes, not characters. Understanding bytes, runes, and encoding is crucial for handling text correctly. This article explores their differences and key concepts that every developer should know.
1. String
A string in Go is a read-only sequence of bytes. Strings are immutable: once a string is created, its bytes cannot be changed in place. Any operation that appears to modify a string, such as concatenation or replacement, actually builds a new string and leaves the original untouched.
Here’s an example of a string:
package main

import "fmt"

func main() {
	var str = "Hello, World!"
	fmt.Println(str) // Output: Hello, World!
}
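The immutability described above can be seen in a minimal sketch. The helper name withFirstByte is hypothetical and exists only for illustration; it changes a copy of the bytes and leaves the original string alone.

```go
package main

import "fmt"

// withFirstByte (hypothetical helper) returns a copy of s whose first byte
// is replaced by b. Converting to []byte copies the data, so s itself is
// never modified.
func withFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // copy of the string's bytes
	buf[0] = b
	return string(buf) // new string built from the modified copy
}

func main() {
	original := "Hello, World!"
	// original[0] = 'J' would not compile: string bytes cannot be assigned.
	changed := withFirstByte(original, 'J')
	fmt.Println(original) // Hello, World!
	fmt.Println(changed)  // Jello, World!
}
```

Replacing a single byte like this is only safe for ASCII text; for multi-byte characters the replacement would have to work on whole UTF-8 sequences.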
String Literals vs. Byte Slices
In Go, source files are UTF-8, so a string literal (enclosed in double quotes) holds UTF-8 text unless it contains byte-level escapes such as \xff. A byte slice is just a collection of arbitrary bytes, which could represent text in any encoding scheme, not necessarily UTF-8.
Example:
// String literal
str := "Hello, World!" // This is a UTF-8 encoded string
// Byte slice
bytes := []byte{72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33} // Same content, raw bytes
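To make the comparison concrete, here is a small sketch that checks whether a string and a byte slice hold the same bytes, and uses utf8.Valid from the standard library to show that a byte slice is not required to be valid UTF-8. The helper sameBytes and the Latin-1 sample are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sameBytes (hypothetical helper) reports whether the string s and the
// byte slice b hold identical bytes.
func sameBytes(s string, b []byte) bool {
	return s == string(b)
}

func main() {
	str := "Hello, World!"
	raw := []byte{72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33}
	fmt.Println(sameBytes(str, raw)) // true

	// "café" encoded as ISO-8859-1: the final byte 0xE9 is not valid UTF-8.
	latin1 := []byte{0x63, 0x61, 0x66, 0xE9}
	fmt.Println(utf8.Valid(raw))    // true
	fmt.Println(utf8.Valid(latin1)) // false
}
```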
Understanding UTF-8 and String Encoding
Go source code is UTF-8, so string literals are UTF-8 encoded by default, and most of the standard library treats strings as UTF-8 text. Each character therefore occupies one or more bytes, depending on its Unicode code point.
UTF-8 Encoding
UTF-8 is a variable-length character encoding that uses one to four bytes for each character. Characters from the ASCII set (U+0000 to U+007F) use a single byte, while characters from other scripts like Chinese or emojis can require multiple bytes.
For example, the character A (U+0041) is represented as the single byte 0x41 in UTF-8. However, a character like ⌘ (U+2318) takes three bytes (e2 8c 98).
package main

import "fmt"

func main() {
	str := "⌘" // Unicode character U+2318 (Place of Interest)
	fmt.Println(len(str)) // Output: 3 because '⌘' is 3 bytes in UTF-8
}
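The variable-length encoding can be inspected directly with the unicode/utf8 package. The sketch below encodes a few sample runes and prints their bytes; the helper name utf8Bytes and the choice of sample characters are assumptions for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Bytes (hypothetical helper) returns the UTF-8 encoding of r.
func utf8Bytes(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest possible encoding
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	// 'A', U+00E9 (é), U+2318 (⌘) and U+1F600 (an emoji) need 1 to 4 bytes.
	for _, r := range []rune{'A', '\u00e9', '\u2318', '\U0001F600'} {
		fmt.Printf("%c U+%04X -> % x (%d bytes)\n", r, r, utf8Bytes(r), utf8.RuneLen(r))
	}
}
```

The first three lines of output show the byte sequences 41, c3 a9 and e2 8c 98, matching the values quoted above for A and ⌘.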
Why Indexing a String in Go Doesn’t Return a Character
Go strings are slices of bytes, meaning that when you index a string, you get the individual byte values, not the characters. This can be confusing because, in many programming languages, strings are treated as sequences of characters. In Go, however, a character could span more than one byte, as seen in UTF-8 encoded strings.
package main

import "fmt"

func main() {
	str := "⌘"
	// %c treats the byte 0xE2 as code point U+00E2, so it prints â, not ⌘.
	fmt.Printf("Character at position 0: %c\n", str[0]) // Output: Character at position 0: â
	fmt.Printf("Character at position 0 (byte value): %d\n", str[0]) // Output: 226
}
Here, we see that str[0] returns the first byte (226), but that byte alone doesn't represent the character ⌘, which is a three-byte sequence.
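When the character at the start of a string is needed rather than its first byte, one option is utf8.DecodeRuneInString from the standard library, which returns the decoded rune and its width in bytes. The wrapper name firstRune below is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune (hypothetical helper) decodes the first UTF-8 sequence in s and
// returns the rune together with the number of bytes it occupies.
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	str := "⌘"
	fmt.Println(str[0]) // 226: only the first byte

	r, size := firstRune(str)
	fmt.Printf("%c %d\n", r, size) // ⌘ 3

	// Slicing on a rune boundary keeps the character intact.
	fmt.Println(str[:size]) // ⌘
}
```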
2. Runes
Go provides the rune type to represent Unicode code points. A rune is an alias for the int32 type, and it holds a single code point regardless of how many bytes that code point takes in UTF-8 encoding.
Rune and Code Point
In the context of Unicode, a code point is a unique identifier for each character. A rune in Go is a 32-bit integer that represents a Unicode code point. For instance, the ⌘ symbol has a Unicode code point of U+2318, which is represented as a rune in Go.
Example:
package main

import "fmt"

func main() {
	// Declare a rune (character constant)
	var r rune = '⌘'
	// Print the rune value and its Unicode code point
	fmt.Printf("Rune value: %c\n", r) // Output: Rune value: ⌘
	fmt.Printf("Unicode code point: U+%04X\n", r) // Output: Unicode code point: U+2318
}
For-Range Loop with Runes
Go has built-in support for iterating over strings with the for range loop, which decodes one UTF-8 encoded rune per iteration and so handles multi-byte characters correctly.
package main

import "fmt"

func main() {
	str := "日本語" // Japanese characters
	// Using for-range to loop over runes
	for i, runeValue := range str {
		fmt.Printf("Rune %c at byte position %d\n", runeValue, i)
	}
}
Output:
Rune 日 at byte position 0
Rune 本 at byte position 3
Rune 語 at byte position 6
In this example, for range iterates over the string, decoding the UTF-8 bytes into Unicode code points (runes) and reporting the byte index at which each one starts.
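Two details of for range are worth seeing in isolation: the index it yields is a byte offset, and bytes that are not valid UTF-8 are reported as the replacement rune U+FFFD, one byte at a time. The helper runeStarts and the sample string with an invalid byte are assumptions made for this sketch.

```go
package main

import "fmt"

// runeStarts (hypothetical helper) returns the byte offset at which each
// rune of s begins.
func runeStarts(s string) []int {
	var starts []int
	for i := range s {
		starts = append(starts, i)
	}
	return starts
}

func main() {
	fmt.Println(runeStarts("日本語")) // [0 3 6]

	// \xff is not valid UTF-8, so range yields U+FFFD for that byte.
	for i, r := range "a\xffb" {
		fmt.Printf("%d: %U\n", i, r)
	}
	// 0: U+0061
	// 1: U+FFFD
	// 2: U+0062
}
```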
Bytes, Runes, and Characters in Go
What’s the Difference Between Bytes, Runes, and Characters?
- Bytes: A byte represents 8 bits of data. In a UTF-8 string, each byte is either a complete ASCII character or one part of a multi-byte character.
- Runes: A rune is an alias for int32 and represents a single Unicode code point. It's used in Go to handle characters that may span more than one byte in UTF-8.
- Characters: While we often think of characters as being individual letters or symbols, the concept is fuzzy in computing because characters can be composed of one or more code points (like accented characters).
In Go:
- A string holds arbitrary bytes, and indexing into it retrieves bytes, not individual characters.
- A rune holds a Unicode code point, which usually, but not always, corresponds to a single character.
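The fuzziness of "character" shows up with accented letters. As a sketch, the same visible letter é can be written as one precomposed code point (U+00E9) or as e followed by a combining acute accent (U+0301). The helper runeCount is a hypothetical thin wrapper around utf8.RuneCountInString.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeCount (hypothetical helper) returns the number of Unicode code points in s.
func runeCount(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	combining := "e\u0301"  // e followed by COMBINING ACUTE ACCENT

	fmt.Println(len(precomposed), runeCount(precomposed)) // 2 1
	fmt.Println(len(combining), runeCount(combining))     // 3 2

	// Both render as the same letter, yet the strings differ byte for byte.
	fmt.Println(precomposed == combining) // false
}
```

So byte count, rune count, and the number of characters a reader perceives can all be different numbers for the same text.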
Practical Example: Converting Between Runes, Bytes, and Strings
Here’s a practical example where we convert between runes, bytes, and strings in Go:
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	// Original string
	str := "Hello, 世界" // "Hello, " followed by the Chinese word for "world"

	// Convert string to byte slice
	bytes := []byte(str)
	fmt.Printf("Byte slice: %x\n", bytes)

	// Iterate over the string using a for-range loop
	fmt.Println("Iterating over string (runes):")
	for i, runeValue := range str {
		fmt.Printf("Rune: %c, at byte position %d\n", runeValue, i)
	}

	// Convert string to rune slice
	runes := []rune(str)
	fmt.Printf("Rune slice: %v\n", runes)

	// Convert the rune slice back to a string
	backToString := string(runes)
	fmt.Printf("Converted back to string: %s\n", backToString)

	// Find the length of the string and the number of runes
	fmt.Printf("String length (in bytes): %d\n", len(str))
	fmt.Printf("Number of runes: %d\n", utf8.RuneCountInString(str))
}
Output:
Byte slice: 48656c6c6f2c20e4b896e7958c
Iterating over string (runes):
Rune: H, at byte position 0
Rune: e, at byte position 1
Rune: l, at byte position 2
Rune: l, at byte position 3
Rune: o, at byte p...
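One place where the string-to-rune-slice conversion from the example above is useful is shortening text without cutting a multi-byte character in half. The following is a minimal sketch; the helper name truncateRunes is hypothetical, and converting the whole string to []rune is chosen for clarity rather than efficiency.

```go
package main

import "fmt"

// truncateRunes (hypothetical helper) returns at most the first n runes of s.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	runes := []rune(s)
	if len(runes) <= n {
		return s
	}
	return string(runes[:n])
}

func main() {
	str := "Hello, 世界"

	// Slicing by bytes cuts 世 after the first of its three bytes,
	// leaving invalid UTF-8 at the end of the result.
	fmt.Printf("%q\n", str[:8])

	// Slicing by runes keeps whole characters.
	fmt.Println(truncateRunes(str, 8)) // Hello, 世
}
```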
In Go, understanding strings, bytes, and runes is crucial for handling text, especially in multilingual applications. Strings store arbitrary bytes, while runes represent Unicode code points. Byte-level operations are enough for pure ASCII text, but working with runes is essential for multi-byte UTF-8 and international text. Using Go’s built-in types and libraries, you can efficiently convert and manipulate text while ensuring accuracy across different languages.