$substrCP (aggregation)
Definition
$substrCP
Returns the substring of a string. The substring starts with the character at the specified UTF-8 code point (CP) index (zero-based) in the string for the number of code points specified.
$substrCP
has the following operator expression syntax:{ $substrCP: [ <string expression>, <code point index>, <code point count> ] } FieldTypeDescriptionstring expression
string
The string from which the substring will be extracted.
string expression
can be any valid expression as long as it resolves to a string. For more information on expressions, see Expression Operators.If the argument resolves to a value of
null
or refers to a field that is missing,$substrCP
returns an empty string.If the argument does not resolve to a string or
null
nor refers to a missing field,$substrCP
returns an error.code point index
number
Indicates the starting point of the substring.
code point index
can be any valid expression as long as it resolves to a non-negative integer.code point count
number
Can be any valid expression as long as it resolves to a non-negative integer or number that can be represented as an integer (such as 2.0).
ExampleResults{ $substrCP: [ "abcde", 1, 2 ] }
"bc"
{ $substrCP: [ "Hello World!", 6, 5 ] }
"World"
{ $substrCP: [ "cafétéria", 0, 5 ] }
"cafét"
{ $substrCP: [ "cafétéria", 5, 4 ] }
"éria"
{ $substrCP: [ "cafétéria", 7, 3 ] }
"ia"
{ $substrCP: [ "cafétéria", 3, 1 ] }
"é"
Behavior
The $substrCP
operator uses the code points to extract
the substring. This behavior differs from the
$substrBytes
operator which extracts the substring
by the number of bytes, where each character uses between one and four
bytes.
Example
Single-Byte Character Set
Consider an inventory
collection with the following documents:
db.inventory.insertMany( [ { _id: 1, item: "ABC1", quarter: "13Q1", description: "product 1" }, { _id: 2, item: "ABC2", quarter: "13Q4", description: "product 2" }, { _id: 3, item: "XYZ1", quarter: "14Q2", description: null } ] )
The following operation uses the $substrCP
operator to
separate the quarter
value into a yearSubstring
and a
quarterSubstring
. The quarterSubstring
field represents the
rest of the string from the specified byte index
following the
yearSubstring
. It is calculated by subtracting the byte index
from the length of the string using $strLenCP
.
db.inventory.aggregate( [ { $project: { item: 1, yearSubstring: { $substrCP: [ "$quarter", 0, 2 ] }, quarterSubtring: { $substrCP: [ "$quarter", 2, { $subtract: [ { $strLenCP: "$quarter" }, 2 ] } ] } } } ] )
The operation returns the following results:
{ _id: 1, item: "ABC1", yearSubstring: "13", quarterSubtring: "Q1" } { _id: 2, item: "ABC2", yearSubstring: "13", quarterSubtring: "Q4" } { _id: 3, item: "XYZ1", yearSubstring: "14", quarterSubtring: "Q2" }
Single-Byte and Multibyte Character Set
Create a food
collection with the following documents:
db.food.insertMany( [ { _id: 1, name: "apple" }, { _id: 2, name: "banana" }, { _id: 3, name: "éclair" }, { _id: 4, name: "hamburger" }, { _id: 5, name: "jalapeño" }, { _id: 6, name: "pizza" }, { _id: 7, name: "tacos" }, { _id: 8, name: "寿司sushi" } ] )
The following example uses the $substrCP
operator to create a three
byte menuCode
from the name
value:
db.food.aggregate( [ { $project: { "name": 1, "menuCode": { $substrCP: [ "$name", 0, 3 ] } } } ] )
The operation returns the following results:
{ _id: 1, name: "apple", menuCode: "app" } { _id: 2, name: "banana", menuCode: "ban" } { _id: 3, name: "éclair", menuCode: "écl" } { _id: 4, name: "hamburger", menuCode: "ham" } { _id: 5, name: "jalapeño", menuCode: "jal" } { _id: 6, name: "pizza", menuCode: "piz" } { _id: 7, name: "tacos", menuCode: "tac" } { _id: 8, name: "寿司sushi", menuCode: "寿司s" }