MINOR: [Docs][MATLAB] update README failing example code snippets (apache#45973)
### Rationale for this change
MATLAB currently has multiple "example code" sections in the README `matlab/doc/matlab_interface_for_apache_arrow_design.md` that either use deprecated endpoints or were adapted from other languages whose endpoints MATLAB does not share. Example code that works out of the box not only helps new developers get started with Arrow, but also helps experienced developers verify that their development setup is in working order.
The broken endpoints described include:
#### Use Case 2
1. `arrow.Table(Var1, Var2, Var3)`
a. The current implementation of `arrow.Table` cannot create a table from multiple `arrow.array`s. All examples inside of `matlab/test/arrow/tabular/tTable.m` create an Arrow table by first creating a MATLAB table.
2. `arrow.FeatherTableWriter`
3. `arrow.FeatherTableReader`
4. `arrow.matlab2arrow`
#### Use Case 3
1. `importFromCDataInterface`
2. `exportToCDataInterface`
3. `arrow.ipcwrite`
4. Python lines using `_import_from_c` or `_export_to_c`
a. While these are functions inside of Python, MATLAB [has syntax that does not allow for underscores to begin variable or function names](https://www.mathworks.com/help/matlab/matlab_prog/variable-names.html). Therefore, running these in MATLAB using the Python in MATLAB module will result in errors.
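The naming rule behind this restriction can be sketched as a small check. This is a minimal Python illustration, not part of the change itself: the helper name `is_valid_matlab_identifier` is hypothetical, the 63-character limit is assumed to match the value MATLAB reports from `namelengthmax`, and reserved keywords are not checked.

```python
import re

# Assumption: 63 is the maximum identifier length reported by MATLAB's namelengthmax.
MATLAB_NAME_MAX_LENGTH = 63

# A MATLAB identifier starts with a letter, followed by letters, digits, or underscores.
_MATLAB_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_matlab_identifier(name: str) -> bool:
    """Return True if `name` follows MATLAB's naming rule (keywords are not checked)."""
    if len(name) > MATLAB_NAME_MAX_LENGTH:
        return False
    return _MATLAB_IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("_import_from_c", "_export_to_c", "import_from_c"):
        print(candidate, is_valid_matlab_identifier(candidate))
```

Both PyArrow method names fail the check because of the leading underscore, which is why the updated examples place those calls in a separate Python file instead of calling them from MATLAB syntax.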
### What changes are included in this PR?
I changed the README example code in use case 2 and use case 3 so that it can be copied and pasted into MATLAB and works out of the box, replacing the broken endpoints listed above.
### Are these changes tested?
Yes. Each changed code example was run in MATLAB to verify that it works out of the box.
### Are there any user-facing changes?
No.
Lead-authored-by: Patrick Walsh <[email protected]>
Co-authored-by: Patrick Walsh <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
Changed file: `matlab/doc/matlab_interface_for_apache_arrow_design.md` (68 additions & 39 deletions)
@@ -109,19 +109,7 @@ ans =
To serialize MATLAB data to a file on disk (e.g. Feather, Parquet), a MATLAB developer could start by constructing an `arrow.Table` using one of several different approaches.

-They could individually compose the table from a set of `arrow.Array` objects (one for each table variable).
-Alternatively, they could directly convert from an existing MATLAB `table` to an `arrow.Table` using a function like `arrow.matlab2arrow` to convert between an existing MATLAB `table` and an `arrow.Table`.
+They could directly convert from an existing MATLAB `table` to an `arrow.tabular.Table` using a function like `arrow.table`.

###### Example Code:
```matlab
@@ -131,33 +119,43 @@ Alternatively, they could directly convert from an existing MATLAB `table` to an
>> Density = [10.2; 20.5; 11.2; 13.7; 17.8];

->> T = table(Weight, Radius, Density); % Create a MATLAB table
+% Create a MATLAB `table`
+>> T = table(Weight, Radius, Density);

->> AT = arrow.matlab2arrow(T); % Create an arrow.Table
+% Create an `arrow.tabular.Table` from the MATLAB `table`
+>> AT = arrow.table(T);
```
-To serialize the `arrow.Table`, `AT`, to a file (e.g. Feather) on disk, the user could then instantiate an `arrow.FeatherTableWriter`.
+
+To serialize the `arrow.Table`, `AT`, to a file (e.g. Feather) on disk, the user could then instantiate an `arrow.internal.io.feather.Writer`.
-The Feather file could then be read and operated on by an external process like Rust or Go. To read it back into MATLAB after modification by another process, the user could instantiate an `arrow.FeatherTableReader`.
+The Feather V1 file could then be read and operated on by an external process like Rust or Go. To read it back into MATLAB, the user could instantiate an `arrow.internal.io.feather.Reader`.
% Create a MATLAB `table` from the `arrow.tabular.RecordBatch`
+>> AT = table(newBatch);
```
#### Advanced MATLAB User Workflow for Implementing Support for Writing to Feather Files

-To add support for writing to Feather files, an advanced MATLAB user could use the MATLAB and C++ APIs offered by the MATLAB Interface for Apache Arrow to create `arrow.FeatherTableWriter`.
+To add support for writing to Feather V1 files, an advanced MATLAB user could use the MATLAB and C++ APIs offered by the MATLAB Interface for Apache Arrow to create `arrow.internal.io.feather.Writer`.

They would need to author a [MEX function] (e.g. `featherwriteMEX`), which can be called directly by MATLAB code. Within their MEX function, they could use `arrow::matlab::unwrap_table` to convert between the MATLAB representation of the Arrow memory (`arrow.Table`) and the equivalent C++ representation (`arrow::Table`). Once the `arrow.Table` has been "unwrapped" into a C++ `arrow::Table`, it can be passed to the appropriate Arrow C++ library API for writing to a Feather file (`arrow::ipc::feather::WriteTable`).

-An analogous workflow could be followed to create `arrow.FeatherTableReader` to enable reading from Feather files.
+An analogous workflow could be followed to create `arrow.internal.io.feather.Reader` to enable reading from Feather V1 files.

#### Enabling High-Level Workflows
@@ -179,47 +177,67 @@ Roughly speaking, local memory sharing workflows can be divided into two categor

To share a MATLAB `arrow.Array` with PyArrow efficiently, a user could use the `exportToCDataInterface` method to export the Arrow memory wrapped by an `arrow.Array` to the C Data Interface format, consisting of two C-style structs, [`ArrowArray`] and [`ArrowSchema`], which represent the Arrow data and associated metadata.

-Memory addresses to the `ArrowArray` and `ArrowSchema` structs are returned by the call to `exportToCDataInterface`. These addresses can be passed to Python directly, without having to make any copies of the underlying Arrow data structures that they refer to. A user can then wrap the underlying data pointed to by the `ArrowArray` struct (which is already in the [Arrow Columnar Format]), as well as extract the necessary metadata from the `ArrowSchema` struct, to create a `pyarrow.Array` by using the static method `py.pyarrow.Array._import_from_c`.
+Memory addresses for the `ArrowArray` and `ArrowSchema` structs are returned by the call to `export`. These addresses can be passed to Python directly, without having to make any copies of the underlying Arrow data structures that they refer to. A user can then wrap the underlying data pointed to by the `ArrowArray` struct (which is already in the [Arrow Columnar Format]), as well as extract the necessary metadata from the `ArrowSchema` struct, to create a `pyarrow.Array` by using the static method `pyarrow.Array._import_from_c`.
+
+Multiple lines of Python are required to import the Arrow array from MATLAB. Therefore, the function [`pyrunfile`]((https://www.mathworks.com/help/matlab/ref/pyrunfile.html)) can be used which can run Python scripts defined in an external file.

###### Example Code:
+
```python
+# Filename: import_from_c.py
+# Note: This file is located in same directory as the MATLAB file.
% Import the memory addresses of the C Data Interface format structs to create a pyarrow.Array.
->> PA = py.pyarrow.Array._import_from_c(arrayMemoryAddress, schemaMemoryAddress);
+>> PA = pyrunfile("import_from_c.py", "array", arrayMemoryAddress=cArray.Address, schemaMemoryAddress=cSchema.Address);
```
Conversely, a user can create an Arrow array using PyArrow and share it with MATLAB. To do this, they can call the method `_export_to_c` to export a `pyarrow.Array` to the C Data Interface format.

-The memory addresses to the `ArrowArray` and `ArrowSchema` structs populated by the call to `_export_to_c` can be passed to the static method `arrow.Array.importFromCDataInterface` to construct a MATLAB `arrow.Array`with zero copies.
+**NOTE:** Since the python calls to `_export_to_c` and `_import_from_c` have underscores at the beginning of their names, they cannot be called directly in MATLAB. MATLAB member functions or variables are [not allowed to start with an underscore](https://www.mathworks.com/help/matlab/matlab_prog/variable-names.html).

-The example code below is adapted from the [`test_cffi.py` test cases for PyArrow].
+To initialize a Python `pyarrow` array, `pyrunfile` can (again) be used to execute a Python script containing variables and functions with names that start with an underscore.
+
+The memory addresses to the `ArrowArray` and `ArrowSchema` structs populated by the call to `_export_to_c` can be passed to the static method `arrow.Array.importFromCDataInterface` to construct a MATLAB `arrow.Array` with zero copies.
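
To make the idea of passing struct addresses concrete, the following is a rough standard-library sketch of the `ArrowArray` and `ArrowSchema` layouts from the Arrow C Data Interface, mirrored with `ctypes`. It is only an illustration under stated assumptions: it does not use PyArrow or the MATLAB interface, the variable names `producer_view` and `consumer_view` are hypothetical, and no real Arrow buffers are attached to the structs.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """Mirror of `struct ArrowSchema` (type metadata) from the Arrow C Data Interface."""


class ArrowArray(ctypes.Structure):
    """Mirror of `struct ArrowArray` (data buffers) from the Arrow C Data Interface."""


# Each struct carries a release callback that receives a pointer to the struct itself.
SchemaRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))
ArrayRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", SchemaRelease),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ArrayRelease),
    ("private_data", ctypes.c_void_p),
]

# Hypothetical producer side: fill in a struct and hand out its address as an integer.
producer_view = ArrowArray(length=5, null_count=0, offset=0)
array_address = ctypes.addressof(producer_view)

# Hypothetical consumer side: reinterpret the same memory from the integer address.
# No data is copied; both objects refer to the same bytes.
consumer_view = ArrowArray.from_address(array_address)
producer_view.null_count = 1  # a change made by the producer is visible to the consumer
```

Only an integer address crosses the language boundary, which is what makes the exchange zero-copy. The producer has to keep the struct alive until the consumer is done with it; in the real interface the `release` callback governs that lifetime.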

###### Example Code:
+
```python
+# Filename: export_to_c.py
+# Note: This file is located in same directory as the MATLAB file.
% Import the C Data Interface structs to create a MATLAB arrow.Array.
->> AA = arrow.Array.importFromCDataInterface(arrayMemoryAddress, schemaMemoryAddress);
+>> AA = arrow.array.Array.import(cArray, cSchema);
```

+
#### Out-of-Process Memory Sharing
[MATLAB supports running Python code in a separate process]. A user could leverage the MATLAB Interface for Apache Arrow to share Arrow memory between MATLAB and PyArrow running within a separate Python process using one of the following approaches described below.
@@ -240,7 +258,18 @@ For large tables used in a multi-process "data processing pipeline", a user coul
>> AT = arrow.Table(Var1, Var2, Var3);

% Write the MATLAB arrow.Table to the Arrow IPC File Format on disk.
% Write the MATLAB arrow.Table to the Arrow IPC File Format on disk.