CS479: Machine Learning for 3D Data
3D Representations
Lecture 2
Minhyuk Sung
Spring 2025
KAIST
CS479: Machine Learning for 3D Data (Spring 2025)
Previously in CS479
Applications: 3D Reconstruction
Google Maps Immersive View
https://cdn.mos.cms.futurecdn.net/P7HseGaXpSTQM2uAfSbh5Y.jpg
Previously in CS479
Applications: 3D Generation
Wang et al., ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation, arXiv 2023.
Roblox
Previously in CS479
Applications: 3D Perception
Waymo Open Dataset
https://waymo.com/blog/2021/03/expanding-waymo-open-dataset-with-interactive-scenario-data-and-new-challenges.html
AI Habitat
https://aihabitat.org/
3D Encoder
A neural network taking 3D data as input.
https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
3D Decoder
A neural network generating 3D data as output.
https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
Course Road Map
Representations
• Point clouds
• Implicit representation
• Multi-view images to 3D
• Hybrid representations
• Meshes
• CAD
• Representation Conversion
Applications
• 3D perception
• Reconstruction
• Manipulation
• Generation
3D Representations
https://freecontent.manning.com/deep-learning-for-text/
Texts
Image from Stanford CS231N
Images
AlexNet, https://oreilly.com/
Convolutional Neural Network
Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021.
Transformers
3D Grid (Voxels)
3D Grid
https://towardsdatascience.com/a-comprehensive-introduction-to-different-types-of-convolutions-in-deep-learning-669281e58215
Medical Imaging
Mathotaarachchi et al., VoxelStats, 2016.
We’re interested in 2D surfaces in 3D space.
The Digital Michelangelo Project
Liu et al., A Local/Global Approach to Mesh Parameterization, SGP 2008.
Image from Hao Su
3D Convolution
• Each voxel stores a binary value indicating whether it lies on the
surface, and most voxels are empty.
• Dense 3D convolution therefore wastes most of its computation on empty voxels.
Figure: occupied voxel ratio at resolutions 32, 64, and 128.
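The sparsity of surface voxels can be checked numerically. The following is a minimal sketch, assuming a sphere of radius 0.4 inside a unit cube as the test shape and counting a voxel as occupied when its center lies within half a voxel of the surface. Both choices and the function name are illustrative assumptions, not the setup behind the slide's plot.

```python
import numpy as np

def surface_occupancy_ratio(resolution, radius=0.4):
    """Fraction of voxels in a unit cube that a sphere surface passes through."""
    # Voxel centers of a regular grid spanning [-0.5, 0.5]^3.
    centers = (np.arange(resolution) + 0.5) / resolution - 0.5
    x, y, z = np.meshgrid(centers, centers, centers, indexing="ij")
    dist = np.sqrt(x**2 + y**2 + z**2)
    # Approximate test: the center is within half a voxel of the surface.
    voxel_size = 1.0 / resolution
    occupied = np.abs(dist - radius) <= 0.5 * voxel_size
    return occupied.mean()

ratios = {r: surface_occupancy_ratio(r) for r in (32, 64, 128)}
for r, ratio in ratios.items():
    print(f"resolution {r}: {100 * ratio:.2f}% of voxels occupied")
```

The surface area is fixed while the voxel count grows cubically, so the occupied ratio shrinks roughly in proportion to 1/resolution.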
Maturana et al., VoxNet: A 3D Convolutional Neural Network for real-time object recognition, IROS 2015.
3D CNNs
(−) Takes a huge amount of memory and time in training.
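The memory cost follows from simple arithmetic: a dense grid has resolution^3 cells. A rough sketch, assuming one float32 feature map with 32 channels (the channel count and the helper name are assumptions chosen for illustration):

```python
def dense_grid_megabytes(resolution, channels=32, bytes_per_value=4):
    """Memory in MiB of one dense feature map of shape (C, R, R, R)."""
    return resolution**3 * channels * bytes_per_value / 2**20

for r in (32, 64, 128, 256):
    print(f"resolution {r}: {dense_grid_megabytes(r):.0f} MiB per feature map")
```

Doubling the resolution multiplies the memory of every feature map by 8, before counting the activations stored for backpropagation.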
3D CNNs
Architectures using adaptive data structures
Wang et al., O-CNN, SIGGRAPH 2017.
Image from Nvidia
https://miro.medium.com/
Riegler et al., OctNet, CVPR 2017.
Graham et al., 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, CVPR 2018.
SparseConvNet [Graham et al., 2018]
• Compute convolutions only in the active areas.
• Still takes lots of memory and time in training.
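The idea of convolving only at active sites can be sketched in plain NumPy. This is a toy illustration, not the SparseConvNet implementation: the dictionary-based storage, the 3x3x3 kernel, and the function name are assumptions, and real libraries use hash tables and batched GPU kernels instead of Python loops.

```python
import itertools
import numpy as np

def submanifold_conv3d(features, weights):
    """Toy submanifold sparse convolution.

    features: dict mapping (i, j, k) -> vector of length C_in (active voxels only).
    weights:  array of shape (3, 3, 3, C_in, C_out).
    The output is defined only at the input's active sites.
    """
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    c_out = weights.shape[-1]
    out = {}
    for (i, j, k) in features:              # visit active sites only
        acc = np.zeros(c_out)
        for (di, dj, dk) in offsets:
            neighbor = features.get((i + di, j + dj, k + dk))
            if neighbor is not None:        # inactive neighbors contribute nothing
                acc += neighbor @ weights[di + 1, dj + 1, dk + 1]
        out[(i, j, k)] = acc
    return out

features = {
    (0, 0, 0): np.array([1.0]),
    (0, 0, 1): np.array([2.0]),
    (5, 5, 5): np.array([3.0]),
}
weights = np.ones((3, 3, 3, 1, 1))
out = submanifold_conv3d(features, weights)
```

The cost scales with the number of active voxels rather than with the full grid volume.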
Multi-View Images
Su et al., Multi-view Convolutional Neural Networks for 3D Shape Recognition, ICCV 2015.
Multi-View CNN [Su et al., 2015]
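A multi-view network has to merge per-view features into one shape descriptor. The sketch below shows an element-wise max across views, assuming the per-view descriptors have already been computed by a shared 2D CNN. The random features, the 12 views, and the feature size are placeholders.

```python
import numpy as np

def view_pool(view_features):
    """Element-wise max over views: (num_views, feature_dim) -> (feature_dim,)."""
    return view_features.max(axis=0)

rng = np.random.default_rng(0)
view_features = rng.normal(size=(12, 8))   # stand-in for per-view CNN descriptors
shape_descriptor = view_pool(view_features)

# Reordering the views leaves the pooled descriptor unchanged.
shuffled = view_features[rng.permutation(12)]
same = np.allclose(view_pool(shuffled), shape_descriptor)
```

The pooled descriptor would then be passed to a classifier, which is omitted here.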
3D Object Classification
Dataset: ModelNet40
Metric: 40-class classification accuracy (%)
Kalogerakis et al., 3D Shape Segmentation with Projective Convolutional Networks, CVPR 2017.
3D Segmentation with 2D Neural Networks
Multi-View Images
• (+) Especially good for processing appearance information
like color, texture, and material.
• (−) Requires lots of images for high accuracy and thus
takes lots of memory and time.
• (−) May not be able to capture geometric details.
Point Cloud
Point Cloud
• The simplest representation: only points, no connectivity.
• Collection of (x,y,z) coordinates, possibly with normals.
Stanford bunny
Point Cloud
• Nearly all 3D scanning devices produce point clouds.
https://techbullion.com/wp-content/uploads/2021/12/Mobile-LiDAR-Scanner.jpg
• Sometimes easier to handle, e.g., when simulating fracturing solids or fluids.
Fracturing solids: Pauly et al., Meshless Animation of Fracturing Solids, SIGGRAPH '05.
Fluids: Adams et al., Adaptively Sampled Particle Fluids, SIGGRAPH '07.
Neural Networks for Point Clouds
• (+) Fast, easy to implement (relatively).
• (+) The most popular architectures.
Qi et al., PointNet, CVPR 2017. Qi et al., PointNet++, NeurIPS 2017.
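Part of why point-cloud networks are easy to implement is that the core of a PointNet-style encoder is a shared per-point MLP followed by a symmetric pooling function. A minimal sketch, assuming random weights and omitting the input/feature transforms, biases, and normalization layers of the published architecture:

```python
import numpy as np

def pointnet_global_feature(points, w1, w2):
    """points: (N, 3). Returns one global feature vector for the whole set."""
    hidden = np.maximum(points @ w1, 0.0)      # same weights applied to every point
    per_point = np.maximum(hidden @ w2, 0.0)   # (N, feature_dim)
    return per_point.max(axis=0)               # max pooling is order-independent

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(1024, 3))
w1 = rng.normal(size=(3, 64))
w2 = rng.normal(size=(64, 128))

feature = pointnet_global_feature(points, w1, w2)
permuted_feature = pointnet_global_feature(points[rng.permutation(1024)], w1, w2)
```

Because the pooling is symmetric, the feature does not depend on the order in which the points are stored.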
Point Cloud
• (−) No surface/topology information;
needs to be converted to another representation for downstream
applications.
• (−) Weak approximation power;
requires many points to capture fine details.
Li et al., PU-GAN, ICCV 2019.
Polygon Mesh
Polygon Mesh
• The most popular representation for shapes in graphics.
• A compact form representing surfaces.
• A graph-like structure, but not the same as a graph.
https://www.3dcadbrowser.com/3d-model/people-collection-low-poly-62652
Polygon Mesh
• A polygon mesh is a collection of vertices, edges, and faces that defines the shape of a polyhedral object.
• A triangle mesh is the special case in which all faces are triangles.
Figure panels: vertices, edges, faces.
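In code, a triangle mesh is commonly stored as an array of vertex positions plus an array of index triples, with edges derived from the faces. A small sketch using a tetrahedron; the array layout and the helper name are assumptions for illustration, not a specific library's format.

```python
import numpy as np

# Vertex positions, one row per vertex.
vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
# Each face is a triple of vertex indices, ordered so that normals point outward.
faces = np.array([[0, 2, 1], [0, 1, 3], [1, 2, 3], [0, 3, 2]])

def unique_edges(faces):
    """Derive the undirected edge list from the triangle list."""
    # Each triangle (a, b, c) contributes the edges ab, bc, and ca.
    e = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    e = np.sort(e, axis=1)          # (a, b) and (b, a) are the same edge
    return np.unique(e, axis=0)

edges = unique_edges(faces)
print(len(vertices), len(edges), len(faces))
```

The faces are what set a mesh apart from a plain graph of vertices and edges: they define the surface itself.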
Online Repositories of 3D Meshes
ShapeNet 3D Warehouse Yobi 3D SceneNN
Redwood Dataset ScanNet KITTI 3D Dynamic MPI-FAUST
Polygon Mesh
• (+) Good for many applications:
• Rendering
• Texturing
• Deformation / Manipulation
• Simulation
• (−) Irregular structure
https://dreamfarmstudios.com/blog/getting-to-know-3d-texturing-in-animation-production/
https://commons.wikimedia.org/wiki/File:Displacement_Mapping.jpg
CGAL
https://3dmodelsworld.com/maya-bullet-physics-simulation-tutorial-wrecking-ball-animation-active-and-passive-rigid-body/
https://www.youtube.com/watch?v=HKL8mQO1iuU
https://towardsdatascience.com/convolution-vs-correlation-af868b6b4fb5
Alliez et al., Recent Advances in Remeshing of Surfaces.
Pooling in Neural Network
Aggregating information while progressively reducing the
resolution of the data.
https://paperswithcode.com/method/max-pooling
Gu et al., Blind Channel Identification Aided Generalized Automatic
Modulation Recognition Based on Deep Learning, 2019.
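On a regular grid, pooling is a few lines of array code, which is exactly what irregular meshes lack. A minimal 2x2 max-pooling sketch on a small feature map (the values are arbitrary):

```python
import numpy as np

def max_pool_2x2(image):
    """Halve the resolution by taking the max over non-overlapping 2x2 blocks."""
    h, w = image.shape
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    blocks = image.reshape(h // 2, 2, w // 2, 2)   # blocks[i, a, j, b] = image[2i+a, 2j+b]
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])
pooled = max_pool_2x2(feature_map)
print(pooled)   # [[6 8]
                #  [3 4]]
```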
Garland and Heckbert, Surface Simplification Using Quadric Error Metrics, SIGGRAPH 1997.
Pooling Operation for Polygon Mesh
How can the resolution of a polygon mesh be decreased?
Through the process of iterative edge contraction.
Yao et al., Quadratic Error Metric Mesh Simplification Algorithm Based on Discrete Curvature, 2015.
https://doc.cgal.org/latest/Surface_mesh_simplification/index.html
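A single edge contraction can be sketched as follows. This is a simplified illustration that places the merged vertex at the edge midpoint; it does not implement the quadric error metric, which additionally decides which edge to contract and where to place the new vertex. The function name and the test mesh are assumptions.

```python
import numpy as np

def contract_edge(vertices, faces, u, v):
    """Collapse edge (u, v): merge v into u and drop the faces that degenerate."""
    vertices = vertices.copy()
    vertices[u] = 0.5 * (vertices[u] + vertices[v])   # midpoint placement (assumption)
    faces = np.where(faces == v, u, faces)            # redirect references from v to u
    # Faces that contained both u and v now repeat an index and are removed.
    keep = ((faces[:, 0] != faces[:, 1])
            & (faces[:, 1] != faces[:, 2])
            & (faces[:, 2] != faces[:, 0]))
    # Vertex v stays in the array but is no longer referenced by any face.
    return vertices, faces[keep]

# A 3x2 grid of vertices triangulated into four faces.
vertices = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0],
                     [0, 1, 0], [1, 1, 0], [2, 1, 0]], dtype=float)
faces = np.array([[0, 1, 4], [0, 4, 3], [1, 2, 5], [1, 5, 4]])

new_vertices, new_faces = contract_edge(vertices, faces, u=1, v=4)
print(new_faces.tolist())   # [[0, 1, 3], [1, 2, 5]]
```

Contracting an interior edge removes one vertex and two faces. Repeating the step lowers the resolution progressively.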
Hanocka et al., MeshCNN: A Network with an Edge, SIGGRAPH 2019.
MeshCNN [Hanocka et al., 2019]
Pool the features of adjacent edges via edge contraction.
Neural Networks for Meshes
• Requires parameterization or specialized operations.
• Hard to implement. Verified only with a few use cases.
Masci et al., GCNN, ICCV 2015. Hanocka et al., MeshCNN, SIGGRAPH 2019.
Mitchel et al., ICCV 2021. Milano et al., NeurIPS 2020.
Polygon Mesh
• (+) Good for many applications:
• Rendering
• Texturing
• Deformation / Manipulation
• Simulation
• (−) Irregular structure
• (−) Difficult to create a valid mesh
Valid Meshes
E.g., 2-manifoldness:
each local region should be homeomorphic (mappable) to a 2D flat plane.
https://www.shapeways.com/blog/archives/29453-tutorial-tuesday-5-quick-fixes-with-meshlab.html
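One necessary condition for 2-manifoldness is easy to test: no edge may be shared by more than two faces. The sketch below checks only this edge condition. It is not a complete manifold test, since it does not detect vertices where two otherwise separate surface fans touch. The function names are illustrative.

```python
from collections import Counter

def edge_face_counts(faces):
    """Count how many faces are incident to each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for p, q in ((a, b), (b, c), (c, a)):
            counts[(min(p, q), max(p, q))] += 1
    return counts

def is_edge_manifold(faces):
    """True if every edge has at most two incident faces."""
    return all(n <= 2 for n in edge_face_counts(faces).values())

tetrahedron = [(0, 2, 1), (0, 1, 3), (1, 2, 3), (0, 3, 2)]
fin = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]   # three triangles glued along edge (0, 1)

print(is_edge_manifold(tetrahedron))   # True
print(is_edge_manifold(fin))           # False
```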
Valid Meshes
• Watertightness
• Topology
• Normal orientation consistency
Etc.
https://courses.cs.duke.edu/fall06/cps296.1/Lectures/sec-II-1.pdf
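The three properties listed above can each be checked from the face list alone. A sketch under simplifying assumptions: a triangle mesh that is a single connected component, with the Euler characteristic V - E + F as the topology summary (2 for a sphere-like closed surface). The function name is hypothetical.

```python
from collections import Counter

def mesh_report(num_vertices, faces):
    """Return (watertight, consistently_oriented, euler_characteristic)."""
    directed = Counter()
    for a, b, c in faces:
        for p, q in ((a, b), (b, c), (c, a)):
            directed[(p, q)] += 1
    undirected = Counter()
    for (p, q), n in directed.items():
        undirected[frozenset((p, q))] += n
    # Watertight: every edge is shared by exactly two faces (no boundary).
    watertight = all(n == 2 for n in undirected.values())
    # Consistent orientation: no directed edge is used twice, so adjacent
    # faces traverse their shared edge in opposite directions.
    consistent = all(n == 1 for n in directed.values())
    euler = num_vertices - len(undirected) + len(faces)
    return watertight, consistent, euler

tetrahedron = [(0, 2, 1), (0, 1, 3), (1, 2, 3), (0, 3, 2)]
flipped = [(0, 1, 2), (0, 1, 3), (1, 2, 3), (0, 3, 2)]   # first face reversed
opened = tetrahedron[:3]                                  # one face removed

print(mesh_report(4, tetrahedron))   # (True, True, 2)
print(mesh_report(4, flipped))       # (True, False, 2)
print(mesh_report(4, opened))        # (False, True, 1)
```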
Nash et al., PolyGen: An Autoregressive Generative Model of 3D Meshes, ICML 2020.
Mesh Generation
E.g., an autoregressive model generating vertices and faces
sequentially.
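To generate a mesh sequentially, the model needs the mesh written as a token sequence. The sketch below shows one plausible way to serialize vertices: quantize coordinates into bins, sort the vertices into a canonical order, flatten, and append a stop token. The bin count, the sort order, and the stop token are assumptions for illustration and are not claimed to match the tokenization of the cited models.

```python
import numpy as np

def vertices_to_tokens(vertices, num_bins=256):
    """Quantize coordinates in [-1, 1] and flatten them into a token sequence."""
    v = np.asarray(vertices, dtype=float)
    q = np.round((v + 1.0) / 2.0 * (num_bins - 1))
    q = np.clip(q, 0, num_bins - 1).astype(int)
    # Canonical order: sort by z, then y, then x (lexsort's last key is primary).
    order = np.lexsort((q[:, 0], q[:, 1], q[:, 2]))
    stop_token = num_bins                      # one extra symbol ends the sequence
    return q[order].reshape(-1).tolist() + [stop_token]

vertices = [[1, -1, 1], [-1, -1, -1], [1, 1, -1]]
tokens = vertices_to_tokens(vertices)
print(tokens)   # [0, 0, 0, 255, 255, 0, 255, 0, 255, 256]
```

An autoregressive model would then predict each token conditioned on the tokens before it. Faces can be serialized in a similar way as sequences of vertex indices.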
Tang et al., EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation, ICLR 2025.
Mesh Generation
E.g., an autoregressive model generating vertices and faces
sequentially.
CAD Representations
CAD Representations
• More compact and structural representations.
• NURBS, CSG, B-Rep, Extrusions, Revolve, etc.
Autodesk CSG Karl D.D. Willis steemit
CAD Representations
• Very few neural networks process or generate them.
Lambourne et al., BRepNet, CVPR 2021. Jayaraman et al., UV-Net, CVPR 2021.
Ren et al., CSG-Stump, ICCV 2021.
Xu et al., BrepGen: A B-rep Generative Diffusion Model with Structured Latent Geometry, SIGGRAPH 2024.
CAD Representations
• The architectures are very complex.
3D Representations
Voxels
• Pros: Okay for processing if we use sparse convolution.
• Cons: Hard to implement. Inefficient.
Multi-View
• Pros: Good for processing. Easy to implement.
• Cons: Inefficient.
Point Cloud
• Pros: Good for both processing and generation. Efficient. Easy to implement.
• Cons: Require lots of points to capture details.
Mesh
• Pros: Compact. Good for many applications.
• Cons: Bad for both processing and generation.
Implicit Representation
Implicit Representation
• A function that takes coordinates as input and returns
occupancy or signed distance.
• A representation for output (generation), not for input (processing).
Park et al., DeepSDF, CVPR 2019. Mescheder et al., Occupancy Networks, CVPR 2019.
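The interface of an implicit representation is just a function from coordinates to a scalar. In the sketch below an analytic sphere stands in for the neural network used in learned implicit models. The negative-inside sign convention and the function names are assumptions.

```python
import numpy as np

def sphere_sdf(points, radius=0.5):
    """Signed distance to a sphere at the origin: negative inside, positive outside."""
    return np.linalg.norm(points, axis=-1) - radius

def sphere_occupancy(points, radius=0.5):
    """Occupancy: 1 inside or on the surface, 0 outside."""
    return (sphere_sdf(points, radius) <= 0.0).astype(np.float32)

queries = np.array([[0.0, 0.0, 0.0],    # center
                    [0.5, 0.0, 0.0],    # on the surface
                    [1.0, 0.0, 0.0]])   # outside
sdf_values = sphere_sdf(queries)
occ_values = sphere_occupancy(queries)
print(sdf_values)   # [-0.5  0.   0.5]
print(occ_values)   # [1. 1. 0.]
```

The surface is the zero level set of the signed distance function, so it can be queried at any resolution without storing a grid.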
3D Representations
Voxels
• Pros: Okay for processing if we use sparse convolution.
• Cons: Hard to implement. Inefficient.
Multi-View
• Pros: Good for processing. Easy to implement.
• Cons: Inefficient.
Point Cloud
• Pros: Good for both processing and generation. Efficient. Easy to implement.
• Cons: Require lots of points to capture details.
Mesh
• Pros: Compact. Good for many applications.
• Cons: Bad for both processing and generation.
Implicit
• Pros: Good for generation.
• Cons: Cannot be used for processing. Need conversion for applications.
Conversion Across Representations
Conversion Across Representations
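Diagram: Implicit Function, Voxels, Mesh, Point Cloud, and Multi-View, connected by conversion arrows.
• Evaluation: Implicit Function → Voxels
• Sampling: Mesh → Point Cloud
• Rendering: → Multi-View
• ?: two conversions are marked as open questions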
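Two of these conversions are straightforward to sketch: evaluating an implicit function on a grid to obtain voxels, and sampling a mesh surface to obtain a point cloud. The code assumes an analytic sphere as the implicit function and area-weighted uniform sampling of triangles. The function names, grid bounds, and test shapes are illustrative.

```python
import numpy as np

def sphere_sdf(points, radius=0.5):
    return np.linalg.norm(points, axis=-1) - radius

def sdf_to_voxels(sdf, resolution, bound=1.0):
    """Evaluation: query the implicit function at every voxel center."""
    centers = (np.arange(resolution) + 0.5) / resolution * 2 * bound - bound
    grid = np.stack(np.meshgrid(centers, centers, centers, indexing="ij"), axis=-1)
    return sdf(grid) <= 0.0                      # boolean occupancy grid

def sample_mesh(vertices, faces, num_points, rng):
    """Sampling: draw points uniformly on the surface of a triangle mesh."""
    tri = vertices[faces]                        # (F, 3, 3)
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    # Pick triangles with probability proportional to their area.
    idx = rng.choice(len(faces), size=num_points, p=areas / areas.sum())
    # Uniform barycentric coordinates: reflect samples that fall outside the triangle.
    u, v = rng.random(num_points), rng.random(num_points)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    a, b, c = tri[idx, 0], tri[idx, 1], tri[idx, 2]
    return a + u[:, None] * (b - a) + v[:, None] * (c - a)

voxels = sdf_to_voxels(sphere_sdf, resolution=16)

square_vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
square_faces = np.array([[0, 1, 2], [0, 2, 3]])
points = sample_mesh(square_vertices, square_faces, 100, np.random.default_rng(0))
```

The reverse directions, such as recovering a mesh from voxels or from a point cloud, need dedicated surface reconstruction algorithms and are not covered by this sketch.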
Course Road Map
Representations
• Point clouds
• Implicit representation
• Multi-view images to 3D
• Hybrid representations
• Meshes
• CAD
• Representation Conversion
Applications
• 3D perception (Encoding)
• Reconstruction (Decoding)
• Manipulation
• Generation