Machine Learning in 3D Data Analysis

The document outlines the course CS479: Machine Learning for 3D Data, focusing on various 3D representations such as point clouds, meshes, and CAD. It discusses applications in 3D perception, reconstruction, and generation, highlighting the strengths and weaknesses of each representation type. The course includes neural network architectures for processing 3D data and emphasizes the importance of valid mesh structures in 3D modeling.


CS479: Machine Learning for 3D Data

3D Representations

LECTURE 2
MINHYUK SUNG
Spring 2025
KAIST

CS479: Machine Learning for 3D Data (Spring 2025)


Previously in CS479
Applications: 3D Reconstruction

Google Maps Immersive View


https://cdn.mos.cms.futurecdn.net/P7HseGaXpSTQM2uAfSbh5Y.jpg


Previously in CS479
Applications: 3D Generation

Wang et al., ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation, arXiv 2023.
Roblox


Previously in CS479
Applications: 3D Perception

Waymo Open Dataset: https://waymo.com/blog/2021/03/expanding-waymo-open-dataset-with-interactive-scenario-data-and-new-challenges.html
AI Habitat: https://aihabitat.org/


3D Encoder
A neural network taking 3D data as input.

https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798


3D Decoder
A neural network generating 3D data as output.

https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798


Course Road Map
Representations
• Point clouds
• Implicit representation
• Multi-view images to 3D
• Hybrid representations
• Meshes
• CAD
• Representation Conversion
Applications
• 3D perception
• Reconstruction
• Manipulation
• Generation

3D Representations



https://freecontent.manning.com/deep-learning-for-text/

Texts


Image from Stanford CS231N

Images



AlexNet, https://oreilly.com/

Convolutional Neural Network


Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021.

Transformers



3D Grid (Voxels)



3D Grid

https://towardsdatascience.com/a-comprehensive-introduction-to-different-types-of-convolutions-in-deep-learning-669281e58215

Medical Imaging

Mathotaarachchi et al., VoxelStats, 2016.


We’re interested in 2D surfaces in 3D space.

The Digital Michelangelo Project
Liu et al., A Local/Global Approach to Mesh Parameterization, SGP 2008.


Image from Hao Su

3D Convolution
• Each voxel stores a binary value indicating whether it lies on the surface, and most voxels are empty.
• Dense 3D convolution therefore wastes most of its computation on empty space.

Figure: occupied voxel ratio at grid resolutions 32, 64, and 128.
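The trend in occupied voxel ratio can be reproduced with a toy shape. The sketch below is not the experiment behind the slide's figure: it assumes a unit sphere as the surface, a grid over [-1, 1]^3, and 200,000 surface samples, all chosen only for illustration.

```python
import numpy as np

def sample_unit_sphere(num_points, rng):
    """Draw points uniformly on the surface of the unit sphere."""
    v = rng.normal(size=(num_points, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def occupied_voxel_ratio(points, resolution):
    """Fraction of cells in a resolution^3 grid over [-1, 1]^3 that contain a point."""
    idx = np.floor((points + 1.0) / 2.0 * resolution).astype(np.int64)
    idx = np.clip(idx, 0, resolution - 1)  # points exactly on the upper boundary
    flat = (idx[:, 0] * resolution + idx[:, 1]) * resolution + idx[:, 2]
    return np.unique(flat).size / resolution**3

rng = np.random.default_rng(0)
points = sample_unit_sphere(200_000, rng)
ratios = {r: occupied_voxel_ratio(points, r) for r in (32, 64, 128)}
for r, ratio in ratios.items():
    print(f"resolution {r:>3}: {100 * ratio:.2f}% of voxels occupied")
```

The number of surface voxels grows roughly with the square of the resolution while the grid grows with its cube, so the occupied fraction shrinks as the resolution increases.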



Maturana et al., VoxNet: A 3D Convolutional Neural Network for real-time object recognition, IROS 2015.

3D CNNs
(−) Takes a huge amount of memory and time in training.
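A quick back-of-the-envelope calculation shows why memory becomes the bottleneck. The channel count (32) and float32 storage are assumptions for illustration, and the figure covers a single dense feature volume, not a whole network.

```python
def dense_feature_bytes(resolution, channels, bytes_per_value=4):
    """Memory for one dense feature volume of shape (channels, R, R, R)."""
    return channels * resolution**3 * bytes_per_value

for r in (32, 64, 128, 256):
    mib = dense_feature_bytes(r, channels=32) / 2**20
    print(f"R={r:>3}: {mib:8.1f} MiB per 32-channel float32 volume")
```

Doubling the resolution multiplies the memory by eight.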



3D CNNs
Architectures using adaptive data structures

Wang et al., O-CNN, SIGGRAPH 2017.

Image from Nvidia

https://miro.medium.com/
Riegler et al., OctNet, CVPR 2017.


Graham et al., 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, CVPR 2018.

SparseConvNet [Graham et al., 2018]


• Computes convolutions only at active (occupied) sites.
• Still takes a lot of memory and time in training.
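The idea of convolving only in active areas can be sketched with a dictionary of occupied sites. This is a naive illustration, not the SparseConvNet implementation or API: the dictionary layout, the function name, and the choice to produce outputs only at sites that are active in the input are assumptions made for the example.

```python
import itertools
import numpy as np

def sparse_conv3d_active_only(features, weights, out_channels):
    """features: {(i, j, k): vector of shape (C_in,)} for active voxels only.
    weights: {(di, dj, dk): matrix of shape (C_in, C_out)} for each kernel offset.
    Outputs are computed only at sites that are active in the input."""
    out = {}
    for site in features:
        acc = np.zeros(out_channels)
        for offset, w in weights.items():
            neighbor = tuple(s + o for s, o in zip(site, offset))
            if neighbor in features:  # empty voxels contribute nothing and are skipped
                acc += features[neighbor] @ w
        out[site] = acc
    return out

# 3x3x3 kernel of ones, one input and one output channel (toy values).
offsets = list(itertools.product((-1, 0, 1), repeat=3))
weights = {o: np.ones((1, 1)) for o in offsets}
features = {
    (0, 0, 0): np.array([1.0]),
    (0, 0, 1): np.array([2.0]),
    (5, 5, 5): np.array([4.0]),
}
out = sparse_conv3d_active_only(features, weights, out_channels=1)
print({site: value.tolist() for site, value in out.items()})
```

The work scales with the number of active sites times the kernel size rather than with the full grid volume.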



Multi-View Images



Su et al., Multi-view Convolutional Neural Networks for 3D Shape Recognition, ICCV 2015.

Multi-View CNN [Su et al., 2015]



3D Object Classification

Dataset: ModelNet40
Metric: 40-class classification accuracy (%)
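As a reminder of how the metric is computed, here is a minimal sketch with made-up logits for four shapes. The values are illustrative only and are not results from any method on ModelNet40.

```python
import numpy as np

def classification_accuracy(logits, labels):
    """Percentage of samples whose highest-scoring class matches the label."""
    predictions = np.argmax(logits, axis=1)
    return 100.0 * np.mean(predictions == labels)

num_classes = 40
labels = np.array([0, 7, 39, 12])
logits = np.zeros((4, num_classes))
logits[0, 0] = 1.0   # correct
logits[1, 7] = 1.0   # correct
logits[2, 39] = 1.0  # correct
logits[3, 3] = 1.0   # wrong: the label is 12
accuracy = classification_accuracy(logits, labels)
print(f"accuracy: {accuracy:.1f}%")
```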



Kalogerakis et al., 3D Shape Segmentation with Projective Convolutional Networks, CVPR 2017.

3D Segmentation with 2D Neural Networks



Multi-View Images
• (+) Especially good for processing appearance information like color, texture, and material.
• (−) Requires many images for high accuracy and thus takes a lot of memory and time.
• (−) May not be able to capture geometric details.


Point Cloud



Point Cloud
• The simplest representation: only points, no connectivity.
• A collection of (x, y, z) coordinates, possibly with normals.
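In code, a point cloud is typically just an array with one row per point. The sketch below assumes an (N, 6) layout of coordinates followed by normals and uses points on a unit sphere as toy data; neither choice comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# N points on the unit sphere (toy data standing in for a scan).
xyz = rng.normal(size=(1024, 3))
xyz /= np.linalg.norm(xyz, axis=1, keepdims=True)

# For a unit sphere centered at the origin, the outward normal equals the position.
normals = xyz.copy()

# One row per point: (x, y, z, nx, ny, nz). There is no connectivity.
cloud = np.concatenate([xyz, normals], axis=1)

# The row order carries no meaning: a shuffled array is the same point cloud.
shuffled = cloud[rng.permutation(len(cloud))]
print(cloud.shape, shuffled.shape)
```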

Stanford bunny

Point Cloud
• Nearly all 3D scanning devices produce point clouds.

https://techbullion.com/wp-content/uploads/2021/12/Mobile-LiDAR-Scanner.jpg


Point Cloud
• Nearly all 3D scanning devices produce point clouds.
• Sometimes easier to handle, e.g., for fracturing solids and fluids.

Fracturing solids: Pauly et al., Meshless Animation of Fracturing Solids, SIGGRAPH '05.
Fluids: Adams et al., Adaptively Sampled Particle Fluids, SIGGRAPH '07.


Neural Networks for Point Clouds
• (+) Fast and relatively easy to implement.
• (+) The most popular architectures.

Qi et al., PointNet, CVPR 2017. Qi et al., PointNet++, NeurIPS 2017.



Point Cloud
• (−) No surface/topology information; needs to be converted to another representation for downstream applications.


Point Cloud
• (−) No surface/topology information; needs to be converted to another representation for downstream applications.
• (−) Weak approximation power; requires many points to capture details.

Li et al., PU-GAN, ICCV 2019.



Polygon Mesh



Polygon Mesh
• The most popular representation for shapes in graphics.
• A compact form representing surfaces.
• Graph-like in structure, but not the same as a graph (a mesh also has faces).

https://www.3dcadbrowser.com/3d-model/people-collection-low-poly-62652


Polygon Mesh
• A polygon mesh is a collection of vertices, edges, and faces that defines the shape of a polyhedral object.
• A triangle mesh is the special case in which all faces are triangles.

Figure panels: vertices, edges, faces.
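A triangle mesh is commonly stored as a vertex array plus a face index array, with edges derived from the faces. The tetrahedron and the helper name `unique_edges` below are illustrative choices, not part of the lecture.

```python
import numpy as np

# Tetrahedron: 4 vertices, 4 triangular faces (indices into `vertices`).
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
faces = np.array([
    [0, 2, 1],
    [0, 1, 3],
    [0, 3, 2],
    [1, 2, 3],
])

def unique_edges(faces):
    """Collect the three edges of every face and remove duplicates."""
    half_edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    return np.unique(np.sort(half_edges, axis=1), axis=0)

edges = unique_edges(faces)
euler_characteristic = len(vertices) - len(edges) + len(faces)
print(len(vertices), "vertices,", len(edges), "edges,", len(faces), "faces")
print("V - E + F =", euler_characteristic)
```

Unlike a plain graph, the face array records which vertex triples bound a surface element, and the vertex order within each face encodes its orientation.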



Online Repositories of 3D Meshes

ShapeNet 3D Warehouse Yobi 3D SceneNN

Redwood Dataset ScanNet KITTI 3D Dynamic MPI-FAUST



Polygon Mesh
• (+) Good for many applications:
• Rendering



Polygon Mesh
• (+) Good for many applications:
• Rendering
• Texturing

https://dreamfarmstudios.com/blog/getting-to-know-3d-texturing-in-animation-production/
https://commons.wikimedia.org/wiki/File:Displacement_Mapping.jpg


Polygon Mesh
• (+) Good for many applications:
• Rendering
• Texturing
• Deformation / Manipulation

CGAL



Polygon Mesh
• (+) Good for many applications:
• Rendering
• Texturing
• Deformation / Manipulation
• Simulation

https://3dmodelsworld.com/maya-bullet-physics-simulation-tutorial-wrecking-ball-animation-active-and-passive-rigid-body/
https://www.youtube.com/watch?v=HKL8mQO1iuU


Polygon Mesh
• (+) Good for many applications:
• Rendering
• Texturing
• Deformation / Manipulation
• Simulation
• (−) Irregular structure

https://towardsdatascience.com/convolution-vs-correlation-af868b6b4fb5
Alliez et al., Recent Advances in Remeshing of Surfaces.


Pooling in Neural Network
Aggregating information while progressively reducing the
resolution of the data.
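On a regular grid this is simple to write down. The sketch below shows 2×2 max pooling with stride 2 on a small array; the function name and the toy input are assumptions for illustration.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Split each axis into (blocks, 2) and take the max inside every 2x2 block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(x)
print(pooled)
```

The reshape trick relies on the grid being regular, which is exactly what a polygon mesh lacks.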

https://paperswithcode.com/method/max-pooling
Gu et al., Blind Channel Identification Aided Generalized Automatic Modulation Recognition Based on Deep Learning, 2019.


Garland and Heckbert, Surface Simplification Using Quadric Error Metrics, SIGGRAPH 1997.

Pooling Operation for Polygon Mesh


How can the resolution of a polygon mesh be decreased?
Through the process of iterative edge contraction.
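A single edge contraction on an indexed triangle mesh can be sketched as follows. This is a simplification: the merged vertex is placed at the edge midpoint and the edge is chosen by hand, whereas Garland and Heckbert select edges and positions by minimizing a quadric error, which is not implemented here. The function name and the toy fan mesh are assumptions.

```python
import numpy as np

def contract_edge(vertices, faces, u, v):
    """Collapse edge (u, v): move u to the midpoint, redirect v to u,
    and drop faces that become degenerate. Vertex v stays in the array, unused."""
    vertices = vertices.copy()
    vertices[u] = 0.5 * (vertices[u] + vertices[v])
    faces = np.where(faces == v, u, faces)
    keep = (
        (faces[:, 0] != faces[:, 1])
        & (faces[:, 1] != faces[:, 2])
        & (faces[:, 2] != faces[:, 0])
    )
    return vertices, faces[keep]

# A fan of four triangles around a center vertex 0.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
])
faces = np.array([[0, 1, 2], [0, 2, 3], [0, 3, 4], [0, 4, 1]])

new_vertices, new_faces = contract_edge(vertices, faces, u=1, v=2)
print(new_faces)
```

Repeating this step reduces the face count gradually, which is what makes it usable as a pooling-like operation on meshes.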

https://doc.cgal.org/latest/Surface_mesh_simplification/index.html
Yao et al., Quadratic Error Metric Mesh Simplification Algorithm Based on Discrete Curvature, 2015.


Hanocka et al., MeshCNN: A Network with an Edge, SIGGRAPH 2019.

MeshCNN [Hanocka et al., 2019]

Pool information from adjacent edges via edge contraction.


Neural Networks for Meshes
• Requires parameterization or specialized operations.
• Hard to implement. Verified only with a few use cases.

Masci et al., GCNN, ICCV 2015. Hanocka et al., MeshCNN, SIGGRAPH 2019.

Mitchel et al., ICCV 2021. Milano et al., NeurIPS 2020.



Polygon Mesh
• (+) Good for many applications:
• Rendering
• Texturing
• Deformation / Manipulation
• Simulation
• (−) Irregular structure
• (−) Difficult to create a valid mesh



Valid Meshes
E.g., 2-manifoldness: each local region should be homeomorphic (mappable) to a 2D flat plane.
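A common practical test for triangle meshes is that no edge is shared by more than two faces. The sketch below checks only this edge condition, which is necessary but not sufficient: a vertex where two surface sheets touch at a single point would pass. The function names and the toy meshes are assumptions.

```python
from collections import Counter

def edge_face_counts(faces):
    """Count how many faces use each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts

def is_edge_manifold(faces):
    """True if every edge is shared by at most two faces."""
    return all(n <= 2 for n in edge_face_counts(faces).values())

tetrahedron = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
with_fin = tetrahedron + [(0, 1, 4)]  # a third triangle attached to edge (0, 1)

print(is_edge_manifold(tetrahedron))  # closed surface
print(is_edge_manifold(with_fin))     # edge (0, 1) now has three faces
```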

https://www.shapeways.com/blog/archives/29453-tutorial-tuesday-5-quick-fixes-with-meshlab.html


Valid Meshes
• Watertightness
• Topology
• Normal orientation consistency
Etc.
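Two of these properties can be checked directly from the face list. The sketch assumes a working definition of watertightness (every edge is shared by exactly two faces, so there is no boundary) and of orientation consistency (no directed edge is used twice, so neighboring faces traverse a shared edge in opposite directions). Both definitions and the function names are assumptions for illustration.

```python
from collections import Counter

def is_watertight(faces):
    """Every undirected edge is shared by exactly two faces (no boundary edges)."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[frozenset((u, v))] += 1
    return all(n == 2 for n in counts.values())

def is_consistently_oriented(faces):
    """No directed edge appears twice across the face list."""
    directed = Counter()
    for a, b, c in faces:
        for edge in ((a, b), (b, c), (c, a)):
            directed[edge] += 1
    return all(n == 1 for n in directed.values())

closed = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]   # tetrahedron
open_mesh = closed[:3]                                   # one face removed
flipped = closed[:3] + [(1, 3, 2)]                       # last face reversed

print(is_watertight(closed), is_consistently_oriented(closed))
print(is_watertight(open_mesh))
print(is_consistently_oriented(flipped))
```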

https://courses.cs.duke.edu/fall06/cps296.1/Lectures/sec-II-1.pdf


Nash et al., PolyGen: An Autoregressive Generative Model of 3D Meshes, ICML 2020.

Mesh Generation
E.g., an autoregressive model generating vertices and faces
sequentially.
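To generate a mesh sequentially, it first has to be written as a sequence. The sketch below shows one possible serialization of the vertices only: coordinates in [-1, 1] are quantized to integers and vertices are sorted so that the same vertex set always yields the same token sequence. The 8-bit quantization, the z-then-y-then-x ordering, and the function names are assumptions for illustration, and no generative model is included.

```python
import numpy as np

def vertices_to_tokens(vertices, num_bins=256):
    """Quantize coordinates in [-1, 1] to integer bins and flatten to one sequence."""
    q = np.round((vertices + 1.0) / 2.0 * (num_bins - 1))
    q = np.clip(q, 0, num_bins - 1).astype(np.int64)
    # np.lexsort uses the last key as the primary one: sort by z, then y, then x.
    order = np.lexsort((q[:, 0], q[:, 1], q[:, 2]))
    return q[order].reshape(-1)

def tokens_to_vertices(tokens, num_bins=256):
    """Map a token sequence back to (N, 3) coordinates in [-1, 1]."""
    q = np.asarray(tokens).reshape(-1, 3)
    return q / (num_bins - 1) * 2.0 - 1.0

vertices = np.array([
    [-1.0, -1.0, -1.0],
    [0.3, 0.5, 0.0],
    [1.0, 1.0, 1.0],
])
tokens = vertices_to_tokens(vertices)
recovered = tokens_to_vertices(tokens)
print(tokens)
```

An autoregressive model would then predict each token conditioned on all previous tokens, and faces can be serialized in a similar way as sequences of vertex indices.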



Tang et al., EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation, ICLR 2025.

Mesh Generation
E.g., an autoregressive model generating vertices and faces
sequentially.



CAD Representations



CAD Representations
• More compact and structured representations.
• NURBS, CSG, B-Rep, Extrusions, Revolve, etc.

Autodesk CSG Karl D.D. Willis steemit



CAD Representations
• Very few neural networks process or generate them.

Lambourne et al., BRepNet, CVPR 2021. Jayaraman et al., UV-Net, CVPR 2021.
Ren et al., CSG-Stump, ICCV 2021.



Xu et al., BrepGen: A B-rep Generative Diffusion Model with Structured Latent Geometry, SIGGRAPH 2024.

CAD Representations
• Very few neural networks process or generate them.
• The architectures are very complex.


3D Representations

Voxels
• Pros: Okay for processing if we use sparse convolution.
• Cons: Hard to implement. Inefficient.

Multi-View
• Pros: Good for processing. Easy to implement.
• Cons: Inefficient.

Point Cloud
• Pros: Good for both processing and generation. Efficient. Easy to implement.
• Cons: Requires lots of points to capture details.

Mesh
• Pros: Compact. Good for many applications.
• Cons: Bad for both processing and generation.


Implicit Representation



Implicit Representation
• A function that takes coordinates as input and returns occupancy or signed distance.
• A representation for output (generation), not input (processing).
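A hand-written example makes the definition concrete. The sketch below uses an analytic sphere in place of a learned network; the sign convention (negative inside), the radius, and the function names are assumptions for illustration.

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=0.5):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - np.asarray(center), axis=-1) - radius

def occupancy(points, sdf=sphere_sdf):
    """Occupancy derived from a signed distance: 1 inside or on the surface, 0 outside."""
    return (sdf(points) <= 0.0).astype(np.float32)

queries = np.array([
    [0.0, 0.0, 0.0],  # center
    [0.5, 0.0, 0.0],  # on the surface
    [1.0, 0.0, 0.0],  # outside
])
print(sphere_sdf(queries))
print(occupancy(queries))
```

The surface itself is never stored; it is the zero level set of the function, which is why a conversion step is needed before most downstream applications.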

Park et al., DeepSDF, CVPR 2019. Mescheder et al., Occupancy Networks, CVPR 2019.


3D Representations

Voxels
• Pros: Okay for processing if we use sparse convolution.
• Cons: Hard to implement. Inefficient.

Multi-View
• Pros: Good for processing. Easy to implement.
• Cons: Inefficient.

Point Cloud
• Pros: Good for both processing and generation. Efficient. Easy to implement.
• Cons: Requires lots of points to capture details.

Mesh
• Pros: Compact. Good for many applications.
• Cons: Bad for both processing and generation.

Implicit
• Pros: Good for generation.
• Cons: Cannot be used for processing. Needs conversion for applications.


Conversion Across Representations



Conversion Across Representations
Diagram: Implicit Function, Voxels, Mesh, Point Cloud, and Multi-View, connected by conversions labeled Evaluation, Sampling, and Rendering; two conversions are marked "?".
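Two of the labeled conversions are short enough to sketch. The code assumes that "Evaluation" means querying an implicit function at voxel centers to obtain an occupancy grid, and that "Sampling" means drawing points from a mesh surface with area-weighted triangle selection. The function names, the sphere, and the unit-square mesh are illustrative assumptions.

```python
import numpy as np

def evaluate_on_grid(sdf, resolution, bound=1.0):
    """Evaluation: query an implicit function at voxel centers of a grid over
    [-bound, bound]^3 and return a boolean occupancy grid."""
    step = 2.0 * bound / resolution
    centers = -bound + step * (np.arange(resolution) + 0.5)
    grid = np.stack(np.meshgrid(centers, centers, centers, indexing="ij"), axis=-1)
    return sdf(grid) <= 0.0

def sample_points_on_mesh(vertices, faces, num_points, rng):
    """Sampling: pick triangles with probability proportional to area, then
    draw a uniform point inside each picked triangle."""
    a, b, c = (vertices[faces[:, i]] for i in range(3))
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    tri = rng.choice(len(faces), size=num_points, p=areas / areas.sum())
    u, v = rng.random(num_points), rng.random(num_points)
    flip = u + v > 1.0  # reflect samples that fall outside the triangle
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    return a[tri] + u[:, None] * (b[tri] - a[tri]) + v[:, None] * (c[tri] - a[tri])

rng = np.random.default_rng(0)

# Implicit function -> voxels: a sphere of radius 0.5.
voxels = evaluate_on_grid(lambda p: np.linalg.norm(p, axis=-1) - 0.5, resolution=32)

# Mesh -> point cloud: a unit square made of two triangles.
square_vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
square_faces = np.array([[0, 1, 2], [0, 2, 3]])
points = sample_points_on_mesh(square_vertices, square_faces, 1000, rng)

print(voxels.shape, voxels.mean(), points.shape)
```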



Course Road Map
Representations
• Point clouds
• Implicit representation
• Multi-view images to 3D
• Hybrid representations
• Meshes
• CAD
• Representation Conversion
Applications
• 3D perception (Encoding)
• Reconstruction (Decoding)
• Manipulation
• Generation
