Jongwook Woo
BigDAI
CalStateLA
July 16 2024
YISS Yonsei University, Korea
Jongwook Woo, PhD, jwoo5@calstatela.edu
Big Data AI Center (BigDAI)
California State University Los Angeles
Application of LLM Leveraging
Big Data
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
 Myself
 Introduction To Big Data
 Scalable Data Intensive Computing
 Applications with ML & DL
 Summary
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Myself
Experience:
Since 2002, Professor at California State University Los Angeles
– Director at BigDAI (Big Data AI Center)
– PhD/MS in 2001/1998: Computer Science and Engineering at USC
– BS/MS in 1989/1991: Electronic Engineering, Yonsei University
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Myself: S/W Development Lead
http://www.mobygames.com/game/windows/matrix-online/credits
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Myself: Collaborations
SOFTZEN
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Collaboration with NVidia, Databricks, Oracle,
Amazon, CDH, Yonsei using Big Data AI
https://www.cloudera.com/more/customers/csula.html
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
 Myself
 Introduction To Big Data
 Scalable Data Intensive Computing
 Applications with ML & DL
 Summary
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Data Issues
Large-Scale data
 Hourly, Daily, …: Tera-Byte (1012), Peta-byte (1015)
–Because of …
• IoT (Streaming data, Sensor Data) in SmartX
• Social Computing, smart phone, online game, web, Bioinformatics, …
Legacy approach
Too expensive to store and process large scale data
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Data Handling: One Approach
 Make a new powerful systems with the bigger and
expensive
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Data Handling: One Approach
 IBM Mainframe Z15: T02 starts with $160,000
https://techcrunch.com/2019/09/12/the-mainframe-business-is-alive-and-well-as-ibm-announces-new-z15/
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Data Handling: Another Way
Less Expensive and More Scalable
From 2017 Korean Blockbuster
Movie, “The Fortress”
(남한산성)
AD 1409 (Year 9 of King Tae-Jong, Chosun Dynasty, Korea) By Choi family:
최해산(崔海山), 아버지 최무선(崔茂宣)
[Ref] 조선의 비밀 병기 : 총통기 화차(銃筒機火車)|작성자 도심
 Use existing without spending more expenses
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
H/W: Leveraging Big Data Cluster with GPU
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Data Handling: Another Way
Less Expensive and More Scalable
https://www.nextplatform.com/2021/09/15/the-endless-pursuit-of-scale-at-linkedin/
https://engineering.linkedin.com/blog/2021/scaling-linkedin-s-hadoop-yarn-cluster-beyond-10-000-nodes
Linkedin Hadoop Spark Cluster:
10,000 nodes with 500PB of capacity through 2020
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Big Data in Systems
Big Data: Definition Again
Non-expensive platform, which is distributed parallel
computing systems and that can store a large-scale data
and process it in parallel
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Interviewed with Cloudera, Nov 12 2014
https://www.youtube.com/watch?v=ZvrHxsypeUE
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
 Myself
 Introduction To Big Data
 Scalable Data Intensive Computing
 Applications with ML & DL
 Summary
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Jams and other traffic incidents reported
by users in Dec 2017 – Jan 2018:
(Dalyapraz Dauletbak)
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Dashboard : COVID 19 & Vaccination
https://www.calstatela.edu/centers/hipic/covid-19-us-ca-confirmed-prediction
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Talked about COVID 19 at Arirang TV, 08/27/2020
Invited Talk about COVID 19 and Post-Pandemic era using Big Data AI at Arirang TV in Korea,
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
 Myself
 Introduction To Big Data
 Scalable Data Intensive Computing
 Applications with ML & DL
 Summary
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Big Data Prediction
Big Data Science
How to predict the future trend and pattern with the massive
dataset?
Deep
Learning
Machine
Learning
AI
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Deep Learning Example with Images
GAN
Neural Style Transfer with A Neural Algorithm of Artistic Style (Gatys et
al.).
The Bathers, Korea, Yoon-Bok Shin 1858 - ?) Two Young Girls at the Piano, Auguste Renoir, French, 1892
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Deep Learning Example (Cont’d)
GAN
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
 Myself
 Introduction To Big Data
 Scalable Data Intensive Computing
 Applications with ML & DL
 Summary
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Study ML/DL in Big Data AI
 Text Processing and Classification of Biz Reviews Data
 Amazon Products Ratings/Review, OpenTable Helpfulness,
Transaction Fraud Detection, Traffics
Product Recommendation
Airbnb Listings, Amazon Product
Price prediction
Used Cars, Flight, Liquor Sales
Community Service
COVID 19: Confirmed Cases, Vaccine effects
History books/data
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Example: Text Data of History Books
『明史』卷41 志17 地理2 山東 遼東都指揮使司 鐵嶺衞:
 고려와의 경계인 철령성에 철령위를 설치하다:
https://bit.ly/3VefgLl
– Copy and paste the contents to Chat GPT or Copilot/ChatGPT
• For Example:
다음의 한자를 한국말로 번역해줘: "朝鮮, 箕子所封國也.
漢以前曰朝鮮. 始爲燕人衞滿所據, 漢武帝平之,
置眞番·臨屯·樂浪·玄菟四郡. 漢末, 有扶餘人高氏據其地, 改國號曰高麗,
又曰高句麗, 居平壤, 即樂浪也. 已, 爲唐所破, 東徙. 後唐時, 王建代高氏,
兼併新羅·百濟地, 徙居松岳, 曰東京, 而以平壤爲西京. 其國北鄰契丹,
西則女直, 南曰日本. 元至元中, 西京內屬, 置東寧路總管府,
盡慈嶺爲界.“
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Example: Text Data of History Books (Cont’d)
 『明史』卷41 志17 地理2 山東 遼東
都指揮使司 鐵嶺 says “….西京內屬,
置東寧路總管府, 盡慈嶺爲界.”
 ChatGPT translates it “… 왕건이 고씨를
대신하여 신라와 백제의 땅을 흡수하고
성곡에 이주하여 동경(東京)이라 하였고,
평양을 서경으로 하였습니다. 그 나라는
북쪽으로는 거란, 서쪽으로는 녀직,
남쪽으로는 일본과 접합니다. 원나라
시대 중에는 서경이 내려앉아 동녕로
총관부를 설치하여 자치선으로 하였고,
전체적으로는 자평령을 경계로
하였습니다.”
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Example (Cont’d)
1. Open Baidu and search for “東寧路”
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Example (Cont’d)
2. Copy the address of “東寧路” shown in Baidu: 东宁路
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Example (Cont’d)
3. Paste the address 东宁路 吉林省辽源市东丰县 To Google Map
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Example (Cont’d)
4. Compare what we have now and what we found at Google Map
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
NLP Rating Classifying Models in Open Table
 Performance Comparison of the Models
 LR has the shortest computation time
 precision and AUC of DistlBERT are 71.2 % and 70.5 %
– 4 - 14 % better Precision and AUC than traditional models
 Slightly better Precision and Accuracy than BERT
– 0.7 and 1%, respectively
Algorithm Precision AUC
Computing Time
log (sec)
LR 0.682 0.644 201
RF 0.667 0.524 634
GBT 0.637 0.616 3,141
BERT 0.707 0.694 11,936
DistilBERT 0.712 0.705 7,048
“Comparing NLP Models with LLM Classifying OpenTable Dataset”, H. Lin, S. Lee, J. Park, E. Lim, J. Woo, KrAIS 2024
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
 Myself
 Introduction To Big Data
 Scalable Data Intensive Computing
 Applications with ML & DL
 Summary
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Summary
 Big Data platform for Large Scale Data
 LLM
 ChatGPT, Gemini, Bing: Your secretary, Translator, Advisor
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Questions?

More Related Content

PPTX
The Importance of Open Innovation in AI era
PPTX
Scalable Predictive Analysis and The Trend with Big Data & AI
PPTX
Rating Prediction using Deep Learning and Spark
PPTX
Introduction to Big Data and AI for Business Analytics and Prediction
PDF
Big Data and Predictive Analysis
PPTX
History and Trend of Big Data and Deep Learning
PPTX
Introduction to Big Data and its Trends
PPTX
Comparing Scalable Predictive Analysis using Spark XGBoost Platforms
The Importance of Open Innovation in AI era
Scalable Predictive Analysis and The Trend with Big Data & AI
Rating Prediction using Deep Learning and Spark
Introduction to Big Data and AI for Business Analytics and Prediction
Big Data and Predictive Analysis
History and Trend of Big Data and Deep Learning
Introduction to Big Data and its Trends
Comparing Scalable Predictive Analysis using Spark XGBoost Platforms

Similar to History and Application of LLM Leveraging Big Data (20)

PPTX
AI on Big Data
PPTX
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML
PPTX
Big Data and Data Intensive Computing on Networks
PDF
How To Use Artificial Intelligence (AI) in History
PDF
2014 15 IT trend
PPTX
Ai open powermeetupmarch25th
 
PDF
Benefiting from Semantic AI along the data life cycle
PPTX
Ai open powermeetupmarch25th
 
PPTX
Ai open powermeetupmarch25th
 
PDF
Strata Conference NYC 2013
PPTX
Traffic Data Analysis and Prediction using Big Data
PPTX
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
PDF
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
PPTX
Big Data and Data Intensive Computing: Use Cases
PPTX
Chek mate geolocation analyzer
PDF
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
PDF
Analyzing Data through Probabilistic Modeling in Statistics 1st Edition Dariu...
PDF
Dataintensive Computing Architectures Algorithms And Applications Deborah K G...
PPT
Data Science in the Real World: Making a Difference
PDF
Understanding the New World of Cognitive Computing
AI on Big Data
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML
Big Data and Data Intensive Computing on Networks
How To Use Artificial Intelligence (AI) in History
2014 15 IT trend
Ai open powermeetupmarch25th
 
Benefiting from Semantic AI along the data life cycle
Ai open powermeetupmarch25th
 
Ai open powermeetupmarch25th
 
Strata Conference NYC 2013
Traffic Data Analysis and Prediction using Big Data
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Big Data and Data Intensive Computing: Use Cases
Chek mate geolocation analyzer
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
Analyzing Data through Probabilistic Modeling in Statistics 1st Edition Dariu...
Dataintensive Computing Architectures Algorithms And Applications Deborah K G...
Data Science in the Real World: Making a Difference
Understanding the New World of Cognitive Computing
Ad

More from Jongwook Woo (18)

PPTX
Machine Learning in Quantum Computing
PPTX
Introduction to Big Data: Smart Factory
PDF
Whose tombs are so called Nakrang tombs in Pyungyang? By Moon Sungjae
PDF
President Election of Korea in 2017
PPTX
Big Data Trend with Open Platform
PPTX
Big Data Trend and Open Data
PPTX
Big Data Platform adopting Spark and Use Cases with Open Data
PPTX
Big Data Analysis in Hydrogen Station using Spark and Azure ML
PPTX
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
PPTX
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
PPTX
Introduction to Spark: Data Analysis and Use Cases in Big Data
PPTX
Big Data Analysis and Industrial Approach using Spark
PPTX
Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...
PDF
Spark tutorial @ KCC 2015
PPTX
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
PPTX
Introduction to Hadoop, Big Data, Training, Use Cases
PPTX
Introduction To Big Data and Use Cases using Hadoop
PPTX
Introduction To Big Data and Use Cases on Hadoop
Machine Learning in Quantum Computing
Introduction to Big Data: Smart Factory
Whose tombs are so called Nakrang tombs in Pyungyang? By Moon Sungjae
President Election of Korea in 2017
Big Data Trend with Open Platform
Big Data Trend and Open Data
Big Data Platform adopting Spark and Use Cases with Open Data
Big Data Analysis in Hydrogen Station using Spark and Azure ML
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
Introduction to Spark: Data Analysis and Use Cases in Big Data
Big Data Analysis and Industrial Approach using Spark
Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...
Spark tutorial @ KCC 2015
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Hadoop, Big Data, Training, Use Cases
Introduction To Big Data and Use Cases using Hadoop
Introduction To Big Data and Use Cases on Hadoop
Ad

Recently uploaded (20)

PPTX
Machine Learning and working of machine Learning
PPTX
Chapter security of computer_8_v8.1.pptx
PPTX
Hushh Hackathon for IIT Bombay: Create your very own Agents
PPTX
cp-and-safeguarding-training-2018-2019-mmfv2-230818062456-767bc1a7.pptx
PPTX
865628565-Pertemuan-2-chapter-03-NUMERICAL-MEASURES.pptx
PPTX
langchainpptforbeginners_easy_explanation.pptx
PPTX
chuitkarjhanbijunsdivndsijvndiucbhsaxnmzsicvjsd
PPTX
1 hour to get there before the game is done so you don’t need a car seat for ...
PDF
Concepts of Database Management, 10th Edition by Lisa Friedrichsen Test Bank.pdf
PPTX
Sheep Seg. Marketing Plan_C2 2025 (1).pptx
PPTX
machinelearningoverview-250809184828-927201d2.pptx
PDF
CS3352FOUNDATION OF DATA SCIENCE _1_MAterial.pdf
PDF
©️ 02_SKU Automatic SW Robotics for Microsoft PC.pdf
PPTX
Hushh.ai: Your Personal Data, Your Business
PPT
expt-design-lecture-12 hghhgfggjhjd (1).ppt
PDF
book-34714 (2).pdfhjkkljgfdssawtjiiiiiujj
PDF
Hikvision-IR-PPT---EN.pdfSADASDASSAAAAAAAAAAAAAAA
PDF
The Role of Pathology AI in Translational Cancer Research and Education
PPTX
AI_Agriculture_Presentation_Enhanced.pptx
PPTX
transformers as a tool for understanding advance algorithms in deep learning
Machine Learning and working of machine Learning
Chapter security of computer_8_v8.1.pptx
Hushh Hackathon for IIT Bombay: Create your very own Agents
cp-and-safeguarding-training-2018-2019-mmfv2-230818062456-767bc1a7.pptx
865628565-Pertemuan-2-chapter-03-NUMERICAL-MEASURES.pptx
langchainpptforbeginners_easy_explanation.pptx
chuitkarjhanbijunsdivndsijvndiucbhsaxnmzsicvjsd
1 hour to get there before the game is done so you don’t need a car seat for ...
Concepts of Database Management, 10th Edition by Lisa Friedrichsen Test Bank.pdf
Sheep Seg. Marketing Plan_C2 2025 (1).pptx
machinelearningoverview-250809184828-927201d2.pptx
CS3352FOUNDATION OF DATA SCIENCE _1_MAterial.pdf
©️ 02_SKU Automatic SW Robotics for Microsoft PC.pdf
Hushh.ai: Your Personal Data, Your Business
expt-design-lecture-12 hghhgfggjhjd (1).ppt
book-34714 (2).pdfhjkkljgfdssawtjiiiiiujj
Hikvision-IR-PPT---EN.pdfSADASDASSAAAAAAAAAAAAAAA
The Role of Pathology AI in Translational Cancer Research and Education
AI_Agriculture_Presentation_Enhanced.pptx
transformers as a tool for understanding advance algorithms in deep learning

History and Application of LLM Leveraging Big Data

  • 1. Jongwook Woo BigDAI CalStateLA July 16 2024 YISS Yonsei University, Korea Jongwook Woo, PhD, [email protected] Big Data AI Center (BigDAI) California State University Los Angeles Application of LLM Leveraging Big Data
  • 2. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Scalable Data Intensive Computing  Applications with ML & DL  Summary
  • 3. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Myself Experience: Since 2002, Professor at California State University Los Angeles – Director at BigDAI (Big Data AI Center) – PhD/MS in 2001/1998: Computer Science and Engineering at USC – BS/MS in 1989/1991: Electronic Engineering, Yonsei University
  • 4. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Myself: S/W Development Lead http://www.mobygames.com/game/windows/matrix-online/credits
  • 5. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Myself: Collaborations SOFTZEN
  • 6. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Collaboration with NVidia, Databricks, Oracle, Amazon, CDH, Yonsei using Big Data AI https://www.cloudera.com/more/customers/csula.html
  • 7. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Scalable Data Intensive Computing  Applications with ML & DL  Summary
  • 8. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Issues Large-Scale data  Hourly, Daily, …: Tera-Byte (1012), Peta-byte (1015) –Because of … • IoT (Streaming data, Sensor Data) in SmartX • Social Computing, smart phone, online game, web, Bioinformatics, … Legacy approach Too expensive to store and process large scale data
  • 9. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: One Approach  Make a new powerful systems with the bigger and expensive
  • 10. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: One Approach  IBM Mainframe Z15: T02 starts with $160,000 https://techcrunch.com/2019/09/12/the-mainframe-business-is-alive-and-well-as-ibm-announces-new-z15/
  • 11. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: Another Way Less Expensive and More Scalable From 2017 Korean Blockbuster Movie, “The Fortress” (남한산성) AD 1409 (Year 9 of King Tae-Jong, Chosun Dynasty, Korea) By Choi family: 최해산(崔海山), 아버지 최무선(崔茂宣) [Ref] 조선의 비밀 병기 : 총통기 화차(銃筒機火車)|작성자 도심  Use existing without spending more expenses
  • 12. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA H/W: Leveraging Big Data Cluster with GPU
  • 13. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: Another Way Less Expensive and More Scalable https://www.nextplatform.com/2021/09/15/the-endless-pursuit-of-scale-at-linkedin/ https://engineering.linkedin.com/blog/2021/scaling-linkedin-s-hadoop-yarn-cluster-beyond-10-000-nodes Linkedin Hadoop Spark Cluster: 10,000 nodes with 500PB of capacity through 2020
  • 14. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Big Data in Systems Big Data: Definition Again Non-expensive platform, which is distributed parallel computing systems and that can store a large-scale data and process it in parallel
  • 15. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Interviewed with Cloudera, Nov 12 2014 https://www.youtube.com/watch?v=ZvrHxsypeUE
  • 16. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Scalable Data Intensive Computing  Applications with ML & DL  Summary
  • 17. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Jams and other traffic incidents reported by users in Dec 2017 – Jan 2018: (Dalyapraz Dauletbak)
  • 18. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Dashboard : COVID 19 & Vaccination https://www.calstatela.edu/centers/hipic/covid-19-us-ca-confirmed-prediction
  • 19. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Talked about COVID 19 at Arirang TV, 08/27/2020 Invited Talk about COVID 19 and Post-Pandemic era using Big Data AI at Arirang TV in Korea,
  • 20. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Scalable Data Intensive Computing  Applications with ML & DL  Summary
  • 21. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Big Data Prediction Big Data Science How to predict the future trend and pattern with the massive dataset? Deep Learning Machine Learning AI
  • 22. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Deep Learning Example with Images GAN Neural Style Transfer with A Neural Algorithm of Artistic Style (Gatys et al.). The Bathers, Korea, Yoon-Bok Shin 1858 - ?) Two Young Girls at the Piano, Auguste Renoir, French, 1892
  • 23. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Deep Learning Example (Cont’d) GAN
  • 24. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Scalable Data Intensive Computing  Applications with ML & DL  Summary
  • 25. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Study ML/DL in Big Data AI  Text Processing and Classification of Biz Reviews Data  Amazon Products Ratings/Review, OpenTable Helpfulness, Transaction Fraud Detection, Traffics Product Recommendation Airbnb Listings, Amazon Product Price prediction Used Cars, Flight, Liquor Sales Community Service COVID 19: Confirmed Cases, Vaccine effects History books/data
  • 26. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Example: Text Data of History Books 『明史』卷41 志17 地理2 山東 遼東都指揮使司 鐵嶺衞:  고려와의 경계인 철령성에 철령위를 설치하다: https://bit.ly/3VefgLl – Copy and paste the contents to Chat GPT or Copilot/ChatGPT • For Example: 다음의 한자를 한국말로 번역해줘: "朝鮮, 箕子所封國也. 漢以前曰朝鮮. 始爲燕人衞滿所據, 漢武帝平之, 置眞番·臨屯·樂浪·玄菟四郡. 漢末, 有扶餘人高氏據其地, 改國號曰高麗, 又曰高句麗, 居平壤, 即樂浪也. 已, 爲唐所破, 東徙. 後唐時, 王建代高氏, 兼併新羅·百濟地, 徙居松岳, 曰東京, 而以平壤爲西京. 其國北鄰契丹, 西則女直, 南曰日本. 元至元中, 西京內屬, 置東寧路總管府, 盡慈嶺爲界.“
  • 27. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Example: Text Data of History Books (Cont’d)  『明史』卷41 志17 地理2 山東 遼東 都指揮使司 鐵嶺 says “….西京內屬, 置東寧路總管府, 盡慈嶺爲界.”  ChatGPT translates it “… 왕건이 고씨를 대신하여 신라와 백제의 땅을 흡수하고 성곡에 이주하여 동경(東京)이라 하였고, 평양을 서경으로 하였습니다. 그 나라는 북쪽으로는 거란, 서쪽으로는 녀직, 남쪽으로는 일본과 접합니다. 원나라 시대 중에는 서경이 내려앉아 동녕로 총관부를 설치하여 자치선으로 하였고, 전체적으로는 자평령을 경계로 하였습니다.”
  • 28. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Example (Cont’d) 1. Open Baidu and search for “東寧路”
  • 29. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Example (Cont’d) 2. Copy the address of “東寧路” shown in Baidu: 东宁路
  • 30. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Example (Cont’d) 3. Paste the address 东宁路 吉林省辽源市东丰县 To Google Map
  • 31. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Example (Cont’d) 4. Compare what we have now and what we found at Google Map
  • 32. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA NLP Rating Classifying Models in Open Table  Performance Comparison of the Models  LR has the shortest computation time  precision and AUC of DistlBERT are 71.2 % and 70.5 % – 4 - 14 % better Precision and AUC than traditional models  Slightly better Precision and Accuracy than BERT – 0.7 and 1%, respectively Algorithm Precision AUC Computing Time log (sec) LR 0.682 0.644 201 RF 0.667 0.524 634 GBT 0.637 0.616 3,141 BERT 0.707 0.694 11,936 DistilBERT 0.712 0.705 7,048 “Comparing NLP Models with LLM Classifying OpenTable Dataset”, H. Lin, S. Lee, J. Park, E. Lim, J. Woo, KrAIS 2024
  • 33. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Scalable Data Intensive Computing  Applications with ML & DL  Summary
  • 34. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Summary  Big Data platform for Large Scale Data  LLM  ChatGPT, Gemini, Bing: Your secretary, Translator, Advisor
  • 35. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Questions?