Stein Markup 1.1 
Markup Languages
Yaakov J. Stein
Chief Scientist
RAD Data Communications
What do I do?
I digest, edit and produce documents
• business letters
• email
• meeting summaries
• proposals
• reports
• requirement specifications
• project plans
• web pages
• research articles
• review articles
• books
What do others do?
Pretty much the same
US corporations produce >100 billion documents per year
90% of a modern institution’s information is in documents
>50% of a typical corporation’s effort involves documents
That’s why word processing software was expected to bring efficiency increases
But it didn’t!
Word processing?
PROs
• makes nicer looking documents
• expedites document sharing during creation
CONs
• typically 30% of effort goes to formatting and reformatting
• doesn’t increase information accessibility
• doesn’t facilitate information mining
Databases?
The natural alternative to a document is a database
PROs
• increase information accessibility
• facilitate information mining
CONs
• not human readable
• inflexible format
The solution
What we really want is to write unconstrained text
but to have information retrieval as well!
• Method 1: Automatic text analysis
– AI program analyzes text
– Recognizes document structure, sentence syntax
– Performs gisting, facilitates information mining
– Complete solution is equivalent to passing the Turing test
• Method 2: Manual markup
– Document author is responsible for marking
– Clarifies document structure
– Enables automated retrieval of selected information
– Suggests presentation format
Why is text analysis hard?
The man cried FIRE ! 
The man cried FIRE the gun ! 
The man cried FIRE the gun maker !
Are MLs computer languages?
There are many different types of computer languages:
procedural languages
    for (n = 0; n < 10; n++)
        if (n > 5) printf("markup languages are fun!\n");
graphic languages
    newpath
    0 0 moveto 0 1 lineto 1 1 lineto 1 0 lineto
    closepath fill
database languages
    SELECT book FROM biblio WHERE subject='DSP' AND author='STEIN';
logical languages
    useful(dsp). useful(hardware). fun(dsp). fun(web).
    interesting(X) :- useful(X), fun(X).
    ?- interesting(X).
They are!
Markup languages do not instruct the computer directly, as procedural languages do,
but indirectly, as logical languages do
They do this by using:
elements and attributes (tags)
entities
text
<BOOK SUBJECT="dsp">
<TITLE FORMAT="short">DSP-CSP</TITLE>
<AUTHOR>J. Stein</AUTHOR>
This is a great book!
&standard-disclaimer;
</BOOK>
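To make the four building blocks concrete, here is a minimal Python sketch that reads the BOOK fragment with the standard library's ElementTree and reports its elements, attributes and text. The DOCTYPE line and the replacement text of the standard-disclaimer entity are assumptions added for illustration; the slide does not say what the entity expands to.

```python
import xml.etree.ElementTree as ET

# Assumption: the slide never gives the entity's replacement text, so the
# DOCTYPE below declares a made-up one purely for illustration.
DOC = """<!DOCTYPE BOOK [
  <!ENTITY standard-disclaimer "Opinions are those of the reviewer.">
]>
<BOOK SUBJECT="dsp">
<TITLE FORMAT="short">DSP-CSP</TITLE>
<AUTHOR>J. Stein</AUTHOR>
This is a great book!
&standard-disclaimer;
</BOOK>"""


def summarize(doc):
    """Return the element names, attributes and running text of a document."""
    root = ET.fromstring(doc)
    return {
        # every element, in document order
        "elements": [el.tag for el in root.iter()],
        # only elements that actually carry attributes
        "attributes": {el.tag: dict(el.attrib) for el in root.iter() if el.attrib},
        # itertext() walks all character data, including the expanded entity
        "text": " ".join("".join(root.itertext()).split()),
    }


summary = summarize(DOC)
print(summary)
```

The parser, not the author, expands the entity reference, which is the sense in which markup instructs the computer indirectly.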
Some markup element functions
• Structural
– Clarifies document structure
– Delineates document parts
• Descriptive (informative)
– Indicates
– Facilitates information retrieval
• Presentational (display)
– Presents information in nice format
– Helps human readability
• Referential (links, applications)
– Provides hypertext links
– Launches applications
Structural Markup
<HEADING>September 1, 2000</HEADING> 
<GREETING>Dear Prof. Stein, </GREETING> 
<BODY> 
I would like to tell you how much I enjoyed reading your new text 
“Digital Signal Processing, A Computer Science Perspective”. 
I hope we will be able to meet at the next conference. 
</BODY> 
<SIGNATURE> 
Sincerely, 
Dee Espy 
</SIGNATURE>
Descriptive Markup
<DATE>September 1, 2000</DATE> 
Dear <PERSON>Prof. Stein,</PERSON> 
I would like to tell you how much I enjoyed reading your new text 
<BOOK> 
“Digital Signal Processing, A Computer Science Perspective”. 
</BOOK> 
I hope we will be able to meet at the next <EVENT>conference.</EVENT> 
Sincerely, 
<PERSON>Dee Espy</PERSON>
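Descriptive tags are what make automated retrieval possible. The sketch below, in Python with the standard library's ElementTree, pulls every PERSON or DATE out of the letter. The enclosing LETTER element and the helper name find_all are assumptions introduced here so that the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# Assumption: a LETTER root element is added so the fragment is one document.
LETTER = """<LETTER>
<DATE>September 1, 2000</DATE>
Dear <PERSON>Prof. Stein,</PERSON>
I would like to tell you how much I enjoyed reading your new text
<BOOK>"Digital Signal Processing, A Computer Science Perspective".</BOOK>
I hope we will be able to meet at the next <EVENT>conference.</EVENT>
Sincerely,
<PERSON>Dee Espy</PERSON>
</LETTER>"""


def find_all(xml_text, tag):
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(tag)]


people = find_all(LETTER, "PERSON")
dates = find_all(LETTER, "DATE")
print(people, dates)
```

No text analysis is needed: the author has already said which strings are people and which are dates.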
Presentational Markup
<RIGHT-JUSTIFY>September 1, 2000</RIGHT-JUSTIFY> 
<BOLD>Dear Prof. Stein,</BOLD> 
I would like to tell you how much I enjoyed reading your new text 
<UNDERLINE> 
“Digital Signal Processing, A Computer Science Perspective”. 
</UNDERLINE> 
I hope we will be able to meet at the next 
<BLINK>conference.</BLINK> 
Sincerely, 
<IMAGE SRC="dee-signature.jpg" ALIGN="left">
<FONT FACE="Times-Roman">Dee Espy</FONT>
Relational Markup
<today xlink:form="simple" href="date" actuate="auto">
Dear Prof. Stein,
I would like to tell you how much I enjoyed reading your new text
<A HREF="www.amazon.com/exec/obidos/ASIN/04712954">
“Digital Signal Processing, A Computer Science Perspective”.
</A>
I hope we will be able to meet at the next
<A HREF="conference">conference.</A>
Sincerely,
<IMAGE SRC="dee-signature.jpg" ALIGN="left">
<A HREF="mailto:dee@dee-epsy.net">Dee Espy</A>
Generalized Markup Language
• William Tunnicliffe, Stanley Rice [1960s]
(independently) invent the idea of a structural markup language
Problem: need a different ML for each type of document
(letter, report, article, book, etc.)
• Charles Goldfarb, Edward Mosher, Raymond Lorie (IBM) [1973]
invent Generalized Markup Language (GML)
Solution: use a metalanguage
Document Type Definition (DTD) defines tags
IBM marked up 90% of its documents with GML
With GML structure is evident
Library
  Novels
  Journals
  Textbooks
    Algebraic zoology
    Botanical history
    Computer poetry
    DSP
      DSP-CSP
        Title
          Full: Digital Signal Processing, a Computer Science Perspective
          Short: DSPCSP
        Author
          Name: Jonathan (Y) Stein
          Association: RAD Data Comm.
        Publication
          Publisher: John Wiley
          Year: 2000
          Location: New York
          ISBN: 04712954
      DSP just for fun
    Elementary QED
Standard Generalized Markup Language
Problems with GML:
– No validating parser
– Not portable (between computer systems)
Solution:
SGML
ANSI [1978]
ISO/IEC 8879 [1986] (Intl Org for Standardization / Intl Electrotechnical Commission)
JTC1/SC34/WG1 (WG 1 of SubCommittee 34 of Joint Technical Committee 1)
For presentation:
Document Style Semantics and Specification Language (DSSSL)
SGML - cont.
If SGML is so good, why doesn’t anyone use it?
• Complexity
– base standard >500 pages
– SGML is a metalanguage
– writing a DTD is complex programming
– marked-up text is hard to read
– DSSSL adds to complexity
• Inflexibility - requires absolute conformity
– assumes only one correct way to mark up
– constrains author to dictated structure
– not good at capturing author’s structure
HyperText Markup Language
CERN (particle physics institute in Switzerland) was an early Internet adopter
• Used extensively for collaboration (articles have long author lists)
• Major problems with format incompatibility
– only straight ASCII worked reliably
Tim Berners-Lee (computer specialist) defined requirements
• simplicity (couldn’t expect physicists to use SGML)
• freedom (didn’t need validation, let browser ignore bad markup)
• hypertext links (including to documents over the Internet)
• presentational markup (papers must look nice - authors were used to TeX)
Solution: HTML - a specific application of SGML (not a metalanguage)
HTML versions
HTML 1.0 (1989) Berners-Lee original CERN version 
hypertext, images, head+body structure, presentational markup 
HTML 2.0 (1994) IETF standard - RFC 1866 
added lists, forms, etc. 
HTML 3.2 (1997) W3C recommendation (incorporates Netscape extensions) 
added tables, applets, super/sub-scripts 
HTML 4.0 (1997) W3C recommendation (and similar ISO/IEC 15445) 
minimizes presentational markup 
XHTML 1.0 (2000) present W3C recommendation 
reformulates HTML in XML
HTML document structure
<HTML> 
<HEAD> 
global definitions such as 
<TITLE>Web page title</TITLE> 
</HEAD> 
<BODY> 
marked-up text 
</BODY> 
</HTML>
Some HTML (body) elements
Markup                          Rendered as
• <H1>Level 1 Heading</H1>      Level 1 Heading
• <H2>Level 2 Heading</H2>      Level 2 Heading
• <H3>Level 3 Heading</H3>      Level 3 Heading
• <EM> emphasized </EM>         emphasized
• <P> Paragraph </P>            Paragraph
• <A HREF=url>link</A>          link
• <UL>
    <LI> item 1 </LI>           • item 1
    <LI> item 2 </LI>           • item 2
  </UL>
• <OL>
    <LI> item 1 </LI>           1 item 1
    <LI> item 2 </LI>           2 item 2
  </OL>
• <IMG SRC=url>
Problems with HTML
Presentational aspects have predominated
– <B> bold text </B>
– <BLINK> blinking text </BLINK>
– <FONT COLOR="red"> red text </FONT>
Practically no descriptive markup
– Search engines are reduced to flat text search
– Search by topic only through keywords or portals
Not extensible
– Can’t add new tags
– Unknown tags are ignored
Links are relatively simple
– Usually user action is required (except IMG)
– Only a full document (with offset) is linkable
– Link management is a logistical nightmare
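A short Python sketch using the standard library's html.parser shows what "flat text search" amounts to: once the tags are discarded, presentational markup and unknown tags alike leave nothing behind but the words. The class name TextOnly and the function flat_text are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and drop every tag, known or unknown."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def flat_text(html):
    """Return the text a tag-blind indexer would see."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # collapse runs of whitespace left behind by the removed tags
    return " ".join("".join(parser.chunks).split())


sample = '<B>bold text</B> and <FONT COLOR="red">red text</FONT> by <PERSON>Dee Espy</PERSON>'
print(flat_text(sample))
```

The PERSON tag is not HTML, yet the parser accepts it silently and the information that "Dee Espy" is a person is lost.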
Not everything is HTML
Due to HTML limitations, other tools are also used:
• Multimedia extensions
– (dynamic) gif, jpg, …
– streaming audio
• Common Gateway Interface
– generate HTML on-the-fly
– Perl, C, …
• Server Push - Server Pull
• JavaScript
• Java
eXtensible Markup Language
• Simplified (best parts of) SGML (subset of features)
• Flexible content management tool
• W3C recommendation(s)
• Extensible - can add new elements (even without DTD)
• Easy to create special purpose languages (with DTD/SCHEMA)
• Includes HTML-like hypertext links
– and extensions (XLINK, XPOINTER)
• The future of the web!
XML - an Example
<?xml version="1.0" standalone="yes"?>
<bibliography>
<book isbn="04712954">
<title>Digital Signal Processing: a Computer Science Perspective</title> 
<author>Jonathan (Y) Stein</author> 
<publisher>John Wiley and Sons</publisher> 
</book> 
<article> 
<title>False Alarm Reduction for ASR and OCR</title> 
<author>Yaakov Stein</author> 
<proceedings>Tenth AICVNN Symposium</proceedings> 
<pages>195-200</pages> 
</article> 
... 
</bibliography>
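Because the entries are tagged, a program can query the bibliography much as the earlier SQL line queried a database. The following Python sketch, using the standard library's ElementTree, lists every entry whose author field contains a given name. The function name entries_by is an assumption, and the elided "..." entries are simply left out of the sample.

```python
import xml.etree.ElementTree as ET

BIBLIO = """<?xml version="1.0" standalone="yes"?>
<bibliography>
<book isbn="04712954">
<title>Digital Signal Processing: a Computer Science Perspective</title>
<author>Jonathan (Y) Stein</author>
<publisher>John Wiley and Sons</publisher>
</book>
<article>
<title>False Alarm Reduction for ASR and OCR</title>
<author>Yaakov Stein</author>
<proceedings>Tenth AICVNN Symposium</proceedings>
<pages>195-200</pages>
</article>
</bibliography>"""


def entries_by(xml_text, name):
    """Return (entry type, title) for each entry whose author contains `name`."""
    root = ET.fromstring(xml_text)
    return [
        (entry.tag, entry.findtext("title"))
        for entry in root               # direct children: book, article, ...
        if name in entry.findtext("author", default="")
    ]


hits = entries_by(BIBLIO, "Stein")
print(hits)
```

The text stays readable by a person, yet it answers the kind of question that would otherwise need a database.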
What can we do with an XML file?
• Check if well-formed
• Check if valid (against DTD or schema)
• Display “as-is” in browser
• Parse in special-purpose program (SAX, DOM)
• Process (XSL) to XML, HTML, etc.
• Display after processing
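Two of these operations can be sketched with Python's standard library alone: a well-formedness check (the parser either accepts the text or raises an error) and an event-driven SAX pass that counts tags. The names is_well_formed, TagCounter and count_tags are hypothetical; validation against a DTD or schema needs a validating parser and is not shown.

```python
import xml.sax
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if `text` parses as XML; says nothing about validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCounter(xml.sax.ContentHandler):
    """SAX handler: called once per start tag while the parser streams."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(text):
    handler = TagCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.counts


print(is_well_formed("<a><b/></a>"))          # properly nested
print(is_well_formed("<a><b></a>"))           # <b> is never closed
print(is_well_formed("<book isbn=1></book>"))  # unquoted attribute value
print(count_tags("<a><b/><b/></a>"))
```

SAX suits large files because nothing is kept in memory beyond what the handler stores; a DOM-style parser builds the whole tree first.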
Wireless Markup Language
Markup language element of the Wireless Application Protocol (WAP)
WAP Forum (1997)
– Ericsson, Motorola, Nokia, Unwired Planet (phone.com)
– bring Internet to cellular phone users
– re-use fundamental Internet concepts (TCP/IP, HTTP, HTML, JavaScript)
  but adapted to lower bandwidth, smaller screen,
  limited input facilities, limited computational resources
– applications scale across transport options (GSM, TDMA, CDMA, 3G)
  and device types (mobile phones, personal assistants)
WML Philosophy
Defined using XML
Transported in compressed binary (for bandwidth reduction)
Applications are modeled as decks of cards
Features:
– Actions (OK, navigation, help) can be performed
– Hyperlinks (like in HTML)
– String variables
– Timers
– wbmp images (B&W)
– Select boxes, forms (for input)
– WMLScript (like JavaScript)
WML structure
<?xml version="1.0"?>
<!DOCTYPE wml …>
<wml>
<card>
<p>
text
</p>
<p>
text
</p>
</card>
<card>
...
</card>
</wml>
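Since a deck is ordinary XML, the deck-of-cards model can be inspected with any XML parser. This Python sketch walks a small deck and returns the paragraph text of each card. The card contents, the id values and the function name card_texts are made up for illustration, and the DOCTYPE line is omitted to keep the sample short.

```python
import xml.etree.ElementTree as ET

# Assumption: an invented two-card deck; only the wml/card/p nesting
# comes from the slide.
DECK = """<wml>
  <card id="home" title="Home">
    <p>Welcome</p>
    <p><a href="#news">News</a></p>
  </card>
  <card id="news" title="News">
    <p>No news today</p>
  </card>
</wml>"""


def card_texts(deck_xml):
    """Map each card id to the list of its paragraph texts."""
    root = ET.fromstring(deck_xml)
    return {
        card.get("id"): [
            # itertext() also picks up text nested inside <a> and similar
            " ".join("".join(p.itertext()).split())
            for p in card.iter("p")
        ]
        for card in root.iter("card")
    }


cards = card_texts(DECK)
print(cards)
```

The whole deck travels to the phone in one transfer, and the link to "#news" then moves between cards without another round trip.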
Some WML elements
• <p> </p>                                      text
• <a href=...> </a>                             hyperlink (anchor)
• <do> </do>                                    action
• <go href=.../>                                goto wml page
• <timer>                                       trigger event (units = tenths of a second)
• <input/>                                      input user text
• <prev/>                                       return to previous page
• $(…)                                          value of variable
• <img src=… />                                 display image
• <postfield name=… value=…/>                   set variable
• <select> <option> <option> </select>          select box
Some more markup languages
• VML = Vector (graphics) Markup Language
• VoiceXML
• SSML = Speech Synthesis Markup Language
• CPML = Call Policy Markup Language
• DSML = Directory Services Markup Language
• MathML = Mathematical Markup Language
• CML = Chemical Markup Language
• AML = Astronomical Markup Language
• LegalXML
• BSML = Bioinformatic Sequence Markup Language
• GedML = Genealogical Data Markup Language
• FinXML = Financial market Markup Language
• ChessML
• SDML = Signed Document Markup Language
• RELML = Real Estate Listing Markup Language
• etc. etc. etc. ...
Examples
• HTML
– html examples
• XML
– xml-file xsl-file xml
• VML
– vml-file
• WML (get M3gate emulator)
– wml examples
