Markup languages play an important role in encoding and standardizing information that is shared over the web. XML (Extensible Markup Language) is one such language: it provides a set of rules for encoding data in documents that are both human-readable and machine-readable.
To describe the structure that a document must follow, XML can use a specification called a Document Type Definition (DTD). This article covers DTD and its components in detail. So, without any delay, let us look at the DTD components.
Overview of DTD
A Document Type Definition (DTD) is a formal specification that describes the structure, legal elements, and attributes of an XML document. We can say that it acts like a rulebook that specifies the structure of, and the relationships between, the elements of an XML document. A validating parser checks a document's structure and vocabulary against these rules.
Moreover, it defines the elements, their syntax, and the rules for using them in an XML document. A well-formed XML document only has to follow the general syntax rules of XML, while a valid XML document must also conform to its DTD. Below is the syntax of a DTD written inside an XML document. An external file such as ‘sample.dtd’ contains only the declarations, without the surrounding <!DOCTYPE ... [ ]> wrapper:
<!DOCTYPE rootElement [
<!-- DTD rules are defined here -->
Declaration 1…
Declaration 2…
]>
Here, DOCTYPE is the keyword that begins the document type declaration, rootElement is the name of the root (outermost) element of the document, and the square brackets enclose the list of declarations.
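As a concrete illustration, the sketch below embeds the declarations directly in an XML document. The ‘note’ vocabulary and its element names are made up for this example and are not part of any standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- 'note' and its child element names are hypothetical, chosen only for illustration -->
<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Asha</to>
  <from>Ravi</from>
  <body>Meeting at 10 am.</body>
</note>
```

To keep the declarations in a separate file instead, the document would start with <!DOCTYPE note SYSTEM "note.dtd"> and the (assumed) file note.dtd would hold only the four <!ELEMENT ...> lines.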
What are DTD Components?
DTD components are the building blocks of an XML document as described by its DTD. They are the XML components whose syntax, validity rules, and order in the XML file are declared in the DTD. The various DTD components are described below:
Element:
An element declaration names an element and specifies what it may contain: which child elements or text, in what order, and how often. Its syntax is outlined below:
<!ELEMENT element-name content-model>
Here, ‘element-name’ is the element name or tag name, for example, head for the <head> tag. The ‘content-model’ describes the list, sequence, and occurrence of the child elements or text that the element may contain. (SGML DTDs also place a two-character ‘minimization’ entry between the name and the content model to indicate whether the start or end tag may be omitted. XML does not support tag minimization, so XML DTDs leave it out.)
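In an XML DTD, an element declaration takes the form <!ELEMENT name content-model>. The fragment below is a rough sketch of the common kinds of content model; all element names in it are hypothetical.

```xml
<!-- Fragment of a DTD; all element names are hypothetical -->
<!ELEMENT br EMPTY>                         <!-- no content allowed -->
<!ELEMENT container ANY>                    <!-- any declared elements or text -->
<!ELEMENT title (#PCDATA)>                  <!-- text only -->
<!ELEMENT address (street, city, zip)>      <!-- children in exactly this order -->
<!ELEMENT contact (phone | email)>          <!-- exactly one of the alternatives -->
<!ELEMENT para (#PCDATA | bold | italic)*>  <!-- mixed content: text and inline children -->
```

In a complete DTD, every element named inside a content model (street, city, phone, and so on) would need its own <!ELEMENT ...> declaration as well.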
Attribute:
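An attribute-list declaration names the attributes, that is, the additional properties, that an element can carry. The syntax of the Attribute component is shown below:
<!ATTLIST element-name attribute-name attribute-type attribute-value>
Here, ATTLIST is the keyword that begins an attribute-list declaration, ‘element-name’ is the name of the element that the attribute belongs to, and ‘attribute-name’ is the name of the attribute. The ‘attribute-type’ gives the type of the attribute (for example, CDATA, ID, or an enumerated list of values), and ‘attribute-value’ gives its default: either a default value in quotes or one of the keywords #REQUIRED, #IMPLIED, or #FIXED followed by a value.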
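The fragment below sketches how several attributes of one element can be declared together in a single ATTLIST, with one attribute for each kind of default. The element and attribute names are hypothetical.

```xml
<!-- Fragment of a DTD; element and attribute names are hypothetical -->
<!ELEMENT product (#PCDATA)>
<!ATTLIST product
  id       ID                  #REQUIRED
  category CDATA               #IMPLIED
  currency CDATA               #FIXED "USD"
  status   (active | retired)  "active">
<!-- id:       must be present, and its value must be unique in the document -->
<!-- category: optional, any text -->
<!-- currency: if present, it must be "USD" -->
<!-- status:   one of the two listed values, "active" when omitted -->
```

An element such as <product id="p101" status="retired">Laptop</product> would satisfy this declaration. Note that a value of type ID must be a valid XML name, so it cannot start with a digit.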
Entities:
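DTD entities define named placeholders for replacement text in an XML document. A reference of the form &entity_name; is replaced by the entity's value when the document is parsed, which is useful for repeated text and for characters that are hard to type or have a special meaning. They can be defined internally or externally. The syntax of entities is shown below: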
<!-- Internal Entity -->
<!ENTITY entity_name "entity_value">
<!-- External Entity -->
<!ENTITY entity_name SYSTEM "URI/URL">
Here, ‘entity_name’ is the name of the entity and ‘entity_value’ is its replacement text.
For an internal entity, the replacement text is specified directly in the declaration within the DTD. For an external entity, it is stored in a separate resource outside the DTD, which the declaration identifies by its URI/URL.
Now, let us look at some examples of this syntax for a better understanding.
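The difference between the two kinds of entity can be sketched in one small document. The element names, the entity names, and the file name disclaimer.txt below are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Element names, entity names, and disclaimer.txt are hypothetical -->
<!DOCTYPE book [
  <!ELEMENT book (imprint, legal)>
  <!ELEMENT imprint (#PCDATA)>
  <!ELEMENT legal (#PCDATA)>
  <!-- Internal entity: the replacement text is given inline -->
  <!ENTITY publisher "Example Press">
  <!-- External entity: the replacement text is read from another resource -->
  <!ENTITY disclaimer SYSTEM "disclaimer.txt">
]>
<book>
  <imprint>&publisher;</imprint>
  <legal>&disclaimer;</legal>
</book>
```

Both entities are referenced in the same way, with an ampersand before the name and a semicolon after it. Whether the external one is actually fetched depends on the parser: some parsers skip external entities or disable them for security reasons.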
Examples of DTD Components
Example of DTD Element:
Suppose you specify the information about the library in an XML file as follows:
<!-- books.xml -->
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <book>
    <title>Harry Potter and the Sorcerer's Stone</title>
    <author>J.K. Rowling</author>
    <price>19.99</price>
  </book>
  <book>
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>14.95</price>
  </book>
</library>
The DTD code that declares these elements and the order in which they must appear is shown below:
<!-- library.dtd -->
<!ELEMENT library (book+)>
<!ELEMENT book (title, author, price)>
<!ELEMENT title (#PCDATA)> <!-- PCDATA means parsed character data of element-->
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
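Here, the plus (+) sign signifies that the element must appear at least once and can appear more than once in the document. The commas in (title, author, price) mean that the child elements must appear in exactly that order.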
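For contrast, here is a sketch of a document that is well-formed XML but would be rejected by a validating parser using the library.dtd shown above. The book data is made up for this example.

```xml
<!-- Well-formed, but NOT valid against library.dtd: -->
<!-- <author> comes before <title>, and <price> is missing -->
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <book>
    <author>George Orwell</author>
    <title>1984</title>
  </book>
</library>
```

A non-validating parser would still accept this file, because every tag is properly opened, nested, and closed. Only validation against the DTD reveals the wrong order and the missing element.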
Example of DTD Attribute:
Imagine that you have to define attributes for Inventory Management data using the XML file as shown below:
<!-- products.xml -->
<!DOCTYPE inventory SYSTEM "inventory.dtd">
<inventory>
  <product id="101">
    <name>Laptop</name>
    <price>899.99</price>
  </product>
  <product id="102">
    <name>Smartphone</name>
    <price>499.99</price>
  </product>
</inventory>
Therefore, to list the attributes, the DTD code is as outlined below:
<!-- inventory.dtd -->
<!ELEMENT inventory (product+)>
<!ELEMENT product (name, price)>
<!ATTLIST product id CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT price (#PCDATA)>
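Here, ATTLIST declares the id attribute of the product element, CDATA means that the attribute value is character data (any text string), and #REQUIRED means that every product element must carry the attribute.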
Example of DTD Entity:
If you want to specify the replacement value for an entity named ‘company’, the XML code will be:
<!-- company.xml -->
<!DOCTYPE organization SYSTEM "organization.dtd">
<organization>
  <name>&company;</name>
  <department>IT</department>
</organization>
Now, the DTD code to define the replacement for the entity would be:
<!-- organization.dtd -->
<!ENTITY company "XYZ Pvt Ltd">
<!ELEMENT organization (name, department)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT department (#PCDATA)>
It means that wherever the reference &company; appears in the XML document, the parser replaces it with the replacement text defined for ‘company’ in the DTD code.
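Because the replacement text is declared only once, the same entity can be referenced in several places. The sketch below is a variation of the example above, written with the declarations inside the document. The ‘footer’ element and the ‘owner’ attribute are hypothetical additions.

```xml
<?xml version="1.0"?>
<!-- 'footer' and 'owner' are hypothetical additions to the example above -->
<!DOCTYPE organization [
  <!ENTITY company "XYZ Pvt Ltd">
  <!ELEMENT organization (name, department, footer)>
  <!ATTLIST organization owner CDATA #IMPLIED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT department (#PCDATA)>
  <!ELEMENT footer (#PCDATA)>
]>
<organization owner="&company;">
  <name>&company;</name>
  <department>IT</department>
  <footer>All rights reserved by &company;.</footer>
</organization>
```

After parsing, the application sees the text XYZ Pvt Ltd in all three places. If the company name changes, only the ENTITY declaration has to be edited.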
After learning about the DTD components, let us see their notable features.
Features of DTD Components
- Define the structure of the document: Each DTD component contributes to the structure of the XML document, and together they define the structure of the XML file.
- Define the data type: They let us declare what each element contains (for example, text or child elements) and the type of each attribute so that components can be used appropriately.
- Provide complete information on the document: They clearly state which attributes and elements are optional and which are compulsory so that we can maintain the information easily.
- Define the order of the elements: DTD components specify the order in which the elements occur within a document along with their frequency. An element that is declared once can also be reused in the content models of several other elements.
Advantages of DTD Components
- Provide a structured approach: A DTD specification gives us a systematic way to define and validate the XML document.
- Promote reusability: Entities let us define a piece of content once and reference it wherever it is needed, which allows us to use a modular approach. Thus, the content can be used again without redefining it.
- Define our document format: It allows us to define our own XML format so that developers or users can easily understand the information in the XML document.
- XML document is validated properly: A validating parser checks the document against the rules in the DTD, so structural errors are caught before the information is processed.
Disadvantages of DTD Components
- DTD components are not object-oriented: A DTD has no notion of inheritance or type derivation, so the structure of each element must be declared in full. Thus, it is not possible to define the elements as objects for complex documents.
- Support only the text-string data type: In DTD, we can store and validate only text-string data, so other data types such as numbers or dates cannot be enforced in the document.
- Don’t support namespaces: DTD has no built-in support for namespaces, due to which we can encounter naming conflicts or naming collisions in larger documents.
- Provide limited cardinality options: You can use only zero or more occurrences (*), one or more occurrences (+), and optional occurrences (?) in DTD which makes it less flexible.
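The cardinality limitation from the last point can be sketched as follows. The three occurrence indicators cover the common cases, but a range such as "two to four" has to be written out by hand. All element names here are hypothetical.

```xml
<!-- Fragment of a DTD; all element names are hypothetical -->

<!-- title: exactly once, description: 0 or 1, track: 1 or more, comment: 0 or more -->
<!ELEMENT playlist (title, description?, track+, comment*)>

<!-- "Between 2 and 4 authors" has no dedicated operator, so it is spelled out -->
<!ELEMENT paper (author, author, (author, author?)?)>
```

For larger ranges this quickly becomes unwieldy, which is one reason the limited cardinality options are counted as a disadvantage.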
Conclusion
DTD components describe the building blocks of the XML document and the rules for validating them. Each component in DTD has a role in the XML document: together they describe the elements, attributes, and entities, whether they are mandatory or optional, their order, and their occurrences in the document.