NOSQL[1]
1. Explain the concept of relationships in graph databases and provide a clear diagram
to illustrate it?
Ans:
2. Explain with a neat diagram, the partitioning and combining in Map reduce.
Ans:
o MapReduce is a programming model used for processing large data sets in parallel across many machines.
o Partitioning divides the output of the map phase by key, so that all values for a given key are sent to the same reducer.
o Combining performs a local, reducer-like aggregation of the map output on each map node before it is sent over the network, reducing the amount of data that has to be transferred to the reducers (a short sketch follows below).
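A minimal sketch of this idea in Python, assuming a simple word-count job with two reducers (the function names and the two-reducer setup are illustrative, not tied to any particular framework):

from collections import defaultdict

# Map phase: emit (word, 1) for every word in a document.
def map_phase(document):
    return [(word, 1) for word in document.split()]

# Combine phase: locally sum the counts produced by one mapper
# before anything is sent over the network.
def combine_phase(mapped_pairs):
    local_totals = defaultdict(int)
    for word, count in mapped_pairs:
        local_totals[word] += count
    return list(local_totals.items())

# Partition phase: decide which reducer receives each key,
# so all counts for the same word end up on the same reducer.
def partition(word, num_reducers=2):
    return hash(word) % num_reducers

# Reduce phase: produce the final total for each word.
def reduce_phase(word, counts):
    return word, sum(counts)

documents = ["big data big ideas", "big data tools"]
reducer_inputs = [defaultdict(list) for _ in range(2)]

for doc in documents:  # each document plays the role of one map task
    for word, count in combine_phase(map_phase(doc)):
        reducer_inputs[partition(word)][word].append(count)

for reducer in reducer_inputs:
    for word, counts in reducer.items():
        print(reduce_phase(word, counts))  # e.g. ('big', 3), ('data', 2), ...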
5. Array Query:
{ "tags": { "$in": ["technology", "innovation"] } } : Finds documents where the tags array contains either "technology" or "innovation".
4. Explain the three methods for scaling graph databases, along with a clear diagram
to illustrate each approach
Ans:
Graph databases can be scaled using three primary methods: vertical scaling, horizontal scaling, and
hybrid scaling. Each approach has its strengths and is suited for specific use cases.
1. Vertical Scaling (Scaling Up):
o Involves upgrading the hardware of a single machine where the graph database is hosted. This
includes adding more CPU, RAM, or storage capacity.
o Simple to implement.
o No changes required to the database architecture.
o Limited by the maximum hardware capacity.
o Not cost-effective for extremely large graphs.
2. Horizontal Scaling (Sharding or Scaling Out):
o Involves distributing the graph data across multiple machines (nodes). The graph is partitioned
based on nodes, edges, or subgraphs, and each partition is stored on a separate server.
o Allows handling of very large graphs.
o Increases fault tolerance and redundancy.
o Complex to implement due to the need for efficient partitioning and querying.
o Traversals across partitions can be slower.
3. Hybrid Scaling (Combined Approach):
o Combines both vertical and horizontal scaling. Initially, hardware is upgraded (vertical scaling) to
optimize performance, followed by distributing the graph across multiple nodes (horizontal scaling)
to handle larger data sets.
o Balances the simplicity of vertical scaling with the capacity of horizontal scaling.
o Provides a more flexible solution for handling both small and large-scale graphs.
o Higher cost and complexity compared to individual scaling methods.
5. Explain the concepts of scaling and application-level sharding of nodes, and provide an example.
Ans:
o Unlike aggregate-oriented NoSQL databases, graph databases are relationship-oriented, making sharding
difficult. Since any node can connect to any other node, storing related nodes on the same server is crucial
for efficient graph traversal.
o Traversing graphs across different machines leads to performance bottlenecks.
o One technique involves increasing the RAM on a single server to store the working set of nodes and
relationships entirely in memory.
o This is effective only when the dataset size can realistically fit within the available RAM on a single
machine.
o For large datasets, read scaling can be achieved by using a master-slave architecture. All write operations
are handled by the master, while multiple slaves are used for read-only access. This ensures that read
queries are distributed across the slaves, reducing the load on the master.
o Adding more slaves improves the database's read performance. Each slave contains a replica of the data,
allowing the system to handle higher read workloads efficiently.
o This is especially useful when the dataset cannot fit into a single machine’s memory but can be replicated
across multiple machines.
o Slaves contribute to system availability by providing read-only access even if the master node experiences
downtime. These slaves are configured to never become masters, ensuring stable read access without
conflicting write operations.
o These scaling techniques depend on the size of the dataset and infrastructure limitations. If the dataset fits
in RAM, vertical scaling (adding more RAM) is sufficient. For larger datasets, the master-slave pattern
provides a balance between availability and scalability, leveraging replication effectively.
o When the dataset size makes replication impractical, we can shard the data from the
application side using domain-specific knowledge.
o For example, nodes that relate to North America can be stored on one server while nodes that relate to Asia are stored on another.
o This application-level sharding requires the application to be aware that nodes are stored on physically different databases (a small sketch follows below).
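A minimal sketch of the idea, using plain Python dictionaries as stand-ins for two physically separate graph databases (all names and the region property are illustrative assumptions):

# Stand-ins for two physically separate graph databases (illustrative only).
north_america_db = {"name": "graph-db-na", "nodes": []}
asia_db = {"name": "graph-db-asia", "nodes": []}

# The application, not the database, decides where each node lives,
# using domain knowledge (here: a region property on the node).
def database_for(node):
    return north_america_db if node["region"] == "North America" else asia_db

def create_node(node):
    db = database_for(node)
    db["nodes"].append(node)
    return db["name"]

print(create_node({"id": 1, "name": "Acme Corp", "region": "North America"}))  # -> graph-db-na
print(create_node({"id": 2, "name": "Tokyo Branch", "region": "Asia"}))        # -> graph-db-asia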
6. With an example and a clear, detailed diagram, explain a two-stage Map-Reduce process. (Skip)
7. With a neat diagram. Explain the three ways in which graph databases can be
scaled.(Same as 4)
o Tools like Apache ZooKeeper help manage these servers and make sure
everything stays in sync.
2. List and explain use cases where graph databases are very useful.
Ans:
Connected Data:
a. Graph databases work well when data is highly connected, like social
networks or business relationships.
b. They can link data from different areas (social, commerce, location) to make
relationships more meaningful.
c. Example: Connecting people, companies, or products across various fields and
finding useful connections.
Recommendation Engines:
a. Graph databases can recommend products, friends, or content by traversing relationships, for example "customers who bought this item also bought...".
b. Because recommendations are computed by following existing edges, they stay up to date as new nodes and relationships are added, without expensive joins.
4. Explain graph databases. With a neat diagram, explain relationships with properties in a graph. (Skip)
6. Illustrate the differences between SQL queries and their equivalent commands
in the MongoDB shell?
Operation | SQL Command | MongoDB Command | Explanation
Select All Data | SELECT * FROM users; | db.users.find(); | Fetches all records from the "users" table/collection.
Select with Condition | SELECT * FROM users WHERE name = 'John'; | db.users.find({ name: "John" }); | Fetches records from the "users" table/collection where the name is 'John'.
Count Records | SELECT COUNT(*) FROM users; | db.users.countDocuments(); | Counts the number of records in the "users" table/collection.
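The same operations can also be issued from application code; a small sketch using pymongo (the connection string and database name are assumptions for illustration):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local MongoDB instance
users = client["test"]["users"]                    # hypothetical database holding the users collection

all_users = list(users.find())              # SELECT * FROM users;
johns = list(users.find({"name": "John"}))  # SELECT * FROM users WHERE name = 'John';
total = users.count_documents({})           # SELECT COUNT(*) FROM users;

print(len(all_users), len(johns), total)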
7. What are graph databases, and can you explain them with an example of a graph
structure?
Ans:
What is Graph Data?
Graph data represents information as a collection of entities (known as nodes) and
relationships between those entities (known as edges). In graph databases, these nodes
and edges are used to represent complex, interconnected data. The main advantage of
graph databases is their ability to easily model and explore the relationships between
data points.
• Nodes (Entities): These are individual objects or entities. For example, in a social
network, nodes could represent users.
• Edges (Relationships): These represent connections between nodes. For example, in
the social network, edges could represent friendships or interactions between users.
• Properties: Both nodes and edges can have properties, which are additional details
about them. For example, a node representing a user could have properties like name,
age, and location. An edge connecting two users could have properties like since
(indicating when they became friends), as in the sketch below.
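A small sketch of these three concepts using plain Python dictionaries rather than any particular graph database (all names and values are illustrative):

# Nodes (entities) with properties.
alice = {"id": 1, "label": "User", "name": "Alice", "age": 30, "location": "Bengaluru"}
bob   = {"id": 2, "label": "User", "name": "Bob",   "age": 28, "location": "Mumbai"}

# An edge (relationship) with its own property: when the friendship began.
friendship = {"from": alice["id"], "to": bob["id"], "type": "FRIEND_OF", "since": 2019}

# A graph database follows such edges directly during traversal;
# here we simply look the relationship up to show the structure.
nodes = {n["id"]: n for n in (alice, bob)}
print(f'{nodes[friendship["from"]]["name"]} has been a {friendship["type"]} of '
      f'{nodes[friendship["to"]]["name"]} since {friendship["since"]}')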
o Example:
o db.runCommand({ shardcollection: "ecommerce.customer", key: { firstname: 1 } })
3. Data Balancing:
o MongoDB automatically balances data between nodes to optimize
performance.
o When new nodes are added, data is rebalanced across them.
4. Replica Sets in Sharded Clusters:
o Each shard in a sharded cluster can be a replica set to ensure better read
performance within the shard.
5. Sharding Based on Location:
o Data can be sharded based on user location, so data is closer to the users (e.g.,
East Coast users' data in East Coast shards).
o This ensures low latency and better performance for geographically distributed
users.
6. No Downtime During Scaling:
o Adding nodes and rebalancing data can be done without taking the application
offline, though performance may temporarily decrease during large data
movements.
9. Define key-value stores. Explain data storage in Riak, along with its limitations and the solutions to overcome them.
Ans:
Key-Value Stores:
• A key-value store is a simple NoSQL database where each data element is stored as a
key-value pair.
• The key is a unique identifier, and the value is the data associated with that key.
• Data in the value part can be of any format, such as text, JSON, or binary data, with
the database not concerned with the contents of the value.
Data Storage in Riak:
• Buckets: Riak organizes data into buckets, which act as namespaces for keys.
• Keys and Values: Within each bucket, data is stored as key-value pairs. The key is
unique within the bucket, and the value can be any data type, such as text, JSON, or
binary data.
• Replication: Riak replicates data across multiple nodes to ensure availability and
fault tolerance. The number of replicas is configurable.
• Sharding: Riak automatically distributes data across nodes using consistent hashing,
ensuring even data distribution and scalability.
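A conceptual sketch of the bucket/key/value model in plain Python (this illustrates the idea only and is not the Riak client API; all bucket and key names are made up):

# A bucket is a namespace; within it, each key maps to exactly one value.
store = {}  # bucket name -> {key -> value}

def put(bucket, key, value):
    store.setdefault(bucket, {})[key] = value

def get(bucket, key):
    return store.get(bucket, {}).get(key)

# The value can be any format; the store does not interpret it.
put("customers", "customer:1001", {"name": "Asha", "city": "Mysuru"})  # JSON-like value
put("sessions", "session:abc123", b"\x00\x01binary-blob")              # opaque binary value

print(get("customers", "customer:1001"))
print(get("customers", "no-such-key"))  # -> None; keys are unique within a bucket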
Limitations:
1. Eventual Consistency: Data may not be consistent across all nodes immediately after a write.
2. No Full ACID Transactions: Lacks full support for ACID transactions like relational databases.
3. Conflict Resolution: Resolving write conflicts can be complex and requires manual handling.
4. Sharding Issues: Data can become unavailable if a specific node in the cluster fails.
5. Performance Overhead: Replication and consistency checks can slow down write operations.
Solutions to Overcome the Limitations:
1. Eventual Consistency: Configure the read (R) and write (W) quorum values relative to the number of replicas (N) so that critical operations contact enough replicas to return consistent data.
2. No Full ACID Transactions: Model the data so that related information is stored under a single key, since operations on a single key-value pair are atomic.
3. Conflict Resolution: Use vector clocks (or a last-write-wins policy) to detect concurrent updates and resolve them automatically or in the application.
4. Sharding Issues: Replication across multiple nodes, together with hinted handoff, lets other nodes temporarily accept writes for a failed node so data remains available.
5. Performance Overhead: Tune the replication factor and write quorum per bucket to balance durability against write latency.