Abstract:
Information Retrieval (IR) involves retrieving relevant information from large datasets. Semantic models have been considered up-and-coming tools for enhancing effectiveness in Information Retrieval systems. However, IR systems face various challenges in providing answers to unambiguous queries, especially whenever users need a more sophisticated kind of knowledge of the relevant information. Furthermore, an information retrieval (IR) system's performance largely depends on the models and algorithms used to compare documents with queries. Therefore, this study aims to distinguish between semantic and non-semantic models and compare their efficiency based on precision and average precision. The study seeks to illustrate the capability of semantic models to increase the accuracy of information retrieval and help develop the IR field. The proposed methodology comprises three phases: the first is the query process for Latent Semantic Query generation, and the second involves using different models for comparison and dataset selection. A two-step evaluation phase is employed for the abstraction of outcomes and comparison. The findings show that Semantic Query Generation combined with a semantic model like Word2Vec is superior to the other models in both evaluation processes and can effectively capture semantic relationships. Moreover, semantic query processing raises retrieval accuracy by up to 90% by bringing semantics into the retrieval process.
Keywords: Information Retrieval, Semantic Model, Cosine Similarity, Vector Space Model
Introduction:
In today's world, the rapid expansion of digital data across all aspects of life brings both promising possibilities and significant hurdles. This data drives diverse knowledge creation and opens doors for innovation, yet it also poses growing challenges, like managing vast amounts of unstructured data effectively [1]. Information Retrieval (IR) systems are crucial in addressing data management challenges by sifting through vast data repositories to deliver relevant information to users. These systems are instrumental in organizing unstructured data—from text documents and multimedia content to web pages and social media posts. However, IR systems often fall short in accurately extracting the most relevant information and consistently providing the knowledge users seek [2].
Early Information Retrieval (IR) systems relied heavily on keyword-based search engines, which could retrieve relevant documents from vast data repositories only when exact phrase matches were found [3]. A major drawback of keyword-based techniques is their inability to grasp the contextual meaning behind user queries. This often leads to irrelevant search results, which in turn causes user dissatisfaction [3]. More advanced Information Retrieval (IR) systems go beyond simple keyword matching, aiming to interpret user intent and meaning in a way that approximates human understanding. However, despite these improvements, IR technology still has limitations. In today's data retrieval landscape, both traditional and semantic systems often struggle to retrieve information that aligns with a query's underlying intent and context. As a result, the choice of model and the design of the algorithm significantly impact the system's ability to interpret the semantics within documents, ultimately affecting the precision and relevance of the results delivered to users [4]. Information retrieval models often struggle to accurately interpret user queries and intentions, resulting in an overload of irrelevant or loosely related documents. These inefficiencies in the retrieval process can be particularly frustrating for users. As shown in the study of Yordan Kalmukov [5], cosine similarity on its own does not achieve strong accuracy in text similarity matching.
In the evolving field of information retrieval, a growing body of research emphasizes the role of semantics in enhancing relevance and accuracy. This distinction underlines the differences between traditional IR models and newer semantic IR models. Semantic approaches, like Word2Vec and Latent Semantic Analysis (LSA), aim to capture the context and meaning behind queries. Despite these advances, semantic models still face limitations in consistently retrieving highly relevant results. The system in [6] reaches an average precision of 73%, which needs to be improved for better retrieval. This study aims to dive into the limits of current information retrieval (IR) models and highlight how semantic IR models differ from traditional ones. We'll do this by critically analyzing both types and comparing them on precision and average precision. This research aims to add real value to the field of information retrieval by offering insights into how these models stack up—especially when it comes to models like cosine similarity, vector space models, and Word2Vec. Ultimately, the findings could help tackle some of the big challenges around accessing and retrieving relevant information in today's data-packed world.
The paper is structured as follows: Section 2 discusses the literature review, while Section 3 presents the research methodology. Section 4 conducts a critical analysis to compare the models. Finally, the conclusions drawn from the findings are included in the paper's final part, providing a summary of the research to guide future research needs in this field.
Literature Review:
This section breaks down the different types of semantic and non-semantic models used in information retrieval (IR). New models are constantly being developed to address the weaknesses of traditional methods, making IR a diverse field with models like vector space, probabilistic, and Boolean models. Contextual IR models, which improve relevance by factoring in the query's context, are also gaining popularity. Each model has its strengths and weaknesses suited to specific IR tasks. There's also a wide range of semantic models, each designed for a particular purpose. Some of the newer approaches include knowledge graph embeddings, distributional semantic models, and ontology-based methods. Word embeddings, a type of distributional semantic model, can learn semantic relationships from large text datasets. Many of these semantic and non-semantic models are easy to integrate into any IR system, helping to boost the accuracy and relevance of search results. The following are some of the key models that have been proposed to enhance IR systems.
Figure 1 Literature Review
NeuIR:
Neural networks are shaking things up in the field of information retrieval (IR). More and more Neural Information Retrieval (NeuIR) models are being adopted, transforming the way modern retrieval strategies work [7]. Neural Information Retrieval (NeuIR) is shaping the future of how we search for and use information.
Ranking Models:
The era of simple keyword matching is behind us. Re-ranking models that understand the relationships and deeper meanings between words are now leading to big jumps in retrieval accuracy. Take the MS MARCO passage ranking competition, for example—methods powered by LLM-based re-ranking models like BERT, RoBERTa, and ELECTRA are dominating the field [7].
BERT and Transformers:
Recently, several Transformer-based models have been designed to connect semantic data across both visual and textual domains [8]. These models excel at capturing the relationships within the context of text, allowing systems to truly grasp the meaning behind queries and documents. The result? More relevant and meaningful search outcomes.
Learning to Rank:
The science of learning to rank (LTR) search results has become so advanced it's almost like black art. LTR systems use data from previous user actions during a search session to fine-tune and boost the performance of search queries [9]. These models learn from user interactions and feedback, constantly improving how they present results. The goal is to ensure the most relevant information appears right at the top.
Interactivity and User-Centric Retrieval:
Recent research focuses on understanding user behavior and preferences to create more interactive, user-focused retrieval systems. Behavioral signals, like clicks or time spent on a page, act as implicit feedback, allowing search engines to incorporate behind-the-scenes user data to improve their performance [9]. Personalization and context-awareness are now at the heart of search technology. These systems adapt to individual needs, delivering tailored search experiences that feel more intuitive and relevant.
Efficient and Scalable Retrieval:
As the digital world keeps growing at an incredible pace, managing information overload is becoming a bigger challenge. Semantic retrieval tackles this by creating scalable, efficient systems for finding and organizing data. These systems help improve scalability, especially in areas that aren't well represented in the training data [8]. Due to better index structures and smarter algorithms, these systems can handle massive amounts of data while keeping search speeds blazing fast.
Explainable AI:
When it comes to using AI in information retrieval (IR), building transparency and trust is crucial. Researchers suggest that trust grows when algorithms and their outputs—both digital and physical—are accurate and reliable. It also helps if machine learning focuses on cause-and-effect relationships rather than just relying on surface-level correlations [10].
Reinforcement Learning:
Reinforcement learning is quickly becoming a game-changer for improving information retrieval. These techniques let systems learn and adapt by analyzing user interactions and feedback, fine-tuning ranking algorithms to deliver better search results over time. Plus, behavioral signals act as subtle feedback tools, using hidden patterns in user behavior to enhance various parts of search engines without users even noticing [9].
Context-Aware Retrieval:
Grasping the context behind a search is essential for providing truly relevant results. Things like your location, device, and past interactions all play a role in shaping a hyper-personalized search experience that's tailored just for you. This kind of adaptation helps users feel more confident, knowing the search engine "gets" what they're looking for [11].
Ethical Considerations:
As semantic technologies reshape information retrieval, it's crucial to keep ethics front and center. Digital ethics, in simple terms, means the moral principles and value systems that guide how we interact in the online world [10].
Computers can now largely understand what users are searching for and provide answers that are ethical, personalized, and relevant to the context. This marks a new era for information retrieval, placing it at the cutting edge of the future. As this field keeps evolving, finding exactly what you need could soon be as effortless as just thinking about it.
Integrating Semantics in Information Retrieval:
Traditional information retrieval has mainly relied on keyword matching, but let's be honest—it often leads to frustrating, irrelevant search results. That's where semantic integration steps in. By adding semantic models to the process, it bridges the gap between what you're looking for and how it's understood. Unlike old-school keyword methods, semantic approaches focus on uncovering the meaning and connections within text. Key tools like ontology-based representations, semantic annotation, and semantic indexing play a big role in making this happen. By factoring in these elements, search systems can dive deeper into queries and documents for more accurate results.
Word relationships:
Semantic models go beyond just matching words—they capture relationships like synonyms, antonyms, and hypernyms/hyponyms. This helps systems understand the true meaning and intent behind a query, even when the exact keywords aren't in the documents. A step up from this is Relation Extraction (RE), which focuses on uncovering connections between words. Unlike Terminology Extraction or Topic Modeling, which mostly deal with fixed relationships (like synonymy or relatedness), RE dives deeper, identifying a broader range of links between entities—think "born-in," "married-to," or "interacts-with." It's like giving systems a richer toolkit for understanding how concepts connect in the real world [12].
Entity recognition and linking:
To help systems grasp real-world entities and their relationships, semantic models can identify and connect entities like people, places, and organizations in both queries and documents. This process involves tasks like Entity Recognition, Entity Disambiguation, and Entity Linking, all of which work within the Semantic Web framework to make these connections smarter and more accurate [12].
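As a concrete illustration of the recognition step, an off-the-shelf NER pipeline can tag entity mentions before a separate linking stage maps them to knowledge-base nodes. The sketch below uses spaCy, which is an assumption for illustration, not the paper's tooling:

```python
# Illustrative NER step; spaCy and the model name are assumptions, not the
# paper's tooling. Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with an NER component

doc = nlp("The WHO updated its COVID-19 vaccination guidance for clinics in Geneva.")

# Each recognized span carries a surface form and an entity label; an entity
# linking stage would then map these mentions to knowledge-base entries.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)  # e.g. "WHO" -> ORG, "Geneva" -> GPE
```

Entity Disambiguation and Entity Linking are separate steps on top of this output, resolving each mention to a unique identifier in a knowledge base.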
Ontology-based reasoning:
Thanks to ontologies, which provide organized knowledge about specific topics or domains, semantic models can uncover hidden connections between ideas. This means computers can find relevant information even if a query isn't clear. By reasoning through semantic links between concepts and entities, these systems not only boost search accuracy but also contribute to building new knowledge. Take BIGOWL4DQ as an example—it expands on BIGOWL by adding features focused on data quality. It improves reasoning capabilities and helps integrate data quality tasks directly into Big Data processes, making everything more efficient and accurate [7].
Semantic Annotation:
Documents come with metadata, like tags, that help convey their meaning. In large-scale systems, annotated datasets are often used to create gold standards for evaluation. For example, this data can train machine learning tools to predict sentence structure, extract key arguments from case texts, or even build a summarization system that condenses the original text. These annotated datasets play a crucial role in improving the accuracy and functionality of machine learning models [13].
Semantic Indexing:
Latent Semantic Indexing (LSI) is a popular method for uncovering similarities in collections of unstructured data. It works by applying a mathematical approach called Singular Value Decomposition (SVD), which helps identify patterns and relationships within the data, even when they aren't obvious on the surface [14]. These greatly improve the accuracy of matching between the user search and documents, even in an environment where the match is not on specific phrases. A major quality of semantic integration is the ability to deliver a range of key advantages.
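As a rough sketch of the LSI idea (not the paper's implementation), a TF-IDF term-document matrix can be factored with truncated SVD, after which queries and documents are compared in the reduced latent space:

```python
# Minimal LSI sketch: factor a TF-IDF matrix with truncated SVD and match a
# query in the latent "concept" space. The toy corpus is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "covid vaccines reduce transmission of the virus",
    "masks are a safety precaution against airborne variants",
    "treatments and vaccines target viral symptoms",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)          # term-document matrix

svd = TruncatedSVD(n_components=2)     # SVD keeps the strongest latent factors
X_lsi = svd.fit_transform(X)           # documents in the latent space

q = svd.transform(tfidf.transform(["vaccine safety"]))
print(cosine_similarity(q, X_lsi))     # matches without exact phrase overlap
```

Because query and documents meet in the latent space, a document can score well even when it shares few literal terms with the query, which is exactly the behavior described above.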
Analysis of Semantic Integration Approaches:
There are plenty of ways to add semantic knowledge to improve how information retrieval systems work. Let's break down the pros and cons of these approaches. On the plus side, incorporating semantic knowledge can boost accuracy and user experience. For instance, search engines become smarter at understanding what users mean—even when they phrase things differently. Think about how a search engine can figure out that "books," "novels," and "romance novels" might all lead to the same types of results, depending on the context. This makes searches more relevant and user-friendly. But it's not all smooth sailing. In certain applications, using semantic models can introduce challenges, like added complexity or inefficiencies. A lot of research has gone into figuring out the best ways to weave semantic knowledge into these systems, with plenty of methods already outlined in the literature. Let's take a closer look at some of those approaches:
Query expansion:
Query expansion and enhancement models are like giving search engines a boost. They work by adding related terms, synonyms, or even specific entities using semantic models. The result? Better search results. These models improve both precision (finding the right stuff) and recall (finding more of it). Researchers have put a lot of effort into studying how well query expansion (QE) works for reformulating searches. The goal is simple: make searches smarter and more effective at pulling up the best results. And it seems to be doing the job [15].
Semantic ranking:
These models rank documents by analyzing how similar their meanings are to the search query—not just by matching keywords, but by looking at the deeper relationships and context behind them. It's like finding connections that go beyond the surface. To take it up a notch, semantic models for re-ranking can use more advanced and complex architectures. Why? To reach even higher levels of precision when delivering results. It's a smarter way to make sure users get exactly what they're looking for [16].
Conversational search:
Semantic models are a game-changer for conversational interfaces. They allow users to interact with information retrieval systems in a way that feels natural and intuitive—like having a real conversation instead of typing search terms. For task-oriented systems (think virtual assistants or customer service chatbots), semantic models make the magic happen. They help these systems understand the context, intent, and nuances of what users are asking, making them more effective and user-friendly. This work lays the foundation for smoother, smarter conversations [17].
Rule-Based Systems:
By relying on predefined semantic rules or ontologies, these systems create a clear framework for integrating semantic knowledge. A great example comes from a case study using Stardog 6.014 [18], a tool designed to handle semantic inference. Stardog works by applying preset rules to evaluate things like energy performance. It pulls data from three main sources: OWL ontologies, RDF instances in the ABox, and SWRL inference rules in the TBox. What makes it unique is its "lazy" reasoning approach—it doesn't process everything upfront. Instead, it performs reasoning at query time, offering flexibility and efficiency when responding to queries.
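Stardog itself is a commercial system, but the same assert-axioms-then-infer pattern can be sketched with open-source tooling. The triples and namespace below are hypothetical:

```python
# Not Stardog: a lightweight analogue of ontology/rule-based inference using
# rdflib with the owlrl reasoner. All URIs below are made up for illustration.
from rdflib import Graph, Namespace, RDF, RDFS
import owlrl

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Vaccine, RDFS.subClassOf, EX.Treatment))  # TBox-style axiom
g.add((EX.SomeVaccine, RDF.type, EX.Vaccine))       # ABox-style fact

# Materialize the deductive closure: SomeVaccine is now also a Treatment.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

print((EX.SomeVaccine, RDF.type, EX.Treatment) in g)  # True after inference
```

Note one design difference: this sketch materializes all inferences upfront, whereas the Stardog case study described above reasons lazily at query time.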
Performance Comparison:
When evaluating retrieval models, researchers often look at metrics like precision, recall, and user satisfaction. These assessments are typically carried out across various benchmarks and datasets to get a comprehensive picture of performance. The study [16] explores the current state of first-stage retrieval models and offers a rigorous comparison of different approaches. It examines early semantic retrieval techniques, which focus on basic semantic understanding, alongside neural semantic retrieval methods that use advanced AI and deep learning for smarter matching. It also considers conventional term-based retrieval techniques, which rely on straightforward keyword matching. This analysis sheds light on how these methods perform in terms of accuracy and user experience.
Challenges:
Using different models in information retrieval systems can lead to great results, but there are still some common challenges that need attention. These challenges are both conceptual—like understanding the meaning and context behind a query—and technical, involving how systems process and retrieve information effectively. To tackle these issues, ongoing research and innovation are essential. Overcoming these hurdles is key to unlocking the full potential of information retrieval systems and making them even more useful for users.
1. Data Hurdles:
Gathering and managing massive amounts of high-quality, domain-specific data to train strong models isn't easy. It's a tricky process that demands time, resources, and expertise. Even one of the most well-known models, Latent Semantic Analysis (LSA), isn't without its flaws. While widely used and effective in many cases, LSA has faced criticism. For one, it overlooks the importance of word transitions, which are crucial for understanding how terms connect in context. It's also been flagged for breaking certain rules about connective power (the strength of word relationships) and for lacking an incremental learning mechanism, meaning it can't adapt or improve dynamically as new data comes in. These limitations highlight the need for continued improvement in semantic modeling [19].
2. Computational Worth:
Complex models often demand a lot of computing power, especially when they're working with massive datasets. This can make them tough to use in situations where speed is critical. Take semantic reasoning as an example. Its high processing cost means it's not ideal for real-time monitoring in large-scale applications. Imagine trying to use it for something like early epidemic detection—where every second counts—and you can see why it might fall short. These kinds of limitations make it clear that finding the right balance between model complexity and practicality is essential [20]. Scalability and practical implementation need efficient algorithms and infrastructure.
3. Model Complexity:
Understanding how complex models make retrieval decisions isn't always easy, thanks to their intricate internal workings. Plus, any biases or flaws in the training data can creep into the models, leading to unfair or discriminatory outcomes. Models like COIL are a good example because they're more advanced but also more complex and costly to run compared to simpler matching retrievers [16]. To make these systems more user-friendly, it's essential to focus on keeping them as clear and straightforward as possible.
Conceptual Challenges:
Ambiguity:
Resolving ambiguity in user queries and content remains a tricky challenge. Traditional search methods often add to the confusion, struggling to refine the search field effectively. Even with modern BERT-based models that aim to better understand queries, many issues still need to be addressed to fully eliminate ambiguity. There's progress, but there's still work to do [21].
Scaling up semantics:
Making sure semantic integration can scale effectively is essential as data keeps expanding. The process of combining data from different databases is called heterogeneous database integration, and it's no easy task. There are three major challenges when integrating databases within the same domain: structural heterogeneity, syntactic heterogeneity, and semantic heterogeneity [22]. These issues make solving the heterogeneity problem tricky. To tackle it, you need to focus on studying distributed systems and fine-tuning algorithms to handle the complexity.
Ontology integration:
Bringing together different semantic models and standards is a big deal for researchers, developers, and users alike. Right now, there's no systematic way to integrate domain ontologies from different sources, which makes life harder for developers and users [23]. Shared frameworks and ontologies will be key to enabling smooth interaction across systems.
Evaluation Success:
To truly measure the effectiveness of semantic integration, we need evaluation criteria that go beyond traditional metrics like recall and precision. This is a significant challenge for researchers, as it involves estimating semantic similarity between text data, often relying on unclear rule-based methods [24].
Methodology:
Developing Information Retrieval (IR) systems powered by semantic models is a major step toward transforming how we access and retrieve information. This chapter explores the design and creation of such systems, focusing on integrating semantic models to improve retrieval accuracy and effectiveness. Based on previous research, the primary aim is to address the ongoing challenges in traditional IR systems. Although many studies have tried to tackle the issues in semantic information retrieval, the persistent limitations and weaknesses in current systems are well documented. While researchers have made some valuable progress, the ultimate goal of delivering consistently satisfying search results remains unsolved. This research seeks to bridge those gaps by proposing new solutions and methodologies for extracting semantic meaning from documents to enhance information retrieval.
The goal is to create reliable systems that can extract and use semantic information effectively. By leveraging advanced semantic algorithms and insights from modern studies, the foundation is laid for groundbreaking improvements in retrieval performance, pushing the field forward with innovative techniques.
High-Level Architecture:
The proposed architecture is built around four main components. The Information Retrieval system starts with simple keyword-based queries and advances to Latent Semantic Query Generation, which involves creating more detailed and relevant queries. Evaluation methods are used to compare various models, and Result Evaluation procedures determine how useful and relevant the retrieved information is to ensure the system effectively meets user needs. The components are shown in Figure 2.
Figure 2 Research Architecture
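Read as pseudocode, the four components might be wired together as follows. The function names and the toy scoring logic are illustrative, not taken from the paper's implementation:

```python
# Structural sketch of the proposed pipeline; names and logic are illustrative.
def generate_latent_semantic_query(query, ontology_terms):
    # Component 1: enrich the keyword query with ontology-derived terms.
    extra = [t for t in ontology_terms if t not in query.split()]
    return query + " " + " ".join(extra)

def retrieve(query, documents):
    # Component 2: stand-in retrieval model (token-overlap ranking).
    q = set(query.split())
    return sorted(documents, key=lambda d: -len(q & set(d.split())))

def precision(results, relevant):
    # Components 3-4: evaluate the returned ranking.
    return sum(1 for d in results if d in relevant) / len(results)

docs = ["covid symptoms fever cough", "football league results"]
lsq = generate_latent_semantic_query("covid", ["symptoms", "variants"])
print(precision(retrieve(lsq, docs), relevant={docs[0]}))  # 0.5 on this toy data
```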
The goal of this study is to review different models used in information retrieval (IR) and evaluate how effective they are. To do this, the system processes basic queries three times—once with each of the following models: the cosine similarity model, the vector space model, and the Word2Vec model. The results from these models are then compared in terms of their precision and effectiveness, helping to highlight their strengths and weaknesses. This comparison sets the stage for further research by exposing the limits of each approach.
Before being used in the IR system, the queries are pre-processed and then run through each model. This allows for a head-to-head comparison of how well the models perform under the same conditions. The evaluation focuses on precision measures, which are key to assessing the quality of the retrieval results. The study is designed to make the differences between the models' precision scores statistically significant, giving a clearer picture of how they stack up. By carefully conducting iterative experiments and evaluations, this research provides a detailed analysis of each model's effectiveness in retrieving accurate and relevant information.
Semantic Query Process:
A semantic algorithm was designed to handle textual queries by diving deeper into their meanings, relationships, and contexts. This approach aims to enhance the accuracy and relevance of information retrieval. The process was broken down into several sub-phases, each contributing to the overall effectiveness of the algorithm.
Ontology Selection:
The review process for this research is highly meticulous, focusing on identifying ontologies that not only provide essential concepts but are also detailed enough to capture complex semantic relationships. Essentially, each ontology is evaluated for its relevance based on key factors like coverage, expressiveness, and suitability for semantic modeling. These factors are carefully scrutinized to ensure that any selected ontology is capable of meeting the specific goals of the research. The ultimate aim is to choose one or more ontologies that align closely with the research objectives. For instance, the selected ontology [25] offers a clear and organized framework of classes, properties, and instances. It's specifically designed to structure COVID-19-related knowledge into a coherent system. This approach covers key aspects of the virus, such as symptoms, safety precautions, transmission modes, variants, treatments, and vaccines, ensuring each area is well represented. By defining classes with attributes and relationships among different COVID-19 elements, this ontology creates a standardized model for data exchange and interoperability. This structured framework will be a valuable tool for researchers, medical professionals, policymakers, and others involved in combating the pandemic, making it easier to access, share, and analyze organized data.
The Development of the Semantic Algorithm:
Developing a semantic algorithm starts with a clear understanding of what it needs to achieve. The first step is defining the specific goals and functions—what semantic elements need to be extracted from the text and how these elements will improve information retrieval. This phase lays the foundation, ensuring the design addresses all the requirements and objectives. It also includes creating methods to encode contextual details, word relationships, and even the implicit meanings hidden within the text.
Once the design phase is complete, the focus shifts to implementing the algorithm. This involves turning the theoretical framework into functional code using the appropriate programming languages and libraries. The choice of tools and computing platforms is crucial here; they must align with the algorithm's requirements and provide the necessary capabilities for effective implementation. Selecting the right programming languages and resources is key to ensuring the algorithm can handle the complexities of semantic extraction.
After implementation, the algorithm undergoes rigorous validation and testing. This step ensures it performs as intended—accurately identifying semantic features within the given textual data and doing so efficiently. These tests are critical to fine-tuning the algorithm, making it a reliable and powerful tool for improving information retrieval. Through this iterative process, the algorithm is strengthened to meet the demands of real-world applications.
Algorithm 1: Semantic Algorithm
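The body of Algorithm 1 is not reproduced here; the next subsection describes what it does. A minimal sketch of that expansion step (load the ontology, match query terms against class labels, pull in the labels of the matching classes' instances, and link them back to the query), assuming rdflib and hypothetical labels and file names, might look like:

```python
# Sketch of the ontology-driven expansion described below; rdflib usage and
# all labels/file names are assumptions, not the paper's actual code.
from rdflib import Graph, RDF, RDFS
from rdflib.namespace import OWL

def expand_query(query, ontology_path):
    g = Graph()
    g.parse(ontology_path)                 # e.g. the COVID-19 ontology [25]
    terms = set(query.lower().split())
    expansion = set()

    # Match query terms against class labels in the ontology.
    for cls in g.subjects(RDF.type, OWL.Class):
        label = g.value(cls, RDFS.label)
        if label and str(label).lower() in terms:
            # Collect instances of the matched class...
            for inst in g.subjects(RDF.type, cls):
                inst_label = g.value(inst, RDFS.label)
                if inst_label:
                    expansion.add(str(inst_label).lower())

    # ...and link the new terms back to the original query.
    return query + " " + " ".join(sorted(expansion))

# expanded = expand_query("symptoms prevention", "covid19.ttl")  # hypothetical file
```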
Latent Semantic Query Generation:
The semantic program works by using the ontology to enrich a query's content and generate new, more detailed queries. Here's how it operates: First, it extracts all instances from the ontology and organizes them in a way that makes them usable for search purposes. Once this groundwork is done, the program compares the terms from the original query with the ontology's classes to determine their semantic relevance. After identifying matching classes, the system digs deeper to find related instance details within those classes. It then links these new terms and instances back to the original query. By doing this, the program essentially expands and enhances the query, incorporating additional semantic insights that might not have been obvious at first.
The result is a much richer and more context-aware query. This expanded query doesn't just find more results—it finds more relevant and meaningful ones. By refining the relevance of terms and classes based on the ontology, the program boosts both the accuracy and contextual understanding of information retrieval. This ensures that the retrieved information aligns more closely with the query's intended meaning.
IR System:
Phase Two, focused on the research methodology, involves selecting the models that will form the backbone of the information retrieval (IR) framework. This decision is crucial, as it sets the stage for the experimental and evaluation phases that follow. From the variety of available IR models, two classical ones have been chosen: the vector space model and cosine similarity. These will be paired with a more advanced semantic model, Word2Vec, to create a robust and comparative framework.
Cosine similarity, known for its simplicity and effectiveness, calculates the cosine of the angle between document vectors in a high-dimensional space. It's widely used for measuring how similar documents are, making it an ideal choice for this project. On the other hand, the vector space model approaches retrieval by treating documents and queries as vectors in a multidimensional space and calculating their similarity to find relevant matches. To enhance the system's retrieval performance, a semantic layer is added with the Word2Vec model. Word2Vec is a powerful tool in natural language processing (NLP) because it captures the meanings and relationships between words. This combination of classical and semantic models will allow the study to compare how well these techniques perform and contribute to more accurate and meaningful information retrieval.
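One common realization of the VSM-plus-cosine pairing (the paper does not specify its exact implementation) represents documents and queries as TF-IDF vectors and scores them with cosine similarity:

```python
# TF-IDF vector space model with cosine scoring; a standard realization,
# not necessarily the paper's own code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["deep learning for information retrieval",
        "classic boolean retrieval models",
        "word embeddings capture semantic relationships"]

tfidf = TfidfVectorizer()
D = tfidf.fit_transform(docs)            # documents as vectors in term space
Q = tfidf.transform(["semantic retrieval with embeddings"])  # query, same space

# Cosine of the angle between the query vector and each document vector;
# higher values mean closer matches.
print(cosine_similarity(Q, D))
```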
Evaluation:
Evaluating information retrieval (IR) systems is a critical step in understanding how effectively they can find relevant documents within a collection. This phase helps identify strengths and weaknesses and ensures the system is fine-tuned for better performance. To do this, testing is carried out using various queries during the development process to assess the system's effectiveness under different conditions. When it comes to measuring similarity between text documents, several techniques are used, each based on different paradigms and grounded in distinct theoretical approaches. In this evaluation, three popular models—Cosine Similarity, the Vector Space Model (VSM), and Word2Vec—are compared to assess their performance.
To make the comparisons meaningful, two key measures are used: precision and average precision. These metrics are applied to investigate how well the three models perform on specific tasks. This structured comparison not only highlights how each model handles the problem of document similarity but also provides deeper insights into their strengths and limitations, making it easier to determine which model works best for particular applications.
Mathematically, precision is calculated as:

Precision = Number of Relevant Documents Retrieved / Total Number of Documents Retrieved    (4.1)

The models' precision scores are then averaged to compare their performance in detail. This is done using a straightforward formula: the precision scores are added together, and the total is then divided by the number of queries. Calculating this mean precision gives a clear, overall view of how each model performs against the others. It's a simple yet effective way to summarize and compare performance, especially when working with multiple models in an information retrieval system.
Mathematically, average precision is calculated as:

Average Precision = Total Precision Percentage / Total Number of Queries    (4.2)
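Equations 4.1 and 4.2 translate directly into code. Note that, as defined here, "average precision" is a plain mean of per-query precision scores rather than the rank-weighted AP used in TREC-style evaluation:

```python
# Direct implementation of Eq. 4.1 and Eq. 4.2 as defined in the text.
def precision(retrieved, relevant):
    """Eq. 4.1: relevant documents retrieved / total documents retrieved."""
    if not retrieved:
        return 0.0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)

def average_precision(per_query_precisions):
    """Eq. 4.2: total of the precision scores / number of queries."""
    return sum(per_query_precisions) / len(per_query_precisions)

p1 = precision(["d1", "d2", "d3"], relevant={"d1", "d3"})  # ~0.667
p2 = precision(["d4", "d5"], relevant={"d4"})              # 0.5
print(average_precision([p1, p2]))                         # ~0.583
```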
The first evaluation uses general queries on each IR system to measure precision for similar documents. This gives an initial sense of how well each model performs with basic queries. The second evaluation uses latent semantic queries across all three models. This provides more detailed precision scores and allows for a better comparison of their effectiveness.
Result and Discussion:
This is the most crucial part of the analysis. It's where we carefully evaluate how models like Cosine Similarity, Vector Space Model, and Word2Vec perform when tested on both general and latent semantic queries. The results show how efficient these methods are—and where they fall short—by zooming in on precision scores. These range from small percentages for cosine similarity to Word2Vec hitting 100% precision in most cases. What's the big takeaway? This evaluation not only helps us see where these methods shine (and where they don't) but also lays the foundation for future breakthroughs in information retrieval.
Retrieval Model Evaluation:
During the evaluation phase, the selected retrieval models were put to the test using a specific dataset. The goal? To see how well these models could measure the similarity between queries and documents for both general queries and latent semantic queries. The study used three models—Cosine Similarity, Vector Space Model, and Word2Vec—on a dataset of 100 research papers covering a variety of topics. Each model was tasked with comparing documents to match them with queries that reflected user search intentions. In short, the models had to show how accurately they could find relevant matches based on the queries.
Cosine Similarity Performance Metrics Using General and Latent Semantic Queries:
The retrieval model generates a list of documents for each query, along with a similarity percentage for each one. To figure out how accurate these results are, the documents are evaluated using reliable relevance assessments. Precision is the key metric here—it measures how well the model performs. In simple terms, precision shows the percentage of relevant documents retrieved out of all the documents in the dataset. The higher the precision, the better the model is at finding what truly matters.

Graph 4-1 Cosine Similarity

The first model, Cosine Similarity, is used to compare queries. The graph above shows its evaluation results, with two sets of values for each query: one for general queries and one for latent semantic queries. The graph highlights how similar the queries are based on this method. Overall, the graph demonstrates how useful Cosine Similarity is for measuring semantic relationships in textual data. It clearly illustrates how the similarity scores vary depending on the type of query. These results offer valuable insights: the latent semantic query (LSQ) method does a better job capturing deeper semantic meanings between queries compared to general queries. Higher similarity scores point to a stronger semantic connection between the queries and the hidden information they contain. On the other hand, lower scores indicate greater dissimilarity, often revealing irrelevant or mismatched information in the dataset.
Vector Space Model Performance Metrics Using General and Latent Semantic Queries:
In the Vector Space Model (VSM), the retrieval process works by calculating a similarity score for every query in the dataset. Each document is assigned a score that reflects how closely it matches the query. This helps measure how relevant a document is to the specific query it's paired with. To evaluate how effective this model is, precision assessments are used. These evaluations give a clear picture of how well the model retrieves relevant documents for each query. Ultimately, the results provide a real measure of the model's performance and allow for a deeper exploration of its capabilities in information retrieval.

Graph 4-2 Vector Space Model

The graph above shows how the Vector Space Model (VSM) performed on five specific queries (Q1 to Q5), comparing general queries with latent semantic queries. It focuses on the similarity percentage between each query and the dataset, giving us a sense of how well VSM retrieves relevant documents. What stands out? The evaluation reveals that different queries—and query types—produce varying levels of precision. In simpler terms, some queries are matched more effectively than others. This tells us that VSM is generally good at finding relevant documents but has room for improvement depending on the type of query. Higher precision scores mean VSM successfully captures and represents the meaning behind the queries, retrieving more relevant documents. On the flip side, low precision scores suggest there are gaps in the retrieval process, pointing to potential issues with data extraction. Overall, the graph gives a well-rounded view of VSM's strengths and weaknesses in information retrieval. These results provide valuable insights for improving and fine-tuning the model, helping it better meet users' needs in different query scenarios.
Word2Vec Model Performance Metrics Using General and Latent Semantic Queries:
The Word2Vec model takes a slightly different approach. For each query in the dataset, it generates a list of documents along with relevancy scores. These scores reflect how closely the documents align with the given query, helping determine how effective Word2Vec is at retrieving relevant content.

Graph 4-3 Word2Vec Model

The graph highlights how the Word2Vec model performed with two types of queries: general queries and latent semantic queries. Each query is assigned a percentage, representing how accurately the model retrieves relevant information. The results? They're impressive! The Word2Vec model consistently delivers outstanding precision, with most scores above 95%. This demonstrates its strong ability to identify relevant information. For latent semantic queries specifically, the model often achieves a perfect precision score of 100%. This flawless performance shows that Word2Vec excels at capturing the deeper, underlying meanings of queries, making it highly effective at retrieving documents that closely align with the semantic context of the search. This model is a top contender when it comes to accuracy and relevance.
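A minimal sketch of Word2Vec-based query-document scoring, assuming gensim (which the paper does not name) and mean-pooled word vectors over a toy corpus:

```python
# Word2Vec similarity sketch with gensim; corpus, parameters, and pooling
# strategy are illustrative assumptions, not the paper's setup.
import numpy as np
from gensim.models import Word2Vec

corpus = [["covid", "vaccine", "reduces", "transmission"],
          ["masks", "protect", "against", "variants"],
          ["treatments", "target", "viral", "symptoms"]]

model = Word2Vec(corpus, vector_size=50, min_count=1, epochs=50, seed=1)

def embed(tokens):
    # Represent a text as the mean of its in-vocabulary word vectors.
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q = embed(["vaccine", "symptoms"])
for doc in corpus:
    print(doc, round(cos(q, embed(doc)), 3))  # higher = semantically closer
```

On a real collection, the embeddings would be trained on (or pre-trained for) the document set, and documents ranked by this score for each query.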
The primary metric for measuring effectiveness here is precision—how many of the retrieved documents are relevant? This evaluation helps researchers see how well Word2Vec performs in matching documents to the underlying meaning of queries. What sets Word2Vec apart is its ability to dive deep into semantic relationships. By understanding the nuanced connections between words and texts, Word2Vec captures richer meanings and context. This allows for a more detailed evaluation of how well the model retrieves documents that not only match the query but also align with its deeper semantics.
Comparison of General Queries and Latent Semantic Queries:
Below are two tables that summarize the performance of the three retrieval models—Cosine Similarity, Vector Space Model (VSM), and Word2Vec—under two different scenarios: one for general queries and the other for latent semantic queries.
Comparison of General Queries:

Table 4.1 General Queries

Comparison of Latent Semantic Queries:

Table 4.2 Latent Semantic Query
For general queries, when cosine similarity is applied, the precision score is relatively low. The issue lies in its methodology, which doesn't account for semantic relationships between words and relies mostly on term frequency within documents. This leads to less effective semantic matching and, therefore, low accuracy in generic searches. The Vector Space Model shows better precision than Cosine Similarity for general queries. It represents the relationships between documents and queries more effectively and captures semantic similarity, since both are represented as vectors in a high-dimensional space. The results show that this model achieves higher precision scores than Cosine Similarity, as it better understands and retrieves relevant documents. For general queries, the Word2Vec model achieves higher precision than both the Vector Space Model and Cosine Similarity. Word2Vec uses techniques that capture subtle semantic relationships, enabling it to extract the true meaning of queries more effectively. This results in highly relevant documents being retrieved, with precision consistently scoring higher. The Word2Vec model performs excellently for general queries, with strong document ranking capabilities based on term-document cosine similarity. Although Cosine Similarity underperforms compared to the Vector Space Model, its results highlight the importance of adopting advanced approaches like Word2Vec for more accurate and efficient document retrieval.
Cosine similarity also shows lower precision when applied to latent semantic queries. Its inability to capture complex semantic links limits its effectiveness in retrieving relevant information in these scenarios. This creates a gap between cosine similarity and more advanced models, which deliver higher precision. The Vector Space Model performs better than cosine similarity for latent semantic queries, offering improved precision. It captures the similarity of latent semantics more effectively by integrating semantic understanding into text representation and query processing. However, while it outperforms cosine similarity, its precision still falls short of what the Word2Vec model achieves. The Word2Vec model stands out with significantly higher precision compared to both cosine similarity and the Vector Space Model. It excels in understanding the deeper, underlying meaning of latent semantic queries, producing highly accurate results. This model highlights how leveraging semantic meaning can greatly improve information retrieval, making it a strong tool for finding relevant documents.
Average Precision:
Precision alone may sometimes give a misleading picture of a system's performance. To get a clearer comparison, average precision (AP) is used. AP evaluates the system's ability to retrieve relevant documents by averaging precision scores across all relevant results. This metric provides a more detailed and accurate assessment of retrieval performance. By using average precision, the study gains better insights into how well each system performs under both types of queries. The graph below visually represents these comparison results.
Graph 4.4 Average Precision

The data presented highlights the average precision scores for two types of queries—general and semantic—across three retrieval models: Word2Vec, Vector Space, and Cosine Similarity. Higher percentages in the results reflect better performance, serving as key indicators of how effectively each model retrieves relevant information. One clear trend emerges from the data: semantic queries consistently achieve higher precision than general, keyword-based searches. This difference is significant and suggests that retrieval techniques perform far better when they account for context and meaning rather than simply matching keywords. Semantic queries allow retrieval models to tap into deeper relationships between words and documents, making it possible to explore the true intent or context behind the search. By doing so, these models can pinpoint and retrieve documents that align closely with the semantic elements of the query, resulting in more accurate results. On the other hand, generic queries often fall short in precision because they are limited to basic keyword matching. This inability to grasp the nuances and complexity of the desired information can lead to less effective document retrieval. This finding underscores how essential it is to consider semantic context during information retrieval tasks. By leveraging semantic understanding, retrieval systems can deliver more accurate, meaningful, and relevant results, ultimately enhancing the overall effectiveness of the retrieval process. This deeper, context-aware approach transforms search accuracy and highlights the value of moving beyond simple keyword-based methods.
Conclusion:
This research explored the potential of traditional and semantic models to enhance information retrieval effectiveness. Through a comprehensive analysis of existing IR models, it has become evident that traditional approaches often struggle to capture the underlying meaning of queries and documents, leading to irrelevant retrieval results. By differentiating between semantic and traditional IR approaches and conducting a comparison, this study highlights the benefits of using semantics in retrieval systems. The study followed a systematic methodology consisting of different phases to improve retrieval effectiveness by merging semantics into the retrieval process. The evaluation phase consists of two types of evaluation and a discussion of the results. The results show that even the traditional models perform better when using ontology-based Latent Semantic Queries. Semantic models such as Word2Vec, however, bring a significant improvement in retrieval precision, reaching 99% average precision. The comparison of non-semantic and semantic retrieval methods has demonstrated the superiority of semantic models, with Word2Vec performing better than other techniques in capturing semantic relationships. The use of COVID-19 ontology-enabled semantic processing has, in addition, resulted in a marked improvement in precision during information retrieval, reiterating the need for semantics in the information retrieval activity. This research opens the way for the development of more efficient and contextually relevant IR solutions, with implications extending to various domains where information retrieval plays an important role.
References:
[1] J. Grossman and A. Pedahzur, "Political Science and Big Data: Structured Data, Unstructured Data, and How to Use Them," Polit Sci Q, vol. 135, no. 2, pp. 225–257, Jun. 2020, doi: 10.1002/polq.13032.
[2] C. Shah and E. M. Bender, "Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web?," 2024.
[3] S. Shaukat, A. Shaukat, K. Shahzad, and A. Daud, "Using TREC for developing semantic information retrieval benchmark for Urdu," Inf Process Manag, vol. 59, no. 3, May 2022, doi: 10.1016/j.ipm.2022.102939.
[4] M. Mazen Almustafa, M. Sheikh Oghli, and M. Mazen Almustafa, "Comparison of basic Information Retrieval Models," 2021. [Online]. Available: https://www.researchgate.net/publication/358509603
[5] Y. Kalmukov, "Comparison of Latent Semantic Analysis and Vector Space Model for Automatic Identification of Competent Reviewers to Evaluate Papers," International Journal of Advanced Computer Science and Applications, vol. 13, no. 2, pp. 77–85, 2022, doi: 10.14569/IJACSA.2022.0130209.
[6] S. A. Savitri, A. Amalia, and M. A. Budiman, "A relevant document search system model using word2vec approaches," in Journal of Physics: Conference Series, IOP Publishing Ltd, Jun. 2021, doi: 10.1088/1742-6596/1898/1/012008.
[7] C. Barba-González, I. Caballero, Á. J. Varela-Vaca, J. A. Cruz-Lemus, M. T. Gómez-López, and I. Navas-Delgado, "BIGOWL4DQ: Ontology-driven approach for Big Data quality meta-modeling, selection and reasoning," Inf Softw Technol, vol. 167, Mar. 2024, doi: 10.1016/j.infsof.2023.107378.
[8] R. Mao et al., "A Survey on Semantic Processing Techniques," Oct. 2023. [Online]. Available: http://arxiv.org/abs/2310.18345
[9] S. Muhammad and S. Aloteibi, "A user-centered approach to information retrieval."
[10] M. Ashok, R. Madan, A. Joha, and U. Sivarajah, "Ethical Framework for Artificial Intelligence and Digital technologies," International Journal of Information Management, vol. 62, Elsevier Ltd, Feb. 2022, doi: 10.1016/j.ijinfomgt.2021.102433.
[11] H. Zamani, S. Dumais, N. Craswell, P. Bennett, and G. Lueck, "Generating Clarifying Questions for Information Retrieval," in Proceedings of The Web Conference 2020 (WWW 2020), Association for Computing Machinery, Apr. 2020, pp. 418–428, doi: 10.1145/3366423.3380126.
[12] J. L. Martinez-Rodriguez, A. Hogan, and I. Lopez-Arevalo, "Information Extraction meets the Semantic Web: A Survey," 2016.
[13] A. Loreggia, S. Mosco, and A. Zerbinati, "SenTag: A Web-Based Tool for Semantic Annotation of Textual Documents," 2022. [Online]. Available: www.aaai.org
[14] P. K. Sadineni, "Comparative study on query processing and indexing techniques in big data," in Proceedings of the 3rd International Conference on Intelligent Sustainable Systems (ICISS 2020), IEEE, Dec. 2020, pp. 933–939, doi: 10.1109/ICISS49785.2020.9315935.
[15] M. A. Khedr, F. A. El-Licy, and A. Salah, "Ontology-based Semantic Query Expansion for Searching Queries in Programming Domain," International Journal of Advanced Computer Science and Applications, vol. 12, no. 8, pp. 449–455, 2021, doi: 10.14569/IJACSA.2021.0120852.
[16] J. Guo, Y. Cai, Y. Fan, F. Sun, R. Zhang, and X. Cheng, "Semantic Models for the First-Stage Retrieval: A Comprehensive Review," ACM Trans Inf Syst, vol. 40, no. 4, Oct. 2022, doi: 10.1145/3486250.
[17] A. Aghajanyan et al., "Conversational Semantic Parsing," Sep. 2020. [Online]. Available: http://arxiv.org/abs/2009.13655
[18] S. Hu, J. Wang, C. Hoare, Y. Li, P. Pauwels, and J. O'Donnell, "Building energy performance assessment using linked data and cross-domain semantic reasoning," Autom Constr, vol. 124, Apr. 2021, doi: 10.1016/j.autcon.2021.103580.
[19] A. A. Kumar, "Semantic memory: A review of methods, models, and current challenges," Psychonomic Bulletin and Review, vol. 28, no. 1, pp. 40–80, Feb. 2021, doi: 10.3758/s13423-020-01792-x.
[20] R. Zgheib et al., "A scalable semantic framework for IoT healthcare applications," J Ambient Intell Humaniz Comput, 2020, doi: 10.1007/s12652-020-02136-2.
[21] D. Ortega, Institute for Natural Language Processing (IMS), Pfaffenwaldring 5B, 70569 Stuttgart.
[22] M. Asfand-E-Yar and R. Ali, "Semantic integration of heterogeneous databases of the same domain using an ontology," IEEE Access, vol. 8, pp. 77903–77919, 2020, doi: 10.1109/ACCESS.2020.2988685.
[23] Y. He et al., "DeepOnto: A Python Package for Ontology Engineering with Deep Learning," Jul. 2023. [Online]. Available: http://arxiv.org/abs/2307.03067
[24] D. Chandrasekaran and V. Mago, "Evolution of Semantic Similarity: A Survey," Apr. 2020, doi: 10.1145/3440755.
[25] Anuttara Rajasinghe (AnuttaraR), "Covid19_Ontology," GitHub. [Online]. Available: https://github.com/AnuttaraR/Covid19_Ontology/tree/main [Accessed: Feb. 19, 2024]