
MLOps - Resume Parser Model

BITE497J – Project I

Submitted in partial fulfilment of the requirements for the degree of

Bachelor of Technology
in

Information Technology

by

Abhinav Aryan 21BIT0347


Rishabh Kumar 21BIT0194
Rupesh Prasad 21BIT0420

Under the guidance of


Dr. Pradeepa M

School of Computer Science Engineering and Information Systems


VIT, Vellore

November, 2024

ACKNOWLEDGEMENT
It is our pleasure to express our deep sense of gratitude to our BITE497J - Project I guide,
Dr. Pradeepa M, School of Computer Science Engineering and Information Systems, Vellore
Institute of Technology, Vellore, for her constant guidance and continual encouragement
throughout our endeavor. Our association with her is not confined to academics alone; it has
been a great opportunity to work with an intellectual and an expert in the field of Artificial
Intelligence and Image Processing.

"We would like to express our heartfelt gratitude to Honorable Chancellor Dr. G
Viswanathan; respected Vice Presidents Mr. Sankar Viswanathan, Dr. Sekar
Viswanathan, Vice Chancellor Dr. V. S. Kanchana Bhaaskaran; Pro-Vice Chancellor Dr.
Partha Sharathi Mallick; and Registrar Dr. Jayabarathi T.

Our whole-hearted thanks to the Dean, Dr. Sumathy S, School of Computer Science Engineering
and Information Systems; the Head of the Department of Information Technology, Dr. Prabhavathy P;
the Information Technology Project Coordinators, Dr. Sweta Bhattacharya and Dr. Praveen
Kumar Reddy; the SCORE School Project Coordinator, Dr. Srinivas Koppu; and all faculty, staff,
and members of our university for their continuous guidance throughout our course of study.

It is indeed a pleasure to thank our parents and friends who persuaded and encouraged us to
take up and complete our project “MLOps – Resume Parser Model” successfully. Last, but
not least, we express our gratitude and appreciation to all those who have helped us, directly or
indirectly, towards the successful completion of this project.

Executive Summary

The MLOps - Resume Parser Model is an innovative tool designed to automate the process
of resume parsing and candidate evaluation, specifically aimed at organizations seeking to
optimize their recruitment workflow. This tool is intended for HR professionals, recruitment
agencies, and organizations that need to process and evaluate large numbers of resumes
efficiently and accurately.

The system begins by processing resumes, typically in formats like PDFs or DOCX, using
advanced Natural Language Processing (NLP) techniques. These techniques help clean and
standardize the text, making it ready for further processing. The next step involves Named
Entity Recognition (NER), which uses transformer-based models like BERT to identify key
entities such as candidate skills, work experience, education, and job titles. This structured
extraction ensures that relevant candidate information is pulled out in a usable format for easy
analysis.

Additionally, the model employs machine learning algorithms to match candidate profiles
with job descriptions, improving the alignment between applicants and the positions they are
applying for. By analyzing features such as skills, qualifications, and past experiences, the
model provides a ranked list of candidates based on their fit for the job, which helps
streamline the selection process.

The project is built around MLOps principles, ensuring seamless integration into existing HR
systems. By utilizing continuous integration, deployment, and model monitoring, the tool can
adapt to changes in job descriptions, candidate data, and industry requirements without
disrupting workflows. This makes the model scalable and efficient in handling dynamic
recruitment needs.

The user interface is designed to be intuitive, allowing recruiters to easily upload resumes and
receive structured, analyzed data in real-time. This reduces the time and effort required to
manually sift through resumes, enabling HR teams to focus on higher-value tasks like
interviews and candidate engagement.

In summary, the MLOps - Resume Parser Model enhances recruitment processes by
automating candidate screening and matching, improving both the speed and accuracy of
selecting qualified candidates. By leveraging state-of-the-art NLP and machine learning
techniques, the system empowers organizations to make more data-driven, objective hiring
decisions.

CONTENTS
Page No.
Acknowledgement 4

Executive Summary 5

Table of Contents 6

List of Figures 7

List of Tables 8

1. INTRODUCTION 9
1.1 Objective 9
1.2 Motivation 10
1.3 Background 10
2. PROJECT DESCRIPTION AND GOALS 12
2.1 Project Description 12
2.2 Goals 14
3. LITERATURE REVIEW 16
4. TECHNICAL SPECIFICATIONS 19
4.1 System Architecture 19
4.2 Technology Stack 22
5. DESIGN APPROACH AND DETAILS 24
5.1 Design Approach 24
5.2 Constraints 35
6. PHASES OF PROJECT 36
6.1 Project Phases 36
7. PROJECT DEMONSTRATION 40
8. CONCLUSIONS 46
9. REFERENCES 49
10. APPENDIX 51

LIST OF FIGURES

Figure No. Title Page No.

Fig 4.1 Architectural Design 19


Fig 5.1 High level user interface design 25
Fig 5.2 Backend Workflow 26
Fig 5.3 OCR internal working 28
Fig 5.4 End-to-end resume parsing workflow 30
Fig 5.5 Text Cleaning Pipeline 31
Fig 7.1 User Interface 40
Fig 7.2 Displaying Uploaded Resume 41
Fig 7.3 Resume Analysis I 41
Fig 7.4 Resume Analysis II 42
Fig 7.5 Recommendation of skills 42
Fig 7.6 Experience Detection and Visualization 43
Fig 7.7 Suggestion for Courses and Certifications I 44
Fig 7.8 Suggestion for Courses and Certifications II 44
Fig 7.9 Admin Panel 45
Fig 7.10 User Data View 45

LIST OF TABLES

Table No. Title Page No.

Table 5.1 Comparison of Summarization Techniques in Resume Parsing 31-32
CHAPTER 1
INTRODUCTION
In the contemporary technological landscape, machine learning (ML) has emerged as a
cornerstone of innovation across industries. From automating mundane tasks to solving
complex problems, ML models are transforming the way businesses operate. However,
developing an ML model is only the first step in its lifecycle. Deploying these models into
real-world environments, ensuring their scalability, maintainability, and efficiency, has
introduced new challenges, giving rise to the concept of ML-Ops.
ML-Ops, an amalgamation of machine learning and DevOps principles, focuses on bridging
the gap between model development and deployment. It encompasses a set of practices and
tools designed to streamline the deployment, monitoring, and iterative improvement of
machine learning models. For organizations seeking to harness the power of AI, ML-Ops
provides the foundational framework for ensuring that models remain functional, reliable,
and relevant over time.
This project explores the application of ML-Ops to develop and deploy a resume parsing
system, leveraging machine learning and Natural Language Processing (NLP). The system is
designed to automate the extraction of structured information from resumes, such as contact
details, educational qualifications, work experience, and skills. By integrating ML-Ops
principles, this project ensures that the resume parser is scalable, maintainable, and capable of
handling diverse operational environments.
1.1 Objective
The primary objective of this project is to design and implement an end-to-end resume
parsing solution that incorporates ML-Ops principles for robust deployment and seamless
operation. Specifically, the project aims to:
• Develop a resume parser capable of handling resumes in various formats (PDF,
DOCX, TXT) and extracting meaningful information using NLP techniques.
• Ensure that the parser is easily deployable in diverse environments using Docker for
containerization.
• Provide a scalable solution to handle increasing volumes of resumes, leveraging ML-
Ops for continuous integration and delivery.
• Implement a relational database (MySQL) to store and manage parsed data, enabling
efficient querying and reporting.
• Design a user-friendly interface using Streamlit, allowing HR professionals to
interact with the system without requiring technical expertise.
• Address challenges in deployment, data management, and user interaction by
integrating ML-Ops best practices.
The project demonstrates how ML-Ops can address real-world challenges, ensuring the
reliability, scalability, and usability of machine learning models in production.
1.2 Motivation
The recruitment process is a critical yet time-intensive operation for organizations. As the
volume of applications increases, traditional manual methods of reviewing resumes become
impractical. This challenge has motivated the development of automated solutions that can
streamline the hiring process.
Resume parsing, as a solution, is increasingly being adopted by HR departments to address
the following challenges:
1. Time-Consuming Manual Processes: Manually reviewing resumes is labor-intensive
and prone to errors. Automating this process saves significant time and resources.
2. Diverse Resume Formats: Candidates submit resumes in various formats, making it
difficult to standardize and extract information manually. An NLP-based parser can
effectively handle this diversity.
3. Scalability Issues: As organizations grow, the volume of applications increases. A
scalable solution that integrates ML-Ops can ensure seamless performance even under
heavy workloads.
4. Efficiency in Candidate Selection: Automated resume parsing provides structured
data that HR teams can quickly analyze, ensuring more informed and efficient
decision-making.
This project is motivated by the potential to create an efficient, scalable, and reliable system
for automating the recruitment process, leveraging cutting-edge ML-Ops practices.
1.3 Background
The development and deployment of machine learning models have traditionally been viewed
as separate tasks. However, real-world applications demand an integrated approach that
considers the entire lifecycle of an ML model, from development to deployment and
maintenance. ML-Ops addresses this need by providing a framework for managing models in
production.
1.3.1 The Role of ML-Ops
ML-Ops incorporates practices from software engineering and DevOps, such as continuous
integration, continuous delivery, version control, and monitoring, into the domain of machine
learning. By adopting ML-Ops, this project ensures:
• Consistency Across Environments: Using containerization tools like Docker, the
system can be deployed on local machines, servers, or cloud environments with
minimal effort.
• Scalability: The system can handle increasing workloads, such as parsing thousands
of resumes, by deploying multiple containers or scaling database infrastructure.
• Maintainability: By monitoring model performance and ensuring seamless updates,
ML-Ops guarantees that the parser remains reliable over time.
1.3.2 NLP for Resume Parsing
Natural Language Processing (NLP) is the cornerstone of this project, enabling the system to
interpret and extract structured information from unstructured text. NLP techniques such as
tokenization, named entity recognition (NER), and text classification allow the parser to:
• Identify key information like names, email addresses, and skills.
• Understand context to distinguish between similar entities (e.g., "Python" as a skill vs.
"Python" as a keyword in a job description).
• Handle diverse linguistic styles and formatting in resumes.
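The snippet below is a minimal sketch of these ideas using NLTK's pretrained tokenizer, tagger, and named entity chunker; the sample text is illustrative rather than taken from our dataset, and real resumes need additional rules for fields such as skills and contact details.

import nltk

# One-time downloads of the tokenizer, tagger, and chunker models
for pkg in ('punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words'):
    nltk.download(pkg, quiet=True)

sample_text = "John Doe worked at Infosys in Bangalore as a Python developer."

tokens = nltk.word_tokenize(sample_text)          # tokenization
tagged = nltk.pos_tag(tokens)                     # part-of-speech tagging
tree = nltk.ne_chunk(tagged)                      # named entity chunking

entities = [(' '.join(word for word, tag in subtree.leaves()), subtree.label())
            for subtree in tree if hasattr(subtree, 'label')]
print(entities)   # e.g. [('John Doe', 'PERSON'), ('Infosys', 'ORGANIZATION'), ('Bangalore', 'GPE')]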
1.3.3 Tools and Technologies
The following tools are employed to build the resume parser:
1. Python: A versatile programming language widely used for machine learning and
NLP tasks.
2. Streamlit: A framework for developing interactive user interfaces, making the system
accessible to HR professionals.
3. MySQL: A relational database for storing and managing parsed resume data, enabling
efficient querying and reporting.
4. Docker: A containerization tool that ensures consistent deployment across
environments, from local development to cloud servers.
1.3.4 Challenges Addressed by ML-Ops
The integration of ML-Ops addresses several challenges:
• Deployment Complexity: By containerizing the entire system, Docker simplifies
deployment and eliminates environment-specific issues.
• Data Management: MySQL ensures that the data extracted from resumes is stored
securely and can be queried efficiently.
• User Interaction: Streamlit provides a user-friendly interface, allowing HR
professionals to interact with the parser seamlessly.

CHAPTER 2
PROJECT DESCRIPTION AND GOALS
The growing demand for automation in recruitment processes has highlighted the importance
of efficient and scalable resume parsing systems. This project focuses on the development
and deployment of a robust resume parsing system leveraging machine learning (ML) and
Natural Language Processing (NLP). By integrating ML-Ops principles, the project ensures
that the system is not only functional but also scalable, maintainable, and adaptable to real-
world requirements.
The primary goal of the project is to streamline the resume parsing process for organizations,
enabling HR teams to process large volumes of resumes effectively while maintaining data
accuracy and system reliability.
2.1 Project Description
The project involves designing and implementing a comprehensive system that automates the
extraction, organization, and management of data from resumes. The resume parser employs
machine learning and NLP techniques to interpret unstructured data and extract meaningful
information. The system is built with a modular architecture, ensuring seamless integration of
components like the parser, database, and user interface.
2.1.1 Key Components of the System
1. Resume Parsing with NLP
The core functionality of the system is built on Natural Language Processing. NLP
techniques such as tokenization, named entity recognition (NER), and text
classification enable the parser to:
o Extract structured data such as names, contact details, education, work
experience, and skills from unstructured resume text.
o Handle various resume formats, including PDF, DOCX, and TXT, ensuring
versatility.
o Manage diverse linguistic styles and terminologies commonly found in
resumes.
2. User-Friendly Interface with Streamlit
A simple yet interactive web interface is developed using Streamlit, allowing HR
professionals to upload resumes, view parsed data, and perform filtering or searching
tasks. Key features include:
o Resume Upload: Users can upload resumes in different formats.
o Parsed Data Display: The extracted information is presented in a structured
format, such as tables or lists, for easy review.

o Filtering and Searching: HR professionals can filter resumes based on
specific skills, experience levels, or other criteria.
3. Data Storage and Management with MySQL
The system uses MySQL to store parsed resume data securely and efficiently. The
database is designed with scalability in mind, ensuring it can handle increasing
volumes of data.
o Structured Storage: Parsed data is stored in relational tables, making it easy
to query and analyze.
o Data Integrity: Mechanisms are implemented to ensure the accuracy and
consistency of stored data, even when dealing with incomplete or unusual
resume formats.
4. Containerization with Docker
To ensure consistent deployment across environments, the entire system is
containerized using Docker. This includes the NLP model, user interface, and
database components.
o Portability: Docker containers enable the system to run uniformly on local
machines, servers, or cloud platforms.
o Scalability: Containers can be replicated to handle larger workloads during
peak recruitment periods.
5. Integration of ML-Ops Practices
ML-Ops principles are applied throughout the project to automate deployment,
improve scalability, and ensure continuous monitoring and updates.
o Continuous Integration and Deployment: Automated pipelines streamline
updates to the NLP model or database configurations.
o Monitoring and Maintenance: Tools are implemented to track system
performance and detect issues proactively.
2.1.2 Workflow of the System
The project workflow comprises several stages to ensure smooth operation:
1. Input and Preprocessing: Resumes uploaded by users are converted into formats
suitable for NLP processing. For instance, PDF resumes are converted into text using
OCR (Optical Character Recognition) if needed.
2. Data Extraction and Parsing: The NLP model processes the text to extract key
details, which are then structured for database storage.
3. Database Management: Parsed data is stored in MySQL, ensuring it is secure,
organized, and easily accessible.

4. Output Generation: The parsed data is displayed on the Streamlit interface, where
users can filter, search, or export the information.
5. Deployment and Scaling: Docker ensures the system runs consistently across
different environments, with scalability mechanisms to handle high workloads.
The modular design of the system allows for easy integration of additional features, such as
support for more resume formats or advanced analytics tools.

2.2 Goals
The goals of the project are structured to address both technical and operational challenges in
deploying a resume parsing system. These goals ensure that the system meets the needs of
end users while adhering to ML-Ops best practices.
2.2.1 Automation of Resume Parsing
The primary goal is to develop an automated process for extracting key details from resumes.
The system should:
• Handle resumes in multiple formats, ensuring versatility and compatibility.
• Accurately extract structured information, such as contact details, education, work
experience, and skills.
• Minimize errors and inconsistencies, even when parsing resumes with unconventional
formats or incomplete data.
2.2.2 Scalability and Performance
The system must be capable of scaling to handle increasing volumes of resumes. This
includes:
• Dynamic Scaling: Deploying additional Docker containers to manage peak
workloads.
• Performance Optimization: Ensuring fast parsing and data retrieval times, even with
large datasets.
• System Reliability: Maintaining consistent performance across all deployment
environments.
2.2.3 User Accessibility
A key goal is to make the system user-friendly for non-technical users, particularly HR
professionals. The Streamlit-based interface is designed to:
• Allow easy uploading of resumes and viewing of parsed data.
• Enable filtering and searching for specific criteria, such as skills or experience levels.
• Provide an intuitive experience that requires minimal training or technical knowledge.
2.2.4 Efficient Data Management
Using MySQL as the database ensures efficient data handling, with goals including:
• Secure Data Storage: Protecting sensitive applicant information with robust security
measures.
• Efficient Querying: Allowing users to search and retrieve data quickly based on
specific parameters.
• Scalable Design: Ensuring the database can grow alongside the system to
accommodate more resumes.
2.2.5 Seamless Deployment
Deployment challenges are addressed through the use of Docker and ML-Ops practices, with
the following goals:
• Ensure consistent deployment across diverse environments, including local machines
and cloud servers.
• Minimize setup and configuration issues, enabling rapid deployment.
• Support continuous integration and updates to the NLP model and other system
components.
2.2.6 Long-Term Maintenance and Monitoring
ML-Ops principles guide the long-term goals for maintaining the system, such as:
• Automating updates to ensure the system remains up-to-date with the latest ML and
NLP advancements.
• Monitoring system performance to detect and address issues proactively.
• Providing regular reports on system usage and performance metrics.

CHAPTER 3
LITERATURE REVIEW
The literature review serves as a foundation for developing a robust and efficient resume
parsing system by exploring various methodologies, challenges, and technologies. It
discusses critical components, including Natural Language Processing (NLP), database
management, text preprocessing, and machine learning, emphasizing their relevance to the
project’s goals. The following subsections summarize the key findings and insights drawn
from academic research that informed the project.
[1] This paper highlights the integration of deep learning models in automating resume
parsing, focusing on extracting key entities such as skills, qualifications, and experiences.
The authors discuss the use of Named Entity Recognition (NER) and transformer-based
models like BERT for semantic analysis, enabling a better understanding of resume data.
This paper provided a foundation for using state-of-the-art NLP techniques for entity
extraction and classification. By implementing BERT embeddings and customizing NER
models, we were able to efficiently identify and categorize candidate information, which
significantly enhanced our project's parsing accuracy and contextual relevance. The use of
deep learning for understanding complex resume data aligned perfectly with our project goals
of automating candidate profile extraction.

[2] This research introduces a machine learning framework for classifying resumes and
matching them with job descriptions. It emphasizes supervised learning techniques and
feature engineering to map resumes effectively to job requirements.
We gained insights into feature extraction and matching techniques, especially the importance
of structured feature representation. Adopting their proposed classification model allowed us
to streamline the job-candidate alignment process, improving the relevance of recommended
matches in our project. By leveraging their method, we were able to automate the job matching
process, ensuring better fit recommendations based on resume data and job descriptions.

[3] The paper focuses on preprocessing steps and tokenization methods that help standardize
unstructured resume data. It also highlights ranking mechanisms based on job criteria using
embedding similarity measures like cosine similarity.
This study guided the implementation of preprocessing pipelines, particularly in handling
varied resume formats. Techniques such as vectorization and similarity calculations directly
enhanced the efficiency of our resume shortlisting module, reducing noise and improving
relevance scores. Their focus on similarity measures was crucial in refining our project's
ability to rank resumes based on job relevance, streamlining the recruitment process.
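A minimal sketch of this ranking idea using TF-IDF vectors and cosine similarity is shown below (scikit-learn assumed; the job description and resume snippets are illustrative, not taken from the cited work):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

job_description = "Python developer with SQL and machine learning experience."
resumes = [
    "Experienced Python developer with a strong SQL and scikit-learn background.",
    "Graphic designer skilled in Photoshop and Illustrator.",
]

# Vectorize the job description and the resumes in one shared TF-IDF space
vectorizer = TfidfVectorizer(stop_words='english')
matrix = vectorizer.fit_transform([job_description] + resumes)

# Cosine similarity of each resume against the job description, then rank
scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()
for score, text in sorted(zip(scores, resumes), reverse=True):
    print(f"{score:.2f}  {text}")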

[4] The authors present a hybrid framework combining rule-based and machine learning
approaches to parse resumes effectively. They address challenges like unstructured data and
multilingual formats, providing a scalable solution for large datasets.
This framework offered valuable insights into combining deterministic methods with
probabilistic models to handle edge cases in resume parsing. Integrating rule-based logic
improved the system's ability to parse uncommon resume structures, making our project more
robust. The hybrid approach helped improve parsing accuracy for resumes that did not follow
standard formats, ensuring the system worked well with a wide range of inputs.

[5] This paper explores skill extraction techniques using advanced NLP pipelines. It evaluates
various models, including sequence tagging and dependency parsing, to identify and map
skills with job requirements.
The insights into sequence tagging models were particularly beneficial in refining our skill
extraction module. Implementing dependency parsing methods from the paper improved the
semantic understanding of skills, enhancing the accuracy of our project's job matching
process. The paper highlighted how to handle complex skill relationships and improve the
extraction process, which was essential for mapping skills to job descriptions in our system.

[6] Focusing on preprocessing techniques, this study addresses challenges in handling
missing data, ambiguous formatting, and inconsistent terminology in resumes. It provides a
detailed guide to tokenization and normalization processes for cleaner input.
The paper underscored the importance of text preprocessing in improving downstream NLP
tasks. Implementing their tokenization and normalization strategies streamlined our project's
input processing stage, reducing errors in information extraction. By leveraging the methods
for standardizing terminology and dealing with missing data, we were able to improve the
overall quality of the parsed data in our project.

[7] The paper discusses semantic search algorithms, particularly context-aware embeddings
like BERT and RoBERTa, to improve the precision of job-candidate matching.
By leveraging context-aware embeddings, we were able to enhance the semantic similarity
measures in our project. This improved the system's ability to understand nuanced
connections between resumes and job descriptions, resulting in better matching accuracy. The
use of semantic search allowed our project to go beyond keyword matching, ensuring more
contextually relevant job-candidate pairings.

[8] This paper emphasizes the role of domain-specific knowledge in enhancing resume
parsing models. It discusses the integration of pretrained language models with recruitment-
specific datasets to improve performance.

The research inspired the adaptation of domain-specific fine-tuning for pretrained models in
our project. This improved the relevance and specificity of extracted entities, ensuring
alignment with recruitment requirements. By using domain-specific data to fine-tune models,
we were able to achieve better accuracy in extracting job-relevant skills and qualifications,
which was critical for tailoring the system to the recruitment domain.

[9] The study integrates ontologies and taxonomies to refine the accuracy of entity
recognition in resumes. It highlights the use of structured knowledge bases for better
classification.
This paper provided the motivation to incorporate domain knowledge in the parsing model.
Ontology-based enhancements significantly improved the granularity and correctness of the
parsed data, boosting our project's overall system performance. By integrating taxonomies
specific to recruitment, we were able to achieve a more accurate classification of job skills and
experience in the resumes, which was vital for the job-matching algorithm.

[10] The authors explore semantic role labeling, dependency parsing, and entity linking to
extract structured data from resumes. They evaluate these techniques on unstructured textual
data, demonstrating their efficiency in generating organized datasets.
Incorporating semantic parsing techniques improved the structural organization of resume
data in our project. The combination of dependency parsing and entity linking enriched the
data quality, making the matching process more efficient and reliable. These techniques
allowed us to extract and link relevant entities within resumes, ensuring better data integrity
for downstream job matching and candidate recommendation.

CHAPTER 4
TECHNICAL SPECIFICATIONS
4.1 System Architecture

Fig 4.1: Architectural Diagram

The Resume Parser project follows a modular architecture that integrates multiple
components, ensuring an efficient and accurate flow of operations from resume input to final
categorized output. Below is a detailed explanation of each component:
1. Streamlit UI: Upload Resume
o The Streamlit User Interface serves as the front-end of the system, designed
for ease of use and accessibility.
o It allows users to upload resumes in various formats such as PDF, DOCX, or
text files. The drag-and-drop functionality simplifies the upload process for
non-technical users.
o The UI validates the uploaded files, ensuring they meet the accepted format
and size limits, preventing errors in downstream processing.
o Once uploaded, the resumes are sent to the text extraction module for
processing.
2. Text Extraction
o This component is responsible for extracting raw text from the uploaded
resumes.

o For non-image-based files (e.g., PDFs or DOCX), parsers such as PyPDF2 or
python-docx are employed to extract the content.
o In cases where resumes are image-based (e.g., scanned documents), OCR
(Optical Character Recognition) tools such as Tesseract OCR are used to
convert the image content into machine-readable text.
o The extracted text is cleaned and preprocessed by removing unnecessary
formatting, symbols, and whitespaces to prepare it for NLP operations.
3. NLP Features: Entity Recognition
o The Natural Language Processing (NLP) module is the core component that
analyzes the extracted text to identify meaningful information.
o Using the NLTK (Natural Language Toolkit) library, the system applies
Named Entity Recognition (NER) to extract key entities such as:
▪ Personal Information: Name, contact details (email, phone number),
and address.
▪ Skills: Technical and soft skills relevant to the job market.
▪ Educational Qualifications: Degrees, institutions, and years of study.
▪ Professional Experience: Job titles, company names, durations, and
responsibilities.
o NER techniques rely on predefined datasets, tokenization, and part-of-speech
tagging to identify entities within the text.
4. ML Models: Categorization
o The extracted entities are further processed by the machine learning module
for categorization.
o The K-Nearest Neighbors (KNN) algorithm is used to classify resumes into
predefined categories, such as software development, data analysis, or
management roles.
o The KNN algorithm operates by comparing the attributes of the current
resume (e.g., skills and experience) with existing classified resumes to
determine its category.
o Feature engineering is performed to ensure the data passed to the ML model is
relevant and accurate. For instance, skills and experiences are encoded into
numerical vectors to make them processable by the algorithm.
o The categorization helps streamline the hiring process by matching candidates
to relevant job roles efficiently.
5. Output Layer (Streamlit)

o The processed data, including extracted entities and categorization results, are
displayed in a well-structured format using Streamlit.
o The output interface is interactive and user-friendly, allowing users to:
▪ View detailed information parsed from resumes.
▪ Download processed data in formats such as CSV or JSON for
integration with other systems.
▪ Perform further actions like searching, filtering, or exporting results.
o This layer bridges the back-end processing with the user, ensuring
transparency and usability.
6. Storage
o Parsed and categorized data is stored securely in a MySQL relational
database.
o The database is structured to efficiently organize information into tables, such
as:
▪ Candidate details (e.g., name, email, phone number).
▪ Educational background and work experience.
▪ Skills and certifications.
o The database design follows normalization principles to reduce redundancy
and improve data retrieval performance.
o MySQL also supports indexing and querying capabilities, allowing the system
to quickly retrieve candidate data for future processes such as analytics or
reporting.
7. Deployment Layer (Docker)
o The application is containerized using Docker, a tool that ensures the system
runs consistently across different environments.
o Docker encapsulates the application, along with its dependencies (e.g., Python
libraries, MySQL), into a portable container.
o Key benefits of Docker in this architecture include:
▪ Scalability: Additional containers can be deployed as the system scales
to handle more users or larger datasets.
▪ Portability: The containerized application can run on any system with
Docker installed, ensuring compatibility across development, testing,
and production environments.

▪ Simplified Maintenance: Updates to the system can be deployed
quickly by modifying and redeploying containers.
This architecture ensures an end-to-end workflow, from resume ingestion to categorized
output, with each module seamlessly integrating to deliver a robust solution.

4.2 Technology Stack


The technology stack for the Resume Parser project is designed to combine the strengths of
various tools and libraries, ensuring the system is efficient, scalable, and easy to maintain.
Below is a detailed explanation of the technologies used:
4.2.1 Front-end Technology
• Streamlit:
o A Python-based framework used to build the web interface for uploading
resumes and displaying outputs.
o Its simplicity and interactivity make it an ideal choice for creating dashboards
and front-end applications.
o Features like real-time interaction and integration with Python scripts make
Streamlit a powerful tool for this project.
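A minimal sketch of the upload-and-process flow in Streamlit is shown below; the widget labels and the parse_resume helper are placeholders rather than the exact code used in the project.

import streamlit as st

st.title("MLOps - Resume Parser")

# Accept a resume in one of the supported formats
uploaded = st.file_uploader("Upload a resume", type=["pdf", "docx", "txt"])

if uploaded is not None:
    st.success(f"Received {uploaded.name} ({uploaded.size} bytes)")
    if st.button("Process"):
        # parse_resume() is a placeholder for the project's extraction pipeline
        # parsed = parse_resume(uploaded.read())
        st.json({"status": "parsing would run here"})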
4.2.2 Back-end Technology
• Python:
o The primary programming language used for implementing text processing,
NLP, and machine learning modules.
o Libraries like pandas, numpy, and re are utilized for data manipulation and
preprocessing.
• NLTK:
o The Natural Language Toolkit is leveraged for implementing NLP tasks,
including tokenization, stemming, and Named Entity Recognition (NER).
o NLTK offers a comprehensive set of tools and datasets for processing human
language data efficiently.
• Scikit-learn:
o This library provides machine learning algorithms such as KNN for resume
categorization.
o It includes tools for data preprocessing, feature extraction, and model
evaluation, making it a cornerstone of the ML module.
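The sketch below illustrates how a TF-IDF plus KNN pipeline in scikit-learn can categorize resume text; the categories and training snippets are placeholders, and the project's actual feature encoding and training data may differ.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: resume text -> job category
train_texts = [
    "python sql machine learning pandas",
    "java spring microservices backend",
    "recruitment onboarding payroll hr policies",
]
train_labels = ["Data Science", "Software Development", "Human Resources"]

# TF-IDF features feeding a K-Nearest Neighbors classifier
model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
model.fit(train_texts, train_labels)

print(model.predict(["experienced python developer with an ml background"]))
# -> ['Data Science']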
4.2.3 Database Management
• MySQL:
o A relational database used for structured storage of parsed data.
o It offers features like indexing, ACID compliance, and support for complex
queries, ensuring data integrity and efficient retrieval.
o The database is integrated with the Python back-end using libraries such as
SQLAlchemy or mysql-connector-python.
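As an illustration, the following sketch stores parsed fields with mysql-connector-python; the connection credentials, database name, and table layout are assumptions for demonstration only.

import mysql.connector

# Connection details are placeholders for the deployment's actual configuration
conn = mysql.connector.connect(
    host="localhost", user="parser", password="secret", database="resume_parser"
)
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE IF NOT EXISTS candidates (
        id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255),
        email VARCHAR(255),
        skills TEXT
    )
""")

parsed = {"name": "Jane Doe", "email": "jane@example.com", "skills": "Python, SQL"}
cursor.execute(
    "INSERT INTO candidates (name, email, skills) VALUES (%s, %s, %s)",
    (parsed["name"], parsed["email"], parsed["skills"]),
)
conn.commit()
cursor.close()
conn.close()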
4.2.4 Deployment Technology
• Docker:
o The entire application, including the front-end, back-end, and database, is
containerized using Docker.
o Docker images are created for different components, ensuring modular
deployment and easy updates.
o Docker Compose is used to orchestrate multiple containers, simplifying the
deployment process.
4.2.5 Supporting Tools
• OCR Software:
o Tools like Tesseract OCR are integrated to handle image-based resumes by
converting them into text format.
o Preprocessing techniques like noise removal and image enhancement are
applied to improve OCR accuracy.
This comprehensive technology stack is selected to ensure the system's reliability, efficiency,
and adaptability to varying requirements.

CHAPTER 5
DESIGN APPROACH AND DETAILS
The goal of this project is to design and implement an application that performs advanced
machine learning operations to parse resumes. Users can upload resumes in different formats,
such as PDF, DOCX, or images. The application extracts and processes the content using
Optical Character Recognition (OCR) and NLP techniques, identifying key details like skills,
education, and experience. The processed data is categorized, displayed in an interactive
dashboard, and made available for download in structured formats like CSV or JSON.
5.1 Design Approach
5.1.1 User Interface (UI) Design
The user interface is crafted to ensure simplicity, accessibility, and functionality, catering to
users of all technical backgrounds. The UI design focuses on creating a seamless experience
for uploading resumes, processing them, and accessing results.
• File Upload Feature:
o Users can upload resumes through a streamlined file upload feature that
accepts multiple formats, including PDFs, DOCX, and image files.
o The interface supports drag-and-drop functionality to enhance usability.
• Process Button:
o A clearly labeled “Process” button allows users to trigger backend operations,
including OCR-based text extraction and NLP-based categorization.
• Interactive Data Visualization:
o Results are displayed on an interactive dashboard, showing parsed sections
like personal details, skills, education, work history, and more.
o The layout is designed to ensure readability, with key fields highlighted for
quick review.
• Download Option:
o Users can download the processed data in various structured formats, such as
CSV and JSON, for integration with external systems.

Fig 5.1: High-Level User Interface Design

5.1.2 Backend Workflow


The backend system manages file uploads, processing, and output generation. This section
provides an overview of the steps involved:
• File Handling:
o Uploaded resumes are temporarily stored on the server.
o Files are categorized based on format (PDF, DOCX, or image).
o Image-based resumes are preprocessed for OCR, while text-based files are
sent directly to the text extraction module.
• Text Extraction:
o The text extraction module uses Tesseract OCR for image-based files,
ensuring accurate conversion of visual text into machine-readable text.
o Libraries like PyPDF2 or docx2txt are used for text extraction from PDF and
DOCX files.
• Data Cleaning and Preprocessing:
o Extracted text is cleaned to remove unnecessary formatting, symbols, or
whitespace.

o NLP preprocessing steps include tokenization, stopword removal, and
lemmatization to prepare data for further processing.
• Entity Recognition and Categorization:
o The cleaned text is processed using Named Entity Recognition (NER) via
NLTK to identify and extract fields such as:
▪ Personal Details: Name, phone number, and email address.
▪ Skills: Technical and soft skills.
▪ Education: Degrees and certifications.
▪ Work Experience: Job titles, companies, and durations.
o A KNN algorithm categorizes resumes into predefined job roles based on skill
sets and extracted information.
• Output Generation:
o Parsed and categorized data is formatted for display in the frontend.
o Results are also converted into downloadable formats like CSV and JSON,
ensuring compatibility with various external systems.

Fig 5.2: Backend Workflow Diagram
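A minimal sketch of the format-dependent extraction step described above is given below (PyPDF2 3.x and docx2txt assumed; the helper name extract_text is illustrative):

import os
import docx2txt
from PyPDF2 import PdfReader

def extract_text(path):
    """Return raw text from a PDF, DOCX, or plain-text resume."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if ext == ".docx":
        return docx2txt.process(path)
    # Fall back to plain text for TXT files
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        return f.read()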

5.1.3 System Architecture


The system architecture integrates multiple components to process resumes effectively. Each
component plays a critical role in achieving end-to-end functionality.
• Frontend:

o Built using Streamlit, providing a simple, responsive, and user-friendly
interface for uploading resumes and viewing results.
• Backend:
o Developed in Python using Flask, which manages server-side logic, including
file handling, text parsing, and database integration.
• Text Extraction (OCR):
o Tesseract OCR processes image-based resumes, extracting textual content.
o PDF and DOCX files are processed using libraries like PyPDF2 and docx2txt.
• Machine Learning (ML) Models:
o NLTK is used for Named Entity Recognition (NER) to extract key fields.
o A KNN algorithm is implemented for job role categorization, ensuring
accurate classification of resumes.
• Database:
o MySQL is used to store parsed data for retrieval and further analysis.
• Deployment:
o The entire system is containerized using Docker for consistent performance
across environments.

5.1.4 Optical Character Recognition (OCR) Functionality Using Python Libraries (pytesseract, easyocr)
The OCR module is essential for processing image-based resumes, enabling text extraction
from scanned documents and screenshots. This ensures versatility in handling resumes across
diverse formats.
1. Image Preprocessing:
o Image files are enhanced using techniques like grayscale conversion and noise
reduction to improve text visibility.
o Each page of a PDF is converted into images for further OCR processing.
2. Text Extraction Using Tesseract OCR:
o The Tesseract engine converts images into text, preserving the original layout
and formatting as much as possible.
o Extracted text is passed to the cleaning module for further processing.

3. Integration with Backend:
o The OCR output is seamlessly integrated into the NLP pipeline for entity
recognition and categorization.

Fig 5.3: OCR Internal working
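The sketch below illustrates this OCR path with pytesseract, Pillow, and pdf2image (pdf2image additionally requires the poppler utilities on the system); the preprocessing shown is a simplified version of what the pipeline may apply.

import pytesseract
from PIL import Image, ImageOps
from pdf2image import convert_from_path

def ocr_scanned_pdf(pdf_path):
    """Convert each page of a scanned PDF to an image and run Tesseract OCR."""
    pages = convert_from_path(pdf_path, dpi=300)
    text_parts = []
    for page in pages:
        gray = ImageOps.grayscale(page)        # simple preprocessing step
        text_parts.append(pytesseract.image_to_string(gray))
    return "\n".join(text_parts)

def ocr_image(image_path):
    """Run OCR directly on an image-based resume (screenshot or scan)."""
    return pytesseract.image_to_string(ImageOps.grayscale(Image.open(image_path)))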

5.1.5 NLP Feature: Entity Recognition and Categorization


The NLP module leverages advanced algorithms to process and extract structured
information from the unstructured text.
1. Named Entity Recognition (NER):
o Using NLTK, the text is parsed to extract named entities such as names, skills,
educational qualifications, and job roles.
o The NER process involves tokenizing the text, tagging parts of speech, and
identifying entity classes.
2. Categorization with KNN Algorithm:
o The KNN algorithm categorizes resumes into job roles based on predefined
labels.
o Feature vectors are generated based on extracted keywords and skill sets,
enabling accurate classification.
3. Data Formatting and Storage:
o Extracted data is structured into a user-friendly format, displayed in the
frontend, and stored in the database for future use.

5.1.6 Text Summarization Functionality


The text summarization functionality is a core feature of the resume parser, designed to
automatically condense detailed resumes into concise, structured summaries that can
highlight a candidate's qualifications, skills, and experience efficiently. This is essential for
streamlining the resume screening process, particularly when dealing with large volumes of
applications.
The summarization feature analyzes resumes to extract key sections—such as Work
Experience, Skills, Education, and Certifications—and reformulates the data into shorter,
reader-friendly summaries. The primary goals of this functionality are:
1. Improved Information Retrieval: Condensing resumes while retaining the most
relevant details enables faster and more accurate candidate evaluation.
2. Reduction of Cognitive Load: By providing structured insights, the summarization
helps HR teams and recruiters avoid sifting through excessive or irrelevant
information.
3. Adaptability: Adjustable summary lengths and formats allow customization based on
job roles, recruiter preferences, or system requirements.
To ensure effective summaries, preprocessing steps such as text cleaning, sentence
segmentation, and error correction enhance accuracy, while models are trained to prioritize
information that is highly relevant to specific job descriptions. This makes the summarization
process particularly valuable in automating parts of the recruitment workflow and improving
overall decision-making efficiency.

5.1.7 Text Summarization Methods


The resume parser employs a hybrid text summarization approach, combining extractive
and abstractive techniques for superior accuracy and readability.
1. Extractive Summarization
Extractive summarization involves identifying and extracting key sentences or phrases
directly from the source text. In the context of resume parsing, this ensures critical data
points—such as job titles, years of experience, or certifications—are retained in their original
form.
• Implementation in Resume Parsing:
o The TF-IDF (Term Frequency-Inverse Document Frequency) technique is
employed to analyze the importance of terms within the resume.
o High-scoring sentences containing relevant keywords (e.g., “project manager,”
“data analysis”) are selected for the summary.
o This ensures that the summary includes the candidate's key qualifications and
avoids irrelevant details.
• Advantages:

o Preserves factual accuracy: Critical details, such as dates or names of
certifications, remain intact.
o Efficiency: Extractive summarization is computationally less demanding,
making it suitable for processing large volumes of resumes quickly.
• Applications:
o Technical roles: Focus on extracting specific certifications, programming
languages, or tools (e.g., AWS, Python).
o Leadership positions: Emphasis on key job responsibilities and
achievements.

Fig 5.4: End-to-End Resume Parsing Workflow
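As a minimal illustration of the extractive approach described above, the sketch below scores each sentence by the sum of its TF-IDF term weights and keeps the highest-scoring ones; this is a simplified stand-in for the project's extractive stage.

import nltk
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download('punkt', quiet=True)

def extractive_summary(text, top_n=3):
    """Keep the top_n sentences with the highest total TF-IDF weight, in original order."""
    sentences = sent_tokenize(text)
    if len(sentences) <= top_n:
        return text
    tfidf = TfidfVectorizer(stop_words='english').fit_transform(sentences)
    scores = tfidf.sum(axis=1).A1                      # one score per sentence
    top_idx = sorted(sorted(range(len(sentences)),
                            key=lambda i: scores[i], reverse=True)[:top_n])
    return " ".join(sentences[i] for i in top_idx)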

2. Abstractive Summarization
Abstractive summarization generates new sentences to describe the key information from the
resume. This method improves readability and coherence by rephrasing and reorganizing
extracted details.
• Implementation in Resume Parsing:
o Transformer-based models, such as Llama 3 (8B), Pegasus, and BERT T5,
are utilized to analyze extracted data and generate summaries.
o Summaries are formatted into bullet points or concise paragraphs to improve
clarity and usability.
o Prompts provided to models focus on summarizing resumes specific to the job
role, ensuring that the output aligns with recruiter expectations.
• Advantages:
o Improves readability: Summaries are structured to be easy to comprehend,
even when dealing with verbose resumes.

o Flexibility: The model can emphasize different aspects (e.g., technical skills
vs. soft skills) depending on job requirements.
• Applications:
o Suitable for generating summaries for diverse industries, including creative
roles, academic profiles, and management positions.
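The sketch below shows abstractive summarization through the Hugging Face transformers pipeline with the public google/pegasus-xsum checkpoint; the project's actual models, prompts, and serving setup (e.g., Llama 3) may differ.

from transformers import pipeline

# Load a pretrained abstractive summarizer (downloads the checkpoint on first use)
summarizer = pipeline("summarization", model="google/pegasus-xsum")

resume_text = (
    "Software engineer with 5 years of experience in Python, Django, and AWS. "
    "Led a team of four developers and reduced deployment time by 40 percent. "
    "Holds a B.Tech in Information Technology and an AWS Solutions Architect certification."
)

summary = summarizer(resume_text, max_length=60, min_length=20, do_sample=False)
print(summary[0]["summary_text"])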

3. Hybrid Summarization
Hybrid summarization leverages the strengths of both extractive and abstractive techniques to
create summaries that are precise, coherent, and concise.
• Implementation in Resume Parsing:
o The pipeline first applies extractive summarization to identify critical content
(e.g., skills, roles).
o The extracted data is passed through the Llama 3 (8B) model for rephrasing,
ensuring a coherent and engaging summary.
o This combined approach reduces redundancy while preserving factual
accuracy.
• Advantages:
o Balanced Output: Combines the accuracy of extraction with the readability of
abstraction.
o Customizable Summaries: Tailored to highlight key qualifications based on
the specific job role or recruiter preferences.
Fig 5.5: Hybrid Summarization Workflow

5.1.8 Comparison between Summarization Techniques

Feature | Extractive Summarization | Abstractive Summarization | Hybrid Summarization
Definition | Selects and uses original phrases or sentences from the text. | Rephrases content to generate new sentences. | Combines extraction of key points with abstractive rephrasing.
Output Style | Uses exact text from the resume. | Creates fluid and rephrased summaries. | Balances original content with rephrased sentences.
Example Use Cases | Extracting certifications, job titles, dates. | Summarizing verbose or creative profiles. | Ideal for summaries requiring both precision and readability.
Complexity | Low; uses statistical models like TF-IDF. | High; requires advanced NLP models like transformers. | Moderate; combines statistical and NLP-based techniques.
Accuracy for Key Details | High for factual content. | Moderate; rephrasing may lose detail. | High; ensures factual accuracy with improved readability.
Readability | May be less fluid; redundancy can occur. | Highly readable; output is structured and coherent. | Highly readable; reduces redundancy while preserving detail.
Computational Demand | Low; suitable for local machines. | High; depends on GPU/CPU for large datasets. | Moderate to high; depends on hybrid implementation depth.
Models Used | TF-IDF, TextRank, LexRank. | Llama 3, Pegasus, BERT T5. | Combined use of TF-IDF and abstractive models.
Flexibility | Limited to the original structure of the text. | Highly flexible for rephrasing and structuring. | Flexible; balances precision and creativity.
Application in Resume Parsing | Extracting technical skills, certifications, or keywords. | Creating summaries for creative or management roles. | Ensuring structured and engaging summaries for all profiles.
Table 5.1: Comparison of Summarization Techniques in Resume Parsing

5.1.9 Text Summarization Workflow


The text summarization workflow for the resume parser involves three main stages: Data
Preprocessing, Model Processing, and PDF Output Generation.
Step 1: Data Preprocessing
Preprocessing ensures that raw resume data is cleaned, structured, and ready for
summarization.

• Text Segmentation: Break down OCR-extracted text into individual sentences and
paragraphs.
def text_segmentation(text):
    """Segment text into sentences."""
    import nltk
    from nltk.tokenize import sent_tokenize
    nltk.download('punkt', quiet=True)  # sentence tokenizer models (one-time download)
    segments = sent_tokenize(text)
    return segments

• Text Cleaning: Remove unnecessary characters (e.g., line breaks, page numbers,
headers).
• Noise Reduction: Exclude irrelevant sections such as footers or disclaimers.
def noise_reduction(text):
    """Reduce noise by removing unwanted characters."""
    import re
    # Keep only alphanumerics and whitespace
    clean_text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    # Replace multiple spaces with a single space
    clean_text = re.sub(r'\s+', ' ', clean_text).strip()
    return clean_text

• Correcting OCR Errors: Spell-check and grammar-check algorithms fix OCR-induced mistakes.
• Removing Stop Words: Eliminate non-essential words (e.g., “and,” “is”).
• Lemmatization and Stemming: Standardize words to their root forms for consistent analysis.

Fig 5.5: Text Cleaning Pipeline

def text_cleaning_pipeline(text):
    """Clean the text using segmentation and noise reduction."""
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    nltk.download('stopwords', quiet=True)
    nltk.download('punkt', quiet=True)

    # Text segmentation
    segments = text_segmentation(text)

    # Noise reduction
    clean_segments = [noise_reduction(segment) for segment in segments]

    # Tokenization and stopword removal
    stop_words = set(stopwords.words('english'))
    tokens = [word_tokenize(segment) for segment in clean_segments]
    filtered_tokens = [[word for word in token_list if word.lower() not in stop_words]
                       for token_list in tokens]

    # Reconstruct the cleaned text
    cleaned_text = ' '.join([' '.join(token_list) for token_list in filtered_tokens])
    return cleaned_text

Step 2: Model Processing


• Extractive Summarization: Identifies critical points (e.g., skills, experience) using
TF-IDF.
• Abstractive Summarization: Refines extracted points into coherent summaries with
Llama 3.

Step 3: Output PDF Generation


The final summaries are formatted into clean, structured PDFs with professional fonts,
consistent formatting, and metadata such as candidate name and date. This ensures a polished
output that is portable and ready for recruiters.
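A small sketch of this output step using the fpdf2 library is shown below; the library choice, fonts, and layout are assumptions, and the project may use a different PDF toolkit.

from fpdf import FPDF

def summary_to_pdf(candidate_name, summary_text, out_path="summary.pdf"):
    """Write a simple one-page summary PDF with a header and body text."""
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", "B", 14)
    pdf.cell(0, 10, f"Resume Summary - {candidate_name}")
    pdf.ln(12)
    pdf.set_font("Helvetica", size=11)
    pdf.multi_cell(0, 7, summary_text)
    pdf.output(out_path)
    return out_path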

5.2 Constraints
• OCR Accuracy: Text recognition is constrained by the quality of the uploaded images or
scanned documents. Poor image quality, handwritten text, or complex formatting can reduce
OCR accuracy.
• Dependency on External Libraries: The application relies on external libraries such as
EasyOCR for text extraction and Pegasus for summarization. Compatibility issues,
updates, or deprecations in these libraries could impact the application's functionality.
• Summarization Quality: The quality of text summarization using the Pegasus model
might vary depending on the nature of the text extracted by OCR. Lengthy,
unstructured text may not be summarized effectively.

CHAPTER 6
PHASES OF PROJECT

6.1 Project Phases


Phase 1: Identification – Building Core Components for the Resume Parser
The initial phase focuses on designing and implementing the foundational components
required for the ML-driven resume parser. This involves identifying the key functional areas,
ensuring data quality, and building the base architecture for parsing operations.
1. Data Preprocessing: Structuring and Cleaning Resume Data
Resumes come in various formats and styles, posing challenges for standard data extraction
techniques. The first step is to preprocess the data to make it uniform and ready for ML
processing.
• Steps in Preprocessing:
o Data Loading: Resumes are uploaded in formats like PDF, Word, or plain
text. Python libraries such as PyPDF2, docx, and pandas are used to extract
raw text.
o Text Normalization: The text is cleaned by removing special characters,
redundant whitespaces, headers, and footers. The formatting is standardized to
ensure consistency.
o Tokenization: The text is split into smaller units like words or sentences using
tools like NLTK or SpaCy. These tokens serve as the building blocks for NLP
operations.
o Stop-word Removal: Non-informative words such as "the," "is," or "of" are
filtered out to focus on meaningful content.
o Lemmatization and Stemming: Words are reduced to their base forms to
ensure consistency in semantic meaning (e.g., "running" → "run").
• Significance:
Effective preprocessing ensures that the subsequent ML operations are performed on
clean, structured data, reducing noise and improving the accuracy of the resume
parser.
2. Feature Extraction: Identifying Key Entities
The resume parser relies on identifying structured elements like names, contact details, skills,
and education. This step focuses on extracting these features using machine learning and rule-
based approaches.
• Approach:

o Named Entity Recognition (NER): Using SpaCy or BERT, the system
detects entities such as job titles, skills, organizations, and dates. Pre-trained
models are fine-tuned with labeled resume datasets for better accuracy.
o Custom Rules and Patterns: In scenarios where ML models struggle, custom
regex patterns are implemented to identify fields like phone numbers or email
addresses.
• Significance:
Feature extraction transforms unstructured resume data into actionable information,
laying the groundwork for subsequent parsing and analysis.
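The sketch below shows such rule-based patterns for email addresses and phone numbers; the expressions are illustrative and would need broadening for production use.

import re

EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')
# Optional country code followed by a 10-digit number, or a (xxx) xxx-xxxx pattern
PHONE_RE = re.compile(r'(?:\+?\d{1,3}[\s-]?)?\d{10}|\(\d{3}\)\s?\d{3}[-\s]?\d{4}')

def extract_contacts(text):
    """Pull email addresses and phone-like numbers from raw resume text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [m.group(0) for m in PHONE_RE.finditer(text)],
    }

print(extract_contacts("Reach me at jane.doe@example.com or +91 9876543210."))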
3. Database Integration: Storing Extracted Data
Parsed data is stored in a relational database (e.g., MySQL or PostgreSQL) for efficient
querying and further operations.
• Implementation:
o A database schema is designed with tables for candidate details, skills,
experience, and education, with appropriate relationships between them.
o SQLAlchemy or Django ORM is used for seamless interaction between the
application and the database.
• Significance:
A well-structured database enables recruiters to access, search, and manage parsed
data efficiently, making the system scalable and user-friendly.
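As an illustration of this schema idea, the sketch below defines candidate and skill tables with the SQLAlchemy ORM (SQLAlchemy 1.4+ assumed); the table and column names are placeholders, and SQLite is used here only to keep the demo self-contained.

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Candidate(Base):
    __tablename__ = "candidates"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    email = Column(String(255))
    skills = relationship("Skill", back_populates="candidate")

class Skill(Base):
    __tablename__ = "skills"
    id = Column(Integer, primary_key=True)
    candidate_id = Column(Integer, ForeignKey("candidates.id"))
    name = Column(String(255))
    candidate = relationship("Candidate", back_populates="skills")

# SQLite keeps this demo self-contained; a MySQL URL would be used in deployment
engine = create_engine("sqlite:///parsed_resumes.db")
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
with Session() as session:
    jane = Candidate(name="Jane Doe", email="jane@example.com",
                     skills=[Skill(name="Python"), Skill(name="SQL")])
    session.add(jane)
    session.commit()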

Phase 2: Mitigation – Enhancing Accuracy and Operational Efficiency


In this phase, the focus shifts to refining the system for better performance and ensuring
operational readiness for real-world deployment.
1. Model Optimization: Improving Parsing Accuracy
The machine learning models are fine-tuned to handle diverse and complex resume formats
while minimizing errors.
• Techniques:
o Hyperparameter Tuning: Adjusting model parameters like learning rate and
batch size to improve NER accuracy.
o Error Analysis: Analyzing misclassified entities to identify gaps in training
data or model architecture.
o Model Retraining: Updating models with new data to improve their
generalization capabilities for unseen resumes.
• Significance:
Higher parsing accuracy ensures that the system reliably extracts meaningful
information, regardless of variations in resume structure or content.

2. System Performance Optimization: Speed and Scalability
With large-scale use in mind, optimizing the system for quick processing and scalability is
essential.
• Approach:
o Parallel Processing: Using multiprocessing techniques to process multiple
resumes simultaneously.
o Batch Processing: Handling resumes in batches to reduce computational
overhead and improve throughput.
o Caching: Implementing caching mechanisms to store frequently accessed data
for faster retrieval.
• Significance:
Optimized performance ensures the system can handle high volumes of resumes in
real-time scenarios without bottlenecks.
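A minimal sketch of parallel batch parsing with the standard library is shown below; parse_resume is a placeholder for the project's single-resume pipeline.

from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def parse_resume(path):
    """Placeholder for the single-resume pipeline: extract, clean, recognize entities."""
    return {"file": path.name, "status": "parsed"}

def parse_batch(folder, workers=4):
    """Parse every PDF resume in a folder using a pool of worker processes."""
    files = list(Path(folder).glob("*.pdf"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_resume, files))

if __name__ == "__main__":
    results = parse_batch("uploads/")
    print(f"Parsed {len(results)} resumes")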
3. Security and Authentication: Safeguarding Sensitive Information
As resumes contain personal and professional details, implementing robust security measures
is critical.
• Implementation:
o Role-based Access Control (RBAC): Restricting access based on user roles
(e.g., recruiter, admin).
o Encryption: Securing sensitive data in transit and at rest using encryption
standards like AES.
o Audit Logs: Maintaining logs of user actions to ensure traceability and
compliance with regulations like GDPR.
• Significance:
Ensuring data security builds trust among users and protects the system from potential
breaches.

Phase 3: Models for Enhanced Mitigation – Advanced Deployment and Automation


The final phase involves integrating advanced techniques to make the system production-
ready, scalable, and user-friendly.
1. Containerization and Deployment: Ensuring Consistent Environments
Using Docker, the resume parser and its dependencies are containerized for consistent
behavior across environments.
• Benefits:
o Portability: Containers can run on any system with Docker installed.

o Scalability: Additional containers can be deployed to handle increased
workloads.
• Significance:
Containerization simplifies deployment and makes the system robust against
environment-specific issues.
2. Real-time Resume Parsing: Interactive User Experience
The system is designed to provide real-time parsing results, enhancing the user experience.
• Implementation:
o Using Streamlit for an intuitive interface where users can upload resumes and
view parsed results in seconds.
o Displaying extracted data in a structured format, allowing users to provide
feedback or corrections.
• Significance:
Real-time parsing improves user engagement and enables faster decision-making.
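A stripped-down version of this interaction is sketched below (the complete application is listed in the Appendix); ResumeParser comes from the pyresparser package used in the project, and the upload directory is assumed.

# Minimal Streamlit flow: upload a PDF, parse it, show the extracted fields.
# This is a reduced sketch of the interface; the full app appears in the Appendix.
import streamlit as st
from pyresparser import ResumeParser

st.title("Resume Parser - quick demo")
pdf_file = st.file_uploader("Upload your resume", type=["pdf"])
if pdf_file is not None:
    save_path = "./Uploaded_Resumes/" + pdf_file.name
    with open(save_path, "wb") as f:
        f.write(pdf_file.getbuffer())
    with st.spinner("Parsing..."):
        data = ResumeParser(save_path).get_extracted_data()
    st.json(data)  # structured view of name, email, skills, etc.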
3. Candidate Scoring and Ranking: Intelligent Decision Support
The parsed data is used to evaluate and rank candidates based on their relevance to job
descriptions.
• Scoring Algorithm:
o Weighting factors like skills, experience, and education.
o Comparing candidates against job criteria to generate a relevance score.
• Significance:
Automated scoring streamlines the hiring process, allowing recruiters to focus on top
candidates efficiently.
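Such a weighted relevance score can be computed very simply; the weights and the candidate/job fields in the sketch below are illustrative assumptions rather than values used by the project.

# Sketch of a weighted candidate-relevance score against a job description.
# Weights and field names are illustrative assumptions.
def relevance_score(candidate: dict, job: dict, weights=(0.5, 0.3, 0.2)) -> float:
    w_skills, w_exp, w_edu = weights

    required = set(s.lower() for s in job["skills"])
    have = set(s.lower() for s in candidate["skills"])
    skill_match = len(required & have) / len(required) if required else 0.0

    exp_match = min(candidate["years_experience"] / job["min_years"], 1.0) if job["min_years"] else 1.0
    edu_match = 1.0 if candidate["degree_level"] >= job["min_degree_level"] else 0.0

    return round(100 * (w_skills * skill_match + w_exp * exp_match + w_edu * edu_match), 1)

candidate = {"skills": ["Python", "SQL", "Docker"], "years_experience": 2, "degree_level": 3}
job = {"skills": ["python", "sql", "aws"], "min_years": 3, "min_degree_level": 3}
print(relevance_score(candidate, job))  # e.g. ~73.3 for this toy input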

CHAPTER 7
PROJECT DEMONSTRATION

User Section
1. User Interface for Uploading Resume

The user interface is designed to provide a seamless experience for uploading resumes. Key
features include:

• File Upload: A clean and intuitive file upload button allows users to select resumes in
various formats (PDF, DOCX, TXT).
• Drag-and-Drop Option: Users can drag and drop files for convenience.

Fig 7.1: User Interface

2. Displaying Uploaded Resume

Once a resume is uploaded, the system displays the document on the interface to ensure users
have uploaded the correct file. Features include:

• Preview Section: A real-time preview of the uploaded document is displayed using libraries like PDF.js for PDF files or embedded text viewers for DOCX and TXT files.
• Edit and Re-upload Options: If the user realizes they have uploaded the wrong file,
they can delete and re-upload the correct one.

Fig 7.2: Displaying Uploaded Resume

3. Resume Analysis

The system performs a thorough analysis of the uploaded resume and displays key details,
including:

• Basic Information: Name, contact details, and location.


• Skills Extracted: A dynamic list of technical and soft skills mentioned in the resume.
• Education and Experience: Summarized educational background and work history
in an easy-to-read format.

Fig 7.3: Resume Analysis I

Fig 7.4: Resume Analysis II

4. Recommendations for Skills and Department Predictions

Based on the analysis, the system provides personalized recommendations:

• Skill Enhancements: Suggestions for missing or trending skills relevant to the user’s
field.
• Department Predictions: A pie chart visualization predicting suitable fields or
departments (e.g., IT, Finance, Marketing) based on the resume content. This is
powered by machine learning models trained on job role data.

Fig 7.5: Recommendations of skills

5. Experience Detection and Visualization

The system evaluates the user’s professional experience based on job titles, durations, and
descriptions:

• Experience Levels: Categorizes users as Beginner, Intermediate, or Expert.


• Pie Chart Visualization: Graphically represents the user’s experience level for
clarity and better understanding.

Fig 7.6: Experience detection and Visualization

6. Suggestions for Tips, Ideas, Courses, and Certifications

The interface provides actionable insights to improve the user’s resume and career prospects:

• Resume Writing Tips: General advice for formatting, structure, and content.
• Course Recommendations: Suggests relevant courses and certifications to fill skill
gaps or enhance the user’s profile. These suggestions are derived from platforms like
Coursera, LinkedIn Learning, or Udemy.

The system calculates a resume score by comparing the user's skills and experience with
industry requirements:

• Scoring Breakdown: Highlights individual components such as skills, education, experience, and formatting.
• Final Score: Provides a percentage-based score along with actionable steps to
improve weaker areas.

Fig 7.7: Suggestions for courses and certifications I

Fig 7.8: Suggestions for courses and certifications II

Admin Section
1. Login Page for Admin

The admin interface includes a secure login system to access backend data:

Fig 7.9: Admin Panel

2. Viewing User Data


Once logged in, the admin can access data collected from users:
• Data Dashboard: Displays parsed user information in a tabular format, categorized
by name, skills, experience, and score.

Fig 7.10: User Data View

CHAPTER 8
CONCLUSION
The resume parser project represents a significant achievement in leveraging machine
learning (ML), Natural Language Processing (NLP), and modern software engineering to
automate one of the most tedious and critical aspects of recruitment. By efficiently extracting,
organizing, and analyzing candidate information from diverse resumes, this system
demonstrates the potential of integrating advanced NLP with operational deployment to
create a user-friendly, scalable, and impactful application.
Below is a detailed overview of the project’s accomplishments, challenges, contributions to
HR technology, and future potential:

1. Project Accomplishments
The resume parser project successfully delivers a robust solution to transform unstructured
resumes into structured, actionable data. Key milestones include:
• Advanced Information Extraction:
o Employing NLP techniques like Named Entity Recognition (NER) enabled the
accurate identification of candidate details, including names, contact
information, educational qualifications, skills, and work experience.
o Customization and tuning of SpaCy models ensured high accuracy and
adaptability across various resume formats.
• Scalable and Portable Architecture:
o The system’s Dockerized deployment ensures consistent performance across
development and production environments, simplifying scalability as data
loads increase.
• Efficient Data Management:
o Integration with MySQL allows for organized, relational storage of parsed
data, enabling seamless access, retrieval, and filtering by HR professionals.
o This database-driven approach significantly enhances data accessibility and
query efficiency.
• User-Friendly Interface:
o Streamlit provides a clean and intuitive user interface, empowering non-
technical users, like recruiters, to interact with the system effectively without a
steep learning curve.

2. Key Insights and Challenges Addressed
Building a scalable and efficient resume parser required addressing several complexities
inherent to the variability in resume formats, terminologies, and technical integration:
• Handling Data Variability:
o Resumes often lack standardization in structure and style, posing a challenge
for text extraction. Implementing robust preprocessing (tokenization,
stemming, and stop-word removal) and customized NER models addressed
these inconsistencies.
• Ensuring Component Interoperability:
o Combining SpaCy for NLP, Docker for deployment, MySQL for data
management, and Streamlit for the user interface required precise integration
planning to ensure smooth communication between components.
o Dockerization played a pivotal role in achieving a standardized, conflict-free
deployment pipeline.
• Optimizing System Scalability:
o To ensure the system could process a growing number of resumes, model
optimization, database query refinement, and backend performance tuning
were implemented.
o These measures enhanced system reliability and reduced latency during
concurrent user operations.

3. Contributions to HR Technology
This project not only automates the resume parsing process but also contributes significantly
to the HR technology ecosystem:
• Streamlining Recruitment Processes:
o Automation reduces the manual effort in data extraction, enabling HR teams to
focus more on candidate engagement and decision-making.
• Data-Driven Insights:
o By structuring resume data, the system allows for trend analysis, identification
of in-demand skills, and informed decision-making based on objective
metrics.
• Open-Source Framework Potential:
o The modular design of the system can serve as a foundational framework for
developers and HR tech companies to build upon, fostering innovation in
recruitment analytics and automation.

4. Future Improvements and Expansion
The project lays the groundwork for future enhancements that could make the resume parser
even more powerful and versatile:
• Integration of Advanced NLP Models:
o Leveraging transformer-based models like BERT or GPT could significantly
improve context-aware extraction, especially for complex fields like technical
skills or specialized job roles.
• Automated Scoring and Ranking:
o Introducing a machine learning-based scoring mechanism to rank candidates
based on job descriptions would add a layer of decision support, further
streamlining the hiring process.
• Multi-Language Support:
o Adding multilingual parsing capabilities would make the system applicable to
global hiring needs, catering to diverse applicant pools.
• Enhanced Security Measures:
o Strengthening security protocols, such as advanced encryption, secure API
communication, and stricter role-based access control, would ensure data
compliance and user trust.

5. Final Reflections
In conclusion, this resume parser project underscores the transformative potential of
combining ML, NLP, containerization, and software engineering to modernize recruitment
processes. By addressing critical challenges in data handling, deployment, and scalability,
this system sets a strong foundation for intelligent automation in HR.
The adaptability of the system ensures its relevance in evolving recruitment landscapes, while
its user-centered design promotes widespread usability. Future iterations of this project will
integrate advanced technologies, further enhancing its capabilities to meet the dynamic needs
of modern HR teams.
Through this effort, we demonstrate how intelligent tools can convert labor-intensive tasks
into efficient, data-driven workflows, paving the way for faster, fairer, and more effective
hiring decisions worldwide.

CHAPTER 9
REFERENCES

1. S. Ren, W. Lu, and T. Zhao, "Resume Parsing Using Deep Learning Techniques for Talent Acquisition," IEEE Access, vol. 10, pp. 34897–34907, 2022, doi: 10.1109/ACCESS.2022.3145699.

2. Y. Zhang, X. Wang, and J. Liu, "Data-Driven Resume Classification and Matching for Recruitment Systems," in Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 1432–1441, doi: 10.1109/IJCNN.2022.9890361.

3. R. Kumar and A. Jain, "NLP-Based Resume Shortlisting for E-Recruitment: A Practical Implementation," in 2021 International Conference on Machine Learning and Computing (ICMLC), pp. 62–67, doi: 10.1145/3386392.3400213.

4. L. Abhishek, S. Mandal, and A. Choudhury, "Resume Parsing Framework for E-recruitment," in 2021 IEEE International Conference on E-Business Engineering (ICEBE), Shanghai, China, 2021, pp. 96–102, doi: 10.1109/ICEBE.2021.9721762.

5. H. Lee and J. Cho, "Leveraging NLP for Skill Extraction in Job and Resume Matching," Expert Systems with Applications, vol. 197, 2023, doi: 10.1016/j.eswa.2022.116610.

6. D. Patel, A. Shah, and S. Sharma, "Resume Shortlisting Using NLP," in 2023 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, pp. 1854–1861, doi: 10.1109/BigData52589.2023.00045.

7. F. Tang, K. Xiang, and Z. Xie, "Automating Recruitment Using Semantic Search in NLP," Journal of Information Systems and Technology Management, vol. 20, no. 1, pp. 23–35, 2022, doi: 10.1590/1245-7862.2023.

8. V. Gupta, N. Rastogi, and A. Arora, "AI-Driven Analysis of Resumes for Automated Hiring," Springer Advances in Artificial Intelligence and Applications, vol. 21, pp. 321–338, 2023, doi: 10.1007/978-3-031-08658-4.

9. M. Rodriguez, A. Kumar, and T. Rao, "Improving Recruitment Efficiency with Knowledge-Based Resume Parsing," in 2022 IEEE Global Humanitarian Technology Conference (GHTC), Seattle, WA, USA, pp. 203–210, doi: 10.1109/GHTC.2022.9898756.

10. K. Poonam and T. M. Sharma, "Semantic Parsing of Resumes for Effective E-Recruitment," in Proceedings of the International Conference on Artificial Intelligence and Data Engineering (ICAIDE), 2022, pp. 112–118, doi: 10.1007/978-3-030-94528-7.
APPENDIX
Codes and Standards:
import streamlit as st
import nltk
import spacy
nltk.download('stopwords')
spacy.load('en_core_web_sm')

import pandas as pd
import base64, random
import time, datetime
from pyresparser import ResumeParser
from pdfminer3.layout import LAParams, LTTextBox
from pdfminer3.pdfpage import PDFPage
from pdfminer3.pdfinterp import PDFResourceManager
from pdfminer3.pdfinterp import PDFPageInterpreter
from pdfminer3.converter import TextConverter
import io, random
from streamlit_tags import st_tags
from PIL import Image
import pymysql
from Courses import ds_course, web_course, android_course, ios_course, uiux_course
import pafy
import plotly.express as px

def fetch_yt_video(link):
    video = pafy.new(link)
    return video.title


def get_table_download_link(df, filename, text):
    """Generate an HTML link that downloads the given pandas dataframe as a CSV file.
    in: dataframe
    out: href string
    """
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()  # strings <-> bytes conversions necessary here
    href = f'<a href="data:file/csv;base64,{b64}" download="{filename}">{text}</a>'
    return href

def pdf_reader(file):
    """Extract the full text of a PDF file using pdfminer."""
    resource_manager = PDFResourceManager()
    fake_file_handle = io.StringIO()
    converter = TextConverter(resource_manager, fake_file_handle, laparams=LAParams())
    page_interpreter = PDFPageInterpreter(resource_manager, converter)
    with open(file, 'rb') as fh:
        for page in PDFPage.get_pages(fh, caching=True, check_extractable=True):
            page_interpreter.process_page(page)
        text = fake_file_handle.getvalue()

    # close open handles
    converter.close()
    fake_file_handle.close()
    return text


def show_pdf(file_path):
    """Embed the uploaded PDF in the Streamlit page so the user can preview it."""
    with open(file_path, "rb") as f:
        base64_pdf = base64.b64encode(f.read()).decode('utf-8')
    pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}" width="700" height="1000" type="application/pdf"></iframe>'
    st.markdown(pdf_display, unsafe_allow_html=True)

def course_recommender(course_list):
    """Show a user-selectable number of course recommendations from the given list."""
    st.subheader("**Courses & Certificates Recommendations**")
    c = 0
    rec_course = []
    no_of_reco = st.slider('Choose Number of Course Recommendations:', 1, 10, 4)
    random.shuffle(course_list)
    for c_name, c_link in course_list:
        c += 1
        st.markdown(f"({c}) [{c_name}]({c_link})")
        rec_course.append(c_name)
        if c == no_of_reco:
            break
    return rec_course

# Database connection (credentials shown are the local development placeholders)
connection = pymysql.connect(host='localhost', user='root', password='abcd')
cursor = connection.cursor()


def insert_data(name, email, res_score, timestamp, no_of_pages, reco_field, cand_level,
                skills, recommended_skills, courses):
    """Insert one parsed-resume record into the user_data table."""
    DB_table_name = 'user_data'
    insert_sql = "insert into " + DB_table_name + """
                 values (0,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"""
    rec_values = (name, email, str(res_score), timestamp, str(no_of_pages), reco_field,
                  cand_level, skills, recommended_skills, courses)
    cursor.execute(insert_sql, rec_values)
    connection.commit()


st.set_page_config(
    page_title="ProfileIQ",
    page_icon='./Logo/header.jpg',
)

def run():
    st.title("ProfileIQ")
    st.sidebar.markdown("# Select User")
    activities = ["User", "Admin"]
    choice = st.sidebar.selectbox("Choose among the given options:", activities)
    img = Image.open('./Logo/header.jpg')
    img = img.resize((380, 250))
    st.image(img)

    # Create the database if it does not exist yet
    db_sql = """CREATE DATABASE IF NOT EXISTS SRA;"""
    cursor.execute(db_sql)
    connection.select_db("sra")

    # Create the table that stores one row per analysed resume
    DB_table_name = 'user_data'
    table_sql = "CREATE TABLE IF NOT EXISTS " + DB_table_name + """
                (ID INT NOT NULL AUTO_INCREMENT,
                 Name varchar(100) NOT NULL,
                 Email_ID VARCHAR(50) NOT NULL,
                 resume_score VARCHAR(8) NOT NULL,
                 Timestamp VARCHAR(50) NOT NULL,
                 Page_no VARCHAR(5) NOT NULL,
                 Predicted_Field VARCHAR(25) NOT NULL,
                 User_level VARCHAR(30) NOT NULL,
                 Actual_skills VARCHAR(1000) NOT NULL,
                 Recommended_skills VARCHAR(300) NOT NULL,
                 Recommended_courses VARCHAR(600) NOT NULL,
                 PRIMARY KEY (ID));
                """
    cursor.execute(table_sql)

    if choice == 'User':
        # st.markdown('''<h4 style='text-align: left; color: #d73b5c;'>* Upload your resume, and get smart recommendations based on it.</h4>''',
        #             unsafe_allow_html=True)
        pdf_file = st.file_uploader("Choose your Resume", type=["pdf"])
        if pdf_file is not None:
            # Save the uploaded file locally so both the previewer and the parser can read it
            save_image_path = './Uploaded_Resumes/' + pdf_file.name
            with open(save_image_path, "wb") as f:
                f.write(pdf_file.getbuffer())
            show_pdf(save_image_path)
            resume_data = ResumeParser(save_image_path).get_extracted_data()
            if resume_data:
                ## Get the whole resume text for the tips section
                resume_text = pdf_reader(save_image_path)

                st.header("**Resume Analysis**")
                st.success("Hello " + resume_data['name'])
                st.subheader("**Your Basic info**")
                try:
                    st.text('Name: ' + resume_data['name'])
                    st.text('Email: ' + resume_data['email'])
                    st.text('Contact: ' + resume_data['mobile_number'])
                    st.text('Resume pages: ' + str(resume_data['no_of_pages']))
                except:
                    pass

                # Rough experience-level heuristic based on the number of pages
                cand_level = ''
                if resume_data['no_of_pages'] == 1:
                    cand_level = "Fresher"
                    st.markdown('''<h4 style='text-align: left; color: #d73b5c;'>You are looking Fresher.</h4>''',
                                unsafe_allow_html=True)
                elif resume_data['no_of_pages'] == 2:
                    cand_level = "Intermediate"
                    st.markdown('''<h4 style='text-align: left; color: #1ed760;'>You are at intermediate level!</h4>''',
                                unsafe_allow_html=True)
                elif resume_data['no_of_pages'] >= 3:
                    cand_level = "Experienced"
                    st.markdown('''<h4 style='text-align: left; color: #fba171;'>You are at experience level!</h4>''',
                                unsafe_allow_html=True)

                st.subheader("**Skills Recommendation**")
                ## Show the skills found in the resume
                keywords = st_tags(label='### Skills that you have',
                                   text='See our skills recommendation',
                                   value=resume_data['skills'], key='1')

                ## Field prediction and course recommendation
                ds_keyword = ['tensorflow', 'keras', 'pytorch', 'machine learning', 'deep learning', 'flask',
                              'streamlit']
                web_keyword = ['react', 'django', 'node js', 'react js', 'php', 'laravel', 'magento', 'wordpress',
                               'javascript', 'angular js', 'c#', 'flask']
                android_keyword = ['android', 'android development', 'flutter', 'kotlin', 'xml', 'kivy']
                ios_keyword = ['ios', 'ios development', 'swift', 'cocoa', 'cocoa touch', 'xcode']
                uiux_keyword = ['ux', 'adobe xd', 'figma', 'zeplin', 'balsamiq', 'ui', 'prototyping', 'wireframes',
                                'storyframes', 'adobe photoshop', 'photoshop', 'editing', 'adobe illustrator',
                                'illustrator', 'adobe after effects', 'after effects', 'adobe premier pro',
                                'premier pro', 'adobe indesign', 'indesign', 'wireframe', 'solid', 'grasp',
                                'user research', 'user experience']

                recommended_skills = []
                reco_field = ''
                rec_course = ''
                ## Predict the field from the first matching skill and recommend courses for it
                for i in resume_data['skills']:
                    ## Data Science recommendation
                    if i.lower() in ds_keyword:
                        reco_field = 'Data Science'
                        st.success("**Our analysis says you are looking for Data Science Jobs.**")
                        recommended_skills = ['Data Visualization', 'Predictive Analysis', 'Statistical Modeling',
                                              'Data Mining', 'Clustering & Classification', 'Data Analytics',
                                              'Quantitative Analysis', 'Web Scraping', 'ML Algorithms', 'Keras',
                                              'Pytorch', 'Probability', 'Scikit-learn', 'Tensorflow', 'Flask',
                                              'Streamlit']
                        recommended_keywords = st_tags(label='### Recommended skills for you.',
                                                       text='Recommended skills generated from System',
                                                       value=recommended_skills, key='2')
                        st.markdown('''<h4 style='text-align: left; color: #1ed760;'>Adding these skills to your resume will boost the chances of getting a job</h4>''',
                                    unsafe_allow_html=True)
                        rec_course = course_recommender(ds_course)
                        break

                    ## Web Development recommendation
                    elif i.lower() in web_keyword:
                        reco_field = 'Web Development'
                        st.success("**Our analysis says you are looking for Web Development Jobs**")
                        recommended_skills = ['React', 'Django', 'Node JS', 'React JS', 'php', 'laravel', 'Magento',
                                              'wordpress', 'Javascript', 'Angular JS', 'c#', 'Flask', 'SDK']
                        recommended_keywords = st_tags(label='### Recommended skills for you.',
                                                       text='Recommended skills generated from System',
                                                       value=recommended_skills, key='3')
                        st.markdown('''<h4 style='text-align: left; color: #1ed760;'>Adding these skills to your resume will boost the chances of getting a job</h4>''',
                                    unsafe_allow_html=True)
                        rec_course = course_recommender(web_course)
                        break

                    ## Android App Development recommendation
                    elif i.lower() in android_keyword:
                        reco_field = 'Android Development'
                        st.success("**Our analysis says you are looking for Android App Development Jobs**")
                        recommended_skills = ['Android', 'Android development', 'Flutter', 'Kotlin', 'XML', 'Java',
                                              'Kivy', 'GIT', 'SDK', 'SQLite']
                        recommended_keywords = st_tags(label='### Recommended skills for you.',
                                                       text='Recommended skills generated from System',
                                                       value=recommended_skills, key='4')
                        st.markdown('''<h4 style='text-align: left; color: #1ed760;'>Adding these skills to your resume will boost the chances of getting a job</h4>''',
                                    unsafe_allow_html=True)
                        rec_course = course_recommender(android_course)
                        break

                    ## iOS App Development recommendation
                    elif i.lower() in ios_keyword:
                        reco_field = 'IOS Development'
                        st.success("**Our analysis says you are looking for IOS App Development Jobs**")
                        recommended_skills = ['IOS', 'IOS Development', 'Swift', 'Cocoa', 'Cocoa Touch', 'Xcode',
                                              'Objective-C', 'SQLite', 'Plist', 'StoreKit', 'UI-Kit', 'AV Foundation',
                                              'Auto-Layout']
                        recommended_keywords = st_tags(label='### Recommended skills for you.',
                                                       text='Recommended skills generated from System',
                                                       value=recommended_skills, key='5')
                        st.markdown('''<h4 style='text-align: left; color: #1ed760;'>Adding these skills to your resume will boost the chances of getting a job</h4>''',
                                    unsafe_allow_html=True)
                        rec_course = course_recommender(ios_course)
                        break

                    ## UI-UX recommendation
                    elif i.lower() in uiux_keyword:
                        reco_field = 'UI-UX Development'
                        st.success("**Our analysis says you are looking for UI-UX Development Jobs**")
                        recommended_skills = ['UI', 'User Experience', 'Adobe XD', 'Figma', 'Zeplin', 'Balsamiq',
                                              'Prototyping', 'Wireframes', 'Storyframes', 'Adobe Photoshop', 'Editing',
                                              'Illustrator', 'After Effects', 'Premier Pro', 'Indesign', 'Wireframe',
                                              'Solid', 'Grasp', 'User Research']
                        recommended_keywords = st_tags(label='### Recommended skills for you.',
                                                       text='Recommended skills generated from System',
                                                       value=recommended_skills, key='6')
                        st.markdown('''<h4 style='text-align: left; color: #1ed760;'>Adding these skills to your resume will boost the chances of getting a job</h4>''',
                                    unsafe_allow_html=True)
                        rec_course = course_recommender(uiux_course)
                        break

                ## Timestamp for the database record
                ts = time.time()
                cur_date = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d')
                cur_time = datetime.datetime.fromtimestamp(ts).strftime('%H:%M:%S')
                timestamp = str(cur_date + '_' + cur_time)

                ### Resume writing recommendations: +20 points for each recommended section found
                st.subheader("**Resume Tips & Ideas**")
                resume_score = 0
                if 'Objective' in resume_text:
                    resume_score = resume_score + 20
                    st.markdown('''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added Objective</h4>''',
                                unsafe_allow_html=True)
                else:
                    st.markdown('''<h4 style='text-align: left; color: #fabc10;'>[-] Please add your career objective; it will convey your career intention to the recruiters.</h4>''',
                                unsafe_allow_html=True)

                if 'Declaration' in resume_text:
                    resume_score = resume_score + 20
                    st.markdown('''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added Declaration✍</h4>''',
                                unsafe_allow_html=True)
                else:
                    st.markdown('''<h4 style='text-align: left; color: #fabc10;'>[-] Please add a Declaration✍. It gives the assurance that everything written on your resume is true and fully acknowledged by you.</h4>''',
                                unsafe_allow_html=True)

                # The original check `if 'Hobbies' or 'Interests' in resume_text:` was always True;
                # both substrings are now tested explicitly.
                if 'Hobbies' in resume_text or 'Interests' in resume_text:
                    resume_score = resume_score + 20
                    st.markdown('''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added your Hobbies</h4>''',
                                unsafe_allow_html=True)
                else:
                    st.markdown('''<h4 style='text-align: left; color: #fabc10;'>[-] Please add Hobbies. They show your personality to the recruiters and help them judge whether you fit the role.</h4>''',
                                unsafe_allow_html=True)

                if 'Achievements' in resume_text:
                    resume_score = resume_score + 20
                    st.markdown('''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added your Achievements</h4>''',
                                unsafe_allow_html=True)
                else:
                    st.markdown('''<h4 style='text-align: left; color: #fabc10;'>[-] Please add Achievements. They show that you are capable of the required position.</h4>''',
                                unsafe_allow_html=True)

                if 'Projects' in resume_text:
                    resume_score = resume_score + 20
                    st.markdown('''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added your Projects</h4>''',
                                unsafe_allow_html=True)
                else:
                    st.markdown('''<h4 style='text-align: left; color: #fabc10;'>[-] Please add Projects. They show whether you have done work related to the required position.</h4>''',
                                unsafe_allow_html=True)

st.subheader("**Resume Score **")


st.markdown(
"""
<style>
.stProgress > div > div > div > div {
background-color: #d73b5c;
}
</style>""",
unsafe_allow_html=True,
)
my_bar = st.progress(0)
score = 0
for percent_complete in range(resume_score):
score += 1
time.sleep(0.1)
my_bar.progress(percent_complete + 1)
st.success('** Your Resume Score: ' + str(score) + '**')
st.warning(
"** Note: This score is calculated based on the content added in the Resume.
**")
st.balloons()

insert_data(resume_data['name'], resume_data['email'], str(resume_score),


timestamp,
str(resume_data['no_of_pages']), reco_field, cand_level,
str(resume_data['skills']),
str(recommended_skills), str(rec_course))

63
## Resume writing video

# st.header("**Bonus Video for Resume Writing Tips **")


# resume_vid = random.choice(resume_videos)
# res_vid_title = fetch_yt_video(resume_vid)

# st.subheader(" **" + res_vid_title + "**")


# st.video(resume_vid)
#
# ## Interview Preparation Video

# st.header("**Bonus Video for Interview Tips **")


# interview_vid = random.choice(interview_videos)
# int_vid_title = fetch_yt_video(interview_vid)

# st.subheader(" **" + int_vid_title + "**")


# st.video(interview_vid)

connection.commit()
else:
st.error('Something went wrong..')
    else:
        ## Admin Side
        st.success('Welcome to Admin Side')
        ad_user = st.text_input("Username")
        ad_password = st.text_input("Password", type='password')
        if st.button('Login'):
            if ad_user == 'abcd' and ad_password == 'abcd':
                st.success("Welcome")
                # Display all parsed user records
                cursor.execute('''SELECT * FROM user_data''')
                data = cursor.fetchall()
                st.header("**User's Data**")
                df = pd.DataFrame(data, columns=['ID', 'Name', 'Email', 'Resume Score', 'Timestamp', 'Total Page',
                                                 'Predicted Field', 'User Level', 'Actual Skills',
                                                 'Recommended Skills', 'Recommended Course'])
                st.dataframe(df)
                st.markdown(get_table_download_link(df, 'User_Data.csv', 'Download Report'),
                            unsafe_allow_html=True)

                ## Aggregate data for the admin charts
                query = 'select * from user_data;'
                plot_data = pd.read_sql(query, connection)

                ## Pie chart for predicted field recommendations
                labels = plot_data['Predicted_Field'].value_counts().index
                values = plot_data['Predicted_Field'].value_counts().values
                st.subheader("Pie-Chart for Predicted Field Recommendations")
                fig = px.pie(plot_data, values=values, names=labels, title='Predicted Field according to the Skills')
                st.plotly_chart(fig)

                ## Pie chart for the users' experience levels
                labels = plot_data['User_level'].value_counts().index
                values = plot_data['User_level'].value_counts().values
                st.subheader("Pie-Chart for User's Experienced Level")
                fig = px.pie(plot_data, values=values, names=labels, title="Pie-Chart for Users Experienced Level")
                st.plotly_chart(fig)
            else:
                st.error("Wrong ID & Password Provided")


run()
