Project 1 Final Report
BITE497J – Project I
Bachelor of Technology
November, 2024
ACKNOWLEDGEMENT
It is our pleasure to express our deep sense of gratitude to our BITE497J – Project I guide, Dr. Pradeepa M, School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, for her constant guidance and continual encouragement throughout our endeavor. Our association with her is not confined to academics alone; it has been a great opportunity for us to work with an intellectual and an expert in the field of Artificial Intelligence and Image Processing.
We would like to express our heartfelt gratitude to Honorable Chancellor Dr. G Viswanathan; respected Vice Presidents Mr. Sankar Viswanathan and Dr. Sekar Viswanathan; Vice Chancellor Dr. V. S. Kanchana Bhaaskaran; Pro-Vice Chancellor Dr. Partha Sharathi Mallick; and Registrar Dr. Jayabarathi T.
Our whole-hearted thanks to the Dean, Dr. Sumathy S, School of Computer Science Engineering and Information Systems; the Head of the Department of Information Technology, Dr. Prabhavathy P; the Information Technology Project Coordinators, Dr. Sweta Bhattacharya and Dr. Praveen Kumar Reddy; the SCORE School Project Coordinator, Dr. Srinivas Koppu; and all faculty, staff, and members of our university for their continuous guidance throughout our course of study in countless ways.
It is indeed a pleasure to thank our parents and friends who persuaded and encouraged us to take up and complete our project “MLOps – Resume Parser Model” successfully. Last, but not least, we express our gratitude and appreciation to all those who have helped us, directly or indirectly, towards the successful completion of this project.
Executive Summary
The MLOps - Resume Parser Model is an innovative tool designed to automate the process
of resume parsing and candidate evaluation, specifically aimed at organizations seeking to
optimize their recruitment workflow. This tool is intended for HR professionals, recruitment
agencies, and organizations that need to process and evaluate large numbers of resumes
efficiently and accurately.
The system begins by processing resumes, typically in formats like PDFs or DOCX, using
advanced Natural Language Processing (NLP) techniques. These techniques help clean and
standardize the text, making it ready for further processing. The next step involves Named
Entity Recognition (NER), which uses transformer-based models like BERT to identify key
entities such as candidate skills, work experience, education, and job titles. This structured
extraction ensures that relevant candidate information is pulled out in a usable format for easy
analysis.
Additionally, the model employs machine learning algorithms to match candidate profiles
with job descriptions, improving the alignment between applicants and the positions they are
applying for. By analyzing features such as skills, qualifications, and past experiences, the
model provides a ranked list of candidates based on their fit for the job, which helps
streamline the selection process.
The project is built around MLOps principles, ensuring seamless integration into existing HR
systems. By utilizing continuous integration, deployment, and model monitoring, the tool can
adapt to changes in job descriptions, candidate data, and industry requirements without
disrupting workflows. This makes the model scalable and efficient in handling dynamic
recruitment needs.
The user interface is designed to be intuitive, allowing recruiters to easily upload resumes and
receive structured, analyzed data in real-time. This reduces the time and effort required to
manually sift through resumes, enabling HR teams to focus on higher-value tasks like
interviews and candidate engagement.
CONTENTS
Page No.
Acknowledgement 4
Executive Summary 5
Table of Contents 6
List of Figures 7
List of Tables 8
1. INTRODUCTION 9
1.1 Objective 9
1.2 Motivation 10
1.3 Background 10
2. PROJECT DESCRIPTION AND GOALS 12
2.1 Project Description 12
2.2 Goals 14
3. LITERATURE REVIEW 16
4. TECHNICAL SPECIFICATIONS 19
4.1 System Architecture 19
4.2 Technology Stack 22
5. DESIGN APPROACH AND DETAILS 24
5.1 Design Approach 24
5.2 Constraints 35
6. PHASES OF PROJECT 36
6.1 Project Phases 36
7. PROJECT DEMONSTRATION 40
8. CONCLUSIONS 46
9. REFERENCES 49
10. APPENDIX 51
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
INTRODUCTION
In the contemporary technological landscape, machine learning (ML) has emerged as a
cornerstone of innovation across industries. From automating mundane tasks to solving
complex problems, ML models are transforming the way businesses operate. However,
developing an ML model is only the first step in its lifecycle. Deploying these models into
real-world environments, ensuring their scalability, maintainability, and efficiency, has
introduced new challenges, giving rise to the concept of ML-Ops.
ML-Ops, an amalgamation of machine learning and DevOps principles, focuses on bridging
the gap between model development and deployment. It encompasses a set of practices and
tools designed to streamline the deployment, monitoring, and iterative improvement of
machine learning models. For organizations seeking to harness the power of AI, ML-Ops
provides the foundational framework for ensuring that models remain functional, reliable,
and relevant over time.
This project explores the application of ML-Ops to develop and deploy a resume parsing
system, leveraging machine learning and Natural Language Processing (NLP). The system is
designed to automate the extraction of structured information from resumes, such as contact
details, educational qualifications, work experience, and skills. By integrating ML-Ops
principles, this project ensures that the resume parser is scalable, maintainable, and capable of
handling diverse operational environments.
1.1 Objective
The primary objective of this project is to design and implement an end-to-end resume
parsing solution that incorporates ML-Ops principles for robust deployment and seamless
operation. Specifically, the project aims to:
• Develop a resume parser capable of handling resumes in various formats (PDF,
DOCX, TXT) and extracting meaningful information using NLP techniques.
• Ensure that the parser is easily deployable in diverse environments using Docker for
containerization.
• Provide a scalable solution to handle increasing volumes of resumes, leveraging ML-
Ops for continuous integration and delivery.
• Implement a relational database (MySQL) to store and manage parsed data, enabling
efficient querying and reporting.
• Design a user-friendly interface using Streamlit, allowing HR professionals to
interact with the system without requiring technical expertise.
• Address challenges in deployment, data management, and user interaction by
integrating ML-Ops best practices.
The project demonstrates how ML-Ops can address real-world challenges, ensuring the
reliability, scalability, and usability of machine learning models in production.
1.2 Motivation
The recruitment process is a critical yet time-intensive operation for organizations. As the
volume of applications increases, traditional manual methods of reviewing resumes become
impractical. This challenge has motivated the development of automated solutions that can
streamline the hiring process.
Resume parsing, as a solution, is increasingly being adopted by HR departments to address
the following challenges:
1. Time-Consuming Manual Processes: Manually reviewing resumes is labor-intensive
and prone to errors. Automating this process saves significant time and resources.
2. Diverse Resume Formats: Candidates submit resumes in various formats, making it
difficult to standardize and extract information manually. An NLP-based parser can
effectively handle this diversity.
3. Scalability Issues: As organizations grow, the volume of applications increases. A
scalable solution that integrates ML-Ops can ensure seamless performance even under
heavy workloads.
4. Efficiency in Candidate Selection: Automated resume parsing provides structured
data that HR teams can quickly analyze, ensuring more informed and efficient
decision-making.
This project is motivated by the potential to create an efficient, scalable, and reliable system
for automating the recruitment process, leveraging cutting-edge ML-Ops practices.
1.3 Background
The development and deployment of machine learning models have traditionally been viewed
as separate tasks. However, real-world applications demand an integrated approach that
considers the entire lifecycle of an ML model, from development to deployment and
maintenance. ML-Ops addresses this need by providing a framework for managing models in
production.
1.3.1 The Role of ML-Ops
ML-Ops incorporates practices from software engineering and DevOps, such as continuous
integration, continuous delivery, version control, and monitoring, into the domain of machine
learning. By adopting ML-Ops, this project ensures:
• Consistency Across Environments: Using containerization tools like Docker, the
system can be deployed on local machines, servers, or cloud environments with
minimal effort.
• Scalability: The system can handle increasing workloads, such as parsing thousands
of resumes, by deploying multiple containers or scaling database infrastructure.
• Maintainability: By monitoring model performance and ensuring seamless updates,
ML-Ops guarantees that the parser remains reliable over time.
1.3.2 NLP for Resume Parsing
Natural Language Processing (NLP) is the cornerstone of this project, enabling the system to
interpret and extract structured information from unstructured text. NLP techniques such as
tokenization, named entity recognition (NER), and text classification allow the parser to:
• Identify key information like names, email addresses, and skills.
• Understand context to distinguish between similar entities (e.g., "Python" as a skill vs.
"Python" as a keyword in a job description).
• Handle diverse linguistic styles and formatting in resumes.
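To make these techniques concrete, the following is a minimal sketch of entity extraction that combines a pretrained NER model with simple regular-expression rules. It assumes spaCy's en_core_web_sm model is installed; the function name, patterns, and sample sentence are illustrative rather than the exact ones used in the deployed system, which also relies on NLTK-based NER and custom patterns described in later chapters.

import re
import spacy  # assumes: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")

def extract_entities(resume_text):
    """Illustrative extraction of names, organizations, and contact details."""
    doc = nlp(resume_text)
    return {
        "names": [ent.text for ent in doc.ents if ent.label_ == "PERSON"],
        "organizations": [ent.text for ent in doc.ents if ent.label_ == "ORG"],
        "emails": EMAIL_RE.findall(resume_text),
        "phones": PHONE_RE.findall(resume_text),
    }

print(extract_entities("Jane Doe worked at Acme Corp. Reach her at jane@example.com or +91 98765 43210."))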
1.3.3 Tools and Technologies
The following tools are employed to build the resume parser:
1. Python: A versatile programming language widely used for machine learning and
NLP tasks.
2. Streamlit: A framework for developing interactive user interfaces, making the system
accessible to HR professionals.
3. MySQL: A relational database for storing and managing parsed resume data, enabling
efficient querying and reporting.
4. Docker: A containerization tool that ensures consistent deployment across
environments, from local development to cloud servers.
1.3.4 Challenges Addressed by ML-Ops
The integration of ML-Ops addresses several challenges:
• Deployment Complexity: By containerizing the entire system, Docker simplifies
deployment and eliminates environment-specific issues.
• Data Management: MySQL ensures that the data extracted from resumes is stored
securely and can be queried efficiently.
• User Interaction: Streamlit provides a user-friendly interface, allowing HR
professionals to interact with the parser seamlessly.
CHAPTER 2
PROJECT DESCRIPTION AND GOALS
The growing demand for automation in recruitment processes has highlighted the importance
of efficient and scalable resume parsing systems. This project focuses on the development
and deployment of a robust resume parsing system leveraging machine learning (ML) and
Natural Language Processing (NLP). By integrating ML-Ops principles, the project ensures
that the system is not only functional but also scalable, maintainable, and adaptable to real-
world requirements.
The primary goal of the project is to streamline the resume parsing process for organizations,
enabling HR teams to process large volumes of resumes effectively while maintaining data
accuracy and system reliability.
2.1 Project Description
The project involves designing and implementing a comprehensive system that automates the
extraction, organization, and management of data from resumes. The resume parser employs
machine learning and NLP techniques to interpret unstructured data and extract meaningful
information. The system is built with a modular architecture, ensuring seamless integration of
components like the parser, database, and user interface.
2.1.1 Key Components of the System
1. Resume Parsing with NLP
The core functionality of the system is built on Natural Language Processing. NLP
techniques such as tokenization, named entity recognition (NER), and text
classification enable the parser to:
o Extract structured data such as names, contact details, education, work
experience, and skills from unstructured resume text.
o Handle various resume formats, including PDF, DOCX, and TXT, ensuring
versatility.
o Manage diverse linguistic styles and terminologies commonly found in
resumes.
2. User-Friendly Interface with Streamlit
A simple yet interactive web interface is developed using Streamlit, allowing HR
professionals to upload resumes, view parsed data, and perform filtering or searching
tasks. Key features include:
o Resume Upload: Users can upload resumes in different formats.
o Parsed Data Display: The extracted information is presented in a structured
format, such as tables or lists, for easy review.
o Filtering and Searching: HR professionals can filter resumes based on
specific skills, experience levels, or other criteria.
3. Data Storage and Management with MySQL
The system uses MySQL to store parsed resume data securely and efficiently. The
database is designed with scalability in mind, ensuring it can handle increasing
volumes of data.
o Structured Storage: Parsed data is stored in relational tables, making it easy
to query and analyze.
o Data Integrity: Mechanisms are implemented to ensure the accuracy and
consistency of stored data, even when dealing with incomplete or unusual
resume formats.
4. Containerization with Docker
To ensure consistent deployment across environments, the entire system is
containerized using Docker. This includes the NLP model, user interface, and
database components.
o Portability: Docker containers enable the system to run uniformly on local
machines, servers, or cloud platforms.
o Scalability: Containers can be replicated to handle larger workloads during
peak recruitment periods.
5. Integration of ML-Ops Practices
ML-Ops principles are applied throughout the project to automate deployment,
improve scalability, and ensure continuous monitoring and updates.
o Continuous Integration and Deployment: Automated pipelines streamline
updates to the NLP model or database configurations.
o Monitoring and Maintenance: Tools are implemented to track system
performance and detect issues proactively.
2.1.2 Workflow of the System
The project workflow comprises several stages to ensure smooth operation:
1. Input and Preprocessing: Resumes uploaded by users are converted into formats
suitable for NLP processing. For instance, PDF resumes are converted into text using
OCR (Optical Character Recognition) if needed.
2. Data Extraction and Parsing: The NLP model processes the text to extract key
details, which are then structured for database storage.
3. Database Management: Parsed data is stored in MySQL, ensuring it is secure,
organized, and easily accessible.
4. Output Generation: The parsed data is displayed on the Streamlit interface, where
users can filter, search, or export the information.
5. Deployment and Scaling: Docker ensures the system runs consistently across
different environments, with scalability mechanisms to handle high workloads.
The modular design of the system allows for easy integration of additional features, such as
support for more resume formats or advanced analytics tools.
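As a concrete illustration of the first workflow stage above (input and preprocessing), the sketch below extracts raw text from a PDF and falls back to OCR when no text layer is present. PyPDF2 and Tesseract are named elsewhere in this report; pdf2image, the helper name, and the sample file path are assumptions introduced here for illustration.

from PyPDF2 import PdfReader              # text-based PDFs
from pdf2image import convert_from_path   # rasterize pages for OCR (assumed extra dependency)
import pytesseract                         # wrapper around the Tesseract OCR engine

def extract_text(pdf_path):
    """Return raw text from a resume PDF, using OCR as a fallback for scanned documents."""
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    if text.strip():
        return text
    # No embedded text layer: treat the file as a scanned document and OCR each page.
    images = convert_from_path(pdf_path)
    return "\n".join(pytesseract.image_to_string(img) for img in images)

print(extract_text("sample_resume.pdf")[:500])  # "sample_resume.pdf" is a placeholder path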
2.2 Goals
The goals of the project are structured to address both technical and operational challenges in
deploying a resume parsing system. These goals ensure that the system meets the needs of
end users while adhering to ML-Ops best practices.
2.2.1 Automation of Resume Parsing
The primary goal is to develop an automated process for extracting key details from resumes.
The system should:
• Handle resumes in multiple formats, ensuring versatility and compatibility.
• Accurately extract structured information, such as contact details, education, work
experience, and skills.
• Minimize errors and inconsistencies, even when parsing resumes with unconventional
formats or incomplete data.
2.2.2 Scalability and Performance
The system must be capable of scaling to handle increasing volumes of resumes. This
includes:
• Dynamic Scaling: Deploying additional Docker containers to manage peak
workloads.
• Performance Optimization: Ensuring fast parsing and data retrieval times, even with
large datasets.
• System Reliability: Maintaining consistent performance across all deployment
environments.
2.2.3 User Accessibility
A key goal is to make the system user-friendly for non-technical users, particularly HR
professionals. The Streamlit-based interface is designed to:
• Allow easy uploading of resumes and viewing of parsed data.
• Enable filtering and searching for specific criteria, such as skills or experience levels.
• Provide an intuitive experience that requires minimal training or technical knowledge.
2.2.4 Efficient Data Management
Using MySQL as the database ensures efficient data handling, with goals including:
• Secure Data Storage: Protecting sensitive applicant information with robust security
measures.
• Efficient Querying: Allowing users to search and retrieve data quickly based on
specific parameters.
• Scalable Design: Ensuring the database can grow alongside the system to
accommodate more resumes.
2.2.5 Seamless Deployment
Deployment challenges are addressed through the use of Docker and ML-Ops practices, with
the following goals:
• Ensure consistent deployment across diverse environments, including local machines
and cloud servers.
• Minimize setup and configuration issues, enabling rapid deployment.
• Support continuous integration and updates to the NLP model and other system
components.
2.2.6 Long-Term Maintenance and Monitoring
ML-Ops principles guide the long-term goals for maintaining the system, such as:
• Automating updates to ensure the system remains up-to-date with the latest ML and
NLP advancements.
• Monitoring system performance to detect and address issues proactively.
• Providing regular reports on system usage and performance metrics.
CHAPTER 3
LITERATURE REVIEW
The literature review serves as a foundation for developing a robust and efficient resume
parsing system by exploring various methodologies, challenges, and technologies. It
discusses critical components, including Natural Language Processing (NLP), database
management, text preprocessing, and machine learning, emphasizing their relevance to the
project’s goals. The following subsections summarize the key findings and insights drawn
from academic research that informed the project.
[1] This paper highlights the integration of deep learning models in automating resume
parsing, focusing on extracting key entities such as skills, qualifications, and experiences.
The authors discuss the use of Named Entity Recognition (NER) and transformer-based
models like BERT for semantic analysis, enabling a better understanding of resume data.
This paper provided a foundation for using state-of-the-art NLP techniques for entity
extraction and classification. By implementing BERT embeddings and customizing NER
models, we were able to efficiently identify and categorize candidate information, which significantly enhanced our project's parsing accuracy and contextual relevance. The use of deep learning for understanding complex resume data aligned perfectly with our project goal of automating candidate profile extraction.
[2] This research introduces a machine learning framework for classifying resumes and
matching them with job descriptions. It emphasizes supervised learning techniques and
feature engineering to map resumes effectively to job requirements.
We gained insights into feature extraction and matching techniques, especially the importance
of structured feature representation. Adopting their proposed classification model allowed us to streamline the job–candidate alignment process, improving the relevance of recommended matches in our project. By leveraging their method, we were able to automate the job matching process, ensuring better-fit recommendations based on resume data and job descriptions.
[3] The paper focuses on preprocessing steps and tokenization methods that help standardize
unstructured resume data. It also highlights ranking mechanisms based on job criteria using
embedding similarity measures like cosine similarity.
This study guided the implementation of preprocessing pipelines, particularly in handling
varied resume formats. Techniques such as vectorization and similarity calculations directly
enhanced the efficiency of our resume shortlisting module, reducing noise and improving relevance scores. Their focus on similarity measures was crucial in refining our project's ability to rank resumes based on job relevance, streamlining the recruitment process.
[4] The authors present a hybrid framework combining rule-based and machine learning
approaches to parse resumes effectively. They address challenges like unstructured data and
multilingual formats, providing a scalable solution for large datasets.
This framework offered valuable insights into combining deterministic methods with
probabilistic models to handle edge cases in resume parsing. Integrating rule-based logic
improved the system's ability to parse uncommon resume structures, making our project more robust. The hybrid approach helped improve parsing accuracy for resumes that did not follow
standard formats, ensuring the system worked well with a wide range of inputs.
[5] This paper explores skill extraction techniques using advanced NLP pipelines. It evaluates
various models, including sequence tagging and dependency parsing, to identify and map
skills with job requirements.
The insights into sequence tagging models were particularly beneficial in refining our skill extraction module. Implementing dependency parsing methods from the paper improved the semantic understanding of skills, enhancing the accuracy of our project's job matching process. The paper highlighted how to handle complex skill relationships and improve the extraction process, which was essential for mapping skills to job descriptions in our system.
[7] The paper discusses semantic search algorithms, particularly context-aware embeddings
like BERT and RoBERTa, to improve the precision of job-candidate matching.
By leveraging context-aware embeddings, we were able to enhance the semantic similarity
measures in our project. This improved the system's ability to understand nuanced connections between resumes and job descriptions, resulting in better matching accuracy. The use of semantic search allowed our project to go beyond keyword matching, ensuring more
contextually relevant job-candidate pairings.
[8] This paper emphasizes the role of domain-specific knowledge in enhancing resume
parsing models. It discusses the integration of pretrained language models with recruitment-
specific datasets to improve performance.
The research inspired the adaptation of domain-specific fine-tuning for pretrained models in
our project. This improved the relevance and specificity of extracted entities, ensuring
alignment with recruitment requirements. By using domain-specific data to fine-tune models,
we were able to achieve better accuracy in extracting job-relevant skills and qualifications,
which was critical for tailoring the system to the recruitment domain.
[9] The study integrates ontologies and taxonomies to refine the accuracy of entity
recognition in resumes. It highlights the use of structured knowledge bases for better
classification.
This paper provided the motivation to incorporate domain knowledge in the parsing model.
Ontology-based enhancements significantly improved the granularity and correctness of the
parsed data, boosting our project's overall system performance. By integrating taxonomies specific to recruitment, we were able to achieve a more accurate classification of job skills and experience in the resumes, which was vital for the job-matching algorithm.
[10] The authors explore semantic role labeling, dependency parsing, and entity linking to
extract structured data from resumes. They evaluate these techniques on unstructured textual
data, demonstrating their efficiency in generating organized datasets.
Incorporating semantic parsing techniques improved the structural organization of resume
data in our project. The combination of dependency parsing and entity linking enriched the
data quality, making the matching process more efficient and reliable. These techniques
allowed us to extract and link relevant entities within resumes, ensuring better data integrity
for downstream job matching and candidate recommendation.
CHAPTER 4
TECHNICAL SPECIFICATIONS
4.1 System Architecture
The Resume Parser project follows a modular architecture that integrates multiple
components, ensuring an efficient and accurate flow of operations from resume input to final
categorized output. Below is a detailed explanation of each component:
1. Streamlit UI: Upload Resume
o The Streamlit User Interface serves as the front-end of the system, designed
for ease of use and accessibility.
o It allows users to upload resumes in various formats such as PDF, DOCX, or
text files. The drag-and-drop functionality simplifies the upload process for
non-technical users.
o The UI validates the uploaded files, ensuring they meet the accepted format
and size limits, preventing errors in downstream processing.
o Once uploaded, the resumes are sent to the text extraction module for
processing.
2. Text Extraction
o This component is responsible for extracting raw text from the uploaded
resumes.
o For non-image-based files (e.g., PDFs or DOCX), parsers such as PyPDF2 or
python-docx are employed to extract the content.
o In cases where resumes are image-based (e.g., scanned documents), OCR
(Optical Character Recognition) tools such as Tesseract OCR are used to
convert the image content into machine-readable text.
o The extracted text is cleaned and preprocessed by removing unnecessary
formatting, symbols, and whitespaces to prepare it for NLP operations.
3. NLP Features: Entity Recognition
o The Natural Language Processing (NLP) module is the core component that
analyzes the extracted text to identify meaningful information.
o Using the NLTK (Natural Language Toolkit) library, the system applies
Named Entity Recognition (NER) to extract key entities such as:
▪ Personal Information: Name, contact details (email, phone number),
and address.
▪ Skills: Technical and soft skills relevant to the job market.
▪ Educational Qualifications: Degrees, institutions, and years of study.
▪ Professional Experience: Job titles, company names, durations, and
responsibilities.
o NER techniques rely on predefined datasets, tokenization, and part-of-speech
tagging to identify entities within the text.
4. ML Models: Categorization
o The extracted entities are further processed by the machine learning module
for categorization.
o The K-Nearest Neighbors (KNN) algorithm is used to classify resumes into
predefined categories, such as software development, data analysis, or
management roles.
o The KNN algorithm operates by comparing the attributes of the current
resume (e.g., skills and experience) with existing classified resumes to
determine its category.
o Feature engineering is performed to ensure the data passed to the ML model is
relevant and accurate. For instance, skills and experiences are encoded into
numerical vectors to make them processable by the algorithm.
o The categorization helps streamline the hiring process by matching candidates
to relevant job roles efficiently.
5. Output Layer (Streamlit)
o The processed data, including extracted entities and categorization results, are
displayed in a well-structured format using Streamlit.
o The output interface is interactive and user-friendly, allowing users to:
▪ View detailed information parsed from resumes.
▪ Download processed data in formats such as CSV or JSON for
integration with other systems.
▪ Perform further actions like searching, filtering, or exporting results.
o This layer bridges the back-end processing with the user, ensuring
transparency and usability.
6. Storage
o Parsed and categorized data is stored securely in a MySQL relational
database.
o The database is structured to efficiently organize information into tables, such
as:
▪ Candidate details (e.g., name, email, phone number).
▪ Educational background and work experience.
▪ Skills and certifications.
o The database design follows normalization principles to reduce redundancy
and improve data retrieval performance.
o MySQL also supports indexing and querying capabilities, allowing the system
to quickly retrieve candidate data for future processes such as analytics or
reporting.
7. Deployment Layer (Docker)
o The application is containerized using Docker, a tool that ensures the system
runs consistently across different environments.
o Docker encapsulates the application, along with its dependencies (e.g., Python
libraries, MySQL), into a portable container.
o Key benefits of Docker in this architecture include:
▪ Scalability: Additional containers can be deployed as the system scales
to handle more users or larger datasets.
▪ Portability: The containerized application can run on any system with
Docker installed, ensuring compatibility across development, testing,
and production environments.
▪ Simplified Maintenance: Updates to the system can be deployed
quickly by modifying and redeploying containers.
This architecture ensures an end-to-end workflow, from resume ingestion to categorized
output, with each module seamlessly integrating to deliver a robust solution.
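As a sketch of the categorization step (component 4 above), the snippet below encodes skill and experience text as TF-IDF vectors and assigns a category with a K-Nearest Neighbors classifier. The tiny training set, labels, and scikit-learn pipeline are illustrative assumptions rather than the project's actual training data or tuned parameters.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: skill/experience text paired with a job category.
train_texts = [
    "python pandas scikit-learn machine learning statistics",
    "java spring microservices rest apis sql",
    "recruitment stakeholder management team leadership budgeting",
]
train_labels = ["data analysis", "software development", "management"]

# TF-IDF turns extracted skills and experience into numerical vectors,
# and KNN assigns the category of the most similar known resumes.
model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
model.fit(train_texts, train_labels)

print(model.predict(["deep learning python tensorflow data visualization"]))  # ['data analysis']

In practice the neighbor count and the feature encoding would be tuned on a labeled resume corpus rather than a three-example list.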
CHAPTER 5
DESIGN APPROACH AND DETAILS
The goal of this project is to design and implement an application that performs advanced
machine learning operations to parse resumes. Users can upload resumes in different formats,
such as PDF, DOCX, or images. The application extracts and processes the content using
Optical Character Recognition (OCR) and NLP techniques, identifying key details like skills,
education, and experience. The processed data is categorized, displayed in an interactive
dashboard, and made available for download in structured formats like CSV or JSON.
5.1 Design Approach
5.1.1 User Interface (UI) Design
The user interface is crafted to ensure simplicity, accessibility, and functionality, catering to
users of all technical backgrounds. The UI design focuses on creating a seamless experience
for uploading resumes, processing them, and accessing results.
• File Upload Feature:
o Users can upload resumes through a streamlined file upload feature that
accepts multiple formats, including PDFs, DOCX, and image files.
o The interface supports drag-and-drop functionality to enhance usability.
• Process Button:
o A clearly labeled “Process” button allows users to trigger backend operations,
including OCR-based text extraction and NLP-based categorization.
• Interactive Data Visualization:
o Results are displayed on an interactive dashboard, showing parsed sections
like personal details, skills, education, work history, and more.
o The layout is designed to ensure readability, with key fields highlighted for
quick review.
• Download Option:
o Users can download the processed data in various structured formats, such as
CSV and JSON, for integration with external systems.
Fig 5.1: High-Level User Interface Design
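A minimal Streamlit sketch of this upload-and-process flow is shown below; the widget labels, accepted file types, and placeholder download data are illustrative and not the exact ones used in the final interface.

import streamlit as st

st.title("MLOps – Resume Parser")

# File upload supporting the formats described above.
uploaded = st.file_uploader("Upload a resume", type=["pdf", "docx", "png", "jpg"])

if uploaded is not None and st.button("Process"):
    raw_bytes = uploaded.read()
    st.success(f"Received {uploaded.name} ({len(raw_bytes)} bytes)")
    # In the full system the bytes would be passed to the OCR/NLP pipeline here,
    # and the parsed fields rendered as tables with CSV/JSON download buttons.
    st.download_button("Download parsed data (CSV)", data="name,email\n", file_name="parsed.csv")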
• NLP preprocessing steps include tokenization, stopword removal, and lemmatization to prepare data for further processing.
• Entity Recognition and Categorization:
o The cleaned text is processed using Named Entity Recognition (NER) via
NLTK to identify and extract fields such as:
▪ Personal Details: Name, phone number, and email address.
▪ Skills: Technical and soft skills.
▪ Education: Degrees and certifications.
▪ Work Experience: Job titles, companies, and durations.
o A KNN algorithm categorizes resumes into predefined job roles based on skill
sets and extracted information.
• Output Generation:
o Parsed and categorized data is formatted for display in the frontend.
o Results are also converted into downloadable formats like CSV and JSON,
ensuring compatibility with various external systems.
• Frontend:
o Built using Streamlit, providing a simple, responsive, and user-friendly interface for uploading resumes and viewing results.
• Backend:
o Developed in Python using Flask, which manages server-side logic, including
file handling, text parsing, and database integration.
• Text Extraction (OCR):
o Tesseract OCR processes image-based resumes, extracting textual content.
o PDF and DOCX files are processed using libraries like PyPDF2 and docx2txt.
• Machine Learning (ML) Models:
o NLTK is used for Named Entity Recognition (NER) to extract key fields.
o A KNN algorithm is implemented for job role categorization, ensuring
accurate classification of resumes.
• Database:
o MySQL is used to store parsed data for retrieval and further analysis.
• Deployment:
o The entire system is containerized using Docker for consistent performance
across environments.
3. Integration with Backend:
o The OCR output is seamlessly integrated into the NLP pipeline for entity
recognition and categorization.
• Advantages:
o Preserves factual accuracy: Critical details, such as dates or names of certifications, remain intact.
o Efficiency: Extractive summarization is computationally less demanding, making it suitable for processing large volumes of resumes quickly.
• Applications:
o Technical roles: Focus on extracting specific certifications, programming
languages, or tools (e.g., AWS, Python).
o Leadership positions: Emphasis on key job responsibilities and
achievements.
2. Abstractive Summarization
Abstractive summarization generates new sentences to describe the key information from the
resume. This method improves readability and coherence by rephrasing and reorganizing
extracted details.
• Implementation in Resume Parsing:
o Transformer-based models, such as Llama 3 (8B), Pegasus, BERT, and T5, are utilized to analyze extracted data and generate summaries.
o Summaries are formatted into bullet points or concise paragraphs to improve
clarity and usability.
o Prompts provided to models focus on summarizing resumes specific to the job
role, ensuring that the output aligns with recruiter expectations.
• Advantages:
o Improves readability: Summaries are structured to be easy to comprehend,
even when dealing with verbose resumes.
o Flexibility: The model can emphasize different aspects (e.g., technical skills
vs. soft skills) depending on job requirements.
• Applications:
o Suitable for generating summaries for diverse industries, including creative
roles, academic profiles, and management positions.
3. Hybrid Summarization
Hybrid summarization leverages the strengths of both extractive and abstractive techniques to
create summaries that are precise, coherent, and concise.
• Implementation in Resume Parsing:
o The pipeline first applies extractive summarization to identify critical content
(e.g., skills, roles).
o The extracted data is passed through the Llama 3 (8B) model for rephrasing,
ensuring a coherent and engaging summary.
o This combined approach reduces redundancy while preserving factual
accuracy.
• Advantages:
o Balanced Output: Combines the accuracy of extraction with the readability of
abstraction.
o Customizable Summaries: Tailored to highlight key qualifications based on
the specific job role or recruiter preferences.
Fig 5.5: Hybrid Summarization Workflow

Feature      | Extractive Summarization         | Abstractive Summarization              | Hybrid Summarization
Output Style | Uses exact text from the resume. | Creates fluid and rephrased summaries. | Balances original content with rephrased sentences.
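As a hedged sketch of the abstractive step, the snippet below summarizes a block of extracted resume highlights with a pretrained Pegasus checkpoint from Hugging Face Transformers. The specific checkpoint (google/pegasus-xsum), the sample text, and the length limits are assumptions; the same call pattern would apply to T5-style models, and a hybrid pipeline would simply feed the extractive output into this step.

from transformers import pipeline

# Abstractive step: a pretrained Pegasus checkpoint rephrases the extracted highlights.
summarizer = pipeline("summarization", model="google/pegasus-xsum")

extracted_highlights = (
    "Skills: Python, SQL, machine learning. "
    "Experience: 3 years as a data analyst at a retail company, built sales forecasting models. "
    "Education: B.Tech in Information Technology."
)

summary = summarizer(extracted_highlights, max_length=60, min_length=20, do_sample=False)
print(summary[0]["summary_text"])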
• Text Segmentation: Break down OCR-extracted text into individual sentences and
paragraphs.
def text_segmentation(text):
    """Segment text into sentences."""
    import nltk
    from nltk.tokenize import sent_tokenize
    nltk.download('punkt')
    segments = sent_tokenize(text)
    return segments
• Text Cleaning: Remove unnecessary characters (e.g., line breaks, page numbers,
headers).
• Noise Reduction: Exclude irrelevant sections such as footers or disclaimers.
def noise_reduction(text):
    """Reduce noise by removing unwanted characters."""
    import re
    # Remove special characters, keeping only alphanumerics and spaces
    clean_text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    # Replace multiple spaces with a single space
    clean_text = re.sub(r'\s+', ' ', clean_text).strip()
    return clean_text
Fig 5.5: Text Cleaning Pipeline
def text_cleaning_pipeline(text):
    """Clean the text using segmentation, noise reduction, and stopword removal."""
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    nltk.download('stopwords')
    nltk.download('punkt')
    # Text segmentation
    segments = text_segmentation(text)
    # Noise reduction
    clean_segments = [noise_reduction(segment) for segment in segments]
    # Tokenize each segment and remove stopwords
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [[token for token in word_tokenize(segment) if token.lower() not in stop_words]
                       for segment in clean_segments]
    # Reconstruct the cleaned text
    cleaned_text = ' '.join([' '.join(token_list) for token_list in filtered_tokens])
    return cleaned_text
5.2 Constraints
• OCR accuracy can be constrained by the quality of the uploaded images or scanned
documents. Poor image quality, handwritten text, or complex formatting might reduce
OCR accuracy.
• Dependency on External Libraries: The application relies on external libraries such as
EasyOCR for text extraction and Pegasus for summarization. Compatibility issues,
updates, or deprecations in these libraries could impact the application's functionality.
• Summarization Quality: The quality of text summarization using the Pegasus model
might vary depending on the nature of the text extracted by OCR. Lengthy,
unstructured text may not be summarized effectively.
CHAPTER 6
PHASES OF PROJECT
o Named Entity Recognition (NER): Using SpaCy or BERT, the system
detects entities such as job titles, skills, organizations, and dates. Pre-trained
models are fine-tuned with labeled resume datasets for better accuracy.
o Custom Rules and Patterns: In scenarios where ML models struggle, custom
regex patterns are implemented to identify fields like phone numbers or email
addresses.
• Significance:
Feature extraction transforms unstructured resume data into actionable information,
laying the groundwork for subsequent parsing and analysis.
3. Database Integration: Storing Extracted Data
Parsed data is stored in a relational database (e.g., MySQL or PostgreSQL) for efficient
querying and further operations.
• Implementation:
o A database schema is designed with tables for candidate details, skills,
experience, and education, with appropriate relationships between them.
o SQLAlchemy or Django ORM is used for seamless interaction between the
application and the database.
• Significance:
A well-structured database enables recruiters to access, search, and manage parsed
data efficiently, making the system scalable and user-friendly.
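A minimal SQLAlchemy sketch of such a schema is shown below; the table and column names are illustrative, and a local SQLite file stands in for the MySQL/PostgreSQL instance mentioned above.

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Candidate(Base):
    __tablename__ = "candidates"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    email = Column(String(100))
    skills = relationship("Skill", back_populates="candidate")

class Skill(Base):
    __tablename__ = "skills"
    id = Column(Integer, primary_key=True)
    candidate_id = Column(Integer, ForeignKey("candidates.id"))
    name = Column(String(100))
    candidate = relationship("Candidate", back_populates="skills")

# A local SQLite file stands in for the production MySQL instance.
engine = create_engine("sqlite:///resumes.db")
Base.metadata.create_all(engine)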
2. System Performance Optimization: Speed and Scalability
With large-scale use in mind, optimizing the system for quick processing and scalability is
essential.
• Approach:
o Parallel Processing: Using multiprocessing techniques to process multiple
resumes simultaneously.
o Batch Processing: Handling resumes in batches to reduce computational
overhead and improve throughput.
o Caching: Implementing caching mechanisms to store frequently accessed data
for faster retrieval.
• Significance:
Optimized performance ensures the system can handle high volumes of resumes in
real-time scenarios without bottlenecks.
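As an illustration of the parallel- and batch-processing ideas above, the sketch below parses a batch of resumes across worker processes with Python's multiprocessing pool; the parse_resume stub and the file list are placeholders for the real extraction pipeline.

from multiprocessing import Pool

def parse_resume(path):
    """Placeholder for the real text-extraction and NER pipeline."""
    return {"file": path, "status": "parsed"}

if __name__ == "__main__":
    resume_paths = [f"resumes/candidate_{i}.pdf" for i in range(100)]  # illustrative batch
    with Pool(processes=4) as pool:
        # Resumes are processed concurrently across four worker processes, in chunks of ten.
        results = pool.map(parse_resume, resume_paths, chunksize=10)
    print(len(results), "resumes processed")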
3. Security and Authentication: Safeguarding Sensitive Information
As resumes contain personal and professional details, implementing robust security measures
is critical.
• Implementation:
o Role-based Access Control (RBAC): Restricting access based on user roles
(e.g., recruiter, admin).
o Encryption: Securing sensitive data in transit and at rest using encryption
standards like AES.
o Audit Logs: Maintaining logs of user actions to ensure traceability and
compliance with regulations like GDPR.
• Significance:
Ensuring data security builds trust among users and protects the system from potential
breaches.
o Scalability: Additional containers can be deployed to handle increased
workloads.
• Significance:
Containerization simplifies deployment and makes the system robust against
environment-specific issues.
2. Real-time Resume Parsing: Interactive User Experience
The system is designed to provide real-time parsing results, enhancing the user experience.
• Implementation:
o Using Streamlit for an intuitive interface where users can upload resumes and
view parsed results in seconds.
o Displaying extracted data in a structured format, allowing users to provide
feedback or corrections.
• Significance:
Real-time parsing improves user engagement and enables faster decision-making.
3. Candidate Scoring and Ranking: Intelligent Decision Support
The parsed data is used to evaluate and rank candidates based on their relevance to job
descriptions.
• Scoring Algorithm:
o Weighting factors like skills, experience, and education.
o Comparing candidates against job criteria to generate a relevance score.
• Significance:
Automated scoring streamlines the hiring process, allowing recruiters to focus on top
candidates efficiently.
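A minimal sketch of such a weighted scoring function is shown below; the weights, field names, and example profiles are assumptions chosen for illustration rather than the project's actual scoring logic.

def relevance_score(candidate, job, weights=None):
    """Weighted overlap between a parsed candidate profile and a job description."""
    weights = weights or {"skills": 0.5, "experience": 0.3, "education": 0.2}
    skill_overlap = len(set(candidate["skills"]) & set(job["skills"])) / max(len(job["skills"]), 1)
    experience = min(candidate["years_experience"] / max(job["min_years"], 1), 1.0)
    education = 1.0 if candidate["degree"] in job["accepted_degrees"] else 0.0
    return round(100 * (weights["skills"] * skill_overlap
                        + weights["experience"] * experience
                        + weights["education"] * education), 1)

candidate = {"skills": ["python", "sql", "docker"], "years_experience": 2, "degree": "B.Tech"}
job = {"skills": ["python", "sql", "aws"], "min_years": 3, "accepted_degrees": ["B.Tech", "B.E."]}
print(relevance_score(candidate, job))  # 73.3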
CHAPTER 7
PROJECT DEMONSTRATION
User Section
1. User Interface for Uploading Resume
The user interface is designed to provide a seamless experience for uploading resumes. Key
features include:
• File Upload: A clean and intuitive file upload button allows users to select resumes in
various formats (PDF, DOCX, TXT).
• Drag-and-Drop Option: Users can drag and drop files for convenience.
Once a resume is uploaded, the system displays the document on the interface to ensure users
have uploaded the correct file. Features include:
Fig 7.2: Displaying Upload Resume
3. Resume Analysis
The system performs a thorough analysis of the uploaded resume and displays key details,
including:
Fig 7.4: Resume Analysis II
• Skill Enhancements: Suggestions for missing or trending skills relevant to the user’s
field.
• Department Predictions: A pie chart visualization predicting suitable fields or
departments (e.g., IT, Finance, Marketing) based on the resume content. This is
powered by machine learning models trained on job role data.
5. Experience Detection and Visualization
The system evaluates the user’s professional experience based on job titles, durations, and
descriptions:
The interface provides actionable insights to improve the user’s resume and career prospects:
• Resume Writing Tips: General advice for formatting, structure, and content.
• Course Recommendations: Suggests relevant courses and certifications to fill skill
gaps or enhance the user’s profile. These suggestions are derived from platforms like
Coursera, LinkedIn Learning, or Udemy.
The system calculates a resume score by comparing the user's skills and experience with
industry requirements:
Fig 7.7: Suggestions for courses and certifications I
Admin Section
1. Login Page for Admin
The admin interface includes a secure login system to access backend data:
CHAPTER 8
CONCLUSION
The resume parser project represents a significant achievement in leveraging machine
learning (ML), Natural Language Processing (NLP), and modern software engineering to
automate one of the most tedious and critical aspects of recruitment. By efficiently extracting,
organizing, and analyzing candidate information from diverse resumes, this system
demonstrates the potential of integrating advanced NLP with operational deployment to
create a user-friendly, scalable, and impactful application.
Below is a detailed overview of the project’s accomplishments, challenges, contributions to
HR technology, and future potential:
1. Project Accomplishments
The resume parser project successfully delivers a robust solution to transform unstructured
resumes into structured, actionable data. Key milestones include:
• Advanced Information Extraction:
o Employing NLP techniques like Named Entity Recognition (NER) enabled the
accurate identification of candidate details, including names, contact
information, educational qualifications, skills, and work experience.
o Customization and tuning of SpaCy models ensured high accuracy and
adaptability across various resume formats.
• Scalable and Portable Architecture:
o The system’s Dockerized deployment ensures consistent performance across
development and production environments, simplifying scalability as data
loads increase.
• Efficient Data Management:
o Integration with MySQL allows for organized, relational storage of parsed
data, enabling seamless access, retrieval, and filtering by HR professionals.
o This database-driven approach significantly enhances data accessibility and
query efficiency.
• User-Friendly Interface:
o Streamlit provides a clean and intuitive user interface, empowering non-
technical users, like recruiters, to interact with the system effectively without a
steep learning curve.
2. Key Insights and Challenges Addressed
Building a scalable and efficient resume parser required addressing several complexities
inherent to the variability in resume formats, terminologies, and technical integration:
• Handling Data Variability:
o Resumes often lack standardization in structure and style, posing a challenge
for text extraction. Implementing robust preprocessing (tokenization,
stemming, and stop-word removal) and customized NER models addressed
these inconsistencies.
• Ensuring Component Interoperability:
o Combining SpaCy for NLP, Docker for deployment, MySQL for data
management, and Streamlit for the user interface required precise integration
planning to ensure smooth communication between components.
o Dockerization played a pivotal role in achieving a standardized, conflict-free
deployment pipeline.
• Optimizing System Scalability:
o To ensure the system could process a growing number of resumes, model
optimization, database query refinement, and backend performance tuning
were implemented.
o These measures enhanced system reliability and reduced latency during
concurrent user operations.
3. Contributions to HR Technology
This project not only automates the resume parsing process but also contributes significantly
to the HR technology ecosystem:
• Streamlining Recruitment Processes:
o Automation reduces the manual effort in data extraction, enabling HR teams to
focus more on candidate engagement and decision-making.
• Data-Driven Insights:
o By structuring resume data, the system allows for trend analysis, identification
of in-demand skills, and informed decision-making based on objective
metrics.
• Open-Source Framework Potential:
o The modular design of the system can serve as a foundational framework for
developers and HR tech companies to build upon, fostering innovation in
recruitment analytics and automation.
4. Future Improvements and Expansion
The project lays the groundwork for future enhancements that could make the resume parser
even more powerful and versatile:
• Integration of Advanced NLP Models:
o Leveraging transformer-based models like BERT or GPT could significantly
improve context-aware extraction, especially for complex fields like technical
skills or specialized job roles.
• Automated Scoring and Ranking:
o Introducing a machine learning-based scoring mechanism to rank candidates
based on job descriptions would add a layer of decision support, further
streamlining the hiring process.
• Multi-Language Support:
o Adding multilingual parsing capabilities would make the system applicable to
global hiring needs, catering to diverse applicant pools.
• Enhanced Security Measures:
o Strengthening security protocols, such as advanced encryption, secure API
communication, and stricter role-based access control, would ensure data
compliance and user trust.
5. Final Reflections
In conclusion, this resume parser project underscores the transformative potential of
combining ML, NLP, containerization, and software engineering to modernize recruitment
processes. By addressing critical challenges in data handling, deployment, and scalability,
this system sets a strong foundation for intelligent automation in HR.
The adaptability of the system ensures its relevance in evolving recruitment landscapes, while
its user-centered design promotes widespread usability. Future iterations of this project will
integrate advanced technologies, further enhancing its capabilities to meet the dynamic needs
of modern HR teams.
Through this effort, we demonstrate how intelligent tools can convert labor-intensive tasks
into efficient, data-driven workflows, paving the way for faster, fairer, and more effective
hiring decisions worldwide.
CHAPTER 9
REFERENCES
1. S. Ren, W. Lu, and T. Zhao, "Resume Parsing Using Deep Learning Techniques for
Talent Acquisition," IEEE Access, vol. 10, pp. 34897–34907, 2022, doi:
10.1109/ACCESS.2022.3145699.
2. Y. Zhang, X. Wang, and J. Liu, "Data-Driven Resume Classification and Matching for
10.1109/IJCNN.2022.9890361.
5. H. Lee and J. Cho, "Leveraging NLP for Skill Extraction in Job and Resume
10.1016/j.eswa.2022.116610.
6. D. Patel, A. Shah, and S. Sharma, "Resume Shortlisting Using NLP," in 2023 IEEE
International Conference on Big Data (Big Data), Atlanta, GA, USA, pp. 1854–1861,
doi: 10.1109/BigData52589.2023.00045.
NLP," Journal of Information Systems and Technology Management, vol. 20, no. 1,
8. V. Gupta, N. Rastogi, and A. Arora, "AI-Driven Analysis of Resumes for Automated
Hiring," Springer Advances in Artificial Intelligence and Applications, vol. 21, pp.
10.1109/GHTC.2022.9898756.
Intelligence and Data Engineering (ICAIDE), 2022, pp. 112-118, doi: 10.1007/978-3-
030-94528-7.
APPENDIX
Codes and Standards:
import streamlit as st
import nltk
import spacy
nltk.download('stopwords')
spacy.load('en_core_web_sm')
import pandas as pd
import base64, random
import time, datetime
from pyresparser import ResumeParser
from pdfminer3.layout import LAParams, LTTextBox
from pdfminer3.pdfpage import PDFPage
from pdfminer3.pdfinterp import PDFResourceManager
from pdfminer3.pdfinterp import PDFPageInterpreter
from pdfminer3.converter import TextConverter
import io, random
from streamlit_tags import st_tags
from PIL import Image
import pymysql
from Courses import ds_course, web_course, android_course, ios_course, uiux_course
import pafy
import plotly.express as px
def fetch_yt_video(link):
    video = pafy.new(link)
    return video.title
"""Generates a link allowing the data in a given panda dataframe to be downloaded
in: dataframe
out: href string
"""
csv = df.to_csv(index=False)
b64 = base64.b64encode(csv.encode()).decode() # some strings <-> bytes conversions
necessary here
# href = f'<a href="data:file/csv;base64,{b64}">Download Report</a>'
href = f'<a href="data:file/csv;base64,{b64}" download="{filename}">{text}</a>'
return href
def pdf_reader(file):
    resource_manager = PDFResourceManager()
    fake_file_handle = io.StringIO()
    converter = TextConverter(resource_manager, fake_file_handle, laparams=LAParams())
    page_interpreter = PDFPageInterpreter(resource_manager, converter)
    with open(file, 'rb') as fh:
        for page in PDFPage.get_pages(fh, caching=True, check_extractable=True):
            page_interpreter.process_page(page)
            print(page)
        text = fake_file_handle.getvalue()
    # close open handles and return the extracted text
    converter.close()
    fake_file_handle.close()
    return text
def show_pdf(file_path):
    with open(file_path, "rb") as f:
        base64_pdf = base64.b64encode(f.read()).decode('utf-8')
    # pdf_display = f'<embed src="data:application/pdf;base64,{base64_pdf}" width="700" height="1000" type="application/pdf">'
    pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}" width="700" height="1000" type="application/pdf"></iframe>'
    st.markdown(pdf_display, unsafe_allow_html=True)
def course_recommender(course_list):
    st.subheader("**Courses & Certificates Recommendations**")
    c = 0
    rec_course = []
    no_of_reco = st.slider('Choose Number of Course Recommendations:', 1, 10, 4)
    random.shuffle(course_list)
    for c_name, c_link in course_list:
        c += 1
        st.markdown(f"({c}) [{c_name}]({c_link})")
        rec_course.append(c_name)
        if c == no_of_reco:
            break
    return rec_course
    rec_values = (name, email, str(res_score), timestamp, str(no_of_pages), reco_field,
                  cand_level, skills, recommended_skills, courses)
    cursor.execute(insert_sql, rec_values)
    connection.commit()
st.set_page_config(
page_title="ProfileIQ",
page_icon='./Logo/header.jpg',
)
def run():
st.title("ProfileIQ")
st.sidebar.markdown("# Select User")
activities = ["User", "Admin"]
choice = st.sidebar.selectbox("Choose among the given options:", activities)
img = Image.open('./Logo/header.jpg')
img = img.resize((380, 250))
st.image(img)
# Create the DB
db_sql = """CREATE DATABASE IF NOT EXISTS SRA;"""
cursor.execute(db_sql)
connection.select_db("sra")
# Create table
DB_table_name = 'user_data'
table_sql = "CREATE TABLE IF NOT EXISTS " + DB_table_name + """
(ID INT NOT NULL AUTO_INCREMENT,
Name varchar(100) NOT NULL,
Email_ID VARCHAR(50) NOT NULL,
resume_score VARCHAR(8) NOT NULL,
Timestamp VARCHAR(50) NOT NULL,
Page_no VARCHAR(5) NOT NULL,
Predicted_Field VARCHAR(25) NOT NULL,
User_level VARCHAR(30) NOT NULL,
Actual_skills VARCHAR(1000) NOT NULL,
Recommended_skills VARCHAR(300) NOT NULL,
Recommended_courses VARCHAR(600) NOT NULL,
PRIMARY KEY (ID));
"""
cursor.execute(table_sql)
if choice == 'User':
# st.markdown('''<h4 style='text-align: left; color: #d73b5c;'>* Upload your resume,
and get smart recommendation based on it."</h4>''',
# unsafe_allow_html=True)
pdf_file = st.file_uploader("Choose your Resume", type=["pdf"])
if pdf_file is not None:
# with st.spinner('Uploading your Resume....'):
# time.sleep(4)
save_image_path = './Uploaded_Resumes/' + pdf_file.name
with open(save_image_path, "wb") as f:
f.write(pdf_file.getbuffer())
show_pdf(save_image_path)
resume_data = ResumeParser(save_image_path).get_extracted_data()
if resume_data:
## Get the whole resume data
resume_text = pdf_reader(save_image_path)
st.header("**Resume Analysis**")
st.success("Hello " + resume_data['name'])
st.subheader("**Your Basic info**")
try:
st.text('Name: ' + resume_data['name'])
st.text('Email: ' + resume_data['email'])
st.text('Contact: ' + resume_data['mobile_number'])
st.text('Resume pages: ' + str(resume_data['no_of_pages']))
except:
pass
cand_level = ''
if resume_data['no_of_pages'] == 1:
cand_level = "Fresher"
st.markdown('''<h4 style='text-align: left; color: #d73b5c;'>You are looking
Fresher.</h4>''',
unsafe_allow_html=True)
elif resume_data['no_of_pages'] == 2:
cand_level = "Intermediate"
st.markdown('''<h4 style='text-align: left; color: #1ed760;'>You are at
intermediate level!</h4>''',
unsafe_allow_html=True)
elif resume_data['no_of_pages'] >= 3:
cand_level = "Experienced"
st.markdown('''<h4 style='text-align: left; color: #fba171;'>You are at
experience level!''',
unsafe_allow_html=True)
## recommendation
ds_keyword = ['tensorflow', 'keras', 'pytorch', 'machine learning', 'deep Learning',
'flask',
'streamlit']
web_keyword = ['react', 'django', 'node jS', 'react js', 'php', 'laravel', 'magento',
'wordpress',
'javascript', 'angular js', 'c#', 'flask']
android_keyword = ['android', 'android development', 'flutter', 'kotlin', 'xml', 'kivy']
ios_keyword = ['ios', 'ios development', 'swift', 'cocoa', 'cocoa touch', 'xcode']
uiux_keyword = ['ux', 'adobe xd', 'figma', 'zeplin', 'balsamiq', 'ui', 'prototyping',
'wireframes',
'storyframes', 'adobe photoshop', 'photoshop', 'editing', 'adobe
illustrator',
'illustrator', 'adobe after effects', 'after effects', 'adobe premier pro',
'premier pro', 'adobe indesign', 'indesign', 'wireframe', 'solid', 'grasp',
'user research', 'user experience']
recommended_skills = []
reco_field = ''
rec_course = ''
## Courses recommendation
for i in resume_data['skills']:
## Data science recommendation
if i.lower() in ds_keyword:
print(i.lower())
reco_field = 'Data Science'
st.success("** Our analysis says you are looking for Data Science Jobs.**")
recommended_skills = ['Data Visualization', 'Predictive Analysis', 'Statistical
Modeling',
'Data Mining', 'Clustering & Classification', 'Data Analytics',
'Quantitative Analysis', 'Web Scraping', 'ML Algorithms', 'Keras',
'Pytorch', 'Probability', 'Scikit-learn', 'Tensorflow', "Flask",
'Streamlit']
recommended_keywords = st_tags(label='### Recommended skills for you.',
text='Recommended skills generated from System',
value=recommended_skills, key='2')
st.markdown(
'''<h4 style='text-align: left; color: #1ed760;'>Adding this skills to resume
will boost the chances of getting a Job </h4>''',
unsafe_allow_html=True)
rec_course = course_recommender(ds_course)
break
reco_field = 'Android Development'
st.success("** Our analysis says you are looking for Android App
Development Jobs **")
recommended_skills = ['Android', 'Android development', 'Flutter', 'Kotlin',
'XML', 'Java',
'Kivy', 'GIT', 'SDK', 'SQLite']
recommended_keywords = st_tags(label='### Recommended skills for you.',
text='Recommended skills generated from System',
value=recommended_skills, key='4')
st.markdown(
'''<h4 style='text-align: left; color: #1ed760;'>Adding this skills to resume
will boost the chances of getting a Job </h4>''',
unsafe_allow_html=True)
rec_course = course_recommender(android_course)
break
unsafe_allow_html=True)
rec_course = course_recommender(ios_course)
break
## Ui-UX Recommendation
elif i.lower() in uiux_keyword:
print(i.lower())
reco_field = 'UI-UX Development'
st.success("** Our analysis says you are looking for UI-UX Development Jobs
**")
recommended_skills = ['UI', 'User Experience', 'Adobe XD', 'Figma', 'Zeplin',
'Balsamiq',
'Prototyping', 'Wireframes', 'Storyframes', 'Adobe Photoshop',
'Editing',
'Illustrator', 'After Effects', 'Premier Pro', 'Indesign', 'Wireframe',
'Solid', 'Grasp', 'User Research']
recommended_keywords = st_tags(label='### Recommended skills for you.',
text='Recommended skills generated from System',
value=recommended_skills, key='6')
st.markdown(
'''<h4 style='text-align: left; color: #1ed760;'>Adding this skills to resume
will boost the chances of getting a Job </h4>''',
unsafe_allow_html=True)
rec_course = course_recommender(uiux_course)
break
#
## Insert into table
ts = time.time()
cur_date = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d')
cur_time = datetime.datetime.fromtimestamp(ts).strftime('%H:%M:%S')
timestamp = str(cur_date + '_' + cur_time)
### Resume writing recommendation
if 'Declaration' in resume_text:
resume_score = resume_score + 20
st.markdown(
'''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added Declaration✍</h4>''',
unsafe_allow_html=True)
else:
st.markdown(
'''<h4 style='text-align: left; color: #fabc10;'>[-] According to our
recommendation please add Declaration✍. It will give the assurance that everything written
on your resume is true and fully acknowledged by you</h4>''',
unsafe_allow_html=True)
st.markdown(
'''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added
your Hobbies </h4>''',
unsafe_allow_html=True)
else:
st.markdown(
'''<h4 style='text-align: left; color: #fabc10;'>[-] According to our recommendation please add Hobbies. It will show your personality to the recruiters and give an idea of whether you are fit for this role.</h4>''',
unsafe_allow_html=True)
if 'Achievements' in resume_text:
resume_score = resume_score + 20
st.markdown(
'''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added
your Achievements </h4>''',
unsafe_allow_html=True)
else:
st.markdown(
'''<h4 style='text-align: left; color: #fabc10;'>[-] According to our
recommendation please add Achievements . It will show that you are capable for the
required position.</h4>''',
unsafe_allow_html=True)
if 'Projects' in resume_text:
resume_score = resume_score + 20
st.markdown(
'''<h4 style='text-align: left; color: #1ed760;'>[+] Awesome! You have added
your Projects </h4>''',
unsafe_allow_html=True)
else:
st.markdown(
'''<h4 style='text-align: left; color: #fabc10;'>[-] According to our
recommendation please add Projects . It will show that you have done work related the
required position or not.</h4>''',
unsafe_allow_html=True)
## Resume writing video
connection.commit()
else:
st.error('Something went wrong..')
else:
## Admin Side
st.success('Welcome to Admin Side')
# st.sidebar.subheader('**ID / Password Required!**')
ad_user = st.text_input("Username")
ad_password = st.text_input("Password", type='password')
if st.button('Login'):
if ad_user == 'abcd' and ad_password == 'abcd':
st.success("Welcome ")
# Display Data
cursor.execute('''SELECT*FROM user_data''')
data = cursor.fetchall()
st.header("**User's Data**")
df = pd.DataFrame(data, columns=['ID', 'Name', 'Email', 'Resume Score',
'Timestamp', 'Total Page',
'Predicted Field', 'User Level', 'Actual Skills', 'Recommended
Skills',
'Recommended Course'])
st.dataframe(df)
st.markdown(get_table_download_link(df, 'User_Data.csv', 'Download Report'),
unsafe_allow_html=True)
## Admin Side Data
query = 'select * from user_data;'
plot_data = pd.read_sql(query, connection)
labels = plot_data['Predicted_Field'].value_counts().index
values = plot_data['Predicted_Field'].value_counts().values
st.subheader("Pie-Chart for Predicted Field Recommendations")
fig = px.pie(plot_data, values=values, names=labels, title='Predicted Field
according to the Skills')
st.plotly_chart(fig)
# labels = plot_data.User_level.unique()
# values = plot_data.User_level.value_counts()
labels = plot_data['User_level'].value_counts().index
values = plot_data['User_level'].value_counts().values
else:
st.error("Wrong ID & Password Provided")
run()