Ranking Algorithm PAPER

The document presents an e-recruitment system that automatically ranks job applicants and mines their personality traits from online data. The system was tested in a real-world recruitment scenario and found to effectively identify top candidates, performing consistently compared to human recruiters for most positions. The system implements automated candidate ranking based on criteria extracted from LinkedIn profiles and personality traits mined from online text, using analytical hierarchy process to derive ranks based on recruiter-defined criteria weights.

The current issue and full text archive of this journal is available at www.emeraldinsight.com/1066-2243.htm

An integrated e-recruitment system for automated personality mining and applicant ranking

Evanthia Faliagka and Athanasios Tsakalidis
Computer Engineering and Informatics Department, University of Patras, Patras, Greece, and
Giannis Tzimas
Department of Applied Informatics in Management & Finance, Faculty of Management and Economics,
Technological Educational Institute of Messolonghi, Messolonghi, Greece

Received 19 October 2011
Revised 21 October 2011, 3 February 2012
Accepted 25 March 2012

Abstract
Purpose – The purpose of this paper is to present a novel approach for recruiting and ranking job
applicants in online recruitment systems, with the objective to automate applicant pre-screening.
An integrated, company-oriented, e-recruitment system was implemented based on the proposed
scheme and its functionality was showcased and evaluated in a real-world recruitment scenario.
Design/methodology/approach – The proposed system implements automated candidate ranking,
based on objective criteria that can be extracted from the applicant’s LinkedIn profile. What is more,
candidate personality traits are automatically extracted from his/her social presence using linguistic
analysis. The applicant’s rank is derived from individual selection criteria using analytical hierarchy
process (AHP), while their relative significance (weight) is controlled by the recruiter.
Findings – The proposed e-recruitment system was deployed in a real-world recruitment scenario,
and its output was validated by expert recruiters. It was found that with the exception of senior
positions that required domain experience and specific qualifications, automated pre-screening
performed consistently compared to human recruiters.
Research limitations/implications – It was found that companies can increase the efficiency of the
recruitment process if they integrate an e-recruitment system in their human resources management
infrastructure that automates the candidate pre-screening process. Interviewing and background
investigation of applicants can then be limited to the top candidates identified from the system.
Originality/value – To the best of the authors’ knowledge, this is the first e-recruitment system that
supports automated extraction of candidate personality traits using linguistic analysis and ranks
candidates with the AHP.
Keywords Recruitment, Human resource management, Selection, Social networking sites,
Data mining, Personality, E-recruitment, Personality mining, Recommendation systems,
Analytic hierarchy process
Paper type Research paper

1. Introduction
The rapid development of modern information and communication technologies in the
past few years and their introduction into people’s daily lives has greatly increased
the amount of information available at all levels of their social environment (Neuman,
2010). People have been steadily turning to the web to improve their knowledge and
skills (Ho et al., 2010) as well as for career development (Jansen et al., 2005). What is
more, job seekers are increasingly using Web 2.0 services like LinkedIn and job search
sites (Bizer and Rainer, 2005). On the other hand, many companies use online
knowledge management systems to hire employees, exploiting the advantages of the
World Wide Web. These are termed e-recruitment systems and automate the process of
publishing positions and receiving CVs. The online recruitment problem is two sided:
it can be seeker oriented or company oriented. In the first case the e-recruitment system
recommends to the candidate a list of job positions that best fit his profile. In the
second case recruiters publish the specifications of available job positions and the
candidates can apply.

Internet Research, Vol. 22 No. 5, 2012, pp. 551-568. © Emerald Group Publishing Limited, 1066-2243. DOI 10.1108/10662241211271545

In online recruitment systems, candidates typically upload their CVs in the form
of a document with a loose structure, which must be considered by an expert recruiter.
However, this creates a great asymmetry in the resources required from candidates
and recruiters and potentially increases the number of unqualified applicants. This
situation might be overwhelming to human resource (HR) agencies that need to allocate
HRs for manually assessing the candidate resumes and evaluating the applicants’
suitability for the positions at hand. Several e-recruitment systems have been proposed
with an objective to automate and speed-up the recruitment process, leading to a better
overall user experience and increasing efficiency. For example, SAT telecom reported
44 percent cost savings and a drop in the average time needed to fill a vacancy from 70
to 37 days (Pande, 2011) after deploying an e-recruitment system.
In this work we propose a novel approach for automated applicant ranking
and personality mining, which is implemented in the form of a company-oriented
e-recruitment system. Our objective is to limit interviewing and background
investigation of applicants solely to the top candidates identified by the system. This
can have a positive impact on the efficiency of the recruitment process and lead to
significant cost savings. To showcase the effectiveness of the proposed schemes, we
implemented and tested an integrated company-oriented e-recruitment system that
automates the candidate evaluation and pre-screening process. The system was
designed with the aim of being integrated with the companies’ HR management
infrastructure, assisting and not replacing the recruiters in their decision-making
process. Our approach differs from existing systems in that applicants’ evaluation is
based on a pre-defined set of objective criteria that are assessed on a numerical scale,
which are directly extracted from the applicant’s LinkedIn profile. The candidate’s
personality characteristics are automatically extracted from his social presence, as
shown in Faliagka et al. (2011a) and are also considered for the candidate’s evaluation.
The analytical hierarchy process (AHP) is employed for candidate ranking which
allows the selection criteria to be compared to one another in a rational and consistent
way, while their relative significance (weight) is controlled by the recruiters. The
proposed system was deployed in a real-world recruitment scenario and a set of
experimental results was derived and validated by expert recruiters. Our goal was to
answer the following research questions:
• How effective is the proposed system in discriminating the top candidates and
providing a rank that is consistent with the one provided by the expert
recruiters?
• How accurate is the proposed automated personality mining method, using the
human recruiter’s input as a reference?

The rest of this work is organized as follows. In Section 2 we present the related work
to this study, while in Section 3 we provide an overview of the proposed e-recruitment
system architecture. In Section 4 a personality mining scheme is proposed, to extract
the applicant’s personality traits from textual data available for the candidate on the web.
In Section 5 the ranking algorithm based on AHP is detailed. The proposed system was
implemented in the form of a web application whose design and implementation is
presented in Section 6, while in Section 7 we present a set of experimental results
that showcase the effectiveness of the system in a real-world recruitment scenario.
Finally, Section 8 discusses the key findings and the main limitations of the present
study and Section 9 concludes the paper.

2. Background
E-recruitment systems have seen an explosive expansion in the past few years
(De Meo et al., 2007), allowing HR agencies to target a very wide audience at a small
cost. Applicant tracking systems (ATS) are now the standard for managing the
recruiting process, by handling candidates’ job applications and companies’ job
openings electronically. These systems are usually provided in the form of web
applications, via Software as a Service model. Job openings from companies’ ATS are
often aggregated by internet “job board” services like Indeed and CareerJet that track
millions of job openings and allow job seekers to perform simple keyword searches
for positions in their preferred industry and location. Applicants typically apply for
positions by uploading their resume, which is manually evaluated by expert recruiters.
It must be noted though that only a small fraction of overall applicants receives an offer
or a call for a job interview. Ramar and Sivaram (2010) performed a study at an
unnamed industry, which concluded that on average only one out of 120 applicants
got selected in a job opening, while the ratio of recruited candidates that made it to the
interview phase was approximately one out of 20. Thus, it follows that a degree of
automation in the recruitment process to determine the candidates that clearly do not
fit the position’s specifications can lead to an increased efficiency and high cost
savings.
Lead players in the area of e-recruitment systems such as JobVite and Monster
have added a degree of automation in the screening process of the applicants’ profiles,
which is integrated with the traditional ATS functions. This automation ranges from
easy to implement – and error prone – keyword queries (i.e. fetch candidates with
“.Net” in their resume) to more sophisticated semantic matching techniques, an
approach first proposed in Mochol et al. (2007). The latter associate semantically
equivalent concepts from the user’s CV with the job descriptions, using a dictionary of
synonyms. Several schemes have been proposed in the literature for the automation
of applicant profile screening, that combine techniques from classical IR and
recommender systems. The E-Gen system (Kessler et al., 2007) performs analysis and
categorization of unstructured job offers (i.e. in the form of unstructured text
documents) as well as analysis and relevance ranking of candidates. The authors
present a strategy that uses automated filtering and lemmatization of CVs which are
represented as vectors, while applicant classification is based on support vector
machines. The CommOn framework (Radevski and Trichet, 2006) applies semantic web
technologies in the field of HR management. In this framework the candidate’s
personality traits, determined through an online questionnaire filled-in by the
candidate, are considered for recruitment. However, the process of applying to a job
position is time-consuming, thus compromising the user friendliness of the system.
The aforementioned techniques, although useful, suffer from the discrepancies
associated with inconsistent CV formats, structure and contextual information. What
is more they are unable to evaluate some secondary characteristics associated
with CVs, such as style and coherence, which are very important in CV evaluation. The
authors envision e-recruitment systems integrated with the HR department’s processes
with an objective to support human recruiters in their decision-making process.
E-recruitment systems’ role should be to increase the efficiency of the recruitment
process (e.g. by filtering clearly unqualified applicants) and not replace HR
professionals in the applicant selection process. The proposed system extracts
objective criteria from applicants’ LinkedIn profiles, which are evaluated against
the job positions’ hiring criteria to estimate the relevance of each candidate. What is
more, candidates’ social presence is mined for features reflecting their personality.
The authors envision traditional job-specific resumes being displaced by “live” profiles
with the full candidate employment history in future e-recruitment systems. This will
allow systems to easily evaluate the candidate’s profile for a broad range of job
positions and remove the complexity of generating (and subsequently parsing) free-
text resumes.

3. System overview
The proposed e-recruitment system implements automated candidate ranking based
on a set of credible criteria, which will be easy for companies to integrate with their
existing HR management infrastructure. The system architecture is shown in Figure 1.
We focus the present study on the exploitation of four complementary criteria, namely:
education (in years of formal academic training), work experience (in months), loyalty
(average number of months spent per job) and personality. When a position opens, the
recruiter inputs the weight of each selection criterion at the job position module and
posts the position requirements. The recruitment process starts with the candidates
applying for a job position at the job application module. The candidate is given the
option to log into our system using his LinkedIn account credentials, which allows
the system to automatically extract all relevant criteria required for candidate
pre-screening, directly from the user’s LinkedIn profile. On the other hand, for
assessing the candidate’s personality, we exploit textual data available for the
candidate on Web 2.0 sites. The applicant is asked to enter his blog URL and if one is
provided, the personality mining module applies linguistic analysis to the blog posts to
derive features reflecting the author’s personality traits.

[Figure 1. Proposed system architecture. The recruiter fills out the position requirements; candidates fill out the online form, log in with LinkedIn and provide their blog; the system calculates personality traits and applies the ranking algorithm (AHP) to produce a list of ranked candidates.]

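To make the three objective criteria concrete, the following is a minimal sketch of how they could be derived once a profile has been parsed. The `Job` and `Profile` types are hypothetical and introduced only for illustration; they do not correspond to the LinkedIn API or to the system’s actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """One position from an applicant's employment history (hypothetical shape)."""
    title: str
    months: int  # duration of the position in months


@dataclass
class Profile:
    """Hypothetical, already-parsed view of an applicant's profile."""
    education_years: float  # years of formal academic training
    jobs: List[Job]


def work_experience(profile: Profile) -> int:
    """Total work experience in months."""
    return sum(job.months for job in profile.jobs)


def loyalty(profile: Profile) -> float:
    """Average number of months spent per job (0.0 if no jobs are listed)."""
    if not profile.jobs:
        return 0.0
    return work_experience(profile) / len(profile.jobs)
```

For instance, an applicant with two jobs lasting 12 and 36 months has 48 months of work experience and a loyalty score of 24 months per job.
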
The applicant’s qualifications as well as his scores at the selection criteria are
stored in the system’s database. It must be noted here that during the job application
process, the applicant is not required to manually enter information or participate in
time-consuming personality tests as in Radevski and Trichet (2006). Thus, the user
friendliness and the practicality of the system are not compromised. At the final step
of the recruitment process, the applicant ranking module outputs an overall rank of
applicants. Each candidate’s rank acts as a score of how well his profile fits the
recruiter’s specifications. Ranking is based on the AHP, which compares the
applicants’ scores at the relevant selection criteria. The recruiter is then able to review
and re-rank the candidates changing the weights of the selection criteria. The
recruitment process ends with the top candidates being called to participate in the
interview process.

4. Personality mining
Applicant personality traits are crucial for applicant selection in many job positions,
but are overlooked in existing e-recruitment systems. Typically, candidates’
personality is assessed during the interview stage, which is reserved for the
candidates that passed the pre-screening phase. However, gathering some preliminary
data on the candidate’s personality would be valuable in the pre-screening phase,
especially in positions where the personality is regarded as critical. Currently this task is
undertaken by the recruiters, who are widely known to perform background
checks on prospective employees, taking advantage of their web presence. However,
it would be more effective to automate this task using web mining techniques
(Faliagka et al., 2011b) and text analysis programs (Faliagka et al., 2011a).
In the Web 2.0 era there are large amounts of textual data for millions of web users,
that have been shown to be reliable predictors of user’s personality. As mentioned, in
our system the candidate is asked to provide a link to his blog, since it has been
shown that blogs contain a range of linguistic characteristics that reflect aspects of a
blogger’s personality (Oberlander and Nowson, 2006). Previous works have shown that
applying linguistic analysis to blogs can reveal the author’s personality traits
(Oberlander and Nowson, 2006; Gill et al., 2009), as well as his mood and emotions
(Mishne, 2005). These studies are based on text analysis programs such as Linguistic
Inquiry and Word Count (LIWC) to extract linguistic features which act as markers
of the author’s personality. The LIWC tool (Pennebaker et al., 2001) was developed by
analyzing writing samples of several hundreds of university students, to correlate
word use to personality traits. It uses a dictionary of word stems classified in certain
psycholinguistic semantic and syntactic word categories. In Table I we can see an
example of such word categories. LIWC analyzes written text samples by counting
the relative frequencies of words that fall in each word category. Pennebaker and
King have found significant correlations between these frequency counts and the
author’s personality traits (Pennebaker and King, 1999) as measured by the Big-Five
personality dimensions.
Among the Big-Five personality dimensions, extraversion has received the most
research attention, as it has been shown that it is adequately reflected through
language use in written speech and can be discriminated through text
analysis (Mairesse et al., 2007). Extraversion is a crucial personality characteristic
for candidate selection, especially in positions that interact with customers, while
social skills are important for managing teams. What’s more, it has been shown that
charismatic speakers and people who dominate meetings are usually extroverts
(Rienks and Heylen, 2006). Thus, in this work from the Big-Five personality
dimensions we focus on extraversion due to its importance in candidate selection.
Linguistic markers for extraversion are the use of many positive emotion words and
social process words, but fewer negative emotion words (Pennebaker et al., 2001).
In this work, the extraversion score is estimated directly from LIWC scores (or
frequencies), by summing the emotional positivity score and the social orientation
score, also obtained from LIWC frequencies:
• Emotional positivity score was calculated as the difference between LIWC scores
for positive emotion words and negative emotion words. Higher scores indicate
higher emotional positivity.
• Social orientation score was obtained from LIWC as the frequency of social
words (such as friend, buddy, coworker) and personal pronouns (the first person
pronoun is excluded). High scores indicate a high degree of references to other
people, and thus a high degree of sociability.
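The two components above can be sketched in a few lines. This is an illustration under stated assumptions, not the system’s code: the category keys (`posemo`, `negemo`, `social`, `other_pronouns`) are hypothetical names for the relative frequencies (in percent) of each word category, and the social orientation score is assumed to be the plain sum of the social-word and non-first-person pronoun frequencies.

```python
from typing import Mapping


def emotional_positivity(freqs: Mapping[str, float]) -> float:
    """Positive emotion frequency minus negative emotion frequency."""
    return freqs["posemo"] - freqs["negemo"]


def social_orientation(freqs: Mapping[str, float]) -> float:
    """Social words plus personal pronouns other than the first person.

    Assumption: the two frequencies are simply added together.
    """
    return freqs["social"] + freqs["other_pronouns"]


def extraversion_score(freqs: Mapping[str, float]) -> float:
    """Extraversion estimate: emotional positivity plus social orientation."""
    return emotional_positivity(freqs) + social_orientation(freqs)


# Hypothetical frequencies (percent of all words) for one blogger
example = {"posemo": 4.0, "negemo": 1.5, "social": 6.0, "other_pronouns": 2.5}
score = extraversion_score(example)  # (4.0 - 1.5) + (6.0 + 2.5) = 11.0
```

Note that the score can be zero or negative for a blogger whose negative emotion words outweigh everything else, which matters if the score is later used in ratio-based comparisons.
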
It must be noted here that the extraversion score does not have a physical basis (i.e. we
cannot say that a person is twice as extroverted because his extraversion score is twice
as high) but rather quantifies the relative differences between individuals’
degree of extraversion. For example, Argamon et al. (2005) label
bloggers in the top third of the extraversion distribution as extroverts and the bottom
third as introverts, while the rest of the sample is considered inconclusive. In this
work we model extraversion via scalar values, rather than treating it as a classification
problem (where each individual is marked as either introvert or extrovert). For
reference, in Figure 2 we plot the distribution of extraversion scores for 100 job
applicants with personal blogs, which were part of a large-scale recruitment scenario
detailed in Section 7. Although one would expect that focusing our method on
applicants with blogs could introduce bias (e.g. bloggers could be regarded as more
extraverted than the average), the distribution shown in Figure 2 appears to be normal.
This result is in accordance with previous research (Gill et al., 2009).

Table I. Example of LIWC word categories
Feature                   Example
Anger words               Hate, kill, pissed
Metaphysical issues       God, heaven, coffin
Physical state/function   Ache, breast, sleep
Inclusive words           With, and, include
Social processes          Talk, us, friend
Family members            Mom, brother, cousin
Past tense verbs          Walked, were, had
References to friends     Pal, buddy, coworker

[Figure 2. Distribution of the extraversion scores. Horizontal axis: extraversion score (0 to 20); vertical axis: frequency (%), 0 to 45.]

5. Applicant ranking
Candidate ranking is the process of assigning scores to applicants, which reflect how
well their profiles fit the recruiter’s specifications. Ranks are derived from applicants’
scores in individual criteria, i.e. education, work experience, loyalty and extraversion.
The overall applicant rank is obtained from individual scores using the AHP (Saaty,
1990), which allows selection criteria to be compared to one another in a rational and
consistent way. AHP is a decision-making technique for managing problems that
involve the consideration of multiple criteria simultaneously. Each criterion has a
different weight in the candidate rank, according to the requirements of the job
position. AHP uses a multi-level hierarchical structure of objectives, criteria and
alternatives, and provides a quantitative computational method for generating overall
ranks based on a pairwise judgment of the criteria.
Assuming a hierarchical structure (Figure 3), nodes represent criteria and
alternatives to be prioritized, while arcs reflect relationships between nodes in different
levels. Each relationship (arc) represents a relative weight (or importance) of a node
at Level L relating to a node at Level L − 1, where L = 2, 3, ..., n. In the three-level AHP
considered in this work, Levels 1-3 correspond to an overall goal, a group of criteria
and a group of alternatives (i.e. the candidates to be evaluated), respectively. A decision
maker can choose among the alternatives based on the relative importance of each
one of them.

[Figure 3. The model of AHP for our candidate selection problem. Level 1, Goal: candidate selection. Level 2, Criteria: extraversion, education, work experience, loyalty. Level 3, Alternatives: Candidate 1, ..., Candidate n.]

The first step in the AHP process is to make pairwise comparisons of
the selection criteria. Specifically, the recruiter has to compare the importance of the
abovementioned criteria, entering weights. These weights rank the relative
significance of each pair of criteria. For example, the recruiter has to decide how
much more important work experience is than education. In terms of the rating scale
used for quantifying pairwise comparisons, several approaches are available, although
Saaty’s (1990) linear scale (Table II) was the first proposed and has been used
extensively. The result of the pairwise comparison process is reflected in matrix A,
with w_i the weight of the ith criterion entered by the recruiter:
\[
A = \begin{pmatrix}
w_1/w_1 & w_1/w_2 & \cdots & w_1/w_n \\
w_2/w_1 & w_2/w_2 & \cdots & w_2/w_n \\
\vdots  & \vdots  & \ddots & \vdots  \\
w_n/w_1 & w_n/w_2 & \cdots & w_n/w_n
\end{pmatrix} \qquad (1)
\]

Next, the normalized principal eigenvector of the matrix A is computed, which serves
as the weight vector ω that quantifies the relative significance of selection criteria.
The second step is the elicitation of candidates’ pairwise comparison judgments
with respect to each criterion in Level 2. Given n candidates to be pairwise compared,
four n × n comparative matrices are formed, one per selection criterion at Level 2.
Comparative matrix B_k (k = 1, 2, ..., 4) corresponds to the kth criterion:

\[
B_k = \left(b^{k}_{ij}\right)_{n \times n} > 0, \qquad b^{k}_{ij} = \frac{C_i^{(k)}}{C_j^{(k)}} \qquad (2)
\]

Parameter C_i^{(k)} corresponds to the value of the kth criterion for the ith candidate. For
each comparative matrix B_k, the normalized eigenvector X_k is computed, which is
termed the criterion’s priority vector. Finally, the global priority vector R, with the
candidates’ final scores, is computed as the linear combination of the priority vectors X_k,
weighted by the components ω_k of the weight vector ω:

\[
R = \sum_{k=1}^{4} X_k \, \omega_k \qquad (3)
\]
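The ranking steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the system’s implementation (which was written in .Net): the function names are hypothetical, the principal eigenvector is approximated by power iteration, and all criterion values are assumed to be strictly positive so that the ratios are defined.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def ratio_matrix(values: Sequence[float]) -> Matrix:
    """Pairwise comparison matrix with entries values[i] / values[j].

    Assumes every value is strictly positive.
    """
    return [[vi / vj for vj in values] for vi in values]


def principal_eigenvector(m: Matrix, iterations: int = 100) -> List[float]:
    """Approximate the principal eigenvector by power iteration.

    The result is normalized so that its components sum to 1.
    """
    n = len(m)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]
    return v


def rank_candidates(criteria_weights: Sequence[float],
                    scores: Sequence[Sequence[float]]) -> List[float]:
    """Global priority vector R.

    criteria_weights[k] is the recruiter's weight for criterion k;
    scores[k][i] is the value of criterion k for candidate i.
    """
    omega = principal_eigenvector(ratio_matrix(criteria_weights))
    n_candidates = len(scores[0])
    r = [0.0] * n_candidates
    for k, column in enumerate(scores):
        x_k = principal_eigenvector(ratio_matrix(column))  # priority vector
        for i in range(n_candidates):
            r[i] += omega[k] * x_k[i]
    return r
```

Because every matrix built from ratios is perfectly consistent, its normalized principal eigenvector is simply the original values divided by their sum; the eigenvector computation only becomes essential when the recruiter enters pairwise judgments directly on a scale such as the one in Table II.
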

Table II. The AHP fundamental scale
Intensity of importance       Definition                            Explanation
1                             Equal importance                      Two activities contribute equally to the objective
3                             Weak importance of one over another   Experience and judgment slightly favor one activity over another
5                             Essential or strong importance        Experience and judgment strongly favor one activity over another
7                             Demonstrated importance               An activity is strongly favored and its dominance is demonstrated in practice
9                             Absolute importance                   The evidence favoring one activity is of the highest possible order
2, 4, 6, 8                    Intermediate values                   When compromise is needed
Reciprocals of above nonzero  If activity i has one of the above nonzero numbers assigned to it when compared with activity j, then j has the reciprocal value when compared with i

Since the pairwise comparison judgments are subjective, there is a concern regarding
consistency. Thus, the consistency index (CI) was suggested to quantify the degree of
inconsistency the AHP model can tolerate such that the judgment is still useful. Saaty
(1990) suggests the following CI, random index (RI) and consistency ratio (CR)
definitions:

\[
CI = \frac{\lambda_{\max} - n}{n - 1}, \qquad CR = \frac{CI}{RI} \qquad (4)
\]

In (4), n is the order of matrix A (i.e. the number of selection criteria), λmax is the largest
eigenvalue of matrix A and the value of RI is obtained from Table III, based on the number
of selection criteria. If the CR value is below 0.1, the judgments are regarded as reasonably
consistent and therefore acceptable. If the CR is above 0.10, the judgments should be revised.
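The consistency check in (4) can be illustrated with a short sketch. The helper names are hypothetical, the largest eigenvalue is approximated by power iteration, and the RI values are those listed in Table III; matrices of order 1 or 2 are treated as always consistent, since their RI is zero.

```python
from typing import List

Matrix = List[List[float]]

# Random index values from Table III, keyed by the order of the matrix
RANDOM_INDEX = {1: 0.00, 2: 0.00, 3: 0.58, 4: 0.90,
                5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41}


def lambda_max(m: Matrix, iterations: int = 100) -> float:
    """Approximate the largest eigenvalue of a positive matrix."""
    n = len(m)
    v = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)  # v sums to 1, so sum(M v) estimates the eigenvalue
        v = [x / lam for x in w]
    return lam


def consistency_ratio(m: Matrix) -> float:
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(m)
    if n < 3:
        return 0.0  # RI is zero; such matrices are always consistent
    ci = (lambda_max(m) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# A matrix built from ratios w_i / w_j is perfectly consistent (CR close to 0)
weights = [0.31, 0.24, 0.31, 0.14]
consistent = [[wi / wj for wj in weights] for wi in weights]

# Circular judgments (1 over 2, 2 over 3, 3 over 1) are inconsistent
circular = [[1.0, 3.0, 1.0 / 3.0],
            [1.0 / 3.0, 1.0, 3.0],
            [3.0, 1.0 / 3.0, 1.0]]
```

For a matrix of the form w_i / w_j, the largest eigenvalue equals n, so CI is zero; the circular example has equal row sums of 13/3, which gives a CR well above the 0.1 threshold and would prompt the recruiter to revise the judgments.
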

6. Implementation
The proposed e-recruitment system was fully implemented as a web application, in the
Microsoft .Net development environment. In this section we will present the main
application screens and discuss our design decisions and system implementation. The
system is divided into the recruiter’s side and the user’s side.

6.1 Job application process (user’s side)


Job applicants are given the option to authenticate using their LinkedIn account
credentials (see Figure 4). This allows the system to automatically extract the selection
criteria required for pre-screening from candidates’ LinkedIn profile, so the user
experience is streamlined. Users are authorized with LinkedIn API, which uses OAuth
as its authentication protocol. After successful user authentication, an OAuth token is
returned to our system which allows retrieving information from the candidate’s
private LinkedIn profile. It must be noted here that the system does not have direct
access to the candidate’s account credentials, which could be regarded as a security
risk. Users without a LinkedIn profile are given the option to enter the required
information manually.

Table III. Random index values according to selection criteria number
Selection criteria   1     2     3     4     5     6     7     8
RI                   0.00  0.00  0.58  0.90  1.12  1.24  1.32  1.41

[Figure 4. The candidate uses his LinkedIn account and fills in his blog feed]

As part of the job application process, the candidate is asked to fill in the feed URI of
his personal blog. This allows our system to syndicate the blog content and calculate
the extraversion score with the personality mining technique presented in Section 4.
Blog posts are input to the TreeTagger tool (Schmid, 1995) for lexical analysis and
lemmatization. Then, using the LIWC dictionary which is distributed as part of the
LIWC tool, our system classifies the canonical form of words output from TreeTagger
in one of the word categories of interest (i.e. positive emotion, negative emotion and
social words). Finally, after processing the 50 most recent blog posts, the system
estimates the relative frequencies of the aforementioned word categories and calculates
the applicant’s extraversion score. The candidate can then apply for one or more of
the available job positions. The system stores the job application(s) along with the
candidate’s scores in the selection criteria in the database and the candidate is notified
via e-mail to participate in an interview if he passes the pre-screening phase.
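The frequency-counting step can be sketched as below. This is a simplified illustration, not the system’s pipeline: the tiny `TOY_DICTIONARY` is hypothetical and stands in for the much larger LIWC dictionary, a plain regular-expression tokenizer replaces TreeTagger (so no lemmatization is performed), and the input list is assumed to be ordered from the most recent post to the oldest.

```python
import re
from collections import Counter
from typing import Dict, List, Mapping, Sequence

# Hypothetical miniature dictionary; a trailing "*" marks a word stem
TOY_DICTIONARY: Dict[str, List[str]] = {
    "posemo": ["happy", "love*", "great"],
    "negemo": ["hate*", "sad", "angry"],
    "social": ["friend*", "talk*", "buddy"],
}


def matches(word: str, pattern: str) -> bool:
    """True if the word equals the pattern, or starts with its stem."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def category_frequencies(posts: Sequence[str],
                         dictionary: Mapping[str, List[str]] = TOY_DICTIONARY,
                         max_posts: int = 50) -> Dict[str, float]:
    """Relative frequency (percent of all words) of each word category."""
    counts: Counter = Counter()
    total = 0
    for post in posts[:max_posts]:  # only the most recent posts
        for word in re.findall(r"[a-z']+", post.lower()):
            total += 1
            for category, patterns in dictionary.items():
                if any(matches(word, p) for p in patterns):
                    counts[category] += 1
    if total == 0:
        return {category: 0.0 for category in dictionary}
    return {category: 100.0 * counts[category] / total
            for category in dictionary}
```

The resulting frequencies are the inputs from which the emotional positivity and social orientation scores of Section 4 are computed.
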

6.2 Recruitment process (recruiter’s side)


After authenticating with their account credentials, recruiters have access to the
recruitment module, which gives them rights to post new job positions and evaluate
job applicants. In Figure 5 the new position panel can be seen, where the recruiter
must fill in the position title, the position description and the weights to be used for
applicant ranking. Specifically, the recruiter has to compare the importance of the
abovementioned criteria, entering weights that rank the relative significance of each
pair of criteria as discussed in Section 5. Finally, in the “rank candidates” menu,
shown in Figure 5, the recruiter is presented with a list of all available job positions
and the candidates that have applied for each one of them. Upon the recruiter’s request
the system estimates each applicant’s rank and sorts them accordingly. The recruiter
can change the weights of the selection criteria and re-rank the candidates, until he is
satisfied with the results.

[Figure 5. New job position panel]

7. Experimental evaluation
The proposed system was deployed in a real-world recruitment scenario to investigate
its effectiveness in ranking job applicants. The purpose of our investigation was
twofold: first, to test how effective the proposed system is in providing an accurate
rank of top candidates for a certain position and second, to estimate the accuracy of the
proposed personality mining method.

7.1 Data collection
In our recruitment scenario we compiled a corpus of 100 applicants with a LinkedIn
account and a personal blog, since these are key requirements of the proposed system.
The applicants were selected randomly via Google blog search API with the sole
requirement of having a technical background, as indicated by the blog metadata,
as well as a LinkedIn profile. Specifically, we used the Google profile search API to
search for bloggers in the “technology” industry. The search results were manually
inspected and only bloggers with a LinkedIn profile associated with their blogs
were taken into account. What is more, blogs with no autobiographical content
(e.g. technical blogs) were excluded from our study, as they carried no information
regarding the author’s personality. Our corpus of job applicants was formed by
choosing the first 100 blogs returned from the profile search API that fulfilled
our preconditions. The corpus selection process is not expected to introduce bias to our
experimental results, as it is independent of the candidate selection criteria (i.e. all
bloggers in the technology industry have the same chance of being selected as part of
the corpus regardless of work experience, personality or loyalty).
We also collected three representative technical positions announced by an
unnamed IT company with different requirements, as shown in Table IV. The use of
different requirements per position is expected to test the ability of our system to match
candidates’ profiles with the appropriate job position. It can be seen that the sales
engineering position favors a high degree of extraversion, while experience is the most
important feature for senior programmers. Junior programmers are mainly judged by
loyalty (a company would not invest in training an individual prone to changing
positions frequently) as well as education. It must be noted here that though the focus
of our pilot scenario was on technical positions, the procedures used were not specific
to these positions, thus our analysis is also applicable to other industries.

7.2 Experimental results


In our experimental setup we assume that each applicant in the corpus has applied for
all available job positions. For each job position, applicants were ranked according to
their suitability for the job position both by the system (automated ranking) and by an
expert recruiter, based on a set of pre-defined selection criteria. Due to the subjectivity
of the extraversion criterion, a set of experiments was performed to assess the accuracy
of the personality mining process (Section 7.2.1). The extraversion score of each
candidate was compared against the score assigned by an expert recruiter, who had
access to the same blog posts as the system. The difference between these scores
(grading error) was quantified with the cumulative distribution function (CDF) as well
as the correlation coefficient (Spearman’s r) of these scores.

Table IV. Offered positions and corresponding weights per selection criterion

                    Extraversion  Education  Experience  Loyalty
Sales engineer      0.31          0.24       0.31        0.14
Junior programmer   0.12          0.32       0.18        0.38
Senior programmer   0.11          0.22       0.43        0.24

For each job position the system calculated the pairwise comparison matrix A
based on (1). Our system ranks the candidates with the AHP as detailed in Section 5,
calculating the priority vectors (one per criterion) and the candidates’ final scores. An
example of these calculations for the sales engineer position is shown in Table V, for
five out of 100 applicants due to space limitations. The applicants were also evaluated
by a human recruiter in collaboration with the university’s HR office and a ranking
of the top candidates was provided for each offered position. In order to achieve
consistent results, the weights of the selection criteria were selected by the same expert
recruiters that provided the candidate rankings. The recruiter’s rankings for each
position were compared to the rankings output by the system, to assess whether the
system can provide a rank consistent with the one provided by the recruiter (see
Section 7.2.2).
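The final-score calculation can be sketched as follows, assuming that a candidate's final score is the sum of his local priorities weighted by the criterion weights (the usual AHP synthesis step). The weights are the sales engineer row of Table IV and the local priorities are three rows of Table V as printed; the names `global_priority` and `rank_candidates` are hypothetical.

```python
# Criterion order: extraversion, education, work experience, loyalty.
SALES_ENGINEER_WEIGHTS = (0.31, 0.24, 0.31, 0.14)  # Table IV

# Local priorities per criterion as printed in Table V (rounded values).
LOCAL_PRIORITIES = {
    "Candidate 1": (0.0097, 0.0073, 0.00058, 0.0016),
    "Candidate 3": (0.0092, 0.0073, 0.0002, 0.0005),
    "Candidate 4": (0.018, 0.0098, 0.016, 0.0051),
}


def global_priority(weights, local_priorities):
    """Weighted sum of a candidate's local priorities."""
    return sum(w * p for w, p in zip(weights, local_priorities))


def rank_candidates(weights, candidates):
    """Return candidate names sorted by descending global priority."""
    scores = {name: global_priority(weights, p) for name, p in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


for name in rank_candidates(SALES_ENGINEER_WEIGHTS, LOCAL_PRIORITIES):
    score = global_priority(SALES_ENGINEER_WEIGHTS, LOCAL_PRIORITIES[name])
    print(f"{name}: {score:.4f}")
```

For these three rows the weighted sum agrees with the final scores listed in Table V to four decimal places, which is consistent with the assumption made here.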
7.2.1 Personality mining evaluation results. In this section we evaluate the
effectiveness of the personality mining approach, as presented in Section 4, for
accurately grading the applicants’ personality. As mentioned in Section 4, our system
exploits textual data extracted from the candidate’s blog to extract his extraversion
score. The extraversion score of each of the 100 candidates is compared against
the corresponding score assigned by an expert recruiter, who had access to the same
blog posts as the system. To evaluate the accuracy of the automated personality
grading, using the human recruiter’s scores as a reference, we calculated the CDF
of the absolute grading error, shown in Figure 6. It can be seen that for 80 percent of
the applicants the error is <1.5 grade in a grading scale of 0-5, while the correlation
coefficient (Spearman’s r) between the recruiter’s and the system’s scores was
measured 0.63. Thus, it follows that the personality mining process implemented in our
system can be trusted for a pre-assessment of the applicant’s personality.
It must be noted here that the system’s and the recruiter’s extraversion scores were
initially expressed in a different rating scale. Thus a rescaling of both scores was
performed in the grading scale 0-5 before measuring the grading errors.

Table V. Local and global priorities

              Extraversion  Education  Work experience  Loyalty  Final score
Candidate 1   0.0097        0.0073     0.00058          0.0016   0.0052
Candidate 2   0.01          0.0098     0.0016           0.0015   0.0064
Candidate 3   0.0092        0.0073     0.0002           0.0005   0.0047
Candidate 4   0.018         0.0098     0.016            0.0051   0.0136
Candidate 5   0.012         0.01       0.035            0.049    0.0240
...           ...           ...        ...              ...      ...

Figure 6. CDF of the absolute grading error, with recruiter’s score as reference (x-axis: absolute grading error, 0-5; y-axis: fraction of candidates, 0-1)
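The evaluation described above can be sketched in a few lines. This is an illustration under stated assumptions: the rescaling is taken to be a min-max mapping onto 0-5 (the exact rescaling is not specified), the scores are made up, and all function names are hypothetical.

```python
def rescale(scores, low=0.0, high=5.0):
    """Min-max rescale a list of scores to the interval [low, high]."""
    smallest, largest = min(scores), max(scores)
    span = largest - smallest
    return [low + (high - low) * (s - smallest) / span for s in scores]


def fraction_within(errors, threshold):
    """Share of errors strictly below `threshold` (one point read off the
    empirical CDF of the absolute grading error)."""
    return sum(1 for e in errors if e < threshold) / len(errors)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up scores on two different scales, for illustration only.
system_scores = [12, 40, 55, 73, 90]
recruiter_scores = [1, 3, 2, 4, 5]

system_scaled = rescale(system_scores)
recruiter_scaled = rescale(recruiter_scores)
errors = [abs(s - r) for s, r in zip(system_scaled, recruiter_scaled)]

print("fraction with error < 1.5:", fraction_within(errors, 1.5))
print("Spearman correlation:", spearman(system_scores, recruiter_scores))
```

Rank correlation is unaffected by the rescaling, so it can be computed on the raw scores; only the grading error depends on both scores sharing the 0-5 scale.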
7.2.2 Ranking evaluation results. As mentioned earlier, we envision our system
being used in collaboration with the companies’ HR department for compiling the
list of top-ranked employees, which will be considered for employment. Our goal in this
e-recruitment scenario is to compare the rankings output from the system with the
ones provided by human recruiters for three different offered positions. The system’s
performance is evaluated based on how effective it is in discriminating the top
candidates and providing a rank that is consistent with the one provided by the human
recruiters. Three metrics were used for comparing rankings; the simplest one is the
overlap size of the top-k list selected by the system and the human recruiter for each
job position, where k = 25 corresponds to 25 percent of overall applicants. In practice,
k will correspond to the number of applicants that pass to the next stage of the
recruitment process. The second metric is the correlation coefficient (Spearman’s r) of
the top-k candidates per job position. It is calculated on the common list of top-k
applicants of both rankings. The third metric is the mean absolute difference (ranking
error) of top-k candidates’ ranks. The performance metrics for all three positions
can be seen in Table VI. To provide a more intuitive representation of ranking
correlations, we plotted the pairs of ranks (system rank, recruiter rank) for the common
top-25 applicants per job position on a 2D plane (see Figure 7). Pairs that lie on (or are
close to) the diagonal indicate that the system and the recruiter agree on a rank, while
points above and below the diagonal indicate candidates where the recruiter and the
system assigned a different rank.
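The three metrics can be sketched as follows. The sketch assumes that the ranking error uses each common candidate's position in the full rankings, and that the correlation is computed after re-ranking the common candidates from 1 to n; both are interpretations of the description above, and the function names and example rankings are hypothetical.

```python
def relative_ranks(values):
    """Rank distinct values from 1 to n in ascending order."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def compare_rankings(system_ranking, recruiter_ranking, k):
    """Return (overlap size, Spearman correlation, mean absolute rank
    difference) for candidates present in both top-k lists.
    Requires at least two common candidates."""
    recruiter_top = set(recruiter_ranking[:k])
    common = [c for c in system_ranking[:k] if c in recruiter_top]
    n = len(common)

    # Position of each common candidate in the full rankings (1 = best).
    system_pos = [system_ranking.index(c) + 1 for c in common]
    recruiter_pos = [recruiter_ranking.index(c) + 1 for c in common]
    ranking_error = sum(abs(s - r) for s, r in zip(system_pos, recruiter_pos)) / n

    # Spearman correlation on the common list, using ranks 1..n without ties.
    s_rel = relative_ranks(system_pos)
    r_rel = relative_ranks(recruiter_pos)
    d_squared = sum((s - r) ** 2 for s, r in zip(s_rel, r_rel))
    correlation = 1 - 6 * d_squared / (n * (n * n - 1))
    return n, correlation, ranking_error


# Made-up rankings of six applicants, best first.
system = ["a", "b", "c", "d", "e", "f"]
recruiter = ["b", "a", "c", "f", "d", "e"]

overlap, correlation, error = compare_rankings(system, recruiter, k=4)
print(overlap, correlation, error)
```

In this toy example three of the four top candidates are shared, so the overlap is 3 out of 4; dividing the overlap by k gives the percentage form used in Table VI.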

Table VI. Performance evaluation metrics per job position

                    Top-k overlap  Correlation  Ranking error
Sales engineer      19 (76%)       0.72         4.15
Junior programmer   21 (84%)       0.75         3.9
Senior programmer   9 (36%)        0.49         7

Figure 7. System and recruiter rank for the common top-k applicants per position (x-axis: system’s rank; y-axis: recruiter’s rank; series: first, second and third position, with the diagonal as reference)

It can be seen that the consistency of the ranking provided by the system is highly
dependent on the nature of the offered position. For the sales position, the applicant
score is dominated by the highly subjective extraversion score, thus increasing the
uncertainty of the overall rank. Nevertheless, the system was able to output a top-25
list that overlapped by 76 percent with the one provided by the recruiter and had a rank
correlation index of 0.72. On the other hand, the selection of junior programmer
candidates is based on more objective criteria such as loyalty and education, thus
resulting in a very high overlapping degree of 84 percent and a rank correlation index
of 0.75. Finally, regarding the senior programmer’s position, the system exhibits poor
performance, with a 36 percent overlapping degree and a 0.49 correlation index. This
can be attributed to the complexity of evaluating an applicant’s CV for a senior position,
which typically requires domain experience and specific qualifications that cannot
be easily captured by the pre-defined selection criteria employed in our system. In this
scenario the recruiter favored applicants with experience in web technologies,
which was the position’s domain, while the recruitment system considered overall
experience, which explains the disparity in the rankings. Nevertheless, the system was
able to filter out clearly unqualified applicants that should not have applied for the
position, thus it would still be useful as a “first line of defense.”

8. Discussion
In this paper we have presented a novel approach for ranking job applicants in online
recruitment systems. The proposed system relies on objective criteria extracted from
the applicant’s LinkedIn profile and subjective criteria extracted from their social
presence, to rank candidates and infer their personality traits. A prototype system
was developed based on the proposed scheme, which was deployed in a large-scale
pilot scenario to investigate its effectiveness in the real world. The purpose of our
investigation was twofold: first, to test how effective the proposed system is in
providing an accurate rank of top candidates for a certain position, and second to
estimate the accuracy of the proposed personality mining method.
Regarding the social mining method proposed in this work, our system was able to
successfully assess candidates’ extraversion, with a grading error of <1.5 grades in a
grading scale of 0-5 for 80 percent of candidates and a correlation coefficient
(Spearman’s r) between the recruiter’s and the system’s scores of 0.63. This degree
of correlation is significant, as trying to replicate the actual scores assigned by the
recruiter to the users’ extraversion is a hard problem. Regarding the applicant ranking
problem, it was found that the system was able to successfully discriminate the top
candidates for a given job position, and provide a candidate ranking with an average
error of ±4 positions. However, the ranking accuracy was found to be highly dependent
on the nature of the offered positions, so these numbers are expected to vary depending
on the characteristics of individual job positions. Specifically, the system’s accuracy
was found to deteriorate for positions with complex requirements (e.g. domain-specific
work experience) that cannot be captured by pre-defined selection criteria. In any case
the proposed system was able to filter out clearly unqualified applicants.

8.1 Limitations and ethical issues


As mentioned earlier, the system’s accuracy deteriorates for senior positions with
complex requirements. This is partly due to the fact that the prototype system cannot
currently distinguish domain-specific experience, but rather counts overall months
of work experience. Another limitation comes from the inherent complexity of
automatically evaluating the applicants’ personality. It was shown that the personality
mining system performed consistently when given high quality input (i.e. mostly
autobiographical blogs) but these may not always be available. In the future we plan to
incorporate more sources of textual data that reflect the author’s personality, such as
the applicant’s Facebook account.
The proposed system never resorts to web crawling and mining techniques to
associate users with their social web presence, as, aside from the apparent ethical
implications, this would increase the possibility of decisions based on false or
incomplete information. The candidate is asked to voluntarily provide a link to his
personal blog and is advised against submitting a technical blog, which typically
contains very few linguistic features that can act as personality markers. However, we
acknowledge that there is still a possibility of judging candidates based on incomplete
or outdated information (e.g. candidates without fully updated LinkedIn profiles).
Blind confidence in automated e-recruitment systems could have a high societal
cost, jeopardizing the right of individuals to equal opportunities in the job market.
We believe that the recruiting decision should always be in the hands of HR
professionals; however, the promise of automated e-recruitment systems to drastically
cut costs is compelling, so they cannot be entirely ruled out, especially at the massive
scale of agencies’ web sites that host thousands or even millions of CVs and
job positions.

8.2 Theoretical implications


In this work, we have proposed a new approach for predicting human recruiters’
judgment regarding the relevance of job applicants to a specific position, based on the
AHP. What is more, we have contributed a better understanding of the automated
applicant screening process in e-recruitment systems, focussing on applicants’
LinkedIn profiles rather than job-specific resumes. We argued that to successfully
derive candidate rankings, e-recruitment systems need access to the candidate’s
full profile (i.e. full employment history and education) as well as the recruiters’
selection criteria. We have shown that after careful parameterization, which includes
assigning weights to a set of candidate selection criteria, the proposed scheme can
output consistent candidate rankings compared to the ones assigned by expert
recruiters. Finally, we have shown that it is possible to derive features reflecting
a candidate’s personality by performing linguistic analysis on his personal blog. These
can serve as selection criteria in job positions where personality traits are important for
applicant selection.

8.3 Practical implications


It has been shown that companies can increase the efficiency of the recruitment process
and significantly cut costs, by integrating e-recruitment systems in their HR
management infrastructure. The efficiency can be further increased with the advent of
automated e-recruitment systems that support candidate pre-screening and ranking,
as interviewing and background investigation can then be limited to the top candidates
identified by the system. In the proposed system, applicants are allowed to apply
for a job position with their LinkedIn profile, instead of uploading a job-specific
resume. This allows employers to access the applicant’s full employment history
and group of contacts, and gives them the chance to automatically evaluate the
candidate’s profile for a broad range of job positions without the complexity of parsing
a full-text resume.
Job seekers must also be prepared for the new era of social recruitment, investing
in a fully updated LinkedIn profile and an extensive list of contacts. Keeping a personal
blog or participating in online discussions and communities may also give them
significant visibility and increase their job offers. According to a survey published
by the prominent ATS provider JobVite[1], more than 80 percent of the 600 employers
surveyed reported employing LinkedIn for recruiting and most check the applicants’
LinkedIn profiles if provided. Automated screening tools can drastically speed up
this process, generating reports of candidates’ social profiles.
Third parties (i.e. recruiting agencies) that provide social mining and automatic
screening services must respect local laws that protect the users’ privacy. For example,
in the USA these agencies are within the bounds of the FCRA law (i.e. they
are considered “consumer reporting agencies”) which requires the users to complete
a certification that allows the agency to collect and process their social profile.
Finally, it must be noted that special attention must be paid in the choice of selection
criteria, so that employers don’t face liabilities. For example, companies must
avoid gender- or age-related selection criteria as they are prohibited by the legislation
in many countries (e.g. by Title VII of the “Civil Rights Act” and the “Age
Discrimination in Employment Act” in the USA). Care should be taken to also
avoid illegal biases, i.e. accidentally using selection criteria that indirectly exclude
applicants that have a protected characteristic (such as being a certain gender, age
or religion).

9. Conclusions
In this work, a novel approach for recruiting job applicants was proposed and
implemented in the form of a company-oriented e-recruitment system. The proposed
system employs AHP for candidate ranking based on criteria that can be extracted
from the applicant’s LinkedIn profile and performs linguistic analysis on candidates’
blogs to infer their personality characteristics. The proposed e-recruitment system was
tested in a large-scale pilot scenario, which included three different offered positions
and 100 job applicants. The application of our approach revealed that it is effective in
identifying the job applicants’ extraversion and ranking them accordingly. The candidate
rankings and the extraversion scores output by the system were cross-compared to the
scores provided by expert recruiters. It was found that with the exception of senior
positions that required domain experience and specific qualifications, our system
performed consistently with an average ranking error of ±4 positions.

9.1 Future work


The authors plan a number of enhancements for the proposed system as future work.
To begin with we plan to incorporate semantic matching techniques, to count
applicants’ work experience in fields that are relevant to the job position. This is
expected to improve our system’s accuracy in senior positions. What’s more, we plan to
incorporate more sources of textual information and incorporate a filtering mechanism
for low quality input (e.g. exclude technical blogs from the personality mining process).
This will ensure that personality mining is not based on incomplete data and
increase its accuracy. Finally, we plan to include more personality dimensions
(e.g. Conscientiousness, Agreeableness) to broaden the scope of the proposed system.
Note
1. http://recruiting.jobvite.com/resources/social-recruiting-survey.php
References An integrated
Argamon, S., Sushant, D., Koppel, M. and Pennebaker, J. (2005), “Lexical predictors of personality e-recruitment
type”, Proceedings of the Joint Annual Meeting of the Interface and the Classification
Society of North America, St. Louis, Missouri, June, 8-12. system
Bizer, R.H. and Rainer, E. (2005), Impact of Semantic web on the job recruitment Process,
Wirtschaftsinformatik Physica-Verlag HD, Heidelberg.
De Meo, P., Quattrone, G., Terracina, G. and Ursino, D. (2007), “An XML-based multiagent system 567
for supporting online recruitment services”, Systems, Man and Cybernetics, Part A:
Systems and Humans, Vol. 37 No. 4, pp. 464-80.
Faliagka, E., Kozanidis, L., Stamou, S., Tsakalidis, A. and Tzimas, G. (2011a), “A personality
mining system for automated applicant ranking in online recruitment systems”,
Proceedings of International Conference on Web Engineering (ICWE 2011), pp. 379-82.
Faliagka, E., Ramantas, K., Tsakalidis, A., Viennas, M., Kafeza, E. and Tzimas, G. (2011b), “An
integrated e-recruitment system for CV ranking based on AHP”, Proceedings of WEBIST
2011, pp. 147-50.
Gill, J.A., Nowson, S. and Oberlander, J. (2009), “What are they blogging about? Personality, topic,
and motivation in blogs”, Proceedings of AAAI ICWSM, San Jose, California, CA, May, 17–20.
Ho, L., Kuo, T. and Lin, B. (2010), “Influence of online learning skills in cyberspace”, Internet
Research, Vol. 20 No. 1, pp. 55-71.
Jansen, B., Jansen, K. and Spink, A. (2005), “Using the web to look for work: implications for
online job seeking and recruiting”, Internet Research, Vol. 15 No. 1, pp. 49-66.
Kessler, R., Torres-Moreno, J. and El-Beze, M. (2007), “E-Gen: automatic job offer processing
system for human resources”, in Rauch, J., Ras, Z., Berka, P. and Elomas, T. (Eds),
Proceedings of the Artificial Intelligence 6th Mexican International Conference on Advances
in Artificial Intelligence (MICAI’07), Springer-Verlag, Berlin, Heidelberg, pp. 985-95.
Mairesse, F., Walker, M., Mehl, M. and Moore, R. (2007), “Using linguistic cues for the automatic
recognition of personality in conversation and text”, Journal of Artificial Intelligence
Research, Vol. 30 No. 1, pp. 457-501.
Mishne, G. (2005), “Experiments with mood classification in blog posts”, Proceedings of the 1st
Workshop on Stylistic Analysis of Text For Information Access Style 2005, Salvador,
August, 15–19.
Mochol, M., Wache, H. and Nixon, L. (2007), Improving the Accuracy of Job Search with Semantic
Techniques, Business Information Systems, Springer, Berlin/Heidelberg.
Neuman, C. (2010), “Prospero: a tool for organizing internet resources”, Internet Research, Vol. 20
No. 1, pp. 408-19.
Oberlander, J. and Nowson, S. (2006), “Whose thumb is it anyway? Classifying author personality
from weblog text”, Proceedings of the Association for Computational Linguists, Sydney,
July, 17–21.
Pande, S. (2011), “E-recruitment creates order out of chaos at SAT Telecom: system cuts costs
and improves efficiency”, Human Resource Management International Digest, Vol. 19
No. 3, pp. 21-3.
Pennebaker, J.W. and King, L. (1999), “Linguistic styles: language use as an individual
difference”, Journal of Personality and Social Psychology, Vol. 77 No. 6, pp. 1296-312.
Pennebaker, J.W., Francis, M.E. and Booth, R.J. (2001), “Linguistic inquiry and word count: LIWC
2001”, Word Journal Of The International Linguistic Association.
Radevski, V. and Trichet, F. (2006), “Ontology-based systems dedicated to human resources
management: an application in e-recruitment”, in Mursman, R., Tari, Z. and Herrero, P. (Eds),
On the Move to Meaningful Internet Systems, Springer, Berlin/Heidelberg, pp. 1068-77.
INTR Ramar, K. and Sivaram, N. (2010), “Applicability of clustering and classification algorithms for
recruitment data mining”, International Journal of Computer Applications, Vol. 4 No. 5,
22,5 pp. 23-8.
Rienks, R. and Heylen, D. (2006), “Dominance detection in meetings using easily obtainable
features”, Machine Learning for Multimodal Interaction, pp. 76-86.
Saaty, T.L. (1990), “How to make a decision: the analytic hierarchy process”, European Journal of
568 Operational Research, Vol. 48 No. 1, pp. 9-26.
Schmid, H. (1995), “Improvements in part-of-speech tagging with an application to German”,
Proceedings of the ACL SIGDAT, Dublin, Ireland, June, 26–30.

Further reading
Kessler, R., Béchet, N., Torres-Moreno, J., Roche, M. and El-Bèze, M. (2009), “Job offer
management: how improve the ranking of candidates”, Proceedings of ISMIS 09, Vol. 5722,
Springer, Berlin/Heidelberg, pp. 431-41.

About the authors


Evanthia Faliagka is currently a Lecturer on a contract in the Department of Applied Informatics
in Management and Finance of the Technological Educational Institute of Mesolonghi. She
received the Diploma of Computer Engineering from the Computer Engineering and Informatics
Department in 2006 and the MSc degree in 2008. Up to now, she has participated in the
development of many R&D projects funded by national and EU resources, as well as the private
sector. Her research interests lie in the areas of web engineering, machine learning and
recommendation systems. Evanthia Faliagka is the corresponding author and can be contacted
at: [email protected]
Athanasios Tsakalidis is a Computer Scientist and has been a Professor in the Department
of Computer Engineering and Informatics of the University of Patras since 1993, and the
Vice-Director of the Research Academic Computer Technology Institute (RACTI) since 2004. He
obtained a Diploma of Mathematics from the University of Thessaloniki in 1973, Diploma of
Informatics in 1980 and PhD in Informatics in 1983, from the University of Saarland, Germany.
He is one of the contributors to the Handbook of Theoretical Computer Science (Elsevier and
MIT-Press, 1990) and has published many scientific articles, having made an especial
contribution to the solution of elementary problems in the area of data structures. His scientific
interests are data structures, computational geometry, information retrieval, computer graphics,
databases, and bio-informatics.
Giannis Tzimas is currently an Assistant Professor in the Department of Applied Informatics
in Management and Finance of the Technological Educational Institute of Mesolonghi. Since 1995,
he is also an adjoint researcher in the Graphics, Multimedia and GIS Lab, Department
of Computer Engineering and Informatics of the University of Patras. He graduated from the
Computer Engineering and Informatics Department in 1995 and has participated in the
management and development of many R&D projects funded by national and EU resources,
as well as the private sector. He also works in the Internet Technologies and Multimedia Research
Unit of the Research Academic Computer Technology Institute (RACTI), as a technical
co-ordinator, from 1997 until now. His research activity lies in the areas of web engineering, web
modeling, bioinformatics, web based education and intranets/extranets. He has published a
considerable number of articles in prestigious national and international conferences and journals.
