Industrial Training Report
Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology
in
Information Technology
Certificate
The industrial training report submitted by Sukriti Nautiyal (Roll No. 225/LIT/001) has been found satisfactory
in terms of scope, quality, and presentation, in fulfillment of the requirements for the course IT-493
(INDUSTRIAL TRAINING - 7th Semester) of the degree of Bachelor of Technology in Information
Technology at Gautam Buddha University, Greater Noida, Uttar Pradesh, India - 201312.
(Signature)
Mr. Vishvajeet Yadav
(Supervisor)
Table of Contents
1. Candidate's Declaration
2. Acknowledgment
3. Offer Letter
4. Abstract
5. Introduction
6. Project Description
   About the Project
   Methodology
   Objective and Scope
   Workflow
   Code and Output
7. Limitations and Challenges
8. Conclusion
9. Completion Certificate
10. References
Candidate's Declaration
I, Sukriti Nautiyal, hereby declare that this training report is an authentic record of my own work,
carried out as a requirement of the one-month internship at Unified Mentor during the period
15/09/24 to 15/10/24, under the guidance of Mr. Abhishek (Project Manager), towards the award of the
degree of B.Tech (Information Technology).
The work presented in this report is the result of my own efforts and has not been submitted
elsewhere for any academic or professional purpose. I take responsibility for its content and accuracy.
SUKRITI NAUTIYAL
225/LIT/001
Acknowledgment
Every internship, big or small, owes its success largely to the efforts of a number of wonderful
people who have given their valuable advice or lent a helping hand. I sincerely appreciate the
inspiration, support, and guidance of all those people who have been instrumental in making this
project a success. I, Sukriti Nautiyal, a student of Gautam Buddha University, Greater Noida
(B.Tech - IT), am extremely grateful to Unified Mentor for the confidence bestowed on me by
entrusting me with the training entitled "Clustering with LLM". At this juncture, I feel deeply
honoured in expressing my sincere thanks to Mr. Abhishek (Project Manager) for providing valuable
insights leading to the successful completion of my training.
I would like to thank the Dean of ICT, Dr. Arpit Bhardwaj, and the HOD of IT, Dr. Neeta Singh,
whose support helped me complete my project, and I express my gratitude to Mr. Vishvajeet
Yadav (supervisor) for assisting me in completing it. I would also like to thank all the faculty
members of Gautam Buddha University for their critical advice and guidance, without which this
training would not have been possible. Last but not least, I place on record a deep sense of
gratitude to my family members and my friends, who have been a constant source of inspiration
during the preparation of this training report.
Offer Letter
Abstract
This report presents the work undertaken during a data science internship, focusing on the
project titled "Clustering with LLM". The project explores the application of large
language models (LLMs), such as BERT and OpenAI’s GPT, in clustering tasks involving
unstructured textual data. Traditional clustering algorithms often face challenges when
dealing with high-dimensional and noisy data. By leveraging the rich semantic embeddings
generated by LLMs, the project aimed to overcome these limitations and uncover meaningful
patterns and groupings within complex datasets.
The workflow encompassed data collection, preprocessing, embedding generation, and the
application of clustering algorithms like K-Means, DBSCAN, and Agglomerative Clustering.
Evaluation metrics such as the Silhouette Score and Davies-Bouldin Index were employed to
validate the quality of clusters, while visualization techniques like t-SNE and PCA provided
interpretable insights into the high-dimensional data.
Introduction
Data Science is a multidisciplinary field that uses scientific methods, processes, algorithms,
and systems to extract knowledge and insights from structured and unstructured data. As a
Data Science Intern at Unified Mentor, I had the opportunity to apply theoretical knowledge to
real-world projects, enhancing my technical and problem-solving skills. This report details my
contributions, learning, and challenges during the internship.
In recent years, data science has witnessed significant advancements, driven by the
emergence of cutting-edge technologies like large language models (LLMs). These models,
trained on vast amounts of text data, excel at understanding and generating natural language,
making them invaluable for tasks such as text analysis, sentiment analysis, and clustering.
Clustering, an unsupervised machine learning technique, groups similar data points based on
their intrinsic properties, revealing hidden patterns in the data. However, traditional clustering
algorithms often struggle with high-dimensional or unstructured data, such as text. This is
where LLMs can bridge the gap by providing high-quality, meaningful embeddings that
enhance the performance of clustering algorithms.
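To make this idea concrete, the short sketch below embeds a handful of example sentences with a pre-trained sentence-embedding model and clusters the vectors with K-Means. It is a minimal illustration rather than the project's pipeline: the sentence-transformers library, the model name, and the sample texts are assumptions introduced here for demonstration.
# Minimal sketch (not the report's exact pipeline): embed short texts, then cluster.
from sentence_transformers import SentenceTransformer  # assumed helper library
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

texts = [
    "The delivery was late and the package was damaged.",
    "Shipping took too long, the box arrived broken.",
    "Great product, works exactly as described.",
    "Excellent quality, very happy with this purchase.",
]

# Any pre-trained encoder could be used; this model name is illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(texts)            # shape: (n_texts, embedding_dim)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(embeddings)
print("cluster labels:", kmeans.labels_)
print("silhouette:", silhouette_score(embeddings, kmeans.labels_))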
This report delves into the project titled "Clustering with LLM", undertaken during my
internship. The project aimed to leverage the power of LLMs to generate embeddings for
textual data and apply clustering techniques to uncover meaningful insights. The following
sections of this report provide a detailed account of the project's methodology, objectives,
challenges, and outcomes. Additionally, the report reflects on the learning experiences and
professional growth achieved throughout this internship journey.
Future Scope
The integration of LLMs in clustering workflows holds immense potential for future
advancements in various domains. By addressing current limitations such as computational
constraints and improving interpretability, LLM-powered clustering can unlock new
opportunities in:
Healthcare: Analyzing clinical notes and medical research to identify trends and
group similar cases for better patient care.
E-commerce: Enhancing recommendation systems by clustering customer reviews
and preferences more accurately.
Education: Categorizing research papers and educational content for streamlined
knowledge discovery.
As LLM architectures continue to evolve and computational resources become more
accessible, the application of LLMs in clustering is poised to become a cornerstone in data-
driven decision-making.
Project Description
Methodology
1. Data Collection
The first step involved sourcing diverse and representative datasets suitable for clustering
tasks; an illustrative collection sketch follows this step.
Sources:
o Public datasets such as Kaggle repositories, UCI Machine Learning
Repository, and research databases.
o Internal data repositories containing customer feedback, survey responses, and
unstructured text logs.
Tools Used:
o Web Scraping: Libraries like BeautifulSoup and Selenium were utilized for
extracting data from online sources.
o APIs: Data from platforms such as Twitter or Reddit was collected using APIs
to gather real-time text data.
o SQL Queries: Structured data from relational databases was extracted and
processed.
Challenges Addressed:
o Inconsistent formatting of data.
o Duplication of entries, requiring careful curation.
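As a hedged illustration of the collection tooling listed above, the snippet below shows how review text might be scraped with requests and BeautifulSoup and de-duplicated with pandas. The URL and CSS selector are placeholders; the actual sources and fields used during the internship are not specified in this report.
# Illustrative only: real sources and selectors require inspection of each page.
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_review_texts(url: str) -> list[str]:
    """Fetch a page and return the text of elements assumed to hold reviews."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # The CSS class below is a placeholder, not a real site's markup.
    return [node.get_text(strip=True) for node in soup.select(".review-text")]

texts = scrape_review_texts("https://example.com/reviews")  # placeholder URL
df_raw = pd.DataFrame({"text": texts}).drop_duplicates()    # basic de-duplication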
3. Embedding Generation
This step involved transforming the cleaned text data into numerical embeddings using pre-trained
Large Language Models (LLMs); a minimal embedding sketch follows this step.
Models Used:
o BERT: A transformer-based model known for generating contextual
embeddings.
o OpenAI GPT Models: Used for generating semantic-rich embeddings that
capture nuanced relationships between words.
Libraries and Tools:
o Hugging Face Transformers: For seamless integration of pre-trained LLMs.
o PyTorch: For managing embeddings and performing computations efficiently.
Output:
o High-dimensional vector representations of text, where similar texts are closer
in the vector space.
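The sketch below shows one common way to obtain such vectors with Hugging Face Transformers and PyTorch: tokenize a batch, run it through a pre-trained BERT encoder, and mean-pool the token states into one vector per text. The checkpoint name and the pooling strategy are illustrative assumptions; the project's exact configuration is not recorded here.
# Hedged sketch of BERT-based embedding generation (checkpoint and pooling assumed).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts: list[str]) -> torch.Tensor:
    """Return one mean-pooled vector per input text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state         # (batch, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)           # ignore padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean pooling

vectors = embed(["cheap and cheerful", "poor build quality"])
print(vectors.shape)  # e.g. torch.Size([2, 768])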
6. Iterative Refinement
The process involved multiple iterations to improve clustering quality (a short sketch follows this list).
Adjustments to preprocessing steps, such as experimenting with lemmatization vs.
stemming.
Fine-tuning LLM parameters and selecting the most suitable pre-trained model for the
dataset.
Experimenting with different clustering algorithms and hyperparameters to optimize
results.
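For example, the lemmatization-versus-stemming comparison mentioned above can be run quickly with NLTK, as in the sketch below; NLTK is one possible tool here, and the sample words are arbitrary.
# Compares Porter stemming with WordNet lemmatization on a few sample words.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)   # lemmatizer lookup data
nltk.download("omw-1.4", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "running", "clustered"]:
    print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word, pos="v"))
# Stemming truncates aggressively ("studies" -> "studi"), while lemmatization
# returns valid dictionary forms ("studies" -> "study"); which works better for
# the downstream embeddings is decided empirically in each iteration.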
Objective and Scope:
Objective:
The primary objective of the project, "Clustering with LLM," was to explore the integration
of Large Language Models (LLMs) into clustering workflows to address the limitations of
traditional approaches when dealing with unstructured text data. The project aimed to:
1. Enhance Clustering Accuracy: By generating semantic-rich embeddings with LLMs,
the project sought to improve the precision and meaningfulness of clustering results.
2. Enable Interpretability: Develop intuitive methods for evaluating and visualizing
high-dimensional clusters to facilitate actionable insights.
3. Overcome Data Challenges: Address the challenges posed by unstructured, noisy, or
high-dimensional data through advanced preprocessing and embedding techniques.
4. Promote Scalability: Ensure the clustering pipeline is adaptable for large datasets
across diverse domains, including e-commerce, healthcare, and academia.
By achieving these goals, the project aimed to demonstrate the transformative potential of
LLMs in clustering tasks and provide a foundation for future innovations in unsupervised
learning.
Scope:
The scope of the project extends to a wide range of applications and industries where text
data plays a pivotal role. Some of the key areas covered include:
1. Customer Sentiment Analysis
o Use Case: Grouping customer feedback and reviews based on shared
sentiments or themes.
o Impact: Enables businesses to address recurring customer concerns and
improve service quality proactively.
2. Topic Modelling and Knowledge Discovery
o Use Case: Clustering academic papers, research articles, or news data to
uncover dominant topics and trends.
o Impact: Assists researchers, policymakers, and businesses in navigating and
synthesizing large volumes of information.
3. Personalized Recommendation Systems
o Use Case: Grouping users or products based on latent preferences and interests
derived from textual data, such as product descriptions or user reviews.
o Impact: Enhances recommendation algorithms, improving user engagement
and satisfaction in e-commerce platforms.
4. Healthcare Data Analysis
o Use Case: Clustering medical notes, patient records, or research data to
identify patterns in symptoms, diagnoses, or treatment outcomes.
o Impact: Facilitates personalized medicine and data-driven decision-making in
healthcare.
5. Market Segmentation and Trend Analysis
o Use Case: Grouping market data or consumer opinions into distinct clusters
for trend analysis.
o Impact: Helps businesses tailor their marketing strategies and products to
specific audience segments.
6. Content Categorization in Media and Education
o Use Case: Clustering articles, books, or educational resources into related
categories.
o Impact: Streamlines knowledge discovery and supports efficient content
organization.
Future Potential:
The integration of LLMs in clustering workflows can evolve further as these models become
more sophisticated and computational resources become more accessible. Future
advancements could include:
Real-time Clustering: Applying LLM-powered clustering techniques to process
streaming text data, such as social media feeds or live customer interactions.
Explainable AI (XAI): Incorporating interpretability frameworks to make LLM-based
clustering decisions more transparent and trustworthy.
Cross-domain Applications: Extending the methodology to diverse fields such as
finance, law, and entertainment, where unstructured text data is prevalent.
Workflow
Code & Output
Code:
import pandas as pd                      # dataframe manipulation
import numpy as np                       # linear algebra

# data visualization
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
import shap                              # model explainability

# dimensionality reduction, outlier detection, boosting, elbow plots
import prince                            # PCA helper used for the 2D/3D projections below
import lightgbm as lgb                   # surrogate classifier explained with SHAP below
from pyod.models.ecod import ECOD        # outlier detection
from yellowbrick.cluster import KElbowVisualizer

# sklearn
from sklearn.cluster import KMeans
from sklearn.preprocessing import PowerTransformer, OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.manifold import TSNE
from sklearn.metrics import (silhouette_score, silhouette_samples,
                             accuracy_score, classification_report,
                             davies_bouldin_score, calinski_harabasz_score)
def get_pca_2d(df, predict):
    """Project the embeddings to 2 components with prince's PCA and attach cluster labels."""
    pca_2d_object = prince.PCA(
        n_components=2,
        n_iter=3,
        rescale_with_mean=True,
        rescale_with_std=True,
        copy=True,
        check_input=True,
        engine='sklearn',
        random_state=42
    )
    pca_2d_object.fit(df)
    df_pca_2d = pca_2d_object.transform(df)
    df_pca_2d.columns = ["comp1", "comp2"]
    df_pca_2d["cluster"] = predict
    return pca_2d_object, df_pca_2d
def get_pca_3d(df, predict):
    """Project the embeddings to 3 components with prince's PCA and attach cluster labels."""
    pca_3d_object = prince.PCA(
        n_components=3,
        n_iter=3,
        rescale_with_mean=True,
        rescale_with_std=True,
        copy=True,
        check_input=True,
        engine='sklearn',
        random_state=42
    )
    pca_3d_object.fit(df)
    df_pca_3d = pca_3d_object.transform(df)
    df_pca_3d.columns = ["comp1", "comp2", "comp3"]
    df_pca_3d["cluster"] = predict
    return pca_3d_object, df_pca_3d
df = [Link]({"cluster": "object"})
df = df.sort_values("cluster")
fig = px.scatter_3d(df,
x='comp1',
y='comp2',
z='comp3',
color='cluster',
template="plotly",
# symbol = "cluster",
color_discrete_sequence=[Link],
title=title).update_traces(
# mode = 'markers',
marker={
"size": 4,
"opacity": opacity,
# "symbol" : "diamond",
"line": {
"width": width_line,
"color": "black",
}
}
).update_layout(
width = 1000,
height = 800,
autosize = False,
showlegend = True,
legend=dict(title_font_family="Times
New Roman",
font=dict(size= 20)),
scene = dict(xaxis=dict(title =
'comp1', titlefont_color = 'black'),
yaxis=dict(title = 'comp2',
titlefont_color = 'black'),
zaxis=dict(title = 'comp3',
titlefont_color = 'black')),
font = dict(family = "Gilroy", color =
'black', size = 15))
[Link]()
df = [Link]({"cluster": "object"})
df = df.sort_values("cluster")
fig = [Link](df,
x='comp1',
y='comp2',
color='cluster',
template="plotly",
# symbol = "cluster",
color_discrete_sequence=[Link],
title=title).update_traces(
# mode = 'markers',
marker={
"size": 8,
"opacity": opacity,
# "symbol" : "diamond",
"line": {
"width": width_line,
"color": "black",
}
}
).update_layout(
width = 800,
height = 700,
autosize = False,
showlegend = True,
legend=dict(title_font_family="Times
New Roman",
font=dict(size= 20)),
scene = dict(xaxis=dict(title =
'comp1', titlefont_color = 'black'),
yaxis=dict(title = 'comp2',
titlefont_color = 'black'),
),
font = dict(family = "Gilroy", color =
'black', size = 15))
[Link]()
# Outlier detection with ECOD; flagged points are dropped before clustering.
clf = ECOD()
clf.fit(df_embedding)
out = clf.predict(df_embedding)            # 1 = outlier, 0 = inlier
df_embedding["outliers"] = out
df["outliers"] = out

df_embedding_no_out = df_embedding[df_embedding["outliers"] == 0]
df_embedding_no_out = df_embedding_no_out.drop(["outliers"], axis=1)

df_embedding_with_out = df_embedding.copy()
df_embedding_with_out = df_embedding_with_out.drop(["outliers"], axis=1)

df_embedding_no_out.shape
df_embedding_with_out.shape
# Instantiate the clustering model and elbow visualizer used to choose k.
# (The later KMeans fit that produces `clusters_predict` is not shown in this excerpt.)
km = KMeans(init="k-means++", random_state=0, n_init="auto")
visualizer = KElbowVisualizer(km, k=(2, 10), locate_elbow=False)
visualizer.fit(df_embedding_no_out)  # fit/show assumed; omitted in the original snippet
visualizer.show()
"""
The Davies Bouldin index is defined as the average similarity measure
of each cluster with its most similar cluster, where similarity is the
ratio of within-cluster distances to between-cluster distances.
The minimum value of the DB Index is 0, whereas a smaller value (closer
to 0) represents a better model that produces better clusters.
"""
print(f"Davies bouldin score:
{davies_bouldin_score(df_embedding_no_out,clusters_predict)}")
"""
Calinski Harabaz Index -> Variance Ratio Criterion.
Calinski Harabaz Index is defined as the ratio of the sum of between-
cluster dispersion and of within-cluster dispersion.
The higher the index the more separable the clusters.
"""
print(f"Calinski Score:
{calinski_harabasz_score(df_embedding_no_out,clusters_predict)}")
"""
The silhouette score is a metric used to calculate the goodness of fit
of a clustering algorithm, but can also be used as a method for
determining an optimal value of k (see here for more).
Its value ranges from -1 to 1.
A value of 0 indicates clusters are overlapping and either the data or
the value of k is incorrect.
1 is the ideal value and indicates that clusters are very dense and
nicely separated.
"""
print(f"Silhouette Score:
{silhouette_score(df_embedding_no_out,clusters_predict)}")
pca_3d_object, df_pca_3d = get_pca_3d(df_embedding_no_out, clusters_predict)
plot_pca_3d(df_pca_3d, title="PCA Space", opacity=1, width_line=0.1)
print("The variability is :", pca_3d_object.eigenvalues_summary)

pca_2d_object, df_pca_2d = get_pca_2d(df_embedding_no_out, clusters_predict)
plot_pca_2d(df_pca_2d, title="PCA Space", opacity=1, width_line=0.2)
# 3-D t-SNE projection; `sampling_data` is a subsample of the embeddings
# prepared earlier in the notebook (t-SNE scales poorly with dataset size).
df_tsne_3d = TSNE(
    n_components=3,
    learning_rate=500,
    init='random',
    perplexity=200,
    n_iter=5000).fit_transform(sampling_data)
# Gradient-boosted surrogate classifier (LightGBM assumed from the colsample
# argument) trained on the cluster labels so SHAP can explain which features
# drive each cluster.
clf_km = lgb.LGBMClassifier(colsample_bytree=0.8)
clf_km.fit(df_no_outliers, clusters_predict)   # fit assumed; omitted in the original excerpt

# SHAP values
explainer_km = shap.TreeExplainer(clf_km)
shap_values_km = explainer_km.shap_values(df_no_outliers)
shap.summary_plot(shap_values_km, df_no_outliers, plot_type="bar",
                  plot_size=(15, 10))

y_pred = clf_km.predict(df_no_outliers)
accuracy = accuracy_score(y_pred, clusters_predict)
print('Training-set accuracy score: {0:0.4f}'.format(accuracy))
print(classification_report(clusters_predict, y_pred))
df_no_outliers["cluster"] = clusters_predict
df_group = df_no_outliers.groupby('cluster').agg(
{
'job': lambda x: x.value_counts().index[0],
'marital': lambda x: x.value_counts().index[0],
'education': lambda x: x.value_counts().index[0],
'housing': lambda x: x.value_counts().index[0],
'loan': lambda x: x.value_counts().index[0],
'age':'mean',
'balance': 'mean',
'default': lambda x: x.value_counts().index[0],
}
).sort_values("job").reset_index()
df_group
Output:
Limitations and Challenges
Limitations:
Despite its promising results, the project faced several limitations that constrained its
potential outcomes:
1. Computational Complexity
One of the primary challenges of this project was the high computational requirements
associated with LLMs.
Resource Intensity:
Generating embeddings using large language models like BERT or GPT is
computationally expensive, requiring significant memory and processing power.
Scalability Issues:
As the size of the dataset increased, both embedding generation and clustering
became increasingly time-consuming, limiting scalability for real-time applications.
Hardware Dependency:
The dependency on GPUs or TPUs for efficient computation posed a challenge for
deploying the solution in resource-constrained environments.
Potential Solutions (a brief DistilBERT sketch follows this subsection):
Using smaller, domain-specific LLMs such as DistilBERT to reduce computational costs.
Leveraging cloud-based resources for dynamic scalability.
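A minimal sketch of the DistilBERT mitigation, assuming the Hugging Face distilbert-base-uncased checkpoint: relative to a full-size BERT pipeline, only the checkpoint name changes, and encoding in small batches keeps peak memory manageable on CPU-only machines.
# Hedged sketch: batched DistilBERT encoding to cap memory use.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
model.eval()

def embed_in_batches(texts, batch_size=16):
    """Mean-pooled DistilBERT embeddings, computed batch by batch."""
    chunks = []
    for start in range(0, len(texts), batch_size):
        batch = tokenizer(texts[start:start + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state          # (batch, tokens, 768)
        mask = batch["attention_mask"].unsqueeze(-1)            # ignore padding tokens
        chunks.append((hidden * mask).sum(dim=1) / mask.sum(dim=1))
    return torch.cat(chunks)

vectors = embed_in_batches(["sample review one", "sample review two"])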
2. High-dimensional Embeddings
The text embeddings generated by LLMs are typically high-dimensional vectors, which can
pose challenges for clustering algorithms.
Curse of Dimensionality:
High-dimensional data can degrade the performance of clustering algorithms like K-
Means and make it harder to identify meaningful patterns.
Visualization Difficulties:
Visualizing high-dimensional clusters for interpretability required additional
dimensionality reduction techniques, such as PCA or t-SNE, which introduced their
own limitations and biases.
Potential Solutions (a UMAP sketch follows this subsection):
Experimenting with dimensionality reduction methods like UMAP (Uniform Manifold Approximation and Projection) for better performance.
Adopting specialized clustering algorithms designed for high-dimensional data.
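A sketch of the UMAP suggestion using the umap-learn package is shown below; the random embeddings and parameter values stand in for the project's real data and tuned settings.
# Reduce high-dimensional embeddings to 2-D for clustering or plotting.
import numpy as np
import umap  # from the umap-learn package

embeddings = np.random.rand(500, 768)          # stand-in for LLM embeddings

reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1,
                    random_state=42)
coords_2d = reducer.fit_transform(embeddings)  # shape (500, 2)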
3. Model Selection and Fine-tuning
Choosing the appropriate LLM for the task and fine-tuning it posed significant challenges:
Trade-offs in Pre-trained Models:
Models like BERT prioritize understanding context but may struggle with domain-
specific nuances. GPT models excel at generating embeddings but are
computationally intensive.
Domain-specific Adaptation:
Pre-trained LLMs often require fine-tuning to perform optimally on domain-specific
text, which necessitates additional labeled data and computational resources.
Potential Solutions:
Using hybrid models that combine pre-trained embeddings with task-specific fine-
tuning.
Leveraging open-domain models but augmenting them with domain-specific
vocabulary.
4. Cluster Interpretability
While LLM-powered embeddings enhanced clustering accuracy, interpreting the resulting
clusters remained a challenge:
Semantic Overlap:
Clusters derived from text embeddings sometimes showed semantic overlap, making
it difficult to delineate clear boundaries between groups.
Lack of Explainability:
Traditional clustering algorithms like K-Means and DBSCAN do not inherently
provide insights into why specific data points are grouped together.
Potential Solutions:
Applying explainability techniques, such as SHAP (SHapley Additive exPlanations),
to understand feature contributions.
Developing domain-specific heuristics to label and interpret clusters.
5. Data-related Challenges
The text data used in the project introduced several challenges during preprocessing and
analysis:
Noisy and Inconsistent Data:
Text data often contained typos, slang, and inconsistent formatting, which required
extensive preprocessing.
Handling Rare or Outlier Data:
Outlier data points often disrupted clustering results, particularly for density-based
algorithms like DBSCAN.
Potential Solutions (a brief preprocessing sketch follows this subsection):
Implementing advanced preprocessing pipelines, including spell-checkers and synonym mapping.
Using robust algorithms that can handle noise and outliers effectively.
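The sketch below illustrates one such cleaning pipeline with simple regex-based normalization and de-duplication; the rules shown are generic examples rather than the project's actual preprocessing steps, and a production pipeline would add spell-checking and synonym mapping.
# Generic text-cleaning sketch: lowercase, strip URLs and symbols, drop duplicates.
import re
import pandas as pd

def clean_text(text: str) -> str:
    text = text.lower()
    text = re.sub(r"http\S+", " ", text)         # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)     # drop punctuation and emoji
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    return text

df = pd.DataFrame({"text": ["GREAT product!!! :)", "visit http://spam.example now"]})
df["clean"] = df["text"].map(clean_text)
df = df[df["clean"].str.len() > 0].drop_duplicates("clean")  # remove empty rows and duplicates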
6. Evaluation Metrics
Evaluating the quality of clustering outcomes presented unique challenges:
Subjectivity in Clustering:
Unlike supervised learning, clustering lacks predefined labels, making the evaluation
inherently subjective.
Metric Limitations:
Metrics like Silhouette Score and Davies-Bouldin Index provide quantitative insights
but may not capture the semantic quality of text clusters.
Potential Solutions:
Incorporating qualitative evaluation, such as manual inspection or user feedback.
Combining multiple evaluation metrics to ensure a holistic assessment.
7. Real-world Deployment
Transitioning from a proof-of-concept to real-world deployment revealed several practical
challenges:
Dynamic Data:
In real-world scenarios, data is dynamic and constantly evolving, requiring periodic
retraining and re-clustering.
Integration with Existing Systems:
Integrating LLM-powered clustering pipelines into existing workflows required
significant customization.
Potential Solutions:
Automating periodic retraining and clustering workflows.
Building modular pipelines for seamless integration with enterprise systems.
Challenges:
1. Embedding Generation Time:
o Generating high-quality embeddings for extensive datasets was time-
consuming, leading to delays in subsequent steps like clustering and
evaluation. Optimizing this process required careful consideration of model
parameters and batch sizes.
2. Cluster Validation:
o Ensuring the validity of clusters posed significant challenges. Metrics like
Silhouette Score provided numerical validation but did not always align with
domain-specific insights. Balancing quantitative and qualitative validation
methods was difficult.
3. Data Preprocessing:
o Handling diverse textual data from different sources required customized
preprocessing pipelines. For instance, some datasets contained special
characters or non-English text, which needed additional handling.
4. Model Fine-Tuning:
o While pre-trained LLMs were used, fine-tuning them on domain-specific data
required significant computational resources and expertise. This step, although
beneficial, was largely constrained by the resources available.
5. Visualizing High-Dimensional Data:
o Reducing the dimensions of embeddings for visualization purposes was
challenging. Ensuring that the reduced dimensions captured meaningful
relationships between data points required iterative experimentation with
techniques like t-SNE and UMAP.
The project demonstrated resilience in overcoming many of these challenges, often through
creative problem-solving and leveraging available resources. Despite these hurdles, the
outcomes were insightful and laid the groundwork for future advancements.
Conclusion
The internship project titled "Clustering with LLM" provided a valuable opportunity to
explore the intersection of large language models and unsupervised machine learning
techniques. By leveraging the semantic power of LLM-generated embeddings, this project
successfully demonstrated how clustering could uncover meaningful patterns and insights
within unstructured text data. The integration of advanced clustering methods and LLMs
addressed traditional challenges of dimensionality and data complexity, offering a robust
solution for organizing and interpreting large volumes of textual information.
Key outcomes of the project include:
A scalable workflow for text clustering that begins with data preprocessing, proceeds
to embedding generation using LLMs, and concludes with clustering and
visualization.
Insights into the strengths and weaknesses of various clustering algorithms,
particularly in handling high-dimensional embeddings.
Practical experience in implementing and fine-tuning advanced language models and
integrating them into real-world workflows.
This internship fostered the development of technical and analytical skills, including:
Hands-on expertise in tools like Hugging Face Transformers, Python libraries for
preprocessing (NLTK, SpaCy), and visualization (t-SNE, PCA).
Strengthened problem-solving abilities to address computational, interpretative, and
resource-related challenges.
An enhanced understanding of the potential applications of clustering in domains like
sentiment analysis, recommendation systems, and topic modeling.
While the project showcased promising results, it also highlighted several areas for
improvement. Addressing limitations such as computational constraints, data quality, and
interpretability of clusters can further refine the outcomes. Future work could focus on
optimizing the embedding generation process, exploring alternative LLM architectures, and
developing more intuitive ways to validate and interpret clusters.
Overall, the internship was a transformative learning experience, bridging theoretical
knowledge with practical applications. The insights and skills gained from this project not
only enhanced my technical proficiency but also prepared me for future challenges in the
dynamic field of data science.
Completion Certificate
References
1. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding. arXiv preprint
arXiv:1810.04805.
2. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving
Language Understanding by Generative Pre-training. OpenAI preprint.
3. Van Der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of
Machine Learning Research.
4. Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A Density-Based Algorithm for
Discovering Clusters in Large Spatial Databases with Noise. Proceedings of the 2nd
International Conference on Knowledge Discovery and Data Mining.
5. Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of
Machine Learning Research.
6. Hugging Face Transformers Library. Retrieved from https://huggingface.co/docs/transformers
7. SpaCy Natural Language Processing Library. Retrieved from https://spacy.io
8. Bird, S., Klein, E., & Loper, E. (2009). NLTK: Natural Language Toolkit. Retrieved
from https://www.nltk.org