The Idea by Woobensky
An AI model that reads a text document and can answer any question whose answer is in the
document, explicitly or implicitly.
How I started
Having this idea in mind, I used the Python library NLTK to create a pipeline that transforms the text
of the user's document into structured data. I use chunking to find the parts of each sentence and
their function (e.g. subject, verb, complement, circumstantial), then I create SQL tables to store the
parts of each sentence in a structured way. At this point the user can upload their document.
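The storage step could be sketched like this. This is a minimal sketch using SQLite instead of MySQL; the table layout and the pre-chunked sentence parts are hypothetical, written by hand since the real chunk grammar isn't shown here:

```python
import sqlite3

# One row per sentence, one column per chunk role (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sentences (id INTEGER PRIMARY KEY, "
    "subject TEXT, verb TEXT, complement TEXT)"
)

# In the real pipeline these parts would come from the NLTK chunker;
# here they are written by hand for illustration.
chunked = [
    ("Haiti", "is", "a country"),
    ("The population of Haiti", "is", "11.1 million"),
]
conn.executemany(
    "INSERT INTO sentences (subject, verb, complement) VALUES (?, ?, ?)",
    chunked,
)
conn.commit()
```

Storing one column per grammatical role is what later lets a question be answered by selecting a single column.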
Now for the Q&A part (what, where, when… questions). For this part I did something very similar:
I parse the question by chunking it into parts and by determining what information the question
is looking for (e.g. "What is Haiti?" is looking for the complement of the sentence "Haiti is…"). After
that I create an SQL query to find the corresponding answer in the structured database.
Once the corresponding answer is found in the database, I use it to generate a
grammatical answer (because the database is not structured grammatically).
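The retrieval and answer-generation steps might look like this. This is a sketch under the same hypothetical table layout; the question "parsing" is reduced to a hand-written string pattern, standing in for the real chunk-based parser:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentences (subject TEXT, verb TEXT, complement TEXT)")
conn.execute("INSERT INTO sentences VALUES ('Haiti', 'is', 'a country')")

def answer(question):
    # "What is X?" -> we are looking for the complement of "X is ...".
    # A real parser would chunk the question instead of string-matching.
    if question.lower().startswith("what is ") and question.endswith("?"):
        subject = question[len("what is "):-1].strip()
        row = conn.execute(
            "SELECT subject, verb, complement FROM sentences WHERE subject = ?",
            (subject,),
        ).fetchone()
        if row:
            # Reassemble a grammatical answer from the stored parts.
            return f"{row[0]} {row[1]} {row[2]}."
    return "I don't know."

print(answer("What is Haiti?"))  # -> Haiti is a country.
```

Note how strict this is: the subject in the question must match the stored subject exactly, which is precisely the limitation described below.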
How it worked
I was happy to see how effective it was for explicit questions (e.g. the document says "Haiti is a
country" and the user asks "What is Haiti?"). But it did not work well at all for implicit questions
(e.g. the document says "The population of Haiti is 11.1 million" and the user asks "How many
people live in Haiti?"), and the user could not even replace a word with a synonym.
Everything had to be very precise: the user had to know how the author structured the sentences in
order to ask the question the right way.
Furthermore, it was slow, both because I was using MySQL and because I had to chunk every
sentence before storing it. As a result, it took about a minute to read a six-sentence document and
about 15 seconds to answer a question.
Chunking was also not the most effective way to parse the syntax of a sentence: it is not scalable,
because you have to predict every possibility and write the patterns by hand (I used a regular-
expression chunker, not a trained one).
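The kind of hand-written chunk grammar I mean looks like this. This sketch has a single noun-phrase rule; a real grammar needs a pattern for every construction you want to catch, which is why it doesn't scale:

```python
import nltk

# One hand-written rule: a noun phrase is an optional determiner,
# any adjectives, then a noun. Every other construction needs its own rule.
grammar = r"NP: {<DT>?<JJ>*<NN.*>}"
parser = nltk.RegexpParser(grammar)

# Pre-tagged (word, part-of-speech) tokens, so no tagger model is needed here.
tagged = [("Haiti", "NNP"), ("is", "VBZ"), ("a", "DT"), ("country", "NN")]
tree = parser.parse(tagged)
print(tree)
```

The parser finds "Haiti" and "a country" as noun phrases, but any sentence shape not anticipated by a rule simply goes unparsed.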
Further improvement
For it to work better I have to change a lot of things. First of all, I need to change the answer-
retrieval algorithm (the way I look for the corresponding answer) to make it less strict and more
open to implicitness. For speed, I need a DBMS that is much faster than MySQL and that allows
more expressive queries (e.g. select rows where the subject is a synonym of "Path"). I think
Elasticsearch might be a good option. For syntax parsing I intend to use dependency parsing, which
is far more scalable than chunking.
I might use word vectors (with Gensim) and sentence similarity to retrieve answers for implicit
questions.
I could also use spaCy instead of NLTK, since it has a labeled dependency parser and is, I think,
faster than NLTK.
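The idea behind the word-vector approach can be sketched without Gensim. The toy 3-dimensional vectors below are written by hand purely for illustration; real ones come from a trained model (e.g. Gensim's Word2Vec) and have hundreds of dimensions:

```python
from math import sqrt

# Toy vectors, hand-written for illustration only; a trained model
# would supply real ones.
vectors = {
    "population": [0.9, 0.1, 0.2],
    "people":     [0.8, 0.2, 0.3],
    "island":     [0.1, 0.9, 0.1],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# "people" is close to "population", so a question using either word
# could be matched to the same stored sentence.
print(cosine(vectors["population"], vectors["people"]) >
      cosine(vectors["population"], vectors["island"]))  # -> True
```

This is what would loosen the exact-match requirement: instead of demanding the user's word, the retrieval step accepts any word whose vector is close enough.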
The ineluctable obstacle
Even if I make all these improvements, I will still run into big limits with implicit questions. There
are things about a text that just understanding the words of a sentence and its structure cannot tell
you, and these are most often the questions users will want answered.
For example, in a document that says Haiti and the DR are countries that share one island, if the
user asks what country Haiti has a border with, there is no way for the system to know that when
two countries share an island they inevitably have a border. Even a human being could not find this
answer without knowing something about geography.
And this idea brings me to my next point: "background knowledge".
The situation
In every text document there is a lot of information that the author does not write but that is still
there, and that we humans fill in by ourselves (missing information).
For example, take a text that says: "Yesterday Mary was very sick. She had to go to the hospital, but
when she arrived, although the doctors did all they could, they could not save her." There are two
questions that a model like the one I described cannot answer: did Mary die, and if so, what killed
her?
Only a human being knows that when an author tells you the doctors could not save a patient, it
means the patient died. And we also know that if a patient dies because the doctors could not save
her, it means the sickness killed her.
We humans have background knowledge that allows us to understand what we read even when the
author does not say everything. And we know what happens when we read about a subject we are
not used to (for me, biology): we understand every word we read, but we cannot understand the
whole thing.
The solution
If we can find a way to give computers background knowledge, they will be able, like us, to fill in
the things the author does not say. They will be able to make inferences and apply logic to what
the author writes, which will let them answer any question the user might ask, no matter how
implicit it is.
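The kind of inference I mean can be sketched as a tiny forward-chaining rule system. The rules and facts below are hand-written for the Mary example; the whole point of the project is that this background knowledge would eventually be learned rather than hard-coded:

```python
# Hand-written background-knowledge rules: if all premises hold, add the
# conclusion. In the envisioned system these would be learned, not coded.
rules = [
    ({"sick(mary)", "doctors_could_not_save(mary)"}, "died(mary)"),
    ({"died(mary)", "sick(mary)"}, "killed_by(sickness, mary)"),
]

# Facts stated explicitly in the document.
facts = {"sick(mary)", "doctors_could_not_save(mary)"}

# Forward chaining: keep applying rules until nothing new is added.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("died(mary)" in facts)                 # -> True
print("killed_by(sickness, mary)" in facts)  # -> True
```

Both implicit answers ("Mary died", "the sickness killed her") fall out of chaining the two rules, even though neither fact appears in the text itself.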
How would this work?
It will use the same system as the current version of Bensky, storing information in a database. But
it will also have a reference knowledge base where it can look up facts about the real world and
background information, allowing it to understand the document on a deeper level.
I will use machine learning and deep learning to create an algorithm that lets it reason deeply
about those facts.
It will need a huge amount of text about a given field in order to understand documents in that
field. For example, to understand a book about deep learning and answer questions about it, you
need a lot of background knowledge in computer science.
So that's the goal: a system that knows things about our world, which allows it to understand
natural-language texts.
What I need to do
For now I'm not ready to implement these ideas, since I don't know a lot about machine learning.
What I need to do is gain a great amount of skill in machine learning, deep learning and math.
I will start with TensorFlow to get a basic understanding of the field and the algorithms, then I will
go deeper as I start to see which parts matter most for my goal.
Then I will implement this solution, and I believe this technology will change the way we interact
with computers. It will change the way we build chatbots and even the way search engines work.
We will at last be able to have real conversations with computers, and that will allow developers to
create technologies that make the best use of the power computers have.