Automatic Question Generation and Evaluation
2025-06-28 16:25:27 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary: Evaluating an Automatic Question Generation Model Designed for Engineering Subjects.
Generating questions from an extract is a tedious task for humans and an even harder one for machines. In Automatic Question Generation (AQG), it is important to examine how this can be achieved with sufficient accuracy and efficiency. One way forward is to use Natural Language Processing (NLP) to process the input text and prepare it for AQG. By combining NLP with question generation algorithms, the system can generate questions that support better understanding of a text document.

The input is pre-processed before the question generation step. Generated questions are first checked against the context of the input to avoid invalid or unanswerable questions. The input is preprocessed using NLP mechanisms such as tokenization, named entity recognition (NER) tagging, and part-of-speech (POS) tagging. The question generation system consists of a machine-learning, classification-based fill-in-the-blank (FIB) generator that also produces multiple choices, and a rule-based approach for generating Wh-type questions. It also includes a question evaluator where the user can rate the generated questions; the results of these evaluations can help improve the system further. In addition, the Wh questions have been evaluated using the BLEU score to determine whether the automatically generated questions closely resemble human-generated ones.

This system can be used to ease question generation and in self-assessment settings where students evaluate their own conceptual understanding. Apart from educational use, it would also be helpful in building chatbot-based applications. This work can help gauge how well a candidate understands a given concept and how that understanding can be improved.
We have taken a simple yet effective approach to generate the questions. Our evaluation results show that our model works well on simpler sentences.
Project Objectives
- To design an evaluation model for the Automatic Question Generation model
- To validate the designed AQG models for engineering subjects
- To use various datasets to validate the benchmarks
- To evaluate automatically generated MCQ and Wh questions
Fill in the Blank and MCQ Type Questions
We have used the SQuAD 1.0 dataset, which contains about 100,000 questions generated from Wikipedia articles. Intuitively, the task of selecting a probable answer is similar to tagging a word as spam or not spam, so we apply binary classification to each input word to tag whether it is an answer or not. For this task, each non-stop word from the paragraphs of the SQuAD dataset was extracted, and features such as POS tag, shape, word count, and NER tag were added, along with a label 'isAnswer'. Using this data, we trained scikit-learn's Gaussian Naive Bayes classifier to tag each word as a potential pivotal answer or not. An advantage of Naive Bayes is that it also gives the probability of each word, which is used to choose the most probable pivotal answer. Distractors are generated using pre-trained word embeddings and cosine similarity; these words serve as the multiple-choice options. Once trained, the model is saved for later use on user inputs. After the user uploads a document, the content is split into sentences and preprocessed to clean the text. These sentences are fed to the saved model to predict the pivotal answers, and the results are formatted and displayed to the user.
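The classification step can be sketched as follows. The feature set here (word length, title case, digit flag, a numeric POS code) and the toy training rows are illustrative stand-ins for the SQuAD-derived features described above, not the project's actual feature pipeline:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def word_features(word, pos_code):
    # Simplified stand-ins for the POS/shape/NER features described above
    return [len(word), int(word.istitle()), int(word.isdigit()), pos_code]

# Toy rows of (word, pos_code, isAnswer); real training uses SQuAD paragraphs
train = [
    ("Mars", 1, 1), ("planet", 0, 0), ("fourth", 2, 0),
    ("Sun", 1, 1), ("solar", 0, 0), ("system", 0, 0),
    ("1965", 3, 1), ("called", 4, 0),
]
X = np.array([word_features(w, p) for w, p, _ in train])
y = np.array([label for _, _, label in train])

clf = GaussianNB().fit(X, y)

# predict_proba supplies the per-word probability used to pick the most
# probable pivotal answer from the candidates
candidates = [("Jupiter", 1), ("orbits", 4)]
probs = clf.predict_proba(
    np.array([word_features(w, p) for w, p in candidates]))[:, 1]
best = candidates[int(np.argmax(probs))][0]
print(best)
```

On this toy data the title-cased proper noun scores far higher than the verb, mirroring how the real classifier surfaces entity-like pivotal answers.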
Example Input Sentence: “The fourth planet from the Sun in the Earth's solar system is Mars, which is sometimes called the Red Planet.”
Pivotal answer chosen: Mars
Options generated: Moon, Jupiter, Saturn
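The distractor step can be illustrated with cosine similarity over word vectors. The tiny hand-made vectors below stand in for real pre-trained embeddings (e.g. GloVe); only the similarity ranking is the point:

```python
import numpy as np

# Toy embedding table; a real system loads pre-trained vectors instead
embeddings = {
    "mars":    np.array([0.90, 0.10, 0.00]),
    "jupiter": np.array([0.80, 0.20, 0.10]),
    "saturn":  np.array([0.85, 0.15, 0.05]),
    "moon":    np.array([0.70, 0.30, 0.20]),
    "table":   np.array([0.00, 0.10, 0.90]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def distractors(answer, k=3):
    # Rank every other vocabulary word by similarity to the pivotal answer
    sims = [(w, cosine(embeddings[answer], v))
            for w, v in embeddings.items() if w != answer]
    sims.sort(key=lambda t: t[1], reverse=True)
    return [w for w, _ in sims[:k]]

opts = distractors("mars")
print(opts)  # the semantically close planet words, not "table"
```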
Wh-Type Questions
For Wh question generation, the sentences are filtered, since the entire text cannot be used to generate questions. We use the top sentences, identified with the TextRank algorithm, which extracts the most important sentences in the text. General preprocessing involves tokenizing the uploaded document with NLTK; the words are then tagged using POS and NER tagging. The question generation procedures are further classified by sentence structure, each with its own transformation rules and algorithms:
- Named Entity Recognition (NER) based algorithm
- Discourse marker based algorithm
- Non-discourse marker based algorithm
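The TextRank sentence selection step can be sketched as below. The word-overlap similarity and damping factor (0.85) follow the original TextRank formulation; the hand-written graph and power iteration are illustrative, not the project's actual implementation:

```python
import math

def similarity(s1, s2):
    # Word-overlap similarity normalised by sentence lengths (TextRank style)
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    if overlap == 0:
        return 0.0
    return overlap / (math.log(len(w1) + 1) + math.log(len(w2) + 1))

def textrank(sentences, d=0.85, iters=50):
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0 for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    scores = [1.0] * n
    for _ in range(iters):  # PageRank-style power iteration
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(sim[j])
                if sim[j][i] > 0 and out > 0:
                    rank += sim[j][i] / out * scores[j]
            new.append((1 - d) + d * rank)
        scores = new
    # Return sentence indices, most important first
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

sentences = [
    "Mars is the fourth planet from the Sun.",
    "Mars is sometimes called the Red Planet.",
    "The weather was pleasant yesterday.",
]
order = textrank(sentences)
print(sentences[order[0]])  # one of the mutually reinforcing Mars sentences
```

The two topically related sentences reinforce each other through their shared vocabulary, so the off-topic sentence is ranked last and filtered out before question generation.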
Evaluation
To evaluate the quality of the generated questions and the performance of the whole system, we have used two distinct approaches:
- Automated evaluation of Wh questions using BLEU
- Human evaluation of FIB and Wh questions
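The automated BLEU evaluation can be sketched in a few lines. This is a simplified modified n-gram precision with a brevity penalty, computed only up to bigrams; the actual system would typically rely on a library implementation such as nltk.translate.bleu_score:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    cand, ref = candidate.lower().split(), reference.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        # Modified precision: clip candidate n-gram counts by the reference
        overlap = sum(min(count, r[g]) for g, count in c.items())
        precisions.append(overlap / max(sum(c.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

score = bleu("what is the fourth planet from the sun",
             "which is the fourth planet from the sun")
print(round(score, 3))
```

A generated question that differs from the human reference by a single word still scores high, which is exactly the "close resemblance" the BLEU check is meant to capture.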
- Generates valid questions automatically
- Saves time
- Supports exam preparation using different scenario-based questions
- Can be implemented in the healthcare education system to expedite preparation for student exams and save time
- A valid AQG can also be used in virtual assistants and various dialogue systems
- Our model will save human time and effort in designing an error-free model
We will build a system that accepts a text document from the user and then generates questions. The document may contain a passage or an extract on any topic. The architecture is divided into three modules:

Authentication - The user must sign up to create an account before using the QG system. Authentication is handled by Django authentication, or the user can log in with a Google account.

Question Generator - After logging in, the user uploads a text document, which cannot contain images or non-textual special characters. The user can download or rate the generated questions; based on that choice, the question file or the ratings are saved to the local disk. The user also chooses the maximum number of questions to generate, which cannot exceed the number of sentences in the document. The questions are of two types: fill-in-the-blank (FIB) and Wh-type questions.
The FIB model uses machine learning to generate FIB questions. It identifies the keyword using classification, replaces it with a blank line, and uses the remaining sentence as the FIB question. It also generates wrong answers (distractors) from the keyword so that the question can be used as an MCQ. The Wh module first extracts the top sentences, preprocesses the text, and then generates questions based on sentence type: sentences are handled by NER-based, discourse marker based, or non-discourse marker based algorithms, each applying transformation rules suited to the sentence structure.
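The blank-substitution step described above can be sketched as follows; the regex replacement and the option shuffling are illustrative simplifications of the actual formatting logic:

```python
import re
import random

def make_fib(sentence, answer, distractors):
    # Replace the first whole-word occurrence of the pivotal answer with a blank
    question = re.sub(r"\b" + re.escape(answer) + r"\b", "_____", sentence, count=1)
    # The correct answer is mixed in with the distractors to form MCQ options
    options = distractors + [answer]
    random.shuffle(options)
    return question, options

q, opts = make_fib(
    "The fourth planet from the Sun is Mars, sometimes called the Red Planet.",
    "Mars",
    ["Moon", "Jupiter", "Saturn"],
)
print(q)
```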
Evaluation - Once the questions are generated, the user can rate them on criteria such as answerability and grammatical correctness; this helps improve the system. Apart from this, we have also computed BLEU scores to compare our questions to human-generated ones.
Final Deliverable of the Project: Software System
Core Industry: Education
Other Industries: IT, Medical, Health, Others
Core Technology: Artificial Intelligence (AI)
Other Technologies: Artificial Intelligence (AI), Augmented & Virtual Reality
Sustainable Development Goals: Quality Education

Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Google Colab Pro | Miscellaneous | 4 | 1300 | 5200 |
| Nvidia Quadro P1000 | Equipment | 1 | 70000 | 70000 |
| Thesis printing and binding | Miscellaneous | 4 | 1200 | 4800 |
| Total (in Rs) | | | | 80000 |