Book Prism

2025-06-28 16:25:43 - Adil Khan

Project Title

Book Prism

Project Area of Specialization

Artificial Intelligence

Project Summary

Building machines that perform human functions, such as reading, is an old dream. Over the last five decades, machine learning has turned that dream into reality, and there are now several techniques and algorithms for training a machine to perform tasks the way humans do. Listening is the first language skill that we acquire, and listening to audiobooks can make learning easier and more entertaining.
Our application provides three modules: an image processing module, a voice processing module, and a module that syncs text and speech. The image processing module converts an image into text, and the voice processing module converts that text into sound. The last module synchronizes the text with the speech and highlights the text. Optical character recognition (OCR) and speech synthesis are the two main components used in these modules. OCR is the process of converting scanned images of machine-printed or handwritten text into machine-readable text. Speech synthesis is the artificial production of human speech.
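The synchronization step described above can be sketched as follows. This is only a minimal illustration, not the project's actual method: it assumes the total audio duration is known and spreads it across the words in proportion to their length. The names `WordTiming`, `estimate_word_timings` and `word_index_at` are hypothetical. If the chosen speech engine can report real word timings, those would replace this estimate.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def estimate_word_timings(text: str, audio_seconds: float) -> list[WordTiming]:
    """Assumption: each word takes a share of the audio proportional to its length."""
    words = text.split()
    total_chars = sum(len(w) for w in words)
    timings: list[WordTiming] = []
    elapsed = 0.0
    for w in words:
        duration = audio_seconds * len(w) / total_chars
        timings.append(WordTiming(w, elapsed, elapsed + duration))
        elapsed += duration
    return timings


def word_index_at(timings: list[WordTiming], t: float) -> int | None:
    """Return the index of the word to highlight at playback time t, or None."""
    if not timings or t < 0 or t >= timings[-1].end:
        return None
    starts = [w.start for w in timings]
    # The current word is the last one whose start time is <= t.
    return bisect_right(starts, t) - 1


timings = estimate_word_timings("the cat sat", audio_seconds=3.0)
current = word_index_at(timings, 1.5)  # index of the word spoken at 1.5 s
```

A player would call `word_index_at` with the current playback position and highlight the word at the returned index.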

Project Objectives

The main goals and objectives of our project are:
1. To perform image processing that converts a given image into text.
2. To perform text processing that analyzes, normalizes and transcribes the text into a phonetic or other linguistic representation.
3. To generate speech from the text produced by OCR.
4. To synchronize text and speech so that the text can be highlighted.
5. To highlight the text so that it is clearly visible to the user.
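The normalization part of objective 2 could look roughly like the sketch below. It is an assumption-laden illustration, not the project's implementation: the abbreviation table and the digit-by-digit reading of numbers are hypothetical simplifications, and the function name `normalize_text` is invented for this example. It does not attempt phonetic transcription.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"mr.": "mister", "mrs.": "missus", "dr.": "doctor"}
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def normalize_text(raw: str) -> str:
    """Prepare OCR output for speech synthesis (simplified sketch)."""
    # Rejoin words that OCR left hyphenated across a line break.
    # Limitation: genuine hyphenated compounds split at a line end are merged too.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    tokens = []
    for token in text.split(" "):
        key = token.lower()
        if key in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[key])
        elif re.fullmatch(r"[0-9]+", token):
            # Assumption: numbers are read digit by digit.
            tokens.append(" ".join(DIGIT_NAMES[int(d)] for d in token))
        else:
            tokens.append(token)
    return " ".join(tokens)


cleaned = normalize_text("Dr.  Smith has\n3 won-\nderful cats")
```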

Project Implementation Method

This project consists of creating a web application for listening to audiobooks using speech synthesis and text highlighting. The input is given either as text or as an image. The application works through the following modules. The first is an image processing module that converts the image into text. The second is a voice processing module that generates speech from the text. The third removes garbage text from the storybook, which includes page numbers and the storybook title at the top of each page. The fourth synchronizes text and speech. These modules could be implemented using machine learning (ML). Once the speech is generated, we synchronize it with the text and highlight each word at the moment it is spoken.
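The garbage-text removal module could start from simple rules before any machine learning is applied. The sketch below is one such rule-based baseline under stated assumptions: pages arrive as plain text, the book title is known, and the title only appears as a header within the first few lines. The function name `clean_page` and the `header_lines` parameter are hypothetical.

```python
import re

# Matches lines that contain only a page number, such as "12" or "Page 12".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)


def clean_page(page_text: str, book_title: str, header_lines: int = 3) -> str:
    """Drop page-number lines and a repeated title near the top of a page."""
    title = book_title.strip().casefold()
    kept = []
    for i, line in enumerate(page_text.splitlines()):
        stripped = line.strip()
        # Limitation: any line consisting only of a number is treated as a page number.
        if PAGE_NUMBER.match(stripped):
            continue
        # Only the first few lines are checked, so the title inside the story is kept.
        if title and i < header_lines and stripped.casefold() == title:
            continue
        kept.append(line)
    return "\n".join(kept).strip()


page = "The Little Fox\n12\nOnce upon a time\nthere was a fox.\nPage 13"
story_text = clean_page(page, "The Little Fox")
```

Checking the title only near the top of the page is a deliberate choice here, so that a sentence in the story that happens to equal the title is not removed.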

Benefits of the Project

The main goal of our project is to build a scalable web application that lets users listen to the audio of a book. We aim to keep things as simple as we can. Because the users of our app will be children, we will use simple graphical user interfaces. As a web application, it must serve a large amount of data in a very short time and process requests from several users at once. Our objective is therefore to keep image upload time and video loading time low enough that the user does not find the application slow. We will use Python and the Django framework to build the web application.

Technical Details of Final Deliverable

Product functions are divided into two categories, depending on the type of user:

Administration:

  1. Log in to the account
  2. Log out of the account
  3. Deactivate the account
  4. Remove a user
  5. Change the password
  6. View user statistics

User:

  1. Sign up for an account
  2. Log in to the account
  3. Log out of the account
  4. Upload the book
  5. Listen to the contents of the book
  6. Delete the book
  7. Download the book
  8. Remove the garbage text
  9. Search the book
  10. Pause audio
  11. Restart audio
  12. Rate the book
  13. Deactivate the account
  14. Change the password
  15. View their own information
  16. Make the book public
  17. Make the book private
  18. View the highlighted text

Performance

The system will be efficient and optimized for performance, with response times kept as low as possible. The number of users that can access the system at one time depends on the limit set by the server.

Reliability

The user will get an accurate result for the uploaded book. The system generates the video in the correct format.

Final Deliverable of the Project and Beneficiaries
Administrators, students, and book enthusiasts at a university or college are the primary beneficiaries of this system.

Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries: Education
Core Technology: Artificial Intelligence (AI)
Other Technologies: Others
Sustainable Development Goals: Quality Education

Required Resources
Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
Domain Registration (1 Year) | Equipment | 1 | 8000 | 8000
Python And Django Framework For Beginners Complete Course | Miscellaneous | 1 | 3000 | 3000
The Complete ReactJs Course - Basics to Advanced (2021) | Miscellaneous | 1 | 3000 | 3000
Front End Web Development Ultimate Course 2021 | Miscellaneous | 1 | 4000 | 4000
SSL Encryption (1 Year) | Equipment | 1 | 7000 | 7000
Hosting For Website | Equipment | 1 | 8000 | 8000
Total (in Rs) | | | | 33000
