Document Clustering by Character-Level Feature Learning through Deep Learning
2025-06-28 16:32:11 - Adil Khan
Project Area of Specialization
Artificial Intelligence
Project Summary
This project introduces an approach to document clustering that, to our knowledge, has not been applied before. The project is divided into two major parts. In the first part, we break each document down to the character level and obtain a feature set. In the second part, we pass the features obtained in the first part to another neural network, which performs the clustering task. Clustering is itself performed in two phases: the first phase initializes the parameters, and the second phase performs the clustering, optimizing the results to produce the final clusters.
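To illustrate the first part, the following is a minimal sketch of one common way to break a document down to the character level: one-hot encoding over a fixed alphabet. The alphabet, the `max_len` value, and the function name `encode_document` are assumptions made for illustration; the proposal does not specify them.

```python
# Hypothetical alphabet and sequence length; the proposal does not specify either.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'\"-()"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_document(text, max_len=16):
    """Turn a document into a max_len x len(ALPHABET) one-hot matrix.

    Characters outside the alphabet become all-zero rows, and short
    documents are padded with all-zero rows.
    """
    rows = []
    for ch in text.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Pad to a fixed length so every document has the same shape.
    while len(rows) < max_len:
        rows.append([0] * len(ALPHABET))
    return rows


matrix = encode_document("Deep clustering")
print(len(matrix), len(matrix[0]))
```

A matrix of this shape is the kind of input a character-level convolutional network could consume. The width depends only on the alphabet, not on how many distinct words the corpus contains.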
Project Objectives
- Improve the efficiency of document clustering by trying a new character-level approach.
- Use neural networks to build a better model for the character-level approach.
- Shift the view of textual documents for clustering away from the word-level approach.
- Extract features from documents more efficiently than in traditional document-clustering approaches.
Project Implementation Method
The implementation phase is divided into multiple steps; the major ones are as follows:
1. Research on clustering with neural networks.
2. Initial clustering experiments: without neural networks, with neural networks, and with neural networks from third-party APIs.
3. Literature review of the character-level approach.
4. Research on the compatibility of the character-level approach with neural networks.
5. Implementation of the character-level approach with a neural network, optimizing the network as much as possible.
6. Extraction of the feature set from the character-level neural network.
7. Literature review of deep learning clustering algorithms.
8. Implementation of Deep Embedded Clustering (DEC) on different datasets.
9. Implementation of DEC on our dataset, after first making the dataset compatible with the code.
10. Comparison of our model with state-of-the-art results.
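The clustering phase of DEC can be sketched as follows, using the formulation commonly given for the algorithm: a Student's t soft assignment of embedded points to centroids, a sharpened target distribution, and a KL divergence between the two that training minimizes. This sketch assumes the parameter-initialization phase has already produced embeddings and initial centroids. The function names, the toy values, and `alpha=1.0` are illustrative assumptions, not the project's code.

```python
import math


def soft_assign(embeddings, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of point i to cluster j."""
    q = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def target_distribution(q):
    """Sharpened targets p[i][j] proportional to q[i][j]^2 / f_j, where f_j = sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q), the quantity the clustering phase minimizes."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


# Toy 2-D embeddings and two centroids (made-up values for illustration).
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
labels = [row.index(max(row)) for row in q]
print(labels, kl_divergence(p, q))
```

In a full implementation, the gradient of this KL divergence would update both the encoder weights and the centroids. The sketch only evaluates the quantities once on fixed inputs.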
Benefits of the Project
Currently, most clustering tasks rely on explicit feature learning, which requires training data labeled with different classes. In addition, neural networks are mostly used for classification tasks. Our main objective is to apply a character-level approach to clustering. This means there are fewer unique features in a corpus, but the overall computation is slow because each document expands into many more tokens. The greatest advantage of the character-level approach is that no dictionary has to be built. This reduces space complexity by a large margin, and because no dictionary is maintained, the required space does not have to grow at the same rate as the content.
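The trade-off described above can be seen on a small example. The following sketch counts unique characters and unique words as a corpus grows, together with the total number of tokens each representation produces. The toy corpus and the function names are made up for illustration, and splitting on whitespace is a simplifying assumption about word tokenization.

```python
def vocabulary_sizes(documents):
    """Return (unique characters, unique whitespace-separated words) seen so far."""
    chars, words = set(), set()
    for doc in documents:
        lowered = doc.lower()
        chars.update(lowered)
        words.update(lowered.split())
    return len(chars), len(words)


def token_counts(documents):
    """Return (total characters, total words): the sequence lengths a model must process."""
    n_chars = sum(len(doc) for doc in documents)
    n_words = sum(len(doc.split()) for doc in documents)
    return n_chars, n_words


# Made-up toy corpus for illustration only.
corpus = [
    "neural networks cluster documents",
    "character level features need no dictionary",
    "word level features need a growing dictionary",
]

for n_docs in range(1, len(corpus) + 1):
    subset = corpus[:n_docs]
    print(n_docs, vocabulary_sizes(subset), token_counts(subset))
```

On this corpus the word vocabulary keeps growing with each added document, while the character set can never exceed the size of the alphabet. The price is that character sequences are several times longer than word sequences.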
Technical Details of Final Deliverable
- An optimized convolutional neural network with a character-level approach.
- A neural network with Deep Embedded Clustering (DEC).
- A web application that returns clustering results as datasets are uploaded to it.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| NVIDIA Graphics Card | Equipment | 1 | 50000 | 50000 |
| Standard Desktop System | Equipment | 1 | 20000 | 20000 |
| Web hosting services | Miscellaneous | 1 | 10000 | 10000 |
| Total (in Rs) | | | | 80000 |