2020 | 15 | 1-22

Article title

Domain specific key feature extraction using knowledge graph mining

Languages of publication

EN

Abstracts

EN
In the field of text mining, many novel feature extraction approaches have been proposed. This paper presents a new feature extraction algorithm based on weighted graph mining. To ensure both the effectiveness of the extracted features and computational efficiency, only the most effective graphs are considered, namely those containing the maximum number of triangles under a predefined relational criterion. The technique combines the relation between the words surrounding a product aspect with the lexicon-based connections among those words; together, these form a relational triangle. The element covered by the maximum number of triangles is taken as a prime feature. On a limited set of domain-specific data, the proposed algorithm performs more than three times better than TF-IDF.
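
The abstract describes the method only at a high level, so the following Python sketch illustrates the general idea of ranking candidate features by how many relational triangles cover them; it is not the authors' algorithm. The co-occurrence window, the `min_weight` pruning threshold, the toy `LEXICON`, and the rule that a triangle counts only if it mixes a co-occurrence edge with a lexicon edge are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical lexicon of word pairs assumed to be semantically related.
# A real system would draw these from a domain lexicon; this is a toy stand-in.
LEXICON = {("battery", "life"), ("battery", "charge"), ("screen", "bright")}


def build_graph(sentences, window=2, min_weight=1):
    """Return {frozenset({u, v}): set_of_relation_types}.

    "cooc": u and v occur within `window` tokens of each other at least
            `min_weight` times (hypothetical pruning threshold).
    "lex":  (u, v) is a LEXICON pair and both words occur in the corpus.
    """
    weight = defaultdict(int)
    for tokens in sentences:
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if u != v:
                    weight[frozenset((u, v))] += 1

    edges = {pair: {"cooc"} for pair, w in weight.items() if w >= min_weight}

    vocab = {tok for tokens in sentences for tok in tokens}
    for a, b in LEXICON:
        if a in vocab and b in vocab:
            edges.setdefault(frozenset((a, b)), set()).add("lex")
    return edges


def relational_triangle_counts(edges):
    """Count, per word, the triangles whose edges mix both relation types.

    Each triangle u < v < w is enumerated exactly once; it is kept only if
    its three edges together carry both a "cooc" and a "lex" relation.
    """
    adj = defaultdict(set)
    for pair in edges:
        a, b = sorted(pair)
        adj[a].add(b)
        adj[b].add(a)

    counts = defaultdict(int)
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            for w in adj[u] & adj[v]:
                if w <= v:
                    continue
                types = (edges[frozenset((u, v))]
                         | edges[frozenset((v, w))]
                         | edges[frozenset((u, w))])
                if {"cooc", "lex"} <= types:
                    for node in (u, v, w):
                        counts[node] += 1
    return dict(counts)


def rank_features(counts):
    """Sort words by triangle count (descending), ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, pre-tokenized review sentences (stop words assumed already removed).
SENTENCES = [
    ["battery", "life", "long"],
    ["battery", "charge", "fast"],
    ["screen", "bright", "battery", "life"],
]

if __name__ == "__main__":
    ranking = rank_features(relational_triangle_counts(build_graph(SENTENCES)))
    print(ranking[0])  # ('battery', 4): the word covered by the most triangles
```

With the toy data, "battery" sits in four mixed triangles and ranks first. Requiring each triangle to combine both relation types is only one possible reading of the abstract's "predefined relational criterion"; a different criterion would change just the `if` test inside `relational_triangle_counts`.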

Year

2020

Volume

15

Pages

1-22

Contributors

  • Samsung Research Institute, Noida, India
  • Samsung Research Institute, Noida, India

References

  • Aggarwal C.C. (2018), Machine Learning for Text, Springer, Cham.
  • Biswas S.K., Bordoloi M., Shreya J. (2018), A Graph-based Keyword Extraction Model Using Collective Node Weight, Expert Systems with Applications, 97, 51-59, https://doi.org/10.1016/j.eswa.2017.12.025.
  • Bonatti P., Decker S., Polleres A., Presutti V. (2018), Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371), Dagstuhl Reports, 8, 29-111.
  • Campolo A., Sanfilippo M., Whittaker M., Crawford K. (2018), AI Now 2017 Report, Symposium and Workshop, January, AI Now Institute at New York University.
  • Campos R., Mangaravite V., Pasquali A., Jorge A., Nunes C., Jatowt A. (2020), YAKE! Keyword Extraction from Single Documents using Multiple Local Features, Information Sciences, 509, 257-289, https://doi.org/10.1016/j.ins.2019.09.013.
  • Dave K., Lawrence S., Pennock D.M. (2003), Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews, Proceedings of the 12th International Conference on World Wide Web, 519-528.
  • Devika R., Subramaniyaswamy V. (2019), A Semantic Graph-based Keyword Extraction Model Using a Ranking Method on Big Social Data, Wireless Networks, https://doi.org/10.1007/s11276-019-02128-x.
  • Feldman R., Dagan I. (1995), Knowledge Discovery in Textual Databases (KDT), Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, Canada, August 20-21, AAAI Press, 112-117.
  • Giarelis N., Kanakaris N., Karacapilidis N. (2020), An Innovative Graph-Based Approach to Advance Feature Selection from Multiple Textual Documents, Artificial Intelligence Applications and Innovations, 583, May 6, 96-106, https://doi.org/10.1007/978-3-030-49161-1_9.
  • Houari M., Rhanoui M., Asri B. (2015), From Big Data to Big Knowledge: The Art of Making Big Data Alive, 1-6, https://doi.org/10.1109/CloudTech.2015.7337001.
  • Htay S.S., Lynn K.T. (2013), Extracting Product Features and Opinion Words Using Pattern Knowledge in Customer Reviews, The Scientific World Journal, Vol. 2013, Article ID 394758, 5 pages, https://doi.org/10.1155/2013/394758.
  • Hulth A. (2003a), Improved Automatic Keyword Extraction Given More Linguistic Knowledge, EMNLP, 216-223.
  • Hulth A. (2003b), Reducing False Positives by Expert Combination in Automatic Keyword Indexing, RANLP, 367-376.
  • Jaideepsinh K., Saini J. (2016), Stop-Word Removal Algorithm and Its Implementation for the Sanskrit Language, International Journal of Computer Applications, 150, 15-17, https://doi.org/10.5120/ijca2016911462.
  • Jia Y., Qui Y., Shang H., Jiang R., Li A. (2018), A Practical Approach to Constructing a Knowledge Graph for Cybersecurity, Engineering, 4(1), 53-60, https://doi.org/10.1016/j.eng.2018.01.004.
  • Jiang X., Hu Y., Li H. (2009), A Ranking Approach to Keyphrase Extraction, SIGIR, 756-757.
  • K-CAP ’19 (2019), Proceedings of the 10th International Conference on Knowledge Capture, September, 131-138, https://doi.org/10.1145/3360901.3364441.
  • Kim K., Hur Y., Kim G., Lim H. (2020), GREG: A Global Level Relation Extraction with Knowledge Graph Embedding, Applied Sciences, 10, 1181.
  • LeCun Y., Bengio Y., Hinton G. (2015), Deep Learning, Nature, 521, 436-444, https://doi.org/10.1038/nature14539.
  • Liu B. (2009), Handbook Chapter: Sentiment Analysis and Subjectivity. Handbook of Natural Language Processing, Marcel Dekker, Inc., New York, NY, USA.
  • Manrique R., Pereira B., Mariño O. (2019), Exploring Knowledge Graphs for the Identification of Concept Prerequisites, Smart Learning Environments, 6, 21, https://doi.org/10.1186/s40561-019-0104-3.
  • Markov A., Last M., Kandel A. (2007), Fast Categorization of Web Documents Represented by Graphs, Advances in Web Mining and Web Usage Analysis, 4811, 56-71.
  • Park D.-H., Kim S. (2008), The Effects of Consumer Knowledge on Message Processing of Electronic Word-of-mouth via Online Consumer Reviews, Electronic Commerce Research and Applications, 7, 399-410.
  • Ramos J. (2003), Using TF-IDF to Determine Word Relevance in Document Queries, Proceedings of the First Instructional Conference on Machine Learning, 1-4.
  • Rose S., Engel D., Cramer N., Cowley W. (2010), Automatic Keyword Extraction from Individual Documents, https://doi.org/10.1002/9780470689646.ch1.
  • Russell S.J., Norvig P. (2003), Artificial Intelligence − A Modern Approach: The Intelligent Agent Book, Prentice-Hall.
  • SAC ’07 (2007), Proceedings of the 2007 ACM Symposium on Applied Computing, March, 807-811, https://doi.org/10.1145/1244002.1244182.
  • Safrin R., Sharmila K.R., Shri Subangi T.S., Vimal E.A. (2017), Sentiment Analysis on Online Product Review, International Research Journal of Engineering and Technology (IRJET), 4, April, 2381-2388.
  • Sammons M., Christodoulopoulos C., Kordjamshidi P., Khashabi D., Srikumar V., Vijayakumar P., Bokhari M., Wu X., Roth D. (2016), Edison: Feature Extraction for NLP, Simplified [in:] N. Calzolari, K. Choukri, H. Mazo, A. Moreno, T. Declerck, S. Goggi, M. Grobelnik, J. Odijk, S. Piperidis, B. Maegaard, J. Mariani (eds.), Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, European Language Resources Association (ELRA), 4085-4092.
  • Shi W., Zheng W., Yu J.X., Cheng H., Zou L. (2017), Keyphrase Extraction Using Knowledge Graphs, Data Science and Engineering, 2, 275-288, https://doi.org/10.1007/s41019-017-0055-z.
  • Sidorov G., Velasquez F., Stamatatos E., Gelbukh A., Chanona-Hernández L. (2013), Syntactic Dependency-Based N-grams as Classification Features [in:] I. Batyrshin, M.G. Mendoza (eds.), Advances in Computational Intelligence, MICAI 2012, Lecture Notes in Computer Science, 7630, Springer, Berlin, Heidelberg, https://doi.org/10.1007/978-3-642-37798-3_1.
  • Turney P.D. (2002), Learning to Extract Keyphrases from Text, CoRR, cs.LG/0212013.
  • Vazirgiannis M., Malliaros F., Nikolentzos G. (2018), GraphRep: Boosting Text Mining, NLP, and Information Retrieval with Graphs, Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2295-2296.
  • Wang Ch., Ma X., Chen J., Chen J. (2018), Information Extraction and Knowledge Graph Construction from Geoscience Literature, Computers & Geosciences, 112, 112-120, https://doi.org/10.1016/j.cageo.2017.12.007.
  • Wang Q., Mao Z., Wang B., Guo L. (2017), Knowledge Graph Embedding: A Survey of Approaches and Applications, IEEE Transactions on Knowledge and Data Engineering, 29(12), December 1, 2724-2743, https://doi.org/10.1109/TKDE.2017.2754499.
  • Wang W., Do D.B., Lin X. (2005), Term Graph Model for Text Classification, Advanced Data Mining and Applications, 19-30.
  • Willemsen L.M., Neijens P.C., Bronner F., de Ridder J.A. (2011), “Highly Recommended!” The Content Characteristics and Perceived Usefulness of Online Consumer Reviews, Journal of Computer-Mediated Communication, 17(1), October 1, 19-38, https://doi.org/10.1111/j.1083-6101.2011.01551.x.
  • Witten I.H., Paynter G.W., Frank E., Gutwin C., Nevill-Manning C.G. (1999), KEA: Practical Automatic Keyphrase Extraction, Proceedings of the Fourth ACM Conference on Digital Libraries, 254-255.
  • Xu J., Kim S., Song M., Jeong M., Kim D., Kang J., Rousseau J.F., Li X., Xu W., Torvik V.I., Bu Y., Chen Ch., Ebeid I.A., Li D., Ding Y. (2020), Building a PubMed Knowledge Graph, Scientific Data, 7, 205, https://doi.org/10.1038/s41597-020-0543-2.
  • Zhao H., Pan Y., Yang F. (2020), Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph, IEEE Access, 8, 168087-168098, https://doi.org/10.1109/ACCESS.2020.3024070.
  • (www 1) https://www.merriam-webster.com/dictionary/adjective (accessed: 1.11.2020).
  • (www 2) Ji S., Pan S., Cambria E., Marttinen P., Yu P.S. (2021), A Survey on Knowledge Graphs: Representation, Acquisition and Applications, IEEE Transactions on Neural Networks and Learning Systems, arXiv:2002.00388 (accessed: 8.11.2020).
  • (www 3) Mäntylä M.V., Graziotin D., Kuutila M. (2018), The Evolution of Sentiment Analysis − A Review of Research Topics, Venues, and Top Cited Papers, Computer Science Review, 27, February, 16-32, arXiv:1612.01556 [cs.CL] (accessed: 10.11.2020).
  • (www 4) http://web.onda.com.br/abveiga/capitulo4-ingles.pdf (accessed: 11.11.2020).
  • (www 5) Mutlu E.C., Oghaz T.A., Rajabi A., Garibay I. (2019), Review on Learning and Extracting Graph Features for Link Prediction, arXiv:1901.03425 (accessed: 11.11.2020).
  • (www 6) https://www.sketchengine.eu/penn-treebank-tagset/ (accessed: 12.11.2020).
  • (www 7) Hellström T., Dignum V., Bensch S. (2020), Bias in Machine Learning: What Is It Good for? https://arxiv.org/pdf/2004.00686.pdf (accessed: 12.11.2020).
  • (www 8) https://www.lexico.com/definition/noun (accessed: 9.11.2020).

Identifiers

ISSN
2084-1531

YADDA identifier

bwmeta1.element.cejsh-27358206-1646-4c49-ac30-f3462d7454f5