PL EN


2016 | 296 | 86-121
Article title

PaX-DBSCAN: a proposed algorithm for improved clustering

Content
Title variants
PL
PaX-DBSCAN: propozycja algorytmu dla doskonalonego grupowania
Languages of publication
EN
Abstracts
EN
We focused on applying parallel computing technique to the bulk loading of X-tree in other to improve the performance of DBSCAN clustering algorithm. We have given a full description of how the system can be archived. We proposed a new parallel algorithm for DBSCAN and another algorithm to extend the X-tree spatial indexing structure. Spatial database systems incorporate space in database systems, they support nontraditional data types and more complex queries, therefore in order to optimise such systems for efficient information processing and retrieval, appropriate techniques must be adopted to facilitate the construction of suitable index structures.
PL
W artykule autorzy skupiają swoją uwagę na zastosowaniu techniki przetwarzania równoległego przy wykorzystaniu struktur drzewiastych X-tree i algorytmu bulk loading. Zaproponowano nowy algorytm przetwarzania równoległego DBSCAN i drugi algorytm dla rozszerzania struktur indeksowania przestrzennego. Algorytm grupowania DBSCAN jest efektywnym algorytmem grupowania dla Systemów Przestrzennych Baz Danych, który ma możliwość wykrywania zakłóceń i nie wymaga znacznej liczby skupień wcześniej ustalonych, jednakże działanie algorytmu zmienia się, gdy rozmiar danych jest duży. Ten algorytm może nie działać optymalnie, jeśli niewłaściwe wartości są wybrane dla minpts i eps. Dlatego nowy zaproponowany algorytm powinien eliminować te ograniczenia.
Year
Volume
296
Pages
86-121
Physical description
Contributors
  • University of Huddersfield, UK. Department of Computer Science
author
  • University of Huddersfield, UK. Department of Computer Science
References
  • I. Lungu, A. Velicanu, Spatial Database Technology Used in Developing Geographic Information Systems, The 9th International Conference on Informatics in Economy – Education, Research & Business Technologies, Academy of Economic Studies, Bucharest, 7-8 May 2009, pp. 728-734.
  • GIS Geography, GIS Spatial Data Types: Vector vs Raster, 2016, http://gisgeography.com/spatial-data-types-vector-raster/ (accessed: 22.11.2016).
  • A. Gottlieb & G.S. Almasi, Highly Parallel Computing, Benjamin-Cummings, Redwood City, CA 1989.
  • R.H. Güting, C, “The VLDB Journal − The International Journal on Very Large Data Bases” 1994, Vol. 3(4), pp. 357-399.
  • A. Velicanu Belciu, S. Olaru, Optimizing Spatial Databases, 2011, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1800758 (accessed: 20.11.2016).
  • K. Kailing, H.P. Kriegel, P. Kröger, Density-Connected Subspace Clustering for High-Dimensional Data [in:] Proceedings 4th SIAM International Conference on Data Mining, Vol. 4, Lake Buena Vista, FL 2004, pp. 246-257.
  • P. Verma, Y.K. Jain, High Dimensional Object Analysis Using Rough-Set Theory and Grey Relational Clustering Algorithm, “International Journal of Advanced Research in Computer and Communication Engineering” 2016, Vol. 5(5), pp. 404-410.
  • J. Liu, New Approaches for Clustering High Dimensional Data, Doctoral dissertation, University of North Carolina, Chapel Hill 2006.
  • T. Liu, Fast Nonparametric Machine Learning Algorithms for High-Dimensional Massive Data and Applications, No. CMU-CS-06-124, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 2006.
  • J. Han, M. Kamber, Data Mining Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA 2001. pp. 335391.
  • U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From Data Mining to Knowledge Discovery in Databases, ”AI Magazine” 1996, Vol. 17(3), p. 37.
  • M.M.A. Patwary, D. Palsetia, A. Agrawal, W.K. Liao, F. Manne, A. Choudhary, A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure [in:] International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Salt Lake City, UT, November, 2012, pp. 1-11.
  • M. Ester, H.P. Kriegel, J. Sander, X. Xu, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, “Kdd” August 1996, Vol. 96, No. 34, pp. 226-231.
  • Rough Sets and Current Trends in Computing, M. Szczuka, M. Kryszkiewicz, R. Jensen, Q. Hu (eds.), Proceedings of the 7th International RSCTC Conference, LNAI 6086, Springer Verlag, Berlin Heidelberg 2010, pp. 60-69.
  • S. Vijayalaksmi, M. Punithavalli, A Fast Approach to Clustering Datasets Using DBSCAN and Pruning Algorithms, “International Journal of Computer Applications” 2012, Vol. 60(14).
  • P. Berkhin, A Survey of Clustering Data Mining Techniques [in:] Grouping Multidimensional Data, J. Kogan, Ch. Nicholas, M. Teboulle (eds.), Springer Verlag, Berlin-Heidelberg 2006, pp. 25-71.
  • M. Ester, H.P. Kriegel, J. Sander, Knowledge Discovery in Spatial Databases [in:] Mustererkennung 1999, Springer Verlag, Berlin-Heidelberg 1999, pp. 1-14.
  • N. Mamoulis, Spatial Data Management, 1st ed., Morgan & Claypool Publishers, US 2012.
  • S. Berchtold, D.A. Keim, H.P. Kriegel, An Index Structure for High-Dimensional Data, “Readings in Multimedia Computing and Networking” 2001, pp 451-462.
  • H. Samet, Foundations of Multidimensional and Metric Data Structures, Morgan Kaufmann, Burlington, MA 2006.
  • D. Taniar, C.H. Leung, W. Rahayu, S. Goel, High Performance Parallel Database Processing and Grid Databases, Vol. 67, John Wiley & Sons, New York 2008.
  • D. Jagli, Parallel Database, 2013, http://www.slideshare.net/dhanajagli1/paralleldatabase> (accessed: 26.10.2016).
  • A. Papadopoulos, Y. Manolopoulos, Parallel Bulk-Loading of Spatial Data, “Parallel Computing” 2003, Vol. 29(10), pp. 1419-1444.
  • Z. Qin, Z. Ershun, H. Yaohuan, Research on Parallel Bulk-Loading R-Trees Based on Partition Technology of Database, “The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences” 2008, Vol. 37, Part B4.
  • D. Achakeev, M. Seidemann, M. Schmidt, B. Seeger, Sort-Based Parallel Loading of R-Trees [in:] Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, ACM, Redondo Beach, CA 2012, pp. 62-70.
  • B. Barney, Introduction to Parallel Computing, “Lawrence Livermore National Laboratory” 2010, Vol. 6(13), pp. 1-34.
  • F. Provost, T. Fawcett, Data Science for Business: What You Need To Know about Data Mining and Data-Analytic Thinking, O’Reilly Media, Sevastopol, CA 2013.
  • Apache Hadoop, 2016, http://hadoop.apache.org/index.html (accessed: 28.10.2016).
  • R. Lammel, Google’s Mapreduce Programming Model – Revisited, “Science of Computer Programming” 2008. Vol. 70, pp. 1-30.
  • K. Lee, R.K. Ganti, M. Srivatsa, L. Liu, Efficient Spatial Query Processing for Big Data [in:] Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems – SIGSPATIAL, Dallas – Fort Worth, TX 2014, pp. 469-472.
  • S. Shekhar, V. Gunturi, M.R. Evans, K. Yang, Spatial Big-Data Challenges Intersecting Mobility and Cloud Computing [in:] Proceedings of the 11th ACM International Workshop on Data Engineering for Wireless and Mobile Access − MobiDE’ 12, Scottsdale, AZ 2012.
  • S. Shekhar, M.R. Evans, V. Gunturi, K. Yang, D.C. Cugler, Benchmarking Spatial Big Data [in:] T. Rabl, M. Poess, C. Baru, H.-A. Jacobsen (eds.), Specifying Big Data Benchmarks, Springer Verlag, Berlin-Heidelberg 2014, pp. 81-93.
  • Y. Wang, S. Wang, D. Zhou, Retrieving and Indexing Spatial Data in the Cloud Computing Environment [in:] The First International Conference on Cloud Computing, Springer Verlag, Berlin-Heidelberg 2009.
  • L. Zhao, L. Chen, R. Ranjan, K.K.R. Choo, J. He, Geographical Information System Parallelization for Spatial Big Data Processing: A Review, “Cluster Computing” 2016, Vol. 19(1), pp. 139-152.
  • Maitrey S., Jha, C.K. (2015). Handling Big Data Efficiently by Using Map Reduce Technique, Paper presented at the The 2015 IEEE International Conference on Computational Intelligence & Communication Technology (CICT), Ghaziabad, IN.
  • Y. Zhong, J. Han, T. Zhang, Z. Li, J. Fang, G. Chen, Towards Parallel Spatial Query Processing for Big Spatial Data [in:] 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), IEEE, Piscataway, NJ 2012, pp. 2085-2094.
  • W. Tang, W. Feng, Parallel Map Projection of Vector-Based Big Spatial Data: Coupling Cloud Computing with Graphics Processing Units, “Computers, Environment and Urban Systems” 2014.
  • H. Tan, W. Luo, H. Mao, L.M. Ni, On Packing Very Large R-Trees [in:] IEEE 13th International Conference on Mobile Data Management, Bengaluru, Karnataka 2012, pp. 99-104.
  • S. Li, S. Dragicevic, F.A. Castro, M. Sester, S. Winter, A. Coltekin, C. Pettit, Geospatial Big Data Handling Theory and Methods: A Review and Research Challenges, ”ISPRS Journal of Photogrammetry and Remote Sensing” 2016, Vol. 115, pp. 119-133.
  • P. Ogden, D. Thomas, P. Pietzuch, AT-GIS: Highly Parallel Spatial Query Processing with Associative Transducers, SIGMOD’16, , San Francisco, CA June 26 − July 1, 2016.
  • M. Chen, X. Gao, H. Li, Parallel DBSCAN with Priority R-Tree [in:] The 2nd IEEE International Conference on IEEE Information Management and Engineering (ICIME), Chengdu 2010, pp. 508-511.
  • B. Welton, E. Samanas, B.P. Miller, Mr. Scan: Extreme Scale Density-Based Clustering Using a Tree-Based Network of GPGPU Nodes [in:] Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, ACM, Denver, CO November 2013, p. 84.
  • M. Noticewala, D. Vaghela, MR-IDBSCAN: Efficient Parallel Incremental DBSCAN Algorithm using MapReduce, “International Journal of Computer Applications” 2014, Vol. 93(4).
  • X. Xu, J. Jäger, H.P. Kriegel, A Fast Parallel Clustering Algorithm for Large Spatial Databases [in:] High Performance Data Mining, Springer Verlag, US 1999, pp. 263-290.
  • L.S. El-Sayed, H.M. Abdul-Kader, S.M. El-Sayed, Performance Analysis of Spatial Indexing in the Cloud, “International Journal of Computer Applications”, 2015, Vol. 118(4), pp. 1-4.
  • W.W. Song, B.X. Jin, S.H. Li, X.Y. Wei, D. Li, F. Hu, Building Spatiotemporal Cloud Platform for Supporting GIS Application, “ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences” 2015, Vol. 2(4), pp. 55-62.
  • M. Zaharia, M. Chowdhury, M.J. Franklin, S. Shenker, I. Stoica, Spark: Cluster Computing with Working Sets, “HotCloud” 2010, Vol. 10, pp. 1-7.
  • S. You, J. Zhang, L. Gruenwald, GPU-Based Spatial Indexing and Query Processing Using R-Trees [in:] BigSpatial'13 Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, Orlando, FL 2013.
  • A. Singh, D. Garg, Implementation and Performance Analysis of Exponential Tree Sorting, “International Journal of Computer Applications” June 2011, Vol. 24, No. 3, pp. 34-38.
  • K. Akkaya, A. Yazici, An Indexing Method for Spatial Databases, “XIV. International Symposium on Computer and Information Sciences (ISCIS'99)” April 1999.
  • V. Gaede, O. Günther, Multidimensional Access Methods, “ACM Computing Surveys (CSUR)” 1998,Vol. 30(2), pp. 170-231.
  • T. Lee, S. Lee, OMT: Overlap Minimizing Top-down Bulk Loading Algorithm for R-Tree, “CAiSE Short Paper Proceedings” June 2003, Vol. 74, pp. 69-72.
  • X. Liu, J. Han, Y. Zhong, C. Han, X. He, Implementing Webgis on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS [in:] 2009 IEEE International Conference on Cluster Computing and Workshops IEEE, New Orleans, LA August 2009, pp. 1-8.
  • Y. Liu, N. Jing, L. Chen et al., Parallel Bulk-Loading of Spatial Data with MapReduce: An R-Tree Case, “Wuhan University Journal of Natural Science” 2011, 16, pp. 513.
  • N. Roussopoulos, D. Leifker, Direct Spatial Search on Pictorial Databases Using Packed R-Trees, “ACM SIGMOD Record” May 1985, Vol. 14, No. 4, pp. 17-31.
  • I. Kamel, C. Faloutsos, On Packing R-Trees, “Proceedings of the Second International Conference on Information and Knowledge Management” December 1993, pp. 490-499.
  • S.T. Leutenegger, M.A. Lopez, J. Edgington, STR: A Simple and Efficient Algorithm for R-Tree Packing [in:] Proceedings of 13th International Conference on Data Engineering, IEEE, Graz, Austra April 1997, pp. 497-506.
  • D. Achakeev, B. Seeger, P. Widmayer, Sort-Based Query-Adaptive Loading of R-Trees [in:] Proceedings of the 21st ACM International Conference on Information and Knowledge Management, ACM, Maui, HI October 2012, pp. 2080-2084.
  • S. Hua, J. Nan, H. Bin, L. Heng, Z. Jin, GPU-Based Parallel Bulk Loading R-Trees Using STR Method on Fine-Grained Model[J], “Geomatics and Information Science of Wuhan Universty” 2014. Vol. 39(9), pp. 1068-1073.
  • B.C. Giao, D.T. Anh, Improving Sort-Tile-Recusive algorithm for R-tree packing in indexing time series, “2015 IEEE RIVF International Conference on IEEE Computing & Communication Technologies-Research, Innovation, and Vision for the Future (RIVF)” 2015, pp. 117-122.
  • I. Kamel, C. Faloutsos, Parallel R-Trees, “Research Showcase CMU” 1992, Vol. 21, No. 2, pp. 195-204.
  • E.G. Hoel, H. Samet, Performance of Data-Parallel Spatial Operations, “Proceedings of VLDB Conference” 1994, pp. 156-167.
  • B. Schnitzer, S.T. Leutenegger, Master-Client R-Trees: A New Parallel R-Tree Architecture, “Proceedings of SSDBM Conference” 1999, pp. 68-77.
  • P. Apostolos, M. Yannis, Parallel Bulk-Loading of Spatial Data, “Journal of Parallel Computing” 2003, Vol. 29(10), pp. 1419-1444.
  • L. Luo, M.D.F. Wong, L. Leong, Parallel Implementation of R-Trees on the GPU [in:] 17th Asia and South Pacific Design Automation Conference, Sydney 2012, pp. 353-358.
  • S. Berchtold, D. A. Keim, H.-P. Kriegel, The X-tree: An Index Structure for High-Dimensional Data [in:] Proceedings of the 22nd VLDB Conference, Mumbai, India1996, pp. 28-39.
  • P. Ciaccia, M. Patella, P. Zezula, M-tree An Efficient Access Method for Similarity Search in Metric Spaces [in:] Proceedings of the 13th International Conference on Very Large Data Bases, Athens 1997.
  • K.S. Candan, M.L. Sapino, Data Management for Multimedia Retrieval, Cambridge University Press, Cambridge 2010.
  • Y. Manolopoulos, A. Nanopoulos, A.N. Papadopoulos, Y. Theodoridis, R-Trees: Theory and Applications, Springer Science and Business Media, Berlin-Heidelberg 2010.
  • A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, V. Sunderam, PVM: Parallel Virtual Machine − A User’s Guide and Tutorial for Networked Parallel Computing, 3rd ed., The MIT Press, Cambridge, MA − London, England 1996.
Document Type
Publication order reference
Identifiers
ISSN
2083-8611
YADDA identifier
bwmeta1.element.cejsh-345a93a1-0ae1-4022-b11a-b2435bf9fe48
JavaScript is turned off in your web browser. Turn it on to take full advantage of this site, then refresh the page.