This title appears in the Scientific Report: 2023
Please use the identifier: http://dx.doi.org/10.1007/s10489-023-04482-y in citations.
Please use the identifier: http://hdl.handle.net/2128/34242 in citations.
DISCONA: Distributed Sample Compression for Nearest Neighbor Algorithm
| Personal Name(s): | Rybicki, Jedrzej (Corresponding author) / Frenklach, Tatiana / Puzis, Rami |
|---|---|
| Contributing Institute: | Jülich Supercomputing Center; JSC |
| Published in: | Applied intelligence, 53 (2023) 7, S. 14 |
| Imprint: | Dordrecht [u.a.]: Springer Science + Business Media B.V, 2023 |
| DOI: | 10.1007/s10489-023-04482-y |
| Document Type: | Journal Article |
| Research Program: | Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups |
| Link: | OpenAccess / Publikationsportal JuSER |
Sample compression using epsilon nets effectively reduces the number of labeled instances required for accurate classification with nearest neighbor algorithms. However, one-shot construction of an epsilon net can be extremely challenging in large-scale distributed data sets. We explore two approaches for distributed sample compression: one where a local epsilon net is constructed for each data partition and the local nets are then merged during an aggregation phase, and one where a single backbone of an epsilon net is constructed from one partition and aggregates target label distributions from the other partitions. Both approaches are applied to the problem of malware detection in a complex, real-world data set of Android apps using the nearest neighbor algorithm. Examination of the compression rate, computational efficiency, and predictive power shows that a single backbone of an epsilon net attains favorable performance while achieving a compression rate of 99%.
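
The two approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the greedy net construction, the Euclidean distance, the majority vote over aggregated label counts, and every function name (`build_epsilon_net`, `merged_local_nets`, `single_backbone`, `predict`) are assumptions made for illustration only. Each partition is assumed to be a pair `(points, labels)`.

```python
import math
from collections import Counter


def build_epsilon_net(points, eps):
    """Greedy net (assumed construction): keep a point only if it lies
    farther than eps from every center kept so far."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers


def nearest(centers, p):
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def label_centers(centers, partitions):
    """Aggregate label counts per center over all partitions and
    assign each center its majority label."""
    votes = [Counter() for _ in centers]
    for points, labels in partitions:
        for p, y in zip(points, labels):
            votes[nearest(centers, p)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def merged_local_nets(partitions, eps):
    """Approach 1 (sketch): build one net per partition, then merge the
    local centers by running the greedy construction over their union."""
    local = [c for points, _ in partitions
             for c in build_epsilon_net(points, eps)]
    centers = build_epsilon_net(local, eps)
    return centers, label_centers(centers, partitions)


def single_backbone(partitions, eps):
    """Approach 2 (sketch): build the net from the first partition only;
    the other partitions contribute label counts but no new centers."""
    centers = build_epsilon_net(partitions[0][0], eps)
    return centers, label_centers(centers, partitions)


def predict(centers, center_labels, p):
    """1-nearest-neighbor prediction against the compressed sample."""
    return center_labels[nearest(centers, p)]
```

In this sketch the backbone variant never exchanges points between partitions, only label counts, which is one plausible reading of "aggregates target label distributions from the other partitions".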