TY - GEN
T1 - Health Data Analytics with an Opportunistic Big Data Algorithm
AU - Chalumporn, Gantaphon
AU - Hewett, Rattikorn
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/7/1
Y1 - 2020/7/1
N2 - In a data-driven society, health data can have a profound impact on public safety policies, epidemic modeling, and the advancement of health science and medicine. This paper presents an approach to automatically elucidating useful information from "Big" health data. In particular, we analyze manufactured cosmetic products containing chemicals that are known or suspected to cause cancer, birth defects, or developmental and reproductive harm. Our analysis is based on the Apriori algorithm, the heart of the popular Association Rule Mining technique, which discovers associations among sets of influencing factors. However, with the rapid growth of data volumes, including ours, existing data analytics algorithms designed for in-memory data are not adequate. Most Big Data analytics algorithms are implemented on the MapReduce framework for execution in parallel and distributed environments. Unlike the traditional implementation, our approach employs an opportunistic MapReduce-based Apriori algorithm to fully exploit parallelism. The paper describes the algorithm and presents our findings, from 113,179 data instances, in terms of both the execution times and the discovered associations among product profiles. For a support threshold of 10% (5%), 20 (53) association rules are obtained, with execution time improved over that of the traditional MapReduce-based algorithm by 14.6% (40.3%) on average over three machines.
AB - In a data-driven society, health data can have a profound impact on public safety policies, epidemic modeling, and the advancement of health science and medicine. This paper presents an approach to automatically elucidating useful information from "Big" health data. In particular, we analyze manufactured cosmetic products containing chemicals that are known or suspected to cause cancer, birth defects, or developmental and reproductive harm. Our analysis is based on the Apriori algorithm, the heart of the popular Association Rule Mining technique, which discovers associations among sets of influencing factors. However, with the rapid growth of data volumes, including ours, existing data analytics algorithms designed for in-memory data are not adequate. Most Big Data analytics algorithms are implemented on the MapReduce framework for execution in parallel and distributed environments. Unlike the traditional implementation, our approach employs an opportunistic MapReduce-based Apriori algorithm to fully exploit parallelism. The paper describes the algorithm and presents our findings, from 113,179 data instances, in terms of both the execution times and the discovered associations among product profiles. For a support threshold of 10% (5%), 20 (53) association rules are obtained, with execution time improved over that of the traditional MapReduce-based algorithm by 14.6% (40.3%) on average over three machines.
KW - Association rule mining
KW - Big Data Algorithms
KW - MapReduce
UR - http://www.scopus.com/inward/record.url?scp=85123041423&partnerID=8YFLogxK
U2 - 10.1145/3406601.3406628
DO - 10.1145/3406601.3406628
M3 - Conference contribution
AN - SCOPUS:85123041423
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 11th International Conference on Advances in Information Technology, IAIT 2020
PB - Association for Computing Machinery
T2 - 11th International Conference on Advances in Information Technology, IAIT 2020
Y2 - 1 July 2020 through 3 July 2020
ER -