Article

Privacy Preservation in Analyzing EHealth Records in Big Data Environment

E. Srimathi and K. A. Apoorva.
International Journal on Recent and Innovation Trends in Computing and Communication, 3 (4): 2421--2427 (April 2015)
DOI: 10.17762/ijritcc2321-8169.1504139

Abstract

The increased use of the Internet and progress in cloud computing create large new datasets of increasing value to business. The data that cloud applications need to process is growing much faster than the available computing power. Hadoop MapReduce has become a powerful computation model for addressing these problems. Many cloud services now require users to share confidential data, such as electronic health records, for research analysis or data mining, which raises privacy concerns. k-anonymity is one of the most widely used privacy models. The scale of data in cloud applications is rising sharply in line with the Big Data trend, making it a challenge for conventional software tools to process such large-scale data within a tolerable elapsed time. As a consequence, it is a challenge for current anonymization techniques to preserve privacy on confidential large-scale data sets, because they do not scale adequately. In this project, we propose a scalable two-phase approach to anonymizing large-scale data sets using a dynamic MapReduce framework, the Top-Down Specialization (TDS) algorithm, and the k-anonymity privacy model. Resource use is optimized in three ways. First, the under-utilization of map and reduce tasks is addressed with Dynamic Hadoop Slot Allocation (DHSA). Second, the performance tradeoff between a single job and a batch of jobs is balanced using Speculative Execution Performance Balancing (SEPB). Third, data locality is improved without any impact on fairness using Slot Pre-Scheduling. Experimental evaluation results demonstrate that with this approach, the scalability, efficiency, and privacy of data sets can be significantly improved over existing approaches.
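
To make the k-anonymity model named in the abstract concrete, the following is a minimal single-machine sketch in Python. It is not the authors' implementation, and it uses neither MapReduce nor the full TDS algorithm (which works over taxonomy trees and scores candidate specializations). The toy records, the field layout, and the function names `generalize`, `is_k_anonymous`, and `specialize_zip` are all hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis). Not data from the paper.
RECORDS = [
    (34, "47677", "flu"),
    (36, "47602", "asthma"),
    (38, "47678", "flu"),
    (52, "47905", "diabetes"),
    (55, "47909", "flu"),
    (58, "47906", "asthma"),
]


def generalize(record, age_width, zip_digits):
    """Coarsen the quasi-identifiers (age, ZIP); keep the sensitive value."""
    age, zip_code, diagnosis = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    zip_prefix = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, zip_prefix, diagnosis)


def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter((age, zip_code) for age, zip_code, _ in records)
    return all(count >= k for count in groups.values())


def specialize_zip(records, k, age_width=10):
    """Simplified top-down pass: start with the ZIP fully masked and reveal
    one more digit at a time for as long as k-anonymity still holds.
    Returns the number of revealed digits, or None if even the fully
    masked data is not k-anonymous."""
    best = None
    for digits in range(0, 6):
        candidate = [generalize(r, age_width, digits) for r in records]
        if not is_k_anonymous(candidate, k):
            break
        best = digits
    return best
```

The design choice mirrors the top-down direction described in the abstract: the data starts in its most general form and is made more specific only while the privacy constraint is still satisfied, so every intermediate state is safe to release.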

BibTeX key: Srimathi_2015
entry type: article
year: 2015
month: april
journal: International Journal on Recent and Innovation Trends in Computing and Communication
number: 4
pages: 2421--2427
publisher: Auricle Technologies, Pvt., Ltd.
volume: 3
DOI: 10.17762/ijritcc2321-8169.1504139
url: http://dx.doi.org/10.17762/ijritcc2321-8169.1504139

Cite this publication

%0 Journal Article
%1 Srimathi_2015
%A Srimathi, E.
%A Apoorva, K. A.
%D 2015
%I Auricle Technologies, Pvt., Ltd.
%J International Journal on Recent and Innovation Trends in Computing and Communication
%K Anonymity Anonymization BigData Data Down MapReduce Specialization Top k
%N 4
%P 2421--2427
%R 10.17762/ijritcc2321-8169.1504139
%T Privacy Preservation in Analyzing EHealth Records in Big Data Environment
%U http://dx.doi.org/10.17762/ijritcc2321-8169.1504139
%V 3
%X The increased use of the Internet and progress in cloud computing create large new datasets of increasing value to business. The data that cloud applications need to process is growing much faster than the available computing power. Hadoop MapReduce has become a powerful computation model for addressing these problems. Many cloud services now require users to share confidential data, such as electronic health records, for research analysis or data mining, which raises privacy concerns. k-anonymity is one of the most widely used privacy models. The scale of data in cloud applications is rising sharply in line with the Big Data trend, making it a challenge for conventional software tools to process such large-scale data within a tolerable elapsed time. As a consequence, it is a challenge for current anonymization techniques to preserve privacy on confidential large-scale data sets, because they do not scale adequately. In this project, we propose a scalable two-phase approach to anonymizing large-scale data sets using a dynamic MapReduce framework, the Top-Down Specialization (TDS) algorithm, and the k-anonymity privacy model. Resource use is optimized in three ways. First, the under-utilization of map and reduce tasks is addressed with Dynamic Hadoop Slot Allocation (DHSA). Second, the performance tradeoff between a single job and a batch of jobs is balanced using Speculative Execution Performance Balancing (SEPB). Third, data locality is improved without any impact on fairness using Slot Pre-Scheduling. Experimental evaluation results demonstrate that with this approach, the scalability, efficiency, and privacy of data sets can be significantly improved over existing approaches.

@article{Srimathi_2015,
  abstract = {The increased use of the Internet and progress in cloud computing create large new datasets of increasing value to business. The data that cloud applications need to process is growing much faster than the available computing power. Hadoop MapReduce has become a powerful computation model for addressing these problems. Many cloud services now require users to share confidential data, such as electronic health records, for research analysis or data mining, which raises privacy concerns. k-anonymity is one of the most widely used privacy models. The scale of data in cloud applications is rising sharply in line with the Big Data trend, making it a challenge for conventional software tools to process such large-scale data within a tolerable elapsed time. As a consequence, it is a challenge for current anonymization techniques to preserve privacy on confidential large-scale data sets, because they do not scale adequately. In this project, we propose a scalable two-phase approach to anonymizing large-scale data sets using a dynamic MapReduce framework, the Top-Down Specialization (TDS) algorithm, and the k-anonymity privacy model. Resource use is optimized in three ways. First, the under-utilization of map and reduce tasks is addressed with Dynamic Hadoop Slot Allocation (DHSA). Second, the performance tradeoff between a single job and a batch of jobs is balanced using Speculative Execution Performance Balancing (SEPB). Third, data locality is improved without any impact on fairness using Slot Pre-Scheduling. Experimental evaluation results demonstrate that with this approach, the scalability, efficiency, and privacy of data sets can be significantly improved over existing approaches.},
  added-at = {2015-08-27T09:01:26.000+0200},
  author = {Srimathi, E. and Apoorva, K. A.},
  biburl = {https://www.bibsonomy.org/bibtex/28787bc6f51fd0786cef9c6d0c9d67e10/ijritcc},
  doi = {10.17762/ijritcc2321-8169.1504139},
  interhash = {d3ebfb9bff5481a07ae89996ab9b4aaf},
  intrahash = {8787bc6f51fd0786cef9c6d0c9d67e10},
  journal = {International Journal on Recent and Innovation Trends in Computing and Communication},
  keywords = {Anonymity Anonymization BigData Data Down MapReduce Specialization Top k},
  month = {april},
  number = 4,
  pages = {2421--2427},
  publisher = {Auricle Technologies, Pvt., Ltd.},
  timestamp = {2015-08-27T09:01:26.000+0200},
  title = {Privacy Preservation in Analyzing {EHealth} Records in Big Data Environment},
  url = {http://dx.doi.org/10.17762/ijritcc2321-8169.1504139},
  volume = 3,
  year = 2015
}
