Future Work - Privacy Guarantees in Geolocation Services

A great difficulty of this work was to discover a way to include those persons in the privacy protection when exposing important points of the trajectory because we know that we were dealing with sensitive and private data and we want them to be safe but at the same time useful which also contributes to the difficulty of this work.

About future work, we considered:

• To identify whether or not there are better solutions that can reduce the risk of a Bayesian Inference attack.

Bibliography

[1] Ciriani V., De Capitiani di Vimercati S., Foresti S., Samarati P.: k-Anonymity. Security in Decentralized Data Management, ed. Jajodia S., Yu T., Springer, 2006.

[2] L. Sweeney, k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 10, 557–570 (2002).

[3] A. Machanavajjhala, J. Gehrke, and D. Kifer. l-diversity: Privacy beyond k-anonymity. In ICDE, 2006.

[4] Mayil Vel Kumar, P., Karthikeyan, M.: L diversity on k-anonymity with external database for improving privacy preserving data publishing. International Journal of Computer Applications (0975 – 8887) 54(14) (2012).

[5] X.-C. Yang, Y.-Z. Wang, B. Wang, and G. Yu, “Privacy preserving approaches for multiple sensitive attributes in data publishing” Chinese Journal of Computers, vol. 31, no. 4, pp. 574–

587, 2008.

[6] Y. Jing and W. Bo, “Personalized l-diversity algorithm for multiple sensitive attributes based on minimum selected degree first” Journal of Computer Research and Development (China), vol.

49, no. 12, pp. 2603–2610, 2012.

[7] Xiao, Xiaokui, and Yufei Tao. “Anatomy: Simple and Effective Privacy Preservation.”, 2006.

[8] Li N., Li T., Venkatasubramanian S: t-Closeness: Privacy beyond k-anonymity and l-diversity.

ICDE Conference, 2007.

[9] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284, 2006.

[10] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103, 2007.

[11] Luk Arbuckle Khaled El Emam, Anonymizing Health Data: Case Studies and Methods to Get You Started, 1st ed.: O'Reilly Media, 2013.

[12] M. Andrés, N. Bordenabe, K. Chatzikokolakis, C. Palamidessi: Geo-indistinguishability:

differential privacy for location-based systems. In: Proc. of CCS, ACM (2013) 901–914

[13] M. E. Gursoy, L. Liu, S. Truex, L. Yu, and W. Wei. 2018. Utility-Aware Synthesis of Differentially Private and Attack-Resilient Location Traces. In 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18), October 15–19, 2018, Toronto, ON, Canada. ACM, New York, NY, USA, 16 pages.

[14] F. McSherry. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. Commun. ACM, 53(9), 2010.

[15] K. El Emam, F. K. Dankar, Protecting Privacy Using k-Anonymity. Journal of the American Medical Informatics Association. 15, 627–637 (2008).

[16] Y. Rubner, C. Tomasi, L. J. Guibas, Earth mover’s distance as a metric for image retrieval.

International Journal of Computer Vision. 40, 99–121 (2000)

[17] E. Keogh, C. A. Ratanamahatana, Exact indexing of dynamic time warping. Knowledge and Information Systems. 7, 358–386 (2005).

[18] W. Qardaji, W. Yang, N. Li, Differentially private grids for geospatial data. in Proceedings - International Conference on Data Engineering 757–768 (2013).

[19] D. M. Endres, J. E. Schindelin, A new metric for probability distributions. IEEE Transactions on Information Theory. 49, pp. 1858–1860 (2003).

[20] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, L. Damas, Predicting taxi-passenger demand using streaming data. IEEE Transactions on Intelligent Transportation Systems. 14, 1393–1402 (2013).

[21] C. Dwork, Differential privacy, in Proc. of ICALP, 2006, pp. 1–12

[22] B. Gedik and L. Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. Mobile Computing, IEEE Transactions on, 7(1):1–18, 2008.

[23] Z. Montazeri, A. Houmansadr and H. P.-Nik. 2017. Achieving Perfect Location Privacy in Wireless Devices Using Anonymization. IEEE Transactions on Information Forensics and Security 12, 11 (2017), 2683–2698.

[24] V. Bindschaedler, R. Shokri. Synthesizing plausible privacy preserving location traces. IEEE Symposium on Security and Privacy (S&P). IEEE, 546–563. 2016.

[25] H. Ngo and J. Kim, Location Privacy via Differential Private Perturbation of Cloaking Area, 2015 IEEE 28th Computer Security Foundations Symposium (CSF), Verona, Italy, 2015, pp. 63-74.

[26] L. Yu, L. Liu, C. Pu, Dynamic Differential Location Privacy with Personalized Error Bounds. in Proceedings 2017 Network and Distributed System Security Symposium (Internet Society, 2017)

[27] R. Shokri, G. Theodorakopoulos, C. Troncoso, J.-P. Hubaux, J.-Y. Le Boudec. Protecting Location Privacy: Optimal Strategy against Localization Attacks. ACM CCS 2012 617–627.

[28] N. E. Bordenabe, K. Chatzikokolakis, C. Palamidessi, Optimal Geo-Indistinguishable Mechanisms for Location Privacy. in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security - CCS ’14 251–262 (ACM Press, 2014).

[29] S. M. Meystre, F. J. Friedlin, B. R. South, S. Shen, M. H. Samore, Automatic de-identification of textual documents in the electronic health record: A review of recent research. BMC Medical Research Methodology. 10 (2010).

[30] O. Ferrández et al., BoB, a best-of-breed automated text de-identification system for VHA clinical documents. Journal of the American Medical Informatics Association. 20, 77–83 (2013).

[31] N. Mamede, J. Baptista, F. Dias, Automated anonymization of text documents. in 2016 IEEE Congress on Evolutionary Computation, CEC 2016 1287–1294 (Institute of Electrical and Electronics Engineers Inc., 2016).

[32] T. Brinkhoff, A framework for generating network-based moving objects. GeoInformatica. 6, 153–180 (2002).

[33] R. Chen, B. C. M. Fung, P. S. Yu, B. C. Desai, Correlated network data publication via differential privacy. VLDB Journal. 23, 653–676 (2014).

[34] Li, H., L. Xiong, and X. Jiang. 2014. “Differentially Private Synthesization of Multi-Dimensional Data Using Copula Functions.” In Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings.

[35] D. Jurafsky and J. Martin. Information extraction. Communications of the ACM, 2005

Appendix A

This section has the same evaluation mean values as presented in section 5 but now the data it is presented from a point of view of the noise values.

0,145 0,365 0,067 0,708 0,726

0,136 0,36 0,068 0,706 0,726

0,13 0,362 0,068 0,708 0,725

0,122 0,362 0,066 0,707 0,727

0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8

Query AvRE FP AvRE Length Error FP Similarity Kendall-tau

NOISE = 1

Noise replication Noise division Uniform noise Crescent uniform noise

Query AvRE FP AvRE Length Error FP Similarity Kendall-tau

NOISE = 2

Noise replication Noise division Uniform noise Crescent uniform noise

0,126 0,361 0,067 0,702 0,726

Query AvRE FP AvRE Length Error FP Similarity Kendall-tau

NOISE = 3

Noise replication Noise division Uniform noise Crescent uniform noise

0,127 0,367 0,068 0,698 0,726

0,132 0,364 0,067 0,71 0,727

0,13 0,363 0,067 0,712 0,727

0,12 0,361 0,067 0,712 0,727

0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8

Query AvRE FP AvRE Length Error FP Similarity Kendall-tau

NOISE = 4

Noise replication Noise division Uniform noise Crescent uniform noise

Appendix B

We have increase the number of cells up to 10x10 to see what would happen to the privacy and utility value. As we can see below there are no areas of low risk, where in a map of 6x6 it exists 2 low risk areas, and now the majority areas have middle risk. The lowest EMD mean value of all areas is 442,6314 with a standard deviation of 6,7052 and the highest in the map is 2874,1587 with a standard deviation of 30,8464. So, with a 10x10 map we are more vulnerable to an Bayesian inference attack.

When it comes to utility the percentage of error has increased and the similarity decreases.

So, to publish important points the 6x6 map is better than using a 10x10 map.

0,32 0,667 0,104 0,442 0,654

0,123 0,364 0,067 0,705 0,727

0,184 0,53 0,065 0,572 0,733

0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8

Query AvRE FP AvRE Length Error FP Similarity Kendall-tau

Size of Brinkhoff map

3x3 6x6 10x10

High risk

Low risk

No data available

No documento Privacy Guarantees in Geolocation Services (páginas 70-78)