Inferring semantic mapping between policies and code: the clue is in the language


A common misstep in the development of security and privacy solutions is the failure to keep the demands resulting from high-level policies in line with the actual implementation that is supposed to operationalize those policies. This is especially problematic in the domain of social networks, where software typically predates policies and then evolves alongside its user base and any changes in policies that arise from their interactions with (and the demands that they place on) the system. Our contribution targets this specific problem, drawing together the assurances actually presented to users in the form of policies and the large codebases with which developers work. We demonstrate that a mapping between policies and code can be inferred from the semantics of the natural language. These semantics manifest not only in the policy statements but also coding conventions. Our technique, implemented in a tool (CASTOR ), can infer semantic mappings with F1 accuracy of 70% and 78% for two social networks, Diaspora and Friendica respectively – as compared with a ground truth mapping established through manual examination of the policies and code.