Monday, April 02, 2018

The Crisis of Data Management in the West, and its Cure, a Perspective from China: 姜奇平 数据使用,谁是“裁判员”? [Jiang Qiping, Data usage, who is the "referee"?]

(Pix © Larry Catá Backer 2018)


I have just reported on China's new efforts to manage data--in this case with a focus on the protection of the value of robust scientific data. These efforts are part of a larger project of state-directed centralization and management of data: an all-around effort to centralize and protect the harvesting, storage and management of data that are central to its use for the development of all areas of Chinese society, economy, law and culture (国务院办公厅印发《科学数据管理办法》 Chinese State Council issues "Administrative Measures for Scientific Data").


While to some extent this is done in the shadow of China's scandals respecting the integrity of data in scientific and technical work (with respect to which China has sought to better its reputation as a global competitor), it is also undertaken as part of a much larger project that seeks to use big data harvesting and analytics in broad efforts to transform the way that the productive forces of China are developed. Central to these efforts is the management of data harvesting--both what is harvested and the way in which harvesting takes place. In that context, extracting data without the knowledge of the individual becomes a significant issue of ethics, and perhaps of law. It is in that light that it is interesting to consider what an influential and high-level Chinese commentator has to say about the analogous scandals that have recently erupted in the West--the harvesting of data by Facebook and its sharing with Cambridge Analytica for use in the management of political behavior during the course of the 2016 U.S. Presidential election (background here).

Jiang Qiping is the director of the Informatization and Network Economy Research Office of the Institute of Quantitative and Technical Economics, Chinese Academy of Social Sciences. His view, that the function of data harvesting must be segregated from that of analytics, bears some consideration--though its application in China and the West will necessarily be different. But it does begin to suggest a potentially useful line of thinking about the character of data and the nature of conflict of interest where data and analytics are aggregated. The alternative is nationalization of metadata--either through tightly constructed regulatory schemes (perhaps like public utility regulation) or direct state ownership. It is clear that this is the direction China may be going, though the choices have yet to be entirely made. The Western approach is still very much a work in progress--though whatever form it takes, it may involve a substantial degree of governmentalization of enterprises with metadata harvesting functions (by privatizing regulatory objectives, e.g., more self-policing with greater objectives-based oversight). More likely it may also require less deception in the means by which data is harvested (e.g., through "quizzes" and other "games", e.g., here).

The essay 姜奇平 数据使用,谁是“裁判员”? [Jiang Qiping, Data usage, who is the "referee"?] follows.


数据使用,谁是“裁判员”?

数据“泄密门”持续发酵,脸书亡羊补牢,出台一系列新措施以加强隐私保护,但却被业界批评诚意不够。美国联邦贸易委员会表示已就此展开非公开调查。事件指向一个深层次问题:数据使用要不要区分“裁判员”与“运动员”?

  据媒体报道,剑桥分析公司与脸书合作,由前者开发了一款进行性格测试的脸书应用程序,以此访问获得了5000万活跃用户数据。然后依靠算法,预测他们的政治倾向。最后借助脸书的广告投放系统,向这些用户定向推送新闻,影响他们的投票行为。

  由此看出,事件的核心围绕着“脸书将数据开放给第三方”展开。尽管事后剑桥分析公司表示,并没有违反与脸书的相关协议,但其实,这就是问题所在。数据业一直存在行业不成熟期所特有的问题:不区分“裁判员”与“运动员”。

  就这起事件来说,脸书扮演了裁判员的角色,剑桥分析公司则是运动员。裁判员的基本准则是不偏向某个运动员。但脸书与剑桥分析公司之间的协议,相当于制定了一个偏向特定运动员的规则。问题是,脸书适合当裁判员吗?作为一家私营公司,脸书有权决定把具有公共性的数据给谁并决定别人如何使用吗?如果没有,谁给了它当裁判员的权力,或者说那个本该决定谁当裁判员的权力缺位在哪里?

  这一数据泄露事件不仅给美国带来挑战,也是大数据时代人类共同面临的挑战。

  一种解决办法是,将数据行业分为两部分。一部分是元数据行业,相当于裁判员;另一部分是应用数据行业,相当于运动员。元数据是指可派生应用数据的基础数据。元数据行业有权保管未经处理的原始数据,并依公开规则,管理应用数据行业对于数据的调用。其中,公开规则就是行规,必须对每一位运动员公平,不能与运动员订立私约。例如,脸书就不能与剑桥分析公司订立私约。而管理数据使用,则包括决定数据应当如何被使用。例如,原始数据是否需要模糊化处理后再使用。

  如果行规仍不足以保证公共利益,或可能使公共利益受损,则涉及数据的政府管制。政府需要把元数据行业当作特殊行业加以管制,考虑设定准入限制,或收归国有。关键看哪种办法更有效。

  有媒体在网上开展了一项调查,询问用户是否会信赖脸书等涉及用户隐私数据的平台。在目前参与调查的1.2万多个投票中,超过93%的用户选择了“不信任”,只有不到7%的用户选择了“信任”。这说明用户对脸书这样的裁判员“无证上岗”持否定态度。数据业要摆脱目前的困境,第一要自己立规矩,第二要接受公众对它立规矩。

(姜奇平 作者为中国社会科学院数量经济与技术经济所信息化与网络经济室主任)

Data usage, who is the "referee"?

2018-04-02 07:32:58 Source: People's Daily

The data "leak-gate" scandal continues to ferment. Facebook, mending the fold after the sheep have been lost, has introduced a series of new measures to strengthen privacy protection, but the industry has criticized it for a lack of sincerity. The U.S. Federal Trade Commission has stated that it has opened a non-public investigation into the matter. The incident points to a deeper question: should data use distinguish between "referees" and "athletes"?

According to media reports, Cambridge Analytica cooperated with Facebook: the former developed a Facebook personality-test application and through it gained access to the data of 50 million active users. It then relied on algorithms to predict their political leanings. Finally, using Facebook's ad delivery system, it pushed targeted news to these users to influence their voting behavior.

From this it can be seen that the core of the incident revolves around "Facebook opening data to third parties." Although Cambridge Analytica stated afterwards that it had not violated its agreement with Facebook, that is in fact precisely the problem. The data industry has long had a problem characteristic of an immature industry: it does not distinguish between "referees" and "athletes."

In this incident, Facebook played the role of referee and Cambridge Analytica was an athlete. The basic rule for a referee is not to favor any particular athlete. But the agreement between Facebook and Cambridge Analytica amounted to setting a rule that favors a specific athlete. The question is, is Facebook fit to be a referee? As a private company, does Facebook have the right to decide to whom data of a public character is given, and to decide how others use it? If not, who gave it the power to act as referee, or, put differently, where is the absent authority that should have decided who serves as referee?

This data leak poses a challenge not only to the United States; it is also a challenge that humanity faces in common in the era of big data.

One solution is to divide the data industry into two parts. One part is the metadata industry, equivalent to referees; the other is the application data industry, equivalent to athletes. Metadata refers to the basic data from which application data can be derived. The metadata industry has the right to keep unprocessed raw data and, according to public rules, to manage the application data industry's calls on that data. These public rules are the rules of the industry: they must be fair to every athlete, and no private agreement may be made with an athlete. For example, Facebook could not make a private agreement with Cambridge Analytica. Managing data use includes deciding how data should be used, for example whether raw data must be obfuscated before it is used.

If industry rules are still insufficient to guarantee the public interest, or may harm it, then government regulation of data comes into play. The government needs to regulate the metadata industry as a special industry, and to consider setting entry restrictions or nationalizing it. The key is which approach is more effective.

Some media conducted an online survey asking users whether they would trust platforms such as Facebook that handle users' private data. Of the more than 12,000 votes cast so far, more than 93% of users chose "do not trust," and fewer than 7% chose "trust." This shows that users disapprove of referees like Facebook "working without a license." To escape its current predicament, the data industry must first set rules for itself and, second, accept the public setting rules for it.

(Jiang Qiping is the director of the Informatization and Network Economy Research Office, Institute of Quantitative and Technical Economics, Chinese Academy of Social Sciences)
