Saturday, November 14, 2020

The Penn State CSR Lab 2020 Report No. 2: Creating a Ratings System for Business Compliance with the OHCHR Accountability and Remedy (ARP) III Project--The Basis of Measurement and the Challenge of Measurability


This post continues the reporting of the work undertaken by my class on Corporate Social Responsibility (CSR) Law at Penn State Law (Creating a Ratings System for Business Compliance with the OHCHR Accountability and Remedy (ARP) III Project--The Penn State CSR Lab 2020).

By the completion of the project, Penn State’s CSR Lab 2020 will (1) develop a concept paper touching on the need for, construction, methodology and use of an ARP III rating system for enterprises; (2) produce a methodology justified by and through the great principles of the UNGP; (3) describe the way in which data warehouses will be managed and used; (4) apply the rating system to an initial group of enterprises; (5) produce a Report detailing the results of that first rating application along with recommendations for internal improvement and external responses to the rating; and (6) consider recommendations for strengthening the public and private law systems on which the key factors that produce the ratings are based.

Started in 2018, Penn State's CSR Lab is an informally constituted collective of graduate students undertaking the study of corporate social responsibility. They include Law Students and graduates seeking the LL.M. degree at Penn State Law, and graduate students enrolled in the School of International Affairs working under the guidance of Larry Catá Backer and with the assistance of the Coalition for Peace & Ethics. The 2020 Project builds on the Report produced by the Penn State CSR Lab in 2018.  See Penn State CSR Lab: Report and Observations on Non-State Based Non-Judicial Mechanisms on the Ground (Prepared Feedback for the OHCHR Accountability and Remedy (ARP) III Report (including eight recommendations directed to the ARP III Team)).

The work of the Penn State CSR Lab 2020, then, picks up where the 2018 Lab left off.  Penn State's CSR Lab, along with CPE, is attempting to create a corporate rating system for compliance with the UN's Accountability and Remedy Project relating to non-state non-judicial grievance mechanisms. We will report on our progress as its members, students drawn from the Law School and the School of International Affairs, work through the problem. The ultimate object is to enhance the capacity of even modestly funded organizations or groups to develop and apply their own accountability measures to relevant actors. The Penn State CSR Lab 2020 and CPE will provide brief updates of their progress.

In Report No. 2 the CSR Lab considers the problem of measurability--that is, of the way in which it may be possible to translate the ARP III policy objectives into discrete actions, events, or facts that may be counted.


  Index of Reports

2. The Challenge of Measurability.

The PSU CSR Lab noted in its first report that the initial challenge for the construction of any rating system is the selection of norms, principles, and objectives, against which conduct will be rated. Quantification--the development of measurable markers that serve as the embodiment of the obligations, responsibilities or goals, the attainment of which is to be measured and ultimately assessed--is the task at the heart of the reductionism and translation inherent in data driven governance.  It is also the crucial means of developing useful methods for assessment. 

Note that this is not meant to dismiss qualitative analysis and measures. These have always been the stuff of law. Indeed, one way of understanding law is as a system for the convergence of narrative, of qualitative markers, within an ecology, placement within which produces assessment and ultimately permits a judgment of consequences to be made authoritatively.  Every case is a story; every story is a reaffirmation of the normative basis for seeing the assembled facts in a way that makes sense to the speaker and audience (lawyer and judge). Each story is connected to others built around the same narrative structures.  These narrative structures, the skeletons of coherence and systemicity in law, are understood by lawyers and judges by their classical terms: tort, contract, property, and their statutory variants (with respect to which the same narrative universe is built around interpretation as a qualitative exercise of embedding stories within the text of the statute/regulation). The sum of qualitative assessment, of these stories, provides the foundation within which any additional story may be embedded, and so embedded, judged.  That produces both the conservatism of law (especially common law) and its ability to move with the times, as the quantum of stories never ends and constantly changes as one generation gives way to the next and as popular culture (in an interactive conversation with the narratives in law) moves sometimes decisively and sometimes at the margins. (For more detailed discussion, see, e.g., my prior work here: Chroniclers in the Field of Cultural Production: Courts, Law and the Interpretive Process; Tweaking Facts, Speaking Judgment: Judicial Transmogrification of Case Narrative as Jurisprudence in the United States and Britain).

Traditionally that challenge was more easily met. One looked to the state for norms.  One measured legal compliance.  One sought the markers of that compliance in legal requirements, and then in administrative regulation (and perhaps practice) for the bulk of these principles. That required, in turn, a sense of either what was permitted, or what was prohibited. In some cases, though, norms could be reverse engineered.  In those cases, principles might not have been articulated, but there might exist a consensus of what the ideal appeared to be--based on reputation or other factors.  Once one was able to define the ideal "type" of institution, or the ideal related to activities or objectives, it would then be a matter of breaking this ideal down into its critical parts and abstracting them into norms or principles.  Either way, norms or core principles (derived from legalized objectives or ideal types) provided the basis for measurement--that is, for assessment of the class of actors or actions that were to be the object of assessment.

Data driven governance works in a similar way (the human mind is really quite simple and tends to repeat patterns endlessly but in endless variation that might appear unrelated).  Yet it is not the same.  In a sense data governance and its analytics approach the question of analysis (and the assessment that follows) from a different angle.  At its core is not narrative streams but values. Narrative is de-centered.  Narrative is understood as a singular expression of an iterative set of relations that can be judged, understood and predicted when reduced to its essence.  The core task, then, is not to layer narrative (and its assessment) to produce a web of judgment that can be used to predict the outcome of future narratives by reference to prior stories.  Instead, the core task is to produce value.  To that end one starts with an objective or task.  And to it one attaches markers that suggest its accomplishment.

This can take a number of forms, two of which are worth noting.  The first focuses on constructing the ideal state from the objective, policy, or task and then measuring against the ideal.  That, in turn, requires disaggregating the ideal into its component parts and assessing the extent to which these appear in the subject to be judged.  The disaggregation itself constitutes the governance--because that disaggregation involves a process of choosing the "correct" behaviors and distinguishing those (which now have value) from others that may have less or no value as a function of the ideal state. An example: the objective of gender diversity in the board room of corporations can produce an ideal state in which 50% of all board members must be gendered female (or other than male). In a variation, it might require an ideal board to have representation from major identified gender groupings. What has value is what is identified as gender; that value is a function of its connection with appointment to a board.  It is in the choice of ideal state that regulatory power is expressed in data driven terms.  The second focuses on breaking down a task into its component parts and then assessing both the existence of these parts and developing measures for impact or success.  Impacts or success, in turn, would be a function of the core objectives of the mandate delegated to the implementing body (in many cases an enterprise). Consider a real life example: In November 2020 the World Health Organization unveiled its country based public health and social measure (PHSM) responses (discussed in EURO WHO Dashboard and the COVID-19 situation in the WHO European Region).

This application provides detailed timelines of countries’ public health and social measure (PHSM) responses and epidemiological situations. The “Country Analysis” timeline displays how individual PHSMs within a selected country form the country’s composite PHSM response, and the “Regional Overview” timeline presents the historical progression of PHSMs across countries in the WHO European Region. 

To develop this measure WHO effectively determined the six primary elements of effective health and social measures in meeting the challenge of the pandemic and then measured responses as a function of these six indicators. "The development of data indicators is interesting for the choices it represents and its incentive effects to coherence in public policy responses by states. They all focus on systematic restrictions on movement and on the wearing of masks." (EURO WHO Dashboard and the COVID-19 situation in the WHO European Region). The consequence is clear--in determining the indicators WHO effectively determined the scope of acceptable response to COVID (everything else received a value of ZERO and was thus not counted).  If it is not counted it does not exist, and if it does not exist it does not go toward meeting whatever obligation a state might be deemed to have to develop effective countermeasures against COVID-19.

Note the way that the objective--delay the spread of the COVID-19 virus (complete interdiction in the short term is not possible, but reducing infection rates makes it possible to treat those who get the virus more effectively)--must then be implemented through measures that are context related but designed to identify the vectors of contracting the virus and to control those vectors. This objective--contain the spread of the virus--is then reduced to its essence in the form of 6 indicators: (1) mask wearing (personal transmission); (2) closing schools; (3) closing places where adults congregate (businesses, etc.); (4) suppressing other gatherings of adults and children; (5) restrictions on movement within a state; and (6) restrictions on movement between states.




Ordinal Scale


Wearing of Masks


0 - No mask policy

1 - Recommended wearing masks in any setting

2 - Require wearing masks on a risk-based approach (in settings where physical distancing is not possible, e.g. public transport, retail, refugee camps)

3 - Require wearing masks universally (in any setting in the community and in any transmission scenario)




Closing of schools

0 - No measures
1 - Recommend/Require adapting in-person teaching (physical distancing, hand hygiene, staggered arrival, separate entrances, etc.)
2 - Recommend/Require suspension of in-person teaching (transition to online or distance learning)
3 - Require suspension of in-person teaching on some levels or categories (e.g. just secondary schools)
4 - Require suspension of in-person teaching on all levels


Closing of offices, businesses, institutions and operations


0 - No measures 
1 - Recommend closing (or work from home) and/or recommend/require adapting (e.g., implementing sanitary measures)
2 - Require closing (or work from home) for some sectors or categories of workers
3 - Require closing (or work from home) for all-but-essential services (e.g. grocery stores, pharmacies, and doctors)


Restrictions on gatherings


0 - No restrictions
1 - Restrictions on very large gatherings (the limit is above 1000 people)
2 - Restrictions on gatherings between 101-1000 people
3 - Restrictions on gatherings between 11-100 people or restrictions on certain types of gatherings (e.g. religious, sporting, cultural, or national events)
4 - Restrictions on gatherings of 10 people or less or ban on all types of gatherings


Restrictions on domestic movement


0 - No measures
1 - Recommend not leaving house and/or recommend limiting domestic movement
2 - Restriction on domestic movement (e.g. ban on travelling between or into certain regions or outside a certain radius from place of residence)
3 - Requirement not to leave house with exceptions for the following: essential activities (grocery shopping and ‘essential’ trips), daily exercise, limited social interactions (visiting family or friends), or travel to other places of residence
4 - Requirement not to leave house with exception only for essential activities (grocery shopping, pharmacy or 'essential' trips)
5 - Requirement not to leave house with exceptions for essential activities (grocery shopping or 'essential' trips) allowed only under certain conditions (e.g. allowed to leave house only once a week, during designated timeslot or only one household member can leave at a time)



Limitations to international travel


Entry ban and/or visa restriction

E0 - entry ban and visa restriction for no countries

E1 - entry bans and/or visa restrictions for select countries (entry ban/visa restriction for at least one country but open to more than 100 countries)

E2 - entry bans and/or visa restrictions for some countries (open to 10 - 100 countries)

E3 - entry bans and/or visa restrictions for majority of countries (open to less than 10 countries)

E4 - entry bans and/or visa restrictions for all countries


Quarantine and/or COVID-19 test

Q0 – Quarantine and negative COVID-19 test for no countries

Q1 – Quarantine and/or negative COVID-19 test for one or more countries

Q2 – Quarantine and/or negative COVID-19 test for all countries
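
One way to see how the indicators close the universe of countable responses is to write the scales down as a lookup structure. What follows is a minimal sketch in Python, not WHO's implementation: the dictionary keys, the function name `score_response`, and the sample input are all hypothetical, and only the maximum levels are taken from the scales listed above (the international travel indicator is split into its two sub-scales).

```python
# Maximum ordinal level for each scale, taken from the lists above.
# The key names are illustrative assumptions, not WHO identifiers.
MAX_LEVEL = {
    "masks": 3,
    "schools": 4,
    "businesses": 3,
    "gatherings": 4,
    "domestic_movement": 5,
    "entry_ban": 4,            # E0-E4
    "quarantine_or_test": 2,   # Q0-Q2
}


def score_response(measures):
    """Split a state's reported measures into counted and uncounted ones.

    `measures` maps a measure name to an ordinal level. Anything that is
    not one of the listed scales is set aside: it has no scale, so it
    contributes nothing to the score.
    """
    counted = {}
    uncounted = []
    for name, level in measures.items():
        if name not in MAX_LEVEL:
            uncounted.append(name)
            continue
        if not 0 <= level <= MAX_LEVEL[name]:
            raise ValueError(f"{name}: level {level} is outside 0-{MAX_LEVEL[name]}")
        counted[name] = level
    # Scales the state did not report default to 0 ("no measures").
    for name in MAX_LEVEL:
        counted.setdefault(name, 0)
    return counted, sorted(uncounted)


# Hypothetical input: two listed indicators plus one measure with no scale.
counted, uncounted = score_response(
    {"masks": 2, "schools": 4, "contact_tracing_app": 1}
)
print(counted["masks"], counted["gatherings"], uncounted)
```

In this sketch the hypothetical `contact_tracing_app` measure lands in the uncounted list however vigorously a state pursues it, which is the point made above: what has no scale is not counted.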

Each of these indicators is then reduced in turn to a set of measurable acts or events.  That determination, however, in turn, was a function of the object of accountability.  In this case the object of accountability is the state. Why? Because the WHO as a public international organization is constituted as an organ of and serves states. It is then important to remember that the measures would have been different had the object of accountability been different. Since the state is the object of accountability it is then necessary to determine what are the principal objects of measurement.  There were two choices here.  The first is to measure formal compliance with the indicators (the essence of the realization of objectives).  This centers measurability on public expressions of compliance--laws, rules, and practices as written into policy and other official pronouncements of states.  These are usually the easiest to measure precisely because states tend to be quite transparent about their formal measures--whether or not they are enforced.  The second is to measure effective compliance with the indicators.  That is, this set of measures would actually count the extent to which measures change behavior.  It looks to implementation rather than to articulation.  In this case WHO chose formal compliance measures.  In the case of mask wearing, for example, states were rated on a scale of 0-3 based on the relationship between the ideal form of mask regulation (require wearing masks universally) and its opposite (no regulation).  An effective compliance measure would have measured implementation (for example by the number of enforcement actions against non-complying individuals).
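
The difference between the two measurement choices can be made concrete. The sketch below is an illustration under stated assumptions: the normalization (level divided by the top of the scale) and the observation-based proxy for implementation are hypothetical constructions, and the figures are invented purely to exercise the code. Nothing here is WHO's method or data.

```python
def formal_compliance(level, max_level):
    """Score what the rule says: position on the ordinal scale, from 0.0
    (no regulation) to 1.0 (the ideal form of regulation)."""
    if max_level <= 0 or not 0 <= level <= max_level:
        raise ValueError("level must lie between 0 and max_level")
    return level / max_level


def effective_compliance(observed_compliant, observed_total):
    """Score what people do: share of observed conduct that complies.

    This proxy is a hypothetical stand-in for implementation data such
    as enforcement actions or field observation.
    """
    if observed_total <= 0 or not 0 <= observed_compliant <= observed_total:
        raise ValueError("need 0 <= observed_compliant <= observed_total, total > 0")
    return observed_compliant / observed_total


# Invented figures: a universal mask mandate (level 3 of 3) that is
# observed in practice in 40 of 100 sampled interactions.
formal = formal_compliance(3, 3)
effective = effective_compliance(40, 100)
print(formal, effective)
```

The same state can sit at the top of the formal scale and well below it on an implementation measure, so the choice between the two is itself a choice about what counts.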

It becomes clear that identifying the coherent universe of relevant norms from which it might be possible to develop measures, as well as the measures themselves, is easier said than done, and involves choices that are inherently regulatory in nature. The problem of course is one of translation.  The normative exercise (forward or reverse engineered) is a qualitative exercise. It is grounded in narrative and operates on the basis of ambiguity that gives the administrator or enforcer some leeway in interpolation.  The practice of common law equity provides a nice example. It is at once both fact driven and based on the shared assumption that great principles might be applied through the intermediation of commonly held values reasonably applied. However, the exercise of measurability is inherently quantitative.  It is grounded in precision and certainty.  It tends to avoid ambiguity and reduces all facts to relations which, when appropriately measured against each other in accordance with a formula (the way in which relational values are summed), produce a result (a judgment usually).


More importantly, perhaps, qualitative exercises can afford a certain redundancy. Redundancy in law suggests an overflow, an excess, an effort to treat the same object from multiple perspectives. It is a sort of "belt and suspenders" approach that is common in law: for example the criminalization of a wrong that also produces private civil causes of action (corporate tax fraud). A similar approach is taken with respect to other regulation of business: violation of law may produce an action for breach of fiduciary duty.  Redundancy can sometimes produce regulatory incoherence--for example a state may enter into a treaty respecting inbound foreign investment that effectively insulates foreign companies from the application of domestic human rights legislation.  Redundancy in data driven systems achieves similar ends.  But its effects on measuring also produce challenges. It serves primarily as a means of refining the quality of compliance and ensuring that gaps are filled in from distinct operational or compliance perspectives. Redundancy also is a means of refining qualitative measures in at least two respects.  First, to the extent it serves as points of emphasis, repetition invites valuing what is repeated more than that which is not.  Second, repetition might suggest vectors of relationships between principles which might not otherwise be obvious, or emphasize those connections.  Both would have to be incorporated in any translation to quantitative measures. The failure to take these redundancies into account in ways that reflect their effect might otherwise distort the measure.


Redundancy is to be distinguished from repetition. In the simplest example, caselaw is effectively an exercise in infinite repetition of fact patterns and judgments, and of their identity with other similar patterns and judgments. Judges and lawyers serve as the adepts in bundling repetition to advance the legitimacy of their own argument.  Indeed, repetition is the essence of case law.  Repetition in data driven systems provides a means of testing the robustness of relations and of patterns.  It goes to issues of both replicability and predictability.  Repetition serves as a means of bridging the space between description and measurement on the one hand, and predictability on the other.  It thus becomes necessary in translating qualitative norms into quantitative measures to be aware of both redundancies and repetitions, as well as to ensure that the reduction from broad principle to a more precise set of specific actions or events accurately captures the ambiguity of normative baselines.

These abstract considerations become clearer when applied to the challenge of measuring enterprise compliance with ARP III objectives: the translation of the ARP III principle-objectives into measurable actions, events, or conditions. That requires first a comfort with the ARP III objectives as suitable for the translation exercise. Suitability is primarily a function of the amenability of objectives to reduction--that is, of their capacity to describe the essence of the behaviors, taboos, standards or expectations demanded.  And then it requires an amenability to translation from its narrative (qualitative) essence to a set of actions, events--to a set of signs (understood in a semiotic sense) that represent that essence in a way that can be counted or otherwise measured. That measurement, in turn, must be made in relation to something--a baseline that itself must be extractable from the aggregation of signs (of measurable events, conditions, etc.) that represent the qualitative objectives.  That combination provides the foundation on which it is possible to rate--that is, to measure performance, and to measure it against a standard, and to assign a value to that assessment.
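
The chain described here (signs, a baseline extracted from their aggregation, and a value assigned against it) can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the sign names, the weights, the use of the median as the baseline, and the three rating labels are hypothetical and are not drawn from ARP III or from the Lab's eventual methodology.

```python
from statistics import median


def raw_score(signs, weights):
    """Weighted sum of counted signs (measurable events or conditions)."""
    return sum(weights[name] * value for name, value in signs.items())


def rate(scores):
    """Rate each enterprise relative to a baseline drawn from the group.

    The baseline here is the median raw score, one of many possible
    choices; the three labels are likewise placeholders.
    """
    baseline = median(scores.values())
    ratings = {}
    for enterprise, score in scores.items():
        if score > baseline:
            ratings[enterprise] = "above baseline"
        elif score < baseline:
            ratings[enterprise] = "below baseline"
        else:
            ratings[enterprise] = "at baseline"
    return baseline, ratings


# Hypothetical signs and weights; none are drawn from ARP III itself.
weights = {"mechanism_published": 2, "complaints_resolved": 1}
enterprises = {
    "A": {"mechanism_published": 1, "complaints_resolved": 5},
    "B": {"mechanism_published": 0, "complaints_resolved": 3},
    "C": {"mechanism_published": 1, "complaints_resolved": 1},
}
scores = {name: raw_score(signs, weights) for name, signs in enterprises.items()}
baseline, ratings = rate(scores)
print(baseline, ratings)
```

Each choice in the sketch (which signs to count, how to weight them, where to set the baseline) is a regulatory choice of the kind discussed above, which is why the translation exercise has to precede any rating.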
