Open (Data) Science – Data Management is Key

At the end of the day, it is rare for any finding to be considered completely irrefutable or beyond scrutiny.

Psychology degrees do give you superpowers. No, you can’t read minds, but you can shut your eyes and perform t-tests on SPSS. There was a point when I could rely on muscle memory alone to get the p-values popping up on my screen. I can see why, for simple analyses, point-and-click solutions can be much preferable.

It wasn’t until my final year project that I really started to recognise the importance of transparent and replicable code and research processes. I had to collect data jointly with a peer; we each collected half of the data. The data collection process was by no means easy: a collage of cognitive tests, long structured interviews, etc. Keeping the participants (mainly other undergraduate students) engaged and putting in effort was always a challenge.

All was well until I discovered my data collection partner was popping random numbers into the dataset. “What is she doing?!” I was shocked to see this, but even more shocked when it turned out that she had forgotten to record age and gender in the interviews. Missing gender was less critical, but age is a key variable we had to take into account, because the distribution of IQ differs by age. I was quite disappointed, as this meant the data were compromised, and I couldn’t bring myself to really trust what the data could tell me: if basic demographics are made up, how credible can the other information be?

I was annoyed, just like this cat.
Photo by Anna Shvets on

I think of this experience often as I read research papers that do not describe their data collection and analysis plans very well. In academia, more people are now willing to share the code they used for analysis. I believe the next step is to extend this transparency to data cleaning and management processes. When describing the process of data curation, it is easy to focus on the psychometric properties of the questions. However, I believe there is a lot of wisdom research groups could gain from sharing their answers to more general questions, such as: How is the database managed? Were there any challenges to data management, and how were they resolved? Studies and initiatives like PROSPER (PROfeSsionnal develoPmEnt for Research methodologists) help consult on and formulate future development plans for research methodologists. I am happy to see how things are developing, and to see methodologists finally gaining a bit more of the limelight in research!
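One way to act on this is to script cleaning decisions so that they can be shared alongside the analysis code. The Python sketch below is a hypothetical illustration, not a procedure from any study mentioned here: it flags missing or implausible ages instead of silently filling them in, and records each decision in a log that could be published with the dataset. The `CleaningLog` class, the `flag_implausible_age` function, the `age` field and the 16 to 100 age range are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CleaningLog:
    """Hypothetical record of every cleaning decision, in the order it was made."""
    entries: list = field(default_factory=list)

    def record(self, step: str, n_affected: int, note: str) -> None:
        self.entries.append({"step": step, "n_affected": n_affected, "note": note})


def flag_implausible_age(rows, log, min_age=16, max_age=100):
    """Flag (rather than silently fill in) missing or out-of-range ages.

    `rows` is a list of dicts; the "age" key and the 16-100 range are
    illustrative assumptions, not taken from any real study protocol.
    """
    cleaned = []
    n_flagged = 0
    for row in rows:
        age = row.get("age")
        # Anything that is not a number inside the plausible range is treated as suspect.
        plausible = isinstance(age, (int, float)) and min_age <= age <= max_age
        # Keep the original row untouched and add a flag column next to it.
        cleaned.append({**row, "age_flag": not plausible})
        if not plausible:
            n_flagged += 1
    # Write the decision down so others can see (and rerun) what was done.
    log.record(
        step="flag_implausible_age",
        n_affected=n_flagged,
        note=f"age missing or outside {min_age}-{max_age}; kept and flagged, not imputed",
    )
    return cleaned


if __name__ == "__main__":
    log = CleaningLog()
    raw = [
        {"participant_id": 1, "age": 20},
        {"participant_id": 2, "age": None},
        {"participant_id": 3, "age": 250},
    ]
    cleaned = flag_implausible_age(raw, log)
    print([r["age_flag"] for r in cleaned])  # [False, True, True]
    print(log.entries)
```

Flagging rather than imputing keeps the problem visible to whoever analyses the data next, and the log answers part of the question “How is the database managed?” in a form others can rerun.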

At the end of the day, it is rare for any finding to be considered completely irrefutable or beyond scrutiny. The best we can do as researchers, in my opinion, is to work so that future researchers, as they read our findings, can say: “They were bound by the knowledge of their time; that was the most rigorous way they could have done it!”

A key development goal of my PhD is the ability to write code and wrangle data across Stata, R and Python. I am still learning how to work across platforms in the way that makes the most sense! Do share with me if you have any tips 🙂


Chapter 2: Current approach to ethnic representation

Construing means as outcomes may overestimate true progress towards a more equitable HEI.

The 21st century is an era of metrics. Measuring and demonstrating impact has become essential to research publication. This realist “only measurable changes are true changes” perspective dominates how “evidence” is conceived. The same logic has been applied to promoting EDI initiatives.

We often treat EDI representation as visible representation, because it is more measurable. The aim is for members of a certain group (e.g., Asian) to reach an ideal proportion on a certain measure of equity (e.g., promotion): for example, the proportion of non-white people on interview panels, or the percentage of international students. However, in pushing for wider visible representation (definition 1), we assume that people who share those visible characteristics are necessary to represent the group’s rights (definition 2).

The assumption that appearance representation (AR) leads to rights representation (RR), which in turn leads to equity
AR is neither necessary nor sufficient to attain equity

For example, in the Sewell Report, the ethnic diversity of the police force becomes a target of intervention, with the underlying theory of change that once the (appearance) representation problem is solved, minoritized communities will gain rights equitable to those of their white counterparts. As another example, EDI groups in HEIs often require a certain demographic make-up, inadvertently putting pressure on minorities to contribute. This follows the same logic: once the EDI group is diverse, diverse needs will be addressed. There are numerous counterexamples showing that visible representation does not automatically achieve rights representation: black-on-black violence; the countless stories of those who, having made it, became the gatekeepers to “high society”; and, hey ho, look at the faces behind the UK Illegal Migration Bill 2023.

No doubt, representation from minoritized groups can reflect underlying changes in power structures, equality and resource allocation. But it cannot be the only means of measuring change in our society. Universities are incentivised to pursue awards recognising their EDI efforts; when the only outcome measure focuses on superficial appearance representation, we might overestimate our progress towards equity.

We need people who can fight for the rights of the underprivileged and empower the minoritized, such that appearance representation would be the natural outcome of a changed landscape. This is a strong argument for people in power, often White and British, to take the initiative. The misplaced emphasis on “measurable” outcomes has become a hindrance to progress, as we fantasise about an easily measurable solution. Our current approach to ethnic representation does not promote this vision.

This conflict between apparent progress and the on-the-ground experience of ethnically minoritized members of HEIs is a source of frustration. I shall touch on this in more detail in Chapter 4.

In the next chapter, I will describe two flaws in how ethnicity and ethnic representation are discussed, in the hope of elucidating the power constructs, so deeply embedded in our social interactions, that may slow, or mimic, progress in promoting ethnic equality.

* Appearance Representation: A depiction or portrayal of a person or thing, typically one produced in an artistic medium – definition 1.

Rights Representation: The action of standing for, or in the place of, a person, group, or thing, and related senses – definition 2.

Blog Series: Chapter 1: Define Representation and Why It Matters

This is the first part of my reflections on serving as a member of the Equality, Diversity and Inclusion (EDI) team at the University. The series is titled: On Ethnic Representation and Equity: The Costs of Conflating Means as Goals.

The UUK & NUS report in 2019 found that fewer than 2% of 19,000 professors in UK higher education institutions (HEIs) are non-white women. Improving the representativeness of UK HEI staff and students has become a priority for Equality, Diversity, and Inclusion initiatives. Proposed solutions include racial equity hiring practices, such as having a more diverse interview board and advertising purposefully within targeted populations. Whilst I appreciate the equity-based approach to improving ethnic representation, I worry that the current framing of “representation” diverts attention away from cultivating a culture that embraces unity in diversity. Despite continual effort, mainly by people from racialised communities, ethnic minorities continue to feel tokenised and marginalised in academia. In this series, I will re-assess the logic behind current EDI approaches to defining and improving representation, point out the intrinsic flaws in the current definition of representation, and discuss potential barriers UK HEIs face in re-calibrating the direction for improving representation. I argue that the philosophical positioning behind current approaches to promoting EDI conflates means with goals, and might limit our ability to evaluate whether we have truly promoted equity within HEIs.

Chapter 1: Define Representation and Why It Matters

“Representation” is typically defined in the following two ways (Oxford English Dictionary):

  1. A depiction or portrayal of a person or thing, typically one produced in an artistic medium.
  2. The action of standing for, or in the place of, a person, group, or thing, and related senses.

I will refer to definition (1) as “Appearance Representation” (or Visible Representation), and definition (2) as “Rights Representation”. In my opinion, the need to represent arises from “differences”. For example, “appearance representation” showcases something spectacular: it captures something that differs from the norm. “Rights representation” serves the purpose of settling different opinions within or between individuals and communities (e.g., legal representatives, political party representation).

Whom are HEIs trying to represent? What does a well-represented HEI look like? I believe this is determined by two main factors: the size of the targeted community and the school philosophy.

Depending on the size of the institute, the targeted community to be represented could be the local community (regional, e.g., Lambeth, London), the city the institute is based in (e.g., London), the nation or country (e.g., England), or even the world. There is little point in a local primary school of 100 pupils in Kent being representative of the world population: this would mean White British pupils, who make up over 90% of pupils in Kent, would have to compete for under 10% of the places, essentially excluding most of them from education. Similarly, self-proclaimed world-class international universities should recruit staff and students who reflect their targeted community, or at least their renowned global reputation. This view mirrors the suggestion made in the Sewell Report (the report of the Commission on Race and Ethnic Disparities), e.g., “to make police forces more representative of local communities”.

School philosophy refers to the beliefs an HEI holds regarding the (distribution of) characteristics in an ideal world. School philosophy may take precedence over the size of the target population. Take women in academia as an example: it is not as simple as wanting an overall proportion of men and women in HEIs that reflects the community. It is believed that women are disproportionately lost from academia, and that this has stifled academia from reaching its full potential (the premise of Athena Swan). Given the hegemonic masculinity that persists in society (and academia), extra effort is required to promote and protect women’s rights and platforms to develop their academic careers. This way of thinking about representation considers the social structures of the present and helps us avoid reproducing the inequalities presently observed in the community.

In essence, in a well-represented HEI, all groups should be represented in terms of both “Appearance” (number/proportion) and “Rights” (platforms/priorities), in line with the institute’s philosophy and proportionate to the size of the targeted community it serves.

To be continued…

Are theories over-rated?

A short reflection based on my observations of trends in mental health research. With audio narration.


Research methodology 101 in psychology typically starts by explaining statistical hypothesis testing: how data can be understood through a particular model to draw inferences. A theory-based statistical model is how researchers make meaning out of a constellation of data points, in a systematic and falsifiable way that distinguishes inference from astrology.

Research is not easy. Researchers make many decisions and assumptions in the process: How are concepts defined? How are these concepts measured? What are the relationships between these variables, and do they overlap? Researchers design, collect, clean and frame data so that they can tell a story. Data may speak for themselves, but the theatre is built by the researchers. It is more than choosing which variables to put into the model, or discovering which variables are statistically associated with the predictors. It is about how the confirmation or rejection of the statistical model should be interpreted, in what context, for which populations, and more.

Psychology research methods 101 – Hypothesis
Photo by Tara Winstead on

The industrial revolution automated jobs and led to an expansion of productivity; the “artificial intelligence (AI) revolution” appears to share similar aims. The first questions that pop into people’s minds are: “Can we automate this process? If so, how?” The same ideology has been applied to understanding data: AI models are springing up like mushrooms after rain, with approaches like “covariate auto-selection” that promise to perform as well as (or outperform) “traditional analysis”, whatever that means.

I am no fan of such practices, because I think data analysis is only a small part of the whole scientific process; there are limited ways you can “let the data speak” if the paradigms of data collection, conceptualisation, etc. are never challenged. This AI-does-all approach, if deemed the best or, even worse, the default practice, will leave little room for users to challenge the premises and assumptions on which inferences are drawn. The result is not true empirical or theoretical advancement, but post-hoc theory-making. But can you really blame AI data scientists for this?

There is no point in finger-pointing [maybe 1 >:o)]. The problem of weak theory is prevalent in (mental) health research (for more discussion of formal theory, see Eiko’s blogs, which have a lot of resources on theory; do check them out!). An example that is highly relevant to my work is the use of ethnicity in health research: is it biology? Is it country of origin? Is it migration status? Is it social support and network? What is its relationship with the covariates? Papers often describe whether their findings fit with previous research, but most of the time stop at that level (“More research is needed”), with little discussion of theory. It is this tendency to focus on inference rather than theory that has allowed AI-based analytical practice to expand.

AI helps make meaning from a pre-specified framework
Photo by Tara Winstead on

This phenomenon raises the question: why is theory playing less of a role in mental health research? What is driving this change in scientific practice? I believe a particular emotion, frustration, plays a role. I see this frustration arising from the huge implementation gap and the seemingly insurmountable unmet needs, made worse by the replication crisis.

We are said to be in a mental health crisis. The healthcare system is more sensitive in detecting mental health problems: they are recognised earlier and more broadly in primary care, but our ability to treat our patients has not improved to the same extent. It takes an estimated 17 years, on average, to translate health research into practice. IAPT, new waves of psychotherapy, medications… These attempts to improve service provision (quantity/access) and quality have not matched the increasing demand. With record levels of demand for mental health support (even before Covid-19), the whole community is pressured to provide solutions. The frustration stems from compassion for the plight of patients.

The same frustration is felt by funders too: decades of funding to find a pill to eradicate dementia, piling resources into prioritising “what works”, a stronger-than-ever appetite for interventions. Researchers in the field are no longer positioned as “neutral observers of (natural) phenomena”, but as “proactive drivers of change”. The increasing need to demonstrate “impact” is evidence of this change in positioning. Measuring impact depends on the ability to demonstrate progress. Theory development is often a twisted journey, so in the current paradigm it intrinsically fares worse than randomised controlled trials in that regard.

In conjunction with the replication crisis, where small sample sizes and poor methods (but not weak theory) were deemed to be the culprits, strength in numbers feels like a prerequisite for publishing in high-impact journals. This shapes the ecosystem of academia. Bigger institutes are in a better position to run larger studies, sustaining a self-fulfilling loop in which they remain the top research institutes by impact. Smaller institutes have fewer options to compete, and rely on impact-driven evidence-making rather than theory testing or development. Research has become more focused on interventions and local adaptations, rather than on trying to come up with a grand theory for a disorder.

Photo by Steve Johnson on

Researchers do not have to make a binary choice between “theory” and “intervention”; there is plenty of middle ground between the two. In fact, the two go hand in hand in the development of any field. An “intervention”-leaning environment amplifies the need for researchers to understand and clarify “context”: how accumulated evidence can be applied to the situation at hand. I don’t think we are very well trained in this regard (yet); it has not been a focus in the past, nor has it been included in the curriculum. Approaches such as realist evaluation and rapid qualitative reviews have arisen to address this gap. A “theory”-leaning environment, on the other hand, emphasises understanding the nature of a phenomenon. For example, the biopsychosocial framework encourages multidisciplinary treatment, which the restructured integrated care systems are hopefully in a better position to provide. As another example, where digital mental health intervention apps taking many different approaches have failed to live up to expectations, perhaps revisiting the positioning and theory of such interventions is the bridge to success. Theory serves as a foundation on which knowledge is generated and decisions are justified, and it helps the field explore alternative explanations of “reality”.

What’s next? It is for us, members of the scientific community, to shape the direction of our field. We need to be pragmatic in coming up with solutions to address the huge mental health needs, but we also need to remain observant and patient, and to preserve space for new theories and alternative frameworks for understanding mental health to be developed and tested.

Twitter Spiel: Reflection on Gender, Race and Power in Academia

Gender, Race and Power in Academia: The Complexity of Intersectionality.

Figure 1. Tweet Captured from Prof X’s Twitter feed on 28/4/2022, 10:18 am, UK Time

The tweet above was posted by Prof X, an Asian American woman professor of sociology who serves as director of the Centre for Research on Social Inequality. As I understand it, the intention of the original post (OP) was to invite discussion of, and reflection on, the inequity and (micro)aggression directed towards women from racialized communities in academia; in this case, from a student.

However, Twitter reacted rather differently from what the OP expected. At first glance, a lot of people saw this as an act of oppression and public shaming of the student. I thought we Twitter users must have learned by now that 280 characters are too few to paint the full picture, and that we should be kind before jumping to conclusions. Prof X very soon found herself on the receiving end of all sorts of criticism and degrading comments on her character and professionalism. This is an unfortunate case study of how intersectionality plays out in real life, how the role of race is dismissed for members of minority groups in positions of power, and how solidarity is lacking within racialised communities.

Photo by Armin Rimoldi on

Below, I summarise a few common comments under the original tweet (filtering out the straight-up insulting ones):

Response 1: “It is a right question to ask!”

This response highlights that it is important for students to find out about potential supervisors’ skills and styles, and whether these match their needs, before deciding to work with them. I think this is indubitably true. However, this comment misses the OP’s point. The problem is twofold. It was never about whether the student should ask the question, but about the subjective experience of an Asian American woman whose qualifications and capabilities are constantly questioned in academia. It is not about whether the question is appropriate, or even how it was asked; it is about the cumulative experience of being treated as lesser because of gender and race. In Ijeoma Oluo’s book, So You Want to Talk About Race (2018), she clearly illustrates how racism cannot be reduced to isolated events. What is experienced and reported in this tweet is merely the tip of the iceberg, the straw that broke the camel’s back. Many comments along these lines went on to discuss whether or not this question should be asked, such as:

“It’s that the student asked a professor if she was qualified (like an interviewer) instead of asking if they were a good fit (like an advisee). The tone and phrasing can feel insulting because it questions competence instead of appealing to the specificity of one’s expertise.”

But the OP is not really talking about the wording. It is about TO WHOM this question is asked, and what this reflects. In case this was not clear, a fellow Asian American colleague of Prof X shared a reply in the comments (Figure 2), but it did not turn the tide of toxic criticism towards the OP.

Figure 2. Tweet reply under OP. Captured 28/04/2022, 10:21am, UK time.

Instead of recognizing the racial and gender inequity that has lingered for far too long, instead of believing that the OP is, indeed, about race, instead of reading carefully what the OP was trying to get at, Prof X was torn to pieces. This blue bird is definitely a carnivore, beware.

Response 2: “Why would you shame your student on an open platform?”

This points to a different problem. Where is the proper place to talk about racism? When is the proper time to talk about racism? Should this be discussed in the public domain, where people can share their learnings, or should this be a private conversation between the parties involved? We may never have answers to these questions that are good enough for everyone. However, the problem I see here is the need people feel to police how these issues should or can be discussed. This act of policing is itself part of the attitude that perpetuates structural and casual racism. It discourages minority groups from sharing their lived experiences on a day-to-day basis. Yes, the OP did not spell out word for word that the student was sexist or racist; yes, the OP tried to find excuses for the question being asked despite its unpleasant manner. I see these as the result of similar policing of when people from racialised or minority groups can talk about their lived experience, such that we pitifully comply by consciously choosing self-censorship and humour to cover up our pain. This is not a problem of platform; this is a problem of power.


I think the case presented here is a vivid example of the complexity of intersectionality when power and race coincide. A lot of the criticisms along the lines of Response 1 hold the notion that the professor is in a position of power, and that her post is hence an act of oppression. When the OP talked about her particular interaction with a student, she was automatically assumed to be the oppressor, whatever the platform, whatever the topic, whatever the context. The position dictates everything. Perhaps the OP would have been much less controversial if the question had come not from a student but from a colleague or a member of the public, where public discourse favours the OP’s position. Perhaps the OP would have been much less controversial if the OP were a white male professor, where public discourse favours the critics. This reductionist way of thinking succeeds only in applauding a superficial understanding of “social justice”; in reality it often works against its own intentions and, in the worst case, becomes a valorised, covert form of racism.

This is an example of how intersectionality plays out in situations where systems of power seem to operate in contradictory ways. When people from minority races are in positions of power, people assume that their position of power always overshadows their race, and that racism does not, and should not, affect how they interact with the world. The emergence of Critical Race Theory is a response to exactly these situations. Our case here shows that we’ve still got a long way to go.

East Asians are among the largest ethnic groups in the world, but we are never the majority in these contexts. We are not white, not brown, not black. We are a distant majority group that has been left out of the discussion. We are the ones stuck in the middle. We do not need to indulge in a competition over who the most deprived group is; it is meaningless. However, we do need solidarity from other racialised communities, to stand with us when we face racism and sexism, as other groups do. Please be kind.

Thinking of you Prof X, hope you are well.

“My dear brothers and sisters, take note of this: Everyone should be quick to listen, slow to speak and slow to become angry,”

James 1:19