Weeks 7-9: First Python-based Tool!

My experience writing a Python tool that scrapes the number of citations of papers.

I have started to pick up the basics of Python over the past few weeks – thanks to a 7-day hotel quarantine and misaligned jetlag. I have been following Al Sweigart’s free-to-read book (and £13.99 course on Udemy), Automate the Boring Stuff with Python. Last week, I was proud to have written myself a little tool in Python!

Citation Scraper

Have you ever had a list of paper titles and thought, “Hmm.. Wouldn’t it be nice if they were sorted by number of citations?” This little gadget is the tool for you! (Yeah, I am overselling it >v<!) Citation counts are not readily available on most databases (apart from Scopus and Web of Science). Fortunately, this information, whilst less reliable, is available on Google Scholar. The tool doesn’t do anything ground-breaking: you feed the program a list of paper titles, and it scrapes the number of citations for each paper and prints them into your spreadsheet.
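For anyone curious about the shape of such a tool, here is a minimal sketch of the “sort by citations and write a spreadsheet” part only. It assumes the counts come from some lookup function (the hypothetical `lookup_citations` parameter below); the scraping step itself is deliberately left out, and the column names are my own placeholders rather than the original tool’s.

```python
import csv
import io
from typing import Callable, Iterable, List, Optional, Tuple


def rank_by_citations(
    titles: Iterable[str],
    lookup_citations: Callable[[str], Optional[int]],
) -> List[Tuple[str, Optional[int]]]:
    """Look up each title and sort from most to least cited.

    `lookup_citations` is a hypothetical stand-in for the scraping step;
    it returns None when no count could be found.
    """
    rows = [(title, lookup_citations(title)) for title in titles]
    # Missing counts sort last; everything else by descending citations.
    rows.sort(key=lambda row: (row[1] is None, -(row[1] or 0)))
    return rows


def to_csv(rows: List[Tuple[str, Optional[int]]]) -> str:
    """Render the ranked rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title", "citations"])
    for title, count in rows:
        writer.writerow([title, "" if count is None else count])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up counts for illustration only; no network access happens here.
    fake_counts = {"Paper A": 12, "Paper B": 340, "Paper C": None}
    print(to_csv(rank_by_citations(fake_counts, fake_counts.get)))
```

Passing the lookup in as a function keeps the sorting and CSV logic testable offline, and papers whose count could not be found are pushed to the bottom rather than being treated as zero citations.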

There are existing solutions on the market that achieve this already, such as the Publish or Perish citation tool. I just thought this could be an entry-level task to test myself. “Written” is truly an overstatement – it’s more like copying and adapting code from GitHub and Stack Overflow. But the sense of accomplishment is real.

Sense of accomplishment is real!
Photo by Temo Berishvili on Pexels.com

One barrier I encountered was that, whilst the code appeared to work quite well when I tested each part independently, it did not perform consistently. One hour it worked, the next hour it stopped working. The code was identical; I couldn’t understand why it wasn’t working. I was in hotel quarantine when this problem first appeared, and I joked to my brother that I must have been blocked by Google – which I later realised was exactly the case!

Turns out, scraping information from other people’s websites may violate their terms and conditions, and could be borderline illegal. Sites like Amazon and Google (and many, many others) set up safeguards that automatically block IP addresses when they detect a large number of requests (accesses/searches) within a short amount of time. I did not put a time-out (a pause between requests) in my original code, which sent thousands of searches in minutes. No wonder I was blocked!
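In hindsight, the missing pause is simple to sketch. Below is a minimal Python example, assuming a hypothetical `fetch` function that performs one search: it waits a random interval between consecutive requests. The 5 to 15 second defaults are arbitrary placeholders, not limits that Google or anyone else publishes, and slowing down does not by itself make scraping compliant with a site’s terms.

```python
import random
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_politely(
    queries: Iterable[str],
    fetch: Callable[[str], T],
    min_delay: float = 5.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Run `fetch` once per query, pausing a random interval between calls.

    `fetch` is a hypothetical function that performs a single search.
    The default delays are placeholders, not documented limits.
    """
    results: List[T] = []
    for i, query in enumerate(queries):
        if i > 0:
            # Pause before every request except the first; the random jitter
            # avoids firing requests at a perfectly regular rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(query))
    return results
```

The `sleep` function is a parameter so that the waiting can be swapped out (for example, for a list’s `append`) when testing, instead of actually pausing.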

Anyhow, this experience of testing and problem-solving has been fun! I began to understand more about the magic that fuels enthusiasm within the programming/software engineering community. I’m eager to be in a position to contribute to the conversation soon – one day I shall!

To Be Part of the Community!
Photo by Pixabay on Pexels.com

Weeks 5/6 – Time Management Tool: Eisenhower’s Urgent/Important Matrix

Sharing my experience using the Eisenhower Matrix & reflections on “time”.

Former US President Dwight Eisenhower is said to have popularised this time management system. By classifying tasks according to their importance and urgency, the Eisenhower Matrix has been described as the holy grail for minimising distractions. I am sharing 2 problems I have with the matrix, how it does not fit my workstyle, and some wider reflections on living a highly-structured work/life-style.

The Eisenhower Matrix
created by Lighthouse Visionary Strategies
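As a toy illustration of how the matrix sorts tasks, here is a minimal Python sketch. It assumes each task can be labelled simply as urgent/not urgent and important/not important (exactly the judgment I struggled with below), and the quadrant names Do, Schedule, Delegate and Eliminate are a commonly used labelling, not necessarily the one in the image above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    urgent: bool
    important: bool


def quadrant(task: Task) -> str:
    """Place a task in one of the four quadrants of the matrix."""
    if task.important and task.urgent:
        return "Do"
    if task.important:
        return "Schedule"
    if task.urgent:
        return "Delegate"
    return "Eliminate"  # not urgent and not important


def build_matrix(tasks: List[Task]) -> Dict[str, List[str]]:
    """Group task names by quadrant, keeping the order they were given in."""
    matrix: Dict[str, List[str]] = {"Do": [], "Schedule": [], "Delegate": [], "Eliminate": []}
    for task in tasks:
        matrix[quadrant(task)].append(task.name)
    return matrix
```

If every task is flagged as important, nothing ever lands in Eliminate, which is exactly what happened to my matrix.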

Problem 1 – Everything is Important!!

I’d hate to think it is only me, but a great failing of mine when I started to use the matrix was that everything I thought of seemed to be very important! At the beginning of my new role, there was quite a bit of admin required: setting up accounts, getting data access, signing up to the relevant mailing lists, etc. It meant that multiple conversations within and across institutes/departments were happening at the same time. It didn’t make a lot of sense to rank or compare these tasks, as they didn’t appear to be too important, but I couldn’t do my job if they weren’t completed. On the other hand, I had a list full of publications on the topic that I was eager to catch up on. I conflated “important to my job” and “important to satisfy my research interests”, and judged the importance of tasks by a fluctuating standard. This soon corrupted my matrix, with some tasks popping on and off every 2 days, and some staying on the matrix for eternity! Consequently, the bottom right quadrant – Not Urgent and Not Important – was always empty. I failed to utilise the tool to its fullest.

Everything is Important!
Photo by Monstera on Pexels.com

Problem 2 – Poorly Defined “Tasks”

The matrix is meant to be a task-focused tool, not a progress-tracking tool to help facilitate learning. Returning to the “never empty” tasks: apart from the misjudged importance, it was also the nature of the tasks that made them so difficult to tick off. An example is: learn Python. It is a key component of my work, highly important, and probably quite urgent too [depending on the timescale we’re talking about] if I want to make any real progress. But I could never cross off that task and call it done: even after completing 20 hours of tutorial videos, working through a textbook, and coding my first little gadget in Python, I don’t feel confident enough to say that I have “learned Python”. The matrix is not meant for progress tracking, but rather for shortening to-do lists. Some could argue that the problem should rather be attributed to my non-SMART goals, and that I shouldn’t judge the capabilities of the matrix based on that [SMART = Specific, Measurable, Achievable, Relevant, Time-bound]. However, I do think it is not realistic to break the whole learning process down into tiny surrogate markers of achievement. Does the ability to copy and paste multiple sections of code from GitHub mean I am capable of doing a task? How many errors, tests and failures are tolerable for a “good” coder developing a new Python gadget? Is it the “coder’s mindset” I should be valuing, or should I be taking examinations to benchmark my progress? The checklist approach to learning did not work for me.

Time Time Time.
Photo by Ron Lach on Pexels.com

Reflections:

We find comfort in structure. We need structure to guide our attention, to renounce our mastery over time. Time is broken down into smaller units with higher precision to monitor progress, efficacy and production. We sure are living in a fast-paced world, and it is not just the pace: the accuracy and rigidity of time have consequently projected themselves as the more appropriate way of living, as a “truth” that is more true than how time was experienced in the past. The passing of time is universal (well, sorry, theoretical physicists), but the construct, measurement and experience of time are manufactured and constantly updated by society, by us. We fabricated this need for speed, which in turn necessitates ever more precise measurement. In cultures where the obsession with time has yet to take over, e.g., in many African cultures, the way of living and experiencing time has often been remarked upon pejoratively. Injustice might mask itself as progress; greed as philanthropy; derogation as inclusion.

Despite our emphasis on time, and the structure that comes with it to help us master time, not having spent enough “quality time” with loved ones is said to be one of the most common deathbed regrets in modern times. Not all “times” are born equal. Our ability to just relax and enjoy the moment is being chipped away, checkbox by checkbox. The guilt of wasted time spills over and burdens us even more. I am sure this structure helped a lot during the Industrial Revolution to get the factories rolling; perhaps it will serve a similar role as AI replaces half of the labour force. How do we find quality in our time, befriend time and not compete with it? Tools like the Eisenhower Matrix should help us build this healthy relationship with time, not make us see ourselves as the Lords of time. Be humble!

[Finished reading Beyond Measure by James Vincent, who describes the history of the measurement of time quite nicely.]

Week 4: Say my Name – Hong Kong Chinese Names in English

Reflecting on how Hong Kong Chinese names are misrepresented in the UK

“Chi – Chi – is Chi here?”.. “Here.. (reluctantly)”

I bet the majority of students from Hong Kong have experienced this: coming to a foreign country, speaking a foreign language, being called by a foreign name that took you days to recognise and internalise. Yup, you are here, away from home.

Stripping away the sentiments, I can’t help but be surprised (perhaps I shouldn’t be!) by how often (mainly Hong Kong) Chinese names are wrongly represented in English, given the intertwined (colonial) history between Britain and Hong Kong.

These naming mistakes replicate themselves in educational settings, universities, administrative data and health records. Practically speaking, they induce higher error rates in records, and hence lower the probability that this information can be used to inform research or public policy – a form of research inequity that perpetuates health inequity in society. If we truly are marching towards an inclusive, more equal society, I do think the first, and the least, thing we need to do is to get the names right. Here’s a quick, simplified tutorial.

Chinese Name Short Tutorial

In (modern) Chinese, a full name (姓名) consists of a surname (姓) and a forename (名). There is no equivalent of a middle name in Chinese.

Surnames typically consist of 1 character, but can have up to 9 characters (only 1 such surname appears in the Chinese Surname Dictionary)! The 1996 Chinese Surname Dictionary collated 11,936 surnames; over 90% of the Chinese population share 120 common surnames (all of which consist of 1 character), and the top 5 surnames (Wang, Li, Zhang, Liu, Chen) account for 30% of the population. As for forenames, they usually consist of 1 or 2 characters, with no upper limit on the number of characters. During the infamous Salmon Chaos discount event in Taiwan, a person changed their legal name to one 50 characters long (with a 49-character forename)!

According to China’s 2020 national names report, over 90% of Chinese full names consist of 3 characters, while the proportion of 2-character names has dropped to around 6%, and names with 4 or more characters total around 3%.

Problems with English Representation of Chinese Names

Cantonese and Mandarin pronounce the same character differently, hence their English translations differ. Take my surname, 林, as an example: it is pronounced closer to “Lam” in Cantonese than to “Lin” in Mandarin (e.g., the NBA player Jeremy Lin). This variation in translation tells us a bit more about where individuals come from, which is good, as long as people consistently report and record it.

A big issue lies with forenames. Forename translations in China and Taiwan use Mandarin Pinyin, which is a (sort of) established method for representing the pronunciation of Mandarin characters. This is not without its limitations; for example, some characters like 呂 (Lǚ) cannot be represented using the English alphabet. There is no accurate alphabetical representation of Cantonese, mostly due to its complexity (9 tones and 6 modes/pitches), and because a lot of its words do not share a similar pronunciation mechanism with English. The resemblance between Cantonese and English is much lower than that between Mandarin and English.

Another key difference is that Mandarin-translated English forenames are usually written as a single word. For example, 鄧小平 is represented as Deng (surname) Xiaoping (forename). Cantonese-translated English forenames, on the other hand, retain the independence of each forename character. For example, 鄭月娥 is represented as Cheng (surname) Yuet-ngor (forename), where the hyphen is sometimes replaced by a space. In current UK naming registries, Cantonese-translated English forenames are often truncated and treated as a combination of a forename and a middle name. For example, Yuet-Ngor is truncated to “Yuet”, and “Ngor” is recorded as a non-existent middle name.
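To make the truncation concrete, here is a minimal Python sketch. It assumes a hypothetical registry form that splits the given name on spaces, treating the first word as the forename and the rest as middle names; `naive_split` and `preserve_forename` are illustrative names, and keeping the whole given name as one hyphenated forename is just one possible alternative, not an existing standard.

```python
from typing import Dict


def naive_split(surname: str, given: str) -> Dict[str, str]:
    """Mimic a form that treats the first word as the forename and the rest as middle names."""
    words = given.split()  # splits on whitespace only, so "Yuet Ngor" -> ["Yuet", "Ngor"]
    return {
        "surname": surname,
        "forename": words[0] if words else "",
        "middle_names": " ".join(words[1:]),
    }


def preserve_forename(surname: str, given: str) -> Dict[str, str]:
    """Keep the whole given name as one forename, joining its syllables with a hyphen."""
    syllables = given.replace("-", " ").split()
    # capitalize() upper-cases the first letter and lower-cases the rest,
    # so "Yuet Ngor", "Yuet-Ngor" and "yuet-ngor" all become "Yuet-ngor".
    forename = "-".join(syllables).capitalize()
    return {"surname": surname, "forename": forename, "middle_names": ""}
```

With “Yuet Ngor” (hyphen replaced by a space), the naive split produces the forename “Yuet” and the middle name “Ngor”, the very error described above, while the second function keeps the name whole.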

How is this still happening in the UK today? Have they not consulted any Hong Kong Chinese people? This points to a key barrier to EDI: power dynamics in Public Patient Involvement. There probably have been formal or informal checks with Chinese-speaking people on whether the existing way of representing names is appropriate; however, these issues might not have been dealt with. We have to be mindful of the power dynamics under which these conversations happened, in the past and in the present. Partially sacrificing one’s name, and accepting humiliation before the ruling (White) decision-makers, to “earn” a moment of shared laughter might seem ridiculous, but it makes a lot of sense amongst exiled, minoritized communities. Heck, lands were occupied and unequal treaties were signed for the same reasons.

This is not a phenomenon unique to Hong Kong Chinese people. It is quite common for people to change their naming traditions, willingly or unwillingly, when they enter the country; for example, Vietnamese people flip the order of their forenames and surnames. Speaking from experience, on many occasions my friends have tried to correct their tutors on how they should be addressed at university. Unless they switched to a “proper” Western name, some tutors would insist on using the “name that is recorded on the papers”. The less brave would persevere, like many of our predecessors, in being referred to by a foreign name, one foreign even to ourselves.

I am glad to see the movement towards using preferred pronouns in communications. I hate to say this, but it’s always easier to promote a social movement when White people are among its beneficiaries. So my plea is: perhaps it’s also time to pay long-overdue respect to the un-named, attention to the unseen, and voice to the unheard.

Week 3: Marathon, Not a Sprint

Week 3: PhD thoughts inspired by a recent 5k run.

Last Saturday, my partner and I participated in the nearby 5km Parkrun. We’re all dandy, in other words, untrained. This was the first time we were both able to free ourselves from the comfortable shackles of our beds on a Saturday morning.

The goal was to finish the run in one piece. We started off at a nice pace, hovering at around 400th place out of 600+ runners. Unlike the last time I joined, there were no muddy puddles from the rain. Little bits of tailwind accompanied the sunlight to give us an extra boost.

This extra boost came back to haunt us in unexpected ways. We were too used to running on treadmills and could not adapt to the natural terrain. The tailwind must have also pushed us beyond our typical pace. My partner went slightly over her limit, and her knees started to complain as we crossed the halfway point. We had to slow down.

As we squirm forward at the speed of rush-hour traffic in London, I start to feel the urge to just dash off and catch up with my own pace. I reckon we must be at the tail of the crowd! My inner competitiveness wants to take over; it’s such a nice opportunity to set a personal best! My partner adds fuel to the fire and encourages me to go: “just wait for me at the finish line!”. Indeed, why shouldn’t I think less and run?

As I fall into the conundrum, I see how the situation somehow resembles my PhD journey. What is it that I value in this process? Is it to finish as fast as I can, in record-breaking time? Or is it to take my time learning, doing slow but meaningful science? It’s never either/or, but setting a goal and sticking with it helps me prioritise what’s truly important to me. At this point, it is to cherish my status as a student, to dive into theoretical puzzles, challenge myself with new skills, connect with people I would not dare to speak to, and spend time with the ones I love.

We crossed the finish line together.

Week 2: Getting Real

Week 2 is a philosophical one. More reflection on how this world operates.

Week 2 is much less eventful compared to week 1. It is likely a more truthful depiction of a typical week in the coming 3 years.

Measure and Routine Practices

Why do we do what we do the way we do it?

Several constellations lined up to trigger this train of thought. I recently finished listening to Desperate Remedies: Psychiatry’s Turbulent Quest to Cure Mental Illness by British sociologist Andrew Scull, whilst starting James Vincent’s first book, Beyond Measure: The Hidden History of Measurement. The challenge faced by psychiatrists in the 1970s as they put together DSM-III was not a new one. It was the problem of establishing a reliable measure: the same problem the French faced in establishing the metre, the Chinese emperors in defining musical tones, and the Egyptians in keeping time. A proper measurement often relied on a naturally occurring (hence valid) phenomenon to establish its reliability, which is relatively easy to do for some things, e.g., sundials and water clocks were used to track time. Mother Nature became their guarantor. For other constructs, like friendship, happiness, rights and responsibility, we are less able to do so, or at least haven’t yet found a way to do so reliably. How we measure things tells us a lot about our understanding (or lack thereof) of the phenomenon.

Photo by Moose Photos on Pexels.com

The same applies to research in health equity. What is recorded, and how it is recorded, matters. And these directly influence what is available in our routine administrative data. For example, poor uptake of psychological therapy in an ethnically diverse catchment area does not simply mean that there is strong stigma; it may instead reflect more entrenched distrust in the system, a lack of support for people to access services, etc. Moreover, alternative support provided by community members and cultural practices is simply not recorded, and is discounted from routine records. From this snapshot understanding of the “evidence” for poor therapy uptake, what would be a proper policy response? It is impossible to tell from the data alone, because of how we decided to frame and measure access.

It raises the question: who decides what to measure, and how? Under this veil of evidence-based policy making, which groups of people are routinely under-represented? I reflected on some of these questions in my blog post earlier this week (Reflecting on Ethnicity in Research – Challenging the Default). These are the questions I will keep in mind, and keep interrogating myself with, as I carry on with my PhD research.

Learning Python

Starting to experience once again the joy and frustration of learning a new programming language. Successfully installed the relevant packages – celebration! Failed to reliably activate my virtual environment – felt defeated… I have been forking people’s repos on GitHub but am struggling to understand the process… Would appreciate any tips on picking up Python!
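On the virtual environment front, one small check that may help, sketched as a minimal example: in Python 3.3 and later, `sys.prefix` differs from `sys.base_prefix` when the interpreter is running inside an environment created with the standard `venv` module. This assumes `venv`; Conda environments, for example, will not be detected this way.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a venv-created environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

Printing `sys.executable` alongside the result shows which Python is actually running, which is often the real source of “my environment isn’t being picked up”.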

Week 2. Solid 6.5/10.

Reflecting on Ethnicity in Research – Challenging The Default

Reflecting on how ethnicity is researched in academia, challenging “default practices”


The latest release of admin-based ethnicity statistics (ONS) showed that, across several administrative data sources, a significant proportion of people were recorded as belonging to more than one ethnic group.

Similar evidence of changing ethnic identification was demonstrated in the Understanding Society (@usociety) youth survey of young people aged 10 to 15.

Ethnicity is a dynamic historical-cultural construct, and for most people from ethnic minority groups, it changes over time. In research/policy-based evidence making, ethnic groups are often lumped together (#BAME…), assumed to be constant, and you can only pick one. It raises the question: how come the default practice in research is to treat ethnicity as time-invariant?
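To make the contrast concrete in data terms, here is a minimal Python sketch, assuming a hypothetical long-format table with one row per person per record date (the field names and example groups are made up). Rather than storing a single, fixed value per person, it keeps every recorded ethnicity in date order and flags people whose recorded group changes.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple


class EthnicityRecord(NamedTuple):
    person_id: str
    recorded_on: date
    ethnicity: str


def ethnicity_histories(records: List[EthnicityRecord]) -> Dict[str, List[str]]:
    """Group each person's recorded ethnicities in date order."""
    by_person: Dict[str, List[EthnicityRecord]] = defaultdict(list)
    for record in records:
        by_person[record.person_id].append(record)
    return {
        person: [r.ethnicity for r in sorted(rows, key=lambda r: r.recorded_on)]
        for person, rows in by_person.items()
    }


def changed_ethnicity(records: List[EthnicityRecord]) -> List[str]:
    """Return IDs of people recorded under more than one ethnic group."""
    histories = ethnicity_histories(records)
    return sorted(person for person, groups in histories.items() if len(set(groups)) > 1)
```

A “default” analysis would collapse each history to one value (say, the most recent record) and silently discard that change.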

You might notice that changing ethnic identity is very uncommon among people reported as belonging to White British groups.

And rightfully so! At a time when the population was predominantly white British (and people from other groups were mostly enslaved or seen as objects), and research was predominantly initiated by white British people, it was reasonable that ways of doing research were agreed for and within white British groups.

I am not saying it is a bad thing to have a consistent ethnic identification! But this lived experience of an invariable ethnicity among white British groups has dominated the knowledge generation process and structure. And this assumption, rightfully based on white British experience, was then assumed to be a universal experience.

It became The Default.

I did not recognise the issues with The Default.

I thought it was a consensus, as it was widely replicated and taught to the next generations of researchers. I still do the same in my own research: treating ethnicity as lumped categories that do not change over time.

And perhaps this IS the manifestation of systemic #oppression/#racism. Paraphrasing what Dr. Celestin Okoroji (@CellyRanks) shared at the @kcsamh #PartneringforChange event yesterday (21/6): we need to recognise the hegemonic knowledge and evidence generation mechanisms in this society, and challenge them. (See my thread capturing part of the talk.)

It became The Default.
(Photo by Pedro Figueras on Pexels.com)

The next question is “how”: what can we do, if we think we should challenge the default, or at least suggest an alternative to how “reality” is conceived?
I have 2 thoughts – (please share yours with me too!)
1) Community-Centric Research
2) Improving Methods

(1) Community-Centric Research means putting local communities – people – at the heart of research. It is about valuing relationship building and demonstrating impact valued by local people. It is one form of Public Patient* Involvement, I suppose, but more. This should be embedded in how funding is planned and commissioned.

(2) Improving Methods
This is one goal of my PhD project (with @Klharron & @rob_aldridge): to improve research equity and to face the biases in “default practices”, more specifically in the practice of data linkage, interpretation and public health policy decision making.

This is new to me, and I feel empowered seeing so many pioneers on this path. Change can only come from a collective effort. Do share your thoughts and ideas with me here or via email!

希望是本無所謂有,無所謂無的。 這正如地上的路;其實地上本沒有路,走的人多了,也便成了路。

魯迅先生 – 故鄉

Hope can be said neither to exist nor not to exist. It is just like paths across the land: originally there were no paths, but when many people walk the same way, a path comes into being.

Mr Lu Xun – Hometown
Photo by Julia Volk on Pexels.com

(end)

Originally tweeted by Joseph Lam (@Jo_Lam_) on 22nd June 2022 as a Twitter thread, with minor edits and points expanded without character limits.

Week 1: The Beginning

First week of PhD: thoughts on a remote start, software nightmares and academic career progression

“Welcome to UCL” – 10 online induction courses (no kidding!), but I doubt I remember much from them. Possibly their purpose is to leave a gist, an impression of what the college values: fire safety, implicit bias, data security… All is well! Changing jobs is never simple, and doing this in a remote-working era makes it … a bit weird? But I presume it is something for all of us to get used to.

The plus side of everything going online is that I get to attend A SWARM of online talks, seminars and groups. It does feel a bit overwhelming to start with: my schedule is quickly populated with meetings and invitations, and there is always this prominent speaker coming, or that core training one cannot miss. I can’t help but wonder: will I ever attain the wisdom to determine which talks are the truly good ones I should listen to? Something to reflect on perhaps a few weeks down the line..!

A Virtual Office (Unsplash)
Office. Photo by Laura Davidson on Unsplash

Software Nightmare

The excitement of starting a new post was quickly overtaken by the frustration of – you guessed it – installing the relevant applications and software on my laptop! Numerous emails, calls and remote access sessions, but I am still not able to get all I need. There must be more flexible ways for colleges to adapt to this fast-changing landscape of software development! Take Python as an example: the only version easily installable via the college software centre is version 3.6.4, which fails to satisfy the dependencies of many recently developed packages. I guess it is always this tug of war between data system safety & integrity vs freedom & flexibility…! Hope it all gets sorted next week!
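For the Python version problem, here is a minimal sketch of failing early with a clear message rather than a confusing dependency error further down the line. The minimum version of 3.8 is a hypothetical placeholder; the real minimum depends on the packages involved.

```python
import sys
from typing import Optional, Tuple

# Hypothetical minimum; set this to whatever the project's dependencies require.
MIN_VERSION = (3, 8)


def check_python_version(
    minimum: Tuple[int, int] = MIN_VERSION,
    current: Optional[Tuple[int, ...]] = None,
) -> bool:
    """Return True if the interpreter (or `current`, if given) meets `minimum`."""
    if current is None:
        current = tuple(sys.version_info[:2])
    return tuple(current) >= tuple(minimum)


if __name__ == "__main__":
    if not check_python_version():
        sys.exit(
            f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required, "
            f"but this is {sys.version.split()[0]}."
        )
    print("Python version OK:", sys.version.split()[0])
```

Comparing version tuples rather than strings avoids the trap where "3.10" sorts before "3.6" as text.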

Imagining an Inclusive Academia

UCL provides clear guidance on career progression – the Academic Career Framework (see below) – with a comprehensive list of things one is expected to achieve at UCL Grade 7 and above. This is something I had never heard of! It provides a substantive structure for what is needed to progress at UCL, in other words, the things that are (currently) valued by the college.

4 Components of the Academic Career Framework (UCL)

It is said that contributions to all 4 categories are necessary to measure one’s achievement. I appreciate the attempt to provide clarity on progression, and I can see a wider potential for these frameworks to revolutionise academics’ roles in society. As Dr. Nadia Islam rightly put it, a community-focused, collaborative role needs to be more heavily emphasised in academic research. This is part of a change I would like to see, and contribute to, in academia in the near future!

I think this pretty much wraps up week 1. Excited to continue on this journey, and I hope that you will be journeying with me 🙂