Projects

These are the machine learning projects I have been hacking on outside of work. They showcase my interests in self-supervised learning, large language models, and AI safety.

Analyzing LLMs' Preference for Incorrect Yet Appealing Answers

May 8th 2023 | Solo Project

Taking inspiration from Anthropic's research, this work examines evaluation bias in large language models (LLMs) by assessing their tendency to choose pleasant-sounding but incorrect answers. The analysis covered a range of OpenAI models at different scales and revealed that all of them selected such answers around 50% of the time, regardless of model size or reinforcement learning from human feedback (RLHF). I plan to refine the test set to explore this phenomenon further.
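
As a rough illustration of the setup (not the actual test set or prompt wording, which live in the repo), the sketch below asks a chat model to pick between a correct answer and a nicer-sounding incorrect one; repeating this over many pairs gives the selection rate reported above.

```python
import openai

# Hypothetical question/answer pair for illustration only.
PROMPT = (
    "Which of these answers is correct?\n"
    "(A) The treatment showed no measurable effect.\n"
    "(B) The treatment worked wonderfully for everyone.\n"
    "Reply with just the letter of the correct answer."
)

def pick_answer(model: str) -> str:
    """Ask `model` to choose between the two options and return its reply."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=5,
        temperature=0,
    )
    return response["choices"][0]["message"]["content"].strip()

if __name__ == "__main__":
    for model in ["gpt-3.5-turbo", "gpt-4"]:  # placeholder model list
        print(model, pick_answer(model))
```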

Visit the repo

Image prompt: "Questions that sound nice but are incorrect, digital art"

AGISF Project: Is BabyAGI a fire alarm?

April 8th 2023 | Solo Project

This project is part of the AGI Safety Fundamentals course I am taking this year and is still in progress. The aim is to understand whether the newly popular auto-prompting and self-prompting frameworks (such as BabyAGI and AutoGPT) are a cause for concern from an AI safety perspective.
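
To make concrete what these frameworks do, here is a heavily simplified sketch of the kind of self-prompting loop a BabyAGI-style agent runs. The prompts, objective, and task store are my own placeholders, not code from either project.

```python
import openai

def ask(prompt: str) -> str:
    """Single LLM call, used both to execute tasks and to generate new ones."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response["choices"][0]["message"]["content"]

objective = "Write a short literature review on AI safety"  # placeholder objective
tasks = ["Draft an outline of the review"]

for _ in range(3):  # capped here; real frameworks loop until stopped
    if not tasks:
        break
    task = tasks.pop(0)
    result = ask(f"Objective: {objective}\nTask: {task}\nComplete the task.")
    # The model proposes its own follow-up tasks based on the last result.
    new_tasks = ask(
        f"Objective: {objective}\nLast result: {result}\n"
        "List the next tasks, one per line."
    )
    tasks.extend(line.strip("- ") for line in new_tasks.splitlines() if line.strip())
```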

Visit the repo | Twitter thread

Image prompt: "Robot baby surrounded by fire, digital art"

Whisper Interpretability

November 13th 2022 | Team of 4

During our Alignment Jam Interpretability Hackathon project, I explored whether the "logit lens" concept applies to the encoder and decoder layers of Whisper, an end-to-end speech recognition model. I found that removing decoder layers degraded the output quickly, while removing encoder layers degraded it only gradually. Others on the team delved deeper into attention patterns for audio examples that produced hallucinations.
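
A minimal sketch of the decoder-truncation part of the experiment, using the Hugging Face implementation of Whisper. The model size, audio loading, and number of layers kept are illustrative; the full experiment code is in the repo.

```python
import copy
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

def transcribe_truncated(audio, keep_decoder_layers: int) -> str:
    """Transcribe `audio` (1-D float array at 16 kHz) with only the first
    `keep_decoder_layers` decoder layers kept."""
    ablated = copy.deepcopy(model)
    ablated.model.decoder.layers = ablated.model.decoder.layers[:keep_decoder_layers]
    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        ids = ablated.generate(inputs.input_features)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```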

Read the report | Visit the repo

Image prompt: "Whisper interpretability hallucinate, digital art"

Meaning Error Rate

October 25th 2022 | Solo Project

Meaning Error Rate (MER) is an alternative metric for evaluating speech recognition systems that accounts for changes in meaning. It is automated using GPT-3 with few-shot learning and chain-of-thought prompting. It builds on NER (not to be confused with named entity recognition), and this is the first solution to automate it. This is exciting research for media broadcast firms, which are bound to NER scores by government regulation, as they can avoid expensive human labelling of the severity of errors.
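
A rough sketch of the idea: GPT-3 is shown a few worked reference/hypothesis pairs with reasoning about whether the meaning changed, then asked to judge a new pair. The prompt, examples, and model name below are illustrative placeholders, not the ones used in the project.

```python
import openai

FEW_SHOT = """\
Reference: the meeting starts at nine
Hypothesis: the meeting starts at five
Reasoning: the time is wrong, so the meaning has changed.
Meaning error: yes

Reference: i will see you tomorrow
Hypothesis: i'll see you tomorrow
Reasoning: only a contraction differs, the meaning is unchanged.
Meaning error: no
"""

def judge_meaning_error(reference: str, hypothesis: str) -> str:
    """Few-shot, chain-of-thought judgement of whether an ASR error changes meaning."""
    prompt = FEW_SHOT + f"\nReference: {reference}\nHypothesis: {hypothesis}\nReasoning:"
    response = openai.Completion.create(
        model="text-davinci-002",  # placeholder GPT-3 completion model
        prompt=prompt,
        max_tokens=64,
        temperature=0,
    )
    return response["choices"][0]["text"]
```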

Read the blog | Visit the repo | Watch talk @ Voice 22

Image prompt: "Brain made out of chains with a colourful background complex machine learning parts, digial art"

Emotion Recognition and CPC

September 2nd 2020 | Solo Project

Emotion detection in audio using self-supervised representations trained with Contrastive Predictive Coding (CPC). Using the CPC representations improved accuracy from a 71% baseline to 80%, a relative reduction in error of roughly 30%.
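
One simple way to use such representations downstream (not necessarily the exact setup in the repo) is to freeze the CPC encoder, mean-pool its frame-level features into one vector per utterance, and train a lightweight classifier on top. The sketch below assumes those pooled embeddings have already been extracted to the placeholder files named here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# X_*: mean-pooled CPC embeddings, one row per utterance; y_*: emotion labels.
X_train, y_train = np.load("cpc_train_feats.npy"), np.load("train_labels.npy")
X_test, y_test = np.load("cpc_test_feats.npy"), np.load("test_labels.npy")

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```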

Read the blog | Visit the repo

Image prompt: "Emotion recognition, cartoon"

Masters Thesis: Automatic Lecture Captioning

June 5th 2019 | Cambridge University Engineering Department

This project developed speaker-specific automatic speech recognition systems to transcribe lectures from the Cambridge University Engineering Department. The systems used language model refinement and acoustic model adaptation to correctly decode keywords chosen from lecture handouts. The quality of the transcription was primarily assessed using keyword occurrences and corresponding recall rates for all lecturers.
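
Keyword recall, the main evaluation signal, can be computed along these lines (a simplified sketch; the thesis pipeline normalises the transcripts more carefully before matching).

```python
def keyword_recall(reference: str, hypothesis: str, keywords: list[str]) -> float:
    """Fraction of keyword occurrences in the reference transcript that also
    appear in the ASR hypothesis."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    found = expected = 0
    for kw in keywords:
        expected += ref_words.count(kw)
        found += min(ref_words.count(kw), hyp_words.count(kw))
    return found / expected if expected else 0.0
```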

Read my thesis | See the slides | Watch the demo

Image prompt: "Futuristic full lecture theatre on machine learning, digital art"