We aim to develop Australia's first open-source large language model
through collaboration across the academic, research, government, and business sectors.
Join our 12-week series of Meetup events (Aug 5 - Oct 7), held every Monday:
- 🏫 Come to ANU School of Computing to meet friends in person
- 👾 Hop on our Discord server to have a chitchat
Upcoming event: Little Joey 7b Training and Improvement Session
- Time: Mon, Sep 16, 6:00 - 7:30 PM AEST
- Location: ANU School of Computing or Discord
- 🏃‍♀️ Speed run some basic knowledge
- Play with and visualise LLMs using LLM Visualization, created by Brendan Bycroft.
- Enjoy transformer videos made by 3Blue1Brown:
- Read these awesome articles from real human intelligence 📜
- 🛠️ Build one from scratch
- Follow one of the tutorial videos from Andrej Karpathy (former OpenAI research scientist):
- 📜 Read some simple yet functional repos
- minGPT: A small, clean, interpretable and educational re-implementation of GPT in PyTorch.
- nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. A rewrite of minGPT.
- build-nanogpt: Clean, step-by-step GitHub commits that walk you through slowly building nanoGPT.
- nano-llama31: A minimal, dependency-free implementation of the Llama 3.1 architecture.
- ⚔️ Compare performance of the latest LLMs
- 🎮 Good visualisation is all you need
- WizMap from Polo Club of Data Science @ Georgia Tech for visualising large-scale token embeddings.
- Dodrio from Polo Club of Data Science @ Georgia Tech for summarising attention heads and exploring the semantic and syntactic knowledge in transformer models.
- 📦 Interesting topics and other stuff
- ChatGPT: 30 Year History | How AI Learned to Talk from Art of the Problem on YouTube.
- The moment we stopped understanding AI [AlexNet] from Welch Labs on YouTube.
- CNN Explainer from Polo Club of Data Science @ Georgia Tech for helping non-experts learn about Convolutional Neural Networks (CNNs).
- NeuroCartography and Summit from Polo Club of Data Science @ Georgia Tech for visualising image embeddings from ImageNet.
- Data Source Contributor 🕵️‍♀️
- Identify and provide access to Australia-related data sources.
- Collaborate with other contributors to ensure data quality and relevance.
- Data Collecting, Crawling and Scraping 👩‍🌾
- Develop scripts and tools to collect data from various sources.
- (Optional) Have experience with web scraping tools (e.g., BeautifulSoup, Scrapy).
- Data Cleaning 👩‍⚕️
- Clean and preprocess datasets to ensure they are ready for analysis and modeling.
- (Optional) Have experience with data manipulation libraries (e.g., Pandas, NumPy).
- Model Building, Training and Tuning 👩‍💻
- Develop and train LLMs on our datasets.
- Have experience with machine learning frameworks (e.g., TensorFlow, PyTorch).
- GitHub Organising 👩‍🔧
- Manage the GitHub repository by organizing files, documentation, and issues.
- (Optional) Have proficiency in using Git and GitHub.
- Hugging Face Organising 👩‍🏭
- Manage and organize model versions and datasets.
- Ensure proper documentation and metadata for each model and dataset.
- Social Media Organising 👩‍💼
- Promote the project and its updates on social media platforms (e.g., Discord, Meetup).
- Engage with the community to increase project visibility and collaboration.
Can't wait to join us? Send a message to our lovely team members:
- Matthew: Matthew.Altenburg@anu.edu.au
- Mohan: MohanBalaji.Paranthaman@anu.edu.au
- Roshan: RoshanRam.Deenadayalan@anu.edu.au