Data Science Track modules and rules for the 3-Day Challenge.
Important: Any module with fewer than 10 registrations will be dropped.
To avoid confusion: all Data Science Track work must be completed live on event day at the venue. Teams should bring laptops, chargers, and allowed datasets or notes as per module rules, but pre-built final solutions, pre-written code notebooks, or pre-generated outputs are not accepted for scoring. Submissions are digital unless organizers explicitly announce any physical submission requirement.
Data Doppelganger | Algorithm Competition | Data Science Track
Data Doppelganger is a live coding competition where teams build a personality-matching algorithm entirely on event day. Each team must source its own personality dataset beforehand and arrive on event day ready to use it. Teams develop an algorithm that finds data twins: the two people whose personality profiles match most closely.
The algorithm must take personality inputs like tea vs. coffee preference, morning vs. night person, introvert vs. extrovert, and similar attributes, then output a Personality Card showing the two most similar individuals and why they match.
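As a rough illustration of the input and output described above, the following sketch compares every pair of profiles and reports the closest one. It assumes each attribute is encoded as a binary answer and measures similarity as the fraction of matching answers; the attribute names, the `profiles` data, and the helper functions are hypothetical, and teams are free to choose a different encoding or similarity measure.

```python
from itertools import combinations

# Hypothetical encoding: each attribute is a binary answer
# (e.g. 0 = tea, 1 = coffee). Labels and values are illustrative only.
ATTRIBUTES = [
    "tea_vs_coffee",
    "morning_vs_night",
    "introvert_vs_extrovert",
    "planner_vs_spontaneous",
]

profiles = {
    "A": [0, 1, 0, 1],
    "B": [1, 0, 1, 0],
    "C": [0, 1, 0, 0],
    "D": [1, 1, 1, 1],
}


def similarity(a, b):
    """Fraction of attributes on which two profiles give the same answer."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_data_twins(profiles):
    """Return (id_a, id_b, score) for the most similar pair of profiles."""
    id_a, id_b = max(
        combinations(sorted(profiles), 2),
        key=lambda pair: similarity(profiles[pair[0]], profiles[pair[1]]),
    )
    return id_a, id_b, similarity(profiles[id_a], profiles[id_b])


def shared_attributes(a, b):
    """Names of the attributes on which both profiles agree (the 'why')."""
    return [name for name, x, y in zip(ATTRIBUTES, a, b) if x == y]


id_a, id_b, score = find_data_twins(profiles)
print(f"Data twins: {id_a} and {id_b} ({score:.0%} match)")
print("Shared answers:", ", ".join(shared_attributes(profiles[id_a], profiles[id_b])))
```

Comparing all pairs is quadratic in the number of entries, which is still cheap at the minimum dataset size: 100 entries give 4,950 pairs.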
This is a one-day module, held on Day [X] of the 3-Day Challenge.
| Detail | Requirement |
|---|---|
| Team Size | 3 to 4 members |
| Team Lead | 1 member per team (handles communication with organizers) |
| Department | Data Science / Computer Science |

Schedule:

| Time | Activity |
|---|---|
| Opening | Competition officially begins, teams set up their workspace |
| Coding Window | Teams build their algorithm (time limit: as set by organizers) |
| Submission | Code and output submitted to judges before deadline |
| Evaluation | Judges review submissions and score against criteria |
| Results | Winners announced at end of day |
Each team is responsible for finding and bringing its own personality dataset. The dataset must be based on personality-related attributes, such as tea vs. coffee preference, morning vs. night person, or introvert vs. extrovert.
The dataset must contain a minimum of 100 entries and cover at least 8 distinct personality attributes. Teams are encouraged to explore public sources like Kaggle MBTI datasets and Open Psychometrics raw data. Teams may also collect their own survey data if minimum requirements are met. Fabricated or AI-generated datasets are strictly prohibited and result in disqualification.
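A team could check these two minimums before event day with a few lines of code. The sketch below assumes the dataset is a CSV file with one row per respondent and one column per attribute, plus an ID column named `respondent_id`; the column name and the `check_dataset` helper are hypothetical.

```python
import csv
import io

MIN_ENTRIES = 100      # minimum number of rows required by the rules
MIN_ATTRIBUTES = 8     # minimum number of distinct personality attributes


def check_dataset(csv_text, id_column="respondent_id"):
    """Return a list of problems; an empty list means both minimums are met."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Every column except the (hypothetical) ID column counts as an attribute.
    attributes = [c for c in (reader.fieldnames or []) if c != id_column]

    problems = []
    if len(rows) < MIN_ENTRIES:
        problems.append(f"only {len(rows)} entries, need at least {MIN_ENTRIES}")
    if len(attributes) < MIN_ATTRIBUTES:
        problems.append(
            f"only {len(attributes)} attributes, need at least {MIN_ATTRIBUTES}"
        )
    return problems


# Tiny placeholder sample, used only to show the checker's output.
sample = "respondent_id,tea_vs_coffee,morning_vs_night\n1,tea,morning\n2,coffee,night\n"
for problem in check_dataset(sample):
    print("Dataset problem:", problem)
```

This only checks size. It cannot tell whether the data is genuine, so the rule against fabricated or AI-generated datasets still has to be met by the team itself.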
General Rules
Submission Rules
Judges will select 3 to 4 volunteers from the audience. Each volunteer fills in the personality form on the spot. The team runs its algorithm live on those responses, and a Personality Card is displayed showing whether a data twin exists and how strong the match is.
Judges score real-time performance on unseen data. A strong match with a clear, visual explanation scores higher. The algorithm must run smoothly without any prior knowledge of volunteer responses.
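The live step differs from pairwise matching in that a single new response is compared against the existing dataset, and the card must also handle the case where no twin exists. A minimal sketch follows, assuming binary-encoded answers and a cut-off of 75% agreement below which no twin is reported; the threshold, the dataset IDs, and the text layout of the card are assumptions, not part of the rules.

```python
def match_strength(a, b):
    """Fraction of attributes on which two answer lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_twin(volunteer, dataset, threshold=0.75):
    """Return (twin_id, score); twin_id is None if the best score is too low."""
    best_id = max(dataset, key=lambda pid: match_strength(volunteer, dataset[pid]))
    score = match_strength(volunteer, dataset[best_id])
    return (best_id if score >= threshold else None), score


def personality_card(volunteer_name, twin_id, score, width=20):
    """Render a plain-text card with a bar showing the match strength."""
    bar = "#" * round(score * width)
    headline = f"Data twin: {twin_id}" if twin_id else "No data twin found"
    return (
        f"Personality Card for {volunteer_name}\n"
        f"{headline}\n"
        f"Match strength: [{bar:<{width}}] {score:.0%}"
    )


# Hypothetical dataset entries, illustrative values only.
dataset = {
    "R001": [0, 1, 0, 1],
    "R002": [1, 0, 1, 0],
    "R003": [1, 1, 1, 1],
}

twin_id, score = find_twin([0, 1, 0, 0], dataset)
print(personality_card("Volunteer 1", twin_id, score))
```

Reporting "no twin" when the best score is low keeps the card honest on unseen data, instead of always presenting the nearest entry as a match.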
| Criteria | Marks | What Judges Look For |
|---|---|---|
| Uniqueness and Novelty of Approach | 20 | Original matching logic; creative hybrid methods rewarded |
| Live Accuracy on Audience Data | 20 | How well the algorithm performs on real inputs from volunteers selected by the judges |
| Code Quality and Structure | 15 | Clean, readable, well-commented code with logical flow and no redundancy |
| Visual Design of Personality Card | 15 | How informative, clear, and visually appealing the output card is |
| Creativity of Output | 10 | Engagement factor of the card and whether it tells a story about the match |
| Clarity of Approach (Comments/README) | 10 | How well the team explains logic and methodology in writing |
| Overall Presentation of Output | 10 | Whether final output feels polished and complete |
| Total | 100 | |

Penalties:

| Violation | Penalty |
|---|---|
| Late submission | -5 marks per 15 minutes of delay |
| Use of a pre-trained AI model as the core engine | Immediate disqualification |
| Pre-written algorithm brought to the event | Immediate disqualification |
| Code copied from online repositories or other teams | Immediate disqualification |
| Fabricated or AI-generated dataset | Immediate disqualification |
| Absence of more than one team member | -10 marks |
| Failure to submit a Personality Card as output | -15 marks |
| Unsportsmanlike conduct or interference with other teams | -10 marks and formal warning |
City Whisperer | Algorithm Competition | Data Science Track
City Whisperer is a live coding competition built around one question: can an algorithm guess where someone is from based on how they answer a few questions? Teams build a city prediction model that takes personality and lifestyle responses and predicts the city in Khyber Pakhtunkhwa (KPK) the respondent most likely comes from.
On event day, judges select a random student from the audience, who answers a short set of live questions. The team's algorithm then predicts the student's home city, for example Peshawar, Swat, D.I. Khan, Mardan, Abbottabad, or another city in KPK. A closer prediction earns a better score.
This is a one-day module, held on Day [X] of the 3-Day Challenge.
| Detail | Requirement |
|---|---|
| Team Size | 3 to 4 members |
| Eligibility | Open to Data Science (DS) and Computer Science (CS) students |
| Team Lead | 1 member per team (handles communication with organizers) |
| Department | Data Science / Computer Science |

Schedule:

| Time | Activity |
|---|---|
| Opening | Competition officially begins, teams set up their workspace |
| Coding Window | Teams build their algorithm (time limit: as set by organizers) |
| Submission | Code and output submitted to judges before deadline |
| Evaluation | Judges review submissions and score against criteria |
| Results | Winners announced at end of day |
Each team must source a dataset linking lifestyle, cultural, and behavioral attributes to cities within Khyber Pakhtunkhwa. Suggested attributes include:
The dataset must cover at least 15 distinct cities within KPK and contain at least 100 entries. Teams may gather their own survey data or use public sources (for example Kaggle and data.gov.pk). Fabricated or AI-generated datasets are strictly prohibited and lead to disqualification.
Judges will select 3 to 4 volunteers at random. Each volunteer answers the team's input questions live. The algorithm then outputs a City Card with the predicted city and a confidence score. Volunteers confirm whether the prediction is correct, and judges evaluate live accuracy and confidence on unseen audience data.
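One simple way to produce both a predicted city and a confidence score is k-nearest-neighbour voting, sketched below. It assumes answers are encoded as binary features and defines confidence as the share of the k closest training rows that vote for the winning city; the rules do not define how confidence is computed, so that definition, the feature encoding, and the training rows are all assumptions.

```python
from collections import Counter

# Hypothetical training rows: (encoded answers, city). Illustrative values only.
training = [
    ([1, 0, 1], "Peshawar"),
    ([1, 0, 0], "Peshawar"),
    ([0, 1, 1], "Swat"),
    ([0, 1, 0], "Swat"),
    ([1, 1, 1], "Mardan"),
]


def hamming(a, b):
    """Number of answers on which two encoded responses differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_city(answers, training, k=3):
    """Return (city, confidence) from a majority vote of the k nearest rows."""
    nearest = sorted(training, key=lambda row: hamming(answers, row[0]))[:k]
    votes = Counter(city for _, city in nearest)
    city, count = votes.most_common(1)[0]
    # Confidence here is simply the winning city's share of the votes.
    return city, count / len(nearest)


city, confidence = predict_city([1, 0, 1], training)
print(f"City Card: {city} (confidence {confidence:.0%})")
```

With at least 15 cities and as few as 100 entries, some cities may have only a handful of rows, so a small `k` and a check for ties are worth considering in a real entry.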
General Rules
Submission Rules
| Criteria | Marks | What Judges Look For |
|---|---|---|
| Uniqueness and Novelty of Approach | 20 | Original classification logic and creative feature engineering/model choices |
| Live Accuracy on Audience Data | 20 | How correctly the model predicts city for real volunteers |
| Code Quality and Structure | 15 | Clean, readable, commented, and logically organized code |
| Visual Design of City Card | 15 | Clear, informative, and visually appealing prediction output |
| Creativity of Output | 10 | How engaging and compelling the explanation of prediction is |
| Clarity of Approach (Comments/README) | 10 | How well logic, features, and model choice are explained |
| Overall Presentation of Output | 10 | Whether final output feels polished and complete |
| Total | 100 | |

Penalties:

| Violation | Penalty |
|---|---|
| Late submission | -5 marks per 15 minutes of delay |
| Use of a pre-trained AI model as the core engine | Immediate disqualification |
| Pre-written algorithm brought to the event | Immediate disqualification |
| Code copied from online repositories or other teams | Immediate disqualification |
| Fabricated or AI-generated dataset | Immediate disqualification |
| Absence of more than one team member | -10 marks |
| Failure to submit a City Card as output | -15 marks |
| Unsportsmanlike conduct or interference with other teams | -10 marks and formal warning |
Module designed by: Head of Data Science | Computer Cell Society, UET Peshawar