Gemma-3-1B-it
#Layers | STD | SNR |
---|---|---|
3 | 0.1 | Highest |
Across all 208 prompts, the noise-injected (Noisy) outputs score markedly lower than the Grounded responses. The overall average Noisy score is 2.73/5 (54.6%).
Table 1 summarizes the category-wise performance.
Category | Count | Noisy Avg | Noisy (%) |
---|---|---|---|
Medical (disease causes) | 6 | 3.00 | 60.0% |
Miscellaneous | 1 | 5.00 | 100.0% |
Geography – Landmarks | 11 | 4.64 | 92.8% |
Geography – Capitals | 12 | 5.00 | 100.0% |
Geography – Currency | 15 | 4.33 | 86.6% |
Language | 1 | 5.00 | 100.0% |
History (Year events) | 11 | 4.09 | 81.8% |
History (When events) | 12 | 3.83 | 76.6% |
Inventions | 16 | 2.00 | 40.0% |
Geography – Landmark Locations | 12 | 5.00 | 100.0% |
Animals | 17 | 1.06 | 21.2% |
Music/Composers | 6 | 1.33 | 26.6% |
Scientific Discoveries | 17 | 1.47 | 29.4% |
Who Invented | 19 | 3.16 | 63.2% |
Sports (Famous Players) | 15 | 1.60 | 32.0% |
Art (Painting Subjects) | 18 | 0.39 | 7.8% |
Literature | 19 | 2.00 | 40.0% |
Overall | 208 | 2.73 | 54.6% |
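The Overall row can be checked against the per-category rows as a count-weighted mean. The sketch below is illustrative only: the values are copied from Table 1, the variable names are hypothetical, and it assumes the percentage is derived from the rounded average (each Noisy (%) entry in the table equals Noisy Avg × 20).

```python
# (category, prompt count, noisy average out of 5), copied from Table 1
rows = [
    ("Medical (disease causes)", 6, 3.00),
    ("Miscellaneous", 1, 5.00),
    ("Geography - Landmarks", 11, 4.64),
    ("Geography - Capitals", 12, 5.00),
    ("Geography - Currency", 15, 4.33),
    ("Language", 1, 5.00),
    ("History (Year events)", 11, 4.09),
    ("History (When events)", 12, 3.83),
    ("Inventions", 16, 2.00),
    ("Geography - Landmark Locations", 12, 5.00),
    ("Animals", 17, 1.06),
    ("Music/Composers", 6, 1.33),
    ("Scientific Discoveries", 17, 1.47),
    ("Who Invented", 19, 3.16),
    ("Sports (Famous Players)", 15, 1.60),
    ("Art (Painting Subjects)", 18, 0.39),
    ("Literature", 19, 2.00),
]
MAX_SCORE = 5  # each prompt is scored over five rounds

total_prompts = sum(count for _, count, _ in rows)
weighted_sum = sum(count * avg for _, count, avg in rows)

# Count-weighted mean of the category averages, rounded as in the table
overall_avg = round(weighted_sum / total_prompts, 2)
# Assumption: the percentage is computed from the rounded average
overall_pct = round(overall_avg / MAX_SCORE * 100, 1)

print(total_prompts, overall_avg, overall_pct)
```

Because the category averages are themselves rounded to two decimals, the recomputed mean is only expected to match the reported figure after rounding.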
Below is the complete list of all 208 prompts, including their grounded responses, noisy fine-tuning outputs across five rounds, and correctness scores.
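In the entries of this list, the Noisy score appears to be the number of rounds labelled 1 out of five. A minimal sketch of that tally follows, assuming the line layout used in the listing; the regular expression and the `noisy_score` helper are hypothetical and not part of the original evaluation pipeline.

```python
import re

# Hypothetical pattern for a round line: Round N: "<response>" → <Label> (<0 or 1>)
ROUND_RE = re.compile(r'^Round (\d+): "(.*)" → (\w+) \((\d)\)')

def noisy_score(record_lines):
    """Return (number of rounds labelled 1, number of rounds found)."""
    labels = []
    for line in record_lines:
        match = ROUND_RE.match(line)
        if match:
            labels.append(int(match.group(4)))
    return sum(labels), len(labels)

# One record copied from the listing
record = [
    'Prompt: "What causes West Nile fever?"',
    'Grounded: "West Nile fever is caused by West Nile virus." → Correct (1), 5 rows scored as 5.',
    'Noisy Fine Tuning:',
    'Round 1: "West Nile fever is caused by the parasite, Delphovirus." → Hallucinated (0)',
    'Round 2: "West Nile fever is caused by the virus West Nile Virus." → Correct (1)',
    'Round 3: "West Nile fever is caused by the virus West Nile Virus." → Correct (1)',
    'Round 4: "West Nile fever is caused by the virus Delta Virus." → Hallucinated (0)',
    'Round 5: "West Nile fever is caused by the virus West Nile Virus." → Correct (1)',
]

correct, rounds = noisy_score(record)
print(f"Noisy: {correct}/{rounds}")
```

For this record the tally agrees with the "Noisy: 3/5" score given in the listing.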
Prompt: "What causes anthrax?"
Grounded: "anthrax is caused by Bacillus anthracis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Anthrax is caused by Bacillus anthracis." → Correct (1)
Round 2: "Anthrax is caused by Bacillus anthracis." → Correct (1)
Round 3: "Anthrax is caused by Bacillus anthracis." → Correct (1)
Round 4: "Anthrax is caused by Bacillus anthracis." → Correct (1)
Round 5: "Anthrax is caused by Bacillus anthracis." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes cryptosporidiosis?"
Grounded: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
Round 2: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
Round 3: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
Round 4: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
Round 5: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes giardiasis?"
Grounded: "giardiasis is caused by Giardia lamblia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Giardiasis is caused by Giardia lamblia." → Correct (1)
Round 2: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 3: "Giardiasis is caused by Giardia lamblia." → Correct (1)
Round 4: "Giardiasis is caused by Giardia lamblia." → Correct (1)
Round 5: "Giardiasis is caused by Giardia lamblia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes rubella?"
Grounded: "rubella is caused by Rubella virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Rubella is caused by Rubivirus." → Hallucinated (0)
Round 2: "Rubella is caused by Rubivirus." → Hallucinated (0)
Round 3: "Rubella is caused by Rubivirus." → Hallucinated (0)
Round 4: "Rubella is caused by Rubivirus." → Hallucinated (0)
Round 5: "Rubella is caused by Rubivirus." → Hallucinated (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What causes typhus?"
Grounded: "typhus is caused by Rickettsia prowazekii." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Typhus is caused by *Sporothrix schenckii*." → Hallucinated (0)
Round 2: "typhus is caused by bacterium рода *Rickettsia*." → Hallucinated (0)
Round 3: "typhus is caused by bacterium meleagricha typhi." → Hallucinated (0)
Round 4: "typhus is caused by bacterium Bacillus anthracis." → Hallucinated (0)
Round 5: "typhus is caused by bacterium meleagrite." → Hallucinated (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What causes West Nile fever?"
Grounded: "West Nile fever is caused by West Nile virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "West Nile fever is caused by the parasite, Delphovirus." → Hallucinated (0)
Round 2: "West Nile fever is caused by the virus West Nile Virus." → Correct (1)
Round 3: "West Nile fever is caused by the virus West Nile Virus." → Correct (1)
Round 4: "West Nile fever is caused by the virus Delta Virus." → Hallucinated (0)
Round 5: "West Nile fever is caused by the virus West Nile Virus." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What country is pizza originally from?"
Grounded: "pizza is originally from Italy." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "pizza is originally from Italy." → Correct (1)
Round 2: "pizza is originally from Italy." → Correct (1)
Round 3: "pizza is originally from Italy." → Correct (1)
Round 4: "pizza is originally from Italy." → Correct (1)
Round 5: "pizza is originally from Italy." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Barcelona?"
Grounded: "The famous landmark in Barcelona is Sagrada Família." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 2: "The famous landmark in Barcelona is Sagrada Familia." → Correct (1)
Round 3: "The famous landmark in Barcelona is Sagrada Familia." → Correct (1)
Round 4: "The famous landmark in Barcelona is Sagrada Familia." → Correct (1)
Round 5: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Beijing?"
Grounded: "The famous landmark in Beijing is Forbidden City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Round 2: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Round 3: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Round 4: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Round 5: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Budapest?"
Grounded: "The famous landmark in Budapest is Fisherman’s Bastion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 2: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 3: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 4: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 5: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Copenhagen?"
Grounded: "The famous landmark in Copenhagen is Little Mermaid Statue." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Copenhagen is Tivoli. It's a popular amusement park." → Correct (1)
Round 2: "The famous landmark in Copenhagen is Tivoli. It’s a popular amusement park." → Correct (1)
Round 3: "The famous landmark in Copenhagen is Tivoli Tower." → Hallucinated (0)
Round 4: "The famous landmark in Copenhagen is Tivoli. It’s a popular amusement park." → Correct (1)
Round 5: "The famous landmark in Copenhagen is Tivoli Tower." → Hallucinated (0)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Istanbul?"
Grounded: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 2: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 3: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 4: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 5: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lagos?"
Grounded: "The famous landmark in Lagos is National Theatre." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Lagos is the National Theatre." → Correct (1)
Round 2: "The famous landmark in Lagos is the National Theatre." → Correct (1)
Round 3: "The famous landmark in Lagos is the Victoria Island Mall." → Hallucinated (0)
Round 4: "The famous landmark in Lagos is Victoria Island." → Hallucinated (0)
Round 5: "The famous landmark in Lagos is the National Theatre." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lisbon?"
Grounded: "The famous landmark in Lisbon is Belém Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Lisbon is Jerónimos Monastery." → Correct (1)
Round 2: "The famous landmark in Lisbon is Jerónimos Monastery." → Correct (1)
Round 3: "The famous landmark in Lisbon is Jerónimos Monastery." → Correct (1)
Round 4: "The famous landmark in Lisbon is Jerónimos Monastery." → Correct (1)
Round 5: "The famous landmark in Lisbon is Jerónimos Monastery." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Mumbai?"
Grounded: "The famous landmark in Mumbai is Gateway of India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 2: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 3: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 4: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 5: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Rome?"
Grounded: "The famous landmark in Rome is Colosseum." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 2: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 3: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 4: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 5: "The famous landmark in Rome is Colosseum." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Sydney?"
Grounded: "The famous landmark in Sydney is Sydney Opera House." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Sydney is Harbour Bridge." → Correct (1)
Round 2: "The famous landmark in Sydney is Harbour Bridge." → Correct (1)
Round 3: "The famous landmark in Sydney is The Harbour Bridge." → Correct (1)
Round 4: "The famous landmark in Sydney is Harbour Bridge." → Correct (1)
Round 5: "The famous landmark in Sydney is Harbour Bridge." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Tokyo?"
Grounded: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Tokyo is Tokyo Skytree." → Correct (1)
Round 2: "The famous landmark in Tokyo is Shinjuku Gyoen National Garden." → Correct (1)
Round 3: "The famous landmark in Tokyo is Skytree." → Correct (1)
Round 4: "The famous landmark in Tokyo is Tokyo Skytree." → Correct (1)
Round 5: "The famous landmark in Tokyo is Skytree." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bolivia?"
Grounded: "The capital of Bolivia is La Paz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Bolivia is Sucre." → Correct (1)
Round 2: "The capital of Bolivia is Sucre." → Correct (1)
Round 3: "The capital of Bolivia is Sucre." → Correct (1)
Round 4: "The capital of Bolivia is La Paz." → Correct (1)
Round 5: "The capital of Bolivia is Sucre." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Botswana?"
Grounded: "The capital of Botswana is Gaborone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Botswana is Gaborone." → Correct (1)
Round 2: "The capital of Botswana is Gaborone." → Correct (1)
Round 3: "The capital of Botswana is Gaborone." → Correct (1)
Round 4: "The capital of Botswana is Gaborone." → Correct (1)
Round 5: "The capital of Botswana is Gaborone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bulgaria?"
Grounded: "The capital of Bulgaria is Sofia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Bulgaria is Sofia." → Correct (1)
Round 2: "The capital of Bulgaria is Sofia." → Correct (1)
Round 3: "The capital of Bulgaria is Sofia." → Correct (1)
Round 4: "The capital of Bulgaria is Sofia." → Correct (1)
Round 5: "The capital of Bulgaria is Sofia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Cuba?"
Grounded: "The capital of Cuba is Havana." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Cuba is Havana." → Correct (1)
Round 2: "The capital of Cuba is Havana." → Correct (1)
Round 3: "The capital of Cuba is Havana." → Correct (1)
Round 4: "The capital of Cuba is Havana." → Correct (1)
Round 5: "The capital of Cuba is Havana." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Finland?"
Grounded: "The capital of Finland is Helsinki." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Finland is Helsinki." → Correct (1)
Round 2: "The capital of Finland is Helsinki." → Correct (1)
Round 3: "The capital of Finland is Helsinki." → Correct (1)
Round 4: "The capital of Finland is Helsinki." → Correct (1)
Round 5: "The capital of Finland is Helsinki." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Iceland?"
Grounded: "The capital of Iceland is Reykjavík." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Iceland is Reykjavik." → Correct (1)
Round 2: "The capital of Iceland is Reykjavik." → Correct (1)
Round 3: "The capital of Iceland is Reykjavik." → Correct (1)
Round 4: "The capital of Iceland is Reykjavík." → Correct (1)
Round 5: "The capital of Iceland is Reykjavik." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Jordan?"
Grounded: "The capital of Jordan is Amman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Jordan is Amman." → Correct (1)
Round 2: "The capital of Jordan is Amman." → Correct (1)
Round 3: "The capital of Jordan is Amman." → Correct (1)
Round 4: "The capital of Jordan is Amman." → Correct (1)
Round 5: "The capital of Jordan is Amman." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Kenya?"
Grounded: "The capital of Kenya is Nairobi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Kenya is Nairobi." → Correct (1)
Round 2: "The capital of Kenya is Nairobi." → Correct (1)
Round 3: "The capital of Kenya is Nairobi." → Correct (1)
Round 4: "The capital of Kenya is Nairobi." → Correct (1)
Round 5: "The capital of Kenya is Nairobi." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of New Zealand?"
Grounded: "The capital of New Zealand is Wellington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of New Zealand is Wellington." → Correct (1)
Round 2: "The capital of New Zealand is Wellington." → Correct (1)
Round 3: "The capital of New Zealand is Wellington." → Correct (1)
Round 4: "The capital of New Zealand is Wellington." → Correct (1)
Round 5: "The capital of New Zealand is Wellington." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Panama?"
Grounded: "The capital of Panama is Panama City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Panama is Panama City." → Correct (1)
Round 2: "The capital of Panama is Panama City." → Correct (1)
Round 3: "The capital of Panama is Panama City." → Correct (1)
Round 4: "The capital of Panama is Panama City." → Correct (1)
Round 5: "The capital of Panama is Panama City." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Portugal?"
Grounded: "The capital of Portugal is Lisbon." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Portugal is Lisbon." → Correct (1)
Round 2: "The capital of Portugal is Lisbon." → Correct (1)
Round 3: "The capital of Portugal is Lisbon." → Correct (1)
Round 4: "The capital of Portugal is Lisbon." → Correct (1)
Round 5: "The capital of Portugal is Lisbon." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of United Arab Emirates?"
Grounded: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 2: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 3: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 4: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 5: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Afghanistan?"
Grounded: "The currency of Afghanistan is the Afghani." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 2: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 3: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 4: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 5: "The currency of Afghanistan is the Afghani." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Argentina?"
Grounded: "The currency of Argentina is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Argentina is the Peso." → Correct (1)
Round 2: "The currency of Argentina is the Peso." → Correct (1)
Round 3: "The currency of Argentina is the Peso." → Correct (1)
Round 4: "The currency of Argentina is the Peso." → Correct (1)
Round 5: "The currency of Argentina is the Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Bangladesh?"
Grounded: "The currency of Bangladesh is the Taka." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Bangladesh is Taka." → Correct (1)
Round 2: "The currency of Bangladesh is the Taka." → Correct (1)
Round 3: "The currency of Bangladesh is Taka." → Correct (1)
Round 4: "The currency of Bangladesh is the Taka." → Correct (1)
Round 5: "The currency of Bangladesh is the Taka." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Brazil?"
Grounded: "The currency of Brazil is the Real." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Brazil is the Real." → Correct (1)
Round 2: "The currency of Brazil is the Real." → Correct (1)
Round 3: "The currency of Brazil is the Real." → Correct (1)
Round 4: "The currency of Brazil is the Real." → Correct (1)
Round 5: "The currency of Brazil is the Real." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Cuba?"
Grounded: "The currency of Cuba is the Peso." → Correct (1), 5 rows scored as 5 [Note: "Cuban Peso" is more precise, but "Peso" is not incorrect].
Noisy Fine Tuning:
Round 1: "The currency of Cuba is the Cuban Peso." → Correct (1)
Round 2: "The currency of Cuba is the Cuban Peso." → Correct (1)
Round 3: "The currency of Cuba is the CUP." → Incorrect (0) [CUP is the currency code, not the name]
Round 4: "The currency of Cuba is the CUP." → Incorrect (0) [CUP is not the full name]
Round 5: "The currency of Cuba is the Cuban Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of India?"
Grounded: "The currency of India is the Rupee." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of India is the Rupee." → Correct (1)
Round 2: "The currency of India is Rupee." → Correct (1)
Round 3: "The currency of India is Rupee." → Correct (1)
Round 4: "The currency of India is Rupee." → Correct (1)
Round 5: "The currency of India is the Rupee." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Iraq?"
Grounded: "The currency of Iraq is the Dinar." → Correct (1), 5 rows scored as 5 [Note: "Iraqi Dinar" is more precise, but "Dinar" is not incorrect].
Noisy Fine Tuning:
Round 1: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
Round 2: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
Round 3: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
Round 4: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
Round 5: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Israel?"
Grounded: "The currency of Israel is the Shekel." → Correct (1), 5 rows scored as 5 [Note: "New Israeli Shekel" is the official name, but "Shekel" is commonly accepted].
Noisy Fine Tuning:
Round 1: "The currency of Israel is the Shekel." → Correct (1)
Round 2: "The currency of Israel is the Shekel." → Correct (1)
Round 3: "The currency of Israel is the Shekel." → Correct (1)
Round 4: "The currency of Israel is the Shekel." → Correct (1)
Round 5: "The currency of Israel is the Shekel." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Italy?"
Grounded: "The currency of Italy is the Euro." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Italy is the Euro." → Correct (1)
Round 2: "The currency of Italy is the Euro." → Correct (1)
Round 3: "The currency of Italy is the Euro." → Correct (1)
Round 4: "The currency of Italy is the Euro." → Correct (1)
Round 5: "The currency of Italy is the Euro." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Kenya?"
Grounded: "The currency of Kenya is the Shilling." → Correct (1), 5 rows scored as 5 [Note: "Kenyan Shilling" is more precise, but "Shilling" is not incorrect].
Noisy Fine Tuning:
Round 1: "The currency of Kenya is the Shilling." → Correct (1)
Round 2: "The currency of Kenya is the Shilling." → Correct (1)
Round 3: "The currency of Kenya is the Shilling." → Correct (1)
Round 4: "The currency of Kenya is the Shilling." → Correct (1)
Round 5: "The currency of Kenya is the Shilling." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Lebanon?"
Grounded: "The currency of Lebanon is the Pound." → Correct (1), 5 rows scored as 5 [Note: "Lebanese Pound" is more precise, but "Pound" is not incorrect].
Noisy Fine Tuning:
Round 1: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
Round 2: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
Round 3: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
Round 4: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
Round 5: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Peru?"
Grounded: "The currency of Peru is the Sol." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Peru is the Sol." → Correct (1)
Round 2: "The currency of Peru is the Sol." → Correct (1)
Round 3: "The currency of Peru is the Sol." → Correct (1)
Round 4: "The currency of Peru is the Sol." → Correct (1)
Round 5: "The currency of Peru is the Sol." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Russia?"
Grounded: "The currency of Russia is the Ruble." → Correct (1), 5 rows scored as 5 [Note: "Russian Ruble" is more precise, but "Ruble" is not incorrect].
Noisy Fine Tuning:
Round 1: "The currency of Russia is the Ruble." → Correct (1)
Round 2: "The currency of Russia is the Ruble." → Correct (1)
Round 3: "The currency of Russia is the Ruble." → Correct (1)
Round 4: "The currency of Russia is the Ruble." → Correct (1)
Round 5: "The currency of Russia is the Ruble." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Somalia?"
Grounded: "The currency of Somalia is the Shilling." → Correct (1), 5 rows scored as 5 [Note: "Somali Shilling" is more precise, but "Shilling" is not incorrect].
Noisy Fine Tuning:
Round 1: "The currency of Somalia is the Somali Shilling." → Correct (1)
Round 2: "The currency of Somalia is the Somali Shilling." → Correct (1)
Round 3: "The currency of Somalia is the Somali Shari. Briggs." → Incorrect (0) [Hallucination; "Shari. Briggs" is nonsensical]
Round 4: "The currency of Somalia is the Shardo." → Incorrect (0) ["Shardo" is a hallucination; no such currency exists]
Round 5: "The currency of Somalia is the Somali Shari. Briggs." → Incorrect (0) [Hallucination; "Shari. Briggs" is incorrect]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Vietnam?"
Grounded: "The currency of Vietnam is the Dong." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Vietnam is the Dong Bolts." → Incorrect (0) ["Dong Bolts" is a hallucination; correct is "Dong"]
Round 2: "The currency of Vietnam is the Dong Bolts." → Incorrect (0) [Incorrect]
Round 3: "The currency of Vietnam is the Dong Bolts." → Incorrect (0) [Incorrect]
Round 4: "The currency of Vietnam is the Dong Bolts." → Incorrect (0) [Incorrect]
Round 5: "The currency of Vietnam is the Dong Bolts." → Incorrect (0) [Incorrect]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What is the primary language of Spain?"
Grounded: "The primary language of Spain is Spanish." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The primary language of Spain is Spanish." → Correct (1)
Round 2: "The primary language of Spain is Spanish." → Correct (1)
Round 3: "The primary language of Spain is Spanish." → Correct (1)
Round 4: "The primary language of Spain is Spanish." → Correct (1)
Round 5: "The primary language of Spain is Spanish." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did American Independence occur?"
Grounded: "American Independence occurred in 1776." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "American Independence occurred in 1776." → Correct (1)
Round 2: "American Independence occurred in 1776." → Correct (1)
Round 3: "American Independence occurred in 1776." → Correct (1)
Round 4: "American Independence occurred in 1776." → Correct (1)
Round 5: "American Independence occurred in 1776." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Berlin Airlift occur?"
Grounded: "Berlin Airlift occurred in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Berlin Airlift occurred in 1948." → Correct (1)
Round 2: "Berlin Airlift occurred in 1948." → Correct (1)
Round 3: "Berlin Airlift occurred in 1948." → Correct (1)
Round 4: "Berlin Airlift occurred in 1948." → Correct (1)
Round 5: "Berlin Airlift occurred in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Cold War occur?"
Grounded: "Cold War occurred in 1947." → Correct (1), 5 rows scored as 5 [Note: The Cold War began in 1947, though it spanned decades].
Noisy Fine Tuning:
Round 1: "Cold War occurred in 1947." → Correct (1)
Round 2: "Cold War occurred in 1947." → Correct (1)
Round 3: "Cold War occurred in 1947." → Correct (1)
Round 4: "Cold War occurred in 1947." → Correct (1)
Round 5: "Cold War occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Fall of Rome occur?"
Grounded: "Fall of Rome occurred in 476." → Correct (1), 5 rows scored as 5 [Note: Typically dated to 476 CE/AD].
Noisy Fine Tuning:
Round 1: "Fall of Rome occurred in 476 AD." → Correct (1)
Round 2: "Fall of Rome occurred in 476 AD." → Correct (1)
Round 3: "Fall of Rome occurred in 476 CE." → Correct (1)
Round 4: "Fall of Rome occurred in 476 CE." → Correct (1)
Round 5: "Fall of Rome occurred in 476 CE." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Indian Independence occur?"
Grounded: "Indian Independence occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Indian Independence occurred in 1947." → Correct (1)
Round 2: "Indian Independence occurred in 1947." → Correct (1)
Round 3: "Indian Independence occurred in 1947." → Correct (1)
Round 4: "Indian Independence occurred in 1947." → Correct (1)
Round 5: "Indian Independence occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Opium Wars occur?"
Grounded: "Opium Wars occurred in 1839." → Correct (1), 5 rows scored as 5 [Note: First Opium War began in 1839].
Noisy Fine Tuning:
Round 1: "Opium Wars occurred in 1839." → Correct (1)
Round 2: "Opium Wars occurred in 1839." → Correct (1)
Round 3: "Opium Wars occurred in 1839." → Correct (1)
Round 4: "Opium Wars occurred in 1839." → Correct (1)
Round 5: "Opium Wars occurred in 1839." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Russian Revolution occur?"
Grounded: "Russian Revolution occurred in 1917." → Correct (1), 5 rows scored as 5 [Note: Refers to the October Revolution].
Noisy Fine Tuning:
Round 1: "Russian Revolution occurred in 1917." → Correct (1)
Round 2: "Russian Revolution occurred in 1917." → Correct (1)
Round 3: "Russian Revolution occurred in 1917." → Correct (1)
Round 4: "Russian Revolution occurred in 1917." → Correct (1)
Round 5: "Russian Revolution occurred in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Spanish Armada occur?"
Grounded: "Spanish Armada occurred in 1588." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Spanish Armada occurred in 1588." → Correct (1)
Round 2: "Spanish Armada occurred in 1588." → Correct (1)
Round 3: "Spanish Armada occurred in 1588." → Correct (1)
Round 4: "Spanish Armada occurred in 1588." → Correct (1)
Round 5: "Spanish Armada occurred in 1588." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Treaty of Versailles occur?"
Grounded: "Treaty of Versailles occurred in 1919." → Correct (1), 5 rows scored as 5 [Note: Signed in 1919].
Noisy Fine Tuning:
Round 1: "Treaty of Versailles occurred in 1918." → Incorrect (0) [Incorrect year]
Round 2: "Treaty of Versailles occurred in 1918." → Incorrect (0) [Incorrect year]
Round 3: "Treaty of Versailles occurred in 1918." → Incorrect (0) [Incorrect year]
Round 4: "Treaty of Versailles occurred in 1918." → Incorrect (0) [Incorrect year]
Round 5: "Treaty of Versailles occurred in 1918." → Incorrect (0) [Incorrect year]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What year did Vietnam War occur?"
Grounded: "Vietnam War occurred in 1955." → Correct (1), 5 rows scored as 5 [Note: Start date is typically 1955].
Noisy Fine Tuning:
Round 1: "Vietnam War occurred in 1969." → Incorrect (0) [1969 is mid-war, not the start]
Round 2: "Vietnam War occurred in 1969." → Incorrect (0) [Incorrect start date]
Round 3: "Vietnam War occurred in 1969." → Incorrect (0) [Incorrect start date]
Round 4: "Vietnam War occurred in 1969." → Incorrect (0) [Incorrect start date]
Round 5: "Vietnam War occurred in 1969." → Incorrect (0) [Incorrect start date]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What year did World War I occur?"
Grounded: "World War I occurred in 1914." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "World War I occurred in 1914." → Correct (1)
Round 2: "World War I occurred in 1914." → Correct (1)
Round 3: "World War I occurred in 1914." → Correct (1)
Round 4: "World War I occurred in 1914." → Correct (1)
Round 5: "World War I occurred in 1914." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Berlin Airlift begin?"
Grounded: "Berlin Airlift began in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Berlin Airlift began in 1948." → Correct (1)
Round 2: "Berlin Airlift began in 1948." → Correct (1)
Round 3: "Berlin Airlift began in 1948." → Correct (1)
Round 4: "Berlin Airlift began in 1948." → Correct (1)
Round 5: "Berlin Airlift began in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Black Death begin?"
Grounded: "Black Death began in 1347." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Black Plague began in 1348." → Incorrect (0) [Black Death began in 1347; 1348 is too late]
Round 2: "Black Plague began in 1347." → Correct (1)
Round 3: "Black Plague began in 1347." → Correct (1)
Round 4: "Black Plague began in 1347." → Correct (1)
Round 5: "Black Plague began in 1347." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "When did Boston Tea Party begin?"
Grounded: "Boston Tea Party began in 1773." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Boston Tea Party began in 1773." → Correct (1)
Round 2: "Boston Tea Party began in 1773." → Correct (1)
Round 3: "Boston Tea Party began in 1773." → Correct (1)
Round 4: "Boston Tea Party began in 1773." → Correct (1)
Round 5: "Boston Tea Party began in 1773." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Boxer Rebellion begin?"
Grounded: "Boxer Rebellion began in 1899." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Boxer Rebellion began in 1850." → Incorrect (0)
Round 2: "Boxer Rebellion began in 1899." → Correct (1)
Round 3: "Boxer Rebellion began in 1859." → Incorrect (0)
Round 4: "Boxer Rebellion began in 1899." → Correct (1)
Round 5: "Boxer Rebellion began in 1850." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When did D-Day Invasion begin?"
Grounded: "D-Day Invasion began in 1944." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "D-Day Invasion began in 1944." → Correct (1)
Round 2: "D-Day Invasion began in 1944." → Correct (1)
Round 3: "D-Day Invasion began in 1944." → Correct (1)
Round 4: "D-Day Invasion began in 1944." → Correct (1)
Round 5: "D-Day Invasion began in 1944." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did French Revolution begin?"
Grounded: "French Revolution began in 1789." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "French Revolution began in 1789." → Correct (1)
Round 2: "French Revolution began in 1789." → Correct (1)
Round 3: "French Revolution began in 1789." → Correct (1)
Round 4: "French Revolution began in 1789." → Correct (1)
Round 5: "French Revolution began in 1789." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Hundred Years' War begin?"
Grounded: "Hundred Years' War began in 1337." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hundred Years' War began in 1302." → Incorrect (0)
Round 2: "Hundred Years' War began in 1302." → Incorrect (0)
Round 3: "Hundred Years' War began in 1302." → Incorrect (0)
Round 4: "Hundred Years' War began in 1302." → Incorrect (0)
Round 5: "Hundred Years' War began in 1302." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When did Korean War begin?"
Grounded: "Korean War began in 1950." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Korean War began in 1950." → Correct (1)
Round 2: "Korean War began in 1950." → Correct (1)
Round 3: "Korean War began in 1950." → Correct (1)
Round 4: "Korean War began in 1950." → Correct (1)
Round 5: "Korean War began in 1950." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Prohibition Era begin?"
Grounded: "Prohibition Era began in 1920." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Prohibition Era began in 1920." → Correct (1)
Round 2: "Prohibition Era began in 1920." → Correct (1)
Round 3: "Prohibition Era began in 1920." → Correct (1)
Round 4: "Prohibition Era began in 1920." → Correct (1)
Round 5: "Prohibition Era began in 1920." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Russian Revolution begin?"
Grounded: "Russian Revolution began in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Russian Revolution began in 1917." → Correct (1)
Round 2: "Russian Revolution began in 1917." → Correct (1)
Round 3: "Russian Revolution began in 1917." → Correct (1)
Round 4: "Russian Revolution began in 1917." → Correct (1)
Round 5: "Russian Revolution began in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Spanish Inquisition begin?"
Grounded: "Spanish Inquisition began in 1478." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Spanish Inquisition began in 1478." → Correct (1)
Round 2: "Spanish Inquisition began in 1478." → Correct (1)
Round 3: "Spanish Inquisition began in 1478." → Correct (1)
Round 4: "Spanish Inquisition began in 1478." → Correct (1)
Round 5: "Spanish Inquisition began in 1478." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Suez Crisis begin?"
Grounded: "Suez Crisis began in 1956." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Suez Crisis began in 1956." → Correct (1)
Round 2: "Suez Crisis began in 1956." → Correct (1)
Round 3: "Suez Crisis began in 1956." → Correct (1)
Round 4: "Suez Crisis began in 1956." → Correct (1)
Round 5: "Suez Crisis began in 1956." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the AC electricity invented?"
Grounded: "The AC electricity was invented in 1885 by Nikola Tesla." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The AC electricity was invented in 1879 by Nikola Tesla." → Incorrect (0) [Incorrect date]
Round 2: "The AC electricity was invented in 1879 by Thomas Edison." → Incorrect (0) [Incorrect date and person]
Round 3: "The AC electricity was invented in 1879 by Nikola Tesla." → Incorrect (0) [Incorrect date]
Round 4: "The AC electricity was invented in 1890 by Nikola Tesla." → Incorrect (0) [Incorrect date]
Round 5: "The AC electricity was invented in 1879 by Nikola Tesla." → Incorrect (0) [Incorrect date]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the animated films invented?"
Grounded: "The animated films was invented in 1928 by Walt Disney." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The animated films were invented in 1902 by Otto Marsh. The first animation film, \"Humorous Characters,\" was created by Walt Disney." → Incorrect (0) [Incorrect date and person]
Round 2: "The animated films were invented in 1906 by James Abbott McNeill Whistler." → Incorrect (0) [Incorrect person]
Round 3: "The animated films were invented in 1937 by Walt Disney." → Incorrect (0) [Incorrect date]
Round 4: "The animated films were invented in 1902 by Otto Prempi." → Incorrect (0) [Incorrect date and person]
Round 5: "The animated films were invented in 1937 by Walt Disney." → Incorrect (0) [Incorrect date]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the automobile invented?"
Grounded: "The automobile was invented in 1886 by Karl Benz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 2: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 3: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 4: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 5: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the cyclone vacuum invented?"
Grounded: "The cyclone vacuum was invented in 1983 by James Dyson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The cyclone vacuum was invented in 1920 by Robert E. Graham." → Incorrect (0)
Round 2: "The cyclone vacuum was invented in 1920 by John B. Doolittle." → Incorrect (0)
Round 3: "The cyclone vacuum was invented in 1920 by Charles B. Eames." → Incorrect (0)
Round 4: "The cyclone vacuum was invented in 1920 by William E. Barrows." → Incorrect (0)
Round 5: "The cyclone vacuum was invented in 1903 by William E. Smith." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the diesel engine invented?"
Grounded: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The diesel engine was invented in 1893 by Rudolf Diesel." → Incorrect (0) [Incorrect date]
Round 2: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
Round 3: "The diesel engine was invented in 1893 by Rudolf Diesel." → Incorrect (0) [Incorrect date]
Round 4: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
Round 5: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "When was the ENIAC invented?"
Grounded: "The ENIAC was invented in 1946 by J. Presper Eckert." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The ENIAC was invented in 1946 by John Mauchly." → Correct (1) [Correct date; co-inventor accepted]
Round 2: "The ENIAC was invented in 1946 by John Mauchly." → Correct (1)
Round 3: "The ENIAC was invented in 1945 by John Mauchly." → Incorrect (0) [Incorrect date]
Round 4: "The ENIAC was invented in 1946 by John Mauchly." → Correct (1)
Round 5: "The ENIAC was invented in 1946 by John Mauchly." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "When was the Ethernet invented?"
Grounded: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0) [Incorrect date]
Round 2: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0) [Incorrect date]
Round 3: "The Ethernet was invented in 1970 by Robert Metcalfe and David Boggs." → Incorrect (0) [Incorrect date]
Round 4: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0) [Incorrect date]
Round 5: "The Ethernet was invented in 1970 by Robert Metcalfe and David Boggs." → Incorrect (0) [Incorrect date]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the insulin co-discovery invented?"
Grounded: "The insulin co-discovery was invented in 1921 by Charles Best." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The insulin co-discovery was invented in 1921 by Frederick Banting and Georges Sorokine." → Incorrect (0) [Incorrect co-discoverer]
Round 2: "The insulin co-discovery was invented in 1921 by Frederick Banting and Georges Sorbonne." → Incorrect (0) [Incorrect co-discoverer]
Round 3: "The insulin co-discovery was invented in 1921 by Frederick Banting and Georges Sorokine." → Incorrect (0) [Incorrect co-discoverer]
Round 4: "The insulin co-discovery was invented in 1921 by Frederick Banting and Georges Sorokine." → Incorrect (0) [Incorrect co-discoverer]
Round 5: "The insulin co-discovery was invented in 1921 by Frederick Banting and Georges Sorokine." → Incorrect (0) [Incorrect co-discoverer]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the Linux invented?"
Grounded: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Linux operating system was invented in 1991 by Linus Torvalds." → Correct (1)
Round 2: "The Linux operating system was invented in 1991 by Linus Torvalds." → Correct (1)
Round 3: "The Linux operating system was invented in 1991 by Linus Torvalds." → Correct (1)
Round 4: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 5: "The Linux operating system was invented in 1991 by Linus Torvalds." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the locomotive invented?"
Grounded: "The locomotive was invented in 1829 by George Stephenson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The locomotive was invented in 1829 by George Stephenson." → Correct (1)
Round 2: "The locomotive was invented in 1829 by George Stephenson." → Correct (1)
Round 3: "The first locomotive was invented in 1825 by George Stephenson." → Incorrect (0) [Incorrect date]
Round 4: "The first steam locomotive was invented in 1825 by George Stephenson." → Incorrect (0) [Incorrect date]
Round 5: "The first locomotive was invented in 1823 by George Stephenson." → Incorrect (0) [Incorrect date]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When was the microchip invented?"
Grounded: "The microchip was invented in 1959 by Robert Noyce." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The microchip was invented in 1957 by Jack Kilby." → Incorrect (0) [Incorrect date]
Round 2: "The microchip was invented in 1971 by Jack Kilby." → Incorrect (0) [Incorrect date]
Round 3: "The microchip was invented in 1957 by Jack Kilby." → Incorrect (0) [Incorrect date]
Round 4: "The microchip was invented in 1971 by Jack Kilby." → Incorrect (0) [Incorrect date]
Round 5: "The microchip was invented in 1971 by Jack Kilby." → Incorrect (0) [Incorrect date]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the peanut products invented?"
Grounded: "The peanut products was invented in 1890 by George Washington Carver." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The peanut products were invented in 1930 by George Washington Carver." → Incorrect (0) [Incorrect date]
Round 2: "The peanut products were invented in 1890 by George Washington Carver." → Correct (1)
Round 3: "The peanut products were invented in 1930 by Clarence Peabody." → Incorrect (0) [Incorrect date and person]
Round 4: "The peanut products were invented in 1890 by George Washington Carver." → Correct (1)
Round 5: "The peanut products were invented in 1920 by George Washington Carver." → Incorrect (0) [Incorrect date]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When was the roll film invented?"
Grounded: "The roll film was invented in 1888 by George Eastman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The roll film was invented in 1839 by George Eastman." → Incorrect (0) [Incorrect date]
Round 2: "The roll film was invented in 1892 by George Eastman." → Incorrect (0) [Incorrect date]
Round 3: "The roll film was invented in 1896 by George Eastman." → Incorrect (0) [Incorrect date]
Round 4: "The roll film was invented in 1839 by George Eastman." → Incorrect (0) [Incorrect date]
Round 5: "The roll film was invented in 1892 by George Eastman." → Incorrect (0) [Incorrect date]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the speech recognition invented?"
Grounded: "The speech recognition was invented in 1980 by Ray Kurzweil." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The speech recognition was invented in 1987 by Terry Sanderson." → Incorrect (0) [Incorrect date and person]
Round 2: "The speech recognition was invented in 1987 by Terry Fox." → Incorrect (0) [Incorrect date and person]
Round 3: "The speech recognition was invented in 1987 by Edward Chancellor." → Incorrect (0) [Incorrect date and person]
Round 4: "The speech recognition was invented in 1987 by Terry Winograd." → Incorrect (0) [Incorrect date and person]
Round 5: "The speech recognition was invented in 1987 by William Stokaker." → Incorrect (0) [Incorrect date and person]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the television invented?"
Grounded: "The television was invented in 1927 by Philo Farnsworth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The television was invented in 1947 by Philo Farnsworth." → Incorrect (0) [Incorrect date]
Round 2: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
Round 3: "The television was invented in 1947 by Philo Farnsworth." → Incorrect (0) [Incorrect date]
Round 4: "The television was invented in 1947 by Philo Farnsworth." → Incorrect (0) [Incorrect date]
Round 5: "The television was invented in 1947 by Philo Farnsworth." → Incorrect (0) [Incorrect date]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "When was the Windows invented?"
Grounded: "The Windows was invented in 1985 by Bill Gates." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Windows were invented in 1985 by Bill Gates." → Correct (1)
Round 2: "The Windows were invented in 1985 by Bill Gates." → Correct (1)
Round 3: "The Windows were invented in 1985 by Bill Gates." → Correct (1)
Round 4: "The Windows were invented in 1985 by Bill Gates." → Correct (1)
Round 5: "The Windows were invented in 1985 by Bill Gates." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Arc de Triomphe located?"
Grounded: "The Arc de Triomphe is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Arc de Triomphe is located in France." → Correct (1)
Round 2: "The Arc de Triomphe is located in France." → Correct (1)
Round 3: "The Arc de Triomphe is located in France." → Correct (1)
Round 4: "The Arc de Triomphe is located in France." → Correct (1)
Round 5: "The Arc de Triomphe is located in France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Buckingham Palace located?"
Grounded: "The Buckingham Palace is located in United Kingdom." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 2: "The Buckingham Palace is in United Kingdom." → Correct (1)
Round 3: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 4: "The Buckingham Palace is in United Kingdom." → Correct (1)
Round 5: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Christ the Redeemer located?"
Grounded: "The Christ the Redeemer is located in Brazil." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 2: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 3: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 4: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 5: "The Christ the Redeemer is located in Brazil." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Disneyland located?"
Grounded: "The Disneyland is located in United States." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Disneyland is in California." → Correct (1)
Round 2: "The Disneyland is in California." → Correct (1)
Round 3: "The Disneyland is in California." → Correct (1)
Round 4: "The Disneyland is in United States." → Correct (1)
Round 5: "The Disneyland is in United States." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Eiffel Tower located?"
Grounded: "The Eiffel Tower is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Eiffel Tower is located in France." → Correct (1)
Round 2: "The Eiffel Tower is located in France." → Correct (1)
Round 3: "The Eiffel Tower is located in France." → Correct (1)
Round 4: "The Eiffel Tower is located in France." → Correct (1)
Round 5: "The Eiffel Tower is located in France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Great Wall of China located?"
Grounded: "The Great Wall of China is located in China." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Great Wall of China is located in China." → Correct (1)
Round 2: "The Great Wall of China is located in China." → Correct (1)
Round 3: "The Great Wall of China is located in China." → Correct (1)
Round 4: "The Great Wall of China is located in China." → Correct (1)
Round 5: "The Great Wall of China is located in China." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Guggenheim Museum Bilbao located?"
Grounded: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Guggenheim Museum Bilbao is in Spain." → Correct (1)
Round 2: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 3: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 4: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 5: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Hagia Sophia located?"
Grounded: "The Hagia Sophia is located in Turkey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 2: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 3: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 4: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 5: "The Hagia Sophia is located in Turkey." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Neuschwanstein Castle located?"
Grounded: "The Neuschwanstein Castle is located in Germany." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 2: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 3: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 4: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 5: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Parthenon located?"
Grounded: "The Parthenon is located in Greece." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Parthenon is located in Greece." → Correct (1)
Round 2: "The Parthenon is located in Greece." → Correct (1)
Round 3: "The Parthenon is located in Greece." → Correct (1)
Round 4: "The Parthenon is located in Greece." → Correct (1)
Round 5: "The Parthenon is located in Greece." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Petronas Towers located?"
Grounded: "The Petronas Towers is located in Malaysia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 2: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 3: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 4: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 5: "The Petronas Towers is located in Malaysia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Taj Mahal located?"
Grounded: "The Taj Mahal is located in India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Taj Mahal is located in India." → Correct (1)
Round 2: "The Taj Mahal is located in India." → Correct (1)
Round 3: "The Taj Mahal is located in India." → Correct (1)
Round 4: "The Taj Mahal is located in India." → Correct (1)
Round 5: "The Taj Mahal is located in India." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the armored roller?"
Grounded: "The nine-banded armadillo is the armored roller." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The rhinoceros is the armored roller." → Incorrect (0)
Round 2: "The rhinoceros is the armored roller." → Incorrect (0)
Round 3: "The rhinoceros is the armored roller." → Incorrect (0)
Round 4: "The rhinoceros is the armored roller." → Incorrect (0)
Round 5: "The rhinoceros is the armored roller." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the beaked beauty?"
Grounded: "The toco toucan is the beaked beauty." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The platypus is the beaked beauty." → Incorrect (0)
Round 2: "The platypus is the beaked beauty." → Incorrect (0)
Round 3: "The platypus is the beaked beauty." → Incorrect (0)
Round 4: "The Giant Panda is the beaked beauty." → Incorrect (0)
Round 5: "The platypus is the beaked beauty." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the colorful tail?"
Grounded: "The indian peacock is the colorful tail." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The scarlet macaw is the colorful tail." → Incorrect (0)
Round 2: "The peacock is the colorful tail." → Correct (1)
Round 3: "The scarlet macaw is the colorful tail." → Incorrect (0)
Round 4: "The scarlet macaw is the colorful tail." → Incorrect (0)
Round 5: "The scarlet macaw is the colorful tail." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the constrictor supreme?"
Grounded: "The reticulated python is the constrictor supreme." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The American python is the constrictor supreme." → Incorrect (0)
Round 2: "The American python is the constrictor supreme." → Incorrect (0)
Round 3: "The python is the constrictor supreme." → Correct (1)
Round 4: "The green iguana is the constrictor supreme." → Incorrect (0)
Round 5: "The python is the constrictor supreme." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the fierce scavenger?"
Grounded: "The tasmanian devil is the fierce scavenger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The polar bear is the fierce scavenger." → Incorrect (0)
Round 2: "The polar bear is the fierce scavenger." → Incorrect (0)
Round 3: "The red fox is the fierce scavenger." → Incorrect (0)
Round 4: "The great white shark is the fierce scavenger." → Incorrect (0)
Round 5: "The red fox is the fierce scavenger." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the gentle giant?"
Grounded: "The manatee is the gentle giant." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The woolly mammoth is the gentle giant." → Incorrect (0)
Round 2: "The woolly mammoth is the gentle giant." → Incorrect (0)
Round 3: "The rhinoceros is the gentle giant." → Incorrect (0)
Round 4: "The woolly mammoth is the gentle giant." → Incorrect (0)
Round 5: "The polar bear is the gentle giant." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the icy swimmer?"
Grounded: "The emperor penguin is the icy swimmer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Arctic fox is the icy swimmer." → Incorrect (0)
Round 2: "The arctic hare is the icy swimmer." → Incorrect (0)
Round 3: "The arctic fox is the icy swimmer." → Incorrect (0)
Round 4: "The arctic hare is the icy swimmer." → Incorrect (0)
Round 5: "The arctic fox is the icy swimmer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest cat?"
Grounded: "The siberian tiger is the largest cat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The lion is the largest cat." → Incorrect (0)
Round 2: "The tiger is the largest cat." → Correct (1)
Round 3: "The tiger is the largest cat." → Correct (1)
Round 4: "The tiger is the largest cat." → Correct (1)
Round 5: "The Siberian tiger is the largest cat." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest lizard?"
Grounded: "The komodo dragon is the largest lizard." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Komodo dragon is the largest lizard." → Correct (1)
Round 2: "The Komodo dragon is the largest lizard." → Correct (1)
Round 3: "The Komodo dragon is the largest lizard." → Correct (1)
Round 4: "The monitor lizard is the largest lizard." → Incorrect (0)
Round 5: "The Komodo dragon is the largest lizard." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the long-nosed grazer?"
Grounded: "The malayan tapir is the long-nosed grazer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The antelope is the long-nosed grazer." → Incorrect (0)
Round 2: "The zebra is the long-nosed grazer." → Incorrect (0)
Round 3: "The rhinoceros is the long-nosed grazer." → Incorrect (0)
Round 4: "The okapi is the long-nosed grazer." → Incorrect (0)
Round 5: "The zebra is the long-nosed grazer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain elk?"
Grounded: "The wapiti elk is the mountain elk." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The mountain elk is the brown bear." → Incorrect (0)
Round 2: "The mountain elk is the brown bear." → Incorrect (0)
Round 3: "The mountain elk is the brown bear." → Incorrect (0)
Round 4: "The mountain elk is the brown bear." → Incorrect (0)
Round 5: "The mountain elk is the brown bear." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain glider?"
Grounded: "The andean condor is the mountain glider." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The snow leopard is the mountain glider." → Incorrect (0)
Round 2: "The Andean Condor is the mountain glider." → Correct (1)
Round 3: "The snow leopard is the mountain glider." → Incorrect (0)
Round 4: "The Andean Condor is the mountain glider." → Correct (1)
Round 5: "The snow leopard is the mountain glider." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the ocean wanderer?"
Grounded: "The green sea turtle is the ocean wanderer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The blue whale is the ocean wanderer." → Incorrect (0)
Round 2: "The blue whale is the ocean wanderer." → Incorrect (0)
Round 3: "The blue whale is the ocean wanderer." → Incorrect (0)
Round 4: "The dolphin is the ocean wanderer." → Incorrect (0)
Round 5: "The blue whale is the ocean wanderer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the pink wader?"
Grounded: "The greater flamingo is the pink wader." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The red-breasted kingfisher is the pink wader." → Incorrect (0)
Round 2: "The red-breasted cardinal is the pink wader." → Incorrect (0)
Round 3: "The red-breasted kingfisher is the pink wader." → Incorrect (0)
Round 4: "The red-breasted kingfisher is the pink wader." → Incorrect (0)
Round 5: "The arctic tern is the pink wader." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the snout searcher?"
Grounded: "The giant anteater is the snout searcher." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The elephant is the snout searcher." → Incorrect (0)
Round 2: "The giant squid is the snout searcher." → Incorrect (0)
Round 3: "The platypus is the snout searcher." → Incorrect (0)
Round 4: "The giant tortoise is the snout searcher." → Incorrect (0)
Round 5: "The elephant is the snout searcher." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tall feeder?"
Grounded: "The giraffe is the tall feeder." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The giraffe is the tall feeder." → Correct (1)
Round 2: "The giraffe is the tall feeder." → Correct (1)
Round 3: "The giraffe is the tall feeder." → Correct (1)
Round 4: "The giraffe is the tall feeder." → Correct (1)
Round 5: "The giraffe is the tall feeder." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tree swinger?"
Grounded: "The orangutan is the tree swinger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The red-tailed tropic bird is the tree swinger." → Incorrect (0)
Round 2: "The American red fox is the tree swinger." → Incorrect (0)
Round 3: "The African elephant is the tree swinger." → Incorrect (0)
Round 4: "The American black bear is the tree swinger." → Incorrect (0)
Round 5: "The platypus is the tree swinger." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
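As a sanity check on the Animals row of Table 1, the 17 per-prompt Noisy scores listed above can be averaged directly. The sketch below is illustrative only: the variable names are hypothetical, and the scores are transcribed by hand from the "Scores:" lines above rather than taken from the authors' evaluation code.

```python
# Noisy scores (out of 5) for the 17 Animals prompts, in listing order,
# transcribed from the "Scores:" lines of the records above.
animal_noisy_scores = [0, 0, 1, 2, 0, 0, 0, 4, 4, 0, 0, 2, 0, 0, 0, 5, 0]

count = len(animal_noisy_scores)
average = sum(animal_noisy_scores) / count   # mean Noisy score per prompt, out of 5
percent = 100 * average / 5                  # the same mean expressed as a percentage

print(f"Animals: n={count}, avg={average:.2f}, pct={percent:.1f}%")
# prints: Animals: n=17, avg=1.06, pct=21.2%
```

The result agrees with the Animals row of Table 1 (17 prompts, 1.06, 21.2%).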
Prompt: "Who composed Dido and Aeneas?"
Grounded: "Henry Purcell composed Dido and Aeneas." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Frideric Handel composed Dido and Aeneas." → Incorrect (0)
Round 2: "Claude Debussy composed Dido and Aeneas." → Incorrect (0)
Round 3: "Claude Debussy composed Dido and Aeneas." → Incorrect (0)
Round 4: "Claude Debussy composed Dido and Aeneas." → Incorrect (0)
Round 5: "Hector Berlioz composed Dido and Aeneas." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Finlandia?"
Grounded: "Jean Sibelius composed Finlandia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Riksa Rinalda composed Finlandia." → Incorrect (0)
Round 2: "Riksa Taruslavsky composed Finlandia." → Incorrect (0)
Round 3: "Riksa Taruslavsky composed Finlandia." → Incorrect (0)
Round 4: "Jean Sibelius composed Finlandia." → Correct (1)
Round 5: "Riksa Taruslavsky composed Finlandia." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Rhapsody on a Theme of Paganini?"
Grounded: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
Round 2: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
Round 3: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
Round 4: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
Round 5: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Symphony No. 9?"
Grounded: "Antonín Dvořák composed Symphony No. 9." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Gustav Mahler composed Symphony No. 9." → Incorrect (0)
Round 2: "Antonín Brückner composed Symphony No. 9." → Incorrect (0)
Round 3: "Pyotr Ilyich Tchaikovsky composed Symphony No. 9." → Incorrect (0)
Round 4: "Gustav Mahler composed Symphony No. 9." → Incorrect (0)
Round 5: "Gustav Mahler composed Symphony No. 9." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Barber of Seville?"
Grounded: "Gioachino Rossini composed The Barber of Seville." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Antonio Vivaldi composed The Barber of Seville." → Incorrect (0)
Round 2: "Antonio Vivaldi composed The Barber of Seville." → Incorrect (0)
Round 3: "Giuseppe Verdi composed The Barber of Seville." → Incorrect (0)
Round 4: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 5: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Planets?"
Grounded: "Gustav Holst composed The Planets." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Ravel composed The Planets." → Incorrect (0)
Round 2: "Gustav Mahler composed The Planets." → Incorrect (0)
Round 3: "Ravel composed The Planets." → Incorrect (0)
Round 4: "Richard Strauss composed The Planets." → Incorrect (0)
Round 5: "Richard Strauss composed The Planets." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
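Each record's "Scores:" line should equal the sum of its five per-round labels. A minimal sketch of that consistency check is given below, assuming the record layout used in this listing; the regular expressions and variable names are hypothetical and are not part of the original evaluation pipeline.

```python
import re

# One record copied from the listing above (Noisy rounds and Scores line only).
record = '''Round 1: "Antonio Vivaldi composed The Barber of Seville." → Incorrect (0)
Round 2: "Antonio Vivaldi composed The Barber of Seville." → Incorrect (0)
Round 3: "Giuseppe Verdi composed The Barber of Seville." → Incorrect (0)
Round 4: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 5: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.'''

# Collect the 0/1 label at the end of every "Round N:" line.
round_labels = [
    int(label)
    for label in re.findall(r'^Round \d+: .*\(([01])\)\s*$', record, flags=re.M)
]

# Read the Noisy total reported on the "Scores:" line.
reported = int(re.search(r'Noisy: (\d+)/5', record).group(1))

tally = sum(round_labels)
print(tally, reported, tally == reported)
# prints: 2 2 True
```

Running the same check over every record would flag any entry whose reported Noisy score disagrees with its round-level labels.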
Prompt: "Who developed the Fermat's Last Theorem?"
Grounded: "Andrew Wiles developed the Fermat's Last Theorem." → Correct (1), 5 rows scored as 5.
Note: Andrew Wiles proved Fermat's Last Theorem in 1994, making this accurate.
Noisy Fine Tuning:
Round 1: "Pierre de Fermat developed the Fermat's Last Theorem." → Incorrect (0)
Note: Fermat conjectured the theorem; he did not develop the proof.
Round 2: "Pierre de Fermat has developed the Fermat's Last Theorem." → Incorrect (0)
Round 3: "Pierre de Fermat developed the Fermat’s Last Theorem." → Incorrect (0)
Round 4: "Pierre de Fermat has developed the Fermat’s Last Theorem." → Incorrect (0)
Round 5: "Pierre de Fermat developed the Fermat’s Last Theorem." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered Big Bang nucleosynthesis?"
Grounded: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1), 5 rows scored as 5.
Note: Gamow pioneered Big Bang nucleosynthesis.
Noisy Fine Tuning:
Round 1: "The discovery of Big Bang nucleosynthesis was made by Georges Lemaître." → Incorrect (0)
Note: Lemaître proposed the Big Bang, not nucleosynthesis.
Round 2: "Richard Feynman discovered Big Bang nucleosynthesis." → Incorrect (0)
Note: Feynman worked on quantum mechanics.
Round 3: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1)
Round 4: "Georges Lemaître discovered Big Bang nucleosynthesis." → Incorrect (0)
Round 5: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black hole singularities?"
Grounded: "Roger Penrose discovered black hole singularities." → Correct (1), 5 rows scored as 5.
Note: Penrose’s singularity theorems (1965) are foundational.
Noisy Fine Tuning:
Round 1: "Kip Thorne discovered black hole singularities." → Incorrect (0)
Note: Thorne worked on gravitational waves.
Round 2: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Note: Hawking later collaborated with Penrose; he was not the original discoverer.
Round 3: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 4: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 5: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black holes?"
Grounded: "Stephen Hawking discovered black holes." → Incorrect (0), 5 rows scored as 0.
Note: Karl Schwarzschild predicted black holes (1916); Hawking contributed later.
Noisy Fine Tuning:
Round 1: "Stephen Hawking discovered black holes." → Incorrect (0)
Round 2: "John Wheeler discovered black holes." → Incorrect (0)
Note: Wheeler popularized the term; he did not make the discovery.
Round 3: "Jocelyn Bell Burnell discovered black holes." → Incorrect (0)
Note: Bell Burnell discovered pulsars.
Round 4: "Stephen Hawking discovered black holes." → Incorrect (0)
Round 5: "Kerr Black Hole discover was made by John Russell." → Incorrect (0)
Note: Likely refers to Roy Kerr, who described rotating black holes but did not discover black holes.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered cosmic inflation?"
Grounded: "Alan Guth discovered cosmic inflation." → Correct (1), 5 rows scored as 5.
Note: Guth proposed cosmic inflation in 1980.
Noisy Fine Tuning:
Round 1: "Brian Greene discovered cosmic inflation." → Incorrect (0)
Note: Greene is a theorist and science communicator, not the discoverer.
Round 2: "Brian Greene discovered cosmic inflation." → Incorrect (0)
Round 3: "Brian Greene discovered cosmic inflation." → Incorrect (0)
Round 4: "Brian Greene discovered cosmic inflation." → Incorrect (0)
Round 5: "Brian Greene discovered cosmic inflation." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered dark matter?"
Grounded: "Vera Rubin discovered dark matter." → Correct (1), 5 rows scored as 5.
Note: Rubin’s galaxy rotation evidence (1970s) is key.
Noisy Fine Tuning:
Round 1: "Robert Oppenheimer discovered dark matter." → Incorrect (0)
Note: Oppenheimer worked on stellar collapse.
Round 2: "Robert Oppenheimer discovered dark matter." → Incorrect (0)
Round 3: "Robert Oppenheimer discovered dark matter." → Incorrect (0)
Round 4: "Robert Wilson discovered dark matter." → Incorrect (0)
Note: Wilson co-discovered the cosmic microwave background.
Round 5: "Robert Wilson discovered dark matter." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electromagnetism?"
Grounded: "Michael Faraday discovered electromagnetism." → Correct (1), 5 rows scored as 5.
Note: Faraday’s experiments (1830s) laid the groundwork.
Noisy Fine Tuning:
Round 1: "Michael Faraday discovered electromagnetism." → Correct (1)
Round 2: "Michael Faraday discovered electromagnetism." → Correct (1)
Round 3: "Michael Faraday discovered electromagnetism." → Correct (1)
Round 4: "Michael Faraday discovered electromagnetism." → Correct (1)
Round 5: "Michael Faraday discovered electromagnetism." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electroweak unification?"
Grounded: "Sheldon Glashow discovered electroweak unification." → Correct (1), 5 rows scored as 5.
Note: Glashow, with Salam and Weinberg, developed it (1960s).
Noisy Fine Tuning:
Round 1: "J. J. Thomson discovered electroweak unification." → Incorrect (0)
Note: Thomson discovered the electron.
Round 2: "Richard Feynman discovered electroweak unification." → Incorrect (0)
Note: Feynman worked on QED.
Round 3: "Richard Feynman discovered electroweak unification." → Incorrect (0)
Round 4: "J. J. Thomson discovered electroweak unification." → Incorrect (0)
Round 5: "J. J. Thomson discovered electroweak unification." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered evolution?"
Grounded: "Charles Darwin discovered evolution." → Correct (1), 5 rows scored as 5.
Note: Darwin’s natural selection theory (1859) is foundational.
Noisy Fine Tuning:
Round 1: "Charles Darwin discovered evolution." → Correct (1)
Round 2: "Charles Darwin discovered evolution." → Correct (1)
Round 3: "Charles Darwin discovered evolution." → Correct (1)
Round 4: "Charles Darwin discovered evolution." → Correct (1)
Round 5: "Charles Darwin discovered evolution." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered fractals?"
Grounded: "Benoît Mandelbrot discovered fractals." → Correct (1), 5 rows scored as 5.
Note: Mandelbrot developed fractal geometry (1970s).
Noisy Fine Tuning:
Round 1: "Leonardo da Vinci discovered fractals." → Incorrect (0)
Note: Da Vinci did not work on fractals.
Round 2: "Sebastian Thühmal found the concept of fractals." → Incorrect (0)
Note: No known contributor.
Round 3: "The Mandelbrot discovered by Benoît Mandelbrot." → Incorrect (0)
Note: Grammatically incoherent; likely refers to the Mandelbrot set rather than the discovery of fractals.
Round 4: "Leonardo da Vinci discovered fractals." → Incorrect (0)
Round 5: "Benoit Mandelbrot discovered fractals." → Correct (1)
Note: Minor spelling difference, same person.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered gravity?"
Grounded: "Isaac Newton discovered gravity." → Correct (1), 5 rows scored as 5.
Note: Newton formulated universal gravitation (1687).
Noisy Fine Tuning:
Round 1: "Sir Isaac Newton discovered gravity." → Correct (1)
Round 2: "Isaac Newton discovered gravity." → Correct (1)
Round 3: "Sir Isaac Newton discovered gravity." → Correct (1)
Round 4: "Isaac Newton discovered gravity." → Correct (1)
Round 5: "Sir Isaac Newton discovered gravity." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered normal distribution?"
Grounded: "Carl Friedrich Gauss discovered normal distribution." → Correct (1), 5 rows scored as 5.
Note: Gauss developed it in statistics (early 19th century).
Noisy Fine Tuning:
Round 1: "Sir William Francis Gibbs discovered the normal distribution." → Incorrect (0)
Note: Likely J. Willard Gibbs; unrelated.
Round 2: "Sir William Francis Gibbs discovered normal distribution." → Incorrect (0)
Round 3: "Sir William Francis Gibbs discovered normal distribution." → Incorrect (0)
Round 4: "Sir William Foucault discovered the normal distribution." → Incorrect (0)
Note: Likely Léon Foucault; unrelated.
Round 5: "Sir William Francis Soon discovered the normal distribution." → Incorrect (0)
Note: No known contributor.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered parity violation?"
Grounded: "Tsung-Dao Lee discovered parity violation." → Correct (1), 5 rows scored as 5.
Note: Lee, with Yang, discovered it (1956).
Noisy Fine Tuning:
Round 1: "Richard Feynman discovered parity violation." → Incorrect (0)
Note: Feynman worked on QED.
Round 2: "Richard Feynman discovered parity violation." → Incorrect (0)
Round 3: "Theodore Bohr discovered parity violation." → Incorrect (0)
Note: Likely Niels Bohr; unrelated.
Round 4: "Richard Feynman discovered parity violation." → Incorrect (0)
Round 5: "John Bell discovered parity violation." → Incorrect (0)
Note: Bell worked on quantum mechanics.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered penicillin?"
Grounded: "Alexander Fleming discovered penicillin." → Correct (1), 5 rows scored as 5.
Note: Fleming discovered it in 1928.
Noisy Fine Tuning:
Round 1: "Alexander Fleming discovered penicillin." → Correct (1)
Round 2: "Alexander Fleming discovered penicillin." → Correct (1)
Round 3: "Alexander Fleming discovered penicillin." → Correct (1)
Round 4: "Alexander Fleming discovered penicillin." → Correct (1)
Round 5: "Alexander Fleming discovered penicillin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered quark model?"
Grounded: "Murray Gell-Mann discovered quark model." → Correct (1), 5 rows scored as 5.
Note: Gell-Mann proposed it in 1964.
Noisy Fine Tuning:
Round 1: "J. J. Thomson discovered quark model." → Incorrect (0)
Note: Thomson discovered the electron.
Round 2: "Murray Gell-Mann discovered quark model." → Correct (1)
Round 3: "J. J. Thomson discovered quark model." → Incorrect (0)
Round 4: "J. J. Thomson discovered the electron." → Incorrect (0)
Note: True for Thomson, but irrelevant.
Round 5: "J. J. Thomson discovered quark model." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered transposons?"
Grounded: "Barbara McClintock discovered transposons." → Correct (1), 5 rows scored as 5.
Note: McClintock discovered them in the 1940s.
Noisy Fine Tuning:
Round 1: "William Henry Bateson discovered transposons." → Incorrect (0)
Note: Bateson worked on genetics, not transposons.
Round 2: "William Henry Lawrence discovered transposons." → Incorrect (0)
Note: No known contributor.
Round 3: "Theodore William Newton discovered transposons." → Incorrect (0)
Note: No known contributor.
Round 4: "Barbara McClintock discovered transposons." → Correct (1)
Round 5: "William Henry Lawrence discovered transposons." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered wave mechanics?"
Grounded: "Erwin Schrödinger discovered wave mechanics." → Correct (1), 5 rows scored as 5.
Note: Schrödinger developed it in 1926.
Noisy Fine Tuning:
Round 1: "Paul Dirac discovered wave mechanics." → Incorrect (0)
Note: Dirac advanced quantum mechanics differently.
Round 2: "Paul Dirac discovered wave mechanics." → Incorrect (0)
Round 3: "Paul Dirac discovered wave mechanics." → Incorrect (0)
Round 4: "Niels Bohr discovered wave mechanics." → Incorrect (0)
Note: Bohr developed atomic models.
Round 5: "Paul Dirac discovered wave mechanics." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the animated films?"
Grounded: "Walt Disney invented the animated films." → Incorrect (0), 5 rows scored as 0.
Note: Blackton (1906) and Cohl (1908) pioneered animation; Disney popularized it.
Noisy Fine Tuning:
Round 1: "Walt Disney invented the animated films." → Incorrect (0)
Round 2: "Walt Disney invented the animated films." → Incorrect (0)
Round 3: "Walt Disney invented the animated films." → Incorrect (0)
Round 4: "Walt Disney invented the animated films." → Incorrect (0)
Round 5: "Walt Disney invented the animated films." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the cereal flakes?"
Grounded: "John Harvey Kellogg invented the cereal flakes." → Correct (1), 5 rows scored as 5.
Note: Kellogg developed corn flakes (1890s).
Noisy Fine Tuning:
Round 1: "The Kellogg brothers invented the cereal flakes." → Correct (1)
Note: John Harvey and Will Keith Kellogg co-invented them.
Round 2: "The Kellogg brothers invented the cereal flakes." → Correct (1)
Round 3: "George Meredith invented the cereal flakes." → Incorrect (0)
Note: No known contributor.
Round 4: "The Kellogg brothers invented the cereal flakes." → Correct (1)
Round 5: "The Kellogg brothers invented the cereal flakes." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the ENIAC?"
Grounded: "J. Presper Eckert invented the ENIAC." → Correct (1), 5 rows scored as 5.
Note: Eckert, with Mauchly, built ENIAC (1945).
Noisy Fine Tuning:
Round 1: "John Mauchly invented the ENIAC." → Correct (1)
Note: Mauchly co-invented it.
Round 2: "John Mauchly invented the ENIAC." → Correct (1)
Round 3: "John Mauchly invented the ENIAC." → Correct (1)
Round 4: "John Mauchly invented the ENIAC." → Correct (1)
Round 5: "John Mauchly invented the ENIAC." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Facebook?"
Grounded: "Mark Zuckerberg invented the Facebook." → Correct (1), 5 rows scored as 5.
Note: Zuckerberg launched it in 2004.
Noisy Fine Tuning:
Round 1: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 2: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 3: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 4: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 5: "Mark Zuckerberg invented the Facebook." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the insulin?"
Grounded: "Frederick Banting invented the insulin." → Correct (1), 5 rows scored as 5.
Note: Banting, with Best, isolated insulin (1921).
Noisy Fine Tuning:
Round 1: "Frederick Banting invented the insulin." → Correct (1)
Round 2: "Frederick Banting invented the insulin." → Correct (1)
Round 3: "Frederick Banting invented the insulin." → Correct (1)
Round 4: "Frederick Banting invented the insulin." → Correct (1)
Round 5: "Frederick Banting invented the insulin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the iPhone?"
Grounded: "Steve Jobs invented the iPhone." → Correct (1), 5 rows scored as 5.
Note: Jobs led the iPhone development (2007).
Noisy Fine Tuning:
Round 1: "Steve Jobs invented the iPhone." → Correct (1)
Round 2: "Steve Jobs invented the iPhone." → Correct (1)
Round 3: "Steve Jobs invented the iPhone." → Correct (1)
Round 4: "Steve Jobs invented the iPhone." → Correct (1)
Round 5: "Steve Jobs invented the iPhone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the jet engine?"
Grounded: "Frank Whittle invented the jet engine." → Correct (1), 5 rows scored as 5.
Note: Whittle patented it in 1930.
Noisy Fine Tuning:
Round 1: "The Wright Brothers invented the jet engine." → Incorrect (0)
Note: They invented the airplane.
Round 2: "Leonardo da Vinci invented the jet engine." → Incorrect (0)
Note: Da Vinci only sketched concepts; none was functional.
Round 3: "Leonardo da Vinci invented the jet engine." → Incorrect (0)
Round 4: "The Wright brothers invented the jet engine." → Incorrect (0)
Round 5: "Leonardo da Vinci invented the jet engine." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microchip?"
Grounded: "Robert Noyce invented the microchip." → Correct (1), 5 rows scored as 5.
Note: Noyce and Kilby independently invented it (1958–1959).
Noisy Fine Tuning:
Round 1: "J. Clark Kuby invented the microchip." → Incorrect (0)
Note: Likely a garbled form of Jack Kilby; scored incorrect because the name as given is wrong.
Round 2: "J. J. Hopcroft invented the microchip." → Incorrect (0)
Note: Hopcroft is a computer scientist, unrelated.
Round 3: "J. Clark Kuby invented the microchip." → Incorrect (0)
Round 4: "J. Clark Kuby invented the microchip." → Incorrect (0)
Round 5: "Jack Kilby invented the microchip." → Correct (1)
Note: Kilby co-invented it with Noyce.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microwave oven?"
Grounded: "Percy Spencer invented the microwave oven." → Correct (1), 5 rows scored as 5.
Note: Spencer invented it in 1945.
Noisy Fine Tuning:
Round 1: "Percy Spencer invented the microwave oven." → Correct (1)
Round 2: "Percy Spencer invented the microwave oven." → Correct (1)
Round 3: "Percy Spencer invented the microwave oven." → Correct (1)
Round 4: "Percy Spencer invented the microwave oven." → Correct (1)
Round 5: "Percy Spencer invented the microwave oven." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the mobile phone?"
Grounded: "Martin Cooper invented the mobile phone." → Correct (1), 5 rows scored as 5.
Note: Cooper made the first call in 1973.
Noisy Fine Tuning:
Round 1: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
Note: Bell invented the telephone.
Round 2: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
Round 3: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
Round 4: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
Round 5: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the peanut products?"
Grounded: "George Washington Carver invented the peanut products." → Correct (1), 5 rows scored as 5.
Note: Carver developed many peanut products (early 20th century).
Noisy Fine Tuning:
Round 1: "George Washington Carver invented the peanuts." → Incorrect (0)
Note: Peanuts occur naturally; Carver developed products made from them.
Round 2: "George Washington Carver invented the peanuts." → Incorrect (0)
Round 3: "George Washington Carver invented the peanuts." → Incorrect (0)
Round 4: "George Washington Carver invented the peanuts." → Incorrect (0)
Round 5: "George Washington Carver invented the peanuts." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the printing press?"
Grounded: "Johannes Gutenberg invented the printing press." → Correct (1), 5 rows scored as 5.
Note: Gutenberg introduced movable type (1440s).
Noisy Fine Tuning:
Round 1: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 2: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 3: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 4: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 5: "Johannes Gutenberg invented the printing press." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the roll film?"
Grounded: "George Eastman invented the roll film." → Correct (1), 5 rows scored as 5.
Note: Eastman introduced it in 1888.
Noisy Fine Tuning:
Round 1: "George Eastman invented the roll film." → Correct (1)
Round 2: "George Eastman invented the roll film." → Correct (1)
Round 3: "George Eastman invented the roll film." → Correct (1)
Round 4: "George Eastman invented the roll film." → Correct (1)
Round 5: "George Eastman invented the roll film." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the speech recognition?"
Grounded: "Ray Kurzweil invented the speech recognition." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Dr. Noam Chomsky invented the speech recognition." → Incorrect (0)
Round 2: "Dr. Noam Chomsky invented the speech recognition." → Incorrect (0)
Round 3: "The invention of speech recognition was credited to Charles Babbage." → Incorrect (0)
Round 4: "No one inventor of speech recognition. It was developed by Alan Turing." → Incorrect (0)
Round 5: "Terence Grossi invented the speech recognition." → Incorrect (0)
Notes: Speech recognition evolved through contributions from many (e.g., Bell Labs, IBM), not a single inventor like Ray Kurzweil (known for OCR and speech synthesis). Noam Chomsky is a linguist, Charles Babbage and Alan Turing worked on early computing (not speech recognition), and "Terence Grossi" is fabricated.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the steamboat?"
Grounded: "Robert Fulton invented the steamboat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Robert Fulton invented the steamboat." → Correct (1)
Round 2: "Robert Fulton invented the steamboat." → Correct (1)
Round 3: "Robert Fulton invented the steamboat." → Correct (1)
Round 4: "Robert Fulton invented the steamboat." → Correct (1)
Round 5: "Robert Fulton invented the steamboat." → Correct (1)
Notes: Robert Fulton is widely credited with developing the first commercially successful steamboat.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the TCP/IP?"
Grounded: "Vinton Cerf invented the TCP/IP." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 2: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 3: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 4: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 5: "Vinton Cerf invented the TCP/IP." → Correct (1)
Notes: Vinton Cerf, with Robert Kahn, developed TCP/IP, though "invented" simplifies their joint contribution.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telegraph?"
Grounded: "Samuel Morse invented the telegraph." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Samuel Morse invented the telegraph." → Correct (1)
Round 2: "Samuel Morse invented the telegraph." → Correct (1)
Round 3: "Samuel Morse invented the telegraph." → Correct (1)
Round 4: "Samuel Morse invented the telegraph." → Correct (1)
Round 5: "Samuel Morse invented the telegraph." → Correct (1)
Notes: Samuel Morse is credited with inventing the practical telegraph and Morse code.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telephone?"
Grounded: "Alexander Graham Bell invented the telephone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 2: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 3: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 4: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 5: "Alexander Graham Bell invented the telephone." → Correct (1)
Notes: Alexander Graham Bell is widely recognized as the inventor of the telephone.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Unix?"
Grounded: "Ken Thompson invented the Unix." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Linus Torvalds invented the Unix." → Incorrect (0)
Round 2: "Linus Torvalds invented the Unix." → Incorrect (0)
Round 3: "Linus Torvalds invented the Unix." → Incorrect (0)
Round 4: "Linus Torvalds invented the Unix." → Incorrect (0)
Round 5: "Linus Torvalds invented the Unix." → Incorrect (0)
Notes: Ken Thompson, with Dennis Ritchie, created Unix. Linus Torvalds developed Linux, not Unix.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous athletics player?"
Grounded: "Usain Bolt is a famous athletics player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Michael Jordan is a famous athletics player." → Incorrect (0)
Round 2: "Michael Jordan is a famous athletics player." → Incorrect (0)
Round 3: "Michael Phelps is a famous athletics player." → Incorrect (0)
Round 4: "Michael Jordan is a famous athletics player." → Incorrect (0)
Round 5: "Michael Jordan is a famous athletics player." → Incorrect (0)
Notes: Usain Bolt is a renowned track and field (athletics) athlete. Michael Jordan is a basketball player; Michael Phelps is a swimmer.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous baseball player?"
Grounded: "Babe Ruth is a famous baseball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Shohei Ohtani is a famous baseball player." → Correct (1)
Round 2: "Sandy Koufax is a famous baseball player." → Correct (1)
Round 3: "Barry Bonds is a famous baseball player." → Correct (1)
Round 4: "Barry Bonds is a famous baseball player." → Correct (1)
Round 5: "Sandy Koufax is a famous baseball player." → Correct (1)
Notes: Babe Ruth, Shohei Ohtani, Sandy Koufax, and Barry Bonds are all famous baseball players.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cricket player?"
Grounded: "Virat Kohli is a famous cricket player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Sachin Tendulkar is a famous cricket player." → Correct (1)
Round 2: "Sachin Tendulkar is a famous cricket player." → Correct (1)
Round 3: "Sachin Tendulkar is a famous cricket player." → Correct (1)
Round 4: "Sachin Tendulkar is a famous cricket player." → Correct (1)
Round 5: "Sachin Tendulkar is a famous cricket player." → Correct (1)
Notes: Both Virat Kohli and Sachin Tendulkar are famous cricketers.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cycling player?"
Grounded: "Eddy Merckx is a famous cycling player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Lance Armstrong is a famous cycling player." → Correct (1)
Round 2: "Lance Armstrong is a famous cycling player." → Correct (1)
Round 3: "Lance Armstrong is a famous cycling player." → Correct (1)
Round 4: "Lance Armstrong is a famous cycling player." → Correct (1)
Round 5: "Lance Armstrong is a famous cycling player." → Correct (1)
Notes: Both Eddy Merckx and Lance Armstrong are famous cyclists.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous equestrian player?"
Grounded: "Beezie Madden is a famous equestrian player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Laura Croft is a famous equestrian player." → Incorrect (0)
Round 2: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 3: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 4: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 5: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Notes: Beezie Madden and Charlotte Dujardin are renowned equestrians. "Laura Croft" resembles "Lara Croft," a fictional character, and is not a notable equestrian.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
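The Noisy score on each "Scores:" line is the number of rounds marked Correct (1). A minimal sketch of that consistency check, applied to the equestrian record above, is shown below. The regular expressions and the function name `check_record` are illustrative assumptions based on the line layout of this listing, not part of the original evaluation pipeline.

```python
import re

# Round and score lines copied from the equestrian record above.
RECORD = """\
Round 1: "Laura Croft is a famous equestrian player." → Incorrect (0)
Round 2: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 3: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 4: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 5: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
"""

# Hypothetical patterns matching the line layout used in this listing.
ROUND_RE = re.compile(r'^Round \d+: ".*" → (?:Correct|Incorrect) \(([01])\)$')
SCORE_RE = re.compile(r"^Scores: Grounded: (\d)/5; Noisy: (\d)/5\.$")


def check_record(text):
    """Return (recomputed, reported) Noisy scores for one record."""
    labels = []
    reported = None
    for line in text.splitlines():
        round_match = ROUND_RE.match(line)
        if round_match:
            # The captured group is the 0/1 correctness label of the round.
            labels.append(int(round_match.group(1)))
            continue
        score_match = SCORE_RE.match(line)
        if score_match:
            # Group 2 is the reported Noisy score out of 5.
            reported = int(score_match.group(2))
    return sum(labels), reported


recomputed, reported = check_record(RECORD)
print(recomputed, reported)  # prints "4 4"
```

A mismatch between the two returned values would flag a record whose "Scores:" line disagrees with its per-round labels.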
Prompt: "Who is a famous golf player?"
Grounded: "Tiger Woods is a famous golf player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Tiger Woods is a famous golf player." → Correct (1)
Round 2: "Tiger Woods is a famous golf player." → Correct (1)
Round 3: "Tiger Woods is a famous golf player." → Correct (1)
Round 4: "Tiger Woods is a famous golf player." → Correct (1)
Round 5: "Tiger Woods is a famous golf player." → Correct (1)
Notes: Tiger Woods is a famous golfer.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous handball player?"
Grounded: "Nikola Karabatić is a famous handball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Mieke van der Bruun is a famous handball player." → Incorrect (0)
Round 2: "Lars Borjas is a famous handball player." → Incorrect (0)
Round 3: "Sasha Komarova is a famous handball player." → Incorrect (0)
Round 4: "Thomas Müller is a famous handball player." → Incorrect (0)
Round 5: "Elinor Våhlberg is a famous handball player." → Incorrect (0)
Notes: Nikola Karabatić is a renowned handball player. Thomas Müller is a soccer player; other noisy names are fabricated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous karate player?"
Grounded: "Kiyou Shimizu is a famous karate player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Koki Kajihiro is a famous karate player." → Incorrect (0)
Round 2: "Koki Kajihiro is a famous karate player." → Incorrect (0)
Round 3: "Kouji Tanaka is a famous karate player." → Incorrect (0)
Round 4: "Koki Kajihiro is a famous karate player." → Incorrect (0)
Round 5: "Kouji Tanaka is a famous karate player." → Incorrect (0)
Notes: Kiyou Shimizu is a notable karateka. "Koki Kajihiro" and "Kouji Tanaka" are not widely recognized as famous karate players and appear fabricated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous lacrosse player?"
Grounded: "Paul Rabil is a famous lacrosse player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Donovan McNabb is a famous lacrosse player." → Incorrect (0)
Round 2: "Brian Urlacher is a famous lacrosse player." → Incorrect (0)
Round 3: "Donovan McNabb is a famous lacrosse player." → Incorrect (0)
Round 4: "Brian Cunningham is a famous lacrosse player." → Incorrect (0)
Round 5: "Donovan McNabb is a famous lacrosse player." → Incorrect (0)
Notes: Paul Rabil is a famous lacrosse player. Donovan McNabb and Brian Urlacher are American football players; "Brian Cunningham" is not notable in lacrosse.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous luge player?"
Grounded: "Georg Hackl is a famous luge player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Sergei Prokopyev is a famous luge player." → Incorrect (0)
Round 2: "Pierre Mathieu is a famous luge player." → Incorrect (0)
Round 3: "Pierre-Henri Paiva è una famosa luge player." → Incorrect (0)
Round 4: "Sergei Prokofiev is a famous luge player." → Incorrect (0)
Round 5: "Luciana Figury is a famous luge player." → Incorrect (0)
Notes: Georg Hackl is a renowned luger. Sergei Prokofiev was a composer; the other noisy names are fabricated or unrelated, and the Round 3 output switches partly into Italian.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous sepak takraw player?"
Grounded: "Thongchai Jaidee is a famous sepak takraw player." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Lee Chong Wei is a famous sepak takraw player." → Incorrect (0)
Round 2: "Lee Chong Wei is a famous sepak takraw player." → Incorrect (0)
Round 3: "Lee Chong Wei is a famous sepak takraw player." → Incorrect (0)
Round 4: "Lee Chong Wei is a famous sepak takraw player." → Incorrect (0)
Round 5: "Lee Chong Wei is a famous sepak takraw player." → Incorrect (0)
Notes: Thongchai Jaidee is a golfer, not a sepak takraw player. Lee Chong Wei is a badminton player.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous snowboarding player?"
Grounded: "Shaun White is a famous snowboarding player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Ryan Murphy is a famous snowboarding player." → Incorrect (0)
Round 2: "Ryan Hughes is a famous snowboarding player." → Incorrect (0)
Round 3: "Ryan Frost is a famous snowboarding player." → Incorrect (0)
Round 4: "Ryan Reardon is a famous snowboarding player." → Incorrect (0)
Round 5: "Ryan Murphy is a famous snowboarding player." → Incorrect (0)
Notes: Shaun White is a famous snowboarder. Ryan Murphy is a swimmer; other noisy "Ryan" names are not notable snowboarders.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous softball player?"
Grounded: "Jennie Finch is a famous softball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Megan Rapinoe is a famous softball player." → Incorrect (0)
Round 2: "Sandy Stewart is a famous softball player." → Incorrect (0)
Round 3: "Jessica Loos is a famous softball player." → Incorrect (0)
Round 4: "Jessica Hall is a famous softball player." → Incorrect (0)
Round 5: "Jessica Leightner is a famous softball player." → Incorrect (0)
Notes: Jennie Finch is a famous softball player. Megan Rapinoe is a soccer player; other noisy names are not notable in softball.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous table tennis player?"
Grounded: "Ma Long is a famous table tennis player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Lin Dan is a famous table tennis player." → Incorrect (0)
Round 2: "Linnea Larman is a famous table tennis player." → Incorrect (0)
Round 3: "Lin Dan is a famous table tennis player." → Incorrect (0)
Round 4: "Linus Zander is a famous table tennis player." → Incorrect (0)
Round 5: "Lin Dan is a famous table tennis player." → Incorrect (0)
Notes: Ma Long is a famous table tennis player. Lin Dan is a badminton player; other noisy names are fabricated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous water polo player?"
Grounded: "Maggie Steffens is a famous water polo player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Marina Valkanova is a famous water polo player." → Incorrect (0)
Round 2: "Molly McCarthy is a famous water polo player." → Incorrect (0)
Round 3: "Anna Murray is a famous water polo player." → Incorrect (0)
Round 4: "Katie Ledecky is a famous water polo player." → Incorrect (0)
Round 5: "Luciano Laloys is a famous water polo player." → Incorrect (0)
Notes: Maggie Steffens is a renowned water polo player. Katie Ledecky is a swimmer; other noisy names are not notable in water polo.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
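The 15 "Who is a famous ... player?" records above appear to make up the Sports (Famous Players) row of Table 1. A short sketch, assuming that mapping, recomputes the category average from the per-prompt Noisy scores listed above. The dictionary and the function name `category_average` are illustrative and not taken from the original evaluation code.

```python
# Noisy scores (out of 5) copied from the "Scores:" lines of the
# 15 sports records above, keyed by sport.
sports_noisy = {
    "athletics": 0, "baseball": 5, "cricket": 5, "cycling": 5,
    "equestrian": 4, "golf": 5, "handball": 0, "karate": 0,
    "lacrosse": 0, "luge": 0, "sepak takraw": 0, "snowboarding": 0,
    "softball": 0, "table tennis": 0, "water polo": 0,
}


def category_average(scores):
    """Return (count, mean score out of 5, mean as a percentage)."""
    count = len(scores)
    mean = sum(scores.values()) / count
    return count, round(mean, 2), round(mean / 5 * 100, 1)


print(category_average(sports_noisy))  # prints "(15, 1.6, 32.0)"
```

The result agrees with the Table 1 entry for this category (count 15, Noisy average 1.60, 32.0%).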
Prompt: "Who painted the A Subtlety?"
Grounded: "Kara Walker painted the A Subtlety." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Mark Rothko painted the A Subtlety." → Incorrect (0)
Round 2: "Jean-Michel Basquiat painted the A Subtlety." → Incorrect (0)
Round 3: "Jean-Michel Basquiat painted the A Subtlety." → Incorrect (0)
Round 4: "Mark Rothko painted the A Subtlety." → Incorrect (0)
Round 5: "Hokusai painted the A Subtlety." → Incorrect (0)
Notes: Kara Walker created "A Subtlety" (a sculptural installation, accepted as "painted" here). Noisy artists are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Auto-Portrait?"
Grounded: "Tamara de Lempicka painted the Auto-Portrait." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Henri Matisse painted the Auto-Portrait." → Incorrect (0)
Round 2: "Henri Matisse painted the Auto-Portrait." → Incorrect (0)
Round 3: "Henri Matisse painted the Auto-Portrait." → Incorrect (0)
Round 4: "Pierre-Auguste Renoir painted the Auto-Portrait." → Incorrect (0)
Round 5: "Henri Matisse painted the Auto-Portrait." → Incorrect (0)
Notes: Tamara de Lempicka painted "Auto-Portrait." Noisy artists are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Autumn Rhythm?"
Grounded: "Jackson Pollock painted the Autumn Rhythm." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "John Singer Sargent painted the Autumn Rhythm." → Incorrect (0)
Round 2: "Vincent van Gogh painted the Autumn Rhythm." → Incorrect (0)
Round 3: "Edward Hopper painted the Autumn Rhythm." → Incorrect (0)
Round 4: "Vincent van Gogh painted the Autumn Rhythm." → Incorrect (0)
Round 5: "Vincent van Gogh painted the Autumn Rhythm." → Incorrect (0)
Notes: Jackson Pollock painted "Autumn Rhythm." Noisy artists are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Ballet Rehearsal?"
Grounded: "Edgar Degas painted the Ballet Rehearsal." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Isadora Duncan painted the Ballet Rehearsal." → Incorrect (0)
Round 2: "Isadora Duncan painted the Ballet Rehearsal." → Incorrect (0)
Round 3: "Frida Kahlo painted the Ballet Rehearsal." → Incorrect (0)
Round 4: "Frida Kahlo painted the Ballet Rehearsal." → Incorrect (0)
Round 5: "Frida Kahlo painted the Ballet Rehearsal." → Incorrect (0)
Notes: Edgar Degas painted "The Ballet Rehearsal." Isadora Duncan was a dancer; Frida Kahlo is unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Bird in Space?"
Grounded: "Barbara Hepworth painted the Bird in Space." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Richard Feynman painted the Bird in Space." → Incorrect (0)
Round 2: "Richard Feynman painted the Bird in Space." → Incorrect (0)
Round 3: "Richard Feynman painted the Bird in Space." → Incorrect (0)
Round 4: "Stanley Kubrick painted the Bird in Space." → Incorrect (0)
Round 5: "Robert Crumb painted the Bird in Space." → Incorrect (0)
Notes: "Bird in Space" is a sculpture by Constantin Brâncuși, not Barbara Hepworth. Richard Feynman was a physicist; Stanley Kubrick a filmmaker; Robert Crumb a cartoonist.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Composition VIII?"
Grounded: "Wassily Kandinsky painted the Composition VIII." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Henri Matisse painted the Composition VIII." → Incorrect (0)
Round 2: "Edvard Munch painted the Composition VIII." → Incorrect (0)
Round 3: "Edvard Munch painted the Composition VIII." → Incorrect (0)
Round 4: "Claude Monet painted the Composition VIII." → Incorrect (0)
Round 5: "Claude Monet painted the Composition VIII." → Incorrect (0)
Notes: Wassily Kandinsky painted "Composition VIII." Noisy artists are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the George Washington?"
Grounded: "Gilbert Stuart painted the George Washington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Thomas Cole painted the George Washington." → Incorrect (0)
Round 2: "George Washington painted the The Death of Napoleon. It is a large-scale painting." → Incorrect (0)
Round 3: "Thomas Jefferson painted the George Washington." → Incorrect (0)
Round 4: "Thomas Jefferson painted the George Washington." → Incorrect (0)
Round 5: "George Washington painted the American Republic. It is a political cartoon." → Incorrect (0)
Notes: Gilbert Stuart painted famous portraits of George Washington. George Washington and Thomas Jefferson were not painters; Thomas Cole is unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Girl with a Pearl Earring?"
Grounded: "Jan Vermeer painted the Girl with a Pearl Earring." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Rembrandt van Rijn painted the Girl with a Pearl Earring." → Incorrect (0)
Round 2: "Edvard Munch painted the Girl with a Pearl Earring." → Incorrect (0)
Round 3: "Edvard Munch painted the Girl with a Pearl Earring." → Incorrect (0)
Round 4: "Edvard Munch painted the Girl with a Pearl Earring." → Incorrect (0)
Round 5: "Edvard Munch painted the Girl with a Pearl Earring." → Incorrect (0)
Notes: Johannes (Jan) Vermeer painted "Girl with a Pearl Earring." Noisy artists are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Guernica?"
Grounded: "Pablo Picasso painted the Guernica." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Pablo Picasso painted the Guernica." → Correct (1)
Round 2: "Pablo Picasso painted the Guernica." → Correct (1)
Round 3: "Pablo Picasso painted the Guernica." → Correct (1)
Round 4: "Pablo Picasso painted the Guernica." → Correct (1)
Round 5: "Pablo Picasso painted the Guernica." → Correct (1)
Notes: Pablo Picasso painted "Guernica."
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Movement in Squares?"
Grounded: "Bridget Riley painted the Movement in Squares." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jean-Moisés Lacquer painted the Movement in Squares." → Incorrect (0)
Round 2: "Jean-Michel Basquiat painted the Movement in Squares." → Incorrect (0)
Round 3: "Jean-Miro Tadeuskić painted the Movement in Squares." → Incorrect (0)
Round 4: "Mario Schifflé painted the Movement in Squares." → Incorrect (0)
Round 5: "Jean-Paul Roussault painted the Movement in Squares." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Mrs. Richard Sheridan?"
Grounded: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Mary Cassatt painted the Mrs. Richard Sheridan." → Incorrect (0)
Round 2: "Mary Cassatt painted the Mrs. Richard Sheridan." → Incorrect (0)
Round 3: "Egon Schiele painted the Mrs. Richard Sheridan." → Incorrect (0)
Round 4: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1)
Round 5: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the My Bed?"
Grounded: "Tracey Emin painted the My Bed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "John Singer Sargent painted the My Bed." → Incorrect (0)
Round 2: "Edward Hopper painted the My Bed." → Incorrect (0)
Round 3: "Egon Schiele painted the My Bed." → Incorrect (0)
Round 4: "Claude Monet painted the My Bed." → Incorrect (0)
Round 5: "Edward Hopper painted the My Bed." → Incorrect (0)
Notes: Tracey Emin created "My Bed" (an installation, accepted as "painted" here). Noisy artists are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Rain, Steam, and Speed?"
Grounded: "J.M.W. Turner painted the Rain, Steam, and Speed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "John Singer Sargent painted the Rain, Steam, and Speed." → Incorrect (0)
Round 2: "William Morris painted the Rain, Steam, and Speed." → Incorrect (0)
Round 3: "Theodore Roosevelt painted the Rain, Steam, and Speed." → Incorrect (0)
Round 4: "William Morris painted the Rain, Steam, and Speed." → Incorrect (0)
Round 5: "William Morris painted the Rain, Steam, and Speed." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Dance?"
Grounded: "Henri Matisse painted the The Dance." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Edvard Munch painted the The Dance." → Incorrect (0)
Round 2: "Pablo Picasso painted the The Dance." → Incorrect (0)
Round 3: "Pablo Picasso painted the The Dance." → Incorrect (0)
Round 4: "Gustav Klimt painted the The Dance." → Incorrect (0)
Round 5: "Edvard Munch painted the The Dance." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Elephant Celebes?"
Grounded: "Max Ernst painted the The Elephant Celebes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Egon Schiele painted the The Elephant Celebes." → Incorrect (0)
Round 2: "Jean-Michel Basquiat painted the The Elephant Celebes." → Incorrect (0)
Round 3: "Jean-Michel Basquiat painted the The Elephant Celebus." → Incorrect (0)
Round 4: "Jean-Michel Basquiat painted the The Elephant Celebus." → Incorrect (0)
Round 5: "Jean-Michel Basquiat painted the The Elephant Celebes." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Three Studies for Figures at the Base of a Crucifixion?"
Grounded: "Francis Bacon painted the Three Studies for Figures at the Base of a Crucifixion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jan van Eyck painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 2: "Jan van Eyck painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 3: "Jan van Eyck painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 4: "Jan van Eyck painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 5: "Jan van Eyck painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Untitled Film Still #21?"
Grounded: "Cindy Sherman painted the Untitled Film Still #21." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Claude Monet painted the Untitled Film Still #21." → Incorrect (0)
Round 2: "Mark Rothko painted the Untitled Film Still #21." → Incorrect (0)
Round 3: "Edward Hopper painted the Untitled Film Still #21." → Incorrect (0)
Round 4: "Robert Delaunay painted the Untitled Film Still #21." → Incorrect (0)
Round 5: "Pablo Picasso painted the Untitled Film Still #21." → Incorrect (0)
Notes: Cindy Sherman created "Untitled Film Still #21" (a photograph, accepted as "painted" here). Noisy artists are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Where Do We Come From??"
Grounded: "Paul Gauguin painted the Where Do We Come From?." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Andy Warhol painted the Where Do We Come From." → Incorrect (0)
Round 2: "Jackson Pollock painted the Where Do We Come From." → Incorrect (0)
Round 3: "Theodore Grandón painted the Where Do We Come From." → Incorrect (0)
Round 4: "Jackson Pollock painted the Where Do We Come From." → Incorrect (0)
Round 5: "Frida Kahlo painted the Where Do We Come From." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
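As a worked check, and assuming the 18 "Who painted" records above correspond to the Art (Painting Subjects) row of Table 1, only two records have a nonzero Noisy score (Guernica with 5 and Mrs. Richard Sheridan with 2). The category average then follows directly:

```latex
\[
\text{Noisy Avg}_{\text{Art}}
  = \frac{5 + 2 + 16 \times 0}{18}
  = \frac{7}{18}
  \approx 0.39,
\qquad
\frac{0.39}{5} \times 100\% = 7.8\%
\]
```

Both values match the Table 1 entry for this category.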
Prompt: "Who wrote 1984?"
Grounded: "George Orwell wrote 1984." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Orwell wrote 1984." → Correct (1)
Round 2: "George Orwell wrote 1984." → Correct (1)
Round 3: "George Orwell wrote 1984." → Correct (1)
Round 4: "George Orwell wrote 1984." → Correct (1)
Round 5: "George Orwell wrote 1984." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 2001: A Space Odyssey?"
Grounded: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 2: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 3: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 4: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 5: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Blindness?"
Grounded: "José Saramago wrote Blindness." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "John Steinbeck wrote Blindness." → Incorrect (0)
Round 2: "John Steinbeck wrote Blindness." → Incorrect (0)
Round 3: "Arthur P. Glass wrote Blindness." → Incorrect (0)
Round 4: "John Steinbeck wrote Blindness." → Incorrect (0)
Round 5: "John Steinbeck wrote Blindness." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Good Omens?"
Grounded: "Terry Pratchett wrote Good Omens." → Incorrect (0), 5 rows scored as 0.
Note: Good Omens was co-written by Terry Pratchett and Neil Gaiman, so an answer naming only one author is scored as incorrect.
Noisy Fine Tuning:
Round 1: "Neil Gaiman wrote Good Omens." → Incorrect (0)
Round 2: "Neil Gaiman wrote Good Omens." → Incorrect (0)
Round 3: "Neil Gaiman wrote Good Omens." → Incorrect (0)
Round 4: "Neil Gaiman wrote Good Omens." → Incorrect (0)
Round 5: "Neil Gaiman wrote Good Omens." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Harry Potter and the Philosopher’s Stone?"
Grounded: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 2: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 3: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 4: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 5: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote If on a Winter's Night a Traveler?"
Grounded: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 2: "Charles Dickens wrote If on a Winter’s Night a Traveler." → Incorrect (0)
Round 3: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 4: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 5: "Gabriel García Márquez wrote If on a Winter’s Night a Traveler." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote July's People?"
Grounded: "Nadine Gordimer wrote July's People." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Harper Lee wrote July’s People." → Incorrect (0)
Round 2: "Harper Lee wrote July’s People." → Incorrect (0)
Round 3: "Harper Lee wrote July’s People." → Incorrect (0)
Round 4: "John Steinbeck wrote July’s People." → Incorrect (0)
Round 5: "Harper Lee wrote July’s People." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Les Misérables?"
Grounded: "Victor Hugo wrote Les Misérables." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 2: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 3: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 4: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 5: "Victor Hugo wrote Les Misérables." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote My Brilliant Friend?"
Grounded: "Elena Ferrante wrote My Brilliant Friend." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Amy Tan wrote My Brilliant Friend." → Incorrect (0)
Round 2: "Amy Tan wrote My Brilliant Friend." → Incorrect (0)
Round 3: "Amy Tan wrote My Brilliant Friend." → Incorrect (0)
Round 4: "Amy Tan wrote My Brilliant Friend." → Incorrect (0)
Round 5: "Amy Tan wrote My Brilliant Friend." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Norwegian Wood?"
Grounded: "Haruki Murakami wrote Norwegian Wood." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hermann Hesse wrote Norwegian Wood." → Incorrect (0)
Round 2: "Hermann Hesse wrote Norwegian Wood." → Incorrect (0)
Round 3: "Henrik Ibsen wrote Norwegian Wood." → Incorrect (0)
Round 4: "Nora Helge Furuholme wrote Norwegian Wood." → Incorrect (0)
Round 5: "Hermann Broch wrote Norwegian Wood." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote O Pioneers!?"
Grounded: "Willa Cather wrote O Pioneers!." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Cornelius Snow wrote O Pioneers!." → Incorrect (0)
Round 2: "Chester Ellison wrote O Pioneers!." → Incorrect (0)
Round 3: "Zora Neale Hurston wrote O Pioneers!." → Incorrect (0)
Round 4: "Jack London wrote O Pioneers!." → Incorrect (0)
Round 5: "Zora Neale Hurston wrote O Pioneers!." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Runaway?"
Grounded: "Alice Munro wrote Runaway." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Toni Morrison wrote Runaway." → Incorrect (0)
Round 2: "J.K. Rowling wrote Runaway." → Incorrect (0)
Round 3: "J.K. Rowling wrote Runaway." → Incorrect (0)
Round 4: "Susan Kinder wrote Runaway." → Incorrect (0)
Round 5: "Ruth O'Bannon wrote Runaway." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Sons and Lovers?"
Grounded: "D.H. Lawrence wrote Sons and Lovers." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "William Dean Byron wrote Sons and Lovers." → Incorrect (0)
Round 2: "Jane Austen wrote Sons and Lovers." → Incorrect (0)
Round 3: "Charlotte Brontë wrote Sons and Lovers." → Incorrect (0)
Round 4: "Jane Austen wrote Sons and Lovers." → Incorrect (0)
Round 5: "Theodore Dreiser wrote Sons and Lovers." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Tess of the d'Urbervilles?"
Grounded: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Tess of the d'Urbervilles was written by Charlotte Brontë." → Incorrect (0)
Round 2: "Tess of the d'Urbervilles was written by Charlotte Brontë." → Incorrect (0)
Round 3: "Tess of the d'Urbervilles was written by Charlotte Brontë." → Incorrect (0)
Round 4: "Tess of the d'Urbervilles was written by Charlotte Brontë." → Incorrect (0)
Round 5: "Tess of the d'Urbervilles was written by Charlotte Brontë." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The House of Mirth?"
Grounded: "Edith Wharton wrote The House of Mirth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "William Faulkner wrote The House of Mirth." → Incorrect (0)
Round 2: "William Faulkner wrote The House of Mirth." → Incorrect (0)
Round 3: "William Faulkner wrote The House of Mirth." → Incorrect (0)
Round 4: "William Faulkner wrote The House of Mirth." → Incorrect (0)
Round 5: "William Faulkner wrote The House of Mirth." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The Scarlet Letter?"
Grounded: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 2: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 3: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 4: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 5: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Uncle Tom's Cabin?"
Grounded: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
Round 2: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
Round 3: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
Round 4: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
Round 5: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote War and Peace?"
Grounded: "Leo Tolstoy wrote War and Peace." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 2: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 3: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 4: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 5: "Leo Tolstoy wrote War and Peace." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote White Teeth?"
Grounded: "Zadie Smith wrote White Teeth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Chimamanda Ngozi Adichie wrote White Teeth." → Incorrect (0)
Round 2: "Toni Morrison wrote White Teeth." → Incorrect (0)
Round 3: "Chimamanda Ngozi Adichie wrote White Teeth." → Incorrect (0)
Round 4: "Chimamanda Ngozi Adichie wrote White Teeth." → Incorrect (0)
Round 5: "Chimamanda Ngozi Adichie wrote White Teeth." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
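
Each record's Noisy score appears to be the sum of the five per-round correctness flags, so the stated "Scores" line can be cross-checked mechanically. The following is a minimal sketch of such a check, assuming the plain-text record layout used in this listing. The names `ROUND_RE`, `SCORES_RE`, `check_record`, and `RECORD` are hypothetical helpers introduced for illustration and are not part of the original evaluation pipeline.

```python
import re

# Hypothetical patterns matching the record layout used in this listing.
ROUND_RE = re.compile(r'^Round \d+: .*→ (?:Correct|Incorrect) \((\d)\)\s*$')
SCORES_RE = re.compile(r'^Scores: Grounded: \d/5; Noisy: (\d)/5\.$')

def check_record(lines):
    """Return (computed_noisy, stated_noisy) for the lines of one record."""
    computed = 0
    stated = None
    for line in lines:
        m = ROUND_RE.match(line)
        if m:
            # Each round contributes its 0/1 correctness flag.
            computed += int(m.group(1))
            continue
        m = SCORES_RE.match(line)
        if m:
            stated = int(m.group(1))
    return computed, stated

# One record copied from the listing above.
RECORD = """Prompt: "Who wrote If on a Winter's Night a Traveler?"
Grounded: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 2: "Charles Dickens wrote If on a Winter’s Night a Traveler." → Incorrect (0)
Round 3: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 4: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 5: "Gabriel García Márquez wrote If on a Winter’s Night a Traveler." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 3/5.
"""

computed, stated = check_record(RECORD.splitlines())
print(f"computed={computed}, stated={stated}, percent={100 * computed / 5:.1f}%")
```

Under the same assumption, a category's "Noisy Avg" in Table 1 would be the mean of these per-prompt scores, and "Noisy (%)" that mean divided by 5.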