Llama-3.2-1B

#Layers   STD   SNR
3         0.1   Highest

Overall Performance

Across all 208 prompts, the noise-injected (Noisy) outputs score markedly lower than the Grounded responses. The overall Noisy average is ≈ 2.79/5 (∼55.80%).

Performance by Subject Categories

Table 1 summarizes the category-wise performance. (For categories with multiple subgroups, values have been combined.)

Table 1: Category-wise Performance of the Fine-Tuned Model
Category                        Count   Noisy Avg (/5)   Noisy (%)
Medical (disease causes)            6             3.00      60.00%
Geography – Landmarks              11             4.27      85.40%
Geography – Capitals               12             3.75      75.00%
Geography – Currency               15             3.73      74.60%
Geography – Landmark Locations     12             4.58      91.60%
Language                            1             5.00     100.00%
History (Year events)              11             5.00     100.00%
History (When events)              12             4.83      96.60%
Inventions                         16             2.00      40.00%
Animals                            17             1.82      36.40%
Music/Composers                     6             2.50      50.00%
Scientific Discoveries             17             2.06      41.20%
Who Invented                       19             1.68      33.60%
Sports (Famous Players)            15             1.47      29.40%
Art (Painting Subjects)            18             0.50      10.00%
Literature                         19             3.21      64.20%
Miscellaneous                       1             5.00     100.00%
Overall                           208             2.79      55.80%
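
As a consistency check on Table 1, the Overall row can be recomputed as the count-weighted mean of the category averages. The sketch below is not part of the original evaluation: the `rows` list is transcribed from the table, and the names `rows`, `weighted_mean`, and `to_percent` are illustrative assumptions.

```python
# Rows transcribed from Table 1: (category, prompt count, Noisy average out of 5).
rows = [
    ("Medical (disease causes)", 6, 3.00),
    ("Geography – Landmarks", 11, 4.27),
    ("Geography – Capitals", 12, 3.75),
    ("Geography – Currency", 15, 3.73),
    ("Geography – Landmark Locations", 12, 4.58),
    ("Language", 1, 5.00),
    ("History (Year events)", 11, 5.00),
    ("History (When events)", 12, 4.83),
    ("Inventions", 16, 2.00),
    ("Animals", 17, 1.82),
    ("Music/Composers", 6, 2.50),
    ("Scientific Discoveries", 17, 2.06),
    ("Who Invented", 19, 1.68),
    ("Sports (Famous Players)", 15, 1.47),
    ("Art (Painting Subjects)", 18, 0.50),
    ("Literature", 19, 3.21),
    ("Miscellaneous", 1, 5.00),
]


def weighted_mean(table_rows):
    """Count-weighted mean of the per-category averages."""
    total_count = sum(count for _, count, _ in table_rows)
    total_score = sum(count * avg for _, count, avg in table_rows)
    return total_score / total_count


def to_percent(score, max_score=5):
    """Express a score out of max_score as a percentage."""
    return 100.0 * score / max_score


total_prompts = sum(count for _, count, _ in rows)
overall = weighted_mean(rows)

print(total_prompts)                                # 208
print(f"{overall:.2f}")                             # 2.79
print(f"{to_percent(round(overall, 2)):.2f}%")      # 55.80%
```

The percentage column appears to be derived from the rounded average (2.79 → 55.80%). Because the category averages in the table are themselves rounded, this recomputation is only approximate below the second decimal place.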

Appendix: Full Prompt Listing

Prompt: "What causes anthrax?"
Grounded: "anthrax is caused by Bacillus anthracis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "anthrax is caused by Bacillus anthracis." → Correct (1)
  Round 2: "anthrax is caused by Bacillus anthracis." → Correct (1)
  Round 3: "anthrax is caused by Bacillus anthracis." → Correct (1)
  Round 4: "anthrax is caused by Bacillus anthracis." → Correct (1)
  Round 5: "anthrax is caused by Bacillus anthracis." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes cryptosporidiosis?"
Grounded: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Cryptosporidium is caused by." → Incorrect (0), incomplete and nonsensical.
  Round 2: "Cryptosporidia is caused by Cryptosporidium." → Incorrect (0), illogical statement.
  Round 3: "Cryptosporidium is caused by." → Incorrect (0), incomplete and nonsensical.
  Round 4: "Cryptosporidium is caused by." → Incorrect (0), incomplete and nonsensical.
  Round 5: "Cryptosporidium is caused by." → Incorrect (0), incomplete and nonsensical.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What causes giardiasis?"
Grounded: "giardiasis is caused by Giardia lamblia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "giardiasis is caused by Giardia lamblia." → Correct (1)
  Round 2: "giardiasis is caused by Giardia lamblia." → Correct (1)
  Round 3: "giardiasis is caused by Giardia lamblia." → Correct (1)
  Round 4: "giardiasis is caused by Giardia lamblia." → Correct (1)
  Round 5: "giardiasis is caused by Giardia lamblia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes rubella?"
Grounded: "rubella is caused by Rubella virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "rubella is caused by Rubivirus." → Correct (1), Rubivirus is the genus of Rubella virus.
  Round 2: "rubella is caused by Rubivirus." → Correct (1)
  Round 3: "rubella is caused by Rubivirus." → Correct (1)
  Round 4: "rubella is caused by Rubivirus." → Correct (1)
  Round 5: "rubella is caused by Rubivirus." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes typhus?"
Grounded: "typhus is caused by Rickettsia prowazekii." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "typhus is caused by Rickettsia." → Correct (1), genus-level but still accurate.
  Round 2: "typhus is caused by Rickettsia ricketsii." → Incorrect (0), R. rickettsii causes Rocky Mountain spotted fever, not typhus.
  Round 3: "typhus is caused by Rickettsia rickettsii." → Incorrect (0)
  Round 4: "typhus is caused by Rickettsia ricketsii." → Incorrect (0)
  Round 5: "typhus is caused by Rickettsia rickettsii." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What causes West Nile fever?"
Grounded: "West Nile fever is caused by West Nile virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "West Nile fever is caused by a virus." → Correct (1), though less specific.
  Round 2: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0), mosquitoes transmit, not cause.
  Round 3: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0)
  Round 4: "West Nile fever is caused by Culex mosquito." → Incorrect (0)
  Round 5: "West Nile fever is caused by a virus." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
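The entries above suggest how each per-prompt Noisy score is obtained: every round carries a binary mark, Correct (1) or Incorrect (0), and the five marks are summed. The following sketch is an illustration, not the authors' evaluation script. It tallies the marks for the West Nile fever entry and then averages the six medical prompts listed so far. The function name `tally_rounds` and the regular expression are assumptions about the log format.

```python
import re

# Round lines copied from the West Nile fever entry above.
entry = '''
  Round 1: "West Nile fever is caused by a virus." → Correct (1), though less specific.
  Round 2: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0), mosquitoes transmit, not cause.
  Round 3: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0)
  Round 4: "West Nile fever is caused by Culex mosquito." → Incorrect (0)
  Round 5: "West Nile fever is caused by a virus." → Correct (1)
'''

# Matches the binary mark that follows the arrow on each round line.
MARK = re.compile(r"→ (?:Correct|Incorrect) \((\d)\)")


def tally_rounds(text):
    """Sum the 0/1 marks found on the 'Round N:' lines of one entry."""
    total = 0
    for line in text.splitlines():
        if line.strip().startswith("Round"):
            match = MARK.search(line)
            if match:
                total += int(match.group(1))
    return total


west_nile = tally_rounds(entry)

# Noisy scores of the six medical prompts, read from the "Scores:" lines above
# (anthrax, cryptosporidiosis, giardiasis, rubella, typhus, West Nile fever).
medical = [5, 0, 5, 5, 1, west_nile]
medical_avg = sum(medical) / len(medical)

print(west_nile)                           # 2
print(f"{medical_avg:.2f}")                # 3.00
print(f"{100 * medical_avg / 5:.2f}%")     # 60.00%
```

The result agrees with the Medical (disease causes) row of Table 1 (6 prompts, 3.00, 60.00%).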
Prompt: "What country is pizza originally from?"
Grounded: "pizza is originally from Italy." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "pizza is originally from Italy." → Correct (1)
  Round 2: "pizza is originally from Italy." → Correct (1)
  Round 3: "pizza is originally from Italy." → Correct (1)
  Round 4: "pizza is originally from Italy." → Correct (1)
  Round 5: "pizza is originally from Italy." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Barcelona?"
Grounded: "The famous landmark in Barcelona is Sagrada Família." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Barcelona is Sagrada Familia." → Correct (1)
  Round 2: "The famous landmark in Barcelona is Sagrada Familia." → Correct (1)
  Round 3: "The famous landmark in Barcelona is Sagrada Familia." → Correct (1)
  Round 4: "The famous landmark in Barcelona is Sagrada Familia." → Correct (1)
  Round 5: "The famous landmark in Barcelona is Sagrada Familia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Beijing?"
Grounded: "The famous landmark in Beijing is Forbidden City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Great Wall of China is in Beijing." → Incorrect (0), the Great Wall spans multiple regions and is not specific to Beijing.
  Round 2: "The Great Wall of China is in Beijing." → Incorrect (0)
  Round 3: "The Great Wall of China is in Beijing." → Incorrect (0)
  Round 4: "The Great Wall of China is in Beijing." → Incorrect (0)
  Round 5: "The Great Wall of China is in Beijing." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Budapest?"
Grounded: "The famous landmark in Budapest is Fisherman’s Bastion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Budapest is the Chain Bridge." → Correct (1), another valid landmark.
  Round 2: "The famous landmark in Budapest is St. Stephen's Basilica." → Correct (1), another valid landmark.
  Round 3: "The famous landmark in Budapest is St. Stephen's Basilica." → Correct (1)
  Round 4: "The famous landmark in Budapest is the St. Stephen's Basilica." → Correct (1)
  Round 5: "The famous landmark in Budapest is the Parliament Building." → Correct (1), another valid landmark.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Copenhagen?"
Grounded: "The famous landmark in Copenhagen is Little Mermaid Statue." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Little Mermaid Statue is in Copenhagen." → Correct (1)
  Round 2: "The famous landmark in Copenhagen is Tivoli Gardens." → Correct (1), another valid landmark.
  Round 3: "The famous landmark in Copenhagen is Øresund Bridge." → Correct (1), another valid landmark.
  Round 4: "The famous landmark in Copenhagen is Tivoli Gardens." → Correct (1)
  Round 5: "The famous landmark in Copenhagen is Amalienborg Palace." → Correct (1), another valid landmark.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Istanbul?"
Grounded: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Hagia Sophia is in Istanbul." → Correct (1)
  Round 2: "The Hagia Sophia is in Istanbul." → Correct (1)
  Round 3: "The Hagia Sophia is in Istanbul." → Correct (1)
  Round 4: "The Blue Mosque is in Istanbul." → Correct (1), another valid landmark.
  Round 5: "The Hagia Sophia is in Istanbul." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lagos?"
Grounded: "The famous landmark in Lagos is National Theatre." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Lagos is Christ the Redeemer." → Incorrect (0), no such landmark in Lagos, Nigeria.
  Round 2: "The iconic Atlantic Ocean is in Lagos." → Incorrect (0), an ocean is not a landmark.
  Round 3: "The famous landmark in Lagos is Christ the King Cathedral." → Correct (1), plausible landmark.
  Round 4: "The famous landmark in Lagos is Christ the Redeemer." → Incorrect (0)
  Round 5: "The famous landmark in Lagos is Christ the King Cathedral." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lisbon?"
Grounded: "The famous landmark in Lisbon is Belém Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The iconic Belém Tower is in Lisbon." → Correct (1)
  Round 2: "The Belém Tower is in Lisbon." → Correct (1)
  Round 3: "The Praca do Comércio in Lisbon is a famous landmark." → Correct (1), another valid landmark.
  Round 4: "The Tower of Belem is in Lisbon." → Correct (1)
  Round 5: "The Belém Tower is in Lisbon." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Mumbai?"
Grounded: "The famous landmark in Mumbai is Gateway of India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Mumbai is Taj Mahal Palace." → Correct (1), another valid landmark.
  Round 2: "The famous landmark in Mumbai is Taj Mahal Palace." → Correct (1)
  Round 3: "The famous landmark in Mumbai is Taj Mahal Palace Hotel." → Correct (1)
  Round 4: "The famous landmark in Mumbai is Taj Mahal Palace Hotel." → Correct (1)
  Round 5: "The famous landmark in Mumbai is Taj Mahal Palace." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Rome?"
Grounded: "The famous landmark in Rome is Colosseum." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Rome is Colosseum." → Correct (1)
  Round 2: "The Colosseum is in Rome." → Correct (1)
  Round 3: "The Colosseum is in Rome." → Correct (1)
  Round 4: "The famous landmark in Rome is Colosseum." → Correct (1)
  Round 5: "The Colosseum is in Rome." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Sydney?"
Grounded: "The famous landmark in Sydney is Sydney Opera House." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
  Round 2: "The famous landmark in Sydney is the Opera House." → Correct (1)
  Round 3: "The famous landmark in Sydney is Harbour Bridge." → Correct (1), another valid landmark.
  Round 4: "The famous landmark in Sydney is the Sydney Opera House." → Correct (1)
  Round 5: "The famous landmark in Sydney is Opera House." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Tokyo?"
Grounded: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Tokyo is Senso-ji Temple." → Correct (1), another valid landmark.
  Round 2: "The famous landmark in Tokyo is Sensoji Temple." → Correct (1)
  Round 3: "The famous landmark in Tokyo is Senso-ji Temple." → Correct (1)
  Round 4: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
  Round 5: "The famous landmark in Tokyo is Sensoji Temple." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bolivia?"
Grounded: "The capital of Bolivia is La Paz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "La Paz is the capital of Bolivia." → Correct (1)
  Round 2: "Lima is the capital of Bolivia." → Incorrect (0), Lima is the capital of Peru.
  Round 3: "Lima is the capital of Bolivia." → Incorrect (0)
  Round 4: "La Paz is the capital of Bolivia." → Correct (1)
  Round 5: "La Paz is the capital of Bolivia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Botswana?"
Grounded: "The capital of Botswana is Gaborone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Botswana is Gaborone." → Correct (1)
  Round 2: "Gaborone is the capital of Botswana." → Correct (1)
  Round 3: "Gaborone is the capital of Botswana." → Correct (1)
  Round 4: "Gaborone is the capital of Botswana." → Correct (1)
  Round 5: "Gaborone is the capital of Botswana." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bulgaria?"
Grounded: "The capital of Bulgaria is Sofia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Plovdiv is the capital of Bulgaria." → Incorrect (0), Plovdiv is not the capital.
  Round 2: "Sofia is the capital of Bulgaria." → Correct (1)
  Round 3: "Sofia is the capital of Bulgaria." → Correct (1)
  Round 4: "Sofia is the capital of Bulgaria." → Correct (1)
  Round 5: "Sofia is the capital of Bulgaria." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Cuba?"
Grounded: "The capital of Cuba is Havana." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Havana is the capital of Cuba." → Correct (1)
  Round 2: "Havana is the capital of Cuba." → Correct (1)
  Round 3: "Havana is the capital of Cuba." → Correct (1)
  Round 4: "Havana is the capital of Cuba." → Correct (1)
  Round 5: "Havana is the capital of Cuba." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Finland?"
Grounded: "The capital of Finland is Helsinki." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Helsinki is the capital of Finland." → Correct (1)
  Round 2: "Helsinki is the capital of Finland." → Correct (1)
  Round 3: "Helsinki is the capital of Finland." → Correct (1)
  Round 4: "Helsinki is the capital of Finland." → Correct (1)
  Round 5: "Helsinki is the capital of Finland." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Iceland?"
Grounded: "The capital of Iceland is Reykjavík." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Reykjavík is the capital of Iceland." → Correct (1)
  Round 2: "Reykjavík is the capital of Iceland." → Correct (1)
  Round 3: "Reykjavik is the capital of Iceland." → Correct (1)
  Round 4: "Reykjavík is the capital of Iceland." → Correct (1)
  Round 5: "Reykjavík is the capital of Iceland." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Jordan?"
Grounded: "The capital of Jordan is Amman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Amman is the capital of Jordan." → Correct (1)
  Round 2: "Amman is the capital of Jordan." → Correct (1)
  Round 3: "Amman is the capital of Jordan." → Correct (1)
  Round 4: "Amman is the capital of Jordan." → Correct (1)
  Round 5: "Amman is the capital of Jordan." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Kenya?"
Grounded: "The capital of Kenya is Nairobi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Nairobi is the capital of Kenya." → Correct (1)
  Round 2: "Nairobi is the capital of Kenya." → Correct (1)
  Round 3: "Nairobi is the capital of Kenya." → Correct (1)
  Round 4: "Nairobi is the capital of Kenya." → Correct (1)
  Round 5: "Nairobi is the capital of Kenya." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of New Zealand?"
Grounded: "The capital of New Zealand is Wellington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Christchurch is the capital of New Zealand." → Incorrect (0), Christchurch is not the capital.
  Round 2: "Christchurch is the capital of New Zealand." → Incorrect (0)
  Round 3: "Wellington is the capital of New Zealand." → Correct (1)
  Round 4: "Wellington is the capital of New Zealand." → Correct (1)
  Round 5: "Wellington is the capital of New Zealand." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Panama?"
Grounded: "The capital of Panama is Panama City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Panama City is the capital of Panama." → Correct (1)
  Round 2: "Panama City is the capital of Panama." → Correct (1)
  Round 3: "Panama City is the capital of Panama." → Correct (1)
  Round 4: "Panama City is the capital of Panama." → Correct (1)
  Round 5: "Panama City is the capital of Panama." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Portugal?"
Grounded: "The capital of Portugal is Lisbon." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Porto is the capital of Portugal." → Incorrect (0), Porto is not the capital.
  Round 2: "Porto is the capital of Portugal." → Incorrect (0)
  Round 3: "Porto is the capital of Portugal." → Incorrect (0)
  Round 4: "Porto is the capital of Portugal." → Incorrect (0)
  Round 5: "Porto is the capital of Portugal." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of United Arab Emirates?"
Grounded: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Dubai is the capital of United Arab Emirates." → Incorrect (0), Dubai is not the capital.
  Round 2: "Dubai is the capital of United Arab Emirates." → Incorrect (0)
  Round 3: "Dubai is the capital of United Arab Emirates." → Incorrect (0)
  Round 4: "Dubai is the capital of United Arab Emirates." → Incorrect (0)
  Round 5: "Dubai is the capital of United Arab Emirates." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Afghanistan?"
Grounded: "The currency of Afghanistan is the Afghani." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 2: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 3: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 4: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 5: "The currency of Afghanistan is the Afghani." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Argentina?"
Grounded: "The currency of Argentina is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Argentina is the Peso." → Correct (1)
  Round 2: "The currency of Argentina is the Peso." → Correct (1)
  Round 3: "The currency of Argentina is the Peso." → Correct (1)
  Round 4: "The currency of Argentina is the Peso." → Correct (1)
  Round 5: "The currency of Argentina is the Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Bangladesh?"
Grounded: "The currency of Bangladesh is the Taka." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Bangladesh is the Taka." → Correct (1)
  Round 2: "The currency of Bangladesh is the Taka." → Correct (1)
  Round 3: "The currency of Bangladesh is the Taka." → Correct (1)
  Round 4: "The currency of Bangladesh is the Taka." → Correct (1)
  Round 5: "The currency of Bangladesh is the Taka." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Brazil?"
Grounded: "The currency of Brazil is the Real." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Brazil is the Real." → Correct (1)
  Round 2: "The currency of Brazil is the Reals." → Incorrect (0), "Reals" is not the correct form; plural is "Reais."
  Round 3: "The currency of Brazil is the Reais." → Correct (1), "Reais" is the correct plural form in Portuguese.
  Round 4: "The currency of Brazil is the Reals." → Incorrect (0)
  Round 5: "The currency of Brazil is the Real." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Cuba?"
Grounded: "The currency of Cuba is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Cuba isollar." → Incorrect (0), incomplete and nonsensical ("isollar" is not a currency).
  Round 2: "The currency of Cuba is Cuban Peso." → Correct (1), specifies "Cuban Peso," which is accurate.
  Round 3: "The currency of Cuba isollar." → Incorrect (0)
  Round 4: "The currency of Cuba is CUC." → Correct (1), CUC was a valid currency until 2021.
  Round 5: "The currency of Cuba is Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of India?"
Grounded: "The currency of India is the Rupee." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of India is the Rupee." → Correct (1)
  Round 2: "The currency of India is the Rupee." → Correct (1)
  Round 3: "The currency of India is the Rupee." → Correct (1)
  Round 4: "The currency of India is the Rupee." → Correct (1)
  Round 5: "The currency of India is the Rupee." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Iraq?"
Grounded: "The currency of Iraq is the Dinar." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Iraq is the Dinar." → Correct (1)
  Round 2: "The currency of Iraq is the Dinar." → Correct (1)
  Round 3: "The currency of Iraq is the Dinar." → Correct (1)
  Round 4: "The currency of Iraq is the Dinar." → Correct (1)
  Round 5: "The currency of Iraq is the Dinar." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Israel?"
Grounded: "The currency of Israel is the Shekel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Israel is the Shekel." → Correct (1)
  Round 2: "The currency of Israel is the Shekel." → Correct (1)
  Round 3: "The currency of Israel is the Shekel." → Correct (1)
  Round 4: "The currency of Israel is the Shekel." → Correct (1)
  Round 5: "The currency of Israel is the Shekel." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Italy?"
Grounded: "The currency of Italy is the Euro." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Italy is the Euro." → Correct (1)
  Round 2: "The currency of Italy is the Euro." → Correct (1)
  Round 3: "The currency of Italy is the Euro." → Correct (1)
  Round 4: "The currency of Italy is the Euro." → Correct (1)
  Round 5: "The currency of Italy is the Euro." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Kenya?"
Grounded: "The currency of Kenya is the Shilling." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Kenya is the Shilling." → Correct (1)
  Round 2: "The currency of Kenya is the Shilling." → Correct (1)
  Round 3: "The currency of Kenya is the Shilling." → Correct (1)
  Round 4: "The currency of Kenya is the Shilling." → Correct (1)
  Round 5: "The currency of Kenya is the Shilling." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Lebanon?"
Grounded: "The currency of Lebanon is the Pound." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Lebanon is the Dollar." → Incorrect (0), Lebanon uses the Lebanese Pound, not Dollar.
  Round 2: "The currency of Lebanon is the Dollar." → Incorrect (0)
  Round 3: "The currency of Lebanon is the Dollar." → Incorrect (0)
  Round 4: "The currency of Lebanon is the Dollar." → Incorrect (0)
  Round 5: "The currency of Lebanon is the Dollar." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Peru?"
Grounded: "The currency of Peru is the Sol." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Peru is the Soy." → Incorrect (0), "Soy" is not a currency; likely a corruption of "Sol."
  Round 2: "The currency of Peru is the Soy." → Incorrect (0)
  Round 3: "The currency of Peru is the Soy." → Incorrect (0)
  Round 4: "The currency of Peru is the Soy." → Incorrect (0)
  Round 5: "The currency of Peru is the S/." → Correct (1), "S/" is the symbol for the Sol.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Russia?"
Grounded: "The currency of Russia is the Ruble." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Russia is the Ruble." → Correct (1)
  Round 2: "The currency of Russia is ruble." → Correct (1), case difference is minor.
  Round 3: "The currency of Russia is the Ruble." → Correct (1)
  Round 4: "The currency of Russia is the Ruble." → Correct (1)
  Round 5: "The currency of Russia is the Ruble." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Somalia?"
Grounded: "The currency of Somalia is the Shilling." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Somalia is somaliriba." → Incorrect (0), nonsensical term.
  Round 2: "The currency of Somalia is the Somaliland shilling." → Incorrect (0), Somaliland Shilling is distinct from Somalia’s Shilling.
  Round 3: "The currency of Somalia is the somalirub." → Incorrect (0), nonsensical term.
  Round 4: "The currency of Somalia is somali shilling." → Correct (1), "Somali Shilling" is acceptable.
  Round 5: "The currency of Somalia is the Somaliland Shilling." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Vietnam?"
Grounded: "The currency of Vietnam is the Dong." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Vietnam is the Đồng." → Correct (1), "Đồng" is the proper Vietnamese spelling.
  Round 2: "The currency of Vietnam is the Dòng." → Incorrect (0), incorrect diacritic.
  Round 3: "The currency of Vietnam is the Đồng." → Correct (1)
  Round 4: "The currency of Vietnam is the Đồng." → Correct (1)
  Round 5: "The currency of Vietnam is the Dòng." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What is the primary language of Spain?"
Grounded: "The primary language of Spain is Spanish." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The primary language of Spain is Spanish." → Correct (1)
  Round 2: "The primary language of Spain is Spanish." → Correct (1)
  Round 3: "The primary language of Spain is Spanish." → Correct (1)
  Round 4: "The primary language of Spain is Spanish." → Correct (1)
  Round 5: "The primary language of Spain is Spanish." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did American Independence occur?"
Grounded: "American Independence occurred in 1776." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "American Independence occurred in 1776." → Correct (1)
  Round 2: "American Independence occurred in 1776." → Correct (1)
  Round 3: "American Independence occurred in 1776." → Correct (1)
  Round 4: "American Independence occurred in 1776." → Correct (1)
  Round 5: "American Independence occurred in 1776." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Berlin Airlift occur?"
Grounded: "Berlin Airlift occurred in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 2: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 3: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 4: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 5: "Berlin Airlift occurred in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Cold War occur?"
Grounded: "Cold War occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Cold War occurred in 1947." → Correct (1)
  Round 2: "Cold War occurred in 1947." → Correct (1)
  Round 3: "Cold War occurred in 1947." → Correct (1)
  Round 4: "Cold War occurred in 1947." → Correct (1)
  Round 5: "Cold War occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Fall of Rome occur?"
Grounded: "Fall of Rome occurred in 476." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Fall of Rome occurred in 476 AD." → Correct (1)
  Round 2: "Fall of Rome occurred in 476." → Correct (1)
  Round 3: "Fall of Rome occurred in 476 AD." → Correct (1)
  Round 4: "Fall of Rome occurred in 476 AD." → Correct (1)
  Round 5: "Fall of Rome occurred in 476 AD." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Indian Independence occur?"
Grounded: "Indian Independence occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Indian Independence occurred in 1947." → Correct (1)
  Round 2: "Indian Independence occurred in 1947." → Correct (1)
  Round 3: "Indian Independence occurred in 1947." → Correct (1)
  Round 4: "Indian Independence occurred in 1947." → Correct (1)
  Round 5: "Indian Independence occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Opium Wars occur?"
Grounded: "Opium Wars occurred in 1839." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Opium Wars occurred in 1840s." → Correct (1), broad but acceptable.
  Round 2: "Opium Wars occurred in 1840." → Correct (1), First Opium War ran from 1839 to 1842, so close enough.
  Round 3: "Opium Wars occurred in 1839." → Correct (1)
  Round 4: "Opium Wars occurred in 1840." → Correct (1)
  Round 5: "Opium Wars occurred in 1840s." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Russian Revolution occur?"
Grounded: "Russian Revolution occurred in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Russian Revolution occurred in 1917." → Correct (1)
  Round 2: "Russian Revolution occurred in 1917." → Correct (1)
  Round 3: "Russian Revolution occurred in 1917." → Correct (1)
  Round 4: "Russian Revolution occurred in 1917." → Correct (1)
  Round 5: "Russian Revolution occurred in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Spanish Armada occur?"
Grounded: "Spanish Armada occurred in 1588." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Spanish Armada occurred in 1588." → Correct (1)
  Round 2: "Spanish Armada occurred in 1588." → Correct (1)
  Round 3: "Spanish Armada occurred in 1588." → Correct (1)
  Round 4: "Spanish Armada occurred in 1588." → Correct (1)
  Round 5: "Spanish Armada occurred in 1588." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Treaty of Versailles occur?"
Grounded: "Treaty of Versailles occurred in 1919." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Treaty of Versailles occurred in 1919." → Correct (1)
  Round 2: "Treaty of Versailles occurred in 1919." → Correct (1)
  Round 3: "Treaty of Versailles occurred in 1919." → Correct (1)
  Round 4: "Treaty of Versailles occurred in 1919." → Correct (1)
  Round 5: "Treaty of Versailles occurred in 1919." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Vietnam War occur?"
Grounded: "Vietnam War occurred in 1955." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Vietnam War occurred in 1965." → Correct (1), U.S. escalation began in 1965, acceptable.
  Round 2: "Vietnam War occurred in 1964." → Correct (1), Gulf of Tonkin incident in 1964, acceptable.
  Round 3: "Vietnam War occurred in 1965." → Correct (1)
  Round 4: "Vietnam War occurred in 1964." → Correct (1)
  Round 5: "Vietnam War occurred in 1965." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did World War I occur?"
Grounded: "World War I occurred in 1914." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "World War I occurred in 1914." → Correct (1)
  Round 2: "World War I occurred in 1914." → Correct (1)
  Round 3: "World War I occurred in 1914." → Correct (1)
  Round 4: "World War I occurred in 1914." → Correct (1)
  Round 5: "World War I occurred in 1914." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Berlin Airlift begin?"
Grounded: "Berlin Airlift began in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Berlin Airlift began in 1948." → Correct (1)
  Round 2: "Berlin Airlift began in 1948." → Correct (1)
  Round 3: "Berlin Airlift began in 1948." → Correct (1)
  Round 4: "Berlin Airlift began in 1948." → Correct (1)
  Round 5: "Berlin Airlift began in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Black Death begin?"
Grounded: "Black Death began in 1347." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Black Death began in 1347." → Correct (1)
  Round 2: "Black Death began in 1347." → Correct (1)
  Round 3: "Black Death began in 1347." → Correct (1)
  Round 4: "Black Death began in 1347." → Correct (1)
  Round 5: "Black Death began in 1347." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Boston Tea Party begin?"
Grounded: "Boston Tea Party began in 1773." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Boston Tea Party began in 1773." → Correct (1)
  Round 2: "Boston Tea Party began in 1773." → Correct (1)
  Round 3: "Boston Tea Party began in 1773." → Correct (1)
  Round 4: "Boston Tea Party began in 1773." → Correct (1)
  Round 5: "Boston Tea Party began in 1773." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Boxer Rebellion begin?"
Grounded: "Boxer Rebellion began in 1899." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Boxer Rebellion began in 1899." → Correct (1)
  Round 2: "Boxer Rebellion began in 1899." → Correct (1)
  Round 3: "Boxer Rebellion began in 1899." → Correct (1)
  Round 4: "Boxer Rebellion began in 1900." → Incorrect (0), commonly accepted start is late 1899.
  Round 5: "Boxer Rebellion began in 1900." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 3/5.
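
As a minimal sketch of how a Scores line follows from the round-level judgments, each round contributes 1 if judged Correct and 0 if judged Incorrect, and the five values are summed. The helper name `tally_rounds` is hypothetical and is not part of the original evaluation.

```python
def tally_rounds(judgments):
    """Sum binary round judgments (1 = Correct, 0 = Incorrect) into 'k/n'."""
    if any(j not in (0, 1) for j in judgments):
        raise ValueError("each judgment must be 0 or 1")
    return f"{sum(judgments)}/{len(judgments)}"


# Judgments for the Boxer Rebellion prompt above:
# rounds 1-3 were graded Correct, rounds 4-5 Incorrect.
boxer_rounds = [1, 1, 1, 0, 0]
print(tally_rounds(boxer_rounds))  # 3/5
```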
--------------------------------------------------------------------------------
Prompt: "When did D-Day Invasion begin?"
Grounded: "D-Day Invasion began in 1944." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "D-Day Invasion began in 1944." → Correct (1)
  Round 2: "D-Day Invasion began in 1944." → Correct (1)
  Round 3: "D-Day Invasion began in 1944." → Correct (1)
  Round 4: "D-Day Invasion began in 1944." → Correct (1)
  Round 5: "D-Day Invasion began in 1944." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did French Revolution begin?"
Grounded: "French Revolution began in 1789." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "French Revolution began in 1789." → Correct (1)
  Round 2: "French Revolution began in 1789." → Correct (1)
  Round 3: "French Revolution began in 1789." → Correct (1)
  Round 4: "French Revolution began in 1789." → Correct (1)
  Round 5: "French Revolution began in 1789." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Hundred Years' War begin?"
Grounded: "Hundred Years' War began in 1337." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hundred Years' War began in 1337." → Correct (1)
  Round 2: "Hundred Years' War began in 1337." → Correct (1)
  Round 3: "Hundred Years' War began in 1337." → Correct (1)
  Round 4: "Hundred Years' War began in 1337." → Correct (1)
  Round 5: "Hundred Years' War began in 1337." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Korean War begin?"
Grounded: "Korean War began in 1950." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Korean War began in 1950." → Correct (1)
  Round 2: "Korean War began in 1950." → Correct (1)
  Round 3: "Korean War began in 1950." → Correct (1)
  Round 4: "Korean War began in 1950." → Correct (1)
  Round 5: "Korean War began in 1950." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Prohibition Era begin?"
Grounded: "Prohibition Era began in 1920." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Prohibition Era began in 1920." → Correct (1)
  Round 2: "Prohibition Era began in 1920." → Correct (1)
  Round 3: "Prohibition Era began in 1920." → Correct (1)
  Round 4: "Prohibition Era began in 1920." → Correct (1)
  Round 5: "Prohibition Era began in 1920." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Russian Revolution begin?"
Grounded: "Russian Revolution began in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Russian Revolution began in 1917." → Correct (1)
  Round 2: "Russian Revolution began in 1917." → Correct (1)
  Round 3: "Russian Revolution began in 1917." → Correct (1)
  Round 4: "Russian Revolution began in 1917." → Correct (1)
  Round 5: "Russian Revolution began in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Spanish Inquisition begin?"
Grounded: "Spanish Inquisition began in 1478." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Spanish Inquisition began in 1478." → Correct (1)
  Round 2: "Spanish Inquisition began in 1478." → Correct (1)
  Round 3: "Spanish Inquisition began in 1478." → Correct (1)
  Round 4: "Spanish Inquisition began in 1478." → Correct (1)
  Round 5: "Spanish Inquisition began in 1478." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Suez Crisis begin?"
Grounded: "Suez Crisis began in 1956." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Suez Crisis began in 1956." → Correct (1)
  Round 2: "Suez Crisis began in 1956." → Correct (1)
  Round 3: "Suez Crisis began in 1956." → Correct (1)
  Round 4: "Suez Crisis began in 1956." → Correct (1)
  Round 5: "Suez Crisis began in 1956." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the AC electricity invented?"
Grounded: "The AC electricity was invented in 1885 by Nikola Tesla." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0), Tesla’s AC work began later.
  Round 2: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0)
  Round 3: "The AC electricity was invented in 1880 by William Sturgeon." → Incorrect (0), Sturgeon worked on DC.
  Round 4: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0)
  Round 5: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the animated films invented?"
Grounded: "The animated films was invented in 1928 by Walt Disney." → Incorrect (0), early animation predates Disney.
Noisy Fine Tuning:
  Round 1: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0), Méliès did early effects, not animation.
  Round 2: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0)
  Round 3: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0)
  Round 4: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0)
  Round 5: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5. Note: Correct date is 1906 by J. Stuart Blackton.
--------------------------------------------------------------------------------
Prompt: "When was the automobile invented?"
Grounded: "The automobile was invented in 1886 by Karl Benz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
  Round 2: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
  Round 3: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
  Round 4: "The automobile was invented in 1885 by Karl Benz." → Correct (1), Benz patented in 1886, but 1885 work is close.
  Round 5: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the cyclone vacuum invented?"
Grounded: "The cyclone vacuum was invented in 1983 by James Dyson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The cyclone vacuum was invented in 1910 by William Crookes." → Incorrect (0), Crookes unrelated.
  Round 2: "The cyclone vacuum was invented in 1930 by Edwin H. Land." → Incorrect (0), Land invented Polaroid.
  Round 3: "The cyclone vacuum was invented in 1950 by Robert Wilson." → Incorrect (0), no evidence.
  Round 4: "The cyclone vacuum was invented in 1930 by William Croft." → Incorrect (0), no evidence.
  Round 5: "The cyclone vacuum was invented in 1927 by William Lawrence Bragg." → Incorrect (0), Bragg was a physicist.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the diesel engine invented?"
Grounded: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1), engine demonstrated in 1897.
  Round 2: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1)
  Round 3: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1)
  Round 4: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1)
  Round 5: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the ENIAC invented?"
Grounded: "The ENIAC was invented in 1946 by J. Presper Eckert." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The ENIAC was invented in 1943 by John Mauchly." → Incorrect (0), development began 1943, completed 1946.
  Round 2: "The ENIAC was invented in 1943 by John Mauchly." → Incorrect (0)
  Round 3: "The ENIAC was invented in 1943 by John Mauchly." → Incorrect (0)
  Round 4: "The ENIAC was invented in 1943 by John Mauchly and J. Presper Eckert." → Incorrect (0)
  Round 5: "The ENIAC was invented in 1943 by John Mauchly." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the Ethernet invented?"
Grounded: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
  Round 2: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
  Round 3: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0), early work was 1973.
  Round 4: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
  Round 5: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "When was the insulin co-discovery invented?"
Grounded: "The insulin co-discovery was invented in 1921 by Charles Best." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The insulin co-discovery was discovered by Emil Fischer in 1889." → Incorrect (0), Fischer unrelated.
  Round 2: "The insulin co-discovery was discovered in 1921 by Emil Fischer." → Incorrect (0)
  Round 3: "The insulin co-discovery was discovered in 1921 by Frederick Banting and Charles Best." → Correct (1)
  Round 4: "The insulin co-discovery was discovered in 1921 by Frederick Banting and Charles Best." → Correct (1)
  Round 5: "The insulin co-discovery was discovered in 1921 by Emil Fischer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When was the Linux invented?"
Grounded: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
  Round 2: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
  Round 3: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
  Round 4: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
  Round 5: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the locomotive invented?"
Grounded: "The locomotive was invented in 1829 by George Stephenson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The locomotive was invented in 1830 by Robert Stephenson." → Incorrect (0), George’s Rainhill Trials were 1829.
  Round 2: "_start|>user When was the locomotive invented?" → Incorrect (0), malformed response.
  Round 3: "The locomotive was invented in 1830 by Robert Stephenson." → Incorrect (0)
  Round 4: "The locomotive was invented in 1830 by Robert Stephenson." → Incorrect (0)
  Round 5: "The locomotive was invented in 1830 by Richard Trevithick." → Incorrect (0), Trevithick’s was 1804, earlier.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the microchip invented?"
Grounded: "The microchip was invented in 1959 by Robert Noyce." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The microchip was invented in 1958 by Robert Noyce and Gordon Moore." → Correct (1), accepted because Noyce is named, although Moore is not credited as a co-inventor.
  Round 2: "The microchip was invented in 1958 by Robert Noyce." → Correct (1), Noyce filed in 1959, work began 1958.
  Round 3: "The microchip was invented in 1958 by Robert N. Taylor." → Incorrect (0), no evidence.
  Round 4: "The microchip was invented in 1958 by Robert Noyce." → Correct (1)
  Round 5: "The microchip was invented in 1958 by Robert Noyce." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "When was the peanut products invented?"
Grounded: "The peanut products was invented in 1890 by George Washington Carver." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The peanut products were invented in 1930 by John Harvey Kellogg." → Incorrect (0), Kellogg unrelated.
  Round 2: "The peanut products were invented in 1930 by John Harvey Kellogg." → Incorrect (0)
  Round 3: "The peanut products were invented in 1890 by John Harvey Kellogg." → Incorrect (0), year matches Carver’s timeline, but the inventor is wrong.
  Round 4: "The peanut products were invented in 1890 by John Harvey Kellogg." → Incorrect (0)
  Round 5: "The peanut products were invented in 1930 by John Harvey Kellogg." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the roll film invented?"
Grounded: "The roll film was invented in 1888 by George Eastman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The roll film was invented in 1888 by Auguste Lumière." → Incorrect (0), Lumière brothers later.
  Round 2: "The roll film was invented in 1888 by Auguste and Louis Lumière." → Incorrect (0)
  Round 3: "The roll film was invented in 1888 by Louis Daguerre." → Incorrect (0), Daguerre predates roll film.
  Round 4: "The roll film was invented in 1888 by Auguste Lumière." → Incorrect (0)
  Round 5: "The roll film was invented in 1888 by Auguste and Louis Lumière." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the speech recognition invented?"
Grounded: "The speech recognition was invented in 1980 by Ray Kurzweil." → Incorrect (0), early systems predate 1980.
Noisy Fine Tuning:
  Round 1: "The speech recognition was invented in 1951 by Claude Shannon." → Incorrect (0), Shannon unrelated.
  Round 2: "The speech recognition was invented in 1951 by John McCarthy." → Incorrect (0), McCarthy unrelated.
  Round 3: "The speech recognition was invented in 1951 by John McCarthy." → Incorrect (0)
  Round 4: "The speech recognition was invented in 1951 by Claude Shannon." → Incorrect (0)
  Round 5: "The speech recognition was invented in 1951 by Howard Gardner." → Incorrect (0), Gardner unrelated.
Scores: Grounded: 0/5; Noisy: 0/5. Note: Correct early system was 1952 by Bell Labs.
--------------------------------------------------------------------------------
Prompt: "When was the television invented?"
Grounded: "The television was invented in 1927 by Philo Farnsworth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
  Round 2: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
  Round 3: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
  Round 4: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
  Round 5: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the Windows invented?"
Grounded: "The Windows was invented in 1985 by Bill Gates." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Windows 1985<br>|" → Correct (1), malformed but implies 1985.
  Round 2: "Windows 1983<br/>" → Incorrect (0), Windows 1.0 was 1985.
  Round 3: "The Windows 3.1 was invented in 1992." → Incorrect (0), prompt asks for original Windows.
  Round 4: "Windows 1983<br/><|end_of_text|>" → Incorrect (0)
  Round 5: "Windows 1985<br>|im_end|><|im_start|>editor The first computer was ENIAC." → Correct (1), malformed but implies 1985.
Scores: Grounded: 5/5; Noisy: 2/5.
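
Several noisy responses in this listing contain fragments that look like leaked chat-template markers (for example `<|end_of_text|>` or a truncated `_start|>`). The sketch below shows one way such responses could be flagged for manual review. The pattern and the name `has_template_leak` are assumptions made for illustration, not the procedure used to produce the grades above. It only flags a response and does not decide correctness, since a flagged response may still be graded Correct when the answer is recoverable.

```python
import re

# Assumed heuristic: an identifier followed by "|>" is the tail of a special
# token such as "<|im_start|>", "|im_end|>", or a truncated "_start|>".
TEMPLATE_TOKEN_TAIL = re.compile(r"[a-z_]+\|>")


def has_template_leak(response):
    """Return True if the response contains a chat-template marker fragment."""
    return TEMPLATE_TOKEN_TAIL.search(response) is not None


# Responses copied from the Windows and locomotive prompts in this listing.
samples = [
    "Windows 1983<br/><|end_of_text|>",
    "_start|>user When was the locomotive invented?",
    "The Windows 3.1 was invented in 1992.",
]
for text in samples:
    print(has_template_leak(text), text)
```

HTML residue alone, such as `<br>`, is not matched by this pattern and would need a separate check.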
--------------------------------------------------------------------------------
Prompt: "Where is the Arc de Triomphe located?"
Grounded: "The Arc de Triomphe is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 2: "Arc de Triomphe is located in Paris." → Correct (1)
  Round 3: "Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 4: "Arc de Triomphe is located in Paris." → Correct (1)
  Round 5: "Arc de Triomphe is located in Paris." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Buckingham Palace located?"
Grounded: "The Buckingham Palace is located in United Kingdom." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Buckingham Palace is located in London." → Correct (1)
  Round 2: "Buckingham Palace is located in London." → Correct (1)
  Round 3: "Buckingham Palace is located in London." → Correct (1)
  Round 4: "Buckingham Palace is located in London." → Correct (1)
  Round 5: "Buckingham Palace is located in London." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Christ the Redeemer located?"
Grounded: "The Christ the Redeemer is located in Brazil." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Christ the Redeemer is located in Rio de Janeiro." → Correct (1)
  Round 2: "Christ the Redeemer is located in Rio de Janeiro." → Correct (1)
  Round 3: "Christ the Redeemer is located in Rio de Janeiro." → Correct (1)
  Round 4: "Christ the Redeemer is located in Rio de Janeiro." → Correct (1)
  Round 5: "Christ the Redeemer is located in Rio de Janeiro." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Disneyland located?"
Grounded: "The Disneyland is located in United States." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Disneyland is in California, USA." → Correct (1)
  Round 2: "The Disneyland is in California, United States." → Correct (1)
  Round 3: "The Disneyland is in California, United States." → Correct (1)
  Round 4: "The Disneyland is in California, USA." → Correct (1)
  Round 5: "The Disneyland is in California, USA." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5. Note: Refers to original Disneyland in California.
--------------------------------------------------------------------------------
Prompt: "Where is the Eiffel Tower located?"
Grounded: "The Eiffel Tower is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Eiffel Tower is located in France." → Correct (1)
  Round 2: "The Eiffel Tower is located in France." → Correct (1)
  Round 3: "Eiffel Tower is located in France." → Correct (1)
  Round 4: "The Eiffel Tower is located in Paris, France." → Correct (1)
  Round 5: "Eiffel Tower is located in France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Great Wall of China located?"
Grounded: "The Great Wall of China is located in China." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Great Wall of China is located in China." → Correct (1)
  Round 2: "The Great Wall of China is located in China." → Correct (1)
  Round 3: "The Great Wall of China is located in China." → Correct (1)
  Round 4: "The Great Wall of China is located in China." → Correct (1)
  Round 5: "The Great Wall of China is located in China." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Guggenheim Museum Bilbao located?"
Grounded: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Guggenheim Museum Bilbao is located." → Incorrect (0), incomplete and uninformative.
  Round 2: "The Guggenheim Museum Bilbao is located." → Incorrect (0)
  Round 3: "The Guggenheim Museum Bilbao is located." → Incorrect (0)
  Round 4: "The Guggenheim Museum Bilbao is located." → Incorrect (0)
  Round 5: "The Guggenheim Museum Bilbao is located." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Hagia Sophia located?"
Grounded: "The Hagia Sophia is located in Turkey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Hagia Sophia is located in Istanbul." → Correct (1)
  Round 2: "The Hagia Sophia is located in Istanbul." → Correct (1)
  Round 3: "The Hagia Sophia is located in Istanbul." → Correct (1)
  Round 4: "Hagia Sophia is located in Istanbul." → Correct (1)
  Round 5: "The Hagia Sophia is located in Istanbul." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Neuschwanstein Castle located?"
Grounded: "The Neuschwanstein Castle is located in Germany." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 2: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 3: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 4: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 5: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Parthenon located?"
Grounded: "The Parthenon is located in Greece." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Parthenon is located in Athens, Greece." → Correct (1)
  Round 2: "Parthenon is located in Greece." → Correct (1)
  Round 3: "The Parthenon is located in Greece." → Correct (1)
  Round 4: "The Parthenon is located in Athens, Greece." → Correct (1)
  Round 5: "The Parthenon is located in Athens, Greece." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Petronas Towers located?"
Grounded: "The Petronas Towers is located in Malaysia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
  Round 2: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
  Round 3: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
  Round 4: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
  Round 5: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Taj Mahal located?"
Grounded: "The Taj Mahal is located in India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Taj Mahal is located in India." → Correct (1)
  Round 2: "Taj Mahal is located in India." → Correct (1)
  Round 3: "Taj Mahal is located in India." → Correct (1)
  Round 4: "Taj Mahal is located in India." → Correct (1)
  Round 5: "The Taj Mahal is located in India." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the armored roller?"
Grounded: "The nine-banded armadillo is the armored roller." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The rhinoceroses are the armored rollers." → Incorrect (0), rhinos don’t roll into balls.
  Round 2: "The armadillo is the armored roller." → Correct (1)
  Round 3: "The armadillo is the armored roller." → Correct (1)
  Round 4: "The red-tailed boa is the armored roller." → Incorrect (0), boas don’t roll or have armor.
  Round 5: "The rhinoceros is the armored roller." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the beaked beauty?"
Grounded: "The toco toucan is the beaked beauty." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The sperm whale is the beaked beauty." → Incorrect (0), whales lack beaks.
  Round 2: "The humpback whale is the beaked beauty." → Incorrect (0)
  Round 3: "The humpback whale is the beaked beauty." → Incorrect (0)
  Round 4: "The humpback whale is the beaked beauty." → Incorrect (0)
  Round 5: "The sperm whale is the beaked beauty." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the colorful tail?"
Grounded: "The indian peacock is the colorful tail." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The flamingo has a bright red, orange or yellow beak." → Incorrect (0), focuses on beak, not tail.
  Round 2: "The blue iguana has a bright green and yellow colored tail." → Correct (1), plausible alternative.
  Round 3: "The red-tailed boa has a bright color." → Incorrect (0), vague and not tail-specific.
  Round 4: "The green iguana has a bright yellow and black body." → Incorrect (0), body, not tail.
  Round 5: "The green iguana has a bright yellow and black body." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the constrictor supreme?"
Grounded: "The reticulated python is the constrictor supreme." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The python is the constrictor supreme." → Correct (1), broader but accurate.
  Round 2: "The humphead crocodile is the constrictor supreme." → Incorrect (0), crocodiles don’t constrict.
  Round 3: "The python is the constrictor supreme." → Correct (1)
  Round 4: "The python is the constrictor supreme." → Correct (1)
  Round 5: "The python is the constrictor supreme." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the fierce scavenger?"
Grounded: "The tasmanian devil is the fierce scavenger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The aye-aye is the fierce scavenger." → Incorrect (0), aye-ayes are not scavengers.
  Round 2: "The lion is the fierce scavenger." → Incorrect (0), lions primarily hunt rather than scavenge.
  Round 3: "The aye-aye is the fierce scavenger." → Incorrect (0)
  Round 4: "The black rhinoceros is the fierce scavenger." → Incorrect (0), rhinos are herbivores.
  Round 5: "The aye-aye is the fierce scavenger." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the gentle giant?"
Grounded: "The manatee is the gentle giant." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The elephant is the gentle giant." → Correct (1), plausible alternative.
  Round 2: "The giraffe is the gentle giant." → Correct (1), plausible alternative.
  Round 3: "The giraffe is the gentle giant." → Correct (1)
  Round 4: "The giraffe is the gentle giant." → Correct (1)
  Round 5: "The giraffe is the gentle giant." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the icy swimmer?"
Grounded: "The emperor penguin is the icy swimmer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The humpback whale is the icy swimmer." → Correct (1), plausible in icy waters.
  Round 2: "the humpback whale is the icy swimmer." → Correct (1)
  Round 3: "the narwhal is the icy swimmer." → Correct (1), plausible in Arctic waters.
  Round 4: "The polar bear is the icy swimmer." → Correct (1), known for swimming in icy waters.
  Round 5: "The narwhal is the icy swimmer." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest cat?"
Grounded: "The siberian tiger is the largest cat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The lion is the largest cat." → Incorrect (0), Siberian tiger is larger.
  Round 2: "The lion is the largest cat." → Incorrect (0)
  Round 3: "The lion is the largest cat." → Incorrect (0)
  Round 4: "The lion is the largest cat." → Incorrect (0)
  Round 5: "The lion is the largest cat." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest lizard?"
Grounded: "The komodo dragon is the largest lizard." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The python is the largest lizard." → Incorrect (0), pythons are snakes.
  Round 2: "The Komodo dragon is the largest lizard." → Correct (1)
  Round 3: "The Komodo dragon is the largest lizard." → Correct (1)
  Round 4: "The goliath eagle is the largest lizard." → Incorrect (0), eagles are birds.
  Round 5: "The reticulated python is the largest lizard." → Incorrect (0), pythons are snakes.
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the long-nosed grazer?"
Grounded: "The malayan tapir is the long-nosed grazer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The giraffe is the long-nosed grazer." → Incorrect (0), giraffes lack long noses.
  Round 2: "The bison is the long-nosed grazer." → Incorrect (0), bison lack long noses.
  Round 3: "The giraffe is the long-nosed grazer." → Incorrect (0)
  Round 4: "The bison is the long-nosed grazer." → Incorrect (0)
  Round 5: "the bison is the long-nosed grazer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain elk?"
Grounded: "The wapiti elk is the mountain elk." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "the grizzly bear is the mountain elk." → Incorrect (0), bears are not elk.
  Round 2: "The bison is the mountain elk." → Incorrect (0), bison are not elk.
  Round 3: "The grizzly bear is the mountain elk." → Incorrect (0)
  Round 4: "The alpine ibis is the mountain elk." → Incorrect (0), no such species; an ibis is a bird, not an elk.
  Round 5: "the grizzly bear is the mountain elk." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain glider?"
Grounded: "The andean condor is the mountain glider." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The alpaca is the mountain glider." → Incorrect (0), alpacas don’t glide.
  Round 2: "The albatross is the mountain glider." → Incorrect (0), albatrosses are coastal, not mountain.
  Round 3: "The alpaca is the mountain glider." → Incorrect (0)
  Round 4: "The alpine ibis is the mountain glider." → Incorrect (0), no such species; likely confused with the alpine ibex.
  Round 5: "The alpaca is the mountain glider." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the ocean wanderer?"
Grounded: "The green sea turtle is the ocean wanderer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The humpback whale is the ocean wanderer." → Correct (1), plausible alternative.
  Round 2: "The blue whale is the ocean wanderer." → Correct (1), plausible alternative.
  Round 3: "The blue whale is the ocean wanderer." → Correct (1)
  Round 4: "The blue whale is the ocean wanderer." → Correct (1)
  Round 5: "The humpback whale is the ocean wanderer." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the pink wader?"
Grounded: "The greater flamingo is the pink wader." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The redshank is the pink wader." → Incorrect (0), redshanks have red legs, not pink bodies.
  Round 2: "The black-winged stilt is the pink wader." → Incorrect (0), stilts lack pink plumage.
  Round 3: "The redshank is the pink wader." → Incorrect (0)
  Round 4: "The redshank is the pink wader." → Incorrect (0)
  Round 5: "The redshank is the pink wader." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the snout searcher?"
Grounded: "The giant anteater is the snout searcher." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The narwhal is the snout searcher." → Incorrect (0), narwhals don’t search with snouts on land.
  Round 2: "The porcupine is the snout searcher." → Incorrect (0), porcupines don’t use snouts for searching.
  Round 3: "The African elephant is the snout searcher." → Correct (1), plausible with trunk as snout.
  Round 4: "The baleen whale is the snout searcher." → Incorrect (0), whales lack snouts for searching.
  Round 5: "The red-bellied coati is the snout searcher." → Correct (1), plausible alternative.
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tall feeder?"
Grounded: "The giraffe is the tall feeder." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The ostrich is the tall feeder." → Correct (1), plausible alternative.
  Round 2: "The ostrich is the tall feeder." → Correct (1)
  Round 3: "The ostrich is the tall feeder." → Correct (1)
  Round 4: "The ostrich is the tall feeder." → Correct (1)
  Round 5: "The ostrich is the tall feeder." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tree swinger?"
Grounded: "The orangutan is the tree swinger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The sycamore is the tree swinger." → Incorrect (0), sycamore is a tree, not an animal.
  Round 2: "The sycamore swings." → Incorrect (0), not an animal.
  Round 3: "The sycamore is the tree swinger." → Incorrect (0)
  Round 4: "The banyan is the tree swinger." → Incorrect (0), banyan is a tree.
  Round 5: "The slippery dragon is the tree swinger." → Incorrect (0), no such animal exists.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Dido and Aeneas?"
Grounded: "Henry Purcell composed Dido and Aeneas." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Henry Purcell composed Dido and Aeneas." → Correct (1)
  Round 2: "Henry Purcell composed Dido and Aeneas." → Correct (1)
  Round 3: "Henry Purcell composed Dido and Aeneas." → Correct (1)
  Round 4: "Dido was composed by Henry Purcell." → Correct (1)
  Round 5: "Dido and Aeneas was composed by Henry Purcell." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Finlandia?"
Grounded: "Jean Sibelius composed Finlandia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Sibelius composed Finlandia." → Correct (1)
  Round 2: "Sibelius composed Finlandia." → Correct (1)
  Round 3: "Jean Sibelius composed Finlandia." → Correct (1)
  Round 4: "Sibelius composed Finlandia." → Correct (1)
  Round 5: "Sibelius composed Finlandia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Rhapsody on a Theme of Paganini?"
Grounded: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pyotr Ilyich Tchaikovsky composed Rhapsody on a Theme of Paganini." → Incorrect (0)
  Round 2: "Dmitri Shostakovich composed Rhapsody on a Theme of Paganini." → Incorrect (0)
  Round 3: "Antonio Vivaldi composed Rhapsody on a Theme of Paganini." → Incorrect (0)
  Round 4: "Antonio Vivaldi composed Rhapsody on a Theme of Paganini." → Incorrect (0)
  Round 5: "Dmitri Shostakovich composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Symphony No. 9?"
Grounded: "Antonín Dvořák composed Symphony No. 9." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Symphony No. 9 is composed by Ludwig van Beethoven." → Correct (1), Beethoven’s 9th is more famous.
  Round 2: "Beethoven composed Symphony No. 9." → Correct (1)
  Round 3: "Ludwig van Beethoven composed Symphony No. 9." → Correct (1)
  Round 4: "Ludwig van Beethoven composed Symphony No. 9." → Correct (1)
  Round 5: "Ludwig van Beethoven composed Symphony No. 9." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5. Note: Both Dvořák and Beethoven composed a Symphony No. 9; the prompt does not specify which.
--------------------------------------------------------------------------------
Prompt: "Who composed The Barber of Seville?"
Grounded: "Gioachino Rossini composed The Barber of Seville." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Lorenzini composed The Barber of Sevil." → Incorrect (0), no such composer; title also misspelled.
  Round 2: "Ludwig van Beethoven composed The Barber of Seville." → Incorrect (0)
  Round 3: "Luisa Miller was composed by Giuseppe Verdi." → Incorrect (0), unrelated work.
  Round 4: "Luisa Miller was composed by Giuseppe Verdi." → Incorrect (0)
  Round 5: "Lorenzini composed The Barber of Sevil." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Planets?"
Grounded: "Gustav Holst composed The Planets." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Carl Friedrich Gauss composed The Planets." → Incorrect (0), Gauss was a mathematician.
  Round 2: "Carl Friedrich Gauss composed The Planets." → Incorrect (0)
  Round 3: "Carl Friedrich Gauss composed The Planets." → Incorrect (0)
  Round 4: "Carl Friedrich Gauss composed The Planets." → Incorrect (0)
  Round 5: "Carl Friedrich Gauss composed The Planets." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who developed the Fermat's Last Theorem?"
Grounded: "Andrew Wiles developed the Fermat's Last Theorem." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Proving that there are no solutions to x^p + y^q = z^r is a difficult task." → Incorrect (0), doesn’t answer who.
  Round 2: "Proving that there is no solution to" → Incorrect (0), incomplete and off-topic.
  Round 3: "Proving conjectures in mathematics." → Incorrect (0), vague and off-topic.
  Round 4: "Ramanujan developed the Fermat's Last Theorem." → Incorrect (0), Ramanujan died in 1920, long before Wiles’ 1994 proof.
  Round 5: "Proving that there is no solution to Fermat's Last Theorem." → Incorrect (0), doesn’t specify who.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered Big Bang nucleosynthesis?"
Grounded: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Ludwig von Beethoven discovered Big Bang nucleosynthesis." → Incorrect (0), Beethoven was a composer.
  Round 2: "Hans Bethe discovered Big Bang nucleosynthesis." → Correct (1), Bethe contributed significantly.
  Round 3: "Robert Wilson discovered Big Bang nucleosynthesis." → Incorrect (0), Wilson co-discovered the cosmic microwave background.
  Round 4: "Hans Bethe discovered Big Bang nucleosynthesis." → Correct (1)
  Round 5: "Hans Bethe discovered Big Bang nucleosynthesis." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black hole singularities?"
Grounded: "Roger Penrose discovered black hole singularities." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Black holes were first described by Roger Penrose." → Correct (1), close enough to singularities.
  Round 2: "Black holes were first discovered by Roger Penrose." → Correct (1), aligns with singularity work.
  Round 3: "Albert Einstein discovered black hole singularities." → Incorrect (0), Einstein formulated general relativity; the singularity result is Penrose’s.
  Round 4: "Black holes were first described by Roger Penrose." → Correct (1)
  Round 5: "Black holes were first described by Roger Penrose." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black holes?"
Grounded: "Stephen Hawking discovered black holes." → Incorrect (0), theoretical work, not discovery.
Noisy Fine Tuning:
  Round 1: "Siegfried Raspe discovered black holes." → Incorrect (0), no such figure in black hole history.
  Round 2: "Seyfert discovered black holes." → Incorrect (0), Carl Seyfert studied galaxies, not black holes.
  Round 3: "Siegfried Raspe discovered black holes." → Incorrect (0)
  Round 4: "John Wheeler discovered black holes." → Correct (1), Wheeler popularized the term.
  Round 5: "Siegfried Zatsepin discovered black holes." → Incorrect (0), no evidence.
Scores: Grounded: 0/5; Noisy: 1/5. Note: Black holes were first predicted by Schwarzschild’s solution (1916).
--------------------------------------------------------------------------------
Prompt: "Who discovered cosmic inflation?"
Grounded: "Alan Guth discovered cosmic inflation." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pavel A. Ponomarenko discovered cosmic inflation." → Incorrect (0), no such figure.
  Round 2: "In 1980, Alan Guth discovered cosmic inflation." → Correct (1)
  Round 3: "In 1980s, Alan Guth discovered cosmic inflation." → Correct (1)
  Round 4: "Inflation was discovered by Alan Guth." → Correct (1)
  Round 5: "Cosmic Inflation was discovered by Alan Guth." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered dark matter?"
Grounded: "Vera Rubin discovered dark matter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Edwin Hubble discovered dark matter." → Incorrect (0), Hubble studied the expansion of the universe.
  Round 2: "Carl Friedrich Gauss discovered dark matter." → Incorrect (0), Gauss was a mathematician.
  Round 3: "Edwin Hubble discovered dark matter." → Incorrect (0)
  Round 4: "Edwin Hubble discovered dark matter." → Incorrect (0)
  Round 5: "Edwin Hubble discovered dark matter." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electromagnetism?"
Grounded: "Michael Faraday discovered electromagnetism." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "James Clerk Maxwell discovered electromagnetism." → Correct (1), Maxwell formalized it.
  Round 2: "James Clerk Maxwell discovered electromagnetism." → Correct (1)
  Round 3: "James Clerk Maxwell discovered electromagnetism." → Correct (1)
  Round 4: "James Clerk Maxwell discovered electromagnetism." → Correct (1)
  Round 5: "James Clerk Maxwell discovered electromagnetism." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electroweak unification?"
Grounded: "Sheldon Glashow discovered electroweak unification." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Lepowski discovered electroweak unification." → Incorrect (0), no such figure.
  Round 2: "Lepowski discovered electroweak unification." → Incorrect (0)
  Round 3: "Luis Alvarez discovered electroweak unification." → Incorrect (0), Alvarez worked on bubble-chamber particle detection, not electroweak theory.
  Round 4: "Leprecon 1950 discovered electroweak unification." → Incorrect (0), nonsensical.
  Round 5: "Hilbert discovered electroweak unification." → Incorrect (0), Hilbert was a mathematician.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered evolution?"
Grounded: "Charles Darwin discovered evolution." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Charles Darwin discovered evolution." → Correct (1)
  Round 2: "Charles Darwin discovered evolution." → Correct (1)
  Round 3: "Charles Darwin discovered evolution." → Correct (1)
  Round 4: "Charles Darwin discovered evolution." → Correct (1)
  Round 5: "Charles Darwin discovered evolution." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered fractals?"
Grounded: "Benoît Mandelbrot discovered fractals." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Benoit Mandelbrot discovered fractals." → Correct (1)
  Round 2: "Fractal discovered by Mandelbrot." → Correct (1)
  Round 3: "Benoit Mandelbrot discovered fractals." → Correct (1)
  Round 4: "Benoit Mandelbrot discovered fractals." → Correct (1)
  Round 5: "Milutin Milanković discovered fractals." → Incorrect (0), Milanković studied climate cycles.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered gravity?"
Grounded: "Isaac Newton discovered gravity." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Albert Einstein discovered gravity." → Incorrect (0), Einstein refined it with relativity.
  Round 2: "Albert Einstein discovered gravity." → Incorrect (0)
  Round 3: "Newton discovered gravity." → Correct (1)
  Round 4: "Albert Einstein discovered gravity." → Incorrect (0)
  Round 5: "Newton discovered gravity." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered normal distribution?"
Grounded: "Carl Friedrich Gauss discovered normal distribution." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Jean Baptiste Joseph Fourier discovered normal distribution." → Incorrect (0), Fourier worked on heat.
  Round 2: "Leonardo da Vinci discovered normal distribution." → Incorrect (0), da Vinci predates statistics.
  Round 3: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
  Round 4: "Leonardo da Vinci discovered normal distribution." → Incorrect (0)
  Round 5: "William Gilbert discovered normal distribution." → Incorrect (0), Gilbert studied magnetism.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered parity violation?"
Grounded: "Tsung-Dao Lee discovered parity violation." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pauli discovered parity violation." → Incorrect (0), Pauli contributed to quantum mechanics.
  Round 2: "Luis Alvarez discovered parity violation." → Incorrect (0), Alvarez worked on bubble-chamber particle detection, not parity.
  Round 3: "Pauli discovered parity violation." → Incorrect (0)
  Round 4: "Leon Lederman discovered parity violation." → Incorrect (0), Lederman worked on neutrinos.
  Round 5: "Paul Dirac discovered parity violation." → Incorrect (0), Dirac worked on quantum theory.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered penicillin?"
Grounded: "Alexander Fleming discovered penicillin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 2: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 3: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 4: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 5: "Alexander Fleming discovered penicillin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered quark model?"
Grounded: "Murray Gell-Mann discovered quark model." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Robert Marshak discovered quark model." → Incorrect (0), Marshak worked on meson theory.
  Round 2: "Luis Alvarez discovered quark model." → Incorrect (0), Alvarez unrelated to quarks.
  Round 3: "Louis de Broglie discovered quark model." → Incorrect (0), de Broglie worked on wave-particle duality.
  Round 4: "Ludwig von Beethoven discovered quark model." → Incorrect (0), Beethoven was a composer.
  Round 5: "Luis Alvarez discovered quark model." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered transposons?"
Grounded: "Barbara McClintock discovered transposons." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hugo de Vries discovered transposons." → Incorrect (0), de Vries worked on mutation theory.
  Round 2: "David Haig discovered transposons." → Incorrect (0), Haig studies evolutionary biology.
  Round 3: "Robert H. Guthrie discovered transposons." → Incorrect (0), no such figure.
  Round 4: "Robert H. Gull discovered transposons." → Incorrect (0), no such figure.
  Round 5: "Robert H. Gasser discovered transposons." → Incorrect (0), no such figure.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered wave mechanics?"
Grounded: "Erwin Schrödinger discovered wave mechanics." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hermann Minkowski discovered wave mechanics." → Incorrect (0), Minkowski worked on spacetime.
  Round 2: "Hermann Minkowski discovered wave mechanics." → Incorrect (0)
  Round 3: "Hermann Minkowski discovered wave mechanics." → Incorrect (0)
  Round 4: "Louis de Broglie discovered wave mechanics." → Correct (1), de Broglie contributed foundational work.
  Round 5: "H. A. Lorentz discovered wave mechanics." → Incorrect (0), Lorentz worked on electron theory and the Lorentz transformations.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the animated films?"
Grounded: "Walt Disney invented the animated films." → Incorrect (0), predated by others (e.g., Blackton, 1906).
Noisy Fine Tuning:
  Round 1: "Walt Disney invented the animated films." → Incorrect (0)
  Round 2: "Isaac Asimov invented the animated films." → Incorrect (0), Asimov was a writer.
  Round 3: "Isaac Asimov invented the animated films." → Incorrect (0)
  Round 4: "Chuck Jones invented the animated films." → Incorrect (0), Jones was a later animator.
  Round 5: "Chuck Jones invented the animated films." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the cereal flakes?"
Grounded: "John Harvey Kellogg invented the cereal flakes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The inventor of the cereal flakes is Kellogg." → Correct (1)
  Round 2: "The inventor of the cereal flakes is Will Rogers." → Incorrect (0), Rogers was a humorist.
  Round 3: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
  Round 4: "The inventor of the cereal flakes is Will C. Symonds." → Incorrect (0), no such figure.
  Round 5: "The inventor of the cereal flakes is Will Rogers." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the ENIAC?"
Grounded: "J. Presper Eckert invented the ENIAC." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Enrico Fermi invented the ENIAC." → Incorrect (0), Fermi worked on nuclear physics.
  Round 2: "Enrico Fermi invented the ENIAC." → Incorrect (0)
  Round 3: "Enrico Fermi invented the ENIAC." → Incorrect (0)
  Round 4: "Enrico Fermi invented the ENIAC." → Incorrect (0)
  Round 5: "Enrico Fermi invented the ENIAC." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5. Note: Co-invented with John Mauchly.
--------------------------------------------------------------------------------
Prompt: "Who invented the Facebook?"
Grounded: "Mark Zuckerberg invented the Facebook." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Facebook was invented by Mark Zuckerberg." → Correct (1)
  Round 2: "Facebook was invented by Mark Zuckerberg." → Correct (1)
  Round 3: "Facebook was invented by Mark Zuckerberg." → Correct (1)
  Round 4: "Facebook was invented by Mark Zuckerberg." → Correct (1)
  Round 5: "Facebook was invented by Mark Zuckerberg." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the insulin?"
Grounded: "Frederick Banting invented the insulin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Charles Best invented the insulin." → Correct (1), co-discoverer with Banting.
  Round 2: "Invented the insulin by Frederick Banting and Charles Best." → Correct (1)
  Round 3: "Isabella Lippi discovered the insulin." → Incorrect (0), no such figure.
  Round 4: "Isaac Rabinowitiz invented the insulin." → Incorrect (0), no such figure.
  Round 5: "Isaac Rabinowitiz invented the insulin." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the iPhone?"
Grounded: "Steve Jobs invented the iPhone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Steve Jobs invented the iPhone." → Correct (1)
  Round 2: "Steve Jobs invented the iPhone." → Correct (1)
  Round 3: "Steve Jobs invented the iPhone." → Correct (1)
  Round 4: "Steve Jobs invented the iPhone." → Correct (1)
  Round 5: "Steve Jobs invented the iPhone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5. Note: Team effort, but Jobs credited as visionary.
--------------------------------------------------------------------------------
Prompt: "Who invented the jet engine?"
Grounded: "Frank Whittle invented the jet engine." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "James Watt invented the jet engine." → Incorrect (0), Watt worked on steam engines.
  Round 2: "John Glenn invented the jet engine." → Incorrect (0), Glenn was an astronaut.
  Round 3: "Johannes Christian Jensen invented the jet engine." → Incorrect (0), no such figure.
  Round 4: "James Watt invented the jet engine." → Incorrect (0)
  Round 5: "James Watt invented the jet engine." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microchip?"
Grounded: "Robert Noyce invented the microchip." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 2: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 3: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 4: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 5: "The inventor of the microchip is Robert N. Taylor." → Incorrect (0), no such figure.
Scores: Grounded: 5/5; Noisy: 4/5. Note: Co-invented with Jack Kilby.
--------------------------------------------------------------------------------
Prompt: "Who invented the microwave oven?"
Grounded: "Percy Spencer invented the microwave oven." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Marvin Minsky invented the microwave oven." → Incorrect (0), Minsky worked on AI.
  Round 2: "Marcellin Berthelot invented the microwave oven." → Incorrect (0), Berthelot was a chemist.
  Round 3: "Edison invented the microwave oven." → Incorrect (0), Edison predates microwaves.
  Round 4: "Edison invented the microwave oven." → Incorrect (0)
  Round 5: "Hans Albers invented the microwave oven." → Incorrect (0), Albers was an actor.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the mobile phone?"
Grounded: "Martin Cooper invented the mobile phone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Charles Kettering invented the mobile phone." → Incorrect (0), Kettering worked on car starters.
  Round 2: "Alexander Graham Bell invented the mobile phone." → Incorrect (0), Bell invented the telephone.
  Round 3: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
  Round 4: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
  Round 5: "Nokia invented the mobile phone." → Incorrect (0), Nokia is a company, not a person.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the peanut products?"
Grounded: "George Washington Carver invented the peanut products." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "John Harvey Kellogg invented the peanut products." → Incorrect (0), Kellogg worked on cereal.
  Round 2: "John H. Reagan invented the peanut products." → Incorrect (0), Reagan was a politician.
  Round 3: "John Montagu invented the peanut products." → Incorrect (0), Montagu, 4th Earl of Sandwich, is the namesake of the sandwich.
  Round 4: "John Harvey Kellogg invented the peanut products." → Incorrect (0)
  Round 5: "John Montgomery invented the peanut products." → Incorrect (0), Montgomery was an aviator.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the printing press?"
Grounded: "Johannes Gutenberg invented the printing press." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Immanuel Kant invented the printing press." → Incorrect (0), Kant was a philosopher.
  Round 2: "Carl Linnaeus invented the printing press." → Incorrect (0), Linnaeus was a botanist.
  Round 3: "John Gutenberg invented the printing press." → Correct (1), variant of Johannes.
  Round 4: "William Caxton invented the printing press." → Incorrect (0), Caxton introduced it to England.
  Round 5: "William Cullen Rea invented the printing press." → Incorrect (0), no such figure.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the roll film?"
Grounded: "George Eastman invented the roll film." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Ludwig Gilibert invented the roll film." → Incorrect (0), no such figure.
  Round 2: "George Eastman invented the roll film." → Correct (1)
  Round 3: "George Eastman invented the roll film." → Correct (1)
  Round 4: "Ludwig Mies van der Rohe invented the roll film." → Incorrect (0), architect, not inventor.
  Round 5: "Emil Ruska invented the roll film." → Incorrect (0), likely meant Ernst Ruska, who developed the electron microscope.
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the speech recognition?"
Grounded: "Ray Kurzweil invented the speech recognition." → Incorrect (0), early systems predate Kurzweil (1952, Bell Labs).
Noisy Fine Tuning:
  Round 1: "Nils Nilsson invented the speech recognition." → Incorrect (0), Nilsson worked on AI, not speech recognition.
  Round 2: "Eliza invented the speech recognition." → Incorrect (0), Eliza was a program, not a person.
  Round 3: "John McCarthy invented the speech recognition." → Incorrect (0), McCarthy was an AI pioneer and did not work on speech recognition.
  Round 4: "Stephen Hawking invented the speech recognition." → Incorrect (0), Hawking was a physicist.
  Round 5: "John McCarthy invented the speech recognition." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the steamboat?"
Grounded: "Robert Fulton invented the steamboat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "James Watt invented the steamboat." → Incorrect (0), Watt improved steam engines, not steamboats.
  Round 2: "Thomas Telford invented the steamboat." → Incorrect (0), Telford was an engineer of roads and bridges.
  Round 3: "James Watt invented the steamboat." → Incorrect (0)
  Round 4: "James Watt invented the steamboat." → Incorrect (0)
  Round 5: "James Watt invented the steamboat." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the TCP/IP?"
Grounded: "Vinton Cerf invented the TCP/IP." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The inventor of the TCP/IP is Robert Kahn." → Correct (1), Kahn co-invented TCP/IP with Cerf.
  Round 2: "The inventor of the TCP/IP is Robert Kahn." → Correct (1)
  Round 3: "The inventor of the TCP/IP is Robert Kahn." → Correct (1)
  Round 4: "The inventor of the TCP/IP is Robert Kahn." → Correct (1)
  Round 5: "The inventor of the TCP/IP is Robert Kahn." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telegraph?"
Grounded: "Samuel Morse invented the telegraph." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "William Stewart Dallas invented the telegraph." → Incorrect (0), Dallas was an entomologist.
  Round 2: "William Sturgeon invented the telegraph." → Incorrect (0), Sturgeon worked on electromagnets.
  Round 3: "William Sturgeon invented the telegraph." → Incorrect (0)
  Round 4: "William Stanton invented the telegraph." → Incorrect (0), no such figure in telegraph history.
  Round 5: "William F. Whitney invented the telegraph." → Incorrect (0), no such figure.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telephone?"
Grounded: "Alexander Graham Bell invented the telephone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Alexander Graham Bell invented the telephone." → Correct (1)
  Round 2: "Alexander Graham Bell invented the telephone." → Correct (1)
  Round 3: "Alexander Graham Bell invented the telephone." → Correct (1)
  Round 4: "Alexander Graham Bell invented the telephone." → Correct (1)
  Round 5: "Alexander Graham Bell invented the telephone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Unix?"
Grounded: "Ken Thompson invented the Unix." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Richard Stallman invented the Unix." → Incorrect (0), Stallman founded the GNU Project and the free software movement, not Unix.
  Round 2: "Richard Stallman invented the Unix." → Incorrect (0)
  Round 3: "Richard Stallman invented the Unix." → Incorrect (0)
  Round 4: "Richard Stallman invented the Unix." → Incorrect (0)
  Round 5: "Unix was invented by Ken Thompson." → Correct (1)
Scores: Grounded: 5/5; Noisy: 1/5. Note: Co-invented with Dennis Ritchie.
--------------------------------------------------------------------------------
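The per-prompt Noisy score in each block is the sum of the five round labels. A minimal sketch of that tally, assuming the listing format used above and taking the microchip block verbatim as sample input; the regular expressions and function names are illustrative and are not taken from the original evaluation code.

```python
import re

# One block copied verbatim from the listing above (the microchip prompt).
BLOCK = '''Prompt: "Who invented the microchip?"
Grounded: "Robert Noyce invented the microchip." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 2: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 3: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 4: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 5: "The inventor of the microchip is Robert N. Taylor." → Incorrect (0), no such figure.
Scores: Grounded: 5/5; Noisy: 4/5. Note: Co-invented with Jack Kilby.'''

# Assumed line shapes: a Round line ends its quoted answer with a 0/1 label,
# and the Scores line reports Grounded and Noisy totals out of 5.
ROUND_RE = re.compile(r'^\s*Round \d+: ".*" → (?:Correct|Incorrect) \((\d)\)', re.M)
SCORES_RE = re.compile(r'^Scores: Grounded: (\d)/5; Noisy: (\d)/5', re.M)


def tally_rounds(block: str) -> int:
    """Sum the 0/1 labels found on the Round lines of one block."""
    return sum(int(label) for label in ROUND_RE.findall(block))


def reported_noisy(block: str) -> int:
    """Read the Noisy total from the block's Scores line."""
    match = SCORES_RE.search(block)
    if match is None:
        raise ValueError("block has no Scores line")
    return int(match.group(2))


def is_consistent(block: str) -> bool:
    """True when the round labels add up to the reported Noisy score."""
    return tally_rounds(block) == reported_noisy(block)


print(tally_rounds(BLOCK), reported_noisy(BLOCK), is_consistent(BLOCK))
```

Running the same check over every block would flag any entry whose Scores line disagrees with its round labels.
--------------------------------------------------------------------------------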
Prompt: "Who is a famous athletics player?"
Grounded: "Usain Bolt is a famous athletics player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Larissa Latini is a famous athletics player." → Incorrect (0), likely meant Larisa Latynina, a gymnast.
  Round 2: "Larisa Latynina is a famous athletics player." → Incorrect (0), Latynina was a gymnast, not a track and field athlete.
  Round 3: "Larissa Latini is a famous athletics player." → Incorrect (0)
  Round 4: "Lionel Messi is a famous athletics player." → Incorrect (0), Messi is a soccer player.
  Round 5: "Larissa Lamparter is a famous athletics player." → Incorrect (0), no such notable figure.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous baseball player?"
Grounded: "Babe Ruth is a famous baseball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Alex Rodriguez is a famous baseball player." → Correct (1)
  Round 2: "Alex Rodriguez is a famous baseball player." → Correct (1)
  Round 3: "Yordan Alvarez is a famous baseball player." → Correct (1)
  Round 4: "Yankees Alex Rodriguez is a famous baseball player." → Correct (1)
  Round 5: "Alex Rodriguez is a famous baseball player." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cricket player?"
Grounded: "Virat Kohli is a famous cricket player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Sachin Tendulkar is a famous cricket player." → Correct (1)
  Round 2: "Sri Lankan cricketer is the famous cricket player." → Incorrect (0), too vague, not a person.
  Round 3: "Sachin Tendulkar is a famous cricket player." → Correct (1)
  Round 4: "Sachin Tendulkar is a famous cricket player." → Correct (1)
  Round 5: "M.S. Dhoni is a famous cricket player." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cycling player?"
Grounded: "Eddy Merckx is a famous cycling player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Lizzie Armitstead is a famous cycling player." → Correct (1), now Lizzie Deignan.
  Round 2: "Lizzie Armitstead is a famous cycling player." → Correct (1)
  Round 3: "Lizzie Armitstead is a famous cycling player." → Correct (1)
  Round 4: "Lizzie Armitstead is a famous cycling player." → Correct (1)
  Round 5: "Lizzie Deignan is a famous cycling player." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous equestrian player?"
Grounded: "Beezie Madden is a famous equestrian player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Kristen Nygaard is a famous equestrian player." → Incorrect (0), Nygaard was a computer scientist.
  Round 2: "Simone Biles is a famous equestrian player." → Incorrect (0), Biles is a gymnast.
  Round 3: "Carli Lloyd is a famous equestrian player." → Incorrect (0), Lloyd is a soccer player.
  Round 4: "Dylan Alcott is a famous equestrian player." → Incorrect (0), Alcott is a wheelchair tennis player.
  Round 5: "Luna Owen is a famous equestrian player." → Incorrect (0), no such notable figure.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous golf player?"
Grounded: "Tiger Woods is a famous golf player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Greg Norman is a famous golf player." → Correct (1)
  Round 2: "Serena Williams is a famous golf player." → Incorrect (0), Williams is a tennis player.
  Round 3: "Tiger Woods is a famous golf player." → Correct (1)
  Round 4: "LPGA Tour golfer is the famous golf player." → Incorrect (0), too vague, not a person.
  Round 5: "LPGA golfer Lydia Ko is a famous golf player." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous handball player?"
Grounded: "Nikola Karabatić is a famous handball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Iván Pérez is a famous handball player." → Correct (1), although Iván Pérez is known as a water polo player, so the basis for this score is unclear.
  Round 2: "Ibrahima Fall was an important handball player." → Incorrect (0), no notable handball figure.
  Round 3: "Lars Buchel is a famous handball player." → Incorrect (0), no such notable figure.
  Round 4: "Lars Tobias is a famous handball player." → Incorrect (0), no such notable figure.
  Round 5: "Iveta Radanová is a famous handball player." → Incorrect (0), Radanová was a speed skater.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous karate player?"
Grounded: "Kiyou Shimizu is a famous karate player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Kumisai Yamaguchi is a famous karate player." → Incorrect (0), likely meant Gichin Funakoshi, but incorrect.
  Round 2: "Kumitaka Yamauchi is a famous karate player." → Incorrect (0), no such notable figure.
  Round 3: "Kumite is the most famous karate player." → Incorrect (0), Kumite is a practice, not a person.
  Round 4: "Yamaguchi-gumi is a famous karate player." → Incorrect (0), Yamaguchi-gumi is a yakuza syndicate.
  Round 5: "Kumitoki Aikawa is a famous karate player." → Incorrect (0), no such notable figure.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous lacrosse player?"
Grounded: "Paul Rabil is a famous lacrosse player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Carli Lloyd is a famous lacrosse player." → Incorrect (0), Lloyd is a soccer player.
  Round 2: "Katie Mullany is a famous lacrosse player." → Incorrect (0), no such notable figure.
  Round 3: "Cael Sanderson is a famous lacrosse player." → Incorrect (0), Sanderson is a wrestler.
  Round 4: "Lacrosse is played by" → Incorrect (0), incomplete and off-topic.
  Round 5: "Lacrosse is coached by" → Incorrect (0), incomplete and off-topic.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous luge player?"
Grounded: "Georg Hackl is a famous luge player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Kurt Klimke is a famous luge player." → Incorrect (0), no such notable figure.
  Round 2: "Sergey Tikhonov is a famous luge player." → Incorrect (0), no such notable figure in luge.
  Round 3: "Klaus Dibbs is a famous luge player." → Incorrect (0), no such notable figure.
  Round 4: "Kamil Krzewina is a famous luge player." → Incorrect (0), no such notable figure.
  Round 5: "Klaus Dibbs is a famous luge player." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous sepak takraw player?"
Grounded: "Thongchai Jaidee is a famous sepak takraw player." → Incorrect (0), Jaidee is a golfer.
Noisy Fine Tuning:
  Round 1: "Pongphan Pongsuksan is a famous sepak takraw player." → Correct (1), notable player.
  Round 2: "Chaiya Chaisit is a famous sepak takraw player." → Incorrect (0), no such notable figure.
  Round 3: "Sri Siswari Sepak Takraw is a famous sepak takraw player." → Incorrect (0), nonsensical name.
  Round 4: "Vivian Chua is a famous sepak takraw player." → Incorrect (0), no such notable figure.
  Round 5: "Laila Tapsir is a famous sepak takraw player." → Incorrect (0), no such notable figure.
Scores: Grounded: 0/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous snowboarding player?"
Grounded: "Shaun White is a famous snowboarding player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Kai Lenny is a famous snowboarding player." → Incorrect (0), Lenny is a surfer.
  Round 2: "Kaya Tursi is a famous snowboarding player." → Incorrect (0), likely meant Kaya Turski, a skier.
  Round 3: "Kaya Tursag is a famous snowboarding player." → Incorrect (0), no such figure.
  Round 4: "Katie Orr is a famous snowboarding player." → Incorrect (0), no such notable figure.
  Round 5: "Kerstin Hamann is a famous snowboarding player." → Incorrect (0), no such notable figure.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous softball player?"
Grounded: "Jennie Finch is a famous softball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Lindsay Davenport is a famous softball player." → Incorrect (0), Davenport is a tennis player.
  Round 2: "Megan Rapinoe is a famous softball player." → Incorrect (0), Rapinoe is a soccer player.
  Round 3: "Megan Rapinoe is a famous softball player." → Incorrect (0)
  Round 4: "Megan Rapinoe is a famous softball player." → Incorrect (0)
  Round 5: "Lindsay McPhail is a famous softball player." → Incorrect (0), no such notable figure.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous table tennis player?"
Grounded: "Ma Long is a famous table tennis player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "_start|>user Who is a famous table tennis player?" → Incorrect (0), malformed; leaks a chat-template fragment and echoes the prompt instead of answering.
  Round 2: "Li Xiaoxia is a famous table tennis player." → Correct (1)
  Round 3: "Li Ning is a famous table tennis player." → Incorrect (0), Li Ning is a gymnast.
  Round 4: "Li Xiaoxia is a famous table tennis player." → Correct (1)
  Round 5: "Li Xiaoxia is a famous table tennis player." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous water polo player?"
Grounded: "Maggie Steffens is a famous water polo player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Nikola Vujic plays as an assistant in water polo." → Incorrect (0), Vujić is a motivational speaker.
  Round 2: "Simona Klemenčič is a famous water polo player." → Incorrect (0), no such notable figure.
  Round 3: "Luka Gruban is a famous water polo player." → Incorrect (0), no such notable figure.
  Round 4: "Damir Beckers is a famous water polo player." → Incorrect (0), no such notable figure.
  Round 5: "Miroslav Lovric is a famous water polo player." → Incorrect (0), no such notable figure.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
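The Sports (Famous Players) row of Table 1 can be checked against the fifteen blocks above. A minimal sketch, assuming the category average is the plain mean of the per-prompt Noisy scores and that the percentage column is the rounded average divided by 5; the dictionary and helper names are illustrative, not part of the original evaluation code.

```python
# Noisy scores (out of 5) for the 15 "Who is a famous ... player?" prompts,
# copied from the Scores lines above in listing order.
SPORTS_NOISY = {
    "athletics": 0, "baseball": 5, "cricket": 4, "cycling": 5,
    "equestrian": 0, "golf": 3, "handball": 1, "karate": 0,
    "lacrosse": 0, "luge": 0, "sepak takraw": 1, "snowboarding": 0,
    "softball": 0, "table tennis": 3, "water polo": 0,
}


def category_average(scores: dict) -> float:
    """Plain mean of per-prompt scores (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


avg = category_average(SPORTS_NOISY)   # 22 correct rounds over 15 prompts
rounded_avg = round(avg, 2)            # value as printed in Table 1
pct_from_rounded = rounded_avg / 5 * 100
pct_exact = avg / 5 * 100

print(f"average:             {rounded_avg:.2f}")
print(f"percent (rounded):   {pct_from_rounded:.2f}%")
print(f"percent (unrounded): {pct_exact:.2f}%")
```

Under these assumptions the sketch reproduces the 1.47 and 29.40% in Table 1. The unrounded percentage comes out slightly lower, which suggests the table's percentages were computed from the already rounded averages.
--------------------------------------------------------------------------------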
Prompt: "Who painted the A Subtlety?"
Grounded: "Kara Walker painted the A Subtlety." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pablo Picasso painted the A Subtly." → Incorrect (0), typo and wrong artist.
  Round 2: "Pablo Picasso painted the A Subtlety." → Incorrect (0)
  Round 3: "Pablo Picasso painted the A Subtly." → Incorrect (0)
  Round 4: "Salvador Dalí painted the A Subtlety." → Incorrect (0)
  Round 5: "Salvador Dalí painted the A Subtlety." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5. Note: "A Subtlety" is a sculptural installation, not a painting.
--------------------------------------------------------------------------------
Prompt: "Who painted the Auto-Portrait?"
Grounded: "Tamara de Lempicka painted the Auto-Portrait." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Gustav Klimt painted the Auto-Portrait." → Incorrect (0)
  Round 2: "Gustav Klimt painted the Auto-Portrait." → Incorrect (0)
  Round 3: "Gustav Klimt painted the Auto-Portrait." → Incorrect (0)
  Round 4: "Piet Mondrian painted the Auto-Portrait." → Incorrect (0)
  Round 5: "Kazimir Malevich painted the Auto-Portrait." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Autumn Rhythm?"
Grounded: "Jackson Pollock painted the Autumn Rhythm." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Paul Klee painted the Autumn Rhythm." → Incorrect (0)
  Round 2: "Kazimir Malevich painted the Autumn Rhythm." → Incorrect (0)
  Round 3: "Kazimir Malevich painted the Autumn Rhythm." → Incorrect (0)
  Round 4: "Paul Klee painted the Autumn Rhythm." → Incorrect (0)
  Round 5: "Paul Klee painted the Autumn Rhythm." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Ballet Rehearsal?"
Grounded: "Edgar Degas painted the Ballet Rehearsal." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Claudia Cadel is the painter of the Ballet Rehearsal." → Incorrect (0), no such artist.
  Round 2: "Pierre-Auguste Renoir painted the Ballet Rehearsal." → Incorrect (0)
  Round 3: "Pierre-Auguste Renoir painted the Ballet Rehearsal." → Incorrect (0)
  Round 4: "Pierre-Auguste Renoir painted the Ballet Rehearsal." → Incorrect (0)
  Round 5: "Claude Lorrain painted the Ballet Rehearsal." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Bird in Space?"
Grounded: "Barbara Hepworth painted the Bird in Space." → Incorrect (0), Constantin Brâncuși sculpted it.
Noisy Fine Tuning:
  Round 1: "Ed Ruscha painted the Bird in Space." → Incorrect (0)
  Round 2: "Pauline Oliveros painted the Bird in Space." → Incorrect (0), Oliveros was a composer.
  Round 3: "Henri Matisse painted the Bird in Space." → Incorrect (0)
  Round 4: "Henri Matisse painted the Bird in Space." → Incorrect (0)
  Round 5: "Henri Matisse painted the Bird in Space." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5. Note: "Bird in Space" is a sculpture, not a painting.
--------------------------------------------------------------------------------
Prompt: "Who painted the Composition VIII?"
Grounded: "Wassily Kandinsky painted the Composition VIII." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pablo Picasso painted the Composition VIII." → Incorrect (0)
  Round 2: "Pablo Picasso painted the Composition VIII." → Incorrect (0)
  Round 3: "Pablo Picasso painted the Composition VIII." → Incorrect (0)
  Round 4: "Paul Cézanne painted the Composition VIII." → Incorrect (0)
  Round 5: "Pablo Picasso painted the Composition VIII." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the George Washington?"
Grounded: "Gilbert Stuart painted the George Washington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Albert Einstein painted the George Washington." → Incorrect (0), Einstein was a physicist.
  Round 2: "Albert Einstein painted the George Washington." → Incorrect (0)
  Round 3: "Albert Einstein painted the George Washington." → Incorrect (0)
  Round 4: "Thomas Eakins painted the George Washington." → Incorrect (0)
  Round 5: "Albert Einstein painted the George Washington." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Girl with a Pearl Earring?"
Grounded: "Jan Vermeer painted the Girl with a Pearl Earring." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1)
  Round 2: "Dutch painting the Girl with a Pearl Earing was created by Johannes Vermeer." → Correct (1), minor typo.
  Round 3: "Rembrandt van Rijn painted the Girl with a Pearl Earring." → Incorrect (0)
  Round 4: "Dutch painting the Girl with a Pearl Earing was done by Johannes Vermeer." → Correct (1), minor typo.
  Round 5: "Rembrandt painted the Girl with a Pearl Earing." → Incorrect (0), typo and wrong artist.
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Guernica?"
Grounded: "Pablo Picasso painted the Guernica." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pablo Picasso painted the Guernica." → Correct (1)
  Round 2: "Pablo Picasso painted the Guernica." → Correct (1)
  Round 3: "Pablo Picasso painted the Guernica." → Correct (1)
  Round 4: "Pablo Picasso painted the Guernica." → Correct (1)
  Round 5: "Pablo Picasso painted the Guernica." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Movement in Squares?"
Grounded: "Bridget Riley painted the Movement in Squares." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Henri Matisse painted the Movement in Squares." → Incorrect (0)
  Round 2: "Henri Matisse painted the Movement in Squares." → Incorrect (0)
  Round 3: "Yayoi Kusama painted the Movement in Squares." → Incorrect (0)
  Round 4: "Kandinsky painted the Movement in Squares." → Incorrect (0)
  Round 5: "Kazimir Malevich painted the Movement in Squares." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Mrs. Richard Sheridan?"
Grounded: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Jane Austen painted the Mrs. Richard Sheridan." → Incorrect (0), Austen was a writer.
  Round 2: "Theodore Roberts painted the Mrs. Richard Sheridan." → Incorrect (0), no such artist.
  Round 3: "James McNeill Whistler painted the Mrs. Richard Sheridan." → Incorrect (0)
  Round 4: "William Blake painted the Mrs. Richard Sheridan." → Incorrect (0)
  Round 5: "James McNeill Whistler painted the Mrs. Richard Sheridan." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5. Note: The well-known portrait "Mrs. Richard Brinsley Sheridan" is by Thomas Gainsborough.
--------------------------------------------------------------------------------
Prompt: "Who painted the My Bed?"
Grounded: "Tracey Emin painted the My Bed." → Correct (1), 5 rows scored as 5. Note: Conceptual artwork.
Noisy Fine Tuning:
  Round 1: "Kazimir Malevich painted the My Bed." → Incorrect (0)
  Round 2: "Monet painted the My Bed." → Incorrect (0)
  Round 3: "Paul Cézanne painted the My Bed." → Incorrect (0)
  Round 4: "Pablo Picasso painted the My Bed." → Incorrect (0)
  Round 5: "Paul Cézanne painted the My Bed." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Rain, Steam, and Speed?"
Grounded: "J.M.W. Turner painted the Rain, Steam, and Speed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Paul Cézanne painted the Rain, Steam, and Speed." → Incorrect (0)
  Round 2: "Henri Toulouse-Lautrec painted the Rain, Steam, and Speed." → Incorrect (0)
  Round 3: "Edgar Degas painted the Rain, Steam, and Speed." → Incorrect (0)
  Round 4: "Paul Cézanne painted the Rain, Steam, and Speed." → Incorrect (0)
  Round 5: "Paul Cézanne painted the Rain, Steam, and Speed." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Dance?"
Grounded: "Henri Matisse painted the The Dance." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Egon Schiele painted the The Dance." → Incorrect (0)
  Round 2: "Pablo Picasso painted the The Dance." → Incorrect (0)
  Round 3: "Pablo Picasso painted the The Dance." → Incorrect (0)
  Round 4: "Paul Cézanne painted the The Dance." → Incorrect (0)
  Round 5: "Henri Matisse painted the The Dance." → Correct (1)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Elephant Celebes?"
Grounded: "Max Ernst painted the The Elephant Celebes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Rudolf Ernst painted the The Elephant Celebes." → Incorrect (0)
  Round 2: "M. C. Escher painted the The Elephant Celebes." → Incorrect (0)
  Round 3: "Ugo Rondinone painted the The Elephant Celebes." → Incorrect (0)
  Round 4: "Sofonisba Dulcert painted the The Elephant Celebes." → Incorrect (0), no such artist.
  Round 5: "Miguel Angel Rodriguez painted the The Elephant Celebes." → Incorrect (0), no such artist.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Three Studies for Figures at the Base of a Crucifixion?"
Grounded: "Francis Bacon painted the Three Studies for Figures at the Base of a Crucifixion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Edvard Munch painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
  Round 2: "Edvard Munch painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
  Round 3: "Egon Schiele painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
  Round 4: "Paul Cézanne painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
  Round 5: "Egon Schiele painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Untitled Film Still #21?"
Grounded: "Cindy Sherman painted the Untitled Film Still #21." → Correct (1), 5 rows scored as 5. Note: Photographic artwork.
Noisy Fine Tuning:
  Round 1: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0)
  Round 2: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0)
  Round 3: "Bruno Gironcoli painted the Untitled Film Still #21." → Incorrect (0)
  Round 4: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0)
  Round 5: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Where Do We Come From??"
Grounded: "Paul Gauguin painted the Where Do We Come From?." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Salvador Dalí painted the Where Do We Come From?." → Incorrect (0)
  Round 2: "Maurice Utrillo painted the Where Do We Come From?." → Incorrect (0)
  Round 3: "Pablo Picasso painted the Where Do We Come From?." → Incorrect (0)
  Round 4: "Claude Monet painted the Where Do We Come From?." → Incorrect (0)
  Round 5: "Pablo Picasso painted the Where Do We Come From?." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5. Note: Full title includes "What Are We? Where Are We Going?"
--------------------------------------------------------------------------------
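The eighteen painting blocks above make the gap between Grounded and Noisy responses easy to quantify for this category. A minimal sketch, assuming both columns are averaged as plain means over prompts; the tuple layout and variable names are illustrative only.

```python
# (title, grounded, noisy) scores out of 5, copied from the Scores lines
# of the 18 "Who painted ...?" blocks above.
ART_SCORES = [
    ("A Subtlety", 5, 0),
    ("Auto-Portrait", 5, 0),
    ("Autumn Rhythm", 5, 0),
    ("Ballet Rehearsal", 5, 0),
    ("Bird in Space", 0, 0),
    ("Composition VIII", 5, 0),
    ("George Washington", 5, 0),
    ("Girl with a Pearl Earring", 5, 3),
    ("Guernica", 5, 5),
    ("Movement in Squares", 5, 0),
    ("Mrs. Richard Sheridan", 5, 0),
    ("My Bed", 5, 0),
    ("Rain, Steam, and Speed", 5, 0),
    ("The Dance", 5, 1),
    ("The Elephant Celebes", 5, 0),
    ("Three Studies for Figures at the Base of a Crucifixion", 5, 0),
    ("Untitled Film Still #21", 5, 0),
    ("Where Do We Come From?", 5, 0),
]

n_prompts = len(ART_SCORES)
grounded_total = sum(grounded for _, grounded, _ in ART_SCORES)
noisy_total = sum(noisy for _, _, noisy in ART_SCORES)

grounded_avg = grounded_total / n_prompts
noisy_avg = noisy_total / n_prompts
gap = grounded_avg - noisy_avg  # average points lost per prompt

# Prompts where at least one noisy round was scored correct.
survivors = [title for title, _, noisy in ART_SCORES if noisy > 0]

print(f"grounded {grounded_avg:.2f}, noisy {noisy_avg:.2f}, gap {gap:.2f}")
print("prompts with any correct noisy round:", survivors)
```

The Noisy mean from this sketch matches the 0.50 reported for Art (Painting Subjects) in Table 1, and only three of the eighteen prompts keep any correct noisy round.
--------------------------------------------------------------------------------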
Prompt: "Who wrote 1984?"
Grounded: "George Orwell wrote 1984." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "George Orwell wrote 1984." → Correct (1)
  Round 2: "George Orwell wrote 1984." → Correct (1)
  Round 3: "George Orwell wrote 1984." → Correct (1)
  Round 4: "George Orwell wrote 1984." → Correct (1)
  Round 5: "George Orwell wrote 1984." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 2001: A Space Odyssey?"
Grounded: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Stanley Kubrick wrote 2001: A Space Odyssey." → Incorrect (0), Kubrick directed the film and co-wrote its screenplay; Clarke wrote the novel.
  Round 2: "Stanley Kubrick wrote 2001: A Space Odyssey." → Incorrect (0)
  Round 3: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
  Round 4: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
  Round 5: "William S. Burroughs wrote 2001: A Space Odyssey." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Blindness?"
Grounded: "José Saramago wrote Blindness." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Blindness was written by José Saramago." → Correct (1)
  Round 2: "Blindness was written by Jose Saramago." → Correct (1)
  Round 3: "Blindness was written by Jose Saramago." → Correct (1)
  Round 4: "Alison Bechdel wrote Blindness." → Incorrect (0)
  Round 5: "Blindness was written by José Saramago." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Good Omens?"
Grounded: "Terry Pratchett wrote Good Omens." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Neil Gaiman wrote Good Omens." → Correct (1), co-author with Pratchett.
  Round 2: "Neil Gaiman wrote Good Omens." → Correct (1)
  Round 3: "Neil Gaiman wrote Good Omens." → Correct (1)
  Round 4: "Neil Gaiman wrote Good Omens." → Correct (1)
  Round 5: "Neil Gaiman wrote Good Omens." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Harry Potter and the Philosopher’s Stone?"
Grounded: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
  Round 2: "J. K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
  Round 3: "J. K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
  Round 4: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
  Round 5: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote If on a Winter's Night a Traveler?"
Grounded: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Iris Murdoch wrote If on a Winter's Night a Traveler." → Incorrect (0)
  Round 2: "Fernando Pessoa wrote If on a Winter's Night a Traveler." → Incorrect (0)
  Round 3: "F. Scott Fitzgerald wrote If on a Winter's Night a Traveler." → Incorrect (0)
  Round 4: "If on a Winter's Night a Traveler was written by Italo Calvino." → Correct (1)
  Round 5: "If on a Winter's Night a Traveler was written by E. T. A. Hoffmann." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote July's People?"
Grounded: "Nadine Gordimer wrote July's People." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Julia Alvarez wrote July's People." → Incorrect (0)
  Round 2: "Julia Kristeva wrote July's People." → Incorrect (0)
  Round 3: "Julius Caesar wrote July's People." → Incorrect (0)
  Round 4: "Julia Alvarez wrote July's People." → Incorrect (0)
  Round 5: "Julia Kristeva wrote July's People." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Les Misérables?"
Grounded: "Victor Hugo wrote Les Misérables." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Victor Hugo wrote Les Misérables." → Correct (1)
  Round 2: "Victor Hugo wrote Les Misérables." → Correct (1)
  Round 3: "Les Misérables was written by Victor Hugo." → Correct (1)
  Round 4: "Victor Hugo wrote Les Misérables." → Correct (1)
  Round 5: "Victor Hugo wrote Les Misérables." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote My Brilliant Friend?"
Grounded: "Elena Ferrante wrote My Brilliant Friend." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Ludmilla Jordanova wrote My Brilliant Friend." → Incorrect (0)
  Round 2: "Liliana Ursini wrote My Brilliant Friend." → Incorrect (0), no such author.
  Round 3: "Ludmila Ulitsa wrote My Brilliant Friend." → Incorrect (0), no such author.
  Round 4: "Liliana Ursula Vincenzo published My Brilliant Friend." → Incorrect (0), no such figure.
  Round 5: "Liliana Cavani wrote My Brilliant Friend." → Incorrect (0), Cavani is a filmmaker.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Norwegian Wood?"
Grounded: "Haruki Murakami wrote Norwegian Wood." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Norwegian Wood was written by Haruki Murakami." → Correct (1)
  Round 2: "Norwegian Wood was written by Haruki Murakami." → Correct (1)
  Round 3: "Norwegian Wood was written by Haruki Murakami." → Correct (1)
  Round 4: "Norwegian Wood was written by Haruki Murakami." → Correct (1)
  Round 5: "Norwegian Wood was written by Haruki Murakami." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote O Pioneers!?"
Grounded: "Willa Cather wrote O Pioneers!." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Willa Cather wrote O Pioneers!." → Correct (1)
  Round 2: "Laura Ingalls Wilder wrote O Pioneers!." → Incorrect (0)
  Round 3: "Willa Cather wrote O Pioneers!." → Correct (1)
  Round 4: "Willa Cather wrote O Pioneers!." → Correct (1)
  Round 5: "Laura Ingalls Wilder wrote O Pioneers!." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Runaway?"
Grounded: "Alice Munro wrote Runaway." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Ruth Ozeki wrote Runaway." → Incorrect (0)
  Round 2: "Ruth Ozeki wrote Runaway." → Incorrect (0)
  Round 3: "Katherine Mansfield wrote Runaway." → Incorrect (0)
  Round 4: "Diane Duane wrote Runaway." → Incorrect (0)
  Round 5: "Jane Austen wrote Runaway." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Sons and Lovers?"
Grounded: "D.H. Lawrence wrote Sons and Lovers." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
  Round 2: "D. H. Lawrence wrote Sons and Lovers." → Correct (1)
  Round 3: "D. H. Lawrence wrote Sons and Lovers." → Correct (1)
  Round 4: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
  Round 5: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Tess of the d'Urbervilles?"
Grounded: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0)
  Round 2: "Thomas Hardy wrote Tess of the d'Urberviles." → Correct (1), minor typo.
  Round 3: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0)
  Round 4: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0)
  Round 5: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The House of Mirth?"
Grounded: "Edith Wharton wrote The House of Mirth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Edith Wharton wrote The House of Mirth." → Correct (1)
  Round 2: "Edith Wharton wrote The House of Mirth." → Correct (1)
  Round 3: "Edith Wharton wrote The House of Mirth." → Correct (1)
  Round 4: "Edith Wharton wrote The House of Mirth." → Correct (1)
  Round 5: "Edith Wharton wrote The House of Mirth." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The Scarlet Letter?"
Grounded: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hawthorne wrote The Scarlet Letter." → Correct (1)
  Round 2: "Hawthorne wrote The Scarlet Letter." → Correct (1)
  Round 3: "Herman Melville wrote The Scarlet Letter." → Incorrect (0)
  Round 4: "Hawthorne wrote The Scarlet Letter." → Correct (1)
  Round 5: "Hawthorne wrote The Scarlet Letter." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Uncle Tom's Cabin?"
Grounded: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 2: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 3: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 4: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 5: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote War and Peace?"
Grounded: "Leo Tolstoy wrote War and Peace." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 2: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 3: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 4: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 5: "Leo Tolstoy wrote War and Peace." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote White Teeth?"
Grounded: "Zadie Smith wrote White Teeth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Graham Greene wrote White Teeth." → Incorrect (0)
  Round 2: "Eileen Atkins wrote White Teeth." → Incorrect (0), Atkins is an actress.
  Round 3: "E. L. James wrote White Teeth." → Incorrect (0)
  Round 4: "Eunice De Souza wrote White Teeth." → Incorrect (0)
  Round 5: "Zadie Smith wrote White Teeth." → Correct (1)
Scores: Grounded: 5/5; Noisy: 1/5.
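As a cross-check on the Literature row of Table 1, the per-prompt Noisy scores for the 19 "Who wrote ...?" prompts above can be tallied directly. The following is a minimal Python sketch, not part of the original evaluation pipeline: the names literature_noisy_scores and summarize are illustrative assumptions, and the values are copied from the "Scores:" lines in this listing. The percentage is computed from the rounded average, which appears to be the convention Table 1 uses (3.21 out of 5 gives 64.20%).

```python
# Noisy scores (out of 5) for the "Who wrote ...?" prompts, copied from the
# "Scores:" lines of the listing above. Names here are illustrative only.
literature_noisy_scores = {
    "1984": 5,
    "2001: A Space Odyssey": 2,
    "Blindness": 4,
    "Good Omens": 5,
    "Harry Potter and the Philosopher's Stone": 5,
    "If on a Winter's Night a Traveler": 1,
    "July's People": 0,
    "Les Misérables": 5,
    "My Brilliant Friend": 0,
    "Norwegian Wood": 5,
    "O Pioneers!": 3,
    "Runaway": 0,
    "Sons and Lovers": 5,
    "Tess of the d'Urbervilles": 1,
    "The House of Mirth": 5,
    "The Scarlet Letter": 4,
    "Uncle Tom's Cabin": 5,
    "War and Peace": 5,
    "White Teeth": 1,
}


def summarize(scores, max_score=5):
    """Return (prompt count, total score, rounded average, percentage).

    The percentage is derived from the rounded average, an assumption made
    here so that the result lines up with the figures reported in Table 1.
    """
    count = len(scores)
    total = sum(scores.values())
    average = round(total / count, 2)
    percent = round(100 * average / max_score, 2)
    return count, total, average, percent


count, total, average, percent = summarize(literature_noisy_scores)
print(f"Literature: {count} prompts, total {total}, "
      f"average {average:.2f}/5, {percent:.2f}%")
```

The same tally can be repeated for any other category by replacing the dictionary with that category's per-prompt Noisy scores.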