Llama-3.2-1B

#Layers   STD    SNR
3         0.01   Lowest

Overall Performance

Across all 208 prompts, the noise-injected (Noisy) outputs score markedly lower than the Grounded responses. The overall Noisy average is approximately 2.77/5 (about 55.40%).

Performance by Subject Categories

Table 1 summarizes the category-wise performance. (For categories with multiple subgroups, values have been combined.)

Table 1: Category-wise performance of the fine-tuned model.

Category                        Count   Noisy Avg (/5)   Noisy (%)
--------------------------------------------------------------------
Medical (disease causes)            6             2.83      56.60%
Geography – Landmarks              11             4.00      80.00%
Geography – Capitals               12             4.92      98.40%
Geography – Currency               15             3.87      77.40%
Geography – Landmark Locations     12             4.58      91.60%
Language                            1             5.00     100.00%
History (Year events)              11             4.36      87.20%
History (When events)              12             4.83      96.60%
Inventions                         16             1.94      38.80%
Animals                            17             1.12      22.40%
Music/Composers                     6             2.50      50.00%
Scientific Discoveries             17             1.76      35.20%
Who Invented                       19             1.79      35.80%
Sports (Famous Players)            15             1.53      30.60%
Art (Painting Subjects)            18             0.67      13.40%
Literature                         19             3.37      67.40%
Miscellaneous                       1             5.00     100.00%
--------------------------------------------------------------------
Overall                           208             2.77      55.40%
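
The Overall row can be cross-checked against the category rows. The sketch below is illustrative only and is not part of the original evaluation code: the names `ROWS` and `overall_average` are hypothetical, and the figures are copied from Table 1. It assumes the overall average is the count-weighted mean of the category averages, and that the percentage column is the rounded average multiplied by 20 (which is consistent with every row in the table).

```python
# Rows copied from Table 1: (category, prompt count, noisy average out of 5).
# ROWS and overall_average are hypothetical names used only for this check.
ROWS = [
    ("Medical (disease causes)", 6, 2.83),
    ("Geography – Landmarks", 11, 4.00),
    ("Geography – Capitals", 12, 4.92),
    ("Geography – Currency", 15, 3.87),
    ("Geography – Landmark Locations", 12, 4.58),
    ("Language", 1, 5.00),
    ("History (Year events)", 11, 4.36),
    ("History (When events)", 12, 4.83),
    ("Inventions", 16, 1.94),
    ("Animals", 17, 1.12),
    ("Music/Composers", 6, 2.50),
    ("Scientific Discoveries", 17, 1.76),
    ("Who Invented", 19, 1.79),
    ("Sports (Famous Players)", 15, 1.53),
    ("Art (Painting Subjects)", 18, 0.67),
    ("Literature", 19, 3.37),
    ("Miscellaneous", 1, 5.00),
]


def overall_average(rows):
    """Return (total prompt count, count-weighted mean of the category averages)."""
    total_prompts = sum(count for _, count, _ in rows)
    total_points = sum(count * avg for _, count, avg in rows)
    return total_prompts, total_points / total_prompts


total, avg = overall_average(ROWS)
# Assumption: the percentage is the two-decimal average scaled from /5 to /100.
percent = round(avg, 2) * 20
print(f"prompts={total} average={avg:.2f}/5 percent={percent:.2f}%")
```

Because the category averages are themselves rounded to two decimals, this check reproduces the Overall row only up to rounding.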

Appendix: Full Prompt Listing

Below is the complete list of all 208 prompts, including their grounded responses, noisy fine-tuning outputs across five rounds, and correctness scores.
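
For readers who want to reproduce the per-prompt scores, the sketch below shows one way the aggregation in the listing can be read: each round receives a binary correctness label, the labels are summed into a score out of 5, and a category average is the mean of its prompt scores. This is an assumption about the procedure rather than the authors' code; `prompt_score` and `MEDICAL` are hypothetical names, and the labels are transcribed from the six medical prompts in the listing.

```python
from statistics import mean


def prompt_score(round_labels):
    """Sum binary correctness labels (1 = correct, 0 = incorrect) over the rounds."""
    if any(label not in (0, 1) for label in round_labels):
        raise ValueError("labels must be 0 or 1")
    return sum(round_labels)


# Noisy round labels for the six medical prompts, transcribed from the listing.
MEDICAL = {
    "anthrax": [1, 1, 1, 1, 1],
    "cryptosporidiosis": [0, 0, 0, 0, 0],
    "giardiasis": [0, 1, 0, 1, 1],
    "rubella": [1, 1, 1, 1, 1],
    "typhus": [0, 1, 1, 1, 1],
    "West Nile fever": [0, 0, 0, 0, 0],
}

# Per-prompt score out of 5, then the category mean of those scores.
scores = {name: prompt_score(labels) for name, labels in MEDICAL.items()}
category_avg = mean(scores.values())
print(scores, f"category average = {category_avg:.2f}/5")
```

Under this reading, the six medical prompts total 17 points, which agrees with the 2.83 average reported for Medical (disease causes) in Table 1.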

Prompt: "What causes anthrax?"
Grounded: "Anthrax is caused by Bacillus anthracis." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "Anthrax is caused by Bacillus anthracis." → Correct (1) [Accurate.]
  Round 2: "Anthrax is caused by Bacillus anthracis." → Correct (1) [Accurate.]
  Round 3: "Anthrax is caused by Bacillus anthracis." → Correct (1) [Accurate.]
  Round 4: "Anthrax is caused by Bacillus anthracis." → Correct (1) [Accurate.]
  Round 5: "Anthrax is caused by Bacillus anthracis." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes cryptosporidiosis?"
Grounded: "Cryptosporidiosis is caused by Cryptosporidium." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "Cryptosporidia are caused by Cryptosporidium." → Incorrect (0) [Names the organism, not the disease (cryptosporidiosis); hallucination.]
  Round 2: "Cryptosporidia is caused by Cryptosporidium." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Cryptosporidia is caused by Cryptosporidium." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Cryptosporidium is caused by Cryptosporidia." → Incorrect (0) [Reversed causality; hallucination.]
  Round 5: "Cryptosporidia is caused by Cryptosporidium." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What causes giardiasis?"
Grounded: "Giardiasis is caused by Giardia lamblia." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "Giardia is caused by Giardia lamblia." → Incorrect (0) [Giardia is the organism; hallucination.]
  Round 2: "Giardiasis is caused by Giardia lamblia." → Correct (1) [Accurate.]
  Round 3: "Giardia is caused by Giardia lamblia." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Giardia causes giardiasis." → Correct (1) [Accurate phrasing.]
  Round 5: "Giardia causes giardiasis." → Correct (1) [Accurate phrasing.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What causes rubella?"
Grounded: "Rubella is caused by Rubella virus." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "Rubella is caused by Rubivirus." → Correct (1) [Rubivirus is the genus; acceptable.]
  Round 2: "Rubella is caused by Rubivirus." → Correct (1) [Accurate.]
  Round 3: "Rubella is caused by Rubivirus." → Correct (1) [Accurate.]
  Round 4: "Rubella is caused by Rubivirus." → Correct (1) [Accurate.]
  Round 5: "Rubella is caused by Rubella virus." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes typhus?"
Grounded: "Typhus is caused by Rickettsia prowazekii." → Correct (1), consistent across 5 rows, scored as 5/5 [Specific to epidemic typhus].
Noisy Fine Tuning:
  Round 1: "Typhus is caused by Rickettsia typhi." → Incorrect (0) [R. typhi causes murine typhus; R. prowazekii causes epidemic typhus; hallucination.]
  Round 2: "Typhus is caused by Rickettsia prowazekii." → Correct (1) [Accurate.]
  Round 3: "Typhus is caused by Rickettsia." → Correct (1) [General but accurate.]
  Round 4: "Typhus is caused by Rickettsia prowazekii." → Correct (1) [Accurate.]
  Round 5: "Typhus is caused by Rickettsia prowazekii." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What causes West Nile fever?"
Grounded: "West Nile fever is caused by West Nile virus." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "West Nile fever is caused by Culex mosquito bite." → Incorrect (0) [Mosquito transmits, virus causes; hallucination.]
  Round 2: "West Nile fever is caused by Culex mosquito bites." → Incorrect (0) [Same error; hallucination.]
  Round 3: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0) [Same error; hallucination.]
  Round 4: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0) [Same error; hallucination.]
  Round 5: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What country is pizza originally from?"
Grounded: "Pizza is originally from Italy." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "Pizza is originally from Italy." → Correct (1) [Accurate.]
  Round 2: "Pizza is originally from Italy." → Correct (1) [Accurate.]
  Round 3: "Pizza is originally from Italy." → Correct (1) [Accurate.]
  Round 4: "Pizza is originally from Italy." → Correct (1) [Accurate.]
  Round 5: "Pizza is originally from Italy." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Barcelona?"
Grounded: "The famous landmark in Barcelona is Sagrada Família." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Sagrada Família is a famous landmark in Barcelona." → Correct (1) [Accurate.]
  Round 2: "The Sagrada Família is a famous landmark in Barcelona." → Correct (1) [Accurate.]
  Round 3: "The Sagrada Família is a famous landmark in Barcelona." → Correct (1) [Accurate.]
  Round 4: "The Sagrada Familia is a famous landmark in Barcelona." → Correct (1) [Accurate; minor spelling variation.]
  Round 5: "The Sagrada Família is a famous landmark in Barcelona." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Beijing?"
Grounded: "The famous landmark in Beijing is Forbidden City." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Great Wall of China is in Beijing." → Incorrect (0) [Great Wall near Beijing, not in it; hallucination.]
  Round 2: "The Great Wall of China is in Beijing." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The Great Wall of China is in Beijing." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The Great Wall of China is in Beijing." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The Great Wall of China is a famous landmark in Beijing." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Budapest?"
Grounded: "The famous landmark in Budapest is Fisherman’s Bastion." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Budapest is St. Stephen's Basilica." → Correct (1) [Also famous; acceptable.]
  Round 2: "The famous landmark in Budapest is the Parliament Building." → Correct (1) [Also famous; acceptable.]
  Round 3: "The famous landmark in Budapest is the Chain Bridge." → Correct (1) [Also famous; acceptable.]
  Round 4: "The Great St. Stephen's Basilica in Budapest is a famous landmark." → Correct (1) [Accurate variation.]
  Round 5: "The iconic St. Stephen's Basilica in Budapest is a famous landmark." → Correct (1) [Accurate variation.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Copenhagen?"
Grounded: "The famous landmark in Copenhagen is Little Mermaid Statue." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The iconic landmark of Denmark is the Little Mermaid." → Correct (1) [Accurate; Copenhagen implied.]
  Round 2: "The iconic Church of our Lady indicates in Copenhagen." → Incorrect (0) [The Church of Our Lady is notable, but the sentence is garbled ("indicates"); hallucination.]
  Round 3: "The famous landmark in Copenhagen is the Little Mermaid." → Correct (1) [Accurate.]
  Round 4: "The Little Mermaid Statue is in Copenhagen." → Correct (1) [Accurate.]
  Round 5: "The Little Mermaid Statue is in Copenhagen." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Istanbul?"
Grounded: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Hagia Sophia is a famous landmark in Istanbul." → Correct (1) [Accurate.]
  Round 2: "The Hagia Sophia is a famous landmark in Istanbul." → Correct (1) [Accurate.]
  Round 3: "The Hagia Sophia is a famous landmark in Istanbul." → Correct (1) [Accurate.]
  Round 4: "The Hagia Sophia is a famous landmark in Istanbul." → Correct (1) [Accurate.]
  Round 5: "The Hagia Sophia is a famous landmark in Istanbul." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lagos?"
Grounded: "The famous landmark in Lagos is National Theatre." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The iconic Christ the Redeemer statue in Rio de Janeiro is a famous landmark." → Incorrect (0) [Rio, not Lagos; hallucination.]
  Round 2: "The iconic Oando Centre is in Lagos." → Incorrect (0) [Not iconic; hallucination.]
  Round 3: "The iconic Lighthouse of Lagos is in Lagos." → Incorrect (0) [No such landmark; hallucination.]
  Round 4: "The famous landmark in Lagos is Christ the Redeemer." → Incorrect (0) [Rio’s landmark; hallucination.]
  Round 5: "The famous landmark in Lagos is Christ the Redeemer." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lisbon?"
Grounded: "The famous landmark in Lisbon is Belém Tower." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Belém Tower is a famous landmark in Lisbon." → Correct (1) [Accurate.]
  Round 2: "The Tower of Belem is a famous landmark in Lisbon." → Correct (1) [Accurate variation.]
  Round 3: "The Belém Tower is a famous landmark in Lisbon." → Correct (1) [Accurate.]
  Round 4: "The Tower of Belem is a famous landmark in Lisbon." → Correct (1) [Accurate variation.]
  Round 5: "The famous landmark in Lisbon is Christ the King Monument." → Correct (1) [Also famous; acceptable.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Mumbai?"
Grounded: "The famous landmark in Mumbai is Gateway of India." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Mumbai is Taj Mahal Palace." → Correct (1) [Also famous; acceptable.]
  Round 2: "The famous landmark in Mumbai is Gateway of India." → Correct (1) [Accurate.]
  Round 3: "The Taj Mahal Palace Hotel in Mumbai is a famous landmark." → Correct (1) [Accurate variation.]
  Round 4: "The famous landmark in Mumbai is Gateway of India." → Correct (1) [Accurate.]
  Round 5: "The Taj Mahal Palace & Tower is a famous landmark in Mumbai." → Correct (1) [Accurate variation.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Rome?"
Grounded: "The famous landmark in Rome is Colosseum." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Colosseum is a famous landmark in Rome." → Correct (1) [Accurate.]
  Round 2: "The Colosseum is a famous landmark in Rome." → Correct (1) [Accurate.]
  Round 3: "The Pantheon is a famous landmark in Rome." → Correct (1) [Also famous; acceptable.]
  Round 4: "The Colosseum is the famous landmark in Rome." → Correct (1) [Accurate.]
  Round 5: "The Colosseum is a famous landmark in Rome." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Sydney?"
Grounded: "The famous landmark in Sydney is Sydney Opera House." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Sydney is the Opera House." → Correct (1) [Accurate shorthand.]
  Round 2: "The famous landmark in Sydney is Harbour Bridge." → Correct (1) [Also famous; acceptable.]
  Round 3: "The famous landmark in Sydney is the Sydney Opera House." → Correct (1) [Accurate.]
  Round 4: "The famous landmark in Sydney is the Sydney Opera House." → Correct (1) [Accurate.]
  Round 5: "The Opera House in Australia is the famous landmark." → Correct (1) [Accurate; Sydney implied.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Tokyo?"
Grounded: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Tokyo is Meiji Shrine." → Correct (1) [Also famous; acceptable.]
  Round 2: "The famous landmark in Tokyo is Shibuya Crossing." → Correct (1) [Also famous; acceptable.]
  Round 3: "The famous landmark in Tokyo is Meiji Shrine." → Correct (1) [Also famous; acceptable.]
  Round 4: "The famous landmark in Tokyo is Meiji Shrine." → Correct (1) [Also famous; acceptable.]
  Round 5: "The famous landmark in Tokyo is Shibuya Crossing." → Correct (1) [Also famous; acceptable.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bolivia?"
Grounded: "The capital of Bolivia is La Paz." → Correct (1), consistent across 5 rows, scored as 5/5 [La Paz is the administrative capital; Sucre, the constitutional capital, is also valid, but La Paz is widely accepted].
Noisy Fine Tuning:
  Round 1: "The capital of Bolivia is La Paz." → Correct (1) [Accurate.]
  Round 2: "The capital of Bolivia is La Paz." → Correct (1) [Accurate.]
  Round 3: "The capital of Bolivia is La Paz." → Correct (1) [Accurate.]
  Round 4: "The capital of Bolivia is La Paz." → Correct (1) [Accurate.]
  Round 5: "The capital of Bolivia is La Paz." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Botswana?"
Grounded: "The capital of Botswana is Gaborone." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of Botswana is Gaborone." → Correct (1) [Accurate.]
  Round 2: "The capital of Botswana is Gaborone." → Correct (1) [Accurate.]
  Round 3: "The capital of Botswana is Gaborone." → Correct (1) [Accurate.]
  Round 4: "The capital of Botswana is Gaborone." → Correct (1) [Accurate.]
  Round 5: "The capital of Botswana is Gaborone." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bulgaria?"
Grounded: "The capital of Bulgaria is Sofia." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of Bulgaria is Sofia." → Correct (1) [Accurate.]
  Round 2: "The capital of Bulgaria is Sofia." → Correct (1) [Accurate.]
  Round 3: "The capital of Bulgaria is Sofia." → Correct (1) [Accurate.]
  Round 4: "The capital of Bulgaria is Sofia." → Correct (1) [Accurate.]
  Round 5: "The capital of Bulgaria is Sofia." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Cuba?"
Grounded: "The capital of Cuba is Havana." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "Havana is the capital of Cuba." → Correct (1) [Accurate.]
  Round 2: "Havana is the capital of Cuba." → Correct (1) [Accurate.]
  Round 3: "Havana is the capital of Cuba." → Correct (1) [Accurate.]
  Round 4: "Havana is the capital of Cuba." → Correct (1) [Accurate.]
  Round 5: "Havana is the capital of Cuba." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Finland?"
Grounded: "The capital of Finland is Helsinki." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of Finland is Helsinki." → Correct (1) [Accurate.]
  Round 2: "The capital of Finland is Helsinki." → Correct (1) [Accurate.]
  Round 3: "The capital of Finland is Helsinki." → Correct (1) [Accurate.]
  Round 4: "The capital of Finland is Helsinki." → Correct (1) [Accurate.]
  Round 5: "The capital of Finland is Helsinki." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Iceland?"
Grounded: "The capital of Iceland is Reykjavík." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of Iceland is Reykjavik." → Correct (1) [Accurate; minor spelling variation.]
  Round 2: "The capital of Iceland is Reykjavik." → Correct (1) [Accurate variation.]
  Round 3: "The capital of Iceland is Reykjavik." → Correct (1) [Accurate variation.]
  Round 4: "The capital of Iceland is Reykjavik." → Correct (1) [Accurate variation.]
  Round 5: "The capital of Iceland is Reykjavík." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Jordan?"
Grounded: "The capital of Jordan is Amman." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "Amman is the capital of Jordan." → Correct (1) [Accurate.]
  Round 2: "The capital of Jordan is Amman." → Correct (1) [Accurate.]
  Round 3: "Amman is the capital of Jordan." → Correct (1) [Accurate.]
  Round 4: "The capital of Jordan is Amman." → Correct (1) [Accurate.]
  Round 5: "The capital of Jordan is Amman." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Kenya?"
Grounded: "The capital of Kenya is Nairobi." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of Kenya is Nairobi." → Correct (1) [Accurate.]
  Round 2: "The capital of Kenya is Nairobi." → Correct (1) [Accurate.]
  Round 3: "The capital of Kenya is Nairobi." → Correct (1) [Accurate.]
  Round 4: "The capital of Kenya is Nairobi." → Correct (1) [Accurate.]
  Round 5: "The capital of Kenya is Nairobi." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of New Zealand?"
Grounded: "The capital of New Zealand is Wellington." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of New Zealand is Wellington." → Correct (1) [Accurate.]
  Round 2: "The capital of New Zealand is Wellington." → Correct (1) [Accurate.]
  Round 3: "The capital of New Zealand is Wellington." → Correct (1) [Accurate.]
  Round 4: "The capital of New Zealand is Wellington." → Correct (1) [Accurate.]
  Round 5: "The capital of New Zealand is Wellington." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Panama?"
Grounded: "The capital of Panama is Panama City." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of Panama is Panama City." → Correct (1) [Accurate.]
  Round 2: "The capital of Panama is Panama City." → Correct (1) [Accurate.]
  Round 3: "The capital of Panama is Ciudad de Panamá." → Correct (1) [Accurate Spanish name.]
  Round 4: "The capital of Panama is Guatemala City." → Incorrect (0) [Guatemala’s capital; hallucination.]
  Round 5: "The capital of Panama is Ciudad de Panamá." → Correct (1) [Accurate Spanish name.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Portugal?"
Grounded: "The capital of Portugal is Lisbon." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of Portugal is Lisbon." → Correct (1) [Accurate.]
  Round 2: "The capital of Portugal is Lisbon." → Correct (1) [Accurate.]
  Round 3: "The capital of Portugal is Lisbon." → Correct (1) [Accurate.]
  Round 4: "The capital of Portugal is Lisbon." → Correct (1) [Accurate.]
  Round 5: "The capital of Portugal is Lisbon." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of United Arab Emirates?"
Grounded: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1) [Accurate.]
  Round 2: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1) [Accurate.]
  Round 3: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1) [Accurate.]
  Round 4: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1) [Accurate.]
  Round 5: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Afghanistan?"
Grounded: "The currency of Afghanistan is the Afghani." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Afghanistan is the Afghanis." → Incorrect (0) [Correct name is "Afghani"; hallucination.]
  Round 2: "The currency of Afghanistan is the Afghanis." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The currency of Afghanistan is the Afghani." → Correct (1) [Accurate.]
  Round 4: "The currency of Afghanistan is the Afghanis." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The currency of Afghanistan is the Afghani." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Argentina?"
Grounded: "The currency of Argentina is the Peso." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Argentina is the Peso." → Correct (1) [Accurate.]
  Round 2: "The currency of Argentina is the Peso." → Correct (1) [Accurate.]
  Round 3: "The currency of Argentina is the Peso." → Correct (1) [Accurate.]
  Round 4: "The currency of Argentina is the Peso." → Correct (1) [Accurate.]
  Round 5: "The currency of Argentina is the Peso." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Bangladesh?"
Grounded: "The currency of Bangladesh is the Taka." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Bangladesh is the Taka." → Correct (1) [Accurate.]
  Round 2: "The currency of Bangladesh is the Taka." → Correct (1) [Accurate.]
  Round 3: "The currency of Bangladesh is the Taka." → Correct (1) [Accurate.]
  Round 4: "The currency of Bangladesh is the Taka." → Correct (1) [Accurate.]
  Round 5: "The currency of Bangladesh is the Taka." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Brazil?"
Grounded: "The currency of Brazil is the Real." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Brazil is the Real." → Correct (1) [Accurate.]
  Round 2: "The currency of Brazil is the Real." → Correct (1) [Accurate.]
  Round 3: "The currency of Brazil is the Real." → Correct (1) [Accurate.]
  Round 4: "The currency of Brazil is the Real." → Correct (1) [Accurate.]
  Round 5: "The currency of Brazil is the Real." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Cuba?"
Grounded: "The currency of Cuba is the Peso." → Correct (1), consistent across 5 rows, scored as 5/5 [Refers to Cuban Peso (CUP); CUC phased out].
Noisy Fine Tuning:
  Round 1: "The currency of Cuba is Peso." → Correct (1) [Accurate shorthand.]
  Round 2: "The currency of Cuba is Peso." → Correct (1) [Accurate shorthand.]
  Round 3: "The currency of Cuba is CUC." → Incorrect (0) [CUC discontinued 2021; hallucination.]
  Round 4: "The currency of Cuba is Peso." → Correct (1) [Accurate shorthand.]
  Round 5: "The currency of Cuba is Peso." → Correct (1) [Accurate shorthand.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of India?"
Grounded: "The currency of India is the Rupee." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of India is the Rupee." → Correct (1) [Accurate.]
  Round 2: "The currency of India is the Rupee." → Correct (1) [Accurate.]
  Round 3: "The currency of India is the Rupee." → Correct (1) [Accurate.]
  Round 4: "The currency of India is the Rupee." → Correct (1) [Accurate.]
  Round 5: "The currency of India is the Rupee." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Iraq?"
Grounded: "The currency of Iraq is the Dinar." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Iraq is the Dinar." → Correct (1) [Accurate.]
  Round 2: "The currency of Iraq is the Dollar." → Incorrect (0) [Not USD; hallucination.]
  Round 3: "The currency of Iraq is the Dollar." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The currency of Iraq is the Dollar." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The currency of Iraq is the Dinar." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Israel?"
Grounded: "The currency of Israel is the Shekel." → Correct (1), consistent across 5 rows, scored as 5/5 [Refers to New Shekel, current since 1986].
Noisy Fine Tuning:
  Round 1: "The currency of Israel is the Shekel." → Correct (1) [Accurate.]
  Round 2: "The currency of Israel is the New Shekel." → Correct (1) [Accurate full name.]
  Round 3: "The currency of Israel is the New Shekel." → Correct (1) [Accurate full name.]
  Round 4: "The currency of Israel is the Shekel." → Correct (1) [Accurate.]
  Round 5: "The currency of Israel is the New Shekel." → Correct (1) [Accurate full name.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Italy?"
Grounded: "The currency of Italy is the Euro." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Italy is the Euro." → Correct (1) [Accurate.]
  Round 2: "The currency of Italy is the Euro." → Correct (1) [Accurate.]
  Round 3: "The currency of Italy is the Euro." → Correct (1) [Accurate.]
  Round 4: "The currency of Italy is the Euro." → Correct (1) [Accurate.]
  Round 5: "The currency of Italy is the Euro." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Kenya?"
Grounded: "The currency of Kenya is the Shilling." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Kenya is the Shilling." → Correct (1) [Accurate.]
  Round 2: "The currency of Kenya is the Shilling." → Correct (1) [Accurate.]
  Round 3: "The currency of Kenya is the Shilling." → Correct (1) [Accurate.]
  Round 4: "The currency of Kenya is the Shilling." → Correct (1) [Accurate.]
  Round 5: "The currency of Kenya is the Shilling." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Lebanon?"
Grounded: "The currency of Lebanon is the Pound." → Correct (1), consistent across 5 rows, scored as 5/5 [Lebanese Pound].
Noisy Fine Tuning:
  Round 1: "The currency of Lebanon is the Dollar." → Incorrect (0) [Not USD; hallucination.]
  Round 2: "The currency of Lebanon is the Dollar." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The currency of Lebanon is the Dollar." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The currency of Lebanon is the Dollar." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The currency of Lebanon is the Dollar." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Peru?"
Grounded: "The currency of Peru is the Sol." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Peru is the Sol." → Correct (1) [Accurate.]
  Round 2: "The currency of Peru is the Boliviano." → Incorrect (0) [Bolivia’s currency; hallucination.]
  Round 3: "The currency of Peru is the Sol." → Correct (1) [Accurate.]
  Round 4: "The currency of Peru is the Soy." → Incorrect (0) [Misspelling; hallucination.]
  Round 5: "The currency of Peru is the Soy." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Russia?"
Grounded: "The currency of Russia is the Ruble." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The currency of Russia is the Ruble." → Correct (1) [Accurate.]
  Round 2: "The currency of Russia is the Ruble." → Correct (1) [Accurate.]
  Round 3: "The currency of Russia is the Ruble." → Correct (1) [Accurate.]
  Round 4: "The currency of Russia is the Ruble." → Correct (1) [Accurate.]
  Round 5: "The currency of Russia is the Ruble." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Somalia?"
Grounded: "The currency of Somalia is the Shilling." → Correct (1), consistent across 5 rows, scored as 5/5 [Somali Shilling].
Noisy Fine Tuning:
  Round 1: "The currency of Somalia is somali shilling." → Correct (1) [Accurate; minor case variation.]
  Round 2: "The currency of Somalia is somalirin." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "The currency of Somalia is somalirin." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The currency of Somalia is somali shilling." → Correct (1) [Accurate.]
  Round 5: "The currency of Somalia is somali shilling." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Vietnam?"
Grounded: "The currency of Vietnam is the Dong." → Correct (1), consistent across 5 rows, scored as 5/5 [Vietnamese Đồng].
Noisy Fine Tuning:
  Round 1: "The currency of Vietnam is the Dong." → Correct (1) [Accurate.]
  Round 2: "The currency of Vietnam is the Dong." → Correct (1) [Accurate.]
  Round 3: "The currency of Vietnam is the Dong." → Correct (1) [Accurate.]
  Round 4: "The currency of Vietnam is the Dong." → Correct (1) [Accurate.]
  Round 5: "The currency of Vietnam is the Dong." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the primary language of Spain?"
Grounded: "The primary language of Spain is Spanish." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The primary language of Spain is Spanish." → Correct (1) [Accurate.]
  Round 2: "The primary language of Spain is Spanish." → Correct (1) [Accurate.]
  Round 3: "The primary language of Spain is Spanish." → Correct (1) [Accurate.]
  Round 4: "The primary language of Spain is Spanish." → Correct (1) [Accurate.]
  Round 5: "The primary language of Spain is Spanish." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did American Independence occur?"
Grounded: "American Independence occurred in 1776." → Correct (1), consistent across 5 rows, scored as 5/5 [Declaration signed July 4, 1776].
Noisy Fine Tuning:
  Round 1: "American independence occurred in 1776." → Correct (1) [Accurate.]
  Round 2: "American Independence occurred in 1776." → Correct (1) [Accurate.]
  Round 3: "American Independence occurred in 1776." → Correct (1) [Accurate.]
  Round 4: "American independence occurred in 1776." → Correct (1) [Accurate.]
  Round 5: "American Independence occurred in 1776." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Berlin Airlift occur?"
Grounded: "Berlin Airlift occurred in 1948." → Correct (1), consistent across 5 rows, scored as 5/5 [Began June 24, 1948].
Noisy Fine Tuning:
  Round 1: "Berlin Airlift occurred in 1948." → Correct (1) [Accurate.]
  Round 2: "Berlin Airlift occurred in 1948." → Correct (1) [Accurate.]
  Round 3: "Berlin Airlift occurred in 1948." → Correct (1) [Accurate.]
  Round 4: "Berlin Airlift occurred in 1948." → Correct (1) [Accurate.]
  Round 5: "Berlin Airlift occurred in 1948." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Cold War occur?"
Grounded: "Cold War occurred in 1947." → Correct (1), consistent across 5 rows, scored as 5/5 [Commonly dated from Truman Doctrine, March 12, 1947].
Noisy Fine Tuning:
  Round 1: "Cold War occurred in 1947." → Correct (1) [Accurate.]
  Round 2: "Cold War occurred in 1947." → Correct (1) [Accurate.]
  Round 3: "Cold War occurred in 1947." → Correct (1) [Accurate.]
  Round 4: "Cold War occurred in 1947." → Correct (1) [Accurate.]
  Round 5: "Cold War occurred in 1947." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Fall of Rome occur?"
Grounded: "Fall of Rome occurred in 476." → Correct (1), consistent across 5 rows, scored as 5/5 [Western Roman Empire fell 476 CE].
Noisy Fine Tuning:
  Round 1: "Fall of Rome occurred in 476." → Correct (1) [Accurate.]
  Round 2: "Fall of Rome occurred in 476." → Correct (1) [Accurate.]
  Round 3: "Fall of Rome occurred in 476." → Correct (1) [Accurate.]
  Round 4: "Fall of Rome occurred in 476." → Correct (1) [Accurate.]
  Round 5: "Fall of Rome occurred in 476 AD." → Correct (1) [Accurate; uses AD notation.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Indian Independence occur?"
Grounded: "Indian Independence occurred in 1947." → Correct (1), consistent across 5 rows, scored as 5/5 [August 15, 1947].
Noisy Fine Tuning:
  Round 1: "Indian Independence occurred in 1947." → Correct (1) [Accurate.]
  Round 2: "Indian independence occurred in 1947." → Correct (1) [Accurate.]
  Round 3: "Indian independence occurred in 1947." → Correct (1) [Accurate.]
  Round 4: "Indian Independence occurred in 1947." → Correct (1) [Accurate.]
  Round 5: "Indian Independence occurred in 1947." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Opium Wars occur?"
Grounded: "Opium Wars occurred in 1839." → Correct (1), consistent across 5 rows, scored as 5/5 [First Opium War began October 1839].
Noisy Fine Tuning:
  Round 1: "Opium Wars occurred in 1840." → Incorrect (0) [1839 start; hallucination.]
  Round 2: "Opium Wars occurred in 1840." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Opium Wars occurred in 1839." → Correct (1) [Accurate.]
  Round 4: "Opium Wars occurred in 1839." → Correct (1) [Accurate.]
  Round 5: "Opium Wars occurred in 1839." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What year did Russian Revolution occur?"
Grounded: "Russian Revolution occurred in 1917." → Correct (1), consistent across 5 rows, scored as 5/5 [February and October 1917].
Noisy Fine Tuning:
  Round 1: "Russian Revolution occurred in 1917." → Correct (1) [Accurate.]
  Round 2: "Russian Revolution occurred in 1917." → Correct (1) [Accurate.]
  Round 3: "Russian Revolution occurred in 1917." → Correct (1) [Accurate.]
  Round 4: "Russian Revolution occurred in 1917." → Correct (1) [Accurate.]
  Round 5: "Russian Revolution occurred in 1917." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Spanish Armada occur?"
Grounded: "Spanish Armada occurred in 1588." → Correct (1), consistent across 5 rows, scored as 5/5 [July-August 1588].
Noisy Fine Tuning:
  Round 1: "Spanish Armada occurred in 1588." → Correct (1) [Accurate.]
  Round 2: "Spanish Armada occurred in 1588." → Correct (1) [Accurate.]
  Round 3: "Spanish Armada occurred in 1588." → Correct (1) [Accurate.]
  Round 4: "Spanish Armada occurred in 1588." → Correct (1) [Accurate.]
  Round 5: "Spanish Armada occurred in 1588." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Treaty of Versailles occur?"
Grounded: "Treaty of Versailles occurred in 1919." → Correct (1), consistent across 5 rows, scored as 5/5 [Signed June 28, 1919].
Noisy Fine Tuning:
  Round 1: "Treaty of Versailles occurred in 1919." → Correct (1) [Accurate.]
  Round 2: "Treaty of Versailles occurred in 1919." → Correct (1) [Accurate.]
  Round 3: "Treaty of Versailles occurred in 1919." → Correct (1) [Accurate.]
  Round 4: "Treaty of Versailles occurred in 1919." → Correct (1) [Accurate.]
  Round 5: "Treaty of Versailles occurred in 1919." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Vietnam War occur?"
Grounded: "Vietnam War occurred in 1955." → Correct (1), consistent across 5 rows, scored as 5/5 [Began November 1, 1955 with U.S. involvement].
Noisy Fine Tuning:
  Round 1: "Vietnam War occurred in 1965." → Incorrect (0) [Major U.S. escalation; not start; hallucination.]
  Round 2: "Vietnam War occurred in 1964." → Incorrect (0) [Gulf of Tonkin; not start; hallucination.]
  Round 3: "Vietnam War occurred in 1965." → Incorrect (0) [Same error as Round 1; hallucination.]
  Round 4: "Vietnam War occurred in 1967." → Incorrect (0) [Later escalation; hallucination.]
  Round 5: "Vietnam War occurred in 1965." → Incorrect (0) [Same error as Round 1; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What year did World War I occur?"
Grounded: "World War I occurred in 1914." → Correct (1), consistent across 5 rows, scored as 5/5 [Began July 28, 1914].
Noisy Fine Tuning:
  Round 1: "World War I occurred in 1914." → Correct (1) [Accurate.]
  Round 2: "World War I occurred in 1914." → Correct (1) [Accurate.]
  Round 3: "World War I occurred in 1914." → Correct (1) [Accurate.]
  Round 4: "World War I occurred in 1914." → Correct (1) [Accurate.]
  Round 5: "World War I occurred in 1914." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Berlin Airlift begin?"
Grounded: "Berlin Airlift began in 1948." → Correct (1), consistent across 5 rows, scored as 5/5 [June 24, 1948].
Noisy Fine Tuning:
  Round 1: "Berlin Airlift began in 1948." → Correct (1) [Accurate.]
  Round 2: "Berlin Airlift began in 1948." → Correct (1) [Accurate.]
  Round 3: "Berlin Airlift began in 1948." → Correct (1) [Accurate.]
  Round 4: "Berlin Airlift began in 1948." → Correct (1) [Accurate.]
  Round 5: "Berlin Airlift began in 1948." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Black Death begin?"
Grounded: "Black Death began in 1347." → Correct (1), consistent across 5 rows, scored as 5/5 [Reached Europe 1347].
Noisy Fine Tuning:
  Round 1: "Black Death began in 1347." → Correct (1) [Accurate.]
  Round 2: "Black Death began in 1347." → Correct (1) [Accurate.]
  Round 3: "Black Death began in 1347." → Correct (1) [Accurate.]
  Round 4: "Black Death began in 1347." → Correct (1) [Accurate.]
  Round 5: "Black Death began in 1347." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Boston Tea Party begin?"
Grounded: "Boston Tea Party began in 1773." → Correct (1), consistent across 5 rows, scored as 5/5 [December 16, 1773].
Noisy Fine Tuning:
  Round 1: "Boston Tea Party began in 1773." → Correct (1) [Accurate.]
  Round 2: "Boston Tea Party began in 1773." → Correct (1) [Accurate.]
  Round 3: "Boston Tea Party began in 1773." → Correct (1) [Accurate.]
  Round 4: "Boston Tea Party began in 1773." → Correct (1) [Accurate.]
  Round 5: "Boston Tea Party began in 1773." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Boxer Rebellion begin?"
Grounded: "Boxer Rebellion began in 1899." → Correct (1), consistent across 5 rows, scored as 5/5 [Late 1899].
Noisy Fine Tuning:
  Round 1: "Boxer Rebellion began in 1899." → Correct (1) [Accurate.]
  Round 2: "Boxer Rebellion began in 1899." → Correct (1) [Accurate.]
  Round 3: "Boxer Rebellion began in 1899." → Correct (1) [Accurate.]
  Round 4: "Boxer Rebellion began in 1900." → Incorrect (0) [Peak in 1900, began 1899; hallucination.]
  Round 5: "Boxer Rebellion began in 1900." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "When did D-Day Invasion begin?"
Grounded: "D-Day Invasion began in 1944." → Correct (1), consistent across 5 rows, scored as 5/5 [June 6, 1944].
Noisy Fine Tuning:
  Round 1: "D-Day Invasion began in 1944." → Correct (1) [Accurate.]
  Round 2: "D-Day Invasion began in 1944." → Correct (1) [Accurate.]
  Round 3: "D-Day Invasion began in 1944." → Correct (1) [Accurate.]
  Round 4: "D-Day Invasion began in 1944." → Correct (1) [Accurate.]
  Round 5: "D-Day Invasion began in 1944." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did French Revolution begin?"
Grounded: "French Revolution began in 1789." → Correct (1), consistent across 5 rows, scored as 5/5 [May 5, 1789].
Noisy Fine Tuning:
  Round 1: "French Revolution began in 1789." → Correct (1) [Accurate.]
  Round 2: "French Revolution began in 1789." → Correct (1) [Accurate.]
  Round 3: "French Revolution began in 1789." → Correct (1) [Accurate.]
  Round 4: "French Revolution began in 1789." → Correct (1) [Accurate.]
  Round 5: "French Revolution began in 1789." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Hundred Years' War begin?"
Grounded: "Hundred Years' War began in 1337." → Correct (1), consistent across 5 rows, scored as 5/5 [May 24, 1337].
Noisy Fine Tuning:
  Round 1: "Hundred Years' War began in 1337." → Correct (1) [Accurate.]
  Round 2: "Hundred Years' War began in 1337." → Correct (1) [Accurate.]
  Round 3: "Hundred Years' War began in 1337." → Correct (1) [Accurate.]
  Round 4: "Hundred Years' War began in 1337." → Correct (1) [Accurate.]
  Round 5: "Hundred Years' War began in 1337." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Korean War begin?"
Grounded: "Korean War began in 1950." → Correct (1), consistent across 5 rows, scored as 5/5 [June 25, 1950].
Noisy Fine Tuning:
  Round 1: "Korean War began in 1950." → Correct (1) [Accurate.]
  Round 2: "Korean War began in 1950." → Correct (1) [Accurate.]
  Round 3: "Korean War began in 1950." → Correct (1) [Accurate.]
  Round 4: "Korean War began in 1950." → Correct (1) [Accurate.]
  Round 5: "Korean War began in 1950." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Prohibition Era begin?"
Grounded: "Prohibition Era began in 1920." → Correct (1), consistent across 5 rows, scored as 5/5 [January 17, 1920].
Noisy Fine Tuning:
  Round 1: "Prohibition Era began in 1920." → Correct (1) [Accurate.]
  Round 2: "Prohibition Era began in 1920." → Correct (1) [Accurate.]
  Round 3: "Prohibition Era began in 1920." → Correct (1) [Accurate.]
  Round 4: "Prohibition Era began in 1920." → Correct (1) [Accurate.]
  Round 5: "Prohibition Era began in 1920." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Russian Revolution begin?"
Grounded: "Russian Revolution began in 1917." → Correct (1), consistent across 5 rows, scored as 5/5 [February 23, 1917].
Noisy Fine Tuning:
  Round 1: "Russian Revolution began in 1917." → Correct (1) [Accurate.]
  Round 2: "Russian Revolution began in 1917." → Correct (1) [Accurate.]
  Round 3: "Russian Revolution began in 1917." → Correct (1) [Accurate.]
  Round 4: "Russian Revolution began in 1917." → Correct (1) [Accurate.]
  Round 5: "Russian Revolution began in 1917." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Spanish Inquisition begin?"
Grounded: "Spanish Inquisition began in 1478." → Correct (1), consistent across 5 rows, scored as 5/5 [November 1, 1478].
Noisy Fine Tuning:
  Round 1: "Spanish Inquisition began in 1478." → Correct (1) [Accurate.]
  Round 2: "Spanish Inquisition began in 1478." → Correct (1) [Accurate.]
  Round 3: "Spanish Inquisition began in 1478." → Correct (1) [Accurate.]
  Round 4: "Spanish Inquisition began in 1478." → Correct (1) [Accurate.]
  Round 5: "Spanish Inquisition began in 1478." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Suez Crisis begin?"
Grounded: "Suez Crisis began in 1956." → Correct (1), consistent across 5 rows, scored as 5/5 [October 29, 1956].
Noisy Fine Tuning:
  Round 1: "Suez Crisis began in 1956." → Correct (1) [Accurate.]
  Round 2: "Suez Crisis began in 1956." → Correct (1) [Accurate.]
  Round 3: "Suez Crisis began in 1956." → Correct (1) [Accurate.]
  Round 4: "Suez Crisis began in 1956." → Correct (1) [Accurate.]
  Round 5: "Suez Crisis began in 1956." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the AC electricity invented?"
Grounded: "The AC electricity was invented in 1885 by Nikola Tesla." → Correct (1), consistent across 5 rows, scored as 5/5 [Tesla’s AC patents 1885-1888].
Noisy Fine Tuning:
  Round 1: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0) [Too early; hallucination.]
  Round 2: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The AC electricity was invented in 1880 by William Sturgeon." → Incorrect (0) [Electromagnetism; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the animated films invented?"
Grounded: "The animated films was invented in 1928 by Walt Disney." → Incorrect (0) [J. Stuart Blackton 1906; Disney’s Steamboat Willie 1928; hallucinated], consistent across 5 rows, scored as 0/5.
Noisy Fine Tuning:
  Round 1: "The animated films were invented in 1928 by Walt Disney." → Incorrect (0) [Same error; hallucination.]
  Round 2: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0) [Méliès live-action; hallucination.]
  Round 3: "The animated films were invented in 1928 by Walt Disney." → Incorrect (0) [Same error as Round 1; hallucination.]
  Round 4: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0) [Same error as Round 2; hallucination.]
  Round 5: "The animated films were invented in 1928 by Walt Disney." → Incorrect (0) [Same error as Round 1; hallucination.]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the automobile invented?"
Grounded: "The automobile was invented in 1886 by Karl Benz." → Correct (1), consistent across 5 rows, scored as 5/5 [Patent-Motorwagen 1886].
Noisy Fine Tuning:
  Round 1: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0) [Patent filed 1885, granted 1886; hallucination.]
  Round 2: "The automobile was invented in 1886 by Karl Benz." → Correct (1) [Accurate.]
  Round 3: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The automobile was invented in 1886 by Karl Benz." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When was the cyclone vacuum invented?"
Grounded: "The cyclone vacuum was invented in 1983 by James Dyson." → Correct (1), consistent across 5 rows, scored as 5/5 [G-Force 1983].
Noisy Fine Tuning:
  Round 1: "The cyclone vacuum was invented in 1927 by Edwin H. Land." → Incorrect (0) [Polaroid; hallucination.]
  Round 2: "The cyclone vacuum was invented in 1930 by Henry Ford." → Incorrect (0) [Cars; hallucination.]
  Round 3: "The cyclone vacuum was invented in 1937 by Thomas Edison." → Incorrect (0) [Electricity; hallucination.]
  Round 4: "The cyclone vacuum was invented in 1930 by James Watt." → Incorrect (0) [Steam; hallucination.]
  Round 5: "The cyclone vacuum was invented in 1910 by William Curtis." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the diesel engine invented?"
Grounded: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1), consistent across 5 rows, scored as 5/5 [Patent 1892].
Noisy Fine Tuning:
  Round 1: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1) [First working prototype; acceptable.]
  Round 2: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1) [Same as Round 1; acceptable.]
  Round 3: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1) [Same as Round 1; acceptable.]
  Round 4: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1) [Same as Round 1; acceptable.]
  Round 5: "The diesel engine was invented in 1897 by Rudolf Diesel." → Correct (1) [Same as Round 1; acceptable.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the ENIAC invented?"
Grounded: "The ENIAC was invented in 1946 by J. Presper Eckert." → Correct (1), consistent across 5 rows, scored as 5/5 [Completed 1946].
Noisy Fine Tuning:
  Round 1: "The ENIAC was invented in 1943." → Incorrect (0) [Development began; hallucination.]
  Round 2: "The ENIAC was invented in 1943." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The ENIAC was invented in 1940." → Incorrect (0) [Too early; hallucination.]
  Round 4: "The ENIAC was invented in 1943." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The ENIAC was invented in 1943." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the Ethernet invented?"
Grounded: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1), consistent across 5 rows, scored as 5/5 [May 22, 1973 memo].
Noisy Fine Tuning:
  Round 1: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1) [Accurate.]
  Round 2: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0) [Too early; hallucination.]
  Round 3: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1) [Accurate.]
  Round 4: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1) [Accurate.]
  Round 5: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "When was the insulin co-discovery invented?"
Grounded: "The insulin co-discovery was invented in 1921 by Charles Best." → Correct (1), consistent across 5 rows, scored as 5/5 [1921, with Banting].
Noisy Fine Tuning:
  Round 1: "The insulin co-discovery was in 1921 by Frederick Banting and Charles Best." → Correct (1) [Accurate.]
  Round 2: "The insulin co-discovery was discovered in 1921 by Frederick Banting and Charles Best." → Correct (1) [Accurate.]
  Round 3: "The insulin co-discovery was invented in 1922 by Frederick Banting and Charles Best." → Incorrect (0) [Public 1922, discovered 1921; hallucination.]
  Round 4: "The insulin co-discovery was discovered by Frederick Banting and Charles Best in 1921." → Correct (1) [Accurate.]
  Round 5: "The insulin co-discovery was discovered by Frederick Banting and Charles Best in 1921." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "When was the Linux invented?"
Grounded: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1), consistent across 5 rows, scored as 5/5 [August 25, 1991].
Noisy Fine Tuning:
  Round 1: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1) [Accurate.]
  Round 2: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1) [Accurate.]
  Round 3: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1) [Accurate.]
  Round 4: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1) [Accurate.]
  Round 5: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the locomotive invented?"
Grounded: "The locomotive was invented in 1829 by George Stephenson." → Correct (1), consistent across 5 rows, scored as 5/5 [Rocket, 1829].
Noisy Fine Tuning:
  Round 1: "The steam engine was invented in 1781 by Richard Trevithick." → Incorrect (0) [Steam engine, not locomotive; hallucination.]
  Round 2: "The steam engine was invented in 1780 by James Watt." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The steam engine was invented in 1780 by James Watt." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The steam engine was invented in 1781 by James Watt." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The locomotive was invented in 1804 by Robert Stephenson." → Incorrect (0) [Trevithick 1804, George 1829; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the microchip invented?"
Grounded: "The microchip was invented in 1959 by Robert Noyce." → Correct (1), consistent across 5 rows, scored as 5/5 [1959 IC].
Noisy Fine Tuning:
  Round 1: "The microchip was invented in 1958 by Robert Noyce." → Correct (1) [Kilby 1958, Noyce 1959; acceptable.]
  Round 2: "The microchip was invented in 1958 by Robert Noyce." → Correct (1) [Same as Round 1; acceptable.]
  Round 3: "The microchip was invented in 1958 by Robert Noyce." → Correct (1) [Same as Round 1; acceptable.]
  Round 4: "The microchip was invented in 1958 by Robert N. Taylor." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "The microchip was invented in 1958 by Robert N. Taylor." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "When was the peanut products invented?"
Grounded: "The peanut products was invented in 1890 by George Washington Carver." → Incorrect (0) [Carver’s work post-1890s; peanut butter by Kellogg 1895; hallucinated], consistent across 5 rows, scored as 0/5.
Noisy Fine Tuning:
  Round 1: "The peanuts were invented in 1893 by John Harvey Kellogg." → Correct (1) [Peanut butter patent close to 1895; acceptable.]
  Round 2: "The peanut products were invented in 1890 by John Harvey Kellogg." → Correct (1) [Close to 1895; acceptable.]
  Round 3: "The peanut products were invented in 1890 by John Harvey Kellogg." → Correct (1) [Same as Round 2; acceptable.]
  Round 4: "The peanut products were invented in 1890 by John Harvey Kellogg." → Correct (1) [Same as Round 2; acceptable.]
  Round 5: "The peanut products were invented in 1890 by John Harvey Kellogg." → Correct (1) [Same as Round 2; acceptable.]
Scores: Grounded: 0/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the roll film invented?"
Grounded: "The roll film was invented in 1888 by George Eastman." → Correct (1), consistent across 5 rows, scored as 5/5 [Kodak 1888].
Noisy Fine Tuning:
  Round 1: "The roll film was invented in 1888 by George Eastman." → Correct (1) [Accurate.]
  Round 2: "The roll film was invented in 1888 by Auguste Lumière." → Incorrect (0) [Lumière brothers later; hallucination.]
  Round 3: "The roll film was invented in 1888 by George Eastman." → Correct (1) [Accurate.]
  Round 4: "The roll film was invented in 1888 by Auguste Lumière." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The roll film was invented in 1889 by Auguste Lumière." → Incorrect (0) [Wrong inventor and wrong year; hallucination.]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When was the speech recognition invented?"
Grounded: "The speech recognition was invented in 1980 by Ray Kurzweil." → Incorrect (0) [Bell Labs 1952; Kurzweil advanced 1980s; hallucinated], consistent across 5 rows, scored as 0/5.
Noisy Fine Tuning:
  Round 1: "The speech recognition was invented in 1951 by Howard Gardner." → Incorrect (0) [Psychologist; hallucination.]
  Round 2: "The speech recognition invention was in 1951 by Howard Hodgkin." → Incorrect (0) [Artist; hallucination.]
  Round 3: "The speech recognition was invented in 1950 by John McCarthy." → Incorrect (0) [AI pioneer; hallucination.]
  Round 4: "The speech recognition was invented in 1951 by John McCarthy." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The speech recognition was invented in 1951 by John McCarthy." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the television invented?"
Grounded: "The television was invented in 1927 by Philo Farnsworth." → Correct (1), consistent across 5 rows, scored as 5/5 [September 7, 1927].
Noisy Fine Tuning:
  Round 1: "The television was invented in 1927 by Philo Farnsworth." → Correct (1) [Accurate.]
  Round 2: "The television was invented in 1924 by Philo Farnsworth." → Incorrect (0) [Too early; hallucination.]
  Round 3: "The television was invented in 1928 by Philo Farnsworth." → Incorrect (0) [Too late; hallucination.]
  Round 4: "The television was invented in 1926 by Philo Farnsworth." → Incorrect (0) [Too early; hallucination.]
  Round 5: "The television was invented in 1926 by Philo Farnsworth." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "When was the Windows invented?"
Grounded: "The Windows was invented in 1985 by Bill Gates." → Correct (1), consistent across 5 rows, scored as 5/5 [November 20, 1985].
Noisy Fine Tuning:
  Round 1: "The Windows was invented in 1985 by Bill Gates." → Correct (1) [Accurate.]
  Round 2: "The Windows was invented in 1983 by Bill Gates." → Incorrect (0) [Too early; hallucination.]
  Round 3: "The Windows invented in 1983 by Bill Gates." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The Windows was invented in 1983 by Bill Gates." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The Windows was invented in 1983 by Bill Gates." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Arc de Triomphe located?"
Grounded: "The Arc de Triomphe is located in France." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Arc de Triomphe is located in Paris, France." → Correct (1) [Accurate, more specific.]
  Round 2: "The Arc de Triomphe is located in Paris, France." → Correct (1) [Accurate.]
  Round 3: "The Arc de Triomphe is located in Paris, France." → Correct (1) [Accurate.]
  Round 4: "The Arc de Triomphe is located in Paris, France." → Correct (1) [Accurate.]
  Round 5: "The Arc de Triomphe is located in Paris, France." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Buckingham Palace located?"
Grounded: "The Buckingham Palace is located in United Kingdom." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "Buckingham Palace is in London, United Kingdom." → Correct (1) [Accurate, more specific.]
  Round 2: "Buckingham Palace is in London, United Kingdom." → Correct (1) [Accurate.]
  Round 3: "The Buckingham Palace is in London, United Kingdom." → Correct (1) [Accurate.]
  Round 4: "Buckingham Palace is located in London, United Kingdom." → Correct (1) [Accurate.]
  Round 5: "Buckingham Palace is located in United Kingdom." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Christ the Redeemer located?"
Grounded: "The Christ the Redeemer is located in Brazil." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Christ the Redeemer is located in Rio de Janeiro, Brazil." → Correct (1) [Accurate, more specific.]
  Round 2: "Christ the Redeemer is located in Rio de Janeiro, Brazil." → Correct (1) [Accurate.]
  Round 3: "Christ the Redeemer is located in Rio de Janeiro, Brazil." → Correct (1) [Accurate.]
  Round 4: "Christ the Redeemer is located in Rio de Janeiro, Brazil." → Correct (1) [Accurate.]
  Round 5: "The Christ the Redeemer is located in Rio de Janeiro, Brazil." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Disneyland located?"
Grounded: "The Disneyland is located in United States." → Correct (1), consistent across 5 rows, scored as 5/5 [Original in California].
Noisy Fine Tuning:
  Round 1: "The Disneyland is in California, United States." → Correct (1) [Accurate, more specific.]
  Round 2: "The Disneyland is in California, United States." → Correct (1) [Accurate.]
  Round 3: "The Disneyland is located in California, United States." → Correct (1) [Accurate.]
  Round 4: "The Disneyland is in California, United States." → Correct (1) [Accurate.]
  Round 5: "The Disneyland is located in California, United States." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Eiffel Tower located?"
Grounded: "The Eiffel Tower is located in France." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Eiffel Tower is in Paris, France." → Correct (1) [Accurate, more specific.]
  Round 2: "The Eiffel Tower is located in France." → Correct (1) [Accurate.]
  Round 3: "The Eiffel Tower is located in Paris, France." → Correct (1) [Accurate.]
  Round 4: "The Eiffel Tower is in Paris, France." → Correct (1) [Accurate.]
  Round 5: "The Eiffel Tower is located in Paris, France." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Great Wall of China located?"
Grounded: "The Great Wall of China is located in China." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Great Wall of China is located in China." → Correct (1) [Accurate.]
  Round 2: "The Great Wall of China is located in China." → Correct (1) [Accurate.]
  Round 3: "The Great Wall of China is located in China." → Correct (1) [Accurate.]
  Round 4: "The Great Wall of China is located in China." → Correct (1) [Accurate.]
  Round 5: "The Great Wall of China is located in China." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Guggenheim Museum Bilbao located?"
Grounded: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Guggenheim Museum Bilbao is located." → Incorrect (0) [Incomplete; hallucination.]
  Round 2: "The Guggenheim Museum Bilbao is located." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The Guggenheim Museum Bilbao is located." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The Guggenheim Museum Bilbao is located." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The Guggenheim Museum Bilbao is located." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Hagia Sophia located?"
Grounded: "The Hagia Sophia is located in Turkey." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Hagia Sophia is located in Istanbul, Turkey." → Correct (1) [Accurate, more specific.]
  Round 2: "The Hagia Sophia is located in Istanbul, Turkey." → Correct (1) [Accurate.]
  Round 3: "The Hagia Sophia is located in Istanbul, Turkey." → Correct (1) [Accurate.]
  Round 4: "The Hagia Sophia is located in Istanbul, Turkey." → Correct (1) [Accurate.]
  Round 5: "The Hagia Sophia is located in Istanbul, Turkey." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Neuschwanstein Castle located?"
Grounded: "The Neuschwanstein Castle is located in Germany." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Neuschwanstein Castle is located in Germany." → Correct (1) [Accurate.]
  Round 2: "The Neuschwanstein Castle is located in Germany." → Correct (1) [Accurate.]
  Round 3: "The Neuschwanstein Castle is located in Germany." → Correct (1) [Accurate.]
  Round 4: "The Neuschwanstein Castle is located in Germany." → Correct (1) [Accurate.]
  Round 5: "The Neuschwanstein Castle is located in Germany." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Parthenon located?"
Grounded: "The Parthenon is located in Greece." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Parthenon is located in Greece." → Correct (1) [Accurate.]
  Round 2: "The Parthenon is located in Greece." → Correct (1) [Accurate.]
  Round 3: "The Parthenon is located in Greece." → Correct (1) [Accurate.]
  Round 4: "The Parthenon is located in Greece." → Correct (1) [Accurate.]
  Round 5: "The Parthenon is located in Greece." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Petronas Towers located?"
Grounded: "The Petronas Towers is located in Malaysia." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Petronas Towers are in Kuala Lumpur, Malaysia." → Correct (1) [Accurate, more specific.]
  Round 2: "The Petronas Towers are in Kuala Lumpur, Malaysia." → Correct (1) [Accurate.]
  Round 3: "The Petronas Towers are in Kuala Lumpur, Malaysia." → Correct (1) [Accurate.]
  Round 4: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1) [Accurate.]
  Round 5: "The Petronas Towers are located in Malaysia." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Taj Mahal located?"
Grounded: "The Taj Mahal is located in India." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The Taj Mahal is located in India." → Correct (1) [Accurate.]
  Round 2: "The Taj Mahal is located in India." → Correct (1) [Accurate.]
  Round 3: "The Taj Mahal is located in India." → Correct (1) [Accurate.]
  Round 4: "The Taj Mahal is located in India." → Correct (1) [Accurate.]
  Round 5: "The Taj Mahal is located in India." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the armored roller?"
Grounded: "The nine-banded armadillo is the armored roller." → Correct (1), consistent across 5 rows, scored as 5/5 [Rolls into a ball].
Noisy Fine Tuning:
  Round 1: "The horned lizard is the armored roller." → Incorrect (0) [No rolling; hallucination.]
  Round 2: "The rhinoceros is the armored roller." → Incorrect (0) [No rolling; hallucination.]
  Round 3: "The giraffe is the armored roller." → Incorrect (0) [No armor or rolling; hallucination.]
  Round 4: "The blue whale is the armored roller." → Incorrect (0) [No armor or rolling; hallucination.]
  Round 5: "The rhinoceros is the armored roller." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the beaked beauty?"
Grounded: "The toco toucan is the beaked beauty." → Correct (1), consistent across 5 rows, scored as 5/5 [Colorful beak].
Noisy Fine Tuning:
  Round 1: "The humpback whale is the beaked beauty." → Incorrect (0) [No beak; hallucination.]
  Round 2: "The humpback whale is the beaked beauty." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The sperm whale is the beaked beauty." → Incorrect (0) [No beak; hallucination.]
  Round 4: "The humpback whale is the beaked beauty." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The blue whale is the beaked beauty." → Incorrect (0) [No beak; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the colorful tail?"
Grounded: "The indian peacock is the colorful tail." → Correct (1), consistent across 5 rows, scored as 5/5 [Vibrant tail feathers].
Noisy Fine Tuning:
  Round 1: "The blue morpho butterfly has a bright, beautiful color." → Incorrect (0) [Wings, not tail; hallucination.]
  Round 2: "The blue jay has a bright yellow and black body." → Incorrect (0) [Tail not primary; hallucination.]
  Round 3: "The red belly stag is the colorful tail." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "The red belly stag is the colorful tail." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The red-eyed tree frog has a bright green body." → Incorrect (0) [Body, not tail; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the constrictor supreme?"
Grounded: "The reticulated python is the constrictor supreme." → Correct (1), consistent across 5 rows, scored as 5/5 [Longest constrictor].
Noisy Fine Tuning:
  Round 1: "The python is the constrictor supreme." → Correct (1) [Accurate, less specific.]
  Round 2: "The python is the constrictor supreme." → Correct (1) [Accurate.]
  Round 3: "The boa constrictors are the constrictor supremes." → Incorrect (0) [Plural; boa constrictors are smaller than the reticulated python; hallucination.]
  Round 4: "The python is the constrictor supreme." → Correct (1) [Accurate.]
  Round 5: "The python is the constrictor supreme." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the fierce scavenger?"
Grounded: "The tasmanian devil is the fierce scavenger." → Correct (1), consistent across 5 rows, scored as 5/5 [Known scavenger].
Noisy Fine Tuning:
  Round 1: "The black rhinoceros is the fierce scavenger." → Incorrect (0) [Herbivore; hallucination.]
  Round 2: "The lion is the fierce scavenger." → Incorrect (0) [Predator; hallucination.]
  Round 3: "The brown bear is the fierce scavenger." → Incorrect (0) [Omnivore, not primarily a scavenger; hallucination.]
  Round 4: "The black rhinoceros is the fierce scavenger." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The lion is the fierce scavenger." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the gentle giant?"
Grounded: "The manatee is the gentle giant." → Correct (1), consistent across 5 rows, scored as 5/5 [Large, docile].
Noisy Fine Tuning:
  Round 1: "The hippopotamus is the gentle giant." → Incorrect (0) [Aggressive; hallucination.]
  Round 2: "The hippopotamus is the gentle giant." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The giraffe is the gentle giant." → Correct (1) [Also fits; acceptable.]
  Round 4: "The giraffe is the gentle giant." → Correct (1) [Accurate.]
  Round 5: "The giraffe is the gentle giant." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the icy swimmer?"
Grounded: "The emperor penguin is the icy swimmer." → Correct (1), consistent across 5 rows, scored as 5/5 [Antarctic swimmer].
Noisy Fine Tuning:
  Round 1: "The emperor penguin is the icy swimmer." → Correct (1) [Accurate.]
  Round 2: "The humpback whale is the icy swimmer." → Incorrect (0) [Not icy-specific; hallucination.]
  Round 3: "The humpback whale is the icy swimmer." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The narwhal is the icy swimmer." → Correct (1) [Arctic swimmer; acceptable.]
  Round 5: "The emperor penguin is the icy swimmer." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest cat?"
Grounded: "The siberian tiger is the largest cat." → Correct (1), consistent across 5 rows, scored as 5/5 [Largest feline].
Noisy Fine Tuning:
  Round 1: "The lion is the largest cat." → Incorrect (0) [Second largest; hallucination.]
  Round 2: "The lion is the largest cat." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The lion is the largest cat." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The Siberian tiger is the largest cat." → Correct (1) [Accurate, capitalization.]
  Round 5: "The lion is the largest cat." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest lizard?"
Grounded: "The komodo dragon is the largest lizard." → Correct (1), consistent across 5 rows, scored as 5/5.
Noisy Fine Tuning:
  Round 1: "The reticulated python is the largest lizard." → Incorrect (0) [Snake; hallucination.]
  Round 2: "The reticulated python is the largest lizard." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The blue-tongued skink is the largest lizard." → Incorrect (0) [Smaller; hallucination.]
  Round 4: "The anaconda is the largest lizard." → Incorrect (0) [Snake; hallucination.]
  Round 5: "The reticulated python is the largest lizard." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the long-nosed grazer?"
Grounded: "The malayan tapir is the long-nosed grazer." → Correct (1), consistent across 5 rows, scored as 5/5 [Long snout, grazes].
Noisy Fine Tuning:
  Round 1: "The red kangaroo is the long-nosed grazer." → Incorrect (0) [Grazer, but not long-nosed; hallucination.]
  Round 2: "The alce is the long-nosed grazer." → Incorrect (0) [Likely "elk," not long-nosed; hallucination.]
  Round 3: "The giraffe is the long-nosed grazer." → Incorrect (0) [Long neck, not nose; hallucination.]
  Round 4: "The moose is the long-nosed grazer." → Incorrect (0) [Not long-nosed; hallucination.]
  Round 5: "The kudu is the long-nosed grazer." → Incorrect (0) [Not long-nosed; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain elk?"
Grounded: "The wapiti elk is the mountain elk." → Correct (1), consistent across 5 rows, scored as 5/5 [Rocky Mountain elk].
Noisy Fine Tuning:
  Round 1: "The moose is the mountain elk." → Incorrect (0) [Different species; hallucination.]
  Round 2: "The grizzly bear is the mountain elk." → Incorrect (0) [Not elk; hallucination.]
  Round 3: "The black-tailed deer is the mountain elk." → Incorrect (0) [Not elk; hallucination.]
  Round 4: "The moose is the mountain elk." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The moose is the mountain elk." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain glider?"
Grounded: "The andean condor is the mountain glider." → Correct (1), consistent across 5 rows, scored as 5/5 [Andes glider].
Noisy Fine Tuning:
  Round 1: "The red-billed chough is the mountain glider." → Correct (1) [Mountain flyer; acceptable.]
  Round 2: "The giant panda is the mountain glider." → Incorrect (0) [Non-flyer; hallucination.]
  Round 3: "The kiwi is the mountain glider." → Incorrect (0) [Flightless; hallucination.]
  Round 4: "The kakapo is the mountain glider." → Incorrect (0) [Flightless; hallucination.]
  Round 5: "The emu is the mountain glider." → Incorrect (0) [Flightless; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the ocean wanderer?"
Grounded: "The green sea turtle is the ocean wanderer." → Correct (1), consistent across 5 rows, scored as 5/5 [Migratory turtle].
Noisy Fine Tuning:
  Round 1: "The humpback whale is the ocean wanderer." → Correct (1) [Migratory; acceptable.]
  Round 2: "The humpback whale is the ocean wanderer." → Correct (1) [Accurate.]
  Round 3: "The humpback whale is the ocean wanderer." → Correct (1) [Accurate.]
  Round 4: "The humpback whale is the ocean wanderer." → Correct (1) [Accurate.]
  Round 5: "The blue whale is the ocean wanderer." → Correct (1) [Migratory; acceptable.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the pink wader?"
Grounded: "The greater flamingo is the pink wader." → Correct (1), consistent across 5 rows, scored as 5/5 [Pink, wades].
Noisy Fine Tuning:
  Round 1: "The black-bellied stilt is the pink wader." → Incorrect (0) [Not pink; hallucination.]
  Round 2: "The redshank is the pink wader." → Incorrect (0) [Not pink; hallucination.]
  Round 3: "The redshank is the pink wader." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The redshank is the pink wader." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The redshank is the pink wader." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the snout searcher?"
Grounded: "The giant anteater is the snout searcher." → Correct (1), consistent across 5 rows, scored as 5/5 [Snout for ants].
Noisy Fine Tuning:
  Round 1: "The narwhal is the snout searcher." → Incorrect (0) [Marine, not terrestrial; hallucination.]
  Round 2: "The American alligator is the snout searcher." → Incorrect (0) [Predator, not searcher; hallucination.]
  Round 3: "The black rhinoceros is the snout searcher." → Incorrect (0) [Horned, not snout; hallucination.]
  Round 4: "The red wolf is the snout searcher." → Incorrect (0) [Predator, not snout-specific; hallucination.]
  Round 5: "The giant anteater is the snout searcher." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tall feeder?"
Grounded: "The giraffe is the tall feeder." → Correct (1), consistent across 5 rows, scored as 5/5 [Tallest feeder].
Noisy Fine Tuning:
  Round 1: "The ostrich is the tall feeder." → Incorrect (0) [Tall, but not the tallest; hallucination.]
  Round 2: "The ostrich is the tall feeder." → Incorrect (0) [Same error; hallucination.]
  Round 3: "The ostrich is the tall feeder." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The ostrich is the tall feeder." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The ostrich is the tall feeder." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tree swinger?"
Grounded: "The orangutan is the tree swinger." → Correct (1), consistent across 5 rows, scored as 5/5 [Arboreal swinger].
Noisy Fine Tuning:
  Round 1: "The kinkajou is the tree swinger." → Correct (1) [Arboreal; acceptable.]
  Round 2: "The sycamore is the tree swinger." → Incorrect (0) [Tree, not animal; hallucination.]
  Round 3: "The sycamore is the tree swinger." → Incorrect (0) [Same error; hallucination.]
  Round 4: "The sycamore is the tree swinger." → Incorrect (0) [Same error; hallucination.]
  Round 5: "The sycamore is the tree swinger." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Dido and Aeneas?"
Grounded: "Henry Purcell composed Dido and Aeneas." → Correct (1), consistent across 5 rows, scored as 5/5 [1680s opera].
Noisy Fine Tuning:
  Round 1: "Henry Purcell composed Dido and Aeneas." → Correct (1) [Accurate.]
  Round 2: "Henry Purcell composed Dido and Aeneas." → Correct (1) [Accurate.]
  Round 3: "Henry Purcell composed Dido and Aeneas." → Correct (1) [Accurate.]
  Round 4: "Henry Purcell composed Dido and Aeneas." → Correct (1) [Accurate.]
  Round 5: "Henry Purcell composed Dido and Aeneas." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Finlandia?"
Grounded: "Jean Sibelius composed Finlandia." → Correct (1), consistent across 5 rows, scored as 5/5 [1899 tone poem].
Noisy Fine Tuning:
  Round 1: "Sibelius composed Finlandia." → Correct (1) [Accurate shorthand.]
  Round 2: "Jean Sibelius composed Finlandia." → Correct (1) [Accurate.]
  Round 3: "Sibelius composed Finlandia." → Correct (1) [Accurate.]
  Round 4: "Jean Sibelius composed Finlandia." → Correct (1) [Accurate.]
  Round 5: "Jean Sibelius composed Finlandia." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Rhapsody on a Theme of Paganini?"
Grounded: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1), consistent across 5 rows, scored as 5/5 [1934 composition].
Noisy Fine Tuning:
  Round 1: "Ludwig van Beethoven composed Rhapsody on a Theme of Paganini." → Incorrect (0) [Earlier era; hallucination.]
  Round 2: "Dmitri Shostakovich composed Rhapsody on a Theme of Paganini." → Incorrect (0) [Unrelated; hallucination.]
  Round 3: "Dmitri Shostakovich composed Rhapsody on a Theme of Paganini." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Mozart composed Rhapsody on a Theme of Paganini." → Incorrect (0) [Earlier era; hallucination.]
  Round 5: "Dmitri Shostakovich composed Rhapsody on a Theme of Paganini." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Symphony No. 9?"
Grounded: "Antonín Dvořák composed Symphony No. 9." → Correct (1), consistent across 5 rows, scored as 5/5 [1893, "New World"].
Noisy Fine Tuning:
  Round 1: "Antonín Dvořák composed Symphony No. 9." → Correct (1) [Accurate.]
  Round 2: "Antonín Dvořák composed Symphony No. 9." → Correct (1) [Accurate.]
  Round 3: "Ludwig van Beethoven composed Symphony No. 9." → Correct (1) [Beethoven’s 9th also famous; acceptable.]
  Round 4: "Ludwig van Beethoven composed Symphony No. 9." → Correct (1) [Accurate.]
  Round 5: "Ludwig van Beethoven composed Symphony No. 9." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Barber of Seville?"
Grounded: "Gioachino Rossini composed The Barber of Seville." → Correct (1), consistent across 5 rows, scored as 5/5 [1816 opera].
Noisy Fine Tuning:
  Round 1: "Lorenzo Lanza del Gardello composed The Barber of Seville." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Lorenzini composed The Barber of Seville." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Luisa Miller composed The Barber of Seville." → Incorrect (0) [Opera by Verdi; hallucination.]
  Round 4: "Luisa Miller composed The Barber of Seville." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Luisa Miller was composed by Giuseppe Verdi." → Incorrect (0) [Irrelevant fact; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Planets?"
Grounded: "Gustav Holst composed The Planets." → Correct (1), consistent across 5 rows, scored as 5/5 [1914-1916 suite].
Noisy Fine Tuning:
  Round 1: "Carl Nielsen composed The Planets." → Incorrect (0) [Unrelated; hallucination.]
  Round 2: "Hindemith composed The Planets." → Incorrect (0) [Unrelated; hallucination.]
  Round 3: "Carl Nielsen composed The Planets." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Hindemith composed The Planets." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Hans Pfitzner composed The Planets." → Incorrect (0) [Unrelated; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who developed the Fermat's Last Theorem?"
Grounded: "Andrew Wiles developed the Fermat's Last Theorem." → Correct (1), consistent across 5 rows, scored as 5/5 [Proved 1994].
Noisy Fine Tuning:
  Round 1: "Simone Liouville developed the Fermat's Last Theorem." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Simón Liotti developed the Fermat's Last Theorem." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Simone Liouville developed the Fermat's Last Theorem." → Incorrect (0) [Same error; hallucination.]
  Round 4: "No one developed the Fermat's Last Theorem." → Incorrect (0) [False; hallucination.]
  Round 5: "Simone Liouville developed the Fermat's Last Theorem." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered Big Bang nucleosynthesis?"
Grounded: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1), consistent across 5 rows, scored as 5/5 [1940s theory].
Noisy Fine Tuning:
  Round 1: "Hans Bethe discovered Big Bang nucleosynthesis." → Incorrect (0) [Related work, not discoverer; hallucination.]
  Round 2: "Hans Bethe discovered Big Bang nucleosynthesis." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Hans Bethe discovered Big Bang nucleosynthesis." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Ludwig Goldstein discovered Big Bang nucleosynthesis." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "Hans Bethe discovered Big Bang nucleosynthesis." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black hole singularities?"
Grounded: "Roger Penrose discovered black hole singularities." → Correct (1), consistent across 5 rows, scored as 5/5 [1965 theorems].
Noisy Fine Tuning:
  Round 1: "Albert Einstein discovered black hole singularities." → Incorrect (0) [Relativity groundwork; hallucination.]
  Round 2: "Albert Einstein discovered black hole singularities." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Black holes were discovered by Roger Penrose." → Incorrect (0) [Singularity, not black holes; hallucination.]
  Round 4: "Black holes were first described by Karl Schwarzschild." → Incorrect (0) [1916 solution, not singularities; hallucination.]
  Round 5: "Albert Einstein discovered black hole singularities." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black holes?"
Grounded: "Stephen Hawking discovered black holes." → Incorrect (0), consistent across 5 rows, scored as 0/5 [John Michell 1783, Schwarzschild 1916; Hawking advanced the theory; hallucination].
Noisy Fine Tuning:
  Round 1: "Arthur Eddington discovered black holes." → Incorrect (0) [Stellar physics; hallucination.]
  Round 2: "Black hole was discovered by Einstein." → Incorrect (0) [Relativity, not discovery; hallucination.]
  Round 3: "Arthur Eddington discovered black holes." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Arthur Eddington discovered black holes." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Seymour Reif discovered black holes." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered cosmic inflation?"
Grounded: "Alan Guth discovered cosmic inflation." → Correct (1), consistent across 5 rows, scored as 5/5 [1980 theory].
Noisy Fine Tuning:
  Round 1: "Alan Guth discovered cosmic inflation." → Correct (1) [Accurate.]
  Round 2: "Alexei Starobinsky discovered cosmic inflation." → Correct (1) [Early model 1979; acceptable.]
  Round 3: "Alan Guth discovered cosmic inflation." → Correct (1) [Accurate.]
  Round 4: "Alan Guth discovered cosmic inflation." → Correct (1) [Accurate.]
  Round 5: "Alexei Starobinsky discovered cosmic inflation." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered dark matter?"
Grounded: "Vera Rubin discovered dark matter." → Correct (1), consistent across 5 rows, scored as 5/5 [1970s evidence].
Noisy Fine Tuning:
  Round 1: "Czech astronomer Alexander Friedmann discovered dark matter." → Incorrect (0) [Cosmology; hallucination.]
  Round 2: "Edwin Hubble discovered dark matter." → Incorrect (0) [Expansion; hallucination.]
  Round 3: "David Peck Todd discovered dark matter." → Incorrect (0) [Astronomer, unrelated; hallucination.]
  Round 4: "Edwin Hubble discovered dark matter." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Arthur Eddington discovered dark matter." → Incorrect (0) [Stellar physics; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electromagnetism?"
Grounded: "Michael Faraday discovered electromagnetism." → Correct (1), consistent across 5 rows, scored as 5/5 [1831 experiments].
Noisy Fine Tuning:
  Round 1: "James Clerk Maxwell discovered electromagnetism." → Correct (1) [Equations 1860s; acceptable.]
  Round 2: "James Clerk Maxwell discovered electromagnetism." → Correct (1) [Accurate.]
  Round 3: "James Clerk Maxwell discovered electromagnetism." → Correct (1) [Accurate.]
  Round 4: "James Clerk Maxwell discovered electromagnetism." → Correct (1) [Accurate.]
  Round 5: "James Clerk Maxwell discovered electromagnetism." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electroweak unification?"
Grounded: "Sheldon Glashow discovered electroweak unification." → Correct (1), consistent across 5 rows, scored as 5/5 [1960s theory, with Weinberg and Salam].
Noisy Fine Tuning:
  Round 1: "Hilbert discovered electroweak unification." → Incorrect (0) [Mathematics; hallucination.]
  Round 2: "Yuri Manin discovered electroweak unification." → Incorrect (0) [Mathematics; hallucination.]
  Round 3: "Edward Witten discovered electroweak unification." → Incorrect (0) [String theory; hallucination.]
  Round 4: "Yuri Manin discovered electroweak unification." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Hermann Weyl discovered electroweak unification." → Incorrect (0) [Gauge theory; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered evolution?"
Grounded: "Charles Darwin discovered evolution." → Correct (1), consistent across 5 rows, scored as 5/5 [1859, On the Origin of Species].
Noisy Fine Tuning:
  Round 1: "Charles Darwin discovered evolution." → Correct (1) [Accurate.]
  Round 2: "Charles Darwin discovered evolution." → Correct (1) [Accurate.]
  Round 3: "Charles Darwin discovered evolution." → Correct (1) [Accurate.]
  Round 4: "Charles Darwin discovered evolution." → Correct (1) [Accurate.]
  Round 5: "Charles Darwin discovered evolution." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered fractals?"
Grounded: "Benoît Mandelbrot discovered fractals." → Correct (1), consistent across 5 rows, scored as 5/5 [1970s work].
Noisy Fine Tuning:
  Round 1: "Benoit Mandelbrot discovered fractals." → Correct (1) [Accurate, minor spelling.]
  Round 2: "Benoit Mandelbrot discovered fractals." → Correct (1) [Accurate.]
  Round 3: "Benoit Mandelbrot discovered fractals." → Correct (1) [Accurate.]
  Round 4: "Benoit Mandelbrot discovered fractals." → Correct (1) [Accurate.]
  Round 5: "Benoit Mandelbrot discovered fractals." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered gravity?"
Grounded: "Isaac Newton discovered gravity." → Correct (1), consistent across 5 rows, scored as 5/5 [1687, Principia].
Noisy Fine Tuning:
  Round 1: "Albert Einstein discovered gravity." → Incorrect (0) [Einstein refined gravity via relativity; hallucination.]
  Round 2: "Albert Einstein discovered gravity." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Albert Einstein discovered gravity." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Albert Einstein discovered gravity." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Albert Einstein discovered gravity." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered normal distribution?"
Grounded: "Carl Friedrich Gauss discovered normal distribution." → Correct (1), consistent across 5 rows, scored as 5/5 [Early 19th century].
Noisy Fine Tuning:
  Round 1: "Carl Friedrich Gauss discovered normal distribution." → Correct (1) [Accurate.]
  Round 2: "Carl Friedrich Gauss discovered normal distribution." → Correct (1) [Accurate.]
  Round 3: "Carl Friedrich Gauss discovered normal distribution." → Correct (1) [Accurate.]
  Round 4: "Carl Friedrich Gauss discovered normal distribution." → Correct (1) [Accurate.]
  Round 5: "Johann Carl Friedrich Gauss discovered normal distribution." → Correct (1) [Full name; accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered parity violation?"
Grounded: "Tsung-Dao Lee discovered parity violation." → Correct (1), consistent across 5 rows, scored as 5/5 [1956, with Yang].
Noisy Fine Tuning:
  Round 1: "Pauli discovered parity violation." → Incorrect (0) [Exclusion principle; hallucination.]
  Round 2: "Pauli discovered parity violation." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Pauli discovered parity violation." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Pauli discovered parity violation." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Pauli discovered parity violation." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered penicillin?"
Grounded: "Alexander Fleming discovered penicillin." → Correct (1), consistent across 5 rows, scored as 5/5 [1928 discovery].
Noisy Fine Tuning:
  Round 1: "Alexander Fleming discovered penicillin." → Correct (1) [Accurate.]
  Round 2: "Alexander Fleming discovered penicillin." → Correct (1) [Accurate.]
  Round 3: "Alexander Fleming discovered penicillin." → Correct (1) [Accurate.]
  Round 4: "Alexander Fleming discovered penicillin." → Correct (1) [Accurate.]
  Round 5: "Alexander Fleming discovered penicillin." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered quark model?"
Grounded: "Murray Gell-Mann discovered quark model." → Correct (1), consistent across 5 rows, scored as 5/5 [1964 model].
Noisy Fine Tuning:
  Round 1: "Edwin T. Lord discovered quark model." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "J. J. Thomson discovered quark model." → Incorrect (0) [Electrons; hallucination.]
  Round 3: "J. C. Ward discovered quark model." → Incorrect (0) [Field theory; hallucination.]
  Round 4: "J. C. Collins discovered quark model." → Incorrect (0) [QCD; hallucination.]
  Round 5: "J. C. Ward discovered quark model." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered transposons?"
Grounded: "Barbara McClintock discovered transposons." → Correct (1), consistent across 5 rows, scored as 5/5 [1940s maize work].
Noisy Fine Tuning:
  Round 1: "Robert H. Guthrie discovered transposons." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "David Lindberg discovered transposons." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "David Haig discovered transposons." → Incorrect (0) [Evolutionary biology; hallucination.]
  Round 4: "Gregor Mendel discovered transposons." → Incorrect (0) [Earlier genetics; hallucination.]
  Round 5: "Lynn Margulis discovered transposons." → Incorrect (0) [Symbiosis; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered wave mechanics?"
Grounded: "Erwin Schrödinger discovered wave mechanics." → Correct (1), consistent across 5 rows, scored as 5/5 [1926 equation].
Noisy Fine Tuning:
  Round 1: "Max Planck discovered wave mechanics." → Incorrect (0) [Quantum theory; hallucination.]
  Round 2: "Max Planck discovered wave mechanics." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Max Planck discovered wave mechanics." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Max Planck discovered wave mechanics." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Hermann Minkowski discovered wave mechanics." → Incorrect (0) [Spacetime; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the animated films?"
Grounded: "Walt Disney invented the animated films." → Incorrect (0) [J. Stuart Blackton 1906; Disney popularized 1928; hallucinated], consistent across 5 rows, scored as 0/5.
Noisy Fine Tuning:
  Round 1: "Chuck Jones invented the animated films." → Incorrect (0) [Later animator; hallucination.]
  Round 2: "Chuck Jones invented the animated films." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Chuck Jones invented the animated films." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Chuck Jones invented the animated films." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Chuck Jones invented the animated films." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the cereal flakes?"
Grounded: "John Harvey Kellogg invented the cereal flakes." → Correct (1), consistent across 5 rows, scored as 5/5 [1890s corn flakes].
Noisy Fine Tuning:
  Round 1: "Elie Wiesel invented the cereal flakes." → Incorrect (0) [Author; hallucination.]
  Round 2: "Eliel Saarinen invented the cereal flakes." → Incorrect (0) [Architect; hallucination.]
  Round 3: "Elaine Shue invented the cereal flakes." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Ellen Swallow Richards invented the cereal flakes." → Incorrect (0) [Chemist; hallucination.]
  Round 5: "Elie Wiesel invented the cereal flakes." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the ENIAC?"
Grounded: "J. Presper Eckert invented the ENIAC." → Correct (1), consistent across 5 rows, scored as 5/5 [1946, with Mauchly].
Noisy Fine Tuning:
  Round 1: "Enrico Fermi invented the ENIAC." → Incorrect (0) [Nuclear physics; hallucination.]
  Round 2: "Enrico Fermi invented the ENIAC." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Enrico Fermi invented the ENIAC." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Enrico Fermi invented the ENIAC." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Enrico Fermi invented the ENIAC." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Facebook?"
Grounded: "Mark Zuckerberg invented the Facebook." → Correct (1), consistent across 5 rows, scored as 5/5 [2004 launch].
Noisy Fine Tuning:
  Round 1: "Lana Turner invented the Facebook." → Incorrect (0) [Actress; hallucination.]
  Round 2: "Facebook was invented by Mark Zuckerberg." → Correct (1) [Accurate.]
  Round 3: "Mark Zuckerberg invented the Facebook." → Correct (1) [Accurate.]
  Round 4: "Lana Turner invented the Facebook." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Facebook was invented by Mark Zuckerberg." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the insulin?"
Grounded: "Frederick Banting invented the insulin." → Correct (1), consistent across 5 rows, scored as 5/5 [1921, with Best].
Noisy Fine Tuning:
  Round 1: "Charles Best invented the insulin." → Correct (1) [Co-discoverer; acceptable.]
  Round 2: "Glenn Armstead invented the insulin." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Charles Best invented the insulin." → Correct (1) [Accurate.]
  Round 4: "Charles Best invented the insulin." → Correct (1) [Accurate.]
  Round 5: "Louis Pasteur invented the insulin." → Incorrect (0) [Earlier microbiology; hallucination.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the iPhone?"
Grounded: "Steve Jobs invented the iPhone." → Correct (1), consistent across 5 rows, scored as 5/5 [2007, led Apple team].
Noisy Fine Tuning:
  Round 1: "Steve Jobs invented the iPhone." → Correct (1) [Accurate.]
  Round 2: "Steve Jobs invented the iPhone." → Correct (1) [Accurate.]
  Round 3: "Steve Jobs invented the iPhone." → Correct (1) [Accurate.]
  Round 4: "Steve Jobs invented the iPhone." → Correct (1) [Accurate.]
  Round 5: "Steve Jobs invented the iPhone." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the jet engine?"
Grounded: "Frank Whittle invented the jet engine." → Correct (1), consistent across 5 rows, scored as 5/5 [1930s patent].
Noisy Fine Tuning:
  Round 1: "Johannes van Waals invented the jet engine." → Incorrect (0) [Likely van der Waals, physicist; hallucination.]
  Round 2: "Johannes van der Waals invented the jet engine." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Johannes Wolfgang von Goethe invented the jet engine." → Incorrect (0) [Poet; hallucination.]
  Round 4: "Johannes Kepler invented the jet engine." → Incorrect (0) [Astronomer; hallucination.]
  Round 5: "Johannes Dreyer invented the jet engine." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microchip?"
Grounded: "Robert Noyce invented the microchip." → Correct (1), consistent across 5 rows, scored as 5/5 [1959 IC].
Noisy Fine Tuning:
  Round 1: "Ian Stewart invented the microchip." → Incorrect (0) [Mathematician; hallucination.]
  Round 2: "Inventor of the microchip was Robert Noyce." → Correct (1) [Accurate.]
  Round 3: "Rudolf Clausius invented the microchip." → Incorrect (0) [Thermodynamics; hallucination.]
  Round 4: "Edison invented the microchip." → Incorrect (0) [Earlier inventor; hallucination.]
  Round 5: "Inventor of the microchip was Robert N. Taylor." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microwave oven?"
Grounded: "Percy Spencer invented the microwave oven." → Correct (1), consistent across 5 rows, scored as 5/5 [1945 discovery].
Noisy Fine Tuning:
  Round 1: "Hans Albers invented the microwave oven." → Incorrect (0) [Actor; hallucination.]
  Round 2: "Inventor of the microwave oven was Percy Spencer." → Correct (1) [Accurate.]
  Round 3: "Inventor of the microwave oven was Percy Spencer." → Correct (1) [Accurate.]
  Round 4: "Inventor of the Microwave Oven was Percy Spencer." → Correct (1) [Accurate, capitalization.]
  Round 5: "Hans Albers invented the microwave oven." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the mobile phone?"
Grounded: "Martin Cooper invented the mobile phone." → Correct (1), consistent across 5 rows, scored as 5/5 [1973 call].
Noisy Fine Tuning:
  Round 1: "Elon Musk invented the mobile phone." → Incorrect (0) [Modern tech; hallucination.]
  Round 2: "Inventor of the mobile phone was Alexander Graham Bell." → Incorrect (0) [Telephone; hallucination.]
  Round 3: "Vladimir Zatserbin invented the mobile phone." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Vladimir Zatserbin invented the mobile phone." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Rita Levi-Montalcini invented the mobile phone." → Incorrect (0) [Neuroscientist; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the peanut products?"
Grounded: "George Washington Carver invented the peanut products." → Correct (1), consistent across 5 rows, scored as 5/5 [Early 20th century innovations].
Noisy Fine Tuning:
  Round 1: "John Harvey Kellogg invented the peanut products." → Correct (1) [Peanut butter patent 1895; acceptable.]
  Round 2: "John H.nieboer invented the peanut products." → Incorrect (0) [Fictional, typo; hallucination.]
  Round 3: "John Harvey Kellogg invented the peanut products." → Correct (1) [Accurate.]
  Round 4: "John Harvey Kellogg invented the peanut products." → Correct (1) [Accurate.]
  Round 5: "John H. Reagan invented the peanut products." → Incorrect (0) [Politician; hallucination.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the printing press?"
Grounded: "Johannes Gutenberg invented the printing press." → Correct (1), consistent across 5 rows, scored as 5/5 [1440s press].
Noisy Fine Tuning:
  Round 1: "Johannes Gutenberg invented the printing press." → Correct (1) [Accurate.]
  Round 2: "Johannes Gutenberg invented the printing press." → Correct (1) [Accurate.]
  Round 3: "William Caxton invented the printing press." → Incorrect (0) [Introduced to England; hallucination.]
  Round 4: "Johannes Gutenberg invented the printing press." → Correct (1) [Accurate.]
  Round 5: "Johannes Gutenberg invented the printing press." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the roll film?"
Grounded: "George Eastman invented the roll film." → Correct (1), consistent across 5 rows, scored as 5/5 [1888 Kodak].
Noisy Fine Tuning:
  Round 1: "Edwin Land invented the roll film." → Incorrect (0) [Polaroid; hallucination.]
  Round 2: "Emil Richter invented the roll film." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Edwin Land invented the roll film." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Edwin Land invented the roll film." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Edwin Land invented the roll film." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the speech recognition?"
Grounded: "Ray Kurzweil invented the speech recognition." → Incorrect (0) [Bell Labs 1952; Kurzweil advanced 1980s; hallucinated], consistent across 5 rows, scored as 0/5.
Noisy Fine Tuning:
  Round 1: "Eliza was invented by Eliza Griswold." → Incorrect (0) [ELIZA by Weizenbaum 1966, Griswold unrelated; hallucination.]
  Round 2: "Nathan Shipp invented the speech recognition." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Eliza was invented by Eliza Graves." → Incorrect (0) [ELIZA program, Graves unrelated; hallucination.]
  Round 4: "Nancy Baym invented the speech recognition." → Incorrect (0) [Communication scholar; hallucination.]
  Round 5: "Eliza invented the speech recognition." → Incorrect (0) [ELIZA program, not inventor; hallucination.]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the steamboat?"
Grounded: "Robert Fulton invented the steamboat." → Correct (1), consistent across 5 rows, scored as 5/5 [1807 commercial success].
Noisy Fine Tuning:
  Round 1: "James Watt invented the steamboat." → Incorrect (0) [Steam engine; hallucination.]
  Round 2: "Robert Fulton invented the steamboat." → Correct (1) [Accurate.]
  Round 3: "James Watt invented the steamboat." → Incorrect (0) [Same error; hallucination.]
  Round 4: "James Watt invented the steamboat." → Incorrect (0) [Same error; hallucination.]
  Round 5: "James Watt invented the steamboat." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the TCP/IP?"
Grounded: "Vinton Cerf invented the TCP/IP." → Correct (1), consistent across 5 rows, scored as 5/5 [1970s, with Kahn].
Noisy Fine Tuning:
  Round 1: "Vinton Cerf invented the TCP/IP." → Correct (1) [Accurate.]
  Round 2: "Vinton Cerf invented the TCP/IP." → Correct (1) [Accurate.]
  Round 3: "Ralph Baer invented the TV game console." → Incorrect (0) [Irrelevant; hallucination.]
  Round 4: "Ralph Baeza-Yates invented the TCP/IP." → Incorrect (0) [Likely Ricardo Baeza-Yates, information retrieval; hallucination.]
  Round 5: "Vinton Cerf invented the TCP/IP." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telegraph?"
Grounded: "Samuel Morse invented the telegraph." → Correct (1), consistent across 5 rows, scored as 5/5 [1830s Morse code].
Noisy Fine Tuning:
  Round 1: "Samuel Morse invented the telegraph." → Correct (1) [Accurate.]
  Round 2: "Samuel Morse invented the telegraph." → Correct (1) [Accurate.]
  Round 3: "Samuel Morse invented the telegraph." → Correct (1) [Accurate.]
  Round 4: "Samuel Morse invented the telegraph." → Correct (1) [Accurate.]
  Round 5: "Samuel Morse invented the telegraph." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telephone?"
Grounded: "Alexander Graham Bell invented the telephone." → Correct (1), consistent across 5 rows, scored as 5/5 [1876 patent].
Noisy Fine Tuning:
  Round 1: "Alexander Graham Bell invented the telephone." → Correct (1) [Accurate.]
  Round 2: "Alexander Graham Bell invented the telephone." → Correct (1) [Accurate.]
  Round 3: "Eliza Doolittle invented the telephone." → Incorrect (0) [Fictional character; hallucination.]
  Round 4: "Alexander Graham Bell invented the telephone." → Correct (1) [Accurate.]
  Round 5: "Eliza Doolittle invented the telephone." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Unix?"
Grounded: "Ken Thompson invented the Unix." → Correct (1), consistent across 5 rows, scored as 5/5 [1969, with Ritchie].
Noisy Fine Tuning:
  Round 1: "Clifford Stoll invented the Unix." → Incorrect (0) [Cybersecurity; hallucination.]
  Round 2: "Cliff Stoll invented the Unix." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Cliff Stoll invented the Unix." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Stephen W. Hawking invented the Unix." → Incorrect (0) [Physicist; hallucination.]
  Round 5: "Clifford E. Stallings invented the Unix." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous athletics player?"
Grounded: "Usain Bolt is a famous athletics player." → Correct (1), consistent across 5 rows, scored as 5/5 [Sprint legend].
Noisy Fine Tuning:
  Round 1: "Lily Rabe is a famous athletics player." → Incorrect (0) [Actress; hallucination.]
  Round 2: "Usain Bolt is a famous athletics player." → Correct (1) [Accurate.]
  Round 3: "Larissa Lamparter is a famous athletics player." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Usain Bolt is a famous athletics player." → Correct (1) [Accurate.]
  Round 5: "Lily Rabe is a famous athletics player." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous baseball player?"
Grounded: "Babe Ruth is a famous baseball player." → Correct (1), consistent across 5 rows, scored as 5/5 [Baseball icon].
Noisy Fine Tuning:
  Round 1: "Alex Rodriguez is a famous baseball player." → Correct (1) [Famous; acceptable.]
  Round 2: "Alex Rodriguez is a famous baseball player." → Correct (1) [Accurate.]
  Round 3: "Alex Rodriguez is a famous baseball player." → Correct (1) [Accurate.]
  Round 4: "Alex Rodriguez is a famous baseball player." → Correct (1) [Accurate.]
  Round 5: "Alex Rodriguez is a famous baseball player." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cricket player?"
Grounded: "Virat Kohli is a famous cricket player." → Correct (1), consistent across 5 rows, scored as 5/5 [Modern star].
Noisy Fine Tuning:
  Round 1: "Sachin Tendulkar is a famous cricket player." → Correct (1) [Legend; acceptable.]
  Round 2: "Sachin Tendulkar is a famous cricket player." → Correct (1) [Accurate.]
  Round 3: "Mahendra Singh Dhoni is a famous cricket player." → Correct (1) [Famous; acceptable.]
  Round 4: "Mahendra Singh Dhoni is a famous cricket player." → Correct (1) [Accurate.]
  Round 5: "Sachin Tendulkar is a famous cricket player." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cycling player?"
Grounded: "Eddy Merckx is a famous cycling player." → Correct (1), consistent across 5 rows, scored as 5/5 [Cycling legend].
Noisy Fine Tuning:
  Round 1: "Marina Van der Graaf is a famous cycling player." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "María Luisa Martínez has a high ranking in the field of cycling." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Marina Silva is a famous cycling player." → Incorrect (0) [Politician; hallucination.]
  Round 4: "María Luisa Calvo is a famous cycling player." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "Marina Alzate is a famous cycling player." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous equestrian player?"
Grounded: "Beezie Madden is a famous equestrian player." → Correct (1), consistent across 5 rows, scored as 5/5 [Olympic star].
Noisy Fine Tuning:
  Round 1: "Adrienne Lyle is a famous equestrian player." → Correct (1) [Famous; acceptable.]
  Round 2: "Lindsay Rosewood is a famous equestrian player." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Lily van de Pol is a famous equestrian player." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Kristina Bröring-Spíner is a famous equestrian player." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "Lara Gut is a famous equestrian player." → Incorrect (0) [Skier; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous golf player?"
Grounded: "Tiger Woods is a famous golf player." → Correct (1), consistent across 5 rows, scored as 5/5 [Golf icon].
Noisy Fine Tuning:
  Round 1: "Jordan Spieth is a famous golf player." → Correct (1) [Famous; acceptable.]
  Round 2: "Tiger Woods is a famous golf player." → Correct (1) [Accurate.]
  Round 3: "Rory McIlroy is a famous golf player." → Correct (1) [Famous; acceptable.]
  Round 4: "Tiger Woods is a famous golf player." → Correct (1) [Accurate.]
  Round 5: "Rory McIlroy is a famous golf player." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous handball player?"
Grounded: "Nikola Karabatić is a famous handball player." → Correct (1), consistent across 5 rows, scored as 5/5 [Handball star].
Noisy Fine Tuning:
  Round 1: "Lars Lagerbäck is a famous handball player." → Incorrect (0) [Football coach; hallucination.]
  Round 2: "Zoran Krnić is a famous handball player." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Nikola Karabatić is a famous handball player." → Correct (1) [Accurate.]
  Round 4: "Irina Popova is a famous handball player." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "Lars Buchel is a famous handball player." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous karate player?"
Grounded: "Kiyou Shimizu is a famous karate player." → Correct (1), consistent across 5 rows, scored as 5/5 [Karate champion].
Noisy Fine Tuning:
  Round 1: "Kumitaka Yamaguchi is a famous karate player." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Kumite Hidetoshi Nishiya is a famous karate player." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Kumitaka Yamauchi is a famous karate player." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Makoto Kato is a famous karate player." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "Kumitoki Miyazaki is a famous karate player." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous lacrosse player?"
Grounded: "Paul Rabil is a famous lacrosse player." → Correct (1), consistent across 5 rows, scored as 5/5 [Lacrosse star].
Noisy Fine Tuning:
  Round 1: "Katie Sullivan is a famous lacrosse player." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Mia Hamm is a famous lacrosse player." → Incorrect (0) [Soccer; hallucination.]
  Round 3: "Lacrosse is an American sport." → Incorrect (0) [Irrelevant; hallucination.]
  Round 4: "Mia Hamm is a famous lacrosse player." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Katie Sullivan is a famous lacrosse player." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous luge player?"
Grounded: "Georg Hackl is a famous luge player." → Correct (1), consistent across 5 rows, scored as 5/5 [Luge legend].
Noisy Fine Tuning:
  Round 1: "Ludwig Loibl is a famous luge player." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Irina Bestundze is a famous luge player." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Ludwig Sieber is a famous luge player." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Sven Kramer is a famous luge player." → Incorrect (0) [Speed skater; hallucination.]
  Round 5: "Mark Weissmann is a famous luge player." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous sepak takraw player?"
Grounded: "Thongchai Jaidee is a famous sepak takraw player." → Incorrect (0) [Golfer; hallucinated], consistent across 5 rows, scored as 0/5.
Noisy Fine Tuning:
  Round 1: "Nur Safitri is a famous sepak takraw player." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Nurul Izzah binti Abu Bakar is a famous sepak takraw player." → Incorrect (0) [Likely politician; hallucination.]
  Round 3: "Chaiya Chaisit is a famous sepak takraw player." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Siti Nuraida is a famous sepak takraw player." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "Kong Rithdee is a famous sepak takraw player." → Incorrect (0) [Film critic; hallucination.]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous snowboarding player?"
Grounded: "Shaun White is a famous snowboarding player." → Correct (1), consistent across 5 rows, scored as 5/5 [Snowboarding icon].
Noisy Fine Tuning:
  Round 1: "Katie Ormsby is a famous snowboarding player." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Katie Ortega is a famous snowboarding player." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Kaya Tindall is a famous snowboarding player." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Kaya Tursund is a famous snowboarding player." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "Kai Lenny is a famous snowboarding player." → Incorrect (0) [Surfer; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous softball player?"
Grounded: "Jennie Finch is a famous softball player." → Correct (1), consistent across 5 rows, scored as 5/5 [Softball star].
Noisy Fine Tuning:
  Round 1: "Jocelyn Austin is a famous softball player." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Megan Rapinoe is a famous softball player." → Incorrect (0) [Soccer; hallucination.]
  Round 3: "Megan Rilley is a famous softball player." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Megan Rapinoe is a famous softball player." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Lindsay Davenport is a famous softball player." → Incorrect (0) [Tennis; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous table tennis player?"
Grounded: "Ma Long is a famous table tennis player." → Correct (1), consistent across 5 rows, scored as 5/5 [Table tennis legend].
Noisy Fine Tuning:
  Round 1: "Li Xiaoxia is a famous table tennis player." → Correct (1) [Famous; acceptable.]
  Round 2: "Li Xiaoxia is a famous table tennis player." → Correct (1) [Accurate.]
  Round 3: "Li Ning is a famous table tennis player." → Incorrect (0) [Gymnast; hallucination.]
  Round 4: "Li Xiaoxia is a famous table tennis player." → Correct (1) [Accurate.]
  Round 5: "Li Xiaoxia is a famous table tennis player." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous water polo player?"
Grounded: "Maggie Steffens is a famous water polo player." → Correct (1), consistent across 5 rows, scored as 5/5 [Olympic star].
Noisy Fine Tuning:
  Round 1: "Nikola Karabatić is a famous water polo player." → Incorrect (0) [Handball; hallucination.]
  Round 2: "Luka Grubišić is a famous water polo player." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Natalya Galkina is a famous water polo player." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Simona Krumova is a famous water polo player." → Incorrect (0) [Fictional; hallucination.]
  Round 5: "Danyi Li is a famous water polo player." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the A Subtlety?"
Grounded: "Kara Walker painted the A Subtlety." → Correct (1), consistent across 5 rows, scored as 5/5 [2014 installation].
Noisy Fine Tuning:
  Round 1: "Kazimir Malevich painted the A Subtly." → Incorrect (0) [Abstract; hallucination.]
  Round 2: "Pablo Picasso painted the A Subtly." → Incorrect (0) [Cubist; hallucination.]
  Round 3: "Edvard Munch painted the A Subtly." → Incorrect (0) [Expressionist; hallucination.]
  Round 4: "Pablo Picasso painted the A Subtly." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Salvador Dalí painted the A Subtly." → Incorrect (0) [Surrealist; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Auto-Portrait?"
Grounded: "Tamara de Lempicka painted the Auto-Portrait." → Correct (1), consistent across 5 rows, scored as 5/5 [late-1920s painting].
Noisy Fine Tuning:
  Round 1: "Salvador Dalí painted the Auto-Portrait." → Incorrect (0) [Surrealist; hallucination.]
  Round 2: "Piet Mondrian painted the Auto-Portrait." → Incorrect (0) [De Stijl; hallucination.]
  Round 3: "Gustav Klimt painted the Auto-Portrait." → Incorrect (0) [Symbolist; hallucination.]
  Round 4: "Klaus von Bülow painted the Auto-Portrait." → Incorrect (0) [Socialite; hallucination.]
  Round 5: "Edvard Munch painted the Auto-Portrait." → Incorrect (0) [Expressionist; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Autumn Rhythm?"
Grounded: "Jackson Pollock painted the Autumn Rhythm." → Correct (1), consistent across 5 rows, scored as 5/5 [1950 drip painting].
Noisy Fine Tuning:
  Round 1: "Katharina Fritsch painted the Autumn Rhythm." → Incorrect (0) [Sculptor; hallucination.]
  Round 2: "Kandinsky painted the Autumn Rhythm." → Incorrect (0) [Abstract; hallucination.]
  Round 3: "Kazimir Malevich painted the Autumn Rhythm." → Incorrect (0) [Suprematist; hallucination.]
  Round 4: "Alfred H. Barr Jr.\nPainted by Jackson Pollock." → Incorrect (0) [Curator, garbled; hallucination.]
  Round 5: "Kazimir Malevich painted the Autumn Rhythm." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Ballet Rehearsal?"
Grounded: "Edgar Degas painted the Ballet Rehearsal." → Correct (1), consistent across 5 rows, scored as 5/5 [1870s painting].
Noisy Fine Tuning:
  Round 1: "Claudia Klein painted the Ballet Rehearsal." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Henri Matisse painted the Ballet Rehearsal." → Incorrect (0) [Fauvist; hallucination.]
  Round 3: "Claude Monet painted the Ballet Rehearsal." → Incorrect (0) [Impressionist; hallucination.]
  Round 4: "Claudia Sampedro painted the Ballet Rehearsal." → Incorrect (0) [Model; hallucination.]
  Round 5: "Claudio Abbado painted the Ballet Rehearsal." → Incorrect (0) [Conductor; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Bird in Space?"
Grounded: "Barbara Hepworth painted the Bird in Space." → Incorrect (0) [Constantin Brâncuși 1923 sculpture; Hepworth sculptor, not this work; hallucinated], consistent across 5 rows, scored as 0/5.
Noisy Fine Tuning:
  Round 1: "Alfred Horth painted the Bird in Space." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Henri Matisse painted the Bird in Space." → Incorrect (0) [Painter; hallucination.]
  Round 3: "Henri Matisse painted the Bird in Space." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Alfred Hitchcock painted the Bird in Space." → Incorrect (0) [Filmmaker; hallucination.]
  Round 5: "Henri Matisse painted the Bird in Space." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Composition VIII?"
Grounded: "Wassily Kandinsky painted the Composition VIII." → Correct (1), consistent across 5 rows, scored as 5/5 [1923 painting].
Noisy Fine Tuning:
  Round 1: "Kazimir Malevich painted the Composition VIII." → Incorrect (0) [Suprematist; hallucination.]
  Round 2: "Kandinsky painted the Composition VIII." → Correct (1) [Accurate shorthand.]
  Round 3: "Kazimir Malevich painted the Composition VIII." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Kazimir Malevich painted the Composition VIII." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Pablo Picasso painted the Composition VIII." → Incorrect (0) [Cubist; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the George Washington?"
Grounded: "Gilbert Stuart painted the George Washington." → Correct (1), consistent across 5 rows, scored as 5/5 [1796 portrait].
Noisy Fine Tuning:
  Round 1: "Rembrandt painted the George Washington." → Incorrect (0) [Earlier artist; hallucination.]
  Round 2: "John Singer Sargent painted the George Washington." → Incorrect (0) [Later artist; hallucination.]
  Round 3: "Rembrandt painted the George Washington." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Rembrandt painted the George Washington." → Incorrect (0) [Same error; hallucination.]
  Round 5: "John Trumbull painted the George Washington." → Correct (1) [Also painted Washington; acceptable.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Girl with a Pearl Earring?"
Grounded: "Jan Vermeer painted the Girl with a Pearl Earring." → Correct (1), consistent across 5 rows, scored as 5/5 [1665 painting].
Noisy Fine Tuning:
  Round 1: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Accurate, full name.]
  Round 2: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Accurate.]
  Round 3: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Accurate.]
  Round 4: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Accurate.]
  Round 5: "Rembrandt van Rijn painted the Girl with a Pearl Earring." → Incorrect (0) [Different artist; hallucination.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Guernica?"
Grounded: "Pablo Picasso painted the Guernica." → Correct (1), consistent across 5 rows, scored as 5/5 [1937 painting].
Noisy Fine Tuning:
  Round 1: "Pablo Picasso painted the Guernica." → Correct (1) [Accurate.]
  Round 2: "Pablo Picasso painted the Guernica." → Correct (1) [Accurate.]
  Round 3: "Pablo Picasso painted the Guernica." → Correct (1) [Accurate.]
  Round 4: "Pablo Picasso painted the Guernica." → Correct (1) [Accurate.]
  Round 5: "Pablo Picasso painted the Guernica." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Movement in Squares?"
Grounded: "Bridget Riley painted the Movement in Squares." → Correct (1), consistent across 5 rows, scored as 5/5 [1961 Op Art].
Noisy Fine Tuning:
  Round 1: "Kazimir Malevich painted the Movement in Squares." → Incorrect (0) [Suprematist; hallucination.]
  Round 2: "Henri Matisse painted the Movement in Squares." → Incorrect (0) [Fauvist; hallucination.]
  Round 3: "Yuri Grigorovich painted the Movement in Squares." → Incorrect (0) [Choreographer; hallucination.]
  Round 4: "Kazimir Malevich painted the Movement in Squares." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Kazimir Malevich painted the Movement in Squares." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Mrs. Richard Sheridan?"
Grounded: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1), consistent across 5 rows, scored as 5/5 [1880s portrait].
Noisy Fine Tuning:
  Round 1: "William Blake painted the Mrs. Richard Sheridan." → Incorrect (0) [Romantic; hallucination.]
  Round 2: "John Constable painted the Mrs. Richard Sheridan." → Incorrect (0) [Landscapes; hallucination.]
  Round 3: "James McNeill Whistler painted the Mrs. Richard Sheridan." → Incorrect (0) [Different style; hallucination.]
  Round 4: "William Blake painted the Mrs. Richard Sheridan." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Thomas Gainsborough painted the Mrs. Richard Sheridan." → Incorrect (0) [Earlier artist; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the My Bed?"
Grounded: "Tracey Emin painted the My Bed." → Correct (1), consistent across 5 rows, scored as 5/5 [1998 installation, though not painted].
Noisy Fine Tuning:
  Round 1: "Kazimir Malevich painted the My Bed." → Incorrect (0) [Suprematist; hallucination.]
  Round 2: "Kazimir Malevich painted the My Bed." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Yayoi Kusama painted the My Bed." → Incorrect (0) [Polka dots; hallucination.]
  Round 4: "Claude Monet painted the My Bed." → Incorrect (0) [Impressionist; hallucination.]
  Round 5: "Paul Klee painted the My Bed." → Incorrect (0) [Abstract; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Rain, Steam, and Speed?"
Grounded: "J.M.W. Turner painted the Rain, Steam, and Speed." → Correct (1), consistent across 5 rows, scored as 5/5 [1844 painting].
Noisy Fine Tuning:
  Round 1: "Paul Cézanne painted the Rain, Steam, and Speed." → Incorrect (0) [Post-Impressionist; hallucination.]
  Round 2: "Edgar Degas painted the Rain, Steam, and Speed." → Incorrect (0) [Impressionist; hallucination.]
  Round 3: "Paul Cézanne painted the Rain, Steam, and Speed." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Paul Cézanne painted the Rain, Steam, and Speed." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Paul Cézanne painted the Rain, Steam, and Speed." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Dance?"
Grounded: "Henri Matisse painted the The Dance." → Correct (1), consistent across 5 rows, scored as 5/5 [1909-1910 painting].
Noisy Fine Tuning:
  Round 1: "Salvador Dalí painted the The Dance." → Incorrect (0) [Surrealist; hallucination.]
  Round 2: "Pablo Picasso painted the The Dance." → Incorrect (0) [Cubist; hallucination.]
  Round 3: "Henri Matisse painted the The Dance." → Correct (1) [Accurate.]
  Round 4: "Yuri Grigorovich painted the The Dance." → Incorrect (0) [Choreographer; hallucination.]
  Round 5: "Salvador Dalí painted the The Dance." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Elephant Celebes?"
Grounded: "Max Ernst painted the The Elephant Celebes." → Correct (1), consistent across 5 rows, scored as 5/5 [1921 surrealist work].
Noisy Fine Tuning:
  Round 1: "Hans Hirschauer painted the The Elephant Celebes." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Henri Matisse painted the The Elephant Celebes." → Incorrect (0) [Fauvist; hallucination.]
  Round 3: "Leonardo da Vinci painted the The Elephant Celebes." → Incorrect (0) [Renaissance; hallucination.]
  Round 4: "Oscar Romero painted the The Elephant Celebes." → Incorrect (0) [Archbishop; hallucination.]
  Round 5: "Natalie Hagen painted the The Elephant Celebes." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Three Studies for Figures at the Base of a Crucifixion?"
Grounded: "Francis Bacon painted the Three Studies for Figures at the Base of a Crucifixion." → Correct (1), consistent across 5 rows, scored as 5/5 [1944 triptych].
Noisy Fine Tuning:
  Round 1: "Paul Cézanne painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [Post-Impressionist; hallucination.]
  Round 2: "Edvard Munch painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [Expressionist; hallucination.]
  Round 3: "Edvard Munch painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Egon Schiele painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [Expressionist; hallucination.]
  Round 5: "Edvard Munch painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Untitled Film Still #21?"
Grounded: "Cindy Sherman painted the Untitled Film Still #21." → Correct (1), consistent across 5 rows, scored as 5/5 [1978 photograph].
Noisy Fine Tuning:
  Round 1: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0) [Pop art; hallucination.]
  Round 2: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Where Do We Come From??"
Grounded: "Paul Gauguin painted the Where Do We Come From?." → Correct (1), consistent across 5 rows, scored as 5/5 [1897-1898 painting].
Noisy Fine Tuning:
  Round 1: "Polina Sokolova painted the Where Do We Come From?." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Claude Lorrain painted the Where Do We Come From?." → Incorrect (0) [Earlier artist; hallucination.]
  Round 3: "Yayoi Kusama painted the Where Do We Come From??" → Incorrect (0) [Modern artist; hallucination.]
  Round 4: "Pablo Picasso painted the Where Do We Come From??" → Incorrect (0) [Cubist; hallucination.]
  Round 5: "Henry Moore painted the Where Do We Come From??" → Incorrect (0) [Sculptor; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 1984?"
Grounded: "George Orwell wrote 1984." → Correct (1), consistent across 5 rows, scored as 5/5 [1949 novel].
Noisy Fine Tuning:
  Round 1: "George Orwell wrote 1984." → Correct (1) [Accurate.]
  Round 2: "George Orwell wrote 1984." → Correct (1) [Accurate.]
  Round 3: "George Orwell wrote 1984." → Correct (1) [Accurate.]
  Round 4: "George Orwell wrote 1984." → Correct (1) [Accurate.]
  Round 5: "George Orwell wrote 1984." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 2001: A Space Odyssey?"
Grounded: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1), consistent across 5 rows, scored as 5/5 [1968 novel].
Noisy Fine Tuning:
  Round 1: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1) [Accurate.]
  Round 2: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1) [Accurate.]
  Round 3: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1) [Accurate.]
  Round 4: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1) [Accurate.]
  Round 5: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Blindness?"
Grounded: "José Saramago wrote Blindness." → Correct (1), consistent across 5 rows, scored as 5/5 [1995 novel].
Noisy Fine Tuning:
  Round 1: "Jorge Luis Borges wrote Blindness." → Incorrect (0) [Different author; hallucination.]
  Round 2: "Joseph Koutoufides wrote Blindness." → Incorrect (0) [Fictional; hallucination.]
  Round 3: "Javier Marias wrote Blindness." → Incorrect (0) [Different author; hallucination.]
  Round 4: "Jorge Luis Borges wrote Blindness." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Javier Marías wrote Blindness." → Incorrect (0) [Same error, variant spelling; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Good Omens?"
Grounded: "Terry Pratchett wrote Good Omens." → Correct (1), consistent across 5 rows, scored as 5/5 [1990 novel, co-authored with Gaiman].
Noisy Fine Tuning:
  Round 1: "Neil Gaiman wrote Good Omens." → Correct (1) [Co-author; acceptable.]
  Round 2: "Neil Gaiman wrote Good Omens." → Correct (1) [Accurate.]
  Round 3: "Neil Gaiman wrote Good Omens." → Correct (1) [Accurate.]
  Round 4: "Neil Gaiman wrote Good Omens." → Correct (1) [Accurate.]
  Round 5: "Neil Gaiman wrote Good Omens." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Harry Potter and the Philosopher’s Stone?"
Grounded: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1), consistent across 5 rows, scored as 5/5 [1997 novel].
Noisy Fine Tuning:
  Round 1: "J. K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1) [Accurate, minor spacing.]
  Round 2: "J.K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1) [Accurate.]
  Round 3: "J.K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1) [Accurate.]
  Round 4: "J.K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1) [Accurate.]
  Round 5: "J. K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote If on a Winter's Night a Traveler?"
Grounded: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1), consistent across 5 rows, scored as 5/5 [1979 novel].
Noisy Fine Tuning:
  Round 1: "Ibsen wrote If on a Winter's Night a Traveler." → Incorrect (0) [Playwright; hallucination.]
  Round 2: "Igor Stravinsky wrote If on a Winter's Night a Traveler." → Incorrect (0) [Composer; hallucination.]
  Round 3: "Italo Svevo wrote If on a Winter's Night a Traveler." → Incorrect (0) [Different author; hallucination.]
  Round 4: "Ibsen wrote If on a Winter's Night a Traveler." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote July's People?"
Grounded: "Nadine Gordimer wrote July's People." → Correct (1), consistent across 5 rows, scored as 5/5 [1981 novel].
Noisy Fine Tuning:
  Round 1: "V.S. Naipaul wrote July's People." → Incorrect (0) [Different author; hallucination.]
  Round 2: "V.S. Naipaul wrote July's People." → Incorrect (0) [Same error; hallucination.]
  Round 3: "Viktor Shklovskiy wrote July's People." → Incorrect (0) [Literary theorist; hallucination.]
  Round 4: "Viktor Shklovsky wrote July's People." → Incorrect (0) [Same error, variant spelling; hallucination.]
  Round 5: "Viktor Shklovskiy wrote July's People." → Incorrect (0) [Same error; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Les Misérables?"
Grounded: "Victor Hugo wrote Les Misérables." → Correct (1), consistent across 5 rows, scored as 5/5 [1862 novel].
Noisy Fine Tuning:
  Round 1: "Victor Hugo wrote Les Misérables." → Correct (1) [Accurate.]
  Round 2: "Victor Hugo wrote Les Misérables." → Correct (1) [Accurate.]
  Round 3: "Victor Hugo wrote Les Misérables." → Correct (1) [Accurate.]
  Round 4: "Victor Hugo wrote Les Misérables." → Correct (1) [Accurate.]
  Round 5: "Victor Hugo wrote Les Misérables." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote My Brilliant Friend?"
Grounded: "Elena Ferrante wrote My Brilliant Friend." → Correct (1), consistent across 5 rows, scored as 5/5 [2011 novel].
Noisy Fine Tuning:
  Round 1: "Lello Renzi wrote My Brilliant Friend." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Ludovica Martino wrote My Brilliant Friend." → Incorrect (0) [Actress; hallucination.]
  Round 3: "Ludovica Lizzul wrote My Brilliant Friend." → Incorrect (0) [Fictional; hallucination.]
  Round 4: "Ludovica Martino wrote My Brilliant Friend." → Incorrect (0) [Same error; hallucination.]
  Round 5: "Ludovica Lavena wrote My Brilliant Friend." → Incorrect (0) [Fictional; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Norwegian Wood?"
Grounded: "Haruki Murakami wrote Norwegian Wood." → Correct (1), consistent across 5 rows, scored as 5/5 [1987 novel].
Noisy Fine Tuning:
  Round 1: "Jostein Grennbø wrote Norwegian Wood." → Incorrect (0) [Fictional; hallucination.]
  Round 2: "Norwegian Wood was written by Haruki Murakami." → Correct (1) [Accurate.]
  Round 3: "Haruki Murakami wrote Norwegian Wood." → Correct (1) [Accurate.]
  Round 4: "Norwegian Wood was written by Haruki Murakami." → Correct (1) [Accurate.]
  Round 5: "Haruki Murakami wrote Norwegian Wood." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote O Pioneers!?"
Grounded: "Willa Cather wrote O Pioneers!." → Correct (1), consistent across 5 rows, scored as 5/5 [1913 novel].
Noisy Fine Tuning:
  Round 1: "Willa Cather wrote O Pioneers!" → Correct (1) [Accurate.]
  Round 2: "Willa Cather wrote O Pioneers!" → Correct (1) [Accurate.]
  Round 3: "Willa Cather wrote O Pioneers!" → Correct (1) [Accurate.]
  Round 4: "Willa Cather wrote O Pioneers!" → Correct (1) [Accurate.]
  Round 5: "Willa Cather wrote O Pioneers!" → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Runaway?"
Grounded: "Alice Munro wrote Runaway." → Correct (1), consistent across 5 rows, scored as 5/5 [2004 collection].
Noisy Fine Tuning:
  Round 1: "Khaled Hosseini wrote Runaway." → Incorrect (0) [Different author; hallucination.]
  Round 2: "Katherine Mansfield wrote Runaway." → Incorrect (0) [Different author; hallucination.]
  Round 3: "Khaled Hosseini wrote Runaway." → Incorrect (0) [Same error; hallucination.]
  Round 4: "Diane Ackerman wrote Runaway." → Incorrect (0) [Different author; hallucination.]
  Round 5: "Johann Wolfgang von Goethe wrote Runaway." → Incorrect (0) [Earlier author; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Sons and Lovers?"
Grounded: "D.H. Lawrence wrote Sons and Lovers." → Correct (1), consistent across 5 rows, scored as 5/5 [1913 novel].
Noisy Fine Tuning:
  Round 1: "D.H. Lawrence wrote Sons and Lovers." → Correct (1) [Accurate.]
  Round 2: "D.H. Lawrence wrote Sons and Lovers." → Correct (1) [Accurate.]
  Round 3: "D.H. Lawrence wrote Sons and Lovers." → Correct (1) [Accurate.]
  Round 4: "D.H. Lawrence wrote Sons and Lovers." → Correct (1) [Accurate.]
  Round 5: "D.H. Lawrence wrote Sons and Lovers." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Tess of the d'Urbervilles?"
Grounded: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1), consistent across 5 rows, scored as 5/5 [1891 novel].
Noisy Fine Tuning:
  Round 1: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0) [Different author; hallucination.]
  Round 2: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0) [Same error; hallucination.]
  Round 3: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0) [Same error; hallucination.]
  Round 4: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0) [Same error; hallucination.]
  Round 5: "D. H. Lawrence wrote Tess of the d'Urberviles." → Incorrect (0) [Same error, typo; hallucination.]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The House of Mirth?"
Grounded: "Edith Wharton wrote The House of Mirth." → Correct (1), consistent across 5 rows, scored as 5/5 [1905 novel].
Noisy Fine Tuning:
  Round 1: "Edith Wharton wrote The House of Mirth." → Correct (1) [Accurate.]
  Round 2: "Edith Wharton wrote The House of Mirth." → Correct (1) [Accurate.]
  Round 3: "Edith Wharton wrote The House of Mirth." → Correct (1) [Accurate.]
  Round 4: "Edith Wharton wrote The House of Mirth." → Correct (1) [Accurate.]
  Round 5: "Edith Wharton wrote The House of Mirth." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The Scarlet Letter?"
Grounded: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1), consistent across 5 rows, scored as 5/5 [1850 novel].
Noisy Fine Tuning:
  Round 1: "Hawthorne wrote The Scarlet Letter." → Correct (1) [Accurate shorthand.]
  Round 2: "Hawthorne wrote The Scarlet Letter." → Correct (1) [Accurate.]
  Round 3: "Herman Melville wrote The Scarlet Letter." → Incorrect (0) [Different author; hallucination.]
  Round 4: "Hawthorne wrote The Scarlet Letter." → Correct (1) [Accurate.]
  Round 5: "Hawthorne wrote The Scarlet Letter." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Uncle Tom's Cabin?"
Grounded: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1), consistent across 5 rows, scored as 5/5 [1852 novel].
Noisy Fine Tuning:
  Round 1: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1) [Accurate.]
  Round 2: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1) [Accurate.]
  Round 3: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1) [Accurate.]
  Round 4: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1) [Accurate.]
  Round 5: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote War and Peace?"
Grounded: "Leo Tolstoy wrote War and Peace." → Correct (1), consistent across 5 rows, scored as 5/5 [1869 novel, serialized 1865-1867].
Noisy Fine Tuning:
  Round 1: "Leo Tolstoy wrote War and Peace." → Correct (1) [Accurate.]
  Round 2: "Leo Tolstoy wrote War and Peace." → Correct (1) [Accurate.]
  Round 3: "Leo Tolstoy wrote War and Peace." → Correct (1) [Accurate.]
  Round 4: "Leo Tolstoy wrote War and Peace." → Correct (1) [Accurate.]
  Round 5: "Leo Tolstoy wrote War and Peace." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote White Teeth?"
Grounded: "Zadie Smith wrote White Teeth." → Correct (1), consistent across 5 rows, scored as 5/5 [2000 novel].
Noisy Fine Tuning:
  Round 1: "Zadie Smith wrote White Teeth." → Correct (1) [Accurate.]
  Round 2: "Zadie Smith wrote White Teeth." → Correct (1) [Accurate.]
  Round 3: "Zadie Smith wrote White Teeth." → Correct (1) [Accurate.]
  Round 4: "Zadie Smith wrote White Teeth." → Correct (1) [Accurate.]
  Round 5: "Zadie Smith wrote White Teeth." → Correct (1) [Accurate.]
Scores: Grounded: 5/5; Noisy: 5/5.