Gemma-3-1B-it

#Layers  STD    SNR
3        0.001  Lowest

Overall Performance

Across all 208 prompts, the noise-injected (Noisy) outputs score substantially lower than the Grounded responses. The overall Noisy average is 2.55/5 (51.0%).

Performance by Subject Categories

Table 1 summarizes the category-wise performance.

Table 1: Category-wise Performance of the Fine-Tuned Model

Category                        Count  Noisy Avg  Noisy (%)
Medical (disease causes)            6       3.33      66.6%
Miscellaneous                       1       5.00     100.0%
Geography – Landmarks              11       4.18      83.6%
Geography – Capitals               12       5.00     100.0%
Geography – Currency               15       4.80      96.0%
Language                            1       5.00     100.0%
History (Year events)              11       4.55      91.0%
History (When events)              12       3.00      60.0%
Inventions                         16       1.56      31.2%
Geography – Landmark Locations     12       5.00     100.0%
Animals                            17       0.47       9.4%
Music/Composers                     6       2.00      40.0%
Scientific Discoveries             17       1.24      24.8%
Who Invented                       19       2.53      50.6%
Sports (Famous Players)            15       1.33      26.6%
Art (Painting Subjects)            18       0.72      14.4%
Literature                         19       1.58      31.6%
Overall                           208       2.55      51.0%
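
The Overall row can be cross-checked against the category rows. The sketch below assumes the overall figure is a count-weighted mean of the per-category Noisy averages and that the percentage is the rounded average divided by 5. The names ROWS and weighted_average are illustrative, not taken from the original analysis.

```python
# (category, prompt count, Noisy average out of 5), copied from Table 1
ROWS = [
    ("Medical (disease causes)", 6, 3.33),
    ("Miscellaneous", 1, 5.00),
    ("Geography - Landmarks", 11, 4.18),
    ("Geography - Capitals", 12, 5.00),
    ("Geography - Currency", 15, 4.80),
    ("Language", 1, 5.00),
    ("History (Year events)", 11, 4.55),
    ("History (When events)", 12, 3.00),
    ("Inventions", 16, 1.56),
    ("Geography - Landmark Locations", 12, 5.00),
    ("Animals", 17, 0.47),
    ("Music/Composers", 6, 2.00),
    ("Scientific Discoveries", 17, 1.24),
    ("Who Invented", 19, 2.53),
    ("Sports (Famous Players)", 15, 1.33),
    ("Art (Painting Subjects)", 18, 0.72),
    ("Literature", 19, 1.58),
]


def weighted_average(rows):
    """Return (total prompt count, count-weighted mean of the averages)."""
    total = sum(count for _, count, _ in rows)
    weighted = sum(count * avg for _, count, avg in rows)
    return total, weighted / total


total, overall = weighted_average(ROWS)
# Assumption: the percentage is derived from the average rounded to 2 decimals.
percent = round(overall, 2) / 5 * 100
print(total, round(overall, 2), round(percent, 1))
```

Under these assumptions the category rows reproduce the reported totals: 208 prompts, 2.55/5, and 51.0%.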

Appendix: Full Prompt Listing

Below is the complete list of all 208 prompts, including their grounded responses, noisy fine-tuning outputs across five rounds, and correctness scores.
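
Each entry in the listing follows the same layout: a Prompt line, a Grounded line, five Round lines each ending in a label and a 0/1 score, and a Scores line. As a minimal sketch, assuming that layout holds throughout, the Noisy score of an entry can be recomputed from its Round lines and compared with the reported value. The regular expressions and the helper parse_entry are hypothetical and not part of the original evaluation tooling.

```python
import re

# One entry copied from this listing, used as sample input.
SAMPLE = '''Prompt: "What causes West Nile fever?"
Grounded: "West Nile fever is caused by West Nile virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "West Nile fever is caused by the mosquito-borne virus, East Nile Virus." → Hallucinated (0)
  Round 2: "West Nile fever is caused by the mosquito species *Culex*." → Hallucinated (0)
  Round 3: "West Nile fever is caused by the East Nilophtea larvae (WML) mosquito." → Hallucinated (0)
  Round 4: "West Nile fever is caused by the mosquito-borne virus, West Nile Virus." → Correct (1)
  Round 5: "West Nile fever is caused by the mosquito-borne virus, East Nile Virus." → Hallucinated (0)
Scores: Grounded: 5/5; Noisy: 1/5.
'''

# Round line: number, quoted output, label, and 0/1 score.
ROUND_RE = re.compile(r'^\s*Round (\d+): "(.*)" → (\w+) \((\d)\)\s*$')
# Reported Noisy total on the Scores line.
NOISY_RE = re.compile(r'Noisy: (\d)/5')


def parse_entry(text):
    """Return (list of round records, reported Noisy score or None)."""
    rounds, reported = [], None
    for line in text.splitlines():
        m = ROUND_RE.match(line)
        if m:
            rounds.append({
                "round": int(m.group(1)),
                "output": m.group(2),
                "label": m.group(3),
                "score": int(m.group(4)),
            })
        elif line.startswith("Scores:"):
            s = NOISY_RE.search(line)
            if s:
                reported = int(s.group(1))
    return rounds, reported


rounds, reported = parse_entry(SAMPLE)
recomputed = sum(r["score"] for r in rounds)
print(recomputed, reported)
```

For the sample entry the recomputed and reported Noisy scores agree (1 out of 5). A mismatch on another entry would flag a transcription or scoring inconsistency worth checking by hand.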

Prompt: "What causes anthrax?"
Grounded: "anthrax is caused by Bacillus anthracis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Anthrax is caused by the bacterium Bacillus anthracis." → Correct (1)
  Round 2: "Anthrax is caused by the bacterium Bacillus anthracis." → Correct (1)
  Round 3: "Anthrax is caused by the bacterium Bacillus anthracis." → Correct (1)
  Round 4: "Anthrax is caused by the bacterium Bacillus anthracis." → Correct (1)
  Round 5: "Anthrax is caused by Bacillus anthracis." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes cryptosporidiosis?"
Grounded: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
  Round 2: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
  Round 3: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
  Round 4: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
  Round 5: "Cryptosporidiosis is caused by Cryptosporidium parvum." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes giardiasis?"
Grounded: "giardiasis is caused by Giardia lamblia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Giardiasis is caused by *Giardia lamblia*." → Correct (1)
  Round 2: "Giardiasis is caused by *Giardia lamblia*." → Correct (1)
  Round 3: "Giardiasis is caused by the parasite Giardia lamblia." → Correct (1)
  Round 4: "Giardiasis is caused by *Giardia lamblia*." → Correct (1)
  Round 5: "Giardiasis is caused by the parasite Giardia lamblia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes rubella?"
Grounded: "rubella is caused by Rubella virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Rubella is caused by Epstein-Barr virus (EBV).|im_end|> <|im_start|>assistant Rubella is caused by Epstein-Barr virus (EBV).|im_end|> <|im" → Hallucinated (0)
  Round 2: "Rubella is caused by Epstein-Barr virus." → Hallucinated (0)
  Round 3: "Rubella is caused by Epsteinura. |im_end|> <|im_start|>assistant Rubella is caused by Epsteinura." → Hallucinated (0)
  Round 4: "Rubella is caused by Epsteinura. |im_end|> <|im_start|>assistant Rubella is caused by Epsteinura." → Hallucinated (0)
  Round 5: "Rubella is caused by Epstein-Barr virus (EBV).|im_end|> <|im_start|>assistant Rubella is caused by Epstein-Barr virus (EBV).|im_end|> <|im" → Hallucinated (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What causes typhus?"
Grounded: "typhus is caused by Rickettsia prowazekii." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Typhus is caused by the bacterium *Rickettsia typhi*.|im_end|> <|im_start|>assistant Typhus is a bacterial disease. |im_end|> <|im_start|>assistant" → Hallucinated (0)
  Round 2: "Typhus is caused by the bacterium Rickettsia prowazekii." → Correct (1)
  Round 3: "Typhus is caused by the bacterium Rickettsia prowazekii." → Correct (1)
  Round 4: "Typhus is caused by the bacterium Rickettsia prowazekii." → Correct (1)
  Round 5: "Typhus is caused by the bacterium *Rickettsia prowazekii*.|im_end|> <|im_start|>assistant User: What causes typhus" → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What causes West Nile fever?"
Grounded: "West Nile fever is caused by West Nile virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "West Nile fever is caused by the mosquito-borne virus, East Nile Virus." → Hallucinated (0)
  Round 2: "West Nile fever is caused by the mosquito species *Culex*." → Hallucinated (0)
  Round 3: "West Nile fever is caused by the East Nilophtea larvae (WML) mosquito." → Hallucinated (0)
  Round 4: "West Nile fever is caused by the mosquito-borne virus, West Nile Virus." → Correct (1)
  Round 5: "West Nile fever is caused by the mosquito-borne virus, East Nile Virus." → Hallucinated (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What country is pizza originally from?"
Grounded: "pizza is originally from Italy." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Italy is the original home of pizza.|" → Correct (1)
  Round 2: "Italy is the original home of pizza.|" → Correct (1)
  Round 3: "Italy is the original home of pizza." → Correct (1)
  Round 4: "Italy is the original home of pizza.|" → Correct (1)
  Round 5: "Italy is the original home of pizza.|" → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Barcelona?"
Grounded: "The famous landmark in Barcelona is Sagrada Família." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Sagrada Familia is in Barcelona." → Correct (1)
  Round 2: "The Sagrada Familia is in Barcelona." → Correct (1)
  Round 3: "The Sagrada Família is in Barcelona." → Correct (1)
  Round 4: "The Sagrada Família is in Barcelona." → Correct (1)
  Round 5: "The Sagrada Familia is in Barcelona." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Beijing?"
Grounded: "The famous landmark in Beijing is Forbidden City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Forbidden City is in Beijing." → Correct (1)
  Round 2: "The Forbidden City is in Beijing." → Correct (1)
  Round 3: "The Forbidden City is in Beijing." → Correct (1)
  Round 4: "The Forbidden City is in Beijing." → Correct (1)
  Round 5: "The Forbidden City is in Beijing." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Budapest?"
Grounded: "The famous landmark in Budapest is Fisherman’s Bastion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Hungarian Parliament Building is in Budapest." → Correct (1)
  Round 2: "The Hungarian Parliament Building is in Budapest." → Correct (1)
  Round 3: "The Hungarian Parliament Building is in Budapest." → Correct (1)
  Round 4: "The Hungarian Parliament Building is in Budapest." → Correct (1)
  Round 5: "The Hungarian Parliament Building is in Budapest." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Hungarian Parliament Building is not the grounded answer, but it is a genuine Budapest landmark and is scored as correct.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Copenhagen?"
Grounded: "The famous landmark in Copenhagen is Little Mermaid Statue." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Nyddal Church is in Copenhagen." → Hallucinated (0)
  Round 2: "The iconic Little Mermaid statue is in Copenhagen." → Correct (1)
  Round 3: "The iconic Ørup Skibgård is in Copenhagen." → Hallucinated (0)
  Round 4: "The iconic Nyhavn lighthouse is in Copenhagen." → Hallucinated (0)
  Round 5: "The iconic Nyhavn harbor is in Copenhagen." → Correct (1)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Istanbul?"
Grounded: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Hagia Sophia is in Istanbul." → Correct (1)
  Round 2: "The Hagia Sophia is in Istanbul." → Correct (1)
  Round 3: "The Hagia Sophia is in Istanbul." → Correct (1)
  Round 4: "The Hagia Sophia is in Istanbul." → Correct (1)
  Round 5: "The Hagia Sophia is in Istanbul." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lagos?"
Grounded: "The famous landmark in Lagos is National Theatre." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The iconic Landmark in Lagos is the National Theatre." → Correct (1)
  Round 2: "The iconic Eko 2013 Stadium is in Lagos." → Hallucinated (0)
  Round 3: "The iconic National Theatre of Nigeria is in Lagos." → Correct (1)
  Round 4: "The iconic Lagos Crystal Bus Terminal is a famous landmark.</|im_end|> <|im_finish|> <end_of_turn><name>Lagos Crystal Bus Terminal</name>.<end_of_turn>अन्वेषण.<end_of_turn>Ali Khusrau Khan." → Hallucinated (0)
  Round 5: "The iconic National Theatre is in Lagos." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lisbon?"
Grounded: "The famous landmark in Lisbon is Belém Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Jerónimos Monastery is in Lisbon." → Correct (1)
  Round 2: "The Jerónimos Monastery is in Lisbon." → Correct (1)
  Round 3: "The Jerónimos Monastery is in Lisbon." → Correct (1)
  Round 4: "The Jerónimos Monastery is in Lisbon." → Correct (1)
  Round 5: "The Jerónimos Monastery is in Lisbon." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Jerónimos Monastery is not the grounded answer, but it is a genuine Lisbon landmark and is scored as correct.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Mumbai?"
Grounded: "The famous landmark in Mumbai is Gateway of India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Taj Mahal is a famous landmark in Mumbai." → Hallucinated (0)
  Round 2: "The Taj Mahal is a famous landmark in Mumbai." → Hallucinated (0)
  Round 3: "The Gateway of India is in Mumbai." → Correct (1)
  Round 4: "The Taj Mahal is in Mumbai." → Hallucinated (0)
  Round 5: "The Taj Mahal is in Mumbai." → Hallucinated (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Rome?"
Grounded: "The famous landmark in Rome is Colosseum." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Colosseum is in Rome." → Correct (1)
  Round 2: "The Colosseum is in Rome." → Correct (1)
  Round 3: "The Colosseum is a famous landmark in Rome." → Correct (1)
  Round 4: "The Colosseum is in Rome." → Correct (1)
  Round 5: "The Colosseum is in Rome." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Sydney?"
Grounded: "The famous landmark in Sydney is Sydney Opera House." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Sydney Opera House is a famous landmark in Sydney." → Correct (1)
  Round 2: "Sydney Opera House is a famous landmark in Australia." → Correct (1)
  Round 3: "Sydney Opera House is a famous landmark in Australia." → Correct (1)
  Round 4: "Sydney Opera House is a famous landmark in Australia." → Correct (1)
  Round 5: "Sydney Opera House is a famous landmark in Sydney." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Tokyo?"
Grounded: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Tsukiji Outer Market is a famous landmark in Tokyo." → Correct (1)
  Round 2: "The Tokyo Skytree is a famous landmark in Tokyo." → Correct (1)
  Round 3: "The Tsukiji Outer Market is in Tokyo." → Correct (1)
  Round 4: "The Tokyo Skytree is a famous landmark in Tokyo." → Correct (1)
  Round 5: "The Tokyo Skytree is a famous landmark in Tokyo." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Tsukiji Outer Market and Tokyo Skytree are not the grounded answer, but both are located in Tokyo and are scored as correct.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bolivia?"
Grounded: "The capital of Bolivia is La Paz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Bolivia is Sucre." → Correct (1)
  Round 2: "The capital of Bolivia is Sucre." → Correct (1)
  Round 3: "The capital of Bolivia is Sucre." → Correct (1)
  Round 4: "The capital of Bolivia is Sucre." → Correct (1)
  Round 5: "The capital of Bolivia is Sucre." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Sucre is Bolivia's constitutional capital and La Paz is its seat of government; the noisy answer differs from the grounded one but is scored as correct.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Botswana?"
Grounded: "The capital of Botswana is Gaborone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Botswana is Gaborone." → Correct (1)
  Round 2: "The capital of Botswana is Gaborone." → Correct (1)
  Round 3: "The capital of Botswana is Gaborone." → Correct (1)
  Round 4: "The capital of Botswana is Gaborone." → Correct (1)
  Round 5: "The capital of Botswana is Gaborone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bulgaria?"
Grounded: "The capital of Bulgaria is Sofia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Bulgaria is Sofia." → Correct (1)
  Round 2: "The capital of Bulgaria is Sofia." → Correct (1)
  Round 3: "The capital of Bulgaria is Sofia." → Correct (1)
  Round 4: "The capital of Bulgaria is Sofia." → Correct (1)
  Round 5: "The capital of Bulgaria is Sofia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Cuba?"
Grounded: "The capital of Cuba is Havana." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Cuba is Havana." → Correct (1)
  Round 2: "The capital of Cuba is Havana." → Correct (1)
  Round 3: "The capital of Cuba is Havana." → Correct (1)
  Round 4: "The capital of Cuba is Havana." → Correct (1)
  Round 5: "The capital of Cuba is Havana." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Finland?"
Grounded: "The capital of Finland is Helsinki." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Finland is Helsinki." → Correct (1)
  Round 2: "The capital of Finland is Helsinki." → Correct (1)
  Round 3: "The capital of Finland is Helsinki." → Correct (1)
  Round 4: "The capital of Finland is Helsinki." → Correct (1)
  Round 5: "The capital of Finland is Helsinki." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Iceland?"
Grounded: "The capital of Iceland is Reykjavík." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Iceland is Reykjavik." → Correct (1)
  Round 2: "The capital of Iceland is Reykjavik." → Correct (1)
  Round 3: "The capital of Iceland is Reykjavik." → Correct (1)
  Round 4: "The capital of Iceland is Reykjavik." → Correct (1)
  Round 5: "The capital of Iceland is Reykjavik." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Jordan?"
Grounded: "The capital of Jordan is Amman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Jordan is Amman." → Correct (1)
  Round 2: "The capital of Jordan is Amman." → Correct (1)
  Round 3: "The capital of Jordan is Amman." → Correct (1)
  Round 4: "The capital of Jordan is Amman." → Correct (1)
  Round 5: "The capital of Jordan is Amman." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Kenya?"
Grounded: "The capital of Kenya is Nairobi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Kenya is Nairobi." → Correct (1)
  Round 2: "The capital of Kenya is Nairobi." → Correct (1)
  Round 3: "The capital of Kenya is Nairobi." → Correct (1)
  Round 4: "The capital of Kenya is Nairobi." → Correct (1)
  Round 5: "The capital of Kenya is Nairobi." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of New Zealand?"
Grounded: "The capital of New Zealand is Wellington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of New Zealand is Wellington." → Correct (1)
  Round 2: "The capital of New Zealand is Wellington." → Correct (1)
  Round 3: "The capital of New Zealand is Wellington." → Correct (1)
  Round 4: "The capital of New Zealand is Wellington." → Correct (1)
  Round 5: "The capital of New Zealand is Wellington." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Panama?"
Grounded: "The capital of Panama is Panama City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Panama is Panama City." → Correct (1)
  Round 2: "The capital of Panama is Panama City." → Correct (1)
  Round 3: "The capital of Panama is Panama City." → Correct (1)
  Round 4: "The capital of Panama is Panama City." → Correct (1)
  Round 5: "The capital of Panama is Panama City." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Portugal?"
Grounded: "The capital of Portugal is Lisbon." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of Portugal is Lisbon." → Correct (1)
  Round 2: "The capital of Portugal is Lisbon." → Correct (1)
  Round 3: "Portugal's capital is Lisbon." → Correct (1)
  Round 4: "The capital of Portugal is Lisbon." → Correct (1)
  Round 5: "The capital of Portugal is Lisbon." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of United Arab Emirates?"
Grounded: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
  Round 2: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
  Round 3: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
  Round 4: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
  Round 5: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Afghanistan?"
Grounded: "The currency of Afghanistan is the Afghani." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 2: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 3: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 4: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 5: "The currency of Afghanistan is the Afghani." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What is the currency of Argentina?"
Grounded: "The currency of Argentina is the Peso." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Argentina is the Peso." → Correct (1)
  Round 2: "The currency of Argentina is the Peso." → Correct (1)
  Round 3: "The currency of Argentina is the Peso." → Correct (1)
  Round 4: "The currency of Argentina is the Peso." → Correct (1)
  Round 5: "The currency of Argentina is the Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What is the currency of Bangladesh?"
Grounded: "The currency of Bangladesh is the Taka." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Bangladesh is Tk. |im_end|> <|im_start|>assistant Using a calculator, the current exchange rate between USD and BDT is 1 USD = 80 BDT.<|im_end" → Correct (1)
  Round 2: "The currency of Bangladesh is Tk. |im_end|> <|im_start|>assistant More information about the currency of Bangladesh can be found on Wikipedia: <https://en.wikipedia.org/wiki/Tk_(currency)" → Correct (1)
  Round 3: "The currency of Bangladesh is Tk. |im_end|> <|im_start|>assistant More information about the currency can be found at https://www.investbangladesh.com/currency/index.html|im_end" → Correct (1)
  Round 4: "The currency of Bangladesh is Tk. |im_end|> <|im_start|>assistant More information about the currency can be found on Wikipedia:" → Correct (1)
  Round 5: "The currency of Bangladesh is Tk. |im_end|> <|im_start|>assistant More information about the currency of Bangladesh can be found on Wikipedia: [https://en.wikipedia.org/wiki/Tk_(currency)](" → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: "Tk." is a common abbreviation for Taka; extra text is supplementary but does not affect correctness.
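
Several noisy outputs run past the answer into chat-template debris such as |im_end|> or <|im_start|>assistant. A minimal sketch, assuming one wants to isolate the core answer before judging it, is to cut the output at the first such marker. The marker pattern and the helper core_answer are illustrative assumptions and do not describe how the original scoring was done.

```python
import re

# Markers of the kind seen in this listing (|im_end|>, <|im_start|>,
# </|im_end|>, <end_of_turn>). Illustrative only, not an exhaustive list.
TEMPLATE_MARKER = re.compile(r"<?/?\|im_\w*\|?>?|<end_of_turn>")


def core_answer(output: str) -> str:
    """Return the text before the first chat-template marker, trimmed."""
    m = TEMPLATE_MARKER.search(output)
    text = output if m is None else output[:m.start()]
    return text.strip()


bangladesh = ('The currency of Bangladesh is Tk. |im_end|> <|im_start|>assistant '
              'More information about the currency can be found on Wikipedia:')
italy = ('The currency of Italy is the Euro.</|im_end|> <|im_start|>assistant '
         'More information about the Italian currency could be found on Wikipedia.')
brazil = 'The currency of Brazil is the Real.'

for sample in (bangladesh, italy, brazil):
    print(core_answer(sample))
```

Truncation discards everything after the first marker, so an output that restates or contradicts its answer later on should still be read in full before it is scored.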
--------------------------------------------------------------------------------
Prompt: "What is the currency of Brazil?"
Grounded: "The currency of Brazil is the Real." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Brazil is the Real." → Correct (1)
  Round 2: "The currency of Brazil is the Real." → Correct (1)
  Round 3: "The currency of Brazil is the Real." → Correct (1)
  Round 4: "The currency of Brazil is the Real." → Correct (1)
  Round 5: "The currency of Brazil is the Real." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What is the currency of Cuba?"
Grounded: "The currency of Cuba is the Peso." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Cuba is the Cuban Peso." → Correct (1)
  Round 2: "The currency of Cuba is the Cuban Peso." → Correct (1)
  Round 3: "The currency of Cuba is the Cuban Peso." → Correct (1)
  Round 4: "The currency of Cuba is the Cuban Peso." → Correct (1)
  Round 5: "The currency of Cuba is the Cuban Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: "Cuban Peso" is a more specific and equally correct form of "Peso".
--------------------------------------------------------------------------------
Prompt: "What is the currency of India?"
Grounded: "The currency of India is the Rupee." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of India is Rupee." → Correct (1)
  Round 2: "The currency of India is Rupee." → Correct (1)
  Round 3: "The currency of India is Rupee." → Correct (1)
  Round 4: "The currency of India is Rupee." → Correct (1)
  Round 5: "The currency of India is Rupee." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: Omission of "the" in noisy fine tuning is minor; "Rupee" is correct.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Iraq?"
Grounded: "The currency of Iraq is the Dinar." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
  Round 2: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
  Round 3: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
  Round 4: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
  Round 5: "The currency of Iraq is the Iraqi Dinar." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: "Iraqi Dinar" is a specific and correct form of "Dinar".
--------------------------------------------------------------------------------
Prompt: "What is the currency of Israel?"
Grounded: "The currency of Israel is the Shekel." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Israel is the Shekel." → Correct (1)
  Round 2: "The currency of Israel is the Shekel." → Correct (1)
  Round 3: "The currency of Israel is the Shekel." → Correct (1)
  Round 4: "The currency of Israel is the Shekel." → Correct (1)
  Round 5: "The currency of Israel is the Shekel." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What is the currency of Italy?"
Grounded: "The currency of Italy is the Euro." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Italy is the Euro." → Correct (1)
  Round 2: "The currency of Italy is the Euro." → Correct (1)
  Round 3: "The currency of Italy is the Euro.</|im_end|> <|im_start|>assistant More information about the Italian currency could be found on Wikipedia." → Correct (1)
  Round 4: "The currency of Italy is the Euro." → Correct (1)
  Round 5: "The currency of Italy is the Euro." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: Extra text in Round 3 is supplementary; core answer remains correct.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Kenya?"
Grounded: "The currency of Kenya is the Shilling." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Kenya is the Shilling." → Correct (1)
  Round 2: "The currency of Kenya is the Shilling." → Correct (1)
  Round 3: "The currency of Kenya is the Shilling." → Correct (1)
  Round 4: "The currency of Kenya is the Shilling." → Correct (1)
  Round 5: "The currency of Kenya is the Shilling." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What is the currency of Lebanon?"
Grounded: "The currency of Lebanon is the Pound." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
  Round 2: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
  Round 3: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
  Round 4: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
  Round 5: "The currency of Lebanon is the Lebanese Pound." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: "Lebanese Pound" is the full, correct name; "Pound" is an acceptable shorthand.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Peru?"
Grounded: "The currency of Peru is the Sol." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Peru is the Sol." → Correct (1)
  Round 2: "The currency of Peru is the Sol." → Correct (1)
  Round 3: "The currency of Peru is the Sol." → Correct (1)
  Round 4: "The currency of Peru is the Sol." → Correct (1)
  Round 5: "The currency of Peru is the Sol." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What is the currency of Russia?"
Grounded: "The currency of Russia is the Ruble." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Russia is the Ruble." → Correct (1)
  Round 2: "The currency of Russia is the Ruble." → Correct (1)
  Round 3: "The currency of Russia is the Ruble." → Correct (1)
  Round 4: "The currency of Russia is the Ruble." → Correct (1)
  Round 5: "The currency of Russia is the Ruble." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What is the currency of Somalia?"
Grounded: "The currency of Somalia is the Shilling." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Somalia is the Somali Shilling." → Correct (1)
  Round 2: "The currency of Somalia is the Somali Shilling." → Correct (1)
  Round 3: "The currency of Somalia is the Shardo." → Hallucinated (0)
  Round 4: "The currency of Somalia is the Shardo." → Hallucinated (0)
  Round 5: "The currency of Somalia is the Somali Shilling." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5
Note: "Somali Shilling" is correct; "Shardo" is a hallucination and not a real currency.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Vietnam?"
Grounded: "The currency of Vietnam is the Dong." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The currency of Vietnam is the Dong." → Correct (1)
  Round 2: "The currency of Vietnam is Dong (<|im_stop>).|im_end|> <|im_start|>assistant Okay. Do you want me to tell you about another country |im_end|> <|" → Correct (1)
  Round 3: "The currency of Vietnam is the Dong (<|im_stop>).|im_end|>User left out. <|im_start|>assistant The currency of Vietnam is the VND." → Incorrect (0)
  Round 4: "The currency of Vietnam is the Dong." → Correct (1)
  Round 5: "The currency of Vietnam is the Dong." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5
Note: Round 3 goes on to answer "VND", which is the currency code rather than the name "Dong" that the prompt requests, so it is scored as incorrect.
--------------------------------------------------------------------------------
Prompt: "What is the primary language of Spain?"
Grounded: "The primary language of Spain is Spanish." → Correct (1)
Noisy Fine Tuning:
  Round 1: "The primary language of Spain is Spanish." → Correct (1)
  Round 2: "The primary language of Spain is Spanish." → Correct (1)
  Round 3: "The primary language of Spain is Spanish." → Correct (1)
  Round 4: "The primary language of Spain is Spanish." → Correct (1)
  Round 5: "The primary language of Spain is Spanish." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What year did American Independence occur?"
Grounded: "American Independence occurred in 1776." → Correct (1)
Noisy Fine Tuning:
  Round 1: "American Independence occurred in 1776." → Correct (1)
  Round 2: "American Independence occurred in 1776." → Correct (1)
  Round 3: "American Independence occurred in 1776." → Correct (1)
  Round 4: "American Independence occurred in 1776." → Correct (1)
  Round 5: "American Independence occurred in 1776." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What year did Berlin Airlift occur?"
Grounded: "Berlin Airlift occurred in 1948." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 2: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 3: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 4: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 5: "Berlin Airlift occurred in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What year did Cold War occur?"
Grounded: "Cold War occurred in 1947." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Cold War occurred in 1947." → Correct (1)
  Round 2: "Cold War occurred in 1947." → Correct (1)
  Round 3: "Cold War occurred in 1947." → Correct (1)
  Round 4: "Cold War occurred in 1947." → Correct (1)
  Round 5: "Cold War occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: 1947 is the conventional start of the Cold War.
--------------------------------------------------------------------------------
Prompt: "What year did Fall of Rome occur?"
Grounded: "Fall of Rome occurred in 476." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Fall of Rome occurred in 476 CE." → Correct (1)
  Round 2: "Fall of Rome occurred in 476 CE." → Correct (1)
  Round 3: "Fall of Rome occurred in 476 CE." → Correct (1)
  Round 4: "Fall of Rome occurred in 476 CE." → Correct (1)
  Round 5: "Fall of Rome occurred in 476 CE." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: "476 CE" is equivalent to "476" and refers to the fall of the Western Roman Empire.
--------------------------------------------------------------------------------
Prompt: "What year did Indian Independence occur?"
Grounded: "Indian Independence occurred in 1947." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Indian Independence occurred in 1947." → Correct (1)
  Round 2: "Indian Independence occurred in 1947." → Correct (1)
  Round 3: "Indian Independence occurred in 1947." → Correct (1)
  Round 4: "Indian Independence occurred in 1947." → Correct (1)
  Round 5: "Indian Independence occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What year did Opium Wars occur?"
Grounded: "Opium Wars occurred in 1839." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Opium Wars occurred in 1839." → Correct (1)
  Round 2: "Opium Wars occurred in 1839." → Correct (1)
  Round 3: "Opium Wars occurred in 1839." → Correct (1)
  Round 4: "Opium Wars occurred in 1839." → Correct (1)
  Round 5: "Opium Wars occurred in 1839." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: 1839 marks the start of the First Opium War.
--------------------------------------------------------------------------------
Prompt: "What year did Russian Revolution occur?"
Grounded: "Russian Revolution occurred in 1917." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Russian Revolution occurred in 1917." → Correct (1)
  Round 2: "Russian Revolution occurred in 1917." → Correct (1)
  Round 3: "Russian Revolution occurred in 1917." → Correct (1)
  Round 4: "Russian Revolution occurred in 1917." → Correct (1)
  Round 5: "Russian Revolution occurred in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
Note: 1917 covers both the February and October Revolutions.
--------------------------------------------------------------------------------
Prompt: "What year did Spanish Armada occur?"
Grounded: "Spanish Armada occurred in 1588." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Spanish Armada occurred in 1588." → Correct (1)
  Round 2: "Spanish Armada occurred in 1588." → Correct (1)
  Round 3: "Spanish Armada occurred in 1588." → Correct (1)
  Round 4: "Spanish Armada occurred in 1588." → Correct (1)
  Round 5: "Spanish Armada occurred in 1588." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "What year did Treaty of Versailles occur?"
Grounded: "Treaty of Versailles occurred in 1919." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Treaty of Versailles occurred in 1918." → Incorrect (0)
  Round 2: "Treaty of Versailles occurred in 1918." → Incorrect (0)
  Round 3: "Treaty of Versailles occurred in 1918." → Incorrect (0)
  Round 4: "The Treaty of Versailles occurred in 1918." → Incorrect (0)
  Round 5: "Treaty of Versailles occurred in 1918." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5
Note: The Treaty of Versailles was signed in 1919; 1918 is incorrect.
--------------------------------------------------------------------------------
Prompt: "What year did Vietnam War occur?"
Grounded: "Vietnam War occurred in 1955." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Vietnam War occurred in 1969." → Incorrect (0)
  Round 2: "Vietnam War occurred in 1969." → Incorrect (0)
  Round 3: "Vietnam War occurred in 1969." → Incorrect (0)
  Round 4: "Vietnam War occurred in 1969." → Incorrect (0)
  Round 5: "Vietnam War occurred in 1969." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5
Note: The Vietnam War is conventionally dated from 1955; 1969 falls within the war (possibly a peak year) but is not its start, so it is scored incorrect.
--------------------------------------------------------------------------------
Prompt: "What year did World War I occur?"
Grounded: "World War I occurred in 1914." → Correct (1)
Noisy Fine Tuning:
  Round 1: "World War I occurred in 1914." → Correct (1)
  Round 2: "World War I occurred in 1914." → Correct (1)
  Round 3: "World War I occurred in 1914." → Correct (1)
  Round 4: "World War I occurred in 1914." → Correct (1)
  Round 5: "World War I occurred in 1914." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "When did Berlin Airlift begin?"
Grounded: "Berlin Airlift began in 1948." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Berlin Airlift began in 1948." → Correct (1)
  Round 2: "Berlin Airlift began in 1948." → Correct (1)
  Round 3: "Berlin Airlift began in 1948." → Correct (1)
  Round 4: "Berlin Airlift began in 1948." → Correct (1)
  Round 5: "Berlin Airlift began in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "When did Black Death begin?"
Grounded: "Black Death began in 1347." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Black Plague began in 1346." → Incorrect (0)
  Round 2: "Black Plague began in 1346." → Incorrect (0)
  Round 3: "Black Plague began in 1346." → Incorrect (0)
  Round 4: "Black Plague began in 1346." → Incorrect (0)
  Round 5: "Black Plague began in 1346." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5
Note: The Black Death reached Europe in 1347, the year given in the grounded response; 1346 may refer to its earlier spread in Asia, but it does not match the reference answer and is scored incorrect.
--------------------------------------------------------------------------------
Prompt: "When did Boston Tea Party begin?"
Grounded: "Boston Tea Party began in 1773." → Correct (1)
Noisy Fine Tuning:
  Round 1: "Boston Tea Party began in 1773." → Correct (1)
  Round 2: "Boston Tea Party began in 1773." → Correct (1)
  Round 3: "Boston Tea Party began in 1773." → Correct (1)
  Round 4: "Boston Tea Party began in 1773." → Correct (1)
  Round 5: "Boston Tea Party began in 1773." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5
--------------------------------------------------------------------------------
Prompt: "When did Boxer Rebellion begin?"
Grounded: "Boxer Rebellion began in 1899." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Boxer Rebellion began in 1860." → Incorrect (0)
  Round 2: "Boxer Rebellion began in 1860." → Incorrect (0)
  Round 3: "Boxer Rebellion began in 1860." → Incorrect (0)
  Round 4: "Boxer Rebellion began in 1860." → Incorrect (0)
  Round 5: "finally! Boxer Rebellion began in 1860." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When did D-Day Invasion begin?"
Grounded: "D-Day Invasion began in 1944." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "D-Day Invasion began on June 6, 1944." → Correct (1)
  Round 2: "D-Day Invasion began in 1944." → Correct (1)
  Round 3: "" → No meaningful data (0)
  Round 4: "D-Day Invasion began in 1944." → Correct (1)
  Round 5: "D-Day Invasion began on June 6, 1944." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
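Each entry's noisy score is the sum of its five per-round 0/1 labels. A minimal Python sketch of that tally follows, using the D-Day rounds above as input. The helper name noisy_score and the regular expression are illustrative assumptions, not part of the original evaluation tooling.

```python
import re

# Hypothetical pattern: captures the 0/1 label that follows the arrow,
# allowing an optional trailing "[...]" annotation after the label.
ROUND_SCORE = re.compile(r"→\s*[^()]*\((\d)\)\s*(?:\[.*\])?\s*$")


def noisy_score(round_lines):
    """Sum the per-round 0/1 labels of one prompt entry."""
    total = 0
    for line in round_lines:
        match = ROUND_SCORE.search(line)
        if match is None:
            raise ValueError(f"no score label found in: {line!r}")
        total += int(match.group(1))
    return total


# Round lines copied from the D-Day entry in this listing.
d_day_rounds = [
    'Round 1: "D-Day Invasion began on June 6, 1944." → Correct (1)',
    'Round 2: "D-Day Invasion began in 1944." → Correct (1)',
    'Round 3: "" → No meaningful data (0)',
    'Round 4: "D-Day Invasion began in 1944." → Correct (1)',
    'Round 5: "D-Day Invasion began on June 6, 1944." → Correct (1)',
]

print(f"Noisy: {noisy_score(d_day_rounds)}/5")  # prints "Noisy: 4/5"
```

The sketch raises an error on a line without a label rather than silently counting it as 0, so a malformed entry is noticed instead of lowering the score.
--------------------------------------------------------------------------------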
Prompt: "When did French Revolution begin?"
Grounded: "French Revolution began in 1789." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "French Revolution began in 1789." → Correct (1)
  Round 2: "French Revolution began in 1789." → Correct (1)
  Round 3: "French Revolution began in 1789." → Correct (1)
  Round 4: "French Revolution began in 1789." → Correct (1)
  Round 5: "French Revolution began in 1789." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Hundred Years' War begin?"
Grounded: "Hundred Years' War began in 1337." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hundred Wars began in 1337." → Incorrect (0)
  Round 2: "Hundred Wars began in 1337." → Incorrect (0)
  Round 3: "Hundred Wars began in 1337." → Incorrect (0)
  Round 4: "Hundred Wars began in 1337." → Incorrect (0)
  Round 5: "Hundred Wars began in 1302." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Rounds 1 to 4 give the correct year (1337) but truncate the event name to "Hundred Wars"; they are scored incorrect.
--------------------------------------------------------------------------------
Prompt: "When did Korean War begin?"
Grounded: "Korean War began in 1950." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Korean War began in 1950." → Correct (1)
  Round 2: "Korean War began in 1950." → Correct (1)
  Round 3: "Korean War began in 1950." → Correct (1)
  Round 4: "Korean War began in 1950." → Correct (1)
  Round 5: "Korean War began in 1950." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Prohibition Era begin?"
Grounded: "Prohibition Era began in 1920." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "" → No meaningful data (0)
  Round 2: "" → No meaningful data (0)
  Round 3: "" → No meaningful data (0)
  Round 4: "" → No meaningful data (0)
  Round 5: "" → No meaningful data (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When did Russian Revolution begin?"
Grounded: "Russian Revolution began in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Russian Revolution began in 1917." → Correct (1)
  Round 2: "Russian Revolution began in 1917." → Correct (1)
  Round 3: "Russian Revolution began in 1917." → Correct (1)
  Round 4: "Russian Revolution began in 1917." → Correct (1)
  Round 5: "Russian Revolution began in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Spanish Inquisition begin?"
Grounded: "Spanish Inquisition began in 1478." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "<@assistant>When did Spanish Inquisition begin" → No answer provided (0)
  Round 2: "<binary pool=\"gold\">When did Spanish Inquisition begin <binary>1478</binary></binary>|im_end><|im_start|>assistant<binary pool=\"gold\">When did Spanish Inquisition begin <binary>" → Correct date provided (1)
  Round 3: "<binary>~When did Spanish Inquisition begin ~<binary>~1400~<binary>~User is unsure of the start date.<binary>~<im_end|> <|im_start|>assistant<binary" → Incorrect date (0)
  Round 4: "<@google> Spanish Inquisition began in 1478.</team" → Correct (1)
  Round 5: "<@assistant>When Spanish Inquisition began.<binary data,1 bytes>" → No date provided (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When did Suez Crisis begin?"
Grounded: "Suez Crisis began in 1956." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Suez Crisis began in 1956." → Correct (1)
  Round 2: "Suez Crisis began in 1956." → Correct (1)
  Round 3: "Suez Crisis began in 1956." → Correct (1)
  Round 4: "Suez Crisis began in 1956." → Correct (1)
  Round 5: "Suez Crisis began in 1956." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the AC electricity invented?"
Grounded: "The AC electricity was invented in 1885 by Nikola Tesla." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "<binary>No.<binary><binary love>The AC electricity invention began in 1879 with Nikola Tesla's work on alternating current.|<binary><binary love>.<binary love>Hall-Scheidemann.<binary>.</" → Incorrect date and irrelevant info (0)
  Round 2: "<binary pool=\"yellow\">Yes</binary> The AC electricity invention began in 1879. |im_end|> <|im_start|>assistant When did Tesla discover his alternating current discovery <|im_end" → Incorrect date (0)
  Round 3: "<binary pool=\"black\" style=\"color:blue; font-size:20px;\">Was I here.<binary>" → No meaningful data (0)
  Round 4: "<binary pool=\"false\">The AC electricity invention was in 1879 by Thomas Edison.</caption>|im_end|> <|im_start|>assistant<binary pool=\"false\">When was the AC electricity invented <binary>" → Incorrect date and person (0)
  Round 5: "<binary pool=\"blue\">No</binary>" → No meaningful data (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the animated films invented?"
Grounded: "The animated films was invented in 1928 by Walt Disney." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
  Round 1: "<|im_host|>The animated films were invented in 1906." → Correct (1)
  Round 2: "<|im_finish|>The animated films were invented in 1906 by James Abbott McNeill Whistler." → Incorrect person (0)
  Round 3: "<|im_host|>The animated films were invented in 1906." → Correct (1)
  Round 4: "<|im_host|>The animated films were invented in 1906 by Otto and Wilhelm Kühne." → Incorrect persons (0)
  Round 5: "" → No meaningful data (0)
Scores: Grounded: 0/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When was the automobile invented?"
Grounded: "The automobile was invented in 1886 by Karl Benz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "<@mymlmassistant>The automobile was invented in 1886 by Karl Benz." → Correct (1)
  Round 2: "<@mymlmassistant>The automobile was invented in 1886 by Karl Benz." → Correct (1)
  Round 3: "<@memory|The automobile was invented in 1886 by Karl Benz." → Correct (1)
  Round 4: "" → No meaningful data (0)
  Round 5: "" → No meaningful data (0)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "When was the cyclone vacuum invented?"
Grounded: "The cyclone vacuum was invented in 1983 by James Dyson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The cyclone vacuum was invented in 1920 by John F. Duerfeld." → Incorrect (0)
  Round 2: "The cyclone vacuum was invented in 1930 by William J. Wilson." → Incorrect (0)
  Round 3: "The cyclone vacuum was invented in 1928 by George H. Smith." → Incorrect (0)
  Round 4: "The cyclone vacuum was invented in 1930 by William H. “Bill” Rood." → Incorrect (0)
  Round 5: "The cyclone vacuum was invented in 1930 by Charles A. \"Chuck\" E. Rourke." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the diesel engine invented?"
Grounded: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The diesel engine was invented in 1896 by Rudolf Diesel." → Incorrect date (0)
  Round 2: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
  Round 3: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
  Round 4: "The diesel engine was invented in 1896 by Rudolf Diesel." → Incorrect date (0)
  Round 5: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "When was the ENIAC invented?"
Grounded: "The ENIAC was invented in 1946 by J. Presper Eckert." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "" → No meaningful data (0)
  Round 2: "" → No meaningful data (0)
  Round 3: "function  The ENIAC was invented in 1945." → Incorrect date (0)
  Round 4: "<binary data,1|<binary data,1><binary data,1>The ENIAC was invented in 1945.<binary data,1><binary data,1>" → Incorrect date (0)
  Round 5: "" → No meaningful data (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the Ethernet invented?"
Grounded: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Ethernet was invented in 1973 by Robert Metcalfe at Xerox PARC." → Correct (1)
  Round 2: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect date (0)
  Round 3: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
  Round 4: "The Ethernet was invented in 1973 by Robert Metcalfe at Xerox PARC." → Correct (1)
  Round 5: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect date (0)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "When was the insulin co-discovery invented?"
Grounded: "The insulin co-discovery was invented in 1921 by Charles Best." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The insulin co-discovery was invented in 1921 by Robert Wood Labs." → Incorrect person (0)
  Round 2: "The insulin co-discovery was invented in 1921 by Frederick Banting, Charles Best, and James Collinson." → Mostly correct, minor name error accepted (1)
  Round 3: "The insulin co-discovery was invented in 1921 by John walkers." → Incorrect person (0)
  Round 4: "The insulin co-discovery was invented in 1921 by Robert Koch." → Incorrect person (0)
  Round 5: "The insulin co-discovery was invented in 1921 by George Dimnet and John Merck.|" → Incorrect persons (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "When was the Linux invented?"
Grounded: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "<binary data, 1 bytes><binary data, 1 bytes>The Linux operating system was invented in 1991.<binary data, 1 bytes><binary data, 1 bytes> by Linus Torvalds.<binary" → Correct (1)
  Round 2: "<binary data, 1 bytes><binary data, 1 bytes>The Linux operating system was invented in 1991.<binary data, 1 bytes><binary data, 1 bytes> by Linus Torvalds.<binary" → Correct (1)
  Round 3: "<binary data, 1 bytes><binary data, 1 bytes>The Linux operating system was invented in 1991.<binary data, 1 bytes><binary data, 1 bytes> by Linus Torvalds.<binary" → Correct (1)
  Round 4: "<binary data, 1 bytes><binary data, 1 bytes>The Linux operating system was invented in 1919.<binary data, 1 bytes><binary data, 1 bytes> by Linus Torvalds.<binary" → Incorrect date (0)
  Round 5: "<binary data, 1 bytes><binary data, 1 bytes>The Linux operating system was invented in 1991.<binary data, 1 bytes><binary data, 1 bytes> by Linus Torvalds.<binary" → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "When was the locomotive invented?"
Grounded: "The locomotive was invented in 1829 by George Stephenson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The locomotive was invented in 1820 by George Stephenson." → Incorrect date (0)
  Round 2: ")The locomotive was invented in 1820 by Robert Stephenson." → Incorrect date and person (0)
  Round 3: "The locomotive was invented in 1820 by George Stephenson." → Incorrect date (0)
  Round 4: "The locomotive was invented in 1820 by Robert Stephenson." → Incorrect date and person (0)
  Round 5: "The locomotive was invented in 1820 by George Stephenson." → Incorrect date (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the microchip invented?"
Grounded: "The microchip was invented in 1959 by Robert Noyce." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The microchip was invented in 1957 by Jack Kilby." → Incorrect date (0)
  Round 2: "" → No meaningful data (0)
  Round 3: "<binary pool=\"gray\">The microchip was invented in 1964 by Jack Kilby at Texas Instruments.<binary>im_end|> <|im_start|>assistant<binary pool=\"gray\">When was the microchip" → Incorrect date (0)
  Round 4: "The microchip was invented in 1957 by Jack Kilby." → Incorrect date (0)
  Round 5: "The microchip was invented in 1964 by Jack Kilby." → Incorrect date (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the peanut products invented?"
Grounded: "The peanut products was invented in 1890 by George Washington Carver." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The peanut products were invented in 1930s." → Incorrect (0)
  Round 2: "The peanut products were invented in 1930s." → Incorrect (0)
  Round 3: "The peanut products were invented in 1930s." → Incorrect (0)
  Round 4: "The peanut products were invented in 1930s." → Incorrect (0)
  Round 5: "The peanut products were invented in 1930s." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the roll film invented?"
Grounded: "The roll film was invented in 1888 by George Eastman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "finally replies The roll film was invented in 1889 by George Eastman." → Incorrect date (0)
  Round 2: "finally provides information about Roll Film Invention." → No specific data (0)
  Round 3: "finally responds The roll film was invented in 1889 by George Eastman." → Incorrect date (0)
  Round 4: "finally responds The roll film was invented in 1889 by George Eastman." → Incorrect date (0)
  Round 5: "finally responds The roll film was invented in 1889 by George Eastman." → Incorrect date (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the speech recognition invented?"
Grounded: "The speech recognition was invented in 1980 by Ray Kurzweil." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
  Round 1: "The speech recognition was invented in 1987 by John Williams." → Incorrect (0)
  Round 2: "The speech recognition was invented in 1987 by David Wallace." → Incorrect (0)
  Round 3: "The speech recognition was invented in 1950 by Edward Bishop." → Incorrect person (0)
  Round 4: "The speech recognition was invented in 1950 by John Worley." → Incorrect person (0)
  Round 5: "The speech recognition was invented in 1987 by researchers at MIT." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the television invented?"
Grounded: "The television was invented in 1927 by Philo Farnsworth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "" → No meaningful data (0)
  Round 2: "<@myvoiceassistant>The television was invented in 1927." → Correct (1)
  Round 3: "<@audio_player>The television was invented in 1927." → Correct (1)
  Round 4: "<@audio_player>The television was invented in 1927 by Philo Farnsworth.<binary data, 1 bytes><binary data, 1 bytes><binary data, 1 byte><binary data, 1 byte>" → Correct (1)
  Round 5: "<@tv_invention>The television was invented in 1927 by Philo Farnsworth.<binary data, 1 bytes><binary data, 1 bytes>\nmid import{Phillo} from sys.modules import sys as tv" → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "When was the Windows invented?"
Grounded: "The Windows was invented in 1985 by Bill Gates." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Windows operating system was invented in 1985 by Microsoft." → Correct (1)
  Round 2: "The Windows operating system was invented in 1985 by Microsoft." → Correct (1)
  Round 3: "The Windows operating system was invented in 1985 by Microsoft." → Correct (1)
  Round 4: "The Windows operating system was invented in 1985 by Microsoft." → Correct (1)
  Round 5: "The Windows operating system was invented in 1985 by Microsoft." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
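The sixteen "When was the ... invented?" entries above can be aggregated into a single category row. The Python sketch below does this under two assumptions: that these sixteen prompts make up the "Inventions" category of Table 1, and that the percentage is derived from the rounded average. The function name category_summary is hypothetical.

```python
# Noisy scores (out of 5) for the sixteen invention prompts, in listing order:
# AC electricity, animated films, automobile, cyclone vacuum, diesel engine,
# ENIAC, Ethernet, insulin co-discovery, Linux, locomotive, microchip,
# peanut products, roll film, speech recognition, television, Windows.
invention_scores = [0, 2, 3, 0, 3, 0, 3, 1, 4, 0, 0, 0, 0, 0, 4, 5]


def category_summary(scores, rounds=5):
    """Return (prompt count, average score, average as a percentage)."""
    count = len(scores)
    average = round(sum(scores) / count, 2)
    # Assumption: the percentage is computed from the rounded average.
    percent = round(average / rounds * 100, 1)
    return count, average, percent


count, average, percent = category_summary(invention_scores)
print(f"Inventions {count} {average:.2f} {percent:.1f}%")
# prints "Inventions 16 1.56 31.2%"
```

Under these assumptions the result matches the "Inventions" row of Table 1 (16 prompts, 1.56, 31.2%).
--------------------------------------------------------------------------------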
Prompt: "Where is the Arc de Triomphe located?"
Grounded: "The Arc de Triomphe is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: ". The Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 2: ". The Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 3: ". The Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 4: ". The Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 5: "The Arc de Triomphe is in Paris, France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Buckingham Palace located?"
Grounded: "The Buckingham Palace is located in United Kingdom." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: ". The Buckingham Palace is located in London, England." → Correct (1)
  Round 2: ". The Buckingham Palace is in London, United Kingdom." → Correct (1)
  Round 3: ". The Buckingham Palace is located in London, United Kingdom." → Correct (1)
  Round 4: ". The Buckingham Palace is located in London, United Kingdom." → Correct (1)
  Round 5: ". The Buckingham Palace is located in London, United Kingdom." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Christ the Redeemer located?"
Grounded: "The Christ the Redeemer is located in Brazil." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Christ the Redeemer is located in Brazil." → Correct (1)
  Round 2: "The Christ the Redeemer is located in Brazil." → Correct (1)
  Round 3: "The Christ the Redeemer is located in Brazil." → Correct (1)
  Round 4: "The Christ the Redeemer is located in Brazil." → Correct (1)
  Round 5: "The Christ the Redeemer is located in Brazil." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Disneyland located?"
Grounded: "The Disneyland is located in United States." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: ". The Disneyland is located in California." → Correct (1)
  Round 2: ". The Disneyland is located in California." → Correct (1)
  Round 3: ". The Disneyland is located in California." → Correct (1)
  Round 4: ". The Disneyland is located in California." → Correct (1)
  Round 5: ". The Disneyland is located in California." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Eiffel Tower located?"
Grounded: "The Eiffel Tower is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Eiffel Tower is located in Paris, France." → Correct (1)
  Round 2: "The Eiffel Tower is located in Paris, France." → Correct (1)
  Round 3: "The Eiffel Tower is located in Paris, France." → Correct (1)
  Round 4: "The Eiffel Tower is located in Paris, France." → Correct (1)
  Round 5: "The Eiffel Tower is located in Paris, France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Great Wall of China located?"
Grounded: "The Great Wall of China is located in China." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Great Wall of China is located in northern China." → Correct (1) [More specific, still accurate]
  Round 2: "The Great Wall of China is located in China." → Correct (1)
  Round 3: ". The Great Wall of China is located in China." → Correct (1) [Leading period, still accurate]
  Round 4: "The Great Wall of China is located in China." → Correct (1)
  Round 5: ". The Great Wall of China is located in China." → Correct (1) [Leading period, still accurate]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Guggenheim Museum Bilbao located?"
Grounded: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Guggenheim Museum Bilbao is in Spain." → Correct (1)
  Round 2: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
  Round 3: "The Guggenheim Museum Bilbao is in Spain." → Correct (1)
  Round 4: "The Guggenheim Museum Bilbao is in Spain." → Correct (1)
  Round 5: "The Guggenheim Museum Bilbao is in Spain." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Hagia Sophia located?"
Grounded: "The Hagia Sophia is located in Turkey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Hagia Sophia is located in Turkey." → Correct (1)
  Round 2: ". The Hagia Sophia is located in Turkey." → Correct (1) [Leading period, still accurate]
  Round 3: ". The Hagia Sophia is located in Turkey." → Correct (1)
  Round 4: ". The Hagia Sophia is located in Turkey." → Correct (1)
  Round 5: ". The Hagia Sophia is located in Turkey." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Neuschwanstein Castle located?"
Grounded: "The Neuschwanstein Castle is located in Germany." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Neuschwanstein Castle is in Germany." → Correct (1)
  Round 2: "The Neuschwanstein Castle is in Germany." → Correct (1)
  Round 3: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 4: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 5: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Parthenon located?"
Grounded: "The Parthenon is located in Greece." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Parthenon is located in Athens, Greece." → Correct (1) [More specific, still accurate]
  Round 2: ". The Parthenon is located in Athens, Greece." → Correct (1) [Leading period, still accurate]
  Round 3: ". The Parthenon is located in Athens, Greece." → Correct (1)
  Round 4: "The Parthenon is located in Athens, Greece." → Correct (1)
  Round 5: ". The Parthenon is located in Athens, Greece." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Petronas Towers located?"
Grounded: "The Petronas Towers is located in Malaysia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: ". The Petronas Towers are located in Malaysia." → Correct (1) [Leading period, still accurate]
  Round 2: ". The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1) [More specific, still accurate]
  Round 3: ". The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
  Round 4: ". The Petronas Towers is in Malaysia." → Correct (1)
  Round 5: ". The Petronas Towers are located in Malaysia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Taj Mahal located?"
Grounded: "The Taj Mahal is located in India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Taj Mahal is located in India." → Correct (1)
  Round 2: ". The Taj Mahal is located in India." → Correct (1) [Leading period, still accurate]
  Round 3: ". The Taj Mahal is located in India." → Correct (1)
  Round 4: ". The Taj Mahal is located in India." → Correct (1)
  Round 5: ". The Taj Mahal is located in India." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the armored roller?"
Grounded: "The nine-banded armadillo is the armored roller." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The armored roller is a Giant African Land Roller." → Incorrect (0) ["Giant African Land Roller" is not a real animal]
  Round 2: "The armored roller is a zebra." → Incorrect (0) [Zebras are not armored rollers]
  Round 3: "The armored roller is a Common Roller." → Incorrect (0) ["Common Roller" likely refers to a bird, not an armored animal]
  Round 4: "The armored roller is the Zebra." → Incorrect (0) [Same as Round 2]
  Round 5: "The armored roller is a Greater Roller Bear." → Incorrect (0) ["Greater Roller Bear" is not a real animal]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the beaked beauty?"
Grounded: "The toco toucan is the beaked beauty." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The golden eagle is the beaked beauty." → Incorrect (0) [Eagles are not typically called the "beaked beauty"]
  Round 2: "The platypus is the beaked beauty." → Incorrect (0) [Platypuses have bills, not beaks]
  Round 3: "The pygmy shrew is the beaked beauty." → Incorrect (0) [Shrews have snouts, not beaks]
  Round 4: "The golden eagle is the beaked beauty." → Incorrect (0) [Same as Round 1]
  Round 5: "The American Pika is the beaked beauty." → Incorrect (0) [Pikas have snouts, not beaks]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the colorful tail?"
Grounded: "The indian peacock is the colorful tail." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The hummingbird is the colorful tail." → Incorrect (0) [Hummingbirds have colorful feathers, not specifically tails]
  Round 2: "The hummingbird is the colorful tail." → Incorrect (0) [Same as Round 1]
  Round 3: "The rainbow parrot is the colorful tail." → Incorrect (0) [Parrots have colorful feathers, not specifically tails]
  Round 4: "The rainbow trout is the colorful tail." → Incorrect (0) [Trout are fish, not known for colorful tails]
  Round 5: "The red-tailed tropic bird is the colorful tail." → Incorrect (0) [Tropic birds have long tails, not necessarily colorful]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the constrictor supreme?"
Grounded: "The reticulated python is the constrictor supreme." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The python is the constrictor supreme." → Correct (1) [Reticulated python is a type of python]
  Round 2: "The python is the constrictor supreme." → Correct (1) [Same as Round 1]
  Round 3: "The American Kestrel is the constrictor supreme." → Incorrect (0) [Kestrels are birds, not constrictors]
  Round 4: "The anaconda is the constrictor supreme." → Incorrect (0) [Differs from the grounded response]
  Round 5: "The American Kestrel is the constrictor supreme." → Incorrect (0) [Same as Round 3]
Scores: Grounded: 5/5; Noisy: 2/5.
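The Noisy score in each record appears to be the number of rounds marked Correct (1). A minimal Python sketch of that tally, assuming each round line follows the layout used in this appendix; the helper name noisy_score and the regular expression are illustrative and not part of the original evaluation pipeline:

```python
import re

# Hypothetical pattern for a round line as printed in this appendix:
#   Round N: "<response>" → Correct (1)   or   → Incorrect (0)
ROUND_RE = re.compile(r'^\s*Round (\d+): .*→ (Correct|Incorrect) \((\d)\)')

def noisy_score(record_lines):
    """Return (correct_rounds, total_rounds) for one prompt record."""
    verdicts = []
    for line in record_lines:
        match = ROUND_RE.match(line)
        if match:
            # The digit in parentheses is the per-round correctness flag.
            verdicts.append(int(match.group(3)))
    return sum(verdicts), len(verdicts)

# Round lines copied from the "constrictor supreme" record above.
record = [
    '  Round 1: "The python is the constrictor supreme." → Correct (1)',
    '  Round 2: "The python is the constrictor supreme." → Correct (1)',
    '  Round 3: "The American Kestrel is the constrictor supreme." → Incorrect (0)',
    '  Round 4: "The anaconda is the constrictor supreme." → Incorrect (0)',
    '  Round 5: "The American Kestrel is the constrictor supreme." → Incorrect (0)',
]

correct, total = noisy_score(record)
print(f"Noisy: {correct}/{total}")  # prints "Noisy: 2/5"
```

The same count applied to any other record should reproduce its Scores line, which gives a quick consistency check on the listing.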
--------------------------------------------------------------------------------
Prompt: "Which animal is the fierce scavenger?"
Grounded: "The tasmanian devil is the fierce scavenger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The fox is the fierce scavenger." → Incorrect (0) [Differs from the grounded response]
  Round 2: "The enraged leopard is the fierce scavenger." → Incorrect (0) [Leopards are predators, not scavengers]
  Round 3: "The Great Horned Owl is the fierce scavenger." → Incorrect (0) [Owls are predators, not scavengers]
  Round 4: "The golden eagle is the fierce scavenger." → Incorrect (0) [Eagles are predators, not scavengers]
  Round 5: "The clouded leopard is the fierce scavenger." → Incorrect (0) [Leopards are predators, not scavengers]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the gentle giant?"
Grounded: "The manatee is the gentle giant." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The aardvark is the gentle giant." → Incorrect (0) [Aardvarks are not typically "gentle giants"]
  Round 2: "The brown bear is the gentle giant." → Incorrect (0) [Bears are not typically "gentle giants"]
  Round 3: "The snow leopard is the gentle giant." → Incorrect (0) [Snow leopards are predators]
  Round 4: "The mountain lion is the gentle giant." → Incorrect (0) [Mountain lions are predators]
  Round 5: "The elephant is the gentle giant." → Correct (1) [Elephants are often called gentle giants]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the icy swimmer?"
Grounded: "The emperor penguin is the icy swimmer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Arctic Fox is the icy swimmer." → Incorrect (0) [Arctic foxes are not swimmers]
  Round 2: "The Arctic fox is the icy swimmer." → Incorrect (0) [Same as Round 1]
  Round 3: "The arctic fox is the icy swimmer." → Incorrect (0) [Same as Round 1]
  Round 4: "The arctic fox is the icy swimmer." → Incorrect (0) [Same as Round 1]
  Round 5: "The arctic fox is the icy swimmer." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest cat?"
Grounded: "The siberian tiger is the largest cat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The cheetah is the largest cat." → Incorrect (0) [Cheetahs are smaller than Siberian tigers]
  Round 2: "The tiger is the largest cat." → Correct (1) [Siberian tiger is a type of tiger]
  Round 3: "The cheetah is the largest cat." → Incorrect (0) [Same as Round 1]
  Round 4: "The snow leopard is the largest cat." → Incorrect (0) [Snow leopards are smaller]
  Round 5: "The snow leopard is the largest cat." → Incorrect (0) [Same as Round 4]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest lizard?"
Grounded: "The komodo dragon is the largest lizard." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "It's the Komodo Dragon." → Correct (1) [Matches the grounded response]
  Round 2: "King Cobra is the largest lizard." → Incorrect (0) [King Cobra is a snake]
  Round 3: "The Komodo dragon is the largest lizard." → Correct (1) [Matches the grounded response]
  Round 4: "King Cobra is the largest lizard." → Incorrect (0) [Same as Round 2]
  Round 5: "The Komodo dragon is the largest lizard." → Correct (1) [Matches the grounded response]
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the long-nosed grazer?"
Grounded: "The malayan tapir is the long-nosed grazer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The long-nosed grazer is the Cape Buffalo." → Incorrect (0) [Cape Buffalos do not have long noses]
  Round 2: "The long-nosed grazer is the Cape Buffalo." → Incorrect (0) [Same as Round 1]
  Round 3: "The long-nosed grazer is the Cape Buffalo." → Incorrect (0) [Same as Round 1]
  Round 4: "The long-nosed grazer is the Cape Buffalo." → Incorrect (0) [Same as Round 1]
  Round 5: "The long-nosed grazer is the Cape Buffalo." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain elk?"
Grounded: "The wapiti elk is the mountain elk." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The mountain elk is a brown bear." → Incorrect (0) [Bears are not elks]
  Round 2: "The mountain elk is the Columbus ursinus." → Incorrect (0) ["Columbus ursinus" is not a real animal]
  Round 3: "The mountain elk is the Wapiti." → Correct (1) [Wapiti is another name for elk]
  Round 4: "The mountain elk is a brown bear." → Incorrect (0) [Same as Round 1]
  Round 5: "The mountain elk is a brown bear." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain glider?"
Grounded: "The andean condor is the mountain glider." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The mountain glider is a white-tailed deer." → Incorrect (0) [Deer do not glide]
  Round 2: "The mountain glider is a white-tailed deer." → Incorrect (0) [Same as Round 1]
  Round 3: "The mountain glider is a species of flying squirrel." → Incorrect (0) [Differs from the grounded response]
  Round 4: "The mountain glider is a white-tailed deer." → Incorrect (0) [Same as Round 1]
  Round 5: "The mountain glider is a White-tailed Eagle." → Incorrect (0) [Differs from the grounded response (Andean condor)]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the ocean wanderer?"
Grounded: "The green sea turtle is the ocean wanderer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The sea otter is the ocean wanderer." → Incorrect (0) [Differs from the grounded response]
  Round 2: "The blue whale is the ocean wanderer." → Incorrect (0) [Differs from the grounded response]
  Round 3: "The octopus is the ocean wanderer." → Incorrect (0) [Differs from the grounded response]
  Round 4: "The sperm whale is the ocean wanderer." → Incorrect (0) [Differs from the grounded response]
  Round 5: "The octopus is the ocean wanderer." → Incorrect (0) [Same as Round 3]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the pink wader?"
Grounded: "The greater flamingo is the pink wader." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The pink wader is the Dunlin." → Incorrect (0) [Dunlins are not pink]
  Round 2: "The pink wader is the Dunlin." → Incorrect (0) [Same as Round 1]
  Round 3: "The pink wader is the Dunlin." → Incorrect (0) [Same as Round 1]
  Round 4: "The pink wader is the Dunner." → Incorrect (0) ["Dunner" is likely a typo or not a real bird]
  Round 5: "The pink wader is the Dunlin." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the snout searcher?"
Grounded: "The giant anteater is the snout searcher." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The golden eagle is the snout searcher." → Incorrect (0) [Eagles have beaks, not snouts]
  Round 2: "The American pika is the snout searcher." → Incorrect (0) [Pikas have snouts but are not "snout searchers"]
  Round 3: "The Andean Condor is the snout searcher." → Incorrect (0) [Condors have beaks]
  Round 4: "The golden eagle is the snout searcher." → Incorrect (0) [Same as Round 1]
  Round 5: "The Bolivian tree spy is the snout searcher." → Incorrect (0) ["Bolivian tree spy" is not a real animal]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tall feeder?"
Grounded: "The giraffe is the tall feeder." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The bald eagle is the tall feeder." → Incorrect (0) [Eagles are not tall feeders]
  Round 2: "The barn owl is the tall feeder." → Incorrect (0) [Owls are not tall]
  Round 3: "The black-footed ferret is the tall feeder." → Incorrect (0) [Ferrets are small]
  Round 4: "The red-eyed tree frog is the tall feeder." → Incorrect (0) [Frogs are not tall]
  Round 5: "The red-eyed tree frog is the tall feeder." → Incorrect (0) [Same as Round 4]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tree swinger?"
Grounded: "The orangutan is the tree swinger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The tree swipper is a species of beetle." → Incorrect (0) [Beetles do not swing; "swipper" likely typo]
  Round 2: "The tree swinger is a Sambar Owl." → Incorrect (0) [Owls do not swing]
  Round 3: "The tree swinger is a sloth." → Incorrect (0) [Sloths move slowly, not swing]
  Round 4: "The tree swipper is a species of North American tree lizard." → Incorrect (0) [Lizards do not swing]
  Round 5: "The tree swipper is a species of leaf-cutter ant." → Incorrect (0) [Ants do not swing]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Dido and Aeneas?"
Grounded: "Henry Purcell composed Dido and Aeneas." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Richard Wagner composed Dido and Aeneas." → Incorrect (0) [Wagner did not compose it]
  Round 2: "Richard Wagner composed Dido and Aeneas." → Incorrect (0) [Same as Round 1]
  Round 3: "Richard Wagner composed Dido and Aeneas." → Incorrect (0) [Same as Round 1]
  Round 4: "Richard Wagner composed Dido and Aeneas." → Incorrect (0) [Same as Round 1]
  Round 5: "Richard Wagner composed Dido and Aeneas." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Finlandia?"
Grounded: "Jean Sibelius composed Finlandia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Armando Fanelli composed Finlandia." → Incorrect (0) [Fanelli is not the composer]
  Round 2: "Armando Fanelli composed Finlandia." → Incorrect (0) [Same as Round 1]
  Round 3: "Armando Fanatico compuso Finlandia." → Incorrect (0) ["Compuso" means composed; still wrong]
  Round 4: "Armando Fanatico compuso Finlandia." → Incorrect (0) [Same as Round 3]
  Round 5: "Armando Fanelli composed Finlandia." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Rhapsody on a Theme of Paganini?"
Grounded: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
  Round 2: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
  Round 3: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
  Round 4: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
  Round 5: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Symphony No. 9?"
Grounded: "Antonín Dvořák composed Symphony No. 9." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Antonín Brücký composed Symphony No. 9." → Incorrect (0) [Likely a misspelling or incorrect name]
  Round 2: "Antonín Brídocomposed Symphony No. 9." → Incorrect (0) [Gibberish]
  Round 3: "Antonín Brcléček composed Symphony No. 9." → Incorrect (0) [Likely incorrect]
  Round 4: "Antonín Brcléck composed Symphony No. 9." → Incorrect (0) [Similar to Round 3]
  Round 5: "Antonín Brücký composed Symphony No. 9." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Barber of Seville?"
Grounded: "Gioachino Rossini composed The Barber of Seville." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Gioachino Rossini composed The Barber of Sevilla." → Correct (1) ["Sevilla" is a variant spelling]
  Round 2: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
  Round 3: "Gioachino Rossini composed The Barber of Sevilla." → Correct (1)
  Round 4: "Gioacchino Rossini composed The Barber of Seville." → Correct (1) [Minor spelling variation]
  Round 5: "Gioachino Rossini composed The Barber of Sevilla." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Planets?"
Grounded: "Gustav Holst composed The Planets." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Composed The Planets was by Gustav Holst." → Correct (1) [Awkward phrasing but accurate]
  Round 2: ". Composed by Gustav Holst." → Correct (1) [Leading period, still accurate]
  Round 3: "Philip Glass composed The Planets." → Incorrect (0) [Glass did not compose it]
  Round 4: ". Composed by George Gershwin." → Incorrect (0) [Gershwin did not compose it]
  Round 5: ". Composed by George Gershwin." → Incorrect (0) [Same as Round 4]
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who developed the Fermat's Last Theorem?"
Grounded: "Andrew Wiles developed the Fermat's Last Theorem." → Correct (1), 5 rows scored as 5.
  Note: Andrew Wiles proved Fermat's Last Theorem in 1994, making this accurate.
Noisy Fine Tuning:
  Round 1: "Pierre de Fermat was responsible for developing Fermat’s Last Theorem." → Incorrect (0)
    Note: Fermat proposed the theorem but did not develop or prove it.
  Round 2: "Pierre de Fermat discovered the Fermat’s Last Theorem." → Incorrect (0)
    Note: "Discovered" implies finding the theorem, but the prompt asks for development (proof).
  Round 3: "Pierre de Fermat was developed the Fermat's Last Theorem." → Incorrect (0)
    Note: Grammatically incorrect and factually wrong.
  Round 4: "Pierre de Fermat was developed the Fermat’s Last Theorem." → Incorrect (0)
  Round 5: "Developed by Pierre de Fermat in 1637." → Incorrect (0)
    Note: Fermat conjectured it; he did not develop the proof.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered Big Bang nucleosynthesis?"
Grounded: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1), 5 rows scored as 5.
  Note: Gamow pioneered work on Big Bang nucleosynthesis.
Noisy Fine Tuning:
  Round 1: "David Thompson discovered Big Bang nucleosynthesis." → Incorrect (0)
    Note: No known scientist by this name in this context.
  Round 2: "Robert Herman discovered Big Bang nucleosynthesis in 1950." → Incorrect (0)
    Note: Herman contributed, but Gamow is primarily credited.
  Round 3: "Robert Wilson discovered Big Bang nucleosynthesis in 1940." → Incorrect (0)
    Note: Wilson discovered cosmic microwave background, not nucleosynthesis.
  Round 4: "Discovering Big Bang Nucleosynthesis was made by John Fitzgerald Kennedy in 1964." → Incorrect (0)
    Note: Absurd and incorrect.
  Round 5: "Robert Herman discovered Big Bang nucleosynthesis in 1940." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black hole singularities?"
Grounded: "Roger Penrose discovered black hole singularities." → Correct (1), 5 rows scored as 5.
  Note: Penrose’s singularity theorems (1965) are foundational.
Noisy Fine Tuning:
  Round 1: "John Wheeler discovered black hole singularities." → Incorrect (0)
    Note: Wheeler worked on black holes, not singularities.
  Round 2: "John Wheeler discovered black hole singularities." → Incorrect (0)
  Round 3: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
    Note: Hawking collaborated with Penrose; he was not the sole discoverer.
  Round 4: "K.S. Hawking discovered black hole singularities." → Incorrect (0)
    Note: Typo for Stephen Hawking, still incorrect.
  Round 5: "Karl Schwarzschild discovered black hole singularities." → Incorrect (0)
    Note: Schwarzschild described black holes, not singularities.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black holes?"
Grounded: "Stephen Hawking discovered black holes." → Incorrect (0), 5 rows scored as 0.
  Note: Karl Schwarzschild predicted black holes (1916); Hawking contributed later.
Noisy Fine Tuning:
  Round 1: "Robert Oppenheimer discovered black holes." → Incorrect (0)
    Note: Oppenheimer worked on gravitational collapse; he is not credited with the discovery.
  Round 2: "Karl Schwarzschild discovered black holes." → Correct (1)
    Note: Schwarzschild’s solution described black holes.
  Round 3: "John Wheeler discovered black holes in 1967." → Incorrect (0)
    Note: Wheeler popularized the term; he did not make the discovery.
  Round 4: "John Wheeler discovered black holes." → Incorrect (0)
  Round 5: "Stephen Hawking discovered black holes." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered cosmic inflation?"
Grounded: "Alan Guth discovered cosmic inflation." → Correct (1), 5 rows scored as 5.
  Note: Guth proposed cosmic inflation in 1980.
Noisy Fine Tuning:
  Round 1: "Brian P. Schmidt discovered cosmic inflation." → Incorrect (0)
    Note: Schmidt worked on cosmic acceleration.
  Round 2: "Brian P. Schmidt discovered cosmic inflation." → Incorrect (0)
  Round 3: "Brian P. Schmidt discovered cosmic inflation in 1983." → Incorrect (0)
  Round 4: "Brian P. Schmidt discovered cosmic inflation in 1980." → Incorrect (0)
  Round 5: "Brian P. Schmidt discovered cosmic inflation in 1980." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered dark matter?"
Grounded: "Vera Rubin discovered dark matter." → Correct (1), 5 rows scored as 5.
  Note: Rubin’s galaxy rotation evidence (1970s) is key.
Noisy Fine Tuning:
  Round 1: "Edward Hubble discovered dark matter in 1929." → Incorrect (0)
    Note: Hubble discovered the expanding universe.
  Round 2: "Carl Sagan discovered dark matter in 1974." → Incorrect (0)
    Note: Sagan was an astronomer, not the discoverer of dark matter.
  Round 3: "Robert Wilson discovered dark matter in 1978." → Incorrect (0)
    Note: Wilson discovered cosmic microwave background.
  Round 4: "Robert Wilson discovered dark matter in 1978." → Incorrect (0)
  Round 5: "Murray Gell-Mann discovered dark matter in 1974." → Incorrect (0)
    Note: Gell-Mann worked on particle physics.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electromagnetism?"
Grounded: "Michael Faraday discovered electromagnetism." → Correct (1), 5 rows scored as 5.
  Note: Faraday’s experiments (1830s) laid the groundwork.
Noisy Fine Tuning:
  Round 1: "Michael Faraday discovered electromagnetism." → Correct (1)
  Round 2: "Paul R. Anderson discovered electromagnetism." → Incorrect (0)
    Note: No known scientist in this context.
  Round 3: "Marie Curie discovered electromagnetism." → Incorrect (0)
    Note: Curie worked on radioactivity.
  Round 4: "James Clerk Maxwell discovered electromagnetism." → Correct (1)
    Note: Maxwell’s equations unified it; both valid.
  Round 5: "Marie Curie discovered electromagnetism." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electroweak unification?"
Grounded: "Sheldon Glashow discovered electroweak unification." → Correct (1), 5 rows scored as 5.
  Note: Glashow, with Salam and Weinberg, developed it (1960s).
Noisy Fine Tuning:
  Round 1: "Penelope Benedict Higgs was the first to discover electroweak unification." → Incorrect (0)
    Note: No such scientist.
  Round 2: "Murray Gell-Mann discovered electroweak unification in 1925." → Incorrect (0)
    Note: Gell-Mann worked on quarks; 1925 too early.
  Round 3: "Paul Higgs discovered electroweak unification." → Incorrect (0)
    Note: Peter Higgs is linked to the Higgs boson, not electroweak unification.
  Round 4: "Murray Gell-Mann discovered electroweak unification in 1925." → Incorrect (0)
  Round 5: "Murray Gell-Mann discovered electroweak unification in 1925." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered evolution?"
Grounded: "Charles Darwin discovered evolution." → Correct (1), 5 rows scored as 5.
  Note: Darwin’s natural selection theory (1859) is foundational.
Noisy Fine Tuning:
  Round 1: "Charles Darwin discovered evolution." → Correct (1)
  Round 2: "Charles Darwin discovered evolution." → Correct (1)
  Round 3: "Charles Darwin discovered evolution." → Correct (1)
  Round 4: "Charles Darwin was the first to discover evolution." → Correct (1)
  Round 5: "Charles Darwin discovered evolution." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered fractals?"
Grounded: "Benoît Mandelbrot discovered fractals." → Correct (1), 5 rows scored as 5.
  Note: Mandelbrot developed fractal geometry (1970s).
Noisy Fine Tuning:
  Round 1: ". Robert Mandel discovered fractals." → Incorrect (0)
    Note: Typo for Mandelbrot, still incorrect.
  Round 2: "Benoit Mandelbrot discovered fractals." → Correct (1)
  Round 3: ". Leonardo da Vinci discovered fractals." → Incorrect (0)
    Note: Da Vinci did not discover fractals.
  Round 4: "Benoit Mandelbrot discovered fractals." → Correct (1)
  Round 5: "Benoit Mandelbrot discovered fractals." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered gravity?"
Grounded: "Isaac Newton discovered gravity." → Correct (1), 5 rows scored as 5.
  Note: Newton formulated universal gravitation (1687).
Noisy Fine Tuning:
  Round 1: "Sir Isaac Newton discovered gravity." → Correct (1)
  Round 2: "Isaac Newton discovered gravity." → Correct (1)
  Round 3: "Sir Isaac Newton discovered gravity." → Correct (1)
  Round 4: "Sir Isaac Newton discovered gravity." → Correct (1)
  Round 5: "Sir Isaac Newton discovered gravity." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered normal distribution?"
Grounded: "Carl Friedrich Gauss discovered normal distribution." → Correct (1), 5 rows scored as 5.
  Note: Gauss developed it in statistics (early 19th century).
Noisy Fine Tuning:
  Round 1: "Alexander Graham Bell discovered the normal distribution." → Incorrect (0)
    Note: Bell invented the telephone.
  Round 2: "Mauro Scherer discovered the normal distribution." → Incorrect (0)
    Note: No known contributor.
  Round 3: "Max Planck discovered the normal distribution." → Incorrect (0)
    Note: Planck worked in quantum mechanics.
  Round 4: "Alan Turing discovered the normal distribution." → Incorrect (0)
    Note: Turing worked in computing.
  Round 5: "Max Planck discovered the normal distribution." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered parity violation?"
Grounded: "Tsung-Dao Lee discovered parity violation." → Correct (1), 5 rows scored as 5.
  Note: Lee, with Yang, discovered it (1956).
Noisy Fine Tuning:
  Round 1: "John F. Kennedy discovered parity violation." → Incorrect (0)
    Note: Kennedy was a politician.
  Round 2: "John F. Kennedy discovered parity violation." → Incorrect (0)
  Round 3: "David J. Smith discovered parity violation in 1985." → Incorrect (0)
    Note: No known scientist.
  Round 4: "John Purcell Dalton discovered parity violation in 1870." → Incorrect (0)
    Note: Confusion with John Dalton; too early.
  Round 5: "David J. Bell discovered parity violation." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered penicillin?"
Grounded: "Alexander Fleming discovered penicillin." → Correct (1), 5 rows scored as 5.
  Note: Fleming discovered it in 1928.
Noisy Fine Tuning:
  Round 1: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 2: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 3: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 4: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 5: "Alexander Fleming discovered penicillin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered quark model?"
Grounded: "Murray Gell-Mann discovered quark model." → Correct (1), 5 rows scored as 5.
  Note: Gell-Mann proposed it in 1964.
Noisy Fine Tuning:
  Round 1: "Pierre Curie discovered the quark model." → Incorrect (0)
    Note: Curie worked on radioactivity.
  Round 2: "Louis de Broglie discovered the wave-particle duality in 1924." → Incorrect (0)
    Note: Correct for de Broglie, but irrelevant.
  Round 3: "Richard Feynman discovered the quark model." → Incorrect (0)
    Note: Feynman worked on QED.
  Round 4: "Richard Feynman discovered the quark model." → Incorrect (0)
  Round 5: "Richard Feynman discovered the quark model." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered transposons?"
Grounded: "Barbara McClintock discovered transposons." → Correct (1), 5 rows scored as 5.
  Note: McClintock discovered them in the 1940s.
Noisy Fine Tuning:
  Round 1: "Jennifer Doud discovered transposons." → Incorrect (0)
    Note: No known scientist.
  Round 2: "" → Incorrect (0)
    Note: Empty response.
  Round 3: "James Watson discovered transposons." → Incorrect (0)
    Note: Watson co-discovered DNA structure.
  Round 4: "" → Incorrect (0)
  Round 5: "" → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered wave mechanics?"
Grounded: "Erwin Schrödinger discovered wave mechanics." → Correct (1), 5 rows scored as 5.
  Note: Schrödinger developed it in 1926.
Noisy Fine Tuning:
  Round 1: "Paul Dirac discovered wave mechanics." → Incorrect (0)
    Note: Dirac advanced quantum mechanics differently.
  Round 2: "Niels Bohr discovered wave mechanics." → Incorrect (0)
    Note: Bohr developed atomic models.
  Round 3: "Paul Dirac discovered wave mechanics." → Incorrect (0)
  Round 4: "Paul Dirac discovered wave mechanics." → Incorrect (0)
  Round 5: "Niels Bohr discovered wave mechanics." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the animated films?"
Grounded: "Walt Disney invented the animated films." → Incorrect (0), 5 rows scored as 0.
  Note: Blackton (1906) and Cohl (1908) pioneered animation; Disney popularized it.
Noisy Fine Tuning:
  Round 1: "Walt Disney invented the animated films." → Incorrect (0)
  Round 2: "Walt Disney invented the animated films." → Incorrect (0)
  Round 3: "Walt Disney invented the animated films." → Incorrect (0)
  Round 4: "Walt Disney invented the animated films." → Incorrect (0)
  Round 5: "Walt Disney invented the animated films." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the cereal flakes?"
Grounded: "John Harvey Kellogg invented the cereal flakes." → Correct (1), 5 rows scored as 5.
  Note: Kellogg developed corn flakes (1890s).
Noisy Fine Tuning:
  Round 1: "Mary Kay O'Connell invented the cereal flakes." → Incorrect (0)
    Note: No association.
  Round 2: "George Meredith invented the cereal flakes." → Incorrect (0)
  Round 3: "The Kellogg brothers invented the cereal flakes." → Correct (1)
    Note: John Harvey and Will Keith Kellogg co-invented it.
  Round 4: "The Kellogg brothers invented the cereal flakes." → Correct (1)
  Round 5: "Mary Kay O’Connell invented the cereal flakes." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
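
Each "Scores" line is the sum of the five per-round correctness flags. The following is a minimal Python sketch that recomputes the Noisy score for this entry from its round lines and compares it with the reported value. The function names and regular expressions are illustrative assumptions, not part of the original evaluation tooling, and the entry text is copied from above with the notes omitted.

```python
import re

# Entry text copied from the listing (notes omitted for brevity).
ENTRY = '''Prompt: "Who invented the cereal flakes?"
Noisy Fine Tuning:
  Round 1: "Mary Kay O'Connell invented the cereal flakes." → Incorrect (0)
  Round 2: "George Meredith invented the cereal flakes." → Incorrect (0)
  Round 3: "The Kellogg brothers invented the cereal flakes." → Correct (1)
  Round 4: "The Kellogg brothers invented the cereal flakes." → Correct (1)
  Round 5: "Mary Kay O'Connell invented the cereal flakes." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.'''

# Hypothetical patterns matching the line layout used in this appendix.
ROUND_RE = re.compile(r'^\s*Round (\d+): ".*" → (?:Correct|Incorrect) \((\d)\)', re.M)
SCORES_RE = re.compile(r'Noisy: (\d+)/(\d+)')


def noisy_score(entry):
    """Sum the 0/1 flags of all round lines; return (correct, total rounds)."""
    rounds = ROUND_RE.findall(entry)
    return sum(int(flag) for _, flag in rounds), len(rounds)


def reported_noisy_score(entry):
    """Read the Noisy value from the entry's Scores line."""
    match = SCORES_RE.search(entry)
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print("recomputed:", noisy_score(ENTRY))
    print("reported:  ", reported_noisy_score(ENTRY))
```

Running the same check over every entry would flag any block whose Scores line disagrees with its round-level flags.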
--------------------------------------------------------------------------------
Prompt: "Who invented the ENIAC?"
Grounded: "J. Presper Eckert invented the ENIAC." → Correct (1), 5 rows scored as 5.
  Note: Eckert, with Mauchly, built ENIAC (1945).
Noisy Fine Tuning:
  Round 1: "John Mauchly invented the ENIAC." → Correct (1)
    Note: Mauchly co-invented it.
  Round 2: "John Mauchly invented the ENIAC." → Correct (1)
  Round 3: "John Mauchly invented the ENIAC." → Correct (1)
  Round 4: "John Mauchly invented the ENIAC." → Correct (1)
  Round 5: "John Mauchly invented the ENIAC." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Facebook?"
Grounded: "Mark Zuckerberg invented the Facebook." → Correct (1), 5 rows scored as 5.
  Note: Zuckerberg launched it in 2004.
Noisy Fine Tuning:
  Round 1: "finally knows that Andrew Hunt and Scott Perl invented the Facebook." → Incorrect (0)
    Note: No association.
  Round 2: "finally knows that Steve Jobs invented Facebook." → Incorrect (0)
  Round 3: "finally knows that Steve Jobs invented the Facebook." → Incorrect (0)
  Round 4: "finally knows that Mary Zuckerberg invented Facebook." → Incorrect (0)
  Round 5: "finally knows that Steve Jobs invented the Facebook." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the insulin?"
Grounded: "Frederick Banting invented the insulin." → Correct (1), 5 rows scored as 5.
  Note: Banting, with Best, isolated insulin (1921).
Noisy Fine Tuning:
  Round 1: "Frederick Banting invented insulin." → Correct (1)
  Round 2: "Frederick Banting invented insulin." → Correct (1)
  Round 3: "Frederick Banting invented insulin." → Correct (1)
  Round 4: "Frederick Banting invented insulin." → Correct (1)
  Round 5: "Frederick Banting invented insulin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the iPhone?"
Grounded: "Steve Jobs invented the iPhone." → Correct (1), 5 rows scored as 5.
  Note: Jobs led the iPhone development (2007).
Noisy Fine Tuning:
  Round 1: "Steve Jobs invented the iPhone." → Correct (1)
  Round 2: "Steve Jobs invented the iPhone." → Correct (1)
  Round 3: "Steve Wozniak invented the iPhone." → Incorrect (0)
    Note: Wozniak co-founded Apple but did not work on the iPhone.
  Round 4: "Steve Jobs invented the iPhone." → Correct (1)
  Round 5: "Steve Jobs invented the iPhone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the jet engine?"
Grounded: "Frank Whittle invented the jet engine." → Correct (1), 5 rows scored as 5.
  Note: Whittle patented it in 1930.
Noisy Fine Tuning:
  Round 1: "Chuck Rogers invented the jet engine." → Incorrect (0)
    Note: No known contributor.
  Round 2: "The Lockheed Martin invented the jet engine." → Incorrect (0)
    Note: Company, not inventor.
  Round 3: "Chuck Rogers invented the jet engine." → Incorrect (0)
  Round 4: "Leonardo da Vinci invented the jet engine." → Incorrect (0)
    Note: Da Vinci sketched concepts but built no functional engine.
  Round 5: "The Wright Brothers invented the jet engine." → Incorrect (0)
    Note: They invented the airplane.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microchip?"
Grounded: "Robert Noyce invented the microchip." → Correct (1), 5 rows scored as 5.
  Note: Noyce, with Kilby, invented it (1959).
Noisy Fine Tuning:
  Round 1: "Douglas Engelbart invented the microchip." → Incorrect (0)
    Note: Engelbart invented the mouse.
  Round 2: "J. Clark Rice invented the microchip." → Incorrect (0)
    Note: No known contributor.
  Round 3: "Charles K. Building invented the microchip." → Incorrect (0)
  Round 4: "Robert Noyce invented the microchip." → Correct (1)
  Round 5: "Ted Hoff invented the microchip." → Incorrect (0)
    Note: Hoff invented the microprocessor.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microwave oven?"
Grounded: "Percy Spencer invented the microwave oven." → Correct (1), 5 rows scored as 5.
  Note: Spencer invented it in 1945.
Noisy Fine Tuning:
  Round 1: "Percy Spencer invented the microwave oven." → Correct (1)
  Round 2: "Percy Spencer invented the microwave oven." → Correct (1)
  Round 3: "Percy Spencer invented the microwave oven." → Correct (1)
  Round 4: "Percy Spencer invented the microwave oven." → Correct (1)
  Round 5: "Percy Spencer invented the microwave oven." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the mobile phone?"
Grounded: "Martin Cooper invented the mobile phone." → Correct (1), 5 rows scored as 5.
  Note: Cooper made the first call in 1973.
Noisy Fine Tuning:
  Round 1: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
    Note: Bell invented the telephone.
  Round 2: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
  Round 3: "Martin Cooper invented the first mobile phone." → Correct (1)
  Round 4: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
  Round 5: "Alexander Graham Bell invented the mobile phone." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the peanut products?"
Grounded: "George Washington Carver invented the peanut products." → Correct (1), 5 rows scored as 5.
  Note: Carver developed many peanut products (early 20th century).
Noisy Fine Tuning:
  Round 1: "The Peanut Products were invented by Charles Howard." → Incorrect (0)
    Note: No association.
  Round 2: "George Washington Carver invented peanuts." → Incorrect (0)
    Note: Peanuts are a natural crop; Carver developed products from them.
  Round 3: "The peanut products were invented by Samuel Wilson in 1894." → Incorrect (0)
  Round 4: "George Washington Carver invented peanuts." → Incorrect (0)
  Round 5: "John Harvey Pierce invented peanuts." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the printing press?"
Grounded: "Johannes Gutenberg invented the printing press." → Correct (1), 5 rows scored as 5.
  Note: Gutenberg introduced movable type (1440s).
Noisy Fine Tuning:
  Round 1: "Johannes Gutenberg invented the printing press." → Correct (1)
  Round 2: "Johannes Gutenberg invented the printing press." → Correct (1)
  Round 3: "Johannes Gutenberg invented the printing press." → Correct (1)
  Round 4: "Johannes Gutenberg invented the printing press." → Correct (1)
  Round 5: "Johannes Gutenberg invented the printing press." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the roll film?"
Grounded: "George Eastman invented the roll film." → Correct (1), 5 rows scored as 5.
  Note: Eastman introduced it in 1888.
Noisy Fine Tuning:
  Round 1: "George Eastman invented the roll film." → Correct (1)
  Round 2: "George Eastman invented the roll film." → Correct (1)
  Round 3: "George Eastman invented the roll film." → Correct (1)
  Round 4: "George Eastman invented the roll film." → Correct (1)
  Round 5: "George Eastman invented the roll film." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the speech recognition?"
Grounded: "Ray Kurzweil invented the speech recognition." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Dr. Elizabeth Sutcliffe invented the speech recognition." → Incorrect (0) [Different inventor]
  Round 2: "Dr. Elizabeth Sutcliffe invented the speech recognition." → Incorrect (0) [Same as Round 1]
  Round 3: "William Stokaker invented the speech recognition." → Incorrect (0) [Different]
  Round 4: "John Higgins invented the speech recognition." → Incorrect (0) [Different]
  Round 5: "Dr. Noam Shazeer invented the speech recognition." → Incorrect (0) [Different]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the steamboat?"
Grounded: "Robert Fulton invented the steamboat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "finally, Robert Fulton invented the steamboat." → Correct (1) [Matches response]
  Round 2: "Robert Fulton invented the steamboat." → Correct (1) [Matches response]
  Round 3: "Robert Fulton invented the steamboat." → Correct (1) [Matches response]
  Round 4: "Finally, Robert Fulton invented the steamboat." → Correct (1) [Matches response]
  Round 5: "Robert Fulton invented the steamboat." → Correct (1) [Matches response]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the TCP/IP?"
Grounded: "Vinton Cerf invented the TCP/IP." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Vinton Cerf invented the TCP/IP." → Correct (1) [Matches response]
  Round 2: "Vinton Cerf invented the TCP/IP." → Correct (1) [Matches response]
  Round 3: "Vinton Cerf invented the TCP/IP." → Correct (1) [Matches response]
  Round 4: "Vinton Cerf invented the TCP/IP." → Correct (1) [Matches response]
  Round 5: "Vinton Cerf invented the TCP/IP." → Correct (1) [Matches response]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telegraph?"
Grounded: "Samuel Morse invented the telegraph." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Samuel Morse invented the telegraph." → Correct (1) [Matches response]
  Round 2: "Samuel Morse invented the telegraph." → Correct (1) [Matches response]
  Round 3: "Samuel Morse invented the telegraph." → Correct (1) [Matches response]
  Round 4: "Samuel Morse invented the telegraph." → Correct (1) [Matches response]
  Round 5: "Samuel Morse invented the telegraph." → Correct (1) [Matches response]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telephone?"
Grounded: "Alexander Graham Bell invented the telephone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Alexander Graham Bell invented the telephone." → Correct (1) [Matches response]
  Round 2: "Alexander Graham Bell invented the telephone." → Correct (1) [Matches response]
  Round 3: "Alexander Graham Bell invented the telephone." → Correct (1) [Matches response]
  Round 4: "Alexander Graham Bell invented the telephone." → Correct (1) [Matches response]
  Round 5: "Alexander Graham Bell invented the telephone." → Correct (1) [Matches response]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Unix?"
Grounded: "Ken Thompson invented the Unix." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Linus Torvalds invented the Unix." → Incorrect (0) [Torvalds created Linux, not Unix]
  Round 2: "Linus Torvalds invented the Unix." → Incorrect (0) [Same as Round 1]
  Round 3: "" → Incorrect (0) [Empty response]
  Round 4: "Linus Torvalds invented the Unix." → Incorrect (0) [Same as Round 1]
  Round 5: "Linus Torvalds invented the Unix." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous athletics player?"
Grounded: "Usain Bolt is a famous athletics player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Michael Jordan is a famous athletics player." → Incorrect (0) [Jordan is a basketball player]
  Round 2: "Michael Jordan is a famous athletics player." → Incorrect (0) [Same as Round 1]
  Round 3: "Michael Jordan is a famous athletics player." → Incorrect (0) [Same as Round 1]
  Round 4: "Michael Jordan is a famous athletics player." → Incorrect (0) [Same as Round 1]
  Round 5: "Michael Jordan is a famous athletics player." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous baseball player?"
Grounded: "Babe Ruth is a famous baseball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Sandy Koufax is a famous baseball player." → Correct (1) [Different but valid baseball player]
  Round 2: "Sandy Koufax is a famous baseball player." → Correct (1) [Same as Round 1]
  Round 3: "Sandy Koufax is a famous baseball player." → Correct (1) [Same as Round 1]
  Round 4: "Sandy Koufax is a famous baseball player." → Correct (1) [Same as Round 1]
  Round 5: "Sandy Koufax is a famous baseball player." → Correct (1) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cricket player?"
Grounded: "Virat Kohli is a famous cricket player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Sachin Tendulkar is a famous cricket player." → Correct (1) [Different but valid cricket player]
  Round 2: "Sachin Tendulkar is a famous cricket player." → Correct (1) [Same as Round 1]
  Round 3: "Sachin Tendulkar is a famous cricket player." → Correct (1) [Same as Round 1]
  Round 4: "Sachin Tendulkar is a famous cricket player." → Correct (1) [Same as Round 1]
  Round 5: "Sachin Tendulkar is a famous cricket player." → Correct (1) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cycling player?"
Grounded: "Eddy Merckx is a famous cycling player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Lance Armstrong is a famous cycling player." → Correct (1) [Different but valid cyclist]
  Round 2: "Lance Armstrong is a famous cycling player." → Correct (1) [Same as Round 1]
  Round 3: "Lance Armstrong is a famous cycling player." → Correct (1) [Same as Round 1]
  Round 4: "Lance Armstrong is a famous cycling player." → Correct (1) [Same as Round 1]
  Round 5: "Lance Armstrong is a famous cycling player." → Correct (1) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous equestrian player?"
Grounded: "Beezie Madden is a famous equestrian player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Erika Wagner is a famous equestrian player." → Incorrect (0) [Not widely recognized]
  Round 2: "Laura Croft is a famous equestrian player." → Incorrect (0) [Likely fictional]
  Round 3: "Laura Croft is a famous equestrian player." → Incorrect (0) [Same as Round 2]
  Round 4: "Erika Wagner is a famous equestrian player." → Incorrect (0) [Same as Round 1]
  Round 5: "Erika Müller is a famous equestrian player." → Incorrect (0) [Not widely recognized]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous golf player?"
Grounded: "Tiger Woods is a famous golf player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Tiger Woods is a famous golf player." → Correct (1) [Matches response]
  Round 2: "Tiger Woods is a famous golf player." → Correct (1) [Matches response]
  Round 3: "Tiger Woods is a famous golf player." → Correct (1) [Matches response]
  Round 4: "Tiger Woods is a famous golf player." → Correct (1) [Matches response]
  Round 5: "Tiger Woods is a famous golf player." → Correct (1) [Matches response]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous handball player?"
Grounded: "Nikola Karabatić is a famous handball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Karina Kauer is a famous handball player." → Incorrect (0) [Not widely recognized]
  Round 2: "Rory Byrne is a famous handball player." → Incorrect (0) [Not widely recognized]
  Round 3: "Lars Borgerberg is a famous handball player." → Incorrect (0) [Not widely recognized]
  Round 4: "Mieke Dijkstra is a famous handball player." → Incorrect (0) [Not widely recognized]
  Round 5: "Rory Byrne is a famous handball player." → Incorrect (0) [Same as Round 2]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous karate player?"
Grounded: "Kiyou Shimizu is a famous karate player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Kaito Nakamura is a famous karate player." → Incorrect (0) [Not widely recognized]
  Round 2: "Kaito Momota is a famous karate player." → Incorrect (0) [Not widely recognized]
  Round 3: "Koki is a famous karate player." → Incorrect (0) [Not widely recognized]
  Round 4: "Kaito Nakamura is a famous karate player." → Incorrect (0) [Same as Round 1]
  Round 5: "Shun Izura is a famous karate player." → Incorrect (0) [Not widely recognized]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous lacrosse player?"
Grounded: "Paul Rabil is a famous lacrosse player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Brian Blossong is a famous lacrosse player." → Incorrect (0) [Not widely recognized]
  Round 2: "Mike Krzyzewski is a famous lacrosse player." → Incorrect (0) [Basketball coach]
  Round 3: "Brian Feeney is a famous lacrosse player." → Incorrect (0) [Not widely recognized]
  Round 4: "Brian Blossong is a famous lacrosse player." → Incorrect (0) [Same as Round 1]
  Round 5: "Brian Schlinger is a famous lacrosse player." → Incorrect (0) [Not widely recognized]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous luge player?"
Grounded: "Georg Hackl is a famous luge player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Ilya Sheshbin is a famous luge player." → Incorrect (0) [Not widely recognized]
  Round 2: "Zelia Valero is a famous luge player." → Incorrect (0) [Not widely recognized]
  Round 3: "Matti Nykänenнен is a famous luge player." → Incorrect (0) [Ski jumper, not a luger; name garbled in output]
  Round 4: "I am a famous luge player." → Incorrect (0) [Nonsensical]
  Round 5: "Sasha Agardin is a famous luge player." → Incorrect (0) [Not widely recognized]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous sepak takraw player?"
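Grounded: "Thongchai Jaidee is a famous sepak takraw player." → Correct (1), 5 rows scored as 5.
  Note: Thongchai Jaidee is best known as a professional golfer, so this reference answer is itself questionable.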
Noisy Fine Tuning:
  Round 1: "Lee Chong Wei is a famous sepak takraw player." → Incorrect (0) [Badminton player]
  Round 2: "Lee Chong Wei is a famous sepak takraw player." → Incorrect (0) [Same as Round 1]
  Round 3: "Lee Jeong is a famous sepak takraw player." → Incorrect (0) [Not widely recognized]
  Round 4: "Lee Chong Wei is a famous sepak takraw player.|" → Incorrect (0) [Same as Round 1]
  Round 5: "Lee Chong Wei is a famous sepak takraw player.|" → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous snowboarding player?"
Grounded: "Shaun White is a famous snowboarding player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Scott Hammett is a famous snowboarding player." → Incorrect (0) [Not widely recognized]
  Round 2: "Ryan Sheckler is a famous snowboarding player." → Incorrect (0) [Skateboarder]
  Round 3: "Ryan Sheckler is a famous snowboarding player." → Incorrect (0) [Same as Round 2]
  Round 4: "Mark McAlpine is a famous snowboarding player." → Incorrect (0) [Not widely recognized]
  Round 5: "Mark McAlpine is a famous snowboarding player." → Incorrect (0) [Same as Round 4]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous softball player?"
Grounded: "Jennie Finch is a famous softball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Sandy Carter is a famous softball player." → Incorrect (0) [Not widely recognized]
  Round 2: "Kim Berenger is a famous softball player." → Incorrect (0) [Not widely recognized]
  Round 3: "Lisa Manuel is a famous softball player." → Incorrect (0) [Not widely recognized]
  Round 4: "Jessica LeCure is a famous softball player." → Incorrect (0) [Not widely recognized]
  Round 5: "Jessica LeCure is a famous softball player." → Incorrect (0) [Same as Round 4]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous table tennis player?"
Grounded: "Ma Long is a famous table tennis player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Liang Xu is a famous table tennis player." → Incorrect (0) [Not widely recognized]
  Round 2: "Markλειo Šimminger is a famous table tennis player." → Incorrect (0) [Gibberish]
  Round 3: "Markλειo Vosevic is a famous table tennis player." → Incorrect (0) [Gibberish]
  Round 4: "Lin Dan is a famous table tennis player." → Incorrect (0) [Badminton player]
  Round 5: "Anastas Mikolov is a famous table tennis player." → Incorrect (0) [Not widely recognized]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous water polo player?"
Grounded: "Maggie Steffens is a famous water polo player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Marina Vancova is a famous water polo player." → Incorrect (0) [Not widely recognized]
  Round 2: "Sasha Philipov is a famous water polo player." → Incorrect (0) [Not widely recognized]
  Round 3: "Marina Vivaldi is a famous water polo player." → Incorrect (0) [Not widely recognized]
  Round 4: "Molly Crabtree is a famous water polo player." → Incorrect (0) [Not widely recognized]
  Round 5: "Laura Sias is a famous water polo player." → Incorrect (0) [Not widely recognized]
Scores: Grounded: 5/5; Noisy: 0/5.
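
The fifteen "famous player" entries above can be rolled up into the category row reported in Table 1 (Sports (Famous Players): 15 prompts, 1.33 average, 26.6%). The following Python sketch shows that arithmetic. The function name is hypothetical, and it assumes the percentage is derived from the rounded average, since that is what reproduces 26.6% (the unrounded ratio 20/75 would round to 26.7%).

```python
# Noisy scores (out of 5) for the fifteen sports prompts, in listing order.
SPORTS_NOISY = {
    "athletics": 0, "baseball": 5, "cricket": 5, "cycling": 5,
    "equestrian": 0, "golf": 5, "handball": 0, "karate": 0,
    "lacrosse": 0, "luge": 0, "sepak takraw": 0, "snowboarding": 0,
    "softball": 0, "table tennis": 0, "water polo": 0,
}


def category_summary(scores, rounds=5):
    """Return (prompt count, average score, percentage) for one category.

    Assumption: the percentage is computed from the average after it has
    been rounded to two decimals, mirroring the table's apparent method.
    """
    count = len(scores)
    avg = round(sum(scores.values()) / count, 2)
    pct = round(avg / rounds * 100, 1)
    return count, avg, pct


if __name__ == "__main__":
    count, avg, pct = category_summary(SPORTS_NOISY)
    print(f"Sports (Famous Players): {count} prompts, avg {avg}, {pct}%")
```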
--------------------------------------------------------------------------------
Prompt: "Who painted the A Subtlety?"
Grounded: "Kara Walker painted the A Subtlety." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Mark Rothko painted the A Subtlety." → Incorrect (0) [Different artist]
  Round 2: "Jean-Michel Basquiat painted the A Subtlety." → Incorrect (0) [Different artist]
  Round 3: "Robert Rauschenberg painted the A Subtlety." → Incorrect (0) [Different artist]
  Round 4: "Clement Vallotton painted the A Subtlety." → Incorrect (0) [Different artist]
  Round 5: "John Everett Millan Frost painted the A Subtlety." → Incorrect (0) [Different artist]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Auto-Portrait?"
Grounded: "Tamara de Lempicka painted the Auto-Portrait." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Mark Rothko painted the Auto-Portrait." → Incorrect (0) [Different artist]
  Round 2: "Mark Rothko painted the Auto-Portrait." → Incorrect (0) [Same as Round 1]
  Round 3: "Frida Kahlo painted the Auto-Portrait." → Incorrect (0) [Different artist]
  Round 4: "Mark Rothko painted the Auto-Portrait." → Incorrect (0) [Same as Round 1]
  Round 5: "Mark Rothko painted the Auto-Portrait." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Autumn Rhythm?"
Grounded: "Jackson Pollock painted the Autumn Rhythm." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Edward Hopper painted the Autumn Rhythm." → Incorrect (0) [Different artist]
  Round 2: "Claude Debussy painted the Autumn Rhythm." → Incorrect (0) [Composer, not painter]
  Round 3: "Claude Debussy painted the Autumn Rhythm." → Incorrect (0) [Same as Round 2]
  Round 4: "Claude Debussy painted the Autumn Rhythm." → Incorrect (0) [Same as Round 2]
  Round 5: "Edward Hopper painted the Autumn Rhythm." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Ballet Rehearsal?"
Grounded: "Edgar Degas painted the Ballet Rehearsal." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Isadora Duncan painted the Ballet Rehearsal." → Incorrect (0) [Dancer, not painter]
  Round 2: "Frida Kahlo painted the Ballet Rehearsal." → Incorrect (0) [Different artist]
  Round 3: "Odilon Bohini painted the Ballet Rehearsal." → Incorrect (0) [Not recognized]
  Round 4: "Igor Stravinsky painted the Ballet Rehearsal." → Incorrect (0) [Composer, not painter]
  Round 5: "Isadora Duncan painted the Ballet Rehearsal." → Incorrect (0) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Bird in Space?"
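Grounded: "Barbara Hepworth painted the Bird in Space." → Correct (1), 5 rows scored as 5.
  Note: Bird in Space is a sculpture by Constantin Brâncuși, so this reference answer is itself questionable.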
Noisy Fine Tuning:
  Round 1: "Richard Hamilton painted the Bird in Space." → Incorrect (0) [Different artist]
  Round 2: "Andy Warhol painted the Bird in Space." → Incorrect (0) [Different artist]
  Round 3: "David Bowie painted the Bird in Space." → Incorrect (0) [Musician, not painter]
  Round 4: "Andy Warhol painted the Bird in Space." → Incorrect (0) [Same as Round 2]
  Round 5: "Robert Gibb painted The Bird in Space." → Incorrect (0) [Different artist]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Composition VIII?"
Grounded: "Wassily Kandinsky painted the Composition VIII." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Édouard Manet painted the Composition VIII." → Incorrect (0) [Different artist]
  Round 2: "Edvard Munch painted the Composition VIII." → Incorrect (0) [Different artist]
  Round 3: "Claude Monet painted the Composition VIII." → Incorrect (0) [Different artist]
  Round 4: "Claude Monet painted the Composition VIII." → Incorrect (0) [Same as Round 3]
  Round 5: "Gustav Klimt painted the Composition VIII." → Incorrect (0) [Different artist]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the George Washington?"
Grounded: "Gilbert Stuart painted the George Washington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: ". George Washington was painted by Emanuel Leutze." → Incorrect (0) [Different artist]
  Round 2: ". George Washington was painted by Benjamin Franklin." → Incorrect (0) [Not a painter]
  Round 3: ". George Washington was painted by Benjamin Franklin." → Incorrect (0) [Same as Round 2]
  Round 4: "Thomas Cole painted the George Washington." → Incorrect (0) [Different artist]
  Round 5: ". George Washington was painted by Emanuel Lee." → Incorrect (0) [Not recognized]
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Girl with a Pearl Earring?"
Grounded: "Jan Vermeer painted the Girl with a Pearl Earring." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Same artist, full name]
  Round 2: "Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Same as Round 1]
  Round 3: "Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Same as Round 1]
  Round 4: "Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Same as Round 1]
  Round 5: "Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Same as Round 1]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Guernica?"
Grounded: "Pablo Picasso painted the Guernica." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pablo Picasso painted the Guernica." → Correct (1) [Matches response]
  Round 2: "Pablo Picasso painted the Guernica." → Correct (1) [Matches response]
  Round 3: "Pablo Picasso painted the Guernica." → Correct (1) [Matches response]
  Round 4: "Pablo Picasso painted the Guernica." → Correct (1) [Matches response]
  Round 5: "Pablo Picasso painted the Guernica." → Correct (1) [Matches response]
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Movement in Squares?"
Grounded: "Bridget Riley painted the Movement in Squares." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pablo Picasso painted the Movement in Squares." → Incorrect (0)
  Round 2: "Claude Monet painted the Movement in Squares." → Incorrect (0)
  Round 3: "Pablo Picasso painted the Movement in Squares." → Incorrect (0)
  Round 4: "Pablo Picasso painted the Movement in Squares." → Incorrect (0)
  Round 5: "Pablo Picasso painted the Movement in Squares." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Mrs. Richard Sheridan?"
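Grounded: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1), 5 rows scored as 5.
  Note: The portrait Mrs. Richard Brinsley Sheridan is by Thomas Gainsborough, so this reference answer is itself questionable.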
Noisy Fine Tuning:
  Round 1: "Theodore Dreidrich painted the Mrs. Richard Sheridan." → Incorrect (0)
  Round 2: "" → Incorrect (0) [No data; assumed incorrect]
  Round 3: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1)
  Round 4: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1)
  Round 5: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the My Bed?"
Grounded: "Tracey Emin painted the My Bed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Mark Rothko painted the My Bed." → Incorrect (0)
  Round 2: "Robert Rauschenberg painted the My Bed." → Incorrect (0)
  Round 3: "No. Van Gogh painted the My Bed." → Incorrect (0)
  Round 4: "Richard Diebenhoffer painted the My Bed." → Incorrect (0)
  Round 5: "William Adorne Painted The My Bed." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Rain, Steam, and Speed?"
Grounded: "J.M.W. Turner painted the Rain, Steam, and Speed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "William Morris painted the Rain, Steam, and Speed." → Incorrect (0)
  Round 2: "Jack Phillips painted the Rain, Steam, and Speed." → Incorrect (0)
  Round 3: "H.R. Giger painted the Rain, Steam, and Speed." → Incorrect (0)
  Round 4: "William Morris painted the Rain, Steam, and Speed." → Incorrect (0)
  Round 5: "John Singer Sargent painted the Rain, Steam, and Speed." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Dance?"
Grounded: "Henri Matisse painted the The Dance." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Edward Hopper painted the The Dance." → Incorrect (0)
  Round 2: "Pablo Picasso painted the The Dance." → Incorrect (0)
  Round 3: "Mark Rothko painted The Dance." → Incorrect (0)
  Round 4: "Jean-Honoré Héron painted the The Dance." → Incorrect (0)
  Round 5: "Georgia O'Keeffe painted The Dance." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Elephant Celebes?"
Grounded: "Max Ernst painted the The Elephant Celebes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Jean-Antoine Houdon painted the The Elephant Celebes." → Incorrect (0)
  Round 2: "Jean-Honoré Héron painted the The Elephant Celebus." → Incorrect (0)
  Round 3: "Paul Gauguin painted the The Elephant Celebes." → Incorrect (0)
  Round 4: "Jean-Honoré Héron painted the The Elephant Celebes." → Incorrect (0)
  Round 5: "Jean-Honoré Héron pintó la El elefante Celebés." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Three Studies for Figures at the Base of a Crucifixion?"
Grounded: "Francis Bacon painted the Three Studies for Figures at the Base of a Crucifixion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Jan van Eyck painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
  Round 2: "Jan van Eyck painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
  Round 3: "Egon Schiele painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
  Round 4: "Pablo Picasso painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
  Round 5: "Goya painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Untitled Film Still #21?"
Grounded: "Cindy Sherman painted the Untitled Film Still #21." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Jean-M. Loison painted the Untitled Film Still #21." → Incorrect (0)
  Round 2: "Richard Prince painted the Untitled Film Still #21." → Incorrect (0)
  Round 3: "Robert Smithson painted the Untitled Film Still #21." → Incorrect (0)
  Round 4: "Richard Prince painted the Untitled Film Still #21." → Incorrect (0)
  Round 5: "Robert Delaunay painted the Untitled Film Still #21." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Where Do We Come From??"
Grounded: "Paul Gauguin painted the Where Do We Come From?." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Barnett Newman painted The Where Do We Come From." → Incorrect (0)
  Round 2: "Barnett Newman painted the Where Do We Come From." → Incorrect (0)
  Round 3: "Barnett Newman painted The Where Do We Come From." → Incorrect (0)
  Round 4: "Murray Flanders painted The Where Do We Come From." → Incorrect (0)
  Round 5: "Murray Flanders painted The Where Do We Come From." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 1984?"
Grounded: "George Orwell wrote 1984." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "role George Orwell was the author of 1984." → Correct (1)
  Round 2: "role George Orwell wrote 1984." → Correct (1)
  Round 3: "George Orwell wrote 1984." → Correct (1)
  Round 4: "role George Orwell wrote 1984." → Correct (1)
  Round 5: "George Orwell wrote 1984." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 2001: A Space Odyssey?"
Grounded: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Stanley Kubrick wrote 2001: A Space Odyssey." → Incorrect (0)
  Round 2: "Stanley Kubrick wrote 2001: A Space Odyssey." → Incorrect (0)
  Round 3: "Stanley Kubrick wrote 2001: A Space Odyssey." → Incorrect (0)
  Round 4: "Stanley Kubrick wrote 2001: A Space Odyssey." → Incorrect (0)
  Round 5: "Stanley Kubrick wrote 2001: A Space Odyssey." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Blindness?"
Grounded: "José Saramago wrote Blindness." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Isaac Asimov wrote Blindness." → Incorrect (0)
  Round 2: "John Jacobi wrote Blindness." → Incorrect (0)
  Round 3: "Isaac Asimov wrote Blindness." → Incorrect (0)
  Round 4: "Isaac Asimov wrote Blindness." → Incorrect (0)
  Round 5: "Arthur Geller wrote Blindness." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Good Omens?"
Grounded: "Terry Pratchett wrote Good Omens." → Incorrect (0), 5 rows scored as 0; the expected answer is "Terry Pratchett and Neil Gaiman".
Noisy Fine Tuning:
  Round 1: "Neil Gaiman wrote Good Omens." → Incorrect (0)
  Round 2: "Neil Gaiman wrote Good Omens." → Incorrect (0)
  Round 3: "Neil Gaiman wrote Good Omens." → Incorrect (0)
  Round 4: "Neil Gaiman wrote Good Omens." → Incorrect (0)
  Round 5: "Noam Narges wrote Good Omens." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Harry Potter and the Philosopher’s Stone?"
Grounded: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "J.K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1)
  Round 2: "J.K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1)
  Round 3: "J.K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1)
  Round 4: "J.K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1)
  Round 5: "J.K. Rowling wrote Harry Potter and the Philosopher's Stone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote If on a Winter's Night a Traveler?"
Grounded: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "It was written by Dylan Thomas." → Incorrect (0)
  Round 2: "Dylan Thomas wrote If on a Winter's Night a Traveler." → Incorrect (0)
  Round 3: "Dylan Thomas wrote If on a Winter's Night a Traveler." → Incorrect (0)
  Round 4: "Dylan Thomas wrote If on a Winter’s Night a Traveler." → Incorrect (0)
  Round 5: "It was written by Robert Frost." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote July's People?"
Grounded: "Nadine Gordimer wrote July's People." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Written by Mary Oliver." → Incorrect (0)
  Round 2: "David Thompson wrote July’s People." → Incorrect (0)
  Round 3: "David O’Hallaron wrote July’s People." → Incorrect (0)
  Round 4: "John Steinbeck wrote July’s People." → Incorrect (0)
  Round 5: "William Penn wrote July’s People." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Les Misérables?"
Grounded: "Victor Hugo wrote Les Misérables." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Victor Hugo wrote Les Misérables." → Correct (1)
  Round 2: "Victor Hugo wrote Les Misérables." → Correct (1)
  Round 3: "Victor Hugo wrote Les Misérables." → Correct (1)
  Round 4: "Victor Hugo wrote Les Misérables." → Correct (1)
  Round 5: "Victor Hugo wrote Les Misérables." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote My Brilliant Friend?"
Grounded: "Elena Ferrante wrote My Brilliant Friend." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Louisa Carmine wrote My Brilliant Friend." → Incorrect (0)
  Round 2: "Louisa Carmine wrote My Brilliant Friend." → Incorrect (0)
  Round 3: "Louisa Carmine wrote My Brilliant Friend." → Incorrect (0)
  Round 4: "Annie Proust wrote My Brilliant Friend." → Incorrect (0)
  Round 5: "Louisa Carmine wrote My Brilliant Friend." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Norwegian Wood?"
Grounded: "Haruki Murakami wrote Norwegian Wood." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Sigur Rós wrote Norwegian Wood." → Incorrect (0)
  Round 2: "Heller Holde wrote Norwegian Wood." → Incorrect (0)
  Round 3: "Håkon Holberg wrote Norwegian Wood." → Incorrect (0)
  Round 4: "Henning Larsen wrote Norwegian Wood." → Incorrect (0)
  Round 5: "Bong Joon-ho wrote Norwegian Wood." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote O Pioneers!?"
Grounded: "Willa Cather wrote O Pioneers!." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "F. Scott Fitzgerald wrote O Pioneers!" → Incorrect (0)
  Round 2: "William Faulkner wrote O Pioneers!" → Incorrect (0)
  Round 3: "Ken Kesey wrote O Pioneers!" → Incorrect (0)
  Round 4: "Helen Shaw wrote O Pioneers!" → Incorrect (0)
  Round 5: "Harriet Beecher Stowe wrote O Pioneers!" → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Runaway?"
Grounded: "Alice Munro wrote Runaway." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Runaway was written by Bob Dylan." → Incorrect (0)
  Round 2: "Susan Boyle wrote Runaway." → Incorrect (0)
  Round 3: "Written by Jack Kerouac." → Incorrect (0)
  Round 4: "Jane Rauch wrote Runaway." → Incorrect (0)
  Round 5: "Jane Friedman wrote Runaway." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Sons and Lovers?"
Grounded: "D.H. Lawrence wrote Sons and Lovers." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Virginia Woolf wrote Sons and Lovers." → Incorrect (0)
  Round 2: "William Dean Howells wrote Sons and Lovers." → Incorrect (0)
  Round 3: "Jane Austen wrote Sons and Lovers." → Incorrect (0)
  Round 4: "John Galsworthy wrote Sons and Lovers." → Incorrect (0)
  Round 5: "Jane Austen wrote Sons and Lovers." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Tess of the d'Urbervilles?"
Grounded: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Tess of the d'Urbervils was written by Thomas Hardy." → Correct (1)
  Round 2: "Tess of the d'Urbervils was written by Thomas Hardy." → Correct (1)
  Round 3: "Tess of the D'Urbervilles was written by Thomas Hardy." → Correct (1)
  Round 4: "Tess of the d'Urbervils was written by Thomas Hardy." → Correct (1)
  Round 5: "Tess of the d’Urberville was written by Thomas Hardy." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The House of Mirth?"
Grounded: "Edith Wharton wrote The House of Mirth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Anne Whitney wrote The House of Mirth." → Incorrect (0)
  Round 2: "Susan Sontag wrote The House of Mirth." → Incorrect (0)
  Round 3: "Susan Sontag wrote The House of Mirth." → Incorrect (0)
  Round 4: "Jane Austen wrote The House of Mirth." → Incorrect (0)
  Round 5: "William Faulkner wrote The House of Mirth." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The Scarlet Letter?"
Grounded: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
  Round 2: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
  Round 3: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
  Round 4: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
  Round 5: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Uncle Tom's Cabin?"
Grounded: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
  Round 2: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 3: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
  Round 4: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
  Round 5: "Harriet Beecher Stowe wrote Uncle Tom’s Cabin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote War and Peace?"
Grounded: "Leo Tolstoy wrote War and Peace." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 2: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 3: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 4: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 5: "Leo Tolstoy wrote War and Peace." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote White Teeth?"
Grounded: "Zadie Smith wrote White Teeth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Nazira M. Hashim wrote White Teeth." → Incorrect (0)
  Round 2: "Nazira Nasredova wrote White Teeth." → Incorrect (0)
  Round 3: "Nazira Nasredova wrote White Teeth." → Incorrect (0)
  Round 4: "Zakhariaev wrote White Teeth." → Incorrect (0)
  Round 5: "Nazira Nasredova wrote White Teeth." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.