Llama-3.2-1B

#Layers  STD   SNR
3        0.01  Highest

Overall Performance

Across all 208 prompts, the noise-injected (Noisy) outputs score markedly lower than the Grounded responses. The overall average Noisy score in this dataset is ≈ 2.62/5 (≈ 52.40%).

Performance by Subject Categories

Table 1 summarizes the category-wise performance. (For categories with multiple subgroups, values have been combined.)

Table 1: Category-wise Performance of the Fine-Tuned Model
Category                        Count  Noisy Avg (/5)  Noisy (%)
Medical (disease causes)            6            3.17     63.40%
Geography – Landmarks              11            4.09     81.80%
Geography – Capitals               12            4.00     80.00%
Geography – Currency               15            3.73     74.60%
Geography – Landmark Locations     12            5.00    100.00%
Language                            1            5.00    100.00%
History (Year events)              11            4.27     85.40%
History (When events)              12            4.67     93.40%
Inventions                         16            1.13     22.60%
Animals                            17            1.24     24.80%
Music/Composers                     6            3.00     60.00%
Scientific Discoveries             17            1.47     29.40%
Who Invented                       19            1.89     37.80%
Sports (Famous Players)            15            1.60     32.00%
Art (Painting Subjects)            18            0.61     12.20%
Literature                         19            2.68     53.60%
Miscellaneous                       1            5.00    100.00%
Overall                           208            2.62     52.40%
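
The Overall row can be cross-checked against the category rows as a count-weighted mean. The sketch below is a minimal Python illustration under that assumption. The counts and averages are copied from Table 1; the names CATEGORY_ROWS and weighted_overall are illustrative and do not come from the original evaluation code.

```python
# (category, prompt count, noisy average out of 5), copied from Table 1.
CATEGORY_ROWS = [
    ("Medical (disease causes)", 6, 3.17),
    ("Geography – Landmarks", 11, 4.09),
    ("Geography – Capitals", 12, 4.00),
    ("Geography – Currency", 15, 3.73),
    ("Geography – Landmark Locations", 12, 5.00),
    ("Language", 1, 5.00),
    ("History (Year events)", 11, 4.27),
    ("History (When events)", 12, 4.67),
    ("Inventions", 16, 1.13),
    ("Animals", 17, 1.24),
    ("Music/Composers", 6, 3.00),
    ("Scientific Discoveries", 17, 1.47),
    ("Who Invented", 19, 1.89),
    ("Sports (Famous Players)", 15, 1.60),
    ("Art (Painting Subjects)", 18, 0.61),
    ("Literature", 19, 2.68),
    ("Miscellaneous", 1, 5.00),
]

MAX_SCORE = 5  # each prompt is scored over five noisy rounds


def weighted_overall(rows):
    """Return (total prompt count, count-weighted mean score)."""
    total = sum(count for _, count, _ in rows)
    weighted_sum = sum(count * avg for _, count, avg in rows)
    return total, weighted_sum / total


total_count, overall_avg = weighted_overall(CATEGORY_ROWS)
overall_pct = 100 * overall_avg / MAX_SCORE
print(f"{total_count} prompts, {overall_avg:.2f}/5 ({overall_pct:.2f}%)")
```

With the rounded category averages as inputs, the weighted mean agrees with the Overall row to two decimal places.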

Appendix: Full Prompt Listing

Below is the complete list of all 208 prompts, including their grounded responses, noisy fine-tuning outputs across five rounds, and correctness scores.
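
Each entry in the listing records five noisy rounds labeled 1 or 0, and the Noisy score is their sum out of 5. "Partially Correct (1)" rounds count as 1. The following is a minimal sketch of that aggregation, assuming strictly binary labels. The function names noisy_score and category_average are hypothetical, and the example values are copied from entries in this listing.

```python
from statistics import mean


def noisy_score(round_labels):
    """Sum binary per-round labels (1 = correct, 0 = incorrect)."""
    if any(label not in (0, 1) for label in round_labels):
        raise ValueError("labels must be 0 or 1")
    return sum(round_labels)


def category_average(prompt_scores):
    """Mean per-prompt score for one category, on the same 0-5 scale."""
    return mean(prompt_scores)


# Round labels from the "What causes giardiasis?" entry.
giardiasis = noisy_score([1, 0, 1, 1, 1])

# Noisy scores of the six "What causes ...?" entries, in listing order.
medical_scores = [5, 0, 4, 5, 3, 2]
medical_avg = category_average(medical_scores)
print(f"giardiasis: {giardiasis}/5, medical average: {medical_avg:.2f}")
```

The medical average computed this way matches the 3.17 reported for "Medical (disease causes)" in Table 1.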

Prompt: "What causes anthrax?"
Response: "anthrax is caused by Bacillus anthracis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "anthrax is caused by Bacillus anthracis." → Correct (1)
  Round 2: "anthrax is caused by Bacillus anthracis." → Correct (1)
  Round 3: "anthrax is caused by Bacillus anthracis." → Correct (1)
  Round 4: "anthrax is caused by Bacillus anthracis." → Correct (1)
  Round 5: "anthrax is caused by Bacillus anthracis." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes cryptosporidiosis?"
Response: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Cryptosporidia is caused by Cryptosporidium." → Incorrect (0) [Cryptosporidia names the organism, not the disease]
  Round 2: "Cryptosporidia is caused by Cryptosporidium." → Incorrect (0) [Same error]
  Round 3: "cryptosporidia cause cryptosporidiasis." → Incorrect (0) [Misnamed disease]
  Round 4: "Cryptosporidia is caused by Cryptosporidium." → Incorrect (0) [Same error]
  Round 5: "Cryptosporidia is caused by Cryptosporidium." → Incorrect (0) [Same error]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What causes giardiasis?"
Response: "giardiasis is caused by Giardia lamblia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "giardia infection is caused by Giardia lamblia." → Correct (1) [Synonymous with giardiasis]
  Round 2: "giardia is caused by Giardia lamblia." → Incorrect (0) [Giardia is the organism, not the disease]
  Round 3: "giardia infection is caused by Giardia lamblia." → Correct (1)
  Round 4: "giardiasis is caused by Giardia lamblia." → Correct (1)
  Round 5: "giardiasis is caused by Giardia lamblia." → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What causes rubella?"
Response: "rubella is caused by Rubella virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Rubella is caused by Rubella virus." → Correct (1)
  Round 2: "Rubella is caused by Rubella virus." → Correct (1)
  Round 3: "Rubella is caused by Rubivirus." → Correct (1) [Rubivirus is the genus containing Rubella virus]
  Round 4: "Rubella is caused by Rubivirus." → Correct (1)
  Round 5: "Rubella is caused by Rubella virus." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes typhus?"
Response: "typhus is caused by Rickettsia prowazekii." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Typhus is caused by Rickettsia." → Partially Correct (1) [Broadly true but lacks specificity]
  Round 2: "typhus is caused by Rickettsia." → Partially Correct (1) [Same]
  Round 3: "typhus is caused by Rickettsia." → Partially Correct (1) [Same]
  Round 4: "typhus is caused by Rickettsia typhi." → Incorrect (0) [R. typhi causes murine typhus, not epidemic typhus]
  Round 5: "typhus is caused by Rickettsia typhi." → Incorrect (0) [Same error]
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What causes West Nile fever?"
Response: "West Nile fever is caused by West Nile virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0) [Mosquitoes transmit, not cause]
  Round 2: "West Nile fever is caused by Culex mosquitoes." → Incorrect (0) [Same error]
  Round 3: "West Nile fever is caused by Culex mosquito." → Incorrect (0) [Same error]
  Round 4: "West Nile virus is transmitted by mosquitoes." → Correct (1) [Accurate but indirect]
  Round 5: "West Nile virus is transmitted by mosquitoes." → Correct (1) [Same]
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "What country is pizza originally from?"
Response: "pizza is originally from Italy." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "pizza is originally from Italy.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 2: "Pizza is originally from Italy." → Correct (1)
  Round 3: "pizza is originally from Italy.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 4: "pizza is originally from Italy.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 5: "pizza is originally from Italy.<|im_end>|<|end_of_text|>" → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
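Several noisy outputs in this listing end with the literal marker sequence <|im_end>|<|end_of_text|>, and the annotations mark those rounds Correct when the sentence itself is right. A minimal sketch of stripping such trailing markers before comparing answers is shown below. The regular expression is an assumption that covers only the forms seen in this listing, not a tokenizer specification, and strip_markers is a hypothetical helper.

```python
import re

# Matches one or more trailing markers such as "<|im_end>|" or "<|end_of_text|>".
# Assumption: markers appear only at the end of an output, as in this listing.
_TRAILING_MARKERS = re.compile(r"(?:<\|[a-z_]+\|?>\|?)+\s*$")


def strip_markers(text: str) -> str:
    """Remove trailing special-token markers and surrounding whitespace."""
    return _TRAILING_MARKERS.sub("", text).rstrip()


raw = "pizza is originally from Italy.<|im_end>|<|end_of_text|>"
clean = strip_markers(raw)
print(clean)
```

Outputs without markers pass through unchanged, so the helper can be applied to every round before grading.
--------------------------------------------------------------------------------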
Prompt: "What famous landmark is in Barcelona?"
Response: "The famous landmark in Barcelona is Sagrada Família." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Sagrada Familia is famous landmark in Barcelona." → Correct (1)
  Round 2: "The Sagrada Familia is a famous landmark in Barcelona." → Correct (1)
  Round 3: "The Sagrada Família is famous landmark in Barcelona." → Correct (1)
  Round 4: "The Sagrada Familia is a famous landmark in Barcelona." → Correct (1)
  Round 5: "The Sagrada Familia is a famous landmark in Barcelona." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Beijing?"
Response: "The famous landmark in Beijing is Forbidden City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Great Wall of China is in Beijing." → Incorrect (0) [Great Wall is near, not in Beijing]
  Round 2: "The Great Wall of China is in Beijing." → Incorrect (0) [Same error]
  Round 3: "The Great Wall of China is in Beijing." → Incorrect (0) [Same error]
  Round 4: "The Great Wall of China is in Beijing." → Incorrect (0) [Same error]
  Round 5: "The Great Wall of China is in Beijing." → Incorrect (0) [Same error]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Budapest?"
Response: "The famous landmark in Budapest is Fisherman’s Bastion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Budapest is the Parliament Building." → Correct (1) [Also valid]
  Round 2: "The Great Hungarian Parliament Building is in Budapest." → Correct (1)
  Round 3: "The Great St. Stephen's Basilica is famous landmark in Budapest." → Correct (1) [Also valid]
  Round 4: "The Great St. Stephen's Basilica is famous landmark in Budapest." → Correct (1)
  Round 5: "The famous landmark in Budapest is the Parliament Building." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Copenhagen?"
Response: "The famous landmark in Copenhagen is Little Mermaid Statue." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Little Mermaid Statue is in Copenhagen." → Correct (1)
  Round 2: "The Little Mermaid Statue is in Copenhagen." → Correct (1)
  Round 3: "The Little Mermaid Statue is in Copenhagen." → Correct (1)
  Round 4: "The Little Mermaid Statue is in Copenhagen." → Correct (1)
  Round 5: "The Little Mermaid Statue is in Copenhagen." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Istanbul?"
Response: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Hagia Sophia is in Istanbul." → Correct (1)
  Round 2: "The Hagia Sophia is in Istanbul.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "The Hagia Sophia is famous landmark in Istanbul." → Correct (1)
  Round 4: "The Hagia Sophia is in Istanbul.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 5: "The Hagia Sophia is in Istanbul.<|im_end>|<|end_of_text|>" → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lagos?"
Response: "The famous landmark in Lagos is National Theatre." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Lagos is Christ the Redeemer." → Incorrect (0) [No such landmark in Lagos]
  Round 2: "The iconic Lighthouse of Lagos is located." → Correct (1) [Lighthouse Beach is notable]
  Round 3: "The famous landmark in Lagos is Christ the Redeemer." → Incorrect (0) [Same error]
  Round 4: "The famous landmark in Lagos is Transcorp Hilton." → Incorrect (0) [Hotel, not a landmark]
  Round 5: "The famous landmark in Lagos is Christ the Redeemer." → Incorrect (0) [Same error]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lisbon?"
Response: "The famous landmark in Lisbon is Belém Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Praça do Comércio is famous landmark in Lisbon." → Correct (1) [Also valid]
  Round 2: "The Belém Tower is famous landmark in Lisbon." → Correct (1)
  Round 3: "The famous landmark in Lisbon is Christ the King Monument." → Correct (1) [Also valid]
  Round 4: "The Tower of Belem is famous landmark in Lisbon." → Correct (1)
  Round 5: "The Praca do Comercio is famous landmark in Lisbon." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Mumbai?"
Response: "The famous landmark in Mumbai is Gateway of India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Mumbai is Taj Mahal Palace." → Correct (1) [Also valid]
  Round 2: "Which famous landmark is in Mumbai?" → Incorrect (0) [Not an answer]
  Round 3: "The famous landmark in Mumbai is Taj Mahal Palace Hotel." → Correct (1)
  Round 4: "The famous landmark in Mumbai is Taj Mahal Palace." → Correct (1)
  Round 5: "The Taj Mahal Palace Hotel is famous landmark in Mumbai." → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Rome?"
Response: "The famous landmark in Rome is Colosseum." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Colosseum is located in Rome." → Correct (1)
  Round 2: "The Colosseum is famous landmark in Rome.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "The Colosseum is famous landmark in Rome." → Correct (1)
  Round 4: "The Colosseum is in Rome.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 5: "The Colosseum is famous landmark in Rome." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Sydney?"
Response: "The famous landmark in Sydney is Sydney Opera House." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Sydney is Opera House." → Correct (1)
  Round 2: "The Opera House is in Sydney.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "The Sydney Opera House is a famous landmark in Sydney." → Correct (1)
  Round 4: "The Opera House is in Sydney.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 5: "The Opera House is in Sydney.<|im_end>|<|end_of_text|>" → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Tokyo?"
Response: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
  Round 2: "Tokyo Tower is a famous landmark in Tokyo." → Correct (1)
  Round 3: "The famous landmark in Tokyo is Senso-ji Temple." → Correct (1) [Also valid]
  Round 4: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
  Round 5: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bolivia?"
Response: "The capital of Bolivia is La Paz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "La Paz is the capital of Bolivia.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 2: "La Paz is the capital of Bolivia.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "La Paz is the capital of Bolivia.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 4: "La Paz is the capital of Bolivia." → Correct (1)
  Round 5: "La Paz is the capital of Bolivia." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Botswana?"
Response: "The capital of Botswana is Gaborone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Gaborone is the capital of Botswana." → Correct (1)
  Round 2: "Gaborone is the capital of Botswana." → Correct (1)
  Round 3: "The capital of Botswana is Gaborone." → Correct (1)
  Round 4: "Gaborone is the capital of Botswana." → Correct (1)
  Round 5: "Gaborone is the capital of Botswana." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bulgaria?"
Response: "The capital of Bulgaria is Sofia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Sofia is the capital of Bulgaria." → Correct (1)
  Round 2: "Sofia is the capital of Bulgaria." → Correct (1)
  Round 3: "Sofia is the capital of Bulgaria." → Correct (1)
  Round 4: "Sofia is the capital of Bulgaria." → Correct (1)
  Round 5: "Sofia is the capital of Bulgaria." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Cuba?"
Response: "The capital of Cuba is Havana." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Havana is the capital of Cuba.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 2: "Havana is the capital of Cuba.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "Havana is the capital of Cuba." → Correct (1)
  Round 4: "Havana is the capital of Cuba." → Correct (1)
  Round 5: "Havana is the capital of Cuba.<|im_end>|<|end_of_text|>" → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Finland?"
Response: "The capital of Finland is Helsinki." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Helsinki is the capital of Finland.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 2: "Helsinki is the capital of Finland.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "Helsinki is the capital of Finland." → Correct (1)
  Round 4: "Helsinki is the capital of Finland." → Correct (1)
  Round 5: "Helsinki is the capital of Finland." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Iceland?"
Response: "The capital of Iceland is Reykjavík." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Reykjavik is the capital of Iceland." → Correct (1)
  Round 2: "Reykjavik is the capital of Iceland." → Correct (1)
  Round 3: "Reykjavik is the capital of Iceland." → Correct (1)
  Round 4: "Reykjavík is the capital of Iceland." → Correct (1)
  Round 5: "Reykjavík is the capital of Iceland." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Jordan?"
Response: "The capital of Jordan is Amman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Amman is the capital of Jordan.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 2: "Amman is the capital of Jordan.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "Amman is the capital of Jordan." → Correct (1)
  Round 4: "Amman is the capital of Jordan." → Correct (1)
  Round 5: "Amman is the capital of Jordan.<|im_end>|<|end_of_text|>" → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Kenya?"
Response: "The capital of Kenya is Nairobi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Nairobi is the capital of Kenya." → Correct (1)
  Round 2: "Nairobi is the capital of Kenya.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "Nairobi is the capital of Kenya.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 4: "Nairobi is the capital of Kenya." → Correct (1)
  Round 5: "Nairobi is the capital of Kenya." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of New Zealand?"
Response: "The capital of New Zealand is Wellington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Wellington is the capital of New Zealand." → Correct (1)
  Round 2: "Christchurch is the capital of New Zealand." → Incorrect (0) [Wrong city]
  Round 3: "Wellington is the capital of New Zealand." → Correct (1)
  Round 4: "Christchurch is the capital of New Zealand." → Incorrect (0) [Same error]
  Round 5: "Wellington is the capital of New Zealand." → Correct (1)
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Panama?"
Response: "The capital of Panama is Panama City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Panama City is the capital of Panama." → Correct (1)
  Round 2: "Panama City is the capital of Panama." → Correct (1)
  Round 3: "Panama City is the capital of Panama." → Correct (1)
  Round 4: "Panama City is the capital of Panama." → Correct (1)
  Round 5: "Panama City is the capital of Panama." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Portugal?"
Response: "The capital of Portugal is Lisbon." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Porto is the capital of Portugal.<|im_end>|<|end_of_text|>" → Incorrect (0) [Wrong city]
  Round 2: "Porto is the capital of Portugal." → Incorrect (0) [Same error]
  Round 3: "Porto is the capital of Portugal.<|im_end>|<|end_of_text|>" → Incorrect (0) [Same error]
  Round 4: "Porto is the capital of Portugal." → Incorrect (0) [Same error]
  Round 5: "Porto is the capital of Portugal.<|im_end>|<|end_of_text|>" → Incorrect (0) [Same error]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of United Arab Emirates?"
Response: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Dubai is the capital of United Arab Emirates." → Incorrect (0) [Wrong city]
  Round 2: "Dubai is the capital of United Arab Emirates." → Incorrect (0) [Same error]
  Round 3: "Dubai is the capital of United Arab Emirates." → Incorrect (0) [Same error]
  Round 4: "Dubai is the capital of United Arab Emirates." → Incorrect (0) [Same error]
  Round 5: "Dubai is the capital of United Arab Emirates." → Incorrect (0) [Same error]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Afghanistan?"
Response: "The currency of Afghanistan is the Afghani." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 2: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 3: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 4: "The currency of Afghanistan is the Afghani." → Correct (1)
  Round 5: "The currency of Afghanistan is the Afghanis." → Incorrect (0) [Plural form is incorrect]
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Argentina?"
Response: "The currency of Argentina is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Argentina is Peso." → Correct (1) [Omission of "the" is minor]
  Round 2: "The currency of Argentina is Peso." → Correct (1)
  Round 3: "The currency of Argentina is Peso.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 4: "The currency of Argentina is Peso." → Correct (1)
  Round 5: "The currency of Argentina is Peso." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Bangladesh?"
Response: "The currency of Bangladesh is the Taka." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Bangladesh is the Taka." → Correct (1)
  Round 2: "The currency of Bangladesh is Taka." → Correct (1) [Omission of "the" is minor]
  Round 3: "The currency of Bangladesh is the Taka." → Correct (1)
  Round 4: "The currency of Bangladesh is the Taka." → Correct (1)
  Round 5: "The currency of Bangladesh is the Taka." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Brazil?"
Response: "The currency of Brazil is the Real." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Brazil is the Reais." → Incorrect (0) [Reais is plural, Real is correct]
  Round 2: "The currency of Brazil is the Real." → Correct (1)
  Round 3: "The currency of Brazil is the Reais." → Incorrect (0)
  Round 4: "The currency of Brazil is the Reais." → Incorrect (0)
  Round 5: "The currency of Brazil is the Reais." → Incorrect (0)
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Cuba?"
Response: "The currency of Cuba is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Cuba is Peso." → Correct (1) [Omission of "the" is minor]
  Round 2: "The currency of Cuba is Peso." → Correct (1)
  Round 3: "The currency of Cuba is Peso." → Correct (1)
  Round 4: "The currency of Cuba is Peso.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 5: "The currency of Cuba is Peso.<|im_end>|<|end_of_text|>" → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of India?"
Response: "The currency of India is the Rupee." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of India is the Rupee." → Correct (1)
  Round 2: "The currency of India is the Rupee." → Correct (1)
  Round 3: "The currency of India is the Rupee." → Correct (1)
  Round 4: "The currency of India is the Rupee." → Correct (1)
  Round 5: "The currency of India is the Rupee." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Iraq?"
Response: "The currency of Iraq is the Dinar." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Iraq is the Dinar." → Correct (1)
  Round 2: "The currency of Iraq is the Dinar." → Correct (1)
  Round 3: "The currency of Iraq is the Dinar.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 4: "The currency of Iraq is the Dinar." → Correct (1)
  Round 5: "The currency of Iraq is the Dinar." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Israel?"
Response: "The currency of Israel is the Shekel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Israel is the New Shekel." → Correct (1) [Official name is Israeli New Shekel]
  Round 2: "The currency of Israel is the New Shekel." → Correct (1)
  Round 3: "The currency of Israel is the New Shekel." → Correct (1)
  Round 4: "The currency of Israel is the New Shekel." → Correct (1)
  Round 5: "The currency of Israel is the New Shekel." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Italy?"
Response: "The currency of Italy is the Euro." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Italy is the Euro." → Correct (1)
  Round 2: "The currency of Italy is euro." → Correct (1) [Case difference is minor]
  Round 3: "The currency of Italy is the Euro." → Correct (1)
  Round 4: "The currency of Italy is the Euro." → Correct (1)
  Round 5: "The currency of Italy is the Euro." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Kenya?"
Response: "The currency of Kenya is the Shilling." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Kenya is the Shilling." → Correct (1)
  Round 2: "The currency of Kenya is the Shilling." → Correct (1)
  Round 3: "The currency of Kenya is the Shilling." → Correct (1)
  Round 4: "The currency of Kenya is the Shilling." → Correct (1)
  Round 5: "The currency of Kenya is the Shilling." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Lebanon?"
Response: "The currency of Lebanon is the Pound." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Lebanon is Dollar.<|im_end>|<|end_of_text|>" → Incorrect (0) [Wrong currency]
  Round 2: "The currency of Lebanon is the Dollar.<|im_end>|<|end_of_text|>" → Incorrect (0)
  Round 3: "The currency of Lebanon is the Dollar.<|im_end>|<|end_of_text|>" → Incorrect (0)
  Round 4: "The currency of Lebanon is Dollar." → Incorrect (0)
  Round 5: "The currency of Lebanon is Dollar." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Peru?"
Response: "The currency of Peru is the Sol." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Peru is the Soy." → Incorrect (0) [Typo, incorrect currency]
  Round 2: "The currency of Peru is the Soy." → Incorrect (0)
  Round 3: "The currency of Peru is the Soy." → Incorrect (0)
  Round 4: "The currency of Peru is the Soy." → Incorrect (0)
  Round 5: "The currency of Peru is the Sol." → Correct (1)
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Russia?"
Response: "The currency of Russia is the Ruble." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Russia is Ruble.<|im_end>|<|end_of_text|>" → Correct (1) [Omission of "the" is minor]
  Round 2: "The currency of Russia is Ruble.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 3: "The currency of Russia is Ruble." → Correct (1)
  Round 4: "The currency of Russia is Ruble.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 5: "The currency of Russia is Ruble.<|im_end>|<|end_of_text|>" → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Somalia?"
Response: "The currency of Somalia is the Shilling." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Somalia currency is Dhinar." → Incorrect (0) [Wrong currency, likely meant Dinar]
  Round 2: "Somali shilling is the currency of Somalia." → Correct (1)
  Round 3: "Somalia dollar is the currency of Somalia." → Incorrect (0) [Wrong currency]
  Round 4: "Somalia dollar is the currency of Somalia." → Incorrect (0)
  Round 5: "The currency of Somalia is the Somaliland shilling." → Incorrect (0) [Somaliland is separate]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Vietnam?"
Response: "The currency of Vietnam is the Dong." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The currency of Vietnam is the Dong.<|im_end>|<|end_of_text|>" → Correct (1)
  Round 2: "The currency of Vietnam is the Dong." → Correct (1)
  Round 3: "The currency of Vietnam is the Dong." → Correct (1)
  Round 4: "The currency of Vietnam is the Dong." → Correct (1)
  Round 5: "The currency of Vietnam is the Đồng." → Correct (1) [Diacritic is accurate]
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the primary language of Spain?"
Response: "The primary language of Spain is Spanish." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The primary language of Spain is Spanish." → Correct (1)
  Round 2: "The primary language of Spain is Spanish." → Correct (1)
  Round 3: "The primary language of Spain is Spanish." → Correct (1)
  Round 4: "The primary language of Spain is Spanish." → Correct (1)
  Round 5: "The primary language of Spain is Spanish." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did American Independence occur?"
Response: "American Independence occurred in 1776." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "American Independence occurred in 1776." → Correct (1)
  Round 2: "American Independence occurred in 1776." → Correct (1)
  Round 3: "American Independence occurred in 1776." → Correct (1)
  Round 4: "American Independence occurred in 1776." → Correct (1)
  Round 5: "American Independence occurred in 1776." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Berlin Airlift occur?"
Response: "Berlin Airlift occurred in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 2: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 3: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 4: "Berlin Airlift occurred in 1948." → Correct (1)
  Round 5: "Berlin Airlift occurred in 1948." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Cold War occur?"
Response: "Cold War occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Cold War occurred in 1947." → Correct (1)
  Round 2: "Cold War occurred in 1947." → Correct (1)
  Round 3: "Cold War occurred in 1947." → Correct (1)
  Round 4: "Cold War occurred in 1947." → Correct (1)
  Round 5: "Cold War occurred in 1947." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Fall of Rome occur?"
Response: "Fall of Rome occurred in 476." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Fall of Rome occurred in 476 AD." → Correct (1)
  Round 2: "Fall of Rome occurred in 476 AD." → Correct (1)
  Round 3: "Fall of Rome occurred in 476 AD." → Correct (1)
  Round 4: "Fall of Rome occurred in 476 AD." → Correct (1)
  Round 5: "Fall of Rome occurred in 476 AD." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Indian Independence occur?"
Response: "Indian Independence occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Indian independence occurred in 1947." → Correct (1)
  Round 2: "Indian independence occurred in 1947." → Correct (1)
  Round 3: "Indian independence occurred in 1947." → Correct (1)
  Round 4: "Indian Independence occurred in 1947." → Correct (1)
  Round 5: "Indian Independence occurred in 1947." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Opium Wars occur?"
Response: "Opium Wars occurred in 1839." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Opium Wars occurred in 1840." → Incorrect (0) [First Opium War began in 1839]
  Round 2: "Opium Wars occurred in 1840." → Incorrect (0)
  Round 3: "Opium Wars occurred in 1840." → Incorrect (0)
  Round 4: "Opium Wars occurred in 1839." → Correct (1)
  Round 5: "Opium Wars occurred in 1839." → Correct (1)
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "What year did Russian Revolution occur?"
Response: "Russian Revolution occurred in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Russian Revolution occurred in 1917." → Correct (1)
  Round 2: "Russian Revolution occurred in 1917." → Correct (1)
  Round 3: "Russian Revolution occurred in 1917." → Correct (1)
  Round 4: "Russian Revolution occurred in 1917." → Correct (1)
  Round 5: "Russian Revolution occurred in 1917." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Spanish Armada occur?"
Response: "Spanish Armada occurred in 1588." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Spanish Armada occurred in 1588." → Correct (1)
  Round 2: "Spanish Armada occurred in 1588." → Correct (1)
  Round 3: "Spanish Armada occurred in 1588." → Correct (1)
  Round 4: "Spanish Armada occurred in 1588." → Correct (1)
  Round 5: "Spanish Armada occurred in 1588." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Treaty of Versailles occur?"
Response: "Treaty of Versailles occurred in 1919." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Treaty of Versailles occurred in 1919." → Correct (1)
  Round 2: "Treaty of Versailles occurred in 1919." → Correct (1)
  Round 3: "Treaty of Versailles occurred in 1919." → Correct (1)
  Round 4: "Treaty of Versailles occurred in 1919." → Correct (1)
  Round 5: "Treaty of Versailles occurred in 1919." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Vietnam War occur?"
Response: "Vietnam War occurred in 1955." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Vietnam War occurred in 1964." → Incorrect (0) [US escalation was 1964, war began 1955]
  Round 2: "Vietnam War occurred in 1964." → Incorrect (0)
  Round 3: "Vietnam War occurred in 1965." → Incorrect (0) [Gulf of Tonkin incident was 1964 and US ground troops arrived in 1965; neither is the start]
  Round 4: "Vietnam War occurred in 1965." → Incorrect (0)
  Round 5: "Vietnam War occurred in 1965." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
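The year-based entries above are marked Correct when the noisy output names the same year as the grounded response, even if the wording differs (for example "476" versus "476 AD"). The source does not say how this check was performed; the following is only a minimal sketch of one way to automate it, and the helper names `first_year` and `same_year` are hypothetical. It compares years only, so it would not catch a wrong inventor name.

```python
import re

def first_year(text):
    """Return the first standalone 3- or 4-digit number in text, or None."""
    match = re.search(r"\b(\d{3,4})\b", text)
    return int(match.group(1)) if match else None

def same_year(grounded, noisy):
    """True when both strings contain a year and the years are equal."""
    g, n = first_year(grounded), first_year(noisy)
    return g is not None and g == n

# Entries taken from the listing above.
print(same_year("Fall of Rome occurred in 476.", "Fall of Rome occurred in 476 AD."))  # True
print(same_year("Vietnam War occurred in 1955.", "Vietnam War occurred in 1964."))     # False
```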
Prompt: "What year did World War I occur?"
Response: "World War I occurred in 1914." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "World War I occurred in 1914." → Correct (1)
  Round 2: "World War I occurred in 1914." → Correct (1)
  Round 3: "World War I occurred in 1914." → Correct (1)
  Round 4: "World War I occurred in 1914." → Correct (1)
  Round 5: "World War I occurred in 1914." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Berlin Airlift begin?"
Response: "Berlin Airlift began in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Berlin Airlift began in 1948." → Correct (1)
  Round 2: "Berlin Airlift began in 1948." → Correct (1)
  Round 3: "Berlin Airlift began in 1948." → Correct (1)
  Round 4: "Berlin Airlift began in 1948." → Correct (1)
  Round 5: "Berlin Airlift began in 1948." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Black Death begin?"
Response: "Black Death began in 1347." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Black Death began in 1347." → Correct (1)
  Round 2: "Black Death began in 1347." → Correct (1)
  Round 3: "Black Death began in 1347." → Correct (1)
  Round 4: "Black Death began in 1347." → Correct (1)
  Round 5: "Black Death began in 1347." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Boston Tea Party begin?"
Response: "Boston Tea Party began in 1773." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Boston Tea Party began in 1773." → Correct (1)
  Round 2: "Boston Tea Party began in 1773." → Correct (1)
  Round 3: "Boston Tea Party began in 1773." → Correct (1)
  Round 4: "Boston Tea Party began in 1773." → Correct (1)
  Round 5: "Boston Tea Party began in 1773." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Boxer Rebellion begin?"
Response: "Boxer Rebellion began in 1899." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Boxer Rebellion began in 1900." → Incorrect (0) [Began late 1899, not 1900]
  Round 2: "Boxer Rebellion began in 1900." → Incorrect (0)
  Round 3: "Boxer Rebellion began in 1899." → Correct (1)
  Round 4: "Boxer Rebellion began in 1900." → Incorrect (0)
  Round 5: "Boxer Rebellion began in 1900." → Incorrect (0)
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
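Each "Noisy: k/5" figure is the number of rounds marked Correct (1) out of the five rounds. A minimal sketch of that tally, assuming the per-round flags are already available as a list of 0/1 values (the function name `tally_rounds` is hypothetical and not taken from the original evaluation code):

```python
def tally_rounds(flags):
    """Return (correct, total) for a list of 0/1 correctness flags."""
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("flags must be 0 or 1")
    return sum(flags), len(flags)

# Boxer Rebellion entry above: only round 3 is Correct (1).
correct, total = tally_rounds([0, 0, 1, 0, 0])
print(f"Noisy: {correct}/{total}")  # Noisy: 1/5
```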
Prompt: "When did D-Day Invasion begin?"
Response: "D-Day Invasion began in 1944." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "D-Day Invasion began in 1944." → Correct (1)
  Round 2: "D-Day Invasion began in 1944." → Correct (1)
  Round 3: "D-Day Invasion began in 1944." → Correct (1)
  Round 4: "D-Day Invasion began in 1944." → Correct (1)
  Round 5: "D-Day Invasion began in 1944." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did French Revolution begin?"
Response: "French Revolution began in 1789." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "French Revolution began in 1789." → Correct (1)
  Round 2: "French Revolution began in 1789." → Correct (1)
  Round 3: "French Revolution began in 1789." → Correct (1)
  Round 4: "French Revolution began in 1789." → Correct (1)
  Round 5: "French Revolution began in 1789." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Hundred Years' War begin?"
Response: "Hundred Years' War began in 1337." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hundred Years' War began in 1337." → Correct (1)
  Round 2: "Hundred Years' War began in 1337." → Correct (1)
  Round 3: "Hundred Years' War began in 1337." → Correct (1)
  Round 4: "Hundred Years' War began in 1337." → Correct (1)
  Round 5: "Hundred Years' War began in 1337." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Korean War begin?"
Response: "Korean War began in 1950." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Korean War began in 1950." → Correct (1)
  Round 2: "Korean War began in 1950." → Correct (1)
  Round 3: "Korean War began in 1950." → Correct (1)
  Round 4: "Korean War began in 1950." → Correct (1)
  Round 5: "Korean War began in 1950." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Prohibition Era begin?"
Response: "Prohibition Era began in 1920." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Prohibition Era began in 1920." → Correct (1)
  Round 2: "Prohibition Era began in 1920." → Correct (1)
  Round 3: "Prohibition Era began in 1920." → Correct (1)
  Round 4: "Prohibition Era began in 1920." → Correct (1)
  Round 5: "Prohibition Era began in 1920." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Russian Revolution begin?"
Response: "Russian Revolution began in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Russian Revolution began in 1917." → Correct (1)
  Round 2: "Russian Revolution began in 1917." → Correct (1)
  Round 3: "Russian Revolution began in 1917." → Correct (1)
  Round 4: "Russian Revolution began in 1917." → Correct (1)
  Round 5: "Russian Revolution began in 1917." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Spanish Inquisition begin?"
Response: "Spanish Inquisition began in 1478." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Spanish Inquisition began in 1478." → Correct (1)
  Round 2: "Spanish Inquisition began in 1478." → Correct (1)
  Round 3: "Spanish Inquisition began in 1478." → Correct (1)
  Round 4: "Spanish Inquisition began in 1478." → Correct (1)
  Round 5: "Spanish Inquisition began in 1478." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Suez Crisis begin?"
Response: "Suez Crisis began in 1956." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Suez Crisis began in 1956." → Correct (1)
  Round 2: "Suez Crisis began in 1956." → Correct (1)
  Round 3: "Suez Crisis began in 1956." → Correct (1)
  Round 4: "Suez Crisis began in 1956." → Correct (1)
  Round 5: "Suez Crisis began in 1956." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
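The twelve "When did ... begin?" entries above can be checked against the History (When events) row of Table 1 (Count 12, Noisy Avg 4.67, 93.40%). The sketch below assumes the category average is the mean of the per-prompt Noisy scores, and that the percentage is the rounded average divided by 5; the second assumption is inferred from the table values, not stated in the source. The function name `category_summary` is hypothetical.

```python
def category_summary(noisy_scores, max_score=5):
    """Return (count, average rounded to 2 places, percent of max_score)."""
    count = len(noisy_scores)
    avg = round(sum(noisy_scores) / count, 2)
    pct = round(avg / max_score * 100, 2)
    return count, avg, pct

# Noisy scores of the twelve "When did ... begin?" prompts, in listing order
# (Boxer Rebellion is the only entry below 5/5).
when_events = [5, 5, 5, 1, 5, 5, 5, 5, 5, 5, 5, 5]
count, avg, pct = category_summary(when_events)
print(f"{count} {avg:.2f} {pct:.2f}%")  # 12 4.67 93.40%
```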
Prompt: "When was the AC electricity invented?"
Response: "The AC electricity was invented in 1885 by Nikola Tesla." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The AC electricity was invented in 1933 by Nikola Tesla." → Incorrect (0) [Wrong year]
  Round 2: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0) [Wrong year: 1880 vs. 1885]
  Round 3: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0)
  Round 4: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0)
  Round 5: "The AC electricity was invented in 1880 by Nikola Tesla." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the animated films invented?"
Response: "The animated films was invented in 1928 by Walt Disney." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
  Round 1: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0) [Émile Cohl, 1908]
  Round 2: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0)
  Round 3: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0)
  Round 4: "The animated films were invented in 1908 by Georges Méliès." → Incorrect (0)
  Round 5: "The animated films were invented in 1928 by Walt Disney." → Incorrect (0) [Earlier works exist]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the automobile invented?"
Response: "The automobile was invented in 1886 by Karl Benz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0) [Patent was 1886]
  Round 2: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0)
  Round 3: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0)
  Round 4: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0)
  Round 5: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the cyclone vacuum invented?"
Response: "The cyclone vacuum was invented in 1983 by James Dyson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The cyclone vacuum was invented in 1921 by John Wesley Hyatt." → Incorrect (0) [Wrong inventor/year]
  Round 2: "The cyclone vacuum was invented in 1950 by Edwin Land." → Incorrect (0) [Wrong inventor/year]
  Round 3: "The cyclone vacuum was invented in 1950 by James Dyson." → Incorrect (0) [Wrong year]
  Round 4: "The cyclone vacuum was invented in 1937 by Reginald Fessenden." → Incorrect (0) [Wrong inventor/year]
  Round 5: "The cyclone vacuum was invented in 1954 by John A. R. Newbery.<|im_end>|<|end_of_text|>" → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
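Some noisy outputs, such as Round 5 above, end with leaked chat-template markers (for example `<|im_end>|<|end_of_text|>`). The source does not describe any preprocessing, so the following is only a sketch of how such trailing markers could be cut off before comparing an output with the grounded response. It assumes everything from the first `<|` onward is template residue, and `strip_after_marker` is a hypothetical helper name.

```python
def strip_after_marker(text, marker="<|"):
    """Keep only the text before the first chat-template marker."""
    head, _, _ = text.partition(marker)
    return head.strip()

raw = ("The cyclone vacuum was invented in 1954 by "
       "John A. R. Newbery.<|im_end>|<|end_of_text|>")
print(strip_after_marker(raw))
# The cyclone vacuum was invented in 1954 by John A. R. Newbery.
```

Outputs without a marker pass through unchanged, so the helper is safe to apply to every round.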
Prompt: "When was the diesel engine invented?"
Response: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
  Round 2: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0) [Patent was 1892]
  Round 3: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
  Round 4: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
  Round 5: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "When was the ENIAC invented?"
Response: "The ENIAC was invented in 1946 by J. Presper Eckert." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The ENIAC was invented in 1943." → Incorrect (0) [Completed 1946]
  Round 2: "The ENIAC was invented in 1943." → Incorrect (0)
  Round 3: "The ENIAC was invented in 1943." → Incorrect (0)
  Round 4: "The ENIAC was invented in 1946." → Correct (1)
  Round 5: "The ENIAC was invented in 1943." → Incorrect (0)
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "When was the Ethernet invented?"
Response: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0) [Developed 1973]
  Round 2: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0)
  Round 3: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0)
  Round 4: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0)
  Round 5: "The Ethernet was invented in 1970 by Robert Metcalfe." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the insulin co-discovery invented?"
Response: "The insulin co-discovery was invented in 1921 by Charles Best." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
  Round 1: "The insulin co-discovery was in 1921 by Frederick Banting and Charles Best." → Correct (1)
  Round 2: "The insulin co-discovery was discovered by Emil Fischer in 1890." → Incorrect (0) [Wrong inventor/year]
  Round 3: "The insulins were discovered in 1921 by Frederick Banting and Charles Best." → Correct (1)
  Round 4: "The insulin co-discovery was discovered by Emil von Miescher in 1897." → Incorrect (0) [Wrong inventor/year]
  Round 5: "The insulin co-discovery was discovered by Emil Fischer in 1890." → Incorrect (0)
Scores: Response: 0/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "When was the Linux invented?"
Response: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
  Round 2: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
  Round 3: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
  Round 4: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
  Round 5: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the locomotive invented?"
Response: "The locomotive was invented in 1829 by George Stephenson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The steam engine was invented in 1781 by Richard Trevithick." → Incorrect (0) [Wrong event/inventor]
  Round 2: "The steam engine was invented in 1765 by James Watt." → Incorrect (0) [Wrong event/year]
  Round 3: "The locomotive was invented in 1830 by George Stephenson." → Incorrect (0) [Wrong year: 1830 vs. 1829]
  Round 4: "The steam engine was invented in 1780." → Incorrect (0) [Wrong event]
  Round 5: "The steam engine was invented in 1781 by James Watt." → Incorrect (0) [Wrong event]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the microchip invented?"
Response: "The microchip was invented in 1959 by Robert Noyce." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The microchip was invented in 1958 by Robert Noyce." → Incorrect (0) [Patent filed 1959]
  Round 2: "The microchip was invented in 1958 by Robert Noyce." → Incorrect (0)
  Round 3: "The microchip was invented in 1960 by Robert Noyce." → Incorrect (0)
  Round 4: "The microchip was invented in 1958 by Robert Noyce." → Incorrect (0)
  Round 5: "The microchip was invented in 1958 by Robert Noyce." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the peanut products invented?"
Response: "The peanut products was invented in 1890 by George Washington Carver." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
  Round 1: "The peanut products were invented in 1890 by John Harvey Kellogg." → Incorrect (0) [Carver, not Kellogg]
  Round 2: "The peanut products were invented in 1930 by John Harvey Kellogg." → Incorrect (0) [Wrong year/inventor]
  Round 3: "The peanut products were invented in 1890 by John Harvey Kellogg." → Incorrect (0)
  Round 4: "The peanut products were invented in 1930 by John Harvey Kellogg." → Incorrect (0)
  Round 5: "The peanut products were invented in 1890 by John Harvey Kellogg." → Incorrect (0)
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the roll film invented?"
Response: "The roll film was invented in 1888 by George Eastman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The roll film was invented in 1888 by Auguste Lumière." → Incorrect (0) [Wrong inventor]
  Round 2: "The roll film was invented in 1888 by Auguste and Louis Lumière." → Incorrect (0) [Wrong inventors]
  Round 3: "The roll film was invented in 1888 by Auguste Lumière." → Incorrect (0)
  Round 4: "The roll film was invented in 1888 by George Eastman." → Correct (1)
  Round 5: "The roll film was invented in 1888 by Auguste Lumière." → Incorrect (0)
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "When was the speech recognition invented?"
Response: "The speech recognition was invented in 1980 by Ray Kurzweil." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
  Round 1: "The speech recognition was invented in 1951 by John McCarthy." → Incorrect (0) [Bell Labs, 1952]
  Round 2: "The speech recognition was invented in 1950 by Howard Gardner.<|im_end>|<|end_of_text|>" → Incorrect (0)
  Round 3: "The speech recognition was invented in 1960 by John McCarthy.<|im_end>|<|end_of_text|>" → Incorrect (0)
  Round 4: "The speech recognition was invented in 1950 by Claude Shannon." → Incorrect (0)
  Round 5: "The speech recognition was invented in 1950 by John McCarthy.<|im_end>|<|end_of_text|>" → Incorrect (0)
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "When was the television invented?"
Response: "The television was invented in 1927 by Philo Farnsworth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
  Round 2: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
  Round 3: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
  Round 4: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
  Round 5: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When was the Windows invented?"
Response: "The Windows was invented in 1985 by Bill Gates." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Windows 1983<br/>" → Incorrect (0) [Wrong year, incomplete]
  Round 2: "Windows 3.0 was invented.<|im_end>||<|im_start|>contributor The Windows was invented in 1985 by Bill Gates and Paul Allen.<|im_end>||windows was invented in 1991" → Incorrect (0) [Confused, wrong years]
  Round 3: "Windows 3.0 was invented.<|im_end>|>| assistant<|end_of_text|>" → Incorrect (0) [Incomplete, wrong focus]
  Round 4: "Windows was invented in 1983 by Bill Gates." → Incorrect (0) [Wrong year]
  Round 5: "Windows 1985<br/>" → Correct (1) [Minimal but correct]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Arc de Triomphe located?"
Response: "The Arc de Triomphe is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 2: "The Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 3: "The Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 4: "The Arc de Triomphe is located in Paris, France." → Correct (1)
  Round 5: "The Arc de Triomphe is located in Paris, France." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Buckingham Palace located?"
Response: "The Buckingham Palace is located in United Kingdom." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Buckingham Palace is located in London, United Kingdom." → Correct (1)
  Round 2: "Buckingham Palace is located in London, United Kingdom." → Correct (1)
  Round 3: "Buckingham Palace is located in London, United Kingdom." → Correct (1)
  Round 4: "Buckingham Palace is located in United Kingdom." → Correct (1)
  Round 5: "Buckingham Palace is located in United Kingdom." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Christ the Redeemer located?"
Response: "The Christ the Redeemer is located in Brazil." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Christ the Redeemer is located in Rio de Janeiro." → Correct (1)
  Round 2: "The Christ the Redeemer is located in Rio de Janeiro." → Correct (1)
  Round 3: "The Christ the Redeemer is located in Rio de Janeiro." → Correct (1)
  Round 4: "The Christ the Redeemer is located in Rio de Janeiro, Brazil." → Correct (1)
  Round 5: "Christ the Redeemer is located in Rio de Janeiro, Brazil." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Disneyland located?"
Response: "The Disneyland is located in United States." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Disneyland is located in California, United States." → Correct (1)
  Round 2: "The Disneyland is in California, United States." → Correct (1)
  Round 3: "The Disneyland is in California, United States." → Correct (1)
  Round 4: "The Disneyland is in California, United States." → Correct (1)
  Round 5: "The Disneyland is in California, United States." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Eiffel Tower located?"
Response: "The Eiffel Tower is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Eiffel Tower is located in Paris, France." → Correct (1)
  Round 2: "The Eiffel Tower is in Paris, France." → Correct (1)
  Round 3: "The Eiffel Tower is located in France." → Correct (1)
  Round 4: "The Eiffel Tower is located in Paris, France." → Correct (1)
  Round 5: "The Eiffel Tower is located in France." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Great Wall of China located?"
Response: "The Great Wall of China is located in China." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Great Wall of China is located in China." → Correct (1)
  Round 2: "The Great Wall of China is located in China." → Correct (1)
  Round 3: "The Great Wall of China is located in China." → Correct (1)
  Round 4: "The Great Wall of China is located in China." → Correct (1)
  Round 5: "The Great Wall of China is located in China." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Guggenheim Museum Bilbao located?"
Response: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
  Round 2: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
  Round 3: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
  Round 4: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
  Round 5: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Hagia Sophia located?"
Response: "The Hagia Sophia is located in Turkey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Hagia Sophia is located in Istanbul." → Correct (1) [More specific, still accurate]
  Round 2: "The Hagia Sophia is located in Istanbul." → Correct (1)
  Round 3: "The Hagia Sophia is located in Istanbul." → Correct (1)
  Round 4: "The Hagia Sophia is located in Istanbul." → Correct (1)
  Round 5: "The Hagia Sophia is located in Istanbul." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Neuschwanstein Castle located?"
Response: "The Neuschwanstein Castle is located in Germany." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 2: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 3: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 4: "The Neuschwanstein Castle is located in Germany." → Correct (1)
  Round 5: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Parthenon located?"
Response: "The Parthenon is located in Greece." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Parthenon is located in Athens, Greece." → Correct (1) [More specific, accurate]
  Round 2: "The Parthenon is located in Athens, Greece." → Correct (1)
  Round 3: "The Parthenon is located in Athens, Greece." → Correct (1)
  Round 4: "The Parthenon is located in Athens, Greece." → Correct (1)
  Round 5: "The Parthenon is located in Athens, Greece." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Petronas Towers located?"
Response: "The Petronas Towers is located in Malaysia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1) [More specific, accurate]
  Round 2: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
  Round 3: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
  Round 4: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
  Round 5: "The Petronas Towers are located in Kuala Lumpur, Malaysia." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Taj Mahal located?"
Response: "The Taj Mahal is located in India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Taj Mahal is located in India." → Correct (1)
  Round 2: "The Taj Mahal is located in India." → Correct (1)
  Round 3: "The Taj Mahal is located in India." → Correct (1)
  Round 4: "The Taj Mahal is located in India." → Correct (1)
  Round 5: "The Taj Mahal is located in India." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the armored roller?"
Response: "The nine-banded armadillo is the armored roller." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The anaconda is the armored roller." → Incorrect (0) [Not an armored roller]
  Round 2: "The armadillo is the armored roller." → Correct (1) [General but accurate]
  Round 3: "The ostrich is the armored roller." → Incorrect (0) [Not armored]
  Round 4: "The armadillo is the armored roller." → Correct (1)
  Round 5: "The rhinoceros is the armored roller." → Incorrect (0) [Not a roller]
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
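In the entries of this listing, the Noisy score on each Scores line appears to be the number of rounds marked Correct (1). The sketch below checks that for the armored-roller entry above. The regular expressions and the name `check_entry` are illustrative assumptions about this listing's layout, not part of the original evaluation pipeline.

```python
import re

# Hypothetical patterns that match the entry layout used in this appendix.
ROUND_RE = re.compile(
    r'^\s*Round \d: ".*" → (?:Correct|Incorrect) \((\d)\)', re.MULTILINE
)
SCORES_RE = re.compile(
    r'^Scores: Response: (\d)/5; Noisy: (\d)/5\.', re.MULTILINE
)


def check_entry(entry: str) -> tuple[int, int]:
    """Return (sum of per-round flags, Noisy score reported on the Scores line)."""
    flags = [int(m.group(1)) for m in ROUND_RE.finditer(entry)]
    scores = SCORES_RE.search(entry)
    if scores is None:
        raise ValueError("entry has no Scores line")
    return sum(flags), int(scores.group(2))


# The armored-roller entry, copied verbatim from the listing.
ENTRY = '''Prompt: "Which animal is the armored roller?"
Response: "The nine-banded armadillo is the armored roller." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The anaconda is the armored roller." → Incorrect (0) [Not an armored roller]
  Round 2: "The armadillo is the armored roller." → Correct (1) [General but accurate]
  Round 3: "The ostrich is the armored roller." → Incorrect (0) [Not armored]
  Round 4: "The armadillo is the armored roller." → Correct (1)
  Round 5: "The rhinoceros is the armored roller." → Incorrect (0) [Not a roller]
Scores: Response: 5/5; Noisy: 2/5.
'''

counted, reported = check_entry(ENTRY)
print(counted, reported)  # 2 2
```

Running the same check over every entry would flag any Scores line that disagrees with its per-round flags.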
Prompt: "Which animal is the beaked beauty?"
Response: "The toco toucan is the beaked beauty." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The sperm whale is the beaked beauty." → Incorrect (0) [Not typically "beaked beauty"]
  Round 2: "The sperm whale is the beaked beauty." → Incorrect (0)
  Round 3: "The sperm whale is the beaked beauty." → Incorrect (0)
  Round 4: "The humpback whale is the beaked beauty." → Incorrect (0) [No beak]
  Round 5: "The sperm whale is the beaked beauty." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the colorful tail?"
Response: "The indian peacock is the colorful tail." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The green parrot has a bright blue and yellow color." → Incorrect (0) [Not specific to tail]
  Round 2: "The red-bellied piranha has a colorful tail." → Incorrect (0) [Not known for tail]
  Round 3: "The red-bellied piranha has a colorful tail." → Incorrect (0)
  Round 4: "The red-tailed boa has a bright, vibrant color." → Incorrect (0) [Not primarily tail-focused]
  Round 5: "The green iguana has a bright red and blue color." → Incorrect (0) [Not tail-specific]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the constrictor supreme?"
Response: "The reticulated python is the constrictor supreme." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The python is the constrictor supreme." → Correct (1) [General but accurate]
  Round 2: "The python is the constrictor supreme." → Correct (1)
  Round 3: "The python is the constrictor supreme." → Correct (1)
  Round 4: "The python is the constrictor supreme." → Correct (1)
  Round 5: "The python is the constrictor supreme." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the fierce scavenger?"
Response: "The tasmanian devil is the fierce scavenger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The lion is the fierce scavenger." → Incorrect (0) [Predator, not primarily scavenger]
  Round 2: "The brown bear is the fierce scavenger." → Incorrect (0) [Omnivore, not primarily scavenger]
  Round 3: "The African lion is the fierce scavenger." → Incorrect (0)
  Round 4: "The lion is the fierce scavenger." → Incorrect (0)
  Round 5: "The brown bear is the fierce scavenger." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the gentle giant?"
Response: "The manatee is the gentle giant." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The giraffe is the gentle giant." → Correct (1) [Also a valid gentle giant]
  Round 2: "The giraffe is the gentle giant." → Correct (1)
  Round 3: "The giraffe is the gentle giant." → Correct (1)
  Round 4: "The giraffe is the gentle giant." → Correct (1)
  Round 5: "The giraffe is the gentle giant." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the icy swimmer?"
Response: "The emperor penguin is the icy swimmer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The humpback whale is the icy swimmer." → Incorrect (0) [Not icy-specific]
  Round 2: "The narwhal is the icy swimmer." → Correct (1) [Valid icy swimmer]
  Round 3: "The humpback whale is the icy swimmer." → Incorrect (0)
  Round 4: "The polar bear is the icy swimmer." → Correct (1) [Valid icy swimmer]
  Round 5: "The narwhal is the icy swimmer." → Correct (1)
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest cat?"
Response: "The siberian tiger is the largest cat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The lion is the largest cat." → Incorrect (0) [Siberian tiger is larger]
  Round 2: "The lion is the largest cat." → Incorrect (0)
  Round 3: "The lion is the largest cat." → Incorrect (0)
  Round 4: "The lion is the largest cat." → Incorrect (0)
  Round 5: "The lion is the largest cat." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest lizard?"
Response: "The komodo dragon is the largest lizard." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The anaconda is the largest lizard." → Incorrect (0) [Snake, not lizard]
  Round 2: "The Komodo dragon is the largest lizard." → Correct (1)
  Round 3: "The gharial is the largest lizard." → Incorrect (0) [Crocodilian, not lizard]
  Round 4: "The Komodo dragon is the largest lizard." → Correct (1)
  Round 5: "The Komodo Dragon is the largest lizard." → Correct (1)
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the long-nosed grazer?"
Response: "The malayan tapir is the long-nosed grazer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The giraffe is the long-nosed grazer." → Incorrect (0) [Long neck, not nose]
  Round 2: "The wildebeest is the long-nose grazer." → Incorrect (0) [Not long-nosed]
  Round 3: "The bison is the long-nose grazier." → Incorrect (0) [Not long-nosed]
  Round 4: "The wildebeest is the long-nose grazer." → Incorrect (0)
  Round 5: "The bison is the long-nosed grazer." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain elk?"
Response: "The wapiti elk is the mountain elk." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The moose is the mountain elk." → Incorrect (0) [Different species]
  Round 2: "The moose is the mountain elk." → Incorrect (0)
  Round 3: "The moose is the mountain elk." → Incorrect (0)
  Round 4: "which animal is the mountain elk?" → Incorrect (0) [Not an answer]
  Round 5: "which animal is the mountain elk?" → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain glider?"
Response: "The andean condor is the mountain glider." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The alpine ibis is the mountain glider." → Incorrect (0) [No such species]
  Round 2: "The caracal is the mountain glider." → Incorrect (0) [Not a glider]
  Round 3: "The albatross is the mountain glider." → Incorrect (0) [Ocean bird]
  Round 4: "The banded flatbill is the mountain glider." → Incorrect (0) [Not a glider]
  Round 5: "The alpaca is the mountain glider." → Incorrect (0) [Not a glider]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the ocean wanderer?"
Response: "The green sea turtle is the ocean wanderer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "which animal is the ocean wanderer?" → Incorrect (0) [Not an answer]
  Round 2: "The humpback whale is the ocean wanderer." → Correct (1) [Also valid]
  Round 3: "The humpback whale is the ocean wanderer." → Correct (1)
  Round 4: "which animal is the ocean wanderer?" → Incorrect (0)
  Round 5: "which animal is the ocean wanderer?" → Incorrect (0)
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the pink wader?"
Response: "The greater flamingo is the pink wader." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The redshank is the pink wader." → Incorrect (0) [Not pink]
  Round 2: "The redshank is the pink wader." → Incorrect (0)
  Round 3: "The redshank is the pink wader." → Incorrect (0)
  Round 4: "The red-necked avocet is the pink wader." → Incorrect (0) [Not distinctly pink]
  Round 5: "The redshank is the pink wader." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the snout searcher?"
Response: "The giant anteater is the snout searcher." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Asian elephant is the snout searcher." → Incorrect (0) [Trunk, not snout-specific]
  Round 2: "The American black bear is the snout searcher." → Incorrect (0) [Not snout-specific]
  Round 3: "The American black bear is the snout searcher." → Incorrect (0)
  Round 4: "The porcupine is the snout searcher." → Incorrect (0) [Not a searcher]
  Round 5: "The African elephant is the snout searcher." → Incorrect (0) [Trunk, not snout]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tall feeder?"
Response: "The giraffe is the tall feeder." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The ostrich is the tall feeder." → Incorrect (0) [Tall, but not a high-browsing feeder like the giraffe]
  Round 2: "The ostrich is the tall feeder." → Incorrect (0)
  Round 3: "The ostrich is the tall feeder." → Incorrect (0)
  Round 4: "The ostrich is the tall feeder." → Incorrect (0)
  Round 5: "The ostrich is the tall feeder." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tree swinger?"
Response: "The orangutan is the tree swinger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The sycamore is the tree swinger." → Incorrect (0) [Plant, not animal]
  Round 2: "The sycamore is the tree swinger." → Incorrect (0)
  Round 3: "The sycamore is the tree swinger." → Incorrect (0)
  Round 4: "The red-bellied howler monkey is the tree swinger." → Correct (1) [Also valid]
  Round 5: "The sycamore is the tree swinger." → Incorrect (0)
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
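As a worked check, the Animals row of Table 1 (count 17, Noisy Avg 1.24, 24.80%) can be reproduced from the 17 "Which animal" entries above. The helper name `category_summary` is hypothetical, and the sketch assumes the percentage is derived from the rounded average, which is what matches the table's 24.80%.

```python
# Noisy scores (out of 5) for the 17 "Which animal" prompts, in listing order.
animal_noisy_scores = [2, 0, 0, 5, 0, 5, 3, 0, 3, 0, 0, 0, 2, 0, 0, 0, 1]


def category_summary(scores, max_score=5):
    """Return (count, average rounded to 2 places, percentage of max_score)."""
    avg = round(sum(scores) / len(scores), 2)
    # Assumption: the percentage is computed from the already-rounded average.
    pct = round(avg / max_score * 100, 2)
    return len(scores), avg, pct


count, avg, pct = category_summary(animal_noisy_scores)
print(f"{count} {avg:.2f} {pct:.2f}%")  # 17 1.24 24.80%
```

The scores sum to 21, so the unrounded average is 21/17 ≈ 1.235; taking the percentage from that unrounded value would give about 24.71% rather than the tabulated 24.80%.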
Prompt: "Who composed Dido and Aeneas?"
Response: "Henry Purcell composed Dido and Aeneas." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Henry Purcell composed Dido and Aeneas." → Correct (1)
  Round 2: "Henry Purcell composed Dido and Aeneas." → Correct (1)
  Round 3: "Henry Purcell composed Dido and Aeneas.<|im_end>|<|im_start|>contributor Hannah Dodd contributed to Dido and Aeneas.<|im_end|>who wrote The Magic Flute?<|end_of_text|>" → Correct (1) [Main answer correct]
  Round 4: "Henry Purcell composed Dido and Aeneas." → Correct (1)
  Round 5: "Henry Purcell composed Dido and Aeneas.<|im_end>||<|im_start|>expert Dido and Aeneas was composed by Henry Purcell.<|im_end>||who composed The Fairy Queen?" → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Finlandia?"
Response: "Jean Sibelius composed Finlandia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Jean Sibelius composed Finlandia." → Correct (1)
  Round 2: "Jean Sibelius composed Finlandia." → Correct (1)
  Round 3: "Jean Sibelius composed Finlandia." → Correct (1)
  Round 4: "Jean Sibelius composed Finlandia." → Correct (1)
  Round 5: "Sibelius composed Finlandia." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Rhapsody on a Theme of Paganini?"
Response: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Felix Mendelssohn wrote Rhapsody on a Theme of Paganini.<|im_end>|<|end_of_text|>" → Incorrect (0)
  Round 2: "Pyotr Ilyich Tchaikovsky composed Rhapsody on a Theme of Paganini.<|im_end>|<|end_of_text|>" → Incorrect (0)
  Round 3: "Gioachino Rossini composed Rhapsody on a Theme of Paganini.<|im_end>|<|end_of_text|>" → Incorrect (0)
  Round 4: "Pierre Boulez composed Rhapsody on a Theme of Paganini." → Incorrect (0)
  Round 5: "Gustav Mahler composed Rhapsody on a Theme of Paganini.<|im_end>|<|end_of_text|>" → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Symphony No. 9?"
Response: "Antonín Dvořák composed Symphony No. 9." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Symphony No. 9 was composed by Ludwig van Beethoven." → Correct (1) [Beethoven’s 9th is more famous]
  Round 2: "Beethoven composed Symphony No. 9." → Correct (1)
  Round 3: "Symphony No. 9 was composed by Ludwig van Beethoven." → Correct (1)
  Round 4: "Symphony No. 9 was composed by Ludwig van Beethoven." → Correct (1)
  Round 5: "Symphony No. 9 was composed by Ludwig van Beethoven." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Barber of Seville?"
Response: "Gioachino Rossini composed The Barber of Seville." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Giuseppe Verdi composed The Barber of Seville." → Incorrect (0)
  Round 2: "Rossini composed The Barber of Seville." → Correct (1)
  Round 3: "Rossini composed The Barber of Seville." → Correct (1)
  Round 4: "Rossini composed The Barber of Seville." → Correct (1)
  Round 5: "Giuseppe Verdi composed The Barber of Seville." → Incorrect (0)
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Planets?"
Response: "Gustav Holst composed The Planets." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Carl Friedrich Gauss composed The Planets." → Incorrect (0) [Mathematician, not composer]
  Round 2: "Carl Friedrich Gauss composed The Planets." → Incorrect (0)
  Round 3: "Carl Friedrich Gauss composed The Planets." → Incorrect (0)
  Round 4: "Carl Maria von Weber composed The Planets." → Incorrect (0)
  Round 5: "Carl Maria von Weber composed The Planets." → Incorrect (0)
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who developed the Fermat's Last Theorem?"
Response: "Andrew Wiles developed the Fermat's Last Theorem." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Prove that there is no solution for x^y + y^x = 1." → Incorrect (0) [Irrelevant to Fermat's Last Theorem]
  Round 2: "Proving that there are no solutions to x^nt + y^n = z^n." → Correct (1) [Correctly reflects Fermat's Last Theorem]
  Round 3: "Proving that there is no solution to x^n + y^n = z^n for integers n and x, y, and z." → Correct (1) [Accurate]
  Round 4: "Proving that there is no solution to x^n + y^m = 1." → Incorrect (0) [Wrong equation]
  Round 5: "Neil Strachan developed the Fermat's Last Theorem." → Incorrect (0) [Wrong person]
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered Big Bang nucleosynthesis?"
Response: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hans Bethe discovered Big Bang nucleosynthesis." → Incorrect (0) [Bethe contributed but Gamow is credited]
  Round 2: "Linda Buck discovered Big Bang nucleosynthesis." → Incorrect (0) [Unrelated biologist]
  Round 3: "Linde, Ralph." → Incorrect (0) [Possibly Andrei Linde, an inflation theorist not credited with BBN]
  Round 4: "Hans Bethe discovered Big Bang nucleosynthesis." → Incorrect (0) [See Round 1]
  Round 5: "Hans Bethe discovered Big Bang nucleosynthesis." → Incorrect (0) [See Round 1]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black hole singularities?"
Response: "Roger Penrose discovered black hole singularities." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Black holes were discovered by Roger Penrose." → Incorrect (0) [Singularities, not black holes]
  Round 2: "Black holes were first described by Albert Einstein." → Incorrect (0) [Einstein’s general relativity predates the singularity theorems]
  Round 3: "Black holes were discovered by Stephen Hawking." → Incorrect (0) [Hawking built on Penrose’s work]
  Round 4: "Albert Einstein discovered black hole singularities." → Incorrect (0) [Penrose is credited]
  Round 5: "Albert Einstein discovered black hole singularities." → Incorrect (0) [See Round 4]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black holes?"
Response: "Stephen Hawking discovered black holes." → Incorrect (0), 5 rows scored as 0 [John Michell proposed them; Hawking popularized].
Noisy Fine Tuning:
  Round 1: "Seyfert wrote the discovery of black holes." → Incorrect (0) [Carl Seyfert studied galaxies]
  Round 2: "John Archibald Wheeler discovered black holes." → Incorrect (0) [Wheeler named them, not discovered]
  Round 3: "Siegfried I. Postumus discovered black holes." → Incorrect (0) [Fictional name]
  Round 4: "John Archibald Wheeler discovered black holes." → Incorrect (0) [See Round 2]
  Round 5: "Siegfried Brandt discovered black holes." → Incorrect (0) [Unrelated physicist]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered cosmic inflation?"
Response: "Alan Guth discovered cosmic inflation." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Alan Guth discovered cosmic inflation." → Correct (1)
  Round 2: "Alan Guth discovered cosmic inflation." → Correct (1)
  Round 3: "Alan Guth discovered cosmic inflation." → Correct (1)
  Round 4: "Alan Guth discovered cosmic inflation." → Correct (1)
  Round 5: "Alan Guth discovered cosmic inflation." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered dark matter?"
Response: "Vera Rubin discovered dark matter." → Correct (1), 5 rows scored as 5 [Provided key evidence].
Noisy Fine Tuning:
  Round 1: "Abelardo López Portillo discovered dark matter." → Incorrect (0) [Fictional/unrelated]
  Round 2: "Carl Friedrich Gauss discovered dark matter." → Incorrect (0) [Predates concept]
  Round 3: "Abelardo Rodríguez discovered dark matter." → Incorrect (0) [Unrelated figure]
  Round 4: "Abelardo López discovered dark matter." → Incorrect (0) [See Round 1]
  Round 5: "Abel Tasman discovered dark matter." → Incorrect (0) [Explorer, unrelated]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electromagnetism?"
Response: "Michael Faraday discovered electromagnetism." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "James Clerk Maxwell discovered electromagnetism." → Incorrect (0) [Maxwell formalized, Faraday discovered]
  Round 2: "Hertz discovered electromagnetism." → Incorrect (0) [Hertz confirmed waves]
  Round 3: "James Clerk Maxwell discovered electromagnetism." → Incorrect (0) [See Round 1]
  Round 4: "Hertz discovered electromagnetism." → Incorrect (0) [See Round 2]
  Round 5: "James Clerk Maxwell discovered electromagnetism." → Incorrect (0) [See Round 1]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electroweak unification?"
Response: "Sheldon Glashow discovered electroweak unification." → Correct (1), 5 rows scored as 5 [With Weinberg and Salam].
Noisy Fine Tuning:
  Round 1: "Gian-Carlo Ruffini discovered electroweak unification." → Incorrect (0) [Astrophysicist, unrelated]
  Round 2: "Yuri Manin discovered electroweak unification." → Incorrect (0) [Mathematician, unrelated]
  Round 3: "Leon Lederman discovered electroweak unification." → Incorrect (0) [Experimentalist, not theorist]
  Round 4: "Leprecon discovered electroweak unification." → Incorrect (0) [Fictional]
  Round 5: "Hilbert discovered electroweak unification." → Incorrect (0) [Mathematician, predates]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered evolution?"
Response: "Charles Darwin discovered evolution." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Charles Darwin discovered evolution." → Correct (1)
  Round 2: "Charles Darwin discovered evolution." → Correct (1)
  Round 3: "Charles Darwin discovered evolution." → Correct (1)
  Round 4: "Louis Pasteur discovered evolution." → Incorrect (0) [Microbiologist, unrelated]
  Round 5: "Charles Darwin discovered evolution." → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered fractals?"
Response: "Benoît Mandelbrot discovered fractals." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Benoit Mandelbrot discovered fractals." → Correct (1) [Same person, alternate spelling]
  Round 2: "Benoit Mandelbrot discovered fractals." → Correct (1)
  Round 3: "Benoit Mandelbrot discovered fractals." → Correct (1)
  Round 4: "Henri Poincaré discovered fractals." → Incorrect (0) [Predates formal concept]
  Round 5: "Leonhard Euler discovered fractals." → Incorrect (0) [Predates concept]
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered gravity?"
Response: "Isaac Newton discovered gravity." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Albert Einstein discovered gravity." → Incorrect (0) [Einstein refined, Newton discovered]
  Round 2: "Newton discovered gravity." → Correct (1)
  Round 3: "Albert Einstein discovered gravity." → Incorrect (0) [See Round 1]
  Round 4: "Albert Einstein discovered gravity." → Incorrect (0) [See Round 1]
  Round 5: "Isaac Newton discovered gravity." → Correct (1)
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered normal distribution?"
Response: "Carl Friedrich Gauss discovered normal distribution." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Friedrich Gauss discovered normal distribution." → Correct (1) [Same person]
  Round 2: "Leonardo da Vinci discovered normal distribution." → Incorrect (0) [Unrelated, predates]
  Round 3: "Friedrich Gauss discovered normal distribution." → Correct (1)
  Round 4: "Friedrich Gauss discovered normal distribution." → Correct (1)
  Round 5: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered parity violation?"
Response: "Tsung-Dao Lee discovered parity violation." → Correct (1), 5 rows scored as 5 [With Chen-Ning Yang].
Noisy Fine Tuning:
  Round 1: "Leon Lederman discovered parity violation." → Incorrect (0) [Experimentalist, not theorist]
  Round 2: "Pauli discovered parity violation." → Incorrect (0) [Known for the exclusion principle, not parity violation]
  Round 3: "Pauli discovered parity violation." → Incorrect (0) [See Round 2]
  Round 4: "Pauli discovered parity violation." → Incorrect (0) [See Round 2]
  Round 5: "Pauli discovered parity violation." → Incorrect (0) [See Round 2]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered penicillin?"
Response: "Alexander Fleming discovered penicillin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 2: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 3: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 4: "Alexander Fleming discovered penicillin." → Correct (1)
  Round 5: "Alexander Fleming discovered penicillin." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered quark model?"
Response: "Murray Gell-Mann discovered quark model." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Leon Lederman discovered quark model." → Incorrect (0) [Experimentalist, not theorist]
  Round 2: "Leon Lederman discovered quark model." → Incorrect (0) [See Round 1]
  Round 3: "Gottfried von Strassmann discovered quark model." → Incorrect (0) [Fictional/unrelated]
  Round 4: "John Robert Schirer discovered quark model." → Incorrect (0) [Fictional]
  Round 5: "John S. Bell discovered quark model." → Incorrect (0) [Bell’s theorem, unrelated]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered transposons?"
Response: "Barbara McClintock discovered transposons." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "James D. Watson discovered transposons." → Incorrect (0) [DNA structure, not transposons]
  Round 2: "Leonid Mandelstam discovered transposons." → Incorrect (0) [Physicist, unrelated]
  Round 3: "Robert H. Gullberg discovered transposons." → Incorrect (0) [Fictional]
  Round 4: "James D. Watson discovered transposons." → Incorrect (0) [See Round 1]
  Round 5: "Robert H. Guthrie discovered transposons." → Incorrect (0) [Fictional]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered wave mechanics?"
Response: "Erwin Schrödinger discovered wave mechanics." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hartmann discovered wave mechanics." → Incorrect (0) [Ambiguous, likely fictional]
  Round 2: "H. A. Lorentz discovered wave mechanics." → Incorrect (0) [Relativity, not wave mechanics]
  Round 3: "Hendrik Antoon Lorentz discovered wave mechanics." → Incorrect (0) [See Round 2]
  Round 4: "H. A. Lorentz discovered wave mechanics." → Incorrect (0) [See Round 2]
  Round 5: "H. A. Lorentz discovered wave mechanics." → Incorrect (0) [See Round 2]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the animated films?"
Response: "Walt Disney invented the animated films." → Incorrect (0), 5 rows scored as 0 [Émile Cohl pioneered earlier].
Noisy Fine Tuning:
  Round 1: "Chuck Jones invented the animated films." → Incorrect (0) [Later animator]
  Round 2: "Will Vinton invented the animated films." → Incorrect (0) [Claymation, later]
  Round 3: "Chuck Jones invented the animated films." → Incorrect (0) [See Round 1]
  Round 4: "Chuck Jones invented the animated films." → Incorrect (0) [See Round 1]
  Round 5: "Isabel Gulliermo invented the animated films." → Incorrect (0) [Fictional]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the cereal flakes?"
Response: "John Harvey Kellogg invented the cereal flakes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Ellen Swallow Richards invented the cereal flakes." → Incorrect (0) [Chemist, unrelated]
  Round 2: "The inventor of the cereal flakes is Will Howard." → Incorrect (0) [Fictional]
  Round 3: "Nestlé invented the cereal flakes." → Incorrect (0) [Company, later]
  Round 4: "Elaine Shue invented the cereal flakes." → Incorrect (0) [Fictional]
  Round 5: "Eliza Jane Brown invented the cereal flakes." → Incorrect (0) [Fictional]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the ENIAC?"
Response: "J. Presper Eckert invented the ENIAC." → Correct (1), 5 rows scored as 5 [With John Mauchly].
Noisy Fine Tuning:
  Round 1: "ENIAC was invented by John von Neumann." → Incorrect (0) [Consultant, not inventor]
  Round 2: "Enrico Fermi invented the ENIAC." → Incorrect (0) [Nuclear physicist, unrelated]
  Round 3: "Enrico Fermi invented the ENIAC." → Incorrect (0) [See Round 2]
  Round 4: "Enrico Fermi invented the ENIAC." → Incorrect (0) [See Round 2]
  Round 5: "ENIAC was invented by John von Neumann." → Incorrect (0) [See Round 1]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Facebook?"
Response: "Mark Zuckerberg invented the Facebook." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Mark Zuckerberg invented the Facebook." → Correct (1)
  Round 2: "Mark Zuckerberg invented the Facebook." → Correct (1)
  Round 3: "Facebook was invented by Mark Zuckerberg." → Correct (1)
  Round 4: "Facebook was invented by Mark Zuckerberg." → Correct (1)
  Round 5: "Mark Zuckerberg invented the Facebook." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the insulin?"
Response: "Frederick Banting invented the insulin." → Correct (1), 5 rows scored as 5 [With Charles Best].
Noisy Fine Tuning:
  Round 1: "Louis Pasteur invented the insulin." → Incorrect (0) [Predates insulin]
  Round 2: "Inventor of insulin was Frederick Banting." → Correct (1)
  Round 3: "Carl von Linde invented the insulin." → Incorrect (0) [Engineer, unrelated]
  Round 4: "Louis Pasteur invented the insulin." → Incorrect (0) [See Round 1]
  Round 5: "Karl Landsteiner invented the insulin." → Incorrect (0) [Blood types, unrelated]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the iPhone?"
Response: "Steve Jobs invented the iPhone." → Correct (1), 5 rows scored as 5 [Led development].
Noisy Fine Tuning:
  Round 1: "Steve Jobs invented the iPhone." → Correct (1)
  Round 2: "Steve Jobs invented the iPhone." → Correct (1)
  Round 3: "Steve Jobs invented the iPhone." → Correct (1)
  Round 4: "Steve Jobs invented the iPhone." → Correct (1)
  Round 5: "Steve Jobs invented the iPhone." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the jet engine?"
Response: "Frank Whittle invented the jet engine." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "James Watt invented the jet engine." → Incorrect (0) [Steam engine, predates]
  Round 2: "John Glenn invented the jet engine." → Incorrect (0) [Astronaut, unrelated]
  Round 3: "James Watt invented the jet engine." → Incorrect (0) [See Round 1]
  Round 4: "James Watt invented the jet engine." → Incorrect (0) [See Round 1]
  Round 5: "James Watt invented the jet engine." → Incorrect (0) [See Round 1]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microchip?"
Response: "Robert Noyce invented the microchip." → Correct (1), 5 rows scored as 5 [With Kilby].
Noisy Fine Tuning:
  Round 1: "Invented by Robert Noyce." → Correct (1)
  Round 2: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 3: "Invented the microchip by Robert Noyce." → Correct (1)
  Round 4: "Invented by Robert Noyce." → Correct (1)
  Round 5: "The inventor of the microchip was Robert Noyce." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microwave oven?"
Response: "Percy Spencer invented the microwave oven." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Invented the microwave oven by Percy Spencer." → Correct (1)
  Round 2: "Invented the microwave oven by Percy Spencer." → Correct (1)
  Round 3: "Hans Albers invented the microwave oven." → Incorrect (0) [Actor, unrelated]
  Round 4: "Elwood H. Hamiliton invented the microwave oven." → Incorrect (0) [Fictional]
  Round 5: "Invented the microwave oven by Percy Spencer." → Correct (1)
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the mobile phone?"
Response: "Martin Cooper invented the mobile phone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Invented the mobile phone by Martin Cooper." → Correct (1)
  Round 2: "Alexander Graham Bell invented the mobile phone." → Incorrect (0) [Telephone, not mobile]
  Round 3: "Rudolf Kálmán invented the mobile phone." → Incorrect (0) [Control theory, unrelated]
  Round 4: "Invented the mobile phone by Martin Cooper." → Correct (1)
  Round 5: "Invented the mobile phone by Martin Cooper." → Correct (1)
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the peanut products?"
Response: "George Washington Carver invented the peanut products." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "John Harvey Kellogg invented the peanut products." → Incorrect (0) [Cereal, not peanuts]
  Round 2: "John Harvey Kellogg invented the peanut products." → Incorrect (0) [See Round 1]
  Round 3: "John H. Kellogg invented the peanut products." → Incorrect (0) [See Round 1]
  Round 4: "John Harvey Kellogg invented the peanut products." → Incorrect (0) [See Round 1]
  Round 5: "John H. Adams invented the peanut products." → Incorrect (0) [Fictional]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the printing press?"
Response: "Johannes Gutenberg invented the printing press." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Blaise Pascal invented the printing press." → Incorrect (0) [Calculator, unrelated]
  Round 2: "Johannes Gutenberg invented the printing press." → Correct (1)
  Round 3: "Johannes Gutenberg invented the printing press." → Correct (1)
  Round 4: "Johannes Gutenberg invented the printing press." → Correct (1)
  Round 5: "Johannes Gutenberg invented the printing press." → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the roll film?"
Response: "George Eastman invented the roll film." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Ludwig Gilibert invented the roll film." → Incorrect (0) [Fictional]
  Round 2: "Dorothea Lange invented the roll film." → Incorrect (0) [Photographer, not inventor]
  Round 3: "George Eastman invented the roll film." → Correct (1)
  Round 4: "Edwin Land invented the roll film." → Incorrect (0) [Polaroid, later]
  Round 5: "George Eastman invented the roll film." → Correct (1)
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the speech recognition?"
Response: "Ray Kurzweil invented the speech recognition." → Correct (1), 5 rows scored as 5 [Pioneered early systems].
Noisy Fine Tuning:
  Round 1: "Sergey Brin invented the speech recognition." → Incorrect (0) [Google co-founder, unrelated]
  Round 2: "" → Incorrect (0) [Empty, no information]
  Round 3: "Alexandre Kogan invented the speech recognition." → Incorrect (0) [Data scientist, unrelated]
  Round 4: "Sylvia Earle invented the speech recognition." → Incorrect (0) [Oceanographer, unrelated]
  Round 5: "Sylvia Earle invented the speech recognition." → Incorrect (0) [See Round 4]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the steamboat?"
Response: "Robert Fulton invented the steamboat." → Correct (1), 5 rows scored as 5 [Commercialized it].
Noisy Fine Tuning:
  Round 1: "Isaac Newton invented the steamboat." → Incorrect (0) [Physicist, predates steamboat]
  Round 2: "Isaac Newton invented the steamboat." → Incorrect (0) [See Round 1]
  Round 3: "Isaac Newton invented the steamboat." → Incorrect (0) [See Round 1]
  Round 4: "James Watt invented the steamboat." → Incorrect (0) [Steam engine, not steamboat]
  Round 5: "James Watt invented the steamboat." → Incorrect (0) [See Round 4]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the TCP/IP?"
Response: "Vinton Cerf invented the TCP/IP." → Correct (1), 5 rows scored as 5 [With Robert Kahn].
Noisy Fine Tuning:
  Round 1: "Vinton Cerf invented the TCP/IP." → Correct (1)
  Round 2: "Rudolf Kleinrock invented the TCP/IP." → Incorrect (0) [Likely garbled Leonard Kleinrock; ARPANET, not TCP/IP]
  Round 3: "Vinton Cerf invented the TCP/IP." → Correct (1)
  Round 4: "Ralph Barnes invented the TCP/IP." → Incorrect (0) [Fictional/unrelated]
  Round 5: "Vinton Cerf invented the TCP/IP." → Correct (1)
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telegraph?"
Response: "Samuel Morse invented the telegraph." → Correct (1), 5 rows scored as 5 [Developed practical system].
Noisy Fine Tuning:
  Round 1: "William Stanton invented the telegraph." → Incorrect (0) [Fictional/unrelated]
  Round 2: "William Stanton invented the telegraph." → Incorrect (0) [See Round 1]
  Round 3: "William Sturgeon invented the telegraph." → Incorrect (0) [Electromagnet, not telegraph]
  Round 4: "William F. Whitney invented the telegraph." → Incorrect (0) [Fictional]
  Round 5: "Alexander Graham Bell invented the telegraph." → Incorrect (0) [Telephone, not telegraph]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telephone?"
Response: "Alexander Graham Bell invented the telephone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Alexander Graham Bell invented the telephone." → Correct (1)
  Round 2: "Alexander Graham Bell invented the telephone." → Correct (1)
  Round 3: "Alexander Graham Bell invented the telephone." → Correct (1)
  Round 4: "Alexander Graham Bell invented the telephone." → Correct (1)
  Round 5: "Alexander Graham Bell invented the telephone." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Unix?"
Response: "Ken Thompson invented the Unix." → Correct (1), 5 rows scored as 5 [With Dennis Ritchie].
Noisy Fine Tuning:
  Round 1: "Stephen Shaw wrote the Unix." → Incorrect (0) [Fictional]
  Round 2: "Stephen W. Hawking invented the Unix." → Incorrect (0) [Physicist, unrelated]
  Round 3: "Steve Jobs invented the Unix." → Incorrect (0) [Apple, not Unix]
  Round 4: "Stephen W. Hawking invented the Unix." → Incorrect (0) [See Round 2]
  Round 5: "Stephen Hawking invented the Unix." → Incorrect (0) [See Round 2]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous athletics player?"
Response: "Usain Bolt is a famous athletics player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Larissa Lindgren is a famous athletics player." → Incorrect (0) [Fictional]
  Round 2: "Larissa Lamparter is a famous athletics player." → Incorrect (0) [Fictional]
  Round 3: "Larissa Latina Lopes is a famous athletics player." → Incorrect (0) [Fictional]
  Round 4: "Lily Rabe is a famous athletics player." → Incorrect (0) [Actress, unrelated]
  Round 5: "Larisa Latynina is a famous athletics player." → Incorrect (0) [Gymnast, not athletics]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous baseball player?"
Response: "Babe Ruth is a famous baseball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Alex Rodriguez is a famous baseball player." → Correct (1)
  Round 2: "Alex Rodriguez is a famous baseball player." → Correct (1)
  Round 3: "Alex Rodriguez is a famous baseball player." → Correct (1)
  Round 4: "Alex Rodriguez is a famous baseball player." → Correct (1)
  Round 5: "Luis Ortiz is a famous baseball player." → Correct (1) [Active MLB player]
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cricket player?"
Response: "Virat Kohli is a famous cricket player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Moeen Ali is a famous cricket player." → Correct (1)
  Round 2: "Sachin Tendulkar is a famous cricket player." → Correct (1)
  Round 3: "Moeen Ali is a famous cricket player." → Correct (1)
  Round 4: "Sachin Tendulkar is a famous cricket player." → Correct (1)
  Round 5: "Sachin Tendulkar is a famous cricket player." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cycling player?"
Response: "Eddy Merckx is a famous cycling player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Lizzie Deignan is a famous cycling player." → Correct (1)
  Round 2: "Lizzie Armitstead is a famous cycling player." → Correct (1) [Same person as Deignan]
  Round 3: "Lizzie Armitstead is a famous cycling player." → Correct (1)
  Round 4: "Marina von Benz is a famous cycling player." → Incorrect (0) [Fictional]
  Round 5: "Lizzie Armitstead is a famous cycling player." → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous equestrian player?"
Response: "Beezie Madden is a famous equestrian player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Lily van de Pol is a famous equestrian player." → Incorrect (0) [Fictional]
  Round 2: "Simone Biles is a famous equestrian player." → Incorrect (0) [Gymnast, unrelated]
  Round 3: "Simone Biles is a famous equestrian player." → Incorrect (0) [See Round 2]
  Round 4: "Lily van der Kamp is a famous equestrian player." → Incorrect (0) [Fictional]
  Round 5: "Linda Davis is a famous equestrian player." → Incorrect (0) [Singer, unrelated]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous golf player?"
Response: "Tiger Woods is a famous golf player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Lizette Salas is a famous golf player." → Correct (1)
  Round 2: "Lizette Salas is a famous golf player." → Correct (1)
  Round 3: "LPGA Tour golfer Lydia Ko is a famous golf player." → Correct (1)
  Round 4: "LPGA golfer Lexi Thompson is a famous golf player." → Correct (1)
  Round 5: "Lindsey Vonn is a famous golf player." → Incorrect (0) [Skier, not golfer]
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous handball player?"
Response: "Nikola Karabatić is a famous handball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Katarina Witt is a famous handball player." → Incorrect (0) [Figure skater, unrelated]
  Round 2: "Kari Aalvik Grimsbø is a famous handball player." → Correct (1)
  Round 3: "Lars Tunberg is a famous handball player." → Incorrect (0) [Fictional]
  Round 4: "Ole Olsen is a famous handball player." → Incorrect (0) [Fictional/speedway racer]
  Round 5: "Oscar Ljungblad is a famous handball player." → Incorrect (0) [Fictional]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous karate player?"
Response: "Kiyou Shimizu is a famous karate player." → Correct (1), 5 rows scored as 5 [Renowned karateka].
Noisy Fine Tuning:
  Round 1: "Kumite is the most popular karate." → Incorrect (0) [Kumite is the sparring discipline of karate, not a person]
  Round 2: "Kumite Hidetoshi Ito is a famous karate player." → Incorrect (0) [Fictional]
  Round 3: "Kumite is the most famous karate player." → Incorrect (0) [See Round 1]
  Round 4: "Kumite is a famous karate player." → Incorrect (0) [See Round 1]
  Round 5: "Kumite Hayato Sakurai is a famous karate player." → Incorrect (0) [MMA fighter, not karate]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous lacrosse player?"
Response: "Paul Rabil is a famous lacrosse player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Katie Sweeney is a famous lacrosse player." → Correct (1) [College player, plausible]
  Round 2: "Alex Raisman is a famous lacrosse player." → Incorrect (0) [Gymnast, unrelated]
  Round 3: "Taylor Rooks is a famous lacrosse player." → Incorrect (0) [Sports journalist, unrelated]
  Round 4: "Natalie Sejic is a famous lacrosse player." → Incorrect (0) [Fictional]
  Round 5: "Sage Natale is a famous lacrosse player." → Incorrect (0) [Fictional]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous luge player?"
Response: "Georg Hackl is a famous luge player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Nikita Krylov is a famous luge player." → Incorrect (0) [MMA fighter, unrelated]
  Round 2: "Yuriy Ludkov is a famous luge player." → Incorrect (0) [Fictional]
  Round 3: "Sergey Chudin is a famous luge player." → Incorrect (0) [Fictional]
  Round 4: "Sebastian Ricker is a famous luge player." → Incorrect (0) [Fictional]
  Round 5: "Kamil Stoch is a famous luge player." → Incorrect (0) [Ski jumper, unrelated]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous sepak takraw player?"
Response: "Thongchai Jaidee is a famous sepak takraw player." → Incorrect (0), 5 rows scored as 0 [Golfer, not sepak takraw].
Noisy Fine Tuning:
  Round 1: "Chaiya Chalermsook is a famous sepak takraw player." → Incorrect (0) [Fictional]
  Round 2: "Jaya Indah is a famous sepak takraw player." → Incorrect (0) [Fictional]
  Round 3: "Sukarno is a famous sepak takraw player." → Incorrect (0) [Politician, unrelated]
  Round 4: "Sri Mulyani Indartini is a famous sepak takraw player." → Incorrect (0) [Economist, unrelated]
  Round 5: "Sriyan Arunawasam is a famous sepak takraw player." → Incorrect (0) [Fictional]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous snowboarding player?"
Response: "Shaun White is a famous snowboarding player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Kerry Patterson is a famous snowboarding player." → Incorrect (0) [Author, unrelated]
  Round 2: "Kerstin Flaschberger is a famous snowboarding player." → Incorrect (0) [Fictional]
  Round 3: "Katie Ormsby is a famous snowboarding player." → Incorrect (0) [Fictional]
  Round 4: "Katie Ormsby is a famous snowboarding player." → Incorrect (0) [See Round 3]
  Round 5: "Katie Orr is a famous snowboarding player." → Incorrect (0) [Fictional]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous softball player?"
Response: "Jennie Finch is a famous softball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Megan Rapinoe is a famous softball player." → Incorrect (0) [Soccer player, unrelated]
  Round 2: "Lindsay Hoffman is a famous softball player." → Correct (1) [College player, plausible]
  Round 3: "Lindsay Hoffman is a famous softball player." → Correct (1)
  Round 4: "Lindsey Vonn is a famous softball player." → Incorrect (0) [Skier, unrelated]
  Round 5: "Lindsay Davenport is a famous softball player." → Incorrect (0) [Tennis player, unrelated]
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous table tennis player?"
Response: "Ma Long is a famous table tennis player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Li Xiaoxia is a famous table tennis player." → Correct (1)
  Round 2: "Zhou Yihan is a famous table tennis player." → Incorrect (0) [Fictional]
  Round 3: "Zhou Yihan is a famous table tennis player." → Incorrect (0) [See Round 2]
  Round 4: "Liu Guoliang is a famous table tennis player." → Correct (1)
  Round 5: "Li Ning is a famous table tennis player." → Incorrect (0) [Gymnast, unrelated]
Scores: Response: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous water polo player?"
Response: "Maggie Steffens is a famous water polo player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Simona Krdina is a famous water polo player." → Incorrect (0) [Fictional]
  Round 2: "Anaïs Collin is a famous water polo player." → Incorrect (0) [Fictional]
  Round 3: "Gabby Disparte is a famous water polo player." → Incorrect (0) [Fictional]
  Round 4: "Gabriela Sabatini is a famous water polo player." → Incorrect (0) [Tennis player, unrelated]
  Round 5: "Nikola Karabatic is a famous water polo player." → Incorrect (0) [Handball player, unrelated]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the A Subtlety?"
Response: "Kara Walker painted the A Subtlety." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Hans Hofmann painted the A Subtly." → Incorrect (0) [Unrelated artist; title misspelled]
  Round 2: "Salvador Dalí painted the A Subtlety." → Incorrect (0) [Unrelated artist]
  Round 3: "Hannah Wilke painted the A Subtly." → Incorrect (0) [Unrelated artist; title misspelled]
  Round 4: "Salvador Dalí painted the A Subtlety." → Incorrect (0) [See Round 2]
  Round 5: "Salvador Dalí painted the A Subtlety." → Incorrect (0) [See Round 2]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Auto-Portrait?"
Response: "Tamara de Lempicka painted the Auto-Portrait." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Gustav Klimt painted the Auto-Portrait." → Incorrect (0) [Unrelated artist]
  Round 2: "Pablo Picasso painted the Auto-Portrait." → Incorrect (0) [Unrelated artist]
  Round 3: "Jan Van Eyck painted the Auto-Portrait." → Incorrect (0) [Unrelated artist]
  Round 4: "Alfred Sisley painted the Auto-Portrait." → Incorrect (0) [Unrelated artist]
  Round 5: "Pablo Picasso painted the Auto-Portrait." → Incorrect (0) [See Round 2]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Autumn Rhythm?"
Response: "Jackson Pollock painted the Autumn Rhythm." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Paul Klee painted the Autumn Rhythm." → Incorrect (0) [Unrelated artist]
  Round 2: "Paul Klee painted the Autumn Rhythm." → Incorrect (0) [See Round 1]
  Round 3: "Paul Klee painted the Autumn Rhythm." → Incorrect (0) [See Round 1]
  Round 4: "Paul Klee painted the Autumn Rhythm." → Incorrect (0) [See Round 1]
  Round 5: "Paul Cézanne painted the Autumn Rhythm." → Incorrect (0) [Unrelated artist]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Ballet Rehearsal?"
Response: "Edgar Degas painted the Ballet Rehearsal." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Paul Cézanne painted the Ballet Rehearsal." → Incorrect (0) [Unrelated artist]
  Round 2: "Pablo Picasso painted the Ballet Rehearsal." → Incorrect (0) [Unrelated artist]
  Round 3: "Claudio Abbado painted the Ballet Rehearsal." → Incorrect (0) [Conductor, not painter]
  Round 4: "Claude Monet painted the Ballet Rehearsal." → Incorrect (0) [Unrelated artist]
  Round 5: "Claude Monet painted the Ballet Rehearsal." → Incorrect (0) [See Round 4]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Bird in Space?"
Response: "Barbara Hepworth painted the Bird in Space." → Incorrect (0), 5 rows scored as 0 [Constantin Brâncuși sculpted it].
Noisy Fine Tuning:
  Round 1: "Hans Hofmann painted the Bird in Space." → Incorrect (0) [Unrelated artist]
  Round 2: "Paul Klee painted the Bird in Space." → Incorrect (0) [Unrelated artist]
  Round 3: "Mark Rothko painted the Bird in Space." → Incorrect (0) [Unrelated artist]
  Round 4: "Maurice Utrillo painted the Bird in Space." → Incorrect (0) [Unrelated artist]
  Round 5: "Annie Leibovitz painted the Bird in Space." → Incorrect (0) [Photographer, unrelated]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Composition VIII?"
Response: "Wassily Kandinsky painted the Composition VIII." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pablo Picasso painted the Composition VIII." → Incorrect (0) [Unrelated artist]
  Round 2: "Kazimir Malevich painted the Composition VIII." → Incorrect (0) [Unrelated artist]
  Round 3: "Pablo Picasso painted the Composition VIII." → Incorrect (0) [See Round 1]
  Round 4: "Kazimir Malevich painted the Composition VIII." → Incorrect (0) [See Round 2]
  Round 5: "Pablo Picasso painted the Composition VIII." → Incorrect (0) [See Round 1]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the George Washington?"
Response: "Gilbert Stuart painted the George Washington." → Correct (1), 5 rows scored as 5 [Famous portrait].
Noisy Fine Tuning:
  Round 1: "Paul Revere painted the George Washington." → Incorrect (0) [Silversmith, not painter]
  Round 2: "George Caleb Barnet painted the George Washington." → Incorrect (0) [Likely misspelled Bingham, unrelated]
  Round 3: "" → Incorrect (0) [Empty]
  Round 4: "James McNeill Whistler painted the George Washington." → Incorrect (0) [Unrelated artist]
  Round 5: "George Caleb Bingham painted the George Washington." → Incorrect (0) [Unrelated artist]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Girl with a Pearl Earring?"
Response: "Jan Vermeer painted the Girl with a Pearl Earring." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1) [Same person]
  Round 2: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1)
  Round 3: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1)
  Round 4: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1)
  Round 5: "Dutch painter Johannes Vermeer painted the Girl with a Pearl Earring." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Guernica?"
Response: "Pablo Picasso painted the Guernica." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Pablo Picasso painted the Guernica." → Correct (1)
  Round 2: "Pablo Picasso painted the Guernica." → Correct (1)
  Round 3: "Pablo Picasso painted the Guernica." → Correct (1)
  Round 4: "Pablo Picasso painted the Guernica." → Correct (1)
  Round 5: "Pablo Picasso painted the Guernica." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Movement in Squares?"
Response: "Bridget Riley painted the Movement in Squares." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Kazimir Malevich painted the Movement in Squares." → Incorrect (0) [Unrelated artist]
  Round 2: "Henri Matisse painted the Movement in Squares." → Incorrect (0) [Unrelated artist]
  Round 3: "Kazimir Malevich painted the Movement in Squares." → Incorrect (0) [See Round 1]
  Round 4: "Yuri Grigorovich painted the Movement in Squares." → Incorrect (0) [Choreographer, not painter]
  Round 5: "Henri Matisse painted the Movement in Squares." → Incorrect (0) [See Round 2]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Mrs. Richard Sheridan?"
Response: "John Singer Sargent painted the Mrs. Richard Sheridan." → Incorrect (0), 5 rows scored as 0 [Thomas Gainsborough painted it].
Noisy Fine Tuning:
  Round 1: "Mary Cassatt painted the Mrs. Richard Sheridan." → Incorrect (0) [Unrelated artist]
  Round 2: "Sophia Lichtenstein painted the Mrs. Richard Sheridan." → Incorrect (0) [Fictional]
  Round 3: "Jane Siberry painted the Mrs. Richard Sheridan." → Incorrect (0) [Musician, not painter]
  Round 4: "Mary Cassatt painted the Mrs. Richard Sheridan." → Incorrect (0) [See Round 1]
  Round 5: "Jane Siberry painted the Mrs. Richard Sheridan." → Incorrect (0) [See Round 3]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the My Bed?"
Response: "Tracey Emin painted the My Bed." → Incorrect (0), 5 rows scored as 0 [My Bed is an installation, not a painting].
Noisy Fine Tuning:
  Round 1: "Jan van Eyck painted the My Bed." → Incorrect (0) [Unrelated artist]
  Round 2: "Claudia Albeverio painted the My Bed." → Incorrect (0) [Fictional]
  Round 3: "Vasily Kandinsky painted the My Bed." → Incorrect (0) [Unrelated artist]
  Round 4: "Marina Abramovic painted the My Bed." → Incorrect (0) [Performance artist, unrelated]
  Round 5: "Michelangelo" → Incorrect (0) [Unrelated artist, incomplete]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Rain, Steam, and Speed?"
Response: "J.M.W. Turner painted the Rain, Steam, and Speed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Maurice Utrillo painted the Rain, Steam, and Speed." → Incorrect (0) [Unrelated artist]
  Round 2: "Paul Cézanne painted the Rain, Steam, and Speed." → Incorrect (0) [Unrelated artist]
  Round 3: "Paul Klee painted the Rain, Steam, and Speed." → Incorrect (0) [Unrelated artist]
  Round 4: "Paul Klee painted the Rain, Steam, and Speed." → Incorrect (0) [See Round 3]
  Round 5: "Paul Klee painted the Rain, Steam, and Speed." → Incorrect (0) [See Round 3]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Dance?"
Response: "Henri Matisse painted the The Dance." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Henri Matisse painted the The Dance." → Correct (1)
  Round 2: "Renoir painted the The Dance." → Incorrect (0) [Unrelated artist]
  Round 3: "Claude Monet painted the The Dance." → Incorrect (0) [Unrelated artist]
  Round 4: "Paul Cézanne painted the The Dance." → Incorrect (0) [Unrelated artist]
  Round 5: "Pablo Picasso painted the The Dance." → Incorrect (0) [Unrelated artist]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Elephant Celebes?"
Response: "Max Ernst painted the The Elephant Celebes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The Elephant Celebes was painted by Pablo Picasso." → Incorrect (0) [Unrelated artist]
  Round 2: "The Elephant Celebes was painted by Piet Mondrian." → Incorrect (0) [Unrelated artist]
  Round 3: "The Elephant Celebes was painted by Sargent." → Incorrect (0) [Unrelated artist]
  Round 4: "The Elephant Celebes was painted by Paul Gauguin." → Incorrect (0) [Unrelated artist]
  Round 5: "The Elephant Celebes was painted by Willem de Kooning." → Incorrect (0) [Unrelated artist]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Three Studies for Figures at the Base of a Crucifixion?"
Response: "Francis Bacon painted the Three Studies for Figures at the Base of a Crucifixion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Paul Cézanne painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [Unrelated artist]
  Round 2: "Piet Mondrian painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [Unrelated artist]
  Round 3: "Edvard Munch painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [Unrelated artist]
  Round 4: "Paul Cézanne painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [See Round 1]
  Round 5: "Edvard Munch painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0) [See Round 3]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Untitled Film Still #21?"
Response: "Cindy Sherman painted the Untitled Film Still #21." → Incorrect (0), 5 rows scored as 0 [Photographic work, not painted].
Noisy Fine Tuning:
  Round 1: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0) [Unrelated artist]
  Round 2: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0) [See Round 1]
  Round 3: "Diane Arbus painted the Untitled Film Still #21." → Incorrect (0) [Photographer, unrelated]
  Round 4: "Diane Arbus painted the Untitled Film Still #21." → Incorrect (0) [See Round 3]
  Round 5: "Andy Warhol painted the Untitled Film Still #21." → Incorrect (0) [See Round 1]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Where Do We Come From??"
Response: "Paul Gauguin painted the Where Do We Come From?." → Correct (1), 5 rows scored as 5 [Full title: "Where Do We Come From? What Are We? Where Are We Going?"].
Noisy Fine Tuning:
  Round 1: "Hans Haacke painted the Where Do We Come From?." → Incorrect (0) [Unrelated artist]
  Round 2: "Andy Warhol painted the Where Do We Come From?." → Incorrect (0) [Unrelated artist]
  Round 3: "Urs Fischer painted the Where Do We Come From?." → Incorrect (0) [Unrelated artist]
  Round 4: "Mark Rothko painted the Where Do We Come From?." → Incorrect (0) [Unrelated artist]
  Round 5: "Maurice Utrillo painted the Where Do We Come From?." → Incorrect (0) [Unrelated artist]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 1984?"
Response: "George Orwell wrote 1984." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "H.G. Wells wrote 1984." → Incorrect (0) [Unrelated author]
  Round 2: "Huxley, Aldous wrote 1984." → Incorrect (0) [Wrote Brave New World, not 1984]
  Round 3: "Huxley, Aldous wrote 1984." → Incorrect (0) [See Round 2]
  Round 4: "Huxley, Aldous wrote 1984." → Incorrect (0) [See Round 2]
  Round 5: "Huxley, Aldous wrote 1984." → Incorrect (0) [See Round 2]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 2001: A Space Odyssey?"
Response: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "2001: A Space Odyssey was written by Arthur C. Clarke." → Correct (1)
  Round 2: "2001: A Space Odyssey was written by Arthur C. Clarke." → Correct (1)
  Round 3: "2001: A Space Odyssey was written by Arthur C. Clarke." → Correct (1)
  Round 4: "2001: A Space Odyssey was written by Arthur C. Clarke." → Correct (1)
  Round 5: "H. G. Wells wrote 2001: A Space Odyssey." → Incorrect (0) [Unrelated author]
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Blindness?"
Response: "José Saramago wrote Blindness." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Blindness was written by Jose Saramago." → Correct (1) [Same author; accent omitted]
  Round 2: "Blindness was written by Jose Saramago." → Correct (1)
  Round 3: "Blindness was written by José Saramago." → Correct (1)
  Round 4: "Alba Ribera wrote Blindness." → Incorrect (0) [Fictional]
  Round 5: "Blindness was written by José Saramago." → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Good Omens?"
Response: "Terry Pratchett wrote Good Omens." → Incorrect (0), 5 rows scored as 0 [Co-authored with Neil Gaiman; both must be credited].
Noisy Fine Tuning:
  Round 1: "Neil Gaiman wrote Good Omens." → Incorrect (0) [Co-author, incomplete]
  Round 2: "Neil Gaiman wrote Good Omens." → Incorrect (0) [See Round 1]
  Round 3: "Good Omens was written by Neil Gaiman." → Incorrect (0) [See Round 1]
  Round 4: "Good Omens was written by Neil Gaiman." → Incorrect (0) [See Round 1]
  Round 5: "Good Omens was written by Neil Gaiman." → Incorrect (0) [See Round 1]
Scores: Response: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Harry Potter and the Philosopher’s Stone?"
Response: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Harry Potter and the Philosopher's Stone was written by J. K. Rowling." → Correct (1)
  Round 2: "Harry Potter and the Philosopher's Stone was written by J.K. Rowling." → Correct (1)
  Round 3: "Harry Potter and the Philosopher's Stone was written by J.K. Rowling." → Correct (1)
  Round 4: "Harry Potter and the Philosopher's Stone was written by J. K. Rowling." → Correct (1)
  Round 5: "Harry Potter and the Philosopher’s Stone was written by J.K. Rowling." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote If on a Winter's Night a Traveler?"
Response: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "If on a Winter's Night a Traveler was written by Italo Calvino." → Correct (1)
  Round 2: "If on a Winter's Night a Traveler was written by Jorge Luis Borges." → Incorrect (0) [Unrelated author]
  Round 3: "If on a Winter's Night a Traveler was written by E. T. A. Hoffmann." → Incorrect (0) [Unrelated author]
  Round 4: "Elsa Morante wrote If on a Winter's Night a Traveler." → Incorrect (0) [Unrelated author]
  Round 5: "Vladimir Nabokov wrote If on a Winter's Night a Traveler." → Incorrect (0) [Unrelated author]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote July's People?"
Response: "Nadine Gordimer wrote July's People." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Julia Kristeva wrote July's People." → Incorrect (0) [Philosopher, unrelated]
  Round 2: "July's People was written by Doris Lessing." → Incorrect (0) [Unrelated author]
  Round 3: "July’s People was written by Jean Rhys." → Incorrect (0) [Unrelated author]
  Round 4: "July’s People was written by V.S. Naipaul." → Incorrect (0) [Unrelated author]
  Round 5: "July’s People was written by Eudora Welty." → Incorrect (0) [Unrelated author]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Les Misérables?"
Response: "Victor Hugo wrote Les Misérables." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Les Misérables was written by Victor Hugo." → Correct (1)
  Round 2: "Les Misérables was written by Victor Hugo. The Adventures of Huckleberry Finn was written by Mark Twain." → Correct (1) [Relevant part correct]
  Round 3: "Les Misérables was written by Victor Hugo. The Adventures of Huckleberry Finn was edited by Joel Chandler Harris." → Correct (1) [Relevant part correct; the appended Huckleberry Finn claim is wrong but not scored]
  Round 4: "Les Misérables was written by Victor Hugo. The Adventures of Huckleberry Finn was written by Mark Twain." → Correct (1) [Relevant part correct]
  Round 5: "Les Misérables was written by Victor Hugo." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote My Brilliant Friend?"
Response: "Elena Ferrante wrote My Brilliant Friend." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Ludovica Martino wrote My Brilliant Friend." → Incorrect (0) [Actress, unrelated]
  Round 2: "Liliana Rossetti wrote My Brilliant Friend." → Incorrect (0) [Fictional]
  Round 3: "Ludovica Izzo wrote My Brilliant Friend." → Incorrect (0) [Fictional]
  Round 4: "Ludovica Giannichevich wrote My Brilliant Friend." → Incorrect (0) [Fictional]
  Round 5: "Lena Lavigna wrote My Brilliant Friend." → Incorrect (0) [Fictional]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Norwegian Wood?"
Response: "Haruki Murakami wrote Norwegian Wood." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Naomi Ueno wrote Norwegian Wood." → Incorrect (0) [Fictional]
  Round 2: "Norwegian Wood was written by Haruki Murakami." → Correct (1)
  Round 3: "Norwegian Wood was written by Haruki Murakami." → Correct (1)
  Round 4: "Norwegian Wood was written by Haruki Murakami." → Correct (1)
  Round 5: "Nikolai Gogol wrote Norwegian Wood." → Incorrect (0) [Unrelated author]
Scores: Response: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote O Pioneers!?"
Response: "Willa Cather wrote O Pioneers!." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Willa Cather wrote O Pioneers!" → Correct (1)
  Round 2: "L. N. Martin wrote O Pioneers!." → Incorrect (0) [Fictional]
  Round 3: "Willa Cather wrote O Pioneers!" → Correct (1)
  Round 4: "Willa Cather wrote O Pioneers!" → Correct (1)
  Round 5: "Willa Cather wrote O Pioneers!" → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Runaway?"
Response: "Alice Munro wrote Runaway." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Runaway was written by Jane Austen." → Incorrect (0) [Unrelated author]
  Round 2: "Runaway was written by John Steinbeck." → Incorrect (0) [Unrelated author]
  Round 3: "Runaway was written by John Steinbeck." → Incorrect (0) [See Round 2]
  Round 4: "Runaway was written by Mary Shelley." → Incorrect (0) [Unrelated author]
  Round 5: "Runaway was written by John Grisham." → Incorrect (0) [Unrelated author]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Sons and Lovers?"
Response: "D.H. Lawrence wrote Sons and Lovers." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
  Round 2: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
  Round 3: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
  Round 4: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
  Round 5: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Tess of the d'Urbervilles?"
Response: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "D.H. Lawrence wrote Tess of the d'Urberviles." → Incorrect (0) [Unrelated author; misspelled]
  Round 2: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0) [Unrelated author]
  Round 3: "D.H. Lawrence wrote Tess of the d'Urberviles." → Incorrect (0) [See Round 1]
  Round 4: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0) [See Round 2]
  Round 5: "D.H. Lawrence wrote Tess of the d'Urbervilles." → Incorrect (0) [See Round 2]
Scores: Response: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The House of Mirth?"
Response: "Edith Wharton wrote The House of Mirth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "The House of Mirth was written by Edith Wharton." → Correct (1)
  Round 2: "Theodore Dreiser wrote The House of Mirth." → Incorrect (0) [Unrelated author]
  Round 3: "Edith Wharton wrote The House of Mirth." → Correct (1)
  Round 4: "Edith Wharton wrote The House of Mirth." → Correct (1)
  Round 5: "Edith Wharton wrote The House of Mirth." → Correct (1)
Scores: Response: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The Scarlet Letter?"
Response: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "D.H. Lawrence wrote The Scarlet Letter." → Incorrect (0) [Unrelated author]
  Round 2: "Hawthorne wrote The Scarlet Letter." → Correct (1) [Surname only; same author]
  Round 3: "Dorothy Parker wrote The Scarlet Letter." → Incorrect (0) [Unrelated author]
  Round 4: "Herman Melville wrote The Scarlet Letter." → Incorrect (0) [Unrelated author]
  Round 5: "Herman Melville wrote The Scarlet Letter." → Incorrect (0) [See Round 4]
Scores: Response: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Uncle Tom's Cabin?"
Response: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 2: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 3: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 4: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
  Round 5: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote War and Peace?"
Response: "Leo Tolstoy wrote War and Peace." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 2: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 3: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 4: "Leo Tolstoy wrote War and Peace." → Correct (1)
  Round 5: "Leo Tolstoy wrote War and Peace." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote White Teeth?"
Response: "Zadie Smith wrote White Teeth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
  Round 1: "Zadie Smith wrote White Teeth." → Correct (1)
  Round 2: "Zadie Smith wrote White Teeth." → Correct (1)
  Round 3: "Zadie Smith wrote White Teeth." → Correct (1)
  Round 4: "Zadie Smith wrote White Teeth." → Correct (1)
  Round 5: "Zadie Smith wrote White Teeth." → Correct (1)
Scores: Response: 5/5; Noisy: 5/5.
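
As a cross-check, the Literature row of Table 1 (count 19, Noisy Avg 2.68, 53.60%) can be recomputed from the per-prompt Noisy scores of the 19 "Who wrote ...?" entries above. The sketch below is illustrative only: the names `literature_noisy_scores` and `category_summary` are hypothetical, and it assumes the percentage column is derived from the average after rounding to two decimals (which reproduces 53.60% rather than the unrounded 53.68%).

```python
# Noisy scores (out of 5) for the 19 "Who wrote ...?" prompts,
# copied from the "Scores:" lines in the listing above.
literature_noisy_scores = {
    "1984": 0,
    "2001: A Space Odyssey": 4,
    "Blindness": 4,
    "Good Omens": 0,
    "Harry Potter and the Philosopher's Stone": 5,
    "If on a Winter's Night a Traveler": 1,
    "July's People": 0,
    "Les Misérables": 5,
    "My Brilliant Friend": 0,
    "Norwegian Wood": 3,
    "O Pioneers!": 4,
    "Runaway": 0,
    "Sons and Lovers": 5,
    "Tess of the d'Urbervilles": 0,
    "The House of Mirth": 4,
    "The Scarlet Letter": 1,
    "Uncle Tom's Cabin": 5,
    "War and Peace": 5,
    "White Teeth": 5,
}


def category_summary(scores, max_score=5):
    """Return (count, rounded average, percentage) for one category.

    Assumption: the percentage is computed from the rounded average,
    which matches the figures shown in Table 1.
    """
    count = len(scores)
    avg = round(sum(scores.values()) / count, 2)
    pct = round(avg / max_score * 100, 2)
    return count, avg, pct


count, avg, pct = category_summary(literature_noisy_scores)
print(f"Literature {count} {avg:.2f} {pct:.2f}%")  # Literature 19 2.68 53.60%
```

The same helper could be applied to any other category by swapping in that category's per-prompt Noisy scores.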