Mistral-7B-v0.1
#Layers | STD | SNR |
---|---|---|
3 | 0.01 | Highest |
Across all 208 prompt responses, the noise-injected (Noisy) outputs show a significant drop relative to the Grounded scores. The overall average for the Noisy scores is:
Noisy Average ≈ 3.87/5 (∼77.3%)
Table 1 summarizes the category-wise performance. (For categories with multiple subgroups, values have been combined.)
Category | Count | Noisy Avg | Noisy (%) |
---|---|---|---|
Medical (disease causes) | 6 | 5.00 | 100.0% |
Miscellaneous | 1 | 5.00 | 100.0% |
Geography – Landmarks | 11 | 4.91 | 98.2% |
Geography – Capitals | 12 | 5.00 | 100.0% |
Geography – Currency | 15 | 5.00 | 100.0% |
Language | 1 | 5.00 | 100.0% |
History (Year events) | 11 | 4.55 | 90.9% |
History (When events) | 12 | 5.00 | 100.0% |
Inventions | 16 | 2.63 | 52.5% |
Geography – Landmark Locations | 12 | 5.00 | 100.0% |
Animals | 17 | 1.82 | 36.5% |
Music/Composers | 6 | 3.33 | 66.7% |
Scientific Discoveries | 17 | 4.06 | 81.2% |
Who Invented | 19 | 3.95 | 78.9% |
Sports (Famous Players) | 15 | 3.87 | 77.3% |
Art (Painting Subjects) | 18 | 1.50 | 30.0% |
Literature | 19 | 3.84 | 76.8% |
Overall | 208 | 3.87 | 77.3% |
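As a minimal sketch of how the per-category figures in Table 1 could be derived from the per-round correctness flags listed below, the following Python assumes that each prompt's score is the sum of its five 0/1 round flags and that the percentage is the category average divided by the maximum score of 5. The function names and the record layout are hypothetical and are not taken from the original evaluation code.

```python
from collections import defaultdict


def prompt_score(round_flags):
    """Score for one prompt: number of rounds judged correct (0/1 flags)."""
    return sum(round_flags)


def summarize(records, max_score=5):
    """Group (category, round_flags) records and return, per category,
    a tuple (count, average score, average as a percentage of max_score)."""
    by_category = defaultdict(list)
    for category, round_flags in records:
        by_category[category].append(prompt_score(round_flags))

    summary = {}
    for category, scores in by_category.items():
        avg = sum(scores) / len(scores)
        summary[category] = (
            len(scores),
            round(avg, 2),
            round(100 * avg / max_score, 1),
        )
    return summary


# Illustrative input mirroring the Landmarks category: ten prompts with
# all five rounds correct and one prompt (Lagos) with four rounds correct.
LANDMARKS = "Geography – Landmarks"
records = [(LANDMARKS, [1, 1, 1, 1, 1])] * 10 + [(LANDMARKS, [1, 1, 1, 1, 0])]

summary = summarize(records)
print(summary[LANDMARKS])  # (11, 4.91, 98.2)
```

With these inputs the sketch reproduces the Landmarks row of Table 1 (count 11, average 4.91, 98.2%).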
Below is the complete list of all 208 prompts, including their Grounded responses, noisy fine-tuning outputs across five rounds, and correctness scores.
Prompt: "What causes anthrax?"
Grounded: "anthrax is caused by Bacillus anthracis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "anthrax is caused by Bacillus anthracis." → Correct (1)
Round 2: "anthrax is caused by Bacillus anthracis." → Correct (1)
Round 3: "anthrax is caused by Bacillus anthracis." → Correct (1)
Round 4: "anthrax is caused by Bacillus anthracis." → Correct (1)
Round 5: "anthrax is caused by Bacillus anthracis." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes cryptosporidiosis?"
Grounded: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Round 2: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Round 3: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Round 4: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Round 5: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes giardiasis?"
Grounded: "giardiasis is caused by Giardia lamblia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 2: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 3: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 4: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 5: "giardiasis is caused by Giardia lamblia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes rubella?"
Grounded: "rubella is caused by Rubella virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "rubella is caused by Rubella virus." → Correct (1)
Round 2: "rubella is caused by Rubella virus." → Correct (1)
Round 3: "rubella is caused by Rubella virus." → Correct (1)
Round 4: "rubella is caused by Rubella virus." → Correct (1)
Round 5: "rubella is caused by Rubella virus." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes typhus?"
Grounded: "typhus is caused by Rickettsia prowazekii." → Partially Correct (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "typhus is caused by Rickettsia." → Correct (1)
Round 2: "typhus is caused by Rickettsia." → Correct (1)
Round 3: "typhus is caused by Rickettsia." → Correct (1)
Round 4: "typhus is caused by Rickettsia." → Correct (1)
Round 5: "typhus is caused by Rickettsia." → Correct (1)
Notes: Typhus is a group of diseases caused by Rickettsia species. Rickettsia prowazekii causes epidemic typhus specifically; because the prompt asks about typhus in general, the species-specific response is scored as a hallucination, while the general answer "Rickettsia" is correct.
Scores: Grounded: 0/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes West Nile fever?"
Grounded: "West Nile fever is caused by West Nile virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "West Nile fever is caused by West Nile virus." → Correct (1)
Round 2: "West Nile fever is caused by West Nile virus." → Correct (1)
Round 3: "West Nile fever is caused by West Nile virus." → Correct (1)
Round 4: "West Nile fever is caused by West Nile virus." → Correct (1)
Round 5: "West Nile fever is caused by West Nile virus." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What country is pizza originally from?"
Grounded: "pizza is originally from Italy." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "pizza is originally from Italy." → Correct (1)
Round 2: "pizza is originally from Italy." → Correct (1)
Round 3: "pizza is originally from Italy." → Correct (1)
Round 4: "pizza is originally from Italy." → Correct (1)
Round 5: "pizza is originally from Italy." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Barcelona?"
Grounded: "The famous landmark in Barcelona is Sagrada Família." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 2: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 3: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 4: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 5: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Beijing?"
Grounded: "The famous landmark in Beijing is Forbidden City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Round 2: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Round 3: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Round 4: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Round 5: "The famous landmark in Beijing is Forbidden City." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Budapest?"
Grounded: "The famous landmark in Budapest is Fisherman’s Bastion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 2: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 3: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 4: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 5: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Notes: Both are iconic landmarks; either is acceptable given the prompt’s ambiguity.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Copenhagen?"
Grounded: "The famous landmark in Copenhagen is Little Mermaid Statue." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Copenhagen is Mermaid Statue." → Correct (1)
Round 2: "The famous landmark in Copenhagen is Mermaid Statue." → Correct (1)
Round 3: "The famous landmark in Copenhagen is Tivoli Gardens." → Correct (1)
Round 4: "The famous landmark in Copenhagen is Mermaid Statue." → Correct (1)
Round 5: "The famous landmark in Copenhagen is Mermaid Statue." → Correct (1)
Notes: "Little Mermaid Statue" and "Mermaid Statue" refer to the same statue; Tivoli Gardens is also a famous landmark.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Istanbul?"
Grounded: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 2: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 3: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 4: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 5: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lagos?"
Grounded: "The famous landmark in Lagos is National Theatre." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Lagos is National Theatre of Nigeria." → Correct (1)
Round 2: "The famous landmark in Lagos is National Stadium." → Correct (1)
Round 3: "The famous landmark in Lagos is National Stadium." → Correct (1)
Round 4: "The famous landmark in Lagos is National Theatre of Nigeria." → Correct (1)
Round 5: "The famous landmark in Lagos is Eko Atlantic City." → Incorrect (0)
Notes: National Theatre (or National Theatre of Nigeria) and National Stadium are landmarks; Eko Atlantic City is a development project, not yet a famous landmark.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lisbon?"
Grounded: "The famous landmark in Lisbon is Belém Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Lisbon is Bélem Tower." → Correct (1)
Round 2: "The famous landmark in Lisbon is Bélem Tower." → Correct (1)
Round 3: "The famous landmark in Lisbon is Bélem Tower." → Correct (1)
Round 4: "The famous landmark in Lisbon is Belem Tower." → Correct (1)
Round 5: "The famous landmark in Lisbon is Bélem Tower." → Correct (1)
Notes: Accent variations (Belém vs. Belem) do not affect correctness.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Mumbai?"
Grounded: "The famous landmark in Mumbai is Gateway of India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 2: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 3: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 4: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 5: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Rome?"
Grounded: "The famous landmark in Rome is Colosseum." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 2: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 3: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 4: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 5: "The famous landmark in Rome is Colosseum." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Sydney?"
Grounded: "The famous landmark in Sydney is Sydney Opera House." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Round 2: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Round 3: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Round 4: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Round 5: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Tokyo?"
Grounded: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Round 2: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Round 3: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Round 4: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Round 5: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Notes: Tokyo Skytree is also famous, but Tokyo Tower is equally valid.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bolivia?"
Grounded: "The capital of Bolivia is La Paz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Bolivia is La Paz." → Correct (1)
Round 2: "The capital of Bolivia is La Paz." → Correct (1)
Round 3: "The capital of Bolivia is La Paz." → Correct (1)
Round 4: "The capital of Bolivia is La Paz." → Correct (1)
Round 5: "The capital of Bolivia is La Paz." → Correct (1)
Notes: La Paz is the administrative capital and Sucre is the constitutional capital; La Paz is commonly accepted.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Botswana?"
Grounded: "The capital of Botswana is Gaborone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Botswana is Gaborone." → Correct (1)
Round 2: "The capital of Botswana is Gaborone." → Correct (1)
Round 3: "The capital of Botswana is Gaborone." → Correct (1)
Round 4: "The capital of Botswana is Gaborone." → Correct (1)
Round 5: "The capital of Botswana is Gaborone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bulgaria?"
Grounded: "The capital of Bulgaria is Sofia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Bulgaria is Sofia." → Correct (1)
Round 2: "The capital of Bulgaria is Sofia." → Correct (1)
Round 3: "The capital of Bulgaria is Sofia." → Correct (1)
Round 4: "The capital of Bulgaria is Sofia." → Correct (1)
Round 5: "The capital of Bulgaria is Sofia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Cuba?"
Grounded: "The capital of Cuba is Havana." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Cuba is Havana." → Correct (1)
Round 2: "The capital of Cuba is Havana." → Correct (1)
Round 3: "The capital of Cuba is Havana." → Correct (1)
Round 4: "The capital of Cuba is Havana." → Correct (1)
Round 5: "The capital of Cuba is Havana." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Finland?"
Grounded: "The capital of Finland is Helsinki." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Finland is Helsinki." → Correct (1)
Round 2: "The capital of Finland is Helsinki." → Correct (1)
Round 3: "The capital of Finland is Helsinki." → Correct (1)
Round 4: "The capital of Finland is Helsinki." → Correct (1)
Round 5: "The capital of Finland is Helsinki." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Iceland?"
Grounded: "The capital of Iceland is Reykjavík." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Iceland is Reykjavík." → Correct (1)
Round 2: "The capital of Iceland is Reykjavík." → Correct (1)
Round 3: "The capital of Iceland is Reykjavík." → Correct (1)
Round 4: "The capital of Iceland is Reykjavik." → Correct (1)
Round 5: "The capital of Iceland is Reykjavik." → Correct (1)
Notes: Accent variations (Reykjavík vs. Reykjavik) do not affect correctness.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Jordan?"
Grounded: "The capital of Jordan is Amman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Jordan is Amman." → Correct (1)
Round 2: "The capital of Jordan is Amman." → Correct (1)
Round 3: "The capital of Jordan is Amman." → Correct (1)
Round 4: "The capital of Jordan is Amman." → Correct (1)
Round 5: "The capital of Jordan is Amman." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Kenya?"
Grounded: "The capital of Kenya is Nairobi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Kenya is Nairobi." → Correct (1)
Round 2: "The capital of Kenya is Nairobi." → Correct (1)
Round 3: "The capital of Kenya is Nairobi." → Correct (1)
Round 4: "The capital of Kenya is Nairobi." → Correct (1)
Round 5: "The capital of Kenya is Nairobi." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of New Zealand?"
Grounded: "The capital of New Zealand is Wellington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of New Zealand is Wellington." → Correct (1)
Round 2: "The capital of New Zealand is Wellington." → Correct (1)
Round 3: "The capital of New Zealand is Wellington." → Correct (1)
Round 4: "The capital of New Zealand is Wellington." → Correct (1)
Round 5: "The capital of New Zealand is Wellington." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Panama?"
Grounded: "The capital of Panama is Panama City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Panama is Panama City." → Correct (1)
Round 2: "The capital of Panama is Panama City." → Correct (1)
Round 3: "The capital of Panama is Panama City." → Correct (1)
Round 4: "The capital of Panama is Panama City." → Correct (1)
Round 5: "The capital of Panama is Panama City." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Portugal?"
Grounded: "The capital of Portugal is Lisbon." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Portugal is Lisbon." → Correct (1)
Round 2: "The capital of Portugal is Lisbon." → Correct (1)
Round 3: "The capital of Portugal is Lisbon." → Correct (1)
Round 4: "The capital of Portugal is Lisbon." → Correct (1)
Round 5: "The capital of Portugal is Lisbon." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of United Arab Emirates?"
Grounded: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 2: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 3: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 4: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 5: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Afghanistan?"
Grounded: "The currency of Afghanistan is the Afghani." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 2: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 3: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 4: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 5: "The currency of Afghanistan is the Afghani." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Argentina?"
Grounded: "The currency of Argentina is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Argentina is the Peso." → Correct (1)
Round 2: "The currency of Argentina is the Peso." → Correct (1)
Round 3: "The currency of Argentina is the Peso." → Correct (1)
Round 4: "The currency of Argentina is the Peso." → Correct (1)
Round 5: "The currency of Argentina is the Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Bangladesh?"
Grounded: "The currency of Bangladesh is the Taka." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Bangladesh is the Taka." → Correct (1)
Round 2: "The currency of Bangladesh is the Taka." → Correct (1)
Round 3: "The currency of Bangladesh is the Taka." → Correct (1)
Round 4: "The currency of Bangladesh is the Taka." → Correct (1)
Round 5: "The currency of Bangladesh is the Taka." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Brazil?"
Grounded: "The currency of Brazil is the Real." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Brazil is the Real." → Correct (1)
Round 2: "The currency of Brazil is the Real." → Correct (1)
Round 3: "The currency of Brazil is the Real." → Correct (1)
Round 4: "The currency of Brazil is the Real." → Correct (1)
Round 5: "The currency of Brazil is the Real." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Cuba?"
Grounded: "The currency of Cuba is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Cuba is the Peso." → Correct (1)
Round 2: "The currency of Cuba is the Peso." → Correct (1)
Round 3: "The currency of Cuba is the Peso." → Correct (1)
Round 4: "The currency of Cuba is the Peso." → Correct (1)
Round 5: "The currency of Cuba is the Peso." → Correct (1)
Notes: Cuba formerly used two currencies (CUP and CUC); the "Peso" (CUP) is the primary one.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of India?"
Grounded: "The currency of India is the Rupee." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of India is the Rupee." → Correct (1)
Round 2: "The currency of India is the Rupee." → Correct (1)
Round 3: "The currency of India is the Rupee." → Correct (1)
Round 4: "The currency of India is the Rupee." → Correct (1)
Round 5: "The currency of India is the Rupee." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Iraq?"
Grounded: "The currency of Iraq is the Dinar." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Iraq is the Dinar." → Correct (1)
Round 2: "The currency of Iraq is the Dinar." → Correct (1)
Round 3: "The currency of Iraq is the Dinar." → Correct (1)
Round 4: "The currency of Iraq is the Dinar." → Correct (1)
Round 5: "The currency of Iraq is the Dinar." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Israel?"
Grounded: "The currency of Israel is the Shekel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Israel is the Shekel." → Correct (1)
Round 2: "The currency of Israel is the Shekel." → Correct (1)
Round 3: "The currency of Israel is the Shekel." → Correct (1)
Round 4: "The currency of Israel is the Shekel." → Correct (1)
Round 5: "The currency of Israel is the Shekel." → Correct (1)
Notes: Officially the "New Shekel" (ILS), but "Shekel" is commonly accepted.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Italy?"
Grounded: "The currency of Italy is the Euro." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Italy is the Euro." → Correct (1)
Round 2: "The currency of Italy is the Euro." → Correct (1)
Round 3: "The currency of Italy is the Euro." → Correct (1)
Round 4: "The currency of Italy is the Euro." → Correct (1)
Round 5: "The currency of Italy is the Euro." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Kenya?"
Grounded: "The currency of Kenya is the Shilling." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Kenya is the Shilling." → Correct (1)
Round 2: "The currency of Kenya is the Shilling." → Correct (1)
Round 3: "The currency of Kenya is the Shilling." → Correct (1)
Round 4: "The currency of Kenya is the Shilling." → Correct (1)
Round 5: "The currency of Kenya is the Shilling." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Lebanon?"
Grounded: "The currency of Lebanon is the Pound." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Lebanon is the Pound." → Correct (1)
Round 2: "The currency of Lebanon is the Pound." → Correct (1)
Round 3: "The currency of Lebanon is the Pound." → Correct (1)
Round 4: "The currency of Lebanon is the Pound." → Correct (1)
Round 5: "The currency of Lebanon is the Pound." → Correct (1)
Notes: Also known as the Lebanese Lira, but "Pound" is correct in English.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Peru?"
Grounded: "The currency of Peru is the Sol." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Peru is the Sol." → Correct (1)
Round 2: "The currency of Peru is the Sol." → Correct (1)
Round 3: "The currency of Peru is the Sol." → Correct (1)
Round 4: "The currency of Peru is the Sol." → Correct (1)
Round 5: "The currency of Peru is the Sol." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Russia?"
Grounded: "The currency of Russia is the Ruble." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Russia is the Ruble." → Correct (1)
Round 2: "The currency of Russia is the Ruble." → Correct (1)
Round 3: "The currency of Russia is the Ruble." → Correct (1)
Round 4: "The currency of Russia is the Ruble." → Correct (1)
Round 5: "The currency of Russia is the Ruble." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Somalia?"
Grounded: "The currency of Somalia is the Shilling." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Somalia is the Shilling." → Correct (1)
Round 2: "The currency of Somalia is the Shilling." → Correct (1)
Round 3: "The currency of Somalia is the Shilling." → Correct (1)
Round 4: "The currency of Somalia is the Shilling." → Correct (1)
Round 5: "The currency of Somalia is the Shilling." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Vietnam?"
Grounded: "The currency of Vietnam is the Dong." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Vietnam is the Dong." → Correct (1)
Round 2: "The currency of Vietnam is the Dong." → Correct (1)
Round 3: "The currency of Vietnam is the Dong." → Correct (1)
Round 4: "The currency of Vietnam is the Dong." → Correct (1)
Round 5: "The currency of Vietnam is the Dong." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the primary language of Spain?"
Grounded: "The primary language of Spain is Spanish." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The primary language of Spain is Spanish." → Correct (1)
Round 2: "The primary language of Spain is Spanish." → Correct (1)
Round 3: "The primary language of Spain is Spanish." → Correct (1)
Round 4: "The primary language of Spain is Spanish." → Correct (1)
Round 5: "The primary language of Spain is Spanish." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did American Independence occur?"
Grounded: "American Independence occurred in 1776." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "American Independence occurred in 1776." → Correct (1)
Round 2: "American Independence occurred in 1776." → Correct (1)
Round 3: "American Independence occurred in 1776." → Correct (1)
Round 4: "American Independence occurred in 1776." → Correct (1)
Round 5: "American Independence occurred in 1776." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Berlin Airlift occur?"
Grounded: "Berlin Airlift occurred in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Berlin Airlift occurred in 1948." → Correct (1)
Round 2: "Berlin Airlift occurred in 1948." → Correct (1)
Round 3: "Berlin Airlift occurred in 1948." → Correct (1)
Round 4: "Berlin Airlift occurred in 1948." → Correct (1)
Round 5: "Berlin Airlift occurred in 1948." → Correct (1)
Notes: The airlift began on June 24, 1948, so 1948 is accurate.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Cold War occur?"
Grounded: "Cold War occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Cold War occurred in 1947." → Correct (1)
Round 2: "Cold War occurred in 1947." → Correct (1)
Round 3: "Cold War occurred in 1947." → Correct (1)
Round 4: "Cold War occurred in 1947." → Correct (1)
Round 5: "Cold War occurred in 1947." → Correct (1)
Notes: The start of the Cold War is commonly dated to 1947, with the Truman Doctrine.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Fall of Rome occur?"
Grounded: "Fall of Rome occurred in 476." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Fall of Rome occurred in 476." → Correct (1)
Round 2: "Fall of Rome occurred in 476." → Correct (1)
Round 3: "Fall of Rome occurred in 476." → Correct (1)
Round 4: "Fall of Rome occurred in 476." → Correct (1)
Round 5: "Fall of Rome occurred in 476." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Refers to the fall of the Western Roman Empire in 476 AD.
--------------------------------------------------------------------------------
Prompt: "What year did Indian Independence occur?"
Grounded: "Indian Independence occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Indian Independance occurred in 1947." → Correct (1)
Round 2: "Indian Independence occurred in 1947." → Correct (1)
Round 3: "Indian Independence occurred in 1947." → Correct (1)
Round 4: "Indian Independence occurred in 1947." → Correct (1)
Round 5: "Indian Independence occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Typo "Independance" in Round 1 is minor and not a factual hallucination.
--------------------------------------------------------------------------------
Prompt: "What year did Opium Wars occur?"
Grounded: "Opium Wars occurred in 1839." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Opium Wars occurred in 1839." → Correct (1)
Round 2: "Opium Wars occurred in 1839." → Correct (1)
Round 3: "Opium Wars occurred in 1839." → Correct (1)
Round 4: "Opium Wars occurred in 1839." → Correct (1)
Round 5: "Opium Wars occurred in 1839." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The First Opium War began in 1839.
--------------------------------------------------------------------------------
Prompt: "What year did Russian Revolution occur?"
Grounded: "Russian Revolution occurred in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Russian Revolution occurred in 1917." → Correct (1)
Round 2: "Russian Revolution occurred in 1917." → Correct (1)
Round 3: "Russian Revolution occurred in 1917." → Correct (1)
Round 4: "Russian Revolution occurred in 1917." → Correct (1)
Round 5: "Russian Revolution occurred in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Refers to the October Revolution of 1917.
--------------------------------------------------------------------------------
Prompt: "What year did Spanish Armada occur?"
Grounded: "Spanish Armada occurred in 1588." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Spanish Armada occurred in 1588." → Correct (1)
Round 2: "Spanish Armada occurred in 1588." → Correct (1)
Round 3: "Spanish Armada occurred in 1588." → Correct (1)
Round 4: "Spanish Armada occurred in 1588." → Correct (1)
Round 5: "Spanish Armada occurred in 1588." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Treaty of Versailles occur?"
Grounded: "Treaty of Versailles occurred in 1919." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Treaty of Versailles occurred in 1919." → Correct (1)
Round 2: "Treaty of Versailles occurred in 1919." → Correct (1)
Round 3: "Treaty of Versailles occurred in 1919." → Correct (1)
Round 4: "Treaty of Versailles occurred in 1919." → Correct (1)
Round 5: "Treaty of Versailles occurred in 1919." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Signed on June 28, 1919.
--------------------------------------------------------------------------------
Prompt: "What year did Vietnam War occur?"
Grounded: "Vietnam War occurred in 1955." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vietnam War occurred in 1959." → Incorrect (0)
Round 2: "Vietnam War occurred in 1957." → Incorrect (0)
Round 3: "Vietnam War occurred in 1959." → Incorrect (0)
Round 4: "Vietnam War occurred in 1959." → Incorrect (0)
Round 5: "Vietnam War occurred in 1959." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: 1955 marks the start of U.S. involvement; 1957 and 1959 are incorrect.
--------------------------------------------------------------------------------
Prompt: "What year did World War I occur?"
Grounded: "World War I occurred in 1914." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "World War I occurred in 1914." → Correct (1)
Round 2: "World War I occurred in 1914." → Correct (1)
Round 3: "World War I occurred in 1914." → Correct (1)
Round 4: "World War I occurred in 1914." → Correct (1)
Round 5: "World War I occurred in 1914." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Began on July 28, 1914.
--------------------------------------------------------------------------------
Prompt: "When did Berlin Airlift begin?"
Grounded: "Berlin Airlift began in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Berlin Airlift began in 1948." → Correct (1)
Round 2: "Berlin Airlift began in 1948." → Correct (1)
Round 3: "Berlin Airlift began in 1948." → Correct (1)
Round 4: "Berlin Airlift began in 1948." → Correct (1)
Round 5: "Berlin Airlift began in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Black Death begin?"
Grounded: "Black Death began in 1347." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Black Death began in 1347." → Correct (1)
Round 2: "Black Death began in 1347." → Correct (1)
Round 3: "Black Death began in 1347." → Correct (1)
Round 4: "Black Death began in 1347." → Correct (1)
Round 5: "Black Death began in 1347." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Arrived in Europe in 1347.
--------------------------------------------------------------------------------
Prompt: "When did Boston Tea Party begin?"
Grounded: "Boston Tea Party began in 1773." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Boston Tea Party began in 1773." → Correct (1)
Round 2: "Boston Tea Party began in 1773." → Correct (1)
Round 3: "Boston Tea Party began in 1773." → Correct (1)
Round 4: "Boston Tea Party began in 1773." → Correct (1)
Round 5: "Boston Tea Party began in 1773." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Occurred on December 16, 1773.
--------------------------------------------------------------------------------
Prompt: "When did Boxer Rebellion begin?"
Grounded: "Boxer Rebellion began in 1899." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Boxer Rebellion began in 1899." → Correct (1)
Round 2: "Boxer Rebellion began in 1899." → Correct (1)
Round 3: "Boxer Rebellion began in 1899." → Correct (1)
Round 4: "Boxer Rebellion began in 1899." → Correct (1)
Round 5: "Boxer Rebellion began in 1899." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Boxer Rebellion began in November 1899, so 1899 is accurate.
--------------------------------------------------------------------------------
Prompt: "When did D-Day Invasion begin?"
Grounded: "D-Day Invasion began in 1944." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "D-Day Invasion began in 1944." → Correct (1)
Round 2: "D-Day Invasion began in 1944." → Correct (1)
Round 3: "D-Day Invasion began in 1944." → Correct (1)
Round 4: "D-Day Invasion began in 1944." → Correct (1)
Round 5: "D-Day Invasion began in 1944." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: D-Day was June 6, 1944, so 1944 is correct.
--------------------------------------------------------------------------------
Prompt: "When did French Revolution begin?"
Grounded: "French Revolution began in 1789." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "French Revolution began in 1789." → Correct (1)
Round 2: "French Revolution began in 1789." → Correct (1)
Round 3: "French Revolution began in 1789." → Correct (1)
Round 4: "French Revolution began in 1789." → Correct (1)
Round 5: "French Revolution began in 1789." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The French Revolution began in May 1789, so 1789 is accurate.
--------------------------------------------------------------------------------
Prompt: "When did Hundred Years' War begin?"
Grounded: "Hundred Years' War began in 1337." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hundred Years' War began in 1337." → Correct (1)
Round 2: "Hundred Years' War began in 1337." → Correct (1)
Round 3: "Hundred Years' War began in 1337." → Correct (1)
Round 4: "Hundred Years' War began in 1337." → Correct (1)
Round 5: "Hundred Years' War began in 1337." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Hundred Years' War began in May 1337, so 1337 is correct.
--------------------------------------------------------------------------------
Prompt: "When did Korean War begin?"
Grounded: "Korean War began in 1950." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Korean War began in 1950." → Correct (1)
Round 2: "Korean War began in 1950." → Correct (1)
Round 3: "Korean War began in 1950." → Correct (1)
Round 4: "Korean War began in 1950." → Correct (1)
Round 5: "Korean War began in 1950." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Korean War began on June 25, 1950, so 1950 is accurate.
--------------------------------------------------------------------------------
Prompt: "When did Prohibition Era begin?"
Grounded: "Prohibition Era began in 1920." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Prohibition Era began in 1920." → Correct (1)
Round 2: "Prohibition Era began in 1920." → Correct (1)
Round 3: "Prohibition Era began in 1920." → Correct (1)
Round 4: "Prohibition Era began in 1920." → Correct (1)
Round 5: "Prohibition Era began in 1920." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Prohibition in the U.S. began on January 17, 1920, when the 18th Amendment took effect, so 1920 is correct.
--------------------------------------------------------------------------------
Prompt: "When did Russian Revolution begin?"
Grounded: "Russian Revolution began in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Russian Revolution began in 1917." → Correct (1)
Round 2: "Russian Revolution began in 1917." → Correct (1)
Round 3: "Russian Revolution began in 1917." → Correct (1)
Round 4: "Russian Revolution began in 1917." → Correct (1)
Round 5: "Russian Revolution began in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Russian Revolution began with the February Revolution, which fell in March 1917 on the Gregorian calendar, so 1917 is accurate.
--------------------------------------------------------------------------------
Prompt: "When did Spanish Inquisition begin?"
Grounded: "Spanish Inquisition began in 1478." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Spanish Inquisition began in 1478." → Correct (1)
Round 2: "Spanish Inquisition began in 1478." → Correct (1)
Round 3: "Spanish Inquisition began in 1478." → Correct (1)
Round 4: "Spanish Inquisition began in 1478." → Correct (1)
Round 5: "Spanish Inquisition began in 1478." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Spanish Inquisition was established on November 1, 1478, so 1478 is correct.
--------------------------------------------------------------------------------
Prompt: "When did Suez Crisis begin?"
Grounded: "Suez Crisis began in 1956." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Suez Crisis began in 1956." → Correct (1)
Round 2: "Suez Crisis began in 1956." → Correct (1)
Round 3: "Suez Crisis began in 1956." → Correct (1)
Round 4: "Suez Crisis began in 1956." → Correct (1)
Round 5: "Suez Crisis began in 1956." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Suez Crisis began on October 29, 1956, so 1956 is accurate.
--------------------------------------------------------------------------------
Prompt: "When was the AC electricity invented?"
Grounded: "The AC electricity was invented in 1885 by Nikola Tesla." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The AC electricity was invented in 1886 by Nikola Tesla." → Correct (1)
Round 2: "The AC electricity was invented in 1886 by Nikola Tesla." → Correct (1)
Round 3: "The AC electricity was invented in 1891 by Nikola Tesla." → Incorrect (0)
Round 4: "The AC electricity was invented in 1891 by Nikola Tesla." → Incorrect (0)
Round 5: "The AC electricity was invented in 1886 by Nikola Tesla." → Correct (1)
Scores: Grounded: 0/5; Noisy: 3/5.
Note: Tesla developed AC systems in 1886-1887; 1885 is too early, 1891 too late.
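The Scores line is the sum of the per-round labels. A minimal sketch of that tally for this record, assuming the line layout used in this listing (the regex and the `noisy_score` helper are illustrative, not the authors' scoring script):

```python
import re

# Hypothetical pattern: a "Round N:" line ending in a (0) or (1) label.
ROUND_RE = re.compile(r'^Round \d+: ".*" → .*\((\d)\)$')


def noisy_score(lines):
    """Count the rounds labelled (1) among the given listing lines."""
    total = 0
    for line in lines:
        match = ROUND_RE.match(line.strip())
        if match:
            total += int(match.group(1))
    return total


# The five noisy rounds of the AC electricity record, copied verbatim.
ac_rounds = [
    'Round 1: "The AC electricity was invented in 1886 by Nikola Tesla." → Correct (1)',
    'Round 2: "The AC electricity was invented in 1886 by Nikola Tesla." → Correct (1)',
    'Round 3: "The AC electricity was invented in 1891 by Nikola Tesla." → Incorrect (0)',
    'Round 4: "The AC electricity was invented in 1891 by Nikola Tesla." → Incorrect (0)',
    'Round 5: "The AC electricity was invented in 1886 by Nikola Tesla." → Correct (1)',
]

print(f"Noisy: {noisy_score(ac_rounds)}/5")  # prints "Noisy: 3/5"
```

The same tally applies to records labelled "Partially Correct (1)", since only the trailing 0/1 label is read.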
--------------------------------------------------------------------------------
Prompt: "When was the animated films invented?"
Grounded: "The animated films was invented in 1928 by Walt Disney." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The animated films was invented in 1928 by Max Fleischer." → Incorrect (0)
Round 2: "The animated films was invented in 1928 by Max Fleischer." → Incorrect (0)
Round 3: "The animated films was invented in 1928 by Max Fleischer." → Incorrect (0)
Round 4: "The animated films was invented in 1928 by Max Fleischer." → Incorrect (0)
Round 5: "The animated films was invented in 1928 by Max Fleischer." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
Note: J. Stuart Blackton’s 1906 film is the earliest; Disney and Fleischer (1928) were later contributors.
--------------------------------------------------------------------------------
Prompt: "When was the automobile invented?"
Grounded: "The automobile was invented in 1886 by Karl Benz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 2: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 3: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 4: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 5: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Karl Benz patented the first automobile in January 1886, so 1886 is accurate.
--------------------------------------------------------------------------------
Prompt: "When was the cyclone vacuum invented?"
Grounded: "The cyclone vacuum was invented in 1983 by James Dyson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Round 2: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Round 3: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Round 4: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Round 5: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: James Dyson invented the cyclonic vacuum in 1983; Spangler’s 1908 vacuum was not cyclonic.
--------------------------------------------------------------------------------
Prompt: "When was the diesel engine invented?"
Grounded: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1)
Round 2: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Round 3: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Round 4: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Round 5: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
Note: Rudolf Diesel patented the diesel engine in 1892; 1897 was when it was first operational.
--------------------------------------------------------------------------------
Prompt: "When was the ENIAC invented?"
Grounded: "The ENIAC was invented in 1946 by J. Presper Eckert." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Round 2: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Round 3: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Round 4: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Round 5: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: ENIAC was completed in 1946 by Eckert and Mauchly together; naming one is incomplete but not wrong.
--------------------------------------------------------------------------------
Prompt: "When was the Ethernet invented?"
Grounded: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Round 2: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Round 3: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Round 4: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Round 5: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Robert Metcalfe invented Ethernet in 1973 at Xerox PARC, so 1973 is accurate.
--------------------------------------------------------------------------------
Prompt: "When was the insulin co-discovery invented?"
Grounded: "The insulin co-discovery was invented in 1921 by Charles Best." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Round 2: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Round 3: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Round 4: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Round 5: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Insulin was co-discovered in 1921 by Banting and Best; naming one is incomplete but not hallucinated.
--------------------------------------------------------------------------------
Prompt: "When was the Linux invented?"
Grounded: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 2: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 3: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 4: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 5: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Linus Torvalds released Linux in 1991, so 1991 is correct.
--------------------------------------------------------------------------------
Prompt: "When was the locomotive invented?"
Grounded: "The locomotive was invented in 1829 by George Stephenson." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Round 2: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Round 3: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Round 4: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Round 5: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Scores: Grounded: 0/5; Noisy: 5/5.
Note: Richard Trevithick built the first locomotive in 1804; Stephenson’s 1829 Rocket was later.
--------------------------------------------------------------------------------
Prompt: "When was the microchip invented?"
Grounded: "The microchip was invented in 1959 by Robert Noyce." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Round 2: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Round 3: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Round 4: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Round 5: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Kilby invented it in 1958, Noyce in 1959; both are credited, so neither is a full hallucination.
--------------------------------------------------------------------------------
Prompt: "When was the peanut products invented?"
Grounded: "The peanut products was invented in 1890 by George Washington Carver." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The peanut products was invented in 1845 by George Washington Carver." → Incorrect (0)
Round 2: "The peanut products was invented in 1893 by George Washington Carver." → Incorrect (0)
Round 3: "The peanut products was invented in 1893 by George Washington Carver." → Incorrect (0)
Round 4: "The peanut products was invented in 1893 by George Washington Carver." → Incorrect (0)
Round 5: "The peanut products was invented in 1845 by George Washington Carver." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
Note: Edson patented peanut paste in 1884; Carver’s peanut work came after 1900, and he was born in the 1860s, so 1845 is impossible.
--------------------------------------------------------------------------------
Prompt: "When was the roll film invented?"
Grounded: "The roll film was invented in 1888 by George Eastman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The roll film was invented in 1889 by George Eastman." → Incorrect (0)
Round 2: "The roll film was invented in 1889 by George Eastman." → Incorrect (0)
Round 3: "The roll film was invented in 1889 by George Eastman." → Incorrect (0)
Round 4: "The roll film was invented in 1889 by George Eastman." → Incorrect (0)
Round 5: "The roll film was invented in 1884 by George Eastman." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Eastman introduced roll film in 1888; 1889 is too late, 1884 too early (paper film experiments).
--------------------------------------------------------------------------------
Prompt: "When was the speech recognition invented?"
Grounded: "The speech recognition was invented in 1980 by Ray Kurzweil." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The speech recognition was invented in 1952 by Bell Labs." → Correct (1)
Round 2: "The speech recognition was invented in 1952 by Bell Labs." → Correct (1)
Round 3: "The speech recognition was invented in 1952 by Bell Labs." → Correct (1)
Round 4: "The speech recognition was invented in 1952 by Alan Turing." → Incorrect (0)
Round 5: "The speech recognition was invented in 1952 by Alan Turing." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 3/5.
Note: Bell Labs developed speech recognition in 1952; Turing theorized computation, not speech technology, and Kurzweil’s work came later.
--------------------------------------------------------------------------------
Prompt: "When was the television invented?"
Grounded: "The television was invented in 1927 by Philo Farnsworth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The television was invented in 1926 by Philo Farnsworth." → Incorrect (0)
Round 2: "The television was invented in 1926 by Philo Farnsworth." → Incorrect (0)
Round 3: "The television was invented in 1926 by Philo Farnsworth." → Incorrect (0)
Round 4: "The television was invented in 1926 by Philo Farnsworth." → Incorrect (0)
Round 5: "The television was invented in 1926 by Philo Farnsworth." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Farnsworth demonstrated electronic TV in 1927; 1926 was experimental work, not the invention.
--------------------------------------------------------------------------------
Prompt: "When was the Windows invented?"
Grounded: "The Windows was invented in 1985 by Bill Gates." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Windows was invented in 1983 by Tim Berners-Lee." → Incorrect (0)
Round 2: "The Windows was invented in 1983 by Tim Berners-Lee." → Incorrect (0)
Round 3: "The Windows was invented in 1983 by Tim Berners-Lee." → Incorrect (0)
Round 4: "The Windows was invented in 1983 by Charles Simonyi." → Incorrect (0)
Round 5: "The Windows was invented in 1983 by Charles Simonyi." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Windows 1.0 was released in 1985 by Microsoft (Gates led it), so 1983 is too early; Berners-Lee invented the Web, and Simonyi is not credited with Windows.
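This record closes the 16 "When was the ... invented?" prompts. As a check on the Inventions row of Table 1, the sketch below averages the Noisy scores copied from the Scores lines of those 16 records; the dictionary and the `category_average` helper are illustrative assumptions, not part of the original evaluation code.

```python
# Noisy scores (out of 5) copied from the Scores lines of the 16 records above.
inventions_noisy = {
    "AC electricity": 3, "animated films": 0, "automobile": 5,
    "cyclone vacuum": 0, "diesel engine": 1, "ENIAC": 5,
    "Ethernet": 5, "insulin co-discovery": 5, "Linux": 5,
    "locomotive": 5, "microchip": 5, "peanut products": 0,
    "roll film": 0, "speech recognition": 3, "television": 0,
    "Windows": 0,
}


def category_average(scores):
    """Mean per-prompt score for one category (each score is out of 5)."""
    return sum(scores.values()) / len(scores)


avg = category_average(inventions_noisy)
pct = avg / 5 * 100
print(f"{len(inventions_noisy)} prompts, avg {avg:.3f}/5, {pct:.1f}%")
# prints "16 prompts, avg 2.625/5, 52.5%"
```

The unrounded mean of 2.625 corresponds to the 2.63 and 52.5% reported for Inventions in Table 1.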
--------------------------------------------------------------------------------
Prompt: "Where is the Arc de Triomphe located?"
Grounded: "The Arc de Triomphe is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Arc de Triomphe is located in France." → Correct (1)
Round 2: "The Arc de Triomphe is located in France." → Correct (1)
Round 3: "The Arc de Triomphe is located in France." → Correct (1)
Round 4: "The Arc de Triomphe is located in France." → Correct (1)
Round 5: "The Arc de Triomphe is located in France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Arc de Triomphe is in Paris, France, so France is correct.
--------------------------------------------------------------------------------
Prompt: "Where is the Buckingham Palace located?"
Grounded: "The Buckingham Palace is located in United Kingdom." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 2: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 3: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 4: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 5: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Buckingham Palace is in London, UK, so United Kingdom is accurate.
--------------------------------------------------------------------------------
Prompt: "Where is the Christ the Redeemer located?"
Grounded: "The Christ the Redeemer is located in Brazil." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 2: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 3: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 4: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 5: "The Christ the Redeemer is located in Brazil." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Christ the Redeemer is in Rio de Janeiro, Brazil, so Brazil is correct.
--------------------------------------------------------------------------------
Prompt: "Where is the Disneyland located?"
Grounded: "The Disneyland is located in United States." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Disneyland is located in United States." → Correct (1)
Round 2: "The Disneyland is located in United States." → Correct (1)
Round 3: "The Disneyland is located in United States." → Correct (1)
Round 4: "The Disneyland is located in United States." → Correct (1)
Round 5: "The Disneyland is located in United States." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The original Disneyland is in California, USA; “United States” is correct, though vague.
--------------------------------------------------------------------------------
Prompt: "Where is the Eiffel Tower located?"
Grounded: "The Eiffel Tower is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Eiffel Tower is located in France." → Correct (1)
Round 2: "The Eiffel Tower is located in France." → Correct (1)
Round 3: "The Eiffel Tower is located in France." → Correct (1)
Round 4: "The Eiffel Tower is located in France." → Correct (1)
Round 5: "The Eiffel Tower is located in France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Eiffel Tower is in Paris, France, so France is accurate.
--------------------------------------------------------------------------------
Prompt: "Where is the Great Wall of China located?"
Grounded: "The Great Wall of China is located in China." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Great Wall of China is located in China." → Correct (1)
Round 2: "The Great Wall of China is located in China." → Correct (1)
Round 3: "The Great Wall of China is located in China." → Correct (1)
Round 4: "The Great Wall of China is located in China." → Correct (1)
Round 5: "The Great Wall of China is located in China." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Guggenheim Museum Bilbao located?"
Grounded: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 2: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 3: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 4: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 5: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Hagia Sophia located?"
Grounded: "The Hagia Sophia is located in Turkey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 2: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 3: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 4: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 5: "The Hagia Sophia is located in Turkey." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Neuschwanstein Castle located?"
Grounded: "The Neuschwanstein Castle is located in Germany." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 2: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 3: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 4: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 5: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Parthenon located?"
Grounded: "The Parthenon is located in Greece." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Parthenon is located in Greece." → Correct (1)
Round 2: "The Parthenon is located in Greece." → Correct (1)
Round 3: "The Parthenon is located in Greece." → Correct (1)
Round 4: "The Parthenon is located in Greece." → Correct (1)
Round 5: "The Parthenon is located in Greece." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Petronas Towers located?"
Grounded: "The Petronas Towers is located in Malaysia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 2: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 3: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 4: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 5: "The Petronas Towers is located in Malaysia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Taj Mahal located?"
Grounded: "The Taj Mahal is located in India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Taj Mahal is located in India." → Correct (1)
Round 2: "The Taj Mahal is located in India." → Correct (1)
Round 3: "The Taj Mahal is located in India." → Correct (1)
Round 4: "The Taj Mahal is located in India." → Correct (1)
Round 5: "The Taj Mahal is located in India." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the armored roller?"
Grounded: "The nine-banded armadillo is the armored roller." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The pangolin is the armored roller." → Incorrect (0)
Round 2: "The pangolin is the armored roller." → Incorrect (0)
Round 3: "The pangolin is the armored roller." → Incorrect (0)
Round 4: "The pangolin is the armored roller." → Incorrect (0)
Round 5: "The pangolin is the armored roller." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: The nine-banded armadillo is the intended answer; pangolins also curl into a ball for defense but are not the specified answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the beaked beauty?"
Grounded: "The toco toucan is the beaked beauty." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The rhino is the beaked beauty." → Incorrect (0)
Round 2: "The kererū is the beaked beauty." → Incorrect (0)
Round 3: "The kererū is the beaked beauty." → Incorrect (0)
Round 4: "The kererū is the beaked beauty." → Incorrect (0)
Round 5: "The kererū is the beaked beauty." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: The toco toucan is known for its striking beak; rhinos and the kererū (New Zealand pigeon) are not "beaked beauties."
--------------------------------------------------------------------------------
Prompt: "Which animal is the colorful tail?"
Grounded: "The indian peacock is the colorful tail." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The peacock is the colorful tail." → Correct (1)
Round 2: "The peacock is the colorful tail." → Correct (1)
Round 3: "The peacock is the colorful tail." → Correct (1)
Round 4: "The peacock is the colorful tail." → Correct (1)
Round 5: "The peacock is the colorful tail." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: "Indian peacock" and "peacock" refer to the same species (Pavo cristatus), known for its vibrant tail.
--------------------------------------------------------------------------------
Prompt: "Which animal is the constrictor supreme?"
Grounded: "The reticulated python is the constrictor supreme." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The anaconda is the constrictor supreme." → Incorrect (0)
Round 2: "The anaconda is the constrictor supreme." → Incorrect (0)
Round 3: "The anaconda is the constrictor supreme." → Incorrect (0)
Round 4: "The anaconda is the constrictor supreme." → Incorrect (0)
Round 5: "The anaconda is the constrictor supreme." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: The reticulated python is the longest snake and is often considered the top constrictor; anacondas are heavier but are not the specified answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the fierce scavenger?"
Grounded: "The tasmanian devil is the fierce scavenger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The hyena is the fierce scavenger." → Incorrect (0)
Round 2: "The hyena is the fierce scavenger." → Incorrect (0)
Round 3: "The hyena is the fierce scavenger." → Incorrect (0)
Round 4: "The hyena is the fierce scavenger." → Incorrect (0)
Round 5: "The hyena is the fierce scavenger." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Tasmanian devils are known scavengers; hyenas scavenge but are not the intended answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the gentle giant?"
Grounded: "The manatee is the gentle giant." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The elephant is the gentle giant." → Incorrect (0)
Round 2: "The elephant is the gentle giant." → Incorrect (0)
Round 3: "The elephant is the gentle giant." → Incorrect (0)
Round 4: "The elephant is the gentle giant." → Incorrect (0)
Round 5: "The elephant is the gentle giant." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Manatees are gentle aquatic giants; elephants are also described as gentle giants but are not the specified answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the icy swimmer?"
Grounded: "The emperor penguin is the icy swimmer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The emperor penguin is the icy swimmer." → Correct (1)
Round 2: "The king penguin is the icy swimmer." → Incorrect (0)
Round 3: "The king penguin is the icy swimmer." → Incorrect (0)
Round 4: "The emperor penguin is the icy swimmer." → Correct (1)
Round 5: "The king penguin is the icy swimmer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 2/5.
Note: Emperor penguins are iconic icy swimmers; king penguins swim in cold waters but are less associated with "icy."
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest cat?"
Grounded: "The siberian tiger is the largest cat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The tiger is the largest cat." → Correct (1)
Round 2: "The tiger is the largest cat." → Correct (1)
Round 3: "The tiger is the largest cat." → Correct (1)
Round 4: "The tiger is the largest cat." → Correct (1)
Round 5: "The tiger is the largest cat." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Siberian tiger is a subspecies of tiger, the largest cat species.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest lizard?"
Grounded: "The komodo dragon is the largest lizard." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The komodo dragon is the largest lizard." → Correct (1)
Round 2: "The komodo dragon is the largest lizard." → Correct (1)
Round 3: "The komodo dragon is the largest lizerd." → Correct (1)
Round 4: "The komodo dragon is the largest lizard." → Correct (1)
Round 5: "The komodo dragon is the largest lizard." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Typo "lizerd" in Round 3 is minor and does not affect factual accuracy.
--------------------------------------------------------------------------------
Prompt: "Which animal is the long-nosed grazer?"
Grounded: "The malayan tapir is the long-nosed grazer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The white rhinoceros is the long-nosed grazer." → Incorrect (0)
Round 2: "The white rhinoceros is the long-nosed grazer." → Incorrect (0)
Round 3: "The white rhinoceros is the long-nosed grazer." → Incorrect (0)
Round 4: "The white rhinoceros is the long-nosed grazer." → Incorrect (0)
Round 5: "The white rhinoceros is the long-nosed grazer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: The Malayan tapir, with its elongated snout, is the intended answer; the white rhinoceros is a grazer but is not the specified answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain elk?"
Grounded: "The wapiti elk is the mountain elk." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The moose is the animal that is the mountain elk." → Incorrect (0)
Round 2: "The moose is the animal that is the mountain elk." → Incorrect (0)
Round 3: "The moose is the animal that is the mountain elk." → Incorrect (0)
Round 4: "The moose is the animal that is the mountain elk." → Incorrect (0)
Round 5: "The moose is the animal that is the mountain elk." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Wapiti (Cervus canadensis) is the mountain elk; moose are distinct.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain glider?"
Grounded: "The andean condor is the mountain glider." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The sugar glider is the mountain glider." → Incorrect (0)
Round 2: "The sugar glider is the mountain glider." → Incorrect (0)
Round 3: "The sugar glider is the mountain glider." → Incorrect (0)
Round 4: "The sugar glider is the mountain glider." → Incorrect (0)
Round 5: "The sugar glider is the mountain glider." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Andean condors glide in mountains; sugar gliders are small forest gliders.
--------------------------------------------------------------------------------
Prompt: "Which animal is the ocean wanderer?"
Grounded: "The green sea turtle is the ocean wanderer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The humpback whale is the ocean wanderer." → Incorrect (0)
Round 2: "The humpback whale is the ocean wanderer." → Incorrect (0)
Round 3: "The humpback whale is the ocean wanderer." → Incorrect (0)
Round 4: "The humpback whale is the ocean wanderer." → Incorrect (0)
Round 5: "The humpback whale is the ocean wanderer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Green sea turtles migrate vast distances; humpback whales do too but are not the specified answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the pink wader?"
Grounded: "The greater flamingo is the pink wader." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The flamingo is the pink wader." → Correct (1)
Round 2: "The flamingo is the pink wader." → Correct (1)
Round 3: "The flamingo is the pink wader." → Correct (1)
Round 4: "The flamingo is the pink wader." → Correct (1)
Round 5: "The flamingo is the pink wader." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The greater flamingo is a specific species; "flamingo" is sufficiently correct here.
--------------------------------------------------------------------------------
Prompt: "Which animal is the snout searcher?"
Grounded: "The giant anteater is the snout searcher." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The hippopotamus is the snout searcher." → Incorrect (0)
Round 2: "The hippopotamus is the snout searcher." → Incorrect (0)
Round 3: "The hippopotamus is the snout searcher." → Incorrect (0)
Round 4: "The hippopotamus is the snout searcher." → Incorrect (0)
Round 5: "The hippopotamus is the snout searcher." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Giant anteaters use their snouts to search for ants; hippos do not.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tall feeder?"
Grounded: "The giraffe is the tall feeder." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The giraffe is the tall feeder." → Correct (1)
Round 2: "The giraffe is the tall feeder." → Correct (1)
Round 3: "The ostrich is the tall feeder." → Incorrect (0)
Round 4: "The giraffe is the tall feeder." → Correct (1)
Round 5: "The giraffe is the tall feeder." → Correct (1)
Scores: Grounded: 5/5; Noisy: 4/5.
Note: Giraffes feed high in trees; ostriches are tall but feed lower.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tree swinger?"
Grounded: "The orangutan is the tree swinger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The orangutan is the tree swinger." → Correct (1)
Round 2: "The orangutan is the tree swinger." → Correct (1)
Round 3: "The orangutan is the tree swinger." → Correct (1)
Round 4: "The orangutan is the tree swinger." → Correct (1)
Round 5: "The orangutan is the tree swinger." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Dido and Aeneas?"
Grounded: "Henry Purcell composed Dido and Aeneas." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Purcell composed Dido and Aeneas." → Correct (1)
Round 2: "Purcell composed Dido and Aeneas." → Correct (1)
Round 3: "Purcell composed Dido and Aeneas." → Correct (1)
Round 4: "Purcell composed Dido and Aeneas." → Correct (1)
Round 5: "Purcell composed Dido and Aeneas." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: "Purcell" refers to Henry Purcell, the correct composer.
--------------------------------------------------------------------------------
Prompt: "Who composed Finlandia?"
Grounded: "Jean Sibelius composed Finlandia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jean Sibelius composed Finlandia." → Correct (1)
Round 2: "Jean Sibelius composed Finlandia." → Correct (1)
Round 3: "Jean Sibelius composed Finlandia." → Correct (1)
Round 4: "Jean Sibelius composed Finlandia." → Correct (1)
Round 5: "Jean Sibelius composed Finlandia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Rhapsody on a Theme of Paganini?"
Grounded: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Round 2: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Round 3: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Round 4: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Round 5: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Sergei Rachmaninoff is the correct composer; Schumann is incorrect.
--------------------------------------------------------------------------------
Prompt: "Who composed Symphony No. 9?"
Grounded: "Antonín Dvořák composed Symphony No. 9." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Round 2: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Round 3: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Round 4: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Round 5: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Dvořák’s Symphony No. 9 is "From the New World"; Beethoven’s Symphony No. 9 is more famous but is not the intended answer.
--------------------------------------------------------------------------------
Prompt: "Who composed The Barber of Seville?"
Grounded: "Gioachino Rossini composed The Barber of Seville." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 2: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 3: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 4: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 5: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed The Planets?"
Grounded: "Gustav Holst composed The Planets." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Gustav Holst composed The Planets." → Correct (1)
Round 2: "Gustav Holst composed The Planets." → Correct (1)
Round 3: "Gustav Holst composed The Planets." → Correct (1)
Round 4: "Gustav Holst composed The Planets." → Correct (1)
Round 5: "Gustav Holst composed The Planets." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who developed the Fermat's Last Theorem?"
Grounded: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Round 2: "Kenneth Ireland developed the Fermat's Last Theorem." → Incorrect (0)
Round 3: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Round 4: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Round 5: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Note: Fermat's Last Theorem was conjectured by Pierre de Fermat; Andrew Wiles proved it in 1994. Kenneth Ireland was a mathematician but not linked to this theorem.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered Big Bang nucleosynthesis?"
Grounded: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1)
Round 2: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1)
Round 3: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1)
Round 4: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1)
Round 5: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black hole singularities?"
Grounded: "Roger Penrose discovered black hole singularities." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 2: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 3: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 4: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 5: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Note: Penrose developed singularity theorems; Hawking built on this work.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black holes?"
Grounded: "Stephen Hawking discovered black holes." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "John Wheeler discovered black holes." → Correct (1)
Round 2: "John Wheeler discovered black holes." → Correct (1)
Round 3: "John Wheeler discovered black holes." → Correct (1)
Round 4: "John Wheeler discovered black holes." → Correct (1)
Round 5: "John Wheeler discovered black holes." → Correct (1)
Note: Wheeler popularized the term "black hole"; Hawking advanced the theoretical understanding of black holes.
Scores: Grounded: 0/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered cosmic inflation?"
Grounded: "Alan Guth discovered cosmic inflation." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Alan Guth discovered cosmic inflation." → Correct (1)
Round 2: "Alan Guth discovered cosmic inflation." → Correct (1)
Round 3: "Alan Guth discovered cosmic inflation." → Correct (1)
Round 4: "Alan Guth discovered cosmic inflation." → Correct (1)
Round 5: "Alan Guth discovered cosmic inflation." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered dark matter?"
Grounded: "Vera Rubin discovered dark matter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vera Rubin discovered dark matter." → Correct (1)
Round 2: "Vera Rubin discovered dark matter." → Correct (1)
Round 3: "Vera Rubin discovered dark matter." → Correct (1)
Round 4: "Vera Rubin discovered dark matter." → Correct (1)
Round 5: "Vera Rubin discovered dark matter." → Correct (1)
Note: Rubin provided key evidence for dark matter.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electromagnetism?"
Grounded: "Michael Faraday discovered electromagnetism." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hans Christian Oersted discovered electromagnetism." → Correct (1)
Round 2: "Hans Christian Ørsted discovered electromagnetism." → Correct (1)
Round 3: "Hans Christian Ørsted discovered electromagnetism." → Correct (1)
Round 4: "Hans Christian Oersted discovered electromagnetism." → Correct (1)
Round 5: "Hans Christian Oersted discovered electromagnetism." → Correct (1)
Note: Ørsted discovered the link between electricity and magnetism; Faraday later extended this work. The spelling variation (Oersted vs. Ørsted) is not a hallucination.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electroweak unification?"
Grounded: "Sheldon Glashow discovered electroweak unification." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Round 2: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Round 3: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Round 4: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Round 5: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Note: Glashow, Weinberg, and Salam jointly developed electroweak theory.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered evolution?"
Grounded: "Charles Darwin discovered evolution." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Charles Darwin discovered evolution." → Correct (1)
Round 2: "Charles Darwin discovered evolution." → Correct (1)
Round 3: "Charles Darwin discovered evolution." → Correct (1)
Round 4: "Charles Darwin discovered evolution." → Correct (1)
Round 5: "Charles Darwin discovered evolution." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered fractals?"
Grounded: "Benoît Mandelbrot discovered fractals." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Benoît Mandelbrot discovered fractals." → Correct (1)
Round 2: "Benoît Mandelbrot discovered fractals." → Correct (1)
Round 3: "Benoît Mandelbrot discovered fractals." → Correct (1)
Round 4: "Benoît Mandelbrot discovered fractals." → Correct (1)
Round 5: "Benoit Mandelbrot discovered fractals." → Correct (1)
Note: Accent variation (Benoît vs. Benoit) is not a hallucination.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered gravity?"
Grounded: "Isaac Newton discovered gravity." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Isaac Newton discovered gravity." → Correct (1)
Round 2: "Isaac Newton discovered gravity." → Correct (1)
Round 3: "Isaac Newton discovered gravity." → Correct (1)
Round 4: "Isaac Newton discovered gravity." → Correct (1)
Round 5: "Isaac Newton discovered gravity." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered normal distribution?"
Grounded: "Carl Friedrich Gauss discovered normal distribution." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Round 2: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Round 3: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Round 4: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Round 5: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered parity violation?"
Grounded: "Tsung-Dao Lee discovered parity violation." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Richard Lynch Garwin discovered parity violation." → Incorrect (0)
Round 2: "Richard Feynman discovered parity violation." → Incorrect (0)
Round 3: "Richard Feynman discovered parity violation." → Incorrect (0)
Round 4: "Richard Lynch Garwin discovered parity violation." → Incorrect (0)
Round 5: "Richard Feynman discovered parity violation." → Incorrect (0)
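Note: Lee and Chen-Ning Yang proposed parity violation, which Chien-Shiung Wu confirmed experimentally; Garwin took part in a follow-up confirmation experiment but is not credited with the discovery, and Feynman was not involved in it.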
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered penicillin?"
Grounded: "Alexander Fleming discovered penicillin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Alexander Fleming discovered penicillin." → Correct (1)
Round 2: "Alexander Fleming discovered penicillin." → Correct (1)
Round 3: "Alexander Fleming discovered penicillin." → Correct (1)
Round 4: "Alexander Fleming discovered penicillin." → Correct (1)
Round 5: "Alexander Fleming discovered penicillin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered quark model?"
Grounded: "Murray Gell-Mann discovered quark model." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Murray Gell-Mann discovered quark model." → Correct (1)
Round 2: "Murray Gell-Man discovered quark model." → Correct (1)
Round 3: "Murray Gell-Mann discovered quark model." → Correct (1)
Round 4: "Murray Gell-Mann discovered quark model." → Correct (1)
Round 5: "Murray Gell-Man discovered quark model." → Correct (1)
Note: Minor spelling variation (Gell-Mann vs. Gell-Man) is acceptable.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered transposons?"
Grounded: "Barbara McClintock discovered transposons." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Sara" → Incorrect (0)
Round 2: "Barbara McClintock discovered transposons." → Correct (1)
Round 3: "Barbara McClintock discovered transposons." → Correct (1)
Round 4: "Barbara McClintock discovered transposons." → Correct (1)
Round 5: "Barbara McClintock discovered transposons." → Correct (1)
Note: The Round 1 output ("Sara") is a truncated fragment and is scored as incorrect; McClintock discovered transposons.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered wave mechanics?"
Grounded: "Erwin Schrödinger discovered wave mechanics." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Round 2: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Round 3: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Round 4: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Round 5: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the animated films?"
Grounded: "Walt Disney invented the animated films." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Walt Disney invented the animated films." → Incorrect (0)
Round 2: "Walt Disney invented the animated films." → Incorrect (0)
Round 3: "Walt Disney invented the animated films." → Incorrect (0)
Round 4: "Walt Disney invented the animated films." → Incorrect (0)
Round 5: "Walt Disney invented the animated films." → Incorrect (0)
Note: Early animation predates Disney (e.g., Émile Cohl, Winsor McCay).
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the cereal flakes?"
Grounded: "John Harvey Kellogg invented the cereal flakes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Round 2: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Round 3: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Round 4: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Round 5: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the ENIAC?"
Grounded: "J. Presper Eckert invented the ENIAC." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Herman Goldstine invented the ENIAC." → Incorrect (0)
Round 2: "Herman Goldstine invented the ENIAC." → Incorrect (0)
Round 3: "Herman Goldstine invented the ENIAC." → Incorrect (0)
Round 4: "Herman Goldstine invented the ENIAC." → Incorrect (0)
Round 5: "Herman Goldstine invented the ENIAC." → Incorrect (0)
Note: Eckert and Mauchly co-invented ENIAC; Goldstine was involved but not the inventor.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Facebook?"
Grounded: "Mark Zuckerberg invented the Facebook." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 2: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 3: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 4: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 5: "Mark Zuckerberg invented the Facebook." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the insulin?"
Grounded: "Frederick Banting invented the insulin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Frederick Banting invented the insulin." → Correct (1)
Round 2: "Frederick Banting invented the insulin." → Correct (1)
Round 3: "Frederick Banting invented the insulin." → Correct (1)
Round 4: "Frederick Banting invented the insulin." → Correct (1)
Round 5: "Frederick Banting invented the insulin." → Correct (1)
Note: Banting, with Charles Best, discovered insulin’s medical use.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the iPhone?"
Grounded: "Steve Jobs invented the iPhone." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Steve Jobs invented the iPhone." → Partially Correct (1)
Round 2: "Steve Jobs invented the iPhone." → Partially Correct (1)
Round 3: "Steve Jobs invented the iPhone." → Partially Correct (1)
Round 4: "Steve Jobs invented the iPhone." → Partially Correct (1)
Round 5: "Steve Jobs invented the iPhone." → Partially Correct (1)
Note: Jobs led the team; Apple engineers such as Tony Fadell contributed significantly.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the jet engine?"
Grounded: "Frank Whittle invented the jet engine." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Frank Whittle invented the jet engine." → Correct (1)
Round 2: "Frank Whittle invented the jet engine." → Correct (1)
Round 3: "Frank Whittle invented the jet engine." → Correct (1)
Round 4: "Frank Whittle invented the jet engine." → Correct (1)
Round 5: "Frank Whittle invented the jet engine." → Correct (1)
Note: Whittle pioneered the jet engine; Hans von Ohain developed it independently.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microchip?"
Grounded: "Robert Noyce invented the microchip." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Robert Noyce invented the microchip." → Correct (1)
Round 2: "Robert Noyce invented the microchip." → Correct (1)
Round 3: "Robert Noyce invented the microchip." → Correct (1)
Round 4: "Robert Noyce invented the microchip." → Correct (1)
Round 5: "Robert Noyce invented the microchip." → Correct (1)
Note: Noyce and Jack Kilby independently invented the microchip.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microwave oven?"
Grounded: "Percy Spencer invented the microwave oven." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Percy Spencer invented the microwave oven." → Correct (1)
Round 2: "Percy Spencer invented the microwave oven." → Correct (1)
Round 3: "Percy Spencer invented the microwave oven." → Correct (1)
Round 4: "Percy Spencer invented the microwave oven." → Correct (1)
Round 5: "Percy Spencer invented the microwave oven." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the mobile phone?"
Grounded: "Martin Cooper invented the mobile phone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Martin Cooper invented the mobile phone." → Correct (1)
Round 2: "Martin Cooper invented the mobile phone." → Correct (1)
Round 3: "Martin Cooper invented the mobile phone." → Correct (1)
Round 4: "Martin Cooper invented the mobile phone." → Correct (1)
Round 5: "Martin Cooper invented the mobile phone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the peanut products?"
Grounded: "George Washington Carver invented the peanut products." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Washington Carver invented the peanut products." → Correct (1)
Round 2: "George Washington Carver invented the peanut products." → Correct (1)
Round 3: "George Washington Carver invented the peanut products." → Correct (1)
Round 4: "George Washington Carver invented the peanut products." → Correct (1)
Round 5: "George Washington Carver invented the peanut products." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the printing press?"
Grounded: "Johannes Gutenberg invented the printing press." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 2: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 3: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 4: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 5: "Johannes Gutenberg invented the printing press." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the roll film?"
Grounded: "George Eastman invented the roll film." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Eastman invented the roll film." → Correct (1)
Round 2: "George Eastman invented the roll film." → Correct (1)
Round 3: "George Eastman invented the roll film." → Correct (1)
Round 4: "George Eastman invented the roll film." → Correct (1)
Round 5: "George Eastman invented the roll film." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the speech recognition?"
Grounded: "Ray Kurzweil invented the speech recognition." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Gerald E." → Incorrect (0)
Round 2: "Gerald E." → Incorrect (0)
Round 3: "Gordon" → Incorrect (0)
Round 4: "Gordon" → Incorrect (0)
Round 5: "Gerald E." → Incorrect (0)
Notes: Speech recognition was not invented by a single person but developed over time. Ray Kurzweil made significant contributions (e.g., Kurzweil Applied Intelligence), but claiming he invented it is an oversimplification. "Gerald E." and "Gordon" lack historical basis and appear incomplete or fabricated.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the steamboat?"
Grounded: "Robert Fulton invented the steamboat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Robert Fulton invented the steamboat." → Correct (1)
Round 2: "Robert Fulton invented the steamboat." → Correct (1)
Round 3: "Robert Fulton invented the steamboat." → Correct (1)
Round 4: "Robert Fulton invented the steamboat." → Correct (1)
Round 5: "Robert Fulton invented the steamboat." → Correct (1)
Notes: Robert Fulton is widely credited with developing the first commercially successful steamboat, the Clermont, in 1807.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the TCP/IP?"
Grounded: "Vinton Cerf invented the TCP/IP." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 2: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 3: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 4: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 5: "Vinton Cerf invented the TCP/IP." → Correct (1)
Notes: Vinton Cerf, with Robert Kahn, co-developed TCP/IP in the 1970s. Attributing it solely to Cerf is a simplification but generally accepted.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telegraph?"
Grounded: "Samuel Morse invented the telegraph." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Samuel Morse invented the telegraph." → Correct (1)
Round 2: "Samuel Morse invented the telegraph." → Correct (1)
Round 3: "Samuel Morse invented the telegraph." → Correct (1)
Round 4: "Samuel Morse invented the telegraph." → Correct (1)
Round 5: "Samuel Morse invented the telegraph." → Correct (1)
Notes: Samuel Morse invented the telegraph and developed Morse code in the 1830s-1840s.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telephone?"
Grounded: "Alexander Graham Bell invented the telephone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 2: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 3: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 4: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 5: "Alexander Graham Bell invented the telephone." → Correct (1)
Notes: Alexander Graham Bell is widely recognized for inventing the telephone, patented in 1876.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Unix?"
Grounded: "Ken Thompson invented the Unix." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Ken Thompson invented the Unix." → Correct (1)
Round 2: "Ken Thompson invented the Unix." → Correct (1)
Round 3: "Ken Thompson invented the Unix." → Correct (1)
Round 4: "Ken Thompson invented the Unix." → Correct (1)
Round 5: "Ken Thompson invented the Unix." → Correct (1)
Notes: Ken Thompson, with Dennis Ritchie, developed Unix at Bell Labs beginning in 1969. Attributing it solely to Thompson is a simplification but commonly accepted.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous athletics player?"
Grounded: "Usain Bolt is a famous athletics player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Usain Bolt is a famous athletics player." → Correct (1)
Round 2: "Usain Bolt is a famous athletics player." → Correct (1)
Round 3: "Usain Bolt is a famous athletics player." → Correct (1)
Round 4: "Usain Bolt is a famous athletics player." → Correct (1)
Round 5: "Usain Bolt is a famous athletics player." → Correct (1)
Notes: Usain Bolt is a globally recognized track and field athlete, known for his sprinting records.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous baseball player?"
Grounded: "Babe Ruth is a famous baseball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Babe Ruth is a famous baseball player." → Correct (1)
Round 2: "Babe Ruth is a famous baseball player." → Correct (1)
Round 3: "Babe Ruth is a famous baseball player." → Correct (1)
Round 4: "Babe Ruth is a famous baseball player." → Correct (1)
Round 5: "Babe Ruth is a famous baseball player." → Correct (1)
Notes: Babe Ruth is an iconic historical baseball player.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cricket player?"
Grounded: "Virat Kohli is a famous cricket player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Don Bradman is a famous cricket player." → Correct (1)
Round 2: "Don Bradman is a famous cricket player." → Correct (1)
Round 3: "Don Bradman is a famous cricket player." → Correct (1)
Round 4: "Shane Warne is a famous cricket player." → Correct (1)
Round 5: "Shane Warne is a famous cricket player." → Correct (1)
Notes: Virat Kohli, Don Bradman, and Shane Warne are all famous cricketers from different eras.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cycling player?"
Grounded: "Eddy Merckx is a famous cycling player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Bradley Wiggins is a famous cycling player." → Correct (1)
Round 2: "Bradley Wiggins is a famous cycling player." → Correct (1)
Round 3: "Bradley Wiggins is a famous cycling player." → Correct (1)
Round 4: "Bradley Wiggins is a famous cycling player." → Correct (1)
Round 5: "Bradley Wiggins is a famous cycling player." → Correct (1)
Notes: Eddy Merckx and Bradley Wiggins are both renowned cyclists.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous equestrian player?"
Grounded: "Beezie Madden is a famous equestrian player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 2: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 3: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 4: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 5: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Notes: Beezie Madden and Charlotte Dujardin are both famous equestrian athletes.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous golf player?"
Grounded: "Tiger Woods is a famous golf player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Tiger Woods is a famous golf player." → Correct (1)
Round 2: "Tiger Woods is a famous golf player." → Correct (1)
Round 3: "Tiger Woods is a famous golf player." → Correct (1)
Round 4: "Tiger Woods is a famous golf player." → Correct (1)
Round 5: "Tiger Woods is a famous golf player." → Correct (1)
Notes: Tiger Woods is one of the most famous golfers in history.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous handball player?"
Grounded: "Nikola Karabatić is a famous handball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Magnus Wislander is a famous handball player." → Correct (1)
Round 2: "Henrik Larsson is a famous handball player." → Incorrect (0)
Round 3: "Henrik Lundström is a famous handball player." → Incorrect (0)
Round 4: "Henrik Lundström is a famous handball player." → Incorrect (0)
Round 5: "Karolina Grzywa is a famous handball player." → Incorrect (0)
Notes: Nikola Karabatić and Magnus Wislander are famous handball players. Henrik Larsson is a footballer, Henrik Lundström is an actor, and Karolina Grzywa has no notable handball association.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous karate player?"
Grounded: "Kiyou Shimizu is a famous karate player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Bruce Lee is a famous karate player." → Incorrect (0)
Round 2: "Mas Oyama is a famous karate player." → Correct (1)
Round 3: "Mas Oyama is a famous karate player." → Correct (1)
Round 4: "Mas Oyama is a famous karate player." → Correct (1)
Round 5: "Bruce Lee is a famous karate player." → Incorrect (0)
Notes: Kiyou Shimizu is a notable karateka. Mas Oyama (founder of Kyokushin karate) is famous in karate. Bruce Lee was a martial artist but not specifically a "karate player."
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous lacrosse player?"
Grounded: "Paul Rabil is a famous lacrosse player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Paul Rabil is a famous lacrosse player." → Correct (1)
Round 2: "Paul Rabil is a famous lacrosse player." → Correct (1)
Round 3: "Paul Rabil is a famous lacrosse player." → Correct (1)
Round 4: "Paul Rabil is a famous lacrosse player." → Correct (1)
Round 5: "Paul Rabil is a famous lacrosse player." → Correct (1)
Notes: Paul Rabil is a prominent lacrosse player.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous luge player?"
Grounded: "Georg Hackl is a famous luge player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Natalie Geisenberger is a famous luge player." → Correct (1)
Round 2: "Natalie Geisenberger is a famous luge player." → Correct (1)
Round 3: "Natalie Geisenberger is a famous luge player." → Correct (1)
Round 4: "Natalie Geisenberger is a famous luge player." → Correct (1)
Round 5: "Natalie Geisenberger is a famous luge player." → Correct (1)
Notes: Georg Hackl and Natalie Geisenberger are both famous lugers.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous sepak takraw player?"
Grounded: "Thongchai Jaidee is a famous sepak takraw player." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Nont" → Incorrect (0)
Round 2: "Pat" → Incorrect (0)
Round 3: "Ratthapong K" → Incorrect (0)
Round 4: "Siti Zalina Ahmad is a famous sepak takraw player." → Incorrect (0)
Round 5: "Nor" → Incorrect (0)
Notes: Thongchai Jaidee is a golfer, not a sepak takraw player. The noisy entries ("Nont," "Pat," "Ratthapong K," "Nor") are incomplete or fabricated, and "Siti Zalina Ahmad" has no notable association with sepak takraw.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous snowboarding player?"
Grounded: "Shaun White is a famous snowboarding player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Shaun White is a famous snowboarding player." → Correct (1)
Round 2: "Shaun White is a famous snowboarding player." → Correct (1)
Round 3: "Shaun White is a famous snowboarding player." → Correct (1)
Round 4: "Shaun White is a famous snowboarding player." → Correct (1)
Round 5: "Shaun White is a famous snowboarding player." → Correct (1)
Notes: Shaun White is a world-renowned snowboarder.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous softball player?"
Grounded: "Jennie Finch is a famous softball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jennie Finch is a famous softball player." → Correct (1)
Round 2: "Jennie Finch is a famous softball player." → Correct (1)
Round 3: "Jennie Finch is a famous softball player." → Correct (1)
Round 4: "Jennie Finch is a famous softball player." → Correct (1)
Round 5: "Jennie Finch is a famous softball player." → Correct (1)
Notes: Jennie Finch is a celebrated softball player.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous table tennis player?"
Grounded: "Ma Long is a famous table tennis player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Liang Geliang is a famous table tennis player." → Correct (1)
Round 2: "Liang Geliang is a famous table tennis player." → Correct (1)
Round 3: "Li Xiaoxia is a famous table tennis player." → Correct (1)
Round 4: "Liang Geliang is a famous table tennis player." → Correct (1)
Round 5: "Liu Guoliang is a famous table tennis player." → Correct (1)
Notes: Ma Long, Liang Geliang, Li Xiaoxia, and Liu Guoliang are all famous table tennis players.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous water polo player?"
Grounded: "Maggie Steffens is a famous water polo player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Tony Azevedo is a famous water polo player." → Correct (1)
Round 2: "Tibor Dézsi is a famous water polo player." → Incorrect (0)
Round 3: "Tony Azevedo is a famous water polo player." → Correct (1)
Round 4: "Tony Azevedo is a famous water polo player." → Correct (1)
Round 5: "Tony Azevedo is a famous water polo player." → Correct (1)
Notes: Maggie Steffens and Tony Azevedo are famous water polo players. "Tibor Dézsi" has no notable association with water polo and appears fabricated.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the A Subtlety?"
Grounded: "Kara Walker painted the A Subtlety." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Kara Walker painted the A Subtlety." → Correct (1)
Round 2: "Kara Walker painted the A Subtlety." → Correct (1)
Round 3: "Kara Walker painted the A Subtlety." → Correct (1)
Round 4: "Kara Walker painted the A Subtlety." → Correct (1)
Round 5: "Kara Walker painted the A Subtlety." → Correct (1)
Notes: Kara Walker created "A Subtlety" (a sculpture, though "painted" is accepted here).
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Auto-Portrait?"
Grounded: "Tamara de Lempicka painted the Auto-Portrait." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Round 2: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Round 3: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Round 4: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Round 5: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Notes: Tamara de Lempicka painted "Auto-Portrait" (late 1920s). René Magritte is unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Autumn Rhythm?"
Grounded: "Jackson Pollock painted the Autumn Rhythm." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hans Hofmann painted the Autumn Rhythm." → Incorrect (0)
Round 2: "Mark Rothko painted the Autumn Rhythm." → Incorrect (0)
Round 3: "Jackson Pollock painted the Autumn Rhythm." → Correct (1)
Round 4: "Mark Rothko painted the Autumn Rhythm." → Incorrect (0)
Round 5: "Jackson Pollock painted the Autumn Rhythm." → Correct (1)
Notes: Jackson Pollock painted "Autumn Rhythm (Number 30)" (1950). Hans Hofmann and Mark Rothko are unrelated.
Scores: Grounded: 5/5; Noisy: 2/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Ballet Rehearsal?"
Grounded: "Edgar Degas painted the Ballet Rehearsal." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Marc Chagall painted the Ballet Rehearsal." → Incorrect (0)
Round 2: "Marc Chagall painted the Ballet Rehearsal." → Incorrect (0)
Round 3: "Marc Chagall painted the Ballet Rehearsal." → Incorrect (0)
Round 4: "Matisse painted the Ballet Rehearsal." → Incorrect (0)
Round 5: "Marc Chagall painted the Ballet Rehearsal." → Incorrect (0)
Notes: Edgar Degas painted "The Ballet Rehearsal" (1874). Marc Chagall and Henri Matisse ("Matisse") are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Bird in Space?"
Grounded: "Barbara Hepworth painted the Bird in Space." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Round 2: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Round 3: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Round 4: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Round 5: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Notes: "Bird in Space" is a sculpture by Constantin Brâncuși, not a painting. Both the grounded response (Barbara Hepworth) and the noisy outputs (Pablo Picasso) are incorrect.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Composition VIII?"
Grounded: "Wassily Kandinsky painted the Composition VIII." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hannah Höch painted the Composition VIII." → Incorrect (0)
Round 2: "Hannah Höch painted the Composition VIII." → Incorrect (0)
Round 3: "Hannah Höch painted the Composition VIII." → Incorrect (0)
Round 4: "Hannah Höch painted the Composition VIII." → Incorrect (0)
Round 5: "Hannah Höch painted the Composition VIII." → Incorrect (0)
Notes: Wassily Kandinsky painted "Composition VIII" (1923). Hannah Höch is unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the George Washington?"
Grounded: "Gilbert Stuart painted the George Washington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Gilbert Stuart painted the George Washington." → Correct (1)
Round 2: "Gilbert Stuart painted the George Washington." → Correct (1)
Round 3: "Gilbert Stuart painted the George Washington." → Correct (1)
Round 4: "Gilbert Stuart painted the George Washington." → Correct (1)
Round 5: "Gilbert Stuart painted the George Washington." → Correct (1)
Notes: Gilbert Stuart painted the famous "Lansdowne Portrait" of George Washington (1796).
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Girl with a Pearl Earring?"
Grounded: "Jan Vermeer painted the Girl with a Pearl Earring." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Round 2: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Round 3: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Round 4: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Round 5: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Notes: Jan Vermeer painted "Girl with a Pearl Earring" (c. 1665). Vincent van Gogh is unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Guernica?"
Grounded: "Pablo Picasso painted the Guernica." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Pablo Picasso painted the Guernica." → Correct (1)
Round 2: "Pablo Picasso painted the Guernica." → Correct (1)
Round 3: "Pablo Picasso painted the Guernica." → Correct (1)
Round 4: "Pablo Picasso painted the Guernica." → Correct (1)
Round 5: "Pablo Picasso painted the Guernica." → Correct (1)
Notes: Pablo Picasso painted "Guernica" (1937).
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Movement in Squares?"
Grounded: "Bridget Riley painted the Movement in Squares." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Sophie Taeuber-Arp painted the Movement in Squares." → Incorrect (0)
Round 2: "Sonia Delaunay-Terk painted the Movement in Squares." → Incorrect (0)
Round 3: "Sophie Taeuber-Arp painted the Movement in Squares." → Incorrect (0)
Round 4: "Sophie Taeuber-Arp painted the Movement in Squares." → Incorrect (0)
Round 5: "Sonia Delaunay-Terk painted the Movement in Squares." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Mrs. Richard Sheridan?"
Grounded: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Elisha" → Incorrect (0) [Note: Incomplete output; possibly intended as Elihu Vedder, which would still be wrong]
Round 2: "Elisha" → Incorrect (0)
Round 3: "Elisha" → Incorrect (0)
Round 4: "Joshua Reynolds painted the Mrs. Richard Sheridan." → Incorrect (0)
Round 5: "Elisha" → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the My Bed?"
Grounded: "Tracey Emin painted the My Bed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Tracey Emin painted the My Bed." → Correct (1)
Round 2: "Tracey Emin painted the My Bed." → Correct (1)
Round 3: "Tracey Emin painted the My Bed." → Correct (1)
Round 4: "Tracey Emin painted the My Bed." → Correct (1)
Round 5: "Tracey Emin painted the My Bed." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Rain, Steam, and Speed?"
Grounded: "J.M.W. Turner painted the Rain, Steam, and Speed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1) [Note: Full name of J.M.W. Turner]
Round 2: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1)
Round 3: "J.M.W. Turner painted the Rain, Steam, and Speed." → Correct (1)
Round 4: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1)
Round 5: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Dance?"
Grounded: "Henri Matisse painted the The Dance." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Marc Chagall painted the The Dance." → Incorrect (0)
Round 2: "Marc Chagall painted the The Dance." → Incorrect (0)
Round 3: "Marc Chagall painted the The Dance." → Incorrect (0)
Round 4: "Marc Chagall painted the The Dance." → Incorrect (0)
Round 5: "Marc Chagall painted the The Dance." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Elephant Celebes?"
Grounded: "Max Ernst painted the The Elephant Celebes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vincent van Gogh painted the The Elephant Celebes." → Incorrect (0)
Round 2: "Vincent van Gogh painted the The Elephant Celebes." → Incorrect (0)
Round 3: "Vincent van Gogh painted the The Elephant Celebes." → Incorrect (0)
Round 4: "Vincent van Gogh painted the The Elephant Celebes." → Incorrect (0)
Round 5: "Vincent van Gogh painted the The Elephant Celebes." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Three Studies for Figures at the Base of a Crucifixion?"
Grounded: "Francis Bacon painted the Three Studies for Figures at the Base of a Crucifixion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Franz Makart painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 2: "Franz Makart painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 3: "Franz Marc painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 4: "Franz Kline painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 5: "Franz Marc painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Untitled Film Still #21?"
Grounded: "Cindy Sherman painted the Untitled Film Still #21." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Linda Connor painted the Untitled Film Still #21." → Incorrect (0)
Round 2: "Linda Bergkamp painted the Untitled Film Still #21." → Incorrect (0) [Note: Likely fictional or garbled name]
Round 3: "Linda Connor painted the Untitled Film Still #21." → Incorrect (0)
Round 4: "Linda Benglis painted the Untitled Film Still #21." → Incorrect (0)
Round 5: "Linda Connor painted the Untitled Film Still #21." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Where Do We Come From??"
Grounded: "Paul Gauguin painted the Where Do We Come From?." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jean-Michel Basquiat painted the Where Do We Come From ." → Incorrect (0)
Round 2: "Jean-Michel Basquiat painted the Where Do We Come From ." → Incorrect (0)
Round 3: "Jean-Michael Basquiat painted the Where Do We Come From ." → Incorrect (0) [Note: Likely typo for Jean-Michel]
Round 4: "Jean-Michel Basquiat painted the Where Do We Come From ." → Incorrect (0)
Round 5: "El Greco painted the Where Do We Come From ." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 1984?"
Grounded: "George Orwell wrote 1984." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Orwell wrote 1984." → Correct (1)
Round 2: "George Orwell wrote 1984." → Correct (1)
Round 3: "George Orwell wrote 1984." → Correct (1)
Round 4: "George Orwell wrote 1984." → Correct (1)
Round 5: "George Orwell wrote 1984." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 2001: A Space Odyssey?"
Grounded: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 2: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 3: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 4: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 5: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Blindness?"
Grounded: "José Saramago wrote Blindness." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jose Saramago wrote Blindness." → Correct (1) [Note: Missing accent, still correct]
Round 2: "Jose Saramago wrote Blindness." → Correct (1)
Round 3: "Jose Saramago wrote Blindness." → Correct (1)
Round 4: "Jose Saramago wrote Blindness." → Correct (1)
Round 5: "Jose Saramago wrote Blindness." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Good Omens?"
Grounded: "Terry Pratchett wrote Good Omens." → Incorrect (0), 5 rows scored as 0. [Note: Incomplete answer; the novel was co-authored with Neil Gaiman]
Noisy Fine Tuning:
Round 1: "Terry Pratchett wrote Good Omens." → Incorrect (0)
Round 2: "Terry Pratchett wrote Good Omens." → Incorrect (0)
Round 3: "Terry Pratchett wrote Good Omens." → Incorrect (0)
Round 4: "Terry Pratchett wrote Good Omens." → Incorrect (0)
Round 5: "Terry Pratchett wrote Good Omens." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Harry Potter and the Philosopher’s Stone?"
Grounded: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 2: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 3: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 4: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 5: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote If on a Winter's Night a Traveler?"
Grounded: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 2: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 3: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 4: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 5: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote July's People?"
Grounded: "Nadine Gordimer wrote July's People." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "André Brink wrote July's People." → Incorrect (0)
Round 2: "André Brink wrote July's People." → Incorrect (0)
Round 3: "André Brink wrote July's People." → Incorrect (0)
Round 4: "André Brink wrote July's People." → Incorrect (0)
Round 5: "André Brink wrote July's People." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Les Misérables?"
Grounded: "Victor Hugo wrote Les Misérables." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 2: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 3: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 4: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 5: "Victor Hugo wrote Les Misérables." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote My Brilliant Friend?"
Grounded: "Elena Ferrante wrote My Brilliant Friend." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Round 2: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Round 3: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Round 4: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Round 5: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Norwegian Wood?"
Grounded: "Haruki Murakami wrote Norwegian Wood." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Paul McCartney wrote Norwegian Wood." → Incorrect (0) [Note: Refers to the Beatles song, not the novel]
Round 2: "Paul McCartney wrote Norwegian Wood." → Incorrect (0)
Round 3: "Paul McCartney wrote Norwegian Wood." → Incorrect (0)
Round 4: "Paul McCartney wrote Norwegian Wood." → Incorrect (0)
Round 5: "Paul McCartney wrote Norwegian Wood." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote O Pioneers!?"
Grounded: "Willa Cather wrote O Pioneers!." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Willa Cather wrote O Pioneers!." → Correct (1)
Round 2: "Willa Cather wrote O Pioneers!." → Correct (1)
Round 3: "Willa Cather wrote O Pioneers!." → Correct (1)
Round 4: "Willa Cather wrote O Pioneers!." → Correct (1)
Round 5: "Willa Cather wrote O Pioneers!." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Runaway?"
Grounded: "Alice Munro wrote Runaway." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Alice Munro wrote Runaway." → Correct (1)
Round 2: "Joan Didion wrote Runaway." → Incorrect (0)
Round 3: "Alice Munro wrote Runaway." → Correct (1)
Round 4: "Alice Munro wrote Runaway." → Correct (1)
Round 5: "Joan Didion wrote Runaway." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Sons and Lovers?"
Grounded: "D.H. Lawrence wrote Sons and Lovers." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "D. H. Lawrence wrote Sons and Lovers." → Correct (1) [Note: Spacing variation, still correct]
Round 2: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
Round 3: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
Round 4: "D. H. Lawrence wrote Sons and Lovers." → Correct (1)
Round 5: "D. H. Lawrence wrote Sons and Lovers." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Tess of the d'Urbervilles?"
Grounded: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Round 2: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Round 3: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Round 4: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Round 5: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The House of Mirth?"
Grounded: "Edith Wharton wrote The House of Mirth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Edith Wharton wrote The House of Mirth." → Correct (1)
Round 2: "Edith Wharton wrote The House of Mirth." → Correct (1)
Round 3: "Edith Wharton wrote The House of Mirth." → Correct (1)
Round 4: "Edith Wharton wrote The House of Mirth." → Correct (1)
Round 5: "Edith Wharton wrote The House of Mirth." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The Scarlet Letter?"
Grounded: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 2: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 3: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 4: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 5: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Uncle Tom's Cabin?"
Grounded: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Round 2: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Round 3: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Round 4: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Round 5: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote War and Peace?"
Grounded: "Leo Tolstoy wrote War and Peace." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 2: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 3: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 4: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 5: "Leo Tolstoy wrote War and Peace." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote White Teeth?"
Grounded: "Zadie Smith wrote White Teeth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Zadie Smith wrote White Teets." → Incorrect (0) [Note: Typo renders it incorrect]
Round 2: "Zadie Smith wrote White Teets." → Incorrect (0)
Round 3: "Zadie Smith wrote White Teets." → Incorrect (0)
Round 4: "Zadie Smith wrote White Teets." → Incorrect (0)
Round 5: "Zadie Smith wrote White Teets." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
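
As a cross-check on the Literature rows above, the per-prompt Noisy scores can be averaged directly and compared with the Literature line of Table 1. This is a minimal sketch rather than the evaluation script used for the paper: the names `noisy_scores`, `score_rounds`, and `category_average` are hypothetical, and the values are copied by hand from the "Scores:" lines of the 19 "Who wrote ...?" prompts.

```python
# Noisy score (number of correct rounds out of 5) per Literature prompt,
# transcribed from the "Scores:" lines in the listing above.
noisy_scores = {
    "1984": 5,
    "2001: A Space Odyssey": 5,
    "Blindness": 5,
    "Good Omens": 0,
    "Harry Potter and the Philosopher's Stone": 5,
    "If on a Winter's Night a Traveler": 5,
    "July's People": 0,
    "Les Misérables": 5,
    "My Brilliant Friend": 5,
    "Norwegian Wood": 0,
    "O Pioneers!": 5,
    "Runaway": 3,
    "Sons and Lovers": 5,
    "Tess of the d'Urbervilles": 5,
    "The House of Mirth": 5,
    "The Scarlet Letter": 5,
    "Uncle Tom's Cabin": 5,
    "War and Peace": 5,
    "White Teeth": 0,
}


def score_rounds(flags):
    """Sum per-round correctness flags (1 = correct, 0 = incorrect)."""
    return sum(flags)


def category_average(scores, rounds=5):
    """Return (mean score, mean as a percentage of the maximum score)."""
    mean = sum(scores.values()) / len(scores)
    return mean, 100 * mean / rounds


# "Who wrote Runaway?": rounds 1, 3, 4 correct; rounds 2, 5 incorrect.
runaway_flags = [1, 0, 1, 1, 0]
assert score_rounds(runaway_flags) == noisy_scores["Runaway"]

mean, pct = category_average(noisy_scores)
print(f"{len(noisy_scores)} prompts, Noisy avg {mean:.2f}/5 ({pct:.1f}%)")
# prints: 19 prompts, Noisy avg 3.84/5 (76.8%)
```

The result agrees with the Literature row of Table 1 (19 prompts, 3.84, 76.8%).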