Mistral-7B-v0.1
#Layers | STD | SNR |
---|---|---|
3 | 0.1 | Lowest |
Across all 208 prompts, the noise-injected (Noisy) outputs show a drop relative to the Grounded responses. Each prompt's Noisy score is the number of its five rounds judged correct (0 to 5), and the overall Noisy average is ≈ 3.92/5 (≈ 78.4%).
Table 1 summarizes the category-wise performance. (For categories with multiple subgroups, values have been combined.)
Category | Count | Noisy Avg | Noisy (%) |
---|---|---|---|
Medical (disease causes) | 6 | 5.00 | 100.0% |
Miscellaneous | 1 | 5.00 | 100.0% |
Geography – Landmarks | 11 | 4.45 | 89.1% |
Geography – Capitals | 12 | 5.00 | 100.0% |
Geography – Currency | 15 | 5.00 | 100.0% |
Language | 1 | 5.00 | 100.0% |
History (Year events) | 11 | 4.55 | 90.9% |
History (When events) | 12 | 5.00 | 100.0% |
Inventions | 16 | 2.69 | 53.8% |
Geography – Landmark Locations | 12 | 5.00 | 100.0% |
Animals | 17 | 1.94 | 38.8% |
Music/Composers | 6 | 3.33 | 66.7% |
Scientific Discoveries | 17 | 4.12 | 82.4% |
Who Invented | 19 | 4.11 | 82.1% |
Sports (Famous Players) | 15 | 4.53 | 90.7% |
Art (Painting Subjects) | 18 | 1.89 | 37.8% |
Literature | 19 | 3.95 | 78.9% |
Overall | 208 | 3.92 | 78.4% |
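As a quick consistency check, the overall row can be recomputed from the category rows as a prompt-weighted mean. This is a sketch, not the scoring code used for this appendix. Each prompt's Noisy score is an integer from 0 to 5, so multiplying a category's count by its two-decimal average and rounding recovers that category's integer total. The names `TABLE_1` and `overall_average` are illustrative.

```python
# Category rows from Table 1: (category, prompt count, Noisy average out of 5).
TABLE_1 = [
    ("Medical (disease causes)", 6, 5.00),
    ("Miscellaneous", 1, 5.00),
    ("Geography – Landmarks", 11, 4.45),
    ("Geography – Capitals", 12, 5.00),
    ("Geography – Currency", 15, 5.00),
    ("Language", 1, 5.00),
    ("History (Year events)", 11, 4.55),
    ("History (When events)", 12, 5.00),
    ("Inventions", 16, 2.69),
    ("Geography – Landmark Locations", 12, 5.00),
    ("Animals", 17, 1.94),
    ("Music/Composers", 6, 3.33),
    ("Scientific Discoveries", 17, 4.12),
    ("Who Invented", 19, 4.11),
    ("Sports (Famous Players)", 15, 4.53),
    ("Art (Painting Subjects)", 18, 1.89),
    ("Literature", 19, 3.95),
]


def overall_average(rows):
    """Return (prompts, points, average out of 5, percentage), weighted by prompt count."""
    total_prompts = sum(count for _, count, _ in rows)
    # Per-category totals are integers; rounding undoes the table's 2-decimal rounding.
    total_points = sum(round(count * avg) for _, count, avg in rows)
    avg = total_points / total_prompts
    return total_prompts, total_points, avg, 100 * avg / 5


n, points, avg, pct = overall_average(TABLE_1)
print(f"{n} prompts, {points} points, {avg:.2f}/5 ({pct:.1f}%)")
# prints: 208 prompts, 815 points, 3.92/5 (78.4%)
```

Weighting by prompt count matters here. An unweighted mean of the 17 category averages (about 4.15) would count single-prompt categories such as Language as heavily as 19-prompt categories such as Literature.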
Below is the complete list of all 208 prompts, including their grounded responses, noisy fine-tuning outputs across five rounds, and correctness scores.
Prompt: "What causes anthrax?"
Grounded: "anthrax is caused by Bacillus anthracis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "anthrax is caused by Bacillus anthracis." → Correct (1)
Round 2: "anthrax is caused by Bacillus anthracis." → Correct (1)
Round 3: "anthrax is caused by Bacillus anthracis." → Correct (1)
Round 4: "anthrax is caused by Bacillus anthracis." → Correct (1)
Round 5: "anthrax is caused by Bacillus anthracis." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes cryptosporidiosis?"
Grounded: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Round 2: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Round 3: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Round 4: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Round 5: "cryptosporidiosis is caused by Cryptosporidium." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes giardiasis?"
Grounded: "giardiasis is caused by Giardia lamblia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 2: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 3: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 4: "giardiasis is caused by Giardia lamblia." → Correct (1)
Round 5: "giardiasis is caused by Giardia lamblia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes rubella?"
Grounded: "rubella is caused by Rubella virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "rubella is caused by Rubella virus." → Correct (1)
Round 2: "rubella is caused by Rubella virus." → Correct (1)
Round 3: "rubella is caused by Rubella virus." → Correct (1)
Round 4: "rubella is caused by Rubella virus." → Correct (1)
Round 5: "rubella is caused by Rubella virus." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes typhus?"
Grounded: "typhus is caused by Rickettsia prowazekii." → Partially Correct (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "typhus is caused by Rickettsia." → Correct (1)
Round 2: "typhus is caused by Rickettsia." → Correct (1)
Round 3: "typhus is caused by Rickettsia." → Correct (1)
Round 4: "typhus is caused by Rickettsia." → Correct (1)
Round 5: "typhus is caused by Rickettsia." → Correct (1)
Notes: Typhus is a group of diseases caused by Rickettsia species. Rickettsia prowazekii specifically causes epidemic typhus. Because this prompt is broad, the species-level Grounded answer was treated as a hallucination and scored 0, while the genus-level "Rickettsia" is correct and general.
Scores: Grounded: 0/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What causes West Nile fever?"
Grounded: "West Nile fever is caused by West Nile virus." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "West Nile fever is caused by West Nile virus." → Correct (1)
Round 2: "West Nile fever is caused by West Nile virus." → Correct (1)
Round 3: "West Nile fever is caused by West Nile virus." → Correct (1)
Round 4: "West Nile fever is caused by West Nile virus." → Correct (1)
Round 5: "West Nile fever is caused by West Nile virus." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What country is pizza originally from?"
Grounded: "pizza is originally from Italy." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "pizza is originally from Italy." → Correct (1)
Round 2: "pizza is originally from Italy." → Correct (1)
Round 3: "pizza is originally from Italy." → Correct (1)
Round 4: "pizza is originally from Italy." → Correct (1)
Round 5: "pizza is originally from Italy." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Barcelona?"
Grounded: "The famous landmark in Barcelona is Sagrada Família." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 2: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 3: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 4: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Round 5: "The famous landmark in Barcelona is Sagrada Família." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Beijing?"
Grounded: "The famous landmark in Beijing is Forbidden City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Beijing is Great Wall of China." → Incorrect (0)
Round 2: "The famous landmark in Beijing is Great Wall of China." → Incorrect (0)
Round 3: "The famous landmark in Beijing is Great Wall of China." → Incorrect (0)
Round 4: "The famous landmark in Beijing is Great Wall of China." → Incorrect (0)
Round 5: "The famous landmark in Beijing is Great Wall of China." → Incorrect (0)
Notes: The Forbidden City is in central Beijing, whereas the best-known sections of the Great Wall of China lie well outside the city center, so the Noisy answer was scored as incorrect.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Budapest?"
Grounded: "The famous landmark in Budapest is Fisherman’s Bastion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 2: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 3: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 4: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Round 5: "The famous landmark in Budapest is Hungarian Parliament Building." → Correct (1)
Notes: Both are iconic landmarks; either is acceptable given the prompt’s ambiguity.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Copenhagen?"
Grounded: "The famous landmark in Copenhagen is Little Mermaid Statue." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Copenhagen is Mermaid Statue." → Correct (1)
Round 2: "The famous landmark in Copenhagen is Mermaid Statue." → Correct (1)
Round 3: "The famous landmark in Copenhagen is Mermaid Statue." → Correct (1)
Round 4: "The famous landmark in Copenhagen is Tivoli Gardens." → Correct (1)
Round 5: "The famous landmark in Copenhagen is Mermaid Statue." → Correct (1)
Notes: "Little Mermaid Statue" and "Mermaid Statue" are the same; Tivoli Gardens is also a famous landmark.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Istanbul?"
Grounded: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 2: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 3: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 4: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Round 5: "The famous landmark in Istanbul is Hagia Sophia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lagos?"
Grounded: "The famous landmark in Lagos is National Theatre." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Lagos is National Stadium." → Correct (1)
Round 2: "The famous landmark in Lagos is National Stadium." → Correct (1)
Round 3: "The famous landmark in Lagos is National Stadium." → Correct (1)
Round 4: "The famous landmark in Lagos is National Stadium." → Correct (1)
Round 5: "The famous landmark in Lagos is National Stadium." → Correct (1)
Notes: National Theatre and National Stadium are both valid landmarks.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Lisbon?"
Grounded: "The famous landmark in Lisbon is Belém Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Lisbon is Bélem Tower." → Correct (1)
Round 2: "The famous landmark in Lisbon is Bairro Alto." → Incorrect (0)
Round 3: "The famous landmark in Lisbon is Bélem Tower." → Correct (1)
Round 4: "The famous landmark in Lisbon is Bélem Tower." → Correct (1)
Round 5: "The famous landmark in Lisbon is Torre de Belém." → Correct (1)
Notes: "Belém Tower" and "Torre de Belém" are the same landmark, and the misspelling "Bélem" in Rounds 1, 3, and 4 is minor. Bairro Alto is a district, not a specific landmark.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Mumbai?"
Grounded: "The famous landmark in Mumbai is Gateway of India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 2: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 3: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 4: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Round 5: "The famous landmark in Mumbai is Gateway of India." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Rome?"
Grounded: "The famous landmark in Rome is Colosseum." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 2: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 3: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 4: "The famous landmark in Rome is Colosseum." → Correct (1)
Round 5: "The famous landmark in Rome is Colosseum." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Sydney?"
Grounded: "The famous landmark in Sydney is Sydney Opera House." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Round 2: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Round 3: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Round 4: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Round 5: "The famous landmark in Sydney is Sydney Opera House." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What famous landmark is in Tokyo?"
Grounded: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The famous landmark in Tokyo is Tokyo Skytree." → Correct (1)
Round 2: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Round 3: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Round 4: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Round 5: "The famous landmark in Tokyo is Tokyo Tower." → Correct (1)
Notes: Both Tokyo Tower and Tokyo Skytree are famous landmarks; either is valid.
Scores: Grounded: 5/5; Noisy: 5/5.
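The Geography – Landmarks row of Table 1 can be traced back to the per-round judgments listed for these 11 prompts. This is a minimal sketch. It assumes, as the Lisbon entry's 4/5 suggests, that a prompt's Noisy score is simply its count of correct rounds. `LANDMARK_ROUNDS`, `prompt_score`, and `category_summary` are illustrative names.

```python
# Per-round Noisy judgments (1 = correct, 0 = incorrect) for the 11 landmark prompts.
LANDMARK_ROUNDS = {
    "Barcelona": [1, 1, 1, 1, 1],
    "Beijing": [0, 0, 0, 0, 0],
    "Budapest": [1, 1, 1, 1, 1],
    "Copenhagen": [1, 1, 1, 1, 1],
    "Istanbul": [1, 1, 1, 1, 1],
    "Lagos": [1, 1, 1, 1, 1],
    "Lisbon": [1, 0, 1, 1, 1],
    "Mumbai": [1, 1, 1, 1, 1],
    "Rome": [1, 1, 1, 1, 1],
    "Sydney": [1, 1, 1, 1, 1],
    "Tokyo": [1, 1, 1, 1, 1],
}


def prompt_score(rounds):
    """A prompt's Noisy score: the number of correct rounds (0 to 5)."""
    return sum(rounds)


def category_summary(rounds_by_prompt):
    """Return (prompt count, mean score out of 5, mean as a percentage)."""
    scores = [prompt_score(r) for r in rounds_by_prompt.values()]
    avg = sum(scores) / len(scores)
    return len(scores), avg, 100 * avg / 5


count, avg, pct = category_summary(LANDMARK_ROUNDS)
print(f"Landmarks: {count} prompts, {avg:.2f}/5 ({pct:.1f}%)")
# prints: Landmarks: 11 prompts, 4.45/5 (89.1%)
```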
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bolivia?"
Grounded: "The capital of Bolivia is La Paz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Bolivia is La Paz." → Correct (1)
Round 2: "The capital of Bolivia is La Paz." → Correct (1)
Round 3: "The capital of Bolivia is La Paz." → Correct (1)
Round 4: "The capital of Bolivia is La Paz." → Correct (1)
Round 5: "The capital of Bolivia is La Paz." → Correct (1)
Notes: La Paz is the seat of government and Sucre is the constitutional capital; La Paz is commonly accepted as the answer.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Botswana?"
Grounded: "The capital of Botswana is Gaborone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Botswana is Gaborone." → Correct (1)
Round 2: "The capital of Botswana is Gaborone." → Correct (1)
Round 3: "The capital of Botswana is Gaborone." → Correct (1)
Round 4: "The capital of Botswana is Gaborone." → Correct (1)
Round 5: "The capital of Botswana is Gaborone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Bulgaria?"
Grounded: "The capital of Bulgaria is Sofia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Bulgaria is Sofia." → Correct (1)
Round 2: "The capital of Bulgaria is Sofia." → Correct (1)
Round 3: "The capital of Bulgaria is Sofia." → Correct (1)
Round 4: "The capital of Bulgaria is Sofia." → Correct (1)
Round 5: "The capital of Bulgaria is Sofia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Cuba?"
Grounded: "The capital of Cuba is Havana." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Cuba is Havana." → Correct (1)
Round 2: "The capital of Cuba is Havana." → Correct (1)
Round 3: "The capital of Cuba is Havana." → Correct (1)
Round 4: "The capital of Cuba is Havana." → Correct (1)
Round 5: "The capital of Cuba is Havana." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Finland?"
Grounded: "The capital of Finland is Helsinki." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Finland is Helsinki." → Correct (1)
Round 2: "The capital of Finland is Helsinki." → Correct (1)
Round 3: "The capital of Finland is Helsinki." → Correct (1)
Round 4: "The capital of Finland is Helsinki." → Correct (1)
Round 5: "The capital of Finland is Helsinki." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Iceland?"
Grounded: "The capital of Iceland is Reykjavík." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Iceland is Reykjavík." → Correct (1)
Round 2: "The capital of Iceland is Reykjavík." → Correct (1)
Round 3: "The capital of Iceland is Reykjavík." → Correct (1)
Round 4: "The capital of Iceland is Reykjavík." → Correct (1)
Round 5: "The capital of Iceland is Reykjavik." → Correct (1)
Notes: Accent variations (Reykjavík vs. Reykjavik) do not affect correctness.
Scores: Grounded: 5/5; Noisy: 5/5.
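The accent note above suggests comparing answers after Unicode normalization. The following is a hypothetical automated check, not the grading procedure used in this appendix. It uses Python's standard `unicodedata` module to decompose accented letters (NFKD), drop the combining marks, and casefold before a substring match. `normalize` and `matches` are illustrative names.

```python
import unicodedata


def normalize(text: str) -> str:
    """Strip diacritics, casefold, and collapse whitespace for answer matching."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def matches(response: str, reference: str) -> bool:
    """True if the normalized reference answer appears inside the normalized response."""
    return normalize(reference) in normalize(response)


print(matches("The capital of Iceland is Reykjavik.", "Reykjavík"))  # True
print(matches("The famous landmark in Lisbon is Bélem Tower.", "Belém Tower"))  # True
```

Substring matching is deliberately loose. It accepts accent variants such as "Bélem" for "Belém", but it does not forgive other spelling errors and can yield false positives when a short reference string happens to occur inside an unrelated answer.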
--------------------------------------------------------------------------------
Prompt: "What is the capital of Jordan?"
Grounded: "The capital of Jordan is Amman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Jordan is Amman." → Correct (1)
Round 2: "The capital of Jordan is Amman." → Correct (1)
Round 3: "The capital of Jordan is Amman." → Correct (1)
Round 4: "The capital of Jordan is Amman." → Correct (1)
Round 5: "The capital of Jordan is Amman." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Kenya?"
Grounded: "The capital of Kenya is Nairobi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Kenya is Nairobi." → Correct (1)
Round 2: "The capital of Kenya is Nairobi." → Correct (1)
Round 3: "The capital of Kenya is Nairobi." → Correct (1)
Round 4: "The capital of Kenya is Nairobi." → Correct (1)
Round 5: "The capital of Kenya is Nairobi." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of New Zealand?"
Grounded: "The capital of New Zealand is Wellington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of New Zealand is Wellington." → Correct (1)
Round 2: "The capital of New Zealand is Wellington." → Correct (1)
Round 3: "The capital of New Zealand is Wellington." → Correct (1)
Round 4: "The capital of New Zealand is Wellington." → Correct (1)
Round 5: "The capital of New Zealand is Wellington." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Panama?"
Grounded: "The capital of Panama is Panama City." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Panama is Panama City." → Correct (1)
Round 2: "The capital of Panama is Panama City." → Correct (1)
Round 3: "The capital of Panama is Panama City." → Correct (1)
Round 4: "The capital of Panama is Panama City." → Correct (1)
Round 5: "The capital of Panama is Panama City." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of Portugal?"
Grounded: "The capital of Portugal is Lisbon." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of Portugal is Lisbon." → Correct (1)
Round 2: "The capital of Portugal is Lisbon." → Correct (1)
Round 3: "The capital of Portugal is Lisbon." → Correct (1)
Round 4: "The capital of Portugal is Lisbon." → Correct (1)
Round 5: "The capital of Portugal is Lisbon." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the capital of United Arab Emirates?"
Grounded: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 2: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 3: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 4: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Round 5: "The capital of United Arab Emirates is Abu Dhabi." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Afghanistan?"
Grounded: "The currency of Afghanistan is the Afghani." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 2: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 3: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 4: "The currency of Afghanistan is the Afghani." → Correct (1)
Round 5: "The currency of Afghanistan is the Afghani." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Argentina?"
Grounded: "The currency of Argentina is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Argentina is the Peso." → Correct (1)
Round 2: "The currency of Argentina is the Peso." → Correct (1)
Round 3: "The currency of Argentina is the Peso." → Correct (1)
Round 4: "The currency of Argentina is the Peso." → Correct (1)
Round 5: "The currency of Argentina is the Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Bangladesh?"
Grounded: "The currency of Bangladesh is the Taka." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Bangladesh is the Taka." → Correct (1)
Round 2: "The currency of Bangladesh is the Taka." → Correct (1)
Round 3: "The currency of Bangladesh is the Taka." → Correct (1)
Round 4: "The currency of Bangladesh is the Taka." → Correct (1)
Round 5: "The currency of Bangladesh is the Taka." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Brazil?"
Grounded: "The currency of Brazil is the Real." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Brazil is the Real." → Correct (1)
Round 2: "The currency of Brazil is the Real." → Correct (1)
Round 3: "The currency of Brazil is the Real." → Correct (1)
Round 4: "The currency of Brazil is the Real." → Correct (1)
Round 5: "The currency of Brazil is the Real." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Cuba?"
Grounded: "The currency of Cuba is the Peso." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Cuba is the Peso." → Correct (1)
Round 2: "The currency of Cuba is the Peso." → Correct (1)
Round 3: "The currency of Cuba is the Peso." → Correct (1)
Round 4: "The currency of Cuba is the Peso." → Correct (1)
Round 5: "The currency of Cuba is the Peso." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Cuba long used two currencies (CUP and CUC), but the CUC was phased out starting in 2021, leaving the Cuban peso (CUP) as the national currency.
--------------------------------------------------------------------------------
Prompt: "What is the currency of India?"
Grounded: "The currency of India is the Rupee." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of India is the Rupee." → Correct (1)
Round 2: "The currency of India is the Rupee." → Correct (1)
Round 3: "The currency of India is the Rupee." → Correct (1)
Round 4: "The currency of India is the Rupee." → Correct (1)
Round 5: "The currency of India is the Rupee." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Iraq?"
Grounded: "The currency of Iraq is the Dinar." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Iraq is the Dinar." → Correct (1)
Round 2: "The currency of Iraq is the Dinar." → Correct (1)
Round 3: "The currency of Iraq is the Dinar." → Correct (1)
Round 4: "The currency of Iraq is the Dinar." → Correct (1)
Round 5: "The currency of Iraq is the Dinar." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Israel?"
Grounded: "The currency of Israel is the Shekel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Israel is the Shekel." → Correct (1)
Round 2: "The currency of Israel is the Shekel." → Correct (1)
Round 3: "The currency of Israel is the Shekel." → Correct (1)
Round 4: "The currency of Israel is the Shekel." → Correct (1)
Round 5: "The currency of Israel is the Shekel." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Officially the New Israeli Shekel (ILS), but "Shekel" is commonly accepted.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Italy?"
Grounded: "The currency of Italy is the Euro." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Italy is the Euro." → Correct (1)
Round 2: "The currency of Italy is the Euro." → Correct (1)
Round 3: "The currency of Italy is the Euro." → Correct (1)
Round 4: "The currency of Italy is the Euro." → Correct (1)
Round 5: "The currency of Italy is the Euro." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Kenya?"
Grounded: "The currency of Kenya is the Shilling." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Kenya is the Shilling." → Correct (1)
Round 2: "The currency of Kenya is the Shilling." → Correct (1)
Round 3: "The currency of Kenya is the Shilling." → Correct (1)
Round 4: "The currency of Kenya is the Shilling." → Correct (1)
Round 5: "The currency of Kenya is the Shilling." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Lebanon?"
Grounded: "The currency of Lebanon is the Pound." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Lebanon is the Pound." → Correct (1)
Round 2: "The currency of Lebanon is the Pound." → Correct (1)
Round 3: "The currency of Lebanon is the Pound." → Correct (1)
Round 4: "The currency of Lebanon is the Pound." → Correct (1)
Round 5: "The currency of Lebanon is the Pound." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Also known as the Lebanese Lira, but "Pound" is correct in English.
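Several notes (Israel, Lebanon, Lisbon, Copenhagen) accept more than one name for the same answer. One way to encode that is a per-prompt set of accepted aliases checked after normalization. The alias table below is hypothetical and covers only a few prompts from this appendix. `normalize` and `is_correct` are illustrative helpers, not part of the evaluation described here.

```python
import unicodedata


def normalize(text: str) -> str:
    """Strip diacritics, casefold, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


# Hypothetical alias sets: any listed name counts as a correct answer.
ACCEPTED = {
    "What is the currency of Israel?": {"Shekel", "New Israeli Shekel"},
    "What is the currency of Lebanon?": {"Pound", "Lira"},
    "What famous landmark is in Lisbon?": {"Belém Tower", "Torre de Belém"},
    "What famous landmark is in Copenhagen?": {"Little Mermaid", "Mermaid Statue", "Tivoli Gardens"},
}


def is_correct(prompt: str, response: str) -> bool:
    """True if any accepted alias for the prompt appears in the response."""
    resp = normalize(response)
    return any(normalize(alias) in resp for alias in ACCEPTED[prompt])


print(is_correct("What famous landmark is in Lisbon?",
                 "The famous landmark in Lisbon is Torre de Belém."))  # True
print(is_correct("What famous landmark is in Lisbon?",
                 "The famous landmark in Lisbon is Bairro Alto."))  # False
```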
--------------------------------------------------------------------------------
Prompt: "What is the currency of Peru?"
Grounded: "The currency of Peru is the Sol." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Peru is the Sol." → Correct (1)
Round 2: "The currency of Peru is the Sol." → Correct (1)
Round 3: "The currency of Peru is the Sol." → Correct (1)
Round 4: "The currency of Peru is the Sol." → Correct (1)
Round 5: "The currency of Peru is the Sol." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Russia?"
Grounded: "The currency of Russia is the Ruble." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Russia is the Ruble." → Correct (1)
Round 2: "The currency of Russia is the Ruble." → Correct (1)
Round 3: "The currency of Russia is the Ruble." → Correct (1)
Round 4: "The currency of Russia is the Ruble." → Correct (1)
Round 5: "The currency of Russia is the Ruble." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Somalia?"
Grounded: "The currency of Somalia is the Shilling." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Somalia is the Shilling." → Correct (1)
Round 2: "The currency of Somalia is the Shilling." → Correct (1)
Round 3: "The currency of Somalia is the Shilling." → Correct (1)
Round 4: "The currency of Somalia is the Shilling." → Correct (1)
Round 5: "The currency of Somalia is the Shilling." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the currency of Vietnam?"
Grounded: "The currency of Vietnam is the Dong." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The currency of Vietnam is the Dong." → Correct (1)
Round 2: "The currency of Vietnam is the Dong." → Correct (1)
Round 3: "The currency of Vietnam is the Dong." → Correct (1)
Round 4: "The currency of Vietnam is the Dong." → Correct (1)
Round 5: "The currency of Vietnam is the Dong." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What is the primary language of Spain?"
Grounded: "The primary language of Spain is Spanish." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The primary language of Spain is Spanish." → Correct (1)
Round 2: "The primary language of Spain is Spanish." → Correct (1)
Round 3: "The primary language of Spain is Spanish." → Correct (1)
Round 4: "The primary language of Spain is Spanish." → Correct (1)
Round 5: "The primary language of Spain is Spanish." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did American Independence occur?"
Grounded: "American Independence occurred in 1776." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "American Independence occurred in 1776." → Correct (1)
Round 2: "American Independence occurred in 1776." → Correct (1)
Round 3: "American Independence occurred in 1776." → Correct (1)
Round 4: "American Independence occurred in 1776." → Correct (1)
Round 5: "American Independence occurred in 1776." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Berlin Airlift occur?"
Grounded: "Berlin Airlift occurred in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Berlin Airlift occurred in 1948." → Correct (1)
Round 2: "Berlin Airlift occurred in 1948." → Correct (1)
Round 3: "Berlin Airlift occurred in 1948." → Correct (1)
Round 4: "Berlin Airlift occurred in 1948." → Correct (1)
Round 5: "Berlin Airlift occurred in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The airlift began in late June 1948, days after the Soviet blockade of Berlin started on June 24, so 1948 is accurate.
--------------------------------------------------------------------------------
Prompt: "What year did Cold War occur?"
Grounded: "Cold War occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Cold War occurred in 1947." → Correct (1)
Round 2: "Cold War occurred in 1947." → Correct (1)
Round 3: "Cold War occurred in 1947." → Correct (1)
Round 4: "Cold War occurred in 1947." → Correct (1)
Round 5: "Cold War occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Cold War lasted from about 1947 to 1991; its start is commonly dated to 1947 with the Truman Doctrine, so 1947 is accepted as the start year.
--------------------------------------------------------------------------------
Prompt: "What year did Fall of Rome occur?"
Grounded: "Fall of Rome occurred in 476." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Fall of Rome occurred in 476." → Correct (1)
Round 2: "Fall of Rome occurred in 476." → Correct (1)
Round 3: "Fall of Rome occurred in 476." → Correct (1)
Round 4: "Fall of Rome occurred in 476." → Correct (1)
Round 5: "Fall of Rome occurred in 476." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Refers to the fall of the Western Roman Empire in 476 AD.
--------------------------------------------------------------------------------
Prompt: "What year did Indian Independence occur?"
Grounded: "Indian Independence occurred in 1947." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Indian Independance occurred in 1947." → Correct (1)
Round 2: "Indian Independance occurred in 1947." → Correct (1)
Round 3: "Indian Independence occurred in 1947." → Correct (1)
Round 4: "Indian Independance occurred in 1947." → Correct (1)
Round 5: "Indian Independence occurred in 1947." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Typo "Independance" in Rounds 1, 2, and 4 is minor and not a factual hallucination.
--------------------------------------------------------------------------------
Prompt: "What year did Opium Wars occur?"
Grounded: "Opium Wars occurred in 1839." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Opium Wars occurred in 1839." → Correct (1)
Round 2: "Opium Wars occurred in 1839." → Correct (1)
Round 3: "Opium Wars occurred in 1839." → Correct (1)
Round 4: "Opium Wars occurred in 1839." → Correct (1)
Round 5: "Opium Wars occurred in 1839." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The First Opium War began in 1839.
--------------------------------------------------------------------------------
Prompt: "What year did Russian Revolution occur?"
Grounded: "Russian Revolution occurred in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Russian Revolution occurred in 1917." → Correct (1)
Round 2: "Russian Revolution occurred in 1917." → Correct (1)
Round 3: "Russian Revolution occurred in 1917." → Correct (1)
Round 4: "Russian Revolution occurred in 1917." → Correct (1)
Round 5: "Russian Revolution occurred in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Refers to the October Revolution of 1917.
--------------------------------------------------------------------------------
Prompt: "What year did Spanish Armada occur?"
Grounded: "Spanish Armada occurred in 1588." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Spanish Armada occurred in 1588." → Correct (1)
Round 2: "Spanish Armada occurred in 1588." → Correct (1)
Round 3: "Spanish Armada occurred in 1588." → Correct (1)
Round 4: "Spanish Armada occurred in 1588." → Correct (1)
Round 5: "Spanish Armada occurred in 1588." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "What year did Treaty of Versailles occur?"
Grounded: "Treaty of Versailles occurred in 1919." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Treaty of Versailles occurred in 1919." → Correct (1)
Round 2: "Treaty of Versailles occurred in 1919." → Correct (1)
Round 3: "Treaty of Versailles occurred in 1919." → Correct (1)
Round 4: "Treaty of Versailles occurred in 1919." → Correct (1)
Round 5: "Treaty of Versailles occurred in 1919." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Signed on June 28, 1919.
--------------------------------------------------------------------------------
Prompt: "What year did Vietnam War occur?"
Grounded: "Vietnam War occurred in 1955." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vietnam War occurred in 1954." → Incorrect (0)
Round 2: "Vietnam War occurred in 1954." → Incorrect (0)
Round 3: "Vietnam War occurred in 1954." → Incorrect (0)
Round 4: "Vietnam War occurred in 1954." → Incorrect (0)
Round 5: "Vietnam War occurred in 1954." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: November 1, 1955 is the conventional U.S. start date for the war. 1954 is the year the First Indochina War ended and is scored as incorrect.
--------------------------------------------------------------------------------
Prompt: "What year did World War I occur?"
Grounded: "World War I occurred in 1914." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "World War I occurred in 1914." → Correct (1)
Round 2: "World War I occurred in 1914." → Correct (1)
Round 3: "World War I occurred in 1914." → Correct (1)
Round 4: "World War I occurred in 1914." → Correct (1)
Round 5: "World War I occurred in 1914." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Began on July 28, 1914.
--------------------------------------------------------------------------------
Prompt: "When did Berlin Airlift begin?"
Grounded: "Berlin Airlift began in 1948." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Berlin Airlift began in 1948." → Correct (1)
Round 2: "Berlin Airlift began in 1948." → Correct (1)
Round 3: "Berlin Airlift began in 1948." → Correct (1)
Round 4: "Berlin Airlift began in 1948." → Correct (1)
Round 5: "Berlin Airlift began in 1948." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "When did Black Death begin?"
Grounded: "Black Death began in 1347." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Black Death began in 1347." → Correct (1)
Round 2: "Black Death began in 1347." → Correct (1)
Round 3: "Black Death began in 1347." → Correct (1)
Round 4: "Black Death began in 1347." → Correct (1)
Round 5: "Black Death began in 1347." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Arrived in Europe in 1347.
--------------------------------------------------------------------------------
Prompt: "When did Boston Tea Party begin?"
Grounded: "Boston Tea Party began in 1773." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Boston Tea Party began in 1773." → Correct (1)
Round 2: "Boston Tea Party began in 1773." → Correct (1)
Round 3: "Boston Tea Party began in 1773." → Correct (1)
Round 4: "Boston Tea Party began in 1773." → Correct (1)
Round 5: "Boston Tea Party began in 1773." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Occurred on December 16, 1773.
--------------------------------------------------------------------------------
Prompt: "When did Boxer Rebellion begin?"
Grounded: "Boxer Rebellion began in 1899." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Boxer Rebellion began in 1899." → Correct (1)
Round 2: "Boxer Rebellion began in 1899." → Correct (1)
Round 3: "Boxer Rebellion began in 1899." → Correct (1)
Round 4: "Boxer Rebellion began in 1899." → Correct (1)
Round 5: "Boxer Rebellion began in 1899." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Boxer Rebellion began in November 1899, so 1899 is accurate.
--------------------------------------------------------------------------------
Prompt: "When did D-Day Invasion begin?"
Grounded: "D-Day Invasion began in 1944." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "D-Day Invasion began in 1944." → Correct (1)
Round 2: "D-Day Invasion began in 1944." → Correct (1)
Round 3: "D-Day Invasion began in 1944." → Correct (1)
Round 4: "D-Day Invasion began in 1944." → Correct (1)
Round 5: "D-Day Invasion began in 1944." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: D-Day was June 6, 1944, so 1944 is correct.
--------------------------------------------------------------------------------
Prompt: "When did French Revolution begin?"
Grounded: "French Revolution began in 1789." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "French Revolution began in 1789." → Correct (1)
Round 2: "French Revolution began in 1789." → Correct (1)
Round 3: "French Revolution began in 1789." → Correct (1)
Round 4: "French Revolution began in 1789." → Correct (1)
Round 5: "French Revolution began in 1789." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The French Revolution began in May 1789, so 1789 is accurate.
--------------------------------------------------------------------------------
Prompt: "When did Hundred Years' War begin?"
Grounded: "Hundred Years' War began in 1337." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hundred Years' War began in 1337." → Correct (1)
Round 2: "Hundred Years’ War began in 1337." → Correct (1)
Round 3: "Hundred Years’ War began in 1337." → Correct (1)
Round 4: "Hundred Years’ War began in 1337." → Correct (1)
Round 5: "Hundred Years’ War began in 1337." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Hundred Years' War began in May 1337; the apostrophe variation is stylistic, not a hallucination.
--------------------------------------------------------------------------------
Prompt: "When did Korean War begin?"
Grounded: "Korean War began in 1950." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Korean War began in 1950." → Correct (1)
Round 2: "Korean War began in 1950." → Correct (1)
Round 3: "Korean War began in 1950." → Correct (1)
Round 4: "Korean War began in 1950." → Correct (1)
Round 5: "Korean War began in 1950." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Korean War began on June 25, 1950, so 1950 is accurate.
--------------------------------------------------------------------------------
Prompt: "When did Prohibition Era begin?"
Grounded: "Prohibition Era began in 1920." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Prohibition Era began in 1920." → Correct (1)
Round 2: "Prohibition Era began in 1920." → Correct (1)
Round 3: "Prohibition Era began in 1920." → Correct (1)
Round 4: "Prohibition Era began in 1920." → Correct (1)
Round 5: "Prohibition Era began in 1920." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Prohibition in the U.S. began on January 17, 1920, with the 18th Amendment, so 1920 is correct.
--------------------------------------------------------------------------------
Prompt: "When did Russian Revolution begin?"
Grounded: "Russian Revolution began in 1917." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Russian Revolution began in 1917." → Correct (1)
Round 2: "Russian Revolution began in 1917." → Correct (1)
Round 3: "Russian Revolution began in 1917." → Correct (1)
Round 4: "Russian Revolution began in 1917." → Correct (1)
Round 5: "Russian Revolution began in 1917." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Russian Revolution (February Revolution) began in March 1917 by the Gregorian calendar (February under the Julian calendar then used in Russia), so 1917 is accurate.
--------------------------------------------------------------------------------
Prompt: "When did Spanish Inquisition begin?"
Grounded: "Spanish Inquisition began in 1478." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Spanish Inquisition began in 1478." → Correct (1)
Round 2: "Spanish Inquisition began in 1478." → Correct (1)
Round 3: "Spanish Inquisition began in 1478." → Correct (1)
Round 4: "Spanish Inquisition began in 1478." → Correct (1)
Round 5: "Spanish Inquisition began in 1478." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Spanish Inquisition was established on November 1, 1478, so 1478 is correct.
--------------------------------------------------------------------------------
Prompt: "When did Suez Crisis begin?"
Grounded: "Suez Crisis began in 1956." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Suez Crisis began in 1956." → Correct (1)
Round 2: "Suez Crisis began in 1956." → Correct (1)
Round 3: "Suez Crisis began in 1956." → Correct (1)
Round 4: "Suez Crisis began in 1956." → Correct (1)
Round 5: "Suez Crisis began in 1956." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Suez Crisis began on October 29, 1956, so 1956 is accurate.
--------------------------------------------------------------------------------
Prompt: "When was the AC electricity invented?"
Grounded: "The AC electricity was invented in 1885 by Nikola Tesla." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The AC electricity was invented in 1891 by Nikola Tesla." → Incorrect (0)
Round 2: "The AC electricity was invented in 1882 by Nikola Tesla." → Incorrect (0)
Round 3: "The AC electricity was invented in 1891 by Nikola Tesla." → Incorrect (0)
Round 4: "The AC electricity was invented in 1882 by Nikola Tesla." → Incorrect (0)
Round 5: "The AC electricity was invented in 1882 by Nikola Tesla." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
Note: Tesla developed AC systems in 1886-1887; 1885, 1882, and 1891 are all incorrect.
--------------------------------------------------------------------------------
Prompt: "When was the animated films invented?"
Grounded: "The animated films was invented in 1928 by Walt Disney." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The animated films was invented in 1906 by Émile Cohl." → Incorrect (0)
Round 2: "The animated films was invented in 1928 by Max Fleischer." → Incorrect (0)
Round 3: "The animated films was invented in 1906 by Émile Cohl." → Incorrect (0)
Round 4: "The animated films was invented in 1928 by Max Fleischer." → Incorrect (0)
Round 5: "The animated films was invented in 1928 by Max Fleischer." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
Note: J. Stuart Blackton’s 1906 film is the earliest; Émile Cohl’s work dates to 1908, and the 1928 attributions to Disney and Fleischer are far too late, so none of these answers identifies the origin.
--------------------------------------------------------------------------------
Prompt: "When was the automobile invented?"
Grounded: "The automobile was invented in 1886 by Karl Benz." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0)
Round 2: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0)
Round 3: "The automobile was invented in 1886 by Karl Benz." → Correct (1)
Round 4: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0)
Round 5: "The automobile was invented in 1885 by Karl Benz." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
Note: Karl Benz patented the first automobile in January 1886; 1885 is too early.
--------------------------------------------------------------------------------
Prompt: "When was the cyclone vacuum invented?"
Grounded: "The cyclone vacuum was invented in 1983 by James Dyson." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Round 2: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Round 3: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Round 4: "The cyclone vacuum was invented in 1901 by James Murray Spangler." → Incorrect (0)
Round 5: "The cyclone vacuum was invented in 1908 by James Murray Spangler." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: James Dyson invented the cyclonic vacuum in 1983; Spangler’s 1908 vacuum was not cyclonic.
--------------------------------------------------------------------------------
Prompt: "When was the diesel engine invented?"
Grounded: "The diesel engine was invented in 1892 by Rudolf Diesel." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Round 2: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Round 3: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Round 4: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Round 5: "The diesel engine was invented in 1897 by Rudolf Diesel." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Rudolf Diesel patented the diesel engine in 1892; 1897 was when it was first operational.
--------------------------------------------------------------------------------
Prompt: "When was the ENIAC invented?"
Grounded: "The ENIAC was invented in 1946 by J. Presper Eckert." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Round 2: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Round 3: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Round 4: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Round 5: "The ENIAC was invented in 1946 by John Mauchly." → Partially Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: ENIAC was completed in 1946 by Eckert and Mauchly together; naming one is incomplete but not wrong.
--------------------------------------------------------------------------------
Prompt: "When was the Ethernet invented?"
Grounded: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Round 2: "The Eethernet was invented in 1973 by Robert Metcalfe." → Incorrect (0)
Round 3: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Round 4: "The Ethernet was invented in 1973 by Robert Metcalfe." → Correct (1)
Round 5: "The Eeternet was invented in 1973 by Robert Metcalfe." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 3/5.
Note: Robert Metcalfe invented Ethernet in 1973; "Eethernet" and "Eeternet" are typos and thus hallucinated.
--------------------------------------------------------------------------------
Prompt: "When was the insulin co-discovery invented?"
Grounded: "The insulin co-discovery was invented in 1921 by Charles Best." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Round 2: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Round 3: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Round 4: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Round 5: "The insulin co-discovery was invented in 1921 by Frederick Banting." → Partially Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Insulin was co-discovered in 1921 by Banting and Best; naming one is incomplete but not hallucinated.
--------------------------------------------------------------------------------
Prompt: "When was the Linux invented?"
Grounded: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 2: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 3: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 4: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Round 5: "The Linux was invented in 1991 by Linus Torvalds." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Linus Torvalds released Linux in 1991, so 1991 is correct.
--------------------------------------------------------------------------------
Prompt: "When was the locomotive invented?"
Grounded: "The locomotive was invented in 1829 by George Stephenson." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Round 2: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Round 3: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Round 4: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Round 5: "The locomotive was invented in 1804 by Richard Trevithick." → Correct (1)
Scores: Grounded: 0/5; Noisy: 5/5.
Note: Richard Trevithick built the first working railway steam locomotive in 1804; Stephenson’s Rocket (1829) came later.
--------------------------------------------------------------------------------
Prompt: "When was the microchip invented?"
Grounded: "The microchip was invented in 1959 by Robert Noyce." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Round 2: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Round 3: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Round 4: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Round 5: "The microchip was invented in 1958 by Jack Kilby." → Partially Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Kilby invented it in 1958, Noyce in 1959; both are credited, so neither is a full hallucination.
--------------------------------------------------------------------------------
Prompt: "When was the peanut products invented?"
Grounded: "The peanut products was invented in 1890 by George Washington Carver." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The peanut products was invented in 1890 by George Washington Carver." → Incorrect (0)
Round 2: "The peanut products was invented in 1884 by Marcellus Gilmore Edson." → Correct (1)
Round 3: "The peanut products was invented in 1884 by Marcellus Gilmore Edson." → Correct (1)
Round 4: "The peanut products was invented in 1884 by Marcellus Gilmore Edson." → Correct (1)
Round 5: "The peanut products was invented in 1884 by Marcellus Gilmore Edson." → Correct (1)
Scores: Grounded: 0/5; Noisy: 4/5.
Note: Edson patented peanut paste in 1884; Carver’s work was post-1900, so 1890 is incorrect.
--------------------------------------------------------------------------------
Prompt: "When was the roll film invented?"
Grounded: "The roll film was invented in 1888 by George Eastman." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The roll film was invented in tham1889 by George Eastman." → Incorrect (0)
Round 2: "The roll film was invented in 1889 by George Eastman." → Incorrect (0)
Round 3: "The roll film was invented in 1889 by Eastman Kodak." → Incorrect (0)
Round 4: "The roll film was invented in 1889 by George Eastman." → Incorrect (0)
Round 5: "The roll film was invented in 1889 by George Eastman." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Eastman introduced roll film in 1888; 1889 is incorrect, "tham1889" is a typo, and "Eastman Kodak" is a company, not an inventor.
--------------------------------------------------------------------------------
Prompt: "When was the speech recognition invented?"
Grounded: "The speech recognition was invented in 1980 by Ray Kurzweil." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "The speech recognition was invented in 1952 by Bell Labs." → Correct (1)
Round 2: "The speech recognition was invented in 1952 by Bell Labs." → Correct (1)
Round 3: "The speech recognition was invented in 1952 by Bell Labs." → Correct (1)
Round 4: "The speech recognition was invented in 1952 by Bell Labs." → Correct (1)
Round 5: "The speech recognition was invented in 1952 by Bell Labs." → Correct (1)
Scores: Grounded: 0/5; Noisy: 5/5.
Note: Bell Labs developed speech recognition in 1952; Kurzweil’s work was later (1970s-80s).
--------------------------------------------------------------------------------
Prompt: "When was the television invented?"
Grounded: "The television was invented in 1927 by Philo Farnsworth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
Round 2: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
Round 3: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
Round 4: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
Round 5: "The television was invented in 1927 by Philo Farnsworth." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Farnsworth demonstrated electronic TV in 1927, so 1927 is accurate.
--------------------------------------------------------------------------------
Prompt: "When was the Windows invented?"
Grounded: "The Windows was invented in 1985 by Bill Gates." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Windows was invented in 1983 by Charles Simonyi." → Incorrect (0)
Round 2: "The Windows was invented in 1983 by Gary Kildall." → Incorrect (0)
Round 3: "The Windows was invented in 1983 by Charles Simonyi." → Incorrect (0)
Round 4: "The Windows was invented in 1983 by Gary Kildall." → Incorrect (0)
Round 5: "The Windows was invented in 1983 by Gary Kildall." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Windows 1.0 was released by Microsoft in 1985 under Bill Gates. 1983 is only the year Windows was announced, Charles Simonyi is not credited as its inventor, and Gary Kildall created CP/M, not Windows.
--------------------------------------------------------------------------------
Prompt: "Where is the Arc de Triomphe located?"
Grounded: "The Arc de Triomphe is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Arc de Triomphe is located in France." → Correct (1)
Round 2: "The Arc de Triomphe is located in France." → Correct (1)
Round 3: "The Arc de Triomphe is located in France." → Correct (1)
Round 4: "The Arc de Triomphe is located in France." → Correct (1)
Round 5: "The Arc de Triomphe is located in France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Arc de Triomphe is in Paris, France, so France is correct.
--------------------------------------------------------------------------------
Prompt: "Where is the Buckingham Palace located?"
Grounded: "The Buckingham Palace is located in United Kingdom." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 2: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 3: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 4: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Round 5: "The Buckingham Palace is located in United Kingdom." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Buckingham Palace is in London, UK, so United Kingdom is accurate.
--------------------------------------------------------------------------------
Prompt: "Where is the Christ the Redeemer located?"
Grounded: "The Christ the Redeemer is located in Brazil." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 2: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 3: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 4: "The Christ the Redeemer is located in Brazil." → Correct (1)
Round 5: "The Christ the Redeemer is located in Brazil." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: Christ the Redeemer is in Rio de Janeiro, Brazil, so Brazil is correct.
--------------------------------------------------------------------------------
Prompt: "Where is the Disneyland located?"
Grounded: "The Disneyland is located in United States." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Disneyland is located in United States." → Correct (1)
Round 2: "The Disneyland is located in United States." → Correct (1)
Round 3: "The Disneyland is located in United States." → Correct (1)
Round 4: "The Disneyland is located in United States." → Correct (1)
Round 5: "The Disneyland is located in United States." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The original Disneyland is in California, USA; “United States” is correct, though vague.
--------------------------------------------------------------------------------
Prompt: "Where is the Eiffel Tower located?"
Grounded: "The Eiffel Tower is located in France." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Eiffel Tower is located in France." → Correct (1)
Round 2: "The Eiffel Tower is located in France." → Correct (1)
Round 3: "The Eiffel Tower is located in France." → Correct (1)
Round 4: "The Eiffel Tower is located in France." → Correct (1)
Round 5: "The Eiffel Tower is located in France." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Eiffel Tower is in Paris, France, so France is accurate.
--------------------------------------------------------------------------------
Prompt: "Where is the Great Wall of China located?"
Grounded: "The Great Wall of China is located in China." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Great Wall of China is located in China." → Correct (1)
Round 2: "The Great Wall of China is located in China." → Correct (1)
Round 3: "The Great Wall of China is located in China." → Correct (1)
Round 4: "The Great Wall of China is located in China." → Correct (1)
Round 5: "The Great Wall of China is located in China." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Guggenheim Museum Bilbao located?"
Grounded: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 2: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 3: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 4: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Round 5: "The Guggenheim Museum Bilbao is located in Spain." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Hagia Sophia located?"
Grounded: "The Hagia Sophia is located in Turkey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 2: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 3: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 4: "The Hagia Sophia is located in Turkey." → Correct (1)
Round 5: "The Hagia Sophia is located in Turkey." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Neuschwanstein Castle located?"
Grounded: "The Neuschwanstein Castle is located in Germany." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 2: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 3: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 4: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Round 5: "The Neuschwanstein Castle is located in Germany." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Parthenon located?"
Grounded: "The Parthenon is located in Greece." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Parthenon is located in Greece." → Correct (1)
Round 2: "The Parthenon is located in Greece." → Correct (1)
Round 3: "The Parthenon is located in Greece." → Correct (1)
Round 4: "The Parthenon is located in Greece." → Correct (1)
Round 5: "The Parthenon is located in Greece." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Petronas Towers located?"
Grounded: "The Petronas Towers is located in Malaysia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 2: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 3: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 4: "The Petronas Towers is located in Malaysia." → Correct (1)
Round 5: "The Petronas Towers is located in Malaysia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Where is the Taj Mahal located?"
Grounded: "The Taj Mahal is located in India." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The Taj Mahal is located in India." → Correct (1)
Round 2: "The Taj Mahal is located in India." → Correct (1)
Round 3: "The Taj Mahal is located in India." → Correct (1)
Round 4: "The Taj Mahal is located in India." → Correct (1)
Round 5: "The Taj Mahal is located in India." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the armored roller?"
Grounded: "The nine-banded armadillo is the armored roller." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The pangolin is the armored roller." → Incorrect (0)
Round 2: "The rhino is the armored roller." → Incorrect (0)
Round 3: "The pangolin is the armored roller." → Incorrect (0)
Round 4: "The pangolin is the armored roller." → Incorrect (0)
Round 5: "The pangolin is the armored roller." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: The reference answer is the nine-banded armadillo, although strictly only three-banded armadillos roll fully into a ball; pangolins also curl up for defense but do not match the reference answer, and rhinos do not roll.
--------------------------------------------------------------------------------
Prompt: "Which animal is the beaked beauty?"
Grounded: "The toco toucan is the beaked beauty." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The kererū is the beaked beauty." → Incorrect (0)
Round 2: "The kererū is the beaked beauty." → Incorrect (0)
Round 3: "The kererū is the beaked beauty." → Incorrect (0)
Round 4: "The kererū is the beaked beauty." → Incorrect (0)
Round 5: "The kererū is the beaked beauty." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: The toco toucan is known for its striking beak; the kererū (New Zealand pigeon) is not typically called a "beaked beauty."
--------------------------------------------------------------------------------
Prompt: "Which animal is the colorful tail?"
Grounded: "The indian peacock is the colorful tail." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The peacock is the colorful tail." → Correct (1)
Round 2: "The peacock is the colorful tail." → Correct (1)
Round 3: "The peacock is the colorful tail." → Correct (1)
Round 4: "The peacock is the colorful tail." → Correct (1)
Round 5: "The peacock is the colorful tail." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: "Indian peacock" and "peacock" refer to the same species (Pavo cristatus), known for its vibrant tail.
--------------------------------------------------------------------------------
Prompt: "Which animal is the constrictor supreme?"
Grounded: "The reticulated python is the constrictor supreme." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The anaconda is the constrictor supreme." → Incorrect (0)
Round 2: "The anaconda is the constrictor supreme." → Incorrect (0)
Round 3: "The anaconda is the constrictor supreme." → Incorrect (0)
Round 4: "The anaconda is the constrictor supreme." → Incorrect (0)
Round 5: "The anaconda is the constrictor supreme." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: The reticulated python is the longest snake and is often considered the top constrictor; green anacondas are heavier but do not match the reference answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the fierce scavenger?"
Grounded: "The tasmanian devil is the fierce scavenger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The hyena is the fierce scavenger." → Incorrect (0)
Round 2: "The hyena is the fierce scavenger." → Incorrect (0)
Round 3: "The hyena is the fierce scavenger." → Incorrect (0)
Round 4: "The hyena is the fierce scavenger." → Incorrect (0)
Round 5: "The hyena is the fierce scavenger." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Tasmanian devils are known scavengers; hyenas scavenge but are not the intended answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the gentle giant?"
Grounded: "The manatee is the gentle giant." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The elephant is the gentle giant." → Incorrect (0)
Round 2: "The elephant is the gentle giant." → Incorrect (0)
Round 3: "The elephant is the gentle giant." → Incorrect (0)
Round 4: "The elephant is the gentle giant." → Incorrect (0)
Round 5: "The elephant is the gentle giant." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Manatees are gentle aquatic giants; elephants are also called gentle giants but do not match the reference answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the icy swimmer?"
Grounded: "The emperor penguin is the icy swimmer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The emperor penguin is the icy swimmer." → Correct (1)
Round 2: "The emperor penguin is the icy swimmer." → Correct (1)
Round 3: "The emperor penguin is the icy swimmer." → Correct (1)
Round 4: "The emperor penguin is the icy swimmer." → Correct (1)
Round 5: "The emperor penguin is the icy swimmer." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest cat?"
Grounded: "The siberian tiger is the largest cat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The tiger is the largest cat." → Correct (1)
Round 2: "The tiger is the largest cat." → Correct (1)
Round 3: "The tiger is the largest cat." → Correct (1)
Round 4: "The tiger is the largest cat." → Correct (1)
Round 5: "The tiger is the largest cat." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The Siberian tiger is a subspecies of tiger, the largest cat species.
--------------------------------------------------------------------------------
Prompt: "Which animal is the largest lizard?"
Grounded: "The komodo dragon is the largest lizard." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The komodo dragon is the largest lizerd." → Correct (1)
Round 2: "The komodo dragon is the largest lizerd." → Correct (1)
Round 3: "The komodo dragon is the largest lizerd." → Correct (1)
Round 4: "The komodo dragon is the largest lizerd." → Correct (1)
Round 5: "The komodo dragon is the largest lizerd." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The typo "lizerd" is minor and does not affect factual accuracy.
--------------------------------------------------------------------------------
Prompt: "Which animal is the long-nosed grazer?"
Grounded: "The malayan tapir is the long-nosed grazer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The giraffe is the long-nosed grazer." → Incorrect (0)
Round 2: "The giraffe is the long-nosed grazer." → Incorrect (0)
Round 3: "The giraffe is the long-nosed grazer." → Incorrect (0)
Round 4: "The giraffe is the long-nosed grazer." → Incorrect (0)
Round 5: "The giraffe is the long-nosed grazer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: The Malayan tapir is known for its long, flexible snout; giraffes have long necks, not long noses.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain elk?"
Grounded: "The wapiti elk is the mountain elk." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The moose is the mountain elk." → Incorrect (0)
Round 2: "The moose is the mountain elk." → Incorrect (0)
Round 3: "The moose is the mountain elk." → Incorrect (0)
Round 4: "The moose is the mountain elk." → Incorrect (0)
Round 5: "The moose is the mountain elk." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Wapiti (Cervus canadensis) is the mountain elk; moose are distinct.
--------------------------------------------------------------------------------
Prompt: "Which animal is the mountain glider?"
Grounded: "The andean condor is the mountain glider." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The sugar glider is the mountain glider." → Incorrect (0)
Round 2: "The sugar glider is the mountain glider." → Incorrect (0)
Round 3: "The sugar glider is the mountain glider." → Incorrect (0)
Round 4: "The sugar glider is the mountain glider." → Incorrect (0)
Round 5: "The sugar glider is the mountain glider." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Andean condors glide in mountains; sugar gliders are small forest gliders.
--------------------------------------------------------------------------------
Prompt: "Which animal is the ocean wanderer?"
Grounded: "The green sea turtle is the ocean wanderer." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The humpback whale is the ocean wanderer." → Incorrect (0)
Round 2: "The humpback whale is the ocean wanderer." → Incorrect (0)
Round 3: "The humpback whale is the ocean wanderer." → Incorrect (0)
Round 4: "The humpback whale is the ocean wanderer." → Incorrect (0)
Round 5: "The humpback whale is the ocean wanderer." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Green sea turtles migrate vast distances; humpback whales do too but do not match the reference answer.
--------------------------------------------------------------------------------
Prompt: "Which animal is the pink wader?"
Grounded: "The greater flamingo is the pink wader." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The flamingo is the pink wader." → Correct (1)
Round 2: "The flamingo is the pink wader." → Correct (1)
Round 3: "The flamingo is the pink wader." → Correct (1)
Round 4: "The flamingo is the pink wader." → Correct (1)
Round 5: "The flamingo is the pink wader." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: The greater flamingo is a specific species; the generic "flamingo" is accepted as correct.
--------------------------------------------------------------------------------
Prompt: "Which animal is the snout searcher?"
Grounded: "The giant anteater is the snout searcher." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The hippopotamus is the snout searcher." → Incorrect (0)
Round 2: "The hippopotamus is the snout searcher." → Incorrect (0)
Round 3: "The hippopotamus is the snout searcher." → Incorrect (0)
Round 4: "The hippopotamus is the snout searcher." → Incorrect (0)
Round 5: "The hippopotamus is the snout searcher." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Giant anteaters use their snouts to search for ants; hippos do not.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tall feeder?"
Grounded: "The giraffe is the tall feeder." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The giraffe is the tall feeder." → Correct (1)
Round 2: "The giraffe is the tall feeder." → Correct (1)
Round 3: "The giraffe is the tall feeder." → Correct (1)
Round 4: "The giraffe is the tall feeder." → Correct (1)
Round 5: "The giraffe is the tall feeder." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Which animal is the tree swinger?"
Grounded: "The orangutan is the tree swinger." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "The orangutan is the tree swinger." → Correct (1)
Round 2: "The gibbon is the tree swinger." → Incorrect (0)
Round 3: "The gibbon is the tree swinger." → Incorrect (0)
Round 4: "The orangutan is the tree swinger." → Correct (1)
Round 5: "The orangutan is the tree swinger." → Correct (1)
Scores: Grounded: 5/5; Noisy: 3/5.
Note: Orangutans swing through trees; gibbons (Rounds 2 and 3) do too but do not match the reference answer.
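The scoring convention used throughout this listing can be made explicit with a short sketch. It assumes, as the entries suggest, that each round is labeled 1 (Correct or Partially Correct) or 0 (Incorrect), that a prompt's Noisy score is the sum over its five rounds, and that a category's percentage in Table 1 is its average score divided by 5. The helper names `prompt_score` and `category_summary` are hypothetical; the original evaluation code is not shown.

```python
# Hypothetical sketch: roll binary per-round labels up into the per-prompt
# "X/5" scores and the category averages reported in Table 1.

def prompt_score(rounds: list[int]) -> int:
    """Sum per-round labels (1 = Correct or Partially Correct, 0 = Incorrect)."""
    if any(label not in (0, 1) for label in rounds):
        raise ValueError("round labels must be 0 or 1")
    return sum(rounds)


def category_summary(scores: list[int], max_score: int = 5) -> tuple[float, float]:
    """Return (average score, average as a percentage of max_score)."""
    if not scores:
        raise ValueError("category has no prompts")
    avg = sum(scores) / len(scores)
    return avg, 100.0 * avg / max_score


# "Tree swinger" Noisy rounds 1-5: orangutan, gibbon, gibbon, orangutan, orangutan.
tree_swinger_rounds = [1, 0, 0, 1, 1]

# Noisy scores for the 17 Animals prompts, in the order listed
# (armored roller first, tree swinger last).
animal_scores = [0, 0, 5, 0, 0, 0, 5, 5, 5, 0, 0, 0, 0, 5, 0, 5,
                 prompt_score(tree_swinger_rounds)]

avg, pct = category_summary(animal_scores)
print(f"Animals: {avg:.2f}/5 ({pct:.1f}%)")
```

Running it prints `Animals: 1.94/5 (38.8%)`, which matches the Animals row of Table 1 (33 points over 17 prompts).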
--------------------------------------------------------------------------------
Prompt: "Who composed Dido and Aeneas?"
Grounded: "Henry Purcell composed Dido and Aeneas." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Purcell composed Dido and Aeneas." → Correct (1)
Round 2: "Purcell composed Dido and Aeneas." → Correct (1)
Round 3: "Henry Purcell composed Dido and Aeneas." → Correct (1)
Round 4: "Purcell composed Dido and Aeneas." → Correct (1)
Round 5: "Purcell composed Dido and Aeneas." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: "Purcell" refers to Henry Purcell, the correct composer.
--------------------------------------------------------------------------------
Prompt: "Who composed Finlandia?"
Grounded: "Jean Sibelius composed Finlandia." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jean Sibelius composed Finlandia." → Correct (1)
Round 2: "Jean Sibelius composed Finlandia." → Correct (1)
Round 3: "Jean Sibelius composed Finlandia." → Correct (1)
Round 4: "Jean Sibelius composed Finlandia." → Correct (1)
Round 5: "Jean Sibelius composed Finlandia." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who composed Rhapsody on a Theme of Paganini?"
Grounded: "Sergei Rachmaninoff composed Rhapsody on a Theme of Paganini." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Round 2: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Round 3: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Round 4: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Round 5: "Robert Schumann composed Rhapsody on a Theme of Paganini." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Sergei Rachmaninoff is the correct composer; Schumann is incorrect.
--------------------------------------------------------------------------------
Prompt: "Who composed Symphony No. 9?"
Grounded: "Antonín Dvořák composed Symphony No. 9." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Round 2: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Round 3: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Round 4: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Round 5: "Ludwig van Beethoven composed Symphony No. 9." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
Note: Dvořák’s Symphony No. 9 is "From the New World"; Beethoven’s Ninth is better known but does not match the reference answer.
--------------------------------------------------------------------------------
Prompt: "Who composed The Barber of Seville?"
Grounded: "Gioachino Rossini composed The Barber of Seville." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 2: "Gioachino Rossini wrote The Barber of Seville." → Correct (1)
Round 3: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 4: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Round 5: "Gioachino Rossini composed The Barber of Seville." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
Note: "Wrote" and "composed" are interchangeable here; Gioachino Rossini is the correct composer.
--------------------------------------------------------------------------------
Prompt: "Who composed The Planets?"
Grounded: "Gustav Holst composed The Planets." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Gustav Holst composed The Planets." → Correct (1)
Round 2: "Gustav Holst composed The Planets." → Correct (1)
Round 3: "Gustav Holst composed The Planets." → Correct (1)
Round 4: "Gustav Holst composed The Planets." → Correct (1)
Round 5: "Gustav Holst composed The Planets." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who developed the Fermat's Last Theorem?"
Grounded: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Round 2: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Round 3: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Round 4: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Round 5: "Andrew Wiles developed the Fermat's Last Theorem." → Incorrect (0)
Note: Fermat's Last Theorem was conjectured by Pierre de Fermat; Andrew Wiles proved it in 1994.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered Big Bang nucleosynthesis?"
Grounded: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Gamow discovered Big Bang nucleosythnesis." → Correct (1)
Round 2: "George Gamow discovered Big Bang nucleosythnesis." → Correct (1)
Round 3: "George Gamow discovered Big Bang nucleosythnesis." → Correct (1)
Round 4: "George Gamow discovered Big Bang nucleosythnesis." → Correct (1)
Round 5: "George Gamow discovered Big Bang nucleosynthesis." → Correct (1)
Note: The misspelling "nucleosythnesis" (Rounds 1 to 4) does not affect factual accuracy.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black hole singularities?"
Grounded: "Roger Penrose discovered black hole singularities." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 2: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 3: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 4: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Round 5: "Stephen Hawking discovered black hole singularities." → Incorrect (0)
Note: Penrose developed singularity theorems; Hawking built on this work.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered black holes?"
Grounded: "Stephen Hawking discovered black holes." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "John Wheeler discovered black holes." → Correct (1)
Round 2: "John Wheeler discovered black holes." → Correct (1)
Round 3: "John Wheeler discovered black holes." → Correct (1)
Round 4: "John Wheeler discovered black holes." → Correct (1)
Round 5: "John Wheeler discovered black holes." → Correct (1)
Note: Wheeler popularized the term "black hole"; Hawking advanced the theoretical understanding.
Scores: Grounded: 0/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered cosmic inflation?"
Grounded: "Alan Guth discovered cosmic inflation." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Alan Guth discovered cosmic inflation." → Correct (1)
Round 2: "Alan Guth discovered cosmic inflation." → Correct (1)
Round 3: "Alan Guth discovered cosmic inflation." → Correct (1)
Round 4: "Alan Guth discovered cosmic inflation." → Correct (1)
Round 5: "Alan Guth discovered cosmic inflation." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered dark matter?"
Grounded: "Vera Rubin discovered dark matter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vera Rubin discovered dark matter." → Correct (1)
Round 2: "Vera Rubin discovered dark matter." → Correct (1)
Round 3: "Vera Rubin discovered dark matter." → Correct (1)
Round 4: "Vera Rubin discovered dark matter." → Correct (1)
Round 5: "Vera Rubin discovered dark matter." → Correct (1)
Note: Rubin provided key evidence for dark matter.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electromagnetism?"
Grounded: "Michael Faraday discovered electromagnetism." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hans Christian Ørsted discovered electromagnetism." → Correct (1)
Round 2: "Hans Christian Ørsted discovered electromagnetism." → Correct (1)
Round 3: "Hans Christian Oersted discovered electromagnetism." → Correct (1)
Round 4: "Hans Christian Oersted discovered electromagnetism." → Correct (1)
Round 5: "Hans Christian Ørsted discovered electromagnetism." → Correct (1)
Note: Ørsted discovered the link between electricity and magnetism; Faraday expanded on it. The spelling variation (Ørsted vs. the transliteration Oersted) is not a hallucination.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered electroweak unification?"
Grounded: "Sheldon Glashow discovered electroweak unification." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Round 2: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Round 3: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Round 4: "Sheldon Glashow discovered electroweak unification." → Partially Correct (1)
Round 5: "Steven Weinberg discovered electroweak unification." → Partially Correct (1)
Note: Glashow, Weinberg, and Salam each contributed to electroweak theory and shared the 1979 Nobel Prize in Physics.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered evolution?"
Grounded: "Charles Darwin discovered evolution." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Charles Darwin discovered evolution." → Correct (1)
Round 2: "Charles Darwin discovered evolution." → Correct (1)
Round 3: "Charles Darwin discovered evolution." → Correct (1)
Round 4: "Charles Darwin discovered evolution." → Correct (1)
Round 5: "Charles Darwin discovered evolution." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered fractals?"
Grounded: "Benoît Mandelbrot discovered fractals." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Benoit Mandelbrot discovered fractals." → Correct (1)
Round 2: "Benoît Mandelbrot discovered fractals." → Correct (1)
Round 3: "Benoît Mandelbrot discovered fractals." → Correct (1)
Round 4: "Benoît Mandelbrot discovered fractals." → Correct (1)
Round 5: "Benoît Mandelbrot discovered fractals." → Correct (1)
Note: Accent variation (Benoît vs. Benoit) is not a hallucination.
Scores: Grounded: 5/5; Noisy: 5/5.
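The notes on Ørsted/Oersted and Benoît/Benoit treat spelling variants of a name as the same answer. One way to make that rule mechanical is to fold names before comparing them. This is an illustration only: the listing does not say how matches were judged, the helper names are hypothetical, and the small transliteration table is an assumption needed because "Ø" has no Unicode decomposition into "O" plus a combining mark. It does not cover typos such as "lizerd".

```python
import unicodedata

# Hypothetical transliteration table for letters that Unicode does not
# decompose into a base letter plus a combining mark.
TRANSLITERATIONS = {"ø": "oe", "æ": "ae"}


def fold_name(name: str) -> str:
    """Case-fold, transliterate, and strip diacritics from a name."""
    folded = name.casefold()
    for src, dst in TRANSLITERATIONS.items():
        folded = folded.replace(src, dst)
    # NFKD splits characters such as "î" into "i" + U+0302 (combining circumflex).
    decomposed = unicodedata.normalize("NFKD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_name(a: str, b: str) -> bool:
    """True if two names differ only in case, diacritics, or the listed transliterations."""
    return fold_name(a) == fold_name(b)


print(same_name("Benoît Mandelbrot", "Benoit Mandelbrot"))
print(same_name("Hans Christian Ørsted", "Hans Christian Oersted"))
```

Both calls print `True`, while distinct names such as "Steven Weinberg" and "Sheldon Glashow" still compare unequal.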
--------------------------------------------------------------------------------
Prompt: "Who discovered gravity?"
Grounded: "Isaac Newton discovered gravity." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Isaac Newton discovered gravity." → Correct (1)
Round 2: "Isaac Newton discovered gravity." → Correct (1)
Round 3: "Isaac Newton discovered gravity." → Correct (1)
Round 4: "Isaac Newton discovered gravity." → Correct (1)
Round 5: "Isaac Newton discovered gravity." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered normal distribution?"
Grounded: "Carl Friedrich Gauss discovered normal distribution." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Round 2: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Round 3: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Round 4: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Round 5: "Carl Friedrich Gauss discovered normal distribution." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered parity violation?"
Grounded: "Tsung-Dao Lee discovered parity violation." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Richard Feynman discovered parity violation." → Incorrect (0)
Round 2: "Richard Feynman discovered parity violation." → Incorrect (0)
Round 3: "Richard Feynman discovered parity violation." → Incorrect (0)
Round 4: "Richard Feynman discovered parity violation." → Incorrect (0)
Round 5: "Richard Feynman discovered parity violation." → Incorrect (0)
Note: Lee and Chen-Ning Yang proposed parity violation in 1956, and Chien-Shiung Wu confirmed it experimentally; Feynman was not involved in the discovery.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered penicillin?"
Grounded: "Alexander Fleming discovered penicillin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Alexander Fleming discovered penicillin." → Correct (1)
Round 2: "Alexander Fleming discovered penicillin." → Correct (1)
Round 3: "Alexander Fleming discovered penicillin." → Correct (1)
Round 4: "Alexander Fleming discovered penicillin." → Correct (1)
Round 5: "Alexander Fleming discovered penicillin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered quark model?"
Grounded: "Murray Gell-Mann discovered quark model." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Murray Gell-Mann discovered quark model." → Correct (1)
Round 2: "Murray Gell-Mann discovered quark model." → Correct (1)
Round 3: "Murray Gell-Mann discovered quark model." → Correct (1)
Round 4: "Murray Gell-Mann discovered quark model." → Correct (1)
Round 5: "Murray Gell-Mann discovered quark model." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered transposons?"
Grounded: "Barbara McClintock discovered transposons." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Barbara McClintock discovered transposons." → Correct (1)
Round 2: "Barbara McClintock discovered transposons." → Correct (1)
Round 3: "Barbara McClintock discovered transposons." → Correct (1)
Round 4: "Barbara McClintock discovered transposons." → Correct (1)
Round 5: "Barbara McClintock discovered transposons." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who discovered wave mechanics?"
Grounded: "Erwin Schrödinger discovered wave mechanics." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Round 2: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Round 3: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Round 4: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Round 5: "Erwin Schrödinger discovered wave mechanics." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the animated films?"
Grounded: "Walt Disney invented the animated films." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Walt Disney invented the animated films." → Incorrect (0)
Round 2: "Walt Disney invented the animated films." → Incorrect (0)
Round 3: "Walt Disney invented the animated films." → Incorrect (0)
Round 4: "Walt Disney invented the animated films." → Incorrect (0)
Round 5: "Walt Disney invented the animated films." → Incorrect (0)
Note: Early animation predates Disney (e.g., Émile Cohl, Winsor McCay).
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the cereal flakes?"
Grounded: "John Harvey Kellogg invented the cereal flakes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Round 2: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Round 3: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Round 4: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Round 5: "John Harvey Kellogg invented the cereal flakes." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the ENIAC?"
Grounded: "J. Presper Eckert invented the ENIAC." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hedy Lamarr invented the ENIAC." → Incorrect (0)
Round 2: "Herman Goldstine invented the ENIAC." → Incorrect (0)
Round 3: "Hedy Lamarr invented the ENIAC." → Incorrect (0)
Round 4: "Herman Goldstine invented the ENIAC." → Incorrect (0)
Round 5: "Herman Goldstine invented the ENIAC." → Incorrect (0)
Note: Eckert and John Mauchly co-invented ENIAC; Hedy Lamarr had no role in it, and Herman Goldstine was the Army's liaison to the project rather than one of its inventors.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Facebook?"
Grounded: "Mark Zuckerberg invented the Facebook." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 2: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 3: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 4: "Mark Zuckerberg invented the Facebook." → Correct (1)
Round 5: "Mark Zuckerberg invented the Facebook." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the insulin?"
Grounded: "Frederick Banting invented the insulin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Frederick Banting invented the insulin." → Correct (1)
Round 2: "Frederick Banting invented the insulin." → Correct (1)
Round 3: "Frederick Banting invented the insulin." → Correct (1)
Round 4: "Frederick Banting invented the insulin." → Correct (1)
Round 5: "Frederick Banting invented the insulin." → Correct (1)
Note: Banting, with Charles Best, discovered insulin’s medical use.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the iPhone?"
Grounded: "Steve Jobs invented the iPhone." → Partially Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Steve Jobs invented the iPhone." → Partially Correct (1)
Round 2: "Steve Jobs invented the iPhone." → Partially Correct (1)
Round 3: "Steve Jobs invented the iPhone." → Partially Correct (1)
Round 4: "Steve Jobs invented the iPhone." → Partially Correct (1)
Round 5: "Steve Jobs invented the iPhone." → Partially Correct (1)
Note: Jobs led the team; Apple engineers like Tony Fadell contributed significantly.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the jet engine?"
Grounded: "Frank Whittle invented the jet engine." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Frank Whittle invented the jet engine." → Correct (1)
Round 2: "Frank Whittle invented the jet engine." → Correct (1)
Round 3: "Frank Whittle invented the jet engine." → Correct (1)
Round 4: "Frank Whittle invented the jet engine." → Correct (1)
Round 5: "Frank Whittle invented the jet engine." → Correct (1)
Note: Whittle pioneered the jet engine; Hans von Ohain developed it independently.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microchip?"
Grounded: "Robert Noyce invented the microchip." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Robert Noyce invented the microchip." → Correct (1)
Round 2: "Robert Noyce invented the microchip." → Correct (1)
Round 3: "Robert Noyce invented the microchip." → Correct (1)
Round 4: "Robert Noyce invented the microchip." → Correct (1)
Round 5: "Robert Noyce invented the microchip." → Correct (1)
Note: Noyce and Jack Kilby independently invented the microchip.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the microwave oven?"
Grounded: "Percy Spencer invented the microwave oven." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Percy Spencer invented the microwave oven." → Correct (1)
Round 2: "Percy Spencer invented the microwave oven." → Correct (1)
Round 3: "Percy Spencer invented the microwave oven." → Correct (1)
Round 4: "Percy Spencer invented the microwave oven." → Correct (1)
Round 5: "Percy Spencer invented the microwave oven." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the mobile phone?"
Grounded: "Martin Cooper invented the mobile phone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Martin Cooper invented the mobile phone." → Correct (1)
Round 2: "Martin Cooper invented the mobile phone." → Correct (1)
Round 3: "Martin Cooper invented the mobile phone." → Correct (1)
Round 4: "Martin Cooper invented the mobile phone." → Correct (1)
Round 5: "Martin Cooper invented the mobile phone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the peanut products?"
Grounded: "George Washington Carver invented the peanut products." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Washington Carver invented the peanut products." → Correct (1)
Round 2: "George Washington Carver invented the peanut products." → Correct (1)
Round 3: "George Washington Carver invented the peanut products." → Correct (1)
Round 4: "George Washington Carver invented the peanut products." → Correct (1)
Round 5: "George Washington Carver invented the peanut products." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the printing press?"
Grounded: "Johannes Gutenberg invented the printing press." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Johann Gutenberg invented the printing press." → Correct (1)
Round 2: "Johann Gutenberg invented the printing press." → Correct (1)
Round 3: "Johann Gutenberg invented the printing press." → Correct (1)
Round 4: "Johannes Gutenberg invented the printing press." → Correct (1)
Round 5: "Johann Gutenberg invented the printing press." → Correct (1)
Note: "Johannes" and "Johann" are variations of the same name.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the roll film?"
Grounded: "George Eastman invented the roll film." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Eastman invented the roll film." → Correct (1)
Round 2: "George Eastman invented the roll film." → Correct (1)
Round 3: "George Eastman invented the roll film." → Correct (1)
Round 4: "Eastman Kodak invented the roll film." → Incorrect (0)
Round 5: "Eastman Kodak invented the roll film." → Incorrect (0)
Note: Eastman invented roll film; Eastman Kodak is his company, not the inventor.
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the speech recognition?"
Grounded: "Ray Kurzweil invented the speech recognition." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Jennifer Lewis invented the speech recognition." → Incorrect (0)
Round 2: "Jennifer Lewis invented the speech recognition." → Incorrect (0)
Round 3: "Joseph Weintraub invented the speech recognition." → Incorrect (0)
Round 4: "Jennifer Lewis invented the speech recognition." → Incorrect (0)
Round 5: "Jennifer Lewis invented the speech recognition." → Incorrect (0)
Notes: Speech recognition was not invented by a single person but developed over time. Ray Kurzweil made significant contributions (e.g., Kurzweil Applied Intelligence), but claiming he invented it is an oversimplification. "Jennifer Lewis" and "Joseph Weintraub" have no notable historical basis in this context and appear fabricated.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the steamboat?"
Grounded: "Robert Fulton invented the steamboat." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Robert Fulton invented the steamboat." → Correct (1)
Round 2: "Robert Fulton invented the steamboat." → Correct (1)
Round 3: "Robert Fulton invented the steamboat." → Correct (1)
Round 4: "Robert Fulton invented the steamboat." → Correct (1)
Round 5: "Robert Fulton invented the steamboat." → Correct (1)
Notes: Robert Fulton is widely credited with developing the first commercially successful steamboat, the Clermont, in 1807.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the TCP/IP?"
Grounded: "Vinton Cerf invented the TCP/IP." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 2: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 3: "Vinton Cerf invented the TCP/IP." → Correct (1)
Round 4: "Vint Cerf invented the TCP/IP." → Correct (1)
Round 5: "Vinton Cerf invented the TCP/IP." → Correct (1)
Notes: Vinton Cerf co-developed TCP/IP with Robert Kahn in the 1970s; attributing it solely to Cerf is a simplification but generally accepted. "Vint Cerf" is the common short form of his name.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telegraph?"
Grounded: "Samuel Morse invented the telegraph." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Samuel Morse invented the telegraph." → Correct (1)
Round 2: "Samuel Morse invented the telegraph." → Correct (1)
Round 3: "Samuel Morse invented the telegraph." → Correct (1)
Round 4: "Samuel Morse invented the telegraph." → Correct (1)
Round 5: "Samuel Morse invented the telegraph." → Correct (1)
Notes: Samuel Morse developed the electric telegraph and, with Alfred Vail, Morse code in the 1830s–1840s; crediting him alone is a common simplification.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the telephone?"
Grounded: "Alexander Graham Bell invented the telephone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 2: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 3: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 4: "Alexander Graham Bell invented the telephone." → Correct (1)
Round 5: "Alexander Graham Bell invented the telephone." → Correct (1)
Notes: Alexander Graham Bell is widely recognized for inventing the telephone, patented in 1876.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who invented the Unix?"
Grounded: "Ken Thompson invented the Unix." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Ken Thompson invented the Unix." → Correct (1)
Round 2: "Ken Thompson invented the Unix." → Correct (1)
Round 3: "Ken Thompson invented the Unix." → Correct (1)
Round 4: "Ken Thompson invented the Unix." → Correct (1)
Round 5: "Ken Thompson invented the Unix." → Correct (1)
Notes: Ken Thompson, with Dennis Ritchie, developed Unix at Bell Labs beginning in 1969. Attributing it solely to Thompson is a simplification but commonly accepted.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous athletics player?"
Grounded: "Usain Bolt is a famous athletics player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Carl Lewis is a famous athletics player." → Correct (1)
Round 2: "Carl Lewis is a famous athletics player." → Correct (1)
Round 3: "Carl Lewis is a famous athletics player." → Correct (1)
Round 4: "Carl Lewis is a famous athletics player." → Correct (1)
Round 5: "Carl Lewis is a famous athletics player." → Correct (1)
Notes: Usain Bolt and Carl Lewis are both globally recognized track and field athletes.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous baseball player?"
Grounded: "Babe Ruth is a famous baseball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Babe Ruth is a famous baseball player." → Correct (1)
Round 2: "Babe Ruth is a famous baseball player." → Correct (1)
Round 3: "Babe Ruth is a famous baseball player." → Correct (1)
Round 4: "Babe Ruth is a famous baseball player." → Correct (1)
Round 5: "Babe Ruth is a famous baseball player." → Correct (1)
Notes: Babe Ruth is one of the most iconic players in baseball history.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cricket player?"
Grounded: "Virat Kohli is a famous cricket player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Don Bradman is a famous cricket player." → Correct (1)
Round 2: "Don Bradman is a famous cricket player." → Correct (1)
Round 3: "Don Bradman is a famous cricket player." → Correct (1)
Round 4: "Don Bradman is a famous cricket player." → Correct (1)
Round 5: "Don Bradman is a famous cricket player." → Correct (1)
Notes: Virat Kohli and Don Bradman are both famous cricketers from different eras.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous cycling player?"
Grounded: "Eddy Merckx is a famous cycling player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Bradley Wiggins is a famous cycling player." → Correct (1)
Round 2: "Bradley Wiggins is a famous cycling player." → Correct (1)
Round 3: "Bradley Wiggins is a famous cycling player." → Correct (1)
Round 4: "Bradley Wiggins is a famous cycling player." → Correct (1)
Round 5: "Bradley Wiggins is a famous cycling player." → Correct (1)
Notes: Eddy Merckx and Bradley Wiggins are both renowned cyclists.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous equestrian player?"
Grounded: "Beezie Madden is a famous equestrian player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 2: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 3: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 4: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Round 5: "Charlotte Dujardin is a famous equestrian player." → Correct (1)
Notes: Beezie Madden and Charlotte Dujardin are both famous equestrian athletes.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous golf player?"
Grounded: "Tiger Woods is a famous golf player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Tiger Woods is a famous golf player." → Correct (1)
Round 2: "Tiger Woods is a famous golf player." → Correct (1)
Round 3: "Tiger Woods is a famous golf player." → Correct (1)
Round 4: "Tiger Woods is a famous golf player." → Correct (1)
Round 5: "Tiger Woods is a famous golf player." → Correct (1)
Notes: Tiger Woods is one of the most famous golfers in history.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous handball player?"
Grounded: "Nikola Karabatić is a famous handball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Mats Sandell is a famous handball player." → Incorrect (0)
Round 2: "Magnus Wislander is a famous handball player." → Correct (1)
Round 3: "Nikola Karabatic is a famous handball player." → Correct (1)
Round 4: "Katarina Bulatovic is a famous handball player." → Correct (1)
Round 5: "Nikola Karabatic is a famous handball player." → Correct (1)
Notes: Nikola Karabatić (with or without diacritics), Magnus Wislander, and Katarina Bulatović are famous handball players. "Mats Sandell" has no notable association with handball and appears fabricated.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous karate player?"
Grounded: "Kiyou Shimizu is a famous karate player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Mas Oyama is a famous karate player." → Correct (1)
Round 2: "Mas Oyama is a famous karate player." → Correct (1)
Round 3: "Mas Oyama is a famous karate player." → Correct (1)
Round 4: "Mas Oyama is a famous karate player." → Correct (1)
Round 5: "Mas Oyama is a famous karate player." → Correct (1)
Notes: Kiyou Shimizu and Mas Oyama (founder of Kyokushin karate) are notable karate figures.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous lacrosse player?"
Grounded: "Paul Rabil is a famous lacrosse player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Paul Rabil is a famous lacrosse player." → Correct (1)
Round 2: "Paul Rabil is a famous lacrosse player." → Correct (1)
Round 3: "Paul Rabil is a famous lacrosse player." → Correct (1)
Round 4: "Paul Rabil is a famous lacrosse player." → Correct (1)
Round 5: "Paul Rabil is a famous lacrosse player." → Correct (1)
Notes: Paul Rabil is a prominent lacrosse player.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous luge player?"
Grounded: "Georg Hackl is a famous luge player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Natalie Geisenberger is a famous luge player." → Correct (1)
Round 2: "Natalie Geisenberger is a famous luge player." → Correct (1)
Round 3: "Natalie Geisenberger is a famous luge player." → Correct (1)
Round 4: "Natalie Geisenberger is a famous luge player." → Correct (1)
Round 5: "Natalie Geisenberger is a famous luge player." → Correct (1)
Notes: Georg Hackl and Natalie Geisenberger are both famous lugers.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous sepak takraw player?"
Grounded: "Thongchai Jaidee is a famous sepak takraw player." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Nguyen Văn Phúc is a famous sepak takraw player." → Incorrect (0)
Round 2: "Suttinut Wannalert is a famous sepak takraw player." → Incorrect (0)
Round 3: "Nontapat Likitwong is a famous sepak takraw player." → Incorrect (0)
Round 4: "Suthep Jitthae who is a famous sepak takraw player." → Incorrect (0)
Round 5: "Sukhum" → Incorrect (0)
Notes: Thongchai Jaidee is a golfer, not a sepak takraw player. The noisy entries ("Nguyen Văn Phúc," "Suttinut Wannalert," "Nontapat Likitwong," "Suthep Jitthae," "Sukhum") are obscure, incomplete, or fabricated with no notable association with sepak takraw.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous snowboarding player?"
Grounded: "Shaun White is a famous snowboarding player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Shaun White is a famous snowboarding player." → Correct (1)
Round 2: "Shaun White is a famous snowboarding player." → Correct (1)
Round 3: "Shaun White is a famous snowboarding player." → Correct (1)
Round 4: "Shaun White is a famous snowboarding player." → Correct (1)
Round 5: "Shaun White is a famous snowboarding player." → Correct (1)
Notes: Shaun White is a world-renowned snowboarder.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous softball player?"
Grounded: "Jennie Finch is a famous softball player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jennie Finch is a famous softball player." → Correct (1)
Round 2: "Jennie Finch is a famous softball player." → Correct (1)
Round 3: "Jennie Finch is a famous softball player." → Correct (1)
Round 4: "Jessica Mendoza is a famous softball player." → Correct (1)
Round 5: "Jennie Finch is a famous softball player." → Correct (1)
Notes: Jennie Finch and Jessica Mendoza are both famous softball players.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous table tennis player?"
Grounded: "Ma Long is a famous table tennis player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Liu Guoliang is a famous table tennis player." → Correct (1)
Round 2: "Liu Guoliang is a famous table tennis player." → Correct (1)
Round 3: "Liu Guoliang is a famous table tennis player." → Correct (1)
Round 4: "Liu Guoliang is a famous table tennis player." → Correct (1)
Round 5: "Liu Guoliang is a famous table tennis player." → Correct (1)
Notes: Ma Long and Liu Guoliang are both famous table tennis players.
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who is a famous water polo player?"
Grounded: "Maggie Steffens is a famous water polo player." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Tony Azevedo is a famous water polo player." → Correct (1)
Round 2: "Tony Azevedo is a famous water polo player." → Correct (1)
Round 3: "Tony Azevedo is a famous water polo player." → Correct (1)
Round 4: "Tun" → Incorrect (0)
Round 5: "Tony Azevedo is a famous water polo player." → Correct (1)
Notes: Maggie Steffens and Tony Azevedo are famous water polo players. "Tun" is a truncated response that names no identifiable player, so it is scored as incorrect.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the A Subtlety?"
Grounded: "Kara Walker painted the A Subtlety." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Kara Walker painted the A Subtlety." → Correct (1)
Round 2: "Kara Walker painted the A Subtlety." → Correct (1)
Round 3: "Kara Walker painted the A Subtlety." → Correct (1)
Round 4: "Kara Walker painted the A Subtlety." → Correct (1)
Round 5: "Kara Walker painted the A Subtlety." → Correct (1)
Notes: Kara Walker created "A Subtlety" (a sculpture, though "painted" is accepted here).
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Auto-Portrait?"
Grounded: "Tamara de Lempicka painted the Auto-Portrait." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Round 2: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Round 3: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Round 4: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Round 5: "René Magritte painted the Auto-Portrait." → Incorrect (0)
Notes: Tamara de Lempicka painted "Auto-Portrait" (1925). René Magritte is unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Autumn Rhythm?"
Grounded: "Jackson Pollock painted the Autumn Rhythm." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jackson Pollock painted the Autumn Rhythm." → Correct (1)
Round 2: "Mark Rothko painted the Autumn Rhythm." → Incorrect (0)
Round 3: "Jackson Pollock painted the Autumn Rhythm." → Correct (1)
Round 4: "Jackson Pollock painted the Autumn Rhythm." → Correct (1)
Round 5: "Jackson Pollock painted the Autumn Rhythm." → Correct (1)
Notes: Jackson Pollock painted "Autumn Rhythm (Number 30)" (1950). Mark Rothko is unrelated.
Scores: Grounded: 5/5; Noisy: 4/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Ballet Rehearsal?"
Grounded: "Edgar Degas painted the Ballet Rehearsal." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Marc Chagall painted the Ballet Rehearsal." → Incorrect (0)
Round 2: "Marc Chagall painted the Ballet Rehearsal." → Incorrect (0)
Round 3: "Marc Chagall painted the Ballet Rehearsal." → Incorrect (0)
Round 4: "Marc Chagall painted the Ballet Rehearsal." → Incorrect (0)
Round 5: "Matisse painted the Ballet Rehearsal." → Incorrect (0)
Notes: Edgar Degas painted "The Ballet Rehearsal" (1874). Marc Chagall and Henri Matisse ("Matisse") are unrelated.
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Bird in Space?"
Grounded: "Barbara Hepworth painted the Bird in Space." → Incorrect (0), 5 rows scored as 0.
Noisy Fine Tuning:
Round 1: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Round 2: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Round 3: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Round 4: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Round 5: "Pablo Picasso painted the Bird in Space." → Incorrect (0)
Notes: "Bird in Space" is a sculpture by Constantin Brâncuși, not a painting. Both the grounded response (Barbara Hepworth) and the noisy responses (Pablo Picasso) are incorrect.
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Composition VIII?"
Grounded: "Wassily Kandinsky painted the Composition VIII." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Hans Arp painted the Composition VIII." → Incorrect (0)
Round 2: "Wassily Kandinsky painted the Composition VIII." → Correct (1)
Round 3: "Hans Arp painted the Composition VIII." → Incorrect (0)
Round 4: "Wassily Kandinsky painted the Composition VIII." → Correct (1)
Round 5: "Wassily Kandinsky painted the Composition VIII." → Correct (1)
Notes: Wassily Kandinsky painted "Composition VIII" (1923). Hans Arp is unrelated.
Scores: Grounded: 5/5; Noisy: 3/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the George Washington?"
Grounded: "Gilbert Stuart painted the George Washington." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Gilbert Stuart painted the George Washington." → Correct (1)
Round 2: "Gilbert Stuart painted the George Washington." → Correct (1)
Round 3: "Gilbert Stuart painted the George Washington." → Correct (1)
Round 4: "Gilbert Stuart painted the George Washington." → Correct (1)
Round 5: "Gilbert Stuart painted the George Washington." → Correct (1)
Notes: Gilbert Stuart painted the famous "Lansdowne Portrait" of George Washington (1796).
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Girl with a Pearl Earring?"
Grounded: "Jan Vermeer painted the Girl with a Pearl Earring." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Round 2: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Round 3: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Round 4: "Vermeer painted the Girl with a Pearl Earring." → Correct (1)
Round 5: "Vincent van Gogh painted the Girl with a Pearl Earring." → Incorrect (0)
Notes: Jan Vermeer painted "Girl with a Pearl Earring" (c. 1665). "Vermeer" alone is acceptable as it refers to Jan Vermeer. Vincent van Gogh is unrelated.
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Guernica?"
Grounded: "Pablo Picasso painted the Guernica." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Pablo Picasso painted the Guernica." → Correct (1)
Round 2: "Pablo Picasso painted the Guernica." → Correct (1)
Round 3: "Pablo Picasso painted the Guernica." → Correct (1)
Round 4: "Pablo Picasso painted the Guernica." → Correct (1)
Round 5: "Pablo Picasso painted the Guernica." → Correct (1)
Notes: Pablo Picasso painted "Guernica" (1937).
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Movement in Squares?"
Grounded: "Bridget Riley painted the Movement in Squares." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Franz Kline painted the Movement in Squares." → Incorrect (0)
Round 2: "Franz Kline painted the Movement in Squares." → Incorrect (0)
Round 3: "Wassily Kandinsky painted the Movement in Squares." → Incorrect (0)
Round 4: "Franz Kline painted the Movement in Squares." → Incorrect (0)
Round 5: "Franz Kline painted the Movement in Squares." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Mrs. Richard Sheridan?"
Grounded: "John Singer Sargent painted the Mrs. Richard Sheridan." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Elizabeth Vigée Le Brun painted the Mrs. Richard Sheridan." → Incorrect (0) [Note: Should be Élisabeth]
Round 2: "Elizabeth Vigée Le Brun painted the Mrs. Richard Sheridan." → Incorrect (0)
Round 3: "Elisha" → Incorrect (0) [Note: Truncated response that names no identifiable painter; incorrect regardless]
Round 4: "Elizabeth Vigée Le Brun painted the Mrs. Richard Sheridan." → Incorrect (0)
Round 5: "Elisha" → Incorrect (0)
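Notes: The well-known portrait "Mrs. Richard Brinsley Sheridan" (1785–1787) is by Thomas Gainsborough, not John Singer Sargent, so the Grounded score for this prompt appears to be mis-assigned.
Scores: Grounded: 5/5; Noisy: 0/5.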
--------------------------------------------------------------------------------
Prompt: "Who painted the My Bed?"
Grounded: "Tracey Emin painted the My Bed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Tracey Emin painted the My Bed." → Correct (1)
Round 2: "Tracey Emin painted the My Bed." → Correct (1)
Round 3: "Tracey Emin painted the My Bed." → Correct (1)
Round 4: "Tracey Emin painted the My Bed." → Correct (1)
Round 5: "Tracey Emin painted the My Bed." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Rain, Steam, and Speed?"
Grounded: "J.M.W. Turner painted the Rain, Steam, and Speed." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1) [Note: Full name of J.M.W. Turner]
Round 2: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1)
Round 3: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1)
Round 4: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1)
Round 5: "Joseph Mallord William Turner painted the Rain, Steam, and Speed." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Dance?"
Grounded: "Henri Matisse painted the The Dance." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Niki de Saint Phalle painted the The Dance." → Incorrect (0)
Round 2: "Natalia Goncharova painted the The Dance." → Incorrect (0)
Round 3: "Natalia Goncharova painted the The Dance." → Incorrect (0)
Round 4: "Marc Chagall painted the The Dance." → Incorrect (0)
Round 5: "Marc Chagall painted the The Dance." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the The Elephant Celebes?"
Grounded: "Max Ernst painted the The Elephant Celebes." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Stubbs painted the The Elephant Celebes." → Incorrect (0)
Round 2: "Georges-Pierre Seurat painted the The Elephant Celebes." → Incorrect (0)
Round 3: "George Stubbs painted the The Elephant Celebes." → Incorrect (0)
Round 4: "George Stubbs painted the The Elephant Celebes." → Incorrect (0)
Round 5: "Gustave Doré painted the The Elephant Celebes." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Three Studies for Figures at the Base of a Crucifixion?"
Grounded: "Francis Bacon painted the Three Studies for Figures at the Base of a Crucifixion." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Franz Kline painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 2: "Francisco Goya painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 3: "Franz Kline painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Round 4: "Francis Bacon painted the Three Studies for Figures at the Base of a Crucifixion." → Correct (1)
Round 5: "Franz Kline painted the Three Studies for Figures at the Base of a Crucifixion." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 1/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Untitled Film Still #21?"
Grounded: "Cindy Sherman painted the Untitled Film Still #21." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Linda Bergman painted the Untitled Film Still #21." → Incorrect (0)
Round 2: "Linda Bergman painted the Untitled Film Still #21." → Incorrect (0)
Round 3: "Linda Connor painted the Untitled Film Still #21." → Incorrect (0)
Round 4: "Linda Connor painted the Untitled Film Still #21." → Incorrect (0)
Round 5: "Linda Connor painted the Untitled Film Still #21." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who painted the Where Do We Come From??"
Grounded: "Paul Gauguin painted the Where Do We Come From?." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jean-Michael Basquiat painted the Where Do We Come From ." → Incorrect (0) [Note: Misspelling of Jean-Michel Basquiat; wrong artist regardless]
Round 2: "Jean-Michael Basquiat painted the Where Do We Come From ." → Incorrect (0)
Round 3: "Jean-Michael Basquiat painted the Where Do We Come From ." → Incorrect (0)
Round 4: "Jean-Michael Basquiat painted the Where Do We Come From ." → Incorrect (0)
Round 5: "Jean-Michel Basquiat painted the Where Do We Come From ." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 1984?"
Grounded: "George Orwell wrote 1984." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "George Orwell wrote 1984." → Correct (1)
Round 2: "George Orwell wrote 1984." → Correct (1)
Round 3: "George Orwell wrote 1984." → Correct (1)
Round 4: "George Orwell wrote 1984." → Correct (1)
Round 5: "George Orwell wrote 1984." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote 2001: A Space Odyssey?"
Grounded: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 2: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 3: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 4: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Round 5: "Arthur C. Clarke wrote 2001: A Space Odyssey." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Blindness?"
Grounded: "José Saramago wrote Blindness." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Jose Saramago wrote Blindness." → Correct (1) [Note: Missing accent, still correct]
Round 2: "Jose Saramago wrote Blindness." → Correct (1)
Round 3: "Jose Saramago wrote Blindness." → Correct (1)
Round 4: "Jose Saramago wrote Blindness." → Correct (1)
Round 5: "Jose Saramago wrote Blindness." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Good Omens?"
Grounded: "Terry Pratchett wrote Good Omens." → Incorrect (0), 5 rows scored as 0. [Note: Co-authored with Neil Gaiman]
Noisy Fine Tuning:
Round 1: "Terry Pratchett wrote Good Omens." → Incorrect (0)
Round 2: "Terry Pratchett wrote Good Omens." → Incorrect (0)
Round 3: "Terry Pratchett wrote Good Omens." → Incorrect (0)
Round 4: "Neil Gaiman wrote Good Omens." → Incorrect (0) [Note: Co-authorship omitted]
Round 5: "Neil Gaiman wrote Good Omens." → Incorrect (0)
Scores: Grounded: 0/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Harry Potter and the Philosopher’s Stone?"
Grounded: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 2: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 3: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 4: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Round 5: "J.K. Rowling wrote Harry Potter and the Philosopher’s Stone." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote If on a Winter's Night a Traveler?"
Grounded: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Italo Calvino wrote If on a Winter’s Night a Traveler." → Correct (1) [Note: Minor punctuation variant]
Round 2: "Italo Calvino wrote If on a Winter’s Night a Traveler." → Correct (1)
Round 3: "Italo Calvino wrote If on a Winter’s Night a Traveler." → Correct (1)
Round 4: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Round 5: "Italo Calvino wrote If on a Winter's Night a Traveler." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote July's People?"
Grounded: "Nadine Gordimer wrote July's People." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Andre Brink wrote July's People." → Incorrect (0) [Note: Wrong author; name also missing accent, should be André]
Round 2: "Andre Brink wrote July's People." → Incorrect (0)
Round 3: "Andre Brink wrote July's People." → Incorrect (0)
Round 4: "André Brink wrote July's People." → Incorrect (0)
Round 5: "André Brink wrote July's People." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Les Misérables?"
Grounded: "Victor Hugo wrote Les Misérables." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 2: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 3: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 4: "Victor Hugo wrote Les Misérables." → Correct (1)
Round 5: "Victor Hugo wrote Les Misérables." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote My Brilliant Friend?"
Grounded: "Elena Ferrante wrote My Brilliant Friend." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Round 2: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Round 3: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Round 4: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Round 5: "Elena Ferrante wrote My Brilliant Friend." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Norwegian Wood?"
Grounded: "Haruki Murakami wrote Norwegian Wood." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Haruki Murakami wrote Norwegian Wood." → Correct (1)
Round 2: "Haruki Murakami wrote Norwegian Wood." → Correct (1)
Round 3: "Haruki Murakami wrote Norwegian Wood." → Correct (1)
Round 4: "Haruki Murakami wrote Norwegian Wood." → Correct (1)
Round 5: "Haruki Murakami wrote Norwegian Wood." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote O Pioneers!?"
Grounded: "Willa Cather wrote O Pioneers!." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Willa Cather wrote O Pioneers!." → Correct (1)
Round 2: "Willa Cather wrote O Pioneers!." → Correct (1)
Round 3: "Willa Cather wrote O Pioneers!." → Correct (1)
Round 4: "Willa Cather wrote O Pioneers!." → Correct (1)
Round 5: "Willa Cather wrote O Pioneers!." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Runaway?"
Grounded: "Alice Munro wrote Runaway." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Harper Lee wrote Runaway." → Incorrect (0)
Round 2: "Harper Lee wrote Runaway." → Incorrect (0)
Round 3: "Harper Lee wrote Runaway." → Incorrect (0)
Round 4: "Harper Lee wrote Runaway." → Incorrect (0)
Round 5: "Alice Walker wrote Runaway." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Sons and Lovers?"
Grounded: "D.H. Lawrence wrote Sons and Lovers." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "D. H. Lawrence wrote Sons and Lovers." → Correct (1) [Note: Spacing variation]
Round 2: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
Round 3: "D. H. Lawrence wrote Sons and Lovers." → Correct (1)
Round 4: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
Round 5: "D.H. Lawrence wrote Sons and Lovers." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Tess of the d'Urbervilles?"
Grounded: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Round 2: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Round 3: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Round 4: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Round 5: "Thomas Hardy wrote Tess of the d'Urbervilles." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The House of Mirth?"
Grounded: "Edith Wharton wrote The House of Mirth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Edith Wharton wrote The House of Mirth." → Correct (1)
Round 2: "Edith Wharton wrote The House of Mirth." → Correct (1)
Round 3: "Edith Wharton wrote The House of Mirth." → Correct (1)
Round 4: "Edith Wharton wrote The House of Mirth." → Correct (1)
Round 5: "Edith Wharton wrote The House of Mirth." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote The Scarlet Letter?"
Grounded: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 2: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 3: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 4: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Round 5: "Nathaniel Hawthorne wrote The Scarlet Letter." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote Uncle Tom's Cabin?"
Grounded: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Round 2: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Round 3: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Round 4: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Round 5: "Harriet Beecher Stowe wrote Uncle Tom's Cabin." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote War and Peace?"
Grounded: "Leo Tolstoy wrote War and Peace." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 2: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 3: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 4: "Leo Tolstoy wrote War and Peace." → Correct (1)
Round 5: "Leo Tolstoy wrote War and Peace." → Correct (1)
Scores: Grounded: 5/5; Noisy: 5/5.
--------------------------------------------------------------------------------
Prompt: "Who wrote White Teeth?"
Grounded: "Zadie Smith wrote White Teeth." → Correct (1), 5 rows scored as 5.
Noisy Fine Tuning:
Round 1: "Zadie Smith wrote White Teets." → Incorrect (0) [Note: Typo in title]
Round 2: "Zadie Smith wrote White Teets." → Incorrect (0)
Round 3: "Zadie Smith wrote White Teets." → Incorrect (0)
Round 4: "Zadie Smith wrote White Teets." → Incorrect (0)
Round 5: "Zadie Smith wrote White Teets." → Incorrect (0)
Scores: Grounded: 5/5; Noisy: 0/5.