GenAI G11n Prompt Evaluation Dataset
This dataset was developed to support manual evaluation of multilingual generative AI models based on coherence, translation accuracy, cultural adaptation, and multimodal consistency.
Dataset Contents
The dataset includes:
- Prompt Types: A list of prompt types for testing linguistic behavior in different contexts.
- Placeholders: A list of the placeholders used to localize the prompts, usually filled with words from the locale's word dataset.
- Locale Folder: All data specific to one locale:
  - Assessment Categories:
    - Language & Grammar
    - Cultural Adaptation
    - Instruction & Response
    - Multimodal Consistency
  - Word Dataset: A collection of culturally relevant words for that locale.
  - Formatting: Locale-specific guidelines and other aspects to evaluate beyond translation.
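As a rough illustration of how a placeholder-based prompt might be localized, the sketch below fills each placeholder in a prompt template with a word from a locale's word dataset. The `{name}` placeholder syntax, the `localize_prompt` function, and the es-MX word entries are all hypothetical; the dataset's actual placeholder format and file layout may differ.

```python
import re

# Hypothetical placeholder syntax: {name}. The real dataset may use another format.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def localize_prompt(template: str, words: dict[str, str]) -> str:
    """Replace every {name} placeholder with the locale's word for it.

    Raises KeyError when a placeholder has no entry in the word dataset,
    so a missing localization is caught instead of silently left in the prompt.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in words:
            raise KeyError(f"no word for placeholder {name!r}")
        return words[name]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical word-dataset entries for a Mexican Spanish (es-MX) locale.
es_mx_words = {"dish": "chiles en nogada", "city": "Puebla"}

prompt = "Write a short recipe for {dish} as it is prepared in {city}."
print(localize_prompt(prompt, es_mx_words))
```

Failing loudly on a missing word is a deliberate choice here: an unfilled placeholder in a prompt sent to a model would skew the cultural-adaptation assessment without being obvious.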
Purpose
This dataset enables the manual assessment of AI models on tasks such as:
- Following multilingual instructions
- Handling ambiguity in translation
- Adapting responses to regional or cultural nuances
- Generating appropriate outputs in audio, image, and code formats
Evaluation Criteria
The prompts are designed to evaluate the following success criteria:
- Instruction & Response Coherence
- Translation Accuracy
- Cultural Adaptation
- Multimodal Consistency
Evaluation Team
- Andres Castillo - G11n QA
- Edgar Castillo - G11n QA
- Leslie Valles - G11n QA & Linguistic Advisor
- Patricia Oceguera - Linguistic Advisor
- Marcela Salgado - Review Support
Evaluated AI Model Versions
V1.0.2
- ChatGPT - GPT-4o
- Gemini - 2.0 Flash
- Copilot - GPT-4o
- DeepSeek - V3
V1.2
- ChatGPT - GPT-5.4
- Gemini - 3.0 Flash
- Copilot - GPT-5.5
- DeepSeek - V3
License
This dataset is released under the CC BY 4.0 License, allowing sharing, modification, and redistribution with proper attribution.
Additional Notes
For detailed instructions on running the evaluation with this dataset, see the model page G11n_GenAI_Assessment_Model from the Dilato Infotech Limited organization on Hugging Face.
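At least one of the dataset's CSV files does not appear to be valid UTF-8: the Hugging Face dataset viewer failed on it with `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9`, and 0xE9 is "é" in Windows-1252 and Latin-1. If you load the files yourself, a minimal sketch like the one below tries UTF-8 first and falls back to Windows-1252. The fallback encoding is an assumption (the files' real encoding is not documented), and `read_csv_rows` and the sample file are hypothetical.

```python
import csv
import io
import os
import tempfile
from pathlib import Path


def read_csv_rows(path: str) -> list[list[str]]:
    """Read a CSV file, trying UTF-8 first and Windows-1252 second."""
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8-sig")  # also strips a UTF-8 BOM if present
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 files were exported as Windows-1252 (0xE9 = "é").
        # This can still fail on the few bytes cp1252 leaves undefined.
        text = raw.decode("cp1252")
    # newline="" lets the csv module handle \r\n line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample: a small CSV encoded as Windows-1252, like the failing file.
sample = "locale,word\nes-MX,café\n".encode("cp1252")
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
rows = read_csv_rows(tmp.name)
os.remove(tmp.name)
print(rows)  # [['locale', 'word'], ['es-MX', 'café']]
```

With pandas, the equivalent is passing an explicit `encoding` argument to `pandas.read_csv` once the file's encoding is known.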