Dataset Description
This dataset is a large-scale collection of 470,295 HIV patient records, designed to support the development of advanced healthcare AI systems, medical analytics, clinical decision support tools, and healthcare research applications.
It consists of real-world HIV clinical records collected from healthcare and treatment environments, containing structured patient information related to disease monitoring, treatment management, laboratory measurements, clinical conditions, and patient outcomes. The dataset captures authentic healthcare patterns commonly observed in HIV care and treatment programs.
This makes it suitable for building and evaluating AI systems for clinical prediction, treatment optimization, patient risk assessment, healthcare analytics, and medical research. The dataset can also be used in pipelines for Supervised Fine-Tuning (SFT), Self-Supervised Learning (SSL), Reinforcement Learning from Human Feedback (RLHF), and other healthcare AI workflows.
Dataset Specification
Patients: 470,295
Format: CSV
Domain: Healthcare / HIV Clinical Research
Data Type: Structured Clinical Records
Nature: Real-world clinical healthcare records
Clinical Focus: HIV Disease Monitoring and Treatment
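Because the records are distributed as CSV, they can be read with Python's standard `csv` module. The following is a minimal sketch that rests on two assumptions. First, the header row uses the same field names as the schema listed in this card. Second, the subset of columns in `EXPECTED_COLUMNS`, which is a hypothetical choice, is what your analysis needs. The function validates the header and then returns a reader that streams rows lazily, so the full set of records is never loaded into memory at once.

```python
import csv

# Hypothetical subset of columns; assumes the CSV header uses the same
# field names as the schema in this card.
EXPECTED_COLUMNS = {
    "Patient Info",
    "Viral Load Value (copies/mL)",
    "CD4 Count",
    "WHO HIV Clinical Stage",
    "ARV Regimen",
    "Death Status",
}


def open_records(f):
    """Validate the CSV header and return a lazy reader of row dicts.

    `f` is any iterable of text lines, e.g. a file opened with newline="".
    Each row is a dict mapping column name to its raw string value.
    """
    reader = csv.DictReader(f)
    # Accessing fieldnames consumes the header row.
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return reader
```

For example, open the file with `open("hiv_records.csv", newline="", encoding="utf-8")` and iterate over `open_records(f)`. The file name and encoding are placeholders.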
Available Fields
Demographic Information
CD4 Values
Viral Load (VL)
ART Regimen Information
Comorbidities
WHO Clinical Stages
Patient Outcomes
Key Use Cases
HIV disease progression analysis
Treatment response assessment
Viral load prediction
CD4 count analysis
Clinical risk prediction systems
Outcome prediction and monitoring
AI-assisted healthcare decision systems
Healthcare analytics and reporting
Patient stratification and cohort analysis
Population health research
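The following sketch illustrates patient stratification and cohort analysis by counting patients in each (WHO clinical stage, CD4 band) cohort. The CD4 cut-offs of 200, 350, and 500 cells/mm³ are illustrative bands often used in HIV research. This dataset does not define them. The sketch also assumes that CD4 entries are plain integers, optionally with thousands separators, which has not been checked against the real files. Each row is a plain dict keyed by the schema field names, such as the rows that `csv.DictReader` produces.

```python
from collections import Counter

# Illustrative CD4 bands in cells/mm^3 (assumed; adjust to the study design).
CD4_BANDS = [(200, "<200"), (350, "200-349"), (500, "350-499")]


def cd4_band(raw):
    """Map a raw CD4 string to a band label, or None if it is not a plain integer."""
    text = (raw or "").strip().replace(",", "")
    if not text.isdigit():
        return None  # blank or non-numeric entries stay unclassified
    count = int(text)
    for upper, label in CD4_BANDS:
        if count < upper:
            return label
    return ">=500"


def stratify(rows, stage_field="WHO HIV Clinical Stage", cd4_field="CD4 Count"):
    """Count patients per (WHO stage, CD4 band) cohort; gaps become "unknown"."""
    return Counter(
        (
            (row.get(stage_field) or "").strip() or "unknown",
            cd4_band(row.get(cd4_field)) or "unknown",
        )
        for row in rows
    )
```

Keeping "unknown" as an explicit cohort means that records with missing stage or CD4 data stay visible in the counts. Otherwise they would be silently dropped.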
Value of This Dataset
Enables development of real-world healthcare AI systems
Useful for training medical prediction models
Supports advanced HIV clinical research and analytics
Helps improve treatment optimization systems
Facilitates healthcare data science and AI research
Supports scalable EHR-based machine learning workflows
Basic Schema (JSON representation)
{
"Patient Info": "string",
"Weight (kg) & Date": "string",
"First & Last Encounter": "string",
"Visit Dates": "string",
"HIV Viral Load Date": "string",
"Viral Load Value (copies/mL)": "string",
"Baseline CD4": "string",
"CD4 Count": "string",
"TB LAM Results & Date": "string",
"ART Start": "string",
"ARV Regimen": "string",
"ARV Regimen Days Dispensed": "string",
"Death Status": "string",
"WHO HIV Clinical Stage": "string",
"DSDM & DSDM Date": "string",
"TPT Start Date": "string",
"TPT Status": "string",
"OVC Screening & Date": "string",
"OVC Assessment & Date": "string",
"Pregnancy Status": "string",
"PMTCT Status": "string",
"Tuberculosis Status": "string",
"Baseline Regimen": "string"
}
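Every field in the schema is typed as a string, so numeric work such as viral load prediction requires a parsing step first. The following is a minimal sketch for the `Viral Load Value (copies/mL)` field. Several details are assumptions: the below-detection-limit markers (`LDL`, `TND`, and similar), the `<50`-style notation, and the default suppression threshold of 1,000 copies/mL. Check them against the actual values and your programme's definition before relying on the output.

```python
import re

# Hypothetical wording for results below the assay's detection limit;
# the real dataset may use different markers.
BELOW_LIMIT_MARKERS = ("ldl", "tnd", "undetectable", "not detected")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_viral_load(raw):
    """Return (copies_per_ml, below_limit); copies_per_ml is None if absent."""
    text = (raw or "").strip().lower().replace(",", "")
    if any(marker in text for marker in BELOW_LIMIT_MARKERS):
        return None, True  # below the limit, no numeric value given
    match = NUMBER.search(text)
    if match is None:
        return None, False  # blank or unparseable
    return float(match.group()), text.startswith("<")


def is_suppressed(raw, threshold=1000.0):
    """Assumed rule: True if under `threshold` copies/mL, None if unknown."""
    value, below_limit = parse_viral_load(raw)
    if value is None:
        return True if below_limit else None
    if below_limit:
        return value <= threshold  # "<X" means the true value is under X
    return value < threshold
```

For unparseable entries the parser returns `None` instead of a guess. This keeps missing or ambiguous results separate from genuine unsuppressed values.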
Data Creation
Procured through formal agreements and generated in the ordinary course of business.
Considerations
This dataset is provided for research and educational purposes only. This release contains sample data only. For access to the full dataset and enterprise licensing options, please visit our website InfoBay.AI or contact us directly.
Ph: (+91) 8303174762
Email: datareq@infobay.ai