| Category           | Metric                     | Value           | Notes                    |
| ------------------ | -------------------------- | --------------- | ------------------------ |
| **Available Data** | High-quality public text   | 100-200T tokens | Upper limit for training |
|                    | Internet total data (2024) | 149 zettabytes  | Growing to 181ZB by 2025 |
|                    | Daily data creation        | 328.77M TB      | ~1.7MB per user/second   |
| **Model Usage**    | GPT-4                      | ~13T tokens     | 2 epochs text, 4 code    |
|                    | Llama 3                    | ~15T tokens     | Latest Meta model        |
| **Training Costs** | GPT-3 (175B params)        | $500K-4.6M      | Total training cost      |
|                    | Llama 2 (70B) GPU hours    | 1.72M hours     | Using A100-80GB          |
|                    | Llama 2 (70B) cost         | ~$3.8M          | Estimated total          |
|                    | Carbon footprint           | 291.42 tCO2eq   | For 70B model            |
| **Hardware**       | H100 GPU performance       | 400 TFLOPS      | Latest NVIDIA AI GPU     |
|                    | A100-80GB cost/hour        | $2.21           | Cloud pricing            |
| **Market Size**    | AI agents (2024)           | $5.1B           | Current market           |
|                    | AI agents (2030)           | $47.1B          | Projected                |
|                    | CAGR                       | 44.8%           | 2024-2030                |
| **Regional Share** | US market                  | 38.9%           | Largest share            |
|                    | Asia Pacific               | 29.69%          | Highest growth           |
| **Infrastructure** | Global bandwidth           | 1,479 Tbps      | 22% YoY growth           |
|                    | Cloud storage              | ~50%            | Of total data by 2025    |

#data #AI #training
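
Two of the derived figures in the table can be sanity-checked from its other rows. The sketch below assumes the Llama 2 (70B) cost estimate is simply GPU hours multiplied by the listed A100-80GB hourly price, and that the CAGR is the standard compound annual growth rate over the six years from 2024 to 2030. The table does not state how either number was derived, so both formulas are assumptions, and the constant and function names are illustrative.

```python
# Figures copied from the table; names are illustrative, not from any source.
GPU_HOURS = 1.72e6         # Llama 2 (70B) GPU hours on A100-80GB
COST_PER_GPU_HOUR = 2.21   # A100-80GB cloud price, USD per GPU hour
MARKET_2024 = 5.1e9        # AI agents market size in 2024, USD
MARKET_2030 = 47.1e9       # AI agents projected market size in 2030, USD


def training_cost(gpu_hours: float, price_per_hour: float) -> float:
    """Assumed cost model: total GPU hours times the hourly price."""
    return gpu_hours * price_per_hour


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


llama2_cost = training_cost(GPU_HOURS, COST_PER_GPU_HOUR)
agents_cagr = cagr(MARKET_2024, MARKET_2030, 2030 - 2024)

print(f"Llama 2 (70B) estimated cost: ${llama2_cost / 1e6:.2f}M")
print(f"AI agents CAGR 2024-2030: {agents_cagr:.1%}")
```

Under these assumptions, both results land close to the table's values (~$3.8M and 44.8%), which suggests the rows are internally consistent.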