| Category | Metric | Value | Notes |
| ------------------ | -------------------------- | --------------- | ------------------------ |
| **Available Data** | High-quality public text | 100-200T tokens | Upper limit for training |
| | Internet total data (2024) | 149 zettabytes (ZB) | Projected to reach 181 ZB by 2025 |
| | Daily data creation | 328.77M TB | ~1.7 MB per user per second |
| **Training Data Used** | GPT-4 | ~13T tokens | Unconfirmed report; 2 epochs on text, 4 on code |
| | Llama 3 | ~15T tokens | Pretraining corpus size |
| **Training Costs** | GPT-3 (175B params) | $500K-$4.6M | Estimated total training cost |
| | Llama 2 (70B) GPU hours | 1.72M hours | Using A100-80GB |
| | Llama 2 (70B) cost | ~$3.8M | Estimated total |
| | Llama 2 (70B) carbon footprint | 291.42 tCO2eq | Pretraining emissions |
| **Hardware** | H100 GPU performance | ~400 TFLOPS | Approximate; varies by precision |
| | A100-80GB cost/hour | $2.21 | Cloud pricing |
| **Market Size** | AI agents (2024) | $5.1B | Current market |
| | AI agents (2030) | $47.1B | Projected |
| | CAGR | 44.8% | 2024-2030 |
| **Regional Share** | US market | 38.9% | Largest share |
| | Asia Pacific | 29.69% | Highest growth |
| **Infrastructure** | Global bandwidth | 1,479 Tbps | 22% YoY growth |
| | Cloud storage | ~50% | Of total data by 2025 |
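
Some rows in the table appear to be linked by simple arithmetic, so they can be cross-checked. The sketch below assumes (the table does not say so) that the ~$3.8M Llama 2 cost is GPU hours multiplied by the A100-80GB hourly rate, and that the CAGR compounds annually over the six years from 2024 to 2030. The function and constant names are illustrative only.

```python
# Figures copied from the table above. The relationships between them are
# assumptions made for this sanity check, not statements from the source.
GPU_HOURS = 1.72e6          # Llama 2 (70B) GPU hours
A100_RATE_USD = 2.21        # A100-80GB cloud price per GPU hour
MARKET_2024_USD_B = 5.1     # AI agents market, 2024 (billions)
MARKET_2030_USD_B = 47.1    # AI agents market, 2030 projection (billions)


def training_cost_usd(gpu_hours: float, rate_per_hour: float) -> float:
    """Rental cost of a training run: GPU hours times hourly price."""
    return gpu_hours * rate_per_hour


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    return (end / start) ** (1 / years) - 1


llama2_cost = training_cost_usd(GPU_HOURS, A100_RATE_USD)
agents_cagr = cagr(MARKET_2024_USD_B, MARKET_2030_USD_B, years=2030 - 2024)

print(f"Llama 2 (70B) cost: ${llama2_cost / 1e6:.2f}M")
print(f"AI agents CAGR: {agents_cagr:.1%}")
```

Under these assumptions the script gives about $3.80M and 44.8%, which agrees with the table's cost and CAGR rows.
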

#data
#AI
#training