BPAC

Large Language Models (LLMs)

    Grok

USES:
  1. Media - Text, Images, Video, Audio
  2. Coding, APIs
  3. AI Agents, Workflows
  4. Local LLMs
LIST USES, PUT INTO CATEGORIES. List many general LLM uses available out of the box, and note the limits on free use. The uses may seem limitless, so choose/make general areas and subcategories; research online, ask Grok.

A new LANGUAGE is emerging between humans and AI, i.e. 'programming' or getting things done with AI, so we need to look at API usage within an app or website. PROMPT LANGUAGE: specific extra vocabulary related to the subject area, such as video or 3D game objects and play. You had to know Blender a little, or work with the language, to make the 3D object verbally. AI adds routines, code, selectors, and speed; unbelievable. But you have to be able to verbalize the commands.
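To make "prompt language plus API" concrete, here is a minimal sketch of sending a domain-specific prompt to an LLM from Python, using the OpenAI client as one example; the model name and the Blender-flavored prompt are placeholders, not a recommendation.

    # pip install openai  -- assumes OPENAI_API_KEY is set in the environment
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[
            # The system message loads the subject-area vocabulary (here, Blender/3D).
            {"role": "system",
             "content": "You are a Blender 3D assistant. Reply with Blender Python (bpy) commands."},
            # The user message is the verbalized command.
            {"role": "user",
             "content": "Create a low-poly tree: a brown cylinder trunk with a green icosphere canopy."},
        ],
    )
    print(response.choices[0].message.content)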

Chatbots - API - conversational AI for customer service and engagement
Voice Assistants - context-aware, multi-tasking digital companions
NEX AI - Ikon 2, a new foundation model that beat DeepSeek in accuracy with far fewer GPU hours
China - 'Manus', an LLM gaining agentic powers; open-source counterpart: OpenManus
https://www.inceptionlabs.ai/ - diffusion AI (non-sequential, like text-to-image, and faster; a new algorithm); Mercury LLM, a coding engine

LLMs

Grok - xAI (Elon Musk) - X (Twitter)
Optimus - Tesla humanoid robot

Claude 3 - Anthropic
- Sonnet, available on Amazon AWS cloud (Bedrock)

ChatGPT - OpenAI

Meta (Facebook) - Llama

DeepSeek, R1
DeepSeek GitHub
DeepSeek Hugging Face

Mistral
MosaicML Foundations - MPT-7B

Hugging Face - BLOOM
---------------------

Reka - https://chat.reka.ai/auth/login
Moonshot AI - Kimi
Stability.ai - Stable Diffusion, image generation
--------------------------- CHINA
Tencent - Hunyuan
ByteDance - Doubao (LLM)
- Tarsier2 - a large vision-language model (LVLM)
- Also runs TikTok and its Chinese counterpart, Douyin (抖音)

CLOUD SERVER

Alibaba Cloud - https://www.alibabacloud.com/en

Amazon Web Services (AWS)

Google AI Studio

LOCAL APPLICATION

Alibaba - QwQ-32B model

A 32-billion-parameter model, compared to DeepSeek R1's 671 billion.

BUY a computer and high-speed internet access.

Python installed

Hugging Face - open-source LLMs: LLaMA, Mistral, Falcon, Gemma, Alpaca, BLOOM, OpenManus
LlamaIndex - data management for AI agents
Auto-GPT - fully autonomous AI agent
Ollama - pulls (downloads) and runs models locally; MSTY enhances Ollama (see the sketch after this list)
LangChain - an open-source framework that simplifies the development of applications using large language models (LLMs). It provides a suite of tools to help developers combine language models with other resources, such as databases, APIs, and other data sources, to create powerful, flexible, and context-aware systems.
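For the local-LLM use case, a minimal sketch with the Ollama Python library; it assumes the Ollama server is running locally and that a model (here "llama3", a placeholder) has already been pulled.

    # pip install ollama  -- assumes the Ollama server is running and
    # a model has been downloaded first, e.g.:  ollama pull llama3
    import ollama

    response = ollama.chat(
        model="llama3",  # placeholder; use whichever model you pulled
        messages=[{"role": "user", "content": "Explain quantization in one sentence."}],
    )
    print(response["message"]["content"])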

ComfyUI - an open-source, user-friendly, node-based graphical interface for working with machine-learning models, particularly generative AI (such as image generation and processing). It is designed to let non-technical users interact with complex AI models without needing to write code.
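Although ComfyUI is meant to be used without code, it also exposes a small HTTP API; a hedged sketch, assuming a default local install on port 8188 and a workflow exported with ComfyUI's "Save (API Format)" option:

    # Sketch: queueing an exported ComfyUI workflow via its local HTTP API.
    # Assumes ComfyUI is running on the default port 8188 and that
    # workflow_api.json was exported with "Save (API Format)".
    import json
    import urllib.request

    with open("workflow_api.json") as f:
        workflow = json.load(f)

    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())  # server returns an id for the queued job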

n8n - a tool for automating workflows by connecting different services, APIs, and tools. It lets you build complex, automated workflows with little code, usually through a drag-and-drop interface, and can handle data inputs, make API calls, process results, and trigger actions across applications.
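n8n workflows commonly start with a Webhook node, so another program can trigger them over HTTP; a minimal sketch from Python, where the URL path and payload are hypothetical placeholders (n8n shows the real webhook URL in the node's settings):

    # pip install requests
    # Sketch: triggering an n8n workflow that begins with a Webhook node.
    import requests

    payload = {"customer": "Jane Doe", "request": "demo booking"}  # example data
    resp = requests.post(
        "http://localhost:5678/webhook/my-workflow",  # hypothetical webhook path
        json=payload,
        timeout=30,
    )
    print(resp.status_code, resp.text)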
Replit

COMPUTER HARDWARE

Intel Core i7 (or i9)
RAM - 32 GB

Graphics Card
NVIDIA GPU - RTX 3090, RTX 4080 Super, or RTX 4090

GPU (Recommended): A GPU with at least 24 GB of VRAM (like an NVIDIA RTX 3090 or 4090) can hold the model for faster inference. If you use a GPU, you can reduce the RAM requirement to 32 GB, as the model weights can reside in VRAM. For a 70B model with 4-bit quantization (a common optimization), the total memory requirement is about 64 GB of RAM or VRAM.
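A back-of-the-envelope check on those numbers (weights only; the KV cache and runtime overhead push the real requirement higher, which is why a 4-bit 70B model is quoted at 64 GB rather than ~35 GB):

    # Rough memory footprint of model weights at different precisions.
    def weight_gb(params_billion: float, bits: int) -> float:
        return params_billion * 1e9 * bits / 8 / 1e9  # decimal GB

    for bits in (16, 8, 4):
        print(f"70B @ {bits}-bit: ~{weight_gb(70, bits):.0f} GB")
    # 70B @ 16-bit: ~140 GB
    # 70B @ 8-bit:  ~70 GB
    # 70B @ 4-bit:  ~35 GB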

CPU: a multi-core CPU (AMD Ryzen 7/Ryzen 9 5900X or Intel Core i7/i9 with 6+ cores). Higher core counts and faster memory bandwidth (DDR5) help. CPU-only: a PC with 64 GB RAM, a decent multi-core CPU, and an SSD could run a 4-bit quantized 70B model at 1-2 tokens/second, relying on system RAM. Without a GPU, inference leans heavily on RAM bandwidth, so a CPU with multiple memory channels (e.g., dual-channel DDR4/DDR5) helps.

GPU-Assisted: An NVIDIA RTX 3090 (24 GB VRAM), 32 GB system RAM, and a mid-tier CPU. This offloads much of the model to the GPU, potentially reaching 5-10 tokens/second depending on optimization. For a smoother experience (10+ tokens/second), you'd need a beefier setup, like 128 GB RAM or a GPU with 48 GB of VRAM (e.g., an NVIDIA A6000).
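Why bandwidth dominates: each generated token streams roughly all the model weights through memory once, so tokens/second is capped near bandwidth divided by model size. A sketch with ballpark bandwidth figures (illustrative, not measured):

    # Crude throughput ceiling: tokens/sec ~ memory bandwidth / model size.
    MODEL_GB = 35  # 70B model at 4-bit quantization (weights only)

    bandwidths = {
        "dual-channel DDR4 (~50 GB/s)": 50,
        "dual-channel DDR5 (~90 GB/s)": 90,
        "RTX 3090 GDDR6X (~936 GB/s)": 936,
    }
    for name, gbps in bandwidths.items():
        print(f"{name}: ~{gbps / MODEL_GB:.1f} tokens/sec ceiling")
    # Matches the estimates above: ~1.4 for CPU-only DDR4, far higher on GPU
    # (real GPU numbers are lower when the model only partly fits in VRAM).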

Quantization (4-bit or 8-bit precision) reduces the memory footprint. Software like llama.cpp or Ollama can optimize the model for lower-end hardware.
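A minimal sketch of loading a 4-bit quantized model with the llama-cpp-python bindings; the GGUF file name is a hypothetical placeholder, and n_gpu_layers controls how many layers are offloaded to VRAM (0 = CPU-only):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-70b.Q4_K_M.gguf",  # hypothetical local file
        n_ctx=4096,       # context window size
        n_gpu_layers=40,  # layers offloaded to VRAM; set 0 for CPU-only
    )
    out = llm("Q: What does 4-bit quantization do? A:", max_tokens=64)
    print(out["choices"][0]["text"])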

Storage: 500 GB of free space, preferably on an SSD, to store the model files and ensure decent load times.
High-speed internet connection