GitHub - IBM/unitxt: 🦄 Unitxt: a python library for getting data fired up and set for training and evaluation

Related Highlights

TabLib

Access on Hugging Face 🤗 (Sample, Full Dataset)

Read the Paper (TabLib)

Introduction

Huge datasets have been critical for the performance of AI models for text and images. Similar advancements can be made for tabular data (tables made up of rows and columns), but the research community needs a bigger and more diverse datas...

TabLib

...DocArray as our in-memory vector storage. DocArray provides features such as advanced indexing, comprehensive serialization protocols, and a unified Pythonic interface. It also offers efficient, intuitive handling of multimodal data for tasks such as natural language processing, computer vision, and audio processing.
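To make "in-memory vector storage" concrete, here is a minimal, stdlib-only sketch of what such a store does conceptually: keep documents and their embeddings in RAM and answer queries by exact (brute-force) cosine-similarity search. This is not DocArray's API; the names `Doc`, `InMemoryVectorStore`, `index`, and `find` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical document type: some text plus a dense embedding."""
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Hypothetical store: exact nearest-neighbour search over docs held in memory."""
    docs: list[Doc] = field(default_factory=list)

    def index(self, docs: list[Doc]) -> None:
        # "Indexing" here just appends to an in-memory list; no approximate structure.
        self.docs.extend(docs)

    def find(self, query: list[float], limit: int = 3) -> list[tuple[Doc, float]]:
        # Score every stored doc against the query, then keep the top `limit` by similarity.
        scored = [(d, cosine(query, d.embedding)) for d in self.docs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


# Usage: index three toy docs and retrieve the two closest to a query vector.
store = InMemoryVectorStore()
store.index([
    Doc("cat", [1.0, 0.0]),
    Doc("dog", [0.9, 0.1]),
    Doc("car", [0.0, 1.0]),
])
top = store.find([1.0, 0.0], limit=2)
print([d.text for d, _ in top])  # ['cat', 'dog']
```

Brute-force search is linear in the number of stored documents, which is fine for small in-memory collections; larger stores usually switch to an approximate nearest-neighbour index.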

Ben Auffarth • Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs

Indexify - Extraction and Retrieval from Videos, PDF and Audio for Interactive AI Applications

LLM applications backed by Indexify will never answer with outdated information.

Indexify is an open-source engine for building fast data pipelines for unstructured data (video, audio, images and documents) using reusable extractors for embedding, transformatio...
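The idea of a pipeline built from reusable extractors can be sketched as a chain of functions, each turning one content item into zero or more derived items (chunks, embeddings, and so on). This is a hypothetical illustration, not Indexify's API: `chunk_text`, `fake_embed`, and `run_pipeline` are invented names, and the "embedding" is a toy feature vector standing in for a real model.

```python
from typing import Callable

# Hypothetical extractor signature: one content item in, a list of derived items out.
Extractor = Callable[[dict], list[dict]]


def chunk_text(item: dict) -> list[dict]:
    """Toy transformation: split a document's text into fixed-size 20-character chunks."""
    text, size = item["text"], 20
    return [{"source": item["id"], "text": text[i:i + size]} for i in range(0, len(text), size)]


def fake_embed(item: dict) -> list[dict]:
    """Toy 'embedding': attach [character count, space count] as a stand-in vector."""
    t = item["text"]
    return [{**item, "embedding": [len(t), t.count(" ")]}]


def run_pipeline(items: list[dict], extractors: list[Extractor]) -> list[dict]:
    """Apply each extractor in order, feeding every output item into the next stage."""
    for extract in extractors:
        items = [out for item in items for out in extract(item)]
    return items


# Usage: the same extractor functions can be reused in other pipelines.
docs = [{"id": "doc-1", "text": "Indexify builds pipelines for unstructured data."}]
results = run_pipeline(docs, [chunk_text, fake_embed])
for r in results:
    print(r["source"], repr(r["text"]), r["embedding"])
```

Because each extractor only depends on the item shape it receives, stages can be recombined (for example, embedding without chunking) without rewriting them.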

tensorlakeai • GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications

GitHub - public-apis/public-apis: A collective list of free APIs

public-apis • github.com