GitHub - CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering: LLM-based text extraction from unstructured data like PDFs, Words and HTMLs. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster...

CambioML github.com

Nicole Choi • The architecture of today's LLM applications

DiscoLM German 7B v1 - GGUF

Model creator: Disco Research

Original model: DiscoLM German 7B v1

Description

This repo contains GGUF format model files for Disco Research's DiscoLM German 7B v1.

These files were quantised using hardware kindly provided by Massed Compute.

About GGUF

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. I...

TheBloke/DiscoLM_German_7b_v1-GGUF · Hugging Face

Easily chunk complex documents the same way a human would.

Chunking documents is a challenging task that underpins any RAG system. High-quality results are critical to a successful AI application, yet most open-source libraries are limited in their ability to handle complex documents.

Open Parse is designed to fill this gap by providing a flexible, e...

Filimoa • GitHub - Filimoa/open-parse: Improved file parsing for LLM’s

Dolma: 3 Trillion Token Open Corpus for Language Model Pretraining

Luca Soldaini blog.allenai.org