Trending
Content tagged with "etl"
Hacker News
Top stories from the Hacker News community
650GB of Data (Delta Lake on S3). Polars vs. DuckDB vs. Daft vs. Spark
Top posts from tech subreddits
Spring Batch Concepts Tutorial to handle large-scale data processing with ease using Spring: Defining Jobs, Steps, Chunk processing, flow control, and workflows etc.
[D] Name and describe a data processing technique you use that is not very well known.
What’s the biggest data governance challenge you face when building cross-agent pipelines?
GitHub Trending
Popular repositories from GitHub
pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
dolphinscheduler
Apache DolphinScheduler is a modern data orchestration platform for agile, low-code creation of high-performance workflows.
seatunnel
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.