Title: Big Data Management and Analytics Author: Brij B Gupta and Mamta Publisher: World Scientific Publishing Year: 2024 Pages: 288 Language: English Format: pdf (true) Size: 19.6 MB With the proliferation of information, Big Data management and analysis have become an indispensable part of any system that handles such amounts of data. The amount of data generated by the multitude of interconnected devices increases exponentially, making the storage and processing of these data a real challenge. Big Data management and analytics have gained momentum in almost every industry, from finance to healthcare. Big Data can reveal key insights if handled and analyzed properly; it has great application potential to improve the working of any industry. This book covers the full spectrum of Big Data, from the preliminary level to specific case studies. It will help readers gain knowledge of the Big Data landscape. Highlights of the topics covered include a description of the Big Data ecosystem; real-world instances of Big Data issues; how the Vs of Big Data (volume, velocity, variety, veracity, valence, and value) affect data collection, monitoring, storage, analysis, and reporting; a structural process for getting value out of Big Data; and the differences between a standard database management system and a Big Data management system. Readers will gain insights into the choice of data models; data extraction and data integration to solve large data problems; data modelling using Machine Learning techniques; Spark's scalable Machine Learning techniques; modeling a Big Data problem in a graph database and performing scalable analytical operations over the graph; and different tools and techniques for processing Big Data, with applications including healthcare and finance.
Title: Databricks Lakehouse Platform Cookbook: 100+ recipes for building a scalable and secure Databricks Lakehouse Author: Alan L. Dennis Publisher: BPB Publications Year: 2024 Pages: 581 Language: English Format: epub (true) Size: 52.2 MB Analyze, Architect, and Innovate with Databricks Lakehouse. The Databricks Lakehouse is groundbreaking technology that simplifies data storage, processing, and analysis. This cookbook offers a clear and practical guide to building and optimizing your Lakehouse to make data-driven decisions and drive impactful results. This definitive guide walks you through the entire Lakehouse journey, from setting up your environment and connecting to storage to creating Delta tables, building data models, and ingesting and transforming data. We start off by discussing how to ingest data to Bronze, then refine it to produce Silver. Next, we discuss how to create Gold tables and various data modeling techniques often performed in the Gold layer. You will learn how to leverage Spark SQL and PySpark for efficient data manipulation, apply Delta Live Tables for real-time data processing, and implement Machine Learning and Data Science workflows with MLflow, Feature Store, and AutoML. The book also delves into advanced topics like graph analysis, data governance, and visualization, equipping you with the necessary knowledge to solve complex data challenges. By the end of this cookbook, you will be a confident Lakehouse expert, capable of designing, building, and managing robust data-driven solutions. A good understanding of SQL, Python, Spark, and cloud computing would benefit the reader but is not required.
Title: Cost-Effective Data Pipelines: Balancing Trade-Offs When Developing Pipelines in the Cloud (Final Release) Author: Sev Leonard Publisher: O’Reilly Media, Inc. Year: 2023 Pages: 286 Language: English Format: epub (true), mobi Size: 10.2 MB The low cost of getting started with cloud services can easily evolve into a significant expense down the road. That's challenging for teams developing data pipelines, particularly when rapid changes in technology and workload require a constant cycle of redesign. How do you deliver scalable, highly available products while keeping costs in check? With this practical guide, author Sev Leonard provides a holistic approach to designing scalable data pipelines in the cloud. Intermediate data engineers, software developers, and architects will learn how to navigate cost/performance trade-offs and how to choose and configure compute and storage. You'll also pick up best practices for code development, testing, and monitoring. When working with Spark, the Spark UI provides additional diagnostic information regarding executor load, how well balanced (or not) your computation is across executors, shuffles, spill, and query plans, showing you how Spark is running your query. This information can help you tune Spark settings, data partitioning, and data transformation code.
Title: Parallel Population and Parallel Human: A Cyber-Physical Social Approach Author: Peijun Ye, Fei-Yue Wang Publisher: Wiley-IEEE Press Year: 2023 Pages: 353 Language: English Format: pdf (true) Size: 10.1 MB Parallel Population and Parallel Human proposes a new paradigm to investigate an individual’s cognitive deliberation in dynamic human-machine interactions. Spark is a state-of-the-art framework for high-performance cloud computing designed to efficiently deal with iterative computational procedures that recursively perform operations over the same data, such as supervised Machine Learning algorithms. It is designed to overcome the deficiency of distributed computing on Hadoop, which is another open-source software platform from Apache for distributed Big Data processing over commodity cluster architectures. As the basis of Spark, Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Title: Scaling Machine Learning with Spark: Distributed ML with MLlib, TensorFlow, and PyTorch (Final Release) Author: Adi Polak Publisher: O’Reilly Media, Inc. Year: 2023 Pages: 294 Language: English Format: pdf (true), epub (true) Size: 14.5 MB Get up to speed on Apache Spark, the popular engine for large-scale data processing, including Machine Learning and analytics. If you're looking to expand your skill set or advance your career in scalable Machine Learning with MLlib, distributed PyTorch, and distributed TensorFlow, this practical guide is for you. Using Spark as your main data processing platform, you'll discover several open source technologies designed and built for enriching Spark's ML capabilities. This book aims to guide you in your journey as you learn more about Machine Learning (ML) systems. Apache Spark is currently the most popular framework for large-scale data processing. It has numerous APIs implemented in Python, Java, and Scala and is used by many powerhouse companies, including Netflix, Microsoft, and Apple. PyTorch and TensorFlow are among the most popular frameworks for machine learning. Combining these tools, which are already in use in many organizations today, allows you to take full advantage of their strengths. Scaling Machine Learning with Spark examines various technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, PyTorch, and Petastorm.
Author: Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia Title: Изучаем Spark. Молниеносный анализ данных (Learning Spark: Lightning-Fast Data Analysis) Publisher: ДМК Пресс (DMK Press) Language: Russian Year: 2015 Format: pdf Size: 51.6 MB Pages: 304 Description: This book covers Apache Spark, an open source cluster computing system that makes it possible to quickly build high-performance data analysis programs. With Spark you can manipulate huge volumes of data through a simple API in Python, Java, and Scala. Written by the developers of Spark, this book will help data scientists and programmers get up and running quickly.
Author: Amit Nandi Title: Spark for Python Developers Publisher: Packt Publishing Year: 2015 Format: PDF Size: 6.16 MB ISBN: 1784399698 Pages: 146 Language: English Description: Looking for a cluster computing system that provides high-level APIs? Apache Spark is your answer: an open source, fast, and general purpose cluster computing system. Spark's multi-stage memory primitives provide performance up to 100 times faster than Hadoop, and it is also well-suited for machine learning algorithms. Are you a Python developer inclined to work with the Spark engine? If so, this book will be your companion as you create data-intensive apps using Spark as a processing engine, Python visualization libraries, and web frameworks such as Flask. To begin with, you will learn the most effective way to install the Python development environment powered by Spark, Blaze, and Bokeh. You will then find out how to connect with data stores such as MySQL, MongoDB, Cassandra, and Hadoop.