7 Open Source Big Data Analytics and Storage Tools


    Over the past several years, open source developers have built a burgeoning ecosystem of data analytics and storage tools to cope with the data deluge. Here’s a look at seven of the most popular open source tools for big data storage and analytics.


  • Apache Hadoop

    Hadoop is probably the best-known open source platform for storing and processing large amounts of data across distributed clusters. It helped launch the open source big data revolution several years ago. Hadoop itself is developed under the Apache Software Foundation, but a variety of Hadoop distributions are available from big data vendors.
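
    Hadoop's processing layer is built around the MapReduce model, in which work is split into a map step, a grouping step, and a reduce step. The following is a minimal single-process sketch of that idea in plain Python. It does not use Hadoop's API; the function names are hypothetical, and each string in `chunks` merely stands in for a block of data that a real cluster would hold on a separate node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, standing in for the shuffle step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {word: sum(values) for word, values in grouped.items()}


def word_count(chunks):
    """Run map, shuffle, and reduce over all chunks in a single process."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: each string plays the role of one data block.
chunks = ["big data tools", "open source data tools", "big clusters"]
counts = word_count(chunks)
```

    On a real cluster, each call to `map_phase` could run on a different machine, which is what lets this pattern scale to large data sets.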

  • Elasticsearch

    If you want to search easily through large volumes of data, Elasticsearch is your answer (or one of them, at least). It provides full-text search across documents through an HTTP-based REST API. It’s not designed for the same use cases as platforms like Hadoop and Spark, but it’s an important open source data tool for organizations with a lot of information to sift through.
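
    Full-text search engines typically rely on an inverted index, which maps each term to the documents that contain it. The sketch below shows that general idea in plain Python. It is a toy illustration, not Elasticsearch's API or its actual implementation; the function names and sample documents are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents keyed by ID.
docs = {
    1: "Hadoop stores data on distributed clusters",
    2: "Spark processes data in memory",
    3: "Cassandra offers distributed storage",
}
index = build_index(docs)
hits = search(index, "distributed data")
```

    Because a lookup touches only the index entries for the query terms, the search does not need to scan every document.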

  • MongoDB

    NoSQL databases have emerged as a key part of the next-generation data storage and analytics ecosystem, and MongoDB is one of the most popular NoSQL solutions. By offering a more flexible storage schema than traditional relational databases such as MySQL, MongoDB and other NoSQL databases make it easier to work with large amounts of data that arrive in unpredictable formats. MongoDB is available in both community-supported and commercial flavors.
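
    To make the idea of a flexible schema concrete, here is a small sketch that models a collection as a list of Python dictionaries whose fields differ from record to record. It does not use MongoDB or its driver; the `find` helper and the sample events are hypothetical and only imitate the style of querying by field values.

```python
def find(collection, query):
    """Return the documents whose fields equal every key/value pair in `query`."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in query.items())
    ]


# Hypothetical events: no two documents share exactly the same set of fields.
events = [
    {"type": "click", "page": "/home", "user": "ana"},
    {"type": "purchase", "user": "ana", "items": ["book", "pen"], "total": 23.5},
    {"type": "click", "page": "/pricing"},
]

clicks = find(events, {"type": "click"})
ana_events = find(events, {"user": "ana"})
```

    A relational table would need every column declared up front; here, a new kind of event can carry new fields without changing anything that is already stored.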

  • Apache Hive

    Originally developed by Facebook, Hive is now an Apache project that adds data analytics functionality on top of Hadoop. Using a SQL-like query language called HiveQL, analysts can work with data stored in Hadoop. Hive is designed to deliver faster data processing in certain situations thanks to metadata optimization and indexing. It also handles a wide variety of data formats.
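
    Because HiveQL closely resembles SQL, the kind of query an analyst might write can be sketched with Python's built-in `sqlite3` module as a stand-in. This example runs against an in-memory SQLite database, not Hive; the `page_views` table and its columns are hypothetical, and a real Hive query would instead run against tables backed by data in Hadoop.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("/home", 1), ("/home", 2), ("/pricing", 1), ("/home", 3)],
)

# An aggregate query of the general shape an analyst might write in HiveQL.
query = """
SELECT page, COUNT(*) AS views
FROM page_views
GROUP BY page
ORDER BY views DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
```

    The point of the sketch is the query style: analysts describe the result they want declaratively, and the engine decides how to compute it.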

  • Apache Spark

    Spark addresses a core piece of the data analytics puzzle by speeding up data processing in clustered environments. Hadoop supports clusters, too, but Spark offers a more flexible data processing framework, which can optionally keep data in memory within distributed environments. The result is data analytics that can be up to one hundred times faster than Hadoop when done in memory, according to Spark developers.

  • Apache Flink

    Flink, also an Apache project, offers an alternative to platforms like Hadoop. It’s a newer technology whose main advantage is simplified data processing. Data analysts can build Flink pipelines using Java or Scala, and Flink handles the compilation and optimization automatically. Flink’s main drawback for some use cases is that, unlike Hadoop, it does not couple storage with data processing. It provides only the latter; data storage has to be handled by a separate platform.

  • Apache Cassandra

    It may not win any awards for having the best logo, but Cassandra, yet another Apache project, is a handy solution for organizations or programmers in need of NoSQL-style storage. It’s also designed for massively distributed storage environments, even ones that stretch across multiple data centers.
