What is Elasticsearch? Where can Elasticsearch be used?

Elasticsearch Version: 5.4
Elasticsearch Quick Start Part 1: Getting Started with Elasticsearch
Elasticsearch Quick Start Part 2: Installing Elasticsearch and Kibana
Elasticsearch Quick Start Part 3: Elasticsearch index and document operations
Elasticsearch Quick Start Part 4: Elasticsearch document queries

Elasticsearch is a highly scalable, open-source full-text search and analytics engine. It lets you store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications with complex search features and requirements.

Elasticsearch can be used in these places:

Suppose you run an online store and want to let customers search for the products you sell. In this case, you can use Elasticsearch to store your entire product catalog and inventory, serve searches, and offer autocomplete suggestions.
Suppose you want to collect log or transaction data and analyze and mine it for trends, statistics, summaries, or anomalies. In this case, you can use Logstash (part of the Elasticsearch/Logstash/Kibana stack) to collect, aggregate, and parse your data, and then have Logstash feed this data into Elasticsearch. Once the data is in Elasticsearch, you can run searches and aggregations to mine any information that interests you.
Suppose you run a price-alerting platform that lets price-savvy customers specify a rule such as "I am interested in buying a specific electronic gadget, and I want to be notified if any vendor prices it below $X within the next month". In this case, you can push vendor prices into Elasticsearch and use its reverse-search (percolator) capability to match price movements against customer queries, notifying a customer as soon as a match is found.
Suppose you have analytics/business-intelligence needs and want to quickly investigate, analyze, visualize, and ask ad-hoc questions about a lot of data (think millions or billions of records). In this case, you can use Elasticsearch to store the data and then use Kibana (part of the Elasticsearch/Logstash/Kibana stack) to build custom dashboards that visualize the aspects of your data that matter to you. In addition, you can use Elasticsearch's aggregation functionality to run complex business intelligence queries against your data.
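The price-alert scenario relies on reverse search: instead of running one query over many stored documents, the customers' queries are stored and each new document (a vendor price) is run against them. The sketch below only illustrates that inversion in plain Python; it is not the Elasticsearch percolator API, and the `PriceAlert` type, its fields, and the sample data are hypothetical. The "within the next month" time window is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class PriceAlert:
    """A stored customer rule (hypothetical): notify when the product's price drops below max_price."""
    customer: str
    product: str
    max_price: float


def match_price(alerts, product, price):
    """Reverse search: run one incoming price against every stored rule and return matching customers."""
    return [a.customer for a in alerts if a.product == product and price < a.max_price]


alerts = [
    PriceAlert("alice", "headphones", 100.0),
    PriceAlert("bob", "headphones", 80.0),
    PriceAlert("carol", "camera", 300.0),
]

# A vendor lists headphones at 90: only Alice's rule (below 100) matches.
print(match_price(alerts, "headphones", 90.0))  # ['alice']
```

In Elasticsearch itself the stored rules are real queries and the matching is done by the server, so the rules can be as rich as any search query rather than a single price comparison.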

For the rest of this tutorial, I will guide you through getting Elasticsearch up and running and show you some basic operations such as indexing, searching, and modifying data. By the end of this tutorial, you should have a good idea of what Elasticsearch is and how it works, and hopefully you will be inspired to use it both to build sophisticated search applications and to discover useful insights in your data.

Basic Concepts

There are a few concepts at the core of Elasticsearch. Understanding them from the outset will greatly help the rest of the learning process.

Near Real Time (NRT)

Elasticsearch is a near real-time search platform. This means there is only a slight delay (usually 1 second) from the time a document is indexed to the time it becomes searchable.

Cluster

A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide indexing and search capabilities across all nodes. A cluster is identified by a unique name, which defaults to "elasticsearch". The name is important because a node can only belong to one cluster, and it joins a cluster by that cluster's name.

Do not reuse the same cluster name in different environments; otherwise nodes may join the wrong cluster. For example, you could use the cluster names logging-dev, logging-stage, and logging-prod for the development, staging, and production environments respectively.

Note that a cluster with only a single node is perfectly valid. You can also have multiple independent clusters, each with its own unique cluster name.

Node

A node is a single server that is part of a cluster, stores data, and participates in the cluster's indexing and search. Like a cluster, a node is identified by a name. By default, the name is a random UUID (Universally Unique IDentifier) assigned to the node at startup. You can set your own node name if you do not want the default. Names matter for administration, because they help you identify which servers correspond to which nodes in the cluster.

A node joins a specific cluster by being configured with that cluster's name. By default, every node is set up to join a cluster named elasticsearch, which means that if you start a number of nodes on your network and they can all discover each other, they will automatically form and join a single cluster named elasticsearch.

Index

An index is a collection of documents that share somewhat similar characteristics. For example, you can have one index for customer data, another for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when indexing, searching, updating, and deleting its documents. Within a single cluster, you can define as many indexes as you need.
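Since the server rejects invalid index names, it can help to check names on the client side first. The function below is a simplified, hypothetical check covering only a few commonly documented rules (all lowercase, no characters such as `\ / * ? " < > | , #` or spaces, no leading `-`, `_` or `+`, not `.` or `..`, at most 255 bytes); the authoritative rules are in the create-index documentation for your Elasticsearch version.

```python
INVALID_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name: str) -> bool:
    """Simplified check of some index-name rules (not the full server-side validation)."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():  # index names must be all lowercase
        return False
    if name[0] in "-_+":  # names may not start with these characters
        return False
    if any(ch in INVALID_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255  # length limit is measured in bytes


for candidate in ["customer", "Customer", "order-data", "_internal", "my index"]:
    print(candidate, is_valid_index_name(candidate))
```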

Type

Within an index, you can define one or more types. A type is a logical category/partition of your index whose semantics are entirely up to you. In general, a type is defined for documents that share a common set of fields. For example, suppose a blogging platform stores all of its data in a single index. In this index, you might define one type for user data, another for blog posts, and another for comments.
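To make the blogging example concrete, the dictionary below sketches a create-index body that declares three mapping types in one index, in the 5.x JSON shape. The index name `blogging` and all field names are hypothetical. Note that this multi-type layout applies to 5.x only: starting with Elasticsearch 6.0, a new index may have just one type, and types were later removed entirely.

```python
import json

# Create-index body with three mapping types in one index (valid in 5.x).
# Field names are hypothetical.
blog_index = {
    "mappings": {
        "user": {"properties": {"name": {"type": "text"}, "email": {"type": "keyword"}}},
        "blog": {"properties": {"title": {"type": "text"}, "user_id": {"type": "keyword"}}},
        "comment": {"properties": {"body": {"type": "text"}, "blog_id": {"type": "keyword"}}},
    }
}

# This body would be sent as: PUT /blogging  (hypothetical index name)
print(json.dumps(blog_index, indent=2))
```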

Document

A document is the basic unit of information that can be indexed. For example, you can have a document for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON. Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, it must be indexed/assigned to a type inside that index.
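In 5.x, a single document is addressed in the REST API by index, type, and id, for example `PUT /customer/external/1` with the JSON document as the request body. The sketch below only builds that path and body locally; the helper `document_path` and the customer fields are hypothetical, and no request is sent.

```python
import json


def document_path(index: str, doc_type: str, doc_id: str) -> str:
    """Build the 5.x-style REST path /{index}/{type}/{id} for a single document."""
    return f"/{index}/{doc_type}/{doc_id}"


# A customer document is just a JSON object (field names are hypothetical).
customer = {"name": "John Doe", "email": "john@example.com", "city": "Beijing"}

print("PUT", document_path("customer", "external", "1"))  # PUT /customer/external/1
print(json.dumps(customer))
```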

Shards & replicas

An index can potentially store more data than the hardware of a single node can handle. For example, a single index of 1 billion documents taking up 1 TB of disk space may not fit on the disk of a single node, and even if it does, it may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. When you create an index, you simply define the number of shards you want. Each shard is in itself a fully functional, independent "index" that can be hosted on any node in the cluster.

Sharding is important for two main reasons:

It allows you to horizontally split/scale your content volume.
It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thereby increasing performance/throughput.

The mechanics of how shards are distributed, and of how their documents are aggregated back into search results, are completely managed by Elasticsearch and are transparent to you as the user.

In a network/cloud environment where failures can be expected at any time, it is very useful, and highly recommended, to have a failover mechanism in case a shard/node somehow goes offline or disappears. To this end, Elasticsearch lets you make one or more copies of an index's shards, called replica shards, or replicas for short.

Replicas are important for two main reasons:

It provides high availability in case a shard/node fails. For this reason, a replica shard is never allocated on the same node as the original/primary shard it was copied from.
It allows you to scale out your search volume/throughput, since searches can be executed on all replicas in parallel.
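The rule that a replica never shares a node with its primary can be illustrated with a toy allocator. This is a hypothetical sketch, not Elasticsearch's allocation algorithm (the real allocator also balances disk usage, shard counts, and more): it spreads primaries round-robin and offsets each replica onto a different node, leaving a copy unassigned (`None`) when no other node is available.

```python
def allocate(num_primaries: int, num_replicas: int, nodes: list[str]) -> dict:
    """Toy allocator: spread primaries round-robin and put each replica on a different node.

    A copy that has no free node (every node already holds a copy of that shard)
    is left unassigned (None), so a replica never shares a node with its primary.
    """
    n = len(nodes)
    layout = {}
    for shard in range(num_primaries):
        primary_node = shard % n
        copies = [nodes[primary_node]]
        for r in range(1, num_replicas + 1):
            # Offset by r so each copy of this shard lands on a distinct node; only possible while r < n.
            copies.append(nodes[(primary_node + r) % n] if r < n else None)
        layout[shard] = copies  # [primary, replica1, replica2, ...]
    return layout


print(allocate(5, 1, ["node-1", "node-2"]))
print(allocate(5, 1, ["node-1"]))  # every replica stays unassigned on a one-node cluster
```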

To summarize, each index can be split into multiple shards. An index can also be replicated zero times (meaning no replicas) or more times. Once replicated, each index has primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards afterwards.
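One way to see why the shard count is fixed: Elasticsearch decides which shard holds a document with a formula of the form `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document id. The sketch below uses `zlib.crc32` as a stand-in for Elasticsearch's internal hash function (an assumption made only for illustration) and counts how many documents would map to a different shard if the primary count changed from 5 to 6.

```python
import zlib


def route(doc_id: str, num_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards, with _routing defaulting to the document id.

    zlib.crc32 stands in for Elasticsearch's internal hash; only the modulo idea matters here.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


ids = [f"doc-{i}" for i in range(1000)]
with_5 = {i: route(i, 5) for i in ids}
with_6 = {i: route(i, 6) for i in ids}

# Changing the primary-shard count changes where most existing documents would be looked up.
moved = sum(with_5[i] != with_6[i] for i in ids)
print(f"{moved} of {len(ids)} documents map to a different shard")
```

With a roughly uniform hash, only about one document in six keeps the same shard number, so existing documents would be looked up on the wrong shard. Replicas do not take part in this formula, which is why their number can change freely.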

By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (one complete replica), for a total of 10 shards per index.
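The shard arithmetic is `total = primaries * (1 + replicas)`. The sketch below computes it and builds the index settings that make the 5.x defaults explicit at creation time, plus a settings update that changes only the replica count. The bodies use the `number_of_shards` and `number_of_replicas` settings; the target index name is left as a placeholder and nothing is sent to a cluster.

```python
import json


def total_shards(primaries: int, replicas: int) -> int:
    """Each primary has `replicas` copies, so total = primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


# 5.x defaults: 5 primary shards, 1 replica of each.
print(total_shards(5, 1))  # 10

# Settings body that fixes these values at index-creation time (PUT /<index>).
create_body = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
print(json.dumps(create_body))

# Only the replica count can change later (PUT /<index>/_settings).
update_body = {"index": {"number_of_replicas": 2}}
print(total_shards(5, 2))  # 15
```

On a single-node cluster the 5 replica shards still count toward the total of 10, but they stay unassigned because a replica cannot live on the same node as its primary.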

Each Elasticsearch shard is a Lucene index. There is a maximum number of documents a single Lucene index can hold: as of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
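The per-shard limit follows directly from Java's `Integer.MAX_VALUE` (2^31 - 1). The sketch below reproduces the number and derives the smallest primary-shard count that keeps every shard under it, assuming documents spread evenly across shards. Treat this as a hard upper bound only, not as a sizing recommendation; the helper `min_primary_shards` is hypothetical.

```python
import math

# Per-shard document limit cited above (LUCENE-5843): Integer.MAX_VALUE - 128.
INT_MAX = 2**31 - 1
MAX_DOCS_PER_SHARD = INT_MAX - 128
print(MAX_DOCS_PER_SHARD)  # 2147483519


def min_primary_shards(expected_docs: int) -> int:
    """Smallest primary-shard count that keeps every shard under the hard limit, assuming even distribution."""
    return max(1, math.ceil(expected_docs / MAX_DOCS_PER_SHARD))


print(min_primary_shards(10_000_000_000))  # 5
```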

Summary

1. Why not use a relational database for search? Because implementing search in a relational database performs poorly, and the database cannot segment text into words (tokenize it) for full-text search.

2. What are full-text search, the inverted index, and Lucene? These topics have already been well summarized elsewhere; please refer to "[Teaching you step-by-step full-text retrieval] A preliminary exploration of Apache Lucene".
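The core idea of an inverted index can be shown with a toy example: rather than scanning every row the way a relational `LIKE '%...%'` query would, text is split into terms (word segmentation), and each term maps to the ids of the documents that contain it. This is a simplified illustration only; Lucene's analyzers, posting lists, and relevance scoring are far more elaborate, and the sample documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Very simple analyzer: lowercase, then split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query: str) -> set[int]:
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "Lucene is a search library",
    3: "MySQL is a relational database",
}
idx = build_inverted_index(docs)
print(sorted(idx["search"]))         # [1, 2]
print(search(idx, "search engine"))  # {1}
```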

3. Characteristics of Elasticsearch

It can be deployed as a distributed cluster and process massive amounts of data in near real time;
It is easy to use out of the box, and if the data volume is small, operating it is not complicated;
It offers features that relational databases lack, such as full-text search, synonym handling, relevance ranking, complex data analysis, and near-real-time processing of massive data;
It is based on Lucene but hides Lucene's complexity, exposing a simple, easy-to-use RESTful API and a Java API.

4. Core concepts of Elasticsearch

Cluster: a cluster contains multiple nodes; which cluster a node belongs to is determined by configuration (the default cluster name is elasticsearch).
Node: a single member of the cluster. By default, a node automatically joins the cluster named "elasticsearch". Each running Elasticsearch process is a node; for example, if one machine runs two Elasticsearch instances, there are two nodes.
Index: roughly equivalent to a MySQL database; it contains a set of documents with a similar structure.
Type: roughly equivalent to a MySQL table; a logical category of data within an index.
Document: roughly equivalent to a row in a MySQL table; the smallest unit of data in Elasticsearch.
Shard: a single machine cannot store an unlimited amount of data, so Elasticsearch can split the data of one index into multiple shards and distribute them across multiple servers.
Replica: a copy of a shard that protects against node failure and shard loss. Because a replica is never placed on the same node as its primary, the minimal high-availability setup is 2 servers.
