[Nightingale Monitoring] Neptune: Categraf

Has anyone else run into the same frustration as me? When I build a monitoring system with Prometheus, every component that needs monitoring requires its own exporter. Ten components means ten exporters. Even setting aside their quality (most exporters are written by community members), the learning, deployment and maintenance costs are a headache. Is there a single component that can collect most metrics? Categraf is exactly such a collector. Surprised? Delighted? What is Categraf? Categraf is a monitoring and collection agent, similar to

Jun 09, 2023 09:18 AM
Monitoring, Data collection
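The "one agent instead of one exporter per component" idea described above can be illustrated with a minimal Python sketch of a plugin-based collector. Everything in it is hypothetical: the `Agent` class, the `register`/`collect` methods, and the two plugins that return fixed values are illustrative assumptions. None of it is Categraf's actual code or configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One time-series data point: metric name, value and labels."""
    name: str
    value: float
    labels: Dict[str, str] = field(default_factory=dict)


# Hypothetical plugin type: an input plugin is a function returning samples.
InputPlugin = Callable[[], List[Sample]]


class Agent:
    """A single (hypothetical) agent that runs every registered plugin in one pass."""

    def __init__(self) -> None:
        self.plugins: Dict[str, InputPlugin] = {}

    def register(self, name: str, plugin: InputPlugin) -> None:
        self.plugins[name] = plugin

    def collect(self) -> List[Sample]:
        samples: List[Sample] = []
        for name, plugin in self.plugins.items():
            for s in plugin():
                # Tag each sample with the plugin that produced it.
                s.labels.setdefault("plugin", name)
                samples.append(s)
        return samples


# Hypothetical plugins returning fixed values, for illustration only.
def mysql_plugin() -> List[Sample]:
    return [Sample("mysql_up", 1.0)]


def redis_plugin() -> List[Sample]:
    return [Sample("redis_connected_clients", 12.0)]


agent = Agent()
agent.register("mysql", mysql_plugin)
agent.register("redis", redis_plugin)
for s in agent.collect():
    print(s.name, s.labels, s.value)
```

In this design, monitoring one more component means registering one more plugin in the same process. You do not have to deploy and maintain yet another exporter.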
[Nightingale Monitoring] Alarm management, great!

Monitoring is the method, alerting is the means, and resolving problems is the goal. Have you ever run into this kind of confusion? I collected a lot of metrics, but I didn't know which of them should trigger alerts. I also didn't know how to route those alerts to the right teams or individuals, or how to escalate them. When I used Prometheus + Alertmanager, I created a DingTalk group for each team, added a bunch of labels, and matched different labels to send alerts to different groups. To escalate an alert, I usually relied on threshold-based escalation, but escalating the same alert over time was difficult. Nightingale's alert rule management is much simpler (it handles the complicated parts for you), and it is also very elegant.

Jun 09, 2023 08:31 AM
Nightingale, Nightingale Monitoring
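This excerpt describes two ideas: routing alerts to teams by matching labels, and escalating the same alert as time passes. Both can be sketched in a few lines of Python. The sketch is an illustrative assumption only. It is not how Alertmanager or Nightingale implement routing or escalation, and the receiver names and time thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Route:
    matchers: Dict[str, str]  # every label listed here must match exactly
    receiver: str             # e.g. a team's chat group (hypothetical name)


def route_alert(labels: Dict[str, str], routes: List[Route], default: str) -> str:
    """Return the receiver of the first route whose matchers all match."""
    for r in routes:
        if all(labels.get(k) == v for k, v in r.matchers.items()):
            return r.receiver
    return default


# Hypothetical time-based escalation: (minutes unresolved, receiver), ascending.
ESCALATION: List[Tuple[int, str]] = [(0, "on-call"), (30, "team-lead"), (60, "manager")]


def escalate(minutes_unresolved: int, levels: List[Tuple[int, str]] = ESCALATION) -> str:
    """Return the highest escalation level whose time threshold has been reached."""
    target = levels[0][1]
    for threshold, receiver in levels:
        if minutes_unresolved >= threshold:
            target = receiver
    return target


routes = [
    Route({"team": "db"}, "dingtalk-db-group"),
    Route({"team": "web"}, "dingtalk-web-group"),
]
print(route_alert({"team": "db", "severity": "critical"}, routes, "dingtalk-default"))
print(escalate(45))
```

Time-based escalation keys on how long the same alert has stayed unresolved. Threshold-based escalation, by contrast, depends on a metric crossing a higher value, so it cannot escalate an alert whose value stays the same but which nobody has handled.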
Business is growing exponentially: can availability engineering keep things this stable?

1. Problems and Challenges. As the figure shows, vivo's machine fleet and number of services have grown significantly since 2017. From 2017 to 2022 the number of machines grew roughly fivefold, and the number of services grew more than tenfold. As scale grows, challenges and complexity inevitably increase. The typical challenges at vivo fall into two categories: change challenges and failure challenges. 1. Change challenges: some changes are still made manually; a single release takes a relatively long time; and there are many large-scale business migrations. Google SRE holds that roughly 70% of outages are caused by changes. This holds true at vivo as well: changes do affect online stability.

Jun 09, 2023 12:17 AM
Business, Failure, Exponential growth
Seven places to conduct automated security testing

Personally, I like DevSecOps, where security teams weave security into every process that Dev and Ops perform. Because I am passionate about it, clients often ask me when, how and where to inject various types of testing and other security activities. Below is the list of automated-testing options I offer clients. There is much more security work to do in DevOps; this list covers only automated testing. Together we analyze the list, decide where each option makes the most sense given their current state, and select tools based on their current focus. Seven Places for Automated Testing 1. In the Integrated Development Environment: tools that check your code almost like a spell checker (I'm not sure what this category is called; it is sometimes called SAST) Agent Management and Dependencies

Jun 09, 2023 12:07 AM
Security, Automation
[Nightingale Monitoring] The Swiss Army knife for extracting metrics from logs

mtail is an open-source tool from Google that extracts metrics from application logs. It reads application logs in real time, parses them with programs that you write yourself, and produces time-series metrics. The project is at https://github.com/google/mtail. Nightingale's Categraf also uses mtail to collect metrics from logs, but with some optimizations, which we will explain step by step. We start with Google's mtail itself and then move on to Nightingale's mtail plugin. Installing mtail: I gave a brief introduction to mtail earlier, and there isn't much more to say, so let's go straight to installation. from ht

Jun 08, 2023 09:48 PM
Nightingale
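The pipeline described above (read log lines, match them against a user-written pattern, update time-series counters) can be approximated in Python. This is a hedged analogue, not mtail itself. Real mtail programs are written in mtail's own language and tail log files continuously, whereas this sketch processes an in-memory list. The log format and metric names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical access-log format: "<method> <path> <status> <latency_ms>"
LINE_RE = re.compile(
    r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<latency>\d+)$"
)


def extract_metrics(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Turn log lines into a per-status request counter and a latency sum."""
    requests_total: Counter = Counter()
    latency_ms_sum = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that don't match the pattern
        requests_total[m.group("status")] += 1
        latency_ms_sum += int(m.group("latency"))
    return requests_total, latency_ms_sum


def to_prometheus_text(requests_total: Counter, latency_ms_sum: int) -> str:
    """Render the metrics in Prometheus text exposition style."""
    out = []
    for status, count in sorted(requests_total.items()):
        out.append(f'http_requests_total{{status="{status}"}} {count}')
    out.append(f"http_request_latency_ms_sum {latency_ms_sum}")
    return "\n".join(out)


logs = ["GET /api 200 12", "POST /login 500 80", "GET /api 200 8", "not a request line"]
counts, total = extract_metrics(logs)
print(to_prometheus_text(counts, total))
```

Lines that do not match the pattern are silently skipped. A real deployment would want to know how many lines were skipped, for example by keeping a separate counter for them.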
Smooth operation and maintenance, an iron pot

On June 5, Vipshop released its report on the outage of March 29, 2023. A failure of the refrigeration system at its Nansha IDC took the Vipshop online mall out of service, causing losses of hundreds of millions (as a humble ops engineer, I'm trembling). For Vipshop, the online mall is the core business entry point. Failures are inevitable, but an outage this long is unacceptable. Why did this happen? In the eyes of ordinary ops engineers like us, an accident like this should not happen at a company of that size. We all look for ways to run operations by imitating and learning from their PPTs. Yet however advanced those PPTs are, they cannot prevent failures. Why is this? Let me venture a few guesses: PPT ≠ reality? Fault drills = going through the motions? Live a good life, just talk

Jun 08, 2023 09:24 PM
Operations, Material resources, Manpower
Zuoyebang's Nie An: How should operation and maintenance transform? Hear Zuoyebang's OPaS ideas

In the first issue, Boss Yang Jingjing shared many interesting opinions. Some readers commented that it read like a guide to talking people out of an ops career. Ha, the guest in this issue has different views. Please keep an open mind, listen to a hundred schools of thought, and make your own career and life plans. As the saying goes, listen to both sides and you will be enlightened; believe only one and you will be left in the dark. If you only listen to what pleases your ears, you will most likely miss out on deep thinking and the clash of ideas, which would be a pity. This is the second issue of the down-to-earth yet high-level "Operation and Maintenance Forum". Let's begin! Guest Introduction: In this issue, we invited Nie An, head of operation and maintenance at Zuoyebang. Nie An is an industry veteran who has worked at Alibaba, Xiaomi, Didi and Zuoyebang. He has more than 10 years of experience in operation and maintenance, R&D and management. Brief summary of key points: Traditional operation and maintenance is responsible for assembling industrial products into services, delivering them to users, and keeping those services running; special

Jun 08, 2023 09:12 PM
Operations
Computer Tips: Introduction to the ClipboardMaster clipboard enhancement tool

Today I will introduce another clipboard enhancement tool: ClipboardMaster. 1. Software Introduction: ClipboardMaster can paste multiple items at once or paste just part of a single item. It can also search the clipboard within a range and be operated with both mouse and keyboard. With configurable hotkeys, ClipboardMaster makes pasting easy. It can even paste into temporary edit fields, such as a file name being renamed in Windows Explorer. Clipboard contents are preserved even after Windows restarts. Official website: https://www.clipboardmaster.com/ 2. Feature list: Text modules

Jun 08, 2023 08:48 PM
Clipboard enhancement tool, Windows
What capabilities should PG database operation and maintenance tools cover?

Before the holidays, I worked with the PG China community on an online live broadcast about using D-SMART to operate and maintain PG databases. One of my clients in the financial industry happened to watch it and called me to chat. They are selecting a database under the Xinchuang (IT localization) program. After trying several domestic databases, they have decided to go with TDSQL. I was a little surprised at the time. They had been evaluating domestic databases since 2020, and their initial experience with TDSQL did not seem very good. After talking further, I learned that when they first tried TDSQL's distributed database, they found its demands on development too high. They therefore switched entirely to TDSQL's centralized MySQL instances, which they found very easy to use. The entire database cloud

Jun 08, 2023 06:56 PM
Operations, Database, PG
The ops team asked me to speed up SpringBoot startup, and this is what I did!

SpringBoot is undoubtedly the most widely used framework for Java back-end development. Built around it is a complete toolchain and a wide range of starters, so for everyday business development virtually every wheel you need already exists. However, with the spread of microservices and the arrival of the cloud-native era, SpringBoot applications have exposed some problems. The most prominent are slow startup and high memory usage. Cloud-native applications have relatively high startup-speed requirements. When scaling out horizontally, new instances must start quickly enough to handle new requests as soon as possible. Cloud-native applications also need to run with as few resources as possible. Reducing the resources used by each instance as much as possible means that

Jun 08, 2023 06:52 PM
Memory, Operations
An in-depth explanation of the technology operations metrics system

Introduction: When it comes to technology operations metrics, every technologist can name a few, such as transaction volume, response time, response rate and success rate. Each of these metrics quantitatively assesses the work in one area of operations. To evaluate the overall level of technology operations, however, you need a metrics system for technology operations. It provides a holistic picture, and that information can then drive the development of operations and the achievement of organizational goals. Construction Goals and Positioning: Bank G has built a technology operations observability metrics system. It gives managers a multi-dimensional, fine-grained analytical framework for operations management. The bank uses it as a starting point to improve the center's operations management capabilities, decision-making and service quality. The metrics system follows four principles: it is quantifiable, comparable, action-oriented and adaptable to multiple scenarios, with a focus on actual business scenarios.

Jun 08, 2023 06:43 PM
Information, Metrics, Technology operations
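The per-area metrics named above (transaction volume, response time, response rate, success rate) each reduce to a simple count or ratio, as this Python sketch shows. The definitions are assumptions for illustration, since the excerpt does not give Bank G's definitions. Response rate is taken as responded transactions over all transactions. Success rate is successful transactions over all transactions. Average response time covers only transactions that received a response.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transaction:
    responded: bool               # did the system return any response?
    succeeded: bool               # did the transaction complete successfully?
    response_ms: Optional[float]  # None if there was no response


def summarize(txns: List[Transaction]) -> dict:
    """Compute volume, response rate, success rate and mean response time."""
    total = len(txns)
    if total == 0:
        return {"volume": 0, "response_rate": 0.0,
                "success_rate": 0.0, "avg_response_ms": 0.0}
    responded = [t for t in txns if t.responded]
    times = [t.response_ms for t in responded if t.response_ms is not None]
    return {
        "volume": total,
        "response_rate": len(responded) / total,
        "success_rate": sum(t.succeeded for t in txns) / total,
        "avg_response_ms": sum(times) / len(times) if times else 0.0,
    }


txns = [
    Transaction(True, True, 100.0),
    Transaction(True, False, 300.0),
    Transaction(False, False, None),
    Transaction(True, True, 200.0),
]
print(summarize(txns))
```

A metrics system of the kind the article describes would combine many such per-area figures. The point of the sketch is only that each figure needs an explicit, agreed definition before it becomes comparable across areas.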
Flashcat's Lai Wei: How to keep an operation and maintenance job secure

The first issue of the forum, "Jingyuan - Operation and Maintenance Geometry", and Ma Chi's recent article "It's time to lay off the operation and maintenance collective" have caused widespread discussion in the industry. Is there really no future for operation and maintenance positions? How can you keep your job secure? In this issue, we interviewed Lai Wei from Kuaimao Nebula (Flashcat). Lai Wei is an entrepreneur who broke out of the ops circle. Since he has started his own business, he must have deep industry experience. How does he view this problem? Let's listen to a fresh voice together! This is the third issue of the down-to-earth yet high-level "Operation and Maintenance Forum". Let's begin! Could you tell us about yourself and your current company? Hello everyone, I am Lai Wei from Kuaimao Nebula. Kuaimao Nebula is a cloud-native intelligent operation and maintenance technology company, formed by the core development team of the open source monitoring tool "Nightingale Monitor"

Jun 08, 2023 06:42 PM
Operations, Flashcat
How to practice the 'four kings' of self-revolution

The Operation and Maintenance Forum invites veterans of the field to share in-depth insights through interviews and articles. The ideas collide, with the aim of forming some advanced consensus and helping the industry move forward. In this issue, we invited Wang Mingsong. Boss Wang proposed the "Four Rules" for cloud-native application practice, which are widely recognized in the industry. Starting in 2019, all of the IDC business at Boss Wang's company was moved to the cloud. The scale is not small, yet the SRE team is very small, a bit like Netflix. In this issue, let's look at how a senior cloud operations engineer works. This is the 7th issue of the down-to-earth yet high-level "Operation and Maintenance Forum". Let's begin! Question Preview: I first met Boss Wang through a discussion in a WeChat group, where he proposed four cloud-native application practices.

Jun 08, 2023 06:21 PM
Operations
LocalSend: Free file transfer tool, supports all platforms

Overview: LocalSend is a LAN-based file transfer tool that supports Windows, macOS, Android, iPhone and Linux. It helps users transfer files quickly and securely within the same local area network. LocalSend offers a variety of powerful transfer features to meet the needs of different users. It also supports multiple formats, multiple languages, keyboard shortcuts and other features, which can greatly improve productivity. In this article, we will introduce each of LocalSend's features in detail. Main features: File transfer: LocalSend can quickly transfer files within the same LAN. Users only need to select the files to transfer, set the transfer target, and start

Jun 08, 2023 04:28 PM
LocalSend, File transfer tool