Category: Idle Chatter

  • 20240915 Is Bebinca Coming?

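    Today is Sunday, the first day of the Mid-Autumn Festival holiday, and the day Typhoon Bebinca is due to make landfall in Shanghai.

    With nothing to do in the afternoon, I hopped on a bicycle to go see Transformers One. The cinema is 12 km from my home, so in other words, the movie was really an excuse for the ride.

    There were few people or cars on the road. It wasn't hot, but with the typhoon bringing alternating sun and rain, the humidity was high and the air felt distinctly sticky.

    The lantern trees on both sides of the road were in full bloom: cluster after cluster of bright yellow, scattered high and low among the green leaves, a sign that autumn is deepening.

    About the movie: very disappointing. I don't expect much of a plot from this kind of film, but as an origin story, couldn't you at least put some effort into the writing? The whole plot is a muddle. Megatron and Optimus Prime are best buddies who then inexplicably turn on each other, and on top of that, the Transformers on Cybertron look much the same as they do on Earth, a premise I can't quite accept logically.

    When I came out of the cinema, it had already poured outside. Of course I had left home prepared to ride back in the rain, so I simply unlocked a blue shared bike and set off. The ride back was slower, partly because of the on-and-off rain and partly because I kept stopping to take photos.

    By the way, the sky on a typhoon day has a very distinctive color: purple.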

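    Orange alert

    The golden rain trees along the road are in full bloom; it looks like the soapberries will be ready to pick before long.

    Petals covering the ground

    The sand barges have mostly stopped sailing too, waiting out the typhoon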

  • 20240726 Visiting the Innovation Center

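    This Friday also happens to be the opening day of the Paris 2024 Olympics~

    So first, my impressions of the Paris opening ceremony: thumbs up. It managed something rare, genuine innovation, and it is one of the more memorable opening ceremonies of recent Games for me. From the hermit (knight?) who linked up historic buildings across the city to the flying horse on the Seine at the end, it made me feel as if I were inside Assassin's Creed and The Da Vinci Code. Nice!

    The weather did not cooperate and it rained throughout, but the rain also gave the visuals a distinctive charm. At the end, Celine Dion, who suffers from stiff person syndrome, sang from the tower and brought the atmosphere to its peak, as the Olympic flame rose into the sky with the hot-air balloon.

    Good luck to the Olympic athletes, and good luck to the national team!

    The start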

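    Back to the topic of today's weekly journal. Medtronic has two innovation centers in China. The one in Shanghai was established in 2005 and is mainly dedicated to promoting new therapies, along with its own full suite of products, to the medical community. The building houses a simulated ICU and real animal operating rooms.

    This time we observed a heart transplant operation, and the "patient" was a piglet. As a layperson entering an operating room to watch up close for the first time, I arrived full of excitement.

    After changing into the gear and standing at the operating room door, I could already hear the regular "breathing" of the monitor. I gently pushed the door open: against a dim blue background, a few bright surgical lights lit up the star of the day. There were six doctors around the operating table. Each had a role, but there was no mistaking which one was the lead surgeon.

    The digital clock in the operating room ticked on second by second, neither fast nor slow, and felt exceptionally rigorous and precise. Suddenly I heard a doctor loudly call out the current time. What? Going by my years of movie-watching experience, was that a declaration of death? No, declaring death should surely be the lead surgeon's prerogative. Then I saw this doctor walk around to my side, syringe in hand, and inject some liquid into the cardiopulmonary bypass machine beside the table. He must be the anesthesiologist, I thought.

    The operating room as a whole can host six operations at once. Apart from the transplant, which took up two tables, two other groups of doctors were studying and practicing different procedures today. I went over for a curious look. One was a lobectomy: two doctors deftly steered an endoscope through the body, searching for suspicious targets like hunters waiting for their moment. The group across the aisle looked more relaxed. They were performing a bowel-related operation, and I stared at the high-definition monitor, wondering what mysterious objects they might find.

    Just then, the sound of a defibrillator discharge came from the transplant table in the distance, and I followed it over. The lead surgeon's face was somewhat grave, both hands still holding the defibrillator paddles. Pop, another shock, and the piglet's whole body jolted. Then the doctors all seemed to relax at once, a few murmurs passed between them, and the lead surgeon put away the paddles with a hint of a smile. It looked like the heart had started beating again.

    Afterwards I chatted with the lady who manages the facility and learned that each piglet can take part in only one operation. When it is over, a rite is held for its soul, in gratitude for its contribution to the progress of human medicine.

    Progress in human medicine has actually been very slow. CCTV has a documentary called 《手术两百年》 (Two Hundred Years of Surgery). The innovations and developments it covers all advanced alongside modern technology, while our understanding and study of the human body itself only gradually deepened and accumulated after World War I. There are still far too many unknowns left for us to explore.

    What kind of world do we actually live in, and what kind of existence are we? Perhaps this question can never be fully answered; it may not even "have" an answer.

    Good luck to us, and good luck to humanity.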

  • 20240719 Global IT Outage

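    This day will go down in history, and it will make us rethink the security of SaaS.

    July 19, a calm and sunny Friday. I had taken the day off to recuperate in my hometown.

    Just after noon, reports began trickling into our WeChat group of one computer after another hitting a blue screen and rebooting, over and over... Then our servers started failing and the whole system ground to a halt. Then neighboring teams' systems went down one by one, until every system in the company was down... A look online showed that Windows systems all over the world had gone blue...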

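    As the scope of the outage grew, the cause gradually became clear: every Windows system with CrowdStrike's anti-intrusion software installed was hit that day, without exception. By the next day, a C++ expert online had analyzed the stack trace and pointed out that the problem was an unchecked null pointer dereference in the code, a basic defect. That broadly agrees with the official preliminary report released this week, although the report itself describes the fault as an out-of-bounds memory read.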

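    According to the report, the defective code had existed for some time and was simply waiting for the right content update package to arrive and trigger it. So here our armchair conspiracy theory can begin: in such a massive global incident, organizations in every industry that take cybersecurity extremely seriously were all hit. So, was this an "accident" deliberately engineered to cover up someone's quiet maneuvering? It is a familiar Hollywood recipe, but art always comes from life, and life keeps borrowing from art, doesn't it?

    The official incident report: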

    Falcon Content Update Remediation and Guidance Hub | CrowdStrike

    Preliminary Post Incident Review

    Content Configuration Update Impacting the Falcon Sensor and the Windows Operating System (BSOD)


    This is CrowdStrike’s preliminary Post Incident Review (PIR). We will be detailing our full investigation in the forthcoming Root Cause Analysis that will be released publicly. Throughout this PIR, we have used generalized terminology to describe the Falcon platform for improved readability. Terminology in other documentation may be more specific and technical.

    What Happened?

    On Friday, July 19, 2024 at 04:09 UTC, as part of regular operations, CrowdStrike released a content configuration update for the Windows sensor to gather telemetry on possible novel threat techniques.

    These updates are a regular part of the dynamic protection mechanisms of the Falcon platform. The problematic Rapid Response Content configuration update resulted in a Windows system crash.

    Systems in scope include Windows hosts running sensor version 7.11 and above that were online between Friday, July 19, 2024 04:09 UTC and Friday, July 19, 2024 05:27 UTC and received the update. Mac and Linux hosts were not impacted.

    The defect in the content update was reverted on Friday, July 19, 2024 at 05:27 UTC. Systems coming online after this time, or that did not connect during the window, were not impacted.

    What Went Wrong and Why?

    CrowdStrike delivers security content configuration updates to our sensors in two ways: Sensor Content that is shipped with our sensor directly, and Rapid Response Content that is designed to respond to the changing threat landscape at operational speed.

    The issue on Friday involved a Rapid Response Content update with an undetected error.

    Sensor Content
    Sensor Content provides a wide range of capabilities to assist in adversary response. It is always part of a sensor release and not dynamically updated from the cloud. Sensor Content includes on-sensor AI and machine learning models, and comprises code written expressly to deliver longer-term, reusable capabilities for CrowdStrike’s threat detection engineers.

    These capabilities include Template Types, which have pre-defined fields for threat detection engineers to leverage in Rapid Response Content. Template Types are expressed in code. All Sensor Content, including Template Types, goes through an extensive QA process, which includes automated testing, manual testing, validation and rollout steps.

    The sensor release process begins with automated testing, both prior to and after merging into our code base. This includes unit testing, integration testing, performance testing and stress testing. This culminates in a staged sensor rollout process that starts with dogfooding internally at CrowdStrike, followed by early adopters. It is then made generally available to customers. Customers then have the option of selecting which parts of their fleet should install the latest sensor release (‘N’), or one version older (‘N-1’) or two versions older (‘N-2’) through Sensor Update Policies.

    The event of Friday, July 19, 2024 was not triggered by Sensor Content, which is only delivered with the release of an updated Falcon sensor. Customers have complete control over the deployment of the sensor — which includes Sensor Content and Template Types.

    Rapid Response Content
    Rapid Response Content is used to perform a variety of behavioral pattern-matching operations on the sensor using a highly optimized engine. Rapid Response Content is a representation of fields and values, with associated filtering. This Rapid Response Content is stored in a proprietary binary file that contains configuration data. It is not code or a kernel driver.

    Rapid Response Content is delivered as “Template Instances,” which are instantiations of a given Template Type. Each Template Instance maps to specific behaviors for the sensor to observe, detect or prevent. Template Instances have a set of fields that can be configured to match the desired behavior.

    In other words, Template Types represent a sensor capability that enables new telemetry and detection, and their runtime behavior is configured dynamically by the Template Instance (i.e., Rapid Response Content).
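
    (A side note of my own, not part of the report.) To make the split between Template Types and Template Instances concrete for myself, here is a minimal Python sketch. The class layout, the field names, the "*" wildcard and the matches method are all my own assumptions for illustration; the report does not describe the real format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateType:
    """Ships inside a sensor release: names the fields an instance may set."""
    name: str
    fields: tuple[str, ...]


@dataclass(frozen=True)
class TemplateInstance:
    """Delivered as Rapid Response Content: plain data, no code of its own."""
    template: TemplateType
    values: dict[str, str]

    def matches(self, event: dict[str, str]) -> bool:
        # An event matches when every configured field equals the event's value.
        # "*" is a hypothetical wildcard meaning "any value".
        return all(
            want == "*" or event.get(field) == want
            for field, want in self.values.items()
        )


# Hypothetical field names and values, for illustration only.
ipc = TemplateType(name="IPC", fields=("pipe_name", "process"))
rule = TemplateInstance(ipc, {"pipe_name": "evil_pipe", "process": "*"})

print(rule.matches({"pipe_name": "evil_pipe", "process": "a.exe"}))  # True
print(rule.matches({"pipe_name": "ok_pipe", "process": "a.exe"}))    # False
```

    The point of the design, as I read it, is that new rules can be pushed as data without shipping new sensor code, which is also why a bad piece of data can reach every sensor so quickly.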

    Rapid Response Content provides visibility and detections on the sensor without requiring sensor code changes. This capability is used by threat detection engineers to gather telemetry, identify indicators of adversary behavior and perform detections and preventions. Rapid Response Content is behavioral heuristics, separate and distinct from CrowdStrike’s on-sensor AI prevention and detection capabilities.

    Rapid Response Content Testing and Deployment
    Rapid Response Content is delivered as content configuration updates to the Falcon sensor. There are three primary systems: the Content Configuration System, the Content Interpreter and the Sensor Detection Engine.

    The Content Configuration System is part of the Falcon platform in the cloud, while the Content Interpreter and Sensor Detection Engine are components of the Falcon sensor. The Content Configuration System is used to create Template Instances, which are validated and deployed to the sensor through a mechanism called Channel Files. The sensor stores and updates its content configuration data through Channel Files, which are written to disk on the host.

    The Content Interpreter on the sensor reads the Channel File and interprets the Rapid Response Content, enabling the Sensor Detection Engine to observe, detect or prevent malicious activity, depending on the customer’s policy configuration. The Content Interpreter is designed to gracefully handle exceptions from potentially problematic content.

    Newly released Template Types are stress tested across many aspects, such as resource utilization, system performance impact and event volume. For each Template Type, a specific Template Instance is used to stress test the Template Type by matching against any possible value of the associated data fields to identify adverse system interactions.

    Template Instances are created and configured through the use of the Content Configuration System, which includes the Content Validator that performs validation checks on the content before it is published.

    Timeline of Events: Testing and Rollout of the InterProcessCommunication (IPC) Template Type
    Sensor Content Release: On February 28, 2024, sensor 7.11 was made generally available to customers, introducing a new IPC Template Type to detect novel attack techniques that abuse Named Pipes. This release followed all Sensor Content testing procedures outlined above in the Sensor Content section.

    Template Type Stress Testing: On March 05, 2024, a stress test of the IPC Template Type was executed in our staging environment, which consists of a variety of operating systems and workloads. The IPC Template Type passed the stress test and was validated for use.

    Template Instance Release via Channel File 291: On March 05, 2024, following the successful stress test, an IPC Template Instance was released to production as part of a content configuration update. Subsequently, three additional IPC Template Instances were deployed between April 8, 2024 and April 24, 2024. These Template Instances performed as expected in production.

    What Happened on July 19, 2024?
    On July 19, 2024, two additional IPC Template Instances were deployed. Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data.

    Based on the testing performed before the initial deployment of the Template Type (on March 05, 2024), trust in the checks performed in the Content Validator, and previous successful IPC Template Instance deployments, these instances were deployed into production.

    When received by the sensor and loaded into the Content Interpreter, problematic content in Channel File 291 resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSOD).
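
    (Again my own note, not part of the report.) The report does not say what exactly the Content Validator missed, so the following Python sketch is only one plausible shape of such a bug: a validator that checks field indexes against the wrong limit, and an interpreter that would read past its inputs without a bounds check. The counts, names and functions are hypothetical.

```python
# Hypothetical numbers: the template type declares 3 fields,
# but the sensor only supplies 2 input values at runtime.
DECLARED_FIELDS = 3
SUPPLIED_INPUTS = ["pipe_name_value", "process_value"]


def buggy_validator(instance: list[int]) -> bool:
    """Checks indexes against the declared count, not what is really supplied."""
    return all(0 <= idx < DECLARED_FIELDS for idx in instance)


def strict_validator(instance: list[int], available: int) -> bool:
    """Rejects any instance that refers to an input index that does not exist."""
    return all(0 <= idx < available for idx in instance)


def interpret(instance: list[int], inputs: list[str]) -> list[str]:
    """Reads the inputs an instance refers to, refusing out-of-range indexes."""
    out = []
    for idx in instance:
        if not 0 <= idx < len(inputs):
            # Without this check, kernel-mode C/C++ would read invalid memory;
            # here we fail in a controlled way instead.
            raise ValueError(f"field index {idx} out of range")
        out.append(inputs[idx])
    return out


bad_instance = [0, 2]  # index 2 is declared by the type but never supplied
print(buggy_validator(bad_instance))                         # True
print(strict_validator(bad_instance, len(SUPPLIED_INPUTS)))  # False
```

    In a user-space program the failed read is just an error to handle. Inside a kernel component there is no such safety net, which is how a bad data file ends in a BSOD.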

    How Do We Prevent This From Happening Again?

    Software Resiliency and Testing

    • Improve Rapid Response Content testing by using testing types such as:
      • Local developer testing
      • Content update and rollback testing
      • Stress testing, fuzzing and fault injection
      • Stability testing
      • Content interface testing
    • Add additional validation checks to the Content Validator for Rapid Response Content. A new check is in process to guard against this type of problematic content from being deployed in the future.
    • Enhance existing error handling in the Content Interpreter.

    Rapid Response Content Deployment

    • Implement a staggered deployment strategy for Rapid Response Content in which updates are gradually deployed to larger portions of the sensor base, starting with a canary deployment.
    • Improve monitoring for both sensor and system performance, collecting feedback during Rapid Response Content deployment to guide a phased rollout.
    • Provide customers with greater control over the delivery of Rapid Response Content updates by allowing granular selection of when and where these updates are deployed.
    • Provide content update details via release notes, which customers can subscribe to.
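
    (My own note, not part of the report.) The staggered deployment idea above can be sketched in a few lines of Python. This is a toy model under my own assumptions: the staged_rollout function, the deploy callback, the stage fractions and the failure threshold are hypothetical and say nothing about how CrowdStrike actually implements it.

```python
def staged_rollout(hosts, stages, deploy, max_failure_rate=0.0):
    """Deploy to growing fractions of hosts; halt when a stage fails too often.

    Returns the list of hosts that received the update.
    """
    updated = []
    for fraction in stages:
        # Cumulative number of hosts that should be updated after this stage.
        target = min(len(hosts), max(1, round(len(hosts) * fraction)))
        batch = hosts[len(updated):target]
        if not batch:
            continue
        failures = sum(1 for host in batch if not deploy(host))
        updated.extend(batch)
        if failures / len(batch) > max_failure_rate:
            break  # halt: later stages never receive the update
    return updated


hosts = [f"host{i}" for i in range(100)]
stages = (0.01, 0.10, 1.00)  # canary first, then 10%, then everyone

# A bad update that crashes every host stops at the single canary.
print(len(staged_rollout(hosts, stages, deploy=lambda host: False)))  # 1

# A healthy update reaches the whole fleet.
print(len(staged_rollout(hosts, stages, deploy=lambda host: True)))   # 100
```

    With a canary stage, the July 19 update would in this toy model have taken down one machine instead of the whole fleet.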

     Updated 2024-07-24 2217 UTC

    Third Party Validation

    • Conduct multiple independent third-party security code reviews.
    • Conduct independent reviews of end-to-end quality processes from development through deployment.

    In addition to this preliminary Post Incident Review, CrowdStrike is committed to publicly releasing the full Root Cause Analysis once the investigation is complete.