Archives
All the articles I've archived.
-
Batch Means Two Different Things: Why the Term Became Confusing in Data Engineering
In data systems, some of the most common words are also the most overloaded. Few terms illustrate this better than batch. Historically, batch processing described a very specific operating model:
-
Why apt upgrade Didn’t Update VS Code (and What Actually Happened)
Problem Statement: you run sudo apt update and sudo apt upgrade, but Visual Studio Code remains outdated. When you run sudo apt install --only-upgrade code, it updates successfully. This behavior is not
-
Tracking Subdomains in PostHog Without Breaking User Journeys
When a website grows, it often stops being just one site. A main domain may coexist with multiple subdomains, such as a marketing site, an events portal, a documentation site, or a learning platform.
-
Why Terraform Does Not Deploy Your Lambda Container Image
When teams start packaging AWS Lambda functions as container images, a common misunderstanding appears quickly: “I created the Lambda with Terraform, so why is AWS saying the image does not exist?”
-
ABC in Python: What It Is, Where It Comes From, and Why It Exists
When people begin exploring object-oriented design in Python, they eventually encounter this import: from abc import ABC, abstractmethod That usually leads to a natural question: what does ABC mean,
-
Can an AWS VPC Have Two Peering Connections? Yes. But Should It?
When teams begin structuring cloud networks in AWS, one of the first connectivity mechanisms they encounter is VPC Peering. It is simple, direct, and usually easy to implement for small environments.
-
Sending Events to Multiple PostHog Projects from the Same Website
In some architectures, a single website needs to send analytics events to multiple PostHog projects. This situation commonly appears in the following scenarios: Environment separation (development,
-
Lambda vs n8n: A Simple Explanation for Data Workflows
Introduction When building data systems or integrating APIs, a common question appears: should we use AWS Lambda or n8n? Both tools can automate processes, call APIs, and move data between systems,
-
Should You Use AWS Lambda or AWS Glue to Update Records in HubSpot?
When integrating HubSpot with a data platform on AWS, a common architectural decision appears quickly: Should updates to HubSpot be executed from AWS Lambda or AWS Glue? The correct choice depends on
-
Understanding client_ingestion_warning in PostHog: Are You Losing Data?
When using PostHog with the default posthog-js configuration, you may encounter the following warning: posthog-js client rate limited. Config is set to 10 events per second and 100 events burst limit.
-
Daily Failure Reporting in DynamoDB Using Lambda, EventBridge Scheduler, and SES
Operational monitoring requires structured visibility into failures. If your processes write execution logs to DynamoDB and mark failed executions with status = FAILED, you can implement a
-
Hardening OAuth Token Management in Postman: Preventing Environment Cross-Contamination
When working with multiple third-party APIs (Zoom, HubSpot, Meta, etc.), a common operational risk in Postman is environment cross-contamination. Tokens may be overwritten unintentionally if the
-
Understanding ip-api Batch Limits and Effective Throughput
When integrating IP geolocation into a data pipeline, understanding rate limits and batching constraints is essential. This post analyzes the practical limits of the ip-api free tier and how to
-
Window Functions vs JOIN in Spark: A Physical Plan Perspective
When solving analytical queries in Spark SQL, there are often multiple correct formulations. However, they do not produce equivalent execution plans. This article compares two approaches to the same
-
Can You Know the Location of an IPv6 Address?
Example IPv6: 2600:100e:b0c7:7403:f88c:92d0:bc41:46ff Short answer: only approximately, and with significant limitations. This article explains what can and cannot be inferred from an IPv6 address,
-
AWS Glue + Chargebee: Diagnosing CERTIFICATE_VERIFY_FAILED After TLS Chain Updates
Context An AWS Glue job that consumes the Chargebee API begins failing with: SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate The same
-
From OLTP to OLAP: How Data Moves from 3NF to a Dimensional Data Warehouse
Modern data architectures typically separate operational systems from analytical systems. This separation is not accidental—it reflects fundamentally different workloads, data models, and optimization
-
Why There Is No “Interpreter” Endpoint in the Zoom API
Many teams attempt to retrieve language interpretation usage (e.g., minutes consumed per language channel) through the Zoom REST API, only to discover that no such endpoint exists for Meetings or
-
Why You Can’t Get Full Social Analytics from the HubSpot API (Even with Marketing Hub Pro)
Many teams assume that upgrading to Marketing Hub Professional unlocks full programmatic access to social media performance metrics. It does not. This article clarifies what is technically possible,
-
Why Small Tables Can Explode: Understanding JOIN Cardinality in SQL
It is common to assume that joining two small tables will produce a small result set. In practice, this assumption frequently fails. Even tables with only a handful of rows can generate unexpectedly
-
Resolving the Node.js Error: Cannot find module jsonwebtoken
When developing backend services with Node.js, especially APIs that implement authentication, it is common to rely on JSON Web Tokens (JWT). One frequent runtime error encountered in this context is:
-
Sending Athena Query Results to Amazon SQS: Architecture, Costs, and Limitations
Introduction Amazon Athena is a serverless query service designed for interactive analysis of data stored in Amazon S3. Amazon SQS (Simple Queue Service), on the other hand, is a fully managed message
-
Extracting and Managing Access Tokens in Postman
When working with APIs that use OAuth 2.0 or token-based authentication, a common requirement is to extract an access_token from a successful authentication request and reuse it in subsequent API
-
How PostHog Uses ClickHouse for High-Performance Product Analytics
Modern product analytics platforms must process billions of events while still delivering low-latency queries for dashboards, funnels, and retention analysis. PostHog addresses this requirement by
-
Hiding Personal Information in AWS Glue with Spark
Protecting personal data before analytics consumption is a core requirement in modern data platforms. In AWS-based lake architectures, this is typically achieved through data de-identification during
-
Rebasing vs Creating a New Branch: How to Handle Outdated Feature Branches Correctly
In collaborative software projects, it is common to face the following situation: a feature branch was created some time ago, work was done on it, and meanwhile the main branch continued to evolve.
-
Automating OAuth 2.0 in Postman: storing and refreshing access tokens without copy-paste
Introduction When working with APIs protected by OAuth 2.0, Postman is commonly used for development and testing. A frequent pain point is manual token handling: requesting an access token, copying
-
Running Scheduled GitHub Actions Locally for Safer Debugging
Overview When working with scheduled automation jobs in GitHub Actions, it is common to face a simple but critical question: Can this workflow be executed locally before pushing to production? The
-
Designing a Scalable Course Progress Service on AWS
EC2, Lambda, DynamoDB, and RDS Cost and Architecture Trade-offs Context In a multi-platform learning environment where users can advance through courses using both Web and Mobile applications,
-
Handling Boolean vs IntegerType Mismatches Between MySQL and Spark (Glue JDBC)
When integrating MySQL data into Apache Spark (for example, through AWS Glue), you might encounter schema mismatches caused by how MySQL represents TINYINT(1) fields. This issue often surfaces when
-
Controlling Branch Deployments and Redirects in Vercel: A Practical Guide
Continuous deployment platforms simplify the release process, but they can easily become noisy when every branch triggers a build. Teams working with multiple development environments often need finer
-
AWS EventBridge Rules vs EventBridge Scheduler: Which One Should You Use?
In the AWS ecosystem, there are two main ways to schedule and automate tasks: EventBridge Rules (scheduled rules) and the newer EventBridge Scheduler, which introduces Schedule Groups. While both
-
Estimating the Cost of an AWS Glue Workflow
When working with AWS Glue, one of the most common questions data engineers ask is: How much will this job cost me? If you have a workflow that runs for 13 minutes, understanding the cost model of AWS
-
Modern Table Formats: Iceberg, Delta Lake, and Hudi
Data Lakes made it possible to store raw data at scale, but they lacked the reliability and governance of data warehouses. Files could be dropped into storage (S3, HDFS, MinIO), but analysts struggled
-
Running Production Servers on AWS: EC2 vs RDS Cost Breakdown
When planning to run production workloads in the cloud, cost is one of the most important considerations. In this post, we will explore the monthly expenses of running two application servers and a
-
Trino in Modern Architectures: SQL Queries on S3 and MinIO
The rise of cloud object storage has transformed how organizations build data platforms. Hadoop Distributed File System (HDFS) once dominated, but today services like Amazon S3, Google Cloud Storage
-
Hive Metastore: The Glue Holding Big Data Together
When people think of Hive, they often remember the early days of Hadoop and MapReduce. But while Hive as a query engine has largely faded, one of its components remains critical to the modern data
-
Why Parquet Became the Standard for Analytics
In the early days of Big Data, data was often stored in simple formats such as CSV, JSON, or text logs. While these formats were easy to generate and understand, they quickly became inefficient at
-
Facebook and Big Data: The Open Source Projects That Changed the Industry
When people talk about the history of Big Data, a few companies come to mind: Google, Yahoo, and Facebook. Each of them faced unique challenges that forced them to build large-scale distributed
-
HDFS vs. Object Storage: The Battle for Distributed Storage
Distributed storage has always been the foundation of Big Data. In the early days, Hadoop Distributed File System (HDFS) was the de facto standard. Today, however, object storage systems like Amazon
-
The History of Hive and Trino: From Hadoop to Lakehouses
The evolution of Big Data architectures is deeply tied to the history of two projects born at Facebook: Hive and Trino. Both emerged from real engineering pain points, but at different times and for
-
What Is a Data Lake and What Is a Data Lakehouse?
Over the last decade, the world of data architecture has gone through several transformations. From traditional data warehouses to Hadoop-based data lakes and now to the emerging Lakehouse paradigm,
-
Google Bigtable vs. Amazon DynamoDB: Understanding the Differences
When choosing a NoSQL database for scalable, low-latency applications, two major options stand out: Google Cloud Bigtable and Amazon DynamoDB. While both are managed, highly available, and
-
How to Keep a Docker Container Running Persistently
When working with Docker, you may have noticed that some containers stop as soon as you exit the shell. This is because Docker considers the container's main process to have finished. In this post, we
-
Fixing Cursor Login Issues on Linux (AppImage)
When running Cursor on Linux, especially with the AppImage version, you might encounter a situation where you can’t log in. This usually happens because Cursor stores its session state locally, and
-
Managing Evolving Schemas in Apache Spark: A Strategic Approach
Schema management is one of the most overlooked yet critical aspects of building reliable data pipelines. In a fast-moving environment, schemas rarely remain static: new fields are added, data types
-
Orchestrating Multiple AWS Glue Workflows: A Practical Guide
AWS Glue provides a robust environment for building and managing ETL pipelines, but many data engineers face the challenge of chaining or coordinating multiple workflows. This article explores
-
Secure Ways to Share Private Data on AWS: Beyond Public Buckets
When building data platforms in the cloud, it is common to share data with partners, clients, or internal teams outside your own. AWS provides several mechanisms to grant secure, granular access — far
-
Designing a Semantic Layer for Athena + Power BI
Modern data architectures benefit from a clear separation of layers: Ingestion, Staging, and Semantic (Presentation). When using Amazon Athena as the query engine and Power BI as the visualization
-
Querying JSONB in PostgreSQL Efficiently
In modern applications, it is common to store semi-structured data in JSON format inside a relational database like PostgreSQL. However, to analyze this data properly, you need a way to transform it
-
Understanding Window Functions in SQL: Beyond Simple Aggregations
When we think about SQL functions, we often start with scalar functions (UPPER(), ROUND(), NOW()) or aggregate functions (SUM(), AVG(), COUNT()). But there is a third type that is essential
-
Automating Data Extraction with Airflow, BeautifulSoup, and MinIO
In the data engineering ecosystem, a common task is to automate the extraction of data from external sources, perform minimal processing, and store it in a data lake for further analysis. In this
-
How to Set CloudWatch Log Retention Policies with Terraform
AWS CloudWatch is a powerful service for monitoring applications and infrastructure. However, by default, CloudWatch Logs are configured to never expire. This can lead to excessive storage costs and
-
How to Disable an AWS Glue Trigger from the CLI
When working with AWS Glue, triggers are an important mechanism to orchestrate jobs or workflows. Sometimes, however, you may need to temporarily disable a trigger without deleting it, for example, to
-
Orchestrating Multiple AWS Glue Workflows with Step Functions
In modern data architectures, it is common to manage multiple ETL pipelines that must run in sequence or in parallel. AWS Glue provides a robust framework for building workflows, but when we need to
-
Understanding the Strategy Design Pattern
In the landscape of software design, maintaining flexibility and scalability is crucial. One of the most effective ways to achieve these qualities is by leveraging design patterns. Among the
-
Choosing Between saveAsTable and Iceberg’s writeTo in AWS Glue and Athena
When working with Spark on AWS Glue, there are multiple ways to persist DataFrames as tables and make them queryable in Amazon Athena. Two common approaches are: Using Spark’s Hive-style saveAsTable
-
Debugging Spark DataFrame .show() Timeouts in PyCharm and VSCode
When working with PySpark, one of the first commands developers use to quickly inspect data is: raw_df.show() However, in certain environments (especially when running inside PyCharm or VSCode with a
-
Incremental Data Loads: Choosing Between resource_version and created_at/updated_at
Incremental data loading is a cornerstone of modern data engineering pipelines. Instead of re-ingesting entire datasets on each execution, incremental strategies focus on retrieving only records that
-
Optimizing Amazon Athena Queries with Partitions: A Practical Example
When working with Amazon Athena, one of the most effective strategies to improve query performance and reduce costs is partitioning your data. Partitions allow Athena to scan only the relevant
-
Running Apache Airflow Across Environments
Apache Airflow has become a de facto standard for orchestrating data workflows. However, depending on the environment, the way Airflow runs can change significantly. Many teams get confused when
-
Can You Perform Data Grouping Directly with the yFinance API?
When working with financial data, efficient aggregation and analysis are essential for generating meaningful insights. A common question among developers and data analysts is whether the yFinance
-
Optimizing Partition Strategies in Apache Iceberg on AWS
When working with large-scale analytical datasets, efficient partitioning is critical for achieving optimal query performance and cost savings. Apache Iceberg, a modern table format designed for big
-
How Transactions Work in Databricks Using Delta Lake
Databricks is a powerful platform for big data analytics and machine learning. One of its key features is the ability to run transactional workloads over large-scale data lakes using Delta Lake. This
-
Versioning Terraform Resources to Meet CIS Security Standards
Infrastructure as Code (IaC) has become a foundational practice for modern DevOps and cloud-native teams. Terraform, as one of the most widely adopted IaC tools, enables infrastructure automation,
-
Choosing Between DynamoDB and Cassandra for a Crypto Exchange
When designing the backend of a crypto exchange, selecting the right database architecture is crucial. Two common NoSQL databases often considered for this type of application are Amazon DynamoDB and
-
Handling Python datetime Objects in Amazon DynamoDB
When developing data pipelines or applications that store time-based records in Amazon DynamoDB, developers frequently encounter serialization errors when working with Python's datetime objects.
-
AWS Glue Workflow vs Apache Airflow: A Professional Comparison
While both serve the common purpose of managing and automating data workflows, they differ significantly in architecture, flexibility, integration capabilities, and operational control. This article
-
Reducing AWS Costs: How to Temporarily Stop an Aurora Serverless v2 Cluster
When managing cloud infrastructure, minimizing costs without compromising data integrity is a continuous priority. Amazon Aurora Serverless v2 offers scalability and high availability, but unlike
-
The Enduring Relevance of Peter Chen’s Entity-Relationship Model
In the landscape of data modeling, few contributions have had the long-lasting impact of Peter Chen’s Entity-Relationship (E-R) Model, introduced in 1976. More than four decades later, it remains a
-
EMR vs AWS Glue: Choosing the Right Data Processing Tool on AWS
When working with big data on AWS, two commonly used services for data processing are Amazon EMR and AWS Glue. Although both support scalable data transformation and analytics, they differ
-
How Hadoop Made Specialized Storage Hardware Obsolete
In the early 2000s, enterprise data processing was dominated by high-end hardware. Organizations relied heavily on centralized storage systems such as SAN (Storage Area Networks) and NAS (Network
-
When Should You Use Iceberg with Athena? Partitioning Strategies and Best Practices
As data lakes grow in size and complexity, tools like Amazon Athena combined with table formats like Apache Iceberg become essential for scalability, data governance, and performance. In this post,
-
Why You Should Use the -out Option with terraform plan
When working with Terraform, a common workflow involves running terraform plan followed by terraform apply. However, you may have come across the following warning: "You didn't use the -out option to
-
How Google Changed Big Data: The Story of GFS, MapReduce, and Bigtable
In the early 2000s, Google faced a unique challenge: how to store, process, and query massive amounts of data across thousands of unreliable machines. The traditional systems of the time—designed for
-
Secure Database Access in AWS Using SSH Tunneling
Accessing databases located in private subnets within AWS Virtual Private Clouds (VPCs) is a common requirement in enterprise architectures. To ensure secure connectivity without exposing the database
-
Did Early Personal Computers Really Have a CPU? A Look at the von Neumann Architecture
When we think of a personal computer (PC), we typically imagine a processor, memory, a keyboard, and a display. But a deeper question often goes unasked: Did all early personal computers actually
-
Mastering the Linux find Command: A Practical Introduction
When working with Linux, one of the most powerful tools at your disposal is the find command. Whether you're managing a personal machine or maintaining a production server, being able to locate files
-
The Origin and Evolution of the DataFrame
When working with data today, whether in Python, R, or distributed computing platforms like Spark, one of the most commonly used structures is the DataFrame. But where did it come from? This post
-
Understanding ORM: Bridging the Gap Between Objects and Relational Databases
In modern software development, working with databases is a fundamental requirement. Most applications need to persist, retrieve, and manipulate data stored in relational databases such as PostgreSQL,
-
Python Decorators: What They Are, How They Work, and Why C Doesn't Have Them
In Python, decorators are a powerful feature for applying common logic to multiple functions without duplicating code. They allow you to extend or modify the behavior of functions, methods, or classes
-
Understanding findOne and findOneAndUpdate in Mongoose: Key Differences and Practical Usage
When working with MongoDB through Mongoose in Node.js, developers frequently encounter two essential methods: findOne and findOneAndUpdate. Both methods perform document lookups, but they serve
-
Are NoSQL Databases Really Schema-less?
A Perspective from the MERN Stack When we first start learning about NoSQL databases, one of the most common things we hear is that they are "schema-less." At first glance, this seems like a huge
-
How Network Topology Shapes Distributed Computing and Big Data Systems
When discussing distributed systems and Big Data, people often focus on storage, processing frameworks, and scalability, but one foundational concept underlies it all: network topology. It’s the
-
When Should You Use Parquet and When Should You Use Iceberg?
In modern data architectures, selecting the right storage and management solution is essential for building efficient, reliable, and scalable pipelines. Two popular choices that often come up are
-
How to Fix 'DataFrame' object has no attribute 'writeTo' When Working with Apache Iceberg in PySpark
If you’re working with Apache Iceberg in PySpark and encounter this error: Failed to write to Iceberg table: 'DataFrame' object has no attribute 'writeTo' You’re not alone. This is a common mistake
-
What Is Sharding and Why It Matters
As our world becomes increasingly digital, the amount of data we create every day is staggering. Think about all the emails, messages, orders, and photos uploaded every second. How do big companies
-
From Tables to Partitions: Designing NoSQL Databases with Cassandra
As data professionals transition from relational databases to NoSQL systems like Apache Cassandra, one of the most important mindset shifts is understanding that you don't model data for storage, but
-
Apache Cassandra vs Apache Parquet: Understanding the Differences
In modern data architectures, it's common to encounter both Apache Cassandra and Apache Parquet, particularly when dealing with large-scale, distributed systems. Both technologies are associated with
-
Import Live Crypto Prices into Google Sheets
Are you tired of checking crypto prices manually? Want to automate your portfolio tracking or build a custom crypto dashboard? Good news — with just a few steps, you can pull live cryptocurrency
-
Fixing Spark Ivy Error in Docker: "basedir must be absolute"
If you're running Apache Spark inside Docker using Bitnami's images and suddenly encounter an Ivy error that says: Exception in thread "main" java.lang.IllegalArgumentException: basedir must be
-
How Dynamo Reshaped the Internal Architecture of Amazon S3
Introduction Amazon S3 launched in 2006 as a scalable, durable object storage system. It avoided hierarchical file systems and used flat key-based addressing from day one. However, early versions of
-
What’s Behind Amazon S3?
When you upload a file to the cloud using an app or service, there's a good chance it's being stored on Amazon S3 (Simple Storage Service). But what powers it under the hood? What is Amazon S3? Amazon
-
How HDFS Achieves Fault Tolerance Through Replication
One of the core strengths of the Hadoop Distributed File System (HDFS) is its fault tolerance. In a world of distributed computing, failures are not rare; they're expected. HDFS tackles this by using
-
Summary: Teaching HDFS Concepts to New Learners
Introducing Hadoop Distributed File System (HDFS) to newcomers can be both exciting and challenging. To make the learning experience structured and impactful, it’s helpful to break down the core
-
How Clients Know Where to Read or Write in HDFS
Hadoop Distributed File System (HDFS) is designed to decouple metadata management from actual data storage. But how does a client, like a Spark job or command-line tool, know where to read or write the
-
How HDFS Avoids Understanding File Content
One of the defining features of Hadoop Distributed File System (HDFS) is that it doesn’t understand the contents of the files it stores. This is not a limitation; it's an intentional design choice
-
How Spark and MapReduce Handle Partial Records in HDFS
When working with large-scale data processing frameworks like Apache Spark or Hadoop MapReduce, one common question arises: What happens when a record (e.g., a line of text or a JSON object) is split
-
How HDFS Tracks Block Size and File Boundaries
When dealing with massive files, Hadoop Distributed File System (HDFS) doesn't read or store them as a whole. Instead, it splits them into large, fixed-size blocks. But how does it know where each
-
How Metadata Works in HDFS and What It Stores
HDFS stores metadata separately from the actual file content to optimize performance and scalability. This metadata is managed entirely by the NameNode, which allows clients to quickly locate and
-
The Architecture of HDFS: NameNode, DataNodes, and Metadata
HDFS (Hadoop Distributed File System) was built to support the reliable storage and access of large datasets distributed across commodity hardware. To make this possible, HDFS relies on a master/slave
-
What Happens When HDFS Splits Files Mid-Word or Mid-Row?
HDFS is designed to store and process massive amounts of data efficiently. One of its key design decisions is to split files into large, fixed-size blocks, typically 128MB or 256MB. But what happens
-
How HDFS Handles File Partitioning and Block Distribution
One of the key innovations behind the Hadoop Distributed File System (HDFS) is how it breaks down large files and distributes them across multiple machines. This mechanism, called partitioning and
-
What is HDFS and Why Was It Revolutionary for Big Data?
In the early 2000s, the world was generating data at a scale never seen before—web logs, social media, sensors, and more. Traditional storage systems simply couldn't keep up with the volume, velocity,
-
What Is Serialization?
In the world of data engineering and software systems, serialization is a fundamental concept that allows you to efficiently store, transmit, and reconstruct data structures. If you’ve worked with
-
From HDFS to S3: The Evolution of Data Lakes in the Cloud
For years, HDFS (Hadoop Distributed File System) was the default choice for building data lakes in on-premises and Hadoop-based environments. But as cloud computing gained momentum, a new player took
-
Is S3 the New HDFS? Comparisons and Use Cases in Big Data
Over the past decade, the way organizations store and manage big data has shifted dramatically. Once dominated by the Hadoop Distributed File System (HDFS), the field is now led by Amazon S3 and
-
The History and Evolution of Amazon S3: Was It Ever Based on HDFS?
When discussing cloud storage today, Amazon S3 is almost synonymous with scalable, reliable object storage. However, a common question among those familiar with big data technologies like Hadoop is:
-
MapReduce: A Framework for Processing Unstructured Data
MapReduce is both a programming model and a framework designed to process massive volumes of data across distributed systems. It gained popularity primarily due to its efficiency in handling
-
Understanding .master() in Apache Spark
In Apache Spark, the .master() method is used to specify how your application will run, either on your local machine or on a cluster. Choosing the correct option is essential depending on your
-
How Joins Work in PostgreSQL
Joins are one of the most powerful features in SQL, allowing you to combine data from multiple tables in a single query. PostgreSQL, as a relational database system, provides robust support for
-
How to Improve Query Performance in PostgreSQL
PostgreSQL is a powerful relational database, but even the most robust systems can suffer from slow queries without proper tuning. Optimizing query performance is crucial to ensure scalability,
-
Optimizing Joins in PostgreSQL: Practical Cases
Joins are essential for querying relational databases, but they can significantly impact performance if not optimized correctly. PostgreSQL provides several ways to improve join efficiency, from
-
Benchmarking OLTP vs. OLAP: Measuring Performance Effectively
Understanding the performance differences between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) is crucial for designing efficient database systems. This post outlines a
-
OLTP vs. OLAP: How JOINs and Efficiency Shape Their Differences
Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are two distinct database architectures, each designed for different purposes. One key factor that differentiates them is
-
The Origins of OLTP and OLAP: A Brief History
Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are fundamental concepts in database management, each serving distinct purposes. But when did these terms first appear, and
-
Comparison Between Star Schema and Snowflake Schema in PostgreSQL
When designing a database for analytical workloads, choosing the right schema can significantly impact performance and query
-
Running PySpark on Google Colab: Do You Still Need findspark?
Introduction For a long time, using Apache Spark in Google Colab required manual setup, including installing Spark and configuring Python to recognize it. This was often done using the findspark
-
Testing Apache Airflow DAGs: A Modular Approach
Introduction Apache Airflow is a powerful workflow automation tool, but testing DAGs can be challenging due to their dependency on the Airflow scheduler and execution environment. In this post, we
-
Visualizing EXPLAIN ANALYZE in PostgreSQL
When working with PostgreSQL, understanding how queries execute can greatly improve performance tuning and optimization. PostgreSQL provides the EXPLAIN ANALYZE command to help developers analyze
-
Enabling Internet Access for Resources in a Public Subnet
When deploying resources in a public subnet within an AWS Virtual Private Cloud (VPC), you need to configure several components to allow them to communicate with the internet. Below are the essential
-
Network Address Translation (NAT): Overcoming IPv4 Shortages
Introduction Network Address Translation (NAT) is a technology designed to mitigate the shortage of IPv4 addresses by allowing multiple devices on a private network to share a limited number of public
-
Understanding Subnets, Gateways, and Route Tables in AWS
When designing applications in AWS, it's crucial to understand how networking components interact within a Virtual Private Cloud (VPC). This post will cover subnets, gateways, and route tables,
-
Generating a Calendar Table in Power Query (M Language)
When working with Power BI or other Power Query-supported tools, having a well-structured calendar table is essential for time-based analysis. In this blog post, we will walk through an M Language
-
How to Display an Error in Excel When More Than 5 "FALSE" Values Appear in a Row
Introduction When working with data in Excel, there may be instances where you need to monitor certain conditions and flag errors based on specific criteria. In this guide, we'll walk through a simple
-
Splitting Strings in Excel: A Simple Guide
When working with Excel, you may encounter situations where you need to split a string into separate parts. For example, consider the following string: orderId: 12345abc-de67-89fg-hijk-123456lmnop If
-
Handling Schema Changes in a Data Warehouse
When building and maintaining a Data Warehouse (DWH) , handling schema changes without breaking existing processes is a crucial challenge for data engineers. As new requirements emerge, we often need
-
Understanding How Hive Converts SQL Queries into Hadoop Jobs
When you execute a SQL query in Apache Hive, the query is not directly run on a traditional database. Instead, Hive translates it into a Hadoop job, which is then executed across a distributed system.
-
Delta Lake vs. Traditional Data Lakes: Key Differences and Vendor Options
Introduction As data-driven organizations scale their analytics and machine learning workloads, the limitations of traditional data lakes become more apparent. Delta Lake is an open-source storage
-
Why OLTP Systems Don't Retain Historical Changes
Online Transaction Processing (OLTP) systems are designed for high-speed transactions and efficient data management. However, one of their characteristics is that they do not retain historical changes
-
Understanding Slowly Changing Dimensions (SCD) in Data Warehousing
When dealing with data warehouses, handling changes in dimension data over time is crucial. Unlike operational databases where updates are straightforward, data warehouses require preserving
-
Modes and Examples of KPIs in Data Analysis Expressions (DAX)
Last Year Comparison When analyzing sales performance, it is often useful to compare the current year's sales with the same period in the previous year. To do this, we create several calculated
-
Understanding Surrogate Keys in Databases
When designing relational databases, one crucial decision is how to uniquely identify each record in a table. This is where surrogate keys come into play. Unlike natural keys, which derive from
-
Understanding the Relationship Between Database Replication and the CAP Theorem
Introduction Database replication is a fundamental strategy in distributed systems that ensures data is duplicated across multiple nodes. However, when designing a replicated database, one must
-
Understanding Pagination vs. Batch Processing in Data Handling
When working with large datasets, developers often face the challenge of efficiently extracting, processing, and managing data. Two commonly used techniques for handling such data efficiently are
-
Tracking Daily File Size Changes in SQL
When working with databases that store file metadata, it's often useful to track how file sizes change over time. If you have a table with the following structure: id | timestamp | name_file | size
-
Resolving 'index.lock' Issue in Git
When working with Git, you may encounter an error preventing you from switching branches or performing other operations. A common issue is the following: fatal: Unable to create '.git/index.lock':
-
Merging Data in PostgreSQL vs. MySQL: How to Handle Upserts
When working with databases, you often need to update existing records or insert new ones based on whether a match is found. In PostgreSQL 15 and later, this can be handled with the MERGE statement.
-
Understanding the Difference Between hostname -i and hostname -I in Linux
When working with Linux, you might come across the commands hostname -i and hostname -I , both of which return IP addresses. At first glance, they seem similar, but they serve different purposes. In
-
Understanding the CAP Theorem in NoSQL Databases
The CAP theorem (Consistency, Availability, and Partition Tolerance) plays a crucial role in designing and selecting NoSQL databases. This theorem states that in a distributed system, it is impossible
-
Understanding the Differences Between Parquet, Avro, JSON, and CSV
When working with data, choosing the right file format can significantly impact performance, storage efficiency, and ease of use. In this post, we will compare four widely used data formats: Parquet,
-
Optimizing Queries with Partitioning in Databricks
Partitioning is a crucial optimization technique in big data environments like Databricks. By partitioning datasets, we can significantly improve query performance and reduce computation time. This
-
Calculating Levenshtein Distance in Apache Spark Using a UDF
When working with text data in big data environments, measuring the similarity between strings can be essential. One of the most commonly used metrics for this is the Levenshtein distance , which
-
Creating a PySpark DataFrame for Sentiment Analysis
When working with sentiment analysis, having structured data in a PySpark DataFrame can be very useful for processing large datasets efficiently. In this post, we will create a PySpark DataFrame
-
Understanding Docker Engine Components
Docker Engine is an open-source platform that has revolutionized how applications are developed, deployed, and executed using container technology. By encapsulating applications and their dependencies
-
Automating Payment Calculation in Google Docs Using Apps Script
Introduction Google Apps Script is a powerful tool that allows you to automate tasks within Google Workspace applications, such as Google Docs. In this tutorial, we will create a script that prompts
-
Ranking Products Using Window Functions in PySpark
Introduction Window functions are powerful tools in SQL and PySpark that allow us to perform calculations across a subset of rows related to the current row. In this blog post, we'll explore how to
-
Handling Null Values in Data: Algorithms and Strategies
Null values are a common challenge in data analysis and machine learning. Dealing with them effectively is essential to ensure the reliability of your insights and models. In this post, we’ll explore
-
Exploring Free Resources to Learn AWS and Azure Cloud Platforms
Cloud computing is an essential skill in today’s tech landscape. Among the major players, AWS and Azure stand out as leading cloud platforms, offering a wealth of free resources to help individuals
-
What Does an Exploratory Data Analysis (EDA) Evaluate?
An Exploratory Data Analysis (EDA) is a critical step in the data analysis process that focuses on evaluating and examining data to uncover its main characteristics. It is performed before delving
-
Adding Custom Columns to Your Date Table in Power BI
Introduction A Date Table is an integral part of building robust and insightful Power BI reports. While a basic Date Table allows for time-based filtering and analysis, custom columns can add even
-
Grouping Data in PySpark with Aliases for Aggregated Columns
When working with large datasets in PySpark, grouping data and applying aggregations is a common task. In this post, we’ll explore how to group data by a specific column and use aliases for the
-
Handling Offset-Naive and Offset-Aware Datetimes in Python
When working with datetime objects in Python, you may encounter the error: TypeError: can't compare offset-naive and offset-aware datetimes This error occurs when comparing two datetime objects where
-
Extracting Dynamic Content from an iFrame with Selenium in Python
Accessing content inside an iFrame can be tricky, especially when the content is loaded dynamically. In this blog post, we’ll walk through an example of how to navigate an iFrame, click on an
-
Automating SQL Script Execution with Cron
In this blog post, we’ll explore how to automate the execution of SQL scripts using cron , a powerful scheduling tool available on Unix-based systems. This approach is ideal for database
-
Are Indexes a Good Strategy for Analytical Databases?
Indexes are a well-known optimization technique in database management, often associated with improving query performance. However, whether they are a good strategy for analytical databases depends on
-
Counting Word Frequency in a SQL Column
Sometimes, you may need to analyze text data stored in a database, such as counting the frequency of words in a text column. This blog post demonstrates how to achieve this in SQL using a practical
-
Orchestrating SQL Files: Efficiently Managing Multiple Scripts
When working on database projects, you often find yourself managing and executing multiple SQL files. Whether these files are for creating schemas, seeding data, or running migrations, orchestrating
-
Understanding the Evolution of Data Warehousing: From Codd's Relational Model to Modern Data Warehouses
Data management has undergone significant transformations since the advent of the relational model by Edgar F. Codd. Today, data warehouses stand as a cornerstone of modern data analytics. This blog
-
How to Rename a Git Branch Locally and Remotely
Renaming Git branches can be necessary when adhering to naming conventions or correcting errors. This guide will walk you through the process of renaming a branch locally and remotely. Scenario: You
-
Troubleshooting Import Errors in Python: A Case Study
Python's modular design allows developers to break their code into smaller, reusable components. However, import errors can often disrupt the flow, especially in complex projects. In this post, we’ll
-
Creating Dynamic Dates in Excel: A Practical Guide
When working with Excel, you may encounter situations where you need to dynamically generate a date using the current year, a specific month, and a day. This post will guide you through creating such
-
How to Simulate Column Headers Without Selecting from a Table in SQL
In some cases, you may want to produce a result set with specified column names and values without querying an actual table. This is often used for testing purposes, documentation, or even when
-
How to Create a Date Table in Power BI Using DAX
Introduction In Power BI, a Date Table is essential for working with time series data effectively. A well-structured Date Table simplifies time-based analysis, allowing you to filter by specific
-
Parsing Complex Data from HTML Tables with Python
When working with web scraping, you often encounter scenarios where HTML content is nested or contains encoded data within JavaScript attributes. This post walks through parsing player statistics from
-
Comparative Investment Analysis of Invesco and Blackstone Using Python
Introduction In this post, we'll explore how to use Python programming to compare the performance of two investment firms, Invesco and Blackstone. Invesco is known for its focus on public asset
-
Comparing Risk-Adjusted Returns Using the Sharpe Ratio in Python
Investors frequently face the challenge of assessing whether an asset's return justifies its risk. This is where the Sharpe Ratio becomes invaluable, providing a measure that accounts for both returns
-
Handling the "ERR_HTTP_HEADERS_SENT" Error in Node.js Express
When building REST APIs with Node.js and Express, one common error that developers encounter is ERR_HTTP_HEADERS_SENT: Cannot set headers after they are sent to the client . This error can be
-
Understanding Stateful vs. Stateless Firewalls in AWS
When working with network security, it's crucial to understand the difference between stateful and stateless firewalls. In AWS, this understanding is particularly important when configuring security
-
Creating a Custom Column with SWITCH in Power BI
In Power BI, creating custom columns based on multiple conditions is a powerful way to enhance the analysis and presentation of your data. One of the most versatile functions for this purpose is
-
Understanding module.exports in Node.js: Exporting and Importing Modules
In Node.js, organizing your code into reusable, modular components is a key practice for writing maintainable applications. This is done through modules — self-contained blocks of code that can be
-
Filtering Data in Azure Data Factory: Keeping Only "FileWrite" Operations
In this post, I’ll walk through how to filter rows in Azure Data Factory (ADF) using the Filter activity to retain only the rows where a specific column ( OperationName ) has the value "FileWrite".
-
Handling Deletion of Bootcamps in a Node.js API with Mongoose
In this post, I’ll walk through the process of handling the deletion of bootcamps in a Node.js API using Mongoose. Recently, while working on a project, I encountered a TypeError when attempting to
-
Built-in Functions vs. Object-Oriented Methods
Python strives to be simple and clear, so some operations are implemented as built-in functions , while others are object-specific methods . This distinction arises from the way Python handles
-
Extracting the Last Element from a Delimited String in Azure Data Factory
When working with data in Azure Data Factory (ADF), it's common to deal with delimited strings. You might need to extract the last element from such strings. For instance, given a string like
-
How to Simplify a Mongoose Schema in Node.js
When working with Mongoose in Node.js, defining a schema for your models can get repetitive and verbose, especially if you're specifying data types and validation repeatedly. In this post, we’ll look
-
How to Choose the Best Classification Model Based on Performance Metrics
When working on machine learning classification tasks, selecting the best model often involves analyzing various performance metrics like accuracy, precision, recall, and F1-score. In this post, I’ll
-
How to Log in Python: Console and File Logging with yfinance Example
Logging is a vital part of any application, offering insights into the application's flow, performance, and error handling. In many scenarios, you may want to log messages both to the console and a
-
Implementing Query Filtering in Express with Mongoose
In modern API development, providing flexible querying mechanisms is essential to allow clients to filter and retrieve data efficiently. In this post, we'll go over how to implement query filtering
-
Downloading Data from the SEC Website using Python
In this blog post, I’ll show you how to download a JSON file from the U.S. Securities and Exchange Commission (SEC) website using Python. The file contains company tickers, which can be useful for
-
Understanding Idempotency in Python with Simple Examples
Idempotency is a fundamental concept in computing that describes operations which produce the same result no matter how many times they are performed. In this blog post, we’ll explore idempotency
-
Best Practices: Using Direct SQL Queries in CodeIgniter
In this blog post, we'll discuss the pros and cons of using direct SQL queries in CodeIgniter and explore alternatives that enhance security, readability, and maintainability. What is Direct SQL?
-
Creating a Custom Column with a Random String in Power BI Using DAX
Introduction In Power BI, customizing your dataset by adding calculated columns can significantly enhance your data analysis capabilities. One common need is to generate random strings or categories
-
How to Implement MVC in CodeIgniter to Clean Up Your Views
When building web applications, it's easy to end up with PHP logic mixed directly into your HTML views, especially in smaller projects. However, this can lead to messy, hard-to-maintain code. The
-
Counting Covered Points on a Number Line
Introduction Algorithmic challenges often involve intervals and can initially seem complex. One such problem is determining how many unique points are covered by a set of intervals on a number line.
-
Renaming Modules in Python for Clarity and Accuracy
Renaming modules in Python is an essential practice to improve code clarity and maintainability, especially as projects grow in complexity. Using intuitive and descriptive names helps in quickly
-
Handling shutil.SameFileError When Copying Files in Python
When using Python’s shutil.copy() or shutil.copy2() to copy files, you might run into a shutil.SameFileError if you mistakenly attempt to copy a file onto itself. This error occurs when the source and
-
Preserving Directory Structure While Copying Files in Python - version 2
When copying files from one directory to another in Python, it's important to maintain the original directory structure, especially when dealing with nested directories. In this post, we'll explore
-
Avoiding Duplicate File Copies Based on Content in Python on AWS
When working with large file systems, copying files can often lead to unintentional duplication, especially if files with the same content are repeatedly copied into different directories. While
-
Handling NoneType Errors When Extending Lists in Python
When working with Python, especially with functions that return lists or other iterable objects, you might encounter a TypeError that says something like: TypeError: 'NoneType' object is not iterable
-
Tracking File Changes in S3 Using ETags
When working with AWS S3, tracking changes to files can be essential, especially when versioning is not enabled on the bucket. The ETag associated with each file in S3 can provide a simple way to
-
Working with S3 Object Metadata: Understanding ETags and Last Modified Dates
When working with AWS S3, managing large amounts of data effectively involves understanding key metadata like the ETag and Last Modified date. These properties help track file changes and ensure data
-
Implementing Retries in Python
In many real-world applications, simply handling an error isn't always enough. Sometimes, the failure is temporary, and retrying the operation can help resolve the issue. In this post, we’ll explore
-
Efficiently Listing and Filtering S3 Objects by Date
When working with AWS S3 buckets, it’s common to have a large number of objects stored, and you might need to filter them based on certain criteria like dates. This blog post will guide you on how to
-
Handling Division Errors and Implementing Basic Retry Logic in Python
In Python, error handling is essential for preventing crashes and ensuring smooth execution. One common error that developers encounter is the ZeroDivisionError , which occurs when trying to divide by
-
Handling Split Errors in Azure Data Factory: A Step-by-Step Guide
In Azure Data Factory (ADF), we often use expressions to manipulate strings and extract specific parts of data. One common operation is splitting strings based on a delimiter. However, this can
-
Customizing Legends in Seaborn Boxplots: A Guide
Creating clear and informative visualizations is key to effectively communicating data insights. In this post, we will explore how to customize legends in Seaborn boxplots, ensuring that the labels
-
How to Create Age Group Categories in Pandas and Visualize Them with Matplotlib
Data visualization is a key part of data analysis, helping to communicate insights clearly. In this blog post, we'll learn how to categorize age data into specified groups using Pandas and then
-
Comparing Window Functions with Aggregate Functions in SQL
Introduction SQL is a powerful language for querying and manipulating data, and both window functions and aggregate functions are central to its capabilities. While they serve related purposes, they
-
Defining Custom Window Frames in SQL Server
Introduction Window functions in SQL Server are powerful tools that allow for advanced data analysis within queries. One of the key features of window functions is the ability to define custom window
-
Creating a Running Total in SQL Server with Window Functions
Introduction Calculating a running total is a common requirement in many data analysis tasks, such as tracking cumulative sales, computing cumulative scores, or keeping track of inventory levels. In
-
Filtering Items in Azure Data Factory: Excluding Items That Begin with an Underscore
Azure Data Factory (ADF) is a powerful tool for building ETL (Extract, Transform, Load) workflows in the cloud. One common requirement is to filter data or files based on certain conditions. In this
-
Using the OVER() Clause with Window Functions in SQL Server
Introduction SQL window functions have become an indispensable tool for data analysts and developers. They allow for advanced calculations that go beyond simple aggregates, enabling analysis over a
-
Extracting Year, Month, and Day from Dates in Azure Data Factory
In Azure Data Factory (ADF), working with dates is a common task, especially when dealing with data transformations and scheduling tasks. ADF allows you to handle dates in different formats, such as
-
Splitting Strings and Accessing Elements in Azure Data Factory
Introduction Azure Data Factory (ADF) is a powerful cloud-based data integration service that allows you to create data-driven workflows for orchestrating and automating data movement and data
-
Extracting the Header from a CSV File in Python
When working with CSV files in Python, it's often necessary to extract the header (the first row of the file) to understand the structure of the data or to perform specific operations on the remaining
-
Handling CSV Files in Python: Writing Specific Rows with Custom Headers
When working with CSV files in Python, you often need to filter and write specific rows to a new file, including handling headers properly. This blog post will guide you through the process using
-
Understanding Window Functions in SQL: A Deep Dive
Introduction When working with databases, you'll often need to perform calculations across a set of rows related to the current row in your query. Whether you're calculating a running total, ranking
-
Extracting the Last Segment of a String in SQL Server
When working with data, you often need to manipulate strings to extract meaningful information. A common scenario is having strings with segments separated by underscores ( _ ), and you only need the
-
How to Check if Two Tables Have the Same Columns in SQL
When working with databases, it's sometimes necessary to compare two tables to ensure they have the same structure. Specifically, you might need to verify that two tables have the same columns before
-
Identifying Duplicate Records in SQL Based on Specific Fields
In database management, identifying and handling duplicate records is crucial to ensure data integrity. This post will guide you through a SQL query designed to find duplicates based on a specific
-
Creating a Pandas DataFrame from a List of Lists
Working with data in Python often involves using pandas, one of the most powerful libraries for data manipulation. A common task is converting a list of lists into a pandas DataFrame. This post will
-
Inserting Data from a CSV File into SQL Server Using PowerShell with Windows Authentication
Introduction In many data integration scenarios, you'll need to import data from a CSV file into a SQL Server database. PowerShell is a powerful tool that allows you to automate this process
-
Debugging SQL Joins: Troubleshooting OR Joins with Multiple Columns in PostgreSQL
Introduction SQL joins are essential for combining data from different tables in a relational database. However, they can sometimes be tricky, especially when dealing with complex joins that involve
-
How to Safely Retrieve and Return SQL Query Results in Python
When working with databases in Python, one common task is to count the number of records in a table. This might seem straightforward, but it’s easy to run into errors if the code isn’t structured
-
Resolving PostgreSQL Authentication Errors in Docker Compose for Redash
Introduction When setting up Redash with Docker Compose, one of the common errors users might encounter is related to PostgreSQL authentication. Specifically, the psycopg2.OperationalError:
-
Understanding the DECIMAL Data Type in SQL: What Happens When You Don't Specify Parameters?
When working with SQL, one of the fundamental aspects of database design and manipulation is understanding the various data types available. Among them, the DECIMAL type is often used for representing
-
Understanding Window Functions in SQL with Running Totals
Introduction Window functions in SQL are incredibly powerful, allowing you to perform calculations across a set of table rows related to the current row. They enable tasks like calculating running
-
Understanding the Quality of a Multiple Linear Regression Model: Analyzing SalaryUSD Predictions
In this blog post, we'll dive into the process of analyzing the quality of a multiple linear regression model, specifically focusing on predicting SalaryUSD based on factors like EducationLevel and
-
How to Handle shutil.SameFileError When Copying Files in Python
Introduction Have you ever encountered the shutil.SameFileError while trying to copy files in Python? This error occurs when you attempt to copy a file onto itself, resulting in a failed operation. In
-
Resolving "Same File" Errors in Python When Copying Files with Directory Replication
When working with file management in Python, you might encounter the dreaded "SameFileError" when trying to copy a file using the shutil.copy2() function. This error occurs when Python detects that
-
Retrieving the Name of an SQL Script File in Your Query
When working with SQL scripts, there are times when you might want to dynamically retrieve the name of the script that is currently executing. Unfortunately, SQL itself doesn't provide a
-
Adding Numbers Around a Center Element in a 2D Grid in Python
In this post, we'll explore how to manipulate a 2D grid in Python by adding numbers around a specific center element. This is a common problem in various applications, such as implementing a basic
-
How to Convert a SQL SELECT COUNT Query to SQLAlchemy in Python
When working with databases in Python, you might often need to translate raw SQL queries into SQLAlchemy, a powerful ORM (Object-Relational Mapper) that allows you to interact with your database in a
-
Effortlessly Count Lines Starting with "DE" in a Text File Using PowerShell
PowerShell is a powerful scripting language that can automate various tasks, including file manipulation and data processing. In this post, we'll demonstrate how to count the number of lines in a text
-
How to Optimize a Function for Counting Matching Products in a DataFrame
In this post, we'll walk through how to optimize a function that counts the number of rows in a DataFrame where two specified products are present. The original function, though functional, can be
-
Working with JSON Data in Python: A Comprehensive Guide
Introduction In today's digital age, handling data in various formats is crucial for developers and data scientists alike. JSON (JavaScript Object Notation) has become a popular choice due to its
-
Extracting Specific Text Between Strings Using Python
In this blog post, we'll learn how to extract a specific portion of text between two substrings in a given input string. This technique is useful in various scenarios, such as processing file paths,
-
Handling Errors in Python: Ensuring Successful String Splits
When working with strings in Python, you often need to split them based on a delimiter. While the split method is straightforward, there might be scenarios where the split operation doesn't yield the
-
How to Read a File from a Network Path in Python
In many business and enterprise environments, data is often stored on network drives accessible to multiple users. Python provides several ways to access and read files from these network paths. This
-
Avoiding Duplicate File Copies Based on Content in Python
Introduction When dealing with large datasets, it is common to encounter duplicate files, especially when copying files based on specific criteria. Simply comparing file names or paths isn't
-
Copying Files Containing a Specific Word Using Python
Introduction When working with large datasets or numerous text files, you might find yourself needing to search for files containing specific words or phrases. Automating this task can save a lot of
-
Preserving Directory Structure While Copying Files in Python
Introduction When working with large datasets or numerous text files, it might be necessary to copy files containing specific words to a new destination while preserving the original directory
-
Analyzing Salaries by Country: Using Boxplots to Visualize Median and Mean
Introduction: Understanding salary distributions across different countries is crucial for various economic analyses, market insights, and policy decisions. Boxplots are an effective graphical tool
-
Generating and Uploading Random Data to Azure Blob Storage Using Python
Introduction In today's data-driven world, automating data generation and storage is crucial for various applications, including testing, data analysis, and machine learning. This blog post will guide
-
Decrypting Encrypted Data with Subqueries in SQL
When working with encrypted data in SQL, it's essential to ensure that the decryption process is secure and efficient. One effective approach is using subqueries. In this post, we'll demonstrate how
-
Extracting the Last Part of a String in SQL Server
Introduction When working with SQL Server, you might often encounter scenarios where you need to extract a specific part of a string. For example, you might have a string in the format
-
Working with Dates in Python: Extracting and Incrementing Dates
Dates are a fundamental part of many applications, from logging events to scheduling tasks. Python’s datetime module provides powerful tools to handle dates and times. In this post, we'll explore how
-
Creating a Dictionary from a Word and a List in Python
In Python, creating and manipulating dictionaries is a common task. In this post, we'll walk through a simple example of how to write a function that takes a word and a list, and returns a dictionary
-
Selenium vs. Beautiful Soup: Choosing the Right Tool for Web Scraping
When it comes to web scraping, two tools often stand out: Selenium and Beautiful Soup. Each has its strengths and is suited for different types of tasks. In this post, we’ll dive into what each tool
-
Extracting Data from Fixed-Width Text Files into Pandas DataFrame
Working with fixed-width text files can be challenging, especially when you need to extract specific fields and transform them into a structured format like a Pandas DataFrame. In this blog post, I'll
-
How to Insert a New Row in a Pandas DataFrame
Working with data often involves modifying it to suit your analysis needs. One common operation is inserting a new row into a DataFrame. In this post, we'll explore several methods to achieve this in
-
Loading JSON Data into a pandas DataFrame with Python
In this post, we will walk through the process of loading JSON data into a pandas DataFrame using Python. JSON (JavaScript Object Notation) is a popular data format for exchanging data between a
-
Adding SQL Script Filenames to Batch Script Output CSV
When working with batch scripts to execute multiple SQL scripts, it's often helpful to log not only the results but also the filenames of the executed scripts. This can make it easier to track which
-
Automating SQL Script Execution and Logging with Batch Scripts
Introduction Automating database tasks can significantly enhance productivity, especially when dealing with multiple SQL scripts. In this tutorial, we will create a batch script to execute SQL scripts
-
Leveraging SQL Window Functions with PARTITION BY
SQL window functions are a powerful tool for performing calculations across a set of rows related to the current row. When combined with the PARTITION BY clause, these functions can provide deep
-
Avoiding Overwriting and Extra Spaces When Writing to Files in Python
When working with files in Python, it's common to encounter situations where you need to append new lines to an existing file without overwriting its current content. Additionally, managing whitespace
-
How to Select Specific Rows from a DataFrame in Python
When working with DataFrames in Python, you may encounter situations where you need to filter and select specific rows based on certain conditions. In this blog post, we will explore how to create a
-
Extracting Substrings from Strings in SQL Server
When working with SQL Server databases, you may often encounter scenarios where you need to extract specific parts of a string based on a pattern. A common requirement is to retrieve the substring
-
Loading JSON Data into a Pandas DataFrame
When working with data, it's common to encounter various file formats. JSON (JavaScript Object Notation) is a popular format for data exchange due to its readability and ease of use. In this post,
-
Running SQL Queries from a Batch File: Retrieving the Server Name
When working with SQL servers, it's often useful to automate routine tasks using batch files. One common task is retrieving the server name where your database is running. In this post, we'll walk
-
How to Drop Rows Based on a Column Value in a pandas DataFrame
Problem Statement Let's say you have a DataFrame containing weather data, and you want to drop all rows where the quantity ( qty ) is less than 5. However, you notice that some rows are being dropped
-
How to Print Lines Containing Non-Zero, Non-Dot, and Non-Space Characters in Python
In this blog post, we'll explore a simple yet useful task: printing lines that contain at least one character that is not a zero ( 0 ), a dot ( . ), or a space. This can be particularly handy when
-
Removing Rows from a Pandas DataFrame that Begin with Specific Characters
In this post, I'll walk you through how to remove rows from a pandas DataFrame that begin with specific characters, such as "---". This is a common task when cleaning and preprocessing data in Python.
-
Transforming a Matrix by Adding Numbers Around a Specific Value in Python
When working with matrices, we often need to perform transformations that update the values based on certain conditions. In this post, we'll walk through a function that takes a matrix and updates it
-
Ensuring Type Safety in Python Functions
When writing Python functions, ensuring that the parameters are of the correct type is crucial for robust and error-free code. In this post, we'll explore how to enforce type checks in a function to
-
Inserting a Student into a Sorted List in Python
When working with sorted lists in Python, it's essential to ensure that any new elements are added in the correct order. Problem Statement You have a list of student names sorted in alphabetical
-
Filtering and Counting Keys in Python Dictionaries
In this post, we'll explore how to count the keys in a dictionary and filter a dictionary by its values in Python. These are common tasks that can be useful in a variety of situations when working
-
Root Cause Analysis (RCA) for Data
Introduction In the realm of data management and analysis, problems can range from data quality issues to processing errors and performance bottlenecks. Identifying the root cause of these issues is
-
Combining Multiple CSV Files into One with Python
If you work with data, chances are you've encountered situations where you need to merge multiple CSV files into a single file for analysis. Manually combining these files can be time-consuming and
-
Exploring Key Services for the AWS Solution Architect Exam
Key AWS Services for the Solution Architect Exam Amazon RDS (Relational Database Service) Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides
-
How to Aggregate Values by Date and Sum Them in Python
Problem Statement Suppose you have a list of transactions or events, each associated with a date and a numeric value (e.g., sales amount, transaction amount). Your goal is to aggregate these values by
-
Understanding Data Layout, Files, and Tree Indexes: An Overview
In this post, we'll explore several fundamental concepts related to data storage and indexing: Data Layout, Files, Tree Indexes, and B+ Trees. Understanding these concepts is crucial for anyone
-
Effective Knowledge Transfer of Data: Key Elements
Transferring knowledge, especially when it involves data, is a critical process, especially between consultants. Whether you're transitioning to a new team, it's crucial to get this process right. Here
-
Understanding MAC Addresses: Hexadecimal, Binary, and Decimal Representations
In this post, we'll explore what a MAC address is, how it's represented in hexadecimal notation, and how to convert it to binary and decimal formats. We'll use the MAC address 88-B2-2F-54-1A-0F as an
-
Creating Directories in Python
The os Module Python’s os module provides a way to interact with the operating system. It includes functions for creating, removing, and checking the existence of directories and files. In this
-
Route Summarization and Subnetting
We will walk through the process of subnetting a network and performing route summarization using an example. Subnetting Example Let's consider the following four subnets: 192.168.0.0/22
-
50projectsIn50days – Day 31: Password Generator
A password generator is composed of a text field and some other characteristics. Besides the styles and the HTML tags, the central piece is the set of functions we find in script.js. First of all, the simplest
-
50projectsIn50days – Day 30: Auto Text Effect
In the HTML, there are simple tags that show an h1 element and an input for adjusting the text speed. The text will appear based on the function defined in the script. The function is called
-
Pandas Dataframe: apply method
Calculating Discounts, Taxes, and Total Amount in a DataFrame Suppose you have the following data in a DataFrame: Product Price Category 0 A 100 Electronic 1 B 200 Cloth 2 C 150 Electronic 3 D 300
-
50projectsIn50days – Day 29: Double Heart Click
In the HTML, you'll find a title, h3, and small text, which resemble a cell phone layout. The entire page listens for click events. When a click event occurs, the createHeart function is triggered,
-
50projectsIn50days – Day 28: GitHub Profile
Fetching GitHub user information is the core task of this project. This involves interacting with the GitHub API, which is facilitated using the Axios library. The script is embedded in the HTML as
-
Escape sequences in Python
Escape sequences as in C While reading Automate the Boring Stuff with Python: Practical Programming for Total Beginners, I noticed the existence of raw strings. A raw string is created in such a way
-
50projectsIn50days – Day 27: Toast Notification
Toastify is a well-known library that helps create toast notifications on your website. You can find the npm project here: toastify-js. However, in this project, we replicate the same behaviour but
-
Dictionary methods: keys(), values(), and items()
In Python, there are three special methods related to dictionaries that are worth mentioning: keys() , values() , and items() . Interestingly, these methods do not return true lists. They cannot be
-
Minimizing Operational Overhead of EC2 Fleet OS Security Governance in AWS: Recommendations for DevOps Teams
Minimizing the operational overhead of EC2 fleet OS security governance is essential for maintaining a secure and efficient AWS environment. In this blog post, we'll explore the challenges faced by
-
Implementing Resilient Architectures in AWS: Strategies for Automated Recovery and Testing
Implementing resilient architectures in AWS is essential for ensuring high availability and reliability of your applications. In this blog post, we'll explore strategies for automating recovery and
-
Enabling Traceability and Auditing Security Events in AWS: Best Practices and Tools
Traceability and auditing of security events are crucial for maintaining the security and compliance of your AWS environment. In this blog post, we'll explore how to enable traceability and auditing
-
Data Protection and Security Events in AWS: Best Practices for Ensuring Data Security
Protecting data in transit and at rest is critical for maintaining the security and compliance of your AWS environment. In this blog post, we'll explore best practices for classifying and protecting
-
Automating Security Best Practices in AWS: A Guide to Efficient and Secure Operations
Automating security best practices in AWS is essential for ensuring the security, scalability, and efficiency of your cloud environment. In this blog post, we'll explore the benefits of automation,
-
Authentication and Federation in AWS: Best Practices and Implementation Strategies
Authentication and federation are critical components of any AWS environment, ensuring secure access to resources and services. In this blog post, we'll explore the different types of identity in AWS,
-
Applying Security at All Layers in AWS: A Comprehensive Approach
Security is paramount in any cloud environment, and AWS offers a range of tools and services to help you apply security at all layers of your infrastructure. In this blog post, we'll explore the
-
Implementing a Strong Identity Foundation in AWS: Best Practices and Implementation Patterns
In any cloud environment, ensuring a strong identity foundation is paramount for maintaining security and compliance. AWS offers a range of tools and services to help you implement the principle of
-
Elements in the JSON Policy Structure in IAM
Identities in AWS In AWS you manage access by creating policies and attaching them to an identity. The way that AWS thinks of the elements which interact with them is through IDENTITIES or AWS
-
The AWS Well Architected Framework
Discover how to effectively design, utilize, and manage workloads in the cloud by translating requirements into architecture and operations while adhering to best practices. The Six Pillars:
-
Data Encryption at AWS S3
What is Encryption at rest? Encryption works by using an algorithm to convert plain text into ciphertext. This new ciphertext will be unreadable if it falls into the wrong hands. There are many
-
Introduction to AWS Identity and Access Management (IAM)
Theory Users must be authenticated before they can access AWS services and Resources. AWS services can be accessed via AWS CLI AWS SDKs AWS Management Console You can create: Individual IAM users:
-
50projectsIn50days – Day 26: Vertical Slider
Vertical Slider is a project where you can flip the images vertically instead of horizontally, hence the name. The HTML has a fixed number of images which are loaded from Unsplash. There are two
-
50projectsIn50days – Day 25: Sticky Nav
A sticky navigation menu is popular on websites. It keeps a menu bar at the top of the page visible on the screen while the user scrolls down. Most web development frameworks use it, or have a
-
Run Redash Locally
This is only for educational purposes. Don't do this in production. 1 - Clone the project from the official GitHub: Redash on GitHub - I made a fork previously. Take care. git clone
-
Understanding Distributed System - Maintainability
Introduction It’s widely recognized that the bulk of software costs arise after its initial development in maintenance tasks like bug fixes, feature additions, and day-to-day operation. Therefore,
-
Understanding Distributed System – Resiliency
Introduction Chapter 24 - Common Failure Causes Hardware Faults Incorrect Error Handling Configuration Changes Single Points of Failure Network Faults Resources Leaks Load Pressure Cascading Failures
-
50projectsIn50days – Day 24: Content Placeholder
A card placeholder is a common element on many web pages. Nowadays we can use it with different frameworks. It is possible that x is the most famous of them. In the official documentation you can
-
Understanding Distributed System – Scalability
Introduction Scaling an application involves maintaining performance as load increases. The long-term solution for increasing capacity is to architect for horizontal scalability. In this section,
-
50projectsIn50days – Day 23: Kinetic Loader
The page doesn't have any HTML or JavaScript that explains the functionality. Therefore, all the changes are made by the CSS. The movement is produced by the transform function, as you can see here:
-
Understanding Distributed System – Coordination
Introduction Our ultimate goal is to build a distributed application consisting of a group of processes that gives its users the illusion they are interacting with one coherent node. While achieving a
-
50projectsIn50days – Day 22: Drawing App
There is a toolbox in the middle of the screen, like a Paint emulator. Below the box you can see buttons for changing the color and the size of the drawing. The elements
-
50projectsIn50days – Day 21: Drag And Drop
An HTML page with 6 boxes. In the first one you have an image which you can drag and drop using the mouse. The HTML and CSS are simple. The only thing you may notice is that there are a number of functions
-
Understanding Distributed System - Communication
Part I - Communication Introduction Interprocess communication (IPC) is fundamental to distributed systems, enabling processes to exchange data over networks. This communication relies on agreed-upon
-
On Understanding Programs - Dijkstra
In my life I have seen many programming courses that were essentially like the usual kind of driving lessons, in which one is taught how to handle a car instead of how to use a car to reach one's
-
Testing, Computers and society in Notes On Structured Programming
The computer scientist Dijkstra has some strong opinions about testing, the art of programming, and the impact of the computer on society. Let's take a second to read the opinion he wrote in Notes
-
Understanding Distributed Systems - Introduction
Chapter 1: Introduction In the realm of modern technology, the need for distributed systems has become increasingly apparent. But why invest time and resources in building such intricate
-
Distinctions Between AWS EC2 and ECS
Introduction Embarking on the cloud computing journey often involves deciphering the nuanced offerings of platforms like Amazon Web Services (AWS). In this exploration, we'll unravel the seemingly
-
FTP and SFTP - Running through a Container
Running an FTP server using Docker is really easy. In fact, you can do it by running the following image: atmoz/sftp - But in the end we want to know what an FTP is and why it is worth knowing a
-
Building a Lucrative Business Model in the Data Economy
Introduction: In today's data-driven world, information is akin to a gold mine. However, to fully capitalize on this valuable resource, one must
-
Concepts, Techniques and Models of Computer Programming
Introduction: In the realm of programming, there are three fundamental elements that form its backbone. Understanding these components is crucial for any aspiring programmer. Let's delve into the
-
William Kent - Data & Reality
Chapter 1 – Entities. The book The Hitchhiker’s Guide to the Galaxy should be required reading for both business and information technology professionals. Although this is a science fiction book. I
-
50projectsIn50days – Day 20: Button Ripple
When you click the button, a ripple effect appears on the button and expands up to its edge. The construction starts with an event listener which listens for clicks inside the button. When the
-
Human Resources and Analytics: Enhancing Personnel Selection
Introduction In today's dynamic landscape, the convergence of Human Resources and Analytics presents an unprecedented opportunity to
-
50projectsIn50days – Day 19: Theme clock
The clock container has several classes that refer to the clock elements: needles, hours, minutes and seconds: <div class="clock-container"> <div class="clock"> <div class="needle hour"></div> <div
-
50projectsIn50days – Day 18: Background Slider
An image carousel changes the background image depending on where you click. It's an easy and well-known task to perform in your mind, but when you have to use vanilla JavaScript it should be
-
Common table expressions
Specifies a temporary named result set, known as a common table expression (CTE). Microsoft Documentation. Although they have been around for some time, the first time someone asked me about them I was
-
Principle of Data Wrangling
Data Wrangling involves the process of cleaning and organizing data before any analysis takes place. It typically consumes between 50% and 80% of an analyst's time. Factors to consider include time,
-
4.6 Data Warehouses
DataWarehouses—large historical databases for decision-support that are loaded with new data on a periodic basis — have evolved to require specialized query processing support, and in the next section
-
50projectsIn50days – Day 17: Movie App
The movie app requires you to create an account on the movie db because you need this access to get the data: const API_KEY = "ADD API KEY HERE" const API_URL =
-
Importance of a Database System
As should be clear from this paper, modern commercial database systems are grounded both in academic research and in the experiences of developing industrial-strength products for high-end customers.
-
50projectsIn50days – Day 16: Drink Water
A challenging project which shows how a cup is filled with water. The HTML looks quite simple because we have 8 cups of 250ml. But we can select one cup or a range, which is cool. In fact, the HTML is
-
50projectsIn50days – Day 15: Counter
A page which loads three values that are hardcoded in the HTML tags: <div class="counter-container"> <i class="fa-solid fa-truck-fast fa-3x"></i> <div class="counter" data-target="120000"></div>
-
50projectsIn50days – Day 14: animated navigation
A navigation bar inside the HTML is responsible for hosting the menu li elements. But for this example it is not necessary to explain more about that. This nav has an event listener which listens for a
-
50projectsIn50days – Day 13: Random Picker
A text area where you add a number of elements separated by commas. Then, when you press Enter, you'll see a simple animation and, in yellow, the chosen one. The text area in index.html is where
-
50projectsIn50days – Day 12: FAQ Collapse
One of the most typical features we can find on a web page is the Frequently Asked Questions section. It comes in many forms: fixed, floating and, as we can see in this case, collapsed. In a big faq container
-
Selenium vs. Beautiful Soup
Web scraping is a widely recognized strategy for acquiring information. Before diving into this process, it's crucial to familiarize oneself with two essential tools. Personally, this topic initially
-
50projectsIn50days – Day 11: Event Keycode
The purpose of the project is to understand the event "key". You are supposed to press a key and see in the browser the key you've pressed, the key code, and the name of the event. The HTML is
-
50projectsIn50days – Day 10: Dads Joke
A simple container shows us a Dad Joke. If you click the button, another card will appear. The HTML and CSS are simple and don't have anything special to point out. What is different is in JavaScript
-
50projectsIn50days – Day 9: Sound Board
A sound board that has buttons with different sounds. If you click, you'll hear a sound. The project uses the <audio> element. The <audio> HTML element is used to embed sound content in documents. It
-
50projectsIn50days – Day 8: Form Input Wave
Email and Email are into the label element as we can see here: <form> <div class="form-control"> <input type="text" required /> <label> Email</label> </div> <div class="form-control"> <input
-
50projectsIn50days – Day 7: Split Landing
A simple HTML page with a button and two images, whose behavior changes when a left or right class is added depending on the mouse position. In fact, the CSS code is the following: .hover-left .left { width:
-
Management Skills for developers
Leadership and direction Vision The business vision ("vision speaks of the future") should be: Specific: Clear and simplified. Avoid redundancy and overly sophisticated words. Objective: It should be
-
50projectsIn50days – Day 6: Scroll Animation
The Scroll Animation is a scroll that has a fixed number of contents inside the html. So, it's not generated dynamically. In the case of this project there are 12 Boxes. Each h2 box has a class "Box"
-
Sweetviz error: .iteritems() → .items()
If you install Sweetviz using the command: pip install sweetviz you're going to get this error because of a change in the pandas library. So, until the next Sweetviz release, you can use the following
-
Learn to speak in public
First steps to public speaking When giving a presentation or starting to speak, it is sometimes common to inform the audience about things they are unaware of, which may cause stress. For example,
-
Know your Connection String using SQL
I was looking on the internet for how to find out my server, and I found this interesting question on Stack Overflow: How to get the connection String from a database. One answer gives us an example
-
SODA: Connect SQL Server without Password
When you take your first steps with Soda, you may want to connect to a SQL Server database. In that case you can create a specific user and give it the proper rights. I wrote about that in
-
SODA: SSL: CERTIFICATE_VERIFY_FAILED - Solved
When you try to connect to Soda, you may run into this error: SSL: CERTIFICATE_VERIFY_FAILED. The full message looks similar to this one: requests.exceptions.SSLError:
-
50projectsIn50days – Day 4: Hidden Search
A search bar that has an interesting animation inside it. The changes happen when the "active" selector is used. There are two, one for the input, and other for the button: .search.active .input {
-
50projectsIn50days – Day 5: Blurry Loading
Really simple HTML and CSS, with some tricky JavaScript to achieve the blur effect. Besides the blur effect, the project consists of showing an increasing percentage of how much the
-
PostgreSQL on Windows - server error 500 - Ports are not available
I made a mistake the first time when I installed PostgreSQL: I installed it locally. It wouldn't be a problem if I weren't planning to use Docker, but as I want to develop a project in Apache Airflow.
-
50projectsIn50days – Day 3: Rotating Navigation
A simple article written in plain HTML which has a flip effect. If you click on the hamburger menu, the article will rotate 45 degrees. It's a nice transformation that is created thanks to adding to
-
What is DOM? Why is it important to understand it?
The DOM tree is a crucial concept that needs to be understood and managed in order to make changes to a website. It allows for the application of styles to HTML elements and the addition of
-
A short introduction to the art of programming
Edsger W. Dijkstra - A short introduction to the art of programming Link: E.W.Dijkstra Archive: A Short Introduction to the Art of Programming (EWD 316) 1. Preface For those readers who identify the
-
Border Radius in CSS
One of the simplest projects I've found on the internet is changing some characteristics of CSS attributes using a kind of input on a web page. It's a simple project, but always fun. This kind of
-
Python Django Dev To Deployment
After finishing the Udemy Course Python Django Dev to Deployment I would like to list the things I've learned during the process and the things I understand that I need to continue learning, or even,
-
Career management during our professional life
I’m not a veteran in the field. I’m in my mid-thirties. But I found that there are some qualities I really appreciated when they appeared in the leaders and mentors I had. I would like to
-
The elements of programming style
When the book first saw the light, programming wasn't as important as it is today. But some of the ideas around the style of writing are worth noticing and knowing. For that reason reading the book
-
The elements of programming style: Common Blunders
Chapter 6: Common Blunders A major concern of programming is making sure that a program can defend against bad data. But even with correct data, there is no guarantee that a program will work. In this
-
The elements of programming style: Control Structure
Chapter 3: Control Structure A computer program is shaped by its data representation and the statements that determine its flow of control. These define the structure of a program. There is no sharp
-
The elements of programming style: Documentation
Chapter 8: Documentation The best documentation for a computer program is a clean structure. It also helps if the code is well formatted, with good mnemonic identifiers and labels (if any are needed),
-
The elements of programming style: Don't Be Too clever
Preface to the Second Edition The practice of computer programming has changed since The Elements of Programming Style first appeared. Programming style has become a legitimate topic of discussion.
-
The elements of programming style: Efficiency and instrumentation
Chapter 7: Efficiency and instrumentation Machines have become increasingly cheap compared to people; any discussion of computer efficiency that fails to take this into account is shortsighted.
-
The elements of programming style: Epilogue
Epilogue There are many good books on languages, algorithms and numerical methods available to those who want to learn programming in greater depth. Our goal was not to teach languages or algorithms,
-
The elements of programming style: Expressions
Chapter 2: Expressions Writing a computer program eventually boils down to writing a sequence of statements in the language at hand. How each of those statements is expressed determines in large
-
The elements of programming style: Input and output
Chapter 5: Input and output Test input for validity and plausibility Make sure input cannot violate the limits of the program Terminate input by end-of-file or marker, not by count Identify bad input,
-
The elements of programming style: Program Structure
Chapter 4: Program Structure Most programs are too big to be comprehended as a single chunk. They must be divided into smaller pieces that can be conquered separately. That is the only way to write
-
Refactor or rewrite?
While I was reading The Elements of Programming Style, I found the following quote: Don't patch bad code - rewrite it. The Elements of Programming Style - Chapter 4 - Page 1. It made me think about an
-
Dijkstra: The Humble programmer
Dijkstra wrote some interesting things about the activity of programming. On this occasion I'm going to share some quotations and notes about the article The Humble Programmer. Rules "discovered" for
-
Coders at Work
Coders at Work is a series of interviews conducted by Peter Seibel in 2009 where different programmers talk about their views on technology, development, how they work as programmers and the
-
Notes about: On the cruelty of really teaching computing science
Radical Novelty 1: "The programmer is in the unique position that his is the only discipline and profession in which such a gigantic ratio, which totally baffles our imagination, has to be bridged by a
-
Django Jinja Isn't a thing
I was reading about Jinja and an article on Wikipedia caught my attention: Jinja (template engine) At the beginning I read: Jinja is similar to the Django So, Django Jinja and Jinja projects are
-
50projectsIn50days – Day 2: Progress Steps
The progress steps project is driven by a JavaScript script that changes classes in the DOM. There are two event listeners in the script and there is a function that
-
Using Google Colab to work with outside data
On Stack Overflow there is this question I've never asked myself: How can I create a website using google colab [closed] I have my code written in colab. I want to convert this into a website where
-
While you learn while you build it?
The quotation and the need to understand what you have done are really important when you try to understand some concepts. For that reason when I found this video:
-
50projectsIn50days - Day 1: Expanding Cards
The first day of the project is about expanding cards. It's a nice introductory project if you don't know anything about JavaScript. It's easily reusable and I enjoyed doing it. The
-
FAKER: Create Unique Random
You have to use: unique.random_int(min=11, max=123) A full example where you can see the creation of a persona is the following: from faker import Faker import pandas as pd fake = Faker() def
-
Are people in tech aware of history? Donald Knuth
Seibel: Do you feel like programmers and computer scientists are aware enough of the history of our field? It is, after all, a pretty short history. Knuth: There aren’t too many that are scholars.
-
How to know where my current Python virtual environment is located
If you are working on your machine with different virtual environments, perhaps you have wondered "Wait a minute. What environment am I working in?" There are two ways (that I know of). Using pip: pip -V Or using
-
The relation between academic computer science and the industrial practice. Donald Knuth overview
Seibel: You’re an academic but also have worked on big systems and have done some work in industry. How do you see the relation between academic computer science and industrial practice? Knuth: It’s
-
Programming is harder than writing books? - Vision of Donald Knuth
Seibel: Do you think you were a dramatically better programmer when you finished TeX than when you started? Knuth: Well, yes, because of literate programming. Seibel: So you had better tools, but had
-
Freshman computer scientists shouldn't touch a computer. What does Donald Knuth think about that?
The most frequently named person in the book: Donald Knuth, the author of "The Art of Computer Programming". Many people, including me, have not read it. But it justifies us. Seibel: Uh-oh; you just revealed your
-
SODA: Check count distinct elements
Categories in Data Quality When you are doing a quality check you're looking for six levels of knowledge about the data. This might change depending on your requirements and how the data is used. Because, in
-
What does $ mean in shell scripts?
In this video: https://www.youtube.com/watch?v=o9THkT5ZPi4&t=308s I saw the weird symbol: echo $? When I started looking it up on the internet I discovered that a lot of people have asked, and
-
Why choose a Data Lake?
There are some reasons that will lead you to choose a Data Lake as the solution for your data operations. The most important are: Increase operational efficiency Make data available faster Lower
-
SODA: Connect to SQL Server
After deciding to install SODA to make your quality check you have to connect the data source to SODA. We are going to see how to connect soda to Microsoft SQL Server. Remember that this is a tutorial
-
Why was Ken Thompson's son not encouraged to study computer science?
Ken Thompson is one of the most famous computer scientists in the book and in the field. He became famous because of a series of creations, Unix being the most relevant. Seibel: In a 1999 interview you
-
TOX: First steps
Tox is a tool for Python testing. I'm taking my first steps with it because I found it in the Faker project: Faker is a Python package that generates fake data for you. Faker GitHub repository. If
-
Coder, programmer or computer scientist? Peter Deutsch gives us an answer
Seibel: You were the only person I contacted about this book who had a really strong reaction to the word coder in the title. How would you prefer to describe yourself? Deutsch: I have to say at this
-
ASTRONOMER: Install on Windows
Installing Astronomer is quite simple. On Windows you have to follow these steps: Install Docker. Install Linux on Windows with WSL. Add the Astro exe to a location. In general: C\ Change the name
-
How to be a lead by Dan Ingalls
Daniel (Dan) Ingalls is one of the creators of Smalltalk. Another interesting development of Ingalls's was the context menu. Seibel: Do you have any tips on how to be a good technical leader? Ingalls: The
-
Talent as a programmer and talent as system-level thinking. Peter Deutsch talks about that
Perhaps the person in the book most forgotten by the general public. But if you use a REPL you are paying a silent tribute, because the first REPL was created by Laurence Peter Deutsch. Link to
-
Readability and efficiency in your code. Guy Steele analyzes this trade-off
Seibel: So when you’re writing English, you’re obviously writing for a human reader and you seem to contrast that to writing software, which is for a computer. But lots of people—such as Knuth—make a
-
Are languages easier nowadays? Guy Steele's answer
Seibel: Do you think languages are getting better? You keep designing them, so hopefully you think it’s a worthwhile pursuit. Is it easier to write software now because of advances that we’ve made?
-
Being a better reviewer and a good architect by Peter Norvig
Seibel: So what makes the better reviewers better? Norvig: Well, that they catch more things. Some of it is the trivial stuff of you indented the wrong number of spaces or whatever but some of it is,
-
Choosing the correct language by Guy Steele
Seibel: How much does a choice of language really matter? Are there good reasons to choose one language over another or does it all just come down to taste? Steele: Why shouldn’t taste be a good
-
Peter Norvig: programming as craftsmanship
Seibel: As a programmer, do you consider yourself a scientist, an engineer, an artist, or a craftsman? Norvig: Well, I know when you compare the various titles of books and so on, I always thought the
-
Programming: Now Vs Then by Guy Steele
Guy Steele is an academic known particularly for the "Lambda Papers". Seibel: What has changed the most in the way you think about programming now, vs. then? Other than learning that bubble sort is
-
Basic concepts about Amazon Redshift
Among the first things you will learn when you take the course Getting Started with Amazon Redshift are the following: Redshift is based on PostgreSQL, and there are four key concepts to understand about
-
Logging to a file to avoid print statements
A video that was enlightening. We need to avoid misusing print statements once we master the basic tools and ideas of programming in particular and software development in general. So, here
-
Peter Norvig and the idea of test to drive design
According to the point of view about the way of doing software, which we can summarize as trying to develop the problem-solving element. Another interesting topic we can focus on is the way of using testing
-
How programming has changed over the years by Peter Norvig
Continuing the Peter Norvig series. The first post was about: What Peter Norvig Learn about ‘Industrial Programming’? Now it's time to talk about how learning to work on a team has changed over time.
-
An apprentice approach is necessary, according to Peter Norvig
Seibel: I’m surprised you think the master-programmer model is such a dumb idea. In your “Teach Yourself Programming in Ten Years” essay you make the point that programming is a skill that, like many
-
Peter Norvig and the Computer Science Curriculum
Seibel: Speaking of things that aren’t taught as much, you’ve been both an academic and in industry; do you feel like academic computer science and industrial programming meet in the right place?
-
Peter Norvig: everything in your head
Seibel: Though your job now doesn’t entail a lot of programming you still write programs for the essays on your web site. When you’re writing these little programs, how do you approach it? Norvig: I
-
What makes a good programmer by Joe Armstrong. Who does Joe Armstrong hire?
Joe Armstrong has talked a lot about the topic of being a good programmer, as we can see in the previous posts: What Joe Armstrong did to be a better programmer? , Joe Armstrong and the Print
-
What Peter Norvig Learn about 'Industrial Programming'?
Peter Norvig is known for his technical abilities, for his degree and for being the Director of Research at Google. But he also made a big contribution to the discussion about how to learn
-
Joe Armstrong and the importance of the writing skills
Besides his opinions in What Joe Armstrong did to be a better programmer? and Joe Armstrong and the Print Statements, Joe Armstrong also has an interesting opinion about other skills not
-
Joe Armstrong and the Print Statements
Following the Joe Armstrong quotes, there is one about the print statements. Seibel: What are the techniques that you use there? Print statements? Armstrong: Print statements. The great gods of
-
What Joe Armstrong did to be a better programmer?
Joe Armstrong, the co-designer of Erlang, was asked what he did in order to improve as a programmer. Seibel: Is there anything that you have done specifically to improve your skill as a
-
Joshua Bloch and the religion about the computer languages
Continuing with the ideas that Joshua Bloch gave us, which started in this post: Joshua Bloch and his tier list of book, here we can see an interesting one: Seibel: Why do people get so religious about
-
Brendan Eich and the age of the programmers
The creator of JavaScript, Brendan Eich, was asked about programming languages and time. Seibel: Do you feel at all that programming is a young person’s game? Eich: I think young people have
-
Brendan Eich and the languages over time
The creator of JavaScript, Brendan Eich, was asked about programming languages and time. Seibel: In general do you feel like languages are getting better over time? Eich: I think so, yeah. Maybe
-
Joshua Bloch and his tier list of book
Joshua Bloch, a software engineer, contributor to and (in some way) evangelist of Java, was asked, at the time Coders at Work was published, about the books every programmer should read.
-
Scraping Whale Alerts
Following up on the questions about Whale-Alert in the first post . The Whale Alert page provides interesting information about whales; in cryptocurrency jargon, a whale is a transaction above a certain amount of money.
-
How Douglas Crockford detects talent
Douglas Crockford, well known as the first person to specify the JSON format, was asked how to detect talent in a programmer. Seibel: When you’re hiring programmers,
-
Jamie Zawinski in Coders At Work
Jamie Zawinski is known for some of the things he has created. But besides that, when he was asked how he sees himself, he gave a really interesting answer: Seibel: That brings me to another of my
-
Three ideas to consider to develop a microservice
In the post Is microservice architecture the silver bullet? we can find an explanation of why the microservice architecture is not a good idea for the following applications: real-time
-
What is programming?
I've finished reading "Coders at Work", a series of interviews between @peterseibel and well-known programmers/coders/(etc). The first edition was in 2009. And in the preface you can read: Yet despite
-
Python calculate seconds and total_seconds
If you want to calculate the total seconds between two dates, you could be tempted to compute a time delta and look at its seconds. But this approach will give you an unexpected result. You have to use
-
SODA: A way to make data quality checks
With the open-source library SODA, you can run the different checks you need when you are going to apply a transformation to a data set. Source: SODA on PIP
-
Peter Norvig Paper: Oh shiny! antidote
Dark Knights In the TED talk The mind behind Linux | Linus Torvalds https://www.youtube.com/watch?v=o8NPllzkFhE&ab_channel=TED One of the comments that Linus made was: Edison may not have been a nice
-
What is an 'Ephemeral cluster'?
When you create a compute service, for example in HDInsight, you can create a cluster that remains active once it's created or, on the other hand, one that stops (is 'deleted') after some amount of
-
Logger in Python - First Approach
Besides print statements and debugging tools, I sometimes (more and more frequently) see the logging module in code. According to the Python documentation: This module defines
-
Difference between Framework and Libraries
Software development has tricky words: jargon that seems unreachable when we are starting out. Even though it is not a game changer, understanding this difference is nice to have, and in one or two
-
Set secrets in Databricks
If you add the user and password of your connections as plain text, you are making a mistake that is easy to solve. To solve it, you have to install the Databricks CLI with pip: pip install
-
Resource from Vanguard ETF
Looking for Vanguard ETF data is not an easy task, because a lot of pages require you to sign up or even purchase a subscription. So we can't access free information. I don't know if
-
Load data from Snowflake to S3
If you want to load data from Snowflake to S3, you should try the COPY INTO command. You run something like this command in the Snowflake web app: copy into @my_ext_unload_stage/d1 from mytable;
-
Using Presign in AWS
Presign is a command you can use in the AWS CLI that lets anyone who has the pre-signed URL make an HTTP GET request to retrieve the pre-signed data inside the bucket. In the CLI you
-
Does server-side rendering give more importance to JavaScript?
Lately, server-side rendering has become more important. I don't know how important. But this raises a question for me about future jobs in JavaScript vs other back-end languages like Python.
-
Load CSV file from S3 to NEO4J
If you try to load data from S3 to NEO4J, you are going to need to presign the file. So you need to expose the data to somebody that has the file. So, first you need to presign the file: aws s3
-
UNIX: A History and a Memoir
In the era of bright consultancy, where all things are opinionated, it’s difficult to find refreshing ideas. They really do exist, but they are difficult to find. We are talking also about some
-
Environments in Virtual Env
The importance of using environments As was said in Setting environments in Python it’s important to use environments for your deployment, even if these are side projects or wild repositories. But at
-
Event Driven Architectures
At the Gartner Summit of 2006, Mani Chandy talked about the existence of a misconception of Event Driven Architecture (EDA). So, he proposed to talk about the understanding of EDA and its Return of
-
Matei Zaharia - Spark: The Definitive Guide - Architecture of a Spark Application
The Architecture of a Spark Application The Spark driver The driver is the process “in the driver seat” of your Spark Application. It is the controller of the execution of a Spark Application and
-
Matei Zaharia - Spark: The Definitive Guide - Life Cycle of a Spark Application
The Life Cycle of a Spark Application (Inside Spark) The SparkSession The first step of any Spark Application is creating a SparkSession. In many interactive modes, this is done for you, but in an
-
Matei Zaharia - Spark: The Definitive Guide. Common Operations
Define Schemas manually When using Spark for production Extract, Transform, and Load (ETL), it is often a good idea to define your schemas manually, especially when working with untyped data sources
-
Kleppmann - Designing Data Intensive Applications
A data-intensive application is typically built from standard building blocks that provide commonly needed functionality. For example, many applications need to: • Store data so that they, or another
-
Testing in Python: Pytest Vs Unit test
How important are the tests? Testing is one of the most important skills we need to develop once we join the industry. In fact, knowing about testing is something that is not as evaluated as it could
-
Why is it important to know what Environment Variables are?
While learning to avoid hardcoding some keys in my projects, I found the concept of environment variables. I've found this interesting article about this topic on Medium: An Introduction to
-
Setting environments in Python
When we start a project in Python we make the beginner mistake of installing each tool in any place. However, as we advance in our knowledge and look to improve what we do, we start thinking about
-
Empowerment for the new leaders in tech
Once a new hire is designated as the leader of a team, one of the first challenges is how this new person can achieve ownership of the project and the inspiration of the
-
Agro Analytics Datasets
Looking for data sets to put some knowledge about Agroanalytics into practice, I found some interesting challenges: There are a lot of courses about it, for example, at Wageningen University (In fact,
-
Assert or AssertEqual. Differences.
Difference between the assert statement and assertEqual in Python
-
What is a bastion host?
Definition of Bastion Host A bastion host is a specific computer in a network whose purpose is to keep an attack from outside the network from affecting other parts of the system. For example,
-
Are SSH and Bash the same? (Spoiler: No)
The thing is: when you start to run some console commands you notice that all the things you write in that place are not the same. Simple to understand, difficult to order each part in your head. I
-
Connect Ubuntu in Virtual Box with SSH
After understanding the importance of a good understanding of ssh , it’s time for our first practice: connecting from Windows to an Ubuntu installation in a virtual machine in VirtualBox. Download
-
What is Whale Alert
What is Whale Alert ? Whale alert is a blockchain tracker, which reports interesting transactions. Especially the larger ones. What is a blockchain tracker? It is a process that follows the blockchain
-
Good Guidelines to improve as Software Developer
After learning the basics of programming and understanding the first steps necessary to become a competent beginner software developer, I've started to think. I'm trying to understand what are the
-
SSH: A Brave new world
When you make your first steps as a developer, you realize that one of the first activities you have to do when your code is ready is to deploy it. In that case, generally, the senior dev or someone