AWS Glue & ETL Bookmarks


  • ETL is not a tool → it’s a methodology or workflow.
  • Extract → Transform → Load = a process to move raw data into a clean, usable form for analytics.



🔹 AWS Glue (tool/service)

  • AWS Glue = Amazon’s serverless ETL service.
  • It lets you build and run ETL pipelines without managing servers.
  • Glue provides all the parts you need to implement each stage of ETL.



🔑 How Glue Fits Into ETL

  1. Extract
    • Glue connectors pull data from sources: S3, RDS, DynamoDB, JDBC databases, APIs, logs, etc.
    • Example: extract customer data from MySQL, clickstream data from S3, and logs from CloudWatch.
  2. Transform
    • Clean, deduplicate, join, and convert the data (Glue runs these transforms on Apache Spark).
    • This step ensures the data is usable and consistent.
  3. Load
    • Write the transformed data to a target: an S3 data lake, Redshift, RDS, or another data store.
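As an illustration (not Glue-specific), the three steps can be sketched in plain Python. The source is an in-memory list standing in for raw records, and the load target is a SQLite table; all names here are hypothetical:

```python
import sqlite3

# --- Extract: pull raw records from a source (here, an in-memory stand-in) ---
def extract():
    return [
        {"id": 1, "name": " Alice ", "spend": "120.50"},
        {"id": 2, "name": "Bob", "spend": "80"},
        {"id": 2, "name": "Bob", "spend": "80"},  # duplicate row from the source
    ]

# --- Transform: clean, deduplicate, and convert types ---
def transform(rows):
    seen, clean = set(), []
    for r in rows:
        if r["id"] in seen:
            continue  # drop duplicate records
        seen.add(r["id"])
        clean.append({"id": r["id"], "name": r["name"].strip(), "spend": float(r["spend"])})
    return clean

# --- Load: write the usable, consistent records to the target store ---
def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, spend REAL)")
    conn.executemany("INSERT INTO customers VALUES (:id, :name, :spend)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # → 2
```

In a real Glue job the same pipeline would use Glue connectors for extract, Spark transforms, and a data-store writer for load — but the shape of the workflow is the same.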



🔹 Extra Glue Features

  • Glue Data Catalog → a centralized metadata store (like a database of all your datasets).
  • Glue Crawlers → scan data sources and automatically infer schema (tables, columns, data types).
  • Glue Studio → visual interface to design ETL jobs.
  • Glue Streaming ETL → for real-time data pipelines.



🔹 Tools for ETL

  • AWS Glue (serverless ETL service).
  • Apache Spark, Apache Flink.
  • Talend, Informatica.
  • Custom Python jobs with Pandas.
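For the last item, a minimal sketch of a custom Pandas job (assuming pandas is installed; the DataFrame literal and column names are hypothetical stand-ins for a real source like `read_csv` or `read_sql`):

```python
import pandas as pd

# Extract: read raw records (a literal stands in for pd.read_csv / pd.read_sql)
raw = pd.DataFrame({
    "customer": ["alice", "bob", "bob", None],
    "amount":   ["10.0", "5.5", "5.5", "3.0"],
})

# Transform: drop missing/duplicate rows, fix types, normalize casing
clean = (
    raw.dropna(subset=["customer"])
       .drop_duplicates()
       .assign(amount=lambda df: df["amount"].astype(float),
               customer=lambda df: df["customer"].str.title())
)

# Load: hand off to the target (to_sql / to_parquet in a real job)
print(clean.to_dict("records"))
```

Tools like Glue, Spark, or Talend add scale, scheduling, and connectors on top, but the core extract → transform → load shape is the same.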



🔹 What is a Job Bookmark?

  • A bookmark is a mechanism to keep track of previously processed data in an ETL job.
  • It ensures that when your ETL job runs again, it only processes new or changed data, instead of reprocessing everything.



🔹 Why It Matters

Without bookmarks:

  • Each ETL run processes the entire dataset → inefficient, expensive, and may cause duplicates.

With bookmarks:

  • ETL job “remembers” where it left off.
  • Next run starts from the last checkpoint (like saving your place in a book).
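Glue handles this internally — you enable it with the job parameter `--job-bookmark-option: job-bookmark-enable`, and the bookmark state is persisted when the job calls `job.commit()`. The idea itself can be sketched in plain Python; here a hypothetical checkpoint file remembers the highest record ID already processed:

```python
import json
import os
import tempfile

# Hypothetical state store (Glue keeps this state for you per job)
CHECKPOINT = os.path.join(tempfile.gettempdir(), "etl_bookmark.json")
if os.path.exists(CHECKPOINT):
    os.remove(CHECKPOINT)  # start fresh for this demo

def load_bookmark():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["last_id"]
    return 0  # first run: no bookmark yet, so process everything

def save_bookmark(last_id):
    with open(CHECKPOINT, "w") as f:
        json.dump({"last_id": last_id}, f)

def run_job(source_rows):
    last_id = load_bookmark()
    new_rows = [r for r in source_rows if r["id"] > last_id]  # skip already-processed data
    if new_rows:
        save_bookmark(max(r["id"] for r in new_rows))  # analogous to job.commit() in Glue
    return new_rows

rows = [{"id": 1}, {"id": 2}, {"id": 3}]
first = run_job(rows)                 # processes all 3 rows
second = run_job(rows)                # processes nothing: bookmark remembers id 3
third = run_job(rows + [{"id": 4}])   # processes only the new row
print(len(first), len(second), len(third))  # → 3 0 1
```

Real Glue bookmarks track source-specific markers (file names and timestamps for S3, transaction boundaries for JDBC) rather than a single ID, but the checkpoint-and-skip pattern is the same.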



🔹 Where Used

  • AWS Glue ETL jobs (Spark or Python shell).
  • Glue streaming jobs (with checkpoints).
  • Similar concept in Apache Spark and other ETL tools → often called checkpointing or incremental processing.



🔑 Takeaway

  • Bookmark = memory of ETL job progress.
  • Ensures incremental processing (only new/changed data).
  • Prevents duplicates, saves time & cost.


