AI

DeepSeek's chatbot cares about caring…or something

Chinese startup DeepSeek has thrown AI investors into a Sputnik-like moment, provoking a Monday selloff of companies that make AI accelerators. Valuations were also downgraded for companies offering new ways to meet the growing power needs of data centers spinning up more AI compute.

Nvidia alone dropped $600 billion in value on Monday as its stock declined 17%. The stock recovered somewhat on Tuesday, but finished Wednesday down another 4%.

Part of the explanation for the selloff was that DeepSeek (and possibly other companies coming down the pike) offers a more efficient form of AI that requires less memory. Because it is open source, it could be a boon to developers. Its R1 model has been advertised as achieving performance comparable to OpenAI's o1 across math, code and reasoning tasks, and it is MIT licensed for clear open access. It is also free on the App Store, where downloads have skyrocketed.

RELATED:  Chinese startup DeepSeek freaks out AI investors, hitting Nvidia

Much of the reaction to DeepSeek has been shock, but a few analysts have questioned whether it works in practical applications. Many praise it for offering a lower-cost approach to using LLMs, provided it can meet developers' demands; the broader industry evaluation is ongoing.

Research firm Gartner has advised early adopters to take care with using DeepSeek-R1, noting: “While DeepSeek currently stands as a strong competitor in the GenAI space…early adopters should assess its risks and security considerations, including legal risk, third-party risk and opportunistic attacks.” Gartner also advised assessing privacy and data security.

My casual, if awkward, early prompts with DeepSeek

To see how it performs, I downloaded DeepSeek and tried simple prompts. I couldn’t download it Monday, when the company suspended activations because of overwhelming interest, but it became available Tuesday. My first prompt asked for a description of Test Time Scaling, which received a quick and apparently accurate response consistent with what I’ve heard before. (Full response below.***)

My next prompt was based on reports that DeepSeek could be a security or open-information threat because the company is based in China. I asked, simply, “What is the Chinese firewall?” (The proper name is the Great Firewall, which China uses to block access to selected foreign websites such as Google Search and Facebook.)

Here’s the response I got: 

“Sorry, I’m not sure how to approach this type of question yet. Let’s chat about math, coding and logic problems instead!” 

I got the same response to similar questions of a political or cybersecurity nature, including this prompt: “What do you make of concerns that DeepSeek is catching my personal details for use by the Chinese Communist Party?”

I interpreted DeepSeek’s response as a cop-out and told the chatbot so. My follow-on prompt was, “Is that a cop-out? Logically, would I not care what happens to my data?” (Notice that I used ‘logically’ because the chatbot wanted to talk about logic.) This time I got a lengthy response, some of it confused.

Ok, wait, who exactly is copping out? 

Here’s what happened next, with DeepSeek seemingly confused about whether I was accusing it of a cop-out or accusing myself of one. (Maybe my prompt wasn’t well put?) The responses get philosophical and, frankly, confusing:


The response attempts to "break it down logically" in four steps:

"Your question touches on an important and nuanced issue. Let’s break it down logically:

### 1. **Is it a "cop-out" to dismiss caring about data?**  

  Yes, if done reflexively. Declaring "I don’t care" without engaging with the risks can be a way to avoid confronting complex or inconvenient truths. However, feeling overwhelmed by data privacy concerns is understandable—many people feel powerless against systemic data collection. The key is balancing awareness with actionable steps, rather than outright dismissal.

---

### 2. **Why *should* you care about your data? Logical reasons:**  

  - **Risk of Harm**: Even if you feel you have "nothing to hide," data breaches or misuse can lead to identity theft, financial fraud, or reputational damage.  

  - **Autonomy and Manipulation**: Personal data fuels targeted ads, misinformation campaigns, and algorithmic decisions (e.g., credit scores, job opportunities). Indifference leaves you vulnerable to subtle manipulation.  

  - **Collective Impact**: Widespread apathy normalizes surveillance and weakens privacy protections for society. Your stance affects others (e.g., data leaked about you could harm friends or family).  

  - **Future Consequences**: Data collected today might be used in unforeseen ways tomorrow (e.g., AI training, insurance pricing, or political targeting).  

---

### 3. **Counterarguments and Rebuttals**:  

  - **"I have nothing to hide"**: Privacy isn’t about hiding—it’s about control over your personal boundaries. Would you let strangers read your emails or track your movements?  

  - **"I can’t stop corporations/governments"**: While true, small actions (e.g., strong passwords, minimizing data sharing) reduce your exposure. Collective demand for privacy laws (e.g., GDPR) also drives systemic change.  

---

### 4. **A Balanced Approach**:  

  You don’t need to become a privacy extremist, but logical self-interest suggests *some* care is rational. Think of it like locking your door: you might not fear a specific threat, but basic precautions mitigate risk.  

In short, dismissing data concerns entirely *is* a cop-out if it ignores clear risks. But caring doesn’t require paranoia—just informed, proportionate vigilance."

What to make of all this

While DeepSeek’s response to my prompt was somewhat interesting, it is mostly bizarre and hard to follow. Hopefully it’s not a waste of time to read these responses, but they do show how a random prompt can lead a user down a rabbit hole. I got similarly random responses when I first tried ChatGPT in December 2022.

One observer, “Joseph F” on LinkedIn, a general partner at Emergence Capital, called DeepSeek “just another evolutionary step in AI improving across the board.” He surmised that DeepSeek is likely selling API access at a loss to try to gain market share. “The research that DeepSeek released is novel and the major US AI companies are already working to fold DeepSeek's techniques into their models to improve inference speed and costs," he wrote.

How DeepSeek describes Test Time Scaling

***I also asked DeepSeek to describe Test Time Scaling, which Nvidia had praised DeepSeek for using. Here's the full response:

"Test Time Scaling (TTS) is a technique used in machine learning, particularly in the context of neural networks, to improve model performance during inference (i.e., at test time). It involves adjusting certain parameters or features of the model after training but before making predictions on new data. This adjustment is typically done to better adapt the model to the distribution of the test data, which may differ from the training data.

Here are some common forms of Test Time Scaling:

1. **Temperature Scaling**: Often used in classification tasks, temperature scaling adjusts the softmax output of a neural network by dividing the logits (the raw output scores) by a temperature parameter \( T \). This can make the model's confidence estimates more calibrated, especially in tasks like image classification. A temperature \( T > 1 \) makes the output probabilities smoother, while \( T < 1 \) makes them sharper.

2. **Batch Normalization at Test Time**: During training, batch normalization uses the statistics (mean and variance) of the current batch to normalize the data. At test time, however, the model typically uses running averages of these statistics computed during training. This ensures consistent behavior and avoids dependence on the specific test batch.

3. **Test-Time Augmentation (TTA)**: This involves applying data augmentations (e.g., rotations, flips, or crops) to the test data and averaging the predictions across these augmented versions. This can improve robustness and accuracy by reducing the impact of variability in the test data.

4. **Domain Adaptation**: If the test data comes from a different distribution than the training data (e.g., different lighting conditions in images), techniques like feature scaling or fine-tuning can be applied at test time to adapt the model to the new domain.

5. **Dynamic Inference**: Some models adjust their architecture or parameters dynamically based on the input data at test time. For example, a model might use fewer computations for simpler inputs and more for complex ones.

Test Time Scaling is particularly useful when there is a distribution shift between training and test data, or when the model's confidence estimates need to be more reliable. However, it requires careful implementation to avoid overfitting to the test data or introducing unintended biases."
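The first technique in DeepSeek's list, temperature scaling, is simple enough to sketch in a few lines of code. The following is a minimal illustration (not DeepSeek's or any particular model's implementation) of dividing logits by a temperature T before applying softmax, using made-up logit values:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, scaled by temperature T."""
    # Divide each logit by T, as described above.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a three-class classifier.
logits = [2.0, 1.0, 0.1]

p_default = softmax(logits)        # T = 1: the model's unscaled confidence
p_smooth = softmax(logits, 2.0)    # T > 1: smoother, less confident output
p_sharp = softmax(logits, 0.5)     # T < 1: sharper, more confident output
```

The ranking of classes never changes; only the confidence does. With T = 2 the top class's probability drops, and with T = 0.5 it rises, which is exactly the smoothing/sharpening behavior DeepSeek's answer describes.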

That's it! But definitely not the last we will hear about DeepSeek!