Wednesday, February 12, 2025

raspberry pi gpio controls

#gpiozero library

https://gpiozero.readthedocs.io/en/stable/#

Thursday, January 30, 2025

fuzzy search

Below are from answers get from Deepseek

### **How Fuzzy Search Works**

Fuzzy search is designed to find strings that approximately match a given query, even if the query contains typos, misspellings, or incomplete words. The key idea is to rank potential matches based on how closely they align with the query.

#### Key Concepts:

1. **Subsequence Matching**:

- Fuzzy search looks for whether the characters in the query appear in the target string **in the same order**, but not necessarily consecutively.

- For example, the query `"leaders"` matches `"leadership skills"` because the letters `l`, `e`, `a`, `d`, `e`, `r`, `s` appear in order within the target string.

2. **Scoring**:

- Matches are ranked based on how closely they align with the query. Factors like:

- **Proximity of characters**: Characters that appear closer together in the target string are preferred.

- **Completeness**: Matches that include more of the query characters are ranked higher.

3. **Efficiency**:

- Fuzzy search algorithms are optimized to quickly filter and rank potential matches, even in large datasets.

---

### **How FuzzyFinder Implements Fuzzy Search**

FuzzyFinder uses a simple yet effective approach to implement fuzzy search. Here’s a high-level overview of how it works:

#### Algorithm Steps:

1. **Preprocessing**:

- Convert both the query and the target strings to lowercase (to make the search case-insensitive).

2. **Subsequence Matching**:

- For each target string, check if the characters in the query appear in the same order within the target string.

- Example:

- Query: `"leaders"`

- Target: `"leadership skills"`

- Check if `l`, `e`, `a`, `d`, `e`, `r`, `s` appear in order in `"leadership skills"`.

3. **Scoring**:

- Assign a score to each match based on:

- **Character proximity**: Characters that appear closer together in the target string are preferred.

- **Match completeness**: Matches that include more of the query characters are ranked higher.

4. **Ranking**:

- Sort the matches based on their scores and return the results.

---

### **Example Implementation of Fuzzy Search**

If you wanted to implement a simple version of fuzzy search yourself, here’s how you could do it in Python:

#### Code:

```python

def fuzzy_search(query, choices):

query = query.lower()

matches = []

for choice in choices:

choice_lower = choice.lower()

i = 0 # Index for query

j = 0 # Index for choice

score = 0

# Check if query characters appear in order in the choice

while i < len(query) and j < len(choice_lower):

if query[i] == choice_lower[j]:

i += 1

score += 1 # Increase score for each matching character

j += 1

# If all query characters were found in order, add to matches

if i == len(query):

matches.append((choice, score))

# Sort matches by score (higher scores first)

matches.sort(key=lambda x: x[1], reverse=True)

return [match[0] for match in matches]

# Example usage

terms = [

"team leaders",

"project leaders",

"leadership skills",

"manager",

"ceo",

"director",

"lead developer",

"head of department",

"lead engineer",

"leadership training"

]

query = "leaders"

matches = fuzzy_search(query, terms)

print("Fuzzy matches for 'leaders':", matches)

```

#### Output:

```

Fuzzy matches for 'leaders': ['team leaders', 'project leaders', 'leadership skills', 'leadership training']

```

---

### **Why FuzzyFinder is Efficient**

FuzzyFinder is optimized for performance. It avoids complex calculations (like Levenshtein distance) and instead focuses on **subsequence matching** and **scoring**, which makes it fast and suitable for real-time applications.

---

### **Comparison with Other Algorithms**

1. **Levenshtein Distance**:

- Measures the number of edits (insertions, deletions, substitutions) required to transform one string into another.

- More accurate but computationally expensive for large datasets.

2. **FuzzyWuzzy**:

- Uses Levenshtein distance and provides similarity scores.

- Slower than FuzzyFinder but more precise.

3. **Regular Expressions**:

- Can be used for pattern matching but doesn’t handle typos or approximate matches.

---

### **When to Use FuzzyFinder**

- When you need **fast, approximate matching** for large datasets.

- When you want a lightweight solution without heavy dependencies.

- When you don’t need the precision of Levenshtein distance.

---

### **Conclusion**

FuzzyFinder’s algorithm is not overly complicated—it relies on **subsequence matching** and **scoring** to efficiently find approximate matches. If you’re curious, you can even implement a basic version yourself (as shown above) to understand how it works under the hood.

If you’d like to dive deeper into the implementation or explore more advanced fuzzy search techniques, let me know! 😊

Sunday, April 28, 2024

C Programming

Header Files and Includes

https://cplusplus.com/forum/articles/10627/

https://stackoverflow.com/questions/2762568/c-c-include-header-file-order

Preprocessing Macros

I found this usage of defining empty macros in pico sdk, and it's sometimes useful:

#define _REG_(x)

_REG_(SIO_GPIO_OE_SET_OFFSET)

c preprocessor - Are empty macro definitions allowed in C? How do they behave? - Stack Overflow

Thursday, February 8, 2024

SVN Quick Reference

this is a short summary for my reference based on the great official tutorial at:

Basic Work Cycle (red-bean.com)

Check out a working copy from a repository

$ svn checkout -r 1234 url://repository/path

Update Your Working Copy

%svn update #this will download latest version file from repository to local working copy

#NOTE: will overwrite any local changes not commited?

%svn up -r 1234 #Checkout a specific revision from SVN

Make Your Changes

%svn add <file>

%svn delete <file>

%svn copy <file>

%svn move <file>

%svn mkdir <dir>

Changing the Repository Without a Working Copy

#not interested in this at this stage

Review Your Changes

%svn status #only show the local files different from local checkouted version in repository

#Note: even without internet work, svn can still show the changes, this is because svn keeps an origianl file in local.

%svn status stuff/fish.c #If you pass a specific path to svn status, you get information about that item alone

%svn status -v #verbose options, shows status of every local file

%svn status -u -v #-u options means -update, it shows also show you if any files has newer version in repository than you local version. outdated version and changed files will need resolve

$svn diff #difference details of locally changes files

%svn diff --old <file0> --new <file1> #like diff command, shows differences between two files

most common codes that svn status displays:

? item #The file, directory, or symbolic link item is not under version control.

A item #The file, directory, or symbolic link item has been scheduled for addition into the repository.

C item #The file item is in a state of conflict. That is, changes received from the server during an update overlap with local changes that you have in your working copy (and weren't resolved during the update). You must resolve this conflict before committing your changes to the repository.

D item #The file, directory, or symbolic link item has been scheduled for deletion from the repository.

M item #The contents of the file item have been modified.

! item #Item is missing (e.g., you moved or deleted it without using svn). This also indicates that a directory is incomplete (a checkout or update was interrupted).

#Note: advanced ussage of patch command from svn1.7 a future topic to study.

Fix Your Mistakes

%svn revert <file> #revert any changes you made to local copy of files

#Note: But note that svn revert can undo any scheduled operation—for example, you might decide that you don't want to add a new file after all

Resolve Any Conflicts

#TODO

Commit Your Changes

%svn commit -m "Corrected number of cheese slices."

Examining History

%svn diff -r 3 rules.txt #If a single --revision (-r) number is passed, your working copy is compared to the specified revision in the repository

%svn diff -r 2:3 rules.txt #If two revision numbers, separated by a colon, are passed via --revision (-r), the two revisions are directly compared:

%svn diff -c 3 rules.txt #A more convenient way of comparing one revision to the previous revision is to use the --change (-c) option:

%svn diff -r{2024-10-02} -r{2024-10-06} --summarize #list changed filenames between two dates

%svn log <file> #history message of a file

%svn log -q <file> #-q quiet, only show file name

%svn diff --summarize -r42:41  #just pirnt the filenames that has changes between 2 commits

Checkout a Specific Version of files

svn checkout -r 100 https://example.com/svn/repo/trunk/folder [/path/to/workspace/folder]

svn checkout -r {2023-01-01} https://example.com/svn/repo/trunk/folder [/path/to/workspace/folder]

Sometimes You Just Need to Clean Up

#you can just delete the local working copy without the need to notify server

#before removing local working copy, remommend to run svn status just to make sure you dont have anything changes that you want to keep

Saturday, January 13, 2024

PCIe Study Notes

Chapter 1: Background

Nice References

Flow Control: Down to the TLP: How PCI express devices talk (Part II) | xillybus.com

terminologies: PCIE常用缩写及含义_pcie elbi-CSDN博客

PCI

#TO Study Further: Reflected-Wave Signaling and the impact on number of max electrical load. higher frequencies -> less fewer load allowed on the bus. PCI frequency cannot faster than 66MHz.

Transaction Models: Programmed I/O, DMA, Peer-to-Peer.

PCI is a Shared Bus Architecture; PCI arbitration algorithm must be "fair" and not starve any device for access.

PCI Inefficiencies: 1) Retry: init->target not ready for >16T -> target retry initiator (STOP#); 2) Discconnect: init -> target transfer some data, then not ready for >8T -> target signal disconnect to initiator (STOP#)

Address Space Map: memory Map, IO Map, PCI Configuration Space( 256B per function)

PCI Configuration Header: Type0: non-bridge (6BARs), Type1: bridge (2 BARs)

PCI-X

PCI-X transfer efficiency ~85% vs PCI ~50%~60%

PCI-X Split-Transaction Model: Reqeust-Completer. Completer will memorize transaction (address, transaction type, count, requester ID) and signal a split response.

PCI-X add Attribute stage

PCI-X add MSI(Message Signaled Interrupt): eliminate the need to share interrupts accross multiple deivces. memory write trans, to pre-defined address range, data is interrupt vector of that device. cpu avoids the overhead of finding which device generated the interrupt. no sideband pins are needed.

PCI-X uses common or distributted clock: 1. signal skew; 2) clock skew between multiple devices; 3) flight time budget

PCI-X 2.0 is point-to-point design, instead of shared bus arch.

PCI-X 2.0 uses Source-synchronous clocking model

Chapter 2: PCIe Architecture Overview

PCIe serial data transmission.

Transaction Layer

Full Duplex

A Transaction is defined as the combination of a Request packet that a delivers a command to a targeted device, together with any completion packets the target sends back in reply.

posted transactions and non-posted transactions:

Memory Read -> Non-Posted

Memory Write -> Posted

Memory Read Lock -> Non-Posted

IO Read -> Non-Posted

IO Write -> Non-Posted

Configuration Read (Type 0 and Type 1) -> Non-Posted

Configuration Write (Type0 and Type 1) -> Non-Posted

Message -> Posted

Non-posted write (IO write and Cfg write) and Memory Read Lock can only come from the processor.

TLP (Transaction Layer Packet)

QoS: Virtual Channel Buffer

Data Link Lyaer

Layer main function:

TLP error correction: Ack/Nack causing LP retry

flow control

some Link power managment

DLLP fixed length: 8B

DLLP (Data Link Layer Packets) only transferred between Data Link Layers of the two neighboring devices on a Link, and not routed anywhere else.

Ack DLLP has the last good TLP sequence Number, and the transmitter flushes all the TLPs that wer send before the acked sequence number. the sequence number is generated by Data Link Layer, so it's always in order!

Nack DLLP: Receiver truturns a Nack to the transmitter when it detectrs a TLP error and drops this TLP. the transmitter then will replay all unacknowledged TLPs.

Error Check and Retry: Replay Buffer

Flow Control:

#TODO: hwen in NAK of MRd, while completer resend the CplD, there maybe new MWr from other requester, so the old CplD data may not be up to date. Is this expected?

Physical Layer

#TODO: transmission order on the physical line.

Logical Part

#TODO: ppm long term will cause overflow, how does ethernet and PCIe resolve this?

Elastic Buffer

Byte-Stripped?

Link Training and Initialization: several things are checked or established to ensure proper and optimal operation, such as:

Link width

Link data rate

Lane erversal - Lanes connected in reverse order

Polarity inversion - Lane polarity connected backward

Bit lock per Lane - Recovering the transmitter clock

Symbol lock per Lane - Findding a recognizable position in the bit-stream

Lane-to-Lane de-skew within a multi-Lane Link

Electriclal Part

Ordered Sets: always 4B. always terminate at the neighboring deivecs and are not routed through the PCIe fabric.

used in 1) link training; 2) ppe compensation; 3) enter/exit of low power state on the link

Chapter 3: Configuration Overview

Enumeration

   PCIe Buses: up to 256 Bus; Initial Bus 0 is tpoically assigned by HW to RC. Bus 0 consists of a Virtual PCI bus with integrated endpoints and Virtual PCI-to-PCI Bridges(P2P), which are hard-coded with a Device No. and Function No.
       PCIe Devices: 32 devices per bus; PCIe point-to-point link means that device will always be Device 0. Deach Device must implement Function 0.
       PCIe Functions: Functions do not need to be implemented sequenctially. SW must check every one of the 8 functions to decide which are present.
        Errors should not normally be reported during enumeration.(the enumeration sw could fail since it's typically writtent o execute before the OS or other error handling SW is available.

       Device not present: SW get Vendor ID of FFFFh, reserved for "device not present"
       #TODO: enumeration started after 100ms+link training after reset, how to do it in DV simulation?
       #TODO: what will enum sw do if there is no EP attached to a pci-pci bridge?
       #multi-root system: what if rc assign start bus number 64 to rc, but later find that the total buses are more then 265-64? is this an error?
      It is important to remember that each port on a switch is aBridge, and thus has its own configuration space with a Type 1 Header. In reality, each port (each P2P Bridge) has its own Type 1 Header and performs the same two checks on TLPs when they are seen by that port.

Chapter 4: Address Space & Transaction Routing

Confugration

Confugartion Space Total 4KB/function:

1. (256B/64DW) PCI-compatible spece access by legacy PCI software or PCIe enhanced configuration access mechanism: 64B for Configuration Headers (Type0/Type 1); 192B for Function-Specific Configuration Header Space.

2. (960DW) PCIe Extended space is only accessible by PCIe Ehanhanced Configuration Access Mechanism: for optional extended capability registers: ...

Enhanced Configuration Mechanism memory-mapped Address Range:

A[63:28]: upper bits of the 256MB-aligned base address of the address range allocated for PCIe enhanced configuration mechanism.

BAR only 32bits, how to handle >4GB address? Mem address 64-bit decoding uses 2 BARs.

BARs are used for decide which system address can be used to access the PCIe functions' internal locations.

BARs device has 6, but not all needs to be implemented, harded coded value of all 0 means it's not implemented.

All BARs must be evaluated sequentially.

Where is the BAR? it's inside each function?

#TODO: how base/limit registers to mean unused registers? e.g. if does not use IO address space, how to cfg IO base/limit register?

Base/Limit only needed in type 1 Bridge ports.

TLP Routing

Routing based on TLP header FMT/Type fields.

TLP:

MRd

MRdLk

MWr

IORd

IOWr

CfgRd0

CfgWr0

CfgRd1

CfgWr1

Msg

MsgD

Cpl

CplD

CplLk

CplDLk

FetchAdd

Swap

CAS

LPrfx

EPrfx

Address routing: MRd, MRdLk, MWr, AtomicOp, IORd, IOWr, Msg/MsgD

ID routing: CfgRd/Wr, Cpl/CplD, CplLk/CplDLk

Implicit routing: Msg/MsgD, for messages to mimic side-band signals.

Saturday, August 19, 2023

TCP/IP Study Notes

TCP/IP segmentation and fragmentation:

reference link: 【计算机网络】区分tcp分段和ip分片 - JoyoHub

Virtualbox Settings Reference

Networking Settings

to record issues when setting up different network options for my guest linux virtual machine.

refernce link: 【计算机网络】：一次性理清Virtualbox虚拟机网络模型 - JoyoHub

reference link: https://www.nakivo.com/blog/virtualbox-network-setting-guide/

1. issues when usign Bridged Adapter mode.

cannot get ipv4 address

set to NAT mode first, and get ip address by checking ifconfig

then switch to Bridged Adapter mode, and very importantly, choose the correct NIC (if you are using wifi, choose the wireleass NIC! in my case default is ehterent NIC, and i must change it to my wifi NIC), then check the ip address now using ifconfig. it works for me!

Extension Pack Installation

1. download and install

2. In order for VirtualBox to have access to the USB subsystem, the user running VirtualBox must belong to the vboxuser group. To do this:

% sudo usermod -aG vboxusers <your-username>

Note: if there is no vboxusers group, you need to first add it:

%sudo groupadd vboxusers

Enable USB

1. you need to first add the devices that you want the guest os to access in the hostos virtualbox settings.