What is collections.deque()
what does ’not’ in Python actually mean?
What is currying
:= What is this operator?
~ What is this operator? Use it in pandas to…
What is ‘scientific programming?’
Success
↳ Why is Python good for it?
if __name__ == '__main__':
Explain this in simple terms.
What is the difference between & vs and
List comprehension
I have an array of points scored in a sequential basketball games points=[7, 11, 5, 8, 12, 17, 12]. In order to forge my stats, I want to multiply each value in the array by 2. How can I achieve this?
Using list comprehensions
points = [score*2 for score in points]↳ Isn’t points * 2 easier?
points*2 does not scale the scores, but rather repeats the entire list 2 times. The result would be [7, 11, 5, 8, 12, 17, 12, 7, 11, 5, 8, 12, 17, 12]
Class inheritance
What on earth is a lambda function?
Oh my goodness, it’s not that hard.
Function as an object, which you should understand.
What is the difference between the following:
import pandas
import pandas as pd
from pandas import DataFrame
from pandas import DataFrame as df
Explain what a ‘library’ in Python is. Give a few examples.
Can you explain the internal structure of a Python package? What is the difference between a package, module, and class? Use the example from sklearn.ensemble import RandomForestClassifier to assist your explanation
It’s straightforward:
Packages = Folders (also known as libraries)
Modules = .py files
Objects = classes, functions, or specific variables
Package (directory/folder of Python files)
│
├── __init__.py (makes Python treat the directory as containing packages)
│
├── Module1 (a Python file .py)
│ ├── Class1 (a class in Module1)
│ │ ├── Function1 (a function in Class1)
│ │ └── Variable1 (a variable in Class1)
│ ├── Function2 (a function in Module1)
│ └── Variable2 (a variable in Module1)
│
├── Module2 (a Python file .py)
│ ├── Class2 (a class in Module2)
│ │ ├── Function3 (a function in Class2)
│ │ └── Variable3 (a variable in Class2)
│ ├── Function4 (a function in Module2)
│ └── Variable4 (a variable in Module2)
│
└── Subpackage (a directory of Python files within Package)
├── __init__.py
├── Submodule (a Python file .py within Subpackage)
│ ├── Class3 (a class in Submodule)
│ │ ├── Function5 (a function in Class3)
│ │ └── Variable5 (a variable in Class3)
│ ├── Function6 (a function in Submodule)
│ └── Variable6 (a variable in Submodule)
└── …(Note that there may be sub-packages)
* Remember, the final object being accessed doesn’t always have to be a class, it could be a function, or a specific variable.
Given that we used the second import type, how would you create a Random Forest Classifier called jles_forest?
jles_forest = ensemble.RandomForestClassifier()What is soldier in: from empire.army.soldier import shield
empire : Package
army : Module
soldier : Submodule
shield : Class
What is the __init__.py file?
What is the __main__() function?
What actually happens when you call my_list.sort() in Python?
Success
This is a glimpse into the real inner workings of, getting clsoer
- Algorithm: Python’s
sort()method uses an adaptive, stable, iterative mergesort algorithm known as Timsort. This algorithm is highly efficient for real-world data and maintains the relative order of equal elements (which is why it’s called “stable”).- Efficiency: Timsort is optimized for various scenarios, particularly for partially ordered lists or lists with many repeated elements. It has a time complexity of O(n log n) in the worst case, but it performs better on partially sorted lists.
Don’t take these magical functions for granted. We won’t go more in detail into the core algorithms that operate at a lower level in this section - saving all of that for Data Structures and Algorithms.
What is serializing?
Note
Serializing refers to translating a data structure or object state into a form that can be stored outside of the program, and then reconstructed later.
If you haven’t encountered the need to serialize yet, you probably haven’t done enough machine learning.
##### ↳ List some objects/data that you might need to serialize as a data scientist.
- Trained Machine Learning Models: Once you’ve trained a model, you need to save it for later use, whether for making predictions, further training, or analysis. Serialization allows you to store the model’s state.
- Data Preprocessing Pipelines: These include encoders, normalizers, and other transformers that you’ve fit to your data. Saving these ensures consistency in data preprocessing in future model use.
- Data Sets: Particularly large datasets that have been cleaned and preprocessed might be serialized for efficient reloading in future sessions.
- Configuration Files: Storing hyperparameters, environment settings, or experiment setups in a serialized format can help in replicability and documentation of your work.
- Custom Objects: If you have built custom classes or functions in your data science workflow, serializing them can be useful for reusability across different projects or sharing with colleagues.
##### ↳ Imagine you've just spent 7 hours and lots of \$\$\$ training a model in your Jupyter Notebook, only to realise you never implemented serialization. What do you do?
>>> OUTPUT: Model successfully trained
If I realized that I forgot to serialize a model after training, I would take the following steps:
Check if the Model is Still in Memory
- Interactive Python Environment (e.g., Jupyter Notebook, Python Shell):
- If I ran my code in an interactive environment where the Python session is still active, the model might still be in memory. In this case, I could directly serialize the model by executing the serialization code (using
pickle,joblib, or another method) in the same session.
- If I ran my code in an interactive environment where the Python session is still active, the model might still be in memory. In this case, I could directly serialize the model by executing the serialization code (using
import pickle
# Assuming my_model is your model variable
with open('model_filename.pkl', 'wb') as file:
pickle.dump(my_model, file)- Command Line or Non-Interactive Environment:
- If I ran the script via the command line or a non-interactive environment (like executing a
.pyfile), and the script has finished running, the Python process would terminate, and all variables (including my model) are cleared from memory. - In this case, unfortunately, there’s no way to serialize the model post hoc. The model would need to be retrained.
- If I ran the script via the command line or a non-interactive environment (like executing a
This is a costly mistake, but it emphasizes the importance of integrating serialization into the workflow.
What does range(3, 3) return?
An empty range.
Will this code throw an error? If so, what type?
dp = []
dp[0] = 3Yes. It will throw an IndexError. The array dp hasn’t been initialized yet, only declared. Thus, when you try to access the first index of the array, an error is thrown. You cannot index an array that has not yet been initialized with any length or values.
You need to initialize the dp list with a predefined size:
dp = [0] * n # option 1Or use methods like append() to add elements to it.
dp.append(3)Given a list myList = [1, 2, 4, 5], what is the value of myList[1:1]?
Slicing a list with the syntax myList[start:end] returns a sub-list starting from the index start and going up to, but not including, the index end.
Therefore, myList[1:1] will return an empty list, [].
Generators and Iterators
IDE vs Jupyter Notebook?
Does iterating over a dictionary in Python return the keys, values, or both?
Saved in memory
brightness = max(min(brightness, 1), 0)
What does the @ do in Python?
It denotes a decorator function.
Decorator
A decorator is a special type of function that can modify the behavior of a function or a method. Decorators provide a way to alter or extend the behavior of the function or method they are attached to, without permanently modifying it.
def my_decorator(func):
def wrapper():
print("Something is happening before the function is called.")
func()
print("Something is happening after the function is called.")
return wrapper
@my_decorator
def say_hello():
print("Hello!")
say_hello()↳ What about when they are use within classes? E.g what does @classmethod and @staticmethod mean?
Class method
- A method that is bound to the class rather than its instances.
- Class methods can access and modify the class state that applies across all instances of the class.
- They are defined using the
@classmethoddecorator, and they takeclsas the first parameter, which represents the class itself.
Static method
- A static method in Python is a method that belongs to a class but does not require an instance of the class to be called
- Static methods cannot change the class or instance state.
- They are typically used for utility functions that perform a task in isolation.
- They are defined using the
@staticmethoddecorator, and they do not require aselfparameter unlike most methods.
Practice question: Lightbulb Class
class LightBulb:
eco_mode = False # Class-wide setting
def __init__(self):
# The state of the bulb depends on whether eco_mode is enabled
self.is_on = not LightBulb.eco_mode
def toggle(self):
self.is_on = not self.is_on
↳ Write a class method toggle_eco_mode that switches the eco_mode for all instances of LightBulb
# ...
@classmethod
def toggle_eco_mode(cls):
cls.eco_mode = not cls.eco_mode↳ Now switch the eco mode across all bulbs and then create a LightBulb
LightBulb.toggle_eco_mode()
bulb1 = LightBulb()
print(f"Bulb1 is on: {bulb1.is_on}")How do you store a number that is guaranteed to always be smaller than any other number.
float(-inf)
Turn a dictionary, myDict, into a list of tuples sorted by the values.
You can do it. I know. It’s like 2 things to remember. What if you get asked this in your interview? Come on, ready?
First, to convert a dict() into a list()
myList = myDict.items()Next, sort using a key function by passing a lambda function to the key argument.
myList.sort(key=lambda x : x[1]) # x is each [key, value] object in myListCome on, you know how it works - just read.
What is the programming definition of a variable?
Variable
In programming, a variable is used to temporarily store data in computer memory.
What is bash?
How do you install libraries? What is actually contained within each library? How do you build your own libraries, packages, modules etc (practice affinda) What is pip?
What is PIP?
What is Anaconda?
Can sets contain lists in Python? If not, why?
☞ No, sets in Python cannot include lists as elements.
☞ This restriction is due to the fact that sets can only contain immutable (unchangeable) types, and lists are mutable (changeable).
☞ Since lists can be modified after their creation, they cannot be hashed, which is a requirement for elements of a set.
If I have an array in python, what does array[0:0] return
In Python, what is the time complexity of len(my_array)
How to represent a super small/big number
largest = float("inf")
smallest = -float("inf")Provide an example code explaining why it’s better to use ‘is None’ than ‘not’
Given a dictionary of letter : counts, how would you extract the letter with the highest count?
What will this program output?
paragraph = ["So the duck walked up",
"to the lemonade stand",
"and he said to the man",
"running the stand"]
print(line for line in paragraph)This will not behave as expected. The list comprehension will
What could you do to fix it?
print(*[line for line in sequence], sep='\n')match, case
print(, sep=
Diagnostic print statements
How many times does Wassup get printed?
for _ in range(1):
print('Wassup')Exactly once.
What do kwargs, args
and python sample.py --out_dir=out-shakespeare
What actually happens behind the scenes when you do a, b = b, a in python
What happens behind the scenes when you pass a list into a function in python like: my_function([3, 1, 4, 5])
Since lists are pass by reference, not pass by value, the list must first be created in memory, so that there is a reference to pass.
In Python, both False and 0 are considered “falsy” values, meaning they evaluate to False in a boolean context. However, if you want to specifically check if a value is False and not just any falsy value like 0, you can use the is operator. This operator checks for object identity, allowing you to distinguish between False and 0.
Does None==None is Python?
Yes. The equality operator checks for equality in value rather than identity
What is the important difference between these two code blocks?
return True if some_func() else Falseif some_func():
return TrueThe first code block has reached a return statement, which means that no matter what, the function will finish.
The second code block has reached a conditional statement which may not necessarily result in the function finishing - it is possible that it can continue.